We have a project that reads documents and we need to quickly get it off the ground and get it moving so we can go live. The number of formats is 500 million different designs on the documents. So it cannot be a template based solution.
Please note this is not a long bench with months, we are talking about 1-2 weeks at most.
We have not started mapping the labels but we have Tesseract v5 running and it stores it in the DB for sure.
So, we now need to do some text/keyword value mapping.
- it is focus on the detection of strings.
- SPACY, NLTK, Gensim is on the table, which one is best and fastest?
- Geometric detection is then the best way.
- Graph Neural Network is an alternative to enhance the cross-checks.
The values we must map is in the attached file.
Milestones and some estimated time to do it.
MS1 OCR and parser
MS2, 12 labels, easy
MS3, 20 Labels, cross-check with geometric and tabula
MS4, 5 labels
MS5, 5 labels, we might skip them
We work here and now, not in 1,2 or 3 weeks. The platform must be launched now.
We also need to be able to compile the python code and run it under W10 or MAC.
This process is handled by another developer.
We will when we have more documents be able to do ML/DL/NN also to complement it.
For clarification: I do not expect 100% in precision the first day, but you must know what you are doing.
You are the expert and you should be able to advice me what the best way is. I have a wealth of experience, but you get humle from knowledge how little you know.
The drive forward is of essence. Do not suggest long term projects here. Now it is time to market so please consider that.
Python, NLP, geometric detection, label parsing
3. Time frame:
1-2 weeks approximately.
5. Number of developers
6. Code stack
Python, must be possible to make it compiled python (our other developer will do that)
I am looking forward to your proposal.
23 freelancere byder i gennemsnit $2325 timen for dette job
Hi, Robert. I just read your posting. I have rich experience in OCR using python for 5 years. so I think we can help you about this issue. Let's discuss more in chat Best Regards Elite Lancers Team
Hello there. We will build a machine learning platform so that new variants of documents can be passed through it. Kindly contact me and lets discuss more.