Merge and sequence the different sections of an input package consisting of text, figures, tables, etc.


• Classify the different sections of a document and output them in the sequence given by the input specifications
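Once each section has a class label, sequencing it against an ordering specification can be sketched as a stable sort. This is a minimal illustration only: the `Section` type, the label names, and the format of the `order` list are hypothetical assumptions, since the actual input specifications are not described here.

```python
from dataclasses import dataclass


@dataclass
class Section:
    label: str    # predicted class of the section, e.g. "title" (hypothetical label set)
    content: str  # raw text, or a reference to a figure/table object


def sequence_sections(sections, order):
    """Return sections sorted by the position of their label in `order`.

    Sections whose label is not listed in `order` go last. sorted() is stable,
    so sections sharing a label keep their original relative order.
    """
    rank = {label: i for i, label in enumerate(order)}
    return sorted(sections, key=lambda s: rank.get(s.label, len(order)))


if __name__ == "__main__":
    # Hypothetical input package: sections in arbitrary order
    package = [
        Section("figure", "fig1.png"),
        Section("body", "Intro paragraph"),
        Section("title", "A Study of X"),
        Section("table", "table1"),
        Section("body", "Methods paragraph"),
    ]
    spec = ["title", "body", "figure", "table"]  # hypothetical ordering specification
    for s in sequence_sections(package, spec):
        print(s.label, "->", s.content)
```

Relying on a stable sort means paragraphs of the same class (for example, consecutive body paragraphs) stay in their source order, which is usually what a merge step needs.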

Required Skill Set

1. Word Document (.docx) Understanding

• Understanding of how Word documents are stored internally, i.e., the XML for the text part and the objects (figures, tables, and equations, both native Word equations and MathType)

2. NLP Experience

• Extracting features from the provided text

• Libraries such as Python NLTK or Stanford NER, or other libraries, can be used for this

3. Machine Learning Experience

• Familiarity with building classification models such as SVMs, random forests, and neural networks

• ML libraries such as scikit-learn or statsmodels, or tools such as WEKA, can be used to build the classification models
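For the Word Document (.docx) Understanding item above: a .docx file is a ZIP archive whose main text lives in `word/document.xml`. The sketch below uses only the Python standard library to walk the top-level children of `<w:body>` and tag each as text, table, native Word (OMML) equation, figure, or embedded OLE object. The helper names and the tagging heuristic are assumptions for illustration. It ignores headers, footnotes, and relationship files. The assumption that MathType equations appear as OLE objects (`<w:object>`) should also be checked against real input files.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML and Office Math (OMML) namespaces used in word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
M = "{http://schemas.openxmlformats.org/officeDocument/2006/math}"


def classify_body_element(el):
    """Roughly tag a top-level child of <w:body> by the kind of content it holds."""
    if el.tag == W + "tbl":
        return "table"
    if el.tag != W + "p":
        return "other"  # e.g. <w:sectPr> section properties
    if el.find(".//" + M + "oMath") is not None:
        return "word_equation"  # native Word equation (OMML)
    if el.find(".//" + W + "object") is not None:
        return "ole_object"  # embedded OLE object; assumed to cover MathType equations
    if el.find(".//" + W + "drawing") is not None:
        return "figure"  # inline or anchored picture
    return "text"


def read_docx_blocks(path_or_file):
    """Yield (kind, text) for each top-level block of the document body, in order."""
    with zipfile.ZipFile(path_or_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    body = root.find(W + "body")
    for el in body:
        # Concatenate all <w:t> runs, including those inside table cells
        text = "".join(t.text or "" for t in el.iter(W + "t"))
        yield classify_body_element(el), text
```

For example, `for kind, text in read_docx_blocks("input.docx"): print(kind, text[:60])`, where `input.docx` is a placeholder path. The (kind, text) stream is the raw material for the feature extraction and classification steps listed above.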
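The NLP and machine-learning items above fit into one pipeline: extract features from each block of text, then train a classifier to label it. This is a minimal sketch, assuming scikit-learn is installed. The feature set, the label names, and the helpers `section_features` and `train_section_classifier` are hypothetical. Regular expressions stand in for NLTK tokenization so the example needs no corpus downloads. NLTK's `word_tokenize` or `pos_tag` could supply richer features in their place.

```python
import re

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline

# Matches caption openings such as "Figure 1", "Fig. 2", "Table 3"
CAPTION_RE = re.compile(r"^(fig(ure)?|table)\.?\s*\d+", re.IGNORECASE)


def section_features(text):
    """Hand-crafted features for one block of text (illustrative, not exhaustive)."""
    words = re.findall(r"\w+", text)
    return {
        "n_words": len(words),
        "is_caption": int(bool(CAPTION_RE.match(text.strip()))),
        "ends_with_period": int(text.rstrip().endswith(".")),
        "title_case_ratio": sum(w[0].isupper() for w in words) / max(len(words), 1),
        "has_reference_marker": int(bool(re.search(r"\[\d+\]", text))),
    }


def train_section_classifier(texts, labels):
    """Fit a random-forest classifier on feature dicts extracted from `texts`."""
    model = make_pipeline(
        DictVectorizer(sparse=False),  # feature dicts -> numeric matrix
        RandomForestClassifier(n_estimators=100, random_state=0),
    )
    model.fit([section_features(t) for t in texts], labels)
    return model


if __name__ == "__main__":
    # Tiny hypothetical training set; real use needs many labelled documents
    texts = [
        "A Study of Section Classification",
        "Figure 1. Pipeline overview.",
        "Table 2: Results by class.",
        "We describe the method in detail and compare it with prior work [3].",
    ]
    labels = ["title", "caption", "caption", "body"]
    model = train_section_classifier(texts, labels)
    print(model.predict([section_features("Figure 3. Error analysis.")]))
```

A random forest is used here only because it is one of the model families named above. An SVM (`sklearn.svm.SVC`) or a neural network (`sklearn.neural_network.MLPClassifier`) can be swapped into the same pipeline without changing the feature code.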

Detailed requirements will be shared with selected vendors.

