Local word grouping: NLP

Overview of the task:

The goal of this project is to find a particular type of word, verbal words, in any given Hindi text whose words are labeled with their parts of speech. It is a natural language processing (NLP) task.

It requires strong Java programming skills, especially in string and text processing.

Description of the task:

Input: part-of-speech (PoS) tagged text, one sentence per line.

Desired output: the same text with the boundary of each verb group marked with square brackets and labeled with a special predefined tag.

Example Input text:

वह\PPR.0.sg.3.dir.0.n.n.0.n किताब\NC.fem.sg.dir.0 पढ़ेगा\VM.mas.sg.3.fut.sim.dcl.fin.n ।\PU

वह\PPR.0.sg.3.dir.0.n.n.0.n किताब\NC.fem.sg.dir.0 पढ़ता\VM.mas.sg.3.0.impf.dcl.fin.n है\VAUX.0.sg.3.prs.sim.dcl.fin.n ।\PU

वह\PPR.0.sg.3.dir.0.n.n.0.n किताब\NC.fem.sg.dir.0 पढ़\VM. रहा\VAUX.mas.sg.3.0.pft.dcl.fin.n है\VAUX.0.sg.3.prs.sim.dcl.fin.n ।\PU
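One way to read this input format in Java is sketched below. It assumes that tokens are separated by whitespace, that a word and its tag are separated by the first backslash, and that a word counts as "verbal" when the tag's category (the part before the first dot) is VM or VAUX, as in the examples. The class and method names (TaggedTokens, parse, category, isVerbal) are hypothetical, and the demo uses transliterated words so the source file stays ASCII.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tokenizer for lines like "word\TAG word\TAG ..." (not part of the task spec). */
public class TaggedTokens {

    /** One token: the surface word and its full PoS tag, e.g. "VM.mas.sg.3.fut.sim.dcl.fin.n". */
    public record Token(String word, String tag) {

        /** Coarse category: the part of the tag before the first '.', e.g. "VM" for "VM.". */
        public String category() {
            int dot = tag.indexOf('.');
            return dot < 0 ? tag : tag.substring(0, dot);
        }

        /** Assumption: main verbs (VM) and auxiliaries (VAUX) are the verbal words. */
        public boolean isVerbal() {
            String c = category();
            return c.equals("VM") || c.equals("VAUX");
        }
    }

    /** Splits one input line into tokens; a token without a backslash gets an empty tag. */
    public static List<Token> parse(String line) {
        List<Token> tokens = new ArrayList<>();
        for (String raw : line.trim().split("\\s+")) {
            if (raw.isEmpty()) continue; // blank line
            int sep = raw.indexOf('\\');
            if (sep < 0) {
                tokens.add(new Token(raw, ""));
            } else {
                tokens.add(new Token(raw.substring(0, sep), raw.substring(sep + 1)));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Transliterated version of the second example sentence.
        String line = "vah\\PPR.0.sg.3.dir.0.n.n.0.n kitab\\NC.fem.sg.dir.0 "
                + "padhta\\VM.mas.sg.3.0.impf.dcl.fin.n hai\\VAUX.0.sg.3.prs.sim.dcl.fin.n .\\PU";
        for (Token t : parse(line)) {
            System.out.println(t.word() + " " + t.category() + " verbal=" + t.isVerbal());
        }
    }
}
```

Splitting on the first backslash keeps the full tag intact, so a truncated tag such as "VM." still yields the category VM.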

Example Output text:

वह किताब [पढ़ेगा]… ।\PU

वह किताब [पढ़ता है]… ।\PU

वह किताब [पढ़ रहा है]… ।\PU

वह\PPR.0.sg.3.dir.0.n.n.0.n किताब\NC.fem.sg.dir.0 [पढ़ेगा\VM.mas.sg.3.fut.sim.dcl.fin.n]… ।\PU

वह\PPR.0.sg.3.dir.0.n.n.0.n किताब\NC.fem.sg.dir.0 [पढ़ता\VM.mas.sg.3.0.impf.dcl.fin.n है\VAUX.0.sg.3.prs.sim.dcl.fin.n]… ।\PU

वह\PPR.0.sg.3.dir.0.n.n.0.n किताब\NC.fem.sg.dir.0 [पढ़\VM. रहा\VAUX.mas.sg.3.0.pft.dcl.fin.n है\VAUX.0.sg.3.prs.sim.dcl.fin.n]… ।\PU

(Note: If you see junk text instead of the Hindi characters in the examples above, please see the attached file.)
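Because each verb group in the output is delimited by square brackets, the groups can be read back out of either output format with a regular expression, for example to compare program output against hand-marked sentences. This is a hypothetical checker, not part of the task description. It assumes the group label is whatever non-space text follows the closing bracket, and that words inside a group are separated by whitespace; the label "\VG.T" in the demo is a placeholder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical reader for bracketed verb groups in an output line. */
public class OutputGroups {

    /** One bracketed group: its words (PoS tags stripped) and the label written after ']'. */
    public record Group(List<String> words, String label) {}

    // '[' + anything without brackets + ']' + an optional label up to the next whitespace.
    private static final Pattern GROUP = Pattern.compile("\\[([^\\[\\]]*)\\](\\S*)");

    public static List<Group> extract(String line) {
        List<Group> groups = new ArrayList<>();
        Matcher m = GROUP.matcher(line);
        while (m.find()) {
            List<String> words = new ArrayList<>();
            for (String tok : m.group(1).trim().split("\\s+")) {
                if (tok.isEmpty()) continue;
                int sep = tok.indexOf('\\');
                words.add(sep < 0 ? tok : tok.substring(0, sep)); // drop the word's own tag, if any
            }
            groups.add(new Group(words, m.group(2)));
        }
        return groups;
    }

    public static void main(String[] args) {
        String line = "vah\\PPR kitab\\NC [padh\\VM. raha\\VAUX.mas hai\\VAUX.0]\\VG.T .\\PU";
        for (Group g : extract(line)) {
            System.out.println(String.join(" ", g.words()) + " -> " + g.label());
        }
    }
}
```

Stripping the per-word tags lets the same comparison work on both the tagged and the untagged output format.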

Other Details:

A total of 192 tags can be assigned as the boundary label of a verb group.

More details will follow. A brief outline of the algorithm:

1. Start searching for a verbal word from the right of the sentence boundary.

2. When a verbal word is found, match it against a template and store the match.

3. Continue the search rightward for more verbal words, matching the growing sequence against the TAM (tense-aspect-modality) templates. Continue until the last verbal word in the sequence is reached.

4. From among the matched templates, choose the longest verb sequence that matches a TAM template and mark its boundaries with square brackets ‘[ ]’.

5. After the closing square bracket, append the tag of the matched TAM template, prefixed with /VG./.
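The five steps above can be sketched in Java as follows. This is a minimal sketch under stated assumptions, not the project's specification. The real 192 TAM templates and their format are not given, so each template here is hypothetical: a suffix that the first verbal word must end with, plus an exact list of following auxiliary words. The tag names FUT, TA, HAB_PRS and PROG_PRS are invented placeholders. Step 1 is read as scanning rightward from the start of the sentence, consistent with step 3, and the group label is assumed to be written as \VG. followed by the template tag (the exact /VG./ syntax is not visible in the examples).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of verb-group marking (steps 1-5). Assumptions, not from the spec:
 * verbal words have tag category VM or VAUX; a TAM template is a suffix of the
 * first verbal word plus an exact list of auxiliary words; the group label is
 * written as "\VG." + template tag. All templates and tag names are hypothetical.
 */
public class VerbGrouper {

    public record Token(String word, String tag) {
        boolean isVerbal() {
            String category = tag.split("\\.", 2)[0]; // "VM.mas.sg" -> "VM"
            return category.equals("VM") || category.equals("VAUX");
        }
    }

    /** Hypothetical TAM template, e.g. suffix "ta" followed by the auxiliary "hai". */
    public record Template(String verbSuffix, List<String> auxiliaries, String tag) {
        boolean matches(List<Token> seq) {
            if (seq.size() != 1 + auxiliaries.size()) return false;
            if (!seq.get(0).word().endsWith(verbSuffix)) return false;
            for (int k = 0; k < auxiliaries.size(); k++) {
                if (!seq.get(k + 1).word().equals(auxiliaries.get(k))) return false;
            }
            return true;
        }
    }

    static List<Token> parse(String line) {
        List<Token> tokens = new ArrayList<>();
        for (String raw : line.trim().split("\\s+")) {
            if (raw.isEmpty()) continue;
            int sep = raw.indexOf('\\');
            tokens.add(sep < 0 ? new Token(raw, "")
                               : new Token(raw.substring(0, sep), raw.substring(sep + 1)));
        }
        return tokens;
    }

    static String render(Token t) {
        return t.tag().isEmpty() ? t.word() : t.word() + "\\" + t.tag();
    }

    /** Returns the line in the tagged output format, with verb groups bracketed and labeled. */
    public static String group(String line, List<Template> templates) {
        List<Token> tokens = parse(line);
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            // Step 1: scan rightward until a verbal word is found.
            if (!tokens.get(i).isVerbal()) {
                out.add(render(tokens.get(i++)));
                continue;
            }
            // Step 3: the run of contiguous verbal words starting at i ends before runEnd.
            int runEnd = i;
            while (runEnd < tokens.size() && tokens.get(runEnd).isVerbal()) runEnd++;
            // Steps 2 and 4: try every prefix of the run; the longest match wins.
            int bestLen = 0;
            String bestTag = null;
            for (int end = i + 1; end <= runEnd; end++) {
                for (Template t : templates) {
                    if (t.matches(tokens.subList(i, end))) {
                        bestLen = end - i;
                        bestTag = t.tag();
                    }
                }
            }
            if (bestLen == 0) { // no template matched: copy the word through unchanged
                out.add(render(tokens.get(i++)));
                continue;
            }
            // Step 5: bracket the group and append its label after ']'.
            List<String> inside = new ArrayList<>();
            for (Token t : tokens.subList(i, i + bestLen)) inside.add(render(t));
            out.add("[" + String.join(" ", inside) + "]\\VG." + bestTag);
            i += bestLen;
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        // Hypothetical templates over transliterated words (real ones would use Devanagari).
        List<Template> templates = List.of(
                new Template("ega", List.of(), "FUT"),
                new Template("ta", List.of(), "TA"),
                new Template("ta", List.of("hai"), "HAB_PRS"),
                new Template("", List.of("raha", "hai"), "PROG_PRS"));
        String line = "vah\\PPR.0.sg.3.dir.0.n.n.0.n kitab\\NC.fem.sg.dir.0 "
                + "padhta\\VM.mas.sg.3.0.impf.dcl.fin.n hai\\VAUX.0.sg.3.prs.sim.dcl.fin.n .\\PU";
        System.out.println(group(line, templates));
    }
}
```

In the demo, padhta alone matches the placeholder TA template, but the longer sequence padhta hai also matches HAB_PRS, so the whole pair is bracketed and labeled \VG.HAB_PRS, as step 4 requires. Trying every prefix of a run against every template is cheap in practice because runs of verbal words are only a few tokens long.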

Skills: Java, JSP


About the employer:
(2 reviews) New Delhi, India

Project ID: #992469

Awarded to:


Hi, I think I can do this if you can provide some details in the PMB. Cheers!!

$520 USD in 30 days
(3 reviews)

7 freelancers are bidding an average of $546 on this job


Hello, please view PMB. Ashwin

$750 USD in 15 days
(39 reviews)

Hi, I can do it. Contact me to discuss details.

$500 USD in 10 days
(7 reviews)

Please see PMB

$500 USD in 5 days
(2 reviews)

Hi, I'm a professional Java developer (SCJP 6) with experience in NLP and text processing. Contact me to discuss it further.

$500 USD in 10 days
(1 review)

Hello! I have been developing in Java and JSP since 2000, so I have extensive experience. I would be thrilled to do this project for you.

$400 USD in 15 days
(0 reviews)

Check my PM.

$650 USD in 20 days
(0 reviews)