Word/Phrase Tokenizer
$30-100 USD
Paid on delivery
Java Tokenizer
In Java, write a tokenizer class that tokenizes a string into words, phrases (greedy style), or other tokens according to the convention used by BreakIterator (i.e. subclass BreakIterator). The return type is List. The reference dictionary is [url removed, login to view]; initialize/cache the phrases (2 words or more) in memory for better performance. Speed is extremely important. Please find the most efficient phrase search algorithm.
Test Input:
I like coffee table!
Test Output:
list("I", " ", "like", " ", "coffee table", "!")
I have attached code written to parse Chinese and found its greedy search algorithm to be usable. However, the code is buggy and does a lot of unwanted processing for Chinese characters. Please recommend a better algorithm in your bid message.
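For illustration only, the greedy phrase matching described above could be sketched roughly as follows. This is not the attached code or a definitive solution: the class name PhraseTokenizer, the word-level trie, and the rule that phrase words are separated by exactly one whitespace token are all assumptions made for the sketch. It uses the standard java.text.BreakIterator word instance to split the text rather than subclassing BreakIterator, and it matches phrases case-insensitively.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: greedy longest-match phrase tokenizer over BreakIterator tokens.
class PhraseTokenizer {
    // Trie node keyed by lowercase word; 'terminal' marks the end of a dictionary phrase.
    private static final class Node {
        final Map<String, Node> next = new HashMap<>();
        boolean terminal;
    }

    private final Node root = new Node();

    // Load the phrases (2 words or more) into the trie once, so lookups stay in memory.
    PhraseTokenizer(Iterable<String> phrases) {
        for (String phrase : phrases) {
            Node node = root;
            for (String word : phrase.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
                node = node.next.computeIfAbsent(word, k -> new Node());
            }
            node.terminal = true;
        }
    }

    List<String> tokenize(String text) {
        // Step 1: split into raw tokens (words, whitespace, punctuation) with BreakIterator.
        List<String> raw = new ArrayList<>();
        BreakIterator it = BreakIterator.getWordInstance(Locale.ENGLISH);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            raw.add(text.substring(start, end));
        }

        // Step 2: at each position, greedily take the longest dictionary phrase, if any.
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < raw.size()) {
            int last = longestMatchEnd(raw, i);
            if (last > i) {
                out.add(String.join("", raw.subList(i, last + 1)));
                i = last + 1;
            } else {
                out.add(raw.get(i));
                i++;
            }
        }
        return out;
    }

    // Returns the index of the last raw token of the longest phrase starting at 'from',
    // or -1 if no phrase starts there. Assumes one whitespace token between phrase words.
    private int longestMatchEnd(List<String> raw, int from) {
        Node node = root;
        int best = -1;
        int j = from;
        while (j < raw.size()) {
            node = node.next.get(raw.get(j).toLowerCase(Locale.ROOT));
            if (node == null) {
                break;
            }
            if (node.terminal) {
                best = j;
            }
            if (j + 1 < raw.size() && raw.get(j + 1).trim().isEmpty()) {
                j += 2; // skip the whitespace token and continue with the next word
            } else {
                break;
            }
        }
        return best;
    }
}
```

With a dictionary containing only "coffee table", calling new PhraseTokenizer(List.of("coffee table")).tokenize("I like coffee table!") should produce the test output above. The trie walk costs time proportional to the length of the longest candidate phrase at each position, which is one reasonable choice when speed matters; a hash set of phrases with a bounded look-ahead would be a simpler alternative.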
Project ID: #65205
About the project
Awarded to:
9 freelancers are bidding an average of $69 per hour for this job
I can do this for you very efficiently, but, as was already asked, why is "coffee table" one token and not "coffee", " ", "table"?
Hi, I would not use StringTokenizer but regular expressions. Get in touch and we can continue this conversation. Regards, Brocker
A tokenizer shouldn't be a problem. I am assuming you are looking at some sort of translator program that breaks up each word and then returns a Chinese equivalent.
I will code this for you and explain the code over the phone if you reside in the USA or Canada, or over Skype anywhere in the world. Let me know if you would like this done fast. Thank you.