I need a routine in C# that can determine which human language an arbitrary piece of text is written in. For example, the string "This is an English sentence." would produce the output 'English', and "Eso es español" would produce 'Spanish', where 'English' and 'Spanish' are members of an enum.
Fortunately, there is an open source Perl program, only a few pages long, that does a fairly reasonable job of this. It is available for download from:
<[url removed, login to view]~vannoord/TextCat/>
All I need is someone who is familiar with Perl to translate the program into C#. No GUI is required beyond what is needed to test the program; a console application is fine. I really don't care about the GUI at all.
While translating TextCat is probably the easiest approach, I will entertain any bids for alternative algorithms implemented in C# if the accuracy of the algorithm can be demonstrated to be above 95%.
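As a rough illustration of the kind of routine being requested, the sketch below assumes the n-gram rank-order ("out-of-place") comparison that TextCat is generally described as using. The names Language, LanguageGuesser, Detect, and the two inline training samples are hypothetical placeholders; a real translation would load TextCat's own language model files and cover many more languages.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum Language { Unknown, English, Spanish }

public static class LanguageGuesser
{
    // Placeholder training text (an assumption for this sketch only).
    private static readonly Dictionary<Language, string> Samples =
        new Dictionary<Language, string>
        {
            { Language.English, "this is an english sentence and the quick brown fox jumps over the lazy dog with that which they have" },
            { Language.Spanish, "esto es una frase en español y el rápido zorro marrón salta sobre el perro perezoso que los las para con" },
        };

    private const int MaxNGram = 3;      // use 1-, 2- and 3-character n-grams
    private const int ProfileSize = 300; // keep only the most frequent n-grams

    private static readonly Dictionary<Language, Dictionary<string, int>> Profiles =
        Samples.ToDictionary(kv => kv.Key, kv => BuildProfile(kv.Value));

    // Maps each n-gram to its rank (0 = most frequent) within the text.
    private static Dictionary<string, int> BuildProfile(string text)
    {
        var counts = new Dictionary<string, int>();
        var separators = new[] { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?', '"', '\'' };
        var words = text.ToLowerInvariant().Split(separators, StringSplitOptions.RemoveEmptyEntries);
        foreach (var word in words)
        {
            string padded = "_" + word + "_"; // mark word boundaries
            for (int n = 1; n <= MaxNGram; n++)
            {
                for (int i = 0; i + n <= padded.Length; i++)
                {
                    string gram = padded.Substring(i, n);
                    int seen;
                    counts.TryGetValue(gram, out seen); // seen stays 0 if absent
                    counts[gram] = seen + 1;
                }
            }
        }
        return counts
            .OrderByDescending(kv => kv.Value)
            .ThenBy(kv => kv.Key, StringComparer.Ordinal) // deterministic tie-break
            .Take(ProfileSize)
            .Select((kv, rank) => new { kv.Key, rank })
            .ToDictionary(x => x.Key, x => x.rank);
    }

    // Returns the language whose profile has the smallest out-of-place distance.
    public static Language Detect(string text)
    {
        if (string.IsNullOrWhiteSpace(text)) return Language.Unknown;
        var document = BuildProfile(text);
        if (document.Count == 0) return Language.Unknown;

        Language best = Language.Unknown;
        long bestDistance = long.MaxValue;
        foreach (var profile in Profiles)
        {
            long distance = 0;
            foreach (var gram in document)
            {
                int rank;
                distance += profile.Value.TryGetValue(gram.Key, out rank)
                    ? Math.Abs(rank - gram.Value) // how far out of place the n-gram is
                    : ProfileSize;                // fixed penalty for an unseen n-gram
            }
            if (distance < bestDistance)
            {
                bestDistance = distance;
                best = profile.Key;
            }
        }
        return best;
    }
}
```

With these placeholder samples, LanguageGuesser.Detect("This is an English sentence.") returns Language.English and LanguageGuesser.Detect("Esto es una frase en español") returns Language.Spanish. Accuracy on real text depends entirely on the quality of the language models, so a sketch like this says nothing about the 95% target.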
1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):
a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment--Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.
b) For all others including desktop software or software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.
3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).
Windows with .NET extensions installed (Windows XP is preconfigured in this manner)
Both Unicode and plain vanilla C strings must be handled (easy enough in C#).
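One possible reading of this requirement, sketched below, is that "plain vanilla C strings" means NUL-terminated single-byte buffers that need decoding into a .NET string before detection. The helper name TextInput.FromCString and the ISO-8859-1 default are assumptions made for illustration, not part of the request.

```csharp
using System;
using System.Text;

public static class TextInput
{
    // Decodes a C-style byte buffer, stopping at the first NUL byte if there is one.
    // The encoding defaults to ISO-8859-1 (an assumption); pass another if known.
    public static string FromCString(byte[] buffer, Encoding encoding = null)
    {
        if (buffer == null) throw new ArgumentNullException(nameof(buffer));
        int length = Array.IndexOf(buffer, (byte)0);
        if (length < 0) length = buffer.Length; // no terminator: use the whole buffer
        Encoding chosen = encoding ?? Encoding.GetEncoding("iso-8859-1");
        return chosen.GetString(buffer, 0, length);
    }
}
```

Unicode input already arrives as a .NET string, so only the byte-buffer path needs a helper like this; the decoded result can then be passed to whatever detection routine is delivered.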