I am looking for a way to translate a large amount of text on a daily basis in an automated way. The translation can be "dictionary" based in the sense that I don't need the grammar translated, just the words. I wanted to use Google's Translate API, but it limits usage to 100,000 characters per day, which is too little. I will post my original specifications for this job below, but it seems that Google won't work for this project.
I would like someone who has experience with the Google APIs, and in particular with the Google Translate API.
The more details you can provide that show evidence you can do this job, the more likely I am to select you.
This has to be done online. We have CSV files on our web server, and we want them translated using the Google Translate API. Not all of the fields are text, so not all of the fields will need translation.
The input would be either a file already uploaded to the server, or an upload box that allows the user to upload a file for translation.
The interface should also have two pull-down boxes where the user can choose the "to" and "from" languages (English should be the default choice for the "to" language). The "from" language might not be known, or there could be multiple languages in the file, so there should be an "unknown or multiple" option in the "from" selection box.
The processing should use Google's API to translate each of the records in the input file. I already have an Excel program that does this, and I can share it with you.
However, in order not to overload the Google service, I want to use a PEAR package ([url removed, login to view]) that can detect 52 different languages. Since most of my text is English (about 80 percent), this PEAR package should identify the English records and not send them to the Google API (of course, they still have to appear in the result file, in the proper place). Once a record is determined NOT to be English, it would be sent to the Google API. It may be best to first parse the entire file for non-English records and then send them all to the Google API in the most efficient way possible; afterwards, the order of the records could be re-established using the ID field. I believe Google's auto-detect mechanism should work to identify the non-English language in most cases.
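The filter-then-batch-then-reorder flow described above could be sketched as follows. Both `is_english` and `translate_batch` are hypothetical placeholders (the real project would use the PEAR detector and the Google Translate API); the point here is only the ordering logic:

```python
def is_english(text):
    # Placeholder: the real job would call the server-side language
    # detector. Here we naively treat ASCII-only text as English.
    return all(ord(ch) < 128 for ch in text)

def translate_batch(texts, target="en"):
    # Placeholder for a batched Google Translate API call.
    return ["<translated:%s>" % t for t in texts]

def translate_records(records, target="en"):
    """records: list of (record_id, text). English records are kept
    locally; the rest go out in one batch; results come back in the
    original order, keyed by the ID field."""
    english = {rid: text for rid, text in records if is_english(text)}
    foreign = [(rid, text) for rid, text in records if rid not in english]
    translated = translate_batch([text for _, text in foreign], target)
    foreign_out = {rid: out for (rid, _), out in zip(foreign, translated)}
    # Re-establish the original order using the ID field.
    return [(rid, english.get(rid, foreign_out.get(rid)))
            for rid, _ in records]
```

With roughly 80 percent English input, this cuts the API traffic to about a fifth of the file while still emitting every record in place.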
Since you will be using Google Translate, you are only responsible for making this work with languages that Google can translate.
In addition to the translation, a new field should be added to each record indicating its original language.
The output should be a file with the same name as the original, but with the phrase "translatedToEnglish" (or whichever language it is translated to) appended to the file name. The user should have the choice of downloading this file to their computer or saving it to the server (saving to the server requires a password, so you should put that in the code as well).
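A small sketch of the naming rule. The underscore separator and keeping the suffix before the extension are my assumptions, not part of the spec:

```python
import os

def output_name(input_name, target_language="English"):
    """Build the output file name by appending 'translatedTo<Language>'.
    Assumption: the suffix goes before the extension so the file still
    opens as a CSV, and '_' separates it from the original name."""
    base, ext = os.path.splitext(input_name)
    return "%s_translatedTo%s%s" % (base, target_language, ext)
```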
Could you also add a small visual indicator to show that the program is working? Some files will be 3,000 lines; others could be 150,000 or bigger, so processing may take some time. If it will take more than two minutes, the user should be sent an email when the process is finished, if you can do that (not required).
As I said above, since there are so many records to translate, I want to use some sort of software on the server to filter out the English text so that it is not sent to Google's API. I suggest this package, but there may be others:
[url removed, login to view]
One thing I am not sure about is what this PEAR package does when a language is not in its database. It says it does not detect East Asian languages. I only envision using it to distinguish English from not-English, so it should report not-English for text that is both not English and East Asian.
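The English-versus-not-English gate could look like the sketch below. The detector interface (a callable returning a language name and a confidence score, or `None` when the language is not in its database) is assumed for illustration and will not match the PEAR package's actual API:

```python
# Assumed confidence cutoff; a real deployment would tune this.
CONFIDENCE_THRESHOLD = 0.6

def gate(detect, text):
    """Return 'english' only on a confident English detection.
    Anything else -- unknown language, East Asian text the detector
    cannot classify, or a low-confidence guess -- is 'not-english'
    and therefore gets sent on to the translation API."""
    result = detect(text)
    if result is None:          # language not in the detector's database
        return "not-english"
    lang, confidence = result
    if lang == "english" and confidence >= CONFIDENCE_THRESHOLD:
        return "english"
    return "not-english"
```

Erring toward "not-english" is the safe direction: a misrouted English record just costs one extra API call, while a misrouted foreign record would go untranslated.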