The maximum I can pay for this project is $100.
This project has three separate but related tasks. They should all be combined into one nice-looking GUI, available both as a web-based version and as a standalone application.
Tasks 1 and 2 both start with a .csv file as input. The .csv file has 7 columns. Column 4 is called Text and contains a string of text. Column 5 is called location and contains a city name, a city name and a country name, or a country name (it may also be blank or unreadable). Task 1 works with column 4 (Text) and task 2 works with column 5 (location). Both involve looking up information in other files and adding the results to the .csv file. From now on I will call this .csv file [url removed, login to view]. The deliverable should be an executable (.exe) file together with a GUI.
Task 1: I will provide you with 5 .csv files containing lists of words; I think there are 76 lists altogether. For each record in [url removed, login to view], count how many of the words in each of the 76 lists appear in the Text field (column 4). Output the original file with about 76 extra columns, one per list (there are actually a few more, based on simple calculations over some of these counts).
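One way the Task 1 counting could work, sketched in Python under a few assumptions: the word lists are already loaded into a dict mapping each list name to a set of lowercase words, matching is case-insensitive on whole words, and a word that occurs twice in the Text field counts twice. The function names are hypothetical, and the existing Perl script would settle the exact counting rule.

```python
import re
from collections import Counter

# Hypothetical tokenizer: lowercase letters and apostrophes form a word.
TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def count_list_hits(text: str, word_lists: dict[str, set[str]]) -> dict[str, int]:
    """For each list, count the tokens in `text` that belong to it.

    Repeated words count every time they occur (an assumption; counting
    distinct words would use set(tokens) instead).
    """
    counts = Counter(tokenize(text))
    return {
        name: sum(n for word, n in counts.items() if word in words)
        for name, words in word_lists.items()
    }


def add_count_columns(row: list[str], text_col: int, word_lists: dict[str, set[str]]) -> list[str]:
    """Append one count column per word list to a CSV row, in the dict's order.

    `text_col` is a 0-based index into the row.
    """
    hits = count_list_hits(row[text_col], word_lists)
    return row + [str(hits[name]) for name in word_lists]
```

Because the extra columns follow the dict's insertion order, loading the lists in a fixed order keeps the output columns stable from run to run.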
Task 2: There are 2 additional files for this task: [url removed, login to view] and countries.txt. Together they provide the longitude and latitude of cities throughout the world. For each record in [url removed, login to view], find the longitude and latitude that match its location field and append them to the record.
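A possible shape for the Task 2 lookup, as a Python sketch: split the location field on commas, try the (city, country) pair first, then the city alone, then the country alone. The three lookup dicts and their layout are hypothetical, since the formats of the two coordinate files are not described here, and the country fallback only applies if country-level coordinates are available.

```python
from typing import Optional

Coord = tuple[float, float]  # (latitude, longitude)


def normalize(name: str) -> str:
    """Lowercase a place name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def resolve_location(
    location: str,
    by_city_country: dict[tuple[str, str], Coord],
    by_city: dict[str, Coord],
    by_country: dict[str, Coord],
) -> Optional[Coord]:
    """Map a free-text location field to (lat, lon), or None if it cannot be resolved."""
    parts = [normalize(p) for p in location.split(",") if p.strip()]
    if not parts:
        return None  # blank field
    if len(parts) >= 2 and (parts[0], parts[-1]) in by_city_country:
        return by_city_country[(parts[0], parts[-1])]  # "City, Country"
    if parts[0] in by_city:
        return by_city[parts[0]]  # city alone (ideally the most populous match)
    if parts[-1] in by_country:
        return by_country[parts[-1]]  # country alone, if country coordinates exist
    return None  # unknown or unreadable location
```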
Task 3: Some records will not get a longitude/latitude pair. This task combines the results of tasks 1 and 2, but outputs only the records that have a longitude/latitude pair. So although every record from task 1 has the 76 or so extra fields, the task 3 output contains only the records that have a long/lat pair, and each of them includes the longitude field, the latitude field, and the 76 other fields.
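A minimal Python sketch of the Task 3 filter, assuming the task 1 rows and the task 2 coordinates are kept in the same order, so the i-th coordinate (or None if the location could not be resolved) belongs to the i-th record. Appending longitude and then latitude at the end of each row is an assumption about the desired column order.

```python
from typing import Optional, Sequence

Coord = tuple[float, float]  # (latitude, longitude)


def combine_geocoded(
    task1_rows: Sequence[list[str]],
    coords: Sequence[Optional[Coord]],
) -> list[list[str]]:
    """Keep only records that have a (lat, lon) pair, appending longitude and latitude.

    Assumes coords[i] is the task 2 result for task1_rows[i] (None = not found).
    """
    if len(task1_rows) != len(coords):
        raise ValueError("task 1 rows and task 2 coordinates are not aligned")
    combined = []
    for row, coord in zip(task1_rows, coords):
        if coord is None:
            continue  # no long/lat pair: leave this record out of task 3
        lat, lon = coord
        combined.append(row + [str(lon), str(lat)])
    return combined
```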
The GUI should also let the user output the results of just task 1, just task 2, or just task 3.
The outputs must be loadable into Weka [url removed, login to view]. I think this means that text fields must be quoted, and that commas, apostrophes, and quotes inside text must be escaped, among other things. Dates must also be formatted correctly. I can provide a Perl script that does this kind of formatting for different data; use it as a guide, but copying it will not work because its input file is different. That script also implements the search through the 76 word lists, but it has no compiled .exe version and it too was written for a different input file.
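If the output targets Weka's ARFF format, text values can be wrapped in single quotes with backslash escapes, which also makes embedded commas safe, and dates can be converted to ARFF's default date pattern, yyyy-MM-dd'T'HH:mm:ss. The Python sketch below approximates that; the input date format is a placeholder, and the escape table should be checked against the Weka version in use and against the existing Perl script. Writing plain CSV for Weka's CSV loader would be an alternative.

```python
from datetime import datetime

# Backslash escapes for characters that are special inside a quoted ARFF value.
# This approximates Weka's own quoting; verify it against the Weka version in use.
_ARFF_ESCAPES = {
    "\\": "\\\\",
    "'": "\\'",
    '"': '\\"',
    "%": "\\%",
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
}


def arff_string(value: str) -> str:
    """Return value as a single-quoted ARFF string with special characters escaped."""
    escaped = "".join(_ARFF_ESCAPES.get(ch, ch) for ch in value)
    return f"'{escaped}'"


def arff_date(value: str, input_format: str = "%m/%d/%Y %H:%M") -> str:
    """Convert a date to ARFF's default pattern yyyy-MM-dd'T'HH:mm:ss, quoted.

    input_format is a placeholder for whatever format the input file uses.
    Unparseable dates become ?, ARFF's missing-value marker.
    """
    try:
        parsed = datetime.strptime(value.strip(), input_format)
    except ValueError:
        return "?"
    return "'" + parsed.strftime("%Y-%m-%dT%H:%M:%S") + "'"
```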
When several cities share the same name and only the city is given, choose the most populous one.
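The most-populous rule can be applied once, while building the city index, so that each later lookup is a single dictionary access. This Python sketch assumes a hypothetical CSV layout with city, country, latitude, longitude and population columns; the real coordinate files will likely need different column names or parsing.

```python
import csv
import io
from typing import TextIO

Coord = tuple[float, float]  # (latitude, longitude)


def build_city_index(cities: TextIO) -> dict[str, Coord]:
    """Map each lowercase city name to the coordinates of its most populous entry.

    Hypothetical layout: a header row followed by columns
    city, country, latitude, longitude, population.
    """
    best: dict[str, tuple[int, Coord]] = {}
    for rec in csv.DictReader(cities):
        name = rec["city"].strip().lower()
        try:
            pop = int(rec["population"] or 0)  # blank population counts as 0
        except ValueError:
            pop = 0  # treat a malformed population as 0
        coord = (float(rec["latitude"]), float(rec["longitude"]))
        # Keep this entry only if it beats the most populous one seen so far.
        if name not in best or pop > best[name][0]:
            best[name] = (pop, coord)
    return {name: coord for name, (_, coord) in best.items()}


if __name__ == "__main__":
    sample = io.StringIO(
        "city,country,latitude,longitude,population\n"
        "Paris,France,48.85,2.35,2138551\n"
        "Paris,United States,33.66,-95.56,24782\n"
    )
    print(build_city_index(sample))  # {'paris': (48.85, 2.35)}
```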
The project should have two GUIs: one web-based and one standalone.
The program must be written so that it is easy to call from a Perl or C# GUI other than the one you build, and your documentation must explain how to do this. The code should be well documented, and its usage should be explained as well.
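One straightforward way to make the tool callable from any GUI is a command-line interface: the standalone GUI, the web back end, a Perl script (for example via system) or a C# program (for example via System.Diagnostics.Process.Start) can all run the same executable with arguments and check its exit code. The Python sketch below shows only the argument handling; the option names and defaults are hypothetical, and the task dispatch is left out. A Python script like this could be packaged into an .exe with a tool such as PyInstaller.

```python
import argparse
import sys
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Define the command-line options shared by every front end (hypothetical names)."""
    parser = argparse.ArgumentParser(description="Word-list counts and geocoding for a CSV file.")
    parser.add_argument("input", help="input .csv file")
    parser.add_argument("output", help="output file to load into Weka")
    parser.add_argument("--task", type=int, choices=(1, 2, 3), default=3,
                        help="1 = word-list counts, 2 = long/lat, 3 = combined (geocoded records only)")
    parser.add_argument("--text-col", type=int, default=4, help="1-based column of the Text field")
    parser.add_argument("--location-col", type=int, default=5, help="1-based column of the location field")
    parser.add_argument("--lists-dir", default="wordlists", help="folder holding the word-list .csv files")
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Parse arguments and run the selected task; returns the process exit code."""
    args = build_parser().parse_args(argv)
    # The task functions would be called here (not shown). Progress goes to
    # stderr so a calling GUI can capture it separately from any data output.
    print(f"task {args.task}: {args.input} -> {args.output}", file=sys.stderr)
    return 0
```

A packaged entry point would end with sys.exit(main()), so a caller such as a Perl or C# GUI sees a nonzero exit code on failure (argparse itself exits with code 2 on bad arguments).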
It should also be easy to adjust the code if the column number of the location (city/country) field or of the Text field changes, so store these column numbers in variables that can be reset.
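A small Python sketch of keeping the column positions in one place, using the same 1-based numbering as this description. The class and field names are hypothetical; the values could also be read from a settings file or passed on the command line.

```python
from dataclasses import dataclass


@dataclass
class ColumnConfig:
    """1-based column numbers, matching how this spec numbers columns.

    Change these (or load them from a settings file) if the input layout changes.
    """
    text_col: int = 4      # "Text" column
    location_col: int = 5  # "location" column (city and/or country)

    def text(self, row: list[str]) -> str:
        """Return the Text field of a CSV row ('' if the row is too short)."""
        return self._get(row, self.text_col)

    def location(self, row: list[str]) -> str:
        """Return the location field of a CSV row ('' if the row is too short)."""
        return self._get(row, self.location_col)

    @staticmethod
    def _get(row: list[str], col: int) -> str:
        idx = col - 1  # convert the 1-based column number to a 0-based index
        return row[idx] if 0 <= idx < len(row) else ""
```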
Note: the word lists may change in the future, so the program must be able to handle new or changed lists.
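To let the lists change without code changes, the program can discover them at run time instead of hard-coding them. This Python sketch assumes, hypothetically, that every .csv file in a folder holds several lists side by side, one per column, with the list name in the header row; the real layout of the five files may differ.

```python
import csv
import tempfile
from pathlib import Path


def load_word_lists(folder: str) -> dict[str, set[str]]:
    """Load every word list found in the .csv files in `folder`.

    Assumed layout (hypothetical): each column of each file is one list, the
    header row holds the list name, and the cells below hold its words.
    Adding, removing, or editing files changes the lists without code changes.
    """
    lists: dict[str, set[str]] = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            names = [h.strip() for h in next(reader, [])]
            for name in names:
                lists.setdefault(name, set())
            for row in reader:
                for name, cell in zip(names, row):
                    word = cell.strip().lower()
                    if word:  # skip empty cells in shorter columns
                        lists[name].add(word)
    return lists


if __name__ == "__main__":
    # Demo with a throwaway folder; tempfile is only needed for this demo.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "sentiment.csv").write_text("positive,negative\ngood,bad\nGreat,\n", encoding="utf-8")
        print(load_word_lists(tmp))
```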
I am also attaching a sample input file for task 2, [url removed, login to view], although it contains only a few of the fields.
The numPos and numNeg fields actually come from the [url removed, login to view] and [url removed, login to view] files.