I have a .csv file. Each column represents certain information of a newspaper article (year, publisher, body of text). Each row represents one newspaper article. Based on the text of the "body" column, I want new columns to be created for all rows of newspaper articles: (1) total number of words, (2) number of times any country from a list of countries (excluding the U.S.) is found, (3) most frequent word (excluding stop words), (4) number of times the most frequent words appear, (5) second most frequent word (excluding stop words), (6) number of times the second most frequent words appear, (7) third most frequent word (excluding stop words), (8) number of times the third most frequent word appear, (9) fourth most frequent word (excluding stop words), (10) number of times the fourth most frequent word appear, (11) fifth most frequent word (excluding stop words), (12) number of times the fifth most frequent word appear
I am looking for (1) the .py program that would create the above columns to the given .csv file (can use .xlsx file if preferred) and (2) the output .csv (or .xlsx file) with the above columns added to the original file. Experience with Python (pandas, nltk) is necessary.
A link to a shared Dropbox folder with the .csv (and .xlsx) file will be provided when the project is awarded. The .csv file is 138MB with 42967 lines of data (i.e., 42967 newspaper articles).
This is a simple test run to another project, and satisfactory completion of this project will lead to a larger related project. Preferences will be given to those that can complete this mini project as soon as possible.
6 freelancere byder i gennemsnit $30 på dette job
hi i am very interested in your task i am python expert and have rich experience in data processing , handling excel , text analysis i can help you with good result and work for you as long term thanks