I need this project completed as soon as possible. It requires a programmer with well-developed web scraping skills. If interested, please send me: (i) a bid; (ii) an estimate of how long this will take you; and (iii) a very brief explanation of how you will carry out the task.
These are the instructions in detail:
1. The comma-delimited text file “[login to view URL]” is a list of 12977 names with 4 columns: ROWID, NOMBRES, APELLIDO_PATERNO, and APELLIDO_MATERNO.
2. For each row in [login to view URL], go to [login to view URL], enter the NOMBRES, APELLIDO_PATERNO, and APELLIDO_MATERNO in the search form, and click “buscar”.
3. Click on the person who EXACTLY matches the information entered in the previous step (see [login to view URL] for more information on this).
4. Click on the PROCESOS ELECTORALES tab (the URL ends in “IdTab=1”). Check whether the politician was a mayoral candidate (i.e., either “ALCALDE DISTRITAL” or “ALCALDE PROVINCIAL”) in the election “ELECCIONES REGIONALES Y MUNICIPALES 2014”. This information appears in the sub-table (see [login to view URL]). If yes, go to step 5; if not, move on to the next name.
5. Click on the “HOJA DE VIDA” that corresponds to the election “ELECCIONES REGIONALES Y MUNICIPALES 2014”. This link is embedded in the PROCESOS ELECTORALES sub-table. The “ver hoja de vida” link at the top of the web page is NOT the one we want.
6. Scrape all the data found in the HOJA DE VIDA. The freelancer must make sure that his/her code extracts *all* the information available. The freelancer may choose the best way to report the scraped data. I suggest a rectangular format (or several tables) where each row corresponds to a politician and each column to an item of the HOJA DE VIDA. The key requirement is that I must be able to link each piece of information to a ROWID in [login to view URL] and to the politician ID (IdPolitico) found in the URL of the PROCESOS ELECTORALES tab.
7. Save the PROCESOS ELECTORALES tab (URL ends in “IdTab=1”) as HTML with the name “IdTab1_IdPolitico#.html”, where # is the politician’s ID number. Do the same for the HISTORIAL PARTIDARIO tab (URL ends in “IdTab=0”), saving that web page as HTML with the name “IdTab0_IdPolitico#.html”.
8. Record all your steps in “[login to view URL]”. The idea is to save all the URLs from which information was downloaded and the corresponding file names. See the attached example for details.
9. I am attaching an example ([login to view URL]), the name list, and further clarifications. Please take a detailed look at each of these, and use the example logfile I provide as a template for yours.
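A minimal Python sketch of the site-independent logic in steps 1, 3, 4 and 7, using only the standard library. It assumes that search results list names in the order NOMBRES, APELLIDO_PATERNO, APELLIDO_MATERNO, that ignoring case and extra spaces still counts as an "exact" match, and that the PROCESOS ELECTORALES sub-table has already been read into (election, position) pairs. The helper names are hypothetical, and the actual site URL is not given in this posting, so no network code is shown.

```python
import csv
import io
from urllib.parse import parse_qs, urlparse

# Values taken from step 4 of the instructions.
TARGET_ELECTION = "ELECCIONES REGIONALES Y MUNICIPALES 2014"
MAYOR_POSTS = {"ALCALDE DISTRITAL", "ALCALDE PROVINCIAL"}


def load_names(text):
    """Parse the name list (ROWID, NOMBRES, APELLIDO_PATERNO, APELLIDO_MATERNO) into dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def normalize(value):
    """Uppercase and collapse runs of whitespace; accents are deliberately kept."""
    return " ".join(value.upper().split())


def is_exact_match(row, result_name):
    """Compare a search result's full name with NOMBRES + APELLIDO_PATERNO + APELLIDO_MATERNO.

    Assumption: the site shows names in that order, and case/extra spaces do not matter.
    """
    wanted = " ".join([row["NOMBRES"], row["APELLIDO_PATERNO"], row["APELLIDO_MATERNO"]])
    return normalize(wanted) == normalize(result_name)


def was_2014_mayoral_candidate(processes):
    """processes: (election, position) pairs read from the PROCESOS ELECTORALES sub-table."""
    return any(
        normalize(election) == TARGET_ELECTION and normalize(position) in MAYOR_POSTS
        for election, position in processes
    )


def politician_id(url):
    """Read the IdPolitico query parameter from a PROCESOS ELECTORALES URL."""
    return parse_qs(urlparse(url).query)["IdPolitico"][0]


def tab_filename(id_tab, id_politico):
    """File name required by step 7, e.g. IdTab1_IdPolitico123.html."""
    return f"IdTab{id_tab}_IdPolitico{id_politico}.html"
```

The name list's encoding is not stated; reading it with something like `open(path, encoding="utf-8-sig", newline="").read()` before calling `load_names` is an assumption to check against the actual file. Accents are left untouched by `normalize` so that "EXACTLY" is not loosened more than necessary.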
The deliverables for this project are:
a) All downloaded files.
b) Dataset(s) with the scraped information of the HOJAS DE VIDA (XLSX).
c) A complete logfile (XLSX).
d) The code you used to download the information.
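For the logfile in step 8 and deliverable c), one possible approach is to record one entry per saved page so that every file can be traced back to a ROWID and an IdPolitico. The sketch below assumes hypothetical column names (the attached template should take precedence) and a made-up example.org URL. It writes CSV with the standard library only; producing the requested XLSX would be a separate step, for example with a library such as openpyxl or by opening the CSV in Excel.

```python
import csv
import io

# Hypothetical column names; follow the client's attached logfile template instead.
LOG_FIELDS = ["ROWID", "IdPolitico", "URL", "FILENAME"]


class DownloadLog:
    """Collects one entry per saved page so each file traces back to a ROWID and an IdPolitico."""

    def __init__(self):
        self.entries = []

    def record(self, rowid, id_politico, url, filename):
        self.entries.append(
            {"ROWID": rowid, "IdPolitico": id_politico, "URL": url, "FILENAME": filename}
        )

    def write_csv(self, stream):
        """Write the header and all entries as CSV to a text stream."""
        writer = csv.DictWriter(stream, fieldnames=LOG_FIELDS)
        writer.writeheader()
        writer.writerows(self.entries)


# Example: log the PROCESOS ELECTORALES page of hypothetical politician 123 found for ROWID 1.
log = DownloadLog()
log.record("1", "123", "https://example.org/Politico?IdPolitico=123&IdTab=1", "IdTab1_IdPolitico123.html")
buffer = io.StringIO()
log.write_csv(buffer)
print(buffer.getvalue())
```

When writing to a real file, open it with `newline=""` as the `csv` module documentation recommends, so that line endings are not doubled on Windows.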
Thanks,
Hi there, I have read the project description. I will write a scraper script to do the job and will provide both the data and the script. Let me know and we can discuss the details. Thanks.