
Scrape information from web pages -- 2

$30-250 USD

Cancelled
Posted over 7 years ago

Paid on delivery
I need this project completed as soon as possible. It requires a programmer with well-developed web scraping skills. If interested, please send me: (i) a bid; (ii) an estimate of how long this will take you; and (iii) a very brief explanation of how you will execute this task.

These are the instructions in detail:

1. The comma-delimited text file “[login to view URL]” is a list of 12977 names with 4 columns: ROWID, NOMBRES, APELLIDO_PATERNO, and APELLIDO_MATERNO.
2. For each row in [login to view URL], go to [login to view URL] and enter the NOMBRES, APELLIDO_PATERNO, and APELLIDO_MATERNO in the search engine. Then click on “buscar”.
3. Click on the person that EXACTLY matches the information entered in the step above (see [login to view URL] for more information on this).
4. Click on the PROCESOS ELECTORALES tab (URL finishes in “IdTab=1”). Check whether the politician was a mayoral candidate (i.e., either “ALCALDE DISTRITAL” or “ALCALDE PROVINCIAL”) in the election “ELECCIONES REGIONALES Y MUNICIPALES 2014”. You will see these in the sub-table (see [login to view URL]). If yes, go to step 5. If not, move on to the next name.
5. Click on the “HOJA DE VIDA” that corresponds to the 2014 election “ELECCIONES REGIONALES Y MUNICIPALES 2014”. This link is embedded in the PROCESOS ELECTORALES sub-table. The link in the uppermost part of the web page saying “ver hoja de vida” is NOT the one we want.
6. Scrape all the data found in the HOJA DE VIDA. The freelancer will need to make sure that his/her code extracts *all* the information available. The freelancer will also figure out the best way to report the scraped data. I suggest a rectangular format (or several tables) where each row corresponds to a politician and each column to an item of the HOJA DE VIDA. The key is that I will need to be able to link each piece of information to a ROWID in [login to view URL] and to the politician id that can be found in the URL of PROCESOS ELECTORALES (IdPolitico).
7. Save the PROCESOS ELECTORALES tab (URL finishes in “IdTab=1”) as HTML with the name “IdTab1_IdPolitico#.html”, where # is the politician’s id number. Do the same for the HISTORIAL PARTIDARIO tab (URL finishes in “IdTab=0”). Save that web page as HTML with the name “IdTab0_IdPolitico#.html”.
8. Record all your steps in “[login to view URL]”. The idea is to save all the URLs from which information was downloaded and the corresponding file names. See the attached example for details.
9. I am attaching an example ([login to view URL]), the name list, and further clarifications. Please take a detailed look at each of these. Also, use the example logfile I provide as a template for yours.

The deliverables for this project are:

a) All downloaded files.
b) Dataset(s) with the scraped information of the HOJAS DE VIDA (XLSX).
c) A complete logfile (XLSX).
d) The code you used to download the information.

Thanks,
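To make steps 1, 3, 4, 7, and 8 concrete, here is a minimal Python sketch of the bookkeeping only. It does no downloading and is not a prescribed implementation. It assumes that IdPolitico and IdTab appear as query-string parameters of the tab URL (the brief only says the URL finishes in “IdTab=1” or “IdTab=0” and contains IdPolitico), that “exact match” means equality after ignoring case and extra spaces, and that the sub-table can be read as (election, office) pairs. All function names, the log record layout, and the `example.invalid` host are hypothetical.

```python
import csv
import io
import re
from urllib.parse import urlparse, parse_qs

MAYORAL_OFFICES = {"ALCALDE DISTRITAL", "ALCALDE PROVINCIAL"}
TARGET_ELECTION = "ELECCIONES REGIONALES Y MUNICIPALES 2014"
NAME_KEYS = ("NOMBRES", "APELLIDO_PATERNO", "APELLIDO_MATERNO")


def read_names(csv_text):
    """Step 1: parse the comma-delimited name list into dicts keyed by column."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def normalize(text):
    """Uppercase and collapse whitespace; accents are kept as they are (assumption)."""
    return re.sub(r"\s+", " ", text).strip().upper()


def is_exact_match(row, candidate):
    """Step 3: all three name parts must agree after normalization."""
    return all(normalize(row[k]) == normalize(candidate[k]) for k in NAME_KEYS)


def was_mayoral_candidate_2014(processes):
    """Step 4: `processes` is a list of (election, office) pairs from the sub-table."""
    return any(
        normalize(election) == TARGET_ELECTION
        and normalize(office) in MAYORAL_OFFICES
        for election, office in processes
    )


def tab_filename(url):
    """Step 7: build the file name, e.g. IdTab1_IdPolitico123.html."""
    query = parse_qs(urlparse(url).query)
    return f"IdTab{query['IdTab'][0]}_IdPolitico{query['IdPolitico'][0]}.html"


def log_entry(rowid, url):
    """Step 8: one log record linking ROWID, IdPolitico, source URL, and file name."""
    query = parse_qs(urlparse(url).query)
    return {
        "ROWID": rowid,
        "IdPolitico": query["IdPolitico"][0],
        "url": url,
        "file": tab_filename(url),
    }


if __name__ == "__main__":
    # Hypothetical URL shape; the real site address is not shown in the brief.
    url = "https://example.invalid/politico?IdPolitico=123&IdTab=1"
    print(log_entry("1", url))
```

Keeping ROWID and IdPolitico on every log record is what lets each scraped item be linked back to the name list, as step 6 requires. Whether accents should also be ignored when matching names is left open here; the brief asks for an exact match, so the sketch does not strip them.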
Project ID: 12073602

About the project

5 proposals
Remote project
Active 7 years ago

5 freelancers are bidding an average of $74 USD on this job
Hi there, I have read the project description. I will write a scraper script/software to do the job and will provide both the data and the script. Let me know and we can discuss the details. Thanks.
$100 USD in 1 day
5.0 (118 reviews)
6.2
Text me if you are OK with my bid
$111 USD in 2 days
0.0 (0 reviews)
0.0

About the client

Durham, United States
5.0
3
Payment method verified
Member since Aug 6, 2016

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)