In progress

Web Site Scraping

We want to build a service that scrapes websites in order to maintain an external database and to extract data from dynamic web pages. Each targeted website has to be entered through a login page.

The service will be initiated by an external scheduler. The scheduler passes an XML document that contains all the information the service needs. The service shall execute the following steps:

a) receive XML

b) pass the login page

c) maintain the external database

d) extract data

e) send XML

Once the service is finished, it shall report its success (XML).
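
For orientation, steps a) to e) could be sketched roughly as follows in Java. This is only an illustration under assumptions: the real XML schema is supplied separately, so the element and attribute names used here (`job`, `user`, `password`, `result`, `data`, `status`, `message`) and the `SiteAdapter` interface are hypothetical placeholders, not part of the specification.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ScrapeJobSketch {

    /** Hypothetical per-site adapter; one implementation per targeted website. */
    interface SiteAdapter {
        void login(String user, String password) throws Exception; // step b)
        void maintain(Element job) throws Exception;                // step c)
        String extract(Element job) throws Exception;               // step d)
    }

    /** Runs steps a) to e) and always returns a result XML, also on failure. */
    static String run(String jobXml, SiteAdapter site) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Reject DOCTYPE declarations so untrusted XML cannot pull in external entities.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();

            Document out = builder.newDocument();
            Element result = out.createElement("result"); // assumed element name
            out.appendChild(result);
            try {
                Element job = builder.parse(new InputSource(new StringReader(jobXml)))
                        .getDocumentElement();                        // a) receive XML
                site.login(text(job, "user"), text(job, "password")); // b) pass the login
                site.maintain(job);                                   // c) maintain data
                Element data = out.createElement("data");
                data.setTextContent(site.extract(job));               // d) extract data
                result.appendChild(data);
                result.setAttribute("status", "success");
            } catch (Exception e) {
                // Any failed step is reported back instead of being swallowed.
                result.setAttribute("status", "failure");
                result.setAttribute("message", String.valueOf(e.getMessage()));
            }
            return serialize(out);                                    // e) send XML
        } catch (Exception e) {
            throw new IllegalStateException("XML infrastructure failure", e);
        }
    }

    /** Text of the first descendant element with the given tag, or "" if absent. */
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent();
    }

    private static String serialize(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) {
        // Stub adapter standing in for a real website; it performs no network access.
        SiteAdapter stub = new SiteAdapter() {
            public void login(String user, String password) { }
            public void maintain(Element job) { }
            public String extract(Element job) { return "example value"; }
        };
        System.out.println(run("<job><user>u</user><password>p</password></job>", stub));
    }
}
```

Keeping the site-specific logic behind one small interface would let each of the targeted websites get its own adapter while the XML handling stays shared. Since `run` holds no shared state, several instances could run side by side, provided each adapter keeps its own session.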

Technical details: Communication takes place only via the XML interface. The XML schema is given. We expect cURL or Java. It must be possible to run multiple instances on the same machine.

As a contractor you can use a testing system for the XML interface. For the third-party websites you will receive the login data for a user account and screenshot documentation of the manual maintenance procedure for every targeted website. Please note that we cannot provide a testing system for third-party websites: every change is made on live data and has to be restored to the original data.
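
Because test changes on third-party websites affect live data, one possible way to make sure the original values are written back is a snapshot-and-restore wrapper. The following is a minimal sketch, assuming a record can be read and written as a map of field values; `RemoteRecord`, `InMemoryRecord` and `withRestore` are hypothetical names used for illustration only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

final class RestoreSketch {

    /** Hypothetical view of one editable record on a third-party website. */
    interface RemoteRecord {
        Map<String, String> read();
        void write(Map<String, String> fields);
    }

    /** In-memory stand-in for a real website, used only to try out the pattern. */
    static final class InMemoryRecord implements RemoteRecord {
        private Map<String, String> fields;

        InMemoryRecord(Map<String, String> initial) {
            this.fields = new HashMap<>(initial);
        }

        public Map<String, String> read() {
            return new HashMap<>(fields);
        }

        public void write(Map<String, String> newFields) {
            this.fields = new HashMap<>(newFields);
        }
    }

    /** Takes a snapshot, runs the test change, and writes the snapshot back in any case. */
    static void withRestore(RemoteRecord record, Consumer<RemoteRecord> testChange) {
        Map<String, String> original = new HashMap<>(record.read()); // snapshot before touching anything
        try {
            testChange.accept(record);
        } finally {
            // Runs even if the test change throws, so live data is not left modified.
            record.write(original);
        }
    }

    public static void main(String[] args) {
        Map<String, String> initial = new HashMap<>();
        initial.put("price", "10");
        InMemoryRecord record = new InMemoryRecord(initial);

        withRestore(record, r -> {
            Map<String, String> changed = r.read();
            changed.put("price", "99"); // the temporary test change
            r.write(changed);
        });

        System.out.println(record.read()); // the original values are back in place
    }
}
```

Against a real website it would be sensible to re-read the record after the restore and compare it with the snapshot, since a write through a web form can fail silently.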

We want to scrape 250 websites successively over the coming months. This is an enquiry for the first package of 25 websites. After that we need another 10 a month, eventually up to 25 a month.

At the moment we are asking for external development only and will do the ongoing maintenance ourselves. At a later stage we will outsource this work as well.

Skills: Data Entry, Data Processing, Java, PHP, XML

About the employer:
(9 reviews) Berlin, Germany

Project ID: #64189