We are building a web-shopping engine and need a robot that crawls online web catalogs and Excel files containing the sellers' inventories, then processes the data so it can be added to a standardised, searchable database.
Every inventory has a different shape and format, so we want a clever robot that can process the information without our having to write a separate script for each Excel file and HTML catalog.
• What do these HTML pages and files represent?
They are wine shops' price lists and inventories.
• Where are they located?
90% of them are hosted on servers that we can access simply by URL.
10% are stored on the wine shops' own computers; these will be sent to us by email.
• What is their format?
Every shop builds its inventory with different software (HTML, Excel, ...) and lays it out in its own style.
In addition, they use different currencies and abbreviations (France – FR), (Bordeaux – BDX), (Bottle – 75cl), and some of them make spelling mistakes. Some prices include taxes, some don't.
• What is our format?
In order to have a searchable database, we need to standardize all this data into our own format.
For example, we will change: Bordeaux, BDX, and Bordau → BRDX
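The mapping above could be sketched roughly as follows. This is only an illustration, not a required design: the `ALIASES` table, the `normalise_term` name and the 0.75 similarity cutoff are assumptions, and the fuzzy fallback uses Python's standard `difflib` to absorb spelling mistakes such as "Bordau".

```python
import difflib

# Hypothetical alias table: every known spelling maps to our canonical code.
ALIASES = {
    "bordeaux": "BRDX",
    "bdx": "BRDX",
    "france": "FR",
    "fr": "FR",
}

def normalise_term(raw, cutoff=0.75):
    """Return our canonical code for a raw term, or None if nothing is close enough."""
    key = raw.strip().lower()
    if key in ALIASES:
        return ALIASES[key]
    # Fall back to fuzzy matching so that misspellings still resolve.
    close = difflib.get_close_matches(key, list(ALIASES), n=1, cutoff=cutoff)
    return ALIASES[close[0]] if close else None
```

Terms that return `None` would probably need to be queued for manual review rather than guessed at.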
• What do we need at the end?
A clean MySQL database with all their data in our format.
We need a robot engine that is able to:
Crawl the seller's web page inventory
Download the seller's online Excel inventory from a URL
Receive Excel files by email and process them automatically
Convert the prices to US dollars
Remove sales tax if the price includes tax
Convert the data to our format
Add it to the database
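As an illustration of the price steps in the list above (removing tax, then converting to US dollars), here is a minimal sketch. The exchange rates in `USD_RATES` are made-up placeholders, and the function name and parameters are assumptions; real rates and per-shop tax rates would have to come from elsewhere.

```python
from decimal import Decimal, ROUND_HALF_UP

# Placeholder rates (USD per one unit of the currency); NOT real market data.
USD_RATES = {"USD": Decimal("1"), "EUR": Decimal("1.10"), "GBP": Decimal("1.25")}

def price_ex_tax_usd(amount, currency, tax_rate=Decimal("0"), includes_tax=False):
    """Return the tax-exclusive price in US dollars, rounded to cents."""
    price = Decimal(amount)
    if includes_tax:
        # A tax-inclusive price equals net * (1 + rate), so divide to recover net.
        price = price / (Decimal("1") + tax_rate)
    usd = price * USD_RATES[currency]
    return usd.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

`Decimal` is used instead of `float` so that rounding to cents stays predictable.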
The table we want to populate is as follows:
• Shop identification code
• Date and time we have collected this information
• Product name
• Packaging (1x75cl, 12X75cl, [url removed, login to view])
• Price excluding taxes in dollars
• IB, DP
• URL of the product's page, if it exists
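For the Packaging field, strings such as 1x75cl or 12X75cl could be split into a bottle count and a bottle size. The sketch below is one possible approach; the function name, the accepted units (cl, ml, l) and the choice of centilitres as the output unit are assumptions.

```python
import re

# count, "x", size, unit: e.g. "12X75cl" or "6 x 1.5L" (case-insensitive).
_PACK_RE = re.compile(r"^\s*(\d+)\s*x\s*(\d+(?:\.\d+)?)\s*(cl|ml|l)\s*$", re.IGNORECASE)
_CL_PER_UNIT = {"cl": 1.0, "ml": 0.1, "l": 100.0}

def parse_packaging(text):
    """Return (bottle_count, bottle_size_in_cl), or None if the text is not recognised."""
    m = _PACK_RE.match(text)
    if m is None:
        return None
    count, size, unit = m.groups()
    return int(count), float(size) * _CL_PER_UNIT[unit.lower()]
```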
As the database could end up with a few million offers, it should be optimized for very fast searching.
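A rough sketch of such a table with a search index is shown below. To keep the example self-contained it uses Python's built-in `sqlite3` as a stand-in for MySQL; the column names, the `ib_dp` column for the IB/DP field, and the choice of index are all assumptions, and the MySQL DDL would differ in its types.

```python
import sqlite3

# Illustrative schema mirroring the fields listed above (names are hypothetical).
SCHEMA = """
CREATE TABLE offers (
    shop_code        TEXT NOT NULL,
    collected_at     TEXT NOT NULL,
    product_name     TEXT NOT NULL,
    packaging        TEXT NOT NULL,
    price_usd_ex_tax REAL NOT NULL,
    ib_dp            TEXT,
    product_url      TEXT
);
-- Composite index so lookups by product name, sorted by price, avoid a full scan.
CREATE INDEX idx_offers_name_price ON offers (product_name, price_usd_ex_tax);
"""

def build_demo_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def cheapest_offers(conn, product_name, limit=10):
    """Return (shop_code, packaging, price) rows for a product, cheapest first."""
    cur = conn.execute(
        "SELECT shop_code, packaging, price_usd_ex_tax FROM offers "
        "WHERE product_name = ? ORDER BY price_usd_ex_tax LIMIT ?",
        (product_name, limit),
    )
    return cur.fetchall()
```

Which columns to index would depend on the searches the shopping engine actually runs; product name plus price is only a guess at the most common query.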
25 freelancers are bidding an average of $2768 on this job
Hello. Please check PMB. I'm an experienced programmer with the skills and knowledge required. I can write this crawler following your guidelines within the specified time frame. I'm available online 8 hours a day.
Hello, we have read the posting and would like to develop this as per your [login to view URL] check your private messages for more information about this project. I hope to assist you with it.