We need to scrape the links in the enclosed file (it contains only 30,000 links; the full set is about 135,000).
Scraping a link from the enclosed file returns a list of items from the auction portal Allegro.pl.
We are interested in car parts. The items returned by the links in the enclosed file are only auto parts for the specific cars we are interested in.
The list of items returned by a single link can fit on 1 page or span several pages, up to 100 HTML pages.
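As a rough illustration of walking the pages behind one link, here is a minimal Python sketch. It assumes the page number is passed as a query parameter; the name `p` and the helper `page_urls` are hypothetical and are not confirmed by this brief, so check them against the real links before relying on them.

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

MAX_PAGES = 100  # upper bound on pages per link, as stated in the brief


def page_urls(listing_url, page_count, page_param="p"):
    """Return one URL per result page for a single listing link.

    `page_param` is an ASSUMED query-parameter name (hypothetical).
    """
    # Clamp the requested page count to the 1..100 range from the brief.
    page_count = max(1, min(page_count, MAX_PAGES))
    parts = urlparse(listing_url)
    # Keep the existing query parameters, dropping any old page number.
    base_query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != page_param
    ]
    urls = []
    for page in range(1, page_count + 1):
        query = urlencode(base_query + [(page_param, str(page))])
        urls.append(urlunparse(parts._replace(query=query)))
    return urls
```

The number of pages for a given link would have to be read from the first downloaded page; how it is exposed there is not described in this brief.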
The HTML code of each page contains a JSON structure that holds the list of items, with data for every item in the list.
For every item we need the following data: ID, NAME, PRICES, and LINKS TO ALL OF THE ITEM'S IMAGES.
For each item you must also write to the result data the MODEL_ID and KATEGORIE taken from the enclosed file that contains the links to scrape!
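The extraction described above could look roughly like the following Python sketch, using only the standard library. Everything about the JSON layout is an assumption made for illustration: that the data sits in a `<script type="application/json">` element, and that it has an `items` list whose entries carry `id`, `name`, `prices` and `images` (each with a `url`). The real structure has to be checked against an actual page.

```python
import json
from html.parser import HTMLParser


class JsonScriptCollector(HTMLParser):
    """Collects the text of every <script type="application/json"> element."""

    def __init__(self):
        super().__init__()
        self._in_json = False
        self._buf = []
        self.blobs = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/json":
            self._in_json = True
            self._buf = []

    def handle_data(self, data):
        if self._in_json:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json:
            self.blobs.append("".join(self._buf))
            self._in_json = False


def extract_items(html, model_id, kategorie):
    """Build one result record per item found in the embedded JSON.

    The keys "items", "id", "name", "prices", "images" and "url" are
    ASSUMED names (hypothetical), not taken from the real page.
    """
    parser = JsonScriptCollector()
    parser.feed(html)
    parser.close()
    records = []
    for blob in parser.blobs:
        try:
            data = json.loads(blob)
        except json.JSONDecodeError:
            continue  # skip script blocks that are not valid JSON
        if not isinstance(data, dict):
            continue
        for item in data.get("items", []):
            records.append({
                "ID": item.get("id"),
                "NAME": item.get("name"),
                "PRICES": item.get("prices"),
                "IMAGES": [img.get("url") for img in item.get("images", [])],
                # Copied from the row of the enclosed file for this link.
                "MODEL_ID": model_id,
                "KATEGORIE": kategorie,
            })
    return records
```

MODEL_ID and KATEGORIE are passed in by the caller because they come from the enclosed file, not from the downloaded page.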
What is our problem?
We have scraped this data for years, but the auction portal has added restrictions on the number of pages that can be downloaded within a short period of time.
The restrictions are as follows: after about 300 downloaded pages (roughly 9 minutes), further navigation through links and pages is interrupted and you must fill in a captcha.
Because the number of pages that need to be downloaded is in the hundreds of thousands, we cannot obtain the data by simply downloading every page we are interested in.
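To put the scale in numbers, here is a small back-of-the-envelope calculation in Python. It assumes, purely for illustration, that the observed pace of about 300 pages per 9 minutes could be sustained without interruption, which the captcha prevents in practice. The function name `hours_at_observed_rate` and the example total of 300,000 pages are hypothetical.

```python
def hours_at_observed_rate(total_pages, pages_per_window=300, window_minutes=9):
    """Hours needed to download `total_pages` at the observed pace.

    ASSUMPTION: the pace of 300 pages per 9 minutes is sustained;
    in reality the captcha interrupts after each window.
    """
    minutes = total_pages * window_minutes / pages_per_window
    return minutes / 60


# Lower bound: every one of the ~135,000 links has a single page.
print(hours_at_observed_rate(135_000))

# Hypothetical total of 300,000 pages ("hundreds of thousands").
print(hours_at_observed_rate(300_000))
```

Even the lower bound of one page per link comes to several days of uninterrupted downloading, before any captcha stops are counted.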