Distribute Outbound Requests to a Farm of Proxy Servers

I have a web application that, for certain interactions, must query Google and other sites that do not offer APIs. An HTTP request is made and the page is scraped. The issue is that I am quickly blocked, either outright or by being forced to fill in a captcha. No good.

What I would like is a platform that I can configure with many different private proxy accounts. I submit each request through this system, which distributes incoming requests across the proxy servers at random. When a block is detected, the offending proxy must be removed from the farm.
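To make the idea concrete, here is a minimal Python sketch of the behavior described above: pick a proxy at random, send the request through it, and drop the proxy from the pool if the response looks like a block. Everything named here (`ProxyPool`, `fetch`, `looks_blocked`, the `send` callable, and the block heuristics of HTTP 403/429 or the word "captcha" in the body) is a hypothetical illustration, not a specification of the final system.

```python
import random
from typing import Callable, List, Tuple

# Assumed heuristics for "this proxy got blocked"; real sites may need others.
BLOCK_STATUSES = {403, 429}
BLOCK_MARKER = "captcha"

Response = Tuple[int, str]               # (HTTP status code, body text)
Sender = Callable[[str, str], Response]  # (url, proxy address) -> Response


class ProxyPool:
    """Holds the healthy proxies and remembers the ones that were blocked."""

    def __init__(self, proxies: List[str]) -> None:
        self.active = list(proxies)
        self.blocked: List[str] = []

    def pick(self) -> str:
        """Return a random healthy proxy, or fail if none remain."""
        if not self.active:
            raise RuntimeError("no healthy proxies left in the pool")
        return random.choice(self.active)

    def remove(self, proxy: str) -> None:
        """Take a blocked proxy out of rotation."""
        if proxy in self.active:
            self.active.remove(proxy)
            self.blocked.append(proxy)


def looks_blocked(status: int, body: str) -> bool:
    """Guess whether a response is a block page or a captcha challenge."""
    return status in BLOCK_STATUSES or BLOCK_MARKER in body.lower()


def fetch(url: str, pool: ProxyPool, send: Sender, max_attempts: int = 3) -> Response:
    """Try the request through random proxies, removing any that get blocked."""
    for _ in range(max_attempts):
        proxy = pool.pick()
        status, body = send(url, proxy)
        if not looks_blocked(status, body):
            return status, body
        pool.remove(proxy)
    raise RuntimeError("all attempts were blocked")
```

The `send` callable is left abstract so the pool logic can be exercised without network access. In practice it could wrap an HTTP client, for example a function that calls `requests.get(url, proxies={"http": proxy, "https": proxy})` and returns the status code and text. A production version would also need shared state across instances, proxy credentials, and a way to return proxies to rotation after a cool-down period.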

The details will be more complicated, I'm sure, but that's the general idea: route HTTP requests through a pool of proxy servers in order to avoid getting blocked by sites that prohibit scraping. The system should be hosted on AWS or Google App Engine; Google App Engine would be ideal.

I will only entertain responses that prove you understand the problem and are willing to work through a decent solution. Canned responses will be ignored immediately.

Skills: Engineering, Network Administration, Python, Software Architecture, Web Security


About the employer:
(2 reviews) Evansville, United States

Project ID: #5992422