Distribute Outbound Requests to Farm of Proxy Servers

I have a web application that, for certain interactions, must query Google and other sites that do not offer APIs. An HTTP request is made and the page is scraped. The issue is that I am quickly blocked, either outright or by being forced to fill in a captcha. No good.

What I would like is a platform that I can configure with many different private proxy accounts. I submit each request through this system, and it distributes the incoming requests randomly across the proxy servers. When a block is detected, the affected proxy must be removed from the farm.
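
To make the idea concrete, here is a minimal Python sketch of the behavior described above: pick a proxy at random, send the request through it, and drop the proxy from the pool when the response looks like a block. Everything named here (`ProxyPool`, `looks_blocked`, `urllib_fetch`, the status codes treated as a block, the "captcha" substring check) is an assumption for illustration, not a finished design; a real system would need better block detection, retries limits, and probably a way to bring proxies back.

```python
import random
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical fetch signature: fetch(url, proxy_url) -> (status_code, body)
Fetch = Callable[[str, str], Tuple[int, str]]


def looks_blocked(status: int, body: str) -> bool:
    """Assumed heuristic: 403/429/503 or a page mentioning a captcha."""
    return status in (403, 429, 503) or "captcha" in body.lower()


def urllib_fetch(url: str, proxy: str) -> Tuple[int, str]:
    """Fetch a URL through one proxy using only the standard library."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(url, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # HTTP error responses still carry a status code and a body.
        return err.code, err.read().decode("utf-8", "replace")


class ProxyPool:
    """Randomly distributes requests and removes proxies that get blocked."""

    def __init__(self, proxies: Iterable[str], fetch: Fetch = urllib_fetch,
                 rng: Optional[random.Random] = None) -> None:
        self.active: List[str] = list(proxies)
        self.removed: List[str] = []
        self.fetch = fetch
        self.rng = rng or random.Random()

    def request(self, url: str) -> Tuple[int, str]:
        # Keep trying random proxies until one succeeds or none are left.
        while self.active:
            proxy = self.rng.choice(self.active)
            try:
                status, body = self.fetch(url, proxy)
                blocked = looks_blocked(status, body)
            except OSError:
                # Connection failures (URLError is an OSError) count as a block.
                blocked = True
            if not blocked:
                return status, body
            self.active.remove(proxy)
            self.removed.append(proxy)
        raise RuntimeError("no usable proxies left in the pool")
```

Usage would look something like `ProxyPool(["http://user:pass@proxy1.example:8080", ...]).request("https://example.com/")`, where the proxy URLs are placeholders. Passing the fetch function in as a parameter keeps the pool logic independent of the HTTP client, so it can be swapped for whatever the hosting platform supports.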

The details of the system will be more complicated, I'm sure, but that's the general idea: route HTTP requests through a pool of proxy servers in order to avoid getting blocked by sites that prohibit scraping. The system should be hosted on AWS or Google App Engine. Google App Engine would be ideal.

I will only entertain responses that prove you understand the problem and are willing to work through a decent solution. Canned responses will be ignored immediately.

Skills: Engineering, Network Administration, Python, Software Architecture, Web Security

About the employer:
(2 reviews) Evansville, United States

Project ID: #5992422