I have a web application that, for certain interactions, must query Google and other sites that do not offer APIs. An HTTP request is made and the page is scraped. The issue is that I am quickly blocked, either outright or by being forced to fill in a CAPTCHA. No good.
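Both kinds of block have to be recognized in code before anything can react to them. Below is a minimal heuristic sketch. The status codes, the text markers, and the "/sorry/" redirect path are all assumptions to be tuned against what the target sites actually return; they are not a documented contract from Google or anyone else.

```python
from typing import Optional

# Assumed block signals; adjust per target site after observing real responses.
BLOCK_STATUS_CODES = {403, 429, 503}
BLOCK_MARKERS = ("captcha", "unusual traffic")


def looks_blocked(status: int, body: str, final_url: Optional[str] = None) -> bool:
    """Heuristically decide whether a scraped response is a block or CAPTCHA page."""
    # An outright refusal or rate limit usually shows up in the status code.
    if status in BLOCK_STATUS_CODES:
        return True
    # A CAPTCHA interstitial may come back as 200 with challenge text in the body.
    lowered = body.lower()
    if any(marker in lowered for marker in BLOCK_MARKERS):
        return True
    # Assumption: Google redirects suspected bots to a page under /sorry/.
    if final_url is not None and "/sorry/" in final_url:
        return True
    return False
```

Keeping detection as a pure function of (status, body, URL) makes it easy to unit-test against saved block pages.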
What I would like is a platform that I can configure with many different private proxy accounts. I submit my requests through this system, and it distributes them randomly across the proxy servers. When a block is detected, the proxy that triggered it must be removed from the farm.
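The core behavior described here (pick a proxy at random, evict it on a block) can be sketched as a small in-memory pool. The class and function names are hypothetical, and the actual HTTP call and block check are passed in as callables so the sketch makes no assumptions about the HTTP library.

```python
import random
import threading
from typing import Callable, Iterable, Optional, Tuple


class NoProxiesLeft(Exception):
    """Raised when every proxy in the farm has been removed."""


class ProxyPool:
    """Thread-safe pool that hands out proxies at random and evicts blocked ones."""

    def __init__(self, proxies: Iterable[str], rng: Optional[random.Random] = None):
        self._proxies = list(dict.fromkeys(proxies))  # de-duplicate, keep order
        self._rng = rng or random.Random()
        self._lock = threading.Lock()

    def pick(self) -> str:
        with self._lock:
            if not self._proxies:
                raise NoProxiesLeft("proxy farm is empty")
            return self._rng.choice(self._proxies)

    def remove(self, proxy: str) -> None:
        with self._lock:
            if proxy in self._proxies:
                self._proxies.remove(proxy)

    def __len__(self) -> int:
        with self._lock:
            return len(self._proxies)


def fetch_via_pool(
    pool: ProxyPool,
    url: str,
    fetch: Callable[[str, str], Tuple[int, str]],  # (proxy, url) -> (status, body)
    is_blocked: Callable[[int, str], bool],
    max_attempts: int = 3,
) -> str:
    """Try random proxies until one returns an unblocked page."""
    for _ in range(max_attempts):
        proxy = pool.pick()
        status, body = fetch(proxy, url)
        if is_blocked(status, body):
            pool.remove(proxy)  # evict the proxy that got blocked, then retry
            continue
        return body
    raise RuntimeError(f"all {max_attempts} attempts were blocked for {url}")
```

One design caveat: this pool lives in a single process's memory. On a platform that runs several instances (App Engine or an AWS autoscaling group), each instance would hold its own copy, so evictions would likely need to be recorded in a shared store for every instance to see them.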
The details of the system will be more complicated, I'm sure, but that's the general idea: route HTTP requests through a pool of proxy servers to avoid getting blocked by sites that prohibit scraping. The system should be hosted on AWS or Google App Engine; Google App Engine would be ideal.
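Routing a single request through one proxy can be done with the Python standard library alone. The sketch below is one possible `(proxy, url) -> (status, body)` fetcher. The browser-like User-Agent string and the timeout are assumptions, and whether a particular App Engine or AWS runtime permits outbound connections to arbitrary proxy hosts should be confirmed for the chosen environment.

```python
import urllib.error
import urllib.request
from typing import Tuple


def make_proxied_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Build an opener that sends both http:// and https:// traffic via one proxy.

    proxy_url may carry credentials, e.g. "http://user:pass@host:port";
    urllib turns those into a Proxy-Authorization header.
    """
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_through_proxy(proxy_url: str, target_url: str, timeout: float = 15.0) -> Tuple[int, str]:
    """Fetch target_url via proxy_url and return (status code, decoded body)."""
    opener = make_proxied_opener(proxy_url)
    # Assumed header: many sites reject urllib's default User-Agent outright.
    request = urllib.request.Request(target_url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx, but a 429 or 503 is exactly the block signal
        # the caller needs, so hand it back as a normal (status, body) result.
        return err.code, err.read().decode("utf-8", errors="replace")
```

Connection-level failures (an unreachable proxy, a timeout) still raise `urllib.error.URLError`; a caller would probably treat those as another reason to evict or sideline that proxy.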
I will only entertain responses that prove you understand the problem and are willing to work through a decent solution. Canned responses will be ignored immediately.