* Topic: Multiprocessing/multithreading Scrapy application with proxy support
* Expectation: Please bid only if you have project experience in this domain. No students, PLEASE. Expert-level (300-400) conversation / consultation / demo for 2 hours. Deliverables: an architecture diagram and component demo code as a proof of concept (not production-ready code).
* Project background: we have millions of tasks stored in MySQL and plan to develop a multithreaded/multiprocess application with Scrapy to perform them. The end goal is for multiple Scrapy instances to run independently: each instance claims its own tasks from the database, completes them inside Scrapy in a multithreaded manner, then bulk-uploads the results back to MySQL. This will potentially be deployed in a Docker cluster (open to suggestions). We need overall high-level consultation on the following topics:
* Python Scrapy multithreading/multiprocessing management: which modules/packages are available, and what are their pros and cons? Implementation of: fetching, throttling (non-blocking), completing, data collecting, graceful termination, monitoring (Component Demo Code needed)
* Proxy implementation with retry: as a Scrapy middleware? How does this tie into the whole architecture? IP rotation. (Component Demo Code needed)
* User-Agent switching (Component Demo Code needed)
* Feel free to ask any questions.
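
To illustrate the level of component demo code expected for the proxy and User-Agent items, here is a minimal sketch of two Scrapy downloader middlewares. It is not a definitive design. The class names and the `PROXY_LIST` and `USER_AGENT_LIST` settings are hypothetical names chosen for illustration; `from_crawler`, `process_request`, and `request.meta["proxy"]` are standard Scrapy conventions.

```python
import random


class RotatingProxyMiddleware:
    """Assigns a randomly chosen proxy to every outgoing request (hypothetical demo class)."""

    def __init__(self, proxies):
        self.proxies = list(proxies)

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_LIST is an assumed custom setting, not a Scrapy built-in.
        return cls(crawler.settings.getlist("PROXY_LIST"))

    def process_request(self, request, spider):
        # Scrapy's built-in HttpProxyMiddleware honors request.meta["proxy"].
        request.meta["proxy"] = random.choice(self.proxies)
        return None  # let Scrapy continue processing the request


class RandomUserAgentMiddleware:
    """Sets a randomly chosen User-Agent header on every request (hypothetical demo class)."""

    def __init__(self, user_agents):
        self.user_agents = list(user_agents)

    @classmethod
    def from_crawler(cls, crawler):
        # USER_AGENT_LIST is an assumed custom setting, not a Scrapy built-in.
        return cls(crawler.settings.getlist("USER_AGENT_LIST"))

    def process_request(self, request, spider):
        # Direct assignment overwrites any User-Agent set earlier in the chain.
        request.headers["User-Agent"] = random.choice(self.user_agents)
        return None
```

Both classes would be enabled through the `DOWNLOADER_MIDDLEWARES` setting. Because Scrapy's built-in `RetryMiddleware` reschedules failed requests and each retry passes through `process_request` again, a retried request should receive a freshly chosen proxy and User-Agent without extra retry logic in these classes.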