The web crawler should be written in Python and run on Windows from the command line.
The command line will have 2 parameters:
1. The name of a CSV file that configures which images to download.
2. An output directory (for saving the downloaded images).
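The two parameters above could be read with Python's standard `argparse` module. This is only a minimal sketch: the names `parse_args`, `config_csv` and `output_dir` are illustrative assumptions, not part of the specification.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the two positional parameters: the CSV file and the output directory."""
    parser = argparse.ArgumentParser(
        description="Download images according to a CSV configuration file."
    )
    # Argument names are assumptions made for this sketch.
    parser.add_argument("config_csv", type=Path,
                        help="CSV file with one search definition per line")
    parser.add_argument("output_dir", type=Path,
                        help="directory for saving the downloaded images")
    return parser.parse_args(argv)


# Passing a list stands in for a real command line such as:
#   python crawler.py searches.csv C:\images
args = parse_args(["searches.csv", "images"])
print(args.config_csv, args.output_dir)
```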
Each line in the CSV configuration file will specify one search definition, and include:
1. search term (e.g., "curtain", "hand", "man with glasses", etc.)
2. start date
3. end date
4. minimum size (if an image doesn't exist in this size, get the next larger size)
5. max number of images
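One possible way to read these search definitions is sketched below. The specification does not fix the column order, the date format, or how "size" is measured, so this sketch assumes the columns appear in the order listed, dates are written as YYYY-MM-DD, and the minimum size is a single number of pixels (for example the longest side). `SearchDefinition`, `read_definitions` and `pick_size` are hypothetical names.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date


@dataclass
class SearchDefinition:
    term: str
    start_date: date
    end_date: date
    min_size: int    # assumption: one number, e.g. longest side in pixels
    max_images: int


def read_definitions(lines):
    """Yield one SearchDefinition per non-empty CSV row."""
    for row in csv.reader(lines):
        if not row:
            continue
        term, start, end, min_size, max_images = (cell.strip() for cell in row)
        yield SearchDefinition(term, date.fromisoformat(start),
                               date.fromisoformat(end),
                               int(min_size), int(max_images))


def pick_size(available, min_size):
    """Return the smallest available size that is >= min_size, or None."""
    candidates = [size for size in available if size >= min_size]
    return min(candidates) if candidates else None


# Usage with an in-memory file; a real run would pass an open CSV file instead.
sample = io.StringIO("man with glasses,2012-01-01,2012-06-30,640,100\n")
definitions = list(read_definitions(sample))
print(definitions[0].term, pick_size([240, 500, 1024], definitions[0].min_size))
```

`pick_size` covers the "next larger size" rule: when 640 is not available, the smallest size above it is chosen, and `None` signals that every available size is too small.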
For each such line (defining a search), the crawler will create a sub-directory in the output folder,
and put the images for this search there.
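A sketch of the per-search sub-directory step follows. The specification does not say how the sub-directory is named; naming it after the search term is an assumption here, and `search_dir` is a hypothetical helper. Because the crawler runs on Windows, characters that Windows forbids in names are replaced.

```python
import re
from pathlib import Path

# Characters that Windows does not allow in file or directory names.
_INVALID = re.compile(r'[<>:"/\\|?*]')


def search_dir(output_dir, term):
    """Create (if needed) and return the sub-directory for one search term."""
    # Windows also rejects names that end in a dot or a space.
    name = _INVALID.sub("_", term).strip().rstrip(". ") or "search"
    path = Path(output_dir) / name
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Two CSV lines with the same search term but different dates would share a directory under this naming scheme; adding the line number or the date range to the name is one way to keep them apart.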
Next to each downloaded image, the crawler will create a .txt file with the name of the image
(with a .txt extension), and in this text file it will put:
1. Image name
2. Image date (YYYY-MM-DD)
3. Size in pixels
4. Flickr tags
5. Flickr description
6. The search term used
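The metadata file described above might be written as follows. The exact layout of the text file is not specified, so the `label: value` lines are an assumption, as is the choice to replace the image extension with `.txt` (rather than appending it). `write_metadata` is a hypothetical helper.

```python
from datetime import date
from pathlib import Path


def write_metadata(image_path, taken, width, height, tags, description, search_term):
    """Write the six metadata fields to a .txt file next to the image."""
    image_path = Path(image_path)
    lines = [
        f"Image name: {image_path.name}",
        f"Image date: {taken.isoformat()}",   # date.isoformat() gives YYYY-MM-DD
        f"Size in pixels: {width}x{height}",
        f"Flickr tags: {', '.join(tags)}",
        f"Flickr description: {description}",
        f"Search term: {search_term}",
    ]
    # Assumption: photo.jpg gets photo.txt (extension replaced, not appended).
    txt_path = image_path.with_suffix(".txt")
    txt_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return txt_path
```

A call could look like `write_metadata("images/curtain/123.jpg", date(2012, 3, 4), 1024, 768, ["curtain", "red"], "A red curtain", "curtain")`, provided the target directory already exists.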
ADDITIONAL IMPORTANT REQUIREMENTS:
1. The user should be able to stop the crawler and later restart it from the same state it was in when stopped
(we intend to schedule it to run only during nights and weekends, and to pause it during work days).
2. The crawler should run in several threads, in order to speed up downloading.
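The two requirements above can be combined in a small sketch: a thread pool from the standard `concurrent.futures` module does the downloads, and a JSON file records which items are finished so a later run skips them. This is one possible design under stated assumptions, not a prescribed one. `Checkpoint`, `run`, the state-file format and the `download` callable are all hypothetical.

```python
import json
import threading
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


class Checkpoint:
    """Thread-safe record of finished item IDs, persisted as a JSON list."""

    def __init__(self, path):
        self.path = Path(path)
        self._lock = threading.Lock()
        if self.path.exists():
            self.done = set(json.loads(self.path.read_text(encoding="utf-8")))
        else:
            self.done = set()

    def mark_done(self, item_id):
        with self._lock:
            self.done.add(item_id)
            # Write a temporary file, then swap it in, so that an interruption
            # is unlikely to leave a half-written state file behind.
            tmp = self.path.with_name(self.path.name + ".tmp")
            tmp.write_text(json.dumps(sorted(self.done)), encoding="utf-8")
            tmp.replace(self.path)


def run(item_ids, download, checkpoint, workers=4):
    """Download every item not yet in the checkpoint; return how many were pending."""
    pending = [item for item in item_ids if item not in checkpoint.done]

    def job(item_id):
        download(item_id)              # hypothetical callable that fetches one image
        checkpoint.mark_done(item_id)  # recorded only after the download succeeded

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(job, pending))   # consume results so worker exceptions surface
    return len(pending)
```

Calling `run(ids, download, Checkpoint("state.json"))` again after a stop would only process the IDs that are missing from `state.json`. Rewriting the whole file after every item keeps the sketch short; a crawler handling many thousands of images would probably append to a log or use a small database such as SQLite instead.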
We are a group of experts who can provide you with a quality service. Please send the full details of the project so we can proceed further. Thanks and regards.
Hello, thank you for the chance to bid on this project. I am a professional computer programmer with over 10 years of experience. I would love the chance to develop a professional application for you.
I have done this kind of project before in Python. I know how to use multiprocessing/gevent to achieve parallelism and asynchronous I/O (fetch multiple Flickr results at once, make smart use of computer resources).