Completed

Project for Majid

A Python-based CLI script that downloads the firmware for all products (including all versions) from the web pages of a given list of predefined vendors and stores the metadata in SQLite. The mandatory metadata fields are: Manufacturer, Model, Version, Type, Name, Release Date (if available), Download Link, and the SHA-2 hash calculated from the file, e.g. (Cisco, Video Surveillance 6030 IP Camera, 2.7.0, IP Camera, …, 21/08/2015, "link"). There is also an optional boolean field indicating whether the device is discontinued, filled in only if the vendor states this on its website. The firmware files themselves will be stored in the file system and referenced by an index ID in SQLite.
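A minimal sketch of how the metadata table and the file storage could fit together, assuming Python's standard `sqlite3`, `hashlib` and `shutil` modules and SHA-256 as the SHA-2 variant. The table name `firmware`, its column names, the `<root>/<manufacturer>/<model>/<id>_<name>` layout and the helper `store_firmware` are hypothetical. The posting only fixes which fields must be recorded and that files are referenced by an index ID.

```python
import hashlib
import re
import shutil
import sqlite3
from pathlib import Path

# Hypothetical schema: one row per firmware file. The INTEGER PRIMARY KEY is the
# index ID that the stored file on disk is named after.
SCHEMA = """
CREATE TABLE IF NOT EXISTS firmware (
    id            INTEGER PRIMARY KEY,
    manufacturer  TEXT NOT NULL,
    model         TEXT NOT NULL,
    version       TEXT NOT NULL,
    type          TEXT NOT NULL,
    name          TEXT NOT NULL,
    release_date  TEXT,            -- optional: not every vendor publishes it
    download_link TEXT NOT NULL UNIQUE,
    sha256        TEXT NOT NULL,
    discontinued  INTEGER,         -- optional flag: 1, 0, or NULL if the site says nothing
    file_path     TEXT
);
"""


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash in chunks so large firmware images are never fully loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def _safe(part: str) -> str:
    """Make a metadata value usable as a single folder or file name."""
    return re.sub(r"[^A-Za-z0-9._-]+", "_", part).strip("_") or "unknown"


def store_firmware(conn: sqlite3.Connection, root: Path, downloaded: Path, meta: dict) -> int:
    """Insert the metadata, copy the file to <root>/<manufacturer>/<model>/<id>_<name>, return the id."""
    digest = sha256_of_file(downloaded)
    with conn:  # one transaction: the row is rolled back if the copy fails
        cur = conn.execute(
            "INSERT INTO firmware (manufacturer, model, version, type, name, release_date,"
            " download_link, sha256, discontinued) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
            (meta["manufacturer"], meta["model"], meta["version"], meta["type"], meta["name"],
             meta.get("release_date"), meta["download_link"], digest, meta.get("discontinued")),
        )
        fw_id = cur.lastrowid
        target = root / _safe(meta["manufacturer"]) / _safe(meta["model"]) / f"{fw_id}_{_safe(meta['name'])}"
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(downloaded, target)
        conn.execute("UPDATE firmware SET file_path = ? WHERE id = ?", (str(target), fw_id))
    return fw_id
```

The row is inserted before the file is copied, so the SQLite-assigned `id` can become part of the file name. Wrapping both steps in one `with conn:` transaction means a failed copy leaves no metadata row behind. The `UNIQUE` constraint on `download_link` adds a second guard against storing the same download twice.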

The script's argument should be either a comma-separated list of vendor names or the path to a text file containing the vendor names.
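One way to accept both argument forms with the standard `argparse` module is shown below. This is a sketch: the helper name `parse_vendor_argument`, the lower-casing of names, and the assumption that the text file holds one vendor per line (commas also accepted) do not come from the original.

```python
import argparse
import os


def parse_vendor_argument(value: str) -> list[str]:
    """Turn 'Cisco,Axis' or a path to a vendor list file into ['cisco', 'axis'].

    Assumption: the file lists one vendor per line; commas inside a line also work.
    """
    if os.path.isfile(value):
        with open(value, encoding="utf-8") as fh:
            raw = [name for line in fh for name in line.split(",")]
    else:
        raw = value.split(",")
    vendors: list[str] = []
    for name in raw:
        name = name.strip().lower()  # assumption: vendor keys are lower-case
        if name and name not in vendors:  # drop blanks and duplicates, keep order
            vendors.append(name)
    return vendors


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Download vendor firmware and store its metadata in SQLite.")
    parser.add_argument(
        "vendors",
        type=parse_vendor_argument,
        help="comma-separated vendor names, or the path to a text file of vendor names",
    )
    return parser
```

For example, `build_parser().parse_args(["Cisco,Axis"]).vendors` gives `['cisco', 'axis']`. Passing a path to an existing file reads the names from that file instead. Because any existing path is treated as a file, a vendor name should not coincide with a file name in the working directory.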

The server on which the script will run has no GUI components, so the browser must be used in headless mode.

Solution Scope

1. A separate script will be written for each vendor. This is required because each vendor's website has its own implementation of the firmware download page.

2. The script will only download new firmware added by the vendor. Hence the first execution of the script will download all available firmware, while subsequent runs will only download newly added files. This will be achieved by analysing the data stored in SQLite and skipping files that have already been downloaded and processed.

3. Each vendor provided will be analysed manually to identify the following, which is required to develop its script:

a. URL for the firmware download page

b. Credential requirements (simple sign-up, specific sign-up, no sign-up)

c. Any CAPTCHA on the page

d. Any honeypot traps

4. If credentials are required to download the firmware and only a simple sign-up is needed, the sign-up will be done manually as part of the manual analysis, using a Gmail account dedicated to this work.

5. The script will try to imitate human-like behaviour (within limits) while scraping web pages and will use Tor, so that any scraper/crawler detection logic implemented on the vendor site can be bypassed. This will be achieved by adding random delays and random view times, and by avoiding honeypot traps identified during the manual analysis.
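Items 2 and 5 both apply inside the download loop. The sketch below assumes the `firmware` table has a `download_link` column (hypothetical name). It skips links already recorded in SQLite and waits a random 2 to 8 seconds before each new download. The bounds, the function names and the `download` callback are assumptions, and routing traffic through Tor is not shown.

```python
import random
import sqlite3
import time
from typing import Callable, Iterable


def human_pause(min_s: float = 2.0, max_s: float = 8.0,
                sleep: Callable[[float], None] = time.sleep) -> float:
    """Sleep for a random, non-constant interval to mimic a person viewing the page."""
    delay = random.uniform(min_s, max_s)
    sleep(delay)
    return delay


def download_new_only(conn: sqlite3.Connection, links: Iterable[str],
                      download: Callable[[str], None],
                      sleep: Callable[[float], None] = time.sleep) -> list[str]:
    """Download only links not yet recorded in SQLite, pausing randomly before each request."""
    fetched = []
    for link in links:
        known = conn.execute(
            "SELECT 1 FROM firmware WHERE download_link = ? LIMIT 1", (link,)
        ).fetchone()
        if known:
            continue  # already downloaded and processed in a previous run
        human_pause(sleep=sleep)
        download(link)  # hypothetical vendor-specific download callback
        fetched.append(link)
    return fetched
```

The `download` callback is expected to store the file and its metadata, for example through the file management module. Otherwise the link would not be found on the next run. Injecting `sleep` keeps the loop testable without real waiting.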

Solution Brief

A solution based on Python, Selenium (only where required; sometimes simple HTTP requests do the job) and SQLite will be developed, with the following features/components:

1. File Management Module: Responsible for storing and managing the downloaded files and metadata. Firmware and installer files will be stored on the file system in a structured folder hierarchy, and their metadata will be stored in SQLite. Metadata records will refer to the stored files by their file-system path and file index/name.

2. Vendor Scrapers: A Python/Selenium-based scraper will be written for each vendor, responsible for downloading the files and extracting the metadata from the vendor's site. It will use the file management module to store the files and save their metadata to SQLite.

3. Configuration File: All configuration for the framework (including vendor-specific settings such as credentials and URLs) will be stored in a JSON file that can be easily modified.

4. Execution Script: The configuration file defines the polling interval for each vendor scraper. When the execution script runs, it schedules each vendor script individually according to the polling interval defined in the configuration.
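The sketch below shows one way the configuration file, the per-vendor scrapers and the execution script could interact, using only the standard library. The JSON keys (`vendors`, `url`, `poll_interval_hours`, `credentials`), the `register` decorator and the `due_vendors`/`run_due` functions are hypothetical and do not describe the existing skeleton. The URLs are placeholders.

```python
import json
from datetime import datetime, timedelta
from typing import Callable

# Hypothetical config layout; key names are illustrative, not taken from the existing framework.
EXAMPLE_CONFIG = """
{
  "vendors": {
    "cisco": {"url": "https://example.com/cisco/firmware", "poll_interval_hours": 24},
    "axis":  {"url": "https://example.com/axis/firmware",  "poll_interval_hours": 168,
              "credentials": {"username": "USER", "password": "PASS"}}
  }
}
"""

# Registry of per-vendor scrapers: each vendor module registers one function.
SCRAPERS: dict[str, Callable[[dict], None]] = {}


def register(vendor: str):
    def decorator(func: Callable[[dict], None]) -> Callable[[dict], None]:
        SCRAPERS[vendor] = func
        return func
    return decorator


@register("cisco")
def scrape_cisco(settings: dict) -> None:
    """Placeholder: a real scraper would open settings['url'] and store new firmware."""
    print("scraping", settings["url"])


def due_vendors(config: dict, last_run: dict[str, datetime], now: datetime) -> list[str]:
    """Vendors whose polling interval has elapsed since their last run, or that never ran."""
    due = []
    for vendor, settings in config["vendors"].items():
        interval = timedelta(hours=settings["poll_interval_hours"])
        previous = last_run.get(vendor)
        if previous is None or now - previous >= interval:
            due.append(vendor)
    return due


def run_due(config: dict, last_run: dict[str, datetime], now: datetime) -> list[str]:
    """Run each due vendor's scraper and record when it ran."""
    ran = []
    for vendor in due_vendors(config, last_run, now):
        scraper = SCRAPERS.get(vendor)
        if scraper is None:
            continue  # no scraper written yet for this vendor
        scraper(config["vendors"][vendor])
        last_run[vendor] = now
        ran.append(vendor)
    return ran
```

A long-running execution script could call `run_due(config, last_run, datetime.now())` every few minutes. It could also persist `last_run`, for example in SQLite, so that intervals survive restarts. Registering one function per vendor matches the one-script-per-vendor approach from the Solution Scope.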

We have already developed the main skeleton of the scraper, including some scrapers, and need someone to develop new scrapers for each vendor. There are around 100 vendors; milestones are defined per vendor, and each milestone is at most €50, paid after we have tested the scraper and found no errors. The developer MUST test the scraper before delivering it to us.

Skills: Web Scraping


About the employer:
(4 reviews) Brussels, Belgium

Project ID: #25295210

Awarded to:

stevobujica91

Hello, I'm ready to work on this project.

€50 EUR in 3 days
(35 reviews)
5.7

3 freelancers are bidding on average €147 for this job

MagicScripter

I understand you want a Python script to scrape website firmwares and do so while implementing SQLite. I can do this with your provided URLs. Send a message for details.

€140 EUR in 7 days
(7 reviews)
2.4
altr1m

Hi, how are you today? It will be my great pleasure to work on your project. I am a senior and passionate full stack developer that has rich experiences in information technology. Working for 7 years I have strong know…

€250 EUR in 3 days
(1 review)
0.4