Closed

Web Spider App

I need a web spider with the following features:

1. Multi-threaded

2. Ability to import large lists of domains to check (.txt files of up to 2 GB)

2.1 Ability to start at a user-determined line in the imported file.

For example, if a file has 10,000,000 entries, I'd like to be able to start at entry 9,000,000.
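A minimal sketch of how 2 and 2.1 could work together, assuming the list holds one domain per line. The function name iter_domains and the 1-based start_line parameter are hypothetical, not part of the brief. The file is streamed line by line, so a 2 GB list is never loaded into memory at once.

```python
from itertools import islice
from typing import Iterator


def iter_domains(path: str, start_line: int = 1) -> Iterator[str]:
    """Yield non-empty domains from path, starting at the 1-based start_line."""
    if start_line < 1:
        raise ValueError("start_line must be >= 1")
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        # islice reads and discards the first start_line - 1 lines lazily,
        # so only one line is held in memory at a time.
        for line in islice(handle, start_line - 1, None):
            domain = line.strip()
            if domain:  # skip blank lines (assumption: they carry no entry)
                yield domain
```

Skipped lines still have to be read once, since line boundaries in a text file cannot be located without scanning it; storing a byte offset alongside the line number would be one way to resume faster.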

3. The spider must be customizable to search certain folders on each site in the list. For example, if I input

/forum

/forums

/boards

/board

into the "folders" section of the app, the spider should search like this:

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]
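Since the example URLs above were removed from the posting, the exact pattern is an assumption. A minimal sketch of one plausible reading, where each domain is checked at its root and then once per configured folder (folder_urls, the http scheme, and example.com are all hypothetical):

```python
from typing import Iterable, List


def folder_urls(domain: str, folders: Iterable[str], scheme: str = "http") -> List[str]:
    """Return the root URL of domain followed by one URL per folder."""
    base = f"{scheme}://{domain.strip().rstrip('/')}"
    urls = [base + "/"]  # assumption: the root page is checked as well
    for folder in folders:
        name = folder.strip().strip("/")
        if name:
            urls.append(f"{base}/{name}")
    return urls
```

For example, folder_urls("example.com", ["/forum", "/boards"]) gives the root URL plus http://example.com/forum and http://example.com/boards.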

4. The script should also support searching a set list of subdomains of the imported domains. For example, if I input

forums.

boards.

into the "subdomain" section of the app, the spider should search like this:

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]

All subdomains should be exempt from the subfolder checking described in (3) above.
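A minimal sketch of the subdomain expansion, assuming each prefix is entered with a trailing dot as in the example and is simply prepended to the imported domain. The name subdomain_urls and the http scheme are hypothetical. Only the subdomain root is returned, which reflects the exemption from folder checking.

```python
from typing import Iterable, List


def subdomain_urls(domain: str, prefixes: Iterable[str], scheme: str = "http") -> List[str]:
    """Return one root URL per configured subdomain prefix; no folders are added."""
    host = domain.strip().rstrip("/")
    urls = []
    for prefix in prefixes:
        label = prefix.strip().rstrip(".")  # accept "forums." or "forums"
        if label:
            urls.append(f"{scheme}://{label}.{host}/")
    return urls
```

This sketch does not strip a leading "www." from imported domains; whether it should is left open by the brief.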

5. The spider must be customizable to find footprints within the source code of the pages it scans.

For example, if I input

phpbb

"powered by phpbb"

"Powered by vBulletin"

"Jelsoft Enterprises Ltd."

into the footprints section, it should flag any page that includes any of the above phrases or keywords.
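A minimal sketch of the footprint check. Two things here are assumptions rather than stated requirements: matching is case-insensitive, and surrounding double quotes in the footprint list are treated as delimiters rather than as part of the phrase. The name find_footprint is hypothetical.

```python
from typing import Iterable, Optional


def find_footprint(page_source: str, footprints: Iterable[str]) -> Optional[str]:
    """Return the first configured footprint found in page_source, else None."""
    haystack = page_source.casefold()
    for footprint in footprints:
        # Assumption: quotes around a phrase only mark it as a phrase.
        needle = footprint.strip().strip('"').casefold()
        if needle and needle in haystack:
            return footprint
    return None
```

A plain substring test is used because the examples are literal phrases; regular expressions would only be needed if footprints were meant to be patterns.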

6. All negative results should be exported to a "Notfound" file.

7. All positive results (i.e., one of the set footprints was found) should be exported to a "Found" file.
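A minimal sketch of how the multi-threading in (1) could feed the two result files in (6) and (7). ResultWriter, scan, and the check callable are hypothetical names; check stands in for whatever fetches a URL and returns the matched footprint or None. A lock guards the files so lines written by different threads do not interleave.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Optional


class ResultWriter:
    """Appends positive results to one file and negative results to another."""

    def __init__(self, found_path: str, notfound_path: str) -> None:
        self._lock = threading.Lock()
        self._found = open(found_path, "a", encoding="utf-8")
        self._notfound = open(notfound_path, "a", encoding="utf-8")

    def record(self, url: str, footprint: Optional[str]) -> None:
        with self._lock:  # one writer at a time
            if footprint:
                self._found.write(f"{url}\t{footprint}\n")
            else:
                self._notfound.write(f"{url}\n")

    def close(self) -> None:
        with self._lock:
            self._found.close()
            self._notfound.close()


def scan(urls: Iterable[str], check: Callable[[str], Optional[str]],
         writer: ResultWriter, workers: int = 8) -> None:
    """Run check on every URL in a thread pool and record each outcome."""
    def task(url: str) -> None:
        writer.record(url, check(url))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all tasks and re-raises any worker exception.
        list(pool.map(task, urls))
```

Note that pool.map queues every URL up front, so a list with millions of entries would be better passed to scan in batches.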

8. Should any "root" domain and/or subdomain return a 404 error, the script should skip all the folder searches for it. For example, if

[url removed, login to view] returns a 404 error, there is no need to check

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]

etc.
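A minimal sketch of the 404 short-circuit using only the Python standard library. The names http_status and urls_to_scan are hypothetical. The status lookup is passed in as a parameter so it can be swapped out, for instance for a proxy-aware fetcher.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, Optional


def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if the host is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 4xx and 5xx responses arrive as HTTPError
    except OSError:
        return None  # DNS failure, refused connection, timeout, and similar


def urls_to_scan(root_url: str, folders: Iterable[str],
                 get_status: Callable[[str], Optional[int]] = http_status) -> List[str]:
    """Return the root plus folder URLs, or nothing if the root answers 404."""
    if get_status(root_url) == 404:
        return []  # requirement 8: skip every folder check for this root
    base = root_url.rstrip("/")
    return [root_url] + [f"{base}/{f.strip('/')}" for f in folders]
```

The brief only mentions 404, so this sketch still scans folders when the root is unreachable or returns another error; treating those cases as a skip would be a separate decision.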

9. The spider should be capable of using proxies. It should check a preset .txt file every few minutes for updated proxies.
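A minimal sketch of the periodic proxy reload, assuming the file lists one proxy per line (for example host:port) and that "every few minutes" means a configurable interval, here defaulting to 300 seconds. ProxyList and its round-robin rotation are hypothetical choices, not part of the brief.

```python
import threading
import time
from typing import List, Optional


class ProxyList:
    """Hands out proxies in rotation and re-reads the file once per interval."""

    def __init__(self, path: str, interval: float = 300.0) -> None:
        self._path = path
        self._interval = interval
        self._lock = threading.Lock()
        self._proxies: List[str] = []
        self._loaded_at: Optional[float] = None
        self._next = 0

    def _reload_if_due(self) -> None:
        now = time.monotonic()
        if self._loaded_at is not None and now - self._loaded_at < self._interval:
            return
        self._loaded_at = now
        try:
            with open(self._path, "r", encoding="utf-8") as handle:
                self._proxies = [line.strip() for line in handle if line.strip()]
        except OSError:
            pass  # keep the previous list if the file is briefly unavailable

    def get(self) -> Optional[str]:
        """Return the next proxy, or None if the list is empty."""
        with self._lock:  # safe to call from many worker threads
            self._reload_if_due()
            if not self._proxies:
                return None
            proxy = self._proxies[self._next % len(self._proxies)]
            self._next += 1
            return proxy
```

Reloading lazily inside get() avoids a separate timer thread; the file is only re-read when a worker asks for a proxy after the interval has elapsed.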

If you intend to bid on this project, please state which platform/language you intend to use, confirm that you are capable of writing a MULTI-THREADED application that meets all the requirements, and describe the experience that makes you right for this job.

I prefer a Windows-based application but am open to server-side applications as well.

Skills: .NET, C Programming, MySQL, Perl, PHP


About the employer:
(63 reviews) Aurora, United States

Project ID: #945831