**Spider**
Save all data to a database (MSSQL).
1. Enter a start URL (e.g. [login to view URL]).
2. Visit the URL.
3. Find new URLs (links to unique top domains).
Only save whole URLs in the database, e.g. [login to view URL] is correct; [login to view URL] would be wrong.
4. Populate the database with the new URLs.
5. Visit the database's unvisited URLs and repeat steps 2-4 (a minimal sketch of this loop follows).
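A minimal sketch of the loop in Python, assuming a single MSSQL table named `Urls` reached through pyodbc; the driver string, table, and column names are illustrative assumptions, not part of this spec:

```python
import re
import urllib.parse
import urllib.request

import pyodbc

# Connection string and table name are assumptions; adjust to the real server.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=Spider;Trusted_Connection=yes"
)
cur = conn.cursor()
cur.execute(
    "IF OBJECT_ID('Urls') IS NULL "
    "CREATE TABLE Urls (Url NVARCHAR(450) PRIMARY KEY, Visited BIT NOT NULL DEFAULT 0)"
)
conn.commit()

HREF_RE = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)

def extract_links(page_url, html):
    """Step 3: resolve each href against the page URL so only whole URLs survive."""
    for href in HREF_RE.findall(html):
        absolute = urllib.parse.urljoin(page_url, href)
        if absolute.startswith("http"):
            yield absolute.split("#")[0]  # drop fragments

def add_url(url):
    """Step 4: insert if missing; the primary key rejects duplicates anyway."""
    cur.execute(
        "IF NOT EXISTS (SELECT 1 FROM Urls WHERE Url = ?) "
        "INSERT INTO Urls (Url) VALUES (?)", url, url)

def crawl(start_url):
    add_url(start_url)                       # step 1
    while True:
        row = cur.execute(
            "SELECT TOP 1 Url FROM Urls WHERE Visited = 0").fetchone()
        if row is None:                      # step 5 finds nothing: done
            break
        cur.execute("UPDATE Urls SET Visited = 1 WHERE Url = ?", row.Url)
        try:
            html = urllib.request.urlopen(row.Url, timeout=10).read().decode(
                "utf-8", errors="replace")   # step 2
        except Exception:
            conn.commit()
            continue                         # unreachable page: skip it
        for link in extract_links(row.Url, html):
            add_url(link)                    # steps 3-4
        conn.commit()
```

Note that the `PRIMARY KEY` on `Urls.Url` makes the insert-if-missing check double as the duplicate filter, and the `Visited` flag keeps any page from being fetched twice, which covers the first two items under Extra below.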
**Extra**
- Remove duplicates.
- Do not visit the same page more than once.
- Decide which URL types to save in the database (e.g. only .org addresses).
- Decide the maximum number of URLs in the database (e.g. stop at 100,000).
- Continue populating the database with a new start URL (step 1) if step 5 finds nothing left to visit. (See the guard-condition sketch after this list.)
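These extras mostly reduce to guard conditions checked before each insert. A sketch under the same assumptions as above; `should_save` is a hypothetical helper, and the `.org` filter and 100,000 cap are simply the examples from the list:

```python
import urllib.parse

MAX_URLS = 100_000        # stop inserting once the table holds this many
ALLOWED_SUFFIX = ".org"   # only save .org addresses, per the example above

def should_save(url, cur):
    """Return True only if the URL passes the type filter, the size cap,
    and is not already stored."""
    host = urllib.parse.urlparse(url).hostname or ""
    if not host.endswith(ALLOWED_SUFFIX):
        return False      # wrong URL type
    if cur.execute("SELECT COUNT(*) FROM Urls").fetchone()[0] >= MAX_URLS:
        return False      # database limit reached
    # The PRIMARY KEY already rejects duplicates; this just avoids a failed insert.
    return cur.execute("SELECT 1 FROM Urls WHERE Url = ?", url).fetchone() is None
```

In a real run the row count would be kept in memory rather than re-queried per link; the `COUNT(*)` here is only for clarity.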
**Crawler**
After the URL limit is reached, the next job is to visit each stored URL and download the web page to local disk (cache the page the way Google does). Store each page in its own unique folder, and in that folder there must be an XML file with the reference URL path and the local disk file path.
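A sketch of that caching step; the cache root, the `page.html` and `manifest.xml` file names, and the MD5-based folder names are assumptions, since the spec only requires unique folders and an XML file pairing the reference URL with the local disk path:

```python
import hashlib
import os
import urllib.request
import xml.etree.ElementTree as ET

CACHE_ROOT = r"C:\cache"   # illustrative cache location

def cache_page(url):
    """Fetch one URL, save the page in its own folder, and write an XML
    manifest pairing the source URL with the local file path."""
    # Hashing the URL gives a unique, filesystem-safe folder name.
    folder = os.path.join(CACHE_ROOT, hashlib.md5(url.encode()).hexdigest())
    os.makedirs(folder, exist_ok=True)
    page_path = os.path.join(folder, "page.html")
    with urllib.request.urlopen(url, timeout=10) as resp:
        with open(page_path, "wb") as f:
            f.write(resp.read())
    root = ET.Element("cachedPage")
    ET.SubElement(root, "url").text = url
    ET.SubElement(root, "localPath").text = page_path
    ET.ElementTree(root).write(os.path.join(folder, "manifest.xml"),
                               encoding="utf-8", xml_declaration=True)
    return folder
```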
## Deliverables
All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).
## Platform
**System definitions**
- Windows Server
- MSSQL