I need a Windows program, similar to Scrapebox Link Checker but with a few advanced features.
It needs to be multi-threaded for speed, with a configurable number of threads (5-25).
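To illustrate the threading requirement, here is a minimal Python sketch of a worker pool with a user-configurable thread count. It is only an illustration, not a required implementation: the names `clamp_threads`, `run_checks` and `check_one` are hypothetical, and it assumes each URL can be checked independently of the others.

```python
from concurrent.futures import ThreadPoolExecutor

# Range taken from the requirement above (5-25 threads).
MIN_THREADS, MAX_THREADS = 5, 25


def clamp_threads(requested):
    """Keep the user's thread count inside the allowed range."""
    return max(MIN_THREADS, min(MAX_THREADS, requested))


def run_checks(urls, check_one, threads=10):
    """Run check_one(url) for every URL on a pool of worker threads.

    check_one is a hypothetical callable supplied by the caller.
    Results come back in the same order as the input URLs.
    """
    with ThreadPoolExecutor(max_workers=clamp_threads(threads)) as pool:
        return list(pool.map(check_one, urls))
```

A thread pool suits this job because almost all of the time is spent waiting on network responses, so 5-25 threads can keep many requests in flight at once.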
This is what I need it to do:
Enter a list of URLs (usually between 500 and 1500)
Enter a list of anchors I want to check
Example: Example Anchor 1, Example Anchor 2, ExampleAnchor3
Program will check each page for the presence of any of these anchor texts
Example: <a href="/examplehref.html">Example Anchor 1</a>
If found, program will scrape "/examplehref.html"
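The scraping step above could be sketched roughly as follows in Python, using only the standard library's `html.parser`. The class `AnchorCollector` and function `extract_links` are hypothetical names, and the sketch assumes the page HTML has already been downloaded as a string.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect (href, anchor text) pairs for every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, anchor_text) tuples found in the HTML."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

For the example above, `extract_links('<a href="/examplehref.html">Example Anchor 1</a>')` yields the pair `("/examplehref.html", "Example Anchor 1")`, which can then be compared against the anchor list.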
I need to be able to specify if I want to search an exact match or broad match.
Exact match search: only "Example Anchor 1" will return true
Broad match search: "Exam", "Ancho" or "Example An" would all return true
I also need to be able to choose between a case-sensitive and a case-insensitive search.
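The matching options above could be expressed as a single function, sketched here in Python. The name `anchor_matches` and its parameters are hypothetical, and the sketch assumes that "broad match" means the search term appears anywhere inside the anchor text on the page.

```python
def anchor_matches(term, anchor_text, exact=True, case_sensitive=False):
    """Return True if the search term matches the anchor text on the page.

    exact=True  -> the whole anchor text must equal the term
    exact=False -> broad match: the term only has to appear inside it
    """
    if not case_sensitive:
        term, anchor_text = term.casefold(), anchor_text.casefold()
    if exact:
        return term == anchor_text
    # An empty term would match everything, so treat it as no match.
    return bool(term) and term in anchor_text
```

With these settings, "Exam" matches "Example Anchor 1" in broad mode but not in exact mode, and "example anchor 1" matches it only when the search is case-insensitive.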
Output results in .csv in the following format:
"ID" ; "FOUND" ; "original url checked" ; "http://domain.com/examplehref.html" ; "Anchor"
"ID" ; "NOTFOUND" ; "original url checked" ; "" ; ""
"NOTFOUND" can also be an error message if the website returned one, such as "404" or "403", etc.
Program would run in two passes: the second pass would re-check the NOTFOUND and ERROR urls so we can be sure we didn't miss any links.
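As a rough illustration of the two-pass run and the .csv output described above, here is a short Python sketch. All names are hypothetical: `check_one(url)` is assumed to return a `(status, href, anchor)` tuple where status is "FOUND", "NOTFOUND" or an error message such as "404". The sketch also assumes the ID is a running row number, and it writes the separators without the spaces shown in the sample rows.

```python
import csv
from urllib.parse import urljoin


def check_in_two_passes(urls, check_one):
    """First pass checks every URL; second pass retries anything not FOUND."""
    results = {url: check_one(url) for url in urls}
    for url, (status, _href, _anchor) in list(results.items()):
        if status != "FOUND":
            results[url] = check_one(url)  # keep the second-pass result
    return results


def write_report(results, out):
    """Write one semicolon-separated, fully quoted row per checked URL."""
    writer = csv.writer(out, delimiter=";", quoting=csv.QUOTE_ALL,
                        lineterminator="\n")
    for row_id, (url, (status, href, anchor)) in enumerate(results.items(), 1):
        if status == "FOUND":
            # Turn a relative href such as "/examplehref.html" into a full URL.
            writer.writerow([row_id, "FOUND", url, urljoin(url, href), anchor])
        else:
            writer.writerow([row_id, status, url, "", ""])
```

Resolving the href against the page URL is what turns "/examplehref.html" into "http://domain.com/examplehref.html" in the FOUND row.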
There are a few additional website footprints we'd need to add where the scrape is slightly more complicated; we can discuss those privately.
Please check [url removed, login to view] - I want it to look and behave like this, except with the added functionality I described.