I have many text files that do not follow a standard format.
1. I need to find all domain names in the text files and print them to stdout.
3. The script should accept multiple files as input and be run as "./find_domains *"
3. The script should list each found domain on a separate line.
Here is an example input line found in one of our text files (please ignore the fact that this one is CSV; many files are not):
"","Cheshire","28-02-2011","Sale Grammar School","4029","","Closed","Mixed","D A","","Wilson","Headteacher","Mr","07-01-2011","Trafford","358","","M33 3NH","","","999","Academy Converter","Not applicable","18","11","Marsland Road","9733217","0161","Sale","Foundation School","10005644","106371","[login to view URL]","Has a sixth form",""Not in special measures","Not part of PFI","No Special Classes","","Not applicable","","0","Not applicable","","Not applicable","Not applicable","Not applicable","","","","1","[login to view URL]","[login to view URL]:List_of_schools_in_Trafford","[login to view URL]","","","","[login to view URL]:List_of_schools_in_Trafford","[login to view URL]",""
The output from this line would be (URLs should not be printed, just domains):
[login to view URL]
[login to view URL]
[login to view URL]
4. The last character of a domain name is not expected to be followed by a character in 0-9 or a-z, so identifying the end of each domain name should be fairly simple. You can use this in a regexp or another method.
5. We need to find all .uk, .com, .net and .org domains.
6. Script must be delivered in 2 days.
7. Programming should be Perl or Python.
8. Script must run on Linux command line.
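
To make the requirements above concrete, here is a minimal Python sketch of one way the script could look. It is an illustration under stated assumptions, not a finished deliverable: the regular expression, the function names find_domains and main, lower-casing the output, printing each distinct domain only once, and rejecting a match that continues with another dot-separated label (for example a .com.au name) are all choices made for this sketch and are not spelled out in the requirements.

```python
#!/usr/bin/env python3
"""find_domains: print every .uk, .com, .net or .org domain found in the given files."""
import re
import sys

# One or more dot-separated labels followed by one of the wanted endings.
# The lookbehind stops a match from starting in the middle of a longer name.
# The lookahead applies requirement 4 (no letter or digit right after the
# domain) and, as an extra assumption, also rejects a hyphen or a further
# ".label", so "example.com.au" is not reported as "example.com".
DOMAIN_RE = re.compile(
    r"(?<![a-z0-9.-])"
    r"(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+"
    r"(?:uk|com|net|org)"
    r"(?![a-z0-9-]|\.[a-z0-9])",
    re.IGNORECASE,
)


def find_domains(text):
    """Return the domains found in text, lower-cased, in order of appearance."""
    return [match.group(0).lower() for match in DOMAIN_RE.finditer(text)]


def main(paths):
    seen = set()  # assumption: each distinct domain is printed only once
    for path in paths:
        try:
            # errors="replace" keeps the scan going on files with odd encodings
            with open(path, encoding="utf-8", errors="replace") as handle:
                for line in handle:
                    for domain in find_domains(line):
                        if domain not in seen:
                            seen.add(domain)
                            print(domain)
        except OSError as exc:
            # Unreadable files and directories matched by "*" are reported
            # on stderr so stdout contains domains only.
            print(f"{path}: {exc}", file=sys.stderr)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved as find_domains and made executable with chmod +x, this would be run as "./find_domains *" on a Linux command line. The scan works line by line on raw text, so it does not depend on the files being CSV. If repeated domains should be listed every time they occur, the seen set can simply be dropped.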