Find URLs from websites

Populate an Excel sheet with the URLs of staff pages from a list of University websites.
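
As a rough illustration of the deliverable, the collected URLs could be written to a CSV file that Excel opens directly. This is only a sketch: the column names, the sample row and the urls_to_csv helper are assumptions and are not part of the project.

```python
import csv
import io


def urls_to_csv(rows, out):
    """Write (institute, staff page URL) pairs to an open text stream as CSV."""
    writer = csv.writer(out)
    writer.writerow(["Institute", "StaffPageURL"])  # hypothetical column names
    writer.writerows(rows)


# Invented sample data; real rows would come from the scraping script.
rows = [
    ("Example University",
     "https://www.example.edu/science/staff/profiles/jsmith.html"),
]

buf = io.StringIO()  # in-memory stand-in for a real .csv file
urls_to_csv(rows, buf)
print(buf.getvalue())
```

When writing to a real file, open it with newline="" so that the csv module controls the line endings.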

To identify the XPath to various elements in a page, one tool that can be used is the XPathChecker plugin for Firefox.

The first step in creating a template is to identify the start page for each institute or organization. This start URL goes in the StartURL field of the institutes table. In most cases the list of staff members' names is either a table or a list; the XPath that identifies this table or list goes in the TableXPath field of the corresponding record. The XPath that identifies each staff member's profile page link goes in the URLXPath field. Because most profiles are linked with a relative URL, the link found by URLXPath has to be combined with a prefix made up of the institute's web server address and path. This prefix goes in the URLPrefix field.
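
The way the four template fields work together can be sketched as follows. Python is used here only for illustration, since the actual script is not shown in this posting. The field names come from the description above; the example.edu values, the sample page and the profile_urls helper are invented. The sketch also assumes the page is well-formed markup, because the standard-library parser supports only a subset of XPath and does not tolerate broken HTML.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# Hypothetical template record: field names follow the text, values are invented.
template = {
    "StartURL": "https://www.example.edu/science/staff/",
    "TableXPath": ".//table[@id='staff']",
    "URLXPath": ".//a",
    "URLPrefix": "https://www.example.edu/science/staff/",
}

# Stand-in for the page downloaded from StartURL (well-formed on purpose).
page = """<html><body>
<table id="staff">
  <tr><td><a href="profiles/jsmith.html">J. Smith</a></td></tr>
  <tr><td><a href="/science/staff/profiles/alee.html">A. Lee</a></td></tr>
</table>
</body></html>"""


def profile_urls(html, tpl):
    """Return absolute profile URLs found inside the staff table."""
    root = ET.fromstring(html)
    table = root.find(tpl["TableXPath"])  # locate the staff table or list
    if table is None:
        return []
    urls = []
    for link in table.findall(tpl["URLXPath"]):  # links inside the table only
        href = link.get("href")
        if href:
            # Combine the (usually relative) link with the institute prefix.
            urls.append(urljoin(tpl["URLPrefix"], href))
    return urls


urls = profile_urls(page, template)
for url in urls:
    print(url)
```

urljoin resolves both page-relative and root-relative links against the prefix, which is one possible way to apply URLPrefix; plain string concatenation would be another.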

Once the StartURL, TableXPath, URLXPath and URLPrefix fields are populated, the script should be able to read the individual profile pages one by one. To verify this, run the script and check its on-screen output to confirm that the URLs are actually being retrieved.
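
A check of this kind might look like the sketch below, which tries each profile URL and prints whether it could be retrieved. The check_urls and fetch helpers and the stub fetcher are assumptions made for illustration; the stub stands in for real network access so that the example runs offline.

```python
from urllib.error import URLError
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download a page and return its raw bytes (real network access)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def check_urls(urls, fetcher=fetch):
    """Try each URL, print one status line per URL, return (url, ok) pairs."""
    results = []
    for url in urls:
        try:
            body = fetcher(url)
            status = f"OK ({len(body)} bytes)"
        except (URLError, OSError, ValueError) as exc:
            status = f"FAILED ({exc})"
        print(f"{url}: {status}")
        results.append((url, status.startswith("OK")))
    return results


def stub_fetcher(url):
    """Offline stand-in for fetch(): fails for any URL containing 'missing'."""
    if "missing" in url:
        raise OSError("page not found")
    return b"<html></html>"


results = check_urls(
    [
        "https://www.example.edu/science/staff/profiles/jsmith.html",
        "https://www.example.edu/science/staff/profiles/missing.html",
    ],
    fetcher=stub_fetcher,
)
```

Calling check_urls(urls) without the fetcher argument would use the real downloader instead of the stub.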

Once the pages can be extracted, the template XPaths for the profile details need to be populated. The details being captured are:

• Name

• Title

• Email

• Phone

• Fax

• Address

• Biography

• Qualifications

• Research Interests

• Publications

Each of these details requires its own XPath in the template, with an optional regular expression to strip unwanted formatting and HTML tags. Please note that not all organizational units or staff members will have all of these details. A few trial runs will be needed to find the XPath that captures the majority of the details. For each detail, there are two ways of using the XPath: one returns the value as a list of XPath nodes ('V'), and the other returns the values found by the XPath as a string ('S'). The return type needs to be added to the corresponding type field in the table. If a regular expression is needed, the type would usually be 'S'.
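
The difference between the two return types, and the role of the regular expression, can be sketched as follows. The extract helper, the sample profile markup, the XPaths and the pattern are all invented for illustration, and the sketch assumes that the regular expression is used to delete whatever it matches; the actual script may apply it differently.

```python
import re
import xml.etree.ElementTree as ET

# Invented, well-formed stand-in for a staff profile page.
profile = """<html><body>
<div class="contact">
  <span class="phone">Phone: +61 2 5550 0100</span>
  <ul class="pubs"><li>Paper one</li><li>Paper two</li></ul>
</div>
</body></html>"""


def extract(root, xpath, return_type, pattern=None):
    """Apply one template XPath.

    'V' returns the matching nodes as a list.
    'S' returns their text as one string, with anything matching
    the optional regular expression removed (an assumption).
    """
    nodes = root.findall(xpath)
    if return_type == "V":
        return nodes
    text = " ".join("".join(node.itertext()).strip() for node in nodes)
    if pattern:
        text = re.sub(pattern, "", text).strip()
    return text


root = ET.fromstring(profile)

# 'S' with a regular expression that strips the leading label.
phone = extract(root, ".//span[@class='phone']", "S", r"^Phone:\s*")

# 'V' keeps the nodes, so each publication can be handled separately.
pubs = [li.text for li in extract(root, ".//ul[@class='pubs']/li", "V")]

print(phone)
print(pubs)
```

Returning a string makes it straightforward to run a regular expression over the result, which fits the note that details needing a regular expression would usually be type 'S'.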

More details will be posted in the coming weeks.

Skills: Data Entry, Excel, Perl, Web Scraping, Web Search

About the employer:
(0 reviews) Sydney, Australia

Project ID: #4066847

16 freelancers are bidding an average of $157 on this job


I can help in your project, please check PMB and our ratings/reviews to get idea of our experience. Please let me know if you have any queries.

$199 AUD in 7 days
(68 reviews)

Good day, please see my message

$150 AUD in 7 days
(9 reviews)

Can be done very well. Have done this many times. Please see private message for proposal.

$250 AUD in 3 days
(16 reviews)

I have done this work many times; it's quite an easy task for me. Regards: R1

$130 AUD in 7 days
(9 reviews)

Hi, please check your PMB.

$110 AUD in 3 days
(1 review)

Dear sir, I'm an experienced Web researcher and am eager to complete this job properly and in time.

$140 AUD in 9 days
(1 review)

Please see PMB. We have an experienced team to do this job...

$150 AUD in 1 day
(2 reviews)

Hello Sir, I'm Faisal. I will do my best for your project and will deliver your completed project within a very short period. I have a great team to finish your task ahead of time. I am proficient in ms-word, ms-excel, dat

$200 AUD in 3 days
(0 reviews)

Some details?

$150 AUD in 2 days
(0 reviews)

Can be done very well. Have done this many times.

$150 AUD in 10 days
(0 reviews)

I know this process well. Give it to me and I will do it successfully with good quality.

$150 AUD in 20 days
(0 reviews)

Let's get started.

$250 AUD in 3 days
(0 reviews)

Ready for your work.

$30 AUD in 10 days
(0 reviews)

I am interested in working with you.

$100 AUD in 1 day
(0 reviews)

Hi, please engage me, give me this chance. Thanks

$200 AUD in 5 days
(0 reviews)

Simple HTML DOM is better than XPath for direct search of info on an HTML page.

$160 AUD in 3 days
(0 reviews)