In general, we want the content of the site ([url removed, login to view]) to be scraped, formatted consistently from page to page, and rebuilt into a tree of HTML pages. Navigating from one page to the next should give a consistent look and feel. These pages will be accessed only from mobile devices (iPhone, iPad, Android phone or tablet), not from a desktop browser. As a result, all content needs to be optimized for these screen sizes, resolutions, and aspect ratios. We want very basic but nice-looking formatting. We do not want frames or tables that occupy only part of the screen width, because these generally do not translate well to mobile viewing. Full-width tables are OK to retain.
All information identifying the content source (i.e., the government) should be omitted.
Links to pages that are not on our list (see the attached Word document) should not be converted, and the entire sentence containing such a link should be omitted as well. For example, under Preconception health, we do not want the Fitness and nutrition section scraped, so the sentence “Visit our Fitness and nutrition section.” should be omitted from the resulting content.
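The link rule above amounts to a sentence filter. The following is a minimal Python sketch, assuming each paragraph arrives as an HTML string with links written as `<a href="...">` tags, and that sentences end in `.`, `!`, or `?` followed by whitespace. `ALLOWED_PAGES`, its paths, and `drop_sentences_with_excluded_links` are hypothetical placeholders, not part of the actual page list. A production scraper would more likely use a real HTML parser than regular expressions, and would also rewrite the kept links to point at the rebuilt local pages, which is not shown here.

```python
import re

# Hypothetical allow-list standing in for the page list in the Word document.
# These paths are placeholders, not real URLs from the site.
ALLOWED_PAGES = {
    "/preconception-health.html",
    "/pregnancy.html",
}

# Captures the href value of each <a ...> tag (assumes double-quoted attributes).
HREF_RE = re.compile(r'<a\s[^>]*href="([^"]+)"', re.IGNORECASE)
# Splits on whitespace that follows sentence-ending punctuation.
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+")


def drop_sentences_with_excluded_links(paragraph_html: str, allowed: set[str]) -> str:
    """Return the paragraph without any sentence that links outside the allow-list.

    Sentences with no links, or only links to allowed pages, are kept unchanged.
    """
    kept = []
    for sentence in SENTENCE_END_RE.split(paragraph_html.strip()):
        hrefs = HREF_RE.findall(sentence)
        # all() is True for an empty list, so link-free sentences survive.
        if all(href in allowed for href in hrefs):
            kept.append(sentence)
    return " ".join(kept)
```

With this sketch, a paragraph such as `Eat well. Visit our <a href="/fitness-nutrition.html">Fitness and nutrition section</a>.` would come back as just `Eat well.`, because the second sentence links to a page that is not on the list.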
Generally speaking, the following sections will be omitted from the resulting HTML files:
• The closing “For more information on…” section should be omitted unless it contains links to pages that will be converted.
• “Share this information!” section
• “Fact sheet was reviewed by” section
• Any reference to when the document was last updated or reviewed
• Cross-references within the “Related Information” frame should point only to pages that are included in the resulting content. As a result, some pages will be linked from many others; we want to preserve this.
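The section-level rules in the list above could be applied as a post-processing pass over each scraped page. The Python sketch below assumes, hypothetically, that the scraper has already split each page into `Section` objects (a heading plus body HTML) and that the “Share this information!”, “Fact sheet was reviewed by”, and “For more information on…” blocks appear as section headings. `Section`, `filter_sections`, and `ALLOWED_PAGES` are illustrative names, not part of any existing tool, and pruning the “Related Information” links is not covered.

```python
import re
from dataclasses import dataclass

# Hypothetical allow-list standing in for the page list in the Word document.
ALLOWED_PAGES = {"/preconception-health.html", "/pregnancy.html"}

# Captures the href value of each <a ...> tag (assumes double-quoted attributes).
HREF_RE = re.compile(r'<a\s[^>]*href="([^"]+)"', re.IGNORECASE)
# Matches phrases like "last updated" or "Page last reviewed".
LAST_UPDATED_RE = re.compile(r"\blast\s+(?:updated|reviewed)\b", re.IGNORECASE)


@dataclass
class Section:
    heading: str
    body_html: str


def keep_section(section: Section, allowed: set[str]) -> bool:
    """Apply the section-level omission rules to one scraped section."""
    heading = section.heading.strip().lower()
    if heading.startswith("share this information"):
        return False
    if heading.startswith("fact sheet was reviewed by"):
        return False
    if heading.startswith("for more information on"):
        # Keep only if the section links to at least one page being converted.
        return any(href in allowed for href in HREF_RE.findall(section.body_html))
    return True


def strip_last_updated_lines(body_html: str) -> str:
    """Drop every line that mentions when the page was last updated or reviewed."""
    lines = body_html.splitlines()
    return "\n".join(line for line in lines if not LAST_UPDATED_RE.search(line))


def filter_sections(sections: list[Section], allowed: set[str]) -> list[Section]:
    """Return the kept sections, each with "last updated" lines removed."""
    return [
        Section(s.heading, strip_last_updated_lines(s.body_html))
        for s in sections
        if keep_section(s, allowed)
    ]
```

Working at the section level keeps each omission rule in one place, so adding or relaxing a rule later only touches `keep_section`.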
The attached Word document contains a full explanation and an explicit list of all the pages to be scraped and rebuilt.