I have thousands of Brazilian laws spread across several PDF documents. I need a custom tool that reads the content of each PDF file and inserts each law into its own article on a Joomla website. Joomla knowledge is not required; I can tell you which tables and columns to work with. The PDF files can be exported to another format (such as HTML) if that makes it easier to insert the content into Joomla (MySQL database).
In short, I need a tool that reads a large PDF file (or an HTML file, if exported first), parses it to identify where each law begins and ends, and inserts each law (HTML-formatted) as a record in a MySQL table.
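To make the parsing step concrete, here is a rough sketch in Python of how the exported text could be split into individual laws. It is only an illustration, not a required design. It assumes that each law starts on a line beginning with '*' followed by the law number; the real layout in the PDF may differ. The names `LAW_HEADING` and `split_laws` are made up for this example.

```python
import re
from typing import Dict

# Assumption: a law begins on a line that starts with "*<digits>".
# The actual heading layout must be checked against the sample PDF.
LAW_HEADING = re.compile(r"^\*(\d+)\b", re.MULTILINE)


def split_laws(text: str) -> Dict[int, str]:
    """Map each law number to the text from its heading up to the next heading."""
    matches = list(LAW_HEADING.finditer(text))
    laws: Dict[int, str] = {}
    for i, match in enumerate(matches):
        # A law ends where the next heading starts, or at the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        laws[int(match.group(1))] = text[match.start():end].strip()
    return laws


# Hypothetical sample text, not taken from the real documents.
sample = "*101 Lei sobre impostos.\nArt. 1 ...\n*102 Lei sobre taxas.\nVer (*101).\n"
laws = split_laws(sample)
```

Each entry of `laws` would then become one record in the Joomla content table, using the law number as the article ID.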
The objective of this project is to move thousands of Brazilian laws in PDF format to a Joomla Website.
- Each law must be in its own article.
- Each law may contain references to other laws, written as (*9999), that is, an opening parenthesis, an asterisk, the law number, and a closing parenthesis. Each reference must be converted to a hyperlink, so that a user browsing the website can click the law number to jump to that law.
- The Joomla article ID must be the law number, so it is easy to create a hyperlink from one law to another.
- Every law number starts with the character ‘*’.
- The text formatting should be kept (tables, bold, italics, etc.).
- The laws contain vertical and horizontal tables. These can be converted to images.
- The laws contain forms and math formulas. These can be converted to images.
- Laws can have notes, which should also be inserted in the law's article, with a hyperlink to any article referenced in the note.
- One law can take several PDF pages, and one PDF page can hold several small laws.
- Even and odd PDF pages have different layouts, with the law numbers in different positions.
- The Joomla sections and categories will be determined by the laws' titles.
- The PDF files are also available in PageMaker 6.5 format.
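As an illustration of the hyperlink requirement above, the sketch below turns every (*9999) reference into a link to the article whose ID is the law number. It is a minimal example under assumptions: the URL pattern in `ARTICLE_URL` is a placeholder that has to be adapted to the actual Joomla version and site configuration, and the function name `link_references` is made up for this example.

```python
import html
import re

# Matches a reference such as (*9999) and captures the law number.
REFERENCE = re.compile(r"\(\*(\d+)\)")

# Assumption: placeholder URL pattern; adjust it to the real Joomla site.
ARTICLE_URL = "index.php?option=com_content&view=article&id={id}"


def link_references(law_html: str) -> str:
    """Replace each (*N) reference with a hyperlink to article N."""

    def to_link(match: re.Match) -> str:
        number = match.group(1)
        # Escape '&' so the URL is valid inside an HTML attribute.
        url = html.escape(ARTICLE_URL.format(id=number), quote=True)
        return f'(<a href="{url}">*{number}</a>)'

    return REFERENCE.sub(to_link, law_html)


linked = link_references("Ver (*101).")
```

Because the article ID equals the law number, the link can be built from the reference alone, without looking anything up in the database.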
I have attached a sample PDF file and some images with comments.
- Check pages 25, 26, 27 of the PDF for horizontal table examples.
- Check pages 74 and 110 of the PDF for table and math formula examples.
I can also upload a PageMaker file (.p65) if needed.
We’ll be happy to answer any questions from the coders.