Pattern Matching project for Perlomonster


This project requires a Perl script. The situation is as follows:

A PDF file containing several documents will be available. The PDF has been OCRed, and the OCR result is available in a separate text file. Each PDF contains either multiple order documents or multiple business agreement documents. All documents are in Dutch.

Some data needs to be extracted from the documents using the OCR text file and saved in a CSV file. When the PDF file consists of order documents, the data to be extracted is:

- Name, address, zipcode, city

- Order number

- Order data

- Specific code

When the PDF file consists of business agreement documents, the data to be extracted is:

- Name, address, zipcode, city

- Specific code

- Date of the document

- Document characteristic code

- Subject of the document

- A location that is given in the first paragraph of the document text
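
To illustrate the kind of extraction meant here, a minimal Perl sketch follows. It is not a specification. The sample text, the label "Ordernummer", and the helper names (extract_zip_city, extract_order_number, csv_field, csv_line) are all assumptions made for illustration; the real layout only becomes clear from the sample files. The one fixed fact it relies on is that Dutch postcodes consist of four digits followed by two letters.

```perl
use strict;
use warnings;

# Hypothetical OCR text for one order document; the real layout is unknown.
my $ocr_text = <<'END';
Jan Jansen
Voorbeeldstraat 12
3011 AB Rotterdam
Ordernummer: 12345
END

# Find a line that starts with a Dutch postcode (four digits, two letters)
# and treat the rest of that line as the city.
sub extract_zip_city {
    my ($text) = @_;
    if ($text =~ /^[ \t]*(\d{4})[ \t]?([A-Za-z]{2})[ \t]+(\S.*?)[ \t]*$/m) {
        return (uc("$1 $2"), $3);
    }
    return;
}

# The label "Ordernummer" is an assumption about how the number is printed.
sub extract_order_number {
    my ($text) = @_;
    return $text =~ /Ordernummer\s*:?\s*(\w+)/i ? $1 : undef;
}

# Quote one CSV field: wrap it in double quotes and double embedded quotes.
sub csv_field {
    my ($value) = @_;
    $value = '' unless defined $value;
    $value =~ s/"/""/g;
    return qq{"$value"};
}

# Join the quoted fields into one CSV line (without a trailing newline).
sub csv_line { return join(',', map { csv_field($_) } @_) }

my ($zip, $city) = extract_zip_city($ocr_text);
my $order_no     = extract_order_number($ocr_text);
print csv_line($zip, $city, $order_no), "\n";
```

With the sample text above this prints "3011 AB","Rotterdam","12345". Real OCR output will contain recognition errors, so the patterns would probably need to be more tolerant than shown.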

The OCR text file shows on which page of the combined PDF file each new document starts. The Perl script also needs to split the PDF into a separate PDF file per document.
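
The page grouping behind that split could look roughly like the sketch below. It rests on two assumptions that the sample files may contradict: that pages in the OCR text are separated by a form feed character, and that the first page of each document carries a recognisable marker (the "[NEW DOCUMENT]" line here is purely hypothetical). The function name document_page_ranges is also made up for this example. Writing the actual per-document PDF files would need a PDF library and is not shown.

```perl
use strict;
use warnings;

# Returns a list of [first_page, last_page] pairs, one per document.
# Assumes pages are separated by form feeds and that $marker_re matches
# somewhere on the first page of every document.
sub document_page_ranges {
    my ($ocr_text, $marker_re) = @_;
    my @pages = split /\f/, $ocr_text;
    my @ranges;
    for my $i (0 .. $#pages) {
        my $page_no = $i + 1;
        if (!@ranges || $pages[$i] =~ $marker_re) {
            # A marker (or the very first page) opens a new document.
            push @ranges, [$page_no, $page_no];
        } else {
            # Otherwise the page extends the current document.
            $ranges[-1][1] = $page_no;
        }
    }
    return @ranges;
}

# Hypothetical three-page OCR text holding two documents.
my $ocr = join "\f",
    "[NEW DOCUMENT]\nOrder page 1",
    "Order page 2",
    "[NEW DOCUMENT]\nSecond order, page 1";

for my $r (document_page_ranges($ocr, qr/^\[NEW DOCUMENT\]/m)) {
    print "pages $r->[0]-$r->[1]\n";
}
```

For the three-page sample this reports pages 1-2 and pages 3-3. The same page ranges can then drive both the CSV rows and the PDF split, so the two outputs stay in step.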

I can provide samples of the documents and OCR files once you have signed and returned our Confidentiality Agreement.

Skills: Perl

About the employer:
(1 review) Rotterdam, Netherlands

Project ID: #4094970

Awarded to:


Hired by the Employer

$250 USD in 7 days
(27 reviews)