In progress

Extract PDF glyphs (characters), including position, into HTML

I need a web or Windows application that extracts all characters in a PDF into an HTML file, with a position defined for each character.

The application must take the PDF as input and write the extracted glyphs / characters to an HTML file.

The HTML file will include one image per PDF page (the image does not need to be created by the application).

The extracted text must be placed at roughly the same position as the corresponding text in the PDF.
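As a rough illustration of the positioning requirement, here is a minimal sketch of how one extracted glyph could be turned into an absolutely positioned HTML element. The `Glyph` record, its field names, and `glyph_to_span` are hypothetical; it assumes the extractor reports the left edge and baseline of each glyph in PDF points, with the origin at the bottom-left of the page. The top edge is approximated as baseline plus font size, which is only roughly right.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Glyph:
    """Hypothetical record for one extracted character."""
    char: str
    x: float      # left edge, in PDF points from the left of the page
    y: float      # baseline, in PDF points from the BOTTOM of the page
    size: float   # font size in points


def glyph_to_span(g: Glyph, page_height: float, scale: float = 1.0) -> str:
    """Return an absolutely positioned <span> for one glyph.

    PDF y grows upwards, CSS `top` grows downwards, so the y axis is flipped.
    `scale` converts PDF points to image pixels (assumed uniform).
    """
    left = g.x * scale
    top = (page_height - g.y - g.size) * scale  # approximate top of the glyph box
    size = g.size * scale
    return (
        f'<span style="position:absolute;left:{left:.2f}px;top:{top:.2f}px;'
        f'font-size:{size:.2f}px;line-height:{size:.2f}px">'
        f'{escape(g.char)}</span>'
    )


if __name__ == "__main__":
    # A 12 pt "A" one inch from the left and top of a US Letter page (792 pt high).
    print(glyph_to_span(Glyph("A", 72.0, 708.0, 12.0), page_height=792.0))
```

Emitting one element per character keeps the position data exact, at the cost of a large HTML file; grouping characters into words or lines would be a possible refinement.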

Because a PDF may use fonts that are not available as web fonts, it is important to extract the following information from the PDF:

- Line height per character

- Space between characters

- Space between lines

The output must take this information into account when creating the HTML file.

You can calculate the values for a typical web font such as Arial.
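One way to read this requirement is sketched below: compare how far the PDF advances from one character to the next with how far the fallback web font would advance, and use the difference as CSS `letter-spacing`. Line spacing is taken as the distance between consecutive baselines. The function names are hypothetical, and the `em_widths` table holds placeholder numbers, not real Arial metrics; actual advance widths would have to be read from the font itself.

```python
from typing import Dict, List, Sequence


def letter_spacings(
    xs: Sequence[float],
    chars: str,
    font_size: float,
    em_widths: Dict[str, float],
    default_em: float = 0.5,
) -> List[float]:
    """Extra spacing after each character (except the last) on one line.

    xs        -- left edge of each glyph as found in the PDF
    em_widths -- advance width of each character in the fallback font,
                 as a fraction of the font size (ASSUMED to be supplied)
    """
    spacings = []
    for i in range(len(chars) - 1):
        pdf_advance = xs[i + 1] - xs[i]
        fallback_advance = em_widths.get(chars[i], default_em) * font_size
        spacings.append(pdf_advance - fallback_advance)
    return spacings


def line_gaps(baselines: Sequence[float]) -> List[float]:
    """Distance between consecutive baselines, listed top to bottom."""
    return [abs(b - a) for a, b in zip(baselines, baselines[1:])]


if __name__ == "__main__":
    # Placeholder widths for illustration only, NOT real Arial metrics.
    widths = {"a": 0.5, "b": 0.25}
    print(letter_spacings([0.0, 7.0, 10.0], "abc", 10.0, widths))
    print(line_gaps([700.0, 686.0, 672.0]))
```

A positive value means the fallback font is narrower than the PDF font at that character, so the gap must be widened; a negative value tightens it.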

I will later use the output to allow text selection as an additional layer on top of an image.
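For context, the intended use could look roughly like the following sketch: the page image sits underneath and the positioned text is stacked on top with a transparent colour, so it stays selectable without being visible. The function name, class names, and parameters are hypothetical, and the posting does not prescribe this layout.

```python
from html import escape
from typing import Iterable


def build_page_layer(image_src: str, width: int, height: int,
                     spans: Iterable[str]) -> str:
    """Wrap pre-built, absolutely positioned <span> strings in a page container.

    The container is relatively positioned so the spans' left/top values
    are measured from the top-left corner of the page image.
    """
    text = "".join(spans)
    return (
        f'<div class="page" style="position:relative;'
        f'width:{width}px;height:{height}px">'
        f'<img src="{escape(image_src, quote=True)}" '
        f'width="{width}" height="{height}" alt="">'
        # Transparent text remains selectable in browsers.
        f'<div class="text-layer" style="position:absolute;left:0;top:0;'
        f'width:100%;height:100%;color:transparent">{text}</div>'
        f'</div>'
    )


if __name__ == "__main__":
    demo = ['<span style="position:absolute;left:72px;top:72px">A</span>']
    print(build_page_layer("page-1.png", 612, 792, demo))
```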

Skills: HTML, PDF, PHP, XML

See more: xml pdf php, include font html, available position, characters, extract text pdf file, php extract pdf file, xml extract pdf, pdf extract information, php pdf text extract, arial fonts, pdf extract text, html file pdf, php font characters, pdf page php, pdf image xml php, extract xml pdf, image extract, html pdf line, pdf font, use pdf image, different characters, pdf image text, pdf xml extract, extract font pdf, 2012 fonts php

About the employer:
(9 reviews) Berlin, Germany

Project ID: #4035424