Python Script PDF OCR to .rtf

I need a Python 3 script that will extract specific parts of text from a PDF file and generate an .rtf file with those text elements isolated, and some extra text added.

The input file is a typical "Written Discovery" request used in litigation (when people are suing/being sued). The first few pages have the case information and preliminary definitions. Then there are a series of numbered requests that all have the same title, but with a different number. Following the title, there is the text of the discovery request.

In the example I am using (see file: [login to view URL]), each discovery request is titled "SAMPLE DISCOVERY REQUEST" but in "real world" documents they could be a number of different titles (eg SPECIAL INTERROGATORY, REQUEST FOR PRODUCTION, REQUEST FOR ADMISSION), although they will all be the same in a single document.

I would like a script that uses OCR to extract all of the discovery requests and put them into an .rtf file with limited formatting (bold and underlined). Each request should be followed by the following text: RESPONSE TO [TITLE OF DISCOVERY REQUEST] (see file: [login to view URL])

What makes this tricky is that these documents always appear on "Pleading Paper" where each line is numbered on the left hand side of the page. This causes the text in OCR output to be interrupted by numbers (see file: OCR [login to view URL]).

The script will need to determine when the requests begin, what they are called, list all of them (in the example, there are 9, but 30-50 is more typical) and add in the "RESPONSE TO" language.

This is the first step in what could be a much larger project for the right developer.

Please let me know if you can handle this project in a short timeframe, and if you have any questions.

Evner: Python

Se mere: python script installieren, python script output text file, python script read html page, read csv file using python script, python script scrapes results google search, python script text files, pdf ocr book, download money transfer script pdf, pdf ocr php, python script modify file content, python script city, php pdf ocr, python script crawl, python script harvast email, scan book pdf ocr, java script pdf jpeg conversion page page, python script csv tab, python script download website pdf, python convert pdf rtf, python script convert pdf

Om arbejdsgiveren:
( 0 bedømmelser ) Santa Monica, United States

Projekt ID: #17684931

6 freelancere byder i gennemsnit $214 på dette job


Hello Sir, I am the expert freelancer here. I am on the 6th position through out the world to deliver the quality job. I have deliver here more than 395 + projects with 100% client satisfaction. I can help you Flere

$250 USD in 4 dage
(17 bedømmelser)

Dear client, Thanks a lot for taking your precious time to read my message. After browsing your job description, I am very interested in your project and I believe I’m qualified for the task. Regarding OCR, I Flere

$333 USD in 3 dage
(12 bedømmelser)

Hi, there - My name is Phong. I have read your job description and I am very interested in this project because I have good experience with OCR and python programming. I would love to speak with you further about tak Flere

$250 USD in 5 dage
(11 bedømmelser)

Hello!\nI am a python developer.\nI looked at your project and it seems interesting.\nI have all necessary skills required for this project.\nPing me to discuss in detail.

$140 USD in 2 dage
(21 bedømmelser)

As an: 1- Bachelor of Applied Mathematics 2- Engineer in Statistics and Applied Economics 3- Microsoft Office Technician (Microsoft Graduate: MOS Diploma), I am able, available and ready to do this kind of work in Flere

$155 USD in 4 dage
(0 bedømmelser)

hello,how are you. i read your bid carefully. i am OCR expert and have full experience for 7 years. python, c++, tesseract is my top skill and i can complete your project by using that skills. I can provide most qua Flere

$155 USD in 3 dage
(0 bedømmelser)