Generate a synthetic dataset for Chinese and English (Python)


I am looking for a freelancer to create an OCR dataset for segmenting Chinese and English text from any image. The dataset should be generated via Python. A lot of the code already exists; see the "What already exists?" section. Lastly, the dataset should be compatible with PyTorch.

Tasks that need to be completed are:

1. Extract sentence fragments from a dialogue text file and a dictionary database.

2. Add each sentence to an image, with the entire sentence enclosed in a bounding box (placed randomly, without overlap).

3. Generate a fixed dataset inside the `dataset/` directory.

4. Create a PyTorch dataset with random Chinese and English sentences for semantic segmentation.
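
As a rough illustration of tasks 1 and 2, here is a minimal sketch using only the Python standard library. It is not the code from the existing repo. All function names (`extract_fragments`, `estimate_size`, `place_boxes`) are hypothetical, the punctuation set used for splitting is an assumption, and the text size is a crude estimate (full-width CJK characters, half-width everything else) that a real pipeline would replace with measurements from the font renderer.

```python
import random
import re
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels

# Assumption: fragments end at common English or Chinese punctuation.
_SPLIT = re.compile(r"[.!?;,。！？；，\n]+")


def extract_fragments(text: str, min_chars: int = 2) -> List[str]:
    """Split raw dialogue text into short sentence fragments."""
    parts = (p.strip() for p in _SPLIT.split(text))
    return [p for p in parts if len(p) >= min_chars]


def estimate_size(fragment: str, char_px: int = 16) -> Tuple[int, int]:
    """Crude (width, height) estimate: CJK is full width, the rest half width."""
    width = sum(char_px if ord(c) >= 0x2E80 else char_px // 2 for c in fragment)
    return width, char_px


def overlaps(a: Box, b: Box) -> bool:
    """True if the two boxes share any interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def place_boxes(
    sizes: List[Tuple[int, int]],
    image_size: Tuple[int, int],
    rng: random.Random,
    max_tries: int = 100,
) -> List[Optional[Box]]:
    """Place each box at a random position by rejection sampling.

    Returns one entry per input size; None means the box did not fit
    or no free position was found within max_tries attempts.
    """
    img_w, img_h = image_size
    placed: List[Box] = []
    result: List[Optional[Box]] = []
    for w, h in sizes:
        box = None
        if w <= img_w and h <= img_h:
            for _ in range(max_tries):
                x = rng.randint(0, img_w - w)
                y = rng.randint(0, img_h - h)
                candidate = (x, y, x + w, y + h)
                if not any(overlaps(candidate, p) for p in placed):
                    box = candidate
                    placed.append(candidate)
                    break
        result.append(box)
    return result


fragments = extract_fragments("Hello there. 你好，世界！How are you?")
boxes = place_boxes(
    [estimate_size(f) for f in fragments], (256, 256), random.Random(0)
)
```

Passing a seeded `random.Random` keeps the layout reproducible, which suits the fixed dataset in task 3. The returned boxes could then be filled into a label mask for semantic segmentation and served through a `torch.utils.data.Dataset` subclass for task 4; that part is left out here.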

What already exists?

- [login to view URL] (synthetic single character dataset)

- [login to view URL] (Chinese dictionary)

- different dialogue txt files for English and Chinese

- [login to view URL]

I have created a private GitHub repo for this project. You can get access to it for further details before the project begins. If you are interested in this project please start your bid with "OCR PROJECT". (There are many bots bidding.)

Skills: Python, Data Processing, OCR, Data Science, Image Processing


About the employer:
(2 reviews) Lübbecke, Germany

Project ID: #29879822

4 freelancers are bidding an average of $259/hour for this job


***"OCR PROJECT"*** Greetings of the day. I appreciate you posting this kind of job. I understand your requirement and I want to help you out with smart solutions. I'm an expert Python developer with 5+ years of experi…

$140 USD in 15 days
(11 reviews)

Hi! I am an expert in Photoshop & Lightroom at [login to view URL] with 10+ years of experience in photo editing and retouching, and I hold the top rating (highest review score, 5/5). I saw the task Generate a synthetic dat…

$140 USD in 2 days
(1 review)

"OCR PROJECT" Hi, I am an OCR expert. I can use Python for OCR and computer vision. I can use OpenCV or a deep learning model, such as a CNN or RNN. The important thing is to prepare a good dataset. If we prepare datas…

$200 USD in 7 days
(1 review)

Hi, I'm a native Chinese speaker and machine learning engineer with 4 years of working experience. I've been involved in a Chinese OCR project with a leading tech company. For further discussion, please contact me via inmail.

$556 USD in 14 days
(0 reviews)