In progress

Matching Problem (II)

I posted this project once before but it was not successfully completed.

This project involves devising a method to match inconsistently coded data. We have a dataset of worksite inspections in which entries describing the same worksite are often recorded differently. For example, one entry might have its "address" cell recorded as 123 Elm Rd while another has 123 Elm Road. In other cases, the same company's "company name" cell is recorded differently; for example, Acme Inc. might be misspelled as Amce Inc. in one entry. We would like a program that matches these inconsistently coded entries. A match is successful when there is a high probability that the two entries are actually one and the same. The process must be automated because our dataset contains a few hundred thousand observations.
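To make the kind of matching described above concrete, here is a minimal Python sketch, not a finished solution. It normalizes each field (lowercasing, stripping punctuation, expanding a few abbreviations) and then scores similarity with the standard library's difflib.SequenceMatcher. The field names "address" and "company_name", the abbreviation table, and the 0.85 threshold are all illustrative assumptions rather than details taken from the attached sample.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a real one would be built from the data.
ABBREVIATIONS = {
    "rd": "road",
    "st": "street",
    "ave": "avenue",
    "inc": "incorporated",
    "co": "company",
}


def normalize(text):
    """Lowercase, drop punctuation, and expand known abbreviations."""
    tokens = re.sub(r"[^a-z0-9\s]", " ", text.lower()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def similarity(a, b):
    """Return a 0..1 similarity score between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_probable_match(rec_a, rec_b, threshold=0.85):
    """Treat two records as a match when the average of the address and
    company-name similarities reaches the (assumed) threshold."""
    addr_score = similarity(rec_a["address"], rec_b["address"])
    name_score = similarity(rec_a["company_name"], rec_b["company_name"])
    return (addr_score + name_score) / 2 >= threshold


# Records modeled on the examples in the posting, plus one unrelated entry.
first = {"address": "123 Elm Rd", "company_name": "Acme Inc."}
second = {"address": "123 Elm Road", "company_name": "Amce Inc."}
other = {"address": "456 Oak Ave", "company_name": "Globex Corp"}

print(is_probable_match(first, second))
print(is_probable_match(first, other))
```

With a few hundred thousand observations, comparing every pair of records would be too slow, so a practical version would probably first group records by a coarse key (for example, the street number or a postal code, if one is available) and only compare records within the same group.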

I have attached a sample of the data.

Skills: Data Processing


About the employer:
( 0 reviews ) Berkeley, United States

Project ID: #33442