Data de-duplication

Completed · Posted 6 years ago · Paid on delivery

We have a list in Excel of 5464 wines extracted from the menus of several restaurants. The list has the type of wine (red or white), the name of the wine, a brief description (not always available), and the vintage (a year, e.g. 2011, or "No Vintage" when not available).

We would like to identify the subset of wines that appear twice or more in the list. The list is already sorted by wine type (reds first), then alphabetically by wine name, and finally by vintage. Two wines are deemed identical if they have the same type, name and vintage. We can use statistical software or Excel to identify and tag such wines. But often the same wine is listed a bit differently across different restaurant menus, and such wines can only be spotted by manual inspection.
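The exact-match rule (same type, name and vintage) can be sketched in a few lines of Python. This is only an illustration under assumptions: the field names `type`, `name` and `vintage` and the sample wines are invented, and the light normalisation (case and extra spaces) is a design choice, not something the brief specifies.

```python
from collections import defaultdict

def exact_duplicate_rows(rows):
    """Return sorted indices of rows whose (type, name, vintage) key occurs more than once."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        key = (
            row["type"].strip().lower(),
            " ".join(row["name"].lower().split()),  # collapse repeated spaces
            str(row["vintage"]).strip().lower(),
        )
        groups[key].append(i)
    return sorted(i for members in groups.values() if len(members) > 1 for i in members)

# Invented sample rows; the column names are assumptions, not the real sheet's headers.
rows = [
    {"type": "Red", "name": "Example Rosso", "vintage": "2011"},
    {"type": "red", "name": "example  rosso ", "vintage": "2011"},
    {"type": "Red", "name": "Example Rosso", "vintage": "No Vintage"},
]
print(exact_duplicate_rows(rows))  # [0, 1]
```

Wines that are spelled differently across menus will not be caught by this kind of key comparison, which is why manual inspection is still needed.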

The attached extract from the list illustrates this. Most records, as the first several rows show, clearly each reference a different wine. But the extract includes an example of two records that obviously list the same wine. There are also records that likely refer to the same wine, although one cannot be certain. See for example the three wines with wine_id 129 to 131. They share very similar names, identical or overlapping vintage, and the same broad region of origin, DOC Veneto.

The tasks are to:

1. Identify and tag (by entering 1 in a column entitled "Sample") wines that clearly appear twice or more in the list, as in the first example above.

2. Identify and tag (by entering 2 in the "Sample" column) wines that are likely to be identical but where one cannot be certain. This will be a subjective process, but be conservative.

3. For each wine identified in the steps above, assign a new running id (same wine id) in another column, starting with 1 for the first such instance, 2 for the next, etc. Note that all wines in a group identified as identical or likely identical will receive the same id.
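As a rough sketch of how the two new columns relate, the snippet below fills "Sample" and "same wine id" from a list of groups that have already been identified. The function name, the group format (row indices plus a certainty of 1 or 2) and the example rows are all hypothetical.

```python
def assign_ids(num_rows, groups):
    """Fill the 'Sample' and 'same wine id' columns.

    groups: list of (row_indices, certainty), certainty 1 = sure, 2 = likely.
    Rows that belong to no group keep an empty cell.
    """
    sample = [""] * num_rows
    same_wine_id = [""] * num_rows
    # Number the groups in the order they appear when scrolling down the list.
    ordered = sorted(groups, key=lambda g: min(g[0]))
    for running_id, (indices, certainty) in enumerate(ordered, start=1):
        for i in indices:
            sample[i] = certainty
            same_wine_id[i] = running_id
    return sample, same_wine_id

# Hypothetical groups in an 8-row sheet (0-based row indices).
sample, same_wine_id = assign_ids(8, [([5, 6, 7], 2), ([0, 1], 1), ([3, 4], 1)])
print(sample)        # [1, 1, '', 1, 1, 2, 2, 2]
print(same_wine_id)  # [1, 1, '', 2, 2, 3, 3, 3]
```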

In general, we believe that identical wines will be in adjacent rows, and the best strategy is to scroll down the list, browsing the names and descriptions. It is conceivable that in some cases identical wines are a few or even several rows apart. If you spot such cases, that is fine, but you are not expected to identify them.
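Because likely duplicates are expected in adjacent rows, a script could pre-screen neighbouring rows and leave the final call to a human reviewer. The sketch below uses Python's standard `difflib.SequenceMatcher`; the 0.9 threshold, the field names and the wine names are assumptions made up for illustration, and a high score is only a hint, not a decision.

```python
from difflib import SequenceMatcher

def flag_adjacent_candidates(rows, threshold=0.9):
    """Return (i, i + 1, ratio) for adjacent same-type rows with similar names."""
    candidates = []
    for i in range(len(rows) - 1):
        a, b = rows[i], rows[i + 1]
        if a["type"].lower() != b["type"].lower():
            continue  # a red and a white are never the same wine
        ratio = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
        if ratio >= threshold:
            candidates.append((i, i + 1, round(ratio, 2)))
    return candidates

# Invented names; the second row is a misspelling of the first.
rows = [
    {"type": "Red", "name": "Casa Esempio Rosso Classico"},
    {"type": "Red", "name": "Casa Esempio Rosso Clasico"},
    {"type": "Red", "name": "Monte Finto Riserva"},
]
candidates = flag_adjacent_candidates(rows)
print([(i, j) for i, j, _ in candidates])  # [(0, 1)]
```

Pairs flagged this way would still need a look at description and vintage before being tagged 1 or 2.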

The final output will be the Excel sheet that you received, with two additional columns. The first (Sample) will flag identical wines with 1 for certain matches and 2 for likely matches. The second (same wine id) will be a running counter of such wines, with all rows containing the same wine being assigned the same id.

We plan to hire at least two independent freelancers for this task. We will compare output across the hired freelancers to evaluate the quality of the output.
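The brief does not say how the outputs will be compared. One simple possibility, shown here purely as an assumption, is to compare the "Sample" column row by row and list the rows where two freelancers disagree. Untagged rows are represented as 0 in this invented example.

```python
def flag_agreement(sample_a, sample_b):
    """Return (share of rows with the same flag, indices of rows that differ)."""
    if len(sample_a) != len(sample_b) or not sample_a:
        raise ValueError("both columns must be non-empty and of equal length")
    disagreements = [i for i, (a, b) in enumerate(zip(sample_a, sample_b)) if a != b]
    rate = 1 - len(disagreements) / len(sample_a)
    return rate, disagreements

# Hypothetical 'Sample' columns from two freelancers (0 = not tagged).
freelancer_a = [1, 1, 0, 2, 2, 0]
freelancer_b = [1, 1, 0, 0, 0, 0]
rate, disagreements = flag_agreement(freelancer_a, freelancer_b)
print(disagreements)  # [3, 4]
```

The rows listed as disagreements are the ones worth a second manual look.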

Data Cleansing, Data Processing, Excel

Project ID: #16492068

About the project

35 bids · Remote project · Active 6 years ago

Awarded to:

Shah101

Excel of 5464 wines extracted from the menus of several restaurants. The list has the type of wine (red or white), the name of the wine, a brief description (not always available), and the vintage (year e.g. 2011, and…

$5 USD / hour
(259 reviews)
6.7
moniapostolov

Hello. I have good working experience in data entry jobs (Excel, Word, websites, etc.), so I hope I can help you. Feel free to contact me so you can tell me exactly what you need, and hopefully we can start workin…

$4 USD / hour
(19 reviews)
3.7

35 freelancers are bidding an average of $8/hour for this job

schoudhary1553

Hi there. Warm greetings! We came across your request for data de-duplication and reviewed your project description. We'd like to help you with confidence and satisfying results. We have professio…

$8 USD / hour
(359 reviews)
8.0
bouslimi1979

Hi, this can be coded as a "smart" macro that identifies the similarity between wine names. My price is $100 for the entire code, which I'll provide to you. Thanks.

$100 USD / hour
(297 reviews)
7.7
KulfiSoftwares

I checked the attachment. I am an Excel guru and an expert in data cleansing, organizing, processing, etc. I understand the project, as the instructions are clear. To be sure, I can do a sample for you before the award. I can prom…

$7 USD / hour
(306 reviews)
7.6
Dhruvika111

Greetings of the day! Thank you so much for the description. I understand your requirement: it requires attention to detail and a keen eye to check the details and find the identical ones, as it is a long list of…

$4 USD / hour
(337 reviews)
7.7
sisicirnes

Hello! My name is Irnes and I'm a data entry/web search specialist from Bosnia and Herzegovina. I'm here to prove my skills and the quality of my work. The Freelancer website gave me preferred status. More than 95%…

$8 USD / hour
(70 reviews)
6.6
hdlong66

Hello, You need to identify and tag 5464 wines. I will do it with accuracy. I'm ready to start. Best regards, Hoang

$10 USD / hour
(142 reviews)
6.8
best4best

Hi, I can manually go through the list of wines and identify, with great care, the subsets you described. Regards

$8 USD / hour
(162 reviews)
6.4
supper5guy

Hello there. After reading your project description, and with a drive for excellence, keen attention to detail, and more than 10 years' experience working with Microsoft Office, especially Excel VBA, I can do…

$2 USD / hour
(82 reviews)
5.8
rumapaul

A proposal has not yet been provided

$2 USD / hour
(52 reviews)
5.5
jonna88

Hello there, I hope you are doing fine. I am very interested in this project and can start working on it once we have discussed it and it has been awarded. Hoping to hear from you, and may you have a great day ahead. Thank you.

$5 USD / hour
(14 reviews)
5.1
arman0464

Dear Clients, I have 4+ years of experience in this field, with a 100% success rate and 300 completed projects. Delivering top-level services is my speciality. I believe that my expertise would be a good match for…

$5 USD / hour
(82 reviews)
5.3
nbprince

Hi friend, I will do this task to identify the subset of wines that appear twice or more in the list. I have already done many such scripts and tasks and will do it for you as well.

$3 USD / hour
(20 reviews)
4.0
rasherzmr

I am an experienced typist and full-time Data Entry Customer Support Agent. I learned about the job post on Data Entry and Customer Support Agent; my talents and skills fit the position well. As requested, I have good…

$5 USD / hour
(0 reviews)
0.0
point2solutions

Dear client, as a highly skilled data entry professional, I read your posting for a data entry task. I would like to express my deep interest in your project. With more than 8 years' experience as a proficient data entry professional, I pl…

$8 USD / hour
(0 reviews)
0.0
SSDTrading

With an eye for detail and an analytical, methodical way of working, I can work quickly and efficiently through data, analyzing, correcting and noting duplicates. Having worked in various fields for many years, databa…

$16 USD / hour
(0 reviews)
0.0
ruhulamin2121

Hello there! I am an expert in MS Office, especially Excel. I have read your job post and understood it well. I could deliver this job in a few hours, also at a fixed rate of $30 (negotiable). I assure the quality of my work and acc…

$2 USD / hour
(0 reviews)
0.0
mehulsachdeva

I am a data analyst with experience in tools such as Excel, SQL and R, and in tasks like data analysis, de-duplication and visualization.

$6 USD / hour
(0 reviews)
0.0
Njengaw

Hi there, my name is Wachinga, and I will work on the data de-duplication task that you have described here. I am experienced, with working knowledge of software such as Microsoft Office, which is a i…

$5 USD / hour
(0 reviews)
0.0
jeckoi00

I used to sort files in Excel in my previous job. I just need your go signal to start.

$3 USD / hour
(0 reviews)
0.0
johnb1275

Creativity. Marketing knowledge. Trustworthiness. When you're looking for a writer, editor, or proof-reader, your company - and your readers - shouldn't settle for a second-rate freelancer. A good writer can get by on…

$5 USD / hour
(0 reviews)
0.0