
Closed
Paid on delivery
Task: Data Collection
Requirement: Existing datasets or functional code are acceptable.

Specifications

A. Code Requirements (if submitting code):
- High concurrency support
- At least 20M records/day
- Minimum 20M records/day processing capacity
- Full data persistence: "Raw data"

B. Dataset Rules (if submitting datasets):
- New data must be generated for 2025
- Must avoid data overlapping with open-source datasets

Process & Payment:
- Strict acceptance testing protocol will be enforced
- Payment method: Segmented payment
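To put the 20M records/day figure in perspective, the following is a rough sizing sketch, not part of the original brief. It converts the daily target into an average per-second rate and estimates how many concurrent workers that implies. The function names, the per-worker rate of 5 records/s, and the 1.5x headroom factor are illustrative assumptions only.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def required_rate(records_per_day: int) -> float:
    """Average records per second needed to reach a daily target."""
    return records_per_day / SECONDS_PER_DAY


def workers_needed(records_per_day: int, per_worker_rate: float,
                   headroom: float = 1.5) -> int:
    """Estimate the concurrent worker count for a daily target.

    per_worker_rate is the assumed sustained records/s of one worker;
    headroom is an assumed safety margin for retries and slow periods.
    """
    if per_worker_rate <= 0:
        raise ValueError("per_worker_rate must be positive")
    return math.ceil(required_rate(records_per_day) * headroom / per_worker_rate)


if __name__ == "__main__":
    target = 20_000_000  # daily target stated in the brief
    print(f"{required_rate(target):.1f} records/s")    # prints "231.5 records/s"
    print(workers_needed(target, per_worker_rate=5.0))  # prints 70
```

The average rate hides bursts and downtime, so a real system would likely need to sustain noticeably more than 231.5 records/s at peak.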
Project ID: 39715625
69 proposals
Remote project
Active 8 mos ago
69 freelancers are bidding on average $3,566 HKD for this job

⭐⭐⭐⭐⭐ Efficient Data Collection for 2025 with High Processing Capacity ❇️ Hi My Friend, I hope you're doing well. I've reviewed your project requirements and see you are looking for data collection solutions. You don't need to look any further; Zohaib is here to help you! My team has successfully completed 50+ similar projects for data collection. I will ensure high concurrency support and meet the requirement of processing at least 20 million records a day, while also generating new datasets for 2025. ➡️ Why Me? I can easily handle your data collection project as I have 5 years of experience in data processing, database management, and data analysis. My expertise includes creating efficient code, ensuring data persistence, and avoiding overlaps with existing datasets. I also have a strong grip on data validation and testing protocols to meet your strict acceptance criteria. ➡️ Let's have a quick chat to discuss your project in detail and let me show you samples of my previous work. Looking forward to discussing this with you in chat. ➡️ Skills & Experience: ✅ Data Processing ✅ Database Management ✅ Data Collection ✅ Data Analysis ✅ High Concurrency Support ✅ Data Validation ✅ Code Optimization ✅ API Integration ✅ Data Persistence ✅ Scripting ✅ Error Handling ✅ Quality Assurance Waiting for your response! Best Regards, Zohaib
$2,800 HKD in 2 days
7.3

We at STR Softwares LLP fully understand the importance of accurate and well-curated datasets for any data-driven task. With our decade-long experience in Python development, including expertise in Data Engineering and Web Scraping, we are more than equipped to undertake your Text Dataset Curation project. Our proficiency in Django, Flask and FastAPI has enabled us to successfully handle numerous data-centric projects. For your specific needs, we can leverage our web scraping skills to scour the internet for open source text datasets that meet your stringent criteria. Drawing on our strong organizational abilities, we will ensure each dataset is carefully vetted, properly stored, and formatted in an intuitive folder hierarchy mirroring the comprehensive spreadsheet you require. Moreover, as a team of senior developers with personalized technical leads for each project, we guarantee meticulous planning and delivery within your required timeline. You can count on our cutting-edge Python solutions not just to locate 50-100 unique text datasets spanning varied categories but also to deliver them without any duplicates, each meaningfully distinct, with all crucial metadata interpreted accurately as your project requires. Let us handle your Text Dataset Curation need and make it an exemplar of efficient data engineering and sophisticated Python coding!
$4,000 HKD in 7 days
7.0

As a skilled web developer with a strong background in Web Data Scraping and Web Searching, I believe my expertise would be invaluable in your project. With my dedicated team at BN-Droids Digital Services - stacked with professionals who understand the value of structured and organized data - our talents align perfectly with your task of high-speed dataset creation whilst avoiding overlap with existing open-source datasets.
$10,000 HKD in 7 days
6.9

Hi, I am interested in working on this scraping project. I assure you that I can do this job perfectly within the required time and a reasonable budget. Message me here and LET'S GET STARTED. Looking forward to an early and positive response. Regards, Shalu
$3,000 HKD in 6 days
6.5

Hello kotsukimo, I have extensive experience in building web scrapers, crawlers, and webbots using PHP, with support for concurrency. I have created several scrapers for various clients to gather data, as well as webbots to automate repetitive tasks, which have saved them time and effort. I can deliver the data in several formats, including CSV, TSV, Microsoft Excel, JSON, XML, SQLite, and MySQL formats, depending on your preference. To get started, I need the name of the website from which you want to extract data, the specific data points you wish to collect, and any post-processing instructions you may have. I would be more than happy to assist you with your project needs. Let’s discuss your project further!
$2,000 HKD in 3 days
6.3

I'm confident that I will be a perfect match for your project. I understand the importance of generating fresh data, which is precisely why my web skills will come in handy for avoiding any overlap with open-source datasets. Moreover, I can handle concurrent processes efficiently, making sure the minimum of 20 million records/day is met without compromising quality. My 3 years of experience, coupled with excellent academic skills, help me execute tasks effectively within defined timelines. With an eye for detail, my data conversion ability (PDF, Excel, Word) will ensure that all records are maintained in their raw data form as required. Additionally, I am more than willing to provide you with a sample and walk you through similar projects I have successfully completed. This, in turn, will help establish trust based on what I can deliver, and I gladly accept your strict testing protocol and segmented payment method. That said, choosing me means getting a skilled professional who is committed to 100% quality work within the set timeline!
$2,000 HKD in 3 days
5.7

Dear Project Coordinator, Are you looking to efficiently gather unique datasets for 2025? I would be happy to offer a free demo of my robust web scraping solution before finalizing the project. I intend to deliver a high-performance data collection system that meets your specifications. Let's discuss how we can tailor a detailed plan and schedule a demo to showcase the capabilities of the solution. Regards, Smith
$4,000 HKD in 7 days
5.4

⭐⭐⭐⭐⭐ Dear Valuable Client, CnELIndia, led by Raman Ladhani, can efficiently support your Open-Source Text Dataset Curation project by leveraging our expertise in data collection, processing, and management. We will start by systematically identifying 50–100 unique, openly licensed text datasets across diverse domains. Each dataset will be downloaded or mirrored where legally permitted, organized into a clear folder hierarchy, and verified to avoid duplicates. We will create a comprehensive CSV detailing dataset name, source URL, license type, domain, size, and a concise description, accompanied by an at-a-glance licensing summary for commercial-use filtering. Our team has prior experience curating text corpora and maintaining personal mirrors, ensuring accuracy, completeness, and navigable structure. We can deliver the final compressed archive with spreadsheet metadata within a realistic timeline after project kickoff.
$4,000 HKD in 7 days
4.9

I specialize in curating high-quality text datasets, leveraging advanced NLP techniques for efficient filtering and cleaning. My recent project for [Client Name] involved building a similar library, utilizing Python with libraries like NLTK and spaCy for data pre-processing and quality control, resulting in a 98% accuracy rate. This experience ensures I can deliver a meticulously curated collection for your needs. My approach involves a multi-stage process. First, I'll identify and acquire relevant freely-licensed datasets using web scraping and APIs. Subsequently, I'll employ NLP techniques, including stemming, lemmatization, and stop word removal, for data cleaning. Finally, rigorous quality checks, using custom scripts for consistency and duplication detection, will guarantee the accuracy and reliability of your library. This structured process ensures minimal errors and optimal data quality. I'm confident I can create a robust and comprehensive text dataset library for you. Could you share specifics on the desired dataset types and licensing requirements so I can tailor the approach to perfectly match your vision?
$4,635.87 HKD in 21 days
4.7

Dear Hiring Manager, Hello. I have done similar work for many of my clients with 100% satisfaction. INBOX ME FOR DETAILS. I can start your project right now, with 100% accuracy and within your deadlines. I am happy to provide you with a sample of my work. Please let me know. Thank you.
$2,000 HKD in 2 days
4.8

Hello, I hope you are doing great. As a highly experienced and dedicated Web Scraping professional, I believe I'm the ideal candidate to undertake your text dataset scraping project. My skills in managing high concurrency and delivering a processing capacity of at least 20M records/day make me an excellent fit for your requirements. With my knowledge of data persistence, you can be assured that all your "Raw data" will be secure and available whenever you need it. Moreover, my strong command of popular languages and frameworks such as HTML, CSS, JavaScript, Angular.Js, and React.Js enables me not just to contribute to successful data collection but also to provide bespoke IT services that specifically meet your needs. Let's get started and build a long-lasting professional relationship! Thank you, Gaurav D.
$4,000 HKD in 7 days
4.8

Greetings! This is exactly the kind of work I love doing. I specialize in data collection and management with 9+ years of experience, curating well-organized, high-quality datasets ready for research or analysis.

Here’s how I can help:
* Identify 50–100 unique, freely-licensed text datasets across various domains and topics
* Download, mirror, and organize each dataset into a clear, intuitive folder hierarchy
* Create a comprehensive spreadsheet detailing dataset name, source URL, licence, domain, size, and a brief description
* Provide an at-a-glance licensing summary for easy filtering and ensure no duplicates

Do you want me to prioritize certain types of text data, like code snippets, sentiment analysis, or general corpora, or cover all categories equally?
$3,500 HKD in 4 days
4.5

I have 8 years of experience in the data collection field, making me the best fit to complete this project. I possess the relevant skills to meet the requirements efficiently.

How I will be completing this project:
- Develop code with high concurrency support
- Ensure processing capacity of at least 20M records/day
- Implement full data persistence for "Raw data"
- Generate new data for 2025 to comply with dataset rules
- Avoid data overlapping with open-source datasets

What tech stack I will be following:
- Utilize advanced technologies for high concurrency support
- Implement efficient data processing algorithms
- Use secure data storage solutions for full data persistence

I have worked on similar solutions in the past, providing me with the necessary expertise to tackle this project successfully.

Roadmap to complete the project:
1. Analyze requirements and create a detailed project plan
2. Develop code with high concurrency support and data persistence
3. Generate new data for 2025 and ensure payment method compliance
4. Conduct acceptance testing to guarantee quality
5. Implement segmented payment method for project completion

I am confident in my ability to deliver a high-quality solution for your data collection needs. Thank you for considering my proposal.
$2,000 HKD in 7 days
4.6

✅Full Experience in Web Scraping and Data Extraction with Python/Selenium/Scrapy/BeautifulSoup✅. ✳️I am very confident that I can complete your project perfectly. I don't have many reviews yet, but you don’t need to worry! ✳️I can guarantee the quality of the job and deliver the result on time. I hope we can discuss it in more detail via chat. Best regards!
$3,000 HKD in 7 days
4.3

Hi kotsukimo, Thank you for considering my proposal. With over 8 years of experience in Excel, I am well-equipped to assist you with your project of curating open-source text datasets. I have carefully reviewed your requirements and am eager to work on this project with you. I have a strong background in data curation and organization, and I believe I can efficiently identify, vet, and organize the text datasets as per your specifications. I would like to discuss your project further in chat to understand your specific needs and outline a detailed plan to deliver the desired results. Looking forward to connecting with you to discuss this project in more detail. Regards.
$2,000 HKD in 1 day
3.8

I have reviewed the complete description of your job post and I'm confident I will meet your expectations and give you 100% satisfaction and quality work. I'm all set and available to start right away, and I aim to have it completed within one day. I am sure that I can provide you with high-quality work in a short amount of time. Thank you, and I look forward to the opportunity. Best regards, Jaweria
$2,000 HKD in 7 days
4.0

With 7 years of experience in data collection, I am confident that I am the best fit to complete this project. I have the relevant skills to meet all the requirements mentioned by Namita.

How I will be completing this project:
- I will ensure high concurrency support for efficient data collection.
- Processing a minimum of 20M records per day will be guaranteed.
- Full data persistence will be maintained for "Raw data" for future reference.

Tech stack I will be following:
- Utilizing advanced tools for seamless data collection.
- Implementing robust algorithms to handle large datasets effectively.
- Ensuring data security and integrity throughout the process.

I have worked on similar solutions in the past, providing me with the expertise to deliver top-notch results for Namita's project. My approach will be thorough and meticulous to meet all specifications outlined.

In terms of the roadmap, I will start by analyzing the existing datasets or functional code provided. I will then implement the necessary modifications to meet the requirements set forth. Testing will be conducted rigorously to ensure compliance with the specifications. The payment will be divided into segments based on the milestones achieved. This will align with the strict acceptance testing protocol to guarantee quality deliverables.

Overall, my experience, skills, and dedication make me the ideal candidate to successfully complete this data collection
$2,000 HKD in 7 days
3.9

Boasting a vast range of experience, I'm confident I can tackle your Text Dataset Scraping project impeccably. Specifically, my expertise in Scrapy and Web Scraping neatly aligns with your requirements for data collection. With over 10 years in Software Development, I've sharpened my skills in delivering high concurrency tasks while maintaining an impressive processing capacity - exactly what you need for collecting at least 20M records/day. Beyond just the numbers, I'm also proficient in managing large datasets, ensuring full data persistence and compliance with unique dataset rules like generating data exclusively for 2025 and avoiding overlap with open-source datasets. Quality assurance is ingrained in my work ethic; thus, I guarantee the delivery of top-notch, clean datasets that meet or even exceed your strict acceptance test protocol. Finally, being client-focused, my paramount concern is collaborating closely with you to achieve your goals expeditiously. We'll have clear communication lines on every project detail, as I value timely delivery and exceeding clients' expectations. So why delay? Let's commence this fruitful partnership and bring your vision to life!
$5,000 HKD in 7 days
3.8

Hi, I can help you collect valuable datasets that are publicly available. I'm a machine learning developer, so I already work with various types of datasets. Please message me so we can discuss; I have some questions. Thanks
$2,000 HKD in 1 day
2.8

Drawing on my extensive experience as a Python developer and automation specialist, I am well-suited to tackle the complex task of curating and organizing your text dataset library. As an experienced data scraper, I can effectively identify and source the 50-100 distinct open-licensed text datasets you need, while ensuring compliance with all legal redistribution requirements. My expertise in creating intuitive folder hierarchies for easy navigation will streamline your access to each dataset. Additionally, my strong background in AI and Machine Learning will enable me to provide you with an at-a-glance licensing summary that allows for easy filtering for commercial-friendly use when required. I am adept at handling large volumes of data and translating complex technical processes into simple and efficient solutions. With me on board, you can expect complete and accurate metadata in your CSV spreadsheet, ensuring that every dataset is adequately described, mirroring the folder names and file structures for effortless organization. I am committed to delivering quality results on schedule, something I've consistently demonstrated in my previous projects. By choosing me, you'll get a highly-skilled professional who can not only complete your project effectively but also add significant value to it through my problem-solving mindset—and make sure we remain well-aligned through clear communication at every stage.
$4,000 HKD in 7 days
2.8

Hong Kong, China
Member since Aug 20, 2025