
Open
Ends in 5 days
Paid on delivery
I’m building a desktop application that lets me harvest information from two sources with the same, seamless workflow:

• Logged-in web pages already open in Google Chrome, Brave, or Mozilla Firefox
• Local or network PDFs that I choose from an “Open File” dialog or that are currently open on the desktop

For websites, I need the tool to capture text and table content and, in some instances, download the PDFs linked to each record. From the PDFs themselves I only care about the text layer. All captured data has to flow into a neatly structured Excel workbook (one row per record, tidy column mapping), while the corresponding PDF files are stored in a folder bearing the same name as the spreadsheet for easy reference.

Key points on functionality

– Runs natively on both Windows and Linux without asking the user to install extra dependencies beyond a standard runtime (for example, a Python one-file executable or an Electron package; open to your preference).
– Detects that I’m already authenticated in Chrome and simply works with the live session; no separate login logic necessary.
– Simple, intuitive GUI: select “Scrape Web” or “Scrape PDF”, choose an output directory, press Start, and see a progress bar and a final summary of how many rows and PDFs were saved.
– Robust parsing of website tables (including pagination) and text blocks; graceful handling of captchas or timeouts would be a plus.
– Excel output in .xlsx with auto-generated column headers and basic formatting.
– Source code provided so I can tweak selectors later, plus a brief read-me explaining how to add new sites or PDF patterns.

Acceptance criteria

1. I can run the packaged application on Windows 10/11 and Ubuntu 22.04, click through the GUI, and produce an .xlsx file plus matching PDFs.
2. Website extraction works on at least two sample sites we will define during testing, while logged in through Chrome.
3. PDF extraction correctly writes the text from a sample set (scanned or image-only PDFs can be skipped).
4. No data duplication and no crashes after a several-hour continuous scrape.

________________________________________

Program Specification: Automated Data Scraper with PDF Handling

An interactive data extraction tool with a GUI and automation workflow.

Develop a cross-platform program (Windows & Linux desktop compatible) that automates data collection from websites that are logged in and open in the browser, and that can alternatively scrape and collect from any selected or desktop-opened PDFs. It exports the results into structured Excel spreadsheets while also saving the associated PDFs for each data record.

Core Features

1. User-Controlled Start/Stop
- Program has a GUI with an ON/OFF “Start/Stop” button. There is also an “automatic mode” that runs for a specified period of time on a specific site or document: the timer can be set from 0.1 hours up to a maximum of 12.0 hours, and if the program runs out of data or time, it automatically saves and closes.
- When ON: User can direct the program to a specific website (log in, if needed, before pressing “Start”). Program begins crawling/parsing data sequentially. (It can automatically click links only to search for and collect data, not to execute programs.)
- When OFF: Current process stops. Current Excel file is saved and named automatically.

2. Data Collection Workflow
- Program reads through line items (e.g., rows, listings, or links) on the given web page.
- For each line item:
  1. Collect textual data from the web page (structured fields).
  2. If a link opens a PDF, automatically click/follow it.
  3. Download and save the PDF locally.
  4. Parse PDF contents (text, tables, or metadata).
  5. Insert all data into a structured Excel row.
  6. Associate the saved PDF with that Excel row (e.g., file path reference).

3. Sequential Processing
- After handling one line item and its PDF, move on to the next line item.
- Repeat until all items on the page are processed.

4. Excel Output
- Each run creates a new Excel file.
- File is automatically named (e.g., Dataset_<Date>_<Time>.xlsx).
- Columns should include:
  o Line Item ID / Name
  o Extracted fields from webpage
  o Extracted fields from PDF
  o Local PDF file path

5. PDF Handling
- Save a local copy of each PDF with a systematic filename (e.g., [login to view URL]).
- Store PDFs in a dedicated folder per run, linked to the Excel output.

6. Multiple Sessions
If the program is directed to a different website or a standalone PDF:
- It should start a new Excel file for that session.
- Store related PDFs in a new folder tied to that session.

7. Error Handling
- If a link is broken or a PDF fails to load: log the error in the Excel file under the line item and continue with the next item.
- If the program is manually stopped, partial data must still be saved.

________________________________________

Technical Notes for Developer

• Platform Compatibility
  o Must run on Windows desktop (primary).
  o Android compatibility is optional (possible via porting to Kotlin or running Python in Termux/Pydroid).

• Suggested Languages & Frameworks
  o Python (recommended for prototyping):
    Crawling → requests, BeautifulSoup, Scrapy, or Selenium (if dynamic JavaScript).
    PDF parsing → pdfplumber, PyPDF2.
    Excel export → pandas, openpyxl.
    GUI → Tkinter, PyQt, or Kivy (Kivy would also allow Android portability).
  o Java/Kotlin (for future Android-native app):
    Crawling → Jsoup.
    PDF parsing → Apache PDFBox.
    Excel → Apache POI.
    GUI → JavaFX (desktop), Android native UI (mobile).

• Architecture
  o Frontend: GUI with start/stop buttons, site/PDF input field.
  o Backend: Crawling/parsing engine that processes URLs, extracts data, and writes to Excel.
  o Storage: Excel files auto-saved per session. PDFs stored in session-specific folders.

________________________________________

Example Workflow

1. User opens program.
2. Enters target website URL (or single PDF opened on desktop).
3. Clicks Start.
4. Program begins:
   - Reads each line item on site.
   - Scrolls as needed.
   - Opens/downloads linked PDFs.
   - Extracts and parses text.
   - Saves each associated PDF in a folder linked to the corresponding Excel line items.
   - Records results in Excel.
5. User clicks Stop.
6. Program saves and names Excel file ([login to view URL]).
7. Session complete.

________________________________________

If you’ve built similar Selenium, BeautifulSoup, Puppeteer, PDFMiner, or PyMuPDF solutions before, I’d love to see them. Looking forward to your approach and an estimated timeline for a first working prototype.
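As a rough illustration of the session rules in the specification above (automatic Dataset_<Date>_<Time> naming, a PDF folder with the same name as the workbook, the 0.1 to 12.0 hour timer, and per-item error logging), a minimal Python sketch might look like the following. The function names, the exact date format, and the `process_item` callback are assumptions made for illustration and are not part of the posting; writing the rows to .xlsx (for example with openpyxl) is deliberately left out.

```python
import threading
import time
from datetime import datetime
from pathlib import Path


def session_paths(output_dir, now=None):
    """Return (workbook_path, pdf_folder) sharing one Dataset_<Date>_<Time> stem.

    The date/time layout used here is an assumption; the posting only gives
    the pattern Dataset_<Date>_<Time>.xlsx.
    """
    now = now or datetime.now()
    stem = now.strftime("Dataset_%Y-%m-%d_%H%M%S")
    base = Path(output_dir)
    return base / f"{stem}.xlsx", base / stem


def clamp_hours(hours):
    """Keep the automatic-mode timer inside the 0.1 to 12.0 hour range."""
    return max(0.1, min(12.0, float(hours)))


def run_session(items, process_item, stop_event, hours=12.0, clock=time.monotonic):
    """Process items one by one until data runs out, time expires, or Stop is set.

    process_item is a hypothetical callable returning a dict of extracted
    fields for one line item. Any exception it raises is recorded in that
    item's row, and processing continues with the next item.
    """
    deadline = clock() + clamp_hours(hours) * 3600
    rows = []
    for index, item in enumerate(items, start=1):
        if stop_event.is_set() or clock() >= deadline:
            break  # rows collected so far are still returned (partial save)
        row = {"item_id": index, "error": ""}
        try:
            row.update(process_item(item))
        except Exception as exc:  # broken link, failed PDF load, timeout, ...
            row["error"] = f"{type(exc).__name__}: {exc}"
        rows.append(row)
    return rows
```

In a GUI, the Stop button would call `stop_event.set()` from the main thread while `run_session` runs in a worker thread; the returned rows can then be written to the workbook path from `session_paths`.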
Project ID: 39747605
87 proposals
Open for bidding
Remote project
Active 2 hours ago
87 freelancers are bidding on average $145 USD for this job

Hello, With over a decade of experience in web development, our company is adept at handling complex projects like yours in an efficient and timely manner. Our proficiency in JavaScript and PHP aligns perfectly with your requirement for a cross-platform application that can seamlessly scrape data from both websites and PDFs. We are skilled at creating robust parsers, which will enable accurate extraction of text, table content and associated PDFs while ensuring there is no data duplication. Your project specifies the need for a simple and intuitive GUI and guaranteed compatibility with Windows as well as Linux, both areas in which we have great expertise. Our GUIs are always designed to enhance the user experience, so you won't face any hassles while navigating through the different steps of data scraping. Moreover, our programs run without lengthy installation processes, reducing the technicalities for you. In addition to these skills and expertise, we're passionately committed to providing premier products with excellent customer service. Your satisfaction comes first, something I'm sure you'll see for yourself should you give us this opportunity to work together. Looking forward to taking your raw ideas and turning them into a polished reality, one that meets all your expectations while adding the necessary WOW factor to your project! Let's create some miracles together! Thanks!
$180 USD in 2 days
8.5

Hello, With over 10 years of experience fueling growth, driving innovation, and optimizing operations for industries like Healthcare, eCommerce, Fintech, Beauty, Real Estate & proptech, my team at WellSpring Infotech couldn't be better equipped to deliver the cross-platform data extraction tool you're seeking. We are proficient in a range of languages, including Python and JavaScript, which will be vital for this project's execution, and we have extensive experience creating seamless, intuitive applications that not only meet client specifications but exceed expectations. Our seasoned team assures you unambiguous functionality: the program will detect your ongoing authentication in Chrome, and scraping the web and collecting PDFs simultaneously without any hassle is our forte. We are well acquainted with handling web tables and text blocks effectively, alongside parsing data from PDFs, which ensures a robust automation process for your data collection needs. Having fostered numerous lasting relationships with our clients through an unwavering commitment to delivering top-notch solutions tailored to intricate criteria, we aim to deliver a tool that functions faultlessly for you across Windows 10/11 and Ubuntu 22.04. Our approach is meticulous, leaving no room for duplication or crashes even during long continuous scrapes. Thank you
$250 USD in 8 days
7.9

With my wealth of experience in data extraction and web scraping, I am confident that I am the ideal candidate for your "Cross-Platform Web & desktop PDF Scraper" project. As the head of a talented team at BN-Droids Digital Services, we have built a strong reputation for delivering high-quality, efficient, and automated solutions for similar projects. Our expertise in developing custom web applications with seamless data structures enabled us to provide services to numerous clients across various industries. Let's turn your vision into reality together. Choose me and BN-Droids Digital Services; choose excellence, reliability, and success on all fronts!
$30 USD in 7 days
6.8

Hello, how are you doing? I have gone through the project and I believe that I can handle it well, having experience with Desktop Application, JavaScript, Software Architecture, Data Extraction, PHP, Web Scraping, Python, Excel, Selenium and BeautifulSoup. Please have a look at my profile to get an idea of my previous work: https://www.freelancer.com/u/ayesha0124 Regards, Ayesha
$250 USD in 7 days
6.7

Hi there, Good afternoon! This is my actual bid, not a placeholder. I have gone through your project, Cross-Platform Web & desktop PDF Scraper, and I am ready to start working as soon as you confirm. I offer the best quality and highest performance at a reasonable price, with on-time delivery. I’m well-versed in PHP, Python, CSS, HTML, JavaScript, jQuery, Bootstrap, Angular, AJAX, Laravel, WordPress, BeautifulSoup, Software Architecture, Desktop Application, Data Extraction, Web Scraping, Selenium and Excel. I’d love to discuss the project further to ensure we’re aligned on the scope, timeline, and deliverables. Please let me know a convenient time for us to connect, and I’ll be happy to accommodate. Thank you
$30 USD in 1 day
6.3

Dear Gerry, I am confident in delivering a robust cross-platform desktop application tailored to your detailed specifications for PDF and web data scraping. With over 10 years of experience in software architecture and development—including extensive proficiency in Python, Selenium, BeautifulSoup, and desktop GUI frameworks—I can create a native Windows and Linux executable with smooth, dependency-minimal installation. I propose a Python-based solution, packaged as a single executable for each platform, ensuring native performance and easy deployment. Source code and documentation on extending scraper rules will be included as requested. My approach minimizes duplication, handles pagination, and logs failures gracefully to maintain data integrity over prolonged runs. I look forward to discussing further details to align the solution precisely with your workflow. Best regards, Aleksandar
$200 USD in 3 days
6.1

With my extensive background as a web and AI specialist, I am confident in my ability to deliver an automated data scraping tool that checks all of your boxes and runs seamlessly on both Windows and Linux. My vast experience with JavaScript, PHP, and Python enables me to create powerful yet compact applications that meet specific requirements while maintaining excellent performance. Additionally, my expertise in web scraping perfectly aligns with the core function of your project. I understand the importance of maintaining a robust parsing capability for tables and text blocks while handling potential captchas or timeouts gracefully. I will provide you with well-documented source code that allows you to easily tweak site selectors later or add new websites or PDF patterns as needed. Lastly, my commitment to quality and long-term support sets me apart from other freelancers. I promise you pixel-perfect work with a transparent process. You'll receive clear demos and ongoing support, even after the project is completed. My goal is not just to meet your expectations but to exceed them in every aspect possible. Let's discuss how we can transform your vision into an impressive reality.
$150 USD in 7 days
5.8

Hi, I can develop an Excel VBA Macro as per your requirements. I am an experienced Data Analyst and Macro Developer and have done many similar projects, so I can do this job well. I am ready to start; please message me to discuss your project in more detail. Thanks, Virendra
$200 USD in 7 days
6.0

Hello sir, As a seasoned software developer with proficiency in JavaScript, Python, and various other technologies, I confidently offer you my skill set for this project. My expertise aligns well with the web scraping, data extraction, and automation requirements you have laid out. With a strong grip on React.js, Node.js, Flask and Selenium, I am fully capable of building a semi-automatic extraction app that can scrape data from the sources you desire. This includes the capability to handle captchas and timeouts efficiently. Going over your acceptance criteria, I can assure you I will deliver an optimized cross-platform solution running flawlessly on Windows 10/11 as well as Ubuntu 22.04. Having previously developed tools that parse tables from web pages and save them in Excel spreadsheets while storing the associated PDFs separately, I am comfortable working within this scope. Finally, what sets me apart is my commitment to after-support. Not only will I provide you with complete access to the source code and comprehensive read-me files explaining how to tweak selectors or add new sites/PDF patterns in future, but I'll also make sure it is fully tested and optimized under continuous operation without duplication or crashes. I look forward to discussing your project in more detail and outlining a strategy to ensure we produce a great product together.
$140 USD in 2 days
5.5

Hi, I'm ready to start the task. Please send me a message so we can discuss the job. Thanks
$250 USD in 3 days
5.3

Hi Gerry B., I recently completed a similar project. Message me so I can show you that sample. Question: For “already authenticated” sessions across Chrome/Brave/Firefox, are you okay with a secure session-bridge that reads cookies + user agent from your current browser profile (read-only, uses OS keychain/DPAPI/libsecret) and injects them into an internal Playwright context, so the app is logged in without remote-debug flags or extensions? If you need true tab control, I can optionally attach via CDP (Chromium) and Marionette (Firefox). Suggestion: Build a resumable, no-dup engine: stable record IDs (hash of canonical item URL + key fields), checkpoint every N rows, exponential backoff/retry, pagination and infinite-scroll drivers, and streaming .xlsx writing (openpyxl) for long runs. PDFs saved with deterministic names + SHA1 to dedupe; text via PyMuPDF with fallback to pdfminer; skip image-only by MIME/metadata. GUI has Scrape Web/Scrape PDF, output dir, Start/Stop + timer, live progress/ETA, per-row error log. Packaged as single-file executables for Win/Linux. Kindly send me a message and let's discuss in detail; my portfolio is uploaded here and on my website. Best Regards, Sid, CTO and Co-Founder of Ekarthaan
$250 USD in 9 days
5.1
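The "no-dup engine" idea in this bid (stable record IDs from a hash of the canonical item URL plus key fields, and SHA-1 on PDF content to dedupe) could be sketched roughly as follows. The canonicalization rules, the `Deduper` class, and the 16-character filename prefix are assumptions made for illustration; the bid does not specify them.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url):
    """Lower-case scheme and host, drop the fragment, strip a trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def record_id(url, key_fields):
    """Stable ID: SHA-1 over the canonical URL plus the key fields in sorted order."""
    payload = canonical_url(url) + "|" + "|".join(
        f"{name}={key_fields[name]}" for name in sorted(key_fields)
    )
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


class Deduper:
    """Hypothetical helper that remembers which records and PDFs were already seen."""

    def __init__(self):
        self.seen_records = set()
        self.seen_pdfs = {}  # content digest -> filename

    def is_new_record(self, url, key_fields):
        """Return True the first time a record ID is seen, False afterwards."""
        rid = record_id(url, key_fields)
        if rid in self.seen_records:
            return False
        self.seen_records.add(rid)
        return True

    def pdf_name(self, data):
        """Return (filename, is_new); the name is derived from the content SHA-1."""
        digest = hashlib.sha1(data).hexdigest()
        is_new = digest not in self.seen_pdfs
        name = self.seen_pdfs.setdefault(digest, f"{digest[:16]}.pdf")
        return name, is_new
```

Because the IDs depend only on the URL and field values, persisting `seen_records` at each checkpoint would let a resumed run skip rows it has already written.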

This is exactly the kind of work I love doing. I've developed custom data extraction tools using Python, Selenium, and PDF libraries like PyPDF2. My solutions excel in user-controlled workflows just like your project requires. I've dealt with seamless authentication handling in browser sessions before, ensuring a smooth user experience. I'm confident in meeting your high standards for automated data scraping and tailored Excel outputs. I can't wait to showcase my expertise in this project. I'm happy to offer insight even if you don't go with me. Regards, Anne S.
$200 USD in 5 days
5.3

Hi there, I'm Ahmed Hassan, a Senior Full-Stack Engineer based in California with over 15 years of hands-on experience in web and mobile application development. After reviewing your job posting, I’m confident that my background aligns closely with your requirements and that I am well qualified for your project, Cross-Platform Web & desktop PDF Scraper. I’ve successfully delivered similar projects in previous roles, both as a senior developer and as a project manager, ensuring secure, scalable, and user-friendly systems tailored to business goals. I’d love the opportunity to discuss how I can contribute to your project’s success. Looking forward to connecting. Best regards, Ahmed Hassan
$100 USD in 3 days
5.0

Dear Project Owner, I am highly experienced in PHP, JavaScript, Python, Excel, and Web Scraping, with a proven track record of delivering successful projects similar to yours. I am confident in my ability to develop a cross-platform automated data scraper with PDF handling for your desktop application. My expertise in software architecture, data extraction, and tools like BeautifulSoup and Selenium aligns perfectly with the requirements of your project. I am excited about the opportunity to collaborate with you and turn your vision into a reality. I have a clear and timely communication style, ensuring reliable delivery of high-quality results. I am eager to discuss your project further and show how my skills can benefit your goals. Looking forward to the possibility of working together, Ali Zahid, CEO, Azur Solutions
$30 USD in 7 days
5.0

Hello, As a seasoned full-stack developer, my skills in JavaScript and Python and my proficiency in web scraping are tailor-made for this project. Over the past decade, I've developed numerous programs and applications similar in scope to yours, involving data collection, parsing, and exporting, which has made me well-versed in handling diverse data sources and formats. Besides, I have extensive experience running apps on both Windows and Linux platforms, ensuring seamless functionality across your desired environments. Moreover, my knowledge of React gives me a thorough understanding of complex UI design. So while building your automated data scraper, I will ensure a simple yet intuitive GUI that functions exactly as you desire. This way, you can easily navigate through the different input options and promptly view a concise summary of your data at any stage of the scraping process. Finally, and perhaps most importantly, maintaining clean, structured code has always been a top priority in my development process. This means that not only will I deliver on time as per your specifications (items 1-4 in the acceptance criteria), but the source code and necessary README documentation will be ready for easy future modifications by yourself. By choosing me for your PDF Scraper project, I guarantee complete satisfaction at every stage of the process, from product execution to the completed application. Thanks!
$100 USD in 2 days
4.8

With 8 years of experience in the field, I am best suited to fulfill your requirements for developing an Automated Data Scraper with PDF Handling. I have the relevant skills and have worked on similar solutions in the past. **How I will be completing this project:** I will develop a cross-platform desktop application that automates data collection from websites and PDFs, exporting the results into structured Excel spreadsheets, and saving associated PDFs for each data record. The program will have a user-controlled start/stop feature, sequential processing, and error handling capabilities. **What tech stack I will be following:** I will be using Python for crawling, PDF parsing, and Excel export. For the GUI, I will use Tkinter or PyQt. The architecture will include a frontend GUI and a backend crawling/parsing engine. **Roadmap to complete the project:** 1. Develop the GUI with start/stop buttons and input fields. 2. Implement the crawling/parsing engine using libraries like BeautifulSoup and pdfplumber. 3. Integrate the functionality to read from websites and PDFs, extract data, and write to Excel. 4. Include error handling mechanisms and the ability to save Excel files and PDFs in session-specific folders. I have previously worked with Selenium, BeautifulSoup, and PDFMiner, making me well-equipped to deliver a working prototype in a timely manner. I look forward to discussing the project further and providing you with a
$30 USD in 7 days
4.9

Hello Gerry, I am excited about the opportunity to build a cross-platform PDF scraper that seamlessly extracts data from both web browsers and local PDFs, presenting the information in a structured Excel format. With my extensive experience in web scraping using tools like Selenium and BeautifulSoup, along with a solid grasp of Python and GUI development, I am confident in delivering a robust solution that meets your requirements. Thanks, Hardik
$250 USD in 5 days
4.5

Hi Gerry B., thanks for posting. I just checked the job description: "I’m building a desktop application that lets me harvest information from two sources with the same, seamless workflow: • Logged-in web pages already open in Google Chrome, Brave, or Mozilla Firefox • Local or network PDFs that I choose from an “Open File” dialog or that are currently open on the desktop. For websites, I need the tool to capture text and table content, and in some instances download the PDFs linked to each record." I have carefully checked your requirements and could easily work with you. I would like to let you know that I have expert and commercial experience in BeautifulSoup, Python, Excel, PHP, Web Scraping, JavaScript, Desktop Application, Data Extraction, Selenium and Software Architecture. My skills are: Laravel, PHP, CodeIgniter, Core PHP, HTML, CSS, and JavaScript. Here are my other skills. Front End: UI/UX Design, Graphic Design, Vue.js, React, Node.js, Angular.js, HTML5, CSS3, Bootstrap, JavaScript, jQuery, Basic Photoshop, Figma, Webflow. Back End: PHP, Laravel, CodeIgniter, CakePHP, Ajax, jQuery, MySQL, Python. My work capability: available for communication for more than 10 hours each day; 8 hours/day and 40 hours/week; daily or weekly updates. I am ready to work in a customized time zone, e.g. EST or IST. Waiting for your response! Regards!
$155 USD in 1 day
4.3

Hi Gerry, I’ve gone through your project description and feel confident that I’m a strong match for your needs. I completed a similar project just a month ago. My background covers PHP, JavaScript, Python, Excel, Web Scraping, Software Architecture, Data Extraction, BeautifulSoup, Selenium, and Desktop Application. Please come over to chat so we can discuss your requirements in detail. Regards
$110 USD in 7 days
4.0

Hi, I would like to take this opportunity and will work until you are 100% satisfied with my work. I'm an expert with many years of experience in PHP, JavaScript, Python, Excel, Web Scraping, Software Architecture, Data Extraction, BeautifulSoup, Selenium, and Desktop Application. Please come over to chat so we can discuss your requirements in detail. Regards
$140 USD in 7 days
4.1

Dennison, United States
Payment method verified
Member since Feb 16, 2023