Closed

Python web scraping/crawling in [login to view URL]

I need to save a snapshot (all HTML files) of the website [login to view URL]. It is an online forum that allows people to post and follow each other. I want to save the following information:

1. [login to view URL] saved as '[login to view URL]'

2. On the index page, there are 23 forums (note that I count Porn Addiction and Porn-Induced Sexual Dysfunctions as two forums). I need all pages of all threads in each of the 23 forums to be saved. For example, the first forum is shown as "Rebooting - Porn Addiction Recovery". Clicking on it leads to [login to view URL]. The trailing number 2 in that link is the forum identifier. I want this page to be saved to "[login to view URL]". There are 583 pages of threads in this forum. You can save them to "[login to view URL]" all the way to "[login to view URL]". Each of these pages lists 50 threads (slightly more on the first page because of the information and announcements at the top). Each of these threads may itself span multiple pages, and I need all of those HTML pages saved too. For example, the first thread is "[login to view URL]". The trailing number 88344 is the thread identifier. I want its pages to be saved to "[login to view URL]" through "[login to view URL]" (5 pages for this thread).

3. I want all the user profile pages to be saved as well. The website ([login to view URL]) shows there are 156,726 members. You can enumerate all of them, from 1 to 156726, using the following link (for user 1): [login to view URL]. For each user profile page, I need the HTML pages for the 5 tabs: "Profile Posts" (may have multiple pages; all pages are needed), "Recent Activity" (click "Show older items" at the bottom until the button disappears so that everything is captured), "Postings" (no need to fetch all of them, since all postings are captured in the previous step), "Information", and "Groups". Moreover, I want the user_id of every user under "Following" and "Followers". For example, user 1 is following 8 other users and is followed by 826 users. I want 2 tables (CSV or SQLite) to store the Following/Followers information, each with 2 columns. Following table: user_id, following_user_id. Followers table: user_id, follower_user_id. The Following/Followers lists show only 20 users per page, so you need to click the "more" button repeatedly to enumerate all users.
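
As an illustration of the two tables requested in item 3, here is a minimal SQLite sketch. The function names `open_follow_db` and `save_edges` are hypothetical, and the composite primary keys are an assumption added so that re-reading the same "more" page does not create duplicate rows; only the column names come from the description above.

```python
import sqlite3


def open_follow_db(path=":memory:"):
    """Open (or create) the database holding the two relationship tables."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS following (
            user_id            INTEGER NOT NULL,
            following_user_id  INTEGER NOT NULL,
            PRIMARY KEY (user_id, following_user_id)
        );
        CREATE TABLE IF NOT EXISTS followers (
            user_id           INTEGER NOT NULL,
            follower_user_id  INTEGER NOT NULL,
            PRIMARY KEY (user_id, follower_user_id)
        );
    """)
    return conn


def save_edges(conn, user_id, following_ids, follower_ids):
    """Store one user's Following/Followers lists; duplicates are ignored."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO following VALUES (?, ?)",
            [(user_id, other) for other in following_ids],
        )
        conn.executemany(
            "INSERT OR IGNORE INTO followers VALUES (?, ?)",
            [(user_id, other) for other in follower_ids],
        )
```

Each batch of 20 users parsed from a Following/Followers page could be passed to `save_edges` as it arrives, so a crash partway through a profile loses at most one batch.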

Required:

1. The program should finish running within 24 hours. Multithreading might be needed: for example, several threads can handle several forums while one thread handles the user profile pages. The shorter the run time, the better, because I plan to scrape the website on different days to see how users and posts change.

2. Since I want to scrape this website on different days, it would be great to have some form of incremental scraping. The first run would save everything; later runs would keep only the "diff"-style files needed to know what was deleted (users, user following relationships, threads). That would save a lot of disk space because I would not need to store duplicate HTML files that are already saved.

3. Python 3.5+ and other packages that you find necessary

4. The program should log in to the forum before saving the HTML files. Registration is free. Login credentials can be provided upon request.

5. The program will run on Linux Ubuntu

6. Clear comments in the code so that I can modify later

7. Object oriented design is preferred
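
Requirements 1 and 2 could be combined roughly as follows. This is only a sketch under stated assumptions: `run_snapshot`, the `fetch` callable (which would wrap the logged-in session), the file-name keys, and the `manifest.json` layout are all hypothetical, and comparing SHA-256 hashes is one possible way to detect unchanged pages, not something the description prescribes.

```python
import hashlib
import json
import os
from concurrent.futures import ThreadPoolExecutor


def run_snapshot(keys, fetch, out_dir, prev_manifest, workers=8):
    """Fetch pages concurrently and write only those that are new or changed.

    keys          -- output file names, one per page (hypothetical naming)
    fetch         -- callable(key) -> HTML string (hypothetical)
    prev_manifest -- {key: sha256} from the previous run, {} on the first run
    Returns (manifest, diff); diff lists added, changed and deleted keys.
    """
    keys = list(keys)
    os.makedirs(out_dir, exist_ok=True)

    # Network-bound work: a thread pool lets many downloads overlap.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pages = list(pool.map(fetch, keys))  # results keep the order of keys

    manifest = {}
    diff = {"added": [], "changed": [], "deleted": []}
    for key, html in zip(keys, pages):
        digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
        manifest[key] = digest
        if key not in prev_manifest:
            diff["added"].append(key)
        elif prev_manifest[key] != digest:
            diff["changed"].append(key)
        else:
            continue  # identical to the previous run: do not store it again
        with open(os.path.join(out_dir, key), "w", encoding="utf-8") as fh:
            fh.write(html)

    # Anything seen last time but not this time counts as deleted.
    diff["deleted"] = sorted(k for k in prev_manifest if k not in manifest)

    with open(os.path.join(out_dir, "manifest.json"), "w", encoding="utf-8") as fh:
        json.dump({"manifest": manifest, "diff": diff}, fh, indent=2)
    return manifest, diff
```

Each run would write into its own dated directory and pass the previous run's manifest in. Note that pages embedding timestamps or session tokens would hash differently on every run, so in practice the HTML would likely need normalizing before hashing.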

Skills: Python, Web Scraping

See more: python, machine learning, artificial intelligence, deep learning, web scraping, python web scraping, computer science, media, art, c++ python c web scraping, web scraping python scraping, python web crawling, web data crawling, web data crawling windows, python web scraper, python web page snapshot, web scraping php, python web bot, web spider crawling website robot vbnet, python web service app engine sample

About the employer:
(0 reviews) United States

Project ID: #16684496

18 freelancers are bidding an average of $283 on this job

mingxiao2008

Dear Sir, how are you? I am very interested in your project and am ready to start on it now. I am experienced in Python development and web scraping. I will work very hard and do my best for you. Best regards

$155 USD in 3 days
(38 reviews)
6.8
sohandas

Hi there, I just checked the project details and I'm very interested in discussing them with you. I have solid web scraping experience and I use Python. Feel free to PM me so that we can discuss and share sample work! Regards.

$250 USD in 3 days
(142 reviews)
6.6
schoudhary1553

Hello, I have good knowledge of Python web scraping/crawling in nofap.com. I have more than 5 years of experience in Python and web scraping. We have worked on several similar projects before! We have worked on...

$300 USD in 3 days
(19 reviews)
5.9
masterlancer999

Hello, my name is MingZhu.Z, from China. You can check a website made by me: [login to view URL]. I recently completed a Facebook post scraping project. I have already seen and understood what you want. We are a team de...

$444 USD in 3 days
(24 reviews)
5.8
bytessolution

A proposal has not yet been provided

$30 USD in 2 days
(106 reviews)
5.9
Nada100200

Hello client, hope you are doing well. I have over 9 years of experience writing almost exclusively web scraping code. I've done it all. I can scrape all LinkedIn profiles. My languages, in order of experience and use, are Python, dat...

$30 USD in 3 days
(13 reviews)
5.1
abhijitbuet

I can do it with Selenium/Scrapy or BeautifulSoup in Python, whichever you want.

$100 USD in 3 days
(38 reviews)
5.1
kahilH

Hello, I suggest implementing the crawler in Java so that it supports any OS, Linux and Windows. The crawler will be multithreaded and will output an XLS file or a DB file, as you prefer. I invite you to discuss further over chat.

$150 USD in 3 days
(23 reviews)
4.7
steve1112

Hey - I've checked [login to view URL] and can confirm that we can build a Python crawler as per your requirements. Please drop me a message so we can discuss every detail, thanks ~ Steve

$1000 USD in 12 days
(6 reviews)
3.7
VirtualBrainInc

Hello, I have briefly read the description of Python web scraping/crawling in [login to view URL] development, and I can deliver as per the requirements. However, I need us to discuss further for more clarity on the details, deadline...

$250 USD in 3 days
(6 reviews)
3.5
CPythonMan

Hi, I reviewed your description carefully and am very interested in your project. I bid 800 USD because I fully understand your project. I have experience in building an applica...

$800 USD in 3 days
(6 reviews)
3.5
$155 USD in 6 days
(1 review)
3.3
$155 USD in 3 days
(0 reviews)
0.0
iskanderakhmetov

I am quite familiar with the task and do similar things frequently.

$222 USD in 5 days
(0 reviews)
0.0
Lifu0415

Hello, thanks for your posting. I have read your post with great care and interest. I have rich experience in web scraping. I can take a snapshot of all HTML files in [login to view URL]. I can do it within 3 days and update...

$150 USD in 3 days
(0 reviews)
0.0
$500 USD in 7 days
(0 reviews)
0.0
sandyagg

I checked all your requirements properly. I would be able to scrape the info. I am highly skilled in Python. Rate: $18/hr. Let us discuss and start. Sandeep

$250 USD in 5 days
(0 reviews)
0.0
emon777asd

I have 6 years of experience on the Freelancer, Upwork, Fiverr and 99designs marketplaces. I have seen your project and can do it easily because I have a lot of experience in graphic design, web design, web development and programming. So I cou...

$155 USD in 3 days
(0 reviews)
0.0