yeeyi rent information scraper (outdated)


Description

This Python project is designed to scrape and analyze rental information from 'yeeyi.com'. The main objective is to gather rental information, such as rental type, room type, rent price, house type, and address, among other details. It then inserts this information into a table for later use.

  • Use Selenium to simulate real browser behavior so that requests pass Cloudflare's DDoS protection check, then fetch the page source of the index page and of the detail pages (using multiprocessing).
  • Use bs4 and regular expressions to parse the page sources and extract the entity data, which is saved in a SQLite database.
  • PS: I also calculate the distance between my school and each rental address through Google Maps (using Selenium).
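The regular-expression step can be illustrated with a minimal sketch. It assumes the text of a detail page has already been extracted (for example with BeautifulSoup's `get_text()`), and that each field appears on its own line as `Label: value`. That layout, the `RentListing` class, and the helper names are hypothetical; the real yeeyi.com markup is not shown in this README.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical sample text; the real page layout may differ.
SAMPLE_TEXT = """
Rental type: Whole property
Room type: 2 bedrooms
Rent: $450 per week
House type: Apartment
Address: 12 Example St, Melbourne VIC 3000
"""


@dataclass
class RentListing:
    """One rental listing; fields follow the details named in the Description."""
    rental_type: Optional[str]
    room_type: Optional[str]
    weekly_rent: Optional[int]
    house_type: Optional[str]
    address: Optional[str]


def _field(label: str, text: str) -> Optional[str]:
    """Return the value on the line 'label: value', or None if the line is absent."""
    pattern = rf"^{re.escape(label)}:[ \t]*(.+)$"
    match = re.search(pattern, text, flags=re.MULTILINE)
    return match.group(1).strip() if match else None


def parse_listing(text: str) -> RentListing:
    """Extract the listing fields from plain text with regular expressions."""
    rent_text = _field("Rent", text) or ""
    # Pull the first dollar amount, allowing thousands separators.
    rent_match = re.search(r"\$\s*(\d[\d,]*)", rent_text)
    weekly_rent = int(rent_match.group(1).replace(",", "")) if rent_match else None
    return RentListing(
        rental_type=_field("Rental type", text),
        room_type=_field("Room type", text),
        weekly_rent=weekly_rent,
        house_type=_field("House type", text),
        address=_field("Address", text),
    )


if __name__ == "__main__":
    print(parse_listing(SAMPLE_TEXT))
```

Missing fields come back as `None` rather than raising, which keeps one malformed page from stopping a long scraping run.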

Installation

This project requires the following Python libraries:

  • bs4 (BeautifulSoup)
  • re (Regular Expressions)
  • time
  • datetime
  • selenium
  • multiprocessing
  • threading
  • traceback

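Only bs4 (published on PyPI as beautifulsoup4) and selenium are third-party packages; re, time, datetime, multiprocessing, threading, and traceback ship with the Python standard library and need no installation. To install the third-party packages, use the following pip command:

pip install beautifulsoup4 selenium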
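To confirm that the third-party packages (bs4 and selenium) are importable before running the scraper, a small check along these lines may help. The function name `missing_packages` is hypothetical and not part of this project.

```python
from importlib.util import find_spec

# Maps import name -> pip package name for the third-party dependencies.
THIRD_PARTY = {"bs4": "beautifulsoup4", "selenium": "selenium"}


def missing_packages(modules=THIRD_PARTY):
    """Return the pip names of modules that cannot be found on this interpreter."""
    return [pip_name for module, pip_name in modules.items()
            if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All third-party packages are available.")
```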

Usage

1. Set the path to your Chrome or Firefox WebDriver (functs.headless.py).

2. Create a SQLite table in the 'functs' folder according to functs.rent_inf.py.

3. Run main.py.
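As a rough illustration of step 2, the table could be created with Python's built-in `sqlite3` module. The table name `rent_inf` and the column names below are assumptions based on the fields listed in the Description; the actual schema is defined in functs.rent_inf.py, which is not reproduced here.

```python
import sqlite3

# Hypothetical schema; adjust the columns to match functs.rent_inf.py.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS rent_inf (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    rental_type TEXT,
    room_type   TEXT,
    weekly_rent INTEGER,
    house_type  TEXT,
    address     TEXT
)
"""

INSERT_SQL = """
INSERT INTO rent_inf (rental_type, room_type, weekly_rent, house_type, address)
VALUES (:rental_type, :room_type, :weekly_rent, :house_type, :address)
"""


def create_rent_table(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute(CREATE_TABLE_SQL)
    conn.commit()
    return conn


def insert_listing(conn: sqlite3.Connection, listing: dict) -> int:
    """Insert one listing (a dict keyed by column name) and return its row id."""
    cursor = conn.execute(INSERT_SQL, listing)
    conn.commit()
    return cursor.lastrowid


if __name__ == "__main__":
    # ":memory:" keeps the demo self-contained; pass a file path to persist data.
    connection = create_rent_table()
    row_id = insert_listing(connection, {
        "rental_type": "Whole property",
        "room_type": "2 bedrooms",
        "weekly_rent": 450,
        "house_type": "Apartment",
        "address": "12 Example St, Melbourne VIC 3000",
    })
    print(row_id, connection.execute("SELECT * FROM rent_inf").fetchall())
```

Named placeholders (`:address` and so on) let SQLite handle quoting, which matters for scraped text that may contain quotes or other special characters.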

Contributing

Please feel free to fork this repository, make amendments, and create pull requests.

Credits

This project utilizes the BeautifulSoup and Selenium libraries for web scraping, as well as several other standard Python libraries.

License

Include your license information here, if applicable.


Contributors

liang-zhihao
