
crawler_immoscout_ger

Crawler for flats and houses on immobilienscout24.de

This crawler will crawl all the listings for flats and houses (rent and sale) on the German real estate website www.immobilienscout24.de.

Dependencies

This crawler is written in Python 3. It uses BeautifulSoup4 for parsing and pandas for data wrangling and for generating the output, so you need Python 3 with both the BeautifulSoup4 and pandas packages installed. I recommend using Anaconda, which already includes pandas.
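To check that the dependencies are available before starting a long crawl, a short standard-library script can look up their import names. This is a sketch and is not part of the repository. It assumes the usual import names, bs4 for the beautifulsoup4 package and pandas for pandas. Any missing packages can then be installed, for example with pip install beautifulsoup4 pandas.

```python
import importlib.util

# Import name -> package name to install. BeautifulSoup4 is imported as "bs4".
REQUIRED_MODULES = {"bs4": "beautifulsoup4", "pandas": "pandas"}


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return the package names whose import module cannot be found."""
    return [pkg for mod, pkg in modules.items()
            if importlib.util.find_spec(mod) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies found.")
```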

Usage

  1. Make sure that all dependencies are installed.
  2. Download the .py file from this repository.
  3. Run the file with Python 3.
  4. Go to your home directory (Home on Unix; Documents on Windows).
  5. Use either wohnung_data_clean.csv or wohnung_data_raw.csv for further analysis
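Steps 4 and 5 can also be done from Python. The helper below is hypothetical and not part of the repository. It looks for the two output files in a folder and defaults to Path.home(). On Windows, Path.home() returns the user profile folder rather than Documents, so pass the Documents path explicitly if that is where the files were written.

```python
from pathlib import Path

# Output file names as given in this README.
OUTPUT_FILES = ("wohnung_data_clean.csv", "wohnung_data_raw.csv")


def find_output_files(directory=None):
    """Return {file name: Path} for each output CSV that exists in `directory`.

    Defaults to the home directory (Path.home()).
    """
    base = Path(directory) if directory is not None else Path.home()
    return {name: base / name for name in OUTPUT_FILES if (base / name).is_file()}


if __name__ == "__main__":
    found = find_output_files()
    if not found:
        print("No output files found; check where the crawler was run.")
    for name, path in found.items():
        print(name, "->", path)
```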

Possible Sources of Error

The crawler was written using UTF-8 encoding, so make sure that your IDE is also set to UTF-8.
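Encoding mismatches usually show up with German characters such as ß and umlauts, which are common in street and city names. The sketch below shows what goes wrong when UTF-8 bytes are decoded with another encoding. cp1252 is used as the example because it is a common default on Windows. When opening files yourself, passing encoding="utf-8" to open() explicitly avoids relying on the system default.

```python
# German addresses often contain ß and umlauts, which is where encoding mismatches show up.
sample = "Straße;Köln;Wohnung"


def decode_as(text: str, encoding: str) -> str:
    """Encode `text` as UTF-8, then decode the bytes with `encoding`.

    This mimics opening a UTF-8 file in a program that assumes another encoding.
    """
    return text.encode("utf-8").decode(encoding)


print(decode_as(sample, "utf-8"))   # prints the sample unchanged
print(decode_as(sample, "cp1252"))  # prints mojibake: ß and ö each become two characters
```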

Output

While crawling, the crawler prints the page it is currently processing. The printed address omits the domain; this is expected. Afterwards, you will find two similar files in your home directory: wohnung_data_clean.csv and wohnung_data_raw.csv. Both files contain the same data, but wohnung_data_clean.csv has already been cleaned and is ready for analysis. If you prefer to work with the data exactly as it comes from Immobilienscout24.de, use wohnung_data_raw.csv instead.

The clean version contains six variables:

  • Price: the price of the property (either the rent or the total purchase price).
  • size: the size of the listing in square meters.
  • location_first: the most precise location given (in most cases, the street).
  • location_last: normally the city.
  • real_estate: whether the listing is a flat ("Wohnung") or a house ("Haus").
  • ownership: whether the property is for rent ("Miete") or for sale ("Kauf").
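The clean variables support simple derived measures. The function below is one example, sketched under assumptions rather than taken from the repository. It computes the median price per square meter for each combination of real_estate and ownership. It assumes the columns are named price, size, real_estate and ownership. The README writes Price with a capital P, so adjust the name if your header differs.

```python
import pandas as pd


def price_per_sqm_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Median price per square meter for each (real_estate, ownership) group.

    Assumed column names: price, size, real_estate, ownership.
    Rows whose size is missing or not positive are dropped before dividing.
    """
    # df["size"] (not df.size, which is the DataFrame's element count).
    valid = df[df["size"] > 0].copy()
    valid["price_per_sqm"] = valid["price"] / valid["size"]
    return (
        valid.groupby(["real_estate", "ownership"])["price_per_sqm"]
        .median()
        .reset_index()
    )
```

Grouping by ownership keeps rents and purchase prices apart. Mixing the two would make the medians meaningless.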

Formatting

German formatting conventions are used. This means that ";" is the delimiter inside the .csv files, and the decimal separator is "," rather than ".". Keep these formatting conventions in mind when you read in the data.
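With pandas, both conventions map directly onto read_csv parameters. The loader below is a minimal sketch; the function name is hypothetical. sep=";" handles the delimiter, decimal="," parses values such as 62,5 as floats, and encoding="utf-8" matches the encoding the crawler was written with.

```python
import pandas as pd


def load_wohnung_data(path):
    """Read one of the crawler's CSV files using German formatting conventions.

    sep=";"          -> fields are separated by semicolons
    decimal=","      -> "62,5" is parsed as the float 62.5
    encoding="utf-8" -> matches the encoding the crawler was written with
    """
    return pd.read_csv(path, sep=";", decimal=",", encoding="utf-8")
```

For example, load_wohnung_data(Path.home() / "wohnung_data_clean.csv") with pathlib's Path. If the file starts with an unnamed index column, adding index_col=0 to the read_csv call drops it. To write the data back in the same format, DataFrame.to_csv accepts the same sep and decimal arguments.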

Terms of Use

Feel free to use this data and the crawler for your personal, academic, or commercial projects. If you write an academic paper, I would appreciate being credited by my full name (see bio). Please be polite when crawling: this crawler was not developed to do any harm.
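If you modify or extend the crawler, spacing out requests is one simple way to stay polite. The class below is a hypothetical helper and not part of the original crawler. It enforces a minimum interval between consecutive calls. The clock and sleep functions are parameters so the behaviour can be tested without waiting.

```python
import time


class Throttle:
    """Enforce a minimum interval (in seconds) between consecutive requests.

    Hypothetical helper, not part of the original crawler. `clock` and `sleep`
    can be swapped out, e.g. for testing.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request; None before the first

    def wait(self):
        """Block until at least min_interval seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Create one instance, for example Throttle(2.0), and call its wait() method immediately before each page request. The 2-second value is an arbitrary example, not a limit published by Immobilienscout24. Python's urllib.robotparser module can additionally check a site's robots.txt before fetching.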

Contributors

jhruzik