
tripadvisor-scraper's Introduction

TripAdvisor Scraper

Scrape the hotel reviews of a whole city on TripAdvisor.

Requirements

  • Python 3.5

Installation & Setup

Download and install the required libraries:

pip install bs4

Usage: Scraper

Store all reviews of New York City:

python tripadvisor-scrapper.py 60763 New_York_City_New_York

Store all reviews of Paris:

python tripadvisor-scrapper.py 187147 Paris_Ile_de_France

Store all reviews of Vienna:

python tripadvisor-scrapper.py 190454 Vienna

The scraper requires the city location ID and the city name as command-line arguments. Both can be read from the URL of the city's hotel listing, for example:

https://www.tripadvisor.com/Hotels-g60763-New_York_City_New_York-Hotels.html

The city location ID is the number after the g (here 60763). The city name is the string between the dash that follows the location ID and the dash before Hotels (here New_York_City_New_York).
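The extraction rule above can be sketched in a few lines of Python. This is only an illustration: parse_city_url and HOTELS_URL_PATTERN are hypothetical names that are not part of tripadvisor-scrapper.py, and the pattern assumes the URL follows the Hotels-g<id>-<name>-Hotels.html layout shown in the example.

```python
import re

# Hypothetical helper, not part of tripadvisor-scrapper.py.
# Assumes the URL layout .../Hotels-g<location id>-<city name>-Hotels.html
HOTELS_URL_PATTERN = re.compile(
    r"/Hotels-g(?P<location_id>\d+)-(?P<city_name>.+)-Hotels\.html"
)


def parse_city_url(url):
    """Return (location_id, city_name) taken from a city hotel listing URL."""
    match = HOTELS_URL_PATTERN.search(url)
    if match is None:
        raise ValueError("not a city hotel listing URL: " + url)
    return match.group("location_id"), match.group("city_name")


if __name__ == "__main__":
    location_id, city_name = parse_city_url(
        "https://www.tripadvisor.com/Hotels-g60763-New_York_City_New_York-Hotels.html"
    )
    # Print the command line that the two values would be passed to.
    print("python tripadvisor-scrapper.py", location_id, city_name)
```

The two returned values correspond to the first and second positional arguments of the scraper.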

Store all reviews of Vienna and additionally store the list of review URLs as a pickle for later rescraping:

python tripadvisor-scrapper.py 190454 vienna --pickle store

The pickle is stored in data/timestamp-cityname.

Store all reviews of Vienna using a list of review URLs loaded from pickle/20160601-1522-vienna.pickle:

python tripadvisor-scrapper.py 190454 Vienna --pickle load --filename 20160601-1522-vienna.pickle

A pickle to load must be placed in the pickle directory, which sits at the same directory level as tripadvisor-scrapper.py.
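For readers unfamiliar with pickle files, the store and load round trip can be sketched as follows. This is a minimal illustration under assumptions: the README does not show how the script serializes its data, so the sketch assumes a plain list of URL strings, and store_urls, load_urls, the example URLs, and the file name are all hypothetical.

```python
import os
import pickle
import tempfile


def store_urls(urls, path):
    """Write a list of review URLs to a pickle file (hypothetical helper)."""
    with open(path, "wb") as handle:
        pickle.dump(urls, handle)


def load_urls(path):
    """Read a list of review URLs back from a pickle file (hypothetical helper)."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


if __name__ == "__main__":
    # Placeholder URLs, for illustration only.
    review_urls = [
        "https://www.tripadvisor.com/example-review-1",
        "https://www.tripadvisor.com/example-review-2",
    ]
    # A temporary directory stands in for the pickle directory.
    pickle_path = os.path.join(tempfile.mkdtemp(), "example-vienna.pickle")
    store_urls(review_urls, pickle_path)
    assert load_urls(pickle_path) == review_urls
```

Only load pickle files you created yourself, since unpickling data from an untrusted source can execute arbitrary code.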

Usage: Totalizer

Combine all reviews and hotel information of a city:

python tripadvisor-totalizer.py /Users/admin/tripadvisor-scrapper/data/20160716-202314-vienna

Author

Michael Andorfer


tripadvisor-scraper's Issues

Getting an IndexError on all scrapes

I am getting this error for all of my scrapes.

Traceback (most recent call last):
  File "tripadvisor-scrapper.py", line 716, in <module>
    city_hotel_urls = parse_hotel_urls_of_city(BASE_URL, city_pagination_urls, headers)
  File "tripadvisor-scrapper.py", line 113, in parse_hotel_urls_of_city
    hotel_urls.append(base_url + soup.find_all('a', attrs={'class': 'property_title '})[j]['href'][1:])
IndexError: list index out of range

Error on scraping

I followed the steps, but I get an error like this:
: error: the following arguments are required: 187147, Paris_Ile_de_France

Any idea why this happens?
