
Google Maps Scraper

Scraper of Google Maps reviews. The code extracts the most recent reviews starting from the URL of a specific Point Of Interest (POI) in Google Maps. An additional extension monitors and incrementally stores the reviews in a MongoDB instance.

Installation

Follow these steps to use the scraper:

  • Download the latest version of Chromedriver from here.

  • Install the Python packages from the requirements file, using pip, conda, or virtualenv:

      conda create --name scraping python=3.9 --file requirements.txt
    

Note: Python >= 3.9 is required.

Basic Usage

The scraper.py script needs two main parameters as input:

  • --i: input file name, containing a list of URLs that point to Google Maps place reviews (default: urls.txt)
  • --N: number of reviews to retrieve, starting from the most recent (default: 100)

Example:

python scraper.py --N 50

generates a CSV file containing the last 50 reviews of the places listed in urls.txt.

In the current implementation, CSV writing is handled by an external function, so if you want to change the path and/or name of the output file, you need to modify that function.
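
As a concrete illustration, such a writer function might look roughly like the sketch below. The function name, output path, and field names here are placeholders for illustration, not the repository's actual identifiers:

```python
import csv

# Hypothetical CSV writer: the function name, OUTPUT_PATH and the field names
# are placeholders, not the repository's actual identifiers.
OUTPUT_PATH = 'data/gm_reviews.csv'  # edit this to change the output file

def write_reviews(reviews, path=OUTPUT_PATH):
    """Append review dicts to a CSV file, writing the header only once."""
    fieldnames = ['id_review', 'caption', 'rating', 'username', 'timestamp']
    with open(path, 'a', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction='ignore')
        if f.tell() == 0:  # file is empty: emit the header row first
            writer.writeheader()
        writer.writerows(reviews)
```

Changing the output location then means editing a single constant rather than hunting through the scraping logic.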

Additionally, other parameters can be provided:

  • --place: boolean flag to scrape POI metadata instead of reviews (default: false)
  • --debug: boolean flag to run the browser with the graphical interface (default: false)
  • --source: boolean flag to store the source URL as an additional field in the CSV (default: false)
  • --sort-by: sorting criterion for reviews, one of most_relevant, newest, highest_rating or lowest_rating (default: newest); contributed by @quaesito
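
Taken together, the flags above could be declared roughly as in this hedged argparse sketch; it is reconstructed only from the options documented here, so the repository's actual declarations may differ:

```python
import argparse

# Hypothetical reconstruction of the CLI, based only on the flags documented
# above; the repository's actual argparse setup may differ.
parser = argparse.ArgumentParser(description='Google Maps reviews scraper')
parser.add_argument('--i', default='urls.txt', help='input file, one URL per line')
parser.add_argument('--N', type=int, default=100, help='number of reviews to retrieve')
parser.add_argument('--place', action='store_true', help='scrape POI metadata instead of reviews')
parser.add_argument('--debug', action='store_true', help='run the browser with its GUI')
parser.add_argument('--source', action='store_true', help='store the source URL in the CSV')
parser.add_argument('--sort-by', default='newest',
                    choices=['most_relevant', 'newest', 'highest_rating', 'lowest_rating'])

# example invocation: python scraper.py --N 50 --sort-by newest
args = parser.parse_args(['--N', '50', '--sort-by', 'newest'])
```

Note that argparse exposes the hyphenated flag as `args.sort_by`.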

For a basic description of the logic and approach behind this software, have a look at the Medium post.

Monitoring functionality

The monitor.py script turns the scraper into an incremental one and works around the limit on the number of reviews that can be retrieved. The only additional requirement is to install MongoDB on your machine: you can find a detailed guide on the official site.

The script takes two inputs:

  • --i: same as in the scraper.py script
  • --from-date: string date in the format YYYY-MM-DD, the minimum date that the scraper tries to reach

The main idea is to run the script periodically to obtain the latest reviews: the scraper stores them in MongoDB until it reaches either the latest review of the previous run or the date given in the input parameter.
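
The stop condition can be sketched as follows. This is an illustration only: the collection layout, field names, and function interface are assumptions, not the script's actual code (the real script talks to MongoDB via pymongo):

```python
from datetime import datetime

# Hedged sketch of the incremental-stop logic; field names ('id_review',
# 'timestamp') and the collection interface are assumptions for illustration.
def store_new_reviews(reviews, collection, from_date):
    """Insert reviews (newest first) until one is already stored
    or is older than from_date; return how many were stored."""
    min_date = datetime.strptime(from_date, '%Y-%m-%d')
    stored = 0
    for r in reviews:
        if r['timestamp'] < min_date:
            break  # reached the date threshold from --from-date
        if collection.find_one({'id_review': r['id_review']}):
            break  # reached the latest review of the previous run
        collection.insert_one(r)
        stored += 1
    return stored
```

Because reviews arrive newest-first, stopping at the first known review (or at the date threshold) is enough to make repeated runs incremental.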

Take a look at this Medium post for more details about the idea behind this feature.

Notes

URLs must be provided in the expected format; check the example file urls.txt to get an idea of what a correct URL looks like. To generate a correct URL:

  1. Go to Google Maps and look for a specific place;
  2. Click on the number of reviews in parentheses;
  3. Save the URL generated by the previous interaction.
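
A quick way to catch malformed entries before a run is a small sanity check like the one below; the pattern is only inferred from the example URLs in urls.txt and on this page, not from any official URL format:

```python
import re

# Rough sanity check for review-page URLs; the pattern is an approximation
# inferred from example URLs, not an official Google Maps URL format.
def looks_like_review_url(url):
    """True if the URL resembles a Google Maps place page opened on reviews."""
    is_place = re.match(r'https://www\.google\.[a-z.]+/maps/place/', url) is not None
    # '!1b1' shows up in the example review URLs in this document
    return is_place and '!1b1' in url
```

Running it over each line of urls.txt flags entries that were saved before clicking on the reviews count.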

googlemaps-scraper's People

Contributors

dependabot[bot], gaspa93, gozdekurtulmus, gtesk, ryuuzake, saito828koki, samirarman


googlemaps-scraper's Issues

selenium.common.exceptions.NoSuchElementException: Message: no such element:

[Review 0]
Traceback (most recent call last):
File "/Users/satyammishra/Desktop/sentiment_analysis/googlemaps-scraper/scraper.py", line 63, in
reviews = scraper.get_reviews(n)
File "/Users/satyammishra/Desktop/sentiment_analysis/googlemaps-scraper/googlemaps.py", line 168, in get_reviews
self.__scroll()
File "/Users/satyammishra/Desktop/sentiment_analysis/googlemaps-scraper/googlemaps.py", line 278, in __scroll
scrollable_div = self.driver.find_element(By.CSS_SELECTOR, 'div.siAUzd-neVct.section-scrollbox.cYB2Ge-oHo7ed.cYB2Ge-ti6hGc')
File "/Users/satyammishra/opt/anaconda3/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 857, in find_element
return self.execute(Command.FIND_ELEMENT, {
File "/Users/satyammishra/opt/anaconda3/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 435, in execute
self.error_handler.check_response(response)
File "/Users/satyammishra/opt/anaconda3/lib/python3.9/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"div.siAUzd-neVct.section-scrollbox.cYB2Ge-oHo7ed.cYB2Ge-ti6hGc"}
(Session info: headless chrome=103.0.5060.134)

no use for MAX_SCROLL

In your current code there is no use for MAX_SCROLL, and I'm not sure why, but all I get is this review.
Please guide me on what needs to be done.
This is the error message I get once I press CTRL+C.

Fails on [Review 0]

Returns the following error:

raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"div.siAUzd-neVct.section-scrollbox.cYB2Ge-oHo7ed.cYB2Ge-ti6hGc"}

unexpected keyword argument 'log_level'

/googlemaps-scraper/googlemaps.py", line 314, in __get_driver
input_driver = webdriver.Chrome(executable_path=ChromeDriverManager(log_level=0).install(), options=options)
TypeError: __init__() got an unexpected keyword argument 'log_level'

Autofocus processing was blocked because a document already has a focused element.

I'm unable to scrape the reviews; it showed me this:
DevTools listening on ws://127.0.0.1:57417/devtools/browser/4c960584-6cdf-450f-831f-a1175e7d6d6a
[0723/113647.067:INFO:CONSOLE(0)] "Autofocus processing was blocked because a document already has a focused element.", source: https://www.google.com/maps/place/Al+Salaam+Mall/@21.5078941,39.2233532,15z/data=!4m8!3m7!1s0x15c3ce6cdb182a97:0x29f6012ad865f128!8m2!3d21.5078941!4d39.2233532!9m1!1b1!16s%2Fg%2F11bvt4d9_v?entry=ttu (0)
[Review 0]

Any advice on how to solve it? It was working last week.

Recent reviews won't work

Google changed the role names, so the most-recent function no longer works.
I tried to change the role names, but it seems that's not the only thing they have changed.

Failed to click recent button while debug works just fine

Hi, I ran this script on my previous computer and it worked fine, but when I switched to my new computer it keeps giving me "failed to click recent button". I checked with debug and it works just as it should. Could you give me some idea of how to solve this?

Parallelism

Can we make __scroll and __expand_reviews run in parallel in the get_reviews function to improve performance?

webdriver_manager pointing to browser version instead of driver version?

I followed the readme, including installing the dependencies into a new environment. When I try to run it in the terminal, I get the following error. I tried tracing it back, but I'm not familiar with the chromedriver manager.

Following the instructions, I downloaded chromedriver and placed it in the root dir of the scraper, just in case.

(google_maps_scrape) jg@J-MacBook-Pro googlemaps-scraper % python scraper.py --N 50 --i urls_1.txt
Traceback (most recent call last):
File "/Users/folder/googlemaps-scraper/scraper.py", line 43, in
with GoogleMapsScraper(debug=args.debug) as scraper:
File "/Users/folder/googlemaps-scraper/googlemaps.py", line 31, in init
self.driver = self.__get_driver()
File "/Users/folder/googlemaps-scraper/googlemaps.py", line 377, in __get_driver
input_driver = webdriver.Chrome(executable_path=ChromeDriverManager(log_level=0).install(), options=options)
File "/usr/local/anaconda3/envs/google_maps_scrape/lib/python3.10/site-packages/webdriver_manager/chrome.py", line 32, in install
driver_path = self._get_driver_path(self.driver)
File "/usr/local/anaconda3/envs/google_maps_scrape/lib/python3.10/site-packages/webdriver_manager/manager.py", line 23, in _get_driver_path
driver_version = driver.get_version()
File "/usr/local/anaconda3/envs/google_maps_scrape/lib/python3.10/site-packages/webdriver_manager/driver.py", line 41, in get_version
return self.get_latest_release_version()
File "/usr/local/anaconda3/envs/google_maps_scrape/lib/python3.10/site-packages/webdriver_manager/driver.py", line 74, in get_latest_release_version
validate_response(resp)
File "/usr/local/anaconda3/envs/google_maps_scrape/lib/python3.10/site-packages/webdriver_manager/utils.py", line 80, in validate_response
raise ValueError("There is no such driver by url {}".format(resp.url))
ValueError: There is no such driver by url https://chromedriver.storage.googleapis.com/LATEST_RELEASE_118.0.5993
(google_maps_scrape) jg@J-MacBook-Pro googlemaps-scraper %

pip list:

Package Version


appnope 0.1.3
asttokens 2.4.1
backcall 0.2.0
beautifulsoup4 4.6.0
certifi 2022.12.7
charset-normalizer 2.0.12
colorama 0.4.5
comm 0.1.4
configparser 5.2.0
crayons 0.4.0
debugpy 1.8.0
decorator 5.1.1
exceptiongroup 1.1.3
executing 2.0.0
idna 3.3
ipykernel 6.26.0
ipython 8.16.1
jedi 0.19.1
jupyter_client 8.5.0
jupyter_core 5.4.0
matplotlib-inline 0.1.6
nest-asyncio 1.5.8
numpy 1.23.0
packaging 23.2
pandas 1.4.3
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
pip 23.3
platformdirs 3.11.0
prompt-toolkit 3.0.39
psutil 5.9.6
ptyprocess 0.7.0
pure-eval 0.2.2
Pygments 2.16.1
pymongo 3.9.0
python-dateutil 2.8.2
python-dotenv 1.0.0
pytz 2022.1
pyzmq 25.1.1
requests 2.31.0
selenium 3.14.0
setuptools 68.0.0
six 1.16.0
stack-data 0.6.3
termcolor 1.1.0
tornado 6.3.3
traitlets 5.12.0
urllib3 2.0.7
wcwidth 0.2.8
webdriver-manager 3.5.2
wheel 0.41.2

most_relevant

Hello, if I set --sort_by most_relevant I get this error:
selenium.common.exceptions.ElementNotVisibleException: Message: element not interactable

emails

Hi, is there any way you can add emails and ratings too, please?

"Uncaught RangeError: Maximum call stack size exceeded" error

I encountered the following error while scraping reviews of this business.

Any help is appreciated.

[0227/214023.992:INFO:CONSOLE(1560)] "Uncaught RangeError: Maximum call stack size exceeded", source: /maps/_/js/k=maps.m.en.FigERXCYMc0.2019.O/ck=maps.m.fQVt13g1oTE.L.W.O/m=vwr,vd,a,duc,owc,ob2,sp,en,smi,sc,vlg,smr,as,bpw,wrc/am=BsgEBA/rt=j/d=1/rs=ACT90oHs0cWOVxL_9t5x_yY1Y1NZDyb6qg/ed=1/exm=sc2,per,mo,lp,ti,ds,stx,pwd,dw,ppl,log,std,b (1560)

Googlemaps business info

Hi dear Mattia @gaspa93. Would you consider creating a library that pulls data on any business from Google Maps (business name, avg rating, open hours, price range ($), etc.)?

Stale Element Reference Error

Hi @gaspa93,
I was trying to use the following URL: https://www.google.com/maps/place/Ellora+Caves/@20.025817,75.1779975,17z/data=!4m7!3m6!1s0x3bdb93bd138ae4bd:0x574c6482cf0b89cf!8m2!3d20.025817!4d75.1779975!9m1!1b1
for scraping N=1000 reviews and sort by = most relevant when I got this error: selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
Full error:

[Review 0]
Traceback (most recent call last):
File "scraper.py", line 63, in
reviews = scraper.get_reviews(n)
File "/home/maunil/Desktop/googlemaps-scraper/googlemaps.py", line 172, in get_reviews
self.__expand_reviews()
File "/home/maunil/Desktop/googlemaps-scraper/googlemaps.py", line 298, in __expand_reviews
l.click()
File "/home/maunil/Desktop/venv/lib/python3.8/site-packages/selenium/webdriver/remote/webelement.py", line 80, in click
self._execute(Command.CLICK_ELEMENT)
File "/home/maunil/Desktop/venv/lib/python3.8/site-packages/selenium/webdriver/remote/webelement.py", line 628, in _execute
return self._parent.execute(command, params)
File "/home/maunil/Desktop/venv/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 320, in execute
self.error_handler.check_response(response)
File "/home/maunil/Desktop/venv/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
(Session info: headless chrome=103.0.5060.114)

Maximum call stack size exceeded

When I set --N to 10000, it scraped 1140 reviews, then threw this message and stopped:
[0724/211919.550:INFO:CONSOLE(1550)] "Uncaught RangeError: Maximum call stack size exceeded", source: /maps/_/js/k=maps.m.en.b7ZwJWQZkHM.2019.O/ck=maps.m.HgR1ySVFXik.L.W.O/m=vwr,vd,a,nrw,owc,ob2,sp,en,smi,sc,vlg,smr,as,wrc/am=BoDCIhAB/rt=j/d=1/rs=ACT90oEnDViSjerMr5DSozguPqRfPvO2Xg/ed=1/exm=sc2,per,mo,lp,ti,ds,stx,dwi,enr,pwd,dw,ppl,log,std,b (1550)

This is the url I used:
https://www.google.it/maps/place/Pantheon/@41.8986108,12.4768729,17z/data=!4m18!1m9!3m8!1s0x132f604f678640a9:0xcad165fa2036ce2c!2sPantheon!8m2!3d41.8986108!4d12.4768729!9m1!1b1!16zL20vMDF4emR6!3m7!1s0x132f604f678640a9:0xcad165fa2036ce2c!8m2!3d41.8986108!4d12.4768729!9m1!1b1!16zL20vMDF4emR6?entry=ttu

Any advice would be appreciated.

Any chance of an update?

Hello guys, and especially Gaspa!

This is just an amazing tool. I'm an amateur in programming, but I can see how advanced this tool is and how much it manages to do. If only there were an update to make it work. I've struggled with it for a few days, but unfortunately I'm just not at a level where I can fix it.
It would be amazing if someone could update it.
Thank you again!

can't get original language of reviews.

Hi,
I get the reviews in the CSV file but can't get the original text (not in English).
How can I fix it?
Eg:
(Translated by Google) When you come to Vietnam, visiting a beauty salon is a mandatory course (Original) 베트남� 오면 미장� 방문� 필수 코스 정답�네요

Failed to click recent button

Hi, I was trying to execute the code on the default location in urls.txt and I got the following in the log file, with no output on the terminal.

2020-02-23 18:34:28,685 - WARNING - Failed to click recent button
2020-02-23 18:34:38,990 - WARNING - Failed to click recent button
2020-02-23 18:34:49,301 - WARNING - Failed to click recent button
2020-02-23 18:34:59,635 - WARNING - Failed to click recent button
2020-02-23 18:35:09,984 - WARNING - Failed to click recent button
2020-02-23 18:35:20,309 - WARNING - Failed to click recent button
2020-02-23 18:35:30,624 - WARNING - Failed to click recent button
2020-02-23 18:35:40,961 - WARNING - Failed to click recent button
2020-02-23 18:35:51,317 - WARNING - Failed to click recent button
2020-02-23 18:36:01,665 - WARNING - Failed to click recent button

Looking at googlemaps.py i found that the issue is within:

    try:
        menu_bt = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'div.cYrDcjyGO77__container')))  # //button[@data-value=\'Sort\'] XPath with graphical interface
        menu_bt.click()
        clicked = True
        time.sleep(3)
    except Exception as e:
        tries += 1
        self.logger.warn('Failed to click recent button')

Could you please explain what is happening and why it isn't working?

Possible conflicting requirements

from requirements.txt:

beautifulsoup4==4.6.0
certifi==2022.12.7
charset-normalizer==2.0.12
colorama==0.4.5
configparser==5.2.0
crayons==0.4.0
idna==3.3
numpy==1.23.0
pandas==1.4.3
pymongo==3.9.0
python-dateutil==2.8.2
pytz==2022.1
requests==2.31.0
selenium==3.14.0
six==1.16.0
termcolor==1.1.0
webdriver-manager==3.5.2
pandas==0.25.2
numpy==1.22.0

There are two pandas and two numpy versions listed.

Limit to 900 reviews

Hi,
If I want to get all the reviews of a big place (let's say a McDonald's), I only get 900 reviews and then Google bans us: urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fef9782ea20>: Failed to establish a new connection: [Errno 61] Connection refused. I have to Ctrl+C your script to see this error (I think you retry again and again, but Google still bans you) ahah

Possible problem with the scrollable js script

I am having the following issue when running the example command:

"selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"div.section-layout.section-scrollbox.scrollable-y.scrollable-show"}
(Session info: headless chrome=90.0.4430.93)"

I tried this URL = "https://www.google.com/maps/place/Nike+Alto+Palermo/@-34.5883936,-58.4098625,15z/data=!4m7!3m6!1s0x0:0x842bcbef147891ee!8m2!3d-34.588376!4d-58.4098447!9m1!1b1"

But the same happens when I try to use the URLs at urls.txt

I am using a Mac and installed chromedriver using brew.

Fails to click sorting button

Hi @gaspa93, I was attempting to scrape the following URL using the command python scraper.py --N 50 --i urls.txt --debug:
https://www.google.com/maps/place/The+Kutaya/@-8.7131747,115.18668,13z/data=!4m11!3m10!1s0x2dd2441f302d3927:0x7fdd6aa714bc38e1!5m2!4m1!1i2!8m2!3d-8.7389695!4d115.1673844!9m1!1b1!16s%2Fg%2F11cn5x9s4l?entry=ttu

However, the log displays a warning message stating, "Failed to click sorting button" after the script finishes running. Is there any way to fix this issue? Thank you!

__parse_place function error

When running the scraper.py file, print(scraper.get_account(url)) on lines 41 and 42 is never executed. I commented out the if statement and found an error in the __parse_place function on lines 164 and 165.

Error:

File "googlemaps.py", line 165, in __parse_place

place['overall_rating'] = float(response.find('div', class_='gm2-display-2').text.replace(',', '.')) AttributeError: 'NoneType' object has no attribute 'text'

I tried using the required beautifulsoup and selenium versions, and tried different versions of chromedriver. That did not solve the issue. What could be the problem?

Hi Mattia

I am trying to use your script, but when I execute it from my cmd nothing appears in the data folder.
I just added one import, `from http import cookies`, because a SameSite error appeared, and this apparently solves it.
WARNING:
"A cookie associated with a cross-site resource at http://google.com/ was set without the SameSite attribute. A future release of Chrome will only deliver cookies with cross-site requests if they are set with SameSite=None and Secure. You can review cookies in developer tools under Application>Storage>Cookies and see more details at https://www.chromestatus.com/feature/5088147346030592 and https://www.chromestatus.com/feature/5633521622188032.", source: https://www.google.it/maps/place/Pantheon/@41.8986108,12.4746842,17z/data=!3m1!4b1!4m7!3m6!1s0x132f604f678640a9:0xcad165fa2036ce2c!8m2!3d41.8986108!4d12.4768729!9m1!1b1 (0)

ERROR file gm-scraper.txt:
2020-03-20 14:22:17,098 - WARNING - Failed to click recent button
2020-03-20 14:22:27,451 - WARNING - Failed to click recent button
2020-03-20 14:22:37,877 - WARNING - Failed to click recent button

If you can help me I will appreciate it, because I need to get the data for my end-of-degree project.

Thank you so much.

`__expand_reviews` sometimes not working

The position of self.__scroll() in these lines is incorrect, causing self.__expand_reviews() to be executed before the 'expand more' buttons are loaded. I believe that self.__scroll() was intended to be placed immediately after the comment # scroll to load reviews.

# scroll to load reviews
# wait for other reviews to load (ajax)
time.sleep(4)
self.__scroll()
# expand review text
self.__expand_reviews()

Support Initial place map url

Hello, is there a plan to support the initial place map URL, rather than clicking on the reviews button?

The idea is that I want to automate the whole extraction operation without human intervention (clicking on reviews).

scrape most relevant reviews

Hi Mattia,

Thanks for sharing your script, it works flawlessly!

I am now trying to re-adapt it to scrape 'most relevant reviews' rather than 'newest' ones.
However, if I change line 66 in googlemaps.py to pick the 'first element' rather than the 'second element', the __scroll function will not go through.

I was wondering whether you faced this difficulty before.

Thanks in advance.

Cheers,
Michele
