google_myactivity_scraper

high level scraper for myactivity.google.com

CONCEPT

(from my blog) Over the past few days I have been working on integrating myactivity.google.com into my dashboard. I am really fascinated by the amount of personal data I can find on myactivity; it will make my dashboard much richer and more accurate. I opened a new standalone repo to implement this scraper at github.com/SolbiatiAlessandro/google_myactivity_scarper. I looked at a couple of options for how to build this and decided to use bs4 for the actual scraping and pyppeteer to run headless Chrome. While bs4 was kind of an obvious choice, I was not too familiar with headless browsers: you basically run Chrome without a visible browser window. There is a nice library maintained by the Chrome/Chromium dudes called Puppeteer, which lets you drive headless Chrome from a Node.js application (pyppeteer is a Python port of it). The trick is that it can reuse your cookie sessions, so it is automatically logged in to your Google account and can get your activity.
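A minimal sketch of the headless-Chrome step described above, assuming pyppeteer is installed. The function names and the idea of splitting the profile path are illustrative and not taken from the repo; whether Google accepts the reused session in headless mode is not guaranteed (the DEV notes below say cookies were not working).

```python
from pathlib import PurePosixPath

MYACTIVITY_URL = "https://myactivity.google.com"


def build_launch_options(profile_path: str) -> dict:
    """Turn a Chrome "Profile Path" into keyword arguments for pyppeteer.launch().

    Chrome's user data directory is the parent of the profile directory,
    so ".../Chrome/Default" becomes userDataDir=".../Chrome" plus the
    Chrome flag --profile-directory=Default.
    """
    profile = PurePosixPath(profile_path)
    return {
        "headless": True,
        "userDataDir": str(profile.parent),
        "args": [f"--profile-directory={profile.name}"],
    }


async def fetch_activity_html(profile_path: str) -> str:
    """Open myactivity in headless Chrome and return the rendered HTML."""
    from pyppeteer import launch  # imported lazily: third-party dependency

    browser = await launch(**build_launch_options(profile_path))
    try:
        page = await browser.newPage()
        await page.goto(MYACTIVITY_URL)
        return await page.content()
    finally:
        # Always close the browser, even if navigation fails.
        await browser.close()
```

To try it, something like asyncio.run(fetch_activity_html("/path/to/Chrome/Default")) should work from a script. In a Python string the path is written without shell escaping, so "Application Support" rather than "Application\ Support".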

DOCS

  • README.md
  • __init__.py
  • __init__.pyc
  • __pycache__
  • google_myactivity_login.py : this is the pyppeteer module that launches the headless Chrome browser
  • google_myactivity_parser.py : this is the bs4 parser
  • pyppeteer_example.py
  • requirements.txt
  • screenshots
  • tests
  • venv
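As an illustration of what a bs4 parser like google_myactivity_parser.py might do, here is a small sketch. The HTML snippet, the class names and the parse_activities function are all hypothetical: the real myactivity markup is different and changes over time, so the selectors would need to be adapted.

```python
# Hypothetical markup, only to show the parsing pattern.
SAMPLE_HTML = """
<div class="activity-item">
  <span class="activity-title">Searched for headless chrome</span>
  <span class="activity-time">10:42</span>
</div>
<div class="activity-item">
  <span class="activity-title">Visited github.com</span>
  <span class="activity-time">10:45</span>
</div>
"""


def parse_activities(html):
    """Return a list of {"title", "time"} dicts, one per activity item."""
    from bs4 import BeautifulSoup  # imported lazily: third-party dependency

    soup = BeautifulSoup(html, "html.parser")
    items = []
    for node in soup.select("div.activity-item"):
        title = node.select_one(".activity-title")
        time = node.select_one(".activity-time")
        items.append({
            # Missing fields become None instead of raising.
            "title": title.get_text(strip=True) if title else None,
            "time": time.get_text(strip=True) if time else None,
        })
    return items
```

Feeding it the HTML returned by the headless browser would give a plain list of dicts that the dashboard can consume.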

To get your google-data-dir, open chrome://version/ in Chrome and copy the Profile Path. Mine is:

/Users/lessandro/Library/Application\ Support/Google/Chrome/Default

This directory is put here and uploaded with the webapp. It is small (only 2.4K), but it contains 6k files, which is not nice since gcloud has an upload limit of 10k files. I should see what is actually needed for my login and delete everything else.
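To see which parts of the profile directory contribute most of the files, a small script along these lines could help. It is only a sketch: the function names are made up, and it does not tell you which files the login actually needs.

```python
import os

GCLOUD_FILE_LIMIT = 10_000  # upload limit quoted in the note above


def profile_stats(root):
    """Return (file_count, total_bytes) for everything under root."""
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks
            file_count += 1
            total_bytes += os.path.getsize(path)
    return file_count, total_bytes


def largest_subdirs(root, top=10):
    """File counts per top-level subdirectory, largest first."""
    counts = []
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            counts.append((profile_stats(entry.path)[0], entry.name))
    return sorted(counts, reverse=True)[:top]
```

Comparing profile_stats(path)[0] against GCLOUD_FILE_LIMIT shows how close the upload is to the limit, and largest_subdirs(path) points at the subdirectories worth looking at first.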

DEV

  • implemented pyppeteer, but it is not working with cookies
  • try to use the Chrome DevTools APIs directly, or think of an alternative
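One possible direction for the second item, sketched under assumptions: the Chrome DevTools Protocol takes JSON messages with an id, a method and params, and its Network domain has a Network.setCookies method. The helper names and the cookie values below are placeholders, and the WebSocket connection to Chrome (started with --remote-debugging-port) is not shown.

```python
import itertools
import json

# Each DevTools message needs its own id so replies can be matched to requests.
_message_ids = itertools.count(1)


def cdp_message(method, params=None):
    """Serialize one DevTools Protocol command as a JSON string."""
    return json.dumps({
        "id": next(_message_ids),
        "method": method,
        "params": params or {},
    })


def set_cookies_message(cookies):
    """Build a Network.setCookies command.

    cookies: iterable of dicts with at least "name", "value" and "domain".
    """
    return cdp_message("Network.setCookies", {"cookies": list(cookies)})
```

For example, set_cookies_message([{"name": "example_name", "value": "example_value", "domain": ".google.com"}]) returns the string that would be sent over the debugging WebSocket.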

TESTING

How to run tests

python -m pytest tests
pytest tests # this fails with import errors, probably because, unlike python -m pytest, it does not add the current directory to sys.path

