Haxball Scraper

Haxball Scraper is a tool that uses Selenium (in the Query Data service) to scroll through the room list of the web game Haxball (haxball.com) and save the data of all rooms, along with global stats, to MariaDB.

Uses:

  • MariaDB - storing data
  • Adminer - reading data
  • Query Data - scraping data

docker-compose.yml

services:
  mariadb:
    image: mariadb:10.6
    ...
  adminer:
    image: adminer
    ports:
      - 8080:8080
    ...

  query_data:
    build: ./query_data
    ...

The compose file defines a stack with three services: mariadb, adminer and query_data. When the stack is deployed, docker-compose maps container ports to host ports (here, Adminer's port 8080). Make sure that port 8080 is not already in use on the host.
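To check whether port 8080 is free before starting the stack, a small Python sketch like the following could be used. It is not part of the project: port_is_free is a hypothetical helper that simply tries to bind a TCP socket to the port on all interfaces (0.0.0.0, where Docker publishes ports by default) and treats a failed bind as "in use".

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can be bound to host:port right now.

    A failed bind (OSError, typically "Address already in use") means
    another process, or another container, already holds the port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


if __name__ == "__main__":
    print("port 8080 free:", port_is_free(8080))
```

A result of False means something already holds the port; either stop it or change the host side of the mapping in docker-compose.yml (for example 8081:8080).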

Requirements

  • docker
  • docker-compose

Usage

Run

Create a .env file in the root directory containing:

MYSQL_ROOT_PASSWORD=yoursecretpassword

Replace yoursecretpassword with your own password.
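If you would rather not make up a password, the .env file can be generated. The sketch below is an illustration only: make_env_content and write_env_file are hypothetical helpers, not part of the repository. They use Python's secrets.token_urlsafe, whose output contains only letters, digits, - and _, so the value needs no quoting in .env and contains no $ that docker-compose would try to interpolate.

```python
import secrets
from pathlib import Path
from typing import Optional


def make_env_content(password: Optional[str] = None) -> str:
    """Return the text of a .env file defining MYSQL_ROOT_PASSWORD.

    Without an explicit password, a random 32-character one is generated
    from 24 random bytes using the standard-library `secrets` module.
    """
    if password is None:
        password = secrets.token_urlsafe(24)
    return f"MYSQL_ROOT_PASSWORD={password}\n"


def write_env_file(directory: str = ".") -> Path:
    """Create <directory>/.env, refusing to overwrite an existing file."""
    env_path = Path(directory) / ".env"
    if env_path.exists():
        raise FileExistsError(f"{env_path} already exists")
    env_path.write_text(make_env_content(), encoding="utf-8")
    return env_path
```

Calling write_env_file() from the repository root creates .env there and refuses to overwrite an existing one.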

Start the stack:

docker-compose up -d

Check

Listing the containers should show three running containers and their port mappings:

docker ps
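The same check can be scripted. This sketch (hypothetical helper names, not part of the project) reads container names from docker ps --format '{{.Names}}' and reports which of the three services has no running container. Compose names containers <project>_<service>_1 (v1) or <project>-<service>-1 (v2), so the sketch matches on the service name as a substring, which is a heuristic rather than an exact check.

```python
import subprocess
from typing import Iterable, List

SERVICES = ("mariadb", "adminer", "query_data")


def missing_services(
    container_names: Iterable[str], services: Iterable[str] = SERVICES
) -> List[str]:
    """Return the services that have no matching running container.

    Matching is a substring test on the container name, which works for
    both <project>_<service>_1 and <project>-<service>-1 naming.
    """
    names = [name.strip() for name in container_names if name.strip()]
    return [svc for svc in services if not any(svc in name for name in names)]


def running_container_names() -> List[str]:
    """Names of running containers, one per line of `docker ps` output."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()
```

missing_services(running_container_names()) returns an empty list when all three services are up.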

If all three containers are running, navigate to http://localhost:8080 in your web browser and log in to Adminer with the following credentials to access the database:

  • server: mariadb (the service name inside the stack)
  • user: root
  • password: from .env file
  • database name: haxball

Note: The database may be empty until the first scrape has finished.

The scrape process (scrape_and_upload.py) runs repeatedly, with a 5-minute pause between runs by default.

👏 Seems like you are getting all the juicy Haxball data. Sweet.

Tear down

When you are done, stop and remove the containers. Add -v to also remove the volumes if you want to erase all data.

docker-compose down -v

Caveats

scrape_data/scrape_and_upload.py

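import time  # imported at the top of the file

if __name__ == "__main__":
    while True:
        cycle()             # one full scrape-and-upload pass (defined earlier in the file)
        time.sleep(60 * 5)  # pause 5 minutes before the next pass
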
  1. The loop is endless by default. This avoids having to set up cron inside the container, or to schedule runs from outside it (one level higher, as another service in the stack).
  2. The sleep time between executions is 5 minutes, but this does not mean that the data is scraped every 5 minutes. A scrape itself takes around 3 minutes, so you will get new data roughly every 8 minutes (3 + 5).
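If you would rather have data arrive every 5 minutes instead of roughly every 8, one option is a fixed-rate loop that subtracts the scrape's own duration from the pause. This is a swapped-in alternative, not what scrape_and_upload.py does; run_fixed_rate is a hypothetical helper, and its clock and sleep parameters exist only so the timing can be tested without actually waiting.

```python
import time
from typing import Callable, Optional


def run_fixed_rate(
    job: Callable[[], None],
    interval: float,
    max_runs: Optional[int] = None,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Start `job` every `interval` seconds, measured start to start.

    The job's own run time is subtracted from the pause; if a run takes
    longer than `interval`, the next one starts immediately. With
    max_runs=None the loop never ends, like the original `while True`.
    Returns the number of runs once max_runs is reached.
    """
    runs = 0
    while True:
        started = clock()
        job()
        runs += 1
        if max_runs is not None and runs >= max_runs:
            return runs
        elapsed = clock() - started
        # Sleep only for what is left of the interval (never a negative time).
        sleep(max(0.0, interval - elapsed))
```

In scrape_and_upload.py, the endless loop would then become run_fixed_rate(cycle, 60 * 5). Runs never overlap: a scrape that overruns the interval simply delays the next start.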

docker-compose.yml

    environment:
      MYSQL_ROOT_PASSWORD: ${MYSQL_ROOT_PASSWORD}

  1. The stack uses only the root database user. Consider changing the code to create dedicated users with suitable privileges in the database.
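As a starting point for dedicated roles, the sketch below builds the MariaDB statements for a restricted account. grant_statements is a hypothetical helper, and the privilege sets are assumptions: the scraper presumably needs at least SELECT and INSERT on the haxball database (more if it creates tables itself), while a reader such as Grafana only needs SELECT. The helper builds plain strings, so it rejects quotes and backslashes instead of escaping them.

```python
from typing import Iterable, List


def _check_literal(value: str, what: str) -> str:
    """Reject characters that would break out of a single-quoted SQL literal."""
    if not value or "'" in value or "\\" in value:
        raise ValueError(f"unsupported characters in {what}")
    return value


def grant_statements(
    user: str,
    password: str,
    privileges: Iterable[str],
    database: str = "haxball",
    host: str = "%",
) -> List[str]:
    """Build statements creating `user` with `privileges` on `database` only.

    host='%' allows connections from any host, which covers the other
    containers in the compose network.
    """
    account = f"'{_check_literal(user, 'user')}'@'{_check_literal(host, 'host')}'"
    privs = ", ".join(p.strip().upper() for p in privileges)
    if not privs or "`" in database:
        raise ValueError("need at least one privilege and a plain database name")
    return [
        f"CREATE USER IF NOT EXISTS {account} "
        f"IDENTIFIED BY '{_check_literal(password, 'password')}';",
        f"GRANT {privs} ON `{database}`.* TO {account};",
    ]
```

For example, grant_statements("grafana", "a-strong-password", ["SELECT"]) yields a read-only account. The statements have to be run as root, for instance through Adminer's SQL command page.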

What's next?

For now, you can only read the data through Adminer. The next step would be visualizing it.

Add a grafana service to the services section of docker-compose.yml:

  grafana:
    image: grafana/grafana:main
    restart: always
    ports:
      - 3000:3000
    volumes:
      - grafana-storage:/var/lib/grafana
    environment:
      GF_AUTH_ANONYMOUS_ENABLED: "true"
      GF_AUTH_ANONYMOUS_HIDE_VERSION: "true"
      GF_AUTH_ANONYMOUS_ORG_NAME: something

Change the top-level volumes section to the following:

volumes:
  grafana-storage:
  mariadb-storage:

Next, run docker-compose up -d again to start Grafana, navigate to http://localhost:3000 and add a MySQL data source (Grafana's MySQL data source also works with MariaDB). Inside the stack, MariaDB is reachable at mariadb:3306, or just mariadb. Enter the database user details and you should be good to go with making your own visualizations.

Note: This means Grafana would read data as the root user, which is not recommended. For a production-ready Grafana setup, consider creating an additional read-only user.

Contributions

Very welcome.

License

MIT

Contributors

jakjus
