scraping-log-kafka-infra's Introduction

This is an example of a Python application for scraping a log.txt file, assuming the log is received through a messaging server such as Kafka. The services run locally in Docker containers, and the scraped data is stored in a NoSQL database (MongoDB). Welcome!

The application receives an extensive log containing timestamps and player movements in an FPS game (such as CS, Vava, CoD, The Arena 3). Information about each round played in the game lobby is recorded, and the Python application extracts the relevant data both for logging player activity and for future analysis of player behavior in the room.
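To make the scraping step concrete, here is a minimal sketch of extracting a timestamp and a player movement from one log line. The line format, field names, and regular expression are assumptions for illustration; the real log.txt layout may differ.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical log-line format (the real log.txt may differ):
#   "2023-05-14 10:22:31 - Player7 moved to (120, 45)"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) - "
    r"(?P<player>\w+) moved to \((?P<x>-?\d+), (?P<y>-?\d+)\)$"
)

def parse_line(line: str) -> Optional[dict]:
    """Extract timestamp, player name and position from one log line."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None  # line does not match the expected format
    return {
        "timestamp": datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S"),
        "player": m["player"],
        "x": int(m["x"]),
        "y": int(m["y"]),
    }
```

Lines that do not match simply return None, so malformed entries can be skipped instead of crashing the consumer.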

How it works:

Kafka || Docker || MongoDB

Zookeeper and the Kafka broker run in separate containers to manage the events triggered by a producer, while a consumer collects the messages. In this repository, the sendLog.py file creates a producer instance that sends the log file to the server on the port specified in docker-compose, while the main.py file creates a consumer instance that collects the messages recorded after its creation. Once the messages are obtained, the scraping is performed and the data is stored as documents in MongoDB.
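The producer/consumer wiring described above can be sketched as follows, assuming the confluent-kafka client and the port exposed in docker-compose (19092, as used by sendLog.py later in this README). The group id and topic name are illustrative, not the repository's actual values.

```python
BOOTSTRAP = "localhost:19092"  # must match the port in docker-compose

def producer_config(bootstrap: str = BOOTSTRAP) -> dict:
    """Minimal config for the producer that sends the log file."""
    return {"bootstrap.servers": bootstrap}

def consumer_config(bootstrap: str = BOOTSTRAP, group: str = "log-scraper") -> dict:
    """Minimal config for the consumer that collects the messages."""
    return {
        "bootstrap.servers": bootstrap,
        "group.id": group,
        # read only messages produced after the consumer was created,
        # matching the behaviour described above
        "auto.offset.reset": "latest",
    }

if __name__ == "__main__":
    # Requires a running broker, so the client is only imported here.
    from confluent_kafka import Producer, Consumer
    producer = Producer(producer_config())
    consumer = Consumer(consumer_config())
    consumer.subscribe(["your-topic-here"])  # hypothetical topic name
```

Keeping the configs in small functions makes it easy to point both scripts at the same broker port.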

Struct

In the 'modules' folder are the classes responsible for the scraping, and in the 'controllers' folder are the classes that handle the Kafka (Confluent) API and MongoDB. The 'tests' folder contains unit tests, written with the Pytest framework, for the classes in 'modules'. A summary of the unit tests is available as .html files.
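A Pytest-style unit test for a scraping helper might look like the sketch below. The helper function itself is hypothetical; the repository's real tests target the classes in 'modules'.

```python
def round_duration(start_ts: float, end_ts: float) -> float:
    """Hypothetical scraping helper: seconds elapsed in one round."""
    if end_ts < start_ts:
        raise ValueError("round cannot end before it starts")
    return end_ts - start_ts

# Pytest discovers functions named test_* and runs their assertions.
def test_round_duration():
    assert round_duration(10.0, 25.5) == 15.5

def test_round_duration_rejects_reversed_timestamps():
    try:
        round_duration(5.0, 1.0)
    except ValueError:
        pass  # expected
    else:
        raise AssertionError("expected ValueError")
```

Running `pytest --html=report.html` (with the pytest-html plugin) is one common way to produce the kind of .html summary mentioned above.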

How to use:

  1. Clone the repository.

  2. Install the packages from requirements.txt:
pip install -r requirements.txt

  3. Run docker-compose. You should end up with 3 active containers.
docker-compose up -d
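As a rough illustration, the three containers might be laid out as in the sketch below. Only the broker name ("only1Brokerv2") and the port 19092 come from this README; the image tags, environment variables, and the mongo service details are assumptions.

```yaml
# Illustrative only; the repository's real docker-compose.yml may differ.
version: "3"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.0.1       # image tag is an assumption
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  only1Brokerv2:                                 # broker name from this README
    image: confluentinc/cp-kafka:7.0.1
    depends_on:
      - zookeeper
    ports:
      - "19092:19092"                            # port used by sendLog.py
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:19092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1  # single-broker setup
  mongo:
    image: mongo:6
    ports:
      - "27017:27017"
```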

  4. Enter the broker and configure the partitions and the topic that the consumer will connect to. The broker name in the .yml file is "only1Brokerv2".
docker exec -it "broker_name" bash
kafka-topics --create --bootstrap-server "broker_url" --replication-factor 1 --partitions 1 --topic "topic_name"

In this example, both the replication factor and the number of partitions are set to 1. In production it is recommended to use more than one broker, with a replication factor and partition count of 2 or more.


  5. If you want to check that the topic was created:
kafka-topics --list --bootstrap-server "broker_url"

The consumer is instantiated in the main.ipynb file, which makes it easier to observe each step of the scraping. Remember to instantiate the consumer before executing the producer in step 6.


  6. Open your terminal and run the 'kafka-docker/tests_in_project/sendLog.py' file.
python sendLog.py localhost:19092 "your-topic-here"
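A plausible shape for sendLog.py, assuming it takes the bootstrap server and topic from the command line (as in the invocation above) and produces the log file line by line. The function names and the hardcoded "log.txt" path are illustrative, not the repository's actual code.

```python
import sys
from typing import List

def read_log_lines(path: str) -> List[str]:
    """Return the non-empty lines of the log file."""
    with open(path, encoding="utf-8") as fh:
        return [line.rstrip("\n") for line in fh if line.strip()]

def main(argv: List[str]) -> None:
    bootstrap, topic = argv[1], argv[2]  # e.g. localhost:19092 your-topic-here
    # Requires a running broker, so the client is only imported here.
    from confluent_kafka import Producer
    producer = Producer({"bootstrap.servers": bootstrap})
    for line in read_log_lines("log.txt"):  # hypothetical log path
        producer.produce(topic, value=line.encode("utf-8"))
    producer.flush()  # block until all messages are delivered

if __name__ == "__main__":
    main(sys.argv)
```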

  7. Go back to the main.ipynb file and continue running each cell.
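The poll-and-store loop run in the notebook can be sketched as follows, assuming the confluent-kafka and pymongo clients. The "timestamp - event" message layout and the database, collection, and field names are assumptions, not the repository's actual schema.

```python
def message_to_document(payload: bytes) -> dict:
    """Turn one raw Kafka message into a MongoDB-ready document."""
    line = payload.decode("utf-8")
    # hypothetical "<timestamp> - <event>" layout
    ts, _, rest = line.partition(" - ")
    return {"timestamp": ts, "event": rest}

def run(bootstrap: str = "localhost:19092", topic: str = "your-topic-here") -> None:
    # Requires a running broker and MongoDB, so clients are imported here.
    from confluent_kafka import Consumer
    from pymongo import MongoClient
    consumer = Consumer({
        "bootstrap.servers": bootstrap,
        "group.id": "log-scraper",
        "auto.offset.reset": "latest",
    })
    consumer.subscribe([topic])
    collection = MongoClient("mongodb://localhost:27017")["game"]["rounds"]
    try:
        while True:
            msg = consumer.poll(1.0)       # wait up to 1s for a message
            if msg is None or msg.error():
                continue                   # nothing received, or broker error
            collection.insert_one(message_to_document(msg.value()))
    finally:
        consumer.close()
```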
