This is an example of a Python application that scrapes a log.txt file received through a messaging server (Kafka). All services run locally in Docker containers, and the extracted data is stored in a NoSQL database (MongoDB). Welcome!
The application receives an extensive log containing timestamps and player movements from an FPS game (such as CS, Vava, CoD, The Arena 3). Information about each round played in the game lobby is recorded, and the Python application
extracts key information both for logging player activity and for future analysis of player behavior in the room.
Zookeeper and the Kafka broker run in separate containers and handle the events triggered by a producer, while a consumer collects the messages. In this repository, the file sendLog.py creates a producer instance that sends the log file to the server on the port specified in docker-compose, while the main.py file creates a consumer instance that collects the messages recorded after its creation. Once the messages are received, the scraping is performed and the data is stored as documents in MongoDB.
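The producer side described above can be sketched roughly as follows. This is not the repository's actual sendLog.py; the function names and the line-splitting helper are illustrative, assuming the confluent-kafka package.

```python
# Sketch of the producer flow: read the log file and stream each line to a
# Kafka topic. Names here are illustrative, not the repo's actual API.

def iter_log_lines(text):
    """Split a raw log into non-empty, stripped lines (pure helper)."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def send_log(path, bootstrap_servers, topic):
    # Imported lazily: the client only matters when a broker is running.
    from confluent_kafka import Producer

    producer = Producer({"bootstrap.servers": bootstrap_servers})
    with open(path, encoding="utf-8") as f:
        for line in iter_log_lines(f.read()):
            producer.produce(topic, value=line.encode("utf-8"))
    producer.flush()  # block until all queued messages are delivered

if __name__ == "__main__":
    import sys
    # e.g. python sendLog.py localhost:19092 "your-topic-here"
    send_log("log.txt", sys.argv[1], sys.argv[2])
```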
In the 'modules' folder are the classes responsible for scraping, and in the 'controllers' folder the classes that handle the Confluent Kafka API and MongoDB. The 'tests' folder contains unit tests, written with the Pytest framework, for the classes in 'modules'. A summary of the unit tests is available as .html files.
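A unit test in the style used in the 'tests' folder might look like the sketch below. The RoundScraper class, its regex, and the log-line format are all hypothetical stand-ins, not the repository's actual code.

```python
# Hypothetical pytest-style test for a scraping class. The class, the
# pattern, and the sample log line are illustrative assumptions.
import re

class RoundScraper:
    """Toy scraper: pulls (timestamp, player, action) out of one log line."""
    PATTERN = re.compile(r"^(\d{2}:\d{2}) - (\w+) (\w+)")

    def parse(self, line):
        m = self.PATTERN.match(line)
        if not m:
            return None  # line does not match the expected log format
        ts, player, action = m.groups()
        return {"timestamp": ts, "player": player, "action": action}

def test_parse_valid_line():
    doc = RoundScraper().parse("12:05 - Player1 fired")
    assert doc == {"timestamp": "12:05", "player": "Player1", "action": "fired"}

def test_parse_garbage_returns_none():
    assert RoundScraper().parse("not a log line") is None
```

Running `pytest` against the tests folder (with `pytest-html` installed) is one way to produce the .html summaries mentioned above.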
- Clone the repository.
- Install the packages in the requirements.txt using:
pip install -r requirements.txt
- Run docker-compose. You should have 3 active containers.
docker-compose up -d
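For orientation, a compose file for this kind of single-broker setup typically looks like the sketch below. Only the broker name ("only1Brokerv2") and the host port (19092) come from this README; the images, the second listener, and the mongo service are illustrative assumptions, so defer to the repository's own docker-compose.yml.

```yaml
# Illustrative sketch only -- use the docker-compose.yml shipped in the repo.
version: "3"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.3.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  only1Brokerv2:
    image: confluentinc/cp-kafka:7.3.0
    depends_on: [zookeeper]
    ports:
      - "19092:19092"   # port used by the producer/consumer on the host
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://only1Brokerv2:9092,PLAINTEXT_HOST://localhost:19092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
  mongo:
    image: mongo:6
    ports:
      - "27017:27017"
```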
- Enter the broker container and create the topic and partitions the consumer will connect to. The broker name in the .yml file is "only1Brokerv2".
docker exec -it "broker_name" bash
kafka-topics --create --bootstrap-server "broker_url" --replication-factor 1 --partitions 1 --topic "topic_name"
In this example, both the replication factor and the number of partitions are set to 1. In a real deployment it is recommended to use more than one broker, with a replication factor and partition count of 2 or more.
- If you want to check if the topic was created:
kafka-topics --list --bootstrap-server "broker_url"
The consumer is instantiated in the main.ipynb file, which makes it easier to follow each step of the scraping. Remember to instantiate the consumer before running the producer in the next step.
- Open your terminal and run the 'kafka-docker/tests_in_project/sendLog.py' file.
python sendLog.py localhost:19092 "your-topic-here"
- Go back to the main.ipynb file and continue running each cell.
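The consumer side run from main.ipynb roughly follows the loop sketched below: poll the topic, turn each message into a document, and insert it into MongoDB. The database/collection names, the group id, and the to_document helper are illustrative assumptions, not the repository's actual code.

```python
# Sketch of the consume -> scrape -> store loop, assuming the
# confluent-kafka and pymongo packages. All names are illustrative.

def to_document(raw):
    """Decode a Kafka message payload into a MongoDB document (pure helper)."""
    return {"raw_line": raw.decode("utf-8").strip()}

def consume_into_mongo(bootstrap_servers, topic,
                       mongo_uri="mongodb://localhost:27017"):
    # Lazy imports: both clients need running services to be useful.
    from confluent_kafka import Consumer
    from pymongo import MongoClient

    consumer = Consumer({
        "bootstrap.servers": bootstrap_servers,
        "group.id": "log-scraper",        # illustrative consumer group
        "auto.offset.reset": "earliest",  # read messages from the beginning
    })
    consumer.subscribe([topic])
    collection = MongoClient(mongo_uri)["game_logs"]["rounds"]

    try:
        while True:
            msg = consumer.poll(1.0)
            if msg is None or msg.error():
                continue  # no message yet, or a transient broker error
            collection.insert_one(to_document(msg.value()))
    finally:
        consumer.close()
```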