Computer-Graphics: REVISITING PYTHON - Group 12 CAT 1

About

This Python 3 project sets up an environment for data processing and file manipulation. It follows the PyCharm project structure, imports the MASSIVE dataset, and then carries out three main tasks: generating language-specific Excel files, generating per-partition JSONL files for selected languages, and building a combined translation dataset. Output files are kept in dedicated directories, and the project is integrated with GitHub.

Tasks

  1. Build a Python3 project with the structure of projects in PyCharm.
  2. Import the MASSIVE Dataset mentioned in the Data File above. In this dataset, the pivot language is English. Given that the ids match across all languages, generate an en-xx.xlsx file for every language using the id, utt, and annot_utt fields.
  3. For English (en), Swahili (sw), and German (de), generate separate jsonl files for the test, train, and dev partitions.
  4. Generate one large json file showing all the translations from English (en) to xx with id and utt for all the train sets. Pretty print your json file structure.
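
As a rough illustration of tasks 2 and 3, the sketch below pairs English records with another language's records by id and groups records by partition. It assumes each MASSIVE record is a dict with `id`, `partition`, `utt`, and `annot_utt` keys. The function names `pair_with_english`, `split_by_partition`, and `to_jsonl` are hypothetical and are not taken from functions.py, and the sample records are made up. Writing the paired rows to an en-xx.xlsx file is left out, since that step would need an Excel-capable library.

```python
import json
from collections import defaultdict


def pair_with_english(en_records, xx_records):
    """Join English and target-language records on their shared id."""
    xx_by_id = {rec["id"]: rec for rec in xx_records}
    rows = []
    for en in en_records:
        xx = xx_by_id.get(en["id"])
        if xx is None:
            continue  # skip ids that are missing from the target language
        rows.append({
            "id": en["id"],
            "en_utt": en["utt"],
            "en_annot_utt": en["annot_utt"],
            "xx_utt": xx["utt"],
            "xx_annot_utt": xx["annot_utt"],
        })
    return rows


def split_by_partition(records):
    """Group records by their partition field (test, train, dev)."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["partition"]].append(rec)
    return dict(groups)


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(rec, ensure_ascii=False) for rec in records)


# Tiny made-up sample, not real MASSIVE data.
en = [
    {"id": "0", "partition": "train", "utt": "wake me up", "annot_utt": "wake me up"},
    {"id": "1", "partition": "test", "utt": "stop", "annot_utt": "stop"},
]
de = [
    {"id": "0", "partition": "train", "utt": "weck mich", "annot_utt": "weck mich"},
    {"id": "1", "partition": "test", "utt": "stopp", "annot_utt": "stopp"},
]

rows = pair_with_english(en, de)
parts = split_by_partition(de)
print(to_jsonl(parts["train"]))
```

Indexing the target language by id first keeps the join linear in the number of records, which matters when the same pairing is repeated for every language in the dataset.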

Python Files

  • functions.py: Contains functions to answer the questions for generating files from Excel to Jsonl.
  • main.py: The main program file that loads, processes, and analyzes data.

Data Produced

  • LANGUAGE_SPECIFIC_FILES: The output Excel files that contain the translations for all languages.
  • JSONL_FILES: The pretty-printed JSONL files generated for each filtered file.
  • COMBINED_TRAANSLATION.jsonl: Large jsonl file showing all the translations from en to xx with id and utt for all the train sets.
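
One possible shape for the combined file can be sketched as follows. This is an illustration under assumptions, not the project's actual code: the function name `combine_train_translations` and the output layout (an `en` field plus a `translations` mapping per id) are hypothetical, the records are assumed to carry `id`, `partition`, and `utt` keys, and the sample data is made up.

```python
import json


def combine_train_translations(en_records, records_by_lang):
    """Map each English train utterance to its translations, keyed by id."""
    combined = {}
    for en in en_records:
        if en["partition"] != "train":
            continue  # only the train set goes into the combined file
        combined[en["id"]] = {"id": en["id"], "en": en["utt"], "translations": {}}
    for lang, records in records_by_lang.items():
        for rec in records:
            if rec["partition"] == "train" and rec["id"] in combined:
                combined[rec["id"]]["translations"][lang] = rec["utt"]
    return list(combined.values())


# Tiny made-up sample, not real MASSIVE data.
en = [
    {"id": "0", "partition": "train", "utt": "wake me up"},
    {"id": "1", "partition": "test", "utt": "stop"},
]
by_lang = {
    "sw": [{"id": "0", "partition": "train", "utt": "niamshe"}],
    "de": [{"id": "0", "partition": "train", "utt": "weck mich"}],
}

combined = combine_train_translations(en, by_lang)
# indent gives the pretty-printed layout; ensure_ascii=False keeps
# non-ASCII characters readable instead of escaping them.
pretty = json.dumps(combined, indent=4, ensure_ascii=False)
print(pretty)
```

Note that an indented document spans several lines per record, so the result is ordinary JSON rather than line-delimited JSONL.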

Usage

To set up and run the project, follow these steps:

  1. Check Python Version: Ensure you have Python 3.x installed on your system. You can check your Python version by running the following command:

    python --version
  2. Create a Virtual Environment: It's a good practice to create a Python virtual environment to isolate project dependencies. You can create one using the following commands:

    • On macOS and Linux:
    python -m venv venv
    source venv/bin/activate
    • On Windows:
    python -m venv venv
    .\venv\Scripts\activate
  3. Install Dependencies: Clone the repository and navigate to the project directory in your terminal. Then, install the required dependencies by running the following command:

    pip install -r requirements.txt
  4. Run the Generator Script (Windows using WSL and Linux Terminal): Execute the generator.sh shell script to generate project files. Depending on your platform, use one of the following methods:

    • Windows Subsystem for Linux (WSL):

      bash generator.sh
    • Linux Terminal:

      ./generator.sh
  5. Check Output: After running the script, you can find the following logs and directories in the project directory:

    • generator.log: This log file contains information about the generated files.
    • files_count.log: This log file will contain information about the count of generated files.
    • language_specific_files/: This directory will contain language-specific Excel files generated by the script.
    • jsonl_files/: This directory will contain JSONL files generated.
  6. Deactivate the Virtual Environment: When you're done with the project, don't forget to deactivate the virtual environment using the command:

    deactivate
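
The output check in step 5 could also be done from Python. The sketch below counts the files in the two output directories named above. The helper `count_outputs` is hypothetical and is not part of the project, and the demonstration runs against a temporary directory with a made-up file name rather than real generated output.

```python
import tempfile
from pathlib import Path


def count_outputs(project_dir):
    """Return the number of files in each expected output directory."""
    counts = {}
    for name in ("language_specific_files", "jsonl_files"):
        folder = Path(project_dir) / name
        if folder.is_dir():
            counts[name] = sum(1 for p in folder.iterdir() if p.is_file())
        else:
            counts[name] = 0  # directory has not been generated yet
    return counts


# Demonstration in a throwaway directory; the file name is made up.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "jsonl_files"
    out.mkdir()
    (out / "example.jsonl").write_text("{}\n", encoding="utf-8")
    demo_counts = count_outputs(tmp)

print(demo_counts)
```

In a real checkout, calling `count_outputs(".")` from the project directory after running generator.sh would report how many files each directory holds.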

Team

  • 137192 Eddy Bogonko
  • 137938 Martin Mwangi
  • 136603 Jane Daisy
  • 146013 Amanda Karani
  • 139991 Glen Musa

License

bogonkoEd/revisitPython is licensed under the MIT License.
