test_driving_data_processor's Introduction

Dataset Cleaner for Test Drive Metadata

Project Overview

This project parses, validates, and cleans the metadata of test drives. It addresses common issues of data validity, quality, and consistency so that the dataset is clean for further use. Processing is configured through a JSON file. The preprocessed data is stored in a CSV file, ready for downstream applications such as visualization or machine learning model training. Details on any issues found during cleaning are written to a .log file for analysis.
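The flow described above (JSON metadata in, CSV out, problems logged) can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual code: the function names load_records and write_csv, the logger name, and the assumption that each JSON file holds one flat object are all hypothetical.

```python
import csv
import json
import logging
from pathlib import Path

logger = logging.getLogger("dataset_cleaner_sketch")  # hypothetical logger name


def load_records(dataset_dir):
    """Read every *.json file in dataset_dir; log and skip unparsable files."""
    records = []
    for path in sorted(Path(dataset_dir).glob("*.json")):
        try:
            # Assumption: each file contains a single flat JSON object.
            records.append(json.loads(path.read_text(encoding="utf-8")))
        except json.JSONDecodeError as exc:
            logger.error("Skipping %s: %s", path.name, exc)
    return records


def write_csv(records, csv_path):
    """Write records to csv_path, using the union of all keys as columns."""
    fieldnames = sorted({key for record in records for key in record})
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)  # keys missing from a record become empty cells
```

Calling write_csv(load_records(dataset_dir), csv_path) would then produce one CSV row per readable JSON file.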

Installation and Setup

Setting Up the Virtual Environment with Conda

Before using the data cleaning generator, install the libraries it depends on. They are listed in the environment.yml file and can be installed by creating a new virtual environment with Conda.

To create a Conda environment:

  1. Open a terminal.
  2. Navigate to the generator directory.
  3. Run the command: conda env create -f environment.yml
     This sets up a new Conda environment named dataset_cleaner_env and installs all the necessary packages.
  4. Activate the environment: conda activate dataset_cleaner_env

Configuration

The generator is configured through a JSON config file, which is parsed and validated with the Python library Pydantic.

Configuration Parameters

  • dataset_path: The path to the dataset folder containing the JSON files with metadata instances.
  • output_data_path: Specifies the directory where the preprocessed data CSV file is stored.
  • clean_data: A Boolean parameter that activates the cleaning of dataset instances.
  • logging_level: Determines the level of logging output. The default level is set to "error".
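As a rough sketch of how the four parameters above might be validated with Pydantic: the model name CleanerConfig, the helper load_config, and the example paths are assumptions made for illustration and may not match the project's real schema. Only the "error" default for logging_level comes from the description above.

```python
import json

from pydantic import BaseModel, ValidationError


class CleanerConfig(BaseModel):
    """Hypothetical model mirroring the four documented parameters."""

    dataset_path: str
    output_data_path: str
    clean_data: bool
    logging_level: str = "error"  # documented default


def load_config(text):
    """Parse a JSON string and validate it against CleanerConfig."""
    return CleanerConfig(**json.loads(text))


# Example config content; the paths are placeholders.
example = """
{
  "dataset_path": "data/raw",
  "output_data_path": "data/clean",
  "clean_data": true
}
"""
config = load_config(example)
```

With this sketch, a config that omits a required key such as output_data_path raises pydantic.ValidationError instead of failing later during processing.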

Default Configuration Values

The following configuration elements provide the default values used to fill in missing values during the data cleaning process:

  • default_total_driven_km
  • default_group_vehicle_number
  • default_record_country
  • default_record_date
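One way such defaults could be applied to a single metadata record is sketched below. It assumes, hypothetically, that the record fields are named like the config keys without the default_ prefix. The placeholder values are invented for illustration, since the real ones come from the config file.

```python
import logging

logger = logging.getLogger("dataset_cleaner_sketch")  # hypothetical logger name

# Placeholder values for illustration only; real values come from the JSON config.
DEFAULTS = {
    "total_driven_km": 0.0,
    "group_vehicle_number": "unknown",
    "record_country": "unknown",
    "record_date": "1970-01-01",
}


def fill_missing(record, defaults=DEFAULTS):
    """Return a copy of record with absent or None fields set to their defaults."""
    cleaned = dict(record)
    for field, default in defaults.items():
        if cleaned.get(field) is None:
            logger.warning("Field %r missing, using default %r", field, default)
            cleaned[field] = default
    return cleaned
```

Logging each substitution keeps a trace of which values were filled in rather than observed, in line with the .log file mentioned in the overview.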
