experimet_template's Introduction

Data Science Project Template

ML related tasks:

	- data processing
	- visualization
	- feature engineering
	- training
	- ensembling
	- feature selection
	- hyperparameter optimization
	- experiment tracking
	- submission of predictions to Kaggle

Project Structure

- data
    - features      - location for parquet files containing engineered features
    - processed     - location for parquet files containing raw data after initial processing
    - raw           - location for parquet files containing raw data (train, test, sample submission)
- fi                - location to store feature importances in CSV files
- fi_fig            - location to store plots capturing feature importances
- hpo               - location to save hyperparameter optimization artifacts
- logs              - location for logs generated by Python modules
- notebooks         - location for Jupyter notebooks
- oof               - location for out-of-fold predictions
- src
    - common        - package containing common utility functions
    - config        - package containing configuration related modules
    - cv            - package containing cross validation related functions
    - fe            - package containing feature engineering related functions
    - fs            - package containing feature selection related functions
    - hpo           - package containing hyperparameter optimization related functions
    - modeling      - package containing training/prediction related functions
    - munging       - package containing data processing/exploration related functions
    - pre_process   - package containing data pre-processing related functions
    - scripts       - location for feature engineering and training scripts
    - ts            - package containing time series related functions
    - viz           - package containing data visualization related functions
- submissions       - location for predictions and submission scripts
- tracking          - location of the CSV file used to track experiments
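
The layout above can be recreated with a short script. The following is a minimal sketch, not part of the original repository: the function name `create_skeleton` is hypothetical, and the directory names are copied from the listing above.

```python
import tempfile
from pathlib import Path

# Directory names copied from the project structure listed above.
TOP_LEVEL = [
    "data/features", "data/processed", "data/raw",
    "fi", "fi_fig", "hpo", "logs", "notebooks", "oof",
    "submissions", "tracking",
]
SRC_PACKAGES = [
    "common", "config", "cv", "fe", "fs", "hpo", "modeling",
    "munging", "pre_process", "scripts", "ts", "viz",
]


def create_skeleton(project_home):
    """Create the directory tree under project_home and return the paths.

    Hypothetical helper for illustration; existing directories are left as-is.
    """
    root = Path(project_home)
    created = []
    for rel in TOP_LEVEL + [f"src/{name}" for name in SRC_PACKAGES]:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    # Demonstrate in a throwaway directory so nothing real is touched.
    with tempfile.TemporaryDirectory() as tmp:
        for p in create_skeleton(tmp):
            print(p.relative_to(tmp))
```

Pointing `create_skeleton` at <PROJECT_HOME> instead of a temporary directory would set up an empty copy of the template.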

Acknowledgment

  • I have borrowed the initial project structure and framework code from arnabbiswas1's open-source code.

Steps to execute:

  1. Clone the source code from GitHub into the <PROJECT_HOME> directory.

     > git clone https://github.com/castillosebastian/mortality_analyses_covid.git
    
  2. Create the R and Python (/usr/local/bin/python3) environments:

     > renv::init()
     > renv::use_python()
    
  3. Download the dataset:

     > HOME_DIR /src/scripts/data_processing/process_raw_data.R
    
  4. Set the value of the HOME_DIR variable, the libraries, the logger, and other settings in <PROJECT_HOME>/main.R.

  5. To train the baseline model with LGBM, go to <PROJECT_HOME>/kaggle_pipeline_tps_aug_22 and execute the following:

     > python -m src.scripts.training.lgb_baseline
    

     This creates the submission file under <PROJECT_HOME>/kaggle_pipeline_tps_aug_22/submissions, the out-of-fold predictions under <PROJECT_HOME>/kaggle_pipeline_tps_aug_22/oof, and CSVs capturing feature importances under <PROJECT_HOME>/kaggle_pipeline_tps_aug_22/fi.

     The result of the experiment is tracked in <PROJECT_HOME>/kaggle_pipeline_tps_aug_22/tracking/tracking.csv.
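
As an illustration of how a run could be appended to tracking.csv, here is a minimal sketch using only the Python standard library. It is not the repository's tracking code: the function name `track_experiment` and the column names in `FIELDS` are assumptions made for this example, and the values in the demo are made-up placeholders.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical column names; the real tracking.csv may use different ones.
FIELDS = ["experiment", "model", "cv_score", "notes"]


def track_experiment(tracking_file, row):
    """Append one experiment record, writing the header on first use."""
    path = Path(tracking_file)
    path.parent.mkdir(parents=True, exist_ok=True)
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow(row)


if __name__ == "__main__":
    # Placeholder values only, written to a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "tracking" / "tracking.csv"
        track_experiment(target, {"experiment": "demo", "model": "lgbm",
                                  "cv_score": 0.0, "notes": "placeholder"})
        print(target.read_text())
```

Appending one row per run keeps the file easy to compare across experiments in a spreadsheet or a data frame.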

  6. To submit the submission file to Kaggle, go to <PROJECT_HOME>/kaggle_pipeline_tps_aug_22/submissions and run:

     > python submissions_1.py
    
  7. Important references

  • custom metric functions: 1, 2
  • metric: binary_logloss
  • hyperparameter optimization grid: 1
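
For reference, binary log loss is the mean negative log-likelihood of the true labels under the predicted probabilities. The following is a minimal pure-Python sketch of that formula, not the implementation used by LGBM; the function name and the `eps` clipping value are choices made for this example.

```python
import math


def binary_logloss(y_true, y_prob, eps=1e-15):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all samples.

    Probabilities are clipped to [eps, 1 - eps] to avoid log(0);
    the clipping value is an assumption of this sketch.
    """
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


if __name__ == "__main__":
    # Predicting 0.5 for every sample gives a loss of ln(2).
    print(binary_logloss([1, 0], [0.5, 0.5]))
```

Lower values are better: confident correct predictions drive the loss toward 0, while confident wrong predictions are penalized heavily.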

experimet_template's People

Contributors

castillosebastian
