kdd24's Introduction

KDD24

Experiments

python main.py --common_config <common_config_name> --model <model_name> --model_config <config_name> --dataset <dataset_name> --fold <fold_number> --mode <mode>
Argument       Options                    Description
common_config  config1, ...               Common config to use. It contains details like features to use.
model          rf, ...                    Model to use
model_config   config1, ...               Model configuration to use
dataset        pa_lov, ...                Dataset to use
fold           0, 1, ...                  Fold number
mode           fit, predict, fit_predict  Fit, predict, or fit and predict together

Example:

python main.py --common_config config1 --model rf --model_config config1 --dataset pa_lov --fold 0 --mode fit_predict
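
The documented arguments could map onto an argparse parser along the following lines. This is a minimal sketch, not the actual contents of main.py: the `build_parser` name, the `required=True` settings, the `int` type for the fold, and the restriction of `--mode` to the three listed values are all assumptions.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the documented flags; the real main.py may differ.
    parser = argparse.ArgumentParser(description="Run one experiment")
    parser.add_argument("--common_config", required=True, help="Common config to use, e.g. config1")
    parser.add_argument("--model", required=True, help="Model to use, e.g. rf")
    parser.add_argument("--model_config", required=True, help="Model configuration to use, e.g. config1")
    parser.add_argument("--dataset", required=True, help="Dataset to use, e.g. pa_lov")
    parser.add_argument("--fold", type=int, required=True, help="Fold number, e.g. 0")
    parser.add_argument("--mode", required=True, choices=["fit", "predict", "fit_predict"])
    return parser


# Parse the example command line from above, passed as an explicit list.
example_args = build_parser().parse_args([
    "--common_config", "config1",
    "--model", "rf",
    "--model_config", "config1",
    "--dataset", "pa_lov",
    "--fold", "0",
    "--mode", "fit_predict",
])
print(example_args)
```

Restricting `--mode` with `choices` makes a mistyped mode fail at parse time instead of partway through a run.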

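One can define such commands in a shell script to run several folds in one go. For example, the following script runs five folds one after another and reports the total time. To run the folds in parallel instead, append & to each command and add a wait before the end timestamp.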

# record start time
start=$(date +%s)

python main.py --common_config config1 --model rf --model_config config1 --dataset pa_lov --fold 0 --mode fit_predict
python main.py --common_config config1 --model rf --model_config config1 --dataset pa_lov --fold 1 --mode fit_predict
python main.py --common_config config1 --model rf --model_config config1 --dataset pa_lov --fold 2 --mode fit_predict
python main.py --common_config config1 --model rf --model_config config1 --dataset pa_lov --fold 3 --mode fit_predict
python main.py --common_config config1 --model rf --model_config config1 --dataset pa_lov --fold 4 --mode fit_predict

# record end time
end=$(date +%s)

runtime=$((end-start))

# print time in minutes, up to 2 decimal places
echo "Total time : $(echo "scale=2; $runtime/60" | bc -l) minutes"

Note that .sh files are gitignored by default, so you can create any custom run.sh file in the root of the repo and run it.
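
The same batch of runs can also be launched from Python with several folds running at once. The sketch below is one possible approach, not part of the repo: `build_command`, `run_folds`, the `max_workers` default, and the injectable `runner` argument are hypothetical names introduced for illustration, and the command is assumed to be started from the repo root.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def build_command(fold, model="rf", dataset="pa_lov",
                  common_config="config1", model_config="config1",
                  mode="fit_predict"):
    # Assemble the documented main.py invocation for one fold.
    # sys.executable is used in place of "python" so the same interpreter is reused.
    return [sys.executable, "main.py",
            "--common_config", common_config,
            "--model", model,
            "--model_config", model_config,
            "--dataset", dataset,
            "--fold", str(fold),
            "--mode", mode]


def run_folds(folds, max_workers=2, runner=subprocess.run):
    # Each worker thread blocks on one child process,
    # so up to max_workers folds run at the same time.
    start = time.time()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda fold: runner(build_command(fold)), folds))
    minutes = (time.time() - start) / 60
    print(f"Total time : {minutes:.2f} minutes")
    return results
```

Calling `run_folds(range(5))` would start the five folds of the example above. Keeping `max_workers` small avoids exhausting memory when each fit is heavy.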

Reproducibility of the pipeline

  • [To Add] A notebook to prepare the data that goes into the pipeline (assigned to Z in one of the issues).
  • Use notebooks/prepare_lov.ipynb to prepare the data
  • Then use main.py to run the experiments

Repo Structure

kdd24
|---models          # for all models
    |---<model1_name>
        |---model.py
        |---config1.toml
        |---config2.toml
        |---...
    |---<model2_name>
        |---model.py
        |---config1.toml
        |---config2.toml
        |---...
    ...
|---datasets        # for all datasets
    |---<dataset1_name>
        |---dataset.py
    |---<dataset2_name>
        |---dataset.py
    ...
|---main.py         # for running experiments
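
Given this layout, a runner can locate a model's code and config from the model name alone. The following is a minimal sketch of that lookup, assuming the tree above; `model_dir`, `config_path`, and `load_model_module` are hypothetical helpers and do not describe how main.py actually resolves models.

```python
import importlib.util
from pathlib import Path


def model_dir(repo_root, model_name):
    # models/<model_name>/ as shown in the tree above.
    return Path(repo_root) / "models" / model_name


def config_path(repo_root, model_name, model_config):
    # models/<model_name>/<model_config>.toml
    return model_dir(repo_root, model_name) / f"{model_config}.toml"


def load_model_module(repo_root, model_name):
    # Import models/<model_name>/model.py directly from its file path.
    model_file = model_dir(repo_root, model_name) / "model.py"
    spec = importlib.util.spec_from_file_location(f"models.{model_name}.model", model_file)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```

Loading by file path keeps the sketch independent of whether the repo is installed as a package. With an editable install, `importlib.import_module` on the dotted name would be an alternative.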

model.py structure

def fit(train_data, config):
    # Fit and save the model and auxiliary information.
    ...

def predict(test_data, train_data, config):
    # Predict and save predictions.
    ...

def fit_predict(train_data, test_data, config):
    # Fit, predict, and save predictions. Useful for small models which do not
    # take much time to fit. For other models, use fit and predict separately.
    ...
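
As an illustration of this interface, here is a toy model.py that predicts the mean of the training targets. It is a sketch under stated assumptions, not one of the repo's models: the data is assumed to be a list of `(features, target)` pairs, `config` is assumed to be a dict with a hypothetical `save_dir` key, and the JSON file names are made up for the example.

```python
import json
from pathlib import Path


def fit(train_data, config):
    # Fit: the "model" here is just the mean of the training targets.
    targets = [target for _, target in train_data]
    model = {"mean": sum(targets) / len(targets)}
    save_dir = Path(config["save_dir"])  # hypothetical config key
    save_dir.mkdir(parents=True, exist_ok=True)
    (save_dir / "model.json").write_text(json.dumps(model))
    return model


def predict(test_data, train_data, config):
    # Predict: load the saved model and emit one prediction per test row.
    # train_data is unused by this toy model but kept to match the signature.
    save_dir = Path(config["save_dir"])
    model = json.loads((save_dir / "model.json").read_text())
    predictions = [model["mean"] for _ in test_data]
    (save_dir / "predictions.json").write_text(json.dumps(predictions))
    return predictions


def fit_predict(train_data, test_data, config):
    # Small models can simply chain the two steps.
    fit(train_data, config)
    return predict(test_data, train_data, config)
```

Because `fit` persists the model to disk, `predict` can run later in a separate process, which is what the separate fit and predict modes rely on.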

For co-authors

  • Open VSCode in the root of the repo

  • Install this repo as an editable package

    pip install -e .
  • Define your own model in models/<model_name>/model.py which should have fit, predict and fit_predict functions.

  • All models must receive the data in the same format.

  • All models must save their results in the same format.
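
A quick way to confirm that a new model module follows the required interface is to check for the three functions before running it. The helper below is a hypothetical sketch and is not part of the repo. It only checks that the functions exist and are callable, not that the data or result formats match.

```python
REQUIRED_FUNCTIONS = ("fit", "predict", "fit_predict")


def missing_functions(module):
    # Return the required function names that the module lacks
    # or that are present but not callable.
    return [name for name in REQUIRED_FUNCTIONS
            if not callable(getattr(module, name, None))]
```

An empty list means the module exposes all three entry points.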

kdd24's People

Contributors

patel-zeel
