dsc180b_q2_project

Evaluating Fungal Feature Importance in Predicting Life Expectancy for Cancer Patients

This is the repository for the DSC180B Section B18-1 project by Benjamin Sacks, Ethan Chan, and Mark Zheng. The project extends a study on classifying cancer types using fungal mycobiome counts, which can be found here: https://www.cell.com/cell/fulltext/S0092-8674(22)01127-8.

This project consists of two main machine learning models built on the data presented in that study, together with additional metadata collected about each sample that was not used in prior models. The first is a regression model that predicts the continuous "days to death" metadata variable, which measures how many days after the sample was taken the patient died. The second is a classification model that distinguishes between cancer stages (I-IV), rather than between cancer types as in the original study.

INSTRUCTIONS:

To run these models, run run.py with one argument: the name of the config file for the desired model. For example: "run.py default-cancer-stage.json". There is also a notebook at notebooks/run.ipynb that can be used to run the program in Jupyter Notebook if desired.

Different models can be selected and run using the config files. Config files are JSON files in the "config" directory. They can be edited to change the parameters of an experiment as well as the type of experiment that is run. Each experiment uses a single config file, which keeps experiments customizable without flooding the folder with config files.
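As a rough illustration of the invocation described above, the sketch below shows one way a script could turn its single argument into a path inside the "config" directory and parse the JSON. This is not the repository's actual run.py; the function names resolve_config_path and load_config are hypothetical, and the assumption is that the argument is a bare file name relative to "config".

```python
import json
from pathlib import Path

# The README states that config files live in the "config" directory.
CONFIG_DIR = Path("config")


def resolve_config_path(argv, config_dir=CONFIG_DIR):
    """Return the config path named by the single command-line argument.

    `argv` is expected to look like sys.argv: [script_name, config_file_name].
    (Hypothetical helper; the real run.py may handle arguments differently.)
    """
    if len(argv) != 2:
        raise SystemExit("usage: run.py <config-file-name>.json")
    return Path(config_dir) / argv[1]


def load_config(path):
    """Parse a JSON config file into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

With these helpers, a call such as load_config(resolve_config_path(sys.argv)) would read config/default-cancer-stage.json when the script is given "default-cancer-stage.json" as its argument.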

In each config file, there are 3 subcategories: dataset, preprocessing, and model.
Dataset specifies information about the raw feature tables including which column is the target variable.
Preprocessing specifies the parameters of the preprocessing, including which transformations to apply to each column. If the data is already preprocessed, preprocessing can be turned off with "do_preprocessing".
Model specifies the parameters of the model as well as cross-validation. These are model-specific and will vary depending on which type of model is being used.
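The three-section layout can be sketched as a small check on a loaded config. Everything inside the example dictionary except the section names and "do_preprocessing" is hypothetical ("target_column" and "cv_folds" are invented for illustration), and placing "do_preprocessing" inside the preprocessing section with a default of true is an assumption, not something the config files are confirmed to do.

```python
# The three subcategories named in this README.
REQUIRED_SECTIONS = ("dataset", "preprocessing", "model")


def validate_sections(config):
    """Raise KeyError if any of the three expected sections is missing."""
    missing = [name for name in REQUIRED_SECTIONS if name not in config]
    if missing:
        raise KeyError("config is missing sections: " + ", ".join(missing))
    return config


def should_preprocess(config):
    """Return whether preprocessing should run.

    Assumption: "do_preprocessing" sits in the preprocessing section and
    preprocessing runs when the key is absent.
    """
    return bool(config["preprocessing"].get("do_preprocessing", True))


# Hypothetical config; only the section names and "do_preprocessing"
# come from the README.
example_config = {
    "dataset": {"target_column": "days_to_death"},
    "preprocessing": {"do_preprocessing": False},
    "model": {"cv_folds": 5},
}
```

Checking the sections up front gives a clear error for a malformed config before any data is loaded.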

Additionally, these are some important keys in the config file:
experiment_name: Specifies the unique id of the experiment. This is important for separating plots in figures.
experiment_title: Title of the experiment that will be displayed on the graphs.
experiment_type: Internal parameter telling the pipeline which class of model to use (classification or regression).
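One way a pipeline could act on experiment_type is a small lookup table, sketched below. The functions run_classification, run_regression, and dispatch are hypothetical placeholders rather than the repository's code, and the sketch assumes the three keys above sit at the top level of the config.

```python
def run_classification(config):
    """Placeholder standing in for the cancer-stage classification pipeline."""
    return "classification: " + config["experiment_name"]


def run_regression(config):
    """Placeholder standing in for the days-to-death regression pipeline."""
    return "regression: " + config["experiment_name"]


# Maps each documented experiment_type value to its (hypothetical) pipeline.
PIPELINES = {
    "classification": run_classification,
    "regression": run_regression,
}


def dispatch(config):
    """Run the pipeline selected by config["experiment_type"]."""
    experiment_type = config["experiment_type"]
    if experiment_type not in PIPELINES:
        raise ValueError("unknown experiment_type: " + repr(experiment_type))
    return PIPELINES[experiment_type](config)
```

A table like this keeps the supported experiment types in one place, so an unsupported value fails with an explicit message.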
