finetune-qa-powerset's Introduction

Finetune QA Powerset

finetune-qa-powerset's People

Contributors

eysteinn-orn, lsig, njallskarp

finetune-qa-powerset's Issues

Create script to read multiple domain data

Read Data

Currently, the data is stored in Label Studio, and we need to fetch it. We already have starter code, but we need to A) clean it up and B) make sure that all passwords, tokens, etc. are read from environment variables and not committed in code.
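A minimal sketch of point B, assuming the connection details live in two environment variables. The names LABELSTUDIO_URL and LABELSTUDIO_TOKEN, the LabelStudioConfig class, and load_config are hypothetical and not taken from the existing starter code; the actual fetch call is left out on purpose.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical variable names; the project may settle on different ones.
REQUIRED_VARS = ("LABELSTUDIO_URL", "LABELSTUDIO_TOKEN")


@dataclass(frozen=True)
class LabelStudioConfig:
    url: str
    token: str


def load_config(env: Optional[Mapping[str, str]] = None) -> LabelStudioConfig:
    """Read connection settings from the environment instead of from code."""
    env = os.environ if env is None else env
    # Fail early with a clear message if a secret is missing or empty.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return LabelStudioConfig(
        url=env["LABELSTUDIO_URL"].rstrip("/"),
        token=env["LABELSTUDIO_TOKEN"],
    )
```

Accepting an optional mapping keeps the function easy to unit test without touching the real environment.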

Research Testing Practices

Testing research

We need to figure out a testing plan for the following kinds of tests:

  • unit tests (something that works well, is easy to use, and integrates well with GitHub Actions)
  • migration tests (same requirements)
  • E2E tests (same requirements)

Add an autoformatter

Add an autoformatter

We should add an autoformatter to our repo to improve code consistency and notation style.

Add documentation for how to set up project locally

Documentation

We need to document how to set up the project locally with all the configuration:

  • Instructions on how to configure the linter so it matches GitHub Actions
  • Instructions on how to run the linter such that if linting passes locally, it will also pass on GitHub
  • Instructions on how to run the test suites
  • Instructions on how to set up Poetry and install dependencies

Functions that compute metrics

Metrics Functions

We need a function that evaluates a model on a given test dataset and calculates the following aggregate metrics:

  • F1 score
  • precision
  • recall
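One possible shape for such a function, as a sketch only. It assumes the model's predictions have already been generated as answer strings, and it assumes token-overlap scoring (in the style commonly used for extractive QA) averaged over examples; the issue does not say which definition of the metrics is intended, and the names token_prf and aggregate_metrics are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def token_prf(prediction: str, reference: str) -> Tuple[float, float, float]:
    """Token-overlap precision, recall and F1 for one predicted answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def aggregate_metrics(predictions: List[str], references: List[str]) -> Dict[str, float]:
    """Average the per-example scores over the whole test set."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    scores = [token_prf(p, r) for p, r in zip(predictions, references)]
    n = len(scores)
    return {
        "precision": sum(s[0] for s in scores) / n,
        "recall": sum(s[1] for s in scores) / n,
        "f1": sum(s[2] for s in scores) / n,
    }
```

Keeping the scoring separate from model inference means the metrics can be tested with plain strings, without loading a model.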

Set up unit testing

Setup

Based on the research in the previous ticket, #15, set up unit testing for the project. This includes the following acceptance criteria:

  • set up 1-2 unit tests for the metrics file
  • unit tests can be run locally
  • unit tests run on all commits, and failed tests block merges
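To illustrate the first criterion, here is a sketch of what one or two unit tests could look like. It assumes Python's standard-library unittest module, although the research ticket may settle on a different framework, and the precision function is a hypothetical stand-in rather than the project's actual metrics code.

```python
import unittest


def precision(true_positives: int, predicted_positives: int) -> float:
    """Hypothetical stand-in for a function from the metrics file."""
    if predicted_positives == 0:
        return 0.0
    return true_positives / predicted_positives


class PrecisionTest(unittest.TestCase):
    def test_typical_case(self):
        self.assertAlmostEqual(precision(3, 4), 0.75)

    def test_no_predictions_returns_zero(self):
        # Guard against division by zero when nothing was predicted.
        self.assertEqual(precision(0, 0), 0.0)


def run_suite() -> unittest.TestResult:
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PrecisionTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling run_suite().wasSuccessful() returns True when every test passes, which is the same signal a CI job would use to block a merge.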

Training logic on powerset

Training logic for powerset

We have already added a command-line argument (main.py) where users can specify the domains or datasets. Next, each domain needs to load a different Dataset class. Then, during each iteration over the sets of sources in the powerset, we need to use torch's Dataset concatenation (e.g. torch.utils.data.ConcatDataset) to combine the selected domains into a single Dataset. If we have N domains, we will end up creating 2^N - 1 combined datasets, one per iteration.
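The 2^N - 1 count comes from enumerating every non-empty subset of the domains. A small sketch using only the standard library follows; the function name nonempty_subsets and the example domain names are hypothetical. Each yielded subset would then select the per-domain datasets to concatenate.

```python
from itertools import chain, combinations
from typing import Iterator, Sequence, Tuple


def nonempty_subsets(domains: Sequence[str]) -> Iterator[Tuple[str, ...]]:
    """Yield every non-empty subset of the domains, smallest subsets first."""
    return chain.from_iterable(
        combinations(domains, size) for size in range(1, len(domains) + 1)
    )


# Example with three hypothetical domain names.
subsets = list(nonempty_subsets(["news", "wiki", "legal"]))
print(len(subsets))  # 2**3 - 1 = 7
```

Because the count grows exponentially, it may be worth checking how many domains are passed on the command line before starting a full run.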

Where this could happen

It seems to me that this might happen inside the run training function. That is, around the for ... in range(epochs) loop there will be something like for domain_subset in powerset:

This means that we will need to pass the Dataset classes into this function, not the dataloaders.
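A rough sketch of that loop structure, under several assumptions: the signature of run_training, the train_one_epoch callback, and the use of plain lists in place of torch Dataset objects are all hypothetical, chosen so the example runs without PyTorch. Whether the model is re-initialized for each subset is not specified in the issue and is not addressed here.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple


def run_training(
    datasets: Dict[str, Sequence],
    epochs: int,
    train_one_epoch: Callable[[Sequence], None],
) -> List[Tuple[str, ...]]:
    """Loop over every non-empty domain subset, then over epochs."""
    names = list(datasets)
    visited = []
    for size in range(1, len(names) + 1):
        for domain_subset in combinations(names, size):
            # Stand-in for concatenation. With PyTorch this would be
            # torch.utils.data.ConcatDataset over the selected datasets,
            # wrapped in a DataLoader built here rather than passed in.
            combined = [ex for name in domain_subset for ex in datasets[name]]
            for _ in range(epochs):
                train_one_epoch(combined)
            visited.append(domain_subset)
    return visited
```

Passing the datasets keyed by domain name is what lets the function build a fresh combined dataset for each subset.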

We can schedule a meeting to discuss this in detail.
