njallskarp / finetune-qa-powerset
Finetuning BERT models on a powerset of different linguistic domains
Home Page: https://lvl.ru.is/
Currently, the data is stored on Label Studio and needs to be fetched. We already have starter code, but we need to A) make it prettier and B) make sure that all passwords, tokens, etc. are read from environment variables and not committed in code.
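A minimal sketch of how the fetching code could take its secrets from the environment instead of the source. The variable names (LABEL_STUDIO_URL, LABEL_STUDIO_TOKEN, LABEL_STUDIO_PROJECT_ID) and the helper names are hypothetical, and the export endpoint and token header are assumptions about the Label Studio API that should be checked against the version in use.

```python
import os
import urllib.request
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelStudioConfig:
    base_url: str
    api_token: str
    project_id: str


def load_config(env=os.environ) -> LabelStudioConfig:
    """Read connection settings from environment variables and fail early if any is missing."""
    # Hypothetical variable names, chosen for this sketch only.
    names = ("LABEL_STUDIO_URL", "LABEL_STUDIO_TOKEN", "LABEL_STUDIO_PROJECT_ID")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return LabelStudioConfig(
        base_url=env[names[0]].rstrip("/"),
        api_token=env[names[1]],
        project_id=env[names[2]],
    )


def build_export_request(cfg: LabelStudioConfig) -> urllib.request.Request:
    """Build (but do not send) the request that would export the project's annotations."""
    # Assumed endpoint and auth scheme; verify against the Label Studio API docs.
    url = f"{cfg.base_url}/api/projects/{cfg.project_id}/export?exportType=JSON"
    return urllib.request.Request(url, headers={"Authorization": f"Token {cfg.api_token}"})
```

Passing the request to urllib.request.urlopen would perform the actual download. Keeping the token out of the function arguments' defaults and out of the repository means only the local shell, a git-ignored .env file, or the CI secret store ever holds it.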
We need to figure out a testing plan for the following kinds of tests
Formatting with GitHub Actions works, but the formatted changes do not get saved and merged with the PR
We should add an auto-formatter to our repo to improve code consistency and notation style
We need to document how to set up the project locally with all the configuration
We need a function that can evaluate a model and, given a test dataset, calculates the following aggregate metrics:
Based on the research in previous ticket #15, set up unit testing for the project. This includes the following acceptance criteria
We have already added a command line argument (main.py) where users can specify the domains or datasets. What we need now is for each domain to load a different Dataset class. Then, during each iteration over the sets of sources in the powerset, we need to use torch's ConcatDataset (torch.utils.data) to concatenate the multiple domains into a single Dataset. If we have N domains then we will end up creating 2^N - 1 concatenated datasets, one per iteration (the empty set is skipped).
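To make the 2^N - 1 count concrete, here is a small sketch that enumerates every non-empty subset of the domains with the standard library. The function name and the domain names in the usage note are hypothetical.

```python
from itertools import chain, combinations


def non_empty_powerset(domains):
    """Return every non-empty subset of `domains` as a tuple, smallest subsets first."""
    items = list(domains)
    return list(
        chain.from_iterable(
            combinations(items, size) for size in range(1, len(items) + 1)
        )
    )
```

For three domains, say non_empty_powerset(["news", "wiki", "legal"]), this yields 2^3 - 1 = 7 subsets, starting with the three single-domain tuples and ending with the tuple that contains all three.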
It seems to me that this might happen inside the run training function. That is, around the for ... in range(epochs) loop there will be an outer loop, something like for domain_subset in powerset:
This means that we will need to pass the Dataset classes into this function, not the dataloaders.
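A rough sketch of that loop structure, assuming the training function receives a dict that maps domain names to Dataset objects and builds its own DataLoader per subset. ToyDomainDataset, the run_training signature, and the returned size dict are all made up for illustration; the real model update would replace the placeholder in the inner loop.

```python
from itertools import chain, combinations

from torch.utils.data import ConcatDataset, DataLoader, Dataset


class ToyDomainDataset(Dataset):
    """Stand-in for a per-domain Dataset class (hypothetical)."""

    def __init__(self, name, size):
        self.samples = [f"{name}-{i}" for i in range(size)]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]


def run_training(domain_datasets, epochs=1, batch_size=2):
    """Train once per non-empty subset of domains; returns the sample count per subset."""
    names = list(domain_datasets)
    powerset = chain.from_iterable(
        combinations(names, size) for size in range(1, len(names) + 1)
    )
    sizes = {}
    for domain_subset in powerset:
        # Merge the selected domains into a single Dataset for this iteration.
        combined = ConcatDataset([domain_datasets[name] for name in domain_subset])
        loader = DataLoader(combined, batch_size=batch_size, shuffle=True)
        for _ in range(epochs):
            for batch in loader:
                pass  # placeholder: forward pass, loss, and optimizer step go here
        sizes[domain_subset] = len(combined)
    return sizes
```

Because the DataLoader is created inside the loop, the caller only needs to hand over the Dataset objects. A fresh model would presumably also be initialised at the top of each subset iteration, which this sketch leaves out.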
We can schedule a meeting to discuss this in detail.