ML pipeline for predicting tags for Stack Overflow questions
- Extract data from the AWS S3 bucket to the required folder.
- Parse the raw data (from XML tags).
- Featurize training features.
- Featurize the label feature into multilabel format.
- Featurize the data into vector format.
- Get preprocessed data.
- Get params.
- Train model.
- Store model in artifacts.
- Calculate metrics using model.
- Add metrics to artifacts.
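
The parsing and label featurization steps above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual code: the `<row>` layout with `Title` and `Tags` attributes is an assumption modeled on the public Stack Exchange data dump, and `SAMPLE_XML`, `parse_posts`, and `binarize_labels` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample input; the real file layout is an assumption.
SAMPLE_XML = """<posts>
  <row Id="1" Title="How to merge two dicts?" Tags="&lt;python&gt;&lt;dictionary&gt;" />
  <row Id="2" Title="Join two tables" Tags="&lt;sql&gt;" />
</posts>"""


def parse_posts(xml_text):
    """Extract id, title, and tag list from each <row> element."""
    root = ET.fromstring(xml_text)
    posts = []
    for row in root.iter("row"):
        # Tags are assumed to be stored as "<tag1><tag2>" in one attribute.
        tags = re.findall(r"<([^<>]+)>", row.get("Tags", ""))
        posts.append({"id": row.get("Id"), "title": row.get("Title", ""), "tags": tags})
    return posts


def binarize_labels(tag_lists):
    """Turn lists of tags into 0/1 indicator rows, one column per known tag."""
    classes = sorted({tag for tags in tag_lists for tag in tags})
    index = {tag: i for i, tag in enumerate(classes)}
    rows = []
    for tags in tag_lists:
        row = [0] * len(classes)
        for tag in tags:
            row[index[tag]] = 1
        rows.append(row)
    return classes, rows


posts = parse_posts(SAMPLE_XML)
classes, y = binarize_labels([p["tags"] for p in posts])
print(classes)
print(y)
```

Each question ends up with one indicator row, so a question carrying several tags simply has several columns set to 1. A library featurizer could replace `binarize_labels`; the hand-written version is used here only to keep the sketch dependency-free.
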
- params.yaml: parameters for the model.
- config.yaml: configuration such as paths.
- dvc.yaml: defines the pipeline.
- setup.py: packages the model.
- artifacts: models, pipeline, featurizer, and metrics.
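
As a rough illustration of how metrics might be calculated and added to the artifacts folder, the sketch below computes micro-averaged precision, recall, and F1 over multilabel indicator rows and writes them to a JSON file. Which metrics the project actually reports is not stated here, so the metric choice, the `metrics.json` file name, and the helper names `micro_f1` and `save_metrics` are all assumptions; the example writes into a temporary directory rather than the real `artifacts` folder.

```python
import json
import tempfile
from pathlib import Path


def micro_f1(y_true, y_pred):
    """Micro-averaged precision, recall, and F1 over 0/1 indicator rows."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def save_metrics(metrics, artifacts_dir):
    """Write metrics as JSON under the given folder (file name is hypothetical)."""
    path = Path(artifacts_dir) / "metrics.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(metrics, indent=2))
    return path


# Toy labels and predictions, made up for illustration only.
y_true = [[1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 1]]
metrics = micro_f1(y_true, y_pred)

with tempfile.TemporaryDirectory() as tmp:
    saved = save_metrics(metrics, Path(tmp) / "artifacts")
    loaded = json.loads(saved.read_text())
    print(loaded)
```

Keeping metrics in a small JSON file makes them easy to track alongside the model, and a file like this can be listed as a metrics output of a pipeline stage in `dvc.yaml`.
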