oml-team / open-metric-learning Goto Github PK

Metric learning and retrieval pipelines, models and zoo.

Home Page: https://open-metric-learning.readthedocs.io/en/latest/index.html

License: Apache License 2.0

Makefile 0.86% Python 98.28% Dockerfile 0.20% Jupyter Notebook 0.66%

computer-vision data-science deep-learning metric-learning pytorch pytorch-lightning representation-learning hacktoberfest similarity-learning hacktoberfest-2023

open-metric-learning's Introduction

OML is a PyTorch-based framework to train and validate the models producing high-quality embeddings.

Trusted by

A number of people from Oxford and HSE universities have used OML in their theses. [1] [2] [3]

OML 3.0 has been released!

The update focuses on several components:

We added "official" support for texts and the corresponding Python examples. (Note: texts are not supported in Pipelines yet.)
We introduced the RetrievalResults (RR) class, a container that stores the gallery items retrieved for given queries. RR provides a unified way to visualize predictions and compute metrics (if ground truths are known). It also simplifies post-processing, where an RR object is taken as input and another object, RR_upd, is produced as output. Having these two objects allows comparing retrieval results visually or by metrics. Moreover, you can easily create a chain of such post-processors.
- RR is memory-optimized thanks to batching: in other words, it doesn't store the full matrix of query-gallery distances. (This doesn't make the search approximate, though.)
We made Model and Dataset the only classes responsible for processing modality-specific logic. Model is responsible for interpreting its input dimensions: for example, BxCxHxW for images or BxLxD for sequences like texts. Dataset is responsible for preparing an item: it may use Transforms for images or Tokenizer for texts. Functions computing metrics like calc_retrieval_metrics_rr, RetrievalResults, PairwiseReranker, and other classes and functions are unified to work with any modality.
- We added IVisualizableDataset, which has a .visualize() method that shows a single item. If it is implemented, RetrievalResults is able to show the layout of retrieved results.
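
To illustrate the memory note above, here is a minimal pure-Python sketch of exact top-k retrieval that never holds the full query-gallery distance matrix. It is not the RetrievalResults implementation: the helper names `l2` and `batched_top_k` are hypothetical and only show the idea of processing queries in chunks while keeping the k best candidates per query.

```python
import heapq
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batched_top_k(queries, gallery, k, batch_size):
    """Exact top-k gallery ids and distances for every query.

    Queries are processed in chunks and only the k best candidates per
    query are kept, so the full query-gallery matrix is never stored.
    """
    retrieved_ids, retrieved_dists = [], []
    for start in range(0, len(queries), batch_size):
        for q in queries[start:start + batch_size]:
            # distances for one query only; dropped once the top-k is selected
            dists = [(l2(q, g), gid) for gid, g in enumerate(gallery)]
            best = heapq.nsmallest(k, dists)
            retrieved_dists.append([d for d, _ in best])
            retrieved_ids.append([gid for _, gid in best])
    return retrieved_ids, retrieved_dists


gallery = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
queries = [[0.1, 0.0], [2.9, 3.0], [0.0, 1.9]]
ids, dists = batched_top_k(queries, gallery, k=2, batch_size=2)
print(ids)
```

Every gallery item is still compared with every query, so the result is the same as a brute-force search; only the peak memory changes.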

Migration from OML 2.* [Python API]:

The easiest way to catch up with changes is to re-read the examples!

The recommended way of validation is to use RetrievalResults and functions like calc_retrieval_metrics_rr, calc_fnmr_at_fmr_rr, and others. The EmbeddingMetrics class is kept for use with PyTorch Lightning and inside Pipelines. Note, the signatures of EmbeddingMetrics methods have been slightly changed, see Lightning examples for that.
Since modality-specific logic is confined to Dataset, it doesn't output PATHS_KEY, X1_KEY, X2_KEY, Y1_KEY, and Y2_KEY anymore. Keys which are not modality-specific like LABELS_KEY, IS_GALLERY_KEY, IS_QUERY_KEY, CATEGORIES_KEY are still in use.
inference_on_images is now inference and works with any modality.
Slightly changed interfaces of Datasets. For example, we have IQueryGalleryDataset and IQueryGalleryLabeledDataset interfaces. The first has to be used for inference, the second one for validation. Also added IVisualizableDataset interface.
Removed some internals like IMetricDDP, EmbeddingMetricsDDP, calc_distance_matrix, calc_gt_mask, calc_mask_to_ignore, apply_mask_to_ignore. These changes shouldn't affect you. Also removed code related to a pipeline with precomputed triplets.

Migration from OML 2.* [Pipelines]:

Feature extraction: No changes, except for a new optional argument: mode_for_checkpointing = (min | max). It may be useful for switching between "the lower, the better" and "the greater, the better" types of metrics.
Pairwise-postprocessing pipeline: Slightly changed the name and arguments of the postprocessor sub config — pairwise_images is now pairwise_reranker and doesn't need transforms.
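
As a rough illustration of what a min / max checkpointing mode means, the sketch below picks the best epoch for a metric under each mode. The helpers `is_better` and `best_epoch` are hypothetical and are not part of the Pipelines code; the metric values are made up for the example.

```python
def is_better(new_value, best_value, mode):
    """Return True if new_value should replace best_value under the given mode."""
    if mode not in ("min", "max"):
        raise ValueError(f"mode must be 'min' or 'max', got {mode!r}")
    if best_value is None:
        return True
    return new_value < best_value if mode == "min" else new_value > best_value


def best_epoch(history, mode):
    """Index of the epoch whose metric value is the best one for this mode."""
    best_idx, best_val = None, None
    for epoch, value in enumerate(history):
        if is_better(value, best_val, mode):
            best_idx, best_val = epoch, value
    return best_idx


cmc_per_epoch = [0.61, 0.70, 0.68]   # the greater, the better
fnmr_per_epoch = [0.30, 0.25, 0.22]  # the lower, the better

print(best_epoch(cmc_per_epoch, mode="max"))
print(best_epoch(fnmr_per_epoch, mode="min"))
```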

Documentation

FAQ

Why do I need OML?

You may think "If I need image embeddings I can simply train a vanilla classifier and take its penultimate layer". Well, it makes sense as a starting point. But there are several possible drawbacks:

If you want to use embeddings to perform searching you need to calculate some distance among them (for example, cosine or L2). Usually, you don't directly optimize these distances during the training in the classification setup. So, you can only hope that final embeddings will have the desired properties.
The second problem is the validation process. In the searching setup, you usually care how related your top-N outputs are to the query. The natural way to evaluate the model is to simulate searching requests to the reference set and apply one of the retrieval metrics. So, there is no guarantee that classification accuracy will correlate with these metrics.
Finally, you may want to implement a metric learning pipeline by yourself. There is a lot of work: to use triplet loss you need to form batches in a specific way, implement different kinds of triplet mining, track distances, etc. For the validation, you also need to implement retrieval metrics, which includes effective accumulation of embeddings during the epoch, covering corner cases, etc. It's even harder if you have several GPUs and use DDP. You may also want to visualize your search requests by highlighting good and bad search results. Instead of doing it by yourself, you can simply use OML for your purposes.
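
To make the validation point concrete, here is a small sketch of two retrieval metrics computed from the labels of the top retrieved items. These are textbook definitions written for illustration; they are not taken from OML, whose own metric functions may handle corner cases differently. The function names and the toy data are assumptions.

```python
def cmc_at_k(retrieved_labels, query_label, k):
    """1.0 if at least one of the top-k retrieved items shares the query label."""
    return float(query_label in retrieved_labels[:k])


def precision_at_k(retrieved_labels, query_label, k):
    """Share of the top-k retrieved items that have the query label."""
    top = retrieved_labels[:k]
    return sum(label == query_label for label in top) / len(top)


# labels of gallery items, already sorted by distance to each query
query_labels = ["cat", "dog"]
retrieved = [
    ["dog", "cat", "cat"],  # results for the "cat" query: top-1 is wrong
    ["dog", "dog", "cat"],  # results for the "dog" query: top-1 is right
]

cmc1 = sum(cmc_at_k(r, q, 1) for r, q in zip(retrieved, query_labels)) / len(query_labels)
cmc3 = sum(cmc_at_k(r, q, 3) for r, q in zip(retrieved, query_labels)) / len(query_labels)
p3 = sum(precision_at_k(r, q, 3) for r, q in zip(retrieved, query_labels)) / len(query_labels)
print(cmc1, cmc3, p3)
```

Both queries have the same precision@3 here, while CMC@1 tells them apart, which is why the choice of retrieval metric matters for search use cases.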

What is the difference between Open Metric Learning and PyTorch Metric Learning?

PML is the popular library for Metric Learning, and it includes a rich collection of losses, miners, distances, and reducers; that is why we provide straightforward examples of using them with OML. Initially, we tried to use PML, but in the end, we came up with our library, which is more pipeline / recipes oriented. That is how OML differs from PML:

OML has Pipelines, which allow training models by preparing a config and your data in the required format (it's like converting data into COCO format to train a detector from mmdetection).
OML focuses on end-to-end pipelines and practical use cases. It has config-based examples on popular benchmarks close to real life (like photos of products with thousands of ids). We found some good combinations of hyperparameters on these datasets, and trained and published models and their configs. Thus, OML is more recipe-oriented than PML, and the PML author confirms this, saying that his library is a set of tools rather than recipes. Moreover, the examples in PML are mostly for the CIFAR and MNIST datasets.
OML has the Zoo of pretrained models that can be easily accessed from the code in the same way as in torchvision (when you type resnet50(pretrained=True)).
OML is integrated with PyTorch Lightning, so we can use the power of its Trainer. This is especially helpful when we work with DDP: you can compare our DDP example with the PML one. By the way, PML also has Trainers, but they are not widely used in its examples, where custom train / test functions are used instead.

We believe that having Pipelines, laconic examples, and Zoo of pretrained models sets the entry threshold to a really low value.

What is Metric Learning?

Metric Learning problem (also known as extreme classification problem) means a situation in which we have thousands of ids of some entities, but only a few samples for every entity. Often we assume that during the test stage (or production) we will deal with unseen entities which makes it impossible to apply the vanilla classification pipeline directly. In many cases obtained embeddings are used to perform search or matching procedures over them.

Here are a few examples of such tasks from the computer vision sphere:

Person/Animal Re-Identification
Face Recognition
Landmark Recognition
Search engines for online shops, and many others.

Glossary (Naming convention)

embedding - model's output (also known as features vector or descriptor).
query - a sample which is used as a request in the retrieval procedure.
gallery set - the set of entities to search items similar to query (also known as reference or index).
Sampler - an argument for DataLoader which is used to form batches
Miner - the object to form pairs or triplets after the batch was formed by Sampler. It's not necessary to form the combinations of samples only inside the current batch, thus, the memory bank may be a part of Miner.
Samples/Labels/Instances - as an example let's consider DeepFashion dataset. It includes thousands of fashion item ids (we name them labels) and several photos for each item id (we name the individual photo as instance or sample). All of the fashion item ids have their groups like "skirts", "jackets", "shorts" and so on (we name them categories). Note, we avoid using the term class to avoid misunderstanding.
training epoch - batch samplers which we use for combination-based losses usually have a length equal to [number of labels in training dataset] / [numbers of labels in one batch]. It means that we don't observe all of the available training samples in one epoch (as opposed to vanilla classification), instead, we observe all of the available labels.
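
The epoch length described above can be sketched with a toy balanced batch sampler. This is not the BalanceSampler code; `balanced_batches` is a hypothetical helper that puts `n_labels` labels with `n_instances` samples each into every batch and visits each label once per epoch.

```python
import random
from collections import defaultdict


def balanced_batches(labels, n_labels, n_instances, seed=0):
    """Return batches of indices: n_labels labels x n_instances samples each."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    unique = list(by_label)
    rng.shuffle(unique)
    batches = []
    # every label is visited once per epoch, n_labels labels at a time
    for start in range(0, len(unique) - n_labels + 1, n_labels):
        batch = []
        for label in unique[start:start + n_labels]:
            pool = by_label[label]
            if len(pool) >= n_instances:
                batch.extend(rng.sample(pool, n_instances))
            else:
                # too few samples: draw with replacement
                batch.extend(rng.choices(pool, k=n_instances))
        batches.append(batch)
    return batches


labels = [0, 0, 0, 1, 1, 2, 2, 2, 2, 3, 3]
batches = balanced_batches(labels, n_labels=2, n_instances=2)
n_seen = len({idx for batch in batches for idx in batch})
print(len(batches), n_seen, len(labels))
```

With 4 labels and 2 labels per batch the epoch has 2 batches, and only 8 of the 11 samples are observed, while all 4 labels are.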

How good may be a model trained with OML?

It may be comparable with SotA methods (as of 2022), for example, Hyp-ViT. (A few words about this approach: it's a ViT architecture trained with contrastive loss, but the embeddings are projected into a hyperbolic space. As the authors claim, such a space is able to describe the nested structure of real-world data. So, the paper requires some heavy math to adapt the usual operations to hyperbolic space.)

We trained the same architecture with triplet loss, fixing the rest of the parameters: training and test transformations, image size, and optimizer. See configs in Models Zoo. The trick was in heuristics in our miner and sampler:

Category Balance Sampler forms the batches limiting the number of categories C in it. For instance, when C = 1 it puts only jackets in one batch and only jeans into another one (just an example). It automatically makes the negative pairs harder: it's more meaningful for a model to realise why two jackets are different than to understand the same about a jacket and a t-shirt.
Hard Triplets Miner makes the task even harder keeping only the hardest triplets (with maximal positive and minimal negative distances).
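
The hard mining heuristic above can be sketched in a few lines: for every anchor in a batch, take the positive with the maximal distance and the negative with the minimal distance. This is an illustration only, not the miner shipped with OML; `hardest_triplets` and the one-dimensional toy embeddings are assumptions.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hardest_triplets(embeddings, labels):
    """For every anchor return (anchor, hardest positive, hardest negative) indices."""
    triplets = []
    n = len(embeddings)
    for a in range(n):
        positives = [i for i in range(n) if i != a and labels[i] == labels[a]]
        negatives = [i for i in range(n) if labels[i] != labels[a]]
        if not positives or not negatives:
            continue  # no valid triplet for this anchor
        # hardest positive: the farthest sample with the same label
        pos = max(positives, key=lambda i: l2(embeddings[a], embeddings[i]))
        # hardest negative: the closest sample with another label
        neg = min(negatives, key=lambda i: l2(embeddings[a], embeddings[i]))
        triplets.append((a, pos, neg))
    return triplets


embeddings = [[0.0], [1.0], [3.0], [3.5], [10.0]]
labels = ["a", "a", "a", "b", "b"]
triplets = hardest_triplets(embeddings, labels)
print(triplets)
```

Limiting the categories in a batch, as the Category Balance Sampler does, shrinks the pool of easy negatives before this selection even starts.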

Here are CMC@1 scores for 2 popular benchmarks. SOP dataset: Hyp-ViT — 85.9, ours — 86.6. DeepFashion dataset: Hyp-ViT — 92.5, ours — 92.1. Thus, utilising simple heuristics and avoiding heavy math we are able to perform on SotA level.

What about Self-Supervised Learning?

Recent research in SSL has definitely obtained great results. The problem is that these approaches require an enormous amount of compute to train a model. In our framework, however, we consider the most common case, when the average user has no more than a few GPUs.

At the same time, it would be unwise to ignore success in this sphere, so we still exploit it in two ways:

As a source of checkpoints that would be great to start training with. From publications and our experience, they are much better as initialisation than the default supervised model trained on ImageNet. Thus, we added the possibility to initialise your models using these pretrained checkpoints only by passing an argument in the config or the constructor.
As a source of inspiration. For example, we adapted the idea of a memory bank from MoCo for the TripletLoss.
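
The memory bank idea can be sketched as a fixed-size FIFO queue of embeddings and labels collected from previous batches. This is a simplified illustration under that assumption; the `MemoryBank` class below is hypothetical and does not reproduce how OML combines the bank with its miner.

```python
from collections import deque


class MemoryBank:
    """Keeps the most recent `size` (embedding, label) pairs."""

    def __init__(self, size):
        # deque with maxlen drops the oldest items automatically
        self._items = deque(maxlen=size)

    def update(self, embeddings, labels):
        """Add the embeddings of the current batch to the bank."""
        for emb, label in zip(embeddings, labels):
            self._items.append((emb, label))

    def get(self):
        """Return stored embeddings and labels, oldest first."""
        embs = [emb for emb, _ in self._items]
        lbls = [label for _, label in self._items]
        return embs, lbls

    def __len__(self):
        return len(self._items)


bank = MemoryBank(size=3)
bank.update([[1.0], [2.0]], ["a", "b"])  # batch 1
bank.update([[3.0], [4.0]], ["a", "c"])  # batch 2 pushes out the oldest item
bank_embeddings, bank_labels = bank.get()
print(bank_embeddings, bank_labels)
```

Samples from the bank can then be used as extra candidates when forming triplets, so the combinations are not limited to the current batch.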

Do I need to know other frameworks to use OML?

No, you don't. OML is framework-agnostic. Although we use PyTorch Lightning as a loop runner for the experiments, we also keep the possibility to run everything on pure PyTorch. Thus, only a tiny part of OML is Lightning-specific, and we keep this logic separate from other code (see oml.lightning). Even when you use Lightning, you don't need to know it, since we provide ready-to-use Pipelines.

The possibility of using pure PyTorch and the modular structure of the code leave room for using OML with your favourite framework after implementing the necessary wrappers.

Can I use OML without any knowledge in DataScience?

Yes. To run the experiment with Pipelines you only need to write a converter to our format (it means preparing the .csv table with a few predefined columns). That's it!
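
A converter of this kind can be as small as the sketch below, which writes a table with one row per item. The column names (`label`, `path`, `split`, `is_query`, `is_gallery`), the file paths and the rule that every validation item is both a query and a gallery item are assumptions made for this example; the dataset format page is the authoritative source for the required columns.

```python
import csv
import io

# Assumed column names for this sketch; verify them against the dataset format docs.
COLUMNS = ["label", "path", "split", "is_query", "is_gallery"]


def build_rows(records):
    """Turn (path, label, split) tuples into table rows."""
    rows = []
    for path, label, split in records:
        is_val = split == "validation"
        rows.append({
            "label": label,
            "path": path,
            "split": split,
            # here every validation item is both a query and a gallery item
            "is_query": "True" if is_val else "",
            "is_gallery": "True" if is_val else "",
        })
    return rows


def to_csv(rows):
    """Serialize rows to CSV text (write it to a file in real use)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


records = [
    ("images/jacket_1.jpg", 0, "train"),
    ("images/jacket_2.jpg", 0, "train"),
    ("images/jeans_1.jpg", 1, "validation"),
    ("images/jeans_2.jpg", 1, "validation"),
]
table = to_csv(build_rows(records))
print(table)
```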

Probably we already have a suitable pre-trained model for your domain in our Models Zoo. In this case, you don't even need to train it.

Can I export models to ONNX?

Currently, we don't support exporting models to ONNX directly. However, you can use the built-in PyTorch capabilities to achieve this. For more information, please refer to this issue.

DOCUMENTATION

TUTORIAL TO START WITH: English | Russian | Chinese

The DEMO for our paper STIR: Siamese Transformers for Image Retrieval Postprocessing
Meet OpenMetricLearning (OML) on Marktechpost
The report for Berlin-based meetup: "Computer Vision in production". November, 2022. Link

Installation

pip install -U open-metric-learning; # minimum dependencies
pip install -U open-metric-learning[nlp]

DockerHub

docker pull omlteam/oml:gpu
docker pull omlteam/oml:cpu

OML features

Losses | Miners

miner = AllTripletsMiner()
miner = NHardTripletsMiner()
miner = MinerWithBank()
...
criterion = TripletLossWithMiner(0.1, miner)
criterion = ArcFaceLoss()
criterion = SurrogatePrecision()

Samplers

labels = train.get_labels()
l2c = train.get_label2category()
sampler = BalanceSampler(labels)
sampler = CategoryBalanceSampler(labels, l2c)
sampler = DistinctCategoryBalanceSampler(labels, l2c)

Configs support

max_epochs: 10
sampler:
  name: balance
  args:
    n_labels: 2
    n_instances: 2

Pre-trained models

model_hf = AutoModel.from_pretrained("roberta-base")
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
extractor_txt = HFWrapper(model_hf)

extractor_img = ViTExtractor.from_pretrained("vits16_dino")
transforms, _ = get_transforms_for_pretrained("vits16_dino")

Post-processing

emb = inference(extractor, dataset)
rr = RetrievalResults.from_embeddings(emb, dataset)

postprocessor = AdaptiveThresholding()
rr_upd = postprocessor.process(rr, dataset)

Post-processing by NN | Paper

embeddings = inference(extractor, dataset)
rr = RetrievalResults.from_embeddings(embeddings, dataset)

postprocessor = PairwiseReranker(ConcatSiamese(), top_n=3)
rr_upd = postprocessor.process(rr, dataset)

Logging

logger = TensorBoardPipelineLogger()
logger = NeptunePipelineLogger()
logger = WandBPipelineLogger()
logger = MLFlowPipelineLogger()
logger = ClearMLPipelineLogger()

PML

from pytorch_metric_learning import losses

criterion = losses.TripletMarginLoss(0.2, "all")
pred = ViTExtractor()(data)
criterion(pred, gts)

Categories support

# train
loader = DataLoader(CategoryBalanceSampler())

# validation
rr = RetrievalResults.from_embeddings()
m.calc_retrieval_metrics_rr(rr, query_categories)

Misc metrics

embeddings = inference(model, dataset)
rr = RetrievalResults.from_embeddings(embeddings, dataset)

m.calc_retrieval_metrics_rr(rr, precision_top_k=(5,))
m.calc_fnmr_at_fmr_rr(rr, fmr_vals=(0.1,))
m.calc_topological_metrics(embeddings, pcf_variance=(0.5,))

Lightning

import pytorch_lightning as pl

model = ViTExtractor.from_pretrained("vits16_dino")
clb = MetricValCallback(EmbeddingMetrics(dataset))
module = ExtractorModule(model, criterion, optimizer)

trainer = pl.Trainer(max_epochs=3, callbacks=[clb])
trainer.fit(module, train_loader, val_loader)

Lightning DDP

clb = MetricValCallback(EmbeddingMetrics(val))
module = ExtractorModuleDDP(
    model, criterion, optimizer, train, val
)

ddp = {"devices": 2, "strategy": DDPStrategy()}
trainer = pl.Trainer(max_epochs=3, callbacks=[clb], **ddp)
trainer.fit(module)

Examples

Here is an example of how to train, validate and post-process the model on a tiny dataset of images or texts. See more details on dataset format.

The first listing below is for IMAGES and the second one is for TEXTS.

from torch.optim import Adam
from torch.utils.data import DataLoader

from oml import datasets as d
from oml.inference import inference
from oml.losses import TripletLossWithMiner
from oml.metrics import calc_retrieval_metrics_rr
from oml.miners import AllTripletsMiner
from oml.models import ViTExtractor
from oml.registry import get_transforms_for_pretrained
from oml.retrieval import RetrievalResults, AdaptiveThresholding
from oml.samplers import BalanceSampler
from oml.utils import get_mock_images_dataset

model = ViTExtractor.from_pretrained("vits16_dino").to("cpu").train()
transform, _ = get_transforms_for_pretrained("vits16_dino")

df_train, df_val = get_mock_images_dataset(global_paths=True)
train = d.ImageLabeledDataset(df_train, transform=transform)
val = d.ImageQueryGalleryLabeledDataset(df_val, transform=transform)

optimizer = Adam(model.parameters(), lr=1e-4)
criterion = TripletLossWithMiner(0.1, AllTripletsMiner(), need_logs=True)
sampler = BalanceSampler(train.get_labels(), n_labels=2, n_instances=2)


def training():
    for batch in DataLoader(train, batch_sampler=sampler):
        embeddings = model(batch["input_tensors"])
        loss = criterion(embeddings, batch["labels"])
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        print(criterion.last_logs)


def validation():
    embeddings = inference(model, val, batch_size=4, num_workers=0)
    rr = RetrievalResults.from_embeddings(embeddings, val, n_items=3)
    rr = AdaptiveThresholding(n_std=2).process(rr)
    rr.visualize(query_ids=[2, 1], dataset=val, show=True)
    print(calc_retrieval_metrics_rr(rr, map_top_k=(3,), cmc_top_k=(1,)))


training()
validation()

from torch.optim import Adam
from torch.utils.data import DataLoader
from transformers import AutoModel, AutoTokenizer

from oml import datasets as d
from oml.inference import inference
from oml.losses import TripletLossWithMiner
from oml.metrics import calc_retrieval_metrics_rr
from oml.miners import AllTripletsMiner
from oml.models import HFWrapper
from oml.retrieval import RetrievalResults, AdaptiveThresholding
from oml.samplers import BalanceSampler
from oml.utils import get_mock_texts_dataset

model = HFWrapper(AutoModel.from_pretrained("bert-base-uncased"), 768).to("cpu").train()
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

df_train, df_val = get_mock_texts_dataset()
train = d.TextLabeledDataset(df_train, tokenizer=tokenizer)
val = d.TextQueryGalleryLabeledDataset(df_val, tokenizer=tokenizer)

optimizer = Adam(model.parameters(), lr=1e-4)
criterion = TripletLossWithMiner(0.1, AllTripletsMiner(), need_logs=True)
sampler = BalanceSampler(train.get_labels(), n_labels=2, n_instances=2)


def training():
    for batch in DataLoader(train, batch_sampler=sampler):
        embeddings = model(batch["input_tensors"])
        loss = criterion(embeddings, batch["labels"])
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        print(criterion.last_logs)


def validation():
    embeddings = inference(model, val, batch_size=4, num_workers=0)
    rr = RetrievalResults.from_embeddings(embeddings, val, n_items=3)
    rr = AdaptiveThresholding(n_std=2).process(rr)
    rr.visualize(query_ids=[2, 1], dataset=val, show=True)
    print(calc_retrieval_metrics_rr(rr, map_top_k=(3,), cmc_top_k=(1,)))


training()
validation()

Output (IMAGES)

{'active_tri': 0.125, 'pos_dist': 82.5, 'neg_dist': 100.5}  # batch 1
{'active_tri': 0.0, 'pos_dist': 36.3, 'neg_dist': 56.9}     # batch 2

{'cmc': {1: 0.75}, 'precision': {5: 0.75}, 'map': {3: 0.8}}

Output (TEXTS)

{'active_tri': 0.0, 'pos_dist': 8.5, 'neg_dist': 11.0}  # batch 1
{'active_tri': 0.25, 'pos_dist': 8.9, 'neg_dist': 9.8}  # batch 2

{'cmc': {1: 0.8}, 'precision': {5: 0.7}, 'map': {3: 0.9}}

Extra illustrations, explanations and tips for the code above.

Retrieval by trained model

Here is an inference time example (in other words, retrieval on test set). The code below works for both texts and images.

from oml.datasets import ImageQueryGalleryDataset
from oml.inference import inference
from oml.models import ViTExtractor
from oml.registry import get_transforms_for_pretrained
from oml.utils import get_mock_images_dataset
from oml.retrieval import RetrievalResults, AdaptiveThresholding

_, df_test = get_mock_images_dataset(global_paths=True)
del df_test["label"]  # we don't need gt labels for doing predictions

extractor = ViTExtractor.from_pretrained("vits16_dino").to("cpu")
transform, _ = get_transforms_for_pretrained("vits16_dino")

dataset = ImageQueryGalleryDataset(df_test, transform=transform)
embeddings = inference(extractor, dataset, batch_size=4, num_workers=0)

rr = RetrievalResults.from_embeddings(embeddings, dataset, n_items=5)
rr = AdaptiveThresholding(n_std=3.5).process(rr)
rr.visualize(query_ids=[0, 1], dataset=dataset, show=True)

# you get the ids of retrieved items and the corresponding distances
print(rr)

Retrieval by trained model: streaming & txt2im

Here is an example where queries and galleries are processed separately.

First, it may be useful for streaming retrieval, when a gallery (index) set is huge and fixed, but queries are coming in batches.
Second, queries and galleries may have different natures, for example, queries are texts, but galleries are images.

import pandas as pd

from oml.datasets import ImageBaseDataset
from oml.inference import inference
from oml.models import ViTExtractor
from oml.registry import get_transforms_for_pretrained
from oml.retrieval import RetrievalResults, ConstantThresholding
from oml.utils import get_mock_images_dataset

extractor = ViTExtractor.from_pretrained("vits16_dino").to("cpu")
transform, _ = get_transforms_for_pretrained("vits16_dino")

paths = pd.concat(get_mock_images_dataset(global_paths=True))["path"]
galleries, queries1, queries2 = paths[:20], paths[20:22], paths[22:24]

# gallery is huge and fixed, so we only process it once
dataset_gallery = ImageBaseDataset(galleries, transform=transform)
embeddings_gallery = inference(extractor, dataset_gallery, batch_size=4, num_workers=0)

# queries come "online" in stream
for queries in [queries1, queries2]:
    dataset_query = ImageBaseDataset(queries, transform=transform)
    embeddings_query = inference(extractor, dataset_query, batch_size=4, num_workers=0)

    # for the operation below we are going to provide integrations with vector search DB like QDrant or Faiss
    rr = RetrievalResults.from_embeddings_qg(
        embeddings_query=embeddings_query, embeddings_gallery=embeddings_gallery,
        dataset_query=dataset_query, dataset_gallery=dataset_gallery
    )
    rr = ConstantThresholding(th=80).process(rr)
    rr.visualize_qg([0, 1], dataset_query=dataset_query, dataset_gallery=dataset_gallery, show=True)
    print(rr)

Pipelines

Pipelines provide a way to run metric learning experiments via changing only the config file. All you need is to prepare your dataset in a required format.

See Pipelines folder for more details:

Feature extractor pipeline
Retrieval re-ranking pipeline

Zoo

How to use text models?

Here is a lightweight integration with HuggingFace Transformers models. You can replace it with other arbitrary models inherited from IExtractor.

Note, we don't have our own text models zoo at the moment.

pip install open-metric-learning[nlp]

from transformers import AutoModel, AutoTokenizer

from oml.models import HFWrapper

model = AutoModel.from_pretrained('bert-base-uncased').eval()
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
extractor = HFWrapper(model=model, feat_dim=768)

inp = tokenizer(text="Hello world", return_tensors="pt", add_special_tokens=True)
embeddings = extractor(inp)

How to use image models?

You can use an image model from our Zoo or any other model, as long as it inherits from IExtractor.

from oml.const import CKPT_SAVE_ROOT as CKPT_DIR, MOCK_DATASET_PATH as DATA_DIR
from oml.models import ViTExtractor
from oml.registry import get_transforms_for_pretrained

model = ViTExtractor.from_pretrained("vits16_dino").eval()
transforms, im_reader = get_transforms_for_pretrained("vits16_dino")

img = im_reader(DATA_DIR / "images" / "circle_1.jpg")  # put path to your image here
img_tensor = transforms(img)
# img_tensor = transforms(image=img)["image"]  # for transforms from Albumentations

features = model(img_tensor.unsqueeze(0))

# Check other available models:
print(list(ViTExtractor.pretrained_models.keys()))

# Load checkpoint saved on a disk:
model_ = ViTExtractor(weights=CKPT_DIR / "vits16_dino.ckpt", arch="vits16", normalise_features=False)

Image models zoo

Models, trained by us. The metrics below are for 224 x 224 images:

model	cmc1	dataset	weights	experiment
`ViTExtractor.from_pretrained("vits16_inshop")`	0.921	DeepFashion Inshop	link	link
`ViTExtractor.from_pretrained("vits16_sop")`	0.866	Stanford Online Products	link	link
`ViTExtractor.from_pretrained("vits16_cars")`	0.907	CARS 196	link	link
`ViTExtractor.from_pretrained("vits16_cub")`	0.837	CUB 200 2011	link	link

Models trained by other researchers. Note that some metrics on particular benchmarks are so high because these benchmarks were part of the training dataset (for example, for unicom). The metrics below are for 224 x 224 images:

model	Stanford Online Products	DeepFashion InShop	CUB 200 2011	CARS 196
`ViTUnicomExtractor.from_pretrained("vitb16_unicom")`	0.700	0.734	0.847	0.916
`ViTUnicomExtractor.from_pretrained("vitb32_unicom")`	0.690	0.722	0.796	0.893
`ViTUnicomExtractor.from_pretrained("vitl14_unicom")`	0.726	0.790	0.868	0.922
`ViTUnicomExtractor.from_pretrained("vitl14_336px_unicom")`	0.745	0.810	0.875	0.924
`ViTCLIPExtractor.from_pretrained("sber_vitb32_224")`	0.547	0.514	0.448	0.618
`ViTCLIPExtractor.from_pretrained("sber_vitb16_224")`	0.565	0.565	0.524	0.648
`ViTCLIPExtractor.from_pretrained("sber_vitl14_224")`	0.512	0.555	0.606	0.707
`ViTCLIPExtractor.from_pretrained("openai_vitb32_224")`	0.612	0.491	0.560	0.693
`ViTCLIPExtractor.from_pretrained("openai_vitb16_224")`	0.648	0.606	0.665	0.767
`ViTCLIPExtractor.from_pretrained("openai_vitl14_224")`	0.670	0.675	0.745	0.844
`ViTExtractor.from_pretrained("vits16_dino")`	0.648	0.509	0.627	0.265
`ViTExtractor.from_pretrained("vits8_dino")`	0.651	0.524	0.661	0.315
`ViTExtractor.from_pretrained("vitb16_dino")`	0.658	0.514	0.541	0.288
`ViTExtractor.from_pretrained("vitb8_dino")`	0.689	0.599	0.506	0.313
`ViTExtractor.from_pretrained("vits14_dinov2")`	0.566	0.334	0.797	0.503
`ViTExtractor.from_pretrained("vits14_reg_dinov2")`	0.566	0.332	0.795	0.740
`ViTExtractor.from_pretrained("vitb14_dinov2")`	0.565	0.342	0.842	0.644
`ViTExtractor.from_pretrained("vitb14_reg_dinov2")`	0.557	0.324	0.833	0.828
`ViTExtractor.from_pretrained("vitl14_dinov2")`	0.576	0.352	0.844	0.692
`ViTExtractor.from_pretrained("vitl14_reg_dinov2")`	0.571	0.340	0.840	0.871
`ResnetExtractor.from_pretrained("resnet50_moco_v2")`	0.493	0.267	0.264	0.149
`ResnetExtractor.from_pretrained("resnet50_imagenet1k_v1")`	0.515	0.284	0.455	0.247

The metrics may be different from the ones reported by papers, because the version of train/val split and usage of bounding boxes may differ.

Contributing guide

We welcome new contributors! Please see our contributing guide.

Acknowledgments

The project was started in 2020 as a module for the Catalyst library. I want to thank the people who worked with me on that module: Julia Shenshina, Nikita Balagansky, Sergey Kolesnikov and others.

I would like to thank the people who have continued working on this pipeline since it became a separate project: Julia Shenshina, Misha Kindulov, Aron Dik, Aleksei Tarasov and Verkhovtsev Leonid.

I also want to thank NewYorker, since part of the functionality was developed (and used) by its computer vision team led by me.

open-metric-learning's Issues

Move our checkpoints from Google Drive to Amazon S3

Check the input for retrieval metric

Before computing metrics, we should check if every query has a valid gallery with the same label (don't forget mask_to_ignore)

how fast is it?
how to avoid checking every epoch?

Add CLIP (model)

Add query reranking techniques

probably this feature requires changing retrieval metrics signatures

I checked these techniques from Re-Id, link. They did not work even with different hyperparameters: the results were slightly worse than the original ones.

Integrate some of the popular checkpoints from VISSL repo as new pretrained checkpoints for OML

link: https://github.com/facebookresearch/vissl/blob/main/MODEL_ZOO.md

Memory bank with categories

Memory bank can generate a lot of easy negatives. We need to find a way to use CategorySampler + CategoryMemoryBank.

Probably we can use MemoryBank as is, but modify the sampler (or create new one) so that it can generate several identical categories in a row. Possibly with the same labels.

Check if RandomSizedBBoxSafeCrop from albu improves score and if so add it to the repo

Check how crop augs affect metric

Fix DataLoaders + Samplers so they can work with DDP correctly

Image caching improvements

Make 2 options of caching for Datasets:

cache_arrays - cache np.array after reading and decoding. This is what we have now.
cache_bytes - cache only bytes without decoding from jpg, png, ... to np.array. Like the previous option, it removes the HDD bottleneck, but in addition it significantly reduces memory consumption and allows caching more data
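
The trade-off behind cache_bytes can be sketched as follows: the encoded bytes are read from disk once and kept in memory, while decoding is repeated on every access. In this toy example zlib stands in for an image codec, and `BytesCache`, `read_bytes` and `fake_disk` are hypothetical names; none of this is the proposed OML implementation.

```python
import zlib


class BytesCache:
    """Caches encoded bytes; decoding happens on every access."""

    def __init__(self, read_bytes, decode):
        self._read_bytes = read_bytes
        self._decode = decode
        self._cache = {}

    def get(self, path):
        if path not in self._cache:
            # the only disk access for this path
            self._cache[path] = self._read_bytes(path)
        return self._decode(self._cache[path])

    def cached_size(self):
        """Number of bytes currently held in memory."""
        return sum(len(data) for data in self._cache.values())


# toy stand-ins: zlib plays the role of an image codec
raw_image = bytes(10_000)                          # the "decoded array"
fake_disk = {"img.jpg": zlib.compress(raw_image)}  # the "encoded file"
disk_reads = []


def read_bytes(path):
    disk_reads.append(path)
    return fake_disk[path]


cache = BytesCache(read_bytes, zlib.decompress)
first = cache.get("img.jpg")
second = cache.get("img.jpg")
print(len(disk_reads), cache.cached_size(), len(first))
```

The price of the smaller cache is the extra decoding work on each access, which is the cost that cache_arrays avoids.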

Support augs outside of Albu

Dataset cache is duplicated WORLD_SIZE times

In DDP mode each GPU runs in its own process, so the processes don't share a common cache

Add batches keys to the datasets as arguments

Add the tracked value which can show the dimension collapse problem

We may follow the idea from LeCun's paper and use the percentage of "big" singular values in SVD
Technically we may put it inside the EmbeddingMetric

Another approach is to track how many principal components in PCA we need to keep 95% of accuracy. (What is the policy for the case when PCA improves accuracy?)

Add a linter to find outdated doc strings

Add docker to CI and run the tests there

Test triplet loss vs torch version failure

During checking my code I found some strange error here

Log query/gallery matrix of images after the validation stage

Upload code to Neptune for reproducibility

Integrate OML with Nvidia-DALI

DALI allows decoding images directly on GPU which improves performance but it also requires a bit different interface for Dataset and we cannot use normal DataLoader anymore. So, integration may be a bit tricky

Before working on this feature we need to discuss a proposal on a high level

Rework handling weights

now we have to perform weights="random" which is not convenient, it's better to use weights=None

our integration with the pretrained models on imagenet after the update of torchvision should be reworked as well

Rework registry

kwargs -> **kwargs
serialise all of the objects, then use them for tests together with partial

Implement script for extracting features

we need something like python extract.py and YAML config which parametrizes the model and DataLoader

probably we should store features in hdf5 format which may be useful for users without knowing python

Unify names of methods and attributes for samplers

Samplers have different names of batches in epoch, batch size
Avoid calculation in @property

Fix the reproducibility issue

Log metrics as bar charts

We may compute metrics on different categories, see docs.

The problem appears when we log and plot these metrics. They are shown on different plots, so it's hard to compare different categories. It would be great to see those metrics as bar charts for easier side-by-side comparison.

Let's update a categories example so we have a function that creates a bar chart (as an image in np format) based on provided metrics dictionary, so we can visualize it.

After it's done, we can add this functionality to Pipelines Logging.

New names for samplers' parameters

p,k,c -> l,s,c

Visualise 2-d embedding space for TripletLoss and for ArcFace for some simple dataset like MNIST

There are several purposes:

1 - understand how these losses behave in comparison with each other
2 - understand what's going on if we combine them
3 - we can write a short post on Medium about it to attract new people to OML

Add a linter to find unused code

Add doc about adding custom augs / models / datasets to registry

Find a way how to avoid accumulating all batches with Lightning

PL accumulates all outputs from train and val during the epoch

Then you can call

    def training_epoch_end(self, outputs: EPOCH_OUTPUT) -> None:
        ...

    def validation_epoch_end(self, outputs: EPOCH_OUTPUT) -> None:
        ...

Rework logs handling

if we always return logs we have a conflict with default losses
if we do non-strict unpacking (loss, *log = criterion(inp)) we have a problem that python tries to iterate over the 1st tensor dimension in the case with no logs
passing flags to init? The problem with a chain of flags is that if all of the elements require logs except the last one, then we still calculate all of the logs but don't use them
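
The non-strict unpacking problem from the list above can be reproduced without any tensor library. In this sketch `FakeTensor` is a hypothetical stand-in for a tensor with one dimension, and both criteria are toy functions written only to show the failure mode.

```python
class FakeTensor:
    """Stand-in for a tensor: iterating yields elements along the first dimension."""

    def __init__(self, values):
        self.values = list(values)

    def __iter__(self):
        return iter(self.values)


def criterion_with_logs(per_sample_losses):
    """Returns (loss, logs), so star-unpacking works as intended."""
    loss = sum(per_sample_losses) / len(per_sample_losses)
    return loss, {"n_samples": len(per_sample_losses)}


def criterion_without_logs(per_sample_losses):
    """Returns only a tensor-like object, as a default loss would."""
    return FakeTensor(per_sample_losses)


loss, *logs = criterion_with_logs([1.0, 2.0, 3.0])
print(loss, logs)

# No error is raised here: Python silently iterates over the tensor itself,
# so "loss" becomes its first element and "logs" the remaining ones.
wrong_loss, *wrong_logs = criterion_without_logs([1.0, 2.0, 3.0])
print(wrong_loss, wrong_logs)
```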

Make an example on mock dataset and put it in the Docs here. It should look like this:

embeddings = inference(model, val, batch_size=4, num_workers=0)
rr = RetrievalResults.from_embeddings(embeddings, val, n_items=3)

embeddings_upd = PCA()(embeddings)
rr_upd = RetrievalResults.from_embeddings(embeddings_upd, val, n_items=3)

# after that we compare results visually and/or by metrics

Check if it boosts metrics on one of our benchmarks. Check how much the retrieval time is improved.
Write a short report below the example on the same Docs page.

PS. OML already has a PCA implementation, so, no need to bring an extra requirement.
