


Finetuner helps you create experiments to improve embeddings on search tasks. It accompanies you in delivering the last mile of performance tuning for neural search applications.

Task-oriented finetuning for better embeddings on neural search


Fine-tuning is an effective way to improve performance on neural search tasks. However, setting up and performing fine-tuning can be very time-consuming and resource-intensive.

Jina AI's Finetuner makes fine-tuning easier and faster by streamlining the workflow and handling all complexity and infrastructure in the cloud. With Finetuner, one can easily enhance the performance of pre-trained models, making them production-ready without buying expensive hardware.

📈 Performance promise: enhance the performance of pre-trained models and deliver state-of-the-art performance on domain-specific neural search applications.

🔱 Simple yet powerful: easy access to 40+ mainstream loss functions, 10+ optimisers, layer pruning, weight freezing, dimensionality reduction, hard-negative mining, cross-modal models, and distributed training.

☁️ All-in-cloud: train using our free GPU infrastructure, manage runs, experiments and artifacts on Jina AI Cloud without worrying about resource availability, complex integration, or infrastructure costs.

Documentation

Benchmarks

| Model  | Task                                         | Metric | Pretrained | Finetuned | Delta | Run it!       |
|--------|----------------------------------------------|--------|------------|-----------|-------|---------------|
| BERT   | Quora Question Answering                     | mRR    | 0.835      | 0.967     | 15.8% | Open In Colab |
|        |                                              | Recall | 0.915      | 0.963     | 5.3%  |               |
| ResNet | Visual similarity search on TLL              | mAP    | 0.110      | 0.196     | 78.2% | Open In Colab |
|        |                                              | Recall | 0.249      | 0.460     | 84.7% |               |
| CLIP   | Deep Fashion text-to-image search            | mRR    | 0.575      | 0.676     | 17.4% | Open In Colab |
|        |                                              | Recall | 0.473      | 0.564     | 19.2% |               |
| M-CLIP | Cross market product recommendation (German) | mRR    | 0.430      | 0.648     | 50.7% | Open In Colab |
|        |                                              | Recall | 0.247      | 0.340     | 37.7% |               |

All metrics were evaluated for k@20 after training for 5 epochs using the Adam optimizer with learning rates of 1e-4 for ResNet, 1e-7 for CLIP and 1e-5 for the BERT models.
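For reference, the two evaluation metrics above can be computed from ranked retrieval results in a few lines of plain Python. This is a minimal sketch with toy data, not the benchmark's actual evaluation code:

```python
def mean_reciprocal_rank(ranked_results, relevant):
    # For each query, take the reciprocal of the 1-based rank of the
    # first relevant hit, then average over all queries.
    total = 0.0
    for results, rel in zip(ranked_results, relevant):
        for rank, doc_id in enumerate(results, start=1):
            if doc_id in rel:
                total += 1.0 / rank
                break
    return total / len(ranked_results)


def recall_at_k(ranked_results, relevant, k=20):
    # Fraction of each query's relevant documents found in the top-k,
    # averaged over all queries.
    total = 0.0
    for results, rel in zip(ranked_results, relevant):
        total += len(set(results[:k]) & rel) / len(rel)
    return total / len(ranked_results)


# Two toy queries: ranked document ids and the set of relevant ids per query.
ranked = [['d1', 'd2', 'd3'], ['d9', 'd4', 'd5']]
relevant = [{'d2'}, {'d4', 'd5'}]

print(mean_reciprocal_rank(ranked, relevant))  # (1/2 + 1/2) / 2 = 0.5
print(recall_at_k(ranked, relevant, k=2))      # (1.0 + 0.5) / 2 = 0.75
```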

Install

Make sure you have Python 3.7+ installed. Finetuner can be installed via pip by executing:

```bash
pip install -U finetuner
```

If you want to encode docarray.DocumentArray objects with the finetuner.encode function, you need to install "finetuner[full]". This includes a number of additional dependencies, which are necessary for encoding: Torch, Torchvision and OpenCLIP:

```bash
pip install "finetuner[full]"
```

โš ๏ธ Starting with version 0.5.0, Finetuner computing is performed on Jina AI Cloud. The last local version is 0.4.1. This version is still available for installation via pip. See Finetuner git tags and releases.

Get Started

The following code snippet describes how to fine-tune ResNet50 on the Totally Looks Like dataset. You can run it as-is. The model and training data are already hosted in Jina AI Cloud and Finetuner will download them automatically. (NB: If there is already a run called resnet50-tll-run, choose a different run-name in the code below.)

```python
import finetuner
from finetuner.callback import EvaluationCallback

finetuner.login()

run = finetuner.fit(
    model='resnet50',
    run_name='resnet50-tll-run',
    train_data='tll-train-data',
    callbacks=[
        EvaluationCallback(
            query_data='tll-test-query-data',
            index_data='tll-test-index-data',
        )
    ],
)
```

This code snippet performs the following steps:

  1. Log in to Jina AI Cloud.
  2. Select the backbone model, the training data, and the evaluation data for the evaluation callback.
  3. Start the cloud run.

You can also pass data to Finetuner as a CSV file or a DocumentArray object, as described in the Finetuner documentation.

Depending on the data, task, model, and hyperparameters, fine-tuning might take some time to finish. You can leave your jobs running on the Jina AI Cloud and reconnect to them later with code like this:

```python
import finetuner

finetuner.login()

run = finetuner.get_run('resnet50-tll-run')

for log_entry in run.stream_logs():
    print(log_entry)

run.save_artifact('resnet-tll')
```

This code logs into Jina AI Cloud, then connects to your run by name. After that, it does the following:

  • Monitors the status of the run and prints out the logs.
  • Saves the model once fine-tuning is done.

Using Finetuner to encode

Finetuner provides an interface for using fine-tuned models to encode data:

```python
import finetuner
from docarray import Document, DocumentArray

da = DocumentArray([Document(uri='~/Pictures/your_img.png')])

model = finetuner.get_model('resnet-tll')
finetuner.encode(model=model, data=da)

da.summary()
```

When encoding, you can provide data either as a DocumentArray or a list. Since the modality of your input data can be inferred from the model being used, there is no need to provide any additional information besides the content you want to encode. When providing data as a list, the finetuner.encode method will return a np.ndarray of embeddings, instead of a docarray.DocumentArray:

```python
import finetuner
from docarray import Document, DocumentArray

images = ['~/Pictures/your_img.png']

model = finetuner.get_model('resnet-tll')
embeddings = finetuner.encode(model=model, data=images)
```
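Once you have embeddings, a nearest-neighbour lookup can be done with plain NumPy. The following is a minimal sketch of cosine-similarity search; the vectors here are random placeholders standing in for real model outputs, and the embedding dimension of 512 is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
index_embeddings = rng.normal(size=(100, 512))  # stand-in for an indexed corpus
query_embedding = rng.normal(size=(512,))       # stand-in for one encoded query

# Cosine similarity is the dot product of L2-normalised vectors.
index_norm = index_embeddings / np.linalg.norm(index_embeddings, axis=1, keepdims=True)
query_norm = query_embedding / np.linalg.norm(query_embedding)
scores = index_norm @ query_norm

# Indices of the 5 most similar indexed items, best first.
top_k = np.argsort(scores)[::-1][:5]
print(top_k, scores[top_k])
```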

Training on your own data

If you want to train a model using your own dataset instead of one on the Jina AI Cloud, you can provide labeled data in a CSV file.

A CSV file is a tab or comma-delimited plain text file. For example:

```text
This is an apple    apple_label
This is a pear      pear_label
...
```

The file should have two columns: the first for the data and the second for the category label.
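A file in this shape can be produced with Python's built-in csv module. This is a minimal sketch; the file name `train.csv` and the row contents are illustrative:

```python
import csv

rows = [
    ('This is an apple', 'apple_label'),
    ('This is a pear', 'pear_label'),
]

# Write a tab-delimited file with one (text, label) pair per line.
with open('train.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter='\t')
    writer.writerows(rows)
```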

You can then provide a path to a CSV file as training data for Finetuner:

```python
run = finetuner.fit(
    model='bert-base-cased',
    run_name='bert-my-own-run',
    train_data='path/to/some/data.csv',
)
```

More information on providing your own training data is found in the Prepare Training Data section of the Finetuner documentation.

Next steps

Read our documentation to learn more about what Finetuner can do.

Support

Join Us

Finetuner is backed by Jina AI and licensed under Apache-2.0.

We are actively hiring AI engineers and solution engineers to build the next generation of open-source AI ecosystems.

