Nucleus

https://dashboard.scale.com/nucleus

Aggregate metrics in ML are not good enough. To improve production ML, you need to understand your models' qualitative failure modes, fix them by gathering more data, and curate diverse scenarios.

Scale Nucleus helps you:

  • Visualize your data
  • Curate interesting slices within your dataset
  • Review and manage annotations
  • Measure and debug your model performance

Nucleus is a new way—the right way—to develop ML models, helping us move away from the concept of one dataset and towards a paradigm of collections of scenarios.

Installation

$ pip install scale-nucleus

Common issues/FAQ

Outdated Client

Nucleus is iterating rapidly, and as a result we do not always preserve backwards compatibility with older versions of the client. If you run into an unexpected error, it's a good idea to upgrade your version of the client by running:

pip install --upgrade scale-nucleus
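
Before upgrading, it can help to know which client version is currently installed. The following is a minimal sketch using only the Python standard library; the helper name installed_version is hypothetical and is not part of the Nucleus client.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "scale-nucleus") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed version string, or None if the package is missing.
print(installed_version())
```

Including this version string in a bug report makes it easier to tell whether an error comes from an outdated client.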

Usage

The first step to using the Nucleus library is instantiating a client object. The client abstraction serves to authenticate the user and acts as the gateway for users to interact with their datasets, models, and model runs.

Create a client object

import nucleus
client = nucleus.NucleusClient("YOUR_API_KEY_HERE")
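
To avoid hardcoding the API key in source files, one option is to read it from an environment variable. This is a sketch under assumptions: the variable name NUCLEUS_API_KEY and the helper load_api_key are hypothetical choices, not something the client library defines.

```python
import os


def load_api_key(env_var: str = "NUCLEUS_API_KEY") -> str:
    """Read the API key from an environment variable (the name is a hypothetical choice)."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Nucleus API key.")
    return api_key


# Usage, assuming scale-nucleus is installed and the variable is set:
#   import nucleus
#   client = nucleus.NucleusClient(load_api_key())
```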

Create Dataset

dataset = client.create_dataset("My Dataset")

List Datasets

datasets = client.list_datasets()

Delete a Dataset

Delete a dataset by specifying the target dataset ID. A response code of 200 indicates successful deletion.

client.delete_dataset("YOUR_DATASET_ID")

Append Items to a Dataset

You can append both local images and images from the web. Simply specify the location and Nucleus will automatically infer if it's remote or a local file.

from nucleus import DatasetItem
dataset_item_1 = DatasetItem(image_location="./1.jpeg", reference_id="1", metadata={"key": "value"})
dataset_item_2 = DatasetItem(image_location="s3://srikanth-nucleus/9-1.jpg", reference_id="2", metadata={"key": "value"})

The append function expects a list of DatasetItem objects to upload, like this:

response = dataset.append([dataset_item_1, dataset_item_2])
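
The README does not describe how Nucleus decides whether a location is remote or local. As an illustration only, one plausible way to make that distinction is to look at the URL scheme. The set of schemes and the helper is_remote below are assumptions, not the library's actual logic.

```python
from urllib.parse import urlparse

# Assumed set of remote schemes, for illustration only.
REMOTE_SCHEMES = {"http", "https", "s3", "gs"}


def is_remote(image_location: str) -> bool:
    """Guess whether a location string points to a remote resource."""
    return urlparse(image_location).scheme.lower() in REMOTE_SCHEMES


print(is_remote("./1.jpeg"))
print(is_remote("s3://srikanth-nucleus/9-1.jpg"))
```

A check like this can be handy for validating a list of locations before building DatasetItem objects.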

Get Dataset Info

Returns the dataset name, number of dataset items, model_runs, and slice_ids.

dataset.info

Access Dataset Items

There are three methods to access individual Dataset Items:

(1) Dataset Items are accessible by reference id

item = dataset.refloc("my_img_001.png")

(2) Dataset Items are accessible by index

item = dataset.iloc(0)

(3) Dataset Items are accessible by the dataset_item_id assigned internally

item = dataset.loc("dataset_item_id")

Add Annotations

Upload ground truth annotations for the items in your dataset. BoxAnnotation has the same format as https://dashboard.scale.com/nucleus/docs/api#add-ground-truth

from nucleus import BoxAnnotation
annotation_1 = BoxAnnotation(reference_id="1", label="label", x=0, y=0, width=10, height=10, annotation_id="ann_1", metadata={})
annotation_2 = BoxAnnotation(reference_id="2", label="label", x=0, y=0, width=10, height=10, annotation_id="ann_2", metadata={})
response = dataset.annotate([annotation_1, annotation_2])

For particularly large payloads, please refer to the accompanying scripts in references.
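
One generic way to handle a large payload is to split it into smaller batches and upload each batch separately. The sketch below is not the accompanying script; the helper chunked and the batch size of 500 in the usage comment are hypothetical choices for illustration.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive batches of at most `batch_size` items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Usage, assuming `dataset` and a list `annotations` already exist:
#   for batch in chunked(annotations, 500):
#       response = dataset.annotate(batch)
```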

Add Model

The model abstraction is intended to represent a unique architecture. Models are independent of any dataset.

model = client.add_model(name="My Model", reference_id="newest-cnn-its-new", metadata={"timestamp": "121012401"})

Upload Predictions to ModelRun

This method populates the model_run object with predictions. ModelRun objects need to reference a Dataset that has already been created. The call returns the associated model_id, the human-readable name of the run, its status, and user-specified metadata. It takes a list of BoxPrediction objects, where BoxPrediction follows the format described at https://dashboard.scale.com/nucleus/docs/api#upload-model-outputs

from nucleus import BoxPrediction
prediction_1 = BoxPrediction(reference_id="1", label="label", x=0, y=0, width=10, height=10, annotation_id="pred_1", confidence=0.9)
prediction_2 = BoxPrediction(reference_id="2", label="label", x=0, y=0, width=10, height=10, annotation_id="pred_2", confidence=0.2)

model_run = model.create_run(name="My Model Run", metadata={"timestamp": "121012401"}, dataset=dataset, predictions=[prediction_1, prediction_2])

Commit ModelRun

The commit action indicates that the user is finished uploading predictions associated with this model run. Committing a model run kicks off Nucleus internal processes to calculate performance metrics like IoU. After being committed, a ModelRun object becomes immutable.

model_run.commit()
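
For readers unfamiliar with IoU (intersection over union), here is a minimal sketch of the metric for two axis-aligned boxes. It assumes boxes are given as (x, y, width, height) with (x, y) as the minimum corner, which this README does not state explicitly. It shows the standard formula, not Nucleus's internal implementation.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Overlap along each axis, clamped at zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


# Two 10x10 boxes offset by 5 along x: intersection 50, union 150.
print(box_iou((0, 0, 10, 10), (5, 0, 10, 10)))
```

An IoU of 1.0 means the predicted and ground truth boxes coincide, and 0.0 means they do not overlap.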

Get ModelRun Info

Returns the associated model_id, human-readable name of the run, status, and user-specified metadata.

model_run.info

Accessing ModelRun Predictions

You can access the ModelRun predictions for an individual dataset_item through three methods:

(1) User-specified reference_id

model_run.refloc("my_img_001.png")

(2) Index

model_run.iloc(0)

(3) Internally maintained dataset_item_id

model_run.loc("dataset_item_id")

Delete ModelRun

Delete a model run using the target model_run_id.

A response code of 200 indicates successful deletion.

client.delete_model_run("model_run_id")

For Developers

Clone from GitHub and install as editable:

git clone git@github.com:scaleapi/nucleus-python-client.git
cd nucleus-python-client
pip3 install poetry
poetry install

Please install the pre-commit hooks by running the following command:

poetry run pre-commit install

Best practices for testing:

(1) Please run pytest from the root directory of the repo, i.e.

poetry run pytest tests/test_dataset.py

(2) To skip slow integration tests that have to wait for an async job to start, run:

poetry run pytest -m "not integration"
