
caikit's Introduction

Caikit

Badges: Build Status · Minimum Python Version · Release · Read the Docs · OpenSSF Best Practices

Caikit is an AI toolkit that enables users to manage models through a set of developer friendly APIs. It provides a consistent format for creating and using AI models against a wide variety of data domains and tasks.

Caikit Overview

Capabilities

Caikit streamlines the management of AI models for application usage by letting AI model authors focus on solving well-known problems with novel technology. With a set of model implementations based on Caikit, you can:

  • Run training jobs to create models from your data
  • Run model inference using data APIs that represent data as structures rather than tensors
  • Implement the right training techniques to fit the model, from static regexes to multi-GPU distribution
  • Merge models from diverse AI communities into a common API (e.g. transformers, tensorflow, sklearn, etc...)
  • Update applications to newer models for a given task without client-side changes

What Differentiates Caikit from Other AI Model Runtimes?

Developers who write applications that consume AI models are not necessarily AI experts who understand the intricate details of the AI models that they use. Some would like to treat AI as a "black box function": they give it input and it returns the output. This is similar to cloud computing, where some users would like to deploy their applications to the cloud without detailed knowledge of the cloud infrastructure. The value for them is in their application, and that is what is of most interest to them.

Caikit provides an abstraction layer for application developers where they can consume AI models through APIs independent of understanding the data form of the model. In other words, the input and output to the model is in a format which is easily programmable and does not require data transformations. This facilitates the model and the application to evolve independently of each other.

When deploying a small handful of models, this benefit is minimal. The benefits are generally realized when consuming tens or hundreds of AI models, or when maintaining an application over time as AI technology evolves. Caikit simplifies the scaling and maintenance of such integrations compared to other runtimes, because other runtimes require an AI-centric view of the model (for example, the common interface of "tensor in, tensor out"), which means coding different data transformations into the application for each model. Additionally, the data form of the model may change from version to version.

Getting Started

There are two key things to define up front when using Caikit to manage your AI model:

  • The module defines the entry points for Caikit to manage your model. In other words, it tells Caikit how to load, infer, and train your model. An example is the text sentiment module.
  • The data model defines the inputs and outputs of the model task. An example is the text sentiment data model.
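For illustration, here is a minimal sketch of a module and its data model, loosely modeled on the text sentiment example (the class names, module id, and import paths are illustrative assumptions, not code copied from the example):

# Minimal sketch; names and import paths are illustrative assumptions
from caikit.core import ModuleBase, module
from caikit.core.data_model import DataObjectBase, dataobject

@dataobject
class TextInput(DataObjectBase):
    text: str

@dataobject
class SentimentPrediction(DataObjectBase):
    label: str
    score: float

# The module ties it together: how to load, infer, and train the model
@module("00af2203-0405-0607-0809-0a0b02dd0e0f", "SampleSentimentModule", "0.1.0")
class SampleSentimentModule(ModuleBase):
    def run(self, text_input: TextInput) -> SentimentPrediction:
        # A real module would call the wrapped model here, and would also
        # implement load/save (and train, if applicable)
        return SentimentPrediction(label="positive", score=0.99)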

The model is served by a gRPC server which can run as-is or in any container runtime, including Knative and KServe. Here is an example of the text sentiment server code for gRPC. This references the module configuration here; the configuration specifies the module(s), which wrap the model(s), to serve.
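For instance, booting the runtime can be as small as the following (this is the same entry point that appears in the sample code of an issue further down this page):

# Start the Caikit gRPC runtime; it serves the modules named in the configuration
from caikit.runtime import grpc_server

grpc_server.main()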

There is an example of a client here: a simple Python CLI which calls the model and queries it for sentiment analysis on two different pieces of text. The client also references the module configuration.
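A client along those lines might look like the sketch below. The service-package calls and the message/RPC names (HuggingFaceSentimentTaskRequest, HuggingFaceSentimentTaskPredict) are assumptions modeled on the example; the mm-model-id metadata header is the one described in the issues further down this page:

import grpc
from caikit.runtime.service_factory import ServicePackageFactory

# Build the inference service descriptors generated from the library's modules
inference_service = ServicePackageFactory.get_service_package(
    ServicePackageFactory.ServiceType.INFERENCE,
)

channel = grpc.insecure_channel("localhost:8085")
client = inference_service.stub_class(channel)

request = inference_service.messages.HuggingFaceSentimentTaskRequest(
    text_input="I am not feeling well today!"
)
# mm-model-id selects which loaded model should handle the request
response = client.HuggingFaceSentimentTaskPredict(
    request, metadata=[("mm-model-id", "text_sentiment")]
)
print(response)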

Check out the full Text Sentiment example to understand how to load and infer a model using Caikit. If you want to get started with developing and integrating your AI model algorithm using Caikit, check out the GitHub template. In the template repository, clicking the green Use this template button generates a repository in your GitHub account with a simple customized module wrapped to be served by the Caikit runtime. This template is designed to be extended for module implementations.

User Profiles

There are two user profiles who leverage Caikit:

  • AI Model Author:
    • Model Authors build and train AI models for data analysis
    • They bring data and tuning params to a pre-existing model architecture and create a new concrete model using APIs provided by Caikit
    • Examples of model authors are machine learning engineers, data scientists, and AI developers
  • AI Model Operator:
    • Model operators use an existing AI model to perform a specific function within the context of an application
    • They take trained models, deploy them, and then infer the models in applications through APIs provided by Caikit
    • Examples of operators are cloud and embedded application developers whose applications need analysis of unstructured data

Documentation

Get going with Getting Started or jump into more details with the Python API docs.

Contributing

Check out our contributing guide to learn how to contribute to Caikit.

Code of Conduct

Participation in the Caikit community is governed by the Code of Conduct.


caikit's Issues

Pure python rest gateway

Description

As an author of a caikit library, I want to have a REST interface available without any extra work.

Discussion

Currently we use the grpc-gateway wrapper to provide REST support. This is problematic because:

  • It requires a custom build process using go to pull the proto interfaces from the library, generate a gateway, and place it in a container image with configuration carefully wired up
  • Using go in a build process widens the vulnerability surface of the final runtime image
  • grpc-gateway supports Swagger v2, but we would really like OpenAPI v3 support

We have the expertise available for both OpenAPI and protobuf/gRPC to write our own REST gateway layer in the runtime. This would allow us to support REST out of the box in python, with no extra build process or go required.

Acceptance Criteria

  • Unit tests cover new/changed code
  • Examples build against new/changed code
  • READMEs are updated
  • Type of semantic version change is identified

Support optional fields and fields with defaults

Is your feature request related to a problem? Please describe.


jtd-to-proto now supports optional fields, and we want to add this support to caikit to correctly support use cases where input fields are optional/nullable and use cases where input fields have default values.

Describe the solution you'd like

Track optional vs. default args on introspection of arg types

Surface coverage report to GitHub, gate on coverage level

Is your feature request related to a problem? Please describe.

This is related to issue #15, but applies to caikit's CI infrastructure rather than caikit itself.

Describe the solution you'd like

To keep the coverage level above the desired threshold, it would be helpful to have the coverage data published as a comment on PRs, and possibly to enforce a minimum required level of coverage.

This helps PR reviewers to catch PRs that would reduce coverage before they are merged.

Describe alternatives you've considered

There are GitHub apps in the marketplace that could provide this feature, for instance python-coverage. I'm not recommending this one specifically; it would need to be vetted.

Server should fail if port is not available and find_available_port=False

Describe the bug

When I start a second server on the same port (e.g. 8085), it appears to run just fine, but requests go to the first server, which leads to misleading results and confusion.

Expected behavior

When find_available_port=False and the port is busy, the server should simply FAIL to start so the conflict can be resolved (either by finding and killing the other server, configuring a different port, or setting find_available_port=True).

Observed behavior

  1. Start server1 on 8085 without a "model_T"
  2. Start server2 on 8085 with a "model_T" while the other is still running.
  3. Use a client expecting to hit model_T on server2.
  4. Look at the output and see "model not loaded" messages from server1.
  5. Look at the output from server2. It is quiet but happily doing nothing.

Additional context

For example, run a server in some terminal and forget about it / hide it. Start working in a new terminal and wonder what is going on.
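A minimal sketch of the fail-fast check described above, assuming the server probes the port with a plain socket bind before starting:

import socket

def assert_port_free(host: str, port: int) -> None:
    # Raise instead of silently losing requests to another server
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as err:
            raise RuntimeError(
                f"Port {port} is already in use; stop the other server, "
                "configure a different port, or set find_available_port=True"
            ) from err

assert_port_free("0.0.0.0", 8085)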

Have CODEOWNERS instead of or with OWNERS

Describe the bug

There is a blank OWNERS file. For GitHub, there should properly be a CODEOWNERS file; see https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-code-owners and https://github.blog/2017-07-06-introducing-code-owners/, which states the feature was inspired by Chromium's OWNERS files (https://chromium.googlesource.com/chromium/src/+/master/docs/code_reviews.md#OWNERS-files).


Improve quick start examples in README

Description

As a potential model author, or potential model operator, I want to be able to have some small snippets or tutorial to figure out how to start using caikit!

Acceptance Criteria

  • READMEs are updated

Union of return types throws an exception

Describe the bug

If I have a run function defined like this:

def run(self, sample_input: SampleInputType) -> Union[OtherOutputType, str]:

I get an exception:

AttributeError: '_SpecialForm' object has no attribute 'get_proto_class'

Platform

Please provide details about the environment you are using, including the following:

  • Interpreter version: 3.9
  • Library version: 0.1.3

Sample Code

Another example:

  def run(self, sample_input: SampleInputType) -> str:

gives me:

AttributeError: type object 'str' has no attribute 'get_proto_class'

Note: This has been deemed invalid since we do not want to support returning primitive types through modules.

Expected behavior

The introspection logic should iterate through the Union defined and pick the data model type from it.

Observed behavior

self = typing.Union[sample_lib.data_model.sample.OtherOutputType, str], attr = 'get_proto_class'

    def __getattr__(self, attr):
        # We are careful for copy and pickle.
        # Also for simplicity we don't relay any dunder names
        if '__origin__' in self.__dict__ and not _is_dunder(attr):
>           return getattr(self.__origin__, attr)
E           AttributeError: '_SpecialForm' object has no attribute 'get_proto_class'

Additional context

While returning the return_type from serializers.py, we should perhaps call the py_type_to_proto_type method (that existed before) to pick the data model from the Union type - or think about reusing some code from it!
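A minimal sketch of the introspection fix suggested above, using standard typing helpers (data_model_base stands in for caikit's data model base class, e.g. DataBase):

import typing

def pick_data_model_type(annotation, data_model_base):
    # For Unions, take the first member that is a data model class
    if typing.get_origin(annotation) is typing.Union:
        for arg in typing.get_args(annotation):
            if isinstance(arg, type) and issubclass(arg, data_model_base):
                return arg
        raise TypeError(f"No data model type found in {annotation}")
    return annotation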

Implement module hierarchy collapse to module

Description

We have decided to remove the module class hierarchy since the hierarchy was not enforced through guidelines, and over time it became confusing for both developers and users to determine the module class (e.g. block, workflow, resource) of a given functionality.

Discussion

  • We have decided to remove the different module types (e.g. block, workflow, resource) and name this one remaining "type" a module.
  • With composition, tasks can be added to the module to describe that a module meets a task inference spec.

Acceptance Criteria

  • Remove block, workflow, resource and replace with module
  • Determine a path forward for registries
  • Determine a path forward for current WorkflowSaver functionality - should it remain? Go in ModuleSaver?
  • Update any tests and documentation
  • Announce a breaking change!

Quickstart/docs for model users

Description

As a user of ML models, I want to know the steps to bootstrapping or loading a model with caikit, so that I can onboard my supported models to caikit and run them

Discussion

This will require some template / example repos to exist. See #64 and #67

We should cover at least the bootstrap/load and local run case, maybe we can leave the whole "how do I deploy this model in Fargate / Code Engine / Openshift" for later 😉

Acceptance Criteria

  • READMEs are updated

Rename/refactor `serializers`

Is your feature request related to a problem? Please describe.

We previously had these protobuf serializers in the service_generation package which would write protobuf files.

We offloaded the protobuf serialization support to jtd_to_proto, but kept these classes around and hijacked them to be the containers to hold data for our conversions from python functions to both

  1. @dataobject classes for the request messages
  2. RPC methods for the service definition

Describe the solution you'd like

These "serializers" (TaskPredictRPC, ModuleTrainRPC) should be renamed, and should probably own the logic to do like:

def to_data_model(self) -> Type[DataBase]:
    # return the data model class describing the request message

def to_service_json(self) -> Dict:
    # return some JSON snippet for the service definition

Currently that logic is a big mess in the service factory.

Describe alternatives you've considered

/shrug

Additional context

We'd like to remove any and all protobuf logic from the caikit code

Remove the import cycles!

Describe the bug

We have a bunch of import cycles :(

We would like to remove them and then turn on the no-import-cycles flag in pylint

Platform

n/a

Sample Code

n/a

Expected behavior

no cycles!

Observed behavior

pydeps --show-cycles caikit shows some fun import cycles, and pylint fails

Additional context

Remove non-none fields when doing oneof to_dict and to_json

Describe the bug

# Imports assumed; the exact paths for FieldNumber/OneofField may differ
import json
from typing import Annotated, Union

from caikit.core.data_model import DataObjectBase, dataobject
from py_to_proto.dataclass_to_proto import FieldNumber, OneofField

@dataobject
class Foo(DataObjectBase):
    foo: Union[
        Annotated[int, FieldNumber(10), OneofField("foo_int")],
        Annotated[float, FieldNumber(20), OneofField("foo_float")],
    ]

foo1 = Foo(foo_int=2)
json_repr_foo = foo1.to_json()
assert json.loads(json_repr_foo) == {
    "foo_int": 2,
    "foo_float": None,
}

"foo_float": None should not be in the to_json represenatation.

Platform

Please provide details about the environment you are using, including the following:

  • Interpreter version: 3.9
  • Library version: 0.5.2

Expected behavior

All None fields should not show up.
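A minimal sketch of the expected behavior, independent of caikit's actual to_json implementation: strip None-valued fields from the dict representation before dumping.

import json

def to_json_without_nones(obj_dict: dict) -> str:
    # Drop unset (None) fields so they never reach the JSON output
    cleaned = {key: value for key, value in obj_dict.items() if value is not None}
    return json.dumps(cleaned)

# The unset oneof member disappears from the JSON representation
assert json.loads(to_json_without_nones({"foo_int": 2, "foo_float": None})) == {"foo_int": 2}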

Testing the examples in CI

Is your feature request related to a problem? Please describe.

We should test the examples as part of CI to ensure that they are kept up to date with the latest changes to caikit so that the project always provides a great experience for newcomers trying out caikit.

Describe the solution you'd like

A CI job that runs all the examples under https://github.com/caikit/caikit/tree/main/examples

Describe alternatives you've considered

The alternative today is to ask PR authors and reviewers to check the examples manually.
It could be added to the PR template as a checkbox to remind contributors about it.

Additional context

See the issue solved in #145

Get to 100% test coverage

Describe the bug

We don't have 100% line coverage in unit tests.
We really like having 100% test coverage, so that we can ensure all future changes at least have tests that invoke all lines of code.

We've already caught bugs in some modules by requiring 100% coverage, and keeping it once you have it is not hard

Platform

every platform

Sample Code

pytest tests --cov --cov-fail-under=100

Expected behavior

passing tests 😎

Observed behavior

~80% coverage

Additional context

Coverage reports can be found in the htmlcov directory after running tests

tox -e 3.9
open htmlcov/index.html

Implement OSSF Scorecard

Is your feature request related to a problem? Please describe.

The OpenSSF scorecard helps projects implement security best practices. Quoting from their website:

We created Scorecard to help open source maintainers improve their security best practices and to help open source consumers judge whether their dependencies are safe.

Describe the solution you'd like

The scorecard covers various aspects of an open source project, so implementing it will require an initial assessment and possibly a number of subtasks to be created. The end result would be to have the OSSF scorecard badge in the main README.

Describe alternatives you've considered

The security posture of the project does not depend on the scorecard, but it's a nice tool to guide projects in the right direction.

Document initial architectural decisions

Is your feature request related to a problem? Please describe.

During the initial implementation of caikit we iterated through quite a few design choices, and landed on some important ones after trial and error. We should document those for reference.

Describe the solution you'd like

A few ADRs about key decisions that have already been made

One or more documents about general architectural principles that we consider when making architectural decisions, to guide future work

Describe alternatives you've considered

Not documenting lmao

Additional context

Possible decisions to consider:

  • Using protobuf as the base backend for all data models, open to extension
  • JTD as the single source of schema definition for data models
  • Runtime and interfaces co-located with caikit
  • train (classmethod) instead of fit (instance method)
  • load and save semantics
  • distributed backends

Support runtime base path for DataStreamSource

Is your feature request related to a problem? Please describe.

When running caikit.runtime and using DataStreamSource to reference on-disk files, the client making the training request should not need to know about the directory structure for the running server (e.g. the mounting path for a shared volume in a kubernetes deployment).

Describe the solution you'd like

Add a global runtime configuration option to configure a shared base path where the training API will look for referenced data stream files.
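A minimal sketch of the resolution logic, where data_stream_base_path is a hypothetical name for the proposed config option:

import os
from typing import Optional

def resolve_data_file(requested_path: str, base_path: Optional[str]) -> str:
    # Only relative client-supplied paths are re-rooted under the base path
    if base_path and not os.path.isabs(requested_path):
        return os.path.join(base_path, requested_path)
    return requested_path

# A client can then just ask for "train.jsonl" without knowing the mount point
print(resolve_data_file("train.jsonl", "/mnt/shared-data"))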

Stop inferring tasks in service generation

Is your feature request related to a problem? Please describe.

Modules will now have a TASK_CLASS attribute, pointing to a @task with required input parameters and an output type.

Currently the service_generation code tries to infer the task from the model's module path, which is obscure and flaky.
Additionally, we rely on the runtime.service_generation.primitive_data_model_types to define what is "allowed" to be in the inputs for our tasks.

Describe the solution you'd like

We can use the module.TASK_CLASS to group modules by task

We can turn the "which data models are allowed to be here" logic around and just check that the modules have the required input parameters to be included in the task rpc. Any other proto-able input parameters in the run signatures should be fine.

Additional context

To help support users who have some modules in a task that accept extra data model parameters, which are currently disallowed
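A minimal sketch of grouping modules by task, using the TASK_CLASS attribute named in this issue (the registry shape is an assumption):

from collections import defaultdict

def group_modules_by_task(module_classes):
    by_task = defaultdict(list)
    for module_class in module_classes:
        task = getattr(module_class, "TASK_CLASS", None)
        if task is not None:  # modules without a task get no task RPC
            by_task[task].append(module_class)
    return by_task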

Primitive oneof (discriminator) support in JTD (and eventually caikit)

Is your feature request related to a problem? Please describe.

Currently it's not possible to generate this oneof through DataObjects:

oneof time {
    int64 ts_int   = 1;
    float ts_float = 2;
}

Describe the solution you'd like

I want to be able to define a schema within a DataObject such that the proto is defined like this:

oneof time {
    int64 ts_int   = 1;
    float ts_float = 2;
}

Describe alternatives you've considered

I can create a discriminator like this:

{
  "properties": {
    "whatever": {
      "discriminator": "sequence",
      "mapping": {
        "TS_INT": {
          "properties": {
            "ts_int": {"type": "int32"}
          }
        },
        "TS_FLOAT": {
          "properties": {
            "ts_float": {"type": "float32"}
          }
        }
      }
    }
  }
}

but this creates the primitives one layer deeper than what I want.

message TSINT {
  /*-- fields --*/
  int32 ts_int = 1;
}

message TSFLOAT {
  /*-- fields --*/
  float ts_float = 1;
}

/*-- fields --*/

/*-- oneofs --*/
oneof sequence {
  caikit_data_model.sample_lib.SampleDataModelOneof.TSINT ts_int = 1;
  caikit_data_model.sample_lib.SampleDataModelOneof.TSFLOAT ts_float = 2;
}


Allow multiple output types (names) in caikit runtime

Is your feature request related to a problem? Please describe.

Currently caikit.runtime deduces the return type based on the request type name, <task-type>Request -> <task-type>Prediction, which is implemented here. However, a user may need output type names other than <task-type>Prediction.

Describe the solution you'd like

One possible solution might be for output type patterns to be made configurable via caikit.runtime.config, allowing a list of output "suffixes".

Describe alternatives you've considered

Renaming all my output types for the .run function to use the Prediction suffix; however, the term "prediction" doesn't represent all the domains of AI problems.


Fix batcher test deadlocks to enable tests

Describe the bug

Some batcher tests tend to deadlock occasionally and are skipped for now (by pytest).

Sample Code

pytest tests/runtime/model_management/test_batcher (may need to be run multiple times)

Expected behavior

No deadlocks, tests pass consistently

Observed behavior

Occasionally tests will deadlock - test_valid_req_after_invalid_req tends to deadlock the most but others may as well

Use request metadata for the model name for training

Is your feature request related to a problem? Please describe.

The generated training APIs currently include a model_name parameter that defines the name of the model to train.
This can collide with an actual training parameter if the train function contains a model_name parameter, e.g.

def train(training_data: DataStream[MyDataType], model_name: str = "roberta") -> MyModelClass:

Describe the solution you'd like

Similar to how the mm-model-id header is used during inference to identify the model to run, a grpc metadata header can be used at training time to set the name of the model to train

Describe alternatives you've considered

We could nest the training parameters under a sub-message in the request, but that makes the API a bit more awkward for users. There may also be some contexts where training parameters are already nested in a sub-field of a training-orchestration-system-specific message that hits a translator in front of caikit, so further nesting would make the messages super complex.

Additional context

We would like to avoid any situations where we reserve a parameter name, one of our goals is to be a seamless runtime on top of python code that should not enforce any extra constraints where not strictly necessary.
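A client-side sketch of the proposal, mirroring the mm-model-id pattern; the training-model-name header key and the MyTaskTrain RPC are hypothetical names:

def train_with_name(training_stub, train_request, model_name: str):
    # The request carries only real training parameters; the model name
    # travels as gRPC metadata, so no parameter name is reserved.
    # "training-model-name" and MyTaskTrain are hypothetical names.
    return training_stub.MyTaskTrain(
        train_request,
        metadata=[("training-model-name", model_name)],
    )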

Validate task modules at decoration time

Is your feature request related to a problem? Please describe.

When a @module is decorated with a task=SomeTask, the task class' validation should be run to ensure that the module is correct.

Describe the solution you'd like

The module decoration should fail, giving users good feedback that their module is incorrect.

Describe alternatives you've considered

We've been deferring this to runtime service generation, which causes a whole lot of problems: modules are built, and only later, when the runtime doesn't work, does it turn out that the module requires changes.

Additional context

This would generally make users' code fail to import if their run functions are ill-specified for the given task
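A minimal sketch of decoration-time validation; task.required_params is a hypothetical attribute holding the task's required run parameters:

import inspect

def module(task):
    def decorator(cls):
        run_params = set(inspect.signature(cls.run).parameters) - {"self"}
        missing = set(task.required_params) - run_params
        if missing:
            # Failing here makes ill-specified modules fail at import time
            raise TypeError(
                f"{cls.__name__}.run() is missing required task parameters: "
                f"{sorted(missing)}"
            )
        return cls
    return decorator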

Union within a Train function argument throws an exception

Describe the bug

If we have a train function defined like so:

def train(
        cls,
        sample_input: Union[SampleInputType, str],
    )

I see an exception:

ValueError: Invalid input schema, cannot handle unions yet: typing.Union[sample_lib.data_model.sample.SampleInputType, str]

Platform

Please provide details about the environment you are using, including the following:

  • Interpreter version: 3.9
  • Library version: 0.1.4

Sample Code

Already provided

Expected behavior

The introspection needs to pick the primitive or primitive_data_model_types from the Union.


Investigate OpenTelemetry for metrics

Is your feature request related to a problem? Please describe.

The py-grpc-prometheus dependency for metrics serving is fairly outdated and not very well maintained. We want to investigate a more maintained solution for exposing metrics such as model loading time, request input size, and input throughput.

We don't want to just remove Prometheus completely because users could want these for use cases involving observability.

Describe the solution you'd like

Look into OpenTelemetry: https://opentelemetry.io/docs/instrumentation/python/exporters/

Potentially helpful: https://www.timescale.com/blog/prometheus-vs-opentelemetry-metrics-a-complete-guide/

  • Swap out Prometheus metrics for OpenTelemetry metrics
  • Configure Prometheus exporter or another exporter (and document user guidance)
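A minimal sketch of the OpenTelemetry direction with a Prometheus exporter (metric names and the scrape port are illustrative; the packages are opentelemetry-sdk, opentelemetry-exporter-prometheus, and prometheus-client):

from opentelemetry import metrics
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from opentelemetry.sdk.metrics import MeterProvider
from prometheus_client import start_http_server

start_http_server(9090)  # Prometheus scrape endpoint
metrics.set_meter_provider(MeterProvider(metric_readers=[PrometheusMetricReader()]))

meter = metrics.get_meter("caikit.runtime")
model_load_seconds = meter.create_histogram("model_load_seconds", unit="s")
model_load_seconds.record(1.7, {"model_id": "text_sentiment"})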

Release versioning and API Compatibility Policy

Is your feature request related to a problem? Please describe.

The issue solved by #145 prompted me to think about the need for an API compatibility policy for caikit. Looking at the release numbers, it seems like the project adopts semantic versioning today, which is perfect.
To clarify end-user expectations and foster adoption I think caikit should document the versioning schema and eventually implement an API compatibility policy, specifically:

  • what is the relationship between version numbers and backward incompatible changes
  • how the project handles deprecations on API features i.e. how long they last
  • what surface of the API is protected by the API compatibility policy

Describe the solution you'd like

  1. The first step would be to document the versioning scheme.
  2. The second step should be to document the policies about backward-incompatible changes and possibly the surface of the API protected/affected by such policies. In the beginning, this could simply be that we may make backward-incompatible changes in any new minor release, with no advance notice, on any part of the API.
  3. Eventually, once the project is ready for it, it shall define an API compatibility policy that guarantees no backward-incompatible changes without notice/deprecation periods, on at least a subset of the API surface.

Simplify backend config a bit

Is your feature request related to a problem? Please describe.

  • The disable_local flag adds fun complexity; we can instead simplify the config and the code that configures backends by including LOCAL in the base config. Users can merge in more backends and keep local, or override the backends completely to remove it.
  • The name field of backends is also a bit extra; we don't have any current uses for a name field, so the behavior of allowing multiple backends of the same type but failing if they don't have unique names is confusing when the names are unused.

Describe the solution you'd like

  • Merge / override semantics of lists so that we can remove the disable_local flag
  • Support backend lists like:
load_priority:
  - type: TGIS
    config: 
      url: foo
  - type: TGIS
    config: 
      url: bar


Caikit Github information completion for Community Standards

Is your feature request related to a problem? Please describe.

Complete the GitHub information for caikit to have a complete Community Standards profile.
https://github.com/caikit/caikit/community

Describe the solution you'd like

Add the following:

  • Description
  • Security policy
  • Issue templates
  • Pull request template
  • Repository admins accept content reports

Describe alternatives you've considered

N/A. Non-critical suggestion only.

Additional context

With the Description item fixed, the About section in the Code view (https://github.com/caikit/caikit), which currently reads "No description, website, or topics provided.", will be updated.

Support oneofs/Union[int, str, ...] in function signatures

Is your feature request related to a problem? Please describe.

Running the training introspection on a function that has a Union[int, str] of two primitive types results in an error:

def train(cls, time_column: Union[int, str] = None):

output:

E           ValueError: Invalid input schema: typing.Union[int, str]

Describe the solution you'd like

For a function signature like this:

def train(cls, time_column: Union[int, str] = None):

The introspection should ideally create a oneof with the 2 fields:

oneof time_column {
    int64 ts_int  = 1;
    string ts_str = 2;
}

Provide a repository template

Description

As an AI model author, I want to spin up a new project with caikit, so that I can wrap my models in caikit interfaces to load in my applications.

Discussion

@gkumbhat has some experience setting up slick repo templates.

We can probably get one going that does any boilerplatey stuff, though we should try to remove as much as possible.

Should we include docker image support to build an image that boots the runtime?

Acceptance Criteria

  • A template repo exists that users can quickly hack on to get a caikit project running

Consolidate config

Description

As a library extending caikit (or somebody deploying a container of that library) I would like a single file defining all of my config.

(Instead of one file for core config and one for runtime config)

Discussion

The caikit.core.config and caikit.runtime.config packages developed separately, and now could use merging.
It would be great to have

  • only a single package deal with reading config
  • only a single config yml baked into the caikit release
  • only a single config file required to be written by the extension library
  • the same CONFIG_FILES sources override from the runtime config package so that configs can be easily supplied by files at runtime

The environment overrides (e.g. test.config.name vs. prod.config.name, not environment variable overrides) could be considered for deletion, it's unclear if they are actually useful

Acceptance Criteria

  • Unit tests cover new/changed code
  • Examples build against new/changed code
  • READMEs are updated
  • Type of semantic version change is identified

Bump to py-to-proto 0.3.x

Description

As a caikit maintainer, I want to bump to py-to-proto 0.3.x, so that we can utilize the oneof naming convention change (and future changes) from the repo.

Discussion

Multiple tests will need to change to adhere to the breaking changes introduced in 0.3.x.

Acceptance Criteria

  • Unit tests cover new/changed code
  • Type of semantic version change is identified

Users should be able to specify model types for model size multiplier use

Is your feature request related to a problem? Please describe.

Caikit today assumes the model type is given by model mesh, where users don't generally map specific model implementations to model types. It then uses the given model types to estimate the model size (as potentially specified in the runtime config). As is, Caikit users would have to maintain a full external mapping of KServe model types to individual Python modules, and this is not a sustainable pattern.

Describe the solution you'd like

A user should be able to set model type, potentially by actual module class of the block e.g. my_library.blocks.sample_task.SomeModelImpl and configure model size multipliers for these model types

Describe alternatives you've considered

Models could contain metadata about their size e.g. model.size - but creating a way to have that info on all models by default is also not a sustainable pattern. Users would have to know the internal architecture of their models, and this information may potentially have to extend across different devices.

For function signatures with List, send a List (and not a datastream)

Describe the bug

If the run/train function signature has a List argument, then a DataStream object is passed instead of a List object.

Platform

Please provide details about the environment you are using, including the following:

  • Interpreter version: 3.9
  • Library version: 0.4.x

Sample Code

Coming soon...

Expected behavior

If the run/train function signature requires a List, then a List should be passed and not a DataStream.

Observed behavior

A DataStream is always passed.

Additional context

I'm assuming this requires the introspection to know the function signature of the corresponding run/train function it's going to call; only then can it correctly deduce whether it should send a list or a DataStream. I think knowing the function signature ahead of time is enabled by the module consolidation PR.
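A minimal sketch of the signature-aware conversion described above, using standard introspection:

import inspect
import typing

def coerce_stream_arg(func, param_name: str, stream):
    annotation = inspect.signature(func).parameters[param_name].annotation
    if typing.get_origin(annotation) is list:
        return list(stream)  # the signature wants a List: materialize it
    return stream  # otherwise hand over the stream as-is

def train(training_data: typing.List[str]):
    return training_data

print(coerce_stream_arg(train, "training_data", iter(["a", "b"])))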

Caikit API Docs

Description

A Caikit Python API reference on readthedocs.org which describes the Caikit API.

Discussion

We've used sphinx in the past with some success. We had been heavily relying on the autodoc features to walk the import tree and document everything, but the resulting docs were a bit scattered and sometimes missing.

It might be worth explicitly annotating which apis should be documented, and laying that out in a docs tree for sphinx to generate instead.


Streaming output support

Is your feature request related to a problem? Please describe.

Text generation will require supporting modules that return DataStream[T] to stream output.
This will need to be hooked up to a unary-stream RPC.

Describe the solution you'd like

Tasks can specify DataStream[T] as output type

Service generation code adds output streaming flag in py-to-proto

Servicer implementation returns something like...

stream = model.run(request_params)  # request_params: hypothetical kwargs; run returns DataStream[T]
return iter(stream.map(lambda item: item.to_proto()))


Remove project structure requirements/assumptions

Is your feature request related to a problem? Please describe.

Currently, there are several places in the code that make assumptions about the directory/module structure for the derived library. We should identify these areas and make them more generic to allow arbitrary project structures.

Describe the solution you'd like

  • Identify all places where module structure assumptions are made
  • Fix them!

Support for module paths for runtime service generation inclusions and exclusions

Is your feature request related to a problem? Please describe.

Caikit today supports module guids for module inclusion or exclusion during runtime service generation. However, it is difficult for users looking at the config to know what modules the guids actually apply to without grepping through runtime libraries. This would be more readable if users could specify module paths e.g. my_library.blocks.sample_task.SomeModelImpl.

Describe the solution you'd like

In place of or alongside the module guid support for runtime service generation inclusion and exclusion, allow users to specify modules by path for inclusion or exclusion.

Describe alternatives you've considered

As-is, continue using only module guids for reference.

Support runtime service generation config for `modules` and `task_types`

Is your feature request related to a problem? Please describe.

ServicePackageFactory currently does not respect the modules and task types requested in config.yml.

Describe the solution you'd like

Add support in the service factory to respect the configs for:

service_generation:
  module_types:
    included: ["blocks", "workflows"]
  modules:
    excluded: []
  task_types:
    excluded: []

modules and task_types could also have an included list that excludes everything else.


Reshuffle backend semantics

We want to make the backends configuration a bit cleaner and to fit in nicely with the top-level caikit.configure() method.
We don't really want an extra backends.configure() that users need to call separately.
JK: we do want caikit.configure() to be side-effect free.

Goals:

  • The backend configuration should be smart enough to not barf if called multiple times. Currently it can only be called once or it throws. We should probably:
    • Actually call backend_instance.reconfigure() if config for that backend has changed
    • Throw only if that call throws
    • Do not attempt to call reconfigure if the configuration for that backend has not changed
  • start and stop are interesting, they can probably be invoked lazily by the backend itself. Do we need the public start_all that is currently unused?

That should alleviate the burden of many model loading tests needing to use the clunky reset_globals (needs a better name anyway) config. New backend configurations shouldn't collide with existing ones from other tests.

We should also consider refactoring to extract the loading logic of backends into its own module (or at least method) for better decoupling and testability.
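A minimal sketch of the change-detecting reconfigure described above (the backend object and config shapes are assumptions):

_last_backend_configs: dict = {}

def configure_backend(backend, name: str, config: dict) -> None:
    if _last_backend_configs.get(name) == config:
        return  # config unchanged: do not touch the backend
    backend.reconfigure(config)  # throw only if this call throws
    _last_backend_configs[name] = config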

Decorator for providing task level APIs


Delete `build_caikit_library_request_dict`

Description

As a caikit maintainer, I want to burn build_caikit_library_request_dict to the ground because it replicates all the logic from DataModels

Discussion

Earlier in the project's history, runtime request messages themselves were not data model objects, so we required custom deserialization logic to .from_proto each field individually.

This is no longer the case: the request message itself is now a data model object, so it can .from_proto itself.
However, there is some extra runtime-specific logic that we would want to keep around, like:

  • Fields that were not set at all should not be passed along
  • For task inference RPCs, if fields were set that the requested model does not support, they should also not be passed along

There may be some more custom logic hiding in that function as well that we'll need to dig into a lil' bit
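A minimal sketch of the runtime-specific filtering that should survive the deletion; protobuf's ListFields only yields fields that were explicitly set:

def build_request_kwargs(request_proto, supported_params: set) -> dict:
    kwargs = {}
    for field_descriptor, value in request_proto.ListFields():  # set fields only
        if field_descriptor.name in supported_params:  # model supports this arg
            kwargs[field_descriptor.name] = value
    return kwargs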

Acceptance Criteria

  • Unit tests cover new/changed code
  • Examples build against new/changed code
  • READMEs are updated
  • Type of semantic version change is identified

Support dataclasses as schema for data model

Description

As a Caikit model author, I want to be able to define my data model classes with dataclasses, so that I can have a Python-friendly schema to define my data models.

Discussion

We want to use the new features in py-to-proto, formerly jtd-to-proto.

Acceptance Criteria

  • Users are able to define data models with dataclasses
  • Service generation uses dataclasses
  • Unit tests cover new/changed code
  • Examples build against new/changed code

Can we not E0503...os_error:"No such file or directory all the time? Is it just me?

Describe the bug

I get an error, but I don't think it's a real error.

E0503 17:25:08.082902000 140704599909184 chttp2_server.cc:1045] UNKNOWN:No address added out of total 1 resolved for 'unix:///tmp/mmesh/grpc.sock' {created_time:"2023-05-03T17:25:08.082045-07:00", children:[UNKNOWN:Unable to configure socket {fd:6, created_time:"2023-05-03T17:25:08.080064-07:00", children:[UNKNOWN:No such file or directory {created_time:"2023-05-03T17:25:08.078961-07:00", errno:2, os_error:"No such file or directory", syscall:"bind"}]}]}

I'm looking for cleaner output for a demo, but I'm really logging this as a bug because a bogus but serious-looking error message forces everybody who has trouble getting started to deal with triaging it.

Unless it is just me. Then what might I be doing wrong?

Platform

Please provide details about the environment you are using, including the following:

  • Interpreter version: 3.9
  • Library version: 0.2.0

Sample Code

from caikit.runtime import grpc_server
grpc_server.main()

Expected behavior

Library users should not see bogus errors that look like real errors. These get in the way every time someone has to troubleshoot something.

Observed behavior

see above


Spin up a public docs site

Description

As a model user, model author, or caikit contributor, I want to open caikit.org and find documentation that helps me

Discussion

Long-term hosting is TBD; initially we'll probably want to throw some stuff in gh-pages.

Should gh-pages be hosted here in caikit/caikit or in another public docs repo?

Acceptance Criteria

  • caikit.org points to a site we manage

Documentation on how to use Caikit to manage AI models

Description

Documentation on how to use Caikit to:

  • Load/Serve an AI model
  • Infer/Run an AI model
  • Train an AI model

This should also include examples and boilerplate to support the documentation. Use open source models, like the models from Hugging Face, as examples.

Discussion

What architectural details are relevant to this user?

  • datamodel / dataobject
  • module API (run / bootstrap / save / load / train)
  • module registry?

Acceptance Criteria

  • QuickStart on how to infer a model (#195)
  • Quickstart on how to train a model
  • GitHub template which provides a boilerplate dummy model (https://github.com/caikit/caikit-template)
  • GitHub template which provides a boilerplate HuggingFace model
  • Example using an open source model (for example, from Hugging face) (#84)
  • Configuration explained
  • FAQ incorporating questions like:
    • Does Caikit add any latency when inferencing models?
    • How do I run models with different requirements like CPU/GPU?
    • How much space does Caikit need?
    • Can Caikit load any models?
    • Is there a limit on the size of a model that Caikit can run?
    • How does Caikit compare with other runtimes?
  • README updated to reference the content produced

Enforce included module types in service factory

Description

In #8, modules (in the form of module_guids) and task_types configs were enforced in the service factory, but module_types was not. We want to allow module_types to be specified, since users may generally only want to generate RPCs for blocks and workflows (default config currently), and not for other modules such as resources.

#41

Discussion

Reference TODO in code: https://github.com/caikit/caikit/blob/main/caikit/runtime/service_factory.py#L211

Acceptance Criteria

  • module_types specification is enforced in service factory
  • Unit tests cover updated code

Provide a huggingface template + examples

Is your feature request related to a problem? Please describe.

Hugging Face has loads of AI models ready to download and use. We should make the on-ramp to consuming them via caikit as simple as possible.

Describe the solution you'd like

A repo that has support for a couple of types of Hugging Face models, with bootstrap functionality to stand up caikit model artifacts with one line in the interpreter.

Describe alternatives you've considered

???


Supporting Dicts with data model serialization

Description

As a caikit library developer, I want to be able to support Dicts in function signatures (.train, .run) so that I can serve my library that may have Dict method parameters.

Acceptance Criteria

  • Support Dicts in service generation
  • Unit tests

Fix `test_train_job_works_with_wait`

Describe the bug

Flaky test - we sometimes get a "ConfigParser object has no attribute training" error.

Platform

Please provide details about the environment you are using, including the following:

  • Interpreter version: 3.9
  • Library version: ???

Sample Code

tox -e 3.9

Expected behavior

tests should all pass

Observed behavior

That test fails; I don't have a stack trace atm.
