Pachyderm Examples

Pachyderm Examples is a curated list of examples that use Pachyderm to accomplish various tasks.

Getting Started

  • Intro to Pachyderm Tutorial - A notebook introduction to Pachyderm, using the pachctl command line utility to illustrate the basics of Pachyderm data repositories and pipelines
  • Boston Housing Prices - A machine learning pipeline to train a regression model on the Boston Housing Dataset to predict the value of homes.
  • Boston Housing Prices (Intermediate) - Extends the original Boston Housing Prices example to show a multi-pipeline DAG and data rollbacks.
  • Market Sentiment - Train and deploy a fully automated financial market sentiment BERT model. As data is manually labeled, the model will automatically retrain and deploy.
  • Object Detection - Train an object detector on the COCO128 dataset with Lightning Flash, modify predictions with Label Studio, and version everything in Pachyderm.

Notebooks

Data Labeling

  • Label Studio Integration - Incorporate data versioning into any labeling project with Label Studio and Pachyderm.
  • Superb AI Integration - Version labeled image datasets created in Superb AI Suite using a cron pipeline.
  • Toloka Integration - Uses Pachyderm to create crowdsourced annotation jobs for news headlines in Toloka, aggregate the labeled data, and train a model.

Data Warehouse

  • BigQuery - Connector that ingests the result of a BigQuery query into Pachyderm as a Parquet file.
  • Churn Prediction with Snowflake - Create a churn analysis model for a music streaming service with Pachyderm and Snowflake using the Data Warehouse integration.

Machine Learning

  • Boston Housing Prices (Intermediate) - Extends the original Boston Housing Prices example to show a multi-pipeline DAG and data rollbacks.
  • Breast Cancer Detection - A breast cancer detection system based on radiology scans scaled and visualized using Pachyderm.
  • AutoML - A Pachyderm pipeline that uses mljar-supervised to train a machine learning model on a CSV file.
  • Market Sentiment - Train and deploy a fully automated financial market sentiment BERT model. As data is manually labeled, the model will automatically retrain and deploy.
  • Apache Spark - MLflow Integration - End-to-end example demonstrating the full ML training process of a fraud detection model with Spark, MLlib, MLflow, and Pachyderm.

ML Experiment Tracking

  • Weights and Biases - Log pipelines running in Pachyderm to Weights and Biases.
  • ClearML Integration - Log Pachyderm experiments to ClearML's experiment monitoring platform, using Pachyderm Secrets.

Model Deployment

People

Contributors

armaanv, bbonenfant, brendoncarroll, chainlink, djanicekpach, jeffrifwald, jimmywhitaker, jrockway, lbliii, msteffen, nadegepepin, pappasilenus, tybritten

Issues

Breast cancer detection example is broken with CRIO

I've deployed Pachyderm on AKS. The latest versions of AKS no longer seem to use the Docker daemon, so the deployment now works with --no-expose-docker-socket.

I just tried the breast cancer example and found that it no longer works, because it uses relative paths in the pipeline spec.

I'm in the process of making this work, so hopefully I'll have a PR for it. But I wanted to give you a heads up, as this is probably a general issue that affects all the other examples. To my understanding, the simplest solution is to provide the "working_dir" in the pipeline spec.
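A minimal sketch of what setting "working_dir" could look like, assuming the field sits under the "transform" section of the pipeline spec. The pipeline name, input repo, image, command, and directory below are hypothetical placeholders, not values taken from the breast cancer example.

```python
import json


def build_pipeline_spec(name, input_repo, image, cmd, working_dir):
    """Build a pipeline spec dict with an explicit working directory.

    Setting working_dir gives relative paths in cmd a fixed base,
    instead of depending on the container runtime's default directory.
    """
    return {
        "pipeline": {"name": name},
        "input": {"pfs": {"repo": input_repo, "glob": "/*"}},
        "transform": {
            "image": image,
            "cmd": cmd,
            # Assumption: working_dir is a field of the transform section.
            "working_dir": working_dir,
        },
    }


# All values here are hypothetical placeholders.
spec = build_pipeline_spec(
    name="example-pipeline",
    input_repo="example-input",
    image="example/image:latest",
    cmd=["python3", "run.py"],
    working_dir="/workdir",
)

# Emit JSON that could be saved to a file and passed to pachctl.
print(json.dumps(spec, indent=2))
```

With the directory set explicitly, a relative path such as run.py would resolve against /workdir regardless of the container runtime in use.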

Pachyderm not an option for label studio cloud storage

This is strange because I saw Pachyderm as an option for cloud storage just yesterday, not even 24 hours ago. I am running Label Studio in a Python virtual environment, as they recommend for Ubuntu users, and this is the same way I accessed it yesterday.

I am running a minikube cluster with port-forwarding for the Pachyderm service, and I had to open a separate terminal to connect to Pachyderm. I have Pachyderm connected to localhost:30650 and I can open the console in my browser to see my repositories.

Occasionally I see this output from the terminal running label-studio:

[2023-06-01 13:56:01,768] [urllib3.connectionpool::urlopen::823] [WARNING] Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1131)'))': /api/5820521/envelope/

but I am still able to work with it in the browser. Any help is greatly appreciated.
