
ai-practitioner-handbook's Introduction

AI Singapore's AI Practitioner Handbook

[Banner: AI Singapore's AI Practitioner Handbook]

Overview

This handbook is an accumulation of AI Singapore’s Innovation and Platforms Engineering team's experience in delivering more than 40 AI Minimum Viable Products (MVPs) under the 100E programme over the last 5 years. It is an edited volume of original articles written by our AI Engineers, themed around common topics encountered over a typical AI development lifecycle.

[Figure: Handbook chapters mapped onto the AI development lifecycle]

We envisioned this handbook as a guide to help new AI Engineers joining AI Singapore come up to speed quickly on how we execute AI projects. That said, the information here should also appeal to any AI Engineers and Managers deploying their first AI project into production.

How to use this book

The AI Practitioner Handbook is designed to cater to different reading styles. When read end-to-end, the chapters will cover the typical AI project lifecycle, providing a comprehensive understanding of the entire process. Alternatively, book sections can be read in a standalone manner. Each section is written in a question-and-answer format, making it easy to find relevant information quickly.

The AI Practitioner Handbook complements other resources by focusing on the practical aspects of delivering AI projects. To get the most out of this book, use it alongside resources that cover AI algorithms, techniques, and research that further build upon your AI fundamentals. By doing so, you’ll gain a comprehensive understanding of both the theoretical and practical aspects of AI project execution.

What Our Reviewers Say

"Whether your role in an AI project is that of a technical lead, AI model implementor, data manager, domain or business function expert, or business-side project manager, this handbook will accelerate your learning curve for understanding the end-to-end aspects of the AI project."
- Steven Miller, Professor Emeritus of Information Systems, Singapore Management University and co-author of Working with AI, MIT Press

"This is a fantastic book because it focuses on an often overlooked aspect of ML education—the actual problems, people and teams you deploy it for. It's a great resource for anyone who wants to successfully put the theory of ML into practice. BAM!"
- Josh Starmer, Founder and CEO at StatQuest

"AISG’s release of the AI Practitioner Handbook as a practical and credible guide to accelerate the learning curve of incoming AI scientists and engineers is a generous service to Singapore’s growing AI community."
- Jason Tamara Widjaja, Global AI Lead at a multinational biopharmaceutical company

ai-practitioner-handbook's People

Contributors

chuawjk, jadegoat, jannah-aisg, kwanchettan, rhoggs-bot-test-account, ryzalk, shafirahmad, snjannah, ssakhavi, syakyr, tappyness1, waimarn


ai-practitioner-handbook's Issues

Chapter overviews

Section Outline
This issue adds content to the overview page of each of the 8 chapters.

Definition of Done

How should we choose the appropriate data splitting strategy to mitigate bias and leakage and to improve fairness?

Section Outline
Please list the key points of content you intend to cover for this section

  1. What are the various data split strategies? (static, cross-validation, stratified, temporal; see the sketch after this list)
  2. Potential scenarios of bias in data splits and suggestions to remedy them
  3. Potential scenarios of unfairness in data splits and suggestions to remedy them
  4. Potential scenarios of data leakage in data splits and suggestions to remedy them
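A minimal sketch contrasting a stratified split with a temporal split, assuming pandas and scikit-learn; the toy DataFrame and column names (`label`, `timestamp`) are invented for illustration:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy dataset with an imbalanced label and a timestamp column.
df = pd.DataFrame({
    "feature": range(100),
    "label": [0] * 80 + [1] * 20,
    "timestamp": pd.date_range("2022-01-01", periods=100, freq="D"),
})

# Stratified split: preserve the label distribution in train and test,
# which reduces bias when classes are imbalanced.
train_df, test_df = train_test_split(
    df, test_size=0.2, stratify=df["label"], random_state=42
)

# Temporal split: sort by time and cut at a fixed point so that no
# future information leaks from the test period into training.
df_sorted = df.sort_values("timestamp")
cutoff = int(len(df_sorted) * 0.8)
train_df_t, test_df_t = df_sorted.iloc[:cutoff], df_sorted.iloc[cutoff:]
```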

Definition of Done

Example issue

Section Outline
Please list the key points of content you intend to cover for this section

Definition of Done

  • Completed <chapter-num>/<filename.md>
  • Verified against self-review checklist
  • Page builds and displays correctly

Issue on page /REVIEWING.html

As an external reader, I don't have access to the Google Drive link to the book cited at the bottom of this page. Should that link go to Amazon, perhaps? Or NLB? Or any other major source where the book can be purchased or borrowed.

What are some ML risks I should be aware of, and how do they relate to model robustness? What are some tools I can use to assess model robustness?

Section Outline

Definition of Done

What are some questions to be asked to the project sponsor to understand their deployment requirements?

Section Outline
This section covers common considerations that engineers should clarify with sponsors in relation to:

  • data transfer
  • solution architecture
  • system usage
  • deployment testing/staging environment
  • organisational restrictions

Note: Avoid questions regarding workflows & integration, as these will be covered by issue #17.

Definition of Done

  • Completed 7-solution-delivery/deployment-requirements-gathering.md
  • Verified against self-review checklist
  • Page builds and displays correctly

Edit data split strategies

Section Outline
Please list the key points of content you intend to cover for this section

  • Edit the temporal split section to cover cross-validation and “champion” model training in a temporal data split (see the sketch after this list).
  • Edit the cross-validation and nested cross-validation sections to cover “champion” model training.
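A minimal sketch of the intended addition, assuming scikit-learn; the model and data below are placeholders. Cross-validation on temporal data uses expanding time windows, and the “champion” model is refit on the full history once a configuration has been selected:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Toy time-ordered data (rows assumed sorted by time).
X = np.arange(200, dtype=float).reshape(-1, 1)
y = np.sin(X).ravel()

model = RandomForestRegressor(random_state=42)

# Temporal cross-validation: each fold validates only on data that comes
# after its training window, so no future information leaks backwards.
tscv = TimeSeriesSplit(n_splits=5)
scores = cross_val_score(model, X, y, cv=tscv, scoring="neg_mean_absolute_error")
print("MAE per fold:", -scores)

# "Champion" model: after selecting the configuration via CV,
# refit on all available data so deployment uses the full history.
champion = RandomForestRegressor(random_state=42).fit(X, y)
```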

Definition of Done

Guidelines for Classification Metrics

Section Outline
Please list the key points of content you intend to cover for this section

  • Cover guidelines for choosing classification metrics.
  • Focus on imbalanced datasets and multi-class, multi-label problems (see the sketch after this list).
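As a minimal sketch of the kind of guideline this section could give (the toy labels below are invented): on imbalanced data, accuracy can look deceptively high, so macro-averaged scores and a per-class report are usually more informative:

```python
from sklearn.metrics import accuracy_score, classification_report, f1_score

# Imbalanced toy labels, with a classifier that always predicts the majority class.
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100

print("Accuracy:", accuracy_score(y_true, y_pred))             # 0.90, looks fine
print("Macro F1:", f1_score(y_true, y_pred, average="macro"))  # ~0.47, exposes the failure
print(classification_report(y_true, y_pred, zero_division=0))  # per-class breakdown
```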

Definition of Done

What are some of the factors/questions that an AI Engineer should consider during literature review?

Section Outline

  • Brief overview of AI literature review
  • Factors to consider during literature review (in no particular order)
    • Development time
    • Business needs
    • Open-sourcing of a pre-trained model
    • Key takeaways from the literature (do not focus on results)
    • Code availability & readability (is open-source code present?)

Definition of Done

  • Completed 4-literaturereview/factors-to-consider-during-literature-review.md
  • Verified against self-review checklist
  • Added section to table of contents
  • Page builds and displays correctly on local machine

How can we provide a simple post-hoc explanation for black-box model performance to ensure reliability?

Section Outline

  • Introduction to Explainability vs Interpretability
  • Distinction between Glass-box vs Black-box models
  • Concepts surrounding Interpretability for Black-box models
  • List of established interpretability tools across various AI fields (see the sketch after this list)
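One possible post-hoc tool, shown here as an assumption rather than a confirmed part of the section, is scikit-learn's model-agnostic permutation importance, which gives a simple view of which features a black-box model relies on:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Treat the boosted ensemble as a black box and fit it as usual.
model = GradientBoostingClassifier(random_state=42).fit(X_train, y_train)

# Post-hoc explanation: shuffle each feature on held-out data and measure
# how much the score drops; larger drops mean heavier reliance on that feature.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)
top = sorted(zip(X.columns, result.importances_mean), key=lambda t: -t[1])[:5]
for name, importance in top:
    print(f"{name}: {importance:.4f}")
```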

Definition of Done

What should I look out for in my data sources to reduce the risks of data poisoning and data extraction?

Section Outline

  • Brief overview of data poisoning and data extraction (data/model inversion attacks)
  • List of items to check, e.g. making sure only necessary outputs are exposed and making sure model outputs are not used for training
  • Examples that relate to projects, e.g. how exposed probability values can lead to model inversion (see the sketch after this list)
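As a hypothetical illustration of the "only expose necessary outputs" point (the function and field names below are invented for this sketch), a prediction endpoint can return a label and a coarse confidence bucket instead of the full probability vector, which limits what repeated queries reveal for inversion or extraction attacks:

```python
import numpy as np

def sanitise_prediction(probabilities, class_names):
    """Return only the predicted label and a coarse confidence bucket,
    not the full probability vector, to limit what an attacker can learn
    from repeated queries."""
    idx = int(np.argmax(probabilities))
    confidence = float(probabilities[idx])
    bucket = "high" if confidence >= 0.8 else "medium" if confidence >= 0.5 else "low"
    return {"label": class_names[idx], "confidence": bucket}

# Example usage with a made-up probability vector.
print(sanitise_prediction(np.array([0.05, 0.92, 0.03]), ["cat", "dog", "bird"]))
```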

Definition of Done

Preamble

Section Outline
Add and edit Laurence's preamble section

Definition of Done

What are the common CV (specifically object detection and image segmentation) evaluation metrics?

Section Outline

  • Intersection over Union (IoU) (see the sketch after this list)
  • True Positives, False Negatives and False Positives
  • Mean Average Precision / Recall
  • Imbalanced data situations - how to handle them
  • Thresholds - IoU, confidence and non-max suppression (NMS) - and their impact on Precision and Recall
  • Frames Per Second
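A minimal sketch of the IoU computation for two axis-aligned boxes in (x1, y1, x2, y2) format (a common convention; the section may use a different one):

```python
def iou(box_a, box_b):
    """Intersection over Union for boxes given as (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A detection counts as a True Positive when its IoU with a ground-truth
# box exceeds the chosen threshold (0.5 is a common default).
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```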

Definition of Done

How do we make data splits repeatable?

Section Outline
Please list the key points of content you intend to cover for this section

  • Cover how repeatable data splits make experiments more reproducible by ensuring that the same data point always goes into the same split, regardless of changes to the overall dataset.
  • One approach is to hash a column (or columns) and use the generated hash value to determine which split each data point falls into (see the sketch after this list).
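A minimal sketch of the hashing approach, assuming pandas and a stable identifier column (the `id` column and its values here are invented):

```python
import zlib

import pandas as pd

def assign_split(identifier, test_ratio=0.2):
    """Deterministically assign a row to a split from a hash of its ID, so the
    same row always lands in the same split even as the dataset grows."""
    bucket = zlib.crc32(str(identifier).encode("utf-8")) % 100
    return "test" if bucket < test_ratio * 100 else "train"

df = pd.DataFrame({"id": [f"customer-{i}" for i in range(10)], "value": range(10)})
df["split"] = df["id"].map(assign_split)
print(df)
```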

Definition of Done

What checklist items should I look out for to ensure the model pipeline is as reproducible as possible?

Section Outline

  • Overview of model reproducibility
  • List of items to check to ensure reproducibility (see the sketch after this list)
    • Model tests, e.g. API calls, algorithmic correctness
    • Model version control
    • Integration tests
    • Proper logging
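One concrete item that often appears on such checklists (an assumption here, not a confirmed part of the section) is pinning the random seeds of the libraries in use:

```python
import os
import random

import numpy as np

def set_seed(seed=42):
    """Pin the common sources of randomness so repeated runs of the
    pipeline produce the same results."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    # If a deep learning framework is used, seed it here too,
    # e.g. torch.manual_seed(seed) when PyTorch is available.

set_seed(42)
```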

Definition of Done

What questions should we ask our project sponsors to understand their project lifecycle and workflows, so that we can better integrate our MVP with their organisation?

Section Outline
Please list the key points of content you intend to cover for this section

  • What is the project sponsor's typical project lifecycle
  • What is the project sponsor's typical workflow
  • Active persuasion for our own stack (the advantages, etc.)
  • Client-first approach

Definition of Done

  • Completed 2-proj-mgmt-tech-lead/lifecycle-questions.md
  • Verified against self-review checklist
  • Page builds and displays correctly
