
ai-practitioner-handbook's Introduction

AI Singapore's AI Practitioner Handbook

[Banner: AI Singapore's AI Practitioner Handbook]

Overview

This handbook is an accumulation of AI Singapore’s Innovation and Platforms Engineering team's experience in delivering more than 40 AI Minimum Viable Products (MVPs) under the 100E programme over the last 5 years. It is an edited volume of original articles written by our AI Engineers, themed around common topics encountered over a typical AI development lifecycle.

[Figure: Handbook chapters mapped onto the AI development lifecycle]

We envisioned this handbook as a guide to help new AI Engineers joining AI Singapore come up to speed quickly on how we execute AI projects. That said, the information here should also appeal to any AI Engineers and Managers deploying their first AI project into production.

How to use this book

The AI Practitioner Handbook is designed to cater to different reading styles. When read end-to-end, the chapters will cover the typical AI project lifecycle, providing a comprehensive understanding of the entire process. Alternatively, book sections can be read in a standalone manner. Each section is written in a question-and-answer format, making it easy to find relevant information quickly.

The AI Practitioner Handbook complements other resources by focusing on the practical aspects of delivering AI projects. To get the most out of this book, use it alongside resources that cover AI algorithms, techniques, and research that further build upon your AI fundamentals. By doing so, you’ll gain a comprehensive understanding of both the theoretical and practical aspects of AI project execution.

What Our Reviewers Say

"Whether your role in an AI project is that of a technical lead, AI model implementor, data manager, domain or business function expert, or business-side project manager, this handbook will accelerate your learning curve for understanding the end-to-end aspects of the AI project."
- Steven Miller, Professor Emeritus of Information Systems, Singapore Management University and co-author of Working with AI, MIT Press

"This is a fantastic book because it focuses on an often overlooked aspect of ML education—the actual problems, people and teams you deploy it for. It's a great resource for anyone who wants to successfully put the theory of ML into practice. BAM!"
- Josh Starmer, Founder and CEO at StatQuest

"AISG’s release of the AI Practitioner Handbook as a practical and credible guide to accelerate the learning curve of incoming AI scientists and engineers is a generous service to Singapore’s growing AI community."
- Jason Tamara Widjaja, Global AI Lead at a multinational biopharmaceutical company

ai-practitioner-handbook's People

Contributors

chuawjk, jadegoat, jannah-aisg, kwanchettan, rhoggs-bot-test-account, ryzalk, shafirahmad, snjannah, ssakhavi, syakyr, tappyness1, waimarn


ai-practitioner-handbook's Issues

Chapter overviews

Section Outline
This issue adds content to the overview page of each of the 8 chapters.

Definition of Done

How should we choose the appropriate data splitting strategy to mitigate bias and leakage and to improve fairness?

Section Outline
Please list the key points of content you intend to cover for this section

  1. What are the various data split strategies? (static, cross-validation, stratified, temporal; see the sketch after this list)
  2. Potential scenarios of bias in data splits and suggestions to remedy them
  3. Potential scenarios of unfairness in data splits and suggestions to remedy them
  4. Potential scenarios of data leakage in data splits and suggestions to remedy them
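A minimal sketch contrasting a stratified split with a temporal split, assuming pandas and scikit-learn; the toy DataFrame and column names (`label`, `timestamp`) are invented for illustration:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy dataset with an imbalanced label and a timestamp column.
df = pd.DataFrame({
    "feature": range(100),
    "label": [0] * 80 + [1] * 20,
    "timestamp": pd.date_range("2022-01-01", periods=100, freq="D"),
})

# Stratified split: preserve the label distribution in train and test,
# which reduces bias when classes are imbalanced.
train_df, test_df = train_test_split(
    df, test_size=0.2, stratify=df["label"], random_state=42
)

# Temporal split: sort by time and cut at a fixed point so that no
# future information leaks from the test period into training.
df_sorted = df.sort_values("timestamp")
cutoff = int(len(df_sorted) * 0.8)
train_df_t, test_df_t = df_sorted.iloc[:cutoff], df_sorted.iloc[cutoff:]
```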

Definition of Done

Example issue

Section Outline
Please list the key points of content you intend to cover for this section

Definition of Done

  • Completed <chapter-num>/<filename.md>
  • Verified against self-review checklist
  • Page builds and displays correctly

Issue on page /REVIEWING.html

As an external reader, I don't have access to the Google Drive link to the book cited at the bottom of this page. Should that link go to Amazon, perhaps? Or NLB? Or any other major source where the book can be purchased or borrowed.

What are some ML risks I should be aware of, and how do they relate to model robustness? What are some tools I can use to assess model robustness?

Section Outline

Definition of Done

What are some questions to be asked to the project sponsor to understand their deployment requirements?

Section Outline
This section covers common considerations that engineers should clarify with sponsors in relation to:

  • data transfer
  • solution architecture
  • system usage
  • deployment testing/staging environment
  • organisational restrictions

Note: Avoid questions regarding workflows & integration, as these will be covered by issue #17.

Definition of Done

  • Completed 7-solution-delivery/deployment-requirements-gathering.md
  • Verified against self-review checklist
  • Page builds and displays correctly

Edit data split strategies

Section Outline
Please list the key points of content you intend to cover for this section

  • Edit the temporal split section to cover cross-validation and “champion” model training in a temporal data split (see the sketch after this list).
  • Edit the cross-validation and nested cross-validation sections to cover “champion” model training.
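A minimal sketch of the intended addition, assuming scikit-learn; the model and data below are placeholders. Cross-validation on temporal data uses expanding time windows, and the “champion” model is refit on the full history once a configuration has been selected:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Toy time-ordered data (rows assumed sorted by time).
X = np.arange(200, dtype=float).reshape(-1, 1)
y = np.sin(X).ravel()

model = RandomForestRegressor(random_state=42)

# Temporal cross-validation: each fold validates only on data that comes
# after its training window, so no future information leaks backwards.
tscv = TimeSeriesSplit(n_splits=5)
scores = cross_val_score(model, X, y, cv=tscv, scoring="neg_mean_absolute_error")
print("MAE per fold:", -scores)

# "Champion" model: after selecting the configuration via CV,
# refit on all available data so deployment uses the full history.
champion = RandomForestRegressor(random_state=42).fit(X, y)
```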

Definition of Done

Guidelines for Classification Metrics

Section Outline
Please list the key points of content you intend to cover for this section

  • Cover guidelines for choosing classification metrics.
  • Focus on imbalanced datasets and multi-class, multi-label problems (see the sketch after this list).
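As a minimal sketch of the kind of guideline this section could give (the toy labels below are invented): on imbalanced data, accuracy can look deceptively high, so macro-averaged scores and a per-class report are usually more informative:

```python
from sklearn.metrics import accuracy_score, classification_report, f1_score

# Imbalanced toy labels, with a classifier that always predicts the majority class.
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100

print("Accuracy:", accuracy_score(y_true, y_pred))             # 0.90, looks fine
print("Macro F1:", f1_score(y_true, y_pred, average="macro"))  # ~0.47, exposes the failure
print(classification_report(y_true, y_pred, zero_division=0))  # per-class breakdown
```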

Definition of Done

What are some of the factors/questions that an AI Engineer should consider during literature review?

Section Outline

  • Brief overview of AI literature review
  • Factors to consider during literature review (in no particular order)
    • Development time
    • Business needs
    • Open-sourcing of a pre-trained model
    • Key takeaways from the literature (do not focus on results)
    • Code availability & readability (is open-source code present?)

Definition of Done

  • Completed 4-literaturereview/factors-to-consider-during-literature-review.md
  • Verified against self-review checklist
  • Added section to table of contents
  • Page builds and displays correctly on local machine

How can we provide a simple post-hoc explanation for black-box model performance to ensure reliability?

Section Outline

  • Introduction to Explainability vs Interpretability
  • Distinction between Glass-box vs Black-box models
  • Concepts surrounding Interpretability for Black-box models
  • List of established interpretability tools across various AI fields (see the sketch after this list)
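One possible post-hoc tool, shown here as an assumption rather than a confirmed part of the section, is scikit-learn's model-agnostic permutation importance, which gives a simple view of which features a black-box model relies on:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Treat the boosted ensemble as a black box and fit it as usual.
model = GradientBoostingClassifier(random_state=42).fit(X_train, y_train)

# Post-hoc explanation: shuffle each feature on held-out data and measure
# how much the score drops; larger drops mean heavier reliance on that feature.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)
top = sorted(zip(X.columns, result.importances_mean), key=lambda t: -t[1])[:5]
for name, importance in top:
    print(f"{name}: {importance:.4f}")
```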

Definition of Done

What should I look out for in my data sources to reduce the risks of data poisoning and data extraction?

Section Outline

  • Brief overview of data poisoning and data extraction (data/model inversion attacks)
  • List of items to check, e.g. making sure only necessary outputs are exposed and making sure model outputs are not used for training
  • Examples that relate to projects, e.g. how exposed probability values can lead to model inversion (see the sketch after this list)
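As a hypothetical illustration of the "only expose necessary outputs" point (the function and field names below are invented for this sketch), a prediction endpoint can return a label and a coarse confidence bucket instead of the full probability vector, which limits what repeated queries reveal for inversion or extraction attacks:

```python
import numpy as np

def sanitise_prediction(probabilities, class_names):
    """Return only the predicted label and a coarse confidence bucket,
    not the full probability vector, to limit what an attacker can learn
    from repeated queries."""
    idx = int(np.argmax(probabilities))
    confidence = float(probabilities[idx])
    bucket = "high" if confidence >= 0.8 else "medium" if confidence >= 0.5 else "low"
    return {"label": class_names[idx], "confidence": bucket}

# Example usage with a made-up probability vector.
print(sanitise_prediction(np.array([0.05, 0.92, 0.03]), ["cat", "dog", "bird"]))
```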

Definition of Done

Preamble

Section Outline
Add and edit Laurence's preamble section

Definition of Done

What are the common CV (specifically object detection and image segmentation) evaluation metrics?

Section Outline

  • Intersection over Union (IoU) (see the sketch after this list)
  • True Positives, False Negatives and False Positives
  • Mean Average Precision / Recall
  • Imbalanced data situations - how to handle them
  • Thresholds - IoU, confidence and non-max suppression (NMS) - and their impact on Precision and Recall
  • Frames Per Second
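A minimal sketch of the IoU computation for two axis-aligned boxes in (x1, y1, x2, y2) format (a common convention; the section may use a different one):

```python
def iou(box_a, box_b):
    """Intersection over Union for boxes given as (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A detection counts as a True Positive when its IoU with a ground-truth
# box exceeds the chosen threshold (0.5 is a common default).
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```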

Definition of Done

How do we make data splits repeatable?

Section Outline
Please list the key points of content you intend to cover for this section

  • Cover how repeatable data splits make experiments more reproducible by ensuring that the same data point always goes into the same split, regardless of changes to the overall dataset.
  • One approach is to hash a column (or columns) and use the generated hash value to determine which split each data point falls into (see the sketch after this list).
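A minimal sketch of the hashing approach, assuming pandas and a stable identifier column (the `id` column and its values here are invented):

```python
import zlib

import pandas as pd

def assign_split(identifier, test_ratio=0.2):
    """Deterministically assign a row to a split from a hash of its ID, so the
    same row always lands in the same split even as the dataset grows."""
    bucket = zlib.crc32(str(identifier).encode("utf-8")) % 100
    return "test" if bucket < test_ratio * 100 else "train"

df = pd.DataFrame({"id": [f"customer-{i}" for i in range(10)], "value": range(10)})
df["split"] = df["id"].map(assign_split)
print(df)
```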

Definition of Done

What checklist items should I look out for to ensure the model pipeline is as reproducible as possible?

Section Outline

  • Overview of model reproducibility
  • List of items to check to ensure reproducibility (see the sketch after this list)
    • Model tests, e.g. API calls, algorithmic correctness
    • Model version control
    • Integration tests
    • Proper logging
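One concrete item that often appears on such checklists (an assumption here, not a confirmed part of the section) is pinning the random seeds of the libraries in use:

```python
import os
import random

import numpy as np

def set_seed(seed=42):
    """Pin the common sources of randomness so repeated runs of the
    pipeline produce the same results."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    # If a deep learning framework is used, seed it here too,
    # e.g. torch.manual_seed(seed) when PyTorch is available.

set_seed(42)
```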

Definition of Done

What questions should we ask our project sponsors to understand their project lifecycle and workflows, so that we can better integrate our MVP with their organisation?

Section Outline
Please list the key points of content you intend to cover for this section

  • What is the project sponsor's typical project lifecycle
  • What is the project sponsor's typical workflow
  • Active persuasion for our own stack (the advantages, etc.)
  • Client-first approach

Definition of Done

  • Completed 2-proj-mgmt-tech-lead/lifecycle-questions.md
  • Verified against self-review checklist
  • Page builds and displays correctly
