Portfolio link: https://sites.google.com/view/sourav9827/home
Linkedin Link: www.linkedin.com/in/sourav-mandal-390064210
mail id: [email protected]
# Data scientist
- This is a curated list of delightful resources for everything you need to develop Machine Learning solutions.
- Each item in this list will teach you at least one distinct and significant skill or piece of information.
- There are three content levels:
- π₯ Essential reading for all ML engineers
- π Advanced reading for professional ML engineers
- π¦ Expert material for expert ML engineers
- Descriptions are written to complete the sentence "After reading this article you will have learned ...".
- π₯ BLUF: The Military Standard That Can Make Your Writing More Powerful - How to make your communication more powerful (5 min)
- π₯ The XY Problem - How to focus on explaining your end goal when asking for help (5 min)
- π₯ Bike-shedding: how mature are you as an engineer? - How to avoid and call out bike-shedding (5 min)
- π₯ E-mail like a boss - How to write better e-mails (5 min)
- π₯ Stop Swiss Cheesing your calendar - How to manage your calendar so you can focus (15 min)
- π₯ How to write in plain English - How to write in plain English (30 min)
- π₯ Presentation Rules - How to create a great slide deck (30 min)
- π SMART criteria - How to define goals (15 min)
- π MECE principle - How to fully decompose a problem into a structured list (15 min)
- π SCQA: What is it, how does it work, and how can it help me? - How to structure your presentations, proposals, and sales outlines (15 min)
- π No More Misunderstandings - How to avoid miscommunication by paraphrasing (15 min)
- π¦ Nonviolent communication - How to deliver constructive feedback in difficult situations (15 min)
- π¦ The Halo effect - How to recognize and use the Halo effect to your advantage (15 min)
- π¦ Mythical Man Month - The relationship between person-days and throughput time in a project (15 min)
- π¦ Four-sides model - How to communicate effectively by considering how the receiver interprets your message (30 min)
- π₯ Semantic Versioning - How to bump the version of your apps and packages (15 min)
- π₯ `__all__` and wild imports in Python - How `__all__` defines the public API of your Python packages (15 min)
- π₯ APIs for Machine Learning - How to design RESTful APIs for Machine Learning applications (30 min)
- π FastAPI docs - How to build RESTful APIs that correspond one-to-one with an OpenAPI specification (1 day)
- π The Rule of Three - When to build reusable components and when not (15 min)
- π Falsehoods programmers believe about time - How to avoid common pitfalls about time (15 min)
- π Falsehoods programmers believe about names - How to avoid common pitfalls about names (15 min)
- π Command Line Interface Guidelines - How to write great CLIs (1 hour)
- π¦ Zalando's RESTful API guidelines - How to design RESTful APIs (1 day)
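
As a minimal sketch of the `__all__` item above: a wildcard import only binds the names listed in a module's `__all__`. The module name `demo_pkg` and its functions are hypothetical and are built in memory so the example is self-contained.

```python
import sys
import types

# Build a hypothetical module in memory (assumption: "demo_pkg" is an illustrative name).
demo = types.ModuleType("demo_pkg")
exec(
    "__all__ = ['public_fn']\n"
    "def public_fn():\n"
    "    return 'public'\n"
    "def helper():\n"
    "    return 'not exported'\n",
    demo.__dict__,
)
sys.modules["demo_pkg"] = demo

# A wildcard import binds only the names listed in __all__.
namespace = {}
exec("from demo_pkg import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)

# helper() still exists on the module; it is just not part of the wildcard API.
print(demo.helper())
```

Names left out of `__all__` remain reachable through explicit attribute access, so `__all__` documents the public API rather than enforcing it.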
- π₯ Poetry Cookiecutter - How to scaffold a modern Poetry-based development environment for Python packages and apps (30 min)
- π₯ The seven rules of a great Git commit message - How to write great Git commit messages (15 min)
- π₯ Learn Git Branching - Practice Git from beginner to advanced (1 hour)
- π Keep a Changelog - How to keep a changelog for your apps and packages (30 min)
- π Conventional Commits - How to prefix your commit messages to automate Semantic Versioning and Keep a Changelog (15 min)
- π Testing Python Applications with Pytest - How to properly test a package with pytest (30 min)
- π A successful Git branching model - How to release software with Git (15 min)
- π Code Review Best Practices - What to look for when reviewing a Pull Request (30 min)
- π Code Health: Respectful Reviews == Useful Reviews - How to communicate code review comments respectfully (15 min)
- π The Code Review Pyramid - What to look for and what to automate when reviewing a Pull Request (15 min)
- π¦ Poetry workspace plugin - How to create and manage a Poetry-based monorepo (15 min)
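
The link between Conventional Commits and Semantic Versioning mentioned above can be sketched as follows. This is a simplified illustration, not how commitizen or any other tool implements it: the helper names `bump_level` and `next_version` are hypothetical, and only the `fix`, `feat`, `!`, and `BREAKING CHANGE:` rules are covered.

```python
import re
from typing import Iterable, Optional

# Commit header shape: type(optional scope)(optional !): description
HEADER = re.compile(r"^(?P<type>\w+)(\([^)]*\))?(?P<breaking>!)?: .+")


def bump_level(message: str) -> Optional[str]:
    """Map one commit message to 'major', 'minor', 'patch', or None."""
    lines = message.splitlines()
    match = HEADER.match(lines[0]) if lines else None
    if match is None:
        return None
    if match["breaking"] or "BREAKING CHANGE:" in message:
        return "major"
    if match["type"] == "feat":
        return "minor"
    if match["type"] == "fix":
        return "patch"
    return None  # e.g. docs:, chore:, refactor: do not trigger a release here


def next_version(version: str, messages: Iterable[str]) -> str:
    """Apply the highest bump found in the commit messages to MAJOR.MINOR.PATCH."""
    major, minor, patch = (int(part) for part in version.split("."))
    levels = {bump_level(message) for message in messages}
    if "major" in levels:
        return f"{major + 1}.0.0"
    if "minor" in levels:
        return f"{major}.{minor + 1}.0"
    if "patch" in levels:
        return f"{major}.{minor}.{patch + 1}"
    return version


print(next_version("1.4.2", ["fix: handle empty input", "docs: update README"]))
print(next_version("1.4.2", ["feat(api): add endpoint", "fix: typo"]))
print(next_version("1.4.2", ["feat!: drop legacy config format"]))
```

The sketch ignores pre-release identifiers and the special handling many projects apply to `0.y.z` versions.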
- π₯ PEP20 "The Zen of Python" - How to write idiomatic Python (15 min)
- π₯ The Definitive Guide to Python import Statements - How to write import statements (30 min)
- π₯ Understanding Python's logging module - How to use the `logging` module effectively (30 min)
- π Don't run code at import time - Why you shouldn't run code at import time
- π Please fix your decorators - Why you should probably use `wrapt` to write your decorators (30 min)
- π¦ Do not log - What you should be doing instead of logging (30 min)
- π¦ The Little Book of Python Anti-Patterns - A collection of Python anti-patterns (X hours)
- π¦ Effective Python - A collection of Python idioms (X hours)
- π¦ Python Design Patterns - A collection of software architecture patterns (1 hour)
- π¦ SOLID - A standard set of software architecture patterns (1 hour)
- π¦ What the f*ck Python! - How to master Python by understanding its edge cases (1 day)
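
One common way to use the `logging` module, sketched under the assumption of a hypothetical library called `my_library`: library code only obtains a named logger, while the application configures handlers explicitly at its entry point instead of at import time.

```python
import io
import logging

# Library side: get a named logger and attach only a NullHandler.
# "my_library" is a hypothetical package name used for illustration.
logger = logging.getLogger("my_library")
logger.addHandler(logging.NullHandler())


def divide(a: float, b: float) -> float:
    # Pass arguments separately so the message is only formatted if it is emitted.
    logger.debug("Dividing %s by %s", a, b)
    return a / b


# Application side: configure logging in a function that the entry point calls,
# rather than as a side effect of importing a module.
def configure_logging(stream) -> None:
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(logging.DEBUG)


buffer = io.StringIO()
configure_logging(buffer)
result = divide(6, 3)
print(buffer.getvalue(), end="")
```

Writing to an in-memory buffer keeps the sketch easy to inspect; a real application would more likely pass `sys.stderr` or use a file handler.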
- π The Comprehensive Guide to mypy - How to write type annotations in Python (1 hour)
- π Pydantic overview - How to write type annotations for complex types instead of a meaningless `Dict[str, Any]` (1 hour)
- π Magic number - Why magic values are an anti-pattern (15 min)
- π Enums - How to write `Enum`s in Python instead of type-unsafe magic values (15 min)
- π¦ Mypy generics - How to use `TypeVar`s to write generic types such as `List[T]` (30 min)
- π¦ Mypy protocols - How to use `Protocol`s to define interfaces such as `Iterable` (30 min)
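
The typing resources above cover `Enum`, `TypeVar`, and `Protocol`. A minimal sketch of how the three fit together, using hypothetical names (`Color`, `first`, `HasArea`, `Square`) and assuming Python 3.8 or later:

```python
from enum import Enum
from typing import Iterable, List, Protocol, TypeVar


class Color(Enum):
    """An Enum replaces magic strings such as "red" scattered through the code."""

    RED = "red"
    GREEN = "green"


T = TypeVar("T")


def first(items: List[T]) -> T:
    """Generic function: the return type follows the list's element type."""
    return items[0]


class HasArea(Protocol):
    """Structural interface: any object with a matching area() method qualifies."""

    def area(self) -> float: ...


class Square:
    # Square does not inherit from HasArea; it satisfies the protocol by shape.
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side ** 2


def total_area(shapes: Iterable[HasArea]) -> float:
    return sum(shape.area() for shape in shapes)


print(Color("red"))      # looking up by value fails loudly for unknown strings
print(first([3, 1, 2]))
print(total_area([Square(2.0), Square(3.0)]))
```

The annotations only pay off when a type checker such as mypy runs over the code; at runtime Python does not enforce them.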
- π cookiecutter - Scaffold new Python packages or apps quickly with a Cookiecutter template
- π cruft - Update a Python package's underlying Cookiecutter scaffolding
- π commitizen - Check that commit messages satisfy Conventional Commits and automate Semantic Versioning and Keep a Changelog
- π poetry - Manage the packaging and dependencies of your Python project
- π poe - Define and run tasks in a Poetry project with Poe the Poet
- π poetry-workspace-plugin - Manage a Python monorepo with this Poetry plugin
- π₯ black - Automatically format your code
- π₯ isort - Automatically sort your import statements
- π pre-commit - Automatically run code quality checks on commit
- π bandit - Find common security issues
- π darglint - Check that your docstrings match your function signature
- π flake8 - Check your code for bugs and that your code style is PEP8-compliant
- π flake8 extensions - An awesome list of Flake8 extensions
- π mypy - Check the type-correctness of your code
- π pre-commit hooks - A collection of pre-commit hooks that check file quality
- π pydocstyle - Check that your code is documented
- π pygrep hooks - A collection of pre-commit hooks that check for common Python code smells
- π pytest-recording - Record and play back HTTP requests in your pytest tests
- π pyupgrade - Check that your code is written using the latest Python language features
- π safety - Check that your dependencies don't have any known security vulnerabilities
- π shellcheck - Check the quality of your shell scripts
- π coverage.py - Check your code's test coverage
- π¦ hypothesis - Write tests that automatically look for edge cases that break your code
- π¦ hypothesis-auto - Automatically generate Hypothesis tests based on your code's type annotations
- π fastapi - Create RESTful APIs based on type annotations
- π typer - Create CLIs based on type annotations
- π streamlit - Create web apps with a single Python file
- π bump2version - Release a new version of your package
- π coloredlogs - Increase your logs' readability with colour
- π hvplot - Create interactive plots from pandas dataframes
- π mkdocs - Create developer documentation for your project
- π pdoc - Generate API documentation for your code
- π birdseye - Graphically debug your Python code
- π scalene - Profile your code's CPU and memory usage by line
- π viztracer - Visualize your code's performance with a flamegraph
- π tqdm - Easily add progress bars to long-running jobs
- π Bias-variance tradeoff - How a model's expected error decomposes into squared bias, variance, and irreducible noise (30 min)
- π The two different uses of cross-validation - How to use nested cross-validation to combine the two different uses of cross-validation (30 min)
- π Modes, Medians and Means: A Unifying Perspective - Why minimizing the Mean Absolute Error (MAE) is more robust than minimizing the Mean Squared Error (MSE) (30 min)
- π Backpropagation is the chain rule to compute the gradient - How backpropagation is an algorithm to compute the objective function's gradient (30 min)
- π Stacked generalization - How to stack models (30 min)
- π We have been using the wrong initialization for t-SNE and UMAP - How to initialize t-SNE and UMAP properly (15 min)
- π¦ From classic Fully Connected Networks to Transformers - How neural networks evolved from Fully Connected Networks to Transformers (30 min)
- π¦ What is the .632+ rule? - How to measure generalization performance with bootstrapping (30 min)
- π¦ Stacking strategies with and without leaks - Different strategies to stack models (30 min)
- π¦ Data Distribution Shifts and Monitoring - How to detect and address the different types of data shift (1 hour)
- π¦ Backprop is not just the chain rule - How backpropagation relates to Lagrange multipliers (30 min)
- π¦ Why ML algorithms are hard to tune - How to optimize multiple objectives when the Pareto front is concave (30 min)
- π¦ Deep learning model compression - How quantization, pruning, and distillation can be used to compress models (30 min)
- π SHAP: SHapley Additive exPlanations - How to explain a model's output with Shapley values (30 min)
- π¦ Intro to Shapley and SHAP - How Shapley values are approximated by SHAP (30 min)
- π UMAP: Uniform Manifold Approximation and Projection - How to reduce dimensionality for visualization and modelling (30 min)
- π PyNNDescent - How to find nearest neighbours in huge datasets (15 min)
- π₯ Precision and recall - How precision and recall measure a classifier's performance (30 min)
- π Probability calibration - How and for which model types you should calibrate the model's output scores into probabilities (30 min)
- π You're all calculating churn rates wrong - Correctly define what churn is (30 min)
- π¦ Gaussian processes - From scratch - How to build probabilistic regression models with Gaussian Processes (1 hour)
- π Microsoft's Document Image Transformer - A self-supervised pre-trained model that achieves SotA performance on PubLayNet and can be used for various downstream tasks (30 min)
- π Awesome Sentence Embedding - A curated list of pretrained sentence and word embedding models (15 min)
- π The Prophet model - How Meta's Prophet model decomposes a time series into a trend, seasonality, and holiday components (30 min)
- π Darts - Time Series Made Easy in Python - How to build forecasting models with `darts` (1 hour)
- π Microsoft Recommenders - A comparison of recommender system models (30 min)
- π What I Wish Someone Had Told Me About Tensor Computation Libraries - How JAX, PyTorch, TensorFlow, and Theano are different (30 min)
- π Modern Pandas series (Part 1 - 7) - Write idiomatic pandas (1 hour)
- π Awesome Pandas - An awesome list of Pandas resources (1 hour)
- π₯ Using scikit-learn Pipelines and FeatureUnions - How to use `Pipeline`s to build end-to-end models (30 min)
- π₯ Transforming target in regression - How to transform the target to build more robust models (15 min)
- π ColumnTransformer for heterogeneous data - How to use `ColumnTransformer` to process pandas DataFrames in sklearn `Pipeline`s (30 min)
- π Custom Estimators - Create your own custom `Estimator` (30 min)
- π Hyperparameter optimization with successive halving - How to optimize hyperparameters with the most computationally efficient method (30 min)
- π Doccano - A tool for labelling text (30 min)
- π CVAT: Computer Vision Annotation Tool - A tool for labelling images (30 min)
- π Awesome Data Labelling - An awesome list of data labelling tools (30 min)
- π invoke - How to implement common tasks you run on your project as a CLI (30 min)
- π poe - How to implement common tasks you run on your project as a CLI (30 min)
- π₯ Intro to packaging and dependency management for Python with Poetry - How to manage your Python package's dependencies and environment (30 min)
- π Intro to Pyenv for Machine Learning - How to use pyenv to manage your Python interpreter (30 min)
- π Modern Python Environments - dependency and workspace management - A comparison between pyenv, venv + pip, venv + pip-tools, poetry, pipenv, and conda (30 min)
- π¦ Conda: Myths and Misconceptions - Common misconceptions about Conda (15 min)
- π₯ Docker Curriculum - How to use Docker (4 hours)
- π Docker layer caching - How to write Dockerfiles to benefit from layer caching (30 min)
- π Dockerfile best practices - How to write good Dockerfiles (1 hour)
- π Configuring Gunicorn for Docker - How to best configure Gunicorn for a Docker image (30 min)
- π Speed up Docker with BuildKit's new caching - How to speed up Docker builds with a build cache (30 min)
- π¦ Build secrets in Docker and Compose, the secure way - How to use secrets in a Docker build (15 min)
- π¦ Security scanners for Python and Docker - How to scan your code and your Docker image for security issues (30 min)
- π¦ The security scanner that cried wolf - How to scan your Docker image for security issues without false positives (15 min)
- π¦ Awesome Docker - An awesome list of Docker resources (30 min)
- π Great Expectations - How to test and document your data and data pipelines (30 min)
- π Cron best practices - How to best use cron to schedule tasks (30 min)
- π A visual guide to SSH tunnels - How to forward ports and create tunnels with SSH (30 min)
- π Safe ways to do things in bash - How to write safe and robust shell scripts (1 hour)
- π¦ Your terminal is not a terminal: An Introduction to Streams - How your terminal is a tool to manipulate streams (30 min)
- π¦ Bash Heredoc - How to pass multiline arguments to commands with a heredoc (30 min)
- π¦ Please stop writing shell scripts - Why you shouldn't write shell scripts for CI/CD or Docker images (30 min)
- π₯ An Introduction to Terraform - How to use Terraform (1 hour)
- π Terraform best practices - How to apply best practices when writing Terraform code (1 hour)
- π¦ Terraform pre-commit hooks collection - How to automate Terraform code quality checks with pre-commit (1 hour)
- π¦ Awesome Terraform - An awesome list of Terraform resources (30 min)
- π₯ Terraform Tutorial - How to get started with Terraform (1 hour)
- π₯ Using Redis In-Memory Storage for your Python Applications - How to use Redis as an in-memory cache for your Python application (30 min)
- π Python Kafka Consumers: at-least-once, at-most-once, exactly-once - How to write different types of Kafka consumers in Python (30 min)
- π¦ Kafka Exactly-Once-Semantics - How to produce and consume messages exactly once (1 hour)
- π¦ RabbitMQ: a message queue library with persistence - RabbitMQ is a messaging system with a message broker (4 hours)
- π¦ ZeroMQ: a socket library with message queue primitives - ZeroMQ is a lightweight messaging system without a message broker (8 hours)
[RADIX-AI](https://github.com/radix-ai/awesome-machine-learning-engineer)