daily-dose-of-data-science's Introduction

Daily Dose of Data Science is a publication on Substack that brings together intriguing frameworks, libraries, technologies, and tips that make the life cycle of a Data Science project effortless.

This repository is a collection of all the code snippets presented in my publication. If you want to receive these tips in your mailbox daily, you can subscribe to my Substack newsletter.

Run These Code Snippets on Your Local Machine

To run the tips listed here on your own machine, clone this repo:

git clone https://github.com/ChawlaAvi/Daily-Dose-of-Data-Science
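
After cloning, it can be handy to see which notebooks are available per category. The snippet below is only a sketch and is not part of the repository: it assumes the clone sits in the default `Daily-Dose-of-Data-Science` folder and that notebooks live one level down in per-category folders (for example `Machine Learning/`), and `list_notebooks` is a hypothetical helper name.

```python
from pathlib import Path


def list_notebooks(repo_dir):
    """Map each top-level folder (category) to its sorted notebook file names."""
    repo = Path(repo_dir)
    notebooks = {}
    for path in sorted(repo.glob("*/*.ipynb")):
        # Skip hidden folders such as .git or .ipynb_checkpoints.
        if path.parent.name.startswith("."):
            continue
        notebooks.setdefault(path.parent.name, []).append(path.name)
    return notebooks


if __name__ == "__main__":
    # Assumed clone location: the folder that `git clone` creates by default.
    for category, names in list_notebooks("Daily-Dose-of-Data-Science").items():
        print(f"{category}: {len(names)} notebook(s)")
```

If the folder layout differs from this assumption, adjusting the glob pattern (for example to `**/*.ipynb`) should be enough.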

Table of Contents

  1. Pandas
  2. Jupyter Tips
  3. Python
  4. Plotting
  5. NumPy
  6. Memory Optimization
  7. Cool Tools
  8. Run-time Optimization
  9. Sklearn
  10. Debugging
  11. Missing Data
  12. ML-AI News
  13. Machine Learning
  14. Statistics
  15. Testing
  16. Terminal
  17. Documents
  18. Animations

Pandas

Title Notebook Substack Article
One-Minute Guide To Becoming a Polars-savvy Data Scientist 🔗 🔗
Avoid Using Pandas' Apply() Method At All Times 🔗 🔗
Pandas vs Polars - Run-time and Memory Comparison 🔗 🔗
A Lesser-Known Feature of the Merge Method in Pandas 🔗 🔗
A Highly Overlooked Approach To Analysing Pandas DataFrames 🔗 🔗
The Most Common Misconception About Inplace Operations in Pandas 🔗 🔗
Become A Bilingual Data Scientist With These Pandas to SQL Translations 🔗 🔗
Avoid This Costly Mistake When Indexing A DataFrame 🔗 🔗
AutoProfiler: Automatically Profile Your DataFrame As You Work 🔗 🔗
Why You Should Avoid Appending Rows To A DataFrame 🔗 🔗
Are You Sure You Are Using The Correct Pandas Terminologies? 🔗 🔗
If You Are Not Able To Code A Vectorized Approach, Try This. 🔗 🔗
Why Are We Typically Advised To Never Iterate Over A DataFrame? 🔗 🔗
PyGWalker: Analyze Pandas Dataframe in Jupyter using a Tableau-style Interface 🔗 🔗
A Simple Trick to Make The Most Out of Pivot Tables in Pandas 🔗 🔗
Never Worry About Parsing Errors Again While Reading CSV with Pandas 🔗 🔗
An Interesting and Lesser-Known Way To Create Plots Using Pandas 🔗 🔗
Generate Helpful Hints As You Write Your Pandas Code 🔗 🔗
Speed-up Parquet I/O of Pandas by 5x 🔗 🔗
Stop Using The Describe Method in Pandas. Instead, use Skimpy. 🔗 🔗
Stop Using The Describe Method in Pandas. Instead, use Summarytools. 🔗 🔗
Analyze A Pandas DataFrame Without Code 🔗 🔗
70x Faster Pandas By Changing Just One Line of Code 🔗 🔗
Reduce Memory Usage Of A Pandas DataFrame By 90% 🔗 🔗 🔗
Speed-up Pandas Apply 5x with NumPy 🔗 🔗
A Lesser-Known Feature of Apply Method In Pandas 🔗 🔗
Create Pandas DataFrame from Dataclass 🔗 🔗
Run SQL in Jupyter To Analyze A Pandas DataFrame 🔗 🔗
When You Should Not Use the head() Method In Pandas 🔗 🔗
Three Lesser-known Tips For Reading a CSV File Using Pandas 🔗 🔗
The Best File Format To Store A Pandas DataFrame 🔗 🔗 🔗
Lesser-Known Feature of the Merge Method in Pandas 🔗 🔗
The Best Way to Use Apply() in Pandas 🔗 🔗
A No-code Tool To Understand Your Data Quickly 🔗 🔗
Display Progress Bar With Apply() in Pandas 🔗 🔗
Supercharge value_counts() Method in Pandas With Sidetable 🔗 🔗
Explore CSV Data Right From The Terminal 🔗 🔗
Define the Correct DataType for Categorical Columns 🔗 🔗 🔗
Don't Create Conditional Columns in Pandas with Apply 🔗 🔗
Write Your Own Flavor Of Pandas 🔗 🔗
Create DataFrame Hassle-free By Using Clipboard 🔗 🔗
Alter the Datatype of Multiple Columns at Once 🔗 🔗
Why you should not dump DataFrames to a CSV 🔗 🔗 🔗
Why You Should Not Read CSVs with Pandas 🔗 🔗 🔗
Parallelize Pandas Apply() With Swifter 🔗 🔗
A Hidden Feature of Describe Method In Pandas 🔗 🔗
Enrich Your Notebook With Interactive Controls 🔗 🔗
Data Analysis Using No-Code Pandas In Jupyter 🔗 🔗
Create Pivot Tables, Aggregations and Plots Without Any Code 🔗 🔗 🔗
Parallelize Pandas with Pandarallel 🔗 🔗 🔗
Pretty Plotting With Pandas 🔗 🔗
How to Read Multiple CSV Files Efficiently 🔗 🔗 🔗
Configure Sklearn To Output Pandas DataFrame 🔗 🔗
Datatype For Handling Missing Valued Columns in Pandas 🔗 🔗 🔗
Vectorization Does Not Always Guarantee Better Performance 🔗 🔗

Jupyter Tips

Title Notebook Substack Article
Declutter Your Jupyter Notebook Using Interactive Controls 🔗 🔗
🚀 Jupyter Notebook + Spreadsheet + AI - All in One Place With Mito 🔗 🔗
The Coolest GitHub-Colab Integration You Would Ever See 🔗 🔗
Break the Linear Presentation of Notebooks With Stickyland 🔗 🔗
Restart Jupyter Kernel Without Losing Variables 🔗 🔗
Annotate Data With The Click Of A Button Using Pigeon 🔗 🔗
Build Elegant Web Apps Right From Jupyter Notebook with Mercury 🔗 🔗
Supercharge Your Jupyter Kernel With ipyflow 🔗 🔗
PyGWalker: Analyze Pandas Dataframe in Jupyter using a Tableau-style Interface 🔗 🔗
Draw The Data You Are Looking For In Seconds 🔗 🔗
Never Search Jupyter Notebooks Manually Again To Find Your Code 🔗 🔗
Stop Previewing Raw DataFrames. Instead, Use DataTables 🔗 🔗
Label Your Data With The Click Of A Button 🔗 🔗
The Coolest Jupyter Notebook Hack 🔗 🔗
View Documentation in Jupyter Notebook 🔗 🔗
Get Notified When Jupyter Cell Has Executed 🔗 🔗
Clear Cell Output In Jupyter Notebook During Run-time 🔗 🔗
CodeSquire: The AI Coding Assistant You Should Use Over GitHub Copilot 🔗 🔗
Find Your Code Hiding In Some Jupyter Notebook With Ease 🔗 🔗
Enrich Your Notebook With Interactive Controls 🔗 🔗
Data Analysis Using No-Code Pandas In Jupyter 🔗 🔗
Create Pivot Tables, Aggregations and Plots Without Any Code 🔗 🔗 🔗
Restart Notebook Without Losing Variables 🔗 🔗 🔗
Retrieve Previously Computed Output In Jupyter Notebook 🔗 🔗 🔗
Transfer Variables Between Jupyter Notebooks 🔗 🔗 🔗

Python

Title Notebook Substack Article
7 Elegant Usages of Underscore in Python 🔗 🔗
How To Enforce Type Hints in Python? 🔗 🔗
A Common Misconception About Deleting Objects in Python 🔗 🔗
What Makes The Join() Method Blazingly Faster Than Iteration? 🔗 🔗
A Hidden Feature of a Popular String Method in Python 🔗 🔗
Execute Python Project Directory as a Script 🔗 🔗
Improve Python Run-time Without Changing A Single Line of Code 🔗 🔗
A Lesser-Known Difference Between For-Loops and List Comprehensions 🔗 🔗
Magic Methods: An Underrated Gem of Python OOP 🔗 🔗
9 Command Line Flags To Run Python Scripts More Flexibly 🔗 🔗
Use Custom Python Objects In A Boolean Context 🔗 🔗
You Were Probably Given Incomplete Info About A Tuple's Immutability 🔗 🔗
A Counterintuitive Thing About Python Dictionaries 🔗 🔗
Probably The Fastest Way To Execute Your Python Code 🔗 🔗
A Counterintuitive Fact About Python Functions 🔗 🔗
Manipulating Mutable Objects In Python Can Get Confusing At Times 🔗 🔗
Most Python Programmers Don't Know This About Python OOP 🔗 🔗
You Can Add a List As a Dictionary's Key (Technically)! 🔗 🔗
Why Python Does Not Offer True OOP Encapsulation 🔗 🔗
Most Python Programmers Don't Know This About Python For-loops 🔗 🔗
How To Enable Function Overloading In Python 🔗 🔗
The Right Way to Roll Out Library Updates in Python 🔗 🔗
F-strings Are Much More Versatile Than You Think 🔗 🔗
A Single Line That Will Make Your Python Code Faster 🔗 🔗
Make Dot Notation More Powerful in Python 🔗 🔗
An Elegant Way To Perform Shutdown Tasks in Python 🔗 🔗
What Are Class Methods and When To Use Them? 🔗 🔗
Hide Attributes While Printing A Dataclass Object 🔗 🔗
List : Tuple :: Set : ? 🔗 🔗
Post_init: Add Attributes To A Dataclass Post Initialization 🔗 🔗
Simplify Your Functions With Partial Functions 🔗 🔗
DotMap: A Better Alternative to Python Dictionary 🔗 🔗
Prevent Wild Imports With __all__ in Python 🔗 🔗
Performance Comparison of Python 3.11 and Python 3.10 🔗 🔗
Why 256 is 256 But 257 is not 257? 🔗 🔗
Make a Class Object Behave Like a Function 🔗 🔗
Lesser-known Feature of Pickle Files 🔗 🔗
Specify Loops and Runs In %%timeit 🔗 🔗
Don't Use time.time() To Measure Execution Time 🔗 🔗
Import Your Python Package as a Module 🔗 🔗
Fine-grained Error Tracking With Python 3.11 🔗 🔗
Run Python Project Directory As A Script 🔗 🔗
Use Slotted Class To Improve Your Python Code 🔗 🔗
Using Dictionaries In Place of If-conditions 🔗 🔗
In Defense of Match-case Statements in Python 🔗 🔗

Plotting

Title Notebook Substack Article
Don't Overuse Scatter, Line and Bar Plots. Try These Four Elegant Alternatives. 🔗 🔗
Sankey Diagrams: An Underrated Gem of Data Visualization 🔗 🔗
Enrich Your Heatmaps With This Simple Trick 🔗 🔗
The Coolest Matplotlib Hack to Create Subplots Intuitively 🔗 🔗
Waterfall Charts: A Better Alternative to Line/Bar Plot 🔗 🔗 🔗
Enrich Your Confusion Matrix With A Sankey Diagram 🔗 🔗
A Simple One-Liner to Create Professional Looking Matplotlib Plots 🔗 🔗
Visualise The Change In Rank Over Time With Bump Charts 🔗 🔗
A Simple Trick That Significantly Improves The Quality of Matplotlib Plots 🔗 🔗
A Lesser-known Feature of Creating Plots with Plotly 🔗 🔗
A Little Bit Of Extra Effort Can Hugely Transform Your Basic Matplotlib Plots 🔗 🔗
Interactively Visualise A Decision Tree With A Sankey Diagram 🔗 🔗
Use Histograms With Caution. They Are Highly Misleading! 🔗 🔗
Three Simple Ways To (Instantly) Make Your Scatter Plots Clutter Free 🔗 🔗
Matplotlib Has Numerous Hidden Gems. Here's One of Them. 🔗 🔗
A Simple Trick That Will Make Heatmaps More Elegant 🔗 🔗
The Limitations Of Heatmap That Are Slowing Down Your Data Analysis 🔗 🔗
An Underrated Technique To Improve Your Data Visualizations 🔗 🔗
Who Said Matplotlib Cannot Create Interactive Plots? 🔗 🔗
Don't Create Messy Bar Plots. Instead, Try Bubble Charts! 🔗 🔗
Use Box Plots With Caution! They May Be Misleading. 🔗 🔗
An Underrated Technique To Create Better Data Plots 🔗 🔗
An Interesting and Lesser-Known Way To Create Plots Using Pandas 🔗 🔗
Style Matplotlib Plots To Make Them More Attractive 🔗 🔗
Simple One-Liners to Preview a Decision Tree Using Sklearn 🔗 🔗
Create Data Plots Right From The Terminal 🔗 🔗
Make Your Matplotlib Plots More Professional 🔗 🔗
Perfplot: Measure, Visualize and Compare Run-time With Ease 🔗 🔗
Prettify Word Clouds In Python 🔗 🔗
Calendar Map As A Richer Alternative to Line Plot 🔗 🔗
Density Plot As A Richer Alternative to Scatter Plot 🔗 🔗 🔗
Python One-Liner To Create Sketchy Hand-drawn Plots 🔗 🔗
Create a Moving Bubbles Chart in Python 🔗 🔗
Visualizing Google Search Trends of 2022 using Python 🔗 🔗
Create A Racing Bar Chart In Python 🔗 🔗
Elegantly Plot the Decision Boundary of a Classifier 🔗 🔗
Dot Plot: A Potential Alternative to Bar Plot 🔗 🔗 🔗
Hexbin Plots As A Richer Alternative to Scatter Plots 🔗 🔗 🔗
Enrich Your Notebook With Interactive Controls 🔗 🔗
Regression Plot Made Easy with Plotly 🔗 🔗
Pretty Plotting With Pandas 🔗 🔗
Polynomial Linear Regression Plot Made Easy With Seaborn 🔗 🔗
Analyse Flow Data With Sankey Diagrams 🔗 🔗
Waterfall Charts: A Better Alternative to Line/Bar Plot 🔗 🔗 🔗

NumPy

Title Notebook Substack Article
A Major Limitation of NumPy Which Most Users Aren't Aware Of 🔗 🔗
Beware of This Unexpected Behaviour of NumPy Methods 🔗 🔗
Speedup NumPy Methods 25x With Bottleneck 🔗 🔗
Speed-up NumPy 20x with Numexpr 🔗 🔗
An Elegant Way To Perform Matrix Multiplication 🔗 🔗
Difference Between Dot and Matmul in NumPy 🔗 🔗
Don't Print NumPy Arrays! Use Lovely-NumPy Instead 🔗 🔗
Polynomial Linear Regression with NumPy 🔗 🔗

Memory Optimization

Title Notebook Substack Article
70x Faster Pandas By Changing Just One Line of Code 🔗 🔗
Reduce Memory Usage Of A Pandas DataFrame By 90% 🔗 🔗 🔗
The Best File Format To Store A Pandas DataFrame 🔗 🔗 🔗
Define the Correct DataType for Categorical Columns 🔗 🔗 🔗
Datatype For Handling Missing Valued Columns in Pandas 🔗 🔗 🔗
Save Memory with Python Generators 🔗 🔗

Cool Tools

Title Notebook Substack Article
CNN Explainer: Interactively Visualize a Convolutional Neural Network 🔗 🔗
Break the Linear Presentation of Notebooks With Stickyland 🔗 🔗
Annotate Data With The Click Of A Button Using Pigeon 🔗 🔗
Mito Just Got Supercharged With AI! 🔗 🔗
PyGWalker: Analyze Pandas Dataframe in Jupyter using a Tableau-style Interface 🔗 🔗
Supercharge Shell With Python Using Xonsh 🔗 🔗
Draw The Data You Are Looking For In Seconds 🔗 🔗
Preview Your README File Locally In GitHub Style 🔗 🔗
This GUI Tool Can Possibly Save You Hours Of Manual Work 🔗 🔗
Stop Previewing Raw DataFrames. Instead, Use DataTables. 🔗 🔗
Converting Python To LaTeX Has Possibly Never Been So Simple 🔗 🔗
Label Your Data With The Click Of A Button 🔗 🔗
Analyze A Pandas DataFrame Without Code 🔗 🔗
A No-Code Online Tool To Explore and Understand Neural Networks 🔗 🔗
Speed-up NumPy 20x with Numexpr 🔗 🔗
Debugging Made Easy With PySnooper 🔗 🔗
Deep Learning Network Debugging Made Easy 🔗 🔗
CodeSquire: The AI Coding Assistant You Should Use Over GitHub Copilot 🔗 🔗
Find Unused Python Code With Ease 🔗 🔗
Enrich Your Notebook With Interactive Controls 🔗 🔗
Data Analysis Using No-Code Pandas In Jupyter 🔗 🔗
Modify Python Code During Run-Time 🔗 🔗 🔗
Modify Function During Run-Time 🔗 🔗 🔗
Importing Modules Made Easy with Pyforest 🔗 🔗
Create Pivot Tables, Aggregations and Plots Without Any Code 🔗 🔗 🔗

Run-time Optimization

Title Notebook Substack Article
Pandas vs Polars - Run-time and Memory Comparison 🔗 🔗
The Limitation of KMeans Which Is Often Overlooked by Many 🔗 🔗
Most Sklearn Users Don't Know This About Its LinearRegression Implementation 🔗 🔗
Probably The Fastest Way To Execute Your Python Code 🔗 🔗
Why Are We Typically Advised To Never Iterate Over A DataFrame? 🔗 🔗
Speed-up Parquet I/O of Pandas by 5x 🔗 🔗
A Single Line That Will Make Your Python Code Faster 🔗 🔗
Make Sklearn KMeans 20x times faster 🔗 🔗
Speed-up NumPy 20x with Numexpr 🔗 🔗
The Best File Format To Store A Pandas DataFrame 🔗 🔗 🔗
The Best Way to Use Apply() in Pandas 🔗 🔗
Don't Create Conditional Columns in Pandas with Apply 🔗 🔗
Why you should not dump DataFrames to a CSV 🔗 🔗 🔗
Parallelize Pandas Apply() With Swifter 🔗 🔗
Parallelize Pandas with Pandarallel 🔗 🔗 🔗
How to Read Multiple CSV Files Efficiently 🔗 🔗 🔗

Sklearn

Title Notebook Substack Article
Why Sklearn's Linear Regression Has No Hyperparameters? 🔗 🔗
Scikit-LLM: Integrate Sklearn API with Large Language Models 🔗 🔗
Most Sklearn Users Don't Know This About Its LinearRegression Implementation 🔗 🔗
A Lesser-Known Feature of Sklearn To Train Models on Large Datasets 🔗 🔗
Sklearn One-liner to Generate Synthetic Data 🔗 🔗
Skorch: Use Scikit-learn API on PyTorch Models 🔗 🔗
Make Sklearn KMeans 20x times faster 🔗 🔗
Build Baseline Models Effortlessly With Sklearn 🔗 🔗
Polynomial Linear Regression with NumPy 🔗 🔗
An Elegant Way to Import Metrics From Sklearn 🔗 🔗
Feature Tracking Made Simple In Sklearn Transformers 🔗 🔗
Configure Sklearn To Output Pandas DataFrame 🔗 🔗

Debugging

Title Notebook Substack Article
Debugging Made Easy With PySnooper 🔗 🔗
Don't use print() to debug your code. 🔗 🔗 🔗
Inspect Program Flow with IceCream 🔗 🔗 🔗
Lesser-known Feature of f-strings in Python 🔗 🔗

Missing Data

Title Notebook Substack Article
Handle Missing Data With Missingno 🔗 🔗
Datatype For Handling Missing Valued Columns in Pandas 🔗 🔗

ML-AI News

Title Notebook Substack Article
Now You Can Use DALL·E With OpenAI API 🔗 🔗

Machine Learning

Title Notebook Substack Article
Decision Trees ALWAYS Overfit. Here's A Lesser-Known Technique To Prevent It. 🔗 🔗
Evaluate Clustering Performance Without Ground Truth Labels 🔗 🔗
The Most Common Misconception About Continuous Probability Distributions 🔗 🔗
A Common Misconception About Feature Scaling and Standardization 🔗 🔗
Random Forest May Not Need An Explicit Validation Set For Evaluation 🔗 🔗
A Visual and Overly Simplified Guide To Bagging and Boosting 🔗 🔗
10 Most Common (and Must-Know) Loss Functions in ML 🔗 🔗
Theil-Sen Regression: The Robust Twin of Linear Regression 🔗 🔗
The Limitations Of Elbow Curve And What You Should Replace It With 🔗 🔗
21 Most Important (and Must-know) Mathematical Equations in Data Science 🔗 🔗
Try This If Your Linear Regression Model is Underperforming 🔗 🔗
The Limitation of KMeans Which Is Often Overlooked by Many 🔗 🔗
Nine Most Important Distributions in Data Science 🔗 🔗
The Limitation of Linear Regression Which is Often Overlooked By Many 🔗 🔗
A Reliable and Efficient Technique To Measure Feature Importance 🔗 🔗
Does Every ML Algorithm Rely on Gradient Descent? [🔗](https://github.com/ChawlaAvi/Daily-Dose-of-Data-Science/blob/main/Machine%20Learning/Does%20Every%20ML%20Algorithm%20Rely%20on%20Gradient%20Descent%3F.ipynb) 🔗
Visualize The Performance Of Linear Regression With This Simple Plot 🔗 🔗
Confidence Interval and Prediction Interval Are Not The Same 🔗 🔗
The Ultimate Categorization of Performance Metrics in ML 🔗 🔗
The Most Overlooked Problem With One-Hot Encoding 🔗 🔗
9 Most Important Plots in Data Science 🔗 🔗
Is Categorical Feature Encoding Always Necessary Before Training ML Models? 🔗 🔗
The Counterintuitive Behaviour of Training Accuracy and Training Loss 🔗 🔗
A Highly Overlooked Point In The Implementation of Sigmoid Function 🔗 🔗
The Ultimate Categorization of Clustering Algorithms 🔗 🔗
A Lesser-Known Feature of Sklearn To Train Models on Large Datasets 🔗 🔗
Visualize The Performance Of Any Linear Regression Model With This Simple Plot 🔗 🔗
How To Truly Use The Train, Validation and Test Set 🔗 🔗
The Advantages and Disadvantages of PCA To Consider Before Using It 🔗 🔗
Loss Functions: An Algorithm-wise Comprehensive Summary 🔗 🔗
Is Data Normalization Always Necessary Before Training ML Models? 🔗 🔗
A Visual Guide to Stochastic, Mini-batch, and Batch Gradient Descent 🔗 🔗
The Taxonomy Of Regression Algorithms That Many Don't Bother To Remember 🔗 🔗
The Limitation of PCA Which Many Folks Often Ignore 🔗 🔗
Breathing KMeans: A Better and Faster Alternative to KMeans 🔗 🔗
How Many Dimensions Should You Reduce Your Data To When Using PCA? 🔗 🔗
A Visual Guide To Sampling Techniques in Machine Learning 🔗 🔗
A Visual and Overly Simplified Guide to PCA 🔗 🔗
The Limitation Of Euclidean Distance Which Many Often Ignore 🔗 🔗
Visualising The Impact Of Regularisation Parameter 🔗 🔗
A (Highly) Important Point to Consider Before You Use KMeans Next Time 🔗 🔗
Is Class Imbalance Always A Big Problem To Deal With? 🔗 🔗
A Visual Comparison Between Locality and Density-based Clustering 🔗 🔗
Why Don't We Call It Logistic Classification Instead? 🔗 🔗
A Typical Thing About Decision Trees Which Many Often Ignore 🔗 🔗
Always Validate Your Output Variable Before Using Linear Regression 🔗 🔗
Why Is It Important To Shuffle Your Dataset Before Training An ML Model 🔗 🔗
Why Are We Typically Advised To Set Seeds for Random Generators? 🔗 🔗
This Small Tweak Can Significantly Boost The Run-time of KMeans 🔗 🔗
Most ML Folks Often Neglect This While Using Linear Regression 🔗 🔗
Is This The Best Animated Guide To KMeans Ever? 🔗 🔗
An Effective Yet Underrated Technique To Improve Model Performance 🔗 🔗
How to Encode Categorical Features With Many Categories? 🔗 🔗
Why KMeans May Not Be The Apt Clustering Algorithm Always 🔗 🔗
Skorch: Use Scikit-learn API on PyTorch Models 🔗 🔗
A No-Code Online Tool To Explore and Understand Neural Networks 🔗 🔗
Make Sklearn KMeans 20x times faster 🔗 🔗
Deep Learning Network Debugging Made Easy 🔗 🔗
Build Baseline Models Effortlessly With Sklearn 🔗 🔗
Polynomial Linear Regression with NumPy 🔗 🔗

Statistics

Title Notebook Substack Article
Be Cautious Before Drawing Any Conclusions Using Summary Statistics 🔗 🔗
The Limitation Of Pearson Correlation Which Many Often Ignore 🔗 🔗
Pandas and NumPy Return Different Values for Standard Deviation. Why? 🔗 🔗
Why Correlation (and Other Statistics) Can Be Misleading 🔗 🔗

Testing

Title Notebook Substack Article
Generate Your Own Fake Data In Seconds 🔗 🔗

Terminal

Title Notebook Substack Article
Supercharge Shell With Python Using Xonsh 🔗 🔗
Most Command-line Users Don't Know This Cool Trick About Using Terminals 🔗 🔗
Never Refactor Your Code Manually Again. Instead, Use Sourcery! 🔗 🔗
Create Data Plots Right From The Terminal 🔗 🔗
Visualize Commit History of Git Repo With Beautiful Animations 🔗 🔗
How Would You Identify Fuzzy Duplicates In A Data With Million Records? 🔗 🔗
Automated Code Refactoring With Sourcery 🔗 🔗 🔗
Explore CSV Data Right From The Terminal 🔗 🔗

Documents

Title Document Substack Article
Daily Dose of Data Science - Full Archive 🔗 🔗
35 Hidden Python Libraries That Are Absolute Gems 🔗 🔗
40 Open-Source Tools to Supercharge Your Pandas Workflow 🔗 🔗
37 Hidden Python Libraries That Are Absolute Gems 🔗 🔗
10 Automated EDA Tools That Will Save You Hours Of (Tedious) Work 🔗 🔗
30 Python Libraries to (Hugely) Boost Your Data Science Productivity 🔗 🔗

Animations

Title Notebook Substack Video
Visualizing The Data Transformation of a Neural Network 🔗 🔗

daily-dose-of-data-science's Issues

The output is always the iris.data result, even after I change the data

Hi ChawlaAvi, when I run the code below, the output is always the result for iris.data, even though I pass in my own data.

```python
import pandas as pd
import numpy as np
import interactive_decision_tree as idt  # local module
from sklearn.tree import DecisionTreeClassifier

# Load my own dataset: the first 31 columns are the features,
# and the column at index 38 is the target.
data = pd.read_csv('/Users/lee/Desktop/data-xy015.csv')
X = data.iloc[:, 0:31]
y = data.iloc[:, 38]

# Fit a decision tree on my data (not on iris).
clf = DecisionTreeClassifier()
clf = clf.fit(X, y)

# Export the interactive tree view.
idt.create_tree(tree_model=clf,
                X=X,
                target_names=np.unique(y),
                save_path='C:/Users/lee/Desktop/PY01/tree_template.html')

# Export the Sankey view of the same tree.
idt.create_sankey(tree_model=clf,
                  X=X,
                  target_names=np.unique(y),
                  save_path='C:/Users/lee/Desktop/PY01/sankey_template.html')
```
