mrdbourke / zero-to-mastery-ml



All course materials for the Zero to Mastery Machine Learning and Data Science course.

Home Page: https://dbourke.link/ZTMmlcourse

Jupyter Notebook 100.00% Shell 0.01%

data-science deep-learning machine-learning

zero-to-mastery-ml's Issues

Cannot participate in Discord

I got this error message:

Your message could not be delivered. This is usually because you don't share a server with the recipient or the recipient is only accepting direct messages from friends. You can see the full list of reasons here: https://support.discord.com/hc/en-us/articles/360060145013

Can't plot the bar graph

The plot_roc_curve is not supported in the shown version

Before sklearn 1.2:

from sklearn.metrics import plot_roc_curve
svc_disp = plot_roc_curve(svc, X_test, y_test)
rfc_disp = plot_roc_curve(rfc, X_test, y_test, ax=svc_disp.ax_)

From sklearn 1.2:

from sklearn.metrics import RocCurveDisplay
svc_disp = RocCurveDisplay.from_estimator(svc, X_test, y_test)
rfc_disp = RocCurveDisplay.from_estimator(rfc, X_test, y_test, ax=svc_disp.ax_)

regarding joining of ZTM community channel

I'm not able to join the ZTM Discord community channel

Make predictions on test data batch using the loaded full model

Pandas Exercises Solution

In[24]

This does not work anymore
car_sales.groupby(["Make"]).mean()

mean() now needs the numeric_only=True argument in order to work when the DataFrame has non-numeric columns
car_sales.groupby(["Make"]).mean(numeric_only=True)
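A minimal sketch of the fix above. The small DataFrame is a hypothetical stand-in for the course's car_sales data, not the real file. In pandas 2.0+ the default for numeric_only is False, so a string column such as "Colour" makes the plain mean() call fail.

```python
import pandas as pd

# Hypothetical stand-in for the course's car_sales DataFrame
car_sales = pd.DataFrame({
    "Make": ["Toyota", "Honda", "Toyota", "BMW"],
    "Colour": ["White", "Red", "Blue", "Black"],
    "Odometer (KM)": [150043, 87899, 32549, 11179],
    "Doors": [4, 4, 3, 5],
})

# numeric_only=True skips the string column "Colour" instead of raising
mean_by_make = car_sales.groupby(["Make"]).mean(numeric_only=True)
print(mean_by_make)
```

Only the numeric columns ("Odometer (KM)" and "Doors") appear in the result.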

Zero To Mastery Data Science and Machine Learning Resources.

Improve the end-to-end-heart-disease classification model score...

I tried CatBoost with tuned hyperparameters and it gives me a score of 0.96! Can we use it in our model?

Issue regarding colliding dog breed name when plotting

Currently, in our visualization code, the dog breed labels sometimes collide with each other, making it difficult to read the breed names clearly. To address this problem and enhance the visual appeal of our graphs, we can implement a solution that prevents the breed names from overlapping.
with overlap

Proposed Solution:
We can make use of the tight_layout() function in our visualization code after visualizing the data batches
no overlap
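A rough sketch of the proposed solution, assuming a grid of images with breed names as titles. The random images and the label list are hypothetical placeholders, not the course's data or plotting function.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-ins for one batch of images and their breed labels
rng = np.random.default_rng(0)
images = rng.random((6, 32, 32, 3))
labels = [
    "scottish_deerhound", "german_shorthaired_pointer", "bernese_mountain_dog",
    "soft-coated_wheaten_terrier", "greater_swiss_mountain_dog", "irish_wolfhound",
]

fig, axes = plt.subplots(2, 3, figsize=(9, 6))
for ax, image, label in zip(axes.ravel(), images, labels):
    ax.imshow(image)
    ax.set_title(label)
    ax.axis("off")

# Recompute subplot spacing after everything is drawn so titles and axes do not collide
fig.tight_layout()
```

Calling plt.tight_layout() on the current figure is equivalent. Very long titles may still need a smaller font size or a larger figsize.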

Ml

error installing Jupyter

hi, I get this error while trying to install Jupyter through terminal in macOS.
how can I fix it?

thanks

Couldn't get through Discord verification

Hi, I am Rostislav Alpin, a new student in the course "Complete Machine Learning & Data Science Bootcamp 2023". I couldn't get through the verification even after logging in to Discord. Please help resolve the issue.
Thanks.

Zero to mastery

ZeroToMastery SciKit Learn Exercises

plot_roc_curve no longer exists in Scikit-Learn 1.5+ (it was removed in 1.2) but is still used in the exercises; the replacement is RocCurveDisplay

Update notebook of "end-to-end-heart-disease-classification"

You wrote True as the x label and Predicted as the y label; it should be the opposite.

Predicting bulldozer price - Wrong Hyperparameters Tuning

In lecture no. 196, you use RandomizedSearchCV with the default cv=5 to tune hyperparameters. I think that's the wrong approach for time series data, because:

It performs 5-fold cross-validation that ignores the temporal order of the data, so some folds train on later rows and validate on earlier ones
This results in a poor evaluation of the best hyperparameters


What we can do is use sklearn's `TimeSeriesSplit`!

You should suggest the correct way of doing this in your course soon!
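A minimal sketch of that suggestion, assuming the rows are already sorted by date. The synthetic data and the small parameter grid are placeholders, not the bulldozer dataset or the course's grid.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit

# Hypothetical data: rows are assumed to be in chronological order
rng = np.random.default_rng(42)
X = rng.random((200, 4))
y = X[:, 0] * 10 + rng.random(200)

# Each split trains on earlier rows and validates on the rows that follow them
tscv = TimeSeriesSplit(n_splits=5)

# Placeholder search space, kept small so the sketch runs quickly
param_distributions = {
    "n_estimators": [10, 20],
    "max_depth": [None, 3, 5],
    "min_samples_leaf": [1, 3],
}

rs_model = RandomizedSearchCV(
    RandomForestRegressor(random_state=42),
    param_distributions=param_distributions,
    n_iter=3,
    cv=tscv,  # replaces the default cv=5
    random_state=42,
)
rs_model.fit(X, y)
print(rs_model.best_params_)
```

Passing the splitter through cv is the only change from a default RandomizedSearchCV call. The DataFrame has to be sorted by sale date before fitting for the splits to be meaningful.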

New to git hub

I'm completely new to GitHub and don't know how to use it or what to do on it for the machine learning and data science course. Please suggest guidelines as you would for a kid who needs help getting on GitHub for the first time.

I have taken a course on machine learning and data science.

Help needed for a beginner on GitHub.

If there is any issue, please inform me via email ([email protected]) or by phone: +91 8169044393 and +91 993077743.

Make predictions on test data batch using the loaded full model

Discord Community invite link is invalid

Pandas 1.5.3 causes `ValueError`

Course:
"Complete Machine Learning & Data Science Bootcamp 2023"
Section 12, video 195, "Preprocessing Our Data", In the exercise "Make Predictions on Test Data"

Issue:
ValueError is thrown as demonstrated.

# Manually adjust to have auctioneerID_is_missing column
df_test["auctioneerID_is_missing"] = False
df_test.head()

# Make predictions on the test data
test_preds = ideal_model.predict(df_test)

A ValueError occurs:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[75], line 2
      1 # Make predictions on the test data
----> 2 test_preds = ideal_model.predict(df_test)

File ~/Documents/code/udemy/udemy_ml_ds_ztm/.venv/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:981, in ForestRegressor.predict(self, X)
    979 check_is_fitted(self)
    980 # Check data
--> 981 X = self._validate_X_predict(X)
    983 # Assign chunk of trees to jobs
    984 n_jobs, _, _ = _partition_estimators(self.n_estimators, self.n_jobs)

File ~/Documents/code/udemy/udemy_ml_ds_ztm/.venv/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:602, in BaseForest._validate_X_predict(self, X)
    599 """
    600 Validate X whenever one tries to predict, apply, predict_proba."""
    601 check_is_fitted(self)
--> 602 X = self._validate_data(X, dtype=DTYPE, accept_sparse="csr", reset=False)
    603 if issparse(X) and (X.indices.dtype != np.intc or X.indptr.dtype != np.intc):
    604     raise ValueError("No support for np.int64 index based sparse matrices")

File ~/Documents/code/udemy/udemy_ml_ds_ztm/.venv/lib/python3.9/site-packages/sklearn/base.py:548, in BaseEstimator._validate_data(self, X, y, reset, validate_separately, **check_params)
    483 def _validate_data(
    484     self,
    485     X="no_validation",
   (...)
    489     **check_params,
    490 ):
    491     """Validate input data and set or check the `n_features_in_` attribute.
    492 
    493     Parameters
   (...)
    546         validated.
    547     """
--> 548     self._check_feature_names(X, reset=reset)
    550     if y is None and self._get_tags()["requires_y"]:
    551         raise ValueError(
    552             f"This {self.__class__.__name__} estimator "
    553             "requires y to be passed, but the target y is None."
    554         )

File ~/Documents/code/udemy/udemy_ml_ds_ztm/.venv/lib/python3.9/site-packages/sklearn/base.py:481, in BaseEstimator._check_feature_names(self, X, reset)
    476 if not missing_names and not unexpected_names:
    477     message += (
    478         "Feature names must be in the same order as they were in fit.\n"
    479     )
--> 481 raise ValueError(message)

ValueError: The feature names should match those that were passed during fit.
Feature names must be in the same order as they were in fit.

Tests:
By the error alone, one could assume the error was caused by the addition of the missing column. After a bit of research and troubleshooting, I ran the following tests to determine if they had the same columns, in order.

set(df_test.columns) == set(X_train.columns)
[Output]: True

df_test.columns.tolist() == X_train.columns.tolist()
[Output]: False

sorted(df_test.columns) == sorted(X_train.columns)
[Output]: True

Solution:
To fix the column order, I had to reindex the test data, based on the columns of the train data

df_test = df_test.reindex(X_train.columns, axis=1)

The code then ran successfully, as shown by the next lines in the exercise.

# Make predictions on the test data
test_preds = ideal_model.predict(df_test)
test_preds

which resulted in:

array([17030.00927386, 14355.53565165, 46623.08774286, ...,
       11964.85073347, 16496.71079281, 27119.99044029])
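The checks and the fix from this issue can be reproduced on a tiny example. The two DataFrames below are hypothetical, with made-up column names. They only mimic the situation where a column added late ends up in a different position than it had during training.

```python
import pandas as pd

# Hypothetical training features, in the column order the model was fitted on
X_train = pd.DataFrame({"a": [1, 2], "a_is_missing": [False, True], "b": [3, 4]})

# Hypothetical test features: the manually added column lands at the end
df_test = pd.DataFrame({"a": [5], "b": [6]})
df_test["a_is_missing"] = False

same_names = set(df_test.columns) == set(X_train.columns)          # same columns present
same_order = df_test.columns.tolist() == X_train.columns.tolist()  # but in a different order

# Reorder the test columns to match the training columns
df_test = df_test.reindex(columns=X_train.columns)
print(same_names, same_order, df_test.columns.tolist())
```

If X_train is no longer available, a Scikit-Learn (1.0+) estimator fitted on a DataFrame stores the expected column order in its feature_names_in_ attribute, which could be passed to reindex instead.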

Fix Sklearn version upgrades videos/code

Some students are getting different results when running different models in Scikit-Learn.

This is because of different version upgrades (e.g. Scikit-Learn 0.23.0 -> 1.0.0).

Find the videos/code showing the worst results and update them for the newer versions.

inplace = True in AI/ML course

https://academy.zerotomastery.io/courses/complete-machine-learning-and-data-science-bootcamp-2020/lectures/12693715

In this lecture we use inplace=True. It works, but it raises a warning.
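The issue does not say which warning appears. One common case in recent pandas versions is calling a method with inplace=True on a column selected from a DataFrame. Assuming that is the situation here, a sketch of the usual alternative is to assign the result back to the column. The DataFrame is a hypothetical example.

```python
import pandas as pd

# Hypothetical DataFrame with one missing value
car_sales = pd.DataFrame({"Odometer": [150043.0, None, 32549.0]})

# Instead of:
#   car_sales["Odometer"].fillna(car_sales["Odometer"].mean(), inplace=True)
# assign the filled column back to the DataFrame
car_sales["Odometer"] = car_sales["Odometer"].fillna(car_sales["Odometer"].mean())
print(car_sales)
```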

Predicting bulldozer price - Converting string to category

Instead of getting objects in an order I am getting bound method exception. The output is not as shown in the course. Please solve this and let me know.
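The issue does not include the code that was run. Output starting with "bound method" usually means a method was referenced without parentheses, so Python shows the method object instead of calling it. A sketch under that assumption, with a hypothetical column:

```python
import pandas as pd

# Hypothetical column converted to the category dtype
df_tmp = pd.DataFrame({"state": ["Texas", "Alabama", "Texas"]})
df_tmp["state"] = df_tmp["state"].astype("category")

# Without parentheses: this is the method object itself (displays as "<bound method ...")
without_call = df_tmp["state"].cat.as_ordered

# With parentheses: the method runs and returns an ordered categorical Series
with_call = df_tmp["state"].cat.as_ordered()
print(with_call.cat.categories)
```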

I got errors when I was trying to do data preprocessing

I was trying to follow your steps to convert the categorical features in the car_sales dataframe to numbers but got some errors

This is the thread:
https://github.com/scikit-learn/scikit-learn/issues/17741

Ml zero to hero

ML

I can't get through to the Discord Community

For advanced users: you should export the conda virtual environment as a resource

I see some dependency issues in the code used in the ML videos. To avoid them, you could provide a requirements.txt or yml file as part of the resources.

Thanks & Regards
Koteswara

Resolved Error in sklearn Lesson File - Incorrect Data Splitting

I have resolved an error in the provided sklearn lesson file. Below is my updated code along with the corrected data splitting after preprocessing:

from sklearn.model_selection import GridSearchCV, train_test_split

# Corrected data splitting after preprocessing
X_train, X_test, y_train, y_test = train_test_split(X_transform_df, y, test_size=0.2, random_state=5)

# Fit and score the model
grid_cv = GridSearchCV(estimator=model, param_grid=param, cv=5, verbose=2)
grid_cv.fit(X_train, y_train)

y_preds = grid_cv.predict(X_test)
evaluation_metrics(y_test, y_preds)

You can also access the IPython Notebook containing the complete code and execution results: updated-Notebook.

Update Sklearn API `plot_roc_curve` -> `RocCurveDisplay`

Link to notebook changed: https://github.com/mrdbourke/zero-to-mastery-ml/blob/master/section-3-structured-data-projects/end-to-end-heart-disease-classification.ipynb

Error

As of Scikit-Learn 1.2+ the function sklearn.metrics.plot_roc_curve has been removed (it was deprecated in 1.0) in favour of sklearn.metrics.RocCurveDisplay.

How to check your Scikit-Learn version

You can check your Scikit-Learn version with:

import sklearn
sklearn.__version__

How to update your Scikit-Learn version

You can run the following command in your terminal with your Conda (or other) environment active to upgrade Scikit-Learn (the -U stands for "upgrade"):

pip install -U scikit-learn

Previous code (this will error if running Scikit-Learn version 1.2+)

# This will error if run in Scikit-Learn version 1.2+
from sklearn.metrics import plot_roc_curve

Also:

# This will error if run in Scikit-Learn version 1.2+
from sklearn.metrics import plot_roc_curve 
plot_roc_curve(gs_log_reg, X_test, y_test);

New code (this will work with Scikit-Learn version 1.2+)

from sklearn.metrics import RocCurveDisplay # use this in Scikit-Learn 1.2+

And to plot a ROC curve, note the use of RocCurveDisplay.from_estimator():

# Scikit-Learn 1.2.0 or later
from sklearn.metrics import RocCurveDisplay 

# from_estimator() = use a model to plot ROC curve on data
RocCurveDisplay.from_estimator(estimator=gs_log_reg, 
                               X=X_test, 
                               y=y_test);

error section 6 vid 55

I am coding along with the Complete A.I. & Machine Learning, Data Science Bootcamp 2024. In video 55 of section 6, "Selecting and viewing data with pandas part 2", I try to run this code:

car_sales["Price"] = car_sales["Price"].str.replace('[$,.]', '').astype(int)

I run it and get the following error, for which I cannot find a fix:

<>:1: SyntaxWarning: invalid escape sequence '$'
<>:1: SyntaxWarning: invalid escape sequence '$'
C:\Users\sweet\AppData\Local\Temp\ipykernel_16004\2312081839.py:1: SyntaxWarning: invalid escape sequence '$'
car_sales["Price"] = car_sales["Price"].str.replace('[$,.]', '').astype(int)

ValueError Traceback (most recent call last)
Cell In[170], line 1
----> 1 car_sales["Price"] = car_sales["Price"].str.replace('[$,.]', '').astype(int)

File ~\Desktop\sample_project_1\env\Lib\site-packages\pandas\core\generic.py:6640, in NDFrame.astype(self, dtype, copy, errors)
6634 results = [
6635 ser.astype(dtype, copy=copy, errors=errors) for _, ser in self.items()
6636 ]
6638 else:
6639 # else, only a single dtype is given
-> 6640 new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
6641 res = self._constructor_from_mgr(new_data, axes=new_data.axes)
6642 return res.finalize(self, method="astype")

File ~\Desktop\sample_project_1\env\Lib\site-packages\pandas\core\internals\managers.py:430, in BaseBlockManager.astype(self, dtype, copy, errors)
427 elif using_copy_on_write():
428 copy = False
--> 430 return self.apply(
431 "astype",
432 dtype=dtype,
433 copy=copy,
434 errors=errors,
435 using_cow=using_copy_on_write(),
436 )

File ~\Desktop\sample_project_1\env\Lib\site-packages\pandas\core\internals\managers.py:363, in BaseBlockManager.apply(self, f, align_keys, **kwargs)
361 applied = b.apply(f, **kwargs)
362 else:
--> 363 applied = getattr(b, f)(**kwargs)
364 result_blocks = extend_blocks(applied, result_blocks)
366 out = type(self).from_blocks(result_blocks, self.axes)

File ~\Desktop\sample_project_1\env\Lib\site-packages\pandas\core\internals\blocks.py:758, in Block.astype(self, dtype, copy, errors, using_cow, squeeze)
755 raise ValueError("Can not squeeze with more than one column.")
756 values = values[0, :] # type: ignore[call-overload]
--> 758 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
760 new_values = maybe_coerce_values(new_values)
762 refs = None

File ~\Desktop\sample_project_1\env\Lib\site-packages\pandas\core\dtypes\astype.py:237, in astype_array_safe(values, dtype, copy, errors)
234 dtype = dtype.numpy_dtype
236 try:
--> 237 new_values = astype_array(values, dtype, copy=copy)
238 except (ValueError, TypeError):
239 # e.g. _astype_nansafe can fail on object-dtype of strings
240 # trying to convert to float
241 if errors == "ignore":

File ~\Desktop\sample_project_1\env\Lib\site-packages\pandas\core\dtypes\astype.py:182, in astype_array(values, dtype, copy)
179 values = values.astype(dtype, copy=copy)
181 else:
--> 182 values = _astype_nansafe(values, dtype, copy=copy)
184 # in pandas we don't store numpy str dtypes, so convert to object
185 if isinstance(dtype, np.dtype) and issubclass(values.dtype.type, str):

File ~\Desktop\sample_project_1\env\Lib\site-packages\pandas\core\dtypes\astype.py:133, in _astype_nansafe(arr, dtype, copy, skipna)
129 raise ValueError(msg)
131 if copy or arr.dtype == object or dtype == object:
132 # Explicit copy, or required since NumPy can't view from / to object.
--> 133 return arr.astype(dtype, copy=True)
135 return arr.astype(dtype, copy=copy)

ValueError: invalid literal for int() with base 10: '$4,000.00'
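A possible fix, sketched on a hypothetical sample of the Price column. In pandas 2.0+ str.replace treats the pattern as a literal string unless regex=True is passed. That is why '$4,000.00' reaches astype(int) unchanged. The SyntaxWarning suggests the original pattern contained backslash escapes in a normal string; a raw string avoids it.

```python
import pandas as pd

# Hypothetical sample of the Price column
car_sales = pd.DataFrame({"Price": ["$4,000.00", "$5,000.00", "$7,000.00"]})

# regex=True makes pandas treat the pattern as a character class;
# the raw string (r"...") avoids the invalid-escape SyntaxWarning
cleaned = car_sales["Price"].str.replace(r"[\$,]", "", regex=True)

# "4000.00" cannot be converted straight to int, so go through float first
car_sales["Price"] = cleaned.astype(float).astype(int)
print(car_sales["Price"].tolist())
```

This version keeps the decimal point and converts through float. Stripping the "." as well, as in the pattern quoted above, would turn "$4,000.00" into 400000 and need a later division by 100.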