mrdbourke / zero-to-mastery-ml
All course materials for the Zero to Mastery Machine Learning and Data Science course.
Home Page: https://dbourke.link/ZTMmlcourse
I got this error message:
Your message could not be delivered. This is usually because you don't share a server with the recipient or the recipient is only accepting direct messages from friends. You can see the full list of reasons here: https://support.discord.com/hc/en-us/articles/360060145013
Before sklearn 1.2:
from sklearn.metrics import plot_roc_curve
svc_disp = plot_roc_curve(svc, X_test, y_test)
rfc_disp = plot_roc_curve(rfc, X_test, y_test, ax=svc_disp.ax_)
From sklearn 1.2:
from sklearn.metrics import RocCurveDisplay
svc_disp = RocCurveDisplay.from_estimator(svc, X_test, y_test)
rfc_disp = RocCurveDisplay.from_estimator(rfc, X_test, y_test, ax=svc_disp.ax_)
I'm not able to join the ZTM Discord community channel
In[24]
This does not work anymore
car_sales.groupby(["Make"]).mean()
mean() now needs the numeric_only=True argument in order to work when the DataFrame has non-numeric columns
car_sales.groupby(["Make"]).mean(numeric_only=True)
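A minimal sketch of the difference, using a small made-up DataFrame (the column names and values below are assumptions for illustration, not the course CSV). With numeric_only=True, string columns such as "Colour" are skipped instead of causing an error in recent pandas versions:

```python
import pandas as pd

# Hypothetical stand-in for the course's car_sales DataFrame
car_sales = pd.DataFrame({
    "Make": ["Toyota", "Honda", "Toyota", "BMW"],
    "Colour": ["White", "Red", "Blue", "Black"],  # non-numeric column
    "Odometer (KM)": [150043, 87899, 32549, 11179],
    "Doors": [4, 4, 3, 5],
})

# Only numeric columns are averaged; "Colour" is left out of the result
mean_by_make = car_sales.groupby(["Make"]).mean(numeric_only=True)
print(mean_by_make)
```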
Currently, in our visualization code, the dog breed labels sometimes collide with each other, making it difficult to read the breed names clearly. To address this problem and enhance the visual appeal of our graphs, we can implement a solution that prevents the breed names from overlapping.
with overlap
Proposed Solution:
We can call the tight_layout() function in our visualization code after plotting each data batch.
no overlap
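A rough sketch of the idea, assuming a grid of subplots with one title per image. The breed names and grid size are placeholders, and the Agg backend is only selected so the snippet runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, not needed inside a notebook
import matplotlib.pyplot as plt

# Placeholder labels standing in for the dog breed names
labels = ["breed_a", "breed_b", "breed_c", "breed_d", "breed_e", "breed_f"]

fig, axes = plt.subplots(2, 3, figsize=(6, 4))
for ax, label in zip(axes.flat, labels):
    ax.set_title(label)
    ax.axis("off")

# Adjust subplot spacing so titles and axes fit without overlapping
plt.tight_layout()
```

In a notebook, the call to plt.tight_layout() would go just before the figure is shown.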
Hi, I am Rostislav Alpin, a new student in the course "Complete Machine Learning & Data Science Bootcamp 2023". I couldn't get through the verification even after logging in to Discord. Please help to resolve the issue.
Thanks.
plot_roc_curve no longer exists in Scikit-Learn 1.5+ but is used in the exercises; its replacement is RocCurveDisplay
In lecture no. 196, you use RandomizedSearchCV with the default cv=5 for tuning hyperparameters. I think that's the wrong approach for time series data; sklearn has TimeSeriesSplit for this.
You should suggest the correct way of doing this in your course soon!
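A sketch of what this suggestion could look like, not the course's code. It assumes the rows are already sorted by time, and the data, model, and parameter grid below are made up for illustration. Passing a TimeSeriesSplit object as cv means every validation fold comes after its training fold:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit

# Synthetic data, assumed to be ordered by time
rng = np.random.default_rng(42)
X = np.arange(200, dtype=float).reshape(-1, 1)
y = 0.5 * X.ravel() + rng.normal(size=200)

tscv = TimeSeriesSplit(n_splits=5)

# Check that each training fold ends before its validation fold starts
splits_are_ordered = all(train.max() < test.min() for train, test in tscv.split(X))

search = RandomizedSearchCV(
    estimator=RandomForestRegressor(random_state=42),
    param_distributions={"n_estimators": [10, 20], "max_depth": [None, 3, 5]},
    n_iter=3,
    cv=tscv,  # instead of the default 5-fold split
    random_state=42,
)
search.fit(X, y)
print(search.best_params_)
```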
I'm completely new to GitHub and don't know how to use it or what to do on GitHub for the machine learning and data science course. Please suggest a guideline as you would for a kid; I need help getting on GitHub for the first time.
I have taken a course on machine learning and data science.
I need help as a beginner on GitHub.
If there is any issue, please inform me via email, i.e. [email protected], or contact number: +91 8169044393 and +91 993077743.
Course:
"Complete Machine Learning & Data Science Bootcamp 2023"
Section 12, video 195, "Preprocessing Our Data", in the exercise "Make Predictions on Test Data"
Issue:
A ValueError is thrown, as demonstrated below.
# Manually adjust to have auctioneerID_is_missing column
df_test["auctioneerID_is_missing"] = False
df_test.head()
# Make predictions on the test data
test_preds = ideal_model.predict(df_test)
A ValueError occurs:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[75], line 2
1 # Make predictions on the test data
----> 2 test_preds = ideal_model.predict(df_test)
File ~/Documents/code/udemy/udemy_ml_ds_ztm/.venv/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:981, in ForestRegressor.predict(self, X)
979 check_is_fitted(self)
980 # Check data
--> 981 X = self._validate_X_predict(X)
983 # Assign chunk of trees to jobs
984 n_jobs, _, _ = _partition_estimators(self.n_estimators, self.n_jobs)
File ~/Documents/code/udemy/udemy_ml_ds_ztm/.venv/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:602, in BaseForest._validate_X_predict(self, X)
599 """
600 Validate X whenever one tries to predict, apply, predict_proba."""
601 check_is_fitted(self)
--> 602 X = self._validate_data(X, dtype=DTYPE, accept_sparse="csr", reset=False)
603 if issparse(X) and (X.indices.dtype != np.intc or X.indptr.dtype != np.intc):
604 raise ValueError("No support for np.int64 index based sparse matrices")
File ~/Documents/code/udemy/udemy_ml_ds_ztm/.venv/lib/python3.9/site-packages/sklearn/base.py:548, in BaseEstimator._validate_data(self, X, y, reset, validate_separately, **check_params)
483 def _validate_data(
484 self,
485 X="no_validation",
(...)
489 **check_params,
490 ):
491 """Validate input data and set or check the `n_features_in_` attribute.
492
493 Parameters
(...)
546 validated.
547 """
--> 548 self._check_feature_names(X, reset=reset)
550 if y is None and self._get_tags()["requires_y"]:
551 raise ValueError(
552 f"This {self.__class__.__name__} estimator "
553 "requires y to be passed, but the target y is None."
554 )
File ~/Documents/code/udemy/udemy_ml_ds_ztm/.venv/lib/python3.9/site-packages/sklearn/base.py:481, in BaseEstimator._check_feature_names(self, X, reset)
476 if not missing_names and not unexpected_names:
477 message += (
478 "Feature names must be in the same order as they were in fit.\n"
479 )
--> 481 raise ValueError(message)
ValueError: The feature names should match those that were passed during fit.
Feature names must be in the same order as they were in fit.
Tests:
By the error alone, one could assume the error was caused by the addition of the missing column. After a bit of research and troubleshooting, I ran the following tests to determine if they had the same columns, in order.
set(df_test.columns) == set(X_train.columns)
[Output]: True
df_test.columns.tolist() == X_train.columns.tolist()
[Output]: False
sorted(df_test.columns) == sorted(X_train.columns)
[Output]: True
Solution:
To fix the column order, I had to reindex the test data based on the columns of the training data:
df_test = df_test.reindex(X_train.columns, axis=1)
The fix worked, as demonstrated by the next lines in the exercise.
# Make predictions on the test data
test_preds = ideal_model.predict(df_test)
test_preds
which resulted in:
array([17030.00927386, 14355.53565165, 46623.08774286, ...,
11964.85073347, 16496.71079281, 27119.99044029])
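The checks and the fix above can be reproduced on a toy example. The two small DataFrames below are made up (only the column names echo the exercise); they hold the same columns in a different order:

```python
import pandas as pd

# Toy stand-ins for the exercise's X_train and df_test
X_train = pd.DataFrame({
    "YearMade": [2004, 1996],
    "SalesID": [1, 2],
    "auctioneerID_is_missing": [False, True],
})
df_test = pd.DataFrame({
    "SalesID": [3],
    "auctioneerID_is_missing": [False],
    "YearMade": [2010],
})

same_set = set(df_test.columns) == set(X_train.columns)
same_order_before = df_test.columns.tolist() == X_train.columns.tolist()

# Reorder the test columns to match the order used during fit
df_test = df_test.reindex(X_train.columns, axis=1)
same_order_after = df_test.columns.tolist() == X_train.columns.tolist()

print(same_set, same_order_before, same_order_after)
```

Note that reindex fills any column missing from df_test with NaN rather than raising, so it is worth keeping the set comparison as a check. Selecting with df_test[X_train.columns] is an alternative that raises a KeyError if a column is missing.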
Some students are getting different results when running different models in Scikit-Learn.
This is because of different version upgrades (e.g. Scikit-Learn 0.23.0 -> 1.0.0).
Find the videos/code that is showing the worst results and update them with the newer versions.
I was trying to follow your steps to convert the categorical features in the car_sales dataframe to numbers but got some errors.
This is the thread:
https://github.com/scikit-learn/scikit-learn/issues/17741
I see some dependency issues in the code you put in the ML videos. To avoid that, you could provide a requirements.txt or .yml file as part of the resources.
Thanks & Regards
Koteswara
I have resolved an error in the provided sklearn lesson file. Below is my updated code along with the corrected data splitting after preprocessing:
X_train, X_test, y_train, y_test = train_test_split(X_transform_df, y, test_size=0.2, random_state=5)
grid_cv = GridSearchCV(estimator=model, param_grid=param, cv=5, verbose=2)
grid_cv.fit(X_train, y_train)
y_preds = grid_cv.predict(X_test)
evaluation_metrics(y_test, y_preds)
You can also access the IPython Notebook containing the complete code and execution results updated-Notebook.
Link to notebook changed: https://github.com/mrdbourke/zero-to-mastery-ml/blob/master/section-3-structured-data-projects/end-to-end-heart-disease-classification.ipynb
As of Scikit-Learn 1.2+, the function sklearn.metrics.plot_roc_curve has been removed (it was deprecated in 1.0) in favour of sklearn.metrics.RocCurveDisplay.
You can check your Scikit-Learn version with:
import sklearn
sklearn.__version__
You can run the following command in your terminal with your Conda (or other) environment active to upgrade Scikit-Learn (the -U stands for "upgrade"):
pip install -U scikit-learn
# This will error if run in Scikit-Learn version 1.2+
from sklearn.metrics import plot_roc_curve
Also:
# This will error if run in Scikit-Learn version 1.2+
from sklearn.metrics import plot_roc_curve
plot_roc_curve(gs_log_reg, X_test, y_test);
from sklearn.metrics import RocCurveDisplay # replaces plot_roc_curve in Scikit-Learn 1.2+
And to plot a ROC curve, note the use of RocCurveDisplay.from_estimator():
# Scikit-Learn 1.2.0 or later
from sklearn.metrics import RocCurveDisplay
# from_estimator() = use a model to plot ROC curve on data
RocCurveDisplay.from_estimator(estimator=gs_log_reg,
                               X=X_test,
                               y=y_test);
So I am coding along with the Complete A.I. & Machine Learning, Data Science Bootcamp 2024. In video 55, "Selecting and viewing data with pandas part 2", of section 6, I try to run the code:
car_sales["Price"] = car_sales["Price"].str.replace('[$,.]', '').astype(int)
I run it and get the following error which I cannot seem to find any solution or fix for:
<>:1: SyntaxWarning: invalid escape sequence '$'
<>:1: SyntaxWarning: invalid escape sequence '$'
C:\Users\sweet\AppData\Local\Temp\ipykernel_16004\2312081839.py:1: SyntaxWarning: invalid escape sequence '$'
car_sales["Price"] = car_sales["Price"].str.replace('[$,.]', '').astype(int)
ValueError Traceback (most recent call last)
Cell In[170], line 1
----> 1 car_sales["Price"] = car_sales["Price"].str.replace('[$,.]', '').astype(int)
File ~\Desktop\sample_project_1\env\Lib\site-packages\pandas\core\generic.py:6640, in NDFrame.astype(self, dtype, copy, errors)
6634 results = [
6635 ser.astype(dtype, copy=copy, errors=errors) for _, ser in self.items()
6636 ]
6638 else:
6639 # else, only a single dtype is given
-> 6640 new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
6641 res = self._constructor_from_mgr(new_data, axes=new_data.axes)
6642 return res.finalize(self, method="astype")
File ~\Desktop\sample_project_1\env\Lib\site-packages\pandas\core\internals\managers.py:430, in BaseBlockManager.astype(self, dtype, copy, errors)
427 elif using_copy_on_write():
428 copy = False
--> 430 return self.apply(
431 "astype",
432 dtype=dtype,
433 copy=copy,
434 errors=errors,
435 using_cow=using_copy_on_write(),
436 )
File ~\Desktop\sample_project_1\env\Lib\site-packages\pandas\core\internals\managers.py:363, in BaseBlockManager.apply(self, f, align_keys, **kwargs)
361 applied = b.apply(f, **kwargs)
362 else:
--> 363 applied = getattr(b, f)(**kwargs)
364 result_blocks = extend_blocks(applied, result_blocks)
366 out = type(self).from_blocks(result_blocks, self.axes)
File ~\Desktop\sample_project_1\env\Lib\site-packages\pandas\core\internals\blocks.py:758, in Block.astype(self, dtype, copy, errors, using_cow, squeeze)
755 raise ValueError("Can not squeeze with more than one column.")
756 values = values[0, :] # type: ignore[call-overload]
--> 758 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
760 new_values = maybe_coerce_values(new_values)
762 refs = None
File ~\Desktop\sample_project_1\env\Lib\site-packages\pandas\core\dtypes\astype.py:237, in astype_array_safe(values, dtype, copy, errors)
234 dtype = dtype.numpy_dtype
236 try:
--> 237 new_values = astype_array(values, dtype, copy=copy)
238 except (ValueError, TypeError):
239 # e.g. _astype_nansafe can fail on object-dtype of strings
240 # trying to convert to float
241 if errors == "ignore":
File ~\Desktop\sample_project_1\env\Lib\site-packages\pandas\core\dtypes\astype.py:182, in astype_array(values, dtype, copy)
179 values = values.astype(dtype, copy=copy)
181 else:
--> 182 values = _astype_nansafe(values, dtype, copy=copy)
184 # in pandas we don't store numpy str dtypes, so convert to object
185 if isinstance(dtype, np.dtype) and issubclass(values.dtype.type, str):
File ~\Desktop\sample_project_1\env\Lib\site-packages\pandas\core\dtypes\astype.py:133, in _astype_nansafe(arr, dtype, copy, skipna)
129 raise ValueError(msg)
131 if copy or arr.dtype == object or dtype == object:
132 # Explicit copy, or required since NumPy can't view from / to object.
--> 133 return arr.astype(dtype, copy=True)
135 return arr.astype(dtype, copy=copy)
ValueError: invalid literal for int() with base 10: '$4,000.00'
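A possible explanation, offered as a sketch rather than a confirmed diagnosis: in pandas 2.0 and later, Series.str.replace treats the pattern as a literal string by default (regex=False), so nothing is removed and astype(int) sees the untouched '$4,000.00'. The SyntaxWarning suggests the pattern was written with backslash escapes in a non-raw string, which a raw string (r"...") avoids. The sample values below are made up, and unlike the line quoted above this version keeps the decimal point and converts through float:

```python
import pandas as pd

# Made-up sample values in the same format as the error message
prices = pd.Series(["$4,000.00", "$5,000.00", "$7,000.00"])

cleaned = (
    prices
    .str.replace(r"[$,]", "", regex=True)  # raw string, regex enabled explicitly
    .astype(float)                         # "4000.00" -> 4000.0
    .astype(int)                           # 4000.0 -> 4000
)
print(cleaned.tolist())  # [4000, 5000, 7000]
```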