kaggle / learntools
Tools and tests used in Kaggle Learn exercises
License: Apache License 2.0
The instructions for competition submission (specifically steps 3 and 4) in https://www.kaggle.com/ryanholbrook/feature-engineering-for-house-prices are out of date: the submission process has been simplified, and the ellipsis menu now lets you select Submit to Competition directly.
This will need to be updated in the learntools macro as well.
Description:
In the Data Visualization: from Non-Coder to Coder tutorial, in the first exercise (Exercise: Hello, Seaborn), step_3.check() for the third step (Review the data) does not work.
step_4.a.check() and step_4.b.check() also do not work correctly; if they are changed to step_3.a.check() and step_3.b.check(), they work. step_3 still points to the fourth step, Plot the data.
I think the problem was introduced by this commit, which as far as I can see adds the "Review the data" step.
What should be done:
Checking, hinting, and solving should work correctly in every step of the tutorial exercises.
Hi,
I have a question about cross-validation. As I understand it, cross-validation evaluates machine learning models on randomly drawn validation sets, so I think it cannot be applied when a competition provides a static test set, because submitted models must be evaluated on the same test data to be comparable. Please correct me if I am wrong.
Thank you.
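For what it's worth, the two serve different purposes: cross-validation partitions your training data to estimate generalization (for model selection and tuning), while the competition's fixed test set is still used once, identically for every submission, for the final score. A minimal sketch with scikit-learn on synthetic stand-in data (the dataset and model here are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a competition's *training* data.
X, y = make_regression(n_samples=200, n_features=5, random_state=0)

# Cross-validation repeatedly holds out one fold of the training data,
# fits on the rest, and scores on the held-out fold.  The competition's
# fixed test set is untouched by this and still scores every submission
# on exactly the same data.
model = RandomForestRegressor(n_estimators=50, random_state=0)
scores = cross_val_score(model, X, y, cv=5)
print(len(scores), scores.mean())
```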
learntools/learntools/python/blackjack.py
Line 58 in 3d21e2e
should be:
if (tot + 9) <= 21:
because 1 point has already been added to tot for aces.
Example: I already have 11 points without an ace and I hit an ace. tot must gain 10 points (1 + 9), not 11 (1 + 10).
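For reference, a common way to score a blackjack hand is to count every ace as 1 while summing and then add the extra 10 for one ace only if the hand stays at 21 or under; whether the guard at line 58 should add 9 or 10 depends on what tot already includes at that point. A minimal sketch with illustrative names, not the lesson's actual code:

```python
def hand_total(cards):
    """Total a blackjack hand; number cards are ints, aces are 'A'."""
    tot = 0
    aces = 0
    for card in cards:
        if card == 'A':
            tot += 1          # every ace starts out counted as 1 point
            aces += 1
        else:
            tot += card
    # Upgrading one ace from 1 to 11 adds 10 more points; this guard
    # compares against a total that already includes that ace's 1.
    if aces and tot + 10 <= 21:
        tot += 10
    return tot

print(hand_total([6, 5, 'A']))  # 12: upgrading would bust, ace stays 1
print(hand_total([5, 'A']))     # 16: ace upgrades to 11
```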
Hey there! The schema of the hacker news dataset seems to have changed and the “by” field is no longer first. The tutorial may need to be updated to reflect this.
YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  yield yaml.load(f)
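This warning comes from PyYAML 5.1+; the usual fix is to pass an explicit loader, either yaml.safe_load or Loader=yaml.SafeLoader. A small sketch:

```python
import yaml

doc = "name: learntools\nversion: 2"

# Explicit loader: SafeLoader only constructs plain Python objects
# (dicts, lists, scalars), avoiding the unsafe default.
data = yaml.load(doc, Loader=yaml.SafeLoader)
# Equivalent shorthand
data2 = yaml.safe_load(doc)
print(data)
```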
After implementing the function select_features_l1, the error below pops up in the 'train_model' section. I'm not too familiar with Kaggle's Python notebooks and not sure whether there's a way to browse defined variables as in VS Code. I explored the source a little, and I see the selected variable IS defined in the check function, but in the current notebook it is failing.
If you run step_5.solution() you get:
def evaluate(model, texts, labels):
    # Get predictions from textcat model
    predicted_class = predict(model, texts)
    # From labels, get the true class as a list of integers (POSITIVE -> 1, NEGATIVE -> 0)
    true_class = [int(each['cats']['POSITIVE']) for each in labels]
    # A boolean or int array indicating correct predictions
    correct_predictions = predicted_class == true_class
    # The accuracy, number of correct predictions divided by all predictions
    accuracy = correct_predictions.mean()
    return accuracy
but if you use that and run step_5.check(), it says incorrect. I can't find an answer to this question that passes step_5.check(), even though many solutions seem correct and run correctly in the next step.
The hint says: "Hint: use a map and the argmax function."
However, the documentation and the current answer use idxmax.
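For context, on a pandas Series idxmax returns the index label of the maximum value (NumPy's argmax returns a position instead), which is presumably what the exercise expects. A small illustration on invented data:

```python
import pandas as pd

points = pd.Series({'apple': 3, 'banana': 9, 'cherry': 5})

# idxmax returns the index *label* of the row holding the maximum value
best = points.idxmax()
print(best)  # banana

# the map-then-idxmax pattern the hint gestures at:
doubled = points.map(lambda p: p * 2)
print(doubled.idxmax())  # still banana
```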
Is it possible to use this tool for a learning course that isn't deployed on Kaggle Learn?
Imagine I want to create my own Python course and share it with some friends. Is that possible? I would still run the notebooks on Kaggle.
When I ran the first cell I got the error:
│ exit code: 128 ╰─> See above for output.
Later, q_3.check() gave:
TypeError: only size-1 arrays can be converted to Python scalars
and q_4.check() gave:
IndexError: list index out of range
check() for 3b and 4b is absent.
I am working through the Deep Learning course on Kaggle and ran into a problem with the Transfer Learning exercise (raw notebook here). It instructs:
Your training data is in the directory ../input/dogs-gone-sideways/train. The validation data is in ../input/dogs-gone-sideways/val. Use that information when setting up train_generator and validation_generator.
But using these directories didn't work for me. I wonder whether the way the data is stored on Kaggle has changed since this lesson was written. Paths that do work seem to be:
../input/dogs-gone-sideways/images/train
../input/dogs-gone-sideways/image/val
With these directories the code runs fine, but the checking code complains that I am using the wrong directories:
git clone https://github.com/Kaggle/learntools.git
D:\Projects-intellij\machine-learning-course\kaggle\learntools>python ex1.py
Traceback (most recent call last):
File "ex1.py", line 4, in <module>
from learntools.pandas.creating_reading_and_writing import *
File "D:\Projects-intellij\machine-learning-course\kaggle\learntools\learntools\pandas\creating_reading_and_writing.py", line 49, in <module>
class ReadWineCsv(EqualityCheckProblem):
File "D:\Projects-intellij\machine-learning-course\kaggle\learntools\learntools\pandas\creating_reading_and_writing.py", line 54, in ReadWineCsv
_expected = pd.read_csv('../input/wine-reviews/winemag-data_first150k.csv', index_col=0)
File "C:\Users\OEM\Miniconda3\lib\site-packages\pandas\io\parsers.py", line 702, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Users\OEM\Miniconda3\lib\site-packages\pandas\io\parsers.py", line 429, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "C:\Users\OEM\Miniconda3\lib\site-packages\pandas\io\parsers.py", line 895, in __init__
self._make_engine(self.engine)
File "C:\Users\OEM\Miniconda3\lib\site-packages\pandas\io\parsers.py", line 1122, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "C:\Users\OEM\Miniconda3\lib\site-packages\pandas\io\parsers.py", line 1853, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas\_libs\parsers.pyx", line 387, in pandas._libs.parsers.TextReader.__cinit__
File "pandas\_libs\parsers.pyx", line 705, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] File b'../input/wine-reviews/winemag-data_first150k.csv' does not exist: b'../input/wine-reviews/winemag-data_first150k.csv'
Where can I find instructions on how to setup datasets for the course?
build\bdist.win-amd64\egg\learntools\computer_vision\ex5.py:81: SyntaxWarning: "is" with a literal. Did you mean "=="?
  assert (activations[0] is 'relu' and activations[1] is 'relu'),
byte-compiling build\bdist.win-amd64\egg\learntools\computer_vision\ex6.py to ex6.cpython-39.pyc
byte-compiling build\bdist.win-amd64\egg\learntools\computer_vision\visiontools.py to visiontools.cpython-39.pyc
build\bdist.win-amd64\egg\learntools\computer_vision\visiontools.py:40: SyntaxWarning: "is" with a literal. Did you mean "=="?
if type is 'binary':
build\bdist.win-amd64\egg\learntools\computer_vision\visiontools.py:42: SyntaxWarning: "is" with a literal. Did you mean "=="?
elif type is 'sparse':
build\bdist.win-amd64\egg\learntools\computer_vision\visiontools.py:302: SyntaxWarning: "is" with a literal. Did you mean "=="?
  if layer.__class__.__name__ is 'Conv2D']
build\bdist.win-amd64\egg\learntools\computer_vision\visiontools.py:460: SyntaxWarning: "is" with a literal. Did you mean "=="?
if fill_method is 'replicate':
build\bdist.win-amd64\egg\learntools\computer_vision\visiontools.py:464: SyntaxWarning: "is" with a literal. Did you mean "=="?
elif fill_method is 'reflect':
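These warnings point at real latent bugs: is tests object identity, and two equal strings are only sometimes the same object (a CPython interning detail), so all of these comparisons should use ==. A quick demonstration:

```python
a = 'relu'
b = ''.join(['re', 'lu'])   # an equal string built at runtime

print(a == b)   # True: value equality, what the asserts intend
print(a is b)   # False in CPython: b is a distinct, non-interned object
# so a check like `activations[0] is 'relu'` should be `activations[0] == 'relu'`
```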
While taking the Pandas course I hit an error:
import pandas as pd
pd.set_option('max_rows', 5)
from learntools.core import binder; binder.bind(globals())
from learntools.pandas.creating_reading_and_writing import *
print("Setup complete.")
WARNING:root:Ignoring repeated attempt to bind to globals
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-7-1cf6d6f127c2> in <module>
2 pd.set_option('max_rows', 5)
3 from learntools.core import binder; binder.bind(globals())
----> 4 from learntools.pandas.creating_reading_and_writing import *
5 print("Setup complete.")
ModuleNotFoundError: No module named 'learntools.pandas'
Hi,
I'm not sure whether this is the right place to give feedback; I couldn't find a suitable way over on Kaggle.
Anyway, I started working through the Data Visualization: From Non-Coder to Coder micro-course on Kaggle and get the following error in the first exercise:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-24-15ba42341748> in <module>()
2 brazil_rank = 3.0
3 # Check your answer
----> 4 step_3.check()
AttributeError: 'MultipartProblem' object has no attribute 'check'
I think there is a mismatch between the version hosted on Kaggle and the current version of this repository (step 3 on GitHub is plotting the data; on Kaggle it's reviewing the data).
By the way, I really like the learntools idea :)
price_extremes = reviews.groupby('variety').price.agg([min, max])
TypeError Traceback (most recent call last)
<ipython-input-81-1ee08b4f09ca> in <module>
1 #q3.hint()
2 q3.solution()
----> 3 price_extremes = reviews.groupby('variety').price.agg([min, max])
/opt/conda/lib/python3.6/site-packages/pandas/core/groupby/generic.py in aggregate(self, func_or_funcs, *args, **kwargs)
849 # but not the class list / tuple itself.
850 func_or_funcs = _maybe_mangle_lambdas(func_or_funcs)
--> 851 ret = self._aggregate_multiple_funcs(func_or_funcs, (_level or 0) + 1)
852 if relabeling:
853 ret.columns = columns
/opt/conda/lib/python3.6/site-packages/pandas/core/groupby/generic.py in _aggregate_multiple_funcs(self, arg, _level)
916 for name, func in arg:
917 obj = self
--> 918 if name in results:
919 raise SpecificationError(
920 "Function names must be unique, found multiple named "
/opt/conda/lib/python3.6/site-packages/pandas/core/generic.py in __hash__(self)
1884 raise TypeError(
1885 "{0!r} objects are mutable, thus they cannot be"
-> 1886 " hashed".format(self.__class__.__name__)
1887 )
1888
TypeError: 'Series' objects are mutable, thus they cannot be hashed
Exercise: Grouping and Sorting - point 3
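One likely cause is that an earlier notebook cell rebound min or max to a Series; passing the aggregation names as strings sidesteps the shadowing and works across pandas versions. A sketch on invented data:

```python
import pandas as pd

reviews = pd.DataFrame({
    'variety': ['Pinot Noir', 'Pinot Noir', 'Riesling'],
    'price': [20.0, 35.0, 15.0],
})

# String names can't be shadowed by notebook variables the way the
# builtins min/max can.
price_extremes = reviews.groupby('variety').price.agg(['min', 'max'])
print(price_extremes)
```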
notebooks/deep_learning/raw/ex3_programming_tf_and_keras.ipynb
2) Run an Example Model
from IPython.display import Image, display
from learntools.deep_learning.decode_predictions import decode_predictions
import numpy as np
from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.preprocessing.image import load_img, img_to_array
image_size = 224
def read_and_prep_images(img_paths, img_height=image_size, img_width=image_size):
    imgs = [load_img(img_path, target_size=(img_height, img_width)) for img_path in img_paths]
    img_array = np.array([img_to_array(img) for img in imgs])
    output = preprocess_input(img_array)
    return output
my_model = ResNet50(weights='../input/resnet50/resnet50_weights_tf_dim_ordering_tf_kernels.h5')
test_data = read_and_prep_images(img_paths)
preds = my_model.predict(test_data)
most_likely_labels = decode_predictions(preds, top=3)
ValueError: Shapes (1, 1, 256, 512) and (512, 128, 1, 1) are incompatible
PR
While following along with the example locally, I got the same error. After some googling and trial and error, I finally got it to work with the following import while working with the dog files and weights:
from keras.applications.resnet50 import ResNet50
my_model = ResNet50(weights='./pre-trained/resnet50/resnet50_weights_tf_dim_ordering_tf_kernels.h5')
Please note that I'm just a beginner with python and tensorflow, so if there is a better fix, please let me know!
Hey, I was just going through the pandas tutorial and wasn't sure whether this is a typo or whether I'm misunderstanding something.
learntools/notebooks/pandas/raw/tut_1.ipynb
Line 279 in 886f5c2
it's a lot more convenient to index df.loc['Apples':'Potatoes'] than it is to index something like df.loc['Apples', 'Potatoet] (t coming after s in the alphabet)
Should the second code snippet be df.iloc['Apples':'Potatoes']? It was just explained that df.iloc[0:10] gives you indices 0, ..., 9 but df.loc[0:10] gives you indices 0, ..., 10; and then I wasn't sure how they got df.loc['Apples', 'Potatoet].
The setup imports are written to import hacker_news, while the dataset is actually hacker-news, which may be the cause of this.
When I did my work, the check says incorrect even when I paste the solution.
Please remove.
In the fourier_features() example algorithm in the Seasonality lesson of the Time Series course, the variable name freq is given to a parameter that takes units of days/cycle. This confused me at first because, unless I'm mistaken, frequency typically refers to measurements in inverse units (cycles/day), whereas period refers to time per cycle.
Whenever I try to run the first cell of the Time Series as Features exercise, I get this error:
Collecting git+https://github.com/Kaggle/learntools.git
Cloning https://github.com/Kaggle/learntools.git to /tmp/pip-req-build-65_z7vlm
Running command git clone --filter=blob:none -q https://github.com/Kaggle/learntools.git /tmp/pip-req-build-65_z7vlm
fatal: unable to access 'https://github.com/Kaggle/learntools.git/': Could not resolve host: github.com
WARNING: Discarding git+https://github.com/Kaggle/learntools.git. Command errored out with exit status 128: git clone --filter=blob:none -q https://github.com/Kaggle/learntools.git /tmp/pip-req-build-65_z7vlm Check the logs for full command output.
ERROR: Command errored out with exit status 128: git clone --filter=blob:none -q https://github.com/Kaggle/learntools.git /tmp/pip-req-build-65_z7vlm Check the logs for full command output.
https://www.kaggle.com/alexisbcook/distributions
This is a great course, but the distplot examples should be replaced with displot or histplot, since distplot now raises deprecation warnings.
Hi,
I found incorrect assert messages.
In this line, "reduced_X_train" should be replaced with "reduced_X_valid".
learntools/learntools/ml_intermediate/ex2.py
Line 113 in 1c59223
Also, "imputed_X_train" should be replaced with "imputed_X_valid".
For Exercise: Machine Learning Competitions, the train data file path should be
iowa_file_path = '../input/train.csv'
The current path is not working:
iowa_file_path = '../input/home-data-for-ml-course/train.csv'
I'm not making a PR because I'm not sure whether the path should be corrected or the files should be placed at that path.
In the last code cell
print("MAE from Approach 3 (One-Hot Encoding):")
print(score_dataset(OH_X_train, OH_X_valid, y_train, y_valid))
This shows
MAE from Approach 3 (One-Hot Encoding):
/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py:1692: FutureWarning: Feature names only support names that are all strings. Got feature names with dtypes: ['int', 'str']. An error will be raised in 1.2.
FutureWarning,
17525.345719178084
/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py:1692: FutureWarning: Feature names only support names that are all strings. Got feature names with dtypes: ['int', 'str']. An error will be raised in 1.2.
FutureWarning,
Adding these lines to Step 4 fixes it:
OH_cols_train.columns = list(map(str, OH_cols_train.columns))
OH_cols_valid.columns = list(map(str, OH_cols_valid.columns))
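The warning arises because pd.concat mixes the one-hot frame's integer column names with the original string names; a minimal demonstration of the fix (data invented):

```python
import pandas as pd

num_cols = pd.DataFrame({'LotArea': [8450, 9600]})
oh_cols = pd.DataFrame([[1, 0], [0, 1]])      # integer column names 0 and 1

X = pd.concat([num_cols, oh_cols], axis=1)    # mixed str/int names trip sklearn
X.columns = list(map(str, X.columns))         # cast every name to str
print(list(X.columns))  # ['LotArea', '0', '1']
```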
Bug in the following file: https://github.com/Kaggle/learntools/blob/master/learntools/time_series/utils.py
The function doesn't create the leads correctly. For example, with leads set to 1, it just copies the existing columns under a new column named {name}_lead_0. See the attached screenshot for an example.
def make_leads(ts, leads, name='y'):
    return pd.concat(
        {f'{name}_lead_{i}': ts.shift(-i)
         for i in reversed(range(leads))},
        axis=1)
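A plausible fix, not an official patch, is to start the range at 1 so the smallest lead actually looks one step ahead instead of duplicating the series:

```python
import pandas as pd

def make_leads(ts, leads, name='y'):
    # range(1, leads + 1): the smallest lead looks one step ahead,
    # so no column is just a copy of the input series.
    return pd.concat(
        {f'{name}_lead_{i}': ts.shift(-i)
         for i in reversed(range(1, leads + 1))},
        axis=1)

ts = pd.Series([1, 2, 3, 4], name='y')
out = make_leads(ts, leads=2)
print(list(out.columns))  # ['y_lead_2', 'y_lead_1']
```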