
learntools's People

Contributors

alexisbcook, aurnik, bobfraser-google, caffeinehighzombie, calderjo, colinmorris, dakkers, dansbecker, djherbis, dvincelli, gagan2608, ifigotin, igorsafo, imba-tjd, jplotts, kaczmarekwill, mcbex, mcleonard, nkoep, paulbarrett, pculliton, philmod, residentmario, roannav, rosbo, ryanholbrook, stephenramthun, unterumarmung, wendykan, zdiemer

learntools's Issues

Wrong 'step' configuration in data_viz_to_coder ex1

Description:

In the Data Visualization: from Non-Coder to Coder tutorial, in the first exercise (Exercise: Hello, Seaborn), step_3.check() does not work in the third step (Review the data).
step_4.a.check() and step_4.b.check() also do not work correctly. Changing them to step_3.a.check() and step_3.b.check() makes them work, so step_3 still points to the fourth step (Plot the data).

I think the problem may have been introduced in this commit. As far as I can see, it adds the "Review the data" step.

What should be done:

Checking, hinting, and solving should work correctly in every step of the tutorial exercise.

k-fold CV vs static test split

Hi,
I have a question about cross-validation. As I understand it, it is used to evaluate machine learning models on a random test set. Hence, I think we cannot apply it when there is a static test set, as provided by competitions, because submitted models must be evaluated on the same test data to be compared against each other. Please correct me if I am wrong.
Thank you.
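One way to picture how the two ideas can coexist is sketched below, under the assumption that only the labelled training rows are split while the competition's fixed test set is left untouched for the final comparison. The helper name `k_fold_indices` and the toy sizes are hypothetical and are not part of learntools or any course.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, valid_idx) pairs for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, valid_idx
        start += size


# Hypothetical toy case: 10 labelled training rows, 3 folds.
# The static competition test set is never touched by these splits.
folds = list(k_fold_indices(10, 3))
for train_idx, valid_idx in folds:
    print(len(train_idx), "training rows,", len(valid_idx), "validation rows")
```

Each training row is used for validation exactly once, so the averaged fold scores give a local estimate of model quality, while the leaderboard comparison still happens on the same fixed test data for everyone.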

SQL tutorial 1 out of date

Hey there! The schema of the hacker news dataset seems to have changed and the “by” field is no longer first. The tutorial may need to be updated to reflect this.

Deprecation warning in `prepare_push.py`

>>> YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
 yield yaml.load(f)

L1 Regularization training error

After implementing the function select_features_l1, the error below pops up in the 'train_model' section. I'm not too familiar with Kaggle's Python notebooks and am not sure whether there's a way to browse defined variables like in VSCode. I explored the source a little and I see that the selected variable IS defined in the check function, but it fails in the current notebook.

image

NLP Exercise, step 5, solution code fails .check()

If you run step_5.solution() you get:

def evaluate(model, texts, labels):
    # Get predictions from textcat model
    predicted_class = predict(model, texts)

    # From labels, get the true class as a list of integers (POSITIVE -> 1, NEGATIVE -> 0)
    true_class = [int(each['cats']['POSITIVE']) for each in labels]

    # A boolean or int array indicating correct predictions
    correct_predictions = predicted_class == true_class

    # The accuracy, number of correct predictions divided by all predictions
    accuracy = correct_predictions.mean()

    return accuracy

but if you use that and run step_5.check() it says incorrect. I can't find an answer to this question that passes step_5.check() even though many solutions seem correct and run correctly in the next step.
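One thing worth ruling out, offered as an assumption and not as a diagnosis of the checker, is the type returned by predict. The expression predicted_class == true_class is only element-wise when predicted_class is an array; with two plain lists it collapses to a single bool, which has no .mean(). The sketch below computes the same accuracy with plain lists. The function name `accuracy` and the sample labels are hypothetical.

```python
def accuracy(predicted_class, true_class):
    """Fraction of positions where the prediction matches the true label."""
    if len(predicted_class) != len(true_class) or not true_class:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = [int(p) == int(t) for p, t in zip(predicted_class, true_class)]
    return sum(correct) / len(correct)


# Hypothetical labels in the same shape the exercise uses
labels = [
    {"cats": {"POSITIVE": True, "NEGATIVE": False}},
    {"cats": {"POSITIVE": False, "NEGATIVE": True}},
    {"cats": {"POSITIVE": False, "NEGATIVE": True}},
    {"cats": {"POSITIVE": True, "NEGATIVE": False}},
]
true_class = [int(each["cats"]["POSITIVE"]) for each in labels]
predicted_class = [1, 0, 1, 1]

# With two lists, == is a whole-list comparison, not an element-wise one
print(predicted_class == true_class)
print(accuracy(predicted_class, true_class))
```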

Validation path exercise_4.py

Hi Dan,

There appears to be an error in the validation function. The model works with the specifications in the instructions, but check() throws an error indicating that the file path is wrong.

image

Type Error in Ex 5 of Intro to SQL

  1. When I ran the first cell, I got the error
    exit code: 128 (See above for output.)

  2. Later, in q_3.check(), I got
    TypeError: only size-1 arrays can be converted to Python scalars

  3. In q_4.check(), I got
    IndexError: list index out of range

Image directories incorrectly specified? (Deep Learning - exercise 4)

I am working through the Deep Learning course on Kaggle and ran into a problem with the Transfer Learning exercise (raw notebook here). It instructs:

Your training data is in the directory ../input/dogs-gone-sideways/train. The validation data is in ../input/dogs-gone-sideways/val. Use that information when setting up train_generator and validation_generator.

But using these directories didn't work for me. I am wondering if there was a change in the way the data is stored on Kaggle since this lesson was written. Paths that do work seem to be:

  • ../input/dogs-gone-sideways/images/train
  • ../input/dogs-gone-sideways/images/val

With these directories the code runs fine, but the checking code complains that I am using the wrong directories.

Missing instructions on dataset setup for Pandas course

  1. I ran git clone https://github.com/Kaggle/learntools.git.
  2. I followed the instructions for the Pandas course, ran the first example, and got the following error:
D:\Projects-intellij\machine-learning-course\kaggle\learntools>python ex1.py
Traceback (most recent call last):
  File "ex1.py", line 4, in <module>
    from learntools.pandas.creating_reading_and_writing import *
  File "D:\Projects-intellij\machine-learning-course\kaggle\learntools\learntools\pandas\creating_reading_and_writing.py", line 49, in <module>
    class ReadWineCsv(EqualityCheckProblem):
  File "D:\Projects-intellij\machine-learning-course\kaggle\learntools\learntools\pandas\creating_reading_and_writing.py", line 54, in ReadWineCsv
    _expected = pd.read_csv('../input/wine-reviews/winemag-data_first150k.csv', index_col=0)
  File "C:\Users\OEM\Miniconda3\lib\site-packages\pandas\io\parsers.py", line 702, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\OEM\Miniconda3\lib\site-packages\pandas\io\parsers.py", line 429, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\Users\OEM\Miniconda3\lib\site-packages\pandas\io\parsers.py", line 895, in __init__
    self._make_engine(self.engine)
  File "C:\Users\OEM\Miniconda3\lib\site-packages\pandas\io\parsers.py", line 1122, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "C:\Users\OEM\Miniconda3\lib\site-packages\pandas\io\parsers.py", line 1853, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas\_libs\parsers.pyx", line 387, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas\_libs\parsers.pyx", line 705, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] File b'../input/wine-reviews/winemag-data_first150k.csv' does not exist: b'../input/wine-reviews/winemag-data_first150k.csv'

Where can I find instructions on how to set up the datasets for the course?

python

build\bdist.win-amd64\egg\learntools\computer_vision\ex5.py:81: SyntaxWarning: "is" with a literal. Did you mean "=="?
assert (activations[0] is 'relu' and activations[1] is 'relu'),
build\bdist.win-amd64\egg\learntools\computer_vision\ex5.py:81: SyntaxWarning: "is" with a literal. Did you mean "=="?
assert (activations[0] is 'relu' and activations[1] is 'relu'),
byte-compiling build\bdist.win-amd64\egg\learntools\computer_vision\ex6.py to ex6.cpython-39.pyc
byte-compiling build\bdist.win-amd64\egg\learntools\computer_vision\visiontools.py to visiontools.cpython-39.pyc
build\bdist.win-amd64\egg\learntools\computer_vision\visiontools.py:40: SyntaxWarning: "is" with a literal. Did you mean "=="?
if type is 'binary':
build\bdist.win-amd64\egg\learntools\computer_vision\visiontools.py:42: SyntaxWarning: "is" with a literal. Did you mean "=="?
elif type is 'sparse':
build\bdist.win-amd64\egg\learntools\computer_vision\visiontools.py:302: SyntaxWarning: "is" with a literal. Did you mean "=="?
if layer.__class__.__name__ is 'Conv2D']
build\bdist.win-amd64\egg\learntools\computer_vision\visiontools.py:460: SyntaxWarning: "is" with a literal. Did you mean "=="?
if fill_method is 'replicate':
build\bdist.win-amd64\egg\learntools\computer_vision\visiontools.py:464: SyntaxWarning: "is" with a literal. Did you mean "=="?
elif fill_method is 'reflect':
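The warnings above all point at the same pattern: `is` tests object identity, while `==` tests value equality, and only the latter is reliable for comparing strings. A minimal sketch of the safer comparison follows. The function `describe_fill` and its return values are hypothetical and are not the code in visiontools.py.

```python
def describe_fill(fill_method):
    # '==' compares the characters of the strings; 'is' would compare
    # whether both names refer to the very same object in memory
    if fill_method == 'replicate':
        return 'edge'
    elif fill_method == 'reflect':
        return 'mirror'
    return 'unknown'


# A string assembled at runtime equals the literal in value, even though
# the interpreter is free to store it as a separate object
runtime_value = ''.join(['repli', 'cate'])
result = describe_fill(runtime_value)
print(result)
```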

Import Error in Pandas course

While working through the Pandas course, I hit an error:

import pandas as pd
pd.set_option('max_rows', 5)
from learntools.core import binder; binder.bind(globals())
from learntools.pandas.creating_reading_and_writing import *
print("Setup complete.")
WARNING:root:Ignoring repeated attempt to bind to globals

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-7-1cf6d6f127c2> in <module>
      2 pd.set_option('max_rows', 5)
      3 from learntools.core import binder; binder.bind(globals())
----> 4 from learntools.pandas.creating_reading_and_writing import *
      5 print("Setup complete.")

ModuleNotFoundError: No module named 'learntools.pandas'
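When debugging an error like this, it can help to check which submodules the installed package actually provides before importing from them. The sketch below uses only the standard library. The helper name `module_available` is hypothetical, and the check says nothing about why a submodule is missing.

```python
import importlib.util


def module_available(dotted_name):
    """Return True if `dotted_name` can be located by the import system."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of dotted_name is itself missing
        return False


# Report whether the installed learntools ships the pandas exercises
for name in ("learntools", "learntools.core", "learntools.pandas"):
    print(name, module_available(name))
```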

AttributeError: 'MultipartProblem' object has no attribute 'check' (data_viz_to_coder)

Hi,
I'm not sure if this is the right spot to provide feedback; I couldn't find a suitable way over on Kaggle.
Anyway, I started working through the Data Visualization: From Non-Coder to Coder micro-course on Kaggle and got the following error in the first exercise:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-24-15ba42341748> in <module>()
      2 brazil_rank = 3.0
      3 # Check your answer
----> 4 step_3.check()

AttributeError: 'MultipartProblem' object has no attribute 'check'

I think there is a mismatch between the version hosted on Kaggle and the current version of this repository (step 3 on GitHub is plotting the data; on Kaggle it's reviewing the data).

By the way, I really like the learntools idea :)

Error in solution in Pandas tutorial

price_extremes = reviews.groupby('variety').price.agg([min, max])

TypeError                                 Traceback (most recent call last)
<ipython-input-81-1ee08b4f09ca> in <module>
      1 #q3.hint()
      2 q3.solution()
----> 3 price_extremes = reviews.groupby('variety').price.agg([min, max])

/opt/conda/lib/python3.6/site-packages/pandas/core/groupby/generic.py in aggregate(self, func_or_funcs, *args, **kwargs)
    849             # but not the class list / tuple itself.
    850             func_or_funcs = _maybe_mangle_lambdas(func_or_funcs)
--> 851             ret = self._aggregate_multiple_funcs(func_or_funcs, (_level or 0) + 1)
    852             if relabeling:
    853                 ret.columns = columns

/opt/conda/lib/python3.6/site-packages/pandas/core/groupby/generic.py in _aggregate_multiple_funcs(self, arg, _level)
    916         for name, func in arg:
    917             obj = self
--> 918             if name in results:
    919                 raise SpecificationError(
    920                     "Function names must be unique, found multiple named "

/opt/conda/lib/python3.6/site-packages/pandas/core/generic.py in __hash__(self)
   1884         raise TypeError(
   1885             "{0!r} objects are mutable, thus they cannot be"
-> 1886             " hashed".format(self.__class__.__name__)
   1887         )
   1888 

TypeError: 'Series' objects are mutable, thus they cannot be hashed

Exercise: Grouping and Sorting - point 3

Deep Learning EX3 Tensorflow Shape Error

Notebook path & Example:

notebooks/deep_learning/raw/ex3_programming_tf_and_keras.ipynb
2) Run an Example Model

Failing code

from IPython.display import Image, display
from learntools.deep_learning.decode_predictions import decode_predictions
import numpy as np
from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.preprocessing.image import load_img, img_to_array


image_size = 224

def read_and_prep_images(img_paths, img_height=image_size, img_width=image_size):
    imgs = [load_img(img_path, target_size=(img_height, img_width)) for img_path in img_paths]
    img_array = np.array([img_to_array(img) for img in imgs])
    output = preprocess_input(img_array)
    return(output)


my_model = ResNet50(weights='../input/resnet50/resnet50_weights_tf_dim_ordering_tf_kernels.h5')
test_data = read_and_prep_images(img_paths)
preds = my_model.predict(test_data)

most_likely_labels = decode_predictions(preds, top=3)

Error

ValueError: Shapes (1, 1, 256, 512) and (512, 128, 1, 1) are incompatible

image

Supposed fix:

PR

While following along with the example locally, I got the same error. After a little googling and some trial and error, I finally got it to work with the following import while working with the dog files and weights:

from keras.applications.resnet50 import ResNet50

my_model = ResNet50(weights='./pre-trained/resnet50/resnet50_weights_tf_dim_ordering_tf_kernels.h5')

Please note that I'm just a beginner with Python and TensorFlow, so if there is a better fix, please let me know!

Typo in pandas tutorial 1?

Hey, I was just going through the pandas tutorial and wasn't sure whether this is a typo or if I'm misunderstanding.

"Why the change? Remember that loc can index any stdlib type: strings, for example. If we have a DataFrame with index values `Apples, ..., Potatoes, ...`, and we want to select "all the alphabetical fruit choices between Apples and Potatoes", then it's a lot more convenient to index `df.loc['Apples':'Potatoes']` than it is to index something like `df.loc['Apples', 'Potatoet]` (`t` coming after `s` in the alphabet)."

it's a lot more convenient to index df.loc['Apples':'Potatoes'] than it is to index something like df.loc['Apples', 'Potatoet] (t coming after s in the alphabet)

Should the second code snippet be df.iloc['Apples':'Potatoes']? It was just explained that df.iloc[0:10] gives you indices 0,...,9 but df.loc[0:10] gives you indices 0,...,10; and then I wasn't sure how they got df.loc['Apples', 'Potatoet].
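The difference between the two slicing rules can be emulated without pandas. The sketch below is only an illustration on a sorted list of labels: `positional_slice` mimics the half-open behaviour of df.iloc[start:stop], and `label_slice` mimics the inclusive behaviour of df.loc[first:last]. Both helper names and the sample index are hypothetical.

```python
from bisect import bisect_left, bisect_right


def positional_slice(labels, start, stop):
    """Half-open, like df.iloc[start:stop]: stop is excluded."""
    return labels[start:stop]


def label_slice(sorted_labels, first, last):
    """Inclusive on both ends, like df.loc[first:last] on a sorted index."""
    lo = bisect_left(sorted_labels, first)
    hi = bisect_right(sorted_labels, last)
    return sorted_labels[lo:hi]


index = ['Apples', 'Bananas', 'Oranges', 'Potatoes', 'Radishes']

print(positional_slice(index, 0, 3))            # first three positions
inclusive = label_slice(index, 'Apples', 'Potatoes')
print(inclusive)                                # 'Potatoes' is included

# If label slicing excluded its end point, reaching 'Potatoes' would need an
# artificial end label that sorts just after it, such as 'Potatoet'
half_open = index[bisect_left(index, 'Apples'):bisect_left(index, 'Potatoet')]
print(half_open == inclusive)
```

Read this way, the 'Potatoet' in the tutorial text appears to stand for the awkward end label a half-open rule would require, which is the inconvenience the passage is describing.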

--

Please remove.

"Frequency" used in place of "period" in Seasonality tutorial

In the fourier_features() example algorithm on the Seasonality lesson of the Time Series course, the variable name freq is given to a parameter which takes units of days/cycle. This was confusing to me at first, because unless I'm mistaken, frequency typically refers to measurements given in inverse units (cycles/day), whereas period refers to time per cycle.
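The naming point can be made concrete with a small sketch, assuming time is measured in days. This is not the course's fourier_features() function; `fourier_pair` and its parameters are hypothetical names chosen to keep the two quantities apart.

```python
import math


def fourier_pair(t, period, order=1):
    """Return (sin, cos) features for time t, both measured in days.

    period    -- length of one cycle, in days per cycle
    frequency -- its reciprocal, in cycles per day
    """
    frequency = 1.0 / period
    angle = 2 * math.pi * order * frequency * t
    return math.sin(angle), math.cos(angle)


# A weekly cycle has a period of 7 days, i.e. a frequency of 1/7 cycles/day
sin_q, cos_q = fourier_pair(1.75, period=7)   # a quarter of the way through
print(round(sin_q, 6), round(cos_q, 6))
```

With the argument named `period`, a value such as 7 or 365.25 reads naturally as days per cycle, which is the unit the issue says the parameter takes.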

Trouble running Exercise: Time series as Features

Whenever I try to run the first cell of the Time Series as Features exercise, I get this error:
Collecting git+https://github.com/Kaggle/learntools.git
Cloning https://github.com/Kaggle/learntools.git to /tmp/pip-req-build-65_z7vlm
Running command git clone --filter=blob:none -q https://github.com/Kaggle/learntools.git /tmp/pip-req-build-65_z7vlm
fatal: unable to access 'https://github.com/Kaggle/learntools.git/': Could not resolve host: github.com
WARNING: Discarding git+https://github.com/Kaggle/learntools.git. Command errored out with exit status 128: git clone --filter=blob:none -q https://github.com/Kaggle/learntools.git /tmp/pip-req-build-65_z7vlm Check the logs for full command output.
ERROR: Command errored out with exit status 128: git clone --filter=blob:none -q https://github.com/Kaggle/learntools.git /tmp/pip-req-build-65_z7vlm Check the logs for full command output.

Input file paths are wrong

For Exercise: Machine Learning Competitions, the train data file path should be
iowa_file_path = '../input/train.csv'

The current path is not working:
iowa_file_path = '../input/home-data-for-ml-course/train.csv'

Not making a PR because I'm not sure if the path should be corrected or the files should be placed in that path.
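Until the intended location is settled, a notebook could try both paths and use whichever exists. The sketch below uses only the standard library. The helper name `first_existing` is hypothetical, and the candidate order is an arbitrary choice.

```python
from pathlib import Path


def first_existing(candidates):
    """Return the first candidate that is an existing file, else None."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_file():
            return path
    return None


# The two locations mentioned in this issue
iowa_file_path = first_existing([
    '../input/home-data-for-ml-course/train.csv',
    '../input/train.csv',
])
if iowa_file_path is None:
    print("train.csv was not found in either location")
else:
    print("Using", iowa_file_path)
```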

[Exercise: Categorical Variables] FutureWarning: Feature names only support names that are all strings

In the last code cell

print("MAE from Approach 3 (One-Hot Encoding):") 
print(score_dataset(OH_X_train, OH_X_valid, y_train, y_valid))

This shows

MAE from Approach 3 (One-Hot Encoding):

/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py:1692: FutureWarning: Feature names only support names that are all strings. Got feature names with dtypes: ['int', 'str']. An error will be raised in 1.2.
  FutureWarning,

17525.345719178084

/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py:1692: FutureWarning: Feature names only support names that are all strings. Got feature names with dtypes: ['int', 'str']. An error will be raised in 1.2.
  FutureWarning,

Adding these lines to step 4 fixes the warning:

OH_cols_train.columns = list(map(str, OH_cols_train.columns))
OH_cols_valid.columns = list(map(str, OH_cols_valid.columns))
