Chapter 3 - MNIST data not available about handson-ml HOT 11 CLOSED

Jeronimo-GL commented on May 3, 2024 2

Chapter 3 - MNIST data not available

from handson-ml.

Comments (11)

ageron commented on May 3, 2024 39

Ok, I just pushed a workaround in the notebooks that use fetch_mldata() (chapters 3, 7, 8 and 13).

Not very pretty, but it works... as long as github.com doesn't go down and amplab/datascience-sp14 does not remove the data.

from six.moves import urllib
from sklearn.datasets import fetch_mldata
try:
    mnist = fetch_mldata('MNIST original')
except urllib.error.HTTPError as ex:
    print("Could not download MNIST data from mldata.org, trying alternative...")

    # Alternative method to load MNIST, if mldata.org is down
    from scipy.io import loadmat
    mnist_alternative_url = "https://github.com/amplab/datascience-sp14/raw/master/lab7/mldata/mnist-original.mat"
    mnist_path = "./mnist-original.mat"
    response = urllib.request.urlopen(mnist_alternative_url)
    with open(mnist_path, "wb") as f:
        content = response.read()
        f.write(content)
    mnist_raw = loadmat(mnist_path)
    mnist = {
        "data": mnist_raw["data"].T,
        "target": mnist_raw["label"][0],
        "COL_NAMES": ["label", "data"],
        "DESCR": "mldata.org dataset: mnist-original",
    }
    print("Success!")

ageron commented on May 3, 2024 5

Alternatively, you can use TensorFlow:

>>> from tensorflow.examples.tutorials.mnist import input_data
>>> mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

gig67 commented on May 3, 2024 3

Aurelien, yours is a great book and after reading it all I am now going through all the exercises.
So I got stuck here, as others.
The MNIST dataset is quite big at 50MB, so those like me on mobile data may struggle to download the file each time.
My suggestion would be to download the data just once into a local directory from https://github.com/amplab/datascience-sp14/raw/master/lab7/mldata/mnist-original.mat
and then use a shorter version of your code:

from scipy.io import loadmat
mnist_path = "my/local/path/mnist-original.mat"  # the MNIST file has been previously downloaded here
mnist_raw = loadmat(mnist_path)
mnist = {
    "data": mnist_raw["data"].T,
    "target": mnist_raw["label"][0],
    "COL_NAMES": ["label", "data"],
    "DESCR": "mldata.org dataset: mnist-original",
}
print("Success!")

All the best
G

Jeronimo-GL commented on May 3, 2024 2

Thanks! Great book, I forgot to mention. (Sent by email in reply to Aurélien Geron's message of 2017-03-27 18:33 GMT+02:00 suggesting TensorFlow.)

TomMeowMeow commented on May 3, 2024 2

Hi all,

Thanks for the great book and the advice above. This is 2019 and this is still a problem. I would like to put my solution here so others might use it.

Aurelien, your solution above has an issue: on a Mac it fails with "TimeoutError: [Errno 60] Operation timed out" because the website is down, and since that error is not an HTTPError, the fallback never runs. I also agree with gig67 that we should not download the dataset every time unless we have to. Therefore, I simply combined the two approaches here:

from six.moves import urllib
from scipy.io import loadmat

# Combines the solutions from https://github.com/ageron/handson-ml/issues/7
# fetch_mldata() is not called here: the request to mldata.org times out because the site is down.

def data_down():
    # Alternative method to download MNIST, if mldata.org is down
    mnist_alternative_url = "https://github.com/amplab/datascience-sp14/raw/master/lab7/mldata/mnist-original.mat"
    mnist_path = "./mnist-original.mat"
    response = urllib.request.urlopen(mnist_alternative_url)
    with open(mnist_path, "wb") as f:
        content = response.read()
        f.write(content)
    return mnist_path

# Run this the first time, or whenever you need to refresh the file
# mnist_path = data_down()

# The dataset has already been downloaded, so load it directly
mnist_path = "./mnist-original.mat"
mnist_raw = loadmat(mnist_path)
mnist = {
    "data": mnist_raw["data"].T,
    "target": mnist_raw["label"][0],
    "COL_NAMES": ["label", "data"],
    "DESCR": "mldata.org dataset: mnist-original",
}
print("Success!")

This way we download the data the first time we need it, keep using the local copy during the analysis, and refresh it later if needed.
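The download-once idea can also be automated so that nothing needs to be commented in or out. The following is only a sketch under a few assumptions: it targets Python 3 (plain `urllib.request` instead of `six`), the helper name `ensure_downloaded` and the `timeout` default are made up for illustration, and it catches `OSError` because `HTTPError`, `URLError` and `TimeoutError` are all subclasses of it in Python 3.

```python
import os
import tempfile
from urllib.request import urlopen

MNIST_URL = "https://github.com/amplab/datascience-sp14/raw/master/lab7/mldata/mnist-original.mat"


def ensure_downloaded(url, path, timeout=30):
    """Download `url` to `path` unless `path` already exists.

    Returns True if a download happened, False if the local copy was reused.
    (Hypothetical helper, not part of the book's notebooks.)
    """
    if os.path.exists(path):
        return False

    # Write to a temporary file first so that an interrupted download
    # never leaves a truncated file at `path`.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f, urlopen(url, timeout=timeout) as response:
            f.write(response.read())
        os.replace(tmp_path, path)
    except OSError:
        # Covers HTTP errors, unreachable hosts and timeouts alike.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return True


# Usage (downloads the file the first time, then reuses the local copy):
# ensure_downloaded(MNIST_URL, "./mnist-original.mat")
```

After the call, the file can be loaded with `loadmat` exactly as in the code above.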

ageron commented on May 3, 2024 1

Hi @TomGauss , thanks for your message. fetch_mldata() is now deprecated and it has been replaced with fetch_openml(). See my comments in #301 for more details.
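For readers who cannot follow the reference to #301, here is a rough sketch of what loading MNIST with fetch_openml() might look like. It is not taken from that issue. It assumes scikit-learn 0.22 or later (for the `as_frame` parameter) and that the dataset is published on OpenML under the name "mnist_784"; the wrapper name `load_mnist_openml` is hypothetical.

```python
def load_mnist_openml(data_home=None):
    """Return (X, y) for MNIST using fetch_openml().

    The first call downloads the data; later calls reuse scikit-learn's
    local cache (under `data_home` if given).
    """
    # Imported lazily so that defining this function does not itself
    # require NumPy or scikit-learn to be installed.
    import numpy as np
    from sklearn.datasets import fetch_openml

    # as_frame=False asks for NumPy arrays rather than a pandas DataFrame.
    mnist = fetch_openml("mnist_784", version=1, data_home=data_home, as_frame=False)
    X = mnist["data"]
    # OpenML delivers the labels as strings, so convert them to integers.
    y = mnist["target"].astype(np.uint8)
    return X, y


# Usage (needs network access on the first call):
# X, y = load_mnist_openml()
```

Note that, unlike the dictionaries built by hand earlier in this thread, the rows returned here are not guaranteed to be in the same order as the old mldata.org file.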

ageron commented on May 3, 2024

Thanks for the heads-up, I'll look into this. In the meantime, you can download MNIST from many places, for example:
http://yann.lecun.com/exdb/mnist/

ageron commented on May 3, 2024

I'm glad you enjoy it! :)

Apparently, the admins of mldata.org are working on fixing the problem. Hopefully, everything should be back to normal within the next few days.

ageron commented on May 3, 2024

Mmh, in fact after some investigation, it seems that mldata.org was managed by the European Union's PASCAL2 project, which was closed a couple years ago. The people in charge of these servers probably have some other jobs now, so it might take another few days. If this lasts too long, I'll update the book and the Jupyter notebooks.

anhduc2203 commented on May 3, 2024

I tried the alternative, but it raised an error:
File "", line 2
mnist_alternative_url = "https://github.com/amplab/datascience-sp14/raw/master/lab7/mldata/mnist-original.mat"
^
IndentationError: unexpected indent

ageron commented on May 3, 2024

Hi @anhduc2203 ,
That's an indentation error. Make sure this line is indented like the rest of the block it belongs to. This tutorial explains more. Perhaps you are using a mix of tabs and spaces. By default, Python considers tabs to be equivalent to 8 spaces, but most editors display them only about 4 spaces wide, so your code may be wrong even though it looks good. That's why I recommend using only spaces, no tabs. Find all the tabs and replace them with spaces. You can generally configure your editor to make it insert spaces instead of tabs when you press the Tab key.
Hope this helps,
Aurélien
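If the editor makes tabs hard to spot, a few lines of Python can locate them. This is only an illustrative sketch: the function names are made up, and the default `tabsize=4` assumes the editor displayed tabs four columns wide, which may not match your setup.

```python
def find_tab_indented_lines(source):
    """Return the 1-based numbers of lines whose indentation contains a tab."""
    bad = []
    for number, line in enumerate(source.splitlines(), start=1):
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            bad.append(number)
    return bad


def expand_leading_tabs(source, tabsize=4):
    """Replace tabs in each line's indentation with spaces.

    Tabs that appear after the first non-blank character are left alone.
    """
    fixed = []
    for line in source.splitlines(keepends=True):
        body = line.lstrip(" \t")
        indent = line[: len(line) - len(body)]
        fixed.append(indent.expandtabs(tabsize) + body)
    return "".join(fixed)


# Usage: paste the offending cell into a string and inspect it.
sample = "try:\n    x = 1\n\ty = 2\n"
print(find_tab_indented_lines(sample))
print(expand_leading_tabs(sample))
```

Check the converted code by eye before running it, since the right `tabsize` depends on how the original was typed.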
