
Comments (11)

ageron avatar ageron commented on May 3, 2024 39

Ok, I just pushed a workaround in the notebooks that use fetch_mldata() (chapters 3, 7, 8 and 13).

Not very pretty, but it works... as long as github.com doesn't go down and amplab/datascience-sp14 does not remove the data.

from six.moves import urllib
from sklearn.datasets import fetch_mldata
try:
    mnist = fetch_mldata('MNIST original')
except urllib.error.HTTPError as ex:
    print("Could not download MNIST data from mldata.org, trying alternative...")

    # Alternative method to load MNIST, if mldata.org is down
    from scipy.io import loadmat
    mnist_alternative_url = "https://github.com/amplab/datascience-sp14/raw/master/lab7/mldata/mnist-original.mat"
    mnist_path = "./mnist-original.mat"
    response = urllib.request.urlopen(mnist_alternative_url)
    with open(mnist_path, "wb") as f:
        content = response.read()
        f.write(content)
    mnist_raw = loadmat(mnist_path)
    mnist = {
        "data": mnist_raw["data"].T,
        "target": mnist_raw["label"][0],
        "COL_NAMES": ["label", "data"],
        "DESCR": "mldata.org dataset: mnist-original",
    }
    print("Success!")

from handson-ml.

ageron avatar ageron commented on May 3, 2024 5

Alternatively, you can use TensorFlow:

>>> from tensorflow.examples.tutorials.mnist import input_data
>>> mnist = input_data.read_data_sets('MNIST_data', one_hot=True)


gig67 avatar gig67 commented on May 3, 2024 3

Aurelien, yours is a great book and after reading it all I am now going through all the exercises.
So I got stuck here, as others.
The MNIST dataset is quite big at 50MB, so those like me on mobile data may struggle to download the file every time.
My suggestion would be to download the data just once into a local directory from https://github.com/amplab/datascience-sp14/raw/master/lab7/mldata/mnist-original.mat
and then use a shorter version of your code:

from scipy.io import loadmat
mnist_path = "my/local/path/mnist-original.mat"  # the MNIST file was previously downloaded here
mnist_raw = loadmat(mnist_path)
mnist = {
    "data": mnist_raw["data"].T,
    "target": mnist_raw["label"][0],
    "COL_NAMES": ["label", "data"],
    "DESCR": "mldata.org dataset: mnist-original",
}
print("Success!")

All the best
G


Jeronimo-GL avatar Jeronimo-GL commented on May 3, 2024 2


TomMeowMeow avatar TomMeowMeow commented on May 3, 2024 2

Hi all,

Thanks for the great book and the advice above. This is 2019 and this is still a problem. I would like to put my solution here so others might use it.

Aurelien, your solution above has an issue: on a Mac it fails with "TimeoutError: [Errno 60] Operation timed out" because the website is down, and that error is not caught by the except clause. I also agree with gig67 that we do not need to download the dataset every time unless we have to. Therefore, I simply combined the two approaches here:

from six.moves import urllib
from scipy.io import loadmat

# Based on the solutions in https://github.com/ageron/handson-ml/issues/7
# The fetch_mldata() attempt is skipped here: it times out because mldata.org is down.

def data_down():
    """Download the MNIST .mat file from the alternative URL and return its local path."""
    mnist_alternative_url = "https://github.com/amplab/datascience-sp14/raw/master/lab7/mldata/mnist-original.mat"
    mnist_path = "./mnist-original.mat"
    response = urllib.request.urlopen(mnist_alternative_url)
    with open(mnist_path, "wb") as f:
        content = response.read()
        f.write(content)
    return mnist_path

# Run this the first time, or whenever you need to refresh the file:
# mnist_path = data_down()

# The dataset has already been downloaded, so load it directly.
mnist_path = "./mnist-original.mat"
mnist_raw = loadmat(mnist_path)
mnist = {
    "data": mnist_raw["data"].T,
    "target": mnist_raw["label"][0],
    "COL_NAMES": ["label", "data"],
    "DESCR": "mldata.org dataset: mnist-original",
}
print("Success!")

This way we download the data the first time we need it, keep using the local copy during the analysis, and refresh it later only if needed.
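The download-once idea can also be written so that the check happens automatically. This is only a minimal sketch using the standard library: the helper name download_once, the ".part" temporary suffix and the 30-second timeout are my own assumptions, not something from the notebooks.

```python
import os
import shutil
import urllib.request

# URL quoted earlier in this thread; it only works as long as that repository keeps the file.
MNIST_URL = "https://github.com/amplab/datascience-sp14/raw/master/lab7/mldata/mnist-original.mat"


def download_once(url, path, timeout=30):
    """Download url to path unless path already exists, then return path.

    Hypothetical helper: the name, the timeout and the ".part" suffix are
    illustrative choices.
    """
    if os.path.exists(path):
        return path  # a local copy is already there, so skip the network entirely

    # Write to a temporary name first so an interrupted download
    # never leaves a truncated file under the final name.
    tmp_path = path + ".part"
    with urllib.request.urlopen(url, timeout=timeout) as response, open(tmp_path, "wb") as f:
        shutil.copyfileobj(response, f)
    os.replace(tmp_path, path)
    return path
```

Calling download_once(MNIST_URL, "./mnist-original.mat") returns the local path, which can then be passed to loadmat() exactly as in the code above. To force a refresh, delete the local file and call it again.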


ageron avatar ageron commented on May 3, 2024 1

Hi @TomGauss , thanks for your message. fetch_mldata() is now deprecated and it has been replaced with fetch_openml(). See my comments in #301 for more details.
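For reference, a rough sketch of the fetch_openml() route. It assumes scikit-learn 0.22 or later (for the as_frame parameter), network access, and the OpenML dataset name "mnist_784"; none of these are spelled out in this thread. The function names load_mnist_via_openml and labels_to_ints are hypothetical. As far as I know, OpenML returns the labels as strings, which is why they are converted.

```python
def labels_to_ints(labels):
    """Convert string labels such as "5" to plain integers."""
    return [int(label) for label in labels]


def load_mnist_via_openml():
    """Fetch MNIST with fetch_openml() and return (data, integer labels).

    Assumes scikit-learn >= 0.22 and a working network connection.
    """
    # Imported lazily so that labels_to_ints() stays usable without scikit-learn.
    from sklearn.datasets import fetch_openml

    # as_frame=False asks for NumPy arrays rather than a pandas DataFrame.
    mnist = fetch_openml("mnist_784", version=1, as_frame=False)
    return mnist["data"], labels_to_ints(mnist["target"])
```

Usage would be X, y = load_mnist_via_openml(). Note that the sample order may differ from what fetch_mldata() returned, so results that depend on a fixed train/test split may not match the book exactly.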


ageron avatar ageron commented on May 3, 2024

Thanks for the heads-up, I'll look into this. In the meantime, you can download MNIST from many places, for example:
http://yann.lecun.com/exdb/mnist/


ageron avatar ageron commented on May 3, 2024

I'm glad you enjoy it! :)

Apparently, the admins of mldata.org are working on fixing the problem. Hopefully, everything should be back to normal within the next few days.


ageron avatar ageron commented on May 3, 2024

Mmh, in fact after some investigation, it seems that mldata.org was managed by the European Union's PASCAL2 project, which was closed a couple years ago. The people in charge of these servers probably have some other jobs now, so it might take another few days. If this lasts too long, I'll update the book and the Jupyter notebooks.


anhduc2203 avatar anhduc2203 commented on May 3, 2024

I tried the alternative, but it raises an error:
File "", line 2
mnist_alternative_url = "https://github.com/amplab/datascience-sp14/raw/master/lab7/mldata/mnist-original.mat"
^
IndentationError: unexpected indent


ageron avatar ageron commented on May 3, 2024

Hi @anhduc2203 ,
That's an indentation error. Make sure this line is indented like the rest of the block it belongs to. This tutorial explains more. Perhaps you are using a mix of tabs and spaces. By default, Python considers tabs to be equivalent to 8 spaces, but most editors display them only about 4 spaces wide, so your code may be wrong even though it looks good. That's why I recommend using only spaces, no tabs. Find all the tabs and replace them with spaces. You can generally configure your editor to insert spaces instead of tabs when you press the Tab key.
Hope this helps,
Aurélien
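P.S. If the tabs are hard to spot in the editor, a few lines of Python can find them. This is just a minimal sketch; the function name find_tab_indented_lines is made up for illustration.

```python
def find_tab_indented_lines(source):
    """Return the 1-based numbers of lines whose leading whitespace contains a tab."""
    flagged = []
    for number, line in enumerate(source.splitlines(), start=1):
        # Keep only the indentation: everything before the first non-blank character.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            flagged.append(number)
    return flagged


# Example: the third line is indented with a tab, the second with four spaces.
sample = "try:\n    x = 1\n\ty = 2\n"
print(find_tab_indented_lines(sample))  # prints [3]
```

Tabs inside string literals are ignored, since only the indentation is inspected. Python also ships the tabnanny module (python -m tabnanny yourfile.py), which reports ambiguous indentation.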

