Comments (11)
Ok, I just pushed a workaround in the notebooks that use fetch_mldata() (chapters 3, 7, 8 and 13).
Not very pretty, but it works... as long as github.com doesn't go down and amplab/datascience-sp14 does not remove the data.
from six.moves import urllib
from sklearn.datasets import fetch_mldata
try:
    mnist = fetch_mldata('MNIST original')
except urllib.error.HTTPError as ex:
    print("Could not download MNIST data from mldata.org, trying alternative...")
    # Alternative method to load MNIST, if mldata.org is down
    from scipy.io import loadmat
    mnist_alternative_url = "https://github.com/amplab/datascience-sp14/raw/master/lab7/mldata/mnist-original.mat"
    mnist_path = "./mnist-original.mat"
    response = urllib.request.urlopen(mnist_alternative_url)
    with open(mnist_path, "wb") as f:
        content = response.read()
        f.write(content)
    mnist_raw = loadmat(mnist_path)
    mnist = {
        "data": mnist_raw["data"].T,
        "target": mnist_raw["label"][0],
        "COL_NAMES": ["label", "data"],
        "DESCR": "mldata.org dataset: mnist-original",
    }
    print("Success!")
from handson-ml.
Alternatively, you can use TensorFlow:
>>> from tensorflow.examples.tutorials.mnist import input_data
>>> mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
from handson-ml.
Aurelien, yours is a great book, and after reading it all I am now going through all the exercises.
So I got stuck here, like others.
The MNIST dataset is quite big at 50MB, so those like me on mobile data may struggle to download the file each time.
My suggestion would be to download the data just once into a local directory from https://github.com/amplab/datascience-sp14/raw/master/lab7/mldata/mnist-original.mat
and then use a shorter version of your code:
from scipy.io import loadmat
mnist_path = "my/local/path/mnist-original.mat"  # the MNIST file has been previously downloaded here
mnist_raw = loadmat(mnist_path)
mnist = {
    "data": mnist_raw["data"].T,
    "target": mnist_raw["label"][0],
    "COL_NAMES": ["label", "data"],
    "DESCR": "mldata.org dataset: mnist-original",
}
print("Success!")
All the best
G
from handson-ml.
Hi all,
Thanks for the great book and the advice above. It is 2019 and this is still a problem, so I am posting my solution here in case others find it useful.
Aurelien, your solution above has an issue: on a Mac it fails with "TimeoutError: [Errno 60] Operation timed out" because the website is down, and a TimeoutError is not caught by the except clause for urllib.error.HTTPError. I also agree with gig67 that we should not download the dataset every time unless we have to. Therefore, I simply combined the two approaches here:
from six.moves import urllib
from sklearn.datasets import fetch_mldata
# try different solutions from here
# https://github.com/ageron/handson-ml/issues/7
# the try code below would return operation timed out as the web is down
# try to import data
# Alternative method to load MNIST, if mldata.org is down
from scipy.io import loadmat

def data_down():
    mnist_alternative_url = "https://github.com/amplab/datascience-sp14/raw/master/lab7/mldata/mnist-original.mat"
    mnist_path = "./mnist-original.mat"
    response = urllib.request.urlopen(mnist_alternative_url)
    with open(mnist_path, "wb") as f:
        content = response.read()
        f.write(content)
    return mnist_path

# run this if it is the first time or you need to update
# mnist_path = data_down()
# since we already download the dataset, we could directly use them
mnist_path = "./mnist-original.mat"
mnist_raw = loadmat(mnist_path)
mnist = {
    "data": mnist_raw["data"].T,
    "target": mnist_raw["label"][0],
    "COL_NAMES": ["label", "data"],
    "DESCR": "mldata.org dataset: mnist-original",
}
print("Success!")
This way we download the data the first time we need it, keep using the local copy during the analysis, and update it if needed in the future.
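The comment/uncomment step above can be avoided by checking whether the file already exists. A minimal sketch, assuming Python 3's standard urllib.request instead of six; the helper names fetch_once and load_mnist_mat are hypothetical and not part of the notebooks:

```python
import os
import urllib.request


def fetch_once(url, path):
    """Download url to path unless path already exists; return path."""
    if not os.path.exists(path):
        # Write to a temporary name first so an interrupted download
        # does not leave a truncated file that looks like a valid copy.
        tmp_path = path + ".part"
        with urllib.request.urlopen(url) as response, open(tmp_path, "wb") as f:
            f.write(response.read())
        os.replace(tmp_path, path)
    return path


def load_mnist_mat(path):
    """Build the same dict as in the comments above from a local .mat file."""
    from scipy.io import loadmat  # imported lazily: only needed here

    mnist_raw = loadmat(path)
    return {
        "data": mnist_raw["data"].T,
        "target": mnist_raw["label"][0],
        "COL_NAMES": ["label", "data"],
        "DESCR": "mldata.org dataset: mnist-original",
    }
```

Calling load_mnist_mat(fetch_once(mnist_alternative_url, "./mnist-original.mat")) would download on the first run and read the local copy afterwards; deleting the file forces a fresh download.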
from handson-ml.
Hi @TomGauss, thanks for your message. fetch_mldata() is now deprecated and has been replaced with fetch_openml(). See my comments in #301 for more details.
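For readers landing here, a minimal sketch of the fetch_openml() replacement, assuming scikit-learn 0.22 or later (for the as_frame parameter) and the OpenML dataset name "mnist_784". The helper names are hypothetical. OpenML returns the labels as strings, so the sketch casts them to integers:

```python
import numpy as np


def labels_to_uint8(target):
    """Cast string labels such as '5' to small unsigned integers."""
    return np.asarray(target).astype(np.uint8)


def load_mnist_openml():
    """Fetch MNIST from OpenML (network access is needed on the first call)."""
    from sklearn.datasets import fetch_openml  # imported lazily

    # as_frame=False asks for NumPy arrays rather than a pandas DataFrame.
    mnist = fetch_openml("mnist_784", version=1, as_frame=False)
    return mnist["data"], labels_to_uint8(mnist["target"])
```

With the default settings fetch_openml() keeps a local cache, so later calls should not need to download the data again.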
from handson-ml.
Thanks for the heads-up, I'll look into this. In the meantime, you can download MNIST in many places, for example:
http://yann.lecun.com/exdb/mnist/
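The files at that address use the IDX binary format rather than .mat, so they need a small parser. A minimal sketch, assuming the unsigned-byte files distributed there (for example train-images-idx3-ubyte.gz); the function name read_idx is hypothetical:

```python
import gzip
import struct


def read_idx(path):
    """Return (dims, payload) from an IDX file of unsigned bytes.

    Header layout: two zero bytes, a type code (0x08 = unsigned byte),
    the number of dimensions, then one big-endian uint32 per dimension.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        zero, type_code, ndim = struct.unpack(">HBB", f.read(4))
        if zero != 0 or type_code != 0x08:
            raise ValueError("not an unsigned-byte IDX file")
        dims = struct.unpack(">" + "I" * ndim, f.read(4 * ndim))
        payload = f.read()
    # Sanity check: the payload must hold exactly one byte per element.
    expected = 1
    for d in dims:
        expected *= d
    if len(payload) != expected:
        raise ValueError("payload size does not match header")
    return dims, payload
```

With NumPy, np.frombuffer(payload, dtype=np.uint8).reshape(dims) turns the payload into an array.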
from handson-ml.
I'm glad you enjoy it! :)
Apparently, the admins of mldata.org are working on fixing the problem. Hopefully, everything should be back to normal within the next few days.
from handson-ml.
Mmh, in fact after some investigation, it seems that mldata.org was managed by the European Union's PASCAL2 project, which was closed a couple years ago. The people in charge of these servers probably have some other jobs now, so it might take another few days. If this lasts too long, I'll update the book and the Jupyter notebooks.
from handson-ml.
I tried the alternative, but it raises an error:
File "", line 2
mnist_alternative_url = "https://github.com/amplab/datascience-sp14/raw/master/lab7/mldata/mnist-original.mat"
^
IndentationError: unexpected indent
from handson-ml.
Hi @anhduc2203 ,
That's an indentation error. Make sure this line is indented like the rest of the block it belongs to. This tutorial explains more. Perhaps you are using a mix of tabs and spaces. By default, Python considers tabs to be equivalent to 8 spaces, but most editors display them only about 4 spaces wide, so your code may be wrong even though it looks good. That's why I recommend using only spaces, no tabs. Find all the tabs and replace them with spaces. You can generally configure your editor to make it add spaces instead of tabs when you press the Tab key.
Hope this helps,
Aurélien
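To locate the offending lines, a short script can list every line whose indentation contains a tab and produce a spaces-only version. A minimal sketch; the function names and the tab width of 4 are assumptions, so pick the width your editor displayed:

```python
def lines_with_tab_indent(source):
    """Return 1-based numbers of lines whose indentation contains a tab."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            hits.append(number)
    return hits


def tabs_to_spaces(source, width=4):
    """Expand tabs in the indentation only, leaving the rest of each line alone."""
    fixed = []
    for line in source.splitlines(keepends=True):
        stripped = line.lstrip(" \t")
        indent = line[: len(line) - len(stripped)]
        fixed.append(indent.expandtabs(width) + stripped)
    return "".join(fixed)
```

Running lines_with_tab_indent() on the pasted cell shows which lines to fix, and tabs_to_spaces() returns the cleaned text.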
from handson-ml.