sorki / python-mnist
Simple MNIST data parser written in Python
License: Other
I am trying to run the program shown in Getting Started, but I get this error:
`Traceback (most recent call last):
File "C:/Users/Stanislav/Desktop/Master/MLKR/Test.py", line 3, in
mndata.load_training()
File "C:\Anaconda2\lib\site-packages\mnist\loader.py", line 33, in load_training
os.path.join(self.path, self.train_lbl_fname))
File "C:\Anaconda2\lib\site-packages\mnist\loader.py", line 42, in load
with open(path_lbl, 'rb') as file:
IOError: [Errno 13] Permission denied: 'data/train-labels-idx1-ubyte'`
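A small check like the one below can narrow down an error of this kind. It is only a sketch: the helper name `diagnose_data_file` is hypothetical and not part of python-mnist, and the report does not say what actually caused the error. One thing worth ruling out is that the path is a directory rather than a file, because on Windows calling `open()` on a directory also fails with Errno 13.

```python
import os


def diagnose_data_file(path):
    """Return a short description of why `path` may fail to open.

    Hypothetical helper for illustration; not part of python-mnist.
    """
    if not os.path.exists(path):
        return "missing"
    if os.path.isdir(path):
        # On Windows, open() on a directory raises Errno 13 (Permission denied).
        return "is a directory, not a file"
    if not os.access(path, os.R_OK):
        return "not readable by the current user"
    return "ok"


if __name__ == "__main__":
    # Path taken from the traceback above.
    print(diagnose_data_file("data/train-labels-idx1-ubyte"))
```

If the result is "is a directory, not a file", the archive was probably extracted into a folder with the expected file name, and the actual data file sits inside it.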
Would it make sense to return NumPy arrays? This would be very convenient for me and would speed up the processing modes I added with the pull request. The drawback would be an additional library dependency. Then again, I guess that nowadays every Python developer working with MNIST has NumPy installed. NumPy could also be imported only on demand (inside an if statement) to avoid making the dependency mandatory. Thoughts?
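The on-demand import idea could look roughly like this. It is a sketch under assumptions: the function name `convert_images` and the `result_type` values are made up for illustration and are not taken from the library's code.

```python
def convert_images(images, result_type="list"):
    """Return images as plain lists, or as a NumPy array on request.

    Hypothetical helper: NumPy is imported only when it is actually needed,
    so users who stay with lists never need it installed.
    """
    if result_type == "list":
        return images
    if result_type == "numpy":
        import numpy as np  # deferred import keeps the dependency optional
        return np.array(images, dtype=np.uint8)
    raise ValueError("unknown result_type: %r" % (result_type,))


if __name__ == "__main__":
    print(convert_images([[0, 255], [128, 64]]))
```

With this layout, a missing NumPy installation only produces an ImportError when the caller explicitly asks for arrays.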
Using get_data.sh to wget and then gunzip the files delivers the required names, e.g. t10k-images-idx3-ubyte. However, going to http://yann.lecun.com/exdb/mnist/, downloading the .gz files, and then extracting them with, say, 7-Zip gives t10k-images.idx3-ubyte etc., which naturally throws a file-not-found exception. The workaround is of course to rename the files to t10k-images-idx3-ubyte, etc. It would be useful to add this warning to the README. Thanks,
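The rename workaround can be scripted. This is a minimal sketch that assumes the only difference is the `.idx` versus `-idx` part of the file name, as described above; the helper name `normalize_mnist_names` is hypothetical.

```python
from pathlib import Path


def normalize_mnist_names(data_dir):
    """Rename e.g. 't10k-images.idx3-ubyte' to 't10k-images-idx3-ubyte'.

    Returns the list of new file names. Files that already have the
    expected name, or whose target name is taken, are left untouched.
    """
    renamed = []
    for path in sorted(Path(data_dir).iterdir()):
        if path.is_file() and ".idx" in path.name and path.name.endswith("-ubyte"):
            target = path.with_name(path.name.replace(".idx", "-idx"))
            if not target.exists():
                path.rename(target)
                renamed.append(target.name)
    return renamed


if __name__ == "__main__":
    import sys

    # Usage: python rename_mnist.py <data directory>
    if len(sys.argv) > 1:
        print(normalize_mnist_names(sys.argv[1]))
```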
There's https://pypi.python.org/pypi/mnist/ and https://pypi.python.org/pypi/python-mnist/.
This seems like a bit of duplicated effort. How about merging the projects?
This would make the executables available once the package is installed. The get_data.sh
script could be moved there too, although I suspect it should get a better name, as get_data.sh
is too general.
Hi,
The GPL is quite restrictive. My own libraries are BSD2, which is not compatible. Could you please consider relicensing as BSD2?
Hi there, great package, and thanks for your contribution.
Just one tiny suggestion:
In the last lines of the README, it might be better to suggest
images, labels = mndata.load_training() # list of images and labels
instead of
mndata.load_training() # the first time I ran it, my screen went crazy with numbers jumping around.
Also, add a comment noting the data type of each one.
Again, thank you!
Please include this package in the Conda repo. I'm using it with PyTorch, and PyTorch requires Conda when built from source.
Hey,
I've tried this library with the EMNIST dataset. It works quite nicely, with one issue: the loader for the testing data returns 1 for every class. I know that this implementation was made only for MNIST, but it could still be a really nice improvement for this library. Please consider this and let me know, maybe I can help with the implementation ;)
Good job anyway!
If I load the full data with a result_type of 'numpy', it takes over 4 seconds. I can get it down to around half a second with the following approach, which I'm not going to turn into a pull request because it would require refactoring the surrounding class. I just thought it'd be nice to share what I found.
In mnist/loader.py, in the load method, instead of returning images, return np.array(image_data).reshape((size, rows*cols)) (and don't create the images list). In other words, instead of creating thousands of small NumPy arrays, create just one array with one more dimension, directly from the byte array read from the file.
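A self-contained sketch of that idea, using a tiny made-up buffer instead of the real file. The names `image_data`, `size`, `rows` and `cols` follow the description above; the wrapper function `images_as_single_array` is hypothetical and is not the library's code.

```python
import array

import numpy as np


def images_as_single_array(image_data, size, rows, cols):
    """Build one (size, rows*cols) array directly from the flat pixel buffer.

    `image_data` is assumed to be a flat array.array('B') of pixel bytes,
    with the images stored one after another.
    """
    return np.array(image_data).reshape((size, rows * cols))


if __name__ == "__main__":
    # Three fake 2x2 "images" with pixel values 0..11 (illustrative data only).
    fake_pixels = array.array("B", range(12))
    images = images_as_single_array(fake_pixels, size=3, rows=2, cols=2)
    print(images.shape)
    print(images[1])
```

Each row of the result is one flattened image, so `images[i]` takes the place of the i-th entry of the former list.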
from mnist import MNIST
When I execute this line, it shows:
cannot import name 'MNIST'
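One way to investigate is to check which `mnist` module Python actually picks up. This is a sketch only, and the helper name `describe_import` is hypothetical. The report does not state the cause; possible causes (assumptions) include a different package that also installs a module named `mnist`, or a local file called mnist.py shadowing the installed one.

```python
import importlib
import importlib.util


def describe_import(module_name, attr_name):
    """Report where a module is loaded from and whether it has an attribute."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return {"found": False, "origin": None, "has_attr": False}
    module = importlib.import_module(module_name)
    return {
        "found": True,
        "origin": spec.origin,  # file path the module is loaded from
        "has_attr": hasattr(module, attr_name),
    }


if __name__ == "__main__":
    print(describe_import("mnist", "MNIST"))
```

If `origin` points at a file in the current project rather than in site-packages, renaming that file is a likely fix.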