You can reach me also here:
ORC-iD | 0000-0002-9050-9040 |
ResearchGate | Simon Kern |
@skjerns | |
Website | skjerns.de |
Load MATLAB 7.3 .mat files. I.e. load hdf5 into Python datatypes.
License: GNU General Public License v3.0
Looking for input as to why the library strips null characters from the original stored character arrays. We are storing serialized encoded data into character array fields. We have found the offending line in the code which is manipulating the stored data:
https://github.com/skjerns/mat7.3/blob/master/mat73/core.py#L253
Is there a reason as to why the original data is being altered? Null characters being replaced with empty characters? If we were to remove this line, what impacts do you foresee?
Thank You.
Add functionality to load only specific variables from the .mat file.
Firstly, thanks for making this package it is very handy.
I often work with data that has been stored as a MATLAB structure array. Currently, mat73 successfully loads this data, which is great. However, it is loaded as a dict of lists, requiring syntax such as variable = data["key"][element], whereas the MATLAB syntax would have been variable = data(element).key;. The advantage of the latter is that I can easily isolate all variables of a specific element, e.g. data_subset = data(element); variable1 = data_subset.key1; variable2 = data_subset.key2; etc. In Python, by contrast, this requires constructing a new dict with a for loop over every key.
Similar functionality to matlab would be achieved by returning list of dict. Do you think it would be easy to modify mat73 to provide this as an option?
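The conversion described above can be sketched in a few lines of plain Python. This is only an illustration of the requested behaviour, not mat73's implementation: the helper name dict_of_lists_to_list_of_dicts and the sample data are hypothetical, and the sketch assumes every field holds a list of the same length.

```python
def dict_of_lists_to_list_of_dicts(data):
    """Turn {'k1': [a, b], 'k2': [c, d]} into [{'k1': a, 'k2': c}, {'k1': b, 'k2': d}]."""
    keys = list(data)
    # zip(*lists) walks all field lists in parallel, one struct element at a time.
    # Note: zip stops at the shortest list, so unequal lengths are silently truncated.
    return [dict(zip(keys, row)) for row in zip(*(data[k] for k in keys))]


# Hypothetical data in the dict-of-lists shape described in this issue
data = {"key1": [1, 2, 3], "key2": ["a", "b", "c"]}
elements = dict_of_lists_to_list_of_dicts(data)

# Equivalent of MATLAB's data_subset = data(element)
data_subset = elements[1]
print(data_subset["key1"], data_subset["key2"])
```

With this shape, all fields of one element are available from a single lookup, at the cost of building one small dict per element.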
When I attempt to load a fairly vanilla .mat, I get a TypeError. By generating some simple test cases with similar datatypes, it appears to be hanging up when I provide an empty MATLAB cell array.
data_dict = mat73.loadmat('test_dataset_nodatetime.mat', use_attrdict=True)
TypeError Traceback (most recent call last)
<ipython-input-16-97a7782477dc> in <module>
----> 1 data_dict = mat73.loadmat('test_dataset_nodatetime.mat', use_attrdict=True)
C:\<path>\lib\site-packages\mat73\__init__.py in loadmat(filename, use_attrdict, verbose)
207 try:
208 with h5py.File(filename, 'r') as hdf5:
--> 209 dictionary = decoder.mat2dict(hdf5)
210 return dictionary
211 except OSError:
C:\<path>\lib\site-packages\mat73\__init__.py in mat2dict(self, hdf5)
48 ext = os.path.splitext(hdf5.filename)[1].lower()
49 if ext.lower()=='.mat':
---> 50 d[var] = self.unpack_mat(hdf5[var])
51 elif ext=='.h5' or ext=='.hdf5':
52 err = 'Can only load .mat. Please use package hdfdict instead'\
C:\<path>\lib\site-packages\mat73\__init__.py in unpack_mat(self, hdf5, depth)
73 matlab_class = hdf5[key].attrs.get('MATLAB_class')
74 elem = hdf5[key]
---> 75 unpacked = self.unpack_mat(elem, depth=depth+1)
76 if matlab_class==b'struct' and len(elem)>1:
77
C:\<path>\lib\site-packages\mat73\__init__.py in unpack_mat(self, hdf5, depth)
104 return d
105 elif isinstance(hdf5, h5py._hl.dataset.Dataset):
--> 106 return self.convert_mat(hdf5)
107 else:
108 raise Exception(f'Unknown hdf5 type: {key}:{type(hdf5)}')
C:\<path>\lib\site-packages\mat73\__init__.py in convert_mat(self, dataset)
140 for ref in dataset:
141 row = []
--> 142 for r in ref:
143 entry = self.unpack_mat(self.refs.get(r))
144 row.append(entry)
TypeError: 'numpy.uint64' object is not iterable
The behavior is replicated with or without the use_attrdict flag.
I can provide the file if that's helpful.
Great package, solved a lot of problems for me. Since MATLAB version 7.3 has been around for a while now, did you consider implementing your changes upstream in scipy directly?
Hi! Thank you for this useful function.
I'm using it with great appreciation.
Do you have any plans to create a savemat function?
Thanks.
This is a very valuable package, as it reads my multi-GB Matlab 7.3 files and converts all of my data types quickly. In my case I need to append the data and save back to .mat 7.3 format as well, and have resorted to using the hdf5storage library, which does not load the files as cleanly as mat7.3. I would like to request adding a savemat() function, as the author says that would be easy to implement. Thanks for creating this awesome tool!
Thank you for your great work.
But I ran into a problem when loading the mat73 library, as follows:
'''
only_include = [s if s[0]=='/' else f'/{s}' for s in only_include]
^
SyntaxError: invalid syntax
'''
I didn't change anything. Do you know why this happens?
PS: I run it on Windows with Anaconda and python==3.6.
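The rejected line uses an f-string, which is only valid syntax from Python 3.6 onward. One possible explanation (an assumption, not something confirmed in this report) is that the interpreter actually running the code is older than the one expected, for example because a different environment is active. A minimal check, where supports_fstrings is a hypothetical helper written for this illustration:

```python
import sys


def supports_fstrings(version_info=sys.version_info):
    """Return True if this Python version can parse f-strings (added in 3.6)."""
    return tuple(version_info[:2]) >= (3, 6)


# Show which interpreter is really executing the code, and its version
print(sys.executable)
print(sys.version)
print("f-strings supported:", supports_fstrings())
```

If this reports an older interpreter than expected, the SyntaxError comes from the environment rather than from the package.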
Hey, thanks for the library! It would be great if we could also save mats.
Hi
Thank you for your package. However, when loading a v7.3 mat file I got the following error:
<HDF5 dataset "Hf": shape (625, 1), type "|O"> is not a matlab type
If you could tell me what needs to be implemented, I am happy to open a PR.
Hi,
Thanks for your package, it really helps me load complex MATLAB structures in v7.3.
Could you add a boolean argument to control whether array variables are squeezed or not?
I have already managed to do it with your code, but I think this functionality could be of interest to the community.
If necessary, I can do a pull request.
Matfile can do that in MATLAB. I'm trying to open a large .mat file with Python; is it possible to implement such a feature?
Hi, thanks for this very useful library!
Getting the following message while loading a mat file:
ERROR:root:ERROR: MATLAB type not supported: missing, (uint32)
I'm assuming this error is minor since I expect any such "missing" fields to translate to "None" values.
Is this assumption valid or completely off?
I get an error similar to #43, except that it reads ERROR:root:ERROR: not a MATLAB datatype (see the longer excerpt below). The same files load without any error in pymatreader, so they have figured out how to read them. These types are therefore likely either newer, non-proprietary MATLAB datatypes or datatypes added by the EIDORS Electrical Impedance Tomography library.
The exact error is (repeats cut):
ERROR:root:ERROR: not a MATLAB datatype: <HDF5 dataset "data": shape (26,), type "<f8">, (float64)
ERROR:root:ERROR: not a MATLAB datatype: <HDF5 dataset "ir": shape (26,), type "<u8">, (uint64)
ERROR:root:ERROR: not a MATLAB datatype: <HDF5 dataset "jc": shape (17,), type "<u8">, (uint64)
/* cut as repeats of the above */
ERROR:root:ERROR: not a MATLAB datatype: <HDF5 dataset "data": shape (2,), type "<f8">, (float64)
ERROR:root:ERROR: not a MATLAB datatype: <HDF5 dataset "ir": shape (2,), type "<u8">, (uint64)
ERROR:root:ERROR: not a MATLAB datatype: <HDF5 dataset "jc": shape (2,), type "<u8">, (uint64)
/* cut as repeats of the above */
The example files are in check_diff.zip.
If I had some spare time, which I do not, I would fork and submit a pull request fixing the currently unknown data types. For now I have to fall back to pymatreader, which is the second-best alternative for me, as I need neither support for mat < 7.3 nor scipy.io.loadmat backwards compatibility.
Tested version: latest on PyPI (0.62). master not tested.
I've found some packages that handle .mat files. However, none of them can process the MATLAB string datatype. Because of the large data volume, I can't convert the data in MATLAB. So I think your tool would be even more valuable if it supported more datatypes.
I keep getting this error whenever I try to load a mat file:
core.py", line 321, in loadmat
with h5py.File(filename, 'r') as hdf5:
AttributeError: module 'h5py' has no attribute 'File'
import mat73
data_dict = mat73.loadmat(DATA_SET_PATH)
I get this error:
data type not supported: dataset, uint32
Hi Simon,
Great package, saved me loads of time while working with colleagues who love MATLAB.
I saw that you mentioned converting MATLAB datenums and were looking for a method of converting them. If that is still the case, I often use this function to convert MATLAB datenums to datetimes.
import datetime as dt

def matlab2datetime(matlab_datenum):
    # Whole days: the integer part of the datenum, read as a proleptic ordinal
    day = dt.datetime.fromordinal(int(matlab_datenum))
    # Fraction of a day, minus the 366-day offset between MATLAB's and Python's day zero
    dayfrac = dt.timedelta(days=matlab_datenum % 1) - dt.timedelta(days=366)
    return day + dayfrac
Here's an example of how it is used (pd is pandas):
times = [matlab2datetime(tval) for tval in data_dict['time']]
index = pd.to_datetime(times).round('S')  # round to seconds; the times sometimes carry stray decimals
Hi! How's it going?
I attempted to load the .mat file available here as shown:
mat73.loadmat('atlas_index.mat', use_attrdict=True)
(The behavior is replicated with or without the use_attrdict flag.)
but it failed with the error:
ERROR:root:ERROR: MATLAB type not supported: struct, (uint64)
Traceback (most recent call last):
File "/home/afonso/anaconda3/envs/mgv/lib/python3.7/site-packages/mat73/__init__.py", line 226, in loadmat
dictionary = decoder.mat2dict(hdf5, only_load=only_load)
File "/home/afonso/anaconda3/envs/mgv/lib/python3.7/site-packages/mat73/__init__.py", line 52, in mat2dict
d[var] = self.unpack_mat(hdf5[var])
File "/home/afonso/anaconda3/envs/mgv/lib/python3.7/site-packages/mat73/__init__.py", line 77, in unpack_mat
unpacked = self.unpack_mat(elem, depth=depth+1)
File "/home/afonso/anaconda3/envs/mgv/lib/python3.7/site-packages/mat73/__init__.py", line 108, in unpack_mat
return self.convert_mat(hdf5)
File "/home/afonso/anaconda3/envs/mgv/lib/python3.7/site-packages/mat73/__init__.py", line 151, in convert_mat
entry = self.unpack_mat(self.refs.get(r))
File "/home/afonso/anaconda3/envs/mgv/lib/python3.7/site-packages/mat73/__init__.py", line 77, in unpack_mat
unpacked = self.unpack_mat(elem, depth=depth+1)
File "/home/afonso/anaconda3/envs/mgv/lib/python3.7/site-packages/mat73/__init__.py", line 77, in unpack_mat
unpacked = self.unpack_mat(elem, depth=depth+1)
File "/home/afonso/anaconda3/envs/mgv/lib/python3.7/site-packages/mat73/__init__.py", line 80, in unpack_mat
values = unpacked.values()
AttributeError: 'NoneType' object has no attribute 'values'
Versions:
mat73==0.50
numpy==1.20.3
Apologies in advance for using python language to describe matlab things.
I'm having trouble handling structs that have lists of structs in them. To use a spin on your example from documentation:
data_dict = mat73.loadmat('data.mat', use_attrdict=True)
struct = data_dict['structure'] # assuming a structure was saved in the .mat
If structure has 2 fields, 'strings': a list of strings, and 'subvar': a list of structs, this is currently returned by calling struct:
{'subvar': [{}, {}], 'strings': [None, None]}
My understanding is hdf5 sees subvar as full of region references. I've worked through this thus far (before I found this package) by:
f = h5py.File(data_dir, 'r')['struct']
ref_1 = f['subvar'][0,0]
ref_2 = f[ref_1]['subvarField'][0][0]
another_ref = f[ref_2]['subsubvarField']
so on and so forth... original code is in a repo I'm working on if this is unclear
Request for feature: It would be a nice addition if Matlab tables could be imported with mat73.
Matlab tables are similar to cells and can contain numerical and non-numerical data.
https://nl.mathworks.com/help/matlab/cell-arrays.html
https://nl.mathworks.com/help/matlab/ref/table.html
The following link may help to explain how tables can be converted:
https://stackoverflow.com/questions/63136526/h5py-issues-to-correctly-read-a-table-class-stored-in-matlab-mat-7-3
Thank you for developing mat73!
Peter
Hi,
I am attempting to load a large MAT file (>3GB) which is >=v7.3. When I try to load the file I get the following error:
Traceback (most recent call last):
File "<pyshell#3>", line 1, in <module>
data_dict = mat73.loadmat("UB_daq.mat")
File "C:\Users\david\AppData\Local\Programs\Python\Python37\lib\site-packages\mat73\__init__.py", line 294, in loadmat
dictionary = decoder.mat2dict(hdf5)
File "C:\Users\david\AppData\Local\Programs\Python\Python37\lib\site-packages\mat73\__init__.py", line 90, in mat2dict
d[var] = self.unpack_mat(hdf5[var])
File "C:\Users\david\AppData\Local\Programs\Python\Python37\lib\site-packages\mat73\__init__.py", line 155, in unpack_mat
return self.convert_mat(hdf5, depth, MATLAB_class=MATLAB_class)
File "C:\Users\david\AppData\Local\Programs\Python\Python37\lib\site-packages\mat73\__init__.py", line 209, in convert_mat
entry = self.unpack_mat(self.refs.get(r), depth+1)
File "C:\Users\david\AppData\Local\Programs\Python\Python37\lib\site-packages\h5py\_hl\group.py", line 304, in get
return self[name]
File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "C:\Users\david\AppData\Local\Programs\Python\Python37\lib\site-packages\h5py\_hl\group.py", line 264, in __getitem__
oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
File "C:\Users\david\AppData\Local\Programs\Python\Python37\lib\site-packages\h5py\_hl\base.py", line 137, in _e
name = name.encode('ascii')
AttributeError: 'numpy.ndarray' object has no attribute 'encode'
I suspect it might be related to the MATLAB_fields entry which stores the field names (attribute MATLAB_fields) as numpy arrays of bytes, but that's just a guess.
I can't send you the datafiles due to confidentiality restrictions.
I like your package very much. Especially, I found that it is capable of reading string data types stored in mat files without any problem, unlike the h5py package. But it is extremely slow during the loading stage. As an example, while the h5py package takes only 0.6 s to load a mat file, yours takes about 5 s to load a similar mat file. Could you fix this issue?
Describe the bug
Thank you for creating this library. I'm trying to load a mat file that seems to have some sort of dictionary in it as well:
namesToIds: map from English label names to class IDs (with C key-value pairs)
I get the following error:
__init__ - ERROR: MATLAB type not supported: containers.Map, (uint32)
Do you have any plans on supporting that type?
Update: It's not a pressing issue, as I just realized that another variable has the information I need to construct the map (dictionary) myself.
Provide sample file
From https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html:
http://horatio.cs.nyu.edu/mit/silberman/nyu_depth_v2/nyu_depth_v2_labeled.mat
When I use mat73.loadmat to load my .mat file, it gives me the following error:
ERROR:root:ERROR: not a MATLAB datatype: <HDF5 dataset "data": shape (46432426,), type "<f8">, (float64)
ERROR:root:ERROR: not a MATLAB datatype: <HDF5 dataset "ir": shape (46432426,), type "<u8">, (uint64)
ERROR:root:ERROR: not a MATLAB datatype: <HDF5 dataset "jc": shape (17488,), type "<u8">, (uint64)
Hi there. I am trying to load a Matlab file and something curious is happening, which leads me to a few questions. The file contains a Matlab struct with several fields containing cell values. These cells themselves have entries that are cells (or in some cases structs). When I try to load the file using
mat73.loadmat(data.m)
I receive the following error several times.
ERROR:root:ERROR: MATLAB type not supported: affine2d, (uint32)
As far as I know, affine2d is not a Matlab file type so I'm not sure what is going on here. Strangely, after all of these errors, a dictionary is still returned, albeit not quite in the format I would like. For example one of the field values is an 8x8 cell which contains a cell in each element and mat73.loadmat returns a list of lists as advertised. My first question is: what is the convention for unpacking cells into Python lists? Is it row-wise or column-wise? i.e. does the second element of my list correspond to the cell element (1,2) or (2,1)? Additionally, how can I suppress these errors if nothing is really going wrong?
My second question arose when I tried to load specific variables within this data file using
mat73.loadmat(data.m, only_include='variable')
In this case, the previous errors are not thrown, but the dictionary returned only contains None type elements.
Thanks in advance for taking the time!
Best,
Javan
Thanks for the great package. How feasible would it be to add your package as a new backend in xarray? It would allow direct conversion to other formats. In my opinion, it is a much-desired feature.
As I understand, installation is only with pip?
I work in the conda/anaconda environment.
Do you plan to support this in the future?
thanks
-ken
Describe the bug: When selecting variables to load using only_include, entries of cell arrays are not loaded (the cell is included but every entry is 'None').
Provide sample file: test_mat73.zip
mat73.loadmat('test_mat73.mat'): loads normally.
mat73.loadmat('test_mat73.mat', only_include='foo'): results in [None, None]
I don't know much about HDF5 files, but I did some debugging and I think what's going on is that entries of a cell are references to variables stored elsewhere in the file, under /#refs#. When converting the cell in mat73, the reference is dereferenced properly to get that variable, but when checking whether it should be included, its name is something starting with /#refs#/ rather than with the name of the parent variable. Thus it is excluded from loading at that stage.
Possible solutions:
Let is_included pass for anything starting with /#refs#. This will not cause the whole /#refs# tree to be loaded at the top level, since there's already a check in mat2dict that excludes it.
If I were fixing this for myself I would submit it as a PR, but I took your advice and tried pymatreader, which seems to work correctly, so I'll just switch to that for now.
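The filtering idea in this report can be sketched as a standalone function. This is a hypothetical illustration, not mat73's actual code: the name is_included comes from the report, but its signature, the REFS_PREFIX constant and the matching rules below are assumptions made for the example.

```python
REFS_PREFIX = "/#refs#"  # assumed location of dereferenced cell entries, as described in the report


def is_included(path, only_include):
    """Hypothetical filter: decide whether the HDF5 object at `path` should be loaded."""
    if only_include is None:
        return True  # no filter given: load everything
    if isinstance(only_include, str):
        only_include = [only_include]  # allow a single variable name
    if path.startswith(REFS_PREFIX):
        # Cell entries are stored under /#refs#, so their own path never matches
        # the parent variable's name; let them through once they are dereferenced.
        return True
    # Normalise requested names to absolute paths, e.g. 'foo' -> '/foo'
    wanted = [s if s.startswith("/") else "/" + s for s in only_include]
    # Keep the path if it is a requested variable or nested below one
    return any(path == w or path.startswith(w + "/") for w in wanted)


print(is_included("/#refs#/a", "foo"))  # a dereferenced cell entry
print(is_included("/bar", "foo"))       # an unrelated top-level variable
```

The sketch relies on the caller never walking the /#refs# tree at the top level, which the report says mat2dict already guarantees.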
I get this error by executing this line:
derivedData = mat73.loadmat('some_mat_file.mat')
In this repo you have the license listed as GPL3. However, via pypi.org it is listed as having a MIT license. Which is the correct license for this project?
Describe the bug
The sparse matrix A is not properly loaded:
>>> data = mat73.loadmat('DataFull_128x45.mat')
{'A': {'data': None, 'ir': None, 'jc': None}, 'm': array([[0.04562512, 0.03787717, 0.04324344, ..., 0.03325841, 0.02450304,
0.03...3638235, 0.03459222,
0.03479418]]), 'normA': array(694.51590391)}
>>> h5py.File('DataFull_128x45.mat')["A"]["data"]
<HDF5 dataset "data": shape (8537187,), type "<f8">
Provide sample file
Taken from https://zenodo.org/records/1254210, see DataFull_128x45.mat
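The three datasets data, ir and jc in the output above look like a compressed-sparse-column layout: nonzero values, their row indices, and per-column pointers. Assuming that convention with zero-based indices, the matrix can be rebuilt from them as sketched below. The helper csc_to_dense and the 3x3 sample are hypothetical, and the row count is passed in explicitly because it cannot be recovered from these three arrays alone.

```python
def csc_to_dense(data, ir, jc, n_rows):
    """Expand compressed-sparse-column arrays into a dense list of rows.

    data: nonzero values, stored column by column
    ir:   zero-based row index of each value in `data`
    jc:   column pointers; column j owns data[jc[j]:jc[j + 1]]
    """
    n_cols = len(jc) - 1
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for col in range(n_cols):
        # Every stored value between two consecutive pointers belongs to this column
        for k in range(jc[col], jc[col + 1]):
            dense[ir[k]][col] = data[k]
    return dense


# Hypothetical 3x3 matrix [[1, 0, 2], [0, 0, 3], [4, 0, 0]] in CSC form
data = [1.0, 4.0, 2.0, 3.0]
ir = [0, 2, 0, 1]
jc = [0, 2, 2, 4]
dense = csc_to_dense(data, ir, jc, n_rows=3)
for row in dense:
    print(row)
```

For a matrix with millions of nonzeros, as in this report, a dense expansion is not practical; scipy.sparse.csc_matrix((data, ir, jc), shape=(n_rows, n_cols)) takes the same three arrays and keeps the result sparse.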