mat7.3's People

Contributors

afonsobspinto, atspaeth, dawyd, felixbenning, skjerns

mat7.3's Issues

Parsing of character arrays invalidates stored data

Looking for input as to why the library strips null characters from the original stored character arrays. We are storing serialized, encoded data into character array fields. We have found the offending line in the code, which manipulates the stored data:

https://github.com/skjerns/mat7.3/blob/master/mat73/core.py#L253

Is there a reason why the original data is being altered, with null characters replaced by empty strings? If we were to remove this line, what impacts do you foresee?
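
For illustration, here is a minimal sketch (not the library's actual code) of why stripping NUL characters breaks round-tripping of serialized payloads stored in character arrays:

raw = 'header\x00payload\x00'        # hypothetical serialized data stored as a char array
stripped = raw.replace('\x00', '')   # the kind of cleanup the linked line performs
assert stripped != raw               # the original byte layout can no longer be recovered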

Thank You.

structure array is converted to dict of lists rather than list of dict

Firstly, thanks for making this package; it is very handy.

I often work with data that has been stored as a MATLAB structure array. Currently, mat73 successfully loads this data, which is great. However, it is loaded as a dict of lists, requiring syntax such as variable = data["key"][element], whereas the MATLAB syntax would have been variable = data(element).key;. The advantage of the latter is that I can easily isolate all variables of a specific element, e.g. data_subset = data(element); variable1 = data_subset.key1; variable2 = data_subset.key2; etc. In Python, by contrast, this requires constructing a new dict with a for loop over every key.

Similar functionality to MATLAB would be achieved by returning a list of dicts. Do you think it would be easy to modify mat73 to provide this as an option?
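
As a rough illustration of the requested behaviour (this would be post-processing on top of mat73, not something it currently offers), the dict of lists could be repacked into a list of dicts:

def dict_of_lists_to_list_of_dicts(d):
    # assumes every field of the struct array has the same number of elements
    n = len(next(iter(d.values())))
    return [{key: values[i] for key, values in d.items()} for i in range(n)]

elements = dict_of_lists_to_list_of_dicts(data)   # data as loaded by mat73
data_subset = elements[0]                         # then data_subset["key1"], data_subset["key2"], ...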

TypeError loading empty cell array

When I attempt to load a fairly vanilla .mat, I get a TypeError. By generating some simple test cases with similar datatypes, it appears to be hanging up when I provide an empty matlab cell array.

Command

data_dict = mat73.loadmat('test_dataset_nodatetime.mat', use_attrdict=True)

Output

TypeError                                 Traceback (most recent call last)
<ipython-input-16-97a7782477dc> in <module>
----> 1 data_dict = mat73.loadmat('test_dataset_nodatetime.mat', use_attrdict=True)

C:\<path>\lib\site-packages\mat73\__init__.py in loadmat(filename, use_attrdict, verbose)
    207     try:
    208         with h5py.File(filename, 'r') as hdf5:
--> 209             dictionary = decoder.mat2dict(hdf5)
    210         return dictionary
    211     except OSError:

C:\<path>\lib\site-packages\mat73\__init__.py in mat2dict(self, hdf5)
     48             ext = os.path.splitext(hdf5.filename)[1].lower()
     49             if ext.lower()=='.mat':
---> 50                 d[var] = self.unpack_mat(hdf5[var])
     51             elif ext=='.h5' or ext=='.hdf5':
     52                 err = 'Can only load .mat. Please use package hdfdict instead'\

C:\<path>\lib\site-packages\mat73\__init__.py in unpack_mat(self, hdf5, depth)
     73                 matlab_class = hdf5[key].attrs.get('MATLAB_class')
     74                 elem   = hdf5[key]
---> 75                 unpacked = self.unpack_mat(elem, depth=depth+1)
     76                 if matlab_class==b'struct' and len(elem)>1:
     77 

C:\<path>\lib\site-packages\mat73\__init__.py in unpack_mat(self, hdf5, depth)
    104             return d
    105         elif isinstance(hdf5, h5py._hl.dataset.Dataset):
--> 106             return self.convert_mat(hdf5)
    107         else:
    108             raise Exception(f'Unknown hdf5 type: {key}:{type(hdf5)}')

C:\<path>\lib\site-packages\mat73\__init__.py in convert_mat(self, dataset)
    140             for ref in dataset:
    141                 row = []
--> 142                 for r in ref:
    143                     entry = self.unpack_mat(self.refs.get(r))
    144                     row.append(entry)

TypeError: 'numpy.uint64' object is not iterable

The behavior is replicated with or without the use_attrdict flag.
I can provide the file if that's helpful.

Implement upstream

Great package, it solved a lot of problems for me. Since MATLAB's version 7.3 format has been around a while now, did you consider implementing your changes upstream in scipy directly?

TypeError in Python 3.8

Hi, thanks for the package!

Just flagging that using Python 3.8, loading a v7.3 .mat file raised the following error for me:

TypeError: only integer scalar arrays can be converted to a scalar index

The structure of the .mat file is shown in the attached screenshot (Screen Shot 2020-08-03 at 1.19.04 pm).

Worked like a charm in Python 3.7.4 though!

Is savemat possible?

Hi! Thank you for this useful function.

I'm using it with great appreciation.
Do you have any plans to create a savemat function?

Thanks.

savemat() would be a nice addition to this package

This is a very valuable package, as it reads my multi-GB Matlab 7.3 files and converts all of my data types quickly. In my case I need to append the data and save back to .mat 7.3 format as well, and have resorted to using the hdf5storage library, which does not load the files as cleanly as mat7.3. I would like to request adding a savemat() function, as the author says that would be easy to implement. Thanks for creating this awesome tool!
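
Until such a function exists, here is a minimal sketch of the hdf5storage work-around mentioned above (as far as I know, its savemat accepts a format argument for writing v7.3 files):

import numpy as np
import hdf5storage

data = {'my_array': np.arange(6.0).reshape(2, 3)}          # hypothetical data to write back
hdf5storage.savemat('output_v73.mat', data, format='7.3')  # writes an HDF5-based v7.3 .mat file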

SyntaxError: invalid syntax

Thank you for your great work.

But I met a problem when loading the mat73 library, as follows:
'''
only_include = [s if s[0]=='/' else f'/{s}' for s in only_include]
^
SyntaxError: invalid syntax
'''

I didn't change anything; do you know why this happens?

PS: I am running it on Windows, with Anaconda and python==3.6.

savemat

Hey, thanks for the library! It would be great if we could also save mats.

<HDF5 dataset ..., type "|O"> is not a matlab type

Hi
Thank you for your package. However, when loading a v7.3 mat file I got the following error:

<HDF5 dataset "Hf": shape (625, 1), type "|O"> is not a matlab type

If you could tell me what needs to be implemented, I am happy to do a PR.

Add unsqueeze functionality

Hi,
Thanks for your package, it really helps me load complex MATLAB structures in v7.3.
Could you add a boolean argument to control whether array variables get squeezed or not?
I already managed to do it with your code, but I think this functionality could be of interest to the community.
If necessary, I can do a pull request.

MATLAB type not supported: missing, (uint32)

Hi, thanks for this very useful library!

Getting the following message while loading a mat file:
ERROR:root:ERROR: MATLAB type not supported: missing, (uint32)

I'm assuming this error is minor since I expect any such "missing" fields to translate to "None" values.
Is this assumption valid or completely off?
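
If it helps, here is a quick sanity check of that assumption (the filename is a placeholder): load the file and confirm that the fields reported in the error come back as None:

import mat73

data = mat73.loadmat('myfile.mat')   # hypothetical filename
print([key for key, value in data.items() if value is None])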

ERROR:root:ERROR: not a MATLAB datatype

I get an error similar to #43, except that it reads ERROR:root:ERROR: not a MATLAB datatype (see the longer excerpt below). The same files are loaded without any error by pymatreader, so they have figured out how to read them; these types are likely newer, non-proprietary MATLAB datatypes, or datatypes added by the EIDORS Electrical Impedance Tomography library.

The exact error is (repeats cut):

ERROR:root:ERROR: not a MATLAB datatype: <HDF5 dataset "data": shape (26,), type "<f8">, (float64)
ERROR:root:ERROR: not a MATLAB datatype: <HDF5 dataset "ir": shape (26,), type "<u8">, (uint64)
ERROR:root:ERROR: not a MATLAB datatype: <HDF5 dataset "jc": shape (17,), type "<u8">, (uint64)

/* cut as repeats of the above */

ERROR:root:ERROR: not a MATLAB datatype: <HDF5 dataset "data": shape (2,), type "<f8">, (float64)
ERROR:root:ERROR: not a MATLAB datatype: <HDF5 dataset "ir": shape (2,), type "<u8">, (uint64)
ERROR:root:ERROR: not a MATLAB datatype: <HDF5 dataset "jc": shape (2,), type "<u8">, (uint64)

/* cut as repeats of the above */

The example files are in check_diff.zip.

With some spare time, which I do not have, I would fork and submit a pull request fixing the currently unknown data types. For now I have to fall back to pymatreader, which is the second-best alternative for me, since I need neither mat < 7.3 support nor scipy.io.loadmat backwards compatibility.

The tested version is the latest on PyPI (0.62); master has not been tested.

matlab string not supported

I've found several packages that handle .mat files, but none of them can process MATLAB's string datatype. Because of the large data volume, I can't convert the data in MATLAB, so it would be very useful if your tool supported this datatype.

Implement MATLAB datenums to datetime

Hi Simon,

Great package, saved me loads of time while working with colleagues who love MATLAB.

I saw that you mentioned converting MATLAB datenums and were looking for a method of converting them. If that is still the case, I often use this function to convert MATLAB datenums to datetimes.

import datetime as dt

def matlab2datetime(matlab_datenum):
    day = dt.datetime.fromordinal(int(matlab_datenum))
    dayfrac = dt.timedelta(days=matlab_datenum % 1) - dt.timedelta(days=366)
    return day + dayfrac

here's an example of how it is implemented:

import pandas as pd

times = [matlab2datetime(tval) for tval in data_dict['time']]
index = pd.to_datetime(times).round('S')  # round it, sometimes the times go to a weird decimal place

ERROR: MATLAB type not supported: struct, (uint64)

Hi! How's it going?
I attempted to load the .mat file available here as shown:

mat73.loadmat('atlas_index.mat', use_attrdict=True)

(The behavior is replicated with or without the use_attrdict flag.)

but it failed with the error:

ERROR:root:ERROR: MATLAB type not supported: struct, (uint64)
Traceback (most recent call last):
  File "/home/afonso/anaconda3/envs/mgv/lib/python3.7/site-packages/mat73/__init__.py", line 226, in loadmat
    dictionary = decoder.mat2dict(hdf5, only_load=only_load)
  File "/home/afonso/anaconda3/envs/mgv/lib/python3.7/site-packages/mat73/__init__.py", line 52, in mat2dict
    d[var] = self.unpack_mat(hdf5[var])
  File "/home/afonso/anaconda3/envs/mgv/lib/python3.7/site-packages/mat73/__init__.py", line 77, in unpack_mat
    unpacked = self.unpack_mat(elem, depth=depth+1)
  File "/home/afonso/anaconda3/envs/mgv/lib/python3.7/site-packages/mat73/__init__.py", line 108, in unpack_mat
    return self.convert_mat(hdf5)
  File "/home/afonso/anaconda3/envs/mgv/lib/python3.7/site-packages/mat73/__init__.py", line 151, in convert_mat
    entry = self.unpack_mat(self.refs.get(r))
  File "/home/afonso/anaconda3/envs/mgv/lib/python3.7/site-packages/mat73/__init__.py", line 77, in unpack_mat
    unpacked = self.unpack_mat(elem, depth=depth+1)
  File "/home/afonso/anaconda3/envs/mgv/lib/python3.7/site-packages/mat73/__init__.py", line 77, in unpack_mat
    unpacked = self.unpack_mat(elem, depth=depth+1)
  File "/home/afonso/anaconda3/envs/mgv/lib/python3.7/site-packages/mat73/__init__.py", line 80, in unpack_mat
    values = unpacked.values()
AttributeError: 'NoneType' object has no attribute 'values'

Versions:
mat73==0.50
numpy==1.20.3

Support for region references

Apologies in advance for using python language to describe matlab things.

I'm having trouble handling structs that have lists of structs in them. To use a spin on your example from the documentation:

data_dict = mat73.loadmat('data.mat', use_attrdict=True) 
struct = data_dict['structure'] # assuming a structure was saved in the .mat

If structure has two fields, 'strings' (a list of strings) and 'subvar' (a list of structs), this is what calling struct currently returns:

{'subvar': [{}, {}], 'strings': [None, None]}

My understanding is that HDF5 sees subvar as full of region references. I've worked around this so far (before I found this package) with:

f = h5py.File(data_dir, 'r')['struct']
ref_1 = f['subvar'][0,0]                  # object reference to the first struct element
ref_2 = f[ref_1]['subvarField'][0][0]     # dereference it, then grab the nested reference
another_ref = f[ref_2]['subsubvarField']  # and keep dereferencing deeper

and so on and so forth... The original code is in a repo I'm working on, if this is unclear.

Load Matlab tables

Request for feature: It would be a nice addition if Matlab tables could be imported with mat73.

Matlab tables are similar to cells and can contain numerical and non-numerical data.
https://nl.mathworks.com/help/matlab/cell-arrays.html
https://nl.mathworks.com/help/matlab/ref/table.html

The following link may help to explain how tables can be converted:
https://stackoverflow.com/questions/63136526/h5py-issues-to-correctly-read-a-table-class-stored-in-matlab-mat-7-3

Thank you for developing mat73!
Peter

AttributeError: 'numpy.ndarray' object has no attribute 'encode'

Hi,
I am attempting to load a large MAT file (>3 GB) which is >= v7.3. When I try to load the file, I get the following error:

Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    data_dict = mat73.loadmat("UB_daq.mat")
  File "C:\Users\david\AppData\Local\Programs\Python\Python37\lib\site-packages\mat73\__init__.py", line 294, in loadmat
    dictionary = decoder.mat2dict(hdf5)
  File "C:\Users\david\AppData\Local\Programs\Python\Python37\lib\site-packages\mat73\__init__.py", line 90, in mat2dict
    d[var] = self.unpack_mat(hdf5[var])
  File "C:\Users\david\AppData\Local\Programs\Python\Python37\lib\site-packages\mat73\__init__.py", line 155, in unpack_mat
    return self.convert_mat(hdf5, depth, MATLAB_class=MATLAB_class)
  File "C:\Users\david\AppData\Local\Programs\Python\Python37\lib\site-packages\mat73\__init__.py", line 209, in convert_mat
    entry = self.unpack_mat(self.refs.get(r), depth+1)
  File "C:\Users\david\AppData\Local\Programs\Python\Python37\lib\site-packages\h5py\_hl\group.py", line 304, in get
    return self[name]
  File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "C:\Users\david\AppData\Local\Programs\Python\Python37\lib\site-packages\h5py\_hl\group.py", line 264, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "C:\Users\david\AppData\Local\Programs\Python\Python37\lib\site-packages\h5py\_hl\base.py", line 137, in _e
    name = name.encode('ascii')
AttributeError: 'numpy.ndarray' object has no attribute 'encode'

I suspect it might be related to the MATLAB_fields attribute, which stores the field names as numpy arrays of bytes, but that's just a guess.
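
In case it helps with debugging, here is a hedged diagnostic sketch (the filename comes from the traceback above; the attribute layout is my assumption) that prints how MATLAB_fields stores field names as arrays of bytes:

import h5py

with h5py.File('UB_daq.mat', 'r') as f:
    for name, item in f.items():
        fields = item.attrs.get('MATLAB_fields')
        if fields is not None:
            # each entry appears to be an array of single-byte characters; join them back into strings
            print(name, [b''.join(field).decode() for field in fields])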

I can't send you the datafiles due to confidentiality restrictions.

It is extremely slow when loading files

I like your package very much. In particular, I found that it is capable of reading string data types stored in mat files without any problem, unlike the h5py package. But it is extremely slow during the loading stage: as an example, while the h5py package takes only 0.6 s to load a mat file, yours takes about 5 s to load a similar mat file. Could you fix this issue?

support for containers.Map

Describe the bug

Thank you for creating this library. I'm trying to load a mat file that seems to have some sort of dictionary in it as well:
namesToIds – map from English label names to class IDs (with C key-value pairs)

I get the following error:
__init__ - ERROR: MATLAB type not supported: containers.Map, (uint32)

Do you have any plans to support that type?

Update: it's not a pressing issue, as I just realized that another variable has the information I need to construct the map (dictionary) myself.

Provide sample file

From https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html:
http://horatio.cs.nyu.edu/mit/silberman/nyu_depth_v2/nyu_depth_v2_labeled.mat

not a MATLAB datatype

When I use mat73.loadmat to load my .mat file, it gives me the following error:

ERROR:root:ERROR: not a MATLAB datatype: <HDF5 dataset "data": shape (46432426,), type "<f8">, (float64)
ERROR:root:ERROR: not a MATLAB datatype: <HDF5 dataset "ir": shape (46432426,), type "<u8">, (uint64)
ERROR:root:ERROR: not a MATLAB datatype: <HDF5 dataset "jc": shape (17488,), type "<u8">, (uint64)

ERROR:root:ERROR: MATLAB type not supported: affine2d, (uint32)

Hi there. I am trying to load a Matlab file and something curious is happening, which leads me to a few questions. The file contains a Matlab struct with several fields containing cell values. These cells themselves have entries that are cells (or in some cases structs). When I try to load the file using
mat73.loadmat('data.mat')
I receive the following error several times.
ERROR:root:ERROR: MATLAB type not supported: affine2d, (uint32)
As far as I know, affine2d is not a Matlab file type so I'm not sure what is going on here. Strangely, after all of these errors, a dictionary is still returned, albeit not quite in the format I would like. For example one of the field values is an 8x8 cell which contains a cell in each element and mat73.loadmat returns a list of lists as advertised. My first question is: what is the convention for unpacking cells into Python lists? Is it row-wise or column-wise? i.e. does the second element of my list correspond to the cell element (1,2) or (2,1)? Additionally, how can I suppress these errors if nothing is really going wrong?
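
Regarding suppressing the messages: they are emitted through Python's logging module on the root logger (hence the ERROR:root: prefix), so one option, sketched here, is to raise the logging level before loading:

import logging
import mat73

logging.getLogger().setLevel(logging.CRITICAL)   # hide ERROR-level messages from the root logger
data = mat73.loadmat('data.mat')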

My second question arose when I tried to load specific variables within this data file using
mat73.loadmat('data.mat', only_include='variable')
In this case, the previous errors are not thrown, but the dictionary returned only contains None type elements.

Thanks in advance for taking the time!

Best,
Javan

support for anaconda/conda?

As I understand it, installation is only via pip?
I work in the conda/anaconda environment.
Do you plan to support this in the future?
thanks
-ken

Elements of cell arrays are not loaded when using 'only_include'

Describe the bug: When selecting variables to load using only_include, entries of cell arrays are not loaded (the cell is included but every entry is 'None').

Provide sample file: test_mat73.zip

mat73.loadmat('test_mat73.mat'): loads normally.
mat73.loadmat('test_mat73.mat', only_include='foo'): results in [None, None]

I don't know much about HDF5 files, but I did some debugging and I think what's going on is that entries of a cell are references to variables stored elsewhere in the file, under /#refs#. When converting the cell in mat73, the reference is dereferenced properly to get that variable, but when checking whether it should be included, its name is something starting with /#refs#/ rather than with the name of the parent variable. Thus it is excluded from loading at that stage.

Possible solutions:

  • make a flag to skip the check for inclusion when unpacking a variable, and set it when dereferencing into a cell
  • simpler: just return True in is_included for anything starting with /#refs#. This will not cause the whole /#refs# tree to be loaded at the top level, since there's already a check in mat2dict that excludes it.

If I were fixing this for myself I would submit it as a PR, but I took your advice and tried pymatreader, which seems to work correctly, so I'll just switch to that for now.
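
For reference, here is a minimal sketch of the second suggestion above; the helper name and signature are illustrative, not mat73's actual internals:

def is_included(hdf5_name, only_include):
    # dereferenced cell entries live under /#refs#; always allow them here,
    # since mat2dict already excludes the top-level /#refs# group
    if hdf5_name.startswith('/#refs#'):
        return True
    return any(hdf5_name.startswith(pattern) for pattern in only_include)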

Does not load sparse matrix correctly

Describe the bug

The sparse matrix A is not properly loaded

>>> data = mat73.loadmat('DataFull_128x45.mat')
{'A': {'data': None, 'ir': None, 'jc': None}, 'm': array([[0.04562512, 0.03787717, 0.04324344, ..., 0.03325841, 0.02450304,
        0.03...3638235, 0.03459222,
        0.03479418]]), 'normA': array(694.51590391)}
>>> h5py.File('DataFull_128x45.mat')["A"]["data"]
<HDF5 dataset "data": shape (8537187,), type "<f8">

Provide sample file

Taken from https://zenodo.org/records/1254210; see DataFull_128x45.mat.
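
As a work-around until sparse matrices are supported, here is a sketch that rebuilds A directly with h5py and scipy, relying on MATLAB storing sparse arrays as CSC triplets (data, ir = row indices, jc = column pointers); the row count is inferred from ir below, so pass the known shape if you have it:

import h5py
import scipy.sparse

with h5py.File('DataFull_128x45.mat', 'r') as f:
    g = f['A']
    data, ir, jc = g['data'][...], g['ir'][...], g['jc'][...]
    n_cols = jc.shape[0] - 1
    n_rows = int(ir.max()) + 1                 # or use the known number of rows
    A = scipy.sparse.csc_matrix((data, ir, jc), shape=(n_rows, n_cols))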
