
asdf's People

Contributors

astrofrog, bernie-simon, bnavigator, braingram, bsipocz, cadair, cagtayfabry, cdeil, dependabot[bot], drdavella, embray, eslavich, eteq, jdavies-st, kbarbary, keflavich, kmacdonald-stsci, larrybradley, lgarrison, mdboom, mwcraig, nden, olebole, perrygreenfield, pllim, pre-commit-ci[bot], vmarkovtsev, williamjamieson, wkerzendorf, zacharyburnett


asdf's Issues

overwriting asdf files

This isn't urgent; I am simply reporting it so it's not forgotten.
When attempting to overwrite an existing asdf file on Windows, I get an error:

In [17]: f1.write_to('foc2sky.asdf')
---------------------------------------------------------------------------
IOError                                   Traceback (most recent call last)
<ipython-input-17-1418e770a532> in <module>()
----> 1 f1.write_to('foc2sky.asdf')

C:\Anaconda\envs\gwcs\lib\site-packages\pyasdf-0.0.dev417-py2.7.egg\pyasdf\asdf.pyc in write_to(self, fd, all_array_storage, all_array_compression, auto_inline, pad_blocks)
    586         original_fd = self._fd
    587
--> 588         self._fd = fd = generic_io.get_file(fd, mode='w')
    589
    590         self._pre_write(fd, all_array_storage, all_array_compression,

C:\Anaconda\envs\gwcs\lib\site-packages\pyasdf-0.0.dev417-py2.7.egg\pyasdf\generic_io.pyc in get_file(init, mode, uri)
   1033             realpath = url2pathname(parsed.path)
   1034             return RealFile(
-> 1035                 open(realpath, realmode), mode, close=True,
   1036                 uri=uri)
   1037

IOError: [Errno 22] invalid mode ('wb') or filename: 'foc2sky.asdf'

On Linux the file is silently overwritten. Perhaps add a clobber argument to write_to, similar to astropy.io.fits.
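
For illustration, the suggested call might look like this (the clobber keyword is hypothetical, modeled on astropy.io.fits; pyasdf does not currently accept it):

f1.write_to('foc2sky.asdf', clobber=True)   # explicitly allow overwriting
f1.write_to('foc2sky.asdf')                 # would raise if the file exists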

docs building error with sphinx 1.3.1

Building the HTML documentation with sphinx 1.3.1 and the latest astropy-helpers fails with the following error:

# Sphinx version: 1.3.1
# Python version: 3.4.3 (CPython)
# Docutils version: 0.12 release
# Jinja2 version: 2.7.3
# Last messages:
#   11 added, 0 changed, 0 removed
#   reading sources... [  9%] api/pyasdf.AsdfExtension
#   reading sources... [ 18%] api/pyasdf.AsdfFile
#   reading sources... [ 27%] api/pyasdf.AsdfType
#   reading sources... [ 36%] api/pyasdf.Stream
#   reading sources... [ 45%] api/pyasdf.ValidationError
#   reading sources... [ 54%] api/pyasdf.fits_embed.AsdfInFits
#   reading sources... [ 63%] api/pyasdf.open
#   reading sources... [ 72%] api/pyasdf.test
#   reading sources... [ 81%] index
# Loaded extensions:
#   sphinx.ext.autosummary (1.3.1) from /usr/lib/python3.4/site-packages/sphinx/ext/autosummary/__init__.py
#   sphinx.ext.pngmath (1.3.1) from /usr/lib/python3.4/site-packages/sphinx/ext/pngmath.py
#   astropy_helpers.sphinx.ext.changelog_links (unknown version) from /home/mdevalbo/.local/lib/python3.4/site-packages/astropy_helpers-1.1.dev427-py3.4.egg/astropy_helpers/sphinx/ext/changelog_links.py
#   sphinx.ext.autodoc (1.3.1) from /usr/lib/python3.4/site-packages/sphinx/ext/autodoc.py
#   sphinx.ext.coverage (1.3.1) from /usr/lib/python3.4/site-packages/sphinx/ext/coverage.py
#   sphinx.ext.inheritance_diagram (1.3.1) from /usr/lib/python3.4/site-packages/sphinx/ext/inheritance_diagram.py
#   astropy_helpers.sphinx.ext.doctest (unknown version) from /home/mdevalbo/.local/lib/python3.4/site-packages/astropy_helpers-1.1.dev427-py3.4.egg/astropy_helpers/sphinx/ext/doctest.py
#   astropy_helpers.sphinx.ext.autodoc_enhancements (unknown version) from /home/mdevalbo/.local/lib/python3.4/site-packages/astropy_helpers-1.1.dev427-py3.4.egg/astropy_helpers/sphinx/ext/autodoc_enhancements.py
#   example (unknown version) from sphinxext/example.py
#   astropy_helpers.sphinx.ext.viewcode (unknown version) from /home/mdevalbo/.local/lib/python3.4/site-packages/astropy_helpers-1.1.dev427-py3.4.egg/astropy_helpers/sphinx/ext/viewcode.py
#   astropy_helpers.sphinx.ext.automodsumm (unknown version) from /home/mdevalbo/.local/lib/python3.4/site-packages/astropy_helpers-1.1.dev427-py3.4.egg/astropy_helpers/sphinx/ext/automodsumm.py
#   astropy_helpers.sphinx.ext.numpydoc (unknown version) from /home/mdevalbo/.local/lib/python3.4/site-packages/astropy_helpers-1.1.dev427-py3.4.egg/astropy_helpers/sphinx/ext/numpydoc.py
#   astropy_helpers.sphinx.ext.astropyautosummary (unknown version) from /home/mdevalbo/.local/lib/python3.4/site-packages/astropy_helpers-1.1.dev427-py3.4.egg/astropy_helpers/sphinx/ext/astropyautosummary.py
#   sphinx.ext.graphviz (1.3.1) from /usr/lib/python3.4/site-packages/sphinx/ext/graphviz.py
#   matplotlib.sphinxext.plot_directive (unknown version) from /usr/lib/python3.4/site-packages/matplotlib/sphinxext/plot_directive.py
#   astropy_helpers.sphinx.ext.smart_resolver (unknown version) from /home/mdevalbo/.local/lib/python3.4/site-packages/astropy_helpers-1.1.dev427-py3.4.egg/astropy_helpers/sphinx/ext/smart_resolver.py
#   astropy_helpers.sphinx.ext.tocdepthfix (unknown version) from /home/mdevalbo/.local/lib/python3.4/site-packages/astropy_helpers-1.1.dev427-py3.4.egg/astropy_helpers/sphinx/ext/tocdepthfix.py
#   astropy_helpers.sphinx.ext.automodapi (unknown version) from /home/mdevalbo/.local/lib/python3.4/site-packages/astropy_helpers-1.1.dev427-py3.4.egg/astropy_helpers/sphinx/ext/automodapi.py
#   sphinx.ext.todo (1.3.1) from /usr/lib/python3.4/site-packages/sphinx/ext/todo.py
#   alabaster (0.7.4) from /usr/lib/python3.4/site-packages/alabaster/__init__.py
#   sphinx.ext.intersphinx (1.3.1) from /usr/lib/python3.4/site-packages/sphinx/ext/intersphinx.py
Traceback (most recent call last):
  File "/usr/lib/python3.4/site-packages/sphinx/cmdline.py", line 245, in main
    app.build(opts.force_all, filenames)
  File "/usr/lib/python3.4/site-packages/sphinx/application.py", line 264, in build
    self.builder.build_update()
  File "/usr/lib/python3.4/site-packages/sphinx/builders/__init__.py", line 245, in build_update
    'out of date' % len(to_build))
  File "/usr/lib/python3.4/site-packages/sphinx/builders/__init__.py", line 259, in build
    self.doctreedir, self.app))
  File "/usr/lib/python3.4/site-packages/sphinx/environment.py", line 618, in update
    self._read_serial(docnames, app)
  File "/usr/lib/python3.4/site-packages/sphinx/environment.py", line 638, in _read_serial
    self.read_doc(docname, app)
  File "/usr/lib/python3.4/site-packages/sphinx/environment.py", line 791, in read_doc
    pub.publish()
  File "/usr/lib/python3.4/site-packages/docutils/core.py", line 218, in publish
    self.apply_transforms()
  File "/usr/lib/python3.4/site-packages/docutils/core.py", line 199, in apply_transforms
    self.document.transformer.apply_transforms()
  File "/usr/lib/python3.4/site-packages/docutils/transforms/__init__.py", line 171, in apply_transforms
    transform.apply(**kwargs)
  File "/usr/lib/python3.4/site-packages/sphinx/transforms.py", line 129, in apply
    if has_child(node.parent, nodes.caption):
  File "/usr/lib/python3.4/site-packages/sphinx/transforms.py", line 116, in has_child
    return any(isinstance(child, cls) for child in node)
TypeError: 'NoneType' object is not iterable

compound transforms lose attributes

This only happens with compound transforms. Attributes like name and inverse are lost.

offx = models.Shift(1)
scl = models.Scale(2)
model = (offx | scl).rename('compound_model')
f = AsdfFile()
f.tree['model'] = model
f.write_to('test.asdf')
f1 = AsdfFile.read('test.asdf')
f1.tree['model'].name   # lost on read; no longer 'compound_model'
model.name
Out[97]: 'compound_model'

Python 3.5 ImportError for pyasdf.compat.user_collections_py3

I'm getting this error for Python 3.5:

$ pip install asdf
$ python -c 'import pyasdf; pyasdf.test()'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/deil/Library/Python/3.5/lib/python/site-packages/pyasdf/__init__.py", line 37, in <module>
    from .asdf import AsdfFile
  File "/Users/deil/Library/Python/3.5/lib/python/site-packages/pyasdf/asdf.py", line 15, in <module>
    from . import block
  File "/Users/deil/Library/Python/3.5/lib/python/site-packages/pyasdf/block.py", line 23, in <module>
    from .compat.numpycompat import NUMPY_LT_1_7
  File "/Users/deil/Library/Python/3.5/lib/python/site-packages/pyasdf/compat/__init__.py", line 13, in <module>
    from .user_collections_py3.UserDict import UserDict
ImportError: No module named 'pyasdf.compat.user_collections_py3'

Regular failure of test_http_connection_range

I get consistent failures in test_http_connection_range when I run the tests locally, on both Python 2 and 3. Since this failure doesn't seem to occur on the CI builds, it may be an issue local to me, but I'm opening an issue as a reminder to investigate:

tree = {'more': array([[[ 0.16936457,  0.04898563,  0.68901559, ...,  0.97004914,
          0.....46327186,  0.78642262, ...,...123, ...,  0.8856184 ,
         0.13... 0.00274184,  0.78529121, ...,  0.6853034 ,
         0.08646289,  0.77335592]])}
rhttpserver = <pyasdf.conftest.RangeHTTPServer object at 0x428ea50>

    @pytest.mark.skipif(sys.platform.startswith('win'),
                        reason="Windows firewall prevents test")
    def test_http_connection_range(tree, rhttpserver):
        path = os.path.join(rhttpserver.tmpdir, 'test.asdf')
        connection = [None]

        def get_write_fd():
            return generic_io.get_file(open(path, 'wb'), mode='w')

        def get_read_fd():
            fd = generic_io.get_file(rhttpserver.url + "test.asdf")
            assert isinstance(fd, generic_io.HTTPConnection)
            connection[0] = fd
            return fd

        with _roundtrip(tree, get_write_fd, get_read_fd) as ff:
            if len(tree) == 4:
                assert connection[0]._nreads == 0
            else:
>               assert connection[0]._nreads == 6
E               assert 5 == 6
E                +  where 5 = <pyasdf.generic_io.HTTPConnection object at 0x431e910>._nreads

../../../.virtualenvs/13aecf6e-83d7-40c6-86f5-713fad8a4373/lib/python2.7/site-packages/pyasdf/tests/test_generic_io.py:306: AssertionError
------------------------------------------------- Captured stdout call -------------------------------------------------
----------------------------------------
Exception happened during processing of request from ('127.0.0.1', 53619)
----------------------------------------
------------------------------------------------- Captured stderr call -------------------------------------------------
127.0.0.1 - - [08/Jan/2016 15:04:07] "GET /test.asdf HTTP/1.1" 206 -
Traceback (most recent call last):
  File "/internal/1/root/usr/local/lib/python2.7/SocketServer.py", line 295, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/internal/1/root/usr/local/lib/python2.7/SocketServer.py", line 321, in process_request
    self.finish_request(request, client_address)
  File "/internal/1/root/usr/local/lib/python2.7/SocketServer.py", line 334, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/internal/1/root/usr/local/lib/python2.7/SocketServer.py", line 651, in __init__
    self.finish()
  File "/internal/1/root/usr/local/lib/python2.7/SocketServer.py", line 710, in finish
    self.wfile.close()
  File "/internal/1/root/usr/local/lib/python2.7/socket.py", line 279, in close
    self.flush()
  File "/internal/1/root/usr/local/lib/python2.7/socket.py", line 303, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
error: [Errno 32] Broken pipe
127.0.0.1 - - [08/Jan/2016 15:04:07] "GET /test.asdf HTTP/1.1" 206 -
127.0.0.1 - - [08/Jan/2016 15:04:07] "GET /test.asdf HTTP/1.1" 206 -
127.0.0.1 - - [08/Jan/2016 15:04:07] "GET /test.asdf HTTP/1.1" 206 -
127.0.0.1 - - [08/Jan/2016 15:04:07] "GET /test.asdf HTTP/1.1" 206 -
127.0.0.1 - - [08/Jan/2016 15:04:07] "GET /test.asdf HTTP/1.1" 206 -

Confusion with asdf / pyasdf package name

This package conflicts with another one:

  • asdf (this package): Python package for the astronomy ASDF data format
  • pyasdf: Python package for the seismological ASDF data format

@krischer @mdboom @embray This is very confusing. Any chance to still change something to avoid the name conflict / simplify this?

Stream read the YAML tree

PyYAML doesn't like it when you give it a YAML file with "invalid" (non-UTF8) content following the end marker. Why it tries to read past the end marker at all is sort of beyond me.

The current solution is to read the entire tree into a string and pass that to PyYAML. It should also be possible to use some sort of reading proxy that treats ... as EOF, which would be more memory efficient when the tree is really large, though it might add some overhead of its own. Worth experimenting with in any event.
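
A minimal sketch of such a proxy, assuming the tree ends at a "\n..." document-end marker (a real implementation would also have to handle a marker split across read boundaries):

import io

class TreeOnlyReader(io.RawIOBase):
    """Report EOF at the YAML end-of-document marker ('...')."""

    def __init__(self, fd):
        self._fd = fd
        self._done = False

    def readable(self):
        return True

    def read(self, size=-1):
        if self._done:
            return b''
        data = self._fd.read(size)
        end = data.find(b'\n...')
        if end != -1:
            # Truncate at the marker and report EOF from now on
            self._done = True
            data = data[:end + len(b'\n...')]
        return data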

does asdf support virtual datasets?

I can't tell from reading the specification what transformations do, so I am asking here. Can I specify a dataset a that draws data from dataset b, but then applies some transformation to the data before returning it? For example, with b = [1, 2, 3] and a = b + 4, reading values from a would give [5, 6, 7].

The whole discussion of transformations seems tailored to the astronomy community, and is hard to follow if you don't know what WCS is.
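
To make the question concrete, here is the desired behavior sketched in plain Python/numpy (this is what is being asked for, not a feature asdf is known to provide):

import numpy as np

class VirtualDataset:
    """A lazy view that applies a transformation on read."""

    def __init__(self, base, transform):
        self.base = base
        self.transform = transform

    def __getitem__(self, idx):
        return self.transform(self.base[idx])

b = np.array([1, 2, 3])
a = VirtualDataset(b, lambda x: x + 4)
print(a[:])   # [5 6 7]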

Support non-memmappable types

Examples would be a packed bit format, or anything where the "view" is not exactly the same as the data.

We should try to do this right, if possible, and not repeat the mistakes of pyfits.

Inconsistent naming of read/write_to?

At the moment, the methods for reading and writing are read and write_to. These seem inconsistent, and I wonder whether it should be either read_from/write_to or read/write (the latter would be my preferred choice).

Change affine transform

affine.yaml requires a 3x3 matrix for the affine transformation. I think the idea was that this may go to higher dimensions in the future. Pyasdf splits this into matrix and translation in order to initialize modeling.AffineTransformation2D which uses the two quantities separately.

I was assuming pyasdf was writing the augmented matrix to disk, but it simply attaches the translation part as an additional row to the matrix. This is confusing.

array([[   0.92913257,   -0.36974676,  100.        ],
       [   0.36974676,    0.92913257,   20.        ],
       [   0.        ,    0.        ,    1.        ]])

vs

matrix: !core/ndarray
    data:
    - [0.92913257, -0.36974676, 0.0]
    - [0.36974676, 0.92913257, 0.0]
    - [100.0, 20.0, 0.0]
    datatype: float64
    shape: [3, 3]

In addition, modeling.AffineTransformation2D is currently the only option that can be used to apply the PC matrix in WCS transformations. However, I am only able to write a 3x3 matrix as an affine transformation, which is also confusing because in the WCS context it may suggest the data is 3-dimensional.

Any ideas how to resolve this? This is also related to astropy.modeling issue #3548, which would solve this problem if accepted.
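
For reference, the conventional augmented (homogeneous) form keeps the translation in the last column rather than in an extra row; in plain numpy, using the numbers above:

import numpy as np

matrix = np.array([[0.92913257, -0.36974676],
                   [0.36974676,  0.92913257]])
translation = np.array([100.0, 20.0])

# Augmented 2-D affine: translation in the last column, bottom row [0, 0, 1]
augmented = np.eye(3)
augmented[:2, :2] = matrix
augmented[:2, 2] = translation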

Diverged release version numbers on github and pypi

It looks like the package version numbering started to diverge between GitHub and PyPI: release 1.0.2 here appears to be the one uploaded to PyPI as 1.1 (some discussion on this in #190).

Since then, two release tags, 1.0.3 and 1.0.4, seem to continue the 1.1 line but were not uploaded to PyPI.

Change RTD domain name

Change all references of readthedocs.org to readthedocs.io. This is not urgent but should happen eventually.

HDF5

I suppose you considered using HDF5 but decided against it. I would be interested in knowing why you chose to design a new format rather than use HDF5. What were the limitations of HDF5 for your use cases? It might also be a good idea to put those reasons in the documentation, as I'm sure other people would be interested as well.

SystemError when running tests with numpy-1.11 b2

When running with the 1.11 b2 beta release of numpy (Debian unstable), I get a couple of errors like

____________________________ test_streams[tree0] ______________________________

tree = {'not_shared': array([10,  9,  8,  7,  6,  5,  4,  3,  2,  1], dtype=uint8),
     'science_data': array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.]), 'skipping

    def test_streams(tree):
        buff = io.BytesIO()

        def get_write_fd():
            return generic_io.OutputStream(buff)

        def get_read_fd():
            buff.seek(0)
            return generic_io.InputStream(buff, 'rw')

>       with _roundtrip(tree, get_write_fd, get_read_fd) as ff:

pyasdf/tests/test_generic_io.py:226: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pyasdf/tests/test_generic_io.py:59: in _roundtrip
    ff = asdf.AsdfFile.open(fd, **read_options)
pyasdf/asdf.py:533: in open
    do_not_fill_defaults=do_not_fill_defaults)
pyasdf/asdf.py:475: in _open_impl
    fd, past_magic=True, validate_checksums=validate_checksums)
pyasdf/block.py:243: in read_internal_blocks
    block = self._read_next_internal_block(fd, past_magic=past_magic)
pyasdf/block.py:211: in _read_next_internal_block
    validate_checksum=self._validate_checksums)
pyasdf/block.py:989: in read
    fd, self._size, self._data_size, self.compression)
pyasdf/block.py:1002: in _read_data
    return fd.read_into_array(used_size)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <pyasdf.generic_io.InputStream object at 0x7f4b413b5710>, size = 80

    def read_into_array(self, size):
        try:
            # See if Numpy can handle this as a real file first...
>           return np.fromfile(self._fd, np.uint8, size)
E           SystemError: error return without exception set

pyasdf/generic_io.py:870: SystemError

This happens for test_streams, test_urlopen, test_http_connection, test_exploded_http, test_seek_until_on_block_boundary, test_stream_to_stream, and test_array_to_stream.
With numpy-1.10 (Debian testing) the tests run fine.

I have no idea whether this is an issue for numpy or for pyasdf; I will forward it to numpy as well.

`write_to` is not thread-safe

This is because historically it changed the underlying file descriptor. As of #118 it no longer does, but it has to fake doing so because of underlying assumptions elsewhere. (This is not a regression: write_to was never thread-safe; with the new design it simply no longer has to be.) This should all be cleaned up, but at a fairly low priority.

general questions

@mdboom Two questions that came up while testing this with some real data.

  • I had a file written with a previous version of pyasdf and I can't read it with the latest version, although writing it out and reading it back with the current version works correctly. I understand this is still in development, but I'm wondering if there's a way to handle this in the future. I am not suggesting "Once ASDF always ASDF", but I suspect we'll have to handle this in some way eventually, so I'm raising it for consideration. One thing that could be done now is to write out the version of pyasdf that created the file.
  • The second question is about performance. I have 3 files: "dist.asdf" has a compound model consisting of a few polynomials, "foc2sky.asdf" is the typical WCS transformation consisting of linear transformation, tan deprojection and sky rotation, and "image.asdf" is the above two combined into one file. Here's the timing I get from reading them:
timeit f=AsdfFile.read('dist.asdf')
1 loops, best of 3: 422 ms per loop

timeit f=AsdfFile.read('foc2sky.asdf')
10 loops, best of 3: 54.9 ms per loop

timeit f=AsdfFile.read('image.asdf')
1 loops, best of 3: 8.99 s per loop

Why the big difference?

Add support for lzma filter

In addition to the supported zlib compression type, it would be useful to support other algorithms, such as lzma and bzip2, for lossless data compression. The interfaces of these modules in the Python standard library are similar, and lzma has advantages over zlib for some types of data. If these methods are not supported by the standard, it would be useful to allow user-defined filters to implement the compression.
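
The three standard-library codecs do share a one-shot compress/decompress interface, so dispatching on an algorithm name would be straightforward; a quick check:

import bz2
import lzma
import zlib

data = b'example payload ' * 1024

# zlib, bz2 and lzma all expose compress()/decompress() with the same shape
for codec in (zlib, bz2, lzma):
    assert codec.decompress(codec.compress(data)) == data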

Make the block index and streaming blocks play well together

There was discussion about this in #144.

Including a block index makes it impossible to add to the streaming block after the file is written. I think this is an important use case. However, maybe it also makes sense to allow "freezing" an ASDF file and appending a block index, after which the streaming block could not be updated.

As it stands, the block index and streaming blocks are explicitly disallowed to coexist.

Multiple ASDF extensions in single FITS file?

After some limited playing with fits_embed, I see it can write only one ASDF extension, with extname=ASDF. Is there a plan to allow more than one ASDF extension in a FITS file, with properly managed extname and extver?
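
For what it's worth, plain astropy.io.fits already distinguishes same-named extensions by EXTVER, so the addressing side seems feasible (a sketch of the mechanism only; fits_embed does not support this today, and a real ASDF extension would not hold float arrays):

import numpy as np
from astropy.io import fits

hdul = fits.HDUList([
    fits.PrimaryHDU(),
    fits.ImageHDU(data=np.zeros(4), name='ASDF', ver=1),
    fits.ImageHDU(data=np.ones(4), name='ASDF', ver=2),
])
hdul['ASDF', 2].data   # select by (EXTNAME, EXTVER)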

problem using `with` to read a file

@mdboom Should I not use with to open asdf files? Or am I doing something wrong?

ar = np.arange(36).reshape((6,6))
f = AsdfFile()
f.tree = {'regions': ar, 'reftype': 'regions'}
f.write_to('test.asdf')
with AsdfFile.open('test.asdf') as f:
    reg=f.tree['regions']
print reg
<array (unloaded) shape: [6, 6] dtype: int64>
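
A possible workaround, assuming the lazily-loaded array can be materialized with np.array while the file is still open:

import numpy as np
from pyasdf import AsdfFile

with AsdfFile.open('test.asdf') as f:
    # Copy the block into memory before the context manager closes the file
    reg = np.array(f.tree['regions'])
print(reg)   # now a regular, fully-loaded ndarray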

Context manager clarification

Just out of curiosity, why is the current syntax for the context manager:

ff = AsdfFile(tree)
with ff.write_to("example.asdf"):
    pass

Would it not make sense for it to be:

with AsdfFile(tree) as ff:
    ff.write_to("example.asdf")

instead? What is the purpose of using write_to as a context manager?

Sphinx warnings with sphinx 1.3.5

The sphinx build in master currently produces warnings like:

WARNING: Could not parse literal_block as "yaml". highlighting skipped.

These are issued from code like

.. runcode::

   from asdf import AsdfFile

   # Make the tree structure, and create a AsdfFile from it.
   tree = {'hello': 'world'}
   ff = AsdfFile(tree)
   ff.write_to("test.asdf")

   # You can also make the AsdfFile first, and modify its tree directly:
   ff = AsdfFile()
   ff.tree['hello'] = 'world'
   ff.write_to("test.asdf")

.. asdf:: test.asdf

This is discussed a bit in #190 in this comment.

Support ``__asdf__`` interface?

While playing around with storing Astropy objects in ASDF (e.g. astropy/astropy#3733), I was wondering how we might best make it easy for developers to make their kinds of objects storable in ASDF. One option would be to check whether objects have an __asdf__ method that, when called, returns a valid ASDF structure (with all the correct metadata) that can be included in a file.
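
A sketch of what opting in might look like on the user side (the __asdf__ protocol is only a proposal; the class and tree layout are made up for illustration):

class SourceCatalog:
    """Hypothetical user class that opts in to ASDF serialization."""

    def __init__(self, names, fluxes):
        self.names = names
        self.fluxes = fluxes

    def __asdf__(self):
        # Return a tree of ASDF-serializable primitives, including
        # whatever metadata the format requires
        return {'names': list(self.names), 'fluxes': list(self.fluxes)}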

astropy dependency

Can astropy be an optional dependency? Communities outside astronomy might be interested in using the file format.

Scope of ASDF beyond Astronomy?

At the moment, the tagline for the repo is ASDF (Advanced Scientific Data Format) is a next generation interchange format for astronomical data - I wonder if it would be worth making it sound like it would also be useful to other fields, e.g. being developed for astronomical and other scientific data?

transform.name does not roundtrip

This demonstrates the problem:

rot=models.Rotation2D(23, name='rotation')
fa=AsdfFile()
fa.tree={'r': rot}
fa.write_to('rot.asdf')
frot=AsdfFile.read('rot.asdf')
frot.tree['r'].name is None
Out[38]: True

name is already part of the basic transform schema. Is there a way to handle this in the general transform to_tree method instead of adding it to every subclass of TransformType?
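
A sketch of the idea, handling name once in a shared base-class hook (illustrative only, not the actual pyasdf internals):

class TransformType:
    @classmethod
    def to_tree(cls, model, ctx):
        # Subclasses serialize their transform-specific fields...
        node = cls.to_tree_transform(model, ctx)
        # ...while generic attributes like `name` are handled once, here
        if getattr(model, 'name', None) is not None:
            node['name'] = model.name
        return node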

AsdfInFits API enhancements

Two possible enhancements I envision to the API when working with FITS-embedded-ASDF:

  1. AsdfInFits.open currently just accepts an existing HDUList object as its first argument. This means that when reading from a FITS file on disk one has to:
from pyasdf.fits_embed import AsdfInFits
from astropy.io import fits
asdf = AsdfInFits.open(fits.open('filename.fits'))

The two open calls are silly; AsdfInFits.open could easily accept any filename or other object accepted by fits.open.

  2. Relatedly, I think it should be possible to read ASDF directly from a FITS file with pyasdf.open. It's easy to detect that the input is a FITS file (instead of a true ASDF file), and just as easy to detect that it's using the ASDF-embedded-in-FITS convention (which I think should be part of the ASDF Standard if it isn't already, albeit maybe in an appendix, since it's really more of a usage convention than part of ASDF itself). See the sketch below.
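
The desired usage under both proposals might look like this (hypothetical; neither call works today):

import pyasdf
from pyasdf import fits_embed

af = fits_embed.AsdfInFits.open('filename.fits')   # proposal 1: accept paths
af = pyasdf.open('filename.fits')                  # proposal 2: auto-detect FITS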

JSON Schema validation performance

JSON schema validation currently takes 60% of load time on a benchmark with 10000 arrays.

Unlike the YAML parsing, where there was a lot of low-hanging fruit, things are tricky in JSON schema: it's hard to see how to improve the performance of jsonschema without obliterating its really clean architecture.

Relatedly, I experimented with adding a flag to turn off JSON schema validation. The problem is that many of the type converters then become brittle in interesting ways, because they don't do the validation that the JSON schema currently does for them. Duplicating that work seems like a way to only make things slower, so I'm not sure what to do there.

Feature request: auto-close files when using context managers

In the following example:

import numpy as np
from pyasdf import AsdfFile

tree = {'test': np.array([1,2,3])}

f = AsdfFile(tree)
f.set_array_storage(tree['test'], 'inline')
f.write_to('data.asdf')

for i in range(1000):
    with AsdfFile.read('data.asdf') as f2:
        np.sum(f2.tree['test'])

I am running into:

OSError: [Errno 24] Too many open files

It would be nice if read could work as a normal context manager and auto-close the file.
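
Until then, a workaround sketch, assuming AsdfFile exposes a close() method:

import numpy as np
from contextlib import closing
from pyasdf import AsdfFile

for i in range(1000):
    # closing() guarantees close() even though read() is not yet
    # a true context manager
    with closing(AsdfFile.read('data.asdf')) as f2:
        np.sum(f2.tree['test'])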

YAML formatting issues

This is a placeholder to remind to deal with some issues with how the YAML is output.

Since YAML has multiple ways to represent the same thing, there are cases where it might be preferable to use one form over another. Currently, pyasdf does "whatever PyYAML does by default".

There are (at least) three separate things to consider here:

  • When reading an input file, preserving the form of each of the input entries when writing back out
  • When generating a file from scratch, using hints in the schema to select an output form
  • Allowing the user to explicitly specify the form of the output on an individual item basis
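
As an example of the latitude PyYAML offers, the same mapping can be dumped in flow or block style:

import yaml

node = {'datatype': 'float64', 'shape': [3, 3]}
print(yaml.dump(node, default_flow_style=True))
# {datatype: float64, shape: [3, 3]}
print(yaml.dump(node, default_flow_style=False))
# datatype: float64
# shape:
# - 3
# - 3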

Failed test in transform schema

One of the schema tests fails with a KeyError. This is the end of the traceback output from python setup.py test:

cls = <class 'pyasdf.tags.transform.projections.Rotate3DType'>
node = {'phi': 12.3, 'psi': -1.2, 'theta': 34}
ctx = <pyasdf.asdf.AsdfFile object at 0x7f7e2f75d390>

    @classmethod
    def from_tree_transform(cls, node, ctx):
        print(node)
>       if node['direction'] == 'native2celestial':
E       KeyError: 'direction'

pyasdf/tags/transform/projections.py:83: KeyError
From file: rotate3d.yaml
=================== 1 failed, 222 passed in 16.35 seconds ====================
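
One possible direction for a fix: fall back to a default when the key is absent, assuming 'native2celestial' is the schema default for direction (an assumption, not verified against rotate3d.yaml):

node = {'phi': 12.3, 'psi': -1.2, 'theta': 34}   # from the failing test
direction = node.get('direction', 'native2celestial')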

windows testing?

I am trying to set this up on Windows and get the error below. Has this been run on Windows? It's possible I did not install it correctly.


In [4]: f=AsdfFile()
---------------------------------------------------------------------------
IOError                                   Traceback (most recent call last)
<ipython-input-4-f4d43ec31023> in <module>()
----> 1 f=AsdfFile()

C:\Anaconda\envs\gwcs\lib\site-packages\pyasdf-0.0.dev0-py2.7.egg\pyasdf\asdf.pyc in __init__(self, tree, uri, extensions)
     58         self._blocks = block.BlockManager(self)
     59         if tree is None:
---> 60             self.tree = {}
     61             self._uri = uri
     62         elif isinstance(tree, AsdfFile):

C:\Anaconda\envs\gwcs\lib\site-packages\pyasdf-0.0.dev0-py2.7.egg\pyasdf\asdf.pyc in tree(self, tree)
    181         tagged_tree = yamlutil.custom_tree_to_tagged_tree(
    182             AsdfObject(tree), self)
--> 183         schema.validate(tagged_tree, self)
    184         self._tree = AsdfObject(tree)
    185

C:\Anaconda\envs\gwcs\lib\site-packages\pyasdf-0.0.dev0-py2.7.egg\pyasdf\schema.pyc in validate(instance, ctx, *args, **kwargs)
    281     # test suite!!!).  Instead, we assume that the schemas are valid
    282     # through the running of the unit tests, not at run time.
--> 283     cls = _create_validator()
    284     validator = cls({}, *args, **kwargs)
    285     validator.ctx = ctx

C:\Anaconda\envs\gwcs\lib\site-packages\pyasdf-0.0.dev0-py2.7.egg\pyasdf\schema.pyc in _create_validator()
    156         meta_schema=load_schema(
    157             'http://stsci.edu/schemas/yaml-schema/draft-01',
--> 158             mresolver.default_url_mapping),
    159         validators=YAML_VALIDATORS)
    160     validator.orig_iter_errors = validator.iter_errors

C:\Anaconda\envs\gwcs\lib\site-packages\pyasdf-0.0.dev0-py2.7.egg\pyasdf\schema.pyc in load_schema(url, resolver)
    245         resolver = mresolver.default_url_mapping
    246     loader = _make_schema_loader(resolver)
--> 247     return loader(url)
    248
    249

C:\Anaconda\envs\gwcs\lib\site-packages\pyasdf-0.0.dev0-py2.7.egg\pyasdf\schema.pyc in load_schema(url)
    223     def load_schema(url):
    224         url = resolver(url)
--> 225         return _load_schema(url)
    226     return load_schema
    227

C:\Anaconda\envs\gwcs\lib\site-packages\pyasdf-0.0.dev0-py2.7.egg\pyasdf\compat\functools_backport.pyc in wrapper(*args, **kwds)
    115                         stats[HITS] += 1
    116                         return result
--> 117                 result = user_function(*args, **kwds)
    118                 with lock:
    119                     root, = nonlocal_root

C:\Anaconda\envs\gwcs\lib\site-packages\pyasdf-0.0.dev0-py2.7.egg\pyasdf\schema.pyc in _load_schema(url)
    212 @lru_cache()
    213 def _load_schema(url):
--> 214     with generic_io.get_file(url) as fd:
    215         if isinstance(url, six.text_type) and url.endswith('json'):
    216             result = json.load(fd)

C:\Anaconda\envs\gwcs\lib\site-packages\pyasdf-0.0.dev0-py2.7.egg\pyasdf\generic_io.pyc in get_file(init, mode, uri)
   1014                 realmode = mode + 'b'
   1015             return RealFile(
-> 1016                 open(parsed.path, realmode), mode, close=True,
   1017                 uri=uri or parsed.path)
   1018

IOError: [Errno 2] No such file or directory: u'/stsci.edu/yaml-schema/draft-01.yaml'

add support for custom extensions in `helpers` functions

I'd like to use the pyasdf testing infrastructure with custom extensions, specifically to test roundtripping.
I've looked into adding the extensions keyword to the assert_roundtrip_tree function (for the calls to AsdfFile() and AsdfFile.open), but this doesn't seem to be sufficient, as the type_index is not updated with the custom types.
Would it be easy to add this functionality?
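
The desired call might look like this (the extensions keyword is hypothetical, and MyCustomExtension, tree, and tmpdir are placeholders):

from pyasdf.tests.helpers import assert_roundtrip_tree

# Hypothetical: the helper would also need to rebuild the type_index
# from the custom types, not just pass extensions through
assert_roundtrip_tree(tree, tmpdir, extensions=[MyCustomExtension()])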

Bug when reading masked array stored inline

The following code:

from numpy import ma
from pyasdf import AsdfFile

tree = {'test': ma.array([1,2,3], mask=[0,1,0])}

f = AsdfFile(tree)
f.set_array_storage(tree['test'], 'inline')
f.write_to('masked.asdf')

f2 = AsdfFile.read('masked.asdf')

triggers the following exception:

Traceback (most recent call last):
  File "buggy.py", line 10, in <module>
    f2 = AsdfFile.read('masked.asdf')
  File "/Users/tom/miniconda3/envs/production/lib/python3.4/site-packages/pyasdf-0.0.dev426-py3.4.egg/pyasdf/asdf.py", line 392, in read
    yaml_content, self, do_not_fill_defaults=do_not_fill_defaults)
  File "/Users/tom/miniconda3/envs/production/lib/python3.4/site-packages/pyasdf-0.0.dev426-py3.4.egg/pyasdf/yamlutil.py", line 269, in load_tree
    tree = tagged_tree_to_custom_tree(tree, ctx)
  File "/Users/tom/miniconda3/envs/production/lib/python3.4/site-packages/pyasdf-0.0.dev426-py3.4.egg/pyasdf/yamlutil.py", line 245, in tagged_tree_to_custom_tree
    return treeutil.walk_and_modify(tree, walker)
  File "/Users/tom/miniconda3/envs/production/lib/python3.4/site-packages/pyasdf-0.0.dev426-py3.4.egg/pyasdf/treeutil.py", line 99, in walk_and_modify
    return recurse(top, set())
  File "/Users/tom/miniconda3/envs/production/lib/python3.4/site-packages/pyasdf-0.0.dev426-py3.4.egg/pyasdf/treeutil.py", line 84, in recurse
    result[key] = recurse(val, new_seen)
  File "/Users/tom/miniconda3/envs/production/lib/python3.4/site-packages/pyasdf-0.0.dev426-py3.4.egg/pyasdf/treeutil.py", line 95, in recurse
    result = callback(result)
  File "/Users/tom/miniconda3/envs/production/lib/python3.4/site-packages/pyasdf-0.0.dev426-py3.4.egg/pyasdf/yamlutil.py", line 242, in walker
    return tag_type.from_tree_tagged(node, ctx)
  File "/Users/tom/miniconda3/envs/production/lib/python3.4/site-packages/pyasdf-0.0.dev426-py3.4.egg/pyasdf/asdftypes.py", line 180, in from_tree_tagged
    return cls.from_tree(tree.data, ctx)
  File "/Users/tom/miniconda3/envs/production/lib/python3.4/site-packages/pyasdf-0.0.dev426-py3.4.egg/pyasdf/stream.py", line 45, in from_tree
    return ndarray.NDArrayType.from_tree(data, ctx)
  File "/Users/tom/miniconda3/envs/production/lib/python3.4/site-packages/pyasdf-0.0.dev426-py3.4.egg/pyasdf/tags/core/ndarray.py", line 343, in from_tree
    return cls(source, shape, dtype, offset, strides, 'C', mask, ctx)
  File "/Users/tom/miniconda3/envs/production/lib/python3.4/site-packages/pyasdf-0.0.dev426-py3.4.egg/pyasdf/tags/core/ndarray.py", line 200, in __init__
    self._array = inline_data_asarray(source, dtype)
  File "/Users/tom/miniconda3/envs/production/lib/python3.4/site-packages/pyasdf-0.0.dev426-py3.4.egg/pyasdf/tags/core/ndarray.py", line 162, in inline_data_asarray
    return np.asarray(inline, dtype=dtype)
  File "/Users/tom/miniconda3/envs/production/lib/python3.4/site-packages/numpy/core/numeric.py", line 462, in asarray
    return array(a, dtype, copy=False, order=order)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
