swiftspec's People

Contributors

d70-t, observingclouds

swiftspec's Issues

Create containers with `mkdir`

It would be nice to have the mkdir method at least for the lowest level, the container, because containers do not seem to be created on the fly when a mapper is used. If the container testtzis does not exist in the following account, the mapper runs into an error:

import fsspec

a = fsspec.get_mapper("swift://swift.dkrz.de/dkrz_4236b71e-04df-456b-8a32-5d66641510f2/testtzis/testcmip6")
a[".test"] = b"ho"

404, message='Not Found', url=URL('https://swift.dkrz.de/v1/dkrz_4236b71e-04df-456b-8a32-5d66641510f2/testtzis/testcmip6/.test')

Do you think that is possible?
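
One possible shape for such a mkdir, as a sketch only: the Swift API creates a container with a PUT request on /v1/{account}/{container}. The helper names `container_url` and `create_container` are hypothetical and not part of swiftspec, and the token is assumed to come from the `OS_AUTH_TOKEN` environment variable.

```python
import os
import urllib.request
from urllib.parse import urlsplit


def container_url(swift_url: str) -> str:
    """Map swift://host/account/container[/object...] to the container's HTTP URL."""
    parts = urlsplit(swift_url)
    if parts.scheme != "swift":
        raise ValueError(f"not a swift:// URL: {swift_url!r}")
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 2:
        raise ValueError("URL must contain at least an account and a container")
    account, container = segments[0], segments[1]
    return f"https://{parts.netloc}/v1/{account}/{container}"


def create_container(swift_url: str, token: str = "") -> int:
    """Send the PUT request that creates the container and return the HTTP status.

    Hypothetical helper: assumes token authentication via X-Auth-Token.
    """
    token = token or os.environ.get("OS_AUTH_TOKEN", "")
    request = urllib.request.Request(
        container_url(swift_url), method="PUT", headers={"X-Auth-Token": token}
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling `create_container(...)` once before writing through the mapper would avoid the 404 above, provided the token is allowed to create containers in that account.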

Issue when using intake catalog with simplecache

I'm trying to access a netCDF resource via intake and simplecache as follows:

from intake.catalog.local import LocalCatalogEntry
from intake import Catalog
mycat = Catalog.from_dict({
    'testcat': LocalCatalogEntry(
        'testfile',
        'some showcase testfile',
        driver='netcdf',
        args={
            'urlpath': 'simplecache::swift://swift.dkrz.de/dkrz_948e7d4bbfbb445fbff5315fc433e36a/TEST/test01.nc',
            'xarray_kwargs': {'engine': 'h5netcdf'},
        },
    ),
})
p=mycat.testcat.to_dask()

Expectation
The file is cached via simplecache and returned

Issue
I get a NotImplementedError:

NotImplementedError                       Traceback (most recent call last)
Input In [4], in <cell line: 1>()
----> 1 p=mycat.testcat.to_dask()

File ~/.local/lib/python3.9/site-packages/intake_xarray/base.py:69, in DataSourceMixin.to_dask(self)
     67 def to_dask(self):
     68     """Return xarray object where variables are dask arrays"""
---> 69     return self.read_chunked()

File ~/.local/lib/python3.9/site-packages/intake_xarray/base.py:44, in DataSourceMixin.read_chunked(self)
     42 def read_chunked(self):
     43     """Return xarray object (which will have chunks)"""
---> 44     self._load_metadata()
     45     return self._ds

File /work/mh0010/m300408/envs/covariability/lib/python3.9/site-packages/intake/source/base.py:285, in DataSourceBase._load_metadata(self)
    283 """load metadata only if needed"""
    284 if self._schema is None:
--> 285     self._schema = self._get_schema()
    286     self.dtype = self._schema.dtype
    287     self.shape = self._schema.shape

File ~/.local/lib/python3.9/site-packages/intake_xarray/base.py:18, in DataSourceMixin._get_schema(self)
     15 self.urlpath = self._get_cache(self.urlpath)[0]
     17 if self._ds is None:
---> 18     self._open_dataset()
     20     metadata = {
     21         'dims': dict(self._ds.dims),
     22         'data_vars': {k: list(self._ds[k].coords)
     23                       for k in self._ds.data_vars.keys()},
     24         'coords': tuple(self._ds.coords.keys()),
     25     }
     26     if getattr(self, 'on_server', False):

File ~/.local/lib/python3.9/site-packages/intake_xarray/netcdf.py:87, in NetCDFSource._open_dataset(self)
     84     _open_dataset = xr.open_dataset
     86 if self._can_be_local:
---> 87     url = fsspec.open_local(self.urlpath, **self.storage_options)
     88 else:
     89     # https://github.com/intake/filesystem_spec/issues/476#issuecomment-732372918
     90     url = fsspec.open(self.urlpath, **self.storage_options).open()

File /work/mh0010/m300408/envs/covariability/lib/python3.9/site-packages/fsspec/core.py:454, in open_local(url, mode, **storage_options)
    449 if not getattr(of[0].fs, "local_file", False):
    450     raise ValueError(
    451         "open_local can only be used on a filesystem which"
    452         " has attribute local_file=True"
    453     )
--> 454 with of as files:
    455     paths = [f.name for f in files]
    456 if isinstance(url, str) and not has_magic(url):

File /work/mh0010/m300408/envs/covariability/lib/python3.9/site-packages/fsspec/core.py:171, in OpenFiles.__enter__(self)
    168 while True:
    169     if hasattr(fs, "open_many"):
    170         # check for concurrent cache download; or set up for upload
--> 171         self.files = fs.open_many(self)
    172         return self.files
    173     if hasattr(fs, "fs") and fs.fs is not None:

File /work/mh0010/m300408/envs/covariability/lib/python3.9/site-packages/fsspec/implementations/cached.py:444, in CachingFileSystem.__getattribute__.<locals>.<lambda>(*args, **kw)
    408 def __getattribute__(self, item):
    409     if item in [
    410         "load_cache",
    411         "_open",
   (...)
    442         # all the methods defined in this class. Note `open` here, since
    443         # it calls `_open`, but is actually in superclass
--> 444         return lambda *args, **kw: getattr(type(self), item).__get__(self)(
    445             *args, **kw
    446         )
    447     if item in ["__reduce_ex__"]:
    448         raise AttributeError

File /work/mh0010/m300408/envs/covariability/lib/python3.9/site-packages/fsspec/implementations/cached.py:552, in WholeFileCacheFileSystem.open_many(self, open_files)
    549 downfn = [fn for fn, d in zip(downfn0, details) if not d]
    550 if downpath:
    551     # skip if all files are already cached and up to date
--> 552     self.fs.get(downpath, downfn)
    554     # update metadata - only happens when downloads are successful
    555     newdetail = [
    556         {
    557             "original": path,
   (...)
    563         for path in downpath
    564     ]

File /work/mh0010/m300408/envs/covariability/lib/python3.9/site-packages/fsspec/asyn.py:114, in sync_wrapper.<locals>.wrapper(*args, **kwargs)
    111 @functools.wraps(func)
    112 def wrapper(*args, **kwargs):
    113     self = obj or args[0]
--> 114     return sync(self.loop, func, *args, **kwargs)

File /work/mh0010/m300408/envs/covariability/lib/python3.9/site-packages/fsspec/asyn.py:99, in sync(loop, func, timeout, *args, **kwargs)
     97     raise FSTimeoutError from return_result
     98 elif isinstance(return_result, BaseException):
---> 99     raise return_result
    100 else:
    101     return return_result

File /work/mh0010/m300408/envs/covariability/lib/python3.9/site-packages/fsspec/asyn.py:54, in _runner(event, coro, result, timeout)
     52     coro = asyncio.wait_for(coro, timeout=timeout)
     53 try:
---> 54     result[0] = await coro
     55 except Exception as ex:
     56     result[0] = ex

File /work/mh0010/m300408/envs/covariability/lib/python3.9/site-packages/fsspec/asyn.py:541, in AsyncFileSystem._get(self, rpath, lpath, recursive, callback, **kwargs)
    539     callback.branch(rpath, lpath, kwargs)
    540     coros.append(self._get_file(rpath, lpath, **kwargs))
--> 541 return await _run_coros_in_chunks(
    542     coros, batch_size=batch_size, callback=callback
    543 )

File /work/mh0010/m300408/envs/covariability/lib/python3.9/site-packages/fsspec/asyn.py:249, in _run_coros_in_chunks(coros, batch_size, callback, timeout, return_exceptions, nofiles)
    243     if callback is not _DEFAULT_CALLBACK:
    244         [
    245             t.add_done_callback(lambda *_, **__: callback.relative_update(1))
    246             for t in chunk
    247         ]
    248     results.extend(
--> 249         await asyncio.gather(*chunk, return_exceptions=return_exceptions),
    250     )
    251 return results

File /work/mh0010/m300408/envs/covariability/lib/python3.9/asyncio/tasks.py:442, in wait_for(fut, timeout, loop)
    437     warnings.warn("The loop argument is deprecated since Python 3.8, "
    438                   "and scheduled for removal in Python 3.10.",
    439                   DeprecationWarning, stacklevel=2)
    441 if timeout is None:
--> 442     return await fut
    444 if timeout <= 0:
    445     fut = ensure_future(fut, loop=loop)

File /work/mh0010/m300408/envs/covariability/lib/python3.9/site-packages/fsspec/asyn.py:508, in AsyncFileSystem._get_file(self, rpath, lpath, **kwargs)
    507 async def _get_file(self, rpath, lpath, **kwargs):
--> 508     raise NotImplementedError

NotImplementedError: 

Towards fixing the issue
The _get_file method is not implemented in swiftspec, so the call falls through to the default AsyncFileSystem._get_file in fsspec, which raises NotImplementedError.
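
A minimal sketch of how the missing method could be filled in, assuming the filesystem already provides a working `_cat_file` coroutine. The mixin `GetFileViaCat` and the stand-in `InMemoryFS` are hypothetical names used only for illustration; this is not swiftspec's actual code, and it reads the whole object into memory instead of streaming it.

```python
import asyncio
import os
import tempfile


class GetFileViaCat:
    """Mixin sketch: implement _get_file on top of an existing _cat_file coroutine."""

    async def _get_file(self, rpath, lpath, **kwargs):
        if os.path.isdir(lpath):
            # Nothing to write when the local target is a directory.
            return
        data = await self._cat_file(rpath)  # whole object is held in memory
        with open(lpath, "wb") as f:
            f.write(data)


class InMemoryFS(GetFileViaCat):
    """Stand-in for the real filesystem, used only to exercise the mixin."""

    def __init__(self, objects):
        self.objects = objects

    async def _cat_file(self, path, **kwargs):
        return self.objects[path]


def demo():
    """Download one fake object to a temporary file and return its bytes."""
    fs = InMemoryFS({"TEST/test01.nc": b"netcdf-bytes"})
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "test01.nc")
        asyncio.run(fs._get_file("TEST/test01.nc", target))
        with open(target, "rb") as f:
            return f.read()
```

For large netCDF files a streaming variant that writes chunks as they arrive would be preferable; the version here only shows where the hook sits.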

Implement large objects

Standard objects on Swift cannot be larger than 5 GB. Static large objects circumvent this limitation by splitting a large object into segments and reassembling them on the server side using a manifest object.
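
As a rough sketch of the client side of this scheme: each segment is uploaded as an ordinary object, and a JSON manifest listing the segments (path, MD5 etag, size) is then sent with a PUT to the target object using the `multipart-manifest=put` query parameter. The function `build_slo_manifest` and the segment naming scheme below are assumptions for illustration, not existing swiftspec API.

```python
import hashlib
import json


def build_slo_manifest(data: bytes, segment_size: int, container: str, name: str):
    """Split data into segments and describe them in an SLO-style manifest.

    Returns (segments, manifest_json), where segments maps each segment path
    to its bytes. The path layout /container/name/00000000 is hypothetical.
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    segments = {}
    manifest = []
    for index, start in enumerate(range(0, len(data), segment_size)):
        chunk = data[start:start + segment_size]
        path = f"/{container}/{name}/{index:08d}"
        segments[path] = chunk
        manifest.append({
            "path": path,
            "etag": hashlib.md5(chunk).hexdigest(),
            "size_bytes": len(chunk),
        })
    return segments, json.dumps(manifest)
```

In practice `segment_size` would be chosen well below the 5 GB limit, and the segments are often kept in a separate container so that they do not clutter listings.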

Access via temporary URL

Maybe it is possible to access data via the API with only a temporary URL. It seems that the temporary URL and its expiration date have to be provided in the header of a request. I will have a deeper look.
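
For orientation, a sketch of how a temporary URL is typically built with OpenStack Swift's TempURL middleware: the signature is an HMAC over the method, the expiry timestamp, and the object path, and both signature and expiry travel as query parameters (`temp_url_sig`, `temp_url_expires`) rather than headers. The function name `make_temp_url`, the host, and the key are placeholders; SHA-1 is the classic digest, and some deployments may require SHA-256 or SHA-512 instead.

```python
import hmac
import time
from hashlib import sha1


def make_temp_url(host, path, key, method="GET", expires=None, ttl=3600):
    """Return a signed temporary URL for `path` (e.g. /v1/account/container/object).

    `key` is the secret temp-URL key configured on the account or container.
    """
    if expires is None:
        expires = int(time.time()) + ttl
    body = f"{method}\n{expires}\n{path}"
    signature = hmac.new(key.encode(), body.encode(), sha1).hexdigest()
    return f"https://{host}{path}?temp_url_sig={signature}&temp_url_expires={expires}"
```

A URL built this way can be fetched with a plain HTTP GET and no auth token, which is what would make token-free read access through fsspec conceivable.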

Account in URL

Hello,

According to the docs, here is how to access a file in swift:

import fsspec

with fsspec.open("swift://server/account/container/object.txt", "r") as f:
    print(f.read())

Apologies for asking the obvious, but does account in the string above refer to a personal user account for the Swift endpoint?

Do you still need account if you configure the environment variables OS_STORAGE_URL and OS_AUTH_TOKEN for authentication?

Many thanks,
Sebastian
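
Judging from the request URLs in the tracebacks on this page, swift://server/account/... is sent to https://server/v1/account/..., so account appears to be the account segment of the storage URL rather than a login name. Under that assumption, the prefix can be derived from `OS_STORAGE_URL` as sketched below; `swift_prefix` is a hypothetical helper, and the /v1/ layout is assumed rather than guaranteed for every deployment.

```python
from urllib.parse import urlsplit


def swift_prefix(storage_url: str) -> str:
    """Turn https://host/v1/account into swift://host/account."""
    parts = urlsplit(storage_url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 2 or segments[0] != "v1":
        raise ValueError(f"unexpected storage URL layout: {storage_url!r}")
    return f"swift://{parts.netloc}/{segments[1]}"
```

With the storage URL exported, `swift_prefix(os.environ["OS_STORAGE_URL"]) + "/container/object.txt"` would give the string to pass to `fsspec.open`.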

Test for existing directory

Currently, the exists method does not work as expected on directories. Swift has no real directories, but it would be great if the following worked:

Failing minimal example

import fsspec
fs = fsspec.filesystem("swift")
fs.lexists("swift://swift.dkrz.de/dkrz_948e7d4bbfbb445fbff5315fc433e36a/fsspec_test/bug01.zarr")

returns False although I expect it to return True.

At container level and object level, the function returns the expected result:

fs.lexists("swift://swift.dkrz.de/dkrz_948e7d4bbfbb445fbff5315fc433e36a/fsspec_test")
#True

fs.lexists("swift://swift.dkrz.de/dkrz_948e7d4bbfbb445fbff5315fc433e36a/fsspec_test/bug01.zarr/.zattrs")
#True
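
One way the expected behaviour could be defined, as a sketch: since Swift stores only objects, a "directory" can be said to exist when at least one object name starts with the path followed by a slash. The function `pseudo_dir_exists` is hypothetical and works on a plain list of object names; against a real server the same check could be done with a container listing restricted by the `prefix` and `limit` query parameters.

```python
def pseudo_dir_exists(object_names, path):
    """True if `path` is an object itself or a prefix of at least one object."""
    prefix = path.rstrip("/") + "/"
    return any(name == path or name.startswith(prefix) for name in object_names)
```

The trailing slash in the prefix matters: without it, a path such as bug01 would wrongly match bug01.zarr/.zattrs.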

Issue when removing files recursively

Description
When trying to delete a directory and its files, the removal succeeds but throws a ClientResponseError.

Minimal example

# Creating directory with files
import xarray as xr
ds = xr.Dataset({"a":(("time"), [10,20,30,40])})
ds.to_zarr("swift://swift.dkrz.de/dkrz_948e7d4bbfbb445fbff5315fc433e36a/TEST/out.zarr")

# Removing files
import fsspec
fs = fsspec.filesystem('swift')
fs.rm("swift://swift.dkrz.de/dkrz_948e7d4bbfbb445fbff5315fc433e36a/TEST/out.zarr", recursive=True)

This leads to:

---------------------------------------------------------------------------
ClientResponseError                       Traceback (most recent call last)
Input In [1], in <cell line: 8>()
      6 import fsspec
      7 fs = fsspec.filesystem('swift')
----> 8 fs.rm("swift://swift.dkrz.de/dkrz_948e7d4bbfbb445fbff5315fc433e36a/TEST/out.zarr", recursive=True)

File XXXXXXXXXXXXXXXX/lib/python3.9/site-packages/fsspec/asyn.py:85, in sync_wrapper.<locals>.wrapper(*args, **kwargs)
     82 @functools.wraps(func)
     83 def wrapper(*args, **kwargs):
     84     self = obj or args[0]
---> 85     return sync(self.loop, func, *args, **kwargs)

File XXXXXXXXXXXXXXXX/lib/python3.9/site-packages/fsspec/asyn.py:65, in sync(loop, func, timeout, *args, **kwargs)
     63     raise FSTimeoutError from return_result
     64 elif isinstance(return_result, BaseException):
---> 65     raise return_result
     66 else:
     67     return return_result

File XXXXXXXXXXXXXXXX/lib/python3.9/site-packages/fsspec/asyn.py:25, in _runner(event, coro, result, timeout)
     23     coro = asyncio.wait_for(coro, timeout=timeout)
     24 try:
---> 25     result[0] = await coro
     26 except Exception as ex:
     27     result[0] = ex

File XXXXXXXXXXXXXXXX/lib/python3.9/site-packages/fsspec/asyn.py:309, in AsyncFileSystem._rm(self, path, recursive, batch_size, **kwargs)
    307 batch_size = batch_size or self.batch_size
    308 path = await self._expand_path(path, recursive=recursive)
--> 309 return await _run_coros_in_chunks(
    310     [self._rm_file(p, **kwargs) for p in path],
    311     batch_size=batch_size,
    312     nofiles=True,
    313 )

File XXXXXXXXXXXXXXXX/lib/python3.9/site-packages/fsspec/asyn.py:241, in _run_coros_in_chunks(coros, batch_size, callback, timeout, return_exceptions, nofiles)
    235     if callback is not _DEFAULT_CALLBACK:
    236         [
    237             t.add_done_callback(lambda *_, **__: callback.relative_update(1))
    238             for t in chunk
    239         ]
    240     results.extend(
--> 241         await asyncio.gather(*chunk, return_exceptions=return_exceptions),
    242     )
    243 return results

File XXXXXXXXXXXXXXXX/lib/python3.9/asyncio/tasks.py:442, in wait_for(fut, timeout, loop)
    437     warnings.warn("The loop argument is deprecated since Python 3.8, "
    438                   "and scheduled for removal in Python 3.10.",
    439                   DeprecationWarning, stacklevel=2)
    441 if timeout is None:
--> 442     return await fut
    444 if timeout <= 0:
    445     fut = ensure_future(fut, loop=loop)

File ~/.local/lib/python3.9/site-packages/swiftspec/core.py:270, in SWIFTFileSystem._rm_file(self, path, **kwargs)
    268 session = await self.set_session()
    269 async with session.delete(ref.http_url, headers=headers) as res:
--> 270     res.raise_for_status()

File XXXXXXXXXXXXXXXX/lib/python3.9/site-packages/aiohttp/client_reqrep.py:1004, in ClientResponse.raise_for_status(self)
   1002 assert self.reason is not None
   1003 self.release()
-> 1004 raise ClientResponseError(
   1005     self.request_info,
   1006     self.history,
   1007     status=self.status,
   1008     message=self.reason,
   1009     headers=self.headers,
   1010 )

ClientResponseError: 404, message='Not Found', url=URL('https://swift.dkrz.de/v1/dkrz_948e7d4bbfbb445fbff5315fc433e36a/TEST/out.zarr/a')

Expected behaviour
Success without error. On the same note, I would expect

fs.rmdir("swift://swift.dkrz.de/dkrz_948e7d4bbfbb445fbff5315fc433e36a/TEST/out.zarr")

to work. This call returns successfully without throwing an error, but does not delete anything.

Cheers,
Hauke
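
The 404 is raised for .../out.zarr/a, which looks like a pseudo-directory rather than a stored object, so a plausible cause is that the expanded path list contains directory entries that cannot be deleted individually. A sketch of one way to avoid that, assuming listing entries shaped like fsspec info dictionaries with "name" and "type" keys; `deletable_paths` is a hypothetical helper, not existing swiftspec code.

```python
def deletable_paths(entries):
    """Pick real objects out of an expanded listing, deepest paths first."""
    files = [e["name"] for e in entries if e.get("type") == "file"]
    return sorted(files, key=lambda name: name.count("/"), reverse=True)
```

Only the returned paths would then be passed on to the per-object DELETE requests; the pseudo-directories disappear on their own once no object carries their prefix.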
