Comments (8)
Until #57 is ready, ls on the bucket fetches the full listing of all keys and their details (98k files), which might take longer than some FUSE timeout.
@asford, are you planning on working on file listing?
from gcsfs.
I am adding more logging in #67, along with any further improvements. In this case, I don't think it would tell you more than that the directory listing had started.
#57 includes optimization of the directory listing codepath, so the ls time should be proportional to the number of files in the directory.
@mrocklin I've rebased that pull against the release revision. Would it be possible for you to quickly test git+https://github.com/asford/gcsfs@per_dir_cache in that install to see if it resolves your issue?
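To illustrate why a per-directory listing can scale with the directory rather than the bucket, here is a minimal local sketch of prefix-plus-delimiter listing over a flat key space. It is not the code in #57; the function name `list_dir` and the sample keys are hypothetical, and the loop only simulates filtering that an object store would normally apply server-side.

```python
def list_dir(keys, prefix, delimiter="/"):
    """Simulate a prefix + delimiter listing over a flat list of object keys.

    Returns (files, subdirs): keys directly under `prefix`, and the
    one-level-deeper "directory" prefixes, each sorted.
    """
    files, subdirs = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if not rest:
            # The key is the prefix itself (a directory placeholder object).
            continue
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # Anything past the first delimiter collapses into one prefix.
            subdirs.add(prefix + head + delimiter)
        else:
            files.append(key)
    return sorted(files), sorted(subdirs)


# Hypothetical keys, for illustration only.
keys = ["data/a.nc", "data/b.nc", "data/sub/c.nc", "data/sub/d.nc", "other/e.nc"]
files, subdirs = list_dir(keys, "data/")
```

With the sample keys, the nested objects under `data/sub/` are reported as a single prefix, so the result for `data/` stays small however many objects sit deeper in the tree.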
Sure. Here is an attempt from that branch
mrocklin@carbon:~/workspace/gcsfs$ git checkout per_dir_cache
Branch per_dir_cache set up to track remote branch per_dir_cache from asford.
Switched to a new branch 'per_dir_cache'
mrocklin@carbon:~/workspace/gcsfs$ python gcsfs/cli/gcsfuse.py pangeo-data gcs
Mounting bucket pangeo-data to directory gcs
Traceback (most recent call last):
File "/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/fuse.py", line 495, in _wrapper
return func(*args, **kwargs) or 0
File "/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/fuse.py", line 693, in readdir
fip.contents.fh):
File "/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/fuse.py", line 800, in __call__
return getattr(self, op)(*args)
File "/home/mrocklin/workspace/gcsfs/gcsfs/gcsfuse.py", line 57, in readdir
files = [f.rstrip('/').rsplit('/', 1)[1] for f in files]
File "/home/mrocklin/workspace/gcsfs/gcsfs/gcsfuse.py", line 57, in <listcomp>
files = [f.rstrip('/').rsplit('/', 1)[1] for f in files]
IndexError: list index out of range
Traceback (most recent call last):
File "/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/fuse.py", line 495, in _wrapper
return func(*args, **kwargs) or 0
File "/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/fuse.py", line 693, in readdir
fip.contents.fh):
File "/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/fuse.py", line 800, in __call__
return getattr(self, op)(*args)
File "/home/mrocklin/workspace/gcsfs/gcsfs/gcsfuse.py", line 57, in readdir
files = [f.rstrip('/').rsplit('/', 1)[1] for f in files]
File "/home/mrocklin/workspace/gcsfs/gcsfs/gcsfuse.py", line 57, in <listcomp>
files = [f.rstrip('/').rsplit('/', 1)[1] for f in files]
IndexError: list index out of range
Also, fwiw that bucket is publicly readable. You should be able to do the same thing locally if you're interested.
Great, thanks. This is likely due to a mismatch in how objects & prefixes are matched to files & directories in that patch.
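As a sketch of one way the expression in the traceback can fail: if a listing entry contains no `/` once the trailing slash is stripped (a bare bucket name, for instance), `rsplit('/', 1)` returns a single-element list and `[1]` raises `IndexError`. Which entries actually triggered the error is not shown in the thread, so the sample entries below are assumptions, and `original` and `basename` are hypothetical helper names.

```python
def original(path):
    # Same expression as in the traceback (gcsfuse.py, readdir).
    return path.rstrip('/').rsplit('/', 1)[1]


def basename(path):
    # Taking the last element works whether or not a '/' remains.
    return path.rstrip('/').rsplit('/', 1)[-1]


# Hypothetical listing entries; the real ones are not shown in the thread.
entries = ['pangeo-data/', 'pangeo-data/a/', 'pangeo-data/a/b.nc']

try:
    original(entries[0])
    raised = False
except IndexError:
    raised = True

names = [basename(e) for e in entries]
```

Indexing with `[-1]` only avoids the exception; whether a bucket-level entry should appear in the directory listing at all is a separate question for the patch.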
Is there any intention to lift the gcsfuse implementation into a generic interface that also supports s3fs?
@asford, we certainly could implement the same FUSE model for s3fs, and this idea has been mentioned; but several existing projects already do that for S3, which may be good enough.
This should now work better