Comments (7)
From a purely naming perspective, I think `FileOpener` is more accurate, but `FileLoader` isn't wildly misleading either.
I had a look at `numpy.load`, `json.load`, and `pickle.load`; these seem to either read or parse through files that have already been opened and return some structured data. In that sense, `FileLoader` definitely behaves differently relative to other modules. Renaming it is likely better.
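To make the distinction concrete, here is a minimal sketch using only the standard library. `json.load` and `pickle.load` parse a stream that is already open and return structured data, whereas an "opener" only yields open streams and leaves parsing to a later step. The `open_files` helper is a hypothetical stand-in written for illustration; it is not the datapipe's actual implementation.

```python
import io
import json
import pickle

# "load" in the standard library: parse an already-opened stream
# and return structured data.
text_stream = io.StringIO('{"a": 1}')
parsed_json = json.load(text_stream)

binary_stream = io.BytesIO(pickle.dumps([1, 2, 3]))
parsed_pickle = pickle.load(binary_stream)


def open_files(paths, mode="rb"):
    """Hypothetical opener: yields (path, open stream) pairs and parses nothing."""
    for path in paths:
        with open(path, mode) as stream:
            # The stream is only valid until the consumer asks for the next item.
            yield path, stream
```

Under this reading, the behavior described above matches `open_files` rather than the `load` functions, which is the argument for the `FileOpener` name.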
Perhaps we can rename it, but still leave `FileLoader` functional with a deprecation warning? I would imagine most people who started using `IterDataPipe` in PyTorch Core would be using `FileLoader`, and it would be a BC-breaking change.
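A rough sketch of what "rename but keep the old name functional" could look like in plain Python. Both classes below are hypothetical stand-ins that only borrow the names from this discussion; they are not the real datapipes, and the warning text is an assumption.

```python
import warnings


class FileOpener:
    """Hypothetical stand-in for the renamed datapipe."""

    def __init__(self, paths, mode="rb"):
        self.paths = list(paths)
        self.mode = mode

    def __iter__(self):
        # Yield (path, open stream) pairs; parsing is left to downstream steps.
        for path in self.paths:
            with open(path, self.mode) as stream:
                yield path, stream


class FileLoader(FileOpener):
    """Old name kept as a thin subclass that warns on construction."""

    def __init__(self, *args, **kwargs):
        warnings.warn(
            "FileLoader is deprecated; use FileOpener instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        super().__init__(*args, **kwargs)
```

Existing code that constructs `FileLoader` keeps working and emits a `DeprecationWarning`. The cost is that the alias has to be maintained until it is removed.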
> I had a look at `numpy.load`, `json.load`, `pickle.load`; these seem to either read or parse through files that have already been opened and return some structured data. In that sense, `FileLoader` definitely behaves differently than other modules. Renaming it is likely better.
Thanks for digging into it. You are right. The most concerning part is the functional API, since we are currently using `load_file_...` for each `FileLoader`. I don't want users to complain about inconsistent behavior across libraries.
> Perhaps we can rename it, but still leave `FileLoader` functional with a deprecation warning? I would imagine most people who started using `IterDataPipe` in PyTorch Core would be using `FileLoader`, and it would be a BC-breaking change.
Thank you for pointing that out. If we plan to rename it, we should definitely add a deprecation warning, at least before the official release.
I also want to gather some insights from domains, since this is going to be BC-breaking. cc: @pmeier @Nayef211
> Also want to gather some insights from domains since this is going to be BC breaking. cc: @pmeier @Nayef211
I agree that `FileLoader` sounds misleading and that `FileOpener` would be a better name for what the datapipe is actually doing. I also think adding a deprecation warning is a good idea, so that users have time to migrate to the new datapipe.
+1 for `FileOpener`.
Not sure about the deprecation warning though. I mean, `torchdata` is not even "released" yet and is also clearly labeled as a prototype. Of course you can go for it, but that also ups the maintenance burden. For `torchvision`, it is fine to ping me on PRs that break BC so I can land a fix quickly.
I agree with @pmeier, mainly because we didn't mention these DataPipes and functionalities in the PyTorch release. Based on the prototyping policy, we should be able to switch the name directly and prevent users from continuing to use a deprecated feature or name during the prototyping phase.
Another way, as I mentioned, is to add a deprecation warning before our official release and then clean it up during our branch cut. That would add more burden for us in maintaining the repo.
I would prefer option 1.
I agree that based on the prototyping policy we should be able to rename as we wish. I think the policy is very clear for things that exist in TorchData. Do you think that policy is clear to users for things that are in PyTorch Core? If so, then we can rename it without deprecation.