Comments (9)
Thanks for taking interest in this issue.
"I understand the intent of this issue to effectively be a way to wrap up the try-except import and cache the module (or parent)"
This would cover part of the requirement.
I think the ideal situation would be to import the module or submodule into the global namespace lazily, so that the actual import happens only when a method from that module or submodule is invoked. That way, we don't need to assign the module to self on each DataPipe instance just to expose it to the __iter__ function.
Current:
class XXXDataPipe:
    def __init__(self, ...):
        try:
            import abc
            self._abc = abc
        except:
            raise Error
    def __iter__(self):
        self._abc
Ideally:
class XXXDataPipe:
    def __init__(self, ...):
        abc = lazy_import("abc")  # Inject into global namespace
    def __iter__(self):
        abc.xxx  # I am not sure if mypy is going to allow such behavior
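A minimal sketch of the deferred behavior described above, assuming the standard library's importlib.util.LazyLoader is an acceptable mechanism. The name lazy_import follows the thread, but the body is only an illustration, not the project's implementation. A missing module raises immediately, while the module body runs only on first attribute access.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module whose body executes on first attribute access."""
    # Reuse a module that has already been imported.
    if name in sys.modules:
        return sys.modules[name]
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        # Fail right away when the module is not installed.
        raise ModuleNotFoundError(f"No module named {name!r}", name=name)
    # Wrap the real loader so execution is postponed.
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # Does not run the module body yet.
    return module


colorsys = lazy_import("colorsys")
# The module body is executed here, on the first attribute lookup.
print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))
```

Because the returned object is an ordinary module, it can be bound at module level instead of being stored on self.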
from data.
"In other words, attribute access on not-installed modules should never happen?"
Yeah, you are right. The error should be raised in the first place; otherwise, it is a bug.
"So you want to stick with the current, but wrap up the try/except logic?"
class XXXDataPipe:
    def __init__(self, ...):
        self._abc = lazy_import("abc")
    def __iter__(self):
        self._abc
I'm happy to take this up.
It seems like if you return the lazily loaded module from some function, you don't actually need to support as .... For example, you could do something like np = lazy_import("numpy") where you'd normally do this check.
One question about the lazy loading is how lazy we want to get. I understand the intent of this issue to effectively be a way to wrap up the try-except import and cache the module (or parent). If so, I guess the logic would basically be:
# Pseudocode: module_installed, module_imported, import_module, and load_module are placeholders
def lazy_import(module, error_msg, submodule=None):
    if not module_installed(module, submodule):
        raise ModuleNotFoundError(error_msg)
    # check if module (and submodule if provided) are in the "cache"
    if not module_imported(module, submodule):
        import_module(module, submodule)
    # return the module (or submodule) to be used in code
    return load_module(module, submodule)
Is this what you had in mind, @ejguan? If so, there are probably some other logistics to address, like what to do with parent modules which might be polluting the namespace, etc. If we're only worrying about modules and submodules, I think we could do this all with importlib, though submodules can be a bit hacky at times...
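The logic above could be sketched with importlib roughly as follows. This is one possible reading of the pseudocode, not the agreed design: the signature is taken from the comment, sys.modules plays the role of the "cache", and the import itself happens when lazy_import is called (for example in __init__) rather than at the top of the file.

```python
import importlib
import importlib.util


def lazy_import(module, error_msg, submodule=None):
    """Import `module` (or `module.submodule`) on demand and return it."""
    # Check the top-level package first: find_spec on a dotted name
    # would itself raise if the parent package is missing.
    if importlib.util.find_spec(module) is None:
        raise ModuleNotFoundError(error_msg)
    full_name = f"{module}.{submodule}" if submodule else module
    try:
        # import_module consults sys.modules first, so repeated calls
        # return the cached module instead of executing it again.
        return importlib.import_module(full_name)
    except ModuleNotFoundError as exc:
        # The package exists but the requested submodule does not.
        raise ModuleNotFoundError(error_msg) from exc


decoder = lazy_import("json", "json is required", submodule="decoder")
print(decoder.JSONDecoder)
```

Returning the submodule directly avoids having to decide what to do with the parent package in the caller's namespace.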
I see. The failure to import would result in an immediate error, but the actual import is the lazy part. Is that right? In other words, attribute access on not-installed modules should never happen?
Excellent, thanks for the clarification. One idea that's kind of messy is using globals() to populate the module with as_ or fully qualified name. The issue is that it will then be a global everywhere and not just within the file. For instance, in this minimal example
def lazy_import(name, as_=None):
    # Doesn't handle error_msg well yet
    import importlib
    mod = importlib.import_module(name)
    if as_ is not None:
        name = as_
    globals()[name] = mod

class Foo:
    def __init__(self):
        lazy_import("numpy", as_="np")
    def foo(self):
        return np.array([1, 2])

f = Foo()
print(f.foo())  # prints [1 2] as expected
Using this, I think it might introduce some collisions or a heavily polluted global namespace.
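One detail that may be worth checking before going further with this idea: in Python, globals() returns the namespace of the module in which the function is defined, not the caller's. The sketch below builds a stand-in module named helpers in memory (a hypothetical name, used only to keep the example in one file) and suggests that the injected name would land in the helper module rather than in the file that calls lazy_import.

```python
import types

# Source for a stand-in "helpers" module that defines lazy_import.
HELPERS_SRC = '''
import importlib

def lazy_import(name, as_=None):
    mod = importlib.import_module(name)
    globals()[as_ or name] = mod
'''

# Build the module in memory so the example stays in a single file.
helpers = types.ModuleType("helpers")
exec(HELPERS_SRC, helpers.__dict__)

helpers.lazy_import("math", as_="lazy_math_alias")

# The alias is stored on the helpers module...
print(hasattr(helpers, "lazy_math_alias"))
# ...and not in the namespace of the code that made the call.
print("lazy_math_alias" in globals())
```

So the minimal example only works when lazy_import and its caller share a file, which is another reason to prefer returning the module.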
You are right. Then let's keep it as minimal as possible and not use globals for now. If more users request it, we can easily extend it.
I asked here and it seems like pandas has something similar to what we want. They still need to import where used so it's not terribly different, but it prevents us from needing to stuff modules into self or have globals floating around.
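A rough sketch of that import-where-used pattern. The helper name import_optional and the toy CsvLinesDataPipe class are hypothetical and are not pandas' or torchdata's actual code; the standard csv module stands in for an optional dependency.

```python
import importlib


def import_optional(name, extra=""):
    """Hypothetical helper: import `name` or raise with an install hint."""
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError as exc:
        msg = f"Missing optional dependency {name!r}. {extra}".strip()
        raise ModuleNotFoundError(msg) from exc


class CsvLinesDataPipe:
    """Toy stand-in for a DataPipe that needs an optional module."""

    def __init__(self, lines):
        # Fail early if the dependency is absent, but keep no reference.
        import_optional("csv")
        self.lines = lines

    def __iter__(self):
        # Import again at the point of use; this hits the sys.modules cache.
        csv = import_optional("csv")
        yield from csv.reader(self.lines)


print(list(CsvLinesDataPipe(["a,b", "1,2"])))
```

The second call is cheap because the module is already cached, and nothing has to be stored on self or placed in globals.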
"I asked here and it seems like pandas has something similar to what we want. They still need to import where used so it's not terribly different, but it prevents us from needing to stuff modules into self or have globals floating around."
Sounds great.