Comments (4)
Putting the data on network storage such as NFS, Ceph, or Gluster can fix this issue.
from pytorch-biggraph.
PBG's input and output are file-based: its config, entity counts, edge lists, checkpoints, ... must all be files on the filesystem, i.e., they must have a path that allows reading from and writing to them using standard operating system interfaces. That's really all PBG needs, so in particular it doesn't require those files to be backed by local disk: one can use network storage, as long as it can be accessed by the above means. In fact, for distributed training, the checkpoint directory must be shared across the machines because it's used to transfer data among the trainers. NFS storage works. I'm not familiar with HDFS, but I expect it can be used with vanilla PBG if it can be mounted as a filesystem.
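As a rough illustration of the "standard operating system interfaces" requirement, here is a minimal sketch that checks whether a directory (local or on a network mount) supports ordinary create, write, read, and delete operations. The helper name `check_shared_dir` is made up for this example and is not part of PBG; passing the check doesn't guarantee that PBG will work, it only rules out the most basic access problems.

```python
import os
import uuid


def check_shared_dir(path):
    """Return True if `path` is a directory where a file can be created,
    written, read back, and deleted through ordinary file I/O.

    Hypothetical helper for illustration; not part of PBG.
    """
    if not os.path.isdir(path):
        return False
    # Use a unique name so that several machines can probe concurrently.
    probe = os.path.join(path, f".probe-{uuid.uuid4().hex}")
    payload = b"pbg-probe"
    try:
        with open(probe, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # push the write through to the backing storage
        with open(probe, "rb") as f:
            ok = f.read() == payload
    except OSError:
        return False
    finally:
        # Clean up the probe file whether or not the check succeeded.
        if os.path.exists(probe):
            os.remove(probe)
    return ok
```

For distributed training, one would presumably run such a check on every machine against the same mounted checkpoint directory before starting the trainers.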
Two more things:
- HDF5 (the library we use to read edgelists and store checkpoints) may be able to use some lower-level interfaces to perform faster I/O when the files are on local disk. I say "may" because I have no idea how HDF5 works internally. However, it is still able to fall back to "regular" file I/O for remote files.
- During distributed training, the checkpoints are only used to transfer data from one epoch to the next. Within an epoch, the data is passed between trainers using parameter servers and partition servers. The exception is when partition servers are disabled (which they shouldn't be, unless you are operating under strong memory pressure), in which case checkpoints are also used within the epoch. Edgelists, however, are always read from file.
What this boils down to: in single-machine mode, using local disk may be faster; in distributed mode, it doesn't matter much how the checkpoint directory is stored, because filesystem I/O is rarer anyway.
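To make the distributed case concrete, here is a sketch of the storage-related part of a PBG config, assuming a shared mount at the hypothetical path `/mnt/shared/pbg`. The keys `entity_path`, `edge_paths`, `checkpoint_path`, `num_machines`, and `num_partition_servers` are taken from PBG's config schema as I understand it; the values and directory names are illustrative, and a real config also needs `entities`, `relations`, `dimension`, and so on, which are omitted here.

```python
def get_torchbiggraph_config():
    # Hypothetical mount point that is visible at the same path on every machine.
    shared_root = "/mnt/shared/pbg"

    return dict(
        # Inputs: entity counts and partitioned edge lists (HDF5 files).
        entity_path=f"{shared_root}/entities",
        edge_paths=[f"{shared_root}/edges/train"],
        # The checkpoint directory must be shared for distributed training.
        checkpoint_path=f"{shared_root}/checkpoints",
        # Assumed values for a two-machine run.
        num_machines=2,
        # Assumption: -1 lets the trainers act as partition servers, while 0
        # disables them so partitions are swapped through the checkpoint
        # directory within an epoch.
        num_partition_servers=-1,
        # entities, relations, dimension, ... omitted in this fragment
    )
```

With partition servers enabled, the shared directory should mostly see traffic at epoch boundaries, so its speed matters less than it would in single-machine mode.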
@lerks Thanks, that's really helpful.