Comments (7)
You might want to look at http://scheduler-address:8787/status to get a sense of what's happening with your computation (documented at http://distributed.readthedocs.io/en/latest/web.html). Also see http://scheduler-address:8788/workers
That will probably help to identify what is happening.
from dask-drmaa.
I think something goes wrong with the distribution of tasks:
Some workers are already done, while others have 15 (long) pending tasks left.
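One way to put numbers on that imbalance is to count the tasks currently assigned to each worker. The snippet below is only a rough sketch: `summarize_load` is a hypothetical helper (not part of distributed), and the worker addresses and task names in `example` are made up for illustration.

```python
def summarize_load(processing):
    """Return (worker, n_tasks) pairs, busiest worker first.

    `processing` maps a worker address to the collection of task keys
    currently assigned to that worker.
    """
    counts = {worker: len(tasks) for worker, tasks in processing.items()}
    # Sort by descending task count; break ties by address so the
    # ordering is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up snapshot: one worker is overloaded, one is idle.
example = {
    "tcp://10.0.0.1:40001": ["task-a", "task-b", "task-c"],
    "tcp://10.0.0.2:40002": [],
    "tcp://10.0.0.3:40003": ["task-d"],
}

for worker, n_tasks in summarize_load(example):
    print(worker, n_tasks)
```

With a live cluster, the mapping could presumably come from `Client("scheduler-address:8786").processing()`, which as far as I know returns the keys each worker is processing; I have not checked that against this exact version.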
Also, can you verify that you're using the latest version of distributed?
import distributed
print(distributed.__version__)
This should be 1.15.1
If you visit http://scheduler-address:8788/workers (this is a different page than what you are currently visiting) do you notice stealing events in the bottom plot?
I start the cluster with dask-drmaa 100
However, the page http://scheduler-address:8788/workers doesn't exist for me.
Upon running ps aux | grep basnijh*
the only related processes seem to be:
basnijh+ 6697 8.2 0.2 964060 93804 pts/32 Sl+ 16:01 0:16 /home/basnijholt/anaconda3/envs/py36/bin/python /home/basnijholt/anaconda3/envs/py36/bin/dask-drmaa 100
basnijh+ 6708 8.4 0.2 813504 88300 pts/32 S+ 16:01 0:17 /home/basnijholt/anaconda3/envs/py36/bin/python -m bokeh serve /home/basnijholt/anaconda3/envs/py36/lib/python3.6/site-packages/distributed/bokeh/background /home/basnijholt/anaconda3/envs/py36/lib/python3.6/site-packages/distributed/bokeh/status /home/basnijholt/anaconda3/envs/py36/lib/python3.6/site-packages/distributed/bokeh/tasks /home/basnijholt/anaconda3/envs/py36/lib/python3.6/site-packages/distributed/bokeh/workers /home/basnijholt/anaconda3/envs/py36/lib/python3.6/site-packages/distributed/bokeh/memory-usage.py /home/basnijholt/anaconda3/envs/py36/lib/python3.6/site-packages/distributed/bokeh/task-stream.py /home/basnijholt/anaconda3/envs/py36/lib/python3.6/site-packages/distributed/bokeh/task-progress.py /home/basnijholt/anaconda3/envs/py36/lib/python3.6/site-packages/distributed/bokeh/resource-profiles.py /home/basnijholt/anaconda3/envs/py36/lib/python3.6/site-packages/distributed/bokeh/worker-table.py /home/basnijholt/anaconda3/envs/py36/lib/python3.6/site-packages/distributed/bokeh/processing-stacks.py --check-unused-sessions=50 --unused-session-lifetime=1 --port 8787 --host localhost:8787 --host 127.0.0.1:8787 --host 127.0.0.1:8787 --host 127.0.0.1:8787 --host 127.0.0.1:8787 --host 192.168.0.34:8787 --host * --log-level info --args host 127.0.0.1 http-port 35199 tcp-port 8786 bokeh-port 8787
Also, can you verify that you're using the latest version of distributed?
Yes it is 1.15.1
If you're able to create an easy-to-reproduce example showing poor scheduler load then I encourage you to raise an issue at https://github.com/dask/distributed . I'm not sure how to help you easily without more information.
Since I am not able to make a minimal example I am closing this for now.