Comments (14)
Sorry for the delay, I will have a look at it tomorrow!
from dicomweb-client.
This feature has been included in release 0.50.0
.
from dicomweb-client.
@razorx89 I agree that this would be a valuable addition. I think it would be most useful when combined with chunked transfer encoding to allow users to iterate over individual DICOM data sets of a larger collection without having to transfer the entire collection over network at once. However, this will likely require some work for parsing multipart messages and JSON arrays in an iterative manner.
from dicomweb-client.
Hi @razorx89, it took a while, but thanks to @ambrussimon's work, we have in the meanwhile implemented this feature and are getting close to releasing it. It would be great if you could try it out (see feature/retrieve-iter branch) and let us know what you think.
from dicomweb-client.
Regarding the exposed API functionality it behaves as expected 👍 The generator yields a single Dataset at a time which can then be processed, stored, or whatsoever. However, I observe a lag when retrieving the first instance followed by immediate retrievals for all other instances. It seems like the whole download is performed before yielding any instances.
from dicomweb_client.api import DICOMwebClient
import requests
import time
url = '...'
token = '...'
study_instance_uid = '...'
series_instance_uid = '...'
client = DICOMwebClient(
url=url,
headers={'Authorization': 'Bearer ' + token}
)
started = time.time()
for instance in client.iter_series(study_instance_uid, series_instance_uid):
print(time.time() - started)
started = time.time()
Here are the timings:
10.188491821289062
0.013832330703735352
0.013205766677856445
0.013229131698608398
0.013056039810180664
0.012837648391723633
0.01196742057800293
0.011025667190551758
0.005928516387939453
0.006155967712402344
0.0056972503662109375
0.0056705474853515625
0.005823373794555664
0.005536794662475586
0.005642890930175781
0.005368471145629883
0.0055768489837646484
0.005317211151123047
0.005077838897705078
0.005247592926025391
0.0048863887786865234
0.004983663558959961
0.004970550537109375
0.004859209060668945
0.004797220230102539
0.004669904708862305
0.004552125930786133
0.004494190216064453
0.004645347595214844
0.004573345184326172
0.004433631896972656
0.004312753677368164
0.004160642623901367
0.004187583923339844
0.004033565521240234
0.0040705204010009766
0.0039179325103759766
0.003842592239379883
0.003755807876586914
0.0037078857421875
0.0037102699279785156
0.0036394596099853516
0.003527402877807617
0.003499746322631836
0.003312826156616211
0.003244161605834961
0.0032253265380859375
0.003154277801513672
0.00311279296875
0.0029146671295166016
0.002889871597290039
0.002794504165649414
0.002807140350341797
0.0027315616607666016
0.0024902820587158203
0.0025513172149658203
0.002411365509033203
0.00225830078125
0.0021219253540039062
0.0021643638610839844
0.001979351043701172
0.0019381046295166016
0.0018315315246582031
0.0018048286437988281
0.001745462417602539
0.0017278194427490234
0.001683950424194336
0.0015952587127685547
from dicomweb-client.
@razorx89 Thank you for reviewing and testing the feature!
The generator yields a single Dataset at a time which can then be processed, stored, or whatsoever. However, I observe a lag when retrieving the first instance followed by immediate retrievals for all other instances. It seems like the whole download is performed before yielding any instances
Yes, that's the intended behavior when chunk_size
is unspecified. When chunked transfer encoding is used (chunk_size
is a positive integer), individual chunks will be streamed and the next part of the multipart message will be yielded once it has been completely downloaded.
You can try:
client = DICOMwebClient(
url=url,
headers={'Authorization': 'Bearer ' + token},
chunk_size=10**3
)
from dicomweb-client.
This approach decouples Python API from the underlying protocol and thereby allows clients to use the iter_*
methods even if chunked transfer encoding is disabled.
from dicomweb-client.
Thanks for clarification! Yes, I can confirm that setting the chunk_size
to a positive integer value indeed works. I would suggest to update the docstring for this parameter and explicitly mention, that the iter_*
methods will download everything before yielding any items if this parameter is unspecified (I only looked at the docstring for the client initializer).
Regarding the default behavior, just my two cents, from a user perspective this code is actually identical in terms of:
- delay until the first item is retrieved
- code execution (is does the same)
- high memory consumption for both, since the whole binary data has to be downloaded into memory
iter_*
methods only reduce the memory consumption for parsing thepydicom.Dataset
s
for dataset in client.retrieve_series(...):
pass
for dataset in client.iter_series(...):
pass
However, in my opinion the true potential of the iter_*
methods would be to be less memory hungry and to allow for interleaving CPU and I/O operations. With the current default behavior I think a lot of people might miss this "hidden" feature. Wouldn't it be better to set the default for iter_*
methods to a positive integer and allow for an opt-out?
from dicomweb-client.
I can confirm that setting the chunk_size to a positive integer value indeed works.
Great! Thanks for confirming.
However, in my opinion the true potential of the iter_* methods would be to be less memory hungry and to allow for interleaving CPU and I/O operations. With the current default behavior I think a lot of people might miss this "hidden" feature. Wouldn't it be better to set the default for iter_* methods to a positive integer and allow for an opt-out?
You are raising a valid concern. I have been reluctant to use chunked transfer encoding by default. However, it is a required HTTP 1.1 feature and should thus be supported by every DICOMweb server.
I would suggest that we use different implementations for retrieve_*
and iter_*
, where the first one always retrieves the data in one request, while the latter retrieves data in multiple chunks (with a reasonable default for chunk_size
).
@razorx89 @ambrussimon What do you think?
from dicomweb-client.
Sounds fine!
from dicomweb-client.
@razorx89 I implemented this in 081bcae as follows:
iter_*()
methods and thestore_instances()
method now always use chunked transfer encoding (bytes per chunk can be set via thechunk_size
parameter)retrieve_*()
methods never use chunked transfer encoding (chunk_size
is not used forGET
requests)
from dicomweb-client.
Two points:
- Line 659: Typo in the exception message
Experted
->Expected
https://github.com/MGHComputationalPathology/dicomweb-client/blob/081bcae64bdbad9ab7cfa4cc09ec9a5256ebb461/src/dicomweb_client/api.py#L658-L659 - Are
retrieve_*()
methods really never using chunked transfer encoding? It looks like the if statement on line 581 is always true with the new defaultchunk_size
and thus the chunked transfer encoding streamed GET will be always used. Or am I missing something?
https://github.com/MGHComputationalPathology/dicomweb-client/blob/081bcae64bdbad9ab7cfa4cc09ec9a5256ebb461/src/dicomweb_client/api.py#L581-L583
from dicomweb-client.
Thanks @razorx89 for your feedback!
-
I fixed the typo (3e06d92)
-
I provided you a link to the wrong commit (see ccc015f instead). Sorry about that!
from dicomweb-client.
No problem, at first glance it looks good to me 👍
from dicomweb-client.
Related Issues (20)
- Query regarding the body of STOW-RS request HOT 2
- Retrive metadata is QIDO-RS or WADO-RS? HOT 2
- Implementation of DICOMs3Client based on DICOMwebProtocol HOT 27
- ImportError: cannot import name 'Protocol' from 'typing' (python 3.7.11) HOT 5
- Orthanc Support HOT 3
- Outdated JPEG-LS media type for retrieval of frames
- Something broke for me going from 0.55.0 to 0.55.1 HOT 1
- Make pillow and numpy optional HOT 2
- Double trailing slash problem HOT 5
- Cannot find reference 'load_json_dataset' in 'api.py' HOT 1
- [Doc] Broken link in "Introduction" HOT 1
- Failing to import study using retrieve_study method
- noisy 'empty response' warning
- Retrieve_series_metadata Hangs Indefinitely Without Timeout
- Provide the results as Iterable of generators instead of lists HOT 2
- Allow to set force option of dcmread HOT 4
- Error in doc string?
- image/dicom-rle media type
- DICOMfileClient raises error for RLE Losssless files
- Success with partial content (status code 206)
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from dicomweb-client.