(venv) [bash][gwatts]:idap-200gbps-atlas > python servicex/servicex_materialize_branches.py -v --distributed-client scheduler --dask-scheduler 'tcp://dask-gwatts-ead73a76-c.af-jupyter:8786' --num-files 0
0000.0744 - INFO - Using release 21.2.231
0000.0749 - INFO - Building ServiceX query
0000.1044 - WARNING - Fetched the default calibration configuration for a query. It should have been intentionally configured - using configuration for data format PHYS
0000.1327 - INFO - Starting ServiceX query
0000.7497 - INFO - Running servicex query for f70228e6-6655-443a-a7f2-77de0937d134 took 0:00:00.278472 (no files downloaded)
0000.7583 - INFO - Finished ServiceX query
0000.7593 - INFO - Using `uproot.dask` to open files
0001.2214 - INFO - Generating the dask compute graph for 27 fields
0001.3238 - INFO - Computing the total count
Traceback (most recent call last):
File "/home/gwatts/code/iris-hep/idap-200gbps-atlas/servicex/servicex_materialize_branches.py", line 325, in <module>
main(ignore_cache=args.ignore_cache, num_files=args.num_files,
File "/home/gwatts/code/iris-hep/idap-200gbps-atlas/servicex/servicex_materialize_branches.py", line 228, in main
r = total_count.compute() # type: ignore
File "/venv/lib/python3.9/site-packages/dask/base.py", line 375, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/venv/lib/python3.9/site-packages/dask/base.py", line 661, in compute
results = schedule(dsk, keys, **kwargs)
File "/venv/lib/python3.9/site-packages/uproot/_dask.py", line 1343, in __call__
result, _ = self._call_impl(
File "/venv/lib/python3.9/site-packages/uproot/_dask.py", line 1266, in _call_impl
ttree = uproot._util.regularize_object_path(
File "/venv/lib/python3.9/site-packages/uproot/_util.py", line 962, in regularize_object_path
file = ReadOnlyFile(
File "/venv/lib/python3.9/site-packages/uproot/reading.py", line 761, in root_directory
return ReadOnlyDirectory(
File "/venv/lib/python3.9/site-packages/uproot/reading.py", line 1400, in __init__
keys_chunk = file.chunk(keys_start, keys_stop)
File "/venv/lib/python3.9/site-packages/uproot/reading.py", line 1185, in chunk
return self._source.chunk(start, stop)
File "/venv/lib/python3.9/site-packages/uproot/source/fsspec.py", line 115, in chunk
data = self._fh.read(stop - start)
File "/venv/lib/python3.9/site-packages/fsspec/implementations/http.py", line 598, in read
return super().read(length)
File "/venv/lib/python3.9/site-packages/fsspec/spec.py", line 1846, in read
out = self.cache._fetch(self.loc, self.loc + length)
File "/venv/lib/python3.9/site-packages/fsspec/caching.py", line 439, in _fetch
self.cache = self.fetcher(start, bend)
File "/venv/lib/python3.9/site-packages/fsspec/asyn.py", line 118, in wrapper
return sync(self.loop, func, *args, **kwargs)
File "/venv/lib/python3.9/site-packages/fsspec/asyn.py", line 103, in sync
raise return_result
File "/venv/lib/python3.9/site-packages/fsspec/asyn.py", line 56, in _runner
result[0] = await coro
File "/venv/lib/python3.9/site-packages/fsspec/implementations/http.py", line 653, in async_fetch_range
r.raise_for_status()
File "/venv/lib/python3.9/site-packages/aiohttp/client_reqrep.py", line 1060, in raise_for_status
raise ClientResponseError(
Exception: ClientResponseError(RequestInfo(url=URL('https://s3.af.uchicago.edu/f70228e6-6655-443a-a7f2-77de0937d134/root:::192.170.240.145::root:::eosatlas.cern.ch:1094::eos:atlas:atlasdatadisk:rucio:mc23_13p6TeV:e5:17:DAOD_PHYSLITE.37223155._000341.pool.root.1?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ABAOJZ4XMLKWO5H0PZJ3/20240412/af-object-store/s3/aws4_request&X-Amz-Date=20240412T190811Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=385d92df18e0cad7e071e0dc84ef8c72fc32d8ec2f02a63bf1fd97d2304083f9'), method='GET', headers=<CIMultiDictProxy('Host': 's3.af.uchicago.edu', 'Range': 'bytes=30381811-35624926', 'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'User-Agent': 'Python/3.9 aiohttp/3.9.3')>, real_url=URL('https://s3.af.uchicago.edu/f70228e6-6655-443a-a7f2-77de0937d134/root:::192.170.240.145::root:::eosatlas.cern.ch:1094::eos:atlas:atlasdatadisk:rucio:mc23_13p6TeV:e5:17:DAOD_PHYSLITE.37223155._000341.pool.root.1?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ABAOJZ4XMLKWO5H0PZJ3/20240412/af-object-store/s3/aws4_request&X-Amz-Date=20240412T190811Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=385d92df18e0cad7e071e0dc84ef8c72fc32d8ec2f02a63bf1fd97d2304083f9')), (), status=503, message='Slow Down', headers=<CIMultiDictProxy('Date': 'Fri, 12 Apr 2024 19:08:48 GMT', 'Content-Type': 'application/xml', 'Content-Length': '211', 'Connection': 'keep-alive', 'x-amz-request-id': 'tx00000000000000002daba-00661986c0-7b36232-af-object-store', 'Accept-Ranges': 'bytes', 'Strict-Transport-Security': 'max-age=15724800; includeSubDomains')>)
This is testing with workers already set up (not dynamically scaling). It occurs with:
I think what is happening is that all 200 workers hit S3 at exactly the same moment, which triggers its 503 "Slow Down" response. With dynamic scaling, the nodes come up gradually, so the S3 load is spread out a little.
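If that diagnosis is right, the standard client-side mitigation is to retry throttled reads with exponential backoff plus jitter, so the workers fall out of lockstep instead of all retrying at the same instant. Below is a minimal sketch of that technique only, not something the script currently does; the `fetch_with_backoff` helper and its parameters are hypothetical, and in this pipeline the failing read happens deep inside uproot/fsspec on the dask workers, so any real fix would have to be wired in there (or the burst reduced by staggering worker start-up).

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def fetch_with_backoff(fetch: Callable[[], T],
                       max_tries: int = 6,
                       base_delay: float = 0.5) -> T:
    """Retry `fetch` when the object store throttles, using exponential
    backoff with full jitter so ~200 workers do not retry in lockstep."""
    for attempt in range(max_tries):
        try:
            return fetch()
        except Exception:
            # In practice this should match aiohttp.ClientResponseError
            # with status == 503 ("Slow Down") and re-raise anything else.
            if attempt == max_tries - 1:
                raise
            # Full jitter: sleep a random amount up to the exponential cap,
            # which spreads the retry wave of many workers out in time.
            time.sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")
```

Even without touching the read path, the same idea applies at the scheduling level: a small random delay before each worker's first S3 read would approximate what dynamic scaling gives us for free.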