Comments (11)
I will do some research on this: which is faster? How do they deal with nans and numpy arrays? How fast is the encoding (does the library do it all)? Which is better maintained? I will present what I find at the next pyaerocom meeting.
from pyaerocom.
I think that reading the obs files takes most of the time.
Do you have an example of time reduction?
Also, how is the memory consumption?
from pyaerocom.
No, I don't have any numbers. It would be interesting to get the profile of a standard CAMS run.
The documentation mentions some memory specs here
For info, the documentation has several performance comparisons with other JSON libraries, e.g.:
Library | compact (ms) | pretty (ms) | vs. orjson
orjson | 0.03 | 0.04 | 1
ujson | 0.18 | 0.19 | 4.6
rapidjson | 0.1 | 0.12 | 2.9
simplejson | 0.25 | 0.89 | 21.4
json | 0.18 | 0.71 | 17
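The numbers above come from the orjson documentation and its own fixtures. A rough way to repeat the comparison on our side could look like the sketch below. The payload is made up and only loosely resembles a time series, and orjson and simplejson are treated as optional imports, so only the standard library `json` is guaranteed to be timed.

```python
import json
import timeit


def time_dumps(dumps, payload, number=50):
    """Return the mean time in milliseconds for one dumps(payload) call."""
    total = timeit.timeit(lambda: dumps(payload), number=number)
    return 1000 * total / number


# Hypothetical payload, not taken from a real experiment.
payload = {"station": "demo", "values": [i * 0.5 for i in range(10_000)]}

candidates = {"json": json.dumps}
try:
    import orjson  # optional third-party library

    candidates["orjson"] = orjson.dumps  # returns bytes, not str
except ImportError:
    pass
try:
    import simplejson  # optional third-party library

    candidates["simplejson"] = simplejson.dumps
except ImportError:
    pass

results = {name: time_dumps(fn, payload) for name, fn in candidates.items()}
for name, ms in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:12s} {ms:8.3f} ms per call")
```

Timings from a synthetic payload like this say nothing about a full run, where file IO and data processing are also involved.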
from pyaerocom.
For example, the CAMS2-82 IFS experiment contains 34,854 files.
from pyaerocom.
There is a reason for orjson to be faster: e.g. it works with byte streams (so it leaves out the time-consuming encoding work). It is therefore not a drop-in replacement for simplejson...
But I noted that the standard json is now faster than simplejson (and should be a direct replacement).
I'm wondering a bit if the json file writing could be multithreaded... At least for some of the files that should be possible.
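A minimal sketch of what threaded writing could look like, assuming every output file is written by exactly one task so there is no shared state between threads. The function names and file names are made up for illustration and are not pyaerocom API. Threads mainly help while a task waits on the file system; the serialization itself still runs under the GIL.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def write_one(path, data):
    """Serialize one payload and write it to its own file."""
    path.write_text(json.dumps(data), encoding="utf-8")
    return path


def write_many(payloads, max_workers=4):
    """Write {path: data} pairs concurrently; each path must be unique."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(write_one, p, d) for p, d in payloads.items()]
        # result() re-raises any exception from the worker thread
        return [f.result() for f in futures]


# Demo with made-up file names in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    payloads = {Path(tmp) / f"station_{i}.json": {"id": i} for i in range(8)}
    written = write_many(payloads)
    roundtrip = json.loads(written[3].read_text(encoding="utf-8"))
print(len(written), roundtrip)
```

Files that are read, updated and rewritten by several steps would need locking or a single owner, so this only covers the simple case.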
from pyaerocom.
I will do some research on this: which is faster? How do they deal with nans and numpy arrays? How fast is the encoding (does the library do it all)? Which is better maintained? I will present what I find at the next pyaerocom meeting.
- Which is faster?
  In a number of experiments, orjson is faster than any other standard JSON package. https://github.com/ijl/orjson, https://showmax.engineering/articles/json-python-libraries-overview
- How do they deal with NaNs and numpy arrays?
  NaNs: serialized to null.
  numpy arrays: orjson serializes numpy arrays natively (when the OPT_SERIALIZE_NUMPY option is passed) and is faster than all other compared libraries.
- How fast is the encoding (does the library do it all)?
  The encoding is very fast. It seems the library does almost everything, except that it is very strict about UTF-8 conformance for strings.
- Which is better maintained?
  orjson is actively maintained.
Based on this, my vote would be to open a PR where we test this library and maybe do some timing studies to see if it helps our performance.
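For the NaN question, the standard library behaves differently from what is described for orjson: by default `json.dumps` emits bare `NaN` and `Infinity` tokens, which are not valid JSON. A small sketch of one way to get `null` instead with only the standard library is shown below. The helper `nan_to_none` is hypothetical, and mapping infinities to `null` as well is my own choice here.

```python
import json
import math


def nan_to_none(obj):
    """Recursively replace non-finite floats (NaN, +/-inf) with None."""
    if isinstance(obj, float) and not math.isfinite(obj):
        return None
    if isinstance(obj, dict):
        return {key: nan_to_none(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [nan_to_none(value) for value in obj]
    return obj


data = {"mean": float("nan"), "values": [1.0, float("inf"), 2.5]}

# Default behaviour: NaN and Infinity tokens end up in the output.
print(json.dumps(data))

# allow_nan=False would raise ValueError if a non-finite float slipped through.
text = json.dumps(nan_to_none(data), allow_nan=False)
print(text)
```

The recursive copy costs time on large payloads, so this would need to be part of any timing study.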
from pyaerocom.
Meeting minutes 22/1/24
@avaldebe :
More generally speaking, a context manager could relieve many of the inefficiencies associated with how we produce the JSON files for Aeroval experiments. We currently read and overwrite the same files thousands of times, which is very slow. A context manager would let us read each file once, check on every call whether the output is already in the in-memory payload of the "json file" and add it if it is not, and then write the file once upon exiting the context manager. This is probably a bigger rewrite.
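A minimal sketch of the context manager idea, using only the standard library. The names `json_session` and `add_result` are hypothetical and do not exist in pyaerocom. The file is read once on entry, updated in memory, and written once on a clean exit.

```python
import json
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def json_session(path):
    """Load a JSON file once, yield its content as a dict, write once on exit."""
    path = Path(path)
    data = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    yield data
    # Only reached if the with-body finished without raising
    path.write_text(json.dumps(data), encoding="utf-8")


def add_result(data, key, value):
    """Add value under key unless an entry is already there."""
    data.setdefault(key, value)


# Demo: 1000 updates, but the file is written a single time.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "stats.json"
    with json_session(target) as data:
        for i in range(1000):
            add_result(data, f"station_{i % 10}", {"count": i})
    saved = json.loads(target.read_text(encoding="utf-8"))
print(len(saved), saved["station_3"])
```

Skipping the write when the body raises is a design choice in this sketch; a real version might prefer to flush partial results instead.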
Smarter idea: for longer-term development: Use a small sqlite database to store the intermediate evaluation results, and then query the database to create the json files as a final step so that you only write them once.
Iterative development approach: Start by writing the context manager that handles the writing of the JSON files. Once that is up and working, change what the context manager does on exit so that it writes to the sqlite database (faster than writing to the JSON files), and then add a final step that queries the sqlite database and writes each JSON file once. sqlite is a relational database, so some work is needed to figure out how to map the JSON files onto this structure. We want an embedded database.
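To make the sqlite idea concrete, here is a small sketch with the standard library `sqlite3` module. The table layout (one row per target file and key, with the value stored as JSON text) and the function names are assumptions for illustration only; the real mapping of the Aeroval JSON structure still has to be worked out.

```python
import json
import sqlite3


def open_store(db_path=":memory:"):
    """Open the embedded database and create the results table if needed."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        " file TEXT NOT NULL, key TEXT NOT NULL, payload TEXT NOT NULL,"
        " PRIMARY KEY (file, key))"
    )
    return con


def put(con, file, key, value):
    """Store one intermediate result; a later value for the same key wins."""
    con.execute(
        "INSERT OR REPLACE INTO results (file, key, payload) VALUES (?, ?, ?)",
        (file, key, json.dumps(value)),
    )


def export(con, file):
    """Build the final JSON text for one target file with a single query."""
    rows = con.execute(
        "SELECT key, payload FROM results WHERE file = ? ORDER BY key", (file,)
    )
    return json.dumps({key: json.loads(payload) for key, payload in rows})


# Demo with made-up file names and values
con = open_store()
put(con, "ts/od550aer.json", "station_b", {"mean": 0.21})
put(con, "ts/od550aer.json", "station_a", {"mean": 0.13})
put(con, "ts/od550aer.json", "station_a", {"mean": 0.15})  # replaces the row
put(con, "ts/concpm10.json", "station_a", {"mean": 12.0})
con.commit()
exported = export(con, "ts/od550aer.json")
print(exported)
```

With this layout each JSON file is only serialized and written once, in the export step at the end of the run.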
@jgriesfeller : Just changing the json library might not result in a shorter runtime because the largest time bottleneck we face is in terms of the IO.
from pyaerocom.
simplejson results, profiled using scalene.
The test config file is /lustre/storeB/users/lewisb/Python/Evaluations/aeroval/config/config_files/cfg_json_959.py on the blake_dev branch of the config repository. This experiment does not include the time needed for reading data. In all tests a collocated netcdf file is available and picked up by the code, so these timings are not affected by the time needed to read data.
With simplejson it took 6m:24.746s.
Memory usage: ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ (max: 2.322 GB, growth rate: 0%)
/lustre/storeB/users/lewisb/Python/Evaluations/aeroval/config/config_files/cfg_json_959.py: % of time = 99.99% (6m:24.720s) out of 6m:24.746s.
╷ ╷ ╷ ╷ ╷ ╷ ╷ ╷
│Time │–––––– │–––––– │Memory │–––––– │––––––––––– │Copy │
Line │Python │native │system │Python │peak │timeline/% │(MB/s) │/lustre/storeB/users/lewisb/Python/Evaluations/aeroval/config/config_files/cfg_json_959.py
╺━━━━━━┿━━━━━━━┿━━━━━━━┿━━━━━━━┿━━━━━━━━┿━━━━━━━┿━━━━━━━━━━━━━━━┿━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸
1 │ │ │ │ │ │ │ │#!/usr/bin/env python3
2 │ │ │ │ │ │ │ │# -*- coding: utf-8 -*-
3 │ │ │ │ │ │ │ │"""
4 │ │ │ │ │ │ │ │Config file for AeroCom PhaseIII optical properties experiment
5 │ │ │ │ │ │ │ │"""
6 │ │ │ │ │ │ │ │import os
7 │ │ │ │ │ │ │ │import socket
8 │ │ │ │ │ │ │ │
9 │ │ │ │ │ │ │ │### Define filters for the obs subsets
10 │ │ │ │ │ │ │ │
11 │ │ │ │ │ │ │ │
12 │ │ │ │ │ │ │ │BASE_FILTER = {
13 │ │ │ │ │ │ │ │ "latitude": [-90, 90],
14 │ │ │ │ │ │ │ │ "longitude": [-180, 180],
15 │ │ │ │ │ │ │ │ "station_id": ["NO0042*"],
16 │ │ │ │ │ │ │ │ "negate": "station_id",
17 │ │ │ │ │ │ │ │}
18 │ │ │ │ │ │ │ │
19 │ │ │ │ │ │ │ │VAR_OUTLIER_RANGES = {
20 │ │ │ │ │ │ │ │ "concpm10": [-1, 5000], # ug m-3
21 │ │ │ │ │ │ │ │ "concpm25": [-1, 5000], # ug m-3
22 │ │ │ │ │ │ │ │ "vmrno2": [-1, 5000], # ppb
23 │ │ │ │ │ │ │ │ "vmro3": [-1, 5000], # ppb
24 │ │ │ │ │ │ │ │}
25 │ │ │ │ │ │ │ │
26 │ │ │ │ │ │ │ │ALTITUDE_FILTER = {"altitude": [0, 1000]}
27 │ │ │ │ │ │ │ │
28 │ │ │ │ │ │ │ │HOSTNAME = socket.gethostname()
29 │ │ │ │ │ │ │ │if HOSTNAME == "pc5654":
30 │ │ │ │ │ │ │ │ preface = "/home/lewisb"
31 │ │ │ │ │ │ │ │else:
32 │ │ │ │ │ │ │ │ preface = ""
33 │ │ │ │ │ │ │ │
34 │ │ │ │ │ │ │ │# Setup for models used in analysis
35 │ │ │ │ │ │ │ │# Empty MODELS dictionary will use a dummy model
36 │ │ │ │ │ │ │ │MODELS = {
37 │ │ │ │ │ │ │ │ # "IFS-OSUITE": dict(
38 │ │ │ │ │ │ │ │ # model_id="ECMWF_OSUITE",
39 │ │ │ │ │ │ │ │ # model_use_vars=dict(ec532aer="ec550aer"),
40 │ │ │ │ │ │ │ │ # ),
41 │ │ │ │ │ │ │ │ "ECMWF-CAMS-REAN": dict(
42 │ │ │ │ │ │ │ │ model_id="ECMWF_CAMS_REAN",
43 │ │ │ │ │ │ │ │ # model_use_vars=dict(ec532aer="ec550aer"),
44 │ │ │ │ │ │ │ │ )
45 │ │ │ │ │ │ │ │}
46 │ │ │ │ │ │ │ │
47 │ │ │ │ │ │ │ │OBS_GROUNDBASED = {
48 │ │ │ │ │ │ │ │ ##################
49 │ │ │ │ │ │ │ │ # AERONET
50 │ │ │ │ │ │ │ │ ##################
51 │ │ │ │ │ │ │ │ "AERONET": dict(
52 │ │ │ │ │ │ │ │ obs_id="AeronetSunV3Lev1.5.daily",
53 │ │ │ │ │ │ │ │ obs_vars=[
54 │ │ │ │ │ │ │ │ # "ang4487aer",
55 │ │ │ │ │ │ │ │ "od550aer",
56 │ │ │ │ │ │ │ │ ],
57 │ │ │ │ │ │ │ │ web_interface_name="AERONET",
58 │ │ │ │ │ │ │ │ obs_vert_type="Column",
59 │ │ │ │ │ │ │ │ ignore_station_names="DRAGON*",
60 │ │ │ │ │ │ │ │ ts_type="daily",
61 │ │ │ │ │ │ │ │ colocate_time=True,
62 │ │ │ │ │ │ │ │ min_num_obs=dict(
63 │ │ │ │ │ │ │ │ yearly=dict(
64 │ │ │ │ │ │ │ │ # monthly=9,
65 │ │ │ │ │ │ │ │ daily=90,
66 │ │ │ │ │ │ │ │ ),
67 │ │ │ │ │ │ │ │ monthly=dict(
68 │ │ │ │ │ │ │ │ weekly=1,
69 │ │ │ │ │ │ │ │ ),
70 │ │ │ │ │ │ │ │ # weekly=dict(
71 │ │ │ │ │ │ │ │ # daily=3,
72 │ │ │ │ │ │ │ │ # ),
73 │ │ │ │ │ │ │ │ ),
74 │ │ │ │ │ │ │ │ # obs_filters=AERONET_FILTER,
75 │ │ │ │ │ │ │ │ ),
76 │ │ │ │ │ │ │ │ "EBAS-tc": dict(
77 │ │ │ │ │ │ │ │ obs_id="EBASMC",
78 │ │ │ │ │ │ │ │ web_interface_name="EBAS",
79 │ │ │ │ │ │ │ │ obs_vars=["concpm10"], # "concpm10", "concpm25", "concso2", "vmrco", "vmrno2"
80 │ │ │ │ │ │ │ │ obs_vert_type="Surface",
81 │ │ │ │ │ │ │ │ ts_type="daily",
82 │ │ │ │ │ │ │ │ colocate_time=True,
83 │ │ │ │ │ │ │ │ # ignore_station_ids=ignore_id_dict,
84 │ │ │ │ │ │ │ │ # obs_filters=EBAS_FILTER,
85 │ │ │ │ │ │ │ │ ),
86 │ │ │ │ │ │ │ │}
87 │ │ │ │ │ │ │ │
88 │ │ │ │ │ │ │ │
89 │ │ │ │ │ │ │ │# Setup for supported satellite evaluations
90 │ │ │ │ │ │ │ │OBS_SAT = {}
91 │ │ │ │ │ │ │ │
92 │ │ │ │ │ │ │ │OBS_CFG = {**OBS_GROUNDBASED, **OBS_SAT}
93 │ │ │ │ │ │ │ │
94 │ │ │ │ │ │ │ │
95 │ │ │ │ │ │ │ │DEFAULT_RESAMPLE_CONSTRAINTS = dict(
96 │ │ │ │ │ │ │ │ yearly=dict(monthly=9),
97 │ │ │ │ │ │ │ │ monthly=dict(
98 │ │ │ │ │ │ │ │ daily=21,
99 │ │ │ │ │ │ │ │ weekly=3,
100 │ │ │ │ │ │ │ │ ),
101 │ │ │ │ │ │ │ │ daily=dict(hourly=1),
102 │ │ │ │ │ │ │ │)
103 │ │ │ │ │ │ │ │
104 │ │ │ │ │ │ │ │CFG = dict(
105 │ │ │ │ │ │ │ │ model_cfg=MODELS,
106 │ │ │ │ │ │ │ │ obs_cfg=OBS_CFG,
107 │ │ │ │ │ │ │ │ obs_only=False,
108 │ │ │ │ │ │ │ │ json_basedir=os.path.abspath("../../data"),
109 │ │ │ │ │ │ │ │ # coldata_basedir = os.path.abspath('../../coldata'),
110 │ │ │ │ │ │ │ │ coldata_basedir=os.path.abspath("../../coldata"),
111 │ │ │ │ │ │ │ │ io_aux_file=os.path.abspath("../eval_py/gridded_io_aux.py"),
112 │ │ │ │ │ │ │ │ # if True, existing colocated data files will be deleted
113 │ │ │ │ │ │ │ │ reanalyse_existing=True,
114 │ │ │ │ │ │ │ │ only_json=False,
115 │ │ │ │ │ │ │ │ add_model_maps=False,
116 │ │ │ │ │ │ │ │ only_model_maps=False,
117 │ │ │ │ │ │ │ │ clear_existing_json=True,
118 │ │ │ │ │ │ │ │ # if True, the analysis will stop whenever an error occurs (else, errors that
119 │ │ │ │ │ │ │ │ # occurred will be written into the logfiles)
120 │ │ │ │ │ │ │ │ raise_exceptions=False,
121 │ │ │ │ │ │ │ │ # Regional filter for analysis
122 │ │ │ │ │ │ │ │ filter_name="ALL-wMOUNTAINS",
123 │ │ │ │ │ │ │ │ # colocation frequency (no statistics in higher resolution can be computed)
124 │ │ │ │ │ │ │ │ ts_type="monthly",
125 │ │ │ │ │ │ │ │ map_zoom="World",
126 │ │ │ │ │ │ │ │ freqs=["daily", "monthly", "yearly"],
127 │ │ │ │ │ │ │ │ periods=[
128 │ │ │ │ │ │ │ │ # "2012",
129 │ │ │ │ │ │ │ │ "2013",
130 │ │ │ │ │ │ │ │ ],
131 │ │ │ │ │ │ │ │ main_freq="monthly",
132 │ │ │ │ │ │ │ │ zeros_to_nan=False,
133 │ │ │ │ │ │ │ │ # diurnal = True,
134 │ │ │ │ │ │ │ │ min_num_obs=DEFAULT_RESAMPLE_CONSTRAINTS,
135 │ │ │ │ │ │ │ │ colocate_time=True,
136 │ │ │ │ │ │ │ │ resample_how={"vmro3": {"daily": {"hourly": "max"}}},
137 │ │ │ │ │ │ │ │ obs_remove_outliers=False,
138 │ │ │ │ │ │ │ │ model_remove_outliers=False,
139 │ │ │ │ │ │ │ │ harmonise_units=True,
140 │ │ │ │ │ │ │ │ regions_how="htap", # "htap", 'default',#'country',
141 │ │ │ │ │ │ │ │ annual_stats_constrained=False, # keep false for earlinet
142 │ │ │ │ │ │ │ │ proj_id="testing",
143 │ │ │ │ │ │ │ │ exp_id="json959",
144 │ │ │ │ │ │ │ │ exp_name="json959",
145 │ │ │ │ │ │ │ │ exp_descr=("Testing JSON libraries"),
146 │ │ │ │ │ │ │ │ exp_pi="Lewis Blake",
147 │ │ │ │ │ │ │ │ public=True,
148 │ │ │ │ │ │ │ │ # directory where colocated data files are supposed to be stored
149 │ │ │ │ │ │ │ │ weighted_stats=True,
150 │ │ │ │ │ │ │ │ obs_only_stats=False,
151 │ │ │ │ │ │ │ │ var_order_menu=["od550aer", "concpm10"],
152 │ │ │ │ │ │ │ │)
153 │ │ │ │ │ │ │ │
154 │ │ │ │ │ │ │ │if __name__ == "__main__":
155 │ │ │ │ 100% │ 30M │▁▁▁ │ 2 │ import matplotlib.pyplot as plt
156 │ │ │ 2% │ 99% │ 86M │▁▁▁▁▁▁▁▁ │ 5 │ import pyaerocom as pya
157 │ │ │ │ │ │ │ │ from pyaerocom import const
158 │ │ │ │ │ │ │ │ from pyaerocom.aeroval import EvalSetup, ExperimentProcessor
159 │ │ │ │ │ │ │ │
160 │ │ │ │ │ │ │ │ plt.close("all")
161 │ │ │ │ │ │ │ │ stp = EvalSetup(**CFG)
162 │ │ │ │ │ │ │ │
163 │ │ │ │ │ │ │ │ ana = ExperimentProcessor(stp)
164 │ │ │ │ │ │ │ │
165 │ 45% │ 43% │ 8% │ 8% │17.47G │▁▁▁▁▁▁▁▁▁ 100% │ 276 │ res = ana.run()
166 │ │ │ │ │ │ │ │
╵ ╵ ╵ ╵ ╵ ╵ ╵ ╵
Top AVERAGE memory consumption, by line:
(1) 165: 17886 MB
Top PEAK memory consumption, by line:
(1) 165: 17886 MB
(2) 156: 86 MB
(3) 155: 30 MB
from pyaerocom.
orjson-based results: 8m:43.674s
Memory usage: ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁ (max: 2.333 GB, growth rate: 0%)
/lustre/storeB/users/lewisb/Python/Evaluations/aeroval/config/config_files/cfg_json_959.py: % of time = 100.00% (8m:43.662s) out of 8m:43.674s.
╷ ╷ ╷ ╷ ╷ ╷ ╷ ╷
│Time │–––––– │–––––– │Memory │–––––– │––––––––––– │Copy │
Line │Python │native │system │Python │peak │timeline/% │(MB/s) │/lustre/storeB/users/lewisb/Python/Evaluations/aeroval/config/config_files/cfg_js…
╺━━━━━━┿━━━━━━━┿━━━━━━━┿━━━━━━━┿━━━━━━━━┿━━━━━━━┿━━━━━━━━━━━━━━━┿━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸
… │ │ │ │ │ │ │ │(lines 1-153: same config listing as in the simplejson run above; no profile samples)
154 │ │ │ │ │ │ │ │if __name__ == "__main__":
155 │ │ │ │ 100% │ 30M │▁▁▁ │ 2 │ import matplotlib.pyplot as plt
156 │ │ │ 4% │ 100% │ 90M │▁▁▁▁▁▁▁▁▁ │ 6 │ import pyaerocom as pya
157 │ │ │ │ │ │ │ │ from pyaerocom import const
158 │ │ │ │ │ │ │ │ from pyaerocom.aeroval import EvalSetup, ExperimentProcessor
159 │ │ │ │ │ │ │ │
160 │ │ │ │ │ │ │ │ plt.close("all")
161 │ │ │ │ │ │ │ │ stp = EvalSetup(**CFG)
162 │ │ │ │ │ │ │ │
163 │ │ │ │ │ │ │ │ ana = ExperimentProcessor(stp)
164 │ │ │ │ │ │ │ │
165 │ 42% │ 38% │ 14% │ 8% │14.87G │▁▁▁▁▁▁▁▁▁ 100% │ 230 │ res = ana.run()
166 │ │ │ │ │ │ │ │
╵ ╵ ╵ ╵ ╵ ╵ ╵ ╵
Top AVERAGE memory consumption, by line:
(1) 165: 15230 MB
Top PEAK memory consumption, by line:
(1) 165: 15230 MB
(2) 156: 90 MB
(3) 155: 30 MB
from pyaerocom.
How much time is spent reading/writing JSON files?
Could you run a benchmark where write_json does nothing?
from pyaerocom.
@heikoklein and I profiled the code using cProfile and then visualized the results using KCacheGrind. The results were interesting: _lowlevel_helpers.py:write_json takes only about 1.4% of the total execution time. The bulk of the execution time is instead spent in colocationdata.py:filter_region and in processing the colocated data objects. We are therefore closing this issue, as the potential optimization is relatively small and in our tests we did not see any benefit from swapping out the libraries. Thanks for the suggestion in any case.
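For anyone who wants to repeat this kind of measurement, a minimal cProfile sketch is shown below. The functions `write_json`, `filter_region` and `run` here are small stand-ins with the same names as in the discussion, not the pyaerocom implementations, so the resulting numbers are meaningless; only the workflow is the point.

```python
import cProfile
import io
import json
import pstats


def write_json(data):
    """Stand-in for the JSON writer."""
    return json.dumps(data)


def filter_region(values):
    """Stand-in for the expensive filtering step."""
    return sorted(v for v in values if v % 3)


def run():
    """Stand-in for one evaluation run: filter many times, then serialize."""
    values = list(range(20_000))
    for _ in range(20):
        kept = filter_region(values)
    write_json(kept)


profiler = cProfile.Profile()
profiler.enable()
run()
profiler.disable()

# Restrict the report to the two functions of interest via a regex.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(r"filter_region|write_json")
report = stream.getvalue()
print(report)
```

Comparing the cumulative time of each function with the total run time gives the same kind of percentage as quoted above.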
from pyaerocom.