Comments (11)

lewisblake commented on August 30, 2024

I will do some research on this: which is faster? How do they deal with nans and numpy arrays? How fast is the encoding (does the library do it all)? Which is better maintained? I will present what I find at the next pyaerocom meeting.

from pyaerocom.

avaldebe commented on August 30, 2024

I think that reading the obs files takes most of the time.
Do you have an example of time reduction?
Also, how is the memory consumption?

AugustinMortier commented on August 30, 2024

No, I don't have any numbers. It would be interesting to profile a standard CAMS run.
The documentation mentions some memory specs here

For info, the documentation has several performance comparisons with other json libraries,
e.g.:
Library | compact (ms) | pretty (ms) | vs. orjson
orjson | 0.03 | 0.04 | 1
ujson | 0.18 | 0.19 | 4.6
rapidjson | 0.1 | 0.12 | 2.9
simplejson | 0.25 | 0.89 | 21.4
json | 0.18 | 0.71 | 17

AugustinMortier commented on August 30, 2024

For example, the CAMS2-82 IFS experiment contains 34,854 files.

jgriesfeller commented on August 30, 2024

There is a reason orjson is faster: for example, it works with byte streams, so it leaves out the time-consuming encoding work. That means it is not a direct replacement for simplejson.
But I noted that the standard json is now faster than simplejson (and should be a direct replacement).
I'm wondering a bit whether the json file writing could be multithreaded. At least for some of the files that should be possible.
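
A minimal sketch of the multithreading idea, using only the standard library. The function names `write_json` and `write_many` and the file names are hypothetical and not part of pyaerocom. It assumes every task writes a different file, so no locking is needed. Threads can overlap the file I/O, but the serialization itself still runs under the GIL, so any gain would have to be measured.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def write_json(path: Path, payload: dict) -> Path:
    """Serialize one payload to one file; each task touches a distinct file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f)
    return path


def write_many(payloads: dict, max_workers: int = 4) -> list:
    """Write independent JSON files concurrently with a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(write_json, p, d) for p, d in payloads.items()]
        # result() re-raises any exception that happened in a worker thread
        return [f.result() for f in futures]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical payloads standing in for per-station output files
        jobs = {
            Path(tmp) / f"station_{i}.json": {"id": i, "values": [i, i + 1]}
            for i in range(10)
        }
        written = write_many(jobs)
        print(len(written), "files written")
```

This only applies to files that are independent of each other; files that are read, updated, and rewritten by several steps would need a different approach.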

lewisblake commented on August 30, 2024

Following up on the questions I raised above:

  1. Which is faster?
    In a number of published benchmarks, orjson is faster than the other json packages compared, including the standard library json. See https://github.com/ijl/orjson and https://showmax.engineering/articles/json-python-libraries-overview

  2. How do they deal with nans and numpy arrays?
    nans: orjson serializes NaNs to null.
    numpy arrays: orjson serializes numpy arrays natively, and its documentation reports this as faster than all other compared libraries.

  3. How fast is the encoding (does the library do it all)?
    The encoding is very fast, and the library seems to do almost everything itself. The one caveat is that it is very strict about UTF-8 conformance for strings.

  4. Which is better maintained?
    orjson is actively maintained.

Based on this, my vote would be to open a PR where we test this library and maybe do some timing studies to see if it helps our performance.
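
A timing study along those lines could start from something like the sketch below. It is a rough harness, not a benchmark of pyaerocom itself: the helper names `nan_to_none` and `time_encoders` and the sample payload are made up for illustration, and orjson is treated as optional so the script still runs where it is not installed. Because the standard library writes the non-standard token `NaN` by default, the stdlib timing includes a pass that converts NaN to `None` first, which makes the comparison closer to what orjson does but also adds overhead to the stdlib side.

```python
import json
import math
import timeit

try:  # orjson is optional here; the sketch still runs without it
    import orjson
except ImportError:
    orjson = None


def nan_to_none(obj):
    """Recursively replace float NaN with None so the stdlib emits null."""
    if isinstance(obj, float) and math.isnan(obj):
        return None
    if isinstance(obj, dict):
        return {k: nan_to_none(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [nan_to_none(v) for v in obj]
    return obj


def time_encoders(payload, number=200):
    """Return the seconds each available encoder needs for `number` dumps."""
    results = {
        "json": timeit.timeit(
            lambda: json.dumps(nan_to_none(payload)), number=number
        )
    }
    if orjson is not None:
        # orjson.dumps returns bytes, not str
        results["orjson"] = timeit.timeit(
            lambda: orjson.dumps(payload), number=number
        )
    return results


if __name__ == "__main__":
    # Hypothetical payload: a time series with gaps stored as NaN
    sample = {"station": "A", "obs": [1.0, float("nan"), 3.0] * 1000}
    print(json.dumps({"x": float("nan")}))               # stdlib default: NaN token
    print(json.dumps(nan_to_none({"x": float("nan")})))  # after conversion: null
    for name, seconds in time_encoders(sample).items():
        print(f"{name}: {seconds:.4f} s")
```

The numbers from such a micro-benchmark only cover encoding. Whether they translate into a shorter Aeroval run depends on how much of the runtime is spent in serialization rather than in file I/O.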

lewisblake commented on August 30, 2024

Meeting minutes 22/1/24

@avaldebe :
More generally speaking, a context manager could relieve many of the inefficiencies in how we produce the json files for Aeroval experiments. We currently read and overwrite the same files thousands of times, which is very slow. A context manager would allow us to write each file only once: every time we call it, it checks the payload currently held for the "json file", adds the output if it is not already there, and then writes the file once upon exiting the context manager. This is probably a bigger rewrite.
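
One way the context manager idea could look, as a minimal sketch using only the standard library. The names `json_accumulator` and `add_result` are hypothetical and do not exist in pyaerocom, and the payload is assumed to be a flat dictionary keyed by result name, which is a simplification of the real Aeroval files.

```python
import json
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def json_accumulator(path):
    """Load a JSON file once, collect updates in memory, write once on exit."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            payload = json.load(f)
    else:
        payload = {}
    yield payload  # callers update this dict instead of re-reading the file
    # Reached only if the with-block finished without an exception
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f)


def add_result(payload, key, value):
    """Add an entry only if it is not already present; return True if added."""
    if key in payload:
        return False
    payload[key] = value
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.json")  # hypothetical output file
        with json_accumulator(path) as payload:
            for i in range(1000):
                add_result(payload, f"entry_{i % 10}", {"value": i % 10})
        with open(path, encoding="utf-8") as f:
            print(len(json.load(f)), "entries written in a single write")
```

In this sketch the file is not written if the body of the `with` block raises, so a failed run leaves the previous file untouched. Whether that is the desired behaviour for Aeroval is an open design choice.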

Smarter idea for longer-term development: use a small sqlite database to store the intermediate evaluation results, and then query the database to create the json files as a final step, so that each file is written only once.

Iterative development approach: start by writing the context manager that handles the writing of the jsons. Once that is up and working, change what the context manager does on exit so that it writes to the sqlite database (faster than writing to the jsons), and then add a final step that queries the sqlite database and writes each json file once. sqlite is a relational database, so some work is needed to figure out how to map the json files onto this structure. We want an embedded database.
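
A minimal sketch of the sqlite idea with the standard library `sqlite3` module. The table layout (one row per output file and entry, with the value stored as a JSON string) and the names `open_store`, `put_result`, and `export_file` are assumptions made for illustration. How the real Aeroval json structures would map onto tables is exactly the open question noted above.

```python
import json
import sqlite3


def open_store(db_path=":memory:"):
    """Open (or create) the embedded database that holds intermediate results."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        " file_name TEXT NOT NULL,"
        " entry_key TEXT NOT NULL,"
        " payload TEXT NOT NULL,"
        " PRIMARY KEY (file_name, entry_key))"
    )
    return conn


def put_result(conn, file_name, entry_key, value):
    """Insert or overwrite one intermediate result, stored as a JSON string."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO results VALUES (?, ?, ?)",
            (file_name, entry_key, json.dumps(value)),
        )


def export_file(conn, file_name):
    """Assemble the final document for one output file from its rows."""
    rows = conn.execute(
        "SELECT entry_key, payload FROM results"
        " WHERE file_name = ? ORDER BY entry_key",
        (file_name,),
    )
    return {key: json.loads(payload) for key, payload in rows}


if __name__ == "__main__":
    conn = open_store()
    # Hypothetical file and entry names
    put_result(conn, "ts.json", "station_a", {"mean": 1.5})
    put_result(conn, "ts.json", "station_b", {"mean": 2.5})
    put_result(conn, "ts.json", "station_a", {"mean": 1.7})  # overwrites the row
    print(json.dumps(export_file(conn, "ts.json")))
    conn.close()
```

The final step would call `export_file` once per output file and write the result to disk, so each json file is written a single time.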

@jgriesfeller : Just changing the json library might not result in a shorter runtime, because the largest bottleneck we face is the I/O.

lewisblake commented on August 30, 2024

simplejson results, profiled using scalene.

The test config file is /lustre/storeB/users/lewisb/Python/Evaluations/aeroval/config/config_files/cfg_json_959.py on the blake_dev branch of the config repository. This experiment does not include the time needed for reading data. In all tests a collocated netcdf file is available and picked up by the code, so these timings are not affected by the time needed to read data.

With simplejson it took 6m:24.746s.

                                                  Memory usage: ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ (max: 2.322 GB, growth rate:   0%)                                                   
                 /lustre/storeB/users/lewisb/Python/Evaluations/aeroval/config/config_files/cfg_json_959.py: % of time =  99.99% (6m:24.720s) out of 6m:24.746s.                 
       ╷       ╷       ╷       ╷        ╷       ╷               ╷       ╷                                                                                                        
       │Time   │–––––– │–––––– │Memory  │–––––– │–––––––––––    │Copy   │                                                                                                        
  Line │Python │native │system │Python  │peak   │timeline/%     │(MB/s) │/lustre/storeB/users/lewisb/Python/Evaluations/aeroval/config/config_files/cfg_json_959.py              
╺━━━━━━┿━━━━━━━┿━━━━━━━┿━━━━━━━┿━━━━━━━━┿━━━━━━━┿━━━━━━━━━━━━━━━┿━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸
     1 │       │       │       │        │       │               │       │#!/usr/bin/env python3                                                                                  
     2 │       │       │       │        │       │               │       │# -*- coding: utf-8 -*-                                                                                 
     3 │       │       │       │        │       │               │       │"""                                                                                                     
     4 │       │       │       │        │       │               │       │Config file for AeroCom PhaseIII optical properties experiment                                          
     5 │       │       │       │        │       │               │       │"""                                                                                                     
     6 │       │       │       │        │       │               │       │import os                                                                                               
     7 │       │       │       │        │       │               │       │import socket                                                                                           
     8 │       │       │       │        │       │               │       │                                                                                                        
     9 │       │       │       │        │       │               │       │### Define filters for the obs subsets                                                                  
    10 │       │       │       │        │       │               │       │                                                                                                        
    11 │       │       │       │        │       │               │       │                                                                                                        
    12 │       │       │       │        │       │               │       │BASE_FILTER = {                                                                                         
    13 │       │       │       │        │       │               │       │    "latitude": [-90, 90],                                                                              
    14 │       │       │       │        │       │               │       │    "longitude": [-180, 180],                                                                           
    15 │       │       │       │        │       │               │       │    "station_id": ["NO0042*"],                                                                          
    16 │       │       │       │        │       │               │       │    "negate": "station_id",                                                                             
    17 │       │       │       │        │       │               │       │}                                                                                                       
    18 │       │       │       │        │       │               │       │                                                                                                        
    19 │       │       │       │        │       │               │       │VAR_OUTLIER_RANGES = {                                                                                  
    20 │       │       │       │        │       │               │       │    "concpm10": [-1, 5000],  # ug m-3                                                                   
    21 │       │       │       │        │       │               │       │    "concpm25": [-1, 5000],  # ug m-3                                                                   
    22 │       │       │       │        │       │               │       │    "vmrno2": [-1, 5000],  # ppb                                                                        
    23 │       │       │       │        │       │               │       │    "vmro3": [-1, 5000],  # ppb                                                                         
    24 │       │       │       │        │       │               │       │}                                                                                                       
    25 │       │       │       │        │       │               │       │                                                                                                        
    26 │       │       │       │        │       │               │       │ALTITUDE_FILTER = {"altitude": [0, 1000]}                                                               
    27 │       │       │       │        │       │               │       │                                                                                                        
    28 │       │       │       │        │       │               │       │HOSTNAME = socket.gethostname()                                                                         
    29 │       │       │       │        │       │               │       │if HOSTNAME == "pc5654":                                                                                
    30 │       │       │       │        │       │               │       │    preface = "/home/lewisb"                                                                            
    31 │       │       │       │        │       │               │       │else:                                                                                                   
    32 │       │       │       │        │       │               │       │    preface = ""                                                                                        
    33 │       │       │       │        │       │               │       │                                                                                                        
    34 │       │       │       │        │       │               │       │# Setup for models used in analysis                                                                     
    35 │       │       │       │        │       │               │       │# Empty MODELS dictionary will use a dummy model                                                        
    36 │       │       │       │        │       │               │       │MODELS = {                                                                                              
    37 │       │       │       │        │       │               │       │    # "IFS-OSUITE": dict(                                                                               
    38 │       │       │       │        │       │               │       │    #     model_id="ECMWF_OSUITE",                                                                      
    39 │       │       │       │        │       │               │       │    #     model_use_vars=dict(ec532aer="ec550aer"),                                                     
    40 │       │       │       │        │       │               │       │    # ),                                                                                                
    41 │       │       │       │        │       │               │       │    "ECMWF-CAMS-REAN": dict(                                                                            
    42 │       │       │       │        │       │               │       │        model_id="ECMWF_CAMS_REAN",                                                                     
    43 │       │       │       │        │       │               │       │        # model_use_vars=dict(ec532aer="ec550aer"),                                                     
    44 │       │       │       │        │       │               │       │    )                                                                                                   
    45 │       │       │       │        │       │               │       │}                                                                                                       
    46 │       │       │       │        │       │               │       │                                                                                                        
    47 │       │       │       │        │       │               │       │OBS_GROUNDBASED = {                                                                                     
    48 │       │       │       │        │       │               │       │    ##################                                                                                  
    49 │       │       │       │        │       │               │       │    #    AERONET                                                                                        
    50 │       │       │       │        │       │               │       │    ##################                                                                                  
    51 │       │       │       │        │       │               │       │    "AERONET": dict(                                                                                    
    52 │       │       │       │        │       │               │       │        obs_id="AeronetSunV3Lev1.5.daily",                                                              
    53 │       │       │       │        │       │               │       │        obs_vars=[                                                                                      
    54 │       │       │       │        │       │               │       │            # "ang4487aer",                                                                             
    55 │       │       │       │        │       │               │       │            "od550aer",                                                                                 
    56 │       │       │       │        │       │               │       │        ],                                                                                              
    57 │       │       │       │        │       │               │       │        web_interface_name="AERONET",                                                                   
    58 │       │       │       │        │       │               │       │        obs_vert_type="Column",                                                                         
    59 │       │       │       │        │       │               │       │        ignore_station_names="DRAGON*",                                                                 
    60 │       │       │       │        │       │               │       │        ts_type="daily",                                                                                
    61 │       │       │       │        │       │               │       │        colocate_time=True,                                                                             
    62 │       │       │       │        │       │               │       │        min_num_obs=dict(                                                                               
    63 │       │       │       │        │       │               │       │            yearly=dict(                                                                                
    64 │       │       │       │        │       │               │       │                # monthly=9,                                                                            
    65 │       │       │       │        │       │               │       │                daily=90,                                                                               
    66 │       │       │       │        │       │               │       │            ),                                                                                          
    67 │       │       │       │        │       │               │       │            monthly=dict(                                                                               
    68 │       │       │       │        │       │               │       │                weekly=1,                                                                               
    69 │       │       │       │        │       │               │       │            ),                                                                                          
    70 │       │       │       │        │       │               │       │            # weekly=dict(                                                                              
    71 │       │       │       │        │       │               │       │            #     daily=3,                                                                              
    72 │       │       │       │        │       │               │       │            # ),                                                                                        
    73 │       │       │       │        │       │               │       │        ),                                                                                              
    74 │       │       │       │        │       │               │       │        # obs_filters=AERONET_FILTER,                                                                   
    75 │       │       │       │        │       │               │       │    ),                                                                                                  
    76 │       │       │       │        │       │               │       │    "EBAS-tc": dict(                                                                                    
    77 │       │       │       │        │       │               │       │        obs_id="EBASMC",                                                                                
    78 │       │       │       │        │       │               │       │        web_interface_name="EBAS",                                                                      
    79 │       │       │       │        │       │               │       │        obs_vars=["concpm10"],  # "concpm10", "concpm25", "concso2", "vmrco", "vmrno2"                  
    80 │       │       │       │        │       │               │       │        obs_vert_type="Surface",                                                                        
    81 │       │       │       │        │       │               │       │        ts_type="daily",                                                                                
    82 │       │       │       │        │       │               │       │        colocate_time=True,                                                                             
    83 │       │       │       │        │       │               │       │        # ignore_station_ids=ignore_id_dict,                                                            
    84 │       │       │       │        │       │               │       │        # obs_filters=EBAS_FILTER,                                                                      
    85 │       │       │       │        │       │               │       │    ),                                                                                                  
    86 │       │       │       │        │       │               │       │}                                                                                                       
    87 │       │       │       │        │       │               │       │                                                                                                        
    88 │       │       │       │        │       │               │       │                                                                                                        
    89 │       │       │       │        │       │               │       │# Setup for supported satellite evaluations                                                             
    90 │       │       │       │        │       │               │       │OBS_SAT = {}                                                                                            
    91 │       │       │       │        │       │               │       │                                                                                                        
    92 │       │       │       │        │       │               │       │OBS_CFG = {**OBS_GROUNDBASED, **OBS_SAT}                                                                
    93 │       │       │       │        │       │               │       │                                                                                                        
    94 │       │       │       │        │       │               │       │                                                                                                        
    95 │       │       │       │        │       │               │       │DEFAULT_RESAMPLE_CONSTRAINTS = dict(                                                                    
    96 │       │       │       │        │       │               │       │    yearly=dict(monthly=9),                                                                             
    97 │       │       │       │        │       │               │       │    monthly=dict(                                                                                       
    98 │       │       │       │        │       │               │       │        daily=21,                                                                                       
    99 │       │       │       │        │       │               │       │        weekly=3,                                                                                       
   100 │       │       │       │        │       │               │       │    ),                                                                                                  
   101 │       │       │       │        │       │               │       │    daily=dict(hourly=1),                                                                               
   102 │       │       │       │        │       │               │       │)                                                                                                       
   103 │       │       │       │        │       │               │       │                                                                                                        
   104 │       │       │       │        │       │               │       │CFG = dict(                                                                                             
   105 │       │       │       │        │       │               │       │    model_cfg=MODELS,                                                                                   
   106 │       │       │       │        │       │               │       │    obs_cfg=OBS_CFG,                                                                                    
   107 │       │       │       │        │       │               │       │    obs_only=False,                                                                                     
   108 │       │       │       │        │       │               │       │    json_basedir=os.path.abspath("../../data"),                                                         
   109 │       │       │       │        │       │               │       │    # coldata_basedir = os.path.abspath('../../coldata'),                                               
   110 │       │       │       │        │       │               │       │    coldata_basedir=os.path.abspath("../../coldata"),                                                   
   111 │       │       │       │        │       │               │       │    io_aux_file=os.path.abspath("../eval_py/gridded_io_aux.py"),                                        
   112 │       │       │       │        │       │               │       │    # if True, existing colocated data files will be deleted                                            
   113 │       │       │       │        │       │               │       │    reanalyse_existing=True,                                                                            
   114 │       │       │       │        │       │               │       │    only_json=False,                                                                                    
   115 │       │       │       │        │       │               │       │    add_model_maps=False,                                                                               
   116 │       │       │       │        │       │               │       │    only_model_maps=False,                                                                              
   117 │       │       │       │        │       │               │       │    clear_existing_json=True,                                                                           
   118 │       │       │       │        │       │               │       │    # if True, the analysis will stop whenever an error occurs (else, errors that                       
   119 │       │       │       │        │       │               │       │    # occurred will be written into the logfiles)                                                       
   120 │       │       │       │        │       │               │       │    raise_exceptions=False,                                                                             
   121 │       │       │       │        │       │               │       │    # Regional filter for analysis                                                                      
   122 │       │       │       │        │       │               │       │    filter_name="ALL-wMOUNTAINS",                                                                       
   123 │       │       │       │        │       │               │       │    # colocation frequency (no statistics in higher resolution can be computed)                         
   124 │       │       │       │        │       │               │       │    ts_type="monthly",                                                                                  
   125 │       │       │       │        │       │               │       │    map_zoom="World",                                                                                   
   126 │       │       │       │        │       │               │       │    freqs=["daily", "monthly", "yearly"],                                                               
   127 │       │       │       │        │       │               │       │    periods=[                                                                                           
   128 │       │       │       │        │       │               │       │        # "2012",                                                                                       
   129 │       │       │       │        │       │               │       │        "2013",                                                                                         
   130 │       │       │       │        │       │               │       │    ],                                                                                                  
   131 │       │       │       │        │       │               │       │    main_freq="monthly",                                                                                
   132 │       │       │       │        │       │               │       │    zeros_to_nan=False,                                                                                 
   133 │       │       │       │        │       │               │       │    # diurnal = True,                                                                                   
   134 │       │       │       │        │       │               │       │    min_num_obs=DEFAULT_RESAMPLE_CONSTRAINTS,                                                           
   135 │       │       │       │        │       │               │       │    colocate_time=True,                                                                                 
   136 │       │       │       │        │       │               │       │    resample_how={"vmro3": {"daily": {"hourly": "max"}}},                                               
   137 │       │       │       │        │       │               │       │    obs_remove_outliers=False,                                                                          
   138 │       │       │       │        │       │               │       │    model_remove_outliers=False,                                                                        
   139 │       │       │       │        │       │               │       │    harmonise_units=True,                                                                               
   140 │       │       │       │        │       │               │       │    regions_how="htap",  # "htap", 'default',#'country',                                                
   141 │       │       │       │        │       │               │       │    annual_stats_constrained=False,  # keep false for earlinet                                          
   142 │       │       │       │        │       │               │       │    proj_id="testing",                                                                                  
   143 │       │       │       │        │       │               │       │    exp_id="json959",                                                                                   
   144 │       │       │       │        │       │               │       │    exp_name="json959",                                                                                 
   145 │       │       │       │        │       │               │       │    exp_descr=("Testing JSON libraries"),                                                               
   146 │       │       │       │        │       │               │       │    exp_pi="Lewis Blake",                                                                               
   147 │       │       │       │        │       │               │       │    public=True,                                                                                        
   148 │       │       │       │        │       │               │       │    # directory where colocated data files are supposed to be stored                                    
   149 │       │       │       │        │       │               │       │    weighted_stats=True,                                                                                
   150 │       │       │       │        │       │               │       │    obs_only_stats=False,                                                                               
   151 │       │       │       │        │       │               │       │    var_order_menu=["od550aer", "concpm10"],                                                            
   152 │       │       │       │        │       │               │       │)                                                                                                       
   153 │       │       │       │        │       │               │       │                                                                                                        
   154 │       │       │       │        │       │               │       │if __name__ == "__main__":                                                                              
   155 │       │       │       │ 100%   │   30M │▁▁▁            │     2 │    import matplotlib.pyplot as plt                                                                     
   156 │       │       │   2%  │  99%   │   86M │▁▁▁▁▁▁▁▁       │     5 │    import pyaerocom as pya                                                                             
   157 │       │       │       │        │       │               │       │    from pyaerocom import const                                                                         
   158 │       │       │       │        │       │               │       │    from pyaerocom.aeroval import EvalSetup, ExperimentProcessor                                        
   159 │       │       │       │        │       │               │       │                                                                                                        
   160 │       │       │       │        │       │               │       │    plt.close("all")                                                                                    
   161 │       │       │       │        │       │               │       │    stp = EvalSetup(**CFG)                                                                              
   162 │       │       │       │        │       │               │       │                                                                                                        
   163 │       │       │       │        │       │               │       │    ana = ExperimentProcessor(stp)                                                                      
   164 │       │       │       │        │       │               │       │                                                                                                        
   165 │   45% │   43% │   8%  │   8%   │17.47G │▁▁▁▁▁▁▁▁▁ 100% │   276 │    res = ana.run()                                                                                     
   166 │       │       │       │        │       │               │       │                                                                                                        
       ╵       ╵       ╵       ╵        ╵       ╵               ╵       ╵                                                                                                        
Top AVERAGE memory consumption, by line:
(1)   165: 17886 MB                                                                                                                                                               
Top PEAK memory consumption, by line:
(1)   165: 17886 MB                                                                                                                                                               
(2)   156:    86 MB                                                                                                                                                               
(3)   155:    30 MB                                                        


lewisblake commented on August 30, 2024

orjson-based results: 8m:43.674s

                                        Memory usage: ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁ (max: 2.333 GB, growth rate:   0%)                                         
       /lustre/storeB/users/lewisb/Python/Evaluations/aeroval/config/config_files/cfg_json_959.py: % of time = 100.00% (8m:43.662s) out of 8m:43.674s.       
       ╷       ╷       ╷       ╷        ╷       ╷               ╷       ╷                                                                                    
       │Time   │–––––– │–––––– │Memory  │–––––– │–––––––––––    │Copy   │                                                                                    
  Line │Python │native │system │Python  │peak   │timeline/%     │(MB/s) │/lustre/storeB/users/lewisb/Python/Evaluations/aeroval/config/config_files/cfg_js…  
╺━━━━━━┿━━━━━━━┿━━━━━━━┿━━━━━━━┿━━━━━━━━┿━━━━━━━┿━━━━━━━━━━━━━━━┿━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸
     1 │       │       │       │        │       │               │       │#!/usr/bin/env python3                                                              
     2 │       │       │       │        │       │               │       │# -*- coding: utf-8 -*-                                                             
     3 │       │       │       │        │       │               │       │"""                                                                                 
     4 │       │       │       │        │       │               │       │Config file for AeroCom PhaseIII optical properties experiment                      
     5 │       │       │       │        │       │               │       │"""                                                                                 
     6 │       │       │       │        │       │               │       │import os                                                                           
     7 │       │       │       │        │       │               │       │import socket                                                                       
     8 │       │       │       │        │       │               │       │                                                                                    
     9 │       │       │       │        │       │               │       │### Define filters for the obs subsets                                              
    10 │       │       │       │        │       │               │       │                                                                                    
    11 │       │       │       │        │       │               │       │                                                                                    
    12 │       │       │       │        │       │               │       │BASE_FILTER = {                                                                     
    13 │       │       │       │        │       │               │       │    "latitude": [-90, 90],                                                          
    14 │       │       │       │        │       │               │       │    "longitude": [-180, 180],                                                       
    15 │       │       │       │        │       │               │       │    "station_id": ["NO0042*"],                                                      
    16 │       │       │       │        │       │               │       │    "negate": "station_id",                                                         
    17 │       │       │       │        │       │               │       │}                                                                                   
    18 │       │       │       │        │       │               │       │                                                                                    
    19 │       │       │       │        │       │               │       │VAR_OUTLIER_RANGES = {                                                              
    20 │       │       │       │        │       │               │       │    "concpm10": [-1, 5000],  # ug m-3                                               
    21 │       │       │       │        │       │               │       │    "concpm25": [-1, 5000],  # ug m-3                                               
    22 │       │       │       │        │       │               │       │    "vmrno2": [-1, 5000],  # ppb                                                    
    23 │       │       │       │        │       │               │       │    "vmro3": [-1, 5000],  # ppb                                                     
    24 │       │       │       │        │       │               │       │}                                                                                   
    25 │       │       │       │        │       │               │       │                                                                                    
    26 │       │       │       │        │       │               │       │ALTITUDE_FILTER = {"altitude": [0, 1000]}                                           
    27 │       │       │       │        │       │               │       │                                                                                    
    28 │       │       │       │        │       │               │       │HOSTNAME = socket.gethostname()                                                     
    29 │       │       │       │        │       │               │       │if HOSTNAME == "pc5654":                                                            
    30 │       │       │       │        │       │               │       │    preface = "/home/lewisb"                                                        
    31 │       │       │       │        │       │               │       │else:                                                                               
    32 │       │       │       │        │       │               │       │    preface = ""                                                                    
    33 │       │       │       │        │       │               │       │                                                                                    
    34 │       │       │       │        │       │               │       │# Setup for models used in analysis                                                 
    35 │       │       │       │        │       │               │       │# Empty MODELS dictionary will use a dummy model                                    
    36 │       │       │       │        │       │               │       │MODELS = {                                                                          
    37 │       │       │       │        │       │               │       │    # "IFS-OSUITE": dict(                                                           
    38 │       │       │       │        │       │               │       │    #     model_id="ECMWF_OSUITE",                                                  
    39 │       │       │       │        │       │               │       │    #     model_use_vars=dict(ec532aer="ec550aer"),                                 
    40 │       │       │       │        │       │               │       │    # ),                                                                            
    41 │       │       │       │        │       │               │       │    "ECMWF-CAMS-REAN": dict(                                                        
    42 │       │       │       │        │       │               │       │        model_id="ECMWF_CAMS_REAN",                                                 
    43 │       │       │       │        │       │               │       │        # model_use_vars=dict(ec532aer="ec550aer"),                                 
    44 │       │       │       │        │       │               │       │    )                                                                               
    45 │       │       │       │        │       │               │       │}                                                                                   
    46 │       │       │       │        │       │               │       │                                                                                    
    47 │       │       │       │        │       │               │       │OBS_GROUNDBASED = {                                                                 
    48 │       │       │       │        │       │               │       │    ##################                                                              
    49 │       │       │       │        │       │               │       │    #    AERONET                                                                    
    50 │       │       │       │        │       │               │       │    ##################                                                              
    51 │       │       │       │        │       │               │       │    "AERONET": dict(                                                                
    52 │       │       │       │        │       │               │       │        obs_id="AeronetSunV3Lev1.5.daily",                                          
    53 │       │       │       │        │       │               │       │        obs_vars=[                                                                  
    54 │       │       │       │        │       │               │       │            # "ang4487aer",                                                         
    55 │       │       │       │        │       │               │       │            "od550aer",                                                             
    56 │       │       │       │        │       │               │       │        ],                                                                          
    57 │       │       │       │        │       │               │       │        web_interface_name="AERONET",                                               
    58 │       │       │       │        │       │               │       │        obs_vert_type="Column",                                                     
    59 │       │       │       │        │       │               │       │        ignore_station_names="DRAGON*",                                             
    60 │       │       │       │        │       │               │       │        ts_type="daily",                                                            
    61 │       │       │       │        │       │               │       │        colocate_time=True,                                                         
    62 │       │       │       │        │       │               │       │        min_num_obs=dict(                                                           
    63 │       │       │       │        │       │               │       │            yearly=dict(                                                            
    64 │       │       │       │        │       │               │       │                # monthly=9,                                                        
    65 │       │       │       │        │       │               │       │                daily=90,                                                           
    66 │       │       │       │        │       │               │       │            ),                                                                      
    67 │       │       │       │        │       │               │       │            monthly=dict(                                                           
    68 │       │       │       │        │       │               │       │                weekly=1,                                                           
    69 │       │       │       │        │       │               │       │            ),                                                                      
    70 │       │       │       │        │       │               │       │            # weekly=dict(                                                          
    71 │       │       │       │        │       │               │       │            #     daily=3,                                                          
    72 │       │       │       │        │       │               │       │            # ),                                                                    
    73 │       │       │       │        │       │               │       │        ),                                                                          
    74 │       │       │       │        │       │               │       │        # obs_filters=AERONET_FILTER,                                               
    75 │       │       │       │        │       │               │       │    ),                                                                              
    76 │       │       │       │        │       │               │       │    "EBAS-tc": dict(                                                                
    77 │       │       │       │        │       │               │       │        obs_id="EBASMC",                                                            
    78 │       │       │       │        │       │               │       │        web_interface_name="EBAS",                                                  
    79 │       │       │       │        │       │               │       │        obs_vars=["concpm10"],  # "concpm10", "concpm25", "concso2", "vmrco", "vmr  
    80 │       │       │       │        │       │               │       │        obs_vert_type="Surface",                                                    
    81 │       │       │       │        │       │               │       │        ts_type="daily",                                                            
    82 │       │       │       │        │       │               │       │        colocate_time=True,                                                         
    83 │       │       │       │        │       │               │       │        # ignore_station_ids=ignore_id_dict,                                        
    84 │       │       │       │        │       │               │       │        # obs_filters=EBAS_FILTER,                                                  
    85 │       │       │       │        │       │               │       │    ),                                                                              
    86 │       │       │       │        │       │               │       │}                                                                                   
    87 │       │       │       │        │       │               │       │                                                                                    
    88 │       │       │       │        │       │               │       │                                                                                    
    89 │       │       │       │        │       │               │       │# Setup for supported satellite evaluations                                         
    90 │       │       │       │        │       │               │       │OBS_SAT = {}                                                                        
    91 │       │       │       │        │       │               │       │                                                                                    
    92 │       │       │       │        │       │               │       │OBS_CFG = {**OBS_GROUNDBASED, **OBS_SAT}                                            
    93 │       │       │       │        │       │               │       │                                                                                    
    94 │       │       │       │        │       │               │       │                                                                                    
    95 │       │       │       │        │       │               │       │DEFAULT_RESAMPLE_CONSTRAINTS = dict(                                                
    96 │       │       │       │        │       │               │       │    yearly=dict(monthly=9),                                                         
    97 │       │       │       │        │       │               │       │    monthly=dict(                                                                   
    98 │       │       │       │        │       │               │       │        daily=21,                                                                   
    99 │       │       │       │        │       │               │       │        weekly=3,                                                                   
   100 │       │       │       │        │       │               │       │    ),                                                                              
   101 │       │       │       │        │       │               │       │    daily=dict(hourly=1),                                                           
   102 │       │       │       │        │       │               │       │)                                                                                   
   103 │       │       │       │        │       │               │       │                                                                                    
   104 │       │       │       │        │       │               │       │CFG = dict(                                                                         
   105 │       │       │       │        │       │               │       │    model_cfg=MODELS,                                                               
   106 │       │       │       │        │       │               │       │    obs_cfg=OBS_CFG,                                                                
   107 │       │       │       │        │       │               │       │    obs_only=False,                                                                 
   108 │       │       │       │        │       │               │       │    json_basedir=os.path.abspath("../../data"),                                     
   109 │       │       │       │        │       │               │       │    # coldata_basedir = os.path.abspath('../../coldata'),                           
   110 │       │       │       │        │       │               │       │    coldata_basedir=os.path.abspath("../../coldata"),                               
   111 │       │       │       │        │       │               │       │    io_aux_file=os.path.abspath("../eval_py/gridded_io_aux.py"),                    
   112 │       │       │       │        │       │               │       │    # if True, existing colocated data files will be deleted                        
   113 │       │       │       │        │       │               │       │    reanalyse_existing=True,                                                        
   114 │       │       │       │        │       │               │       │    only_json=False,                                                                
   115 │       │       │       │        │       │               │       │    add_model_maps=False,                                                           
   116 │       │       │       │        │       │               │       │    only_model_maps=False,                                                          
   117 │       │       │       │        │       │               │       │    clear_existing_json=True,                                                       
   118 │       │       │       │        │       │               │       │    # if True, the analysis will stop whenever an error occurs (else, errors that   
   119 │       │       │       │        │       │               │       │    # occurred will be written into the logfiles)                                   
   120 │       │       │       │        │       │               │       │    raise_exceptions=False,                                                         
   121 │       │       │       │        │       │               │       │    # Regional filter for analysis                                                  
   122 │       │       │       │        │       │               │       │    filter_name="ALL-wMOUNTAINS",                                                   
   123 │       │       │       │        │       │               │       │    # colocation frequency (no statistics in higher resolution can be computed)     
   124 │       │       │       │        │       │               │       │    ts_type="monthly",                                                              
   125 │       │       │       │        │       │               │       │    map_zoom="World",                                                               
   126 │       │       │       │        │       │               │       │    freqs=["daily", "monthly", "yearly"],                                           
   127 │       │       │       │        │       │               │       │    periods=[                                                                       
   128 │       │       │       │        │       │               │       │        # "2012",                                                                   
   129 │       │       │       │        │       │               │       │        "2013",                                                                     
   130 │       │       │       │        │       │               │       │    ],                                                                              
   131 │       │       │       │        │       │               │       │    main_freq="monthly",                                                            
   132 │       │       │       │        │       │               │       │    zeros_to_nan=False,                                                             
   133 │       │       │       │        │       │               │       │    # diurnal = True,                                                               
   134 │       │       │       │        │       │               │       │    min_num_obs=DEFAULT_RESAMPLE_CONSTRAINTS,                                       
   135 │       │       │       │        │       │               │       │    colocate_time=True,                                                             
   136 │       │       │       │        │       │               │       │    resample_how={"vmro3": {"daily": {"hourly": "max"}}},                           
   137 │       │       │       │        │       │               │       │    obs_remove_outliers=False,                                                      
   138 │       │       │       │        │       │               │       │    model_remove_outliers=False,                                                    
   139 │       │       │       │        │       │               │       │    harmonise_units=True,                                                           
   140 │       │       │       │        │       │               │       │    regions_how="htap",  # "htap", 'default',#'country',                            
   141 │       │       │       │        │       │               │       │    annual_stats_constrained=False,  # keep false for earlinet                      
   142 │       │       │       │        │       │               │       │    proj_id="testing",                                                              
   143 │       │       │       │        │       │               │       │    exp_id="json959",                                                               
   144 │       │       │       │        │       │               │       │    exp_name="json959",                                                             
   145 │       │       │       │        │       │               │       │    exp_descr=("Testing JSON libraries"),                                           
   146 │       │       │       │        │       │               │       │    exp_pi="Lewis Blake",                                                           
   147 │       │       │       │        │       │               │       │    public=True,                                                                    
   148 │       │       │       │        │       │               │       │    # directory where colocated data files are supposed to be stored                
   149 │       │       │       │        │       │               │       │    weighted_stats=True,                                                            
   150 │       │       │       │        │       │               │       │    obs_only_stats=False,                                                           
   151 │       │       │       │        │       │               │       │    var_order_menu=["od550aer", "concpm10"],                                        
   152 │       │       │       │        │       │               │       │)                                                                                   
   153 │       │       │       │        │       │               │       │                                                                                    
   154 │       │       │       │        │       │               │       │if __name__ == "__main__":                                                          
   155 │       │       │       │ 100%   │   30M │▁▁▁            │     2 │    import matplotlib.pyplot as plt                                                 
   156 │       │       │   4%  │ 100%   │   90M │▁▁▁▁▁▁▁▁▁      │     6 │    import pyaerocom as pya                                                         
   157 │       │       │       │        │       │               │       │    from pyaerocom import const                                                     
   158 │       │       │       │        │       │               │       │    from pyaerocom.aeroval import EvalSetup, ExperimentProcessor                    
   159 │       │       │       │        │       │               │       │                                                                                    
   160 │       │       │       │        │       │               │       │    plt.close("all")                                                                
   161 │       │       │       │        │       │               │       │    stp = EvalSetup(**CFG)                                                          
   162 │       │       │       │        │       │               │       │                                                                                    
   163 │       │       │       │        │       │               │       │    ana = ExperimentProcessor(stp)                                                  
   164 │       │       │       │        │       │               │       │                                                                                    
   165 │   42% │   38% │  14%  │   8%   │14.87G │▁▁▁▁▁▁▁▁▁ 100% │   230 │    res = ana.run()                                                                 
   166 │       │       │       │        │       │               │       │                                                                                    
       ╵       ╵       ╵       ╵        ╵       ╵               ╵       ╵                                                                                    
Top AVERAGE memory consumption, by line:
(1)   165: 15230 MB                                                                                                                                           
Top PEAK memory consumption, by line:
(1)   165: 15230 MB                                                                                                                                           
(2)   156:    90 MB                                                                                                                                           
(3)   155:    30 MB                                                                                                                                                            


avaldebe avatar avaldebe commented on August 30, 2024

How much time is spent reading/writing JSON files?
Could you run a benchmark where write_json does nothing?


lewisblake avatar lewisblake commented on August 30, 2024

@heikoklein and I profiled the code with cProfile and visualized the results in KCacheGrind. _lowlevel_helpers.py:write_json accounts for only about 1.4% of the total execution time. The bulk of the time is instead spent in colocationdata.py:filter_region and in processing the colocated data objects. We are therefore closing this issue: the potential optimization is small, and in our tests we saw no benefit from swapping out the libraries. Thanks for the suggestion in any case.
Screenshot from 2024-01-25 14-55-59
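
For readers who want to reproduce this kind of measurement, here is a rough sketch of how a function's share of the run time can be read from cProfile output. The `write_json`, `filter_region` and `run` functions below are toy stand-ins that only borrow the names mentioned above; they are not the pyaerocom implementations, and the numbers they produce say nothing about the real code.

```python
import cProfile
import json
import pstats


def write_json(data):
    # Toy stand-in for the JSON writer: serialise to a string.
    return json.dumps(data)


def filter_region(values):
    # Toy stand-in for the expensive processing step.
    return sorted(v for v in values if v % 3 == 0)


def run():
    values = list(range(200_000))
    for _ in range(5):
        kept = filter_region(values)
    write_json(kept[:1000])


profiler = cProfile.Profile()
profiler.enable()
run()
profiler.disable()

stats = pstats.Stats(profiler)
# stats.stats maps (filename, lineno, funcname) -> (cc, nc, tottime, cumtime, callers)
cumtime = {func[2]: entry[3] for func, entry in stats.stats.items()}
shares = {
    name: cumtime[name] / stats.total_tt
    for name in ("write_json", "filter_region")
}
for name, share in shares.items():
    print(f"{name}: {share:.1%} of total profiled time")
```

Cumulative time is used here because it includes everything the function calls, which is the relevant figure when asking how much a faster replacement could save.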

