
A Python implementation of the Atmospheric Lidar Data Augmentation (ALiDAn) framework and a learning pipeline utilizing both ALiDAn-generated and raw data.

Home Page: https://github.com/Addalin/pyALiDAn

databases deep-learning lidar statistical-learning aerosols atmospheric-modelling data-augmentation lidar-calibration photonics atmospheric-lidar

pyalidan's Introduction


pyALiDAn

A Python implementation of the Atmospheric Lidar Data Augmentation (ALiDAn) framework and a PyTorch-based learning pipeline for lidar analysis.

ALiDAn is an end-to-end physics- and statistics-based simulation framework of lidar measurements [1]. This framework aims to promote the study of dynamic phenomena from lidar measurements and set new benchmarks.

The repository also includes a spatiotemporal and synergistic lidar calibration approach [2], which forms a learning pipeline for additional algorithms such as aerosol inversion, aerosol typing, etc.

Note: This repository is still under final preparation. It will hold the supplemental data and code for papers [1] and [2]. To receive a notification when the code is ready, you are welcome to "star" and "watch" the repository :)

References:

[1] Adi Vainiger, Omer Shubi, Yoav Schechner, Zhenping Yin, Holger Baars, Birgit Heese, and Dietrich Althausen, "ALiDAn: Spatiotemporal and Multi-Wavelength Atmospheric Lidar Data Augmentation," IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1-17, 2022.

[2] Adi Vainiger, Omer Shubi, Yoav Schechner, Zhenping Yin, Holger Baars, Birgit Heese, and Dietrich Althausen, "Supervised learning calibration of an atmospheric lidar," IEEE International Geoscience and Remote Sensing Symposium, 2022.

Acknowledgements:

I. Czerninski, Y. Sde Chen, M. Tzabari, Y. Bertschy, M. Fisher, J. Hofer, A. Floutsi, R. Hengst, I. Talmon, and D. Yagodin, the Taub Foundation, and the Ollendorff Minerva Center. The authors acknowledge the financial contributions and the inspiring framework of the ERC Synergy Grant “CloudCT” (Number 810370).

pyALiDAn derives data from measurement, reanalysis, and assimilation databases such as PollyNet, NASA's AERONET, NOAA GDAS, ERA5, etc. Such data varies by geographic location, as well as spatially, temporally, and spectrally. For data handling and visualization we use xarray, pandas, and seaborn. SQLite is used to extract information from databases, and ARLreader is used to read the NOAA ARL data. Additional scientific packages, such as SciPy and lidar_molecular, are used for the physics and machine-learning models. The learning section relies on PyTorch, PyTorch Lightning, and Ray. These are excellent packages; if you are not familiar with them, they offer many tutorials.

We are grateful to the developers and creators of the above libraries.
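
As a rough illustration of how this stack fits together, the sketch below opens a lidar dataset with xarray, summarizes it with pandas, and plots it with seaborn. The file, variable, and coordinate names are placeholders for illustration, not actual pyALiDAn outputs.

  import xarray as xr
  import seaborn as sns
  import matplotlib.pyplot as plt

  # All names below are placeholders for illustration only.
  ds = xr.open_dataset("lidar_day.nc")                  # a daily lidar dataset (hypothetical file)
  df = ds["range_corr"].to_dataframe().reset_index()    # flatten to a pandas DataFrame
  print(df.describe())                                  # quick statistics with pandas
  sns.lineplot(data=df, x="Time", y="range_corr")       # visualize with seaborn
  plt.show()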

Installation

To get the code, simply clone it:

git clone https://github.com/Addalin/learning_lidar.git

Then, to set up the environment:

  • cd learning_lidar
  • conda env create -f environment.yml

Activate it with conda activate lidar

Run python setup.py develop to locally install the lidar learning package. This is not currently necessary, but can help with missing paths when running scripts from the command line.
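
A quick sanity check of the install (this assumes the package is importable as learning_lidar, matching the repository name):

python -c "import learning_lidar"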

Running the scripts

Each script can be run separately. They all follow the same command-line format, with the base arguments --station_name, --start_date, --end_date, --plot_results, and --save_ds, plus additional arguments depending on the specific script.

For example, to run the generation main script: python generation_main.py --station_name haifa --start_date 2017-09-01 --end_date 2017-10-31 --plot_results --save_ds

Where relevant, use the --use_km_unit flag to work in km units instead of m units.
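
For reference, the base arguments follow a standard argparse pattern. The sketch below is illustrative only and is not the repository's actual parser:

  import argparse
  from datetime import datetime

  parser = argparse.ArgumentParser()
  parser.add_argument("--station_name", type=str, default="haifa")
  parser.add_argument("--start_date", type=lambda s: datetime.strptime(s, "%Y-%m-%d"))
  parser.add_argument("--end_date", type=lambda s: datetime.strptime(s, "%Y-%m-%d"))
  parser.add_argument("--plot_results", action="store_true")
  parser.add_argument("--save_ds", action="store_true")
  parser.add_argument("--use_km_unit", action="store_true")  # only where relevant
  args = parser.parse_args()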

Code Structure

Under learning_lidar:

In general, each subfolder corresponds to a process. Each standalone script is in its own file and has a corresponding <script_name>_utils file for subroutines.

There is a general utils folder, as well as additional minor scripts and notebooks not mentioned here.

Preprocessing

  • Main script is preprocessing/preprocessing.py
  • Converts raw data into a clean format.
  • Specifically, it can be used to:
    • download and convert GDAS files with the --download_gdas and --convert_gdas flags
    • generate molecular (--generate_molecular_ds), lidar (--generate_lidar_ds), or raw lidar (--generate_raw_lidar_ds) datasets
    • unzip downloaded TROPOS lidar data automatically with --unzip_lidar_tropos
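
For example, an illustrative preprocessing run (the exact flag combination here is an assumption based on the list above):

python preprocessing.py --station_name haifa --start_date 2017-09-01 --end_date 2017-10-31 --download_gdas --convert_gdas --generate_molecular_ds --save_ds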

Generation

  • Generates ALiDAn data. generation/generation_main.py is a wrapper for the different parts of the process and can be used to run everything at once for a given period. It includes:

    • Background signal (generate_bg_signals)
    • Angstrom exponent and optical depth (read_AERONET_data)
    • KDE estimation (KDE_estimation_sample)
    • Lidar constant (generate_LC_pattern)
    • Density generation (generate_density)
    • Signal generation (daily_signals_generation)
  • Additional code:

    • Figure outputs and validation of ALiDAn [1] are under [generation/ALiDAn Notebooks](generation/ALiDAn Notebooks).
    • Large parts of the code were initially written as notebooks and then manually converted to py files.
      • For example, the original notebooks are under generation/legacy.
      • generate_bg_signals has been converted to py, but not yet generalized to an arbitrary time period, so the original notebook is still in the main generation folder.
      • overlap.ipynb hasn't been converted to py yet. Overlap is an additional part of the generation process.
      • Figures that were necessary for the paper are saved under the figures subdirectory. These are only relevant if the --plot_results flag is present.

Dataseting

  • Main script is dataseting/dataseting.py
  • Flags:
    • --do_dataset to create a CSV of the records
    • --extend_dataset to add additional info to the dataset
    • --do_calibration_dataset to create a calibration dataset from the extended dataframe
    • --create_train_test_splits to create train/test splits
    • --calc_stats to calculate mean, min, max, and std statistics
    • --create_time_split_samples to split the dataset into small intervals
    • Note: use --generated_mode to apply the operations to the generated data (vs. the raw TROPOS data)
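
An illustrative dataseting run combining some of these flags (whether they can all be combined in a single invocation is an assumption):

python dataseting.py --station_name haifa --start_date 2017-09-01 --end_date 2017-10-31 --do_dataset --extend_dataset --calc_stats --generated_mode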

Learning_phase

The learning pipeline is designed to receive two data types: raw lidar measurements from PollyXT and data simulated by ALiDAn. The implementation is oriented toward lidar calibration; however, one can easily apply any other model.

  • Deep learning module to predict 'Y' given 'X'.
  • Makes use of parameters from run_params.py.
  • Configure the params as desired, then run the NN with python main_lightning.py
  • The models are implemented with PyTorch Lightning; currently the only model is calibCNN.py.
  • analysis_LCNet_results extracts the raw results from a results folder and displays many comparisons between the different trials. NOTE: analysis_LCNet_results.ipynb currently contains old results and messy code; the updated code is in analysis_LCNet_results_no_overlap.ipynb, which is the notebook that should be used.
  • model_validation.py has barely been used yet, but is meant to load a pretrained model and use it to reproduce results.
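
For readers new to PyTorch Lightning, the sketch below shows the general LightningModule pattern that a calibration model such as calibCNN follows. It is a generic, hypothetical illustration, not the repository's actual implementation:

  import torch
  from torch import nn
  import pytorch_lightning as pl

  class TinyCalibNet(pl.LightningModule):  # hypothetical stand-in, not calibCNN
      def __init__(self, lr=1e-3):
          super().__init__()
          self.lr = lr
          self.net = nn.Sequential(nn.Conv2d(1, 4, 3, padding=1), nn.ReLU(),
                                   nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(4, 1))
          self.loss = nn.L1Loss()  # MAE as an example criterion

      def forward(self, x):
          return self.net(x)

      def training_step(self, batch, batch_idx):
          x, y = batch  # 'X' lidar input, 'Y' calibration target
          loss = self.loss(self(x), y)
          self.log("train_loss", loss)
          return loss

      def configure_optimizers(self):
          return torch.optim.Adam(self.parameters(), lr=self.lr)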

Notes

  1. The data folder contains both data necessary for the generation and CSV files that are created in the dataseting stage and needed as input for the learning phase. Specifically:
    1. stations.csv defines stations; it is currently also relevant when working on a different computer.
    2. dataset_<station_name>_<start_date>_<end_date>.csv files contain links to the actual data paths. Each row is a record (see the sketch after these notes).
  2. There are many TODOs in the code, some of which are crucial for certain stages and some of which are 'nice to have'.
  3. run_script.sh can be used as an example of how to run parts of the code from the terminal with command-line arguments, for example for different dates.
  4. Paths_lidar_learning.pptx describes the planned changes to the data paths, which are meant to be much more organized, easier to maintain, and less dependent.
  5. pyALiDAn_dev is a private folder of ongoing research.
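
The records CSV can be inspected directly with pandas. A minimal sketch (the file name follows the pattern above and is assumed; the columns depend on the actual dataset):

  import pandas as pd

  # Hypothetical file name following the dataset_<station_name>_<start_date>_<end_date>.csv pattern.
  df = pd.read_csv("data/dataset_haifa_2017-09-01_2017-10-31.csv")
  print(df.shape)    # each row is a record
  print(df.columns)  # columns hold links to the actual data paths
  print(df.head())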

pyalidan's People

Contributors

addalin, liamhazan, omershubi


pyalidan's Issues

invalid argument in path

Below is the error message from a run of main_lightning.py:


Failure # 1 (occurred at 2021-05-23_21-45-03)
Traceback (most recent call last):
File "C:\Users\addalin.conda\envs\lidar\lib\site-packages\ray\tune\trial_runner.py", line 880, in _process_trial_save
results = self.trial_executor.fetch_result(trial)
File "C:\Users\addalin.conda\envs\lidar\lib\site-packages\ray\tune\ray_trial_executor.py", line 686, in fetch_result
result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
File "C:\Users\addalin.conda\envs\lidar\lib\site-packages\ray_private\client_mode_hook.py", line 47, in wrapper
return func(*args, **kwargs)
File "C:\Users\addalin.conda\envs\lidar\lib\site-packages\ray\worker.py", line 1481, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(OSError): �[36mray::ImplicitFunc.save()�[39m (pid=22632, ip=132.68.58.209)
File "python\ray_raylet.pyx", line 505, in ray._raylet.execute_task
File "python\ray_raylet.pyx", line 449, in ray._raylet.execute_task.function_executor
File "C:\Users\addalin.conda\envs\lidar\lib\site-packages\ray_private\function_manager.py", line 556, in actor_method_executor
return method(__ray_actor, *args, **kwargs)
File "C:\Users\addalin.conda\envs\lidar\lib\site-packages\ray\tune\function_runner.py", line 434, in save
checkpoint_path = TrainableUtil.process_checkpoint(
File "C:\Users\addalin.conda\envs\lidar\lib\site-packages\ray\tune\utils\trainable.py", line 46, in process_checkpoint
with open(checkpoint_path + ".tune_metadata", "wb") as f:
OSError: [Errno 22] Invalid argument: 'C:\Users\addalin\Dropbox\Lidar\lidar_learning\results\main_2021-05-23_19-35-00\main_5831d016_3_bsize=32,dfilter=None,dnorm=False,fc_size=[32],hsizes=[4, 4, 4, 4],lr=0.001,ltype=MAELoss,source=signal_p,use_bg=F_2021-05-23_21-28-18\checkpoint_epoch=3-step=703\.tune_metadata'


This is strange, since it failed in the last epoch, and the same happened in other experiments.
Running resume with 'ERRORED_ONLY' fixes this.
But why would it happen in the first place?

script_asutomation

script_asutomation branch

  1. preprocessing.py - parsing flags from the command line
  2. generation main pipeline (in the workflow) (similar to preprocessing and dataseting)
  3. test on 1 day / one month

Error during training: probably related to loading / saving JSON files

Trial Runner checkpointing failed: [WinError 5] Access is denied: 'C:\\Users\\addalin\\Dropbox\\Lidar\\lidar_learning\\results\\main_2021-05-06_19-17-01\\.tmp_generator' -> 'C:\\Users\\addalin\\Dropbox\\Lidar\\lidar_learning\\results\\main_2021-05-06_19-17-01\\basic-variant-state-2021-05-06_19-17-01.json'

Also, we need to check the issue of checkpoint naming again.

Error during training: AttributeError: 'Tee' object has no attribute 'close'

To reproduce, run main_lightning.py:

(pid=28740) Error in atexit._run_exitfuncs:
(pid=28740) Traceback (most recent call last):
(pid=28740)   File "C:\Users\addalin\.conda\envs\lidar\lib\logging\__init__.py", line 2123, in shutdown
2021-05-08 05:14:36,713	INFO tune.py:549 -- Total run time: 122255.10 seconds (122254.85 seconds for the tuning loop).
(pid=28740)     h.close()
(pid=28740)   File "C:\Users\addalin\.conda\envs\lidar\lib\site-packages\absl\logging\__init__.py", line 945, in close
(pid=28740)     self.stream.close()
(pid=28740) AttributeError: 'Tee' object has no attribute 'close'

aerosols_update

aerosols_update branch

  1. update the statistics of LR/A for the Haifa station and save them to a new CSV file (using the info from Birgit)
  2. update KDE_estimation_sample.py when loading df_A_LR (this should be per station, or a generic one)
  3. Test on a single day

stats_update

stats_update branch

  1. merge stats from update_signal_database.ipynb into dataseting.py
  2. Test on period

raw_update

raw_update branch

  1. prepare samples using time list (from the generated dataset)
  2. update paths in station according to Paths_lidar_learning.pptx
  3. Test on single day (or period)

Overlap updates

Todos:
overla_update branch

  1. save overlap params per month in the overlap.ipynb
  2. load the generated dataset ('lidar' or 'signal')
  3. update or generate the measurement - split the function and update the input to calc_measurment (flag, None, ...)
  4. test on one day, two options: generate and update.
  5. create overlap.py (later ...)

matplotlib warnings

C:\Users\addalin\Dropbox\Lidar\lidar_learning\learning_lidar\utils\misc_lidar.py:38: MatplotlibDeprecationWarning: Support for setting the 'text.latex.preamble' or 'pgf.preamble' rcParam to a list of strings is deprecated since 3.3 and will be removed two minor releases later; set it to a single string instead.
plt.rcParams['text.latex.preamble'] = [r"\usepackage{amsmath}"]
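
The fix the warning itself suggests is to set the preamble to a single string rather than a list, e.g.:

  import matplotlib.pyplot as plt

  # A single string instead of a list, as requested by the MatplotlibDeprecationWarning above.
  plt.rcParams['text.latex.preamble'] = r"\usepackage{amsmath}"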
