nansencenter / sea_ice_type_cnn_training

Deep learning of satellite data: Use the data from satellites for machine learning (deep learning) purposes

License: GNU General Public License v3.0

Python 0.77% Dockerfile 0.01% Jupyter Notebook 99.23%

tensorflow-training satellites asip tensorboard inference

sea_ice_type_cnn_training's Issues

To keep track of processed files

Write down which files were processed and which files could not be processed.
If processing has to be restarted (e.g. after a crash), process only the files that are not in this list.
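A minimal sketch of such a log, assuming a single JSON file that holds both lists. The class name `ProcessingLog`, its methods and the JSON layout are hypothetical and not part of the repository. The log is rewritten after every file, so a crash loses at most the file being processed.

```python
import json
from pathlib import Path


class ProcessingLog:
    """Hypothetical record of processed and unprocessable files (JSON on disk)."""

    def __init__(self, log_path):
        self.log_path = Path(log_path)
        if self.log_path.exists():
            state = json.loads(self.log_path.read_text())
        else:
            state = {"processed": [], "unprocessable": []}
        self.processed = set(state["processed"])
        self.unprocessable = set(state["unprocessable"])

    def pending(self, filenames):
        """Return only the files that are in neither list."""
        done = self.processed | self.unprocessable
        return [f for f in filenames if f not in done]

    def mark(self, filename, ok):
        """Add a file to the 'processed' or 'unprocessable' list and save."""
        (self.processed if ok else self.unprocessable).add(filename)
        self._save()  # write after every file so a crash loses little

    def _save(self):
        state = {
            "processed": sorted(self.processed),
            "unprocessable": sorted(self.unprocessable),
        }
        self.log_path.write_text(json.dumps(state, indent=2))
```

In the processing loop, iterate over `log.pending(all_files)`, wrap each file in `try`/`except`, and call `log.mark(name, ok=False)` when processing raises.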

Add selection of hot-encoding to CLI

Either 'binary' or 'continuous' (default)
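A sketch of how the choice could be exposed with `argparse`, using `choices` to reject other values. The flag name `--hot-encoding` is an assumption; the training script's actual option names may differ.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Train the sea ice type CNN")
    # Hypothetical flag name; argparse stores it as args.hot_encoding.
    parser.add_argument(
        "--hot-encoding",
        choices=["binary", "continuous"],
        default="continuous",
        help="type of hot-encoding for the output (default: continuous)",
    )
    return parser
```

Any other value makes `argparse` print a usage error and exit.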

Training -n nersc_ parameter not recognised

When launching training, passing "-n nersc_" as a parameter gives the following error:

Refactor self.util into classes

Another tree of classes can be developed:

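class Batches:
    """Common batch logic moved out of self.util."""

    def __init__(self, archive):
        self.archive = archive  # the Archive object created in main()

    def calculate_variable_ML(self):
        ...

    def process(self):
        ...

class SarBatches(Batches):
    def pading(self):
        ...

class OutputBatches(SarBatches):
    pass

class Amsr2Batches(Batches):
    pass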
Attributes and methods from self.util can be moved to the new classes.

Inside main(), these classes can be instantiated and used in a loop after archive_.calculate_batches_for_masks(), or even earlier:

for cls in [SarBatches, OutputBatches, Amsr2Batches]:
    o = cls(archive_)
    o.process()

Develop the generator code

Questions about generators:

In a simple case with one input (sar) and one output (CT) of the same size, do we need just one generator?
In the same case, do we need just one generator for both training and validation data?

In a more complex case, a typical workflow:

one input from sar at input layer
another input from amsr2 at intermediate layer
output (CT) at output layer
How many generators do we need?

Parameters for generator:

which bands to use
which files to use (list of files or mask)
at which layers to add the data
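One possible answer to the questions above, sketched under assumptions: a single generator function can cover both cases if it yields a dict of inputs keyed by input-layer name, since Keras models with several named `Input` layers accept a dict of arrays (the AMSR2 input can still be merged into an intermediate layer inside the model). Training and validation would then use two instances of the same function with different sample indices. The names `batch_generator`, `'sar'` and `'amsr2'` are hypothetical, and the data are assumed to already be in memory.

```python
import numpy as np


def batch_generator(inputs, output, indices, batch_size, shuffle=True, seed=None):
    """Yield (inputs_dict, output) batches indefinitely.

    inputs:  dict mapping an input-layer name (e.g. 'sar', 'amsr2') to an
             array of shape (n_samples, ...); the arrays may differ in size.
    output:  array of shape (n_samples, ...) with the target (e.g. CT).
    indices: samples used by this generator, so the same function can serve
             training and validation with disjoint index lists.
    """
    indices = np.asarray(indices)
    rng = np.random.default_rng(seed)
    while True:
        order = rng.permutation(indices) if shuffle else indices
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            # The same sample indices are taken from every input and the output.
            yield {name: arr[idx] for name, arr in inputs.items()}, output[idx]
```

Which bands to use maps to which arrays go into `inputs`, and which files to use maps to `indices`.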

Tests for output type with hot-encoding

Currently some tests are failing due to the changed dimensionality of the hot-encoded output. The tests need to be updated.
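A sketch of what updated tests could check, assuming the 'binary' encoding is a one-hot encoding that adds a trailing class axis. `encode_binary` is a hypothetical stand-in for the repository's encoder; the real tests should call the actual function instead.

```python
import unittest

import numpy as np


def encode_binary(labels, n_classes):
    """Hypothetical one-hot encoder: (H, W) integer labels -> (H, W, n_classes)."""
    return np.eye(n_classes, dtype=np.float32)[labels]


class TestHotEncoding(unittest.TestCase):
    def test_adds_class_axis(self):
        labels = np.array([[0, 1], [2, 1]])
        encoded = encode_binary(labels, n_classes=3)
        # The encoded output gains a trailing class axis.
        self.assertEqual(encoded.shape, (2, 2, 3))

    def test_one_active_class_per_pixel(self):
        labels = np.array([[0, 1], [2, 1]])
        encoded = encode_binary(labels, n_classes=3)
        np.testing.assert_array_equal(encoded.sum(axis=-1), np.ones((2, 2)))
        np.testing.assert_array_equal(encoded.argmax(axis=-1), labels)
```

The tests can be run with `python -m unittest`.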

Filtering dataset for nan values

Filter the dataset for nan values that are not land (use distance_map as the condition for land) for sar and amsr2 images in build_dataset.py (after line 33). fil must be treated as a dictionary. (Filtering should be able to function independently of the functions in archive and mask.)
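A minimal sketch of a filter that only looks at `fil` and a distance map, with no dependency on archive or mask code. It assumes land pixels have `distance_map == 0`, that every array in `fil` shares the distance map's grid, and that a batch is rejected if any non-land pixel is NaN. The function name `is_valid` is hypothetical. For AMSR2 on a coarser grid, a distance map on that grid would be needed.

```python
import numpy as np


def is_valid(fil, distance_map):
    """Return True if no array in `fil` has NaN outside land.

    fil:          dict mapping a variable name (e.g. 'nersc_sar_primary') to a
                  2-D array with the same shape as `distance_map`.
    distance_map: distance to land; pixels with distance 0 are treated as land
                  (an assumption about how land is encoded).
    """
    sea = distance_map > 0
    return not any(np.isnan(values[sea]).any() for values in fil.values())
```

Batches would then be kept with something like `[b for b in batches if is_valid(b, dist)]`.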

initial commit

Add functionality to reduce resolution of input and output data

Add a parameter to script which tells how much the input data should be subsampled
Decrease resolution of batches
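A sketch of subsampling by an integer factor, assuming 2-D arrays. Continuous inputs (e.g. SAR backscatter) are block-averaged, while categorical outputs (class labels) take one pixel per block, because averaging labels would create classes that do not exist. `reduce_resolution` and `factor` are hypothetical names for the proposed script parameter. Rows and columns that do not fill a whole block are cropped, so both modes return the same shape.

```python
import numpy as np


def reduce_resolution(array, factor, categorical=False):
    """Reduce a 2-D array's resolution by an integer factor."""
    rows = array.shape[0] // factor * factor
    cols = array.shape[1] // factor * factor
    if categorical:
        # Keep the top-left pixel of each factor x factor block.
        return array[:rows:factor, :cols:factor]
    blocks = array[:rows, :cols].reshape(rows // factor, factor, cols // factor, factor)
    # Mean over each factor x factor block.
    return blocks.mean(axis=(1, 3))
```

`np.nanmean` could replace `.mean` if NaN pixels should be ignored within a block.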

Nan values in .npz files

When running the training on the .npz files, the loss function became nan. While debugging, I found that there are nan values in the .npz files (in the matrices for 'nersc_sar_primary' and 'nersc_sar_secondary').
I ran python train_model.py -o /fold2 -bs 4 -p 0.8 -see -sft on the .npz files extracted from the following .nc files:

20180410T084537_S1B_AMSR2_Icechart-Greenland-SouthEast.nc
20190404T201246_S1A_AMSR2_Icechart-Greenland-SouthEast.nc
20190423T200433_S1A_AMSR2_Icechart-Greenland-SouthEast.nc
20190509T081206_S1A_AMSR2_Icechart-Greenland-CentralEast.nc
20190509T081306_S1A_AMSR2_Icechart-Greenland-CentralEast.nc
20190519T194808_S1A_AMSR2_Icechart-Greenland-SouthEast.nc
20190519T194908_S1A_AMSR2_Icechart-Greenland-SouthEast.nc
20190523T200352_S1B_AMSR2_Icechart-Greenland-SouthEast.nc

To find the nan values, I ran a Python script that stores the values of each file and then searched them with CTRL+F.
Method 2 of the Python script (https://github.com/Alissa13777/Internship_NERSC_CNN_IceTypes) prints to the shell the files that contain nan values.
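Instead of a text search, the files can be checked directly with NumPy. This is a sketch, not the script linked above: it opens each .npz file, tests the two SAR keys named in this issue with `np.isnan`, and reports which keys contain NaN. The function name `find_nan` is hypothetical.

```python
import sys

import numpy as np


def find_nan(npz_paths, keys=("nersc_sar_primary", "nersc_sar_secondary")):
    """Return {path: [keys containing NaN]} for every file that has NaN."""
    report = {}
    for path in npz_paths:
        with np.load(path) as data:
            bad = [k for k in keys if k in data.files and np.isnan(data[k]).any()]
        if bad:
            report[path] = bad
    return report


if __name__ == "__main__":
    # Usage: python find_nan.py /fold2/*.npz
    for path, bad in find_nan(sys.argv[1:]).items():
        print(path, bad)
```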

tests

data_builded should not be called before inference

It is quite disadvantageous that the data_builded script has to be called before inference, because then inference works only with data from the ASIP (note P instead of D) dataset. However, we will also use it for other data from Sentinel-1 and AMSR-2. Therefore the model should be applicable to any input containing such data.

One way to apply a model without a builder script is to make a new generator which accepts an in-memory object as input instead of a list of NPZ files. The Archive class already has the functionality to read everything into memory (and also to write NPZ files, which is relevant only for the training dataset). A new DataGenerator should therefore be developed that takes an Archive object as input.
However, Archive is then not a proper class, as it mixes operations on an archive (multiple files) and on a single dataset. So it should be split into two classes (e.g. Archive and Dataset), and the new DataGenerator should take only a Dataset object.
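A rough sketch of the proposed split, assuming a `Dataset` holds one scene's arrays in memory and the `DataGenerator` cuts non-overlapping SAR patches from it. Both classes and their attributes are hypothetical. The `__len__`/`__getitem__` interface mirrors the one `keras.utils.Sequence` uses, so the generator could subclass it, but that is not done here to keep the sketch free of TensorFlow.

```python
import numpy as np


class Dataset:
    """Hypothetical in-memory container for a single scene."""

    def __init__(self, sar, amsr2=None, output=None):
        self.sar = sar        # (H, W, bands)
        self.amsr2 = amsr2    # optional, may be on another grid
        self.output = output  # optional: absent at inference time


class DataGenerator:
    """Yield batches of SAR patches from a Dataset instead of NPZ files."""

    def __init__(self, dataset, patch_size, batch_size):
        self.dataset = dataset
        self.patch_size = patch_size
        self.batch_size = batch_size
        h, w = dataset.sar.shape[:2]
        # Top-left corners of non-overlapping patches; incomplete edge patches are skipped.
        self.corners = [(r, c)
                        for r in range(0, h - patch_size + 1, patch_size)
                        for c in range(0, w - patch_size + 1, patch_size)]

    def __len__(self):
        return int(np.ceil(len(self.corners) / self.batch_size))

    def __getitem__(self, i):
        p = self.patch_size
        corners = self.corners[i * self.batch_size:(i + 1) * self.batch_size]
        return np.stack([self.dataset.sar[r:r + p, c:c + p] for r, c in corners])
```

Because the generator only sees a `Dataset`, anything that can fill one (an ASIP file, or collocated Sentinel-1 and AMSR2 files) can be fed to the model.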

Later (another issue), in order to adapt the generator to other input data, we will develop a class that can read Sentinel-1 and AMSR2 from two different files, collocate them on the same grid, create an object with the same interface as the Dataset above and use it either for building another training dataset or for inference.

Interpolate AMSR2 data to a given size (or resolution)
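A sketch of bilinear resizing with NumPy, assuming the AMSR2 field is a regular 2-D grid covering the same extent as the target SAR grid (real collocation would use geolocation instead). Output pixel centres are mapped onto the input grid and clamped at the edges. `resize_bilinear` is a hypothetical name; `scipy.ndimage.zoom` with `order=1` is an existing alternative. NaN values propagate to neighbouring output pixels.

```python
import numpy as np


def resize_bilinear(array, out_shape):
    """Bilinearly interpolate a 2-D array to out_shape (rows, cols)."""
    in_rows, in_cols = array.shape
    out_rows, out_cols = out_shape
    # Output pixel centres expressed in input pixel coordinates.
    y = (np.arange(out_rows) + 0.5) * in_rows / out_rows - 0.5
    x = (np.arange(out_cols) + 0.5) * in_cols / out_cols - 0.5
    y = np.clip(y, 0, in_rows - 1)
    x = np.clip(x, 0, in_cols - 1)
    y0 = np.floor(y).astype(int)
    x0 = np.floor(x).astype(int)
    y1 = np.minimum(y0 + 1, in_rows - 1)
    x1 = np.minimum(x0 + 1, in_cols - 1)
    wy = (y - y0)[:, None]  # row weights, broadcast over columns
    wx = (x - x0)[None, :]  # column weights, broadcast over rows
    top = array[np.ix_(y0, x0)] * (1 - wx) + array[np.ix_(y0, x1)] * wx
    bottom = array[np.ix_(y1, x0)] * (1 - wx) + array[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bottom * wy
```

For multi-channel AMSR2 data, the function can be applied to each channel separately.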

A script for processing a netCDF file with a pre-trained model

The script should

take the name of a netCDF file from ASIP v2
split it into sub-images
run the pre-trained CNN
reassemble the output into the full image size, and
save it as npz
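The split, predict and reassemble steps can be sketched as below, assuming the image has already been read from the netCDF file into a `(H, W, bands)` array (the reading step and variable names are left out) and that the model maps `(n, tile, tile, bands)` to `(n, tile, tile, classes)`, e.g. a loaded Keras model's `predict`. The image is edge-padded to a multiple of the tile size, and the padding is cropped after reassembly. `predict_scene` and `save_prediction` are hypothetical names.

```python
import numpy as np


def predict_scene(image, model_fn, tile):
    """Split (H, W, bands) into tiles, apply model_fn, reassemble to (H, W, classes)."""
    h, w = image.shape[:2]
    pad_h, pad_w = -h % tile, -w % tile
    # Pad the bottom/right edges so the image divides evenly into tiles.
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), mode="edge")
    rows, cols = padded.shape[0] // tile, padded.shape[1] // tile
    tiles = (padded.reshape(rows, tile, cols, tile, -1)
             .swapaxes(1, 2)
             .reshape(rows * cols, tile, tile, -1))
    pred = np.asarray(model_fn(tiles))  # (rows * cols, tile, tile, classes)
    full = (pred.reshape(rows, cols, tile, tile, -1)
            .swapaxes(1, 2)
            .reshape(rows * tile, cols * tile, -1))
    return full[:h, :w]  # drop the padding


def save_prediction(path, prediction):
    np.savez(path, prediction=prediction)
```

With Keras this could be called as `predict_scene(image, model.predict, tile=...)`, where `tile` matches the model's input size.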
