
PYCOLD


A Python library for COntinuous monitoring of Land Disturbance (COLD) and its extension algorithms at the speed of C language

The core algorithms are written in C and wrapped in Python, and have been verified against the MATLAB version. The C code in this package was partially adapted from C-CCDC developed by USGS.

This library provides:

  1. Original COntinuous monitoring of Land Disturbance (COLD): an upgraded CCDC algorithm proposed by Dr. Zhe Zhu for offline satellite-based time-series analysis
  2. Stochastic Continuous Change Detection (S-CCD, a near real-time and short-memory implementation of COLD)
  3. Object-based COLD (OB-COLD), integrating spatial information into COLD by using a ‘change object’ view

Recent applications of S-CCD and OB-COLD can be found in CONUS Land Watcher

1. Pre-work

For first-time use, clone the GitHub repo to your local code directory:

git clone https://github.com/GERSL/pycold.git

Or pull the latest changes if you want to update an existing pycold repo:

git pull origin devel:devel

2. Installation

The steps to install this library in development mode are consolidated into a single script: run_developer_setup.sh. On Debian-based systems, this will install all of the developer requirements, ensure you are set up with working opencv-python-headless and GDAL Python modules as well as other requirements, and then compile and install pycold in editable development mode.

The following is an overview of these details and alternative choices that could be made.

2.1 Install Required Libraries

The ZLIB and GSL libraries are required.

For Ubuntu/Debian systems, they can be installed via:

sudo apt-get update
sudo apt-get install build-essential  -y
sudo apt-get install zlib1g-dev -y
sudo apt-get install gfortran -y
sudo apt-get install libgsl-dev -y

On CentOS systems run:

sudo yum install gcc gcc-c++ make -y
sudo yum install zlib-devel -y
sudo yum install gcc-gfortran -y
# Yum provides gsl 1.5, but we need 2.7
# sudo yum install gsl-devel -y
curl https://ftp.gnu.org/gnu/gsl/gsl-2.7.1.tar.gz > gsl.tar.gz && tar xfv gsl.tar.gz && cd gsl-2.7.1 && ./configure --prefix=/usr --disable-static && make && make install
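After installation, a quick sanity check that the shared libraries are discoverable is sketched below using only the Python standard library (the returned sonames assume a Linux loader; `find_library` may return None on other platforms):

```python
from ctypes.util import find_library

# find_library returns the soname (e.g. 'libz.so.1') if the dynamic
# loader can locate the library, or None if it cannot.
for name in ("z", "gsl"):
    print(name, "->", find_library(name))
```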

2.2 Compile and Install PYCOLD

The following instructions assume you are inside a Python virtual environment (e.g., via conda or pyenv).

# Install required packages
pip install -r requirements.txt

Note that in all cases GDAL will need to be installed manually. The following step installs GDAL from a custom PyPI server containing precompiled wheels.

# Install GDAL (note-this must be done manually)
pip install -r requirements/gdal.txt

Additionally, to access the cv2 module, pycold requires either opencv-python or opencv-python-headless, which are mutually exclusive. These are exposed as optional dependencies in the package via the "graphics" or "headless" extras. Headless mode is recommended as it is more compatible with other libraries. Either can be installed manually via:

pip install -r requirements/headless.txt

# XOR (choose only one!)

pip install -r requirements/graphics.txt

Option 1: Install in development mode

For details on installing in development mode see the developer install instructions.

We note that all steps in the above document and other minor details are consolidated in the run_developer_setup.sh script.

Option 2: Build and install a wheel

Scikit-build will invoke CMake and build everything. (you may need to remove any existing _skbuild directory).

python -m build --wheel .

Then you can pip install the wheel (the exact path will depend on your system and version of python).

pip install dist/pycold-0.1.0-cp38-cp38-linux_x86_64.whl

You can also use the build_wheels.sh script to invoke cibuildwheel and produce portable wheels that can be installed on machines other than the one they were built on. You must have docker and cibuildwheel installed to use this.

Option 3: Build standalone binaries with CMake by itself (recommended for C development)

mkdir -p build
cd build
cmake ..
make

Option 4: Use a docker image.

This repo provides dockerfiles that illustrate a reproducible method for compiling and installing PYCOLD. See dockerfiles/README.rst for details.

3. Using pycold for pixel-based processing (see the Jupyter examples for more)

COLD:

from pycold import cold_detect
cold_result = cold_detect(dates, blues, greens, reds, nirs, swir1s, swir2s, thermals, qas)

COLD algorithm for any combination of band inputs from any sensor:

from pycold import cold_detect_flex
import numpy as np
# input a user-defined array instead of multiple lists
cold_result = cold_detect_flex(dates, np.stack((band1, band2, band3), axis=1), qas, tmask_b1=1, tmask_b2=2)
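The stacked input can be built as sketched below (the band values here are illustrative placeholders; real inputs would be per-band reflectance time series aligned with the dates array):

```python
import numpy as np

# Illustrative per-band time series (n observations per band); real data
# would be surface reflectance values aligned with the dates array.
n = 5
band1 = np.arange(n, dtype=np.int64)
band2 = np.arange(n, dtype=np.int64) * 2
band3 = np.arange(n, dtype=np.int64) * 3

# cold_detect_flex takes one (n_obs, n_bands) array instead of separate lists
stacked = np.stack((band1, band2, band3), axis=1)
print(stacked.shape)  # (5, 3)
```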

S-CCD:

# requires offline processing for the first time
from pycold import sccd_detect, sccd_update
sccd_pack = sccd_detect(dates, blues, greens, reds, nirs, swir1s, swir2s, thermals, qas)

# then use sccd_pack to do recursive and short-memory NRT update
sccd_pack_new = sccd_update(sccd_pack, dates, blues, greens, reds, nirs, swir1s, swir2s, thermals, qas)

Q&A

Q1: Has pycold been verified with original Matlab codes?

Re: Yes, multiple rounds of verification have been done. Comparisons based on two test tiles show that pycold and the MATLAB version differ by less than 2% for breakpoint detection and less than 2% for harmonic coefficients. The accuracy of pycold was also tested against the same reference dataset used in the original COLD paper (Zhu et al., 2020), and pycold reached the same accuracy (27% omission and 28% commission), showing that the discrepancy does not hurt accuracy. The discrepancy mainly comes from rounding: MATLAB uses float64 precision, while pycold uses float32 to reduce memory use and boost efficiency.
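The float32/float64 rounding difference can be illustrated with a minimal numpy example (not pycold code):

```python
import numpy as np

# float32 keeps roughly 7 significant decimal digits; float64 roughly 16.
a64 = np.float64(1.0) + np.float64(1e-8)
a32 = np.float32(1.0) + np.float32(1e-8)

print(a64 > 1.0)                  # True: the 1e-8 increment survives in float64
print(a32 > np.float32(1.0))      # False: it is rounded away in float32
```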

Q2: How long does it take to produce a tile-based disturbance map (5000*5000 pixels) using pycold?

Re: I tested it in the UCONN HPC environment (200 EPYC7452 cores): for processing a 40-year Landsat ARD tile (1982-2021), stacking typically takes 15 minutes, per-pixel COLD processing takes about 1 hour on average, and exporting maps needs 7 minutes.

4. Citations

If you make use of the algorithms in this repo (or want to read more about them), please cite (or see) the relevant publications from the following list:

[COLD] Zhu, Z., Zhang, J., Yang, Z., Aljaddani, A. H., Cohen, W. B., Qiu, S., & Zhou, C. (2020). Continuous monitoring of land disturbance based on Landsat time series. Remote Sensing of Environment, 238, 111116.

[S-CCD] Ye, S., Rogan, J., Zhu, Z., & Eastman, J. R. (2021). A near-real-time approach for monitoring forest disturbance using Landsat time series: Stochastic continuous change detection. Remote Sensing of Environment, 252, 112167.

[OB-COLD] Ye, S., Zhu, Z., & Cao, G., (2022). Object-based continuous monitoring of land disturbance from dense Landsat time series. Remote Sensing of Environment, 287, 113462.

pycold's People

Contributors

dependabot[bot], du33169, erotemic, prs021, suye99, twin22jw


pycold's Issues

Integration with WATCH

@SuYe99 I'm looking at how to best integrate this code with watch, and I'm trying to connect the steps you're taking in prepare_ard.py with the data we have available.

One thing prepare_ard does is make a lot of assumptions about filenames. This is something that kwcoco helps to abstract away. Filename assumptions should be fine for the packed data (although I'd recommend avoiding them, as in my opinion they are usually an anti-pattern), so the goal is just to get data from a kwcoco file into something pycold will accept.

The following snippets are what I've done so far. First here is a bit of demo code that will build a kwcoco dataset with landsat8 data assuming you have the watch repo and appropriate AWS credentials to pull data from the USGS stac catalog.

(Note, it does depend on a few tweaks and fixes I haven't landed in main yet, so let me know if you actually need to run this and I'll land those).

        DATASET_SUFFIX=DemoKHQ-2022-06-10-V2
        DEMO_DPATH=$HOME/.cache/watch/demo/datasets
        REGION_FPATH="$HOME/.cache/watch/demo/annotations/KHQ_R001.geojson"
        SITE_GLOBSTR="$HOME/.cache/watch/demo/annotations/KHQ_R001_sites/*.geojson"
        START_DATE=$(jq -r '.features[] | select(.properties.type=="region") | .properties.start_date' "$REGION_FPATH")
        END_DATE=$(jq -r '.features[] | select(.properties.type=="region") | .properties.end_date' "$REGION_FPATH")
        REGION_ID=$(jq -r '.features[] | select(.properties.type=="region") | .properties.region_id' "$REGION_FPATH")
        SEARCH_FPATH=$DEMO_DPATH/stac_search.json
        RESULT_FPATH=$DEMO_DPATH/all_sensors_kit/${REGION_ID}.input

        mkdir -p "$DEMO_DPATH"

        # Create the search json wrt the sensors and processing level we want
        python -m watch.stac.stac_search_builder \
            --start_date="$START_DATE" \
            --end_date="$END_DATE" \
            --cloud_cover=40 \
            --sensors=landsat-c2ard-bt,landsat-c2ard-sr \
            --out_fpath "$SEARCH_FPATH"
        cat "$SEARCH_FPATH"

        # Delete this to prevent duplicates
        rm -f "$RESULT_FPATH"
        # Create the .input file
        python -m watch.cli.stac_search \
            -rf "$REGION_FPATH" \
            -sj "$SEARCH_FPATH" \
            -m area \
            --verbose 2 \
            -o "${RESULT_FPATH}"

        # Construct the TA2-ready dataset
        python -m watch.cli.prepare_ta2_dataset \
            --dataset_suffix=$DATASET_SUFFIX \
            --s3_fpath "${RESULT_FPATH}" \
            --collated False \
            --dvc_dpath="$DEMO_DPATH" \
            --aws_profile=iarpa \
            --region_globstr="$REGION_FPATH" \
            --site_globstr="$SITE_GLOBSTR" \
            --fields_workers=8 \
            --convert_workers=8 \
            --align_workers=26 \
            --cache=0 \
            --ignore_duplicates=0 \
            --visualize=True \
            --requester_pays=True \
            --api_key=None \
            --serial=True --run=0

This makes a small dataset of the Kitware HQ building and an associated ~/.cache/watch/demo/datasets/Aligned-DemoKHQ-2022-06-10-V2/data.kwcoco.json kwcoco dataset, which registers all the paths to all of the images along with metadata about what sensor they come from and when they were taken.

Given this I wrote a small script to attempt to emulate what you are doing in prepare_ard to construct blocked arrays for a single image:

    import ubelt as ub
    import kwcoco
    import numpy as np
    fpath = ub.Path('~/.cache/watch/demo/datasets/Aligned-DemoKHQ-2022-06-10-V2/data.kwcoco.json').expand()
    dset = kwcoco.CocoDataset(fpath)

    # available channels
    # sr_qa_aerosol,qa_pixel,qa_radsat
    # 'coastal|blue|green|red|nir|swir16|swir22|lwir11|lwir12'

    for vidid in dset.videos():
        images = dset.images(vidid=vidid)

        for coco_image in images.coco_images:
            print(f'coco_image.channels={coco_image.channels}')
            assert coco_image.img['sensor_coarse'] == 'L8'
            intensity_channels = 'blue|green|red|nir|swir16|swir22|lwir11'
            quality_channels = 'qa_pixel'
            delayed_intensity = coco_image.delay(channels=intensity_channels, space='video', nodata_method='ma')
            delayed_qa = coco_image.delay(
                channels=quality_channels, space='video', nodata_method='ma', antialias=False)

            # hacks: should be available in main API
            delayed_intensity._set_nested_params(interpolation='cubic')
            delayed_qa._set_nested_params(interpolation='nearest')

            delayed_intensity = delayed_intensity.optimize()
            delayed_qa = delayed_qa.optimize()

            intensity_data = delayed_intensity.finalize()
            qa_data = delayed_qa.finalize()
            im_band = np.concatenate([intensity_data, qa_data], axis=2)
            im_band_data = im_band.data
            # im_band_mask = im_band.mask

            h, w, c = im_band_data.shape
            n_block_x = 20
            n_block_y = 20

            padded_w = int(np.ceil(w / n_block_x) * n_block_x)
            padded_h = int(np.ceil(h / n_block_y) * n_block_y)

            # Pad data out to fit in block sizes
            pad_x = padded_w - w
            pad_y = padded_h - h
            padded_data = np.pad(im_band_data, [(0, pad_y), (0, pad_x), (0, 0)])

            bw = int(padded_w / n_block_x)  # width of a block
            bh = int(padded_h / n_block_y)  # height of a block

            import einops
            blocks = einops.rearrange(padded_data, '(nby bh) (nbx bw) c -> nbx nby bh bw c', bw=bw, bh=bh)

Note that in a prepared kwcoco file, using CocoImage.delay(space='video') will return an object that resamples any bands (which may exist at different scales on disk) on the fly. This resampling also takes care of aligning pixels across multiple images in a sequence according to their geo-metadata.

This means I can load all of the data in a fairly concise command. The next step is to construct the block structure you expect. I'm using einops to express the appropriate operation. It's probably not as efficient as your implementation, but writing it this way will let us generalize and reduce the number of assumptions we make. I think we can improve the efficiency if it turns out to be a bottleneck.
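As an aside, the same blocking can be sketched with plain numpy reshape/transpose (a minimal equivalent of the rearrange above, assuming the padded height and width are already exact multiples of the block size):

```python
import numpy as np

# Toy (H, W, C) array whose dims divide evenly into 4x4 blocks
padded = np.arange(8 * 12 * 3).reshape(8, 12, 3)
bh, bw = 4, 4
nby, nbx = padded.shape[0] // bh, padded.shape[1] // bw

# Split each spatial axis into (num_blocks, block_size), then group the
# block indices in front: (nby, nbx, bh, bw, c)
blocks = padded.reshape(nby, bh, nbx, bw, -1).transpose(0, 2, 1, 3, 4)
print(blocks.shape)  # (2, 3, 4, 4, 3)
```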

It looks like I'll also need to check the QA band to determine if I should write a block or not.

What would be helpful is a precise description of the file format that you would expect.
It looks like there needs to be a starting_last_dates.txt with the times for the images in the sequence, and then the images are stored in "partition subfolders" where each image has shape (NY, NX, BH, BW, C) in npy format?

I'm wondering if it would be inefficient to use hdf5 here instead? Perhaps the npy format is just superior?

Also, how are you handling data whose shape isn't neatly divided by the block size? I'm just using some padding; there is probably a more efficient way to handle that part.
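For reference, padding up to the next block-size multiple can be sketched as follows (a hypothetical helper, not part of pycold or prepare_ard):

```python
import numpy as np

def pad_to_blocks(arr, bh, bw):
    """Zero-pad a (H, W, C) array so H and W become multiples of bh and bw."""
    h, w = arr.shape[:2]
    pad_y = (-h) % bh  # rows needed to reach the next multiple of bh
    pad_x = (-w) % bw  # cols needed to reach the next multiple of bw
    return np.pad(arr, [(0, pad_y), (0, pad_x), (0, 0)])

a = np.ones((10, 13, 2))
print(pad_to_blocks(a, 4, 4).shape)  # (12, 16, 2)
```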

Debugging Pypi

@SuYe99 Looks like the publishing to test pypi failed. Can you add me as a collaborator to this repo so I have permission to push branches to it directly? That will let me debug the issue quicker.

ZLIB not found

Hi @Erotemic, is it possible for you to check this import error in my recent push? It looks like it is related to the linking done in run_developer_setup.sh:

E ImportError: /lib64/libz.so.1: version `ZLIB_1.2.9' not found (required by /tmp/tmp.cO9Mr9wrTp/venv/lib/python3.8/site-packages/cv2/../opencv_python_headless.libs/libpng16-186fce2e.so.16.37.0)

Cython_source issue

Hi Jon @Erotemic, I made some progress on pycold this summer, including flexible band inputs and Windows support. But I encountered an error in my recent push. Would it be possible for you to have a look at the cython_source error? I really lack knowledge of the test.yml, so I want to ask if you have a quick answer. Thanks!

[screenshot of the cython_source CI error]

multiple types as pycold inputs

The current pycold package only supports array inputs of type np.int64 for functions such as pycold.cold_detect. This is mainly because it fits best with the C functions, which only accept 'long' as the input vector type, so we don't need to do extra conversion in the Cython code. We may consider adding support for multiple input types, which would free users from type-casting their input (e.g., .astype(np.int64)). For the Cython part, it looks easy to use 'fused types', but I don't know how to create a 'template' function in C and link it to the Cython functions. @Erotemic do you have a suggestion for this?
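For now, the workaround on the user side is an explicit cast before calling the API, sketched below (the int16 source dtype is an illustrative assumption about typical Landsat inputs):

```python
import numpy as np

# Landsat reflectance often arrives as int16; the current pycold API
# expects int64 (C 'long'), so users must cast explicitly.
blues = np.array([100, 200, 300], dtype=np.int16)
blues64 = blues.astype(np.int64)
print(blues64.dtype)  # int64
```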

Window platform implementation

I began to seriously consider implementing pycold for the Windows platform, as it is needed by several other users and also for a geospatial course I am currently developing.

I copied the paragraph where Jon reported his last attempt at Windows support.

"I did attempt to get windows, osx, but I wasn't able to figure out how to get the dependency on GSL to be satisfied. I also did try to get 32-bit systems to work (i686), but for some reason they can't find OpenMP (even though I'm pretty sure I installed the right GOMP on the images). These issues can be sorted out later though. CPython wheels 3.8, 3.9, and 3.10 for linux are now being built, so anyone on linux with those python versions will be able to pip install pycold. (FWIW people on other systems will be able to run the command and download the source, but it will only work if they have all the other dependencies set up.)"

Allowing different combinations of indices/bands as inputs

I received a request from a user to upgrade cold_detect to accept user-defined indices or band combinations. So far, the technical problem is mainly that this causes the C struct 'Output_t' to vary with the input band number. But we may have a solution in https://github.com/GERSL/pycold/blob/devel/src/python/pycold/colds.py to 'customize' a default-length 'Output_t' in Python.
My current plan is to create a new function called 'cold_detect_super(green, swir1, B1,B2,B3,B4,B5,B6,B7...)'. We still need the green and swir1 bands anyway to do Tmask.
@twin22jw does the WATCH project need this functionality?

refuse to push new branches

@Erotemic I am trying to push a new feature branch (which I am currently working on), but received a rejection:

[remote rejected] devel -> devel-suy (refusing to allow a Personal Access Token to create or update workflow .github/workflows/tests.yml without workflow scope) error: failed to push some refs to 'https://github.com/GERSL/pycold.git'

It looks like it is being blocked by certain rules from the workflow you added. Do you know how to bypass it?

Su
