
pick-benchmark's Introduction



The Seismology Benchmark collection (SeisBench) is an open-source python toolbox for machine learning in seismology. It provides a unified API for accessing seismic datasets and both training and applying machine learning algorithms to seismic data. SeisBench has been built to reduce the overhead when applying or developing machine learning techniques for seismological tasks.

Getting started

SeisBench offers three core modules, data, models, and generate. data provides access to benchmark datasets and offers functionality for loading datasets. models offers a collection of machine learning models for seismology. You can easily create models, load pretrained models or train models on any dataset. generate contains tools for building data generation pipelines. They bridge the gap between data and models.

The easiest way of getting started is through our colab notebooks.

Examples
Dataset basics
Model API
Generator Pipelines
Applied picking
Using DeepDenoiser
Depth phases and earthquake depth
Training PhaseNet (advanced)
Creating a dataset (advanced)
Building an event catalog with GaMMA (advanced)
Building an event catalog with PyOcto (advanced)

Alternatively, you can clone the repository and run the same examples locally.

For more detailed information on SeisBench, check out the SeisBench documentation.

Installation

SeisBench can be installed in two ways. In both cases, you might consider installing SeisBench in a virtual environment, for example using conda.

The recommended way is installation through pip. Simply run:

pip install seisbench

Alternatively, you can install the latest version from source. For this approach, clone the repository, switch to the repository root and run:

pip install .

which will install SeisBench in your current python environment.

CPU only installation

SeisBench is built on pytorch, which in turn runs on CUDA for GPU acceleration. Sometimes, it might be preferable to install pytorch without CUDA, for example, because CUDA will not be used and the CUDA binaries are rather large. To install such a pure CPU version, the easiest way is to follow a two-step installation. First, install pytorch in a pure CPU version as explained here. Second, install SeisBench the regular way through pip. Example instructions would be:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install seisbench
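
After a CPU-only installation, it can be useful to confirm which pytorch build is active in the current environment. The following is a minimal sketch; the helper name torch_build_status is made up for this example, and it relies only on torch.version.cuda and torch.cuda.is_available().

```python
import importlib.util


def torch_build_status():
    """Return a short description of the installed pytorch build."""
    # find_spec avoids an ImportError when pytorch is not installed at all.
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch

    # torch.version.cuda is None for builds compiled without CUDA support.
    if torch.version.cuda is None:
        return "cpu-only build"
    if torch.cuda.is_available():
        return "cuda build, gpu available"
    return "cuda build, no gpu available"


print(torch_build_status())
```

Running this before and after the two-step installation shows whether the CPU wheel was actually picked up.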

Contributing

There are many ways to contribute to SeisBench and we are always looking forward to your contributions. Check out the contribution guidelines for details on how to contribute.

Known issues

  • Some institutions and internet providers are blocking access to our data and model repository, as it is running on a non-standard port (2880). This usually manifests in timeouts when trying to download data or model weights. To verify the issue, try accessing https://hifis-storage.desy.de:2880/ directly from the same machine. As a mitigation, you can use our backup repository. Just run seisbench.use_backup_repository(). Please note that the backup repository will usually show lower download speeds. We recommend contacting your network administrator to allow outgoing access to TCP port 2880 on our server as a higher performance solution.
  • We've recently changed the URL of the SeisBench repository. To use the new URL, update to SeisBench 0.4.1. If this is not possible, you can use the following commands within your runtime to update the URL manually:
    import seisbench
    from urllib.parse import urljoin
    
    seisbench.remote_root = "https://hifis-storage.desy.de:2880/Helmholtz/HelmholtzAI/SeisBench/"
    seisbench.remote_data_root = urljoin(seisbench.remote_root, "datasets/")
    seisbench.remote_model_root = urljoin(seisbench.remote_root, "models/v3/")
  • On Apple M1 and M2 chips, pytorch does not always seem to work when it is installed implicitly through pip install seisbench. As a workaround, follow the instructions at https://pytorch.org/ to install pytorch first and then install SeisBench as usual through pip.
  • EQTransformer model weights "original" in version 1 and 2 are incompatible with SeisBench >=0.2.3. Simply use from_pretrained("original", version="3") or from_pretrained("original", update=True). The weights will not differ in their predictions.
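
For the first known issue above (blocked port 2880), a quick connectivity test from Python can help tell a blocked port apart from other download problems. This is only a sketch using the standard library; the helper name can_connect is made up for this example, and a successful TCP connection does not guarantee that downloads will work.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and failed name resolution.
        return False


# Example call (requires network access):
# can_connect("hifis-storage.desy.de", 2880)
```

If the call returns False while other sites are reachable from the same machine, the port is probably blocked and seisbench.use_backup_repository() is the mitigation described above.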

Acknowledgement

The initial version of SeisBench has been developed at GFZ Potsdam and KIT with funding from Helmholtz AI. The SeisBench repository is hosted by HIFIS - Helmholtz Federated IT Services.


pick-benchmark's Issues

Failed to run the training example

Hi, thank you for providing these useful benchmark examples. This package provides a very useful encapsulation of the training and evaluation processes, along with detailed examples.

I have gone through the examples given in the seisbench documentation. One of the seisbench examples refers to this package for more elaborate training and evaluation experiments.

Problem

After installing the package from source with pip install -r requirements.txt in the root directory, I tried to run an example with the following command:

python benchmark/train.py --config=configs/stead_phasenet.json

but I got the following ImportError

Traceback (most recent call last):
  File "/home/zhongyiyuan/pick-benchmark/benchmark/train.py", line 8, in <module>
    import pytorch_lightning as pl
  File "/home/zhongyiyuan/miniconda3/envs/pickbenchmark/lib/python3.9/site-packages/pytorch_lightning/__init__.py", line 20, in <module>
    from pytorch_lightning import metrics  # noqa: E402
  File "/home/zhongyiyuan/miniconda3/envs/pickbenchmark/lib/python3.9/site-packages/pytorch_lightning/metrics/__init__.py", line 15, in <module>
    from pytorch_lightning.metrics.classification import (  # noqa: F401
  File "/home/zhongyiyuan/miniconda3/envs/pickbenchmark/lib/python3.9/site-packages/pytorch_lightning/metrics/classification/__init__.py", line 14, in <module>
    from pytorch_lightning.metrics.classification.accuracy import Accuracy  # noqa: F401
  File "/home/zhongyiyuan/miniconda3/envs/pickbenchmark/lib/python3.9/site-packages/pytorch_lightning/metrics/classification/accuracy.py", line 18, in <module>
    from pytorch_lightning.metrics.utils import deprecated_metrics
  File "/home/zhongyiyuan/miniconda3/envs/pickbenchmark/lib/python3.9/site-packages/pytorch_lightning/metrics/utils.py", line 22, in <module>
    from torchmetrics.utilities.data import get_num_classes as _get_num_classes
ImportError: cannot import name 'get_num_classes' from 'torchmetrics.utilities.data' (/home/zhongyiyuan/miniconda3/envs/pickbenchmark/lib/python3.9/site-packages/torchmetrics/utilities/data.py)

The versions of related packages are

# Name                    Version                   Build  Channel
seisbench                 0.1.16                   pypi_0    pypi
pytorch-lightning         1.3.4                    pypi_0    pypi
torch                     1.8.1                    pypi_0    pypi
torchmetrics              0.11.0                   pypi_0    pypi
scikit-learn              0.24.2                   pypi_0    pypi
seaborn                   0.11.2                   pypi_0    pypi
bayesian-optimization     1.2.0                    pypi_0    pypi

My try

I updated the packages

# Name                    Version                   Build  Channel
seisbench                 0.2.8                    pypi_0    pypi
pytorch-lightning         1.8.4.post0              pypi_0    pypi
torch                     1.13.0                   pypi_0    pypi
torchmetrics              0.11.0                   pypi_0    pypi
scikit-learn              1.2.0                    pypi_0    pypi
seaborn                   0.12.1                   pypi_0    pypi
bayesian-optimization     1.4.2                    pypi_0    pypi

and tried again, but got the following error

Traceback (most recent call last):
  File "/home/zhongyiyuan/pick-benchmark/benchmark/train.py", line 10, in <module>
    from pytorch_lightning.callbacks import GPUStatsMonitor
ImportError: cannot import name 'GPUStatsMonitor' from 'pytorch_lightning.callbacks' (/home/zhongyiyuan/miniconda3/envs/seisbench/lib/python3.9/site-packages/pytorch_lightning/callbacks/__init__.py)

This may be caused by Lightning-AI/pytorch-lightning#12554
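
One way to make such an import tolerant of renamed or removed callbacks is to look up the first name that the installed package actually provides. The sketch below is only an illustration: the helper first_available is made up, and treating DeviceStatsMonitor as the replacement for GPUStatsMonitor is an assumption that should be checked against the installed pytorch-lightning version.

```python
import importlib


def first_available(module_name, candidates):
    """Return the first attribute in candidates that module_name provides, else None."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    for name in candidates:
        if hasattr(module, name):
            return getattr(module, name)
    return None


# Assumption: DeviceStatsMonitor replaces GPUStatsMonitor in newer releases.
monitor_cls = first_available(
    "pytorch_lightning.callbacks", ["GPUStatsMonitor", "DeviceStatsMonitor"]
)
print(monitor_cls)
```

The two classes may take different constructor arguments, so the call site in train.py would still need to be adapted by hand.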

How can I load a model which was saved on GPU on a CPU-only device?

Hi @yetinam, I've recently trained a model on a GPU device using pytorch-lightning and exported it in a way similar to the export_model function in export_models.py.

The exported model works well on the GPU device where it was trained. However, when I copied the model files to a CPU-only device and loaded the model using from_pretrained(...), the following error occurred:

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

However, the built-in models provided in SeisBench can be loaded smoothly, e.g.

model = sbm.EQTransformer.from_pretrained("original")

May I ask whether you moved the model weights to the CPU using model.to(torch.device("cpu")) before saving the models? I didn't find any code in export_models.py that moves the model to the CPU before exporting:

def export_model(row):
    output_base = Path("seisbench_models")
    weights = Path("weights") / row["experiment"]
    version = sorted(weights.iterdir())[-1]
    config_path = version / "hparams.yaml"
    with open(config_path, "r") as f:
        # config = yaml.safe_load(f)
        config = yaml.full_load(f)
    model_cls = models.__getattribute__(config["model"] + "Lit")
    model = load_best_model(model_cls, weights, version.name)
    output_path = output_base / row["model"] / f"{row['data']}.pt.v1"
    json_path = output_base / row["model"] / f"{row['data']}.json.v1"
    output_path.parent.mkdir(parents=True, exist_ok=True)
    torch.save(model.model.state_dict(), output_path)
    meta = generate_metadata(row)
    with open(json_path, "w") as f:
        json.dump(meta, f, indent=4)

I have tried adding model.model.to(torch.device("cpu")) before torch.save(...). It works, but I am still not sure whether this would cause any unintended behavior.

The solution given in the PyTorch documentation is to pass map_location=torch.device('cpu') to torch.load(), but there is no way to pass map_location to torch through seisbench (https://github.com/seisbench/seisbench/blob/61ac2df822249fc16b62dfe7da1779ec2e8255bf/seisbench/models/base.py#L515-L543).

Could you give me any advice? Thank you very much!
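
For reference, the two options discussed in this issue can be sketched with plain pytorch as follows. This is not the code used by pick-benchmark or SeisBench; the helper names are made up for this example. The first function copies every tensor of the state dict to the CPU before saving and leaves the model on its device, while the second shows the map_location route on the loading side.

```python
import torch


def save_state_dict_on_cpu(module, path):
    """Save a module's weights with every tensor placed on the CPU."""
    # For GPU tensors, .cpu() returns a copy on the CPU; the module itself
    # stays on whatever device it was on.
    cpu_state = {key: value.detach().cpu() for key, value in module.state_dict().items()}
    torch.save(cpu_state, path)


def load_state_dict_on_cpu(path):
    """Load weights that may have been saved from a GPU onto the CPU."""
    return torch.load(path, map_location=torch.device("cpu"))
```

Compared to calling model.model.to(torch.device("cpu")) before torch.save(...), the first helper does not move the model in place, which matters only if the model is used on the GPU again afterwards.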

Question about model evaluation

Thank you for providing the seisbench package. I've learned a lot after reading the code, but there are many details I haven't understood.

  1. For task 2 and task 3, how do you guarantee only one phase exists in a 10s window?

  2. Instead of dividing the testing into 3 tasks, if I want to simply calculate metrics in the following way:

  • True positive: if the difference between the predicted pick and the reference pick is less than 0.1 s
  • False positive: if (a) the distance between the predicted pick and the reference pick is greater than 0.1 s, or (b) there is a predicted pick, but no reference pick.
  • True negative: there is neither a reference pick nor a predicted pick.
  • False negative: there are reference picks but no positive prediction.

what should I do with the code?
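
The four definitions above can be written down as a short, framework-independent sketch. It is not part of pick-benchmark. It assumes at most one predicted and one reference pick per example, with times in seconds and None for a missing pick, and it counts a residual of exactly 0.1 s as a false positive, a case the list above leaves open.

```python
def classify_pick(predicted, reference, tolerance=0.1):
    """Label one example as "TP", "FP", "TN" or "FN".

    predicted and reference are pick times in seconds, or None if absent.
    """
    if reference is None:
        return "TN" if predicted is None else "FP"
    if predicted is None:
        return "FN"
    # Assumption: a residual of exactly `tolerance` counts as a false positive.
    return "TP" if abs(predicted - reference) < tolerance else "FP"


def precision_recall(pairs, tolerance=0.1):
    """Compute (precision, recall) over (predicted, reference) pairs."""
    labels = [classify_pick(p, r, tolerance) for p, r in pairs]
    tp, fp, fn = (labels.count(k) for k in ("TP", "FP", "FN"))
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


pairs = [(10.02, 10.0), (10.5, 10.0), (None, 7.3), (4.0, None), (None, None)]
print([classify_pick(p, r) for p, r in pairs])
print(precision_recall(pairs))
```

Note that under these definitions a pick that is off by more than the tolerance counts only as a false positive, not additionally as a false negative.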

Maybe I should modify predict_step functions in models.py, but I feel lost here.

def predict_step(self, batch, batch_idx=None, dataloader_idx=None):
    x = batch["X"]
    window_borders = batch["window_borders"]
    pred = self.model(x)
    score_detection = torch.zeros(pred.shape[0])
    score_p_or_s = torch.zeros(pred.shape[0])
    p_sample = torch.zeros(pred.shape[0], dtype=int)
    s_sample = torch.zeros(pred.shape[0], dtype=int)
    for i in range(pred.shape[0]):
        start_sample, end_sample = window_borders[i]
        local_pred = pred[i, :, start_sample:end_sample]
        score_detection[i] = torch.max(1 - local_pred[-1])  # 1 - noise
        score_p_or_s[i] = torch.max(local_pred[0]) / torch.max(
            local_pred[1]
        )  # most likely P by most likely S
        p_sample[i] = torch.argmax(local_pred[0])
        s_sample[i] = torch.argmax(local_pred[1])
    return score_detection, score_p_or_s, p_sample, s_sample

Why is local_pred = pred[i, :, start_sample:end_sample] needed in line 210?

Since a SteeredWindow has been added to the generator

generator = sbg.SteeredGenerator(split, task_targets)
generator.add_augmentations(model.get_eval_augmentations())

isn't the x automatically trimmed to start_sample:end_sample by the SteeredWindow?

Can I delete the lines following pred = self.model(x) and finally return pred?

How can I turn pred into picks? Do I need to first store the pred array in an obspy.Stream, and then use the classify or pick_from_annotation functions in the model API?
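
As an illustration of what turning a prediction trace into picks can look like, here is a simple threshold-and-peak scheme in pure Python. It is a sketch under stated assumptions, not necessarily what SeisBench's own functions do: the input is a single-phase probability trace as a plain sequence of floats, and the threshold and sampling rate are free parameters chosen for the example.

```python
def picks_from_trace(prob, threshold=0.5, sampling_rate=100.0):
    """Return (time_in_seconds, peak_value) for each run of samples above threshold."""
    prob = [float(v) for v in prob]
    picks = []
    start = None
    # A trailing sentinel below any threshold closes a run that reaches the end.
    for i, value in enumerate(prob + [float("-inf")]):
        if value >= threshold and start is None:
            start = i
        elif value < threshold and start is not None:
            segment = prob[start:i]
            peak = start + max(range(len(segment)), key=segment.__getitem__)
            picks.append((peak / sampling_rate, prob[peak]))
            start = None
    return picks


trace = [0.0, 0.1, 0.6, 0.9, 0.7, 0.2, 0.0, 0.55, 0.8]
print(picks_from_trace(trace, threshold=0.5, sampling_rate=100.0))
```

The returned times are relative to the first sample of the trace, so the window start time still has to be added to obtain absolute pick times.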

dependency conflicts and interpretation.ipynb running error

Hi @yetinam, thanks for updating the requirements. However, there are still conflicts (tested in Google Colab and on macOS 13.1):

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
yellowbrick 1.5 requires scikit-learn>=1.0.0, but you have scikit-learn 0.24.2 which is incompatible.
torchvision 0.14.1+cu116 requires torch==1.13.1, but you have torch 1.8.1 which is incompatible.
torchtext 0.14.1 requires torch==1.13.1, but you have torch 1.8.1 which is incompatible.
torchaudio 0.13.1+cu116 requires torch==1.13.1, but you have torch 1.8.1 which is incompatible.

Then I tried running interpretation.ipynb. When importing with from benchmark import models, the following error is raised:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-4-b67497ed2a33> in <module>
      1 import torch
      2 import torch.nn.functional as F
----> 3 from benchmark import models
      4 import seisbench.data as sbd
      5 import seisbench.models as sbm

15 frames
/usr/lib/python3.8/ctypes/__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error, winmode)
    371 
    372         if handle is None:
--> 373             self._handle = _dlopen(self._name, mode)
    374         else:
    375             self._handle = handle

OSError: /usr/local/lib/python3.8/dist-packages/torchtext/lib/libtorchtext.so: undefined symbol: _ZN5torch6detail10class_baseC2ERKSsS3_SsRKSt9type_infoS6_

The same error occurs when running train.py.

Black action broken

The black CI action is broken due to deprecations. With an upgrade of the versions, this could be fixed, similar to the SeisBench repository. However, as this repository mainly serves for reference to the benchmarking paper and is not actively maintained, I will for now refrain from fixing this.

`_pickle.UnpicklingError: pickle data was truncated` when using more than one `GPU`

I got the following error when using 2 GPUs. It runs normally when only one GPU is used.

Epoch 0:   0%|                                           | 0/25 [00:00<?, ?it/s]Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/zhongyiyuan/miniconda3/envs/seisbench_v0.3/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/home/zhongyiyuan/miniconda3/envs/seisbench_v0.3/lib/python3.10/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
_pickle.UnpicklingError: pickle data was truncated
Traceback (most recent call last):
  File "/home/zhongyiyuan/pick-benchmark/benchmark/train.py", line 257, in <module>
    train(config, experiment_name, test_run=args.test_run)
  File "/home/zhongyiyuan/pick-benchmark/benchmark/train.py", line 87, in train
    trainer.fit(model, train_loader, dev_loader)
  File "/home/zhongyiyuan/miniconda3/envs/seisbench_v0.3/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 608, in fit
    call._call_and_handle_interrupt(
  File "/home/zhongyiyuan/miniconda3/envs/seisbench_v0.3/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 36, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/home/zhongyiyuan/miniconda3/envs/seisbench_v0.3/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/multiprocessing.py", line 113, in launch
    mp.start_processes(
  File "/home/zhongyiyuan/miniconda3/envs/seisbench_v0.3/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/home/zhongyiyuan/miniconda3/envs/seisbench_v0.3/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 140, in join
    raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 1 terminated with signal SIGKILL

Config file description?

Hi @yetinam, compared to the tutorial training examples, this training.py is more comprehensive and more helpful for training new models on new data. However, when I tried to adapt the config JSON file to my needs, I ran into some difficulties.

For example, the README.md mentions that users can use local datasets in SeisBench format, but when I replace
"data": "SCEDC" with "data": "./seisbench/datasets/ethz" (the path of the downloaded ETHZ dataset), it raises errors.
Moreover, how can I use data.filter in the JSON file to subset a complete built-in dataset, e.g., to reduce the number of samples or to filter by magnitude range? Is this possible by editing only the JSON file?

Another question: can I load pretrained weights in this training.py? That is, how can I express sbm.PhaseNet.from_pretrained("stead") in the JSON file (or by rewriting some lines of training.py)?

Tutorial 03a_training_phasenet.ipynb is a really user-friendly notebook for training. However, it does not cover the later steps, e.g., exporting the trained model (weights, logs) to a local folder for later reuse, or generating a performance matrix based on the validation set. pick-benchmark has scripts for these steps, so I'd like to create a notebook that extends 03a_training_phasenet.ipynb. I would appreciate it if you could offer some advice.

python version for installation

Dear developer,
I used Python 3.7 to install this package, but this error pops up:
ERROR: Could not find a version that satisfies the requirement sklearn~=0.24.2 (from versions: 0.0, 0.0.post1)
ERROR: No matching distribution found for sklearn~=0.24.2
I tried other Python versions, but those also have problems installing seisbench.
Could you please tell me which Python version is suitable for this package?
Best regards,
Javad

Training output is inconsistent with the document and the evaluation code

Training outputs

I ran a training example with

python benchmark/train.py --config=configs/stead_phasenet.json

and the output in the weights folder is

weights
└── stead_phasenet
    └── version_0
        ├── checkpoints
        │   └── epoch=99-step=105000.ckpt
        ├── hparams.yaml
        └── metrics.csv

There is no weights/[config]_[config] folder and weights/[config] folder as mentioned in the README document.

Evaluation

When I tried to do the evaluation using

python benchmark/eval.py weights/stead_phasenet targets/stead

a FileNotFoundError occurred:

  File "/home/zhongyiyuan/pick-benchmark/benchmark/eval.py", line 210, in <module>
    main(
  File "/home/zhongyiyuan/pick-benchmark/benchmark/eval.py", line 46, in main
    model = load_best_model(model_cls, weights, version.name)
  File "/home/zhongyiyuan/pick-benchmark/benchmark/util.py", line 38, in load_best_model
    return model_cls.load_from_checkpoint(checkpoint_path)
...
FileNotFoundError: [Errno 2] No such file or directory: '/home/zhongyiyuan/pick-benchmark/weights/stead_phasenet_stead_phasenet/0_0/checkpoints/epoch=91-step=96599.ckpt'

In addition, the best model seems to be the one from epoch 91. However, the only checkpoint written, weights/stead_phasenet/version_0/checkpoints/epoch=99-step=105000.ckpt, seems to be the model from epoch 99, which would be the last epoch if indexing starts from 0.
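
To see which checkpoints actually exist on disk before running the evaluation, the weights folder can be inspected with a few lines of Python. This sketch assumes the layout shown in the listing above (version_<n> folders with a checkpoints subfolder and files named epoch=<e>-step=<s>.ckpt); the helper names are made up and are not part of pick-benchmark.

```python
import re
from pathlib import Path

CKPT_PATTERN = re.compile(r"epoch=(\d+)-step=(\d+)\.ckpt$")


def latest_version_dir(experiment_dir):
    """Return the version_<n> subdirectory with the highest n, or None."""
    versions = []
    for path in Path(experiment_dir).iterdir():
        match = re.fullmatch(r"version_(\d+)", path.name)
        if path.is_dir() and match:
            versions.append((int(match.group(1)), path))
    # Comparing integers avoids the lexicographic trap version_10 < version_2.
    return max(versions, key=lambda item: item[0])[1] if versions else None


def list_checkpoints(version_dir):
    """Return (epoch, step, path) tuples for checkpoints, sorted by epoch then step."""
    found = []
    for path in (Path(version_dir) / "checkpoints").glob("*.ckpt"):
        match = CKPT_PATTERN.search(path.name)
        if match:
            found.append((int(match.group(1)), int(match.group(2)), path))
    return sorted(found, key=lambda item: item[:2])
```

For example, list_checkpoints(latest_version_dir("weights/stead_phasenet")) would list the epoch=99 file from the listing above, which makes the mismatch with the epoch=91 path in the traceback easy to spot.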
