
torchsig's Introduction



TorchSig is an open-source signal processing machine learning toolkit based on the PyTorch data handling pipeline. The user-friendly toolkit simplifies common digital signal processing operations, augmentations, and transformations when dealing with both real and complex-valued signals. TorchSig streamlines the integration of these signal processing tools with PyTorch, enabling faster and easier development and research for machine learning techniques applied to signals data, particularly within (but not limited to) the radio frequency domain. An example dataset, Sig53, based on many unique communication signal modulations, is included to accelerate the field of modulation classification. Additionally, an example wideband dataset, WidebandSig53, is included that extends Sig53 with larger data example sizes containing multiple signals, enabling accelerated research in the fields of wideband signal detection and recognition.

TorchSig is currently in beta

Key Features


TorchSig provides many useful tools to facilitate and accelerate research on signals processing machine learning technologies:

  • The SignalData class and its SignalDescription objects enable signal objects and their metadata to be seamlessly handled and operated on throughout the TorchSig infrastructure.
  • The Sig53 Dataset is a state-of-the-art static modulations-based RF dataset meant to serve as the next baseline for RFML classification development & evaluation (see the usage sketch after this list).
  • The ModulationsDataset class synthetically creates, augments, and transforms the largest communications signals modulations dataset to date in a generic, flexible fashion.
  • The WidebandSig53 Dataset is a state-of-the-art static wideband RF signals dataset meant to serve as the baseline for RFML signal detection and recognition development & evaluation.
  • The WidebandModulationsDataset class synthetically creates, augments, and transforms the largest wideband communications signals dataset in a generic, flexible fashion.
  • Numerous signal processing transforms enable existing ML techniques to be employed on signals data, streamline domain-specific signal augmentations in machine learning experiments, and provide signal-specific data transformations that accelerate the integration of expert-feature signal processing with machine learning.
  • TorchSig also includes a model API similar to open source code in other ML domains, where several state-of-the-art convolutional and transformer-based neural architectures have been adapted to the signals domain and pretrained on the Sig53 and WidebandSig53 datasets. These models can be easily used for follow-on research in the form of additional hyperparameter tuning, out-of-the-box comparative analysis/evaluations, and/or fine-tuning to custom datasets.
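
As a quick illustration of how these pieces fit together, here is a minimal, hedged sketch that loads a locally generated Sig53 dataset with a small transform pipeline. It assumes the dataset has already been generated under ./sig53 and reuses the Sig53 and transform names that appear in this README and its examples.

import numpy as np
import torchsig.transforms as ST
from torchsig.datasets import Sig53

# Transform pipeline: normalize the complex IQ data and split it into a 2xN real array
transform = ST.Compose([
    ST.Normalize(norm=np.inf),
    ST.ComplexTo2D(),
])
# Map each SignalDescription to a class index for classification targets
target_transform = ST.DescToClassIndex(class_list=list(Sig53._idx_to_name_dict.values()))

# Assumes the clean validation split has already been generated under ./sig53
sig53_val = Sig53(
    root="sig53",
    train=False,
    impaired=False,
    transform=transform,
    target_transform=target_transform,
)
data, label = sig53_val[0]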

Documentation


Documentation can be found online or built locally by following the instructions below.

cd docs
pip install -r docs-requirements.txt
make html
firefox build/html/index.html

Installation


Clone the torchsig repository and simply install using the following commands:

cd torchsig
pip install .

Generating the Datasets

If you'd like to generate the named datasets without messing with your current Python environment, you can build the development container and use it to generate data at the location of your choosing.

docker build -t torchsig -f Dockerfile .
docker run -u $(id -u ${USER}):$(id -g ${USER}) -v `pwd`:/workspace/code/torchsig torchsig python3 torchsig/scripts/generate_sig53.py --root=/workspace/code/torchsig/examples/sig53 --all=True

For the wideband dataset, you can do:

docker build -t torchsig -f Dockerfile .
docker run -u $(id -u ${USER}):$(id -g ${USER}) -v `pwd`:/workspace/code/torchsig torchsig python3 torchsig/scripts/generate_wideband_sig53.py --root=/workspace/code/torchsig/examples/wideband_sig53 --all=True

If you do not need to use Docker, you can also generate the datasets using the regular command-line interface:

python3 torchsig/scripts/generate_sig53.py --root=torchsig/examples --all=True

or for the wideband dataset:

python3 torchsig/scripts/generate_wideband_sig53.py --root=torchsig/examples --all=True

Then, be sure to point any scripts that expect a dataset root at torchsig/examples.

Using the Dockerfile

If you have Docker installed along with compatible GPUs and drivers, you can try:

docker build -t torchsig -f Dockerfile .
docker run -d --rm --network=host --shm-size=32g --gpus all --name torchsig_workspace -v `pwd`/examples:/workspace/code/examples torchsig tail -f /dev/null
docker exec torchsig_workspace jupyter notebook --allow-root --ip=0.0.0.0 --no-browser

Then use the URL in the output in your browser to run the examples and notebooks.

License


TorchSig is released under the MIT License. The MIT license is a popular open-source software license enabling free use, redistribution, and modifications, even for commercial purposes, provided the license is included in all copies or substantial portions of the software. TorchSig has no connection to MIT, other than through the use of this license.

Citing TorchSig


Please cite TorchSig if you use it for your research or business.

@misc{torchsig,
  title={Large Scale Radio Frequency Signal Classification},
  author={Luke Boegner and Manbir Gulati and Garrett Vanhoy and Phillip Vallance and Bradley Comar and Silvija Kokalj-Filipovic and Craig Lennon and Robert D. Miller},
  year={2022},
  archivePrefix={arXiv},
  eprint={2207.09918},
  primaryClass={cs.LG},
  note={arXiv:2207.09918},
  url={https://arxiv.org/abs/2207.09918}
}

torchsig's People

Contributors

gvanhoy, mattcarrickpl, torchdsp, lboegner, pvallance, dicta, awthomp, sognefej


torchsig's Issues

Accelerate Wideband Sig53 Generation

For dataset generation, to properly use multi-threading + GPUs, torch.utils.data.DataLoader does not play well with routines that use torch to generate a dataset, because CUDA state has to be re-initialized in each worker unless you specifically do multi-processing.

To do multi-processing, you have to "protect" the generation script with an if __name__ == "__main__" guard (sketched below), which points toward having entirely stand-alone dataset generation utilities if you want it to be fast. So I'm thinking we still refactor the responsibility to store/load itself out of the dataset, so that datasets stay lean and another class (i.e. the DatasetCreator) handles it.
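
A minimal sketch of the guard described above; the main() body is just a placeholder for whatever builds the dataset and its DatasetCreator:

# Any code that spawns DataLoader worker processes must live under this guard,
# because spawned workers re-import the module; without it, generation hangs or
# re-initializes CUDA incorrectly.
def main():
    # build the dataset / DatasetCreator here and call its create() method
    ...

if __name__ == "__main__":
    main()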

But we also have a few scripts that just run as they are to generate a static version of a dataset. These can serve as examples and documentation, and be tested in QA configurations just like the example notebooks.

Convert examples to scripts

Examples are nice as they serve as documentation, but regular documentation and unit-tests also serve as documentation. If we're going to have examples, they should be test-able.

Also, I don't like that we have to include a handful of libraries to support our examples, can we not just optionally install the examples?

FSK generation issues

There are a couple of issues with the FSK signal generation:

  1. The low-pass filter applied to non-Gaussian FSKs does not take the number of tones into account when calculating the bandwidth. The LPF cutoff is randomly chosen between 1.25 and 3.75 times the symbol rate, but the 3 dB bandwidth of an FSK signal is (num_tones - 1) * tone_spacing, where tone_spacing is modulation_index * symbol_rate. So the 3 dB bandwidth of a 16-tone FSK with a modulation index of 1 is 15 * symbol_rate, while the LPF for this signal is between 1.25 and 3.75 * symbol_rate. This means that all but the 2-level FSKs are always over-filtered (see the numeric sketch after this list).
  2. The modulation index is not correct for anything larger than the 2FSKs. This is because the "constellation" of the FSKs (the frequency list) is normalized between -1 and 1 and then multiplied by the mod_index when passed to the FM modulator. This normalizes the total frequency excursion to 2, which is actually an excursion of 1 because the argument to the FM modulator has a pi instead of a 2*pi. So for the 2FSKs, a modulation index of 1 results in the correct tone spacing. But when you pack more than 2 tones between -1 and 1, the tone spacing gets scaled by a factor of num_tones/2. For example, packing 4 tones onto [-1, 1] means that the normalized tone spacing (accounting for the pi vs. 2*pi in the modulator) is 0.5, not 1, so the 4FSK has an effective modulation index of 0.5 and the 4MSK has an effective modulation index of 0.25.
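
A small numeric sketch of point 1, using only the relationships stated above:

num_tones = 16
modulation_index = 1.0
symbol_rate = 1.0                              # normalized

tone_spacing = modulation_index * symbol_rate  # = 1.0
bw_3db = (num_tones - 1) * tone_spacing        # = 15 * symbol_rate

lpf_cutoff_min = 1.25 * symbol_rate
lpf_cutoff_max = 3.75 * symbol_rate
# The 3 dB bandwidth (15) far exceeds the LPF cutoff range (1.25 to 3.75),
# so the 16-tone FSK is heavily over-filtered.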

Fix spectrogram models

Right now, these require a combination of package dependencies with versions that are too hard to wield.

torchsig.transforms subpackage does not export its contents correctly

In torchsig/transforms/__init__.py, the following imports result in names being overridden in the torchsig.transforms namespace:

from .functional import *
from .transforms import *
from torchsig.transforms.system_impairment import *
from torchsig.transforms.wireless_channel import *
from torchsig.transforms.expert_feature import *
from torchsig.transforms.signal_processing import *
from torchsig.transforms.deep_learning_techniques import *
from torchsig.transforms.target_transforms import *

Specifically, some of the transforms subpackages themselves contain modules named functional.py, which results in the following mis-import:

>>> from torchsig.transforms import functional as f
>>> f
<module 'torchsig.transforms.deep_learning_techniques.functional' from '/home/repo/torchsig.git/torchsig/transforms/deep_learning_techniques/functional.py'>
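
A hedged sketch of one possible fix: import the subpackages as namespaces instead of star-importing them, so torchsig.transforms.functional keeps pointing at the top-level module. This is only one option, not necessarily the maintainers' intended resolution.

# torchsig/transforms/__init__.py (sketch)
# Namespace imports keep torchsig.transforms.functional pointing at the
# top-level functional module instead of a subpackage's own functional.py.
from . import functional
from .transforms import *
from . import system_impairment
from . import wireless_channel
from . import expert_feature
from . import signal_processing
from . import deep_learning_techniques
from . import target_transforms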

pip install issues(?)

I followed the steps to pip install TorchSig; however, I cannot run any of the scripts, since neither the models (example 5) nor other dependencies like cm_plotter (example 1) were installed by pip.

Which package versions does the code support?

Hi,
What is the python/torch/torchvision versions that the code support?

I"m getting this issue:
OSError: /home/usr/anaconda3/lib/python3.9/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: undefined symbol: cublasLtGetStatusString, version libcublasLt.so.11

Thanks.

TypeError: cannot pickle 'module' object

Describe the bug
I'm running the example code in: 00_example_sig53_dataset.py, but the call to creator.create() gives me a TypeError: cannot pickle 'module' object from the multiprocessing module, full stack trace below.

Also the two lines in the docs that use torch.data.DataLoader for using sig53 don't work, but that's another issue I suppose.

Environment Info
MacBook, OSX: 13.2.1 (22D68)
Python 3.10.10 virtualenv

Stacktrace

Traceback (most recent call last):
  File "/Users/garrickw/Documents/mudsum/perc-rf/venv/lib/python3.10/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/Users/garrickw/Documents/mudsum/perc-rf/extern/torchsig/torchsig/utils/writer.py", line 57, in __iter__
    return iter(self.loader)
  File "/Users/garrickw/Documents/mudsum/perc-rf/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 442, in __iter__
    return self._get_iterator()
  File "/Users/garrickw/Documents/mudsum/perc-rf/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 388, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/Users/garrickw/Documents/mudsum/perc-rf/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1043, in __init__
    w.start()
  File "/usr/local/Cellar/[email protected]/3.10.10_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/local/Cellar/[email protected]/3.10.10_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/local/Cellar/[email protected]/3.10.10_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
  File "/usr/local/Cellar/[email protected]/3.10.10_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/local/Cellar/[email protected]/3.10.10_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/local/Cellar/[email protected]/3.10.10_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/local/Cellar/[email protected]/3.10.10_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'module' object

Fix torchsig/transforms/deep_learning_techniques/dlt.py

from torchsig.utils import SignalDescription, SignalData, SignalDataset
...
from torchsig.transforms.wireless_channel import TargetSNR

to

from torchsig.utils.types import SignalDescription, SignalData
from torchsig.utils.dataset import SignalDataset
...
from torchsig.transforms.wireless_channel.wce import TargetSNR

Demands 1TB of space on disk

When creating Sig53 or WidebandSig53, there is code in the init function that creates a dataset file:
self._env = lmdb.open( str(self.path).encode(), max_dbs=2, map_size=int(1e12), max_readers=512, readahead=False, )
The map_size=int(1e12) demands that I have 1TB free on disk, and indeed it throws an error:

---------------------------------------------------------------------------
Error                                     Traceback (most recent call last)
<ipython-input-4-b4d9e9cc30f1> in <module>
     26     use_signal_data=True,
     27     gen_batch_size=1,
---> 28     use_gpu=False,
     29 )
     30 

c:\users\yoavp\documents\projects\torchsig\venv\lib\site-packages\torchsig\datasets\wideband_sig53.py in __init__(self, root, train, impaired, transform, target_transform, regenerate, use_signal_data, gen_batch_size, use_gpu)
    113             map_size=int(1e12),
    114             max_readers=512,
--> 115             readahead=False,
    116         )
    117 

Error: wideband_sig53\wideband_sig53_clean_val: There is not enough space on the disk.

If I only want to create the train data or only the val data, I don't think I need that much space. Is there a reason to ask for this amount of space? Can it be set to the minimal value possible?
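
For reference, a hedged sketch of opening the LMDB environment with a smaller (or user-configurable) map_size; on some platforms (notably Windows) LMDB pre-allocates the full map_size on disk, which is why the 1e12 value fails. The 10 GB value below is purely illustrative, not the library's current default.

import lmdb

env = lmdb.open(
    "wideband_sig53_clean_val",
    max_dbs=2,
    map_size=int(1e10),   # ~10 GB instead of 1 TB; illustrative value only
    max_readers=512,
    readahead=False,
)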

Build Warning - AttributeError: module 'numpy' has no attribute 'typing'

Warning while building torchsig:

WARNING: autodoc: failed to import class 'datasets.Sig53' from module 'torchsig'; the following exception was raised:
Traceback (most recent call last):
  File "/home/sognefeste/Documents/TOOLS/torchsig/.torchsigenv/lib/python3.8/site-packages/sphinx/ext/autodoc/importer.py", line 58, in import_module
    return importlib.import_module(modname)
  File "/home/sognefeste/.pyenv/versions/3.8.12/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/sognefeste/Documents/TOOLS/torchsig/torchsig/__init__.py", line 1, in <module>
    from torchsig import transforms
  File "/home/sognefeste/Documents/TOOLS/torchsig/torchsig/transforms/__init__.py", line 2, in <module>
    from . import system_impairment
  File "/home/sognefeste/Documents/TOOLS/torchsig/torchsig/transforms/system_impairment/__init__.py", line 1, in <module>
    from .si import *
  File "/home/sognefeste/Documents/TOOLS/torchsig/torchsig/transforms/system_impairment/si.py", line 9, in <module>
    from torchsig.transforms.functional import NumericParameter, IntParameter, FloatParameter
  File "/home/sognefeste/Documents/TOOLS/torchsig/torchsig/transforms/functional.py", line 10, in <module>
    class RandomStatePartial(Protocol):
  File "/home/sognefeste/Documents/TOOLS/torchsig/torchsig/transforms/functional.py", line 21, in RandomStatePartial
    def __call__(self, size: Union[int, Sequence[int]] = ...) -> np.typing.ArrayLike:
  File "/home/sognefeste/Documents/TOOLS/torchsig/.torchsigenv/lib/python3.8/site-packages/numpy/__init__.py", line 311, in __getattr__
    raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'typing'

Causes my program to crash importing the following:

from torchsig.datasets import Sig53, SignalDataset

Environment:

python --version
Python 3.8.12
pip freeze
absl-py==1.2.0
aiohttp==3.8.3
aiosignal==1.2.0
alabaster==0.7.12
asttokens==2.0.8
async-timeout==4.0.2
attrs==22.1.0
Babel==2.10.3
backcall==0.2.0
beautifulsoup4==4.11.1
better-apidoc==0.3.2
cachetools==5.2.0
certifi==2022.9.24
charset-normalizer==2.1.1
colorama==0.4.5
commonmark==0.9.1
contourpy==1.0.5
cycler==0.11.0
decorator==5.1.1
docutils==0.17.1
executing==1.1.0
filelock==3.8.0
fonttools==4.37.3
frozenlist==1.3.1
fsspec==2022.8.2
gdown==4.5.1
google-auth==2.12.0
google-auth-oauthlib==0.4.6
grpcio==1.49.1
h5py==3.7.0
icecream==2.1.3
idna==3.4
imagesize==1.4.1
importlib-metadata==4.12.0
ipdb==0.13.9
ipython==8.5.0
jedi==0.18.1
Jinja2==3.1.2
joblib==1.2.0
kiwisolver==1.4.4
llvmlite==0.39.1
lmdb==1.3.0
Markdown==3.4.1
MarkupSafe==2.1.1
matplotlib==3.6.0
matplotlib-inline==0.1.6
multidict==6.0.2
numba==0.56.2
numpy==1.23.3
oauthlib==3.2.1
packaging==21.3
pandas==1.5.0
parso==0.8.3
pexpect==4.8.0
pickleshare==0.7.5
Pillow==9.2.0
prompt-toolkit==3.0.31
protobuf==3.19.5
ptyprocess==0.7.0
pure-eval==0.2.2
pyasn1==0.4.8
pyasn1-modules==0.2.8
pyDeprecate==0.3.2
Pygments==2.13.0
pyparsing==3.0.9
PySocks==1.7.1
python-dateutil==2.8.2
pytorch-lightning==1.7.7
pytz==2022.2.1
PyWavelets==1.4.1
PyYAML==6.0
recommonmark==0.7.1
requests==2.28.1
requests-oauthlib==1.3.1
rsa==4.9
scikit-learn==1.1.2
scipy==1.9.1
six==1.16.0
snowballstemmer==2.2.0
soupsieve==2.3.2.post1
sphinx==5.2.2
sphinx-rtd-theme==1.0.0
sphinxcontrib-applehelp==1.0.2
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==2.0.0
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.5
stack-data==0.5.1
tensorboard==2.10.1
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
threadpoolctl==3.1.0
timm==0.5.4
toml==0.10.2
torch==1.12.1
torchmetrics==0.9.3
torchsig @ file:///home/sognefeste/Documents/TOOLS/torchsig
torchvision==0.13.1
tqdm==4.64.1
traitlets==5.4.0
typing-extensions==4.3.0
urllib3==1.26.12
wcwidth==0.2.5
Werkzeug==2.2.2
yarl==1.8.1
zipp==3.8.1
lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.4 LTS
Release:	20.04
Codename:	focal

No warning about CuSignal dependency when generating a dataset.

Hi all, thanks for releasing this game changer of a repo! I'd like to share a couple of installation issues I had when following the instructions in the README. First, sympy isn't included in the requirements. Second, I had to explicitly install cuSignal after installing torchsig, otherwise the GPU isn't used when generating the dataset.

Fix filter length approximations

The filter length approximation generates filters that are too large. Examples include:

https://github.com/TorchDSP/torchsig/blob/main/torchsig/datasets/synthetic.py#L760
https://github.com/TorchDSP/torchsig/blob/main/torchsig/datasets/wideband.py#L373
https://github.com/TorchDSP/torchsig/blob/main/torchsig/datasets/wideband.py#L481
https://github.com/TorchDSP/torchsig/blob/main/torchsig/datasets/wideband.py#L589

These should use the filter length approximation function def estimate_filter_length(attenuation_db, sample_rate, transition_bandwidth) in synthetic.py.
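
For context, a hedged sketch of a typical FIR length estimate (the fred harris rule of thumb); the actual estimate_filter_length() in synthetic.py may use a different approximation.

def estimate_filter_length(attenuation_db: float, sample_rate: float, transition_bandwidth: float) -> int:
    # harris approximation: taps ~= attenuation / (22 * normalized transition bandwidth)
    num_taps = int(attenuation_db * sample_rate / (22.0 * transition_bandwidth))
    # force an odd length so the filter has an integer group delay
    return num_taps | 1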

Review randomized pulse shaping for FSK/MSK

  1. The randomized pulse shaping filter as part of FSKDataSet(), _generate_samples() in synthetic.py may not be needed. Presumably the filter is used to attenuate the sidelobes of FSK and MSK as part of randomized impairments, see paper "Large Scale Radio Frequency Signal Classification", section 3.2.2 Randomized Pulse Shaping. Unclear if this randomization provides value to the RFML algorithms.

  2. If the randomized pulse shaping filter is needed, then the filter length approximation on line 770 needs to be corrected because it's generating filters that are too large. The approximation can be simplified as num_taps = int(115/lpf_bandwidth). Because lpf_bandwidth < 1, the filter length becomes a massive number. A different filter length approximation is needed.

Automated Testing: Model instantiation

A simple suite of tests for ensuring that models supported in this repository can be instantiated without issue. Because the baseline models are being imported from other Python libraries, we need a way to know when new versions of these libraries cause instantiation to fail.
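
A hedged sketch of what such a test could look like; the efficientnet_b4 factory name is taken from the pre-trained model issue below, and the parametrization list is illustrative.

import pytest
import torchsig.models as models

@pytest.mark.parametrize("factory_name", ["efficientnet_b4"])
def test_model_instantiation(factory_name):
    # Instantiate without pretrained weights so the test needs no downloads
    factory = getattr(models, factory_name)
    model = factory(pretrained=False)
    assert model is not None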

Automated Testing: Dataset generation

Generating the datasets from scratch may take some time, but we can at least ensure that we can generate maybe ~100 samples with different parameterizations of some of the relevant datasets.
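
A hedged sketch of a small-sample smoke test; the ModulationsDataset import path and keyword arguments below are assumptions for illustration and may not match the current signature.

from torchsig.datasets import ModulationsDataset  # assumed import path

def test_small_dataset_generation():
    # Values are illustrative; keep them small so the test runs quickly
    dataset = ModulationsDataset(
        level=0,              # assumed "clean" impairment level
        num_iq_samples=4096,
        num_samples=100,      # ~100 samples as suggested above
    )
    assert len(dataset) == 100
    data, target = dataset[0]
    assert data is not None and target is not None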

Yolov5 output interpretation

Hello, I adapted the DETR wideband detector example to perform detection on some of my own signal data. I am now trying to adapt it to the yolov5 model and am running into issues.

The output of the DETR model was a dictionary with keys 'pred_logits' and 'pred_boxes'. When I feed the same data into the yolov5 model (pred = model.eval()(data)), I get a tuple with sizes:

  • pred[0].shape: [1,16128,6]
  • pred[1][0].shape: [1, 3, 64, 64, 6]
  • pred[1][1].shape: [1, 3, 32, 32, 6]
  • pred[1][2].shape: [1, 3, 16, 16, 6]

Can you provide any guidance on how to process this output to get the bounding boxes and classes for the predictions? Completing this issue would probably help me out as well. Thanks!
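
For reference, a hedged sketch of how YOLOv5-style raw outputs are commonly decoded, assuming the usual [cx, cy, w, h, objectness, class score(s)] layout along the last dimension of pred[0]; this may not exactly match this model's head, and the thresholds are illustrative.

import torch
import torchvision

detections = pred[0][0]                        # [16128, 6] for a batch of one
obj_conf = detections[:, 4]                    # objectness score per candidate box
keep = obj_conf > 0.25                         # illustrative confidence threshold
boxes_xywh = detections[keep, :4]              # center-x, center-y, width, height
scores = obj_conf[keep] * detections[keep, 5]  # objectness * class score

# Convert to corner format and suppress overlapping boxes
boxes_xyxy = torch.cat(
    [boxes_xywh[:, :2] - boxes_xywh[:, 2:] / 2,
     boxes_xywh[:, :2] + boxes_xywh[:, 2:] / 2],
    dim=1,
)
final_idx = torchvision.ops.nms(boxes_xyxy, scores, iou_threshold=0.45)
final_boxes = boxes_xyxy[final_idx]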

Introduce new DatasetWriter class and other related classes

To support the parallel generation of datasets #41, #35, #36, partial generation of datasets #31, and storing datasets in multiple formats, it makes sense to have a class that can do this.

I think the DatasetCreator class can just have the responsibility of connecting a DatasetLoader and a DatasetWriter, something like the sketch at the end of this issue.

The DatasetCreator can take care of seeding, naming, and handling dataset metadata.

The DatasetLoader is probably a Dataset wrapped by a torch.utils.data.DataLoader, but it could be anything that generates the data, such as a Python generator.

The DatasetWriter is just a strategy to store metadata and data pairs.

This also means we can refactor out of named datasets like Sig53 any responsibility it has to generate/store itself.
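
A hedged sketch of the split described above; class and method names are illustrative, not a final API.

class DatasetWriter:
    """Strategy for persisting (data, metadata) pairs in some storage format."""
    def write(self, index, data, metadata):
        raise NotImplementedError

class DatasetCreator:
    """Connects a loader (anything that yields data) to a writer and owns
    seeding, naming, and dataset-level metadata."""
    def __init__(self, loader, writer):
        self.loader = loader
        self.writer = writer

    def create(self):
        for index, (data, metadata) in enumerate(self.loader):
            self.writer.write(index, data, metadata)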

datasets reveresed in 02_example_sig53_classifier.py ?

Line 37 sets train=False, but line 50 creates a dataset labeled sig53_clean_train.

Line 60 then again states train=False, then line 61 creates a dataset labeled sig53_clean_val.

Shouldn't line 37 be train=True?

(I am creating this as an issue rather than a PR to double-check my logic.)
Thank you

Sig53 Dataset issue

Hi,

I am trying to use Sig53 datasets for my research, but I met some problems with it.

  1. Does Sig53 equal WidebandSig53? The impaired version of WidebandSig53 only consists of 250000 samples with a length of 262144, whereas Sig53 contains 5.2M samples in the paper.

  2. The WidebandSig53 dataset's target contains multiple labels (as a list), and there are no documents related to the transformation.

Thanks,

Jie,

"pip install ." doesn't install

I tried to install today, then tried to run 00_example_sig53_dataset.py.
It errored out saying:
Traceback (most recent call last):
File "~/torchsig/examples/00_example_sig53_dataset.py", line 8, in
from torchsig.utils.visualize import IQVisualizer, SpectrogramVisualizer
ModuleNotFoundError: No module named 'torchsig'

To reproduce
on a clean Ubuntu-22.04 image (WSL2), I installed python3-pip
then git clone https://github.com/TorchDSP/torchsig.git
cd torchsig
~/torchsig$ pip install .
Defaulting to user installation because normal site-packages is not writeable
Processing /home/rfml/torchsig
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: UNKNOWN
Building wheel for UNKNOWN (pyproject.toml) ... done
Created wheel for UNKNOWN: filename=UNKNOWN-0.0.0-py3-none-any.whl size=1781 sha256=e2342c977887edc1d4ba88e9fa97f8df084732439bcd89f9bc42a361edc0bfcf
Stored in directory: /tmp/pip-ephem-wheel-cache-e8gqc7yu/wheels/d7/45/19/1a2488076fc711f0225342f63115e57818593ac2f598ba9438
Successfully built UNKNOWN
Installing collected packages: UNKNOWN
Attempting uninstall: UNKNOWN
Found existing installation: UNKNOWN 0.0.0
Uninstalling UNKNOWN-0.0.0:
Successfully uninstalled UNKNOWN-0.0.0
Successfully installed UNKNOWN-0.0.0

Broken cupy import when using GPU

Cupy is imported as 'xp' in synthetic.py and wideband.py, causing the later assignment xp = cp to fail. I think the intent is to import it as cp, as sketched below.

synthetic.py

import cupy as xp

xp = cp if self.use_gpu else np

wideband.py

import cupy as xp

xp = cp if self.use_gpu else np
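
A hedged sketch of the intended pattern (shown with a local variable standing in for self.use_gpu):

import numpy as np
import cupy as cp        # alias as cp so the assignment below resolves correctly

use_gpu = True           # stands in for self.use_gpu
xp = cp if use_gpu else np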

Pre-trained models do not seem to restore correctly

I was able to modify the code in a PR to get the pre-trained model weights to restore correctly, but when I run the pre-trained models against the validation dataset they do not produce meaningful results.

I used the following script to generate the impaired Sig53 dataset and restore the EfficientNet B4 weights, and I used the evaluation code from the example notebook to run prediction with the model as follows.

from pathlib import Path

import numpy as np
import torch
import torchsig
import torchsig.transforms as ST

from matplotlib import pyplot as plt
from sklearn.metrics import classification_report
from tqdm import tqdm

from cm_plotter import plot_confusion_matrix
from torchsig.datasets import Sig53

root = 'sig53'

Path(root).mkdir(parents=True, exist_ok=True)
train = False
impaired = False
class_list = list(Sig53._idx_to_name_dict.values())
transform = ST.Compose([
    ST.RandomPhaseShift(phase_offset=(-1, 1)),
    ST.Normalize(norm=np.inf),
    ST.ComplexTo2D(),
])
target_transform = ST.DescToClassIndex(class_list=class_list)

sig53_impaired_val = Sig53(
    root=root, 
    train=False, 
    impaired=True,
    transform=transform,
    target_transform=target_transform,
    use_signal_data=True,
)

# Load the pretrained weights directly downloaded from torchsig.com
model = torchsig.models.efficientnet_b4(
    pretrained=True,
    path="efficientnet_b4_online.pt",
)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
model.eval()

# Infer results over validation set
num_test_examples = len(sig53_impaired_val)
num_classes = len(list(Sig53._idx_to_name_dict.values()))
y_raw_preds = np.empty((num_test_examples,num_classes))
y_preds = np.zeros((num_test_examples,))
y_true = np.zeros((num_test_examples,))

for i in tqdm(list(range(0,num_test_examples))):
    # Retrieve data
    idx = i # Use index if evaluating over full dataset
    data, label = sig53_impaired_val[idx]
    # Infer
    data = torch.from_numpy(np.expand_dims(data,0)).float()
    data = data.cuda() if torch.cuda.is_available() else data
    with torch.no_grad():
        pred_tmp = model(data)
    pred_tmp = pred_tmp.cpu().numpy() if torch.cuda.is_available() else pred_tmp
    # Argmax
    y_preds[i] = np.argmax(pred_tmp)
    # Store label
    y_true[i] = label

acc = np.sum(np.asarray(y_preds)==np.asarray(y_true))/len(y_true)
plot_confusion_matrix(
    y_true, 
    y_preds, 
    classes=class_list,
    normalize=True,
    title="Example Modulations Confusion Matrix\nTotal Accuracy: {:.2f}%".format(acc*100),
    text=False,
    rotate_x_text=90,
    figsize=(16,9),
)

plt.savefig('confusion_matrix_effnetb4_online.png')

However, the final prediction is seemingly random:
[image: confusion_matrix_restored_effnetb4_online]

I confirmed that the weights are restoring into the parameters of the model when using the pretrained=True flag, and that you get randomly initialized weights when you run with pretrained=False.

The major difference is that I'm using the code as posted to create the impaired dataset, and I guess there is potential that my randomized dataset does not match the dataset that you used.

However, training the "clean" model for 1M steps using the example notebook on the clean dataset and evaluating it on the clean validation set did result in a model that at least learned something:
[image: confusion_matrix_trained_effnetb4_static]

Is there something in my above script that isn't correct? I've tested the same code restoring the XCiT-Tiny12 network with the same outcome.

Installation instructions on README.md

Requires update README.md from:

cd torchsig
pip install -r requirements.txt
pip install .

to...

cd torchsig
pip install -e .

in order to conform to installing dependencies via pyproject.toml

02_example_sig53_classifier.py has data on CPU and GPU

Running example 2 on my laptop (Dell XPS 17 9700 with an RTX 2060 GPU) errors out with "Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same" during "Evaluation of the Trained model", referring to line 210.
After troubleshooting I found that, after loading the state_dict, example_model was on the CPU, in contrast to the data, which on line 209 is on the GPU.
This can be verified by inserting next(example_model.parameters()).is_cuda on line 195.

A possible fix, based on https://pytorch.org/tutorials/beginner/saving_loading_models.html#save-on-gpu-load-on-gpu, is to add example_model.to(device) after line 194 (see the sketch below).
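
A hedged sketch of the proposed fix; checkpoint_path is a placeholder for wherever the example stores its saved weights.

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# checkpoint_path is a placeholder for the example's saved state_dict file
example_model.load_state_dict(torch.load(checkpoint_path, map_location=device))
example_model.to(device)   # keep the model on the same device as the data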

Fix example jupyter notebook

Example notebook 03_example_widebandsig53_dataset gives an error: AttributeError: module 'torchsig.transforms.transforms' has no attribute 'Spectrogram'.

New wideband visualizers missing

The torchsig/utils/visualize.py file was not updated in the wideband code release. Need to update this file to include missing visualizations, such as the MaskClassVisualizer which is used in the new wideband example notebook, 03_example_widebandsig53_dataset.ipynb.

Packaging: Cleanup requirements

  1. We should have more rigorously defined requirements with ranges of version values.
  2. We should get rid of requirements that are unused, like ipdb
  3. Requirements that are only used for a certain file or function should have a means of prompting the user to install them, if it's a heavier-weight library (see the sketch after this list).
  4. If we really only use a single function from a library, it's better to implement that function in this software directly.
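
A hedged sketch of point 3: defer a heavy optional import and tell the user how to install it only when the feature is actually used (cuSignal is used as the example because it already comes up in the issues above).

def _require_cusignal():
    try:
        import cusignal
    except ImportError as exc:
        raise ImportError(
            "GPU-accelerated dataset generation requires cuSignal. "
            "Install it (e.g. via conda) and try again."
        ) from exc
    return cusignal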
