luizgh / sigver

Signature verification package, for learning representations from signature data, training user-dependent classifiers.

License: BSD 3-Clause "New" or "Revised" License

Python 50.20% Jupyter Notebook 49.80%

sigver's Introduction

Learning representations for Offline Handwritten Signature Verification

This repository contains code to train CNNs for feature extraction for Offline Handwritten Signatures, code to train writer-dependent classifiers [1] and code to train meta-learners [3]. It also contains code to train two countermeasures for Adversarial Examples, as described in [2] (the code to run the experiments from this second paper can be found in this repository).

[1] Hafemann, Luiz G., Robert Sabourin, and Luiz S. Oliveira. "Learning Features for Offline Handwritten Signature Verification using Deep Convolutional Neural Networks" http://dx.doi.org/10.1016/j.patcog.2017.05.012 (preprint)

[2] Hafemann, Luiz G., Robert Sabourin, and Luiz S. Oliveira. "Characterizing and evaluating adversarial examples for Offline Handwritten Signature Verification" https://doi.org/10.1109/TIFS.2019.2894031 (preprint)

[3] Hafemann, Luiz G., Robert Sabourin, and Luiz S. Oliveira. "Meta-learning for fast classifier adaptation to new users of Signature Verification systems" https://doi.org/10.1109/TIFS.2019.2949425 (preprint)

The code for feature extraction and writer-dependent classifiers is a re-implementation in PyTorch (the original code for [1] was written in theano+lasagne: link).

Installation

This package requires Python 3. Installation can be done with pip:

pip install git+https://github.com/luizgh/sigver.git  --process-dependency-links

You can also clone this repository and install it with pip install -e <path/to/repository> --process-dependency-links

Usage

Data preprocessing

The functions in this package expect training data to be provided in a single .npz file, with the following components:

x: Signature images (numpy array of size N x 1 x H x W)
y: The user that produced the signature (numpy array of size N )
yforg: Whether the signature is a forgery (1) or genuine (0) (numpy array of size N )
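As a rough illustration of this layout, the sketch below writes and reads back a toy .npz file with the three components listed above. The data is synthetic and the file name is hypothetical. Note also that the traceback quoted later on this page suggests the package's own loader reads additional `user_mapping` and `filenames` entries, which this toy file omits.

```python
import os
import tempfile

import numpy as np

# Synthetic stand-in data: 6 blank "signatures" of size 170x242 from 3 users.
N, H, W = 6, 170, 242
x = np.zeros((N, 1, H, W), dtype=np.uint8)   # images, N x 1 x H x W
y = np.array([0, 0, 1, 1, 2, 2])             # user that produced each signature
yforg = np.array([0, 1, 0, 1, 0, 1])         # 1 = forgery, 0 = genuine

# Hypothetical file name, written to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "toy_dataset.npz")
np.savez(path, x=x, y=y, yforg=yforg)

with np.load(path) as data:
    loaded_x, loaded_y, loaded_yforg = data["x"], data["y"], data["yforg"]

# Select the genuine signatures of user 0 with a boolean mask.
genuine_of_user_0 = loaded_x[(loaded_y == 0) & (loaded_yforg == 0)]
print(loaded_x.shape, genuine_of_user_0.shape)
```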

We provide functions to process some commonly used datasets in the script sigver.preprocessing.process_dataset. As an example, the following command pre-processes the MCYT dataset with the procedure from [1] (remove background, center in canvas and resize to 170x242):

python -m sigver.preprocessing.process_dataset --dataset mcyt \
 --path MCYT-ORIGINAL/MCYToffline75original --save-path mcyt_170_242.npz

During training a random crop of size 150x220 is taken for each iteration. During test we use the center 150x220 crop.
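The two cropping modes can be sketched with plain NumPy as follows. This is an illustration only, not the package's own transform code; the function names `random_crop` and `center_crop` are made up for this example.

```python
import numpy as np


def random_crop(img, crop_hw, rng):
    """Take a crop of size crop_hw at a random position (training)."""
    h, w = img.shape[-2:]
    ch, cw = crop_hw
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    return img[..., top:top + ch, left:left + cw]


def center_crop(img, crop_hw):
    """Take the central crop of size crop_hw (testing)."""
    h, w = img.shape[-2:]
    ch, cw = crop_hw
    top = (h - ch) // 2
    left = (w - cw) // 2
    return img[..., top:top + ch, left:left + cw]


rng = np.random.default_rng(0)
image = np.arange(170 * 242, dtype=np.float32).reshape(1, 170, 242)
train_patch = random_crop(image, (150, 220), rng)
test_patch = center_crop(image, (150, 220))
print(train_patch.shape, test_patch.shape)
```

For a 170x242 input, the center crop starts at row 10 and column 11, while the random crop start is drawn uniformly from the valid offsets.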

Training a CNN for Writer-Independent feature learning

This repository implements the two loss functions defined in [1]: SigNet (learning from genuine signatures only) and SigNet-F (incorporating knowledge of forgeries). In the training script, the flag --users is used to define the users that are used for feature learning. In [1], GPDS users 300-881 were used (--users 300 881).

Training SigNet:

python -m sigver.featurelearning.train --model signet --dataset-path <data.npz> --users [first last] \
    --epochs 60 --logdir signet

Training SigNet-F with lambda=0.95:

python -m sigver.featurelearning.train --model signet --dataset-path <data.npz> --users [first last] \
    --epochs 60 --forg --lamb 0.95 --logdir signet_f_lamb0.95

For checking all command-line options, use python -m sigver.featurelearning.train --help. In particular, the option --visdom-port allows real-time monitoring using visdom (start the visdom server with python -m visdom.server -port <port>).

Training WD classifiers and evaluating the result

For training and testing the WD classifiers, use the sigver.wd.test script. Example:

python -m sigver.wd.test -m signet --model-path <path/to/trained_model> \
    --data-path <path/to/data> --save-path <path/to/save> \
    --exp-users 0 300 --dev-users 300 881 --gen-for-train 12

Where trained_model is a .pth file (trained with the script above, or pre-trained; see the section below). This script will split the dataset into train/test, train WD classifiers and evaluate them on the test set. This is performed for K random splits (default 10). The script saves a pickle file containing a list, where each element is the result of one random split. Each item contains a dictionary with:

'all_metrics': a dictionary containing:
- 'FRR': false rejection rate
- 'FAR_random': false acceptance rate for random forgeries
- 'FAR_skilled': false acceptance rate for skilled forgeries
- 'mean_AUC': mean Area Under the Curve (average of AUC for each user)
- 'EER': Equal Error Rate using a global threshold
- 'EER_userthresholds': Equal Error Rate using user-specific thresholds
- 'auc_list': the list of AUCs (one per user)
- 'global_threshold': the optimum global threshold (used in EER)
'predictions': a dictionary containing the predictions for all images on the test set:
- 'genuinePreds': Predictions to genuine signatures
- 'randomPreds': Predictions to random forgeries
- 'skilledPreds': Predictions to skilled forgeries

The example above trains WD classifiers for the exploitation set (users 0-300) using a development set (users 300-881), with 12 genuine signatures per user (this is the setup from [1]; refer to the paper for more details). To see all command-line options, run python -m sigver.wd.test --help.
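A minimal sketch of reading such a results file is shown below. It builds a synthetic two-fold result with made-up numbers, following the dictionary layout described above, and then averages the EER across folds. The file name is hypothetical; substitute the path passed to --save-path.

```python
import os
import pickle
import tempfile

import numpy as np

# Synthetic results following the structure described above (two folds, made-up numbers).
fake_results = [
    {"all_metrics": {"EER": 0.04, "mean_AUC": 0.98}, "predictions": {}},
    {"all_metrics": {"EER": 0.06, "mean_AUC": 0.96}, "predictions": {}},
]
path = os.path.join(tempfile.mkdtemp(), "results.pickle")  # hypothetical file name
with open(path, "wb") as f:
    pickle.dump(fake_results, f)

# Reading the file back: a list with one dictionary per random split.
with open(path, "rb") as f:
    results = pickle.load(f)

eers = np.array([fold["all_metrics"]["EER"] for fold in results])
print(f"EER over {len(eers)} folds: {eers.mean():.4f} +/- {eers.std():.4f}")
```

Only load pickle files that you produced yourself or otherwise trust, since unpickling can execute arbitrary code.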

Pre-trained models

Pre-trained models can be found here:

SigNet (link)
SigNet-F lambda 0.95 (link)

These models contain the weights for the feature extraction layers.

Important: These models were trained with pixel values in the range [0, 1]. Besides the pre-processing steps described above, you need to divide each pixel by 255 to bring it into this range. This can be done as follows: x = x.float().div(255). Note that PyTorch does this conversion automatically if you use torchvision.transforms.ToTensor, which is used during training.

Usage:

import torch
from sigver.featurelearning.models import SigNet

# Load the model
state_dict, classification_layer, forg_layer = torch.load('models/signet.pth')
base_model = SigNet().eval()
base_model.load_state_dict(state_dict)

# Extract features
with torch.no_grad(): # We don't need gradients. Inform torch so it doesn't compute them
    features = base_model(input)

See example.py for a complete example. For a jupyter notebook, see this interactive example.

Meta-learning

To train a meta-learning model, use the sigver.metalearning.train script:

python -m sigver.metalearning.train --dataset-path <path/to/dataset.npz> \
    --pretrain-epochs <number_pretrain_epochs> --num-updates <number_gradient_steps> --num-rf <num_random_forgeries> \
    --epochs <number_epochs> --num-sk-test <number_skilled_in_Dtest> --model <model>

Where num-updates is the number of gradient descent steps in the classifier adaptation (K in the paper), and num-rf refers to the number of random forgeries in the classifier adaptation steps (set --num-rf 0 for the one-class formulation). Refer to the sigver/metalearning/train.py script for a complete list of arguments.

Citation

If you use our code, please consider citing the papers listed above ([1], [2], [3]).

License

The source code is released under the BSD 3-clause license. Note that the trained models used the GPDS dataset for training, which is restricted to non-commercial use.

Please do not contact me requesting access to any particular dataset. These requests should be addressed directly to the groups that collected them.

sigver's Issues

y-value equals to user's index instead of user's label

In the file "sigver/datasets/util.py", the function "process_dataset_images" loops through all the users and sets the y-value of each signature to the index of the user instead of the user's label.

In other words, it does this:

for i, user in enumerate(tqdm(users)):
    ...
    y[indexes] = i

but I believe it should be like this:

for i, user in enumerate(tqdm(users)):
    ...
    y[indexes] = user

Is this an error, or did I misunderstand something?
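One plausible reading, not confirmed by the maintainer in this thread: contiguous indices starting at 0 are what a classifier over users typically needs as targets, and the original user ID can be kept in a side table. The traceback quoted further down this page shows that the dataset file carries a `user_mapping` entry, which may serve this purpose. The sketch below illustrates the idea with hypothetical user IDs.

```python
# Hypothetical user IDs; they need not be contiguous or start at zero.
users = [300, 305, 881]

user_mapping = {}  # assumed role: index -> original user ID
labels = []
for i, user in enumerate(users):
    user_mapping[i] = user   # remember which user the index stands for
    labels.append(i)         # contiguous class index used as the training target

# The original IDs can be recovered from the indices whenever needed.
recovered = [user_mapping[i] for i in labels]
print(labels, recovered)
```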

ModuleNotFoundError: No module named 'sigver.datasets.toremove'

When I run train.py for meta-learning, I get the error message ModuleNotFoundError: No module named 'sigver.datasets.toremove'. It seems that a source file named "toremove.py" is missing from this repository?

Where do I get the pickle file for testing?

Trouble executing training script train.py on CEDAR Dataset

I have preprocessed CEDAR Dataset as per instructions

$ python3 -m sigver.preprocessing.process_dataset --dataset cedar --path /home/atinesh/Downloads/Datasets/CEDAR/signatures --save-path data/cedar/mcyt_170_242.npz

Processing dataset
Number of users: 55
Allocated x of shape: (2640, 170, 242)
 51%|████████████████████████████████▌                               | 28/55 [00:27<00:24,  1.09it/s]/home/atinesh/Desktop/virtual_envs/pytorch_env/lib/python3.6/site-packages/skimage/util/dtype.py:135: UserWarning: Possible precision loss when converting from float64 to uint8
  .format(dtypeobj_in, dtypeobj_out))
100%|████████████████████████████████████████████████████████████████| 55/55 [00:54<00:00,  1.01it/s]

But when I try to run train.py by the following command

$ python3 -m sigver.featurelearning.train --model signet --dataset-path data/cedar/mcyt_170_242.npz --users 1 55 --model signet --epochs 10 --logdir signet

I get an error in util.py

train.py, line 330, in main
x, y, yforg, usermapping, filenames = util.load_dataset(args.dataset_path)

util.py, line 31, in load_dataset
user_mapping, filenames = data['user_mapping'], data['filenames']

ValueError: Object arrays cannot be loaded when allow_pickle=False
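For context: recent NumPy versions default to `allow_pickle=False` in `np.load`, so object arrays stored in an .npz file raise this error when accessed. The sketch below reproduces the behaviour on a toy file and shows the usual workaround of passing `allow_pickle=True`. The contents of the `user_mapping` entry are made up here, and whether the package's loader should be changed this way is an assumption rather than something stated in this thread.

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "toy.npz")  # hypothetical file name
# An object-dtype array standing in for the 'user_mapping' entry (contents made up).
np.savez(path, user_mapping=np.array({0: 300, 1: 305}, dtype=object))

# With the default settings, accessing the object array fails.
try:
    with np.load(path) as data:
        data["user_mapping"]
    default_load_failed = False
except ValueError:
    default_load_failed = True

# Explicitly allowing pickled objects makes the entry readable again.
with np.load(path, allow_pickle=True) as data:
    user_mapping = data["user_mapping"].item()

print(default_load_failed, user_mapping)
```

Because unpickling can execute arbitrary code, `allow_pickle=True` should only be used on files from a trusted source.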

Expected more than 1 value per channel when training, got input size torch.Size([1, 2048])

I tried to run this project like this:
/projects/projectS/sigver/sigver/featurelearning/train.py
--model signet --logdir ./../../npz --dataset-path /projects/projectS/sigver/npz/mcyt_170_242.npz --users 0 100 --epochs 10

but it fails with:

Traceback (most recent call last):
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1596, in
globals = debugger.run(setup['file'], None, None, is_module)
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 974, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/projects/projectS/sigver/sigver/featurelearning/train.py", line 427, in
main(arguments)
File "/projects/projectS/sigver/sigver/featurelearning/train.py", line 368, in main
device, logger, args, logdir)
File "/projects/projectS/sigver/sigver/featurelearning/train.py", line 83, in train
val_metrics = test(val_loader, base_model, classification_layer, device, args.forg, forg_layer)
File "/projects/projectS/sigver/sigver/featurelearning/train.py", line 282, in test
features = base_model(x)
File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/projects/projectS/sigver/sigver/featurelearning/models/signet.py", line 32, in forward
x = self.fc_layers(x)
File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 76, in forward
exponential_average_factor, self.eps)
File "/usr/local/lib/python3.6/site-packages/torch/nn/functional.py", line 1619, in batch_norm
raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 2048])
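PyTorch raises this error when a batch normalization layer in training mode receives a batch with a single sample. One likely trigger, assumed here rather than confirmed in the thread, is a final batch of size 1 when the number of samples is not a multiple of the batch size. The helper below (a hypothetical name, plain Python) shows how to check for that case; PyTorch's DataLoader has a `drop_last` argument that discards such an incomplete batch.

```python
def batch_sizes(num_samples, batch_size, drop_last=False):
    """Return the size of each batch a sequential loader would produce."""
    full, remainder = divmod(num_samples, batch_size)
    sizes = [batch_size] * full
    if remainder and not drop_last:
        sizes.append(remainder)  # incomplete final batch
    return sizes


# 65 samples with batches of 32 leave a final batch holding one sample.
print(batch_sizes(65, 32))                  # [32, 32, 1]
print(batch_sizes(65, 32, drop_last=True))  # [32, 32]
```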

Preprocessing scheme

I noticed that the SigNet paper mentions "we normalized the input to the neural network by dividing each pixel by the standard deviation of all pixel intensities (from all images in D)". But I haven't found the corresponding code in either the "preprocessing_signature" function or "set_up_dataloaders". Did I overlook the code somewhere else, or has this procedure been dropped?

not getting good results in pretrained model with real time images, why?

I tried the pre-trained models,

SigNet
SigNet-F lambda 0.95

and computed the cosine similarity between the features of two images (for example, one genuine and one forged signature):

feature1 = base_model(input1)
feature2 = base_model(input2)
cosine_similarity(feature1, feature2)

real1.png, real2.png -> similarity: 0.31142098
real1.png, fake1.png -> similarity: 0.2714578
real1.png, fake2.png -> similarity: 0.6426417
real2.png, fake1.png -> similarity: 0.18943718
real2.png, fake2.png -> similarity: 0.6238067

My decision rule is that a similarity above 60% means the pair is verified; otherwise it is rejected.
With that rule only 2 of the 5 cases were classified correctly, so I got just 40% accuracy from the pre-trained model.
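For reference, the cosine similarity used in this comparison can be written in a few lines of NumPy. This is a generic sketch on made-up vectors, not code from the package. Note that the README above evaluates the features by training writer-dependent classifiers on them, rather than by thresholding a similarity between two feature vectors.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (flattened first)."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Made-up vectors: identical directions give 1, orthogonal directions give 0.
same = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same, orthogonal)
```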

I have a few questions for @luizgh, @gonultasbu and @atinesh-s. Please help me resolve them:

When I test with real-world noisy images (such as photos of signatures written on paper), I do not get good results. Why, and how can I resolve this?
How can I improve the accuracy of the SigNet model?

Thanks & Regards
Murugan Rajenthiran

Issue about using WD classifier and running test.py

Hello,
I am a PhD scholar working on signature verification. I am trying to execute your code and want to see how you computed the EER for all the runs/folds of data. When I run test.py in the wd folder I get the following error.
usage: test.py [-h] -m {signet,signet_thin,signet_smaller} --model-path
MODEL_PATH --data-path DATA_PATH [--save-path SAVE_PATH]
[--input-size INPUT_SIZE INPUT_SIZE]
[--exp-users EXP_USERS EXP_USERS]
[--dev-users DEV_USERS DEV_USERS] [--gen-for-test GEN_FOR_TEST]
[--forg-from_exp FORG_FROM_EXP] [--forg-from_dev FORG_FROM_DEV]
[--svm-type {rbf,linear}] [--svm-c SVM_C]
[--svm-gamma SVM_GAMMA] [--gpu-idx GPU_IDX]
[--batch-size BATCH_SIZE] [--folds FOLDS]
test.py: error: the following arguments are required: -m, --model-path, --data-path
An exception has occurred, use %tb to see the full traceback.

SystemExit: 2
Can you please guide me to a solution? Moreover, I want to use your eer_user_threshold function in my own implementation. For that, do I need to compute the EER for each user and then compute the overall EER for each fold/run?

Cannot extract features similar to features in "some_signature_features.pth"

I'm getting the error below with the example.py code and I'd like to know the possible cause.

I am using the following environment:
python == 3.7
pytorch == 1.11
Looking forward to your reply.

Exploitation set in WD classifiers

If I understood the paper correctly, the exploitation set is used only to train the feature extractor, if that's the case why is it necessary to specify the users for the exploitation set to train the WD model?

I tried to pass an empty set as the exploitation set but that caused an error.

Problem about the result pickle file

As a beginner of Python, I want to ask
How can the pickle file be converted to a readable format to obtain the FAR, AUC, per-user predictions, etc.?

ZeroDivisionError: integer division or modulo by zero

Dear Luis,

Thank you for making this code publicly available.
I do encounter an error while trying to train on the CEDAR dataset though, perhaps you could help me?

The error looks like this:

The script I use to start training looks like this:

python3.7 -m  sigver.featurelearning.train \
    --dataset cedar \
    --model signet \
    --dataset-path  /data/signature_matching/data/processed/sigver_datasets/cedar.npz \
    --users 10 20 \
    --epochs 2 \
    --logdir /tmp/signet

The cedar.npz dataset has been preprocessed as has been indicated in the README.

Hope you can help me with this.

Can not reproduce the results.

Hi,
I have trained the writer-dependent classifiers using your pre-trained models, but the EER is not the same as reported in your papers.

prediction for wd classification

Sir, after training WD classifier and meta learning can you tell us how to predict forgery or genuine for a single image

Thresholds

How exactly are the global and user thresholds calculated? Given the SVM decision function, do we use a user-specific threshold for calculating probabilities?
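As a general illustration of the two notions, and not necessarily how sigver computes 'EER' and 'EER_userthresholds', the sketch below sweeps candidate thresholds over classifier scores and picks the one where the false rejection and false acceptance rates are closest. A global threshold pools the scores of all users; user-specific thresholds repeat the search per user and average the result. All scores and the function name `equal_error_rate` are made up for this example.

```python
import numpy as np


def equal_error_rate(genuine_scores, forgery_scores):
    """Return (EER, threshold) where FAR and FRR are closest to each other."""
    genuine = np.asarray(genuine_scores, dtype=float)
    forgery = np.asarray(forgery_scores, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, forgery]))
    # A sample is accepted when its score is >= the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    far = np.array([(forgery >= t).mean() for t in thresholds])
    best = np.argmin(np.abs(far - frr))
    return (far[best] + frr[best]) / 2, thresholds[best]


# Made-up decision scores per user: (genuine scores, forgery scores).
scores = {
    "user_a": ([2.0, 1.5], [0.5, 1.0]),
    "user_b": ([0.8, 0.6], [-1.0, -0.5]),
}

# Global threshold: pool the scores of all users.
all_genuine = np.concatenate([g for g, _ in scores.values()])
all_forgery = np.concatenate([f for _, f in scores.values()])
global_eer, global_threshold = equal_error_rate(all_genuine, all_forgery)

# User-specific thresholds: one search per user, then average the EERs.
user_eer = np.mean([equal_error_rate(g, f)[0] for g, f in scores.values()])
print(global_eer, global_threshold, user_eer)
```

In this toy data each user is perfectly separable with their own threshold, but the score ranges differ between users, so a single global threshold yields a higher error.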

Where to download CEDAR and Brazilian PUC-PR database

I'm working on an offline signature verification project right now, and I really need these benchmark databases to compare my work against. I searched the Internet for the CEDAR and Brazilian PUC-PR databases, but in vain. The paper cited for CEDAR gives no link, and the email address the author left is no longer in use. The Brazilian PUC-PR paper is in Portuguese, which I cannot read.
Could you please share links to these two databases? I'd be very grateful for your help.

Assertion error in example.py

Hi,
So, as the title says, I got an assertion error running example.py. I printed out the value of

(features.cpu() - expected_features).abs().max()

and got 0.0010.

Versions of required libraries:
matplotlib: 3.0.2
torch: 1.0.1
torchvision: 0.2.1

Transfer learning

Is transfer learning with the pre-trained SigNet model a good idea, that is, adding data specific to the current application and updating the CNN model?

RuntimeError: invalid argument 2: non-empty vector or matrix expected at /pytorch/aten/src/THCUNN/generic/ClassNLLCriterion.cu:32

CUDA_VISIBLE_DEVICES=0 python -m sigver.featurelearning.train --model signet --dataset-path /home/dell/Documents/Prasad_AI/sign_similarity/sigver/dataset_npz/dataset_npz.npz --users 20 55 --model signet --epochs 60 --forg --lamb 0.95 --logdir signet_f_lamb0.95
Namespace(batch_size=32, checkpoint=None, dataset_path='/home/dell/Documents/Prasad_AI/sign_similarity/sigver/dataset_npz/dataset_npz.npz', epochs=60, forg=True, gpu_idx=0, input_size=(150, 220), lamb=0.95, logdir='signet_f_lamb0.95', loss_type='L2', lr=0.001, lr_decay=0.1, lr_decay_times=3, model='signet', momentum=0.9, seed=42, test=False, users=[20, 55], visdomport=None, weight_decay=0.0001)
Using device: cuda:0
Loading Data
Initializing Model
Training
/home/dell/Documents/Prasad_AI/sign_similarity/sigver/sigver/featurelearning/train.py:180: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
x = torch.tensor(x, dtype=torch.float).to(device)
/home/dell/Documents/Prasad_AI/sign_similarity/sigver/sigver/featurelearning/train.py:181: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
y = torch.tensor(y, dtype=torch.long).to(device)
/home/dell/Documents/Prasad_AI/sign_similarity/sigver/sigver/featurelearning/train.py:182: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
yforg = torch.tensor(batch[2], dtype=torch.float).to(device)
/home/dell/Documents/Prasad_AI/sign_similarity/sigver/sigver/featurelearning/train.py:278: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
x = torch.tensor(x, dtype=torch.float).to(device)
/home/dell/Documents/Prasad_AI/sign_similarity/sigver/sigver/featurelearning/train.py:279: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
y = torch.tensor(y, dtype=torch.long).to(device)
/home/dell/Documents/Prasad_AI/sign_similarity/sigver/sigver/featurelearning/train.py:280: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
yforg = torch.tensor(yforg, dtype=torch.float).to(device)
Epoch 0. Val loss: 3.4440, Val acc: 10.22%,Val forg loss: 0.5179, Val forg acc: 79.69%
Epoch 1. Val loss: 3.2476, Val acc: 20.76%,Val forg loss: 0.4181, Val forg acc: 84.38%
Traceback (most recent call last):
File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/dell/Documents/Prasad_AI/sign_similarity/sigver/sigver/featurelearning/train.py", line 428, in
main(arguments)
File "/home/dell/Documents/Prasad_AI/sign_similarity/sigver/sigver/featurelearning/train.py", line 369, in main
device, logger, args, logdir)
File "/home/dell/Documents/Prasad_AI/sign_similarity/sigver/sigver/featurelearning/train.py", line 81, in train
epoch, optimizer, lr_scheduler, callback, device, args)
File "/home/dell/Documents/Prasad_AI/sign_similarity/sigver/sigver/featurelearning/train.py", line 201, in train_epoch
class_loss = F.cross_entropy(logits, y[yforg == 0])
File "/home/dell/Documents/Prasad_AI/sign_similarity/sigver/myenv/lib/python3.6/site-packages/torch/nn/functional.py", line 1970, in cross_entropy
return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
File "/home/dell/Documents/Prasad_AI/sign_similarity/sigver/myenv/lib/python3.6/site-packages/torch/nn/functional.py", line 1790, in nll_loss
ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: invalid argument 2: non-empty vector or matrix expected at /pytorch/aten/src/THCUNN/generic/ClassNLLCriterion.cu:32
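The failing line in the traceback is `class_loss = F.cross_entropy(logits, y[yforg == 0])`, which selects only the genuine samples of the batch. A plausible cause, assumed here and not confirmed in the thread, is a batch in which every sample is a forgery, so the selection is empty. The NumPy sketch below reproduces the empty selection on made-up labels and shows one possible guard.

```python
import numpy as np

y = np.array([3, 7, 7, 1])       # user indices for one batch (made up)
yforg = np.array([1, 1, 1, 1])   # every sample in this batch is a forgery

genuine_mask = yforg == 0
genuine_targets = y[genuine_mask]  # empty: no genuine sample to classify

# Possible guard: only compute the classification term when the batch
# contains at least one genuine signature.
has_genuine = bool(genuine_mask.any())
print(genuine_targets.size, has_genuine)
```

With few users or a small final batch, such forgery-only batches become more likely, which would explain why the error appears only after some iterations.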