githubharald / ctcdecoder

Connectionist Temporal Classification (CTC) decoding algorithms: best path, beam search, lexicon search, prefix search, and token passing. Implemented in Python.

Home Page: https://towardsdatascience.com/3797e43a86c

License: MIT License

Python 91.37% C 8.63%
token-passing beam-search ctc language-model best-path prefix-search handwriting-recognition speech-recognition recurrent-neural-networks loss

ctcdecoder's Introduction

CTC Decoding Algorithms

Update 2021: installable Python package

Python implementation of some common Connectionist Temporal Classification (CTC) decoding algorithms. A minimalistic language model is provided.

Installation

  • Go to the root level of the repository
  • Execute pip install .
  • Go to tests/ and execute pytest to check that the installation worked

Usage

Basic usage

Here is a minimalistic executable example:

import numpy as np
from ctc_decoder import best_path, beam_search

mat = np.array([[0.4, 0, 0.6], [0.4, 0, 0.6]])
chars = 'ab'

print(f'Best path: "{best_path(mat, chars)}"')
print(f'Beam search: "{beam_search(mat, chars)}"')

The output mat (a numpy array with softmax already applied) of the CTC-trained neural network is expected to have shape TxC and is passed as the first argument to the decoders. T is the number of time-steps, and C is the number of characters (the CTC-blank is the last element). The characters that can be predicted by the neural network are passed as the chars string to the decoder. Decoders return the decoded string.
Running the code outputs:

Best path: ""
Beam search: "a"
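
Why do the two decoders disagree here? Best path picks the most probable class at each time-step (here the blank, twice), while beam search sums the probabilities of all alignments that collapse to the same text. A minimal sketch of that computation by hand, using the matrix from the example above:

```python
import numpy as np

mat = np.array([[0.4, 0, 0.6], [0.4, 0, 0.6]])  # TxC, blank at index 2

# Best path: argmax per time-step is the blank both times -> collapses to "".
p_empty = mat[0, 2] * mat[1, 2]  # only "--" collapses to "", p = 0.36

# Beam search sums over all alignments collapsing to "a": "a-", "-a", "aa".
p_a = mat[0, 0] * mat[1, 2] + mat[0, 2] * mat[1, 0] + mat[0, 0] * mat[1, 0]

# p_a is about 0.64: "a" is more probable than "" even though no single
# time-step favours it.
```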

To see more examples of how to use the decoders, have a look at the scripts in the tests/ folder.

Language model and BK-tree

Beam search can optionally integrate a character-level language model. Text statistics (bigrams) are used by beam search to improve reading accuracy.

from ctc_decoder import beam_search, LanguageModel

# create language model instance from a (large) text
lm = LanguageModel('this is some text', chars)

# and use it in the beam search decoder
res = beam_search(mat, chars, lm=lm)

The lexicon search decoder computes a first approximation with best path decoding. Then, it uses a BK-tree to retrieve similar words, scores them and finally returns the best scoring word. The BK-tree is created by providing a list of dictionary words. A tolerance parameter defines the maximum edit distance from the query word to the returned dictionary words.

from ctc_decoder import lexicon_search, BKTree

# create BK-tree from a list of words
bk_tree = BKTree(['words', 'from', 'a', 'dictionary'])

# and use the tree in the lexicon search
res = lexicon_search(mat, chars, bk_tree, tolerance=2)

Usage with deep learning frameworks

Some notes:

  • No adapter for TensorFlow or PyTorch is provided
  • Apply softmax already in the model
  • Convert to numpy array
  • Usually, the output of an RNN layer rnn_output has shape TxBxC, with B the batch dimension
    • Decoders work on single batch elements of shape TxC
    • Therefore, iterate over all batch elements and apply the decoder to each of them separately
    • Example: extract matrix of batch element 0 mat = rnn_output[:, 0, :]
  • The CTC-blank is expected to be the last element along the character dimension
    • TensorFlow has the CTC-blank as last element, so nothing to do here
    • PyTorch, however, has the CTC-blank as first element by default, so you have to move it to the end, or change the default setting
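
The last two points can be sketched as follows, with a dummy numpy array standing in for a PyTorch output (softmax applied, blank at index 0):

```python
import numpy as np

# Dummy stand-in for an RNN output: shape TxBxC, softmax applied,
# CTC-blank at index 0 (the PyTorch default).
T, B, C = 10, 4, 5
rnn_output = np.random.rand(T, B, C)
rnn_output /= rnn_output.sum(axis=2, keepdims=True)

# Move the blank from the first to the last position along the class axis.
blank_last = np.roll(rnn_output, shift=-1, axis=2)

# Decode each batch element separately; each mat has shape TxC.
for b in range(B):
    mat = blank_last[:, b, :]
    # res = best_path(mat, chars)  # decoder call as in the examples above
```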

List of provided decoders

Recommended decoders:

  • best_path: best path (or greedy) decoder, the fastest of all algorithms; however, other decoders often perform better
  • beam_search: beam search decoder, optionally integrates a character-level language model, can be tuned via the beam width parameter
  • lexicon_search: lexicon search decoder, returns the best scoring word from a dictionary

Other decoders, which in my experience are not well suited for practical purposes but might be useful for experiments or research:

  • prefix_search: prefix search decoder
  • token_passing: token passing algorithm
  • Best path decoder implementation in OpenCL (see extras/ folder)

This paper gives suggestions on when to use best path decoding, beam search decoding, and token passing.

Documentation of test cases and data

References

ctcdecoder's People

Contributors

a-sneddon, chwick, githubharald, thomasdelteil


ctcdecoder's Issues

beam search

Hi! I have a question. The result of best_path is as expected, but when I use beam_search the result is empty: no output, all blank. What is happening, and how can I fix it?

module 'pyopencl' has no attribute 'enqueue_write_buffer'

I am having a difficult time using the GPU. The code runs fine without the GPU, but with it I get this error:

=====Line example (GPU)=====
Traceback (most recent call last):
File "main.py", line 147, in
testLineExampleGPU()
File "main.py", line 122, in testLineExampleGPU
resBatch = BestPathCL.ctcBestPathCL(batch, classes, clWrapper)
File "/home/ubuntu/handwrite/CTCDecoder/src/BestPathCL.py", line 109, in ctcBestPathCL
labelStrBatch = clWrapper.compute(batch)
File "/home/ubuntu/handwrite/CTCDecoder/src/BestPathCL.py", line 84, in compute
cl.enqueue_write_buffer(self.queue, self.batchBuf, batch.astype(np.float32), is_blocking=False)
AttributeError: module 'pyopencl' has no attribute 'enqueue_write_buffer'

Best path decoding (negative values of logits)

Hello Sir,
I have tested the code. It is quite interesting. However, best path search on my own data produces output like this:
TARGET : "Le trois Janvier mil neuf cent soixante dix,"
BEST PATH : "|Je|Je|trois|fanmier|mil|neuf|cont|soitante|dex|.||Je|trois|fanmier|mil|neuf|cont|soitante|dex|.||"

I would be grateful if you could help me solve this problem.
The logits are generated by a linear layer (tf.layers.dense).
Values in the logits matrix look like this:

4.287187 -32.6091 -12.860022 -18.233511 -12.024508 -32.516006 -31.813993 -12.016912 -11.002839
7.621706 -39.008682 -17.869062 -20.652061 -18.614656 -38.91413 -39.586323 -17.21066 -14.14866
11.9552145 -41.16309 -21.07399 -20.94039 -23.124344 -41.045444 -39.15103 -22.799099 -15.869296
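
These values are raw logits, but as noted above the decoders expect softmax to have been applied. A minimal sketch of the conversion, using the first three columns of the matrix shown:

```python
import numpy as np

# First three columns of the logits shown above (raw, pre-softmax).
logits = np.array([
    [4.287187, -32.6091, -12.860022],
    [7.621706, -39.008682, -17.869062],
    [11.9552145, -41.16309, -21.07399],
])

def softmax(x):
    # Subtract the row-wise max for numerical stability.
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

mat = softmax(logits)  # rows now sum to 1, as the decoders expect
```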

In beamsearch.py, why is last.norm() only applied in the last step?

Great work, thanks to the author.

I have a question about beam search with a language model: why is last.norm() only applied in the last step, and not at every time-step? The longer the sequence, the smaller the LM score, so it should be compensated by length normalization. Shouldn't the score be normalized at every time-step? Thanks in advance.

About blankIdx

I have a question: how can I use a blankIdx value of zero?
Thank you so much in advance.

No module named 'editdistance'

I installed editdistance successfully from the terminal with:

pip install editdistance

Installing from requirements.txt gave the following error:

Could not find a version that satisfies the requirement pkg-resources==0.0.0 (from -r requirements.txt (line 5)) (from versions: )
No matching distribution found for pkg-resources==0.0.0 (from -r requirements.txt (line 5))

However, there was no error message about editdistance.

But running main.py still gives this error:

ModuleNotFoundError: No module named 'editdistance'

Handling duplicate paths in Beam Search

Hey! I am wondering if you could help me figure something out. In [1] you mentioned that summing up the probabilities for Pr, Pr+ and Pr- leads to better results. I tried it in my implementation based on [2], and the results do get better, but my probabilities become positive (I am working with log probabilities, so the values should be in (-inf, 0]). Did you experience this phenomenon while implementing the sum in your algorithm?

[1] Stackexchange CTC
[2] CTC implementation github

PS: Sorry if this is not the place to make this question, but I have no other way to reach you.
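
One common cause of log-probabilities drifting above 0 (a general sketch, not necessarily the bug here): summing probabilities in log space must be done with log-sum-exp, since adding the raw log values corresponds to multiplying the probabilities instead:

```python
import math

def logsumexp(a, b):
    """Stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

la, lb = math.log(0.3), math.log(0.2)

correct = logsumexp(la, lb)  # log(0.3 + 0.2) = log(0.5), still <= 0
wrong = la + lb              # log(0.3 * 0.2) = log(0.06): a product, not a sum
```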

Are BPE tokens supported?

My acoustic model outputs BPE tokens. Can I use lexicon search to get the correct word from BPE tokens? Or beam search with an n-gram LM?
Thanks!

beam_search with a beam_width=1

Hi!
Could you please explain why beam_search with a beam_width equal to 1 does not give the same result as best_path?

For example

import numpy as np
from ctc_decoder import best_path, beam_search

chars = 'ab'
mat = np.array([[0.8, 0, 0.2], [0.4, 0.0, 0.6], [0.8, 0, 0.2]])

print(f'Best path: "{best_path(mat, chars)}"')
print(f'Beam search: "{beam_search(mat, chars, beam_width=1)}"')

Gives:
Best path: "aa"
Beam search: "a"

Thanks!

tensor flow op

Great code!!
Can you please post the C++ implementation for TensorFlow that you mentioned?

thanks

beam_search.py doesn't support batch data

def ctcBeamSearch(mat, classes, lm, beamWidth=25):
    blankIdx = len(classes)
    maxT, maxC = mat.shape

The matrix only has two dimensions (length, char_size); there is no batch dimension.
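
A minimal wrapper (a sketch, not part of the library) that applies a single-sample decoder to every element of a TxBxC batch:

```python
import numpy as np

def decode_batch(batch_mat, decode_fn):
    """Apply a per-sample decoder to each element of a TxBxC batch.

    batch_mat: numpy array of shape TxBxC (softmax applied, blank last).
    decode_fn: callable taking a TxC matrix and returning a string.
    """
    return [decode_fn(batch_mat[:, b, :]) for b in range(batch_mat.shape[1])]

# Usage sketch with a dummy decoder standing in for beam_search:
batch = np.random.rand(10, 3, 4)
results = decode_batch(batch, lambda m: 'x' * m.shape[0])
```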

Support for K-Gram LM where k > 2?

I wanted to experiment with LMs other than the bigram. Any suggestions on how to extend the current codebase to a k-gram LM with k > 2, i.e. the probability of the last character conditioned on all previous characters?
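
A hypothetical sketch of the extension (count-based, no smoothing, all names invented for illustration): estimate P(c | previous k-1 characters) from raw counts, shown here for k = 3:

```python
from collections import defaultdict

def train_trigram(text):
    """Count character trigrams: (2-char context, next char) -> count."""
    counts = defaultdict(int)
    context_counts = defaultdict(int)
    for i in range(2, len(text)):
        ctx, c = text[i - 2:i], text[i]
        counts[(ctx, c)] += 1
        context_counts[ctx] += 1
    return counts, context_counts

def prob(counts, context_counts, ctx, c):
    """P(c | ctx) as a relative frequency; 0 for unseen contexts."""
    total = context_counts.get(ctx, 0)
    return counts.get((ctx, c), 0) / total if total else 0.0

counts, ctx_counts = train_trigram('this is some text')
print(prob(counts, ctx_counts, 'is', ' '))  # P(' ' | 'is') = 1.0 in this tiny corpus
```

A real k-gram model would need smoothing (e.g. add-one) to avoid zero probabilities for unseen contexts.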

Test custom image and word

Hello, I'm trying to test a custom word and image, but it doesn't work for me. Can you tell me how I can use a specific word and line for the test?

Difference between this repo and the CTC decoder in TensorFlow

Thanks for your code.
I tested my sequences with both the CTC decoder in TensorFlow and this repo, and I always get different results. TensorFlow is always right; this repo sometimes returns the right result and sometimes the wrong one.
Have you ever compared these two implementations?

CTC Token Passing

Hi!

I'm trying to use the token passing algorithm to decode a model trained on IAM-DB. I'm using a language model built from the LOB corpus; however, there are situations in which the word passed to the wordToLabelSeq method contains a character that is not mapped to any class, e.g. '>'. What do you advise in these situations?

Thanks in advance,
Dayvid.

Language model at word level

Hi, did you add a word-level language model for beam search?

Currently it's easy to add a character-level bigram, but I find it much harder to add a word-level model. I tried the CTC token passing algorithm, but it's just way too slow compared to beam search.

Different beam search output using different blankIdx value

Hi, I have a question regarding your beam search implementation.

In your ctcBeamSearch method, you set blankIdx to the length of the classes (in this case, the known letters and symbols). But some other beam search implementations set it to zero.

I tested this using your example, and the results indeed differ, both in the decoded string and in its distance from the ground truth (measured with CER and WER):

=====Line example (using blankIdx = len(classes))=====
TARGET                  : "the fake friend of the family, like the"
BEAM SEARCH             : "the fak friend of the fomcly hae tC" CER: CER/WER: 0.25714/0.15000
BEAM SEARCH LM          : "the fake friend of the family, lie th" CER: CER/WER: 0.05405/0.03226
=====Line example (using blankIdx=0)=====
TARGET                  : "the fake friend of the family, like the"
BEAM SEARCH             : "the faetker friend of ther foarmnacly,  harse. tHhC." CER: CER/WER: 0.33333/0.22368
BEAM SEARCH LM          : "the fake friend of the family, like the " CER: CER/WER: 0.00000/0.00000

So in which cases is blankIdx not zero? Which value is suitable for beam search decoding?
