Comments (3)
It's currently only possible to get the number of substitutions, deletions, and insertions with jiwer.compute_measures(). It should be a relatively straightforward PR to add the Levenshtein.editops() results to the returned dictionary, which would give you a more precise alignment, e.g. to get from the hypothesis to the ground truth, insert 'x' at index 2, delete 'y' at index 5, etc.
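To make the idea concrete, here is a minimal sketch of word-level edit operations. It is a pure-Python stand-in, not jiwer's code and not the Levenshtein library itself: the function name `word_editops` and the tuple layout `(op, truth_index, hyp_index)` are assumptions chosen for illustration, with operations read as turning the ground truth into the hypothesis.

```python
def word_editops(truth, hyp):
    """Return a minimal list of (op, truth_index, hyp_index) edits.

    Hypothetical helper: ops describe how to turn `truth` into `hyp`,
    where op is "replace", "delete" (truth word missing from hyp)
    or "insert" (extra word in hyp).
    """
    n, m = len(truth), len(hyp)
    # dist[i][j] = edit distance between truth[:i] and hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if truth[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # match or substitution
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
            )

    # Walk back from the bottom-right corner to recover one optimal path.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and truth[i - 1] == hyp[j - 1] \
                and dist[i][j] == dist[i - 1][j - 1]:
            i, j = i - 1, j - 1  # words match, no edit recorded
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            ops.append(("replace", i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("delete", i - 1, j))
            i -= 1
        else:
            ops.append(("insert", i, j - 1))
            j -= 1
    ops.reverse()
    return ops


print(word_editops("aa bb".split(), "11 aa aa".split()))
# [('insert', 0, 0), ('replace', 1, 2)]
```

When several alignments have the same minimal cost, this sketch returns only one of them, so its output may differ from what another implementation picks.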
@nikvaessen: I'm looking at measures.py.
I'm planning to generate a word alignment between ground_truth and hypothesis.
Should I add it to jiwer? Is there any code style or other guidance?
Hi @nikvaessen,
I have a version of the code that, when called like this:
ground_truth = ["XX hello world aa bb cc dd ee ff ", "aa bb", "aa bb cc dd", "bb dd ee"]
hypothesis = ["hello duck aa cc dd ee1 ff 11 22 33", "11 aa aa", "bb", "aa bb cc dd ee1 ff"]
error = local_wer(ground_truth, hypothesis, get_alignment=True)
print(f'error={error}\n')
words_alignment.validate()
words_alignment.collect_errors_information()
words_alignment.print_analysis()
generates the following output:
sentence 0, truth length: 9, hyp length 10
0 XX - - 'DEL'
1 hello 0 hello 'COR'
2 world 1 duck 'SUB'
3 aa 2 aa 'COR'
4 bb - - 'DEL'
5 cc 3 cc 'COR'
6 dd 4 dd 'COR'
7 ee 5 ee1 'SUB'
8 ff 6 ff 'COR'
- - 7 11 'INS'
- - 8 22 'INS'
- - 9 33 'INS'

sentence 1, truth length: 2, hyp length 3
- - 0 11 'INS'
0 aa 1 aa 'COR'
1 bb 2 aa 'SUB'

sentence 2, truth length: 4, hyp length 1
0 aa - - 'DEL'
1 bb 0 bb 'COR'
2 cc - - 'DEL'
3 dd - - 'DEL'

sentence 3, truth length: 3, hyp length 6
- - 0 aa 'INS'
0 bb 1 bb 'COR'
- - 2 cc 'INS'
1 dd 3 dd 'COR'
- - 4 ee1 'INS'
2 ee 5 ff 'SUB'

Detailed recognition performance analysis.

Truth words count.
bb 4
aa 3
dd 3
cc 2
ee 2
XX 1
hello 1
world 1
ff 1

Deletions words count.
XX 1
bb 1
aa 1
cc 1
dd 1

Insertions words count.
11 2
22 1
33 1
aa 1
cc 1
ee1 1

Substitutions words count.
True-word Hyp-word Count
world duck 1
ee ee1 1
bb aa 1
ee ff 1
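
For readers who want to reproduce this kind of per-word summary, the sketch below shows one possible way to aggregate alignment rows with `collections.Counter`. It is not the code behind the output above: the `(truth_word, hyp_word, tag)` row layout and the `summarize` helper are assumptions made for illustration, and the sample rows are copied from sentences 1 and 2 of the alignment listing.

```python
from collections import Counter

# Rows as (truth_word, hyp_word, tag); None marks a missing side.
# Copied from sentences 1 and 2 of the alignment listing.
rows = [
    (None, "11", "INS"),
    ("aa", "aa", "COR"),
    ("bb", "aa", "SUB"),
    ("aa", None, "DEL"),
    ("bb", "bb", "COR"),
    ("cc", None, "DEL"),
    ("dd", None, "DEL"),
]


def summarize(rows):
    """Hypothetical helper: count truth words and each error type."""
    truth_words = Counter(t for t, _, _ in rows if t is not None)
    deletions = Counter(t for t, _, tag in rows if tag == "DEL")
    insertions = Counter(h for _, h, tag in rows if tag == "INS")
    substitutions = Counter((t, h) for t, h, tag in rows if tag == "SUB")
    return truth_words, deletions, insertions, substitutions


truth_words, deletions, insertions, substitutions = summarize(rows)

# most_common() sorts by descending count, like the listing above.
for word, count in truth_words.most_common():
    print(word, count)
```

Keeping substitutions keyed by the (truth, hypothesis) pair preserves which word was confused with which, as in the "True-word Hyp-word Count" table.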
I can merge it into jiwer, with comments and help from you and the community.
Tzur