Comments (6)
Could you please be more specific about what exactly breaks, and when, and what functionality you would suggest changing or adding?
from gateplugin-stringannotation.
We cannot point the plugin to bundled resources because it expects a physical file, probably because of the binary lookup file that the plugin creates in order to run. For now we have taken the plugin out of our pipeline: building the project with Maven and then running the service is problematic, given that the plugin creates a file on the file system. An option to load the gazetteer normally, without this physical file dependency, would be great.
from gateplugin-stringannotation.
OK, so you mean some mechanism where the gazetteer list can be loaded from a JAR or some other URL?
The gazetteer builds a highly optimized trie data structure from the original lists, which takes a while. For this reason, the data structure is written to a .gazbin file as a cache. Without this cache, the optimization and compilation into the trie would have to be repeated every time the lists are loaded.
Could you give an example of how you would imagine specifying the gazetteer lists in a way that would be compatible with your deployment requirements?
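The build-once, reload-from-cache behavior described above can be illustrated with a minimal Python sketch. This is not the plugin's code: the nested-dict trie, the use of `pickle`, and the names `build_trie`, `contains`, `load_gazetteer`, and `cities.cache` are all assumptions made for illustration, with the cache file standing in for a .gazbin file.

```python
import os
import pickle
import tempfile


def build_trie(entries):
    """Compile gazetteer entries into a nested-dict trie (the slow step)."""
    root = {}
    for entry in entries:
        node = root
        for ch in entry:
            node = node.setdefault(ch, {})
        node[""] = True  # the empty-string key marks the end of an entry
    return root


def contains(trie, text):
    """Return True if text is a complete entry in the trie."""
    node = trie
    for ch in text:
        node = node.get(ch)
        if node is None:
            return False
    return "" in node


def load_gazetteer(entries, cache_path):
    """Load the trie from cache_path if it exists; otherwise build and cache it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f), True  # cache hit: no rebuild needed
    trie = build_trie(entries)
    with open(cache_path, "wb") as f:
        pickle.dump(trie, f)
    return trie, False  # cache miss: trie was built and written out


# Hypothetical cache location; it stands in for the plugin's .gazbin file.
cache_file = os.path.join(tempfile.mkdtemp(), "cities.cache")
trie_first, hit_first = load_gazetteer(["London", "Lisbon"], cache_file)
trie_second, hit_second = load_gazetteer(["London", "Lisbon"], cache_file)
```

The first call has to write to the file system, which is the step that is awkward when the lists are bundled inside a JAR; the second call skips the build entirely.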
from gateplugin-stringannotation.
The default gazetteers suffice, so if the optimized trie cannot be built without this cache, then please provide an option for a normal, unoptimized GATE-style lookup where resources can be loaded from the JAR.
By the way, see https://github.com/npgall/concurrent-trees for an implementation of efficient, thread-safe in-memory tries, in case you want to explore an alternative to writing to a cache.
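As a rough illustration of the request above, the following Python sketch reads a list straight out of an archive held in memory and builds a plain, unoptimized lookup set, without extracting or creating any file on disk. A JAR is a zip archive, so `zipfile` can stand in for it here; the function name `read_list_from_archive` and the resource path `gazetteer/cities.lst` are hypothetical, and none of this reflects how GATE or the plugin actually load resources.

```python
import io
import zipfile


def read_list_from_archive(archive_bytes, resource_name):
    """Read one gazetteer list from an in-memory zip/JAR into a set of entries."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        with archive.open(resource_name) as stream:
            text = stream.read().decode("utf-8")
    # One entry per line; blank lines are skipped.
    return {line.strip() for line in text.splitlines() if line.strip()}


# Build a tiny archive in memory to stand in for a packaged JAR.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("gazetteer/cities.lst", "London\nLisbon\n\n")

entries = read_list_from_archive(buffer.getvalue(), "gazetteer/cities.lst")
```

The trade-off is the one raised in the thread: nothing is cached, so the lookup structure is rebuilt on every load.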
from gateplugin-stringannotation.
The optimized trie created by the gazetteer PR is already thread-safe.
If you do not need the optimization (which increases both memory efficiency for huge gazetteer lists and lookup speed), then maybe the default gazetteer included in the standard GATE distribution is a better option?
There will be some kind of support for loading resources from a JAR eventually, but this will have to wait until after the first pre-release version of the next GATE version.
For this issue, I need a concrete description of what you propose should be changed or implemented, so that developers can decide whether to take it on and implement it.
from gateplugin-stringannotation.
No worries, we'll just use the default gazetteer. Thanks!
from gateplugin-stringannotation.
Related Issues (19)
- Remove dependency on virtual corpus
- Rename classes
- Rethink PR parameters
- Rethink case sensitivity
- Add parameter/setting to handle latin characters
- Better handling of list-specific features
- Distinguish between missing features and empty features
- Support TSV files as list files
- Add GUI option to re-build the cache
- Make it easier to use for non-exact matches
- Investigate if jaspell could be useful
- Add features that indicate offsets of feature-based matches
- Investigate approximate string matching
- Investigate collapsing multiple matches into one annotation
- Add benchmark information to the Wiki
- Read from URLs instead of files where possible
- Make sure version changes in the backend library are detected when loading gazbin files
- Mavenize