Comments (4)
The following suggestions come courtesy of Pete Smith:
ENTITY TYPE: Counsel (Lawyer...?)
ENTITY DESCRIPTION: Detects mentions of legal representatives
LEGAL TOPIC: General
EXAMPLE: Rumpole, H
ENTITY TYPE: Command paper
ENTITY DESCRIPTION: Detects mention of policy documents
LEGAL TOPIC: General
EXAMPLE: Students at the heart of the system Cm 8122
ENTITY TYPE: Book
ENTITY DESCRIPTION: Detects mention of legal treatise / academic work
LEGAL TOPIC: General
EXAMPLE: Halsbury's Laws of England (5th edition) Volume 99 Taxation Law (2018)
ENTITY TYPE: Treaty International Organisations
ENTITY DESCRIPTION: Detects mention of inter-state organisations
LEGAL TOPIC: General
EXAMPLE: United Nations
ENTITY TYPE: Private International Organisations
ENTITY DESCRIPTION: Detects mention of international organisations of a private nature, but not businesses
LEGAL TOPIC: General
EXAMPLE: FIFA, IBA
ENTITY TYPE: Government department
ENTITY DESCRIPTION: Detects mentions of government departments
LEGAL TOPIC: General
EXAMPLE: Ministry of Justice
from blackstone.
Bit of advice:
It seems very unlikely to me that a model you train will be able to tell the difference between Private International Organisations and Treaty International Organisations.
Consider that the model has literally zero knowledge about the world and is essentially operating on features extracted from the text only. As an example, IBA is a Private International Org, the IMF is a Treaty International Org and IBM is neither. Generally speaking, it is very difficult for a statistical model to distinguish these cases.
Similar comments apply to the difference between Lawyer and Judge, although I can imagine that they are often referred to with different titles etc., so maybe it is slightly more feasible.
In comparison, Book is a great example of a Named Entity which is likely to work well, because there are things common across mentions of books, such as references to pages, publishing dates, editions and consistent title capitalisation. I don't know enough about what a Command Paper is to know whether it is mentioned in a way that is distinct from a book.
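On the Command Paper question, the example given earlier in the thread ("Cm 8122") suggests one surface cue that books lack: a short series prefix followed by a running number. The sketch below is a rough regular expression for that cue, not a tested extractor. The list of prefixes (C, Cd, Cmd, Cmnd, Cm, CP) is an assumption about the UK series abbreviations, and the bare "C" prefix in particular is likely to produce false positives on real text.

```python
import re

# Assumed series prefixes for UK Command Papers; longer prefixes are listed first
# so that "Cmnd" is not cut short by the "Cm" or "C" alternatives.
COMMAND_PAPER = re.compile(r"\b(?:Cmnd|Cmd|Cm|Cd|CP|C)\.?\s?(\d{1,5})\b")


def find_command_papers(text: str) -> list:
    """Return every substring that looks like a Command Paper number."""
    return [match.group(0) for match in COMMAND_PAPER.finditer(text)]


if __name__ == "__main__":
    print(find_command_papers("Students at the heart of the system Cm 8122"))
    print(find_command_papers(
        "Halsbury's Laws of England (5th edition) Volume 99 Taxation Law (2018)"
    ))
```

If mentions in the corpus consistently carry such a number, a statistical model has a regular pattern to learn from, much as it does with editions and publication years for Book.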
Government department also seems like a reasonable NER label, I think.
from blackstone.
At some point the model is able to memorize things, and even if it has zero world knowledge, seeing enough data points is often good enough. For instance, it can learn that the word Treaty usually refers to a text and the word organization to an organization, and then learn how to use both of them.
Moreover, a pretrained language model (spaCy has its own) is a way to get world knowledge.
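The memorization argument can be sketched as a plain lookup built from labelled mentions. This is a deliberate simplification, assuming hypothetical label names and using only the examples given in this thread; a trained statistical model generalises in ways a dictionary does not. It shows both sides of the discussion: names seen in training are recovered, while an unseen name such as IBM gets no label.

```python
from collections import Counter, defaultdict

# Hypothetical label names; the mentions come from the examples in this thread.
EXAMPLES = [
    ("United Nations", "TREATY_INT_ORG"),
    ("IMF", "TREATY_INT_ORG"),
    ("FIFA", "PRIVATE_INT_ORG"),
    ("IBA", "PRIVATE_INT_ORG"),
]


def train_lookup(examples):
    """Memorize the most frequent label seen for each mention."""
    counts = defaultdict(Counter)
    for mention, label in examples:
        counts[mention.lower()][label] += 1
    return {mention: c.most_common(1)[0][0] for mention, c in counts.items()}


def predict(lookup, mention):
    """Return the memorized label, or None for a mention never seen in training."""
    return lookup.get(mention.lower())


if __name__ == "__main__":
    lookup = train_lookup(EXAMPLES)
    for name in ("IBA", "IMF", "IBM"):
        print(name, predict(lookup, name))
```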
from blackstone.
I would be pleased to volunteer
from blackstone.