Comments (2)
Hey! Thanks for this PR.
Can I use it with another transformer model, like this one: https://huggingface.co/Jean-Baptiste/camembert-ner ?
I was thinking about using a YAML conf file like this:
nlp_engine_name: transformers
models:
  -
    lang_code: fr
    model_name:
      spacy: fr_core_news_lg
      transformers: Jean-Baptiste/camembert-ner

ner_model_configuration:
  labels_to_ignore:
    - O
  aggregation_strategy: simple # "simple", "first", "average", "max"
  stride: 16
  alignment_mode: strict # "strict", "contract", "expand"
  model_to_presidio_entity_mapping:
    PER: PERSON
    LOC: LOCATION
    ORG: ORGANIZATION
    AGE: AGE
    ID: ID
    EMAIL: EMAIL
    PATIENT: PERSON
    STAFF: PERSON
    HOSP: ORGANIZATION
    PATORG: ORGANIZATION
    DATE: DATE_TIME
    PHONE: PHONE_NUMBER
    HCW: PERSON
    HOSPITAL: ORGANIZATION
  low_confidence_score_multiplier: 0.4
  low_score_entity_names:
    - ID
Can this work? Without your PR, the fr language never seems to be available.
Thanks.
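For readers following the config above, here is a minimal sketch of what the `ner_model_configuration` fields are meant to do to a single model prediction. It is an illustration in plain Python, not Presidio's actual implementation: the helper `to_presidio_entity` and the `NER_CONFIG` dict are hypothetical names, only a subset of the mapping is copied, and passing unmapped labels through unchanged is an assumption of this sketch.

```python
from typing import Optional, Tuple

# Values copied from the YAML above (subset of the mapping, for brevity).
NER_CONFIG = {
    "labels_to_ignore": ["O"],
    "model_to_presidio_entity_mapping": {
        "PER": "PERSON",
        "LOC": "LOCATION",
        "ORG": "ORGANIZATION",
        "DATE": "DATE_TIME",
        "ID": "ID",
    },
    "low_confidence_score_multiplier": 0.4,
    "low_score_entity_names": ["ID"],
}


def to_presidio_entity(
    label: str, score: float, config: dict = NER_CONFIG
) -> Optional[Tuple[str, float]]:
    """Hypothetical helper: turn one model prediction into (entity, score)."""
    # Labels listed under labels_to_ignore are dropped entirely.
    if label in config["labels_to_ignore"]:
        return None
    # Map the model's label to a Presidio entity name.
    # Assumption of this sketch: unmapped labels pass through unchanged.
    entity = config["model_to_presidio_entity_mapping"].get(label, label)
    # Entities flagged as low-confidence get their score scaled down.
    if entity in config["low_score_entity_names"]:
        score *= config["low_confidence_score_multiplier"]
    return entity, score
```

With this sketch, `to_presidio_entity("PER", 0.9)` yields `("PERSON", 0.9)`, a prediction labelled `O` is discarded, and an `ID` prediction keeps its name but has its score multiplied by 0.4.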
Hi @GautierT, are you looking to run this through a REST API?
If not, you can configure your model using the standard NlpEngineProvider logic; for an example, see this documentation.
If so, the only additional change needed is in app.py, to pass the NlpEngine into the AnalyzerEngine. Instead of this:
presidio/presidio-analyzer/app.py
Line 40 in 5bc4b67
Have this:
# Additional imports needed in app.py for this change
from presidio_analyzer import AnalyzerEngine
from presidio_analyzer.nlp_engine import NlpEngineProvider


class Server:
    """HTTP Server for calling Presidio Analyzer."""

    def __init__(self):
        fileConfig(Path(Path(__file__).parent, LOGGING_CONF_FILE))
        self.logger = logging.getLogger("presidio-analyzer")
        self.logger.setLevel(os.environ.get("LOG_LEVEL", self.logger.level))
        self.app = Flask(__name__)
        self.logger.info("Starting analyzer engine")
        # PATH_TO_CONF is a placeholder for the path to your YAML conf file
        provider = NlpEngineProvider(conf_file=PATH_TO_CONF)
        nlp_engine = provider.create_engine()
        self.engine = AnalyzerEngine(nlp_engine=nlp_engine, supported_languages=["fr"])
        self.logger.info(WELCOME_MESSAGE)
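Once the server is started with an engine that supports fr, a client has to ask for that language explicitly. The following is a rough sketch of such a request using only the standard library. The URL in `ANALYZER_URL` is an assumption (adjust host and port to your deployment), and `build_analyze_request` and `analyze` are hypothetical helper names, not part of Presidio.

```python
import json
import urllib.request

# Assumption: the analyzer service is reachable here; adjust to your deployment.
ANALYZER_URL = "http://localhost:5002/analyze"


def build_analyze_request(
    text: str, language: str = "fr", url: str = ANALYZER_URL
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /analyze endpoint."""
    payload = json.dumps({"text": text, "language": language}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(text: str, language: str = "fr"):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_analyze_request(text, language)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `analyze("Je m'appelle Jean et j'habite à Paris.")` against a running server should return the detected entities, provided the server was built with `supported_languages=["fr"]` as in the snippet above.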