Comments (7)
@surajsonee can you please elaborate on how you did step 3? Was the YAML loaded and were the recognizers added? If you print the list of recognizers, are the ones from the YAML listed there?
from presidio.
sure!
Here is the yaml file which I'm using:
https://github.com/microsoft/presidio/blob/main/presidio-analyzer/conf/example_recognizers.yaml
- Access the Presidio Container: Use the docker exec command to open a shell in the running Presidio container. For example:
  sudo docker exec -it <container_id> bash
- Navigate to Configuration Directory: Inside the container, navigate to the directory where Presidio's configuration files are stored. This is typically the /presidio/config/ directory.
- Check Mounted YAML File: Verify that the custom entity YAML file is correctly mounted in the container's configuration directory. Use the ls command to list the files in the directory.
- Access Presidio Configuration Directory: First, navigate to the directory where Presidio's configuration files are stored, which is the /presidio/config/ directory within the Presidio container.
- Review Configuration Files: Look for configuration files or scripts that are used to initialize Presidio's analyzer engine. These files often have names like config.yaml or similar.
  Content of config.yaml:
  custom_entities:
    yaml_path: /presidio/config/example_recognizers.yaml
- Inspect Configuration Content: Open the configuration file using a text editor or command-line tools like cat or less. Look for sections or properties related to loading custom entity rules or YAML files.
- Add Recognizers to Registry: Add the created recognizers to Presidio's recognizer registry.
Here's a Python example demonstrating how to add recognizers after loading the YAML:
from presidio_analyzer import Pattern, PatternRecognizer, AnalyzerEngine
import yaml

# Load custom entity rules from YAML file
with open('example_recognizers.yaml', 'r') as yaml_file:
    custom_entity_rules = yaml.safe_load(yaml_file)

# Create an instance of AnalyzerEngine
analyzer = AnalyzerEngine()

# Iterate over each entity in the custom entity rules
for entity_name, entity_config in custom_entity_rules.items():
    patterns = entity_config.get('patterns', [])
    # Create a recognizer for each pattern defined for the entity
    for pattern_config in patterns:
        name = pattern_config.get('name')
        regex = pattern_config.get('regex')
        score = pattern_config.get('score', 0.8)  # Default score
        # Create a Pattern object
        pattern = Pattern(name=name, regex=regex, score=score)
        # Create a PatternRecognizer with the Pattern object
        recognizer = PatternRecognizer(supported_entity=entity_name, patterns=[pattern])
        # Add the recognizer to Presidio's recognizer registry
        analyzer.registry.add_recognizer(recognizer)
In the example above, example_recognizers.yaml is the YAML file containing the custom entity rules. The script reads this file, extracts the entity names and patterns, creates recognizers based on the extracted information, and adds them to Presidio's recognizer registry.
Please let me know what I'm doing wrong.
Thank you!
from presidio.
Hi, I'm not sure what's wrong, as you seem to add the recognizers the right way. Could it be that patterns are always empty?
BTW we have a method for adding recognizers from YAML: https://microsoft.github.io/presidio/analyzer/adding_recognizers/#reading-pattern-recognizers-from-yaml
Perhaps try to see if it makes any difference.
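One way to check whether patterns comes back empty is to look at the shape of the parsed YAML. Below is a minimal sketch using only plain Python. It assumes the file has a top-level recognizers list, as the linked example file appears to. The dict stands in for the output of yaml.safe_load, its contents are illustrative, and extract_patterns is a hypothetical helper, not part of Presidio.

```python
# Stand-in for the result of yaml.safe_load(...). Assumed shape: a top-level
# "recognizers" key holding a list of recognizer definitions (illustrative data).
parsed = {
    "recognizers": [
        {
            "name": "Zip code Recognizer",
            "supported_entity": "ZIP",
            "patterns": [
                {"name": "zip code (weak)", "regex": r"\b\d{5}\b", "score": 0.01}
            ],
        }
    ]
}


def extract_patterns(parsed_yaml):
    """Hypothetical helper: return (entity, pattern_name, regex, score) tuples."""
    rows = []
    for rec in parsed_yaml.get("recognizers", []):
        entity = rec.get("supported_entity")
        for pat in rec.get("patterns", []):
            rows.append(
                (entity, pat.get("name"), pat.get("regex"), pat.get("score", 0.8))
            )
    return rows


# Iterating .items() on the top level yields the single key "recognizers" with a
# list as its value, not one (entity_name, config) pair per entity.
top_level = [(key, type(value).__name__) for key, value in parsed.items()]
print(top_level)

rows = extract_patterns(parsed)
print(rows)
```

If the real file has this shape, entity_config in the loop from the question would be a list, so calling .get on it would not work, and the patterns would have to be read from each item of the recognizers list instead.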
from presidio.
Thank you for the reference!
Could you please provide guidance on which files require modification to establish custom entity rules?
from presidio.
Sure. If you change the default configuration in app.py (presidio/presidio-analyzer/app.py, line 40 in dee6562) to something more similar to the tutorial:
yaml_file = "recognizers.yaml"
registry = RecognizerRegistry()
registry.load_predefined_recognizers()
registry.add_recognizers_from_yaml(yaml_file)
self.engine = AnalyzerEngine(registry=registry)
You should be able to load the YAML-based recognizers into the analyzer engine, and these would then be used in each call.
from presidio.