wildfish / django-gdpr-assist
Tools to help manage user data in the age of GDPR
License: Other
I have an old codebase which still uses the built-in Django User model as a base.
If I add the following code to (actually, where should this code go?!) a models.py somewhere:-
class UserPrivacyMeta:
    fields = ['first_name', 'last_name', 'email']

gdpr_assist.register(User, UserPrivacyMeta)
then makemigrations creates a new migration inside the django package in my virtualenv. That directory isn't part of the git repo, so the migration is never committed, and it has no effect on the live system (unless I run makemigrations on there too).
This all seems a bit wrong somehow to me. Am I doing something wrong? Is there another way to do this?
During testing of bulk anonymisation, there seem to be a few areas where performance can be optimized (although there may be correctness / auditing tradeoffs for some of these).
I'll try to provide some supporting statistics on each of these soon - but as a rough preface, I've been aiming to bring a ~12-hour estimated bulk anonymisation down to less than 3 hours (and ideally reduce it further than that).
Modifications applied so far towards this goal have included:
- Passing for_bulk=True as an argument to the anonymise method (nb: reduces audit logging)
- Adding a force=True argument to the anonymise method and flipping the order of the self.is_anonymised() and not force conditionals, so that no DB exists() query is made when force mode is enabled (nb: does this risk introducing incorrect/circular anonymisation?)
- Optimising the __getattr__ implementation by using dictionary lookups rather than list iterations to retrieve anonymisers (nb: no evidence of improvements here, yet)

In Django 2.1.2 and Python 3.5, in a fresh installation, I have overridden the User model:
$ ./manage.py startapp community
$ cat community/models.py
from django.contrib.auth.models import AbstractUser
from django.utils.translation import gettext_lazy as _
class Person(AbstractUser):
    def __str__(self):
        return self.username
I add the code:
class PrivacyMeta:
    fields = ['email', 'first_name', 'last_name']
Navigating to the admin returns the error:
Exception Type: ProgrammingError at /admin/logout/
Exception Value: column community_person.anonymised does not exist
LINE 1: ...n"."is_active", "community_person"."date_joined", "community...
Trying to make migrations I receive:
Traceback (most recent call last):
  File "./manage.py", line 15, in <module>
    execute_from_command_line(sys.argv)
  File "/home/user/.virtualenvs/my-env/lib/python3.5/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/home/user/.virtualenvs/my-env/lib/python3.5/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/home/user/.virtualenvs/my-env/lib/python3.5/site-packages/django/core/management/base.py", line 316, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/home/user/.virtualenvs/my-env/lib/python3.5/site-packages/django/core/management/base.py", line 353, in execute
    output = self.handle(*args, **options)
  File "/home/user/.virtualenvs/my-env/lib/python3.5/site-packages/django/core/management/base.py", line 83, in wrapped
    res = handle_func(*args, **kwargs)
  File "/home/user/.virtualenvs/my-env/lib/python3.5/site-packages/django/core/management/commands/makemigrations.py", line 170, in handle
    migration_name=self.migration_name,
  File "/home/user/.virtualenvs/my-env/lib/python3.5/site-packages/django/db/migrations/autodetector.py", line 44, in changes
    changes = self._detect_changes(convert_apps, graph)
  File "/home/user/.virtualenvs/my-env/lib/python3.5/site-packages/django/db/migrations/autodetector.py", line 129, in _detect_changes
    self.new_apps = self.to_state.apps
  File "/home/user/.virtualenvs/my-env/lib/python3.5/site-packages/django/utils/functional.py", line 37, in __get__
    res = instance.__dict__[self.name] = self.func(instance)
  File "/home/user/.virtualenvs/my-env/lib/python3.5/site-packages/django/db/migrations/state.py", line 210, in apps
    return StateApps(self.real_apps, self.models)
  File "/home/user/.virtualenvs/my-env/lib/python3.5/site-packages/django/db/migrations/state.py", line 271, in __init__
    self.render_multiple(list(models.values()) + self.real_models)
  File "/home/user/.virtualenvs/my-env/lib/python3.5/site-packages/django/db/migrations/state.py", line 306, in render_multiple
    model.render(self)
  File "/home/user/.virtualenvs/my-env/lib/python3.5/site-packages/django/db/migrations/state.py", line 572, in render
    body.update(self.construct_managers())
  File "/home/user/.virtualenvs/my-env/lib/python3.5/site-packages/django/db/migrations/state.py", line 531, in construct_managers
    as_manager, manager_path, qs_path, args, kwargs = manager.deconstruct()
  File "/home/user/.virtualenvs/my-env/lib/python3.5/site-packages/django/db/models/manager.py", line 65, in deconstruct
    % (name, module_name)
ValueError: Could not find manager CastPrivacyUserManager in gdpr_assist.models.
Please note that you need to inherit from managers you dynamically generated with 'from_queryset()'.
FYI, the docs specify the method to manually register a model as:
gdpr_assist.register_model(User, UserPrivacyMeta)
However, it seems the correct way is with:
gdpr_assist.registry.register(User, UserPrivacyMeta)
Is that the correct way, or am I missing something?
Sometimes we don't seem to be getting all the search results we should
Hello!
Unfortunately django-gdpr-assist does not play nicely with models that have a UUID primary key.
The resulting issue is that the library attempts to parse the pk as an integer, so the database commit fails because the "integer" (the cast UUID) is too large.
The fix is here: #60
However, no one has merged it. Can we get it in ASAP? This isn't really usable as a package in production without it.
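To make the size problem concrete, here is a minimal standard-library sketch. It assumes (this is not confirmed from the library source) that the primary key ends up coerced to an integer and written to a signed 64-bit column. The UUID value and the BIGINT_MAX name are made up for illustration.

```python
import uuid

# Largest value a signed 64-bit integer column can hold (assumed column type).
BIGINT_MAX = 2**63 - 1

# Hypothetical primary key value, for illustration only.
pk = uuid.UUID("12345678-1234-5678-1234-567812345678")

# A UUID is a 128-bit value, so its integer form can need far more than 64 bits.
pk_as_int = pk.int

# Check whether the integer form would fit in the assumed column.
fits_in_bigint = pk_as_int <= BIGINT_MAX
print(pk_as_int.bit_length(), fits_in_bigint)
```

If this is indeed what happens, keeping the pk as a string (or UUID) in the log rather than casting it would avoid the overflow.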
I gather this is a regression. Tested with latest master against django 3.1 (and also 3.2)
from django.contrib.auth.models import User
import gdpr_assist
class UserPrivacyMeta:
    fields = ['username', 'email']

gdpr_assist.register(User, UserPrivacyMeta)
then:
$ manage.py makemigrations --dry-run
Waiting for database connection...
Migrations for 'auth':
/opt/hunter2/venv/lib/python3.8/site-packages/django/contrib/auth/migrations/0013_alter_user_managers.py
- Change managers on user
This of course relates to #6, but this is with the built-in user model, not a custom one, so no migrations are appropriate in this situation. I don't understand how this wasn't picked up after fixing #5, though.
Can I suggest, as a first step, adding a test in which a third-party model like User is registered, and which checks that no migrations are created in that scenario?
Django keeps on adding an extra migration for setting the user manager for a User class which inherits from AbstractUser.
In the examples below I work on MySQL (5.7.23), use Python 3.7.3 and mysqlclient==1.3.13.
INSTALLED_APPS = [
    # ...
    "gdpr_assist",
]
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        # ...
    },
    "gdpr_log": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "gdpr_log",
        "HOST": "127.0.0.1",
        "USER": "change_it",
        "PASSWORD": "change_it",
    },
}
DATABASE_ROUTERS = ["gdpr_assist.routers.EventLogRouter"]
python manage.py migrate
python manage.py migrate --database=gdpr_log
and:
python manage.py makemigrations
gives me 'No changes detected' - so far so good.
I have a User class which extends AbstractUser. I'm trying to make private only the fields defined on AbstractUser, not my own on User.
class User(AbstractUser):
    some_new_field_not_important_actually = models.BooleanField(default=False)

class UserPrivacyMeta:
    fields = [
        "first_name",
        "last_name",
    ]

from gdpr_assist import register
register(User, UserPrivacyMeta)
python manage.py makemigrations
gives me:
Migrations for 'users':
project/users/migrations/0058_auto_20190716_1308.py
- Change managers on user
- Add field anonymised to user
The interesting thing for me to note was that the manager seems to have changed.
python manage.py migrate
python manage.py migrate --database=gdpr_log
and again - migrations have been applied in both cases, looks good:
Applying users.0058_auto_20190716_1308... OK
python scripts/manage.py makemigrations
I get:
sym_poc/users/migrations/0059_auto_20190716_1332.py
- Change managers on user
WHY? The change that Django is trying to make relates only to changing the manager, and not to changing the anonymised field (which has already been added to the database).
It looks like the post_anonymise example is just a copy of the pre_anonymise example:
We have a use case where a Django application manages some API and session tokens, and we'd like to remove them from the anonymised (manage.py anonymise_db) version of the database. Replacing them with mock data doesn't seem to make sense, especially for the session tokens.
There are ways to do this outside of the library and Django: we could, for example, perform post-anonymisation SQL commands to truncate the relevant tables, and/or exclude the API token tables from the application database backup/restore processes. The main benefits to an in-library solution would be convenience and consistency (one command and one layer of configuration + PrivacyMeta to manage bulk data migration from pristine to anonymised).
Has django-gdpr-assist considered adding support to clear the contents of model tables during the manage.py anonymise_db step, and/or does this seem to make sense as a context and feature request?
Are there any plans to add support for Django 3.2?
Are there known issues?
After trying to create a simple unit test to validate anonymisation, I ran into the following issues:
According to the documentation, the object should have an anonymised property/field, but that doesn't seem to be the case.
https://django-gdpr-assist.readthedocs.io/en/latest/anonymising.html
After some debugging I found the obj.is_anonymised() method, which I guess is a replacement? In either case, calling obj.anonymise() followed directly by obj.is_anonymised() yields False.
If I requery the model, it yields True as expected.
I am running Python 3.6.
My unit test looks roughly like this:
def test_anonymise(self):
    address = Address(name=...)
    address.save()
    self.assertEqual(address.is_anonymised(), False)
    address.anonymise()
    self.assertEqual(address.is_anonymised(), True)
Thx for the great library. really appreciate it guys :)
What happens with fields which have a unique constraint, or which cannot be empty?
There are sometimes models which contain PII so need to be exported, but anonymising them doesn't make sense - mailing lists for example.
We could link the post-anonymise event to delete those objects, but that sounds like it could be unexpected and dangerous.
Let's look at adding a can_anonymise=True flag to the PrivacyMeta class; if it's True we could either:
We should also make can_anonymise=False models not get the anonymised field option, since it looks odd in an AdminModel which shows all fields.
Hi. Great project you guys have here. Are there any plans for adding support for Django 3.0 to this project?
Hi. After updating to version 1.4.0 I started receiving this error when starting django:
RuntimeError: Registered gdpr_assist model Users manager specified 'use_in_migrations=True', with no name provided.
Is there a guide on what changes need to be done in order to upgrade to version 1.4.0?
During testing, we've seen some high memory usage in the library when it is applied to models that have a large number of object records stored in the database.
This has been traced to the model.objects.all().anonymise() call in anonymise_db, which uses a queryset but appears to cache a significant amount of query metadata before anonymisation of the first object takes place.
Since we have once-only usage semantics for the model objects in the anonymise_db use case, we could use a Django queryset iterator to reduce memory consumption.
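The iterator idea could look roughly like the sketch below. QuerySet.iterator() is standard Django and obj.anonymise() is the per-object method mentioned elsewhere in these issues. The anonymise_streaming helper and the two Fake stand-ins are hypothetical and exist only so the snippet runs without a Django project. Whether this removes the memory growth reported above would still need to be measured.

```python
def anonymise_streaming(queryset, chunk_size=2000):
    """Anonymise every object once, streaming rows instead of caching them.

    `queryset` is expected to behave like a Django QuerySet, where iterating
    through .iterator() does not fill the queryset's result cache.
    """
    processed = 0
    for obj in queryset.iterator(chunk_size=chunk_size):
        obj.anonymise()
        processed += 1
    return processed


# Hypothetical stand-ins so the sketch runs without Django.
class FakeRecord:
    def __init__(self):
        self.anonymised = False

    def anonymise(self):
        self.anonymised = True


class FakeQuerySet:
    def __init__(self, records):
        self._records = records

    def iterator(self, chunk_size=2000):
        # Yield one record at a time, the way a real iterator() streams rows.
        yield from self._records


records = [FakeRecord() for _ in range(5)]
processed = anonymise_streaming(FakeQuerySet(records))
```

With a real model this would be called as anonymise_streaming(model.objects.all()). Note that it calls anonymise() per object, so any bulk-specific behaviour of the queryset-level anonymise() would need to be checked separately.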
It seems like this product does not work with inheritance.
When I run the code below (just running makemigrations command) I get the following error:
TypeError: Cannot create a consistent method resolution
order (MRO) for bases PrivacyModel, ModelA
from django.db import models

class ModelA(models.Model):
    a = models.CharField(max_length=50, default="Hello")

    class PrivacyMeta:
        fields = ["a"]

class ModelB(ModelA):
    b = models.BooleanField(null=True, default=True)

    class PrivacyMeta:
        fields = ["b"]
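The same TypeError can be reproduced in plain Python, which may help narrow the problem down. The sketch assumes (not verified against the library source) that registration adds PrivacyModel to each model's bases, so that ModelB ends up with PrivacyModel listed before ModelA, which already inherits from it. The classes here are bare stand-ins, not the real Django or gdpr_assist classes, and build_model_b is a hypothetical helper.

```python
class PrivacyModel:
    """Stand-in for the mixin named in the error message."""


class ModelA(PrivacyModel):
    """Stand-in for a parent model that already has the mixin."""


def build_model_b(bases):
    """Try to create ModelB with the given bases; return the class or the error text."""
    try:
        return type("ModelB", bases, {})
    except TypeError as exc:
        return str(exc)


# A base listed before its own subclass: Python cannot linearise this.
conflict = build_model_b((PrivacyModel, ModelA))

# Subclass first: the MRO is ModelB, ModelA, PrivacyModel, object.
ok = build_model_b((ModelA, PrivacyModel))
```

If that assumption holds, skipping the mixin for a child whose parent is already registered, or adding it after the parent in the bases, would avoid the conflict.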