
identity-anonymization-tool's Introduction

Tool for removing or replacing all identifiers matching given criteria

How to Build

Make sure you have JDK 8 installed (JDK 11 is not supported)

mvn clean install
mvn package

How to run

cd components/org.wso2.carbon.privacy.forgetme.tool
cd target/dist/bin
./forget-me -U <userName> [-D domainName] [-TID tenantId]

For more information, please refer to the help document

Components

How to debug the tool

To debug the tool remotely do the following.

####Linux:

Execute the following commands in the shell in which the tool is running.

  • JAVA_OPTS="-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=5005,suspend=y"
  • export JAVA_OPTS
  • ./forgetme.sh <arguments>

Use your IDE's remote debugging feature to connect to port 5005.

####Windows

identity-anonymization-tool's People

Contributors

ashensw, bhagyasakalanka, chanikaruchini, darshanasbg, dependabot-support, dewnimw, dmhp, emswbandara, geethkokila, janakamarasena, jkaushalya, lasanthas, madurangasiriwardena, maheshika, malithie, megala21, nandika, nuwandiw, omindu, pasant9, piraveena, rushmin, sachiniwettasinghe, sajithshn, senthalan, tharikagithub, thumimku, vihanga-liyanage, wso2-jenkins-bot, yasara-y


identity-anonymization-tool's Issues

By default configurations should exist to delete PII data from all occurrences

Description:
By default configurations should exist to delete PII data from all occurrences

For example: delete all occurrences from all the log files (wso2carbon.log, audit.log etc.)

Current configuration:

"directories": [
    {
      "dir": "log-config",
      "type": "log-file",
      "processor" : "log-file",
      "log-file-path" : "xx",
      "log-file-name-regex" : "wso2carbon.log"
    }

Suggested Labels:
Improvement

Suggested Assignees:
RuwanA

Affected Product Version:
IS 550

Document the report generated after tool execution

Description:
Document the report generated after tool execution.

Suggested Labels:
N/A

Suggested Assignees:
N/A
Affected Product Version:
N/A

OS, DB, other environment details and versions:
N/A

Steps to reproduce:
N/A

Related Issues:
N/A

[Doc] Mention the tool is packaged in other WSO2 products and not only in IS

Description:
The doc below contains the following note:
https://docs.wso2.com/display/ADMIN44x/Removing+References+to+Deleted+User+Identities+in+WSO2+Products#RemovingReferencestoDeletedUserIdentitiesinWSO2Products-Configuringtheconfig.jsonfile

If you want to remove references to deleted user identities in WSO2 Identity Server (WSO2 IS), you do not need to build the Identity Anonymization tool. This is because the tool is packaged with WSO2 IS by default. For instructions on how to run the Identity Anonymization tool with WSO2 IS, see Removing References to Deleted User Identities in the WSO2 IS documentation.

Issue
This tool is packaged not only in IS but in other products as well, so the documentation needs to mention this.

Suggested Labels:
Doc, Bug

Adding SHA256 hashing functionality in generating the pseudonym

Description:
When anonymizing PII, there may be cases where the anonymized user still needs to be uniquely identified among other users (both anonymized and non-anonymized). In such a scenario, replacing the PII with a randomly generated UUID value is not suitable.
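
A minimal sketch of the idea, not the tool's actual implementation: a deterministic pseudonym can be derived by hashing the username with SHA-256, so the same user always maps to the same value. The class name `Sha256Pseudonym`, the method `pseudonymFor`, and the use of a salt are assumptions made for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the tool: derives a stable pseudonym from a username.
final class Sha256Pseudonym {

    // Returns the lowercase hex SHA-256 digest of salt + ":" + username.
    // The salt is an assumption; it makes the pseudonym harder to guess from a known username.
    static String pseudonymFor(String username, String salt) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest((salt + ":" + username).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The same input always yields the same 64-character pseudonym.
        System.out.println(pseudonymFor("peter", "example-salt"));
    }
}
```

Unlike a random UUID, the result is repeatable across runs, which is what allows the anonymized user to be correlated across data stores.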

Suggested Labels:
Type/Enhancement

Suggested Assignees:
Kasun Siyambalapitiya

Affected Product Version:
v1.1.19

OS, DB, other environment details and versions:
N/A
Steps to reproduce:
N/A

Related Issues:
N/A

Forget Me tool is not working in Windows

Description:
Tool is not running in Windows

Suggested Labels:
Type/Bug
Severity/Blocker
Priority/Highest
Affected/APIM update 14

Affected Product Version:
WSO2 APIM update 14

OS, DB, other environment details and versions:
OS-Mac high sierra
DB-H2
JDK-1.8.0_42

Steps to reproduce:
Run the forget me tool for deleted user.
attached a screenshot of config.json

screen shot 2018-03-14 at 8 14 59 pm

`C:\Users\Administrator\Desktop\APIMUpdate14\wso2am-2.1.0-update14\bin>rem ----- Only set CARBON_HOME if not already set ----------------------------

C:\Users\Administrator\Desktop\APIMUpdate14\wso2am-2.1.0-update14\bin>setlocal enabledelayedexpansion

C:\Users\Administrator\Desktop\APIMUpdate14\wso2am-2.1.0-update14\bin>rem C:\Users\ADMINI1\Desktop\APIMUP1\WSO2AM~1.0-U\bin\ is expanded pathname of the current script under NT with spaces in the path removed

C:\Users\Administrator\Desktop\APIMUpdate14\wso2am-2.1.0-update14\bin>if "C:\Users\ADMINI1\Desktop\APIMUP1\WSO2AM1.0-U\bin.." == "" set CARBON_HOME=C:\Users\ADMINI1\Desktop\APIMUP1\WSO2AM1.0-U\bin..

C:\Users\Administrator\Desktop\APIMUpdate14\wso2am-2.1.0-update14\bin>SET curDrive=C

C:\Users\Administrator\Desktop\APIMUpdate14\wso2am-2.1.0-update14\bin>SET wsasDrive=C

C:\Users\Administrator\Desktop\APIMUpdate14\wso2am-2.1.0-update14\bin>if not "C" == "C" C:

C:\Users\Administrator\Desktop\APIMUpdate14\wso2am-2.1.0-update14\bin>cd C:\Users\ADMINI1\Desktop\APIMUP1\WSO2AM~1.0-U\bin..`

Improve patterns to anonymize user management audit logs

Description:
With the improved audit logs, we need to improve the patterns to anonymize those audit logs

Suggested Labels:
Improvement

Suggested Assignees:
N/A

Affected Product Version: WUM updated products and latest versions.

OS, DB, other environment details and versions: Ubuntu 16.04

Steps to reproduce:
N/A

Related Issues:
wso2/product-is#2975

Better to have some message while running the forgetme tool

Description:
Need some message to indicate tool is still running.

Suggested Labels:
Type/Improvement
Severity/Critical
Priority/High
Affected/APIM 2.1.0 update 13

Affected Product Version:
WSO2 API Manager update 13

OS, DB, other environment details and versions:
OS-Mac high sierra
DB-H2
Fresh APIM 2.2.0 update 13

Steps to reproduce:
When running the forget me tool, it takes a considerable amount of time to print the completion message.
In the meantime, there is no indication of whether the tool is still running or has completed the task.
Therefore, it would be better to have some indication while the tool is processing.

[Query] Do we have to configure data sources again in this tool?

Description:
In the datasources directory of the IS server, we configure the different datasources we use. We have to configure the same datasources again under the datasources directory of this tool. This is too much configuration and duplicates the same settings.

Therefore, can't we have an option to read the data sources from the IS directory without configuring it again here?

Suggested Labels:
Query, Improvement

Affected Product Version:
IS 550

Error in sp-patterns.xml and das-patterns.xml

Description:
The sp-patterns.xml and das-patterns.xml files have invalid placeholders, which cause an error when running the standalone version of the tool.

Suggested Labels:
N/A

Suggested Assignees:
N/A

Affected Product Version:
1.1.4

OS, DB, other environment details and versions:
N/A

Steps to reproduce:
N/A

Related Issues:
N/A

Have separate configs for Windows

Description:
When running forgetme.bat on Windows, we need to change the paths to the logs and datasources in the config.json file. We cannot run the script without changing these values. On Linux, on the other hand, we can run it with the default values in the config.json file, which is easy.

Can we support running the forgetme script with minimal configuration on other OSs as well?
(The suggested implementation would be similar to the wso2carbon script, which runs on both OSs without configuration changes.)

Suggested Labels:
Improvement

Running the forget me tool for non-existing users in the system does not show any error messages.

Description:
When we run the forget me tool for non-existing users, it does not show any error message.

Suggested Labels:
Type/Bug
Severity/Critical
Priority/High
Affected/APIM 2.1.0 update 13

Affected Product Version:
APIM 2.1.0 update 13

OS, DB, other environment details and versions:
OS-Mac high sierra
DB-H2
Fresh APIM 2.2.0 update 13

Steps to reproduce:
Ran the forget me tool for a user that does not exist in the system.

Results:
The tool reports that the execution was successful, and the following messages can be seen in the backend.
The tool does not finish executing until the process is killed.

2018-03-09 17:29:18 INFO ForgetMeTool:167 - Generating pseudonym as pseudo name is not provided : 69d5999a-aedf-4047-808d-0e061f67fb3d
2018-03-09 17:29:19 INFO HikariDataSource:72 - HikariPool-0 - is starting.
2018-03-09 17:29:21 INFO HikariDataSource:72 - HikariPool-1 - is starting.
2018-03-09 17:29:23 INFO HikariDataSource:72 - HikariPool-2 - is starting.
2018-03-09 17:29:26 INFO HikariDataSource:72 - HikariPool-3 - is starting.
2018-03-09 17:29:28 INFO HikariDataSource:72 - HikariPool-4 - is starting.
2018-03-09 17:29:32 INFO ForgetMeExecutionEngine:219 - Processor execution completed. Processor : rdbms

Null check for Datasource before logging its name

Description:
https://github.com/wso2/identity-anonymization-tool/blob/master/components/org.wso2.carbon.privacy.forgetme.sql/src/main/java/org/wso2/carbon/privacy/forgetme/sql/config/DataSourceConfig.java#L70

Above method getDatasource() may return null.

Callers of this method can be found in org.wso2.carbon.privacy.forgetme.sql.config package (module directory)

Ex:
https://github.com/wso2/identity-anonymization-tool/blob/master/components/org.wso2.carbon.privacy.forgetme.sql/src/main/java/org/wso2/carbon/privacy/forgetme/sql/module/AMApplicationRegistrationSQLExecutionModule.java#L56

Here
dataSource.getClass()
may produce NPE.

Affected Product Version:
IS 5.10

Suggestion.
Add a null check alongside the debug-enabled check in the if condition.
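
The suggestion can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the repository: `DataSourceLogGuard`, `describe`, and `logDataSource` are hypothetical names, the data source is typed as `Object` to keep the example self-contained, and `java.util.logging` stands in for whatever logging framework the module actually uses.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical illustration of guarding a log statement against a null data source.
final class DataSourceLogGuard {

    private static final Logger LOG = Logger.getLogger(DataSourceLogGuard.class.getName());

    // Returns a loggable description without ever dereferencing a null data source.
    static String describe(Object dataSource) {
        return dataSource == null ? "<no datasource configured>" : dataSource.getClass().getName();
    }

    // Logs only when debug-level (FINE) logging is enabled; describe() handles the null case.
    static void logDataSource(Object dataSource) {
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("Using datasource: " + describe(dataSource));
        }
    }

    public static void main(String[] args) {
        logDataSource(null);              // no NullPointerException
        System.out.println(describe(null));
    }
}
```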

Document should explain the correct use of config.json and each config set

Document [1] should explain the correct use of config.json and each config set

[1] https://docs.wso2.com/display/ADMIN44x/Removing+References+to+Deleted+User+Identities+in+WSO2+Products#RemovingReferencestoDeletedUserIdentitiesinWSO2Products-MasterConfig

Document says,

config.json | This is the master configuration file. You can configure this file based on your requirement. For information on how to configure this file, see Configuring the master configuration file.

Can you please explain what sort of configurations we should do in here in terms of GDPR?
Is it the place where we configure the locations or information to be deleted of a user?

The below content is too technical. Can we explain each of the section's usage?

You can configure the following in the config.json file based on your requirement:

processors - Specify a list of processors that you want to enable. The names of the processors that you can enable are pre-defined. Possible values are RDBMS and log-file. - Why do we need to enable processors? Why do we call them processors?

directories - Specify required directories. When you include config directories, be sure to either specify the config directories relative to the location of this file or specify the absolute path. - Required directories for what? Why does a user need to include config directories?

processor - Specify the processor to be used to process instructions in a selected directory. - Why do we need a processor to process instructions? Why do we call it a processor? What is the difference between processor and processors in config.json?

extensions - Specify any extensions to be initialized prior to starting the processor. - What is an extension? Why do we need extensions here? What is the difference between the processor directories and an extension?

Add batch execute script

Description:
Add a script to execute user anonymization in a batch. The input for the script is a CSV file in the following format: <Username>,<User-Store-Domain>,<Tenant-Domain>,<Tenant-ID>
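
As a rough sketch of how one CSV row could be turned into tool arguments: the class `BatchArgs` and method `toArguments` are hypothetical, and the mapping of columns to the `-U`, `-D`, `-T`, and `-TID` flags is an assumption based on the usage lines quoted elsewhere on this page. Whether `-T` and `-TID` may be passed together is not confirmed here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: converts one CSV row into an argument list for the forget-me script.
final class BatchArgs {

    // Expects exactly four comma-separated fields; only the username is mandatory.
    static List<String> toArguments(String csvLine) {
        String[] fields = csvLine.split(",", -1); // -1 keeps trailing empty fields
        if (fields.length != 4 || fields[0].trim().isEmpty()) {
            throw new IllegalArgumentException("Expected <Username>,<User-Store-Domain>,<Tenant-Domain>,<Tenant-ID>: " + csvLine);
        }
        List<String> args = new ArrayList<>();
        args.add("-U");
        args.add(fields[0].trim());
        addIfPresent(args, "-D", fields[1]);
        addIfPresent(args, "-T", fields[2]);
        addIfPresent(args, "-TID", fields[3]);
        return args;
    }

    // Appends the flag and its value only when the CSV field is not blank.
    private static void addIfPresent(List<String> args, String flag, String value) {
        String trimmed = value.trim();
        if (!trimmed.isEmpty()) {
            args.add(flag);
            args.add(trimmed);
        }
    }

    public static void main(String[] args) {
        System.out.println(toArguments("peter,PRIMARY,carbon.super,-1234"));
    }
}
```

A batch runner could then read the file line by line and invoke the tool once per row with the resulting arguments.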

Executing the anonymization tool several times results in log files with multiple "temp.txt" extensions

Description:
Running the tool multiple times creates log files with multiple "temp.txt" extensions

Affected Product Version:
EI 6.1.1 update 23

OS, DB, other environment details and versions:
OS - ubuntu 15.10
DB - mysql 5.6
JDK - 1.8

Pre- Requisites
PII information exists in the logs

Steps to reproduce:

  1. Execute the tool to replace the user information (ex: ./forget-me.sh -U peter)
  2. Without removing the log files execute the tool again (ex: ./forget-me.sh -U sam)

Observations
This will create another set of temp log files with the "temp.txt" extension

Expected Behaviour
The tool should not create a set of temp log files if that particular username does not exist in the logs.
Since we cannot control how a user uses the tool, we should make sure it does not create another set of log files with the extension.
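
One possible way to avoid stacking extensions, sketched under assumptions: the tool's real file-handling code is not shown on this page, so `TempLogNames`, `isToolOutput`, and `outputNameFor` are hypothetical names. The idea is simply to recognize files that already carry the ".temp.txt" suffix and not append it again.

```java
// Hypothetical naming helper; the ".temp.txt" suffix is taken from the observed file names.
final class TempLogNames {

    static final String TEMP_SUFFIX = ".temp.txt";

    // True when the file was already produced by an earlier run of the tool.
    static boolean isToolOutput(String fileName) {
        return fileName.endsWith(TEMP_SUFFIX);
    }

    // Appends the suffix once; a file that already has it keeps its name.
    static String outputNameFor(String fileName) {
        return isToolOutput(fileName) ? fileName : fileName + TEMP_SUFFIX;
    }

    public static void main(String[] args) {
        System.out.println(outputNameFor("wso2carbon.log"));
        System.out.println(outputNameFor("wso2carbon.log.temp.txt"));
    }
}
```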

screenshot from 2018-03-12 11-13-05

Refer - wso2/product-ei#1967

Grammar mistake in the logs

Description:
Grammar mistake in the following log
INFO ForgetMeExecutionEngine:115 - All processors has been properly shut-down

It should be "All processors have been properly shut down"

Suggested Labels:
Bug

Suggested Assignees:
RuwanA

Affected Product Version:
IS 550

Can we run forgetme tool with multiple users

Description:
If we want to run the tool for multiple users at the same time, is that possible?
This information is not in [1] either.
If not, it would be better to have a way to run the forget me tool for multiple users.

[1] https://docs.wso2.com/display/AM2xx/Removing+References+to+Deleted+User+Identities

Suggested Labels:
Type/Question
Type/Improvement
Severity/Major
Priority/Normal
Affected/APIM 2.1.0 update 13

Affected Product Version:
APIM 2.1.0 update 13

OS, DB, other environment details and versions:
OS-Mac high sierra
DB-H2
Fresh APIM 2.2.0 update 13

NoClassDefFoundError: com/sun/istack/Pool

Description:

When running forget me to remove user identities stored in the database the following error stack trace is observed.

Exception in thread "main" java.lang.NoClassDefFoundError: com/sun/istack/Pool
	at com.sun.xml.bind.v2.runtime.JAXBContextImpl$JAXBContextBuilder.build(JAXBContextImpl.java:1126)
	at com.sun.xml.bind.v2.ContextFactory.createContext(ContextFactory.java:135)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:247)
	at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:234)
	at javax.xml.bind.ContextFinder.find(ContextFinder.java:441)
	at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:641)
	at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:584)
	at org.wso2.carbon.datasource.utils.DataSourceUtils.loadJAXBConfiguration(DataSourceUtils.java:344)
	at org.wso2.carbon.datasource.core.DataSourceManager.initDataSource(DataSourceManager.java:159)
	at org.wso2.carbon.datasource.core.DataSourceManager.initDataSources(DataSourceManager.java:137)
	at org.wso2.carbon.datasource.core.DataSourceManager.initDataSources(DataSourceManager.java:109)
	at org.wso2.carbon.privacy.forgetme.sql.instructions.DatasourceProcessorConfigReader.readProcessorConfig(DatasourceProcessorConfigReader.java:45)
	at org.wso2.carbon.privacy.forgetme.sql.instructions.DatasourceProcessorConfigReader.readProcessorConfig(DatasourceProcessorConfigReader.java:32)
	at org.wso2.carbon.privacy.forgetme.ConfigReader.loadExtensions(ConfigReader.java:213)
	at org.wso2.carbon.privacy.forgetme.ConfigReader.readSystemConfig(ConfigReader.java:112)
	at org.wso2.carbon.privacy.forgetme.ForgetMeTool.process(ForgetMeTool.java:201)
	at org.wso2.carbon.privacy.forgetme.ForgetMeTool.main(ForgetMeTool.java:141)
Caused by: java.lang.ClassNotFoundException: com.sun.istack.Pool
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	... 21 more

Solution:

Later versions of jaxb-impl no longer seem to contain that class, so a dependency on the com.sun.istack libraries needs to be added.

Need details on the arguments passed to run the tool

Description:
Need details on what the arguments below mean:
./forget-m -d <config-dir> -U <userName> [-D domainName] [-T tenantDomain]

Example: config-dir - what the config directory should be, what the path to the config directory is, etc.

Suggested Labels:
Improvement

Suggested Assignees:
RuwanA

The docs should explain the files and directories in the <Tool>/conf directory that are currently missing from them

The files and directories of the <Tool>/conf directory that are missing from doc [1] should be explained there

[1] https://docs.wso2.com/display/ADMIN44x/Removing+References+to+Deleted+User+Identities+in+WSO2+Products#RemovingReferencestoDeletedUserIdentitiesinWSO2Products-MasterConfig

config.json
datasources
log4j.properties - missing. Explain what this is used for and whether we can configure it
log-config
product-config - missing
sql
streams - missing

Implement an extension to rename username.

Description:
The inbuilt extensions don't rename the username in the system.

Therefore an extension is needed to rename the username and the same functionality should be exposed as an API.

Maintainability becomes difficult as the number of log files increases

Description:
Removing the original file after temp file generation causes a new wso2carbon.log to be created. Executing the tool then replaces the information in both the latest wso2carbon.log and wso2carbon.log.temp.txt and creates a set of temp files for these two. As the number of log files grows, this becomes difficult to maintain.

Suggested Labels
Improvement

Affected Product Version:
EI 6.1.1 update 23

OS, DB, other environment details and versions:
OS - ubuntu 15.10
DB - Mysql 5.6

Steps to reproduce:

  1. Start the server and perform some login operations from user A, user B
  2. Stop the server and execute the tool to replace user A -> this will create wso2carbon.log.temp.txt
  3. Remove the wso2carbon.log which contains the user information
  4. Restart the server (new wso2carbon.log will be created ) and login with User B (Related user login will be logged in the new wso2carbon.log)
  5. Execute the tool to replace user B,

Observations
This will create a temp log in the following format.
wso2carbon.log -> wso2carbon.log.temp.txt
wso2carbon.log.temp.txt -> wso2carbon.log.temp.txt.temp.txt

As the number of log files grows, it will be difficult to maintain the logs in the long run. It would be better to come up with a proper mechanism to maintain the logs.

Related Issue
wso2/product-ei#1969

Add a tool to scan logs

Description:
A tool is required to list all the logs in a given repository.
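
A minimal sketch of such a scan, assuming "repository" means a directory tree on disk and that log files can be recognized by a file-name regex (similar in spirit to the `log-file-name-regex` setting in config.json). `LogScanner` and `findLogs` are hypothetical names, not part of the tool.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical scanner: lists regular files under a root whose names match a regex.
final class LogScanner {

    static List<Path> findLogs(Path root, String fileNameRegex) throws IOException {
        Pattern pattern = Pattern.compile(fileNameRegex);
        // Files.walk visits the whole tree; the stream must be closed after use.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> pattern.matcher(p.getFileName().toString()).matches())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Example: list every file whose name contains ".log" under the current directory.
        for (Path log : findLogs(Paths.get("."), ".*\\.log.*")) {
            System.out.println(log);
        }
    }
}
```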

Suggested Labels:
Improvement

Suggested Assignees:
N/A

Affected Product Version:
N/A

OS, DB, other environment details and versions:
N/A

Steps to reproduce:
N/A

Related Issues:
N/A
