Comments (25)
I've released a new version, v0.6.2. One major change is that I had to split sqlite into its own subcommand to keep the command line clean; see the usage below. I'll work on revamping the docs. Please follow #38.
usage: piicatcher sqlite [-h] -s PATH [-f {ascii_table,json,db}]
                         [-c {deep,shallow}] [-o OUTPUT] [--list-all]

optional arguments:
  -h, --help            show this help message and exit
  -s PATH, --path PATH  File path to SQLite database
  -f {ascii_table,json,db}, --output-format {ascii_table,json,db}
                        Choose output format type
  -c {deep,shallow}, --scan-type {deep,shallow}
                        Choose deep(scan data) or shallow(scan column names
                        only)
  -o OUTPUT, --output OUTPUT
                        File path for report. If not specified, then report is
                        printed to sys.stdout
  --list-all            List all columns. By default only columns with PII
                        information is listed
from piicatcher.
@vrajat I'm getting this error after upgrading to v0.6.2. File scanning has also stopped working.
from piicatcher.
Can you check again after running pip install pymssql? The package is part of the requirements and should have been installed. I am not sure why it was missed. I'll see if I can reproduce it in the meantime.
from piicatcher.
Tried installing pymssql
from piicatcher.
More info: #29 (comment)
Moore's law. I'll release a new version that requires the right pymssql version.
from piicatcher.
As a temporary fix I ran pip install "pymssql<3.0". File scanning now works fine, but for the database I am not able to get any data:
from piicatcher.
OK. By default it only lists columns that have PII. You can list all columns using --list-all.
A shallow scan checks column names only. It looks for column names like name, email etc. using regular expressions from https://github.com/madisonmay/CommonRegex. Do you have columns with these names?
Another option is to also check the data using -c deep.
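To make the idea of a shallow scan concrete, here is a minimal sketch of matching column names against regular expressions. The patterns and the shallow_scan function are hypothetical and only illustrate the approach; they are not piicatcher's actual rules or API.

```python
import re

# Hypothetical name patterns for illustration only;
# piicatcher's real expressions may differ.
NAME_PATTERNS = {
    "person": re.compile(r"(first|last|full)[_ ]?name", re.IGNORECASE),
    "email": re.compile(r"e[-_ ]?mail", re.IGNORECASE),
    "phone": re.compile(r"phone", re.IGNORECASE),
}


def shallow_scan(column_names):
    """Return {column_name: [labels]} for names that look like PII.

    Only the column names are inspected; no row data is read.
    """
    hits = {}
    for col in column_names:
        labels = [label for label, pat in NAME_PATTERNS.items() if pat.search(col)]
        if labels:
            hits[col] = labels
    return hits


if __name__ == "__main__":
    print(shallow_scan(["first_name", "email", "zip", "phone1"]))
```

A column such as zip is skipped here because nothing in its name matches, which is the kind of case where a deep scan of the data can help.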
from piicatcher.
Yes, I have columns with such names:
--list-all with -c deep
from piicatcher.
I recreated the table and it works for me:
Can you check if piicatcher can read all the tables in the SQLite database? --list-all should not have returned an empty list; that suggests it cannot find any tables or columns.
What are the data types of these columns? piicatcher only checks text, varchar and char columns.
SQLite does not have users, roles or grants, so it can't be a permissions problem.
Can you check if the following query returns any rows in sqlite3?
sqlite3 db.data.sqlite
sqlite> SELECT
    "" as schema_name,
    m.name as table_name,
    p.name as column_name,
    p.type as data_type
FROM
    sqlite_master AS m
JOIN
    pragma_table_info(m.name) AS p
WHERE
    p.type like 'text' or p.type like 'varchar%' or p.type like 'char%'
ORDER BY
    m.name,
    p.name;
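The same check can also be run from Python instead of the sqlite3 shell. This is a minimal sketch using the standard library sqlite3 module; the name text_columns is made up for illustration. It uses '' instead of "" for the empty schema name, because some SQLite builds reject double-quoted string literals, and it assumes SQLite 3.16 or newer, where pragma_table_info can be used as a table-valued function.

```python
import sqlite3
from contextlib import closing

QUERY = """
SELECT
    '' AS schema_name,
    m.name AS table_name,
    p.name AS column_name,
    p.type AS data_type
FROM
    sqlite_master AS m
JOIN
    pragma_table_info(m.name) AS p
WHERE
    p.type LIKE 'text' OR p.type LIKE 'varchar%' OR p.type LIKE 'char%'
ORDER BY
    m.name,
    p.name
"""


def text_columns(conn):
    """Return (schema, table, column, type) rows for text-like columns."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    # db.data.sqlite is the file name used in this thread; adjust as needed.
    with closing(sqlite3.connect("db.data.sqlite")) as conn:
        for row in text_columns(conn):
            print(row)
```

If this prints nothing for a database that does have tables, the declared column types are probably empty or not text-like.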
from piicatcher.
Maybe I am missing something.
This is my SQLite table:
Can you see anything that is not correct?
from piicatcher.
Piicatcher should definitely detect these columns. I don't see anything wrong. Can you run this SQL query:
SELECT
    "" as schema_name,
    m.name as table_name,
    p.name as column_name,
    p.type as data_type
FROM
    sqlite_master AS m
JOIN
    pragma_table_info(m.name) AS p
ORDER BY
    m.name,
    p.name
from piicatcher.
Any issues?
from piicatcher.
Yes. data_type is null or empty, which is unexpected. I get:
|test|address|varchar
|test|city|varchar
|test|company_name|varchar
|test|county|varchar
|test|email|varchar web
|test|first_name|varchar
|test|last_name|varchar
|test|phone1|varchar
|test|phone2|varchar
|test|state|varchar
|test|zip|varchar
What was your CREATE TABLE statement? My guess is that you did not specify a type. I can reproduce with:
create table no_type(a,b);
# run select query
|no_type|a|
|no_type|b|
Can you recreate the table with data types? For example:
create table test(first_name varchar, last_name varchar, company_name varchar, address varchar, city varchar, county varchar, state varchar, zip varchar, phone1 varchar, phone2 varchar, email varchar web);
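The difference between the typed and untyped tables can also be checked from Python. This is a minimal sketch with the standard library sqlite3 module on an in-memory database; declared_types is a hypothetical helper name, and the typed table is a shortened version of the example above.

```python
import sqlite3


def declared_types(conn, table):
    """Map each column of `table` to its declared type ('' if none was given)."""
    rows = conn.execute(
        "SELECT name, type FROM pragma_table_info(?)", (table,)
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Columns declared without a type, as in the no_type reproduction.
    conn.execute("create table no_type(a, b)")
    # Columns declared with a type.
    conn.execute("create table typed(first_name varchar, last_name varchar)")
    print(declared_types(conn, "no_type"))
    print(declared_types(conn, "typed"))
    conn.close()
```

Columns that report an empty declared type are the ones the text/varchar/char filter will not pick up.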
from piicatcher.
I have a script like this, which does specify data types:
CREATE TABLE IF NOT EXISTS [userdata] (
[first_name] VARCHAR NULL,
[last_name] VARCHAR NULL,
[company_name] VARCHAR NULL,
[address] VARCHAR NULL,
[city] VARCHAR NULL,
[county] VARCHAR NULL,
[state] VARCHAR NULL,
[zip] INT NULL,
[phone1] VARCHAR NULL,
[phone2] VARCHAR NULL,
[email] VARCHAR NULL,
[web] VARCHAR NULL
);
from piicatcher.
I get the data type for that table.
|userdata|address|VARCHAR
|userdata|city|VARCHAR
|userdata|company_name|VARCHAR
|userdata|county|VARCHAR
|userdata|email|VARCHAR
|userdata|first_name|VARCHAR
|userdata|last_name|VARCHAR
|userdata|phone1|VARCHAR
|userdata|phone2|VARCHAR
|userdata|state|VARCHAR
|userdata|web|VARCHAR
|userdata|zip|INT
Which version of SQLite do you have?
I have:
SQLite 3.29.0 2019-07-10 17:32:03 fc82b73eaac8b36950e527f12c4b5dc1e147e6f4ad2217ae43ad82882a88alt1
zlib version 1.2.11
gcc-9.2.1 20191008
from piicatcher.
V 3.30.1.0
from piicatcher.
OK. Then I am at a loss as to why the data type is not returned for you.
from piicatcher.
I altered the script and removed NULL, and now I am getting the desired result:
But piicatcher sqlite is still not returning data.
from piicatcher.
My bad. Now it's working. Thanks a lot!
from piicatcher.
I have tested piicatcher on different file formats like json, plaintext, xml and csv, but it does not work with xlsx or xls.
from piicatcher.
It does not support binary formats like xlsx, xls and PDF yet. Can you file a new feature request?
from piicatcher.
Sure ...
from piicatcher.
I am getting an error when running a deep scan on a MySQL database.
from piicatcher.
I've released a new version. Thanks for all the bug reports! Please open a new issue for future problems.
from piicatcher.