reevoo / anon
Replaces personal e-mails with fake e-mails
License: MIT License
We should make all the options flags. The in and out files should be flags too, and we should default to reading from stdin and writing to stdout, so I can do stuff like:
mysql -uroot secret_database -e 'SELECT * from secet_user_files' | anon --text > not_so_secret_file.csv
That encourages people not to keep unneeded copies of sensitive data around.
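A minimal sketch of how the stream defaults could work, assuming Ruby's standard OptionParser and the `--in` / `--out` flag names used elsewhere in these issues. The helper name `open_streams` is hypothetical and is not part of the gem.

```ruby
require 'optparse'

# Hypothetical helper: returns [input, output] IO objects.
# Falls back to stdin/stdout when --in / --out are not given.
def open_streams(argv)
  options = {}
  OptionParser.new do |opts|
    opts.on('--in FILE', 'Read from FILE instead of stdin') { |f| options[:in] = f }
    opts.on('--out FILE', 'Write to FILE instead of stdout') { |f| options[:out] = f }
  end.parse!(argv)

  input  = options[:in]  ? File.open(options[:in], 'r')  : $stdin
  output = options[:out] ? File.open(options[:out], 'w') : $stdout
  [input, output]
end

# Usage (the caller is responsible for closing any opened files):
#   input, output = open_streams(ARGV)
#   input.each_line { |line| output.write(line) }
```

With no flags the tool behaves as a plain filter, which is what makes the piped `mysql ... | anon ... > file` usage possible.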
Isn't it a better idea to generally use text anon instead of CSV?
given the file personas-a-correo-no-deseado.csv
correo electronico,color de pelo,bigote
[email protected], azul,verdadero
[email protected],verde, falso
[email protected], rosa, verdadero
I want to run the following command: anon --in personas-a-correo-no-deseado.csv --out anoned.csv --column 'correo electronico'
and get the following output:
correo electronico,color de pelo,bigote
[email protected], azul,verdadero
[email protected],verde, falso
[email protected], rosa, verdadero
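The named-column behaviour requested above could be sketched roughly as follows, using Ruby's standard CSV library. The function name `anonymise_column` and the `anonN@example.com` replacement scheme are assumptions for illustration only, not the gem's actual output format.

```ruby
require 'csv'

# Hypothetical helper: replaces every value in the named column
# with a numbered placeholder address and returns the new CSV text.
def anonymise_column(csv_text, column)
  table = CSV.parse(csv_text, headers: true)
  table.each_with_index do |row, i|
    row[column] = "anon#{i + 1}@example.com"
  end
  table.to_csv
end

# Usage:
#   anonymise_column(File.read('in.csv'), 'correo electronico')
```

All other columns pass through untouched, so only the column selected by header name changes.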
This is a nice piece of open-source-able functionality.
What needs to be done?
@reevoo.com
from code$anon csv test_file.csv test_out.csv 1
/Users/ed/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/anon-0.0.1/lib/anon/cli.rb:30:in `csv': uninitialized constant Anon::Csv (NameError)
from /Users/ed/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/anon-0.0.1/lib/anon/cli.rb:8:in `parse!'
from /Users/ed/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/anon-0.0.1/bin/anon:5:in `<top (required)>'
from /Users/ed/.rbenv/versions/1.9.3-p194/bin/anon:23:in `load'
Looks like a typo: Anon::Csv vs Anon::CSV in lib/anon/cli.rb:30
We frequently deal with CSVs that use a pipe as the separator. It would be good to recognise the separator in the input and use the same one in the output for consistency.
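One simple way to recognise the separator is to count candidate characters in the header line and pick the most frequent. This is only a sketch: the candidate list and the name `detect_separator` are assumptions, and the heuristic can be fooled by quoted fields that contain separator characters.

```ruby
# Candidate separators to try; an assumed list, extend as needed.
SEPARATOR_CANDIDATES = [',', '|', "\t", ';'].freeze

# Returns the candidate that occurs most often in the header line.
def detect_separator(header_line)
  SEPARATOR_CANDIDATES.max_by { |sep| header_line.count(sep) }
end

# Usage:
#   sep = detect_separator(File.open('in.csv', &:readline))
#   CSV.parse(text, col_sep: sep)   # and write with the same col_sep
```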
We can also use CodeClimate instead of simplecov (CC is also using simplecov) and have a nice badge with coverage.
http://data.iana.org/TLD/tlds-alpha-by-domain.txt has some fairly long tlds
for example .xn--clchc0ea0b2g2a9gcd i.e. .சிங்கப்பூர்
We should make the regex more permissive than it is.
In revieworld we have been using /^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}$/i
for validation of emails...
We should try /^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,24}$/i
to start with.
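A quick check of the two patterns quoted above, as a sketch. Note that the punycode TLD mentioned earlier contains digits and hyphens, so a TLD class of `[A-Z]` does not match it at any length. The `RELAXED` pattern below is a hypothetical further loosening, not something proposed in this thread.

```ruby
# Patterns copied from the issue text.
OLD_EMAIL = /^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}$/i
NEW_EMAIL = /^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,24}$/i
# Assumption: also allow digits and hyphens in the TLD for punycode names.
RELAXED   = /^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z0-9-]{2,24}$/i

def matches?(pattern, address)
  !(pattern =~ address).nil?
end

long_tld = 'user@example.photography'
punycode = 'user@example.xn--clchc0ea0b2g2a9gcd'

matches?(OLD_EMAIL, long_tld)  # false: TLD longer than 6 letters
matches?(NEW_EMAIL, long_tld)  # true
matches?(NEW_EMAIL, punycode)  # false: TLD contains digits and hyphens
matches?(RELAXED, punycode)    # true
```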
Some possibly useful ones.
I did RetailerFeedSetting.where(feed_queue_name: 'purchasers').map { |p| p.replacement_headers[:email] }.compact.uniq
["email",
"zemailaddress",
"contact email address",
"emailaddress",
"e-mail",
"personal email",
"email address",
"ct_email_addr",
"e_mail_address",
"student_e_mail",
"e-mail address",
"parent email address",
"customer email address",
"email_address",
"contact address.email",
"user email string",
"email client",
"client email",
"ot ship-to email address",
"customer email",
"clnp email address1",
"default_email",
"email_addr",
"sender_email",
"webemail"]
We should be able to detect which columns need anonymising quite simply...
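Every header in the list above contains "email", "e-mail" or "e_mail", so a single case-insensitive pattern might be enough as a first pass. The pattern and the helper name `email_column_indexes` are assumptions for illustration; headers in other languages would need additional patterns.

```ruby
# Matches "email", "e-mail", "e_mail" and "e mail" anywhere in a header.
EMAIL_HEADER = /e[-_ ]?mail/i

# Returns the indexes of the columns whose header looks like an e-mail field.
def email_column_indexes(headers)
  headers.each_index.select { |i| headers[i].to_s =~ EMAIL_HEADER }
end

# Usage:
#   email_column_indexes(['name', 'E-mail Address', 'student_e_mail', 'colour'])
#   # => [1, 2]
```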
This seems to anonymise the file OK, but then blows up:
$echo '[email protected]' > test.in && anon text test.in test.out
/Users/ed/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/anon-0.0.1/lib/anon/base.rb:40:in `round': Infinity (FloatDomainError)
from /Users/ed/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/anon-0.0.1/lib/anon/base.rb:40:in `complete_progress'
from /Users/ed/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/anon-0.0.1/lib/anon/text.rb:24:in `anonymise!'
from /Users/ed/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/anon-0.0.1/lib/anon/base.rb:10:in `anonymise!'
from /Users/ed/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/anon-0.0.1/lib/anon/cli.rb:19:in `text'
from /Users/ed/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/anon-0.0.1/lib/anon/cli.rb:8:in `parse!'
from /Users/ed/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/anon-0.0.1/bin/anon:5:in `<top (required)>'
from /Users/ed/.rbenv/versions/1.9.3-p194/bin/anon:23:in `load'
from /Users/ed/.rbenv/versions/1.9.3-p194/bin/anon:23:in `<main>'
$cat test.out
[email protected]
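In Ruby, calling `round` on `Infinity` raises `FloatDomainError`, which matches the first line of the trace. One possible cause, and this is an assumption since base.rb is not shown here, is a progress rate computed by dividing by an elapsed time of zero on very small inputs. A guard along these lines would avoid the crash; `safe_rate` is a hypothetical name.

```ruby
# Hypothetical guard: returns a rounded items-per-second rate,
# or 0 when the division yields Infinity or NaN (e.g. zero elapsed time).
def safe_rate(count, seconds)
  rate = count / seconds.to_f
  rate.finite? ? rate.round : 0
end

# Usage:
#   safe_rate(10, 2)  # => 5
#   safe_rate(1, 0)   # => 0, instead of raising FloatDomainError
```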
For example it would be nice for this:
anon config --email-domain example.com
to create or update the file ~/.anonrc
email_domain: 'example.com'
and then for the config in that file to be used when anonymising things.
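A minimal sketch of the create-or-update step, assuming the file is plain YAML with string keys as in the example above. The helper name `update_config` is hypothetical.

```ruby
require 'yaml'

# Hypothetical helper: merges the given settings into the YAML file at path,
# creating the file if it does not exist. Returns the merged settings.
def update_config(path, settings)
  current = File.exist?(path) ? (YAML.load_file(path) || {}) : {}
  merged = current.merge(settings)
  File.write(path, merged.to_yaml)
  merged
end

# Usage:
#   update_config(File.expand_path('~/.anonrc'), 'email_domain' => 'example.com')
```

Merging rather than overwriting means that setting one option leaves any previously saved options in place.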
Currently we create a lookup table as we parse e-mail addresses to ensure we return the same address. This doesn't scale when we get to 100,000+ addresses.
An alternative would be to hash our addresses, but we want to keep the e-mail addresses human readable. Fortunately there is a hashing algorithm around that returns human-readable output. As a bonus, the hashes it returns are ridiculous.
We should look at porting this to Ruby and using it.
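To illustrate the idea of replacing the lookup table with a deterministic hash, here is a rough sketch. It is not the algorithm referred to above: it uses SHA-256 from Ruby's standard library plus a tiny made-up word list, and the name `fake_email` and the `example.com` default are assumptions.

```ruby
require 'digest'

# Toy word list for illustration; a real one would be far larger.
WORDS = %w[amber badger cedar dingo ember falcon garnet heron
           indigo jackal kestrel lemur marten newt osprey puffin].freeze

# Derives the same readable fake address from the same input every time,
# without storing anything between calls.
def fake_email(address, domain = 'example.com')
  hex = Digest::SHA256.hexdigest(address)
  # First two bytes pick two words; the next eight hex digits keep collisions rare.
  words = hex[0, 4].scan(/../).map { |pair| WORDS[pair.to_i(16) % WORDS.size] }
  "#{words.join('-')}-#{hex[4, 8]}@#{domain}"
end

# Usage:
#   fake_email('jane@reevoo.com') == fake_email('jane@reevoo.com')  # always true
```

Because the output depends only on the input, memory use stays constant however many addresses are processed, at the cost of a small collision probability.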
because, and then it will make installing dependencies easier and expose a bin thingy