text-sanitizer-rs's Issues

Sanitizer Structure

To improve the module API, this development will introduce a structure that holds the important information about the request.
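The issue does not say which fields the structure will carry. The sketch below is purely hypothetical: the name `SanitizerRequest`, its fields, and the builder methods are assumptions, loosely inspired by the `-i en es -d` arguments used later on this page (the assumed meaning of `-i` as a language list is a guess). It keeps the input as raw bytes so that non-UTF-8 text is never rejected before sanitizing starts.

```rust
/// Hypothetical request structure (name and fields are assumptions, not the
/// crate's actual API).
#[derive(Debug, Clone, PartialEq)]
pub struct SanitizerRequest {
    /// Language codes, assumed to correspond to the `-i en es` arguments.
    pub languages: Vec<String>,
    /// Whether debug output was requested, assumed to correspond to `-d`.
    pub debug: bool,
    /// Raw input bytes; kept as bytes so non-UTF-8 input is accepted as-is.
    pub text: Vec<u8>,
}

impl SanitizerRequest {
    /// Creates a request with no languages and debug output disabled.
    pub fn new(text: impl Into<Vec<u8>>) -> Self {
        Self { languages: Vec::new(), debug: false, text: text.into() }
    }

    /// Replaces the language list.
    pub fn with_languages(mut self, langs: &[&str]) -> Self {
        self.languages = langs.iter().map(|l| l.to_string()).collect();
        self
    }

    /// Enables or disables debug output.
    pub fn with_debug(mut self, debug: bool) -> Self {
        self.debug = debug;
        self
    }
}
```

Passing one value through the module functions, instead of a growing list of parameters, lets new request options be added without changing every signature.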

Tests with Complete Non-UTF-8 Files

To be able to reproduce any issue with the Text-Sanitizer tool, a test case is needed that can work with any given text file.

The file tests/data/server_access_sheat.txt is encoded as ISO-8859-15 with Windows (CRLF) line endings,
and it cannot be edited correctly in Linux command-line environments.
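A test that works with any given text file should not assume the file is UTF-8. A minimal sketch, assuming the test reads the fixture as raw bytes with `std::fs::read` and only then checks where UTF-8 decoding would stop; the helper names `read_raw` and `first_invalid_utf8` are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical test helper: reads any file as raw bytes, so loading the
/// fixture never fails because of its encoding.
pub fn read_raw(path: &Path) -> io::Result<Vec<u8>> {
    fs::read(path)
}

/// Hypothetical helper: returns the index of the first byte that breaks UTF-8
/// decoding, or `None` if all bytes are valid UTF-8.
pub fn first_invalid_utf8(bytes: &[u8]) -> Option<usize> {
    std::str::from_utf8(bytes).err().map(|e| e.valid_up_to())
}
```

For the ISO-8859-15 fixture, the first invalid position would be the `ñ` (byte `0xF1`) in `contraseña`.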

$ cat server_access_sheat.txt|grep -i password|wc -l
grep: (entrada estándar): binary file matches
0
$ cat -A server_access_sheat.txt|grep -i password|wc -l
4
$ cat -A server_access_sheat.txt|grep -i password
contraseM-qa:^I<root_password>^M$
contraseM-qa:^I<mysql_password>^M$
contraseM-qa:^I<ftp_password01>^M$
contraseM-qa:^I<ftp_password02>^M$
$  echo "contraseña"|cat -A
contraseM-CM-1a$
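In the `cat -A` notation above, `M-` marks a byte with its high bit set, `^X` marks a control character, `^I` is a tab, `^M` is the carriage return of a CRLF line ending, and `$` marks the end of a line. So `M-q` is `0x80 + 'q' (0x71) = 0xF1`, the single ISO-8859-15 byte for `ñ`, while the UTF-8 `ñ` is the two bytes `0xC3 0xB1`, shown as `M-CM-1`. The sketch below reimplements this notation (as GNU `cat -A`, i.e. `-vET`, prints it) so the two encodings can be compared in a test; the function name `cat_a` is hypothetical.

```rust
/// Renders bytes the way `cat -A` does: `M-` for bytes >= 0x80,
/// `^X` for control characters, `^?` for DEL, and `$` before each newline.
pub fn cat_a(bytes: &[u8]) -> String {
    let mut out = String::new();
    for &b in bytes {
        if b == b'\n' {
            // -E: mark the end of each line with '$'.
            out.push_str("$\n");
            continue;
        }
        let mut c = b;
        if c >= 0x80 {
            // -v: high-bit bytes are shown as "M-" followed by the low 7 bits.
            out.push_str("M-");
            c -= 0x80;
        }
        if c < 0x20 {
            // -v/-T: control characters (including tab and CR) as ^@ .. ^_.
            out.push('^');
            out.push((c + 0x40) as char);
        } else if c == 0x7F {
            out.push_str("^?");
        } else {
            out.push(c as char);
        }
    }
    out
}
```

Feeding it both encodings reproduces the lines above: `"contraseña\n"` (UTF-8) renders as `contraseM-CM-1a$`, and the ISO-8859-15 bytes render as `contraseM-qa:^I<root_password>^M$`.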

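Also, Rust applications that call read_to_string() on it fail: the standard library returns an InvalidData error because the content is not valid UTF-8, and the application crashes if that error is unwrapped.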
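The failure can be reproduced without the fixture file by reading the same bytes from an in-memory `std::io::Cursor`. A minimal sketch, assuming a lossy fallback is acceptable: read the raw bytes with `read_to_end` and decode them with `String::from_utf8_lossy`, which replaces each invalid sequence with U+FFFD. The helper names `strict_read_error` and `read_lossy` are hypothetical.

```rust
use std::io::{self, Read};

/// Hypothetical helper: returns the error kind that `read_to_string`
/// reports for this input, or `None` if it decodes as UTF-8.
pub fn strict_read_error<R: Read>(mut reader: R) -> Option<io::ErrorKind> {
    let mut s = String::new();
    reader.read_to_string(&mut s).err().map(|e| e.kind())
}

/// Hypothetical fallback: reads all bytes and decodes them lossily, so
/// invalid sequences become U+FFFD instead of an error.
pub fn read_lossy<R: Read>(mut reader: R) -> io::Result<String> {
    let mut buf = Vec::new();
    reader.read_to_end(&mut buf)?;
    Ok(String::from_utf8_lossy(&buf).into_owned())
}
```

The lossy fallback avoids the crash but discards the information that the byte was `ñ`, so it only postpones the problem.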

The special characters are encoded differently.
Valid UTF-8 content is displayed correctly in Rust:

$ echo "contraseña" |../../target/debug/text-sanitizer -i en es -d
vtext 0:'[99, 111, 110, 116, 114, 97, 115, 101, 195, 177, 97, 10]'
pdg spec chars '8 - 10': '[195, 177]'
; sequence 0 (cnt: '2', strt: '0', end: '2'): '[c3, b1]' - parsing ...
; sequence (cnt: '2', strt: '0', end: '2'): '[c3, b1]' - parsing ...
utf8 ok: 'ñ'
uni res: '["ñ"]'
= ["f1"] -> 'n''
; sanitze done.
chrs rpt: '"; 0 - 99:\'c\'; 1 - 111:\'o\'; 2 - 110:\'n\'; 3 - 116:\'t\'; 4 - 114:\'r\'; 5 - 97:\'a\'; 6 - 115:\'s\'; 7 - 101:\'e\'; 8 - 195:\'Ã\' - non-ascii \'None > Some(8) - None\'|; 9 - 177:\'±\' - non-ascii \'Some(8) - None\'|; 10 - 97:\'a\'; 11 - 10:\'\n\'; chr cnt \'12\'"'
stt rpt chrs (count : '11'):
['c', 'o', 'n', 't', 'r', 'a', 's', 'e', 'ñ', 'a', '\n']
stt chrs ascii:
c|o|n|t|r|a|s|e|\u{f1}|a|
|
rs rpt chrs (count : '11'):
"contrasena\n"
contrasena
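The log shows the UTF-8 path: the bytes `[195, 177]` decode to `ñ` (code point `f1`), which is then replaced by `n`. A minimal sketch of that flow, assuming a simple per-code-point transliteration table; the table contents, the `?` placeholder for unknown characters, and the names `transliterate` and `sanitize_utf8` are hypothetical and not the tool's actual implementation.

```rust
/// Hypothetical transliteration table keyed by Unicode code point; the
/// tool's real table and language handling are not shown in the issue.
fn transliterate(c: char) -> Option<&'static str> {
    match c as u32 {
        0xF1 => Some("n"), // ñ
        0xD1 => Some("N"), // Ñ
        0xE1 => Some("a"), // á
        0xE9 => Some("e"), // é
        _ => None,
    }
}

/// Hypothetical sanitizer: decodes UTF-8, keeps ASCII, transliterates known
/// characters, and replaces anything else with '?'.
pub fn sanitize_utf8(bytes: &[u8]) -> Result<String, std::str::Utf8Error> {
    let text = std::str::from_utf8(bytes)?;
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        if c.is_ascii() {
            out.push(c);
        } else if let Some(rep) = transliterate(c) {
            out.push_str(rep);
        } else {
            out.push('?');
        }
    }
    Ok(out)
}
```

With the byte vector from the log (`vtext 0`), this returns `"contrasena\n"`, while the lone ISO-8859-15 byte `0xF1` is rejected with a `Utf8Error`.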

ISO-8859-15, however, uses different bytes to store the same content:

$ perl -e 'my @arrchrs = (99, 111, 110, 116, 114, 97, 115, 101, 241, 97, 58); print pack "U*", @arrchrs;' | ../../target/debug/text-sanitizer -i en es -d
vtext 0:'[99, 111, 110, 116, 114, 97, 115, 101, 241, 97, 58]'
pdg spec chars '8 - 9': '[241]'
; sequence 0 (cnt: '1', strt: '0', end: '1'): '[f1]' - parsing ...
; sequence (cnt: '1', strt: '0', end: '1'): '[f1]' - parsing ...
utf8 Err: 'Utf8Error { valid_up_to: 0, error_len: None }'
vld ps: '0'
vld idx: '0'
ivld chrs cnt: all
ivld chr: 'f1'
uni res: '["(?f1)"]'
= ["f1"] -> 'n''
; sanitze done.
chrs rpt: '"; 0 - 99:\'c\'; 1 - 111:\'o\'; 2 - 110:\'n\'; 3 - 116:\'t\'; 4 - 114:\'r\'; 5 - 97:\'a\'; 6 - 115:\'s\'; 7 - 101:\'e\'; 8 - 241:\'ñ\' - non-ascii \'None > Some(8) - None\'|; 9 - 97:\'a\'; 10 - 58:\':\'; chr cnt \'11\'"'
stt rpt chrs (count : '11'):
['c', 'o', 'n', 't', 'r', 'a', 's', 'e', '�', 'a', ':']
stt chrs ascii:
c|o|n|t|r|a|s|e|\u{fffd}|a|:|
rs rpt chrs (count : '11'):
"contrasena:"
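The `Utf8Error { valid_up_to: 0, error_len: None }` in this log is exactly what `std::str::from_utf8` returns for the lone byte `0xF1`: it is a valid lead byte of a four-byte UTF-8 sequence, so decoding reports an unexpected end of input rather than an invalid byte. One possible approach, sketched here as an assumption and not as the tool's method, is to try UTF-8 first and fall back to decoding ISO-8859-15, which matches ISO-8859-1 (the first 256 Unicode code points) except at eight positions. The function names are hypothetical.

```rust
/// Decodes ISO-8859-15 bytes. Only eight positions differ from ISO-8859-1;
/// every other byte maps to the Unicode code point with the same value.
pub fn decode_iso_8859_15(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|&b| match b {
            0xA4 => '\u{20AC}', // €
            0xA6 => '\u{0160}', // Š
            0xA8 => '\u{0161}', // š
            0xB4 => '\u{017D}', // Ž
            0xB8 => '\u{017E}', // ž
            0xBC => '\u{0152}', // Œ
            0xBD => '\u{0153}', // œ
            0xBE => '\u{0178}', // Ÿ
            _ => char::from(b),
        })
        .collect()
}

/// Hypothetical fallback: use UTF-8 when the bytes are valid UTF-8,
/// otherwise assume the input is ISO-8859-15.
pub fn decode_utf8_or_iso_8859_15(bytes: &[u8]) -> String {
    match std::str::from_utf8(bytes) {
        Ok(s) => s.to_owned(),
        Err(_) => decode_iso_8859_15(bytes),
    }
}
```

With this fallback, the bytes from the log above decode to `"contraseña:"`, so the `ñ` could be transliterated instead of becoming `\u{fffd}`. The heuristic is only a guess at the encoding: a file in another single-byte code page would decode to the wrong characters without any error.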
