digitalegesellschaft / anonip
Anonip is a tool to anonymize IP addresses in log files.
License: Other
Hello
Please implement an in-place substitution option (like 'sed -i') that takes only a single argument.
Thank you
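A minimal sketch of what such an in-place mode could do, assuming a hypothetical helper `rewrite_in_place` and a caller-supplied `transform` function (neither is part of Anonip). The transformed lines go to a temporary file in the same directory, which then replaces the original.

```python
import os
import tempfile


def rewrite_in_place(path, transform):
    """Rewrite the file at `path` line by line.

    Hypothetical helper for illustration only, not part of Anonip.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so that os.replace
    # does not have to move data across filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp, \
                open(path, encoding="utf-8") as src:
            for line in src:
                tmp.write(transform(line))
        os.replace(tmp_path, path)  # swap the rewritten file in
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Note that a server which still holds the old file open would keep writing to the replaced file, so a sketch like this only suits logs that are no longer being written.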
Hello,
I'm working only with IPv4 and I have error logs like the following:
[Mon Jul 18 17:54:15.281165 2022] [ssl:info] [pid 32202] [client 11.22.33.44:11388] Some Text bla...
I try
cat error.log | egrep -o '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'
and successfully got my IPs.
Next I try
anonip.py --input error.log -4 8 --regex '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' -d
and got
DEBUG:__main__:Regex did not match!
The same with:
anonip.py --input error.log -4 8 --regex '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}' -d
Any ideas?
Regards,
Heiko
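One possible explanation, offered as an assumption rather than a confirmed diagnosis: if the pattern is applied from the start of the line (as Python's `re.match` does) and addresses are read from capture groups, a bare IP pattern cannot match a line that begins with `[Mon ...`, whereas `egrep -o` searches anywhere in the line. The sketch below only demonstrates that difference in plain Python; the `anchored` pattern is an illustrative guess, not a tested Anonip invocation.

```python
import re

line = ("[Mon Jul 18 17:54:15.281165 2022] [ssl:info] [pid 32202] "
        "[client 11.22.33.44:11388] Some Text bla...")

# The pattern from the report: no capture group, no leading '.*'.
bare = r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"

# Illustrative alternative: consume the prefix and capture only the address.
anchored = r".*\[client ([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}):[0-9]+\].*"

found_by_search = re.search(bare, line)  # scans the whole line, like egrep -o
found_by_match = re.match(bare, line)    # only tries at position 0
grouped = re.match(anchored, line)       # matches from the start, IP in group 1

print(found_by_search.group(0))
print(found_by_match)
print(grouped.group(1))
```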
Hey,
For days now I have been trying to get anonip.py working with piped logs in Apache2. I'm really frustrated, since I've tried so many things: changing permissions of several files and folders, trying every possible notation in the access.conf, and even rewriting anonip.py itself, but to no avail. The IP addresses in access.log don't change at all.
Did I miss something?
Adding a custom log pipe to Apache with anonip.py as target doesn't change the IP addresses in access.log. The entries simply don't change.
Changed IP addresses in access.log
from something like:
192.168.137.95 - - [28/Jul/2022:15:51:14 +0200] "GET / HTTP/1.1" 200 1162 ....
to
192.168.[another number].[yet another number] - - [28/Jul/2022:15:51:14 +0200] "GET / HTTP/1.1" 200 1162 ...
CustomLog "|/var/www/anonip.py --ipv4mask 12 --output /var/log/apache2/access.log" combined
to /etc/apache2/apache2.conf
LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined
LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog "|/var/www/anonip.py --ipv4mask 12 --output /var/log/apache2/access.log" combined
LogFormat "%h %l %u %t \"%r\" %>s %O" common
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent
sudo systemctl restart apache2
So far there are no error messages. The only error.log entries are:
[Thu Jul 28 15:54:40.131208 2022] [mpm_prefork:notice] [pid 46167] AH00169: caught SIGTERM, shutting down
[Thu Jul 28 15:55:11.728647 2022] [mpm_prefork:notice] [pid 46218] AH00163: Apache/2.4.41 (Ubuntu) configured -- resuming normal operations
[Thu Jul 28 15:55:11.728716 2022] [core:notice] [pid 46218] AH00094: Command line: '/usr/sbin/apache2'
However calling the script by hand, like:
/path/to/script/anonip.py --ipv4mask 12 < /var/log/apache2/access.log --output /home/user/foobar.log
or /path/to/script/anonip.py --ipv4mask 12 < /var/log/apache2/access.log --output /var/log/apache2/access.log
works just fine.
I have trouble getting anonip to mask IPs in my nginx error.log file.
When I execute anonip as root, I receive the following error message:
# anonip < /var/log/nginx/error.log
WARNING:anonip:'2020' does not appear to be an IPv4 or IPv6 network
2020/03/05 19:27:43 [error] 1253#1253: *15347 open() "/usr/share/nginx/html/favicon.ico" failed (2: No such file or directory), client: XXX.XXX.XXX.XXX, server: address.tld, request: "GET /favicon.ico HTTP/1.1", host: "address.tld"
With XXX.XXX.XXX.XXX being some IP address (not masked) and address.tld the domain of the server.
I already tried the --delimiter and --column options:
# anonip --delimiter "," --column 2 < /var/log/nginx/error.log
WARNING:anonip:' client' does not appear to be an IPv4 or IPv6 network
Somehow I would need to tell anonip to look behind "client:", but that's not possible.
I thought anonip would work with error logs out of the box. Or have there been some recent changes in the way nginx is formatting the error logs?
I am using nginx version nginx/1.14.0 (Ubuntu) and anonip 1.0.0 (installed through pip3).
When processing ErrorLog entries from Apache 2.4 in the following format (IPv6 short form notation + port)
2001:db8:1::ab9:C0A8:102:46824 [Wed Jul 06 21:28:43 2022] [error] [pid 68812] mod_proxy_fcgi.c(887): AH01071: Got error 'Primary script unknown'
I get the following error:
$ echo "2001:db8:1::ab9:C0A8:102:46824" | ./anonip.py
WARNING:__main__:'2a06' does not appear to be an IPv4 or IPv6 network
2a06:6440:0:2c80::1:46824
When I remove the port, it works:
$ echo "2001:db8:1::ab9:C0A8:102" | ./anonip.py
2001:db8::
This occurs with Python 3.8 and 3.9. Can this be fixed?
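A minimal sketch of one way a trailing port could be separated before masking, using only Python's standard `ipaddress` module. The helpers `split_trailing_port` and `mask` are hypothetical and not part of Anonip. The mask widths used in the usage note (84 bits for IPv6, 12 for IPv4) are assumptions chosen because they reproduce outputs quoted in these reports.

```python
import ipaddress


def split_trailing_port(token):
    """Split 'address:port' into (address, port); port is None if absent.

    Hypothetical helper for illustration only, not part of Anonip.
    """
    try:
        # A token that already parses as an address is returned unchanged.
        return ipaddress.ip_address(token), None
    except ValueError:
        pass
    head, sep, tail = token.rpartition(":")
    if sep and tail.isdigit():
        try:
            return ipaddress.ip_address(head), tail
        except ValueError:
            pass
    return None, None


def mask(address, masked_bits):
    """Zero the lowest `masked_bits` bits of an address object."""
    prefix = address.max_prefixlen - masked_bits
    network = ipaddress.ip_network(f"{address!s}/{prefix}", strict=False)
    return network.network_address
```

For example, `split_trailing_port("2001:db8:1::ab9:C0A8:102:46824")` yields the address and the port `"46824"`, and `mask(address, 84)` then gives `2001:db8::`. The approach is ambiguous by nature: a token that is itself a valid IPv6 address ending in a numeric group is treated as an address without a port.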
I assume anonip can be used for programs other than Apache. I suggest stating this in the README.
Our column-based approach of specifying the location of an IP address is not flexible enough to cover all use cases.
A good example of such a use case can be seen in this issue. Since it's not possible to configure the log format for error logs in nginx, Anonip can't reliably detect IP addresses.
I propose an alternative regex matching IP detection.
I don't intend to match IP addresses with regexes! But I'd like to provide a way to point Anonip to the locations of IP addresses with a regex.
This alternative approach should be provided alongside the already existing column-based approach.
When using the new --regex argument, the arguments --column and --delimiter will become obsolete.
--replace can still be used for cases where we have matching groups that are not valid IP addresses.
The regexes provided in the examples are simplified and should just illustrate the proposed feature. For production environments you want to have more robust ones.
Let's use the log line from the aforementioned issue:
2020/03/05 19:27:43 [error] 1253#1253: *15347 open() "/usr/share/nginx/html/favicon.ico" failed (2: No such file or directory), client: XXX.XXX.XXX.XXX, server: address.tld, request: "GET /favicon.ico HTTP/1.1", host: "address.tld"
With the new feature in place, we could do:
$ ./anonip.py --regex ".* client\: ([^,]+), .*"
This would then match the provided log line and capture the IP address (XXX.XXX.XXX.XXX) into the first group.
In order to find all IP addresses, Anonip would then iterate over all available matched groups (just one in this example).
Let's say we still want to handle above log line, but additionally we expect lines in the following format:
1970-01-01 - somefixedstring: XXX.XXX.XXX.XXX - exception foo - XXX.XXX.XXX.XXX
Note the two IP addresses.
This can be handled in one single regex:
$ ./anonip.py --regex "(?:.*, client\: ([^,]+), .*|.* - somefixedstring\: ([^,]+) - .* - ([^,]+))"
This opens a box of very verbose and hard-to-read commands needed to run Anonip against certain logs.
But for more advanced users, it would fill the gap which exists now for parsing log files with formats that are not parseable by Anonip.
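The group iteration described in the proposal can be sketched in plain Python as follows. This is an illustration under stated assumptions, not Anonip's implementation: the function name `captured_addresses` is hypothetical, and the placeholder addresses come from documentation ranges because the original lines are redacted.

```python
import re

# The combined pattern from the proposal, written as a raw string.
PATTERN = re.compile(
    r"(?:.*, client\: ([^,]+), .*|.* - somefixedstring\: ([^,]+) - .* - ([^,]+))"
)


def captured_addresses(line):
    """Return the text of every group that took part in the match."""
    match = PATTERN.match(line)
    if match is None:
        return []
    # Groups belonging to the alternative that did not match are None.
    return [group for group in match.groups() if group is not None]


nginx_line = (
    '2020/03/05 19:27:43 [error] 1253#1253: *15347 open() '
    '"/usr/share/nginx/html/favicon.ico" failed (2: No such file or directory), '
    'client: 192.0.2.10, server: address.tld, '
    'request: "GET /favicon.ico HTTP/1.1", host: "address.tld"'
)
other_line = "1970-01-01 - somefixedstring: 192.0.2.10 - exception foo - 198.51.100.7"

print(captured_addresses(nginx_line))
print(captured_addresses(other_line))
```

Each alternative contributes its own groups, so a single pattern can cover lines with one address as well as lines with two.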
Is there also a Usage Example for nginx?
Using anonip with nginx could be made much easier for the admin by leveraging systemd. I already prepared the required information, which includes two systemd unit files. Just to avoid unnecessary work:
The old homepage https://www.privacyfoundation.ch/de/service/anonip.html should prominently point to this repository.
The old homepage is still the first hit when searching the web for "apache log anonip", and it took me some time to figure out that this repo exists and that anonip is still maintained.
When processing a logfile that contains binary parts, the following exception gets thrown:
Traceback (most recent call last):
  File "anonip.py", line 508, in <module>
    main()
  File "anonip.py", line 491, in main
    for line in anonip.run(input_file):
  File "anonip.py", line 161, in run
    line = input_file.readline()
  File "/usr/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbf in position 1040: invalid start byte
While obviously processing purely binary content isn't the target of this project, this issue arose while anonymizing an nginx error.log which contained the following line:
2022/01/09 05:21:49 [info] 58271#58271: *55771 client sent invalid method while reading client request line, client: 192.0.2.0, server: foo.example.org, request: "<binary rubbish>"
Note that there's even an IP address in that line that needs to be anonymized!
So maybe the file shouldn't be read as UTF-8, or as string at all for that matter, but as bytes?
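One alternative to reading bytes, sketched here as an assumption and not as Anonip's behaviour, is to keep decoding as UTF-8 but with the `surrogateescape` error handler. Undecodable bytes are then carried through as placeholder code points and can be written back unchanged. The log line is a shortened, illustrative version of the one above.

```python
import io

# Shortened illustrative line with two bytes that are not valid UTF-8.
RAW = b'2022/01/09 05:21:49 [info] client: 192.0.2.0, request: "\xbf\xfe"\n'


def read_lines(raw_bytes, errors):
    """Decode a byte stream line by line with the given error handler."""
    reader = io.TextIOWrapper(
        io.BytesIO(raw_bytes), encoding="utf-8", errors=errors
    )
    return list(reader)


# Strict decoding fails on the binary part, as in the traceback above.
try:
    read_lines(RAW, "strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# surrogateescape keeps the line readable and reversible.
lines = read_lines(RAW, "surrogateescape")
restored = lines[0].encode("utf-8", errors="surrogateescape")
```

The decoded line still contains the client address as ordinary text, and encoding it with the same handler restores the original bytes exactly.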
I use Apache and have several Python versions installed.
"python" defaults to the "2.7.17" binary.
Using anonip in Apache2 yields:
Traceback (most recent call last):
  File "/pathto/anonip.py", line 422, in <module>
    main()
  File "/pathto/anonip.py", line 406, in main
    print(line, file=output_file)
TypeError: write() argument 1 must be unicode, not str
AH00106: piped log program '/pathto/anonip.py --ipv4mask 8 --ipv6mask 64 --column 2 --output /var/log/apache2/access.log' failed unexpectedly
According to some research, it might be due to a difference in this line: Python 2 would expect a str, where Python 3 would expect a utf8.
This might be fixed by adding a conversion in that case.
In my case I just changed the shebang to python3 and now it works fine for me.
I also tried it with ErrorLog, but it doesn't replace the IP address in lines formatted like this:
[Wed Jun 27 20:37:49.123456 2018] [cgi:error] [pid 1234] [client 222.111.222.111:12345] script not found or unable to stat: /var/www/cgi-bin/test.php5
Would be nice if it could work here too. Thanks.
This might require a MANIFEST.in
For the record: I filed a "request for packaging" for
When using the anonip module from Python applications, a logger instance is created at import time and basicConfig is called immediately. Regrettably, logging.basicConfig modifies global state, and thus will result in side effects for the calling application.
Consider the following very simple Python script:
import logging
logging.basicConfig(level=logging.DEBUG)
logging.debug('This is a debug message')
logging.error('This is an error message')
Output:
DEBUG:root:This is a debug message
ERROR:root:This is an error message
Importing anonip breaks the logging configuration of the script:
import logging
from anonip import Anonip
logging.basicConfig(level=logging.DEBUG)
logging.debug('This is a debug message')
logging.error('This is an error message')
Now the debug line is missing from the output:
ERROR:root:This is an error message
In order to fix this, I propose to move the call to 'basicConfig' into 'main()'.
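The proposed change could look roughly like the following sketch. The module name `anonip_sketch` and the body of `main()` are placeholders for illustration and do not reproduce Anonip's actual code.

```python
import logging

# Creating a named logger at import time is harmless: it does not attach
# handlers or change the root logger's configuration.
logger = logging.getLogger("anonip_sketch")  # hypothetical module name


def main():
    # Configure logging only when the module runs as a command-line tool,
    # so applications that merely import it keep their own configuration.
    logging.basicConfig(level=logging.WARNING)
    logger.warning("running as a command-line tool")


if __name__ == "__main__":
    main()
```

With this layout, the importing script's own `logging.basicConfig(level=logging.DEBUG)` call is the first one to run and therefore takes effect.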
I currently use anonip to change the last byte (I have IPv4 addresses only, for now) in the logs, changing e.g. 123.123.123.123 to 123.123.123.0 (the default behaviour).
Now I'd like to exempt my "own" addresses from log analysis. Let's say the external IP addresses of our company are 123.123.123.122 and 123.123.123.123 (or 123.123.123.122/31). I'd like to leave these addresses intact so that e.g. AWStats can recognize them as "internal" and ignore them (SkipHosts configuration setting).
This is similar to (but much simpler than) the --skip-private option, which allows exempting special-use addresses from masking.
Currently I'd need to tell AWStats to ignore 123.123.123.0 altogether, missing e.g. all requests from 123.123.123.128/25. Or do very complicated things. It shouldn't be necessary to parse the logs for particular IP addresses more than once, right?
I propose a --skip option which could be used independently of --skip-private; it could add to a list or take more than one value.
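A minimal sketch of the skip logic behind such an option, using Python's standard `ipaddress` module. The `--skip` flag is only a proposal here, and the function `anonymize`, its parameters and the `OWN_NETWORKS` list are hypothetical names for illustration. The 8 masked bits match the example in this request.

```python
import ipaddress


def anonymize(address, skip_networks=(), masked_bits=8):
    """Mask the lowest bits of an address unless it lies in a skipped network.

    Hypothetical helper for illustration only, not part of Anonip.
    """
    ip = ipaddress.ip_address(address)
    # Leave addresses from any of the listed networks untouched.
    if any(ip in network for network in skip_networks):
        return str(ip)
    prefix = ip.max_prefixlen - masked_bits
    network = ipaddress.ip_network(f"{ip!s}/{prefix}", strict=False)
    return str(network.network_address)


# The company's "own" addresses from the example above.
OWN_NETWORKS = [ipaddress.ip_network("123.123.123.122/31")]

print(anonymize("123.123.123.123", OWN_NETWORKS))
print(anonymize("123.123.123.200", OWN_NETWORKS))
```

Accepting a list of networks rather than single addresses would cover both variants mentioned in the request: repeating the option or passing several values at once.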
I followed the instructions and configured it properly. I restarted the server, and even rebooted the system, yet it doesn't run. I checked the running processes and realized that it's a zombie process on my system: 2955 0.0 0.0 0 0 ? Z 13:14 0:00 [anonip.py]
Killing the process does not help either. Any ideas?
Hello,
thanks for the detailed answers. Unfortunately, it still doesn't work. :/
So once more, step by step:
$ echo "192.168.0.123" | anonip.py
192.168.0.0
The echo command doesn't work like that on Windows. What does work, however, is:
C:\Apache24\Anonip-main>python anonip.py --input test.log
10.xxx.xxx.0 [11/Jan/2022:09:06:26 +0100] "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36 Edg/96.0.1054.62"
10.xxx.xxx.0 [11/Jan/2022:09:06:26 +0100] "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36 Edg/96.0.1054.62"
The IP address is anonymized here, so the script runs through. However, the anonymized IP addresses are not written to test.log, but only printed to the console.
I did that. It doesn't help.
CustomLog "|/usr/bin/python3.10 Anonip-main/anonip.py --ipv4mask 8 --output logs/test.log" combined
I don't understand what that is supposed to achieve, or rather I don't understand why there is no "/" after python 3.10. In the end, Anonip would then just be located in a different folder. Python is installed globally for all users, since I have admin rights.
So I tried it like this:
CustomLog "C:/Program Files (x86)/Python37-32 Anonip-main/anonip.py --ipv4mask 8 --output logs/test.log" combined
=> that doesn't help either; there isn't even an error message in error.log
CustomLog "|/path/to/virtualenv/bin/python3.10 Anonip-main/anonip.py --ipv4mask 8 --output logs/test.log" combined
I don't know how that is supposed to work; besides, Python is installed globally anyway and the environment variable is set, so it should run like this.
As I said, I tried Python in 32 and 64 bit. The server is a 64-bit server, and Apache is installed as the 64-bit version.
Python version(s): Python37-32, Python37 as 64 bit, Python 3.8.8
OS version: Windows Server 2016 Standard, version 1607
Apache version:
C:\Apache24\bin>httpd -v
Server version: Apache/2.4.41 (Win64)
Apache Lounge VS16 Server built: Aug 9 2019 16:46:32
I don't understand that.
I have some Apache logs that look like this:
[Tue Apr 21 21:10:08.859997 2020] [php7:warn] [pid 16519] [client 89.154.188.26:57424] PHP Warning:
That's in the error.log. The combined log looks fine, but there is no port after the IP address.
Some quick tests with Python 2 look like this:
192.168.2.1
--> 192.168.0.0
192.168.2.1:16852
--> 192.168.2.1:16852
Consider the following slightly modified access log line from the first regex test:
3.3.3.3 - - [20/May/2015:21:05:01 +0000] "GET /723.3.3.357 HTTP/1.1" 200 13358 "-" "useragent"
The requested URI could be an OID from an SNMP MIB or something like that.
The current implementation would replace the 3.3.3.3 in that URI, even though it has nothing to do with the client IP address and isn't, in fact, an IP address at all:
3.3.0.0 - - [20/May/2015:21:05:01 +0000] "GET /723.3.0.057 HTTP/1.1" 200 13358 "-" "useragent"
Note that this depends on the real client address being contained in the URI. The following line
2.2.2.2 - - [20/May/2015:21:05:01 +0000] "GET /723.3.3.357 HTTP/1.1" 200 13358 "-" "useragent"
doesn't modify the URI:
2.2.0.0 - - [20/May/2015:21:05:01 +0000] "GET /723.3.3.357 HTTP/1.1" 200 13358 "-" "useragent"
Hence this could even reveal the real client IP address if only 723.3.3.357 makes sense in that place, and 723.3.0.057 doesn't.
I'll open a PR in a minute that - among other things - addresses this problem by only replacing the groups where they actually matched and modifies the first test case to highlight this problem.
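The difference between replacing a matched value everywhere and replacing only at the position where the group matched can be sketched as follows. This is an illustration, not the code from the PR: the function names, the pattern `(\S+) ` and the 16-bit mask are assumptions chosen to reproduce the example lines above.

```python
import ipaddress
import re

LINE = ('3.3.3.3 - - [20/May/2015:21:05:01 +0000] '
        '"GET /723.3.3.357 HTTP/1.1" 200 13358 "-" "useragent"')
PATTERN = r"(\S+) "  # first whitespace-delimited field, illustrative only


def mask_ipv4(text, masked_bits=16):
    """Zero the lowest `masked_bits` bits of an IPv4 address string."""
    network = ipaddress.ip_network(f"{text}/{32 - masked_bits}", strict=False)
    return str(network.network_address)


def replace_everywhere(line, pattern):
    """Problematic variant: substitutes the matched text wherever it occurs."""
    match = re.match(pattern, line)
    if match is None:
        return line
    for value in match.groups():
        if value is not None:
            line = line.replace(value, mask_ipv4(value))
    return line


def replace_by_span(line, pattern):
    """Safer variant: rewrites only the character ranges the groups matched."""
    match = re.match(pattern, line)
    if match is None:
        return line
    spans = sorted(
        match.span(index)
        for index in range(1, match.re.groups + 1)
        if match.group(index) is not None
    )
    pieces, cursor = [], 0
    for start, end in spans:
        pieces.append(line[cursor:start])
        pieces.append(mask_ipv4(line[start:end]))
        cursor = end
    pieces.append(line[cursor:])
    return "".join(pieces)
```

Applied to `LINE`, the first function also rewrites the URI to `/723.3.0.057`, while the second leaves `/723.3.3.357` untouched and changes only the leading client address.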
echo "test.de 2001:0db8:85a3:0000:0000:8a2e:0370:7334 - - [21/Nov/2019:11:07:56 +0000] - - - -" | /usr/local/sbin/anonip.py --column 2
WARNING:__main__:u'2001' does not appear to be an IPv4 or IPv6 network
test.de 2001:0db8:85a3:0000:0000:8a2e:0370:7334 - - [21/Nov/2019:11:07:56 +0000] - - - -
See #33 for a possible fix.