kassner / log-parser
PHP Web Server Log Parser Library
License: Apache License 2.0
I want to build a web application like PIMP and use your Git repository to parse my log files. Please guide me on how to use it, or point me to a manual or instructions.
This nice little library can also be used for parsing other web servers' log files!
I would suggest renaming it to make it more obvious that it is not only usable for Apache.
Hello,
I want to read access.log as a pipeline and, when a requested file matches some format, do some processing.
Is that possible?
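One way to approach this, sketched here without the library, is to read lines from standard input as they arrive (for example `tail -F access.log | php watch.php`) and act only on requests whose path matches a pattern. The helper name `watchStream`, the inline request regex and the `.zip` pattern are hypothetical; with the library you could call `$parser->parse($line)` on each line instead of the inline regex.

```php
<?php
// Hypothetical helper: read log lines from any stream (STDIN in a pipeline)
// and call $onMatch for each request whose path matches $pathPattern.
function watchStream($stream, string $pathPattern, callable $onMatch): int
{
    $matched = 0;
    while (($line = fgets($stream)) !== false) {
        $line = rtrim($line, "\r\n");
        // Pull the path out of the quoted request, e.g. "GET /file.zip HTTP/1.1".
        if (!preg_match('#"[A-Z]+ (?P<path>\S+) HTTP/[\d.]+"#', $line, $m)) {
            continue; // not a normal request line; ignore it
        }
        if (preg_match($pathPattern, $m['path'])) {
            $onMatch($m['path'], $line);
            $matched++;
        }
    }
    return $matched;
}

// Usage in a pipeline: tail -F access.log | php watch.php
// watchStream(STDIN, '#\.zip$#', function (string $path, string $line) {
//     echo "download: $path\n";
// });
```

Because the loop reads one line at a time, it keeps running for as long as the upstream command (here `tail -F`) keeps the pipe open.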
When using this package under PHP 8.2, the following warning is emitted: PHP Deprecated: Creation of dynamic property is deprecated.
I am trying to parse custom nginx access logs and keep getting a Kassner\LogParser\FormatException. This exception is not terribly verbose and I can't figure out what I did wrong. Can someone explain why this isn't working?
//nginx format (as written in nginx.conf)
//'$remote_addr - $remote_user [$time_local] ($host) "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"'
$parser = new \Kassner\LogParser\LogParser();
$parser->addPattern('%NGXREMOTEADDR', '(?P<remoteaddr>.+)');
$parser->addPattern('%NGXDT', '(?P<datetime>[\d+/ :]+)');
$parser->addPattern('%NGXHST', '(?P<host>.+)');
$parser->addPattern('%NGXREQ', '(?P<request>.+)');
$parser->addPattern('%NGXSTATUS', '(?P<status>.+)');
$parser->addPattern('%NGXBYTES', '(?P<response_bytes>.+)');
$parser->addPattern('%NGXREF', '(?P<referer>.+)');
$parser->addPattern('%NGXAGENT', '(?P<user_agent>.+)');
$parser->setFormat('%NGXREMOTEADDR - %u [%NGXDT] (%NGXHST) "%NGXREQ" %NGXSTATUS %NGXBYTES "%NGXREF" "%NGXAGENT"');
At first I tried to do it the "easy way", like so, but also got a format exception:
$parser->setFormat('%h - %u [%t] (%h) "%r" %>s %b "%{Referer}i" \\"%{User-Agent}i"');
Having looked into the code a bit, it seems that I might need to escape some literals. I'm trying something like the following, with the same result:
$parser->setFormat('%h \- %u \[%t\] \(%v\) \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"');
Going back to the README, I noticed the "default nginx" example, which led me to the following, but it still throws the exception:
$parser->setFormat('%h - %u %t (%v) "%r" %>s %b "%{Referer}i" \"%{User-Agent}i"');
This turned out to be caused by inconsistencies in our log files and was in no way the fault of your code.
I'm working on some log analyzing and came across some uncool stuff on my live server which reads as follows:
111.222.333.444 - - [25/Jun/2017:11:51:35 +0200] "\x16\x03\x01" 400 0 "-" "-"
Is there a possibility to skip those records and keep on parsing the rest?
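Since `parse()` throws on a line it cannot match, one option is to catch the exception per line and move on. The sketch below is a generic wrapper (the name `parseSkippingBadLines` is hypothetical) that takes any throwing parse callable; it assumes `FormatException` extends PHP's base `\Exception`, so catch it by name instead if you prefer to be explicit.

```php
<?php
// Parse every line with $parse, skipping (and recording) lines it rejects.
// $parse is any callable that throws on bad input; with kassner/log-parser
// it could be [$parser, 'parse'], which throws FormatException.
function parseSkippingBadLines(iterable $lines, callable $parse, array &$skipped = []): array
{
    $entries = [];
    foreach ($lines as $number => $line) {
        try {
            $entries[] = $parse($line);
        } catch (\Exception $e) {
            // Remember which line failed and why, then keep going.
            $skipped[$number] = $e->getMessage();
        }
    }
    return $entries;
}
```

With the library this might be called as `parseSkippingBadLines(file('/var/log/nginx/access.log', FILE_IGNORE_NEW_LINES), [$parser, 'parse'], $skipped)`, leaving the rejected lines (such as the `"\x16\x03\x01"` handshake above) in `$skipped` for later inspection.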
Hello,
I am getting a Kassner\LogParser\FormatException when a line contains a domain user name:
::1 - DOMAIN\\user_name [03/Feb/2023:18:50:42 +0100] "GET /app/index.php HTTP/1.1" 200 229931
Do you have any idea how to fix it please?
Thank you
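The line presumably fails because the pattern behind `%u` does not allow a backslash. The sketch below checks a looser user group (`-` or any run of non-space characters) against this line using plain `preg_match`; the simplified regex is my own stand-in for the common log format, not the library's compiled pattern. With the library, the equivalent would be registering the group via `$parser->addPattern('%u', '(?P<user>(?:-|\S+))')`, ideally before calling `setFormat()`.

```php
<?php
// A looser user pattern: "-" or any run of non-space characters, so values like
// DOMAIN\\user_name (Apache writes a literal backslash as "\\") are accepted.
const USER_PATTERN = '(?P<user>(?:-|\S+))';

// Simplified stand-in for "%h %l %u %t \"%r\" %>s %b" (not the library's regex).
function matchCommonLog(string $line): ?array
{
    $regex = '#^(?P<host>\S+) (?P<logname>\S+) ' . USER_PATTERN
        . ' \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)$#';
    return preg_match($regex, $line, $m) ? $m : null;
}
```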
Is it possible to keep a file pointer, so that when this runs from crond, each subsequent cycle doesn't read the entire log file again? Thanks.
How do I set the format for nginx's error.log?
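Since the library parses one line at a time, resuming can be handled outside it by storing the byte offset reached on the previous run. The function name `readNewLines` and the state-file approach are assumptions for illustration; each returned line could then be passed to `$parser->parse()`.

```php
<?php
// Read only the lines appended since the previous run, remembering the byte
// offset in a small state file (both paths are chosen by the caller).
function readNewLines(string $logPath, string $statePath): array
{
    $offset = is_file($statePath) ? (int) file_get_contents($statePath) : 0;
    clearstatcache(true, $logPath);
    // If the log shrank (rotated or truncated), start again from the top.
    if ($offset > filesize($logPath)) {
        $offset = 0;
    }
    $fh = fopen($logPath, 'rb');
    if ($fh === false) {
        return [];
    }
    fseek($fh, $offset);
    $lines = [];
    while (($line = fgets($fh)) !== false) {
        if (substr($line, -1) !== "\n") {
            break; // partial last line still being written; pick it up next run
        }
        $lines[] = rtrim($line, "\r\n");
        $offset = ftell($fh);
    }
    fclose($fh);
    file_put_contents($statePath, (string) $offset);
    return $lines;
}
```

The rotation check is deliberately simple: it only notices a new file that is smaller than the saved offset, so a rotated log that has already grown past it would be read from the old offset.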
Hi,
With the parser set to the common log format, I am getting a FormatException:
$line='31.32.246.3 - - [18/May/2019:09:26:02 -0400] "GET / HTTP/1.1" 200 3380 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0"';
$parser = new \Kassner\LogParser\LogParser();
$parser->setFormat('"%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"');
$entry = $parser->parse($line);
I get the same exception without changing the log format. From what I see the formats match, any clue why this could happen ?
Thanks
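One thing stands out in the format string above: it begins with a stray `"`, apparently carried over from the quotes around Apache's LogFormat directive. Assuming the library anchors the compiled format at the start and end of the line, that quote would have to appear before the IP address. The sketch uses simplified stand-ins for the placeholders (not the library's exact patterns) to show the effect of that one character.

```php
<?php
$line = '31.32.246.3 - - [18/May/2019:09:26:02 -0400] "GET / HTTP/1.1" 200 3380 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0"';

// Simplified stand-ins for %h %l %u %t "%r" %>s %O "%{Referer}i" "%{User-Agent}i".
$body = '(?P<host>\S+) (?P<logname>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    . '"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    . '"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"';

// With the stray leading quote, the anchored pattern demands a '"' before the IP.
$withQuote = '#^"' . $body . '$#';
$without   = '#^' . $body . '$#';

var_dump(preg_match($withQuote, $line)); // int(0)
var_dump(preg_match($without, $line));   // int(1)
```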
Hi,
I am trying to read an apache log in the following format:
127.0.0.1 - - [11/Jan/2016:18:30:16 +1300] "GET / HTTP/1.1" 403 505 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/47.0.2526.73 Chrome/47.0.2526.73 Safari/537.36"
I am using the following format:
$parser->setFormat('%h - - %t "%r" %s %b "-" "%{User-Agent}i"');
I have also tried:
$parser->setFormat('%h %l %u %t "%r" %s %b "%{Referer}i" \"%{User-Agent}i"');
but each time I get the format exception. Can you help me with this? I have also tried not setting a format (so the default is used), but no luck there either.
This is related to issues #58 and #50.
As described in #58, I'm getting some malformed HTTP requests in my nginx server access log, like these:
162.243.128.19 - - [18/Feb/2023:06:38:30 -0500] "MGLNDD_70.110.25.35_80" 400 150 "-" "-"
159.65.204.184 - - [19/Feb/2023:02:54:04 -0500] "\x16\x03\x01\x00{\x01\x00\x00w\x03\x03\x03VJT\xE3REk\xFE\x89\x5C\xCE\xFF\xBBh\xAF\xA5}@t6\x9D\xBA\xAA3\x22rWR\xAC\xB8\x90\x00\x00\x1A\xC0/\xC0+\xC0\x11\xC0\x07\xC0\x13\xC0\x09\xC0\x14\xC0" 400 150 "-" "-"
192.241.225.22 - - [19/Feb/2023:05:16:37 -0500] "SSH-2.0-Go" 400 150 "-" "-"
I've modified my code using the fix suggested in #58 and described in #50:
$laxParser = new \Kassner\LogParser\LogParser();
$laxParser->setFormat('%h %l %u %t "%r" %>s %O "%{Referer}i" \"%{User-Agent}i"');
$laxParser->addPattern('%r', '(?P<request>.+)');
...
$entry = $laxParser->parse($line);
However, the lines above still cause a FormatException to be thrown. Did I miss something, or is another pattern failing to match here? If the addPattern call worked, %r should match one or more of any character in the request, right?
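A possible culprit is call order. In the library source I have looked at, `setFormat()` appears to translate the placeholders into a regular expression as soon as it is called, so a pattern registered afterwards with `addPattern()` may not be used; moving the `addPattern('%r', ...)` call above `setFormat()` is worth trying. Independently of the library, the hand-written stand-in below confirms that a permissive request group does accept lines like these (it is not the library's compiled pattern).

```php
<?php
// Hand-written stand-in for the combined format with a permissive request group:
// anything (including nothing) between the request quotes is accepted.
$lenient = '#^(?P<host>\S+) (?P<logname>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    . '"(?P<request>.*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    . '"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$#';

$lines = [
    '162.243.128.19 - - [18/Feb/2023:06:38:30 -0500] "MGLNDD_70.110.25.35_80" 400 150 "-" "-"',
    '192.241.225.22 - - [19/Feb/2023:05:16:37 -0500] "SSH-2.0-Go" 400 150 "-" "-"',
    '111.222.333.444 - - [25/Jun/2017:11:51:35 +0200] "\x16\x03\x01" 400 0 "-" "-"',
];

foreach ($lines as $line) {
    if (preg_match($lenient, $line, $m)) {
        echo $m['status'], ' ', $m['request'], "\n";
    }
}
```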
log-parser can't handle Apache log entries where the third field is "" rather than a dash. I see these log entries every so often.
88.99.141.3 - "" [18/Oct/2017:16:56:46 -0400] "GET / HTTP/1.1" ...
Trace:
PHP Fatal error: Uncaught Kassner\LogParser\FormatException: 88.99.141.3 - "" [02/Jul/2017:21:35:06 -0400] "GET / HTTP/1.1" 200 16153 "http://example.com/" "Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0"
in /whatever/Database/vendor/kassner/log-parser/src/LogParser.php:82
Stack trace:
#0 /whatever/Database/import-log.php(21): Kassner\LogParser\LogParser->parse('88.99.141.3 - "...')
#1 /whatever/Database/import-log.php(40): parseFile(Object(Kassner\LogParser\LogParser), 'logs/./logs09/a...')
#2 {main}
thrown in /whatever/Database/vendor/kassner/log-parser/src/LogParser.php on line 82
Hi there,
Great job on the lib, it works perfectly! However, would you mind tagging a stable version? That would enable Composer caching.
Thanks !
Ran this:
<?php
$loader = require __DIR__ . '/vendor/autoload.php';
$parser = new \Kassner\LogParser\LogParser();
$parser->setFormat('%h %l %u %t "%r" %>s %O "%{Referer}i" \"%{User-Agent}i"');
$lines = file('/var/log/nginx/access.log', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
foreach ($lines as $line) {
    $entry = $parser->parse($line);
}
?>
and got this:
#0 /home/xxx/analiticas/analytics.php(7): Kassner\LogParser\LogParser->parse('xxx.xxx.xxx.xxx -...')
#1 {main}
thrown in /home/v/xxx/vendor/kassner/log-parser/src/LogParser.php on line 82
[29-Sep-2020 19:12:06 America/New_York] PHP Fatal error: Uncaught Kassner\LogParser\FormatException: xxx.xxx.xxx.xxx - - [29/Sep/2020:04:37:17 -0400] "\x03\x00\x00/*\xE0\x00\x00\x00\x00\x00Cookie: mstshash=Administr" 400 157 "-" "-" "-" in /home/xxx/axxx/vendor/kassner/log-parser/src/LogParser.php:82
Stack trace:
#0 /home/xxx/analiticas/analytics.php(7): Kassner\LogParser\LogParser->parse('xxx.xxx.xxx.xxx -...')
#1 {main}
thrown in /home/xxx/analiticas/vendor/kassner/log-parser/src/LogParser.php on line 82
Is there a way to use this to parse syslog? I get this when attempting to run it against syslog:
Exception: Apr 2 00:06:46 chris-x1 kernel: [347472.787552] mce: CPU6: Package temperature above threshold, cpu clock throttled (total events = 393711)
In [/var/www/personal/easyupp/vendor/kassner/log-parser/src/LogParser.php, line 82]
2020-04-02 22:40:00 Error: [Kassner\LogParser\FormatException] Apr 2 00:06:46 chris-x1 kernel: [347472.787552] mce: CPU6: Package temperature above threshold, cpu clock throttled (total events = 393711) in /var/www/personal/easyupp/vendor/kassner/log-parser/src/LogParser.php on line 82
Stack Trace:
- /var/www/personal/easyupp/src/Command/ReadSysLogCommand.php:21
- /var/www/personal/easyupp/vendor/cakephp/cakephp/src/Console/BaseCommand.php:175
- /var/www/personal/easyupp/vendor/cakephp/cakephp/src/Console/CommandRunner.php:336
- /var/www/personal/easyupp/vendor/cakephp/cakephp/src/Console/CommandRunner.php:171
- /var/www/personal/easyupp/bin/cake.php:12
This real-life Apache log entry caused log-parser to crash on parse():
62.210.177.12 - - [05/Apr/2020:06:58:22 -0400] "post /wp-json/trx_addons/v2/get/sc_layout?sc=wp_insert_user&role=administrator&user_login=ndvtzaifnz&user_pass=6Wlh6SA0RT HTTP/1.1" 301 800 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36"
When post is changed to POST, log-parser works fine.
(I'm guessing that this log entry was produced by a WordPress hacking attempt.)
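The `%r` pattern quoted further down in this list only accepts upper-case methods (`[A-Z]+`), which would explain why `post` fails and `POST` works. The sketch compares that pattern with a variant that also tolerates lower-case methods; the variant is my own modification, and with the library it could presumably be registered through `addPattern('%r', ...)`.

```php
<?php
// The request pattern quoted in this list, and a variant allowing lower-case methods.
$strictRequest = '(?P<request>(?:(?:[A-Z]+) .+? HTTP/(1|2).(?:0|1))|-|)';
$caseTolerant  = '(?P<request>(?:(?:[A-Za-z]+) .+? HTTP/(1|2).(?:0|1))|-|)';

$request = 'post /wp-json/trx_addons/v2/get/sc_layout?sc=wp_insert_user&role=administrator&user_login=ndvtzaifnz&user_pass=6Wlh6SA0RT HTTP/1.1';

// Anchor each pattern to the whole quoted request string.
var_dump(preg_match('#^' . $strictRequest . '$#', $request)); // int(0)
var_dump(preg_match('#^' . $caseTolerant . '$#', $request));  // int(1)
```

Whether to accept such requests at all is a policy choice: a lower-case method is not a valid HTTP request, but the log line itself is well formed.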
Can this parser parse apache error logs?
Error logs use a slightly different format for recording the time, for example:
[Fri Dec 16 01:46:23 2005] [error] [client 1.2.3.4] Directory index forbidden by rule: /home/test/
whereas access logs use the format I'd expect: [13/Sep/2006:07:01:53 -0700]
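The access-log placeholders don't describe this layout, so a separate hand-written expression is one option. The sketch below targets Apache 2.2-style lines like the example above; the function name and field names are hypothetical. Apache 2.4's default error log format adds module names, process ids and microseconds (e.g. `[core:error] [pid 35708:tid ...]`), so the regex would need adjusting for it.

```php
<?php
// Sketch for Apache 2.2-style error log lines:
// [Fri Dec 16 01:46:23 2005] [error] [client 1.2.3.4] message
function parseApacheErrorLine(string $line): ?array
{
    $regex = '#^\[(?P<time>[^\]]+)\] \[(?P<level>\w+)\](?: \[client (?P<client>[^\]]+)\])? (?P<message>.*)$#';
    if (!preg_match($regex, $line, $m)) {
        return null;
    }
    // The timestamp layout differs from access logs, so convert it separately.
    $time = DateTimeImmutable::createFromFormat('D M j H:i:s Y', $m['time']);
    return [
        'time'    => $time ?: null,
        'level'   => $m['level'],
        'client'  => ($m['client'] ?? '') !== '' ? $m['client'] : null,
        'message' => $m['message'],
    ];
}
```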
Hello, I wonder if someone could help me figure this out:
When a file is downloaded via a PHP script, Apache doesn't log the response bytes ("responseBytes"); it shows "-" instead.
How can I make Apache log it properly?
I tried to parse an Apache log in the 'combined' format, which Debian and other distros use by default.
Apache config:
LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined
Log-Parser:
$parser->setFormat('"%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\""');
$parser->setFormat('%h %l %u %t "%r" %>s %O "%{Referer}i" \\"%{User-Agent}i"');
Couldn't get it to work with the formats used above. What am I doing wrong?
I've set my parser to use the default Nginx format as described on the project README:
$parser->setFormat('%h %l %u %t "%r" %>s %O "%{Referer}i" "%{User-Agent}i"');
A line in my nginx log is causing a format exception to be thrown:
PHP Fatal error: Uncaught Kassner\LogParser\FormatException: 162.243.142.22 - - [17/Feb/2023:02:50:10 -0500] "MGLNDD_1.2.3.4_80" 400 150 "-" "-" in /test/vendor/kassner/log-parser/src/LogParser.php:83
Stack trace:
#0 /test/test.php(12): Kassner\LogParser\LogParser->parse()
#1 {main}
thrown in /test/vendor/kassner/log-parser/src/LogParser.php on line 83
I don't see how this line fails to match the specified format. Am I missing something?
I have an API whose requests I need to track in the access log. Is there a way to parse only the requests that come to my API? Thanks
Fatal error: Uncaught Kassner\LogParser\FormatException: 78.129.237.156 - - [10/Jun/2020:00:01:31 +0200] "GET /?action=display&bridge=FB2&u=complexdoc&format=Mrss HTTP/1.1" 200 81701 "-" "RSSMix/3.0" "78.129.237.156" in /var/www/html/admin/vendor/kassner/log-parser/src/LogParser.php:82 Stack trace: #0 /var/www/html/admin/access.log.php(16): Kassner\LogParser\LogParser->parse('78.129.237.156 ...') #1 {main} thrown in /var/www/html/admin/vendor/kassner/log-parser/src/LogParser.php on line 82
nginx.conf:
log_format main '$remote_addr - $remote_user [$time_local] "$host" "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"';
PHP file:
//'$remote_addr - $remote_user [$time_local] "$host" "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"';
$parser->setFormat('%a - %u [%t] "%v" "%r" %>s %O "%{Referer}i" \"%{User-Agent}i" "{X-Forwarded-For}i"');
access.log:
172.68.189.227 - - [10/Jun/2020:17:46:46 +0200] "minopia.de" "POST / HTTP/1.1" 405 575 "-" "Mozlila/5.0 (Linux; Android 7.0; SM-G892A Bulid/NRD90M; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/60.0.3112.107 Moblie Safari/537.36" "13.91.103.115"
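Two details in the format string above look suspicious, although I have not run it against the library: the last placeholder is missing its leading `%`, so `"{X-Forwarded-For}i"` would be treated as literal text, and the README's default nginx format writes `%t` without surrounding brackets, which suggests the brackets are already part of that placeholder's pattern. As a cross-check independent of the library, this hand-written expression for the `log_format` above matches the sample line (field names are my own).

```php
<?php
// Hand-written regex for:
// '$remote_addr - $remote_user [$time_local] "$host" "$request" $status
//  $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"'
$regex = '#^(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time>[^\]]+)\] '
    . '"(?P<host>[^"]*)" "(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+) '
    . '"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" "(?P<forwarded_for>[^"]*)"$#';

$line = '172.68.189.227 - - [10/Jun/2020:17:46:46 +0200] "minopia.de" "POST / HTTP/1.1" 405 575 "-" "Mozlila/5.0 (Linux; Android 7.0; SM-G892A Bulid/NRD90M; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/60.0.3112.107 Moblie Safari/537.36" "13.91.103.115"';

if (preg_match($regex, $line, $m)) {
    echo $m['host'], ' ', $m['forwarded_for'], "\n";
}
```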
A log file contains some garbage columns that I don't need to work with. How do I skip those columns?
The parser needs to handle bad requests like this one:
// Format: %h %l %u %t \"%r\" %>s %O "%{Referer}i" "%{User-Agent}i"
54.232.125.255 - - [07/Oct/2013:04:14:01 +0000] "" 400 0 "-" "-"
Although it is an invalid request, it is a valid log entry, and should be parsed correctly.
The user field in my log contains dots, because the users are websites accessing us via an API, and their logins are domain names.
Nginx is fine with this, so any log parsing software should support it.
For now I have changed the user field from %u
to (?P<user>(?:-|[\w-\.]+))
in my format string, to avoid creating a fork.
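A small caveat on the pattern above: in `[\w-\.]` the hyphen sits between `\w` and `\.`, which PCRE2 (used by PHP 7.3 and later) may reject as an invalid range. Placing the hyphen last keeps it literal, and a dot needs no escaping inside a character class. The sketch checks that variant; with the library it could presumably also be registered as `$parser->addPattern('%u', '(?P<user>(?:-|[\w.-]+))')` before `setFormat()`, instead of editing the format string.

```php
<?php
// Accept "-" or a login made of word characters, dots and hyphens.
// The hyphen is last in the class so it is a literal, not a range.
$userPattern = '(?P<user>(?:-|[\w.-]+))';

foreach (['-', 'api.example.com', 'my-site.org'] as $user) {
    var_dump(preg_match('#^' . $userPattern . '$#', $user)); // int(1) each time
}
```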
Use Kassner\ApacheLogParser
e.g.
2016/05/04 18:14:15 [error] 3075#0: *11552 open() "/var/www/bak/mantoucc/favicon.ico" failed (2: No such file or directory), client: 115.175.32.29, server: mantoucc.com, request: "GET /favicon.ico HTTP/1.1", host: "mantoucc.com", referrer: "http://mantoucc.com/"
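The access-log placeholders don't describe nginx's error log layout. One way to handle a line like the one above is a hand-written expression for the fixed header (time, level, pid#tid, optional *connection id) plus a pass over the `, key: value` pairs nginx appends. Function and field names are my own; messages without that tail simply yield fewer extras.

```php
<?php
// Sketch: split an nginx error log line into header fields and the
// ", key: value" pairs nginx appends (client, server, request, host, ...).
function parseNginxError(string $line): ?array
{
    $head = '~^(?P<time>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[(?P<level>\w+)\] '
        . '(?P<pid>\d+)#(?P<tid>\d+): (?:\*(?P<cid>\d+) )?(?P<message>.*)$~';
    if (!preg_match($head, $line, $m)) {
        return null;
    }
    $extra = [];
    // Values are either double-quoted or run up to the next comma.
    preg_match_all('~, (\w+): ("[^"]*"|[^,]*)~', $m['message'], $pairs, PREG_SET_ORDER);
    foreach ($pairs as [, $key, $value]) {
        $extra[$key] = trim($value, '"');
    }
    return [
        'time'    => $m['time'],
        'level'   => $m['level'],
        'pid'     => (int) $m['pid'],
        'message' => $m['message'],
        'extra'   => $extra,
    ];
}
```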
'%r' => '(?P<request>(?:(?:[A-Z]+) .+? HTTP/(1|2).(?:0|1))|-|)',
The parser dies if either the referrer or agent in a log are blank (if they consist only of a pair of double quotes). There are not many cases where this happens (I had about six in a 3 million line test log file), but it does halt further processing.
As a workaround (which is probably slow) I change the blank referrer or agent before processing the line:
$line = str_replace('" ""','" " "',$line);
$line = str_replace('"" "','"-" "',$line);
Also, as you probably know (from the outstanding IPv6 issue), if there are any IPv6 addresses the parser will also fail. This includes even the localhost IPv6 address, in lines like:
www.example.com:80 ::1 - - [27/Oct/2013:06:27:33 +0000] "OPTIONS * HTTP/1.0" 200 126 "-" "Apache/2.2.22 (Ubuntu) (internal dummy connection)"
A workaround for this is to search for the "::1 - -" and skip the line if it is present.
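The blank referrer/agent workaround above can also be done in a single regular-expression pass. This sketch (function name hypothetical) rewrites any empty quoted field `""` that stands alone, i.e. is preceded by a space and followed by a space or the end of the line, to `"-"`. It would also turn an empty request `""` into `"-"`, which the `%r` pattern quoted earlier in this list accepts; unusual lines with embedded quotes could still be affected, so treat it as a pre-processing heuristic.

```php
<?php
// Replace any stand-alone empty quoted field ("") with "-" before parsing.
function normaliseEmptyFields(string $line): string
{
    return preg_replace('/(?<= )""(?= |$)/', '"-"', $line);
}
```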
Hello there -- first off, thank you for building this and saving us all the trouble of building our own regular expressions to parse Apache's log files.
When I tried using this package on my real-world Apache logs, it mostly worked. However, there were a number of lines it failed to parse, and it threw an exception in my program. Here's one example.
My log format looks like this:
$parser->setFormat('%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"');
Here's one line that failed to parse:
199.195.254.38 - - [27/Sep/2020:19:27:26 +0000] "GET ../../proc/ HTTP" 400 506 "-" "-"
and here are a few others:
240e:d9:d800:200::d4 - - [29/Sep/2020:19:52:18 +0000] "\x16\x03\x01" 501 290 "-" "-"
172.105.43.21 - - [30/Sep/2020:01:05:53 +0000] "\x16\x03\x01" 501 290 "-" "-"
Is there a way to configure this library to be less strict when trying to parse these log lines?
If not, do you have any time/interest in enhancing the functionality of this library so it can handle cases like these?
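One way to be less strict without giving up structure for well-formed lines is an ordered fallback: try a strict expression first, then a lenient one, and record which one matched. The function name and both expressions below are illustrative stand-ins, not the library's patterns; the same idea could be applied with two `LogParser` instances configured with different formats.

```php
<?php
// Try expressions from strictest to most lenient and report which one matched;
// returns null when none do. The expressions are illustrative only.
function parseWithFallback(string $line, array $patterns): ?array
{
    foreach ($patterns as $name => $regex) {
        if (preg_match($regex, $line, $m)) {
            return ['matched' => $name, 'fields' => $m];
        }
    }
    return null;
}

$patterns = [
    // Well-formed requests: METHOD path HTTP/x.y
    'strict'  => '#^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[A-Z]+ \S+ HTTP/\d(?:\.\d)?)" (?P<status>\d{3}) (?P<bytes>\d+|-) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$#',
    // Anything inside the request quotes, e.g. TLS handshakes or "GET ../../proc/ HTTP"
    'lenient' => '#^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>.*)" (?P<status>\d{3}) (?P<bytes>\d+|-) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$#',
];
```

Entries that only matched the lenient expression can then be counted or reported separately instead of stopping the run.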