log-parser's People

Contributors

aaronweeden, drzippie, fiasco, gemorroj, jpwhite4, kassner, kbsali, mrsecure, onnimonni, sophist-uk, tarjei, timandes

log-parser's Issues

Read as pipeline

Hello,

I want to read access.log as a pipeline: when a request line matches a certain format, I need to do some processing on it.
Is that possible?
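
One way to approach this, independent of the library, is to read the stream lazily and filter lines before handing them to the parser. The sketch below is only an illustration: `readLines()` and `matchingLines()` are hypothetical helper names, not part of log-parser, and the regex passed in is whatever "format" the caller cares about.

```php
<?php
// Hypothetical helper: yields one non-empty line at a time from any readable
// stream (STDIN, a pipe, an opened file) without loading the whole log.
function readLines($stream): \Generator
{
    while (($line = fgets($stream)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line !== '') {
            yield $line;
        }
    }
}

// Hypothetical filter: passes on only the lines that match $regex.
function matchingLines(iterable $lines, string $regex): \Generator
{
    foreach ($lines as $line) {
        if (preg_match($regex, $line) === 1) {
            yield $line;
        }
    }
}
```

With something like `tail -f access.log | php script.php`, the script could loop over `matchingLines(readLines(STDIN), '#"GET /api/#')` and call `$parser->parse($line)` on each line that gets through.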

custom nginx access log format

I am trying to parse custom nginx access logs and keep getting a Kassner\LogParser\FormatException. This exception is not terribly verbose and I can't figure out what I did wrong. Can someone explain why this isn't working?

//nginx format (as written in nginx.conf)
//'$remote_addr - $remote_user [$time_local] ($host) "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"'

$parser = new \Kassner\LogParser\LogParser();

$parser->addPattern('%NGXREMOTEADDR', '(?P<remoteaddr>.+)');
$parser->addPattern('%NGXDT', '(?P<datetime>[\d+/ :]+)');
$parser->addPattern('%NGXHST', '(?P<host>.+)');
$parser->addPattern('%NGXREQ', '(?P<request>.+)');
$parser->addPattern('%NGXSTATUS', '(?P<status>.+)');
$parser->addPattern('%NGXBYTES', '(?P<response_bytes>.+)');
$parser->addPattern('%NGXREF', '(?P<referer>.+)');
$parser->addPattern('%NGXAGENT', '(?P<user_agent>.+)');

$parser->setFormat('%NGXREMOTEADDR - %u [%NGXDT] (%NGXHST) "%NGXREQ" %NGXSTATUS %NGXBYTES "%NGXREF" "%NGXAGENT"');

At first I tried to do it the "easy way" like so, but that also threw a format exception:

$parser->setFormat('%h - %u [%t] (%h) "%r" %>s %b "%{Referer}i" \\"%{User-Agent}i"');

Having looked into the code a bit, it seems that I might need to escape some literals. I tried something like the following, with the same result:

$parser->setFormat('%h \- %u \[%t\] \(%v\) \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"');

Going back to the readme, I noticed the "default nginx" example, which led me to the following, which still throws the exception:

$parser->setFormat('%h - %u %t (%v) "%r" %>s %b "%{Referer}i" \"%{User-Agent}i"');

This turned out to be inconsistencies in our log files and was in no way the fault of your code.

possibility to skip records

I'm working on some log analyzing and came across some uncool stuff on my live server which reads as follows:

111.222.333.444 - - [25/Jun/2017:11:51:35 +0200] "\x16\x03\x01" 400 0 "-" "-"

Is there a possibility to skip those records and keep on parsing the rest?
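
A minimal sketch of skipping such records, assuming the failure surfaces as an exception thrown per line (as the `FormatException` traces elsewhere on this page suggest). `parseSkippingBadLines()` is a hypothetical helper; the parser is passed in as a callable so the sketch does not depend on the library being installed.

```php
<?php
// Hypothetical helper: parses every line with $parseLine and collects the
// lines that could not be parsed instead of aborting on the first failure.
function parseSkippingBadLines(callable $parseLine, iterable $lines): array
{
    $entries = [];
    $skipped = [];
    foreach ($lines as $line) {
        try {
            $entries[] = $parseLine($line);
        } catch (\Exception $e) {
            // Assumption: in real use, narrow this catch to
            // \Kassner\LogParser\FormatException.
            $skipped[] = $line;
        }
    }
    // First element: parsed entries; second element: raw lines that failed.
    return [$entries, $skipped];
}
```

In real use the callable would be something like `[$parser, 'parse']`, and the skipped lines can be logged or inspected afterwards.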

FormatException when line contains domain\\user_name

Hello,

I am getting a Kassner\LogParser\FormatException when a line contains a domain user name.

::1 - DOMAIN\\user_name [03/Feb/2023:18:50:42 +0100] "GET /app/index.php HTTP/1.1" 200 229931

Do you have any idea how to fix it please?

Thank you

Pointer

Is it possible to enable a pointer, so that when running this with crond at subsequent cycle it doesn't read the entire log file again? Thanks.
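
One possible approach outside the library is to remember the byte offset reached on the previous run and seek to it on the next one. The sketch below assumes a plain append-only log file; `readNewLines()` and the state-file path are hypothetical, not something log-parser provides.

```php
<?php
// Hypothetical helper: returns the lines appended to $logFile since the last
// call, remembering the byte offset in $stateFile between runs.
function readNewLines(string $logFile, string $stateFile): array
{
    $offset = is_file($stateFile) ? (int) file_get_contents($stateFile) : 0;

    // If the log was rotated or truncated, start again from the beginning.
    clearstatcache(true, $logFile);
    if ($offset > filesize($logFile)) {
        $offset = 0;
    }

    $handle = fopen($logFile, 'rb');
    fseek($handle, $offset);

    $lines = [];
    while (($line = fgets($handle)) !== false) {
        // Stop at a partially written last line; the next run picks it up.
        if (substr($line, -1) !== "\n") {
            break;
        }
        $offset = ftell($handle);
        $lines[] = rtrim($line, "\r\n");
    }
    fclose($handle);

    file_put_contents($stateFile, (string) $offset);
    return $lines;
}
```

Each cron run would then call `readNewLines()` and pass only the returned lines to `$parser->parse()`.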

FormatException on common apache log format

Hi,

With the parser set to the common log format, I am getting a FormatException on a line written in that same format.

$line='31.32.246.3 - - [18/May/2019:09:26:02 -0400] "GET / HTTP/1.1" 200 3380 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0"';
$parser = new \Kassner\LogParser\LogParser();
$parser->setFormat('"%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"');
$entry = $parser->parse($line);

I get the same exception without changing the log format. From what I can see the formats match. Any clue why this could happen?

Thanks

Apache 2.4.7 log format

Hi,

I am trying to read an apache log in the following format:

127.0.0.1 - - [11/Jan/2016:18:30:16 +1300] "GET / HTTP/1.1" 403 505 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/47.0.2526.73 Chrome/47.0.2526.73 Safari/537.36"

I am using the following format:

$parser->setFormat('%h - - %t "%r" %s %b "-" "%{User-Agent}i"');

I have also tried:

$parser->setFormat('%h %l %u %t "%r" %s %b "%{Referer}i" \"%{User-Agent}i"');

but each time I get the format exception error. Any chance you can help me with this? I have also tried setting no format (so as to use the default), but no luck there either.

Help with Pattern Matching?

This is related to issues #58 and #50.

As described in #58, I'm getting some malformed HTTP requests in my nginx server access log, like these:

162.243.128.19 - - [18/Feb/2023:06:38:30 -0500] "MGLNDD_70.110.25.35_80" 400 150 "-" "-"
159.65.204.184 - - [19/Feb/2023:02:54:04 -0500] "\x16\x03\x01\x00{\x01\x00\x00w\x03\x03\x03VJT\xE3REk\xFE\x89\x5C\xCE\xFF\xBBh\xAF\xA5}@t6\x9D\xBA\xAA3\x22rWR\xAC\xB8\x90\x00\x00\x1A\xC0/\xC0+\xC0\x11\xC0\x07\xC0\x13\xC0\x09\xC0\x14\xC0" 400 150 "-" "-"
192.241.225.22 - - [19/Feb/2023:05:16:37 -0500] "SSH-2.0-Go" 400 150 "-" "-"

I've modified my code using the fix suggested in #58 and described in #50:

$laxParser = new \Kassner\LogParser\LogParser();
$laxParser->setFormat('%h %l %u %t "%r" %>s %O "%{Referer}i" \"%{User-Agent}i"');
$laxParser->addPattern('%r', '(?P<request>.+)');
...
$entry = $laxParser->parse($line);

However, the lines above are still causing a FormatException to be thrown. Did I miss something, or is there another pattern match that's failing here? If the "addPattern" call worked, it should match one or more of any character in the request, right?

Fatal error when Apache field 3 is a quoted empty string

log-parser can't handle Apache log entries where the third field is "" rather than a dash. I see these log entries every so often.

88.99.141.3 - "" [18/Oct/2017:16:56:46 -0400] "GET / HTTP/1.1" ...

Trace:

PHP Fatal error:  Uncaught Kassner\LogParser\FormatException: 88.99.141.3 - "" [02/Jul/2017:21:35:06 -0400] "GET / HTTP/1.1" 200 16153 "http://example.com/" "Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0"
 in /whatever/Database/vendor/kassner/log-parser/src/LogParser.php:82
Stack trace:
#0 /whatever/Database/import-log.php(21): Kassner\LogParser\LogParser->parse('88.99.141.3 - "...')
#1 /whatever/Database/import-log.php(40): parseFile(Object(Kassner\LogParser\LogParser), 'logs/./logs09/a...')
#2 {main}
  thrown in /whatever/Database/vendor/kassner/log-parser/src/LogParser.php on line 82

Create a tag

Hi there,

Great job on the lib. Works perfectly! However, would you mind tagging a stable version of the lib? That would enable the Composer cache.

Thanks !

Error when trying to parse hexadecimal?

Ran this:

<?php
$loader = require __DIR__ . '/vendor/autoload.php';
$parser = new \Kassner\LogParser\LogParser();
$parser->setFormat('%h %l %u %t "%r" %>s %O "%{Referer}i" \"%{User-Agent}i"');
$lines = file('/var/log/nginx/access.log', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
foreach ($lines as $line) {
    $entry = $parser->parse($line);
}
?>

and got this:

#0 /home/xxx/analiticas/analytics.php(7): Kassner\LogParser\LogParser->parse('xxx.xxx.xxx.xxx -...')
#1 {main}
  thrown in /home/v/xxx/vendor/kassner/log-parser/src/LogParser.php on line 82
[29-Sep-2020 19:12:06 America/New_York] PHP Fatal error: Uncaught Kassner\LogParser\FormatException: xxx.xxx.xxx.xxx - - [29/Sep/2020:04:37:17 -0400] "\x03\x00\x00/*\xE0\x00\x00\x00\x00\x00Cookie: mstshash=Administr" 400 157 "-" "-" "-" in /home/xxx/axxx/vendor/kassner/log-parser/src/LogParser.php:82
Stack trace:
#0 /home/xxx/analiticas/analytics.php(7): Kassner\LogParser\LogParser->parse('xxx.xxx.xxx.xxx -...')
#1 {main}
  thrown in /home/xxx/analiticas/vendor/kassner/log-parser/src/LogParser.php on line 82

Syslog support?

Is there a way to use this to parse syslog? I get this when attempting to run it against syslog:

Exception: Apr  2 00:06:46 chris-x1 kernel: [347472.787552] mce: CPU6: Package temperature above threshold, cpu clock throttled (total events = 393711)
In [/var/www/personal/easyupp/vendor/kassner/log-parser/src/LogParser.php, line 82]

2020-04-02 22:40:00 Error: [Kassner\LogParser\FormatException] Apr  2 00:06:46 chris-x1 kernel: [347472.787552] mce: CPU6: Package temperature above threshold, cpu clock throttled (total events = 393711) in /var/www/personal/easyupp/vendor/kassner/log-parser/src/LogParser.php on line 82
Stack Trace:
- /var/www/personal/easyupp/src/Command/ReadSysLogCommand.php:21
- /var/www/personal/easyupp/vendor/cakephp/cakephp/src/Console/BaseCommand.php:175
- /var/www/personal/easyupp/vendor/cakephp/cakephp/src/Console/CommandRunner.php:336
- /var/www/personal/easyupp/vendor/cakephp/cakephp/src/Console/CommandRunner.php:171
- /var/www/personal/easyupp/bin/cake.php:12


Fatal error when POST is not uppercase

This real-life Apache log entry caused log-parser to crash on parse():

62.210.177.12 - - [05/Apr/2020:06:58:22 -0400] "post /wp-json/trx_addons/v2/get/sc_layout?sc=wp_insert_user&role=administrator&user_login=ndvtzaifnz&user_pass=6Wlh6SA0RT HTTP/1.1" 301 800 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36"

When post is changed to POST, log-parser works fine.

(I'm guessing that this log entry was produced by a WordPress hacking attempt.)

Support for apache error log?

Can this parser parse Apache error logs?
Error logs record the time in a slightly different format, e.g. [Fri Dec 16 01:46:23 2005] [error] [client 1.2.3.4] Directory index forbidden by rule: /home/test/, than the one that seems to be expected in access logs, e.g. [13/Sep/2006:07:01:53 -0700].

responseBytes missing (-)

Hello, I wonder if someone could help me figure this out:
When a file is downloaded via a PHP script, Apache doesn't log "responseBytes"; it shows "-" instead.
How could I make Apache log it properly?

Apache combined log - parsing throws exception

I tried to parse the apache 'combined' log file as used in debian and other distros by default

Apache config:
LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined

Log-Parser:
$parser->setFormat('"%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\""');
$parser->setFormat('%h %l %u %t "%r" %>s %O "%{Referer}i" \\"%{User-Agent}i"');

Couldn't get it to work with the formats used above. What am I doing wrong?

Format Exception with Default Nginx Format

I've set my parser to use the default Nginx format as described on the project README:

$parser->setFormat('%h %l %u %t "%r" %>s %O "%{Referer}i" "%{User-Agent}i"');

A line in my nginx log is causing a format exception to be thrown:

PHP Fatal error: Uncaught Kassner\LogParser\FormatException: 162.243.142.22 - - [17/Feb/2023:02:50:10 -0500] "MGLNDD_1.2.3.4_80" 400 150 "-" "-" in /test/vendor/kassner/log-parser/src/LogParser.php:83
Stack trace:
#0 /test/test.php(12): Kassner\LogParser\LogParser->parse()
#1 {main}
thrown in /test/vendor/kassner/log-parser/src/LogParser.php on line 83

I don't see how this line fails to match the specified format. Am I missing something?

Fatal error: Uncaught Kassner\LogParser\FormatException

Fatal error: Uncaught Kassner\LogParser\FormatException: 78.129.237.156 - - [10/Jun/2020:00:01:31 +0200] "GET /?action=display&bridge=FB2&u=complexdoc&format=Mrss HTTP/1.1" 200 81701 "-" "RSSMix/3.0" "78.129.237.156" in /var/www/html/admin/vendor/kassner/log-parser/src/LogParser.php:82
Stack trace:
#0 /var/www/html/admin/access.log.php(16): Kassner\LogParser\LogParser->parse('78.129.237.156 ...')
#1 {main}
  thrown in /var/www/html/admin/vendor/kassner/log-parser/src/LogParser.php on line 82

nginx.conf:

    log_format  main  '$remote_addr - $remote_user [$time_local] "$host" "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"';

php file:

//'$remote_addr - $remote_user [$time_local] "$host" "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"';
$parser->setFormat('%a - %u [%t] "%v" "%r" %>s %O "%{Referer}i" \"%{User-Agent}i" "{X-Forwarded-For}i"');

access.log:

172.68.189.227 - - [10/Jun/2020:17:46:46 +0200] "minopia.de" "POST / HTTP/1.1" 405 575 "-" "Mozlila/5.0 (Linux; Android 7.0; SM-G892A Bulid/NRD90M; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/60.0.3112.107 Moblie Safari/537.36" "13.91.103.115"

garbage columns

A log file contains some garbage columns that I don't need to work with. How do I skip the bad columns?

400 Bad Request

It is necessary to cover bad requests like this:

// Format: %h %l %u %t \"%r\" %>s %O "%{Referer}i" "%{User-Agent}i"
54.232.125.255 - - [07/Oct/2013:04:14:01 +0000] "" 400 0 "-" "-"

Although it is an invalid request, it is a valid log entry, and should be parsed correctly.

User name may contain dots, i.e. be a domain name

The user field in my log contains dots, because the users are websites connecting via the API and their logins are domain names.
Nginx is OK with this, so any log parsing software must support it.

For now I have changed the 'user' field from %u to (?P<user>(?:-|[\w-\.]+)) in my config string, to avoid creating a fork.
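
A small standalone check of that idea, as a sketch only. It moves the hyphen to the end of the character class, since a hyphen placed right after `\w` may be rejected as an invalid range by some PCRE versions; `matchUser()` is a hypothetical helper for trying the pattern outside the parser.

```php
<?php
// Same idea as the pattern above, with the hyphen last in the class so it
// cannot be read as a range.
$userPattern = '(?P<user>(?:-|[\w.-]+))';

// Hypothetical helper: returns the captured user name, or null if the field
// does not match the pattern as a whole.
function matchUser(string $pattern, string $field): ?string
{
    if (preg_match('#^' . $pattern . '$#', $field, $m) === 1) {
        return $m['user'];
    }
    return null;
}
```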

how about nginx error.log?

e.g.

2016/05/04 18:14:15 [error] 3075#0: *11552 open() "/var/www/bak/mantoucc/favicon.ico" failed (2: No such file or directory), client: 115.175.32.29, server: mantoucc.com, request: "GET /favicon.ico HTTP/1.1", host: "mantoucc.com", referrer: "http://mantoucc.com/"

HTTP/2 ?

'%r' => '(?P<request>(?:(?:[A-Z]+) .+? HTTP/(1|2).(?:0|1))|-|)',
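
The pattern above can be tried in isolation before overriding `%r`. The sketch below is a variant under two assumptions of mine: the dot between the version digits is escaped so it only matches a literal ".", and the version group is made non-capturing. `matchesRequest()` is a hypothetical helper.

```php
<?php
// Variant of the pattern above: escaped dot, non-capturing version group.
$requestPattern = '(?P<request>(?:(?:[A-Z]+) .+? HTTP/(?:1|2)\.(?:0|1))|-|)';

// Hypothetical helper: true when the whole request field matches the pattern.
function matchesRequest(string $pattern, string $request): bool
{
    return preg_match('#^' . $pattern . '$#', $request) === 1;
}
```

If it behaves as wanted, it could be registered with `$parser->addPattern('%r', $requestPattern)`, the call used in other issues on this page.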

Blank referrers and agents

The parser dies if either the referrer or agent in a log are blank (if they consist only of a pair of double quotes). There are not many cases where this happens (I had about six in a 3 million line test log file), but it does halt further processing.

As a workaround (which is probably slow) I change the blank referrer or agent before processing the line:

$line = str_replace('" ""','" " "',$line);
$line = str_replace('"" "','"-" "',$line);

Also, as you probably know (from the outstanding IPv6 issue), if there are any IPv6 addresses the parser will also fail. This includes even the localhost IPv6 address, in lines like:

www.example.com:80 ::1 - - [27/Oct/2013:06:27:33 +0000] "OPTIONS * HTTP/1.0" 200 126 "-" "Apache/2.2.22 (Ubuntu) (internal dummy connection)"

A workaround for this is to search for the "::1 - -" and skip the line if it is present.

Throws Exceptions when Encountering Real World Logs

Hello there -- first off, thank you for building this and saving us all the trouble of building our own regular expressions to parse Apache's log files.

When I tried using this package on my actual real world Apache logs, it mostly worked. However, there were a number of different lines where it failed to parse logs and threw an exception in my program. Here's one example.

My log format looks like this

$parser->setFormat('%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"');

Here's one line that failed to parse

199.195.254.38 - - [27/Sep/2020:19:27:26 +0000] "GET ../../proc/ HTTP" 400 506 "-" "-"

and here's a few others

240e:d9:d800:200::d4 - - [29/Sep/2020:19:52:18 +0000] "\x16\x03\x01" 501 290 "-" "-"

172.105.43.21 - - [30/Sep/2020:01:05:53 +0000] "\x16\x03\x01" 501 290 "-" "-"

Is there a way to configure this library to be less strict when trying to parse these log lines?

If not, do you have any time/interest in enhancing the functionality of this library so it can handle cases like these?
