
gowatch's People

Contributors

fxnn


Forkers

vjeantet itd2007

gowatch's Issues

Support listing summarizer

The main summarizer we currently have is the GrokCounter, which takes a set of patterns (each with a name) and counts the occurrences of each pattern.

Dovecot: Failed Login Attempts
==============================
5.196.31.23: 1
49.248.147.211: 1
52.6.24.186: 4
52.6.71.222: 3
52.6.130.221: 2
54.208.194.166: 1

Now, what I'd like to see is that we don't just get the number of occurrences per pattern, but can also see what happened. In the above example, we could list the user names per IP.

Dovecot: Failed Login Attempts
==============================
5.196.31.23: webmaster
49.248.147.211: admin
52.6.24.186: joe, webmaster, admin, adm
52.6.71.222: adm, admin, joe
52.6.130.221: frank, joe
54.208.194.166: user

It's not yet clear to me how to specify the match to be displayed. The configuration for the GrokCounter is

- summarizer: count
  config: {
    '%{login_host}': 'auth\(%{PROG}\): %{PROG}\(%{USER},%{IPORHOST:login_host}\): unknown user'
  }

Guess we need a tuple or something, so that we can specify the pattern and the match to be displayed:

- summarizer: count
  config: {
    '%{login_host}': ['%{user}', 'auth\(%{PROG}\): %{PROG}\(%{USER:user},%{IPORHOST:login_host}\): unknown user']
  }

Unfortunately, tuples are hard to read. So, another map?

- summarizer: count
  config: {
    '%{login_host}': {
      list: '%{user}',
      for: 'auth\(%{PROG}\): %{PROG}\(%{USER:user},%{IPORHOST:login_host}\): unknown user'
    }
  }
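
Whatever syntax we settle on, the summarizer itself mainly has to collect the value of the listed capture per key instead of incrementing a counter. A minimal sketch of that accumulation step, assuming the grok captures (here user and login_host from the example above) have already been extracted; all names are illustrative only:

    package main

    import (
        "fmt"
        "sort"
        "strings"
    )

    // listingSummarizer collects, for every displayed key (here the login host),
    // the values of a second capture (here the user name) instead of a count.
    type listingSummarizer struct {
        valuesByKey map[string][]string
    }

    // Summarize receives the grok captures of one matching log line.
    func (s *listingSummarizer) Summarize(captures map[string]string) {
        key := captures["login_host"]
        s.valuesByKey[key] = append(s.valuesByKey[key], captures["user"])
    }

    // String renders one "key: value, value, ..." line per key, sorted by key.
    func (s *listingSummarizer) String() string {
        keys := make([]string, 0, len(s.valuesByKey))
        for k := range s.valuesByKey {
            keys = append(keys, k)
        }
        sort.Strings(keys)
        var b strings.Builder
        for _, k := range keys {
            fmt.Fprintf(&b, "%s: %s\n", k, strings.Join(s.valuesByKey[k], ", "))
        }
        return b.String()
    }

    func main() {
        s := &listingSummarizer{valuesByKey: make(map[string][]string)}
        s.Summarize(map[string]string{"login_host": "52.6.24.186", "user": "joe"})
        s.Summarize(map[string]string{"login_host": "52.6.24.186", "user": "webmaster"})
        fmt.Print(s) // prints: 52.6.24.186: joe, webmaster
    }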

Allow fixing timestamps that miss the year

Turns out that Syslog doesn't log the year.

Apr  5 05:39:01 lvps176-28-9-153 CRON[5773]: pam_unix(cron:session): session opened for user root by (uid=0)

Looks like we need to infer the missing year ourselves. We could do this

  • using the timestamp of the logfile itself, or
  • using our current datetime, assuming the year is either the current year or, if the timestamp would then lie in the future, the year before (see the sketch below).
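
A minimal sketch of the second approach, assuming the timestamp was parsed with a year-less layout such as Go's time.Stamp (which yields year 0):

    package main

    import (
        "fmt"
        "time"
    )

    // fixYear completes a timestamp that was parsed without a year (e.g. from
    // syslog's "Apr  5 05:39:01" format). It assumes the current year, unless
    // that would place the timestamp in the future; then it uses the year before.
    func fixYear(t, now time.Time) time.Time {
        fixed := time.Date(now.Year(), t.Month(), t.Day(),
            t.Hour(), t.Minute(), t.Second(), t.Nanosecond(), t.Location())
        if fixed.After(now) {
            fixed = fixed.AddDate(-1, 0, 0)
        }
        return fixed
    }

    func main() {
        t, _ := time.Parse(time.Stamp, "Apr  5 05:39:01") // year defaults to 0
        fmt.Println(fixYear(t, time.Now()))
    }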

Import configuration from URL

Setting up a configuration for many logfiles, including expressive summaries, is a lot of work; therefore, users will want to share pieces of configuration.

To facilitate sharing, we should make it as easy as possible. Therefore, one should be able to import configuration from a URL. Except for the remote location, everything should work as described in #6.

HTTP and HTTPS should be the supported protocols to begin with. For faster execution, gowatch should cache the imported files between invocations. We should keep in mind that some users will want to use gowatch in an offline mode; we could support this by downloading once and never again -- maybe just by using the cache.
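
A minimal sketch of the download-once-then-cache behaviour; the cache layout and function name are assumptions for illustration, not gowatch's actual API:

    package config

    import (
        "crypto/sha1"
        "fmt"
        "io"
        "net/http"
        "os"
        "path/filepath"
    )

    // fetchConfig returns the configuration behind url, downloading it at most
    // once: if a cached copy exists in cacheDir it is used, which also covers
    // offline use after the first run.
    func fetchConfig(url, cacheDir string) ([]byte, error) {
        cacheFile := filepath.Join(cacheDir, fmt.Sprintf("%x.yml", sha1.Sum([]byte(url))))

        if data, err := os.ReadFile(cacheFile); err == nil {
            return data, nil // cache hit, no network access needed
        }

        resp, err := http.Get(url) // covers both http:// and https://
        if err != nil {
            return nil, err
        }
        defer resp.Body.Close()
        if resp.StatusCode != http.StatusOK {
            return nil, fmt.Errorf("fetching %s: %s", url, resp.Status)
        }

        data, err := io.ReadAll(resp.Body)
        if err != nil {
            return nil, err
        }
        if err := os.WriteFile(cacheFile, data, 0o644); err != nil {
            return nil, err
        }
        return data, nil
    }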

Allow for importing configuration files

Within a configuration file, it should be possible to import other configuration files by using an import statement on the top level.

Syntax should be as follows. At first, we should be able to import a single file.

import: /path/to/import.yml

Then, we should be able to import multiple files at once.

import:
- /path/to/import1.yml
- /path/to/import2.yml

The semantics shall be as follows. On importing a configuration file, every logfile defined therein must be added to the logfiles defined in the importing configuration file. Equally, every summary defined in the imported configuration file must be added to those defined in the importing configuration file.
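
A minimal sketch of these merge semantics, with illustrative stand-ins for gowatch's actual configuration types:

    package config

    // Illustrative stand-ins for gowatch's actual configuration types.
    type Logfile struct {
        Filename string   `yaml:"filename"`
        Tags     []string `yaml:"tags"`
    }

    type Summary struct {
        Summarizer string                 `yaml:"summarizer"`
        Title      string                 `yaml:"title"`
        Config     map[string]interface{} `yaml:"config"`
    }

    type Config struct {
        Import   []string  `yaml:"import"`
        Logfiles []Logfile `yaml:"logfiles"`
        Summary  []Summary `yaml:"summary"`
    }

    // merge adds every logfile and every summary defined in the imported
    // configuration to the importing configuration.
    func (c *Config) merge(imported *Config) {
        c.Logfiles = append(c.Logfiles, imported.Logfiles...)
        c.Summary = append(c.Summary, imported.Summary...)
    }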

Make predicate syntax more concise

Currently, predicates work like

where: {
  allof: [{
    field: Message,
    contains: some text
  }, {
    field: Message,
    matches: '%{SOME_PATTERN}'
  }]
}

This could be made a lot more concise by removing the "field" key and mapping fields to conditions instead:

where: {
  Message: {contains: some text, matches: '%{SOME_PATTERN}'},
  not: {Message: {matches: '%{ANOTHER_PATTERN}'}}
}

This of course comes at the expense of reserving the keys not, allof and anyof, which could then no longer be used as field names. Guess we can live with that.

Use MapItems for summary configuration

The following summarizer is malfunctioning:

- summarizer: count
  title: Hosts of Discarded and Junk Mails
  where: {
    allof: {
      tags: {contains: 'mail.log'},
      SYSLOGPROG: {contains: 'dovecot'}
    }
  }
  config: {
    '%{msgid_host}': 'deliver\(%{USER:user}\): sieve: msgid=<%{DATA:msgid_nonhost}@%{IPORHOST:msgid_host}>: marked message to be discarded if not explicitly delivered',
    '%{msgid_host}': "deliver\\(%{USER:user}\\): sieve: msgid=<%{DATA:msgid_nonhost}@%{IPORHOST:msgid_host}>: stored mail into mailbox 'Junk'"
  }

The reason for this is the duplicate key in the config section. Currently, only one of the two entries gets processed.

To resolve the problem, we must decode the config section into an ordered list of MapItems (yaml.MapSlice) instead of a plain map.
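
A minimal sketch of decoding the config section with gopkg.in/yaml.v2, where a MapSlice keeps both entries (and their order) instead of collapsing the duplicate key the way a Go map would:

    package main

    import (
        "fmt"

        "gopkg.in/yaml.v2"
    )

    const configSection = `
    '%{msgid_host}': 'pattern for discarded mails'
    '%{msgid_host}': 'pattern for junk mails'
    `

    func main() {
        // Unmarshalling into a map[string]string would keep only one of the
        // two entries; a MapSlice keeps both, in document order.
        var cfg yaml.MapSlice
        if err := yaml.Unmarshal([]byte(configSection), &cfg); err != nil {
            panic(err)
        }
        for _, item := range cfg {
            fmt.Printf("%v -> %v\n", item.Key, item.Value)
        }
    }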

Support for parsing gzipped logfiles

In #22, we implement wildcards to support rotated logs. This more or less implies that we must be able to handle gzipped log files (transparently, i.e. just as if the file were uncompressed).
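
A minimal sketch of such transparent handling, assuming gzipped files are recognised by their .gz suffix:

    package logfile

    import (
        "compress/gzip"
        "io"
        "os"
        "strings"
    )

    // gzipFile wraps a gzip reader together with its underlying file, so that
    // closing the returned reader also closes the file.
    type gzipFile struct {
        *gzip.Reader
        file *os.File
    }

    func (g *gzipFile) Close() error {
        if err := g.Reader.Close(); err != nil {
            g.file.Close()
            return err
        }
        return g.file.Close()
    }

    // openLogfile returns a reader over the (possibly gzipped) log file;
    // callers read plain text either way.
    func openLogfile(path string) (io.ReadCloser, error) {
        f, err := os.Open(path)
        if err != nil {
            return nil, err
        }
        if !strings.HasSuffix(path, ".gz") {
            return f, nil
        }
        gz, err := gzip.NewReader(f)
        if err != nil {
            f.Close()
            return nil, err
        }
        return &gzipFile{Reader: gz, file: f}, nil
    }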

Remove log.Fatalf error handling

Guess it's bad style to have

log.Fatalf("Error message")
return logentry.AcceptNothingPredicate{} // actually never executed

all over the code. The decision to quit immediately should be made in only one place in the code; until then, normal error handling (returning errors) should be used.
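
A minimal sketch of the intended style, with illustrative names standing in for the real predicate types; the error travels up via return values, and only one place decides to quit:

    package main

    import (
        "errors"
        "log"
    )

    // Predicate and acceptNothing are illustrative stand-ins for
    // logentry.Predicate and logentry.AcceptNothingPredicate.
    type Predicate interface{ Accepts(line string) bool }

    type acceptNothing struct{}

    func (acceptNothing) Accepts(string) bool { return false }

    // buildPredicate reports problems to its caller instead of terminating the
    // whole program with log.Fatalf.
    func buildPredicate(spec string) (Predicate, error) {
        if spec == "" {
            return acceptNothing{}, errors.New("empty predicate specification")
        }
        // ... build the real predicate here ...
        return acceptNothing{}, nil
    }

    func main() {
        // The decision to quit immediately is made in exactly one place.
        if _, err := buildPredicate(""); err != nil {
            log.Fatalf("reading configuration: %v", err)
        }
    }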

Add tag when predicate matches

Currently, tags can only be added to every line of a logfile; it should also be possible to add tags only to those lines matching a given predicate.

Though this is currently possible by parsing the same file multiple times with different predicates, that's not the way it's supposed to work.

Possible syntax would be

logfiles:
- filename: /path/to/file.log
  tags:
  - tag_for_each_line
  - tag_for_few_lines: {
      Message: { contains: some text }
    }

Alternatively, we could provide this feature only by using an extended mapping section, in addition to the existing logfiles and summary sections.

(Note that this already uses the new predicate syntax from #4.)

Support wildcards in logfile path

To be able to read rotated log files, we need to implement wildcards in the log file path.

Now, it would be a problem if we piped log files from several months (or even years) through gowatch just to find out that every single line is filtered out because of a timestamp predicate. Therefore, the parser should also apply timestamp predicates to the files' modification timestamps.
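
A minimal sketch of both parts, expanding a wildcard path and skipping files whose modification time already fails the timestamp predicate (function name and predicate shape are illustrative):

    package main

    import (
        "fmt"
        "os"
        "path/filepath"
        "time"
    )

    // matchingLogfiles expands a wildcard path such as /var/log/auth.log* and
    // drops files that were last modified before the oldest timestamp we are
    // interested in, so their lines never have to be parsed at all.
    func matchingLogfiles(pattern string, notBefore time.Time) ([]string, error) {
        candidates, err := filepath.Glob(pattern)
        if err != nil {
            return nil, err
        }
        var files []string
        for _, path := range candidates {
            info, err := os.Stat(path)
            if err != nil {
                return nil, err
            }
            if info.ModTime().Before(notBefore) {
                continue // every line in this file is older than the predicate allows
            }
            files = append(files, path)
        }
        return files, nil
    }

    func main() {
        files, err := matchingLogfiles("/var/log/auth.log*", time.Now().AddDate(0, 0, -2))
        if err != nil {
            panic(err)
        }
        fmt.Println(files)
    }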

Allow for adding grok patterns

The config file should provide a means of adding grok patterns, either by naming a file in the usual format

PATTERN_NAME (my)?p[at]tern

or by just providing patterns inside the YAML file. The syntax could be

patterns: {
    PATTERN_NAME: '(my)?p[at]tern'
}
patternsource: /path/to/patternfile
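
A minimal sketch of reading a pattern file in that format, splitting each line into name and pattern at the first blank (the function name is illustrative):

    package patterns

    import (
        "bufio"
        "fmt"
        "os"
        "strings"
    )

    // loadPatterns reads a file in the usual grok pattern format, e.g.
    //
    //     PATTERN_NAME (my)?p[at]tern
    //
    // and returns the patterns by name. Blank lines are skipped.
    func loadPatterns(path string) (map[string]string, error) {
        f, err := os.Open(path)
        if err != nil {
            return nil, err
        }
        defer f.Close()

        patterns := make(map[string]string)
        scanner := bufio.NewScanner(f)
        for scanner.Scan() {
            line := strings.TrimSpace(scanner.Text())
            if line == "" {
                continue
            }
            sep := strings.IndexAny(line, " \t")
            if sep < 0 {
                return nil, fmt.Errorf("malformed pattern line: %q", line)
            }
            patterns[line[:sep]] = strings.TrimSpace(line[sep+1:])
        }
        return patterns, scanner.Err()
    }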

Make Grok the default

Currently, on parsing files with grok, one has to say parser: grok. Equally, when summarizing log entries with grok, one has to say summarizer: grokcounter.

Currently, those seem to be the most useful to me, so we could just make them the defaults.

Predicates on timestamps

Each log entry has a timestamp. One must be able to compare these timestamps, at least relative to the current time, or even against some given timestamp.

Therefore, we could have predicates as:

timestamp: {before: "-2d"} # not newer than 2 days in the past
timestamp: {after: "-1h30m"} # not older than 1 hour and 30 minutes
timestamp: {after: "2015-01-01T00:00:00Z"} # in 2015 or later

Note that the last variant would need to use some fixed timestamp format; RFC 3339 (a profile of ISO 8601) feels right here.
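
A minimal sketch of resolving such a value into an absolute point in time, either as an RFC 3339 timestamp or as an offset relative to now; note that Go's time.ParseDuration knows h, m and s but not d, so plain day values get special treatment here:

    package main

    import (
        "fmt"
        "strconv"
        "strings"
        "time"
    )

    // resolveTimestamp turns a predicate value like "-2d", "-1h30m" or
    // "2015-01-01T00:00:00Z" into an absolute point in time relative to now.
    func resolveTimestamp(value string, now time.Time) (time.Time, error) {
        // Absolute timestamps use the fixed RFC 3339 format.
        if t, err := time.Parse(time.RFC3339, value); err == nil {
            return t, nil
        }
        // Relative values: translate a pure day offset (e.g. "-2d") ourselves,
        // since time.ParseDuration does not understand the "d" unit.
        if strings.HasSuffix(value, "d") {
            days, err := strconv.Atoi(strings.TrimSuffix(value, "d"))
            if err != nil {
                return time.Time{}, err
            }
            return now.AddDate(0, 0, days), nil
        }
        d, err := time.ParseDuration(value)
        if err != nil {
            return time.Time{}, err
        }
        return now.Add(d), nil
    }

    func main() {
        now := time.Now()
        for _, v := range []string{"-2d", "-1h30m", "2015-01-01T00:00:00Z"} {
            t, err := resolveTimestamp(v, now)
            fmt.Println(v, "->", t, err)
        }
    }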

Make configuration keywords more intuitive

Currently, a configuration file looks as follows:

logfiles:

- filename: /var/log/auth.log
  tags: ['auth.log']
  timelayout: Stamp
  config: {pattern: '%{SYSLOGBASE} %{GREEDYDATA:Message}'}

summary:

- summarizer: count
  title: auth.log
  where: {tags: {contains: 'auth.log'}}
  config: {
    'sudo [%{user}->%{effective_user}] %{command}': '\s*%{USER:user}\s*: TTY=%{DATA:tty} ; PWD=%{PATH:pwd} ; USER=%{USER:effective_user} ; COMMAND=%{PATH:command}(: %{GREEDYDATA:arguments})?'
  }

Parts of it are made to be easy to read, like where: {tags: {contains: 'auth.log'}}. Everyone should know what's meant, and I also feel that it's quite intuitive and thus easy to write and remember.

This should be done with all keywords in the file (as far as possible). Ideas:

  • do: count (instead of summarizer)
  • with: {pattern: 'abc'} (instead of config)
