
Document Needed Changes to RITA (ipfix-rita issue, closed)

activecm commented on June 4, 2024

Document Needed Changes to RITA

from ipfix-rita.

Comments (6)

ethack commented on June 4, 2024

I've been running RITA with only a mocked conn log and haven't noticed anything break. I use this script to generate the data. I haven't tried it with real-world data, though.
https://gist.github.com/ethack/182fa4c1e6099f23acc31cd90874f6b8


Zalgo2462 commented on June 4, 2024

I think the only issue will be with the meta database. Since we don't know when we're finished inserting into a database, we make the MetaDB record when we make the first insert.

RITA could start analyzing a database while it is still being written to, which may produce oddities. (Not sure what exactly.)


Zalgo2462 commented on June 4, 2024

The main issue is inserting data into the database while RITA is analyzing. If the database is already analyzed, then nothing should break.

The simple solution is to have rita copy the input collections when starting analysis. However, a copy could be very expensive, so we need to benchmark. I have a suspicion that an identity-based aggregation may be faster than .find().forEach(x => .insert(x)). Unfortunately, aggregations only work within the same database. That's not a deal breaker, though. We could have conn-in, http-in, dns-in, and conn, http, dns collections all in the same database: one collection for input and the other copied from the input collection for analysis.

https://stackoverflow.com/questions/10624964/whats-the-fastest-way-to-copy-a-collection-within-the-same-database


Zalgo2462 commented on June 4, 2024

If someone runs rita analyze daily without specifying which database to analyze, we are guaranteed RITA will attempt to analyze all of the loaded databases at some point in the day.

If IPFIX-RITA inserts before the analyze command is run on the database, all is well.
If IPFIX-RITA inserts while the analyze command is being run on the database, inconsistent results are likely to appear.
If IPFIX-RITA inserts after the analyze command is run on the database, the data will not be included in the analysis results.

If we had a flag in the MetaDB that marked whether a database was ready for analysis, this problem would be somewhat solved. rita analyze would only pick up on databases ready to be analyzed, avoiding the analysis of loading databases. However, at some point IPFIX-RITA would need to mark a database ready for analysis. If we received the data in order, we could mark the database ready after the output stream timestamps cross midnight. Unfortunately, we do not receive the data in order. A threshold of so-long-after-midnight could probably be established, but more testing would need to be done to come up with a good value for that threshold.

The traditional Bro importer also suffers from this problem. If two instances of rita are run concurrently, one instance could start an import while the other runs analysis, leading to the same issues as described above. A ready-to-analyze flag would solve this case as well. Here, the time to set the flag is clearly defined: the moment the import completes.


Zalgo2462 commented on June 4, 2024

We can implement a ready-to-analyze flag by adding the field import_finished to RITA's MetaDatabase database records.

Current MetaDB Database schema:

DBMetaInfo struct {
	ID             bson.ObjectId `bson:"_id,omitempty"`   // Ident
	Name           string        `bson:"name"`            // Top level name of the database
	Analyzed       bool          `bson:"analyzed"`        // Has this database been analyzed
	ImportVersion  string        `bson:"import_version"`  // Rita version at import
	AnalyzeVersion string        `bson:"analyze_version"` // Rita version at analyze
}

How to Alter the Import Process

1. Before a record is inserted into RITA, the appropriate MetaDatabase database record is created.
2. Records are inserted into the database referenced by the MetaDatabase database record.
3. (new) When it is known that no more records will be inserted into the database referenced by the MetaDatabase record, the import_finished flag is set to true.

How to Alter the Analyze Process

Loop over the databases registered in the MetaDatabase database collection
- If the database is already analyzed, remove it from consideration
- If the database is incompatible with the running version of rita, remove it from consideration
- (new) If the import process is still altering the database (import_finished == false), remove it from consideration
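
The filtering loop above could look roughly like the following. This is only a sketch: the struct is a trimmed-down stand-in for DBMetaInfo (the ObjectId field is left out so no MongoDB driver is needed), ImportFinished is the proposed field, and selectForAnalysis and the compatible callback are hypothetical names, not existing RITA functions.

```go
package main

import "fmt"

// DBMetaInfo is a simplified stand-in for RITA's MetaDB record.
// ImportFinished is the proposed new field.
type DBMetaInfo struct {
	Name           string `bson:"name"`
	Analyzed       bool   `bson:"analyzed"`
	ImportVersion  string `bson:"import_version"`
	AnalyzeVersion string `bson:"analyze_version"`
	ImportFinished bool   `bson:"import_finished"` // proposed
}

// selectForAnalysis returns the databases the analyze step may pick up.
// compatible is a hypothetical version check supplied by the caller.
func selectForAnalysis(dbs []DBMetaInfo, compatible func(importVersion string) bool) []DBMetaInfo {
	var ready []DBMetaInfo
	for _, db := range dbs {
		if db.Analyzed {
			continue // already analyzed
		}
		if !compatible(db.ImportVersion) {
			continue // imported by an incompatible version
		}
		if !db.ImportFinished {
			continue // (new) the importer may still be writing
		}
		ready = append(ready, db)
	}
	return ready
}

func main() {
	dbs := []DBMetaInfo{
		{Name: "done", Analyzed: true, ImportVersion: "v1", ImportFinished: true},
		{Name: "loading", ImportVersion: "v1", ImportFinished: false},
		{Name: "ready", ImportVersion: "v1", ImportFinished: true},
	}
	sameVersion := func(v string) bool { return v == "v1" }
	for _, db := range selectForAnalysis(dbs, sameVersion) {
		fmt.Println(db.Name)
	}
}
```

Records written before the field existed would decode with ImportFinished set to false, so a real change would probably need a migration or a default for older databases.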


Zalgo2462 commented on June 4, 2024

For IPFIX-RITA, it is difficult to know with certainty when incoming data corresponding with a database will stop.

Currently, IPFIX-RITA chooses which database to send a record to based on its closing timestamp. If the data does not arrive in order, which it usually does not, it is hard to determine ahead of time if any more records will be sent to a given database.

We make several assumptions to ease the decision making process:

The system is efficient. The input buffer will not grow without bound, creating a larger lag between wall time and the timestamps of the input data.
The data is loosely correlated in time.
- While the data may arrive somewhat out of order, there exists a window large enough that, when the timestamps are averaged within it, the averages increase monotonically with time.
- The closing timestamps of the data arriving at the collector will match either the current day or previous day by wall time
  - NOTE: This assumption breaks current functionality when processing arbitrary YAF flows from pcap files

Using these assumptions we come up with the following:

We must insert records into databases according to the following plan:

Given a duration, d; a relative timestamp to act as a cutoff, r_cutoff; a record's closing timestamp, t_close; and the current time, t_current:
- Calculate the current period, p_current as floor(t_current / d)
- Calculate the current relative timestamp, r_current as t_current % d
- Calculate the period of the closing timestamp, p_close as floor( t_close / d)
- If p_close == p_current:
  - Insert the record into the current period's database
- Else If p_close == p_current - 1 AND r_current < r_cutoff:
  - Insert the record into the previous period's database
- Else
  - Drop the record

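If we follow this plan, then once t_current reaches d * p_current + r_cutoff, no more records can be routed to the previous period (p_current - 1), so we can set import_finished to true for that period's database. Repeating this every period eventually marks each database as finished.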

This plan allows a grace period for records from the previous period to make it into their period's database.
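
The routing plan above can be sketched in a few lines. This is an illustration under stated assumptions, not IPFIX-RITA's actual code: all values are whole seconds since the Unix epoch (so periods are aligned to UTC), and the names route, Target, and floorDiv are made up for the example.

```go
package main

import (
	"fmt"
	"time"
)

// Target names where a record should go (hypothetical type for this sketch).
type Target string

const (
	CurrentPeriod  Target = "current"
	PreviousPeriod Target = "previous"
	Drop           Target = "drop"
)

// floorDiv divides rounding toward negative infinity, matching floor(a / b).
func floorDiv(a, b int64) int64 {
	q := a / b
	if a%b != 0 && (a < 0) != (b < 0) {
		q--
	}
	return q
}

// route applies the plan: tClose and tCurrent are Unix seconds,
// d is the period length and rCutoff the grace period, both in seconds.
func route(tClose, tCurrent, d, rCutoff int64) Target {
	pCurrent := floorDiv(tCurrent, d)
	rCurrent := tCurrent - pCurrent*d // t_current % d, always non-negative
	pClose := floorDiv(tClose, d)
	switch {
	case pClose == pCurrent:
		return CurrentPeriod
	case pClose == pCurrent-1 && rCurrent < rCutoff:
		return PreviousPeriod
	default:
		return Drop
	}
}

func main() {
	d := int64((24 * time.Hour).Seconds())
	cutoff := int64(time.Hour.Seconds()) // example grace period, not a tested value
	closed := time.Date(2018, 6, 1, 23, 59, 0, 0, time.UTC).Unix()
	early := time.Date(2018, 6, 2, 0, 30, 0, 0, time.UTC).Unix()
	late := time.Date(2018, 6, 2, 2, 0, 0, 0, time.UTC).Unix()
	fmt.Println(route(closed, early, d, cutoff)) // inside the grace period
	fmt.Println(route(closed, late, d, cutoff))  // grace period has elapsed
}
```

Note that records with a closing timestamp in a future period are also dropped, since they match neither of the first two cases.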

If we set d to 24 hours, the algorithm reads a bit more simply:

If the date of the closing timestamp of the record is the current day,
- Insert the record into today's database
Else if the date of the closing timestamp of the record is yesterday, and today's grace period has not elapsed,
- Insert the record into yesterday's database
Else
- Drop the record

Then, for each day, we can set the import_finished flag to true for yesterday's database after the grace period has elapsed.
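
As a rough illustration of when the flag can be set, the function below returns the newest period whose database no longer accepts records. It is a sketch with assumed names (latestFinishedPeriod is hypothetical), it works in Unix seconds with UTC-aligned periods, and it assumes tCurrent is non-negative and rCutoff is no longer than d.

```go
package main

import (
	"fmt"
	"time"
)

// latestFinishedPeriod returns the newest period that can be marked
// import_finished at tCurrent. All arguments are in seconds; tCurrent is
// assumed non-negative so integer division behaves like floor.
func latestFinishedPeriod(tCurrent, d, rCutoff int64) int64 {
	pCurrent := tCurrent / d
	rCurrent := tCurrent % d
	if rCurrent >= rCutoff {
		// The grace period has elapsed: the previous period is closed.
		return pCurrent - 1
	}
	// Still inside the grace period: the previous period may still
	// receive records, so only the one before it is closed.
	return pCurrent - 2
}

func main() {
	d := int64((24 * time.Hour).Seconds())
	cutoff := int64(time.Hour.Seconds()) // example grace period, not a tested value
	now := time.Date(2018, 6, 2, 0, 30, 0, 0, time.UTC).Unix()

	p := latestFinishedPeriod(now, d, cutoff)
	day := time.Unix(p*d, 0).UTC().Format("2006-01-02")
	fmt.Println("newest day safe to mark import_finished:", day)
}
```

A periodic job could call this and set import_finished on every database at or before the returned period that is not yet marked.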
