Comments (6)
I've been running RITA with only a mocked conn log and haven't noticed anything break. I haven't tried it with real-world data, though. I use this script to generate the data:
https://gist.github.com/ethack/182fa4c1e6099f23acc31cd90874f6b8
from ipfix-rita.
I think the only issue will be with the meta database. Since we don't know when we're finished inserting into a database, we make the metadb record when we make the first insert.
RITA could start analyzing a database while it is still being written to, which may produce oddities. (I'm not sure exactly what.)
The main issue is inserting data into the database while RITA is analyzing. If the database is already analyzed, then nothing should break.
The simple solution is to have rita copy the input collections when starting analysis. However, a copy could be very expensive, so we need to benchmark it. I have a suspicion that an identity-based aggregation may be faster than .find().foreach(x => .insert(x)). Unfortunately, aggregations only work within the same database. That's not a deal breaker though. We could have conn-in, http-in, dns-in, and conn, http, dns collections all in the same database: one collection for input, and the other copied from the input collection for analysis.
If someone runs rita analyze daily without specifying which database to analyze, we are guaranteed RITA will attempt to analyze all of the loaded databases at some point in the day.
If IPFIX-RITA inserts before the analyze command is run on the database, all is well.
If IPFIX-RITA inserts while the analyze command is being run on the database, inconsistent results are likely to appear.
If IPFIX-RITA inserts after the analyze command is run on the database, the data will not be included in the analysis results.
If we had a flag in the MetaDB that marked whether a database was ready for analysis, this problem would be somewhat solved. rita analyze would only pick up on databases ready to be analyzed, avoiding the analysis of databases that are still loading. However, at some point IPFIX-RITA would need to mark a database ready for analysis. If we received the data in order, we could mark the database ready after the output stream timestamps cross midnight. Unfortunately, we do not receive the data in order. A threshold of so-long-after-midnight could probably be established, but more testing would need to be done to come up with a good value for that threshold.
The traditional Bro importer also suffers from this problem. If two instances of rita are run concurrently, one instance of rita could start an import, while the other instance runs analysis, leading to the same issues as described above. A ready to analyze flag would solve this case as well. In this case, the time to set the flag is clearly defined.
We can implement a ready to analyze flag by adding the field import_finished to RITA's MetaDatabase database records.
Current MetaDB Database schema:
DBMetaInfo struct {
    ID             bson.ObjectId `bson:"_id,omitempty"`   // Ident
    Name           string        `bson:"name"`            // Top level name of the database
    Analyzed       bool          `bson:"analyzed"`        // Has this database been analyzed
    ImportVersion  string        `bson:"import_version"`  // Rita version at import
    AnalyzeVersion string        `bson:"analyze_version"` // Rita version at analyze
}
How to Alter the Import Process
- Before a record is inserted into RITA, the appropriate MetaDatabase database record is created.
- Records are inserted into the database referenced by the MetaDatabase database record
- (new) When it is known that no more records will be inserted into the database referenced by the MetaDatabase record, the import_finished flag is set to true
How to Alter the Analyze Process
- Loop over the databases registered in the MetaDatabase database collection
- If the database is already analyzed, remove it from consideration
- If the database is incompatible with the running version of rita, remove it from consideration
- (new) If the import process is still altering the database (import_finished == false), remove it from consideration
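The analyze-side filtering above can be sketched in Go as follows. This is only an illustration under stated assumptions: the struct is a trimmed copy of the MetaDB record with the proposed ImportFinished field added and the bson tags dropped so that no driver is needed, and compatible is a hypothetical stand-in for RITA's real version check (here it just compares strings).

```go
package main

import "fmt"

// DBMetaInfo is a trimmed copy of the MetaDB record plus the proposed flag.
// bson tags are omitted so the sketch has no driver dependency.
type DBMetaInfo struct {
	Name           string
	Analyzed       bool
	ImportVersion  string
	ImportFinished bool // proposed field, stored as import_finished
}

// compatible is a hypothetical stand-in for RITA's version check.
// Here it only tests the two version strings for equality.
func compatible(importVersion, runningVersion string) bool {
	return importVersion == runningVersion
}

// readyForAnalysis keeps only the databases that pass all three filters.
func readyForAnalysis(dbs []DBMetaInfo, runningVersion string) []DBMetaInfo {
	var ready []DBMetaInfo
	for _, db := range dbs {
		if db.Analyzed {
			continue // already analyzed
		}
		if !compatible(db.ImportVersion, runningVersion) {
			continue // imported by an incompatible version
		}
		if !db.ImportFinished {
			continue // (new) the importer may still be writing
		}
		ready = append(ready, db)
	}
	return ready
}

func main() {
	dbs := []DBMetaInfo{
		{Name: "done", Analyzed: true, ImportVersion: "v1", ImportFinished: true},
		{Name: "loading", ImportVersion: "v1", ImportFinished: false},
		{Name: "ready", ImportVersion: "v1", ImportFinished: true},
	}
	for _, db := range readyForAnalysis(dbs, "v1") {
		fmt.Println(db.Name)
	}
}
```

Records written before the field existed would decode with ImportFinished set to false, so a real change would probably need a migration or a default for old databases.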
For IPFIX-RITA, it is difficult to know with certainty when incoming data corresponding with a database will stop.
Currently, IPFIX-RITA chooses which database to send a record to based on its closing timestamp. If the data does not arrive in order, which it usually does not, it is hard to determine ahead of time if any more records will be sent to a given database.
We make several assumptions to ease the decision making process:
- The system is efficient. The input buffer will not grow without bound, which would create an ever larger lag between wall time and the timestamps of the input data.
- The data is loosely correlated in time.
  - While the data may arrive somewhat out of order, there exists a large enough window such that, when the timestamps are averaged within that window, the averages grow monotonically with time.
- The closing timestamps of the data arriving at the collector will match either the current day or the previous day by wall time.
  - NOTE: This assumption breaks current functionality when processing arbitrary YAF flows from pcap files
Using these assumptions we come up with the following:
We insert records into databases according to the following plan:
- Given a duration, d, a relative timestamp to act as a cutoff, r_cutoff, a record's closing timestamp, t_close, and the current time, t_current:
  - Calculate the current period, p_current, as floor(t_current / d)
  - Calculate the current relative timestamp, r_current, as t_current % d
  - Calculate the period of the closing timestamp, p_close, as floor(t_close / d)
  - If p_close == p_current:
    - Insert the record into the current period's database
  - Else if p_close == p_current - 1 AND r_current < r_cutoff:
    - Insert the record into the previous period's database
  - Else:
    - Drop the record
If we follow this plan, then at t_current == d * p_current + r_cutoff, we can set import_finished to true for the database of the previous period, p_current - 1.
This plan allows a grace period for records from the previous period to make it into their period's database.
If we set d to 24 hours, the algorithm reads a bit more simply:
- If the date of the closing timestamp of the record is the current day,
  - Insert the record into today's database
- Else if the date of the closing timestamp of the record is yesterday, and today's grace period has not elapsed,
  - Insert the record into yesterday's database
- Else
  - Drop the record
Then, for each day, we can set the import_finished flag to true for yesterday's database after the grace period has elapsed.