Comments (8)
Thinking on this more, could the IP regex be delegated to stdlib's Resolv::AddressRegex?
[7] pry(main)> Resolv::AddressRegex
=> /(?:(?-mix:\A((?x-mi:0
|1(?:[0-9][0-9]?)?
|2(?:[0-4][0-9]?|5[0-5]?|[6-9])?
|[3-9][0-9]?))\.((?x-mi:0
|1(?:[0-9][0-9]?)?
|2(?:[0-4][0-9]?|5[0-5]?|[6-9])?
|[3-9][0-9]?))\.((?x-mi:0
|1(?:[0-9][0-9]?)?
|2(?:[0-4][0-9]?|5[0-5]?|[6-9])?
|[3-9][0-9]?))\.((?x-mi:0
|1(?:[0-9][0-9]?)?
|2(?:[0-4][0-9]?|5[0-5]?|[6-9])?
|[3-9][0-9]?))\z))|(?:(?x-mi:
(?:(?x-mi:\A
(?:[0-9A-Fa-f]{1,4}:){7}
[0-9A-Fa-f]{1,4}
\z)) |
(?:(?x-mi:\A
((?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?) ::
((?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)
\z)) |
(?:(?x-mi:\A
((?:[0-9A-Fa-f]{1,4}:){6,6})
(\d+)\.(\d+)\.(\d+)\.(\d+)
\z)) |
(?:(?x-mi:\A
((?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?) ::
((?:[0-9A-Fa-f]{1,4}:)*)
(\d+)\.(\d+)\.(\d+)\.(\d+)
\z))))/
EDIT: Yes, there is a good reason. The regex is anchored with \A and \z, so it wouldn't match an address embedded in a longer log line.
from logstop.
Hey @bjeanes, great suggestion. I created an ipv6 branch for this. Unfortunately, there's a pretty big performance hit (slows down log throughput by around 33%), so need to decide what to do. It also needs to work with URL encoding. Anyways, you should be able to use the regexp from the branch for your use case for now.
https://github.com/ankane/logstop/compare/ipv6
Hah!
Regexp.new(Resolv::AddressRegex.source.gsub('\A', '\b').gsub('\z', '\b'))
^^ that is exactly what I have just deployed to production already!
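For anyone following along, here is a minimal sketch of what that one-liner does when applied to a log line. The constant name `UNANCHORED_IP`, the sample lines, and the `[FILTERED]` placeholder are made up for illustration and are not taken from logstop.

```ruby
require "resolv"

# Swap the \A / \z anchors for word boundaries so the stdlib pattern can
# match an address embedded in a longer string.
# UNANCHORED_IP is a hypothetical name used only in this sketch.
UNANCHORED_IP = Regexp.new(
  Resolv::AddressRegex.source.gsub('\A', '\b').gsub('\z', '\b')
)

line = "Started GET /health for 203.0.113.7 at 2020-01-01"

# The original, anchored pattern only matches when the whole string is an address.
puts Resolv::AddressRegex.match?(line)            # false
puts Resolv::AddressRegex.match?("203.0.113.7")   # true

# The unanchored variant finds addresses inside the line.
puts line.gsub(UNANCHORED_IP, "[FILTERED]")
puts "client 2001:db8::1 connected".gsub(UNANCHORED_IP, "[FILTERED]")
```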
slows down log throughput by around 33%
I've deployed this and the slowdown is noticeable in web RPMs too.
so need to decide what to do
I run on Heroku so my logs go via STDOUT. I do wonder if I'm better off with a solution that filters the logs outside of the core Ruby process, even if backed by this gem. That should, at least in theory, allow for some better use of multiple cores...
I'm surprised it's noticeable at the RPM level. What difference are you seeing there?
Just ran benchmarks with Ruby 2.7 and latest code:
Warming up --------------------------------------
no ipv6 3.126k i/100ms
ipv6 1.938k i/100ms
Calculating -------------------------------------
no ipv6 34.599k (± 3.5%) i/s - 175.056k in 5.065833s
ipv6 19.874k (± 3.4%) i/s - 100.776k in 5.076570s
It still appears to reduce throughput significantly, but 20k iterations per second is still pretty fast and most of the time spent in an application is not in logging. Will plan to merge once IPv6 has more adoption.
Edit: another approach could be to use a less complex regex if common sources of IPs (like Rack::Request) use a specific format.
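A rough way to reproduce this kind of comparison with only the standard library is sketched below. The numbers above look like benchmark-ips output; this sketch uses stdlib `Benchmark` instead, and `IPV4_ONLY`, `IPV4_AND_IPV6`, `SAMPLE_LINE`, and `scrubs_per_second` are hypothetical names. The IPv4-only pattern is an assumed stand-in and may differ from what logstop actually uses, so absolute numbers will not match the ones reported here.

```ruby
require "benchmark"
require "resolv"

# Assumed stand-in for the "no ipv6" case; logstop's real pattern may differ.
IPV4_ONLY = /\b\d{1,3}(?:\.\d{1,3}){3}\b/

# Stdlib pattern (IPv4 + IPv6) with its anchors swapped for word boundaries.
IPV4_AND_IPV6 = Regexp.new(
  Resolv::AddressRegex.source.gsub('\A', '\b').gsub('\z', '\b')
)

SAMPLE_LINE = 'Started GET "/users/1" for 203.0.113.7 at 2020-01-01 12:00:00'

# Returns iterations per second for scrubbing SAMPLE_LINE with the given pattern.
def scrubs_per_second(pattern, iterations = 20_000)
  elapsed = Benchmark.realtime do
    iterations.times { SAMPLE_LINE.gsub(pattern, "[FILTERED]") }
  end
  iterations / elapsed
end

puts format("no ipv6: %.0f i/s", scrubs_per_second(IPV4_ONLY))
puts format("ipv6:    %.0f i/s", scrubs_per_second(IPV4_AND_IPV6))
```

This only times the regex substitution itself, not a full logger pipeline, so it is best read as a relative comparison between the two patterns.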
Edit: another approach could be to use a less complex regex if common sources of IPs (like Rack::Request) use a specific format.
Yeah, or use a less complex regex regardless. It probably is better (from GDPR etc standpoint) to accidentally filter non-IPs but catch all actual IPs than to only catch IPs but take a large hit in performance. This could even be a configuration option, that swaps in a more accurate regex when the user opts into the trade-off of slower perf.
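To make the opt-in idea concrete, here is a minimal sketch of such a switch. `IpScrubber`, its `strict:` keyword, and the `[FILTERED]` placeholder are hypothetical and not part of logstop's API. The loose pattern here only covers dotted-quad IPv4; a loose IPv6 pattern is left out of this sketch.

```ruby
require "resolv"

# Hypothetical helper for illustration only; not part of logstop.
module IpScrubber
  # Loose: any four dot-separated groups of 1-3 digits. Cheap to run, but it
  # also filters strings that are not valid addresses, such as 999.1.1.1.
  LOOSE = /\b\d{1,3}(?:\.\d{1,3}){3}\b/

  # Strict: the stdlib IPv4 + IPv6 pattern with anchors swapped for word
  # boundaries. More accurate, but a much larger regex.
  STRICT = Regexp.new(
    Resolv::AddressRegex.source.gsub('\A', '\b').gsub('\z', '\b')
  )

  # Replaces anything the chosen pattern matches with a placeholder.
  def self.scrub(msg, strict: false)
    msg.gsub(strict ? STRICT : LOOSE, "[FILTERED]")
  end
end

msg = "peer 999.1.1.1 and 10.0.0.1"
puts IpScrubber.scrub(msg)                # loose: filters both dotted quads
puts IpScrubber.scrub(msg, strict: true)  # strict: leaves the invalid 999.1.1.1
```

The default is the loose pattern, matching the suggestion that over-filtering is the safer failure mode; the caller opts into the slower, more accurate pattern explicitly.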