Giter Club home page Giter Club logo

Comments (7)

alaz avatar alaz commented on June 23, 2024 1

Published as 0.2.7 🚀

from legitbot.

kirichkov avatar kirichkov commented on June 23, 2024

Two solutions come to my mind:

  1. Evaluate ranges and discard the smaller overlapping ranges, pass the optimized array to SegmentTree
  2. Drop SegmentTree completely and iterate for each IP range, until match is found, or all entries are not a match.

I don't like either of these approaches, as they add overhead.

from legitbot.

alaz avatar alaz commented on June 23, 2024

Replying here to your comment and that above.

With the current implementation of SegmentTree using a sorted array and binary search, I don't see how this can be solved in here. Perhaps the list of IPs can be broken up beforehand into the least-common-denominator netmask and passed to SegmentTree as a list of same-sized netmask ranges.

There is a risk to inflate the list in this approach. It could be critical assuming some ranges in FB may be quite wide, especially in IPv6 space. This is also a reason why (2) "drop SegmentTree completely" is hardly an option.

Otherwise we could try to remove smaller ranges that have wider parents. What's on my mind is to sort the list and reduce it skipping those ranges that fit completely into a preceding wider one. This algo will not be able to normalize ranges that overlap partially (I hope FB will not output such a mess). This matches your proposal (1) as far as I understand and IMHO it's a viable approach. What do you think?

from legitbot.

kirichkov avatar kirichkov commented on June 23, 2024

There's also third way that might solve this:

Sort the array of ranges based on their end, as SegmentTree sorts it based on the start. Pass the sorted array, and bypass SegmentTree's sort:

SegmentTree.new(@sorted_ranges, true)

from legitbot.

kirichkov avatar kirichkov commented on June 23, 2024

There is a risk to inflate the list in this approach. It could be critical assuming some ranges in FB may be quite wide, especially in IPv6 space. This is also a reason why (2) "drop SegmentTree completely" is hardly an option.

Agreed! As I said I don't like either.

Otherwise we could try to remove smaller ranges that have wider parents. What's on my mind is to sort the list and reduce it skipping those ranges that fit completely into a preceding wider one. This algo will not be able to normalize ranges that overlap partially (I hope FB will not output such a mess). This matches your proposal (1) as far as I understand and IMHO it's a viable approach. What do you think?

That was also something I had in mind, but I was considering those partial matches as well, and accounting for those was why I discarded the idea. I also think there was some other bot that will need implementation for this workaround.

Let me know what you think of option 3, as well. So far this looks in my head as the most viable approach, with no added overhead. I'm just not sure whether it'll work in practice :-)

from legitbot.

alaz avatar alaz commented on June 23, 2024

I also think there was some other bot that will need implementation for this workaround.

IMHO it makes no sense to bother about them beforehand. That would be premature optimization. FB is rather unique in this regard so far.

Let me know what you think of option 3, as well. So far this looks in my head as the most viable approach, with no added overhead. I'm just not sure whether it'll work in practice :-)

This looks like the easiest approach, but it needs to be thought of thoroughly algorithm-wise... it takes time. This would also need careful testing on Legitbot's side. It's also worth mentioning that we shall depend on the knowledge of "segment_tree" internal structures then.

from legitbot.

kirichkov avatar kirichkov commented on June 23, 2024

It's also worth mentioning that we shall depend on the knowledge of "segment_tree" internal structures then.

I don't quite get what you mean by this? It's the same approach as used now, only the input array is pre-sorted, and SegmentTree is instructed not to sort it beforehand.

Regarding the other points - I agree. What I'm thinking is to cobble some tests and rewrite the facebook part and run it against the tests, and then against real-world ip data and see whether it works.

As a matter of fact this bug surfaced by pure coincidence. I think at one point it will resolve itself, when Facebook change their IP ranges, but who knows when that will be.

P.S.: This suggestion might be a possible solution too.

from legitbot.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.