
legitbot's People

Contributors

ajoneil, ajwgibson, alaz, allaud, dlackty, github-actions[bot], kirichkov

legitbot's Issues

NoMethodError: undefined method `empty?' for nil:NilClass

Thanks for this gem. I have recently seen a lot of error logs like this:

NoMethodError: undefined method `empty?' for nil:NilClass
…ms/legitbot-1.0.0/lib/legitbot/validators/ip_ranges.rb:43:in `valid_ip?'
…ms/legitbot-1.0.0/lib/legitbot/validators/ip_ranges.rb:20:in `valid_ip?'

I think it is caused by an unusual visitor IP that ends up as nil. Is there a way to handle the nil value?
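One way to keep a nil or malformed address from reaching the validator is to parse it first. The sketch below is only an illustration: `parse_ip` is a hypothetical helper, not part of legitbot, and it assumes the caller skips the bot check when it returns nil.

```ruby
require 'ipaddr'

# Hypothetical helper (not part of legitbot): returns an IPAddr,
# or nil when the input is nil, blank, or not a valid address.
def parse_ip(raw)
  text = raw.to_s.strip
  return nil if text.empty?

  IPAddr.new(text)
rescue IPAddr::Error
  # IPAddr::InvalidAddressError is a subclass of IPAddr::Error.
  nil
end
```

A caller could then run the bot check only when `parse_ip(request.ip)` is not nil.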

Petalbot tests are failing

Hey @allaud,

Petalbot tests are failing: https://github.com/alaz/legitbot/runs/2649297975

  1) Failure:
PetalbotTest#test_valid_ip [/home/runner/work/legitbot/legitbot/test/petalbot_test.rb:16]:
{:msg=>"114.119.153.50 is a valid Petalbot IP"}

  2) Failure:
PetalbotTest#test_valid_ua [/home/runner/work/legitbot/legitbot/test/petalbot_test.rb:34]:
{:msg=>"Valid Petalbot"}

These pages show 404:

Do you know if this bot still operates?

Resolv issues with googlebot sometimes

Thanks again for this project :)

I've been getting this error occasionally:
DNS result has no information for crawl-95-216-33-117.googlebot.com

I can rescue the error and return nil around the Legitbot.bot call in my rack_attack config, but I would love to solve the actual problem as well.

It's strange that it says "no information" when it has clearly resolved the IP to crawl-95-216-33-117.googlebot.com.

Hmm, maybe the reverse DNS lookup succeeds in getting the name, but the forward lookup of that name then fails?

The IP reported in my error logs is in fact 95.216.33.117


/usr/local/rvm/rubies/ruby-2.7.2/lib/ruby/2.7.0/resolv.rb:379:in `getaddress'
/u/apps/ap.next/shared/bundle/ruby/2.7.0/gems/legitbot-1.5.1/lib/legitbot/validators/domains.rb:66:in `reverse_ip'
/u/apps/ap.next/shared/bundle/ruby/2.7.0/gems/legitbot-1.5.1/lib/legitbot/validators/domains.rb:48:in `valid_domain?'
/u/apps/ap.next/shared/bundle/ruby/2.7.0/gems/legitbot-1.5.1/lib/legitbot/validators/domains.rb:22:in `valid_domain?'
/u/apps/ap.next/shared/bundle/ruby/2.7.0/gems/legitbot-1.5.1/lib/legitbot/botmatch.rb:25:in `valid?'
/u/apps/ap.next/shared/bundle/ruby/2.7.0/gems/legitbot-1.5.1/lib/legitbot/botmatch.rb:29:in `fake?'
/u/apps/ap.next/current/config/initializers/rack_attack.rb:16:in `block in '
/u/apps/ap.next/shared/bundle/ruby/2.7.0/gems/rack-attack-6.5.0/lib/rack/attack/check.rb:15:in `matched_by?'
/u/apps/ap.next/shared/bundle/ruby/2.7.0/gems/rack-attack-6.5.0/lib/rack/attack/configuration.rb:72:in `block in blocklisted?'
/u/apps/ap.next/shared/bundle/ruby/2.7.0/gems/rack-attack-6.5.0/lib/rack/attack/configuration.rb:72:in `any?'
/u/apps/ap.next/shared/bundle/ruby/2.7.0/gems/rack-attack-6.5.0/lib/rack/attack/configuration.rb:72:in `blocklisted?'
/u/apps/ap.next/shared/bundle/ruby/2.7.0/gems/rack-attack-6.5.0/lib/rack/attack.rb:107:in `call'
/u/apps/ap.next/shared/bundle/ruby/2.7.0/gems/newrelic_rpm-8.3.0/lib/new_relic/agent/instrumentation/middleware_tracing.rb:100:in `call'
/u/apps/ap.next/shared/bundle/ruby/2.7.0/gems/rack-2.2.3/lib/rack/tempfile_reaper.rb:15:in `call'
/u/apps/ap.next/shared/bundle/ruby/2.7.0/gems/newrelic_rpm-8.3.0/lib/new_relic/agent/instrumentation/middleware_tracing.rb:100:in `call'
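One possible reading of this trace, not confirmed in the thread: a PTR record is published by whoever controls the reverse zone for an IP, so the reverse lookup can return a googlebot.com name even when no forward record exists for that name, and `getaddress` then raises `Resolv::ResolvError`. The sketch below treats that case as a failed check instead of an exception. `forward_confirmed?` is a hypothetical helper, not legitbot's API, and the `dns:` parameter exists only so that a stub can be passed in.

```ruby
require 'resolv'

# Hypothetical helper: forward-confirmed reverse DNS that returns
# false instead of raising when either lookup has no answer.
def forward_confirmed?(ip, dns: Resolv::DNS.new)
  name = dns.getname(ip).to_s               # raises ResolvError if no PTR
  addresses = dns.getaddresses(name)        # returns [] if no A/AAAA record
  addresses.map(&:to_s).include?(ip)
rescue Resolv::ResolvError
  false
end
```

With this shape, a spoofed PTR record simply fails the check, which is the outcome the rack_attack blocklist rule needs.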

FacebookBot mislabeled as fake

I'm starting to see FacebookBot labeled as a fake search engine even though the IP address is genuine. I think the issue is in how the SegmentTree is built.

The IP in question is 69.171.251.1

I get the following in the console:

irb> ranges = Legitbot::Facebook.reload!
=> {:ipv4=>SegmentTree(31.13.24.0..204.15.23.255), :ipv6=>SegmentTree(2401:db00::..2a03:2887:ff34:ffff:ffff:ffff:ffff:ffff)}
irb> ranges[:ipv4].find(IPAddr.new('69.171.251.1'))
=> nil

On the other hand:

irb> ip = IPAddr.new('31.13.24.0')
=> #<IPAddr: IPv4:31.13.24.0/255.255.255.255>
irb> ranges[:ipv4].find(ip)
=> #<SegmentTree::Segment:0x00000000085acdc0 @range=#<IPAddr: IPv4:31.13.24.0/255.255.248.0>..#<IPAddr: IPv4:31.13.31.255/255.255.248.0>, @value=true>

The IPv4 SegmentTree reports a very broad overall range (31.13.24.0..204.15.23.255), yet the lookup for a valid IP address inside it returns nil, so a legitimate bot is labeled as fake and blocked.
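The thread does not establish the cause. One hypothesis worth checking is overlapping prefixes in the source list, since a binary search over sorted segments can miss entries when segments overlap. The sketch below is an independent cross-check, not legitbot's implementation: it merges CIDR blocks into disjoint integer spans and then searches them. `merge_cidrs` and `covered?` are hypothetical names.

```ruby
require 'ipaddr'

# Collapse CIDR blocks into sorted, non-overlapping [low, high] integer spans.
def merge_cidrs(cidrs)
  spans = cidrs.map do |cidr|
    range = IPAddr.new(cidr).to_range
    [range.begin.to_i, range.end.to_i]
  end
  spans.sort.each_with_object([]) do |(low, high), merged|
    if merged.any? && low <= merged.last[1] + 1
      # Overlapping or adjacent: extend the previous span.
      merged.last[1] = [merged.last[1], high].max
    else
      merged << [low, high]
    end
  end
end

# Binary search is safe here because the merged spans are disjoint and sorted.
def covered?(merged, ip)
  n = IPAddr.new(ip).to_i
  span = merged.bsearch { |(_low, high)| high >= n }
  !span.nil? && span[0] <= n
end
```

Comparing `covered?` against the SegmentTree result for the same prefix list would show whether the lookup or the source data is at fault.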

NoMethodError: undefined method `index' for nil:NilClass

Thank you for this gem. I found the error below in my logs; it seems to occur when the client's user agent is missing. Is there any way to avoid such errors?

NoMethodError: undefined method `index' for nil:NilClass


…uby/2.6.0/gems/legitbot-1.0.0/lib/legitbot/legitbot.rb:23:in `block(2 levels) in bot'
…uby/2.6.0/gems/legitbot-1.0.0/lib/legitbot/legitbot.rb:23:in `any?'
…uby/2.6.0/gems/legitbot-1.0.0/lib/legitbot/legitbot.rb:23:in `block in bot'
…uby/2.6.0/gems/legitbot-1.0.0/lib/legitbot/legitbot.rb:23:in `select'
…uby/2.6.0/gems/legitbot-1.0.0/lib/legitbot/legitbot.rb:23:in `bot'
…200810084721/app/controllers/application_controller.rb:164:in `check_robot'
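A caller-side guard can avoid the error when a request arrives without a User-Agent header. This is a minimal sketch: `with_user_agent` is a hypothetical helper, not part of legitbot, and it assumes that skipping the bot check is acceptable for such requests.

```ruby
# Hypothetical helper: yields a non-empty user agent string to the block,
# or returns nil without calling the block when the header is missing.
def with_user_agent(user_agent)
  ua = user_agent.to_s.strip
  return nil if ua.empty?

  yield ua
end

# Possible use inside a controller (assumes a Rails-style request object):
#   bot = with_user_agent(request.user_agent) { |ua| Legitbot.bot(ua, request.ip) }
```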

Possible Facebook RADB source issue?

We have been seeing errors for Facebook crawlers over the last couple of days. Walking through the code, it seems to fail when a source is not provided to the Irrc client:

client.query :radb, 'AS32934'
result = client.perform
Connecting to whois.radb.net
Processing AS32934
Executing "!s-*"
Got "F One or more selected sources are unavailable.
"
'!s-*' failed on 'whois.radb.net' (F One or more selected sources are unavailable.). when processing AS32934 for AS32934
No more queries
Closing a connection to whois.radb.net
Queue 0 guard objects
=> {}

Once a source is provided it seems to behave more as expected:

client.query :radb, 'AS32934', source: :radb
result = client.perform
Connecting to whois.radb.net
Processing AS32934
Executing "!sradb"
Got "C
"
Queue new 0 queries
No more queries
Closing a connection to whois.radb.net
Queue 0 guard objects
=> {"AS32934"=>
  {:ipv4=>
    {"AS32934"=>
      ["31.13.24.0/21",
       "31.13.64.0/18",
       "31.13.64.0/19",

Could something have changed with the service?

Facebook bot makes request as soon as gem is required

Hey, thanks for your work on this gem. I've noticed something while running tests, and I think it may require a change to the internals of Legitbot.

As it stands, legitbot makes actual web requests as soon as it's required, even before any calls to bot.valid? are made, because the Facebook bot matcher loads ValidIPs in the class declaration. Is there any way around this?
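A common way to avoid work at require time is to defer the load until the first lookup. The sketch below illustrates the idea only: `LazyValue` is a hypothetical class, not legitbot's internals, and the loader block stands in for whatever fetches the IP ranges.

```ruby
# Hypothetical wrapper: runs the loader block once, on first access,
# instead of when the class body is evaluated.
class LazyValue
  def initialize(&loader)
    @loader = loader
    @mutex = Mutex.new
    @loaded = false
    @value = nil
  end

  def value
    @mutex.synchronize do
      unless @loaded
        @value = @loader.call
        @loaded = true
      end
      @value
    end
  end
end
```

A matcher could hold `LazyValue.new { fetch_ranges }` (where `fetch_ranges` is a placeholder) so that nothing touches the network until a validation actually needs the ranges, which also keeps test suites offline unless they exercise that path.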

Is the API ready for 1.0?

Dear users of legitbot,

What do you think about the public Legitbot API?
Do you want to improve it in any way before releasing version 1.0?
Is Legitbot ready for a 1.0 release?

Regards,
Alexander.

iMessageBot

I'm getting a lot of hits like this one, which are being blocked by my rack-attack setup (configured as you suggest):

E, [2022-02-03T06:53:01.889058 #1133986] ERROR -- : blocklist 47.155.9.106 GET / "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/601.2.4 (KHTML, like Gecko) Version/9.0.1 Safari/601.2.4 facebookexternalhit/1.1 Facebot Twitterbot/1.0"

I'm not sure how to get the proper list of IPs it's using. Here are the ones I've seen:
172.91.121.17
70.181.168.184
47.155.9.106
216.150.126.58
108.7.233.172
108.54.49.32
71.227.168.241
73.228.203.166
162.235.153.62
107.184.85.25
174.208.224.248

... and probably more; it's a lot of IPs.

Here's an article about it:
https://medium.com/@siggi/apples-imessage-impersonates-twitter-facebook-bots-when-scraping-cef85b2cbb7d
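Since these requests come from ordinary residential addresses, an IP list is unlikely to help. A rough alternative, based only on the log line above, is to recognize the combined user agent string itself. The sketch below is a heuristic with a hypothetical name; it assumes that a single user agent claiming both Facebook and Twitter crawler tokens is a link preview rather than either real crawler.

```ruby
# Hypothetical heuristic: true when one user agent string carries both
# the facebookexternalhit and Twitterbot tokens at the same time.
def combined_preview_ua?(user_agent)
  ua = user_agent.to_s
  ua.include?('facebookexternalhit') && ua.include?('Twitterbot')
end
```

A rack-attack rule could consult this before the fake-bot blocklist, so that such previews are handled separately from genuine impersonation attempts.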

Valid bingbot detected as fake due to multiple DNS names

I'm seeing the following behavior: 157.55.39.132 is identified as a fake bingbot, but it is legitimate. I verified this with the Bing verification tool, and one of its hostnames contains "search.msn.com".

I've identified the issue to be that the IP address has two reverse pointers:

Non-authoritative answer:
132.39.55.157.in-addr.arpa	name = po18-218.co2-6nf-srch-2b.ntwk.msn.net.
132.39.55.157.in-addr.arpa	name = msnbot-157-55-39-132.search.msn.com.

The issue stems from the use of getname instead of getnames at:

@reverse_domain ||= @dns.getname(@ip)

Changing this will require substantial changes, as all dependent code will have to work with an array of strings instead of a single string.
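The change described above could look roughly like the sketch below, which collects every PTR name with `Resolv::DNS#getnames` and keeps those under the expected domain. `reverse_names_under` is a hypothetical helper, not legitbot's code, and the `dns:` parameter exists only so that a stub can be passed in.

```ruby
require 'resolv'

# Hypothetical helper: returns all reverse names for the IP that equal
# the given domain or are subdomains of it.
def reverse_names_under(ip, domain, dns: Resolv::DNS.new)
  suffix = domain.downcase.chomp('.')
  names = dns.getnames(ip).map { |name| name.to_s.downcase.chomp('.') }
  names.select { |name| name == suffix || name.end_with?(".#{suffix}") }
end
```

Each surviving name would still need a forward lookup that resolves back to the same IP before the bot is treated as genuine.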
