Comments (30)

azurit avatar azurit commented on June 3, 2024

Hi,
If I understand you correctly: most of the requests are handled by the CDN, but some of them are redirected (via proxy) to your site. Is that right?

from fake-bot-plugin.

dune73 avatar dune73 commented on June 3, 2024

This is what RIPE says about this client IP: https://apps.db.ripe.net/db-web-ui/query?searchtext=137.74.122.3

Why would it use bingbot as user-agent? So whatever happens with the redirect (which is a bit odd), this seems to be a legitimate alert.

What is also puzzling for me is that the client and the server (cdn.achatpc.com) are both in France and seem to be hosted by the same company.

So what is happening here exactly?

lifeforms avatar lifeforms commented on June 3, 2024

Is your machine behind a reverse proxy, and is this the IP address of the reverse proxy: 137.74.122.3?

If so, you should use mod_remoteip (if you use Apache) to trust your reverse proxies and read the X-Forwarded-For header, so that your webserver and WAF see the real connecting address.
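
To illustrate the idea behind trusting a reverse proxy, here is a minimal Python sketch of how a server could pick the real client address from X-Forwarded-For. It is not mod_remoteip's actual implementation; the function name `effective_client_ip` and its parameters are assumptions made for this example. The header is honoured only when the TCP peer is a trusted proxy, because any other client could forge it.

```python
import ipaddress


def effective_client_ip(peer_ip, x_forwarded_for, trusted_proxies):
    """Return the address a server should treat as the client.

    peer_ip: address of the TCP peer (possibly the CDN or reverse proxy).
    x_forwarded_for: raw X-Forwarded-For header value, or None.
    trusted_proxies: iterable of CIDR strings for proxies we trust.
    """
    trusted = [ipaddress.ip_network(net) for net in trusted_proxies]

    def is_trusted(ip):
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            return False  # not an IP literal (e.g. "unknown")
        return any(addr in net for net in trusted)

    # An untrusted peer could forge the header, so ignore it entirely.
    if not is_trusted(peer_ip) or not x_forwarded_for:
        return peer_ip

    # Walk the header right to left: the rightmost entries were appended
    # by the proxies closest to us. Stop at the first untrusted address.
    hops = [h.strip() for h in x_forwarded_for.split(",") if h.strip()]
    client = peer_ip
    for hop in reversed(hops):
        client = hop
        if not is_trusted(hop):
            break
    return client
```

With 137.74.122.3 listed as trusted, a request arriving from that address with `X-Forwarded-For: 66.249.66.219` would be attributed to 66.249.66.219, which is the address the WAF needs to see.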

dune73 avatar dune73 commented on June 3, 2024

Good thinking. @azurit this might be something for the readme.

 avatar commented on June 3, 2024

Hi, if I understand you correctly: most of the requests are handled by the CDN, but some of them are redirected (via proxy) to your site. Is that right?

Yes, you are right. Static content and media content are served via the CDN.

 avatar commented on June 3, 2024

Is your machine behind a reverse proxy, and is this the IP address of the reverse proxy: 137.74.122.3?

If so, you should use mod_remoteip (if you use Apache) to trust your reverse proxies and read the X-Forwarded-For header, so that your webserver and WAF see the real connecting address.

Yes. I didn't know about the "mod_remoteip" module. I will have a look at it. Thank you.

azurit avatar azurit commented on June 3, 2024

Yes, you right. Static content and media content are via CDN

As @dune73 stated, there's no reason for the CDN to copy the User-Agent header. I assume that the CDN only fetches uncached files, so its requests to your site serve only a caching purpose.

 avatar commented on June 3, 2024

This is what RIPE says about this client IP: https://apps.db.ripe.net/db-web-ui/query?searchtext=137.74.122.3

Why would it use bingbot as user-agent? So whatever happens with the redirect (which is a bit odd), this seems to be a legitimate alert.

What is also puzzling for me is that the client and the server (cdn.achatpc.com) are both in France and seem to be hosted by the same company.

So what is happening here exactly?

OVH confirmed this and gave me the list of internal traffic IPs used for the CDN. This address is part of that list.

dune73 avatar dune73 commented on June 3, 2024

OK, got it. The response from @lifeforms and the confirmation from OVH make sense. You are seeing a false positive induced by the CDN that forwards the bingbot's request with its own IP address. The fake-bot-plugin then resolves the CDN's IP as a non-bing IP address and flags it as fake-bot.

Definitely a problem I did not anticipate.
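
The check described above can be sketched as a forward-confirmed reverse DNS lookup. This is a Python illustration under stated assumptions, not the plugin's actual Lua code: the `BOT_DOMAINS` table, the function names and the injectable `reverse`/`forward` resolvers are all hypothetical. It shows why a request that carries a bot User-Agent but arrives from the CDN's address gets flagged.

```python
import socket

# Hostname suffixes are illustrative assumptions, not the plugin's actual list.
BOT_DOMAINS = {
    "googlebot": (".googlebot.com", ".google.com"),
    "bingbot": (".search.msn.com",),
}


def system_reverse(ip):
    """PTR lookup via the system resolver; returns a hostname or None."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def system_forward(host):
    """A-record lookup; returns a (possibly empty) list of IPv4 addresses."""
    try:
        return socket.gethostbyname_ex(host)[2]
    except OSError:
        return []


def is_fake_bot(user_agent, client_ip, reverse=system_reverse, forward=system_forward):
    """True if the User-Agent claims a known bot but DNS does not back it up."""
    ua = user_agent.lower()
    for bot, suffixes in BOT_DOMAINS.items():
        if bot not in ua:
            continue
        host = reverse(client_ip)
        # No PTR record, or a PTR outside the bot's domains: fake.
        if host is None or not host.lower().rstrip(".").endswith(suffixes):
            return True
        # The PTR hostname must resolve back to the same address.
        return client_ip not in forward(host)
    return False  # User-Agent does not claim to be a known bot
```

If `client_ip` is the CDN's address rather than the crawler's, the PTR lookup returns a non-bot hostname and the request is reported as fake even though the original sender was genuine.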

 avatar commented on June 3, 2024

Thank you all for the help.

I solved it with mod_remoteip. I now have the client's source IP in the headers and logs.

But I can now see this error:

[Mon Feb 21 14:31:56.702061 2022] [:error] [pid 1003560:tid 139909004248832] [client 66.249.66.219:0] [client 66.249.66.219] ModSecurity: Warning. Fake Bot Plugin: Detected fake Googlebot. [file "/etc/modsecurity/plugins/fake-bot-after.conf"] [line "27"] [id "9504110"] [msg "Fake bot detected: Googlebot"] [data "Matched Data: googlebot found within REQUEST_HEADERS:User-Agent: Googlebot-Image/1.0"] [severity "CRITICAL"] [ver "fake-bot-plugin/1.0.0"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-bot"] [tag "capec/1000/225/22/77/13"] [tag "PCI/6.5.10"] [tag "paranoia-level/1"] [hostname "cdn.achatpc.com"] [uri "/media/catalog/product/2/7/27003969111020_5173564641.jpg"] [unique_id "YhOUTPQirk8Eh64A9KR43AAAF7c"]

When I check the IP 66.249.66.219, I can see that it is a Google IP.

I use some Google services: Google Ads and Google Merchant Center. I think that Merchant Center may use other IP addresses to check catalog content (images, links, ...). Here, the agent is Googlebot-Image.

azurit avatar azurit commented on June 3, 2024

@dune73 I don't think that CDN should copy the User-Agent header - it is only caching content so, next time, it can be served locally.

azurit avatar azurit commented on June 3, 2024

@achatpc Googlebot-Image is working fine here (not blocked), so it seems that your ModSecurity still sees a real IP address. Have you checked whether you are able to do a reverse DNS lookup of IP 66.249.66.219 on the server where your site is running?

azurit avatar azurit commented on June 3, 2024

@achatpc Are you willing to do some debugging?

dune73 avatar dune73 commented on June 3, 2024

@azurit: copying is probably the wrong term. Let's say it forwards the request via a new TCP connection with its own IP as the source IP.

Keeping my fingers crossed that Lua sees the correct REMOTE_ADDR.

 avatar commented on June 3, 2024

host 66.249.66.219
219.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-219.googlebot.com.

But I got another error before that:

[Mon Feb 21 15:15:22.022090 2022] [:error] [pid 1020408:tid 139849227036416] [client 176.83.218.174:0] [client 176.83.218.174] ModSecurity: collections_remove_stale: Failed to access DBM file "/var/cache/modsecurity/magento_user-global": Permission denied [hostname "cdn.achatpc.com"] [uri "/static/version1645361515/frontend/Emipro/achatpc/fr_FR/Amasty_Scroll/images/loader.svg"] [unique_id "YhOeehcI88A3x7BZk3E_nQAAGDM"], referer: https://www.achatpc.com/

 avatar commented on June 3, 2024

@azurit: copying is probably the wrong term. Let's say it forwards the request via a new TCP connection with its own IP as the source IP.

Keeping my fingers crossed that Lua sees the correct REMOTE_ADDR.

If Lua reads the access log, then yes, because I have changed the Apache log format:

Before:

LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined
LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %O" common
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent

After:

LogFormat "%v:%p %a %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined
LogFormat "%a %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%a %l %u %t \"%r\" %>s %O" common
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent

Using %a instead of %h logs the client IP address (remote_addr).

azurit avatar azurit commented on June 3, 2024

@achatpc Please edit the file fake-bot.lua and add this on line 63 (almost at the end of the file):
m.log(2, string.format("Fake Bot Plugin DEBUG: REMOTE_ADDR: %s REMOTE_HOST: %s", remote_addr, remote_host))

so it will look like this:

end
m.log(2, string.format("Fake Bot Plugin DEBUG: REMOTE_ADDR: %s REMOTE_HOST: %s", remote_addr, remote_host))
m.setvar("tx.fake-bot-plugin_bot_name", bot_name)
return string.format("Fake Bot Plugin: Detected fake %s.", bot_name)

Reload the web server and wait for another request from Googlebot-Image. You should then see some debug info in the logs.

The Fake Bot plugin does not use the web server logs, so their format doesn't matter.

 avatar commented on June 3, 2024

@achatpc Please edit the file fake-bot.lua and add this on line 63 (almost at the end of the file): m.log(2, string.format("Fake Bot Plugin DEBUG: REMOTE_ADDR: %s REMOTE_HOST: %s", remote_addr, remote_host))

so it will look like this:

end
m.log(2, string.format("Fake Bot Plugin DEBUG: REMOTE_ADDR: %s REMOTE_HOST: %s", remote_addr, remote_host))
m.setvar("tx.fake-bot-plugin_bot_name", bot_name)
return string.format("Fake Bot Plugin: Detected fake %s.", bot_name)

Reload the web server and wait for another request from Googlebot-Image. You should then see some debug info in the logs.

The Fake Bot plugin does not use the web server logs, so their format doesn't matter.

[Mon Feb 21 16:12:58.350226 2022] [:error] [pid 1036742:tid 140554708952832] [client 66.249.66.219:0] [client 66.249.66.219] ModSecurity: Fake Bot Plugin DEBUG: REMOTE_ADDR: 66.249.66.219 REMOTE_HOST: 137.74.122.35 [hostname "cdn.achatpc.com"] [uri "/media/catalog/product/cache/993aa024f1a2c812c347b0876f4d0efd/4/1/41810189304034_5675549042.jpg"] [unique_id "YhOr-o-YEHPaOX93HVxKdwAAFhQ"]
[Mon Feb 21 16:12:58.350664 2022] [:error] [pid 1036742:tid 140554708952832] [client 66.249.66.219:0] [client 66.249.66.219] ModSecurity: Warning. Fake Bot Plugin: Detected fake Googlebot. [file "/etc/modsecurity/plugins/fake-bot-after.conf"] [line "27"] [id "9504110"] [msg "Fake bot detected: Googlebot"] [data "Matched Data: googlebot found within REQUEST_HEADERS:User-Agent: Googlebot-Image/1.0"] [severity "CRITICAL"] [ver "fake-bot-plugin/1.0.0"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-bot"] [tag "capec/1000/225/22/77/13"] [tag "PCI/6.5.10"] [tag "paranoia-level/1"] [hostname "cdn.achatpc.com"] [uri "/media/catalog/product/cache/993aa024f1a2c812c347b0876f4d0efd/4/1/41810189304034_5675549042.jpg"] [unique_id "YhOr-o-YEHPaOX93HVxKdwAAFhQ"]
[Mon Feb 21 16:13:16.494867 2022] [:error] [pid 1031694:tid 140555434333952] [client 92.184.105.227:0] [client 92.184.105.227] ModSecurity: collections_remove_stale: Failed to access DBM file "/var/cache/modsecurity/magento_user-global": Permission denied [hostname "cdn.achatpc.com"] [uri "/static/version1645361515/frontend/Emipro/achatpc/fr_FR/fonts/opensans/regular/opensans-400.woff2"] [unique_id "YhOsDIfnLCUjpOwKAX-suQAACQA"], referer: https://cdn.achatpc.com/static/version1645361515/_cache/merged/fonts_a0ee8e469788a2508da646d4452deae3.min.css
[Mon Feb 21 16:13:16.494927 2022] [:error] [pid 1031694:tid 140555434333952] [client 92.184.105.227:0] [client 92.184.105.227] ModSecurity: collections_remove_stale: Failed to access DBM file "/var/cache/modsecurity/magento_user-ip": Permission denied [hostname "cdn.achatpc.com"] [uri "/static/version1645361515/frontend/Emipro/achatpc/fr_FR/fonts/opensans/regular/opensans-400.woff2"] [unique_id "YhOsDIfnLCUjpOwKAX-suQAACQA"], referer: https://cdn.achatpc.com/static/version1645361515/_cache/merged/fonts_a0ee8e469788a2508da646d4452deae3.min.css

EDIT:

I fixed the ModSecurity: collections_remove_stale: Failed to access DBM file "/var/cache/modsecurity/magento_user-global" issue by correcting the folder permissions.

The fake bot error is an issue with REMOTE_HOST: 137.74.122.35

azurit avatar azurit commented on June 3, 2024

So here is the problem:
Fake Bot Plugin DEBUG: REMOTE_ADDR: 66.249.66.219 REMOTE_HOST: 137.74.122.35

Looks like mod_remoteip is updating REMOTE_ADDR but not REMOTE_HOST.
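
For anyone who wants to scan a larger error log for the same symptom, here is a small Python sketch that extracts the two values from the debug line above and reports when REMOTE_HOST is an IP literal that differs from REMOTE_ADDR. The function name `proxy_leak` is hypothetical, and the pattern assumes the exact debug message format added earlier in this thread.

```python
import ipaddress
import re

# Matches the debug message format used in this thread.
DEBUG_RE = re.compile(r"REMOTE_ADDR: (\S+) REMOTE_HOST: (\S+)")


def proxy_leak(log_line):
    """Return (remote_addr, remote_host) if both are IPs and differ, else None."""
    match = DEBUG_RE.search(log_line)
    if match is None:
        return None
    remote_addr, remote_host = match.groups()
    try:
        addr_ip = ipaddress.ip_address(remote_addr)
        host_ip = ipaddress.ip_address(remote_host)
    except ValueError:
        # REMOTE_HOST is a resolved hostname (or a value is malformed),
        # so there is nothing to compare.
        return None
    if addr_ip != host_ip:
        return remote_addr, remote_host
    return None
```

Applied to the line above, it would return `("66.249.66.219", "137.74.122.35")`, meaning the second value still holds the proxy's address.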

azurit avatar azurit commented on June 3, 2024

@achatpc Can you try current version?

 avatar commented on June 3, 2024

@achatpc Can you try current version?

What do you mean by "current version" ?

azurit avatar azurit commented on June 3, 2024

Redownload fake-bot.lua: https://github.com/coreruleset/fake-bot-plugin/blob/main/plugins/fake-bot.lua

 avatar commented on June 3, 2024

Redownload fake-bot.lua: https://github.com/coreruleset/fake-bot-plugin/blob/main/plugins/fake-bot.lua

Your change fixes the false positive in the log. But I still need to check whether detection works.

I tried this from another server:

curl http://www.achatpc.com --header "User-Agent: Googlebot"

No error in the log. I think that detection is not working.

Or my test is not correct. I cannot see any trace of my curl requests in access.log.

azurit avatar azurit commented on June 3, 2024

It works on my side (and your test seems correct). Can you check whether you also have the newest version of this file?
https://github.com/coreruleset/fake-bot-plugin/blob/main/plugins/fake-bot-after.conf

 avatar commented on June 3, 2024

Yes, if you are 217.*.*.*, it is working:

[Mon Feb 21 17:31:05.165732 2022] [:error] [pid 1065666:tid 140535896561408] [client 217.*.*.*:36946] [client 217.*.*.*] ModSecurity: Warning. Fake Bot Plugin: Detected fake Googlebot. [file "/etc/modsecurity/plugins/fake-bot-after.conf"] [line "27"] [id "9504110"] [msg "Fake bot detected: Googlebot"] [data "Matched Data: googlebot found within REQUEST_HEADERS:User-Agent: Googlebot"] [severity "CRITICAL"] [ver "fake-bot-plugin/1.0.0"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-bot"] [tag "capec/1000/225/22/77/13"] [tag "PCI/6.5.10"] [tag "paranoia-level/1"] [hostname "www.achatpc.com"] [uri "/"] [unique_id "YhO-SNRB6SChFwbpfMevmAAIkAU"]

azurit avatar azurit commented on June 3, 2024

Yes, that was me. :D

 avatar commented on June 3, 2024

Yes, that was me. :D

OK, perfect. Thanks a lot.

 avatar commented on June 3, 2024

Yes, that was me. :D

Can I contact you in private?

azurit avatar azurit commented on June 3, 2024

You can contact me at jozef at sudolsky dot sk.

azurit avatar azurit commented on June 3, 2024

Thanks for reporting and testing! Closing.
