Comments (38)

umputun commented on July 23, 2024

I have added --training mode. I think this is what you need, pls try it (it is on :master tag).

If you need a binary to run directly on the box, let me know, and I'll make one for you.

from tg-spam.

umputun commented on July 23, 2024

If in training mode instead of "Keep it banned" button (which is doing nothing) will be "Ban in the chat" button it would be enough.

The button actually does something important - it adds the message to dynamic spam samples.

The idea of allowing the actual ban in training mode sounds reasonable. Probably in this case it should show "confirm ban" instead of "keep it banned".

umputun commented on July 23, 2024

The change is in master; you can give it a try.

umputun commented on July 23, 2024

Should be resolved in the latest master and v1.3.1

umputun commented on July 23, 2024

In the meantime, I have added more debug info on locator updates and details on a failed lookup. This is on :master (version=master-436b501-20231225T16:27:17) and may provide some clues.

umputun commented on July 23, 2024

This is helpful; I see the issue. It is here: https://github.com/umputun/tg-spam/blob/master/app/events/admin.go#L393 - it restores the original message with a space separator instead of \n, and this leads to a different message with a different hash. So, messages you had working are single-line, and failed are multiline.

The issue mostly affects training mode, because in regular mode we don't need to ban the user (already banned automatically) and the reason for the locator lookup is to get the user id.

Fixing this is more complex than putting \n in here. I need to think about handling it without affecting the bot's existing hashes and the other places where getCleanMessage is used.

alehano commented on July 23, 2024

Now it's working 👍

umputun commented on July 23, 2024

The first part is already in; this is what it looks like

image

The second part is against the "dry mode" idea, as dry mode guarantees no "destructive" actions. As long as you are running in dry mode, you won't be able to send any action to the bot. However, it is an interesting idea to allow such training in some mode, I'll think about it

umputun commented on July 23, 2024

well, after some quick consideration, adding "BAN" in the dry mode to admin chat makes very little sense, as only detected spam (already banned) is forwarded. Clicking on "unban" will unban it, doing nothing will keep it banned.

The bottom line - just turn dry mode off and unban false positives. There is a section in docs explaining similar flow https://github.com/umputun/tg-spam#running-the-bot-with-an-empty-set-of-samples

alehano commented on July 23, 2024

The first part is already in; this is what it looks like

It’s strange, but I don’t have this button. Here is what the forwarded message looks like:

IMG_9440

Regarding dry mode, I can’t switch it off because of the many false positives. Maybe in my case regular messages look like spam to the bot: many commercial proposals, often with emojis, but that’s ok for my chat. So I need some way to train the bot against false positive cases.
Btw, I didn’t use any preset data except exclude-tokens.

UPD: After a closer look at the documentation, I think I should tune the similarity threshold and max-emoji. In my case, defaults don’t work well.

umputun commented on July 23, 2024

It’s strange, but I don’t have this button.

This is on the master version only; you probably run stable (:latest)

umputun commented on July 23, 2024

So I need somehow to train a bot against false positive cases.

yeah, this is a valid use case if you don't want to train it in live mode.

alehano commented on July 23, 2024

I have added --training mode. I think this is what you need, pls try it

After several days of testing and about 100 spam detections, I still have a false positive rate of about 20%. In my case it's not acceptable to switch the bot to pre-moderation because, as I understand it, deleted messages can't be recovered, and I can't afford to lose even a few.
So I'm considering using the bot as a post-moderation tool, where the bot only forwards suspicious messages to an admin group.
For that case, it would be good to be able to ban users (in the main chat) from the admin group with a button, and to train the bot when a detection is a false positive.
So now we have two modes, dry and training, which is a bit confusing and somewhat duplicative. And these modes are considered temporary.
I suggest making a single post-moderation mode instead of the two, where every message has two buttons: "ban" (to ban in the main chat) and "it's not a spam" (for training). I believe it's more convenient to show both buttons right away; now it takes an extra click. If you want to protect against accidental clicks, it's better to allow undoing an action than to confirm every time.
I guess all of that is not easy to implement and will need some refactoring, but in terms of UX I believe it will be more straightforward and convenient.

umputun commented on July 23, 2024

So now we have two modes: dry and training, which is a bit confusing and some kind of duplicate. And these modes consider as temporary.

Those modes are somewhat similar but still different enough. Dry doesn't do anything real or update any spam/ham sample; it just shows what the bot will do in real life. This mode here is to see what will be banned if one turns it on without banning anything. Training mode is the real deal; it does everything for real except the actual ban and message removal.
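
The difference between the modes can be sketched roughly as follows. This is only an illustration of the description above, not the bot's code: the `Mode` enum, the `Actions` struct and `actionsFor` are hypothetical names (the logs in this thread suggest the bot keeps separate `dry` and `trainingMode` flags instead).

```go
package main

import "fmt"

// Mode is a hypothetical enum used only for this illustration.
type Mode int

const (
	Live Mode = iota
	Training
	Dry
)

// Actions lists the side effects allowed when a message is detected as spam.
type Actions struct {
	UpdateSamples bool // spam/ham samples may be updated
	BanUser       bool // the user is actually banned
	DeleteMessage bool // the message is actually removed
}

// actionsFor encodes the description above: dry does nothing real,
// training does everything except the ban and the message removal.
func actionsFor(m Mode) Actions {
	switch m {
	case Dry:
		return Actions{}
	case Training:
		return Actions{UpdateSamples: true}
	default:
		return Actions{UpdateSamples: true, BanUser: true, DeleteMessage: true}
	}
}

func main() {
	fmt.Printf("dry:      %+v\n", actionsFor(Dry))
	fmt.Printf("training: %+v\n", actionsFor(Training))
	fmt.Printf("live:     %+v\n", actionsFor(Live))
}
```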

After a several days of testing and about 100 spam detection I still have false positive rate about 20%.

Such a rate is suspiciously high. I guess it is mostly classifier detection, but you can see info on each ban right in tg or in logs. If this is the classifier, I would suggest making ham samples from the normal messages in your group and keeping it trained with spam the way you did. Another thing you can do is turn on the openai checker in veto mode; this thing is decent with the gpt-4 model.

Regarding manual/post-moderation - well, it is possible to introduce a flag "no-auto-actions" or smth like this and remove on confirmation from the admin group only, or by forwarding missed spam.

I suggest to make post-moderation mode instead of two

I don't see how this is a replacement for a dry mode. I also don't think it should be a special mode but rather an option for the regular mode.

I guess all of that is not easy to implement

Well, the way I described above is almost trivial to implement. We probably will need a different text to be sent to admin chat in addition to a flag check, but this is not a hard thing to do.

umputun commented on July 23, 2024

If you want to protect from accidental clicks, It's better to give ability to undone some action rather than confirm every time

The second click is here because there is no way I am aware of to "undelete" message in tg bot API. Initially, it was a direct button and I have witnessed accidental clicks a few times. So, my take on this - unless there is a way to undelete, I'd rather prefer admin to struggle a little bit but not run into a case with "deleted by mistake". You also said "I can't afford to delete even few." and I agree with this, and this is exactly the reason for the confirmation "dialog".

Saying this - it is possible to implement the direct actions without any confirmation optionally, but as of now I don't really see a good reason for this. It should be something insanely active to make a difference. I mean, I could understand if one gets a spam message every second, but in any normal situation I can think of, the extra click shouldn't be a problem.

alehano commented on July 23, 2024

you can see info on each ban right in tg or in logs

Please tell me how I can check in the logs whether it was the classifier or not.
I have records like so:
{"ts":"2023-12-21T10:34:07+03:00","display_name":"Тимур","user_name":"","user_id":66_____85,"text":"this is spam: "Тимур" (66_____85)"}

I think it's really difficult for a bot (and even chat gpt) to distinguish spam in my chat.
For example these are not spam in my case:


🧡🧡🧡🧡🧡🧡 1⃣5⃣ 🧡 🧡🧡🧡🧡🧡🧡🧡 Гарантия 14 дней со дня покупки 🛍 📱15 128 / 256 / 512 15 128 Black 🇯🇵 - 82.000 15 128 Blue 🇯🇵 - 83.100 15 128 Yellow 🇯🇵- 81.500 15 128 Green 🇯🇵- 81.000 15 128 Pink 🇮🇳 - 83.200 15 128 Pink 🇨🇳 - 81.500 🔥 📱15 Pro 128/ 256/ 512/ 1Tb 15 Pro 128 Blue 🇯🇵 -109.500 15 Pro 128 Black 🇯🇵 -109.500 15 Pro 128 White 🇯🇵 -109.500 15 Pro 128 Natural 🇯🇵 -108.000🔥 15 Pro 256 White 🇯🇵 -117.900 15 Pro 256 Black 🇯🇵 -117.800 15 Pro 256 Blue 🇯🇵 -116.900 15 Pro 256 Natural 🇯🇵- 116.000🔥 15 Pro 512 Black 🇯🇵 - 140.500 15 Pro 512 Blue 🇯🇵- 137.500 15 Pro 512 Natural 🇯🇵-136.500 15 Pro 512 White 🇯🇵-138.000 15 Pro 1TB Black 🇯🇵 - 150.000 15 Pro 1TB Blue 🇯🇵 - 148.500 15 Pro 1TB Natural 🇯🇵 - 148.500 15 Pro 1TB White 🇯🇵 - 157.000 📱15 Pro Max 256/ 512/ 1TB 15 Pro Max 256 Black 🇯🇵-124.500 15 Pro Max 256 Blue 🇯🇵-130.500 15 Pro Max 256 White 🇯🇵-130.000 15 Pro Max 256 Natural 🇯🇵 -170.000🔥 15 Pro Max 512 Black 🇯🇵 -152.500 15 Pro Max 512 Blue 🇯🇵-146.000 15 Pro max 512 Natural 🇯🇵 -142.000 15 Pro Max 512 White 🇯🇵-154.500 15 Pro Max 1TB Black 🇯🇵 -164.000 15 Pro max 1Tb Blue 🇯🇵 - 166.000 15 Pro max 1Tb Natural 🇯🇵 -166.000 15 Pro max 1Tb White 🇯🇵-171.000 🇨🇳🇭🇰- 2 SIM 🇰🇼🇯🇵🇨🇦🇮🇳🇰🇷🇸🇬🇪🇺-1 Sim+E SIM Пишите в ЛС


Интересует мужская одежда оптом ?
Пишите в ЛС
Рынок Дордой


Занимаюсь продажей электроники. В основном смартфоны. Самовывоз, наличная оплата при получении. МСК и СПБ Подробнее в профиле


Уважаемые коллеги, добрый день! Предлагаем Вашему вниманию: Ледяная рыба (Champsocephalus gunnari) В ассортименте: Ледяная, целая, IQF, 250г- 🌐 Производитель: Китай 📦 Кор. 5 кг ⚖️ Средний вес 1 шт.: ~180 г. 💰 Цена - 920 ₽/кг Ледяная, целая, IQF, 250г+ 🌐 Производитель: Китай 📦 Кор. 5 кг ⚖️ Средний вес 1 шт.: ~300 г. 💰 Цена - 975 ₽/кг Подробности в личные сообщения 💬


But these are spam:


В поиске работы?Гибкий график от 4500 в день.


•Есть связка внутри BYBIT❗️ •Не скам ⚠️🌐 •Без левых oбмeникoв и бирж •Прoтестирoвать мoжнo с любoй сумы✔️ •Прибыль сo связки 1-2% 💲 •Пo всем вoпрoсам к нему


Нyжны ( paзноpабoчиe кoторые гoтовы заpaбатывать вмeсте cо мной)
Плaтим oт 15.000 до 20.000 тысяч pублeй в день
Oплaта срaзy пoслe рaботы
Пишитe в личныe сooбщeния


I should say spam detection works quite well, but sometimes the bot blocks messages that are not spam. I marked each such case as "not a spam" to add it to the ham list.

alehano commented on July 23, 2024

well, it is possible to introduce a flag "no-auto-actions"

I'm not sure adding so many modes is a good idea. In my case I just need a quick way to block spammers from an admin group. If, in training mode, the "Keep it banned" button (which does nothing) were replaced with a "Ban in the chat" button, that would be enough.

umputun commented on July 23, 2024

Please tell me how can I check is it a classifier or not in the logs.

log (in your docker container) says something like this (if --dbg set)

{Text:this is spam: "Sergiy Petrenko" (120025072) Send:true BanInterval:9600h0m0s User:{ID:120025072 
Username:grayodesa DisplayName:Sergiy Petrenko} ChannelID:0 ReplyTo:15 DeleteReplyTo:true CheckResults:
[{Name:stopword Spam:false Details:not found} {Name:emoji Spam:true Details:4/2} {Name:similarity Spam:false 
Details:0.37/0.50} {Name:classifier Spam:false Details:probability of ham: 66.21%} {Name:cas Spam:false Details:Record not found.}]}

You can see container logs by running docker logs <container name>

You can also see it if you press the "info" button

SCR-20231221-mgem
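
To read such a record, look for the entries with Spam:true. A small sketch of that check, assuming a `CheckResult` struct that only mirrors the field names visible in the log line above (Name, Spam, Details); `flaggedBy` is a hypothetical helper, not part of the bot:

```go
package main

import "fmt"

// CheckResult mirrors the field names visible in the debug log.
type CheckResult struct {
	Name    string
	Spam    bool
	Details string
}

// flaggedBy returns the names of the checks that marked the message as spam.
func flaggedBy(results []CheckResult) []string {
	var names []string
	for _, r := range results {
		if r.Spam {
			names = append(names, r.Name)
		}
	}
	return names
}

func main() {
	// Values copied from the log record above.
	results := []CheckResult{
		{"stopword", false, "not found"},
		{"emoji", true, "4/2"},
		{"similarity", false, "0.37/0.50"},
		{"classifier", false, "probability of ham: 66.21%"},
		{"cas", false, "Record not found."},
	}
	fmt.Println(flaggedBy(results)) // [emoji]
}
```

In that record the ban came from the emoji check (4 emojis against a limit of 2), not from the classifier.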

alehano commented on July 23, 2024

Thanks for adding this feature.
But after clicking Confirm ban I get this error:

error: failed confirmation ban: failed to ban user 6405_____355: Bad Request: can't restrict self

I tried a couple of times on different messages from different users. The user ID in the error message is always the same, so I suppose the bot is trying to block an ID that is not the spammer's.

alehano commented on July 23, 2024

Another issue after attempting to ban. Here is an error message:

error: failed confirmation ban: failed to find message "Нужна инfoграфика⁉️ Тогда тебе 🫵 ко мне, обращайтесь‼️ Портfолио в шапке профиля." in locator

I tried several times on different messages, sometimes less than a minute after posting, and the message was right there in the chat, the last one.

Sometimes the ban works, but mostly it doesn’t.

umputun commented on July 23, 2024

I can't reproduce it locally. It sounds like in your case the bot doesn't store records in the locator; maybe a permission issue updating tg-spam.db?

The log should show some warning messages about it.

alehano commented on July 23, 2024

The log should show some warnings messages about it

I have this in logs:

2023/12/25 08:54:17.764 [INFO] {events/listener.go:76 events.(*TelegramListener).Do} admin chat ID: -4058____3
2023/12/25 08:54:17.764 [DEBUG] {events/listener.go:95 events.(*TelegramListener).Do} admin handler created. &{tbAPI:0xc0000f80a0 bot:0xc000109a40 locator:0xc000011500 superUsers:[pret____nd al___ optl______ot al_____n V____PB] primChatID:-100______69 adminChatID:-40______3 trainingMode:true keepUser:false dry:false}

Also, if I forward the message to the admin group, the bot says the user was banned and the message deleted, but the user wasn't actually blocked.

It doesn't seem to be a permission issue. Maybe it's because I mixed the old-style FILES_ env vars in docker-compose with the new FILES_DYNAMIC. I cleaned that up and updated to the newest version, and I will keep investigating the problem.

umputun commented on July 23, 2024

This part of the log doesn't say enough about anything related to the admin action. All the admin actions have {events/admin prefix and look like this:

2023/12/25 09:15:27.753 [DEBUG] {events/admin.go:166 events.(*admin).InlineCallbackHandler} spam info sent, 
chatID: -4037717836, userID: !1099397935, orig: "permanently banned {1099397935 tonaminina 🥷}\n\n🖇 НЕЙРОСЕТЬ ДЛЯ ПОИСКА ИНТИМ ФОТО   🔞 НАЙДИ ОБНАЖЕННЫЕ ФОТО И ВИДЕО ЛЮБОЙ ДЕВУШКИ  🔍 ДЛЯ ПРОВЕРКИ, НАПИШИ В ПОИСК ТЕЛЕГРАМ: DFKL40"

It also shows some info about ban confirmation requests and ban/unban results. If you don't see any of this, this can be when the bot has no telegram permissions to read your admin chat. That "bot says the user banned and a message deleted, but the user didn't blocked" is a good symptom of this, but even in this case, you should still see log messages about attempts to ban/delete and why it failed.

Disabling privacy mode may help you if this is tg permission issue. I'm not sure why with some chats it works without disabling it and with some not, but I have seen a chat that needed it to be disabled to even get messages from the telegram.

alehano commented on July 23, 2024

Ok, I’ll check the next cases and post logs.
It’s really weird. As I mentioned, the bot was able to properly block the user only once.

Btw, privacy mode was disabled before adding the bot to the chat, and the bot has admin permissions.

umputun commented on July 23, 2024

yeah, it is odd for real. I have a testing chat that I use to debug issues. The chat is public and I can invite you in. You could post your things and we will see what's going on. When I tested the --training functionality, it worked as expected every single time.

ping me on tg if you want to try it and I'll set up a debug session. Maybe something in your messages is driving the bot/tg crazy, not sure. I have noticed that tg is strangely sensitive to things related to message markup and also has a very unexpected privacy-related paranoia in some odd cases.

alehano commented on July 23, 2024

@umputun here is the log of the last case; maybe it will make clear to you what's going on:

2023/12/25 20:21:01.716 [DEBUG] {events/listener.go:156 events.(*TelegramListener).procEvents} {"message_id":514412,"from":{"id":6426542577,"first_name":"Алексей","last_name":"Фролов","username":"G38189"},"date":1703535661,"chat":{"id":-1001540310869,"type":"supergroup","title":"Оптовый чат ОптЛист — поставщики оптом","username":"optlist_chat","photo":null,"location":null},"text":"🔞 Эскоpт модели 18+, с выездом по адpесу\n\n➡️ @Angels_Of_Lovess","entities":[{"type":"mention","offset":46,"length":17}],"message_auto_delete_timer_changed":null,"proximity_alert_triggered":null,"voice_chat_scheduled":null,"voice_chat_started":null,"voice_chat_ended":null,"voice_chat_participants_invited":null}
2023/12/25 20:21:01.716 [DEBUG] {events/listener.go:170 events.(*TelegramListener).procEvents} incoming msg: 🔞 Эскоpт модели 18+, с выездом по адpесу  ➡️ @Angels_Of_Lovess
2023/12/25 20:21:02.001 [INFO]  {bot/spam.go:84 bot.(*SpamFilter).OnMessage} user Алексей Фролов detected as spammer: {name: stopword, spam: false, details: not found}, {name: emoji, spam: false, details: 2/100}, {name: similarity, spam: false, details: 0.19/0.50}, {name: classifier, spam: true, details: probability of spam: 72.32%}, {name: cas, spam: false, details: record not found}, "🔞 Эскоpт модели 18+, с выездом по адpесу\n\n➡️ @Angels_Of_Lovess"
2023/12/25 20:21:02.001 [DEBUG] {events/listener.go:187 events.(*TelegramListener).procEvents} ban initiated for {Text:this is spam: "Алексей Фролов" (6426542577) Send:true BanInterval:9600h0m0s User:{ID:6426542577 Username:G38189 DisplayName:Алексей Фролов} ChannelID:0 ReplyTo:514412 DeleteReplyTo:true CheckResults:[{Name:stopword Spam:false Details:not found} {Name:emoji Spam:false Details:2/100} {Name:similarity Spam:false Details:0.19/0.50} {Name:classifier Spam:true Details:probability of spam: 72.32%} {Name:cas Spam:false Details:record not found}]}
2023/12/25 20:21:02.001 [DEBUG] {app/main.go:403 main.execute.makeSpamLogger.func7} spam detected from {6426542577 G38189 Алексей Фролов}, text: 🔞 Эскоpт модели 18+, с выездом по адpесу  ➡️ @Angels_Of_Lovess
2023/12/25 20:21:02.023 [INFO]  {events/events.go:128 events.banUserOrChannel} training mode: ban 6426542577 for 9600h0m0s
2023/12/25 20:21:02.023 [INFO]  {events/listener.go:205 events.(*TelegramListener).procEvents} {6426542577 G38189 Алексей Фролов} banned by bot for 9600h0m0s
2023/12/25 20:21:02.023 [DEBUG] {events/admin.go:40 events.(*admin).ReportBan} report to admin chat, ban msgsData for {6426542577 G38189 Алексей Фролов}, group: -4058117643
2023/12/25 20:21:02.023 [DEBUG] {events/admin.go:402 events.(*admin).sendWithUnbanMarkup} action response "change ban": user {ID:6426542577 Username:G38189 DisplayName:Алексей Фролов}, text: "**permanently banned [{6426542577 G38189 Алексей Фролов}](tg://user?id=6426542577)**\\n\\n🔞 Эскоpт модели 18+, с выездом по адpесу  ➡️ @Angels\\_Of\\_Lovess\\n\\n"
2023/12/25 20:21:11.736 [DEBUG] {events/admin.go:147 events.(*admin).InlineCallbackHandler} unban confirmation request sent, chatID: -4058117643, userID: 6426542577, orig: "permanently banned {6426542577 G38189 Алексей Фролов}\n\n🔞 Эскоpт модели 18+, с выездом по адpесу  ➡️ @Angels_Of_Lovess"
2023/12/25 20:21:13.063 [DEBUG] {bot/spam.go:100 bot.(*SpamFilter).UpdateSpam} update spam samples with "🔞 Эскоpт модели 18+, с выездом по адpесу  ➡️ @Angels_Of_Lovess"
2023/12/25 20:21:13.156 [WARN]  {events/listener.go:123 events.(*TelegramListener).Do} failed to process callback: failed confirmation ban: failed to find message "🔞 Эскоpт модели 18+, с выездом по адpесу  ➡️ @Angels_Of_Lovess" in locator
2023/12/25 20:21:13.156 [DEBUG] {events/listener.go:272 events.(*TelegramListener).sendBotResponse} bot response - error: failed confirmation ban: failed to find message "🔞 Эскоpт модели 18+, с выездом по адpесу  ➡️ @Angels_Of_Lovess" in locator, reply-to:0

The line with unban confirmation request sent seems suspicious to me, because I confirmed a ban rather than an unban.

alehano commented on July 23, 2024

I have noticed what tg is strangely sensitive to things related to message markup and also has a very unexpected privacy-related paranoia in some odd cases.

I remembered one thing. I'm pretty sure all spammers had premium status in Tg. Maybe it's related.

umputun commented on July 23, 2024

The line with unban confirmation request sent seems suspicious for me because I confirmed ban rather than unban

Nah, this is fine. The report is not about the action itself but rather about sending confirmation buttons to the telegram "Admin" group.

From your log it is pretty clear - for some reason bot doesn't store messages to the locator. The line with " incoming msg:" means it can get events from the primary chat, and adding to the locator db is literally the next line, which will show a warning if failed

	log.Printf("[DEBUG] incoming msg: %+v", strings.ReplaceAll(msg.Text, "\n", " "))
	if err := l.Locator.AddMessage(update.Message.Text, fromChat, msg.From.ID, msg.From.Username, msg.ID); err != nil {
		log.Printf("[WARN] failed to add message to locator: %v", err)
	}

can you send me (email or tg if you don't want to share publicly) tg-spam.db file? The file is just a list of hashes and timestamps locator uses to match on.

also, pls share how exactly you are running the bot (docker/binary) and what params (env, command line, docker-compose.yml) are used.

alehano commented on July 23, 2024

Yes, sure. I sent you my db file via email.
I checked sqlite db, messages table. And the record regarding the case above (user ID: 6426542577) is there.

And one more note: topics are enabled in my chat (they are like categories/threads). That's not common, so maybe it's somehow related.

I run the bot in Docker. Here is my docker-compose:

version: "3"
services:

  tg-spam:
    image: umputun/tg-spam:master
    hostname: tg-spam
    restart: always
    container_name: tg-spam
    user: "1000:1000" # set uid:gid to host user to avoid permission issues with mounted volumes
    logging: &default_logging
      driver: json-file
      options:
        max-size: "10m"
        max-file: "5"
    environment:
      - TZ=Europe/Moscow
      - TELEGRAM_TOKEN=64058983_____________________
      - TELEGRAM_GROUP=opt_____at # public group name to monitor and protect
      - ADMIN_GROUP=-40____43 # private group id for admin spam-reports
      - LOGGER_ENABLED=true
      - LOGGER_FILE=/srv/log/tg-spam.log
      - LOGGER_MAX_SIZE=5M
      #- FILES_DYNAMIC_SPAM=/srv/var/dynamic-spam.txt
      #- FILES_DYNAMIC_HAM=/srv/var/dynamic-ham.txt
      #- FILES_APPROVED_USERS=/srv/var/approved-users.dat
      #- FILES_EXCLUDE_TOKENS=/srv/var/exclude-tokens.txt
      - FILES_DYNAMIC=/srv/var
      - MAX_EMOJI=100
      - SIMILARITY_THRESHOLD=0.5
      - MIN_MSG_LEN=45
      - NO_SPAM_REPLY=true
      #- DRY=true
      - TRAINING=true
      - DEBUG=true
    volumes:
      - ./log/tg-spam:/srv/log
      - ./var/tg-spam:/srv/var
    command: --super=pre_____nd --super=ale_____7

umputun commented on July 23, 2024

Yes, sure. I sent you my db file via email.

Unfortunately, I got nothing. I didn't even receive notifications from this project via email, I guess Gmail doesn't like the word "Spam"

alehano commented on July 23, 2024

Yes, sure. I sent you my db file via email.

Unfortunately, I got nothing. I didn't receive even notification from this project via email, I guess Gmail doesn't like the word "Spam"

Ok, sent it again.

I dug into the code. You look up the message data by cleaning and hashing the text.
Maybe that's not the most reliable method. Why not just get it by ID, with something like this:

a.locator.MessageByID(query.Message.MessageID)

umputun commented on July 23, 2024

All this locator thing is here because there is no ID of the original message after it was forwarded.

// it would be nice to ban this user right away, but we don't have forwarded user ID here due to tg privacy limitation.
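
The idea behind the locator can be sketched as a lookup keyed by the hash of the message text. This is a simplified in-memory stand-in, not the bot's database-backed implementation; `MsgMeta`, `NewLocator` and the method signatures are assumptions made for the illustration:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// MsgMeta holds what has to be recovered for a forwarded message (hypothetical type).
type MsgMeta struct {
	ChatID   int64
	UserID   int64
	UserName string
	MsgID    int
}

// Locator is an in-memory stand-in for a hash-to-metadata store.
type Locator struct {
	byHash map[string]MsgMeta
}

// NewLocator creates an empty locator.
func NewLocator() *Locator { return &Locator{byHash: map[string]MsgMeta{}} }

func hash(msg string) string { return fmt.Sprintf("%x", sha256.Sum256([]byte(msg))) }

// AddMessage records metadata for a message seen in the primary chat.
func (l *Locator) AddMessage(text string, meta MsgMeta) { l.byHash[hash(text)] = meta }

// Message finds metadata by text alone, which is all a forwarded copy provides.
func (l *Locator) Message(text string) (MsgMeta, bool) {
	meta, ok := l.byHash[hash(text)]
	return meta, ok
}

func main() {
	loc := NewLocator()
	loc.AddMessage("buy now", MsgMeta{ChatID: -100, UserID: 42, UserName: "spammer", MsgID: 7})

	meta, ok := loc.Message("buy now")
	fmt.Println(ok, meta.UserID, meta.MsgID)
}
```

The lookup only works when the text used for the lookup is byte-for-byte identical to the text that was stored.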

alehano commented on July 23, 2024

So I have news.
Today I had two cases. One of them was successfully banned:

2023/12/26 06:14:16.511 [DEBUG] {events/admin.go:147 events.(*admin).InlineCallbackHandler} unban confirmation request sent, chatID: -4058117643, userID: 6956485865, orig: "permanently banned {6956485865  Игорь}\n\nBceм пpивeт,eсть xaлтуpка нa цeлый дeнь c высoкoй oплатoй,стaвь + мнe в ЛC,чтобы yзнaть пoдpoбнocти"
2023/12/26 06:14:17.473 [DEBUG] {bot/spam.go:100 bot.(*SpamFilter).UpdateSpam} update spam samples with "Bceм пpивeт,eсть xaлтуpка нa цeлый дeнь c высoкoй oплатoй,стaвь + мнe в ЛC,чтобы yзнaть пoдpoбнocти"
2023/12/26 06:14:17.609 [INFO]  {events/admin.go:277 events.(*admin).callbackBanConfirmed} user "" (6956485865) banned
2023/12/26 06:14:17.609 [DEBUG] {events/admin.go:157 events.(*admin).InlineCallbackHandler} ban confirmed, chatID: -4058117643, userID: +6956485865, orig: "permanently banned {6956485865  Игорь}\n\nBceм пpивeт,eсть xaлтуpка нa цeлый дeнь c высoкoй oплатoй,стaвь + мнe в ЛC,чтобы yзнaть пoдpoбнocти"

The other one wasn't. And, as I suspected, the problem is with hashing.

Here is the log:

2023/12/26 05:04:22.981 [DEBUG] {events/listener.go:156 events.(*TelegramListener).procEvents} {"message_id":514475,"from":{"id":6713257809,"first_name":"Маргарита","last_name":"Инфографика","username":"Natusik09o"},"date":1703567062,"chat":{"id":-1001540310869,"type":"supergroup","title":"Оптовый чат ОптЛист — поставщики оптом","username":"optlist_chat","photo":null,"location":null},"text":"Нужна инfoграфика⁉️\nТогда тебе 🫵 ко мне, обращайтесь‼️\nПортfолио в шапке профиля.","message_auto_delete_timer_changed":null,"proximity_alert_triggered":null,"voice_chat_scheduled":null,"voice_chat_started":null,"voice_chat_ended":null,"voice_chat_participants_invited":null}
2023/12/26 05:04:22.981 [DEBUG] {events/listener.go:170 events.(*TelegramListener).procEvents} incoming msg: Нужна инfoграфика⁉️ Тогда тебе 🫵 ко мне, обращайтесь‼️ Портfолио в шапке профиля.
2023/12/26 05:04:22.981 [DEBUG] {storage/locator.go:78 storage.(*Locator).AddMessage} add message to locator: "Нужна инfoграфика⁉️\nТогда тебе 🫵 ко мне, обращайтесь‼️\nПортfолио в шапке профиля.", hash:06f75d904947578851fd254b81b21e01efa11fd999d17df94c81a70cd3e9f5d5, userID:6713257809, user name:"Natusik09o", chatID:-1001540310869, msgID:514475
2023/12/26 05:04:23.382 [INFO]  {bot/spam.go:84 bot.(*SpamFilter).OnMessage} user Маргарита Инфографика detected as spammer: {name: stopword, spam: false, details: not found}, {name: emoji, spam: false, details: 3/100}, {name: similarity, spam: true, details: 1.00/0.50}, {name: classifier, spam: true, details: probability of spam: 97.83%}, {name: cas, spam: false, details: record not found}, "Нужна инfoграфика⁉️\nТогда тебе 🫵 ко мне, обращайтесь‼️\nПортfолио в шапке профиля."
2023/12/26 05:04:23.382 [DEBUG] {events/listener.go:187 events.(*TelegramListener).procEvents} ban initiated for {Text:this is spam: "Маргарита Инфографика" (6713257809) Send:true BanInterval:9600h0m0s User:{ID:6713257809 Username:Natusik09o DisplayName:Маргарита Инфографика} ChannelID:0 ReplyTo:514475 DeleteReplyTo:true CheckResults:[{Name:stopword Spam:false Details:not found} {Name:emoji Spam:false Details:3/100} {Name:similarity Spam:true Details:1.00/0.50} {Name:classifier Spam:true Details:probability of spam: 97.83%} {Name:cas Spam:false Details:record not found}]}
2023/12/26 05:04:23.382 [DEBUG] {app/main.go:416 main.execute.makeSpamLogger.func7} spam detected from {6713257809 Natusik09o Маргарита Инфографика}, text: Нужна инfoграфика⁉️ Тогда тебе 🫵 ко мне, обращайтесь‼️ Портfолио в шапке профиля.
2023/12/26 05:04:23.396 [INFO]  {events/events.go:128 events.banUserOrChannel} training mode: ban 6713257809 for 9600h0m0s
2023/12/26 05:04:23.396 [INFO]  {events/listener.go:205 events.(*TelegramListener).procEvents} {6713257809 Natusik09o Маргарита Инфографика} banned by bot for 9600h0m0s
2023/12/26 05:04:23.396 [DEBUG] {events/admin.go:40 events.(*admin).ReportBan} report to admin chat, ban msgsData for {6713257809 Natusik09o Маргарита Инфографика}, group: -4058117643
2023/12/26 05:04:23.396 [DEBUG] {events/admin.go:402 events.(*admin).sendWithUnbanMarkup} action response "change ban": user {ID:6713257809 Username:Natusik09o DisplayName:Маргарита Инфографика}, text: "**permanently banned [{6713257809 Natusik09o Маргарита Инфографика}](tg://user?id=6713257809)**\\n\\nНужна инfoграфика⁉️ Тогда тебе 🫵 ко мне, обращайтесь‼️ Портfолио в шапке профиля.\\n\\n"


2023/12/26 06:14:12.641 [DEBUG] {events/admin.go:147 events.(*admin).InlineCallbackHandler} unban confirmation request sent, chatID: -4058117643, userID: 6713257809, orig: "permanently banned {6713257809 Natusik09o Маргарита Инфографика}\n\nНужна инfoграфика⁉️ Тогда тебе 🫵 ко мне, обращайтесь‼️ Портfолио в шапке профиля."
2023/12/26 06:14:13.657 [DEBUG] {bot/spam.go:100 bot.(*SpamFilter).UpdateSpam} update spam samples with "Нужна инfoграфика⁉️ Тогда тебе 🫵 ко мне, обращайтесь‼️ Портfолио в шапке профиля."
2023/12/26 06:14:13.662 [DEBUG] {storage/locator.go:127 storage.(*Locator).Message} failed to find message by hash "7b8b365159e31747791dd995c98298f5241221bf8f49c46c81ee2192d80c6219": sql: no rows in result set

And here is the record in the DB:

06f75d904947578851fd254b81b21e01efa11fd999d17df94c81a70cd3e9f5d5	2023-12-26 05:04:22.982014149 +0000 UTC m=+22353.891124520	-1001540310869	6713257809	Natusik09o	514475

So as you can see the hash is different.

umputun commented on July 23, 2024

For the second case, you should also have a log of the moment the message was hashed, it would be a line [DEBUG] incoming msg: and the next one should be [DEBUG] add a message to locator:. It will be nice to see both.

The issue with hashing itself is unlikely as all it does is this and, as you can see, not much chance of making a mistake here:

// MsgHash returns sha256 hash of a message
// we use hash to avoid storing potentially long messages and all we need is just match
func (l *Locator) MsgHash(msg string) string {
	return fmt.Sprintf("%x", sha256.Sum256([]byte(msg)))
}

I suspect that the message itself is modified by tg as you forward it to the admin chat/channel.

alehano commented on July 23, 2024

For the second case, you should also have a log of the moment the message was hashed

I added the log above.

I suspect what message itself modified by tg as you forward it to the admin chat/channel.

Yes, hashing itself is straightforward.

SHA256 of "Нужна инfoграфика⁉️ Тогда тебе 🫵 ко мне, обращайтесь‼️ Портfолио в шапке профиля." is
7b8b365159e31747791dd995c98298f5241221bf8f49c46c81ee2192d80c6219
But somehow a different hash was saved to the DB.

I don't forward the message to the admin group; the bot detects the spam itself. I just press "confirm ban" in the group.

umputun commented on July 23, 2024

pls give it another try (:master image), and if all is good, I'll tag it. My local tests with single and multiline messages worked just fine.

I have extended the callback data (the info associated with the button markup) to include msgID in addition to the userID, and this way, the locator matching role was minimized. It is still needed, but not for the case above. There is one caveat: the size of this data is limited to 64 bytes by the telegram bot api. Currently, the sum length for both msgID and userID is not even close, but if someday we run into this (and tg keeps this limit unchanged) we could have an id-lookup table of some sort.

from tg-spam.

alehano avatar alehano commented on July 23, 2024

Today I had one case where the bot didn't delete a spam message forwarded to an admin group. I'm not sure whether this problem is specific to training mode or general.
It looks like the bot didn't get the message text.
The SHA256 e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 is the hash of an empty string.

Here are the logs:

2023/12/28 09:03:24.411 [DEBUG] {events/listener.go:238 events.(*TelegramListener).isAdminChat} message in admin chat -4058117643, from pretty_friend
2023/12/28 09:03:24.411 [DEBUG] {events/admin.go:58 events.(*admin).MsgHandler} message from admin chat: msg id: 401, update id: 84145103, from: pretty_friend, sender: ""
2023/12/28 09:03:24.411 [DEBUG] {events/admin.go:69 events.(*admin).MsgHandler} forwarded message from superuser "pretty_friend" to admin chat -4058117643: ""
2023/12/28 09:03:24.412 [DEBUG] {storage/locator.go:131 storage.(*Locator).Message} failed to find message by hash "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855": sql: no rows in result set
2023/12/28 09:03:24.412 [WARN]  {events/listener.go:115 events.(*TelegramListener).Do} failed to process admin chat message: not found "" in locator
2023/12/28 09:03:24.413 [DEBUG] {events/listener.go:272 events.(*TelegramListener).sendBotResponse} bot response - error: not found "" in locator, reply-to:0

Here is the message, maybe it will be useful for a test case:

💎🎄ГОTОВЬСЯ K HOBОMУ ΓOДᎩ С НОBЫМ ДОХОДOМ🎄💎

❗️𝖬ecтa огрaничeны❗️ 

💰Πpисoединяйcя к нaш𝚎й кoманде в сф𝚎p𝚎 цифpoвых активоⲃ💰

🔥Тeбя ждет нoвoe интереcнoе нaпpaⲃлeние, пacивный yдaл𝚎нный доxoд oт 10 000 ₽₽₽/день, и бecплaтноe обyчeние🧑‍💻💸

Bcё, чтo нyжнo – вoзpаcт от 18 л𝚎т, смаpтфон и 1-2 часа ⲃремени ⲃ день
❗️Всe л𝚎гально и б𝚎з пр𝚎доплaт❗️

Давaй встречать Нoвый гoд c увеpеннocтью ⲃ cвоем Փинaнcовом yспехe! 💰🚀

‼️Пиши напрямyю pукоⲃодителю 👉 @ser______ratov

Here is how it looks:
Screenshot 2023-12-28 at 09 30 34

I suppose it may be because of the picture. As far as I know, Telegram sends a picture as a separate message and then glues it together with the text.
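One possible explanation, not confirmed in this thread: in the Telegram Bot API, a message with a photo carries its text in the caption field, while the text field stays empty. The sketch below uses a hypothetical reduced message struct and a hypothetical effectiveText helper. It is not the bot's code, it only shows how hashing the text field alone produces the empty-string hash from the log above.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// emptyHash is the SHA-256 of an empty string, the value seen in the log.
const emptyHash = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

// message is a hypothetical, reduced stand-in for a Telegram message.
type message struct {
	Text    string // set for plain text messages
	Caption string // set for media messages, e.g. a photo with text
}

// effectiveText returns the text if present and falls back to the caption.
func effectiveText(m message) string {
	if m.Text != "" {
		return m.Text
	}
	return m.Caption
}

// msgHash returns the hex-encoded SHA-256 of a message text.
func msgHash(msg string) string {
	return fmt.Sprintf("%x", sha256.Sum256([]byte(msg)))
}

func main() {
	photo := message{Caption: "spam text under a picture"}

	// Hashing only Text gives the empty-string hash.
	fmt.Println("text only is empty hash:    ", msgHash(photo.Text) == emptyHash)

	// Hashing the effective text gives a usable hash.
	fmt.Println("with caption is empty hash: ", msgHash(effectiveText(photo)) == emptyHash)
}
```

If this is what happened, falling back to the caption on both the storing side and the lookup side would keep the hashes consistent.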

from tg-spam.

umputun avatar umputun commented on July 23, 2024

@alehano pls note - github doesn't send me any notifications on this repo, so there is no chance I would discover a comment on a closed PR. It is better to open an issue; in that case, I can check it manually.

from tg-spam.
