umputun / tg-spam

Anti-Spam bot for Telegram

Home Page: https://tg-spam.umputun.dev

License: MIT License

Go 93.80% Shell 0.53% Dockerfile 0.33% CSS 0.22% HTML 4.76% Makefile 0.34%

anit-spam self-hosted telegram-bot spam-detection spam-classification

tg-spam's Issues

Problem with spam messages containing pictures

Today I had a case where the bot didn't delete a spam message that I forwarded to the admin group. I'm not sure whether the problem is specific to training mode or general.
It looks like the bot didn't get the message text.
The SHA-256 hash e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 is the hash of an empty string.

Here are the logs:

2023/12/28 09:03:24.411 [DEBUG] {events/listener.go:238 events.(*TelegramListener).isAdminChat} message in admin chat -4058117643, from pretty_friend
2023/12/28 09:03:24.411 [DEBUG] {events/admin.go:58 events.(*admin).MsgHandler} message from admin chat: msg id: 401, update id: 84145103, from: pretty_friend, sender: ""
2023/12/28 09:03:24.411 [DEBUG] {events/admin.go:69 events.(*admin).MsgHandler} forwarded message from superuser "pretty_friend" to admin chat -4058117643: ""
2023/12/28 09:03:24.412 [DEBUG] {storage/locator.go:131 storage.(*Locator).Message} failed to find message by hash "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855": sql: no rows in result set
2023/12/28 09:03:24.412 [WARN]  {events/listener.go:115 events.(*TelegramListener).Do} failed to process admin chat message: not found "" in locator
2023/12/28 09:03:24.413 [DEBUG] {events/listener.go:272 events.(*TelegramListener).sendBotResponse} bot response - error: not found "" in locator, reply-to:0
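
The hash in the log can be checked independently. The snippet below is a minimal sketch: `hashText` is a hypothetical helper, and whether the locator hashes the text in exactly this way is an assumption. It only confirms that an empty text produces the digest shown above.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashText returns the hex-encoded SHA-256 digest of a message text.
// Hypothetical helper, not taken from the tg-spam code base.
func hashText(text string) string {
	sum := sha256.Sum256([]byte(text))
	return hex.EncodeToString(sum[:])
}

func main() {
	// An empty text gives the digest that the locator failed to find.
	fmt.Println(hashText(""))
}
```

If the forwarded message reaches the bot without text, every such message maps to this same digest, so the lookup cannot identify the original message.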

Here is the message, maybe it will be useful for a test case:

💎🎄ГОTОВЬСЯ K HOBОMУ ΓOДᎩ С НОBЫМ ДОХОДOМ🎄💎

❗️𝖬ecтa огрaничeны❗️ 

💰Πpисoединяйcя к нaш𝚎й кoманде в сф𝚎p𝚎 цифpoвых активоⲃ💰

🔥Тeбя ждет нoвoe интереcнoе нaпpaⲃлeние, пacивный yдaл𝚎нный доxoд oт 10 000 ₽₽₽/день, и бecплaтноe обyчeние🧑‍💻💸

Bcё, чтo нyжнo – вoзpаcт от 18 л𝚎т, смаpтфон и 1-2 часа ⲃремени ⲃ день
❗️Всe л𝚎гально и б𝚎з пр𝚎доплaт❗️

Давaй встречать Нoвый гoд c увеpеннocтью ⲃ cвоем Փинaнcовом yспехe! 💰🚀

‼️Пиши напрямyю pукоⲃодителю 👉 @ser______ratov

Here is how it looks:

I suppose the picture may be the cause. As far as I know, Telegram sends a picture as a separate message and then glues it together with the text.
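
One possible explanation, not confirmed by the logs: in the Bot API, a photo with accompanying text arrives as a single message whose text sits in the `caption` field, while `text` stays empty. The sketch below shows a fallback from `text` to `caption`. The `message` struct and `effectiveText` are hypothetical names and mirror only the two fields involved.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// message mirrors only the two Bot API Message fields used here.
type message struct {
	Text    string `json:"text"`
	Caption string `json:"caption"`
}

// effectiveText returns the text of a message, falling back to the
// caption that accompanies photos and other media.
func effectiveText(raw []byte) (string, error) {
	var m message
	if err := json.Unmarshal(raw, &m); err != nil {
		return "", err
	}
	if m.Text != "" {
		return m.Text, nil
	}
	return m.Caption, nil
}

func main() {
	photo := []byte(`{"message_id": 1, "caption": "spam under a picture"}`)
	text, err := effectiveText(photo)
	fmt.Println(text, err)
}
```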

Originally posted by @alehano in #17 (comment)

UPD: Today I had a second case after forwarding a message to the admin group.

Auto delete message

Is there a feature to just delete a message, instead of banning the user, when it contains a word from the spam filter?

If this feature exists, I didn't find it, could someone tell me, please?

Unable to properly ban spam via official(?) Channel_Bot

I've run into a spam attack via Channel_Bot. tg-spam can't properly ban the users who post the messages. It seems tg-spam tries to ban Channel_Bot ("id":136817688), but the culprit is a user who somehow abuses it.

See logs

The spam-bot admin group is flooded with unsuccessful attempts to ban Channel_Bot instead of the culprits.
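
For context: when a message is posted on behalf of a channel, the Bot API puts a placeholder user in `from` and the real origin in `sender_chat`. The Bot API also has a `banChatSenderChat` method that takes `sender_chat_id`. The sketch below picks the ban call accordingly. `banRequest` is a hypothetical helper, and whether banning the channel is enough to stop the person behind it remains open.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// channelBotID is the ID reported in this issue for Channel_Bot.
const channelBotID int64 = 136817688

// banRequest picks the Bot API method and parameters for banning a sender.
// senderChatID is 0 when the message has no sender_chat.
func banRequest(chatID, fromID, senderChatID int64) (method string, params url.Values) {
	params = url.Values{}
	params.Set("chat_id", strconv.FormatInt(chatID, 10))
	if fromID == channelBotID && senderChatID != 0 {
		// posted on behalf of a channel: target the channel, not the placeholder user
		params.Set("sender_chat_id", strconv.FormatInt(senderChatID, 10))
		return "banChatSenderChat", params
	}
	params.Set("user_id", strconv.FormatInt(fromID, 10))
	return "banChatMember", params
}

func main() {
	// the chat and channel IDs below are made up for illustration
	method, params := banRequest(-1001, channelBotID, -1002)
	fmt.Println(method, params.Encode())
}
```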

Warnings in logs

During the ban-unban cycle I see the following messages in the log:

tg-spam  | 2024/02/14 08:06:41.823 [INFO]  user Мария Свиткова detected as spammer: {name: stopword, spam: false, details: not found}, {name: emoji, spam: false, details: 0/2}, {name: similarity, spam: false, details: 0.04/0.50}, {name: classifier, spam: true, details: probability of spam: 61.25%}, {name: cas, spam: false, details: record not found}, "не переживай) из 5 северных сияний, которые были у нас в городе, я проспала 2. И еще 2 были видны из любого села по соседству, но не у нас, потому что над нашим стояли очередные облака"
tg-spam  | 2024/02/14 08:06:41.833 [INFO]  detected spam entry added for user_id:370236309, name:irda_noire
tg-spam  | 2024/02/14 08:06:41.877 [INFO]  {370236309 irda_noire Мария Свиткова} banned by bot for 9600h0m0s
tg-spam  | 2024/02/14 08:06:53.479 [WARN]  failed to send message as markdown, Bad Request: can't parse entities: Can't find end of the entity starting at byte offset 34
tg-spam  | 2024/02/14 08:06:57.660 [INFO]  add aproved user: id:370236309, name:"irda_noire"
tg-spam  | 2024/02/14 08:06:57.663 [INFO]  user "irda_noire" (370236309) added to approved users
tg-spam  | 2024/02/14 08:06:57.667 [WARN]  failed to send message as markdown, Bad Request: can't parse entities: Can't find end of the entity starting at byte offset 603
tg-spam  | 2024/02/14 08:06:57.778 [INFO]  user unbanned, chatID: -1002096077129, userID: 370236309:66269, orig: "permanently banned {370236309 irda_noire Мария Свиткова}\n\nне переживай) из 5 северных сияний, которые были у нас в городе, я проспала 2. И еще 2 были видны из любого села по соседству, но не у нас, потому что над нашим стояли очередные облака\n\n**spam detection results**\n- stopword: ham, not found\n- emoji: ham, 0/2\n- similarity: ham, 0.04/0.50\n- classifier: spam, probability of spam: 61.25%\n- cas: ham, record not found"

Lines 4 and 7 (the two [WARN] entries) are of particular interest: it looks like the bot tries to send a message as markdown and can't. Perhaps it then sends the message again?
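
A plausible cause, stated as an assumption: in Telegram's legacy Markdown mode the characters `_`, `*`, `` ` `` and `[` open an entity, so an unescaped underscore in a username such as irda_noire leaves an italic entity without an end. In the text quoted in the last log line, that underscore sits at byte offset 34, which matches the first warning, although the message actually sent may have carried extra markup. The sketch below escapes user-supplied text before embedding it. `escapeMarkdown` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// mdEscaper prefixes the characters that open entities in Telegram's
// legacy Markdown mode with a backslash.
var mdEscaper = strings.NewReplacer("_", `\_`, "*", `\*`, "`", "\\`", "[", `\[`)

// escapeMarkdown makes user-supplied text safe to embed in a Markdown message.
func escapeMarkdown(s string) string {
	return mdEscaper.Replace(s)
}

func main() {
	report := "permanently banned {370236309 irda_noire"
	// byte position of the first underscore in the quoted report text
	fmt.Println(strings.IndexByte(report, '_'))
	fmt.Println(escapeMarkdown("irda_noire"))
}
```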

failed to process direct spam report

When we reply to a spam message, we get the following message in the admin channel:

However, the post is not removed, and we get the following errors in the console:

tg-spam  | 2024/03/29 19:53:35.575 [WARN]  can't write to log, can't make directories for new logfile: mkdir /srv/logs: permission denied
tg-spam  | 2024/03/29 19:53:35.582 [INFO]  detected spam entry added for user_id:6609487300, name:KamaraShell
tg-spam  | 2024/03/29 19:53:35.928 [INFO]  {6609487300 KamaraShell Shelby Kamara} banned by bot for 9600h0m0s
tg-spam  | 2024/03/29 20:31:31.396 [INFO]  user Сергей Гуськов detected as spammer: {name: stopword, spam: false, details: not found}, {name: emoji, spam: false, details: 0/2}, {name: similarity, spam: false, details: 0.14/0.50}, {name: classifier, spam: true, details: probability of spam: 99.42%}, {name: cas, spam: false, details: record not found}, {name: openai, spam: true, details: Promotion of a get-rich-quick scheme, confidence: 95%}, "A dream come true is when you double your savings in just 1 night, @WhalesofRocketmoonsignal is the only way to make it happen"
tg-spam  | 2024/03/29 20:31:31.396 [WARN]  can't write to log, can't make directories for new logfile: mkdir /srv/logs: permission denied
tg-spam  | 2024/03/29 20:31:31.404 [INFO]  detected spam entry added for user_id:7088948055, name:Thomas0Wright7
tg-spam  | 2024/03/29 20:31:31.732 [INFO]  {7088948055 Thomas0Wright7 Сергей Гуськов} banned by bot for 9600h0m0s
tg-spam  | 2024/03/30 00:04:44.719 [INFO]  user "1752434186" added to approved users
tg-spam  | 2024/03/30 00:07:17.847 [INFO]  user "1752434186" added to approved users
tg-spam  | 2024/03/30 02:52:33.163 [INFO]  user "edgepillar" (1560869066) added to approved users
tg-spam  | 2024/03/30 03:31:20.490 [INFO]  user "Yourself_ZNN" (6172198354) added to approved users
tg-spam  | 2024/03/30 06:07:12.981 [INFO]  user "triplea_z" (482795147) added to approved users
tg-spam  | 2024/03/30 07:08:32.013 [INFO]  remove aproved user: 6017513172
tg-spam  | 2024/03/30 07:08:32.233 [WARN]  failed to process direct spam report: failed to update spam for "": can't update spam samples: can't update spam samples: failed to open /srv/data/spam-dynamic.txt: open /srv/data/spam-dynamic.txt: no such file or directory
tg-spam  | 2024/03/30 07:22:10.251 [INFO]  remove aproved user: 6017513172
tg-spam  | 2024/03/30 07:22:10.460 [WARN]  failed to process direct spam report: failed to update spam for "": can't update spam samples: can't update spam samples: failed to open /srv/data/spam-dynamic.txt: open /srv/data/spam-dynamic.txt: no such file or directory
tg-spam  | 2024/03/30 07:22:44.962 [INFO]  remove aproved user: 6017513172
tg-spam  | 2024/03/30 07:22:45.178 [WARN]  failed to process direct spam report: failed to update spam for "": can't update spam samples: can't update spam samples: failed to open /srv/data/spam-dynamic.txt: open /srv/data/spam-dynamic.txt: no such file or directory

Here is the docker-compose.yml file. We are also getting a "can't write to log, can't make directories for new logfile: mkdir /srv/logs: permission denied" error. I saw in another discussion that this folder needs to be created, and it has been: we have a logs folder in the same folder as docker-compose.yml.

services:
  tg-spam:
    image: umputun/tg-spam:latest
    hostname: tg-spam
    restart: always
    container_name: tg-spam
    user: "1000:1000" # set uid:gid to host user to avoid permission issues with mounted volumes
    logging: &default_logging
      driver: json-file
      options:
        max-size: "10m"
        max-file: "5"
    environment:
      - TZ=America/Chicago
      - TELEGRAM_TOKEN=***
      - TELEGRAM_GROUP=zenonnetwork
      - ADMIN_GROUP=-1001991632774 # admin group id
      - LOGGER_ENABLED=true     # enable logging
      - LOGGER_FILE=/srv/logs/tg-spam.log
      - LOGGER_MAX_SIZE=5M      # max log file size in megabytes before rotation
      - NO_SPAM_REPLY=true      # do not reply to spam messages in the public group
      - SERVER_ENABLED=false    # enable server, default port is 8080
      - OPENAI_TOKEN=***
    volumes:
      - ./logs:/srv/log
      - data:/srv/data
    command: --openai.veto

volumes:
  data:
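
Comparing the compose file with the warning suggests one possible cause: LOGGER_FILE points into /srv/logs, while the bind mount targets /srv/log. The log directory would then have to be created inside the container filesystem rather than on the mounted host folder, which may explain the permission error under uid 1000. The sketch below checks whether a file path falls under any mount target. `underMount` is a hypothetical helper, not part of tg-spam.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// underMount reports whether file lives inside one of the container-side
// mount targets. A plain string prefix is not enough: "/srv/log" is a
// textual prefix of "/srv/logs" but not its parent directory.
func underMount(file string, targets []string) bool {
	file = path.Clean(file)
	for _, t := range targets {
		t = path.Clean(t)
		if file == t || strings.HasPrefix(file, t+"/") {
			return true
		}
	}
	return false
}

func main() {
	// right-hand sides of the volumes entries in the compose file
	targets := []string{"/srv/log", "/srv/data"}
	fmt.Println(underMount("/srv/logs/tg-spam.log", targets))
	fmt.Println(underMount("/srv/log/tg-spam.log", targets))
}
```

If this is indeed the cause, making the two paths agree (either the mount target or LOGGER_FILE) should be enough.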

Do not remove users after unban

According to the docs:

By default, this method guarantees that after the call the user is not a member of the chat, but will be able to join it. So if the user is a member of the chat they will also be removed from the chat

In my use case I would like to avoid that. To do so, the only_if_banned parameter has to be passed, and I would like to add it.

I will do it if you are OK with it. Do you think it should be a run parameter, or is it fine to make it the only possible behaviour of tg-spam?
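
To make the proposal concrete, here is a minimal sketch of the parameters for the Bot API unbanChatMember call with only_if_banned as an option. `unbanParams` is a hypothetical helper, and how tg-spam's own client library would expose this flag is not shown here.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// unbanParams builds the form parameters for the Bot API unbanChatMember call.
// With onlyIfBanned set, a user who is currently a member is left in the chat.
func unbanParams(chatID, userID int64, onlyIfBanned bool) url.Values {
	v := url.Values{}
	v.Set("chat_id", strconv.FormatInt(chatID, 10))
	v.Set("user_id", strconv.FormatInt(userID, 10))
	if onlyIfBanned {
		v.Set("only_if_banned", "true")
	}
	return v
}

func main() {
	// IDs are made up for illustration
	fmt.Println(unbanParams(-1001, 42, true).Encode())
}
```

Making it a run parameter would keep the current behaviour as the default, while passing it unconditionally would change what unban does for every installation.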

CAPTCHA support

Hello! Thanks for your open-source bot. I am already using it in a fairly large group (several thousand users).

Is it appropriate to add captcha support for this bot? Most spam bots are not smart enough to pass even the simplest captcha checks, so this feature will help a lot.

I have opened this issue for discussion - if you agree, I would like to develop this feature myself.

Allow forwards without bans

We have an admin group for our main group, and sometimes we just forward messages from one to the other to discuss something. I would really like the authors of forwarded messages not to be banned; instead, I would prefer to mark messages as spam/ham manually.

Extend supported commands

The idea is to add more commands:

admins only

/ban - performs the same thing as /spam but skips adding the message to spam samples. Sometimes a user posts a message that is not spam in general but can be considered spam in a given context, for example, someone trying to promote their product.
/warn - replaces the message with a response containing predefined text and optionally restricts the user for some period of time

regular users

/report - allows users to report spam; see the discussion in #58 for more details
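
The split between admin and regular commands could be sketched roughly as follows. This is only an illustration of the idea: `adminOnly` and `parseCommand` are hypothetical names, and the actual handling of each command is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// adminOnly lists the commands discussed here and whether they need admin rights.
var adminOnly = map[string]bool{"/spam": true, "/ban": true, "/warn": true, "/report": false}

// parseCommand extracts a known command from the start of a message and
// reports whether the sender is allowed to run it.
func parseCommand(text string, isAdmin bool) (cmd string, allowed, ok bool) {
	fields := strings.Fields(text)
	if len(fields) == 0 {
		return "", false, false
	}
	// strip an optional @botname suffix, as in "/ban@my_bot"
	cmd, _, _ = strings.Cut(strings.ToLower(fields[0]), "@")
	needAdmin, known := adminOnly[cmd]
	if !known {
		return "", false, false
	}
	return cmd, isAdmin || !needAdmin, true
}

func main() {
	fmt.Println(parseCommand("/ban", false))
	fmt.Println(parseCommand("/report", false))
}
```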

Bot deletes messages from other bots even when promoted to admin

The bot flagged a message from another bot as spam. It was actually a message from the channel, posted by Telegram itself to the group linked to the channel.
After an unban and posting the same message again, the bot detected spam again. I have added Telegram (ID: 777000) as a superuser so that the bot ignores its messages.

Logs are below.

2023/12/14 12:54:48 stdout 2023/12/14 10:54:48.530 [DEBUG] {events/events.go:338 events.(*TelegramListener).sendBotResponse} bot response - permanently banned {777000  Telegram}\n⛔︎ unban if wrong ⛔︎\n\nВот вы всё "Apple", "Google", "монополия сторов" — мелко это всё. Вот поезда — это круто. В Польше одна региональная железная дорога решила сэкономить на обслуживании купленных пассажирских поездов Impuls от компании NEWAG и наняла независимую (от производителя) компанию для этого. После того, как сервисные работы были закончены, несколько поездов отказались заводиться. Ситуация была откровенно угрожающей (поездов не хватало для обеспечения перевозок), когда кто-то из персонала набрал в гугле "польские хакеры" и в итоге к проблеме подключились белые хакеры из группы Dragon Sector.   Им удалось разобраться с загадочными ошибками и обнаружить, что в поездах работает система распознавания "workshop-detection", которая начинает препятствовать работе, если обнаружено вмешательство неавторизованного механика. Короче говоря, производитель делает из поезда "кирпич", если его ремонтирует кто-то другой, показывая массу сообщений о нарушении копирайта и даже вроде имея возможность заблокировать поезд удаленно.  Хакерам удалось обойти это ограничение и запустить поезда. А производитель теперь отказывается от ответственности и заявляет, что ничего такого не делал и вообще грозит судиться с хакерами за клевету. Настаивая при этом, что ошибки были вызваны недостаточной квалификацией ремонтников, потому что обслуживать поезда должны только сотрудники производителя.   @blognot  https://www.404media.co/polish-hackers-repaired-trains-the-manufacturer-artificially-bricked-now-the-train-company-is-threatening-them/\n\n, reply-to:0

2023/12/14 12:54:48 stdout 2023/12/14 10:54:48.530 [WARN]  {server/server.go:232 server.(*SpamWeb).UnbanURL} failed to compress message "Вот вы всё "Apple", "Google", "монополия сторов" — мелко это всё. Вот поезда — это круто. В Польше одна региональная железная дорога решила сэкономить на обслуживании купленных пассажирских поездов Impuls от компании NEWAG и наняла независимую (от производителя) компанию для этого. После того, как сервисные работы были закончены, несколько поездов отказались заводиться. Ситуация была откровенно угрожающей (поездов не хватало для обеспечения перевозок), когда кто-то из персонала набрал в гугле "польские хакеры" и в итоге к проблеме подключились белые хакеры из группы Dragon Sector.   Им удалось разобраться с загадочными ошибками и обнаружить, что в поездах работает система распознавания "workshop-detection", которая начинает препятствовать работе, если обнаружено вмешательство неавторизованного механика. Короче говоря, производитель делает из поезда "кирпич", если его ремонтирует кто-то другой, показывая массу сообщений о нарушении копирайта и даже вроде имея возможность заблокировать поезд удаленно.  Хакерам удалось обойти это ограничение и запустить поезда. А производитель теперь отказывается от ответственности и заявляет, что ничего такого не делал и вообще грозит судиться с хакерами за клевету. Настаивая при этом, что ошибки были вызваны недостаточной квалификацией ремонтников, потому что обслуживать поезда должны только сотрудники производителя.   @blognot  https://www.404media.co/polish-hackers-repaired-trains-the-manufacturer-artificially-bricked-now-the-train-company-is-threatening-them/", encoded string is too long: 1622 characters

2023/12/14 12:54:48 stdout 2023/12/14 10:54:48.529 [DEBUG] {events/events.go:306 events.(*TelegramListener).reportToAdminChat} report to admin chat, ban data for {777000  Telegram}, group: 120025072

2023/12/14 12:54:48 stdout 2023/12/14 10:54:48.529 [INFO]  {events/events.go:206 events.(*TelegramListener).procEvents} {777000  Telegram} banned by bot for 9600h0m0s

2023/12/14 12:54:48 stdout 2023/12/14 10:54:48.436 [DEBUG] {app/main.go:260 main.execute.makeSpamLogger.func11} spam message: this is spam: "Telegram" (777000)

2023/12/14 12:54:48 stdout 2023/12/14 10:54:48.436 [INFO]  {app/main.go:259 main.execute.makeSpamLogger.func11} spam detected from {777000  Telegram}, response: this is spam: "Telegram" (777000)

2023/12/14 12:54:48 stdout 2023/12/14 10:54:48.436 [DEBUG] {events/events.go:198 events.(*TelegramListener).procEvents} ban initiated for {Text:this is spam: "Telegram" (777000) Send:true BanInterval:9600h0m0s User:{ID:777000 Username: DisplayName:Telegram} ChannelID:0 ReplyTo:42734 DeleteReplyTo:true}

2023/12/14 12:54:48 stdout 2023/12/14 10:54:48.161 [DEBUG] {events/events.go:338 events.(*TelegramListener).sendBotResponse} bot response - this is spam: "Telegram" (777000), reply-to:42734

2023/12/14 12:54:48 stdout 2023/12/14 10:54:48.161 [INFO]  {bot/spam.go:84 bot.(*SpamFilter).OnMessage} user Telegram detected as spammer: {name: stopword, spam: false, details: not found}, {name: emoji, spam: false, details: 0/2}, {name: similarity, spam: false, details: 0.10/0.50}, {name: classifier, spam: true, details: probability: NaN%, certain: true}, {name: cas, spam: false, details: Record not found.}, "Вот вы всё "Apple", "Google", "монополия сторов" — мелко это всё. Вот поезда — это круто. В Польше одна региональная железная дорога решила сэкономить на обслуживании купленных пассажирских поездов Impuls от компании NEWAG и наняла независимую (от производителя) компанию для этого. После того, как сервисные работы были закончены, несколько поездов отказались заводиться. Ситуация была откровенно угрожающей (поездов не хватало для обеспечения перевозок), когда кто-то из персонала набрал в гугле "польские хакеры" и в итоге к проблеме подключились белые хакеры из группы Dragon Sector. \n\nИм удалось разобраться с загадочными ошибками и обнаружить, что в поездах работает система распознавания "workshop-detection", которая начинает препятствовать работе, если обнаружено вмешательство неавторизованного механика. Короче говоря, производитель делает из поезда "кирпич", если его ремонтирует кто-то другой, показывая массу сообщений о нарушении копирайта и даже вроде имея возможность заблокировать поезд удаленно.\n\nХакерам удалось обойти это ограничение и запустить поезда. А производитель теперь отказывается от ответственности и заявляет, что ничего такого не делал и вообще грозит судиться с хакерами за клевету. Настаивая при этом, что ошибки были вызваны недостаточной квалификацией ремонтников, потому что обслуживать поезда должны только сотрудники производителя. \n\n@blognot\n\nhttps://www.404media.co/polish-hackers-repaired-trains-the-manufacturer-artificially-bricked-now-the-train-company-is-threatening-them/"

2023/12/14 12:54:47 stdout 2023/12/14 10:54:47.916 [DEBUG] {events/events.go:184 events.(*TelegramListener).procEvents} incoming msg: Вот вы всё "Apple", "Google", "монополия сторов" — мелко это всё. Вот поезда — это круто. В Польше одна региональная железная дорога решила сэкономить на обслуживании купленных пассажирских поездов Impuls от компании NEWAG и наняла независимую (от производителя) компанию для этого. После того, как сервисные работы были закончены, несколько поездов отказались заводиться. Ситуация была откровенно угрожающей (поездов не хватало для обеспечения перевозок), когда кто-то из персонала набрал в гугле "польские хакеры" и в итоге к проблеме подключились белые хакеры из группы Dragon Sector.   Им удалось разобраться с загадочными ошибками и обнаружить, что в поездах работает система распознавания "workshop-detection", которая начинает препятствовать работе, если обнаружено вмешательство неавторизованного механика. Короче говоря, производитель делает из поезда "кирпич", если его ремонтирует кто-то другой, показывая массу сообщений о нарушении копирайта и даже вроде имея возможность заблокировать поезд удаленно.  Хакерам удалось обойти это ограничение и запустить поезда. А производитель теперь отказывается от ответственности и заявляет, что ничего такого не делал и вообще грозит судиться с хакерами за клевету. Настаивая при этом, что ошибки были вызваны недостаточной квалификацией ремонтников, потому что обслуживать поезда должны только сотрудники производителя.   @blognot  https://www.404media.co/polish-hackers-repaired-trains-the-manufacturer-artificially-bricked-now-the-train-company-is-threatening-them/

2023/12/14 12:54:47 stdout 2023/12/14 10:54:47.916 [DEBUG] {events/events.go:162 events.(*TelegramListener).procEvents} {"message_id":42734,"from":{"id":777000,"first_name":"Telegram"},"sender_chat":{"id":-1001065632275,"type":"channel","title":"БлоGнот","username":"blognot","photo":null,"location":null},"date":1702551287,"chat":{"id":-1001226560034,"type":"supergroup","title":"БлоGнот комментарии","username":"blognot_chat","photo":null,"location":null},"forward_from_chat":{"id":-1001065632275,"type":"channel","title":"БлоGнот","username":"blognot","photo":null,"location":null},"forward_from_message_id":4533,"forward_date":1702551284,"is_automatic_forward":true,"text":"Вот вы всё "Apple", "Google", "монополия сторов" — мелко это всё. Вот поезда — это круто. В Польше одна региональная железная дорога решила сэкономить на обслуживании купленных пассажирских поездов Impuls от компании NEWAG и наняла независимую (от производителя) компанию для этого. После того, как сервисные работы были закончены, несколько поездов отказались заводиться. Ситуация была откровенно угрожающей (поездов не хватало для обеспечения перевозок), когда кто-то из персонала набрал в гугле "польские хакеры" и в итоге к проблеме подключились белые хакеры из группы Dragon Sector. \n\nИм удалось разобраться с загадочными ошибками и обнаружить, что в поездах работает система распознавания "workshop-detection", которая начинает препятствовать работе, если обнаружено вмешательство неавторизованного механика. Короче говоря, производитель делает из поезда "кирпич", если его ремонтирует кто-то другой, показывая массу сообщений о нарушении копирайта и даже вроде имея возможность заблокировать поезд удаленно.\n\nХакерам удалось обойти это ограничение и запустить поезда. А производитель теперь отказывается от ответственности и заявляет, что ничего такого не делал и вообще грозит судиться с хакерами за клевету. Настаивая при этом, что ошибки были вызваны недостаточной квалификацией ремонтников, потому что обслуживать поезда должны только сотрудники производителя. \n\n@blognot\n\nhttps://www.404media.co/polish-hackers-repaired-trains-the-manufacturer-artificially-bricked-now-the-train-company-is-threatening-them/","entities":[{"type":"mention","offset":1374,"length":8},{"type":"url","offset":1384,"length":135}],"message_auto_delete_timer_changed":null,"proximity_alert_triggered":null,"voice_chat_scheduled":null,"voice_chat_started":null,"voice_chat_ended":null,"voice_chat_participants_invited":null}

2023/12/14 12:54:12 stdout 2023/12/14 10:54:12.289 [INFO]  {server/server.go:179 server.(*SpamWeb).unbanHandler} unban user 777000
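
The raw update in the log shows two markers of a linked-channel post: "from" has id 777000 and "is_automatic_forward" is true. A check along the following lines could exempt such messages from spam detection. This is a sketch under that assumption: `update` and `skipSpamCheck` are hypothetical names, and only the two fields seen in the log are parsed.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// telegramServiceID is the sender ID seen in the log for linked-channel posts.
const telegramServiceID int64 = 777000

// update mirrors only the message fields used by the check.
type update struct {
	From struct {
		ID int64 `json:"id"`
	} `json:"from"`
	IsAutomaticForward bool `json:"is_automatic_forward"`
}

// skipSpamCheck reports whether a message looks like an automatic forward
// from the linked channel and should bypass spam detection.
func skipSpamCheck(raw []byte) (bool, error) {
	var u update
	if err := json.Unmarshal(raw, &u); err != nil {
		return false, err
	}
	return u.IsAutomaticForward || u.From.ID == telegramServiceID, nil
}

func main() {
	msg := []byte(`{"from":{"id":777000,"first_name":"Telegram"},"is_automatic_forward":true}`)
	fmt.Println(skipSpamCheck(msg))
}
```

Compared with adding 777000 as a superuser, a check like this would not depend on per-installation configuration.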

2023/12/14 12:54:48	stdout	�[36m2023/12/14 10:54:48.530�[0m �[37m[DEBUG]�[0m �[34m{events/events.go:338 events.(TelegramListener).sendBotResponse}�[0m �[37mbot response - permanently banned {777000 Telegram}*\n⛔︎ unban if wrong ⛔︎\n\nВот вы всё "Apple", "Google", "монополия сторов" — мелко это всё. Вот поезда — это круто. В Польше одна региональная железная дорога решила сэкономить на обслуживании купленных пассажирских поездов Impuls от компании NEWAG и наняла независимую (от производителя) компанию для этого. После того, как сервисные работы были закончены, несколько поездов отказались заводиться. Ситуация была откровенно угрожающей (поездов не хватало для обеспечения перевозок), когда кто-то из персонала набрал в гугле "польские хакеры" и в итоге к проблеме подключились белые хакеры из группы Dragon Sector. Им удалось разобраться с загадочными ошибками и обнаружить, что в поездах работает система распознавания "workshop-detection", которая начинает препятствовать работе, если обнаружено вмешательство неавторизованного механика. Короче говоря, производитель делает из поезда "кирпич", если его ремонтирует кто-то другой, показывая массу сообщений о нарушении копирайта и даже вроде имея возможность заблокировать поезд удаленно. Хакерам удалось обойти это ограничение и запустить поезда. А производитель теперь отказывается от ответственности и заявляет, что ничего такого не делал и вообще грозит судиться с хакерами за клевету. Настаивая при этом, что ошибки были вызваны недостаточной квалификацией ремонтников, потому что обслуживать поезда должны только сотрудники производителя. @blognot https://www.404media.co/polish-hackers-repaired-trains-the-manufacturer-artificially-bricked-now-the-train-company-is-threatening-them/\n\n, reply-to:0�[0m
2023/12/14 12:54:48	stdout	�[36m2023/12/14 10:54:48.530�[0m �[31m[WARN] �[0m �[34m{server/server.go:232 server.(*SpamWeb).UnbanURL}�[0m �[31mfailed to compress message "Вот вы всё "Apple", "Google", "монополия сторов" — мелко это всё. Вот поезда — это круто. В Польше одна региональная железная дорога решила сэкономить на обслуживании купленных пассажирских поездов Impuls от компании NEWAG и наняла независимую (от производителя) компанию для этого. После того, как сервисные работы были закончены, несколько поездов отказались заводиться. Ситуация была откровенно угрожающей (поездов не хватало для обеспечения перевозок), когда кто-то из персонала набрал в гугле "польские хакеры" и в итоге к проблеме подключились белые хакеры из группы Dragon Sector. Им удалось разобраться с загадочными ошибками и обнаружить, что в поездах работает система распознавания "workshop-detection", которая начинает препятствовать работе, если обнаружено вмешательство неавторизованного механика. Короче говоря, производитель делает из поезда "кирпич", если его ремонтирует кто-то другой, показывая массу сообщений о нарушении копирайта и даже вроде имея возможность заблокировать поезд удаленно. Хакерам удалось обойти это ограничение и запустить поезда. А производитель теперь отказывается от ответственности и заявляет, что ничего такого не делал и вообще грозит судиться с хакерами за клевету. Настаивая при этом, что ошибки были вызваны недостаточной квалификацией ремонтников, потому что обслуживать поезда должны только сотрудники производителя. @blognot https://www.404media.co/polish-hackers-repaired-trains-the-manufacturer-artificially-bricked-now-the-train-company-is-threatening-them/", encoded string is too long: 1622 characters�[0m
2023/12/14 12:54:48	stdout	�[36m2023/12/14 10:54:48.529�[0m �[37m[DEBUG]�[0m �[34m{events/events.go:306 events.(*TelegramListener).reportToAdminChat}�[0m �[37mreport to admin chat, ban data for {777000 Telegram}, group: 120025072�[0m
2023/12/14 12:54:48	stdout	�[36m2023/12/14 10:54:48.529�[0m �[33m[INFO] �[0m �[34m{events/events.go:206 events.(*TelegramListener).procEvents}�[0m �[33m{777000 Telegram} banned by bot for 9600h0m0s�[0m
2023/12/14 12:54:48	stdout	�[36m2023/12/14 10:54:48.436�[0m �[37m[DEBUG]�[0m �[34m{app/main.go:260 main.execute.makeSpamLogger.func11}�[0m �[37mspam message: this is spam: "Telegram" (777000)�[0m
2023/12/14 12:54:48	stdout	�[36m2023/12/14 10:54:48.436�[0m �[33m[INFO] �[0m �[34m{app/main.go:259 main.execute.makeSpamLogger.func11}�[0m �[33mspam detected from {777000 Telegram}, response: this is spam: "Telegram" (777000)�[0m
2023/12/14 12:54:48	stdout	�[36m2023/12/14 10:54:48.436�[0m �[37m[DEBUG]�[0m �[34m{events/events.go:198 events.(*TelegramListener).procEvents}�[0m �[37mban initiated for {Text:this is spam: "Telegram" (777000) Send:true BanInterval:9600h0m0s User:{ID:777000 Username: DisplayName:Telegram} ChannelID:0 ReplyTo:42734 DeleteReplyTo:true}�[0m
2023/12/14 12:54:48	stdout	�[36m2023/12/14 10:54:48.161�[0m �[37m[DEBUG]�[0m �[34m{events/events.go:338 events.(*TelegramListener).sendBotResponse}�[0m �[37mbot response - this is spam: "Telegram" (777000), reply-to:42734�[0m
2023/12/14 12:54:48	stdout	�[36m2023/12/14 10:54:48.161�[0m �[33m[INFO] �[0m �[34m{bot/spam.go:84 bot.(*SpamFilter).OnMessage}�[0m �[33muser Telegram detected as spammer: {name: stopword, spam: false, details: not found}, {name: emoji, spam: false, details: 0/2}, {name: similarity, spam: false, details: 0.10/0.50}, {name: classifier, spam: true, details: probability: NaN%, certain: true}, {name: cas, spam: false, details: Record not found.}, "Вот вы всё "Apple", "Google", "монополия сторов" — мелко это всё. Вот поезда — это круто. В Польше одна региональная железная дорога решила сэкономить на обслуживании купленных пассажирских поездов Impuls от компании NEWAG и наняла независимую (от производителя) компанию для этого. После того, как сервисные работы были закончены, несколько поездов отказались заводиться. Ситуация была откровенно угрожающей (поездов не хватало для обеспечения перевозок), когда кто-то из персонала набрал в гугле "польские хакеры" и в итоге к проблеме подключились белые хакеры из группы Dragon Sector. \n\nИм удалось разобраться с загадочными ошибками и обнаружить, что в поездах работает система распознавания "workshop-detection", которая начинает препятствовать работе, если обнаружено вмешательство неавторизованного механика. Короче говоря, производитель делает из поезда "кирпич", если его ремонтирует кто-то другой, показывая массу сообщений о нарушении копирайта и даже вроде имея возможность заблокировать поезд удаленно.\n\nХакерам удалось обойти это ограничение и запустить поезда. А производитель теперь отказывается от ответственности и заявляет, что ничего такого не делал и вообще грозит судиться с хакерами за клевету. Настаивая при этом, что ошибки были вызваны недостаточной квалификацией ремонтников, потому что обслуживать поезда должны только сотрудники производителя. \n\n@blognot\n\nhttps://www.404media.co/polish-hackers-repaired-trains-the-manufacturer-artificially-bricked-now-the-train-company-is-threatening-them/"�[0m
2023/12/14 12:54:47	stdout	�[36m2023/12/14 10:54:47.916�[0m �[37m[DEBUG]�[0m �[34m{events/events.go:184 events.(*TelegramListener).procEvents}�[0m �[37mincoming msg: Вот вы всё "Apple", "Google", "монополия сторов" — мелко это всё. Вот поезда — это круто. В Польше одна региональная железная дорога решила сэкономить на обслуживании купленных пассажирских поездов Impuls от компании NEWAG и наняла независимую (от производителя) компанию для этого. После того, как сервисные работы были закончены, несколько поездов отказались заводиться. Ситуация была откровенно угрожающей (поездов не хватало для обеспечения перевозок), когда кто-то из персонала набрал в гугле "польские хакеры" и в итоге к проблеме подключились белые хакеры из группы Dragon Sector. Им удалось разобраться с загадочными ошибками и обнаружить, что в поездах работает система распознавания "workshop-detection", которая начинает препятствовать работе, если обнаружено вмешательство неавторизованного механика. Короче говоря, производитель делает из поезда "кирпич", если его ремонтирует кто-то другой, показывая массу сообщений о нарушении копирайта и даже вроде имея возможность заблокировать поезд удаленно. Хакерам удалось обойти это ограничение и запустить поезда. А производитель теперь отказывается от ответственности и заявляет, что ничего такого не делал и вообще грозит судиться с хакерами за клевету. Настаивая при этом, что ошибки были вызваны недостаточной квалификацией ремонтников, потому что обслуживать поезда должны только сотрудники производителя. @blognot https://www.404media.co/polish-hackers-repaired-trains-the-manufacturer-artificially-bricked-now-the-train-company-is-threatening-them/�[0m
2023/12/14 12:54:47	stdout	2023/12/14 10:54:47.916 [DEBUG] {events/events.go:162 events.(*TelegramListener).procEvents} {"message_id":42734,"from":{"id":777000,"first_name":"Telegram"},"sender_chat":{"id":-1001065632275,"type":"channel","title":"БлоGнот","username":"blognot","photo":null,"location":null},"date":1702551287,"chat":{"id":-1001226560034,"type":"supergroup","title":"БлоGнот комментарии","username":"blognot_chat","photo":null,"location":null},"forward_from_chat":{"id":-1001065632275,"type":"channel","title":"БлоGнот","username":"blognot","photo":null,"location":null},"forward_from_message_id":4533,"forward_date":1702551284,"is_automatic_forward":true,"text":"Вот вы всё "Apple", "Google", "монополия сторов" — мелко это всё. Вот поезда — это круто. В Польше одна региональная железная дорога решила сэкономить на обслуживании купленных пассажирских поездов Impuls от компании NEWAG и наняла независимую (от производителя) компанию для этого. После того, как сервисные работы были закончены, несколько поездов отказались заводиться. Ситуация была откровенно угрожающей (поездов не хватало для обеспечения перевозок), когда кто-то из персонала набрал в гугле "польские хакеры" и в итоге к проблеме подключились белые хакеры из группы Dragon Sector. \n\nИм удалось разобраться с загадочными ошибками и обнаружить, что в поездах работает система распознавания "workshop-detection", которая начинает препятствовать работе, если обнаружено вмешательство неавторизованного механика. Короче говоря, производитель делает из поезда "кирпич", если его ремонтирует кто-то другой, показывая массу сообщений о нарушении копирайта и даже вроде имея возможность заблокировать поезд удаленно.\n\nХакерам удалось обойти это ограничение и запустить поезда. А производитель теперь отказывается от ответственности и заявляет, что ничего такого не делал и вообще грозит судиться с хакерами за клевету. 
Настаивая при этом, что ошибки были вызваны недостаточной квалификацией ремонтников, потому что обслуживать поезда должны только сотрудники производителя. \n\n@blognot\n\nhttps://www.404media.co/polish-hackers-repaired-trains-the-manufacturer-artificially-bricked-now-the-train-company-is-threatening-them/","entities":[{"type":"mention","offset":1374,"length":8},{"type":"url","offset":1384,"length":135}],"message_auto_delete_timer_changed":null,"proximity_alert_triggered":null,"voice_chat_scheduled":null,"voice_chat_started":null,"voice_chat_ended":null,"voice_chat_participants_invited":null}
2023/12/14 12:54:12	stdout	2023/12/14 10:54:12.289 [INFO]  {server/server.go:179 server.(*SpamWeb).unbanHandler} unban user 777000

Add messages forwarded by the admin directly to the bot as spam

Add a function so that messages the admin(s) send directly to the bot are marked as spam and added to the spam message database.

Update the list of approved users

It is not clear how to update the list of approved users.

It only loads the list of users from a file when the bot starts: https://github.com/umputun/tg-spam/blob/master/app/main.go#L162 , but how (and when) can I add a user to that list?

My assumption is that the list has to be updated after an admin clicks unban?

Anyway, could you please explain how it should work from your perspective and I will do my best to fix that if needed.

issues with running latest v.1.3.1 bot

I'm receiving the following failure:

tg-spam v1.3.1-e9aa206-20231222T15:42:53
2023/12/23 19:10:18.341 [ERROR] can't make data db, unable to open database file: out of memory (14)
2023/12/23 19:10:18.341 [ERROR] can't make data db, unable to open database file: out of memory (14)
>>> stack trace:
main.main()
	/build/app/main.go:139 +0x465

for the following run command:

docker run ghcr.io/umputun/tg-spam:v1.3.1 --telegram.token=*** --telegram.group=*** --admin.group=*** --telegram.preserve-unbanned --no-spam-reply --max-emoji=-1

If I run from locally built go binary, everything is fine. Also, everything is fine if I run from locally built docker image.

What am I doing wrong?

automatic ban on multiple /warn

Currently, the /warn command doesn't keep track of which user received a warning and how many times. It would be beneficial to record this information and automatically ban users if a certain threshold is reached.
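
A minimal sketch of the bookkeeping this would need, assuming an in-memory counter keyed by Telegram user ID. `WarnTracker`, `NewWarnTracker` and the threshold parameter are hypothetical names for illustration, not part of tg-spam, and persistence across restarts is left out:

```go
package main

import "sync"

// WarnTracker is a hypothetical helper (not part of tg-spam) that counts
// warnings per user and reports when a ban threshold is reached.
type WarnTracker struct {
	mu        sync.Mutex
	threshold int
	counts    map[int64]int
}

// NewWarnTracker creates a tracker that asks for a ban after threshold warnings.
func NewWarnTracker(threshold int) *WarnTracker {
	return &WarnTracker{threshold: threshold, counts: make(map[int64]int)}
}

// Warn records one warning for userID. It returns the user's warning count
// and true when that count has reached the threshold.
func (w *WarnTracker) Warn(userID int64) (int, bool) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.counts[userID]++
	n := w.counts[userID]
	return n, n >= w.threshold
}
```

A /warn handler could call `Warn` with the ID of the warned user and trigger the existing ban path when the second return value is true.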

Add some progress indication on potentially slow operations in web ui

With 1000+ records in the custom spam list, clicking the trash icon takes a second or two, so it would be nice to show a progress indicator. HTMX supports this kind of thing, so it should be easy enough.

As a side note, I need to check why it takes that long; maybe an index is missing or something like that.

Controversial results on a spam message

It's a single message, but the bot produced two contradictory reports for it: one says it is spam, the other says it is ok.

Add the ability to delete the system message about joining a group

Discussed in #91

Originally posted by fear-rus June 3, 2024
I suggest adding the ability to delete the system message about joining the group after the bot deletes a message and bans the user. When there is a lot of spam and the bot deletes it, the system messages about joining remain and look like pages of spam in the group. Is it possible to automatically delete the join messages of spambots that have been removed?

Add messages forwarded (by admins) to admin spam group as spam

Every message forwarded to the admin spam group should be added to the dynamic-spam file (see #2) and also should trigger spam removal and user ban right away

Add example webapp showing real-life usage of library/api

It would be nice to have some basic examples demonstrating the use of the library and/or API. Some basic htmx+templates allow users to enter a text message and show it on the common page, which will do the trick. This will be some one-topic forum or something like this. Maybe some fake auth to allow simulation of multiple users.

As soon as the message is submitted, it will be checked against the tg-spam library/API, and if spam is detected, the check results will be shown on this page instead of the original message. Adding an "unban" button for each banned message would also be nice, restoring the original text and adding the user ID to the list of approved users.

The goal here is not to build something useful but to make a minimal web app demonstrating as many aspects of integration with library/API as possible. Another goal is to get a feel for the usability of the library/api

This is also an excellent exercise for someone who wants to play with an htmx Go application and integration with third-party APIs.

Soft ban mode not working for me

I'm trying to use the soft ban functionality; it is critical for me.
When I write a phrase from the stop-words.txt dictionary in my test chat, the bot does not restrict me; instead it bans me and removes me from the chat.

What's my mistake?

I'm using v1.12.0. Here is the container configuration:

services:
  tg-spam:
    image: umputun/tg-spam:latest # use :master tag for latest (unstable) version
    hostname: tg-spam
    restart: always
    container_name: tg-spam
    logging: &default_logging
      driver: json-file
      options:
        max-size: "10m"
        max-file: "5"
    environment:
      - TELEGRAM_TOKEN=000000Q
      - TELEGRAM_GROUP=-00000
      - TZ=Europe/Sofia          
      - ADMIN_GROUP=-000000  
      - LOGGER_ENABLED=true      
      - LOGGER_FILE=/srv/logs/tg-spam.log
      - LOGGER_MAX_SIZE=5M      
      - FIRST_MESSAGES_COUNT=4   
      - META_IMAGE_ONLY=true     
      - META_LINKS_ONLY=true     
      - MAX_EMOJI=-1             
      - NO_SPAM_REPLY=false      
      - SERVER_ENABLED=false      
      - NO_SPAM_REPLY=true
      - SOFT_BAN=true
      - DEBUG=true 			 
      - DRY=false
      - PARANOID=true
      - TRAINING=false
    volumes:
      - /telegram/tg-spam/data/:/srv/data      
      - /telegram/tg-spam/logs/:/srv/logs

Potentially, the problem is that in main.go, the TelegramListener structure does not initialize the SoftBanMode parameter from the options struct.
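
To illustrate the suspected failure mode, here is a simplified sketch. The `options` and `listener` types and the `action` method are hypothetical stand-ins for illustration only, not the actual tg-spam code. It shows why a flag that is never copied from the options stays at Go's zero value (`false`) and falls through to a full ban:

```go
package main

// options mimics parsed command-line or environment settings (hypothetical).
type options struct {
	SoftBan bool
}

// listener stands in for the real TelegramListener (hypothetical subset).
type listener struct {
	SoftBanMode bool
}

// newListener copies the soft-ban flag from the options. Omitting this
// assignment leaves SoftBanMode at its zero value (false).
func newListener(opts options) listener {
	return listener{SoftBanMode: opts.SoftBan}
}

// action reports what the listener would do with a detected spammer.
func (l listener) action() string {
	if l.SoftBanMode {
		return "restrict"
	}
	return "ban"
}
```

With this wiring, `newListener(options{SoftBan: true}).action()` yields `"restrict"`, while a listener built without the assignment yields `"ban"` no matter what `SOFT_BAN` is set to.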

Add support of auto-generated spam and ham files

In addition to static samples, the system should update and use another pair of files it updates by itself:

dynamic-spam.txt - every message detected with a high probability of spam will be added to this file. A message counts as high-probability spam when multiple filters react, CAS reports it, or (maybe) the emoji filter triggers
dynamic-ham.txt - every message that admins unbanned will be added to this file
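
The rule above can be sketched roughly as follows. This is an illustration under assumptions, not the project's implementation: `CheckResult`, `highProbability` and `appendSample` are hypothetical names, the check name `"cas"` and the "two or more filters" cut-off are guesses at what "multiple filters reacted" means, and the optional emoji rule is left out:

```go
package main

import (
	"os"
	"strings"
)

// CheckResult is a hypothetical, reduced view of one filter's verdict.
type CheckResult struct {
	Name string
	Spam bool
}

// highProbability returns true when CAS flagged the message or when
// at least two filters reported spam (assumed threshold).
func highProbability(results []CheckResult) bool {
	hits := 0
	for _, r := range results {
		if !r.Spam {
			continue
		}
		if r.Name == "cas" {
			return true
		}
		hits++
	}
	return hits >= 2
}

// appendSample adds msg as a single line to the file at path, creating the
// file if it does not exist. Whitespace runs, including newlines, are
// collapsed so that one sample occupies exactly one line.
func appendSample(path, msg string) error {
	line := strings.Join(strings.Fields(msg), " ")
	if line == "" {
		return nil
	}
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	if _, err := f.WriteString(line + "\n"); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}
```

A caller would check `highProbability` on the filter results and, when it returns true, call `appendSample` with the path of dynamic-spam.txt. The same `appendSample` could serve dynamic-ham.txt on unban.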

not handling forwarded messages properly

For forwarded messages, Telegram is likely using different fields to represent the text and media. tg-spam should detect forwarded messages and use the appropriate fields for detection. Currently, it seems to think those messages are empty.
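
One possible reading of this, offered as an assumption rather than a confirmed diagnosis: in the Telegram Bot API a message with media carries its text in the `caption` field, while `text` stays empty. A minimal sketch of a fallback, where the `incoming` type and `messageText` function are hypothetical and decode only the two fields of interest:

```go
package main

import "encoding/json"

// incoming decodes only the two Telegram message fields relevant here.
type incoming struct {
	Text    string `json:"text"`
	Caption string `json:"caption"`
}

// messageText returns the text of a raw Telegram message, falling back to
// the media caption when the text field is empty.
func messageText(raw []byte) (string, error) {
	var m incoming
	if err := json.Unmarshal(raw, &m); err != nil {
		return "", err
	}
	if m.Text != "" {
		return m.Text, nil
	}
	return m.Caption, nil
}
```

Feeding the result of `messageText` into the spam checks, instead of the bare text field, would keep captioned photos from being treated as empty messages.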

unban failure

Hi umputun,

I used docker-compose to deploy the application, and when I try to unban a user I get a warning

tg-spam | 2024/03/24 11:10:32.014 [WARN] failed to process callback: failed to unban user: failed to update ham for "Нужен человек для удаленного заработка. С вас, телефон и 2 часа свободного времени. Доход достойный. Для связи (+) в ЛС": can't update ham samples: can't update ham samples: failed to open /srv/data/ham-dynamic.txt: open /srv/data/ham-dynamic.txt: no such file or directory

Deletion of spam sample duplicates

If in the admin chat you confirm the ban of users who write the same messages, such messages are added twice. And when you try to remove a duplicate from the sample database, both messages (both lines) are deleted. This is not a problem if you remember this - it is a nuance of placing the sample database in a text file.

I'll switch to Russian if the author doesn't mind.
I would suggest storing the samples some other way, but I have no ideas besides a file-based store. And I don't really know Golang anyway.
Honestly, I think the author did not expect there to be so many spam samples =)
It may be worth adding some cleanup of duplicates from the database in the future, or a check for an identical existing entry when adding, because it is impossible to remember what has already been added and what has not.
Thanks for your attention =)
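
The "check for an identical entry when adding" idea could look roughly like this. It is a sketch under assumptions: `SampleSet` is a hypothetical in-memory structure, and treating samples that differ only in whitespace as identical is a design choice made here, not something the issue specifies:

```go
package main

import "strings"

// SampleSet is a hypothetical in-memory store of samples that rejects
// duplicates. Samples are compared after collapsing whitespace.
type SampleSet struct {
	seen  map[string]struct{}
	lines []string
}

// NewSampleSet returns an empty set.
func NewSampleSet() *SampleSet {
	return &SampleSet{seen: make(map[string]struct{})}
}

// Add stores msg unless an identical sample already exists. It returns
// true when the sample was added and false for duplicates or empty input.
func (s *SampleSet) Add(msg string) bool {
	key := strings.Join(strings.Fields(msg), " ")
	if key == "" {
		return false
	}
	if _, ok := s.seen[key]; ok {
		return false
	}
	s.seen[key] = struct{}{}
	s.lines = append(s.lines, key)
	return true
}

// Len returns the number of unique samples stored.
func (s *SampleSet) Len() int {
	return len(s.lines)
}
```

Loading the existing sample file into such a set at startup and calling `Add` before appending a new line would keep the file free of repeated entries.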

Add a http port to an unban link

When a bot generates an unban link for admin, it uses only the domain name, supposing that there is a frontend which will translate the request to the bot as a backend. It will be useful to have an additional public port parameter when there is no frontend before the standalone bot.
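
A sketch of how an optional public port could be folded into the link, using only the Go standard library. The `unbanURL` function, the `/unban` path and the `user` query parameter are assumptions for illustration; the real link format may differ:

```go
package main

import (
	"errors"
	"net"
	"net/url"
	"strconv"
)

// unbanURL builds an unban link from a base URL such as "https://example.com".
// When port is greater than zero it replaces any port in the base URL.
// The "/unban" path and "user" parameter are hypothetical.
func unbanURL(base string, port int, userID int64) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	if u.Host == "" {
		return "", errors.New("base URL must include scheme and host")
	}
	if port > 0 {
		u.Host = net.JoinHostPort(u.Hostname(), strconv.Itoa(port))
	}
	u.Path = "/unban"
	q := url.Values{}
	q.Set("user", strconv.FormatInt(userID, 10))
	u.RawQuery = q.Encode()
	return u.String(), nil
}
```

Leaving the port at zero keeps the current behaviour for setups that sit behind a frontend, so the new parameter would be opt-in.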

Make it possible to run from AWS Lambda

Lambda seems to be a nice fit for this type of service. Technically, it is not trivial but should be doable, as TG provides webhooks that the bot can register, and the lambda will be invoked. In addition, all the persistent parts should be adjusted to support either EBS, EFS or DynamoDB, whichever is cheaper.

I don't have any need for such a mode personally; however, if someone can implement a PR with minimal disruption to the current functionality and without much new complexity, I'll merge it.

Unban directly from admin group

If possible, it would be great to unban users directly from the admin group where banned messages are forwarded, ideally with a button under each message.

And if the bot runs in dry mode, add a "BAN" button instead.

For example, I currently run the bot in dry mode and there are false-positive messages. But I can't tell the bot "it's not spam" to teach it, so I can't switch off dry mode because the bot is not trained yet.

tg-spam.log permission denied

Hi umputun,

I used docker-compose to deploy the application, and I see a warning

tg-spam    | 2024/03/24 11:09:18.928 [INFO]  user **** as spammer: {name: stopword, spam: false, details: not found}, {name: emoji, spam: false, details: 0/2}, {name: similarity, spam: true, details: 1.00/0.50}, {name: classifier, spam: true, details: probability of spam: 100.00%}, {name: cas, spam: false, details: record not found}, "Нужен человек для удаленного заработка. С вас, телефон и 2 часа свободного времени. Доход достойный. Для связи (+) в ЛС"
tg-spam    | 2024/03/24 11:09:18.928 [WARN]  can't write to log, can't open new logfile: open /srv/logs/tg-spam.log: permission denied

In the container I see that the folders were created by the root user:

/srv $ ls -la
total 14356
drwxr-xr-x    1 root     root          4096 Mar 24 11:07 .
drwxr-xr-x    1 root     root          4096 Mar 24 11:07 ..
drwxrwxr-x    2 app      app           4096 Mar 24 11:09 data
drwxr-xr-x    2 root     root          4096 Mar 24 11:07 logs
-rwxr-xr-x    1 root     root      14684160 Mar 24 07:22 tg-spam
/srv $ ps -a
PID   USER     TIME  COMMAND
    1 app       0:04 /srv/tg-spam
   22 app       0:00 /bin/sh
   30 app       0:00 ps -a

On the host machine, the ./logs folder was created by root:

user@ubuntu-vm:~/tg-spam$ ls -la
total 20
drwxrwxr-x 3 user   user   4096 Mar 24 08:07 .
drwxr-x--- 8 user   user   4096 Mar 23 19:06 ..
-rw-r--r-- 1 user   user   2132 Mar 24 08:07 docker-compose.yml
-rw-r--r-- 1 user   user    638 Mar 23 20:49 .env
drwxr-xr-x 2 root root 4096 Mar 24 08:07 logs

# example of a compose file with server enabled, logging turned on and samples on named volume
services:
  tg-spam:
    image: umputun/tg-spam:latest # use :master tag for latest (unstable) version
    hostname: tg-spam
    user: app
    restart: always
    container_name: tg-spam
    deploy:
      resources:
        limits:
          cpus: '0.25'
          memory: 50M
      restart_policy:
        condition: on-failure
        max_attempts: 3
    logging: &default_logging
      driver: json-file
      options:
        max-size: "5m"
        max-file: "2"
    env_file:
      - .env
    volumes:
      - tg-data:/srv/data       # mount volume with samples and dynamic files
      - ./logs:/srv/logs        # mount logs location to host's ./log directory
    ports:
      - 127.0.0.1:4080:8080
volumes:
  tg-data:
