
telegram-export's Introduction

telegram-export


A tool to download Telegram data (users, chats, messages, and media) into a database (and display the saved data).

This project is currently archived.

The main owners no longer contribute to this repository, because Telegram updates and schema changes make maintaining it very tedious work, and we have lost interest in the project.

As an alternative, some clients like Telegram Desktop now have a built-in "Export chat history" feature (in the three-dots menu of every chat) which you may use as a replacement.

Database schema:

Schema image

Installation

The simplest way is to run sudo pip3 install --upgrade telegram_export, after which the telegram-export command should be available in your terminal. That's it!

If you don't like using sudo pip, you can use pip3 install --user telegram_export, but you'll have to add something like ~/.local/bin/ to your $PATH to get the command available. If you don't want to add to PATH, you can also use python3 -m telegram_export anywhere instead of telegram-export. You'll have a similar issue if you're using a virtualenv, but if you're using those you probably know what you're doing anyway :)

Slow downloads?

You may also want to install cryptg with the same method for a speed boost when downloading media. Telegram requires a lot of encryption and decryption and this can make downloading files especially slow unless using a nice fast library like cryptg. One user reported a speed increase of 1100%.

Usage

First, copy config.ini.example (from GitHub) to ~/.config/telegram-export/config.ini and edit some values. You'll probably need to create this folder. To write your config whitelist, you may want to refer to the output of telegram-export --list-dialogs to get dialog IDs, or telegram-export --search-dialogs <query> to filter the results.
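The setup step above can also be scripted. The following is a minimal sketch, not part of telegram-export: the helper name install_config is hypothetical, and it assumes you have already downloaded config.ini.example from GitHub.

```python
import shutil
from pathlib import Path


def install_config(example, config_dir):
    """Copy the example config into config_dir unless a config already exists.

    Hypothetical helper for illustration; telegram-export does not ship it.
    """
    config_dir = Path(config_dir)
    # The folder usually does not exist yet, so create it (and any parents).
    config_dir.mkdir(parents=True, exist_ok=True)
    target = config_dir / "config.ini"
    # Never overwrite a config that has already been edited.
    if not target.exists():
        shutil.copyfile(str(example), str(target))
    return target
```

For example, install_config("config.ini.example", Path.home() / ".config" / "telegram-export") would place the file at the path mentioned above; you still need to edit the values by hand afterwards.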

Then run telegram-export and allow it to dump data.

Full option listing:

usage: __main__.py [-h] [--list-dialogs] [--search-dialogs SEARCH_STRING]
                   [--config-file CONFIG_FILE] [--contexts CONTEXTS]
                   [--format {text,html}] [--download-past-media]

Download Telegram data (users, chats, messages, and media) into a database
(and display the saved data)

optional arguments:
  -h, --help            show this help message and exit
  --list-dialogs        list dialogs and exit
  --search-dialogs SEARCH_STRING
                        like --list-dialogs but searches for a dialog by
                        name/username/phone
  --config-file CONFIG_FILE
                        specify a config file. Default config.ini
  --contexts CONTEXTS   list of contexts to act on eg --contexts=12345,
                        @username (see example config whitelist for full
                        rules). Overrides whitelist/blacklist.
  --format {text,html}  formats the dumped messages with the specified
                        formatter and exits.
  --download-past-media
                        download past media instead of dumping new data (files
                        that were seen before but not downloaded).

telegram-export vs telegram-history-dump

(For brevity we'll just refer to them as "export" and "dump")
  • SQLite instead of jsonlines allows for far more powerful queries and better efficiency, but loses compatibility with text-manipulating UNIX tools (or even more powerful tools like jq), as the data is not stored as text.
  • export's stored data is less complicated than dump's json dumps
  • Support for saving the history of a person or other dialog, so you can see e.g. what their name was over time.
  • Using telethon instead of tg-cli allows support for newer Telegram features like pinned messages, admin logs, user bios, first-class support for supergroups and avoids the tg-cli bug which made dumping channels impossible, as well as several other tg-cli annoyances (such as being somewhat harder to install).
  • Newer and less mature than dump
  • No dedicated analysis program yet (dump has telegram-analysis and pisg)
  • Implemented features which dump does not support (incomplete list):
    • Admin logs
    • Dumping Users/Channels/Chats as their own entities, not just as message metadata. This allows things like user bios, channel descriptions and profile pictures.
    • Pinned messages (dump kind of supports this, but only by saving a message replying to the pinned message with text 'pinned the message')
    • Participant lists
  • Closer interaction with the Telegram API theoretically allows big speed improvements (Practical comparison of times soon™)
  • export's database file is bound to a user (like dump), and the program will exit if you login as another person to avoid mixing things up. If you do use export with multiple users, you should specify a different database for each user. You can easily select different config files through --config-file.
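To illustrate the first point in the list above (more powerful queries with SQLite), here is a small sketch using Python's sqlite3 module. The table and column names (Message, ContextID, FromID, Date) are assumptions made up for this example and are not taken from the real schema; the data lives in an in-memory database.

```python
import sqlite3

# In-memory database with a hypothetical, simplified Message table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Message (ID INTEGER, ContextID INTEGER, FromID INTEGER, Date INTEGER)"
)
conn.executemany(
    "INSERT INTO Message VALUES (?, ?, ?, ?)",
    [
        (1, 100, 7, 1529956521),
        (2, 100, 7, 1529956600),
        (3, 100, 8, 1529956700),
        (4, 200, 7, 1529956800),
    ],
)


def messages_per_sender(conn, context_id):
    """Count messages per sender in one chat, most active sender first."""
    return conn.execute(
        "SELECT FromID, COUNT(*) FROM Message "
        "WHERE ContextID = ? GROUP BY FromID "
        "ORDER BY COUNT(*) DESC, FromID",
        (context_id,),
    ).fetchall()


print(messages_per_sender(conn, 100))  # [(7, 2), (8, 1)]
```

Getting the same aggregate from a jsonlines dump would typically need a jq pipeline or a custom script, which is the trade-off that bullet describes.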

Limitations

  • Still being worked on. It dumps things, but the schema may change and we won't support old schema transitions.
  • Relies on Telethon <https://github.com/LonamiWebs/Telethon>, which is still pre-1.0.
  • Certain information is not dumped for simplicity's sake. For example, edited messages won't be re-downloaded and there is currently no support for multiple versions of a message in the db. However, this shouldn't be much of an issue, since most edits or deletions are legit and often to fix typos.

What does it do? Is it a bot?

It uses the Telegram API (what Telegram apps use), so it has access to everything a Telegram app can do. This is why you need an API ID and API hash to use it, and why one from Telegram Desktop will work. Since normal clients need to download messages, media, users etc to display them in-app, telegram-export can do the same, and save them into a nice database.

So no, it's not really a bot, but it does use the same technology as userbots in order to work. As far as we know, it won't get you banned from using Telegram or anything like that.

Installation from source

git clone this repository, then run python3 setup.py install. You should also read through the Installation section for related notes.

telegram-export's People

Contributors

ahmadalli, brokeyourbike, expectocode, lonami, pielco11, schemacs, xkord


telegram-export's Issues

Speed problems

Finally I have it running, after some closed bugs (thanks!).

... but I have a fast connection and I see a 180kb/s speed limit when downloading some documents (a 700Mb file). It's strange: using other tools I get more speed. Maybe it's an API limitation? Maybe it's a Telethon limitation?

TypeError: The given session must be a Session instance

python3 -m telegram_export
Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.5/dist-packages/telegram_export/__main__.py", line 307, in <module>
    ret = loop.run_until_complete(main(loop)) or 0
  File "/usr/lib/python3.5/asyncio/base_events.py", line 387, in run_until_complete
    return future.result()
  File "/usr/lib/python3.5/asyncio/futures.py", line 274, in result
    raise self._exception
  File "/usr/lib/python3.5/asyncio/tasks.py", line 239, in _step
    result = coro.send(None)
  File "/usr/local/lib/python3.5/dist-packages/telegram_export/__main__.py", line 279, in main
    proxy=proxy
  File "/usr/local/lib/python3.5/dist-packages/telethon/telegram_client.py", line 197, in __init__
    **kwargs
  File "/usr/local/lib/python3.5/dist-packages/telethon/telegram_bare_client.py", line 101, in __init__
    raise TypeError('The given session must be a Session instance.')
TypeError: The given session must be a Session instance.
Exception ignored in: <bound method TelegramBareClient.__del__ of <telethon.telegram_client.TelegramClient object at 0x7f6ac035f5c0>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/telethon/telegram_bare_client.py", line 174, in __del__
  File "/usr/local/lib/python3.5/dist-packages/telethon/telegram_bare_client.py", line 246, in disconnect
AttributeError: 'TelegramClient' object has no attribute '_user_connected'

Can be fixed with

from telethon.sessions import SQLiteSession
TelegramClient(SQLiteSession(absolute_session_name), ...

Having RuntimeError "Number of retries reached 0 for ['GetFileRequest']"

I enabled media download. I got a lot of "Lost request" warnings at first, then I got this error and the program stopped.
Does it mean the media is no longer available, or that it cannot be downloaded?

2018-05-02 18:25:39,746 - asyncio - ERROR - Task exception was never retrieved
future: <Task finished coro=<Downloader._media_consumer() done, defined at /usr/local/lib/python3.6/dist-packages/telegram_export/downloader.py:304> exception=RuntimeError("Number of retries reached 0 for ['GetFileRequest'].",)>
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/telegram_export/downloader.py", line 311, in _media_consumer
    bar)
  File "/usr/local/lib/python3.6/dist-packages/telegram_export/downloader.py", line 300, in _download_media
    progress_callback=progress
  File "/usr/local/lib/python3.6/dist-packages/telethon/telegram_client.py", line 2174, in download_file
    input_location, offset, part_size
  File "/usr/local/lib/python3.6/dist-packages/telethon/telegram_bare_client.py", line 449, in __call__
    [type(x).__name__ for x in requests]
RuntimeError: Number of retries reached 0 for ['GetFileRequest'].

unable to run the application : unexpected keyword argument 'loop'

Hi,

somehow I'm unable to figure out what causes this error :

$ telegram-export
Traceback (most recent call last):
  File "/usr/local/bin/telegram-export", line 3, in <module>
    runpy.run_module("telegram_export", run_name="__main__", alter_sys=True)
  File "/usr/lib/python3.5/runpy.py", line 196, in run_module
    return _run_module_code(code, init_globals, run_name, mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.5/dist-packages/telegram_export/__main__.py", line 295, in <module>
    ret = loop.run_until_complete(main(loop)) or 0
  File "/usr/lib/python3.5/asyncio/base_events.py", line 387, in run_until_complete
    return future.result()
  File "/usr/lib/python3.5/asyncio/futures.py", line 274, in result
    raise self._exception
  File "/usr/lib/python3.5/asyncio/tasks.py", line 239, in _step
    result = coro.send(None)
  File "/usr/local/lib/python3.5/dist-packages/telegram_export/__main__.py", line 267, in main
    loop=loop
  File "/usr/local/lib/python3.5/dist-packages/telethon/telegram_client.py", line 196, in __init__
    **kwargs
TypeError: __init__() got an unexpected keyword argument 'loop'

Any ideas ?

Handle invalidation time properly

Currently media is always dumped, which holds back our other invalidation time checks as they see a different media ID to before.

TypeError: 'NoneType' object is not subscriptable

From #24:

/usr/local/lib/python3.6/dist-packages/telethon/telegram_client.py:291: RuntimeWarning: coroutine 'TelegramClient._check_events_pending_resolve' was never awaited
self._check_events_pending_resolve()
2018-03-03 22:26:51,842 - main - INFO - Closing exporter
example: 0 messages [00:00, ? messages/s]Traceback (most recent call last):██████████████████████████████████████████████| 1/1 [00:00/00:00, 9.94 entities/s, chat=example]
  File "telegram_export.py", line 289, in <module>
    ret = loop.run_until_complete(main())
  File "/usr/lib/python3.6/asyncio/base_events.py", line 467, in run_until_complete
    return future.result()
  File "telegram_export.py", line 263, in main
    await downloader.start(who)
  File "/video/backup_telegram/telegram-export/downloader.py", line 420, in start
    max_id = self.dumper.get_message_id(target_id, 'MAX')
  File "/video/backup_telegram/telegram-export/dumper.py", line 591, in get_message_id
    """.format(which=which), (context_id,)).fetchone()[0]
TypeError: 'NoneType' object is not subscriptable
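
The traceback above ends at a fetchone()[0] call. In sqlite3, fetchone() returns None when the query yields no rows, so indexing it fails for a chat with no stored messages. The following is a minimal sketch of a guarded lookup; the Message table, its columns, the ORDER BY query and the fallback value 0 are all assumptions for illustration, since the real query in dumper.py is not shown here.

```python
import sqlite3


def get_message_id(conn, context_id, which):
    """Return the lowest or highest stored message ID for a context.

    Hypothetical stand-in for the dumper method; returns 0 when the
    context has no messages yet instead of indexing into None.
    """
    if which not in ("MIN", "MAX"):
        raise ValueError("which must be 'MIN' or 'MAX'")
    order = "DESC" if which == "MAX" else "ASC"
    row = conn.execute(
        "SELECT ID FROM Message WHERE ContextID = ? "
        "ORDER BY ID {} LIMIT 1".format(order),
        (context_id,),
    ).fetchone()
    # fetchone() gives None for an empty result set, so check before indexing.
    return row[0] if row is not None else 0
```

Whether 0 is the right fallback depends on how the caller uses the value; it is chosen here only to keep the sketch short.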

Use telethon-asyncio as a dependency, not telethon

With telethon 0.15 and newer:

Traceback (most recent call last):
  File "telegram_export.py", line 289, in <module>
    ret = loop.run_until_complete(main())
  File "/usr/lib/python3.6/asyncio/base_events.py", line 467, in run_until_complete
    return future.result()
  File "telegram_export.py", line 245, in main
    config['TelegramAPI']['ApiHash']
AttributeError: 'TelegramClient' object has no attribute 'start'

With 0.13:

Traceback (most recent call last):
  File "telegram_export.py", line 17, in <module>
    from dumper import Dumper
  File "/video/backup_telegram/telegram-export/dumper.py", line 15, in <module>
    from telethon.utils import get_peer_id, resolve_id, get_input_peer
ImportError: cannot import name 'get_peer_id'

With 0.17.4.5

Traceback (most recent call last):
  File "telegram_export.py", line 289, in <module>
    ret = loop.run_until_complete(main())
  File "/usr/lib/python3.6/asyncio/base_events.py", line 467, in run_until_complete
    return future.result()
  File "telegram_export.py", line 246, in main
    ).start(config['TelegramAPI']['PhoneNumber'])
TypeError: object TelegramClient can't be used in 'await' expression

Documentation for BaseFormatter methods

It's not very friendly to require Formatter writers to read the BaseFormatter methods. We should generate some docs, even if only to display name and docstring.

New Feature: export only new entities after [date]

Exporting some channels may take forever if they have a lot of history, when all you want is the latest x posts or the posts from the last x days.

adding a command line option would help:
telegram_export --since-date '27-06-2018'

or something like this.
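
A possible shape for the option requested above, sketched with argparse. The --since-date flag does not exist in telegram-export; the flag name and the DD-MM-YYYY format are taken from the example in this request, and the function names are made up for illustration.

```python
import argparse
from datetime import datetime


def parse_since_date(text):
    """Parse a DD-MM-YYYY string such as '27-06-2018' into a datetime."""
    try:
        return datetime.strptime(text, "%d-%m-%Y")
    except ValueError:
        raise argparse.ArgumentTypeError(
            "expected a date like 27-06-2018, got {!r}".format(text)
        )


def build_parser():
    """Build a parser with only the proposed (hypothetical) option."""
    parser = argparse.ArgumentParser(prog="telegram-export")
    parser.add_argument(
        "--since-date",
        type=parse_since_date,
        default=None,
        help="only export entities newer than this date (DD-MM-YYYY)",
    )
    return parser


args = build_parser().parse_args(["--since-date", "27-06-2018"])
print(args.since_date)  # 2018-06-27 00:00:00
```

The parsed datetime could then be turned into a timestamp and compared against message dates while dumping; how that would interact with resuming an earlier export is left open here.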

Asyncio version

It would be great to make our requests asynchronously, to speed up the export time. Telethon has an asyncio branch which we could use for this.

TypeError: object TelegramClient can't be used in 'await' expression

I get a "TypeError: object TelegramClient can't be used in 'await' expression" error when running telegram-export.

Please enter the code you received: xxxxx
Signed in successfully as Greg Lee
Traceback (most recent call last):
  File "/usr/local/bin/telegram-export", line 4, in <module>
    __import__('pkg_resources').run_script('telegram-export==0.1.3', 'telegram-export')
  File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 658, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1445, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/local/lib/python3.6/site-packages/telegram_export-0.1.3-py3.6.egg/EGG-INFO/scripts/telegram-export", line 3, in <module>
    __requires__ = 'telegram-export==0.1.3'
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 205, in run_module
    return _run_module_code(code, init_globals, run_name, mod_spec)
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/site-packages/telegram_export-0.1.3-py3.6.egg/telegram_export/__main__.py", line 294, in <module>
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/asyncio/base_events.py", line 468, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.6/site-packages/telegram_export-0.1.3-py3.6.egg/telegram_export/__main__.py", line 267, in main
TypeError: object TelegramClient can't be used in 'await' expression

Better date formatting

In the database, dates are stored as numbers like 1529956521.

It would be nice to have a proper date in there. Any date format will do, as long as it is more human-readable.
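
Values like 1529956521 are Unix timestamps (seconds since 1970-01-01 UTC), so they can be converted when reading the data even if the stored format stays numeric. A short sketch, with helper names made up for this example:

```python
import sqlite3
from datetime import datetime, timezone


def format_timestamp(ts):
    """Render a Unix timestamp as a human-readable UTC string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime(
        "%Y-%m-%d %H:%M:%S UTC"
    )


def sqlite_format(ts):
    """Do the same conversion inside SQLite with its datetime() function."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute("SELECT datetime(?, 'unixepoch')", (ts,)).fetchone()[0]
    finally:
        conn.close()


print(format_timestamp(1529956521))  # 2018-06-25 19:55:21 UTC
print(sqlite_format(1529956521))     # 2018-06-25 19:55:21
```

The SQLite variant means a query can select datetime(SomeColumn, 'unixepoch') directly, where SomeColumn stands for whichever column holds the timestamp.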

Logo Proposal

Greetings @expectocode ,

I was passing by (this repository) and thought I'd like to collaborate by proposing a logo
if that's something that may interest you! let me know!

I'm a Graphic Designer and an Open Source enthusiast
looking to improve my portfolio and collaborate on really cool projects like this one!

Best Regards,
-Luigi.

Process hangs with option --list-dialogs

telegram-export --list-dialogs successfully lists dialogs, but then it hangs and I have to send a signal (e.g. CTRL+C) to terminate it.

$ python -m telegram_export --config-file config.ini.example --list-dialogs
...
/home/edg/src/tg-bck/env/lib/python3.6/site-packages/telegram_export/__main__.py:244: RuntimeWarning: coroutine 'TelegramBaseClient.disconnect' was never awaited
  client.disconnect()

^CTraceback (most recent call last):
  File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/edg/src/tg-bck/env/lib/python3.6/site-packages/telegram_export/__main__.py", line 317, in <module>
    loop.run_until_complete(task)
  File "/usr/lib64/python3.6/asyncio/base_events.py", line 455, in run_until_complete
    self.run_forever()
  File "/usr/lib64/python3.6/asyncio/base_events.py", line 422, in run_forever
    self._run_once()
  File "/usr/lib64/python3.6/asyncio/base_events.py", line 1396, in _run_once
    event_list = self._selector.select(timeout)
  File "/usr/lib64/python3.6/selectors.py", line 445, in select
    fd_event_list = self._epoll.poll(timeout, max_ev)
KeyboardInterrupt
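
The RuntimeWarning in the log above says that the coroutine returned by disconnect() was never awaited, which means the disconnect did not actually run. The sketch below reproduces the pattern with a stand-in class (FakeClient is not Telethon) and shows the awaited form. Whether this alone explains the hang is not established here. It uses asyncio.run, which needs Python 3.7 or newer.

```python
import asyncio


class FakeClient:
    """Stand-in for a client whose disconnect() is a coroutine function."""

    def __init__(self):
        self.connected = True

    async def disconnect(self):
        self.connected = False


async def list_dialogs_and_exit(client):
    # ... listing the dialogs would happen here ...
    # Calling client.disconnect() without await only creates a coroutine
    # object; the body runs once it is awaited.
    await client.disconnect()


client = FakeClient()
asyncio.run(list_dialogs_and_exit(client))
print(client.connected)  # False
```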

Dump admin logs

These seem more suited to dumping in JSON, as their keys vary depending on what kind of event occurred. Perhaps a table with Date, EventType and JSON columns?
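
A minimal sketch of the table proposed above, using sqlite3 and json. The table name AdminLog, the column name Data and the event type strings are placeholders chosen for this example, not part of the actual schema.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Date and EventType are regular columns; everything event-specific goes
# into a JSON text column because its keys vary by event type.
conn.execute(
    "CREATE TABLE AdminLog ("
    "Date INTEGER NOT NULL, EventType TEXT NOT NULL, Data TEXT NOT NULL)"
)


def dump_admin_event(conn, date, event_type, payload):
    """Store one admin log event, serializing its payload as JSON."""
    conn.execute(
        "INSERT INTO AdminLog VALUES (?, ?, ?)",
        (date, event_type, json.dumps(payload, sort_keys=True)),
    )


def load_admin_events(conn):
    """Return all events in date order with the JSON payload decoded."""
    rows = conn.execute("SELECT Date, EventType, Data FROM AdminLog ORDER BY Date")
    return [(date, kind, json.loads(data)) for date, kind, data in rows]


dump_admin_event(conn, 1529956600, "ParticipantJoin", {})
dump_admin_event(conn, 1529956521, "ChangeTitle", {"prev": "old", "new": "new"})
print(load_admin_events(conn))
```

Keeping Date and EventType as real columns still allows filtering and sorting in SQL, while the JSON column absorbs the per-event differences.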

Document dump callback API better

Currently, these are quite messy. They require editing __main__.py and calling methods on exporter.dumper. The available callbacks are not documented, and there is no information on what data can be expected in the row passed to them.

Distribute to PyPI

While not an "importable package" (yet?), we could distribute the package as a runnable application on PyPI, so people could run pip install telegram-export and then dump chats as easily as telegram-export --dump whatever.

In the future, we could add more features that make the package more useful when imported as a Python module. See youtube-dl for example. Most people use it as a standalone application, but it can also be imported to script certain tasks.

"Lost request" when media downloading enabled

Edit: Media is still downloaded, the message is just annoying.

2018-03-01 23:46:26,451 - telethon.network.mtproto_sender - WARNING - Lost request (ID 6528126892703884000) with code 0x96a18d5 will be skipped, contents: FileJpeg()

Many such messages are logged during message dumping, but only when media downloading is enabled.

It seems to be some kind of Telethon issue. If only we had someone who knew about Telethon and the Telegram API...

Download all profile pictures

Right now telegram-export downloads only the current profile picture. I think it would be nice to get every profile picture that can be seen, as in the apps.

Dump participant lists

Due to account deletion or service message deletion, users joining and leaving may not appear in the service messages. We should store participants of dialogs, but in a way that does not have massive repetition. Also, it's probably overkill to dump every user who is a participant of any dumped dialog.

These requirements imply some kind of external store using deltas rather than full lists
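
One way to read "deltas rather than full lists" is to store, for each dump, only who joined and who left since the previous dump. A small sketch under that assumption; the function names and the use of plain integer user IDs are illustrative only.

```python
def participant_delta(previous, current):
    """Return (joined, left) user IDs between two participant snapshots."""
    previous, current = set(previous), set(current)
    return sorted(current - previous), sorted(previous - current)


def replay(deltas):
    """Rebuild the latest participant set by applying deltas in order."""
    members = set()
    for joined, left in deltas:
        members |= set(joined)
        members -= set(left)
    return members


snapshots = [[1, 2, 3], [1, 3, 4], [1, 3, 4]]
deltas = []
seen = []
for snapshot in snapshots:
    deltas.append(participant_delta(seen, snapshot))
    seen = snapshot

print(deltas)          # [([1, 2, 3], []), ([4], [2]), ([], [])]
print(replay(deltas))  # {1, 3, 4}
```

An unchanged participant list produces an empty delta, which is what avoids the massive repetition mentioned above.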

Have a setting to disable media

Like, straight up disable it. None of those magic values that don't even work properly on some other unexpected configuration properties.

Previously located Media should be downloadable later

As summarised by Lonami:

i want to be able to dump this chat without media
and in two years i want to come back and say, "oh man, i need the media"
but not redownload the same messages

The problem here is knowing what information we need to reliably download media in the future, and presenting a good UX for this operation.

I suggest a Media table column for filename, and a command line flag like --download-past-media CONTEXT_ID

MaxChunks not working on Channels

In Config.ini:

# Maximum chunks to retrieve from a chat (if too many). 0 (default) means all.
MaxChunks = 1000

Expected behavior: this would download the latest 1000 messages, no matter whether it's a chat, channel or group.

strategy to download

Just to know, what's the strategy for downloading and resuming files?
For example, you are downloading, hit ^C, and run it again. Will it re-download all of the files, without skipping any?

What happens if the same file already exists? I see in the documentation that it will be re-downloaded and overwritten, but wouldn't it be better to create .1 or .2 files, or to skip it and just continue from the previous execution, maybe via a config setting?

I'm in a channel with tons of huge files. If something goes wrong, I need to restart and then download everything again. In my opinion, that doesn't make sense.
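
The three behaviours discussed above (overwrite, skip, or write .1/.2 files) could be expressed as a small policy function. This is a sketch of the idea only; resolve_target and its policy names are hypothetical and not an existing telegram-export setting.

```python
from pathlib import Path


def resolve_target(path, policy="skip"):
    """Decide where a download should be written when path may already exist.

    Returns the path to write to, or None when the download should be skipped.
    """
    path = Path(path)
    if policy not in ("overwrite", "skip", "rename"):
        raise ValueError("unknown policy: {!r}".format(policy))
    if policy == "overwrite" or not path.exists():
        return path
    if policy == "skip":
        return None
    # policy == "rename": try name.1, name.2, ... until one is free.
    counter = 1
    while True:
        candidate = path.with_name("{}.{}".format(path.name, counter))
        if not candidate.exists():
            return candidate
        counter += 1
```

Note that "skip" as written trusts any existing file, including one left incomplete by ^C; a real implementation would probably also compare file sizes or download to a temporary name first.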

Error on export: ValueError: Could not find the input entity for "-1001132836449"

Hi.

When I try to do a basic export (telegram-export with the config already set up), I get the Python error below. This is on Manjaro Linux, installed via pip.

2018-06-07 02:20:10,835 - exporter - INFO - Saving to /data_mnt/backup/personal data/telegram
2018-06-07 02:20:11,731 - telegram_export.downloader - INFO - Not getting participants since we are not admin.                          
2018-06-07 02:20:12,171 - telegram_export.downloader - INFO - Done. Retrieving full information about 0 missing entities.1 [00:00<?, ? entities/s, chat=example]
entities: 100%|██████████████| 1/1 [00:00<00:00,  2.07 entities/s, chat=example]
media: 0.00B [00:00, ?B/s, chat=example]
2018-06-07 02:20:12,479 - telegram_export.downloader - INFO - Not getting participants since we are not admin.                                     
2018-06-07 02:20:13,437 - telegram_export.downloader - INFO - Done. Retrieving full information about 0 missing entities.          s/s, chat=Telegram Usernames]
entities: 100%|███| 1/1 [00:00<00:00,  1.04 entities/s, chat=Telegram Usernames]
media: 0.00B [00:00, ?B/s, chat=Telegram Usernames]
2018-06-07 02:20:13,440 - exporter - INFO - Closing exporter
Traceback (most recent call last):
  File "/usr/bin/telegram-export", line 3, in <module>
    runpy.run_module("telegram_export", run_name="__main__", alter_sys=True)
  File "/usr/lib/python3.6/runpy.py", line 205, in run_module
    return _run_module_code(code, init_globals, run_name, mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/lib/python3.6/site-packages/telegram_export/__main__.py", line 295, in <module>
    ret = loop.run_until_complete(main(loop)) or 0
  File "/usr/lib/python3.6/asyncio/base_events.py", line 468, in run_until_complete
    return future.result()
  File "/usr/lib/python3.6/site-packages/telegram_export/__main__.py", line 279, in main
    await exporter.start()
  File "/usr/lib/python3.6/site-packages/telegram_export/exporter.py", line 74, in start
    self.client):
  File "/usr/lib/python3.6/site-packages/async_generator/_impl.py", line 271, in __anext__
    return await self._do_it(self._it.__next__)
  File "/usr/lib/python3.6/site-packages/async_generator/_impl.py", line 290, in _do_it
    return await ANextIter(self._it, start_fn, *args)
  File "/usr/lib/python3.6/site-packages/async_generator/_impl.py", line 197, in __next__
    return self._invoke(first_fn, *first_args)
  File "/usr/lib/python3.6/site-packages/async_generator/_impl.py", line 209, in _invoke
    result = fn(*args)
  File "/usr/lib/python3.6/site-packages/telegram_export/exporter.py", line 36, in get_entities_iter
    async for ent in entities_from_str(client, in_list):
  File "/usr/lib/python3.6/site-packages/async_generator/_impl.py", line 271, in __anext__
    return await self._do_it(self._it.__next__)
  File "/usr/lib/python3.6/site-packages/async_generator/_impl.py", line 290, in _do_it
    return await ANextIter(self._it, start_fn, *args)
  File "/usr/lib/python3.6/site-packages/async_generator/_impl.py", line 197, in __next__
    return self._invoke(first_fn, *first_args)
  File "/usr/lib/python3.6/site-packages/async_generator/_impl.py", line 209, in _invoke
    result = fn(*args)
  File "/usr/lib/python3.6/site-packages/telegram_export/exporter.py", line 20, in entities_from_str
    await yield_(await client.get_input_entity(int(who)))
  File "/usr/lib/python3.6/site-packages/telethon/telegram_client.py", line 2663, in get_input_entity
    .format(peer)
ValueError: Could not find the input entity for "-1001132836449". Please read https://telethon.readthedocs.io/en/latest/extra/basic/entities.html to find out more details.

FloodWait when downloading media

Maybe you can catch this one:

2018-03-04 20:42:44,964 - asyncio - ERROR - Task exception was never retrieved
future: <Task finished coro=<Downloader._media_consumer() done, defined at /video/backup_telegram/telegram-export/downloader.py:150> exception=FloodWaitError('A wait of 445 seconds is required',)>
Traceback (most recent call last):
  File "/video/backup_telegram/telegram-export/downloader.py", line 174, in _media_consumer
    progress_callback=progress
  File "/usr/local/lib/python3.6/dist-packages/telethon_aio/telegram_client.py", line 1956, in download_file
    input_location, offset, part_size
  File "/usr/local/lib/python3.6/dist-packages/telethon_aio/telegram_bare_client.py", line 424, in __call__
    result = await self._invoke(call_receive, retry, *requests)
  File "/usr/local/lib/python3.6/dist-packages/telethon_aio/telegram_bare_client.py", line 509, in _invoke
    raise next(x.rpc_error for x in requests if x.rpc_error)
telethon_aio.errors.rpc_error_list.FloodWaitError: A wait of 445 seconds is required
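
One way to "catch this one" is to wait for the number of seconds the server asks for and then retry. The sketch below is self-contained, so it defines its own FloodWaitError stand-in with a seconds attribute (Telethon's error exposes the wait the same way); call_with_flood_wait, its parameters and the retry limit are assumptions for illustration, not code from the downloader.

```python
import asyncio


class FloodWaitError(Exception):
    """Stand-in for Telethon's error; `seconds` is the required wait."""

    def __init__(self, seconds):
        super().__init__("A wait of {} seconds is required".format(seconds))
        self.seconds = seconds


async def call_with_flood_wait(make_request, sleep=asyncio.sleep, max_attempts=3):
    """Await make_request(), sleeping and retrying when a flood wait is hit.

    make_request is a zero-argument callable returning an awaitable.
    The last failed attempt re-raises so the caller can decide what to do.
    """
    for attempt in range(max_attempts):
        try:
            return await make_request()
        except FloodWaitError as error:
            if attempt == max_attempts - 1:
                raise
            await sleep(error.seconds)
```

In the media consumer this would wrap the download call, so a 445 second wait pauses that one task instead of ending it with an unretrieved exception.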

Error starting

Hello again. I saw a lot of changes, and I removed my session file (and all of the content) because I got an error when I did a git pull and started again:

blaaaahhh: 717 messages [00:00, 547282.25 Traceback (most recent call last):
File "./telegram_export.py", line 318, in enturas [PALOMITAS]]
ret = loop.run_until_complete(main()) or 0ALOMITAS]]
File "/usr/lib/python3.6/asyncio/base_events.py", line 467, in run_until_complete
return future.result()
File "./telegram_export.py", line 289, in main
await downloader.start(entity)
File "/video/backup_telegram/telegram-export/downloader.py", line 336, in start
for media in self.dumper.iter_resume_media(target_id):
File "/video/backup_telegram/telegram-export/dumper.py", line 695, in iter_resume_media
"FROM ResumeMedia WHERE ContextID = ?", (context_id,)
sqlite3.OperationalError: no such table: ResumeMedia

After deleting all the content, I can't start the app again:

python3 telegram_export.py
Traceback (most recent call last):
  File "telegram_export.py", line 318, in <module>
    ret = loop.run_until_complete(main()) or 0
  File "/usr/lib/python3.6/asyncio/base_events.py", line 467, in run_until_complete
    return future.result()
  File "telegram_export.py", line 269, in main
    ).start(config['TelegramAPI']['PhoneNumber'])
  File "/usr/local/lib/python3.6/dist-packages/telethon_aio/telegram_client.py", line 308, in start
    sent_code = await self.send_code_request(phone, force_sms=force_sms)
  File "/usr/local/lib/python3.6/dist-packages/telethon_aio/telegram_client.py", line 210, in send_code_request
    result = await self(SendCodeRequest(phone, self.api_id, self.api_hash))
  File "/usr/local/lib/python3.6/dist-packages/telethon_aio/telegram_bare_client.py", line 424, in __call__
    result = await self._invoke(call_receive, retry, *requests)
  File "/usr/local/lib/python3.6/dist-packages/telethon_aio/telegram_bare_client.py", line 468, in _invoke
    await self._sender.send(*requests)
  File "/usr/local/lib/python3.6/dist-packages/telethon_aio/network/mtproto_sender.py", line 110, in send
    await self._send_message(message)
  File "/usr/local/lib/python3.6/dist-packages/telethon_aio/network/mtproto_sender.py", line 165, in _send_message
    await self.connection.send(utils.pack_message(self.session, message))
  File "/usr/local/lib/python3.6/dist-packages/telethon_aio/helpers.py", line 33, in pack_message
    data = struct.pack('<qq', session.salt, session.id) + bytes(message)
  File "/usr/local/lib/python3.6/dist-packages/telethon_aio/tl/tl_message.py", line 25, in __bytes__
    body = GzipPacked.gzip_if_smaller(self.request)
  File "/usr/local/lib/python3.6/dist-packages/telethon_aio/tl/gzip_packed.py", line 22, in gzip_if_smaller
    data = bytes(request)
  File "/usr/local/lib/python3.6/dist-packages/telethon_aio/tl/functions/auth.py", line 508, in __bytes__
    TLObject.serialize_bytes(self.phone_number),
  File "/usr/local/lib/python3.6/dist-packages/telethon_aio/tl/tlobject.py", line 98, in serialize_bytes
    'bytes or str expected, not {}'.format(type(data)))
TypeError: bytes or str expected, not <class 'NoneType'>

No such table: SelfInformation

Having this:

Traceback (most recent call last):
  File "telegram-export.py", line 113, in <module>
    main()
  File "telegram-export.py", line 80, in main
    (client.get_me(input_peer=True).user_id,))
sqlite3.OperationalError: no such table: SelfInformation

Whitelist/blacklist comments per ID

Since we require a username, phone number, or context ID for the whitelist and blacklist, it would be nice if users were able to leave a comment by each item to remember what context that ID is for. As far as I know, there isn't an easy way to do this with INI - comments are for the whole line. Perhaps we could parse comments in the list ourselves.
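
Parsing the comments ourselves could look roughly like this. configparser does not strip inline comments by default, so a multi-line value keeps the text after each # and we can split it afterwards. The section name Dumper, the one-item-per-line layout and the helper name are assumptions for this sketch; the real whitelist format may differ.

```python
import configparser


def parse_commented_list(raw):
    """Split a multi-line value into (item, comment) pairs."""
    entries = []
    for line in raw.splitlines():
        # Everything after the first '#' on a line is that item's comment.
        item, _, comment = line.partition("#")
        item = item.strip()
        if item:
            entries.append((item, comment.strip()))
    return entries


EXAMPLE = """
[Dumper]
Whitelist =
    12345      # family group
    @username  # a friend
    +34600000000
"""

parser = configparser.ConfigParser()
parser.read_string(EXAMPLE)
entries = parse_commented_list(parser["Dumper"]["Whitelist"])
print(entries)
# [('12345', 'family group'), ('@username', 'a friend'), ('+34600000000', '')]
```

One caveat: a line that starts with # inside the value is dropped by configparser itself as a full-line comment, so only trailing comments survive to this function.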

Cannot export group chat

When trying to export a group chat, the following exception is thrown:

$ python -m telegram_export --config-file config.ini.example --context -286200882
2018-06-26 08:39:30,723 - exporter - INFO - Saving to /home/edg/tg-export
Exception ignored in: <bound method AsyncGenerator.__del__ of <async_generator._impl.AsyncGenerator object at 0x7f9c08cd9860>>
Traceback (most recent call last):████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 1904.77 entities/s, chat=test group]
  File "/home/edg/src/tg-bck/env/lib/python3.6/site-packages/async_generator/_impl.py", line 324, in __del__
    .format(self._coroutine.cr_frame.f_code.co_name)
RuntimeError: partially-exhausted async_generator 'get_entities_iter' garbage collected
Exception ignored in: <bound method AsyncGenerator.__del__ of <async_generator._impl.AsyncGenerator object at 0x7f9c08183a58>>
Traceback (most recent call last):
  File "/home/edg/src/tg-bck/env/lib/python3.6/site-packages/async_generator/_impl.py", line 324, in __del__
    .format(self._coroutine.cr_frame.f_code.co_name)
RuntimeError: partially-exhausted async_generator 'entities_from_str' garbage collected
2018-06-26 08:39:31,479 - exporter - INFO - Closing exporter
test group: 3 messages [00:00,  7.57 messages/s]/home/edg/src/tg-bck/env/lib/python3.6/site-packages/telegram_export/exporter.py:63: RuntimeWarning: coroutine 'TelegramBaseClient.disconnect' was never awaited
  self.client.disconnect()
Traceback (most recent call last):
  File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/edg/src/tg-bck/env/lib/python3.6/site-packages/telegram_export/__main__.py", line 307, in <module>
    ret = loop.run_until_complete(main(loop)) or 0
  File "/usr/lib64/python3.6/asyncio/base_events.py", line 468, in run_until_complete
    return future.result()
  File "/home/edg/src/tg-bck/env/lib/python3.6/site-packages/telegram_export/__main__.py", line 291, in main
    await exporter.start()
  File "/home/edg/src/tg-bck/env/lib/python3.6/site-packages/telegram_export/exporter.py", line 75, in start
    await self.downloader.start(entity)
  File "/home/edg/src/tg-bck/env/lib/python3.6/site-packages/telegram_export/downloader.py", line 445, in start
    can_get_participants = is_channel and (target.megagroup or
AttributeError: 'Chat' object has no attribute 'megagroup'
Exception ignored in: <bound method tqdm.__del__ of test group: 3 messages [00:00,  6.93 messages/s]>
Traceback (most recent call last):
  File "/home/edg/src/tg-bck/env/lib/python3.6/site-packages/tqdm/_tqdm.py", line 882, in __del__
  File "/home/edg/src/tg-bck/env/lib/python3.6/site-packages/tqdm/_tqdm.py", line 1087, in close
  File "/home/edg/src/tg-bck/env/lib/python3.6/site-packages/tqdm/_tqdm.py", line 439, in _decr_instances
  File "/home/edg/src/tg-bck/env/lib64/python3.6/_weakrefset.py", line 109, in remove
KeyError: <weakref at 0x7f9c0811f4f8; to 'tqdm' at 0x7f9c081836a0>

I've installed telethon from master branch (see #59):

$ pip freeze
appdirs==1.4.3
async-generator==1.9
pyaes==1.6.1
pyasn1==0.4.3
rsa==3.4.2
telegram-export==0.1.6
Telethon-aio==1.0rc1
tqdm==4.23.4
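The `AttributeError` in the traceback above says that a `Chat` object was asked for a `megagroup` attribute it does not have. One defensive pattern is to read the attribute with a default. The sketch below uses hypothetical stand-in classes rather than the real Telethon types, and it is not necessarily how the project resolved the issue.

```python
class Chat:
    """Stand-in for a basic group: it has no `megagroup` attribute."""

    def __init__(self, title):
        self.title = title


class Channel:
    """Stand-in for a channel or supergroup."""

    def __init__(self, title, megagroup):
        self.title = title
        self.megagroup = megagroup


def is_megagroup(target):
    # getattr with a default avoids AttributeError on types lacking the field.
    return bool(getattr(target, 'megagroup', False))


print(is_megagroup(Chat('test group')))        # basic group
print(is_megagroup(Channel('big one', True)))  # supergroup
```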

JSON-based ElasticSearch tool

I think a JSON format would be useful, and easier to handle, for dumping, reporting, and storage. What about this?
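As a rough illustration of the idea, rows from an SQLite database can be turned into one JSON object per line with the standard library alone. The table and column names below are placeholders for the demo, not the project's real schema, and `rows_as_json_lines` is a hypothetical helper.

```python
import json
import sqlite3

# In-memory database with a placeholder table, only for demonstration.
conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row  # rows become addressable by column name
conn.execute('CREATE TABLE Message (ID INTEGER, ContextID INTEGER, Message TEXT)')
conn.executemany(
    'INSERT INTO Message VALUES (?, ?, ?)',
    [(1, -286200882, 'hello'), (2, -286200882, 'world')],
)


def rows_as_json_lines(conn, table):
    """Yield each row of `table` as a JSON string.

    The table name is interpolated into the query, so pass trusted names only.
    """
    for row in conn.execute('SELECT * FROM "{}"'.format(table)):
        yield json.dumps(dict(row), ensure_ascii=False)


lines = list(rows_as_json_lines(conn, 'Message'))
print('\n'.join(lines))
```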

TypeError: 'async for' requires an object with __aiter__ method, got coroutine

The script fails when it is run a second time:

$ telegram-export
2018-03-27 17:48:58,000 - telethon.telegram_bare_client - INFO - Connecting to 149.154.167.51:443...
2018-03-27 17:48:58,047 - telethon.telegram_bare_client - INFO - Connection success!
2018-03-27 17:48:58,048 - telethon.telegram_bare_client - INFO - Initializing a new connection while invoking
2018-03-27 17:48:58,161 - exporter - INFO - Saving to /Users/user/exports/chat
2018-03-27 17:48:58,163 - telethon.telegram_bare_client - INFO - Starting to wait for items from the network
customer: 9496 messages [00:00, 569.42 messages/s]
2018-03-27 17:48:58,585 - telegram_export.downloader - INFO - Done. Retrieving full information about 1 missing entities.
entities: 100%|***** | 2/2 [00:01<00:00,  1.89 entities/s, chat=customer]
media: 0.00B [00:00, ?B/s, chat=customer]
2018-03-27 17:49:04,421 - exporter - INFO - Closing exporter
2018-03-27 17:49:04,421 - telethon.telegram_bare_client - INFO - Disconnecting...
2018-03-27 17:49:04,421 - telethon.network.mtproto_sender - INFO - Disconnecting MtProtoSender...
Traceback (most recent call last):
  File "/Users/user/.virtualenvs/mine/bin/telegram-export", line 3, in <module>
    runpy.run_module("telegram_export", run_name="__main__", alter_sys=True)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 205, in run_module
    return _run_module_code(code, init_globals, run_name, mod_spec)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/user/.virtualenvs/mine/lib/python3.6/site-packages/telegram_export/__main__.py", line 287, in <module>
    ret = loop.run_until_complete(main()) or 0
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/asyncio/base_events.py", line 467, in run_until_complete
    return future.result()
  File "/Users/user/.virtualenvs/mine/lib/python3.6/site-packages/telegram_export/__main__.py", line 271, in main
    await exporter.start()
  File "/Users/user/.virtualenvs/mine/lib/python3.6/site-packages/telegram_export/exporter.py", line 71, in start
    self.client):
  File "/Users/user/.virtualenvs/mine/lib/python3.6/site-packages/telegram_export/exporter.py", line 33, in get_entities_iter
    async for ent in entities_from_str(client, in_list):
  File "/Users/user/.virtualenvs/mine/lib/python3.6/site-packages/telegram_export/exporter.py", line 18, in entities_from_str
    yield await client.get_input_entity(int(who))
  File "/Users/user/.virtualenvs/mine/lib/python3.6/site-packages/telethon/telegram_client.py", line 2448, in get_input_entity
    async for dialog in self.get_dialogs(100):
TypeError: 'async for' requires an object with __aiter__ method, got coroutine
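The final frame shows `async for` being applied to the result of `get_dialogs(100)`, which in this environment was a coroutine rather than an async iterator. The self-contained sketch below reproduces that class of error with hypothetical functions; it does not use Telethon and says nothing about which library version is at fault.

```python
import asyncio


async def dialogs_as_coroutine():
    # A plain coroutine: it must be awaited, it cannot be iterated.
    return ['a', 'b']


async def dialogs_as_async_gen():
    # An async generator: this is what `async for` expects.
    for name in ['a', 'b']:
        yield name


async def main():
    error = None
    coro = dialogs_as_coroutine()
    try:
        async for _ in coro:  # raises TypeError: no __aiter__ on a coroutine
            pass
    except TypeError as exc:
        error = str(exc)
    finally:
        coro.close()  # avoid a "never awaited" warning

    awaited = list(await dialogs_as_coroutine())
    iterated = [d async for d in dialogs_as_async_gen()]
    return error, awaited, iterated


error, awaited, iterated = asyncio.run(main())
print(error)
```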

Fix entity dumping resuming

We download all messages first and then the entities (users, channels, etc.) referenced by those messages, so when dumping is interrupted those entities are never dumped. This could be solved by dumping entities at the same time as messages, though this may have a speed cost.
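One way to approximate this without fetching every entity immediately is to record the IDs of entities still to be fetched in the same transaction as each message batch, so a resumed run knows what is missing. This is a variation on the idea above, not the project's implementation; the `Message` and `PendingEntity` tables and the `dump_batch` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE Message (ID INTEGER PRIMARY KEY, FromID INTEGER, Text TEXT)')
conn.execute('CREATE TABLE PendingEntity (ID INTEGER PRIMARY KEY)')


def dump_batch(conn, messages):
    """Store a batch of (id, from_id, text) rows plus the entity IDs they mention."""
    with conn:  # one transaction: commits on success, rolls back on error
        conn.executemany(
            'INSERT OR REPLACE INTO Message (ID, FromID, Text) VALUES (?, ?, ?)',
            messages,
        )
        conn.executemany(
            'INSERT OR IGNORE INTO PendingEntity (ID) VALUES (?)',
            {(from_id,) for _, from_id, _ in messages},
        )


dump_batch(conn, [(1, 10, 'hi'), (2, 11, 'hello')])
dump_batch(conn, [(3, 10, 'again'), (4, 12, 'bye')])

# After an interruption, a resumed run could start from this list.
pending = [row[0] for row in conn.execute('SELECT ID FROM PendingEntity ORDER BY ID')]
print(pending)
```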

struct.error: 'i' format requires -2147483648 <= number <= 2147483647

Files are not downloaded correctly:

python3 telegram_export.py
2018-03-04 10:26:24,630 - downloader - INFO - Getting participants...
📺 Aventuras [PALOMITAS]: 713 messages [00:00, 393802.84 messages/s]
2018-03-04 10:26:24,751 - downloader - INFO - Getting participants aborted (not admin).                          | 0/1 [00:00/?, ? entities/s, chat=📺 Aventuras [PALOMITAS]📺 Aventuras [PALOMITAS]: 713 messages [00:00, 5811.63 messages/s]
2018-03-04 10:26:25,098 - downloader - DEBUG - Received less messages than limit, done.██████████████████| 1/1 [00:00/00:00,  8.17 entities/s, chat=📺 Aventuras [PALOMITAS]📺 Aventuras [PALOMITAS]: 813 messages [00:00, 212.94 messages/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00/00:00,  2.13 entities/s, chat=📺 Aventuras [PALOMITAS]2018-03-04 10:26:25,104 - downloader - INFO - Done. Retrieving full information about 0 missing entities.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00/00:00,  2.10 entities/s, chat=📺 Aventuras [PALOMITAS]2018-03-04 10:26:25,104 - asyncio - ERROR - Task exception was never retrieved
future: <Task finished coro=<Downloader._media_consumer() done, defined at /video/backup_telegram/telegram-export/downloader.py:150> exception=error("'i' format requires -2147483648 <= number <= 2147483647",)>
Traceback (most recent call last):
  File "/video/backup_telegram/telegram-export/downloader.py", line 174, in _media_consumer
    progress_callback=progress
  File "/usr/local/lib/python3.6/dist-packages/telethon/telegram_client.py", line 1886, in download_file
    input_location, offset, part_size
  File "/usr/local/lib/python3.6/dist-packages/telethon/telegram_bare_client.py", line 424, in __call__
    result = await self._invoke(call_receive, retry, *requests)
  File "/usr/local/lib/python3.6/dist-packages/telethon/telegram_bare_client.py", line 468, in _invoke
    await self._sender.send(*requests)
  File "/usr/local/lib/python3.6/dist-packages/telethon/network/mtproto_sender.py", line 110, in send
    await self._send_message(message)
  File "/usr/local/lib/python3.6/dist-packages/telethon/network/mtproto_sender.py", line 165, in _send_message
    await self.connection.send(utils.pack_message(self.session, message))
  File "/usr/local/lib/python3.6/dist-packages/telethon/helpers.py", line 33, in pack_message
    data = struct.pack('<qq', session.salt, session.id) + bytes(message)
  File "/usr/local/lib/python3.6/dist-packages/telethon/tl/tl_message.py", line 25, in __bytes__
    body = GzipPacked.gzip_if_smaller(self.request)
  File "/usr/local/lib/python3.6/dist-packages/telethon/tl/gzip_packed.py", line 22, in gzip_if_smaller
    data = bytes(request)
  File "/usr/local/lib/python3.6/dist-packages/telethon/tl/functions/upload.py", line 121, in __bytes__
    bytes(self.location),
  File "/usr/local/lib/python3.6/dist-packages/telethon/tl/types/__init__.py", line 6556, in __bytes__
    struct.pack('<i', self.version),
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00/00:00,  1.85 entities/s, chat=📺 Aventuras [PALOMITAS]
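The last frame fails in `struct.pack('<i', self.version)`, which means the value being serialized does not fit in a signed 32-bit integer. The snippet below only demonstrates that limit with the standard library; `fits_int32` is a hypothetical helper, and the `'<q'` line is shown for contrast, not as a fix, since the wire format is dictated by the Telegram schema.

```python
import struct


def fits_int32(value):
    """True if `value` can be packed with the 'i' (signed 32-bit) format."""
    return -2**31 <= value <= 2**31 - 1


ok = struct.pack('<i', 2**31 - 1)  # largest value 'i' accepts

message = None
try:
    struct.pack('<i', 2**31)  # one past the limit
except struct.error as exc:
    message = str(exc)

wide = struct.pack('<q', 2**31)  # a signed 64-bit field has room for it

print(fits_int32(2**31), message)
```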

FloodWaitErrors when retrieving users

This may be occurring because our request sleeping does not take retries into account. Sometimes, the log fills up with messages like this:

2018-05-07 21:29:12,689 - telethon.network.mtproto_sender - DEBUG - Processing Remote Procedure Call result                                                                                                                                                                       
2018-05-07 21:29:12,691 - telethon.network.mtproto_sender - DEBUG - Received response for request with ID 6552938774096794980                                                                                                                                                     
2018-05-07 21:29:12,691 - telethon.network.mtproto_sender - DEBUG - Received response for request with ID 6552938774096794980                                                                                                                                                     
2018-05-07 21:29:12,694 - telethon.network.mtproto_sender - DEBUG - Confirmed GetFullUserRequest through error A wait of 27 seconds is required                                                                                                                                   
2018-05-07 21:29:12,694 - telethon.network.mtproto_sender - DEBUG - Confirmed GetFullUserRequest through error A wait of 27 seconds is required                                                                                                                                   
2018-05-07 21:29:12,717 - telethon.telegram_bare_client - DEBUG - Receiving items from the network...                                                                                                                                                                             
2018-05-07 21:29:12,717 - telethon.telegram_bare_client - DEBUG - Receiving items from the network...                                                                                                                                                                             
2018-05-07 21:29:12,718 - telethon.telegram_bare_client - WARNING - Request invoked too often, wait 27s                                                                                                                                                                           
2018-05-07 21:29:12,718 - telethon.telegram_bare_client - WARNING - Request invoked too often, wait 27s                                                                                                                                                                           
2018-05-07 21:29:12,723 - telethon.telegram_bare_client - DEBUG - Invoking GetFullUserRequest                                                                                                                                                                                     
2018-05-07 21:29:12,723 - telethon.telegram_bare_client - DEBUG - Invoking GetFullUserRequest                                                                                                                                                                                     
2018-05-07 21:29:12,724 - telethon.network.mtproto_sender - DEBUG - Sending requests with IDs: GetFullUserRequest: 6552938774430906196                                                                                                                                            
2018-05-07 21:29:12,724 - telethon.network.mtproto_sender - DEBUG - Sending requests with IDs: GetFullUserRequest: 6552938774430906196                                                                                                                                            
2018-05-07 21:29:12,764 - telethon.network.mtproto_sender - DEBUG - Processing Remote Procedure Call result                                                                                                                                                                       
2018-05-07 21:29:12,764 - telethon.network.mtproto_sender - DEBUG - Processing Remote Procedure Call result                                                                                                                                                                       
2018-05-07 21:29:12,765 - telethon.network.mtproto_sender - DEBUG - Received response for request with ID 6552938774430906196                                                                                                                                                     
2018-05-07 21:29:12,765 - telethon.network.mtproto_sender - DEBUG - Received response for request with ID 6552938774430906196                                                                                                                                                     
2018-05-07 21:29:12,774 - telethon.network.mtproto_sender - DEBUG - Confirmed GetFullUserRequest through error A wait of 27 seconds is required                                                                                                                                   
2018-05-07 21:29:12,774 - telethon.network.mtproto_sender - DEBUG - Confirmed GetFullUserRequest through error A wait of 27 seconds is required                                                                                                                                   
2018-05-07 21:29:12,810 - telethon.telegram_bare_client - DEBUG - Receiving items from the network...                                                                                                                                                                             
2018-05-07 21:29:12,810 - telethon.telegram_bare_client - DEBUG - Receiving items from the network...                                                                                                                                                                             
2018-05-07 21:29:12,812 - telethon.telegram_bare_client - WARNING - Request invoked too often, wait 27s                                                                                                                                                                           
2018-05-07 21:29:12,812 - telethon.telegram_bare_client - WARNING - Request invoked too often, wait 27s                                                                                                                                                                           
2018-05-07 21:29:12,816 - telethon.telegram_bare_client - DEBUG - Invoking GetFullUserRequest                                                                                                                                                                                     
2018-05-07 21:29:12,816 - telethon.telegram_bare_client - DEBUG - Invoking GetFullUserRequest                                                                                                                                                                                     
2018-05-07 21:29:12,818 - telethon.network.mtproto_sender - DEBUG - Sending requests with IDs: GetFullUserRequest: 6552938774806583304                                                                                                                                            
2018-05-07 21:29:12,818 - telethon.network.mtproto_sender - DEBUG - Sending requests with IDs: GetFullUserRequest: 6552938774806583304                                                                                                                                            
2018-05-07 21:29:12,894 - telethon.network.mtproto_sender - DEBUG - Processing Remote Procedure Call result                                                                                                                                                                       
2018-05-07 21:29:12,894 - telethon.network.mtproto_sender - DEBUG - Processing Remote Procedure Call result                                                                                                                                                                       
2018-05-07 21:29:12,895 - telethon.network.mtproto_sender - DEBUG - Received response for request with ID 6552938774806583304                                                                                                                                                     
2018-05-07 21:29:12,895 - telethon.network.mtproto_sender - DEBUG - Received response for request with ID 6552938774806583304                                                                                                                                                     
2018-05-07 21:29:12,896 - telethon.network.mtproto_sender - DEBUG - Confirmed GetFullUserRequest through error A wait of 27 seconds is required                                                                                                                                   
2018-05-07 21:29:12,896 - telethon.network.mtproto_sender - DEBUG - Confirmed GetFullUserRequest through error A wait of 27 seconds is required                                                                                                                                   
2018-05-07 21:29:12,910 - telethon.telegram_bare_client - DEBUG - Receiving items from the network...                                                                                                                                                                             
2018-05-07 21:29:12,910 - telethon.telegram_bare_client - DEBUG - Receiving items from the network...                                                                                                                                                                             
2018-05-07 21:29:12,911 - telethon.telegram_bare_client - WARNING - Request invoked too often, wait 27s                                                                                                                                                                           
2018-05-07 21:29:12,911 - telethon.telegram_bare_client - WARNING - Request invoked too often, wait 27s
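In the log above, `GetFullUserRequest` is re-invoked roughly 100 ms after each "wait of 27 seconds" error. A retry loop that honours the requested delay could look like the sketch below. `FloodWait` is a hypothetical stand-in exception carrying a `seconds` attribute, and `call_with_flood_wait` is an illustrative helper, not part of Telethon or telegram-export.

```python
import asyncio


class FloodWait(Exception):
    """Hypothetical stand-in for a 'wait N seconds' error from the server."""

    def __init__(self, seconds):
        super().__init__('A wait of {} seconds is required'.format(seconds))
        self.seconds = seconds


async def call_with_flood_wait(request, max_retries=3, sleep=asyncio.sleep):
    """Await request(); on FloodWait, sleep for the requested time, then retry."""
    for attempt in range(max_retries + 1):
        try:
            return await request()
        except FloodWait as exc:
            if attempt == max_retries:
                raise  # give up instead of hammering the server
            await sleep(exc.seconds)


async def demo():
    slept = []
    calls = {'count': 0}

    async def fake_sleep(seconds):
        slept.append(seconds)  # record the delay instead of really waiting

    async def fake_request():
        calls['count'] += 1
        if calls['count'] <= 2:
            raise FloodWait(27)
        return 'full user'

    result = await call_with_flood_wait(fake_request, sleep=fake_sleep)
    return result, slept


result, slept = asyncio.run(demo())
print(result, slept)
```

Passing `sleep` as a parameter keeps the demo fast; real code would leave the default `asyncio.sleep` in place.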
