
twitch-chat-downloader's Introduction

Twitch Chat Downloader

pip install tcd

A neat tool to download chat messages from past broadcasts.

Requirements

Usage

tcd
# Download chat from VODs by video id
tcd --video 789654123,987456321 --format irc --output ~/Downloads
# Download chat from the first 10 VODs from multiple streamers
tcd --channel sodapoppin,nymn,lirik --first=10

Features


twitch-chat-downloader's People

Contributors

ashtov, dependabot[bot], eugenechung81, gorrrg, haotongye, oshawk, petterkraabol, rosuav, strager, thepaperpilot, thiscris

twitch-chat-downloader's Issues

[bug] {title} in output is not safe for OS path

When a streamer sets the stream title to something like "????" or "/root/" and you use {title} in your output format, bad things can happen: anything from failing to save the output file to possibly overwriting existing files.
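One way to guard against this is to sanitize the title before it becomes part of a path. The following is a minimal sketch, not the project's actual fix; the helper name safe_filename and the choice of replacement character are assumptions made for illustration.

```python
import re

# Characters that are invalid in Windows file names, plus path separators
# and control characters. This set is an assumption chosen for illustration.
_UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(title: str, replacement: str = "_") -> str:
    """Return a version of `title` usable as a single path component."""
    cleaned = _UNSAFE.sub(replacement, title).strip(" .")
    # Fall back to a fixed name if nothing usable is left (e.g. ".." or "").
    return cleaned or "untitled"


print(safe_filename("????"))
print(safe_filename("/root/"))
```

Because separators are replaced rather than kept, a title can no longer point outside the output directory.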

First impressions on a Mac

I don't think Python 2.7 is supported anymore.

Traceback (most recent call last):
  File "app.py", line 164, in <module>
    file = open(directory + '/' + videoId + '.txt', 'w', encoding='utf-8')
TypeError: 'encoding' is an invalid keyword argument for this function

To install requests for Python 3 on a Mac, I had to run the following (the 3.5 in the path depends on the installed Python version):

/Library/Frameworks/Python.framework/Versions/3.5/bin/pip3 install -r requirements.txt

With the default config, I had to create a 'chats' directory before running.

mkdir chats
python3 app.py -v [video#]

Adding custom SSA syntax in the settings.json

I want to change the style of the font in the comment part with SSA markup codes like {\i1\b1}bolditalic{\i0\b0}. But obviously it gets interpreted by your parser:

json.decoder.JSONDecodeError: Invalid \escape: line 83 column 94 (char 3040)

My "dialogue" line looks like this: "dialogue": "Dialogue: Marked=0,{start},{end},Default,{name},0000,0000,0000,,\{\i1\b1\}{name}: \{\i0\b0\}{text}"

Maybe change from using curved brackets to something else? Or make escaping the SSA markup work (like I tried in my example)?
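The error above comes from JSON itself: a backslash must be written as \\ inside a JSON string, and \{ or \i are not valid escapes. The sketch below additionally assumes the template is rendered with Python's str.format (not confirmed by this report), in which case literal braces are written as {{ and }}. The shortened "dialogue" value is illustrative only.

```python
import json

# "\{" is not a valid JSON escape, so the decoder rejects it.
try:
    json.loads(r'{"dialogue": "\{\i1\b1\}{name}"}')
except json.JSONDecodeError as error:
    invalid_escape_message = str(error)

# Doubling the backslashes makes the JSON valid; doubling the braces makes
# them literal for str.format (assumed to be the templating mechanism).
raw_settings = r'{"dialogue": "{{\\i1\\b1}}{name}: {{\\i0\\b0}}{text}"}'
template = json.loads(raw_settings)["dialogue"]

line = template.format(name="rainylace", text="hello")
print(line)
```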

Special user badges in SSA

For broadcaster, mods, subscribers etc... IRC log has them, SSA doesn't. It would be a good idea to make it an optional field, like in the IRC format {commenter[irc_badge]}. But instead of irc_badge it's badge for both IRC and SSA. ;)

Downloader hasn't been working properly since a few days ago.

My python version is 3.7.1.
Have you checked for errors regarding the following? I look forward to hearing from you.

C:\Twitch-Chat-Downloader>python app.py -v 1234567
Traceback (most recent call last):
  File "app.py", line 4, in <module>
    import app
  File "C:\Twitch-Chat-Downloader\app\__init__.py", line 3, in <module>
    from .downloader import download, draw_progress
  File "C:\Twitch-Chat-Downloader\app\downloader.py", line 6, in <module>
    import app.formats as formats
  File "C:\Twitch-Chat-Downloader\app\formats\__init__.py", line 1, in <module>
    from .formats import use
  File "C:\Twitch-Chat-Downloader\app\formats\formats.py", line 4, in <module>
    import app.twitch as twitch
  File "C:\Twitch-Chat-Downloader\app\twitch\__init__.py", line 1, in <module>
    from .video import Video
  File "C:\Twitch-Chat-Downloader\app\twitch\video.py", line 13, in <module>
    class Video:
  File "C:\Twitch-Chat-Downloader\app\twitch\video.py", line 55, in Video
    def comment_generator_from_api(video: helix.Video) -> Generator[dict, None, None]:
AttributeError: module 'twitch.helix' has no attribute 'Video'

SyntaxError at app.py, Line 10 (colon after whitelist)

After following the installation instructions in the README and running python3 app.py on Ubuntu 14.04 I get a SyntaxError at the colon after whitelist. I'm running Python 3.4.3.

I'm not familiar with Python 3's colon syntax, so this could be an obvious/easy fix or user error.

Here's the full output:

~/Twitch-Chat-Downloader# python3 app.py -h
  File "app.py", line 10
    whitelist: List[str] = None
             ^
SyntaxError: invalid syntax
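For context: the line in question is a variable annotation, a syntax added in Python 3.6 (PEP 526), so Python 3.4 cannot parse it. A small sketch of a version guard and of the older type-comment spelling follows; the function name is_supported is hypothetical and not part of the project.

```python
import sys
from typing import List, Optional

# Variable annotations such as "whitelist: List[str] = None" need Python 3.6+.
MIN_VERSION = (3, 6)


def is_supported(version_info=sys.version_info) -> bool:
    """Return True if the interpreter can parse variable annotations."""
    return tuple(version_info[:2]) >= MIN_VERSION


# A type comment expresses the same intent and also parses on older versions.
whitelist = None  # type: Optional[List[str]]

print(is_supported((3, 4, 3)))
print(is_supported((3, 6, 0)))
```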

[bug] Syntax error in app.py

Running app.py in the terminal gives me this:

  File "app.py", line 10
    whitelist: List[str] = None
             ^
SyntaxError: invalid syntax

Did I do something wrong? I'm using Python 3 and following the README directions. Thanks

Again about the default SSA style

Creating the ASS styles by hand seems tedious. So I searched for an editor. ASS has near desktop publishing capabilities for a subtitle system. So there must be graphical editors for that.

http://www.aegisub.org/

...is the best I've come across. It seems to be feature complete when it comes to ASS's capabilities. Now I think you could use it to create a neat default style for Twitch Chat Downloader.

Maybe even link it in a wiki page with a guide on how to export user created styles into the settings.json.

PS: For example, I think it would be possible to give the style a chat log feeling, justify it to the left side and all...

[bug] Unicode conversion fails

This is the error message:
Traceback (most recent call last):
  File "D:\Downloads\Software\(new)\Twitch-Chat-Downloader-master 2017-03-14\app.py", line 254, in <module>
    line = 'Dialogue: Marked=0, ' + subtitleStart[:-4] + ', ' + subtitleStop[:-4] + ', Default, ' + sender + ', 0000, 0000, 0000 , , {\c&H' + color + '&}' + sender + '{\c&H000000&}: ' + text + '\n'
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 34: ordinal not in range(128)

And this is the raw JSON that causes the error:
{"attributes": {"from": "rainylace", "room": "threemilkt", "tags": {"turbo": false, "sent-ts": "1488508496364", "tmi-sent-ts": "1488508496502", "color": "#5F9EA0", "room-id": "147210100", "emotes": null, "subscriber": false, "user-type": null, "display-name": "rainylace", "user-id": "147820243", "id": "7dac8816-21ee-4b4a-901b-313676b974a8", "badges": null, "mod": false}, "deleted": false, "timestamp": 1488508482049, "color": "#5F9EA0", "command": "", "message": "my username is based on aesthetics\u2122"}, "type": "rechat-message", "id": "chat-9-2017:AVqSBTCczvyLkzIQiULL", "links": {"self": "/rechat-message/chat-9-2017:AVqSBTCczvyLkzIQiULL"}},

It's probably the trademark sign ™ (U+2122). It was converted to the actual ™ character just fine with your old SSA code. Maybe a unicode/ascii mix-up when concatenating strings?

[request] Download all formats in one go.

Currently only one file is written to disk with each program start. But I usually want the chat in all formats, so I created a shell script to start TCD four times in a row. But this is inefficient (and not nice towards Twitch) because really the chat only needs to be downloaded once, since the different output is based on the same input JSON.

It would be more convenient if there were an option to either save all formats or have TCD produce every format selected on the command line (with multiple "-f" options) in one sweep.
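A rough sketch of the idea: fetch the comments once, then run every requested formatter over the same in-memory list. The formatter names, the comment fields and render_all are all hypothetical and only illustrate the structure; they are not the tool's real internals.

```python
import json
from typing import Callable, Dict, Iterable, List

Comment = Dict[str, str]


def to_irc(comments: Iterable[Comment]) -> str:
    """Render comments as simple IRC-style lines."""
    return "\n".join("<{}> {}".format(c["name"], c["text"]) for c in comments)


def to_json(comments: Iterable[Comment]) -> str:
    """Render comments as a JSON array."""
    return json.dumps(list(comments), ensure_ascii=False)


FORMATTERS: Dict[str, Callable[[Iterable[Comment]], str]] = {
    "irc": to_irc,
    "json": to_json,
}


def render_all(comments: List[Comment], formats: List[str]) -> Dict[str, str]:
    """Render already-downloaded comments once per requested format."""
    return {name: FORMATTERS[name](comments) for name in formats}


downloaded = [{"name": "rainylace", "text": "hello"}]  # downloaded only once
outputs = render_all(downloaded, ["irc", "json"])
print(outputs["irc"])
```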

Question about "default_user_color"

I've set it to "default_user_color": "#ffffff" but it still assigns random colors to a user. Isn't that supposed to be the correct format?

Also wouldn't it be handy to create a random user color out of the hash of a user's name or even better their Twitch ID? Currently random will give each comment a random color even if it's from the same user, and that's actually really confusing when I read it.
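The hash-based color suggested above could look roughly like this. It is a sketch only: the function name color_for_user and the choice of SHA-256 are assumptions, and any stable hash would do.

```python
import hashlib


def color_for_user(user_id: str) -> str:
    """Derive a stable '#RRGGBB' color from a user name or Twitch user ID."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    # The first six hex digits of the digest serve as the RGB value.
    return "#" + digest[:6]


# The same user always gets the same color, unlike a per-comment random pick.
print(color_for_user("147820243") == color_for_user("147820243"))
```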

User colors should be BGR not RGB

I've noticed that in the SSA subtitle my user color wasn't the same color that I have set it on twitch.tv. I thought that default_user_color only sets the user color for users that haven't set their colors in their Twitch chat settings.

EDIT: Oh no, it's actually something different. The RGB values are reversed. SSA uses BGR, not RGB... an old bug; I think you made the same mistake with the old API as well... ;)
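To illustrate the byte order: SSA/ASS color overrides are written as &HBBGGRR&, so a Twitch color such as #5F9EA0 has to be reversed pairwise. A minimal conversion sketch follows; the helper name rgb_hex_to_ssa is made up for this example.

```python
def rgb_hex_to_ssa(color: str) -> str:
    """Convert '#RRGGBB' to the SSA/ASS '&HBBGGRR&' form."""
    value = color.lstrip("#")
    if len(value) != 6:
        raise ValueError("expected a color like '#RRGGBB', got {!r}".format(color))
    int(value, 16)  # raises ValueError if the string is not valid hex
    red, green, blue = value[0:2], value[2:4], value[4:6]
    return "&H{}{}{}&".format(blue, green, red).upper()


print(rgb_hex_to_ssa("#5F9EA0"))  # &HA09E5F&
```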

AttributeError: module 'twitch' has no attribute 'chat'

C:\Users\josep\Downloads\Twitch-Chat-Downloader-master\Twitch-Chat-Downloader-master>py -3 app.py -v 348839719 -f ssa
Traceback (most recent call last):
  File "app.py", line 4, in <module>
    import app
  File "C:\Users\josep\Downloads\Twitch-Chat-Downloader-master\Twitch-Chat-Downloader-master\app\__init__.py", line 3, in <module>
    from .downloader import download, draw_progress
  File "C:\Users\josep\Downloads\Twitch-Chat-Downloader-master\Twitch-Chat-Downloader-master\app\downloader.py", line 6, in <module>
    import app.formats as formats
  File "C:\Users\josep\Downloads\Twitch-Chat-Downloader-master\Twitch-Chat-Downloader-master\app\formats\__init__.py", line 1, in <module>
    from .formats import use
  File "C:\Users\josep\Downloads\Twitch-Chat-Downloader-master\Twitch-Chat-Downloader-master\app\formats\formats.py", line 4, in <module>
    import app.twitch as twitch
  File "C:\Users\josep\Downloads\Twitch-Chat-Downloader-master\Twitch-Chat-Downloader-master\app\twitch\__init__.py", line 1, in <module>
    from .video import Video
  File "C:\Users\josep\Downloads\Twitch-Chat-Downloader-master\Twitch-Chat-Downloader-master\app\twitch\video.py", line 5, in <module>
    import twitch
  File "C:\Users\josep\AppData\Local\Programs\Python\Python36-32\lib\site-packages\twitch\__init__.py", line 1, in <module>
    from .chat import Chat
  File "C:\Users\josep\AppData\Local\Programs\Python\Python36-32\lib\site-packages\twitch\chat\__init__.py", line 1, in <module>
    from .chat import Chat
  File "C:\Users\josep\AppData\Local\Programs\Python\Python36-32\lib\site-packages\twitch\chat\chat.py", line 4, in <module>
    import twitch.chat as chat
AttributeError: module 'twitch' has no attribute 'chat'

[bug] timestamp formatting ignored now for output?

It looks as if the date/time formatting is broken. This here...

"output": { "format": "{channel[name]} - {created_at} - {_id} - {title}.txt", "timestamp": { "absolute": "%Y%m%d" } }
produces an OSError: [Errno 22] Invalid argument error, because the string still looks like 2017-12-27T13:30:00Z despite the formatting.
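For reference, this is roughly what applying the "absolute" pattern to created_at would involve. The sketch assumes the value always has the form shown above; the helper name format_created_at is hypothetical. If the report comes from Windows, the colons in the unformatted timestamp would explain the error, since they are not allowed in file names there.

```python
from datetime import datetime


def format_created_at(created_at: str, pattern: str = "%Y%m%d") -> str:
    """Parse a timestamp like '2017-12-27T13:30:00Z' and reformat it."""
    parsed = datetime.strptime(created_at, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.strftime(pattern)


print(format_created_at("2017-12-27T13:30:00Z"))  # 20171227
```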

ModuleNotFoundError: No module named 'requests'

After I enter both the video ID and the client ID, save the client ID (same issue even if I don't save it) and hit Enter, the prompt closes almost right away with the following log and error:

err

Any idea what I am doing wrong? :) Thanks!

Option to get relative time from VOD start, maybe subtitle format support

Hi Petter,

Twitch Chat Download is an awesome tool that I use in combination with youtube-dl (which doesn't seem to have such a feature) to archive VoDs from streamers. Thank You So Much. And as of now it still works flawlessly with the Twitch API.

I have some feature requests:

  1. Add the option to change the timezone from UTC to a predefined timezone. You can of course replace UTC with the streamer's local time in a text editor, but that gets tedious.

  2. Relative timecode without date where 00:00.0 is the start of the video, basically a video referenced timecode. So that way it's easier to find the corresponding part of the video to a specific message.

  3. It would also be cool to go further and convert the output to a subtitle file format, like SRT or, even better, SSA, as it supports more color options. I know subtitles make little sense for very active chats, but there are players that can display the subtitles as a list in a separate window. Messages that appeared in the same second would need to be ordered by fractions of a second, which I guess the Twitch API does not provide. But I don't think it's a big deal if your script just makes up those fractions for the sake of saving the messages in the correct order. I know this is quite a request.

  4. Support command line options and make non-interaction an option so your script can be adopted more easily in shell scripts for different purposes.

I would be most happy about requests 2 and 4.

Bye
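Request 2 boils down to subtracting the video's start time from each message's timestamp. A minimal sketch under that assumption follows; the function name and the H:MM:SS.t output layout are illustrative choices, not something the tool defines.

```python
from datetime import datetime, timedelta, timezone


def relative_timecode(video_start: datetime, message_time: datetime) -> str:
    """Format the offset of a message from the video start as H:MM:SS.t."""
    total = (message_time - video_start).total_seconds()
    if total < 0:
        raise ValueError("message is older than the video start")
    tenths = int(total * 10)
    minutes, tenths_rest = divmod(tenths, 600)
    seconds, tenth = divmod(tenths_rest, 10)
    hours, minutes = divmod(minutes, 60)
    return "{:d}:{:02d}:{:02d}.{:d}".format(hours, minutes, seconds, tenth)


start = datetime(2017, 3, 3, 2, 0, 0, tzinfo=timezone.utc)
message = start + timedelta(hours=1, minutes=2, seconds=5.5)
print(relative_timecode(start, message))  # 1:02:05.5
```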

Using the same user badge symbols as Chatty for the IRC log?

I don't know if you are familiar with Chatty, a text client for Twitch channel chats. It also uses text symbols to show a user's special status. You both use the same symbols for most things but some are different...

This is what Chatty uses:

~ The Broadcaster
@ Moderator
+ Turbo/Prime
% Subscriber
$ Donated Bits
^ Bot (as recognized by local setting, FFZ API and BTTV API)
* Global Moderator
! Admin
& Staff

Imho, since Chatty is more popular, you should use its notation. It might also give you some ideas for more user badges to add to the IRC log, in case the Twitch API provides them for VODs.
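Expressed as a lookup table, the Chatty symbols listed above could be applied like this. The badge key names (for example "global_mod") are placeholders invented for this sketch; the identifiers the Twitch API actually returns may differ.

```python
# Symbols as listed for Chatty above; the dictionary keys are hypothetical.
CHATTY_BADGES = {
    "broadcaster": "~",
    "moderator": "@",
    "turbo": "+",
    "subscriber": "%",
    "bits": "$",
    "bot": "^",
    "global_mod": "*",
    "admin": "!",
    "staff": "&",
}


def badge_prefix(badges):
    """Build the symbol prefix for a user, skipping unknown badges."""
    return "".join(CHATTY_BADGES[b] for b in badges if b in CHATTY_BADGES)


print(badge_prefix(["broadcaster", "subscriber"]) + "rainylace")
```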

How to get a Twitch Client ID

Since I'm not good at programming, how can I get a Twitch Client ID? The Twitch forum isn't very clear about it.

Can't get json data from vod

I'm trying to get chat data of a vod with the id of "304617140".

python app.py -v 304617140 --format all --output ~/Downloads/Twitch

With these parameters, I get this error:
Traceback (most recent call last):
  File "app.py", line 45, in <module>
    main()
  File "app.py", line 24, in main
    app.arguments.input = app.download(app.arguments.video, 'json')
  File "C:\Users\Garavel\Dev\Twitch-Chat-Downloader\app\downloader.py", line 26, in download
    lines, output = formats.use(format_name, video)
  File "C:\Users\Garavel\Dev\Twitch-Chat-Downloader\formats\formats.py", line 16, in use
    return _json.use(video)
  File "C:\Users\Garavel\Dev\Twitch-Chat-Downloader\formats\json.py", line 16, in use
    for comment in video.comments:
  File "C:\Users\Garavel\Dev\Twitch-Chat-Downloader\twitch\api.py", line 40, in comments
    fragment = comment_fragment(video_id, fragment['_next'])
  File "C:\Users\Garavel\Dev\Twitch-Chat-Downloader\twitch\api.py", line 30, in comment_fragment
    return get('videos/{}/comments'.format(video_id), {'cursor': cursor}).json()
  File "C:\Users\Garavel\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\models.py", line 896, in json
    return complexjson.loads(self.text, **kwargs)
  File "C:\Users\Garavel\AppData\Local\Programs\Python\Python37-32\lib\json\__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "C:\Users\Garavel\AppData\Local\Programs\Python\Python37-32\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Users\Garavel\AppData\Local\Programs\Python\Python37-32\lib\json\decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 25684 (char 25683)

Any idea why?

Introducing Twitch-Python

To allow developers to develop their own projects similar to Twitch-Chat-Downloader, the API logic should be separated to its own repository.

  • Twitch-Chat-Downloader will import Twitch-Python for accessing the Twitch API.
  • Twitch-Python takes a neat object-oriented approach to accessing the Twitch API, making it easy to retrieve comments from multiple videos from multiple channels with few lines of code (issue #13).
  • Twitch-Python may be expanded to a full wrapper for Twitch API and accessing live chat through IRC.

Closing this issue when the separation is complete.

import twitch

helix = twitch.Helix('client-id')

for user, videos in helix.users('sodapoppin', 'reckful').videos(first=5):
    for video, comments in videos.comments():
        for comment in comments:
            print(comment.commenter.display_name, comment.message.body)

[bug] Timecode syntax error: second and millisecond truncated

I think either when the timecode is exactly a whole number or >something else happens< then the parser truncates the trailing digits. This leads to a syntax error in Media Player Classic - Home Cinema with SSA subtitle at least.

Twitch-Chat-Downloader commit hash of the version I'm using is e21c9d7.

Here's an example Twitch video, 125972865 (please don't judge, lol). Error occurs at line 221 in the SSA.

EDIT: Oh and MPC-HC also gives me a syntax error on the last line of the SSA subtitle, even though it looks perfectly fine.
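One plausible cause, assuming the timecode is built by slicing str(timedelta) (as the [:-4] slices and the '.000000' fix mentioned in other reports suggest): str() omits the microseconds entirely when they are zero, so cutting four characters off removes real digits. The two helper names below are made up for this sketch.

```python
from datetime import timedelta


def naive_timecode(seconds: float) -> str:
    """Slice the last four characters, assuming microseconds are present."""
    return str(timedelta(seconds=seconds))[:-4]


def fixed_timecode(seconds: float) -> str:
    """Pad whole seconds with '.000000' before slicing."""
    text = str(timedelta(seconds=seconds))
    if "." not in text:
        text += ".000000"
    return text[:-4]


print(naive_timecode(2.5))  # 0:00:02.50
print(naive_timecode(2))    # 0:0  (truncated: str() printed no microseconds)
print(fixed_timecode(2))    # 0:00:02.00
```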

Invalid Character Mapping

Getting this error only 7% in. I've tried assigning a client ID and changing cooldown, leading me to believe it's having an issue with a certain character in chat.

Traceback (most recent call last):
  File "app.py", line 281, in <module>
    file.write(line)
  File "C:\Program Files\Python36\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 29-30: character maps to <undefined>
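The cp1252 frame in the traceback indicates the file was opened with the Windows locale encoding, which cannot represent many characters seen in chat. In Python 3, passing encoding='utf-8' to open() avoids this. A small self-contained sketch follows; the file name and the helper write_chat_line are illustrative.

```python
import os
import tempfile


def write_chat_line(path: str, line: str) -> None:
    """Append one chat line, always encoding the file as UTF-8."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")


path = os.path.join(tempfile.mkdtemp(), "chat.txt")
write_chat_line(path, "rainylace: nice \U0001f440")

with open(path, encoding="utf-8") as handle:
    content = handle.read()
```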

Import errors

I installed the requirements using:
pip install -U --user -r requirements.txt

I am using python 3.6

python3 app.py -v 1234567
Traceback (most recent call last):
  File "app.py", line 4, in <module>
    import app
  File "/home/lcheungchang/Twitch-Chat-Downloader/app/__init__.py", line 3, in <module>
    from .downloader import download, draw_progress
  File "/home/lcheungchang/Twitch-Chat-Downloader/app/downloader.py", line 6, in <module>
    import app.formats as formats
  File "/home/lcheungchang/Twitch-Chat-Downloader/app/formats/__init__.py", line 1, in <module>
    from .formats import use
  File "/home/lcheungchang/Twitch-Chat-Downloader/app/formats/formats.py", line 4, in <module>
    import app.twitch as twitch
  File "/home/lcheungchang/Twitch-Chat-Downloader/app/twitch/__init__.py", line 1, in <module>
    from .video import Video
  File "/home/lcheungchang/Twitch-Chat-Downloader/app/twitch/video.py", line 5, in <module>
    import twitch
  File "/home/lcheungchang/.local/lib/python3.6/site-packages/twitch/__init__.py", line 1, in <module>
    from .chat import Chat
  File "/home/lcheungchang/.local/lib/python3.6/site-packages/twitch/chat/__init__.py", line 1, in <module>
    from .chat import Chat
  File "/home/lcheungchang/.local/lib/python3.6/site-packages/twitch/chat/chat.py", line 1, in <module>
    from rx.subjects import Subject
ModuleNotFoundError: No module named 'rx'

What Is the Redirect URI?

Since we need a Twitch client ID, we have to register an application, which requires a "redirect URL". I am not sure what this should be for Twitch-Chat-Downloader. Any hint?

Download fails when the video is too long

hi,
Your program works very well. Thank you.

However, when the video is too long, I fail to download the chat log and get this error message:

requests.exceptions.ConnectionError: HTTPSConnectionPool(host='rechat.twitch.tv', port=443): Max retries exceeded with url:

Is there any way to download the chat log when the video is too long?
Or is there any way to download only part of the video?
(ex. 2:00:00 ~ 3:00:00)

[bug] "ValueError: 'raw' is not in list"

Downloading raw JSON
Traceback (most recent call last):
  File "D:\Downloads\Software\(new)\Twitch-Chat-Downloader-master 2017-03-14\app.py", line 225, in <module>
    if ['srt', 'ass', 'ssa'].index(settings['format']) != -1:
ValueError: 'raw' is not in list

Downloading timestamped text
Traceback (most recent call last):
  File "D:\Downloads\Software\(new)\Twitch-Chat-Downloader-master 2017-03-14\app.py", line 225, in <module>
    if ['srt', 'ass', 'ssa'].index(settings['format']) != -1:
ValueError: 'timestamp' is not in list

Downloading relative timecoded text
Traceback (most recent call last):
  File "D:\Downloads\Software\(new)\Twitch-Chat-Downloader-master 2017-03-14\app.py", line 225, in <module>
    if ['srt', 'ass', 'ssa'].index(settings['format']) != -1:
ValueError: 'relative' is not in list

The if condition only works for one of the three subtitle format strings; for any other format, list.index() raises ValueError instead of returning -1. I fixed it on my end by removing that condition and moving the subtitleStart and '.000000' bugfix parts into the two respective branches, if settings['format'] == 'srt' and 'ssa'|'ass'.
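For illustration, a membership test with the in operator behaves the way the original condition presumably intended, because it returns False instead of raising. The constant and function names are made up for this sketch.

```python
SUBTITLE_FORMATS = ("srt", "ass", "ssa")


def is_subtitle_format(name: str) -> bool:
    """Return True only for subtitle formats; never raises for other names."""
    return name in SUBTITLE_FORMATS


# list.index() raises for a missing value rather than returning -1.
try:
    list(SUBTITLE_FORMATS).index("raw")
    index_raised = False
except ValueError:
    index_raised = True

print(is_subtitle_format("raw"), is_subtitle_format("ssa"), index_raised)
```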

[request] Extract video URLs from the "videos/all" page

A new command line option would be cool where you just enter the Twitch user name and the script extracts all the video IDs from that user's "videos/all" page and downloads the corresponding chats. It would make updating your chat logs to the newest and fixed format of Twitch-Chat-Downloader so much easier.

The Twitch API probably offers a convenient way to get the currently available VODs, so there's no need to go digging for URLs in the HTML. I dunno.

Currently I'm using a shell script where I manually copy and paste the video URLs in and then let the script copy the video IDs for use with Twitch-Chat-Downloader. I would like to skip the manual part.
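Until such an option exists, the copy-and-paste step can at least be scripted. The sketch below pulls video IDs out of pasted text, assuming URLs of the form https://www.twitch.tv/videos/<id>; the function name extract_video_ids is hypothetical.

```python
import re

# Matches the numeric ID in URLs such as https://www.twitch.tv/videos/125972865
_VIDEO_URL = re.compile(r"twitch\.tv/videos/(\d+)")


def extract_video_ids(text: str):
    """Return all video IDs found in `text`, in order of appearance."""
    return _VIDEO_URL.findall(text)


pasted = """
https://www.twitch.tv/videos/125972865
https://www.twitch.tv/videos/162459509?t=1h
"""
print(",".join(extract_video_ids(pasted)))
```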

"Downloading JSON data..."

I have removed print('Downloading JSON data...') in json.py because the other formats don't have such a status message. That is just my personal preference. But I wonder: will the logging output become as verbose as that for all formats?

Extremely slow on very active+long VODs

While downloading chat for VOD 162459509 in the timestamp format, the download has steadily slowed down. It's now (104000 seconds in) going at less than one chat-second per real second and pinning one core at 100% usage.

[feature request] getting video.json for a vod

It would be useful to have an option to fetch the raw video JSON (like the one in the examples folder) of a video.

If that's not gonna happen, can anybody tell me how I can fetch it via a URL? And whether I need a client ID or a web cookie for it?

Duration of subtitles

I think the current 2 seconds is too fast, and it would make more sense to expose that number as an argument to the program.

Fantastic tool.

Hey! I actually have no issues.

Just wanted to say AWESOME tool! I was able to get it up and running very quickly.

:)!

KeyError: 'data'

I have python3.7.0 and pip3 on my computer. I have already run the command "pip install -r requirement.txt". When I run the command "python app.py", an error occurs:

image

How should I fix it?

Issue - but I fixed it

Howdy, I'm running Python 3.6.2

On line 164 of your code app.py you have

file = open(directory + '/' + videoId + '.txt', 'w', encoding='utf-8')

This caused errors for me, so I removed the encoding argument and changed the line to this:

file = open(directory + '/' + videoId + '.txt', 'w')

and everything works great.

Nice library.

UnicodeEncodeError

I see the following error for all formats except JSON.

return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f440' in position 43: character maps to <undefined>

Location of the settings.json file

Currently the app expects the settings file to be in the same directory as the user is currently in, right? Isn't it better if the app looks in its own base directory instead?

I've run into this problem when writing a shell script that's traversing through directories, execution of app.py failed because of that. A dirty fix was to remember the current working directory and the file to be processed in a temporary variable, moving to the Twitch-Chat-Downloader directory for execution employing -o and then moving back again to the working directory both via the temporary variable.

Also an option "--settings" to point to an arbitrary settings file would be nice.
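Both suggestions could be sketched as follows. This is a hedged illustration rather than the tool's implementation: the --settings flag is the hypothetical option proposed above, and resolve_settings is a made-up helper that falls back to the directory containing the script instead of the current working directory.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def resolve_settings(script_file: str, override: Optional[str] = None) -> Path:
    """Pick the settings file: an explicit override, else next to the script."""
    if override:
        return Path(override).expanduser()
    return Path(script_file).resolve().parent / "settings.json"


def parse_args(argv: List[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # Hypothetical flag, as proposed in this issue.
    parser.add_argument("--settings", default=None, help="path to a settings file")
    return parser.parse_args(argv)


args = parse_args(["--settings", "custom.json"])
print(resolve_settings("/opt/tcd/app.py", args.settings))
print(resolve_settings("/opt/tcd/app.py"))
```

In a real script, __file__ would be passed as script_file so the lookup no longer depends on where the command is run from.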

[bug] for SRT {timestamp[absolute]} in output doesn't work?

Ok, I used {timestamp[absolute]} and added the timestamp absolute formatting for all types as you suggested in #33. But it seems to be failing for SRT.

This is from my settings.json:

        "srt": {
            "duration": 2000,
            "comments": {
                "format": "{commenter[display_name]}: {message[body]}"
            },
            "output": {
                "format": "{channel[name]} - {timestamp[absolute]} - {_id} - {title}.srt",
                "timestamp": {
                    "absolute": "%Y%m%d"
                }
            }
        }
