
pullcord's Introduction

pullcord

Pullcord is a Discord archiver. It downloads channel logs, server logs, attachments, avatars, server icons, server splashes and emoji.

Install

go get github.com/tsudoko/pullcord/cmd/pullcord

This installs pullcord to $GOPATH/bin, by default ~/go/bin.

How to use

Right now the only implemented mode is history. It is used for downloading channel history, server history and all related files. A realtime mode is planned.

By default pullcord downloads data from every channel and server the account is connected to, with the exception of DMs. To fine-tune this behavior, filtering options such as -c, -s, -C, -S and -dm can be used. All files are downloaded to the current working directory, so creating a new empty directory is recommended.

Pullcord exits as soon as it encounters any error.

Basic usage:

pullcord -<mode> [auth_options] [filter_options]

All options can be seen by running pullcord -h.

Log format

See FORMAT.md.

pullcord's People

Contributors

ivan, tsudoko


pullcord's Issues

Thread support

Right now threads are simply ignored, so logs are incomplete unless threads are archived as well.

Retry or skip failed avatar download

pullcord stopped after:

2019/04/27 02:19:42 [85338836384628736] error downloading avatar for user 105888302300717056: (cdndl.ErrNotOk) non-200 status code: 415

I am having an Authentication failed error

Hi,

I am having an Authentication failed error.
"opening the websocket connection failed:websocket: close 4004: Authentication failed."

What would you recommend I do?
I am running on Windows 10 and have tried running the command as an admin.
The command I ran was pullcord -history -c 738xxxxxxxxxxxxx

Thanks

server sent GOAWAY and closed the connection

2018/11/12 10:19:33 downloading https://cdn.discordapp.com/attachments/162272327759953921/460863149390692382/unknown.png
2018/11/12 10:19:33 [162272327759953921/162272327759953921] error downloading attachment 460863149390692382: http2: server sent GOAWAY and closed the connection; LastStreamID=1999, ErrCode=NO_ERROR, debug=""

Pullcord should retry and resume.

Missing access

2018/11/12 09:45:56 [446709324614467596/446714306428665856] error getting messages from : HTTP 403 Forbidden, {"code": 50001, "message": "Missing Access"}

pullcord should skip channels without access.
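One way to do that, sketched under the assumption that the caller has the HTTP status and raw response body of the failed request at hand: skip only when the body carries Discord's JSON error code 50001 ("Missing Access", as in the log above) and keep aborting on everything else. `skippable` and `apiError` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// codeMissingAccess is the JSON error code from the log above.
const codeMissingAccess = 50001

// apiError mirrors the body of a Discord REST error response.
type apiError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
}

// skippable reports whether a failed request for a channel should be
// skipped instead of aborting the run: only a 403 whose body carries
// code 50001 qualifies.
func skippable(status int, body []byte) bool {
	if status != http.StatusForbidden {
		return false
	}
	var e apiError
	if err := json.Unmarshal(body, &e); err != nil {
		return false // unparseable body: keep the fatal behaviour
	}
	return e.Code == codeMissingAccess
}

func main() {
	fmt.Println(skippable(403, []byte(`{"code": 50001, "message": "Missing Access"}`))) // true
	fmt.Println(skippable(403, []byte(`{"code": 40002, "message": "You need to verify your account in order to perform this action."}`))) // false
}
```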

error downloading attachment X for message Y: (*os.LinkError)

pullcord stopped with this error while getting the history for my account:

2018/11/15 22:59:44 downloading https://cdn.discordapp.com/attachments/85630519307427840/458619352686788608/skillHUD_elan.gif
2018/11/15 22:59:44 downloading https://cdn.discordapp.com/attachments/85630519307427840/458646373701517323/225.gif
2018/11/15 22:59:45 downloading https://cdn.discordapp.com/attachments/85630519307427840/458646373701517323/225.gif
2018/11/15 22:59:45 [85338836384628736/85630519307427840] error downloading attachment 458646373701517323 for message 458646374561611786: (*os.LinkError) rename attachments/85630519307427840/458646373701517323/225.gif.part attachments/85630519307427840/458646373701517323/225.gif: no such file or directory
# ls -l attachments/85630519307427840/458646373701517323
total 2,928,640
-rw-r----- 1 ivan ivan 2,928,056 2018-11-15 22:59 225.gif

(OS is Debian 9.6; Go is the official go1.11.2.linux-amd64.tar.gz)

edit: and another:

2018/11/15 23:22:28 downloading https://cdn.discordapp.com/attachments/85630519307427840/502439705674121226/Halloween_Gif_2_min.gif
2018/11/15 23:22:28 downloading https://cdn.discordapp.com/attachments/85630519307427840/502439705674121226/Halloween_Gif_2_min.gif
2018/11/15 23:22:29 [85338836384628736/85630519307427840] error downloading attachment 502439705674121226 for message 502439707326545930: (*os.LinkError) rename attachments/85630519307427840/502439705674121226/Halloween_Gif_2_min.gif.part attachments/85630519307427840/502439705674121226/Halloween_Gif_2_min.gif: no such file or directory

error downloading external emoji: 415

2018/11/12 22:32:53 downloading https://cdn.discordapp.com/emojis/396521772868042754.gif?size=2048
2018/11/12 22:32:53 [306193495234707456/347190075471298568] error downloading external emoji 396521772868042754: non-200 status code: 415

415 is Unsupported Media Type

Basically a more exotic #6

Light mode

22:04 < Sanqui> moufu: no possibility to exclude files/attachments?
22:06 < moufu> nope
22:07 < Sanqui> consider it a feature request, then ;)
22:08 < Sanqui> in general, it would be cool to have a "light mode", which is only textual data, 
                and be able to download files for it later
  • exclude files (disabled in the UI because downloading later is not implemented yet)
  • download them later

cdndl.absDL doesn't seem to work on Windows

23:56 < CarlM> It seems to work for me... right up until it tries to change an avatar's extension 
               from .png.part to .png and gets told off because something else is also using that 
               file, so it dies
23:59 < CarlM> If I do the renaming for it, it proceeds to choke on the next avatar in the same 
               way :x

Add new logging format

For large servers, a TSV file is very difficult to work with. Could you consider using an SQL database or MongoDB to store the data?

Auxiliary program to dump file links

This could help with implementing #15 with little effort, since we could just pass a URL list to wget or wpull. If implemented, we could drop file download support from the main program, removing a major source of issues that other programs already solve. We'd get #1 for free, too.

Syntax

Could some example commands be provided, especially ones showing how to authenticate against Discord properly?

error downloading attachment: file name too long

2018/11/13 20:03:32 downloading https://cdn.discordapp.com/attachments/290646848778010624/290927371433934848/y7DmozupaxfEqzdwLLp245Yfj9FTx43j-kxqOKmDCcnINOm7JfEWhxWd2JB1Pth7M1C_YOsb8aod1ZJE1tV7GmFbVoMfuRILxqTiUKMeM1jrhsHV35wgsGfgIXUPynK0-NjU-bTucnB5TC-Fe-sKM0_jgQ-f9Ra_hUo9wh0TErh9m4in8vx9vBmGxuLbQj7N4m3hmbd_peALImzbsxLwogqaCankdspaAXY78Q5mVwLyabka3Tgk2Zuc19I7v2r_K_xJQiHZeCMkvj46F8tjDHtZxIQ8rHixnqyUPD3Eif0IMpOgvdnnvn2zFJUbpi7ELNTCvJ8JmwT9pT1laWliz07cuWbOFw7NjHUvUv-DwMCk2QZBxVUi9un6IRFPlGPR9onYuQOxZP0cbHFtQBIjdzfdAMtTDK05MJwEVqPZ5aR_uM0z4sdCH_bS_PDzxWdLiXkV1O352UgZBDwkl_VYZrfHkE36C-CgqnHj098wHvAPFoKoskHmJ0S5rUbf..png
2018/11/13 20:03:33 [162218687514804224/290646848778010624] error downloading attachment 290927371433934848: open attachments/290646848778010624/290927371433934848/y7DmozupaxfEqzdwLLp245Yfj9FTx43j-kxqOKmDCcnINOm7JfEWhxWd2JB1Pth7M1C_YOsb8aod1ZJE1tV7GmFbVoMfuRILxqTiUKMeM1jrhsHV35wgsGfgIXUPynK0-NjU-bTucnB5TC-Fe-sKM0_jgQ-f9Ra_hUo9wh0TErh9m4in8vx9vBmGxuLbQj7N4m3hmbd_peALImzbsxLwogqaCankdspaAXY78Q5mVwLyabka3Tgk2Zuc19I7v2r_K_xJQiHZeCMkvj46F8tjDHtZxIQ8rHixnqyUPD3Eif0IMpOgvdnnvn2zFJUbpi7ELNTCvJ8JmwT9pT1laWliz07cuWbOFw7NjHUvUv-DwMCk2QZBxVUi9un6IRFPlGPR9onYuQOxZP0cbHFtQBIjdzfdAMtTDK05MJwEVqPZ5aR_uM0z4sdCH_bS_PDzxWdLiXkV1O352UgZBDwkl_VYZrfHkE36C-CgqnHj098wHvAPFoKoskHmJ0S5rUbf..png.part: file name too long

The file name is obscenely long; however, as things stand, it prevents the Discord server from being archived at all. Such a file is probably best skipped.

Partial/distributed grabs are hard to do

Pullcord is currently not well suited for grabbing only a select portion of the history. It is possible to start grabbing from some point in time which is not the beginning, but to do that you'd need to specifically prepare a log file with a fake entry, which seems hacky. Getting logs only up to a certain point is impossible to do reliably right now.

Adding support for partial grabs is almost certainly going to increase the amount of command line options. I'd prefer to avoid clutter and/or feature creep, but I'm not sure if it's possible with the way the application is laid out right now. It wasn't really designed with large scale grabs in mind.

Too many non-fatal errors are treated as fatal

Examples: #3, #7, #9, #10, #18, #20
There is a good reason for the current default behavior: skipping many of these errors would leave logs permanently incomplete. If you re-run pullcord, it continues downloading data from where it left off without checking whether the already downloaded logs have gaps. This avoids re-requesting the entire message history on each run, which would make incremental downloads extremely slow and, arguably, not very incremental.

The default behavior is perfectionist and tries to clone guild state as closely as possible, aborting if it can't save absolutely everything that can be saved[1]. As the above issues illustrate, this often leads to early failures and an inability to download the rest of the guild at all.

One way of solving this could be to mark some errors as non-fatal and add a command line option that skips those instead of aborting. It would still be useful to have them reported here, though, as many of these errors indicate some kind of bug in pullcord, and not crashing could make people less likely to report bugs.

  1. Well, to be exact, this is no longer the case as of d6f3ad6 because that error can happen very often, but ideally we'd abort in that case as well.

Download stops after error downloading avatar

I ran ./pullcord -history -user EMAIL -pass PASS -t TOKEN on my account, and after 5990 avatar downloads, I saw

2018/11/14 18:36:54 downloading https://cdn.discordapp.com/avatars/180393403144601600/e55aa45b022140d1ea222ebc99c8c9d4.png?size=2048
2018/11/14 18:36:54 downloading https://cdn.discordapp.com/avatars/180396749951729664/60e5cca70a6068a6e51f88c3f1abd4a6.png?size=2048
2018/11/14 18:37:04 [85338836384628736] error downloading avatar for user 180396749951729664: non-200 status code: 504

and the download stopped.

panic: assignment to entry in nil map

I'm trying to archive a server, and it seems to panic on the server's splash image with the following error:

panic: assignment to entry in nil map

goroutine 29 [running]:
github.com/tsudoko/pullcord/logcache.(*Entries).WriteNew(0xc0001bd368, 0x817a60, 0xc0000b6098, 0xc00009ed00, 0xd, 0xd)
        /home/webb/go/src/github.com/tsudoko/pullcord/logcache/main.go:58 +0x172
github.com/tsudoko/pullcord/logpull.(*Puller).PullGuild(0xc0001bd350, 0xc0001c0bc0, 0x12, 0x12, 0xc0001dfd58)
        /home/webb/go/src/github.com/tsudoko/pullcord/logpull/main.go:130 +0x299
main.do(0xc0000dc3c0, 0xc00018c000)
        /home/webb/go/src/github.com/tsudoko/pullcord/cmd/pullcord/main.go:47 +0x948
github.com/tsudoko/pullcord/vendor/github.com/bwmarrin/discordgo.readyEventHandler.Handle(0x7bfbe0, 0xc0000dc3c0, 0x70eaa0, 0xc00018c000)
        /home/webb/go/src/github.com/tsudoko/pullcord/vendor/github.com/bwmarrin/discordgo/eventhandlers.go:692 +0x57
created by github.com/tsudoko/pullcord/vendor/github.com/bwmarrin/discordgo.(*Session).handle
        /home/webb/go/src/github.com/tsudoko/pullcord/vendor/github.com/bwmarrin/discordgo/event.go:171 +0x10e
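"assignment to entry in nil map" is the runtime panic Go raises when writing to a map that was declared but never created with `make`. The trace places it in `logcache.(*Entries).WriteNew`, called from `PullGuild`, so some map there is apparently still nil when the splash entry is written. Pullcord's actual types are not shown here; the sketch uses a hypothetical two-level `entryCache` to illustrate the usual fix of creating the inner map on first use.

```go
package main

import "fmt"

// entryCache is a hypothetical cache: entry type -> ID -> log row.
// The outer map must be created with make by whoever constructs it.
type entryCache map[string]map[string][]string

// add stores row under typ/id, creating the inner map on first use.
// Assigning to c[typ][id] directly would panic with "assignment to
// entry in nil map" whenever c[typ] had never been created.
func (c entryCache) add(typ, id string, row []string) {
	inner, ok := c[typ]
	if !ok {
		inner = make(map[string][]string)
		c[typ] = inner
	}
	inner[id] = row
}

func main() {
	c := make(entryCache)
	c.add("splash", "guild-id", []string{"splash", "abc.png"})
	fmt.Println(len(c["splash"])) // 1
}
```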

Weird behaviour

I'm trying to download from individual channels, but I keep getting this error:

pullcord -t xxxx -c 566527119652290568 -history
2020/01/23 02:19:59 [DG0] wsapi.go:540:onEvent() error unmarshalling READY event, json: cannot unmarshal number into Go struct field ReadState.last_message_id of type string
2020/01/23 02:20:20 [566527119652290566] error getting members from 0: (*discordgo.RESTError) HTTP 403 Forbidden, {"message": "Missing Access", "code": 50001}

But I do have access to that channel.
Shortly afterwards, I get blocked by Discord:

pullcord -t xxxx -c 566527119652290568 -history
2020/01/23 02:22:08 [DG0] wsapi.go:540:onEvent() error unmarshalling READY event, json: cannot unmarshal number into Go struct field ReadState.last_message_id of type string
2020/01/23 02:22:09 error getting guilds:HTTP 403 Forbidden, {"message": "You need to verify your account in order to perform this action.", "code": 40002}

Note that I'm using a non-bot account.

Download is missing server join messages

Hi,

When I download a channel's content, only the messages are saved.
I would like to download the server join messages as well.
What would you recommend I do to achieve this?

Example of server entry below

Thanks

Missing reaction data

094229| sanqui@kuriputoraido:~/archive/discord/run $ pullcord [scrubbed]
2018/11/12 09:43:48 token: [scrubbed]
2018/11/12 09:43:50 [306193495234707456] downloaded 673 members, last id 504121747776471041 with name ichi
2018/11/12 09:43:51 [306193495234707456/306193495234707456] error getting users for reaction #⃣ to 407748716318162949: HTTP 404 Not Found, {"code": 0, "message": "404: Not Found"}
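A guess at the cause, not confirmed: the reactions endpoint (`/channels/{channel.id}/messages/{message.id}/reactions/{emoji}`) takes the emoji in the URL path, and the keycap emoji here begins with `#`, which starts a URL fragment if it is not percent-encoded. The request would then go to a truncated path and come back 404. A sketch of building the path with `url.PathEscape`; `reactionsPath` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/url"
)

// reactionsPath builds the path for listing users who reacted with emoji.
// The emoji is percent-encoded so characters such as '#' (which keycap
// emoji start with) cannot end the path early as a URL fragment.
func reactionsPath(channelID, messageID, emoji string) string {
	return fmt.Sprintf("/channels/%s/messages/%s/reactions/%s",
		channelID, messageID, url.PathEscape(emoji))
}

func main() {
	// "#\u20e3" is '#' followed by U+20E3 COMBINING ENCLOSING KEYCAP.
	fmt.Println(reactionsPath("306193495234707456", "407748716318162949", "#\u20e3"))
	// /channels/306193495234707456/messages/407748716318162949/reactions/%23%E2%83%A3
}
```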

error downloading external emoji

2018/11/12 13:52:37 downloading https://cdn.discordapp.com/emojis/409850591477628939.png?size=2048
2018/11/12 13:52:38 [386543982399979531/386553240763891724] error downloading external emoji : non-200 status code: 404

pullcord should retry and eventually give up and continue. Or maybe just give up straight away on 404.
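A sketch of such a policy, assuming a hypothetical `classify` step between a failed CDN download and the decision to continue. The grouping is a judgment call: 404 and 415 (seen in these reports) are treated as permanent and skipped, 429 and 5xx (such as the 504 above) as transient and retried, and anything else still aborts.

```go
package main

import (
	"fmt"
	"net/http"
)

// action says what to do after a failed CDN download.
type action int

const (
	retry action = iota // transient: try again after a delay
	skip                // permanent: log it and move on
	abort               // unexpected: stop, as pullcord does today
)

func (a action) String() string {
	return [...]string{"retry", "skip", "abort"}[a]
}

// classify maps a CDN status code to an action (assumed grouping).
func classify(status int) action {
	switch {
	case status == http.StatusNotFound, status == http.StatusUnsupportedMediaType:
		return skip
	case status == http.StatusTooManyRequests, status >= 500:
		return retry
	default:
		return abort
	}
}

func main() {
	for _, code := range []int{404, 415, 504, 403} {
		fmt.Println(code, classify(code)) // 404 skip, 415 skip, 504 retry, 403 abort
	}
}
```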
