tsudoko / pullcord
Discord archiver
License: The Unlicense
I'm trying to download from individual channels, but I keep getting this error:
pullcord -t xxxx -c 566527119652290568 -history
2020/01/23 02:19:59 [DG0] wsapi.go:540:onEvent() error unmarshalling READY event, json: cannot unmarshal number into Go struct field ReadState.last_message_id of type string
2020/01/23 02:20:20 [566527119652290566] error getting members from 0: (*discordgo.RESTError) HTTP 403 Forbidden, {"message": "Missing Access", "code": 50001}
But I do have access to that channel.
And soon after I instantly get blocked by Discord:
pullcord -t NDc1NjI2NjEzODMzMzM0Nzg3.XiCj9A.GoLxRH3fq4yBNR1rh2_NyLxAyBI -c 566527119652290568 -history
2020/01/23 02:22:08 [DG0] wsapi.go:540:onEvent() error unmarshalling READY event, json: cannot unmarshal number into Go struct field ReadState.last_message_id of type string
2020/01/23 02:22:09 error getting guilds:HTTP 403 Forbidden, {"message": "You need to verify your account in order to perform this action.", "code": 40002}
Note that I'm using a non-bot account.
Could some example syntax be provided? Especially as it relates to making sure we're authenticating against Discord properly.
pullcord stopped with this error while getting the history for my account:
2018/11/15 22:59:44 downloading https://cdn.discordapp.com/attachments/85630519307427840/458619352686788608/skillHUD_elan.gif
2018/11/15 22:59:44 downloading https://cdn.discordapp.com/attachments/85630519307427840/458646373701517323/225.gif
2018/11/15 22:59:45 downloading https://cdn.discordapp.com/attachments/85630519307427840/458646373701517323/225.gif
2018/11/15 22:59:45 [85338836384628736/85630519307427840] error downloading attachment 458646373701517323 for message 458646374561611786: (*os.LinkError) rename attachments/85630519307427840/458646373701517323/225.gif.part attachments/85630519307427840/458646373701517323/225.gif: no such file or directory
# ls -l attachments/85630519307427840/458646373701517323
total 2,928,640
-rw-r----- 1 ivan ivan 2,928,056 2018-11-15 22:59 225.gif
(OS is Debian 9.6; Go is the official go1.11.2.linux-amd64.tar.gz)
edit: and another:
2018/11/15 23:22:28 downloading https://cdn.discordapp.com/attachments/85630519307427840/502439705674121226/Halloween_Gif_2_min.gif
2018/11/15 23:22:28 downloading https://cdn.discordapp.com/attachments/85630519307427840/502439705674121226/Halloween_Gif_2_min.gif
2018/11/15 23:22:29 [85338836384628736/85630519307427840] error downloading attachment 502439705674121226 for message 502439707326545930: (*os.LinkError) rename attachments/85630519307427840/502439705674121226/Halloween_Gif_2_min.gif.part attachments/85630519307427840/502439705674121226/Halloween_Gif_2_min.gif: no such file or directory
2018/11/12 22:32:53 downloading https://cdn.discordapp.com/emojis/396521772868042754.gif?size=2048
2018/11/12 22:32:53 [306193495234707456/347190075471298568] error downloading external emoji 396521772868042754: non-200 status code: 415
415 is Unsupported Media Type
Basically a more exotic #6
Examples: #3, #7, #9, #10, #18, #20
There is a good reason for the current default behavior: skipping many of these errors would leave logs permanently incomplete. When you re-run pullcord, it continues downloading from where it left off without checking whether the already downloaded logs have gaps. This avoids re-requesting the entire message history on each run, which would make incremental downloads extremely slow and, arguably, not very incremental.
The default behavior is perfectionist and tries to clone guild state as closely as possible, aborting if it can't save absolutely everything that can be saved. As the above issues illustrate, this often leads to early failures and an inability to download the rest of the guild at all.
One way of solving this could be to mark some errors as non-fatal and add a command line option that skips them instead of aborting. It would still be useful to have them reported back here, though, as many of these errors indicate some kind of bug in pullcord. Not crashing could make people less likely to report bugs.
I'm trying to archive a server, and it seems to panic on the splash image of a server with error:
panic: assignment to entry in nil map
goroutine 29 [running]:
github.com/tsudoko/pullcord/logcache.(*Entries).WriteNew(0xc0001bd368, 0x817a60, 0xc0000b6098, 0xc00009ed00, 0xd, 0xd)
/home/webb/go/src/github.com/tsudoko/pullcord/logcache/main.go:58 +0x172
github.com/tsudoko/pullcord/logpull.(*Puller).PullGuild(0xc0001bd350, 0xc0001c0bc0, 0x12, 0x12, 0xc0001dfd58)
/home/webb/go/src/github.com/tsudoko/pullcord/logpull/main.go:130 +0x299
main.do(0xc0000dc3c0, 0xc00018c000)
/home/webb/go/src/github.com/tsudoko/pullcord/cmd/pullcord/main.go:47 +0x948
github.com/tsudoko/pullcord/vendor/github.com/bwmarrin/discordgo.readyEventHandler.Handle(0x7bfbe0, 0xc0000dc3c0, 0x70eaa0, 0xc00018c000)
/home/webb/go/src/github.com/tsudoko/pullcord/vendor/github.com/bwmarrin/discordgo/eventhandlers.go:692 +0x57
created by github.com/tsudoko/pullcord/vendor/github.com/bwmarrin/discordgo.(*Session).handle
/home/webb/go/src/github.com/tsudoko/pullcord/vendor/github.com/bwmarrin/discordgo/event.go:171 +0x10e
094229| sanqui@kuriputoraido:~/archive/discord/run $ pullcord [scrubbed]
2018/11/12 09:43:48 token: [scrubbed]
2018/11/12 09:43:50 [306193495234707456] downloaded 673 members, last id 504121747776471041 with name ichi
2018/11/12 09:43:51 [306193495234707456/306193495234707456] error getting users for reaction #⃣ to 407748716318162949: HTTP 404 Not Found, {"code": 0, "message": "404: Not Found"}
pullcord stopped after:
2019/04/27 02:19:42 [85338836384628736] error downloading avatar for user 105888302300717056: (cdndl.ErrNotOk) non-200 status code: 415
I ran ./pullcord -history -user EMAIL -pass PASS -t TOKEN
on my account, and after 5990 avatar downloads, I saw
2018/11/14 18:36:54 downloading https://cdn.discordapp.com/avatars/180393403144601600/e55aa45b022140d1ea222ebc99c8c9d4.png?size=2048
2018/11/14 18:36:54 downloading https://cdn.discordapp.com/avatars/180396749951729664/60e5cca70a6068a6e51f88c3f1abd4a6.png?size=2048
2018/11/14 18:37:04 [85338836384628736] error downloading avatar for user 180396749951729664: non-200 status code: 504
and the download stopped.
2018/11/12 13:52:37 downloading https://cdn.discordapp.com/emojis/409850591477628939.png?size=2048
2018/11/12 13:52:38 [386543982399979531/386553240763891724] error downloading external emoji : non-200 status code: 404
pullcord should retry and eventually give up and continue. Or maybe just give up straight away on 404.
23:56 < CarlM> It seems to work for me... right up until it tries to change an avatar's extension
from .png.part to .png and gets told off because something else is also using that
file, so it dies
23:59 < CarlM> If I do the renaming for it, it proceeds to choke on the next avatar in the same
way :x
22:04 < Sanqui> moufu: no possibility to exclude files/attachments?
22:06 < moufu> nope
22:07 < Sanqui> consider it a feature request, then ;)
22:08 < Sanqui> in general, it would be cool to have a "light mode", which is only textual data,
and be able to download files for it later
Hi,
I am having an Authentication failed error.
"opening the websocket connection failed:websocket: close 4004: Authentication failed."
What would you recommend I do?
I am running on Windows 10 and have tried running the command as an admin.
The command I ran was pullcord -history -c 738xxxxxxxxxxxxx
Thanks
For large servers, a TSV file is very difficult to work with. Could you consider storing the data in an SQL database or MongoDB instead?
When using a bot token, I get this error:
2020/01/15 21:06:24 error getting guilds:HTTP 401 Unauthorized, {"message": "401: Unauthorized", "code": 0}
../../go/src/github.com/tsudoko/pullcord/logpull/main.go:372:62: too few values in discordgo.MessageReaction literal
../../go/src/github.com/tsudoko/pullcord/logpull/main.go:381:60: too few values in discordgo.MessageReaction literal
2018/11/13 20:03:32 downloading https://cdn.discordapp.com/attachments/290646848778010624/290927371433934848/y7DmozupaxfEqzdwLLp245Yfj9FTx43j-kxqOKmDCcnINOm7JfEWhxWd2JB1Pth7M1C_YOsb8aod1ZJE1tV7GmFbVoMfuRILxqTiUKMeM1jrhsHV35wgsGfgIXUPynK0-NjU-bTucnB5TC-Fe-sKM0_jgQ-f9Ra_hUo9wh0TErh9m4in8vx9vBmGxuLbQj7N4m3hmbd_peALImzbsxLwogqaCankdspaAXY78Q5mVwLyabka3Tgk2Zuc19I7v2r_K_xJQiHZeCMkvj46F8tjDHtZxIQ8rHixnqyUPD3Eif0IMpOgvdnnvn2zFJUbpi7ELNTCvJ8JmwT9pT1laWliz07cuWbOFw7NjHUvUv-DwMCk2QZBxVUi9un6IRFPlGPR9onYuQOxZP0cbHFtQBIjdzfdAMtTDK05MJwEVqPZ5aR_uM0z4sdCH_bS_PDzxWdLiXkV1O352UgZBDwkl_VYZrfHkE36C-CgqnHj098wHvAPFoKoskHmJ0S5rUbf..png
2018/11/13 20:03:33 [162218687514804224/290646848778010624] error downloading attachment 290927371433934848: open attachments/290646848778010624/290927371433934848/y7DmozupaxfEqzdwLLp245Yfj9FTx43j-kxqOKmDCcnINOm7JfEWhxWd2JB1Pth7M1C_YOsb8aod1ZJE1tV7GmFbVoMfuRILxqTiUKMeM1jrhsHV35wgsGfgIXUPynK0-NjU-bTucnB5TC-Fe-sKM0_jgQ-f9Ra_hUo9wh0TErh9m4in8vx9vBmGxuLbQj7N4m3hmbd_peALImzbsxLwogqaCankdspaAXY78Q5mVwLyabka3Tgk2Zuc19I7v2r_K_xJQiHZeCMkvj46F8tjDHtZxIQ8rHixnqyUPD3Eif0IMpOgvdnnvn2zFJUbpi7ELNTCvJ8JmwT9pT1laWliz07cuWbOFw7NjHUvUv-DwMCk2QZBxVUi9un6IRFPlGPR9onYuQOxZP0cbHFtQBIjdzfdAMtTDK05MJwEVqPZ5aR_uM0z4sdCH_bS_PDzxWdLiXkV1O352UgZBDwkl_VYZrfHkE36C-CgqnHj098wHvAPFoKoskHmJ0S5rUbf..png.part: file name too long
The file name is obscenely long; however, as it stands, this prevents the Discord server from being archived. Such a file is probably best skipped.
2018/11/12 09:45:56 [446709324614467596/446714306428665856] error getting messages from : HTTP 403 Forbidden, {"code": 50001, "message": "Missing Access"}
pullcord should skip channels without access.
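Detecting this case could be sketched as below, based only on the response shown in the log (HTTP 403 with JSON code 50001). The `apiError` type and the `isMissingAccess` helper are hypothetical, and the channel names in `main` are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// apiError mirrors the JSON error body seen in the log above.
type apiError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
}

// codeMissingAccess is the error code from the logged response.
const codeMissingAccess = 50001

// isMissingAccess reports whether a response is the "Missing Access" error,
// i.e. HTTP 403 with code 50001. Other 403 responses are not matched.
func isMissingAccess(status int, body []byte) bool {
	if status != http.StatusForbidden {
		return false
	}
	var e apiError
	if err := json.Unmarshal(body, &e); err != nil {
		return false
	}
	return e.Code == codeMissingAccess
}

func main() {
	// Simulated responses for two placeholder channels.
	responses := []struct {
		channel string
		status  int
		body    string
	}{
		{"channel-a", http.StatusOK, `[]`},
		{"channel-b", http.StatusForbidden, `{"code": 50001, "message": "Missing Access"}`},
	}
	for _, r := range responses {
		if isMissingAccess(r.status, []byte(r.body)) {
			fmt.Println("skipping", r.channel, "(missing access)")
			continue
		}
		fmt.Println("pulling", r.channel)
	}
}
```

Matching on the code rather than on the 403 status alone keeps other forbidden responses, such as the account verification error (code 40002), fatal.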
Pullcord is currently not well suited for grabbing only a select portion of the history. It is possible to start grabbing from some point in time which is not the beginning, but to do that you'd need to specifically prepare a log file with a fake entry, which seems hacky. Getting logs only up to a certain point is impossible to do reliably right now.
Adding support for partial grabs is almost certainly going to increase the amount of command line options. I'd prefer to avoid clutter and/or feature creep, but I'm not sure if it's possible with the way the application is laid out right now. It wasn't really designed with large scale grabs in mind.
Right now threads are simply ignored, so logs are incomplete unless threads are archived as well.
Depends on PR #608 for discordgo.
2018/11/12 10:19:33 downloading https://cdn.discordapp.com/attachments/162272327759953921/460863149390692382/unknown.png
2018/11/12 10:19:33 [162272327759953921/162272327759953921] error downloading attachment 460863149390692382: http2: server sent GOAWAY and closed the connection; LastStreamID=1999, ErrCode=NO_ERROR, debug=""
Pullcord should retry and resume.