yark's Introduction

Yark

YouTube archiving made simple.

Installation

To install Yark, simply download Python 3.9+ and FFmpeg (optional), then run the following:

$ pip3 install yark

Managing your Archive

Once you've installed Yark, think of a name for your archive (e.g., "foobar") and copy the target's URL:

$ yark new foobar https://www.youtube.com/channel/UCSMdm6bUYIBN0KfS2CVuEPA

Now that you've created the archive, you can tell Yark to download all videos and metadata using the refresh command:

$ yark refresh foobar

Once everything has been downloaded, Yark will automatically give you a status report of what's changed since the last refresh:

Report Demo

Viewing your Archive

Viewing your archive is easy; just run view with your archive's name:

$ yark view foobar

This will pop up an offline website in your browser letting you watch all videos 🚀

Viewer Demo

Under each video is a rich history report filled with timelines and graphs, as well as a noting feature which lets you add timestamped and permalinked comments 👍

Viewer Demo – Stats

Light and dark modes are both available and automatically apply based on the system's theme.

Details

Here are some things to keep in mind when using Yark; the good and the bad:

  • Don't create a new archive again if you just want to update it; Yark accumulates all new metadata for you via timestamps
  • Feel free to suggest new features via the issues tab on this repository
  • Scheduling isn't a feature just yet; please use cron or something similar!

Archive Format

The archive format itself is simple and consists of a directory-based structure with a core metadata file and all thumbnail/video data in their own directories as typical files:

  • [name]/ – Your self-contained archive
    • yark.json – Archive file with all metadata
    • yark.bak – Backup archive file to protect against data damage
    • videos/ – Directory containing all known videos
      • [id].* – Files containing video data for YouTube videos
    • thumbnails/ – Directory containing all known thumbnails
      • [hash].png – Files containing thumbnails, each named by its hash

It's best to take a few minutes to familiarize yourself with your archive by looking at whichever files in it seem interesting to you; everything is quite readable.
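As a rough way to get oriented, the layout above can be walked with a few lines of Python. This is only a sketch based on the directory names listed here; the function name `summarize_archive` is made up for illustration and is not part of Yark.

```python
from pathlib import Path


def summarize_archive(archive_dir):
    """Summarize the files found in a Yark-style archive directory."""
    root = Path(archive_dir)
    videos_dir = root / "videos"
    thumbnails_dir = root / "thumbnails"

    # Video files are named [id].*, so the stem of each file is the video id
    video_ids = []
    if videos_dir.is_dir():
        video_ids = sorted(p.stem for p in videos_dir.iterdir() if p.is_file())

    # Thumbnails are stored as [hash].png
    thumbnail_count = 0
    if thumbnails_dir.is_dir():
        thumbnail_count = sum(1 for _ in thumbnails_dir.glob("*.png"))

    return {
        "has_metadata": (root / "yark.json").is_file(),
        "has_backup": (root / "yark.bak").is_file(),
        "video_ids": video_ids,
        "thumbnail_count": thumbnail_count,
    }
```

Calling `summarize_archive("foobar")` on the archive created earlier would report whether yark.json and yark.bak exist and which video ids have files on disk.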

yark's People

Contributors

alexanderdavidj, owez, tantalor

yark's Issues

FFMPEG merging for better quality

Right now Yark is unfortunately limited to 720p videos at most because YouTube only serves combined video + audio files up to this resolution. To download at 1080p or higher (2k/4k videos), an FFmpeg merging solution needs to be implemented to pull separate audio and video streams from YouTube and then combine them.

This issue stems from #55 and #58; it'll be part of the upcoming Yark 1.3 as a major fix.


This issue was a red herring. People could always download high-quality videos; it turns out I didn't have FFmpeg installed properly during testing. There's now much nicer support for better-quality videos, plus documentation to help users set up FFmpeg.

Sort videos by date

Should've done this a while ago, top should have the most recent videos. Could separate them by month but that's quite annoying to do.
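A minimal sketch of the ordering described here, assuming each video is a dict with an ISO 8601 `uploaded` string (that field name is an assumption for illustration, not taken from the archive format). Grouping by month is included to show it needn't be too painful:

```python
from datetime import datetime
from itertools import groupby


def sort_newest_first(videos):
    """Return videos ordered by upload date, most recent first."""
    return sorted(
        videos,
        key=lambda video: datetime.fromisoformat(video["uploaded"]),
        reverse=True,
    )


def group_by_month(videos):
    """Group videos into (YYYY-MM, [videos]) pairs, newest month first."""
    ordered = sort_newest_first(videos)
    # The first 7 characters of an ISO 8601 timestamp are "YYYY-MM"
    return [
        (month, list(items))
        for month, items in groupby(ordered, key=lambda v: v["uploaded"][:7])
    ]
```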

Better description rendering

Some problems with descriptions on the viewer:

  • Description links don't work
  • Timestamps are included in descriptions and look ugly
  • No newlines on description
  • Text should probably be smaller

Report command

I'd like there to be an overall yark report command which shows the most interesting details about a channel: what they've updated and what they've deleted manually. Might use the same reporter formatter as the rest, but ideally this would have some nice terminal graphics. Could maybe be a part of the viewer in the future, but for now a nice report command would be easiest to implement and the most useful until categories are implemented in the viewer.

Don't add statistic if it's null

View/like statistics can sometimes (but rarely) return null values for some reason. Luckily, this just looks like a missing bit on a view graph. To fix this, just don't add the new tally if it's null.
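The fix could look roughly like this, assuming a statistic's history is a dict keyed by timestamp (the helper name `add_tally` and that shape are assumptions for illustration):

```python
def add_tally(history, timestamp, value):
    """Record a view/like count unless the value is missing (None).

    Returns True if the tally was stored and False if it was skipped.
    """
    if value is None:
        # Null from upstream: leave the history untouched rather than store a gap
        return False
    history[timestamp] = value
    return True
```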

Number of videos

Will it be possible to download the required number of videos?

Download Videos in 1080p

Hi, first I would like to thank you for making this program.

I want to bring up an issue: although the command line program shows that the downloaded videos are in 1080p, inspecting the video files themselves shows that they are actually in 720p.

Is there a way to force it to download a certain quality when archiving...? Thanks.

Skip pending livestreams when archiving

Hi again, sorry to open another issue.

Will you give the option to skip pending livestreams when archiving channels? It would be annoying to often wait for those streams to finish before being able to run yark on a streamer's channel.

Noting

On the side of the videos there should be a good note-taking feature. I'm calling this notes instead of comments to avoid confusion.

Requirements

The requirements for this are:

  • Easy addition from the sidebar where YouTube's recommended videos are
  • Permalinks that just add something like /id to the end of the video URL
    • /note/id could also work, but I don't see the need
    • Easily copyable so they can be linked to from other sources [1]
  • Ideally markdown-based [2]

Schema

The schema inside of yark.json will be:

// video
{
    // ..
    "notes": [
        // note
        {
            "id": "uuid", // uuid
            "title": "text", // plaintext title
            "body": "markdown", // markdown-formatted comment, can be null
            "timestamp": 0 // seconds, can be null
        }
    ]
}

Notice the "x": [{}] instead of "x": {"y": {}} because this issue will be started once #9 is out.
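For illustration, the note schema could be modelled in Python along these lines. The `Note` class and `add_note` helper are hypothetical names, not existing Yark code; only the four fields come from the schema above.

```python
import uuid
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Note:
    """A single note attached to a video, mirroring the proposed schema."""

    title: str  # plaintext title
    body: Optional[str] = None  # markdown-formatted comment, can be null
    timestamp: Optional[int] = None  # seconds into the video, can be null
    id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def to_archive(self):
        """Convert the note to a plain dict ready for JSON serialization."""
        return asdict(self)


def add_note(video, note):
    """Append a note to a video dict, creating the "notes" list if needed."""
    video.setdefault("notes", []).append(note.to_archive())
    return video
```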

Footnotes

  1. I might be making a wiki project, so having links to channels/videos and comments in videos is very useful

  2. Rendering might be a bit annoying because headings shouldn't really be allowed

Track deleted videos

The final addition to the reporter is to say which videos have been deleted. This might involve each video having a "deleted" tag inside of them so we can track which have been reported on already. This will also allow a possible new report command which lists all of the major updates and deletions inside for a channel.
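One possible shape for this, assuming videos are dicts with an `id` and an optional boolean `deleted` flag (the flag name follows the idea above; `mark_deleted` is a made-up helper):

```python
def mark_deleted(known_videos, fetched_ids):
    """Flag archived videos that are no longer listed upstream.

    Returns only the ids that were newly marked, so each deletion is
    reported once rather than on every refresh.
    """
    fetched = set(fetched_ids)
    newly_deleted = []
    for video in known_videos:
        already_flagged = video.get("deleted", False)
        if video["id"] not in fetched and not already_flagged:
            video["deleted"] = True
            newly_deleted.append(video["id"])
    return newly_deleted
```

Running it a second time with the same inputs returns an empty list, which is what lets the reporter avoid repeating itself.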

Channel name on navbar

The navbar should have a link next to "Yark" which lets people navigate back to their channel because navigability is currently quite bad, though #21 helped.

Firefox navbar goes off x axis

There's a horizontal scrollbar when testing with Firefox on Windows. My MacBook doesn't seem to have the same issue.

Split files up for restructure

A personal benchmark for splitting yark.py up into a proper module was 1.5k lines, and it's more or less reached that. I will make yark.py into a full module following recommended conventions. This should also come with some nicer module-level documentation so it can become a module-first program in the future and rely less on the CLI.

Git bash fails with reporting

As I reported earlier in a Reddit post, I get this error when running the refresh command.

Note: This error doesn't prevent the "view" command from running and working correctly.

The error says:
"UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f50e' in position 42: character maps to undefined"

So I found that the character '\U0001f50e' is the magnifying glass icon: 🔎 (https://emojipedia.org/es/emoji/%F0%9F%94%8E/)

I tried to fix it by looking through the code, but I couldn't find either the magnifying glass icon or the Unicode code 1f50e ☹

Viewer for livestreams and shorts

During the process of fixing #28 I started to separate videos from livestreams and shorts in the viewer, and now everything is based on /[channel]/videos instead of just /[channel]. This can be expanded by having /[channel]/shorts and /[channel]/livestreams, with a tab bar letting people navigate from one to the other. It's an annoying job to do, but it's needed for livestreams and shorts to be feature-complete.

Thumbnail performance

Currently all thumbnails are downloaded every time the metadata is refreshed. Hopefully YouTube exposes a hash for each thumbnail so they don't need to be downloaded just to check whether we've already got them.

Maximum for category in cli

To fully complete maximums, they should be bucketed into videos, shorts, and livestreams. By default a maximum of 5 downloads 5 from each (see #29 for recent changes). If you specify something like --videos=3 then it'll be 3 videos and 0 others, and if you specify --videos=0 --shorts=26 then it'd be exactly that.

Basically a way to explicitly define maximums for each category, this could replace the current max option; it makes it more complex but I doubt people will always want 5 of each category. Leaving both in makes it even more complicated.
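A sketch of how the per-category rule could be resolved, under the reading that a general maximum applies to every category unless at least one category is set explicitly, in which case unspecified categories drop to 0. The function and parameter names are hypothetical:

```python
def resolve_maximums(maximum=None, videos=None, shorts=None, livestreams=None):
    """Work out the download limit for each category.

    None means "no limit" in the returned dict.
    """
    explicit = {"videos": videos, "shorts": shorts, "livestreams": livestreams}

    # No per-category values given: the general maximum applies to each bucket
    if all(value is None for value in explicit.values()):
        return {kind: maximum for kind in explicit}

    # Otherwise only the named categories download; the rest are set to 0
    return {
        kind: (value if value is not None else 0)
        for kind, value in explicit.items()
    }
```

With this reading, `resolve_maximums(videos=0, shorts=26)` yields 0 videos, 26 shorts, and 0 livestreams.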

Native playlist downloading

Hey, saw this in Hackernews, looks pretty neat.

Even though the README only mentions channels, I know that yt-dlp also supports downloading playlists. When I give it a playlist that has videos that were deleted, yark breaks. This is common in music playlists, for example this one.

I had a quick look in the code, I managed to get it to work with this simple hotfix:

diff --git a/yark/channel.py b/yark/channel.py
index 4863823..26eb7af 100644
--- a/yark/channel.py
+++ b/yark/channel.py
@@ -139,6 +139,7 @@ class Channel:
         settings = {
             "outtmpl": "%(id)s%(ext)s",
             "logger": VideoLogger(),
+            "ignoreerrors": True
         }
 
         # Get response and snip it
@@ -188,6 +189,8 @@ class Channel:
                 else:
                     _err_msg(f"Unknown video kind '{kind}' found", True)
 
+        videos = [x for x in videos if x is not None]
+
         # Parse metadata
         self._parse_metadata("video", videos, self.videos)
         self._parse_metadata("livestream", livestreams, self.livestreams)

I'm not making a formal PR because you might know of a cleaner way of implementing this check, but you're welcome to apply this patch if you want. Also it might be good to mention in the README that it works on playlists too, not just channels.

Update example images

New ones should be made because yark looks a bit different now with rounded thumbnails.

Video history on viewer

I want to have a graph of likes/views and a history of title/description changes on the viewer for videos.

Logo

Need a logo to make the repo fancy, probably a YouTube symbol on top of an ark.

Download livestreams

After the recent changes on YouTube (the separation of videos and livestreams), will it be possible to download livestreams?

Dry run for downloads

If there's a massive channel and you just want to download the videos after stopping the big download session, you probably don't want to refresh your massive metadata. To fix this, having some --dry argument in the refresh config would be good. Need to differentiate:

  • Not pulling new metadata (this issue)
  • Not downloading videos (future issue)

This is so the names of these two options won't collide. I don't want --no-.. because that's a bit ugly.

Livestreams

Listening in to livestreams which are marked as non-saved would be useful because otherwise all such livestreams go unarchived. For this to work I need to rely on a yt-dlp listener which I can implement in the daemon once #12 is finished.

Livestreams which get saved can be treated as normal videos, so they won't be recorded in real time. Livestreams that go unsaved are different from normal videos, but I'm not sure if they deserve their own "livestreams": [] section in the master file.

Viewed and unviewed

Great tool, thanks for making it.

After a video is viewed, could it be marked as "viewed" to make it easy to see on the gallery page which videos haven't been viewed yet? It would also be nice to have a toggle button to mark a video as "viewed" or "unviewed".

Deterministic maximums

Right now if you specify a max of 5, for example, it will download 5 of the newest videos it hasn't already got. Ideally the max would instead specify "we should have 5 of the most recent videos", so if you've already got, say, 10 of the newest videos downloaded, it won't download any more.

This is the same concept Terraform applies to servers: repeatedly applying a max of 5 should leave you with 5 videos, not 5, then another 5, then another 5, and so on.
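The desired-state behaviour could be sketched like this, assuming the video list is already sorted newest first and each video dict carries a boolean `downloaded` flag (both are assumptions for illustration):

```python
def plan_downloads(videos_newest_first, maximum):
    """Pick the downloads needed so the newest `maximum` videos are archived.

    Only the newest `maximum` entries are considered at all, so repeating
    the call after a successful run produces an empty plan.
    """
    wanted = videos_newest_first[:maximum]
    return [video["id"] for video in wanted if not video.get("downloaded", False)]
```

Passing `None` as the maximum considers every video, since slicing with `[:None]` keeps the whole list.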

Yark vs youtube-dl/yt-dlp

I realize the repo has only just been released and the README will get populated as the repo's usage grows.

But can we have a detailed comparison from the author, @Owez, of this repo against youtube-dl or yt-dlp?

User channels

Custom ids can also mean /user instead of the usual custom /c channel.

Use a format fallback or allow to manually set the video format

The current format used to download videos is notoriously error prone (see yt-dlp/yt-dlp#3372).

One solution would be for yark to fall back to webm in such cases, to ask the user to manually pick a format from the yt-dlp -F output, or to have an option to pass the following string as a CLI argument: https://github.com/Owez/yark/blob/c48e37ae405052bb443b04098d088ccc8e071b4e/yark/channel.py#LL209C10-L209C10
The latter would also allow saving channels at higher resolution, since YouTube only serves mp4 with audio up to 720p resolution.

The main drawback is that the video file will generally be larger in such cases.

Example of a video affected by it: https://www.youtube.com/watch?v=YbYpbXMUsYM

yt-dlp error:

yt-dlp -f "best/[ext=mp4]/hasvid" "https://www.youtube.com/watch?v=YbYpbXMUsYM" -o YbYpbXMUsYM.mp4
[youtube] Extracting URL: https://www.youtube.com/watch?v=YbYpbXMUsYM
[youtube] YbYpbXMUsYM: Downloading webpage
[youtube] YbYpbXMUsYM: Downloading android player API JSON
[info] YbYpbXMUsYM: Downloading 1 format(s): 22
[download] Resuming download at byte 1713408


ERROR: Did not get any data blocks

yark error:

yark refresh munecat
Loading munecat channel..
Downloading metadata..
Parsing video metadata..
Parsing livestream metadata..
Parsing shorts metadata..
Cleaning out previous temporary files..
Downloading 19 new videos..
  • Downloading YbYpbXMUsYM, at 0.2%..
  • Unknown error whilst downloading videos, details below:
[download] Got error: Downloaded 1713408 bytes, expected 780994664 bytes, retrying in a few seconds..
  • Fault with YouTube's servers, retrying in a few seconds..
  • Unknown error whilst downloading videos, details below:
ERROR: Did not get any data blocks, retrying in a few seconds..
  • Unknown error whilst downloading videos, details below:
ERROR: Did not get any data blocks, retrying in a few seconds..
  • Unknown error whilst downloading videos, details below:
ERROR: Did not get any data blocks
  • Sorry, failed to download {name}
Please file a bug report if you think this is a problem with Yark!

Viewer capitalisation

The change from yark to Yark has left the viewer's title looking off because video titles were stylised towards lowercase-only.

Object-based schema

Whilst we're still in development, I want to revise the schema to use arrays of JSON objects to simplify parsing, and because it's more logical. This doesn't have any real downsides, except that it may be a little harder to identify videos when you collapse JSON in an editor during debugging.

The version will still be 1 because this is pre-release. Any breaking changes after yark is published to PyPI will have the schema's version increment.

Current

We currently rely on videos being like this:

"videos": {
    // video
    "id": {
        // rest of video
    },
    // video
    "id": {
        // rest of video
    }
}

This isn't optimal because the ID is hard to get out and it has some edge cases and stuff that isn't nice when using jinja.

Revised

The better model for the schema would be object-based:

"videos": [
    // video
    {
        "id": "id",
        // rest of video
    },
    // video
    {
        "id": "id",
        // rest of video
    }
]

It is also just far easier to parse with Python, so it'll cut down on conversion effort and on repeated use of video.id. This revised format will also be used for the new #10 noting feature.
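Migrating from the current id-keyed layout to the revised list layout is a small transformation. A sketch, with `videos_to_list` as a hypothetical helper name:

```python
def videos_to_list(videos_by_id):
    """Convert {"<id>": {...}} into [{"id": "<id>", ...}], keeping order."""
    return [
        {"id": video_id, **video}
        for video_id, video in videos_by_id.items()
    ]
```

After conversion every video carries its own id, so templates and parsers can iterate over the list without needing the surrounding key.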

CSS timestamps

Should make an ISO timestamp into a shortened block which is copied to clipboard on click. Can use CSS or JS.

Video downloading

The archiver fully works with all metadata and thumbnails, just need to add downloading to it which should be an easy job all things considered.

Viewer

I'd like a flask-based viewer which is like an offline YouTube that has all the changes on it. Documented in the README and it should be included in the demo because it's a cool feature.

Nicer yt-dlp outputs

The current stock ones don't fit with the simple printing. Ideally it's just "[id] [percentage]%" instead of all the printing. Quiet mode doesn't really work because it spits out warning messages for big channels; still way faster than youtube-dl though.
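One way to get the "[id] [percentage]%" line is a formatter fed from a yt-dlp progress hook. The sketch below is pure Python and doesn't import yt-dlp. The dictionary keys it reads (`status`, `downloaded_bytes`, `total_bytes`, `total_bytes_estimate`, `info_dict`) are assumed from yt-dlp's `progress_hooks` interface and should be checked against the installed version. `format_progress` is a hypothetical name.

```python
def format_progress(status):
    """Build a short '[id] [percentage]%' line from a progress dict.

    Returns None when there is nothing sensible to print, for example
    when the download has finished or the total size is unknown.
    """
    if status.get("status") != "downloading":
        return None

    # The exact total may be missing, so fall back to the estimate
    total = status.get("total_bytes") or status.get("total_bytes_estimate")
    if not total:
        return None

    video_id = status.get("info_dict", {}).get("id", "unknown")
    percent = status.get("downloaded_bytes", 0) / total * 100
    return f"{video_id} {percent:.1f}%"
```

A hook would then print the returned line whenever it isn't None.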

Playlist viewing

Hi there!
I like this project!

Can we get videos grouped or sorted by playlist?

Sometimes when I need to download an entire channel, I have to get all the playlist links from the channel and download them one by one.
yt-dlp --yes-playlist -o "./%(channel)s/%(playlist)s/%(playlist_index)s-%(title)s-%(id)s.%(ext)s" playlist_url

But when there are too many playlists, it's not convenient to copy each playlist_url one by one and download it with yt-dlp --yes-playlist ...

What do you think about it?

Publish to PyPI

Needs a setup.py file and some compatibility with Pipenv for ease of publishing.

Fix errors from new version check

yark/yark.py

Line 915 in c0b8510

def _pypi_version():

In development builds there are multiple different errors which stem from the recently introduced version checker. It should be more resilient to errors, and maybe fail quietly if it can't connect, because it runs on every call of Yark. This isn't a critical issue; it just means that the version checker won't work when the errors pop up.

Comments

🎉 Hey everyone from Hacker News!

I'd like to add comment archiving to Yark in v1.3 because it's been suggested a few times. Originally I didn't add it because I wanted to focus on the core archiver, but now that it's stable, comments would be a nice new feature as a time capsule and for OSINT stuff.

Not sure how the archive format will change to accommodate this; there will need to be a new "comments" bit on each video controlled by a new comment class, so this can be part of a new v4 of the archive format, which will also change with #27

Update message

Should maybe query PyPI to see if there's a new version out and tell users when they use the CLI. This would be useful because Yark is still under fast-ish development. Will use urllib even though it's janky; no need to add an extra dependency on requests.
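A minimal sketch of such a check using only urllib, assuming PyPI's JSON endpoint at `https://pypi.org/pypi/<package>/json`. The function names are hypothetical, the `opener` parameter exists only so the network call can be swapped out, and the version comparison is deliberately naive (numeric dotted parts only). Any failure is swallowed so the check can never break a normal command.

```python
import json
import urllib.request


def parse_version(text):
    """Turn '1.2.3' into (1, 2, 3); non-numeric parts are ignored."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def latest_pypi_version(package="yark", timeout=2.0, opener=urllib.request.urlopen):
    """Return the latest published version string, or None on any failure."""
    url = f"https://pypi.org/pypi/{package}/json"
    try:
        with opener(url, timeout=timeout) as response:
            return json.load(response)["info"]["version"]
    except Exception:
        # Offline, timeout, bad JSON, missing key: stay quiet and move on
        return None


def update_available(current, latest):
    """True only when a newer version is known to exist."""
    if latest is None:
        return False
    return parse_version(latest) > parse_version(current)
```

The CLI would call `update_available(installed_version, latest_pypi_version())` and print a hint only when it returns True.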

Rewrite the cli

The current self-rolled CLI is nice but with #40 it should eventually be replaced with click or something similar.

Add package to popular package managers

I sadly don't have the competence to do it myself at this moment in time, though there's documentation for someone to do it properly. This way it can be easily installed and kept up to date on macOS systems.

Error 500 because of YouTube

Just need a try-catch retry on this because YouTube occasionally returns a "500 internal server error", which crashes yark.
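The retry could be a small generic wrapper along these lines. This is a sketch, not Yark's actual code: `retry` and its parameters are made-up names, and which exception type a 500 surfaces as depends on the library raising it.

```python
import time


def retry(action, attempts=3, delay=0.0, retry_on=(Exception,)):
    """Call `action` until it succeeds or `attempts` runs out.

    Re-raises the last error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    last_error = None
    for attempt in range(attempts):
        try:
            return action()
        except retry_on as error:
            last_error = error
            # Pause between attempts, but not after the final one
            if attempt < attempts - 1:
                time.sleep(delay)
    raise last_error
```

Narrowing `retry_on` to the specific error type seen for server faults keeps genuine bugs from being retried silently.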

Broken searchbar in python

The searchbar in the viewer is throwing a Python error, so right now you have to navigate to /channel/[name] manually from the homepage.
