jappeace / cut-the-crap

Automated video editing for streamers

License: MIT License

Haskell 94.35% Nix 2.14% Makefile 1.14% Shell 0.59% Dockerfile 0.13% C 1.65%

cut-the-crap's Introduction


Cut the crap is an automatic video editing program for streamers. It can cut out uninteresting parts by detecting silences. It was inspired by jumpcutter, but it can get better-quality results by using an (optional) dedicated microphone track, which prevents quieter consonants from being cut, for example. Using ffmpeg more efficiently also makes it faster and less error-prone.

Youtube has different requirements for videos than Twitch has for streams: we want to cut out the boring parts. jumpcutter partly solved that problem, and this program builds on top of that idea. At the moment we use ffmpeg for silence detection, then we do some maths to figure out which segments are sounded, and those segments are combined into the output video.
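The "maths" step can be pictured as inverting the detected silences over the video's duration. What follows is a minimal Python sketch of that idea, not the project's Haskell code. The function name `sounded_segments`, the `margin` parameter, and the representation of intervals as `(start, end)` pairs in seconds are all assumptions made for illustration.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def sounded_segments(silences: List[Interval], duration: float,
                     margin: float = 0.0) -> List[Interval]:
    """Return the parts of [0, duration] that are not silent.

    `margin` is a hypothetical parameter: it shrinks every silence on both
    sides so that a little audio is kept around speech.
    """
    segments = []
    cursor = 0.0  # end of the last silence handled so far
    for start, end in sorted(silences):
        start, end = start + margin, end - margin
        if end <= start:
            continue  # silence is too short to cut once the margin is applied
        if start > cursor:
            segments.append((cursor, min(start, duration)))
        cursor = max(cursor, end)
    if cursor < duration:
        # Sound after the last silence, up to the end of the video.
        segments.append((cursor, duration))
    return segments
```

With no silences at all the function returns the whole video as a single segment, so an input without silence is simply passed through.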

In the future we will add support for a music track which will not be chopped up.

Install

From source

Install the nix package manager.

git clone https://github.com/jappeace/cut-the-crap
cd cut-the-crap
nix-build .
result/bin/cut-the-crap

Bundle build (statically linked, bundled with runtime deps)

From version 2.1.1 onwards, these nix bundles are attached to releases on the release page. They should work on any Linux distribution. Download the executable from the release page.

Under the hood we use nix-bundle for this. The bundles are so large because everything from libc to youtube-dl is packaged within.

Nix/Nixos

Run nix-env -iA nixos.haskellPackages.cut-the-crap or add it to systemPackages.
Then simply run cut-the-crap to display usage instructions.

This only works for nixpkgs versions that have cut-the-crap >= 1.4.2 or <= 1.3. There were some build issues with 1.4.0 and 1.4.1 (now fixed).

Usage notes

Up-to-date help is available in the program itself:

cut-the-crap

Run the program:

cut-the-crap listen https://www.youtube.com/watch?v=_PB6Hdi4R7M

It works with both Youtube and Twitch videos (VODs). The program simply passes the URL to youtube-dl.

We can also run it on a local file of course:

cut-the-crap listen somelocalfile.mkv

There is also work-in-progress subtitle generation:

cut-the-crap subtitles https://www.youtube.com/watch?v=_PB6Hdi4R7M

Noise gate

Make sure to record with a noise gate on your microphone. This will cut out background buzzing and allow you to use a more aggressive threshold on noise detection.

OBS tracks

Set up OBS so that you record the microphone and the desktop audio on separate tracks. In my own setup I have track 1 for all audio combined, track 2 for just the microphone and track 3 for desktop audio. Then I can use:

    cut-the-crap listen ./recordFromObs.mkv ./someOut.mkv --voiceTrack 2 --musicTrack 3

So we throw away track 1, we use track 2 for silence detection, and track 3 gets mixed in after cutting is complete. If you don't want music mixed back into the result, for example because you plan further editing, you can leave that argument out. I did this, for example, to mix the music of the original file back in later.

Test data

It may be a bit awkward to record yourself just for test data. To get some easy test data we can use youtube-dl and shorten the result with ffmpeg, for example:

youtube-dl "https://www.youtube.com/watch?v=kCpQ4aTzlis" && ffmpeg -i "Opening Ceremony & 'Languages all the way down' by Rob Rix - ZuriHac 2020-kCpQ4aTzlis.mkv" -t 00:20:00.00 -c copy input.mkv

Use case

I'm using this program to record my stream and upload it to my Youtube channel.

The concrete result is that your audience retention percentage should go up, since the videos will be shorter and more engaging. Sometimes on stream I have intro screens, for example, which get removed completely, and other times I'm simply thinking. Reducing video length by 30% is not uncommon in my case, which by itself should raise the retention percentage. You could even decide to edit further after that, which means you spend less time cutting out silences and more time making it look cool.

Feel free to use or modify this program however you like. Pull requests are appreciated.

Features

Track based silence detection

It is possible to specify one audio track as the speech track. This track will be used for silence detection only. The result is very precise silence detection.

Separate music track

Another track can be the background track, which won't be modified at all. In the end it just gets cut off at the length of the edited video.

This way we get good music and an interesting stream. Another idea is to remix an entirely different source of music into the video, so we can play copyrighted music on stream and Youtube-friendly music on Youtube.

Design

This project is mostly a wrapper around ffmpeg. We use Haskell for shell programming.

We first figure out what's going on with the video. For example, we do silence detection or speech recognition, maybe even motion detection. After the analyze phase we act in the edit phase, where we cut, for example. Finally we produce some result.

The shelly library was chosen to support shell programming. Originally we used turtle, but that library is much more complicated to use because it assumes you want to do stream programming, which led to several unexpected bugs. So we replaced it with shelly and noticeably reduced code complexity. Now it's truly a 'dumb' wrapper around ffmpeg.

Why not extend jumpcutter directly?

I wish to build out this idea more, to essentially make all streams look like human-edited Youtube videos. Although I'm familiar with Python, I (am or feel) more productive in Haskell, therefore I chose to integrate with, and eventually replace, jumpcutter. On stream we've determined that most of the functionality is basically ffmpeg. Haskell also opens up the ability to do direct native ffmpeg integration, where we use ffmpeg as a library instead of calling it as a CLI program.

One glaring limitation I've encountered with jumpcutter is that it can't handle larger video files (2 hours 30 minutes and up). Scipy throws an exception complaining that the wav is too big. Since this program doesn't use scipy, it doesn't have that issue.

It also appears like jumpcutter is unmaintained.

Alternatives

This idea is obviously not new, considering ffmpeg has first-class support for it. The alternatives are listed in no particular order:

Auto editor seems actively maintained and packed with features. Its target audience is different: whereas I wish to host this project on videocut.org and make it available to everyone, auto editor is meant to be a command line tool.


cut-the-crap's Issues

Output file with absolute path breaks program

Describe usage in a manual entry

Link up keyboard input to figure out whether interesting things are happening in the recording

Some streamers don't talk a lot but their interesting parts are indicated more by their keyboard input.

A potential way of getting this would be a key logger (which logs only the timestamps of presses, not the actual keys) running next to OBS; its output could then be fed into cut-the-crap.
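As a rough illustration of how such timestamps could be used, the Python sketch below merges keypress times into "activity" intervals that an editor could treat as interesting. Nothing like this exists in the project; the function name `activity_intervals` and the `max_gap` and `padding` parameters are hypothetical.

```python
from typing import Iterable, List, Tuple


def activity_intervals(presses: Iterable[float], max_gap: float = 2.0,
                       padding: float = 0.5) -> List[Tuple[float, float]]:
    """Group keypress timestamps (seconds) into padded activity intervals.

    Presses at most `max_gap` seconds apart belong to the same interval.
    `padding` extends each interval on both sides; keep it below
    max_gap / 2 so neighbouring intervals cannot overlap.
    """
    groups: List[List[float]] = []
    for t in sorted(presses):
        if groups and t - groups[-1][1] <= max_gap:
            groups[-1][1] = t  # extend the current burst of typing
        else:
            groups.append([t, t])  # start a new burst
    return [(max(0.0, start - padding), end + padding) for start, end in groups]
```

The resulting intervals could be combined with the sounded segments from silence detection, so that a stretch counts as interesting if either signal says so.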

Accept piped input

We should be able to accept input from stdin.

Filter out keyboard presses

If we can distinguish between human speech and keyboard presses, it shouldn't be hard to filter out segments that contain only typing.

Figure out what's going on with optical character recognition

If we used OCR on the video, and then used something like sumy to figure out what's going on, we could for example detect 'chapters' in the video.

On the fly editing

Rather than editing a video before watching it, it'd be nicer to integrate this project directly into a player.

Technically I'm not sure how to do this. Some possibilities:

Spit out chunks of video to feed into a player.
Directly hack the editing into the encoder/decoder.

Make chat part of the video

We can use vodus: https://github.com/tsoding/vodus

Fix uncaught exception on running help

Sentiment detection for figuring out if parts are interesting or not

We could use it with, for example, VADER sentiment, or detect it directly from the audio with things like pitch.

A suggestion was to first remove background noise (with something like NVIDIA's RTX Voice filter).

No silences in video crashes the program

Add install and usage docs

Nixos

Apply overlay
add to package list
simply run cut-the-crap

Ubuntu/debian

I'm thinking of making a ppa, doesn't exist yet. #10

Fix ubuntu release for new speech recognition changes

We just need to change the docker image to include PocketSphinx and make stack aware of that.

Get rid of optparse generic

I think it's too complex for the benefits, and it's also quite inflexible.

The final segment gets cut off

For the hammer video these silences are detected:

("[silencedetect @ 0xfc3400] silence_start: 1.88365","[silencedetect @ 0xfc3400] silence_end: 4.76173 | silence_duration: 2.87808")
("[silencedetect @ 0xfc3400] silence_start: 4.85402","[silencedetect @ 0xfc3400] silence_end: 5.50827 | silence_duration: 0.65425")

Those are correct, but we forget the final piece, i.e. the part from the end of the last silence to the end of the video.
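To make the missing piece concrete, here is a small Python sketch that parses silencedetect lines like the ones above and computes the trailing segment. It is an illustration only, not the project's code: the helper names `parse_silences` and `final_segment` are made up, and the total duration of 8.0 seconds in the usage note is an assumed value, since the issue does not state the real length of the video.

```python
import re
from typing import Iterable, List, Optional, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds

START = re.compile(r"silence_start:\s*(-?[\d.]+)")
END = re.compile(r"silence_end:\s*(-?[\d.]+)")


def parse_silences(lines: Iterable[str]) -> List[Interval]:
    """Pair each silence_start with the silence_end that follows it."""
    silences = []
    start = None
    for line in lines:
        match = START.search(line)
        if match:
            start = float(match.group(1))
            continue
        match = END.search(line)
        if match and start is not None:
            silences.append((start, float(match.group(1))))
            start = None
    return silences


def final_segment(silences: List[Interval], duration: float) -> Optional[Interval]:
    """Return the sounded tail after the last silence, or None if there is none."""
    last_end = max((end for _, end in silences), default=0.0)
    return (last_end, duration) if last_end < duration else None
```

Feeding in the four log lines from this issue gives the silences (1.88365, 4.76173) and (4.85402, 5.50827). With the assumed duration of 8.0, `final_segment` returns (5.50827, 8.0), which is the piece that currently gets dropped.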

Flip the logic around with a flag, silences are now interesting instead of audio!

Another streamer mentioned that his talking is probably the worst part of his stream. So instead he'd like to filter out all his speech and end up with the perfect audio!

Use anti sound to get rid of background music

If we were able to recognise whatever music is played, we could run the opposite sound against it and get rid of it entirely.
Loudness would be a bit tricky.

This would result in much better silence detection.

Using Twitch videos breaks downloading from URI

youtube-dl will not make a .mkv file for Twitch videos, whereas it does for Youtube videos.

A proposed solution is to check whether the output file is a .mkv, and if not, move it into a .mkv file.
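One way to read "move it into a .mkv file" is a stream-copy remux with ffmpeg, using the same `-c copy` option as the test data command earlier on this page. The Python sketch below only builds the command and does not run it. The function name `remux_command` is hypothetical, and choosing a remux over a plain rename is an assumption, not something the issue specifies.

```python
from pathlib import Path
from typing import List, Optional


def remux_command(downloaded: str) -> Optional[List[str]]:
    """Return an ffmpeg command that rewraps `downloaded` as .mkv.

    Returns None when the file already has the .mkv extension, in which
    case nothing needs to be done.
    """
    src = Path(downloaded)
    if src.suffix.lower() == ".mkv":
        return None
    dst = src.with_suffix(".mkv")
    # -c copy copies the streams without re-encoding them.
    return ["ffmpeg", "-i", str(src), "-c", "copy", str(dst)]
```

The returned list can be handed to `subprocess.run` as is. Whether every container youtube-dl produces for Twitch can be stream-copied into Matroska would still need checking.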

nix-build: cannot coerce null to a string

I'm getting an error when trying to build the program on macOS 12.6 as per the instructions
https://github.com/jappeace/cut-the-crap#from-source

~/Syncthing/git/cut-the-crap% nix-build . --show-trace                                               
error: cannot coerce null to a string

       at /nix/store/d4pz7qqdiq3747rkcvk72z4rmqpaj97c-nixos-pin-10.10.2020/pkgs/stdenv/generic/make-derivation.nix:192:34:

          191|         // (lib.optionalAttrs (!(attrs ? name) && attrs ? pname && attrs ? version)) {
          192|           name = "${attrs.pname}-${attrs.version}";
             |                                  ^
          193|         } // (lib.optionalAttrs (stdenv.hostPlatform != stdenv.buildPlatform && !dontAddHostSuffix && (attrs ? name || (attrs ? pname && attrs ? version)))) {

       … while evaluating the derivation attribute 'name'
(...)

Full log is here: nix_build_2022-10-24 10-03.txt
Please tell me what troubleshooting info I can provide.

Travis is failing

It complains about the wrong nixpkgs being used right now.

Produce file processing log

At one point I got a 20-minute video as output from a 4-hour source file, which is obviously wrong. I don't know what happened because I don't have any log files. I should probably always generate at least one log per processed file.

Videos get scrambled

The input.txt used for the final concat command seems to be in a wrong order:

[root@outsource:~]# cat /tmp/sharable-31799-03531b40ae858777/input.txt
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000008170.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000005684.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000000985.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000003083.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000006474.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000000905.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000003958.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000009934.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000009500.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000009699.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000011158.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000011883.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000011620.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000010914.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000010972.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000006961.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000005191.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000011083.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000011053.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000003030.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000000694.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000007942.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000000000.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000003255.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000003204.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000007176.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000011535.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000004523.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000010293.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000001888.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000004188.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000003500.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000000616.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000003305.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000008290.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000011482.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000004917.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000011764.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000011307.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000004069.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000008924.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000009823.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000000740.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000010518.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000003818.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000009099.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000001113.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000012038.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000010691.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000008253.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000008556.mkv'
file '/tmp/sharable-31799-03531b40ae858777/extract/adaptive-60084.mkv-0000002010.mkv'

I think this is generated by running ls on that folder, but that shouldn't be necessary, since we can generate those filenames from the silences listed in the detect list.
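A minimal Python sketch of that idea follows: build the concat list from the known segment numbers, in sorted order, instead of reading the directory. The filename pattern (source name, a dash, a 10-digit zero-padded number, then .mkv) is inferred from the listing above. The function name `concat_list` and the assumption that the number increases with position in the video are mine.

```python
from typing import Iterable, List


def concat_list(extract_dir: str, source_name: str,
                offsets: Iterable[int]) -> List[str]:
    """Return the lines of an ffmpeg concat input file in playback order.

    `offsets` are the per-segment numbers that appear in the file names;
    they are sorted numerically so the order never depends on the file system.
    """
    return [
        f"file '{extract_dir}/{source_name}-{offset:010d}.mkv'"
        for offset in sorted(offsets)
    ]
```

Because the numbers are zero-padded to a fixed width, sorting the existing file names as plain strings gives the same order, which would be a smaller fix if the directory listing is kept.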

Use the default mechanism from optparse-applicative rather than the lenses (non)

If we do this then the CLI frontend can display the default value as well which may help understand how it works. (would've helped me right now).

Link directly against ffmpeg

At the moment we're using ffmpeg as a CLI app.
This is incredibly inefficient because we need to run various ffmpeg commands in sequence. Linking directly against ffmpeg as a library would give us much greater flexibility and let us set up the entire process as a stream.

GUI frontend

I'm considering making a frontend for this project that makes it easier to use for regular people (not software engineers).

Probably with reflex (because I'm familiar with it).

Filter out stopping words

If we have speech recognition we could filter out dumb words like 'uhm' and the like.

Nix bundle release doesn't recognize the working directory

I added logging to the writing of the file. If a root path shows up as the workdir, it tells the user to try an absolute path. But it would be better if we somehow passed the workdir to the bundle instead.

Youtube upload

I think this was the original idea at some point. Turns out I don't need quantity.

Make ppa for ubuntu

Some viewers asked for a ppa for Ubuntu or Debian.

At the time I didn't want to do that because I was afraid it would be hard.
But it may be interesting to investigate nonetheless.

Use shellFor for the nix shell

I have some example code and docs, so it should be easy for anyone to pick this up.

Cache steps depending on settings

We can construct a nix style cache for the various steps in editing.
This makes it easier to tweak settings.
We for example shouldn't need to run silence detection again if we increase the detect margin.
Part of this issue is figuring out what can be cached.
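One possible shape for such a cache, sketched in Python: derive a key per step from the input file's digest and only the settings that step actually depends on. Everything here is hypothetical, including the function name `step_cache_key` and the setting names `silence_threshold` and `detect_margin` used in the usage note; deciding which settings are relevant to which step is exactly the open question of this issue.

```python
import hashlib
import json
from typing import Any, Dict, Iterable


def step_cache_key(step: str, input_digest: str, settings: Dict[str, Any],
                   relevant: Iterable[str]) -> str:
    """Hash a step name, the input digest and the settings that step uses.

    Settings not listed in `relevant` do not influence the key, so changing
    them leaves the cached result of this step valid.
    """
    subset = {name: settings[name] for name in sorted(relevant)}
    payload = json.dumps(
        {"step": step, "input": input_digest, "settings": subset},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

If the detection step is declared to depend only on `silence_threshold`, then changing `detect_margin` produces the same key for that step, so the earlier detection result can be reused while later steps are recomputed.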

Original tracks should be preserved even if a music track is specified

At the moment we (re) combine the voice track and music track at the final stage.
This assumption is correct if we want to publish the result immediately.

But for some videos I've wanted to do some additional editing. To do that I need access to the original voice and music tracks, so I can decide what to do with them.

Motion detection on the video

If motion is happening on screen, perhaps something interesting is going on as well?
A mechanism for detecting motion would be nice.

Installation instructions - applying the overlay

I'm a total noob at nix, installed it just for this package. After a lot of time I finally figured out I had to run nix-build to build the program, and it was successful. The program runs if I use its full path, but it doesn't detect ffmpeg since I couldn't follow the install instructions to set up the nix-env.

I have no idea how to apply the overlay. Maybe something changed about nix or how nix deals with haskell packages.

I've tried multiple things, including going to ./nix/overlay and running nix-build since there's a default.nix (which I assume is there to apply the overlay contained in haskellPackages.nix), but I get the error:
error: expression does not evaluate to a derivation (or a set or list of those)
Copying neither, either or both .nix in the overlay folder to ~/.config/nixpkgs either gives an error
error: attribute 'cut-the-crap' in selection path 'cut-the-crap' not found
or a problem with recursion
error: infinite recursion encountered, at undefined position

Am I doing it wrong or is there an actual error going on?
Can you please provide clearer installation instructions for beginners like me?

Replace model dir compile flag with cli argument

This allows us to get rid of package.yaml.template as well.

For now our release can't generate subtitles because of this.