
subtitles-rs's Introduction

Rust subtitle utilities


Are you looking for substudy? Try here. (substudy has been merged into the subtitles-rs project.)

This repository contains a number of related tools and libraries for manipulating subtitles. See the README.md files in the individual subdirectories for more details.

  • substudy: Learn foreign languages using audio and subtitles extracted from video files.
  • vobsub: A Rust library for parsing subtitles in sub/idx format.
  • vobsub2png: A command-line tool for converting sub/idx subtitles to PNGs with JSON metadata.
  • opus_tools: Utilities for parsing subtitle data from the OPUS project, for use as input to various language models.
  • common_failures: Useful Fail implementations and error-handling tools.
  • cli_test_dir: A simple integration testing harness for CLI tools.

The following subtitle-related projects can be found in other repositories:

  • aligner: This GPLed library by kaegi uses dynamic programming to re-align out-of-sync subtitles using another subtitle file with known-good timing.
  • subparse: This library by kaegi parses many common subtitle formats.

License

This code is distributed under the CC0 1.0 Universal public domain grant (plus fallback license), with the exception of some data in the fixtures directory, which contains a few individual frames of subtitle data used in tests. Note that none of the individual crates include that data.

Contributions

Your feedback and contributions are welcome! Please feel free to submit issues and pull requests using GitHub.

subtitles-rs's People

Contributors

arthaey, emk, kaegi, mattkanwisher, symphorien, yrashk


subtitles-rs's Issues

Option to include an extra line of dialog before or after

The automatic splitting seems pretty decent, but sometimes it chops a sentence awkwardly. It would be nice if I could ask substudy to include one line of dialog before or after the "target" line for context.

Ideally, as separate "columns" in the csv output, so I can style them differently in Anki. :)
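Here is a minimal sketch of the requested layout. It assumes each subtitle has already been reduced to a plain text line, and the helper `rows_with_context` is hypothetical, not part of substudy. It emits one CSV row per line with previous, target, and next columns. Fields are quoted in RFC 4180 style so that commas and quotes in dialogue survive the Anki import.

```rust
/// Quote one CSV field RFC 4180-style: wrap it in double quotes and
/// double any quotes inside it.
fn csv_field(s: &str) -> String {
    format!("\"{}\"", s.replace('"', "\"\""))
}

/// Hypothetical helper: build one CSV row per subtitle line with three
/// columns (previous line, target line, next line), leaving the context
/// columns empty at the start and end of the file.
pub fn rows_with_context(lines: &[&str]) -> Vec<String> {
    (0..lines.len())
        .map(|i| {
            let prev = if i > 0 { lines[i - 1] } else { "" };
            let next = lines.get(i + 1).copied().unwrap_or("");
            [prev, lines[i], next]
                .iter()
                .map(|s| csv_field(s))
                .collect::<Vec<_>>()
                .join(",")
        })
        .collect()
}
```

Putting the context in separate columns, rather than concatenating it into the target field, lets an Anki card template style each column independently.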

Cargo install substudy hangs on terminal

Using
cargo 0.10.0-nightly
multirust 0.8.0 (8654d1c07 2016-03-08 23:02:12 +0000)

This may be a cargo issue, but

cargo install substudy --verbose

gives

error: failed to compile `substudy v0.4.0`, intermediate artifacts can be found at `/Users/cd/target-install`
Caused by:
  unable to get packages from source

Caused by:
  failed to download package `cld2-sys v0.1.0` from https://crates.io/api/v1/crates/cld2-sys/0.1.0/download

Caused by:
  Timeout was reached

Your tool looks really cool by the way!!

Error: Cannot truncate time period Period { begin: 453.57, end: 457.84 } at 453.57

Hi! I'm getting this error using the export function. I get it with both the binary release and installing via cargo install.

$ ~/.cargo/bin/substudy export review umaru-1.mp4 en-1.srt jp-1.srt
Error: Cannot truncate time period Period { begin: 453.57, end: 457.84 } at 453.57
$ ~/.cargo/bin/substudy export csv umaru-1.mp4 en-1.srt jp-1.srt
Error: Cannot truncate time period Period { begin: 453.57, end: 457.84 } at 453.57

The sub files were originally .ass files that I converted to .srt files using this tool: https://subtitletools.com/convert-to-srt-online

This is my system: Linux macbookpro-arch 4.14.11-1-ARCH #1 SMP PREEMPT Wed Jan 3 07:02:42 UTC 2018 x86_64 GNU/Linux

If it's relevant, here is my FFmpeg version:

ffmpeg version 3.4.1 Copyright (c) 2000-2017 the FFmpeg developers
built with gcc 7.2.1 (GCC) 20171128
configuration: --prefix=/usr --disable-debug --disable-static --disable-stripping --enable-avisynth --enable-avresample --enable-fontconfig --enable-gmp --enable-gnutls --enable-gpl --enable-ladspa --enable-libass --enable-libbluray --enable-libfreetype --enable-libfribidi --enable-libgsm --enable-libiec61883 --enable-libmodplug --enable-libmp3lame --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libv4l2 --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxcb --enable-libxvid --enable-shared --enable-version3
libavutil      55. 78.100 / 55. 78.100
libavcodec     57.107.100 / 57.107.100
libavformat    57. 83.100 / 57. 83.100
libavdevice    57. 10.100 / 57. 10.100
libavfilter     6.107.100 /  6.107.100
libavresample   3.  7.  0 /  3.  7.  0
libswscale      4.  8.100 /  4.  8.100
libswresample   2.  9.100 /  2.  9.100
libpostproc    54.  7.100 / 54.  7.100

If it helps, here are (what I think are) the relevant sections of the .srt files:

106
00:07:30,520 --> 00:07:33,550
Ah, so cat videos are popular.

107
00:07:33,570 --> 00:07:37,840
Wins
1 Million Yen!

108
00:07:33,570 --> 00:07:37,840
The pet that ranks 1st
in number of views...

109
00:07:33,570 --> 00:07:37,840
Contest!

110
00:07:33,570 --> 00:07:37,840
Animal
Video

111
00:07:33,570 --> 00:07:37,840
Enter Now!

112
00:07:33,990 --> 00:07:37,850
Oh? Looks like there's a contest going
on for whoever can get the most views.

113
00:07:37,850 --> 00:07:39,890
The one million yen is mine!

There are several segments with the time period that it complains about.

It doesn't seem to be referring to any times in jp-1.srt, but here is that general area:

125
00:07:31,520 --> 00:07:33,180
猫動画が人気あるのか

126
00:07:34,960 --> 00:07:37,620
今 再生数を競う大会が開かれてるな

127
00:07:38,140 --> 00:07:39,860
100万はうまるのだよ

Fix: After manually merging the segments in en-1.srt that started at the same time, this issue went away and everything worked as expected.
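The reporter's workaround can be sketched in Rust as follows. The `Cue` type and the `merge_same_start` function are hypothetical stand-ins, not substudy's own types. The sketch only automates the manual step of folding cues with identical start times (such as cues 107 to 111 above) into a single cue. It is an assumption, based on this report alone, that this step is enough to avoid the error inside substudy.

```rust
/// Hypothetical cue type for illustration (times in seconds).
#[derive(Debug, Clone, PartialEq)]
pub struct Cue {
    pub begin: f64,
    pub end: f64,
    pub text: String,
}

/// Fold cues whose start times match (within `eps` seconds) into the first
/// cue of the group, joining their text with newlines and keeping the
/// latest end time. Assumes `cues` is sorted by `begin`.
pub fn merge_same_start(cues: &[Cue], eps: f64) -> Vec<Cue> {
    let mut merged: Vec<Cue> = Vec::new();
    for cue in cues {
        if let Some(last) = merged.last_mut() {
            if (cue.begin - last.begin).abs() <= eps {
                last.end = last.end.max(cue.end);
                last.text.push('\n');
                last.text.push_str(&cue.text);
                continue;
            }
        }
        merged.push(cue.clone());
    }
    merged
}
```

Cue 112 starts 0.42 s later, so it stays separate and still overlaps the merged cue. A fuller fix might also need to handle overlaps.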

Reduce padding on subtitle times

Great tool, thank you for your efforts.

I do wish for one feature in particular, though. It's almost a deal-breaker: the ability to adjust the padding of subtitle times through an argument when generating Anki cards with substudy.

It certainly seems to me (correct me if I'm wrong) that substudy pads the times automatically, meaning extra time is added at both the start and the end.

This might be a sensible default, as it's presumably better to have a little too much of what's said, than too little, but I have some pretty nicely timed subtitles, and it would be great to be able to prevent this padding from taking place, even if it's not the default.

I've tried looking through the source, but to be honest I'm not a great programmer. I'm integrating substudy into some scripts, but I've mostly been working in Python and POSIX sh for the last few years, not C-like languages, and I'm totally unfamiliar with Rust. A lot of the code isn't commented, so my eyes glaze over and my brain shuts down when I try to figure things out from the source myself.

Even if this isn't what's happening, perhaps a command line argument to reduce or increase padding would be good.

I'd try to implement it myself as I'm sure it's rather simple, but for the reasons I stated earlier, I'm pretty much at the mercy of your (or some other kind soul's) benevolence.
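Here is one way such an option could behave, sketched under some assumptions. The `Span` type and the `pad_spans` function are hypothetical, and substudy's actual padding rule is not shown on this page. Each span grows by `pad` seconds on each side but stops at the midpoint of the gap to its neighbour. Setting `pad = 0.0` therefore reproduces the subtitle timing exactly.

```rust
/// Hypothetical time span in seconds; not substudy's real `Period` type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub begin: f64,
    pub end: f64,
}

/// Widen each span by `pad` seconds on both sides, without crossing the
/// midpoint of the gap to a neighbouring span and without going below 0.0.
/// Spans are assumed to be sorted by `begin`. `pad = 0.0` leaves timing as-is.
pub fn pad_spans(spans: &[Span], pad: f64) -> Vec<Span> {
    spans
        .iter()
        .enumerate()
        .map(|(i, s)| {
            // Earliest allowed start: midpoint to the previous span (or 0.0),
            // but never later than the original start.
            let lower = if i > 0 { (spans[i - 1].end + s.begin) / 2.0 } else { 0.0 };
            let lower = lower.min(s.begin);
            // Latest allowed end: midpoint to the next span (or unbounded),
            // but never earlier than the original end.
            let upper = spans
                .get(i + 1)
                .map(|n| (s.end + n.begin) / 2.0)
                .unwrap_or(f64::INFINITY)
                .max(s.end);
            Span { begin: (s.begin - pad).max(lower), end: (s.end + pad).min(upper) }
        })
        .collect()
}
```

Exposing `pad` as a command-line value would keep a small default for loosely timed subtitles while allowing 0.0 for well-timed ones.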

Allow user to overwrite output dir

When testing a new video & subtitle file, I run substudy a couple times. I have to manually delete the output dir each time, or else substudy quits. It would be nice if it could prompt me to overwrite if I want to continue anyway. Or maybe a --force option, if you want to conform to Unix-y expectations and avoid interactive prompts.
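Here is a minimal sketch of a `--force` behaviour using only `std::fs`. The `prepare_output_dir` name and its `force` parameter are hypothetical. Without `force`, the function refuses to touch an existing directory, which matches what substudy does today according to this report. With `force`, it deletes the directory and recreates it.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: make sure `dir` exists and is empty.
/// If it already exists, fail unless `force` (a stand-in for a
/// `--force` flag) is set, in which case it is removed first.
pub fn prepare_output_dir(dir: &Path, force: bool) -> io::Result<()> {
    if dir.exists() {
        if !force {
            return Err(io::Error::new(
                io::ErrorKind::AlreadyExists,
                format!("{} already exists (use --force to overwrite)", dir.display()),
            ));
        }
        fs::remove_dir_all(dir)?;
    }
    fs::create_dir_all(dir)
}
```

Calling a check like this at startup, before any audio or image extraction, would also make substudy fail immediately when the directory already exists.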

[request] Downloading subs and video from YouTube

It would be nice to be able to give substudy a link to a YouTube video and the languages, and have it download the video with its subs and create an Anki deck out of it.

Maybe it should be created on top of substudy? What do you think?

Feature: test first N minutes

I'm trying substudy out on another file, and this one has timing issues that I thought I had fixed, but apparently didn't. So that means I have to try fixing the timing again, then hope that this time I got it right, then wait for substudy to process the entire video again, then re-import into Anki, before I can check if it's okay.

I wish there was a "test" feature of substudy, so that I could check the first 2 or 5 minutes of the video for timing issues, before "committing" to processing the entire video.
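Here is a sketch of the requested test mode, under some assumptions. A hypothetical `first_n_seconds` filter would be applied to the parsed cues before any extraction, so that only the opening minutes are processed. The `Cue` type is illustrative, not substudy's.

```rust
/// Hypothetical cue type (times in seconds).
#[derive(Debug, Clone, PartialEq)]
pub struct Cue {
    pub begin: f64,
    pub end: f64,
    pub text: String,
}

/// Keep only cues that start before `limit_secs`, trimming any cue that
/// runs past the limit so nothing beyond it gets extracted.
pub fn first_n_seconds(cues: &[Cue], limit_secs: f64) -> Vec<Cue> {
    cues.iter()
        .filter(|c| c.begin < limit_secs)
        .map(|c| Cue { end: c.end.min(limit_secs), ..c.clone() })
        .collect()
}
```

For a 2-minute check, the limit would be `2.0 * 60.0` seconds.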

MissingFieldError("codec_name")

It seems that if you use a format (mp4) other than MKV, you get the error MissingFieldError("codec_name"). However, a simple conversion to MKV with ffmpeg works. I wasn't sure if there is another way around this. I may try my hand at a PR for this, but I have never written Rust ;) Great program, by the way, much easier than subs2srs.

ffmpeg -i ~/Downloads/file.mp4 -vcodec copy -acodec copy ~/Downloads/out_file.mkv

Can't build latest version on Linux Mint

error: could not find native static library uchardet, perhaps an -L flag is missing?

Build failed, waiting for other jobs to finish...
error: failed to compile substudy v0.4.1-pre.1, intermediate artifacts can be found at /tmp/cargo-install.tvoajIvp9IEE

Caused by:
Could not compile uchardet-sys.

To learn more, run the command again with --verbose.

Graphical interface

It would be nice to have a graphical interface sometime in the future. I know there is no clear solution to this problem right now, so maybe just keep this issue open until a light, cross-platform solution appears.

ReasonML GUI "spike"

I want to see if ReasonML is a good choice for a GUI layer.

  • Build a "Hello, world" app.
  • Display video.
  • Convert old server to generate basic JSON needed for GUI.
  • Figure out how to open a JSON file using the GUI (still very primitive).
  • Port over CSS.
  • Implement automatic window layout and subtitle-only scrolling.
  • Implement "scroll to playing subtitle".

No Tag data when MKV created by ogmrip

Using the list tracks command produces an error when querying an MKV file created by ogmrip. I had to use ogmrip because of copy protection.

substudy list tracks CM6-0E-UT3.2_DES.mkv
MissingFieldError("tags")

Using the command mkvinfo, I can see the track information and extract the appropriate subtitle track.

mkvinfo CM6-0E-UT3.2_DES.mkv
| + A track
| + Track number: 4 (track ID for mkvmerge & mkvextract: 3)
| + Track UID: 914786270282254080
| + Track type: subtitles
| + Lacing flag: 0
| + Codec ID: S_TEXT/UTF8
| + A track
| + Track number: 5 (track ID for mkvmerge & mkvextract: 4)
| + Track UID: 9054752488569948265
| + Track type: subtitles
| + Default flag: 0
| + Lacing flag: 0
| + Codec ID: S_TEXT/UTF8
| + Language: ita

Any subsequent reference to tracks, however, fails on the tags metadata.

substudy export tracks CM6-0E-UT3.2_DES.mkv it.srt
MissingFieldError("tags")

I cannot get substudy to function

When I run substudy combine episode_01_01.es.srt episode_01_01.en.srt > episode_01_01.bilingual.srt, nothing happens.

If I drop the word substudy from the above, I get "Usage: combine file1 OP file2" as an error.

If I run substudy export csv episode_01_01.mkv episode_01_01.es.srt episode_01_01.en.srt then I get "MissingFieldError("streams")"

If I run that command without the ">" after mkv, then I get:

Invalid arguments.
Usage: substudy clean <subs>
       substudy combine <foreign-subs> <native-subs>
       substudy export csv <video> <foreign-subs> [<native-subs>]
       substudy export review <video> <foreign-subs> [<native-subs>]
       substudy export tracks <video> <foreign-subs>
       substudy list tracks <video>
       substudy --help
       substudy --version

https://forum.language-learners.org/viewtopic.php?f=19&t=6360&start=10 is where another person tried to help. I welcome your thoughts on where I am going wrong. I installed Linux Mint 18 just to use your program, so maybe I am missing some foundational download I need before using substudy.

Support video files with no language data

I have an .avi video file that has no language data. substudy list tracks myfile.avi outputs the following error: MissingFieldError("tags")

For what it's worth, here's the output from ffmpeg -i myfile.avi:

ffmpeg version 2.4.2 Copyright (c) 2000-2014 the FFmpeg developers
  built on Oct  8 2014 19:37:09 with llvm-gcc 4.2.1 (LLVM build 2336.11.00)
  configuration: --prefix=/Volumes/Ramdisk/sw --enable-gpl --enable-pthreads --enable-version3 --enable-libspeex --enable-libvpx --disable-decoder=libvpx --enable-libmp3lame --enable-libtheora --enable-libvorbis --enable-libx264 --enable-avfilter --enable-libopencore_amrwb --enable-libopencore_amrnb --enable-filters --enable-libgsm --enable-libvidstab --enable-libx265 --arch=x86_64 --enable-runtime-cpudetect
  libavutil      54.  7.100 / 54.  7.100
  libavcodec     56.  1.100 / 56.  1.100
  libavformat    56.  4.101 / 56.  4.101
  libavdevice    56.  0.100 / 56.  0.100
  libavfilter     5.  1.100 /  5.  1.100
  libswscale      3.  0.100 /  3.  0.100
  libswresample   1.  1.100 /  1.  1.100
  libpostproc    53.  0.100 / 53.  0.100
[mpeg4 @ 0x7f84b280fe00] Failed to parse extradata
Input #0, avi, from '/Users/arthaey/Downloads/myfile.avi':
  Metadata:
    encoder         : VirtualDubMod 1.5.10.2 (build 2540/release)
    title           : MyFile
  Duration: 01:11:42.00, start: 0.000000, bitrate: 1651 kb/s
    Stream #0:0: Video: mpeg4 (Advanced Simple Profile) (XVID / 0x44495658), yuv420p, 704x400 [SAR 1:1 DAR 44:25], 1317 kb/s, 25 fps, 25 tbr, 25 tbn, 25 tbc
    Stream #0:1: Audio: ac3 ([0] [0][0] / 0x2000), 48000 Hz, stereo, fltp, 192 kb/s
    Metadata:
      title           : MyFile
    Stream #0:2: Audio: mp3 (U[0][0][0] / 0x0055), 48000 Hz, stereo, s16p, 128 kb/s
    Metadata:
      title           : MyFile
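The field names in these errors (`tags`, `codec_name`, `streams`) match keys in ffprobe's JSON output. That suggests substudy deserializes this output and treats those keys as required, but this is an inference that the reports do not confirm. The sketch below shows the tolerant alternative using a hypothetical `StreamInfo` struct. It models optional keys as `Option` (with serde, a missing `Option` field deserializes to `None`) and falls back to "unknown" when a stream has no language tag.

```rust
use std::collections::HashMap;

/// Hypothetical view of one stream as reported by a probe tool such as
/// ffprobe. Keys a container may omit are `Option`s instead of required
/// fields, so an AVI or MP3 without `tags` does not abort the run.
#[derive(Debug, Default)]
pub struct StreamInfo {
    pub index: usize,
    pub codec_type: Option<String>,
    pub codec_name: Option<String>,
    pub tags: Option<HashMap<String, String>>,
}

impl StreamInfo {
    /// The `language` tag, or `None` if the stream has no tags at all
    /// or no language tag.
    pub fn language(&self) -> Option<&str> {
        self.tags.as_ref()?.get("language").map(String::as_str)
    }

    /// One line for a `list tracks`-style listing.
    pub fn describe(&self) -> String {
        format!(
            "#{} {} {} (lang: {})",
            self.index,
            self.codec_type.as_deref().unwrap_or("unknown"),
            self.codec_name.as_deref().unwrap_or("unknown"),
            self.language().unwrap_or("unknown"),
        )
    }
}
```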

Trouble with using mp3 files

I’m getting the following error with mp3 files on substudy:

$ substudy export csv Traicionera.mp3 Traicionera_subtitles.en.srt Traicionera_subtitles.es.srt
MissingFieldError("tags")

There might need to be some flags or “tags” to process the .mp3?

Check for output dir earlier

It takes a while before substudy quits with an error about the output directory already existing. It would be nice if it checked that first and exited immediately.

Show full error chains

We're only printing out the "top-level" error message and not the root cause. Fix in both the regular app and in substudy-backend.
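The `common_failures` crate listed above suggests that the code base uses the `failure` crate's `Fail` trait. The sketch below instead relies only on the standard library's `Error::source()`, walking the chain and printing each cause on its own line. The `Wrapped` type exists only to demonstrate the function and is hypothetical.

```rust
use std::error::Error;
use std::fmt;

/// Render an error followed by every underlying cause, one per line,
/// by repeatedly following `Error::source()`.
pub fn full_chain(err: &dyn Error) -> String {
    let mut out = err.to_string();
    let mut cur = err.source();
    while let Some(cause) = cur {
        out.push_str("\n  caused by: ");
        out.push_str(&cause.to_string());
        cur = cause.source();
    }
    out
}

/// Hypothetical two-level error used only to exercise `full_chain`.
#[derive(Debug)]
pub struct Wrapped {
    pub msg: String,
    pub inner: std::io::Error,
}

impl fmt::Display for Wrapped {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.msg)
    }
}

impl Error for Wrapped {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.inner)
    }
}
```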

Feature request: custom HTML template for export review

I'd like to be able to provide my own HTML template for the output of the export review command.

My personal interest in this is that I'd like to be able to use my own audio player to play the files rather than the built in audio element (for better sound playback control) and also potentially having a side-by-side layout rather than an interlinear one.

ExpectedError("Number", "-99")

I'm trying to export my mkv to csv using the given command substudy export csv episode_07_01.mkv \ episode_07_01.fr.srt episode_07_01.en.srt but I can't seem to get it working. I get ExpectedError("Number", "-99").

I'm having no problems using the combine command for the bilingual subtitles. I just can't get the export working for the life of me.

Downloaded but I cannot get it to work

Is there documentation about how to use subtitles-rs? I cannot find it. I downloaded the binary, but I cannot see any user documentation either online or within the tool. Please help.

stdweb & wasm-based GUI "spike"

It's time for (I hope) one last proof of concept for the GUI. To do this, we would need to implement:

  • koute/stdweb#62 Add video support to stdweb.
  • Ditto for yew.
  • Build a video player which shows time.
  • Load and display subtitles.
  • Figure out what to implement next.

Handle zero-duration subtitles

These are generated by the aeneas alignment tool. I'm not sure how to best handle them, but at least we shouldn't error out.
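Here is one possible policy, sketched with a hypothetical `Cue` type and `fix_zero_duration` helper. A zero-length cue is stretched to a minimum length, but never past the start of the next cue. Dropping such cues would be an equally valid choice.

```rust
/// Hypothetical cue type (times in seconds).
#[derive(Debug, Clone, PartialEq)]
pub struct Cue {
    pub begin: f64,
    pub end: f64,
    pub text: String,
}

/// Stretch zero- or negative-duration cues to `min_len` seconds, capped at
/// the next cue's start. Assumes `cues` is sorted by `begin`.
pub fn fix_zero_duration(cues: &mut [Cue], min_len: f64) {
    for i in 0..cues.len() {
        if cues[i].end <= cues[i].begin {
            let next_start = cues.get(i + 1).map(|n| n.begin).unwrap_or(f64::INFINITY);
            // If the next cue starts at the same instant, this cue stays
            // zero-length; a caller could then drop or merge it instead.
            cues[i].end = (cues[i].begin + min_len).min(next_start).max(cues[i].begin);
        }
    }
}
```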

Indicate progress when exporting

I ran substudy export review for the first time, and wondered if it was working because it showed no progress indicator, or even a "Now exporting" status message. An indicator other than my CPU spiking would be cool. :)
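Here is a minimal sketch of a progress line written to stderr, which leaves anything on stdout untouched. The names `progress_line` and `report_progress` are hypothetical; the idea is to call the latter once per exported subtitle.

```rust
use std::io::{self, Write};

/// Format a one-line progress message such as "Exporting 3/120 (2%)".
pub fn progress_line(done: usize, total: usize) -> String {
    let pct = if total == 0 { 100 } else { done * 100 / total };
    format!("Exporting {}/{} ({}%)", done, total, pct)
}

/// Redraw the current terminal line on stderr (via `\r`), ending with a
/// newline once the last item is done.
pub fn report_progress(done: usize, total: usize) -> io::Result<()> {
    let mut err = io::stderr();
    write!(err, "\r{}", progress_line(done, total))?;
    if done == total {
        writeln!(err)?;
    }
    err.flush()
}
```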

Split audio according to subtitle timing exactly for songs

The default algorithm for splitting up audio works well enough for TV shows, where splitting up a sentence isn't really the end of the world.

For my weaker languages, I had the idea to use substudy with songs instead of TV. But it's going to be really weird and annoying to split up lyrics that way, especially when they're already "bite-sized".

So here I am, asking for yet another option to be added to substudy. ;)

Cannot truncate time period Period

An odd error.

Cannot truncate time period Period { begin: 610.839, end: 611.874 } at 610.839

The first part of the file is:

1
00:00:06,120 --> 00:00:08,475
Parco nazionale Harwood
PENNSYLVANIA

2
00:00:08,519 --> 00:00:09,998
Forse dovremmo tornare indietro.

I've compared this to other files that worked, and it doesn't look any different. So, is it possible the problem is with the mkv file output by ogmrip?

Feature request: multi-lingual (unlimited languages, not limited to only two)

  • I'd love to be able to provide three or more input subtitles each in a different language.
  • I'd also love to have a "language agnostic" mode where no one language is designated as the "native" language and that no one language is designated as the "foreign" language. No language is given priority over another one in any way other than the order they are specified (formatting, etc.).
  • And I'd like to be able to optionally pair a media file with each of the input subtitles. (Meaning the possibility of not using a media file at all, only pairing the texts without audio at all, and also the possibility of using a media file for each language, so that audio can be listened to in any of the languages.)

I would like to use this in order to output an HTML file with a table of three or more columns of languages with media for each cell (all columns/languages, and all rows/segments). As a reference, this is what I'm after: http://emichaelsphoto.com/~aardbei/content/parallelponies/ (clicking and holding on a cell plays the associated media).

Another simple usage scenario for being able to provide more than two subtitle files is being able to provide Japanese, hiragana, and English in order to have all three interlaced in the current format of the export review command.
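Here is a sketch of N-language pairing under stated assumptions. The `Cue` type and the `align_tracks` function are hypothetical. Rows are defined by the first file's cues, and every track, including the first, contributes the text of all its cues that overlap each row in time. Using the first file for row boundaries is a simplification, so a fully language-agnostic version would need to merge boundaries from all tracks.

```rust
/// Hypothetical cue type (times in seconds).
#[derive(Debug, Clone)]
pub struct Cue {
    pub begin: f64,
    pub end: f64,
    pub text: String,
}

/// Length of the time interval shared by two cues (0.0 if disjoint).
fn overlap(a: &Cue, b: &Cue) -> f64 {
    (a.end.min(b.end) - a.begin.max(b.begin)).max(0.0)
}

/// One row per cue of the first track; column j holds the text of every
/// cue in `tracks[j]` that overlaps the row, joined with spaces.
pub fn align_tracks(tracks: &[Vec<Cue>]) -> Vec<Vec<String>> {
    if tracks.is_empty() {
        return Vec::new();
    }
    tracks[0]
        .iter()
        .map(|row| {
            tracks
                .iter()
                .map(|track| {
                    track
                        .iter()
                        .filter(|c| overlap(row, c) > 0.0)
                        .map(|c| c.text.as_str())
                        .collect::<Vec<_>>()
                        .join(" ")
                })
                .collect::<Vec<String>>()
        })
        .collect()
}
```

A cue that overlaps two rows appears in both, which an HTML table view might want to de-duplicate.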

Image not showing

This tool generated Image fields like this:

My front template has a placeholder like this:

{{Image}}

When reviewing, instead of seeing the image as expected, I see the HTML code for the image instead.

Is there a way to make Anki display an image whose HTML code is stored as a Note field? Is this a limitation of this tool or a limitation of Anki?

Error E0713 when building substudy

When I tried to build substudy:

git clone https://github.com/emk/substudy.git
cd substudy
cargo build

I got this error:

Compiling submodel v0.1.0 (/Users/me/substudy/submodel)
error[E0713]: borrow may still be in use when destructor runs
--> /Users/me/.cargo/registry/src/index.crates.io-6f17d22bba15001f/url-1.6.0/src/form_urlencoded.rs:261:40
|
259 | impl<'a> Target for ::UrlQuery<'a> {
| -- lifetime 'a defined here
260 | fn as_mut_string(&mut self) -> &mut String...
261 | fn finish(self) -> &'a mut ::Url { self.url }
| ^^^^^^^^ - here, drop of self needs exclusive access to *self.url, because the type UrlQuery<'_> implements the Drop trait
| |
| returning this value requires that *self.url is borrowed for 'a


For more information about this error, try rustc --explain E0713.
error: could not compile url (lib) due to previous error
warning: build failed, waiting for other jobs to finish...
warning: vobsub (lib) generated 26 warnings
warning: subtitle_ocr (lib) generated 2 warnings
warning: substudy (lib) generated 10 warnings (run cargo fix --lib -p substudy to apply 4 suggestions)

and when I tried to fix it as suggested: cargo fix --lib -p substudy

I got this:

error: the working directory of this package has uncommitted changes, and cargo fix can potentially perform destructive changes; if you'd like to suppress this error pass --allow-dirty, --allow-staged, or commit the changes to these files:

  • Cargo.lock (dirty)

What can I do? Thanks!

Can't compile substudy

I'm trying to update substudy and get it running again after a long time not using it. It used to work! On this same laptop, even, but before I'd upgraded to El Capitan.

I'm on a Mac (El Capitan 10.11.5). I ran multirust update stable and brew install cmake ffmpeg to make sure those dependencies were all up to date. No issues there.

But whether I run cargo install substudy or cargo build, I get the same error message:

/Users/arthaey/.multirust/toolchains/stable/cargo/registry/src/github.com-1ecc6299db9ec823/substudy-0.4.0/src/video.rs:13:20: 13:25 warning: unused import, #[warn(unused_imports)] on by default
/Users/arthaey/.multirust/toolchains/stable/cargo/registry/src/github.com-1ecc6299db9ec823/substudy-0.4.0/src/video.rs:13 use err::{err_str, Error, Result};
                                                                                                                                               ^~~~~
error: linking with `cc` failed: exit code: 1
note: "cc" "-m64" [snip verbose options]
note: Undefined symbols for architecture x86_64:
[snip stacktrace]
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

error: aborting due to previous error
error: failed to compile `substudy v0.4.0`, intermediate artifacts can be found at `/var/folders/j0/8d669w252655xfzfrfhnsgjc0000gn/T/cargo-install.qUrb7Re5e91W`

Running with --verbose didn't give any more useful output.

Unhelpful Error message

I'm trying to build a csv from a youtube video with transcripts. However I get this error message.

substudy export csv Clôture_de_TEDxParis_2015_Chris_Esquerre_TEDxParis.mp4 XKqlIjZ690_0_fr.srt 9XKqlIjZ690_0_en.srt
error at 140:1: expected one of \n, \r, [0-9]

I understand that the problem is that it expects a carriage return or newline, but in which file? And is the 140 the line number or the SRT block number? (Both look OK, by the way.)

9XKqlIjZ690_0_en.srt.txt
9XKqlIjZ690_0_fr.srt.txt
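Here is a sketch of the requested improvement using a hypothetical error type that attaches the file path to the position, in the common `path:line:column` form. Reading `140:1` as line and column is an assumption, because the original message does not say. The `offending_line` helper is also hypothetical and shows the input line so that the error is easier to act on.

```rust
use std::fmt;

/// Hypothetical parse error that records which subtitle file failed.
#[derive(Debug)]
pub struct SubtitleParseError {
    pub path: String,
    pub line: usize,
    pub column: usize,
    pub expected: String,
}

impl fmt::Display for SubtitleParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}:{}:{}: expected {}", self.path, self.line, self.column, self.expected)
    }
}

impl std::error::Error for SubtitleParseError {}

/// Return the 1-based `line` of `text`, or `None` if it does not exist.
pub fn offending_line(text: &str, line: usize) -> Option<&str> {
    text.lines().nth(line.checked_sub(1)?)
}
```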

subtitles error? error at 2:8: expected one of `,`, `[0-9]`

I tried to make flashcards from youtube video, but I get an error:

error at 2:8: expected one of `,`, `[0-9]`

Something is wrong with the subtitles?

Edit: Ok I fixed it. Apparently frame number was missing (after the time). I used different site to download the subtitles and it worked :)
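The error position and the reporter's note are consistent with a timestamp whose millisecond part (the digits after the comma in `00:07:33,570`) was missing or malformed. A hypothetical lenient parser could also accept `.` as the separator, or no fraction at all. The following is a sketch along those lines, not substudy's parser.

```rust
/// Parse an SRT-style timestamp into seconds. Strict SRT is `HH:MM:SS,mmm`;
/// this hypothetical lenient version also accepts `.` as the separator
/// (as in WebVTT) or no fractional part.
pub fn parse_timestamp(s: &str) -> Option<f64> {
    let s = s.trim();
    let (hms, frac) = match s.find(|c: char| c == ',' || c == '.') {
        Some(i) => (&s[..i], &s[i + 1..]),
        None => (s, ""),
    };
    let mut parts = hms.split(':');
    let h: u64 = parts.next()?.parse().ok()?;
    let m: u64 = parts.next()?.parse().ok()?;
    let sec: u64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() || m >= 60 || sec >= 60 {
        return None;
    }
    let millis: u64 = if frac.is_empty() {
        0
    } else if frac.len() <= 3 && frac.bytes().all(|b| b.is_ascii_digit()) {
        // Right-pad with zeros so "5" means 500 ms and "05" means 50 ms.
        format!("{:0<3}", frac).parse().ok()?
    } else {
        return None;
    };
    Some((h * 3600 + m * 60 + sec) as f64 + millis as f64 / 1000.0)
}
```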

Substudy: Adjust image size as command line option

The tool is great, but the images are a little small for me. I guess defaulting to small images saves disk space, but it would be great to be able to select the image size.
I'm not so familiar with Rust so I couldn't figure out from the code how image size is handled currently, but perhaps allowing the user to scale the input resolution by a factor of the original would be the simplest way.
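Here is a sketch of the suggested scale factor. It assumes the source frame size is known, for example the 704x400 reported by ffmpeg earlier on this page. The `scaled_size` helper is hypothetical. Each side is multiplied by `factor` and rounded, so the aspect ratio is preserved up to rounding.

```rust
/// Hypothetical helper: scale (width, height) by `factor`, rounding each
/// side to the nearest pixel and never returning a zero dimension.
pub fn scaled_size(width: u32, height: u32, factor: f64) -> (u32, u32) {
    let scale = |x: u32| ((x as f64 * factor).round() as u32).max(1);
    (scale(width), scale(height))
}
```

A factor of 1.0 would keep the original resolution, and larger values enlarge the extracted stills.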
