get-sauce's People

Contributors

anotherxday, gan-of-culture, stegosawr

get-sauce's Issues

Automatically create a folder for each download

Is there any way to automatically create a folder for each download?

Example: I have multiple links from a site and want my download folder to look like this:

Download link 1 -> folder "[SiteX] nameXXX", containing all files from that link.
Download link 2 -> folder "[SiteY] nameYYY", containing all files from that link.

And so on.

I tried a few of the options, but nothing seems to work.

Thanks.
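The issue does not say which options were tried, so the following is only a sketch of how a downloader could do this, not existing get-sauce behaviour: build a "[Site] Title" folder name, replace the characters Windows forbids in file names, and create the folder before saving that link's files. `folderName` and `ensureFolder` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// unsafeChars replaces characters that Windows does not allow in file names,
// so the folder name stays portable across operating systems.
var unsafeChars = strings.NewReplacer(
	`<`, "_", `>`, "_", `:`, "_", `"`, "_",
	`/`, "_", `\`, "_", `|`, "_", `?`, "_", `*`, "_",
)

// folderName builds a folder name of the form "[Site] Title".
func folderName(site, title string) string {
	name := fmt.Sprintf("[%s] %s", site, title)
	return strings.TrimSpace(unsafeChars.Replace(name))
}

// ensureFolder creates root/"[Site] Title" (and any missing parents) and
// returns its path, so every file from one link can be saved into it.
func ensureFolder(root, site, title string) (string, error) {
	dir := filepath.Join(root, folderName(site, title))
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return "", err
	}
	return dir, nil
}

func main() {
	dir, err := ensureFolder(os.TempDir(), "SiteX", "nameXXX")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("saving files to", dir)
}
```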

orzqwq: incorrect URL parsing, and details about rokuhentai

I get 9 GIF URLs when using get-sauce -j https://orzqwq.com/manga/badatgarbage-lilina-x-gonzalez-fire-emblem-the-binding-blade/, but the gallery actually has only one GIF.

The default view of rokuhentai is right to left.
To change the view, click the menu button in the top-left corner of the gallery.
Image URLs in a rokuhentai gallery follow a simple format:
https://rokuhentai.com/_images/pages/<galleryID>/<page>.jpg, where the <page> index starts from 0.
For example, "https://rokuhentai.com/_images/pages/voy6xm/0.jpg" is the first image of the gallery "https://rokuhentai.com/voy6xm".
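The URL scheme described above is regular enough to generate directly. A minimal sketch, assuming the page count is already known (it is not encoded in the URL) and that every page uses the .jpg extension as in the example; `rokuhentaiPages` is a hypothetical helper.

```go
package main

import "fmt"

// rokuhentaiPages returns image URLs of the form
// https://rokuhentai.com/_images/pages/<galleryID>/<page>.jpg with a
// zero-based <page>. The page count must be obtained elsewhere.
func rokuhentaiPages(galleryID string, pageCount int) []string {
	if pageCount < 0 {
		return nil
	}
	urls := make([]string, 0, pageCount)
	for page := 0; page < pageCount; page++ {
		urls = append(urls, fmt.Sprintf("https://rokuhentai.com/_images/pages/%s/%d.jpg", galleryID, page))
	}
	return urls
}

func main() {
	for _, u := range rokuhentaiPages("voy6xm", 3) {
		fmt.Println(u)
	}
}
```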

[BUG] Can't download from hitomi.la

Good morning. First of all, I would like to congratulate you, because this program is very useful. I'm trying to use it to download some manga in Italian, but I'm having difficulty with the following command: 'go run https://hitomi.la/reader/2464336.html#1-'. I think what I've written is correct; maybe the website has been updated and the software hasn't?

Terminal:

go run https://hitomi.la/reader/2464336.html#1-

Error:
2023/02/15 17:06:51 URL parse failed
exit status 1

I hope I was clear.
Good day

API creation

Is there a way to supply the title of an anime and have it return all the links from the available sources?

Add rokuhentai and orzqwq, bug report

Feature requests:

  1. rokuhentai: https://rokuhentai.com
  2. orzqwq: https://orzqwq.com

Bugs:

  1. I get "URL parse failed" when trying to download from hentai2read.com.
  2. I get "dial tcp: lookup cdn.pururin.io: no such host" when trying to download from pururin.to.
  3. I frequently get "fatal error: concurrent map writes" or "fatal error: concurrent map iteration and map write" when using the htdoujin extractors with many download workers (e.g. 100). The previous version does not have this issue.

[BUG] hitomi.la not downloading Artist CG and Game CG

Hi, I can now download almost all the albums on hitomi.la, but I noticed that some don't download. Below are my command and the error; please confirm whether it's a bug or a problem on my end.

get-sauce.exe https://hitomi.la/cg/ump-40--girls%E2%80%99-frontline--no-buta-koubi-italiano-1834710.html
or 
D:\hitomi.la ita\UMP 40 Girls Frontline No Buta Koubi>get-sauce.exe https://hitomi.la/cg/ump-40--girls%E2%80%99-frontline--no-buta-koubi-italiano-1834710.html

Output:

panic: runtime error: slice bounds out of range [-1:]

goroutine 1 [running]:
github.com/gan-of-culture/get-sauce/extractors/hitomi.extractData({0xc00029c000?, 0x5a?})
        /home/runner/work/get-sauce/get-sauce/extractors/hitomi/hitomi.go:145 +0x645
github.com/gan-of-culture/get-sauce/extractors/hitomi.(*extractor).Extract(0x1025ba0?, {0xc00005e060?, 0xc00005e068?})
        /home/runner/work/get-sauce/get-sauce/extractors/hitomi/hitomi.go:80 +0xfe
github.com/gan-of-culture/get-sauce/extractors.Extract({0xc00005e060, 0x5a})
        /home/runner/work/get-sauce/get-sauce/extractors/extractors.go:127 +0xb6
main.download({0xc00005e060?, 0x5a?})
        /home/runner/work/get-sauce/get-sauce/main.go:46 +0xb5
main.main()
        /home/runner/work/get-sauce/get-sauce/main.go:115 +0x15d

Good afternoon
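The panic at hitomi.go:145 is `slice bounds out of range [-1:]`. A common cause of this exact message (an assumption here, since the code at that line isn't shown) is slicing with the result of `strings.Index` without checking for -1, which happens when a page, for example an Artist CG or Game CG gallery, lacks the marker the extractor expects. A guarded version, using a hypothetical `afterMarker` helper:

```go
package main

import (
	"fmt"
	"strings"
)

// afterMarker returns the text following marker in s. Writing
// s[strings.Index(s, marker):] panics with "slice bounds out of range [-1:]"
// when the marker is missing; checking for -1 first turns that panic into
// an ordinary error the caller can report.
func afterMarker(s, marker string) (string, error) {
	i := strings.Index(s, marker)
	if i == -1 {
		return "", fmt.Errorf("marker %q not found", marker)
	}
	return s[i+len(marker):], nil
}

func main() {
	page := "title: Example | type: cg"
	if rest, err := afterMarker(page, "type: "); err == nil {
		fmt.Println(rest)
	}
	if _, err := afterMarker(page, "missing"); err != nil {
		fmt.Println("error:", err)
	}
}
```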

Hentaienvy not working

The CDNPrefixSrcURLPart of hentaienvy.com has changed to "js/main_v4.js".
The CDNPrefixSrcURLPart values of imhentai.xxx and hentairox.com have also changed, to "js/main_94xa9x.js" and "js/main_v8.js" respectively, but those sites seem to work fine.
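Since these script paths keep changing per site, one option (a hypothetical approach, not how get-sauce is known to resolve CDNPrefixSrcURLPart) is to locate the script in the page HTML by pattern instead of hard-coding one file name. The regular expression below is an assumption based only on the three paths reported above.

```go
package main

import (
	"fmt"
	"regexp"
)

// mainScriptRe matches paths such as "js/main_v4.js", "js/main_94xa9x.js"
// and "js/main_v8.js". Matching a pattern means a future rename of the
// script would not require a code change.
var mainScriptRe = regexp.MustCompile(`js/main_[A-Za-z0-9]+\.js`)

// findMainScript returns the first matching script path in the page HTML,
// or "" if none is present.
func findMainScript(html string) string {
	return mainScriptRe.FindString(html)
}

func main() {
	html := `<script src="/js/main_v4.js"></script>`
	fmt.Println(findMainScript(html)) // js/main_v4.js
}
```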

Renaming the project

This project started out as a small program that could scrape sites like rule34. Since it has grown quite a bit and found uses beyond scraping image boards, I decided that it is time to give this repo a usable name.

Currently the most likely name is get-sauce, and the change will happen with the next release this coming weekend. Older versions of this program will keep working as long as the site you are trying to scrape doesn't break. I chose to change the name mainly because the current name is quite long and awkward to type with its two dashes.

If you have any better recommendations please let me know.

Kind regards,
gan-of-culture

hentaihaven.xxx: URL parse failed

Describe the bug
Episodes following the scheme https://hentaihaven.xxx/watch/[series]/episode-[num]/ download without problems, but OVAs do not:

To Reproduce

$get-sauce https://hentaihaven.xxx/watch/fuyu-no-semi/ova-1/
2023/05/14 20:56:52 URL parse failed

Expected behavior
Downloads normally, like an episode.

Desktop (please complete the following information):

  • OS: Parrot OS (Debian 11-based)
  • Can the video or image be viewed on the website? Yes

Edit: The same applies to URLs like https://hentaihaven.xxx/watch/ai-no-kusabi/episode-1-1999/
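The report shows three URL shapes: `episode-<n>`, `ova-<n>`, and `episode-<n>-<year>`. A pattern that accepts all three could look like the sketch below; it is illustrative only, since get-sauce's actual URL matcher for this site is not shown, and `parseWatchURL` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// watchURLRe accepts:
//   /watch/<series>/episode-<n>/
//   /watch/<series>/ova-<n>/
//   /watch/<series>/episode-<n>-<year>/
var watchURLRe = regexp.MustCompile(`^https://hentaihaven\.xxx/watch/([\w-]+)/((?:episode|ova)-\d+(?:-\d+)?)/?$`)

// parseWatchURL returns the series slug and the episode slug of a watch URL.
func parseWatchURL(u string) (series, episode string, ok bool) {
	m := watchURLRe.FindStringSubmatch(u)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	for _, u := range []string{
		"https://hentaihaven.xxx/watch/fuyu-no-semi/ova-1/",
		"https://hentaihaven.xxx/watch/ai-no-kusabi/episode-1-1999/",
	} {
		if s, e, ok := parseWatchURL(u); ok {
			fmt.Printf("series=%s episode=%s\n", s, e)
		}
	}
}
```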

External Subtitle File Download

I'm not reporting a bug; I just wanted to know whether there is any way to get the subtitle file as well, or whether we need to extract it from the merged MP4 file ourselves. I tried the -c 0 option, but it downloads the subtitles and merges them into the video and audio file. I was wondering if there is a way for the program to also keep a copy of the subtitle file.

Bug: image size

Hi, I would like to ask about something, even though I don't know whether it counts as a bug, because the same thing appears on the project page with the instructions. The problem is the following (command taken directly from https://github.com/gan-of-culture/get-sauce):

get-sauce -i https://nhentai.net/g/364616/ https://nhentai.net/g/364591/
Output:
  Site: https://nhentai.net
  Title: Matsuri tte Ii na
  Type: image
  Streams: # All available qualities
      [0] -------------------
      Type: image
      Quality: unknown
      Parts: 31
      Size: 0B  
      # download with: get-sauce -s 0 ...

The size is 0. When running get-sauce -i https://nhentai.net/g/364616/ https://nhentai.net/g/364591/, it would be useful if -i also showed the Size, i.e. the sum of the sizes of the images. In my opinion it should be considered a bug, but maybe there are significant complications. Either way, it would make this excellent project more complete.
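One way to fill in the Size field (an assumption about how it could work, at the cost of one extra request per image) is to send an HTTP HEAD request for each part and sum the reported Content-Length values. In Go's net/http, `resp.ContentLength` is -1 when the server omits the header, so the sketch tracks whether the total is complete. `totalSize` and `sumSizes` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"net/http"
)

// sumSizes adds up known sizes; a negative entry means "unknown" and makes
// the total a lower bound (complete == false).
func sumSizes(sizes []int64) (total int64, complete bool) {
	complete = true
	for _, s := range sizes {
		if s < 0 {
			complete = false
			continue
		}
		total += s
	}
	return total, complete
}

// totalSize issues one HEAD request per URL and sums the Content-Length values.
func totalSize(client *http.Client, urls []string) (int64, bool, error) {
	sizes := make([]int64, 0, len(urls))
	for _, u := range urls {
		resp, err := client.Head(u)
		if err != nil {
			return 0, false, err
		}
		resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			sizes = append(sizes, -1) // treat as unknown
			continue
		}
		sizes = append(sizes, resp.ContentLength) // -1 if the header is absent
	}
	total, complete := sumSizes(sizes)
	return total, complete, nil
}

func main() {
	// Sizes as they might come back from HEAD requests; -1 means unknown.
	total, complete := sumSizes([]int64{120_000, 95_500, -1})
	fmt.Println(total, complete) // 215500 false
}
```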

HentaiStream.moe subtitle isn't downloaded

Hi, thanks for this tool. Does HentaiStream.moe support downloading subtitles? I tried downloading from there, but it only downloads the video. I checked with the browser's developer tools and found the subtitle URL in the HTML, so the video I downloaded does have a subtitle file.
Thanks

invalid character '<' looking for beginning of value

Describe the bug
When I try to download from nhentai.net, I get an "invalid character" error.

get-sauce -h "cookie: cf_clearance=xxxx; csrftoken=xxxx
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36" https://nhentai.net/g/435327/

Expected behavior
get-sauce downloads from nhentai.net

Desktop (please complete the following information):

  • OS: Android 11, Termux
  • Can the video or image be viewed on the website? Yes
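`invalid character '<' looking for beginning of value` is the message Go's encoding/json returns when the body starts with `<`, which usually means the server answered with an HTML page (for instance a Cloudflare challenge) instead of JSON. Checking for that before decoding gives the user a clearer message. A minimal sketch; `decodeJSON` and `errHTMLInsteadOfJSON` are hypothetical names, not get-sauce code.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// errHTMLInsteadOfJSON signals that an HTML page arrived where JSON was expected.
var errHTMLInsteadOfJSON = errors.New("got HTML instead of JSON; the request was probably blocked or challenged")

// decodeJSON inspects the first non-space byte before unmarshalling, so the
// caller sees errHTMLInsteadOfJSON rather than
// "invalid character '<' looking for beginning of value".
func decodeJSON(body []byte, v any) error {
	trimmed := bytes.TrimSpace(body)
	if len(trimmed) > 0 && trimmed[0] == '<' {
		return errHTMLInsteadOfJSON
	}
	return json.Unmarshal(trimmed, v)
}

func main() {
	var out map[string]any
	err := decodeJSON([]byte("<!DOCTYPE html><html></html>"), &out)
	fmt.Println(err)
}
```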

UncensoredHentai.xxx Video Subtitles

I can't get subtitles for the videos I'm downloading.

I tried downloading a video from UncensoredHentai.xxx, but whatever command I use, the video does not contain subtitles. I tried using '-k' and '-c', but every attempt ended with a video without subtitles.

[BUG] Can't download gallery with gallery ID smaller than a certain number

Describe the bug
When I try to download a gallery with a small gallery ID from hentaienvy,
I get: no CDN prefix was found. Check if CDNPrefixLevels have been parsed correctly.

To Reproduce
Steps to reproduce the behavior:

  1. Use get-sauce to download the following gallery: https://hentaienvy.com/gallery/273160/

Expected behavior
Images downloaded.

Desktop:

  • OS: Windows 11

Additional context
It seems that sites with the same design have the same issue.
For hentaienvy and hentaizap the cutoff ID is 273161; for imhentai and hentaiera it is 274825.

[ERROR] Please help

I built it and tried to run it in cmd, and I get this error.
When I try the executable, it doesn't stay open for more than a millisecond.
Can you please help me?

CMD error:

D:\Scripts\go-hentai-scraper-master\go-hentai-scraper-master>go run main.go
panic: runtime error: index out of range [0] with length 0

goroutine 1 [running]:
main.main()
D:/Scripts/go-hentai-scraper-master/go-hentai-scraper-master/main.go:95 +0xb9
exit status 2
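The program is a command-line tool that takes URLs as arguments, and `index out of range [0] with length 0` at main.go:95 is consistent with it indexing the argument list while no URL was given, which is also why a double-clicked executable closes immediately (an inference from the message, not from the source). Running it from cmd with a URL, e.g. `go run main.go <URL>`, avoids that. A guarded sketch with a hypothetical `firstURL` helper:

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// firstURL returns the first positional argument. Indexing args[0]
// directly panics with "index out of range [0] with length 0" when the
// program is started without any URL.
func firstURL(args []string) (string, error) {
	if len(args) == 0 {
		return "", errors.New("no URL given: pass at least one URL as an argument")
	}
	return args[0], nil
}

func main() {
	flag.Parse()
	u, err := firstURL(flag.Args())
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println("would download:", u)
}
```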

[BUG] nhentai invalid JSON for: xxxxx

Describe the bug
I'm receiving this error:
get-sauce https://nhentai.net/g/364616/
2022/05/18 14:46:54 invalid JSON for: 364616
I can't download from nhentai.

Expected behavior
Start downloading from nhentai.

Desktop (please complete the following information):

  • OS: Android 11 Termux, Xiaomi Redmi Note 8 Pro

[BUG] Subtitle doesn't show after 2 or 3 minute [hentaistream.moe]

Describe the bug
The subtitle doesn't show after the 2nd or 3rd minute of the video.

To Reproduce
Steps to reproduce the behavior:

  1. get-sauce -s 0 -c 0 https://hentaistream.moe/970/akane-wa-tsumare-somerareru-1/
  2. The video and subtitle are merged successfully, but
  3. when I play the result with mpv (or VLC, MX Player, MPC-HC),
  4. the subtitle never appears again after minute 2 or 3.
  5. I extracted the subtitle with a tool and saw that it only contains lines up to minute 2 or 3.

Expected behavior
The whole subtitle file is downloaded for the video.

Desktop (please complete the following information):

  • OS: Android 11 Termux, Xiaomi Redmi Note 8 Pro

Additional context
I think this is an ffmpeg problem: in hentaistream.moe's .vtt subtitles, there are blank lines between some lines of text,
and ffmpeg skips all the following lines when such a blank line appears.

Example:

00:02:30.000 --> xx:xx:xx.000
text text

00:02:57.000 --> xx:xx:xx.000
text text

text text

00:03:40.000 --> xx:xx:xx.000
text text

00:03:45.000 --> xx:xx:xx.000
text text

So at 00:02:57.000 there is a blank line between the text lines, and because of this ffmpeg skips the 00:02:57.000 cue and every cue after it, so the subtitle file only goes up to 00:02:30.000.

Edit:

Merging the video and the .vtt subtitle with mkvmerge, without converting the subtitle with ffmpeg, solves this problem! The problem does not occur with mkvmerge.

Could you also add mkvmerge as a dependency, and change the merge command (for hentaistream.moe only, of course) to use mkvmerge instead of ffmpeg?
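The diagnosis matches the WebVTT format: a blank line ends a cue, so a second block of text with no timing line in front of it is malformed. Besides switching to mkvmerge, another option (a hypothetical pre-processing step, not something get-sauce is known to do) is to drop blank lines inside cue text before handing the file to ffmpeg, while keeping blank lines that precede a timing line, a cue identifier, or a NOTE/STYLE/REGION block. `joinSplitCues` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// isTiming reports whether a line is a cue timing line ("start --> end").
func isTiming(line string) bool {
	return strings.Contains(line, "-->")
}

// startsBlock reports whether lines[i] begins a new WebVTT block: a timing
// line, a cue identifier (a line directly followed by a timing line), or a
// NOTE, STYLE or REGION block.
func startsBlock(lines []string, i int) bool {
	l := strings.TrimSpace(lines[i])
	if isTiming(l) || strings.HasPrefix(l, "NOTE") ||
		strings.HasPrefix(l, "STYLE") || strings.HasPrefix(l, "REGION") {
		return true
	}
	return i+1 < len(lines) && isTiming(lines[i+1])
}

// joinSplitCues removes blank lines that split one cue's text, so that text
// stays attached to its timing line. Real block separators are kept.
func joinSplitCues(vtt string) string {
	lines := strings.Split(strings.ReplaceAll(vtt, "\r\n", "\n"), "\n")
	out := make([]string, 0, len(lines))
	inCue := false
	for i, line := range lines {
		if isTiming(line) {
			inCue = true
		}
		if strings.TrimSpace(line) != "" || !inCue {
			out = append(out, line)
			continue
		}
		// Blank line inside a cue: look at the next non-blank line.
		j := i + 1
		for j < len(lines) && strings.TrimSpace(lines[j]) == "" {
			j++
		}
		if j == len(lines) || startsBlock(lines, j) {
			out = append(out, line) // real cue boundary
			inCue = false
		}
		// Otherwise the blank line splits one cue's text and is dropped.
	}
	return strings.Join(out, "\n")
}

func main() {
	in := "WEBVTT\n\n00:02:57.000 --> 00:03:00.000\nfirst line\n\nsecond line\n\n00:03:40.000 --> 00:03:44.000\nnext cue\n"
	fmt.Print(joinSplitCues(in))
}
```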

[Bug] `strconv.Atoi: parsing "": invalid syntax:`

Describe the bug
Hi,

I can't download from the specific URL https://oppai.stream/watch?e=Shin-Sei-Yariman-Gakuen-Enkou-Nikki-The-Animation-1 when I execute:

get-sauce -i https://oppai.stream/watch?e=Shin-Sei-Yariman-Gakuen-Enkou-Nikki-The-Animation-1
2023/12/16 23:02:35 strconv.Atoi: parsing "": invalid syntax: Shin-Sei-Yariman-Gakuen-Enkou-Nikki-The-Animation-1

I had no problem with other URLs from this site, but this URL gives a strconv.Atoi error.
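`strconv.Atoi: parsing "": invalid syntax` means an empty string reached `strconv.Atoi`, i.e. a number the extractor expected on this page was not found. Which value that is for oppai.stream isn't shown in the report, so the sketch below uses a hypothetical `parseRequiredInt` helper that reports the empty case explicitly instead of surfacing the raw Atoi error.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRequiredInt converts a scraped value to an int. An empty value gets
// its own message, because strconv.Atoi("") only says
// `parsing "": invalid syntax`.
func parseRequiredInt(field, raw string) (int, error) {
	s := strings.TrimSpace(raw)
	if s == "" {
		return 0, fmt.Errorf("%s not found on the page", field)
	}
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("invalid %s %q: %w", field, s, err)
	}
	return n, nil
}

func main() {
	if _, err := parseRequiredInt("episode number", ""); err != nil {
		fmt.Println(err) // episode number not found on the page
	}
}
```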

Question about Separated Caption and Timeout

Separated Caption

  • Is there any reason why it's always merged into an .mp4 container instead of the original container?
  • Is there an option to not merge them, so the video and subtitles stay separate? If yes, how do I do it? If not, could you add an option for that? Or is keeping video and subtitles separate not considered best practice, which is why you merge them?

Timeout

  • I'm using Pi-hole + Unbound, so the first request takes a long time and I often get a TLS handshake timeout. Is it possible to increase the timeout?
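Go's `http.DefaultTransport` uses a 10-second `TLSHandshakeTimeout`. Assuming get-sauce's client is built on net/http with similar defaults (not confirmed here), a longer limit would look like the sketch below; the 30-second values are illustrative. `Client.Timeout` is left at zero because it also covers reading the response body and would cut off long video downloads.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"time"
)

// defaultHandshakeTimeout is an illustrative value, three times Go's default.
const defaultHandshakeTimeout = 30 * time.Second

// newTransport returns a transport with a longer TLS handshake limit and a
// dial timeout that also covers slow first DNS lookups.
func newTransport(handshake time.Duration) *http.Transport {
	return &http.Transport{
		Proxy: http.ProxyFromEnvironment,
		DialContext: (&net.Dialer{
			Timeout:   30 * time.Second, // includes name resolution
			KeepAlive: 30 * time.Second,
		}).DialContext,
		TLSHandshakeTimeout: handshake,
	}
}

func main() {
	client := &http.Client{Transport: newTransport(defaultHandshakeTimeout)}
	fmt.Println(client.Transport.(*http.Transport).TLSHandshakeTimeout) // 30s
}
```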

[BUG] hentais.tube and hentaitv.fun not working, htdoujin extractor not fixed

  1. hentais.tube
    get-sauce https://www.hentais.tube/tvshows/furueru-kuchibiru/ returns
    open unknown.tube\tvshows\furueru-kuchibiru: The system cannot find the path specified.
  2. hentaitv.fun
    get-sauce "https://hentaitv.fun/episode/2170/tiny-evil-episode-1" returns
    data source parse failed: https://hentaitv.fun/episode/2170/tiny-evil-episode-1.
    When I try to play videos on the website, I get "404: file not found"
  3. htdoujin extractor
    I still frequently get "fatal error: concurrent map writes" or "fatal error: concurrent map iteration and map write" when using the htdoujin extractor with many download workers (e.g. 100). Version 1.2.31 works.
    Something seems to be wrong with downloader.go since version 1.2.32.

Orzqwq scraping issue

I'm having an issue downloading image sets in manga form. I'm not sure whether this is due to how the website is structured or an issue on my end.

I tried various methods to download
https://orzqwq.com/manga/camonome-bukatsu-chokugo-no-rikujoubu-joshi-ni-sasowarete/

get-sauce https://orzqwq.com/manga/camonome-bukatsu-chokugo-no-rikujoubu-joshi-ni-sasowarete/japanese/?style=list

Site: https://orzqwq.com/
Title:
Type: image
Stream:

 [0]  -------------------
 Type:            image
 Quality:         unknown
 Size:            0 B
 # download with: get-sauce -s 0 ...

get-sauce -p 1-61 https://orzqwq.com/manga/camonome-bukatsu-chokugo-no-rikujoubu-joshi-ni-sasowarete/

Site: https://orzqwq.com/
Title: [Camonome] Bukatsu Chokugo no Rikujoubu Joshi ni Sasowarete
Type: image
Stream:

 [0]  -------------------
 Type:            image
 Quality:         unknown
 Parts:           61
 Size:            0 B
 # download with: get-sauce -s 0 ...

It downloaded 61 blank images.

get-sauce https://orzqwq.com/manga/camonome-bukatsu-chokugo-no-rikujoubu-joshi-ni-sasowarete/japanese/p/1/

Site: https://orzqwq.com/
Title: [Camonome] Bukatsu Chokugo no Rikujoubu Joshi ni Sasowarete
Type: image
Stream:

 [0]  -------------------
 Type:            image
 Quality:         unknown
 Size:            0 B
 # download with: get-sauce -s 0 ...

get-sauce -p 1-61 https://orzqwq.com/manga/camonome-bukatsu-chokugo-no-rikujoubu-joshi-ni-sasowarete/japanese/p/1/
panic: runtime error: index out of range [0] with length 0

goroutine 1 [running]:
github.com/gan-of-culture/get-sauce/extractors/orzqwq.extractData({0xc00007e060, 0x60})
C:/Users/XXXXX/go/pkg/mod/github.com/gan-of-culture/[email protected]/extractors/orzqwq/orzqwq.go:92 +0x698
github.com/gan-of-culture/get-sauce/extractors/orzqwq.(*extractor).Extract(0x3c5ba0?, {0xc00007e060?, 0xc00007e068?})
C:/Users/XXXXX/go/pkg/mod/github.com/gan-of-culture/[email protected]/extractors/orzqwq/orzqwq.go:34 +0xc5
github.com/gan-of-culture/get-sauce/extractors.Extract({0xc00007e060, 0x60})
C:/Users/XXXXX/go/pkg/mod/github.com/gan-of-culture/[email protected]/extractors/extractors.go:127 +0xb6
main.download({0xc00007e060?, 0x60?})
C:/Users/XXXXX/go/pkg/mod/github.com/gan-of-culture/[email protected]/main.go:46 +0xb5
main.main()

I did notice that there was no direct image address unless I went to the images on the page/list view and copied the image address. When I used that URL with get-sauce, the image downloaded. The images are in this format:

https://img2.orzqwq.com/manga_62910f394c745/4fff8e79f1f2db9b2e349f1b1f770920/001.jpg
https://img2.orzqwq.com/manga_62910f394c745/4fff8e79f1f2db9b2e349f1b1f770920/002.jpg
https://img2.orzqwq.com/manga_62910f394c745/4fff8e79f1f2db9b2e349f1b1f770920/003.jpg
and so on

Is there a different command that I should be using for this website?
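The observed image URLs follow a regular pattern: a per-gallery directory followed by three-digit, 1-based page numbers. A sketch, assuming every page keeps the .jpg extension and three-digit padding, and that the directory and page count are read from the reader page first; `orzqwqPages` is a hypothetical helper.

```go
package main

import "fmt"

// orzqwqPages builds <base>/001.jpg, <base>/002.jpg, ... for count pages.
func orzqwqPages(base string, count int) []string {
	if count < 0 {
		return nil
	}
	urls := make([]string, 0, count)
	for p := 1; p <= count; p++ {
		urls = append(urls, fmt.Sprintf("%s/%03d.jpg", base, p))
	}
	return urls
}

func main() {
	base := "https://img2.orzqwq.com/manga_62910f394c745/4fff8e79f1f2db9b2e349f1b1f770920"
	for _, u := range orzqwqPages(base, 3) {
		fmt.Println(u)
	}
}
```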

[BUG] Referer Checking added for hentaistream.moe

Describe the bug
Referer checking has been added on our servers.

To Reproduce
Steps to reproduce the behavior:

  1. The Referer of HTTP requests to the CDN server must be set to at least "https://hentaistream.moe/";
  2. otherwise you will get a 403.

Additional context
Hello, I am one of the admins of Hentaistream and wanted to let you know that we now enforce HTTP Referer checking, because other websites have been using our service as a CDN.

Hopefully you can implement the fix quite easily.

Have a great new year 2022! ;)
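In Go's net/http, satisfying this check means setting the header on each request to the CDN. The CDN host is not named in the report, so the URL below is a placeholder, and `newCDNRequest` is a hypothetical helper rather than get-sauce's code.

```go
package main

import (
	"fmt"
	"net/http"
)

// newCDNRequest builds a GET request carrying the Referer that the
// Hentaistream CDN requires; without it the CDN answers 403.
func newCDNRequest(rawURL string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, rawURL, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Referer", "https://hentaistream.moe/")
	return req, nil
}

func main() {
	req, err := newCDNRequest("https://cdn.example.com/video.mp4")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Header.Get("Referer")) // https://hentaistream.moe/
}
```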
