
grab's Introduction

grab


Downloading the internet, one goroutine at a time!

$ go get github.com/cavaliergopher/grab/v3

Grab is a Go package for downloading files from the internet with the following rad features:

  • Monitor download progress concurrently
  • Auto-resume incomplete downloads
  • Guess filename from content header or URL path
  • Safely cancel downloads using context.Context
  • Validate downloads using checksums
  • Download batches of files concurrently
  • Apply rate limiters

Requires Go v1.7+

Example

The following example downloads a PDF copy of the free eBook, "An Introduction to Programming in Go" into the current working directory.

resp, err := grab.Get(".", "http://www.golang-book.com/public/pdf/gobook.pdf")
if err != nil {
	log.Fatal(err)
}

fmt.Println("Download saved to", resp.Filename)

The following, more complete example allows for more granular control and periodically prints the download progress until it is complete.

The second time you run the example, it will auto-resume the previous download and exit sooner.

package main

import (
	"fmt"
	"os"
	"time"

	"github.com/cavaliergopher/grab/v3"
)

func main() {
	// create client
	client := grab.NewClient()
	req, _ := grab.NewRequest(".", "http://www.golang-book.com/public/pdf/gobook.pdf")

	// start download
	fmt.Printf("Downloading %v...\n", req.URL())
	resp := client.Do(req)
	fmt.Printf("  %v\n", resp.HTTPResponse.Status)

	// start UI loop
	t := time.NewTicker(500 * time.Millisecond)
	defer t.Stop()

Loop:
	for {
		select {
		case <-t.C:
			fmt.Printf("  transferred %v / %v bytes (%.2f%%)\n",
				resp.BytesComplete(),
				resp.Size(),
				100*resp.Progress())

		case <-resp.Done:
			// download is complete
			break Loop
		}
	}

	// check for errors
	if err := resp.Err(); err != nil {
		fmt.Fprintf(os.Stderr, "Download failed: %v\n", err)
		os.Exit(1)
	}

	fmt.Printf("Download saved to ./%v \n", resp.Filename)

	// Output:
	// Downloading http://www.golang-book.com/public/pdf/gobook.pdf...
	//   200 OK
	//   transferred 42970 / 2893557 bytes (1.49%)
	//   transferred 1207474 / 2893557 bytes (41.73%)
	//   transferred 2758210 / 2893557 bytes (95.32%)
	// Download saved to ./gobook.pdf
}

Design trade-offs

The primary use case for Grab is concurrently downloading thousands of large files from remote file repositories where the remote files are immutable. Examples include operating system package repositories and ISO libraries.

Grab aims to provide robust, sane defaults. These are usually determined by the HTTP specifications, or by mimicking the behavior of common web clients like cURL, wget, and popular web browsers.

Grab aims to be stateless. The only state that exists is the remote files you wish to download and the local copy, which may be complete, partially complete, or not yet created. The advantage of this is that the local file system is not cluttered unnecessarily with additional state files (like a .crdownload file). The disadvantage of this approach is that grab must make assumptions about the local and remote state; specifically, that neither has been modified by another program.

If the local or remote file is modified outside of grab, and you download the file again with resuming enabled, the local file will likely become corrupted. In this case, you might consider making remote files immutable, or disabling resume.

Grab aims to enable best-in-class functionality for more complex features through extensible interfaces, rather than reimplementation. For example, you can provide your own Hash algorithm to compute file checksums, or your own rate limiter implementation (with all the associated trade-offs) to rate limit downloads.
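
For instance, the limiter from golang.org/x/time/rate plugs straight in. A minimal sketch, assuming Request exposes a RateLimiter field that accepts any value with a WaitN(ctx, n) method:

package main

import (
	"log"

	"github.com/cavaliergopher/grab/v3"
	"golang.org/x/time/rate"
)

func main() {
	req, err := grab.NewRequest(".", "http://www.golang-book.com/public/pdf/gobook.pdf")
	if err != nil {
		log.Fatal(err)
	}
	// cap the transfer at roughly 1 MiB/s with bursts up to 32 KiB
	req.RateLimiter = rate.NewLimiter(rate.Limit(1<<20), 32<<10)

	resp := grab.DefaultClient.Do(req)
	if err := resp.Err(); err != nil {
		log.Fatal(err)
	}
	log.Println("Download saved to", resp.Filename)
}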

grab's People

Contributors

bt, cavaliercoder, nathan13888, oliverpool, riyaz-ali, thomasf, twooster, virb3, wadtech, wph95


grab's Issues

No error in Response (in async) when checking with a bad checksum

Hello,

When using client.DoAsync with the checksum functionality, if the checksum does not match, the error is not available in resp.Error (the value is nil). Because I set RemoveOnError on the request, I have to check whether the file still exists on disk in order to detect this sort of error.

Regards

Make buffer size configurable

Hey, I'm downloading to an external hard drive, and with the default buffer of 4096 bytes I'm getting really poor performance (<1 MB/s). I manually increased the buffer size to 4096*1024 bytes and now I'm getting 6.2 MB/s (which is the maximum my internet connection offers).
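
For reference, a minimal sketch of raising the buffer size via the Client.BufferSize field (the same knob mentioned in the HTTPS rate-limiting issue below); the destination path and URL are placeholders:

package main

import (
	"log"

	"github.com/cavaliercoder/grab"
)

func main() {
	client := grab.NewClient()
	client.BufferSize = 4096 * 1024 // 4 MiB copy buffer instead of the 4 KiB default

	req, err := grab.NewRequest("/mnt/external/", "http://example.com/big.iso") // placeholder paths
	if err != nil {
		log.Fatal(err)
	}
	resp := client.Do(req)
	if err := resp.Err(); err != nil {
		log.Fatal(err)
	}
	log.Println("saved to", resp.Filename)
}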

TestResponseProgress fails in Go v1.4.3

--- FAIL: TestResponseProgress (0.74s)
    response_test.go:31: Transfer should not have started
    response_test.go:35: Transfer should not have started yet but progress is 1

return grab.GetAsync implementation

Could you please bring back the grab.GetAsync implementation? It was lost somewhere in previous commits. Or please advise an alternative to this function.

need callback for each internal request

How many HTTP requests does grab make internally? If the file already exists, is there just one HEAD request, or are there others?
I need to read a header ("auth") from the HEAD request and set a new one on the next request.

Are there any APIs to achieve this?

how to print the download progress with DoBatch

The lib works great for my use case. However, I'm stuck figuring out a way to display progress using DoBatch. Could anyone show me an example? I've tried the following, but progress is only shown for the first file being downloaded, not for the rest of the files.

package main

import (
	"fmt"
	"os"
	"sync"
	"time"

	"github.com/cavaliercoder/grab"
)

func main() {
	getFiles(os.Args[1:])
}

func getFiles(urls []string) {
	// create a request and a buffered response channel
	reqch := make(chan *grab.Request)
	respch := make(chan *grab.Response, 10)

	client := grab.NewClient()
	wg := sync.WaitGroup{}
	for i := 0; i < len(urls); i++ {
		wg.Add(1)
		go func() {
			client.DoChannel(reqch, respch)
			wg.Done()
		}()
	}

	go func() {
		// send requests
		for _, url := range urls {
			req, err := grab.NewRequest(".", url)
			if err != nil {
				panic(err)
			}
			reqch <- req
		}
		close(reqch)

		// start UI loop
		t := time.NewTicker(500 * time.Millisecond)
		defer t.Stop()

		resp := <-respch

	Loop:
		for {
			select {
			case <-t.C:
				fmt.Printf("  transferred %v / %v bytes (%.2f%%)\n",
					resp.BytesComplete(),
					resp.Size,
					100*resp.Progress())

			case <-resp.Done:
				// download is complete
				break Loop
			}
		}

		// wait for workers to finish
		wg.Wait()
		close(respch)
	}()

	// check each response
	for resp := range respch {
		// block until complete
		if err := resp.Err(); err != nil {
			panic(err)
		}

		fmt.Printf("Downloaded %s to %s\n", resp.Request.URL(), resp.Filename)
	}
}

Thanks!
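
For what it's worth, the snippet above pulls only a single response off respch (resp := <-respch) before entering the UI loop, so only the first download is ever reported. A hedged sketch of one alternative, collecting every response as it arrives and polling all of them on each tick; it is written against the same v2-era API the snippet uses (Size as a field), with GetBatch standing in for the channel plumbing:

package main

import (
	"fmt"
	"os"
	"time"

	"github.com/cavaliercoder/grab"
)

// monitorBatch collects responses as they arrive and prints progress for
// every in-flight download on each tick, until `total` downloads finish.
func monitorBatch(respch <-chan *grab.Response, total int) {
	t := time.NewTicker(500 * time.Millisecond)
	defer t.Stop()

	responses := make([]*grab.Response, 0, total)
	completed := 0
	for completed < total {
		select {
		case resp, ok := <-respch:
			if !ok {
				respch = nil // channel closed; keep ticking until all complete
				continue
			}
			if resp != nil {
				responses = append(responses, resp)
			}
		case <-t.C:
			completed = 0
			for _, resp := range responses {
				if resp.IsComplete() {
					completed++
					continue
				}
				fmt.Printf("  %s: %v / %v bytes (%.2f%%)\n",
					resp.Filename,
					resp.BytesComplete(),
					resp.Size,
					100*resp.Progress())
			}
		}
	}
}

func main() {
	urls := os.Args[1:]
	respch, err := grab.GetBatch(3, ".", urls...)
	if err != nil {
		panic(err)
	}
	monitorBatch(respch, len(urls))
}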

rateLimit inaccurate when downloading files over https

I set grab.Client.BufferSize so that len(c.b) is 32 KB. Rate limiting works as expected when downloading files over HTTP, but not over HTTPS.

When downloading over HTTPS, the nr returned from each read is always 16 KB; from what I can find, this comes from Go's TLS implementation (somewhere in the deeper tls packages used when making HTTP requests over TLS).

But waitN waits long enough for 32 KB, not for 16 KB. For example, if I want grab to download three 32 KB buffers per second, waitN sleeps about 330 milliseconds per read, yet only three 16 KB reads are actually downloaded in that time.


Please add os.O_TRUNC to writeFlags

...otherwise, when the target file already exists, downloading will fail with ErrBadLength because the existing file may have a different length.

Downloading dynamic content does not work

Hi,

when downloading dynamic content with Request.SkipExisting = false and Request.NoResume = true, the local file does not get overwritten if the ContentLength of the remote file does not equal the local file size.
It can be fixed in two ways.

  1. Add proper documentation to Request.SkipExisting stating that if you download a dynamically generated file, you should also set Request.NoResume to false
  2. Add a new variable to Request that will set Request.SkipExisting to false and Request.NoResume to true

Once the discussion is done, I'll prepare a pull request.

Regards
Alex

Downloading Text file gives an issue while in resume mode.

I am trying to download a text file from a URL.

Grab downloads it perfectly the first time.

But if I make some small change to the source text file and try to download it again, Grab appends that change to the end of the local file, and the downloaded text file is then of no use.

Can you please help in this regard?

Move test files to tmp/shm

Current test files generate garbage files that dangle on early termination and break subsequent tests.

Corrupted contents when downloading on top of wrong file

Grab looked like a good fit for a project I'm working on so I gave it a spin. I found that it downloaded a file perfectly and when asked to download the same file again managed to avoid downloading all the bytes again, which was just what I was looking for.

I then overwrote the downloaded file with completely different contents and then downloaded again using grab. The message:

  206 Partial Content

was emitted and the download was apparently successful. The downloaded file even had the same number of bytes as the original, but unfortunately the contents were corrupted.

Fortunately, this problem is easily reproduced using the example program in the README:

$ go run main.go
Downloading http://www.golang-book.com/public/pdf/gobook.pdf...
  200 OK
Download saved to ./gobook.pdf
$ mv gobook.pdf gobook.pdf.good
$ cp main.go gobook.pdf
$ go run main.go
Downloading http://www.golang-book.com/public/pdf/gobook.pdf...
  206 Partial Content
Download saved to ./gobook.pdf
$ diff gobook.pdf gobook.pdf.good
Binary files gobook.pdf and gobook.pdf.good differ
$ ls -l
total 11320
-rw-r--r--  1 gnormington  staff  2893557  1 Jan  1970 gobook.pdf
-rw-r--r--  1 gnormington  staff  2893557  1 Jan  1970 gobook.pdf.good
-rw-r--r--  1 gnormington  staff     1139  3 Nov 11:10 main.go

The environment is go version go1.9.2 darwin/amd64 on macOS 10.13.1.

In case the README changes, the contents of main.go above are:

package main

import (
	"fmt"
	"os"
	"time"

	"github.com/cavaliercoder/grab"
)

func main() {
	// create client
	client := grab.NewClient()
	req, _ := grab.NewRequest(".", "http://www.golang-book.com/public/pdf/gobook.pdf")

	// start download
	fmt.Printf("Downloading %v...\n", req.URL())
	resp := client.Do(req)
	fmt.Printf("  %v\n", resp.HTTPResponse.Status)

	// start UI loop
	t := time.NewTicker(500 * time.Millisecond)
	defer t.Stop()

Loop:
	for {
		select {
		case <-t.C:
			fmt.Printf("  transferred %v / %v bytes (%.2f%%)\n",
				resp.BytesComplete(),
				resp.Size,
				100*resp.Progress())

		case <-resp.Done:
			// download is complete
			break Loop
		}
	}

	// check for errors
	if err := resp.Err(); err != nil {
		fmt.Fprintf(os.Stderr, "Download failed: %v\n", err)
		os.Exit(1)
	}

	fmt.Printf("Download saved to ./%v \n", resp.Filename)

	// Output:
	// Downloading http://www.golang-book.com/public/pdf/gobook.pdf...
	//   200 OK
	//   transferred 42970 / 2893557 bytes (1.49%)
	//   transferred 1207474 / 2893557 bytes (41.73%)
	//   transferred 2758210 / 2893557 bytes (95.32%)
	// Download saved to ./gobook.pdf
}

Hook for AfterCopy

At the moment there is a hook for BeforeCopy. It would be great if there were one for AfterCopy as well.
Use case:

  • Housekeeping once copying is completed.

Response from github is 403 when downloading a release file.

Not able to download release artifact from GitHub.

package main

import (
	"fmt"
	"log"

	"github.com/cavaliercoder/grab"
)

func main() {
	client := grab.NewClient()
	req, err := grab.NewRequest("", "https://github.com/minishift/minishift-centos-iso/releases/download/v1.12.0/minishift-centos7.iso")
	if err != nil {
		log.Fatal(err)
	}
	resp := client.Do(req)
	fmt.Printf("Response is: %v\n", resp.HTTPResponse.Status)
}

This is unexpected.

==== Output ====
$ go run test.go
Response is: 403 Forbidden

Requests following a 302 response will result in ErrBadLength

Currently, if I try to download a file that gives a 302 response, the Content-Length retrieved will be incorrect and cause the download to fail.

Edit:
I made some changes to get my use case working; the failing part is here:

	if size < c.fi.Size() {
		fmt.Printf("Expected %v, Got %v\n", size, c.fi.Size())
		// return false, ErrBadLength
	}

Expected 2023597, Got 2023599

Call chtimes to set last modified

Grab does most of the sane things by default, except setting the local file's last modified time to the remote file's modified time (aka --remote-time in curl).
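
A post-download step along these lines seems straightforward; a minimal sketch using os.Chtimes and the response's Last-Modified header (setRemoteTime is a hypothetical helper):

package main

import (
	"log"
	"net/http"
	"os"

	"github.com/cavaliercoder/grab"
)

// setRemoteTime applies the server's Last-Modified header to the local
// file, similar to curl's --remote-time.
func setRemoteTime(resp *grab.Response) error {
	lm := resp.HTTPResponse.Header.Get("Last-Modified")
	if lm == "" {
		return nil // server did not advertise a modification time
	}
	t, err := http.ParseTime(lm)
	if err != nil {
		return err
	}
	// set both access and modification times to the remote mtime
	return os.Chtimes(resp.Filename, t, t)
}

func main() {
	resp, err := grab.Get(".", "http://www.golang-book.com/public/pdf/gobook.pdf")
	if err != nil {
		log.Fatal(err)
	}
	if err := setRemoteTime(resp); err != nil {
		log.Fatal(err)
	}
}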

Grab hangs when server sends unexpected TCP RST packet

I'm trying to download larger files using grab. Sometimes it works as expected, but sometimes it hangs. After debugging, it seems the server sends an unexpected TCP RST packet while downloading, causing grab to hang. Is it possible to catch the TCP RST packet inside grab.(*transfer).copy?

I know that setting a timeout on the http client would solve the hanging issue, but that's not desirable, since we don't know how long a download will take. Also, if the server sends a TCP RST, there's no point waiting for the timeout; it would be nice if grab failed fast for us.

Download failed: net/http: request canceled (Client.Timeout exceeded while reading body)

I'm able to simulate the server sending unexpected TCP RST packets, so I'm happy to lend a hand if debugging is needed.

Here's the stack trace from when grab hangs. For ease of reading, newlines are inserted between frames.

goroutine 22 [IO wait, 6 minutes]:
internal/poll.runtime_pollWait
/usr/lib/go-1.10/src/runtime/netpoll.go:173 +0x57

internal/poll.(*pollDesc).wait
/usr/lib/go-1.10/src/internal/poll/fd_poll_runtime.go:85 +0x9b

internal/poll.(*pollDesc).waitRead
/usr/lib/go-1.10/src/internal/poll/fd_poll_runtime.go:90 +0x3d

internal/poll.(*FD).Read
/usr/lib/go-1.10/src/internal/poll/fd_unix.go:157 +0x17d

net.(*netFD).Read
/usr/lib/go-1.10/src/net/fd_unix.go:202 +0x4f

net.(*conn).Read
/usr/lib/go-1.10/src/net/net.go:176 +0x6a

crypto/tls.(*block).readFromUntil
/usr/lib/go-1.10/src/crypto/tls/conn.go:493 +0x96

crypto/tls.(*Conn).readRecord
/usr/lib/go-1.10/src/crypto/tls/conn.go:640 +0x1fa

crypto/tls.(*Conn).Read
/usr/lib/go-1.10/src/crypto/tls/conn.go:1156 +0x100

net/http.(*persistConn).Read
/usr/lib/go-1.10/src/net/http/transport.go:1453 +0x136

bufio.(*Reader).Read
/usr/lib/go-1.10/src/bufio/bufio.go:202 +0x12c

io.(*LimitedReader).Read
/usr/lib/go-1.10/src/io/io.go:446 +0x63

net/http.(*body).readLocked
/usr/lib/go-1.10/src/net/http/transfer.go:778 +0x61

net/http.(*body).Read
/usr/lib/go-1.10/src/net/http/transfer.go:770 +0xdd

net/http.(*bodyEOFSignal).Read
/usr/lib/go-1.10/src/net/http/transport.go:2187 +0xdc

github.com/cavaliercoder/grab.(*transfer).copy
/home/bat/go/src/github.com/cavaliercoder/grab/transfer.go:48 +0xf0

github.com/cavaliercoder/grab.(*Client).copyFile
/home/bat/go/src/github.com/cavaliercoder/grab/client.go:457 +0x93

github.com/cavaliercoder/grab.(*Client).(github.com/cavaliercoder/grab.copyFile)-fm
/home/bat/go/src/github.com/cavaliercoder/grab/client.go:88 +0x34

github.com/cavaliercoder/grab.(*Client).run
/home/bat/go/src/github.com/cavaliercoder/grab/client.go:175 +0x6e

created by github.com/cavaliercoder/grab.(*Client).Do
/home/bat/go/src/github.com/cavaliercoder/grab/client.go:88 +0x229

DeleteOnError does not seem to work (at least on Windows)

package main

import (
	"crypto/sha256"
	"encoding/hex"

	"github.com/cavaliercoder/grab"
)

func downloadOpenjdk() {
	// create download request
	req, err := grab.NewRequest("", "http://example.com/example.zip")
	if err != nil {
		panic(err)
	}

	// set request checksum; the final argument requests deletion on error
	sum, err := hex.DecodeString("33daf4c03f86120fdfdc66bddf6bfff4661c7ca11c5da473e537f4d69b470e57")
	if err != nil {
		panic(err)
	}
	req.SetChecksum(sha256.New(), sum, true)

	// download and validate file
	resp := grab.DefaultClient.Do(req)
	if err := resp.Err(); err != nil {
		panic(err)
	}
}

func main() { downloadOpenjdk() }

Although there is a checksum mismatch, the file will not be removed:

panic: checksum mismatch

goroutine 1 [running]:
main.downloadOpenjdk()
        C:/Users/path/to/main.go:88 +0x1b6
main.main()
        C:/Users/path/to/main.go:94 +0x2c
exit status 2

Apart from this, how can I prevent the file from being downloaded anyway if there is a checksum mismatch?

I think one of the first improvements could be adding error handling to the os.Remove call at
https://github.com/cavaliercoder/grab/blob/925bcfe56bc16868f1a398af4231cd4ffa07276f/client.go#L294 to ensure that at least an error message is returned. In my opinion, the error should not be silently discarded.

Prevent download if status code is not correct

Loving the library, but I've been forced to use something else recently, due to the fact that if a 404 status code has a body, grab will still download and save it.

In this case, it wasn't the file I wanted but a JSON payload telling me it didn't exist.

If I could whitelist the status codes I wanted to accept, that'd be great (maybe I've missed how to do this)

Thanks!
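
One possible workaround, sketched against the BeforeCopy hook mentioned in the AfterCopy issue above. The hook signature, and the assumption that returning an error aborts the transfer before anything is written to disk, are both unverified here; the URL is a placeholder:

package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/cavaliercoder/grab"
)

func main() {
	req, err := grab.NewRequest(".", "http://example.com/file.zip") // placeholder URL
	if err != nil {
		log.Fatal(err)
	}

	// reject anything that is not a 200 or 206 before the body is copied
	req.BeforeCopy = func(resp *grab.Response) error { // assumed hook signature
		switch resp.HTTPResponse.StatusCode {
		case http.StatusOK, http.StatusPartialContent:
			return nil
		}
		return fmt.Errorf("rejected status: %s", resp.HTTPResponse.Status)
	}

	resp := grab.DefaultClient.Do(req)
	if err := resp.Err(); err != nil {
		log.Fatal(err)
	}
}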

nil pointer dereference @ response.go:81

After update:

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x4 pc=0xc6974]

goroutine 34 [running]:
panic(0x346eb0, 0x1070a038)
/root/.gvm/gos/go1.6/src/runtime/panic.go:464 +0x330
sync/atomic.loadUint64(0x1080a1d4, 0x0, 0x0)
/root/.gvm/gos/go1.6/src/sync/atomic/64bit_arm.go:10 +0x54
github.com/cavaliercoder/grab.(*Response).BytesTransferred(0x1080a180, 0x4, 0x386878)
/root/.gvm/pkgsets/go1.6/global/src/github.com/cavaliercoder/grab/response.go:81 +0x40
github.com/cavaliercoder/grab.(*Client).do(0x107b79c0, 0x107f80f0, 0x0, 0x0, 0x0)
/root/.gvm/pkgsets/go1.6/global/src/github.com/cavaliercoder/grab/client.go:222 +0x3c0
github.com/cavaliercoder/grab.(*Client).DoAsync.func1(0x107b79c0, 0x107f80f0, 0x10802380)
/root/.gvm/pkgsets/go1.6/global/src/github.com/cavaliercoder/grab/client.go:94 +0x24
created by github.com/cavaliercoder/grab.(*Client).DoAsync
/root/.gvm/pkgsets/go1.6/global/src/github.com/cavaliercoder/grab/client.go:102 +0x60

Insecure SSL Downloads

Is there a way to download from an insecure SSL link?

Like in this example:

    tr := &http.Transport{
        TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
    }
    client := &http.Client{Transport: tr}
    _, err := client.Get("https://golang.org/")
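
The same idea should carry over to grab, assuming Client exposes its underlying HTTP client via the HTTPClient field (the URL below is a placeholder):

package main

import (
	"crypto/tls"
	"log"
	"net/http"

	"github.com/cavaliercoder/grab"
)

func main() {
	client := grab.NewClient()
	// swap in a transport that skips certificate verification (insecure!)
	client.HTTPClient = &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		},
	}

	req, err := grab.NewRequest(".", "https://self-signed.example.com/file.zip") // placeholder URL
	if err != nil {
		log.Fatal(err)
	}
	resp := client.Do(req)
	if err := resp.Err(); err != nil {
		log.Fatal(err)
	}
	log.Println("saved to", resp.Filename)
}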

Downloading smaller file with NoResume doesn't truncate

This is related to #23 and should have been fixed by 799551c, but it's not. I'm not sure whether it's expected behaviour, but I don't think it is.

We have a file on disk, and we're downloading a newer version of it with grab. If the newer version is smaller, the file is not truncated, even with NoResume enabled.

I followed the code path (using good ol' fmt.Printf :) ) and discovered that this is because none of the blocks that set writeFlags are accessed in our case. In validateLocal, we exit before the NoResume block is reached, via this block:

	// determine expected file size
	size := resp.Request.Size
	if size == 0 && resp.HTTPResponse != nil {
		size = resp.HTTPResponse.ContentLength
	}
	if size == 0 {
		return c.headRequest
	}

as size is zero.

Then, in headRequest, there's this block:

	if resp.Request.NoResume {
		return c.getRequest
	}

that doesn't add the proper writeFlags. This means that when we get to openWriter, the writeFlags are untouched, as initialized by Do:

	writeFlags: os.O_CREATE | os.O_WRONLY,

And thus the file is not truncated.

What I did as a temporary fix is to set the flags in headRequest, just like you do in validateLocal:

	if resp.Request.NoResume {
		resp.writeFlags = os.O_TRUNC | os.O_WRONLY
		return c.getRequest
	}

However, I think that perhaps the NoResume check and the setting of the flags should live in one place rather than two, but I'm not familiar enough with the library to know where that would be, so I thought I'd report this and leave it to you. Sorry this isn't a pull request; we're not using GitHub, and my time is unfortunately limited at the moment.

[Feature Request] client.DoChannel()

Hi,

I'm considering using your library with a web crawler. One issue is that I don't know how many requests I will have when I start.

It would be great to have a client.DoChannel method (like DoBatch) that consumes a Request channel until it is closed.
This new method could be used to refactor DoBatch (which internally already uses a channel).

If you think that this could be a good idea, I would be happy to propose a PR.
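
For reference, a method with exactly this shape shows up in the DoBatch progress issue above as client.DoChannel(reqch, respch). A minimal usage sketch under that assumed signature (the URLs are placeholders):

package main

import (
	"fmt"

	"github.com/cavaliercoder/grab"
)

func main() {
	client := grab.NewClient()
	reqch := make(chan *grab.Request)
	respch := make(chan *grab.Response, 10)

	// one worker consuming requests until reqch is closed
	go func() {
		client.DoChannel(reqch, respch)
		close(respch)
	}()

	// feed requests as the crawler discovers them
	go func() {
		for _, url := range []string{"http://example.com/a", "http://example.com/b"} {
			if req, err := grab.NewRequest(".", url); err == nil {
				reqch <- req
			}
		}
		close(reqch)
	}()

	for resp := range respch {
		if err := resp.Err(); err != nil {
			fmt.Println("failed:", err)
			continue
		}
		fmt.Println("saved:", resp.Filename)
	}
}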

need callback for reading data from response

The server returns an ETag in the response headers; the ETag is a hash of the response body. So while dumping the body to a file, I need a callback that lets me stream the body data and compute the hash as it arrives.
You may suggest computing the hash after the body has been dumped, but that would re-read the file from local disk. Also, for a range GET, I need the range start and range length.
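
Until grab grows such a callback, the streaming-hash pattern itself is plain io.TeeReader; a sketch with net/http directly (the URL and hash choice are placeholders):

package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	resp, err := http.Get("http://example.com/file.bin") // placeholder URL
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	f, err := os.Create("file.bin")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	h := sha256.New() // whichever hash the server's ETag scheme uses
	// TeeReader feeds every byte through the hash on its way to disk,
	// so no second pass over the local file is needed
	if _, err := io.Copy(f, io.TeeReader(resp.Body, h)); err != nil {
		panic(err)
	}
	fmt.Println("hash:", hex.EncodeToString(h.Sum(nil)))
}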

Optionally allow GET+Range 0-0 instead of HEAD

AWS S3 buckets are capable of generating temporary signed URLs, which grant access to an object for a limited amount of time. Unfortunately, grab is not able to download these URLs, because the URL must be signed for a specific HTTP verb. In other words, the same URL cannot be used for both a GET and a HEAD request.

It would be helpful for us to simulate a HEAD request with GET + Range: bytes=0-0. If this was an option on the request, users could use the default or override it with a GET request if needed.

Willing to submit a PR if you'd like to support this option.
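
For context, the probe itself looks like this with plain net/http (the signed URL is a placeholder); a compliant server answers 206 with a Content-Range header whose trailing number is the total object size:

package main

import (
	"fmt"
	"net/http"
)

func main() {
	req, err := http.NewRequest("GET", "https://bucket.s3.amazonaws.com/object?X-Amz-Signature=placeholder", nil)
	if err != nil {
		panic(err)
	}
	// ask for a single byte instead of issuing a HEAD request
	req.Header.Set("Range", "bytes=0-0")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// e.g. "206 Partial Content" and "bytes 0-0/2893557"
	fmt.Println(resp.Status, resp.Header.Get("Content-Range"))
}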

Can I download one file from multiple mirror links?

Here is a new requirement. The case is:
a large file on the internet has multiple mirror download links, and the md5sum is the same for all of them.
The mirror HTTP servers all support Accept-Ranges.
I want to download this file from multiple mirror links to increase download bandwidth.
For example, linux.iso, 300M total:
http://a.com/linux.iso
http://mirror.b.com/linux.iso
http://mirror.c.com/linux.iso

download [0, 100M) from a.com
download [100M, 200M) from b.com
download [200M, 300M) from c.com

Download all ranges at once and merge them into one file.
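
Grab doesn't do this itself, but the mechanics are just HTTP Range requests writing into one shared file. A sketch outside grab, assuming the servers honor Range requests and the total size is known up front (it uses io.NewOffsetWriter, Go 1.20+):

package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"sync"
)

// fetchRange downloads [start, end) of url into f at the same offset.
func fetchRange(f *os.File, url string, start, end int64) error {
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", start, end-1))
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusPartialContent {
		return fmt.Errorf("%s: expected 206, got %s", url, resp.Status)
	}
	// concurrent WriteAt calls on *os.File are safe, so each segment
	// can stream into its own region of the file
	_, err = io.Copy(io.NewOffsetWriter(f, start), resp.Body)
	return err
}

func main() {
	mirrors := []string{ // the mirrors from the issue
		"http://a.com/linux.iso",
		"http://mirror.b.com/linux.iso",
		"http://mirror.c.com/linux.iso",
	}
	const total = int64(300 << 20) // assumes the size is known up front

	f, err := os.Create("linux.iso")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	var wg sync.WaitGroup
	errs := make(chan error, len(mirrors))
	chunk := total / int64(len(mirrors))
	for i, m := range mirrors {
		start, end := int64(i)*chunk, int64(i+1)*chunk
		if i == len(mirrors)-1 {
			end = total
		}
		wg.Add(1)
		go func(m string, start, end int64) {
			defer wg.Done()
			errs <- fetchRange(f, m, start, end)
		}(m, start, end)
	}
	wg.Wait()
	close(errs)
	for err := range errs {
		if err != nil {
			panic(err)
		}
	}
}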

download file from go-fastdfs

no filename could be determined
2019/09/29 16:44:31 E! DownloadFile client connect error: no filename could be determined

code:

func main() {
	err := downloadFile("http://10.110.18.101:30039/group1/agent/ecs-monitor-03-linux64-amd64.tar.gz")
	if err != nil {
		fmt.Println(err)
	}
}

func downloadFile(httpUrl string) error {
	//client := grab.NewClient()

	_, err := grab.Get(".", httpUrl)
	if err != nil {
		log.Printf("E! DownloadFile new Request error: %s", err.Error())
		return err
	}

	//resp := client.Do(req)
	//if err = resp.Err(); err != nil {
	//	log.Printf("E! DownloadFile client connect error: %s", err.Error())
	//	return err
	//}
	//defer resp.Cancel()

	return nil
}
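
A likely workaround, hedged: name the destination file explicitly so grab never has to guess a filename (the assumption being that NewRequest treats a non-directory dst as the target filename, as grab.Get does with its dst argument):

package main

import (
	"log"

	"github.com/cavaliercoder/grab"
)

func main() {
	req, err := grab.NewRequest("./ecs-monitor-03-linux64-amd64.tar.gz",
		"http://10.110.18.101:30039/group1/agent/ecs-monitor-03-linux64-amd64.tar.gz")
	if err != nil {
		log.Fatal(err)
	}

	resp := grab.DefaultClient.Do(req)
	if err := resp.Err(); err != nil {
		log.Fatal(err)
	}
	log.Println("saved to", resp.Filename)
}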

range requests (resume) only work if request.Filename is explicitly set

I'm using grab for the first time, and I've written a function that takes a directory and a list of URLs, and uses grab to download the specified URLs to the directory in parallel. Unless I'm doing something wrong (quite possible), range request headers don't seem to be added to my requests.

If I interrupt a download, and then start it again, the existing files do not resume downloading where they left off, regardless of how I set request.NoResume.

With request.NoResume = false, the download is transferred again from the beginning, but the destination file is not truncated. Its size does not change until the downloaded data exceeds the size of the existing file. But all of the file data is downloaded from the server from byte 0, regardless of the size of the existing destination file. I know that it is downloading from 0, because I see the incoming traffic in iftop, and because of what I see in the http headers (see below).

With request.NoResume = true, the behavior is the same, except that the destination file is truncated as soon as the file download begins.

I fired up Wireshark to investigate the requests going over the wire. In response to grab's initial HEAD request for the file, nginx correctly responds with Accept-Ranges: bytes, but the following GET request from grab does not contain a Range header of any kind.

HEAD /images/xxxxx.box HTTP/1.1
Host: image-build-201.xxxxx.org
User-Agent: pogos

HTTP/1.1 200 OK
Server: nginx/1.12.2
Date: Wed, 16 May 2018 21:10:46 GMT
Content-Type: application/octet-stream
Content-Length: 1629246392
Last-Modified: Wed, 16 May 2018 18:07:18 GMT
Connection: keep-alive
ETag: "5afc7356-611c53b8"
Accept-Ranges: bytes

GET /images/xxxxx.box HTTP/1.1
Host: image-build-201.xxxxx.org
User-Agent: pogos
Accept-Encoding: gzip

I have only tested this using the batch request mode, so I don't know if this also affects requests made synchronously.

I am using grab v2.0.0 with this code. If I'm "doing it wrong," please let me know.

panic: runtime error: invalid memory address or nil pointer dereference

I get the error below on running the example batchdl code:

package main

import (
    "fmt"
    "github.com/cavaliercoder/grab"
    "os"
    "time"
)

func main() {
    // get URL to download from command args
    if len(os.Args) < 2 {
        fmt.Fprintf(os.Stderr, "usage: %s url [url]...\n", os.Args[0])
        os.Exit(1)
    }

    urls := os.Args[1:]

    // start file downloads, 3 at a time
    fmt.Printf("Downloading %d files...\n", len(urls))
    respch, err := grab.GetBatch(3, ".", urls...)
    if err != nil {
        fmt.Fprintf(os.Stderr, "%v\n", err)
        os.Exit(1)
    }

    // start a ticker to update progress every 200ms
    t := time.NewTicker(200 * time.Millisecond)

    // monitor downloads
    completed := 0
    inProgress := 0
    responses := make([]*grab.Response, 0)
    for completed < len(urls) {
        select {
        case resp := <-respch:
            // a new response has been received and has started downloading
            // (nil is received once, when the channel is closed by grab)
            if resp != nil {
                responses = append(responses, resp)
            }

        case <-t.C:
            // clear lines
            if inProgress > 0 {
                fmt.Printf("\033[%dA\033[K", inProgress)
            }

            // update completed downloads
            for i, resp := range responses {
                if resp != nil && resp.IsComplete() {
                    // print final result
                    if resp.Error != nil {
                        fmt.Fprintf(os.Stderr, "Error downloading %s: %v\n", resp.Request.URL(), resp.Error)
                    } else {
                        fmt.Printf("Finished %s %d / %d bytes (%d%%)\n", resp.Filename, resp.BytesTransferred(), resp.Size, int(100*resp.Progress()))
                    }

                    // mark completed
                    responses[i] = nil
                    completed++
                }
            }

            // update downloads in progress
            inProgress = 0
            for _, resp := range responses {
                if resp != nil {
                    inProgress++
                    fmt.Printf("Downloading %s %d / %d bytes (%d%%)\033[K\n", resp.Filename, resp.BytesTransferred(), resp.Size, int(100*resp.Progress()))
                }
            }
        }
    }

    t.Stop()

    fmt.Printf("%d files successfully downloaded.\n", len(urls))
}
C:\GoCode\src>go run batchdl.go "https://www.google.com/images/branding/googlelogo/1x/googlelogo_color_272x92dp.png"
Downloading 1 files...
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x0 pc=0x52a9db]

goroutine 10 [running]:
panic(0x6735e0, 0x10c60030)
        C:/Go/src/runtime/panic.go:464 +0x326
sync/atomic.LoadUint64(0x10c8a594, 0x6abcc0, 0x4)
        C:/Go/src/sync/atomic/asm_386.s:159 +0xb
github.com/cavaliercoder/grab.(*Response).BytesTransferred(0x10c8a540, 0x4, 0x6aa718)
        C:/GoCode/src/github.com/cavaliercoder/grab/response.go:81 +0x2b
github.com/cavaliercoder/grab.(*Client).do(0x10cc17e0, 0x10c502d0, 0x0, 0x0, 0x0)
        C:/GoCode/src/github.com/cavaliercoder/grab/client.go:222 +0x3a2
github.com/cavaliercoder/grab.(*Client).DoAsync.func1(0x10cc17e0, 0x10c502d0, 0x10c485c0)
        C:/GoCode/src/github.com/cavaliercoder/grab/client.go:94 +0x2d
created by github.com/cavaliercoder/grab.(*Client).DoAsync
        C:/GoCode/src/github.com/cavaliercoder/grab/client.go:102 +0x62
exit status 2

C:\GoCode\src>

Strangely, this code runs fine on my Windows 10 64bit setup???

FTP Support

The title says it all. Apologies if it's a newbie question :)

How to correctly handle 416 Requested Range not Satisfiable

If I use grab to download the same file multiple times consecutively, given a server that responds to unsatisfiable range requests with 416 and a response body, I will get a corrupted file on the second attempt.

Using the demo code provided:

package main

import (
    "fmt"
    "os"
    "time"

    "github.com/cavaliercoder/grab"
)

func main() {
    // get URL to download from command args
    if len(os.Args) < 2 {
        fmt.Fprintf(os.Stderr, "usage: %s url\n", os.Args[0])
        os.Exit(1)
    }

    url := os.Args[1]

    // start file download
    fmt.Printf("Downloading %s...\n", url)
    respch, err := grab.GetAsync(".", url)
    if err != nil {
        fmt.Fprintf(os.Stderr, "Error downloading %s: %v\n", url, err)
        os.Exit(1)
    }

    // block until HTTP/1.1 GET response is received
    fmt.Printf("Initializing download...\n")
    resp := <-respch

    // print progress until transfer is complete
    for !resp.IsComplete() {
        fmt.Printf("Progress %d / %d bytes (%d%%)\n", resp.BytesTransferred(), resp.Size, int(100*resp.Progress()))
        time.Sleep(200 * time.Millisecond)
    }

    // check for errors
    if resp.Error != nil {
        fmt.Fprintf(os.Stderr, "Error downloading %s: %v\n", url, resp.Error)
        os.Exit(1)
    }

    fmt.Printf("Successfully downloaded to ./%s\n", resp.Filename)
}

and the url https://s.w.org/screenshots/3.8/colors.png

I get the following output:

% ./test-download https://s.w.org/screenshots/3.8/colors.png
Downloading https://s.w.org/screenshots/3.8/colors.png...
Initializing download...
Progress 0 / 356063 bytes (0%)
Successfully downloaded to ./colors.png
% ./test-download https://s.w.org/screenshots/3.8/colors.png
Downloading https://s.w.org/screenshots/3.8/colors.png...
Initializing download...
Progress 356063 / 356452 bytes (99%)
Successfully downloaded to ./colors.png
% ./test-download https://s.w.org/screenshots/3.8/colors.png
Downloading https://s.w.org/screenshots/3.8/colors.png...
Initializing download...
Error downloading https://s.w.org/screenshots/3.8/colors.png: Existing file (356452 bytes) is larger than remote (356063 bytes)

Note the difference in calculated content length.

Using less to view the end of the file reveals the error body from the later requests:

... rest omitted ...
^C<EA><A1>nˢ<D4>|<F3><F9>k<F3>;<C6>6^F+<A5><E1>:%`<92>C<F2><U+DCFD><B8>^_j<E5><D3>fn<E5><BC><CB>1<92>N<A6><CC>e<DE>ֹ<C5><CB>        E<EB><AC>Cm_<8B>^N\<BE><92><C7>=<8B>8Q^LW<A8>(^Y<81>E<D1>C:
^B<A4><A3>e<B4>^\<E4>^O3     J^Gy<A8><A1>*M<A4><F9>^X^\$^S<BF>d+^RA]<EC>^_<8D>ee<A8><D5>(^S<C3>(<89><CC> R^B<A5>@.<97>ESC
<80>^W<D9>X;^A^H<D9><F2><A2>-^_<94><9F>cFZ<A2>/U<AB><EA>v<E5><A7>^T<FB>)I><E9>;^P^H^F^P<FA>t|<98>^@Q^D\^<AC>ESC<C1>Z'*<84><A2><E7>11u<A4><FA>>r<A9>1fE[i<A9>^R<B7>C<95><F3>#Ռ)^^<BF><D7><E3>
<83><CC>䆲<EB>4<FF><F2>\D^K<87>^L<FE>R_l^D^?^E<BB>M*1<D0>^*N<9C>^G!<EE>^D^Y<B8>ʾ<D1><D4>;,^@v<8E>'<BB>W/<F7><8A>
<9B>5<DF>]<8F>|<98>^^D<99>8Rw/e^K^\'K<A1><A9><FF>^R<9F><9B>dN<FF><C6>I<E2>d<AB>^Xډ<C6> ^X;<92>^Zؒ<EA><BB>^B<E8>!jf-l^O<C5><F7><A0><C2>@Y^@<AB>^Tv]<FF>wg^A/<E5><B6>,<88>^@^E|c<A7>p<BD><94><D9>W<80><A8><AF><F7>tZ<E6>V<CC>5<AA><F2>\^C<A8>^T<B0>uu<D0>^X<CF>^S^X<E5><EF><E2><9C>^Z<E6><D8>^@<CF>B^Y<D8>><FE>^K<90><EE><C4>>e^Y<B4>ݎ3 ^VLIl}<BF><C5>yQ"c^W<D5><CB><F3><89>Vi<D8>^Y<A5><E0><A8>
<EC><9D><F7><8B>^FD<93><DC>.V<AE><AB>>[^\<CB> [<C6>t<AE>4ɚ<C9><U+063B>w<EF>!<95>^\^_       ;<ED><FF>233<F3>Jf1gMr<86><98>^E<F7><C3>^P{^^<97><FF>/<C0>^@    g?<B9>G<A8><DE>H^@^@^@^@IEND<AE>B`
<82><?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
         "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
        <head>
                <title>416 - Requested Range Not Satisfiable</title>
        </head>
        <body>
                <h1>416 - Requested Range Not Satisfiable</h1>
        </body>
</html>

If this is something I should account for in my client code (I'm using the library similarly to the demo) then please could I have some pointers on where to look in the docs? If it's a bug I'm happy to help.

Thanks

Data race

I just ran the example on the start page with the race detector.

Downloading http://www.golang-book.com/public/pdf/gobook.pdf...
  200 OK
==================
WARNING: DATA RACE
Read at 0x00c420136298 by main goroutine:
  github.com/cavaliercoder/grab.(*Response).BytesComplete()
      /home/rkaufmann/go/src/github.com/cavaliercoder/grab/response.go:133 +0x43
  main.main()
      /home/rkaufmann/Downloads/grab.go:30 +0x442

Previous write at 0x00c420136298 by goroutine 14:
  [failed to restore the stack]

Goroutine 14 (running) created at:
  github.com/cavaliercoder/grab.(*Client).Do()
      /home/rkaufmann/go/src/github.com/cavaliercoder/grab/client.go:81 +0x451
  main.main()
      /home/rkaufmann/Downloads/grab.go:18 +0x325
==================
==================
WARNING: DATA RACE
Write at 0x00c420076180 by main goroutine:
  sync/atomic.CompareAndSwapInt32()
      /usr/local/go/src/runtime/race_amd64.s:293 +0xb
  sync.(*Mutex).Lock()
      /usr/local/go/src/sync/mutex.go:74 +0x4d
  github.com/cavaliercoder/grab.(*transfer).N()
      /home/rkaufmann/go/src/github.com/cavaliercoder/grab/transfer.go:74 +0x4a
  github.com/cavaliercoder/grab.(*Response).BytesComplete()
      /home/rkaufmann/go/src/github.com/cavaliercoder/grab/response.go:133 +0x58
  main.main()
      /home/rkaufmann/Downloads/grab.go:30 +0x442

Previous write at 0x00c420076180 by goroutine 14:
  [failed to restore the stack]

Goroutine 14 (running) created at:
  github.com/cavaliercoder/grab.(*Client).Do()
      /home/rkaufmann/go/src/github.com/cavaliercoder/grab/client.go:81 +0x451
  main.main()
      /home/rkaufmann/Downloads/grab.go:18 +0x325
==================
  transferred 1156706 / 0 bytes 2893557 bps (39.98%)
  transferred 2617418 / 0 bytes 2893557 bps (90.46%)
Download saved to ./gobook.pdf
Found 2 data race(s)
exit status 66

The examples and cmd race

selecting on a ticker channel and accessing the resp object whilst the client Do method is working on it causes a race.

Easily reproduced.

The race exists between setting up the transfer and reading anything from it. As client.Do returns before resp.transfer has been fully set up, it can be read from 'too soon'.

In fact calling go resp.watchBps() also races with the transfer assignment.

(I will take a look at a solution soon)

Should follow 302 responses

It looks like one of your latest releases removed the ability to follow a redirect (15e1314), unless you specify req.IgnoreBadStatusCodes = true. If this was an intentional update, then you can close this issue. Otherwise, if you change the upper bound on this line to 399, we're good!

Multiple downloads to the same file result in bad content length

https://gist.github.com/yene/1f4f7da1ba2955590246480e8a17d802

$ go run grab-bug.go 
2019/04/27 17:42:40 finished download
2019/04/27 17:42:40 Could not download https://trello-attachments.s3.amazonaws.com/52460829b7e0c8990b00008e/59665dfd12eb088169fe65a0/3c1ea606328206cdaa4779183fa60bdb/image.png
2019/04/27 17:42:40 bad content length

Found this by downloading multiple files to the same filename; it seems statFileInfo has the filesize of the previous file.

Tests Panic on Windows

This is likely a newbie issue (aka Id10t). I have just pulled down the library (go get) to my Windows 10 computer. Attempting to run the tests results in the following...

MINGW64 ~/go/src/github.com/cavaliercoder/grab (master)
$ go test .
--- FAIL: TestFilenameResolution (0.25s)
    --- FAIL: TestFilenameResolution/Using_Content-Disposition_Header_with_target_directory (0.05s)
        client_test.go:70: Filename mismatch. Expected '.test/header-filename', got '.test\header-filename'.
    --- FAIL: TestFilenameResolution/Using_URL_Path_with_target_directory (0.05s)
        client_test.go:70: Filename mismatch. Expected '.test/url-filename', got '.test\url-filename'.
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x10 pc=0x6bff14]

goroutine 1592 [running]:
testing.tRunner.func1(0xc00037a900)
        C:/Go/src/testing/testing.go:874 +0x3aa
panic(0x71f480, 0xa05920)
        C:/Go/src/runtime/panic.go:679 +0x1c0
github.com/cavaliercoder/grab.guessFilename(0xc000471ed0, 0xc00000a0f8, 0x787a34, 0x3, 0x78fc61)
        C:/Users/camer/go/src/github.com/cavaliercoder/grab/util.go:51 +0x34
github.com/cavaliercoder/grab.TestURLFilenames.func2(0xc00037a900)
        C:/Users/camer/go/src/github.com/cavaliercoder/grab/util_test.go:54 +0x12a
testing.tRunner(0xc00037a900, 0x7a3bb8)
        C:/Go/src/testing/testing.go:909 +0xd0
created by testing.(*T).Run
        C:/Go/src/testing/testing.go:960 +0x357
FAIL    github.com/cavaliercoder/grab   3.666s
FAIL

I am not fussed about the filename failures, but the panic has me blocked. Any suggestions are appreciated.
