Comments (10)
Possibly related? facebook/zstd#206
from zstd.
OK, if you cannot guarantee reasonably sized calls to Write, the best option is to wrap the writer in a bufio.Writer, like this:
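```go
package main

import (
	"bufio"
	"bytes"
	"fmt"

	"github.com/DataDog/zstd"
)

// CompressionLevel was not defined in the original snippet;
// zstd.DefaultCompression is an assumption used here to make it compile.
const CompressionLevel = zstd.DefaultCompression

func main() {
	b := &bytes.Buffer{}
	for i := 0; i < 500; i++ {
		b.Write([]byte("Hello World! "))
	}
	data1 := b.Bytes()
	fmt.Println("data len", len(data1))

	// Compress 1: the whole input in a single Write call.
	buffer1 := &bytes.Buffer{}
	w1 := zstd.NewWriterLevel(buffer1, CompressionLevel)
	w1.Write(data1)
	w1.Close()
	fmt.Println("Buffer1 len", buffer1.Len())

	// Compress 2: 500 small writes, coalesced by a bufio.Writer.
	buffer2 := &bytes.Buffer{}
	w2 := zstd.NewWriterLevel(buffer2, CompressionLevel)
	bw := bufio.NewWriter(w2) // default buffer size = 4k
	// bw := bufio.NewWriterSize(w2, 8192) // buffer size = 8k
	for i := 0; i < 500; i++ {
		bw.Write([]byte("Hello World! "))
	}
	bw.Flush() // push the remaining buffered bytes into w2 before closing it
	w2.Close()
	fmt.Println("Buffer2 len", buffer2.Len())
}
```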
Output:
```
data len 6500
Buffer1 len 33
Buffer2 len 44
```
It's not so elegant, but ¯\_(ツ)_/¯
Hope that helps someone!
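To see what the bufio.Writer changes, here is a stdlib-only sketch that counts how many Write calls reach the writer underneath it. The countingWriter type and the writeChunks helper are hypothetical stand-ins for the zstd writer, not part of either library; they only record call sizes. With the default 4096-byte buffer, the 500 writes of 13 bytes should arrive as two larger writes (4096 bytes, then the remaining 2404 at Flush).

```go
package main

import (
	"bufio"
	"fmt"
)

// countingWriter is a hypothetical stand-in for the compressing writer:
// it only records the size of every Write call it receives.
type countingWriter struct {
	sizes []int
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.sizes = append(c.sizes, len(p))
	return len(p), nil
}

// writeChunks performs n writes of msg, directly when bufSize <= 0 or
// through a bufio.Writer of bufSize bytes otherwise, and returns the
// sizes of the writes seen by the underlying writer.
func writeChunks(n int, msg string, bufSize int) []int {
	cw := &countingWriter{}
	if bufSize <= 0 {
		for i := 0; i < n; i++ {
			cw.Write([]byte(msg))
		}
		return cw.sizes
	}
	bw := bufio.NewWriterSize(cw, bufSize)
	for i := 0; i < n; i++ {
		bw.Write([]byte(msg))
	}
	bw.Flush() // forward whatever is still buffered
	return cw.sizes
}

func main() {
	direct := writeChunks(500, "Hello World! ", 0)
	buffered := writeChunks(500, "Hello World! ", 4096)
	fmt.Println("unbuffered calls:", len(direct))
	fmt.Println("buffered calls:", len(buffered), buffered)
}
```

With an 8192-byte buffer, all 6500 bytes fit in the buffer, so they would reach the underlying writer as a single write at Flush.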
@jimsmart, try gozstd.Writer. It uses a different underlying zstd API, which should have lower overhead.
The zstd bug referenced above (facebook/zstd#206) has been closed. Is this issue still ongoing? If so, the need to wrap the writer in a buffer should be documented; this is a rather subtle piece of usage advice.
Hi @rgeronimi,
I re-checked the previous results, and you would indeed still get the same results today.
zstd (this lib) and gozstd (from @valyala above) use two slightly different C zstd APIs, with slightly different design directions.
The zstd library (this one) uses ZSTD_compressContinue, which is buffer-less zstd streaming compression: we have complete control over memory, at the expense of having to buffer on the Go side if you want to optimize compression size on small inputs.
gozstd uses ZSTD_compressStream, which moves that buffering logic into the C code (at the cost of having less control over memory consumption on the C side).
Hope this helps!
The following limitations of ZSTD_compressContinue look scary:
- ZSTD_compressContinue() presumes prior input is still accessible and unmodified (up to maximum distance size, see WindowLog). It remembers all previous contiguous blocks, plus one separated memory segment (which can itself consists of multiple contiguous blocks)
- ZSTD_compressContinue() detects that prior input has been overwritten when src buffer overlaps. In which case, it will "discard" the relevant memory section from its history.
As I understand it, they mean two things:
1. zstd could use garbage as a dictionary from the previous buffers passed to ZSTD_compressContinue if the addresses of those buffers don't match the address of the current buffer. This may also lead to a segmentation fault if the underlying memory of a previous buffer has been unmapped from the process address space.
2. zstd may have a poor compression ratio, since it discards dictionary data from the previously compressed block if the buffer passed to the function is reused.
cc'ing @Cyan4973 for further clarification.
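The second point may be relevant to the bufio.Writer workaround earlier in the thread: bufio.Writer flushes out of one internal array that it keeps reusing, so every Write the compressing writer receives starts at the same address. A minimal stdlib-only sketch that makes the reuse visible; the recordingWriter type and flushAddresses helper are hypothetical, and whether this reuse is what costs compression in the earlier example is an assumption, not something measured here.

```go
package main

import (
	"bufio"
	"fmt"
)

// recordingWriter is a hypothetical sink that records, as a string, the
// address of the first byte of every slice it receives. It keeps no
// reference to the slice itself.
type recordingWriter struct {
	addrs []string
}

func (r *recordingWriter) Write(p []byte) (int, error) {
	if len(p) > 0 {
		r.addrs = append(r.addrs, fmt.Sprintf("%p", p)) // %p on a slice formats &p[0]
	}
	return len(p), nil
}

// flushAddresses pushes total bytes through a bufio.Writer of size bufSize
// in 10-byte writes and returns the addresses seen by the sink.
func flushAddresses(total, bufSize int) []string {
	rw := &recordingWriter{}
	bw := bufio.NewWriterSize(rw, bufSize)
	chunk := []byte("0123456789")
	for written := 0; written < total; written += len(chunk) {
		bw.Write(chunk)
	}
	bw.Flush()
	return rw.addrs
}

func main() {
	addrs := flushAddresses(1000, 64)
	same := true
	for _, a := range addrs {
		if a != addrs[0] {
			same = false
		}
	}
	fmt.Println("writes:", len(addrs), "all from the same array:", same)
}
```

Here 1000 bytes through a 64-byte buffer arrive as 15 full flushes plus one final 40-byte flush, all from the same backing array.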
@Viq111's explanations are correct.
ZSTD_compressContinue() is a fairly low-level function, designed for systems that need absolute control over memory allocation. It requires fairly tight control over buffer content and lifetime. To be fair, it's targeted more at embedded environments than at managed languages, but I'm not qualified to say whether it is a good fit for Go.
When in doubt, prefer ZSTD_compressStream(). It's safer to use and abstracts away all the machinery, at the cost of managing its own internal buffers.
> ZSTD_compressContinue() is a fairly low level function, designed for systems which need absolute control over memory allocation. It requires a fairly good control over buffer content and lifetime. To be fair, it's more targeted at embedded environments than managed languages, but I'm not a qualified expert to tell if this is a good fit or not for go.
This depends on what the Go wrapper does. I just checked, and it passes the user-provided buffer directly to C as a pointer. If I understand what @valyala wrote, this could be a critical bug, since the zstd C code expects this buffer to remain accessible after the function returns. If so, it could cause data corruption, process crashes, and hard-to-reproduce failures.
> When in doubt, prefer using ZSTD_compressStream(). It's safer to use, and abstract all the machinery, at the cost of also managing its own internal buffers.
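The concern above also conflicts with Go's own io.Writer contract, whose documentation states that implementations must not retain p. A minimal stdlib-only sketch of the difference, using two hypothetical writers: retainingWriter keeps the caller's slice (as a wrapper effectively does when C code relies on that memory after Write returns), while copyingWriter copies input into memory it owns, roughly the trade-off ZSTD_compressStream makes with its internal buffers. This only models aliasing in Go; it does not call zstd.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// retainingWriter keeps a reference to the caller's slice instead of
// copying it. This violates the io.Writer contract.
type retainingWriter struct {
	history [][]byte
}

func (r *retainingWriter) Write(p []byte) (int, error) {
	r.history = append(r.history, p) // aliases the caller's memory
	return len(p), nil
}

// copyingWriter copies every input into a buffer it owns.
type copyingWriter struct {
	history bytes.Buffer
}

func (c *copyingWriter) Write(p []byte) (int, error) {
	return c.history.Write(p) // bytes.Buffer copies p
}

// reuseBuffer writes two messages through one caller-owned buffer that is
// reused between calls, which io.Writer callers are allowed to do.
func reuseBuffer(w io.Writer) {
	buf := make([]byte, 5)
	copy(buf, "first")
	w.Write(buf)
	copy(buf, "later") // the caller overwrites its buffer after Write returns
	w.Write(buf)
}

func main() {
	r := &retainingWriter{}
	reuseBuffer(r)
	c := &copyingWriter{}
	reuseBuffer(c)
	fmt.Printf("retained: %q %q\n", r.history[0], r.history[1]) // "later" "later"
	fmt.Printf("copied:   %q\n", c.history.String())             // "firstlater"
}
```

The retaining writer's view of the first write silently changes once the caller reuses its buffer, which is the Go-side analogue of stale or garbage history in the C code.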
Looking back at the code, we started implementing the Go wrapper at zstd v0.5, which indeed only had the ZBUFF_decompressContinue methods: https://github.com/facebook/zstd/blob/201433a7f713af056cc7ea32624eddefb55e10c8/lib/zstd_buffered.h#L79
This may also be the cause of #39.
If anyone can put up a PR migrating to ZSTD_compressStream, all contributions are welcome!
Otherwise, I can look into it myself, since it could bite a number of people using the streaming interface.
We don't have the skill set to dig into that soon. Unfortunately, for storage tasks (e.g., blob storage in a database or compressed custom backups) this bug is a showstopper.