I am compressing a large number of small data items. I find that the compression performance is slightly worse than zlib for data types with a small size (~100 bytes). I came across this post (facebook/zstd#1134 (comment)) that suggests using "block mode" to reduce the header size. I would like to ask:
I'm aware that dictionary compression is recommended for small data. However, I would like to know whether there is a more general approach that does not require a dictionary for each data type. I have quite a few data types; I have trained a dictionary for the most frequently occurring one, but training dictionaries for all of them will take time.
Replies: 1 comment 5 replies
I don't want to add a dedicated "block" interface, since that will just add confusion about which is which.

A reasonable approach would be to strip the frame header and re-add it. I can add that to the Header parsing that already exists.

// DecodeAndStrip will decode the header from the beginning of the stream
// and on success return the remaining bytes.
// This will decode the frame header and the first block header if enough bytes are provided.
// It is recommended to provide at least HeaderMaxSize bytes.
// If the frame header cannot be read an error will be returned.
// If there isn't enough input, io.ErrUnexpectedEOF is returned.
// The FirstBlock.OK will indicate if enough information was available to decode the first block header.
func (h *Header) DecodeAndStrip(in []byte) (remain []byte, err error)

// AppendTo will append the encoded header to the dst slice.
func (h *Header) AppendTo(dst []byte) ([]byte, error)

This way you can strip the header after encoding and you can construct a header to add before decoding.
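To make the strip-and-re-add idea concrete, here is a minimal standard-library sketch of what it amounts to at the byte level. It is not the library's implementation and does not use the proposed `DecodeAndStrip` or `AppendTo` methods. The helper names `frameHeaderSize`, `stripHeader` and `restoreHeader` are hypothetical, and the field layout follows my reading of the zstd frame format (RFC 8878). It assumes a regular (non-skippable) frame and leaves any trailing content checksum untouched.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// zstdMagic is the magic number that starts every regular zstd frame
// (stored little-endian as 28 B5 2F FD).
const zstdMagic = 0xFD2FB528

// dictIDSizes maps the 2-bit Dictionary_ID_flag to the field size in bytes.
var dictIDSizes = [4]int{0, 1, 2, 4}

// frameHeaderSize (hypothetical helper) returns how many bytes the magic
// number plus frame header occupy at the start of a zstd frame.
func frameHeaderSize(frame []byte) (int, error) {
	if len(frame) < 5 {
		return 0, errors.New("input too short for a frame header")
	}
	if binary.LittleEndian.Uint32(frame) != zstdMagic {
		return 0, errors.New("not a regular zstd frame")
	}
	fhd := frame[4] // Frame_Header_Descriptor
	singleSegment := fhd&0x20 != 0
	size := 5 // magic number + descriptor byte
	if !singleSegment {
		size++ // Window_Descriptor is present
	}
	size += dictIDSizes[fhd&0x03] // Dictionary_ID field
	switch fhd >> 6 {             // Frame_Content_Size_flag
	case 0:
		if singleSegment {
			size++ // 1-byte content size
		}
	case 1:
		size += 2
	case 2:
		size += 4
	case 3:
		size += 8
	}
	if len(frame) < size {
		return 0, errors.New("truncated frame header")
	}
	return size, nil
}

// stripHeader splits a frame into its header and the bytes worth storing.
func stripHeader(frame []byte) (header, rest []byte, err error) {
	n, err := frameHeaderSize(frame)
	if err != nil {
		return nil, nil, err
	}
	return frame[:n], frame[n:], nil
}

// restoreHeader rebuilds a decodable frame from a header and stored bytes.
func restoreHeader(header, rest []byte) []byte {
	out := make([]byte, 0, len(header)+len(rest))
	out = append(out, header...)
	return append(out, rest...)
}

func main() {
	// Hand-built frame for empty input: magic, descriptor 0x20 (single
	// segment), 1-byte content size of 0, then one empty last raw block.
	frame := []byte{0x28, 0xB5, 0x2F, 0xFD, 0x20, 0x00, 0x01, 0x00, 0x00}
	header, rest, err := stripHeader(frame)
	if err != nil {
		panic(err)
	}
	fmt.Printf("header: %d bytes, stored: %d bytes\n", len(header), len(rest))
	restored := restoreHeader(header, rest)
	fmt.Println("round trip ok:", string(restored) == string(frame))
}
```

This only works if the decoding side can reconstruct the header on its own, for example because every item of a given type is encoded with the same settings. With the methods proposed above, the header parsing and re-encoding would be done by `Header` itself rather than by hand.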