SSDPCM

SSDPCM is an audio codec developed by Algorithm (of demoscene fame). I was fascinated by its compression efficiency and decompression speed, so I decided to study it and write my own implementation.

This repository contains that implementation: an encoder/decoder for multiple variants of the SSDPCM codec. It also specifies my own standard for how SSDPCM files should be encoded. Read on for more information.

What is SSDPCM?

SSDPCM stands for Step-Selected Differential Pulse Code Modulation. It's basically a form of DPCM that's split into blocks of samples, where the step sizes for each codeword are explicitly specified along with each block. It's an efficient low-bitrate audio codec designed with very low decode complexity.
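
As a rough illustration of why decoding is so cheap, the sketch below decodes one block with a table lookup and an addition per sample. It assumes the codewords have already been unpacked and that the block's slopes have already been expanded into a signed table with one entry per codeword value. The function name and the table layout are hypothetical and not taken from this repository.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decodes one block of already-unpacked codewords.
 * deltas[] holds one signed step per codeword value (sent with the block);
 * *state carries the running sample value across blocks. */
static void ssdpcm_decode_block_sketch(const uint8_t *codes, size_t n,
                                       const int16_t *deltas, size_t num_deltas,
                                       int32_t *state, int32_t *out)
{
    for (size_t i = 0; i < n; i++) {
        assert(codes[i] < num_deltas);
        *state += deltas[codes[i]]; /* plain DPCM accumulation */
        out[i] = *state;
    }
}
```

A real decoder also has to handle the output sample width (wrapping or clamping), which this sketch leaves out.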

Is SSDPCM a form of ADPCM?

No. ADPCM works by signalling the step size change implicitly and continuously over time. In ADPCM, the decoder infers the step size change from the codewords themselves, and the encoder is responsible for generating the correct codewords.

SSDPCM, on the other hand, signals the step size change explicitly and only at the beginning of a new block of samples. This might sound like a disadvantage for SSDPCM, but it turns out to be quite an advantage over ADPCM once we look at one of ADPCM's big shortcomings: adaptation delay.

ADPCM takes time to react to rapidly changing slopes, because its encoding process can only predict the next sample. At lower bitrates, this leads to higher levels of slope overload and impulse noise. SSDPCM encoding, on the other hand, analyzes the entire block of samples at once and determines the best slopes to use within that block in order to minimize error. This minimizes slope overload and impulse noise in exchange for a flat hiss, which is less objectionable.

What is SSDPCM good for?

SSDPCM was originally intended as an efficient low-bitrate audio codec for fast decompression on ancient hardware, and that's what it does best. SSDPCM's original bitrate was 2 bits per sample; SSDPCM's sweet spot for encoding efficiency lies at the medium bitrates, between 1.6 and 2.3 bits per sample inclusive (fractional bit rates are a relatively recent development of mine for SSDPCM; the original author had only worked with integer bitrates of 1 and 2 bits per sample).

SSDPCM also does very well at 1 bit per sample, but the quality is a bit harsh and in my opinion you should really only use it if you really need to save space. There's a comb filter mode that helps cut down on the hiss in exchange for some of the high frequency reproduction.

SSDPCM was meant to be easily played back by even simple 8-bit CPUs like the 6502. It's so lightweight that my SSDPCM player for the Nintendo Entertainment System, https://github.com/Kagamiin/ssplayer-nes, is capable of achieving peak decompression and playback rates of up to 35 kHz (with some sacrifice - in this case, a tiny bit of jitter). That's almost CD sample rate, mind you.

Why does SSDPCM take so long to encode at higher bitrates?

SSDPCM requires the encoder to analyze each block of samples and search for the best set of slopes. The higher the bitrate, the higher the number of combinations of slopes that needs to be searched through. My older implementation used a brute force algorithm which was unbearably dog slow. My current implementation uses a bisecting search algorithm that's much faster, but it still takes somewhat of a long time to encode at 3 bits per sample.
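
To make the search cost concrete, here is a minimal brute-force sketch for the simplest case, a 1-bit block with a single slope magnitude. It is not the bisecting search used by the current implementation, and the function names are made up for illustration. It tries every magnitude from 0 to 255 and keeps the one with the lowest squared error. With more slopes, the same exhaustive approach has to try combinations of magnitudes, which is why the cost grows so quickly with the bitrate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of squared errors when a block is tracked with a single step of
 * magnitude `step`, picking +step or -step greedily for each sample. */
static uint64_t block_error_1bit(const uint8_t *in, size_t n, int start, int step)
{
    uint64_t err = 0;
    int cur = start;
    for (size_t i = 0; i < n; i++) {
        int64_t eu = (int64_t)cur + step - in[i]; /* error if we step up */
        int64_t ed = (int64_t)cur - step - in[i]; /* error if we step down */
        if (eu * eu <= ed * ed) {
            cur += step;
            err += (uint64_t)(eu * eu);
        } else {
            cur -= step;
            err += (uint64_t)(ed * ed);
        }
    }
    return err;
}

/* Exhaustive search over every step magnitude from 0 to 255. */
static int best_step_1bit(const uint8_t *in, size_t n, int start)
{
    assert(n > 0);
    int best = 0;
    uint64_t best_err = UINT64_MAX;
    for (int step = 0; step <= 255; step++) {
        uint64_t e = block_error_1bit(in, n, start, step);
        if (e < best_err) {
            best_err = e;
            best = step;
        }
    }
    return best;
}
```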

Note that if you use 16-bit samples as input, it can take even longer to encode, too, especially for the higher bitrates. Though that's where it makes the most sense to use 16-bit samples.

Usage

To use this repository, you'll need a C compiler and the following tools/libraries:

  • GNU Make
  • OpenMP (optional)

If you don't have or don't want to use OpenMP, remove -fopenmp from line 23 of the Makefile and remove encoder_parallel from line 113.

Building

To build the executables, simply run make. They'll be generated into the build directory, which will be automatically created for you.

The following programs will be compiled:

  • encoder - This is an SSDPCM encoder/decoder. It supports all of the modes documented below and is able to both encode and decode files in the format documented below. It supports mono and stereo.

  • encoder_parallel - This is a parallelized SSDPCM encoder. It also supports all of the modes documented below, but it's most useful for the higher bitrate modes. It generates slightly larger files than the normal encoder, because it has to store reference samples for every block in order to be able to encode them in parallel. It can decode files too, but decoding is not parallelized and is a bit slower than with the other program. It supports mono and stereo, too.

  • nes_encoder - This is a special SSDPCM encoder tailored for my NES SSDPCM sample player. It only supports the subset of the modes that are supported by my sample player. It does not support WAV input, only raw unsigned 8-bit PCM (I need to change that). And the output it generates is not a single file, but a bunch of small files to be used in the assembly process. It also simultaneously generates a decoded output file so you can hear the result immediately after encoding. It obviously only supports mono, because the NES is mono.

  • wav_simulator - This is a toy encoder that can be used to experiment with SSDPCM encoding. It lets you specify the number of slopes directly, and allows you to use comb filtering in any of the modes - so it can actually simulate a lot of SSDPCM bitrates that don't actually exist as a mode (yet, or due to being impractical to pack/unpack). The only disadvantage is that being a toy, it doesn't actually generate an encoded file - it internally encodes and decodes the output, then saves the decoded output as a WAV file. It also only supports mono, because it's an older program that was made before I conceived stereo encoding for SSDPCM.

SSDPCM file format specification

SSDPCM can be stored in quite a few ways, as long as it's convenient enough for playback. For instance, nes_encoder.c illustrates a quite unorthodox way of storing SSDPCM, where the bitstream and slope data are split into a bunch of separate binary files, to be later assembled into a NES ROM. This method happens to be quite convenient for making NES sample players using my own tool (https://github.com/Kagamiin/ssplayer-nes).

But in any case, we need a file format with the following characteristics:

  • can store SSDPCM audio in a single file
  • is easy to manipulate
  • can preferably be used with existing libraries (avoid custom containers as much as possible)

File container

SSDPCM is stored in the WAV container file format, following the WAVEFORMATEXTENSIBLE specification.

SSDPCM uses the subformat GUID 50445353-4d43-4b3a-6167-616d69696e7e. Note that subformat GUIDs are written out with the first 3 groups of bytes reversed, so this would be written out as: 0x53, 0x53, 0x44, 0x50, 0x43, 0x4d, 0x3a, 0x4b, 0x61, 0x67, 0x61, 0x6d, 0x69, 0x69, 0x6e, 0x7e.
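
The byte reversal described above can be sketched as follows. The function name and parameter split are illustrative only. They mirror the usual GUID layout of one 32-bit group, two 16-bit groups and 8 trailing bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Serializes a GUID given as its textual groups into the 16-byte on-disk
 * layout: the first three groups little-endian, the last 8 bytes as-is. */
static void guid_to_bytes(uint32_t d1, uint16_t d2, uint16_t d3,
                          const uint8_t d4[8], uint8_t out[16])
{
    assert(d4 != NULL && out != NULL);
    out[0] = (uint8_t)(d1);
    out[1] = (uint8_t)(d1 >> 8);
    out[2] = (uint8_t)(d1 >> 16);
    out[3] = (uint8_t)(d1 >> 24);
    out[4] = (uint8_t)(d2);
    out[5] = (uint8_t)(d2 >> 8);
    out[6] = (uint8_t)(d3);
    out[7] = (uint8_t)(d3 >> 8);
    for (int i = 0; i < 8; i++)
        out[8 + i] = d4[i];
}
```

Calling it with 0x50445353, 0x4d43, 0x4b3a and the trailing bytes 61 67 61 6d 69 69 6e 7e yields the byte sequence listed above, which reads as the ASCII string "SSDPCM:Kagamiin~".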

By convention, it's good practice to use the .AUD extension to name SSDPCM files, in order to not mix them up with normal WAV files.

NOTE: All values are little-endian unless specified.

Field | Description | Length | Accepted values
----- | ----------- | ------ | ---------------
wFormatTag | Legacy format tag from the WAVEFORMATEX header. | 2 bytes | 0xFFFE
nChannels | Number of channels (mono or stereo). | 2 bytes | 1 or 2
nSamplesPerSec | Sampling rate. | 4 bytes | Any unsigned integer
nAvgBytesPerSec | Average bitrate divided by 8, rounded down. | 4 bytes | The expected value.
nBlockAlign | Number of bytes per SSDPCM frame - not per block, read further for more info. | 2 bytes | (bytes_per_block * nChannels) + (bits_per_output_sample * has_reference_sample_on_every_block * nChannels / 8)
wBitsPerSample | Unused - my SSDPCM implementation has fractional bit-per-sample values. | 2 bytes | 0
cbSize | Length of the following extra data after the WAVEFORMATEX header. | 2 bytes | 0x26
wSamplesPerBlock | Number of samples per block. | 2 bytes | Any unsigned integer that's a multiple of the number of samples that fit in bytes_per_read_alignment (see below).
dwChannelMask | Channel bitmask - see WAVEFORMATEXTENSIBLE specification for more information. | 4 bytes | 1 or 2 bits set depending on number of channels
SubFormat | Subformat GUID (GUID-endianness) | 16 bytes | The SSDPCM GUID specified above.
ssdpcm_data_chunk_id | SSDPCM subchunk identifier (big-endian) | 4 bytes | "SsDP"
mode_fourcc | SSDPCM mode identifier (big-endian) | 4 bytes | See below.
num_slopes | Number of distinct slopes in the chosen SSDPCM mode. | 1 byte | See below.
bits_per_output_sample | Determines if the file is based around 8 or 16-bit samples. | 1 byte | 8 or 16
bytes_per_read_alignment | Determines the minimum packing alignment for reading the codewords. | 1 byte | See below.
has_reference_sample_on_every_block | Determines if every block has a reference sample or not. | 1 byte | 0 or 1
block_length | Number of samples per block. | 2 bytes | Same as wSamplesPerBlock
bytes_per_block | Number of bytes per block. | 2 bytes | num_slopes * bits_per_output_sample / 8 + number of bytes used to represent the wSamplesPerBlock codewords (must be a multiple of bytes_per_read_alignment)

Possible values for mode_fourcc:

  • ss1 - 1-bit SSDPCM
    • Implies num_slopes = 2, bytes_per_read_alignment = 1
  • ss1c - 1-bit SSDPCM with comb filtering
    • Implies num_slopes = 2, bytes_per_read_alignment = 1
  • s1.6 - 1.6-bit SSDPCM
    • Implies num_slopes = 3, bytes_per_read_alignment = 1
  • ss2 - 2-bit SSDPCM
    • Implies num_slopes = 4, bytes_per_read_alignment = 1
  • s2.3 - 2.3-bit SSDPCM
    • Implies num_slopes = 5, bytes_per_read_alignment = 7
  • ss3 - 3-bit SSDPCM
    • Implies num_slopes = 8, bytes_per_read_alignment = 3
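
As a sketch of how these parameters combine, the snippet below computes bytes_per_block from the formula in the header table. The struct, enum and function names are hypothetical. The number of codewords per alignment unit for each mode is taken from the code packing descriptions further down (8 per byte for ss1/ss1c, 5 per byte for s1.6, 4 per byte for ss2, 24 per 7 bytes for s2.3, 8 per 3 bytes for ss3).

```c
#include <assert.h>

enum mode_id { M_SS1, M_SS1C, M_S1_6, M_SS2, M_S2_3, M_SS3, M_COUNT };

struct mode_params {
    unsigned num_slopes;
    unsigned bytes_per_read_alignment;
    unsigned codewords_per_alignment; /* codewords packed into one alignment unit */
};

static const struct mode_params MODES[M_COUNT] = {
    [M_SS1]  = {2, 1, 8},
    [M_SS1C] = {2, 1, 8},
    [M_S1_6] = {3, 1, 5},
    [M_SS2]  = {4, 1, 4},
    [M_S2_3] = {5, 7, 24},
    [M_SS3]  = {8, 3, 8},
};

/* bytes_per_block = header bytes + packed codestream bytes. */
static unsigned bytes_per_block(enum mode_id mode, unsigned block_length,
                                unsigned bits_per_output_sample)
{
    const struct mode_params *m = &MODES[mode];
    /* The block length must fill a whole number of alignment units. */
    assert(block_length % m->codewords_per_alignment == 0);
    unsigned header = m->num_slopes * bits_per_output_sample / 8;
    unsigned codestream = block_length / m->codewords_per_alignment
                          * m->bytes_per_read_alignment;
    return header + codestream;
}
```

For example, a 64-sample ss2 block with 8-bit output samples comes out at 4 header bytes plus 16 codestream bytes.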

Bitstream specification

SSDPCM is divided into byte-aligned blocks of samples. One or more blocks are grouped into a frame, according to the number of channels in the stream.

The structure of a block is as follows:

  • Block header
    • Size: num_slopes * bits_per_output_sample / 8
    • Contains the first floor(num_slopes / 2) slopes for that block
  • Codestream
    • Size: bytes_per_block - (num_slopes * bits_per_output_sample / 8)
    • Contains the packed codestream representing the codewords for that block

The structure of a frame is as follows:

  • Reference samples
    • Size: bits_per_output_sample / 8 * nChannels
    • Contains the reference samples for each respective block in this frame
  • Blocks
    • Size: bytes_per_block * nChannels
    • Contains the concatenated blocks as specified above.
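
The frame layout above can be turned into byte offsets as in the sketch below. It assumes that the reference samples are only present when has_reference_sample_on_every_block is 1, which is what the nBlockAlign formula suggests. The names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct frame_layout {
    size_t ref_bytes;   /* bytes of reference samples at the start of the frame */
    size_t frame_bytes; /* whole frame; should match nBlockAlign */
};

static struct frame_layout frame_layout_of(unsigned channels,
                                           unsigned bits_per_output_sample,
                                           unsigned has_ref,
                                           size_t bytes_per_block)
{
    assert(channels == 1 || channels == 2);
    assert(bits_per_output_sample == 8 || bits_per_output_sample == 16);
    struct frame_layout f;
    f.ref_bytes = has_ref ? (size_t)bits_per_output_sample / 8 * channels : 0;
    f.frame_bytes = f.ref_bytes + bytes_per_block * channels;
    return f;
}

/* Offset of the block for a given channel, counted from the frame start. */
static size_t block_offset(const struct frame_layout *f, unsigned channel,
                           size_t bytes_per_block)
{
    return f->ref_bytes + channel * bytes_per_block;
}
```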

Code packing structure for different modes

ss1/ss1c

ss1 and ss1c have identical codestream formats, differing only in how they're encoded and decoded - ss1c is encoded and decoded with an in-loop comb filter, while ss1 isn't.

ss1 and ss1c's codewords can be either 0 or 1, representing only the sign of the single slope. Those codewords are represented as single bits, where 8 codewords are packed into 1 byte in MSB-first bit endianness.

ss2

ss2's codewords can range from 0 to 3. The first two codewords select the two respective slopes with positive magnitude, and the last two codewords select the same two slopes but with negative magnitude.

Those codewords are represented as 2-bit numbers, where 4 codewords are packed into 1 byte in MSB-first bit endianness.
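
A minimal unpacking sketch for the 1-bit and 2-bit modes follows. It reads "MSB-first" as meaning that the first codeword sits in the most significant bits of the byte, which is an interpretation of the text above rather than something checked against the encoder source. The function name is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Unpacks one byte into 8 one-bit or 4 two-bit codewords, first codeword
 * taken from the most significant bits. */
static void unpack_msb_first(uint8_t byte, unsigned bits_per_code, uint8_t *codes)
{
    assert(bits_per_code == 1 || bits_per_code == 2);
    unsigned count = 8 / bits_per_code;
    unsigned mask = (1u << bits_per_code) - 1;
    for (unsigned i = 0; i < count; i++) {
        unsigned shift = 8 - bits_per_code * (i + 1);
        codes[i] = (uint8_t)((byte >> shift) & mask);
    }
}
```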

ss1.6

ss1.6's codewords can range from 0 to 2. The first two codewords represent the positive and negative magnitudes of the single slope, while the last codeword represents a zero magnitude.

Those codewords are range-coded together in groups of five into a single 8-bit value that can range from 0 to 242 (3^5 - 1). Note that when decoding the codewords via standard range code extraction, they must be decoded in reverse order - that is, the fifth code must be extracted first, then the fourth, then the third, then the second, then the first.
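
The range 0 to 242 corresponds to five base-3 digits (3^5 = 243). The sketch below assumes that "range-coded" here means plain base-3 positional packing with the first codeword as the most significant digit, so that repeated modulo/divide yields the fifth codeword first. That is an assumption based on the description above, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Packs five codewords (each 0..2) into one byte, first codeword most
 * significant. */
static uint8_t pack_s1_6(const uint8_t codes[5])
{
    unsigned v = 0;
    for (int i = 0; i < 5; i++) {
        assert(codes[i] < 3);
        v = v * 3 + codes[i];
    }
    return (uint8_t)v;
}

/* Extracts the codewords in reverse order: fifth first, first last. */
static void unpack_s1_6(uint8_t byte, uint8_t codes[5])
{
    assert(byte <= 242);
    for (int i = 4; i >= 0; i--) {
        codes[i] = (uint8_t)(byte % 3);
        byte = (uint8_t)(byte / 3);
    }
}
```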

ss2.3

ss2.3's codewords can range from 0 to 4. The first two codewords represent the positive magnitudes of the two respective slopes, with the next two codewords representing the negative magnitudes of those same slopes, and the last codeword representing a zero magnitude.

Those codewords are arranged into groups of 24. Within each group, the 24 codewords are further subdivided into 8 subgroups of 3 codewords, that are range-coded together into a 7-bit value that can range from 0 to 124.

The 8 subgroups are then efficiently packed together into 7 bytes of data in the following manner: the 7-bit values of the first 7 subgroups are first left-shifted by 1 bit. Then for each of those, one bit of the last subgroup, from the least significant to the most significant, is inserted into their least significant bit slot.
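
The bit interleaving described above can be sketched like this. It takes the eight 7-bit subgroup values as input and does not cover how the three codewords are combined into each value. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Packs eight 7-bit values into 7 bytes: byte i holds value i shifted left
 * by one, with bit i of the eighth value in its least significant bit. */
static void pack_s2_3_group(const uint8_t vals[8], uint8_t out[7])
{
    assert(vals[7] <= 124);
    for (int i = 0; i < 7; i++) {
        assert(vals[i] <= 124);
        out[i] = (uint8_t)((vals[i] << 1) | ((vals[7] >> i) & 1));
    }
}

/* Reverses the packing, reassembling the eighth value bit by bit. */
static void unpack_s2_3_group(const uint8_t in[7], uint8_t vals[8])
{
    vals[7] = 0;
    for (int i = 0; i < 7; i++) {
        vals[i] = (uint8_t)(in[i] >> 1);
        vals[7] = (uint8_t)(vals[7] | ((in[i] & 1) << i));
    }
}
```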

ss3

ss3's codewords can range from 0 to 7. The first four codewords represent the positive magnitudes of the four respective slopes, while the last four codewords represent the negative magnitudes of those same slopes.

Normally, those codewords could simply be represented as 3-bit numbers. However, for the sake of decoding efficiency on ancient hardware, a different approach is taken to maintain inter-byte alignment:

Those codewords are arranged into groups of 8. Within each group, the 8 codewords are further subdivided into 4 subgroups of 2 codewords, that are range-coded together into a 6-bit value that can range from 0 to 63.

The 4 subgroups are then efficiently packed together into 3 bytes of data in the following manner: the 6-bit values of the first 3 subgroups are first left-shifted by 2 bits. Then for each of those, a group of two bits from the last subgroup, from the least significant to the most significant, is inserted into the least 2 significant bit slots, without swapping the order of the two bits.
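
A sketch of this 3-byte packing, again operating on the four 6-bit subgroup values rather than on individual codewords, could look like the following. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Packs four 6-bit values into 3 bytes: byte i holds value i shifted left
 * by two, with bits 2i+1..2i of the fourth value in its two low bits. */
static void pack_ss3_group(const uint8_t vals[4], uint8_t out[3])
{
    assert(vals[3] <= 63);
    for (int i = 0; i < 3; i++) {
        assert(vals[i] <= 63);
        out[i] = (uint8_t)((vals[i] << 2) | ((vals[3] >> (2 * i)) & 3));
    }
}

/* Reverses the packing, reassembling the fourth value two bits at a time. */
static void unpack_ss3_group(const uint8_t in[3], uint8_t vals[4])
{
    vals[3] = 0;
    for (int i = 0; i < 3; i++) {
        vals[i] = (uint8_t)(in[i] >> 2);
        vals[3] = (uint8_t)(vals[3] | ((in[i] & 3) << (2 * i)));
    }
}
```

Because each byte carries one complete 6-bit value in its upper bits, a decoder on an 8-bit CPU can get at the first three subgroups with two shifts per byte.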

ssdpcm's Issues

wav_file.c ignores return value of fread

I am attempting to build SSDPCM using gcc (Ubuntu 13.2.0-4ubuntu3) 13.2.0 on Xubuntu 23.10 on x86-64. I have removed -fopenmp and encoder_parallel as instructed. It fails to build from source:

$ make
[snip]
./src/wav_file.c: In function ‘wav_read_fmt_chunk_’:
./src/wav_file.c:238:9: error: ignoring return value of ‘fread’ declared with attribute ‘warn_unused_result’ [-Werror=unused-result]
  238 |         fread(&chunk->fmt_type, sizeof(uint16_t), 1, w->fp);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./src/wav_file.c:239:9: error: ignoring return value of ‘fread’ declared with attribute ‘warn_unused_result’ [-Werror=unused-result]
  239 |         fread(&chunk->num_channels, sizeof(uint16_t), 1, w->fp);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./src/wav_file.c:240:9: error: ignoring return value of ‘fread’ declared with attribute ‘warn_unused_result’ [-Werror=unused-result]
  240 |         fread(&chunk->sample_rate, sizeof(uint32_t), 1, w->fp);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[snip]
cc1: all warnings being treated as errors
make: *** [Makefile:128: wav_file.o] Error 1

A workaround is to add -Wno-unused-result to CFLAGS_DEV around line 20. Are you interested in a pull request to fix wav_file.c to not ignore unused results?
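
One possible shape for such a fix, shown only as a sketch: wrap each read in a small helper that reports failure instead of discarding the result. The helper name and its return convention are hypothetical and do not come from wav_file.c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: reads exactly one item of `size` bytes.
 * Returns 0 on success and -1 on a short read or I/O error. */
static int read_exact(void *dst, size_t size, FILE *fp)
{
    assert(dst != NULL && fp != NULL);
    return fread(dst, size, 1, fp) == 1 ? 0 : -1;
}
```

Each flagged call, such as the one reading chunk->fmt_type from w->fp, would then become a checked call that returns whatever error code the surrounding function already uses when read_exact fails.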

Makefile links libm before object files

I am attempting to build SSDPCM using gcc (Ubuntu 13.2.0-4ubuntu3) 13.2.0 on Xubuntu 23.10 on x86-64. I have removed -fopenmp and encoder_parallel as instructed. It fails to build from source:

$ make
[snip]
cc -Wall -Wextra -Werror -Wno-unused-result -O3 -Ofast  -D_DEBUG -I./src/include -o ./build/nes_encoder -lm ./build/block_codec.o ./build/sigma.o ./build/sigma_generic.o ./build/sigma_generic_comb.o ./build/sigma_u8_overflow.o ./build/sigma_u8_overflow_comb.o ./build/sigma_u7_overflow.o ./build/sigma_u7_overflow_comb.o ./build/encode_bruteforce.o ./build/encode_binary_search.o ./build/sample_conv.o ./build/sample_filter.o ./build/bit_pack_unpack.o ./build/range_coder.o ./build/nes_encoder.o ./build/wav_file.o ./build/error_strs.o
/usr/bin/ld: ./build/encode_binary_search.o: in function `ssdpcm_encode_binary_search':
encode_binary_search.c:(.text+0x11e): undefined reference to `log2'
collect2: error: ld returned 1 exit status
make: *** [Makefile:125: build/nes_encoder] Error 1

In general, when ld (called by cc, which on my system is gcc) encounters a library specified with -l on the command line, it searches the library only for outstanding symbols from input files that precede it on the command line. (See "Why does the order in which libraries are linked sometimes cause errors in GCC?")

The solution is to move all -l arguments after all object file name arguments. For example, replace this:

$(BUILD_DIR)/encoder: $(objects_enc)
	$(CC) $(CFLAGS) -o $(BUILD_DIR)/encoder -lm $(patsubst %,$(BUILD_DIR)/%,$(objects_enc))

with this:

$(BUILD_DIR)/encoder: $(objects_enc)
	$(CC) $(CFLAGS) -o $(BUILD_DIR)/encoder $(patsubst %,$(BUILD_DIR)/%,$(objects_enc)) -lm

Are you interested in a pull request to make this change to the makefile?
