Currently already using the lowest-fidelity settings possible (8 kHz, mono, 8-bit). But

Compress PCM data about dogwatch HOT 3 CLOSED

pzl commented on August 25, 2024

Compress PCM data


Comments (3)

pzl commented on August 25, 2024

Our audio data situation is unusual in that quality is not really that important, and the audio contains a lot of dead air. Compression formats such as FLAC may be optimized for audio intended for listening (music, podcasts, etc.) rather than for computing.

The largest gain may come from compressing long periods of silence as much as possible.


pzl commented on August 25, 2024

The first idea is to compress a long run of silent bytes (0x80, or 128) into a sequence that instead says "the next X bytes are 0x80". In raw PCM, this run takes X bytes; specifying the byte that repeats and then the length of the repetition might take only 2 or 3 bytes. For constantly changing values, of course, this would balloon the file size two to three times: [byte Y, length: 1], [byte Z, length: 1], etc. instead of [Y], [Z], ...
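A minimal Python sketch of this naive scheme, assuming unsigned 8-bit samples and a one-byte length field (so runs longer than 255 are split). `encode_pairs` is a hypothetical name, not something from dogwatch. It shows both sides of the trade-off: silence shrinks dramatically, while constantly changing data doubles in size (a two-byte length field would make it triple).

```python
def encode_pairs(samples: bytes) -> bytes:
    """Naive run-length encoding: every run becomes a (value, length) pair.

    The length is one byte, so runs longer than 255 samples are split.
    """
    out = bytearray()
    i = 0
    while i < len(samples):
        value = samples[i]
        run = 1
        # Extend the run while the value repeats and the length still fits.
        while i + run < len(samples) and samples[i + run] == value and run < 255:
            run += 1
        out += bytes((value, run))
        i += run
    return bytes(out)


silence = bytes([0x80] * 1000)          # 1000 samples at the midline
changing = bytes([0x7F, 0x81] * 500)    # value changes on every sample
print(len(silence), "->", len(encode_pairs(silence)))    # 1000 -> 8
print(len(changing), "->", len(encode_pairs(changing)))  # 1000 -> 2000
```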

So the whole file should not use this system of specifying how long each byte lasts. It needs to intermix bytes as data values and bytes as length specifiers, which means we need a way of switching contexts. Since 0x00 through 0xFF are all valid PCM data values, we may need to reserve one to have a special meaning, and we might lose that value as valid data. It makes the most sense to use an uncommon value, such as 0x00 or 0xFF. Picking an end of the range makes sense, as it only slightly alters where audio clipping happens; a single-value difference shouldn't be noticeable.

To specify repeating data, the first byte is 0x00 as a "signal" byte. The following byte is the value to repeat, and the byte after that is the number of samples that value should repeat for.
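The token format described above can be sketched as a decoder. This is an illustration under assumptions, not dogwatch code: `decode` and `SIGNAL` are hypothetical names, and the repeat count is assumed to be a single byte (1 to 255).

```python
SIGNAL = 0x00  # reserved "signal" byte that starts a run token


def decode(stream: bytes) -> bytes:
    """Expand a stream mixing literal samples with 3-byte run tokens.

    A run token is: SIGNAL, value to repeat, number of samples.
    Every other byte is a literal sample.
    """
    out = bytearray()
    i = 0
    while i < len(stream):
        if stream[i] == SIGNAL:
            # A token needs two more bytes after the signal byte.
            if i + 2 >= len(stream):
                raise ValueError(f"truncated run token at offset {i}")
            value, count = stream[i + 1], stream[i + 2]
            out += bytes([value]) * count
            i += 3
        else:
            out.append(stream[i])
            i += 1
    return bytes(out)


print(decode(bytes([0x7F, 0x00, 0x80, 0x03, 0x81])).hex(" "))  # 7f 80 80 80 81
```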

The value 0x00 can still be used as data via the byte sequence 00 00 01, which defines a single 0x00 sample. This is a 3x size increase for that sample, but 255 samples of silence drop to 3 bytes (an 85x decrease!).
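An encoder for the same format might look like the following sketch. Assumptions not in the original: `encode`, `SIGNAL`, `MAX_RUN` and `MIN_RUN` are hypothetical names, and short runs are left as literal bytes because a 3-byte token only saves space once a run reaches 4 samples. Any run of 0x00 is always tokenized, which is what makes the 00 00 01 escape work.

```python
SIGNAL = 0x00   # reserved "signal" byte
MAX_RUN = 255   # repeat count must fit in one byte
MIN_RUN = 4     # hypothetical threshold: shorter runs are cheaper as literals


def encode(samples: bytes) -> bytes:
    """Encode 8-bit PCM as literal bytes plus (SIGNAL, value, count) tokens."""
    out = bytearray()
    i = 0
    while i < len(samples):
        value = samples[i]
        run = 1
        while i + run < len(samples) and samples[i + run] == value and run < MAX_RUN:
            run += 1
        if value == SIGNAL or run >= MIN_RUN:
            # Runs of the reserved value are always tokenized so a 0x00
            # sample is never mistaken for a signal byte; a lone 0x00
            # therefore becomes 00 00 01.
            out += bytes((SIGNAL, value, run))
        else:
            out += samples[i:i + run]  # short run: emit as literal samples
        i += run
    return bytes(out)


print(encode(bytes([0x80] * 255)).hex(" "))  # 00 80 ff
print(encode(b"\x00").hex(" "))              # 00 00 01
```

Runs longer than 255 samples are split into several tokens, so 1000 samples of silence become four tokens (12 bytes).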


pzl commented on August 25, 2024

The above case is lossless "compression".

We can reduce the size further by employing a little lossy compression. Given proper mic gain and placement close to the animal, barks should have quite significant amplitude, while background noise might oscillate +/- 2 around the midline (128). Do we care about the background noise? Nope, not so far. Consecutive samples within 126 to 130 could therefore be clamped to 128. This produces longer stretches of silence, lowering the file size at the cost of losing information about quiet sounds.

This could be made a setting, along with the amount of lossiness (samples within +/- X of the midline become silence).
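A sketch of this lossy pre-pass, meant to run before the run-length encoding. `SquelchSettings`, `squelch` and the `tolerance` field are hypothetical names standing in for the proposed setting; the default tolerance of 2 maps 126 to 130 onto 128. For simplicity it snaps each sample independently rather than checking that neighboring samples are also quiet, so quiet samples inside a loud passage (e.g. near zero crossings) are altered too; a stricter version could snap only runs of some minimum length.

```python
from dataclasses import dataclass

MIDLINE = 0x80  # silence in unsigned 8-bit PCM


@dataclass
class SquelchSettings:
    """Hypothetical settings for the lossy pre-pass."""
    enabled: bool = True
    tolerance: int = 2  # samples within MIDLINE +/- tolerance become MIDLINE


def squelch(samples: bytes, settings: SquelchSettings) -> bytes:
    """Snap near-silent samples to the midline so they form longer runs."""
    if not settings.enabled:
        return samples
    low = MIDLINE - settings.tolerance
    high = MIDLINE + settings.tolerance
    return bytes(MIDLINE if low <= s <= high else s for s in samples)


noisy = bytes([126, 127, 130, 131, 125, 200])
print(list(squelch(noisy, SquelchSettings())))  # [128, 128, 128, 131, 125, 200]
```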
