
Comments (3)

pzl commented on August 25, 2024

Our audio data is unusual in two ways: fidelity is not very important, and the recordings contain a lot of dead air. Compression formats such as FLAC may be optimized for other cases, where the audio is intended for listening (music, podcasts, etc.) rather than for computing on.

The largest gain may come from compressing long periods of silence as much as possible.

from dogwatch.

pzl commented on August 25, 2024

The first idea is to compress a long run of silent bytes (0x80, or 128) into a sequence that instead says "the next X bytes are 0x80". In raw PCM, that run takes X bytes. Specifying the byte that repeats, followed by the length of the repetition, might take only 2 or 3 bytes. For constantly changing values, of course, this would balloon the file to two or three times its size, depending on whether each entry takes 2 or 3 bytes: [byte Y, length: 1], [byte Z, length: 1], etc. instead of [Y], [Z], ...
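To make the trade-off concrete, here is a minimal sketch of the naive scheme in its two-byte form, where every entry is a (value, length) pair. The function name `pair_encode` and the 255 cap on the length byte are assumptions for illustration, not anything from the dogwatch code.

```python
def pair_encode(samples: bytes) -> bytes:
    """Naive run-length encoding: every run becomes (value, length).

    Assumes 8-bit unsigned PCM and a single length byte, so a run is
    capped at 255 samples and longer runs are split.
    """
    out = bytearray()
    i = 0
    while i < len(samples):
        value = samples[i]
        run = 1
        # Extend the run while the same value repeats, up to the 255 cap.
        while i + run < len(samples) and samples[i + run] == value and run < 255:
            run += 1
        out += bytes((value, run))
        i += run
    return bytes(out)


silence = bytes([0x80]) * 200      # 200 samples of dead air
changing = bytes(range(100, 200))  # 100 samples, no two neighbours equal

print(len(silence), "->", len(pair_encode(silence)))
print(len(changing), "->", len(pair_encode(changing)))
```

Silence shrinks to a single pair, while the constantly changing input doubles in size, which is why the scheme cannot be applied to the whole file as-is.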

So the whole file should not follow this system of specifying how long each byte lasts. It needs to intermix bytes that are data values with bytes that are length specifiers. To do this, we need a way of switching contexts. Since 0x00 through 0xFF (0 through 255) are all valid PCM data values, we may need to annex one to give it a special meaning. This means we might lose that value as a valid data value. It makes the most sense to use an uncommon value, such as 0x00 or 0xFF. Picking an end of the range makes sense, as it only slightly alters where audio clipping happens. A difference of a single value shouldn't be noticeable.

To specify repeating data, the byte 0x00 comes first as a "signal" byte. The following byte is the value to repeat. The byte after that is the number of samples that value should repeat for.

The byte 0x00 can be reclaimed with the byte sequence 00 00 01, which defines a single 0x00 data value. This is a 3x size increase for that byte, but 255 samples of silence drop to 3 bytes (an 85x decrease!).
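A rough sketch of the signal-byte format described above, assuming 8-bit unsigned PCM. The names `rle_encode`, `rle_decode`, and the `MIN_RUN` cut-off (the run length below which a run is written literally) are hypothetical choices for this example, not part of the proposal.

```python
SIGNAL = 0x00   # signal byte: the next two bytes are (value, count)
MAX_RUN = 255   # the count is a single byte
MIN_RUN = 4     # assumption: shorter runs are no smaller when encoded


def rle_encode(samples: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(samples):
        value = samples[i]
        run = 1
        while i + run < len(samples) and samples[i + run] == value and run < MAX_RUN:
            run += 1
        if value == SIGNAL or run >= MIN_RUN:
            # A literal 0x00 must always be escaped, even for a run of one.
            out += bytes((SIGNAL, value, run))
        else:
            out += bytes([value]) * run
        i += run
    return bytes(out)


def rle_decode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == SIGNAL:
            if i + 2 >= len(data):
                raise ValueError("truncated run at offset %d" % i)
            out += bytes([data[i + 1]]) * data[i + 2]
            i += 3
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


print(rle_encode(bytes([0x80]) * 255).hex(" "))  # 00 80 ff
print(rle_encode(bytes([0x00])).hex(" "))        # 00 00 01
```

Because every 0x00 sample is escaped rather than dropped, this variant stays lossless; the cost is 3 bytes for each isolated 0x00 in the input.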


pzl commented on August 25, 2024

The above case is lossless "compression".

We can reduce the size further by employing a little lossy compression. Given proper mic gain and placement close to the animal, barks should have quite significant amplitude, while background noise might only oscillate +/- 2 around the midline. Do we care about the background noise? Nope, not so far. For continuous samples within 126 to 130, we could clamp the value to 128. This increases the chances of long stretches of silence, lowering the file size at the cost of losing information about quiet noises.

This could be a setting, as could the amount of lossiness (the X in "+/- X turns to silence").
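A minimal sketch of the clamping step, assuming 8-bit unsigned PCM with the midline at 128. The function name `squelch` and the `threshold` parameter are hypothetical. Note one simplification: this version clamps every sample near the midline on its own, so it also flattens zero crossings inside a loud bark; restricting it to continuous quiet stretches would need an extra minimum-run check.

```python
MIDLINE = 128  # silence in 8-bit unsigned PCM (0x80)


def squelch(samples: bytes, threshold: int = 2) -> bytes:
    """Replace every sample within +/- threshold of the midline with 128."""
    return bytes(
        MIDLINE if abs(s - MIDLINE) <= threshold else s
        for s in samples
    )


noisy = bytes([126, 127, 128, 129, 130, 131, 125, 200, 10])
print(list(squelch(noisy)))
# [128, 128, 128, 128, 128, 131, 125, 200, 10]
```

Running this before a run-length encoder turns low-level noise into long runs of 0x80; a `threshold` of 0 leaves the data untouched, which keeps the lossy step optional.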

