Giter Club home page Giter Club logo

huffman's Introduction

Huffman coding

Huffman coding is an optimal way of encoding a string x = "..."1 in fewest bits, given that we wish each original character in the string to appear at one unique position (i.e. we are not allowed to encode substrings as single characters or represent them overlapping or other tricks sometimes used in compression). This restriction is sometimes important in algorithms where we need to be able to index into the string efficiently, but that is a topic for another day (for example a topic for GSA) and requries a few more tricks.

The simplest way to represent a string is with a fixed number of bits per character. The ASCII character set, for example uses 7 bits per character, although they are usually stored in 8-bit bytes, originally using the extra bit as an error code.

If you do this, and your character set is n large, you need log n bits per character, because that is how many bits you need to give each character a unique number.

This is fine if all characters are equally likely to appear in a string, but they aren't. There are special characters that you practically never see in real life. And with Unicode that aims at being able to handle all the worlds languages and quite a few invented languages as well, you only ever see a tiny sub-set of all characters.

It would be more efficient to use few bits for letters you see frequenty and then pay for that by using more bits for rare characters. You still need to map each character to a unique number, but they don't have to be represented in the same number of bits.

Unicode itself only specifies the mapping from characters to numbers, but encodings deal with how strings are stored, and an encoding such as UTF-8 puts the letters we are familiar with in the West, such as the ASCII characters and letters in latin-based alphabets, in one byte at the cost of using two or more for other characters.

Huffman encoding doesn't consern itself with bytes but individual bits and is a way of mapping each character to a sequence of bits, such that the total number of bits to represent is string is minimised.

The algorithm is described in the book (page 631), but in this exercise you get to implement some of it yourself.

Exercise: In src/huffman.py impelement the encoding(x: str) -> Tree function that builds a Huffman tree (basically a heap) from a string.

Exercise: Implement the missing bit of the build_encoding_table(...) function. You need to traverse the Huffman tree to extract the bit pattern that is matched to each letter.

Exercise: Implement the missing part of decode(x: bits, enc: Encoding) -> str that recovers the original string from an encoding.

Footnotes

  1. A string in general terms, it can be any sequence over a finite set of characters. But think strings as in English text or DNA or such. What you think of as a string will work with Huffman coding. โ†ฉ

huffman's People

Contributors

mailund avatar

Watchers

 avatar

huffman's Issues

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.