bitinformation's Introduction

Bit-Information-Content Tool

Bitshaper is a tool for comparing data while ignoring the bits that do not carry significant information. It can also be used to explore the information content of the data.

Quick start

This section shows some examples of the most common applications.

Install

python3 -m pip install bitinformation

Compare GRIB files

The comparison is made with a mask. The mask is either calculated by the analyser, read from the file, or passed with the mask parameter.
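The idea of a masked comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Bitshaper's implementation: the helper names `float_to_bits` and `equal_under_mask` and the mask value `0xFFFF0000` are hypothetical, and the values are treated as IEEE 754 single-precision floats.

```python
import struct

def float_to_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def equal_under_mask(a, b, mask: int) -> bool:
    """Compare two sequences of floats, looking only at the bits set in mask."""
    if len(a) != len(b):
        return False
    return all(
        (float_to_bits(x) & mask) == (float_to_bits(y) & mask)
        for x, y in zip(a, b)
    )

# Hypothetical mask: keep the sign, the exponent and the 7 leading mantissa bits.
mask = 0xFFFF0000

a = [1.0, 2.5, 3.14159]
b = [1.00001, 2.50001, 3.14160]

print(equal_under_mask(a, b, mask))        # True: the masked bits agree
print(equal_under_mask(a, b, 0xFFFFFFFF))  # False: a bit-exact comparison fails
```

With a mask, the two arrays count as equal although their trailing mantissa bits differ.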

In the usual case you just want to compare two files. This assumes that you already have a configuration.

bitshaper.py --compare file1.grib file2.grib --preprocessor raw

If you don't have a configuration file yet, you can create one by running the tool with the --use-analyser and --add-missing-parameters arguments.

bitshaper.py --compare file1.grib file2.grib --preprocessor raw --use-analyser --add-missing-parameters

Compute bitsPerValue in Simple Packing

bitshaper.py --compare file1.grib file2.grib

Explore data

If you want to analyse the levels of each parameter, you must first define the primary key. This is a set of keys from different sources, e.g. MARS keys, Analyser and Preprocessor parameters. Then, with --value-key, you define which values you want to record. The data is then exported to a CSV file.

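tool=bitshaper.py   # the tool as invoked in the examples above
out_dir=.           # output directory (assumption: adjust as needed)
params="--primary-key short_name stream analyser_precision levelist preprocessor_bits_per_value "
params+="--value-key mask nbits_used "
params+="--csv $out_dir/explore.csv "
$tool $params --stats file1.grib file2.grib file3.grib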

Algorithm behind the scene

The method calculates how much information content each bit in a number has. In essence, it is a statistical analysis of bit sequences. For example, according to this approach, random sequences of binary values and sequences consisting only of ones or only of zeros contain no information. Once a sequence has structure, the information content is non-zero.

[0101010101010101] # high information content (each bit is determined by its predecessor)
[1111111111111111] # zero information content
[0000000000000000] # zero information content
[0000111100001111] # lower, but non-zero information content (runs of equal bits)

The following example explains the algorithm step by step and, where possible, without formulas.

In the first step, assume there is a sequence S of 4-bit numbers. The sequence S is split into two arrays A and B: A is created by removing the last element from S, and B by removing the first element. The example below uses Python notation to illustrate this.

S = [0, 1, 2, 3, 4, 5, 6, 7]

A = S[:-1] = [0, 1, 2, 3, 4, 5, 6]
B = S[1: ] = [1, 2, 3, 4, 5, 6, 7]

The next step is presented as a spreadsheet. The example works with 4-bit numbers, so each bit can be identified by an index i in [0, 1, 2, 3], where i = 0 is the least significant bit. For illustration, the table is expanded so that every pair (A, B) is listed once for each i. A' and B' are the binary representations of the columns A and B, respectively. The columns A'[i] and B'[i] are the bits at position i.

i A B A'=bin(A) B'=bin(B) A'[i] B'[i] seq = A'[i]B'[i]
0 0 1 0000 0001 0 1 01
0 1 2 0001 0010 1 0 10
0 2 3 0010 0011 0 1 01
0 3 4 0011 0100 1 0 10
0 4 5 0100 0101 0 1 01
0 5 6 0101 0110 1 0 10
0 6 7 0110 0111 0 1 01
1 0 1 0000 0001 0 0 00
1 1 2 0001 0010 0 1 01
1 2 3 0010 0011 1 1 11
1 3 4 0011 0100 1 0 10
1 4 5 0100 0101 0 0 00
1 5 6 0101 0110 0 1 01
1 6 7 0110 0111 1 1 11
2 0 1 0000 0001 0 0 00
2 1 2 0001 0010 0 0 00
2 2 3 0010 0011 0 0 00
2 3 4 0011 0100 0 1 01
2 4 5 0100 0101 1 1 11
2 5 6 0101 0110 1 1 11
2 6 7 0110 0111 1 1 11
3 0 1 0000 0001 0 0 00
3 1 2 0001 0010 0 0 00
3 2 3 0010 0011 0 0 00
3 3 4 0011 0100 0 0 00
3 4 5 0100 0101 0 0 00
3 5 6 0101 0110 0 0 00
3 6 7 0110 0111 0 0 00
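The table above can be generated with a short Python sketch. The names `NBITS`, `bit` and `rows` are chosen here for illustration only; the columns A'[i] and B'[i] are folded into the seq column.

```python
S = [0, 1, 2, 3, 4, 5, 6, 7]
NBITS = 4  # the example works with 4-bit numbers

A, B = S[:-1], S[1:]

def bit(value: int, i: int) -> int:
    """Return bit i of value, where i = 0 is the least significant bit."""
    return (value >> i) & 1

rows = []
for i in range(NBITS):
    for a, b in zip(A, B):
        seq = f"{bit(a, i)}{bit(b, i)}"  # bit i of A followed by bit i of B
        rows.append((i, a, b, format(a, "04b"), format(b, "04b"), seq))

# Print the first rows: i, A, B, A', B', seq
for row in rows[:3]:
    print(*row)
```

Each of the 7 pairs (A, B) appears once per bit position, which gives the 28 rows of the table.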

The next step is to group the table by the (i, seq) columns and count the occurrences. p is the probability with which a sequence occurs at bit position i.

i seq count p = count/7
0 00 0 0.000
0 01 4 0.571
0 10 3 0.429
0 11 0 0.000
1 00 2 0.286
1 01 2 0.286
1 10 1 0.143
1 11 2 0.286
2 00 3 0.429
2 01 1 0.143
2 10 0 0.000
2 11 3 0.429
3 00 7 1.000
3 01 0 0.000
3 10 0 0.000
3 11 0 0.000
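The grouping and counting step could be written as follows. This is a minimal sketch using `collections.Counter`; the function name `pair_probabilities` is an assumption made for this example.

```python
from collections import Counter

S = [0, 1, 2, 3, 4, 5, 6, 7]
NBITS = 4
A, B = S[:-1], S[1:]

def pair_probabilities(i: int) -> dict:
    """Probability of each bit pair (00, 01, 10, 11) at bit position i."""
    counts = Counter(((a >> i) & 1, (b >> i) & 1) for a, b in zip(A, B))
    n = len(A)  # 7 pairs in this example
    return {f"{x}{y}": counts[(x, y)] / n for x in (0, 1) for y in (0, 1)}

for i in range(NBITS):
    probs = pair_probabilities(i)
    print(i, {seq: round(p, 3) for seq, p in probs.items()})
```

Pairs that never occur get a count of zero, so all four sequences are present for every bit position.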

In the last step we compute the mutual information. To do that we take the columns i and p from the table and reshape them so that we have the probabilities for each sequence, i.e., p00, p01, p10, p11, in separate columns. This allows us to continue our example as a spreadsheet.

The formula below computes the mutual information M, which says how much information a bit contains. Terms with a zero probability are taken as zero.

M' = p00 * log(p00 / (p00 + p01) / (p00 + p10)) +
     p01 * log(p01 / (p00 + p01) / (p01 + p11)) +
     p10 * log(p10 / (p10 + p11) / (p00 + p10)) +
     p11 * log(p11 / (p10 + p11) / (p01 + p11))

M = M' / log(2)

i p00 p01 p10 p11 M
0 0.000 0.571 0.429 0.000 0.985
1 0.286 0.286 0.143 0.286 0.020
2 0.429 0.143 0.000 0.429 0.522
3 1.000 0.000 0.000 0.000 0.000
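The formula can be checked with a small Python function. This is a sketch of the formula as written above, not necessarily the code used by the tool; the name `mutual_information` is hypothetical, and terms with zero probability are skipped, following the usual convention that 0 * log(0) = 0.

```python
import math

def mutual_information(p00: float, p01: float, p10: float, p11: float) -> float:
    """Mutual information in bits between a bit in A and the same bit in B."""
    joint = {(0, 0): p00, (0, 1): p01, (1, 0): p10, (1, 1): p11}
    pa = {0: p00 + p01, 1: p10 + p11}  # marginal probabilities of the bit in A
    pb = {0: p00 + p10, 1: p01 + p11}  # marginal probabilities of the bit in B
    m = 0.0
    for (x, y), p in joint.items():
        if p > 0:  # skip zero terms: 0 * log(0) is taken as 0
            m += p * math.log(p / (pa[x] * pb[y]))
    return m / math.log(2)  # convert from nats to bits

# Pair counts per bit position i, taken from the counting table (7 pairs in total).
counts = {0: (0, 4, 3, 0), 1: (2, 2, 1, 2), 2: (3, 1, 0, 3), 3: (7, 0, 0, 0)}
for i, c in counts.items():
    print(i, round(mutual_information(*(n / 7 for n in c)), 3))
```

For bit position 3, where only the sequence 00 occurs, the result is 0: a constant bit carries no information. Since both variables are single bits, the result can never exceed 1.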

bitinformation's People

Contributors

joobog

bitinformation's Issues

Is this library working?

Hi!

I'm trying to use this library, so, as written on the PyPI page, I tried:

import numpy as np
import bitinformation.bitinformation as bit
data = np.random.rand(10000)  
bi = bit.BitInformation()
bi.bitinformation(data)

but the BitInformation class doesn't seem to exist here.
Can you help me get it working?

Thanks
