
Numenta's NuPIC Algorithm API tutorial

The Numenta Platform for Intelligent Computing (NuPIC) is a machine intelligence platform that implements the HTM learning algorithms. HTM is a detailed computational theory of the neocortex. At the core of HTM are time-based continuous learning algorithms that store and recall spatial and temporal patterns. NuPIC is suited to a variety of problems, particularly anomaly detection and prediction of streaming data sources. For more information, see numenta.org or the NuPIC Forum.

In this tutorial, we are going to implement an anomaly detection and forecasting pipeline using NuPIC's base classes.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from tqdm import tqdm_notebook


from nupic.encoders import ScalarEncoder
from nupic.research.spatial_pooler import SpatialPooler
from nupic.research.temporal_memory import TemporalMemory
from nupic.algorithms.anomaly import Anomaly
from nupic.algorithms.sdr_classifier import SDRClassifier

%matplotlib inline
!wget -c http://www-personal.umich.edu/~mejn/cp/data/sunspots.txt
df = pd.read_csv('sunspots.txt', sep='\t', names=['Months', 'SunSpots'], header=None, index_col='Months')
df.head()
        SunSpots
Months
0           58.0
1           62.6
2           70.0
3           55.7
4           85.0

NuPIC

For this example, we will build the NuPIC pipeline out of its bare-bones classes. The goal is to detect anomalies in, and forecast values of, a given data stream. We will be using the monthly sunspots dataset fetched above.

The NuPIC pipeline looks like this:

  • Encoding data into SDR using Encoders
  • Passing encoded data through the Spatial Pooler
  • Running Temporal Memory over the Spatial Pooler’s active columns
  • Extracting predictions using a Classifier
  • Extracting anomalies using Anomaly and AnomalyLikelihood

Encoder

First, we need to set up an encoder. Since we have a bounded dataset, we will be using ScalarEncoder. We could also have gone for the AdaptiveScalarEncoder; the only difference between the two is that AdaptiveScalarEncoder doesn't need to know the min and max values of your data in advance.

When creating a ScalarEncoder, we need to supply the following parameters.

w
The number of bits that are set (the number of 1s) to encode a single value in the output. w has to be an odd number and, unless forced is set, at least 21.
minval
Minimum value in your dataset
maxval
Maximum value in your dataset
periodic
If true, the input value "wraps around", so that minval and maxval map to the same point; picture the phase of a sin() function.
n
Total length of the output, aka number of columns. For example, if you have a layer 32x32, n=32^2
radius
Applies to the input. Two inputs separated by more than the radius have non-overlapping representations; two inputs separated by less than the radius will in general overlap in at least some of their bits. Default is 0. For example, with a radius of 2, the encodings of 2 and 5 (more than 2 apart) will not share any bits.
resolution
Applies to the input. Two inputs separated by greater than, or equal to, the resolution are guaranteed to have different representations. For example, with resolution=1, 1 and 2 will have different representations, but 1 and 1.5 may not. Default is 0.
name
Name of the encoder
verbosity
How much info to output: 0, 1, 2
clipInput
If true, non-periodic inputs smaller than minval or greater than maxval will be clipped to minval/maxval
forced
Skip safety checks

Some formulas

  • Range: $range = maxval - minval$
  • Half-width: $h = (w-1)/2$
  • Resolution: $resolution = radius / w$
  • Output length for periodic input: $n = w * range/radius$ or, equivalently, $n = range/resolution$
  • Output length for non-periodic input: $n = w * range/radius + 2 * h$

As you can see, n, resolution and radius are mutually dependent. Hence, you should provide only one of them.
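
As a quick sanity check of how these formulas tie together, here is a small worked example with hypothetical values (not the sunspots data):

# Hypothetical non-periodic encoder: w=21, minval=0, maxval=100, n=125
w = 21
value_range = 100 - 0                            # range = maxval - minval
h = (w - 1) / 2                                  # half-width = 10
n = 125
radius = float(w) * value_range / (n - 2 * h)    # from n = w*range/radius + 2*h
resolution = radius / w
print(radius, resolution)                        # 20.0 0.952...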

To learn more about encoders, read the official documentation and watch HTM School episodes 0, 1, 2, 3, 4, 5, 6

Let's create a scalar encoder with w=21 and n=125

enc = ScalarEncoder(
    w=21,
    minval=float(df['SunSpots'].min()),  # pass plain scalars, not arrays
    maxval=float(df['SunSpots'].max()),
    periodic=False,
    n=125,
    # radius and resolution are derived from n, so we leave them unset
    name="sunspots",
    verbosity=0,
    clipInput=False,
    forced=False)
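
To get a feel for the output, we can encode a single value and inspect the resulting SDR (a minimal sketch; the exact bit positions depend on your data's min and max):

sdr = enc.encode(df.values[0][0])   # encode the first sunspot count
print(sdr.shape)                    # (125,) - one bit per output position
print(sdr.sum())                    # 21 - exactly w bits are on
print(np.where(sdr == 1)[0])        # indices of the active bits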

Spatial Pooler

Now that we have an encoder, let's set up a Spatial Pooler. The most fundamental function of the spatial pooler is to convert a region’s input into a sparse pattern. This function is important because the mechanism used to learn sequences and make predictions requires starting with sparse distributed patterns.

Spatial Pooler supports the following hyperparameters:

inputDimensions
A sequence representing the dimensions of the input vector. Format is (height, width, depth, ...)
columnDimensions
A sequence representing the dimensions of the columns in the region. Format is (height, width, depth, ...), where each value represents the size of the dimension. For a topology of one dimension with 2000 columns use 2000, or (2000,). For a three dimensional topology of 32x64x16 use (32, 64, 16).
potentialRadius
This parameter determines the extent of the input that each column can potentially be connected to. This can be thought of as the input bits that are visible to each column, or a 'receptiveField' of the field of vision. A large enough value will result in 'global coverage', meaning that each column can potentially be connected to every input bit. This parameter defines a square (or hyper square) area: a column will have a max square potential pool with sides of length $2 * potentialRadius + 1$.
potentialPct
The percent of the inputs, within a column's potential radius, that a column can be connected to. If set to 1, the column will be connected to every input within its potential radius. This parameter is used to give each column a unique potential pool when a large potentialRadius causes overlap between the columns. At initialization time we choose $(2*potentialRadius + 1)^{\#inputDimensions} * potentialPct$ input bits to comprise the column's potential pool.
globalInhibition
If true, then during the inhibition phase the winning columns are selected as the most active columns from the region as a whole. Otherwise, the winning columns are selected with respect to their local neighborhoods. Using global inhibition speeds up the SP by roughly 60x. To learn more about inhibition and topology, watch this HTM School episode.
localAreaDensity
The desired density of active columns within a local inhibition area (the size of which is set by the internally calculated inhibitionRadius, which is in turn determined from the average size of the connected potential pools of all columns). The inhibition logic will ensure that at most N columns remain ON within a local inhibition area, where $N = localAreaDensity * (total number of columns in inhibition area)$
numActiveColumnsPerInhArea
An alternate way to control the density of the active columns. If numActiveColumnsPerInhArea is specified then localAreaDensity must be less than 0, and vice versa. When using numActiveColumnsPerInhArea, the inhibition logic will ensure that at most 'numActiveColumnsPerInhArea' columns remain ON within a local inhibition area (the size of which is set by the internally calculated inhibitionRadius, which is in turn determined from the average size of the connected receptive fields of all columns). When using this method, as columns learn and grow their effective receptive fields, the inhibitionRadius will grow, and hence the net density of the active columns will *decrease*. This is in contrast to the localAreaDensity method, which keeps the density of active columns the same regardless of the size of their receptive fields.
stimulusThreshold
This is a number specifying the minimum number of synapses that must be on in order for a column to turn ON. The purpose of this is to prevent noisy input from activating columns. Specified as a percent of a fully grown synapse.
synPermInactiveDec
The amount by which an inactive synapse is decremented in each round. Specified as a percent of a fully grown synapse.
synPermActiveInc
The amount by which an active synapse is incremented in each round. Specified as a percent of a fully grown synapse.
synPermConnected
The default connected threshold. Any synapse whose permanence value is above the connected threshold is a "connected synapse", meaning it can contribute to the cell's firing.
minPctOverlapDutyCycle
A number between 0 and 1.0, used to set a floor on how often a column should have at least stimulusThreshold active inputs. Periodically, each column looks at the overlap duty cycle of all other columns within its inhibition radius and sets its own internal minimal acceptable duty cycle to: minPctDutyCycleBeforeInh * max(other columns' duty cycles). On each iteration, any column whose overlap duty cycle falls below this computed value will get all of its permanence values boosted up by synPermActiveInc. Raising all permanences in response to a sub-par duty cycle before inhibition allows a cell to search for new inputs when either its previously learned inputs are no longer ever active, or when the vast majority of them have been "hijacked" by other columns.
dutyCyclePeriod
The period used to calculate duty cycles. Higher values make it take longer to respond to changes in boost or synPermConnected. Shorter values make it more unstable and likely to oscillate.
boostStrength
A number greater or equal than 0.0, used to control the strength of boosting. No boosting is applied if it is set to 0. Boosting strength increases as a function of boostStrength. Boosting encourages columns to have similar activeDutyCycles as their neighbors, which will lead to more efficient use of columns. However, too much boosting may also lead to instability of SP outputs.
seed
Seed for our own pseudo-random number generator.
spVerbosity
spVerbosity level: 0, 1, 2, or 3
wrapAround
Determines if inputs at the beginning and end of an input dimension should be considered neighbors when mapping columns to inputs.

Let's create a SpatialPooler to match our encoder's output. The input and column spaces must have the same number of dimensions: if the input is (x, x), the output should be some (y, y). Our input dimensions must cover the encoder's n bits, so with n=125 we use (5, 25); we'll make the column space a bit smaller, (5, 10), i.e. 50 columns.

To learn more about Spatial Poolers, read the BaMI documentation and watch HTM School episodes 7, 8

sp = SpatialPooler(
    inputDimensions=(5, 25),         # 5*25 = 125 = enc.n
    columnDimensions=(5, 10),        # 50 columns
    potentialRadius=16,
    potentialPct=0.5,
    globalInhibition=False,
    localAreaDensity=-1.0,           # disabled; we use numActiveColumnsPerInhArea
    numActiveColumnsPerInhArea=10.0,
    stimulusThreshold=0,
    synPermInactiveDec=0.008,
    synPermActiveInc=0.05,
    synPermConnected=0.10,
    minPctOverlapDutyCycle=0.001,
    dutyCyclePeriod=1000,
    boostStrength=0.0,               # boosting disabled
    seed=-1,
    spVerbosity=0,
    wrapAround=True
)

Also, in order to use the SpatialPooler we need to create an array whose size equals the number of columns; the SP will store its output there.

active_array = np.zeros(np.prod(sp.getColumnDimensions()))
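
As a quick sketch (using learn=False so we don't disturb the SP before the main training loop below), we can push one encoded value through the SP and look at the winning columns:

demo_input = enc.encode(df.values[0][0])
sp.compute(inputVector=demo_input, learn=False, activeArray=active_array)
print(np.where(active_array == 1)[0])   # indices of the ~10 active columns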

Temporal Memory

The next step is Temporal Memory. It takes the following parameters:

columnDimensions
Dimensions of the column space
cellsPerColumn
Number of cells per column
activationThreshold
If the number of active connected synapses on a segment is at least this threshold, the segment is said to be active.
initialPermanence
Initial permanence of a new synapse
connectedPermanence
If the permanence value for a synapse is greater than this value, it is said to be connected.
minThreshold
If the number of potential synapses active on a segment is at least this threshold, it is said to be "matching" and is eligible for learning.
maxNewSynapseCount
The maximum number of synapses added to a segment during learning.
permanenceIncrement
Amount by which permanences of synapses are incremented during learning.
permanenceDecrement
Amount by which permanences of synapses are decremented during learning.
predictedSegmentDecrement
Amount by which segments are punished for incorrect predictions.
maxSegmentsPerCell
The maximum number of segments per cell.
maxSynapsesPerSegment
The maximum number of synapses per segment.
seed
Seed for the random number generator.

Let's initialize a TM whose column dimensions match the SP's 50 output columns, with a depth of 5 cells per column. The TM workflow is as follows: we feed it the indices of the active columns computed by the SP, and it updates its internal active and predictive cells.

To learn more about TemporalMemory, check out HTM School episode 11.

tm = TemporalMemory(
    columnDimensions=(50,),      # must match the SP's column count (5*10 = 50)
    cellsPerColumn=5,
    activationThreshold=13,
    initialPermanence=0.21,
    connectedPermanence=0.50,
    minThreshold=10,
    maxNewSynapseCount=20,
    permanenceIncrement=0.10,
    permanenceDecrement=0.10,
    predictedSegmentDecrement=0.0,
    maxSegmentsPerCell=255,
    maxSynapsesPerSegment=255,
    seed=42
)
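
A single hedged step looks like this (assuming active_array still holds the SP output from the sketch above; note that even with learn=False this advances the TM's internal state, so we reset it afterwards):

tm.compute(np.where(active_array == 1)[0], learn=False)
print(len(tm.getActiveCells()))      # cells active at this timestep
print(len(tm.getPredictiveCells()))  # cells predicted for the next timestep
tm.reset()                           # clear the demo state before the real run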

Anomaly Detector

The Anomaly class scores each timestep by comparing what the TM predicted against what actually became active. The following parameters are available:

slidingWindowSize
How many elements are averaged; enables a moving average on the final anomaly score
mode
How to compute the anomaly: 'pure' - the raw anomaly score; 'likelihood' - the probability of receiving this value given the history of anomaly scores; 'weighted' - the likelihood multiplied by the raw anomaly score that was used to generate it
binaryAnomalyThreshold
If set to a value in [0, 1], the anomaly score will be discretized to 1/0 (1 if >= binaryAnomalyThreshold). The transformation is applied after the moving average is computed.
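
Let's create a detector that returns the raw ('pure') anomaly score:
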
anomaly = Anomaly(
    slidingWindowSize=None,
    mode='pure',
    binaryAnomalyThreshold=None
)
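
In 'pure' mode the score is the fraction of currently active columns that were not predicted at the previous step. A tiny hand-made example with hypothetical column indices:

# 3 active columns, of which 2 were predicted -> score = 1 - 2/3
score = anomaly.compute(activeColumns=[1, 2, 6],
                        predictedColumns=[1, 2, 3, 4, 5])
print(score)   # 0.333...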

SDR Classifier

The classifier is used not only to predict labels, but to perform regression as well. After initializing the classifier, we need to feed it a very particular pattern of data, from which it learns to output predicted values. The reason for this peculiar workflow is that it's not possible to decode the outputs of the Spatial Pooler and Temporal Memory back into input values, so the classifier learns the mapping from cell activity to encoder buckets directly.

To initialize the classifier, you can use the following parameters:

steps
Sequence of the different steps of multi-step predictions to learn
alpha
The alpha used to adapt the weight matrix during learning. A larger alpha results in faster adaptation to the data.
actValueAlpha
Used to track the actual value within each bucket. A lower actValueAlpha results in longer term memory
verbosity
verbosity: 0, 1, 2

In order to get a prediction for the next value, we need to train the classifier by calling its compute method with the following arguments:

recordNum
Record number of this input pattern. Record numbers normally increase sequentially by 1 each time unless there are missing records in the dataset. Knowing this information ensures that we don't get confused by missing records.
patternNZ
List of the active indices from the layer below. When the input is from TemporalMemory, this list should be the indices of the active cells.
classification
Dict of the classification information where:
  • bucketIdx: index of the encoder bucket, could be obtained by calling getBucketIndices from encoder
  • actValue: actual value going into the encoder
  • Classification could be None for inference mode.
learn
if true, learn this sample
infer
if true, perform inference

If you just want to do inference, you can call the infer method instead.

Both methods return a dict keyed by prediction step, plus an 'actualValues' entry giving the value associated with each bucket. Let's create a classifier that predicts one step ahead:

classifier = SDRClassifier(steps=[1], alpha=0.1, actValueAlpha=0.1, verbosity=0)
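
As a hedged sketch of the return shape (the pattern and values here are made up, and the exact numbers depend on training history):

result = classifier.compute(recordNum=0, patternNZ=[1, 5, 9],
                            classification={"bucketIdx": 4, "actValue": 58.0},
                            learn=True, infer=True)
print(result["actualValues"])  # one entry per bucket seen so far
print(result[1])               # likelihood of each bucket, 1 step ahead

# re-create the classifier so this demo doesn't interfere with the real run
classifier = SDRClassifier(steps=[1], alpha=0.1, actValueAlpha=0.1, verbosity=0)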

Now, let's push some data through our pipeline and see how it goes

anomaly_scores = []
predictions = []
next_t_columns = []  # predictive cells from the previous timestep
for x in tqdm_notebook(xrange(df.shape[0])):
    # 1. Encode the raw value and run the Spatial Pooler
    sp.compute(inputVector=enc.encode(df.values[x][0]), learn=True, activeArray=active_array)
    # 2. Run Temporal Memory over the SP's active columns
    tm.compute(np.where(active_array == 1)[0])
    # 3. Compare what is active now against what was predicted last step
    #    (we pass cell indices on both sides, which keeps the comparison consistent)
    anomaly_score = anomaly.compute(activeColumns=tm.getActiveCells(),
                                    predictedColumns=next_t_columns)
    next_t_columns = tm.getPredictiveCells()
    # 4. Train the classifier and get the 1-step-ahead prediction
    cl_ = classifier.compute(
        recordNum=x,
        patternNZ=tm.getActiveCells(),
        classification={
            "bucketIdx": enc.getBucketIndices(df.values[x][0])[0],
            "actValue": df.values[x][0]
        },
        learn=True,
        infer=True
    )
    # pick the value whose bucket has the highest likelihood
    top_prediction = sorted(zip(cl_[1], cl_["actualValues"]), reverse=True)[0][1]
    anomaly_scores.append(anomaly_score)
    predictions.append(top_prediction)
df['Prediction'] = predictions

# Flag points whose anomaly score is more than 2 standard deviations from the mean
anomaly_threshold = 2
anomaly_scores = np.asarray(anomaly_scores)
z_scores = np.abs((anomaly_scores - np.mean(anomaly_scores)) / np.std(anomaly_scores, ddof=1))
anomalies = np.where(z_scores > anomaly_threshold)[0]
# Plot
fig = plt.figure(figsize=(80,10))
ax = plt.gca()

# The index is an integer month count, not a datetime,
# so no date formatting is needed on the x-axis

# Remove Spines
ax.spines["top"].set_visible(False)    
ax.spines["bottom"].set_visible(False)    
ax.spines["right"].set_visible(False)    
ax.spines["left"].set_visible(False)  

# Set ticks
ax.get_xaxis().tick_bottom()    
ax.get_yaxis().tick_left()  

# Set X-limit
plt.xlim(df.iloc[0].name, df.iloc[-1].name)


# Draw horizontal lines
for y in xrange(0, 300, 25):    
    plt.plot((df.iloc[0].name, df.iloc[-1].name), (y, y), "--", lw=0.5, color="#333333", alpha=0.3) 

plt.xlabel("Date", fontsize=16)  
plt.ylabel("Sun Spots", fontsize=16)  
plt.title("Sunspot Anomaly Detection and Forecasting with NuPIC HTM")
# Plot data
ax.plot(df['SunSpots'], 'k', label="Sun Spots")
ax.plot(df['Prediction'], 'r--', label="Predicted Values")
ax.scatter(df.iloc[anomalies].index.values,
           df.iloc[anomalies]['SunSpots'].values,
           edgecolors='#fc7e7e', facecolors='#f44242', label='Anomalies', lw=1, s=200, alpha=0.8)
ax.legend(frameon=False)
plt.savefig('NuPIC_Tutorial.png', format='png', dpi=300)

[Output plot: monthly sunspot counts (black), one-step predictions (red, dashed), and flagged anomalies (highlighted markers)]
