
Numenta's NuPIC Algorithm API tutorial

The Numenta Platform for Intelligent Computing (NuPIC) is a machine intelligence platform that implements the HTM learning algorithms. HTM is a detailed computational theory of the neocortex. At the core of HTM are time-based continuous learning algorithms that store and recall spatial and temporal patterns. NuPIC is suited to a variety of problems, particularly anomaly detection and prediction of streaming data sources. For more information, see numenta.org or the NuPIC Forum.

In this tutorial, we are going to implement an anomaly detection and forecasting pipeline using NuPIC's base classes.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from tqdm import tqdm_notebook


from nupic.encoders import ScalarEncoder
from nupic.research.spatial_pooler import SpatialPooler
from nupic.research.temporal_memory import TemporalMemory
from nupic.algorithms.anomaly import Anomaly
from nupic.algorithms.sdr_classifier import SDRClassifier

%matplotlib inline
!wget -c http://www-personal.umich.edu/~mejn/cp/data/sunspots.txt
df = pd.read_csv('sunspots.txt', sep='\t', names=['Months', 'SunSpots'], header=None, index_col='Months')
df.head()
        SunSpots
Months
0           58.0
1           62.6
2           70.0
3           55.7
4           85.0

NuPIC

For this example, we will build the NuPIC pipeline out of its bare-bones classes. The goal is to detect anomalies in, and forecast values of, a given data stream. We will be using the monthly sunspots dataset fetched above.

The NuPIC pipeline looks like this:

  • Encoding data into SDR using Encoders
  • Passing encoded data through the Spatial Pooler
  • Running Temporal Memory over the Spatial Pooler’s active columns
  • Extracting predictions using a Classifier
  • Extracting anomalies using Anomaly and AnomalyLikelihood

Encoder

First, we need to set up an encoder. Since we have a bounded dataset, we will be using ScalarEncoder. We could also have gone for the AdaptiveScalarEncoder; the only difference between the two is that AdaptiveScalarEncoder doesn't need to know the min and max values of your data in advance.

When creating a ScalarEncoder, we need to supply the following parameters.

w
The number of bits that are set (the number of 1s) to encode a single value in the output. w has to be an odd number and, unless forced is set, at least 21.
minval
Minimum value in your dataset
maxval
Maximum value in your dataset
periodic
If true, the input value "wraps around", so that minval and maxval map to the same point; picture the phase of a sin() function.
n
Total length of the output, aka number of columns. For example, if you have a layer 32x32, n=32^2
radius
Applies to the input. Two inputs separated by more than the radius have non-overlapping representations; two inputs separated by less than the radius will in general overlap in at least some of their bits. Default is 0. For example, with a radius of 2, the encodings of 2 and 5 (more than 2 apart) will not share any bits.
resolution
Applies to the input. Two inputs separated by greater than, or equal to, the resolution are guaranteed to have different representations. For example, with resolution=1, 1 and 2 will have different representations, but 1 and 1.5 may not. Default is 0.
name
Name of the encoder
verbosity
How much info to output: 0, 1, 2
clipInput
If true, non-periodic inputs smaller than minval or greater than maxval will be clipped to minval/maxval
forced
Skip safety checks

Some formulas

  • Range: $range = maxval - minval$
  • Half-width: $h = (w-1)/2$
  • Resolution: $resolution = radius / w$
  • Output length for periodic input: $n = w * range/radius$ or, equivalently, $n = range/resolution$
  • Output length for non-periodic input: $n = w * range/radius + 2 * h$

As you can see, n, resolution and radius are mutually dependent. Hence, you should provide only one of them.
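
As a quick sanity check of how these formulas tie together, here is a small worked example with hypothetical values (not the sunspots data):

# Hypothetical non-periodic encoder: w=21, minval=0, maxval=100, n=125
w = 21
value_range = 100 - 0                            # range = maxval - minval
h = (w - 1) / 2                                  # half-width = 10
n = 125
radius = float(w) * value_range / (n - 2 * h)    # from n = w*range/radius + 2*h
resolution = radius / w
print(radius, resolution)                        # 20.0 0.952...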

To learn more about encoders, read the official documentation and watch HTM School episodes 0, 1, 2, 3, 4, 5, 6

Let's create a scalar encoder with w=21 and n=125

enc = ScalarEncoder(
    w=21,
    minval=float(df['SunSpots'].min()),  # pass plain scalars, not arrays
    maxval=float(df['SunSpots'].max()),
    periodic=False,
    n=125,
    # radius and resolution are derived from n, so we leave them unset
    name="sunspots",
    verbosity=0,
    clipInput=False,
    forced=False)
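
To get a feel for the output, we can encode a single value and inspect the resulting SDR (a minimal sketch; the exact bit positions depend on your data's min and max):

sdr = enc.encode(df.values[0][0])   # encode the first sunspot count
print(sdr.shape)                    # (125,) - one bit per output position
print(sdr.sum())                    # 21 - exactly w bits are on
print(np.where(sdr == 1)[0])        # indices of the active bits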

Spatial Pooler

Now that we have an encoder, let's set up a Spatial Pooler. The most fundamental function of the spatial pooler is to convert a region’s input into a sparse pattern. This function is important because the mechanism used to learn sequences and make predictions requires starting with sparse distributed patterns.

Spatial Pooler supports the following hyperparameters:

inputDimensions
A sequence representing the dimensions of the input vector. Format is (height, width, depth, ...)
columnDimensions
A sequence representing the dimensions of the columns in the region. Format is (height, width, depth, ...), where each value represents the size of the dimension. For a topology of one dimension with 2000 columns use 2000, or (2000,). For a three dimensional topology of 32x64x16 use (32, 64, 16).
potentialRadius
This parameter determines the extent of the input that each column can potentially be connected to. This can be thought of as the input bits that are visible to each column, or a 'receptiveField' of the field of vision. A large enough value will result in 'global coverage', meaning that each column can potentially be connected to every input bit. This parameter defines a square (or hyper square) area: a column will have a max square potential pool with sides of length $2 * potentialRadius + 1$.
potentialPct
The percent of the inputs, within a column's potential radius, that a column can be connected to. If set to 1, the column will be connected to every input within its potential radius. This parameter is used to give each column a unique potential pool when a large potentialRadius causes overlap between the columns. At initialization time we choose $(2*potentialRadius + 1)^{\#inputDimensions} * potentialPct$ input bits to comprise the column's potential pool.
globalInhibition
If true, then during the inhibition phase the winning columns are selected as the most active columns from the region as a whole. Otherwise, the winning columns are selected with respect to their local neighborhoods. Using global inhibition speeds up the SP by roughly 60x. To learn more about inhibition and topology, watch this HTM School episode.
localAreaDensity
The desired density of active columns within a local inhibition area (the size of which is set by the internally calculated inhibitionRadius, which is in turn determined from the average size of the connected potential pools of all columns). The inhibition logic will ensure that at most N columns remain ON within a local inhibition area, where $N = localAreaDensity * (total number of columns in inhibition area)$
numActiveColumnsPerInhArea
An alternate way to control the density of the active columns. If numActiveColumnsPerInhArea is specified then localAreaDensity must be less than 0, and vice versa. When using numActiveColumnsPerInhArea, the inhibition logic will ensure that at most 'numActiveColumnsPerInhArea' columns remain ON within a local inhibition area (the size of which is set by the internally calculated inhibitionRadius, which is in turn determined from the average size of the connected receptive fields of all columns). When using this method, as columns learn and grow their effective receptive fields, the inhibitionRadius will grow, and hence the net density of the active columns will *decrease*. This is in contrast to the localAreaDensity method, which keeps the density of active columns the same regardless of the size of their receptive fields.
stimulusThreshold
This is a number specifying the minimum number of synapses that must be on in order for a column to turn ON. The purpose of this is to prevent noisy input from activating columns. Specified as a percent of a fully grown synapse.
synPermInactiveDec
The amount by which an inactive synapse is decremented in each round. Specified as a percent of a fully grown synapse.
synPermActiveInc
The amount by which an active synapse is incremented in each round. Specified as a percent of a fully grown synapse.
synPermConnected
The default connected threshold. Any synapse whose permanence value is above the connected threshold is a "connected synapse", meaning it can contribute to the cell's firing.
minPctOverlapDutyCycle
A number between 0 and 1.0, used to set a floor on how often a column should have at least stimulusThreshold active inputs. Periodically, each column looks at the overlap duty cycle of all other columns within its inhibition radius and sets its own internal minimal acceptable duty cycle to: minPctDutyCycleBeforeInh * max(other columns' duty cycles). On each iteration, any column whose overlap duty cycle falls below this computed value will get all of its permanence values boosted up by synPermActiveInc. Raising all permanences in response to a sub-par duty cycle before inhibition allows a cell to search for new inputs when either its previously learned inputs are no longer ever active, or when the vast majority of them have been "hijacked" by other columns.
dutyCyclePeriod
The period used to calculate duty cycles. Higher values make it take longer to respond to changes in boost or synPermConnected. Shorter values make it more unstable and likely to oscillate.
boostStrength
A number greater or equal than 0.0, used to control the strength of boosting. No boosting is applied if it is set to 0. Boosting strength increases as a function of boostStrength. Boosting encourages columns to have similar activeDutyCycles as their neighbors, which will lead to more efficient use of columns. However, too much boosting may also lead to instability of SP outputs.
seed
Seed for our own pseudo-random number generator.
spVerbosity
spVerbosity level: 0, 1, 2, or 3
wrapAround
Determines if inputs at the beginning and end of an input dimension should be considered neighbors when mapping columns to inputs.

Let's create a SpatialPooler to match our encoder's output. The input and column spaces must have the same number of dimensions: if the input is (x, x), the output should be some (y, y). Our input dimensions must cover the encoder's n bits, so with n=125 we use (5, 25); we'll make the column space a bit smaller, (5, 10), i.e. 50 columns.

To learn more about Spatial Poolers, read the BaMI documentation and watch HTM School episodes 7, 8

sp = SpatialPooler(
    inputDimensions=(5, 25),         # 5*25 = 125 = enc.n
    columnDimensions=(5, 10),        # 50 columns
    potentialRadius=16,
    potentialPct=0.5,
    globalInhibition=False,
    localAreaDensity=-1.0,           # disabled; we use numActiveColumnsPerInhArea
    numActiveColumnsPerInhArea=10.0,
    stimulusThreshold=0,
    synPermInactiveDec=0.008,
    synPermActiveInc=0.05,
    synPermConnected=0.10,
    minPctOverlapDutyCycle=0.001,
    dutyCyclePeriod=1000,
    boostStrength=0.0,               # boosting disabled
    seed=-1,
    spVerbosity=0,
    wrapAround=True
)

Also, in order to use the SpatialPooler we need to create an array whose size equals the number of columns; the SP will store its output there.

active_array = np.zeros(np.prod(sp.getColumnDimensions()))
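
As a quick sketch (using learn=False so we don't disturb the SP before the main training loop below), we can push one encoded value through the SP and look at the winning columns:

demo_input = enc.encode(df.values[0][0])
sp.compute(inputVector=demo_input, learn=False, activeArray=active_array)
print(np.where(active_array == 1)[0])   # indices of the ~10 active columns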

Temporal Memory

The next step is Temporal Memory. It takes the following parameters:

columnDimensions
Dimensions of the column space
cellsPerColumn
Number of cells per column
activationThreshold
If the number of active connected synapses on a segment is at least this threshold, the segment is said to be active.
initialPermanence
Initial permanence of a new synapse
connectedPermanence
If the permanence value for a synapse is greater than this value, it is said to be connected.
minThreshold
If the number of potential synapses active on a segment is at least this threshold, it is said to be "matching" and is eligible for learning.
maxNewSynapseCount
The maximum number of synapses added to a segment during learning.
permanenceIncrement
Amount by which permanences of synapses are incremented during learning.
permanenceDecrement
Amount by which permanences of synapses are decremented during learning.
predictedSegmentDecrement
Amount by which segments are punished for incorrect predictions.
maxSegmentsPerCell
The maximum number of segments per cell.
maxSynapsesPerSegment
The maximum number of synapses per segment.
seed
Seed for the random number generator.

Let's initialize a TM whose column dimensions match the SP's 50 output columns, with a depth of 5 cells per column. The TM workflow is as follows: we feed it the indices of the active columns computed by the SP, and it updates its internal active and predictive cells.

To learn more about TemporalMemory, check out HTM School episode 11.

tm = TemporalMemory(
    columnDimensions=(50,),      # must match the SP's column count (5*10 = 50)
    cellsPerColumn=5,
    activationThreshold=13,
    initialPermanence=0.21,
    connectedPermanence=0.50,
    minThreshold=10,
    maxNewSynapseCount=20,
    permanenceIncrement=0.10,
    permanenceDecrement=0.10,
    predictedSegmentDecrement=0.0,
    maxSegmentsPerCell=255,
    maxSynapsesPerSegment=255,
    seed=42
)
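
A single hedged step looks like this (assuming active_array still holds the SP output from the sketch above; note that even with learn=False this advances the TM's internal state, so we reset it afterwards):

tm.compute(np.where(active_array == 1)[0], learn=False)
print(len(tm.getActiveCells()))      # cells active at this timestep
print(len(tm.getPredictiveCells()))  # cells predicted for the next timestep
tm.reset()                           # clear the demo state before the real run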

Anomaly Detector

The Anomaly class scores each timestep by comparing what the TM predicted against what actually became active. The following parameters are available:

slidingWindowSize
How many elements are averaged; enables a moving average on the final anomaly score
mode
How to compute the anomaly: 'pure' - the raw anomaly score; 'likelihood' - the probability of receiving this value given the history of anomaly scores; 'weighted' - the likelihood multiplied by the raw anomaly score that was used to generate it
binaryAnomalyThreshold
If set to a value in [0, 1], the anomaly score will be discretized to 1/0 (1 if >= binaryAnomalyThreshold). The transformation is applied after the moving average is computed.
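
Let's create a detector that returns the raw ('pure') anomaly score:
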
anomaly = Anomaly(
    slidingWindowSize=None,
    mode='pure',
    binaryAnomalyThreshold=None
)
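
In 'pure' mode the score is the fraction of currently active columns that were not predicted at the previous step. A tiny hand-made example with hypothetical column indices:

# 3 active columns, of which 2 were predicted -> score = 1 - 2/3
score = anomaly.compute(activeColumns=[1, 2, 6],
                        predictedColumns=[1, 2, 3, 4, 5])
print(score)   # 0.333...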

SDR Classifier

The classifier is used not only to predict labels, but to perform regression as well. After initializing the classifier, we need to feed it a very particular pattern of data, from which it learns to output predicted values. The reason for this peculiar workflow is that it's not possible to decode the outputs of the Spatial Pooler and Temporal Memory back into input values, so the classifier learns the mapping from cell activity to encoder buckets directly.

To initialize the classifier, you can use the following parameters:

steps
Sequence of the different steps of multi-step predictions to learn
alpha
The alpha used to adapt the weight matrix during learning. A larger alpha results in faster adaptation to the data.
actValueAlpha
Used to track the actual value within each bucket. A lower actValueAlpha results in longer term memory
verbosity
verbosity: 0, 1, 2

In order to get a prediction for the next value, we need to train the classifier by calling its compute method with the following arguments:

recordNum
Record number of this input pattern. Record numbers normally increase sequentially by 1 each time unless there are missing records in the dataset. Knowing this information ensures that we don't get confused by missing records.
patternNZ
List of the active indices from the layer below. When the input is from TemporalMemory, this list should be the indices of the active cells.
classification
Dict of the classification information where:
  • bucketIdx: index of the encoder bucket, could be obtained by calling getBucketIndices from encoder
  • actValue: actual value going into the encoder
  • Classification could be None for inference mode.
learn
if true, learn this sample
infer
if true, perform inference

If you just want to do inference, you can call the infer method instead.

Both methods return a dict keyed by prediction step, plus an 'actualValues' entry giving the value associated with each bucket. Let's create a classifier that predicts one step ahead:

classifier = SDRClassifier(steps=[1], alpha=0.1, actValueAlpha=0.1, verbosity=0)
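
As a hedged sketch of the return shape (the pattern and values here are made up, and the exact numbers depend on training history):

result = classifier.compute(recordNum=0, patternNZ=[1, 5, 9],
                            classification={"bucketIdx": 4, "actValue": 58.0},
                            learn=True, infer=True)
print(result["actualValues"])  # one entry per bucket seen so far
print(result[1])               # likelihood of each bucket, 1 step ahead

# re-create the classifier so this demo doesn't interfere with the real run
classifier = SDRClassifier(steps=[1], alpha=0.1, actValueAlpha=0.1, verbosity=0)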

Now, let's push some data through our pipeline and see how it goes

anomaly_scores = []
predictions = []
next_t_columns = []  # predictive cells from the previous timestep
for x in tqdm_notebook(xrange(df.shape[0])):
    # 1. Encode the raw value and run the Spatial Pooler
    sp.compute(inputVector=enc.encode(df.values[x][0]), learn=True, activeArray=active_array)
    # 2. Run Temporal Memory over the SP's active columns
    tm.compute(np.where(active_array == 1)[0])
    # 3. Compare what is active now against what was predicted last step
    #    (we pass cell indices on both sides, which keeps the comparison consistent)
    anomaly_score = anomaly.compute(activeColumns=tm.getActiveCells(),
                                    predictedColumns=next_t_columns)
    next_t_columns = tm.getPredictiveCells()
    # 4. Train the classifier and get the 1-step-ahead prediction
    cl_ = classifier.compute(
        recordNum=x,
        patternNZ=tm.getActiveCells(),
        classification={
            "bucketIdx": enc.getBucketIndices(df.values[x][0])[0],
            "actValue": df.values[x][0]
        },
        learn=True,
        infer=True
    )
    # pick the value whose bucket has the highest likelihood
    top_prediction = sorted(zip(cl_[1], cl_["actualValues"]), reverse=True)[0][1]
    anomaly_scores.append(anomaly_score)
    predictions.append(top_prediction)
df['Prediction'] = predictions

# Flag points whose anomaly score is more than 2 standard deviations from the mean
anomaly_threshold = 2
anomaly_scores = np.asarray(anomaly_scores)
z_scores = np.abs((anomaly_scores - np.mean(anomaly_scores)) / np.std(anomaly_scores, ddof=1))
anomalies = np.where(z_scores > anomaly_threshold)[0]
# Plot
fig = plt.figure(figsize=(80,10))
ax = plt.gca()

# The index is an integer month count, not a datetime,
# so no date formatting is needed on the x-axis

# Remove Spines
ax.spines["top"].set_visible(False)    
ax.spines["bottom"].set_visible(False)    
ax.spines["right"].set_visible(False)    
ax.spines["left"].set_visible(False)  

# Set ticks
ax.get_xaxis().tick_bottom()    
ax.get_yaxis().tick_left()  

# Set X-limit
plt.xlim(df.iloc[0].name, df.iloc[-1].name)


# Draw horizontal lines
for y in xrange(0, 300, 25):    
    plt.plot((df.iloc[0].name, df.iloc[-1].name), (y, y), "--", lw=0.5, color="#333333", alpha=0.3) 

plt.xlabel("Date", fontsize=16)  
plt.ylabel("Sun Spots", fontsize=16)  
plt.title("Sunspot Anomaly Detection and Forecasting with NuPIC HTM")
# Plot data
ax.plot(df['SunSpots'], 'k', label="Sun Spots")
ax.plot(df['Prediction'], 'r--', label="Predicted Values")
ax.scatter(df.iloc[anomalies].index.values,
           df.iloc[anomalies]['SunSpots'].values,
           edgecolors='#fc7e7e', facecolors='#f44242', label='Anomalies', lw=1, s=200, alpha=0.8)
ax.legend(frameon=False)
plt.savefig('NuPIC_Tutorial.png', format='png', dpi=300)

[Output plot: monthly sunspot counts (black), one-step predictions (red, dashed), and flagged anomalies (highlighted markers)]
