Strings? about brain.js HOT 20 CLOSED

arjunmenon commented on May 4, 2024

Strings?

from brain.js.

Comments (20)

robertleeplummerjr commented on May 4, 2024 7

Also, fantastic work!


robertleeplummerjr commented on May 4, 2024 6

A recurrent neural net in some form is probably the way to go here, because its whole design is based on variable-length input, whereas a traditional neural net generally has a fixed-size input. There are ways around this, of course, but they are essentially hacky: pick a maximum input length, normalize all characters, and set whatever part of the input the string doesn't fill to 0. But don't do that, too icky.
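
For readers wondering what that padding workaround amounts to, here is a minimal sketch. The helper name `encodeFixedLength` and the choice to normalize by 65535 (the largest UTF-16 code unit) are assumptions made for illustration; neither comes from brain.js.

```javascript
// Hypothetical helper (not part of brain.js): turn a string into a
// fixed-length numeric vector for a feed-forward network.
function encodeFixedLength(text, maxLength) {
  const vector = new Array(maxLength).fill(0); // unused slots stay 0
  for (let i = 0; i < Math.min(text.length, maxLength); i++) {
    // Normalize each UTF-16 code unit into the 0..1 range.
    vector[i] = text.charCodeAt(i) / 65535;
  }
  return vector;
}

const padded = encodeFixedLength("drive", 8);
console.log(padded.length);   // 8
console.log(padded.slice(5)); // [ 0, 0, 0 ]
```

Anything longer than `maxLength` is silently cut off, which is one reason the approach is fragile for free text.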

Try this:

var brain = require('brain.js')
var net = new brain.recurrent.LSTM();
net.train([
  {input: "my unit-tests failed.", output: "software"},
  {input: "tried the program, but it was buggy.", output: "software"},
  {input: "i need a new power supply.", output: "hardware"},
  {input: "the drive has a 2TB capacity.", output: "hardware"},
  //added for less overfitting
  {input: "unit-tests", output: "software"},
  {input: "program", output: "software"},
  {input: "power supply", output: "hardware"},
  {input: "drive", output: "hardware"},
]);

console.log(net.run("drive"));

Try this with a bunch of data, to avoid overfitting, and see what happens.

Here it is working (careful, it takes time to train; we are working to make that way faster): https://jsfiddle.net/g1poj29x/


MarkusPint commented on May 4, 2024 3

I'm trying the example in @robertleeplummerjr's Mar 17 post, and the training takes forever. It has already been at least 4-5 minutes. And these strings are pretty short; what would happen with long texts? 500 words? 1000 words?

Any clues on how to overcome this issue? I need to classify texts similarly to @arjunmenon.


NxRoot commented on May 4, 2024 3

I found a better way of working with strings.
Instead of using LSTM(), which takes a really long time to train, we use a normal network, but we preprocess the text before passing it any data. Here's how:

First, your training data must look like this (my network-language.json):

[   
    {
        "input": "what",
        "output": {"INFORMATION": 1, "COMMAND": 0, "WEATHER": 0}
    },
    {
        "input": "why",
        "output": {"INFORMATION": 1, "COMMAND": 0, "WEATHER": 0}
    },
    {
        "input": "how",
        "output": {"INFORMATION": 1, "COMMAND": 0, "WEATHER": 0}
    },
    {
        "input": "what's the weather",
        "output": {"INFORMATION": 1, "COMMAND": 0, "WEATHER": 0.90}
    },
    {
        "input": "what's the weather in",
        "output": {"INFORMATION": 1, "COMMAND": 0, "WEATHER": 0.90}
    },
    {
        "input": "open google",
        "output": {"INFORMATION": 0, "COMMAND": 1, "WEATHER": 0}
    },
    {
        "input": "open",
        "output": {"INFORMATION": 0, "COMMAND": 1, "WEATHER": 0}
    },
    {
        "input": "do",
        "output": {"INFORMATION": 0, "COMMAND": 1, "WEATHER": 0}
    },
    {
        "input": "execute",
        "output": {"INFORMATION": 0, "COMMAND": 1, "WEATHER": 0}
    },
    {
        "input": "create",
        "output": {"INFORMATION": 0, "COMMAND": 1, "WEATHER": 0}
    }
]

After that, we create a class that parses the input strings and encodes them as numeric vectors that the normal network can work with. There's not much to explain here:

class DictionaryService {

    constructor(data){
        this.data = data
        this.dictionary = this.buildWordDictionary(data)
    }

    // build dictionary of recognized words, based on the phrases inside the data file

    buildWordDictionary (trainingData) {
        const tokenisedArray = trainingData.map(item => {
          const tokens = item.input.split(' ')
          return tokens
        })
        
        const flattenedArray = [].concat.apply([], tokenisedArray)
        return flattenedArray.filter((item, pos, self) => self.indexOf(item) == pos)
    }

    // encode strings to numbers

    encode (phrase) {
        const phraseTokens = phrase.split(' ')
        const encodedPhrase = this.dictionary.map(word => phraseTokens.includes(word) ? 1 : 0)
    
        return encodedPhrase
    }

    // encode dataset for training

    encodeTrainingSet(){
        return this.data.map(dataSet => {
            const encodedValue = this.encode(dataSet.input)
            return {input: encodedValue, output: dataSet.output}
        })
    }
}

module.exports = DictionaryService
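
To make the encoding concrete, here is a standalone sketch of the same dictionary idea on a tiny sample. The three sample inputs are picked from the training file for illustration, and the code restates the logic rather than importing the class.

```javascript
// Standalone restatement of the dictionary idea, using a small sample
// of the training inputs.
const sampleInputs = ["what's the weather", "open google", "how"];

// Unique words, in order of first appearance.
const dictionary = [...new Set(sampleInputs.flatMap(s => s.split(' ')))];

// 1 if the dictionary word occurs in the phrase, otherwise 0.
const encode = phrase => {
  const tokens = phrase.split(' ');
  return dictionary.map(word => (tokens.includes(word) ? 1 : 0));
};

console.log(dictionary);
// [ "what's", 'the', 'weather', 'open', 'google', 'how' ]
console.log(encode("how is the weather in Dubai"));
// [ 0, 1, 1, 0, 0, 1 ]
```

Words that never appeared in the training inputs ("is", "in", "Dubai") simply drop out, word order is lost, and matching is case-sensitive, so "Weather" and "weather" would count as different words.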

Then we create a simple service class to train and run our network:

const brain = require('brain.js');
const DictionaryService = require('./dictionary');

class NetworkService {
    net = new brain.NeuralNetwork();

    constructor(data){
        this.dictService = new DictionaryService(data)
        this.trainingSet = this.dictService.encodeTrainingSet()
        this.net.train(this.trainingSet)
        return this
    }

    execute(e){
        // run network iteration
        const encoded = this.dictService.encode(e)
        return this.net.run(encoded)
        // const result = brain.likely(encoded, this.net);
        // return result
    }
  
}

module.exports = NetworkService

Now all we have to do is start our network in index.js:

// import services and training data
const NetworkService = require('./nx-core/services/network');
const net_language = require("./network-language.json")

// pass the training data to the service
const net = new NetworkService(net_language)

const myString = "how is the weather in Dubai"

// run network
const iteration = net.execute(myString)
console.log(iteration, "\n")

Result:

{
  INFORMATION: 0.9783821105957031,
  COMMAND: 0.022460367530584335,
  WEATHER: 0.7260972857475281
}
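
One way to turn a result object like this into a decision is sketched below. The helper names `topLabel` and `labelsAbove` and the 0.5 cut-off are assumptions for illustration, not part of the code above.

```javascript
// Result object copied from the run above.
const result = {
  INFORMATION: 0.9783821105957031,
  COMMAND: 0.022460367530584335,
  WEATHER: 0.7260972857475281
};

// Hypothetical helper: return the label with the highest score.
function topLabel(scores) {
  return Object.entries(scores)
    .reduce((best, entry) => (entry[1] > best[1] ? entry : best))[0];
}

// Hypothetical helper: return every label at or above a threshold.
function labelsAbove(scores, threshold) {
  return Object.keys(scores).filter(label => scores[label] >= threshold);
}

console.log(topLabel(result));         // INFORMATION
console.log(labelsAbove(result, 0.5)); // [ 'INFORMATION', 'WEATHER' ]
```

Because the training outputs allow several categories to be active at once, the threshold variant may fit this data better than picking a single winner.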


arjunmenon commented on May 4, 2024 1

I first tried encoding the data as simple bag-of-words (BOW) vectors and later tried plain strings.
Label 0 is software and 1 is hardware.
All tests were run in the CLI.

net.train([{input: [ 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ], output: [0]},
           {input: [ 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ], output: [0]},
           {input: [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0 ], output: [1]},
           {input: [ 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1 ], output: [1]},
           //added for less overfitting
           {input: [ 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ], output: [0]},
           {input: [ 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ], output: [0]},
           {input: [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0 ], output: [1]},
           {input: [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0 ], output: [1]}
], {
  errorThresh: 0.005,  
  iterations: 20000,  
  log: true,
  logPeriod: 10, 
  learningRate: 0.3
});

with

var net = new brain.NeuralNetwork();

training

{ error: 0.004978560090833529, iterations: 156 }

and the result was

output = net.run([ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0 ]); // "drive"

Float64Array { '0': 0.8837038196102716 } // "should have been 1 - hardware"
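
A network with a single output rarely returns exactly 0 or 1, so the value usually needs to be mapped to a label. A minimal sketch, assuming a 0.5 cut-off (the threshold is an assumption, not something the thread specifies):

```javascript
// Output value copied from the run above; labels as defined in this
// comment (0 = software, 1 = hardware).
const output = 0.8837038196102716;

// Assumption: treat the single output as a score and cut at 0.5.
const label = output >= 0.5 ? 1 : 0;
const labelName = label === 1 ? "hardware" : "software";

console.log(label, labelName); // 1 hardware
```

Read this way, 0.88 falls on the hardware side for "drive".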

==========

with
var net = new brain.recurrent.LSTM();

training resulted in a higher error than the neural network:
{ error: 1.1078238334154429, iterations: 20000 }

> output = net.run([ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0 ]);
'1' // correct!

And going with the previous string example with LSTM:

{ error: 1.1040929337492722, iterations: 20000 }
> output = net.run("drive")
'hardware' // correct!

Observations

Neural network resulted in incorrect output.
Neural network training time was marvelously short.
Neural network reported a very low margin of error.
LSTM resulted in correct output.
LSTM training time was far too long, and even longer when using strings.
LSTM reported a higher error than the neural network.

Conclusion: waiting for the recurrent network to become way faster!

PS - I will post results from more tests, particularly on the Reuters dataset.


robertleeplummerjr commented on May 4, 2024 1

I've started to address this in the GPU version, I will try and have something out this week. Faster will come soon!


robertleeplummerjr commented on May 4, 2024 1

When using the RNN, you need to encode the numbers as strings.
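
A minimal sketch of what that conversion could look like, assuming it means joining each numeric vector into a string of digits before training. The shortened 4-element vectors are made up for illustration.

```javascript
// Hypothetical conversion step: numeric vectors and labels become strings.
const numericData = [
  { input: [0, 1, 1, 0], output: [0] },
  { input: [0, 0, 0, 1], output: [1] }
];

const stringData = numericData.map(({ input, output }) => ({
  input: input.join(''),  // [0, 1, 1, 0] -> "0110"
  output: output.join('') // [1] -> "1"
}));

console.log(stringData);
// [ { input: '0110', output: '0' }, { input: '0001', output: '1' } ]
```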


riatzukiza commented on May 4, 2024

Usually one has to one-hot encode a string to use it in a neural network.
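
As a rough illustration of one-hot encoding at the character level: the alphabet below (lowercase letters plus space) and the helper names are assumptions chosen to keep the sketch short.

```javascript
// Assumed alphabet for illustration: lowercase letters plus space.
const alphabet = "abcdefghijklmnopqrstuvwxyz ";

// One-hot encode a single character: a vector with a 1 at the
// character's index in the alphabet and 0 elsewhere.
function oneHotChar(ch) {
  const vector = new Array(alphabet.length).fill(0);
  const index = alphabet.indexOf(ch);
  if (index !== -1) vector[index] = 1; // unknown characters stay all-zero
  return vector;
}

// A string becomes a sequence of one-hot vectors, one per character.
const oneHotString = text => Array.from(text.toLowerCase(), ch => oneHotChar(ch));

const oneHot = oneHotString("ab");
console.log(oneHot.length);        // 2
console.log(oneHot[0].indexOf(1)); // 0
console.log(oneHot[1].indexOf(1)); // 1
```

The result still varies in length with the input string, so a feed-forward network would additionally need padding or a word-level scheme on top of this.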

On Thu, Mar 16, 2017 at 1:48 PM, Arjun Menon ***@***.***> wrote:

Hey
Would like to know if the inputs objects support text strings. I did something like this

var brain = require('brain.js')
var net = new brain.NeuralNetwork();
net.train([
  {input: "my unit-tests failed.", output: "software"},
  {input: "tried the program, but it was buggy.", output: "software"},
  {input: "i need a new power supply.", output: "hardware"},
  {input: "the drive has a 2TB capacity.", output: "hardware"}
]);

It outputs

{ error: NaN, iterations: 1 }

One of the comments says <#47 (comment)> that rnn supports them, so is it only available for it or other models can also make use of strings?


robertleeplummerjr commented on May 4, 2024

May I get some feedback to see if this solved your issue?


arjunmenon commented on May 4, 2024

Hey,
Sorry I could not respond; I was juggling projects.
I will try this approach and expand it to large datasets by end of day.
From the way you have put up the example, I believe the model is too sensitive to stopwords.


robertleeplummerjr commented on May 4, 2024

This is an outstanding bug: the error rate reported for the recurrent neural net does not equate to that of the standard neural net. How long did it take to train? One of the things we are working towards is a refactor of the recurrent neural net that will run much faster on the CPU and, eventually, on the GPU.


robertleeplummerjr commented on May 4, 2024

Since we were able to put strings in, I'll go ahead and close this issue for continuity, let me know if that isn't satisfactory.


robertleeplummerjr commented on May 4, 2024

Can you share your code?


hellonearthis commented on May 4, 2024

With the text input data, you used full sentences and then keywords from those sentences. Does that enhance the learning of those patterns, and is that how phrases should be learned?


robertleeplummerjr commented on May 4, 2024

@MarkusPint, would you like to open an issue? I'd like to try and solve for performance if you are up for it.


binary-person commented on May 4, 2024

Performance really does suffer when it comes to strings in LSTM. There's a better way of using brain.js for this, though: https://github.com/lordpoint/neural-network-author-classifier. Still, I really look forward to seeing LSTM run at least half as fast as the neural network.

In the browser on a Chromebook it's even worse: an 8-letter phrase combined with another 6-letter phrase takes about 10 minutes...


binary-person commented on May 4, 2024

I would just use the TensorFlow library. This library is really good for beginners but not for complicated applications.


timendez commented on May 4, 2024

I'm also coming across slow speeds with strings in LSTM.
Training data (roughly 2k entries like this):

  { input: 'Test', output: 18.77 },
  { input: 'Fchjk', output: 15.5 },
  { input: 'Guest', output: 7.25 },
  { input: 'Guest', output: 13.2 },
  { input: 'Test First', output: 28.25 },
  { input: 'steve', output: 10.25 },
  { input: 'Auto', output: 0.53 },
  { input: 'Test First', output: 9.74 },
  { input: 'Guest', output: 9.5 },
  { input: 'Justin', output: 10.5 },
  { input: 'Guest', output: 9.85 },
  { input: 'Auto', output: 15.54 },

It takes about 5 minutes for every 10 iterations.

Didn't notice anything about LSTMGPU mode in the docs, but please let me know if I should attempt a faster training method for string inputs.

Thanks!


timendez commented on May 4, 2024

Thanks @robertleeplummerjr! It's much faster now. I think it may unfortunately still be too slow for my uses.

I was able to refocus my goal, however, and use a {string: bool} map with a low number of keys, so that way both my inputs and outputs would be more uniform. From there I used a standard neural net.
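
The code for that map was not shared in the thread, so the following is only a guess at its general shape. The key set is hypothetical (borrowed from the sample names above), and 0/1 stands in for the booleans because network inputs are numeric.

```javascript
// Hypothetical key set; the thread does not list the actual keys.
const knownKeys = ["guest", "auto", "test"];

// Build a { key: 0 | 1 } map saying which known keys appear in the text.
function toFlagMap(text) {
  const words = text.toLowerCase().split(/\s+/);
  const flags = {};
  for (const key of knownKeys) {
    flags[key] = words.includes(key) ? 1 : 0;
  }
  return flags;
}

console.log(toFlagMap("Test First")); // { guest: 0, auto: 0, test: 1 }
console.log(toFlagMap("Guest"));      // { guest: 1, auto: 0, test: 0 }
```

Every input then has the same small set of keys, which is the uniformity the comment describes.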

I'll definitely check LSTM back out if there ends up being a GPU version or if LSTMs can even take advantage of multiple cores. Thanks again!


JIBSIL commented on May 4, 2024

> I found a better way of working with strings. Instead of using LSTM() that takes really long time training, we use normal network training, but before we pass it any data we process the text, heres how:

Apologies for necroposting, but what is the significance of using 0.90 instead of 1? Is it to prevent strong correlation with any string with the word "weather" and weather itself? Or because it's not a type?
