shreyans29 / thesemicolon

This repository contains IPython notebooks and datasets for the data analytics YouTube tutorials on The Semicolon.

Home Page: https://www.youtube.com/c/thesemicolon

Jupyter Notebook 98.08% Python 1.92%

neural-network pandas-tutorial semicolon scikit perceptron gradient-descent keras scikit-learn matplotlib numpy

thesemicolon's Introduction

The Semicolon

This repository contains the IPython notebooks for the Data Analytics YouTube tutorials on The Semicolon. The tutorials are available at https://www.youtube.com/c/thesemicolon

The following IPython notebooks are available in this repository.

Python for Data Analytics

Deep Learning with Keras

Apart from these, the datasets used are housing.csv, mnist.csv, and smsspam.


thesemicolon's Issues

(unicode error)

Hi, I am new to word2vec. When I ran your code, it showed an error. Can you please help me? I have uploaded a screenshot of the error. Any help would be appreciated.

Error in chat.py

When I run chat.py (TensorFlow backend), I get this:

Optimizer weight shape (1200,) not compatible with provided weight shape (300, 300)

error utf-8 on python 3.5 on Anaconda Spyder

error utf-8 on python 3.5 on Anaconda Spyder on Windows 10
tok_corp= [nltk.word_tokenize(sent.decode('utf-8')) for sent in corpus]

\Users\Suyog\Desktop\python>python chat.py
C:\Users\Suyog\AppData\Local\Programs\Python\Python35\lib\site-packages\gensim\utils.py:1212: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
Using TensorFlow backend.
WARNING (theano.configdefaults): g++ not available, if using conda: conda install m2w64-toolchain
C:\Users\Suyog\AppData\Local\Programs\Python\Python35\lib\site-packages\theano\configdefaults.py:560: UserWarning: DeprecationWarning: there is no c++ compiler.This is deprecated and with Theano 0.11 a c++ compiler will be mandatory
warnings.warn("DeprecationWarning: there is no c++ compiler."
WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
Traceback (most recent call last):
  File "chat.py", line 22, in <module>
    model=load_model(f1)
  File "C:\Users\Suyog\AppData\Local\Programs\Python\Python35\lib\site-packages\keras\engine\saving.py", line 419, in load_model
    model = _deserialize_model(f, custom_objects, compile)
  File "C:\Users\Suyog\AppData\Local\Programs\Python\Python35\lib\site-packages\keras\engine\saving.py", line 221, in _deserialize_model
    model_config = f['model_config']
  File "C:\Users\Suyog\AppData\Local\Programs\Python\Python35\lib\site-packages\keras\utils\io_utils.py", line 302, in __getitem__
    raise ValueError('Cannot create group in read only mode.')
ValueError: Cannot create group in read only mode.

ValueError: setting an array element with a sequence.

@shreyans29 I am getting the error "ValueError: setting an array element with a sequence." I used your conversation dataset itself. Any idea why?
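
NumPy typically raises this error when it is asked to build a regular array from rows of unequal length, for example sentences that contain different numbers of word vectors. Whether that is the cause here cannot be confirmed from the report. A minimal sketch of one possible remedy, padding or truncating every sentence to a fixed length before converting to an array; the helper `pad_to_length`, the chosen length, and the zero-vector filler are illustrative assumptions, not code from the repository:

```python
def pad_to_length(sentence_vectors, length, dim):
    """Return exactly `length` vectors, truncating or padding with zero vectors.

    `sentence_vectors` is a list of word vectors (each a list of `dim` floats).
    """
    filler = [0.0] * dim
    trimmed = [list(vec) for vec in sentence_vectors[:length]]
    # Append independent copies of the filler so rows never share storage.
    trimmed.extend(list(filler) for _ in range(length - len(trimmed)))
    return trimmed


# Two "sentences" with different numbers of 3-dimensional word vectors.
ragged = [
    [[0.1, 0.2, 0.3]],
    [[0.4, 0.5, 0.6], [0.7, 0.8, 0.9]],
]
padded = [pad_to_length(sentence, length=2, dim=3) for sentence in ragged]
```

After padding, every sentence has the same shape, so the nested list can be converted to a rectangular array.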

error

File "/home/sachin/Desktop/thesemicolon-master/chatbotPreprocessing.py", line 26, in <module>
model = gensim.models.Word2Vec.load('/home/sachin/Desktop/thesemicolon-master/enwiki_dbow/doc2vec.bin');
File "/usr/local/lib/python2.7/dist-packages/gensim/models/word2vec.py", line 979, in load
return load_old_word2vec(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/gensim/models/deprecated/word2vec.py", line 172, in load_old_word2vec
'batch_words': old_model.batch_words,
AttributeError: 'Doc2Vec' object has no attribute 'batch_words'
[Finished in 11.8s with exit code 1]

problem in running simple LSTM

The error that occurred is:
Traceback (most recent call last):
File "/home/srinath/char_gen.py", line 63, in <module>
model.add(LSTM(256, input_shape=(x.shape[2], x.shape[1]), return_sequences=True))
File "/home/srinath/anaconda3/lib/python3.6/site-packages/keras/engine/sequential.py", line 166, in add
layer(x)
File "/home/srinath/anaconda3/lib/python3.6/site-packages/keras/layers/recurrent.py", line 500, in call
return super(RNN, self).call(inputs, **kwargs)
File "/home/srinath/anaconda3/lib/python3.6/site-packages/keras/engine/base_layer.py", line 460, in call
output = self.call(inputs, **kwargs)
File "/home/srinath/anaconda3/lib/python3.6/site-packages/keras/layers/recurrent.py", line 2112, in call
initial_state=initial_state)
File "/home/srinath/anaconda3/lib/python3.6/site-packages/keras/layers/recurrent.py", line 609, in call
input_length=timesteps)
File "/home/srinath/anaconda3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2957, in rnn
maximum_iterations=input_length)
TypeError: while_loop() got an unexpected keyword argument 'maximum_iterations'
ERROR:tensorflow:==================================
Object was never used (type <class 'tensorflow.python.ops.tensor_array_ops.TensorArray'>):
<tensorflow.python.ops.tensor_array_ops.TensorArray object at 0x7f7901757dd8>
If you want to mark it as used call its "mark_used()" method.
It was originally created here:
['File "/home/srinath/char_gen.py", line 63, in \n model.add(LSTM(256, input_shape=(x.shape[2], x.shape[1]), return_sequences=True))', 'File "/home/srinath/anaconda3/lib/python3.6/site-packages/keras/engine/sequential.py", line 166, in add\n layer(x)', 'File "/home/srinath/anaconda3/lib/python3.6/site-packages/keras/layers/recurrent.py", line 500, in call\n return super(RNN, self).call(inputs, **kwargs)', 'File "/home/srinath/anaconda3/lib/python3.6/site-packages/keras/engine/base_layer.py", line 460, in call\n output = self.call(inputs, **kwargs)', 'File "/home/srinath/anaconda3/lib/python3.6/site-packages/keras/layers/recurrent.py", line 2112, in call\n initial_state=initial_state)', 'File "/home/srinath/anaconda3/lib/python3.6/site-packages/keras/layers/recurrent.py", line 609, in call\n input_length=timesteps)', 'File "/home/srinath/anaconda3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2877, in rnn\n input_ta = input_ta.unstack(inputs)', 'File "/home/srinath/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/tf_should_use.py", line 170, in wrapped\n return _add_should_use_warning(fn(*args, **kwargs))', 'File "/home/srinath/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/tensor_array_ops.py", line 413, in unstack\n indices=math_ops.range(0, num_elements), value=value, name=name)', 'File "/home/srinath/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/tf_should_use.py", line 170, in wrapped\n return _add_should_use_warning(fn(*args, **kwargs))', 'File "/home/srinath/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/tf_should_use.py", line 139, in _add_should_use_warning\n wrapped = TFShouldUseWarningWrapper(x)', 'File "/home/srinath/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/tf_should_use.py", line 96, in init\n stack = [s.strip() for s in traceback.format_stack()]']

sequence item 0: expected str instance, bytes found

Here's my code for Python 3. The error above persists no matter what I try. Can you review it?

# -*- coding: utf-8 -*-

import time
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
import json
from textblob import TextBlob
import matplotlib.pyplot as plt
import re

import test

ckey=test.ckey
csecret=test.csecret
atoken=test.atoken
asecret=test.asecret

def calctime(a):
    return time.time()-a

class listener(StreamListener):
    
    def on_data(self,data):
        global initime
        t=int(calctime(initime))
        all_data=json.loads(data)
        tweet=all_data["text"].encode("utf-8")
        # tweet=all_data["text"].encode("utf-8")     
        tweet=all_data["text"].strip() 
        #username=all_data["user"]["screen_name"]
        tweet=" ".join(re.findall("[a-zA-Z]+", tweet))
        blob=TextBlob(tweet.strip())

        global positive
        global negative     
        global compound  
        global count
        
        count=count+1
        senti=0
        for sen in blob.sentences:
            senti=senti+sen.sentiment.polarity
            if sen.sentiment.polarity >= 0:
                positive=positive+sen.sentiment.polarity   
            else:
                negative=negative+sen.sentiment.polarity  
        compound=compound+senti        
        print(count)
        print(tweet.strip())
        print(senti)
        print(t)
        print(str(positive) + ' ' + str(negative) + ' ' + str(compound)) 
        
    
        plt.axis([ 0, 70, -20,20])
        plt.xlabel('Time')
        plt.ylabel('Sentiment')
        plt.plot([t],[positive],'go',[t] ,[negative],'ro',[t],[compound],'bo')
        plt.show()
        plt.pause(0.0001)
        if count==200:
            return False
        else:
            return True
        
    def on_error(self,status):
        print(status)

"""str="Donal Trump"
str=str.decode('utf-8')
twitterStream.filter(track=[str])

If this still doesn't work, 
try this 
twitterStream.filter(track=[b"Donald Trump"])

or try adding this on the first line of your file """
# -*- coding: utf-8 -*-

positive=0
negative=0
compound=0

count=0
initime=time.time()
plt.ion()

auth=OAuthHandler(ckey,csecret)
auth.set_access_token(atoken,asecret)

twitterStream=  Stream(auth, listener(count))
#str="Donald Trump"
#str.encode().decode()
#str=str.decode('utf-8')
#twitterStream.filter(track=[str])
twitterStream.filter(track=[b'Donald Trump'])
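
One possible source of the error in the title, assuming the streaming library joins the `track` terms with a str separator before sending them: in Python 3, `str.join` rejects bytes items. The sketch below reproduces the message with the standard library only; `join_terms` is a hypothetical stand-in, not tweepy code.

```python
def join_terms(terms):
    """Join filter terms with commas, as a str-based client might do internally."""
    return ",".join(terms)


try:
    join_terms([b"Donald Trump"])  # bytes item, as in track=[b'Donald Trump']
    message = None
except TypeError as exc:
    message = str(exc)

# Passing plain str terms avoids the TypeError.
joined = join_terms(["Donald Trump", "Python"])
```

If this is indeed the cause, passing `track=["Donald Trump"]` as a plain string may be enough.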

Getting wrong output

I get the output 401 401 401 when executing the script.

TypeError: cannot use a string pattern on a bytes-like object

import time
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
import json
from textblob import TextBlob
import matplotlib.pyplot as plt
import re

def calctime(a):
    return time.time()-a

positive=0
negative=0
compound=0

count=0
initime=time.time()
plt.ion()

import test

ckey=''
csecret=''
atoken=''
asecret=''

class listener(StreamListener):

    def on_data(self,data):
        global initime
        t=int(calctime(initime))
        all_data=json.loads(data)
        tweet=all_data["text"].encode("utf-8")
        #username=all_data["user"]["screen_name"]
        tweet=" ".join(re.findall("[a-zA-Z]+", tweet))
        blob=TextBlob(tweet.strip())

        global positive
        global negative
        global compound
        global count

        count=count+1
        senti=0
        for sen in blob.sentences:
            senti=senti+sen.sentiment.polarity
            if sen.sentiment.polarity >= 0:
                positive=positive+sen.sentiment.polarity
            else:
                negative=negative+sen.sentiment.polarity
        compound=compound+senti
        print (count)
        print (tweet.strip())
        print (senti)
        print (t)
        print (str(positive) + ' ' + str(negative) + ' ' + str(compound))


        plt.axis([ 0, 70, -20,20])
        plt.xlabel('Time')
        plt.ylabel('Sentiment')
        plt.plot([t],[positive],'go',[t] ,[negative],'ro',[t],[compound],'bo')
        plt.show()
        plt.pause(0.0001)
        if count==200:
            return False
        else:
            return True

    def on_error(self,status):
        print(status)

auth=OAuthHandler(ckey,csecret)
auth.set_access_token(atoken,asecret)

twitterStream= Stream(auth, listener(count))
twitterStream.filter(track=["Donald Trump"])
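
The error in the title matches what Python 3 raises when a str regex pattern is applied to bytes, which is what `all_data["text"].encode("utf-8")` produces. A minimal sketch of the difference, using only the standard library; `extract_words` is a hypothetical helper, and decoding (or simply dropping the `.encode` call) is one possible fix rather than a confirmed one:

```python
import re


def extract_words(text):
    """Keep only alphabetic runs, accepting either str or UTF-8 bytes."""
    if isinstance(text, bytes):
        text = text.decode("utf-8")
    return " ".join(re.findall("[a-zA-Z]+", text))


try:
    # A str pattern applied to bytes reproduces the reported TypeError.
    re.findall("[a-zA-Z]+", "Hello, world!".encode("utf-8"))
    error = None
except TypeError as exc:
    error = str(exc)

cleaned = extract_words("Hello, world! 123")
```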

Accuracy very low and not improving

Hi, I am using your basic LSTM architecture to recreate the chatbot, but with GloVe embeddings.
During training, my training accuracy gets stuck at a very low value (0.1969) and does not improve. My code is below. Can you tell me what can be done to improve the training?

from keras.models import Sequential
from keras.layers import Embedding, Flatten, Dense, LSTM
from keras.optimizers import Adam

#model.reset_states()
model=Sequential()
model.add(Embedding(max_words,embedding_dim,input_length=maxlen))
model.add(LSTM(units=100,return_sequences=True, kernel_initializer="glorot_normal", recurrent_initializer="glorot_normal", activation='sigmoid'))
model.add(LSTM(units=100,return_sequences=True, kernel_initializer="glorot_normal", recurrent_initializer="glorot_normal", activation='sigmoid'))
model.add(LSTM(units=100,return_sequences=True, kernel_initializer="glorot_normal", recurrent_initializer="glorot_normal", activation='sigmoid'))
model.add(LSTM(units=100,return_sequences=True, kernel_initializer="glorot_normal", recurrent_initializer="glorot_normal", activation='sigmoid'))
model.summary()

model.layers[0].set_weights([embedding_matrix])
model.layers[0].trainable = False

model.compile(loss='cosine_proximity', optimizer='adam', metrics=['accuracy'])
#model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train,
          epochs = 500,
          batch_size = 32,
          validation_data=(x_val,y_val))

Epoch 498/500
60/60 [==============================] - 0s 3ms/step - loss: -0.1303 - acc: 0.1969 - val_loss: -0.1785 - val_acc: 0.2909
Epoch 499/500
60/60 [==============================] - 0s 3ms/step - loss: -0.1303 - acc: 0.1969 - val_loss: -0.1785 - val_acc: 0.2909
Epoch 500/500
60/60 [==============================] - 0s 3ms/step - loss: -0.1303 - acc: 0.1969 - val_loss: -0.1785 - val_acc: 0.2909

Further training (on the same conversation dataset) does not improve accuracy.
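
For context when reading the log: in Keras 2, the `cosine_proximity` loss is the negative cosine similarity between target and predicted vectors, so a loss near -0.13 suggests the predictions are only weakly aligned with the targets, and the `accuracy` metric is hard to interpret for vector regression of this kind. A minimal sketch of the quantity being optimized, in plain Python; this `cosine_proximity` is an illustrative reimplementation for single vectors, not the Keras function itself:

```python
import math


def cosine_proximity(y_true, y_pred):
    """Negative cosine similarity of two vectors; -1.0 means perfectly aligned."""
    dot = sum(a * b for a, b in zip(y_true, y_pred))
    norm = math.sqrt(sum(a * a for a in y_true)) * math.sqrt(sum(b * b for b in y_pred))
    if norm == 0.0:
        return 0.0  # convention for zero vectors in this sketch
    return -dot / norm


aligned = cosine_proximity([1.0, 2.0], [2.0, 4.0])     # same direction
orthogonal = cosine_proximity([1.0, 0.0], [0.0, 3.0])  # no alignment
```

Tracking this value per output vector on held-out data may be more informative than accuracy for this model.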

Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=2

When running the code I get the following error.

Using Theano backend.
model.py:20: UserWarning: Update your `LSTM` call to the Keras 2 API: `LSTM(kernel_initializer="glorot_normal", input_shape=(15,), recurrent_initializer="glorot_normal", units=300, return_sequences=True, activation="sigmoid")`
  model.add(LSTM(output_dim=300,input_shape=x_train.shape[1:],return_sequences=True, init='glorot_normal', inner_init='glorot_normal', activation='sigmoid'))
Traceback (most recent call last):
  File "model.py", line 20, in <module>
    model.add(LSTM(output_dim=300,input_shape=x_train.shape[1:],return_sequences=True, init='glorot_normal', inner_init='glorot_normal', activation='sigmoid'))
  File "/usr/local/lib/python2.7/site-packages/keras/models.py", line 430, in add
    layer(x)
  File "/usr/local/lib/python2.7/site-packages/keras/layers/recurrent.py", line 257, in __call__
    return super(Recurrent, self).__call__(inputs, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 534, in __call__
    self.assert_input_compatibility(inputs)
  File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 433, in assert_input_compatibility
    str(K.ndim(x)))
ValueError: Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=2

Twitter sentiment: compute average

Great tool, really! How would you compute average sentiment on processed tweets?
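
One way to do this, sketched under the assumption that each processed tweet yields a single polarity score (for example the `senti` value in the script): keep a running sum and a count, then divide. `RunningAverage` is a hypothetical helper, not part of the repository.

```python
class RunningAverage:
    """Track the mean of a stream of polarity scores without storing them."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def add(self, polarity):
        """Record one score and return the updated mean."""
        self.total += polarity
        self.count += 1
        return self.mean

    @property
    def mean(self):
        # No tweets yet: report a neutral 0.0 rather than dividing by zero.
        return self.total / self.count if self.count else 0.0


average = RunningAverage()
for score in [0.5, -0.25, 0.5]:
    average.add(score)
```

In the streaming script, `compound / count` would give the same value, since `compound` already accumulates `senti` and `count` is incremented once per tweet.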

Can you please share the links to these files? Thank you.

"word2vec.bin"
"conversation.json"

os.chdir("D:\semicolon\Deep Learning\chatbot");
model = gensim.models.Word2Vec.load('word2vec.bin');
path2="corpus";
file=open(path2+'/conversation.json');