michaelznidarsic Goto Github PK

followers: 1 following: 3 repos: 18 gists: 0

Name: Michael Znidarsic

Type: User

Bio: Devotee of all things Machine Learning: Computer Vision, Text Mining/NLP, forecasting, optimization, A/B testing, etc.

Location: SF Bay Area

Michael Znidarsic's Projects

bert

TensorFlow code and pre-trained models for BERT

Predicts the reliability of news text with 91%+ validation accuracy. Uses Google BERT encodings as input for a deep bidirectional LSTM neural network. The dataset consists of decent-length articles, balanced for political leaning and spanning a diverse spectrum of reliability to fit the real-world newsscape. Initial research for this model is available at https://github.com/michaelznidarsic/FakeNewsDetection

cicero-impact-of-translation-on-authorial-voice

This study compares how effectively text mining algorithms can classify the addressee of Cicero's letters when given an English translation versus the original Latin.

cs224d

Code for Stanford CS224D: deep learning for natural language understanding

customer-lifetime-value-multichannel-gift-company

customersegmentation

disinformation-topic-modeling

fakenewsdetection

Novel approaches to detecting intentionally fake and willfully misleading news articles. The end result of this study is an ensemble learning binary classifier of news (fake vs. real, or more accurately: unreliable vs. reliable). Attributes fed into the submodels include normalized word frequencies (e.g. TF-IDF), lexical cues, and distributions of word sentiment severity. The formatting of the PowerPoint may have been somewhat distorted in a conversion process. The key source for most of the compiled dataset was several27's excellent FakeNewsCorpus at https://github.com/several27/FakeNewsCorpus
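As a rough illustration of the "normalized word frequencies (e.g. TF-IDF)" mentioned above, the sketch below computes a plain TF-IDF weighting in pure Python. It is not taken from the repository: the function name `tf_idf`, the whitespace tokenizer, and the unsmoothed `log(N / df)` variant are all assumptions, and library implementations typically add smoothing and vector normalization.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document (hypothetical helper)."""
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # Term frequency (normalized by document length) times
            # unsmoothed inverse document frequency.
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = ["the senate passed the bill", "the shocking truth revealed"]
for doc_weights in tf_idf(docs):
    print(doc_weights)
```

With this variant, a term that occurs in every document (here "the") receives a weight of zero, while terms unique to one document receive positive weights.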

license-detection

Process improvement A/B study for Stairstep Consulting.

marketinganalyticshw2

pixel-importance-image-classification

An exploration of the predictive importance of individual pixels in a deep convolutional neural network using SHAP values. Neural Network architecture inspired by VGG16. Image classification on the Intel Scene Classification dataset available at https://www.kaggle.com/nitishabharathi/scene-classification.

purchaseprediction-customersegmentation

A series of projects all attempting to link customer traits/actions to target behavior. Unsupervised methods including KMeans clustering and Principal Component Analysis are used for Customer Segmentation. Machine Learning models such as XGBoost, RandomForest, SVMs, and Deep Neural Networks are used to predict customer behavior. Datasets are generally from banks or markets.

saferwdemo

speech-recognition-convolutional-nn

Experiment in speech recognition on Google's Speech Commands dataset using TensorFlow/Keras. Achieves 88%-89% validation accuracy classifying spoken digits (zero through nine) using an MFCC transformation and a deep CNN. Work in progress; a couple of preprocessing functions are borrowed and credited as such in the code.

split-gene-classification

A neural network that takes as input a sequence of 60 nitrogenous bases (DNA) and predicts whether the sequence contains an intron/exon boundary (IE), an exon/intron boundary (EI), or neither (N). A maximum validation accuracy of 96.24% was reached. Data obtained at https://archive.ics.uci.edu/ml/datasets/Molecular+Biology+%28Splice-junction+Gene+Sequences%29
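One common way to present a 60-base sequence like the one described above to a neural network is one-hot encoding. The following is a minimal sketch under stated assumptions, not the repository's code: the name `one_hot_encode`, the A/C/G/T channel order, and the choice to map any other symbol to an all-zero row are hypothetical.

```python
BASES = "ACGT"  # assumed channel order; the project may use a different one


def one_hot_encode(sequence, length=60):
    """Encode a DNA string as a list of `length` rows with 4 channels each."""
    sequence = sequence.strip().upper()
    if len(sequence) != length:
        raise ValueError(f"expected {length} bases, got {len(sequence)}")

    encoded = []
    for base in sequence:
        row = [0] * len(BASES)
        index = BASES.find(base)
        # Assumption: symbols outside A/C/G/T stay as an all-zero row.
        if index >= 0:
            row[index] = 1
        encoded.append(row)
    return encoded


example = one_hot_encode("ACGT" * 15)
print(len(example), example[:4])
```

The resulting 60 x 4 matrix can then be flattened for a dense network or kept two-dimensional for a convolutional or recurrent one, with the three labels (IE, EI, N) as the classification target.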

syracuseuniversityadmissionsprediction

This study creates and compares the efficacy of several machine learning models for the prediction of whether or not an undergraduate student offered admission at Syracuse University will accept admission. The dataset is proprietary and cannot be shared.

textualentailmentbilstmattention

A Bi-Directional LSTM with Neural Attention and word embeddings. Tackles the difficult problem of Textual Entailment using the Stanford Natural Language Inference (SNLI) corpus. Demonstrates that a 3-class validation accuracy of 76%+ can be obtained on the corpus without resorting to pre-training or recursion/trees. Concept pioneered in "Reasoning about Entailment with Neural Attention" by Rocktäschel et al. Inspiration taken from https://github.com/shyamupa/snli-entailment. Please find data corpus at https://nlp.stanford.edu/projects/snli/

textualentailmentdualembeddedcnn

A 2-input Convolutional Neural Network with word embeddings. Tackles the difficult problem of Textual Entailment using the Stanford Natural Language Inference (SNLI) corpus. Demonstrates that a 3-class validation accuracy of 73%+ can be obtained on the corpus without resorting to pre-training, recursion/trees, attention, or LSTM/RNNs. Please find data corpus at https://nlp.stanford.edu/projects/snli/
