rrenaud / gibberish-detector
A small program to detect gibberish using a Markov Chain
License: MIT License
Thanks for this code.
I'm curious, where does big.txt come from?
Is there a specific set of datasets you combined to get it?
Thanks for your code; I learned a lot from it. I have a question about the following code. Should this:
    for i, row in enumerate(counts):
        s = float(sum(row))
        for j in xrange(len(row)):
            row[j] = math.log(row[j] / s)
be changed to this?
    for i, row in enumerate(counts):
        s = float(sum(row))
        for j in xrange(len(row)):
            #row[j] = math.log(row[j] / s)
            counts[i][j] = math.log(row[j] / s)
Thanks again.
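For what it's worth, the two forms in the question should behave the same, because `row` is a reference to the inner list `counts[i]` rather than a copy, so assigning to `row[j]` already updates `counts`. A minimal sketch to check this, assuming a made-up 2x2 count matrix (the values are hypothetical) and using `range` in place of `xrange` so it runs on Python 3:

```python
import math

# Hypothetical 2x2 count matrix, for illustration only.
counts = [[1, 3], [2, 2]]

for i, row in enumerate(counts):
    s = float(sum(row))
    for j in range(len(row)):
        # In-place update: row is the same list object as counts[i].
        row[j] = math.log(row[j] / s)

# True when every row yielded by enumerate is the list stored in counts.
same_object = all(row is counts[i] for i, row in enumerate(counts))
```

After the loop, `counts[0][0]` holds `math.log(0.25)`, even though only `row[j]` was assigned.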
Hello!
I want to say thanks for the job you did!
I created a line-by-line port of Gibberish-Detector in JavaScript, and it is also published on npm. I would appreciate a review or any suggestions. Additionally, I'd like to ask whether it would be possible and appropriate to link from your README to my implementation.
Some of the functions used, such as xrange, are incompatible with Python 3.
Perhaps Python 3 compatible files could be added in a separate folder in the repo.
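As an alternative to keeping separate files, a small shim is one common way to let the same source run on both versions. This is only a sketch of that idea, not something taken from the repository; the `squares` list is just a hypothetical usage example:

```python
# Fall back to range when xrange does not exist (Python 3).
try:
    xrange  # Python 2: the name exists and nothing changes.
except NameError:
    xrange = range  # Python 3: range is already lazy, like xrange was.

# Existing loops written with xrange keep working unchanged.
squares = [n * n for n in xrange(4)]
```

Other Python 2 constructs in the code, if any, would still need their own fixes.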
It classifies the word "ok" as gibberish. Can anyone please help me with this?
I would greatly appreciate it if you could devise some code for the case of "the the the the the the the the the a a a a a". I will send you a small donation if you can effectively solve this problem.
Hi!
Is it possible to include a licence for this repository? Without a licence, it defaults to all rights reserved and cannot be incorporated into other software.
Thanks!
Threshold selection uses only two very small datasets. Is this method too simple? Is there a more appropriate way to select the threshold?
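To make the question concrete, here is a minimal sketch of one simple rule for picking a threshold from two labelled sets of scores: take the midpoint between the lowest-scoring good example and the highest-scoring bad one. The function name and the sample scores are hypothetical, and this is an assumed illustration rather than a statement of what the repository does:

```python
def midpoint_threshold(good_scores, bad_scores):
    """Return the midpoint between the worst good score and the best bad score.

    Assumes higher scores mean "more like real text".
    """
    lowest_good = min(good_scores)
    highest_bad = max(bad_scores)
    if lowest_good <= highest_bad:
        # The two sets overlap, so no single cut-off separates them.
        raise ValueError("good and bad scores are not separable")
    return (lowest_good + highest_bad) / 2.0


# Hypothetical scores, for illustration only.
threshold = midpoint_threshold([0.75, 0.9], [0.1, 0.25])
```

With so few examples the midpoint depends entirely on two extreme scores, which is the weakness the question points at. Choosing the cut-off on a larger held-out set would be one way to reduce that sensitivity.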
Peter's talk can be found at https://youtu.be/yvDCzhbjYWs. The current link doesn't work for me.
Hello, and thank you for writing this fantastic program!
Do you think there could be a way for this program to output parts of the input that are gibberish, while leaving out the parts of the strings that "make sense" based on its model, instead of simply answering "True" or "False"? Or simply highlight the gibberish parts in some way?
If this were made possible, any string could be parsed to have its gibberish parts removed. It would be a good way to sanitize file content that's meant for human reading but has junk strings in it for whatever reason.
Thanks again for writing this in the first place. I think you could expand this project into something really useful. Who wouldn't want to be able to detect gibberish in their strings! I would love to contribute to this project, and I have recently started learning Python, so hopefully I can someday. I tried to understand your code, but unfortunately I am still too much of a beginner to do so.
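One way to approach the idea is to score each whitespace-separated token on its own and keep only those that pass. The sketch below assumes a predicate `is_gibberish(token)` supplied by the caller. Both `strip_gibberish` and the toy `no_vowels` predicate are hypothetical names for illustration, and `no_vowels` is only a stand-in for the real model:

```python
def strip_gibberish(text, is_gibberish):
    """Drop whitespace-separated tokens that the predicate flags as gibberish."""
    kept = [tok for tok in text.split() if not is_gibberish(tok)]
    return " ".join(kept)


def no_vowels(token):
    # Toy stand-in predicate: flags tokens that contain no vowel at all.
    return not any(ch in "aeiou" for ch in token.lower())


cleaned = strip_gibberish("please read zxcvb this file qwrtp now", no_vowels)
```

Very short tokens give a character-level model little to work with, so in practice scoring a sliding window of several tokens may be more reliable than scoring each token alone.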