georgesaussy / systemprogram Goto Github PK
View Code? Open in Web Editor NEWWork relating to the identification of the author of the First System Program of German Idealism
Work relating to the identification of the author of the First System Program of German Idealism
README Introduction This directory should contain all the files and programs necessary to quickly replicate the results documented in the write up. All files should be self-documented in addition to an outline of their use below. Each of the computer experiments run will be documented separately. All binaries were compiled and ran on Ubuntu 14. They may need to be recompiled for another OS. The documentation for the data files is sparse. The data All of the data was taken from "spiegel.de". My understanding is that the site is a German mirror of Project Gutenburg. The data was downloaded by python glue scripts using the Linux terminal browser lynx with the flag "--dump". The directories for each author are located in file in "rawdata" carrying the respective authors names. The original files downloaded were file linking to each of works by each of the respective authors. Those links were then downloaded in the same way and were then parsed for the text of each work. These parses are the "*corpus" and "*corpus1" files in the main directory. Both the downloading and the parsing was imperfect and introduced errors in the transcription. Steps were taken to clean up the data, but I suspect error remain and steps were taken to correct for these. (The corpora could not be reviewed by hand in time for the project's deadline.) This is discussed in the writeup and some of the documentation. Naive Bayes The Naive Bayes is run by, in the naivebayes directory, running first $ python naivebayes_logwriter.py -d func and then $ python naivebayse_classifier.py As an example "logfreq_data" contains the printed information from naivebayes_logwriter.py. The model prints the log-likelihood that each author wrote the text. It also prints the confusion matrix given the model only trained on 90% of the data and tested on the other 10%. The model uses +1 smoothing. See the "naivebayes_logwriter.py" and "naivebayes_classifier.py" self-documentation for further details. In addition, if instead of with the "func" option, one writes $ python naivebayes_logwriter.py -d all the program will use the authors' combined vocabulary to run the Naive Bayes model instead of the just the functional words. See the writeup for more notes on the distinction. Binning changes the results and is discussed in the writeup. See "naivebayes/README" for further documentation. Both "naivebayes_logwriter.py" and "naivebayes_classifier.py" are self documented as well. Perceptron The perceptron is run by, in the "perceptron" sub-directory using the command $ python perceptron.py --eta [float] --itrs [int] Where the "eta" parameter is the learning factor for the model and the "itrs" parameter is the number of iterations. The program will print the strength with which a perceptron trained to recognize each author believes the System Program was written by that author. It was found that the model was extremely sensitive to the learning parameter. This is consistent with the results from the SVM and indicate that the data in the training corpus is not separable. This is further discussed in the writeup. See "perceptron/README" for further documentation and "perceptron.py" is self-documented as well.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.