andrewfm / nlpfinal
CS533 NLP Question Answering / Information Retrieval
It's in the same Overleaf project as our proposal report.
@MoonWatcher582
@Scorpio750
@AndrewFM
I'm thinking we'll have three modes of operation. If the program is very confident that it has found the subdomain matching the question, it'll search only within that subdomain. If it's only somewhat confident, it'll search within the top few subdomains (say, the top 3). If it has no clue which subdomain to use, it'll reject the question and tell the user it doesn't know the answer.
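The three modes could be sketched roughly as below. This is only an illustration: the threshold values, the `choose_subdomains` name, and the assumption that the classifier gives one confidence score per subdomain are all hypothetical and would need tuning against real data.

```python
from typing import Dict, List

# Hypothetical thresholds; they would need tuning on held-out questions.
HIGH_CONFIDENCE = 0.7
LOW_CONFIDENCE = 0.3


def choose_subdomains(scores: Dict[str, float], top_k: int = 3) -> List[str]:
    """Return the subdomains to search, or an empty list to reject."""
    if not scores:
        return []
    # Rank subdomain names from most to least confident.
    ranked = sorted(scores, key=scores.get, reverse=True)
    best = scores[ranked[0]]
    if best >= HIGH_CONFIDENCE:
        return ranked[:1]      # very confident: search one subdomain
    if best >= LOW_CONFIDENCE:
        return ranked[:top_k]  # somewhat confident: search the top few
    return []                  # no clue: reject the question
```

An empty list is the signal for the caller to tell the user that it doesn't know the answer.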
Use NLTK's Porter Stemmer to stem the words in the user's question before passing it to the classifier.
(See 23.1.3 in the book)
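A minimal sketch of the stemming step, assuming a simple regex tokenizer (our own stand-in, not something the project has settled on). The helper names are hypothetical; `PorterStemmer` and its `stem` method are the NLTK pieces, imported lazily so the rest of the module works without NLTK installed.

```python
import re
from typing import Callable, List


def tokenize(question: str) -> List[str]:
    """Lowercase the question and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", question.lower())


def stem_question(question: str, stem: Callable[[str], str]) -> List[str]:
    """Apply a stemming function to every token of the question."""
    return [stem(token) for token in tokenize(question)]


def porter_stem_question(question: str) -> List[str]:
    """Stem with NLTK's Porter stemmer (requires the nltk package)."""
    from nltk.stem import PorterStemmer  # third-party, imported lazily
    return stem_question(question, PorterStemmer().stem)
```

Passing the stemmer in as a function keeps it easy to swap in a different stemmer later if Porter turns out to be too aggressive.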
Augment the user's question with relevant synonyms before passing it to the classifier. For simplicity, we can use NLTK's implementation of the WordNet thesaurus. If we want to get fancy (probably not), we can instead try using clustering to build our own domain-specific thesaurus from the training data.
(See 23.1.6 in the book)
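One way the synonym augmentation might look. The function names and the `max_per_word` cap are assumptions for illustration; the WordNet lookup uses NLTK's `wordnet.synsets` and `lemma_names`, and needs the `wordnet` corpus to be downloaded.

```python
from typing import Callable, Iterable, List


def expand_with_synonyms(
    tokens: List[str],
    synonyms_of: Callable[[str], Iterable[str]],
    max_per_word: int = 2,
) -> List[str]:
    """Append up to max_per_word new synonyms for each original token."""
    expanded = list(tokens)
    seen = set(tokens)
    for token in tokens:
        added = 0
        for synonym in synonyms_of(token):
            if added >= max_per_word:
                break
            if synonym not in seen:  # skip words already in the query
                seen.add(synonym)
                expanded.append(synonym)
                added += 1
    return expanded


def wordnet_synonyms(word: str) -> List[str]:
    """Look up lemma names in WordNet (needs nltk and its 'wordnet' corpus)."""
    from nltk.corpus import wordnet  # third-party, imported lazily
    names = []
    for synset in wordnet.synsets(word):
        for name in synset.lemma_names():
            names.append(name.replace("_", " ").lower())
    return names
```

Calling `expand_with_synonyms(tokens, wordnet_synonyms)` would give the WordNet-backed version. A clustering-based thesaurus could be plugged in through the same `synonyms_of` argument.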
In addition to doing a classification task to figure out which Stack Exchange subdomain the question belongs to (i.e., the category of the question), we also need to do a separate classification task to determine the answer type of the question (e.g., person, place, thing).
Read: http://cogcomp.cs.illinois.edu/papers/LiRo05a.pdf
http://cogcomp.cs.illinois.edu/page/resource_view/49
http://cogcomp.cs.illinois.edu/page/tools_view/10
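As a starting point before training anything, a rule-based baseline keyed on the question word could look like this. It is not the learned classifier from the paper above, only a sketch; the rules are hypothetical, and the labels are borrowed from the coarse classes of the Li and Roth taxonomy (HUM, LOC, NUM, DESC, ENTY).

```python
import re

# Ordered (pattern, label) rules; the first matching rule wins.
RULES = [
    (r"^how (many|much|long|old|far)\b", "NUM"),
    (r"^who(se|m)?\b", "HUM"),
    (r"^where\b", "LOC"),
    (r"^when\b", "NUM"),
    (r"^(why|how)\b", "DESC"),
]


def guess_answer_type(question: str) -> str:
    """Guess a coarse answer type from the leading question word."""
    text = question.strip().lower()
    for pattern, label in RULES:
        if re.search(pattern, text):
            return label
    return "ENTY"  # fallback when no rule applies
```

A baseline like this also gives us something to compare a trained answer-type classifier against.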
The API lets us submit a search query and returns the questions that match it. I think I also saw another function called 'similar', which I believe returns questions that are similar to a query. We may want to use that instead of search.
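To make the search-versus-similar choice easy to flip, the request URL could be built by one helper. This sketch assumes the API in question is the Stack Exchange API v2.3, with its `/search` endpoint taking `intitle` and its `/similar` endpoint taking `title`; the helper name and defaults are hypothetical, and nothing here actually sends a request.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.stackexchange.com/2.3"  # assumed API and version


def build_query_url(text: str, site: str, use_similar: bool = False) -> str:
    """Build a request URL for either /search or /similar."""
    if use_similar:
        endpoint, text_param = "similar", "title"
    else:
        endpoint, text_param = "search", "intitle"
    params = {
        text_param: text,
        "site": site,        # which subdomain to query
        "order": "desc",
        "sort": "relevance",
    }
    return f"{API_ROOT}/{endpoint}?{urlencode(params)}"
```

With both modes behind one function, we could run the same set of test questions through each and compare which returns more useful results.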
We'll probably grab the top 5 or 10 results to work with, and keep only the highest-voted or accepted answer from each question.
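A rough sketch of that filtering step. It assumes each question is a dict with an `answers` list whose items carry `is_accepted` and `score` fields; those field names and the rule "accepted beats highest-voted" are assumptions, since we haven't decided which of the two should win.

```python
from typing import Dict, List, Optional


def best_answer(question: Dict) -> Optional[Dict]:
    """Prefer the accepted answer; otherwise take the highest-scored one."""
    answers = question.get("answers", [])
    if not answers:
        return None
    # Tuples compare left to right, so acceptance outranks score.
    return max(
        answers,
        key=lambda a: (a.get("is_accepted", False), a.get("score", 0)),
    )


def top_answers(questions: List[Dict], limit: int = 5) -> List[Dict]:
    """Keep one answer from each of the first `limit` questions."""
    picked = [best_answer(q) for q in questions[:limit]]
    return [a for a in picked if a is not None]
```

Questions with no answers are dropped, so the result can contain fewer than `limit` items.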
See 23.2.3 in the book. I think we should aim to do pattern extraction rather than N-gram tiling.
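For a feel of what pattern extraction could look like, here is a small sketch that maps an expected answer type to a regular expression and pulls candidate answers out of a passage. The type names and the two patterns are hypothetical and deliberately crude; real patterns would have to be written or learned per answer type.

```python
import re
from typing import List

# Hypothetical surface patterns keyed by expected answer type.
ANSWER_PATTERNS = {
    # Four-digit years from 1000 to 2099.
    "YEAR": re.compile(r"\b(1\d{3}|20\d{2})\b"),
    # Two or more consecutive capitalized words.
    "PERSON": re.compile(r"\b([A-Z][a-z]+(?: [A-Z][a-z]+)+)\b"),
}


def extract_candidates(passage: str, answer_type: str) -> List[str]:
    """Return every substring of the passage that fits the answer type."""
    pattern = ANSWER_PATTERNS.get(answer_type)
    if pattern is None:
        return []  # no pattern known for this answer type
    return pattern.findall(passage)
```

The candidates would still need ranking afterwards, for example by how close they sit to the question keywords in the passage.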
See 23.2.2 in the book.
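A very simple take on passage ranking is to order passages by how many question keywords they share. The sketch below assumes that overlap count is a good enough first signal; the stopword list and function names are hypothetical placeholders.

```python
import re
from typing import List, Set

# Tiny hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "the", "is", "of", "to", "in", "do", "i", "how", "what"}


def keywords(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens with stopwords removed."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def rank_passages(question: str, passages: List[str], limit: int = 3) -> List[str]:
    """Order passages by how many question keywords they contain."""
    wanted = keywords(question)
    ranked = sorted(
        passages,
        key=lambda p: len(wanted & keywords(p)),
        reverse=True,
    )
    return ranked[:limit]
```

If we stem or expand the question first, the same processing should be applied to the passages so that the overlap is computed on comparable tokens.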