CS3245-HW4

Project Design

Indexing

Main entry point is index.py. Helper files are InputOutput.py, Tokenizer.py.

The indexing approach is still to be confirmed (TBC). We first need to decide how to handle positional indexing.
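For reference while this is still open, here is a minimal sketch of what a positional index stores: for each term, the documents it occurs in and the token positions within each document. It is not a decision on the final design. The function name, the in-memory nested-dictionary layout, and the sample documents are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def build_positional_index(docs: Dict[int, List[str]]) -> Dict[str, Dict[int, List[int]]]:
    """Map each term to {doc_id: [positions]} for already-tokenized documents."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, tokens in docs.items():
        for position, term in enumerate(tokens):
            # Positions are 0-based token offsets within the document.
            index[term][doc_id].append(position)
    # Convert to plain dicts so that lookups of unknown terms raise KeyError.
    return {term: dict(postings) for term, postings in index.items()}


# Hypothetical, already-tokenized documents keyed by document ID.
sample_docs = {
    1: ["fertility", "treatment", "damages"],
    2: ["damages", "for", "fertility", "treatment"],
}
positional_index = build_positional_index(sample_docs)
```

Storing positions is what makes phrase queries possible: two terms form a phrase in a document when their positions there differ by exactly one.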

Searching

TBC.

Relevance Feedback

Query Expansion

We built a legal-domain thesaurus by scraping dictionary.law.com, mapping each legal term to a set of related terms. Because queries are stemmed, we also stem both the keys and the values of the thesaurus; otherwise query terms would not match thesaurus entries.

Source code for scraping can be found here.
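The stemming step described above can be sketched roughly as follows. This is an illustration under assumptions, not the project's actual code. The names `stem_thesaurus`, `expand_query` and `toy_stem` are hypothetical, and the stemmer is passed in as a parameter because this README does not say which one is used. `toy_stem` is only a stand-in so the example runs without extra dependencies. The sketch also assumes single-token terms.

```python
from typing import Callable, Dict, List, Set

Thesaurus = Dict[str, Set[str]]


def stem_thesaurus(raw: Thesaurus, stem: Callable[[str], str]) -> Thesaurus:
    """Stem every key and every related term so they match stemmed query terms."""
    stemmed: Thesaurus = {}
    for term, related in raw.items():
        # Different surface forms can collapse to the same stem, so merge their sets.
        stemmed.setdefault(stem(term), set()).update(stem(r) for r in related)
    return stemmed


def expand_query(query_terms: List[str], thesaurus: Thesaurus,
                 stem: Callable[[str], str]) -> List[str]:
    """Return the stemmed query terms followed by any new related terms."""
    expanded = [stem(t) for t in query_terms]
    seen = set(expanded)
    for term in list(expanded):  # iterate over a copy; only original terms are expanded
        for related in sorted(thesaurus.get(term, ())):
            if related not in seen:
                seen.add(related)
                expanded.append(related)
    return expanded


def toy_stem(word: str) -> str:
    """Stand-in stemmer for illustration only: lowercase and strip a trailing 's'."""
    word = word.lower()
    return word[:-1] if word.endswith("s") else word


# Made-up entries; the real thesaurus comes from the scraped data.
raw_thesaurus = {"Damages": {"compensation", "awards"}, "damage": {"harm"}}
thesaurus = stem_thesaurus(raw_thesaurus, toy_stem)
expanded_query = expand_query(["Damages", "claims"], thesaurus, toy_stem)
```

Note that "Damages" and "damage" collapse to the same key once stemmed, which is why the sketch merges their related-term sets instead of overwriting one with the other.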

Project style and setup

Project setup

Use any CPython 3.8.10 interpreter with NLTK installed and the punkt tokenizer data downloaded. We are allowed to use external libraries if we package them with our code, but let's try to avoid that.

Code style

Let's keep things consistent!

  • snake_case for variables and functions
  • PascalCase for classes/objects and custom types

Type hints

This project will probably get big and complicated, with multiple people working on it, so type hints [1] [2] are encouraged. Add custom types to Types.py and import them into whichever files need them.

Dictionaries and nested dictionaries are common, so common atomic values like DocId and Term are abstracted as custom types in Types.py to make dictionary types clearer. For example, we write Dict[DocId, DocFreq] rather than Dict[int, int].
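As a rough illustration of the idea, a few entries in Types.py might look like the sketch below. It assumes plain type aliases; the real file may define them differently, for example with typing.NewType. `TermFreq`, `Postings` and `count_doc_freqs` are hypothetical names that do not appear in this README.

```python
from typing import Dict

# Plain aliases: they document intent without changing runtime behaviour.
DocId = int
Term = str
DocFreq = int
TermFreq = int  # hypothetical: how often a term occurs in one document

# Hypothetical nested type: term -> {document ID -> term frequency}.
Postings = Dict[Term, Dict[DocId, TermFreq]]


def count_doc_freqs(postings: Postings) -> Dict[Term, DocFreq]:
    """Document frequency of a term is the number of documents containing it."""
    return {term: len(docs) for term, docs in postings.items()}


sample_postings: Postings = {"court": {1: 3, 2: 1}, "appeal": {2: 4}}
doc_freqs = count_doc_freqs(sample_postings)
```

Compared with Dict[str, Dict[int, int]], the aliased signature says which integer is a document ID and which is a count.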

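Since we are on Python 3.8.10, add from __future__ import annotations at the top of each module that uses type hints. It postpones the evaluation of annotations (PEP 563), so hints can use forward references and built-in generics such as list[int] without raising errors when the module is loaded.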
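A small sketch of what the import buys us on 3.8 is shown below. `PostingNode` and `collect_doc_ids` are hypothetical names used only for this example. Without the `__future__` import, Python 3.8 raises a NameError for the reference to PostingNode inside its own class body and a TypeError for list[int].

```python
from __future__ import annotations  # must be the first statement in the module

from typing import Optional


class PostingNode:
    """Hypothetical linked posting, used only to show a forward reference."""

    def __init__(self, doc_id: int, next_node: Optional[PostingNode] = None) -> None:
        # PostingNode is not fully defined yet at this point, but the annotation
        # is stored as a string and never evaluated here, so this is fine.
        self.doc_id = doc_id
        self.next_node = next_node


def collect_doc_ids(head: Optional[PostingNode]) -> list[int]:
    """Walk the chain of postings and return the document IDs in order."""
    doc_ids = []
    node = head
    while node is not None:
        doc_ids.append(node.doc_id)
        node = node.next_node
    return doc_ids


chain = PostingNode(1, PostingNode(4, PostingNode(9)))
ids_in_chain = collect_doc_ids(chain)
```

The import only affects annotations. A runtime expression such as a type alias in Types.py is still evaluated on 3.8, so aliases should keep using typing.Dict and typing.List rather than dict[...] and list[...].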

Contributors

simjunyou, aizatazhar, johnson-yee, soepaingzaw
