
software-project

Dense retrieval system for general court laws

Abstract

In this report, we present and discuss the development of a Civil Law dense information retrieval system that identifies a user's specific case and retrieves similar contexts from the available civil laws and case articles. Despite the lack of annotated data, we find that the system retrieves relevant information for user questions. According to our results, our Civile-Law-IR model outperforms other models that were trained on millions of data points but were not fine-tuned on domain knowledge. The goal of this project is to give the general public easy access to legal information. It does not replace professional help: the information provided is purely informative and covers aspects that can help individuals understand their position and rights.

Installation

Required environment:

  • Unix system
  • Python 3.10

Clone this repository: git clone https://github.com/Rowan1697/software-project.git

To download and set up the necessary data and libraries: ./setup.sh

Repository structure

  • dataset/ : contains the scripts used to preprocess the CASS data. The .story files needed for the project are downloaded when you run setup.sh. All other preprocessed files are already present in the directory.

  • scripts/ : contains the semantic_search.py script for the initial semantic search model, the syntheticData_generation folder, and the cross-encode folder.

  • scripts/syntheticData_generation: contains the question_generation.py and synthetic-nli.py scripts, which generate the synthetic question dataset and the synthetic-nli dataset.

  • scripts/cross-encode: contains the training scripts for the STSB (CE_stsb_train.py) and Civile-NLI (CE_civile-nli_train.py) models. It also holds the final dense_retrieval_pipeline.py script, which includes the front-end used for human evaluation.

  • scripts/front-end: contains the app.py script that loads the front-end system for the project. Use streamlit run app.py to run it. An online version is hosted on Hugging Face Spaces.

  • results/ : contains the automatic_evaluation.py script.

  • presentations/ : contains all the intermediate presentations as PDF files. Each file is labeled using the template SoftwarePresentation_[month-and-date].pdf

  • Report/: contains the final project report as a PDF.
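The scripts listed above pair a semantic search model with cross-encoder models. The following is a minimal sketch of the general retrieve-then-rerank pattern that such a pairing usually implies. It is not the code in semantic_search.py or dense_retrieval_pipeline.py. The function names, the hand-written vectors, and the word-overlap scorer are hypothetical stand-ins for learned embeddings and a trained cross-encoder.

```python
import math
from typing import Callable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec: Sequence[float],
             doc_vecs: Sequence[Sequence[float]],
             top_k: int = 3) -> List[int]:
    """Stage 1: indices of the top_k documents by cosine similarity."""
    order = sorted(range(len(doc_vecs)),
                   key=lambda i: cosine(query_vec, doc_vecs[i]),
                   reverse=True)
    return order[:top_k]


def rerank(query: str,
           docs: Sequence[str],
           candidates: Sequence[int],
           pair_score: Callable[[str, str], float]) -> List[int]:
    """Stage 2: reorder candidates with a scorer that reads query and document together."""
    return sorted(candidates,
                  key=lambda i: pair_score(query, docs[i]),
                  reverse=True)


def word_overlap(query: str, doc: str) -> float:
    """Hypothetical placeholder for a cross-encoder: counts shared words."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


# Toy data: in a real system the vectors would come from an embedding model.
DOCS = ["landlord duties", "inheritance rules", "tenant deposit return"]
DOC_VECS = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
```

With this toy data, retrieve([1.0, 0.0], DOC_VECS, top_k=2) keeps documents 0 and 2, and rerank("tenant deposit", DOCS, [0, 2], word_overlap) moves document 2 to the front. The first stage is cheap because document vectors can be computed once, while the second stage scores each query and document pair and is therefore run only on the short candidate list.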

Usage Guide

  • Make sure that you're running the code from the directory where the script is located.

  • To run a script, use python [Script_name].py from the directory that contains it. For example, python automatic_evaluation.py runs the automatic evaluation script.
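If you prefer to launch scripts from elsewhere, for example from the repository root, the two rules above can be wrapped in a small helper. This is a sketch only: run_script is a hypothetical name and is not part of the repository. It starts the script with the current Python interpreter and sets the working directory to the script's own folder.

```python
import subprocess
import sys
from pathlib import Path


def run_script(script_path: str) -> subprocess.CompletedProcess:
    """Run a Python script with its own directory as the working directory.

    Hypothetical helper: equivalent to changing into the script's folder
    and calling `python [Script_name].py` there.
    """
    script = Path(script_path).resolve()
    return subprocess.run(
        [sys.executable, script.name],  # same interpreter as the caller
        cwd=script.parent,              # run from the script's directory
        capture_output=True,            # collect stdout and stderr
        text=True,
        check=True,                     # raise if the script exits non-zero
    )
```

For instance, calling run_script("results/automatic_evaluation.py") from the repository root would behave like changing into results/ and running python automatic_evaluation.py. The script's printed output is available in the stdout attribute of the returned object.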

Contributors

ssilwalcode, adriana671, ldalila, rowan1224
