mini-project-2's Introduction

mini-project-2

Requirements

  • Python 3.5+
  • libdb4.8-dev
  • libdb4.8++-dev
  • db-util

Overview

mini-project-2 is a Python command-line application that interfaces with Berkeley DB through the bsddb3 Python 3 package. Users enter queries written in the query language whose grammar is defined here: https://github.com/CMPUT291F18MP2/Mini-Project-2/blob/master/mini_project_2/input_parser.py. The program processes each query, retrieves the matching data, and presents it to the user.

Installation

To install the Berkeley DB dependencies for mini-project-2 on Ubuntu, run the following commands:

sudo add-apt-repository ppa:bitcoin/bitcoin
sudo apt-get update
sudo apt-get install libdb4.8-dev libdb4.8++-dev
sudo apt-get install db-util -y

mini-project-2 can then be installed from source by running the following command from the directory that contains mini-project-2's setup.py file:

pip install .

Usage

After installation, mini-project-2's shell can be started with the following console command:

mini-project-2 --phase [1-3]

To get additional usage help on starting mini-project-2, run the following console command:

mini-project-2 --help

mini-project-2's People

Contributors

ryfurrer, tim-tran

mini-project-2's Issues

pdates.txt Creation

pdates.txt: This file includes one line for each ad in the form of d:a,c,l where d is a non-empty date at which the ad is posted and a, c, and l are respectively the ad id, category and location of the ad.
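
The d:a,c,l layout can be illustrated with a minimal Python sketch. The names DateEntry, format_pdate_line and parse_pdate_line are hypothetical helpers, not part of mini-project-2, and the sample date, category and location values are made up for illustration.

```python
from collections import namedtuple

# Hypothetical record type for one pdates.txt line.
DateEntry = namedtuple("DateEntry", ["date", "ad_id", "category", "location"])


def format_pdate_line(date, ad_id, category, location):
    """Build one 'd:a,c,l' line; ads with an empty date are rejected."""
    if not date:
        raise ValueError("pdates.txt only lists ads with a non-empty date")
    return "{}:{},{},{}".format(date, ad_id, category, location)


def parse_pdate_line(line):
    """Split a 'd:a,c,l' line back into its four fields."""
    # Assumption: the date itself contains no ':' character.
    date, _, rest = line.rstrip("\n").partition(":")
    # Split on the first two commas only, so a comma in the
    # location (if any) stays inside the last field.
    ad_id, category, location = rest.split(",", 2)
    return DateEntry(date, ad_id, category, location)
```

For example, format_pdate_line("2018/11/07", "1001", "camera-camcorder-lens", "Calgary") yields the line 2018/11/07:1001,camera-camcorder-lens,Calgary, and parse_pdate_line reverses it.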

Report.pdf Creation

Your report must be type-written, saved as PDF and be included in your submission. Your report cannot exceed 3 pages.
The report should include
(a) a general overview of your system with a small user guide,
(b) a description of your algorithm for efficiently evaluating queries, in particular evaluating queries with multiple conditions and wild cards and range searches and an analysis of the efficiency of your algorithm,
(c) your testing strategy, and
(d) your group work break-down strategy.

Brief and full outputs

By default, the output of each query is the ad id and the title of all matching ads. The user should be able to change the output format to full record by typing "output=full" and back to id and title only using "output=brief".
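
One possible way to track the output format is sketched below in Python. OutputSettings is a hypothetical class, not the project's actual implementation. The sketch assumes that whitespace around "=" and letter case are tolerated, that a brief row is the ad id and title separated by a space, and that a full row is the stored record printed unchanged.

```python
import re

# Matches the two format commands; whitespace and case tolerance are assumptions.
OUTPUT_COMMAND = re.compile(r"^\s*output\s*=\s*(brief|full)\s*$", re.IGNORECASE)


class OutputSettings:
    """Remembers the current output format between queries."""

    def __init__(self):
        self.mode = "brief"  # the default: ad id and title only

    def handle(self, line):
        """Update the mode and return True if line is an output command."""
        match = OUTPUT_COMMAND.match(line)
        if match is None:
            return False  # not a format command; treat it as a query
        self.mode = match.group(1).lower()
        return True

    def render(self, ad_id, title, full_record):
        """Format one matching ad according to the current mode."""
        if self.mode == "full":
            return full_record
        return "{} {}".format(ad_id, title)
```

Keeping the mode in one small object means the query loop only has to call handle() on each input line before parsing it as a query.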

README.txt Creation

The file README.txt is a text file that lists the names and ccids of all group members. This file must also include the names of anyone you collaborated with (as much as it is allowed within the course policy) or a line saying that you did not collaborate with anyone else. This is also the place to acknowledge the use of any source of information besides the course textbook and/or class notes.

ads.txt

ads.txt: This file includes one line for each ad in the form of a:rec where a is the ad id and rec is the full ad record in xml.

Index file creation

Given the sorted files terms.txt, pdates.txt, prices.txt and ads.txt, create the following four indexes: (1) a hash index on ads.txt with ad id as key and the full ad record as data, (2) a B+-tree index on terms.txt with term as key and ad id as data, (3) a B+-tree index on pdates.txt with date as key and ad id, category and location as data, (4) a B+-tree index on prices.txt with price as key and ad id, category and location as data. Note that in all four cases the key is the character string before the colon ':' and the data is everything that comes after the colon.

Phase 2 produces these four indexes, which should be named ad.idx, te.idx, da.idx, and pr.idx, corresponding to indexes 1, 2, 3, and 4 respectively.
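
As a rough sketch of the key/data split, the Python function below turns key:data lines into the alternating key and data lines that Berkeley DB's db_load utility reads in its text mode (-T). The function name to_db_load_lines is hypothetical, splitting on the first colon is an assumption (ad records may themselves contain colons), and this is not necessarily how mini-project-2 builds its indexes.

```python
def to_db_load_lines(lines):
    """Yield key and data on alternating lines from 'key:data' input lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Assumption: the key ends at the FIRST colon.
        key, sep, data = line.partition(":")
        if not sep:
            raise ValueError("missing ':' in line: {!r}".format(line))
        # db_load -T treats backslash as an escape character, so double it.
        yield key.replace("\\", "\\\\")
        yield data.replace("\\", "\\\\")


# The result could then be written to a file and loaded with, for example:
#   db_load -T -t btree -f te_input.txt te.idx
# (te_input.txt is a made-up file name; use -t hash for ad.idx).
```

Because terms, dates and prices can repeat across ads, the B+-tree indexes would probably also need duplicate keys enabled when they are created.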

prices.txt

prices.txt: This file includes one line for each ad that has a non-empty price field in the form of p:a,c,l where p is a number indicating the price and a, c, and l are respectively the ad id, category and location of the ad.

terms.txt creation

terms.txt: This file includes terms extracted from ad titles and descriptions; for our purpose, suppose a term is a consecutive sequence of alphanumeric, underscore '_' and dash '-' characters, i.e. [0-9a-zA-Z_-]. The format of the file is as follows: for every term T in the title or the description of an ad with id a, there is a row in this file of the form t:a where t is the lowercase form of T. Ignore all special characters coded as &#number; such as &#29987; which represents 産, as well as &apos;, &quot; and &amp; which respectively represent ', " and &. Also ignore terms of length 2 or less. Convert the terms to all lowercase before writing them out. Here are the respective files for our input files with 10 records and 1000 records.
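
The extraction rules above can be sketched in Python as follows. extract_terms and term_lines are hypothetical names. The sketch assumes that the term character class includes the underscore, as the description says, and that an ignored entity is replaced by a space, so it separates the text on either side rather than joining it.

```python
import re

# Numeric references such as &#123; plus the three named entities to ignore.
ENTITY = re.compile(r"&#\d+;|&(?:apos|quot|amp);")
# A term: consecutive alphanumeric, underscore or dash characters.
TERM = re.compile(r"[0-9a-zA-Z_-]+")


def extract_terms(text):
    """Return the lowercase terms longer than 2 characters, in order."""
    cleaned = ENTITY.sub(" ", text)  # assumption: entities act as separators
    return [t.lower() for t in TERM.findall(cleaned) if len(t) > 2]


def term_lines(ad_id, title, description):
    """Build the 't:a' rows for one ad from its title and description."""
    terms = extract_terms(title) + extract_terms(description)
    return ["{}:{}".format(term, ad_id) for term in terms]
```

For instance, extract_terms("Nikon D-50 &amp; 2 lenses") keeps nikon, d-50 and lenses, and drops the entity and the one-character term 2.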

test failures

Travis fails on these:
test/unit/test_phase2.py::test_sort_all FAILED [ 96%]
test/unit/test_phase2.py::test_format_all FAILED [ 98%]
Do they work for you @lionkingsimba ?
