The program was built with Python 3.
- Download the `hw3-files-2021` folder into the root directory of the project.
- Open the command line.
- Navigate to the root directory of the project: `../msci-541-720-hw3-christinajiyunlee`
- Run `ComputeAverages.py` by entering the following command: `python ComputeAverages.py`
- If run successfully, you should now see the formatted outputs of topic averages for student2 and student12 (`student2_output.txt` and `student12_output.txt`), a CSV of mean averages (`hw3-5a-J623LEE.csv`), and a CSV of Student's t-test p-values (`hw3-5d-J623LEE.csv`) in the root directory of the project.
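The last step above can be checked mechanically. The following is a minimal sketch, not part of the project: the helper name `missing_outputs` is hypothetical, and the file names are the ones listed in the steps above.

```python
from pathlib import Path

# Output files named in the steps above.
EXPECTED_OUTPUTS = [
    "student2_output.txt",
    "student12_output.txt",
    "hw3-5a-J623LEE.csv",
    "hw3-5d-J623LEE.csv",
]


def missing_outputs(root, expected=EXPECTED_OUTPUTS):
    """Return the expected file names that are not present in `root`."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # Run from the project root after ComputeAverages.py has finished.
    missing = missing_outputs(".")
    if missing:
        print("Missing outputs:", ", ".join(missing))
    else:
        print("All expected outputs are present.")
```

An empty result means all four files were written to the project root.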
- Ensure the `avdl.txt` file contains the numeric value of the average length of all documents in the collection.
- Open the command line.
- Navigate to the root directory of the project: `../msci-541-720-hw4-christinajiyunlee`
- Run `BM25.py` by entering the following command: `python BM25.py`
- If run successfully, you should see either the `hw4-bm25-baseline-j623lee.txt` or the `hw4-bm25-stem-j623lee.txt` file in the root directory of the project, depending on the configuration of the script.
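To illustrate why `avdl.txt` must hold a numeric value, here is a minimal sketch of how an average document length typically enters a BM25 score. It is an assumption-laden illustration, not the code in `BM25.py`: the function names, the parameter defaults (`k1=1.2`, `b=0.75`), and the specific Okapi BM25 variant shown are all assumptions.

```python
import math


def read_avdl(path="avdl.txt"):
    """Read the average document length from a text file.

    Raises ValueError if the file content is not a number.
    """
    with open(path, encoding="utf-8") as f:
        return float(f.read().strip())


def bm25_term_score(tf, df, doc_len, avdl, n_docs, k1=1.2, b=0.75):
    """Score one query term against one document (common Okapi BM25 form).

    tf      -- occurrences of the term in the document
    df      -- number of documents containing the term
    doc_len -- length of this document in tokens
    avdl    -- average document length across the collection
    n_docs  -- number of documents in the collection
    k1, b   -- assumed default tuning parameters
    """
    # Length normalization: documents longer than average get a larger K,
    # which dampens the contribution of their term frequency.
    K = k1 * ((1 - b) + b * doc_len / avdl)
    # This IDF form can be negative when df > n_docs / 2.
    idf = math.log((n_docs - df + 0.5) / (df + 0.5))
    return (tf * (k1 + 1)) / (K + tf) * idf
```

A document's score for a query would then be the sum of `bm25_term_score` over the query terms. If `avdl.txt` is empty or non-numeric, `read_avdl` fails immediately, before any scoring takes place.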
This file does not need to be run, as it is imported into `BM25.py` and used there.
- Generate the `hw4-bm25-baseline-j623lee.txt` and `hw4-bm25-stem-j623lee.txt` files in the root directory of the project.
- Open the command line.
- Navigate to the root directory of the project: `../msci-541-720-hw4-christinajiyunlee`
- Run `ComputeAverages.py` by entering the following command: `python ComputeAverages.py`
- If run successfully, you should now see the formatted outputs in the `hw4-metrics-j623lee.txt` file.
- Download the `latimes.gz` compressed file into the root directory of the project.
- Open the command line.
- Navigate to the root directory of the project: `../msci-541-720-hw4-christinajiyunlee`
- Run `IndexEngine.py` by entering the following command: `python IndexEngine.py latimes.gz latimes-index`
- If you see an `ERROR`, follow the instructions to resolve it, then run the command from step 4 again.
- If run successfully, you should now see a `latimes-index/` directory and an `index.txt` file in the root directory of the project.
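The indexing command above takes the compressed file directly, so there is no need to unzip `latimes.gz` by hand. As a minimal sketch of how a gzip file can be streamed line by line in Python: the function names are hypothetical, the `latin-1` encoding is an assumption, and this is not necessarily how `IndexEngine.py` reads its input.

```python
import gzip


def iter_lines(gz_path, encoding="latin-1"):
    """Yield decoded lines from a gzip file one at a time, without newlines.

    The file is decompressed as it is read, so it never has to be
    loaded into memory in full.
    """
    with gzip.open(gz_path, "rt", encoding=encoding) as f:
        for line in f:
            yield line.rstrip("\n")


def count_lines(gz_path):
    """Count the lines in a gzip file by streaming it once."""
    return sum(1 for _ in iter_lines(gz_path))
```

For example, `count_lines("latimes.gz")` run from the project root gives a quick check that the download is complete and readable before indexing.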
This script was obtained from https://tartarus.org/martin/PorterStemmer/. It was written by Vivake Gupta, following the algorithm from Porter, 1980, An algorithm for suffix stripping, Program, Vol. 14, no. 3, pp 130-137, and was available as open source code.