Assignment for the Web Information Extraction and Retrieval course at the University of Ljubljana, Faculty of Computer and Information Science. The implemented data extraction approaches are in the directory implemented.
Required Python libraries: html, re, json, BeautifulSoup, lxml, html.parser (BeautifulSoup and lxml are third-party packages; the others ship with Python).
TEST XPATH AND REGULAR EXPRESSIONS
The regular expression and XPath solutions can be started from the console:
    python reg_expression.py
    python xpath.py
The application runs out of the box, so don't move the pages or scripts. The scripts print the extracted data in JSON format.
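To illustrate the general idea of regular-expression extraction with JSON output, here is a minimal sketch. It is not taken from reg_expression.py: the HTML fragment, the pattern, and the names SAMPLE_HTML, ITEM_PATTERN and extract_items are all assumptions made for the example, and it uses only the standard library.

```python
import json
import re

# Hypothetical HTML fragment; the real pages in this repository differ.
SAMPLE_HTML = """
<div class="item"><h2>Blue Kettle</h2><span class="price">$24.99</span></div>
<div class="item"><h2>Red Toaster</h2><span class="price">$39.50</span></div>
"""

# One pattern per record: title and price are captured as named groups.
# Non-greedy quantifiers keep each match inside a single record.
ITEM_PATTERN = re.compile(
    r'<h2>(?P<title>.*?)</h2>\s*<span class="price">(?P<price>.*?)</span>',
    re.DOTALL,
)


def extract_items(html_text):
    """Return a list of dicts, one per record matched in html_text."""
    return [match.groupdict() for match in ITEM_PATTERN.finditer(html_text)]


if __name__ == "__main__":
    # Print the extracted records as JSON, as the scripts are described to do.
    print(json.dumps(extract_items(SAMPLE_HTML), indent=2, ensure_ascii=False))
```

Named groups map directly to JSON keys, so adding a field only requires extending the pattern.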
TEST ROADRUNNER
In the file road_runner.py, lines 7 and 8 hold the parameters (the locations of the two web pages), which you can change. Then run python road_runner.py from the console.
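As a rough illustration of what comparing two pages to obtain a wrapper can look like, here is a heavily simplified sketch. It is not the code in road_runner.py and not the full RoadRunner algorithm: it aligns the two token sequences with difflib.SequenceMatcher and marks every mismatch as a field, without handling optional or repeated elements. The names Tokenizer, tokenize, induce_wrapper and the "#FIELD" marker are assumptions made for the example.

```python
import difflib
from html.parser import HTMLParser


class Tokenizer(HTMLParser):
    """Flatten a page into a list of tag tokens and text tokens."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # Attributes are ignored to keep the sketch short.
        self.tokens.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.tokens.append(f"</{tag}>")

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.tokens.append(text)


def tokenize(html_text):
    parser = Tokenizer()
    parser.feed(html_text)
    parser.close()
    return parser.tokens


def induce_wrapper(page_a, page_b):
    """Keep tokens shared by both pages; replace differing runs with #FIELD."""
    a, b = tokenize(page_a), tokenize(page_b)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    wrapper = []
    for op, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if op == "equal":
            wrapper.extend(a[i1:i2])
        else:
            # Any replace/insert/delete is treated as one variable field.
            wrapper.append("#FIELD")
    return wrapper


if __name__ == "__main__":
    # Hypothetical pages, inlined so the example needs no files.
    page_a = "<html><body><h1>Book A</h1><p>10 EUR</p></body></html>"
    page_b = "<html><body><h1>Book B</h1><p>12 EUR</p></body></html>"
    print(" ".join(induce_wrapper(page_a, page_b)))
```

The shared tags form the template, and each "#FIELD" marks a position where the two pages carry different data.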