Web-B-Gone

Workspace for the group Web-B-Gone of the 'Big Data and Language Technologies' course SoSe22.

Setup

You can use Docker to set up this project. If you are not familiar with Docker, please visit the linked tutorial.

Clone this repository and build a Docker image from the Dockerfile. The image's entrypoint is startup.py. The program needs three directories to work correctly:

  • an input directory where the data is located (default: ./data)
  • a working directory where the index and other intermediate files are stored for reuse across runs (default: ./working)
  • an output directory where the results are saved (default: ./out)
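
As a minimal sketch, the three default directories listed above could be created before the first run like this. The helper name `ensure_directories` is hypothetical and not part of the project; only the default paths come from the list above.

```python
from pathlib import Path

# Default locations named in the list above.
DEFAULT_DIRS = {"input": "./data", "working": "./working", "output": "./out"}


def ensure_directories(base=".", dirs=DEFAULT_DIRS):
    """Create the input, working and output directories under `base` if missing.

    Hypothetical helper for illustration; returns a dict mapping role -> Path.
    """
    created = {}
    for role, relative in dirs.items():
        path = Path(base) / relative
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created[role] = path
    return created
```

Calling `ensure_directories()` from the repository root would leave existing directories untouched and create only the missing ones.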

The directories can also be set in config.json. In that case, pass the path to config.json as the argument after -cfg.

Dataset

The dataset can be downloaded here. Before it can be used in this project, it has to be restructured. To do this, start the program with the parameters -swde path/to/SWDE.zip -reswde. To compress the restructured SWDE dataset, use the parameter -cswde. To extract the compressed, restructured SWDE dataset, use -e path/to/restruc_SWDE.zip.
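
The flags described above can be assembled into argument lists as in the following sketch. The function names are hypothetical, and passing -cswde in the same call as -reswde is an assumption; check startup.py for how the flags are actually parsed. The lists only contain the program arguments, so they can be appended to whatever command launches startup.py (for example the Docker entrypoint).

```python
import shlex


def restructure_args(swde_zip, compress=False):
    """Arguments for restructuring the SWDE archive (flags taken from the text above)."""
    args = ["-swde", swde_zip, "-reswde"]
    if compress:
        # Assumption: -cswde may be given together with -reswde in one call.
        args.append("-cswde")
    return args


def extract_args(restructured_zip):
    """Arguments for extracting an already restructured, compressed dataset."""
    return ["-e", restructured_zip]


def show(args):
    """Render an argument list as a single shell-safe string."""
    return shlex.join(args)
```

For instance, `show(restructure_args("path/to/SWDE.zip"))` yields the string `-swde path/to/SWDE.zip -reswde`.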

Usage

The main method of startup.py contains example calls for the main functionality of this project, such as model training and evaluation. They are for illustration purposes and can be adjusted as needed.

Models

All trained and evaluated models are stored in the Git repository as working.zip. To use them, extract the archive and move its contents to the /working directory.
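
A minimal sketch of the extraction step using Python's standard zipfile module is shown below. The function name `extract_models` and the default target `./working` are assumptions for illustration. The internal layout of working.zip is not described here, so if the archive already contains a top-level working folder, the target should be its parent directory instead.

```python
import zipfile
from pathlib import Path


def extract_models(archive="working.zip", target="./working"):
    """Extract the model archive into the working directory.

    Hypothetical helper; returns the list of member names that were extracted.
    """
    target_dir = Path(target)
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
        return zf.namelist()
```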

Paper

This project is based on the paper "Information-Extraction from websites with a NER-Approach", which is also included in the repository.

Contributors

jan108, lo-hei
