
outlier_detection's Introduction

cluster_outlier

This repository contains the code and embeddings for Outlier Detection and Word Similarity estimation.

  1. Outlier Detection:
  • The script to run the task is outlier_detection/prototype.py and it can be run on the Camacho-Collados_Dataset.

  • A preprocessed version of the Camacho-Collados_Dataset can be found in preprocessed_datasets/Camacho-Collados_Dataset.txt.

  • The embeddings are filtered to contain only the vectors the tasks require, and they are provided in embeddings/.

  • The basic command to run the script:

       python prototype.py --input $DIR_DATA --embedding $DIR_EMBEDDING
    
  • To learn about the options provided by the script, run the following command:

      python prototype.py --help
    
  • Tip: With minor changes, the script can also be run on the preprocessed version of the Blair datasets, provided in the folder preprocessed_datasets/. The script needs to be adapted to accommodate the varying number of items in each Blair cluster.
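The tip above mentions clusters with a varying number of items. As a minimal sketch of how a cluster of any size could be handled, the following picks the item with the lowest mean cosine similarity to the rest of its cluster. This is an illustrative baseline, not necessarily the method implemented in prototype.py; the names `cosine`, `find_outlier`, and `toy_embeddings` are hypothetical, and the vectors are made up.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def find_outlier(cluster, embeddings):
    """Return the in-vocabulary item least similar, on average, to the others.

    `cluster` can have any length, so the same function covers datasets whose
    clusters differ in size. Out-of-vocabulary items are skipped (assumption).
    """
    # Keep in-vocabulary items only, dropping duplicates but preserving order.
    known = list(dict.fromkeys(w for w in cluster if w in embeddings))
    if len(known) < 3:
        raise ValueError("need at least three in-vocabulary items")

    def mean_similarity(word):
        others = [o for o in known if o != word]
        return sum(cosine(embeddings[word], embeddings[o]) for o in others) / len(others)

    return min(known, key=mean_similarity)

# Made-up 2-D vectors, for illustration only.
toy_embeddings = {
    "cat": [1.0, 0.1],
    "dog": [0.9, 0.2],
    "lion": [1.0, 0.3],
    "tiger": [0.8, 0.1],
    "car": [0.1, 1.0],
}

print(find_outlier(["cat", "dog", "lion", "car"], toy_embeddings))
print(find_outlier(["cat", "dog", "lion", "tiger", "car"], toy_embeddings))
```

Because the score is an average over however many other items are present, no fixed cluster size is assumed anywhere in the function.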

  2. Word Similarity:
  • The script to run the task is word_sim/wordsim.py and it can be run on the MEN, SimLEX, and WordSIM353 datasets.

  • The command to run the script:

       python wordsim.py --input $DIR_DATA --embedding $DIR_EMBEDDING
    
  • To learn about the options provided by the script, run the following command:

       python wordsim.py --help
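Word similarity datasets such as MEN, SimLex, and WordSim353 pair words with human similarity ratings, and a common way to score embeddings on them is the Spearman correlation between cosine similarities and those ratings. The sketch below illustrates that idea in plain Python. It is an assumption about the general evaluation style, not a description of what wordsim.py does; all function names and the toy data are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def evaluate(pairs, embeddings):
    """Spearman correlation over (word1, word2, gold_score) triples.

    Pairs with an out-of-vocabulary word are skipped (assumption).
    Returns (rho, number_of_pairs_used).
    """
    predicted, gold = [], []
    for w1, w2, score in pairs:
        if w1 in embeddings and w2 in embeddings:
            predicted.append(cosine(embeddings[w1], embeddings[w2]))
            gold.append(score)
    return pearson(ranks(predicted), ranks(gold)), len(gold)

# Made-up vectors and ratings, for illustration only.
toy_embeddings = {"cat": [1.0, 0.0], "dog": [1.0, 0.2], "car": [0.0, 1.0], "bus": [0.5, 1.0]}
toy_pairs = [("cat", "dog", 9.0), ("cat", "car", 1.0), ("cat", "bus", 4.0), ("cat", "zebra", 3.0)]

rho, used = evaluate(toy_pairs, toy_embeddings)
print(rho, used)
```

Reporting the number of pairs actually used alongside the correlation makes it easier to notice when many pairs were dropped as out of vocabulary.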
    

outlier_detection's People

Contributors

esantus, wanghm92



outlier_detection's Issues

CODE

prototype.py

Line 99 should change from
word = line[0]
to
word = line[0].lower()

This makes the provided word embedding files work correctly. Otherwise, many words are reported as out of vocabulary.
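To illustrate why the casing matters, here is a minimal sketch of a loader that lowercases the first field of each row, together with a helper that lists out-of-vocabulary words. It assumes a whitespace-separated text format with the word first and the vector components after it. The issue does not say which file line 99 reads, so `load_embeddings`, `oov_words`, and the sample rows are hypothetical.

```python
import io

def load_embeddings(handle, lowercase=True):
    """Read rows of the form 'word v1 v2 ...' into a dict of float lists.

    With lowercase=True the key is normalised the same way as in the
    proposed fix: word = line[0].lower(). If two rows differ only in
    case, the later row overwrites the earlier one.
    """
    embeddings = {}
    for raw in handle:
        line = raw.split()
        if len(line) < 2:
            continue  # skip blank or malformed rows
        word = line[0].lower() if lowercase else line[0]
        embeddings[word] = [float(x) for x in line[1:]]
    return embeddings

def oov_words(words, embeddings):
    """Return the words that have no vector."""
    return [w for w in words if w not in embeddings]

# Made-up rows with capitalised entries, for illustration only.
sample = "Paris 0.1 0.2\nBerlin 0.3 0.4\n"
queries = ["paris", "berlin"]

as_is = load_embeddings(io.StringIO(sample), lowercase=False)
lowered = load_embeddings(io.StringIO(sample), lowercase=True)

print(oov_words(queries, as_is))    # both queries are missing
print(oov_words(queries, lowered))  # no query is missing
```

Whichever side is normalised, the dataset words and the embedding keys need to use the same casing for lookups to succeed.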
