Con-S2V: A Generic Framework for Incorporating Extra-Sentential Context into Sen2Vec

Home Page: https://link.springer.com/chapter/10.1007%2F978-3-319-71249-9_45

License: Other

Con-S2V incorporates extra-sentential context into Sen2Vec to learn latent representations of sentences.

Citation

If you are using the code, please consider citing the following papers:

@inproceedings{saha2017c,
  title={Con-S2V: A Generic Framework for Incorporating Extra-Sentential Context into Sen2Vec},
  author={Saha, Tanay Kumar and Joty, Shafiq and Al Hasan, Mohammad},
  booktitle={Joint European Conference on Machine Learning and Knowledge Discovery in Databases},
  pages={753--769},
  year={2017},
  organization={Springer}
}
@inproceedings{saha2017regularized,
  title={Regularized and Retrofitted models for Learning Sentence Representation with Context},
  author={Saha, Tanay Kumar and Joty, Shafiq and Hassan, Naeemul and Hasan, Mohammad Al},
  booktitle={Proceedings of the 2017 ACM on Conference on Information and Knowledge Management},
  pages={547--556},
  year={2017},
  organization={ACM}
}

Requirements

Python Environment setup and Update

  1. Copy the sen2vec_environment.yml file into the anaconda/envs folder.
  2. Change into the anaconda/envs folder.
  3. Run the following command:
conda env create -f sen2vec_environment.yml

The sen2vec environment is now installed. Activate it with the following command:

source activate sen2vec
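
To confirm that the environment is active before running the project, a small check like the one below can help. It is a minimal sketch that relies on the CONDA_DEFAULT_ENV variable, which conda sets on activation; the helper names are illustrative and not part of the project.

```python
import os


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None."""
    env = os.environ if environ is None else environ
    # conda exports CONDA_DEFAULT_ENV when an environment is activated
    return env.get("CONDA_DEFAULT_ENV") or None


def is_sen2vec_active(environ=None, expected="sen2vec"):
    """True when the active conda environment is the expected one."""
    return active_conda_env(environ) == expected
```

Calling is_sen2vec_active() with no arguments inspects the current process environment.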

If you have added more packages into the environment, you can update the .yml file using the following command:

conda env export > sen2vec_environment.yml

ROUGE Environment setup

Please go to the ROUGE directory and run the following command to check whether the provided perl script will work or not:

./ROUGE-1.5.5.pl 

If it shows the options for running the script, then you are fine. However, if it reports that XML::DOM is not installed, type the following command to install it:

cpan XML::DOM

Here, CPAN stands for Comprehensive Perl Archive Network.
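
The XML::DOM check can also be done without running the ROUGE script. The sketch below asks perl to load the module and inspects the exit code; the function name is illustrative, and it assumes perl is on the PATH (it returns False if perl cannot be started).

```python
import subprocess


def perl_module_available(module, perl="perl"):
    """Return True if `perl -M<module> -e 1` exits successfully."""
    try:
        result = subprocess.run(
            [perl, "-M" + module, "-e", "1"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # perl itself is missing or not executable
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("XML::DOM installed:", perl_module_available("XML::DOM"))
```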

Database Creation and update

If you have already installed PostgreSQL, you can create a database for the newsgroup [news] dataset with the following command:

psql -c "create database news"

After creating the database, use pg_restore to create the schema, which is agnostic to the dataset:

pg_restore --jobs=3 --exit-on-error --no-owner --dbname=news sql_dump.dump

or

pg_restore --jobs=3 -n public --exit-on-error --no-owner --dbname=news sql_dump.dump
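
The two pg_restore variants differ only in the -n public option, which restricts the restore to objects in the public schema. If you restore several databases, the command can be assembled from Python. This is a minimal sketch with illustrative function names; it assumes pg_restore is on the PATH and that the current user may connect to the database.

```python
import subprocess


def pg_restore_command(dbname, dump_path, jobs=3, schema=None):
    """Build the pg_restore argument list used in this README."""
    cmd = ["pg_restore", "--jobs=%d" % jobs]
    if schema is not None:
        # -n limits the restore to the named schema
        cmd += ["-n", schema]
    cmd += ["--exit-on-error", "--no-owner", "--dbname=%s" % dbname, dump_path]
    return cmd


def restore(dbname, dump_path, **kwargs):
    """Run pg_restore and raise CalledProcessError if it fails."""
    subprocess.run(pg_restore_command(dbname, dump_path, **kwargs), check=True)
```

For example, restore("news", "sql_dump.dump", schema="public") runs the second variant shown above.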

We assume that you are using either postgres as the username or another username that already has all the required privileges. To change the password for the postgres user, connect with psql and run \password:

psql -h localhost -d news -U postgres -w
\password

If you have made any changes to the database, you can update the dump file using the following command (schema only):

[You may need to set up peer authentication.]

sudo -u postgres pg_dump -s --no-owner -Fc news > sql_dump.dump

To dump the data of a particular table from the database:

sudo -u postgres pg_dump --data-only -t summary news --no-owner -Fc > news_summary.dump

Setting Environment Variables

Set the dataset folder path and the connection string in the environment.sh file, and then run the following command:

source environment.sh #Unix, os-x
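
After sourcing the file, you may want to verify that the settings are visible to Python. The variable names exported by environment.sh are not listed in this README, so DATASET_FOLDER and CONNECTION_STRING below are hypothetical placeholders; substitute the names your environment.sh actually exports.

```python
import os

# Hypothetical names: replace with the variables defined in environment.sh
REQUIRED_VARIABLES = ["DATASET_FOLDER", "CONNECTION_STRING"]


def missing_variables(names, environ=None):
    """Return the names that are unset or empty in the environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_variables(REQUIRED_VARIABLES)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```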

Creating Executable for Word2Vec (Mikolov's Implementation)

Please go to the word2vec code directory inside the project and run the following commands to build the executable:

make clean
make

Installation of Theano for Skip-Thought

pip install theano
sudo apt install nvidia-cuda-toolkit

Installation of Keras (Sequential API)

pip install keras

To change the backend to Theano, edit the default configuration in ~/.keras/keras.json:

{
    "image_dim_ordering": "tf",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "theano"
}
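
If you prefer not to edit the file by hand, the backend key can be updated with a short script. This is a minimal sketch, not part of the project: the function name is illustrative, and it keeps any other keys already present in keras.json.

```python
import json
import os


def set_keras_backend(backend="theano", config_path=None):
    """Set the "backend" key in keras.json, preserving the other keys."""
    if config_path is None:
        config_path = os.path.expanduser("~/.keras/keras.json")
    config = {}
    if os.path.exists(config_path):
        with open(config_path) as handle:
            config = json.load(handle)
    config["backend"] = backend
    # Create the containing folder if it does not exist yet
    os.makedirs(os.path.dirname(config_path) or ".", exist_ok=True)
    with open(config_path, "w") as handle:
        json.dump(config, handle, indent=4)
    return config
```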

Downloading the C-PHRASE vectors:

Please download the C-Phrase vectors from http://clic.cimec.unitn.it/composes/cphrase-vectors.html and join the files using the following commands:

cat cphrase.txt.zip_* > cphrase.txt.zip
unzip cphrase.txt.zip  # assumed to extract cphrase.txt
sed -i '1 i\174814 300' cphrase.txt  # prepend the header line to convert to word2vec format

Downloading GloVe Pretrained Vectors for SDAE:

Please download the vectors from http://nlp.stanford.edu/projects/glove/ and then insert a header line (vocabulary size and vector dimension, as the word2vec text format expects) at the top of the file using the following command:

sed -i '1 i\400000 300' glove.6B.300d.txt
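
The sed commands hard-code the header values. If you use a different vector file, the header can be derived from the file itself. The sketch below is illustrative and not part of the project: it assumes one token per line followed by whitespace-separated values, tokens that contain no whitespace, and the same dimension on every line. It writes a new file instead of editing in place.

```python
def prepend_word2vec_header(src_path, dst_path):
    """Copy src_path to dst_path with a '<rows> <dims>' header line on top."""
    rows, dims = 0, 0
    with open(src_path, encoding="utf-8") as src:
        for line in src:
            fields = line.split()
            if not fields:
                continue  # ignore blank lines
            if rows == 0:
                dims = len(fields) - 1  # the first field is the token
            rows += 1
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("%d %d\n" % (rows, dims))
        for line in src:
            if line.strip():
                dst.write(line)
    return rows, dims
```

For example, prepend_word2vec_header("glove.6B.300d.txt", "glove.6B.300d.w2v.txt") writes a copy with the header; the output file name here is only an example.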

Running the Project

Run sen2vec with -h argument to see all possible options:

python sen2vec -h
usage: sen2vec [-h] -dataset DATASET -ld LD

Sen2Vec

optional arguments:
  -h, --help            show this help message and exit
  -dataset DATASET, --dataset DATASET
                        Please enter dataset to work on [reuter, news]
  -ld LD, --ld LD       Load into Database [0, 1]
  -pd PD, --pd PD       Prepare Data [0, 1]
  -rbase RBASE, --rbase RBASE
                        Run the Baselines [0, 1]
  -gs GS, --gs GS       Generate Summary [0, 1]

For example, you can run all stages on the news dataset using the following command:

python sen2vec -dataset news -ld 1 -pd 1 -rbase 1 -gs 1
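
To illustrate how these flags fit together, here is a hypothetical argparse reconstruction based only on the help text above. It is not the project's actual parser: the integer types, the [0, 1] choices, and the default of 0 for -pd, -rbase, and -gs are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the options listed in the help text."""
    parser = argparse.ArgumentParser(prog="sen2vec", description="Sen2Vec")
    parser.add_argument("-dataset", "--dataset", required=True,
                        help="Please enter dataset to work on [reuter, news]")
    parser.add_argument("-ld", "--ld", type=int, choices=[0, 1], required=True,
                        help="Load into Database [0, 1]")
    # Defaults below are assumptions; the help text does not state them
    parser.add_argument("-pd", "--pd", type=int, choices=[0, 1], default=0,
                        help="Prepare Data [0, 1]")
    parser.add_argument("-rbase", "--rbase", type=int, choices=[0, 1], default=0,
                        help="Run the Baselines [0, 1]")
    parser.add_argument("-gs", "--gs", type=int, choices=[0, 1], default=0,
                        help="Generate Summary [0, 1]")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

Each flag switches one pipeline stage on (1) or off (0), so a later run can skip stages that have already completed, for example by passing -ld 0 once the data is in the database.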
