
know-evolve's Introduction

Know-Evolve

This repository holds code for ICML '17 paper: "Know-Evolve: Deep Temporal Reasoning for Dynamic Knowledge Graphs"

Install

To get source code, run:

git clone https://github.com/rstriv/Know-Evolve.git

There are two steps required for complete installation:

  1. Install base graphnn library:
cd code/graphnn_base

Please follow the installation instructions provided on its README page. This is an older, slightly modified version of the graphnn library.

  2. Build the Know-Evolve code:
cd code/know_evolve
make

This will create a build directory. If you get any errors, check that the paths in your Makefile are correct.

Run Experiments

To run experiments on the small sample dataset (500 entities):

cd code/know_evolve
./run_small.sh

To run experiments on the full dataset (this will require a longer testing time):

cd code/know_evolve
./run_large.sh

You can change various hyper-parameters and try your own dataset by editing the configurations in these files.
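
For reference, these scripts simply invoke the training binary with command-line flags. The example below is reconstructed from the (truncated) invocation visible in the GDB log in the issues further down this page; the exact flag set in your checkout may differ:

./build/main -cur_iter 0 -max_iter 20 -bptt 200 -lr 0.0005 -l2 0.00 -embed_E 64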

For any questions, please contact: rstrivedi AT gatech DOT edu

Reference

@InProceedings{trivedi2017knowevolve,
  title     = {Know-Evolve: Deep Temporal Reasoning for Dynamic Knowledge Graphs},
  author    = {Trivedi, Rakshit and Dai, Hanjun and Wang, Yichen and Song, Le},
  booktitle = {Proceedings of the 34th International Conference on Machine Learning},
  year      = {2017}
}

know-evolve's People

Contributors

rstriv


know-evolve's Issues

Cannot run this code on another dataset; the program suddenly exits in TestLoop.

I have successfully run "run_small.sh" and "run_large.sh".
Now I am trying to run the experiment on another dataset by modifying "run_large.sh". Training is fine, but the program suddenly exits in TestLoop.
The only difference between the dataset I used and the icews (full) dataset the author provided is that I deleted the first 2776 lines of data in "train.txt" of icews (full) and set cfg::skip to 0.
I think this should give the same result, because cfg::skip in "run_large.sh" was set to 2776 to skip the first 2776 lines of "train.txt".


I checked the core dump file and loaded it with GDB.
Here is the information provided by the core file.

Core was generated by `./build/main -cur_iter 0 -max_iter 20 -bptt 200 -lr 0.0005 -l2 0.00 -embed_E 64'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00000000004138f9 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Alloc_hider::_Alloc_hider (__a=...,
    __dat=<optimized out>, this=<optimized out>)
    at /usr/include/c++/5/bits/basic_string.h:109
109		: allocator_type(__a), _M_p(__dat) { }
[Current thread is 1 (Thread 0x7f0154004740 (LWP 27924))]
(gdb) where
#0  0x00000000004138f9 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Alloc_hider::_Alloc_hider (__a=..., __dat=<optimized out>, this=<optimized out>) at /usr/include/c++/5/bits/basic_string.h:109
#1  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string (
    __str=<error reading variable: Cannot access memory at address 0x10>, this=0x7ffdc16b53e0)
    at /usr/include/c++/5/bits/basic_string.h:399
#2  NNGraph<(MatMode)0, double>::InsertLayer (this=this@entry=0x997e80 <gnn>, layer=layer@entry=0x23031d10,
    operands=std::vector of length 12498, capacity 12498 = {...}) at ../graphnn_base/include/graphnn/nngraph.h:77
#3  0x0000000000408c0a in gl<SparseSurvivalNllLayer, (MatMode)0, double, int&, int&, LinearParam<(MatMode)0, double>*>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, NNGraph<(MatMode)0, double>&, std::vector<ILayer<(MatMode)0, double>*, std::allocator<ILayer<(MatMode)0, double>*> >, int&, int&, LinearParam<(MatMode)0, double>*&&) (operands=<error reading variable: access outside bounds of object referenced via synthetic pointer>, gnn=...,
    name_prefix=<error reading variable: access outside bounds of object referenced via synthetic pointer>)
    at ./include/layer_holder.h:38
#4  BuildOutput (cur_time=..., cur_event=cur_event@entry=0x463ec30, inputs=std::map with 1 elements = {...},
    layer_buf=std::vector of length 12498, capacity 12498 = {...}, is_last=<optimized out>) at src/main.cpp:275
#5  0x000000000040b17f in TestLoop (latest_embeddings=std::vector of length 12498, capacity 12498 = {...},
    lookup_entity_onehot=std::vector of length 12498, capacity 12498 = {...},
    lookup_rel_onehot=std::vector of length 261, capacity 261 = {...},
    lookup_entity_init=std::vector of length 12498, capacity 12498 = {...},
    lookup_rel_init=std::vector of length 261, capacity 261 = {...}) at src/main.cpp:509
#6  0x000000000040cdac in MainLoop () at src/main.cpp:642
#7  0x0000000000405d04 in main (argc=<optimized out>, argv=<optimized out>) at src/main.cpp:771
(gdb)

Training details and results

Hello, I implemented your paper in PyTorch and ran into some problems.

  1. How do you handle the case where one entity is involved in two successive events when computing the "happened" term of the loss? I compute it in the following way; is that correct?
    -log(t - t_bar + 1e-5) + log(1e-5)
  2. I found that the negative log loss oscillates rather than decreases. I also saw the training result pictures in an earlier issue under this repository, and the NLL shown there did not decrease either.
    Do you only care about MAR and Hits@10 outperforming the other baseline models, and not about the convergence of the algorithm?

  3. I can't run your code, so do you remember the training time when you ran it? My code seems time-consuming...
    Thank you

Program suddenly exits in first iteration

In the first iteration of training, the program suddenly exits with no error message in the middle of the feed-forward computation.
I was running run_small.sh after compiling with make, using the datasets supplied within the package.
The program stops here:
main.cpp: MainLoop(), lines 672-685

		inputs.clear();
		GetMiniBatch_SEQ(e, event_mini_batch);

		int train_in_batch = BuildTrainNet(T_begin, event_mini_batch, inputs, 
							lookup_entity_onehot, lookup_rel_onehot,
							lookup_entity_init, lookup_rel_init);

		gnn.FeedForward(inputs, TRAIN);   // <-- the program exits during this call
		auto loss_map = gnn.GetLoss();
		if (cfg::iter % cfg::report_interval == 0)
		{		
			Dtype nll = 0.0, avg_rank = 0.0, mae = 0.0, rmse = 0.0;
			for (auto it = loss_map.begin(); it != loss_map.end(); ++it)
			{

nngraph.cpp (in graphnn_base): void NNGraph<mode, Dtype>::FeedForward(std::map<std::string, IMatrix<mode, Dtype>* > input, Phase phase), lines 23-50

    for (size_t i = 0; i < ordered_layers.size(); ++i)
    {
        std::cerr << "Running batch " << i << " of " << ordered_layers.size() << "\n";
        assert(layer_dict.count(ordered_layers[i].first));
        auto* cur_layer = layer_dict[ordered_layers[i].first];
        auto& operands = ordered_layers[i].second;
        assert(name_idx_map.count(cur_layer->name));
        if (operands.size() == 0 && ! hash[name_idx_map[cur_layer->name]])
            continue;

        bool ready = true;
        for (auto* layer : operands)
        {
            if (static_layer_dict.count(layer->name))
                continue;
            assert(name_idx_map.count(layer->name));
            auto idx = name_idx_map[layer->name];
            ready &= hash[idx];
        }
        hash[name_idx_map[cur_layer->name]] = ready;
        if (ready)
            cur_layer->UpdateOutput(operands, phase);   // <-- execution stops inside this call
        else if (phase != TEST)
            throw std::runtime_error("wrong computation flow");
    }

param_layer.h (in graphnn_base): class ParamLayer, virtual void UpdateOutput(std::vector< ILayer<mode, Dtype>* >& operands, Phase phase)

    virtual void UpdateOutput(std::vector< ILayer<mode, Dtype>* >& operands, Phase phase) override
    {
        // THE PROGRAM SUDDENLY STOPS HERE WITH RETURN CODE 0
        assert(operands.size() == params.size());
        auto& cur_output = this->state->DenseDerived();
        for (size_t i = 0; i < operands.size(); ++i)
        {
            if (i == 0)
                params[i]->ResetOutput(operands[i]->state, &cur_output); 
            params[i]->UpdateOutput(operands[i]->state, &cur_output, i == 0 ? 0.0 : 1.0, phase);
        }
    }

Symptoms (screenshots of the console output omitted): this happens on the very first iteration executed.

I have tracked the number of iterations that have run; it stops in the middle. The "Running Here + id" output is my own tracking of where the program stops, and it stops at the same place every time. It does not seem to be caused by the asserts. For dependencies I am using Intel MKL with compilers and libraries at 2019.1.144 and Parallel Studio XE at 2019.1.053.

docker image for Know-Evolve

Could someone provide a Docker image for Know-Evolve? I followed the installation instructions and built the project successfully, but the program threw a segmentation fault when I ran it.

Parameter Initialization
************************
cur_iter = 0
max_iter = 3258
bptt = 200
learning rate = 0.0005
l2_penalty = 0
negative samples = 1
skip events = 0
n_embed_E = 64
n_embed_R = 32
n_hidden = 64
warm start = 500
min. duration = 24
max. duration = 500
time scale = 0.0001
weight_scale = 0.1
test_interval = 3258
report_interval = 100
save_interval = 3258
meta file = ../../data/icews//stat_500.txt
train file = ../../data/icews//train_500.txt
test file = ../../data/icews//test_500.txt
Model folder = .//E_64-R_32-H_64
*******************************
Train size:217017
Test size:228648
Events loaded
******************************

# train: 217017 # test: 228648
Total number of entities: 500
Total number of unique relations: 260
Train Map size: 90798
Train Entity size: 500
Data loaded
Param created

Training Start
=================

Segmentation fault

Stability-related issues and MAR score / conditional density calculation

Dear Authors,

I was reading your paper and trying to implement it in tensorflow. However, while doing so I came across a few things which I couldn't understand. First, I will post the queries that I have related to the paper.

I also got your C++ code running on my system. I noticed a few weird things related to the computation of MAR Score - which is in turn related to the reported results. I will post these queries in the second part.

Queries about the paper:

  1. Equation 3 for the conditional intensity (lambda) is not bounded in the positive direction, so during optimisation it can explode depending on the gradient it receives (as lambda = exp(g)). How do you prevent this from happening?

  2. In Equation 2, to compute the conditional density of an event, we have f = lambda * S. Aren't these two terms competing against each other? As lambda increases, S decreases. How do we ensure stability while training? (The relationship is written out in the sketch after this list.)

  3. In Section 4, you mention a trainable parameter W_e, but it is not used in any of the equations in the paper. What is it related to?

  4. It is also mentioned that V_e is a trainable parameter. Do you mean that it is trainable because it evolves over time, or is it actually a trainable node (i.e., apart from the temporal evolution, it gets updated during backprop)?
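
For reference, the relationship behind queries 1 and 2 can be written out explicitly; this is the standard temporal point process decomposition that f = lambda * S refers to (the notation below is a generic sketch, not copied from the paper):

    \lambda(t) = \exp(g(t)), \qquad
    S(t) = \exp\!\Big(-\int_{\bar{t}}^{t} \lambda(s)\, ds\Big), \qquad
    f(t) = \lambda(t)\, S(t),

so the negative log-density of an observed event is

    -\log f(t) = -g(t) + \int_{\bar{t}}^{t} \lambda(s)\, ds.

Raising lambda makes the first (event) term more favourable but inflates the survival penalty, which is the competition described in query 2; and since lambda = exp(g) is unbounded above, nothing in the parameterization itself prevents the explosion raised in query 1.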

Queries related to code base
I cloned the repo and got it running on my system. I was trying to understand the code and I noticed something weird. It is related to the way you compute the results - MAR score - in the SparseOnedimRankCriterionLayer class.

I ran the code as-is, with the same set of hyperparameters as you have given and without any code changes - I just added a few missing includes to get it to compile on my system (see my forked repo).

While training, I noticed that the MAR score for the train samples was 1 right from batch 1. So I debugged further and noticed the following:

  1. In the SparseOnedimRankCriterionLayer class, the following lines of code compute the MAR score:
		this->loss = 1.0;
		...
		sim = LogLL(sim, this->event_t - this->cur_time.GetCurTime(subject, object), true);
		//#pragma omp parallel for
		for (size_t i = 0; i < bg->entity_list.size(); ++i)
		{
			...

			auto& other_feat = operands[bg->entity_idx[pred_object]]->state->DenseDerived();
			D.GeMM(B,other_feat,Trans::N, Trans::T, 1.0, 0.0);
			Dtype cur_sim = D.data[0]; 
			cur_sim = LogLL(cur_sim, this->event_t - this->cur_time.GetCurTime(subject, object), true);
			std::cerr << cur_sim << " " << sim << std::endl;
			if (cur_sim > sim && order == RankOrder::DESC)
			{
				this->loss++;
				std::cerr << "---Loss Increased" << std::endl;
			}
			...

		}   

In the above snippet, the variable this->loss holds the MAR score at the end of the loop. You compute the conditional density of the original triplet, and in the for loop you compute the same for the corruptions. If a corruption has a higher conditional density, you increment the rank (this->loss++). This is absolutely fine.
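
Restating the loop above as a formula (generic notation; f denotes the conditional density that LogLL evaluates for subject s, relation r, and candidate object e' at the event time t):

    \mathrm{rank}(s, r, o, t) = 1 + \sum_{e' \neq o} \mathbf{1}\big[\, f(s, r, e', t) > f(s, r, o, t) \,\big]

The reported MAR is then the mean of this rank over the evaluated events.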

However, when I debugged the code further, I noticed that the sim and cur_sim values for most of the original samples and most of the corruptions come out as -inf. So the comparison if (cur_sim > sim && order == RankOrder::DESC) yields false and this->loss never changes; in other words, the MAR score stays high (close to 1). I am attaching screenshots of training and test; I printed the NLL and MAR score for every 10th batch while training. (Again, this is related to my questions 1 and 2 in the first section - the intensity explodes and -log(intensity) becomes -inf.) There are very few instances where the average rank is not 1 while training (cases where LogLL is not -inf).

I have been trying to figure this out for weeks. Am I doing something wrong? I have not modified your code base except for adding a few print statements. Can you please help me with this? What is even more weird is that the results on the test set are similar to the ones you reported in the paper; I have attached those results as well. Hits@10 is above 90% and avg_rank is close to 200, but this is due to the incorrect way of evaluating it, i.e. the -inf comparisons keep the MAR score around 1 in most cases. I couldn't get exactly the values mentioned in the paper, but they are close (as you are not setting the random seed).

I am sure you would not have got -inf as the conditional density (the LogLL value for sim and cur_sim) or avg_rank == 1 right from batch 1, so I think you may have uploaded an incorrect/older dev version of the codebase. Can you please update the codebase if there was some mismatch while uploading? Are these the exact hyperparameters (in the config file of run_large.sh) that you used for reporting the results in the paper? If not, can you please share the exact hyperparameter file? Could you also seed the random number generator so that I can reproduce your results?

This is the link to my forked repo. The only changes are: I added the missing includes for cppformat and vector.h, and removed the parallel for, as it was crashing for me.

https://github.com/sumitpai/Know-Evolve

Below are the screenshots of training and test (hyperparams, train_1, train_2, test; images omitted).

I look forward to your response.

Relational embeddings initialization

Dear authors,

I'm trying to implement your framework in TensorFlow and have a few questions about how you deal with relational embeddings.

I understand that in the bilinear formulation there is one relationship weight matrix "R" per relation, trained during backpropagation. But how do the relational embeddings "r" work? You say in the article that these embeddings are static: I take it this means they don't evolve like the entities, but are they trained during backpropagation? Do you initialize them at zero like the entities?

In the code, you fetch the last existing embedding if it exists or you get the embedding parameters if not.

latest_subject_rel_embed = GetEmbeddingParam("rel", cfg::num_rels, e->rel, inputs, param_dict["w_rel_init"], lookup_rel_onehot, lookup_rel_init);

It is not quite clear to me what this line does. I understand that you use the one-hots and the parameters w_rel_init to fetch the embedding, but what exactly do you fetch? The row of w_rel_init that embeds the current relation, selected through the one-hot layer? And is w_rel_init the W_r mentioned in Section 4?
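
For readers following along, here is a minimal sketch of the lookup behaviour as described earlier in this issue ("fetch the last existing embedding if it exists, otherwise fall back to the embedding parameters"). The names and signature below are hypothetical illustrations, not the actual GetEmbeddingParam implementation:

    #include <unordered_map>
    #include <vector>

    using Embedding = std::vector<double>;

    // Hypothetical illustration only, not the repository's code.
    Embedding LookupRelEmbedding(int rel,
                                 const std::unordered_map<int, Embedding>& latest_rel_embed,
                                 const std::vector<Embedding>& w_rel_init)
    {
        // Reuse the relation's most recent embedding if an earlier event produced one.
        auto it = latest_rel_embed.find(rel);
        if (it != latest_rel_embed.end())
            return it->second;
        // Otherwise select the row of the initial parameter matrix (the one-hot lookup).
        return w_rel_init[rel];
    }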

Thanks in advance for your answer.

graphnn unable to be compiled

We recently updated our system and were forced to use g++ 5, because Ubuntu 18.04 comes with g++ 7 and our CUDA version is too old to work with g++ 7. We do not want to update CUDA, so we use g++ 5. Now we face a compilation problem (screenshot of the compiler error omitted).
Has anyone met this problem before?
