AAAI 2021 - Multi-Dimensional Explanation of Target Variables from Documents

Automated predictions require explanations to be interpretable by humans. Past work used attention and rationale mechanisms to find words that predict the target variable of a document. Often, though, they force a tradeoff between noisy explanations and a drop in accuracy. Furthermore, rationale methods cannot capture the multi-faceted nature of justifications for multiple targets, because their masks are not probabilistic. In this paper, we propose the Multi-Target Masker (MTM) to address these shortcomings. Its novelty lies in a soft multi-dimensional mask that models a relevance probability distribution over the set of target variables to handle ambiguities. Additionally, two regularizers guide MTM to induce long, meaningful explanations. We evaluate MTM on two datasets and show, using standard metrics and human annotations, that the resulting masks are more accurate and coherent than those generated by state-of-the-art methods. Moreover, MTM is the first method to also achieve the highest F1 scores for all the target variables simultaneously.
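To make the idea of a soft multi-dimensional mask concrete, here is a minimal pure-Python sketch. It assumes (this README does not specify it) that the masker produces, for every word, one score per target variable plus an extra "not relevant" dimension, turns those scores into a probability distribution with a softmax, and weights the word embedding by each target's probability. The continuity penalty at the end is a generic stand-in, in the spirit of earlier rationale models, for a regularizer that favors long, contiguous explanations; it is not necessarily one of MTM's two regularizers. All function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_mask(word_scores: List[Vector]) -> List[Vector]:
    """Turn each word's scores (one per target, plus a last 'not relevant'
    dimension) into a relevance probability distribution."""
    return [softmax(scores) for scores in word_scores]


def masked_embeddings(embeddings: List[Vector], mask: List[Vector],
                      target: int) -> List[Vector]:
    """Scale every word embedding by that word's probability for `target`."""
    return [[mask[i][target] * x for x in emb] for i, emb in enumerate(embeddings)]


def continuity_penalty(mask: List[Vector]) -> float:
    """L1 distance between adjacent words' distributions, summed over the text.
    It is zero when neighbors share the same relevance profile, so minimizing
    it favors contiguous spans rather than scattered single words."""
    return sum(
        sum(abs(a - b) for a, b in zip(mask[i], mask[i + 1]))
        for i in range(len(mask) - 1)
    )


# Hypothetical input: 3 words, 2 targets (e.g. Service, Location) + "not relevant".
scores = [[2.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 0.0, 3.0]]
mask = soft_mask(scores)
embeddings = [[1.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
service_view = masked_embeddings(embeddings, mask, target=0)
penalty = continuity_penalty(mask)
```

Because each word keeps a full distribution rather than a hard 0/1 choice, a word such as "clean" can contribute partially to several targets at once, which is the kind of ambiguity a binary rationale mask cannot express.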

Link to the full paper: here.

Stay tuned for the code!

Data

You can download the hotel dataset here. It contains the train, dev, and test sets, as well as the word embeddings trained on HotelRec.

Each sample contains the five normalized ratings, the text, and the tokenized words.

{
  "aspects": [
    1.0, // Aspect Service
    1.0, // Aspect Cleanliness
    1.0, // Aspect Value
    0.8, // Aspect Location
    0.8  // Aspect Rooms
  ],
  "text": "We were on a road trip and just picking a hotel where we happened to be for the evening. We were surprised that at 5:00 on a Tuesday, there would be no rooms available at all of the \"better\" hotels in the area. The receptionist at Comfort Inn suggested we try the Quality Inn because she had just found out they had rooms. From the outside, this hotel is a disaster in looks and location. The manager told us they had just remodeled a few months ago and she was sure we would like the room. She was right. The bed was comfortable, the bathroom was well appointed and clean and the soundproofing was adequate. We had a lovely stay and left refreshed. We can recommend this hotel if you want a good night's sleep and don't need to impress anyone.",
  "words": [
    "we",
    "were",
    ...
    "to",
    "impress",
    "anyone"
  ]
}
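A sample like the one above can be read with a few lines of Python. This is a sketch under assumptions: it presumes each sample is a plain JSON object with the `aspects`, `text`, and `words` keys shown (without the `//` comments, which JSON does not allow), that the five ratings appear in the order Service, Cleanliness, Value, Location, Rooms, and that the ratings were normalized by dividing a 1-5 star rating by 5, which fits the values 1.0 and 0.8 but is not stated here. The names `ASPECTS`, `parse_sample`, and `to_stars` are hypothetical.

```python
import json
from typing import Dict

# Aspect order as listed in the sample above.
ASPECTS = ["Service", "Cleanliness", "Value", "Location", "Rooms"]


def parse_sample(raw: str) -> Dict:
    """Parse one JSON sample and attach aspect names to its ratings."""
    sample = json.loads(raw)
    if len(sample["aspects"]) != len(ASPECTS):
        raise ValueError("expected one rating per aspect")
    return {
        "ratings": dict(zip(ASPECTS, sample["aspects"])),
        "text": sample["text"],
        "words": sample["words"],
    }


def to_stars(normalized: float, max_stars: int = 5) -> int:
    """Assumed inverse of the normalization (rating / max_stars)."""
    return round(normalized * max_stars)


raw = ('{"aspects": [1.0, 1.0, 1.0, 0.8, 0.8], '
       '"text": "We had a lovely stay.", '
       '"words": ["we", "had", "a", "lovely", "stay"]}')
sample = parse_sample(raw)
print(sample["ratings"]["Location"], to_stars(sample["ratings"]["Location"]))  # prints: 0.8 4
```

Keeping `text` and `words` side by side lets a mask computed over the tokenized words be mapped back onto the original review when displaying explanations.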

For the Beer dataset, you can download the small de-correlated version and the embeddings here. For the full dataset, which has been taken down (see here), please contact its author, Prof. McAuley.

Citation

Please cite our papers if you find the code (first) or data (second) helpful, thanks!

@InProceedings{antognini2021,
  author    = {Antognini, Diego and Musat, Claudiu and Faltings, Boi},
  title     = {Multi-Dimensional Explanation of Target Variables from Documents},
  volume    = {35},
  booktitle = {Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI)},
  month     = {February},
  year      = {2021},
  url       = {https://www.aaai.org/AAAI21Papers/AAAI-9984.AntogniniD.pdf}
}

@InProceedings{antognini-faltings:2020:LREC1,
  author    = {Antognini, Diego and Faltings, Boi},
  title     = {HotelRec: a Novel Very Large-Scale Hotel Recommendation Dataset},
  booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference},
  month     = {May},
  year      = {2020},
  address   = {Marseille, France},
  publisher = {European Language Resources Association},
  pages     = {4917--4923},
  abstract  = {Today, recommender systems are an inevitable part of everyone's daily digital routine and are present on most internet platforms. State-of-the-art deep learning-based models require a large number of data to achieve their best performance. Many datasets fulfilling this criterion have been proposed for multiple domains, such as Amazon products, restaurants, or beers. However, works and datasets in the hotel domain are limited: the largest hotel review dataset is below the million samples. Additionally, the hotel domain suffers from a higher data sparsity than traditional recommendation datasets and therefore, traditional collaborative-filtering approaches cannot be applied to such data. In this paper, we propose HotelRec, a very large-scale hotel recommendation dataset, based on TripAdvisor, containing 50 million reviews. To the best of our knowledge, HotelRec is the largest publicly available dataset in the hotel domain (50M versus 0.9M) and additionally, the largest recommendation dataset in a single domain and with textual reviews (50M versus 22M). We release HotelRec for further research: https://github.com/Diego999/HotelRec.},
  url       = {https://www.aclweb.org/anthology/2020.lrec-1.605}
}
