
summac's People

Contributors

tingofurro

summac's Issues

Range of numbers summac have

Hi, would two identical sentences have a summac_conv score near 1?
I am having trouble interpreting the output results.

Problem when loading the benchmark

Hi, I encounter a KeyError when using benchmark.py to load the cogensumm dataset.
The self.get_cnndm_document function always raises a KeyError indicating that the "aid" is not in the loaded CNN/DM dataset. I can't figure out why this happens. Could you please help?

Pip install fails on multiple python versions (Ubuntu)

I tried multiple python versions but keep running into compatibility issues across the different required libraries:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gradient 2.0.6 requires click<9.0,>=8.0.1, but you have click 7.1.2 which is incompatible.
sentence-transformers 2.2.2 requires huggingface-hub>=0.4.0, but you have huggingface-hub 0.0.12 which is incompatible.
Successfully installed click-7.1.2 datasets-1.7.0 huggingface-hub-0.0.12 nltk-3.6.6 pyarrow-3.0.0 scikit-learn-1.0.2 summac-0.0.3 tokenizers-0.10.3 tqdm-4.49.0 transformers-4.8.1 xlrd-1.2.0

Can you dump a pip freeze somewhere that lists all of the Python library versions you are using, along with the Python version? Then I can hopefully install it.
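
For anyone comparing environments, a minimal sketch of a version report is shown here. It uses only the standard library (importlib.metadata, Python 3.8+). The package list is copied from the pip output above; the function names installed_versions and environment_report are hypothetical helpers, not part of summac.

```python
import platform
from importlib.metadata import PackageNotFoundError, version

# Package names taken from the pip output quoted in this issue.
PACKAGES = [
    "summac", "transformers", "tokenizers", "datasets",
    "huggingface-hub", "nltk", "click", "sentence-transformers",
]


def installed_versions(names):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def environment_report(names=PACKAGES):
    """Build a pip-freeze-like report that also states the Python version."""
    lines = [f"python=={platform.python_version()}"]
    for name, ver in installed_versions(names).items():
        lines.append(f"{name}=={ver}" if ver else f"{name} (not installed)")
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

Pasting this output into an issue makes it easier to see which pinned versions conflict with packages already present in the environment.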

IndexError: list index out of range

Hi,

I encountered an error:

File "/add_score.py", line 53, in add_score
    res = function(["? I haven't had a birthday since 2007. I have a b-day in October and it's almost completely ignored."], ["",])
  File "/add_score_summac.py", line 28, in <lambda>
    "my_summacZS_batched": lambda summs, docs: modelZS.score(docs, summs)['scores'],
  File "/usr/local/lib/python3.9/site-packages/summac/model_summac.py", line 351, in score
    score = self.score_one(source, gen)
  File "/usr/local/lib/python3.9/site-packages/summac/model_summac.py", line 322, in score_one
    image = self.imager.build_image(original, generated)
  File "/usr/local/lib/python3.9/site-packages/summac/model_summac.py", line 113, in build_image
    generated_chunks = self.split_text(generated, granularity=gran_sum)
  File "/usr/local/lib/python3.9/site-packages/summac/model_summac.py", line 94, in split_text
    return self.split_sentences(text)
  File "/usr/local/lib/python3.9/site-packages/summac/model_summac.py", line 71, in split_sentences
    sentences = nltk.tokenize.sent_tokenize(text)
  File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/__init__.py", line 107, in sent_tokenize
    return tokenizer.tokenize(text)
  File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1276, in tokenize
    return list(self.sentences_from_text(text, realign_boundaries))
  File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1332, in sentences_from_text
    return [text[s:e] for s, e in self.span_tokenize(text, realign_boundaries)]
  File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1332, in <listcomp>
    return [text[s:e] for s, e in self.span_tokenize(text, realign_boundaries)]
  File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1322, in span_tokenize
    for sentence in slices:
  File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1421, in _realign_boundaries
    for sentence1, sentence2 in _pair_iter(slices):
  File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 318, in _pair_iter
    prev = next(iterator)
  File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1395, in _slices_from_text
    for match, context in self._match_potential_end_contexts(text):
  File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1382, in _match_potential_end_contexts
    before_words[match] = split[-1]
IndexError: list index out of range

I think it is caused by the leading "? ", which might result in an empty sentence within the metric.
Is this expected and explained somewhere, or is this a bug?

kind regards

Edit:
I circumvented (not fixed) this issue for now using this code:

import re
# Strip one leading "empty sentence": optional whitespace, punctuation, then a space.
match = re.match(r"(\s*[.?!]+\s)", summaries[i])
if match:
    summaries[i] = summaries[i][len(match.group(1)):]

because empty leading sentences ending in symbols other than "?" also caused this issue.
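
The workaround above can be wrapped in a reusable function. The following is a sketch only: strip_leading_empty_sentences is a hypothetical helper name, and unlike the original snippet it removes several consecutive leading empty sentences in one pass. It has not been verified against every NLTK version, so it should be read as a pre-processing guard rather than a fix for the tokenizer.

```python
import re

# One or more leading "empty sentences": optional whitespace followed by
# sentence-ending punctuation that is itself followed by whitespace or the
# end of the string, plus any whitespace after the last one.
_LEADING_EMPTY = re.compile(r"^(?:\s*[.?!]+(?=\s|$))+\s*")


def strip_leading_empty_sentences(text):
    """Remove punctuation-only sentences from the start of text."""
    return _LEADING_EMPTY.sub("", text)


if __name__ == "__main__":
    summary = "? I haven't had a birthday since 2007."
    print(strip_leading_empty_sentences(summary))
```

The lookahead keeps text such as ".5 is a half." untouched, because the leading punctuation there is not followed by whitespace.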

Performance

Hi,

I am wondering whether this metric is running as time-efficiently as possible on my machine.
I assume running on the GPU is faster than on the CPU.
What would be the best indicator to see if my machine is fully utilized?
nvidia-smi only reports about 1 GB of additional GPU memory usage during evaluation, but close to 100% GPU-Util.
Does this indicate maximum GPU usage? I am curious because of the low memory usage, even though I processed a batch of documents and summaries. That took a few minutes to finish.
Are there any parameters to tweak performance, such as batch sizes?

kind regards

How to interpret negative scores produced by SummaCzs

I got negative scores from SummaCzs model following the example usage with my data. For example, scores look like the following.

        "scores": {
            "for_reference_summary": [
                -0.193,
                -0.746
            ],
            "for_generated_summary": [
                -0.259,
                -0.402
            ]
        },

Is it normal to have negative scores? How can I interpret the negative scores if so?

Thanks
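
One possible reading, stated here as an assumption and not confirmed anywhere in this thread: if the zero-shot score is computed as an entailment probability minus a contradiction probability, it lies in [-1, 1], and a negative value means contradiction outweighs entailment. The sketch below illustrates only that arithmetic. The function names are hypothetical and are not part of summac.

```python
def entail_minus_contradict(p_entail, p_contradict):
    """Assumed score form: entailment minus contradiction, in [-1, 1]."""
    return p_entail - p_contradict


def rescale_to_unit(score):
    """Map a score from [-1, 1] onto [0, 1] for easier comparison."""
    return (score + 1.0) / 2.0


if __name__ == "__main__":
    # Scores quoted in this issue, rescaled under the assumption above.
    for raw in (-0.193, -0.746, -0.259, -0.402):
        print(raw, "->", round(rescale_to_unit(raw), 4))
```

Under this assumption a score of 0 would mean entailment and contradiction are balanced, so the values above would all sit on the contradiction side.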

Issue pip package install with python version 3.10

Hi,

thanks for providing the project as a pip package!
The installation doesn't work on Python 3.10 for me. Has anybody else had the same experience?
I can circumvent the issue by using Python 3.9.
If the issue is confirmed, could you please fix the package or add the Python requirement to the README?

kind regards.

Threshold tuning

I was looking at your code and attempting to recreate your results.

If this is how the results quoted in the paper were obtained, it seems a bit strange that you are tuning your threshold on the test set, notwithstanding the fact that the threshold is tuned per dataset in the benchmark (which is mentioned in the paper).

Can't run the notebooks

I am unable to run the notebook, and I feel like I am debugging all of the provided files. It seems like the code files have bugs in quite a few places.

How can I install `utils_misc` package?

Thanks for your work, it's great! But I have some trouble running the example code provided in the README.md.

When I ran the code, the package utils_misc is missing and I cannot find the right package used in model_summac.py.

I tried pip install utils_misc but obviously that's not the package required.

How can I install utils_misc? Or could you provide a requirements.txt file?
