jenojp / negspacy Goto Github PK

spaCy pipeline object for negating concepts in text

License: MIT License

Python 100.00%

python nlp spacy negation negex negation-phrases spacy-pipeline spacy-extension

negspacy's Introduction

negspacy: negation for spaCy

spaCy pipeline object for negating concepts in text. Based on the NegEx algorithm.

NegEx - A Simple Algorithm for Identifying Negated Findings and Diseases in Discharge Summaries. Chapman, Bridewell, Hanbury, Cooper, Buchanan. https://doi.org/10.1006/jbin.2001.1029

What's new

Version 1.0 is a major version update providing support for spaCy 3.0's new interface for adding pipeline components. As a result, it is not backwards compatible with previous versions of negspacy.

If your project uses spaCy 2.3.5 or earlier, you will need to use version 0.1.9. See archived readme.

Installation and usage

Install the library.

pip install negspacy

Import library and spaCy.

import spacy
from negspacy.negation import Negex

Load a spaCy language model and add the negspacy pipeline object. Filtering on entity types is optional.

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("negex", config={"ent_types":["PERSON","ORG"]})

View negations.

doc = nlp("She does not like Steve Jobs but likes Apple products.")

for e in doc.ents:
	print(e.text, e._.negex)

Steve Jobs True
Apple False

Consider pairing with scispacy to find UMLS concepts in text and process negations.

NegEx Patterns

pseudo_negations - phrases that are false triggers, ambiguous negations, or double negatives
preceding_negations - negation phrases that precede an entity
following_negations - negation phrases that follow an entity
termination - phrases that cut a sentence in parts, for purposes of negation detection (e.g., "but")
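
To make the roles of the four pattern groups concrete, here is a minimal pure-Python sketch of the idea. It is not negspacy's implementation: the helper names find_phrases and is_negated are hypothetical, tokenization is a naive whitespace split, and the pattern lists are small illustrative examples.

```python
def find_phrases(tokens, phrases):
    """Return (start, end) token spans where any phrase occurs in tokens."""
    spans = []
    for phrase in phrases:
        words = phrase.lower().split()
        for i in range(len(tokens) - len(words) + 1):
            if tokens[i:i + len(words)] == words:
                spans.append((i, i + len(words)))
    return spans


def is_negated(text, entity, patterns):
    """Simplified NegEx-style check (illustrative only, not negspacy's code)."""
    # Naive tokenization: lowercase, drop basic punctuation, split on whitespace.
    tokens = text.lower().replace(".", " ").replace(",", " ").split()
    ent_spans = find_phrases(tokens, [entity])
    if not ent_spans:
        raise ValueError(f"{entity!r} not found in text")
    ent_start, ent_end = ent_spans[0]

    # A pseudo-negation masks any negation phrase it fully contains.
    pseudo = find_phrases(tokens, patterns["pseudo_negations"])

    def masked(span):
        return any(p[0] <= span[0] and span[1] <= p[1] for p in pseudo)

    preceding = [s for s in find_phrases(tokens, patterns["preceding_negations"]) if not masked(s)]
    following = [s for s in find_phrases(tokens, patterns["following_negations"]) if not masked(s)]

    # Termination phrases bound the window in which a negation can apply.
    terminations = find_phrases(tokens, patterns["termination"])
    left = max((end for _, end in terminations if end <= ent_start), default=0)
    right = min((start for start, _ in terminations if start >= ent_end), default=len(tokens))

    negated_before = any(left <= s[0] and s[1] <= ent_start for s in preceding)
    negated_after = any(ent_end <= s[0] and s[1] <= right for s in following)
    return negated_before or negated_after


patterns = {
    "pseudo_negations": ["might not"],
    "preceding_negations": ["not"],
    "following_negations": ["declined"],
    "termination": ["but", "however"],
}
text = "She does not like Steve Jobs but likes Apple products."
steve = is_negated(text, "Steve Jobs", patterns)   # "not" precedes, no termination in between
apple = is_negated(text, "Apple", patterns)        # "but" cuts off the earlier "not"
hedged = is_negated("She might not like Apple", "Apple", patterns)  # "might not" is a pseudo-negation
print(steve, apple, hedged)
```

In this sketch the termination phrase "but" is what keeps the earlier "not" from reaching "Apple", and the pseudo-negation "might not" prevents "not" from counting as a trigger at all.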

Termsets

Designate the termset to use; en_clinical is used by default.

en = phrases for general English language text
en_clinical DEFAULT = adds phrases specific to the clinical domain to general English
en_clinical_sensitive = adds additional phrases to help rule out historical and possibly irrelevant entities

To set:

import spacy
from negspacy.negation import Negex
from negspacy.termsets import termset

ts = termset("en")

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe(
    "negex",
    config={
        "neg_termset":ts.get_patterns()
    }
)

Additional Functionality

Change patterns or view patterns in use

Replace all patterns with your own set

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe(
    "negex", 
    config={
        "neg_termset":{
            "pseudo_negations": ["might not"],
            "preceding_negations": ["not"],
            "following_negations":["declined"],
            "termination": ["but","however"]
        }
    }
    )

Add and remove individual patterns on the fly from built-in termsets

from negspacy.termsets import termset
ts = termset("en")
ts.add_patterns({
            "pseudo_negations": ["my favorite pattern"],
            "termination": ["these are", "great patterns", "but"],
            "preceding_negations": ["wow a negation"],
            "following_negations": ["extra negation"],
        })
#OR
ts.remove_patterns(
        {
            "termination": ["these are", "great patterns"],
            "pseudo_negations": ["my favorite pattern"],
            "preceding_negations": ["denied", "wow a negation"],
            "following_negations": ["unlikely", "extra negation"],
        }
    )

View patterns in use

from negspacy.termsets import termset
ts = termset("en_clinical")
print(ts.get_patterns())

Negations in noun chunks

Depending on the Named Entity Recognition model you are using, you may have negations "chunked together" with nouns. For example:

nlp = spacy.load("en_core_sci_sm")
doc = nlp("There is no headache.")
for e in doc.ents:
    print(e.text)

# no headache

This would cause the Negex algorithm to miss the preceding negation. To account for this, you can add a chunk_prefix:

nlp = spacy.load("en_core_sci_sm")
ts = termset("en_clinical")
nlp.add_pipe(
    "negex",
    config={
        "chunk_prefix": ["no"],
    },
    last=True,
)
doc = nlp("There is no headache.")
for e in doc.ents:
    print(e.text, e._.negex)

# no headache True
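
The effect of chunk_prefix can be pictured with a small stand-alone sketch: if the entity text itself begins with one of the listed prefixes, the entity is treated as negated. The helper below is hypothetical and compares only the first whitespace-separated word; negspacy's own check may differ in detail.

```python
def negated_by_chunk_prefix(entity_text, chunk_prefix):
    """Return True if the first word of the entity is one of the prefixes."""
    words = entity_text.lower().split()
    prefixes = {p.lower() for p in chunk_prefix}
    return bool(words) and words[0] in prefixes


with_prefix = negated_by_chunk_prefix("no headache", ["no"])
without_prefix = negated_by_chunk_prefix("headache", ["no"])
# Whole-word comparison avoids matching words that merely start with "no".
not_a_prefix = negated_by_chunk_prefix("nodule", ["no"])
print(with_prefix, without_prefix, not_a_prefix)
```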

Contributing

contributing

Authors

Jeno Pizarro

License

license

Other libraries

This library is featured in the spaCy Universe. Check it out for other useful libraries and inspiration.

If you're looking for a spaCy pipeline object to extract values that correspond to a named entity (e.g., birth dates, account numbers, or laboratory results) take a look at extractacy.

negspacy's Issues

Error while processing the example

Hi, I am getting the following error while trying to process the example with UmlsEntityLinker():

File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
[Sun Sep 20 16:41:24.954031 2020] [wsgi:error] [pid 458:tid 140686318831360] [remote 122.176.234.216:63073] obj, end = self.raw_decode(s, idx=_w(s, 0).end())
[Sun Sep 20 16:41:24.954035 2020] [wsgi:error] [pid 458:tid 140686318831360] [remote 122.176.234.216:63073] File "/usr/lib/python3.8/json/decoder.py", line 355, in raw_decode
[Sun Sep 20 16:41:24.954041 2020] [wsgi:error] [pid 458:tid 140686318831360] [remote 122.176.234.216:63073] raise JSONDecodeError("Expecting value", s, err.value) from None
[Sun Sep 20 16:41:24.954045 2020] [wsgi:error] [pid 458:tid 140686318831360] [remote 122.176.234.216:63073]

json.decoder.JSONDecodeError: Expecting value: line 1 column 219152385 (char 219152384)

I am using Python 3.8.2

Please help.

Regards

Prabhat

Spacy extension error

When I use this code, I can generate negex as a separate column:

doc = nlp_negex(d)
labels = ["ENTITY", "FAMILY"]
df_test = pd.DataFrame(columns=["ent", "label", "negex", "sent"])
attrs = ["text", "label_", "sent"]
for e in doc.ents:
    if e.label_ in labels:
        df_test = df_test.append({'ent': e.text, 'label': e.label_, 'negex': e._.negex, "sent": str(e.sent)}, ignore_index=True)

But when I use the following code:

doc = nlp_negex(d)
labels = ['ENTITY', 'FAMILY']
attrs = ["text", "label_", "sent", "._.negex"]
data = [[str(getattr(ent, attr)) for attr in attrs] for ent in doc.ents if ent.label in labels]
df2 = pd.DataFrame(data, columns=attrs)

I get the following error:

AttributeError: 'spacy.tokens.span.Span' object has no attribute '._.negex'

How can I access negex as a registered extension?

Thank you
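
A plain getattr treats its second argument as a single attribute name, so a dotted string such as "_.negex" is not resolved step by step. One possible workaround, sketched here with stand-in objects rather than real spaCy spans, is operator.attrgetter, which does follow dotted names. The SimpleNamespace entities below are hypothetical and only mimic the text, label_, and _.negex attributes.

```python
from operator import attrgetter
from types import SimpleNamespace

# Stand-ins for spaCy Span objects (assumption: only these attributes matter here).
ents = [
    SimpleNamespace(text="headache", label_="ENTITY", _=SimpleNamespace(negex=True)),
    SimpleNamespace(text="mother", label_="FAMILY", _=SimpleNamespace(negex=False)),
    SimpleNamespace(text="Boston", label_="GPE", _=SimpleNamespace(negex=False)),
]

labels = ["ENTITY", "FAMILY"]
attrs = ["text", "label_", "_.negex"]

# attrgetter("_.negex")(ent) is equivalent to ent._.negex
data = [
    [str(attrgetter(attr)(ent)) for attr in attrs]
    for ent in ents
    if ent.label_ in labels
]
print(data)
```

With a real pipeline, the value can also be read directly as ent._.negex or ent._.get("negex").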

Error with the termset

Hi,

With negspacy 1.0.0 and spacy 3.0.1, I get the error for using termsets:

'negex -> neg_tersmset extra fields not permitted'

    nlp = spacy.load("en_core_web_sm")
    ts = termset("en")
    nlp.add_pipe(
        "negex",
        config={
            "neg_tersmset": ts.get_patterns()
        }
    )

Can you help?
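
The message "extra fields not permitted" suggests the config contains a key the component does not accept. The README spells the key neg_termset, while the snippet above uses neg_tersmset. A small pre-check like the following sketch can catch such typos before calling add_pipe. The set of allowed keys is an assumption drawn from the option names that appear elsewhere on this page, and check_config_keys is a hypothetical helper, not part of negspacy.

```python
import difflib

# Assumption: option names taken from the README examples and error messages above.
ALLOWED_KEYS = {"neg_termset", "ent_types", "extension_name", "chunk_prefix"}


def check_config_keys(config, allowed=ALLOWED_KEYS):
    """Map each unknown key to its closest allowed key (or None)."""
    problems = {}
    for key in config:
        if key not in allowed:
            matches = difflib.get_close_matches(key, sorted(allowed), n=1)
            problems[key] = matches[0] if matches else None
    return problems


problems = check_config_keys({"neg_tersmset": {}, "chunk_prefix": ["no"]})
print(problems)
```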

Example from README is not working

First of all, thanks for this awesome library.

Describe the bug
When running the example from the README with the version 1.0.0 I get the following error:

TypeError: __init__() missing 4 required positional arguments: 'name', 'neg_termset', 'extension_name', and 'chunk_prefix'

To Reproduce

Steps to reproduce the behavior:

Go to a Python terminal and run the following example:

import spacy
from negspacy.negation import Negex


nlp = spacy.load("en_core_web_sm")
negex = Negex(nlp, ent_types=["PERSON","ORG"])

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __init__() missing 4 required positional arguments: 'name', 'neg_termset', 'extension_name', and 'chunk_prefix'

Expected behavior

Expecting it to work as in the README example with the latest version.

Desktop (please complete the following information):

OS: Linux PopOS
Spacy: 3.0.1

Termset choices

Thanks for enabling negex in the spaCy ecosystem -- this is incredibly helpful.

I noticed your termsets.py file is a subset of the trigger words/phrases historically used by negex (see here)

Was this for performance issues? Or to make negspacy more generalizable in non-healthcare domains? Some other reason?

I'm aware you can override negspacy's default termsets (nice feature), so this is more of a general question.

Thanks again for making this available.

spacy-2.2.3

Describe the bug
Installing negspacy uninstalls the latest spaCy (2.2.3) and reinstalls an older version (2.1.8).

To Reproduce
pip install negspacy

extract patterns out the negated entities

Hi, how do we extract the patterns of the negations identified? Are there any standard negex methods we can use?

For example, I would like to know which of the negation patterns below was identified for a negated entity.

NegEx Patterns
pseudo_negations - phrases that are false triggers, ambiguous negations, or double negatives
preceding_negations - negation phrases that precede an entity
following_negations - negation phrases that follow an entity
termination - phrases that cut a sentence in parts, for purposes of negation detection (e.g., "but")

Negation detection for :No terms

Hi,

For cases like "Blood Transfusion: No", negation detection fails. I tried adding ":No" to the termset, but there was still no change.

nlp = spacy.load("en_core_sci_lg")
ts = termset("en_clinical")
nlp.add_pipe(
    "negex",
    config={
        "chunk_prefix": ["no"]
    },
    last=True,
)

doc = nlp("Blood Transfusion: No")
for e in doc.ents:
    print(e.text, e._.negex)

Output:
Blood Transfusion False

Is it possible to apply Negspacy over token?

Hi,

How can I apply negation to individual tokens? Is it possible?

I tried the following, but it ended in an error.

for token in doc:
    print(token.text, token._.negex)

The error is:

AttributeError Traceback (most recent call last)
in
1 for token in doc:
----> 2 print(token.text, token._.negex)

~/anaconda3/envs/python3/lib/python3.6/site-packages/spacy/tokens/underscore.py in getattr(self, name)
33 def getattr(self, name):
34 if name not in self._extensions:
---> 35 raise AttributeError(Errors.E046.format(name=name))
36 default, method, getter, setter = self._extensions[name]
37 if getter is not None:

AttributeError: [E046] Can't retrieve unregistered extension attribute 'negex'. Did you forget to call the set_extension method?

Documentation at negspacy's PyPI webpage needs to be updated

Describe the bug
Same as issues #31 and #34, the example here describing how to add negspacy object to pipeline needs to be updated.

P.S. Thank you for releasing and maintaining such a useful library!

Can negspacy be used with already identified Entities from scispacy

Is your feature request related to a problem? Please describe.
Can negspacy be used with already identified Entities and their spans through scispacy, by providing them somehow?

Describe the solution you'd like

For instance, scispacy has already been run with its EntityLinker, and UMLS entities with their indices have been obtained and stored somewhere.

It would be computationally expensive to run the whole scispacy with negspacy again. Is there a way to only run (sci)spacy with only base spacy functionality like the tokenizer, and provide the full text string, the entities and their indices somehow, so that negspacy can determine the negation status?

KeyError: 'es_clinical'

I tested the pipeline for Spanish text as follows:

ts = termset("es_clinical")

nlp = spacy.load("es_core_news_sm")

nlp.add_pipe(
    "negex",
    config={
        "neg_termset":ts.get_patterns()
    }
)

and got the error

File "/root/negex_spacy/negex.py", line 5, in <module>
    ts = termset("es_clinical")
File "/usr/local/lib/python3.10/dist-packages/negspacy/termsets.py", line 212, in __init__
    self.terms = LANGUAGES[termset_lang]
KeyError: 'es_clinical'

Update: I installed from the cloned repository and it works. It seems the version installed via pip is not the latest.

negspacy docs on spacy universe are out of date

Describe the bug
If you go to the docs here you'll see the following code example:

import spacy
from negspacy.negation import Negex

nlp = spacy.load("en_core_web_sm")
negex = Negex(nlp, ent_types=["PERSON','ORG"])
nlp.add_pipe(negex, last=True)

doc = nlp("She does not like Steve Jobs but likes Apple products.")
for e in doc.ents:
    print(e.text, e._.negex)

This is out of date and should be replaced with the contents of the current readme.

Negation of a wrong dependency

I have a custom NER pipeline. When I pair it with negspacy, I see the wrong part of the sentence being negated.

e.g. in these cases, "home" is detected as negated:

"he has not been able to walk much outside his home"
"He does not have any help at home"

In these examples, I am checking whether the person is housed or homeless.

How can I get this to work?

I think I'm missing something here and can't seem to resolve it.

The code works with the example texts provided in much of the documentation (e.g. "She does not like Steve Jobs but likes Apple products."), and the term 'cannot' appears in the termset - how can I identify these simple negations? Please note the print is indented in the original code.

Here's my code:

pip install negspacy

import spacy
from negspacy.negation import Negex

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("negex", config={"ent_types":["PERSON","ORG"]})

ts = termset("en")

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("negex", config={"ent_types":["PERSON","ORG"]})

doc = nlp("Men cannot play football.")
for e in doc.ents:
    print(e.text, e._.negex)

Applying Negex to Adjectives

Is there a straightforward way to apply Negex to adjectives? I already incorporated Negex into my pipeline with my own custom component, but I didn't realize until after the fact that it seems to search for negations only in relation to named entities. For example, I was hoping to apply it so that I'd get positive matches on something like:

doc = nlp("Eve is not nice. Eve is friendly. Eve is not chill.")
for s in doc.sents:
    for t in s.tokens:
      print(t._.negex, t.text)

True nice
False friendly
True chill

It seems like this is not supported at the moment, but if anyone has any advice on how to customize Negex to achieve this it would be much appreciated. Also, if there's a good reason to not bother trying to do this at all, would love to understand that too.

Thanks!

Allow user updates to different negation dictionaries

Is your feature request related to a problem? Please describe.
Currently, a user cannot modify negation dictionaries.

Describe the solution you'd like
When initializing NegEx object, allow a user to add custom terminology lists or keep defaults.

Describe alternatives you've considered
N/A

Additional context
N/A

Tagging 'possible' terms

Is your feature request related to a problem? Please describe.
An additional functionality of tagging terms as 'possible': It's a feature in one of the original negex implementations as well as in pyConTextNLP. Also, some important negation corpora include such annotation (i.e. speculated/possible terms).

Describe the solution you'd like
An example could look like this:

doc = nlp("breast cancer may be ruled out")
for e in doc.ents:
    print(e.text, e._.negex)

Output:

breast cancer possible

Obviously, this would require changing the return value of e._.negex from a bool to, for example, a string.
This implementation could help when considering the logic behind this feature. In case anyone wants to run this negex with possible tagging enabled, check this issue here.
The "possible" pre and post triggers ([PREP] and [POSP]) can also be added easily from the same implementation's negex_triggers.txt file.

Describe alternatives you've considered
None other than using this mentioned negex code separately (combined with spacy, without negspacy).

Additional context
I can refer to the README as well as negex.py files in here. I imagine, step 2. is the only one that would require more work and having good understanding of negspacy.

Context-based medical concepts extraction from the given text in python spacy

Hi,

Thanks for this awesome negation tool, it's amazing.

Motivated by negspacy, I would like to implement a similar task, so I would appreciate your suggestions.

I have a medical text that contains a diagnosis, past history, negations (like no headache, fever), and the cases in which the patient should consult a doctor in the future or in an emergency.

Example of text

The patient admitted to the hospital with hypertension and chronic kidney disease. The patient had a past history of diabetes mellitus and coronary artery disease. When the patient admitted to the hospital, no symptoms of fever, giddiness, and headache found. The patient is asked to consult a doctor in case of vomiting and nausea.

The above text has a present illness (sentence 1), past history (sentence 2), negations (sentence 3), and future consultation (sentence 4). I have been using scispacy for medical concept extraction and negspacy for negations, and both are working fine.

Now my next task is:
How do I separate present illness, past history, and future consultations using NLP techniques?

I am thinking of adding "past history of", "in case of emergency", and "history of" to the chunk_prefix. Is that a good move?

Can I create a copy of negspacy, add my own terms, and add it as a separate pipeline component in spaCy?

Spacy 3.3 support

When I use multiprocessing pool.map, word._.negex throws an error as Can't retrieve unregistered extension attribute 'negex'

Describe the bug
I am processing medical texts written by nurses and doctors using the spaCy English() model and Negex to find the appropriate negations. The code works fine when I run it in a single thread, but when I use multiprocessing to process texts simultaneously it raises the exception given below.

File "../code/process_notes.py", line 154, in multiprocessing_finding_negation
    pool_results = pool.map(self.process, split_dfs)
File "/usr/local/opt/[email protected]/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
File "/usr/local/opt/[email protected]/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
AttributeError: ("[E046] Can't retrieve unregistered extension attribute 'negex'. Did you forget to call the set_extension method?", 'occurred at index 2')

To Reproduce

def load_spacy_model(self):
	nlp = English()
	nlp.add_pipe(self.nlp.create_pipe('sentencizer'))
	ruler = EntityRuler(self.nlp)
	# adding labels and patterns in the entity ruler
	ruler.add_patterns(self.progressnotes_constant_objects.return_pattern_label_list(self.client))
	# adding entityruler in spacy pipeline
	nlp.add_pipe(ruler)
	preceding_negations = en_clinical['preceding_negations']
	following_negations = en_clinical['following_negations']
	# adding custom preceding negations with the default preceding negations.
	preceding_negations += self.progressnotes_constant_objects.return_custom_preceding_negations()
	# adding custom following negations with the default following negations.
	following_negations += self.progressnotes_constant_objects.return_custom_following_negations()
	# negation words to see if a noun chunk has negation or not.
	negation_words = self.progressnotes_constant_objects.return_negation_words()
	negex = Negex(nlp, language='en_clinical', chunk_prefix=negation_words,
	              preceding_negations=preceding_negations,
	              following_negations=following_negations)
	# adding negex in the spacy pipeline
	# input----->|entityruler|----->|negex|---->entities
	nlp.add_pipe(negex, last=True)
      
def process(self, split_dfs):
    split_dfs = split_dfs
    # this function is run under multiprocessing.
    split_dfs = self.split_dfs.apply(self.lambda_func, axis=1)

def lambda_func(self, row):
    """
    This is a lambda function which runs inside a multiprocessing pool.
    It read single row of the dataframe.
    Applies basic cleanup using replace dict.
    Finds positive,their respective tart-end index and negative words.
    positive words are the words mentioned in the keywords patterns.
    """
    row['clean_note'] = row['notetext']
    
    # passing the sentence from NLP pipeline.
    doc = self.nlp(row['clean_note'])
    neg_list = list()
    pos_list, pos_index_list = list(), list()
    for word in doc.ents:
        # segregating positive and negative words.
        if not word._.negex:
            # populating positive and respective positive index list.
            pos_list.append(word.text)
            pos_index_list.append((word.start_char, word.end_char))
        else:
            neg_list.append(word.text)

p = os.cpu_count() - 1
pool = mp.Pool(processes=p)
split_dfs = np.array_split(notes_df, 25)  # notes_df is a panda dataframe
pool_results = pool.map(self.process, split_dfs)
pool.close()
pool.join()

Expected behavior
pos_list & neg_list needs to get populated

Screenshots

Desktop (please complete the following information):

OS: MacOS Catalina, 8GB RAM , 1.6 Ghz dual-core

Can negspacy be also used to detect family mentions?

Is your feature request related to a problem? Please describe.
With the architecture to detect negations, could negspacy also detect if named-entities are within the scope of family mentions? This would be especially useful in combination with found UMLS concepts. Instead of negation words, it would probably have to use words like brother, sister, etc.

Describe the solution you'd like
In addition to the negation status of UMLS concepts, a family mention status could be reported.

Spacy 3.2 support

negspacy pairing example

Thank you for creating negspacy! This is extremely helpful. I was wondering if you could provide more detail on "Consider pairing with scispacy to find UMLS concepts in text and process negations."

I searched Google and was not able to find any examples. How do you pair it with other spaCy elements?

pyproject.toml

To avoid issues in pip 23, utilize pyproject.toml over legacy setup.py method.

Ensure spacy 3.0 support

adding on custom patterns in negspacy

My code currently looks like -

import en_core_sci_lg
from negspacy.negation import Negex
nlp = en_core_sci_lg.load()

negex = Negex(nlp, language = "en_clinical_sensitive")
nlp.add_pipe(negex, last=True)

doc = nlp(""" patient has no signs of shortness of breath. """)

for word in doc.ents:
    print(word, word._.negex)

The output is -

patient False
shortness True

I want the output to be -

patient False
shortness of breath True

How can I treat phrases like "shortness of breath", "sore throat", and "respiratory distress" as a single entity?

I was thinking of adding these custom phrases in negation.py at line 81. How can I do that? Is there any other approach with which I can resolve this issue?

Wrongly picking negations in negspacy?

Hi,

I am trying to find negations in a sentence using negspacy, but it reports the first negated entity (no headache) as False when it should be True, while picking up the second negation correctly. Should I tune any parameters to get the first negation right?

Here is my code.

nlp = spacy.load("en_core_sci_md")
negex = Negex(nlp, language = "en_clinical")
doc = nlp('I am having Hypertension with no headache and fever')
for ent in doc.ents:
    print(ent.text, ent._.negex)

Output:

Hypertension False
no headache False
fever True

Get the List of Corresponding Negation Terms for a Set of Negated Lexicons

Hello, I am working on a project with a clinical dataset. So far I have been able to detect all the diagnoses and whether they are negated or not. I would also really like to get the negation term that was used to detect each negated lexicon. For example:

import spacy
from negspacy.negation import Negex

nlp = spacy.load("en_core_sci_lg")
nlp.add_pipe("negex")

doc = nlp("She has neither fever nor cough.")

for e in doc.ents:
    print(e.text, e._.negex)

fever True
cough True

What more do I expect to get:
The negation term for each negated lexicon detected: neither, nor

It would be very appreciated if you could help me.
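
One way to approximate this outside negspacy is to search the text before each entity for known trigger phrases. The sketch below is pure Python and does not use negspacy's API: preceding_triggers is a hypothetical helper, the trigger list is illustrative, and it ignores termination phrases and following negations.

```python
import re


def preceding_triggers(sentence, entity, triggers):
    """Return the trigger phrases that occur before the entity in the sentence."""
    lowered = sentence.lower()
    ent_pos = lowered.find(entity.lower())
    if ent_pos < 0:
        return []
    before = lowered[:ent_pos]
    found = []
    for trig in triggers:
        # Whole-word match so that "no" does not match inside "nor".
        if re.search(r"\b" + re.escape(trig.lower()) + r"\b", before):
            found.append(trig)
    return found


sentence = "She has neither fever nor cough."
triggers = ["neither", "nor", "no"]
fever_triggers = preceding_triggers(sentence, "fever", triggers)
cough_triggers = preceding_triggers(sentence, "cough", triggers)
print(fever_triggers, cough_triggers)
```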

Compatibility with Scispacy

I am using the versions
spaCy 3.0.3
negspacy 1.0.0
scispacy 0.4.0

I think the current version of negspacy is not compatible with scispacy. I already read the issue but I think it works with the previous negspacy version. I also tried other models of scispacy like en_core_sci_sm but got the same error:

ValueError: [E002] Can't find factory for 'negex' for language English (en). This usually happens when spaCy calls `nlp.create_pipe` with a custom component name that's not registered on the current language class. If you're using a Transformer, make sure to install 'spacy-transformers'. If you're using a custom component, make sure you've added the decorator `@Language.component` (for function components) or `@Language.factory` (for class components).

Available factories: attribute_ruler, tok2vec, merge_noun_chunks, merge_entities, merge_subtokens, token_splitter, parser, beam_parser, entity_linker, ner, beam_ner, entity_ruler, lemmatizer, tagger, morphologizer, senter, sentencizer, textcat, textcat_multilabel, en.lemmatizer

My code is as follows:

nlp = spacy.load("en_core_sci_md")
ts = termset("en_clinical")
ts.add_patterns({'preceding_negations': ["nor", "non"]})
nlp.add_pipe(
    "negex",
    config={
        "neg_termset": ts.get_patterns(),
        "chunk_prefix": ["no"],
    },
    last=True,
)

for text in df['Cons']:
    doc = nlp(str(text))
    ..........

Can you help?

Install error on negspacy (pip install)

Describe the bug
Getting the error below with the pip install negspacy command in Anaconda Prompt.

To Reproduce
Steps to reproduce the behavior:
Run Anaconda Prompt as Administrator
Type pip install negspacy and press Enter

Expected behavior
The negspacy package gets installed and should be present in C:\ProgramData\Anaconda3\Lib\site-packages

Screenshots
(base) C:\WINDOWS\system32>pip install C:\negspacy-0.1.0a0.tar.gz
Processing c:\negspacy-0.1.0a0.tar.gz
ERROR: Command errored out with exit status 1:
command: 'c:\programdata\anaconda3\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\xx\AppData\Local\Temp\pip-req-build-4b2hz8em\setup.py'"'"'; file='"'"'C:\Users\xx\AppData\Local\Temp\pip-req-build-4b2hz8em\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' egg_info --egg-base pip-egg-info
cwd: C:\Users\xxAppData\Local\Temp\pip-req-build-4b2hz8em
Complete output (7 lines):
Traceback (most recent call last):
File "", line 1, in
File "C:\Users\xx\AppData\Local\Temp\pip-req-build-4b2hz8em\setup.py", line 10, in
long_description=open("README.md").read(),
File "c:\programdata\anaconda3\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 274: character maps to
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

Desktop (please complete the following information):

OS: Windows

Allow for negation algorithm to consider first token of a noun chunk

Is your feature request related to a problem? Please describe.
Some models may chunk a negation into an entity span. For example, for "There is no headache", doc.ents will include "no headache".

This would cause the negation algorithm to miss an obvious negation. This does not seem to happen with spacy out of the box models but more so in scispacy's biomedical language models.

Support for Spanish language

I used the Negex algorithm to deal with Spanish text. With the help of a Spanish Language expert, a list of negex terms based on the lists provided in the original paper was created. I'd like to contribute those to this repository. To that effect, I've created a fork of this repository. I opened this issue so there could be a discussion as to how this extension can happen.

When running example it returns "Steve Jobs False"

Describe the bug
When I run this example code:

import spacy
from negspacy.negation import Negex

nlp = spacy.load("en_core_web_sm")
negex = Negex(nlp, ent_types=["PERSON','ORG"])
nlp.add_pipe(negex, last=True)

doc = nlp("She does not like Steve Jobs but likes Apple products.")
for e in doc.ents:
    print(e.text, e._.negex)

It outputs:
Steve Jobs False
Apple False

Expected behavior
Steve Jobs True

Desktop (please complete the following information):

OS: Windows 10

spaCy issue #4267 will make negspacy believe a doc has not been processed for NER

Describe the bug
See explosion/spaCy#4267

To Reproduce
See explosion/spaCy#4267

Expected behavior
Whether using model based NER or EntityRuler, negspacy should know a document was NERed.

Pseudo negations are not being handled properly.

Describe the bug
Pseudo negations are not being handled properly.
