asyml / forteaug
A rich Data Augmentation library supporting structured NLP data
License: Apache License 2.0
Running tests/algorithms/word_splitting_op_test.py fails with the following error:
..\..\fortex\aug\base\base_data_augmentation_op.py:118: in perform_augmentation
augmented_data_pack = self._apply_augmentations(input_pack)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <fortex.aug.algorithms.word_splitting_op.RandomWordSplitDataAugmentOp object at 0x000001FE7F7FF460>
data_pack = <forte.data.data_pack.DataPack object at 0x000001FE7F805E20>
def _apply_augmentations(
self,
data_pack: DataPack,
) -> DataPack:
r"""
The objective of this function is to actualize the augmentations
proposed by the augment function. It will copy and update the text
of datapack and auto-align the annotation spans. The links are also
copied if its parent & child are both present in the new pack.
The groups are copied if all its members are present
in the new pack.
Args:
data_pack: The Datapack holding the replaced annotations.
Returns:
A new data_pack holds the text after replacement. The annotations
in the original data pack will be copied and auto-aligned as
instructed by the "other_entry_policy" in the configuration.
The links and groups will be copied if their members are copied.
New annotations added by the `insert_annotated_spans` function
will also be added to the newly created data pack. Conversely, if
annotation is deleted by the `delete_annotation` function or an annotation
exists within a span that is deleted by the `delete_span` function, it will
not be added to the new data pack.
"""
replaced_annotations = self._replaced_annos[data_pack.pack_id]
if len(replaced_annotations) == 0:
return deepcopy(data_pack)
spans: List[Span] = [span for span, _ in replaced_annotations]
replacement_strs: List[str] = [
replacement_str for _, replacement_str in replaced_annotations
]
# Get the new text for the new data pack.
new_text: str = ""
for i, span in enumerate(spans):
new_span_str = replacement_strs[i]
# First, get the gap text between last and this span.
last_span_end: int = spans[i - 1].end if i > 0 else 0
gap_text: str = data_pack.text[last_span_end : span.begin]
new_text += gap_text
# Then, append the replaced new text.
new_text += new_span_str
# Finally, append to new_text the text after the last span.
new_text += data_pack.text[spans[-1].end :]
# Get the span (begin, end) before and after replacement.
new_spans: List[Span] = []
# Bias is the delta between the beginning
# indices before & after replacement.
bias: int = 0
for i, span in enumerate(spans):
old_begin: int = spans[i].begin
old_end: int = spans[i].end
new_begin: int = old_begin + bias
new_end = new_begin + len(replacement_strs[i])
new_spans.append(Span(new_begin, new_end))
bias = new_end - old_end
new_pack: DataPack = DataPack()
new_pack.set_text(new_text)
entry_map: Dict[int, int] = {}
insert_ind: int = 0
pid: int = data_pack.pack_id
# Only iterate over those entries that are necessary. ie. the
# ones that are inserted or are present in the other_entry_policy
# config.
existing_entries = self.configs["other_entry_policy"].keys()
new_entries: Dict[str, List[Tuple[int, int]]] = {}
for pos, data in self._inserted_text[pid].items():
new_entries[data[1]] = new_entries.get(data[1], []) + [
(pos, data[0])
]
entries_to_copy: Set[str] = set(
list(existing_entries)
+ [val for val in new_entries if val is not None]
)
def _insert_new_span(
entry_class: str,
insert_ind: int,
inserted_annos: List[Tuple[int, int]],
new_pack: DataPack,
spans: List[Span],
new_spans: List[Span],
):
"""
An internal helper function for insertion.
Args:
entry_class: The new annotation type to be created.
insert_ind: The index to be insert.
inserted_annos: The annotation span information to be inserted.
new_pack: The new data pack to insert the annotation.
spans: The original spans before replacement, should be
a sorted ascending list.
new_spans: The original spans before replacement, should be
a sorted ascending list.
"""
pos: int
length: int
pos, length = inserted_annos[insert_ind]
if entry_class is None:
return
insert_end: int = self.modify_index(
pos,
spans,
new_spans,
is_begin=False,
# Include the inserted span itself.
is_inclusive=True,
)
insert_begin: int = insert_end - length
new_anno = create_class_with_kwargs(
entry_class,
{"pack": new_pack, "begin": insert_begin, "end": insert_end},
)
new_pack.add_entry(new_anno)
# Iterate over all the original entries and modify their spans.
for entry_to_copy in entries_to_copy:
class_to_copy = get_class(entry_to_copy)
insert_ind = 0
if not issubclass(class_to_copy, Annotation):
raise AttributeError(
f"The entry type to copy from [{entry_to_copy}] is not "
f"a sub-class of 'forte.data.ontology.top.Annotation'."
)
if entry_to_copy not in new_entries:
new_entries[entry_to_copy] = []
orig_annos: Iterable[Annotation] = data_pack.get(class_to_copy)
for orig_anno in orig_annos:
> old_sent = orig_anno.sentiment
E AttributeError: 'Token' object has no attribute 'sentiment'
..\..\fortex\aug\base\base_data_augmentation_op.py:535: AttributeError
======================== 1 failed, 1 warning in 6.00s =========================
Process finished with exit code 1
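The last frame of the traceback reads `orig_anno.sentiment` on every copied annotation, and `Token` does not define that attribute. The failure mode can be reproduced outside the library. The sketch below is only an illustration: `Token`, `Sentence`, and `read_sentiment` are hypothetical stand-ins written for this example, not the Forte or forteaug API, and whether the real fix should guard the access or restrict it to specific entry types is an assumption for the maintainers to confirm.

```python
from typing import Any, Dict, Optional


class Token:
    """Hypothetical stand-in for an annotation type without `sentiment`."""

    def __init__(self, begin: int, end: int) -> None:
        self.begin = begin
        self.end = end


class Sentence(Token):
    """Hypothetical stand-in for an annotation type that carries `sentiment`."""

    def __init__(self, begin: int, end: int, sentiment: Dict[str, float]) -> None:
        super().__init__(begin, end)
        self.sentiment = sentiment


def read_sentiment(anno: Any) -> Optional[Dict[str, float]]:
    # getattr with a default returns None instead of raising AttributeError
    # when the annotation type has no `sentiment` attribute.
    return getattr(anno, "sentiment", None)


token = Token(0, 5)
sentence = Sentence(0, 11, {"positive": 0.9})

# Unguarded access mirrors the failing line in the traceback.
try:
    token.sentiment
    raised = False
except AttributeError:
    raised = True

print(raised)
print(read_sentiment(token))
print(read_sentiment(sentence))
```

With a guard of this kind, annotation types that lack the attribute are copied without it, while types that define it keep their value.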