-
Notifications
You must be signed in to change notification settings - Fork 23
#328 - Remove annotations in a specified range #329
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -367,6 +367,38 @@ def add_annotations(self, annotations: Iterable[FeatureStructure]): | |||||
| """ | ||||||
| self.add_all(annotations) | ||||||
|
|
||||||
| def cut_sofa_string_to_range(self, sofa_begin:int, sofa_end:int, overlap=True): | ||||||
| """Replaces current sofa string with a cutout of the given range. Removes all annotations outside of range, | ||||||
| but keeps annotations that overlap with cutout points by default. | ||||||
|
|
||||||
| Args: | ||||||
| sofa_begin: The beginning of the cutout sofa. | ||||||
| sofa_end: The end of the cutout sofa. | ||||||
| overlap: If true, keeps overlapping annotations and modifies begin and end of annotation accordingly. | ||||||
|
|
||||||
| Raises: | ||||||
| ValueError: If cutout indices are invalid. | ||||||
| """ | ||||||
| if 0 <= sofa_begin < sofa_end <= len(self.sofa_string): | ||||||
| self.sofa_string = self.sofa_string[sofa_begin:sofa_end] | ||||||
| for annotation in self.select_all(): | ||||||
| if sofa_begin <= annotation.begin and annotation.end <= sofa_end: | ||||||
| annotation.begin = annotation.begin - sofa_begin | ||||||
| annotation.end = annotation.end - sofa_begin | ||||||
| elif overlap and sofa_begin < annotation.end <= sofa_end: | ||||||
| annotation.begin = 0 | ||||||
| annotation.end = sofa_end - sofa_begin | ||||||
|
||||||
| annotation.end = sofa_end - sofa_begin | |
| annotation.end = annotation.end - sofa_begin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this suggestion.
Copilot
AI
Nov 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Iterating over select_all() and removing items during iteration can cause issues. The list returned by select_all() is modified during iteration when self.remove(annotation) is called. Consider iterating over a copy: for annotation in list(self.select_all()):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
select_all returns an independent list of annotations. It should be safe to directly remove an annotation from the CAS while iterating over the list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The interface only promises to return a List, it does not promise that this list is an independent object. See comment above.
Copilot
AI
Nov 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing spaces around type annotations. Should be cut_begin: int, cut_end: int, and type_: Union[Type, str] = None to comply with PEP 8 style guidelines.
| def remove_in_range(self, cut_begin:int, cut_end:int, type_: Union[Type, str]=None): | |
| def remove_in_range(self, cut_begin: int, cut_end: int, type_: Union[Type, str] = None): |
Copilot
AI
Nov 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Iterating and removing from collection during iteration can cause issues. When using self.select_all(), the result is a list, so this works. However, converting to a list explicitly (e.g., list(annotations)) before iteration would be safer and more explicit to prevent potential issues if the return type changes in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I believe the code should fail if the return types of cas.select_all and cas.select change, because it signifies a major change in functionality. But I am on the fence on this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the Copilot. Just because select_all() returns a List, it does not mean that the List may not be a live list. Removing stuff from a live list would be problematic. When removing stuff from a container such as the CAS, it is a good practice to make a snapshot of the element list over which to iterate before modifying the container.
| Original file line number | Diff line number | Diff line change | ||||||
|---|---|---|---|---|---|---|---|---|
|
|
@@ -604,3 +604,78 @@ def test_covered_text_on_annotation_without_sofa(): | |||||||
|
|
||||||||
| with pytest.raises(AnnotationHasNoSofa): | ||||||||
| ann.get_covered_text() | ||||||||
|
|
||||||||
|
|
||||||||
| def test_remove_in_range(small_typesystem_xml, small_xmi): | ||||||||
| typesystem = load_typesystem(small_typesystem_xml) | ||||||||
| cas = load_cas_from_xmi(small_xmi, typesystem) | ||||||||
|
|
||||||||
| begin = 10 | ||||||||
| end = 20 | ||||||||
|
|
||||||||
| expected_leftover_annotations = [annotation for annotation in cas.select_all() | ||||||||
| if not (begin <= annotation.begin < annotation.end <= end)] | ||||||||
|
|
||||||||
| cas.remove_in_range(begin, end) | ||||||||
|
|
||||||||
| result_leftover_annotations = cas.select_all() | ||||||||
|
|
||||||||
| assert len(result_leftover_annotations) == len(expected_leftover_annotations) | ||||||||
|
|
||||||||
| for annotation in expected_leftover_annotations: | ||||||||
| assert annotation in result_leftover_annotations | ||||||||
|
|
||||||||
| def test_remove_in_range_with_type(small_typesystem_xml, small_xmi): | ||||||||
| typesystem = load_typesystem(small_typesystem_xml) | ||||||||
| cas = load_cas_from_xmi(small_xmi, typesystem) | ||||||||
|
|
||||||||
| begin = 0 | ||||||||
| end = 27 | ||||||||
| type_ = 'cassis.Token' | ||||||||
| expected_leftover_annotations = [annotation for annotation in cas.select_all() | ||||||||
| if not (begin <= annotation.begin < annotation.end <= end | ||||||||
| and annotation.type.name == type_)] | ||||||||
|
|
||||||||
| cas.remove_in_range(begin, end, type_) | ||||||||
|
|
||||||||
| result_leftover_annotations = cas.select_all() | ||||||||
|
|
||||||||
| assert len(result_leftover_annotations) == len(expected_leftover_annotations) | ||||||||
|
|
||||||||
| for annotation in expected_leftover_annotations: | ||||||||
| assert annotation in result_leftover_annotations | ||||||||
| if begin <= annotation.begin < annotation.end <= end: | ||||||||
| assert annotation.type.name != type_ | ||||||||
|
|
||||||||
|
|
||||||||
| def test_cut_sofa_string_to_range(small_typesystem_xml, small_xmi): | ||||||||
| typesystem = load_typesystem(small_typesystem_xml) | ||||||||
| cas = load_cas_from_xmi(small_xmi, typesystem) | ||||||||
|
|
||||||||
| begin = 10 | ||||||||
| end = 20 | ||||||||
|
|
||||||||
| expected_leftover_annotations = [annotation for annotation in cas.select_all() | ||||||||
| if (begin <= annotation.begin < end) | ||||||||
| or (annotation.begin < begin < end <= annotation.end)] | ||||||||
|
||||||||
| or (annotation.begin < begin < end <= annotation.end)] | |
| or (begin < annotation.end <= end) | |
| or (annotation.begin < begin and annotation.end > end)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this suggestion.
Copilot
AI
Nov 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debug print statement left in test code. This should be removed or replaced with an assertion that verifies the expected sofa_string value.
| print(cas.sofa_string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing space after colon in type annotations. Should be
sofa_begin: intandsofa_end: intto comply with PEP 8 style guidelines.