Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
52 changes: 52 additions & 0 deletions cassis/cas.py
Original file line number Diff line number Diff line change
Expand Up @@ -367,6 +367,38 @@ def add_annotations(self, annotations: Iterable[FeatureStructure]):
"""
self.add_all(annotations)

def cut_sofa_string_to_range(self, sofa_begin:int, sofa_end:int, overlap=True):
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing space after colon in type annotations. Should be sofa_begin: int and sofa_end: int to comply with PEP 8 style guidelines.

Suggested change
def cut_sofa_string_to_range(self, sofa_begin:int, sofa_end:int, overlap=True):
def cut_sofa_string_to_range(self, sofa_begin: int, sofa_end: int, overlap=True):

Copilot uses AI. Check for mistakes.
"""Replaces current sofa string with a cutout of the given range. Removes all annotations outside of range,
but keeps annotations that overlap with cutout points by default.

Args:
sofa_begin: The beginning of the cutout sofa.
sofa_end: The end of the cutout sofa.
overlap: If true, keeps overlapping annotations and modifies begin and end of annotation accordingly.

Raises:
ValueError: If cutout indices are invalid.
"""
if 0 <= sofa_begin < sofa_end <= len(self.sofa_string):
self.sofa_string = self.sofa_string[sofa_begin:sofa_end]
for annotation in self.select_all():
if sofa_begin <= annotation.begin and annotation.end <= sofa_end:
annotation.begin = annotation.begin - sofa_begin
annotation.end = annotation.end - sofa_begin
elif overlap and sofa_begin < annotation.end <= sofa_end:
annotation.begin = 0
annotation.end = sofa_end - sofa_begin
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect calculation for overlapping annotation end position. When an annotation starts before sofa_begin but ends within the range, the end should be annotation.end - sofa_begin, not sofa_end - sofa_begin. The current code incorrectly extends the annotation to the full cutout range.

Suggested change
annotation.end = sofa_end - sofa_begin
annotation.end = annotation.end - sofa_begin

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this suggestion.

elif overlap and sofa_begin <= annotation.begin < sofa_end:
annotation.begin = annotation.begin - sofa_begin
annotation.end = len(self.sofa_string)
elif overlap and annotation.begin <= sofa_begin and sofa_end <= annotation.end:
annotation.begin = 0
annotation.end = len(self.sofa_string)
else:
self.remove(annotation)
Comment on lines +384 to +398
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iterating over select_all() and removing items during iteration can cause issues. The list returned by select_all() is modified during iteration when self.remove(annotation) is called. Consider iterating over a copy: for annotation in list(self.select_all()):

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

select_all returns an independent list of annotations. It should be safe to directly remove an annotation from the CAS while iterating over the list.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interface only promises to return a List, it does not promise that this list is an independent object. See comment above.

else:
raise ValueError(f"Invalid indices for begin {sofa_begin} and end {sofa_end}")

def remove(self, annotation: FeatureStructure):
"""Removes an annotation from an index. This throws if the
annotation was not present.
Expand All @@ -386,6 +418,26 @@ def remove_annotation(self, annotation: FeatureStructure):
"""
self.remove(annotation)


def remove_in_range(self, cut_begin:int, cut_end:int, type_: Union[Type, str]=None):
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing spaces around type annotations. Should be cut_begin: int, cut_end: int, and type_: Union[Type, str] = None to comply with PEP 8 style guidelines.

Suggested change
def remove_in_range(self, cut_begin:int, cut_end:int, type_: Union[Type, str]=None):
def remove_in_range(self, cut_begin: int, cut_end: int, type_: Union[Type, str] = None):

Copilot uses AI. Check for mistakes.
"""Removes annotations between two indices of the sofa string.

Args:
cut_begin: The beginning of the cutting interval.
cut_end: The end of the cutting interval.
type_: The type or name of the type name whose annotation instances are to be found
Raises:
ValueError: If range indices are invalid or annotation type_ not found.
"""

annotations = self.select_all() if type_ is None else self.select(type_)
if 0 <= cut_begin < cut_end <= len(self.sofa_string):
for annotation in annotations:
if cut_begin <= annotation.begin < annotation.end <= cut_end:
self.remove(annotation)
Comment on lines +433 to +437
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iterating and removing from collection during iteration can cause issues. When using self.select_all(), the result is a list, so this works. However, converting to a list explicitly (e.g., list(annotations)) before iteration would be safer and more explicit to prevent potential issues if the return type changes in the future.

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I believe the code should fail if the return types of cas.select_all and cas.select change, because it signifies a major change in functionality. But I am on the fence on this one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the Copilot. Just because select_all() returns a List, it does not mean that the List may not be a live list. Removing stuff from a live list would be problematic. When removing stuff from a container such as the CAS, it is a good practice to make a snapshot of the element list over which to iterate before modifying the container.

else:
raise ValueError(f"Invalid indices for begin {cut_begin} and end {cut_end}")

@deprecation.deprecated(details="Use annotation.get_covered_text()")
def get_covered_text(self, annotation: FeatureStructure) -> str:
"""Gets the text that is covered by `annotation`.
Expand Down
75 changes: 75 additions & 0 deletions tests/test_cas.py
Original file line number Diff line number Diff line change
Expand Up @@ -604,3 +604,78 @@ def test_covered_text_on_annotation_without_sofa():

with pytest.raises(AnnotationHasNoSofa):
ann.get_covered_text()


def test_remove_in_range(small_typesystem_xml, small_xmi):
typesystem = load_typesystem(small_typesystem_xml)
cas = load_cas_from_xmi(small_xmi, typesystem)

begin = 10
end = 20

expected_leftover_annotations = [annotation for annotation in cas.select_all()
if not (begin <= annotation.begin < annotation.end <= end)]

cas.remove_in_range(begin, end)

result_leftover_annotations = cas.select_all()

assert len(result_leftover_annotations) == len(expected_leftover_annotations)

for annotation in expected_leftover_annotations:
assert annotation in result_leftover_annotations

def test_remove_in_range_with_type(small_typesystem_xml, small_xmi):
typesystem = load_typesystem(small_typesystem_xml)
cas = load_cas_from_xmi(small_xmi, typesystem)

begin = 0
end = 27
type_ = 'cassis.Token'
expected_leftover_annotations = [annotation for annotation in cas.select_all()
if not (begin <= annotation.begin < annotation.end <= end
and annotation.type.name == type_)]

cas.remove_in_range(begin, end, type_)

result_leftover_annotations = cas.select_all()

assert len(result_leftover_annotations) == len(expected_leftover_annotations)

for annotation in expected_leftover_annotations:
assert annotation in result_leftover_annotations
if begin <= annotation.begin < annotation.end <= end:
assert annotation.type.name != type_


def test_cut_sofa_string_to_range(small_typesystem_xml, small_xmi):
typesystem = load_typesystem(small_typesystem_xml)
cas = load_cas_from_xmi(small_xmi, typesystem)

begin = 10
end = 20

expected_leftover_annotations = [annotation for annotation in cas.select_all()
if (begin <= annotation.begin < end)
or (annotation.begin < begin < end <= annotation.end)]
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The expected leftover annotations logic doesn't match the implementation. The condition doesn't account for all overlap cases. For example, annotations where annotation.end overlaps but annotation.begin < begin and annotation.end > end should be included but aren't. The condition should be: if (begin <= annotation.begin < end) or (begin < annotation.end <= end) or (annotation.begin < begin and annotation.end > end)

Suggested change
or (annotation.begin < begin < end <= annotation.end)]
or (begin < annotation.end <= end)
or (annotation.begin < begin and annotation.end > end)]

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this suggestion.


cas.cut_sofa_string_to_range(begin, end)

assert len(cas.select_all()) == len(expected_leftover_annotations)


def test_cut_sofa_string_to_range_no_overlap(small_typesystem_xml, small_xmi):
typesystem = load_typesystem(small_typesystem_xml)
cas = load_cas_from_xmi(small_xmi, typesystem)

begin = 10
end = 20

expected_leftover_annotations = [annotation for annotation in cas.select_all()
if begin <= annotation.begin < annotation.end <= end]

cas.cut_sofa_string_to_range(begin, end, overlap=False)

assert len(cas.select_all()) == len(expected_leftover_annotations)

print(cas.sofa_string)
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug print statement left in test code. This should be removed or replaced with an assertion that verifies the expected sofa_string value.

Suggested change
print(cas.sofa_string)

Copilot uses AI. Check for mistakes.