Skip to content

Conversation

@dmnc-grdnr
Copy link

This PR adds the deep copy functionality to the CAS object.

Tests for the function was added to test_cas.py

I have verified the functionality on different CAS files, but due to the complexity of the data structure, there might be something I have overlooked.

Therefore feedback is highly appreciated!

@reckart reckart changed the title feature/330-enable-deep-copy #330 - Enable deep copy Nov 10, 2025
@reckart reckart requested a review from Copilot November 10, 2025 19:36
@reckart reckart added this to the 0.11.0 milestone Nov 10, 2025
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds deep copy functionality to the CAS (Common Analysis Structure) object, allowing users to create independent copies of CAS instances with or without duplicating the associated typesystem.

Key changes:

  • Added a deep_copy() method to the Cas class with optional typesystem copying
  • Implemented comprehensive test coverage for both copying modes (with and without typesystem)

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File Description
cassis/cas.py Implements the deep_copy() method to create deep copies of CAS objects, handling sofas, views, feature structures, and their references
tests/test_cas.py Adds two test cases verifying deep copy functionality with and without typesystem copying

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@reckart
Copy link
Member

reckart commented Nov 10, 2025

@dmnc-grdnr Could you please consider the Copilot review. If you believe and of its review remarks are wrong, please comment them.

@reckart
Copy link
Member

reckart commented Nov 11, 2025

Oh, and thank you again for another PR :)

@dmnc-grdnr
Copy link
Author

I reviewed the Copilot suggestions agreed with most, but found that a few lines are actually obsolete and should be removed instead of modified. This is my first pull request and I am unsure how to make the changes without breaking things. Could you please point me to what to do next?

Copy link
Member

@reckart reckart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much. To continue, it would be great if you could simply address the comments by updating the code and then committing the changes into this branch. Once the changes are in, the review will be repeated. Please do not use the "amend" function of git, just add another commit. Avoid force pushes, then you can't really break anything. Thanks!

@dmnc-grdnr
Copy link
Author

I was finally able to iron out all of the issues that appeared after including the random CAS tests. They were extremely helpful! I think, I made all the changes that you and Copilot required. Could you please have another look? Thanks!

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 12 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

xmiID=sofa.xmiID,
)
sofa_copy.mimeType = sofa.mimeType
sofa_copy.sofaArray = sofa.sofaArray
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On line 869, sofaArray is copied by direct assignment, which creates a shallow copy. If sofaArray contains mutable data (like a list or array), modifications to one could affect the other. Consider using copy.copy() or copy.deepcopy() depending on the data structure.

Copilot uses AI. Check for mistakes.
Comment on lines +546 to +604
def test_deep_copy_without_typesystem(small_xmi, small_typesystem_xml):
org = load_cas_from_xmi(small_xmi, typesystem=load_typesystem(small_typesystem_xml))
copy = org.deep_copy(copy_typesystem=False)

assert org != copy
assert len(copy.to_json(pretty_print=True)) == len(org.to_json(pretty_print=True))
assert copy.to_json(pretty_print=True) == org.to_json(pretty_print=True)

assert org.typesystem == copy.typesystem


def test_deep_copy_with_typesystem(small_xmi, small_typesystem_xml):
org = load_cas_from_xmi(small_xmi, typesystem=load_typesystem(small_typesystem_xml))
copy = org.deep_copy(copy_typesystem=True)

assert org != copy
assert len(copy.to_json(pretty_print=True)) == len(org.to_json(pretty_print=True))
assert copy.to_json(pretty_print=True) == org.to_json(pretty_print=True)


assert org.typesystem != copy.typesystem
assert len(org.typesystem.to_xml()) == len(copy.typesystem.to_xml())
assert org.typesystem.to_xml() == copy.typesystem.to_xml()


def test_random_multi_type_random_deep_copy():
generator = MultiTypeRandomCasGenerator()
for i in range(0, 10):
generator.size = (i + 1) * 10
generator.type_count = i + 1
typesystem = generator.generate_type_system()
org = generator.generate_cas(typesystem)
print(f"CAS size: {sum(len(view.get_all_annotations()) for view in org.views)}")
copy = org.deep_copy(copy_typesystem=True)

org_text = org.to_xmi(pretty_print=True)
copy_text = copy.to_xmi(pretty_print=True)

assert org != copy
assert len(org_text) == len(copy_text)
assert org_text == copy_text


def test_random_multi_feature_deep_copy():
generator = MultiFeatureRandomCasGenerator()
for i in range(0, 10):
generator.size = (i + 1) * 10
typesystem = generator.generate_type_system()
org = generator.generate_cas(typesystem)
print(f"CAS size: {sum(len(view.get_all_annotations()) for view in org.views)}")
copy = org.deep_copy(copy_typesystem=True)

org_text = org.to_xmi(pretty_print=True)
copy_text = copy.to_xmi(pretty_print=True)

assert org != copy
assert len(org_text) == len(copy_text)
assert org_text == copy_text

Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests don't cover edge cases such as: feature structures with None values in arrays, feature structures with None non-primitive features, empty arrays, or multiple views. Consider adding tests for these scenarios to ensure the deep_copy method handles them correctly.

Copilot uses AI. Check for mistakes.
Comment on lines +851 to +856
cas_copy = Cas(ts,
document_language=self.document_language,
lenient=self._lenient,
sofa_mime=self.sofa_mime,
)

Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On line 852, document_language is read from self.document_language, which accesses the document annotation. If the original CAS doesn't have a document annotation, this will create one, which might not be desired behavior during copying. Consider handling the case where document_language is None or catching exceptions when the document annotation doesn't exist.

Suggested change
cas_copy = Cas(ts,
document_language=self.document_language,
lenient=self._lenient,
sofa_mime=self.sofa_mime,
)
# Safely determine the document language without assuming that a
# document annotation already exists on the original CAS.
document_language = None
try:
document_language = self.document_language
except AttributeError:
document_language = None
cas_copy = Cas(
ts,
document_language=document_language,
lenient=self._lenient,
sofa_mime=self.sofa_mime,
)

Copilot uses AI. Check for mistakes.
Comment on lines +894 to +918
for feature in t.all_features:
if ts.is_primitive(feature.rangeType):
fs_copy[feature.name] = fs.get(feature.name)
elif ts.is_primitive_collection(feature.rangeType):
fs_copy[feature.name] = ts.get_type(feature.rangeType.name)()
fs_copy[feature.name].elements = fs.get(feature.name).elements
elif ts.is_array(feature.rangeType):
fs_copy[feature.name] = ts.get_type(TYPE_NAME_FS_ARRAY)()
# collect referenced xmiIDs for mapping later
referenced_list = []
for item in fs[feature.name].elements:
if hasattr(item, 'xmiID') and item.xmiID is not None:
referenced_list.append(item.xmiID)
referenced_arrays.setdefault(fs.xmiID, {})
referenced_arrays[fs.xmiID][feature.name] = referenced_list
elif feature.rangeType.name == TYPE_NAME_SOFA:
# ignore sofa references
pass
else:
if hasattr(fs[feature.name], 'xmiID') and fs[feature.name].xmiID is not None:
references.setdefault(feature.name, [])
references[feature.name].append((fs.xmiID, fs[feature.name].xmiID))
else:
warnings.warn(f"Original non-primitive feature \"{feature.name}\" was and not copied from feature structure {fs.xmiID}.")

Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The deep_copy method doesn't handle FSList (linked list) types. While it handles primitive collections and FSArrays, FSList types (like NonEmptyFSList) are not explicitly handled in the feature copying logic (lines 894-918). These will fall into the else clause on line 912, where they'll be treated as single references, which is incorrect for list structures. Consider adding explicit handling for FSList types similar to how FSArray is handled.

Copilot uses AI. Check for mistakes.
cas_copy._sofas[sofa_copy.sofaID] = sofa_copy
cas_copy._views[sofa_copy.sofaID] = View(sofa=sofa_copy)

# removes the _IntialView created with the initialization of the copied CAS
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in comment: "_IntialView" should be "_InitialView"

Suggested change
# removes the _IntialView created with the initialization of the copied CAS
# removes the _InitialView created with the initialization of the copied CAS

Copilot uses AI. Check for mistakes.
Comment on lines +898 to +908
fs_copy[feature.name] = ts.get_type(feature.rangeType.name)()
fs_copy[feature.name].elements = fs.get(feature.name).elements
elif ts.is_array(feature.rangeType):
fs_copy[feature.name] = ts.get_type(TYPE_NAME_FS_ARRAY)()
# collect referenced xmiIDs for mapping later
referenced_list = []
for item in fs[feature.name].elements:
if hasattr(item, 'xmiID') and item.xmiID is not None:
referenced_list.append(item.xmiID)
referenced_arrays.setdefault(fs.xmiID, {})
referenced_arrays[fs.xmiID][feature.name] = referenced_list
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code doesn't handle the case where a primitive collection feature value (line 897-899) or an array feature value (line 900-908) could be None. If fs.get(feature.name) returns None for these feature types, the code will fail with an AttributeError when trying to access .elements. Add a check for None before accessing .elements.

Suggested change
fs_copy[feature.name] = ts.get_type(feature.rangeType.name)()
fs_copy[feature.name].elements = fs.get(feature.name).elements
elif ts.is_array(feature.rangeType):
fs_copy[feature.name] = ts.get_type(TYPE_NAME_FS_ARRAY)()
# collect referenced xmiIDs for mapping later
referenced_list = []
for item in fs[feature.name].elements:
if hasattr(item, 'xmiID') and item.xmiID is not None:
referenced_list.append(item.xmiID)
referenced_arrays.setdefault(fs.xmiID, {})
referenced_arrays[fs.xmiID][feature.name] = referenced_list
original_value = fs.get(feature.name)
if original_value is not None:
fs_copy[feature.name] = ts.get_type(feature.rangeType.name)()
fs_copy[feature.name].elements = original_value.elements
else:
fs_copy[feature.name] = None
elif ts.is_array(feature.rangeType):
original_value = fs.get(feature.name)
if original_value is not None:
fs_copy[feature.name] = ts.get_type(TYPE_NAME_FS_ARRAY)()
# collect referenced xmiIDs for mapping later
referenced_list = []
for item in original_value.elements:
if hasattr(item, 'xmiID') and item.xmiID is not None:
referenced_list.append(item.xmiID)
referenced_arrays.setdefault(fs.xmiID, {})
referenced_arrays[fs.xmiID][feature.name] = referenced_list
else:
fs_copy[feature.name] = None

Copilot uses AI. Check for mistakes.
Comment on lines +933 to +934
elements = [all_copied_fs[reference_ID] for reference_ID in referenced_list]
all_copied_fs[current_ID][feature].elements = elements
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On line 933, the code assumes all reference_IDs in referenced_list exist in all_copied_fs, but doesn't handle the case where a reference might be missing. This will raise a KeyError. Consider adding error handling similar to lines 925-928, or filter out missing references.

Suggested change
elements = [all_copied_fs[reference_ID] for reference_ID in referenced_list]
all_copied_fs[current_ID][feature].elements = elements
elements = []
for reference_ID in referenced_list:
try:
elements.append(all_copied_fs[reference_ID])
except KeyError:
warnings.warn(
f"Reference {reference_ID} not found for feature '{feature}' of feature structure {current_ID}"
)
try:
all_copied_fs[current_ID][feature].elements = elements
except KeyError:
warnings.warn(
f"Feature '{feature}' or feature structure {current_ID} not found when setting array references"
)

Copilot uses AI. Check for mistakes.
Comment on lines +913 to +915
if hasattr(fs[feature.name], 'xmiID') and fs[feature.name].xmiID is not None:
references.setdefault(feature.name, [])
references[feature.name].append((fs.xmiID, fs[feature.name].xmiID))
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code checks if fs[feature.name] has an xmiID attribute (line 913), but doesn't handle the case where fs[feature.name] could be None. This will cause an error. Add a None check before accessing attributes: if fs[feature.name] is not None and hasattr(...)

Suggested change
if hasattr(fs[feature.name], 'xmiID') and fs[feature.name].xmiID is not None:
references.setdefault(feature.name, [])
references[feature.name].append((fs.xmiID, fs[feature.name].xmiID))
feature_value = fs[feature.name]
if feature_value is not None and hasattr(feature_value, 'xmiID') and feature_value.xmiID is not None:
references.setdefault(feature.name, [])
references[feature.name].append((fs.xmiID, feature_value.xmiID))

Copilot uses AI. Check for mistakes.

# add feature structures to the appropriate views
feature_structures = sorted(all_copied_fs.values(), key=lambda f: f.xmiID, reverse=False)
for item in all_copied_fs.values():
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable 'feature_structures' is created on line 937 but never used. It appears this was meant to iterate over the sorted list, but line 938 iterates over all_copied_fs.values() instead, which is unsorted. This could lead to issues if feature structures need to be added in a specific order.

Suggested change
for item in all_copied_fs.values():
for item in feature_structures:

Copilot uses AI. Check for mistakes.
fs_copy[feature.name] = fs.get(feature.name)
elif ts.is_primitive_collection(feature.rangeType):
fs_copy[feature.name] = ts.get_type(feature.rangeType.name)()
fs_copy[feature.name].elements = fs.get(feature.name).elements
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When copying primitive collections (line 899), the code directly assigns elements from the original collection to the copy. This creates a shallow copy of the elements list. If the elements are mutable (e.g., lists of bytes), modifications to one could affect the other. Consider creating a deep copy of the elements: fs_copy[feature.name].elements = list(fs.get(feature.name).elements) or using copy.deepcopy.

Suggested change
fs_copy[feature.name].elements = fs.get(feature.name).elements
fs_copy[feature.name].elements = list(fs.get(feature.name).elements)

Copilot uses AI. Check for mistakes.
@reckart
Copy link
Member

reckart commented Jan 27, 2026

@dmnc-grdnr coming back to this, I re-ran copilot just for good measure before I would review the code again and possible merge it. Unfortunately, I have to say it found a few points that appear to be quite critical and which I may have quite easily missed in a manual review. Do you still have the capacity and interest to address those issues?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants