@gerrod3 gerrod3 commented Jan 21, 2026

Ok, the last fix was correct for duplicate artifacts across domains, but it didn't solve duplicate metadata artifacts within a domain. At first this seems impossible, but there is a common scenario where it can occur. A user uploads package a.1.whl with metadata xyz. They realize the package is missing some files and rebuild it with the new files, keeping the exact same name and, crucially, the exact same metadata. They re-upload, and Pulp creates a new package because the whole file has a new sha256, even though the metadata is the same as the old one. Then in our migration we encounter two "different" packages with the same metadata artifact inside them.
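To make the scenario concrete, here is a minimal standalone sketch (not code from this PR) that builds two in-memory wheel-like zip archives. The helper names, file paths, and metadata contents are made up for illustration; the point is only that the archive digests differ while the digest of the embedded METADATA file is identical.

```python
import hashlib
import io
import zipfile


def build_wheel(files):
    """Build an in-memory zip archive (wheels are zip files) from {path: bytes}."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def sha256(data):
    return hashlib.sha256(data).hexdigest()


def metadata_sha256(wheel_bytes, metadata_path):
    """Hash only the METADATA member of the archive."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as zf:
        return sha256(zf.read(metadata_path))


# Hypothetical package contents, for illustration only.
META_PATH = "a-1.dist-info/METADATA"
METADATA = b"Metadata-Version: 2.1\nName: a\nVersion: 1\n"

first = build_wheel({META_PATH: METADATA, "a/__init__.py": b""})
# Rebuild with the file that was missing; name and metadata are unchanged.
rebuilt = build_wheel(
    {META_PATH: METADATA, "a/__init__.py": b"", "a/data.txt": b"previously missing"}
)

package_digests_differ = sha256(first) != sha256(rebuilt)
metadata_digests_match = metadata_sha256(first, META_PATH) == metadata_sha256(
    rebuilt, META_PATH
)
```

Both flags end up true, which is exactly the situation where two distinct package rows point at what would be one metadata artifact.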

My changes try to fix this by keeping track of the metadata artifacts' sha256s and avoiding duplicates. Since we do the saves in batches, I first have to check within the batch to make sure there are no duplicates, and then do a second check against previous batches. I'm also grouping the packages by domain, so every batch should fall within a single domain.
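The two checks can be sketched roughly as follows. This is a simplified pure-Python illustration, not the migration code itself: `plan_metadata_artifacts`, `batched`, and the tuple layout are hypothetical, and the real migration works on model instances and the database rather than in-memory sets.

```python
from collections import defaultdict
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def plan_metadata_artifacts(packages, batch_size=2):
    """Return the (domain_id, metadata_sha256) pairs that need a new artifact.

    `packages` is an iterable of (domain_id, filename, metadata_sha256) tuples.
    """
    # Group by domain so that every batch stays inside one domain.
    by_domain = defaultdict(list)
    for domain_id, filename, meta_sha in packages:
        by_domain[domain_id].append((filename, meta_sha))

    to_create = []
    for domain_id, pkgs in by_domain.items():
        seen_in_domain = set()  # digests handled by previous batches
        for batch in batched(pkgs, batch_size):
            seen_in_batch = set()
            for _filename, meta_sha in batch:
                if meta_sha in seen_in_batch:
                    continue  # check 1: duplicate within this batch
                if meta_sha in seen_in_domain:
                    continue  # check 2: duplicate from an earlier batch
                seen_in_batch.add(meta_sha)
                to_create.append((domain_id, meta_sha))
            seen_in_domain |= seen_in_batch
    return to_create


packages = [
    (1, "a-1.whl", "xyz"),  # original upload
    (1, "a-1.whl", "xyz"),  # rebuilt upload, same metadata
    (1, "b-1.whl", "abc"),
    (1, "c-1.whl", "xyz"),  # same metadata again, in a later batch
    (2, "a-1.whl", "xyz"),  # other domain gets its own artifact
]
plan = plan_metadata_artifacts(packages, batch_size=2)
```

With this input, `plan` contains one entry per distinct digest per domain: `xyz` and `abc` for domain 1, and `xyz` for domain 2.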

Hopefully I didn't screw up the logic anywhere.

fixes: #1071

@jobselko jobselko mentioned this pull request Jan 21, 2026
@gerrod3 gerrod3 force-pushed the 19-fix branch 3 times, most recently from 59be7f9 to dc4a082 Compare January 21, 2026 18:46
.exclude(metadata_sha256="")
.prefetch_related("_artifacts")
.only("filename", "metadata_sha256")
.only("sha256", "filename", "metadata_sha256", "pulp_domain_id", "pulp_type")
Contributor

pulp_type is unused

Contributor Author

Testing the generated SQL, I found that it is used: instantiating the package model requires pulp_type, otherwise we would get another query: https://github.com/pulp/pulpcore/blob/main/pulpcore/app/models/base.py#L163

@gerrod3 gerrod3 changed the title Fix migration 19 failing on duplicate artifact saves Turn migration 19 into a noop Jan 21, 2026
@gerrod3 gerrod3 changed the title Turn migration 19 into a noop Change migration 19 to reset metadata_sha256 to null Jan 21, 2026

gerrod3 commented Jan 21, 2026

Ok, looks like the data migration plan isn't going to work. New plan: have the migration just set the metadata_sha256 field to null. We will then introduce a new command later that populates the missing metadata artifacts.
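As a rough illustration of what the reset amounts to at the SQL level, here is a standalone sketch using the standard-library `sqlite3` module. The table name and rows are invented for the example; the actual migration would go through Django and the real package table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the package table.
conn.execute(
    "CREATE TABLE package (filename TEXT PRIMARY KEY, metadata_sha256 TEXT)"
)
conn.executemany(
    "INSERT INTO package VALUES (?, ?)",
    [("a-1.whl", "xyz"), ("b-2.whl", "abc")],
)

# The migration step: drop the stored digests instead of creating artifacts.
cur = conn.execute(
    "UPDATE package SET metadata_sha256 = NULL WHERE metadata_sha256 IS NOT NULL"
)
reset_count = cur.rowcount

# A later command could find the packages that still need a metadata artifact.
pending = [
    row[0]
    for row in conn.execute(
        "SELECT filename FROM package WHERE metadata_sha256 IS NULL ORDER BY filename"
    )
]
```

Because the migration no longer creates any artifacts, it cannot hit the uniqueness violation; the work of populating them moves to the follow-up command.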

Development

Successfully merging this pull request may close these issues.

Migration 0019 fails with uniqueness violation
