Change migration 19 to reset metadata_sha256 to null #1073
pulp_python/app/migrations/0019_create_missing_metadata_artifacts.py
Force-pushed from 59be7f9 to dc4a082.
  .exclude(metadata_sha256="")
  .prefetch_related("_artifacts")
- .only("filename", "metadata_sha256")
+ .only("sha256", "filename", "metadata_sha256", "pulp_domain_id", "pulp_type")
pulp_type is unused
After testing the generated SQL, I found that it is used: instantiating the package model requires pulp_type, otherwise we would get another query: https://github.com/pulp/pulpcore/blob/main/pulpcore/app/models/base.py#L163
Ok, looks like the data migration plan isn't going to work. New plan: have the migration just set the …
Ok, the last fix was correct for duplicate artifacts across domains, but it didn't handle duplicate metadata artifacts within a domain. At first this seems impossible, but there is a common scenario where it can occur. A user uploads package a.1.whl with metadata xyz. They realize the package is missing some files and rebuild it with the new files, keeping the exact same name and, crucially, the exact same metadata. They re-upload, and pulp creates a new package because the package as a whole has a new sha256, even though the metadata is the same as the old one. Then in our migration we encounter two "different" packages with the same metadata artifact inside them.
My changes try to fix this by keeping track of the metadata artifacts' sha256s and avoiding creating duplicates. Since we do the saves in batches, I first have to check within the batch to make sure there are no dups, and then do a second check to make sure there are no dups from previous batches. Also, I'm grouping the packages by domain, so every batch should be inside a single domain.
Hopefully I didn't screw up the logic anywhere.
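For reviewers, the two-level duplicate check described above can be sketched in plain Python. This is only an illustration of the idea, not the migration's actual code: `Package`, `batched`, and `plan_metadata_artifacts` are hypothetical names, the digests are placeholder strings, and the real migration works on Django querysets and artifact objects instead of plain strings.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import islice


@dataclass(frozen=True)
class Package:
    """Hypothetical stand-in for the package content model."""
    sha256: str           # digest of the whole wheel
    filename: str
    metadata_sha256: str  # digest of the metadata file inside the wheel
    domain_id: str


def batched(items, size):
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def plan_metadata_artifacts(packages, batch_size=2):
    """Return {domain_id: [batch, ...]}, where each batch lists the
    metadata digests that still need an artifact created."""
    # Group by domain first, so every batch stays inside one domain.
    by_domain = defaultdict(list)
    for pkg in packages:
        by_domain[pkg.domain_id].append(pkg)

    plan = {}
    for domain_id, domain_packages in by_domain.items():
        seen_in_domain = set()  # digests handled by earlier batches
        batches = []
        for batch in batched(domain_packages, batch_size):
            seen_in_batch = set()
            to_create = []
            for pkg in batch:
                digest = pkg.metadata_sha256
                if digest in seen_in_batch:   # check 1: dup within this batch
                    continue
                if digest in seen_in_domain:  # check 2: dup from a previous batch
                    continue
                seen_in_batch.add(digest)
                to_create.append(digest)
            seen_in_domain.update(seen_in_batch)
            if to_create:
                batches.append(to_create)
        plan[domain_id] = batches
    return plan


packages = [
    Package("p1", "a.1.whl", "xyz", "d1"),
    Package("p2", "a.1.whl", "xyz", "d1"),  # rebuilt wheel, same metadata
    Package("p3", "b.1.whl", "abc", "d1"),
    Package("p4", "a.1.whl", "xyz", "d1"),  # same metadata, later batch
    Package("p1", "a.1.whl", "xyz", "d2"),  # other domain: gets its own artifact
]
print(plan_metadata_artifacts(packages, batch_size=2))
```

With a batch size of 2, the rebuilt wheel `p2` is caught by the in-batch check and `p4` by the cross-batch check, while the copy in domain `d2` still gets its own metadata artifact.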
fixes: #1071