-
Notifications
You must be signed in to change notification settings - Fork 275
fix: Migrate importer & workers off of Bug entities #4658
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
The head ref may contain hidden characters: "\u{1F41E}"
Conversation
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request is a significant refactoring to migrate from using Bug entities to Vulnerability entities as the source of truth. The changes are extensive and touch the importer, worker, and recoverer components, as well as associated tests and datafix tools.
Key changes include:
- Removing
Bug-dependent logic from the importer and worker. - Updating the worker to operate directly on
Vulnerabilityentities and their corresponding protobufs in GCS. - Modifying the recoverer to trigger a forced re-import for missing GCS files instead of regenerating from
Bugentities. - Updating data models and helper functions to reflect the new data structure.
- Archiving old datafix tools that operated on
Bugentities.
The changes appear to be well-executed and consistent with the goal of the migration. The test suite has been updated to reflect the new implementation, which is great. My review focuses on a minor logging improvement for better debuggability.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
another-rex
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
basically LGTM first try!
Just confirming that we want to delete all affected information on withdraw?
| ds_vuln.is_withdrawn = True | ||
| ds_vuln.modified = proto_vuln.modified.ToDatetime(datetime.UTC) | ||
| osv.models.put_entities(ds_vuln, proto_vuln) | ||
| osv.update_affected_commits(vuln_id, [], False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be deleting all affected commits information?
It does not look like the existing implementation is deleting affected commits.
|
|
||
| # NB: Only OSS-Fuzz is editable - all other sources are read-only. | ||
| # This should not be reachable by this worker. | ||
| output_path = os.path.join(osv.repo_path(repo), path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this add a warning log if it gets to this point.
| else: | ||
| to_put.append(ListedVulnerability.from_vulnerability(vuln_pb)) | ||
| new_affected = affected_from_proto(vuln_pb) | ||
| added, removed = diff_affected_versions(old_affected, new_affected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see, all affected versions are deleted when a record is withdrawn.
#3850
This change makes OSV finally stop writing new Bug entities, moving the source of truth to solely be the Vulnerability entity and proto in GCS (for non-oss-fuzz stuff).
old/directory for reference.Also added some missing type hints in some places.