@williamjallen
Collaborator

The `buildfailuredetails` table is used to deduplicate recurring build failures. In practice, both the `buildfailure` and `buildfailuredetails` tables are several orders of magnitude smaller than the `build2test` and `testoutput` tables, so deduplicating errors doesn't save much space. The crc32-based deduplication process is also problematic because of the low-entropy hash function and the relatively complex logic it requires when inserting new records. This commit simplifies the schema by merging the columns of the `buildfailuredetails` table into the `buildfailure` table, bringing it in line with the similar `builderror` table. Follow-up work will merge the `builderror` and `buildfailure` tables.
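
For readers unfamiliar with the pattern, here is a rough sketch of the difference between the two insertion paths, written in Python with an in-memory SQLite database. It is not the project's actual code or schema: the table names (`details`, `failure_old`, `failure_new`), the column names, and the helper functions are all hypothetical, chosen only to illustrate why a checksum-keyed details table needs a lookup-then-insert step that a single merged table does not.

```python
import sqlite3
import zlib


def make_db():
    """Create an in-memory database with hypothetical old and new layouts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Old layout (assumed): failure rows point at shared detail rows.
        CREATE TABLE details (
            id INTEGER PRIMARY KEY,
            crc32 INTEGER NOT NULL,
            stdout TEXT NOT NULL,
            stderr TEXT NOT NULL
        );
        CREATE TABLE failure_old (
            id INTEGER PRIMARY KEY,
            buildid INTEGER NOT NULL,
            detailsid INTEGER NOT NULL REFERENCES details(id)
        );
        -- New layout (assumed): detail columns live on the failure row.
        CREATE TABLE failure_new (
            id INTEGER PRIMARY KEY,
            buildid INTEGER NOT NULL,
            stdout TEXT NOT NULL,
            stderr TEXT NOT NULL
        );
        """
    )
    return conn


def insert_deduplicated(conn, build_id, stdout, stderr):
    """Checksum-based path: find a matching detail row, or create one."""
    crc = zlib.crc32((stdout + stderr).encode("utf-8"))
    # The checksum alone cannot be trusted, so the contents are compared too.
    row = conn.execute(
        "SELECT id FROM details WHERE crc32 = ? AND stdout = ? AND stderr = ?",
        (crc, stdout, stderr),
    ).fetchone()
    if row is None:
        cur = conn.execute(
            "INSERT INTO details (crc32, stdout, stderr) VALUES (?, ?, ?)",
            (crc, stdout, stderr),
        )
        details_id = cur.lastrowid
    else:
        details_id = row[0]
    conn.execute(
        "INSERT INTO failure_old (buildid, detailsid) VALUES (?, ?)",
        (build_id, details_id),
    )


def insert_merged(conn, build_id, stdout, stderr):
    """Merged path: one insert, no lookup and no checksum."""
    conn.execute(
        "INSERT INTO failure_new (buildid, stdout, stderr) VALUES (?, ?, ?)",
        (build_id, stdout, stderr),
    )
```

In this sketch the deduplicated path stores one detail row for two identical failures, at the cost of an extra query and a branch on every insert; the merged path stores the text twice but needs only a single statement. A 32-bit checksum has about 4.3 billion possible values, so by the birthday bound a collision becomes roughly as likely as not at around 77,000 distinct records, which is why the lookup above has to compare the full contents as well.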

@williamjallen
Collaborator Author

Requesting a double review since this is a relatively large change.

Member

@josephsnyder left a comment

LGTM. I don't see anything that worries me in the exploration of the testing data.
