@plaidfinch
Contributor

When Trust Quorum commits a new epoch, all encrypted U.2 datasets need their encryption keys rotated. This change implements that flow:

  • trust-quorum: Add watch channel to broadcast committed epoch changes from NodeTask to subscribers
  • sled-agent: Wire committed_epoch_rx to the config reconciler
  • config-reconciler:
    • Listen for epoch change notifications in the reconciler run loop and rekey datasets when the epoch changes
    • Add KeyRotationError, RekeyRequest types for the rekey API
    • Add rekey_datasets batch operation on DatasetTaskHandle
    • Add datasets_rekey to DatasetTask in the ZFS operation serializer task for key rotation
    • Add rekey_for_epoch to OmicronDatasets to coordinate rekeying all managed disks when an epoch is committed
    • Add managed_disks iterator to ExternalDisks
  • illumos-utils:
    • Add Zfs::change_key using zfs-atomic-change-key crate (temporarily) to rotate keys atomically with the change of the oxide:epoch property
    • Add ChangeKeyError type
    • Add epoch field to DatasetProperties and include oxide:epoch in ZFS property queries
  • key-manager: Add Debug derives to key types

The rekey operation is idempotent: datasets already at the target epoch are skipped. On startup, we process the initial epoch to catch any missed rekeys from crashes.
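The idempotency and startup behavior described above can be sketched roughly as follows. This is a minimal, std-only illustration, not the code in this change: `Dataset`, `RekeyOutcome`, and `process_epochs` are hypothetical names, the real key change (`Zfs::change_key`) is replaced by updating a field, and the watch channel is modeled as a plain iterator of epochs. It also assumes "already at the target epoch" means an exact match on `oxide:epoch`.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for an encrypted dataset; only the epoch matters here.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Dataset {
    name: String,
    /// Value of the `oxide:epoch` property, if it has been set.
    epoch: Option<u64>,
}

/// Hypothetical per-dataset outcome of one rekey pass.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RekeyOutcome {
    Rekeyed,
    AlreadyCurrent,
}

/// Rekey every dataset that is not already at `target`.
/// Calling this twice with the same `target` changes nothing the second time.
fn rekey_for_epoch(
    datasets: &mut [Dataset],
    target: u64,
) -> BTreeMap<String, RekeyOutcome> {
    let mut results = BTreeMap::new();
    for ds in datasets.iter_mut() {
        let outcome = if ds.epoch == Some(target) {
            // Assumption: an exact epoch match means the dataset is skipped.
            RekeyOutcome::AlreadyCurrent
        } else {
            // Stand-in for the atomic key change plus property update.
            ds.epoch = Some(target);
            RekeyOutcome::Rekeyed
        };
        results.insert(ds.name.clone(), outcome);
    }
    results
}

/// Process the initial epoch first (to catch rekeys missed before a crash),
/// then each later change. Returns how many rekeys were actually performed.
fn process_epochs(
    datasets: &mut [Dataset],
    initial: Option<u64>,
    changes: impl IntoIterator<Item = u64>,
) -> usize {
    let mut rekeyed = 0;
    for epoch in initial.into_iter().chain(changes) {
        rekeyed += rekey_for_epoch(datasets, epoch)
            .values()
            .filter(|outcome| **outcome == RekeyOutcome::Rekeyed)
            .count();
    }
    rekeyed
}
```

Because a pass over datasets that are already current is a no-op, replaying the initial epoch on every startup is cheap and safe in this model.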

Fixes #9587


 /// Derived Disk Encryption key
-#[derive(Default)]
+#[derive(Debug, Default)]
Contributor Author

We use this in a struct that implements Debug elsewhere. SecretBox's Debug impl prints [REDACTED] instead of the secret, so deriving Debug here is safe.
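To illustrate why the derive does not leak key material, here is a minimal std-only sketch. `Redacted` is a hypothetical wrapper standing in for SecretBox (it is not the real type or its exact output format), and `DiskEncryptionKey` is an illustrative name rather than the key-manager type.

```rust
use std::fmt;

/// Hypothetical stand-in for a secret wrapper; not the real `SecretBox`.
#[derive(Default)]
struct Redacted<T>(T);

impl<T> fmt::Debug for Redacted<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Never format the inner value; print a fixed placeholder instead.
        f.write_str("[REDACTED]")
    }
}

/// Hypothetical key type mirroring the derive in the diff above.
/// The derived `Debug` delegates to the field's `Debug`, so the bytes
/// stay hidden.
#[derive(Debug, Default)]
struct DiskEncryptionKey {
    secret: Redacted<[u8; 32]>,
}

/// Render a key the way a containing struct's derived `Debug` would.
fn debug_string(key: &DiskEncryptionKey) -> String {
    format!("{:?}", key)
}
```

In this sketch the derived impl only ever sees the wrapper's `Debug`, so any struct that embeds the key can derive `Debug` without special handling.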

Comment on lines +297 to +302
pub(super) fn managed_disks(&self) -> impl Iterator<Item = &Disk> {
    self.disks.iter().filter_map(|disk_state| match &disk_state.state {
        DiskState::Managed(disk) => Some(disk),
        DiskState::FailedToManage(_) => None,
    })
}
Contributor Author

Should we return an error if we hit the FailedToManage state anywhere, or silently omit any such disk, as is done here?
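For comparison, the two options in the question could look roughly like this. It is a simplified sketch under stated assumptions: `Disk`, `DiskState`, and `ExternalDisksSketch` are pared-down hypothetical types (the real code matches on `disk_state.state`), and `try_managed_disks` is an illustrative name for the error-returning variant, not something in this change.

```rust
/// Simplified, hypothetical versions of the types in the snippet above.
#[derive(Debug, PartialEq)]
struct Disk {
    id: u32,
}

#[derive(Debug, PartialEq)]
enum DiskState {
    Managed(Disk),
    /// Carries a human-readable failure reason in this sketch.
    FailedToManage(String),
}

struct ExternalDisksSketch {
    disks: Vec<DiskState>,
}

impl ExternalDisksSketch {
    /// Option 1 (as in the diff): silently skip disks that failed to manage.
    fn managed_disks(&self) -> impl Iterator<Item = &Disk> {
        self.disks.iter().filter_map(|state| match state {
            DiskState::Managed(disk) => Some(disk),
            DiskState::FailedToManage(_) => None,
        })
    }

    /// Option 2: return the managed disks only if none failed; otherwise
    /// return every failure reason so the caller can decide what to do.
    fn try_managed_disks(&self) -> Result<Vec<&Disk>, Vec<&str>> {
        let mut managed = Vec::new();
        let mut failed = Vec::new();
        for state in &self.disks {
            match state {
                DiskState::Managed(disk) => managed.push(disk),
                DiskState::FailedToManage(reason) => failed.push(reason.as_str()),
            }
        }
        if failed.is_empty() { Ok(managed) } else { Err(failed) }
    }
}
```

The first form lets a rekey pass proceed on healthy disks while one disk is broken; the second makes the failure visible to the caller at the cost of blocking the whole batch unless the caller handles the error.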

Development

Successfully merging this pull request may close these issues.

TQ: Support for ZFS Key Rotation
