Skip to content

Conversation

@eshulman2
Copy link
Contributor

@eshulman2 eshulman2 commented Jan 28, 2026

This change introduces a new resyncPeriod field in ManagedOptions that enables periodic reconciliation of resources to detect and correct drift from the desired state in OpenStack.

Motivation: Resources managed by ORC can drift from their desired state due to external modifications in OpenStack. Without drift detection, these changes go unnoticed until the next spec change triggers reconciliation. The resyncPeriod option allows users to configure periodic checks to detect and remediate such drift automatically.

Implementation:

  • Add resyncPeriod field to ManagedOptions API
  • Add GetResyncPeriod() helper method for safe access with nil handling
  • Modify shouldReconcile() in the generic controller to check if enough
    time has passed since the last successful reconciliation
  • Schedule next resync when periodic resync is enabled and no other
    reschedule is pending
  • Update all CRDs, OpenAPI schema, and documentation

Usage:

spec:
  managedOptions:
    resyncPeriod: 60m  # Reconcile every hour to detect drift

If not specified default sync is 10H. if set to 0, periodic resync is disabled.

Closes: #655

Co-Authored-By: Claude Opus 4.5 noreply@anthropic.com

@github-actions github-actions bot added the semver:minor Backwards-compatible change label Jan 28, 2026
Copy link
Collaborator

@mandre mandre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That great, it looks relatively straightforward to implement. Some points to consider:

  • we'll need docs, and highly visible warnings about the side effects of enabling drift detection (higher loads on the cloud).
  • we'll want tests as well
  • we probably want the controller to accept a CLI flag, to set a global resync period
  • for configuring per Kind resync period (say I want my a force resync of my networks every hour, while more frequent resync of floating IPs), we'll have to think about a solution. Perhaps this solution is KRO.
  • we should have a strategy for when the managed resource was updated out-of-band, but ORC does not have the ability to revert it to the expected state (the field is immutable in ORC for example)
  • what happens when a managed resource is deleted in OpenStack?
  • many more questions

Ideally, we'll discuss all these points in a design document.

@mandre mandre marked this pull request as draft January 28, 2026 15:08
@eshulman2 eshulman2 mentioned this pull request Jan 28, 2026
This change introduces a new `resyncPeriod` field in `ManagedOptions` that
enables periodic reconciliation of resources to detect and correct drift
from the desired state in OpenStack.

Motivation:
Resources managed by ORC can drift from their desired state due to
external modifications in OpenStack. Without drift detection, these
changes go unnoticed until the next spec change triggers reconciliation.
The resyncPeriod option allows users to configure periodic checks to
detect and remediate such drift automatically.

Implementation:
- Add `resyncPeriod` field (metav1.Duration) to ManagedOptions API
- Add GetResyncPeriod() helper method for safe access with nil handling
- Modify shouldReconcile() in the generic controller to check if enough
  time has passed since the last successful reconciliation
- Schedule next resync when periodic resync is enabled and no other
  reschedule is pending
- Update all CRDs, OpenAPI schema, and documentation

Usage:
```yaml
spec:
  managedOptions:
    resyncPeriod: 1h  # Reconcile every hour to detect drift
```

If not specified or set to 0, periodic resync is disabled (default
behavior unchanged).

Closes: k-orc#655

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@eshulman2
Copy link
Contributor Author

eshulman2 commented Jan 28, 2026

I'll try to reply some right now:

  • we'll need docs, and highly visible warnings about the side effects of enabling drift detection (higher loads on the cloud).

Added a warning in docs

  • we'll want tests as well

We need to think how we want to test it but I defiantly agree we should figure out a way to gate it. For now as a draft tested locally on my computer.

  • we should have a strategy for when the managed resource was updated out-of-band, but ORC does not have the ability to revert it to the expected state (the field is immutable in ORC for example)

I think in this case it just won't fix the drift but it won't trigger another update I believe as when we are updating we compare the fields that we can change

  • what happens when a managed resource is deleted in OpenStack?

This case would behave similarly to reconciling before the resource exist and ORC will try to create it.
EDIT: seems like this is problematic as well

Copy link
Contributor

@dlaw4608 dlaw4608 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nice, I played around with it for a while locally, works great!!


if osResource == nil {
// Programming error: if we don't have a resource we should either have an error or be waiting on something
return reconcileStatus.WithError(fmt.Errorf("oResource is not set, but no wait events or error"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noticed a small typo not from this PR, should be "osResource is not set ...." up to you @eshulman2 if you want to correct it.

Recreate resource in case it was deleted by something external to ORC.
this solves the issue when deleting with external but does raise concerns
I am afraid that in case of split brain or other edge case the resource
will be created over and over causing a catastrophic failure
@eshulman2
Copy link
Contributor Author

@mandre I added a second commit to re-create the resource on external delete. I must say I'm a bit uncomfortable with the possible risk profile it might add in edge cases, but I'll leave it here for now for reference and discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

semver:minor Backwards-compatible change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Drift detection

3 participants