-
Notifications
You must be signed in to change notification settings - Fork 34
Fix SecurityGroup availability status by counting security group rules #651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -45,7 +45,24 @@ func (securityGroupStatusWriter) ResourceAvailableStatus(orcObject orcObjectPT, | |
| } | ||
| } | ||
|
|
||
| // SecurityGroup is available as soon as it exists | ||
| resourceSpec := orcObject.Spec.Resource | ||
| if resourceSpec != nil && resourceSpec.Rules != nil { | ||
| // Make sure specified security group rules exist in resource | ||
|
|
||
| resourceStatus := orcObject.Status.Resource | ||
| if resourceStatus == nil || resourceStatus.Rules == nil { | ||
| return metav1.ConditionFalse, progress.WaitingOnOpenStack(progress.WaitingOnReady, securityGroupAvailablePollingPeriod) | ||
| } | ||
|
|
||
| if len(resourceSpec.Rules) != len(resourceStatus.Rules) { | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We should not be comparing the number of rules in the spec and in the status, because they can be different: the security group inherits the default security group rules.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I am not sure I fully understand. So every security group implicitly inherits the rules of the security group But it seems like ORC also does not currently use the (inherited) default rules to populate the status of the security group, so that should not actually be a problem. Unless this is unwanted behaviour and ORC is supposed to do that in the future, in which case the change in this PR will indeed break.
Yes, my understanding is that the actuator will take care of making sure the rules are matching from a content perspective. That being said, the rules should only become part of the security group resource status if and when they match. So to update the
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK, this confused me as well so I ran some tests:
I believe this is the expected behavior, and makes the experience with ORC deterministic. This is effectively a consequence of #174, and this comment in the actuator should now be removed because it's outdated. All that to say that in most cases, we can indeed check if the resource is fully provisioned by comparing the number of rules.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Got it. At first I thought you meant like some sort of implicit inheritence
I amended the commit to remove the comment
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks! I'm still a unsure about the |
||
| return metav1.ConditionFalse, progress.WaitingOnOpenStack(progress.WaitingOnReady, securityGroupAvailablePollingPeriod) | ||
| } | ||
|
|
||
| if len(resourceSpec.Rules) != len(osResource.Rules) { | ||
| return metav1.ConditionFalse, progress.WaitingOnOpenStack(progress.WaitingOnReady, securityGroupAvailablePollingPeriod) | ||
| } | ||
| } | ||
|
|
||
| return metav1.ConditionTrue, nil | ||
| } | ||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thoughts, I wonder if this is really what we want: the security group is indeed available right away, meaning we can do operations on it. The
Availablecondition is accurate.The
Progressingcondition, on the other hand, should reflect that we still have security group rules being provisioned, and that the desired state does not yet match the observed state.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue #120 states that the current way to represent the
Availablestatus in security groups is undesired. As far as I understand, security group rules can only ever be part of a single security group. And they can not be cross-referenced. So I would agree with #120 in that a security group is only "available and matches the spec" when the security groups are actually created.However, this design decision has to be decided by the project according to it's own goals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, this hinges on understanding what 'Available' most usefully means in the API. We certainly can proceed to create resources as soon as the SecurityGroup exists, but is that creating risk?
Another thought: we create a SecurityGroup, it's fully reconciled and available. We then modify its Rules. Should it still be Available in the period while the new spec has not been applied?
Is it sufficient that a user should wait for 'Available=True AND Progressing=False'?
I don't have the answers, unfortunately. I recommend writing down all the use cases and edge conditions you can think of and deciding what the correct status should be in all cases. If it can't be defined, does that mean we need to add a new Condition?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The more I think about it, the more I believe
Available=Trueshould mean we can operate on the resource. Granted, securitygroup is a bit special because we've convoluted 2 resources into 1 controller (security groups and security group rules).My current thinking is that:
statusfield for it.Progressingcondition.Likewise, I'm of the opinion that the
Availablecondition of the security group should not change when the rules are being updated. This can again be tracked via theProgressingcondition.This is what we say in the docs:
And also, about the
ResourceAvailableStatus()method:We should clarify what "ready to be used" means. I'll check how the other controllers implement the
ResourceAvailableStatus()method.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the existing controllers, they're all returning
Availableas soon as they're created, except for the resources that have astatusfield. For these we're looking at the following conditions:activeanddownstatusesactivestatusactivestatusactiveanddownstatusactivestatusactivestatusactiveandin-usestatusesUh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the spirit of this PR, but I think a better implementation would be to add a third condition rather than modifying Available.
The problem is Available=True, Progressing=True doesn't tell you what it's progressing on. Maybe you don't care if it's updating a tag, but you do care if a security rule failed. With the current situation tag updates a indistinguishable from rule creation error and doesn't provide enough info to the user:
I would propose to add a third condition for subresources. A SubResourcesReady condition for controllers with sub-resources (SecurityGroup→Rules, Router→Interfaces, etc.):
This gives users clear semantics:
This pattern could apply to other controllers too making it easy to understand sub resource failures and act on them, providing a better user experience. WDYT? @mandre @mdbooth
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the question here comes down to "What is the goal of ORC and in what context do you want its users to operate?"
You have to decide whether you want to look at the resources that users define using ORC as "bit-for-bit representation of an OpenStack resources", or if you want to see these resources as abstractions or "ORC resources".
If you look at a SecurityGroup as a bit-for-bit OpenStack resource representation, then sure, it is ready/available as soon as it is accepted and created.
In this case I personally think a SecurityGroupRule should be its own resource and not be part of the SecurityGroup. Similar to how RouterInterfaces and Routers are handled.
If, however, you look at how SecurityGroups are defined in ORC - with specific rules stated in the spec - then it becomes an abstraction of a OpenStack resource - an "ORC resource". And the available status should represent that. In that case, I don't think a third status is necessary. At least for SecurityGroups it makes no sense to be available when the rules are failing to be created.
I think both views are valid, but the project has to decide on one and clearly specify what it is.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe adding this condition offers a few key advantages:
My general goal here is to find a middle ground, let’s provide the data and let the user decide how they want ORC to act in their specific environment or even on a per-resource basis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle I agree we need a new condition to differentiate between when the resource is available (meaning you can operate on it) and when we consider it ready to be used (generally when sub resources are all ready). From the k8s API convention document:
If we wanted to give control to the user, for what condition to look for when provisioning resources that reference other resources, we'd have to change the dependency management a bit. Right now, controllers have watches for when deps become available. We would need watches for other conditions, which can quickly become heavy to manage.
Also, now that I look at it, the port controller does not even check that the security group is Available (here and here) 🤦... We should check it.
I'm inclined to accept this change, and mark the security group available only when all rules are fully provisioned, until we have a need for more granularity.
Could you then make sure the Port controller operates on available security groups?