@tthvo
Member

@tthvo tthvo commented Sep 9, 2025

Important

A rough draft of installer changes required to support dual-stack environment on AWS.

This PR is only for previewing the changes and experimenting with upstream CAPA PR. I will close this and open another PR with finalized sets of changes.

This PR also includes commits (messages starting with hack: ) that "imitate" CCM, MAPI, and the Cluster Ingress Operator to create the resources needed for cluster ingress (i.e. NLB, Route53 records, Security Groups, etc.) and to enable IPv6 primary (if applicable). These commits are to be removed once dual-stack is supported in the operators.

This depends on upstream CAPA PR: kubernetes-sigs/cluster-api-provider-aws#5603

How to install

Below are the details of how to reproduce the installation.

$ export OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE=quay.io/thvo/origin-release:v4.22.0-preview-ds
$ export AWS_PROFILE=<profile-admin>
$ ./openshift-install create cluster --dir=.

Custom release image

Custom release image: quay.io/thvo/origin-release:v4.22.0-preview-ds

This includes the following operator changes:

For the cluster-network-operator, we have the open PR here with feature gate checking: openshift/cluster-network-operator/pull/2804

Install Config

Use the below install-config snippet to configure networking and AWS platform.

Note: machineNetwork does not contain IPv6 CIDR as it is unknown at install time (i.e. will be patched later when infra is ready). The cluster network and service network contain ULA IPv6 CIDR.
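As a quick sanity check on the note above, here is a minimal sketch using Python's standard `ipaddress` module. It confirms that the IPv6 CIDRs used in the snippets (`fd01::/48`, `fd02::/112`) sit inside the ULA range `fc00::/7`, and counts how many per-node subnets each `hostPrefix` allows. The helper names `is_ula` and `node_subnet_count` are hypothetical and are not part of the installer.

```python
import ipaddress

# Unique Local Address range defined by RFC 4193.
ULA_RANGE = ipaddress.ip_network("fc00::/7")


def is_ula(cidr: str) -> bool:
    """Return True if cidr is an IPv6 network inside fc00::/7."""
    net = ipaddress.ip_network(cidr)
    # Check the version first: subnet_of() raises TypeError on mixed versions.
    return net.version == 6 and net.subnet_of(ULA_RANGE)


def node_subnet_count(cidr: str, host_prefix: int) -> int:
    """Number of per-node subnets of size /host_prefix that fit in cidr."""
    net = ipaddress.ip_network(cidr)
    if host_prefix < net.prefixlen:
        raise ValueError("hostPrefix must not be shorter than the CIDR prefix")
    return 2 ** (host_prefix - net.prefixlen)


if __name__ == "__main__":
    for cidr in ("fd01::/48", "fd02::/112"):
        print(cidr, "is ULA:", is_ula(cidr))
    print("IPv4 node subnets:", node_subnet_count("10.128.0.0/14", 23))
    print("IPv6 node subnets:", node_subnet_count("fd01::/48", 64))
```

With the values from the snippets, a /14 split into /23 blocks gives 512 node subnets, and a /48 split into /64 blocks gives 65536.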

IPv4 Primary:

networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  - cidr: fd01::/48
    hostPrefix: 64
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
  - fd02::/112
platform:
  aws:
    region: us-east-1
    ipFamily: DualStackIPv4Primary
featureSet: TechPreviewNoUpgrade

IPv6 Primary:

networking:
  clusterNetwork:
  - cidr: fd01::/48
    hostPrefix: 64
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OVNKubernetes
  serviceNetwork:
  - fd02::/112
  - 172.30.0.0/16
platform:
  aws:
    region: us-east-1
    ipFamily: DualStackIPv6Primary
featureSet: TechPreviewNoUpgrade
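
Both snippets above list the primary family's CIDR first in clusterNetwork and serviceNetwork. The sketch below checks that ordering against the ipFamily value before running the installer. Whether the installer itself enforces this ordering is an assumption here; the function `check_order` and the `PRIMARY_VERSION` mapping are hypothetical helpers that only mirror the two examples.

```python
import ipaddress

# Assumed mapping from the ipFamily values used above to the IP version
# expected first in each network list.
PRIMARY_VERSION = {
    "DualStackIPv4Primary": 4,
    "DualStackIPv6Primary": 6,
}


def check_order(ip_family, cluster_cidrs, service_cidrs):
    """Return a list of problems; an empty list means the ordering is consistent."""
    expected = PRIMARY_VERSION[ip_family]
    problems = []
    for name, cidrs in (("clusterNetwork", cluster_cidrs),
                        ("serviceNetwork", service_cidrs)):
        versions = [ipaddress.ip_network(c).version for c in cidrs]
        if sorted(versions) != [4, 6]:
            problems.append(f"{name}: expected one IPv4 and one IPv6 CIDR")
        elif versions[0] != expected:
            problems.append(f"{name}: first CIDR should be IPv{expected}")
    return problems


if __name__ == "__main__":
    # Values taken from the IPv6 Primary snippet.
    print(check_order(
        "DualStackIPv6Primary",
        ["fd01::/48", "10.128.0.0/14"],
        ["fd02::/112", "172.30.0.0/16"],
    ))
```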

Important notes: [IPv6-primary only] The ingress operator will get stuck because health checks on the targets fail: the k8s Service for the ingress routers only has an IPv6 cluster IP, while the hacks configure the ingress LB target group as IPv4 only, so the connection cannot switch to IPv6 when travelling internally.

You must edit the service openshift-ingress/router-nodeport-default to set its ipFamilyPolicy to PreferDualStack. For example:

$ kubectl -n openshift-ingress patch svc router-nodeport-default \
    -p '{"spec":{"ipFamilyPolicy":"PreferDualStack"}}'

Update: A new commit now "hacks" in IPv6-primary support on the EC2 instances for cluster nodes, so the above step is no longer needed. The Target Group for dual-stack IPv6 primary is now also IPv6.
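
For anyone still testing an older revision where the router Service workaround applies, here is a small sketch that builds the same kubectl merge patch programmatically. The function `build_patch_command` is a hypothetical helper; the allowed policy names are the standard Kubernetes `ipFamilyPolicy` values.

```python
import json
import shlex

# Standard Kubernetes values for Service.spec.ipFamilyPolicy.
VALID_POLICIES = ("SingleStack", "PreferDualStack", "RequireDualStack")


def build_patch_command(policy="PreferDualStack"):
    """Return the kubectl argv that patches the router Service's ipFamilyPolicy."""
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown ipFamilyPolicy: {policy}")
    # Compact separators keep the JSON identical to the hand-written patch.
    patch = json.dumps({"spec": {"ipFamilyPolicy": policy}}, separators=(",", ":"))
    return [
        "kubectl", "-n", "openshift-ingress",
        "patch", "svc", "router-nodeport-default",
        "-p", patch,
    ]


if __name__ == "__main__":
    # shlex.join quotes the JSON argument so the line can be pasted into a shell.
    print(shlex.join(build_patch_command()))
```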

Installer binary

The installer binary can be built normally from these commits (i.e. CAPA is vendored from my fork). So, just:

./hack/build.sh

/hold
/label platform/aws

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Sep 9, 2025
@openshift-ci-robot
Contributor

openshift-ci-robot commented Sep 9, 2025

@tthvo: This pull request references CORS-4072 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target either version "4.21." or "openshift-4.21.", but it targets "openshift-4.20" instead.

Details

In response to this:

A rough draft of installer changes required to support dual-stack environment on AWS.

Important

This PR is only for previewing the changes and is supposed to be closed once experimenting is done. I will open another PR with finalized sets of changes.

Notes

/hold
/label platform/aws

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. platform/aws labels Sep 9, 2025
@tthvo
Member Author

tthvo commented Sep 9, 2025

/cc @sadasu @barbacbd @rna-afk

@tthvo
Member Author

tthvo commented Sep 9, 2025

/cc @mtulio

Just rough hacks but in case you are interested :D

@openshift-ci-robot
Contributor

openshift-ci-robot commented Sep 9, 2025

@tthvo: This pull request references CORS-4072 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target either version "4.21." or "openshift-4.21.", but it targets "openshift-4.20" instead.

Details

In response to this:

A rough draft of installer changes required to support dual-stack environment on AWS.

This PR is only for previewing the changes and experimenting with upstream CAPA PR. I will close this and open another PR with finalized sets of changes.

How to install

Below are the details of how to reproduce the installation.

$ export OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE=quay.io/thvo/origin-release:v4.21.0-preview
$ export AWS_PROFILE=<profile-admin>
$ ./openshift-install create cluster --dir=.

Custom release image

Custom release image: quay.io/thvo/origin-release:v4.21.0-preview.

This includes the following operator changes:

Install Config

Use the below install-config snippet to configure networking and AWS platform.

networking:
 clusterNetwork:
 - cidr: 10.128.0.0/14
   hostPrefix: 23
 - cidr: fd01::/48
   hostPrefix: 64
 machineNetwork:
 - cidr: 10.0.0.0/16
 networkType: OVNKubernetes
 serviceNetwork:
 - 172.30.0.0/16
 - fd02::/112
platform:
 aws:
   region: us-east-1
   infraStack: DualStack

Note that machineNetwork does not contain IPv6 CIDR as it is unknown at install time (i.e. will be patched later when infra is ready). The cluster network and service network contain ULA IPv6 CIDR.

Installer binary

It looks like the installer binary can be built from these commits despite a reference to my local CAPA fork. So, just:

./hack/build.sh

Important

/hold
/label platform/aws

@tthvo tthvo force-pushed the CORS-4072 branch 3 times, most recently from 0e37a46 to 537f4d0 Compare September 9, 2025 07:17
@tthvo
Member Author

tthvo commented Sep 10, 2025

/test e2e-aws-default-config e2e-aws-ovn-shared-vpc-custom-security-groups

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 13, 2025
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 15, 2025
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 21, 2025
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 21, 2025
@tthvo
Member Author

tthvo commented Oct 21, 2025

/retitle CORS-4072: [Draft] Dual stack support for AWS

This PR is for experimenting and collecting info about what changes are needed. I will separate the commits into smaller PRs :D

PTAL 🙏 All reviews and nitpicks are appreciated!

@openshift-ci openshift-ci bot changed the title CORS-4072: [WIP] Dual stack support for AWS CORS-4072: [Draft] Dual stack support for AWS Oct 21, 2025
@tthvo tthvo force-pushed the CORS-4072 branch 2 times, most recently from d5cef48 to dac6b81 Compare October 21, 2025 21:03
@tthvo
Member Author

tthvo commented Oct 31, 2025

The rebase is to stay on top of upstream/main and to remove the hack for MCO (9fa264d), as MCO should handle setting the appropriate value (0.0.0.0 or ::) for the --node-ip argument of kubelet. See the PR description for more details 😁 🙏
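
To illustrate the --node-ip point above: kubelet treats the unspecified address of a family as "prefer this family's default address". The sketch below maps the install-config ipFamily value to that argument. This is only an illustration of the expected behaviour, not MCO's actual code; `kubelet_node_ip` is a hypothetical helper, and treating any value other than IPv6 primary as IPv4 is an assumption.

```python
import ipaddress


def kubelet_node_ip(ip_family: str) -> str:
    """Pick the unspecified address matching the primary IP family."""
    # Assumption: anything that is not IPv6 primary falls back to IPv4.
    if ip_family == "DualStackIPv6Primary":
        return "::"
    return "0.0.0.0"


if __name__ == "__main__":
    for family in ("DualStackIPv4Primary", "DualStackIPv6Primary"):
        value = kubelet_node_ip(family)
        # Both results are the "unspecified" address of their family.
        assert ipaddress.ip_address(value).is_unspecified
        print(f"{family}: --node-ip={value}")
```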

@tthvo
Member Author

tthvo commented Nov 11, 2025

I rebuilt another release image: quay.io/thvo/origin-release:v4.21.0-preview-1. This includes the changes for openshift/cluster-network-operator#2804 instead of my own hack tthvo/cluster-network-operator@617e05f.

If you'd like to use the new custom release image, you need to set the techpreview feature set:

featureSet: TechPreviewNoUpgrade

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 11, 2025
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 13, 2026
@openshift-ci
Contributor

openshift-ci bot commented Jan 13, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign sadasu for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot
Contributor

openshift-ci-robot commented Jan 13, 2026

@tthvo: This pull request references CORS-4072 which is a valid jira issue.

@tthvo
Member Author

tthvo commented Jan 13, 2026

The PR is rebased on top of main with the latest CAPA IPv6 PR changes, and adjusted to match the #10207 implementation for the install-config field. I also rebuilt a new custom release image based on 4.22.0-ec.0: quay.io/thvo/origin-release:v4.22.0-preview-ds.

Both dual-stack IPv4-primary and dual-stack IPv6-primary installs (using the install-config in the PR description) should proceed to the end successfully (and "seamlessly") 😄

Contributor

@patrickdillon patrickdillon left a comment

Looks good on first pass

tthvo and others added 11 commits January 14, 2026 12:35
Based on install-config input, update AWSPlatformStatus's IPFamily
field within the Infrastructure manifest. Update unit tests to reflect
this new field.
…-network-server

The commit ensures all service networks are considered (i.e. that is all
IP families) when generating the certificate
kube-apiserver-service-network-server.
The installconfig in the cluster-config ConfigMap needs to have the
IPv6 CIDR of the VPC in the case of full IPI.
This applies to dualstack installation only.

IPv4-primary: IPv4 Target Group
IPv6-primary: IPv6 Target Group
FIXME: we should use the VPC CIDR as the source CIDRs. But the IPv6 CIDR
is not yet known at install time. We should edit the awscluster after
infraReady to add the VPC IPv6 CIDR as source instead.
FIXME: CCM and in-cluster MAPI/CAPI need to handle this
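
A rough illustration of the kube-apiserver-service-network-server commit above: assuming the in-cluster API service address is the first usable address of each service CIDR (the usual Kubernetes convention), covering all service networks means the certificate needs one IP per family. The function `apiserver_service_ips` is a hypothetical helper and not the installer's implementation.

```python
import ipaddress


def apiserver_service_ips(service_cidrs):
    """Return the first usable address of every service network, in order."""
    # Index 1 skips the network address itself (e.g. 172.30.0.0 -> 172.30.0.1).
    return [str(ipaddress.ip_network(cidr)[1]) for cidr in service_cidrs]


if __name__ == "__main__":
    # serviceNetwork values from the install-config in the PR description.
    print(apiserver_service_ips(["172.30.0.0/16", "fd02::/112"]))
```

With the service networks from the PR description this yields 172.30.0.1 and fd02::1, so a certificate built from only the first service network would miss one of them.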
@openshift-ci
Contributor

openshift-ci bot commented Jan 15, 2026

@tthvo: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-azure-ovn-resourcegroup | 49c133a | link | false | /test e2e-azure-ovn-resourcegroup |
| ci/prow/e2e-aws-custom-dns-techpreview | 49c133a | link | false | /test e2e-aws-custom-dns-techpreview |
| ci/prow/okd-scos-e2e-aws-ovn | 44dfeb3 | link | false | /test okd-scos-e2e-aws-ovn |
| ci/prow/e2e-azure-ovn | 0985757 | link | true | /test e2e-azure-ovn |
| ci/prow/e2e-aws-ovn-heterogeneous | 0985757 | link | false | /test e2e-aws-ovn-heterogeneous |
| ci/prow/e2e-gcp-custom-dns | 0985757 | link | false | /test e2e-gcp-custom-dns |
| ci/prow/e2e-azurestack | 0985757 | link | false | /test e2e-azurestack |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@tthvo
Member Author

tthvo commented Jan 27, 2026

I have started on splitting the changes in this PR into smaller individual PRs for easy review. But this PR is still kept open for quick testing.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 27, 2026
@openshift-merge-robot
Contributor

PR needs rebase.

Labels

do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. platform/aws

6 participants