feat: add hbone_idle_timeout field to MeshConfig API #3611
Add configurable idle timeout for HBONE connections between proxies and ztunnel to address stale connection reuse when pod IPs are recycled.

This is particularly critical in environments with aggressive IP address reuse, such as AWS EKS with VPC CNI (default 30s cooldown period). Without an explicit idle timeout, Envoy defaults to 1 hour, causing proxies to reuse stale connections from connection pools when target pod IPs are recycled, resulting in 503 errors and upstream reset failures.

The new hbone_idle_timeout field in MeshConfig allows operators to configure the idle timeout appropriately for their environment. For AWS VPC CNI, a value of 15 seconds is recommended.
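
To make the reasoning behind the 15-second recommendation concrete, here is a minimal sketch of the timing argument in Python. It is a simplified model, not Istio or Envoy code: the function name `stale_reuse_possible` and its parameters are hypothetical, and it assumes a pooled connection that went idle when the target pod was deleted stays in the pool until the idle timeout fires, while the pod's IP can be handed to a new pod once the cooldown period has elapsed.

```python
def stale_reuse_possible(idle_timeout_s: float, ip_cooldown_s: float) -> bool:
    """Return True if an idle pooled connection can outlive the IP cooldown.

    Simplified model (an assumption, not Envoy's actual pool logic):
    - the connection becomes idle at the moment the target pod is deleted;
    - the pool evicts it once it has been idle for idle_timeout_s;
    - the pod's IP may be reassigned once ip_cooldown_s has elapsed.
    Treating equality as unsafe is a deliberately conservative choice.
    """
    return idle_timeout_s >= ip_cooldown_s


# Envoy's 1 hour default against a 30 s cooldown: stale reuse is possible.
print(stale_reuse_possible(3600, 30))

# The recommended 15 s timeout against the same cooldown: the connection
# is evicted before the IP can be handed to another pod.
print(stale_reuse_possible(15, 30))
```

Under this model, any idle timeout shorter than the cooldown period closes the window, and 15 seconds leaves a margin below the 30-second default.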
/ok-to-test
keithmattix left a comment:
Just one comment on docs and placement of the field.
/cc @howardjohn
GitHub is being weird for me. Here's my review comment:
I believe the current placement in MeshConfig is more appropriate because the underlying issue is infrastructure-wide: IP address recycling in the AWS VPC CNI affects all workloads equally, and the 30-second cooldown period applies cluster-wide. As a result, there is no clear justification for giving different workloads distinct HBONE idle timeouts. This choice is also consistent with the existing connect_timeout, which already resides in MeshConfig and represents a similar connection-level timeout.

As for the documentation, I tried to make the comment clearer following your advice. Is it better now?
See: istio/istio#58389