
Conversation

@prashanthjos
Contributor

Add per-host connection limits to DestinationRule

Description

This PR adds support for per-host connection limits in the API by introducing a new perHostLimits field to ConnectionPoolSettings. This feature aligns with Envoy's per_host_thresholds capability for circuit breakers.

Motivation

Currently, Istio's DestinationRule only allows setting a global connection limit for the entire cluster, independent of the number of endpoints. This makes it difficult to properly manage concurrency for destination services, especially in autoscaling scenarios where the number of replicas changes dynamically.

Per-host connection limits allow controlling connections to each individual endpoint, which:

  • Prevents overload of individual hosts
  • Manages concurrency properly in autoscaling scenarios
  • Maintains healthy connection limits without hitting overloaded applications

The underlying Envoy cluster circuit breaker already has this per_host_thresholds capability. Other Envoy-based tools (Envoy Gateway, Contour) have already added this support.
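For reference, a minimal sketch of the Envoy-level cluster configuration this field would map to (values are illustrative; field names follow Envoy's config.cluster.v3.CircuitBreakers):

```yaml
circuit_breakers:
  thresholds:              # cluster-wide limits (what Istio exposes today)
  - max_connections: 100
  per_host_thresholds:     # limits applied to each individual endpoint
  - max_connections: 10
```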

Example Usage

apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: reviews-per-host-limits
spec:
  host: reviews.prod.svc.cluster.local
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      perHostLimits:
        tcp:
          maxConnections: 10
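As a back-of-the-envelope model of how the two limits compose (an illustrative sketch, not Istio or Envoy code): the aggregate connection count is bounded both by the cluster-wide cap and by the sum of per-host caps across healthy endpoints.

```python
def effective_connection_ceiling(cluster_max: int, per_host_max: int, endpoints: int) -> int:
    """Upper bound on total upstream connections when both the cluster-wide
    limit and a per-host limit are configured (illustrative model only)."""
    return min(cluster_max, per_host_max * endpoints)

# With the example above: maxConnections=100, per-host maxConnections=10.
print(effective_connection_ceiling(100, 10, 5))   # 5 replicas  -> bounded at 50
print(effective_connection_ceiling(100, 10, 20))  # 20 replicas -> cluster cap of 100 wins
```

Note how the per-host limit dominates at low replica counts, while the cluster-wide limit takes over as the deployment scales out.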

Limitations

Currently only the maxConnections field is supported for per-host limits, as per Envoy's circuit breaker implementation. Other fields in TCPSettings will be ignored.

Testing

  • Generated CRD files successfully with make gen
  • Test case not included due to CRD validation cost limits (will be validated in istio/istio integration)

Fixes #57697

@prashanthjos prashanthjos requested a review from a team as a code owner December 13, 2025 22:38
@istio-policy-bot

🤔 🐛 You appear to be fixing a bug in Go code, yet your PR doesn't include updates to any test files. Did you forget to add a test?

Courtesy of your friendly test nag.

@istio-testing istio-testing added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Dec 13, 2025
Add perHostLimits field to ConnectionPoolSettings to enable per-host connection limits for individual endpoints in a cluster. This aligns with Envoy's per_host_thresholds capability and is useful for preventing endpoint overload in autoscaling scenarios.

Currently only maxConnections field is supported per Envoy's implementation.

Fixes #57697
message PerHostLimits {
  // Maximum number of connections to each individual host in the upstream cluster.
  // If not specified, there is no per-host limit.
  int32 max_connections = 1;
}
Contributor Author

This uses a simplified structure with a single max_connections field rather than reusing TCPSettings to avoid exceeding Kubernetes CRD validation cost limits. The TCPSettings message contains multiple Duration fields with CEL validation rules that would double the validation cost when reused here, causing the total schema validation cost to exceed Kubernetes limits by ~9%. Since Envoy only supports max_connections for per-host limits currently, this simplified structure provides the same functionality while staying within CRD validation budgets.

@prashanthjos
Contributor Author

prashanthjos commented Dec 15, 2025

@ramaraochavali, @howardjohn, @therealmitchconnors, @keithmattix — can you please share your thoughts when you get some time?

Contributor

@keithmattix keithmattix left a comment


I have no concerns with this in principle, but we might want to discuss how we expect DestinationRule to evolve generally, especially as Backend evolves in Kubernetes.

@ramaraochavali
Contributor

IMO PerHostThresholds makes sense for DFP-type clusters, where each host can point to a different upstream service or external endpoint, but it is not very useful for regular clusters. The cluster load-balancing algorithm (Least Request) will almost always ensure requests are sent to hosts that can handle them with low latency, while cluster-level circuit breakers are still honored to prevent cascading failures. Adaptive concurrency is a better choice for handling autoscaling behavior. Also, PerHostThresholds only supports max connections; even with connections under the limit, the backend host can be overwhelmed with a lot of requests in some protocols like HTTP/2.

@prashanthjos
Contributor Author

prashanthjos commented Dec 16, 2025

Thank you for the thoughtful review; I agree with several of the points raised and appreciate the perspective.

I agree that per-host thresholds are most obviously useful for DFP-style clusters, where individual endpoints may represent very different upstream services. I also agree that for regular service clusters, Envoy’s Least Request load balancing and cluster-level circuit breakers already do a good job of steering traffic away from overloaded endpoints and preventing cascading failures.

That said, I believe per-host connection limits still provide complementary value, even for regular Kubernetes service clusters, particularly in the following cases:

Proactive protection vs reactive steering

Least Request and outlier detection are reactive mechanisms: they rely on latency or failures that have already occurred. Per-host connection limits provide a hard, proactive guardrail that prevents an endpoint from being overwhelmed in the first place, especially during sudden traffic spikes or cold-start scenarios.

Autoscaling lag and uneven capacity

In autoscaling environments, new pods often start with:

  • cold caches
  • reduced CPU availability
  • delayed readiness at the application layer

During these windows, per-host connection limits help ensure that a newly added or temporarily degraded pod is not immediately saturated, even if it technically passes readiness checks.

Connection pressure still matters, even with HTTP/2

I agree that per_host_thresholds currently only support max_connections, and that for protocols like HTTP/2 a single connection can still carry many concurrent streams. However, in practice:

  • connection establishment itself is not free (TLS, memory, file descriptors)
  • many applications still exhibit degradation under high connection churn or concurrent streams
  • limiting connections can still meaningfully reduce pressure on constrained backends

While not a complete concurrency control solution, per-host limits act as a coarse but effective safety mechanism.

Parity with Envoy and ecosystem tooling

Since Envoy already supports per_host_thresholds, and other Envoy-based projects (Envoy Gateway, Contour) have exposed this capability, providing it in Istio improves configuration parity and reduces surprises for users migrating between tools or using Istio in heterogeneous environments.

I fully agree that adaptive concurrency control is a superior long-term solution for handling autoscaling and dynamic load, and I see per-host connection limits as complementary rather than competing with that direction. Adaptive concurrency addresses request-level saturation, while per-host connection limits provide a simpler, deterministic control at the connection level.

If the concern is around:

  • encouraging misuse,
  • limited effectiveness for HTTP/2,
  • or unclear guidance for users,

I would be happy to:

  • clearly document the intended use cases and limitations,
  • scope the feature explicitly as an advanced / opt-in capability,
  • or align the API more closely with Envoy semantics to avoid confusion.

I am happy to go in either direction, just wanted to add my point of view.

@frittentheke

frittentheke commented Dec 17, 2025

Adaptive concurrency is a better choice to handle auto scaling behaviors. Also PerHostThresholds only support max connections. Even with connections under the limit, the backend host can be overwhelmed with lot of requests in some protocols like HTTP/2.

I somewhat agree, @ramaraochavali, even though I am really keen on (at least) getting some per-host connection limit in (I opened the issue istio/istio#57697, after all).
Yes, HTTP/2 streams, for example, amplify concurrency, and it would be awesome if the request queue could be limited per number of hosts (or multiples thereof). So if I run with 3 backends I might allow for 100 pending requests, but when scaling up to 30 I might also want to increase the request queue to 1000 accordingly.
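The replica-proportional budget described here is just a multiplication; a quick sketch (per_replica_pending is a hypothetical parameter for illustration, not an existing Istio or Envoy field):

```python
def scaled_pending_budget(replicas: int, per_replica_pending: int) -> int:
    # Scale the cluster-wide pending-request budget with the replica count,
    # which is what a per-host limit would achieve implicitly.
    return replicas * per_replica_pending

print(scaled_pending_budget(3, 33))   # roughly 100 pending requests for 3 backends
print(scaled_pending_budget(30, 33))  # roughly 1000 when scaled to 30 backends
```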

This kind of "adaptive concurrency", as suggested in istio/istio#25991, is supported by Envoy via https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/adaptive_concurrency_filter.
@prashanthjos has already stated in istio/istio#25991 (comment) that he might also take a look at that one. It's a bigger change, though.

But in the end it all comes down to managing the effects of Little's Law. Higher latency very quickly causes a concurrency explosion, driving per-request latency higher and higher until there are timeouts or the backend cannot take it anymore (if it does not implement some limit itself).
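Little's Law (L = λ·W) makes the feedback loop concrete; a tiny illustrative sketch (not tied to any Envoy or Istio API):

```python
def concurrency(arrival_rate_rps: float, latency_s: float) -> float:
    # Little's Law: average in-flight requests L = lambda * W
    return arrival_rate_rps * latency_s

# 200 rps at 50 ms latency keeps about 10 requests in flight; if latency
# doubles to 100 ms, the in-flight count doubles too, and so on.
print(concurrency(200, 0.05))
print(concurrency(200, 0.10))
```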

Relying only on Outlier Detection makes managing concurrency a reactive task (there have to be traffic-affecting issues already for Envoy to take action), even though Envoy has capabilities for maintaining a healthy flow of requests without static configuration of connection counts or even rate limits (which never hit the right number).

To me these capabilities are:

a) adaptive LB algorithms like PEWMA (envoyproxy/envoy#20907 -> istio/proxy#6690) or Prequal (envoyproxy/envoy#42091), which send requests to the backend likely to result in the lowest latency and, looking again at Little's Law, serve the user best (low latency) while also maximizing overall rps (at the same concurrency).

b) adaptive concurrency to not overwhelm individual backends with connections / requests as suggested in istio/istio#25991

@ramaraochavali
Contributor

. proactive guardrail that prevents an endpoint from being overwhelmed in the first place, especially during sudden traffic spikes or cold-start scenarios.

These types of issues are better solved by warm-up configuration than by circuit breakers.

connection establishment itself is not free (TLS, memory, file descriptors)

This is true whether you do it at the cluster level or at the host level. But practically, for HTTP/2, Envoy would not create more than one connection per worker thread unless max_concurrent_streams is reached. Even high-throughput apps won't create more connections in HTTP/2.

Only using and trusting on Outlier Detection makes managing concurrency a reactive task (there have to be traffic affecting issues already for Envoy to take action),

I am not advocating outlier detection to handle this. I am just saying static per-host connection limits won't help much.

@prashanthjos
Contributor Author

@ramaraochavali I agree that, in practice, this change is unlikely to have a significant impact for most standard Kubernetes service clusters, especially given Envoy’s existing load-balancing behavior and cluster-level circuit breakers. That said, I also don’t see any real downside or risk in exposing this capability, since it is already supported by Envoy and aligns Istio with the broader Envoy ecosystem.

While per-host connection limits may not materially improve concurrency management in many HTTP/2-heavy workloads, they can still serve as a simple, deterministic guardrail in certain scenarios and provide configuration parity with tools like Envoy Gateway and Contour. As long as the feature is clearly documented with its limitations and positioned as an advanced or niche control rather than a primary concurrency mechanism, I don’t think it introduces confusion or harm.

Given that, I’m comfortable with this moving forward and would defer to the broader perspective and judgment of @ramaraochavali, @howardjohn, @therealmitchconnors, and @keithmattix for the final decision.

Please do let me know.

@prashanthjos
Contributor Author

prashanthjos commented Dec 17, 2025

One additional point worth calling out is that per-host connection limits are significantly more valuable for pure TCP workloads (e.g., Redis, DBs, Kafka) than for HTTP/2 services. For these protocols, connections map directly to backend resource consumption (threads/event loops, memory, file descriptors). There is no request multiplexing or adaptive concurrency signal, so limiting connections at the per-host level is one of the few effective and generic guardrails available to prevent individual endpoints from being overwhelmed, especially during bursts, restarts, or scaling events.

Many of the concerns around limited effectiveness (HTTP/2 stream amplification, Envoy connection reuse, etc.) do not apply in this case. Adaptive concurrency and advanced LB algorithms are excellent directions, but they primarily address request-level saturation and are not applicable to raw TCP protocols.
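A sketch of how the proposed field could look for a plain-TCP backend, mirroring the example in the PR description (service name and numbers are hypothetical):

```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: redis-per-host-limits
spec:
  host: redis.prod.svc.cluster.local
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 200   # cluster-wide cap across all Redis endpoints
      perHostLimits:
        tcp:
          maxConnections: 20  # hard cap per individual Redis endpoint
```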

Given that Envoy already supports per_host_thresholds, and similar tooling has exposed this capability, adding this to Istio seems reasonable as a complementary, opt-in mechanism, particularly for TCP services, provided the limitations are clearly documented.
