Introduction
While HTTP/2 offers significant improvements over HTTP/1.1—such as multiplexing, header compression, and persistent connections—it also introduces new layers of complexity. Debugging real-world issues involving HTTP/2 can be pretty challenging, especially when working with proxies like Envoy that handle low-level protocol details transparently.
We also use HTTP/2 for our services to take advantage of connection reuse. This is easy to achieve with Istio, which is built on Envoy and supports high-performance traffic handling. However, debugging Envoy can be tough: it is a complex system, and it is hard to know and understand all the options it provides.
Situation
One day, we were told that a client could not send a large file. When we tried manually, we got the following error: “upstream connect error or disconnect/reset before headers. reset reason: connection termination”. This was puzzling because the endpoint was healthy and the network metrics looked fine. One odd detail was that uploads failed only above a specific size, yet we had never configured any such limit. So we needed to dig deeper into this situation.
Timeout: A TCP or request timeout can cause large payloads to fail. Sending a large payload takes time, and it could exceed the timeout. We measured the response time to check this, but it did not exceed the timeout we had set.
TCP window size full: HTTP/2 multiplexes many streams over a single TCP connection. If the connection becomes busy enough, TCP may not be able to receive more packets. To check this, we captured the TCP connection and inspected the TCP window size. Unfortunately, our TCP window was large enough to receive more packets.
HTTP/2 per-connection buffer limit: Envoy lets us set a buffer limit for each cluster connection via per_connection_buffer_limit_bytes, but we had never changed it from the default (1 MB). If the application cannot read the data fast enough, this could be the bottleneck. We tried increasing the per_connection_buffer_limit_bytes value, but it had no effect.
Check the Envoy debug log: We inspected the Envoy debug log to find out what happened when we sent the large file. It contained many “error sending frames: Too many frames in the outbound queue.” messages. Envoy has a max_outbound_frames option: if the number of queued outbound frames exceeds this limit, the connection is terminated (the whole TCP connection, not just the stream). When we raised the max_outbound_frames value, the upload worked.
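As a reference point, both knobs mentioned above can be set in the Envoy v3 API roughly as follows. This is a minimal sketch, not the config we ran: the cluster name and the numeric values are illustrative.

```yaml
# Illustrative Envoy v3 cluster snippet; values are examples, not recommendations.
clusters:
- name: upstream_service            # placeholder name
  per_connection_buffer_limit_bytes: 1048576   # default is 1 MiB
  typed_extension_protocol_options:
    envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
      "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
      explicit_http_config:
        http2_protocol_options:
          max_outbound_frames: 20000   # default is 10000
```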
Why did Envoy adopt max_outbound_frames?
While digging into the history, I found the following security advisories in the Envoy GitHub repository.
https://github.com/envoyproxy/envoy/security/advisories/GHSA-hm8q-x6qm-xxvw
https://github.com/envoyproxy/envoy/security/advisories/GHSA-q24r-4w7h-qv3p
https://github.com/envoyproxy/envoy/security/advisories/GHSA-5m79-fj88-wc5m
https://github.com/envoyproxy/envoy/security/advisories/GHSA-jhv4-f7mr-xx76
The max_outbound_frames option was adopted in 2019 in response to an HTTP/2 ping-flood vulnerability. A similar issue appeared again in 2023. Although the 2023 issue was not the original reason for adopting the option, it shows why Envoy should terminate the whole connection rather than just the stream.
There was a DoS vulnerability in HTTP/2 known as Rapid Reset. Simply put, it is a DoS attack that opens many streams and immediately cancels each one with an RST_STREAM frame.
RST_STREAM logically closes a stream within the TCP connection. Most proxies have a default limit on concurrent streams per TCP connection, so many simultaneous requests would normally create many TCP connections. That pattern can be detected quickly, and scaling out might mitigate the issue.
With Rapid Reset, however, the attacker resets each stream as soon as the headers arrive. The server still has to allocate resources to process those headers and, when acting as a proxy, route them to other destinations. This can cause a resource shortage because it consumes CPU and network capacity. To make matters worse, it is hard to detect because everything happens within a single TCP connection.

The Google blog post recommended closing the TCP connection to mitigate this attack, which prevents a malicious client from exhausting the server's resources. The client can still retry with a new TCP connection, but that connection will be closed every time it attacks again.
You can observe this behavior in a TCP dump. I reproduced the traffic to test the option: I wrote a script that regularly sends requests with a small payload, and the server waits 30 seconds before responding, so those streams are still open when the large file is sent. Then I sent a massive payload over HTTP/2.
Server: listening on port 8000
Envoy: listening on port 10000, proxying all requests to server:8000
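For readers who want to reproduce a similar setup, the topology above could be expressed as a minimal Envoy config sketch. This is an assumption-laden skeleton, not the config used in the incident; names like "backend" and "server" are placeholders.

```yaml
# Minimal sketch of the test topology: Envoy on :10000 forwarding to server:8000.
static_resources:
  listeners:
  - address:
      socket_address: { address: 0.0.0.0, port_value: 10000 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress
          route_config:
            virtual_hosts:
            - name: all
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route: { cluster: backend }
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: backend
    load_assignment:
      cluster_name: backend
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: server, port_value: 8000 }
```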
Here is a sample of the TCP stream. As you can see, stream 5 sent its request and never got a response (there was no 200 OK packet). However, when I sent a heavy request with a 100MB payload, the whole TCP connection was closed because of Envoy. The FIN packet sender was Envoy!
Why did the outbound queue grow?
Envoy provides the HTTP Buffer filter to limit request size. We enabled this filter to prevent users from sending files so large that they might affect network performance. It is also useful when the upstream cannot read the stream data fast enough.
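For context, enabling the Buffer filter looks roughly like this in the HTTP filter chain. The 50 MiB limit is an example value, not the one we used.

```yaml
# Illustrative HTTP filter chain with the Buffer filter; example limit only.
http_filters:
- name: envoy.filters.http.buffer
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.buffer.v3.Buffer
    max_request_bytes: 52428800   # 50 MiB, example value
- name: envoy.filters.http.router
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
```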
When this filter is enabled, Envoy holds the data of each incoming frame instead of forwarding it upstream. Once it has the full payload, it passes the data upstream, adding a content-length header if one is missing. At that moment, the nghttp2 library splits the data into multiple frames to satisfy the HTTP/2 protocol.
The HTTP/2 frame size is 16KB by default, so nghttp2 breaks the payload into 16KB chunks and sends each chunk as one frame. As each frame becomes ready, Envoy adds it to the outbound queue until it is successfully sent.
However, splitting frames is much faster than sending them upstream. This caused the outbound queue to overflow.
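The arithmetic makes the failure mode concrete. A fully buffered payload is split into roughly payload_size / 16KB frames all at once, and if they are queued faster than they are sent, the queue length approaches that total. The sketch below only does the frame math; the constants are the documented defaults (16 KiB HTTP/2 frames, Envoy's max_outbound_frames of 10000), and the actual threshold in our incident depended on our configured limits.

```python
# Sketch: how many DATA frames a buffered payload becomes after splitting.
FRAME_SIZE = 16 * 1024          # default HTTP/2 max frame size (16 KiB)
MAX_OUTBOUND_FRAMES = 10_000    # Envoy's default outbound queue limit

def frames_for(payload_bytes: int) -> int:
    """Number of 16 KiB DATA frames needed for a payload (ceiling division)."""
    return -(-payload_bytes // FRAME_SIZE)

for size_mb in (1, 100, 200):
    n = frames_for(size_mb * 1024 * 1024)
    verdict = "exceeds" if n > MAX_OUTBOUND_FRAMES else "fits within"
    print(f"{size_mb} MiB -> {n} frames ({verdict} the default queue limit)")
```

Even a payload that fits within the default limit can trigger the error if the limit was lowered, or if other frames are already queued on the same connection.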
Takeaways
We should check the max_outbound_frames value when the Buffer filter is enabled. Buffering requests inside Envoy can be useful, but it can also cause unexpected TCP connection termination.
We should set a maximum number of concurrent streams per TCP connection to mitigate flooding. Otherwise, a single TCP termination will reset many streams in bulk.
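A downstream cap on concurrent streams can be sketched as below, inside the HTTP connection manager's typed_config. The value 100 is an example, not a recommendation.

```yaml
# Fragment of an HttpConnectionManager typed_config; example value only.
http2_protocol_options:
  max_concurrent_streams: 100   # default is effectively unlimited (2^31 - 1)
```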