Title: ActiveTcpClient destructor deletes the wrong connection pool
Description:
We're seeing cases where an upstream connection is closed locally for no reason. If retries are configured, then every new connection is also immediately closed until max retries is reached.
The issue happens when a connection is being closed at the same time as a new connection to the same cluster is being established. If these two events are processed in the same call to event_base_loop, then the subsequent clearing of the deferred deletion list will destroy the wrong connection pool (the new one instead of the old one). If retries are configured, then yet another new pool and connection are created, but these are again destroyed immediately. This then continues for every retry.
Here's what happens:
1. The ActiveTcpClient for the upstream connection is closed and added to the deferred deletion list.
2. The connection pool is removed from the pool map.
3. A new connection pool is created for the same cluster (and with the same hash_key), since the previous one is no longer in the map.
4. A new upstream connection is created in the new connection pool.
5. The deferred deletion list is then cleared, which invokes the old ActiveTcpClient's destructor.
6. The destructor invokes the old pool's idle callbacks.
7. This is where things go wrong: the idle callback erroneously erases the new pool, because tcpConnPoolIsIdle() finds and erases the pool by hash_key from the pools map, and that map now contains the new pool (the old one was already removed in step 2). A minimal sketch of this sequence follows below.
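To make the sequence concrete, here is a minimal, self-contained C++ sketch. The types below (ConnPool, ActiveTcpClient, the pools map and the deferred deletion list) are simplified stand-ins for illustration, not Envoy's actual classes; the point is only that an idle callback which erases by hash_key removes whatever pool currently sits under that key:

```cpp
#include <cstdio>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Simplified stand-ins (not Envoy's real classes) for the objects involved.
struct ConnPool {
  explicit ConnPool(int id) : id(id) {}
  ~ConnPool() { std::printf("pool %d destroyed\n", id); }
  int id;
};

struct ActiveTcpClient {
  // In the real code the destructor ends up invoking the owning pool's idle
  // callbacks; here that path is modeled as a plain std::function.
  std::function<void()> on_destroy;
  ~ActiveTcpClient() { if (on_destroy) on_destroy(); }
};

int main() {
  std::map<std::string, std::unique_ptr<ConnPool>> pools;           // stand-in for the pools map keyed by hash_key
  std::vector<std::unique_ptr<ActiveTcpClient>> deferred_deletion;  // stand-in for the deferred deletion list
  const std::string hash_key = "cluster_a";

  // The idle callback erases whatever pool is currently stored under hash_key.
  auto tcpConnPoolIsIdle = [&pools, hash_key]() { pools.erase(hash_key); };

  // Steps 1-2: the old connection closes; its client is deferred and the old
  // pool is removed from the map.
  pools[hash_key] = std::make_unique<ConnPool>(/*id=*/1);
  auto old_pool = std::move(pools[hash_key]);
  pools.erase(hash_key);
  auto client = std::make_unique<ActiveTcpClient>();
  client->on_destroy = tcpConnPoolIsIdle;  // destructor will run the idle callback
  deferred_deletion.push_back(std::move(client));

  // Steps 3-4: a new pool is created for the same hash_key before the deferred
  // deletion list is cleared.
  pools[hash_key] = std::make_unique<ConnPool>(/*id=*/2);

  // Steps 5-7: clearing the deferred list runs the old client's destructor; its
  // idle callback looks the pool up by hash_key and erases the *new* pool (id 2).
  deferred_deletion.clear();
  std::printf("pools left under %s: %zu\n", hash_key.c_str(), pools.count(hash_key));  // prints 0
  return 0;
}
```

Running this prints "pool 2 destroyed" before "pool 1 destroyed", i.e. the pool created for the new connection is the one that gets torn down, which matches the behavior described above.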
Repro steps:
We haven't been able to reproduce this ourselves, but our customer is seeing it in their test cluster. However, as explained earlier, this happens when a new connection arrives at the same time as a previous connection is being closed (perhaps a key factor is also that the old upstream connection is being drained). To reproduce this, you must ensure that the following two events are processed in the same invocation of event_base_loop():
- a connection close event that causes the connection pool to become idle and therefore erased
- a new connection event that causes a new connection pool to be created (with the same hash_key)
After these two events are processed, the ThreadLocalClusterManagerImpl will contain the new connection pool in its pools_ map instead of the old one; at the same time, the deferred deletion list will contain the ActiveTcpClient of the previous connection. When this ActiveTcpClient's destructor runs, it calls isIdleImpl() on the old connection pool and then invokes the idle callbacks on that same (old) pool. However, the idle callback itself (tcpConnPoolIsIdle) deletes the new connection pool.
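To illustrate the timing precondition, here is a small libevent sketch (again just an illustration, not Envoy code; the event names are made up) showing that two events made active before the loop runs are both dispatched within a single call to event_base_loop():

```cpp
#include <event2/event.h>
#include <cstdio>

// Both callbacks run inside the same event_base_loop() call below, which is the
// interleaving the repro needs: the close of the old connection and the arrival
// of the new one handled in one loop iteration.
static void on_close(evutil_socket_t, short, void*) { std::printf("old connection closed, pool removed\n"); }
static void on_new_conn(evutil_socket_t, short, void*) { std::printf("new connection, new pool created\n"); }

int main() {
  event_base* base = event_base_new();
  event* close_ev = event_new(base, -1, 0, on_close, nullptr);
  event* new_conn_ev = event_new(base, -1, 0, on_new_conn, nullptr);

  // Make both events active before entering the loop.
  event_active(close_ev, 0, 0);
  event_active(new_conn_ev, 0, 0);

  // A single EVLOOP_ONCE pass dispatches both active events.
  event_base_loop(base, EVLOOP_ONCE);

  event_free(close_ev);
  event_free(new_conn_ev);
  event_base_free(base);
  return 0;
}
```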
Note: The Envoy_collect tool gathers a tarball with debug logs, config and the following admin endpoints: /stats, /clusters and /server_info. Please note: if there are privacy concerns, sanitize the data prior to sharing the tarball/pasting.
Admin and Stats Output:
Include the admin output for the following endpoints: /stats, /clusters, /routes, /server_info. For more information, refer to the admin endpoint documentation. Note: if there are privacy concerns, sanitize the data prior to sharing.
Config:
(this is just the relevant part of the config)
Logs:
Notes: