Description
TCP keepalive is optional and off by default.
Its purpose is to:
1. Detect dead peers. If the peer is not alive, the socket is closed to free resources.
2. Prevent the connection from being closed by a firewall or NAT proxy due to inactivity.
A real-world case of purpose 2 was reported in https://rabbitmq.slack.com/archives/C1EDN83PA/p1656489674972399.
Therefore, in Osiris, TCP keepalive can optionally be enabled for stream data replication. Both client and server have to opt in by setting the osiris application parameter replica_keepalive:
Line 700 in 6d81744
KeepAlive = application:get_env(osiris, replica_keepalive, false),
osiris/src/osiris_replica_reader.erl
Line 381 in ccc4ee6
KeepAlive = application:get_env(osiris, replica_keepalive, false),
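Since both call sites read the value with application:get_env/3, opting in could look roughly like the following application environment entry. This is a minimal sketch: the file name (advanced.config for RabbitMQ, sys.config for a plain OTP release) is an assumption about the deployment, and the same setting would have to be present on every node, because both sides of the replication connection must opt in.

```erlang
%% advanced.config (RabbitMQ) or sys.config (plain OTP release).
%% Assumption: the node loads its application environment from this file.
[
 {osiris, [
   %% Defaults to false when unset, per the get_env calls quoted above.
   {replica_keepalive, true}
 ]}
].
```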
This issue is about whether we should make the TCP keepalive parameters configurable:
1. Keepalive time: the duration between two keepalive transmissions while the connection is idle. RFC 1122 requires this period to be configurable and to default to no less than 2 hours.
2. Keepalive interval: the duration between two successive keepalive retransmissions when the previous keepalive transmission has not been acknowledged.
3. Keepalive retry: the number of retransmissions to carry out before declaring that the remote end is not available.
Specifically, it may be desirable to decrease the keepalive time (item 1) to a value lower than 2 hours.
See https://github.com/emqx/emqx/blob/6d5ad97528072e7b9186cb35e2eab7695dd0393a/apps/emqx/src/emqx_connection.erl#L269-L272 for an Erlang example.
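To make the three parameters concrete, here is a minimal sketch of how they could be mapped to raw socket options in Erlang. It is not Osiris code: the module name `keepalive_sketch` and the function `linux_keepalive_opts/3` are hypothetical, and the numeric constants are the Linux values of IPPROTO_TCP, TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT. Other operating systems use different option numbers, so this would need a per-OS branch in practice.

```erlang
-module(keepalive_sketch).
-export([linux_keepalive_opts/3]).

%% Hypothetical helper, not part of Osiris. Linux-only constants.
-define(IPPROTO_TCP, 6).
-define(TCP_KEEPIDLE, 4).   %% keepalive time, in seconds
-define(TCP_KEEPINTVL, 5).  %% keepalive interval, in seconds
-define(TCP_KEEPCNT, 6).    %% keepalive retry count

%% Builds a socket option list that enables keepalive and overrides
%% the kernel defaults for idle time, probe interval and probe count.
linux_keepalive_opts(Idle, Interval, Probes)
  when is_integer(Idle), Idle > 0,
       is_integer(Interval), Interval > 0,
       is_integer(Probes), Probes > 0 ->
    [{keepalive, true},
     {raw, ?IPPROTO_TCP, ?TCP_KEEPIDLE, <<Idle:32/native>>},
     {raw, ?IPPROTO_TCP, ?TCP_KEEPINTVL, <<Interval:32/native>>},
     {raw, ?IPPROTO_TCP, ?TCP_KEEPCNT, <<Probes:32/native>>}].
```

The resulting list could be applied to an open socket with `inet:setopts(Sock, keepalive_sketch:linux_keepalive_opts(600, 30, 5))`. With those example values, an unresponsive peer would be declared dead after roughly 600 + 30 * 5 = 750 seconds of silence.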
Note, however, the warning in the Erlang inet documentation:
"Code such as these [raw socket option] examples is inherently non-portable, even different versions of the same OS on the same platform can respond differently to this kind of option manipulation. Use with care."