I have Go services with a persistent Bigtable client. The services perform hundreds of read/write operations on Bigtable per second.
Every hour after the service boots, I get hundreds of errors like this one:
Retryable error: rpc error: code = Unavailable desc =
 the connection is draining, retrying in 74.49241ms
Each burst of these errors is followed by an increase in processing time that I can't tolerate.
I was able to figure out that the Bigtable client uses a pool of gRPC connections.
It seems that the Bigtable gRPC server has a connection maxAge of 1 hour, which would explain the error above and the increased processing time during reconnection.
A maxAgeGrace setting is supposed to give in-flight operations additional time to complete and to prevent all pooled connections from terminating at the same time.
I increased the connection pool size from the default of 4 to 12, with no real benefit.
How do I prevent both these errors and the increase in processing time during reconnections, given that my traffic will keep growing?
Cloud Bigtable clients use a pool of gRPC connections to connect to Bigtable. The Java client uses a channel pool per HBase connection, and each channel pool has multiple gRPC connections. gRPC connections are shut down every hour (or after 15 minutes of inactivity), and the underlying gRPC infrastructure then reconnects. The first request on each new connection performs a number of setup tasks, such as the TLS handshake and warming server-side caches. These operations are fairly expensive and may cause the latency spikes.
Bigtable is designed to be a high-throughput system, and with sustained query volume the amortized cost of these reconnections should be negligible. However, if the client application has very low QPS or long idle periods between queries and cannot tolerate these latency spikes, it can create a new HBase connection (Java) or a new CBT client (Go) every 30-40 minutes and run no-op calls (`exists` on the HBase client, or a read of a small row) on the new connection/client to prime the underlying gRPC connections. Make one call per connection: for HBase the default is twice the number of CPUs, and Go has 4 connections by default. Once the new connection/client is primed, you can swap it in for the main operations in the client application. Here is sample Go code for this workaround.