Optimizing HAProxy for security and performance by tuning timeouts

Preface

This article assumes you already know how to set up and configure HAProxy: you know what the terms proxy, listen block, frontend and backend mean. You should also know that traffic enters HAProxy through a frontend block or a listen block and that it is passed on to one or more backend servers based on Access Control Lists (ACLs).

Note that when we say HAProxy, we’re talking about version 1.8.x!

One more thing

Timeouts are not the problem!

Remember that quote well! If you’re unsure as to what it means, it’s simple: when hitting a timeout, your first reaction should not be to increase the timeout. You should investigate why something is taking so long that you hit a timeout somewhere and fix the root cause. It could also be that you need to push long-running jobs to the background using queueing mechanisms or other techniques. If you do not take this to heart, the problem will come back and bite you really, really hard, especially if you want the application to scale properly.

A very basic config file

Let’s have a look at an extremely basic HAProxy config with a single frontend passing data on to a single backend:

global
    user    haproxy
    group   haproxy
    pidfile /var/run/haproxy-tep.pid
    stats   socket /var/run/haproxy.stats
    maxconn 20480

defaults
    retries 3
    option  redispatch
    timeout client 30s
    timeout connect 4s
    timeout server 30s

frontend www_frontend
    bind            :80
    mode            http

    default_backend www_backend

backend www_backend
    mode         http

    server       apache24_1 192.168.0.1:8080 check fall 2 inter 1s

You cannot go much more minimal without HAProxy spewing out warnings about missing timeouts etc. What this config does should be obvious:

It accepts HTTP traffic on port 80 on any IP address, with a maximum of 20480 concurrent connections.
It forwards this traffic to an Apache 2.4 server running on a host with IP 192.168.0.1 and port 8080.
It checks every second whether the Apache server is available and considers it dead after 2 consecutive failed checks.

You will instantly notice three timeouts in this config that HAProxy will nag about when you do not set them:

timeout client <timeout>
timeout connect <timeout>
timeout server <timeout>

Let’s tackle these basic ones one by one.

The three basic HAProxy timeouts

timeout client

Set the maximum inactivity time on the client side.

That’s what the manual says and that’s exactly what it is: when the client is expected to acknowledge or send data, this timeout is applied. In our example it was set to 30 seconds, so when the client doesn’t start sending or accepting (receiving) data within 30 seconds, the connection is closed.
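
A side note on notation: HAProxy reads a timeout value without a unit suffix as milliseconds. The sketch below shows equivalent spellings of the 30 second client timeout from the example; the alternatives are left as comments so the block sets the value only once:

defaults
    timeout client 30s
    # Equivalent spellings of the same value:
    #   timeout client 30000      (no unit means milliseconds)
    #   timeout client 30000ms

Spelling out the unit avoids the classic mistake of writing timeout client 30 and ending up with 30 milliseconds.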

timeout connect

Set the maximum time to wait for a connection attempt to a server to succeed.

That’s quite important: it applies to the server, not the client! And it obviously only applies to the connection phase, not the transfer of data or anything else. With servers located in the same network, the connection time will be a few milliseconds. With a more complex topology, say cross-cloud connectivity, which is what DeltaBlue is an expert at, you need to allow a bit more time for this. Always stay within reasonable parameters, though. If you need to go higher than 4 seconds, you really have a different problem altogether (remember “Timeouts are not the problem!”).
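
If only some of your servers sit behind a slower network path, one option is to keep 4 seconds as the default upper bound and tighten timeout connect for backends on the local network. A minimal sketch, in which the 1s value is purely an illustrative assumption:

defaults
    timeout connect 4s        # upper bound, e.g. for cross-cloud links

backend www_backend
    mode         http
    # Assumed value for a server on the same network; a failed attempt
    # is then retried sooner (see "retries 3" in the defaults block)
    timeout      connect 1s
    server       apache24_1 192.168.0.1:8080 check fall 2 inter 1s

A shorter value makes HAProxy give up on an unreachable server faster, at the cost of less tolerance for an occasional lost packet during the handshake.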

timeout server

Set the maximum inactivity time on the server side.

Exactly the same as the first timeout we looked at, but on the server side: when the server is expected to acknowledge or send data, this timeout is applied. In our example, this applies to the Apache 2.4 server that could be running a PHP application with its own timeouts. If that PHP application does not start sending HTTP headers (our frontend is running in HTTP mode) within 30 seconds, the client will receive a 504 Gateway Timeout error from HAProxy. So this timeout is all about the server’s processing time for the given request. Anything higher than 30 seconds should really be considered way too slow and again: you have a different problem (hint: it’s not the timeout).

HAProxy’s other timeouts that you really need

We’ll expand our basic config file a bit to look like this:

global
    user    haproxy
    group   haproxy
    pidfile /var/run/haproxy-tep.pid
    stats   socket /var/run/haproxy.stats
    maxconn 20480

defaults
    retries 3
    option  redispatch
    timeout client 30s
    timeout connect 4s
    timeout server 30s
    # Newly added timeouts
    timeout http-request 10s
    timeout http-keep-alive 2s
    timeout queue 5s
    timeout tunnel 2m
    timeout client-fin 1s
    timeout server-fin 1s

frontend www_frontend
    bind            :80
    mode            http

    default_backend www_backend

backend www_backend
    mode         http

    server       apache24_1 192.168.0.1:8080 check fall 2 inter 1s

So the next batch we will be looking at are these:

timeout http-request <timeout>
timeout http-keep-alive <timeout>
timeout queue <timeout>
timeout tunnel <timeout>
timeout client-fin <timeout>
timeout server-fin <timeout>

timeout http-request

Set the maximum allowed time to wait for a complete HTTP request

A very easy and therefore popular attack is a Denial of Service (DoS) attack. A lot of timeout settings can help mitigate these and so can this one. When concerned about security, you will no doubt have heard about Slowloris. Not the animal, but the attack (named after the animal) ;-) This attack will open as many connections as possible and keep them open by sending requests extremely slowly in order to consume all possible sockets, thereby denying other people access and effectively ‘closing down’ the host.

In HAProxy, use this parameter to limit the time frame in which a complete HTTP request can be sent, rendering attacks such as Slowloris largely ineffective. By separating this from the timeout client, you can do more fine-grained tweaking in complex setups.

As the article title suggested, we will be tuning for performance and security. This one will actually do both as it will also keep HAProxy clear of processing (too much) garbage so it can focus its resources on useful things.
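
One detail worth knowing: according to the HAProxy documentation, timeout http-request by default only covers the request headers. The sketch below shows how the frontend could be tightened, assuming you also want the deadline to cover the request body through option http-buffer-request. The 5s value is an illustrative assumption, not a recommendation:

frontend www_frontend
    bind            :80
    mode            http

    # Assumed value: the request must arrive within 5 seconds
    timeout         http-request 5s
    # Wait for the body as well (up to the buffer size), so the timeout
    # above also applies to slowly trickled POST bodies
    option          http-buffer-request

    default_backend www_backend

Setting the timeout in the frontend instead of the defaults block lets you use a different value per entry point.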

timeout http-keep-alive

Set the maximum allowed time to wait for a new HTTP request to appear

HTTP Keep-Alive is also referred to as a persistent connection. It allows browsers to work more efficiently with connections and offers a faster end user experience in page loading when using HTTP/1.1 (HTTP/2 multiplexes its requests over a single connection anyway).

Say you have an HTML page loading CSS, JavaScript, images and other assets. Using a persistent connection will be much faster, as a single connection can be reused to send the data. The overhead of recreating a connection for each asset is gone.

Once the server has finished sending a response, this timeout kicks in; if a new request arrives within this time frame, the connection is reused. As soon as a new request comes in, the timeout http-request will take over!

If you do not set timeout http-keep-alive, the timeout http-request value will be used.
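
To make the hand-over between the two timeouts concrete, here is the relevant part of the defaults block, annotated with the life cycle of one persistent connection. The comments only illustrate the behaviour described above; they are not additional directives:

defaults
    timeout client          30s
    timeout http-request    10s   # applies while a request is being received
    timeout http-keep-alive 2s    # applies while the connection waits for the next request

    # Life cycle of one client connection:
    #   request 1 arrives      -> bounded by timeout http-request
    #   response 1 is sent
    #   connection sits idle   -> bounded by timeout http-keep-alive (2s)
    #   request 2 starts       -> timeout http-request takes over again
    #   no request within 2s   -> HAProxy closes the connection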

timeout queue

Set the maximum time to wait in the queue for a connection slot to be free

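So that’s quite clear: when a server has reached its maximum number of connections (the maxconn set on its server line, not the global maxconn of 20480 in this example), new requests are held in a queue for at most this amount of time. If no slot frees up in time, HAProxy drops the request and returns a 503 Service Unavailable. To keep performance optimal, you should set this timeout so clients are never left waiting in the queue for too long.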

If you do not set it, timeout connect will be used instead.
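
In HAProxy, queueing only comes into play once a server has a connection limit of its own. A minimal sketch, assuming a per-server maxconn of 100 (an illustrative number; the config in this article sets none):

backend www_backend
    mode         http
    timeout      queue 5s     # wait at most 5s for a free slot, then give up
    # maxconn 100 is an assumed value: once 100 requests are in flight,
    # further requests wait in the queue
    server       apache24_1 192.168.0.1:8080 maxconn 100 check fall 2 inter 1s

When the 5 seconds pass without a slot becoming free, HAProxy answers the client with a 503.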

timeout tunnel

Set the maximum inactivity time on the client and server side for tunnels.

As our config only handles HTTP, this setting will be used when upgrading a connection to, say, a WebSocket.

Tunnels are usually long-lived connections, so keep timeouts higher but still reasonable. Also be sure to set the timeout client-fin parameter!
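
If only part of your traffic is upgraded to WebSockets, you could give that traffic its own backend with a longer tunnel timeout. This is a sketch, assuming a hypothetical backend called ws_backend and an ACL that matches the Upgrade request header; the 1h value is an assumption as well:

frontend www_frontend
    bind            :80
    mode            http

    acl             is_websocket hdr(Upgrade) -i websocket

    use_backend     ws_backend if is_websocket

    default_backend www_backend

backend ws_backend                # hypothetical name
    mode         http
    timeout      tunnel 1h        # idle time allowed once the connection is upgraded
    # timeout client-fin is a client-side setting, so it stays in the
    # defaults (or frontend) block
    server       apache24_1 192.168.0.1:8080 check fall 2 inter 1s

Regular HTTP traffic keeps the 2m tunnel timeout from the defaults block, which it will rarely need.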

timeout client-fin

Set the inactivity timeout on the client side for half-closed connections.

This timeout applies when the client is expected to acknowledge or send data while one direction of the connection has already been shut down. A typical case is a client that disappears suddenly, which can happen for various reasons: networking issues, buggy clients, …

In order for these half-closed connections to be cleaned up swiftly, you should keep this timeout short so that you do not end up with a huge list of FIN_WAIT connections flooding the server. When the client is gone, it’s gone. It’ll reconnect when it needs to.

timeout server-fin

Set the inactivity timeout on the server side for half-closed connections.

Exactly the same as the client side version, but for the server side. In cloud environments where you would have several servers per backend block, closing these wonky connections swiftly will make HAProxy switch to a ‘working’ server faster to keep ‘downtime’ to an absolute minimum.

How about hosting some legacy applications?

Ok, let’s tackle the inevitable question:

“But I have some old applications that I just can’t migrate to a modern framework because no-one wants to invest in them anymore yet they’re still being actively used! I need higher timeouts!”

Sad but true, there are lots of those out there, but HAProxy can solve all of this legacy stuff for you if you harness its power properly.

Don’t be tempted to mindlessly raise timeouts as they will get exploited at some point in time!

So we’ll do it as well as we can (can’t say properly as that would mean adjusting the application itself, which we weren’t going to do) and adjust our configuration to allow an admin area with higher timeouts:

global
    user    haproxy
    group   haproxy
    pidfile /var/run/haproxy-tep.pid
    stats   socket /var/run/haproxy.stats
    maxconn 20480

defaults
    retries 3
    option  redispatch
    timeout client 30s
    timeout connect 4s
    # Timeouts added in the previous section
    timeout http-request 10s
    timeout http-keep-alive 2s
    timeout queue 5s
    timeout tunnel 2m
    timeout client-fin 1s
    timeout server-fin 1s

frontend www_frontend
    bind            :80
    mode            http

    acl             is_path_admin path_beg /admin

    use_backend     www_backend_slow_pool if is_path_admin

    default_backend www_backend

backend www_backend
    mode         http
    timeout      server 30s
    server       apache24_1 192.168.0.1:8080 check fall 2 inter 1s

backend www_backend_slow_pool
    mode         http
    timeout      server 3600s
    server       apache24_1 192.168.0.1:8080 check fall 2 inter 1s

So what happened?

The timeout server declaration has been moved to the backend blocks.
An ACL was added that detects if the requested path begins with /admin.
A use_backend statement was added to route traffic to a slow pool backend if the path ACL matches.
A slow pool backend was added with a (ridiculously) high timeout to prevent HAProxy from throwing a 504 Gateway Timeout because of slow server responses.

These timeouts will obviously only affect HAProxy: the server behind the slow pool backend must be set up with its own suitably high timeouts on various levels so that it won’t time out for the admin area. If that were to happen, HAProxy would simply relay the server’s own error response, or return a 502 Bad Gateway if the server dropped the connection without sending a valid response.

Preventing high timeout abuse/exploitation

You can go further and set up a separate instance that will only handle the slow requests and give it, say, an admin.example.com domain. You can add an ACL to route all traffic for that domain to the slow pool and even add some IP restrictions, basic auth protection etc. to prevent abuse of the high timeouts, because that is a valid concern indeed.

Offloading this traffic to a separate instance will also make sure regular users are not impacted due to lots of slow processes on the server eating up the connection or worker pool.
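
The domain-based routing with IP restrictions and basic auth could look roughly like the sketch below. For brevity it shows the rules inside the existing frontend instead of a separate instance. The userlist name, the credentials and the trusted network 192.0.2.0/24 are all made-up placeholders:

userlist admin_users                          # hypothetical user list
    user admin insecure-password changeme     # placeholder credentials

frontend www_frontend
    bind            :80
    mode            http

    acl             is_host_admin    hdr(host) -i admin.example.com
    acl             is_trusted_ip    src 192.0.2.0/24       # assumed office network
    acl             is_authenticated http_auth(admin_users)

    # Only trusted addresses may reach the admin domain at all
    http-request    deny if is_host_admin !is_trusted_ip
    # And they still have to log in
    http-request    auth realm Admin if is_host_admin !is_authenticated

    use_backend     www_backend_slow_pool if is_host_admin

    default_backend www_backend

Keep in mind that basic auth sends credentials with every request, so in practice you would want to serve this domain over TLS instead of plain port 80.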