Production TCP Servers in Go

Go makes creating TCP servers deceptively simple: create a listener, accept connections, spawn some goroutines for reading from and writing to the connection, and presto! You’ve got yourself a TCP server. This isn’t enough, though, to confidently deploy your server on the public Internet. Resources on how to do this are scant, so in this post I’ve compiled the things I’ve learned deploying production TCP servers as part of my work at Kyokan. While the examples I’ll give will be written in Go, many of the principles described here are applicable to other languages too.

To get things started, let’s assume that we’re building a simple TCP server that accepts newline-delimited strings. When the server receives a string, it will respond with the same string in all capital letters. The basic code for this server might look like this:

package main

import (
	"bufio"
	"log"
	"net"
	"strings"
)

func main() {
	lis, err := net.Listen("tcp", ":1234")
	if err != nil {
		log.Fatalf("failed to start listener: %v", err)
	}

	for {
		conn, err := lis.Accept()
		if err != nil {
			log.Printf("failed to accept conn: %v", err)
			continue
		}

		go handleConn(conn)
	}
}

func handleConn(conn net.Conn) {
	defer conn.Close()
	done := make(chan struct{})

	go func() {
		// close done on every exit path, including the early return
		// below, so that handleConn unblocks and closes the connection
		defer close(done)

		scan := bufio.NewScanner(conn)
		for scan.Scan() {
			input := scan.Text()
			output := strings.ToUpper(input)
			if _, err := conn.Write([]byte(output + "\n")); err != nil {
				log.Printf("failed to write output: %v", err)
				return
			}
			log.Printf("wrote response: %s", output)
		}
	}()

	<-done
}

This code should generally match most online tutorials. We create a listener, accept new connections, then loop forever while we wait to receive data on the socket. Once we receive some data, we process it and write the output to the socket.

Now, on to making this code more robust.

1. Close Connections On Errors

You may notice that in the above code, a failure to read from or write to the socket will trigger the deferred conn.Close() in handleConn. This is deliberate. Any IO error from a socket means that the connection is unstable and should be closed. Why is this? Well, Go’s networking libraries wrap raw syscalls, and errors returned by these syscalls are generally non-recoverable. For example, calling conn.Write in the server code above on a Linux machine would invoke the write(2) syscall. write(2) has 13 defined error codes, all of which denote either problems with the host system or unrecoverable network errors (with the exception of EAGAIN and EWOULDBLOCK, which Go’s standard library handles for you). Rather than trying to salvage the socket in the rare instances where doing so is possible, it is far safer to simply close the connection.

Note that this advice doesn’t apply to application-level logic. Let’s pretend for a moment that our server reads JSON objects from the socket rather than simple strings. A JSON parse error in this case might be recoverable depending on the application.
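To make the distinction concrete, here is a minimal sketch of a JSON variant of the server. The message type, its text field, and the error reply format are all assumptions invented for illustration. A parse error produces an error reply and the loop keeps going, while a write error still ends the connection. net.Pipe stands in for a real socket so the sketch runs on its own.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"log"
	"net"
	"strings"
)

// message is a hypothetical wire format used only for this sketch.
type message struct {
	Text string `json:"text"`
}

// handleLine turns one input line into one reply line. A JSON parse error
// is an application-level problem, so it becomes an error reply instead of
// a reason to drop the connection.
func handleLine(line []byte) string {
	var in message
	if err := json.Unmarshal(line, &in); err != nil {
		return `{"error":"invalid JSON"}`
	}
	out, err := json.Marshal(message{Text: strings.ToUpper(in.Text)})
	if err != nil {
		return `{"error":"internal error"}`
	}
	return string(out)
}

// serve reads lines until the connection fails. Only IO errors end the loop.
func serve(conn net.Conn) {
	defer conn.Close()
	scan := bufio.NewScanner(conn)
	for scan.Scan() {
		reply := handleLine(scan.Bytes())
		if _, err := conn.Write([]byte(reply + "\n")); err != nil {
			log.Printf("failed to write output: %v", err)
			return // IO error: give up on the connection
		}
	}
}

func main() {
	// net.Pipe returns an in-memory connection pair, so no real socket
	// is needed to try this out.
	client, server := net.Pipe()
	go serve(server)
	defer client.Close()

	replies := bufio.NewReader(client)
	for _, line := range []string{"not json", `{"text":"hello"}`} {
		if _, err := fmt.Fprintln(client, line); err != nil {
			log.Fatalf("failed to send: %v", err)
		}
		reply, err := replies.ReadString('\n')
		if err != nil {
			log.Fatalf("failed to read reply: %v", err)
		}
		fmt.Print(reply)
	}
}
```

Whether a malformed message should be tolerated at all is a protocol decision. A stricter server could just as reasonably close the connection after the first bad line.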

The code above handles this case out of the box, so we don’t need to change anything in our example just yet.

2. Bound All Reads

TCP is a stream-oriented protocol, and Go’s standard io.Reader and io.Writer interfaces make interacting with streams extremely simple. However, this simplicity can come at a cost. In our current example, we’re passing net.Conn (which is an io.ReadWriter) directly into an instance of bufio.Scanner. By default, bufio.Scanner gives up with bufio.ErrTooLong once a single line exceeds bufio.MaxScanTokenSize (64 KiB), so this particular server is bounded, but only by a default we never chose. Unbounded readers such as bufio.Reader.ReadString or io.ReadAll offer no such protection: a client that sent an endless stream of bytes without a newline would make the application run out of memory and crash. Not good. Luckily, Go makes it easy to set an explicit limit. First, let’s define a constant called MaxLineLenBytes with the maximum line length in bytes we’ll support:

// near the top of the file

const MaxLineLenBytes = 1024

Then, add "io" to the import block and amend the reader goroutine in handleConn as follows:

go func() {
	// close done on every exit path so that handleConn
	// unblocks and closes the connection
	defer close(done)

	// limit the maximum line length (in bytes)
	lim := &io.LimitedReader{
		R: conn,
		N: MaxLineLenBytes,
	}
	scan := bufio.NewScanner(lim)
	for scan.Scan() {
		input := scan.Text()
		output := strings.ToUpper(input)
		if _, err := conn.Write([]byte(output + "\n")); err != nil {
			log.Printf("failed to write output: %v", err)
			return
		}
		log.Printf("wrote response: %s", output)
		// reset the number of bytes remaining in the LimitedReader
		lim.N = MaxLineLenBytes
	}
}()

In effect, we’ve wrapped our net.Conn in an io.LimitedReader that returns io.EOF once MaxLineLenBytes bytes have been read from the connection. Every time a line is read, we reset the number of bytes remaining to MaxLineLenBytes to ensure that we only limit the number of bytes per line rather than the total number of bytes per connection. Because bufio.Scanner reads ahead in chunks, the bound is approximate rather than exact: bytes already sitting in the scanner’s buffer when the counter is reset are not counted against the next line. Always make sure to set some limit on the maximum amount of data your server will process before forcibly closing the connection.

Note that in this scenario, io.LimitedReader returning io.EOF makes the scanner treat whatever it has buffered as the final line, after which the reader goroutine returns and the connection is closed. Your application may want to process oversized messages differently, in which case io.LimitedReader may be, well, too limiting, since it’s not possible to tell an io.EOF produced by the io.LimitedReader apart from one produced by the underlying net.Conn. In cases like this, it’s relatively trivial to either write your own version of a limited reader or use another utility from the standard library such as http.MaxBytesReader.

3. Set Timeouts

TCP segments pass through numerous machines before finally reaching your server. Any one of these machines or the client itself could fail and not send the FIN or RST segments that signal the end of the connection. As such, it’s important to set reasonable timeouts to ensure that dead connections eventually get closed in order to prevent resource leaks. Go’s net.Conn allows read and write timeouts to be set separately, through SetReadDeadline and SetWriteDeadline. Doing this is easy. Add a constant, ReadWriteTimeout:

// near the top of the file

const ReadWriteTimeout = time.Minute

Then, add "time" to the import block and amend handleConn as follows:

func handleConn(conn net.Conn) {
	defer conn.Close()
	done := make(chan struct{})

	// time out one minute from now if no
	// data is received. the error can be
	// safely ignored.
	_ = conn.SetReadDeadline(time.Now().Add(ReadWriteTimeout))

	go func() {
		// close done on every exit path so that handleConn
		// unblocks and closes the connection
		defer close(done)

		// limit the maximum line length (in bytes)
		lim := &io.LimitedReader{
			R: conn,
			N: MaxLineLenBytes,
		}
		scan := bufio.NewScanner(lim)
		for scan.Scan() {
			input := scan.Text()
			output := strings.ToUpper(input)
			// give up on clients that stop reading our responses
			_ = conn.SetWriteDeadline(time.Now().Add(ReadWriteTimeout))
			if _, err := conn.Write([]byte(output + "\n")); err != nil {
				log.Printf("failed to write output: %v", err)
				return
			}
			log.Printf("wrote response: %s", output)
			// reset the number of bytes remaining in the LimitedReader
			lim.N = MaxLineLenBytes
			// reset the read deadline
			_ = conn.SetReadDeadline(time.Now().Add(ReadWriteTimeout))
		}
	}()

	<-done
}

Note that Go uses deadlines, so we have to turn our timeout value into an explicit point-in-time at which the connection will time out.

Now, clients have to send at least one complete line every 60 seconds for the connection to remain open, because the read deadline is only pushed out after a line has been processed.

4. Use Application-Level Keepalives

TCP supports keepalives, which are ACK segments carrying no data that are sent on an idle connection to check that the peer is still reachable and to keep the connection from being dropped for inactivity. In practice, however, TCP keepalives have numerous issues that prevent them from being reliable enough for production use:

Not all TCP implementations support them.
They interact in subtle, frustrating ways with TCP_USER_TIMEOUT.
They can cause network congestion.
Some firewalls filter out keepalive packets.
The way TCP keepalives are implemented (i.e., ACK segments with no data) means that they may not be reliably transmitted by TCP.

So, the solution is to use application-level keepalives in order to ensure that messages related to testing the status of a connection are reliably sent. There are many ways to do this. One of them is to have peers send each other ping/pong messages on a fixed interval. This isn’t a requirement for our TCP server; however, if your application must reliably determine the state of its connected peers, then you should absolutely develop your own keepalive mechanism rather than leave it up to TCP.

In Summary

Here’s our server, ready to be deployed (I added some additional logging):

package main

import (
	"bufio"
	"io"
	"log"
	"net"
	"strings"
	"time"
)

const MaxLineLenBytes = 1024
const ReadWriteTimeout = time.Minute

func main() {
	lis, err := net.Listen("tcp", ":1234")
	if err != nil {
		log.Fatalf("failed to start listener: %v", err)
	}

	for {
		conn, err := lis.Accept()
		if err != nil {
			log.Printf("failed to accept conn: %v", err)
			continue
		}

		go handleConn(conn)
	}
}

func handleConn(conn net.Conn) {
	log.Printf("accepted connection from %s", conn.RemoteAddr())

	defer func() {
		_ = conn.Close()
		log.Printf("closed connection from %s", conn.RemoteAddr())
	}()
	done := make(chan struct{})

	// time out one minute from now if no
	// data is received
	_ = conn.SetReadDeadline(time.Now().Add(ReadWriteTimeout))

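	go func() {
		// close done on every exit path so that handleConn
		// unblocks and closes the connection
		defer close(done)

		// limit the maximum line length (in bytes)
		lim := &io.LimitedReader{
			R: conn,
			N: MaxLineLenBytes,
		}
		scan := bufio.NewScanner(lim)
		for scan.Scan() {
			input := scan.Text()
			output := strings.ToUpper(input)
			// give up on clients that stop reading our responses
			_ = conn.SetWriteDeadline(time.Now().Add(ReadWriteTimeout))
			if _, err := conn.Write([]byte(output + "\n")); err != nil {
				log.Printf("failed to write output: %v", err)
				return
			}
			log.Printf("wrote response: %s", output)
			// reset the number of bytes remaining in the LimitedReader
			lim.N = MaxLineLenBytes
			// reset the read deadline
			_ = conn.SetReadDeadline(time.Now().Add(ReadWriteTimeout))
		}

		// scan.Err() is nil on EOF, so this only logs real
		// read failures such as an expired deadline
		if err := scan.Err(); err != nil {
			log.Printf("failed to read input: %v", err)
		}
	}()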

	<-done
}

To play with it, compile and run the program. Then, in another terminal, run nc localhost 1234. Anything you enter followed by a newline should be echoed back to you in all caps. Try experimenting with some of the issues we’ve covered here: send it a super long line, wait for the timeout to close the connection, etc. If you run it on a non-local machine, see what happens when you disable your development machine’s Wi-Fi.

Next Steps

In a separate post, I’ll expand upon some useful ways to structure Go TCP servers. I’ll also touch upon how to build TCP wire protocols - i.e., the bytes that get sent across the network - that are robust and efficient.