Go makes creating TCP servers deceptively simple: create a listener, accept connections, spawn some goroutines for reading from and writing to the connection, and presto! You’ve got yourself a TCP server. This isn’t enough, though, to confidently deploy your server on the public Internet. Resources on how to do this are scant, so in this post I’ve compiled the things I’ve learned deploying production TCP servers as part of my work at Kyokan. While the examples I’ll give will be written in Go, many of the principles described here are applicable to other languages too.
To get things started, let’s assume that we’re building a simple TCP server that accepts newline-delimited sequences of strings. When the server receives a string, it will respond with the string in all capital letters. The basic code for this server might look like this:
package main

import (
	"bufio"
	"log"
	"net"
	"strings"
)

func main() {
	lis, err := net.Listen("tcp", ":1234")
	if err != nil {
		log.Fatalf("failed to start listener: %v", err)
	}
	for {
		conn, err := lis.Accept()
		if err != nil {
			log.Printf("failed to accept conn: %v", err)
			continue
		}
		go handleConn(conn)
	}
}

func handleConn(conn net.Conn) {
	defer conn.Close()
	done := make(chan struct{})
	go func() {
		// close done on every exit path, including a failed write,
		// so that handleConn always returns and closes the connection
		defer close(done)
		scan := bufio.NewScanner(conn)
		for scan.Scan() {
			input := scan.Text()
			output := strings.ToUpper(input)
			if _, err := conn.Write([]byte(output + "\n")); err != nil {
				log.Printf("failed to write output: %v", err)
				return
			}
			log.Printf("wrote response: %s", output)
		}
	}()
	<-done
}
This code should generally match most online tutorials. We create a listener, accept new connections, then loop forever while we wait to receive data on the socket. Once we receive some data, we process it and write the output to the socket.
Now, on to making this code more robust.
1. Close Connections On Errors
You may notice that in the above code, a failure to read from or write to the socket will trigger the deferred conn.Close() in handleConn. This is deliberate. Any IO error from a socket means that the connection is unstable and should be closed. Why is this? Well, Go’s networking libraries wrap raw syscalls, and errors returned by these syscalls are generally non-recoverable. For example, calling conn.Write in the server code above on a Linux machine ends up in the write(2) syscall. write(2) has 13 defined error codes, all of which denote either problems with the host system or unrecoverable network errors (with the exception of EAGAIN and EWOULDBLOCK, which Go’s standard library handles for you). Rather than trying to salvage the socket in the rare instances where doing so is possible, it is far safer to simply close the connection.
Note that this advice doesn’t apply to application-level logic. Let’s pretend for a moment that our server reads JSON objects from the socket rather than simple strings. A JSON parse error in this case might be recoverable depending on the application.
The code above handles this case out of the box, so we don’t need to change anything in our example just yet.
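To make the JSON scenario mentioned above a little more concrete, here is a minimal sketch of how a recoverable application-level error might be kept separate from IO errors. The request type, its text field, the respond helper, and the error reply format are all hypothetical and exist only for this illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// request is a hypothetical message format used only for this sketch.
type request struct {
	Text string `json:"text"`
}

// respond turns one input line into one output line. A malformed line
// is an application-level problem: it produces an error reply, and the
// caller can keep the connection open.
func respond(line []byte) string {
	var req request
	if err := json.Unmarshal(line, &req); err != nil {
		return `{"error":"invalid JSON"}`
	}
	return strings.ToUpper(req.Text)
}

func main() {
	fmt.Println(respond([]byte(`{"text":"hello"}`)))
	fmt.Println(respond([]byte(`not json`)))
}
```

In a handler built this way, only a failed read or a failed conn.Write would end the loop and close the connection; a bad message would just earn the client an error reply.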
2. Bound All Reads
TCP is a stream-oriented protocol, and Go’s standard io.Reader and io.Writer interfaces make interacting with streams extremely simple. However, this simplicity can come at a cost. In our current example, we’re passing net.Conn (which is an io.ReadWriter) directly into an instance of bufio.Scanner. If a client sent an endless stream of bytes to our server without including a newline, the scanner would buffer up to its default limit of 64 KiB (bufio.MaxScanTokenSize) for that connection before giving up, and an unbounded reader such as bufio.Reader.ReadString would keep buffering until our application ran out of memory and crashed. Not good. Luckily, Go makes solving this easy. First, let’s define a constant called MaxLineLenBytes with the maximum line length in bytes we’ll support:
// near the top of the file
const MaxLineLenBytes = 1024
Then, amend the reader goroutine in handleConn as follows:
go func() {
	// close done on every exit path, including a failed write
	defer close(done)
	// limit the maximum line length (in bytes);
	// this needs "io" added to the imports
	lim := &io.LimitedReader{
		R: conn,
		N: MaxLineLenBytes,
	}
	scan := bufio.NewScanner(lim)
	for scan.Scan() {
		input := scan.Text()
		output := strings.ToUpper(input)
		if _, err := conn.Write([]byte(output + "\n")); err != nil {
			log.Printf("failed to write output: %v", err)
			return
		}
		log.Printf("wrote response: %s", output)
		// reset the number of bytes remaining in the LimitedReader
		lim.N = MaxLineLenBytes
	}
}()
In effect, we’ve wrapped our net.Conn in an io.LimitedReader object that will return an io.EOF error in the event that more than MaxLineLenBytes bytes are read from the connection. Every time a line is read, we reset the number of bytes remaining to MaxLineLenBytes to ensure that we only limit the number of bytes per line rather than the total number of bytes per connection. Always make sure to set some limit on the maximum amount of data your server will process before forcibly closing the connection.
Note that in this scenario, io.LimitedReader returning io.EOF would cause the reader goroutine to return and the connection to be closed. Your application may want to process oversized messages differently, in which case io.LimitedReader may be, well, too limiting, since it’s not possible to tell an io.EOF coming from the io.LimitedReader apart from one coming from the underlying net.Conn. In cases like this, it’s relatively trivial to either write your own version of a limited reader or use another utility from the standard library such as http.MaxBytesReader.
3. Set Timeouts
TCP connections are routed through numerous machines before finally reaching your server. Any one of these machines or the client itself could fail and not send the FIN or RST segments that signal the end of the connection. As such, it’s important to set reasonable timeouts to ensure that dead connections eventually get closed in order to prevent resource leaks. Go’s APIs allow read and write timeouts to be set separately. Doing this is easy. Add a constant, ReadWriteTimeout:
// near the top of the file
const ReadWriteTimeout = time.Minute
Then, amend handleConn as follows:
func handleConn(conn net.Conn) {
	defer conn.Close()
	done := make(chan struct{})
	// time out one minute from now if no
	// data is received. the error can be
	// safely ignored. this needs "time"
	// added to the imports.
	_ = conn.SetReadDeadline(time.Now().Add(ReadWriteTimeout))
	go func() {
		// close done on every exit path, including a failed write
		defer close(done)
		// limit the maximum line length (in bytes)
		lim := &io.LimitedReader{
			R: conn,
			N: MaxLineLenBytes,
		}
		scan := bufio.NewScanner(lim)
		for scan.Scan() {
			input := scan.Text()
			output := strings.ToUpper(input)
			if _, err := conn.Write([]byte(output + "\n")); err != nil {
				log.Printf("failed to write output: %v", err)
				return
			}
			log.Printf("wrote response: %s", output)
			// reset the number of bytes remaining in the LimitedReader
			lim.N = MaxLineLenBytes
			// reset the read deadline
			_ = conn.SetReadDeadline(time.Now().Add(ReadWriteTimeout))
		}
	}()
	<-done
}
Note that Go uses deadlines, so we have to turn our timeout value into an explicit point-in-time at which the connection will time out.
Now, clients have to send data at least once every 60 seconds for the connection to remain open.
4. Use Application-Level Keepalives
TCP supports keepalives, which are null-data ACK segments designed to prevent connections from being closed due to inactivity. In practice, however, TCP keepalives have numerous issues that prevent them from being reliable enough for production use:
- Not all TCP implementations support them.
- They interact in subtle, frustrating ways with TCP_USER_TIMEOUT.
- They can cause network congestion.
- Some firewalls filter out keepalive packets.
- The way TCP keepalives are implemented (i.e., ACK segments with no data) means that they may not be reliably transmitted by TCP.
So, the solution is to use application-level keepalives, which ensure that messages testing the status of a connection are reliably sent. There are many ways to do this. One is to have peers send each other ping/pong messages on a fixed interval. This isn’t a requirement for our TCP server. However, if your application must reliably determine the state of its connected peers, then you should absolutely develop your own keepalive mechanism rather than leave it up to TCP.
In Summary
Here’s our server, ready to be deployed (I added some additional logging):
package main

import (
	"bufio"
	"io"
	"log"
	"net"
	"strings"
	"time"
)

const MaxLineLenBytes = 1024
const ReadWriteTimeout = time.Minute

func main() {
	lis, err := net.Listen("tcp", ":1234")
	if err != nil {
		log.Fatalf("failed to start listener: %v", err)
	}
	for {
		conn, err := lis.Accept()
		if err != nil {
			log.Printf("failed to accept conn: %v", err)
			continue
		}
		go handleConn(conn)
	}
}

func handleConn(conn net.Conn) {
	log.Printf("accepted connection from %s", conn.RemoteAddr())
	defer func() {
		_ = conn.Close()
		log.Printf("closed connection from %s", conn.RemoteAddr())
	}()
	done := make(chan struct{})
	// time out one minute from now if no
	// data is received
	_ = conn.SetReadDeadline(time.Now().Add(ReadWriteTimeout))
	go func() {
		// close done on every exit path, including a failed write,
		// so that handleConn always returns and closes the connection
		defer close(done)
		// limit the maximum line length (in bytes)
		lim := &io.LimitedReader{
			R: conn,
			N: MaxLineLenBytes,
		}
		scan := bufio.NewScanner(lim)
		for scan.Scan() {
			input := scan.Text()
			output := strings.ToUpper(input)
			if _, err := conn.Write([]byte(output + "\n")); err != nil {
				log.Printf("failed to write output: %v", err)
				return
			}
			log.Printf("wrote response: %s", output)
			// reset the number of bytes remaining in the LimitedReader
			lim.N = MaxLineLenBytes
			// reset the read deadline
			_ = conn.SetReadDeadline(time.Now().Add(ReadWriteTimeout))
		}
	}()
	<-done
}
To play with it, compile and run the program. Then, in another terminal, run nc localhost 1234. Anything you enter followed by a newline should be echoed back to you in all caps. Try experimenting with some of the issues we’ve covered here: send it a super long line, wait for the timeout to close the connection, etc. If you run it on a non-local machine, see what happens when you disable your development machine’s wifi.
Next Steps
In a separate post, I’ll expand upon some useful ways to structure Go TCP servers. I’ll also touch upon how to build TCP wire protocols - i.e., the bytes that get sent across the network - that are robust and efficient.