cadlag dot org

Sockets and TCP Connections

Last time I wrote about UDP, a simple but unreliable abstraction over IP. For reliable data transfer, the key protocol is TCP. However, this reliability comes at a cost: TCP is significantly more complicated than UDP.

Rather than slog through RFC 793, we can start by looking at Python’s socket library. This is just a Pythonic wrapper around BSD sockets. These allow you to interact with network interfaces more-or-less as if they were files (e.g. with read and write, or their counterparts recv and send).
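To make the file analogy concrete, here is a minimal sketch using the standard library's socket.socketpair() and socket.makefile(), which wrap a socket in an ordinary file object. The helper name file_style_round_trip is made up for illustration, and the pair of sockets is connected locally, so no network interface is involved.

```python
import socket

def file_style_round_trip(line: bytes) -> bytes:
    """Write a line into one end of a connected socket pair and read it from the other."""
    # socketpair() returns two already-connected sockets.
    a, b = socket.socketpair()
    # makefile() wraps each socket in a buffered file object with write/readline.
    with a, b, a.makefile("wb") as writer, b.makefile("rb") as reader:
        writer.write(line)
        writer.flush()  # push the buffered bytes into the socket
        return reader.readline()

if __name__ == "__main__":
    print(file_style_round_trip(b"hello\n"))
```

The explicit flush matters: the file wrapper buffers writes, so without it the reader could wait indefinitely for data that never left the buffer.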

It’s pretty easy to write a simple UDP server in Python, and since there are many tutorials available on the web I will keep the description brief. In the following example, the server accepts messages and echoes each one back to its sender:

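import socket

def udp_server(host: str, port: int):
    # SOCK_DGRAM selects UDP
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    while True:
        # Read one datagram (up to 2048 bytes) together with the sender's address
        msg, client = sock.recvfrom(2048)
        # Echo it back to whoever sent it
        sock.sendto(msg, client)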

You can tell that this is UDP because the socket type is SOCK_DGRAM. A toy client might look like the following:

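def udp_client(addr: str, port: int):
    dest = (addr, port)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    msg = input('Please enter a message: ')
    # No connection set-up: each datagram carries its destination address
    sock.sendto(msg.encode(), dest)
    # Blocks until a reply arrives; if a datagram is lost, this waits forever
    response, _ = sock.recvfrom(2048)
    print(response.decode())
    sock.close()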

All this does is get a message from the user, send it to the server, and then print out the response.
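To try the echo exchange without two terminals, the following sketch runs a one-shot version of the server in a background thread and a client in the main thread, both on the loopback interface. The names echo_once and udp_round_trip are invented for this example, and the five-second timeouts are an arbitrary safeguard rather than anything the original code does.

```python
import socket
import threading

def echo_once(sock: socket.socket) -> None:
    # Serve exactly one datagram, then return (the server above loops forever).
    msg, client = sock.recvfrom(2048)
    sock.sendto(msg, client)

def udp_round_trip(payload: bytes) -> bytes:
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.settimeout(5)
    # Port 0 asks the OS for any free port; getsockname() reports the choice.
    server.bind(("127.0.0.1", 0))
    worker = threading.Thread(target=echo_once, args=(server,))
    worker.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(5)  # UDP gives no delivery guarantee, so never block forever
    client.sendto(payload, server.getsockname())
    response, _ = client.recvfrom(2048)

    worker.join()
    client.close()
    server.close()
    return response

if __name__ == "__main__":
    print(udp_round_trip(b"hello"))
```

Binding the server before starting the thread means the datagram has somewhere to land even if the client sends before the thread reaches recvfrom.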

When we move to TCP, things get more complicated. A similar server (now with socket type SOCK_STREAM) requires some extra set-up:

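def tcp_server(host: str, port: int):
    # SOCK_STREAM selects TCP
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, port))
    listener.listen(1)
    while True:
        # Block until a client connects; conn is a new socket for that client
        conn, _ = listener.accept()
        msg = conn.recv(2048)
        # sendall keeps sending until every byte is written; send may stop early
        conn.sendall(msg)
        conn.close()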

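Here we have a listener socket which is bound to the given (host, port) pair. The listen call then notifies the operating system that the socket is ready to receive incoming connections; its argument is the backlog, roughly the number of not-yet-accepted connections the OS will queue.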

The TCP client now has to explicitly connect, as shown below:

def tcp_client(addr: str, port: int):
    dest = (addr, port)
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Establish the connection before any data is sent
    sock.connect(dest)
    msg = input('Please enter a message: ')
    # sendall keeps sending until every byte is written; send may stop early
    sock.sendall(msg.encode())
    response = sock.recv(2048)
    print(response.decode())
    sock.close()

On the server side, the accept call is the counterpart to the client’s connect. It produces a new socket (conn above), dedicated to that specific connection. This is a lot more complicated than the UDP counterpart, so what’s going on here?

This connect/accept logic handles the TCP connection establishment. The first TCP segment (roughly, the TCP equivalent of a packet) sent from the client to the server has a special flag, SYN, set in its header to indicate that it wants a connection. This segment is directed to the listener socket. Under the hood, there is a brief back-and-forth with the client (this is the “three-way handshake” described in the Wikipedia article). After that, the accept call on the server returns a new socket. Subsequent TCP segments to and from the client carry normal data, and use this second socket.
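One way to see the two server-side sockets is to open a single loopback connection and print the addresses each socket reports. This is only a sketch: inspect_accept and the dictionary keys are names chosen for the example, and it relies on the OS completing the handshake and queueing the connection before accept is called, which is why it can run in a single thread.

```python
import socket

def inspect_accept() -> dict:
    """Open one loopback TCP connection and report the addresses of each socket."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The OS performs the handshake and queues the connection, so connect()
    # returns here even though accept() has not been called yet.
    client.connect(listener.getsockname())
    conn, peer = listener.accept()

    info = {
        "listener": listener.getsockname(),
        "conn_local": conn.getsockname(),
        "conn_remote": conn.getpeername(),
        "client_local": client.getsockname(),
        "accept_peer": peer,
        "distinct_sockets": conn.fileno() != listener.fileno(),
    }
    for s in (conn, client, listener):
        s.close()
    return info

if __name__ == "__main__":
    for name, value in inspect_accept().items():
        print(name, value)
```

The connection socket shares the listener's local address and port, while its remote address is the client's own (addr, port) pair.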

Notice also that in the code above, the TCP server could have multiple connections (and so multiple sockets) associated with the same port. This is TCP multiplexing, and it helps to contrast it with UDP. When a UDP datagram arrives at a host, it is routed to the appropriate socket simply by looking at the pair (dest addr, dest port). This means that for a given UDP port, all incoming packets are routed to the same socket, regardless of sender. With TCP, each connection is uniquely identified by the tuple (source addr, source port, dest addr, dest port). This allows multiple connections on the same destination port, as long as each comes from a different (source addr, source port) pair. But because the listener handles the communication before the TCP connection is established, by default only one listening socket can be bound to a given (dest addr, dest port) pair.
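The multiplexing behaviour can be observed on the loopback interface. In the sketch below (demo_multiplexing and its result keys are hypothetical names), two clients connect to the same listener, each sends one distinct byte, and the server checks that each connection socket received the byte from its own peer. It also tries to bind a second socket to the listener's address, which should fail as long as no address-reuse socket options are set.

```python
import socket

def demo_multiplexing() -> dict:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(2)
    addr = listener.getsockname()

    # Two clients connect to the same (dest addr, dest port).
    clients = [socket.create_connection(addr) for _ in range(2)]
    conns = [listener.accept()[0] for _ in clients]

    # Each client sends one distinct byte, keyed by its own (addr, port).
    sent = {}
    for tag, client in zip((b"A", b"B"), clients):
        client.sendall(tag)
        sent[client.getsockname()] = tag

    # Each server-side socket reads only what its own peer sent.
    received = {conn.getpeername(): conn.recv(1) for conn in conns}

    # A second socket cannot bind to the listener's (addr, port) by default.
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        second.bind(addr)
        second_bind_failed = False
    except OSError:
        second_bind_failed = True

    result = {
        "local_addrs": {conn.getsockname() for conn in conns},
        "remote_addrs": set(received),
        "routed_correctly": received == sent,
        "second_bind_failed": second_bind_failed,
    }
    for s in (*conns, *clients, second, listener):
        s.close()
    return result

if __name__ == "__main__":
    print(demo_multiplexing())
```

Both connection sockets report the same local (addr, port), so the only thing that tells them apart is the source half of the four-tuple.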

Anyway, this post is something of a teaser, and there’s still a lot more to say about TCP. Next time we’ll break down the structure of a TCP segment and explore how sequence numbers and acknowledgements are used to ensure reliable communication.
