Sockets and TCP Connections
Last time I wrote about UDP, a simple but unreliable abstraction over IP. For reliable data transfer, the key protocol is TCP. However, this reliability comes at a cost: TCP is significantly more complicated than UDP.
Rather than slog through RFC 793, we can start by looking at Python’s socket library. This is just a Pythonic wrapper around BSD sockets. These allow you to interact with network interfaces more-or-less as if they were files (e.g. with read and write, or their counterparts recv and send).
It’s pretty easy to write a simple UDP server in Python, and since there are many tutorials available on the web I will keep the description brief. In the following example, the server accepts messages and then echoes them back to the sender:
import socket

def udp_server(host: str, port: int):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    while True:
        msg, client = sock.recvfrom(2048)
        sock.sendto(msg, client)
You can tell that this is UDP because the socket type is SOCK_DGRAM. A toy client might look like the following:
def udp_client(addr: str, port: int):
    dest = (addr, port)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    msg = input('Please enter a message: ')
    sock.sendto(msg.encode(), dest)
    response, _ = sock.recvfrom(2048)
    print(response.decode())
All this does is get a message from the user, send it to the server, and then print out the response.
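To try the UDP echo without opening two terminals, here is a minimal sketch that runs both ends in one process. The helper names udp_echo_once and udp_round_trip are made up for this illustration, and binding to port 0 (so the OS picks a free port) and the 5 second timeouts are assumptions of the demo, not part of the server and client above.

```python
import socket
import threading


def udp_echo_once(sock: socket.socket) -> None:
    """Echo a single datagram back to whoever sent it, then return."""
    msg, client = sock.recvfrom(2048)
    sock.sendto(msg, client)


def udp_round_trip(payload: bytes) -> bytes:
    """Send payload to a local one-shot echo server and return the reply."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.settimeout(5.0)
    # Port 0 asks the OS for any free port (a choice made for this demo).
    server.bind(("127.0.0.1", 0))
    server_addr = server.getsockname()

    # The server side runs in a background thread, handling one datagram.
    worker = threading.Thread(target=udp_echo_once, args=(server,))
    worker.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(5.0)
    try:
        # No connect step: the destination is given with every datagram.
        client.sendto(payload, server_addr)
        response, _ = client.recvfrom(2048)
    finally:
        client.close()
        worker.join()
        server.close()
    return response


if __name__ == "__main__":
    print(udp_round_trip(b"hello over UDP").decode())
```

Note that neither side sets anything up beyond creating (and, for the server, binding) a socket; each sendto call simply names its destination.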
When we move to TCP, things get a bit more complicated. A similar server (but now with type SOCK_STREAM) requires a bit more set-up:
def tcp_server(host: str, port: int):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, port))
    listener.listen(1)
    while True:
        conn, _ = listener.accept()
        msg = conn.recv(2048)
        conn.send(msg)
        conn.close()
Here we have a listener socket which is bound to the given (host, port) pair. It then notifies the operating system that it is ready to listen for incoming connections.
The TCP client now has to explicitly connect, as shown below:
def tcp_client(addr: str, port: int):
    dest = (addr, port)
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(dest)
    msg = input('Please enter a message: ')
    sock.send(msg.encode())
    response = sock.recv(2048)
    print(response.decode())
On the server side, the accept call responds to the client’s connect. This also produces a new socket (conn above), dedicated to that specific connection. This is a lot more complicated than the UDP counterpart – what’s going on here?
This connect/accept logic handles the TCP connection establishment. The first TCP segment (roughly, the TCP equivalent of a packet) sent from the client to the server has a special flag in the header (SYN) to indicate that it wants a connection. This segment is sent to listener. Under the hood, there is a brief back-and-forth with the client (this is the “three-way handshake” described in the Wikipedia article). After that, the accept call on the server returns a new socket. Subsequent TCP segments to and from the client carry normal data, and use this second socket.
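The two-socket arrangement can be observed directly. The following is a rough sketch, not a definitive test: handshake_demo is a name invented here, and the loopback address, port 0 and timeouts are assumptions of the demo. It relies on the operating system completing the handshake for a listening socket and queueing the connection, so that connect can return before accept is called; this is the usual behavior, which is why a single thread is enough.

```python
import socket


def handshake_demo() -> dict:
    """Connect one client to a listener and report which sockets are involved."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(5.0)
    # Port 0 lets the OS pick a free port (a choice made for this demo).
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    server_addr = listener.getsockname()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(5.0)
    try:
        # connect() triggers the handshake with the listening socket.
        client.connect(server_addr)
        # accept() hands back a second socket dedicated to this connection.
        conn, peer = listener.accept()
        try:
            return {
                # The connection socket is a different OS-level socket.
                "distinct_socket": conn.fileno() != listener.fileno(),
                # It still uses the listener's local address and port.
                "same_local_address": conn.getsockname() == server_addr,
                # Its remote end is the client's own address and port.
                "peer_is_client": peer == client.getsockname(),
            }
        finally:
            conn.close()
    finally:
        client.close()
        listener.close()


if __name__ == "__main__":
    for fact, holds in handshake_demo().items():
        print(fact, holds)
```

The listener itself never carries application data here; it only hands out connection sockets.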
Notice also that in the code above, the TCP server could have multiple connections (and so multiple sockets) associated with the same port. What we see here is TCP multiplexing. It can be helpful to contrast with UDP. When a UDP datagram arrives at a host, it is routed to the appropriate socket simply by looking at the pair (dest addr, dest port). This means that for a given UDP port, all incoming packets are routed to the same socket, regardless of sender. With TCP, each connection is uniquely identified by the tuple (source addr, source port, dest addr, dest port). This allows multiple connections on the same destination port, as long as they originate from different sources. But because the listener handles the communication before the TCP connection is established, we can only bind one socket to a given (dest addr, dest port) pair.
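A small experiment makes the multiplexing concrete. This is a sketch under stated assumptions: tcp_multiplexing_demo is a hypothetical helper, everything runs over loopback on an OS-chosen port, and the second bind is expected to fail only because no socket options such as SO_REUSEADDR or SO_REUSEPORT are set. Two clients connect to the same listener; the code collects each connection's four-tuple, checks that data arrives on the matching socket, and then tries to bind a second socket to the listener's address.

```python
import socket


def tcp_multiplexing_demo():
    """Open two connections to one port and inspect how they are told apart."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(5.0)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS choose (demo only)
    listener.listen(2)
    server_addr = listener.getsockname()

    clients, conns = [], []
    try:
        for _ in range(2):
            client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            client.settimeout(5.0)
            clients.append(client)
            client.connect(server_addr)
            conn, _ = listener.accept()
            conn.settimeout(5.0)
            conns.append(conn)

        # (source addr, source port, dest addr, dest port) per connection,
        # as seen from the server side.
        four_tuples = [c.getpeername() + c.getsockname() for c in conns]

        # Each client's bytes arrive only on its own connection socket.
        for i, client in enumerate(clients):
            client.sendall(f"from client {i}".encode())
        received = [c.recv(2048) for c in conns]

        # A second socket cannot be bound to the listener's address.
        second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            second.bind(server_addr)
            second_bind_failed = False
        except OSError:
            second_bind_failed = True
        finally:
            second.close()
    finally:
        for sock in conns + clients + [listener]:
            sock.close()
    return four_tuples, received, second_bind_failed


if __name__ == "__main__":
    tuples, received, bind_failed = tcp_multiplexing_demo()
    for four_tuple, data in zip(tuples, received):
        print(four_tuple, data)
    print("second bind failed:", bind_failed)
```

The two four-tuples share the same destination address and port and differ only in the source port, which is all the operating system needs to route each segment to the right socket.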
Anyways, this post is something of a teaser – there’s still a lot more to say about TCP. Next time we’ll break down the structure of a TCP segment and explore how sequence numbers and acknowledgements are used to ensure reliable communication.