12/06/2020

Socket: what it is and how it works

A socket is a software abstraction that directs data packets from one host to another. It is used both in internet communications (between remote hosts) and in local communications between processes on the same machine (Inter-Process Communication, or IPC).

In both cases, communicating sockets form a pair, and each socket is identified by two components: an address and a port. Together, these allow a data packet to reach the correct destination over a logical connection. Operating systems provide dedicated APIs that let applications use network sockets.

How a socket address is created

A socket address must include both the address and the port so that the two parties can communicate correctly. Each party (whether a server or a client) runs many processes, so the address alone is not enough: it is essential to know which process the data is meant for. For this reason, each process is assigned a specific port number.

A socket address is composed of an IP address (32 bits in IPv4) and a 16-bit port number. The port numbers are grouped into different sub-categories (defined by the IANA), to facilitate their identification and operation:

Well-known (reserved for specific protocols): 20, 23, 25, 80, 110,…;
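Unused: 0;
Reserved for well-known processes: 1-255;
Reserved for other processes: 256-1023;
Other applications: 1024-65535 (the IANA further divides this range into registered ports, 1024-49151, and dynamic or private ports, 49152-65535).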
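To make the sizes concrete, here is a minimal sketch in Python of how an IPv4 address and a port number fit into 4 + 2 bytes. The function names pack_socket_address and unpack_socket_address are hypothetical helpers invented for this illustration; only socket.inet_aton, socket.inet_ntoa and struct come from the standard library.

```python
import socket
import struct


def pack_socket_address(ip, port):
    """Pack an IPv4 address (32 bits) and a port (16 bits) into 6 bytes.

    Hypothetical helper for illustration; both parts use network byte order.
    """
    if not 0 <= port <= 0xFFFF:
        raise ValueError("a port number must fit in 16 bits (0-65535)")
    # inet_aton yields the 4-byte form of a dotted-quad address;
    # "!H" is an unsigned 16-bit integer in network (big-endian) order.
    return socket.inet_aton(ip) + struct.pack("!H", port)


def unpack_socket_address(raw):
    """Reverse of pack_socket_address: 6 bytes back to (ip, port)."""
    ip = socket.inet_ntoa(raw[:4])
    (port,) = struct.unpack("!H", raw[4:6])
    return ip, port
```

For example, pack_socket_address("192.168.1.10", 80) produces six bytes, and unpack_socket_address turns them back into the original pair. In everyday code the operating system's API takes care of this encoding; applications normally just pass a (host, port) tuple.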

The operating system automatically allocates port numbers to processes that do not use a well-known port. These are known as ephemeral ports. Another important aspect of a socket is its family (also known as its domain). This varies according to the type of communication protocol used, but the most noteworthy families are:

AF_INET: for communication between remote hosts over the internet;

AF_UNIX: for communication between local processes on the same machine (more precisely, on Unix systems). A socket of this family is also called a Unix Domain Socket.
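The two families, and the way the operating system hands out an ephemeral port, can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names ephemeral_port and local_pair_roundtrip are hypothetical, and the example relies on the common convention that binding to port 0 asks the system to choose a free port.

```python
import socket


def ephemeral_port():
    """Bind an AF_INET socket to port 0 and return the port the OS picked."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))      # port 0: let the OS choose
        return sock.getsockname()[1]     # the ephemeral port actually assigned


def local_pair_roundtrip(message):
    """Send a short message between two connected local sockets.

    socket.socketpair() uses AF_UNIX by default where the platform
    supports it, so no IP address or port is involved.
    """
    left, right = socket.socketpair()
    with left, right:
        left.sendall(message)
        return right.recv(1024)          # enough for a short message
```

The exact range from which ephemeral ports are drawn depends on the operating system, so the value returned by ephemeral_port() will differ from run to run.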

Types of socket and communication methods

In each family, several different types of socket can be found, which vary according to the connection method used. Datagram sockets, for example, are based on the UDP protocol and are connectionless. Stream sockets, in contrast, are connection-oriented and are based on the more reliable TCP or SCTP. There are also raw sockets, in which the packet header is accessible at the application level and the transport layer is bypassed.
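In code, the socket type is usually chosen when the socket is created. The short Python sketch below shows this for the stream and datagram types; the helper open_socket and its "stream" / "datagram" labels are hypothetical names used only for this example.

```python
import socket

# Hypothetical mapping from a readable label to the standard type constants.
SOCKET_TYPES = {
    "stream": socket.SOCK_STREAM,    # connection-oriented (TCP by default)
    "datagram": socket.SOCK_DGRAM,   # connectionless (UDP by default)
}


def open_socket(kind):
    """Create an AF_INET socket of the requested type."""
    return socket.socket(socket.AF_INET, SOCKET_TYPES[kind])


# Raw sockets (socket.SOCK_RAW) are left out of this sketch on purpose:
# creating one normally requires elevated privileges.
```

The caller is responsible for closing the returned socket, for example with a "with open_socket("stream") as sock:" block.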

Thanks to reliable protocols such as TCP and SCTP, stream sockets are more dependable, especially for online communication. They provide full-duplex, connection-oriented communication carrying a flow of bytes of variable length. With this type of socket, communication is divided into several phases:

The socket is created in both client and server. The server uses its own socket to listen via a specific port;
The connection request is made by the client to the server. If the latter accepts, a connection is created;
At this point the server creates a new specific socket for the client, known as a data socket, which is used for data exchange between the two;
The client signals the end of the message to the server, which discards the data socket and closes the connection.
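The four phases above can be sketched in Python with a server and a client running in the same program over the loopback address. This is a minimal illustration rather than a production design: the names run_server and tcp_roundtrip are hypothetical, the server handles a single client, and the message is assumed to be short.

```python
import socket
import threading


def run_server(listener):
    """Accept one client and echo back whatever it sends."""
    # Phase 3: accept() returns a new data socket dedicated to this client.
    data_socket, _client_address = listener.accept()
    with data_socket:                      # closed on exit (end of phase 4)
        while True:
            chunk = data_socket.recv(1024)
            if not chunk:                  # Phase 4: client signalled the end
                break
            data_socket.sendall(chunk)


def tcp_roundtrip(message):
    """Send a short message to a local echo server and return the reply."""
    # Phase 1: the server creates its socket and listens on a port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))    # port 0: OS picks a free port
        listener.listen(1)
        port = listener.getsockname()[1]
        worker = threading.Thread(target=run_server, args=(listener,))
        worker.start()

        # Phase 1 (client side) and phase 2: create a socket and connect.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            client.connect(("127.0.0.1", port))
            client.sendall(message)
            client.shutdown(socket.SHUT_WR)  # tell the server we are done
            reply = b""
            while True:
                chunk = client.recv(1024)
                if not chunk:              # server closed the data socket
                    break
                reply += chunk
        worker.join()
        return reply
```

Calling tcp_roundtrip(b"hello") returns the same bytes, echoed by the server. Note that the listening socket and the data socket are distinct objects: the first only accepts connections, while the second carries the actual exchange.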

Datagram sockets, based on UDP, are quicker than stream sockets (because they omit the phase of requesting a connection to the server), but also less reliable: delivery and ordering of the data are not guaranteed. In this type of socket, communication is composed of just three phases:

Creation of the socket (without need for a connection);
Sending data (the client sends the data directly to the server via the respective port numbers);
The server’s answer (continuous communication in a loop as long as there is data to send).
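For comparison, the three phases of a datagram exchange might look like this in Python. Again, this is a sketch under stated assumptions: udp_roundtrip is a hypothetical name, both ends live in the same program on the loopback address, and the server's "answer" is simply the message in upper case.

```python
import socket


def udp_roundtrip(message):
    """Send one datagram to a local server and return its answer."""
    # Phase 1: both sides create a datagram socket; no connection is made.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as server, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        server.bind(("127.0.0.1", 0))        # port 0: OS picks a free port
        server_address = server.getsockname()
        # UDP gives no delivery guarantee, so do not wait forever.
        server.settimeout(2.0)
        client.settimeout(2.0)

        # Phase 2: the client sends data straight to the server's address.
        client.sendto(message, server_address)

        # Phase 3: the server reads the datagram and answers the sender.
        data, client_address = server.recvfrom(1024)
        server.sendto(data.upper(), client_address)

        reply, _sender = client.recvfrom(1024)
        return reply
```

A real server would repeat phase 3 in a loop for as long as datagrams keep arriving. There is no accept() call and no separate data socket here, which is where the speed advantage over stream sockets comes from.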

Translated by Joanne Beckwith