21/11/2018

Regex: its history and how it works

Regex (short for Regular Expressions) is a tool that anyone involved in development is likely to find very useful. Regular expressions are supported in numerous programming languages, PHP and JavaScript among them.

Their theoretical roots go back to work on neural networks in the early forties, and they were formalised by the mathematician Stephen Kleene in the fifties. They reached the IT world in the late sixties, when Ken Thompson, considered to be one of the Unix pioneers, built them into a version of the QED text editor, and they spread through the Unix environment in the seventies.

This line-oriented editor and its Unix successor, ed, offered users a command written g/re/p, short for "global regular expression print". This was soon turned into an independent application, grep.

At least until the late eighties, regular expressions were not especially popular outside the Unix world. Things changed with the arrival of programming languages such as Perl (first released in 1987 and very popular thanks to its versatility), which supported them from the start.

That concludes the historical background, so we will now consider the practical side of this instrument. Its main purposes are checking that text follows an expected syntax and manipulating alphanumeric sequences. To validate a sequence, it is compared against one or more models, known as patterns.
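As a minimal sketch of what comparing a sequence against a pattern looks like in practice, the Python snippet below checks whether a string has the shape of a DD/MM/YYYY date. The pattern and the name `looks_like_date` are illustrative assumptions rather than anything prescribed here, and the check only concerns the shape of the text: it would happily accept 99/99/2018.

```python
import re

# Hypothetical pattern: two digits, a slash, two digits, a slash, four digits.
DATE_PATTERN = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{4}")

def looks_like_date(text: str) -> bool:
    """Return True when the whole string has the DD/MM/YYYY shape."""
    return DATE_PATTERN.fullmatch(text) is not None

print(looks_like_date("21/11/2018"))  # True
print(looks_like_date("21-11-2018"))  # False
```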

Regular Expressions: how are they formed?

A regex is built from several kinds of element. Here is a summary:

Literal characters, a sequence of characters to be interpreted exactly as written;
Wildcards (also called metacharacters), whose interpretation goes beyond their literal meaning;
Anchors, which indicate the specific position at which a regular expression must match;
Modifiers, which widen or narrow the portion of text that a given expression has to analyse.
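To make the list above more concrete, here is a small Python sketch that takes one pattern apart by kind of element. The pattern and the sample strings are invented for illustration, and treating the quantifier `+` as the "modifier" is only one possible reading of that term.

```python
import re

# Hypothetical pattern, split by the kind of element involved:
#   ^       anchor: the match must start at the beginning of the string
#   item    literal characters, matched exactly as written
#   .       wildcard: any single character except a line break
#   [0-9]+  a digit followed by the quantifier + ("one or more")
#   $       anchor: the match must reach the end of the string
pattern = re.compile(r"^item.[0-9]+$")

samples = ["item-42", "item_7", "item-", "my item-42"]
results = {s: bool(pattern.search(s)) for s in samples}
print(results)
```

Only the first two samples match: "item-" has no digit after the separator, and "my item-42" does not begin with the literal text.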

The engine that runs regular expressions changes with the programming language used. This has little effect on the principal commands, which are largely the same across languages.

How to create a Regular Expression to verify an alphanumeric sequence

Now it is time to look at how to create a regex that verifies a particular sequence. This requires specific commands, each of which carries out a very precise function.

Let us start with the asterisk (*), which matches zero or more repetitions of the character or group that precedes it. We will continue with the full stop (.), which matches any single character, with the exception of a line break.
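The behaviour of these two commands can be tried out with a short Python sketch. The patterns and test strings are made up for the purpose, and `fullmatch` is used so that the whole string has to fit the pattern.

```python
import re

# "*" applies to the element just before it:
# "lo*l" means "l", then zero or more "o", then "l".
star = re.compile(r"lo*l")
star_results = [bool(star.fullmatch(s)) for s in ["ll", "lol", "loool", "lal"]]
print(star_results)  # [True, True, True, False]

# "." stands for any single character except a line break.
dot = re.compile(r"c.t")
dot_results = [bool(dot.fullmatch(s)) for s in ["cat", "c t", "c\nt", "ct"]]
print(dot_results)  # [True, True, False, False]
```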

Other commands include the circumflex (^), whose purpose is to match the start of a line. A special case is its use as the first character inside square brackets. There it negates the set, so the expression matches any character except those listed after it.
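Both roles of the circumflex can be seen in the following Python sketch. The sample text is invented, and the `re.MULTILINE` flag is a detail of Python's engine: without it, `^` only matches at the very start of the string.

```python
import re

# Outside brackets, "^" ties the match to the start of a line.
# re.MULTILINE makes "^" apply to every line, not only the first.
text = "error: disk full\nwarning: error ignored\nerror: timeout"
line_starts = re.findall(r"^error", text, flags=re.MULTILINE)
print(len(line_starts))  # 2

# As the first character inside brackets, "^" negates the set:
# "[^0-9]" matches any single character that is not a digit.
non_digits = re.findall(r"[^0-9]", "a1b22c")
print(non_digits)  # ['a', 'b', 'c']
```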

Also of note is the dollar sign ($) which, placed at the end of a regex, matches the end of a line. Square brackets are also very important, as they enclose the set of characters accepted at a given position. Depending on their content, for instance a range such as a-z, a single pair of brackets can accept many different characters.
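A brief Python sketch of the dollar sign and of square brackets follows. The file names and words are arbitrary examples chosen for this illustration.

```python
import re

# "$" ties the match to the end of the string
# (or to the end of each line when re.MULTILINE is set).
ends_with_txt = re.compile(r"txt$")
print(bool(ends_with_txt.search("notes.txt")))      # True
print(bool(ends_with_txt.search("notes.txt.bak")))  # False

# Brackets list the characters accepted at one position;
# a hyphen between two characters gives a range.
spellings = re.findall(r"gr[ae]y", "grey, gray, groy")
print(spellings)  # ['grey', 'gray']
first_letters = re.findall(r"[a-c]", "abcdef")
print(first_letters)  # ['a', 'b', 'c']
```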

Another command on the list is the backslash (\), the escape character for special characters. In other words, the special character that follows it is interpreted literally.
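The effect of escaping is easiest to see side by side. In this Python sketch the number 3.14 is just a convenient example; `re.escape` is a standard-library helper that inserts the backslashes automatically.

```python
import re

# Unescaped, "." is a wildcard, so the pattern "3.14" also accepts "3x14".
unescaped = bool(re.fullmatch(r"3.14", "3x14"))
print(unescaped)  # True

# Escaped, "\." matches only a literal full stop.
escaped_wrong = bool(re.fullmatch(r"3\.14", "3x14"))
escaped_right = bool(re.fullmatch(r"3\.14", "3.14"))
print(escaped_wrong, escaped_right)  # False True

# re.escape adds the backslashes when the text comes from elsewhere.
safe = re.escape("a*b")
print(safe)  # a\*b
```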

Let us conclude the list of regular expression commands by mentioning escaped angle brackets (\< and \>), which in tools such as GNU grep mark the beginning and the end of a word respectively.
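Word boundaries can be tried out as follows. Note that Python's `re` module has no separate start-of-word and end-of-word markers; the sketch uses `\b`, which matches at either edge of a word, as the nearest equivalent. The sample sentence is invented.

```python
import re

text = "cat concatenate cat. bobcat"

# Without boundaries, "cat" is also found inside longer words.
anywhere = re.findall(r"cat", text)
print(len(anywhere))  # 4

# "\b" marks a word edge, so only the standalone word matches.
whole_words = re.findall(r"\bcat\b", text)
print(len(whole_words))  # 2
```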

A thorough analysis of regex has to mention their role for those who work in SEO. Regular expressions can in fact be used inside .htaccess files when a 301 redirect is required for an entire website.
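The idea behind a site-wide redirect rule can be sketched in Python. This is not .htaccess syntax, only an illustration of the pattern such a rule typically relies on: capture the whole request path and reuse it on the new address. The domain `new.example.com` and the function name `redirect_target` are placeholders.

```python
import re

# Hypothetical rule: every path on the old site maps to the same path on the new one.
# "^(.*)$" captures the whole request path in group 1.
RULE = re.compile(r"^(.*)$")
NEW_SITE = "https://new.example.com"  # placeholder domain

def redirect_target(path: str) -> str:
    """Return the URL a permanent (301) redirect would point to for this path."""
    match = RULE.match(path)
    if match is None:  # e.g. a path containing a line break
        return NEW_SITE + "/"
    return NEW_SITE + match.group(1)

print(redirect_target("/blog/regex-history"))
# https://new.example.com/blog/regex-history
```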

Translated by Joanne Beckwith