Linux regular expression tutorial

Using regular expressions with tools such as grep, awk and sed to find a specific text string is a useful skill for Linux administrators. Regular expressions make searches in Linux more specific and more efficient.

As a Linux administrator, you'll need to work with text files. Different tools such as grep, awk and sed are at your disposal to find files that contain a specific text string. Here I offer an introduction to working with regular expressions to search for text in these files in a flexible manner.

Let's consider an example where regular expressions play a role. If you try a command like grep -r host /, you get a huge number of results, because every line that contains the text host matches, including lines with words like ghostscript. By using a regular expression you can be much more specific about what you are looking for. For example, you can tell grep to look only for lines that start with the text host by using the regular expression '^host'.
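
The difference can be sketched on a small scratch file instead of the whole file system. The file name and its contents below are made up for illustration:

```shell
# Create a scratch file; the name and contents are illustrative only.
tmpdir=$(mktemp -d)
printf '%s\n' 'host=server1' 'ghostscript installed' 'localhost alias' > "$tmpdir/sample.txt"

# Unanchored: counts every line that contains the string "host".
unanchored=$(grep -c 'host' "$tmpdir/sample.txt")

# Anchored: counts only lines that begin with "host".
anchored=$(grep -c '^host' "$tmpdir/sample.txt")

echo "unanchored: $unanchored, anchored: $anchored"   # unanchored: 3, anchored: 1
rm -r "$tmpdir"
```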

Regular expressions are not available for all commands -- the command that you use must be programmed to work with regular expressions. The most common examples of such commands are the grep and vi utilities. Other utilities, like sed and awk, can also work with them. The tr utility does not handle full regular expressions, but it understands the same range notation, such as a-z.
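
As a rough illustration, the same anchored pattern can be handed to grep, sed and awk. The sample input is invented for this sketch:

```shell
# Sample input; the contents are made up for illustration.
input=$(printf '%s\n' 'host=alpha' 'ghost=beta' 'host=gamma')

# grep prints the lines that start with "host".
grep_out=$(printf '%s\n' "$input" | grep '^host')

# sed -n suppresses default output; the p command prints matching lines.
sed_out=$(printf '%s\n' "$input" | sed -n '/^host/p')

# awk applies its default action (print) to lines matching the pattern.
awk_out=$(printf '%s\n' "$input" | awk '/^host/')

printf '%s\n' "$grep_out"
```

All three commands select the same two lines here, host=alpha and host=gamma.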

An example of the use of a regular expression is:

 grep 'lin.x' *

The dot in the regular expression 'lin.x' has a special meaning: it matches any single character at that position in the text string. To prevent interpretation problems, I advise you to always put regular expressions between single quotes. This way, you'll prevent the shell from interpreting the regular expression before the command sees it.
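
A minimal sketch of both points, using invented sample words and a hypothetical scratch file named apple:

```shell
# Sample words; chosen only to illustrate what the dot matches.
words=$(printf '%s\n' 'linux' 'linax' 'lin.x' 'linx' 'links')

# The dot stands for exactly one character, so linux, linax and lin.x match,
# while linx (no character in that position) and links do not.
dot_matches=$(printf '%s\n' "$words" | grep -c 'lin.x')
echo "$dot_matches"   # 3

# Why quoting matters: the shell expands an unquoted * against file names
# before the command ever sees it.
tmpdir=$(mktemp -d)
touch "$tmpdir/apple"
unquoted=$(cd "$tmpdir" && echo a*)    # the shell replaces a* with apple
quoted=$(cd "$tmpdir" && echo 'a*')    # the quotes keep a* literal
echo "$unquoted $quoted"               # apple a*
rm -r "$tmpdir"
```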

Using regular expressions
There are many things that you can do with regular expressions. In the list below you'll find examples of some of the most common and useful regular expressions.

  • ^: indicates that the text string has to be at the beginning of a line. So, to find files with lines that have the text "hosts" at the beginning of a line, use: grep -ls '^hosts' * (the -l option prints only the names of matching files, and -s suppresses error messages)
  • $: refers to the end of a line. So, to find files with lines that have the text "hosts" at the end of the line, use: grep -ls 'hosts$' *

You can combine ^ and $ in a regular expression. To find files with lines that contain only the word "yes", you would use grep -ls '^yes$' *
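
The anchors can be tried out on two scratch files. The file names a.txt and b.txt and their contents are assumptions made for this sketch:

```shell
# Two scratch files; names and contents are illustrative only.
tmpdir=$(mktemp -d)
printf '%s\n' 'hosts file' 'known hosts' 'yes' > "$tmpdir/a.txt"
printf '%s\n' 'yes please' 'the hosts are up' > "$tmpdir/b.txt"

# ^ anchors at the start of a line: only 'hosts file' in a.txt matches.
begin=$(grep -c '^hosts' "$tmpdir/a.txt")

# $ anchors at the end of a line: only 'known hosts' in a.txt matches.
end=$(grep -c 'hosts$' "$tmpdir/a.txt")

# Both anchors together: the line must consist of "yes" and nothing else.
# -l lists matching file names; -s suppresses error messages.
only_yes=$(cd "$tmpdir" && grep -ls '^yes$' *)

echo "$begin $end $only_yes"   # 1 1 a.txt
rm -r "$tmpdir"
```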

  • .: a wildcard that refers to any single character, with the exception of a newline character. To find lines that contain tex, tux, tox or tix, use: grep -ls 't.x' *
  • [ ]: indicates in a regular expression that the characters between the square brackets are interpreted as alternatives for one position. To find users that have the name pinda or linda: grep -ls '[pl]inda' *
  • [^ ]: matches any single character that is not listed between the square brackets after the ^ sign. To find all lines that have the text inda preceded by a character other than p or l, so not matching linda or pinda: grep -ls '[^pl]inda' *
  • -: refers to a range of characters. This is useful in commands like tr, where the following is used to translate all lowercase letters into uppercase letters: tr a-z A-Z < mytext. Likewise, inside square brackets you could use a range in a regular expression to find all files that have lines that start with a digit, using: grep -ls '^[0-9]' *
  • \< and \>: match at the beginning of a word or at the end of a word. To find lines that have words beginning with "san": grep -ls '\<san' *. These operators have a disadvantage: they are not supported by all utilities. However, vi and grep do support them.
  • \: ensures that a character that has a special meaning in a regular expression is not interpreted. To search for a text string that starts with any character, followed by the text "host": grep -ls '.host' *. If you need to find a text string that has a literal dot at the first position followed by the text "host": grep -ls '\.host' *
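
The bracket, range, word-boundary and escape operators from the list above can be checked against a handful of invented sample lines. This sketch assumes a grep that supports \<, as GNU grep does:

```shell
# Sample lines; invented purely to exercise each operator.
names=$(printf '%s\n' 'linda' 'pinda' 'minda' 'inda' '3 users' 'santa' 'artisan' 'a.host' 'ghost')

# [pl] accepts p or l in that position: linda and pinda.
alt=$(printf '%s\n' "$names" | grep -c '[pl]inda')

# [^pl] needs one character that is neither p nor l: only minda.
neg=$(printf '%s\n' "$names" | grep -c '[^pl]inda')

# A range inside brackets: lines starting with a digit ('3 users').
digit=$(printf '%s\n' "$names" | grep -c '^[0-9]')

# \< anchors at the start of a word: santa matches, artisan does not.
word=$(printf '%s\n' "$names" | grep -c '\<san')

# Unescaped dot matches any character (a.host and ghost);
# escaped dot matches only a literal dot (a.host).
any=$(printf '%s\n' "$names" | grep -c '.host')
literal=$(printf '%s\n' "$names" | grep -c '\.host')

# tr uses the same range notation to map lowercase to uppercase.
upper=$(printf '%s\n' 'mytext' | tr 'a-z' 'A-Z')

echo "$alt $neg $digit $word $any $literal $upper"   # 2 1 1 1 2 1 MYTEXT
```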

These regular expressions help you find words that contain certain text strings. You can also use regular expressions to specify how often a given character or expression should occur in a row. For example, you can search for a digit that occurs exactly three times in succession. To do this, you need to use regular expression repetition operators and you need to make sure that the entire regular expression is in quotes. Without the quotes, you may end up with the shell interpreting your repetition operator.

A list of the most important repetition operators:

  • *: indicates that the preceding regular expression may occur once, more than once or not at all. Caution: don't confuse it with the * in the shell -- in a shell environment, * stands for any sequence of characters in a file name. In regular expressions, * only says how often the preceding expression may occur.
  • ?: indicates that the preceding character may be present at this position (but doesn't have to be). For example, to find both the words color and colour: grep -lsE 'colou?r' *. With grep, this operator requires extended regular expressions, which the -E option enables.
  • +: indicates the preceding character or regular expression has to be present at least once. With grep, this operator also requires the -E option.
  • \{n\}: indicates the preceding character or regular expression occurs exactly n times. Useful in a regular expression where you are looking for three consecutive digits, for example a number between 100 and 999: grep -ls '[0-9]\{3\}' *
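
A short sketch of the repetition operators on invented sample lines. The -E option is used for ? and +, and the last command uses -w (whole-word matching, available in GNU and BSD grep) as one possible way to narrow the match to numbers from 100 to 999:

```shell
# Sample lines; invented to show how each operator counts repetitions.
text=$(printf '%s\n' 'color' 'colour' 'colouur' 'port 80' 'port 443' 'port 8080')

# u* allows zero or more u characters: color, colour, colouur.
star=$(printf '%s\n' "$text" | grep -c 'colou*r')

# u? allows zero or one u (extended syntax): color, colour.
opt=$(printf '%s\n' "$text" | grep -Ec 'colou?r')

# u+ requires at least one u (extended syntax): colour, colouur.
plus=$(printf '%s\n' "$text" | grep -Ec 'colou+r')

# \{3\} requires three digits in a row; 8080 also contains such a run.
three=$(printf '%s\n' "$text" | grep -c '[0-9]\{3\}')

# Whole-word match of a nonzero digit plus two digits: only 443.
exact=$(printf '%s\n' "$text" | grep -wc '[1-9][0-9]\{2\}')

echo "$star $opt $plus $three $exact"   # 3 2 2 2 1
```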

Here you have been given an overview of how to work with regular expressions. This allows you to do your work as an administrator more efficiently. Regular expressions have much more to offer, including rather complicated operations. However, before starting on that path, make sure you master the skills discussed here. Regular expressions can be so complex that it can be easy to get lost in them.

ABOUT THE AUTHOR: Sander van Vugt is an author and independent technical trainer, specializing in Linux since 1994. Vugt is also a technical consultant for high-availability (HA) clustering and performance optimization, as well as an expert on SUSE Linux Enterprise Desktop 10 (SLED 10) administration.

This was first published in January 2009
