Regular Expressions

A regular expression (regex) is a special text string for describing a search pattern. You can think of regular expressions as wildcards on steroids. You are probably familiar with wildcard notations such as *.txt to find all text files in a file manager.

Essentially, a regular expression lets us find a pattern in a string.

Regexes are supported in many programming languages, including Java, Python, C++, Perl, JavaScript, and PHP. They are also supported in text processing programs, advanced text editors, and some other programs.

Some commonly used commands with regular expressions are tr, sed, vi, and grep.

Here are SOME of the main rules:

Text:

      .           Any single character
      [chars]     Character class: Any character of the class "chars"
      [^chars]    Character class: Not a character of the class "chars"
      text1|text2 Alternative: text1 or text2
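  • Example: grep '^[a-c].[b-r].$' /usr/share/dict/words # four-character words: a-c, any character, b-r, any character
  • Example: grep -E '(body|car)' /usr/share/dict/words # words containing 'body' or 'car'
  • Example: grep 'curlcurd\|curd' /usr/share/dict/words # alternation in basic syntax, written \|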
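
The character-class and alternation rules above can be tried without a dictionary file. The following is a minimal sketch, assuming a small made-up word list (the words function is hypothetical) in place of /usr/share/dict/words, and assuming the C locale so that ranges such as [a-c] behave predictably.

```shell
#!/bin/sh
# Assumption: a small inline word list stands in for /usr/share/dict/words.
LC_ALL=C; export LC_ALL   # keep ranges like [a-c] byte-based and predictable

words() {
    printf '%s\n' able acts body cars dust bat cozy
}

# First char a-c, any char, third char b-r, any char: exactly four chars.
class_hits=$(words | grep '^[a-c].[b-r].$')

# Negated class: the first char is anything except a, b, or c.
negated_hits=$(words | grep '^[^a-c]')

# Alternation in extended syntax: lines containing "body" or "car".
alt_hits=$(words | grep -E '(body|car)')

printf 'class:\n%s\n\nnegated:\n%s\n\nalternation:\n%s\n' \
    "$class_hits" "$negated_hits" "$alt_hits"
```

Quoting each pattern matters here: without the quotes the shell may treat [a-c] as a filename glob before grep ever sees it.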

     Anchors:
      ^           Start-of-line anchor
      $           End-of-line anchor
  • Example: grep '^aa' /usr/share/dict/words # words that start with 'aa'
  • Example: grep 'aa$' /usr/share/dict/words # words that end with 'aa'
  • Example: grep 'aa' /usr/share/dict/words # all words containing 'aa'
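
To see how the two anchors narrow a search, here is a small sketch that runs the same three patterns against a made-up word list (the words function is hypothetical and only stands in for the dictionary file).

```shell
#!/bin/sh
# Assumption: inline sample words replace /usr/share/dict/words.
words() {
    printf '%s\n' aardvark bazaar baa aah salsa
}

starts=$(words | grep '^aa')    # "aa" must sit at the start of the line
ends=$(words | grep 'aa$')      # "aa" must sit at the end of the line
anywhere=$(words | grep 'aa')   # unanchored: "aa" may appear anywhere

printf 'starts with aa:\n%s\n\nends with aa:\n%s\n\ncontains aa:\n%s\n' \
    "$starts" "$ends" "$anywhere"
```

The unanchored pattern returns every line the anchored ones do, plus words such as bazaar where the match is in the middle.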

     Quantifiers:
      ?           0 or 1 occurrences of the preceding text
      *           0 or more occurrences of the preceding text
      +           1 or more occurrences of the preceding text
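  • Example: grep -E '^bow?' /usr/share/dict/words # 'bo' at the start, optionally followed by 'w' (? needs -E)
  • Example: grep -E '^co?.$' /usr/share/dict/words # 'c', an optional 'o', then exactly one more character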
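
The three quantifiers are easiest to compare side by side. This is a rough sketch on a made-up word list (the words function is hypothetical). It also shows that without -E, grep reads ? as a literal question mark, so the basic-syntax search finds nothing in this list.

```shell
#!/bin/sh
# Assumption: inline sample words replace /usr/share/dict/words.
words() {
    printf '%s\n' b bo boo bow bowl cot coat
}

optional=$(words | grep -E '^bow?')     # "bo", then zero or one "w"
star=$(words | grep -E '^bo*$')         # "b", then zero or more "o", then end
plus=$(words | grep -E '^bo+$')         # "b", then one or more "o", then end
one_more=$(words | grep -E '^co?.$')    # "c", optional "o", exactly one more char

# Basic syntax: "?" is literal, so this looks for the text "bow?".
# "|| true" keeps the script going when grep finds no match.
basic=$(words | grep '^bow?' || true)

printf '?: %s\n' "$(echo $optional)"
printf '*: %s\n' "$(echo $star)"
printf '+: %s\n' "$(echo $plus)"
printf 'co?.: %s\n' "$(echo $one_more)"
printf 'basic "?": [%s]\n' "$basic"
```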

Grouping:

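      (text)      Grouping of text (used either to set the borders of an alternative as above, or to make backreferences; in grep and sed the Nth group is referred to later in the pattern as \N, while some other tools, such as Apache's RewriteRule, use $N on the right-hand side)
  • Example: grep -E '(bo).*\1' /usr/share/dict/words # words where 'bo' appears twice (unescaped parentheses need -E)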
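
A backreference repeats whatever the group actually matched. The sketch below, on a made-up word list (the words function is hypothetical), runs the same search in extended and in basic syntax. It assumes GNU grep, which accepts \1 with -E as an extension.

```shell
#!/bin/sh
# Assumption: inline sample words replace /usr/share/dict/words.
words() {
    printf '%s\n' bonbon hobo bobolink elbow boombox
}

# Extended syntax: plain parentheses form group 1, and \1 repeats it.
ere=$(words | grep -E '(bo).*\1')

# Basic syntax: the parentheses must be escaped to form a group.
bre=$(words | grep '\(bo\).*\1')

# Any character immediately followed by itself (a doubled letter).
doubled=$(words | grep '\(.\)\1')

printf 'bo ... bo (ERE):\n%s\n\nbo ... bo (BRE):\n%s\n\ndoubled letter:\n%s\n' \
    "$ere" "$bre" "$doubled"
```

Both forms select the same words; only the escaping of the parentheses differs.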

Interval Regular Expressions

      {n}         Matches the preceding character repeated exactly n times in a row
  • Example: grep -E 'p{2}' /usr/share/dict/words
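
As a quick check of the interval syntax, the following sketch searches a made-up word list (the words function is hypothetical) for a doubled p, once in extended syntax and once in basic syntax with escaped braces.

```shell
#!/bin/sh
# Assumption: inline sample words replace /usr/share/dict/words.
words() {
    printf '%s\n' apple paper puppy pipe hippo
}

# Extended syntax: {2} means the preceding "p" twice in a row.
ere=$(words | grep -E 'p{2}')

# Basic syntax: the braces must be escaped to act as an interval.
bre=$(words | grep 'p\{2\}')

printf 'ERE p{2}:\n%s\n\nBRE p\\{2\\}:\n%s\n' "$ere" "$bre"
```

Because the pattern is not anchored, it also matches lines with three or more p characters in a row; the interval only requires that two adjacent ones exist somewhere on the line.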

Basic regular expressions with grep

In its simplest form, when no regular expression type is given, grep interprets search patterns as basic regular expressions. To interpret the pattern as an extended regular expression, use the -E (or --extended-regexp) option, or the older egrep command, which recent GNU grep releases mark as obsolescent.

In GNU’s implementation of grep there is no difference in available functionality between the basic and extended regular expression syntaxes. The only difference is that in basic regular expressions the meta-characters ?, +, {, |, (, and ) are interpreted as literal characters. To give these characters their special meanings in basic regular expressions, escape them with a backslash (\), for example \? or \|.

Basic vs extended

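Not too much difference, except:

  • basic must escape the characters ?, +, {, |, (, and ) to use them as meta-characters

  • extended does not

  • Example: egrep 'some|tab' sample.txt # lines containing 'some' or 'tab'

  • Example: grep 'some|tab' sample.txt # only lines containing the literal text 'some|tab'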
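
The difference between the two syntaxes shows up clearly when the same pattern is run three ways. This is a minimal sketch on made-up input (the sample function is hypothetical and stands in for sample.txt). It assumes GNU grep, where \| is accepted as alternation in basic syntax.

```shell
#!/bin/sh
# Assumption: inline sample lines stand in for sample.txt.
sample() {
    printf '%s\n' 'some text' 'a tab here' 'literal some|tab' 'nothing'
}

# Extended: | is alternation, so lines with "some" or "tab" match.
extended=$(sample | grep -E 'some|tab')

# Basic: | is a literal character, so only the exact text "some|tab" matches.
basic_literal=$(sample | grep 'some|tab')

# Basic with the backslash: \| regains the alternation meaning (GNU grep).
basic_escaped=$(sample | grep 'some\|tab')

printf 'grep -E:\n%s\n\ngrep, unescaped:\n%s\n\ngrep, escaped:\n%s\n' \
    "$extended" "$basic_literal" "$basic_escaped"
```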

SED and regex

  • sed '/^daemon/d' testing #delete all lines starting with daemon
  • sed '/sh$/d' testing #delete all lines ending with sh
  • sed -n '/^[[:alpha:]]/p' /etc/syslog.conf #print only lines that start with a letter
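
The three sed commands can be tried on a few made-up lines instead of real system files. In this sketch the sample function and its passwd-style lines are hypothetical.

```shell
#!/bin/sh
# Assumption: inline passwd-style lines replace the files used above.
sample() {
    printf '%s\n' \
        'daemon:x:1:1::/usr/sbin:/usr/sbin/nologin' \
        'root:x:0:0::/root:/bin/bash' \
        'alice:x:1000:1000::/home/alice:/bin/sh' \
        '# comment line'
}

# d deletes every line matching the address; all other lines are printed.
no_daemon=$(sample | sed '/^daemon/d')
no_sh=$(sample | sed '/sh$/d')

# -n turns off automatic printing; p prints only the matching lines.
letters_only=$(sample | sed -n '/^[[:alpha:]]/p')

printf 'without daemon:\n%s\n\nwithout *sh:\n%s\n\nletter first:\n%s\n' \
    "$no_daemon" "$no_sh" "$letters_only"
```

Note that /sh$/ also removes the line ending in bash, since the pattern only requires the last two characters to be sh.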