x386.org: Debian and Ubuntu Documentation


Copyright (C) 2020 - 2024 Exforge exforge@x386.org

This document is free text: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or any later version.

This document is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see .



Regular Expressions On Debian and Ubuntu

0. Specs


0.0. Info

I know regex is (almost) the same on every system, but this site is about Debian and Ubuntu, so the page is named accordingly.

0.1. Resources:

Book: 978-0-13-475706-3 Learning Regular Expressions by Ben Forta
Book: 978-1-4842-3875-2 Regex Quick Syntax Reference by Zsolt Nagy
Book: 978-1-449-31943-4 Regular Expressions Cookbook by Jan Goyvaerts and Steven Levithan


1. Basics


1.1. A string itself

Obviously, every string is a match for itself: the regex go matches go, and the regex dog matches dog.

1.2. Or Operator

Use | to separate whole alternatives, or [] to list single-character alternatives.
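A small sketch of both forms, using Python's standard re module as a convenient test bed (the sample words are made up for illustration):

```python
import re

text = "cat bat rat dog"

# | separates whole alternatives: either "cat" or "dog"
alternation = re.findall(r"cat|dog", text)

# [] lists single-character alternatives: "c" or "b", followed by "at"
bracket = re.findall(r"[cb]at", text)

print(alternation)  # ['cat', 'dog']
print(bracket)      # ['cat', 'bat']
```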

1.3. Ranges
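Inside brackets, a hyphen between two characters denotes a range: [0-9] matches any single digit and [a-z] any single lowercase letter. Several ranges can share one bracket, as in [A-Za-z0-9]. A minimal sketch with Python's re module; the sample string is made up:

```python
import re

sample = "Room 42b, floor 7"

digits = re.findall(r"[0-9]", sample)        # any single digit
alnum = re.findall(r"[A-Za-z0-9]+", sample)  # ranges combined in one bracket, repeated with +

print(digits)  # ['4', '2', '7']
print(alnum)   # ['Room', '42b', 'floor', '7']
```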

1.4. Exception
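A ^ placed right after the opening bracket turns the bracket into an exception (a negated set): [^0-9] matches any single character that is not a digit. A small sketch with Python's re module; the sample strings are made up:

```python
import re

non_digits = re.findall(r"[^0-9]", "a1-b2")      # any character except a digit
non_vowels = re.findall(r"[^aeiou]+", "regex")   # runs of characters other than vowels

print(non_digits)  # ['a', '-', 'b']
print(non_vowels)  # ['r', 'g', 'x']
```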

1.5. Character Classes
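Common shorthand classes are \d (digit), \w (word character: letter, digit or underscore) and \s (whitespace); their uppercase forms \D, \W and \S match the opposite. A sketch with Python's re module (in Python 3 these classes are Unicode-aware for str patterns, which makes no difference for the ASCII samples below):

```python
import re

sample = "id_7 = 42\tok"

digits = re.findall(r"\d+", sample)        # \d: digits
words = re.findall(r"\w+", sample)         # \w: letters, digits and underscore
spaces = re.findall(r"\s", sample)         # \s: whitespace (space, tab, newline...)
not_digits = re.findall(r"\D+", "a1b22c")  # uppercase form is the negation of \d

print(digits)      # ['7', '42']
print(words)       # ['id_7', '42', 'ok']
print(spaces)      # [' ', ' ', '\t']
print(not_digits)  # ['a', 'b', 'c']
```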

1.6. Escape Characters

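Any operator or quantifier can be escaped with \ so that it matches itself literally: \. matches a dot, \* an asterisk, \( an opening parenthesis.

Inside a bracket, most of these characters lose their special meaning, so \ is usually not necessary: [.*+] matches a literal dot, asterisk or plus sign.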
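A short sketch of escaping, using Python's re module and made-up sample text:

```python
import re

text = "1+1=2. Total: 3.50"

loose = bool(re.search(r"3.50", "3x50"))    # unescaped . matches any character
strict = bool(re.search(r"3\.50", "3x50"))  # \. matches only a literal dot
plus = re.findall(r"\d\+\d", text)          # \+ matches a literal plus sign
in_brackets = re.findall(r"[.+]", text)     # inside brackets, . and + need no \

print(loose, strict)  # True False
print(plus)           # ['1+1']
print(in_brackets)    # ['+', '.', '.']
```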


2. Quantifiers and Boundaries


2.1. * Quantifier

A * quantifier after a character or group means 0 or more occurrences of it.

2.2. + Quantifier

A + quantifier after a character or group means 1 or more occurrences of it.

2.3. ? Quantifier

A ? quantifier after a character or group means 0 or 1 occurrence of it.
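The three quantifiers can be compared side by side with Python's re.fullmatch, which succeeds only if the whole string matches. The test words are made up:

```python
import re

words = ["ac", "abc", "abbc"]

star = [w for w in words if re.fullmatch(r"ab*c", w)]   # 0 or more b
plus = [w for w in words if re.fullmatch(r"ab+c", w)]   # 1 or more b
quest = [w for w in words if re.fullmatch(r"ab?c", w)]  # 0 or 1 b

print(star)   # ['ac', 'abc', 'abbc']
print(plus)   # ['abc', 'abbc']
print(quest)  # ['ac', 'abc']
```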

2.4. {} Quantifier

A {} quantifier can take three forms, where m, n, p and r are whole numbers. Placed after a character or group, it means:

{m} → exactly m occurrences
{n,} → n or more occurrences
{p,r} → p to r occurrences
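The three forms can be checked with Python's re.fullmatch on a few made-up strings:

```python
import re

words = ["a", "aa", "aaa", "aaaa"]

exact = [w for w in words if re.fullmatch(r"a{2}", w)]      # exactly 2
at_least = [w for w in words if re.fullmatch(r"a{3,}", w)]  # 3 or more
between = [w for w in words if re.fullmatch(r"a{1,2}", w)]  # 1 to 2

print(exact)     # ['aa']
print(at_least)  # ['aaa', 'aaaa']
print(between)   # ['a', 'aa']
```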

2.5. Greedy and Lazy Quantifiers

By default, a quantifier matches as many characters as possible.

When we try the regex:

\(.*\)

(find anything in parentheses) on the following string:

abc(def)ghi(jkl)mno

instead of matching (def) and (jkl) separately, it matches (def)ghi(jkl). This is called greedy matching: quantifiers are greedy by default.

To change this behaviour, that is, to match as little as possible, we can use the lazy version of a quantifier by adding a ? after it, like:

\(.*?\)

This regex matches (def) and (jkl), and this is called lazy matching.
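The same string can be run through both patterns with Python's re.findall to see the difference:

```python
import re

text = "abc(def)ghi(jkl)mno"

greedy = re.findall(r"\(.*\)", text)  # .* grabs as much as it can
lazy = re.findall(r"\(.*?\)", text)   # .*? stops at the first closing parenthesis

print(greedy)  # ['(def)ghi(jkl)']
print(lazy)    # ['(def)', '(jkl)']
```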

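The lazy versions of the quantifiers are as follows:

*? → 0 or more occurrences, as few as possible
+? → 1 or more occurrences, as few as possible
?? → 0 or 1 occurrence, 0 preferred
{n,}? → n or more occurrences, as few as possible
{p,r}? → p to r occurrences, as few as possible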

2.6. Word Boundary: \b

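\b matches a word boundary: the position between a word character (\w: a letter, digit or underscore) and a non-word character, or the start or end of the text. In practice it marks the beginning or end of a word.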
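A sketch with Python's re module and a made-up sentence. Note that the hyphen in cat-like also counts as a boundary, because it is a non-word character:

```python
import re

text = "cat concat cat-like scatter"

anywhere = re.findall(r"cat", text)   # also finds cat inside other words
whole = re.findall(r"\bcat\b", text)  # only cat as a separate word

print(len(anywhere))  # 4
print(whole)          # ['cat', 'cat']
```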

2.7. Line Boundaries: ^ and $

^ matches the start of a line
$ matches the end of a line
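In Python's re module, ^ and $ match only at the start and end of the whole string unless the re.MULTILINE flag is given; with the flag they match at every line, as described above. A sketch with a made-up log text:

```python
import re

text = "error: disk full\nwarning: low memory\nerror: timeout"

# Without re.MULTILINE, ^ only matches at the very start of the string
first_only = re.findall(r"^error", text)
# With re.MULTILINE, ^ and $ match at every line start and line end
per_line = re.findall(r"^error.*$", text, re.MULTILINE)

print(first_only)  # ['error']
print(per_line)    # ['error: disk full', 'error: timeout']
```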


3. Subexpressions and Backreferences


3.1. Subexpressions

A subexpression is a group of characters or operators enclosed in parentheses. Among other things, subexpressions are used to apply a quantifier to a whole expression instead of only the preceding character.

Subexpressions can be nested.
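A sketch with Python's re.fullmatch showing what the parentheses change (the strings are made up):

```python
import re

no_group = bool(re.fullmatch(r"ab+", "ababab"))     # + repeats only b
group = bool(re.fullmatch(r"(ab)+", "ababab"))      # + repeats the whole "ab"
nested = bool(re.fullmatch(r"((ab)c)+", "abcabc"))  # nested subexpressions

print(no_group, group, nested)  # False True True
```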

3.2. Backreferences

A backreference is written as a backslash followed by a digit, like \1, \2 or \3. \n refers to the text matched by the nth subexpression, counting opening parentheses from the left.

For example, the following regex matches the repeating words:

[\s]+(\w+)[\s]+\1[\s]+

[\s]+(\w+)[\s]+ matches a word surrounded by whitespace: 1 or more whitespace characters, followed by 1 or more word characters, followed by 1 or more whitespace characters.

As (\w+) is the first subexpression in the regex, \1 matches exactly the same text that (\w+) matched. So the regex matches a word that is immediately repeated.
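The repeating word regex can be tried with Python's re.findall, which returns the captured group for each match. The sentence is made up:

```python
import re

pattern = r"[\s]+(\w+)[\s]+\1[\s]+"
text = "This is the the example of of repeated words."

repeated = re.findall(pattern, text)  # group 1 of every match

print(repeated)  # ['the', 'of']
```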

Another example would be matching repeating word couples:

[\s]+(\w+)[\s]+(\w+)[\s]+\1[\s]+\2[\s]+

The first (\w+) will be the first word as \1, and the second one will be the second word as \2.
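A sketch of the word couple regex with Python's re.search on a made-up sentence; group(1) and group(2) hold the texts referenced by \1 and \2:

```python
import re

pattern = r"[\s]+(\w+)[\s]+(\w+)[\s]+\1[\s]+\2[\s]+"
text = "I said thank you thank you to everyone."

match = re.search(pattern, text)
first, second = match.group(1), match.group(2)

print(first, second)  # thank you
```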

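Backreferences are very useful in find and replace operations. In the repeating word example, to replace a repeated word with a single one, we would write \1 in the replace part. Because the match also includes the whitespace around the words, the replacement should be \1 with a space on each side.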
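A find and replace sketch with Python's re.sub, where \1 in the replacement string refers to the first subexpression. The sentence is made up:

```python
import re

text = "This is the the example of of repeated words."

# The match also consumes the surrounding whitespace, so the replacement
# puts single spaces back around the kept word (\1).
fixed = re.sub(r"[\s]+(\w+)[\s]+\1[\s]+", r" \1 ", text)

print(fixed)  # This is the example of repeated words.
```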


4. Regex Examples


Please note that these examples are not perfect. You or someone else can certainly find or write better versions.

4.1. Email address abc.def@email.duck.com.nz

\w+[\w\.]*@\w+[\w\.]*\.\w+

Name Part:
It must start with one or more letters, digits or underscores \w+, followed by any number of letters, digits, underscores and dots [\w\.]*, and then comes @.

Domain Part:
It must start with one or more letters, digits or underscores \w+, followed by any number of letters, digits, underscores and dots [\w\.]*, then a dot \. and finally the TLD part \w+.
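A quick check of the email regex with Python's re.fullmatch; the second and third addresses are made-up examples that the pattern rejects:

```python
import re

EMAIL = r"\w+[\w\.]*@\w+[\w\.]*\.\w+"

valid = bool(re.fullmatch(EMAIL, "abc.def@email.duck.com.nz"))
leading_dot = bool(re.fullmatch(EMAIL, ".abc@example.com"))  # name starts with a dot
no_tld = bool(re.fullmatch(EMAIL, "abc@localhost"))          # no dot in the domain

print(valid, leading_dot, no_tld)  # True False False
```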

4.2. Date Format 02/02/2020 1\5\12 1-11-1995 31.12.2020

\d{1,2}[-\\/.]\d{1,2}[-\\/.]\d{2,4}

1 or 2 digit day field → \d{1,2}
Separator (-, \, / or .) → [-\\/.]
1 or 2 digit month field → \d{1,2}
Separator (-, \, / or .) → [-\\/.]
2 to 4 digit year field → \d{2,4}
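A sketch that runs the sample dates through Python's re.fullmatch. The separator class is written [-\\/.] so that it includes the backslash and 1\5\12 matches too; 2020/02/02 is an extra made-up sample in year-first order:

```python
import re

# Separator class: -, \ (written \\), / and .
DATE = r"\d{1,2}[-\\/.]\d{1,2}[-\\/.]\d{2,4}"

samples = ["02/02/2020", "1\\5\\12", "1-11-1995", "31.12.2020", "2020/02/02"]
results = {s: bool(re.fullmatch(DATE, s)) for s in samples}

for s, ok in results.items():
    print(s, ok)
```

The first four samples print True and 2020/02/02 prints False. The pattern only checks the shape of a date: it also accepts values such as 99/99/99 or mixed separators such as 1-5/2020.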

4.3. IP Address 192.168.1.110 (a better one follows)

(\d{1,3}\.){3}\d{1,3}

3 of (1 to 3 digit number, followed by a dot) → (\d{1,3}\.){3}
1 (1 to 3 digit number) → \d{1,3}

However, this regex also matches invalid IP addresses, like 300.288.11.11, because it accepts any 1 to 3 digit number in each position.
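Both the sample address and the invalid one pass the simple pattern, as a quick check with Python's re.fullmatch shows:

```python
import re

NAIVE_IP = r"(\d{1,3}\.){3}\d{1,3}"

good = bool(re.fullmatch(NAIVE_IP, "192.168.1.110"))
bad = bool(re.fullmatch(NAIVE_IP, "300.288.11.11"))  # not a valid address, still matches

print(good, bad)  # True True
```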

4.4. A Better IP Address

(((25[0-5])|(2[0-4]\d)|(1\d{2})|(\d{1,2}))\.){3}(((25[0-5])|(2[0-4]\d)|(1\d{2})|(\d{1,2})))
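Each octet is matched as 250-255, 200-249, 100-199 or 0-99, and the octet-plus-dot group is repeated three times with {3} before the last octet. A sketch with Python's re module; the test addresses are made up. fullmatch is used because an unanchored search can still find a valid-looking part of a longer string:

```python
import re

# Each octet: 25[0-5] | 2[0-4]\d | 1\d{2} | \d{1,2}
BETTER_IP = r"(((25[0-5])|(2[0-4]\d)|(1\d{2})|(\d{1,2}))\.){3}(((25[0-5])|(2[0-4]\d)|(1\d{2})|(\d{1,2})))"

candidates = ["192.168.1.110", "255.255.255.255", "300.288.11.11", "256.1.1.1"]
results = {c: bool(re.fullmatch(BETTER_IP, c)) for c in candidates}

for c, ok in results.items():
    print(c, ok)  # True, True, False, False

# Unanchored search finds a valid address inside a longer, invalid one
partial = re.search(BETTER_IP, "1192.168.1.1").group(0)
print(partial)  # 192.168.1.1
```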