Switch tokenizing article to new math delimiters
Signed-off-by: Danila Fedorin <danila.fedorin@gmail.com>
parent 189422bf1e
commit 409f8b7186
@@ -55,31 +55,31 @@ patterns that a string has to match. We define regular expressions
 as follows:
 
 * Any character is a regular expression that matches that character. Thus,
-\\(a\\) is a regular expression (from now shortened to regex) that matches
+\(a\) is a regular expression (from now shortened to regex) that matches
 the character 'a', and nothing else.
-* \\(r_1r_2\\), or the concatenation of \\(r_1\\) and \\(r_2\\), is
-a regular expression that matches anything matched by \\(r_1\\), followed
-by anything that matches \\(r_2\\). For instance, \\(ab\\), matches
+* \(r_1r_2\), or the concatenation of \(r_1\) and \(r_2\), is
+a regular expression that matches anything matched by \(r_1\), followed
+by anything that matches \(r_2\). For instance, \(ab\), matches
 the character 'a' followed by the character 'b' (thus matching "ab").
-* \\(r_1|r_2\\) matches anything that is either matched by \\(r_1\\) or
-\\(r_2\\). Thus, \\(a|b\\) matches the character 'a' or the character 'b'.
-* \\(r_1?\\) matches either an empty string, or anything matched by \\(r_1\\).
-* \\(r_1+\\) matches one or more things matched by \\(r_1\\). So,
-\\(a+\\) matches "a", "aa", "aaa", and so on.
-* \\((r_1)\\) matches anything that matches \\(r_1\\). This is mostly used
+* \(r_1|r_2\) matches anything that is either matched by \(r_1\) or
+\(r_2\). Thus, \(a|b\) matches the character 'a' or the character 'b'.
+* \(r_1?\) matches either an empty string, or anything matched by \(r_1\).
+* \(r_1+\) matches one or more things matched by \(r_1\). So,
+\(a+\) matches "a", "aa", "aaa", and so on.
+* \((r_1)\) matches anything that matches \(r_1\). This is mostly used
 to group things together in more complicated expressions.
-* \\(.\\) matches any character.
+* \(.\) matches any character.
 
-More powerful variations of regex also include an "any of" operator, \\([c_1c_2c_3]\\),
-which is equivalent to \\(c_1|c_2|c_3\\), and a "range" operator, \\([c_1-c_n]\\), which
-matches all characters in the range between \\(c_1\\) and \\(c_n\\), inclusive.
+More powerful variations of regex also include an "any of" operator, \([c_1c_2c_3]\),
+which is equivalent to \(c_1|c_2|c_3\), and a "range" operator, \([c_1-c_n]\), which
+matches all characters in the range between \(c_1\) and \(c_n\), inclusive.
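
The operators listed in this hunk can be tried out directly. The following is a minimal sketch that assumes Python's standard `re` module as the regex implementation; the article itself does not name one, and the `matches` helper is a hypothetical name introduced only for this illustration. It uses `re.fullmatch` so that a pattern has to match the whole string, which is the sense of "matches" used in the definitions above.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if `pattern` matches the whole of `text` (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None


# (pattern, input, expected result) for each operator described above
examples = [
    ("a", "a", True),         # a single character matches itself
    ("a", "b", False),        # ... and nothing else
    ("ab", "ab", True),       # concatenation
    ("a|b", "b", True),       # alternation
    ("a?", "", True),         # optional: the empty string
    ("a?", "a", True),        # optional: one occurrence
    ("a+", "aaa", True),      # one or more
    ("a+", "", False),        # "+" needs at least one occurrence
    ("(ab)+", "abab", True),  # grouping inside a larger expression
    (".", "x", True),         # any character
    ("[abc]", "c", True),     # "any of"
    ("[abc]", "d", False),
    ("[a-d]", "c", True),     # range, inclusive
    ("[a-d]", "e", False),
]

for pattern, text, expected in examples:
    assert matches(pattern, text) == expected, (pattern, text)
```

One caveat of this particular implementation: in Python, `.` does not match a newline unless the `re.DOTALL` flag is passed.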
 
-Let's see some examples. An integer, such as 326, can be represented with \\([0-9]+\\).
+Let's see some examples. An integer, such as 326, can be represented with \([0-9]+\).
 This means, one or more characters between 0 or 9. Some (most) regex implementations
-have a special symbol for \\([0-9]\\), written as \\(\\setminus d\\). A variable,
+have a special symbol for \([0-9]\), written as \(\setminus d\). A variable,
 starting with a lowercase letter and containing lowercase or uppercase letters after it,
-can be written as \\(\[a-z\]([a-zA-Z]+)?\\). Again, most regex implementations provide
-a special operator for \\((r_1+)?\\), written as \\(r_1*\\).
+can be written as \([a-z]([a-zA-Z]+)?\). Again, most regex implementations provide
+a special operator for \((r_1+)?\), written as \(r_1*\).
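
The integer and variable examples, along with the two shorthands, can be checked the same way. This is again only a sketch assuming Python's `re` module. The constant names and the `is_match` helper are hypothetical. The `re.ASCII` flag is an assumption added here so that `\d` means exactly `[0-9]`, because without it Python's `\d` also accepts other Unicode digits.

```python
import re

# Patterns taken from the paragraph above
INTEGER = re.compile(r"[0-9]+")
INTEGER_SHORT = re.compile(r"\d+", re.ASCII)   # \d restricted to [0-9]
VARIABLE = re.compile(r"[a-z]([a-zA-Z]+)?")
VARIABLE_STAR = re.compile(r"[a-z][a-zA-Z]*")  # (r+)? rewritten as r*


def is_match(regex: "re.Pattern[str]", text: str) -> bool:
    """Return True if `regex` matches the whole of `text` (hypothetical helper)."""
    return regex.fullmatch(text) is not None


# Each shorthand should agree with its long form on every sample
samples = ["326", "x", "myVar", "Var", "x1", ""]
for s in samples:
    assert is_match(INTEGER, s) == is_match(INTEGER_SHORT, s)
    assert is_match(VARIABLE, s) == is_match(VARIABLE_STAR, s)
```

Agreement on a handful of samples is only a spot check of the equivalence between `(r+)?` and `r*`, not a proof of it.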
 
 So how does one go about checking if a regular expression matches a string? An efficient way is to
 first construct a [state machine](https://en.wikipedia.org/wiki/Finite-state_machine). A type of state machine can be constructed from a regular expression