Switch tokenizing article to new math delimiters
Signed-off-by: Danila Fedorin <danila.fedorin@gmail.com>
parent 189422bf1e
commit 409f8b7186
@@ -55,31 +55,31 @@ patterns that a string has to match. We define regular expressions
 as follows:
 
 * Any character is a regular expression that matches that character. Thus,
-\\(a\\) is a regular expression (from now shortened to regex) that matches
+\(a\) is a regular expression (from now shortened to regex) that matches
 the character 'a', and nothing else.
-* \\(r_1r_2\\), or the concatenation of \\(r_1\\) and \\(r_2\\), is
-a regular expression that matches anything matched by \\(r_1\\), followed
-by anything that matches \\(r_2\\). For instance, \\(ab\\), matches
+* \(r_1r_2\), or the concatenation of \(r_1\) and \(r_2\), is
+a regular expression that matches anything matched by \(r_1\), followed
+by anything that matches \(r_2\). For instance, \(ab\), matches
 the character 'a' followed by the character 'b' (thus matching "ab").
-* \\(r_1|r_2\\) matches anything that is either matched by \\(r_1\\) or
-\\(r_2\\). Thus, \\(a|b\\) matches the character 'a' or the character 'b'.
-* \\(r_1?\\) matches either an empty string, or anything matched by \\(r_1\\).
-* \\(r_1+\\) matches one or more things matched by \\(r_1\\). So,
-\\(a+\\) matches "a", "aa", "aaa", and so on.
-* \\((r_1)\\) matches anything that matches \\(r_1\\). This is mostly used
+* \(r_1|r_2\) matches anything that is either matched by \(r_1\) or
+\(r_2\). Thus, \(a|b\) matches the character 'a' or the character 'b'.
+* \(r_1?\) matches either an empty string, or anything matched by \(r_1\).
+* \(r_1+\) matches one or more things matched by \(r_1\). So,
+\(a+\) matches "a", "aa", "aaa", and so on.
+* \((r_1)\) matches anything that matches \(r_1\). This is mostly used
 to group things together in more complicated expressions.
-* \\(.\\) matches any character.
+* \(.\) matches any character.
 
-More powerful variations of regex also include an "any of" operator, \\([c_1c_2c_3]\\),
-which is equivalent to \\(c_1|c_2|c_3\\), and a "range" operator, \\([c_1-c_n]\\), which
-matches all characters in the range between \\(c_1\\) and \\(c_n\\), inclusive.
+More powerful variations of regex also include an "any of" operator, \([c_1c_2c_3]\),
+which is equivalent to \(c_1|c_2|c_3\), and a "range" operator, \([c_1-c_n]\), which
+matches all characters in the range between \(c_1\) and \(c_n\), inclusive.
 
-Let's see some examples. An integer, such as 326, can be represented with \\([0-9]+\\).
+Let's see some examples. An integer, such as 326, can be represented with \([0-9]+\).
 This means, one or more characters between 0 or 9. Some (most) regex implementations
-have a special symbol for \\([0-9]\\), written as \\(\\setminus d\\). A variable,
+have a special symbol for \([0-9]\), written as \(\setminus d\). A variable,
 starting with a lowercase letter and containing lowercase or uppercase letters after it,
-can be written as \\(\[a-z\]([a-zA-Z]+)?\\). Again, most regex implementations provide
-a special operator for \\((r_1+)?\\), written as \\(r_1*\\).
+can be written as \([a-z]([a-zA-Z]+)?\). Again, most regex implementations provide
+a special operator for \((r_1+)?\), written as \(r_1*\).
 
 So how does one go about checking if a regular expression matches a string? An efficient way is to
 first construct a [state machine](https://en.wikipedia.org/wiki/Finite-state_machine). A type of state machine can be constructed from a regular expression
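
For reference, the regex examples in the paragraphs touched by this hunk can be checked against Python's `re` module. This is a minimal sketch and not part of the commit: the helper `fully_matches` and the constant names are hypothetical, and it assumes Python's regex syntax is close enough to the article's notation for these particular patterns.

```python
import re


def fully_matches(pattern: str, text: str) -> bool:
    """Return True when the whole of `text` is matched by `pattern`."""
    return re.fullmatch(pattern, text) is not None


# Concatenation, alternation, and "one or more" from the bullet list.
CONCAT = r"ab"          # 'a' followed by 'b'
EITHER = r"a|b"         # 'a' or 'b'
ONE_OR_MORE = r"a+"     # "a", "aa", "aaa", ...

# [0-9]+ : one or more digits, such as the integer 326.
INTEGER = r"[0-9]+"
# \d is the usual shorthand for [0-9]. In Python, \d on str patterns also
# accepts non-ASCII digits unless re.ASCII is passed, so the two are only
# equivalent on ASCII input.
INTEGER_SHORTHAND = r"\d+"

# A lowercase letter, optionally followed by one or more letters.
VARIABLE = r"[a-z]([a-zA-Z]+)?"
# (r+)? is the same as r*, so this pattern accepts the same strings.
VARIABLE_STAR = r"[a-z][a-zA-Z]*"

if __name__ == "__main__":
    for name in ["x", "fooBar", "Foo", "", "a1"]:
        print(repr(name), fully_matches(VARIABLE, name),
              fully_matches(VARIABLE_STAR, name))
```

The last two patterns agree on every sample input, which matches the article's remark that `r*` is shorthand for `(r+)?`.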