Switch tokenizing article to new math delimiters

Signed-off-by: Danila Fedorin <danila.fedorin@gmail.com>
Danila Fedorin 2024-05-13 18:20:04 -07:00
parent 189422bf1e
commit 409f8b7186

@@ -55,31 +55,31 @@ patterns that a string has to match. We define regular expressions
 as follows:
 * Any character is a regular expression that matches that character. Thus,
-\\(a\\) is a regular expression (from now shortened to regex) that matches
+\(a\) is a regular expression (from now shortened to regex) that matches
 the character 'a', and nothing else.
-* \\(r_1r_2\\), or the concatenation of \\(r_1\\) and \\(r_2\\), is
-a regular expression that matches anything matched by \\(r_1\\), followed
-by anything that matches \\(r_2\\). For instance, \\(ab\\), matches
+* \(r_1r_2\), or the concatenation of \(r_1\) and \(r_2\), is
+a regular expression that matches anything matched by \(r_1\), followed
+by anything that matches \(r_2\). For instance, \(ab\), matches
 the character 'a' followed by the character 'b' (thus matching "ab").
-* \\(r_1|r_2\\) matches anything that is either matched by \\(r_1\\) or
-\\(r_2\\). Thus, \\(a|b\\) matches the character 'a' or the character 'b'.
-* \\(r_1?\\) matches either an empty string, or anything matched by \\(r_1\\).
-* \\(r_1+\\) matches one or more things matched by \\(r_1\\). So,
-\\(a+\\) matches "a", "aa", "aaa", and so on.
-* \\((r_1)\\) matches anything that matches \\(r_1\\). This is mostly used
+* \(r_1|r_2\) matches anything that is either matched by \(r_1\) or
+\(r_2\). Thus, \(a|b\) matches the character 'a' or the character 'b'.
+* \(r_1?\) matches either an empty string, or anything matched by \(r_1\).
+* \(r_1+\) matches one or more things matched by \(r_1\). So,
+\(a+\) matches "a", "aa", "aaa", and so on.
+* \((r_1)\) matches anything that matches \(r_1\). This is mostly used
 to group things together in more complicated expressions.
-* \\(.\\) matches any character.
+* \(.\) matches any character.
-More powerful variations of regex also include an "any of" operator, \\([c_1c_2c_3]\\),
-which is equivalent to \\(c_1|c_2|c_3\\), and a "range" operator, \\([c_1-c_n]\\), which
-matches all characters in the range between \\(c_1\\) and \\(c_n\\), inclusive.
+More powerful variations of regex also include an "any of" operator, \([c_1c_2c_3]\),
+which is equivalent to \(c_1|c_2|c_3\), and a "range" operator, \([c_1-c_n]\), which
+matches all characters in the range between \(c_1\) and \(c_n\), inclusive.
-Let's see some examples. An integer, such as 326, can be represented with \\([0-9]+\\).
+Let's see some examples. An integer, such as 326, can be represented with \([0-9]+\).
 This means, one or more characters between 0 or 9. Some (most) regex implementations
-have a special symbol for \\([0-9]\\), written as \\(\\setminus d\\). A variable,
+have a special symbol for \([0-9]\), written as \(\setminus d\). A variable,
 starting with a lowercase letter and containing lowercase or uppercase letters after it,
-can be written as \\(\[a-z\]([a-zA-Z]+)?\\). Again, most regex implementations provide
-a special operator for \\((r_1+)?\\), written as \\(r_1*\\).
+can be written as \([a-z]([a-zA-Z]+)?\). Again, most regex implementations provide
+a special operator for \((r_1+)?\), written as \(r_1*\).
 So how does one go about checking if a regular expression matches a string? An efficient way is to
 first construct a [state machine](https://en.wikipedia.org/wiki/Finite-state_machine). A type of state machine can be constructed from a regular expression