external | ||
include | ||
src | ||
.gitignore | ||
.gitmodules | ||
CMakeLists.txt | ||
README.md |
libregex
A regular expression matching library using Nondeterministic Finite Automata (NFA).
Rationale
There is very little rationale for programming libds, looking at it from a very general poin of view. It's part of the compulsion to re-implement everything on my own. The regular expression subset that this library can handle is very bare.
Usage
The usage for the library is fairly simple. The first step is to build a regular expression from a string. The regex_build
function takes care of this, returning a libregex_result
. libregex_result
is an enum used for reporting errors, and the function will return LIBREGEX_SUCCESS
if all goes well, or a choice of either LIBREGEX_MALLOC
or LIBREGEX_INVALID
if the compilation fails due to a bad memory allocation or an invalid regular expression, respectively. The detailed function prototype can be seen below:
libregex_result regex_build(regex_node** root, char* expression);
The first parameter is a pointer to the "root" node of the NFA. It's the first state which will be the starting point during the matching process. The pointer will be populated by the function. The second parameter is the actual regular expression, as a NULL-terminated string.
Once a regular expression has been built, it can be used to match strings. This is, unfortunately, not thread-safe, as the states of the NFA carry some small pieces of data that are necessary for the matching process to stay efficient. The function prototype of the matching function can be seen below:
libregex_result regex_match_string(regex_node* root, char* string, regex_result* result);
The first parameters it the root node acquired through regex_build
, the second parameter is the NULL-terminated string to be matched, and the last parameter is a pointer to a struct into which the data from the match will be stored. result->matches
will be set to 1 if the expression matched or 0 otherwise, and the result->groups
is an array of matched regular expression groups, each consisting of a regex_match
struct, which, in turn, carries the beginning and end index of the regular expression (inclusive). The result allocates memory, and needs to be freed using regex_result_free
:
void regex_result_free(regex_result* result);
The regular expression NFA also needs to be freed:
libregex_result regex_free(regex_node* root);