An experimental attempt to write a regular expression matcher.
You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
Danila Fedorin 6123d03471 Update libds. 5 years ago
external Update libds. 5 years ago
include Make sure codebase is C90. 5 years ago
src Make sure codebase is C90. 5 years ago
.gitignore Initial commit. Create .gitignore. 6 years ago
.gitmodules Edit the URL to use http instead of ssh, allowing for non-ssh access. 6 years ago
CMakeLists.txt Make sure codebase is C90. 5 years ago Add 6 years ago


A regular expression matching library using Nondeterministic Finite Automata (NFA).


There is very little rationale for programming libds, looking at it from a very general poin of view. It's part of the compulsion to re-implement everything on my own. The regular expression subset that this library can handle is very bare.


The usage for the library is fairly simple. The first step is to build a regular expression from a string. The regex_build function takes care of this, returning a libregex_result. libregex_result is an enum used for reporting errors, and the function will return LIBREGEX_SUCCESS if all goes well, or a choice of either LIBREGEX_MALLOC or LIBREGEX_INVALID if the compilation fails due to a bad memory allocation or an invalid regular expression, respectively. The detailed function prototype can be seen below:

libregex_result regex_build(regex_node** root, char* expression);

The first parameter is a pointer to the "root" node of the NFA. It's the first state which will be the starting point during the matching process. The pointer will be populated by the function. The second parameter is the actual regular expression, as a NULL-terminated string.

Once a regular expression has been built, it can be used to match strings. This is, unfortunately, not thread-safe, as the states of the NFA carry some small pieces of data that are necessary for the matching process to stay efficient. The function prototype of the matching function can be seen below:

libregex_result regex_match_string(regex_node* root, char* string, regex_result* result);

The first parameters it the root node acquired through regex_build, the second parameter is the NULL-terminated string to be matched, and the last parameter is a pointer to a struct into which the data from the match will be stored. result->matches will be set to 1 if the expression matched or 0 otherwise, and the result->groups is an array of matched regular expression groups, each consisting of a regex_match struct, which, in turn, carries the beginning and end index of the regular expression (inclusive). The result allocates memory, and needs to be freed using regex_result_free:

void regex_result_free(regex_result* result);

The regular expression NFA also needs to be freed:

libregex_result regex_free(regex_node* root);