libregex

A regular expression matching library using Nondeterministic Finite Automata (NFA).

Rationale

There is very little rationale for writing libregex, looking at it from a very general point of view. It's part of my compulsion to re-implement everything on my own. The regular expression subset that this library can handle is very bare.

Usage

The usage for the library is fairly simple. The first step is to build a regular expression from a string. The regex_build function takes care of this, returning a libregex_result. libregex_result is an enum used for reporting errors, and the function will return LIBREGEX_SUCCESS if all goes well, or a choice of either LIBREGEX_MALLOC or LIBREGEX_INVALID if the compilation fails due to a bad memory allocation or an invalid regular expression, respectively. The detailed function prototype can be seen below:

libregex_result regex_build(regex_node** root, char* expression);

The first parameter is a pointer to the "root" node of the NFA. This is the first state, which serves as the starting point during the matching process. The function populates this pointer. The second parameter is the regular expression itself, as a null-terminated string.
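
Since regex_build reports failure through its return value, callers will usually want to turn the three result codes into something readable. A minimal sketch of that follows. The enum is redeclared here only so the snippet compiles on its own (in real code it comes from the library's header, and its numeric values may differ), and describe_result is a hypothetical helper, not part of libregex.

```c
#include <stdio.h>

/* Stand-in declaration so this sketch compiles alone. The real enum
   comes from the library header; only the three names are taken from
   the README, the numeric values are not specified there. */
typedef enum {
    LIBREGEX_SUCCESS,
    LIBREGEX_MALLOC,
    LIBREGEX_INVALID
} libregex_result;

/* Hypothetical helper (not part of libregex): map a result code to a
   human-readable message. */
static const char* describe_result(libregex_result code) {
    switch (code) {
    case LIBREGEX_SUCCESS: return "success";
    case LIBREGEX_MALLOC:  return "memory allocation failed";
    case LIBREGEX_INVALID: return "invalid regular expression";
    }
    return "unknown result code";
}
```

A caller would pass the value returned by regex_build to describe_result and print the message (for example to stderr) whenever the value is not LIBREGEX_SUCCESS.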

Once a regular expression has been built, it can be used to match strings. This is, unfortunately, not thread-safe, as the states of the NFA carry some small pieces of data that are necessary for the matching process to stay efficient. The function prototype of the matching function can be seen below:

libregex_result regex_match_string(regex_node* root, char* string, regex_result* result);

The first parameter is the root node acquired through regex_build, the second parameter is the null-terminated string to be matched, and the last parameter is a pointer to a struct into which the data from the match will be stored. result->matches will be set to 1 if the expression matched or 0 otherwise, and result->groups is an array of matched regular expression groups. Each group is a regex_match struct, which carries the beginning and end index of that group's match (both inclusive). The result allocates memory, and needs to be freed using regex_result_free:

void regex_result_free(regex_result* result);

The regular expression NFA also needs to be freed:

libregex_result regex_free(regex_node* root);
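
Because the group indices are inclusive on both ends, a group that starts at index b and ends at index e covers e - b + 1 characters. The sketch below copies a group's text out of the matched string under that rule. The struct is a stand-in so the snippet compiles on its own, the field names begin and end are assumptions (the README does not name them), and copy_group is a hypothetical helper rather than a library function.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for the library's regex_match. The field names are
   assumptions; check the library header for the real ones. */
typedef struct {
    int begin; /* index of the first matched character */
    int end;   /* index of the last matched character (inclusive) */
} regex_match;

/* Hypothetical helper: return a newly allocated copy of the text a
   group covers, or NULL if the range is malformed or allocation
   fails. The caller owns the returned buffer and must free() it. */
static char* copy_group(const char* string, const regex_match* group) {
    if (group->begin < 0 || group->end < group->begin) {
        return NULL;
    }
    /* Inclusive end index, hence the + 1. */
    size_t length = (size_t)(group->end - group->begin) + 1;
    char* copy = malloc(length + 1);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, string + group->begin, length);
    copy[length] = '\0';
    return copy;
}
```

The helper does not check that the indices lie within the string, so it should only be given groups produced by a match against that same string.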