A library to convert a set of input patterns into tokens.
Danila Fedorin be7ea694fc Update libds (again). 4 years ago



A library for converting input text into tokens defined by regular expressions.


liblex is part of an attempt to write a compiler entirely from scratch. This part of the compiler converts input text into tokens, which are then evaluated by the parser.


First, an evaluation configuration has to be created. This configuration stores the regular expressions to be used during lexing. The code below is an example of initializing and configuring an evaluation configuration.

/* Declares the configuration */
eval_config config;
/* Initializes the configuration for use */
/* Registers regular expressions to be used. The IDs, given as the third
   parameter, are also used for priority: the higher the ID, the higher the
   priority. */
eval_config_add(&config, "[ \n]+", 0);
eval_config_add(&config, "[a-zA-Z_][a-zA-Z_0-9]*", 1);
eval_config_add(&config, "if", 2);
eval_config_add(&config, "else", 3);
eval_config_add(&config, "[0-9]+", 4);
eval_config_add(&config, "{|}", 5);

Note that this example is incomplete: it ignores the return values. eval_config_add returns a liblex_result, which represents the result of the operation. LIBLEX_SUCCESS means that no errors occurred. LIBLEX_MALLOC means that the function failed to allocate the necessary memory, and LIBLEX_INVALID means that the regular expression provided was not correctly formatted.
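
One way to act on these result codes is sketched below. So that it compiles on its own, the sketch declares stand-ins for liblex_result, eval_config and eval_config_add. Those stand-ins are assumptions made for illustration (in particular the order of the enum values, the const char* parameter, and the stub body); a real program would take them from the liblex headers. describe_result and add_patterns are hypothetical helpers, not part of liblex.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in declarations (assumptions) so the sketch compiles alone.
   A real program would get these from the liblex headers instead. */
typedef enum { LIBLEX_SUCCESS, LIBLEX_MALLOC, LIBLEX_INVALID } liblex_result;
typedef struct { int pattern_count; } eval_config;

/* Stub for illustration only: it rejects an empty pattern and counts
   the rest. The real function parses the regular expression. */
static liblex_result eval_config_add(eval_config* config, const char* pattern, int id) {
    (void) id;
    if (pattern[0] == '\0') return LIBLEX_INVALID;
    config->pattern_count++;
    return LIBLEX_SUCCESS;
}

/* Hypothetical helper: turns a result code into a readable message. */
static const char* describe_result(liblex_result result) {
    switch (result) {
    case LIBLEX_SUCCESS: return "success";
    case LIBLEX_MALLOC:  return "memory allocation failed";
    case LIBLEX_INVALID: return "invalid regular expression";
    }
    return "unknown result";
}

/* Hypothetical helper: registers patterns in order, using the array index
   as the ID, and stops at the first failure so the caller can react. */
static liblex_result add_patterns(eval_config* config, const char* const* patterns, int count) {
    assert(config != NULL && patterns != NULL);
    for (int i = 0; i < count; i++) {
        liblex_result result = eval_config_add(config, patterns[i], i);
        if (result != LIBLEX_SUCCESS) {
            fprintf(stderr, "pattern %d (\"%s\"): %s\n",
                    i, patterns[i], describe_result(result));
            return result;
        }
    }
    return LIBLEX_SUCCESS;
}
```

Stopping at the first failure keeps the caller in control: on LIBLEX_MALLOC it may want to abort, while on LIBLEX_INVALID it can report which pattern was malformed.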

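The priority rule from the registration comment (the higher the ID, the higher the priority) explains why "if" is given ID 2 while the identifier pattern gets ID 1: the text "if" fits both patterns, and the keyword should win. The sketch below only illustrates that rule. The candidate struct and pick_by_priority are hypothetical and say nothing about how liblex implements matching internally, for example whether match length is also taken into account. IDs are assumed to be non-negative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: one registered pattern that matched the same text. */
typedef struct {
    int pattern_id;  /* the ID given as the third argument of eval_config_add */
} candidate;

/* Returns the ID of the highest-priority candidate, or -1 if there are none.
   Assumes IDs are non-negative. */
static int pick_by_priority(const candidate* candidates, size_t count) {
    assert(candidates != NULL || count == 0);
    int best = -1;
    for (size_t i = 0; i < count; i++) {
        if (candidates[i].pattern_id > best) {
            best = candidates[i].pattern_id;
        }
    }
    return best;
}
```

With the configuration above, the candidates for the text "if" would carry IDs 1 and 2, so the keyword pattern is chosen.
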
Once the evaluation configuration has been set up, tokenizing a string is done by creating a linked list and populating it with the resulting tokens (called matches).

/* Declares the linked list. */
ll match_ll;
/* Initializes the linked list. */

/* The first parameter is the input string, the second is the index at which
   to begin parsing. */
eval_all(string, 0, &config, &match_ll);

Once tokenizing is done, the allocated resources need to be cleaned up. To release the matches, pass the eval_foreach_match_free function to ll_foreach together with the list that contains them:

ll_foreach(&match_ll, NULL, compare_always, eval_foreach_match_free);

And the configuration can be freed using: