A library to convert a set of input patterns into tokens.
Danila Fedorin eb4204fe76 Fix two bugs. One was caused by the previous commit.
The other was created by using invalid memory.
2017-02-25 22:38:13 -08:00
external Update libds. 2017-01-27 22:04:16 -08:00
include Make foreach_free public and fix a few bugs. 2017-02-14 19:10:44 -08:00
src Fix two bugs. One was caused by the previous commit. 2017-02-25 22:38:13 -08:00
.gitignore Initial commit. Setup gitignore. 2017-01-19 15:29:50 -08:00
.gitmodules Add libds dependency. 2017-01-19 15:30:11 -08:00
CMakeLists.txt Write initial code for matching the patterns to a single string. 2017-02-04 00:28:36 -08:00
README.md Create README.md 2017-02-15 03:28:52 +00:00

liblex

A library for converting input text into tokens defined by regular expressions.

Rationale

liblex is part of an attempt to write a compiler entirely from scratch. Within that compiler, it would convert input text into tokens for the parser to consume.

Usage

First, an evaluation configuration has to be created. It stores the regular expressions used during lexing. The code below initializes an evaluation configuration and registers several patterns with it.

/* Declares the configuration */
eval_config config;
/* Initializes the configuration for use */
eval_config_init(&config);
/* Registers the regular expressions to be used. The IDs, given as the third
   parameter, also determine priority: the higher the ID, the higher the
   priority. */
eval_config_add(&config, "[ \n]+", 0);
eval_config_add(&config, "[a-zA-Z_][a-zA-Z_0-9]*", 1);
eval_config_add(&config, "if", 2);
eval_config_add(&config, "else", 3);
eval_config_add(&config, "[0-9]+", 4);
eval_config_add(&config, "{|}", 5);
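The README only says that a higher ID means a higher priority. On an input such as `iffy`, both the identifier pattern (ID 1) and `if` (ID 2) match a prefix, so some rule must decide between them. The sketch below assumes a common lexer convention, longest match first with ties broken by the higher ID; this is an assumption, not confirmed liblex behaviour, and `candidate` and `pick_candidate` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical record of one pattern that matched at the current position. */
typedef struct {
    int id;        /* ID passed to eval_config_add; higher ID = higher priority */
    size_t length; /* number of characters the pattern matched */
} candidate;

/* Returns the index of the winning candidate, or -1 if there are none.
   Assumed rule: the longest match wins; equal lengths go to the higher ID. */
int pick_candidate(const candidate* cands, size_t count) {
    int best = -1;
    for (size_t i = 0; i < count; i++) {
        if (best < 0 ||
            cands[i].length > cands[best].length ||
            (cands[i].length == cands[best].length &&
             cands[i].id > cands[best].id)) {
            best = (int)i;
        }
    }
    return best;
}
```

Under this rule the input `if` is a tie on length (2 characters each), so ID 2 wins and it becomes a keyword, while `iffy` is matched in full by the identifier pattern and is not split into `if` plus `fy`.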

It should be noted that this example is incomplete, since it ignores the return values. eval_config_add returns a liblex_result describing the outcome of the operation. LIBLEX_SUCCESS means that no errors occurred, LIBLEX_MALLOC means that the function failed to allocate the necessary memory, and LIBLEX_INVALID means that the regular expression provided was not correctly formatted.
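One way to stop ignoring those results is to register patterns from a table and stop at the first failure. The sketch below is self-contained, so it declares its own stand-in `liblex_result` enum: only the three constant names come from the README, and their numeric values here are arbitrary. `pattern_spec`, `add_pattern_fn`, `add_all_patterns`, and `describe_result` are hypothetical names, not part of liblex.

```c
#include <stddef.h>

/* Stand-in for liblex's result type; the names come from the README,
   the numeric values are arbitrary. */
typedef enum { LIBLEX_SUCCESS, LIBLEX_MALLOC, LIBLEX_INVALID } liblex_result;

/* Hypothetical pattern table entry. */
typedef struct {
    const char* regex;
    int id;
} pattern_spec;

/* Hypothetical signature of a thin wrapper around eval_config_add. */
typedef liblex_result (*add_pattern_fn)(void* config, const char* regex, int id);

/* Registers every pattern in order and stops at the first failure.
   On failure, *failed_at receives the index of the offending pattern;
   on success it receives count. */
liblex_result add_all_patterns(void* config, add_pattern_fn add,
                               const pattern_spec* specs, size_t count,
                               size_t* failed_at) {
    for (size_t i = 0; i < count; i++) {
        liblex_result result = add(config, specs[i].regex, specs[i].id);
        if (result != LIBLEX_SUCCESS) {
            *failed_at = i;
            return result;
        }
    }
    *failed_at = count;
    return LIBLEX_SUCCESS;
}

/* Human-readable meaning of each result, following the README. */
const char* describe_result(liblex_result result) {
    switch (result) {
        case LIBLEX_SUCCESS: return "no error";
        case LIBLEX_MALLOC:  return "memory allocation failed";
        case LIBLEX_INVALID: return "malformed regular expression";
    }
    return "unknown result";
}
```

With the real library, `add` would be a small wrapper that casts `config` back to `eval_config*` and forwards its arguments to eval_config_add. Reporting `failed_at` makes it easy to tell which of several patterns was rejected as LIBLEX_INVALID.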

Once the configuration is set up, a string is tokenized by creating a linked list and passing it to eval_all, which populates it with the resulting tokens (called matches).

/* Declares the linked list. */
ll match_ll;
/* Initializes the linked list. */
ll_init(&match_ll);

/* string holds the input text. The second parameter is the index at which
   to begin tokenizing, the third is the configuration, and the fourth is
   the list that receives the matches. */
eval_all(string, 0, &config, &match_ll);
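As a rough model of what tokenizing a whole string involves, the sketch below scans from a start index and records each match as an (ID, start, end) triple, advancing past every match until the input runs out or nothing matches. It is a self-contained stand-in, not liblex's implementation: the patterns from the configuration example are hand-written C functions instead of regular expressions, results go into a fixed array rather than a libds `ll`, `stub_match` and every function name are hypothetical, and overlaps use an assumed longest-match-then-highest-ID rule.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one liblex match. */
typedef struct {
    int id;      /* ID of the pattern that matched */
    size_t from; /* index of the first matched character */
    size_t to;   /* index one past the last matched character */
} stub_match;

/* Each rule returns how many characters it matches at s (0 = no match).
   Rule i stands in for the pattern registered with ID i. */
static size_t rule_space(const char* s) {   /* [ \n]+ */
    size_t n = 0;
    while (s[n] == ' ' || s[n] == '\n') n++;
    return n;
}
static size_t rule_ident(const char* s) {   /* [a-zA-Z_][a-zA-Z_0-9]* */
    size_t n = 0;
    if (!(isalpha((unsigned char)s[0]) || s[0] == '_')) return 0;
    while (isalnum((unsigned char)s[n]) || s[n] == '_') n++;
    return n;
}
static size_t rule_if(const char* s) { return strncmp(s, "if", 2) == 0 ? 2 : 0; }
static size_t rule_else(const char* s) { return strncmp(s, "else", 4) == 0 ? 4 : 0; }
static size_t rule_number(const char* s) {  /* [0-9]+ */
    size_t n = 0;
    while (isdigit((unsigned char)s[n])) n++;
    return n;
}
static size_t rule_brace(const char* s) { return (s[0] == '{' || s[0] == '}') ? 1 : 0; }

static size_t (*const rules[])(const char*) = {
    rule_space, rule_ident, rule_if, rule_else, rule_number, rule_brace
};

/* Tokenizes input starting at index, writing at most capacity matches.
   Stops at the end of input or at the first character no rule matches.
   Returns the number of matches written. */
size_t stub_eval_all(const char* input, size_t index,
                     stub_match* out, size_t capacity) {
    size_t count = 0;
    size_t len = strlen(input);
    while (index < len && count < capacity) {
        int best_id = -1;
        size_t best_len = 0;
        for (int id = 0; id < (int)(sizeof rules / sizeof rules[0]); id++) {
            size_t n = rules[id](input + index);
            /* Longer match wins; on a tie the later (higher) ID wins. */
            if (n > 0 && n >= best_len) {
                best_id = id;
                best_len = n;
            }
        }
        if (best_id < 0) break; /* no pattern matches here */
        out[count].id = best_id;
        out[count].from = index;
        out[count].to = index + best_len;
        count++;
        index += best_len;
    }
    return count;
}
```

For the input `if x { 42 }` this produces nine matches, whitespace included: `if` (ID 2), `x` (ID 1), the two braces (ID 5), `42` (ID 4), and four runs of spaces (ID 0). Keeping whitespace as ordinary matches mirrors the configuration above, which registers `[ \n]+` as a pattern of its own.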

Once done, a few things need to be cleaned up. The matches are released by passing eval_foreach_match_free to ll_foreach on the list that contains them, and ll_clear then empties the list itself:

ll_foreach(&match_ll, NULL, compare_always, eval_foreach_match_free);
ll_clear(&match_ll);
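The two calls above follow a common C pattern: the list nodes only hold pointers to the matches, so each match is freed through a per-element callback before the nodes themselves are released. The sketch below models that two-phase cleanup with a minimal singly linked list. `node`, `list_push`, `list_foreach`, `free_payload`, and `list_clear` are hypothetical names; this is not libds's `ll` implementation, and its callback signature is simplified.

```c
#include <stdlib.h>

/* Hypothetical singly linked list node holding an owned payload pointer. */
typedef struct node {
    void* data;
    struct node* next;
} node;

/* Prepends data to the list; returns 0 on success, -1 if malloc fails. */
int list_push(node** head, void* data) {
    node* n = malloc(sizeof *n);
    if (n == NULL) return -1;
    n->data = data;
    n->next = *head;
    *head = n;
    return 0;
}

/* Calls fn on every payload slot, so the callback can free and clear it. */
void list_foreach(node* head, void (*fn)(void** data)) {
    for (node* n = head; n != NULL; n = n->next) fn(&n->data);
}

/* Per-element callback, playing the role of eval_foreach_match_free. */
void free_payload(void** data) {
    free(*data);
    *data = NULL;
}

/* Frees the nodes only; payloads must already have been released.
   Returns the number of nodes freed and leaves *head as NULL. */
size_t list_clear(node** head) {
    size_t freed = 0;
    while (*head != NULL) {
        node* next = (*head)->next;
        free(*head);
        *head = next;
        freed++;
    }
    return freed;
}
```

The order matters: clearing the nodes first would discard the only pointers to the matches and leak them, which is why the foreach-with-free step comes before the clear.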

The configuration itself can be freed using:

eval_config_free(&config);