A library to convert a set of input patterns into tokens.
Danila Fedorin 2f65eda7bd Try to fix reading freed memory in eval.c.
This occurred when the null term was reached - the token was freed, but
no error was thrown (it's just the end of the string), so the program
attempted to increment the index using freed token data.
2017-02-26 22:35:45 -08:00

liblex

A library for converting input text into tokens defined by regular expressions.

Rationale

liblex is part of an effort to write a compiler entirely from scratch. This part of the compiler converts input text into tokens for the parser to evaluate.

Usage

First, an evaluation configuration has to be created. This configuration stores the regular expressions used during lexing. The code below declares, initializes, and populates an evaluation configuration.

```c
/* Declares the configuration */
eval_config config;
/* Initializes the configuration for use */
eval_config_init(&config);
/* Registers the regular expressions to be used. The IDs, given as the third
   parameter, also determine priority: the higher the ID, the higher the
   priority. */
eval_config_add(&config, "[ \n]+", 0);
eval_config_add(&config, "[a-zA-Z_][a-zA-Z_0-9]*", 1);
eval_config_add(&config, "if", 2);
eval_config_add(&config, "else", 3);
eval_config_add(&config, "[0-9]+", 4);
eval_config_add(&config, "{|}", 5);
```

Note that this example is incomplete: it ignores return values. eval_config_add returns a liblex_result, which represents the outcome of the operation. LIBLEX_SUCCESS means that no errors occurred, LIBLEX_MALLOC means that the function failed to allocate the necessary memory, and LIBLEX_INVALID means that the regular expression provided was not correctly formatted.

Once the configuration is set up, a string is tokenized by creating a linked list and passing it to eval_all, which populates it with the resulting tokens (called matches).

```c
/* Declares the linked list. */
ll match_ll;
/* Initializes the linked list. */
ll_init(&match_ll);

/* Tokenizes `string` (the input text) starting at index 0, using the
   patterns registered in `config`, and stores the resulting matches in
   `match_ll`. */
eval_all(string, 0, &config, &match_ll);
```

Once tokenizing is done, the matches and the configuration need to be freed. Passing eval_foreach_match_free to ll_foreach on the match list releases each match, and ll_clear then empties the list itself:

```c
ll_foreach(&match_ll, NULL, compare_always, eval_foreach_match_free);
ll_clear(&match_ll);
```

Finally, the configuration is freed with:

```c
eval_config_free(&config);
```