Write the semantics section using Bergamot

Signed-off-by: Danila Fedorin <danila.fedorin@gmail.com>
Danila Fedorin 2024-10-13 13:31:47 -07:00
parent 77dade1d1d
commit 37dd9ad6d4
2 changed files with 382 additions and 0 deletions


@@ -5,6 +5,17 @@ description: "In this post, I define the language that will serve as the object
date: 2024-08-10T17:37:43-07:00
tags: ["Agda", "Programming Languages"]
draft: true
custom_js: ["parser.js"]
bergamot:
render_presets:
default: "bergamot/rendering/imp.bergamot"
input_modes:
- name: "Expression"
fn: "window.parseExpr"
- name: "Basic Statement"
fn: "window.parseBasicStmt"
- name: "Statement"
fn: "window.parseStmt"
---
In the previous several posts, I've formalized the notion of lattices, which
@@ -197,3 +208,246 @@ The Agda version is:
Notice how we used `noop` to express the fact that the `else` branch of the
conditional does nothing.
### The Semantics of Our Language
We now have all the language constructs that I'll be showing off --- because
those are all the concepts that I've formalized. What's left is to define
how they behave. We will do this using a logical tool called
[_inference rules_](https://en.wikipedia.org/wiki/Rule_of_inference). I've
written about them a number of times; they're ubiquitous, particularly in the
sorts of things I like to explore on this site. The [section on inference rules]({{< relref "01_aoc_coq#inference-rules" >}})
from my Advent of Code series is pretty relevant, and [the notation section from
a post in my compiler series]({{< relref "03_compiler_typechecking#some-notation" >}}) says
much the same thing; I won't be re-describing them here.
There are three pieces that demand semantics: expressions, simple statements,
and non-simple statements. The semantics of each builds on the semantics of
the pieces that come before it, so we will start with expressions.
#### Expressions
The trickiest thing about expressions is that the value of an expression depends
on the "context": `x+1` can evaluate to `43` if `x` is `42`, or it can evaluate
to `0` if `x` is `-1`. To evaluate an expression, we will therefore need to
assign values to all of the variables in that expression. A mapping that
assigns values to variables is typically called an _environment_. We will write
\(\varnothing\) for "empty environment", and \(\{\texttt{x} \mapsto 42, \texttt{y} \mapsto -1 \}\) for
an environment that maps the variable \(\texttt{x}\) to 42, and the variable \(\texttt{y}\) to -1.
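
To make this concrete, here's one way an environment could be modeled in plain
Haskell --- just a rough sketch with names of my choosing, separate from the
Agda development below, which uses a membership relation rather than a map:

```haskell
import qualified Data.Map as Map

-- One possible concrete representation of environments: a finite map from
-- variable names to integer values.
type Env = Map.Map String Int

emptyEnv :: Env            -- corresponds to the empty environment ∅
emptyEnv = Map.empty

exampleEnv :: Env          -- corresponds to {x ↦ 42, y ↦ -1}
exampleEnv = Map.fromList [("x", 42), ("y", -1)]
```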
Now, a bit of notation. We will use the letter \(\rho\) to represent environments
(and if several environments are involved, we will occasionally number them
as \(\rho_1\), \(\rho_2\), etc.) We will use the letter \(e\) to stand for
expressions, and the letter \(v\) to stand for values. Finally, we'll write
\(\rho, e \Downarrow v\) to say that "in an environment \(\rho\), expression \(e\)
evaluates to value \(v\)". Our two previous examples of evaluating `x+1` can
thus be written as follows:
{{< latex >}}
\{ \texttt{x} \mapsto 42 \}, \texttt{x}+1 \Downarrow 43 \\
\{ \texttt{x} \mapsto -1 \}, \texttt{x}+1 \Downarrow 0 \\
{{< /latex >}}
Now, on to the actual rules for how to evaluate expressions. Most simply,
integer literals like `1` just evaluate to themselves.
{{< latex >}}
\frac{n \in \text{Int}}{\rho, n \Downarrow n}
{{< /latex >}}
Note that the letter \(\rho\) is completely unused in the above rule. That's
because no matter what values _variables_ have, a number still evaluates to
the same value. As we've already established, the same is not true for a
variable like \(\texttt{x}\). To evaluate such a variable, we need to retrieve
the value it's mapped to in the current environment, which we will write as
\(\rho(\texttt{x})\). This gives the following inference rule:
{{< latex >}}
\frac{\rho(x) = v}{\rho, x \Downarrow v}
{{< /latex >}}
All that's left is to define addition and subtraction. For an expression in the
form \(e_1+e_2\), we first need to evaluate the two subexpressions \(e_1\)
and \(e_2\), and then add the two resulting numbers. As a result, the addition
rule includes two additional premises, one for evaluating each summand.
{{< latex >}}
\frac
{\rho, e_1 \Downarrow v_1 \quad \rho, e_2 \Downarrow v_2 \quad v_1 + v_2 = v}
{\rho, e_1+e_2 \Downarrow v}
{{< /latex >}}
The subtraction rule is similar. Below, I've configured an instance of
[Bergamot]({{< relref "bergamot" >}}) to interpret these exact rules. Try
typing various expressions like `1`, `1+1`, etc. into the input box below
to see them evaluate. If you click the "Full Proof Tree" button, you can also view
the exact rules that were used in computing a particular value. The variables
`x`, `y`, and `z` are pre-defined for your convenience.
{{< bergamot_widget id="expr-widget" query="" prompt="eval(extend(extend(extend(empty, x, 17), y, 42), z, 0), TERM, ?v)" modes="Expression:Expression" >}}
section "" {
EvalNum @ eval(?rho, lit(?n), ?n) <- int(?n);
EvalVar @ eval(?rho, var(?x), ?v) <- inenv(?x, ?v, ?rho);
}
section "" {
EvalPlus @ eval(?rho, plus(?e_1, ?e_2), ?v) <- eval(?rho, ?e_1, ?v_1), eval(?rho, ?e_2, ?v_2), add(?v_1, ?v_2, ?v);
EvalMinus @ eval(?rho, minus(?e_1, ?e_2), ?v) <- eval(?rho, ?e_1, ?v_1), eval(?rho, ?e_2, ?v_2), subtract(?v_1, ?v_2, ?v);
}
hidden section "" {
EnvTake @ inenv(?x, ?v, extend(?rho, ?x, ?v)) <-;
EnvSkip @ inenv(?x, ?v_1, extend(?rho, ?y, ?v_2)) <- inenv(?x, ?v_1, ?rho), not(symeq(?x, ?y));
}
{{< /bergamot_widget >}}
The Agda equivalent of this looks very similar to the rules themselves. I use
`⇒ᵉ` instead of \(\Downarrow\), and there's a little bit of tedium with
wrapping integers into a new `Value` type. I also used a (partial) relation
`(x, v) ∈ ρ` instead of explicitly defining a lookup operation on environments,
since it is conceivable for a user to attempt to access a variable that has
never been assigned. Aside from these notational changes, the structure of each
of the constructors of the evaluation data type matches the inference rules
I showed above.
{{< codelines "Agda" "agda-spa/Language/Semantics.agda" 27 35 >}}
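
If you prefer to see the rules as a runnable program, here's a small Haskell
sketch of the same evaluation judgement, reusing the `Env` type (and `Data.Map`
import) from the sketch above. The names `Expr` and `evalExpr` are just for
illustration, and the `Maybe` result reflects that looking up an unassigned
variable can fail:

```haskell
-- Big-step evaluation ρ, e ⇓ v as a function; each equation corresponds to
-- one of the inference rules above.
data Expr
  = Lit Int
  | Var String
  | Plus Expr Expr
  | Minus Expr Expr

evalExpr :: Env -> Expr -> Maybe Int
evalExpr _   (Lit n)       = Just n            -- literals evaluate to themselves
evalExpr rho (Var x)       = Map.lookup x rho  -- look the variable up in ρ
evalExpr rho (Plus e1 e2)  =
  (+) <$> evalExpr rho e1 <*> evalExpr rho e2  -- evaluate both summands, then add
evalExpr rho (Minus e1 e2) =
  (-) <$> evalExpr rho e1 <*> evalExpr rho e2
```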
#### Simple Statements
The main difference when formalizing (simple and "normal") statements is
that they modify the environment. If `x` has one value, executing `x = x + 1` will
certainly change that value. On the other hand, statements don't produce values.
So, we will be writing claims like \(\rho_1 , \textit{bs} \Rightarrow \rho_2\)
to say that the basic statement \(\textit{bs}\), when starting in environment
\(\rho_1\), will produce environment \(\rho_2\). Here's an example:
{{< latex >}}
\{ \texttt{x} \mapsto 42, \texttt{y} \mapsto 17 \}, \ \texttt{x = x - \text{1}} \Rightarrow \{ \texttt{x} \mapsto 41, \texttt{y} \mapsto 17 \}
{{< /latex >}}
Here, we subtracted one from a variable with value `42`, leaving it with a new
value of `41`.
There are two basic statements, and one of them quite literally does nothing.
The inference rule for `noop` is very simple:
{{< latex >}}
\rho,\ \texttt{noop} \Rightarrow \rho
{{< /latex >}}
For the assignment rule, we need to know how to evaluate the expression on the
right side of the equal sign. This is why we needed to define the semantics
of expressions first. Given those, the evaluation rule for assignment is as
follows, with \(\rho[x \mapsto v]\) meaning "the environment \(\rho\) but
mapping the variable \(x\) to value \(v\)".
{{< latex >}}
\frac
{\rho, e \Downarrow v}
{\rho, x = e \Rightarrow \rho[x \mapsto v]}
{{< /latex >}}
Those are actually all the rules we need, and below, I am once again configuring
a Bergamot instance, this time with simple statements. Try out `noop` or some
sort of variable assignment, like `x = x + 1`.
{{< bergamot_widget id="basic-stmt-widget" query="" prompt="stepbasic(extend(extend(extend(empty, x, 17), y, 42), z, 0), TERM, ?env)" modes="Basic Statement:Basic Statement" >}}
hidden section "" {
EvalNum @ eval(?rho, lit(?n), ?n) <- int(?n);
EvalVar @ eval(?rho, var(?x), ?v) <- inenv(?x, ?v, ?rho);
}
hidden section "" {
EvalPlus @ eval(?rho, plus(?e_1, ?e_2), ?v) <- eval(?rho, ?e_1, ?v_1), eval(?rho, ?e_2, ?v_2), add(?v_1, ?v_2, ?v);
EvalMinus @ eval(?rho, minus(?e_1, ?e_2), ?v) <- eval(?rho, ?e_1, ?v_1), eval(?rho, ?e_2, ?v_2), subtract(?v_1, ?v_2, ?v);
}
section "" {
StepNoop @ stepbasic(?rho, noop, ?rho) <-;
StepAssign @ stepbasic(?rho, assign(?x, ?e), extend(?rho, ?x, ?v)) <- eval(?rho, ?e, ?v);
}
hidden section "" {
EnvTake @ inenv(?x, ?v, extend(?rho, ?x, ?v)) <-;
EnvSkip @ inenv(?x, ?v_1, extend(?rho, ?y, ?v_2)) <- inenv(?x, ?v_1, ?rho), not(symeq(?x, ?y));
}
{{< /bergamot_widget >}}
The Agda implementation is once again just a data type with constructors-for-rules.
This time they also look quite similar to the rules I've shown up until now,
though I continue to explicitly quantify over variables like `ρ`.
{{< codelines "Agda" "agda-spa/Language/Semantics.agda" 37 40 >}}
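
If you want to see these two rules as code as well, here's a continuation of
the Haskell sketch from the expressions section. Again, `BasicStmt` and
`execBasic` are illustrative names, not the ones used in the Agda development:

```haskell
data BasicStmt
  = Noop
  | Assign String Expr

-- ρ₁, bs ⇒ ρ₂ as a function. Map.insert overwrites any existing binding,
-- which is exactly the ρ[x ↦ v] operation from the assignment rule.
execBasic :: Env -> BasicStmt -> Maybe Env
execBasic rho Noop         = Just rho
execBasic rho (Assign x e) = do
  v <- evalExpr rho e          -- premise: ρ, e ⇓ v
  Just (Map.insert x v rho)    -- conclusion: ρ[x ↦ v]
```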
#### Statements
Let's work on non-simple statements next. The easiest rule to define is probably
sequencing. When we use `then` (or `;`) to combine two statements, what we
actually want is to execute the first statement, which may change variables,
and then execute the second statement while keeping the changes from the first.
This means there are three environments: \(\rho_1\) for the initial state before
either statement is executed, \(\rho_2\) for the state between executing the
first and second statement, and \(\rho_3\) for the final state after both
are done executing. This leads to the following rule:
{{< latex >}}
\frac
{ \rho_1, s_1 \Rightarrow \rho_2 \quad \rho_2, s_2 \Rightarrow \rho_3 }
{ \rho_1, s_1; s_2 \Rightarrow \rho_3 }
{{< /latex >}}
We will actually need two rules to evaluate the conditional statement: one
for when the condition evaluates to "true", and one for when the condition
evaluates to "false". Only, I never specified booleans as being part of
the language, which means that we will need to come up with what "false" and "true"
are. I will take my cue from C++ and use zero as "false", and any other number
as "true".
If the condition of an `if`-`else` statement is "true" (nonzero), then the
effect of executing the `if`-`else` should be the same as executing the "then"
part of the statement, while completely ignoring the "else" part.
{{< latex >}}
\frac
{ \rho_1 , e \Downarrow v \quad v \neq 0 \quad \rho_1, s_1 \Rightarrow \rho_2}
{ \rho_1, \textbf{if}\ e\ \{ s_1 \}\ \textbf{else}\ \{ s_2 \}\ \Rightarrow \rho_2 }
{{< /latex >}}
Notice that in the above rule, we used the evaluation judgement \(\rho_1, e \Downarrow v\)
to evaluate the _expression_ that serves as the condition. We then had an
additional premise that requires the resulting value \(v\) to be truthy --- that is, nonzero.
The rule for evaluating a conditional whose condition is "false" is very similar,
except that it executes the "else" branch \(s_2\).
{{< latex >}}
\frac
{ \rho_1 , e \Downarrow v \quad v = 0 \quad \rho_1, s_2 \Rightarrow \rho_2}
{ \rho_1, \textbf{if}\ e\ \{ s_1 \}\ \textbf{else}\ \{ s_2 \}\ \Rightarrow \rho_2 }
{{< /latex >}}
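The language also has `while` loops, and the widget below handles them too.
Their rules follow the same pattern as the conditional: a nonzero condition
runs the body once and then the whole loop again in the updated environment,
while a zero condition leaves the environment unchanged.
{{< latex >}}
\frac
{ \rho_1 , e \Downarrow v \quad v \neq 0 \quad \rho_1, s \Rightarrow \rho_2 \quad \rho_2, \textbf{while}\ e\ \{ s \}\ \Rightarrow \rho_3 }
{ \rho_1, \textbf{while}\ e\ \{ s \}\ \Rightarrow \rho_3 }
{{< /latex >}}
{{< latex >}}
\frac
{ \rho_1 , e \Downarrow v \quad v = 0 }
{ \rho_1, \textbf{while}\ e\ \{ s \}\ \Rightarrow \rho_1 }
{{< /latex >}}
Finally, a simple statement can be used anywhere a statement is expected; the
(hidden) `StepLiftBasic` rule in the widget takes care of that without changing
the environment transformation. The Bergamot instance below is configured with
all of the statement rules; try something like `x = 1; y = x + 1` or
`if (x) { y = 1 } else { y = 2 }`.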
{{< bergamot_widget id="stmt-widget" query="" prompt="step(extend(extend(extend(empty, x, 17), y, 42), z, 0), TERM, ?env)" modes="Statement:Statement" >}}
hidden section "" {
EvalNum @ eval(?rho, lit(?n), ?n) <- int(?n);
EvalVar @ eval(?rho, var(?x), ?v) <- inenv(?x, ?v, ?rho);
}
hidden section "" {
EvalPlus @ eval(?rho, plus(?e_1, ?e_2), ?v) <- eval(?rho, ?e_1, ?v_1), eval(?rho, ?e_2, ?v_2), add(?v_1, ?v_2, ?v);
EvalMinus @ eval(?rho, minus(?e_1, ?e_2), ?v) <- eval(?rho, ?e_1, ?v_1), eval(?rho, ?e_2, ?v_2), subtract(?v_1, ?v_2, ?v);
}
hidden section "" {
StepNoop @ stepbasic(?rho, noop, ?rho) <-;
StepAssign @ stepbasic(?rho, assign(?x, ?e), extend(?rho, ?x, ?v)) <- eval(?rho, ?e, ?v);
}
hidden section "" {
StepLiftBasic @ step(?rho_1, ?s, ?rho_2) <- stepbasic(?rho_1, ?s, ?rho_2);
}
section "" {
StepIfTrue @ step(?rho_1, if(?e, ?s_1, ?s_2), ?rho_2) <- eval(?rho_1, ?e, ?v), not(numeq(?v, 0)), step(?rho_1, ?s_1, ?rho_2);
StepIfFalse @ step(?rho_1, if(?e, ?s_1, ?s_2), ?rho_2) <- eval(?rho_1, ?e, ?v), numeq(?v, 0), step(?rho_1, ?s_2, ?rho_2);
StepWhileTrue @ step(?rho_1, while(?e, ?s), ?rho_3) <- eval(?rho_1, ?e, ?v), not(numeq(?v, 0)), step(?rho_1, ?s, ?rho_2), step(?rho_2, while(?e, ?s), ?rho_3);
StepWhileFalse @ step(?rho_1, while(?e, ?s), ?rho_1) <- eval(?rho_1, ?e, ?v), numeq(?v, 0);
StepSeq @ step(?rho_1, seq(?s_1, ?s_2), ?rho_3) <- step(?rho_1, ?s_1, ?rho_2), step(?rho_2, ?s_2, ?rho_3);
}
hidden section "" {
EnvTake @ inenv(?x, ?v, extend(?rho, ?x, ?v)) <-;
EnvSkip @ inenv(?x, ?v_1, extend(?rho, ?y, ?v_2)) <- inenv(?x, ?v_1, ?rho), not(symeq(?x, ?y));
}
{{< /bergamot_widget >}}
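To close the loop on the Haskell sketch from the earlier sections, here's one
way the full statement judgement could look as a function. `Stmt` and
`execStmt` are again names of my choosing; like the big-step rules themselves,
this interpreter simply never produces a result for a loop that doesn't
terminate.

```haskell
data Stmt
  = Basic BasicStmt
  | Seq Stmt Stmt
  | If Expr Stmt Stmt
  | While Expr Stmt

-- ρ₁, s ⇒ ρ₂ as a function, one case per rule.
execStmt :: Env -> Stmt -> Maybe Env
execStmt rho (Basic bs)   = execBasic rho bs    -- lift simple statements
execStmt rho (Seq s1 s2)  = do
  rho' <- execStmt rho s1                       -- ρ₁, s₁ ⇒ ρ₂
  execStmt rho' s2                              -- ρ₂, s₂ ⇒ ρ₃
execStmt rho (If e s1 s2) = do
  v <- evalExpr rho e
  execStmt rho (if v /= 0 then s1 else s2)      -- nonzero counts as "true"
execStmt rho (While e s)  = do
  v <- evalExpr rho e
  if v == 0
    then Just rho                               -- condition "false": stop
    else do
      rho' <- execStmt rho s                    -- run the body once...
      execStmt rho' (While e s)                 -- ...then the loop again
```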


@@ -0,0 +1,128 @@
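// A small parser-combinator library: each parser is a function from an input
// string to a list of [value, remainingInput] pairs (the "list of successes"
// style), where an empty list means the parse failed. The parsers defined at
// the bottom turn source text into Bergamot terms for the widgets above and
// are exposed as window.parseExpr, window.parseBasicStmt, and window.parseStmt.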
const match = str => input => {
if (input.startsWith(str)) {
return [[str, input.slice(str.length)]]
}
return [];
};
const map = (f, m) => input => {
return m(input).map(([v, rest]) => [f(v), rest]);
};
const apply = (m1, m2) => input => {
return m1(input).flatMap(([f, rest]) => m2(rest).map(([v, rest]) => [f(v), rest]));
};
const pure = v => input => [[v, input]];
const liftA = (f, ...ms) => input => {
if (ms.length <= 0) return []
let results = map(v => [v], ms[0])(input);
for (let i = 1; i < ms.length; i++) {
results = results.flatMap(([vals, rest]) =>
ms[i](rest).map(([val, rest]) => [[...vals, val], rest])
);
}
return results.map(([vals, rest]) => [f(...vals), rest]);
};
const many1 = (m) => liftA((x, xs) => [x].concat(xs), m, oneOf([
lazy(() => many1(m)),
pure([])
]));
const many = (m) => oneOf([ pure([]), many1(m) ]);
const oneOf = ms => input => {
return ms.flatMap(m => m(input));
};
const takeWhileRegex0 = regex => input => {
let idx = 0;
while (idx < input.length && regex.test(input[idx])) {
idx++;
}
return [[input.slice(0, idx), input.slice(idx)]];
};
const takeWhileRegex = regex => input => {
const result = takeWhileRegex0(regex)(input);
if (result[0][0].length > 0) return result;
return [];
};
const spaces = takeWhileRegex0(/\s/);
const digits = takeWhileRegex(/\d/);
const alphas = takeWhileRegex(/[a-zA-Z]/);
const left = (m1, m2) => liftA((a, _) => a, m1, m2);
const right = (m1, m2) => liftA((_, b) => b, m1, m2);
const word = s => left(match(s), spaces);
const end = s => s.length == 0 ? [['', '']] : [];
const lazy = deferred => input => deferred()(input);
const ident = left(alphas, spaces);
const number = oneOf([
liftA((a, b) => a + b, word("-"), left(digits, spaces)),
left(digits, spaces),
]);
const basicExpr = oneOf([
map(n => `lit(${n})`, number),
map(x => `var(${x})`, ident),
liftA((lp, v, rp) => v, word("("), lazy(() => expr), word(")")),
]);
const opExpr = oneOf([
liftA((_a, _b, e) => ["plus", e], word("+"), spaces, lazy(() => expr)),
liftA((_a, _b, e) => ["minus", e], word("-"), spaces, lazy(() => expr)),
]);
const flatten = (e, es) => {
return es.reduce((e1, [op, e2]) => `${op}(${e1}, ${e2})`, e);
}
const expr = oneOf([
basicExpr,
liftA(flatten, basicExpr, many(opExpr)),
]);
const basicStmt = oneOf([
liftA((x, _, e) => `assign(${x}, ${e})`, ident, word("="), expr),
word("noop"),
]);
const stmt = oneOf([
basicStmt,
liftA((_if, _lp_, cond, _rp, _lbr1_, s1, _rbr1, _else, _lbr2, s2, _rbr2) => `if(${cond}, ${s1}, ${s2})`,
word("if"), word("("), expr, word(")"),
word("{"), lazy(() => stmtSeq), word("}"),
word("else"), word("{"), lazy(() => stmtSeq), word("}")),
liftA((_while, _lp_, cond, _rp, _lbr_, s1, _rbr) => `while(${cond}, ${s1})`,
word("while"), word("("), expr, word(")"),
word("{"), lazy(() => stmtSeq), word("}")),
]);
const stmtSeq = oneOf([
liftA((s1, _semi, rest) => `seq(${s1}, ${rest})`, stmt, word(";"), lazy(() => stmtSeq)),
stmt,
]);
const parseWhole = m => string => {
const result = left(m, end)(string);
console.log(result);
if (result.length > 0) return result[0][0];
return null;
}
window.parseExpr = parseWhole(expr);
window.parseBasicStmt = parseWhole(basicStmt);
window.parseStmt = parseWhole(stmtSeq);
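// A few example invocations (illustrative, derived from the grammar above):
//   window.parseExpr("x + 1")           // "plus(var(x), lit(1))"
//   window.parseBasicStmt("x = x - 1")  // "assign(x, minus(var(x), lit(1)))"
//   window.parseStmt("if (x) { y = 1 } else { y = 2 }")
//                                       // "if(var(x), assign(y, lit(1)), assign(y, lit(2)))"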