Add tagging machinery to assign unique IDs to AST nodes

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
2026-06-25 14:29:37 -05:00
parent a5f533d67a
commit c367f130cf
7 changed files with 822 additions and 0 deletions

View File

@@ -0,0 +1,17 @@
import Spa.Language.Base
import Spa.Language.Tagged.Id
import Spa.Language.Tagged.Derive
derive_tagged Spa.Expr Spa.BasicStmt Spa.Stmt
namespace Spa
def tagStmt (s : Stmt) : Stmt.Tagged NodeId := (s.tag 0).1
def Stmt.Tagged.subtreeIds (s : Stmt.Tagged NodeId) : List NodeId :=
s.foldTags (· :: ·) []
def Stmt.Tagged.isInLoopBody (body : Stmt.Tagged NodeId) (id : NodeId) : Bool :=
decide (id body.subtreeIds)
end Spa

View File

@@ -0,0 +1,417 @@
# Descendant tracking (parked)
This is the formally-verified **interval-labeling / descendant** machinery that
used to live in `Id.lean` and `Properties.lean`. It let you decide "is node `a`
a descendant of node `b`?" with two integer comparisons on their identifiers,
and *proved* that numeric test equivalent to structural subtree containment.
It was removed because the descendant test is a *computational optimization*:
the same question can be answered by walking the AST, and nothing in the current
pipeline needs the fast test yet. The proofs (a rose-tree flattening + a
postorder `Good` invariant) are a real mechanization cost to carry. Parked here
so it can be restored verbatim when LICM actually wants it.
## What stays in the live code
- `NodeId` collapses to a single unique index (`{ post : }`); `tag` still
assigns each node a distinct postorder number.
- The bidirectional mapping (`erase`/`tag` + `erase_tagStmt`) stays in
`Properties.lean`.
- The labelled-CFG id↔state mapping (`Cfg.lean`) is independent of this and is
unaffected.
## Revival checklist
1. In `Id.lean`, give `NodeId` back its descendant-count field and the test:
```lean
structure NodeId where
post :
desc : -- number of proper descendants (subtree size 1); leaf = 0
deriving DecidableEq, Repr
namespace NodeId
/-- Left endpoint of the node's postorder interval `[lo, post]`. -/
def lo (a : NodeId) : := a.post - a.desc
/-- `a` is a descendant-or-self of `b`: `a.post` lies in `b`'s interval. -/
def DescendantOf (a b : NodeId) : Prop := b.lo ≤ a.post ∧ a.post ≤ b.post
instance (a b : NodeId) : Decidable (DescendantOf a b) := by
unfold DescendantOf; infer_instance
end NodeId
```
2. In `Derive.lean`, make the generated `tag` store the descendant count again:
change the emitted identifier in `mkTag` from `(⟨$last⟩ : $nId)` back to
`(⟨$last, $last - n⟩ : $nId)`.
3. Paste the Lean block below back into `Properties.lean` (after the round-trip
theorems). It builds against the `id.lo = lo`-premise form of `Good` and the
childcount (`desc`) identifier. The headline result is
`descendant_iff_tagStmt`; everything else is supporting machinery.
## The parked proofs
```lean
/-- A rose tree of identifiers: the uniform shape underlying all three tagged
AST types, used to reason about the postorder labeling generically. -/
inductive IdTree where
| node (id : NodeId) (children : List IdTree)
namespace IdTree
def rootId : IdTree → NodeId
| .node id _ => id
@[simp] theorem rootId_node (id : NodeId) (cs : List IdTree) :
(IdTree.node id cs).rootId = id := rfl
mutual
def subtrees : IdTree → List IdTree
| .node id cs => .node id cs :: subtreesList cs
def subtreesList : List IdTree → List IdTree
| [] => []
| c :: cs => subtrees c ++ subtreesList cs
end
@[simp] theorem subtrees_node (id : NodeId) (cs : List IdTree) :
subtrees (.node id cs) = .node id cs :: subtreesList cs := rfl
@[simp] theorem subtreesList_nil : subtreesList [] = [] := rfl
@[simp] theorem subtreesList_cons (c : IdTree) (cs : List IdTree) :
subtreesList (c :: cs) = subtrees c ++ subtreesList cs := rfl
def posts (t : IdTree) : List := (subtrees t).map (fun s => s.rootId.post)
def postsList (cs : List IdTree) : List := (subtreesList cs).map (fun s => s.rootId.post)
@[simp] theorem posts_node (id : NodeId) (cs : List IdTree) :
posts (.node id cs) = id.post :: postsList cs := rfl
@[simp] theorem postsList_nil : postsList [] = [] := rfl
@[simp] theorem postsList_cons (c : IdTree) (cs : List IdTree) :
postsList (c :: cs) = posts c ++ postsList cs := by
simp [posts, postsList]
end IdTree
def Expr.Tagged.toIdTree : Expr.Tagged NodeId → IdTree
| .add t a b => .node t [a.toIdTree, b.toIdTree]
| .sub t a b => .node t [a.toIdTree, b.toIdTree]
| .var t _ => .node t []
| .num t _ => .node t []
def BasicStmt.Tagged.toIdTree : BasicStmt.Tagged NodeId → IdTree
| .assign t _ e => .node t [e.toIdTree]
| .noop t => .node t []
def Stmt.Tagged.toIdTree : Stmt.Tagged NodeId → IdTree
| .basic t bs => .node t [bs.toIdTree]
| .andThen t a b => .node t [a.toIdTree, b.toIdTree]
| .ifElse t e a b => .node t [e.toIdTree, a.toIdTree, b.toIdTree]
| .whileLoop t e s => .node t [e.toIdTree, s.toIdTree]
mutual
inductive Good : → IdTree → Prop
| mk {lo : } {id : NodeId} {cs : List IdTree} :
id.lo = lo → GoodChildren lo cs id.post →
Good lo (.node id cs)
inductive GoodChildren : → List IdTree → → Prop
| nil {pos : } : GoodChildren pos [] pos
| cons {cur : } {c : IdTree} {cs : List IdTree} {pos : } :
Good cur c → GoodChildren (c.rootId.post + 1) cs pos →
GoodChildren cur (c :: cs) pos
end
theorem Good.lo_le_post {lo : } {t : IdTree} (h : Good lo t) : lo ≤ t.rootId.post := by
cases h with
| mk hlo _ => simp only [NodeId.lo] at hlo; simp only [IdTree.rootId_node]; omega
theorem GoodChildren.cur_le_pos : ∀ {cur : } (cs : List IdTree) {pos : },
GoodChildren cur cs pos → cur ≤ pos
| _, [], _, h => by cases h; exact le_rfl
| _, c :: cs, _, h => by
cases h with
| cons hc hcs =>
have := hc.lo_le_post
have := GoodChildren.cur_le_pos cs hcs
omega
mutual
theorem Good.mem_posts : ∀ {lo : } (t : IdTree), Good lo t →
∀ x, x ∈ IdTree.posts t ↔ lo ≤ x ∧ x ≤ t.rootId.post
| _, .node id cs, h, x => by
cases h with
| mk hlo hch =>
simp only [IdTree.posts_node, List.mem_cons, IdTree.rootId_node]
rw [GoodChildren.mem_postsList cs hch x]
simp only [NodeId.lo] at hlo
omega
theorem GoodChildren.mem_postsList : ∀ {cur : } (cs : List IdTree) {pos : },
GoodChildren cur cs pos → ∀ x, x ∈ IdTree.postsList cs ↔ cur ≤ x ∧ x < pos
| _, [], _, h, x => by
cases h
simp only [IdTree.postsList_nil]
constructor
· intro hx; exact absurd hx (List.not_mem_nil x)
· rintro ⟨h1, h2⟩; exfalso; omega
| _, c :: cs, _, h, x => by
cases h with
| cons hc hcs =>
simp only [IdTree.postsList_cons, List.mem_append]
rw [Good.mem_posts c hc x, GoodChildren.mem_postsList cs hcs x]
have := hc.lo_le_post
have := GoodChildren.cur_le_pos cs hcs
omega
end
mutual
theorem Good.nodup_posts : ∀ {lo : } (t : IdTree), Good lo t → (IdTree.posts t).Nodup
| _, .node id cs, h => by
cases h with
| mk hlo hch =>
simp only [IdTree.posts_node, List.nodup_cons]
refine ⟨?_, GoodChildren.nodup_postsList cs hch⟩
intro hmem
rw [GoodChildren.mem_postsList cs hch id.post] at hmem
omega
theorem GoodChildren.nodup_postsList : ∀ {cur : } (cs : List IdTree) {pos : },
GoodChildren cur cs pos → (IdTree.postsList cs).Nodup
| _, [], _, h => by cases h; simp only [IdTree.postsList_nil, List.nodup_nil]
| _, c :: cs, _, h => by
cases h with
| cons hc hcs =>
simp only [IdTree.postsList_cons, List.nodup_append]
refine ⟨Good.nodup_posts c hc, GoodChildren.nodup_postsList cs hcs, ?_⟩
intro x hx1 hx2
rw [Good.mem_posts c hc x] at hx1
rw [GoodChildren.mem_postsList cs hcs x] at hx2
omega
end
mutual
theorem Good.subtree_good : ∀ {lo : } (t : IdTree), Good lo t →
∀ s ∈ IdTree.subtrees t, Good s.rootId.lo s
| _, .node id cs, h, s, hs => by
cases h with
| mk hlo hch =>
rw [IdTree.subtrees_node, List.mem_cons] at hs
rcases hs with rfl | hs
· simp only [IdTree.rootId_node]; rw [hlo]; exact Good.mk hlo hch
· exact GoodChildren.subtree_good cs hch s hs
theorem GoodChildren.subtree_good : ∀ {cur : } (cs : List IdTree) {pos : },
GoodChildren cur cs pos → ∀ s ∈ IdTree.subtreesList cs, Good s.rootId.lo s
| _, [], _, _, s, hs => by simp only [IdTree.subtreesList_nil, List.not_mem_nil] at hs
| _, c :: cs, _, h, s, hs => by
cases h with
| cons hc hcs =>
rw [IdTree.subtreesList_cons, List.mem_append] at hs
rcases hs with hs | hs
· exact Good.subtree_good c hc s hs
· exact GoodChildren.subtree_good cs hcs s hs
end
mutual
theorem IdTree.subtrees_subset : ∀ (t : IdTree) {b : IdTree},
b ∈ subtrees t → subtrees b ⊆ subtrees t
| .node id cs, b, hb => by
rw [subtrees_node, List.mem_cons] at hb
rcases hb with rfl | hb
· exact fun _ h => h
· intro x hx
rw [subtrees_node, List.mem_cons]
exact Or.inr (IdTree.subtreesList_subset cs hb hx)
theorem IdTree.subtreesList_subset : ∀ (cs : List IdTree) {b : IdTree},
b ∈ subtreesList cs → subtrees b ⊆ subtreesList cs
| [], b, hb => by simp only [subtreesList_nil, List.not_mem_nil] at hb
| c :: cs, b, hb => by
rw [subtreesList_cons, List.mem_append] at hb
intro x hx
rw [subtreesList_cons, List.mem_append]
rcases hb with hb | hb
· exact Or.inl (IdTree.subtrees_subset c hb hx)
· exact Or.inr (IdTree.subtreesList_subset cs hb hx)
end
theorem IdTree.eq_of_post_eq {l : List IdTree}
(h : (l.map (fun s => s.rootId.post)).Nodup) {a c : IdTree}
(ha : a ∈ l) (hc : c ∈ l) (hpost : a.rootId.post = c.rootId.post) : a = c := by
induction l with
| nil => exact absurd ha (List.not_mem_nil a)
| cons d ds ih =>
simp only [List.map_cons, List.nodup_cons] at h
obtain ⟨hd, htl⟩ := h
simp only [List.mem_cons] at ha hc
rcases ha with rfl | ha <;> rcases hc with rfl | hc
· rfl
· exfalso; apply hd; rw [hpost]; exact List.mem_map_of_mem _ hc
· exfalso; apply hd; rw [← hpost]; exact List.mem_map_of_mem _ ha
· exact ih htl ha hc
theorem descendant_iff_of_good {lo : } {t : IdTree} (hg : Good lo t)
{a b : IdTree} (ha : a ∈ IdTree.subtrees t) (hb : b ∈ IdTree.subtrees t) :
a.rootId.DescendantOf b.rootId ↔ a ∈ IdTree.subtrees b := by
have hgb : Good b.rootId.lo b := Good.subtree_good t hg b hb
constructor
· rintro ⟨h1, h2⟩
have hmem : a.rootId.post ∈ IdTree.posts b := by
rw [Good.mem_posts b hgb a.rootId.post]; exact ⟨h1, h2⟩
rw [IdTree.posts, List.mem_map] at hmem
obtain ⟨c, hc_mem, hc_post⟩ := hmem
have hc_t : c ∈ IdTree.subtrees t := IdTree.subtrees_subset t hb hc_mem
have hac : a = c :=
IdTree.eq_of_post_eq (hg.nodup_posts t) ha hc_t hc_post.symm
rw [hac]; exact hc_mem
· intro hsub
have hmem : a.rootId.post ∈ IdTree.posts b := by
rw [IdTree.posts, List.mem_map]; exact ⟨a, hsub, rfl⟩
rw [Good.mem_posts b hgb a.rootId.post] at hmem
exact hmem
/-! ### Tagging produces a good tree
We bridge from the `tag` traversal to the abstract `Good` invariant, by induction
on the plain AST. Each lemma also records that the returned counter is one past
the root's postorder index. -/
theorem Expr.tag_spec : ∀ (e : Expr) (n : ),
Good n (e.tag n).1.toIdTree ∧ (e.tag n).1.toIdTree.rootId.post + 1 = (e.tag n).2 := by
intro e
induction e with
| num k =>
intro n
refine ⟨?_, ?_⟩
· simp only [Expr.tag, Expr.Tagged.toIdTree]
exact Good.mk (by simp only [NodeId.lo]; omega) GoodChildren.nil
· simp only [Expr.tag, Expr.Tagged.toIdTree, IdTree.rootId_node]
| var x =>
intro n
refine ⟨?_, ?_⟩
· simp only [Expr.tag, Expr.Tagged.toIdTree]
exact Good.mk (by simp only [NodeId.lo]; omega) GoodChildren.nil
· simp only [Expr.tag, Expr.Tagged.toIdTree, IdTree.rootId_node]
| add a b iha ihb =>
intro n
obtain ⟨gA, pA⟩ := iha n
obtain ⟨gB, pB⟩ := ihb (a.tag n).2
have lA := gA.lo_le_post
have lB := gB.lo_le_post
refine ⟨?_, ?_⟩
· simp only [Expr.tag, Expr.Tagged.toIdTree]
refine Good.mk ?_ ?_
· simp only [NodeId.lo]; omega
· refine GoodChildren.cons gA ?_
rw [pA]; refine GoodChildren.cons gB ?_; rw [pB]; exact GoodChildren.nil
· simp only [Expr.tag, Expr.Tagged.toIdTree, IdTree.rootId_node]
| sub a b iha ihb =>
intro n
obtain ⟨gA, pA⟩ := iha n
obtain ⟨gB, pB⟩ := ihb (a.tag n).2
have lA := gA.lo_le_post
have lB := gB.lo_le_post
refine ⟨?_, ?_⟩
· simp only [Expr.tag, Expr.Tagged.toIdTree]
refine Good.mk ?_ ?_
· simp only [NodeId.lo]; omega
· refine GoodChildren.cons gA ?_
rw [pA]; refine GoodChildren.cons gB ?_; rw [pB]; exact GoodChildren.nil
· simp only [Expr.tag, Expr.Tagged.toIdTree, IdTree.rootId_node]
theorem BasicStmt.tag_spec : ∀ (bs : BasicStmt) (n : ),
Good n (bs.tag n).1.toIdTree ∧ (bs.tag n).1.toIdTree.rootId.post + 1 = (bs.tag n).2 := by
intro bs
cases bs with
| noop =>
intro n
refine ⟨?_, ?_⟩
· simp only [BasicStmt.tag, BasicStmt.Tagged.toIdTree]
exact Good.mk (by simp only [NodeId.lo]; omega) GoodChildren.nil
· simp only [BasicStmt.tag, BasicStmt.Tagged.toIdTree, IdTree.rootId_node]
| assign x e =>
intro n
obtain ⟨gE, pE⟩ := Expr.tag_spec e n
have lE := gE.lo_le_post
refine ⟨?_, ?_⟩
· simp only [BasicStmt.tag, BasicStmt.Tagged.toIdTree]
refine Good.mk ?_ ?_
· simp only [NodeId.lo]; omega
· refine GoodChildren.cons gE ?_
rw [pE]; exact GoodChildren.nil
· simp only [BasicStmt.tag, BasicStmt.Tagged.toIdTree, IdTree.rootId_node]
theorem Stmt.tag_spec : ∀ (s : Stmt) (n : ),
Good n (s.tag n).1.toIdTree ∧ (s.tag n).1.toIdTree.rootId.post + 1 = (s.tag n).2 := by
intro s
induction s with
| basic bs =>
intro n
obtain ⟨gBs, pBs⟩ := BasicStmt.tag_spec bs n
have lBs := gBs.lo_le_post
refine ⟨?_, ?_⟩
· simp only [Stmt.tag, Stmt.Tagged.toIdTree]
refine Good.mk ?_ ?_
· simp only [NodeId.lo]; omega
· refine GoodChildren.cons gBs ?_
rw [pBs]; exact GoodChildren.nil
· simp only [Stmt.tag, Stmt.Tagged.toIdTree, IdTree.rootId_node]
| andThen a b iha ihb =>
intro n
obtain ⟨gA, pA⟩ := iha n
obtain ⟨gB, pB⟩ := ihb (a.tag n).2
have lA := gA.lo_le_post
have lB := gB.lo_le_post
refine ⟨?_, ?_⟩
· simp only [Stmt.tag, Stmt.Tagged.toIdTree]
refine Good.mk ?_ ?_
· simp only [NodeId.lo]; omega
· refine GoodChildren.cons gA ?_
rw [pA]; refine GoodChildren.cons gB ?_; rw [pB]; exact GoodChildren.nil
· simp only [Stmt.tag, Stmt.Tagged.toIdTree, IdTree.rootId_node]
| ifElse e a b iha ihb =>
intro n
obtain ⟨gE, pE⟩ := Expr.tag_spec e n
obtain ⟨gA, pA⟩ := iha (e.tag n).2
obtain ⟨gB, pB⟩ := ihb (a.tag (e.tag n).2).2
have lE := gE.lo_le_post
have lA := gA.lo_le_post
have lB := gB.lo_le_post
refine ⟨?_, ?_⟩
· simp only [Stmt.tag, Stmt.Tagged.toIdTree]
refine Good.mk ?_ ?_
· simp only [NodeId.lo]; omega
· refine GoodChildren.cons gE ?_
rw [pE]; refine GoodChildren.cons gA ?_
rw [pA]; refine GoodChildren.cons gB ?_; rw [pB]; exact GoodChildren.nil
· simp only [Stmt.tag, Stmt.Tagged.toIdTree, IdTree.rootId_node]
| whileLoop e s ih =>
intro n
obtain ⟨gE, pE⟩ := Expr.tag_spec e n
obtain ⟨gS, pS⟩ := ih (e.tag n).2
have lE := gE.lo_le_post
have lS := gS.lo_le_post
refine ⟨?_, ?_⟩
· simp only [Stmt.tag, Stmt.Tagged.toIdTree]
refine Good.mk ?_ ?_
· simp only [NodeId.lo]; omega
· refine GoodChildren.cons gE ?_
rw [pE]; refine GoodChildren.cons gS ?_; rw [pS]; exact GoodChildren.nil
· simp only [Stmt.tag, Stmt.Tagged.toIdTree, IdTree.rootId_node]
/-- A freshly tagged program is a well-tagged tree (rooted at postorder start `0`). -/
theorem good_tagStmt (s : Stmt) : Good 0 (tagStmt s).toIdTree :=
(Stmt.tag_spec s 0).1
/-- **Descendant characterization.** The numeric `NodeId.DescendantOf` relation on
two nodes of a tagged program holds exactly when one is structurally contained in
the other's subtree. -/
theorem descendant_iff_tagStmt (s : Stmt) {a b : IdTree}
(ha : a ∈ IdTree.subtrees (tagStmt s).toIdTree)
(hb : b ∈ IdTree.subtrees (tagStmt s).toIdTree) :
a.rootId.DescendantOf b.rootId ↔ a ∈ IdTree.subtrees b :=
descendant_iff_of_good (good_tagStmt s) ha hb
```

View File

@@ -0,0 +1,241 @@
import Lean
import Mathlib.Tactic.DeriveTraversable
import Spa.Language.Base
import Spa.Language.Tagged.Id
/-!
# The `derive_tagged` command
`derive_tagged T₁ T₂ … Tₙ` takes a family of (possibly mutually recursive)
inductive types and generates, for each `Tᵢ`:
* a *tagged* mirror inductive `Tᵢ.Tagged (τ : Type)`, in which every constructor
carries a leading `tag : τ` field and every field whose type is a family
member is retyped to its `.Tagged τ` counterpart;
* `Tᵢ.Tagged.erase : Tᵢ.Tagged τ → Tᵢ`, forgetting all tags;
* `Tᵢ.tag : Tᵢ → → Tᵢ.Tagged NodeId × `, assigning every node a unique
`NodeId` (its postorder index) by a single unified traversal that threads a
counter; the whole family shares one counter, so identifiers are unique across
types.
The generated declarations have exactly the shape of the hand-written reference;
see `Spa/Language/Tagged/Basic.lean` (which invokes this command) and the proofs
in `Spa/Language/Tagged/Properties.lean`.
Scope: the generator handles non-indexed inductives whose constructor fields are
either scalars or *direct* references to a family member (which covers the object
language). Nested occurrences such as `List Tᵢ` are not supported.
-/
open Lean Elab Command Meta
namespace Spa.DeriveTagged
/-- One constructor field, classified as a recursive family reference or a scalar
(whose type syntax we keep verbatim for the mirror inductive). -/
structure FieldData where
isRec : Bool
recType : Name
typeStx : Term
/-- A constructor: its original (full) name, short name, and fields. -/
structure CtorData where
origName : Name
shortName : Name
fields : Array FieldData
/-- A family member together with its constructors. -/
structure TypeData where
name : Name
ctors : Array CtorData
def taggedOf (n : Name) : Name := n ++ `Tagged
def eraseOf (n : Name) : Name := n ++ `Tagged ++ `erase
def rootTagOf (n : Name) : Name := n ++ `Tagged ++ `rootTag
def tagOf (n : Name) : Name := n ++ `tag
def foldTagsOf (n : Name) : Name := n ++ `Tagged ++ `foldTags
/-- Inspect the family, classifying each constructor field. -/
def gather (family : Array Name) (τ : Ident) : TermElabM (Array TypeData) := do
let famSet : NameSet := family.foldl (·.insert ·) {}
family.mapM fun tn => do
let iv getConstInfoInduct tn
let ctors iv.ctors.toArray.mapM fun cn => do
let cv getConstInfoCtor cn
let fields forallTelescopeReducing cv.type fun args _ => do
let fieldArgs := args.extract iv.numParams args.size
fieldArgs.mapM fun a => do
let ty inferType a
match ty.getAppFn.constName? with
| some hn =>
if famSet.contains hn then
return { isRec := true, recType := hn, typeStx := `($(mkIdent (taggedOf hn)) $τ) }
else
return { isRec := false, recType := default, typeStx := Lean.PrettyPrinter.delab ty }
| none =>
return { isRec := false, recType := default, typeStx := Lean.PrettyPrinter.delab ty }
return { origName := cn, shortName := cn.componentsRev.head!, fields }
return { name := tn, ctors }
/-- The arrow type `τ → <fields…> → Self τ` of a tagged constructor. -/
def ctorArrow (cd : CtorData) (self : Term) (τ : Ident) : TermElabM Term := do
let mut t := self
for f in cd.fields.reverse do
t `($(f.typeStx) $t)
`($τ $t)
/-- The tagged mirror inductives, one per family member. The family is a DAG
(`Expr ← BasicStmt ← Stmt`), not genuinely mutual, so they are emitted as
separate inductives in dependency order rather than a `mutual` block.
`Functor`/`Traversable` instances are derived separately by `mkDeriveInstances`
below rather than via an inline `deriving` clause. -/
def mkInductives (tds : Array TypeData) (τ : Ident) :
CommandElabM (Array (TSyntax `command)) := do
tds.mapM fun td => do
let self `($(mkIdent (taggedOf td.name)) $τ)
let ctors td.ctors.mapM fun cd => do
let aty Command.liftTermElabM (ctorArrow cd self τ)
`(Lean.Parser.Command.ctor| | $(mkIdent cd.shortName):ident : $aty)
`(command| inductive $(mkIdent (taggedOf td.name)):ident ($τ : Type) where $ctors*)
/-- A `deriving instance Functor, Traversable for Tᵢ.Tagged` command per family
member. Since every tagged type is a single-parameter, direct-recursive
inductive in `τ`, Mathlib's deriving handler produces clean (`sorry`-free)
instances, giving `map`, `traverse`, and the `Traversable.foldr`/`toList` folds
for free.
These are emitted as *separate* commands in dependency order (rather than an
inline `deriving` clause on each inductive) for two reasons: deriving
`Stmt.Tagged` needs the `Expr.Tagged`/`BasicStmt.Tagged` instances already in
scope, and — because every member's type name ends in `.Tagged` — the handler's
auto-generated instance name (`instFunctorTagged`, built from the type's last
component) collides across the family unless each derive sees the environment
the previous one updated; separate commands give it that, so the names
disambiguate to `instFunctorTagged`, `instFunctorTagged_1`, ….
The hand-written `foldTags` is retained alongside these: it is a
structural-recursion fold that `simp`/`decide` reduce cleanly, unlike the
abstract `Traversable.foldr` (defined via the `FreeMonoid`/`Const` applicative),
which reduces under `decide`/`rfl` but not naive `simp` unfolding. -/
def mkDeriveInstances (tds : Array TypeData) : CommandElabM (Array (TSyntax `command)) := do
tds.mapM fun td =>
`(command| deriving instance Functor, Traversable for $(mkIdent (taggedOf td.name)))
/-- The `erase` functions, one per family member (separate defs in dependency
order — each calls only already-defined lower members). -/
def mkErase (tds : Array TypeData) : CommandElabM (Array (TSyntax `command)) := do
tds.mapM fun td => do
let mut pats : Array Term := #[]
let mut rhss : Array Term := #[]
for cd in td.ctors do
let argNames := (Array.range cd.fields.size).map (fun i => mkIdent (.mkSimple s!"a{i}"))
let pat `($(mkIdent (taggedOf td.name ++ cd.shortName)) _ $argNames*)
let eraseArgs (cd.fields.zip argNames).mapM fun (f, a) =>
if f.isRec then `($(mkIdent (eraseOf f.recType)) $a) else pure a
let rhs `($(mkIdent cd.origName) $eraseArgs*)
pats := pats.push pat
rhss := rhss.push rhs
`(command| def $(mkIdent (eraseOf td.name)) {τ : Type} :
$(mkIdent (taggedOf td.name)) τ $(mkIdent td.name) :=
fun x => match x with $[| $pats => $rhss]*)
/-- The `rootTag` accessors (one non-recursive `def` per type). -/
def mkRootTag (tds : Array TypeData) : CommandElabM (Array (TSyntax `command)) := do
let tIdent := mkIdent `t
tds.mapM fun td => do
let mut pats : Array Term := #[]
let mut rhss : Array Term := #[]
for cd in td.ctors do
let hole `(_)
let wilds := Array.mkArray cd.fields.size hole
pats := pats.push ( `($(mkIdent (taggedOf td.name ++ cd.shortName)) $tIdent $wilds*))
rhss := rhss.push tIdent
`(command| def $(mkIdent (rootTagOf td.name)) {τ : Type} :
$(mkIdent (taggedOf td.name)) τ τ :=
fun x => match x with $[| $pats => $rhss]*)
/-- The postorder `tag` functions, one per family member (separate defs in
dependency order). -/
def mkTag (tds : Array TypeData) : CommandElabM (Array (TSyntax `command)) := do
let nId := mkIdent ``Spa.NodeId
tds.mapM fun td => do
let mut pats : Array Term := #[]
let mut rhss : Array Term := #[]
for cd in td.ctors do
let argNames := (Array.range cd.fields.size).map (fun i => mkIdent (.mkSimple s!"a{i}"))
let pat `($(mkIdent cd.origName) $argNames*)
let mut cur : Term `(n)
let mut lets : Array (Ident × Term) := #[]
let mut taggedArgs : Array Term := #[]
let mut ri := 0
for (f, a) in cd.fields.zip argNames do
if f.isRec then
let rName := mkIdent (.mkSimple s!"r{ri}")
let rhsCall `($(mkIdent (tagOf f.recType)) $a $cur)
lets := lets.push (rName, rhsCall)
taggedArgs := taggedArgs.push ( `($rName |>.1))
cur `($rName |>.2)
ri := ri + 1
else
taggedArgs := taggedArgs.push a
let last := cur
let tagged `($(mkIdent (taggedOf td.name ++ cd.shortName))
($last : $nId) $taggedArgs*)
let mut body `(($tagged, $last + 1))
for (rName, rhs) in lets.reverse do
body `(let $rName := $rhs; $body)
pats := pats.push pat
rhss := rhss.push body
`(command| def $(mkIdent (tagOf td.name)) :
$(mkIdent td.name) Nat $(mkIdent (taggedOf td.name)) $nId × Nat :=
fun e n => match e with $[| $pats => $rhss]*)
/-- The tag-fold functions: `foldTags f acc t` applies `f` to every tag in `t`,
right-to-left, threading `acc`. This is the `Foldable`/`foldr`-over-tags the
hand-written collectors (e.g. `subtreeIds`) reduce to. One separate def per
family member (the family is a DAG, so no `mutual` block is needed). -/
def mkFoldTags (tds : Array TypeData) : CommandElabM (Array (TSyntax `command)) := do
let τ := mkIdent
let m := mkIdent `M
let fId := mkIdent `f
let accId := mkIdent `acc
let tagId := mkIdent `t
tds.mapM fun td => do
let mut pats : Array Term := #[]
let mut rhss : Array Term := #[]
for cd in td.ctors do
let argNames := (Array.range cd.fields.size).map (fun i => mkIdent (.mkSimple s!"a{i}"))
let pat `($(mkIdent (taggedOf td.name ++ cd.shortName)) $tagId $argNames*)
let mut body : Term := accId
for (fld, a) in (cd.fields.zip argNames).reverse do
if fld.isRec then
body `($(mkIdent (foldTagsOf fld.recType)) $fId $body $a)
body `($fId $tagId $body)
pats := pats.push pat
rhss := rhss.push body
`(command| def $(mkIdent (foldTagsOf td.name)) {$τ:ident : Type} {$m:ident : Type}
($fId : $τ $m $m) ($accId : $m) :
$(mkIdent (taggedOf td.name)) $τ $m :=
fun x => match x with $[| $pats => $rhss]*)
/-- `derive_tagged T₁ … Tₙ` — generate tagged mirrors, `erase`, and `tag` for the
given family of inductives. -/
syntax (name := deriveTaggedCmd) "derive_tagged " ident+ : command
@[command_elab deriveTaggedCmd]
def elabDeriveTagged : CommandElab := fun stx => do
match stx with
| `(derive_tagged $ids*) =>
let family ids.mapM fun i => Command.liftCoreM (realizeGlobalConstNoOverload i)
let τ := mkIdent
let tds Command.liftTermElabM (gather family τ)
for d in ( mkInductives tds τ) do elabCommand d
for d in ( mkDeriveInstances tds) do elabCommand d
for d in ( mkRootTag tds) do elabCommand d
for d in ( mkErase tds) do elabCommand d
for d in ( mkTag tds) do elabCommand d
for d in ( mkFoldTags tds) do elabCommand d
| _ => throwUnsupportedSyntax
end Spa.DeriveTagged

View File

@@ -0,0 +1,63 @@
import Spa.Language
import Spa.Language.Graphs
import Spa.Language.Tagged.Basic
import Spa.Language.Tagged.Properties
namespace Spa
open GGraph
def Stmt.Tagged.cfg : Stmt.Tagged NodeId GGraph (List (BasicStmt.Tagged NodeId))
| .basic _ bs => GGraph.singleton [bs]
| .andThen _ s₁ s₂ => s₁.cfg s₂.cfg
| .ifElse _ _ s₁ s₂ => s₁.cfg s₂.cfg
| .whileLoop _ _ s => GGraph.loop s.cfg
theorem Stmt.Tagged.cfg_graph : (t : Stmt.Tagged NodeId),
t.cfg.map (List.map BasicStmt.Tagged.erase) = t.erase.cfg
| .basic _ bs => by simp [Stmt.Tagged.cfg, Stmt.cfg, Stmt.Tagged.erase, BasicStmt.Tagged.erase]
| .andThen _ s₁ s₂ => by
simp [Stmt.Tagged.cfg, Stmt.cfg, Stmt.Tagged.erase, Stmt.Tagged.cfg_graph s₁, Stmt.Tagged.cfg_graph s₂]
| .ifElse _ _ s₁ s₂ => by
simp [Stmt.Tagged.cfg, Stmt.cfg, Stmt.Tagged.erase, Stmt.Tagged.cfg_graph s₁, Stmt.Tagged.cfg_graph s₂]
| .whileLoop _ _ s => by
simp [Stmt.Tagged.cfg, Stmt.cfg, Stmt.Tagged.erase, Stmt.Tagged.cfg_graph s]
def GGraph.nodeLabel (g : GGraph (List (BasicStmt.Tagged NodeId))) (i : g.Index) : Option NodeId :=
(g.nodes i).head?.map BasicStmt.Tagged.rootTag
def GGraph.stateOf (g : GGraph (List (BasicStmt.Tagged NodeId))) (id : NodeId) : Option g.Index :=
g.indices.find? (fun i => decide (g.nodeLabel i = some id))
theorem GGraph.stateOf_label {g : GGraph (List (BasicStmt.Tagged NodeId))} {id : NodeId}
{i : g.Index} (h : g.stateOf id = some i) : g.nodeLabel i = some id := by
rw [GGraph.stateOf] at h
simpa using List.find?_some h
namespace Program
variable (p : Program)
def tagged : Stmt.Tagged NodeId := tagStmt p.rootStmt
def taggedCfg : GGraph (List (BasicStmt.Tagged NodeId)) :=
GGraph.wrap p.tagged.cfg
theorem taggedCfg_erase :
p.taggedCfg.map (List.map BasicStmt.Tagged.erase) = p.cfg := by
rw [taggedCfg, GGraph.map_wrap, Stmt.Tagged.cfg_graph, tagged, erase_tagStmt]
rfl
theorem taggedCfg_size : p.taggedCfg.size = p.cfg.size := by
conv_rhs => rw [ p.taggedCfg_erase]
rfl
def nodeIdOf (s : p.State) : Option NodeId :=
p.taggedCfg.nodeLabel (Fin.cast p.taggedCfg_size.symm s)
def stateOfNodeId (id : NodeId) : Option p.State :=
(p.taggedCfg.stateOf id).map (Fin.cast p.taggedCfg_size)
end Program
end Spa

View File

@@ -0,0 +1,9 @@
import Mathlib.Data.Nat.Notation
namespace Spa
structure NodeId where
post :
deriving DecidableEq, Repr
end Spa

View File

@@ -0,0 +1,29 @@
import Spa.Language.Tagged.Basic
namespace Spa
@[simp] theorem Expr.erase_tag (e : Expr) (n : ) : (e.tag n).1.erase = e := by
induction e generalizing n with
| add a b iha ihb => simp [Expr.tag, Expr.Tagged.erase, iha, ihb]
| sub a b iha ihb => simp [Expr.tag, Expr.Tagged.erase, iha, ihb]
| var x => simp [Expr.tag, Expr.Tagged.erase]
| num k => simp [Expr.tag, Expr.Tagged.erase]
@[simp] theorem BasicStmt.erase_tag (bs : BasicStmt) (n : ) :
(bs.tag n).1.erase = bs := by
cases bs with
| assign x e => simp [BasicStmt.tag, BasicStmt.Tagged.erase]
| noop => simp [BasicStmt.tag, BasicStmt.Tagged.erase]
@[simp] theorem Stmt.erase_tag (s : Stmt) (n : ) : (s.tag n).1.erase = s := by
induction s generalizing n with
| basic bs => simp [Stmt.tag, Stmt.Tagged.erase]
| andThen a b iha ihb => simp [Stmt.tag, Stmt.Tagged.erase, iha, ihb]
| ifElse e a b iha ihb => simp [Stmt.tag, Stmt.Tagged.erase, iha, ihb]
| whileLoop e s ih => simp [Stmt.tag, Stmt.Tagged.erase, ih]
/-- Erasing a freshly tagged program recovers it. -/
theorem erase_tagStmt (s : Stmt) : (tagStmt s).erase = s := by
simp [tagStmt]
end Spa

View File

@@ -0,0 +1,46 @@
# Tagged AST — follow-ups
## Descendant tracking — parked
The interval-labeling descendant test and its correctness proof
(`descendant_iff_tagStmt` and supporting rose-tree/`Good` machinery) have been
removed from the live code and parked in `DESCENDANT-TRACKING.md`, with a revival
checklist. It's a computational optimization not yet needed; revive it (and the
`NodeId.desc` field) when LICM wants fast ancestor queries.
## ID → CFG-state mapping — plan part B — DONE
`Graphs.lean` now defines a payload-generic `GGraph α` (with `Graph := GGraph
(List BasicStmt)` as the concrete CFG), so the labelled CFG **reuses** the graph
combinators instead of mirroring them. In `Cfg.lean`:
`buildCfgL : Stmt.Tagged NodeId → GGraph (List (BasicStmt.Tagged NodeId))` is just
`buildCfg` at the tagged payload; `buildCfgL_graph :
(buildCfgL t).map (List.map erase) = buildCfg t.erase` connects it to the real
CFG; and `GGraph.nodeLabel`/`GGraph.stateOf` read a node's id straight from its
payload (`stateOf_label` is the soundness). No `LGraph`, no separate `label`
field, no duplicated combinators.
## ID → CFG-state mapping — totality — DONE
The `Option`-valued `nodeIdOf`/`stateOfNodeId` are now proven total on the inputs
that matter (`Graphs.lean`), via a payload-list characterization of the CFG:
- `GGraph.nodeList` flattens `nodes` into the list of payloads, with combinator
lemmas (`nodeList_comp/link/loop/wrap`) reducing it through the CFG builders.
- `Stmt.Tagged.basics` lists a program's basic statements; the master lemma
`Stmt.Tagged.cfg_nodeList_filter` (and its program-level
`taggedCfg_nodeList_filter`) shows the non-empty CFG nodes are *exactly* the
singletons `[bs]` for `bs ∈ basics`.
- AST ⇒ CFG: `exists_state_of_mem_basics` (a state with payload `[bs]`) and
`stateOfNodeId_isSome` (the search succeeds).
- CFG ⇒ AST: `exists_basic_of_code_ne_nil` (a non-empty node is `[bs]`, with
`code = [bs.erase]` and `nodeIdOf = some bs.rootTag`) and `nodeIdOf_isSome`.
All `propext`/`Quot.sound`-only (no `sorry`, no choice).
Remaining nice-to-have:
- Injectivity: distinct basic-statement ids map to distinct states, giving a
two-sided id ↔ state correspondence (upgrading the existence results above to a
genuine bijection, and pinning `stateOfNodeId (bs.rootTag)` to *the* state
holding `bs`). The `tag`-uniqueness fact this needs (`Nodup` of postorder tags)
was part of the parked descendant machinery in `DESCENDANT-TRACKING.md`.