Add tagging machinery to assign unique IDs to AST nodes

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-25 14:29:37 -05:00
parent a5f533d67a
commit c367f130cf
7 changed files with 822 additions and 0 deletions
--- a/lean/Spa/Language/Tagged/Basic.lean
+++ b/lean/Spa/Language/Tagged/Basic.lean
@@ -0,0 +1,17 @@
+import Spa.Language.Base
+import Spa.Language.Tagged.Id
+import Spa.Language.Tagged.Derive
+
+derive_tagged Spa.Expr Spa.BasicStmt Spa.Stmt
+
+namespace Spa
+
+def tagStmt (s : Stmt) : Stmt.Tagged NodeId := (s.tag 0).1
+
+def Stmt.Tagged.subtreeIds (s : Stmt.Tagged NodeId) : List NodeId :=
+  s.foldTags (· :: ·) []
+
+def Stmt.Tagged.isInLoopBody (body : Stmt.Tagged NodeId) (id : NodeId) : Bool :=
+  decide (id ∈ body.subtreeIds)
+
+end Spa
--- a/lean/Spa/Language/Tagged/DESCENDANT-TRACKING.md
+++ b/lean/Spa/Language/Tagged/DESCENDANT-TRACKING.md
@@ -0,0 +1,417 @@
+# Descendant tracking (parked)
+
+This is the formally verified **interval-labeling / descendant** machinery that
+used to live in `Id.lean` and `Properties.lean`. It let you decide "is node `a`
+a descendant of node `b`?" with two integer comparisons on their identifiers,
+and *proved* that numeric test equivalent to structural subtree containment.
+
+It was removed because the descendant test is a *computational optimization*:
+the same question can be answered by walking the AST, and nothing in the current
+pipeline needs the fast test yet. The proofs (a rose-tree flattening plus a
+postorder `Good` invariant) are a real mechanization cost to carry. They are
+parked here so they can be restored verbatim when LICM (loop-invariant code
+motion) actually wants them.
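+
+As a small illustration of the interval test, the sketch below hand-labels the
+tree of `x + (y - 1)` in postorder and checks a few descendant queries. It is a
+standalone toy: `Demo.DemoId` and `isDescendantOf` are hypothetical stand-ins
+(plain `Nat` fields and a `Bool`-valued test, so no imports are needed), not the
+parked definitions themselves.
+
+```lean
+namespace Demo
+
+-- Hypothetical stand-in for the parked two-field `NodeId`.
+structure DemoId where
+  post : Nat   -- postorder index of the node
+  desc : Nat   -- number of proper descendants (leaf = 0)
+
+-- Left endpoint of the node's postorder interval `[lo, post]`.
+def DemoId.lo (a : DemoId) : Nat := a.post - a.desc
+
+-- The two integer comparisons: `a.post` lies inside `b`'s interval.
+def DemoId.isDescendantOf (a b : DemoId) : Bool :=
+  decide (b.lo ≤ a.post) && decide (a.post ≤ b.post)
+
+-- Postorder labels for `x + (y - 1)`, assigned by hand:
+--   x = 0, y = 1, the literal 1 = 2, `(y - 1)` = 3, the root `+` = 4.
+def x     : DemoId := ⟨0, 0⟩
+def y     : DemoId := ⟨1, 0⟩
+def inner : DemoId := ⟨3, 2⟩   -- interval [1, 3]
+def root  : DemoId := ⟨4, 4⟩   -- interval [0, 4]
+
+example : y.isDescendantOf inner = true := by decide      -- 1 ∈ [1, 3]
+example : x.isDescendantOf inner = false := by decide     -- 0 ∉ [1, 3]
+example : inner.isDescendantOf root = true := by decide   -- 3 ∈ [0, 4]
+example : root.isDescendantOf inner = false := by decide  -- 4 ∉ [1, 3]
+
+end Demo
+```
+
+The parked proofs establish that, on a freshly tagged program, this numeric
+check agrees with actual subtree membership; the toy above only spot-checks it
+on one tree.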
+
+## What stays in the live code
+
+- `NodeId` collapses to a single unique index (`{ post : ℕ }`); `tag` still
+  assigns each node a distinct postorder number.
+- The bidirectional mapping (`erase`/`tag` + `erase_tagStmt`) stays in
+  `Properties.lean`.
+- The labelled-CFG id↔state mapping (`Tagged/Graphs.lean`) is independent of
+  this and is unaffected.
+
+## Revival checklist
+
+1. In `Id.lean`, give `NodeId` back its descendant-count field and the test:
+
+   ```lean
+   structure NodeId where
+     post : ℕ
+     desc : ℕ          -- number of proper descendants (subtree size − 1); leaf = 0
+     deriving DecidableEq, Repr
+
+   namespace NodeId
+
+   /-- Left endpoint of the node's postorder interval `[lo, post]`. -/
+   def lo (a : NodeId) : ℕ := a.post - a.desc
+
+   /-- `a` is a descendant-or-self of `b`: `a.post` lies in `b`'s interval. -/
+   def DescendantOf (a b : NodeId) : Prop := b.lo ≤ a.post ∧ a.post ≤ b.post
+
+   instance (a b : NodeId) : Decidable (DescendantOf a b) := by
+     unfold DescendantOf; infer_instance
+
+   end NodeId
+   ```
+
+2. In `Derive.lean`, make the generated `tag` store the descendant count again:
+   change the emitted identifier in `mkTag` from `(⟨$last⟩ : $nId)` back to
+   `(⟨$last, $last - n⟩ : $nId)`.
+
+3. Paste the Lean block below back into `Properties.lean` (after the round-trip
+   theorems). It builds against the `id.lo = lo`-premise form of `Good` and the
+   descendant-count (`desc`) identifier. The headline result is
+   `descendant_iff_tagStmt`; everything else is supporting machinery.
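+
+To make step 2 concrete, here is a hand-written sketch of what a revived `tag`
+would compute, on a hypothetical two-constructor expression type. `DemoId`,
+`MiniExpr`, and `MiniTagged` are illustration-only names (the real functions are
+emitted by `mkTag`); the point is only the `⟨last, last - n⟩` identifier, where
+`n` is the counter on entry and `last` is the counter after all children.
+
+```lean
+-- Hypothetical stand-in for the two-field `NodeId`.
+structure DemoId where
+  post : Nat
+  desc : Nat
+  deriving Repr
+
+inductive MiniExpr where
+  | num (k : Nat)
+  | add (a b : MiniExpr)
+
+inductive MiniTagged where
+  | num (t : DemoId) (k : Nat)
+  | add (t : DemoId) (a b : MiniTagged)
+  deriving Repr
+
+-- Postorder tagging that threads a counter and records the descendant count.
+def MiniExpr.tag : MiniExpr → Nat → MiniTagged × Nat
+  | .num k, n => (.num ⟨n, n - n⟩ k, n + 1)          -- leaf: desc = 0
+  | .add a b, n =>
+      let r0 := a.tag n
+      let r1 := b.tag r0.2
+      -- `r1.2` is this node's postorder index; everything numbered in
+      -- `[n, r1.2)` is a proper descendant, so `desc = r1.2 - n`.
+      (.add ⟨r1.2, r1.2 - n⟩ r0.1 r1.1, r1.2 + 1)
+
+-- By hand: the leaves get posts 0, 1, 2; the inner `add` gets ⟨3, 2⟩
+-- (interval [1, 3]); the root gets ⟨4, 4⟩ (interval [0, 4]).
+#eval (MiniExpr.tag (.add (.num 1) (.add (.num 2) (.num 3))) 0).1
+```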
+
+## The parked proofs
+
+```lean
+/-- A rose tree of identifiers: the uniform shape underlying all three tagged
+AST types, used to reason about the postorder labeling generically. -/
+inductive IdTree where
+  | node (id : NodeId) (children : List IdTree)
+
+namespace IdTree
+
+def rootId : IdTree → NodeId
+  | .node id _ => id
+
+@[simp] theorem rootId_node (id : NodeId) (cs : List IdTree) :
+    (IdTree.node id cs).rootId = id := rfl
+
+mutual
+def subtrees : IdTree → List IdTree
+  | .node id cs => .node id cs :: subtreesList cs
+def subtreesList : List IdTree → List IdTree
+  | [] => []
+  | c :: cs => subtrees c ++ subtreesList cs
+end
+
+@[simp] theorem subtrees_node (id : NodeId) (cs : List IdTree) :
+    subtrees (.node id cs) = .node id cs :: subtreesList cs := rfl
+
+@[simp] theorem subtreesList_nil : subtreesList [] = [] := rfl
+
+@[simp] theorem subtreesList_cons (c : IdTree) (cs : List IdTree) :
+    subtreesList (c :: cs) = subtrees c ++ subtreesList cs := rfl
+
+def posts (t : IdTree) : List ℕ := (subtrees t).map (fun s => s.rootId.post)
+
+def postsList (cs : List IdTree) : List ℕ := (subtreesList cs).map (fun s => s.rootId.post)
+
+@[simp] theorem posts_node (id : NodeId) (cs : List IdTree) :
+    posts (.node id cs) = id.post :: postsList cs := rfl
+
+@[simp] theorem postsList_nil : postsList [] = [] := rfl
+
+@[simp] theorem postsList_cons (c : IdTree) (cs : List IdTree) :
+    postsList (c :: cs) = posts c ++ postsList cs := by
+  simp [posts, postsList]
+
+end IdTree
+
+def Expr.Tagged.toIdTree : Expr.Tagged NodeId → IdTree
+  | .add t a b => .node t [a.toIdTree, b.toIdTree]
+  | .sub t a b => .node t [a.toIdTree, b.toIdTree]
+  | .var t _ => .node t []
+  | .num t _ => .node t []
+
+def BasicStmt.Tagged.toIdTree : BasicStmt.Tagged NodeId → IdTree
+  | .assign t _ e => .node t [e.toIdTree]
+  | .noop t => .node t []
+
+def Stmt.Tagged.toIdTree : Stmt.Tagged NodeId → IdTree
+  | .basic t bs => .node t [bs.toIdTree]
+  | .andThen t a b => .node t [a.toIdTree, b.toIdTree]
+  | .ifElse t e a b => .node t [e.toIdTree, a.toIdTree, b.toIdTree]
+  | .whileLoop t e s => .node t [e.toIdTree, s.toIdTree]
+
+mutual
+inductive Good : ℕ → IdTree → Prop
+  | mk {lo : ℕ} {id : NodeId} {cs : List IdTree} :
+      id.lo = lo → GoodChildren lo cs id.post →
+      Good lo (.node id cs)
+inductive GoodChildren : ℕ → List IdTree → ℕ → Prop
+  | nil {pos : ℕ} : GoodChildren pos [] pos
+  | cons {cur : ℕ} {c : IdTree} {cs : List IdTree} {pos : ℕ} :
+      Good cur c → GoodChildren (c.rootId.post + 1) cs pos →
+      GoodChildren cur (c :: cs) pos
+end
+
+theorem Good.lo_le_post {lo : ℕ} {t : IdTree} (h : Good lo t) : lo ≤ t.rootId.post := by
+  cases h with
+  | mk hlo _ => simp only [NodeId.lo] at hlo; simp only [IdTree.rootId_node]; omega
+
+theorem GoodChildren.cur_le_pos : ∀ {cur : ℕ} (cs : List IdTree) {pos : ℕ},
+    GoodChildren cur cs pos → cur ≤ pos
+  | _, [], _, h => by cases h; exact le_rfl
+  | _, c :: cs, _, h => by
+      cases h with
+      | cons hc hcs =>
+        have := hc.lo_le_post
+        have := GoodChildren.cur_le_pos cs hcs
+        omega
+
+mutual
+theorem Good.mem_posts : ∀ {lo : ℕ} (t : IdTree), Good lo t →
+    ∀ x, x ∈ IdTree.posts t ↔ lo ≤ x ∧ x ≤ t.rootId.post
+  | _, .node id cs, h, x => by
+      cases h with
+      | mk hlo hch =>
+        simp only [IdTree.posts_node, List.mem_cons, IdTree.rootId_node]
+        rw [GoodChildren.mem_postsList cs hch x]
+        simp only [NodeId.lo] at hlo
+        omega
+theorem GoodChildren.mem_postsList : ∀ {cur : ℕ} (cs : List IdTree) {pos : ℕ},
+    GoodChildren cur cs pos → ∀ x, x ∈ IdTree.postsList cs ↔ cur ≤ x ∧ x < pos
+  | _, [], _, h, x => by
+      cases h
+      simp only [IdTree.postsList_nil]
+      constructor
+      · intro hx; exact absurd hx (List.not_mem_nil x)
+      · rintro ⟨h1, h2⟩; exfalso; omega
+  | _, c :: cs, _, h, x => by
+      cases h with
+      | cons hc hcs =>
+        simp only [IdTree.postsList_cons, List.mem_append]
+        rw [Good.mem_posts c hc x, GoodChildren.mem_postsList cs hcs x]
+        have := hc.lo_le_post
+        have := GoodChildren.cur_le_pos cs hcs
+        omega
+end
+
+mutual
+theorem Good.nodup_posts : ∀ {lo : ℕ} (t : IdTree), Good lo t → (IdTree.posts t).Nodup
+  | _, .node id cs, h => by
+      cases h with
+      | mk hlo hch =>
+        simp only [IdTree.posts_node, List.nodup_cons]
+        refine ⟨?_, GoodChildren.nodup_postsList cs hch⟩
+        intro hmem
+        rw [GoodChildren.mem_postsList cs hch id.post] at hmem
+        omega
+theorem GoodChildren.nodup_postsList : ∀ {cur : ℕ} (cs : List IdTree) {pos : ℕ},
+    GoodChildren cur cs pos → (IdTree.postsList cs).Nodup
+  | _, [], _, h => by cases h; simp only [IdTree.postsList_nil, List.nodup_nil]
+  | _, c :: cs, _, h => by
+      cases h with
+      | cons hc hcs =>
+        simp only [IdTree.postsList_cons, List.nodup_append]
+        refine ⟨Good.nodup_posts c hc, GoodChildren.nodup_postsList cs hcs, ?_⟩
+        intro x hx1 hx2
+        rw [Good.mem_posts c hc x] at hx1
+        rw [GoodChildren.mem_postsList cs hcs x] at hx2
+        omega
+end
+
+mutual
+theorem Good.subtree_good : ∀ {lo : ℕ} (t : IdTree), Good lo t →
+    ∀ s ∈ IdTree.subtrees t, Good s.rootId.lo s
+  | _, .node id cs, h, s, hs => by
+      cases h with
+      | mk hlo hch =>
+        rw [IdTree.subtrees_node, List.mem_cons] at hs
+        rcases hs with rfl | hs
+        · simp only [IdTree.rootId_node]; rw [hlo]; exact Good.mk hlo hch
+        · exact GoodChildren.subtree_good cs hch s hs
+theorem GoodChildren.subtree_good : ∀ {cur : ℕ} (cs : List IdTree) {pos : ℕ},
+    GoodChildren cur cs pos → ∀ s ∈ IdTree.subtreesList cs, Good s.rootId.lo s
+  | _, [], _, _, s, hs => by simp only [IdTree.subtreesList_nil, List.not_mem_nil] at hs
+  | _, c :: cs, _, h, s, hs => by
+      cases h with
+      | cons hc hcs =>
+        rw [IdTree.subtreesList_cons, List.mem_append] at hs
+        rcases hs with hs | hs
+        · exact Good.subtree_good c hc s hs
+        · exact GoodChildren.subtree_good cs hcs s hs
+end
+
+mutual
+theorem IdTree.subtrees_subset : ∀ (t : IdTree) {b : IdTree},
+    b ∈ subtrees t → subtrees b ⊆ subtrees t
+  | .node id cs, b, hb => by
+      rw [subtrees_node, List.mem_cons] at hb
+      rcases hb with rfl | hb
+      · exact fun _ h => h
+      · intro x hx
+        rw [subtrees_node, List.mem_cons]
+        exact Or.inr (IdTree.subtreesList_subset cs hb hx)
+theorem IdTree.subtreesList_subset : ∀ (cs : List IdTree) {b : IdTree},
+    b ∈ subtreesList cs → subtrees b ⊆ subtreesList cs
+  | [], b, hb => by simp only [subtreesList_nil, List.not_mem_nil] at hb
+  | c :: cs, b, hb => by
+      rw [subtreesList_cons, List.mem_append] at hb
+      intro x hx
+      rw [subtreesList_cons, List.mem_append]
+      rcases hb with hb | hb
+      · exact Or.inl (IdTree.subtrees_subset c hb hx)
+      · exact Or.inr (IdTree.subtreesList_subset cs hb hx)
+end
+
+theorem IdTree.eq_of_post_eq {l : List IdTree}
+    (h : (l.map (fun s => s.rootId.post)).Nodup) {a c : IdTree}
+    (ha : a ∈ l) (hc : c ∈ l) (hpost : a.rootId.post = c.rootId.post) : a = c := by
+  induction l with
+  | nil => exact absurd ha (List.not_mem_nil a)
+  | cons d ds ih =>
+    simp only [List.map_cons, List.nodup_cons] at h
+    obtain ⟨hd, htl⟩ := h
+    simp only [List.mem_cons] at ha hc
+    rcases ha with rfl | ha <;> rcases hc with rfl | hc
+    · rfl
+    · exfalso; apply hd; rw [hpost]; exact List.mem_map_of_mem _ hc
+    · exfalso; apply hd; rw [← hpost]; exact List.mem_map_of_mem _ ha
+    · exact ih htl ha hc
+
+theorem descendant_iff_of_good {lo : ℕ} {t : IdTree} (hg : Good lo t)
+    {a b : IdTree} (ha : a ∈ IdTree.subtrees t) (hb : b ∈ IdTree.subtrees t) :
+    a.rootId.DescendantOf b.rootId ↔ a ∈ IdTree.subtrees b := by
+  have hgb : Good b.rootId.lo b := Good.subtree_good t hg b hb
+  constructor
+  · rintro ⟨h1, h2⟩
+    have hmem : a.rootId.post ∈ IdTree.posts b := by
+      rw [Good.mem_posts b hgb a.rootId.post]; exact ⟨h1, h2⟩
+    rw [IdTree.posts, List.mem_map] at hmem
+    obtain ⟨c, hc_mem, hc_post⟩ := hmem
+    have hc_t : c ∈ IdTree.subtrees t := IdTree.subtrees_subset t hb hc_mem
+    have hac : a = c :=
+      IdTree.eq_of_post_eq (hg.nodup_posts t) ha hc_t hc_post.symm
+    rw [hac]; exact hc_mem
+  · intro hsub
+    have hmem : a.rootId.post ∈ IdTree.posts b := by
+      rw [IdTree.posts, List.mem_map]; exact ⟨a, hsub, rfl⟩
+    rw [Good.mem_posts b hgb a.rootId.post] at hmem
+    exact hmem
+
+/-! ### Tagging produces a good tree
+
+We bridge from the `tag` traversal to the abstract `Good` invariant, by induction
+on the plain AST.  Each lemma also records that the returned counter is one past
+the root's postorder index. -/
+
+theorem Expr.tag_spec : ∀ (e : Expr) (n : ℕ),
+    Good n (e.tag n).1.toIdTree ∧ (e.tag n).1.toIdTree.rootId.post + 1 = (e.tag n).2 := by
+  intro e
+  induction e with
+  | num k =>
+      intro n
+      refine ⟨?_, ?_⟩
+      · simp only [Expr.tag, Expr.Tagged.toIdTree]
+        exact Good.mk (by simp only [NodeId.lo]; omega) GoodChildren.nil
+      · simp only [Expr.tag, Expr.Tagged.toIdTree, IdTree.rootId_node]
+  | var x =>
+      intro n
+      refine ⟨?_, ?_⟩
+      · simp only [Expr.tag, Expr.Tagged.toIdTree]
+        exact Good.mk (by simp only [NodeId.lo]; omega) GoodChildren.nil
+      · simp only [Expr.tag, Expr.Tagged.toIdTree, IdTree.rootId_node]
+  | add a b iha ihb =>
+      intro n
+      obtain ⟨gA, pA⟩ := iha n
+      obtain ⟨gB, pB⟩ := ihb (a.tag n).2
+      have lA := gA.lo_le_post
+      have lB := gB.lo_le_post
+      refine ⟨?_, ?_⟩
+      · simp only [Expr.tag, Expr.Tagged.toIdTree]
+        refine Good.mk ?_ ?_
+        · simp only [NodeId.lo]; omega
+        · refine GoodChildren.cons gA ?_
+          rw [pA]; refine GoodChildren.cons gB ?_; rw [pB]; exact GoodChildren.nil
+      · simp only [Expr.tag, Expr.Tagged.toIdTree, IdTree.rootId_node]
+  | sub a b iha ihb =>
+      intro n
+      obtain ⟨gA, pA⟩ := iha n
+      obtain ⟨gB, pB⟩ := ihb (a.tag n).2
+      have lA := gA.lo_le_post
+      have lB := gB.lo_le_post
+      refine ⟨?_, ?_⟩
+      · simp only [Expr.tag, Expr.Tagged.toIdTree]
+        refine Good.mk ?_ ?_
+        · simp only [NodeId.lo]; omega
+        · refine GoodChildren.cons gA ?_
+          rw [pA]; refine GoodChildren.cons gB ?_; rw [pB]; exact GoodChildren.nil
+      · simp only [Expr.tag, Expr.Tagged.toIdTree, IdTree.rootId_node]
+
+theorem BasicStmt.tag_spec : ∀ (bs : BasicStmt) (n : ℕ),
+    Good n (bs.tag n).1.toIdTree ∧ (bs.tag n).1.toIdTree.rootId.post + 1 = (bs.tag n).2 := by
+  intro bs
+  cases bs with
+  | noop =>
+      intro n
+      refine ⟨?_, ?_⟩
+      · simp only [BasicStmt.tag, BasicStmt.Tagged.toIdTree]
+        exact Good.mk (by simp only [NodeId.lo]; omega) GoodChildren.nil
+      · simp only [BasicStmt.tag, BasicStmt.Tagged.toIdTree, IdTree.rootId_node]
+  | assign x e =>
+      intro n
+      obtain ⟨gE, pE⟩ := Expr.tag_spec e n
+      have lE := gE.lo_le_post
+      refine ⟨?_, ?_⟩
+      · simp only [BasicStmt.tag, BasicStmt.Tagged.toIdTree]
+        refine Good.mk ?_ ?_
+        · simp only [NodeId.lo]; omega
+        · refine GoodChildren.cons gE ?_
+          rw [pE]; exact GoodChildren.nil
+      · simp only [BasicStmt.tag, BasicStmt.Tagged.toIdTree, IdTree.rootId_node]
+
+theorem Stmt.tag_spec : ∀ (s : Stmt) (n : ℕ),
+    Good n (s.tag n).1.toIdTree ∧ (s.tag n).1.toIdTree.rootId.post + 1 = (s.tag n).2 := by
+  intro s
+  induction s with
+  | basic bs =>
+      intro n
+      obtain ⟨gBs, pBs⟩ := BasicStmt.tag_spec bs n
+      have lBs := gBs.lo_le_post
+      refine ⟨?_, ?_⟩
+      · simp only [Stmt.tag, Stmt.Tagged.toIdTree]
+        refine Good.mk ?_ ?_
+        · simp only [NodeId.lo]; omega
+        · refine GoodChildren.cons gBs ?_
+          rw [pBs]; exact GoodChildren.nil
+      · simp only [Stmt.tag, Stmt.Tagged.toIdTree, IdTree.rootId_node]
+  | andThen a b iha ihb =>
+      intro n
+      obtain ⟨gA, pA⟩ := iha n
+      obtain ⟨gB, pB⟩ := ihb (a.tag n).2
+      have lA := gA.lo_le_post
+      have lB := gB.lo_le_post
+      refine ⟨?_, ?_⟩
+      · simp only [Stmt.tag, Stmt.Tagged.toIdTree]
+        refine Good.mk ?_ ?_
+        · simp only [NodeId.lo]; omega
+        · refine GoodChildren.cons gA ?_
+          rw [pA]; refine GoodChildren.cons gB ?_; rw [pB]; exact GoodChildren.nil
+      · simp only [Stmt.tag, Stmt.Tagged.toIdTree, IdTree.rootId_node]
+  | ifElse e a b iha ihb =>
+      intro n
+      obtain ⟨gE, pE⟩ := Expr.tag_spec e n
+      obtain ⟨gA, pA⟩ := iha (e.tag n).2
+      obtain ⟨gB, pB⟩ := ihb (a.tag (e.tag n).2).2
+      have lE := gE.lo_le_post
+      have lA := gA.lo_le_post
+      have lB := gB.lo_le_post
+      refine ⟨?_, ?_⟩
+      · simp only [Stmt.tag, Stmt.Tagged.toIdTree]
+        refine Good.mk ?_ ?_
+        · simp only [NodeId.lo]; omega
+        · refine GoodChildren.cons gE ?_
+          rw [pE]; refine GoodChildren.cons gA ?_
+          rw [pA]; refine GoodChildren.cons gB ?_; rw [pB]; exact GoodChildren.nil
+      · simp only [Stmt.tag, Stmt.Tagged.toIdTree, IdTree.rootId_node]
+  | whileLoop e s ih =>
+      intro n
+      obtain ⟨gE, pE⟩ := Expr.tag_spec e n
+      obtain ⟨gS, pS⟩ := ih (e.tag n).2
+      have lE := gE.lo_le_post
+      have lS := gS.lo_le_post
+      refine ⟨?_, ?_⟩
+      · simp only [Stmt.tag, Stmt.Tagged.toIdTree]
+        refine Good.mk ?_ ?_
+        · simp only [NodeId.lo]; omega
+        · refine GoodChildren.cons gE ?_
+          rw [pE]; refine GoodChildren.cons gS ?_; rw [pS]; exact GoodChildren.nil
+      · simp only [Stmt.tag, Stmt.Tagged.toIdTree, IdTree.rootId_node]
+
+/-- A freshly tagged program is a well-tagged tree (rooted at postorder start `0`). -/
+theorem good_tagStmt (s : Stmt) : Good 0 (tagStmt s).toIdTree :=
+  (Stmt.tag_spec s 0).1
+
+/-- **Descendant characterization.** The numeric `NodeId.DescendantOf` relation on
+two nodes of a tagged program holds exactly when one is structurally contained in
+the other's subtree. -/
+theorem descendant_iff_tagStmt (s : Stmt) {a b : IdTree}
+    (ha : a ∈ IdTree.subtrees (tagStmt s).toIdTree)
+    (hb : b ∈ IdTree.subtrees (tagStmt s).toIdTree) :
+    a.rootId.DescendantOf b.rootId ↔ a ∈ IdTree.subtrees b :=
+  descendant_iff_of_good (good_tagStmt s) ha hb
+```
--- a/lean/Spa/Language/Tagged/Derive.lean
+++ b/lean/Spa/Language/Tagged/Derive.lean
@@ -0,0 +1,241 @@
+import Lean
+import Mathlib.Tactic.DeriveTraversable
+import Spa.Language.Base
+import Spa.Language.Tagged.Id
+
+/-!
+# The `derive_tagged` command
+
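+`derive_tagged T₁ T₂ … Tₙ` takes a family of inductive types, listed in
+dependency order (each `Tᵢ` may refer to itself and to earlier members, but the
+family is not emitted as a `mutual` block), and generates, for each `Tᵢ`: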
+
+* a *tagged* mirror inductive `Tᵢ.Tagged (τ : Type)`, in which every constructor
+  carries a leading `tag : τ` field and every field whose type is a family
+  member is retyped to its `.Tagged τ` counterpart;
+* `Tᵢ.Tagged.erase : Tᵢ.Tagged τ → Tᵢ`, forgetting all tags;
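+* `Tᵢ.Tagged.rootTag : Tᵢ.Tagged τ → τ`, reading the tag stored at the root;
+* `Tᵢ.tag : Tᵢ → ℕ → Tᵢ.Tagged NodeId × ℕ`, assigning every node a unique
+  `NodeId` (its postorder index) by a single unified traversal that threads a
+  counter; the whole family shares one counter, so identifiers are unique across
+  types;
+* `Tᵢ.Tagged.foldTags : (τ → M → M) → M → Tᵢ.Tagged τ → M`, a right fold over
+  every tag in the tree;
+* `Functor` and `Traversable` instances for `Tᵢ.Tagged`.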
+
+The generated declarations have exactly the shape of the hand-written reference;
+see `Spa/Language/Tagged/Basic.lean` (which invokes this command) and the proofs
+in `Spa/Language/Tagged/Properties.lean`.
+
+Scope: the generator handles non-indexed inductives whose constructor fields are
+either scalars or *direct* references to a family member (which covers the object
+language).  Nested occurrences such as `List Tᵢ` are not supported.
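+
+## Sketch of the generated shape
+
+As a rough guide to what the command emits, the block below writes out by hand
+the declarations one would expect for a hypothetical single-member family
+`Toy`. The names `Toy` and `DemoId` are illustration-only (`DemoId` stands in
+for `Spa.NodeId` so the block needs no imports), and the text is an
+approximation of the generator's output, not a dump of it.
+
+```lean
+structure DemoId where
+  post : Nat
+
+inductive Toy where
+  | leaf (k : Nat)
+  | pair (a b : Toy)
+
+-- Tagged mirror: a leading tag, recursive fields retyped.
+inductive Toy.Tagged (τ : Type) where
+  | leaf : τ → Nat → Toy.Tagged τ
+  | pair : τ → Toy.Tagged τ → Toy.Tagged τ → Toy.Tagged τ
+
+def Toy.Tagged.rootTag {τ : Type} : Toy.Tagged τ → τ :=
+  fun x => match x with
+  | Toy.Tagged.leaf t _ => t
+  | Toy.Tagged.pair t _ _ => t
+
+def Toy.Tagged.erase {τ : Type} : Toy.Tagged τ → Toy :=
+  fun x => match x with
+  | Toy.Tagged.leaf _ a0 => Toy.leaf a0
+  | Toy.Tagged.pair _ a0 a1 => Toy.pair (Toy.Tagged.erase a0) (Toy.Tagged.erase a1)
+
+-- Postorder: children first (threading the counter), then the node itself.
+def Toy.tag : Toy → Nat → Toy.Tagged DemoId × Nat :=
+  fun e n => match e with
+  | Toy.leaf a0 => (Toy.Tagged.leaf (⟨n⟩ : DemoId) a0, n + 1)
+  | Toy.pair a0 a1 =>
+      let r0 := Toy.tag a0 n
+      let r1 := Toy.tag a1 r0.2
+      (Toy.Tagged.pair (⟨r1.2⟩ : DemoId) r0.1 r1.1, r1.2 + 1)
+
+-- Right fold over tags: the node's own tag is combined last, on the outside.
+def Toy.Tagged.foldTags {τ : Type} {M : Type} (f : τ → M → M) (acc : M) :
+    Toy.Tagged τ → M :=
+  fun x => match x with
+  | Toy.Tagged.leaf t _ => f t acc
+  | Toy.Tagged.pair t a0 a1 =>
+      f t (Toy.Tagged.foldTags f (Toy.Tagged.foldTags f acc a1) a0)
+
+-- Root is numbered last (post = 2) but listed first by the fold.
+#eval Toy.Tagged.foldTags (fun t acc => t.post :: acc) []
+  (Toy.tag (Toy.pair (Toy.leaf 1) (Toy.leaf 2)) 0).1   -- [2, 0, 1]
+```
+
+Note the two orders involved: `tag` numbers nodes in postorder, while
+`foldTags` with `(· :: ·)` lists them in preorder.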
+-/
+
+open Lean Elab Command Meta
+
+namespace Spa.DeriveTagged
+
+/-- One constructor field, classified as a recursive family reference or a scalar
+(whose type syntax we keep verbatim for the mirror inductive). -/
+structure FieldData where
+  isRec : Bool
+  recType : Name
+  typeStx : Term
+
+/-- A constructor: its original (full) name, short name, and fields. -/
+structure CtorData where
+  origName : Name
+  shortName : Name
+  fields : Array FieldData
+
+/-- A family member together with its constructors. -/
+structure TypeData where
+  name : Name
+  ctors : Array CtorData
+
+def taggedOf (n : Name) : Name := n ++ `Tagged
+def eraseOf (n : Name) : Name := n ++ `Tagged ++ `erase
+def rootTagOf (n : Name) : Name := n ++ `Tagged ++ `rootTag
+def tagOf (n : Name) : Name := n ++ `tag
+def foldTagsOf (n : Name) : Name := n ++ `Tagged ++ `foldTags
+
+/-- Inspect the family, classifying each constructor field. -/
+def gather (family : Array Name) (τ : Ident) : TermElabM (Array TypeData) := do
+  let famSet : NameSet := family.foldl (·.insert ·) {}
+  family.mapM fun tn => do
+    let iv ← getConstInfoInduct tn
+    let ctors ← iv.ctors.toArray.mapM fun cn => do
+      let cv ← getConstInfoCtor cn
+      let fields ← forallTelescopeReducing cv.type fun args _ => do
+        let fieldArgs := args.extract iv.numParams args.size
+        fieldArgs.mapM fun a => do
+          let ty ← inferType a
+          match ty.getAppFn.constName? with
+          | some hn =>
+            if famSet.contains hn then
+              return { isRec := true, recType := hn, typeStx := ← `($(mkIdent (taggedOf hn)) $τ) }
+            else
+              return { isRec := false, recType := default, typeStx := ← Lean.PrettyPrinter.delab ty }
+          | none =>
+            return { isRec := false, recType := default, typeStx := ← Lean.PrettyPrinter.delab ty }
+      return { origName := cn, shortName := cn.componentsRev.head!, fields }
+    return { name := tn, ctors }
+
+/-- The arrow type `τ → <fields…> → Self τ` of a tagged constructor. -/
+def ctorArrow (cd : CtorData) (self : Term) (τ : Ident) : TermElabM Term := do
+  let mut t := self
+  for f in cd.fields.reverse do
+    t ← `($(f.typeStx) → $t)
+  `($τ → $t)
+
+/-- The tagged mirror inductives, one per family member.  The family is a DAG
+(`Expr ← BasicStmt ← Stmt`), not genuinely mutual, so they are emitted as
+separate inductives in dependency order rather than a `mutual` block.
+
+`Functor`/`Traversable` instances are derived separately by `mkDeriveInstances`
+below rather than via an inline `deriving` clause. -/
+def mkInductives (tds : Array TypeData) (τ : Ident) :
+    CommandElabM (Array (TSyntax `command)) := do
+  tds.mapM fun td => do
+    let self ← `($(mkIdent (taggedOf td.name)) $τ)
+    let ctors ← td.ctors.mapM fun cd => do
+      let aty ← Command.liftTermElabM (ctorArrow cd self τ)
+      `(Lean.Parser.Command.ctor| | $(mkIdent cd.shortName):ident : $aty)
+    `(command| inductive $(mkIdent (taggedOf td.name)):ident ($τ : Type) where $ctors*)
+
+/-- A `deriving instance Functor, Traversable for Tᵢ.Tagged` command per family
+member.  Since every tagged type is a single-parameter, direct-recursive
+inductive in `τ`, Mathlib's deriving handler produces clean (`sorry`-free)
+instances, giving `map`, `traverse`, and the `Traversable.foldr`/`toList` folds
+for free.
+
+These are emitted as *separate* commands in dependency order (rather than an
+inline `deriving` clause on each inductive) for two reasons: deriving
+`Stmt.Tagged` needs the `Expr.Tagged`/`BasicStmt.Tagged` instances already in
+scope, and — because every member's type name ends in `.Tagged` — the handler's
+auto-generated instance name (`instFunctorTagged`, built from the type's last
+component) collides across the family unless each derive sees the environment
+the previous one updated; separate commands give it that, so the names
+disambiguate to `instFunctorTagged`, `instFunctorTagged_1`, ….
+
+The hand-written `foldTags` is retained alongside these: it is a
+structural-recursion fold that `simp`/`decide` reduce cleanly, unlike the
+abstract `Traversable.foldr` (defined via the `FreeMonoid`/`Const` applicative),
+which reduces under `decide`/`rfl` but not naive `simp` unfolding. -/
+def mkDeriveInstances (tds : Array TypeData) : CommandElabM (Array (TSyntax `command)) := do
+  tds.mapM fun td =>
+    `(command| deriving instance Functor, Traversable for $(mkIdent (taggedOf td.name)))
+
+/-- The `erase` functions, one per family member (separate defs in dependency
+order — each calls only already-defined lower members). -/
+def mkErase (tds : Array TypeData) : CommandElabM (Array (TSyntax `command)) := do
+  tds.mapM fun td => do
+    let mut pats : Array Term := #[]
+    let mut rhss : Array Term := #[]
+    for cd in td.ctors do
+      let argNames := (Array.range cd.fields.size).map (fun i => mkIdent (.mkSimple s!"a{i}"))
+      let pat ← `($(mkIdent (taggedOf td.name ++ cd.shortName)) _ $argNames*)
+      let eraseArgs ← (cd.fields.zip argNames).mapM fun (f, a) =>
+        if f.isRec then `($(mkIdent (eraseOf f.recType)) $a) else pure a
+      let rhs ← `($(mkIdent cd.origName) $eraseArgs*)
+      pats := pats.push pat
+      rhss := rhss.push rhs
+    `(command| def $(mkIdent (eraseOf td.name)) {τ : Type} :
+        $(mkIdent (taggedOf td.name)) τ → $(mkIdent td.name) :=
+        fun x => match x with $[| $pats => $rhss]*)
+
+/-- The `rootTag` accessors (one non-recursive `def` per type). -/
+def mkRootTag (tds : Array TypeData) : CommandElabM (Array (TSyntax `command)) := do
+  let tIdent := mkIdent `t
+  tds.mapM fun td => do
+    let mut pats : Array Term := #[]
+    let mut rhss : Array Term := #[]
+    for cd in td.ctors do
+      let hole ← `(_)
+      let wilds := Array.mkArray cd.fields.size hole
+      pats := pats.push (← `($(mkIdent (taggedOf td.name ++ cd.shortName)) $tIdent $wilds*))
+      rhss := rhss.push tIdent
+    `(command| def $(mkIdent (rootTagOf td.name)) {τ : Type} :
+        $(mkIdent (taggedOf td.name)) τ → τ :=
+        fun x => match x with $[| $pats => $rhss]*)
+
+/-- The postorder `tag` functions, one per family member (separate defs in
+dependency order). -/
+def mkTag (tds : Array TypeData) : CommandElabM (Array (TSyntax `command)) := do
+  let nId := mkIdent ``Spa.NodeId
+  tds.mapM fun td => do
+    let mut pats : Array Term := #[]
+    let mut rhss : Array Term := #[]
+    for cd in td.ctors do
+      let argNames := (Array.range cd.fields.size).map (fun i => mkIdent (.mkSimple s!"a{i}"))
+      let pat ← `($(mkIdent cd.origName) $argNames*)
+      let mut cur : Term ← `(n)
+      let mut lets : Array (Ident × Term) := #[]
+      let mut taggedArgs : Array Term := #[]
+      let mut ri := 0
+      for (f, a) in cd.fields.zip argNames do
+        if f.isRec then
+          let rName := mkIdent (.mkSimple s!"r{ri}")
+          let rhsCall ← `($(mkIdent (tagOf f.recType)) $a $cur)
+          lets := lets.push (rName, rhsCall)
+          taggedArgs := taggedArgs.push (← `($rName |>.1))
+          cur ← `($rName |>.2)
+          ri := ri + 1
+        else
+          taggedArgs := taggedArgs.push a
+      let last := cur
+      let tagged ← `($(mkIdent (taggedOf td.name ++ cd.shortName))
+          (⟨$last⟩ : $nId) $taggedArgs*)
+      let mut body ← `(($tagged, $last + 1))
+      for (rName, rhs) in lets.reverse do
+        body ← `(let $rName := $rhs; $body)
+      pats := pats.push pat
+      rhss := rhss.push body
+    `(command| def $(mkIdent (tagOf td.name)) :
+        $(mkIdent td.name) → Nat → $(mkIdent (taggedOf td.name)) $nId × Nat :=
+        fun e n => match e with $[| $pats => $rhss]*)
+
+/-- The tag-fold functions: `foldTags f acc t` is a right fold of `f` over the
+tags of `t` in preorder (a node's own tag first, then its recursive fields left
+to right), starting from `acc`.  This is the `Foldable`/`foldr`-over-tags the
+hand-written collectors (e.g. `subtreeIds`) reduce to.  One separate def per
+family member (the family is a DAG, so no `mutual` block is needed). -/
+def mkFoldTags (tds : Array TypeData) : CommandElabM (Array (TSyntax `command)) := do
+  let τ := mkIdent `τ
+  let m := mkIdent `M
+  let fId := mkIdent `f
+  let accId := mkIdent `acc
+  let tagId := mkIdent `t
+  tds.mapM fun td => do
+    let mut pats : Array Term := #[]
+    let mut rhss : Array Term := #[]
+    for cd in td.ctors do
+      let argNames := (Array.range cd.fields.size).map (fun i => mkIdent (.mkSimple s!"a{i}"))
+      let pat ← `($(mkIdent (taggedOf td.name ++ cd.shortName)) $tagId $argNames*)
+      let mut body : Term := accId
+      for (fld, a) in (cd.fields.zip argNames).reverse do
+        if fld.isRec then
+          body ← `($(mkIdent (foldTagsOf fld.recType)) $fId $body $a)
+      body ← `($fId $tagId $body)
+      pats := pats.push pat
+      rhss := rhss.push body
+    `(command| def $(mkIdent (foldTagsOf td.name)) {$τ:ident : Type} {$m:ident : Type}
+        ($fId : $τ → $m → $m) ($accId : $m) :
+        $(mkIdent (taggedOf td.name)) $τ → $m :=
+        fun x => match x with $[| $pats => $rhss]*)
+
+/-- `derive_tagged T₁ … Tₙ`: generate tagged mirrors, `Functor`/`Traversable`
+instances, `rootTag`, `erase`, `tag`, and `foldTags` for the given family of
+inductives. -/
+syntax (name := deriveTaggedCmd) "derive_tagged " ident+ : command
+
+@[command_elab deriveTaggedCmd]
+def elabDeriveTagged : CommandElab := fun stx => do
+  match stx with
+  | `(derive_tagged $ids*) =>
+    let family ← ids.mapM fun i => Command.liftCoreM (realizeGlobalConstNoOverload i)
+    let τ := mkIdent `τ
+    let tds ← Command.liftTermElabM (gather family τ)
+    for d in (← mkInductives tds τ) do elabCommand d
+    for d in (← mkDeriveInstances tds) do elabCommand d
+    for d in (← mkRootTag tds) do elabCommand d
+    for d in (← mkErase tds) do elabCommand d
+    for d in (← mkTag tds) do elabCommand d
+    for d in (← mkFoldTags tds) do elabCommand d
+  | _ => throwUnsupportedSyntax
+
+end Spa.DeriveTagged
--- a/lean/Spa/Language/Tagged/Graphs.lean
+++ b/lean/Spa/Language/Tagged/Graphs.lean
@@ -0,0 +1,63 @@
+import Spa.Language
+import Spa.Language.Graphs
+import Spa.Language.Tagged.Basic
+import Spa.Language.Tagged.Properties
+
+namespace Spa
+
+open GGraph
+
+def Stmt.Tagged.cfg : Stmt.Tagged NodeId → GGraph (List (BasicStmt.Tagged NodeId))
+  | .basic _ bs => GGraph.singleton [bs]
+  | .andThen _ s₁ s₂ => s₁.cfg ⤳ s₂.cfg
+  | .ifElse _ _ s₁ s₂ => s₁.cfg ∙ s₂.cfg
+  | .whileLoop _ _ s => GGraph.loop s.cfg
+
+theorem Stmt.Tagged.cfg_graph : ∀ (t : Stmt.Tagged NodeId),
+    t.cfg.map (List.map BasicStmt.Tagged.erase) = t.erase.cfg
+  | .basic _ bs => by simp [Stmt.Tagged.cfg, Stmt.cfg, Stmt.Tagged.erase, BasicStmt.Tagged.erase]
+  | .andThen _ s₁ s₂ => by
+      simp [Stmt.Tagged.cfg, Stmt.cfg, Stmt.Tagged.erase, Stmt.Tagged.cfg_graph s₁, Stmt.Tagged.cfg_graph s₂]
+  | .ifElse _ _ s₁ s₂ => by
+      simp [Stmt.Tagged.cfg, Stmt.cfg, Stmt.Tagged.erase, Stmt.Tagged.cfg_graph s₁, Stmt.Tagged.cfg_graph s₂]
+  | .whileLoop _ _ s => by
+      simp [Stmt.Tagged.cfg, Stmt.cfg, Stmt.Tagged.erase, Stmt.Tagged.cfg_graph s]
+
+def GGraph.nodeLabel (g : GGraph (List (BasicStmt.Tagged NodeId))) (i : g.Index) : Option NodeId :=
+  (g.nodes i).head?.map BasicStmt.Tagged.rootTag
+
+def GGraph.stateOf (g : GGraph (List (BasicStmt.Tagged NodeId))) (id : NodeId) : Option g.Index :=
+  g.indices.find? (fun i => decide (g.nodeLabel i = some id))
+
+theorem GGraph.stateOf_label {g : GGraph (List (BasicStmt.Tagged NodeId))} {id : NodeId}
+    {i : g.Index} (h : g.stateOf id = some i) : g.nodeLabel i = some id := by
+  rw [GGraph.stateOf] at h
+  simpa using List.find?_some h
+
+namespace Program
+
+variable (p : Program)
+
+def tagged : Stmt.Tagged NodeId := tagStmt p.rootStmt
+
+def taggedCfg : GGraph (List (BasicStmt.Tagged NodeId)) :=
+  GGraph.wrap p.tagged.cfg
+
+theorem taggedCfg_erase :
+    p.taggedCfg.map (List.map BasicStmt.Tagged.erase) = p.cfg := by
+  rw [taggedCfg, GGraph.map_wrap, Stmt.Tagged.cfg_graph, tagged, erase_tagStmt]
+  rfl
+
+theorem taggedCfg_size : p.taggedCfg.size = p.cfg.size := by
+  conv_rhs => rw [← p.taggedCfg_erase]
+  rfl
+
+def nodeIdOf (s : p.State) : Option NodeId :=
+  p.taggedCfg.nodeLabel (Fin.cast p.taggedCfg_size.symm s)
+
+def stateOfNodeId (id : NodeId) : Option p.State :=
+  (p.taggedCfg.stateOf id).map (Fin.cast p.taggedCfg_size)
+
+end Program
+
+end Spa
--- a/lean/Spa/Language/Tagged/Id.lean
+++ b/lean/Spa/Language/Tagged/Id.lean
@@ -0,0 +1,9 @@
+import Mathlib.Data.Nat.Notation
+
+namespace Spa
+
+structure NodeId where
+  post : ℕ
+  deriving DecidableEq, Repr
+
+end Spa
--- a/lean/Spa/Language/Tagged/Properties.lean
+++ b/lean/Spa/Language/Tagged/Properties.lean
@@ -0,0 +1,29 @@
+import Spa.Language.Tagged.Basic
+
+namespace Spa
+
+@[simp] theorem Expr.erase_tag (e : Expr) (n : ℕ) : (e.tag n).1.erase = e := by
+  induction e generalizing n with
+  | add a b iha ihb => simp [Expr.tag, Expr.Tagged.erase, iha, ihb]
+  | sub a b iha ihb => simp [Expr.tag, Expr.Tagged.erase, iha, ihb]
+  | var x => simp [Expr.tag, Expr.Tagged.erase]
+  | num k => simp [Expr.tag, Expr.Tagged.erase]
+
+@[simp] theorem BasicStmt.erase_tag (bs : BasicStmt) (n : ℕ) :
+    (bs.tag n).1.erase = bs := by
+  cases bs with
+  | assign x e => simp [BasicStmt.tag, BasicStmt.Tagged.erase]
+  | noop => simp [BasicStmt.tag, BasicStmt.Tagged.erase]
+
+@[simp] theorem Stmt.erase_tag (s : Stmt) (n : ℕ) : (s.tag n).1.erase = s := by
+  induction s generalizing n with
+  | basic bs => simp [Stmt.tag, Stmt.Tagged.erase]
+  | andThen a b iha ihb => simp [Stmt.tag, Stmt.Tagged.erase, iha, ihb]
+  | ifElse e a b iha ihb => simp [Stmt.tag, Stmt.Tagged.erase, iha, ihb]
+  | whileLoop e s ih => simp [Stmt.tag, Stmt.Tagged.erase, ih]
+
+/-- Erasing a freshly tagged program recovers it. -/
+theorem erase_tagStmt (s : Stmt) : (tagStmt s).erase = s := by
+  simp [tagStmt]
+
+end Spa
--- a/lean/Spa/Language/Tagged/TODO.md
+++ b/lean/Spa/Language/Tagged/TODO.md
@@ -0,0 +1,46 @@
+# Tagged AST — follow-ups
+
+## Descendant tracking — parked
+
+The interval-labeling descendant test and its correctness proof
+(`descendant_iff_tagStmt` and supporting rose-tree/`Good` machinery) have been
+removed from the live code and parked in `DESCENDANT-TRACKING.md`, with a revival
+checklist. It's a computational optimization not yet needed; revive it (and the
+`NodeId.desc` field) when LICM wants fast ancestor queries.
+
+## ID → CFG-state mapping — plan part B — DONE
+
+`Graphs.lean` now defines a payload-generic `GGraph α` (with `Graph := GGraph
+(List BasicStmt)` as the concrete CFG), so the labelled CFG **reuses** the graph
+combinators instead of mirroring them. In `Tagged/Graphs.lean`:
+`Stmt.Tagged.cfg : Stmt.Tagged NodeId → GGraph (List (BasicStmt.Tagged NodeId))`
+is just `Stmt.cfg` at the tagged payload; `Stmt.Tagged.cfg_graph :
+t.cfg.map (List.map BasicStmt.Tagged.erase) = t.erase.cfg` connects it to the
+real CFG; and `GGraph.nodeLabel`/`GGraph.stateOf` read a node's id straight from
+its payload (`stateOf_label` is the soundness lemma). No `LGraph`, no separate
+`label` field, no duplicated combinators.
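+
+The lookup idea can be sketched without `GGraph` at all. In the toy model below
+a "graph" is just an array of payload lists and each statement is an `(id,
+text)` pair; `DemoNode`, `nodeLabel`, and `stateOf` are hypothetical
+simplifications that mirror the head-of-payload read and the linear search, not
+the real definitions.
+
+```lean
+namespace Demo
+
+-- Hypothetical payload: a list of (node id, statement text) pairs.
+abbrev DemoNode := List (Nat × String)
+
+-- A node's label is the id of the first statement in its payload, if any.
+def nodeLabel (g : Array DemoNode) (i : Nat) : Option Nat :=
+  (g[i]?).bind (fun stmts => stmts.head?.map (·.1))
+
+-- Search the indices for the first node whose label is `id`.
+def stateOf (g : Array DemoNode) (id : Nat) : Option Nat :=
+  (List.range g.size).find? (fun i => nodeLabel g i == some id)
+
+-- Empty entry/exit nodes around two singleton nodes.
+def demo : Array DemoNode := #[[], [(7, "x := 1")], [(9, "noop")], []]
+
+#eval stateOf demo 9   -- some 2
+#eval stateOf demo 3   -- none
+
+end Demo
+```
+
+Because the id is read from the payload, there is no separate label table to
+keep in sync with the graph; the cost is a linear scan per lookup.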
+
+## ID → CFG-state mapping — totality — DONE
+
+The `Option`-valued `nodeIdOf`/`stateOfNodeId` are now proven total on the inputs
+that matter (`Graphs.lean`), via a payload-list characterization of the CFG:
+
+- `GGraph.nodeList` flattens `nodes` into the list of payloads, with combinator
+  lemmas (`nodeList_comp/link/loop/wrap`) reducing it through the CFG builders.
+- `Stmt.Tagged.basics` lists a program's basic statements; the master lemma
+  `Stmt.Tagged.cfg_nodeList_filter` (and its program-level
+  `taggedCfg_nodeList_filter`) shows the non-empty CFG nodes are *exactly* the
+  singletons `[bs]` for `bs ∈ basics`.
+- AST ⇒ CFG: `exists_state_of_mem_basics` (a state with payload `[bs]`) and
+  `stateOfNodeId_isSome` (the search succeeds).
+- CFG ⇒ AST: `exists_basic_of_code_ne_nil` (a non-empty node is `[bs]`, with
+  `code = [bs.erase]` and `nodeIdOf = some bs.rootTag`) and `nodeIdOf_isSome`.
+
+All of these depend only on the `propext` and `Quot.sound` axioms (no `sorry`,
+no `Classical.choice`).
+
+Remaining nice-to-have:
+- Injectivity: distinct basic-statement ids map to distinct states, giving a
+  two-sided id ↔ state correspondence (upgrading the existence results above to a
+  genuine bijection, and pinning `stateOfNodeId (bs.rootTag)` to *the* state
+  holding `bs`). The `tag`-uniqueness fact this needs (`Nodup` of postorder tags)
+  was part of the parked descendant machinery in `DESCENDANT-TRACKING.md`.