# Spreadsheet DSL v0 Specification A Python-style DSL for generating spreadsheets reliably (especially from LLMs). v0 targets **XLSX output**, but is designed around an **adapter-neutral IR** so other backends (e.g., Numbers, Google Sheets) can be added later. ## v0 goals - Deterministic layout with **implicit flow** + **explicit anchors** - “Formula-by-default” semantics (spreadsheet recalculation) - Unplaced “refs” using `*`, placed later with `place` - Sheet-qualified imports/exports (no ambiguous bare imports) - Small set of adapter-neutral functions: `sum`, `average`, `min`, `max` - Strict validation with actionable errors --- # 1. Program and Compilation Model ## 1.1 Files are modules (no module header) Every `.dsl` file is an implicit module. There is **no module declaration**. ## 1.2 Whole-program compilation (required) Given an entry file: 1. Parse entry file → discover imports 2. Parse all imported files recursively 3. Build export tables for each module 4. Resolve imports in the entry module 5. Lower AST → adapter-neutral workbook IR 6. Layout: place blocks and resolve addresses 7. Emit via adapter (v0: XLSX) **Line order affects only placement flow** (cursor movement) inside a sheet. It must not affect name resolution (imports/exports). --- # 2. Lexical and Naming Rules ## 2.1 Identifiers Identifiers match: - `NAME := [A-Za-z_][A-Za-z0-9_]*` ## 2.2 Strings and numbers - Strings: double-quoted only (`"..."`) - Numbers: signed/unsigned, integer/float ## 2.3 Cell addresses Cell addresses are strings, validated at compile time: - `"A1"`, `"B2"`, `"AA10"` etc. (v0 does **not** support `"B"` column-only anchors.) --- # 3. Scopes and Symbols ## 3.1 Scopes | Scope | Contains | Notes | |------|----------|------| | Module | imports, sheets, exports | Import aliases live here | | Sheet | sheet variables | Variables bind to blocks or scalars | | Table-row | `row` pseudo-variable | Only valid inside table row formula expressions | ## 3.2 Symbol states (important for correctness) Every symbol in a sheet is one of: - **Placed**: has a concrete cell/range address - **Unplaced ref**: declared with `*`, not yet placed - **Non-placeable value**: (v0 mostly avoids this) Rules: - Referencing an **unplaced ref** in an expression is an error (it has no address). - An unplaced ref may be **placed exactly once** in v0. --- # 4. Workbooks and Sheets ## 4.1 Workbook block (optional) ```dsl workbook "Finance Pack": sheet "Data" as DataSheet: ... sheet "Dashboard" as Dash: ... ``` If `workbook` is omitted, sheets still form an implicit workbook. ## 4.2 Sheet declaration (required alias) ```dsl sheet "Dashboard" as Dash: ... ``` - Sheet title is user-facing name in Excel - Sheet alias is stable identifier for imports/exports and compiler internals --- # 5. Placement, Layout, and Cursor Semantics The language places **blocks** (cells, columns, tables). Each sheet maintains a **cursor**. ## 5.1 Cursor - Initial cursor is `A1` - Cursor position is maintained per sheet ## 5.2 Placement strategies A placement strategy determines how the cursor advances after placing a block: - **below**: move down by `(block.height + gap)` - **right**: move right by `(block.width + gap)` Default strategy in a plain sheet body: **below gap=1** ## 5.3 Placement contexts ### `at "B2":` Temporarily sets the cursor to an absolute address **for the duration of the block**. ```dsl at "B2": title = "MRR" mrr = 42100 ``` ### `below gap=1:` Sets strategy to *below* within the block. ```dsl below gap=2: a = "One" b = "Two" ``` ### `right gap=1:` Sets strategy to *right* within the block. ```dsl right gap=1: a = "Left" b = "Right" ``` ## 5.4 Option A semantics (selected): `at` is a temporary teleport, flow resumes This resolves the ambiguity you pointed out. **Rule:** 1. `below/right` define the **active placement strategy** within their block. 2. `at` sets the cursor **absolutely** for its nested block. 3. When an `at` block ends, the cursor becomes the position **immediately after the last placement inside `at`**, advanced using the **surrounding strategy**. ### Example: meaningful nesting ```dsl sheet "S" as S: below gap=1: a = "Top" # A1 at "B6": b = "Here" # B6 c = "Below Here" # B7 d = "Continues" # B8 (below c, because surrounding strategy is below) ``` If an `at` block places nothing, it has no effect other than validation. --- # 6. Binding: Placed vs Unplaced (`*`) and `place` ## 6.1 Placed binding (default) In a sheet or placement block, `x = <placeable>` places the object at the current cursor. ```dsl sheet "S" as S: x = 10 # writes literal 10 at A1 y = x + 5 # writes formula at A2 ``` ## 6.2 Virtual binding `lazy name = <placeable>` defines an **unplaced ref**; nothing is written yet. ```dsl sheet "S" as S: lazy t = table(columns=["A"], data=[[1]]) at "D4": place t ``` All lazy refs must eventually be placed. ## 6.3 `place` statement Places an unplaced ref at: - current cursor, or - explicit address if provided ```dsl place t place t at "D4" at "D4": place t ``` ### v0 placement constraints - You can `place` an unplaced ref **once** (second placement is an error). - `place` on already placed symbols is an error. --- # 7. Values, Expressions, and Formula Semantics ## 7.1 Literals - Numbers: `12000`, `-3.5` - Strings: `"Jan"` - Booleans: `true`, `false` - Lists: `[1,2,3]`, nested lists `[[...],[...]]` (only meaningful inside `col()` or `table()`) ## 7.2 Formula-by-default Expressions produce spreadsheet formulas unless they’re pure literals (which can be written as constants). ```dsl profit = revenue - costs ``` This writes an Excel formula referencing the placed cells for `revenue` and `costs`. ## 7.3 Compile-time evaluation `const expr` forces compile-time evaluation and writes the literal result. ```dsl const x = 2 + 2 # writes 4 ``` v0 rules: - `const` only accepts expressions composed solely of literals and operators (`+ - * /`). - `const` cannot reference sheet symbols, `row.*`, `sum`, etc. ## 7.4 Operators (v0) - `+ - * /` - unary `-` - parentheses `( ... )` Comparisons, boolean operators, conditionals are out of v0. --- # 8. Adapter-neutral Functions (v0 common subset) v0 supports only common spreadsheet functions with consistent semantics: - `sum(x)` - `average(x)` - `min(x)` - `max(x)` ### What `x` may be (v0) - a placed `col(...)` range (Referencing table columns as ranges can be added in v1.) ### Adapter neutrality The compiler lowers function calls to IR function IDs: - `SUM`, `AVERAGE`, `MIN`, `MAX` The adapter maps these IDs to concrete formula syntax. v0 ships only XLSX mapping (identical names), but the IR is backend-independent. --- # 9. Columns ## 9.1 Column constructor ```dsl lazy prices = col([10, 12, 11, 14]) lazy qty = col([ 2, 1, 3, 2]) ``` `col([..])` is a **placeable block** that becomes a vertical range when placed. ## 9.2 Elementwise operations If both operands are placed columns of equal length, operations are elementwise: ```dsl line_total = prices * qty ``` ### v0 constraint - Column-vector ops require both sides to be columns of equal length (or column × scalar). Anything else is an error. ## 9.3 Aggregations Aggregation functions require placed ranges: ```dsl at "A1": place line_total total = sum(line_total) ``` --- # 10. Tables ## 10.1 Table constructor ```dsl pnl = table( columns=["Month","Revenue","Costs","Profit"], data=[ ["Jan", 12000, 7000, row.Revenue - row.Costs], ["Feb", 15000, 9000, row.Revenue - row.Costs], ] ) ``` ## 10.2 Rules - `columns` is required - column names must be valid identifiers (v0 restriction) - `data` is required: list of rows - each row length must equal `len(columns)` - `row.<ColumnName>` only valid inside expressions in `data` rows - referencing a column that doesn’t exist is an error ## 10.3 Placement output When placed, the table writes: - header row: column names - data rows: literals and/or formulas `row.Revenue` compiles to the cell in the current row under column `Revenue`. --- # 11. Imports and Exports (Sheet-qualified) A module can export: - a whole sheet alias - specific members from a sheet alias ## 11.1 Exports ### Export whole sheet ```dsl export DataSheet ``` ### Export members from a sheet ```dsl export mrr, last_month_mrr from DataSheet ``` Rules: - `DataSheet` must be a sheet alias defined in this file - exported members must exist in that sheet scope - exporting a name that is an unplaced ref is allowed only if it is placed before compilation completes (v0: treat exporting unplaced as error unless placed) ## 11.2 Imports ### Namespace import ```dsl import "path/to/data.dsl" as Data ``` Access: ```dsl Data.DataSheet.mrr ``` ### From-import (sheet) ```dsl from "path/to/data.dsl" import DataSheet ``` Access: ```dsl DataSheet.mrr ``` ### From-import (sheet with alias) ```dsl from "path/to/data.dsl" import DataSheet as Table ``` Access: ```dsl Table.mrr ``` ### From-import (members from a sheet) ```dsl from "path/to/data.dsl" import mrr, last_month_mrr from DataSheet ``` Access: ```dsl mrr - last_month_mrr ``` ### From-import with alias ```dsl from "path/to/data.dsl" import mrr as current_mrr from DataSheet ``` ## 11.3 Disallowed in v0 Bare member imports without sheet: ```dsl from "data.dsl" import mrr ``` This must be a compile-time error. --- # 12. Validation and Error Cases (v0 must be strict) ## 12.1 Invalid: `row` outside table ```dsl sheet "S" as S: x = row.Revenue ``` Error: `row` is only valid inside `table(... data=[...])` row expressions. ## 12.2 Invalid: placing raw list ```dsl sheet "S" as S: at "A1": place [1,2,3] ``` Error: list literal is not a placeable block; use `col([..])` or `table(...)`. ## 12.3 Invalid: using unplaced ref in expression ```dsl sheet "S" as S: lazy c = col([1,2,3]) x = sum(c) ``` Error: `c` is unplaced; It must be placed at some point in the sheet. It can be referenced before being placed. ## 12.4 Invalid: placing an unplaced ref twice ```dsl sheet "S" as S: lazy t = table(columns=["A"], data=[[1]]) at "A1": place t at "A10": place t ``` Error: unplaced ref `t` already placed. ## 12.5 Invalid: table row length mismatch ```dsl table(columns=["A","B"], data=[[1]]) ``` Error: row has 1 cells, expected 2. --- # 13. Adapter Model (v0: XLSX only) The compiler produces an **adapter-neutral IR** and then calls an adapter to write a file. v0 ships: - XLSX adapter only The codebase must be structured so new adapters can be added without changing the frontend (parser/sema/lower/layout). --- ## Appendix A — Reference Lark Grammar (v0) This is a reference grammar. The actual compiler will be in TypeScript; it should conform to the same syntax rules. Save as `spreadsheet_dsl_v0.lark`: ```lark // Spreadsheet DSL v0 reference grammar ?start: program program: (stmt NEWLINE)* stmt? ?stmt: workbook_decl | sheet_decl | export_stmt | import_stmt | from_import_stmt workbook_decl: "workbook" STRING ":" NEWLINE INDENT (sheet_decl NEWLINE)* sheet_decl? DEDENT sheet_decl: "sheet" STRING "as" NAME ":" NEWLINE INDENT (sheet_stmt NEWLINE)* sheet_stmt? DEDENT ?sheet_stmt: placement_block | assign_stmt | place_stmt placement_block: ("at" STRING | "below" gap_opt? | "right" gap_opt? ) ":" NEWLINE INDENT (sheet_stmt NEWLINE)* sheet_stmt? DEDENT gap_opt: "gap" "=" NUMBER assign_stmt: "lazy" NAME "=" expr -> assign_unplaced | NAME "=" expr -> assign_placed place_stmt: "place" NAME ("at" STRING)? export_stmt: "export" NAME -> export_sheet | "export" name_list "from" NAME -> export_members import_stmt: "import" STRING "as" NAME from_import_stmt: "from" STRING "import" NAME -> from_import_sheet | "from" STRING "import" name_list "from" NAME -> from_import_members | "from" STRING "import" NAME "as" NAME "from" NAME -> from_import_member_alias name_list: NAME ("," NAME)* ?expr: term | expr "+" term -> add | expr "-" term -> sub ?term: factor | term "*" factor -> mul | term "/" factor -> div ?factor: atom | "-" factor -> neg ?atom: literal | NAME -> var | attr | call | list | "(" expr ")" attr: NAME "." NAME -> attr call: NAME "(" [args] ")" args: expr ("," expr)* list: "[" [list_items] "]" list_items: expr ("," expr)* literal: NUMBER -> number | STRING -> string | "true" -> true | "false" -> false %import common.CNAME -> NAME %import common.SIGNED_NUMBER -> NUMBER %import common.ESCAPED_STRING -> STRING %import common.NEWLINE %declare INDENT DEDENT ``` Notes: - This grammar doesn’t encode Option A cursor semantics; those are implemented in the layout engine. - `call` allows any function name; semantic analysis must restrict to allowed built-ins in v0 (`sum`, `average`, `min`, `max`, `col`, `table`, `calc`). --- ## Appendix B — TypeScript Implementation Guide (v0) ### B.1 Repo layout ```text dsl-compiler/ package.json tsconfig.json src/ index.ts cli/ main.ts fs/ loader.ts # resolves imports, reads files path.ts parser/ tokenizer.ts # optional (if not using parser generator) parser.ts ast.ts sema/ scope.ts resolver.ts # name resolution + import/export checking validator.ts # row rules, arity, unplaced usage, etc. errors.ts ir/ ir.ts # adapter-neutral IR types lower.ts # AST -> IR functions.ts # builtin function registry -> IR IDs layout/ placer.ts # Option A semantics + cursor flow address.ts # A1 parsing + conversions sizes.ts # block size computation emit/ adapter.ts # adapter interface xlsx/ xlsxAdapter.ts # ExcelJS wrapper formula.ts # IR expr -> Excel formula string write.ts # write blocks (tables/cols/scalars) util/ graph.ts # dependency ordering if needed assert.ts test/ cases/ valid/ invalid/ runner.test.ts ``` ### B.2 Recommended dependencies - Parsing: choose one - `nearley` + custom lexer, or - `chevrotain` (solid TS parser toolkit), or - PEG (e.g. `peggy`) - XLSX writing: `exceljs` (practical for TS) ### B.3 Core pipeline 1. **Load modules** - `load(entryPath)` reads entry file and recursively resolves `import` and `from` imports. - Build `Module { path, text, ast, exports }`. 2. **Parse** - Parse each module into AST. - Fail with line/column errors. 3. **Build export table** - For each module: - collect sheets (alias → sheet decl) - process `export SheetAlias` - process `export a,b from SheetAlias` 4. **Resolve imports** - `import "x" as Data` creates namespace `Data` with exported sheets/members. - `from "x" import SheetAlias` imports that sheet alias into module scope. - `from "x" import a,b from SheetAlias` imports members into module scope. 5. **Semantic validation** - `row` only inside table row expr slots - unplaced refs cannot be referenced before placement - table row widths match - builtins exist and have correct arity/types - address strings validate (A1) - sheet alias required - exports refer to real sheet vars, and are placed by end (v0) 6. **Lower to IR** - Adapter-neutral IR: - Sheets contain a sequence of operations: - define unplaced ref - place existing ref - place inline value/block (from normal assignment) - Expressions become IR expression trees with symbol refs (not A1) 7. **Layout (Option A semantics)** - Walk each sheet’s ops with: - `cursor` (row, col) - `strategy` (below/right with gap) - a stack for nested placement blocks including `at` - For `at`: - save outer cursor+strategy context - set cursor to absolute - execute nested ops - on exit: set outer cursor to “after last placed in at” advanced using outer strategy 8. **Emit XLSX** - Convert IR expressions into Excel formulas with absolute refs (`$A$1`) where appropriate. - Write cell values and formulas with ExcelJS. ### B.4 IR design (minimal but sufficient) #### IR types (suggested) - `WorkbookIR { sheets: SheetIR[] }` - `SheetIR { title: string, alias: string, ops: OpIR[] }` `OpIR` variants: - `DefineUnplaced { name: string, block: BlockIR }` - `PlaceExisting { name: string, at?: AddressIR }` - `PlaceInline { bindName: string, block: BlockIR, at?: AddressIR }` - `EnterPlacement { kind: "at"|"below"|"right", at?: AddressIR, gap?: number }` - `ExitPlacement` `BlockIR`: - `ScalarBlock { expr: ExprIR }` - `ColumnBlock { values: ExprIR[] }` *(v0 stores literal list; computed columns can be expanded at layout time)* - `TableBlock { columns: string[], rows: ExprIR[][] }` `ExprIR`: - `Lit(number|string|boolean)` - `SymRef(symbolId)` *(resolved)* - `RowRef(columnName)` *(only inside table row exprs)* - `BinOp(op, left, right)` - `FuncCall(funcId, args[])` `funcId` enum: - `SUM`, `AVERAGE`, `MIN`, `MAX` ### B.5 Formula compilation (XLSX adapter) Implement `toExcelFormula(expr, ctx)` where `ctx` provides: - symbol → address/range - current table row base address (for `RowRef`) Rules: - Scalars: `=...` - `SymRef` resolves to a cell reference (`Sheet!$B$2`) or a range reference (`Sheet!$A$2:$A$5`) - `RowRef("Revenue")` resolves to the cell in the same table row at the Revenue column - `FuncCall(SUM, [range])` emits `SUM(<range>)` ### B.6 Placement sizes You must be able to compute `(width,height)` of blocks before writing: - Scalar: `1×1` - Column: `1×N` - Table: `C×(R+1)` where `C = len(columns)` and header row adds 1 - Column elementwise ops: - v0 easiest: expand `prices*qty` into a ColumnBlock of per-row formulas after both operands are placed (or during placement if operands are known) - enforce equal lengths at compile time when both sides are literal columns; otherwise error in v0 ### B.7 Tests: required fixtures Structure tests as “compile should succeed” and “compile should fail with message”. Include at minimum: - nesting `below` + `at` + resume semantics - unplaced ref misuse - export/import resolution - row usage rules - formula correctness for a table row (`row.Revenue - row.Costs`) - vector column ops and `sum` --- ## Appendix C — Valid/Invalid Examples (expanded) ### C.1 Valid: nested `below` with `at` (Option A) ```dsl sheet "Demo" as Demo: below gap=1: a = "A" at "C5": b = "B" c = "C" d = "D" ``` Expected placements (conceptually): - `a` at A1 - `b` at C5 - `c` at C6 - `d` at C7 (flow resumes below the last placed inside `at`) ### C.2 Valid: unplaced table, later placed ```dsl sheet "T" as T: lazy my_table = table(columns=["X","Y"], data=[[1,2],[3,4]]) at "B2": place my_table ``` ### C.3 Invalid: `row` outside table ```dsl sheet "Bad" as Bad: x = row.X ``` ### C.4 Invalid: mismatched row length ```dsl sheet "Bad" as Bad: t = table(columns=["A","B"], data=[[1]]) ``` ### C.5 Valid: unplaced ref used in formula, then placed ```dsl sheet "Bad" as Bad: lazy c = col([1,2,3]) s = sum(c) place c ``` ### C.6 Valid: place then sum ```dsl sheet "Ok" as Ok: lazy c = col([1,2,3]) at "A1": s = sum(c) place c ``` ### C.7 Valid: imports and member from-import **data.dsl** ```dsl sheet "Data" as DataSheet: at "B2": mrr = 42100 last_month_mrr = 39500 export DataSheet export mrr, last_month_mrr from DataSheet ``` **dash.dsl** ```dsl from "data.dsl" import mrr, last_month_mrr from DataSheet sheet "Dash" as Dash: at "B2": delta = mrr - last_month_mrr ``` ### C.8 Invalid: bare import without sheet qualifier ```dsl from "data.dsl" import mrr ``` --- ## Appendix D — End-to-end example (v0) **data.dsl** ```dsl sheet "Data" as DataSheet: at "B2": mrr = 42100 last_month_mrr = 39500 export DataSheet export mrr, last_month_mrr from DataSheet ``` **pnl.dsl** ```dsl sheet "P&L" as PnL: lazy pnl = table( columns=["Month","Revenue","Costs","Profit"], data=[ ["Jan", 12000, 7000, row.Revenue - row.Costs], ["Feb", 15000, 9000, row.Revenue - row.Costs], ["Mar", 18000, 9500, row.Revenue - row.Costs], ] ) at "A1": place pnl ``` **dashboard.dsl** ```dsl import "data.dsl" as Data sheet "Dashboard" as Dash: below gap=1: title = "MRR" at "B2": mrr = Data.DataSheet.mrr delta = Data.DataSheet.mrr - Data.DataSheet.last_month_mrr note = "Values update if Data changes" ```