# Spreadsheet DSL v0 Specification
A Python-style DSL for generating spreadsheets reliably (especially from LLMs). v0 targets **XLSX output**, but is designed around an **adapter-neutral IR** so other backends (e.g., Numbers, Google Sheets) can be added later.
## v0 goals
- Deterministic layout with **implicit flow** + **explicit anchors**
- “Formula-by-default” semantics (spreadsheet recalculation)
- Unplaced “refs” using `*`, placed later with `place`
- Sheet-qualified imports/exports (no ambiguous bare imports)
- Small set of adapter-neutral functions: `sum`, `average`, `min`, `max`
- Strict validation with actionable errors
---
# 1. Program and Compilation Model
## 1.1 Files are modules (no module header)
Every `.dsl` file is an implicit module. There is **no module declaration**.
## 1.2 Whole-program compilation (required)
Given an entry file:
1. Parse entry file → discover imports
2. Parse all imported files recursively
3. Build export tables for each module
4. Resolve imports in the entry module
5. Lower AST → adapter-neutral workbook IR
6. Layout: place blocks and resolve addresses
7. Emit via adapter (v0: XLSX)
**Line order affects only placement flow** (cursor movement) inside a sheet. It must not affect name resolution (imports/exports).
---
# 2. Lexical and Naming Rules
## 2.1 Identifiers
Identifiers match:
- `NAME := [A-Za-z_][A-Za-z0-9_]*`
## 2.2 Strings and numbers
- Strings: double-quoted only (`"..."`)
- Numbers: signed/unsigned, integer/float
## 2.3 Cell addresses
Cell addresses are strings, validated at compile time:
- `"A1"`, `"B2"`, `"AA10"` etc.
(v0 does **not** support `"B"` column-only anchors.)
---
# 3. Scopes and Symbols
## 3.1 Scopes
| Scope | Contains | Notes |
|------|----------|------|
| Module | imports, sheets, exports | Import aliases live here |
| Sheet | sheet variables | Variables bind to blocks or scalars |
| Table-row | `row` pseudo-variable | Only valid inside table row formula expressions |
## 3.2 Symbol states (important for correctness)
Every symbol in a sheet is one of:
- **Placed**: has a concrete cell/range address
- **Unplaced ref**: declared with `*`, not yet placed
- **Non-placeable value**: (v0 mostly avoids this)
Rules:
- Referencing an **unplaced ref** in an expression is an error (it has no address).
- An unplaced ref may be **placed exactly once** in v0.
---
# 4. Workbooks and Sheets
## 4.1 Workbook block (optional)
```dsl
workbook "Finance Pack":
sheet "Data" as DataSheet:
...
sheet "Dashboard" as Dash:
...
```
If `workbook` is omitted, sheets still form an implicit workbook.
## 4.2 Sheet declaration (required alias)
```dsl
sheet "Dashboard" as Dash:
...
```
- Sheet title is user-facing name in Excel
- Sheet alias is stable identifier for imports/exports and compiler internals
---
# 5. Placement, Layout, and Cursor Semantics
The language places **blocks** (cells, columns, tables). Each sheet maintains a **cursor**.
## 5.1 Cursor
- Initial cursor is `A1`
- Cursor position is maintained per sheet
## 5.2 Placement strategies
A placement strategy determines how the cursor advances after placing a block:
- **below**: move down by `(block.height + gap)`
- **right**: move right by `(block.width + gap)`
Default strategy in a plain sheet body: **below gap=1**
## 5.3 Placement contexts
### `at "B2":`
Temporarily sets the cursor to an absolute address **for the duration of the block**.
```dsl
at "B2":
title = "MRR"
mrr = 42100
```
### `below gap=1:`
Sets strategy to *below* within the block.
```dsl
below gap=2:
a = "One"
b = "Two"
```
### `right gap=1:`
Sets strategy to *right* within the block.
```dsl
right gap=1:
a = "Left"
b = "Right"
```
## 5.4 Option A semantics (selected): `at` is a temporary teleport, flow resumes
This resolves the ambiguity you pointed out.
**Rule:**
1. `below/right` define the **active placement strategy** within their block.
2. `at` sets the cursor **absolutely** for its nested block.
3. When an `at` block ends, the cursor becomes the position **immediately after the last placement inside `at`**, advanced using the **surrounding strategy**.
### Example: meaningful nesting
```dsl
sheet "S" as S:
below gap=1:
a = "Top" # A1
at "B6":
b = "Here" # B6
c = "Below Here" # B7
d = "Continues" # B8 (below c, because surrounding strategy is below)
```
If an `at` block places nothing, it has no effect other than validation.
---
# 6. Binding: Placed vs Unplaced (`*`) and `place`
## 6.1 Placed binding (default)
In a sheet or placement block, `x = <placeable>` places the object at the current cursor.
```dsl
sheet "S" as S:
x = 10 # writes literal 10 at A1
y = x + 5 # writes formula at A2
```
## 6.2 Virtual binding
`lazy name = <placeable>` defines an **unplaced ref**; nothing is written yet.
```dsl
sheet "S" as S:
lazy t = table(columns=["A"], data=[[1]])
at "D4":
place t
```
All lazy refs must eventually be placed.
## 6.3 `place` statement
Places an unplaced ref at:
- current cursor, or
- explicit address if provided
```dsl
place t
place t at "D4"
at "D4":
place t
```
### v0 placement constraints
- You can `place` an unplaced ref **once** (second placement is an error).
- `place` on already placed symbols is an error.
---
# 7. Values, Expressions, and Formula Semantics
## 7.1 Literals
- Numbers: `12000`, `-3.5`
- Strings: `"Jan"`
- Booleans: `true`, `false`
- Lists: `[1,2,3]`, nested lists `[[...],[...]]` (only meaningful inside `col()` or `table()`)
## 7.2 Formula-by-default
Expressions produce spreadsheet formulas unless they’re pure literals (which can be written as constants).
```dsl
profit = revenue - costs
```
This writes an Excel formula referencing the placed cells for `revenue` and `costs`.
## 7.3 Compile-time evaluation
`const expr` forces compile-time evaluation and writes the literal result.
```dsl
const x = 2 + 2 # writes 4
```
v0 rules:
- `const` only accepts expressions composed solely of literals and operators (`+ - * /`).
- `const` cannot reference sheet symbols, `row.*`, `sum`, etc.
## 7.4 Operators (v0)
- `+ - * /`
- unary `-`
- parentheses `( ... )`
Comparisons, boolean operators, conditionals are out of v0.
---
# 8. Adapter-neutral Functions (v0 common subset)
v0 supports only common spreadsheet functions with consistent semantics:
- `sum(x)`
- `average(x)`
- `min(x)`
- `max(x)`
### What `x` may be (v0)
- a placed `col(...)` range
(Referencing table columns as ranges can be added in v1.)
### Adapter neutrality
The compiler lowers function calls to IR function IDs:
- `SUM`, `AVERAGE`, `MIN`, `MAX`
The adapter maps these IDs to concrete formula syntax. v0 ships only XLSX mapping (identical names), but the IR is backend-independent.
---
# 9. Columns
## 9.1 Column constructor
```dsl
lazy prices = col([10, 12, 11, 14])
lazy qty = col([ 2, 1, 3, 2])
```
`col([..])` is a **placeable block** that becomes a vertical range when placed.
## 9.2 Elementwise operations
If both operands are placed columns of equal length, operations are elementwise:
```dsl
line_total = prices * qty
```
### v0 constraint
- Column-vector ops require both sides to be columns of equal length (or column × scalar). Anything else is an error.
## 9.3 Aggregations
Aggregation functions require placed ranges:
```dsl
at "A1":
place line_total
total = sum(line_total)
```
---
# 10. Tables
## 10.1 Table constructor
```dsl
pnl = table(
columns=["Month","Revenue","Costs","Profit"],
data=[
["Jan", 12000, 7000, row.Revenue - row.Costs],
["Feb", 15000, 9000, row.Revenue - row.Costs],
]
)
```
## 10.2 Rules
- `columns` is required
- column names must be valid identifiers (v0 restriction)
- `data` is required: list of rows
- each row length must equal `len(columns)`
- `row.<ColumnName>` only valid inside expressions in `data` rows
- referencing a column that doesn’t exist is an error
## 10.3 Placement output
When placed, the table writes:
- header row: column names
- data rows: literals and/or formulas
`row.Revenue` compiles to the cell in the current row under column `Revenue`.
---
# 11. Imports and Exports (Sheet-qualified)
A module can export:
- a whole sheet alias
- specific members from a sheet alias
## 11.1 Exports
### Export whole sheet
```dsl
export DataSheet
```
### Export members from a sheet
```dsl
export mrr, last_month_mrr from DataSheet
```
Rules:
- `DataSheet` must be a sheet alias defined in this file
- exported members must exist in that sheet scope
- exporting a name that is an unplaced ref is allowed only if it is placed before compilation completes (v0: treat exporting unplaced as error unless placed)
## 11.2 Imports
### Namespace import
```dsl
import "path/to/data.dsl" as Data
```
Access:
```dsl
Data.DataSheet.mrr
```
### From-import (sheet)
```dsl
from "path/to/data.dsl" import DataSheet
```
Access:
```dsl
DataSheet.mrr
```
### From-import (sheet with alias)
```dsl
from "path/to/data.dsl" import DataSheet as Table
```
Access:
```dsl
Table.mrr
```
### From-import (members from a sheet)
```dsl
from "path/to/data.dsl" import mrr, last_month_mrr from DataSheet
```
Access:
```dsl
mrr - last_month_mrr
```
### From-import with alias
```dsl
from "path/to/data.dsl" import mrr as current_mrr from DataSheet
```
## 11.3 Disallowed in v0
Bare member imports without sheet:
```dsl
from "data.dsl" import mrr
```
This must be a compile-time error.
---
# 12. Validation and Error Cases (v0 must be strict)
## 12.1 Invalid: `row` outside table
```dsl
sheet "S" as S:
x = row.Revenue
```
Error: `row` is only valid inside `table(... data=[...])` row expressions.
## 12.2 Invalid: placing raw list
```dsl
sheet "S" as S:
at "A1":
place [1,2,3]
```
Error: list literal is not a placeable block; use `col([..])` or `table(...)`.
## 12.3 Invalid: using unplaced ref in expression
```dsl
sheet "S" as S:
lazy c = col([1,2,3])
x = sum(c)
```
Error: `c` is unplaced; It must be placed at some point in the sheet. It can be referenced before being placed.
## 12.4 Invalid: placing an unplaced ref twice
```dsl
sheet "S" as S:
lazy t = table(columns=["A"], data=[[1]])
at "A1": place t
at "A10": place t
```
Error: unplaced ref `t` already placed.
## 12.5 Invalid: table row length mismatch
```dsl
table(columns=["A","B"], data=[[1]])
```
Error: row has 1 cells, expected 2.
---
# 13. Adapter Model (v0: XLSX only)
The compiler produces an **adapter-neutral IR** and then calls an adapter to write a file.
v0 ships:
- XLSX adapter only
The codebase must be structured so new adapters can be added without changing the frontend (parser/sema/lower/layout).
---
## Appendix A — Reference Lark Grammar (v0)
This is a reference grammar. The actual compiler will be in TypeScript; it should conform to the same syntax rules.
Save as `spreadsheet_dsl_v0.lark`:
```lark
// Spreadsheet DSL v0 reference grammar
?start: program
program: (stmt NEWLINE)* stmt?
?stmt: workbook_decl
| sheet_decl
| export_stmt
| import_stmt
| from_import_stmt
workbook_decl: "workbook" STRING ":" NEWLINE INDENT (sheet_decl NEWLINE)* sheet_decl? DEDENT
sheet_decl: "sheet" STRING "as" NAME ":" NEWLINE INDENT (sheet_stmt NEWLINE)* sheet_stmt? DEDENT
?sheet_stmt: placement_block
| assign_stmt
| place_stmt
placement_block: ("at" STRING
| "below" gap_opt?
| "right" gap_opt?
) ":" NEWLINE INDENT (sheet_stmt NEWLINE)* sheet_stmt? DEDENT
gap_opt: "gap" "=" NUMBER
assign_stmt: "lazy" NAME "=" expr -> assign_unplaced
| NAME "=" expr -> assign_placed
place_stmt: "place" NAME ("at" STRING)?
export_stmt: "export" NAME -> export_sheet
| "export" name_list "from" NAME -> export_members
import_stmt: "import" STRING "as" NAME
from_import_stmt: "from" STRING "import" NAME -> from_import_sheet
| "from" STRING "import" name_list "from" NAME -> from_import_members
| "from" STRING "import" NAME "as" NAME "from" NAME -> from_import_member_alias
name_list: NAME ("," NAME)*
?expr: term
| expr "+" term -> add
| expr "-" term -> sub
?term: factor
| term "*" factor -> mul
| term "/" factor -> div
?factor: atom
| "-" factor -> neg
?atom: literal
| NAME -> var
| attr
| call
| list
| "(" expr ")"
attr: NAME "." NAME -> attr
call: NAME "(" [args] ")"
args: expr ("," expr)*
list: "[" [list_items] "]"
list_items: expr ("," expr)*
literal: NUMBER -> number
| STRING -> string
| "true" -> true
| "false" -> false
%import common.CNAME -> NAME
%import common.SIGNED_NUMBER -> NUMBER
%import common.ESCAPED_STRING -> STRING
%import common.NEWLINE
%declare INDENT DEDENT
```
Notes:
- This grammar doesn’t encode Option A cursor semantics; those are implemented in the layout engine.
- `call` allows any function name; semantic analysis must restrict to allowed built-ins in v0 (`sum`, `average`, `min`, `max`, `col`, `table`, `calc`).
---
## Appendix B — TypeScript Implementation Guide (v0)
### B.1 Repo layout
```text
dsl-compiler/
package.json
tsconfig.json
src/
index.ts
cli/
main.ts
fs/
loader.ts # resolves imports, reads files
path.ts
parser/
tokenizer.ts # optional (if not using parser generator)
parser.ts
ast.ts
sema/
scope.ts
resolver.ts # name resolution + import/export checking
validator.ts # row rules, arity, unplaced usage, etc.
errors.ts
ir/
ir.ts # adapter-neutral IR types
lower.ts # AST -> IR
functions.ts # builtin function registry -> IR IDs
layout/
placer.ts # Option A semantics + cursor flow
address.ts # A1 parsing + conversions
sizes.ts # block size computation
emit/
adapter.ts # adapter interface
xlsx/
xlsxAdapter.ts # ExcelJS wrapper
formula.ts # IR expr -> Excel formula string
write.ts # write blocks (tables/cols/scalars)
util/
graph.ts # dependency ordering if needed
assert.ts
test/
cases/
valid/
invalid/
runner.test.ts
```
### B.2 Recommended dependencies
- Parsing: choose one
- `nearley` + custom lexer, or
- `chevrotain` (solid TS parser toolkit), or
- PEG (e.g. `peggy`)
- XLSX writing: `exceljs` (practical for TS)
### B.3 Core pipeline
1. **Load modules**
- `load(entryPath)` reads entry file and recursively resolves `import` and `from` imports.
- Build `Module { path, text, ast, exports }`.
2. **Parse**
- Parse each module into AST.
- Fail with line/column errors.
3. **Build export table**
- For each module:
- collect sheets (alias → sheet decl)
- process `export SheetAlias`
- process `export a,b from SheetAlias`
4. **Resolve imports**
- `import "x" as Data` creates namespace `Data` with exported sheets/members.
- `from "x" import SheetAlias` imports that sheet alias into module scope.
- `from "x" import a,b from SheetAlias` imports members into module scope.
5. **Semantic validation**
- `row` only inside table row expr slots
- unplaced refs cannot be referenced before placement
- table row widths match
- builtins exist and have correct arity/types
- address strings validate (A1)
- sheet alias required
- exports refer to real sheet vars, and are placed by end (v0)
6. **Lower to IR**
- Adapter-neutral IR:
- Sheets contain a sequence of operations:
- define unplaced ref
- place existing ref
- place inline value/block (from normal assignment)
- Expressions become IR expression trees with symbol refs (not A1)
7. **Layout (Option A semantics)**
- Walk each sheet’s ops with:
- `cursor` (row, col)
- `strategy` (below/right with gap)
- a stack for nested placement blocks including `at`
- For `at`:
- save outer cursor+strategy context
- set cursor to absolute
- execute nested ops
- on exit: set outer cursor to “after last placed in at” advanced using outer strategy
8. **Emit XLSX**
- Convert IR expressions into Excel formulas with absolute refs (`$A$1`) where appropriate.
- Write cell values and formulas with ExcelJS.
### B.4 IR design (minimal but sufficient)
#### IR types (suggested)
- `WorkbookIR { sheets: SheetIR[] }`
- `SheetIR { title: string, alias: string, ops: OpIR[] }`
`OpIR` variants:
- `DefineUnplaced { name: string, block: BlockIR }`
- `PlaceExisting { name: string, at?: AddressIR }`
- `PlaceInline { bindName: string, block: BlockIR, at?: AddressIR }`
- `EnterPlacement { kind: "at"|"below"|"right", at?: AddressIR, gap?: number }`
- `ExitPlacement`
`BlockIR`:
- `ScalarBlock { expr: ExprIR }`
- `ColumnBlock { values: ExprIR[] }` *(v0 stores literal list; computed columns can be expanded at layout time)*
- `TableBlock { columns: string[], rows: ExprIR[][] }`
`ExprIR`:
- `Lit(number|string|boolean)`
- `SymRef(symbolId)` *(resolved)*
- `RowRef(columnName)` *(only inside table row exprs)*
- `BinOp(op, left, right)`
- `FuncCall(funcId, args[])`
`funcId` enum:
- `SUM`, `AVERAGE`, `MIN`, `MAX`
### B.5 Formula compilation (XLSX adapter)
Implement `toExcelFormula(expr, ctx)` where `ctx` provides:
- symbol → address/range
- current table row base address (for `RowRef`)
Rules:
- Scalars: `=...`
- `SymRef` resolves to a cell reference (`Sheet!$B$2`) or a range reference (`Sheet!$A$2:$A$5`)
- `RowRef("Revenue")` resolves to the cell in the same table row at the Revenue column
- `FuncCall(SUM, [range])` emits `SUM(<range>)`
### B.6 Placement sizes
You must be able to compute `(width,height)` of blocks before writing:
- Scalar: `1×1`
- Column: `1×N`
- Table: `C×(R+1)` where `C = len(columns)` and header row adds 1
- Column elementwise ops:
- v0 easiest: expand `prices*qty` into a ColumnBlock of per-row formulas after both operands are placed (or during placement if operands are known)
- enforce equal lengths at compile time when both sides are literal columns; otherwise error in v0
### B.7 Tests: required fixtures
Structure tests as “compile should succeed” and “compile should fail with message”.
Include at minimum:
- nesting `below` + `at` + resume semantics
- unplaced ref misuse
- export/import resolution
- row usage rules
- formula correctness for a table row (`row.Revenue - row.Costs`)
- vector column ops and `sum`
---
## Appendix C — Valid/Invalid Examples (expanded)
### C.1 Valid: nested `below` with `at` (Option A)
```dsl
sheet "Demo" as Demo:
below gap=1:
a = "A"
at "C5":
b = "B"
c = "C"
d = "D"
```
Expected placements (conceptually):
- `a` at A1
- `b` at C5
- `c` at C6
- `d` at C7 (flow resumes below the last placed inside `at`)
### C.2 Valid: unplaced table, later placed
```dsl
sheet "T" as T:
lazy my_table = table(columns=["X","Y"], data=[[1,2],[3,4]])
at "B2":
place my_table
```
### C.3 Invalid: `row` outside table
```dsl
sheet "Bad" as Bad:
x = row.X
```
### C.4 Invalid: mismatched row length
```dsl
sheet "Bad" as Bad:
t = table(columns=["A","B"], data=[[1]])
```
### C.5 Valid: unplaced ref used in formula, then placed
```dsl
sheet "Bad" as Bad:
lazy c = col([1,2,3])
s = sum(c)
place c
```
### C.6 Valid: place then sum
```dsl
sheet "Ok" as Ok:
lazy c = col([1,2,3])
at "A1":
s = sum(c)
place c
```
### C.7 Valid: imports and member from-import
**data.dsl**
```dsl
sheet "Data" as DataSheet:
at "B2":
mrr = 42100
last_month_mrr = 39500
export DataSheet
export mrr, last_month_mrr from DataSheet
```
**dash.dsl**
```dsl
from "data.dsl" import mrr, last_month_mrr from DataSheet
sheet "Dash" as Dash:
at "B2":
delta = mrr - last_month_mrr
```
### C.8 Invalid: bare import without sheet qualifier
```dsl
from "data.dsl" import mrr
```
---
## Appendix D — End-to-end example (v0)
**data.dsl**
```dsl
sheet "Data" as DataSheet:
at "B2":
mrr = 42100
last_month_mrr = 39500
export DataSheet
export mrr, last_month_mrr from DataSheet
```
**pnl.dsl**
```dsl
sheet "P&L" as PnL:
lazy pnl = table(
columns=["Month","Revenue","Costs","Profit"],
data=[
["Jan", 12000, 7000, row.Revenue - row.Costs],
["Feb", 15000, 9000, row.Revenue - row.Costs],
["Mar", 18000, 9500, row.Revenue - row.Costs],
]
)
at "A1":
place pnl
```
**dashboard.dsl**
```dsl
import "data.dsl" as Data
sheet "Dashboard" as Dash:
below gap=1:
title = "MRR"
at "B2":
mrr = Data.DataSheet.mrr
delta = Data.DataSheet.mrr - Data.DataSheet.last_month_mrr
note = "Values update if Data changes"
```