name: pseudopy
version: 1.0.0
description: Compact Python pseudocode using unicode math symbols for explaining algorithms. Teaches AI assistants to write pseudocode with Greek letters, combining diacritics, and mathematical notation. Includes a transform script to convert runnable Python (with jaxtyping + beartype) into clean pseudocode for papers and slides.

Python Unicode Pseudocode

Compact Python pseudocode for explaining algorithms. Clarity over correctness.

For ML researchers in markdown, comments, and chat. Not for formal papers (use typst lovelace/algo packages) or teaching non-programmers (use traditional pseudocode with IF...THEN...ENDIF).

Principles

Pseudocode is "a mix of programming language conventions with compact mathematical notation, intended for human reading rather than machine control." No broad standard exists. This is mathematical style pseudocode (a.k.a. pidgin Python).

  • For humans, not machines -- not runnable, not compiled, omits implementation details
  • Programming language + math notation -- Python control flow (for, def, @) mixed with unicode math (θ ← θ - α * ∇ℒ)
  • Match the paper -- variable names and notation should look like the equations they implement
  • Skip boilerplate, keep structure -- no imports/.to(device), keep the algorithm shape
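
As a rough illustration of what "not runnable" leaves out, the update θ ← θ - α * ∇ℒ from the list above can be expanded into working Python. This is only a sketch: the toy loss ℒ(θ) = (θ - target)² and the helper names `grad_loss` and `sgd_step` are assumptions made up for this example, not part of the skill.

```python
def grad_loss(θ, target=3.0):
    """Gradient of the toy loss ℒ(θ) = (θ - target)**2 (hypothetical example loss)."""
    return 2.0 * (θ - target)


def sgd_step(θ, g, α=0.1):
    """Runnable form of the pseudocode line  θ ← θ - α * g."""
    return θ - α * g


θ = 0.0
for _ in range(100):
    # Each step shrinks the distance to the minimum by a factor of 0.8.
    θ = sgd_step(θ, grad_loss(θ))

print(θ)
```

The pseudocode keeps only the last line of the loop body; the loss definition, the default step size, and the loop bookkeeping are the "implementation details" it omits.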

Rules

  • Python syntax (def, for, slicing, @) but not runnable
  • Greek unicode for math variables: ε θ λ μ σ δ Σ ∇
  • ← for conceptual assignment, = for definitions
  • Shapes in trailing comments: # x ∈ ℝ^{b×d}
  • # ── section dividers for algorithm steps
  • Short aliases: svd, sol, norm, inv, eig
  • Skip: imports, boilerplate, .to(device), error handling
  • ... for obvious/boring parts
  • Few comments; only when intent is non-obvious
  • Short names: single-letter dims b s h d, suffixes for spaces _proj _hat
  • x̂ ŷ W̃ (combining diacritics) are valid Python identifiers — use them

The whole point is leveraging Python's readability -- for, if, while, def already read like pseudocode. No need to invent a parallel comment language.

Not traditional pseudocode

Traditional pseudocode (VCAA, CLRS, textbooks) uses IF...THEN...ELSE...ENDIF, WHILE...DO...ENDWHILE, SET x TO 5, PRINT. That's for audiences who don't know a programming language. We skip all that because Python's own syntax is already more readable than those keywords.

Invalid unicode to avoid

  • Subscript digits ₁₂₃ -- not valid identifiers, use x[i] or x_1
  • √ -- use sqrt() or **0.5
  • ½ ⅓ -- use 0.5, 1/3
  • Fancy arrows → ⟶ in code -- use ← for assignment, # comments for the rest
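
Which symbols Python accepts as identifiers can be checked with `str.isidentifier()`. The sketch below uses names taken from this document as test cases; the combining diacritics are written as escapes so the code points are unambiguous. One caveat worth knowing: Python normalizes identifiers with NFKC, so a letter-like symbol such as ℒ is treated as the same name as plain `L`.

```python
import unicodedata

# Names this document treats as usable: Greek letters, combining diacritics, ℒ.
valid = ["θ", "ΔB", "x\u0302", "y\u0302", "W\u0303", "ℒ"]
# Symbols from the "avoid" list above, plus the nabla operator.
invalid = ["x₁", "√x", "½", "∇ℒ"]

for name in valid + invalid:
    print(f"{name!r:>8}  isidentifier={name.isidentifier()}")

# Identifiers are NFKC-normalized by the compiler, so ℒ and L collide.
normalized = unicodedata.normalize("NFKC", "ℒ")
namespace = {}
exec("ℒ = 1", namespace)
print(normalized, "L" in namespace)
```

In pseudocode the collision does not matter, but in the executable version a file that uses both ℒ and `L` would silently share one variable.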

Examples

# ── Forward ─────────────────
def predict(x, θ):
    z = encode(x)              # z ∈ ℝ^{b×d}
    ŷ = softmax(z @ θ.W + θ.b)
    return ŷ

# ── Loss ────────────────────
ℒ = -mean(log(ŷ[y]))          # cross-entropy
g = ∇(ℒ, θ)
θ ← θ - α * g

for batch in batches:
    W̃ ← A @ (B + δ * ΔB)

def svd_steer(W, δ):
    U, s, Vt = svd(W)          # W ∈ ℝ^{m×n}
    R = init_rotation(r)       # R ∈ SO(r)
    A, B = U @ R, R.T @ diag(s) @ Vt
    W̃ ← A @ (B + δ * ΔB)       # δ ∈ {-1,+1}
    return W̃

def kl(p, q):  return sum(p * log(p / q))
def H(p):      return -sum(p * log(p))
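
For comparison, here is one way the two one-liners above might look once the omitted details are put back. It is a sketch for discrete distributions given as plain lists, and the zero-probability handling is an assumption added for the runnable version, not something the pseudocode specifies.

```python
from math import log


def kl(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists."""
    # Terms with p_i == 0 contribute 0 by convention, so they are skipped.
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def H(p):
    """Shannon entropy in nats, skipping zero-probability outcomes."""
    return -sum(pi * log(pi) for pi in p if pi > 0)


p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl(p, q), H(p))
```

The pseudocode version drops the element-wise loop and the zero guard because neither changes how a reader understands the formula.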

Advanced: executable pseudocode

Since our pseudocode is just Python with details removed, you can write runnable Python using unicode notation, then strip it down to pseudocode automatically. Benefits:

  • The pseudocode is generated from code that actually ran, so it stays in sync with a working implementation
  • You can lint, test, and type-check the full version
  • Two views of the same code: full (executable) and pseudocode (readable)

Conventions for the full version

  • Use unicode variable names from the start: θ, ℒ, ŷ, W̃, ΔB
  • Use jaxtyping for verifiable shape annotations: W: Float[Tensor, "m n"]
  • Add # ── section dividers
  • Use explicit imports (from torch.linalg import svd) so code already has short names
  • Use comment markers to control the transform:
    • # hide -- drop this line or block (works on def, for, if, etc.)
    • # to `R = init_rotation(r)` -- replace line with custom pseudocode
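
Because the shapes live in annotations, they can be read back mechanically without importing jaxtyping at all. The sketch below uses the standard `ast` module to collapse an annotated signature into a one-line header with a dims comment. The function names `dims` and `collapse_signature` are hypothetical, this is not the code of the skill's transform script, and it assumes Python 3.9+ (where `ast.Subscript.slice` holds the tuple directly).

```python
import ast


def dims(annotation):
    """Return 'm×n' for an annotation like Float[Tensor, "m n"], else None."""
    if isinstance(annotation, ast.Subscript) and isinstance(annotation.slice, ast.Tuple):
        last = annotation.slice.elts[-1]
        if isinstance(last, ast.Constant) and isinstance(last.value, str):
            return "×".join(last.value.split())
    return None


def collapse_signature(source):
    """Turn the first function in `source` into a bare one-line def header."""
    fn = ast.parse(source).body[0]
    names = [a.arg for a in fn.args.args]
    notes = [f"{a.arg}: {dims(a.annotation)}" for a in fn.args.args if dims(a.annotation)]
    if dims(fn.returns):
        notes.append(f"-> {dims(fn.returns)}")
    header = f"def {fn.name}({', '.join(names)}):"
    return header + (f"  # {', '.join(notes)}" if notes else "")


example = '''
@jaxtyped(typechecker=beartype)
def svd_steer(
    W: Float[Tensor, "m n"],
    δ: float,
    r: int = 4,
) -> Float[Tensor, "m n"]:
    ...
'''
print(collapse_signature(example))
```

Parsing with `ast` never evaluates the annotations, which is why the example string needs no imports of its own.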

Example: full runnable Python

import torch
from torch import Tensor, randn_like, diag
from torch.linalg import svd
from torch.nn.init import orthogonal_
from jaxtyping import Float, jaxtyped
from beartype import beartype

def init_rotation(r: int) -> Tensor:  # hide
    """Helper: random orthogonal matrix."""
    return orthogonal_(torch.empty(r, r))

# ── SVD Steering ────────────────────
@jaxtyped(typechecker=beartype)
def svd_steer(
    W: Float[Tensor, "m n"],
    δ: float,
    r: int = 4,
) -> Float[Tensor, "m n"]:
    """Steer weight matrix W by rotating its SVD basis (U @ R requires r == min(m, n))."""
    U, s, Vt = svd(W, full_matrices=False)
    R = init_rotation(r)
    A = U @ R
    B = R.T @ diag(s) @ Vt
    ΔB = randn_like(B)
    W̃ = A @ (B + δ * ΔB)
    return W̃

Transform to pseudocode

Running python to_pseudocode.py example_full.py automatically strips imports, annotations, defaults, decorators, asserts, docstrings, __main__ blocks, and implementation kwargs. It also collapses a multi-line def to one line and extracts jaxtyping dims into inline comments. Two markers give the user control:

  • # hide on a line: drop that line (if on def/for/if/while, drops entire block)
  • # to `pseudocode`: replace the line with the backtick-quoted content

Output:

# ── SVD Steering ────────────────────
def svd_steer(W, δ, r):  # W: m×n, -> m×n
    U, s, Vt = svd(W)
    R = init_rotation(r)
    A = U @ R
    B = R.T @ diag(s) @ Vt
    ΔB = randn_like(B)
    W̃ = A @ (B + δ * ΔB)
    return W̃
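
The two user-controlled markers could be handled by a simple line-based pass like the one below. This is a minimal sketch of the described behavior, not the actual to_pseudocode.py (which is not shown here). The name `apply_markers` is hypothetical, and the sketch assumes a hidden block header fits on one line and that blocks are delimited by indentation only.

```python
import re

TO_MARKER = re.compile(r"#\s*to\s*`([^`]*)`\s*$")
BLOCK_START = re.compile(r"^(def|for|if|while|class|with)\b")


def apply_markers(source):
    """Apply `# hide` and `# to ...` markers to Python source text."""
    out = []
    skip_indent = None  # indent of a hidden block header while its body is skipped
    for line in source.splitlines():
        stripped = line.strip()
        indent = len(line) - len(line.lstrip())
        if skip_indent is not None:
            # Blank lines and deeper-indented lines still belong to the hidden block.
            if not stripped or indent > skip_indent:
                continue
            skip_indent = None
        if stripped.endswith("# hide"):
            if BLOCK_START.match(stripped):
                skip_indent = indent  # drop the whole block, not just the header
            continue
        match = TO_MARKER.search(line)
        if match:
            # Replace the line with the backtick-quoted text, keeping its indent.
            out.append(" " * indent + match.group(1))
            continue
        out.append(line)
    return "\n".join(out)


demo = (
    "def helper(r):  # hide\n"
    "    return r\n"
    "\n"
    "def f(W):\n"
    "    R = make(W, seed=0)  # to `R = init_rotation(r)`\n"
    "    return R"
)
print(apply_markers(demo))
```

A line-based pass is enough for the markers because they are comments; stripping annotations and decorators is better done on the syntax tree.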

Linting the full version

The full Python file should pass normal linting. Unicode identifiers are valid Python 3:

# check syntax
python -c "import ast; ast.parse(open('example_full.py').read())"
# run ruff
ruff check example_full.py --ignore F401,F811
# run as test (verifies shapes at runtime)
python example_full.py
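
The syntax check above can also be run from Python when a readable error location is wanted. The helper name `check_syntax` is an assumption for this sketch. It reports the first syntax error, which is how an invalid symbol such as a subscript digit in a name would show up.

```python
import ast


def check_syntax(source):
    """Return None if `source` parses, else a short 'line N: message' string."""
    try:
        ast.parse(source)
    except SyntaxError as err:
        return f"line {err.lineno}: {err.msg}"
    return None


good = "θ = 1\ny\u0302 = θ * 2\n"   # Greek letter and combining circumflex: fine
bad = "x₁ = 1\n"                     # subscript digit: rejected by the parser

print(check_syntax(good))
print(check_syntax(bad))
```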