adewale/bobbin

Bobbin

A searchable, browsable archive of Alex Komoroske's Bits and Bobs weekly observations, built on Cloudflare Workers.

Live: bobbin.adewale-883.workers.dev

What it does

Komoroske publishes 30–70 standalone observations each week about AI, software, ecosystems, and technology. Bobbin ingests this content from Google Docs, parses it into individual observations, and provides:

  • Hybrid search — FTS5 + Vectorize semantic search with before:, after:, year:, topic:, and "exact phrase" operators
  • Archive browsing — grouped episode browsing plus topic pages that show how attention shifts across the corpus
  • Source fidelity — chunk and episode detail pages render stored rich-content artifacts, footnotes, links, and images directly
  • Shared editorial UI — home, episodes, topics, and design surfaces all reuse the same layout, section-heading, and rail patterns
  • Local verification workflow — fixture seeding, browser-level checks, and computed-style audits for local development
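The README does not say how FTS5 and Vectorize results are merged into a single hybrid ranking. One common technique is reciprocal rank fusion (RRF), which is sketched below. Treat it as an illustration under that assumption, not Bobbin's ranking code. The chunk IDs and the constant k = 60 are placeholders.

```typescript
// Reciprocal rank fusion (RRF), sketched as one way to merge an FTS5 keyword
// ranking with a Vectorize semantic ranking. The technique and k = 60 are
// assumptions for illustration, not Bobbin's documented ranking.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60,
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      // Rank is 1-based, so the top hit in a list contributes 1 / (k + 1).
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return (
    Array.from(scores, ([id, score]) => ({ id, score }))
      // Highest fused score first; ties broken by ID for deterministic output.
      .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id))
  );
}

// Usage with hypothetical chunk IDs, each list ordered best match first.
const keywordHits = ["c7", "c2", "c9"];
const semanticHits = ["c2", "c4", "c7"];
console.log(reciprocalRankFusion([keywordHits, semanticHits]).map((r) => r.id).join(", "));
// prints: c2, c7, c4, c9
```

Here c2 ranks first because it sits near the top of both lists, even though c7 is the best keyword hit. RRF uses only rank positions, so BM25-style keyword scores and cosine similarities never have to be calibrated against each other.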

Quick start

git clone https://github.com/adewale/bobbin.git
cd bobbin
npm install

Create Cloudflare resources:

npx wrangler d1 create bobbin-db
npx wrangler vectorize create bobbin-chunks --dimensions 768 --metric cosine

Update wrangler.jsonc with your database ID, then:

npx wrangler d1 migrations apply bobbin-db --local
npm run dev

Local development workflow

For browser-based local development, use the real app config and the canonical local fixture:

npm run fixture:local   # seeds a full local corpus + rail demo into the local D1
npm run dev:9090        # starts the app on http://localhost:9090

The fixture script prints a set of recommended URLs that exercise the main user-visible surfaces:

  • home
  • episodes index
  • episode rail demo
  • chunk detail and source-fidelity pages
  • topics index and topic detail
  • search
  • design inventory

There is also a repeatable computed-style/browser audit for the current local app:

npm run audit:computed

If the local database is empty, the app will show an in-product setup hint that points back to npm run fixture:local.

For authenticated remote maintenance against the deployed worker, use:

BASE_URL="https://bobbin.adewale-883.workers.dev" \
ADMIN_SECRET="..." \
npm run maintenance:remote -- ingest-doc <doc-id> 100

The same script supports refresh, enrich, finalize, purge-source, backfill-source, and backfill-llm.

Trusted-source policy:

  • refresh and admin ingest only operate on doc IDs in the checked-in trusted source registry
  • unknown doc IDs are rejected instead of being auto-registered
  • purge-source removes an already-ingested source by doc ID when provenance audit finds contamination
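The trusted-source policy is an allowlist lookup that fails closed. The following is a minimal sketch. The registry entry shape (`{ docId, label }`), `requireTrustedSource`, and `UntrustedSourceError` are all hypothetical, and the checked-in registry's real format may differ.

```typescript
// Hypothetical shape of an entry in the checked-in trusted source registry.
interface TrustedSource {
  docId: string;
  label: string;
}

class UntrustedSourceError extends Error {
  readonly docId: string;
  constructor(docId: string) {
    super(`Refusing to operate on untrusted doc ID: ${docId}`);
    this.name = "UntrustedSourceError";
    this.docId = docId;
  }
}

// Fails closed: an unknown doc ID throws instead of being auto-registered.
function requireTrustedSource(docId: string, registry: readonly TrustedSource[]): TrustedSource {
  const match = registry.find((source) => source.docId === docId);
  if (match === undefined) {
    throw new UntrustedSourceError(docId);
  }
  return match;
}

// Usage with a placeholder doc ID (not a real Google Doc ID).
const registry: readonly TrustedSource[] = [{ docId: "doc-2024", label: "Bits and Bobs 2024" }];
console.log(requireTrustedSource("doc-2024", registry).label); // prints: Bits and Bobs 2024
```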

All state-changing admin endpoints are POST-only and require the Authorization: Bearer <ADMIN_SECRET> header; read-only endpoints (/api/health, /api/pipeline-runs, /api/ingestion-log) stay GET.
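The POST-only, bearer-token rule for state-changing endpoints can be written as a small guard over the standard `Request`/`Response` objects available in Workers and Node 18+. This is a sketch, not the repository's middleware. The name `guardAdminRequest`, the `/api/admin/refresh` path, and the status codes are assumptions. The constant-time comparison is an extra precaution that the README does not mention.

```typescript
// Hypothetical guard for state-changing admin endpoints. Returns a Response to
// send back immediately, or null when the request may proceed to the handler.
function guardAdminRequest(request: Request, adminSecret: string): Response | null {
  if (!adminSecret) {
    // Refuse to run with an empty secret rather than accepting "Bearer ".
    return new Response("Admin secret not configured", { status: 503 });
  }
  if (request.method !== "POST") {
    return new Response("Method Not Allowed", { status: 405, headers: { Allow: "POST" } });
  }
  const header = request.headers.get("Authorization") ?? "";
  if (!timingSafeEqual(header, `Bearer ${adminSecret}`)) {
    return new Response("Unauthorized", { status: 401 });
  }
  return null;
}

// Examines every character so response time does not reveal how much of the
// token matched. The length check still leaks length, which this sketch accepts.
function timingSafeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// Usage with a placeholder secret and a hypothetical admin path.
const sampleRequest = new Request("https://example.com/api/admin/refresh", {
  method: "POST",
  headers: { Authorization: "Bearer s3cret" },
});
console.log(guardAdminRequest(sampleRequest, "s3cret")); // prints: null
```

Read-only endpoints such as /api/health would not pass through this guard.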

Repeatable health check:

npm run health:production

This combines:

  • provenance audit of trusted sources
  • invariant audit of derived corpus state
  • Playwright smoke checks across key production pages
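Combining the three checks mostly means running each one, collecting its findings, and failing overall if any check fails. The sketch below assumes a hypothetical `CheckResult`/`HealthCheck` shape. The script behind `npm run health:production` may be structured differently.

```typescript
// Hypothetical result shape shared by the provenance audit, the invariant
// audit, and the Playwright smoke checks.
interface CheckResult {
  name: string;
  ok: boolean;
  details: string[];
}

type HealthCheck = () => Promise<CheckResult>;

// Runs every check even after a failure, so a single report covers all of them.
async function runHealthChecks(
  checks: HealthCheck[],
): Promise<{ ok: boolean; results: CheckResult[] }> {
  const results: CheckResult[] = [];
  for (const check of checks) {
    try {
      results.push(await check());
    } catch (error) {
      // A check that crashes is recorded as a failure instead of aborting the run.
      results.push({ name: check.name || "unnamed check", ok: false, details: [String(error)] });
    }
  }
  return { ok: results.every((result) => result.ok), results };
}

// Usage with stand-in checks; real audits would query D1 or drive a browser.
const provenance: HealthCheck = async () => ({ name: "provenance", ok: true, details: [] });
const invariants: HealthCheck = async () => ({ name: "invariants", ok: true, details: [] });
runHealthChecks([provenance, invariants]).then((report) => {
  for (const result of report.results) {
    console.log(`${result.ok ? "PASS" : "FAIL"} ${result.name}`);
  }
  console.log(report.ok ? "healthy" : "unhealthy");
});
```

A CLI wrapper around this would typically set a non-zero exit code when `report.ok` is false, so a scheduled job can alert on it.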

Production alerting and queue operations are documented in docs/production-alerts.md (npm run alerts:production) and docs/queue-dlq-replay.md.

Additional one-off maintenance scripts live in scripts/ (run .ts files with npx tsx and .mjs files with node):

  • repair-corpus-derived.ts repairs derived corpus state after manual surgery
  • embed-missing-vectorize.mjs and reindex-vectorize-metadata.mjs reconcile the Vectorize index
  • compute-distinctiveness.ts recomputes word distinctiveness
  • local-ingest.ts ingests a single cached doc locally
  • clone-live-topic-preview.mjs snapshots live topic pages for comparison

Local browser runs, local pipeline runs, and the Workers Vitest database bootstrap now all apply the same checked-in D1 migration chain. This keeps the test and local schemas aligned with the real app schema, including FTS triggers, secondary indexes, and the D1 hardening migrations.

Architecture

Cloudflare Workers with Hono SSR. D1 for structured data, Vectorize for semantic search, Workers AI for embeddings, and Cloudflare Queues for background enrichment/finalization work.

Google Docs → fetch → parse → D1 + Vectorize
                                    ↓
                              Hono SSR → HTML

See docs/architecture.md for the full system design.
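The parse step turns one long document into standalone observations. The README does not describe the document's structure, so the sketch below makes a hypothetical assumption: the export is plain text, each observation starts with a top-level bullet (`- ` or `• `), and indented lines continue the current observation. Bobbin's real parser works on rich content (footnotes, links, images) and is more involved. `splitObservations` is a hypothetical name.

```typescript
// Hypothetical parse step: split exported plain text into observations.
// Assumes each observation starts with a top-level "- " or "• " bullet and
// that indented lines belong to the observation above them.
function splitObservations(text: string): string[] {
  const observations: string[] = [];
  let current: string[] = [];
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trimEnd();
    const bullet = /^(?:-|•)\s+(.*)$/.exec(line);
    if (bullet) {
      // A new top-level bullet closes the previous observation.
      if (current.length > 0) observations.push(current.join("\n"));
      current = [bullet[1]];
    } else if (/^\s+\S/.test(line)) {
      // Indented line: continuation or sub-point of the current observation.
      if (current.length > 0) current.push(line.trim());
    } else if (line !== "") {
      // Unindented non-bullet text (e.g. a date heading) ends the observation.
      if (current.length > 0) observations.push(current.join("\n"));
      current = [];
    }
  }
  if (current.length > 0) observations.push(current.join("\n"));
  return observations;
}

// Usage with made-up sample text.
const sample = [
  "4/7/25",
  "- Observation one.",
  "  - A sub-point of observation one.",
  "- Observation two.",
].join("\n");
console.log(splitObservations(sample).length); // prints: 2
```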

Current design, architecture, and search docs:

Forward-looking pipeline redesign/spec work:

Historical research, audits, and specs in docs/audit-*, docs/research-*, and specs/* are retained as background material rather than current source-of-truth documentation.

Extractor tuning and characterization notes:

Pipeline measurement and rollback helpers:

npm run audit:invariants                       # current D1 invariant metrics (local by default)
node scripts/compare-pipeline-baselines.mjs A.json B.json
npm run snapshot:rollback -- --remote --label before-redesign

Notes:

  • snapshot:rollback exports table-level data only. Restore assumes you re-apply migrations first.
  • Each rollback bundle includes a dependency-safe restore.sh and a manifest restoreOrder list.
  • Because wrangler d1 export does not currently support --persist-to, rollback-bundle export supports the default local D1 state or --remote, not arbitrary custom persisted local-state directories.
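A dependency-safe restore order loads parent tables before any table whose foreign keys reference them. One standard way to compute such an order is a topological sort (Kahn's algorithm) over the foreign-key graph. This is a sketch of that technique, not the code that writes restoreOrder, and the table names are hypothetical.

```typescript
// Hypothetical input: each table mapped to the tables its foreign keys reference.
type DependencyGraph = Record<string, string[]>;

// Kahn's algorithm: a table is emitted only after every table it references.
// Ready tables are kept sorted so the order is stable from run to run.
function restoreOrder(graph: DependencyGraph): string[] {
  const tables = new Set<string>(Object.keys(graph));
  for (const deps of Object.values(graph)) deps.forEach((dep) => tables.add(dep));

  const unmet = new Map<string, number>(); // table -> parents not yet emitted
  const children = new Map<string, string[]>(); // parent -> tables referencing it
  for (const table of tables) {
    // Self-references (e.g. a parent_id column) do not constrain table order.
    const parents = new Set((graph[table] ?? []).filter((dep) => dep !== table));
    unmet.set(table, parents.size);
    for (const parent of parents) {
      children.set(parent, [...(children.get(parent) ?? []), table]);
    }
  }

  const ready = [...tables].filter((table) => unmet.get(table) === 0).sort();
  const order: string[] = [];
  while (ready.length > 0) {
    const table = ready.shift() as string;
    order.push(table);
    for (const child of children.get(table) ?? []) {
      const remaining = (unmet.get(child) ?? 0) - 1;
      unmet.set(child, remaining);
      if (remaining === 0) {
        ready.push(child);
        ready.sort();
      }
    }
  }

  if (order.length !== tables.size) {
    throw new Error("Foreign-key cycle: no dependency-safe restore order exists");
  }
  return order;
}

// Usage with hypothetical table names.
console.log(
  restoreOrder({
    episodes: [],
    chunks: ["episodes"],
    topics: [],
    chunk_topics: ["chunks", "topics"],
  }).join(" -> "),
);
// prints: episodes -> chunks -> topics -> chunk_topics
```

Clearing tables before a restore would walk the same list in reverse.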

Testing

npm run typecheck     # tsc --noEmit
npm test              # workers-runtime Vitest suites
npm run test:real     # node/runtime corpus and CSS invariant suites
npm run test:e2e:local # seed the local fixture, then run the Playwright suite against it
npm run test:e2e      # Playwright suite; set BASE_URL to target a deployed environment
npm run test:visual   # opt-in AI visual checks; requires AI_GATEWAY_API_KEY
npm run test:all      # workers + node Vitest suites

The default test and local bootstrap path uses the real migration files, not a handwritten test schema (the test helper derives the migration list from migrations/*.sql automatically). npm run test:all is the canonical non-visual verification pass; CI additionally runs the typecheck and the browser suite against a freshly seeded local fixture served by wrangler.e2e.jsonc.
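Deriving the migration list from migrations/*.sql comes down to filtering and ordering file names. Below it is sketched as a pure function over names, with directory reading left out so the example does not depend on a particular file-system API. The function name and the duplicate-number check are assumptions, not the repository's test helper. The naming pattern matches the numeric-prefix files that `wrangler d1 migrations create` generates.

```typescript
// Hypothetical helper: order migration files for application, given file names
// such as "0007_add_fts_triggers.sql" (numeric prefix, underscore, label, .sql).
function orderMigrations(fileNames: string[]): string[] {
  const pattern = /^(\d+)_.+\.sql$/;
  const migrations: { name: string; number: number }[] = [];
  for (const name of fileNames) {
    const match = pattern.exec(name);
    if (match) migrations.push({ name, number: Number(match[1]) });
    // Anything else in the directory (README.md, editor backups) is skipped.
  }

  // Sort numerically rather than lexicographically, so 2_x.sql precedes 10_y.sql.
  migrations.sort((a, b) => a.number - b.number);

  for (let i = 1; i < migrations.length; i++) {
    if (migrations[i].number === migrations[i - 1].number) {
      throw new Error(`Duplicate migration number: ${migrations[i].name}`);
    }
  }
  return migrations.map((migration) => migration.name);
}

// Usage with hypothetical file names.
console.log(orderMigrations(["0002_fts.sql", "README.md", "0001_initial.sql", "0010_indexes.sql"]).join(", "));
// prints: 0001_initial.sql, 0002_fts.sql, 0010_indexes.sql
```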

Search operators

Operator   Example              Effect
"..."      "cognitive labor"    Exact phrase match
before:    before:2025-06-01    Episodes before date
after:     after:2024-01-01     Episodes after date
year:      year:2025            Episodes from year
topic:     topic:claude-code    Chunks assigned to a topic
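These operators can be removed from the raw query before the remaining words are passed to full-text and semantic search. The sketch below is a hypothetical parser, not Bobbin's implementation. Its assumptions are that the `ParsedQuery` shape is invented, operators match case-insensitively, dates stay as ISO strings, and malformed values (such as `year:abc`) remain in the free text.

```typescript
// Hypothetical shape of a parsed query; not Bobbin's actual types.
interface ParsedQuery {
  text: string; // remaining words, passed on to FTS5 and semantic search
  phrases: string[]; // "exact phrase" matches
  before?: string; // ISO date from before:YYYY-MM-DD
  after?: string; // ISO date from after:YYYY-MM-DD
  year?: number; // from year:YYYY
  topics: string[]; // topic slugs from topic:slug (may repeat)
}

function parseSearchQuery(raw: string): ParsedQuery {
  const result: ParsedQuery = { text: "", phrases: [], topics: [] };

  // Extract quoted phrases first so text inside quotes is never read as an operator.
  const rest = raw.replace(/"([^"]+)"/g, (_match, phrase: string) => {
    const trimmed = phrase.trim();
    if (trimmed) result.phrases.push(trimmed);
    return " ";
  });

  const words: string[] = [];
  for (const token of rest.split(/\s+/).filter(Boolean)) {
    const op = /^(before|after|year|topic):(.+)$/i.exec(token);
    if (op) {
      const name = op[1].toLowerCase();
      const value = op[2];
      if ((name === "before" || name === "after") && /^\d{4}-\d{2}-\d{2}$/.test(value)) {
        result[name] = value;
        continue;
      }
      if (name === "year" && /^\d{4}$/.test(value)) {
        result.year = Number(value);
        continue;
      }
      if (name === "topic") {
        result.topics.push(value);
        continue;
      }
    }
    // Plain word, or an operator with a malformed value: keep it as free text.
    words.push(token);
  }
  result.text = words.join(" ");
  return result;
}

// Usage:
const query = parseSearchQuery('"cognitive labor" agents after:2024-01-01 topic:claude-code');
console.log(query.text); // prints: agents
console.log(query.phrases, query.after, query.topics);
```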

Project structure

src/
  db/           Typed D1 query boundary layer
  routes/       Hono route handlers (SSR)
  services/     Domain logic (search, tags, parsing)
  components/   JSX components
  jobs/         Ingestion pipeline (phased: fast insert + background enrichment)
  lib/          Pure utilities
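A typed query boundary keeps SQL and row typing in one layer, so route handlers never handle raw rows. The sketch below reproduces the shape of D1's `prepare(...).bind(...).all()` chain through a minimal structural interface, which lets it run without the Workers runtime. The `chunks` table, its columns, and `getChunksForEpisode` are hypothetical and not Bobbin's schema.

```typescript
// Minimal structural subset of D1's prepare -> bind -> all chain, declared
// here so the sketch runs without @cloudflare/workers-types.
interface PreparedStatementLike {
  bind(...values: unknown[]): PreparedStatementLike;
  all<T>(): Promise<{ results: T[] }>;
}

interface DatabaseLike {
  prepare(query: string): PreparedStatementLike;
}

// Hypothetical row shape (snake_case, as stored) and domain shape (camelCase).
interface ChunkRow {
  id: number;
  episode_id: number;
  body: string;
  position: number;
}

interface Chunk {
  id: number;
  episodeId: number;
  body: string;
  position: number;
}

// The boundary: SQL text, parameter binding, and row mapping stay in this layer,
// so route handlers only ever receive typed Chunk objects.
async function getChunksForEpisode(db: DatabaseLike, episodeId: number): Promise<Chunk[]> {
  const { results } = await db
    .prepare(
      "SELECT id, episode_id, body, position FROM chunks WHERE episode_id = ? ORDER BY position",
    )
    .bind(episodeId)
    .all<ChunkRow>();
  return results.map((row) => ({
    id: row.id,
    episodeId: row.episode_id,
    body: row.body,
    position: row.position,
  }));
}
```

Because the boundary depends only on this small interface, a unit test can pass in a fake object, while a Worker would pass its D1 binding (the binding name, such as `env.DB`, is an assumption).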

Licence

Released under the MIT License.
