A searchable, browsable archive of Alex Komoroske's Bits and Bobs weekly observations, built on Cloudflare Workers.
Live: bobbin.adewale-883.workers.dev
Komoroske publishes 30–70 standalone observations each week about AI, software, ecosystems, and technology. Bobbin ingests this content from Google Docs, parses it into individual observations, and provides:
- Hybrid search — FTS5 + Vectorize semantic search with before:, after:, year:, topic:, and "exact phrase" operators
- Archive browsing — grouped episode browsing plus topic pages that show how attention shifts across the corpus
- Source fidelity — chunk and episode detail pages render stored rich-content artifacts, footnotes, links, and images directly
- Shared editorial UI — home, episodes, topics, and design surfaces all reuse the same layout, section-heading, and rail patterns
- Local verification workflow — fixture seeding, browser-level checks, and computed-style audits for local development
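The README does not say how keyword (FTS5) and semantic (Vectorize) results are combined. One common way to merge two ranked lists is reciprocal rank fusion (RRF); the sketch below illustrates that technique only and is not taken from the Bobbin source. The function name `fuseRankings`, the constant `k = 60`, and the chunk IDs are all assumptions.

```typescript
// Reciprocal rank fusion: an ASSUMED merge strategy, not confirmed by this README.
// Each input list holds chunk IDs ordered best match first.
function fuseRankings(lists: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      // Ranks are 1-based, so position 0 contributes 1 / (k + 1).
      const contribution = 1 / (k + index + 1);
      scores.set(id, (scores.get(id) ?? 0) + contribution);
    });
  }
  return [...scores.entries()]
    // Highest fused score first; ties broken by ID for determinism.
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .map(([id]) => id);
}

// Hypothetical result lists for one query.
const keywordHits = ["c1", "c2", "c3"]; // e.g. FTS5 order
const semanticHits = ["c3", "c1", "c4"]; // e.g. Vectorize order
const fused = fuseRankings([keywordHits, semanticHits]);
console.log(fused);
```

Chunks that appear in both lists accumulate two contributions, so they tend to rise above chunks found by only one backend.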
git clone https://github.com/adewale/bobbin.git
cd bobbin
npm install

Create Cloudflare resources:
npx wrangler d1 create bobbin-db
npx wrangler vectorize create bobbin-chunks --dimensions 768 --metric cosine

Update wrangler.jsonc with your database ID, then:
npx wrangler d1 migrations apply bobbin-db --local
npm run dev

For browser-based local development, use the real app config and the canonical local fixture:
npm run fixture:local # seeds a full local corpus + rail demo into the local D1
npm run dev:9090 # starts the app on http://localhost:9090

The fixture script prints a set of recommended URLs that exercise the main user-visible surfaces:
- home
- episodes index
- episode rail demo
- chunk detail and source-fidelity pages
- topics index and topic detail
- search
- design inventory
There is also a repeatable computed-style/browser audit for the current local app:
npm run audit:computed

If the local database is empty, the app will show an in-product setup hint that points back to npm run fixture:local.
For authenticated remote maintenance against the deployed worker, use:
BASE_URL="https://bobbin.adewale-883.workers.dev" \
ADMIN_SECRET="..." \
npm run maintenance:remote -- ingest-doc <doc-id> 100

The same script supports refresh, enrich, finalize, purge-source, backfill-source, and backfill-llm.
Trusted-source policy:
- refresh and admin ingest only operate on doc IDs in the checked-in trusted source registry
- unknown doc IDs are rejected instead of being auto-registered
- purge-source removes an already-ingested source by doc ID when a provenance audit finds contamination
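The policy above amounts to an allowlist check before any ingest or refresh runs. A minimal sketch, assuming the registry can be represented as a set of doc IDs; the real registry format, the name `authorizeIngest`, and the IDs shown are hypothetical:

```typescript
// Hypothetical registry contents; the checked-in registry's real shape is not shown in this README.
const TRUSTED_DOC_IDS: ReadonlySet<string> = new Set(["doc-a", "doc-b"]);

type IngestDecision =
  | { ok: true; docId: string }
  | { ok: false; reason: string };

function authorizeIngest(
  docId: string,
  registry: ReadonlySet<string> = TRUSTED_DOC_IDS,
): IngestDecision {
  if (!registry.has(docId)) {
    // Unknown IDs are rejected; nothing is added to the registry at runtime.
    return { ok: false, reason: `doc ID "${docId}" is not in the trusted source registry` };
  }
  return { ok: true, docId };
}

console.log(authorizeIngest("doc-a"));
console.log(authorizeIngest("doc-unknown"));
```

Keeping the registry read-only at runtime means a new source can only become trusted through a reviewed change to the checked-in file.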
All state-changing admin endpoints are POST-only and require the
Authorization: Bearer <ADMIN_SECRET> header; read-only endpoints
(/api/health, /api/pipeline-runs, /api/ingestion-log) stay GET.
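The rule above can be expressed as a small guard that runs before any admin handler. This is a sketch under stated assumptions, not the worker's actual middleware: the function name `guardStatus`, the 405 and 401 status codes, the example path `/api/admin-example`, and the choice to leave read-only endpoints unauthenticated are all assumptions.

```typescript
// Read-only endpoints named in this README.
const READ_ONLY_PATHS = new Set(["/api/health", "/api/pipeline-runs", "/api/ingestion-log"]);

// Returns the HTTP status a guard might answer with (200 means "let the request through").
function guardStatus(
  method: string,
  path: string,
  authorization: string | null,
  adminSecret: string,
): number {
  if (READ_ONLY_PATHS.has(path)) {
    // Assumption: read-only endpoints accept GET without a bearer token.
    return method === "GET" ? 200 : 405;
  }
  // Everything else is treated as state-changing: POST only, bearer token required.
  if (method !== "POST") return 405;
  if (authorization !== `Bearer ${adminSecret}`) return 401;
  return 200;
}

console.log(guardStatus("POST", "/api/admin-example", "Bearer s3cret", "s3cret"));
console.log(guardStatus("GET", "/api/admin-example", "Bearer s3cret", "s3cret"));
```

A production guard would likely compare the secret in constant time rather than with `!==`; the plain comparison here keeps the sketch short.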
Repeatable health check:
npm run health:production

This combines:
- provenance audit of trusted sources
- invariant audit of derived corpus state
- Playwright smoke checks across key production pages
Production alerting and queue operations are documented in
docs/production-alerts.md (npm run alerts:production)
and docs/queue-dlq-replay.md.
Additional one-off maintenance scripts in scripts/ (run with npx tsx
for .ts, node for .mjs): repair-corpus-derived.ts repairs derived
corpus state after manual surgery, embed-missing-vectorize.mjs and
reindex-vectorize-metadata.mjs reconcile the Vectorize index,
compute-distinctiveness.ts recomputes word distinctiveness,
local-ingest.ts ingests a single cached doc locally, and
clone-live-topic-preview.mjs snapshots live topic pages for comparison.
Local browser runs, local pipeline runs, and Workers Vitest database bootstrap now all apply the same checked-in D1 migration chain. That keeps the test/local schema aligned with the real app schema, including FTS triggers, secondary indexes, and D1 hardening migrations.
Cloudflare Workers with Hono SSR. D1 for structured data, Vectorize for semantic search, Workers AI for embeddings, and Cloudflare Queues for background enrichment/finalization work.
Google Docs → fetch → parse → D1 + Vectorize
                                    ↓
                              Hono SSR → HTML
See docs/architecture.md for the full system design.
Current design, architecture, and search docs:
Forward-looking pipeline redesign/spec work:
Historical research, audits, and specs in docs/audit-*, docs/research-*, and specs/* are retained as background material rather than current source-of-truth documentation.
Extractor tuning and characterization notes:
Pipeline measurement and rollback helpers:
npm run audit:invariants # current D1 invariant metrics (local by default)
node scripts/compare-pipeline-baselines.mjs A.json B.json
npm run snapshot:rollback -- --remote --label before-redesign

Notes:
- snapshot:rollback exports table-level data only. Restore assumes you re-apply migrations first.
- Each rollback bundle includes a dependency-safe restore.sh and a manifest restoreOrder list.
- Because wrangler d1 export does not currently support --persist-to, rollback-bundle export supports the default local D1 state or --remote, not arbitrary custom persisted local-state directories.
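A dependency-safe restore order means every table is restored after the tables it references. One way to derive such an order is a depth-first topological sort over foreign-key dependencies. The sketch below shows that idea only; the function name `computeRestoreOrder` and the table names are hypothetical, and the real script may build its restoreOrder list differently.

```typescript
// Maps each table to the tables it references (those must be restored first).
type Dependencies = Record<string, string[]>;

function computeRestoreOrder(deps: Dependencies): string[] {
  const result: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  const visit = (table: string): void => {
    const current = state.get(table);
    if (current === "done") return;
    if (current === "visiting") throw new Error(`dependency cycle at ${table}`);
    state.set(table, "visiting");
    // Restore every referenced table before this one.
    for (const parent of deps[table] ?? []) visit(parent);
    state.set(table, "done");
    result.push(table);
  };

  // Sort the keys so the output is deterministic for a given input.
  for (const table of Object.keys(deps).sort()) visit(table);
  return result;
}

// Hypothetical schema, for illustration only.
const exampleOrder = computeRestoreOrder({
  chunks: ["episodes"],
  episodes: ["sources"],
  sources: [],
  chunk_topics: ["chunks", "topics"],
  topics: [],
});
console.log(exampleOrder);
```

Throwing on a cycle is a deliberate choice: a circular foreign-key chain has no safe restore order, so failing loudly is better than emitting a partial list.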
npm run typecheck # tsc --noEmit
npm test # workers-runtime Vitest suites
npm run test:real # node/runtime corpus and CSS invariant suites
npm run test:e2e:local # seed the local fixture, then run the Playwright suite against it
npm run test:e2e # Playwright suite; set BASE_URL to target a deployed environment
npm run test:visual # opt-in AI visual checks; requires AI_GATEWAY_API_KEY
npm run test:all # workers + node Vitest suites

The default test and local bootstrap path uses the real migration files, not a handwritten test schema (the test helper derives the migration list from migrations/*.sql automatically). npm run test:all is the canonical non-visual verification pass; CI additionally runs the typecheck and the browser suite against a freshly seeded local fixture served by wrangler.e2e.jsonc.
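Deriving the migration list from the directory, rather than hardcoding it, keeps tests from drifting when a migration is added. A minimal sketch of the ordering step, assuming migration files carry zero-padded numeric prefixes; the function name `orderMigrations` and the file names are hypothetical, and the real helper is not shown in this README:

```typescript
// Keeps only .sql files and orders them by filename.
// Assumes zero-padded prefixes (0001_, 0002_, ...) so plain string order matches apply order.
function orderMigrations(fileNames: string[]): string[] {
  return fileNames
    .filter((name) => name.endsWith(".sql"))
    .sort((a, b) => (a < b ? -1 : a > b ? 1 : 0));
}

// Hypothetical directory listing, e.g. the result of reading migrations/.
const listing = ["0002_fts.sql", "README.md", "0001_init.sql", "0010_hardening.sql"];
const ordered = orderMigrations(listing);
console.log(ordered);
```

In a Node-side helper the listing could come from `readdirSync("migrations")` in `node:fs`; the sketch takes a plain array so it stays independent of the file system.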
| Operator | Example | Effect |
|---|---|---|
| "..." | "cognitive labor" | Exact phrase match |
| before: | before:2025-06-01 | Episodes before date |
| after: | after:2024-01-01 | Episodes after date |
| year: | year:2025 | Episodes from year |
| topic: | topic:claude-code | Chunks assigned to a topic |
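The operators in the table can be separated from free-text terms before the query reaches FTS5 or Vectorize. The following is a minimal sketch of such a parser, not the one in src/services; the `ParsedQuery` shape, the function name `parseQuery`, and the handling of malformed year: values are assumptions.

```typescript
interface ParsedQuery {
  phrases: string[]; // contents of "..." operators
  terms: string[]; // remaining free-text words
  before?: string;
  after?: string;
  year?: number;
  topic?: string;
}

function parseQuery(input: string): ParsedQuery {
  const parsed: ParsedQuery = { phrases: [], terms: [] };
  // Pull out quoted phrases first so their contents are not split into terms.
  const rest = input.replace(/"([^"]+)"/g, (_match, phrase: string) => {
    parsed.phrases.push(phrase);
    return " ";
  });
  for (const token of rest.split(/\s+/).filter(Boolean)) {
    const match = /^(before|after|year|topic):(.+)$/.exec(token);
    if (!match) {
      parsed.terms.push(token);
      continue;
    }
    const [, key, value] = match;
    if (key === "year") {
      // Assumption: a malformed year falls back to being a plain search term.
      if (/^\d{4}$/.test(value)) parsed.year = Number(value);
      else parsed.terms.push(token);
    } else if (key === "before") parsed.before = value;
    else if (key === "after") parsed.after = value;
    else parsed.topic = value;
  }
  return parsed;
}

const example = parseQuery('agents "cognitive labor" after:2024-01-01 topic:claude-code');
console.log(example);
```

Dates are kept as strings here so the caller can decide how strictly to validate them before building a SQL filter.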
src/
db/ Typed D1 query boundary layer
routes/ Hono route handlers (SSR)
services/ Domain logic (search, tags, parsing)
components/ JSX components
jobs/ Ingestion pipeline (phased: fast insert + background enrichment)
lib/ Pure utilities
Released under the MIT License.