Skip to content

improvement(mothership): user_table speed parity — limit bounds, keyset paging, background import/delete jobs#5012

Open
TheodoreSpeaks wants to merge 1 commit into
stagingfrom
improvement/table-mothership-speed
Open

improvement(mothership): user_table speed parity — limit bounds, keyset paging, background import/delete jobs#5012
TheodoreSpeaks wants to merge 1 commit into
stagingfrom
improvement/table-mothership-speed

Conversation

@TheodoreSpeaks

@TheodoreSpeaks TheodoreSpeaks commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

Context

Mothership's user_table tool lagged the fast/safe paths the tables feature already has. This PR lands three of the four findings from the audit; the fourth (the function_execute large-table CSV mount) is deferred — see note at the bottom.

Changes

C — limit bounds. query_rows took an unbounded limit (whole table into one tool result) and the filter ops took unbounded limits (every matching row's id+data into memory). Now rejected above MAX_QUERY_LIMIT (1000) / MAX_BULK_OPERATION_SIZE (1000) with the contract's message text.

D — query_rows waste + paging. Dropped the always-on executions-metadata load (withExecutions: false), and added keyset pagination: accepts an after cursor, returns nextCursor when a default-order page fills (rejects after+sort). Offset paging was O(n²) walking a big table.

E — async job parity for imports/deletes. import_file / create_from_file (CSV/TSV ≥8MB) and delete_rows_by_filter (>1000 matches, no explicit limit) now dispatch the same trigger.dev jobs the UI routes use, claiming the per-table job slot so they can't race a running background job. Smaller imports stay inline but claim/release the slot. The model polls progress via query_rows (rows appear as an import lands; the delete mask makes query_rows reflect the post-delete view immediately) — no dedicated job-status op. TableImportPayload.deleteSourceFile flag (default true) so Mothership imports of persistent workspace files don't delete the source.

Tests

user-table.test.ts (limit clamps, keyset after/nextCursor, after+sort rejection, slot claim/release, async dispatch payloads) and import-runner.test.ts (source-file cleanup default vs deleteSourceFile: false). bun run lint:check, type-check, and check:api-validation green.

Deferred (not in this PR)

  • function_execute large-table mount (finding B): the keyset CSV drain fixed the 100-row truncation but materialized the whole table in the web-container heap before the 50MB check, risking OOM on large/enterprise tables. Backed out here (mount stays at main's 100-row behavior); proper fix — incremental byte-bound + filter/limit/columns selection, or by-reference signed-URL fetch — is a separate effort.

Model-facing docs

Catalog change documenting after / limit maxes / background behavior is in simstudioai/copilot#312.

🤖 Generated with Claude Code

@TheodoreSpeaks

Copy link
Copy Markdown
Collaborator Author

@greptile review

@vercel

vercel Bot commented Jun 12, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Jun 16, 2026 11:22pm

Request Review

@cursor

cursor Bot commented Jun 12, 2026

Copy link
Copy Markdown

PR Summary

Medium Risk
Changes bulk delete/import routing and concurrency on shared table job slots; mis-dispatch could affect large-table operations but mirrors existing UI paths with new tests.

Overview
Brings Mothership’s user_table tool in line with the tables product: tighter limits, cheaper reads, and the same background jobs the UI uses for heavy writes.

Limits. query_rows and bulk filter ops now reject limit above 1000 (matching TABLE_LIMITS). Copilot catalog/runtime schemas document the caps and the new after cursor for paging.

query_rows. Queries run with withExecutions: false, accept an after keyset cursor (mutually exclusive with sort), and return nextCursor when a default-order page is full so callers can avoid deep offset scans.

Imports & deletes. CSV/TSV at or above the 8MB async threshold, plus unbounded delete_rows_by_filter when matches exceed 1000, claim the per-table job slot and dispatch trigger.dev (or detached) import/delete workers—same pattern as the HTTP async routes. Smaller inline imports still claim/release the slot. TableImportPayload.deleteSourceFile defaults to deleting single-use uploads but stays false for persistent workspace files so Mothership doesn’t remove the source.

Generated tool-schemas-v1 keys are normalized to quoted form; behavior change is concentrated in user_table and import-runner.

Reviewed by Cursor Bugbot for commit 59efc85. Bugbot is set up for automated code reviews on this repo. Configure here.

Comment thread apps/sim/app/api/table/[tableId]/rows/route.ts
@greptile-apps

greptile-apps Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR brings the Mothership user_table tool to parity with the main tables feature by enforcing per-operation row-count limits, adding keyset pagination via an after cursor, and routing large CSV imports and large bulk deletes through the same trigger.dev background jobs that the UI uses (with proper job-slot claiming). The import-runner also gains a deleteSourceFile flag so persistent workspace files aren't deleted after a Mothership-triggered import.

  • Limit bounds (C): query_rows now rejects limits above 1000; update_rows_by_filter and delete_rows_by_filter enforce the same cap.
  • Keyset paging (D): query_rows accepts an after: { orderKey, id } cursor (rejects after + sort), returns nextCursor when a full page lands, and drops the always-on execution-metadata join.
  • Async job parity (E): CSV/TSV files ≥ 8 MB in import_file / create_from_file are dispatched to the background import worker; delete_rows_by_filter counts matches first and dispatches the background delete worker for > 1000 hits, with slot-conflict detection in both paths.

Confidence Score: 5/5

Safe to merge. The job-slot lifecycle is correct across all paths — inline imports use try/finally, async dispatches release the claim on trigger failure, and the background delete path measures blast radius before committing.

The slot-claiming logic is well-tested and correct across all paths. The two observations (after+offset not rejected explicitly, background mapping validation bypass) are API-surface notes without data-correctness implications.

apps/sim/lib/copilot/tools/server/table/user-table.ts — the after+offset and background mapping validation asymmetries are both in this file.

Important Files Changed

Filename Overview
apps/sim/lib/copilot/tools/server/table/user-table.ts Major additions: limit validation for query/bulk ops, keyset cursor (after/nextCursor), background dispatch for large imports and large filter-deletes with job-slot claiming. Slot lifecycle handled correctly via try/finally for inline imports and releaseJobClaim-on-dispatch-failure for async paths.
apps/sim/lib/table/import-runner.ts Adds optional deleteSourceFile flag (defaults to true for backward compatibility) guarding the post-import cleanup call. Minimal, well-scoped change with correct default behavior.
apps/sim/lib/copilot/tools/server/table/user-table.test.ts Good new coverage: limit clamp rejections, keyset after/nextCursor, after+sort rejection, slot claim/release for inline imports, slot contention rejection, and async dispatch payload assertions.
apps/sim/lib/table/import-runner.test.ts New test file covering the deleteSourceFile flag: verifies file deletion happens by default and is skipped when deleteSourceFile=false.
.github/workflows/migrations.yml Wraps db:push in a subshell to capture output, detects the TTY-required rename/drop prompt and emits a structured GHA error with actionable remediation steps.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Model
    participant UserTableTool
    participant JobsService
    participant TriggerDev
    participant Worker

    Note over Model,Worker: Large CSV import (≥8 MB)
    Model->>UserTableTool: import_file / create_from_file
    UserTableTool->>JobsService: markTableJobRunning(tableId, jobId, 'import')
    alt slot free
        JobsService-->>UserTableTool: true
        alt trigger.dev enabled
            UserTableTool->>TriggerDev: tasks.trigger('table-import', payload)
            TriggerDev-->>UserTableTool: ok
            UserTableTool-->>Model: "{success:true, jobId}"
            TriggerDev->>Worker: runTableImport(payload)
        else local dev
            UserTableTool->>Worker: runDetached → runTableImport(payload)
            UserTableTool-->>Model: "{success:true, jobId}"
        end
        Worker->>JobsService: markJobReady / markJobFailed
    else slot held
        JobsService-->>UserTableTool: false
        UserTableTool-->>Model: "{success:false, job in progress}"
    end

    Note over Model,Worker: Large delete (unbounded, >1000 matches)
    Model->>UserTableTool: delete_rows_by_filter (no limit)
    UserTableTool->>UserTableTool: "queryRows(filter, limit=1) → totalCount"
    alt "totalCount > 1000"
        UserTableTool->>JobsService: markTableJobRunning(tableId, jobId, 'delete')
        alt slot free
            UserTableTool->>TriggerDev: tasks.trigger('table-delete', payload)
            UserTableTool-->>Model: "{success:true, jobId, doomedCount}"
        else slot held
            UserTableTool-->>Model: "{success:false, job in progress}"
        end
    else totalCount ≤ 1000
        UserTableTool->>UserTableTool: deleteRowsByFilter inline
        UserTableTool-->>Model: "{success:true, affectedCount}"
    end

    Note over Model,Worker: query_rows with keyset paging
    Model->>UserTableTool: "query_rows(after={orderKey,id}, limit=100)"
    UserTableTool->>UserTableTool: "queryRows(withExecutions=false, after)"
    UserTableTool-->>Model: "{rows, nextCursor?}"
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant Model
    participant UserTableTool
    participant JobsService
    participant TriggerDev
    participant Worker

    Note over Model,Worker: Large CSV import (≥8 MB)
    Model->>UserTableTool: import_file / create_from_file
    UserTableTool->>JobsService: markTableJobRunning(tableId, jobId, 'import')
    alt slot free
        JobsService-->>UserTableTool: true
        alt trigger.dev enabled
            UserTableTool->>TriggerDev: tasks.trigger('table-import', payload)
            TriggerDev-->>UserTableTool: ok
            UserTableTool-->>Model: "{success:true, jobId}"
            TriggerDev->>Worker: runTableImport(payload)
        else local dev
            UserTableTool->>Worker: runDetached → runTableImport(payload)
            UserTableTool-->>Model: "{success:true, jobId}"
        end
        Worker->>JobsService: markJobReady / markJobFailed
    else slot held
        JobsService-->>UserTableTool: false
        UserTableTool-->>Model: "{success:false, job in progress}"
    end

    Note over Model,Worker: Large delete (unbounded, >1000 matches)
    Model->>UserTableTool: delete_rows_by_filter (no limit)
    UserTableTool->>UserTableTool: "queryRows(filter, limit=1) → totalCount"
    alt "totalCount > 1000"
        UserTableTool->>JobsService: markTableJobRunning(tableId, jobId, 'delete')
        alt slot free
            UserTableTool->>TriggerDev: tasks.trigger('table-delete', payload)
            UserTableTool-->>Model: "{success:true, jobId, doomedCount}"
        else slot held
            UserTableTool-->>Model: "{success:false, job in progress}"
        end
    else totalCount ≤ 1000
        UserTableTool->>UserTableTool: deleteRowsByFilter inline
        UserTableTool-->>Model: "{success:true, affectedCount}"
    end

    Note over Model,Worker: query_rows with keyset paging
    Model->>UserTableTool: "query_rows(after={orderKey,id}, limit=100)"
    UserTableTool->>UserTableTool: "queryRows(withExecutions=false, after)"
    UserTableTool-->>Model: "{rows, nextCursor?}"
Loading

Reviews (5): Last reviewed commit: "improvement(mothership): user_table spee..." | Re-trigger Greptile

Comment on lines 291 to 326
const tableMounts = await Promise.all(
inputTables.map(async (tableRef) => {
const tableId =
typeof tableRef === 'string'
? tableRef
: tableRef && typeof tableRef === 'object'
? (tableRef as CanonicalTableInput).tableId || (tableRef as CanonicalTableInput).path
: undefined
if (!tableId) return null
const table = await resolveTableRef(tableId, tablePathLookup)
if (!table || table.workspaceId !== workspaceId) {
throw new Error(
`Input table not found: "${tableId}". Pass the table id (tbl_...) from tables/{name}/meta.json, or a tables/{name}/meta.json path.`
)
}
const csvContent = await buildTableCsvForMount(table)
const sandboxPath =
typeof tableRef === 'object' && tableRef !== null
? (tableRef as CanonicalTableInput).sandboxPath
: undefined
if (!tableId) continue
const table = await resolveTableRef(tableId, tablePathLookup)
if (!table || table.workspaceId !== workspaceId) {
throw new Error(
`Input table not found: "${tableId}". Pass the table id (tbl_...) from tables/{name}/meta.json, or a tables/{name}/meta.json path.`
)
}
const rows = await queryRows(table, {}, 'copilot-fn-exec')

const allKeys = new Set(table.schema.columns.map((column) => column.name))
for (const row of rows.rows ?? []) {
if (row.data && typeof row.data === 'object') {
for (const key of Object.keys(row.data as Record<string, unknown>)) {
allKeys.add(key)
}
return {
path: sandboxPath || `/home/user/tables/${table.id}.csv`,
content: csvContent,
}
}
const headers = Array.from(allKeys)
const csvLines = [headers.join(',')]
for (const row of rows.rows ?? []) {
const data = (row.data || {}) as Record<string, unknown>
csvLines.push(
headers
.map((h) => {
const val = data[h]
const str = val === null || val === undefined ? '' : String(val)
return str.includes(',') || str.includes('"') || str.includes('\n')
? `"${str.replace(/"/g, '""')}"`
: str
})
.join(',')
})
)
for (const mount of tableMounts) {
if (!mount) continue
if (totalSize + mount.content.length > MAX_TOTAL_SIZE) {
throw new Error(
`Mounting table "${mount.path}" would exceed the ${MAX_TOTAL_SIZE / 1024 / 1024}MB total mount limit. Mount fewer or smaller tables.`
)
}
const csvContent = csvLines.join('\n')
const sandboxPath =
typeof tableRef === 'object' && tableRef !== null
? (tableRef as CanonicalTableInput).sandboxPath
: undefined
sandboxFiles.push({
path: sandboxPath || `/home/user/tables/${table.id}.csv`,
content: csvContent,
})
totalSize += mount.content.length
sandboxFiles.push(mount)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Full table CSVs built before budget enforcement

All input tables are serialized in parallel via Promise.all before any size check runs. For each table the buildTableCsvForMount loop fetches every row page-by-page and accumulates the entire CSV in memory — no size gate inside the loop. The outer budget check (totalSize + mount.content.length > MAX_TOTAL_SIZE) only fires after every table has been fully loaded.

Concrete failure: a user mounts two enterprise-plan tables each at ~40MB of rows. Both CSVs are built (80MB in memory) before the loop sees that the second one exceeds the 50MB cap and throws. File and directory mounts check record.size before fetching; tables lack that pre-flight guard. Under traffic, several concurrent requests doing this can exhaust the web container's heap.

Comment thread apps/sim/lib/copilot/tools/server/table/user-table.ts
@greptile-apps

greptile-apps Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR closes three performance gaps in Mothership's table operations: sandbox table mounts now drain all rows via keyset pagination instead of silently truncating to 100, query_rows gains a 1000-row limit cap and keyset cursor support, and large CSV/TSV imports plus unbounded filter-deletes are handed off to the existing background-job infrastructure (with a new get_job poll operation). The underlying data model also migrates: per-import state columns on user_table_definitions are replaced by a new table_jobs table with a partial-unique index enforcing one running write-job per table.

  • function-execute mounts: buildTableCsvForMount pages through selectExportRowPage in 5k-row chunks, remaps column-id keys to display names, and counts toward the 50 MB budget — fixing empty-column mounts on id-keyed tables and the silent 100-row truncation.
  • query_rows contract parity: limit now rejected above 1000, withExecutions: false, and keyset after/nextCursor accepted (rejected alongside sort); inline imports/deletes claim the one-write-job slot to block interleaving.
  • Background import/delete in Mothership: CSV/TSV ≥ 8 MB dispatches tableImportTask; unbounded filter-deletes above 1000 rows dispatch tableDeleteTask with a pendingDeleteMask that hides doomed rows immediately from all read paths.

Confidence Score: 3/5

Safe to merge for most workloads; enterprise tables with hundreds of thousands of rows could exhaust Vercel function memory during a sandbox mount before the 50 MB budget check fires.

The core job-slot migration, keyset pagination, background import/delete dispatch, and schema change are all well-structured and tested. The one concern worth resolving before a wide enterprise rollout is buildTableCsvForMount: all table CSVs are fully assembled in parallel memory before the budget check, so a user who mounts a 500k-row enterprise table gets a function OOM rather than a clean rejection.

apps/sim/lib/copilot/tools/handlers/function-execute.ts — the buildTableCsvForMount loop and the parallel Promise.all mount assembly need a per-page budget check to avoid OOM on large enterprise tables.

Important Files Changed

Filename Overview
apps/sim/lib/copilot/tools/handlers/function-execute.ts Replaces the truncated 100-row queryRows mount with a keyset-draining buildTableCsvForMount that reads all rows and remaps id-keyed cells back to display names. Has a memory safety issue: the full CSV is materialized in memory before the 50 MB budget check runs, and pendingDeleteMask is called once per page (N+1 pattern).
apps/sim/lib/table/service.ts Major overhaul: migrates import-status fields from user_table_definitions to table_jobs, adds selectExportRowPage (position-keyset export drain), pendingDeleteMask for delete-job visibility, selectRowIdPage/deletePageByIds for async delete workers, and latestJobForTable/latestJobsForTables for the new job fields. pendingDeleteMask now runs on every queryRows call regardless of table state.
apps/sim/lib/copilot/tools/server/table/user-table.ts Adds query_rows limit clamping, keyset cursor (after/nextCursor), background delete dispatch for unbounded deletes, background import dispatch for large CSV/TSV files, get_job polling operation, and job-slot claim/release for inline imports. Logic is well-structured with proper guards and rollback paths.
apps/sim/lib/table/import-runner.ts Renames import-specific service calls to generic job variants (markImportFailed→markJobFailed, etc.), adds deleteSourceFile flag (defaults true for UI single-use uploads; pass false for Mothership persistent workspace files), and updates SSE events to the new job event kind.
packages/db/migrations/0233_table_jobs_and_keyset.sql Creates table_jobs with partial unique index (one running write-job per table), migrates existing import_status rows idempotently via ON CONFLICT DO NOTHING, tunes autovacuum on user_table_rows, and adds two CONCURRENT indexes (table_id+id keyset, tenant-scoped btree_gin) while dropping the old cross-tenant GIN index.
apps/sim/lib/table/delete-runner.ts New async delete worker: walks rows in keyset pages (selectRowIdPage → deletePageByIds), checks job ownership on each page, applies cutoff/filter/exclusion scope, emits SSE events, and marks the job terminal or failed.
packages/db/schema.ts Adds tableJobs table with unique partial index, replaces the old cross-tenant GIN index on user_table_rows with a tenant-scoped btree_gin index, adds table_id+id keyset index, and removes import_* columns from userTableDefinitions.
apps/sim/lib/table/dispatcher.ts Extends DispatchScope with filter/excludeRowIds for select-all-under-filter and select-all-minus-deselections, wraps the dispatcherStep page query in withSeqscanOff for jsonb-filtered scopes, and adds scope-aware markActiveDispatchesCancelled with JSONB scope comparison.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Mothership: import_file / create_from_file] --> B{CSV/TSV 8 MB?}
    B -- No --> C[markTableJobRunning claim slot]
    C --> D[Inline import parseFileRows + batchInsert]
    D --> E[releaseJobClaim]
    B -- Yes --> F[createTable with jobStatus=running or markTableJobRunning]
    F --> G{isTriggerDevEnabled?}
    G -- Yes --> H[tasks.trigger tableImportTask]
    G -- No --> I[runDetached runTableImport]
    H -- dispatch error --> J[releaseJobClaim / deleteTable]
    I --> K[runTableImport streaming keyset import]
    H --> K

    L[Mothership: delete_rows_by_filter] --> M{limit === undefined?}
    M -- Yes --> N[queryRows count with pendingDeleteMask]
    N --> O{matchCount > 1000?}
    O -- No --> P[deleteRowsByFilter inline]
    O -- Yes --> Q[markTableJobRunning claim]
    Q --> R[dispatchDeleteJob]
    R --> S[runTableDelete keyset page walk]
    M -- No --> P

    T[Mothership: query_rows] --> U{limit > 1000?}
    U -- Yes --> V[return error]
    U -- No --> W[queryRows with pendingDeleteMask + after cursor]
    W --> X[return rows + nextCursor]

    Y[Mothership: get_job] --> Z[getTableById latestJobForTable]
    Z --> AA[return job status / rowCount]
Loading

Reviews (2): Last reviewed commit: "improvement(mothership): table speed par..." | Re-trigger Greptile

Comment on lines 73 to 108
* would corrupt values.
*/
function formatMountCsvValue(value: unknown): string {
if (value === null || value === undefined) return ''
if (value instanceof Date) return value.toISOString()
if (typeof value === 'object') return JSON.stringify(value)
return String(value)
}

/**
* Serializes a full table to CSV for a sandbox mount. Walks the keyset export
* reader page by page so every row is included (`queryRows` with defaults
* silently truncated mounts to its 100-row page and paid for a count and
* execution metadata the CSV never used), and remaps stored column-id keys
* back to display names so headers line up with cell values.
*/
async function buildTableCsvForMount(table: TableDefinition): Promise<string> {
const nameById = buildNameById(table.schema)
const headers = table.schema.columns.map((c) => c.name)
const lines = [toCsvRow(headers)]
let after: { position: number; id: string } | null = null
while (true) {
const page = await selectExportRowPage(table, after, TABLE_MOUNT_PAGE_SIZE)
for (const row of page) {
const data = rowDataIdToName(row.data, nameById)
lines.push(toCsvRow(headers.map((header) => formatMountCsvValue(data[header]))))
}
if (page.length < TABLE_MOUNT_PAGE_SIZE) return lines.join('\n')
const last = page[page.length - 1]
after = { position: last.position, id: last.id }
}
}

async function resolveInputFiles(
workspaceId: string,
inputFiles?: unknown[],

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Full CSV materialized in memory before the 50 MB budget check

buildTableCsvForMount drains every row into a lines[] array and only returns after the full table is assembled. The budget check in the caller happens after all table CSVs are built via Promise.all. For an enterprise table with 100k+ rows, the CSV string can be hundreds of megabytes or more before the check ever fires — the budget guard prevents the file from reaching the sandbox, but the memory damage is already done (and in the parallel case, all tables are materialized simultaneously).

The old queryRows path implicitly bounded this to 100 rows. A straightforward fix is to track a running byte count inside the drain loop and throw early once the caller's budget would be exceeded — or pass maxBytes as a parameter and abort the page walk.

Comment on lines 73 to 108
* would corrupt values.
*/
function formatMountCsvValue(value: unknown): string {
if (value === null || value === undefined) return ''
if (value instanceof Date) return value.toISOString()
if (typeof value === 'object') return JSON.stringify(value)
return String(value)
}

/**
* Serializes a full table to CSV for a sandbox mount. Walks the keyset export
* reader page by page so every row is included (`queryRows` with defaults
* silently truncated mounts to its 100-row page and paid for a count and
* execution metadata the CSV never used), and remaps stored column-id keys
* back to display names so headers line up with cell values.
*/
async function buildTableCsvForMount(table: TableDefinition): Promise<string> {
const nameById = buildNameById(table.schema)
const headers = table.schema.columns.map((c) => c.name)
const lines = [toCsvRow(headers)]
let after: { position: number; id: string } | null = null
while (true) {
const page = await selectExportRowPage(table, after, TABLE_MOUNT_PAGE_SIZE)
for (const row of page) {
const data = rowDataIdToName(row.data, nameById)
lines.push(toCsvRow(headers.map((header) => formatMountCsvValue(data[header]))))
}
if (page.length < TABLE_MOUNT_PAGE_SIZE) return lines.join('\n')
const last = page[page.length - 1]
after = { position: last.position, id: last.id }
}
}

async function resolveInputFiles(
workspaceId: string,
inputFiles?: unknown[],

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 pendingDeleteMask fetched on every page — N+1 DB round-trips during drain

selectExportRowPage calls pendingDeleteMask(table) unconditionally, so a 100k-row table drained in 20 pages makes 20 extra DB round-trips just to check whether a delete job is running. The mask is derived from table.id which is constant, and the delete job status won't change meaningfully mid-drain. Fetching it once before the loop and threading it through would eliminate the repeated queries.

Comment thread apps/sim/lib/table/service.ts Outdated
@TheodoreSpeaks TheodoreSpeaks changed the title improvement(mothership): table speed parity — keyset reads, limit bounds, background import/delete jobs improvement(mothership): user_table speed parity — limit bounds, keyset paging, background import/delete jobs Jun 16, 2026
@TheodoreSpeaks

Copy link
Copy Markdown
Collaborator Author

@greptile review

Comment thread apps/sim/lib/table/service.ts Outdated
Comment thread apps/sim/lib/copilot/tools/server/table/user-table.ts
@TheodoreSpeaks

Copy link
Copy Markdown
Collaborator Author

@greptile review

@TheodoreSpeaks TheodoreSpeaks force-pushed the improvement/table-mothership-speed branch from 0e47d6e to 440c1d6 Compare June 16, 2026 21:30
@TheodoreSpeaks TheodoreSpeaks changed the base branch from main to staging June 16, 2026 21:30
@TheodoreSpeaks

Copy link
Copy Markdown
Collaborator Author

@greptile review

sourceFile: file.name,
},
} finally {
await releaseJobClaim(table.id, inlineImportId).catch(() => {})

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Catalog lists unimplemented get_job

Medium Severity

This PR adds get_job to the user_table operation enum in the generated tool catalog, but userTableServerTool.execute has no get_job branch and falls through to Unknown operation: get_job. Models that follow the catalog cannot poll import/delete jobs that way; the PR text also says progress is observed via query_rows only.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 440c1d6. Configure here.

const nextCursor =
!args.sort && result.rows.length === result.limit && lastRow?.orderKey
? { orderKey: lastRow.orderKey, id: lastRow.id }
: undefined

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nextCursor on final full page

Medium Severity

query_rows sets nextCursor whenever the page length equals result.limit, without checking totalCount. When the result set size exactly matches the page size (e.g. 100 rows with default limit and 100 total), clients still get a cursor and an extra empty page on the next after request.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 440c1d6. Configure here.

Comment thread apps/sim/lib/copilot/generated/tool-catalog-v1.ts Outdated
@TheodoreSpeaks TheodoreSpeaks force-pushed the improvement/table-mothership-speed branch from 419894a to 45096a5 Compare June 16, 2026 23:13
table,
{
filter: filterNamesToIds(args.filter, idByName),
filter: idFilter,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline delete bypasses bulk cap

Medium Severity

For delete_rows_by_filter without an explicit limit, a query_rows count at or below 1000 chooses the inline path, but deleteRowsByFilter runs with no limit and loads every current match into memory. Rows matching the filter can grow before delete runs, exceeding the 1000-row bulk cap this change adds elsewhere.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 45096a5. Configure here.

@TheodoreSpeaks TheodoreSpeaks force-pushed the improvement/table-mothership-speed branch from 45096a5 to 59efc85 Compare June 16, 2026 23:22

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 4 total unresolved issues (including 3 from previous reviews).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 59efc85. Configure here.

const lastRow = result.rows[result.rows.length - 1]
const nextCursor =
!args.sort && result.rows.length === result.limit && lastRow?.orderKey
? { orderKey: lastRow.orderKey, id: lastRow.id }

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor order can diverge

Medium Severity

nextCursor is built from orderKey, but the default row order can still be position-based when fractional ordering is disabled. Passing that cursor back can seek in a different order than the page was returned, causing skipped or repeated rows.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 59efc85. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant