feat(context-dev): add Context.dev web + brand data integration#5048

Merged
waleedlatif1 merged 3 commits into staging from worktree-context-dev-integration
Jun 15, 2026

@waleedlatif1 waleedlatif1 commented Jun 15, 2026

Collaborator

Summary

Adds a Context.dev integration — a single API-key (Bearer) service for web scraping and brand/firmographic data. Modeled on the existing Firecrawl integration (AuthMode.ApiKey, BYOK via the block's API-key field). Covers all relevant Context.dev endpoints across every API family.

Web scraping

scrape_markdown, scrape_html, scrape_images, screenshot (stored as a downloadable file), crawl, map, search

Web extraction

extract (JSON-schema structured data), extract_product, extract_products, scrape_fonts, scrape_styleguide, classify_naics, classify_sic

Brand intelligence

get_brand (by domain), get_brand_by_name, get_brand_by_email, get_brand_by_ticker, get_brand_simplified, identify_transaction

Utility

prefetch_domain, prefetch_by_email

22 tools total.

File handling

The screenshot endpoint returns a hosted image URL. The tool surfaces it as a file-typed output (ToolFileData with url), so the executor's FileToolProcessor downloads and stores it as a UserFile — the same path other file-producing tools use. The file's MIME type/extension are derived from the returned URL. The raw screenshotUrl is also exposed.
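
A minimal sketch of how the MIME type and extension might be derived from the returned URL. The helper name `deriveScreenshotFileType` and the allowlist entries are assumptions for illustration; the PR only states that the derivation uses the URL and falls back to PNG.

```typescript
// Hypothetical allowlist: the PR mentions known extensions but does not list them.
const KNOWN_IMAGE_TYPES: Record<string, string> = {
  png: 'image/png',
  jpg: 'image/jpeg',
  jpeg: 'image/jpeg',
  webp: 'image/webp',
}

interface ScreenshotFileType {
  extension: string
  mimeType: string
}

// Derive extension and MIME type from the URL path, falling back to PNG
// when the URL cannot be parsed or the extension is not in the allowlist.
function deriveScreenshotFileType(screenshotUrl: string): ScreenshotFileType {
  const fallback: ScreenshotFileType = { extension: 'png', mimeType: 'image/png' }
  let pathname: string
  try {
    // Using the parsed pathname ignores any query string or fragment.
    pathname = new URL(screenshotUrl).pathname
  } catch {
    return fallback
  }
  const lastSegment = pathname.split('/').pop() ?? ''
  const dotIndex = lastSegment.lastIndexOf('.')
  if (dotIndex === -1) return fallback
  const extension = lastSegment.slice(dotIndex + 1).toLowerCase()
  const mimeType = KNOWN_IMAGE_TYPES[extension]
  return mimeType ? { extension, mimeType } : fallback
}
```

Reading the extension from the parsed pathname rather than the raw string keeps signed-URL query parameters from being mistaken for a file extension.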

Details

  • Endpoints, params, and response shapes verified against the live Context.dev API reference (https://docs.context.dev). Every response's key_metadata credit accounting is surfaced as creditsConsumed / creditsRemaining.
  • Shared brand output schema + transform helper reused across the six brand endpoints.
  • New block context_dev (white background, brand logo icon), registered in blocks/registry.ts; tools registered in tools/registry.ts.
  • ContextDevBlockMeta with 9 use-case templates.
  • Generated integration docs included.
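
The credit accounting described in the first bullet could be surfaced roughly as follows. This is a sketch under assumptions: the inner field names `credits_consumed` and `credits_remaining` are hypothetical, since the PR names only the `key_metadata` object and the two output fields.

```typescript
// Hypothetical shape: the PR does not show the fields inside key_metadata.
interface KeyMetadata {
  credits_consumed?: number
  credits_remaining?: number
}

interface CreditFields {
  creditsConsumed: number | null
  creditsRemaining: number | null
}

// Map the API's key_metadata object onto the two output fields,
// using null when a value is missing or not a finite number.
function extractCreditMetadata(keyMetadata: KeyMetadata | null | undefined): CreditFields {
  const toNumberOrNull = (value: unknown): number | null =>
    typeof value === 'number' && Number.isFinite(value) ? value : null
  return {
    creditsConsumed: toNumberOrNull(keyMetadata?.credits_consumed),
    creditsRemaining: toNumberOrNull(keyMetadata?.credits_remaining),
  }
}
```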

Validation

  • bun run type-check — clean for the integration (0 errors in context_dev)
  • bun run lint — clean
  • bun run check:api-validation — passes (no new boundary routes)

Review

Addressed all inline findings from the first review pass (commit 3ff8e334e2): wired includeFrames into the block, split crawl/extract maxPages so the differing page limits can't be crossed, and derive the screenshot MIME type from the URL.
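
One way to keep the two page limits from being crossed is to resolve `maxPages` per operation. The sketch below is illustrative only: `resolveMaxPages` is a hypothetical helper, and whether the real block clamps or rejects out-of-range values is not stated in the PR. The limits (crawl 1-500, extract 1-50) come from the review summary.

```typescript
// Limits from the review summary: crawl accepts 1-500 pages, extract 1-50.
const MAX_PAGES_LIMITS = { crawl: 500, extract: 50 } as const
type PagedOperation = keyof typeof MAX_PAGES_LIMITS

// Hypothetical helper: returns a page count clamped to the operation's range,
// or undefined so the parameter can be omitted from the request.
function resolveMaxPages(
  operation: PagedOperation,
  raw: string | number | undefined
): number | undefined {
  if (raw === undefined || raw === '') return undefined
  const parsed = typeof raw === 'number' ? raw : Number(raw)
  if (!Number.isInteger(parsed)) return undefined
  return Math.min(Math.max(parsed, 1), MAX_PAGES_LIMITS[operation])
}
```

Because each operation reads its own field and its own limit, a value entered for crawl never reaches extract unclamped.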

@vercel

vercel Bot commented Jun 15, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| docs | Ready | Preview, Comment | Jun 15, 2026 2:35am |

@waleedlatif1

Collaborator Author

@greptile

@waleedlatif1

Collaborator Author

@cursor review

@cursor

cursor Bot commented Jun 15, 2026

PR Summary

Low Risk
Additive integration and documentation; user-supplied API keys call a third-party API with no changes to core auth or data paths.

Overview
Adds a new Context.dev integration (API-key auth, BYOK) so workflows can call scraping, extraction, and brand APIs from one block.

The context_dev workflow block exposes 22 operations via an operation dropdown—web scrape/crawl/map/search/screenshot, JSON-schema extract, product and design-system pulls, NAICS/SIC classification, brand lookup (domain, name, email, ticker, transaction), and prefetch helpers—with conditional sub-blocks and param mapping into the matching tools.

Implements 22 context_dev_* tools against api.context.dev, shared utils for auth/errors/credits, typed outputs (including file for screenshots from hosted URLs), and registers the block and tools in the sim registries.

Docs and catalog updates: integration MDX, meta.json nav entry, integrations.json entry, ContextDevIcon in docs/sim icon maps, and block meta with nine starter templates.

Reviewed by Cursor Bugbot for commit 3ff8e33. Configure here.

Comment thread apps/sim/blocks/blocks/context_dev.ts
Comment thread apps/sim/blocks/blocks/context_dev.ts
@greptile-apps

greptile-apps Bot commented Jun 15, 2026

Contributor

Greptile Summary

Adds a full Context.dev integration block with 22 tools spanning web scraping, structured extraction, brand intelligence, industry classification, and utility prefetch endpoints. The implementation follows the existing Firecrawl pattern: AuthMode.ApiKey, BYOK API key forwarded via Bearer header, and shared output constants reused across related tools.

  • 22 tools registered in tools/registry.ts and declared in blocks/blocks/context_dev.ts; a single block config with a 22-option operation dropdown routes all calls through one params dispatch switch, with separate maxPages subblocks for crawl (1–500) and extract (1–50) to avoid cross-contamination.
  • Screenshot tool derives extension and MIME type from the returned URL with a known-extension allowlist, falling back to PNG — addressing the previous review finding.
  • Shared utilities (utils.ts) centralise auth headers, error parsing, credit field extraction, and brand response transformation, keeping individual tool files lean; appendParam correctly forwards false booleans while skipping undefined/null/empty string.
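
The `appendParam` behaviour described in the last bullet might look like the sketch below. The signature is an assumption (the PR does not show it), and the `maxAge` and `country` parameter names are hypothetical placeholders; only `includeFrames` is named in the PR.

```typescript
type ParamValue = string | number | boolean | null | undefined

// Skip undefined, null, and empty-string values, but keep `false` and `0`,
// which are meaningful values rather than "not provided".
function appendParam(query: URLSearchParams, key: string, value: ParamValue): void {
  if (value === undefined || value === null || value === '') return
  query.append(key, String(value))
}

const query = new URLSearchParams()
appendParam(query, 'url', 'https://example.com')
appendParam(query, 'includeFrames', false) // forwarded as "false"
appendParam(query, 'maxAge', undefined) // skipped (hypothetical parameter)
appendParam(query, 'country', '') // skipped (hypothetical parameter)
const queryString = query.toString()
```

A plain truthiness check (`if (!value) return`) would silently drop `false`, which is why the comparison is spelled out per value.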

Confidence Score: 5/5

New integration-only addition with no changes to existing tool or block logic; all 22 tools are isolated behind the context_dev namespace and cannot affect other integrations.

The integration is a well-contained additive change. Auth headers, error handling, credit extraction, and response transforms all follow the established Firecrawl pattern. The block's params dispatch is complete and verified (all 22 operations have matching switch cases, the API key uses user-only visibility, and the previously flagged includeFrames and screenshot MIME-type gaps have been resolved). The one naming ambiguity in classify_sic (input type vs output type) is a clarity issue, not a runtime defect.

apps/sim/tools/context_dev/classify_sic.ts — minor naming ambiguity between the type request param and the type response field worth tidying before this is consumed by LLM agents.

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/sim/blocks/blocks/context_dev.ts | 940-line block config with 22 operations; correctly splits crawl/extract maxPages into separate subblocks, wires includeFrames for scrape_markdown/scrape_html, and maps sicType→type for classify_sic via the setString target alias |
| apps/sim/tools/context_dev/utils.ts | Shared utilities (base URL, auth headers, error parsing, credit extraction, brand transform, appendParam); well-structured, correctly skips undefined/null/empty in appendParam while forwarding false booleans |
| apps/sim/tools/context_dev/types.ts | 464-line type file with param/response interfaces and output schema constants; the classify_sic tool's type param name clashes with the output type field (resolved input format), creating a naming ambiguity that could confuse downstream consumers |
| apps/sim/tools/context_dev/screenshot.ts | Screenshot tool with MIME type derivation from URL extension; correctly falls back to PNG on unparseable URL, and guards the ToolFileData creation behind a non-empty screenshotUrl check |
| apps/sim/tools/context_dev/extract.ts | Extract tool uses separate maxPages limit (1-50) from crawl's (1-500); schema is always included in body even when undefined (but JSON.stringify drops undefined keys, so no issue), and schema is required so block UI prevents invocation without it |
| apps/sim/tools/context_dev/classify_sic.ts | SIC classify tool: input param type (taxonomy version) and output field type (resolved input kind) share the same name, which is accurate to the API but creates a naming ambiguity visible to LLM callers |
| apps/sim/tools/context_dev/search.ts | Search tool wraps markdownEnabled in the API-expected markdownOptions object; block handles includeDomains/excludeDomains as comma-delimited or JSON-array strings before passing to the tool |
| apps/sim/tools/registry.ts | All 22 context_dev tools correctly registered with tool IDs matching the block's access list and the params dispatch switch cases |
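
Two behaviours noted in the table can be illustrated together: accepting domain filters as either comma-delimited or JSON-array strings, and relying on `JSON.stringify` to drop keys whose value is `undefined`. This is a sketch, not the block's actual code; `parseDomainList` and the `query` body field are hypothetical names.

```typescript
// Hypothetical helper: accepts "a.com, b.com" or '["a.com","b.com"]'
// and returns undefined when no domains remain after trimming.
function parseDomainList(input: string | undefined): string[] | undefined {
  const trimmed = input?.trim()
  if (!trimmed) return undefined
  if (trimmed.startsWith('[')) {
    try {
      const parsed: unknown = JSON.parse(trimmed)
      if (Array.isArray(parsed)) {
        const fromJson = parsed.map((item) => String(item).trim()).filter(Boolean)
        return fromJson.length > 0 ? fromJson : undefined
      }
    } catch {
      // Not valid JSON; fall through and treat the input as comma-delimited.
    }
  }
  const fromCsv = trimmed.split(',').map((part) => part.trim()).filter(Boolean)
  return fromCsv.length > 0 ? fromCsv : undefined
}

// JSON.stringify omits object keys whose value is undefined,
// so an unset filter never appears in the request body.
const body = JSON.stringify({
  query: 'acme', // hypothetical field name
  includeDomains: parseDomainList(''),
  excludeDomains: parseDomainList('a.com, b.com'),
})
```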

Sequence Diagram

sequenceDiagram
    participant User as User / LLM
    participant Block as ContextDevBlock
    participant Tool as Tool (one of 22)
    participant API as Context.dev API

    User->>Block: invoke with operation + params + apiKey
    Block->>Block: params dispatch (switch on operation)
    Note over Block: setBool / setString / setNumber coerce inputs
    Block->>Tool: resolved params object

    alt GET endpoint (scrape_markdown, get_brand, screenshot, map, …)
        Tool->>API: GET /v1/… ?params… (Bearer apiKey)
    else "POST endpoint (crawl, extract, search, prefetch_*, …)"
        Tool->>API: "POST /v1/… { body } (Bearer apiKey)"
    end

    API-->>Tool: "JSON { ...data, key_metadata }"
    Tool->>Tool: parseContextDevResponse → transformResponse
    Note over Tool: extractCreditMetadata(key_metadata)
    Tool-->>Block: "{ success, output: { ...data, creditsConsumed, creditsRemaining } }"
    Block-->>User: output fields
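
The GET/POST split in the diagram can be sketched as a small request builder. Everything here is an assumption made for illustration: `buildRequest` and `RequestSpec` are hypothetical names, the base URL is a placeholder, and the real tools may assemble their requests differently. Only the Bearer header and the query-string versus JSON-body distinction come from the PR.

```typescript
type Primitive = string | number | boolean

interface RequestSpec {
  method: 'GET' | 'POST'
  url: string
  headers: Record<string, string>
  body?: string
}

// GET endpoints carry params in the query string; POST endpoints carry a JSON body.
// Both send the user's API key as a Bearer token.
function buildRequest(
  method: 'GET' | 'POST',
  baseUrl: string,
  path: string,
  apiKey: string,
  params: Record<string, Primitive | undefined>
): RequestSpec {
  const headers: Record<string, string> = { Authorization: `Bearer ${apiKey}` }
  if (method === 'GET') {
    const query = new URLSearchParams()
    for (const [key, value] of Object.entries(params)) {
      // Keep false and 0; drop only unset or empty values.
      if (value !== undefined && value !== '') query.append(key, String(value))
    }
    const qs = query.toString()
    return { method, url: qs ? `${baseUrl}${path}?${qs}` : `${baseUrl}${path}`, headers }
  }
  headers['Content-Type'] = 'application/json'
  return { method, url: `${baseUrl}${path}`, headers, body: JSON.stringify(params) }
}

// Placeholder base URL; the real host and version prefix are not reproduced here.
const PLACEHOLDER_BASE = 'https://api.example.invalid'
const getSpec = buildRequest('GET', PLACEHOLDER_BASE, '/web/scrape/markdown', 'my-key', {
  url: 'https://example.com',
  includeFrames: false,
})
const postSpec = buildRequest('POST', PLACEHOLDER_BASE, '/web/crawl', 'my-key', {
  url: 'https://example.com',
  maxPages: 10,
})
```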

Reviews (3): Last reviewed commit: "fix(context-dev): wire includeFrames, sp..." | Re-trigger Greptile

Comment thread apps/sim/blocks/blocks/context_dev.ts
@greptile-apps

greptile-apps Bot commented Jun 15, 2026

Contributor

Greptile Summary

This PR adds a Context.dev integration with 10 operations spanning web scraping (markdown, HTML, screenshot, crawl, sitemap map), web search, structured extraction, NAICS/SIC classification, and brand data retrieval — modeled closely on the existing Firecrawl pattern (BYOK Bearer auth, appendParam-based URL building, shared credit metadata extraction).

  • All 10 tools are correctly registered in both tools/registry.ts and blocks/registry.ts, with consistent param visibility (user-only for apiKey), shared utilities in utils.ts, and a unified block that dispatches to the right tool via an operation dropdown.
  • The screenshot tool surfaces the captured image as a ToolFileData output so the executor's FileToolProcessor can download and store it as a UserFile, following the same path as other file-producing tools.
  • Two capability gaps were found: includeFrames is declared in both scrape_markdown and scrape_html tool definitions but never forwarded in the block's params dispatcher or exposed in the UI; and the screenshot tool hardcodes mimeType: 'image/png' regardless of the format the API may return.

Confidence Score: 4/5

Safe to merge — the integration is self-contained, follows established patterns, and only minor capability gaps were found.

The implementation is consistent across all 10 tools, auth credentials are correctly scoped to user-only, and the file-output path for screenshots follows the established FileToolProcessor convention. Two small gaps exist: includeFrames is documented in both scraping tool definitions but never wired through the block dispatcher or UI, and the screenshot tool unconditionally labels files as image/png without inspecting the actual response content type. Neither affects correctness of the other nine operations.

apps/sim/blocks/blocks/context_dev.ts (params dispatcher completeness) and apps/sim/tools/context_dev/screenshot.ts (MIME type derivation).

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/sim/blocks/blocks/context_dev.ts | Block config for all 10 Context.dev operations; the params dispatcher omits includeFrames for both scrape_markdown and scrape_html operations, making the documented parameter unreachable via the block. |
| apps/sim/tools/context_dev/screenshot.ts | Implements screenshot capture with file output; MIME type is hardcoded to image/png regardless of the API-returned format, which may mislabel non-PNG screenshots. |
| apps/sim/tools/context_dev/utils.ts | Shared utilities: URL builder, auth headers, response parsing, credit metadata extraction, and appendParam helper. All edge cases (false, 0, null) handled correctly. |
| apps/sim/tools/context_dev/types.ts | Type definitions for all 10 tool param/response interfaces plus shared output-property constants. Clean and consistent across operations. |
| apps/sim/tools/context_dev/extract.ts | POST-based extraction tool that passes schema and optional crawl params; straightforward and consistent with other POST tools in the integration. |
| apps/sim/tools/context_dev/search.ts | Web search tool wrapping markdownEnabled into markdownOptions body shape and correctly handling optional array domain filters. |
| apps/sim/tools/registry.ts | All 10 context_dev tools correctly registered in the tool registry, consistent with the block's access list. |
| apps/sim/blocks/registry.ts | ContextDevBlock and ContextDevBlockMeta correctly imported and registered alongside other integrations in alphabetical order. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    User([User / LLM]) --> Block["ContextDevBlock\n(operation dropdown)"]
    Block --> Dispatch["params dispatcher\n(switch on operation)"]

    Dispatch --> SM["context_dev_scrape_markdown\nGET /web/scrape/markdown"]
    Dispatch --> SH["context_dev_scrape_html\nGET /web/scrape/html"]
    Dispatch --> SS["context_dev_screenshot\nGET /web/screenshot"]
    Dispatch --> CW["context_dev_crawl\nPOST /web/crawl"]
    Dispatch --> MP["context_dev_map\nGET /web/scrape/sitemap"]
    Dispatch --> SR["context_dev_search\nPOST /web/search"]
    Dispatch --> EX["context_dev_extract\nPOST /web/extract"]
    Dispatch --> CN["context_dev_classify_naics\nGET /web/naics"]
    Dispatch --> CS["context_dev_classify_sic\nGET /web/sic"]
    Dispatch --> GB["context_dev_get_brand\nGET /brand/retrieve"]

    SS --> FTP["FileToolProcessor\n(download → UserFile)"]
    FTP --> File[(Stored File)]

    SM & SH & CW & MP & SR & EX & CN & CS & GB --> Credits["creditsConsumed / creditsRemaining\n(key_metadata)"]

Reviews (2): Last reviewed commit: "feat(context-dev): add Context.dev web +..." | Re-trigger Greptile

Comment thread apps/sim/blocks/blocks/context_dev.ts
Comment thread apps/sim/tools/context_dev/screenshot.ts
…mages, prefetch

Expands coverage to all relevant Context.dev endpoints (22 tools): brand by
name/email/ticker, simplified brand, transaction identifier, single + catalog
product extraction, fonts, styleguide, image discovery, and prefetch utilities.
Shared brand output schema and transform helper; verified against the live API.
…erive screenshot MIME

Addresses review feedback:
- includeFrames is now a block subblock + param for scrape_markdown/scrape_html
- crawl and extract use separate Max Pages fields (crawl 1-500, extract 1-50) so a
  crawl value can no longer be forwarded to extract beyond its limit
- screenshot file MIME type and extension are derived from the returned URL instead
  of being hardcoded to PNG
@waleedlatif1

Collaborator Author

@greptile

@waleedlatif1

Collaborator Author

@cursor review

@cursor cursor Bot left a comment

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 3ff8e33. Configure here.

@waleedlatif1 waleedlatif1 merged commit 57b58fd into staging Jun 15, 2026
15 checks passed
@waleedlatif1 waleedlatif1 deleted the worktree-context-dev-integration branch June 15, 2026 03:35