feat(frontend): add task-aware field visibility and preview to HuggingFace property editor #5568

Open
ELin2025 wants to merge 48 commits into apache:main from ELin2025:hf/07-property-editor

Conversation

@ELin2025

@ELin2025 ELin2025 commented Jun 8, 2026

Contributor

⚠️ This PR is stacked on #5567. Until that lands, the diff below may also include PR 6a/6b's task selector, model browser, and audio upload changes depending on which base GitHub is showing. The new code in this PR is the task-aware field visibility rules, task preview cards, and custom validators in operator-property-edit-frame.component.ts/.html/.scss, plus the test infrastructure (mockHuggingFaceSchema, mockHuggingFacePredicate) and 7 new spec tests. Once PR #5567 merges and this PR is retargeted to main, the diff should auto-clean to the PR 7 property-editor changes only.

What changes were proposed in this PR?

Wire up the HuggingFace operator's property editor so that selecting a task dynamically controls which fields are visible and shows a media preview card. This is the PR that makes the formly components from PRs 6a/6b user-visible by mapping operator fields to custom field types in jsonSchemaMapIntercept.

Changes to operator-property-edit-frame.component.ts:

  • Map modelId → huggingface formly type, imageInput → huggingface-image-upload, audioInput → huggingface-audio-upload
  • Hide the task field (it is controlled by the HuggingFaceComponent's task dropdown instead)
  • ~13 field visibility rules via formly expressions that show/hide fields based on the selected task (e.g., imageInput only for image tasks, contextColumn only for question-answering, systemPrompt/maxNewTokens/temperature only for text-generation)
  • 3 custom validators: requiredImageInput, requiredAudioInput, requiredPromptColumn — each checks whether the direct input or the corresponding column selector is filled
  • Task preview cards with sample media for 22 task types across 4 kinds (image, video, audio, text), plus a fallback for unknown tasks
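The visibility rules and validators listed above can be sketched in plain TypeScript. This is a minimal illustration, not the PR's code: `HfModel`, `shouldHide`, and the contents of `IMAGE_TASKS` are hypothetical, and the real rules are formly `hide` expressions that receive a `FormlyFieldConfig`.

```typescript
// Hypothetical minimal model shape; the real editor reads these values from the formly model.
interface HfModel {
  task?: string;
  imageInput?: string;
  inputImageColumn?: string;
}

// Assumed task groupings for illustration only; the PR's actual lists may differ.
const IMAGE_TASKS = new Set<string>(["image-classification", "object-detection", "image-to-text"]);
const TEXT_GEN_FIELDS = new Set<string>(["systemPrompt", "maxNewTokens", "temperature"]);

// Returns true when the given field should be hidden for the currently selected task.
function shouldHide(fieldKey: string, model: HfModel): boolean {
  // The task field itself is always hidden; the task dropdown controls it instead.
  if (fieldKey === "task") return true;
  const task = model.task ?? "";
  // With no task selected, every task-specific field below ends up hidden.
  if (fieldKey === "imageInput") return !IMAGE_TASKS.has(task);
  if (fieldKey === "contextColumn") return task !== "question-answering";
  if (TEXT_GEN_FIELDS.has(fieldKey)) return task !== "text-generation";
  return false;
}

// Validator sketch: passes when either the direct input or the column selector is filled.
function requiredImageInput(model: HfModel): boolean {
  const filled = (v?: string): boolean => v !== undefined && v.trim() !== "";
  return filled(model.imageInput) || filled(model.inputImageColumn);
}
```

In a formly setup, a function like `shouldHide` would typically be wrapped as `hide: field => shouldHide(key, field.model)`; that wiring is an assumption here and is not shown in the PR description.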

Changes to test infrastructure:

  • mockHuggingFaceSchema in mock-operator-metadata.data.ts (added to mockOperatorSchemaList)
  • mockHuggingFacePredicate in mock-workflow-data.ts
  • 7 new spec tests covering task preview for known tasks (text, image, audio, video), unknown tasks (fallback), empty task, and non-HF operators

Any related issues, documentation, discussions?

How was this PR tested?

7 new spec tests added in operator-property-edit-frame.component.spec.ts:

  1. Non-HF operators return null task preview
  2. HF operator with text-generation returns text-kind preview with correct title
  3. Unknown task returns fallback text preview with formatted title
  4. image-classification returns image-kind preview
  5. text-to-speech returns audio-kind preview
  6. text-to-video returns video-kind preview
  7. Empty task returns null preview
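The seven cases above amount to a lookup with two null paths and a fallback. A minimal sketch under stated assumptions: `taskPreview`, `HF_OPERATOR_TYPE`, the sample table, and the title-casing rule in `formatTitle` are all hypothetical stand-ins, not the component's actual `huggingFaceTaskPreview` implementation.

```typescript
type PreviewKind = "image" | "video" | "audio" | "text";

interface TaskPreview {
  kind: PreviewKind;
  title: string;
}

// Hypothetical operator type string and sample table; the PR covers 22 tasks, only 4 are listed here.
const HF_OPERATOR_TYPE = "HuggingFace";
const SAMPLES = new Map<string, TaskPreview>([
  ["text-generation", { kind: "text", title: "Text Generation" }],
  ["image-classification", { kind: "image", title: "Image Classification" }],
  ["text-to-speech", { kind: "audio", title: "Text To Speech" }],
  ["text-to-video", { kind: "video", title: "Text To Video" }],
]);

// Assumed formatting rule: split on hyphens and capitalize each word.
function formatTitle(task: string): string {
  return task
    .split("-")
    .filter(word => word.length > 0)
    .map(word => word.charAt(0).toUpperCase() + word.slice(1))
    .join(" ");
}

function taskPreview(operatorType: string, task: string | undefined): TaskPreview | null {
  if (operatorType !== HF_OPERATOR_TYPE) return null; // non-HF operators have no preview
  if (task === undefined || task === "") return null; // empty task has no preview
  // Unknown tasks fall back to a text preview with a formatted title.
  return SAMPLES.get(task) ?? { kind: "text", title: formatTitle(task) };
}
```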

Run with ng test.

Was this PR authored or co-authored using generative AI tooling?

Co-authored with Claude Opus 4.7

PG1204 and others added 29 commits May 17, 2026 13:02
…d media proxy

Introduces a new Jersey REST resource exposing endpoints used by the
upcoming HuggingFace operator UI:

- GET  /api/huggingface/models       — browse / search models per task
- GET  /api/huggingface/tasks        — list HF pipeline tags with hosted inference
- POST /api/huggingface/upload-audio — upload audio for HF audio tasks
- GET  /api/huggingface/audio-preview — stream uploaded audio (path-validated)
- GET  /api/huggingface/media-proxy   — proxy remote media URLs to bypass CORS

This is the first PR in a stacked series landing the HF operator end-to-end.
No operator code yet; this resource is independently useful and lets the
frontend integrate with HF before the operator class lands.
Addresses xuang7's review on PR apache#5124 — both endpoints previously
buffered the full payload into a heap-resident byte[] with no upper
bound, leaving the JVM open to OOM on a hostile or buggy upstream
response (/media-proxy) or out-of-band write into the audio temp dir
(/audio-preview).

- /media-proxy: switch from Unirest.asBytes() to
  asObject(Function<RawResponse, T>), streaming the upstream body in
  8 KiB chunks with a running byte counter. Aborts with 413 if the
  declared Content-Length exceeds the cap (pre-check) or if the body
  crosses the cap mid-read (defends against missing/lying
  Content-Length). New MAX_MEDIA_PROXY_BYTES = 50 MiB, sized for HF
  inference media (text-to-image ~5 MiB, text-to-video ~30 MiB) with
  headroom.
- /audio-preview: add Files.size() defense-in-depth check before
  readAllBytes. /upload-audio already enforces MAX_AUDIO_BYTES on
  ingest; this catches the case where a bug or out-of-band write puts
  an oversized file in the temp dir.
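The capped read described for /media-proxy can be illustrated independently of the Scala/Unirest code. The sketch below is TypeScript and purely illustrative: `readWithCap`, `CappedRead`, and the pull-style `readChunk` source are hypothetical names, not the resource's `streamMediaWithCap` helper.

```typescript
// Result of a capped read: either the full body or an HTTP-style 413 rejection.
type CappedRead =
  | { ok: true; body: Uint8Array }
  | { ok: false; status: 413; reason: string };

// readChunk is a hypothetical pull-style source that returns null at end of stream.
function readWithCap(
  readChunk: () => Uint8Array | null,
  maxBytes: number,
  declaredLength?: number
): CappedRead {
  // Pre-check: reject early when the declared Content-Length already exceeds the cap.
  if (declaredLength !== undefined && declaredLength > maxBytes) {
    return { ok: false, status: 413, reason: "declared length exceeds cap" };
  }
  const parts: Uint8Array[] = [];
  let total = 0;
  for (let chunk = readChunk(); chunk !== null; chunk = readChunk()) {
    total += chunk.length;
    // Mid-read check: covers a missing or understated Content-Length.
    if (total > maxBytes) {
      return { ok: false, status: 413, reason: "body exceeds cap" };
    }
    parts.push(chunk);
  }
  // Concatenate the accepted chunks into one buffer.
  const body = new Uint8Array(total);
  let offset = 0;
  for (const part of parts) {
    body.set(part, offset);
    offset += part.length;
  }
  return { ok: true, body };
}
```

The running counter is what bounds memory: at most `maxBytes` plus one chunk is ever held, regardless of what the upstream declares.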

Adds a spec covering the audio-preview cap using a sparse-file fixture
so the test stays fast (87/87 spec passes). The media-proxy cap path
is exercised via the existing input-validation suite plus the new
streamMediaWithCap helper - a follow-up can add a fake-RawResponse
unit test if reviewers want explicit coverage of the chunked-read cap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>


Per review on apache#5124 (xuang7, Ma77Ball): mark the resource with
@RolesAllowed(Array("REGULAR", "ADMIN")) to document that all five
endpoints require an authenticated user. The annotation isn't enforced
yet — that's coming with the auth-enforcement PR @Yicong-Huang and
@Ma77Ball are working on — but adding it now means no follow-up
change is needed when enforcement lands, and it matches the convention
used by UserConfigResource / AdminSettingsResource.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eration

Splits the monolithic 1,278-line HuggingFaceInferenceOpDesc from the
team's feature branch into a dispatcher + per-task codegen architecture
and ships the first task family (text-generation) end-to-end.

- TaskCodegen trait + CodegenContext model the per-task variation
- PythonCodegenBase emits the shared provider-fallback / process_table /
  _parse_response infrastructure with two holes for the per-task payload
  and parse snippets
- TextGenCodegen supplies text-generation's chat-completions payload and
  the body["choices"][0]["message"]["content"] parse branch
- HuggingFaceInferenceOpDesc becomes a thin dispatcher (~180 lines)
  holding @JsonProperty fields and the registeredCodegens map

User-input string fields are typed as EncodableString and emitted via
the pyb"..." macro so values reach Python as
self.decode_python_template('<base64>') rather than raw literals; class
constants are assigned in open(self) so self is in scope for the decode
call. Generated process_table runs a defensive _HF_MODEL_ID_PATTERN
check at runtime before any HF URL is composed.

PR 2 of a stacked 9-PR series. PR 1 (apache#5124) ships the supporting REST
resource; PRs 3-5 will add image, audio + media-gen, and QA/ranking
task families by registering new *Codegen objects in the dispatcher.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…degen specs

Addresses Codecov's 66.85% patch coverage warning by exercising the
defensive null-handling branches in HuggingFaceInferenceOpDesc.scala and
the TextGenCodegen contract that previously had no spec hits.

- null-tolerance: feed null into every @JsonProperty (token, model, prompt
  col, system prompt, result col, task, maxNewTokens, temperature) and
  assert generatePythonCode still emits a parseable ProcessTableOperator
  with sane defaults (TASK falls back to text-generation, MAX_NEW_TOKENS
  clamps to 256, TEMPERATURE to 0.7). Covers the `if (x == null) ... else
  x` branches that previously had no test that took the null side.
- TextGenCodegen.task: trivial canonical-value check.
- TextGenCodegen ctx-independence: pass an "irrelevant"-filled ctx and
  assert payloadPython / parsePython still reference self.MODEL_ID and
  body["choices"]…. Catches a future refactor that accidentally splices
  ctx fields into the static snippets.

13/13 in HuggingFaceInferenceOpDescSpec, 2/2 in PythonCodeRawInvalidTextSpec
(117/117 descriptors still py_compile cleanly).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Plugs the 9-task image family into the dispatcher pattern established
in PR 2:

  image-only      image-classification, object-detection,
                  image-segmentation, image-to-text
  image + prompt  visual-question-answering, document-question-answering,
                  zero-shot-image-classification, image-text-to-text,
                  image-to-image

- ImageTaskCodegen supplies payload + parse Python for all 9 tasks
- TaskCodegen trait gains a `tasks: Set[String]` default method so a
  single codegen can register under multiple task strings; the
  dispatcher map in HuggingFaceInferenceOpDesc is built from
  registeredCodegens.tasks.flatMap(...)
- CodegenContext extended with imageInput + inputImageColumn
  (EncodableString)
- HuggingFaceInferenceOpDesc gains 2 new @JsonProperty fields and
  registers ImageTaskCodegen
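To illustrate how one codegen can register under several task strings, here is a rough TypeScript analogue of the dispatcher map. The real code is Scala; the interface shape, `buildDispatch`, and the task lists below are assumptions made for the example.

```typescript
// Hypothetical analogue of the TaskCodegen trait: each codegen names the tasks it serves.
interface TaskCodegen {
  readonly name: string;
  readonly tasks: ReadonlyArray<string>;
}

// Flattens every codegen's task list into a single task -> codegen lookup.
function buildDispatch(codegens: ReadonlyArray<TaskCodegen>): Map<string, TaskCodegen> {
  const dispatch = new Map<string, TaskCodegen>();
  for (const codegen of codegens) {
    for (const task of codegen.tasks) {
      dispatch.set(task, codegen);
    }
  }
  return dispatch;
}

// Illustrative registrations; only a subset of the image tasks is listed.
const textGen: TaskCodegen = { name: "TextGenCodegen", tasks: ["text-generation"] };
const imageTasks: TaskCodegen = {
  name: "ImageTaskCodegen",
  tasks: ["image-classification", "object-detection"],
};
const dispatch = buildDispatch([textGen, imageTasks]);
```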

PythonCodegenBase grows to host the shared image infrastructure:
- image_only_tasks / image_prompt_tasks / image_tasks tuples and
  image_headers in process_table
- per-row image bytes resolution from upload (self._read_image_input)
  or input column (self._read_binary_value + self._compress_image_bytes)
- use_raw_binary_body / raw_binary_headers state threaded through
  _post_with_fallback (signature extended)
- _post_with_fallback adds the image-text-to-text chat-completions
  branch and the model-author vision branch
- _call_provider adds branches for zai-org's custom API, Replicate
  predictions + polling, Fal-ai, Wavespeed submit+poll, and image
  embedding in OpenAI-compatible / unknown-provider fallbacks
- image-content-type response handling returns data:image URLs
- image helpers added: _read_image_input, _compress_image_bytes,
  _image_input_as_base64, _read_binary_value, _looks_like_html,
  _html_to_image_bytes, _extract_json_arg, _url_to_data_url

User-input strings continue to flow through pyb"..." + EncodableString
so they reach Python as self.decode_python_template('<base64>') rather
than raw literals. PythonCodeRawInvalidTextSpec still passes
(117/117 descriptors py_compile cleanly).

Frontend integration adds only the HF lines (no agent / dataset
noise from the source branch):
- HuggingFaceImageUploadComponent declared in app.module.ts
- huggingface-image-upload formly type registered in formly-config.ts
- Image upload component .ts/.html/.scss cherry-picked from huggingFace
- HuggingFace.png + sample-image.png assets

PR 3 of a stacked 9-PR series. Stacks on hf/02-operator-textgen.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nent

Register the huggingface formly field type and declare HuggingFaceComponent
in AppModule. Provides a task dropdown, paginated model list with client-side
search, and per-task field state preservation when switching tasks.
The rxjs/no-implicit-any-catch ESLint rule requires explicit type
annotations on error callbacks in .subscribe() calls.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the common label Jun 8, 2026
@codecov-commenter

codecov-commenter commented Jun 8, 2026


Codecov Report

❌ Patch coverage is 36.73469% with 558 lines in your changes missing coverage. Please review.
✅ Project coverage is 52.05%. Comparing base (8001e4c) to head (06a0680).
⚠️ Report is 1 commit behind head on main.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...e/component/hugging-face/hugging-face.component.ts | 7.77% | 248 Missing and 1 partial ⚠️ |
| ...component/hugging-face/hugging-face.component.html | 0.00% | 96 Missing ⚠️ |
| ...it-frame/operator-property-edit-frame.component.ts | 49.00% | 49 Missing and 2 partials ⚠️ |
| ...-frame/operator-property-edit-frame.component.html | 3.84% | 50 Missing ⚠️ |
| ...mage-upload/hugging-face-image-upload.component.ts | 50.00% | 41 Missing and 1 partial ⚠️ |
| ...udio-upload/hugging-face-audio-upload.component.ts | 44.92% | 34 Missing and 4 partials ⚠️ |
| ...io-upload/hugging-face-audio-upload.component.html | 30.43% | 16 Missing ⚠️ |
| ...ge-upload/hugging-face-image-upload.component.html | 38.88% | 11 Missing ⚠️ |
| ...rator/huggingFace/HuggingFaceInferenceOpDesc.scala | 95.58% | 0 Missing and 3 partials ⚠️ |
| ...ber/operator/huggingFace/codegen/TaskCodegen.scala | 88.88% | 2 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5568      +/-   ##
============================================
- Coverage     52.54%   52.05%   -0.50%     
- Complexity     2484     2520      +36     
============================================
  Files          1071     1085      +14     
  Lines         41363    42163     +800     
  Branches       4441     4584     +143     
============================================
+ Hits          21733    21946     +213     
- Misses        18357    18929     +572     
- Partials       1273     1288      +15     

| Flag | Coverage Δ | *Carryforward flag |
|---|---|---|
| access-control-service | 64.61% <ø> (ø) | |
| agent-service | 33.34% <ø> (-1.02%) ⬇️ | Carriedforward from c009c34 |
| amber | 53.72% <96.96%> (+0.41%) ⬆️ | |
| computing-unit-managing-service | 1.65% <ø> (ø) | |
| config-service | 56.71% <ø> (ø) | |
| file-service | 38.21% <ø> (ø) | |
| frontend | 46.12% <22.87%> (-1.06%) ⬇️ | |
| pyamber | 90.61% <ø> (-0.11%) ⬇️ | Carriedforward from c009c34 |
| python | 90.75% <ø> (ø) | Carriedforward from c009c34 |
| workflow-compiling-service | 58.69% <ø> (ø) | |

*This pull request uses carry forward flags. Click here to find out more.


ELin2025 and others added 2 commits June 8, 2026 14:42
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…mponent

Satisfies the rxjs-angular/prefer-takeuntil lint rule.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ELin2025

ELin2025 commented Jun 8, 2026

Contributor Author

/request-review @Ma77Ball

@Ma77Ball Ma77Ball left a comment

Contributor


Left comments for review below.

if (hfKey === "contextColumn") {
mappedField.expressions = {
...mappedField.expressions,
hide: (field: FormlyFieldConfig) => getSelectedTask(field) !== "question-answering",

Contributor


contextColumn shows only for the exact string question-answering, so table-question-answering has no context field at all: candidateLabels, sentencesColumn, and contextColumn are all hidden for it, leaving only promptColumn, so the user cannot supply the table. If table QA (or document/visual QA) is meant to use this field, widen the condition. Manual change, confirm against the backend which QA tasks consume contextColumn first:

hide: (field) => {
  const t = getSelectedTask(field);
  return t !== "question-answering" && t !== "table-question-answering";
}

Contributor Author


Table question-answering models don't require a contextColumn, since they take a table as input and use it as their context.

ELin2025 and others added 9 commits June 11, 2026 15:16
…mponent

Fix setInterval/setTimeout leaks by tracking timer IDs and clearing them
in ngOnDestroy. Remove takeUntil(destroy$) from shared module-level task
fetch to prevent cache poisoning when the initiating component is destroyed.
Remove unused TASK_TAG_MAP and TASK_NAMES exports.
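The timer-leak fix follows a common pattern: record every timer ID at creation and clear them all on teardown. A minimal sketch, assuming a hypothetical `TimerRegistry` helper; the component itself may track IDs differently.

```typescript
// Hypothetical helper that tracks timer IDs so they can all be cleared on teardown.
class TimerRegistry {
  private timeouts = new Set<ReturnType<typeof setTimeout>>();
  private intervals = new Set<ReturnType<typeof setInterval>>();

  setTimeout(fn: () => void, ms: number): void {
    const id = setTimeout(() => {
      // A fired timeout no longer needs tracking.
      this.timeouts.delete(id);
      fn();
    }, ms);
    this.timeouts.add(id);
  }

  setInterval(fn: () => void, ms: number): void {
    this.intervals.add(setInterval(fn, ms));
  }

  // Intended to be called from a teardown hook such as Angular's ngOnDestroy.
  clearAll(): void {
    this.timeouts.forEach(id => clearTimeout(id));
    this.intervals.forEach(id => clearInterval(id));
    this.timeouts.clear();
    this.intervals.clear();
  }

  pendingCount(): number {
    return this.timeouts.size + this.intervals.size;
  }
}
```

A component would hold one registry instance, route its `setTimeout`/`setInterval` calls through it, and call `clearAll()` in `ngOnDestroy`.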

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…FaceComponent

inFlightByTag was written but never read, so multiple component instances
mounting for the same uncached task would each fire a full backend request.
Add an in-flight guard that polls for the existing request's completion,
matching the pattern already used for task fetches.
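For comparison, one way to deduplicate concurrent requests is to hand every caller the same pending promise. Note that this is a different technique from the polling guard the commit describes; `InFlightCache` and its `fetcher` parameter are hypothetical.

```typescript
// Hypothetical in-flight guard: concurrent callers for the same key share one pending request.
class InFlightCache<T> {
  private inFlight = new Map<string, Promise<T>>();

  get(key: string, fetcher: (key: string) => Promise<T>): Promise<T> {
    const existing = this.inFlight.get(key);
    if (existing !== undefined) {
      return existing; // a request for this key is already running
    }
    const request = fetcher(key);
    this.inFlight.set(key, request);
    // Drop the entry once the request settles, whether it succeeded or failed.
    const release = (): void => {
      this.inFlight.delete(key);
    };
    request.then(release, release);
    return request;
  }
}
```

Sharing the promise avoids a polling timer altogether, at the cost of requiring callers to consume a promise rather than check a cache.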

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… in HuggingFaceComponent

Read X-Texera-Truncated header via { observe: "response" } to detect
when the backend's model list is incomplete. Show a notice prompting
users to search. When truncated, search queries are sent to the backend
search endpoint with debounce; otherwise local filtering is used.
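The choice between backend and local search can be sketched as a small decision function. This is an illustration under assumptions: the header value format (the literal string "true") is a guess, and `searchModeFor` and `filterLocally` are hypothetical names. The `getHeader` parameter mirrors the shape of Angular's `HttpHeaders.get`, which returns `string | null`.

```typescript
type SearchMode = "backend" | "local";

// Assumption: the header carries the literal string "true" when the model list is truncated.
function searchModeFor(getHeader: (name: string) => string | null): SearchMode {
  const raw = getHeader("X-Texera-Truncated");
  return raw !== null && raw.trim().toLowerCase() === "true" ? "backend" : "local";
}

// Local filtering path, used when the full model list is already on the client.
function filterLocally(models: string[], query: string): string[] {
  const q = query.trim().toLowerCase();
  if (q === "") return models;
  return models.filter(model => model.toLowerCase().indexOf(q) !== -1);
}
```

With `{ observe: "response" }`, the response object exposes `headers`, so a caller could pass `name => response.headers.get(name)` as `getHeader`.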

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…r lint rule

Add back destroy$ and takeUntil(this.destroy$) on all subscribe calls.
For the shared task fetch, add finalize() to reset tasksFetchSubscription
when takeUntil fires before next/error, preventing the stale-guard bug
the reviewer originally flagged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Register the huggingface-audio-upload formly field type and declare
HuggingFaceAudioUploadComponent in AppModule. Handles server-side audio
storage via the /huggingface/upload-audio endpoint with local preview.

Co-Authored-By: Anish Shivamurthy <anish@uci.edu>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ELin2025 ELin2025 force-pushed the hf/07-property-editor branch from 553d90d to d1dc52a Compare June 11, 2026 23:38
@github-actions

github-actions Bot commented Jun 11, 2026

Contributor

Arrow Flight E2E bench

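| batch | schema_w | str_len | n_batches | tuples/s | MB/s | p50 ms | p99 ms | total ms |
|---|---|---|---|---|---|---|---|---|
| 10 | 10 | 64 | 20 | 701 | 0.428 | 13.07 | 25.82 | 285.11 |
| 100 | 10 | 64 | 20 | 1759 | 1.073 | 56.03 | 70.68 | 1137.14 |
| 1000 | 10 | 64 | 20 | 2132 | 1.302 | 468.45 | 504.11 | 9378.76 |
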
Raw CSV
config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,285.11,200,128000,701,0.428,13066.19,25815.38,25815.38
1,100,10,64,20,1137.14,2000,1280000,1759,1.073,56034.62,70677.37,70677.37
2,1000,10,64,20,9378.76,20000,12800000,2132,1.302,468453.28,504110.62,504110.62

Full workflow run

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ELin2025 ELin2025 force-pushed the hf/07-property-editor branch from 3487d29 to 18cc966 Compare June 11, 2026 23:51
… upload component

Add early return in onFileSelected when isUploading is true to prevent
race conditions from concurrent file selections. Remove the second spec
test that had a dead metadata variable and duplicated the first test's
assertion.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ELin2025 ELin2025 force-pushed the hf/07-property-editor branch from 18cc966 to eafce6d Compare June 12, 2026 00:00
ELin2025 and others added 5 commits June 11, 2026 17:07
…ponent

Add meaningful tests: reject non-audio files, successful upload sets
formControl value, and concurrent upload guard prevents race conditions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…gFace property editor

Show/hide operator fields based on the selected HuggingFace task (e.g.,
imageInput only for image tasks, contextColumn only for question-answering).
Adds task preview cards with media samples per task kind (image/video/audio/text),
custom validators for required inputs, and ~13 field visibility rules inside
the formly jsonSchemaMapIntercept.

Co-Authored-By: Anish Shivamurthy <anish@uci.edu>
Add mockHuggingFaceSchema and mockHuggingFacePredicate to test infrastructure.
Add 7 spec tests covering huggingFaceTaskPreview for known tasks (text, image,
audio, video), unknown tasks (fallback), empty tasks, and non-HF operators.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ELin2025 ELin2025 force-pushed the hf/07-property-editor branch from eafce6d to c009c34 Compare June 12, 2026 00:09
…ew sample

Fix inverted fallback for systemPrompt/maxNewTokens/temperature: these
fields now correctly hide when no task is selected, matching the behavior
of all other HuggingFace fields. Add missing image-text-to-text entry to
huggingFaceTaskPreviewSamples so it no longer falls through to the
generic text fallback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Labels

common frontend Changes related to the frontend GUI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add task-aware field visibility and preview to HuggingFace property editor

6 participants