feat(frontend): add HuggingFace media output rendering in result panel #5675
Open
juliethecao wants to merge 37 commits into
…d media proxy
Introduces a new Jersey REST resource exposing endpoints used by the upcoming HuggingFace operator UI:
- GET /api/huggingface/models: browse / search models per task
- GET /api/huggingface/tasks: list HF pipeline tags with hosted inference
- POST /api/huggingface/upload-audio: upload audio for HF audio tasks
- GET /api/huggingface/audio-preview: stream uploaded audio (path-validated)
- GET /api/huggingface/media-proxy: proxy remote media URLs to bypass CORS
This is the first PR in a stacked series landing the HF operator end-to-end. No operator code yet; this resource is independently useful and lets the frontend integrate with HF before the operator class lands.
Addresses xuang7's review on PR apache#5124. Both endpoints previously buffered the full payload into a heap-resident byte[] with no upper bound, leaving the JVM open to OOM on a hostile or buggy upstream response (/media-proxy) or an out-of-band write into the audio temp dir (/audio-preview).
- /media-proxy: switch from Unirest.asBytes() to asObject(Function<RawResponse, T>), streaming the upstream body in 8 KiB chunks with a running byte counter. Aborts with 413 if the declared Content-Length exceeds the cap (pre-check) or if the body crosses the cap mid-read (defends against a missing or lying Content-Length). New MAX_MEDIA_PROXY_BYTES = 50 MiB, sized for HF inference media (text-to-image ~5 MiB, text-to-video ~30 MiB) with headroom.
- /audio-preview: add a Files.size() defense-in-depth check before readAllBytes. /upload-audio already enforces MAX_AUDIO_BYTES on ingest; this catches the case where a bug or out-of-band write puts an oversized file in the temp dir.
Adds a spec covering the audio-preview cap using a sparse-file fixture so the test stays fast (87/87 specs pass). The media-proxy cap path is exercised via the existing input-validation suite plus the new streamMediaWithCap helper; a follow-up can add a fake-RawResponse unit test if reviewers want explicit coverage of the chunked-read cap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
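The two-stage cap described for /media-proxy (a pre-check on the declared length, then a running counter during the read) can be sketched as follows. This is a TypeScript model of the idea, not the PR's Scala code: readWithCap, fakeSource, the CappedRead type, and the pull-style readChunk callback are all hypothetical names introduced for illustration. Only the 50 MiB constant and the 413 status come from the commit message.

```typescript
// Illustrative sketch only: a TypeScript model of the capped read, not the PR's Scala code.
const MAX_MEDIA_PROXY_BYTES = 50 * 1024 * 1024; // 50 MiB, as stated in the commit message

type CappedRead =
  | { status: 200; body: Uint8Array }
  | { status: 413; reason: string };

// `readChunk` is a hypothetical pull-style source: next chunk, or null at end of stream.
function readWithCap(
  readChunk: () => Uint8Array | null,
  maxBytes: number,
  declaredLength?: number
): CappedRead {
  // Pre-check: reject before reading anything if the declared length is already too large.
  if (declaredLength !== undefined && declaredLength > maxBytes) {
    return { status: 413, reason: "declared length exceeds cap" };
  }
  const chunks: Uint8Array[] = [];
  let total = 0;
  for (let chunk = readChunk(); chunk !== null; chunk = readChunk()) {
    total += chunk.length;
    // Mid-read check: covers a missing or understated declared length.
    if (total > maxBytes) {
      return { status: 413, reason: "body exceeded cap mid-read" };
    }
    chunks.push(chunk);
  }
  // Only now, with the size known to be within the cap, assemble the full body.
  const body = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    body.set(chunk, offset);
    offset += chunk.length;
  }
  return { status: 200, body };
}

// A fake source that yields `count` chunks of `size` bytes each, then null.
function fakeSource(count: number, size: number): () => Uint8Array | null {
  let remaining = count;
  return () => (remaining-- > 0 ? new Uint8Array(size) : null);
}
```

Calling readWithCap(fakeSource(3, 4), 10) stops at the third chunk, so at most one chunk beyond the cap is ever held. A fake source like this is also roughly what the suggested fake-RawResponse follow-up test would need.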
Per review on apache#5124 (xuang7, Ma77Ball): mark the resource with @RolesAllowed(Array("REGULAR", "ADMIN")) to document that all five endpoints require an authenticated user. The annotation isn't enforced yet (that's coming with the auth-enforcement PR @Yicong-Huang and @Ma77Ball are working on), but adding it now means no follow-up change is needed when enforcement lands, and it matches the convention used by UserConfigResource / AdminSettingsResource.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eration
Splits the monolithic 1,278-line HuggingFaceInferenceOpDesc from the team's feature branch into a dispatcher + per-task codegen architecture and ships the first task family (text-generation) end-to-end.
- TaskCodegen trait + CodegenContext model the per-task variation
- PythonCodegenBase emits the shared provider-fallback / process_table / _parse_response infrastructure with two holes for the per-task payload and parse snippets
- TextGenCodegen supplies text-generation's chat-completions payload and the body["choices"][0]["message"]["content"] parse branch
- HuggingFaceInferenceOpDesc becomes a thin dispatcher (~180 lines) holding @JsonProperty fields and the registeredCodegens map
User-input string fields are typed as EncodableString and emitted via the pyb"..." macro so values reach Python as self.decode_python_template('<base64>') rather than raw literals; class constants are assigned in open(self) so self is in scope for the decode call. Generated process_table runs a defensive _HF_MODEL_ID_PATTERN check at runtime before any HF URL is composed.
PR 2 of a stacked 9-PR series. PR 1 (apache#5124) ships the supporting REST resource; PRs 3-5 will add image, audio + media-gen, and QA/ranking task families by registering new *Codegen objects in the dispatcher.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…degen specs
Addresses Codecov's 66.85% patch coverage warning by exercising the defensive null-handling branches in HuggingFaceInferenceOpDesc.scala and the TextGenCodegen contract that previously had no spec hits.
- null-tolerance: feed null into every @JsonProperty (token, model, prompt col, system prompt, result col, task, maxNewTokens, temperature) and assert generatePythonCode still emits a parseable ProcessTableOperator with sane defaults (TASK falls back to text-generation, MAX_NEW_TOKENS clamps to 256, TEMPERATURE to 0.7). Covers the `if (x == null) ... else x` branches whose null side no test previously took.
- TextGenCodegen.task: trivial canonical-value check.
- TextGenCodegen ctx-independence: pass an "irrelevant"-filled ctx and assert payloadPython / parsePython still reference self.MODEL_ID and body["choices"]…. Catches a future refactor that accidentally splices ctx fields into the static snippets.
13/13 in HuggingFaceInferenceOpDescSpec, 2/2 in PythonCodeRawInvalidTextSpec (117/117 descriptors still py_compile cleanly).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…NAI_COMPATIBLE_PROVIDERS to class constants
Plugs the 9-task image family into the dispatcher pattern established
in PR 2:
  image-only      image-classification, object-detection,
                  image-segmentation, image-to-text
  image + prompt  visual-question-answering, document-question-answering,
                  zero-shot-image-classification, image-text-to-text,
                  image-to-image
- ImageTaskCodegen supplies payload + parse Python for all 9 tasks
- TaskCodegen trait gains a `tasks: Set[String]` default method so a
single codegen can register under multiple task strings; the
dispatcher map in HuggingFaceInferenceOpDesc is built from
registeredCodegens.tasks.flatMap(...)
- CodegenContext extended with imageInput + inputImageColumn
(EncodableString)
- HuggingFaceInferenceOpDesc gains 2 new @JsonProperty fields and
registers ImageTaskCodegen
PythonCodegenBase grows to host the shared image infrastructure:
- image_only_tasks / image_prompt_tasks / image_tasks tuples and
image_headers in process_table
- per-row image bytes resolution from upload (self._read_image_input)
or input column (self._read_binary_value + self._compress_image_bytes)
- use_raw_binary_body / raw_binary_headers state threaded through
_post_with_fallback (signature extended)
- _post_with_fallback adds the image-text-to-text chat-completions
branch and the model-author vision branch
- _call_provider adds branches for zai-org's custom API, Replicate
predictions + polling, Fal-ai, Wavespeed submit+poll, and image
embedding in OpenAI-compatible / unknown-provider fallbacks
- image-content-type response handling returns data:image URLs
- image helpers added: _read_image_input, _compress_image_bytes,
_image_input_as_base64, _read_binary_value, _looks_like_html,
_html_to_image_bytes, _extract_json_arg, _url_to_data_url
User-input strings continue to flow through pyb"..." + EncodableString
so they reach Python as self.decode_python_template('<base64>') rather
than raw literals. PythonCodeRawInvalidTextSpec still passes
(117/117 descriptors py_compile cleanly).
Frontend integration adds only the HF lines (no agent / dataset
noise from the source branch):
- HuggingFaceImageUploadComponent declared in app.module.ts
- huggingface-image-upload formly type registered in formly-config.ts
- Image upload component .ts/.html/.scss cherry-picked from huggingFace
- HuggingFace.png + sample-image.png assets
PR 3 of a stacked 9-PR series. Stacks on hf/02-operator-textgen.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
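The multi-task registration described above, where one codegen registers under several task strings and the dispatcher map is flattened from the registered codegens, can be sketched in TypeScript. This is an illustration of the pattern only: the real code is Scala, and the interface shape, the placeholder payload strings, buildDispatcher, and codegenFor are assumptions. Treating a null task as text-generation at lookup time is likewise an assumption modeled on the defaults mentioned in the null-tolerance specs.

```typescript
// Illustrative sketch only; names mirror the commit messages, bodies are placeholders.
interface TaskCodegen {
  readonly tasks: string[]; // one codegen may serve several task strings
  payloadPython(): string;
}

const textGenCodegen: TaskCodegen = {
  tasks: ["text-generation"],
  payloadPython: () => "# chat-completions payload (placeholder)",
};

const imageTaskCodegen: TaskCodegen = {
  // A subset of the nine image tasks, for brevity.
  tasks: ["image-classification", "object-detection", "image-to-image"],
  payloadPython: () => "# image payload (placeholder)",
};

type Dispatcher = { [task: string]: TaskCodegen };

// Flatten every (task, codegen) pair into one lookup table.
function buildDispatcher(codegens: TaskCodegen[]): Dispatcher {
  const byTask: Dispatcher = {};
  for (const codegen of codegens) {
    for (const task of codegen.tasks) {
      byTask[task] = codegen;
    }
  }
  return byTask;
}

const DEFAULT_TASK = "text-generation"; // assumed fallback for a null task

function codegenFor(dispatcher: Dispatcher, task: string | null): TaskCodegen | undefined {
  const key = task === null ? DEFAULT_TASK : task;
  // hasOwnProperty guards against inherited keys such as "constructor".
  return Object.prototype.hasOwnProperty.call(dispatcher, key) ? dispatcher[key] : undefined;
}

const dispatcher = buildDispatcher([textGenCodegen, imageTaskCodegen]);
```

With this shape, adding a task family means appending one object to the list passed to buildDispatcher, which matches the stated plan of registering new *Codegen objects in later PRs.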
…nent Register the huggingface formly field type and declare HuggingFaceComponent in AppModule. Provides a task dropdown, paginated model list with client-side search, and per-task field state preservation when switching tasks.
The rxjs/no-implicit-any-catch ESLint rule requires explicit type annotations on error callbacks in .subscribe() calls. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Register the huggingface-audio-upload formly field type and declare HuggingFaceAudioUploadComponent in AppModule. Handles server-side audio storage via the /huggingface/upload-audio endpoint with local preview. Co-Authored-By: Anish Shivamurthy <anish@uci.edu>
…gFace property editor Show/hide operator fields based on the selected HuggingFace task (e.g., imageInput only for image tasks, contextColumn only for question-answering). Adds task preview cards with media samples per task kind (image/video/audio/text), custom validators for required inputs, and ~13 field visibility rules inside the formly jsonSchemaMapIntercept. Co-Authored-By: Anish Shivamurthy <anish@uci.edu>
Add mockHuggingFaceSchema and mockHuggingFacePredicate to test infrastructure. Add 7 spec tests covering huggingFaceTaskPreview for known tasks (text, image, audio, video), unknown tasks (fallback), empty tasks, and non-HF operators.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…mponent Satisfies the rxjs-angular/prefer-takeuntil lint rule. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… panel Add isImageUrl/isAudioUrl/isVideoUrl detection helpers and wire them into the result table and row detail modal so image, audio, and video outputs from HuggingFace tasks render inline instead of as raw URLs. Gate backend string truncation on output mode so HF data URLs are never cut off.
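A minimal sketch of what the three detection helpers might look like, based only on the behavior listed in the PR description (data: URL prefixes, the per-type extension lists, and the fal.media host check for video). The actual media-type.util.ts may differ: the internal helpers matches, stripQuery, and isFalMediaUrl are hypothetical, and ignoring a query string or fragment before the extension check is an assumption.

```typescript
// Illustrative sketch only; the PR's media-type.util.ts may be implemented differently.
type MediaKind = "image" | "audio" | "video";

// Extension lists as given in the PR description. Note that ".ogg" is listed for both audio and video.
const EXTENSIONS: Record<MediaKind, RegExp> = {
  image: /\.(png|jpe?g|gif|webp)$/i,
  audio: /\.(mp3|wav|ogg|m4a|flac)$/i,
  video: /\.(mp4|webm|ogg)$/i,
};

// Assumed behavior: drop any query string or fragment so "clip.mp4?token=abc" still matches.
function stripQuery(value: string): string {
  return value.replace(/[?#][\s\S]*$/, "");
}

function matches(value: unknown, kind: MediaKind): boolean {
  if (typeof value !== "string" || value.length === 0) return false;
  const prefix = "data:" + kind + "/";
  // Compare only the prefix; data URLs can be megabytes long.
  if (value.slice(0, prefix.length).toLowerCase() === prefix) return true;
  return EXTENSIONS[kind].test(stripQuery(value));
}

// True when the URL's host is fal.media or one of its subdomains.
function isFalMediaUrl(value: string): boolean {
  const match = /^https?:\/\/([^/?#:]+)/i.exec(value);
  return match !== null && /(^|\.)fal\.media$/i.test(match[1]);
}

function isImageUrl(value: unknown): boolean {
  return matches(value, "image");
}

function isAudioUrl(value: unknown): boolean {
  return matches(value, "audio");
}

function isVideoUrl(value: unknown): boolean {
  return matches(value, "video") || (typeof value === "string" && isFalMediaUrl(value));
}
```

A component method such as isImageCell can then delegate to isImageUrl on the cell value, so non-string and plain-text cells keep the existing text rendering. In this sketch a URL ending in .ogg satisfies both isAudioUrl and isVideoUrl, so the order in which a template checks them decides how it renders.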
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5675      +/-   ##
============================================
+ Coverage     52.17%   52.66%   +0.48%
- Complexity     2482     2586     +104
============================================
  Files          1068     1090      +22
  Lines         41311    42628    +1317
  Branches       4439     4658     +219
============================================
+ Hits          21556    22448     +892
- Misses        18490    18860     +370
- Partials       1265     1320      +55
*This pull request uses carry forward flags.
✅ No material benchmark regressions detected
🟢 15 better · 🔴 0 worse · ⚪ 0 noise (<±5%) · 0 without baseline
Baseline details: Latest main
Raw CSV:
config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,451.66,200,128000,443,0.270,21700.03,27318.78,27318.78
1,100,10,64,20,2114.39,2000,1280000,946,0.577,104875.94,139201.32,139201.32
2,1000,10,64,20,17922.32,20000,12800000,1116,0.681,894515.64,940170.70,940170.70
…media-type spec Vitest does not support Jasmine-style toBeTrue() and toBeFalse() matchers.
What changes were proposed in this PR?
Render image, audio, and video outputs from HuggingFace tasks inline in the workflow result panel instead of displaying raw data URLs as text.
New file: media-type.util.ts
- isImageUrl: detects data:image/ data URLs and common image extensions (.png, .jpg, .jpeg, .gif, .webp)
- isAudioUrl: detects data:audio/ data URLs and common audio extensions (.mp3, .wav, .ogg, .m4a, .flac)
- isVideoUrl: detects data:video/ data URLs, common video extensions (.mp4, .webm, .ogg), and the fal.media CDN host used by fal.ai text-to-video outputs
Changes to result-table-frame.component.{ts,html}:
- isImageCell / isAudioCell / isVideoCell methods that delegate to the detection helpers
- <img>, <audio controls>, <video controls> tags conditionally in the table cell template; non-media cells fall through to existing text rendering unchanged
Changes to result-panel-modal.component.{ts,html}:
- rowEntries with per-field media metadata on modal open
Changes to result-panel-model.component.scss:
New file: media-type.util.spec.ts
- fal.media CDN URL, cross-type rejection, and empty/plain-string inputs
Any related issues, documentation, discussions?
How was this PR tested?
Unit tests in media-type.util.spec.ts (19 cases) and result-table-frame.component.spec.ts. Run with ng test.
Was this PR authored or co-authored using generative AI tooling?
Co-authored with Claude Sonnet 4.6 in compliance with ASF guidelines