Skip to content

GH-43010: [C++][Compute] Support view arrays in selection kernels#50164

Open
Periecle wants to merge 1 commit into
apache:mainfrom
Periecle:fix-view-selection-kernels
Open

GH-43010: [C++][Compute] Support view arrays in selection kernels#50164
Periecle wants to merge 1 commit into
apache:mainfrom
Periecle:fix-view-selection-kernels

Conversation

@Periecle

@Periecle Periecle commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Rationale for this change

binary_view and string_view arrays were not fully supported by selection kernels. This also blocked dictionary decoding for dictionaries with view-type values.

What changes are included in this PR?

  • Add take/filter support for BINARY_VIEW and STRING_VIEW.
  • Preserve variadic payload buffers when taking view arrays.
  • Allow dictionary decode/cast from dictionaries with view-type values.
  • Add coverage for take, filter, drop null, scatter, and dictionary decode paths.

Are these changes tested?

Yes.

  • pre-commit run cpp --files <changed files>
  • cmake --build cpp/build --target arrow-compute-vector-selection-test
  • Focused compute tests for cast, selection, drop null, and scatter view-array paths

Are there any user-facing changes?

Yes. binary_view and string_view arrays are now supported by the affected selection and dictionary decode paths.

Partially addresses GH-39634
Closes GH-43010

@Periecle Periecle requested a review from pitrou as a code owner June 12, 2026 12:37
@github-actions

Copy link
Copy Markdown

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:

@Periecle Periecle force-pushed the fix-view-selection-kernels branch from fce680d to 0b590c7 Compare June 12, 2026 13:30
@Periecle Periecle changed the title [C++][Compute] Support view arrays in selection kernels GH-43010: [C++][Compute] Support view arrays in selection kernels Jun 12, 2026

@pitrou pitrou left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for tackling this @Periecle . This is a review purely on the Take implementation, I haven't looked at the rest (yet).

Comment on lines +511 to +512
if (indices_may_have_nulls &&
!bit_util::GetBit(indices.buffers[0].data, indices.offset + out_i)) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use VisitTwoBitBlocksVoid which will be more efficient than individual calls to GetBit for each item.

Comment on lines +582 to +584
if (validity_buf->size() > 0) {
std::memset(validity_buf->mutable_data(), 0, validity_buf->size());
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AllocateEmptyBitmap should ensure that the bitmap is already zero-filled, IIRC.

Comment on lines +598 to +600
for (const auto& data_buffer : data_buffers) {
buffers.push_back(data_buffer);
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to keep alive all data buffers from the input, but it would be nice to only keep alive those that are actually used by the output.

Depending on the outcome from #50172 we may replace the unused buffers with NULL slots, or with zero-sized buffers (or we can remap the buffer indices, which is going to be more costly).

Comment on lines +525 to +527
if (out_validity != nullptr) {
bit_util::SetBit(out_validity, out_i);
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OptionalBitmapAnd would probably be much more efficient that settings output validity bits individually.

@pitrou

pitrou commented Jun 15, 2026

Copy link
Copy Markdown
Member

Also @zanmato1984 might be interested as well.

@github-actions github-actions Bot added awaiting committer review Awaiting committer review awaiting review Awaiting review and removed awaiting review Awaiting review awaiting committer review Awaiting committer review labels Jun 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[C++] Support STRING_VIEW and BINARY_VIEW in "array_take" and "array_filter" functions

2 participants