GH-43010: [C++][Compute] Support view arrays in selection kernels#50164
GH-43010: [C++][Compute] Support view arrays in selection kernels#50164Periecle wants to merge 1 commit into
Conversation
|
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format? or See also: |
fce680d to
0b590c7
Compare
| if (indices_may_have_nulls && | ||
| !bit_util::GetBit(indices.buffers[0].data, indices.offset + out_i)) { |
There was a problem hiding this comment.
I think you can use VisitTwoBitBlocksVoid which will be more efficient than individual calls to GetBit for each item.
| if (validity_buf->size() > 0) { | ||
| std::memset(validity_buf->mutable_data(), 0, validity_buf->size()); | ||
| } |
There was a problem hiding this comment.
AllocateEmptyBitmap should ensure that the bitmap is already zero-filled, IIRC.
| for (const auto& data_buffer : data_buffers) { | ||
| buffers.push_back(data_buffer); | ||
| } |
There was a problem hiding this comment.
This is going to keep alive all data buffers from the input, but it would be nice to only keep alive those that are actually used by the output.
Depending on the outcome from #50172 we may replace the unused buffers with NULL slots, or with zero-sized buffers (or we can remap the buffer indices, which is going to be more costly).
| if (out_validity != nullptr) { | ||
| bit_util::SetBit(out_validity, out_i); | ||
| } |
There was a problem hiding this comment.
OptionalBitmapAnd would probably be much more efficient that settings output validity bits individually.
|
Also @zanmato1984 might be interested as well. |
Rationale for this change
binary_viewandstring_viewarrays were not fully supported by selection kernels. This also blocked dictionary decoding for dictionaries with view-type values.What changes are included in this PR?
BINARY_VIEWandSTRING_VIEW.Are these changes tested?
Yes.
pre-commit run cpp --files <changed files>cmake --build cpp/build --target arrow-compute-vector-selection-testAre there any user-facing changes?
Yes.
binary_viewandstring_viewarrays are now supported by the affected selection and dictionary decode paths.Partially addresses GH-39634
Closes GH-43010