Skip to content

Conversation

@ViggoC
Copy link
Contributor

@ViggoC ViggoC commented May 20, 2025

What's Changed

Avoid doing a binary search on every step to make the RangeEqualsVisitor of RunEndEncodedVector more efficient.

Closes #52 .

@github-actions

This comment has been minimized.

@lidavidm lidavidm added the enhancement PRs that add or improve features. label May 21, 2025
@github-actions github-actions bot added this to the 18.4.0 milestone May 21, 2025

int leftLogicalIndex = range.getLeftStart();
int rightLogicalIndex = range.getRightStart();
while (leftIterator.nextRun() | rightIterator.nextRun()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
while (leftIterator.nextRun() | rightIterator.nextRun()) {
while (leftIterator.nextRun() || rightIterator.nextRun()) {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we already checked the vector lengths to be equal, and since we're requiring run lengths to be equal, we don't need to check for any leftover runs after the loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lidavidm The vector length is checked in RangeEqualsVisitor.rangeEqual. If we call vector.visit(RangeEqualsVisitor, Range)` directly, the argument checks are skipped. Using rangeEqual is the best practice, but we seem to be unable to limit users from using another method. Therefore, I'm also considering whether we need to add more boundary checks in Iterator. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it doesn't hurt to make sure that both iterators are at the end at the end of the loop as a sanity check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lidavidm According to your suggestion, I should change it to while (leftIterator.nextRun() && rightIterator.nextRun()) instead of while (leftIterator.nextRun() || rightIterator.nextRun()). The short-circuit feature of '||' will make only leftIterator move forward, while right stays still.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah...that would be clearer over using the bitwise operator, yes

int rightRunLength = Math.min(rightRunEnd, rightRangeEnd) - rightLogicalIndex;

if (leftRunLength != rightRunLength) {
if (leftIterator.getRunLength() != rightIterator.getRunLength()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the original code had this problem too, maybe it's better to check the run length first? That's presumably a cheaper check and so we can bail out earlier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified as you suggested.

@lidavidm
Copy link
Member

Ah, the linter is unhappy:

2025-05-21T23:24:02.6057815Z [WARN] /build/vector/src/main/java/org/apache/arrow/vector/complex/RunEndEncodedVector.java:833:5: Missing a Javadoc comment. [MissingJavadocMethod]
2025-05-21T23:24:02.6060058Z [WARN] /build/vector/src/main/java/org/apache/arrow/vector/complex/RunEndEncodedVector.java:850:5: Missing a Javadoc comment. [MissingJavadocMethod]
2025-05-21T23:24:02.6062203Z [WARN] /build/vector/src/main/java/org/apache/arrow/vector/complex/RunEndEncodedVector.java:864:5: Missing a Javadoc comment. [MissingJavadocMethod]

@ViggoC
Copy link
Contributor Author

ViggoC commented May 22, 2025

@lidavidm There were 15 checks that failed. Which ones do I need to fix? Which check action does the linter belong to?

@lidavidm
Copy link
Member

All the checks run basically the same build, so you can check any one of them. The linter is part of running maven - did it not error locally?

@lidavidm
Copy link
Member

Same deal. I can push a fix if that's ok?

 Error:  Failed to execute goal com.diffplug.spotless:spotless-maven-plugin:2.44.4:check (spotless-check) on project arrow-vector: The following files had format violations:
Error:      src/main/java/org/apache/arrow/vector/complex/RunEndEncodedVector.java
Error:          @@ -836,7 +836,8 @@
Error:           ·····*·@param·runEndEncodedVector·The·vector·to·iterate·over
Error:           ·····*·@param·startIndex·The·logical·start·index·of·the·range·(inclusive)
Error:           ·····*·@param·length·The·number·of·values·to·include·in·the·range
Error:          -·····*·@throws·IllegalArgumentException·if·startIndex·is·negative·or·(startIndex·+·length)·exceeds·vector·bounds
Error:          +·····*·@throws·IllegalArgumentException·if·startIndex·is·negative·or·(startIndex·+·length)·exceeds
Error:          +·····*·····vector·bounds
Error:           ·····*/
Error:           ····public·RangeIterator(RunEndEncodedVector·runEndEncodedVector,·int·startIndex,·int·length)·{
Error:           ······int·rangeEnd·=·startIndex·+·length;
Error:  Run 'mvn spotless:apply' to fix these violations.
Error:  -> [Help 1]
Error:  
Error:  To see the full stack trace of the errors, re-run Maven with the -e switch.
Error:  Re-run Maven using the -X switch to enable full debug logging.
Error:  
Error:  For more information about the errors and possible solutions, please read the following articles:
Error:  [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
Error:  
Error:  After correcting the problems, you can resume the build with the command
Error:    mvn <args> -rf :arrow-vector

@ViggoC
Copy link
Contributor Author

ViggoC commented May 22, 2025

@lidavidm Thank you, I applied the spotless, but most of the checks failed, seems it's terminated for some reason.

@lidavidm
Copy link
Member

Retrying - I think GHA had an outage

@lidavidm
Copy link
Member

And I think those JNI failures are unrelated...I'll have to find time and take a look.

@lidavidm lidavidm merged commit 7c25ce5 into apache:main May 22, 2025
25 of 28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement PRs that add or improve features.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Java] Make RangeEqualsVisitor of RunEndEncodedVector more efficient

2 participants