-
Notifications
You must be signed in to change notification settings - Fork 63
refactor: ExecuteResult is reusable, sampleable #2159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
tswast
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At a high level, I think this makes a lot of sense. My main worry would be the case of ExecuteResult with local data. If we are keeping these around more, is there a risk of memory leaks?
| dfs = iter(map(self._copy_index_to_pandas, dfs)) | ||
|
|
||
| total_rows = execute_result.total_rows | ||
| total_rows = result_batches.approx_total_rows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed offline, I think it'd be helpful to have cases where we know (or at least have very high confidence) of this being exact, such as when reading the destination table of a query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do insert row count directly in the tree when known exactly (or at least some of the time we know exactly, might be missing some opportunities). At a certain point we just intermingle known and approx though, before surfacing. Could keep separate if really needed, but a bit more complexity.
Yeah, this refactor is intended to help with caching, but doesn't actually add any caching behavior for now. So, these data handles will be garbage collected just fine. |
4032364 to
1cdf793
Compare
| for scan_item in node.scan_list.items: | ||
| if ( | ||
| scan_item.dtype == dtypes.JSON_DTYPE | ||
| node.source.schema.get_type(scan_item.source_id) == dtypes.JSON_DTYPE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious what the purpose of this extra layer of indirection is compared to the previous? I guess it's partially undoing some of the changes Shenyang and Chelsea made to plumb type info into the tree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its mostly from moving the schema to the source definition rather than the scan definition, which brings it more in line with the local source, where a source object is not just data, but interpretation (mostly matters for virtual types)
|
e2e failures are "Blob" related. Probably flakes. |
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #<issue_number_goes_here> 🦕