Skip to content

Conversation

@fresh-borzoni
Copy link
Contributor

Purpose

Linked issue: close #153

Add a poll() method to Python LogScanner for incremental, timeout-based record reading.

Brief change log

  • Add LogScanner.poll(timeout_ms) method to Python bindings
  • Implement input validation for timeout parameter (rejects negative values)
  • Return empty PyArrow table (with correct schema) on timeout instead of error
  • Add example usage demonstrating poll-based consumption pattern

API and Format

  • New public API: LogScanner.poll(timeout_ms: int) -> pyarrow.Table
  • No storage format changes

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a poll() method to the Python LogScanner bindings, enabling incremental, timeout-based record reading with a configurable timeout parameter.

Changes:

  • Added LogScanner.poll(timeout_ms) method with input validation and timeout handling
  • Implemented create_empty_table() helper method to return properly-schemed empty tables on timeout
  • Added example usage demonstrating the new poll-based consumption pattern

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
bindings/python/src/table.rs Implements the new poll() method with validation, error handling, and empty table creation helper
bindings/python/example/example.py Adds example demonstrating usage of the new poll() method

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

/// - Returns an empty table (with correct schema) if no records are available
/// - When timeout expires, returns an empty table (NOT an error)
fn poll(&self, py: Python, timeout_ms: i64) -> PyResult<Py<PyAny>> {
use std::time::Duration;
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use std::time::Duration import is redundant. The Duration type is already used in the to_arrow() method at line 253 without a local import, suggesting it's available in scope. Consider removing this redundant import for consistency.

Copilot uses AI. Check for mistakes.
# TODO: support to_duckdb()

# Test the new poll() method for incremental reading
print("\n--- Testing poll() method ---")
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example calls subscribe() again before polling, but there's no explanation of why this is necessary or whether it resets the scanner state. Consider adding a comment explaining this step, similar to line 163 which says '# Reset subscription'.

Suggested change
print("\n--- Testing poll() method ---")
print("\n--- Testing poll() method ---")
# Reset subscription before polling to ensure we start from a known position

Copilot uses AI. Check for mistakes.
Copy link
Contributor

@luoyuxia luoyuxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fresh-borzoni Thanks for the pr. Only one question. PTAL


let timeout = Duration::from_millis(timeout_ms as u64);
let scan_records = py
.detach(|| TOKIO_RUNTIME.block_on(async { self.inner.poll(timeout).await }))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, I'm thinking can we use scanner#poll_batches directly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add blocking poll similar to C++ bindings into python bindings

2 participants