@shuoweil
Contributor

feat: Support audio transcription with partial ordering
This change also fixes a related issue where Block.join failed on joins with null indexes in partial ordering mode.

b/430572560

shuoweil requested review from a team as code owners July 15, 2025 18:20
shuoweil requested a review from TrevorBergeron July 15, 2025 18:20
shuoweil self-assigned this July 15, 2025
product-auto-label bot added the size: m and api: bigquery labels July 15, 2025
shuoweil added the kokoro:force-run label July 15, 2025
yoshi-kokoro removed the kokoro:force-run label July 15, 2025
shuoweil removed the request for review from TrevorBergeron July 15, 2025 18:36
shuoweil marked this pull request as draft July 15, 2025 18:36
shuoweil force-pushed the shuowei-transcribe-partial-order branch from 78dcbf0 to abc6dae July 15, 2025 21:04
shuoweil force-pushed the shuowei-transcribe-partial-order branch from 4b2927f to 5560902 July 15, 2025 21:38
shuoweil requested a review from TrevorBergeron July 15, 2025 22:12
shuoweil marked this pull request as ready for review July 15, 2025 22:12

if result is not None:
    return result

# For block identity joins with null indices, perform cross join
Collaborator

This doesn't seem desirable. If df1 has n rows and df2 has m rows, won't this end up with n x m rows?
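The reviewer's row-count concern can be checked with a small pure-Python sketch. It models each frame as a hypothetical list of (index, value) rows whose index is null (None); it is an illustration of cross-join fan-out, not BigQuery DataFrames' actual Block.join code.

```python
from itertools import product


def cross_join(left, right):
    """Pair every left row with every right row; no key is consulted."""
    return [(l_row, r_row) for l_row, r_row in product(left, right)]


# Hypothetical frames whose index values are all null (None).
df1 = [(None, "a"), (None, "b"), (None, "c")]  # n = 3 rows
df2 = [(None, "x"), (None, "y")]               # m = 2 rows
print(len(cross_join(df1, df2)))  # 6, i.e. n * m

# An identity join combines two views of the same rows, so the expected
# result has n rows; a cross join instead yields n * n.
print(len(cross_join(df1, df1)))  # 9, not 3
```

For a block identity join, where both sides come from the same rows, the expected output size is n, so falling back to a cross join inflates the result quadratically.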

result = df.to_pandas(ordered=False)

assert "transcribed_text" in result.columns
assert len(result) > 0

Collaborator

The number of rows in result should be exactly equal to the number of rows in audio_mm_df_partial_ordering.
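A stricter check along the lines the reviewer asks for might look like the sketch below. The helper name assert_transcription_preserves_rows and the FakeFrame stand-in are hypothetical; in the real test, the expected count would be derived from the audio_mm_df_partial_ordering fixture, for example by materializing it with to_pandas(ordered=False), which is an assumption about how the fixture is used.

```python
def assert_transcription_preserves_rows(result, expected_rows):
    """Check the transcription column exists and the row count is unchanged."""
    assert "transcribed_text" in result.columns
    # An exact count catches both dropped rows and join fan-out such as
    # n * m duplicates, which `len(result) > 0` would accept.
    assert len(result) == expected_rows, (
        f"expected {expected_rows} rows, got {len(result)}"
    )


class FakeFrame:
    """Hypothetical stand-in for a pandas DataFrame: column names plus rows."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = list(rows)

    def __len__(self):
        return len(self.rows)


# In the real test, something like (assumed usage):
#   expected_rows = len(audio_mm_df_partial_ordering.to_pandas(ordered=False))
#   assert_transcription_preserves_rows(result, expected_rows)
ok = FakeFrame(["audio", "transcribed_text"], [("a1", "hi"), ("a2", "yo")])
assert_transcription_preserves_rows(ok, expected_rows=2)
```

With a pandas DataFrame, `"transcribed_text" in result.columns` and `len(result)` behave the same way as on the stand-in, so the helper applies unchanged.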

shuoweil marked this pull request as draft July 16, 2025 18:12
shuoweil added the do not merge label July 16, 2025
shuoweil closed this August 11, 2025

Labels

api: bigquery, do not merge, size: m

3 participants