Use PandasCursor for Athena dataframes in fetchdf magic by williebsweet · Pull Request #5072 · TobikoData/sqlmesh

williebsweet · 2025-07-30T20:43:32Z

Only applies to the %%fetchdf magic command (although it could be expanded).

Replaces the generic pandas.read_sql_query() with PandasCursor for improved I/O performance.

With pandas.read_sql_query():

1. Query executes in Athena and results are written to S3 as CSV files
2. PyAthena connection fetches results row-by-row through the standard DB-API cursor interface
3. Data comes through the AWS API using fetchall() or similar methods
4. pandas constructs the DataFrame from the row-by-row data

The key bottleneck is steps 2-3: the data has to go through the PyAthena cursor's fetchall() method, which retrieves results row-by-row via AWS API calls.

With PandasCursor:

1. Query executes in Athena and results are written to S3 as CSV files
2. PandasCursor directly downloads the CSV file from S3
3. pandas loads the CSV directly using optimized CSV parsing

The key advantage is step 2: Instead of going through AWS APIs row-by-row, PandasCursor downloads the entire result CSV file directly from S3 and then uses pandas' highly optimized CSV reading capabilities.
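For readers unfamiliar with PyAthena, the two paths can be sketched roughly as below. This is a minimal illustration and not the code in this PR: the function names, the staging bucket, and the region are placeholders, and the imports are kept inside the functions so the sketch can be loaded without PyAthena installed.

```python
def fetch_with_read_sql_query(sql, s3_staging_dir, region_name):
    """Generic path: pandas pulls rows through the default DB-API cursor."""
    import pandas as pd
    from pyathena import connect

    conn = connect(s3_staging_dir=s3_staging_dir, region_name=region_name)
    # pandas drives the cursor itself and builds the frame from fetched rows.
    return pd.read_sql_query(sql, conn)


def fetch_with_pandas_cursor(sql, s3_staging_dir, region_name):
    """PandasCursor path: the result CSV is read from S3 into a DataFrame."""
    from pyathena import connect
    from pyathena.pandas.cursor import PandasCursor

    cursor = connect(
        s3_staging_dir=s3_staging_dir,
        region_name=region_name,
        cursor_class=PandasCursor,  # every cursor from this connection is a PandasCursor
    ).cursor()
    # as_pandas() returns the DataFrame that PandasCursor built from the result file.
    return cursor.execute(sql).as_pandas()
```

A call would look like `fetch_with_pandas_cursor("SELECT 1", "s3://my-bucket/athena-results/", "us-east-1")`, where the bucket and region are stand-ins for real values.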

Anecdotally, I had queries that were taking 30+ min to execute that are now taking ~2 min.

erindru · 2025-07-30T23:08:20Z

Oh, nice, Athena has some kind of "native" DataFrame provided by the library?

In that case, _fetch_native_df() should be implemented on AthenaEngineAdapter to return a pd.DataFrame that is backed by PandasCursor so that all fetchdf() calls can use it.

If it relies on an optional extra being available, and that extra is not available, then it can fall back to the existing/current logic from PandasNativeFetchDFSupportMixin
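The shape of that suggestion might look something like the sketch below. It is only an illustration of the fallback pattern: `GenericFetchSketch` and `AthenaAdapterSketch` are hypothetical stand-ins, not sqlmesh's actual `PandasNativeFetchDFSupportMixin` or `AthenaEngineAdapter`, and the real `_fetch_native_df()` signature may differ.

```python
def load_pandas_cursor():
    """Return PyAthena's PandasCursor class, or None if it cannot be imported."""
    try:
        from pyathena.pandas.cursor import PandasCursor
    except ImportError:
        # PyAthena or its pandas dependencies are not installed.
        return None
    return PandasCursor


class GenericFetchSketch:
    """Stand-in for the existing, generic DataFrame fetch logic."""

    def __init__(self, connection, generic_fetch):
        self._connection = connection
        self._generic_fetch = generic_fetch  # callable: sql -> DataFrame

    def _fetch_native_df(self, sql):
        return self._generic_fetch(sql)


class AthenaAdapterSketch(GenericFetchSketch):
    """Prefers PandasCursor and falls back to the generic path otherwise."""

    def _fetch_native_df(self, sql):
        cursor_class = load_pandas_cursor()
        if cursor_class is None:
            return super()._fetch_native_df(sql)
        # PyAthena connections accept a cursor class when creating a cursor.
        cursor = self._connection.cursor(cursor_class)
        return cursor.execute(sql).as_pandas()
```

Checking for the import at call time keeps the extra optional: environments without it behave exactly as they do today.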

williebsweet · 2025-07-31T13:41:54Z

@erindru Yeah - my testing of moving it to engine_adapter/athena.py got derailed by the Athena issue mentioned here.

I'm having issues getting pre-commit to run. The docs, Makefile, CI commands, and environment don't seem to be "in sync". Is uv the default now?

erindru · 2025-07-31T20:22:09Z

Nope, uv has never been the default, it just happens to be what that user was using.

Generally what we do is something like:

$ python -m venv sqlmesh-env
$ . ./sqlmesh-env/bin/activate
(sqlmesh-env) $ make install-dev 
(sqlmesh-env) $ make install-pre-commit #only have to do this once
# make some changes
(sqlmesh-env) $ git commit ....

The pre-commit hooks should run on git commit and abort the commit if they fail. You should also be able to run make style for something similar. The venv in which you ran make install-dev needs to be active for all of these.

Is there a specific error you're getting?

williebsweet · 2025-08-01T18:48:46Z

@erindru I was able to resolve my issue by not using uv.

But weirdly, make style and make py-style pass locally, but the latter failed in the last CI run.

williebsweet added 5 commits July 30, 2025 13:01

use PandasCursor with Athena

96e8e89

for %%fetchdf magic only

simplify

9595304

Merge branch 'main' into main

dc35f60

remove pandas import

95758a1

Merge branch 'main' of https://github.com/williebsweet/sqlmesh

343bff2

williebsweet added 3 commits July 31, 2025 08:59

import pd for type checking

8d361ce

Merge remote-tracking branch 'upstream/main'

195bcde

format

da1a27a

Merge branch 'main' into main

703a4b5

Merge remote-tracking branch 'upstream/main'

6a07a43

williebsweet wants to merge 10 commits into TobikoData:main from williebsweet:main
