Use PandasCursor for Athena dataframes in fetchdf magic #5072
williebsweet wants to merge 10 commits into TobikoData:main
Oh, nice, Athena has some kind of "native" DataFrame provided by the library? In that case, _fetch_native_df() should be implemented on If it relies on an optional extra being available, and that extra is not available, then it can fall back to the existing/current logic from
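The fallback described above could look roughly like the sketch below. It is only an illustration of the pattern, not SQLMesh code: fetch_df_with_fallback, load_pandas_cursor, and the native_fetch / fallback_fetch callables are hypothetical names; the only real API assumed is PyAthena's PandasCursor import path.

```python
from typing import Any, Callable, Optional


def load_pandas_cursor() -> Optional[type]:
    """Return PyAthena's PandasCursor class, or None if it cannot be imported.

    PandasCursor depends on an optional extra, so the import is attempted
    lazily instead of at module load time.
    """
    try:
        from pyathena.pandas.cursor import PandasCursor
    except ImportError:
        return None
    return PandasCursor


def fetch_df_with_fallback(
    sql: str,
    native_fetch: Callable[[str, type], Any],
    fallback_fetch: Callable[[str], Any],
    loader: Callable[[], Optional[type]] = load_pandas_cursor,
) -> Any:
    """Use the native DataFrame path when available, else the generic one.

    native_fetch and fallback_fetch are hypothetical stand-ins for the
    adapter's native and existing fetch logic.
    """
    cursor_class = loader()
    if cursor_class is None:
        # Optional extra is missing: keep the existing behavior.
        return fallback_fetch(sql)
    return native_fetch(sql, cursor_class)
```

Checking availability up front, rather than wrapping the whole native call in a try/except, avoids masking unrelated ImportErrors raised while the query is running.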
Nope, uv has never been the default, it just happens to be what that user was using. Generally what we do is something like: The pre-commit hooks should run on Is there a specific error you're getting?
@erindru I was able to resolve my issue by not using But weirdly,
Only applies to the %%fetchdf magic command (although it could be expanded).
Replaces the generic pandas.read_sql_query() with PandasCursor for improved I/O performance.
With the pandas.read_sql_query() interface:
The key bottleneck is steps 2-3: the data has to go through the PyAthena cursor's fetchall() method, which retrieves results through paginated AWS API calls and materializes them row by row.
With PandasCursor:
The key advantage is step 2: instead of going through AWS APIs row by row, PandasCursor downloads the entire result CSV file directly from S3 and then uses pandas' highly optimized CSV reading capabilities.
Anecdotally, I had queries that were taking 30+ min to execute that are now taking ~2 min.
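For reference, the two fetch paths compared above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names are hypothetical, the s3_staging_dir and region_name values are placeholders the caller supplies, and imports are deferred into the functions because PandasCursor depends on an optional PyAthena extra.

```python
def fetch_with_read_sql(sql: str, s3_staging_dir: str, region_name: str):
    """Generic path: pandas drives the default PyAthena cursor."""
    import pandas as pd
    from pyathena import connect

    conn = connect(s3_staging_dir=s3_staging_dir, region_name=region_name)
    # pandas executes the query and collects rows through the DB-API cursor.
    return pd.read_sql_query(sql, conn)


def fetch_with_pandas_cursor(sql: str, s3_staging_dir: str, region_name: str):
    """PandasCursor path: the result file is read from S3 into a DataFrame."""
    from pyathena import connect
    from pyathena.pandas.cursor import PandasCursor

    cursor = connect(
        s3_staging_dir=s3_staging_dir,
        region_name=region_name,
        cursor_class=PandasCursor,
    ).cursor()
    # as_pandas() returns the query result as a pandas DataFrame.
    return cursor.execute(sql).as_pandas()
```

Both functions return a DataFrame for the same query, so the change is limited to how the result set travels from Athena to pandas.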