feat: optimize memory allocation when converting execution response to dataframe #1125

Martozar · 2025-09-02T09:14:16Z

Add optimized flag to DataFrameFactory to enable memory-optimized conversion of an execution response to a pandas dataframe. Without the flag, the conversion runs as before, storing headers as a list of dictionaries. The optimized version stores only unique headers and references them, which avoids unnecessary memory allocations when processing large numbers of duplicated headers. On large datasets, the optimized conversion consumes up to 10x less memory (e.g. on 1M rows with 4 attributes, the original implementation consumed almost 2 GB, while the optimized one used no more than 200 MB).

Note that the new behaviour is optional and turned off by default, so no existing usages should be affected.

JIRA: CQ-1579
risk: low
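
To illustrate the idea in the description, here is a minimal sketch of why referencing unique headers allocates far fewer objects than building one dictionary per row. It is not the code from this PR: the `labelValue` key, the function names, and the sample labels are assumptions made up for the example.

```python
# Hypothetical illustration only; not the gooddata-pandas implementation.


def build_naive(labels):
    # One fresh dict per row, even when the same label repeats.
    return [{"labelValue": label} for label in labels]


def build_deduplicated(labels):
    # One dict per distinct label; every row holds a reference to it.
    unique = {}
    rows = []
    for label in labels:
        header = unique.get(label)
        if header is None:
            header = {"labelValue": label}
            unique[label] = header
        rows.append(header)
    return rows


labels = ["east", "west", "east", "east", "west"] * 1000

naive = build_naive(labels)
deduplicated = build_deduplicated(labels)

# Count the distinct header objects kept alive by each list.
naive_objects = len({id(h) for h in naive})
dedup_objects = len({id(h) for h in deduplicated})

print(naive_objects, dedup_objects)
```

Both lists compare equal element by element, so a consumer that only reads the headers should not notice the difference. The saving comes purely from sharing objects.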

lupko · 2025-09-02T09:26:25Z

gooddata-pandas/gooddata_pandas/result_convertor.py

 LabelOverrides = dict[str, dict[str, dict[str, str]]]


+class _Header(ABC):


I think this should also have __slots__ = () so as not to pollute the class hierarchy.

see: https://stackoverflow.com/questions/1816483/how-does-inheritance-of-slots-in-subclasses-actually-work
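
A small sketch of the point about `__slots__` in a base class. Only `_Header(ABC)` comes from the diff; the subclasses, the `label` method, and the "leaky" base are hypothetical and exist only to show the effect. If any class in the hierarchy omits `__slots__`, instances of its subclasses get a per-instance `__dict__` again, which defeats the memory saving.

```python
from abc import ABC, abstractmethod


class _Header(ABC):
    # Empty slots: the base adds no per-instance __dict__.
    __slots__ = ()

    @abstractmethod
    def label(self) -> str: ...


class _AttributeHeader(_Header):  # hypothetical subclass
    __slots__ = ("_label",)

    def __init__(self, label: str) -> None:
        self._label = label

    def label(self) -> str:
        return self._label


class _BaseWithoutSlots(ABC):  # hypothetical base that forgets __slots__
    pass


class _LeakyHeader(_BaseWithoutSlots):
    # Slots here do not help: the base already contributes __dict__.
    __slots__ = ("_label",)

    def __init__(self, label: str) -> None:
        self._label = label


slotted = _AttributeHeader("east")
leaky = _LeakyHeader("east")

slotted_has_dict = hasattr(slotted, "__dict__")
leaky_has_dict = hasattr(leaky, "__dict__")

print(slotted_has_dict, leaky_has_dict)
```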

lupko · 2025-09-02T09:34:15Z

gooddata-pandas/gooddata_pandas/result_convertor.py

+    _unique_headers: list[_Header] = field(factory=list)
+    _header_to_index: dict[_Header, int] = field(factory=dict)
+    _indexes: list[int] = field(factory=list)


I wonder if this isn't unnecessarily complicated. In the end, the goal is to reuse _Header instances, right? Why not have a mapping dict[_Header, _Header] and ditch the indexes altogether?
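
The suggestion above can be sketched as follows. This is an assumption-laden illustration, not the merged code: it uses standard-library dataclasses instead of the `field(factory=...)` style from the diff, and `Header`, `HeaderStore`, `cache`, and `append` are made-up names. The mapping from a header to itself acts as an interning table, so equal headers collapse to one canonical instance and no index list is needed.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Header:  # hypothetical stand-in for _Header; frozen makes it hashable
    label: str


@dataclass
class HeaderStore:  # hypothetical container
    headers: list = field(default_factory=list)
    cache: dict = field(default_factory=dict)

    def append(self, header: Header) -> None:
        # setdefault returns the first equal header seen, or stores this one.
        canonical = self.cache.setdefault(header, header)
        self.headers.append(canonical)


store = HeaderStore()
for label in ["east", "west", "east"]:
    store.append(Header(label))

print(len(store.headers), len(store.cache))
```

After the loop, the first and third entries of `store.headers` are the same object, while the temporary duplicate created for the third row becomes garbage as soon as `append` returns.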

lupko · 2025-09-02T10:59:41Z

gooddata-pandas/gooddata_pandas/result_convertor.py

+
+    _headers: list[_Header] = field(factory=list)
+    _header_to_header: dict[_Header, _Header] = field(factory=dict)


i'd rename this. _header_cache or _header_dedup or something like that. the header to header sounds like a name of some pop band :)

Thanks, fixed :)

lupko · 2025-09-02T11:00:14Z

gooddata-pandas/gooddata_pandas/result_convertor.py

+        h2h = self._header_to_header
+        headers = self._headers


I suggest ditching these and using the self fields directly.

lupko

LGTM

Martozar requested review from hkad98, jaceksan, lupko and pcerny as code owners September 2, 2025 09:14

lupko reviewed Sep 2, 2025

Martozar force-pushed the c.mze-CQ-1579 branch from c05773a to fcceef4 Compare September 2, 2025 10:36

lupko reviewed Sep 2, 2025

Martozar force-pushed the c.mze-CQ-1579 branch from fcceef4 to 5e3aaf7 Compare September 2, 2025 11:03

Martozar force-pushed the c.mze-CQ-1579 branch from 5e3aaf7 to ac0189f Compare September 2, 2025 11:05

lupko approved these changes Sep 2, 2025

Martozar merged commit 17dfca8 into gooddata:master Sep 2, 2025
9 checks passed

Martozar deleted the c.mze-CQ-1579 branch September 2, 2025 12:15
