I recently discovered that full stats collection (i.e., `lower_bounds`/`upper_bounds`) is explicitly disabled in PyIceberg for nested fields (i.e., child fields of structs).
This restriction may have been added to limit the number of fields whose stats are collected when full stats collection is the default. However, after discussion, the preferred way to control stats growth appears to be adding support for the `write.metadata.metrics.max-inferred-column-defaults` table property. If that is implemented, re-enabling stats collection for nested fields should be a non-issue.
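To illustrate how such a cap could bound stats growth without special-casing nested fields, here is a minimal sketch. `infer_metrics_modes` is a hypothetical helper, not a PyIceberg API. It assumes nested columns are flattened to dotted paths in a simple schema order (Iceberg's Java implementation orders fields somewhat differently), that explicit per-column overrides always win, and that any column beyond the first `max_inferred` gets mode `"none"`. The defaults of 100 columns and `truncate(16)` mirror Iceberg's documented defaults for `max-inferred-column-defaults` and `write.metadata.metrics.default`.

```python
from typing import Dict, List, Optional

# Iceberg's documented defaults for the two table properties involved.
DEFAULT_MAX_INFERRED = 100          # write.metadata.metrics.max-inferred-column-defaults
DEFAULT_METRICS_MODE = "truncate(16)"  # write.metadata.metrics.default


def infer_metrics_modes(
    column_paths: List[str],
    default_mode: str = DEFAULT_METRICS_MODE,
    max_inferred: int = DEFAULT_MAX_INFERRED,
    overrides: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical sketch: assign a metrics mode to each primitive column path.

    Columns with an explicit override keep it. Otherwise, the first
    `max_inferred` columns (in the given order) get `default_mode` and
    the remaining columns get "none", so nested struct children are
    capped the same way as top-level columns.
    """
    overrides = overrides or {}
    modes: Dict[str, str] = {}
    for index, path in enumerate(column_paths):
        if path in overrides:
            modes[path] = overrides[path]
        elif index < max_inferred:
            modes[path] = default_mode
        else:
            modes[path] = "none"
    return modes


if __name__ == "__main__":
    # GeoParquet-style schema with bbox children flattened to dotted paths.
    columns = ["id", "geometry", "bbox.xmin", "bbox.ymin", "bbox.xmax", "bbox.ymax"]
    print(infer_metrics_modes(columns, default_mode="full", max_inferred=4))
```

With `max_inferred=4`, the last two bbox children fall back to `"none"`; an explicit override for those columns would keep their bounds regardless of the cap.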
Stats collection for nested struct fields is important for schemas like GeoParquet, which store key primitive fields (in this case, the bounding box `xmin`, `ymin`, `xmax`, and `ymax`) inside a struct.
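As an example of why these bounds matter, the sketch below shows how per-file `lower_bounds`/`upper_bounds` on the bbox struct children could let a reader skip data files for a spatial query. `BBoxBounds` and `file_may_intersect` are hypothetical names, not PyIceberg APIs, and the sketch assumes the bounds have already been decoded to floats. The check is conservative: it never skips a file that might contain a matching row, but it may keep a file that turns out to have none.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBoxBounds:
    """Hypothetical decoded per-file stats for the bbox struct children."""

    xmin_lower: float  # lower bound of bbox.xmin across the file
    xmax_upper: float  # upper bound of bbox.xmax across the file
    ymin_lower: float  # lower bound of bbox.ymin across the file
    ymax_upper: float  # upper bound of bbox.ymax across the file


def file_may_intersect(
    bounds: BBoxBounds, qxmin: float, qymin: float, qxmax: float, qymax: float
) -> bool:
    """Return False only if no row's bbox in the file can intersect the query box.

    A row's box intersects the query box iff
      row.xmin <= qxmax and row.xmax >= qxmin and
      row.ymin <= qymax and row.ymax >= qymin.
    If even the smallest xmin in the file exceeds qxmax (and so on for the
    other three conditions), no row can match and the file can be skipped.
    """
    if bounds.xmin_lower > qxmax:
        return False
    if bounds.xmax_upper < qxmin:
        return False
    if bounds.ymin_lower > qymax:
        return False
    if bounds.ymax_upper < qymin:
        return False
    return True


if __name__ == "__main__":
    # All boxes in this file lie within x in [10, 20] and y in [40, 50].
    stats = BBoxBounds(xmin_lower=10.0, xmax_upper=20.0, ymin_lower=40.0, ymax_upper=50.0)
    print(file_may_intersect(stats, 0.0, 0.0, 5.0, 5.0))      # file can be skipped
    print(file_may_intersect(stats, 15.0, 45.0, 30.0, 60.0))  # file must be read
```

Without stats on the nested `bbox` fields, none of these comparisons is possible and every file has to be scanned.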