HIVE-29398: Make it possible to use longStats instead of timestampStats in ColumnStatisticsData #6276
thomasrebele wants to merge 1 commit into apache:master
@thomasrebele I noticed that the parameter [...] Alternatively, we could ask around on the Hive user/dev mailing lists whether HIVE-22311 can be reverted.
Thank you for the review, @kasakrisz! I've sent a mail to dev@hive.apache.org.
It seems that removing the field is more difficult than expected. I tried to remove it, but that leads to a ClassCastException, and the stats no longer appear in the output of the DESCRIBE FORMATTED command.
@kasakrisz, how would the setting be passed to the factory? Do you have a similar example in the Hive repo? Where would the factory/singleton live? Would your suggestion pass some kind of [...]
Alternatively, to make the approach extensible while keeping it close to the original code, a new class ColumnStatsConf could be passed instead of the [...]



HIVE-29398
What changes were proposed in this pull request?
Add a property to store the timestamp statistics in the long stats field instead of the timestamp stats field. This was the legacy behavior before HIVE-22311 was merged.
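To illustrate the intent of the property, here is a minimal sketch of the switch. It is not the code from this patch: the classes `Stats` and `StatsData` are hypothetical stand-ins for the Thrift-generated metastore structs, the method `store` is invented for illustration, and interpreting the timestamps as UTC epoch seconds is an assumption. Only the idea (one boolean deciding which stats field gets populated) is taken from the description above.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class TimestampStatsSketch {

    /** Hypothetical stand-in for a stats payload: low/high value, null count, distinct count. */
    static final class Stats {
        final long low;
        final long high;
        final long numNulls;
        final long numDVs;

        Stats(long low, long high, long numNulls, long numDVs) {
            this.low = low;
            this.high = high;
            this.numNulls = numNulls;
            this.numDVs = numDVs;
        }
    }

    /** Hypothetical stand-in for a union-like container: exactly one field is set. */
    static final class StatsData {
        Stats longStats;
        Stats timestampStats;
    }

    /**
     * Stores timestamp column statistics either in the long stats field
     * (legacy behavior) or in the timestamp stats field, depending on the flag.
     */
    static StatsData store(Stats stats, boolean legacyTimestampAsLong) {
        StatsData data = new StatsData();
        if (legacyTimestampAsLong) {
            data.longStats = stats;
        } else {
            data.timestampStats = stats;
        }
        return data;
    }

    public static void main(String[] args) {
        // Assumption: timestamps are represented as seconds since the epoch in UTC.
        long low = LocalDateTime.of(1999, 1, 3, 11, 12, 23).toEpochSecond(ZoneOffset.UTC);
        long high = LocalDateTime.of(2026, 1, 2, 12, 34, 45).toEpochSecond(ZoneOffset.UTC);
        Stats stats = new Stats(low, high, 1, 2);

        StatsData legacy = store(stats, true);
        StatsData current = store(stats, false);
        System.out.println("legacy uses longStats: " + (legacy.longStats != null));
        System.out.println("default uses timestampStats: " + (current.timestampStats != null));
    }
}
```

A consumer that only reads the long stats field would find the statistics in the first case and nothing in the second, which is the compatibility gap the property is meant to close.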
Why are the changes needed?
Other projects that use the Hive Metastore (e.g., Impala) still expect the long stats field. Adding the property makes it possible to switch back to the old behavior.
Does this PR introduce any user-facing change?
No
How was this patch tested?
I've added a unit test and I've manually verified that the stats of a timestamp field in Impala behave as expected.
1. Add 'hive.metastore.stats.legacy.timestamp.as.long': 'true', to fe/src/test/resources/hive-site.xml.py
2. Open the Impala shell (./bin/impala-shell.sh)
3. Run: create table a(t timestamp); insert into a(t) values ('2026-01-02 12:34:45'), (null), ('1999-01-03 11:12:23'); compute stats a; show column stats a;
4. #Distinct Values is 2 and #Nulls is 1, so the patch works as expected; without the patch, or if you change the property of step 1 to false, #Distinct Values and #Nulls are both -1