Description
After the recent PRs for defaulting to microseconds when possible in date_range/Timestamp/to_datetime etc., there is one main type of input to those functions that still results in nanoseconds: numeric input.
For example:
>>> pd.to_datetime([1, 2, 3], unit="s")
DatetimeIndex(['1970-01-01 00:00:01', '1970-01-01 00:00:02',
'1970-01-01 00:00:03'],
dtype='datetime64[ns]', freq=None)
(There is the separate, confusing point that the unit keyword here defines how to interpret the input and does not specify the unit of the return value.)
As a starter, I think there is no reason (apart from the fact that it is currently implemented like that) to return nanoseconds here (unless the input is in nanoseconds).
So I would at least default to microseconds when possible.
But then another option would be to do as minimal a conversion as possible: for example, if the input is in seconds (or minutes, hours, etc.), also return seconds, etc. This gives more variation in the resulting dtype, but requires less conversion.
(I have a slight preference for returning the same dtype as much as possible.)
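To make the two options concrete, here is a rough sketch using plain NumPy casts rather than pandas itself. It only illustrates what the resulting dtypes would look like. It is not how to_datetime is implemented, and the variable names are made up for the illustration.

```python
import numpy as np

# Integer input interpreted as seconds since the epoch,
# like pd.to_datetime([1, 2, 3], unit="s").
values = np.array([1, 2, 3], dtype="int64")

# Option "minimal conversion": keep the unit of the input (seconds).
# Casting int64 to datetime64[s] reinterprets the integers as-is.
as_seconds = values.astype("datetime64[s]")

# Option "default to microseconds": same instants, stored at microsecond
# resolution (the underlying integers are multiplied by 1_000_000).
as_micros = as_seconds.astype("datetime64[us]")

print(as_seconds.dtype)
print(as_micros.dtype)

# Both arrays represent the same points in time; only the storage unit differs.
assert (as_seconds == as_micros).all()
```

In both cases the values are identical; the choice is only about which resolution the result is stored in.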