-
-
Notifications
You must be signed in to change notification settings - Fork 19.4k
TST: Added test that checks if df.sum() results in concatenation instead of dtype coercion to float64 or int64 #63260
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ngs results in concatenation and not coercion to dtype int64 or float64
| df = DataFrame({"a": ["483", "3"], "b": ["94", "759"]}) | ||
| result = df.sum(axis=1) | ||
| expected = Series(["48394", "3759"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use pytest.mark.parametrize like
@pytest.mark.parametrize("input_data,expected_data", [[{"a": ...}, ["48394" ...]] ,...])so we don't have to repeat 3 separate setups?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I can. I will make these changes tonight and resubmit.
AbelJSanchez
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the test to use pytest.mark.parametrize instead of setting up three diferent assertions.
mroeschke
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One change otherwise LGTM
| DataFrame({"a": ["483", "3"], "b": ["94", "759"]}), | ||
| Series(["48394", "3759"]), | ||
| ), | ||
| ( | ||
| DataFrame({"a": ["483.948", "3.0"], "b": ["94.2", "759.93"]}), | ||
| Series(["483.94894.2", "3.0759.93"]), | ||
| ), | ||
| ( | ||
| DataFrame({"a": ["483", "3.0"], "b": ["94.2", "79"]}), | ||
| Series(["48394.2", "3.079"]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you put the DataFrame and Series calls in the body of the test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Will make changes tonight and resubmit.
AbelJSanchez
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated test_sum_string_dtype() by moving the Series and DataFrame calls into the body of the test.
|
Thanks @AbelJSanchez |
Added test_sum_string_dtype_coercion() that checks if the df.sum() method results in in concatenation for numeric strings, and not coercion to dtype int64 or float64.
I wrote three different assertions:
doc/source/whatsnew/vX.X.X.rstfile if fixing a bug or adding a new feature.AGENTS.md.