@AbelJSanchez
Contributor

Added test_sum_string_dtype_coercion(), which checks that df.sum() concatenates numeric strings rather than coercing them to dtype int64 or float64.

I wrote three different assertions:

  1. Row-wise sum of two columns of "integer" strings
  2. Row-wise sum of two columns of "floating point" strings
  3. Row-wise sum of two columns that mix both kinds of numeric strings
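
For illustration, the behavior these assertions target can be sketched roughly as follows. This is a minimal example, assuming a pandas version in which the linked issue is fixed; the variable names `concatenated` and `row_sum` are only illustrative and are not part of the test.

```python
import pandas as pd

df = pd.DataFrame({"a": ["483", "3"], "b": ["94", "759"]})

# Element-wise string addition joins the two columns row by row.
concatenated = df["a"] + df["b"]

# The behavior under test: a row-wise sum is expected to match that
# concatenation instead of parsing the strings as numbers first.
row_sum = df.sum(axis=1)

print(concatenated.tolist())
print(row_sum.tolist())
```

On a version affected by the linked issue, the two printed lists may differ.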

…ngs results in concatenation and not coercion to dtype int64 or float64
Comment on lines 1050 to 1052
df = DataFrame({"a": ["483", "3"], "b": ["94", "759"]})
result = df.sum(axis=1)
expected = Series(["48394", "3759"])
Member

Could you use pytest.mark.parametrize like

@pytest.mark.parametrize("input_data,expected_data", [[{"a": ...}, ["48394" ...]] ,...])

so we don't have to repeat 3 separate setups?
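
A parametrized version along these lines might look like the sketch below. It is not the merged test: the input and expected values are taken from the snippets quoted in this thread, the module-level name `CASES` is hypothetical, and the public `pd.testing.assert_series_equal` stands in for whatever assertion helper the pandas test suite actually uses.

```python
import pandas as pd
import pytest

# Each case pairs plain input data with the expected row-wise concatenation.
CASES = [
    ({"a": ["483", "3"], "b": ["94", "759"]}, ["48394", "3759"]),
    (
        {"a": ["483.948", "3.0"], "b": ["94.2", "759.93"]},
        ["483.94894.2", "3.0759.93"],
    ),
    ({"a": ["483", "3.0"], "b": ["94.2", "79"]}, ["48394.2", "3.079"]),
]


@pytest.mark.parametrize("input_data,expected_data", CASES)
def test_sum_string_dtype_coercion(input_data, expected_data):
    # One shared setup replaces three near-identical ones.
    df = pd.DataFrame(input_data)
    result = df.sum(axis=1)
    expected = pd.Series(expected_data)
    pd.testing.assert_series_equal(result, expected)
```

With this shape, pytest reports each case as a separate test, so a failure points at the specific input that broke.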

Contributor Author

Yes, I can. I will make these changes tonight and resubmit.

@mroeschke added the Testing label (pandas testing functions or related to the test suite) on Dec 4, 2025
Contributor Author

@AbelJSanchez left a comment

Updated the test to use pytest.mark.parametrize instead of setting up three different assertions.

Member

@mroeschke left a comment

One change, otherwise LGTM.

Comment on lines 1050 to 1059
DataFrame({"a": ["483", "3"], "b": ["94", "759"]}),
Series(["48394", "3759"]),
),
(
DataFrame({"a": ["483.948", "3.0"], "b": ["94.2", "759.93"]}),
Series(["483.94894.2", "3.0759.93"]),
),
(
DataFrame({"a": ["483", "3.0"], "b": ["94.2", "79"]}),
Series(["48394.2", "3.079"]),
Member

Can you put the DataFrame and Series calls in the body of the test?

Contributor Author

Yep. Will make changes tonight and resubmit.

Contributor Author

@AbelJSanchez left a comment

Updated test_sum_string_dtype() by moving the Series and DataFrame calls into the body of the test.

@mroeschke added this to the 3.0 milestone on Dec 6, 2025
@mroeschke merged commit 8be8439 into pandas-dev:main on Dec 6, 2025
41 checks passed
@mroeschke
Member

Thanks @AbelJSanchez

Development

Successfully merging this pull request may close these issues.

BUG: df.sum() of string columns depends on whether or not they can be coerced to numbers
