PYTHON-5172 bugfix: Add __repr__ and __eq__ to bson.binary.BinaryVector#2162
Conversation
|
@NoahStapp One question in my mind. Is using an f-string an issue? If the vector is really large, will |
|
Can you add the original PYTHON ticket for BSON Binary Vector if we're not adding a new ticket? |
ShaneHarvey
left a comment
There was a problem hiding this comment.
Let's add a new python ticket for this because it's a new feature.
bson/binary.py
Outdated
| self.padding = padding | ||
|
|
||
| def __repr__(self) -> str: | ||
| return f"BinaryVector - {self.dtype=}, {self.dtype},{self.padding=}, {self.data=}" |
There was a problem hiding this comment.
This needs to follow Python's convention for repr: https://docs.python.org/3/library/functions.html#repr
And we should add tests for this like we do for our other custom type reprs.
f-strings are always evaluated, even if never used. In this case, I don't know if the performance benefit of f-strings is offset by a very large vector being parsed into a string at runtime. It's probably worth running some quick benchmarks to see if there is a performance difference here. |
Yeah. I'm going to switch to format. These vectors could get massive. I don't want users to think mongodb is slow when they've simply got something in a logger.warning statement.. |
Our pre-commit automatically converts str.format into f-string! What's the workaround? |
You can ignore specific ruff rules with |
Should we consider adding the following to pyproject.toml? |
Avoiding cluttering the codebase with unused directives is a pattern we follow consistently, are you seeing linting failures without this change? |
Wrong rule. That was from ChatGPT. |
bson/binary.py
Outdated
| self.padding = padding | ||
|
|
||
| def __repr__(self) -> str: | ||
| return "BinaryVector(dtype={}, padding={}, data={})".format( # noqa: UP032 |
There was a problem hiding this comment.
will repr create the string even when it's not called?
f-string vs format makes no difference. If anything f-string will be faster. __repr__ only runs when repr() is called.
There was a problem hiding this comment.
__only runs when repr() is called.
So if one has logger.debug(vector), is it not true that str.format won't be called, but the f-string will be created?
There was a problem hiding this comment.
Logging behavior is unrelated to this PR since we're just adding a new method. The behavior of repr(vector) is the same whether we use f-string vs format() so let's go back to using f-string.
|
Test failures are async ones, unrelated to these. |
| four = BinaryVector(data, BinaryVectorDtype.PACKED_BIT, padding=3) | ||
| self.assertEqual( | ||
| repr(four), f"BinaryVector(dtype=BinaryVectorDtype.PACKED_BIT, padding=3, data={data})" | ||
| ) |
There was a problem hiding this comment.
Could you add asserts using assertRepr like we do in test_connection_monitoring.py?:
def assertRepr(self, obj):
new_obj = eval(repr(obj))
self.assertEqual(type(new_obj), type(obj))
self.assertEqual(repr(new_obj), repr(obj))There was a problem hiding this comment.
Done. That's very cool, that one can create an object using eval(repr(obj))
|
|
||
| def __repr__(self) -> str: | ||
| return f"BinaryVector(dtype={self.dtype}, padding={self.padding}, data={self.data})" | ||
|
|
There was a problem hiding this comment.
Hold up, the real bug here is that we're using @dataclass but also defining __init__. Why is that? The whole point of dataclass is that it provides __init__ and __repr__ automatically.
There was a problem hiding this comment.
I believe that we had issues with the combo of {dataclass, slots, defaults}. So I've just removed the dataclass decorator.
There was a problem hiding this comment.
Are there any other features that will break by removing @dataclass? We may be stuck with it until 5.0 if it's a breaking change.
There was a problem hiding this comment.
I don't think so. As mentioned in the closed PR, we've done init, and repr, and we don't want to rely on equality except via the binary representation.
There was a problem hiding this comment.
The code seems happy to be a dataclass or not. I have no strong preference.
There was a problem hiding this comment.
There was a problem hiding this comment.
Can you show an example of == and > before and after this change? I'm wondering if the old code (with @dataclass) even works at all.
There was a problem hiding this comment.
Sure. It's easy to try. Put a breakpoint on line 83 of test_bson_binary.
# With the decorator:
vector_obs == vector_exp.as_vector()
True
vector_obs > vector_exp.as_vector()
# TypeError: '>' not supported between instances of 'BinaryVector' and 'BinaryVector'
# Without the decorator:
vector_obs == vector_exp.as_vector()
False
vector_obs > vector_exp.as_vector()
# TypeError: '>' not supported between instances of 'BinaryVector' and 'BinaryVector'
There was a problem hiding this comment.
== is already broken with our current @dataclass approach:
>>> from bson.binary import *
>>> BinaryVector([1], BinaryVectorDtype.INT8) == BinaryVector([2], BinaryVectorDtype.INT8)
TrueSo we need to remove dataclass and implement == as well as __repr__.
There was a problem hiding this comment.
Done. Take a look. I hate subtleties with == and floats..
…nd __repr__ manually.
…init__ and __repr__ manually." This reverts commit 916ec33.
ShaneHarvey
left a comment
There was a problem hiding this comment.
Looking good! Could you add a note for this in the changelog?
Sure! Is this targeting 4.11 or 4.12? If the former, should I bump the micro version and add one line for description and another for jira? If the latter, should I additionallyx start up a whole new section? |
ShaneHarvey
left a comment
There was a problem hiding this comment.
On second thought, let's add this to changelog later since I'm adding the new 4.12 section in another PR right now and we may end up being backporting this as a bug fix. Could you update the PR title and Jira ticket to say this is a bug fix for repr and ==?
bson/binary.py
Outdated
| def __repr__(self) -> str: | ||
| return f"BinaryVector(dtype={self.dtype}, padding={self.padding}, data={self.data})" | ||
|
|
||
| def __eq__(self, other: BinaryVector) -> bool: |
There was a problem hiding this comment.
Typing failure:
bson/binary.py:251: error: Argument 1 of "__eq__" is incompatible with
supertype "object"; supertype defines the argument type as "object" [override]
def __eq__(self, other: BinaryVector) -> bool:
^~~~~~~~~~~~~~~~~~~
Updated names. In the JIRA, I marked it as bug instead of improvement. Please adjust if that's not what you meant. |
No ticket. I just noticed this while working on #2161