-
Notifications
You must be signed in to change notification settings - Fork 1.8k
feat(NODE-7315): Use BSON ByteUtils instead of Nodejs Buffer #4840
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
1. using Uint8Array instead of Buffer in all internal code 2. enabling eslint rule to block Buffer use in src 3. using BSON's ByteUtils in places where we depended on Buffer operations 4. using ByteUtils.isUint8Array instead of Buffer.isBuffer 5. introduced writeInt32LE wrapper that uses the same order of variables, to avoid embarassing mistakes
| } | ||
|
|
||
| const commandBuffer = Buffer.isBuffer(cmd) ? cmd : serialize(cmd, options); | ||
| const commandBuffer: Uint8Array = ByteUtils.isUint8Array(cmd) ? cmd : serialize(cmd, options); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cmd is always a document
| const commandBuffer: Uint8Array = ByteUtils.isUint8Array(cmd) ? cmd : serialize(cmd, options); | |
| const commandBuffer: Uint8Array = serialize(cmd, options); |
| const mongoCryptOptions: MongoCryptOptions = { | ||
| ...options, | ||
| kmsProviders: !Buffer.isBuffer(this._kmsProviders) | ||
| kmsProviders: !ByteUtils.isUint8Array(this._kmsProviders) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_kmsProviders should never be a buffer (TS doesn't allow it) - thoughts on removing this check?
| import * as process from 'process'; | ||
|
|
||
| import { BSON, type Document, Int32, NumberUtils } from '../../bson'; | ||
| import { BSON, type Document, fromUTF8, Int32, toUTF8 } from '../../bson'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we have two separate ways of importing ByteUtils:
- importing the
BSONobject and callingBSON.ByteUtils.<method> - importing the wrapper function from the driver's
bson.tsfile
Can we consolidate on one or the other? One other option is to export ByteUtils and NumberUtils from bson.ts, and then use those directly.
src/gridfs/upload.ts
Outdated
| const chunkString = chunk as string; | ||
| const inputBuf = chunkString | ||
| ? fromUTF8(chunkString) | ||
| : ByteUtils.toLocalBufferType(chunk as Uint8Array); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this work instead?
| const chunkString = chunk as string; | |
| const inputBuf = chunkString | |
| ? fromUTF8(chunkString) | |
| : ByteUtils.toLocalBufferType(chunk as Uint8Array); | |
| const inputBuf = typeof chunk === 'string' | |
| ? fromUTF8(chunkString) | |
| : ByteUtils.toLocalBufferType(chunk); |
| import { | ||
| allocateBuffer, | ||
| allocateUnsafeBuffer, | ||
| ByteUtils as BSONByteUtils, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still necessary, or could we just use ByteUtils now?
|
|
||
| export type AnyOptions = Document; | ||
|
|
||
| export const ByteUtils = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove this now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be best, but it's public, wouldn't we be breaking users if we removed this?
| return { number: new Double(value).toExtendedJSON() }; | ||
| } | ||
| if (Buffer.isBuffer(value)) | ||
| if (ByteUtils.isUint8Array(value)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we also want to remove usages of Buffer from our tests, I think we missed some. Just searching for Buffer. returns a lot more results than are in this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this PR, I was only removing Buffer usages from src, based on this comment in the issue:
Enable no-restricted-globals: ["error", "Buffer"] in ESLint configuration for the src/ directory to prevent regression.
This lead me to believe that we should be focusing on the src folder, and can leave Buffer usages in tests. If we want to remove the usages of Buffer from our tests, I'd like to do it as a follow-up task to this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's not necessary to remove Buffer from tests right now, but we definitely would want that in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sergey's offline comment:
We can differentiate between source and test files so it's not necessary to rewrite all the tests right now.
Opened a task to eventually remove Buffer from tests: https://jira.mongodb.org/browse/NODE-7410
|
|
||
| // copyBuffer: copies from source buffer to target buffer, returns number of bytes copied | ||
| // inputs are explicitly named to avoid confusion | ||
| export const copyBuffer = (input: { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this something we would want to consider putting into BSON's ByteUtils?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a bad idea: copying was a bit of a pain to verify and figure out why certain spec tests were failing, would be nice to push this to a common spot. Follow-up task?
| // readInt32LE, reads a 32-bit integer from buffer at given offset | ||
| // throws if offset is out of bounds | ||
| export const readInt32LE = (buffer: Uint8Array, offset: number): number => { | ||
| validateBufferInputs(buffer, offset, 4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this validation behavior something present in Buffers readInt32LE API and we need to preserve it? Or is it there just-in-case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's present in Buffer's readInt32LE and it's something we test for: https://github.com/mongodb/node-mongodb-native/blob/main/test/unit/cmap/commands.test.ts#L124-L127
Removing this check breaks this one test.
| ] | ||
| } | ||
| ], | ||
| "no-restricted-globals": [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are removing Buffer usages from tests as well, we'd probably want to enable this lint rule for TS and JS test files too, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed above: I think we should remove Buffer from tests in a follow-up task.
| export const getFloat64LE = BSON.NumberUtils.getFloat64LE; | ||
| export const getBigInt64LE = BSON.NumberUtils.getBigInt64LE; | ||
| export const toUTF8 = BSON.ByteUtils.toUTF8; | ||
| export const fromUTF8 = BSON.ByteUtils.fromUTF8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it's too late to change but this PR made me realize that the names toUTF8 and fromUTF8 are reversed semantically – fromUTF8 encodes a string into UTF-8 and toUTF8 decodes a string from UTF-8 😢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, great spotted! I feel like we can only change that in future major version, I will create ticket for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ByteUtils is experimental - we can make changes outside of semver if we want to fix it now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the driver already rely on toUTF8 method (this one is old, fromUTF8 has been added just recently) , so if we change its' behaviour I believe it will break existing driver functionality, npm install will downloaded newer version of bson.
Co-authored-by: Anna Henningsen <github@addaleax.net>
Description
Summary of Changes
Use BSON ByteUtils for Buffer.* operations.
This change also removes the Buffer type from multiple internal types.
What is the motivation for this change?
Eliminate the driver's hard dependency on the Node.js Buffer global, making the codebase compatible with standard Web APIs (Uint8Array) and portable to environments like Cloudflare Workers, Deno, and the Browser.
This is the next step to implementing NODE-7300 Replace Buffer with Uint8Array,
Double check the following
npm run check:lint)type(NODE-xxxx)[!]: descriptionfeat(NODE-1234)!: rewriting everything in coffeescript