Fix: Stream Json Content #3099
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Summary of Changes: Hello @nathanmascitelli, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly optimizes the process of sending large JSON payloads, particularly to services like BigQuery, by refactoring the serialization mechanism. It addresses high memory consumption caused by intermediate string allocations on the large object heap. The core change involves implementing a custom JsonStreamContent (an HttpContent implementation).
Code Review
This pull request introduces a significant performance improvement for uploading large JSON payloads by streaming the serialization process directly to the network stream. This avoids allocating large strings on the heap, addressing the memory pressure issue. The introduction of JsonStreamContent is a clean way to encapsulate this logic. The changes to the ISerializer interface are non-breaking and well-implemented. My review includes one critical fix for the new JsonStreamContent class to ensure gzipped content is correctly formatted.
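As an illustration of the approach the review describes, a custom HttpContent in the spirit of JsonStreamContent might look roughly like the sketch below. The class name and details here are assumptions, not the PR's actual code; the idea is simply to let Newtonsoft.Json write directly into the request stream instead of building an intermediate string.

```csharp
using System.IO;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;
using Newtonsoft.Json;

// Illustrative sketch only: an HttpContent that serializes the body straight
// into the request stream, so no full JSON string is ever allocated.
public class StreamingJsonContent : HttpContent
{
    private readonly object _value;
    private readonly JsonSerializer _serializer;

    public StreamingJsonContent(object value, JsonSerializer serializer = null)
    {
        _value = value;
        _serializer = serializer ?? JsonSerializer.CreateDefault();
        Headers.ContentType = new MediaTypeHeaderValue("application/json") { CharSet = "utf-8" };
    }

    protected override Task SerializeToStreamAsync(Stream stream, TransportContext context)
    {
        // leaveOpen: true because HttpClient owns and disposes the underlying stream.
        using (var writer = new StreamWriter(stream, new UTF8Encoding(false), 1024, leaveOpen: true))
        using (var jsonWriter = new JsonTextWriter(writer))
        {
            _serializer.Serialize(jsonWriter, _value);
            jsonWriter.Flush();
        }
        return Task.CompletedTask;
    }

    // The length is unknown up front, so the request is sent chunked.
    protected override bool TryComputeLength(out long length)
    {
        length = -1;
        return false;
    }
}
```

A production version would also need to cooperate with gzip compression (the issue flagged in the review above) and would ideally write asynchronously.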
Force-pushed from 845490e to 488e844
amanda-tarafa left a comment
Aside from the breaking change, the code this PR is changing is used by all the Discovery based .NET Client Libraries, so we would have to confirm and test that this change works for 400+ APIs. It's unlikely that we can merge this change as is (even without the breaking change) as it changes long-established default behaviour.
In principle, you should be able to implement your own custom serializer and configure your services to use that. Did you try that? (A sketch of this approach follows below.)
In addition, for working with large amounts of data, the recommendation is to use the BigQuery Storage API with the Google.Cloud.BigQuery.Storage.V1 library.
I'll close this PR now, but feel free to create an issue for further discussion. It'd be best if you can include the original problem statement as well as a link to this PR there for easier discoverability.
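As a rough sketch of the custom-serializer suggestion above: a custom ISerializer can be plugged into a Discovery-based service through the initializer. The ISerializer member list and the Serializer property on BaseClientService.Initializer should be verified against the Google.Apis.Core version in use; this is illustrative, not authoritative.

```csharp
using System;
using System.IO;
using Google.Apis;
using Google.Apis.Bigquery.v2;
using Google.Apis.Json;
using Google.Apis.Services;

// Illustrative only: a delegating ISerializer that wraps the default
// NewtonsoftJsonSerializer. A real implementation could change how the
// stream-based Serialize overload behaves.
public class CustomSerializer : ISerializer
{
    private static readonly NewtonsoftJsonSerializer Inner = NewtonsoftJsonSerializer.Instance;

    public string Format => Inner.Format;
    public void Serialize(object obj, Stream target) => Inner.Serialize(obj, target);
    public string Serialize(object obj) => Inner.Serialize(obj);
    public T Deserialize<T>(string input) => Inner.Deserialize<T>(input);
    public object Deserialize(string input, Type type) => Inner.Deserialize(input, type);
    public T Deserialize<T>(Stream input) => Inner.Deserialize<T>(input);
}

public static class ServiceSetup
{
    // "my-app" is a placeholder application name.
    public static BigqueryService Create() => new BigqueryService(new BaseClientService.Initializer
    {
        Serializer = new CustomSerializer(),
        ApplicationName = "my-app",
    });
}
```

As noted further down in the thread, this alone does not remove the intermediate string if the request pipeline calls the string-returning Serialize(object) overload.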
@amanda-tarafa I left a comment on the diff but I don't understand how an optional parameter is a breaking change and would appreciate some more info.
I don't think this fixes the problem, as the code in my initial post shows that it still needs to create a string in memory before sending it over the network, causing the GC to have to clean that string up later. If I'm misunderstanding what you're suggesting could you please give me an example?
I can look at this but to be clear we are inserting one row that is just very wide and so the string that is created ends up being large. Is the Storage API still a good idea for inserting single rows?
Adding an optional parameter is a binary breaking change.
Yes, you are right, I missed this.
But unless you are inserting very wide rows many times over, even a very wide row shouldn't cause memory issues? And if you are inserting the wide rows many times over, then maybe BigQuery Storage is indeed an option. Don't get me wrong, I agree allocating the very wide string is not ideal, but these libraries are not optimized for "very big requests" as it's not a common case.
If BigQuery Storage is not an option, then maybe consider uploading the row via a job that uses a media upload? You can see how we've used that on Google.Cloud.BigQuery.V2. Also, I'm not sure which you are using, but note that Google.Cloud.BigQuery.V2 wraps and is recommended over Google.Apis.Bigquery.v2.
It continues to be unlikely that we make the change as you are proposing. The threshold for such a wide-reaching change is high, as in something needs to be fundamentally broken, and that doesn't seem to be the case here. Please do create an issue for further discussion. Discussions on closed PRs are not as easily discovered.
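A hedged sketch of the load-job route mentioned above, using Google.Cloud.BigQuery.V2. The project, dataset, and table names are placeholders, and the exact UploadJson overloads should be checked against the current library surface.

```csharp
using Google.Cloud.BigQuery.V2;
using Newtonsoft.Json;

public static class LoadJobExample
{
    // Inserts a single (very wide) row via a load job instead of a streaming insert.
    // "my-project", "my_dataset" and "wide_table" are placeholders.
    public static void InsertWideRow(object wideRow)
    {
        BigQueryClient client = BigQueryClient.Create("my-project");

        // Each element is one newline-delimited JSON row.
        string[] rows = { JsonConvert.SerializeObject(wideRow) };

        // Passing null for the schema assumes the destination table already
        // exists with a compatible schema.
        BigQueryJob job = client.UploadJson("my_dataset", "wide_table", null, rows);

        // Load jobs run server-side; wait for completion and surface any errors.
        job.PollUntilCompleted().ThrowOnAnyError();
    }
}
```

If building even the per-row JSON string is a concern, the Stream-based UploadJson overload can be fed serialized bytes instead.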
Thanks @amanda-tarafa. Let me take a look at the Storage API as you suggested and if it doesn't solve the problem I'll open an issue with a reproduction of what's causing the allocations I'm seeing and we can continue to discuss there. We are using Google.Cloud.BigQuery.V2, the allocations just come from Google.Apis.Bigquery.v2. Thanks again for taking the time to review and explain.
Just to be clear, I was proposing you look into the BigQuery Storage API. The (plain) Storage API serves different purposes altogether.
PR description (@nathanmascitelli):
When inserting a large amount of data into BigQuery, it was noticed that a large number of strings on the large object heap were rooted in Google.Apis.Requests.HttpRequestMessageExtenstions. This is because HttpRequestMessageExtenstions serializes the object being sent to BQ into a string before sending it over the network. These allocations can be removed if the object is serialized and its bytes are pushed to the network stream as the serialization is done, which I've done in a custom implementation of HttpContent in this PR.
Please let me know if there are any questions or anything I can do. I've kept the dependency on Newtonsoft.JSON and extended the ISerializer interface in a non-breaking way.
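For context, the allocation pattern the description refers to is roughly the following paraphrase (not the library's actual source):

```csharp
using System.Net.Http;
using System.Text;
using Google.Apis;

// Paraphrase of the pattern described above, not the actual library source:
// the body is serialized into one string (a large-object-heap allocation for
// very wide rows) and only then wrapped in HttpContent.
public static class StringBodySketch
{
    public static void AttachJsonBody(HttpRequestMessage request, ISerializer serializer, object body)
    {
        string json = serializer.Serialize(body); // entire payload materialized as one string
        request.Content = new StringContent(json, Encoding.UTF8, "application/json");
    }
}
```

The streaming HttpContent sketched earlier in the thread avoids materializing the JSON string by writing serialized bytes to the stream as they are produced.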