
Conversation

@comphead comphead commented Dec 17, 2025

Which issue does this PR close?

Running experiments using openDAL for HDFS writes on local and remote clusters.

Closes #2890 .

Rationale for this change

What changes are included in this PR?

  • Hook up streaming HDFS openDAL writes so they can be called from Comet
  • Introduce Remote/Local writers, since they have slightly different behavior for write and close
  • Add tests for HDFS writes from the native side; they are ignored because they require a running HDFS cluster, but they are useful for local debugging

How are these changes tested?

@codecov-commenter

codecov-commenter commented Dec 17, 2025

Codecov Report

❌ Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.59%. Comparing base (f09f8af) to head (1a949c3).
⚠️ Report is 801 commits behind head on main.

Files with missing lines Patch % Lines
...comet/serde/operator/CometDataWritingCommand.scala 0.00% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2929      +/-   ##
============================================
+ Coverage     56.12%   59.59%   +3.46%     
- Complexity      976     1377     +401     
============================================
  Files           119      167      +48     
  Lines         11743    15494    +3751     
  Branches       2251     2570     +319     
============================================
+ Hits           6591     9233    +2642     
- Misses         4012     4961     +949     
- Partials       1140     1300     +160     

☔ View full report in Codecov by Sentry.

@comphead
Contributor Author

I can see double writes in the plan now
[screenshot: query plan showing the double writes]

@comphead
Contributor Author

Filed #2971 for double writes

@comphead comphead changed the title from "WIP: Add support for remote Parquet writer with openDAL" to "WIP: Add support for remote Parquet HDFS writer with openDAL" Dec 24, 2025
@comphead comphead marked this pull request as ready for review December 24, 2025 21:12
```diff
-fn parse_hdfs_url(url: &Url) -> Result<(Box<dyn ObjectStore>, Path), object_store::Error> {
+// Creates an HDFS object store from a URL using the native HDFS implementation
+#[cfg(all(feature = "hdfs", not(feature = "hdfs-opendal")))]
+fn create_hdfs_object_store(
```
Contributor Author
renamed from parse_hdfs_url to better reflect what the function does

```diff
-    if (!cmd.outputPath.toString.startsWith("file:")) {
-      return Unsupported(Some("Only local filesystem output paths are supported"))
+    if (!cmd.outputPath.toString.startsWith("file:") && !cmd.outputPath.toString
+        .startsWith("hdfs:")) {
```
Member
hdfs-opendal can support more schemes, see: #2272

```rust
    sync::Arc,
};

use opendal::{services::Hdfs, Operator};
```
Member
Is hdfs-opendal always enabled?


Development

Successfully merging this pull request may close these issues.

Support HDFS writes with Comet writer via OpenDAL

3 participants