Skip to content

Conversation

@landreev
Copy link
Contributor

@landreev landreev commented Aug 26, 2025

What this PR does / why we need it:

I need additional mechanisms in the ExportDataProvide interface to implement the improvements in #11405 (primarily in the DDI exporter). This interface is maintained in the main source tree, but for the purposes of building Dataverse it is used as a standalone dependency package, io.gdcc.dataverse-spi. So, the changes to the interface need to be merged, and the new version of it published on maven central before I could make the final PR to close #11405.

Which issue(s) this PR closes:

  • Closes

Special notes for your reviewer:

Suggestions on how to test this:

I don't think there's a way to test this simple interface change by itself. This is an auxiliary PR that needs to be merged (and published) before I can make the main PR for #11405. Once that PR is made, the overall testing and QA will happen there.

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:

efficient handling of the raw data for datasets with massive numbers
of ingested/tabular datafiles and variables. #11766 #11405
@github-actions github-actions bot added FY25 Sprint 24 FY25 Sprint 24 (2025-05-21 - 2025-06-04) FY25 Sprint 25 FY25 Sprint 25 (2025-06-04 - 2025-06-18) FY25 Sprint 26 FY25 Sprint 26 (2025-06-18 - 2025-07-02) FY26 Sprint 1 FY26 Sprint 1 (2025-07-02 - 2025-07-16) FY26 Sprint 2 FY26 Sprint 2 (2025-07-16 - 2025-07-30) FY26 Sprint 3 (2025-07-30 - 2025-08-13) FY26 Sprint 4 FY26 Sprint 4 (2025-08-13 - 2025-08-27) Size: 80 A percentage of a sprint. 56 hours. labels Aug 26, 2025
@landreev landreev moved this to Ready for Review ⏩ in IQSS Dataverse Project Aug 26, 2025
@landreev landreev requested a review from qqmyers August 27, 2025 15:08
@pdurbin pdurbin moved this from Ready for Review ⏩ to In Review 🔎 in IQSS Dataverse Project Aug 27, 2025
@pdurbin pdurbin self-assigned this Aug 27, 2025
@cmbz cmbz added Size: 10 A percentage of a sprint. 7 hours. and removed Size: 80 A percentage of a sprint. 56 hours. labels Aug 27, 2025
@pdurbin
Copy link
Member

pdurbin commented Aug 27, 2025

I made this pull request:

It won't fix the Dataverse SPI / Release Snapshot (pull_request) check because I didn't even try editing the GitHub Action (update: I did leave some TODOs in f570aaa we can experiment with). It does however explain how to publish a snapshot from a local env (which I did), so I hope it helps to keep this PR moving. I'm out for a few days so I'll put this back in "ready for review" so someone else can pick it up.

@pdurbin pdurbin removed their assignment Aug 27, 2025
@pdurbin pdurbin moved this from In Review 🔎 to Ready for Review ⏩ in IQSS Dataverse Project Aug 27, 2025
@cmbz cmbz added the FY26 Sprint 5 FY26 Sprint 5 (2025-08-27 - 2025-09-10) label Aug 28, 2025
@landreev
Copy link
Contributor Author

landreev commented Sep 2, 2025

@qqmyers @poikilotherm
Jim and I were talking about this last week; but this may be a better place for the discussion.
The motivation behind the PR is to add a few extra hooks to the ExportDataProvider interface in dataverse-spi. The extra goal is to implement it in a way that will not break any existing exporters, and will also allow us to keep adding more options going forward, similarly without breaking anything already implemented.

The approach I am using currently: all the methods in ExportDataProvider have been made to accept an optional ExportDataContext, as in

JsonObject getDatasetJson(ExportDataContext... context);

In order to use any of the newly added options, an exporter will do something like

datasetJson = exportDataProvider.getDatasetJson(ExportDataContext.context().withDatasetMetadataOnly()); 
or 
tabulardataJson = exportDataProvider.getTabularDataDetails(ExportDataContext.context().withOffset(offset).withLength(length));

but none of the already implemented exporters that are currently relying on getDatasetJson() etc. will need to be changed in any way. If/when we need more options in the future, they will be added to ExportDataContext, again, without breaking anything existing.

I am very flexible on all this. I have working code in a separate branch that relies on this current implementation; but it should be trivial to change it in whatever way necessary if we end up implementing it differently. And I do want to invest into implementing it in a most optimal way.

I'm not super attached to the "context" concept above (my first implementation was via individual, multiple ExportDataOptions - ExportDataOption.java is still checked in for reference).

Do we want the implementation to provide more information about the options currently supported? - This would potentially allow an exporter jar built using dataverse-spi version N to be able to both use some new options in that version, and to work with older Dataverses still using dataverse-spi prior to v. N; i.e., before requesting the withPigLatin option, it would be able to first ask whether it is supported by the provider. ... but is this something realistically useful/worth extra effort?

Once again, I'm open to, and appreciate any feedback and suggestions.

@cmbz cmbz added the FY26 Sprint 6 FY26 Sprint 6 (2025-09-10 - 2025-09-24) label Sep 14, 2025
@cmbz cmbz added the FY26 Sprint 7 FY26 Sprint 7 (2025-09-24 - 2025-10-08) label Sep 24, 2025
@cmbz cmbz added the FY26 Sprint 8 FY26 Sprint 8 (2025-10-08 - 2025-10-22) label Oct 8, 2025
@stevenwinship stevenwinship self-assigned this Oct 16, 2025
@stevenwinship stevenwinship moved this from Ready for Review ⏩ to In Review 🔎 in IQSS Dataverse Project Oct 16, 2025
* export plugins.
*/
@Deprecated
public class ExportDataOption {
Copy link
Contributor

@stevenwinship stevenwinship Oct 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is a new unused class. Instead of Deprecating why not just remove it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just saw your comment (ExportDataOption.java is still checked in for reference). I don't think checking in unused code is good. Also, there are no tests included. Does this lower the code coverage? I'm not sure if code not under src/ is included in code coverage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stevenwinship Thanks for reviewing this. Unfortunately, I missed your comments above yesterday. Yes, it would have been ideal to remove this before merging. I may make a quick followup pr removing it, before the interface jar is published on maven central.

And yes, I only left it in place for the sake of showing the reviewers an alternative implementation I first tried.

Copy link
Contributor

@stevenwinship stevenwinship left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving since this can't be tested by itself and doesn't pose any issues for dataverse backend.

@github-project-automation github-project-automation bot moved this from In Review 🔎 to Ready for QA ⏩ in IQSS Dataverse Project Oct 17, 2025
@stevenwinship stevenwinship removed their assignment Oct 17, 2025
@ofahimIQSS
Copy link
Contributor

merging

@ofahimIQSS ofahimIQSS merged commit bf08caf into develop Oct 17, 2025
2 checks passed
@github-project-automation github-project-automation bot moved this from Ready for QA ⏩ to Merged 🚀 in IQSS Dataverse Project Oct 17, 2025
@ofahimIQSS ofahimIQSS deleted the 11766-new-io.gdcc.dataverse-spi branch October 17, 2025 14:42
@ofahimIQSS
Copy link
Contributor

https://github.com/IQSS/dataverse/actions/runs/18596172556

Merge pull request #11767 from IQSS/11766-new-io.gdcc.dataverse-spi #69 failed

@landreev
Copy link
Contributor Author

https://github.com/IQSS/dataverse/actions/runs/18596172556

Merge pull request #11767 from IQSS/11766-new-io.gdcc.dataverse-spi #69 failed

OK, you didn't really mean #69, I see. I was royally confused by this at first.
I'm assuming this is a known issue, that releases are still not being built automatically. (but we know how to make them manually).
@pdurbin is this something that can be fixed easily by configuring a new secret for the action? Otherwise I may need your help with making the release. (I already got the credentials for making snapshot releases; but not "real" kind yet).

@pdurbin
Copy link
Member

pdurbin commented Oct 17, 2025

@landreev sure, happy to help. Judging from https://repo1.maven.org/maven2/io/gdcc/dataverse-spi/ and this PR you want the next release to be 2.1.0.

I'm seeing this failure at https://github.com/IQSS/dataverse/actions/runs/18596172556/job/53023163304

Error: Failed to execute goal org.sonatype.plugins:nexus-staging-maven-plugin:1.6.13:deploy (injected-nexus-deploy) on project dataverse-spi: Execution injected-nexus-deploy of goal org.sonatype.plugins:nexus-staging-maven-plugin:1.6.13:deploy failed: Server credentials with ID "ossrh" not found! -> [Help 1]

At minimum we should change these:

% ack TODO -A2 modules/dataverse-spi
modules/dataverse-spi/pom.xml
71:            <!--TODO: change this from ossrh to central?-->
72-            <id>ossrh</id>
73:            <!--TODO: change this url?-->
74-            <url>https://s01.oss.sonatype.org/service/local/staging/deploy/maven2/</url>
75-        </repository>
--
115:                    <!--TODO: change this from ossrh to central?-->
116-                    <serverId>ossrh</serverId>
117:                    <!--TODO: change this URL?-->
118-                    <nexusUrl>https://s01.oss.sonatype.org</nexusUrl>
119-                    <autoReleaseAfterClose>true</autoReleaseAfterClose>```

I had to iterate quite a bit in the Croissant repo but we don't have many rules there. I force pushed to main several times. I can make a PR and we can merge it and then repeat until it's fixed, I guess. We might end up with a version higher than 2.1.0 though.

This module seems a bit different than Croissant where we have -SNAPSHOT in the version.

Anyway, let me make a PR with what I think we should do. Obviously, if @poikilotherm is around, he'll know better! 😅

@pdurbin
Copy link
Member

pdurbin commented Oct 17, 2025

@landreev I made a PR:

Please take a look! Happy to discuss!

@landreev landreev restored the 11766-new-io.gdcc.dataverse-spi branch October 17, 2025 21:38
@pdurbin pdurbin moved this from Merged 🚀 to Done 🧹 in IQSS Dataverse Project Oct 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

FY25 Sprint 24 FY25 Sprint 24 (2025-05-21 - 2025-06-04) FY25 Sprint 25 FY25 Sprint 25 (2025-06-04 - 2025-06-18) FY25 Sprint 26 FY25 Sprint 26 (2025-06-18 - 2025-07-02) FY26 Sprint 1 FY26 Sprint 1 (2025-07-02 - 2025-07-16) FY26 Sprint 2 FY26 Sprint 2 (2025-07-16 - 2025-07-30) FY26 Sprint 3 (2025-07-30 - 2025-08-13) FY26 Sprint 4 FY26 Sprint 4 (2025-08-13 - 2025-08-27) FY26 Sprint 5 FY26 Sprint 5 (2025-08-27 - 2025-09-10) FY26 Sprint 6 FY26 Sprint 6 (2025-09-10 - 2025-09-24) FY26 Sprint 7 FY26 Sprint 7 (2025-09-24 - 2025-10-08) FY26 Sprint 8 FY26 Sprint 8 (2025-10-08 - 2025-10-22) Size: 10 A percentage of a sprint. 7 hours.

Projects

Status: Done 🧹

Development

Successfully merging this pull request may close these issues.

Refactor and optimize the Export metadata framework, especially for the data/variable-level metadata

6 participants