Skip to content

Conversation

@tennlee
Copy link
Collaborator

@tennlee tennlee commented Aug 13, 2025

This notebook demonstrates a possible way to turn a pipeline into a sequence-to-sequence pipeline
It duplicates functionality in the TemporalRetrieval class, but is simpler
It aims to be easier to specify, and therefore easier to explain and use, for one of the most common use cases.

@coveralls
Copy link

coveralls commented Aug 13, 2025

Pull Request Test Coverage Report for Build 18925009139

Details

  • 9 of 22 (40.91%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.04%) to 61.245%

Changes Missing Coverage Covered Lines Changed/Added Lines %
packages/pipeline/src/pyearthtools/pipeline/modifications/idx_modification.py 7 20 35.0%
Totals Coverage Status
Change from base Build 18904429097: -0.04%
Covered Lines: 9560
Relevant Lines: 15190

💛 - Coveralls

@jennan
Copy link
Collaborator

jennan commented Aug 15, 2025

Hi @tennlee ,

I like this new API, it is simpler to understand than the original TemporalRetrieval class.

An additional simplification I was thinking about it to even drop the b_indexes component, for example if using 2 pipelines for training (instead of 1 pipeline with a tuple as an output):

target_pipeline = pyearthtools.pipeline.Pipeline(
    accessor,
    pyearthtools.data.transforms.coordinates.StandardLongitude(type="-180-180"),
    sampler=pyearthtools.pipeline.samplers.Default(),
    iterator=pyearthtools.pipeline.iterators.DateRange(1980, 2016, interval='6 hours')
)

feature_pipeline = pyearthtools.pipeline.Pipeline(
    target_pipeline,
    TemporalWindow([-3,-2,-1], timedelta=timedelta, merge_method=merge_method),
)

Both pipeline share the same iterator and sampler so... I think (not sure) that they should be able to sample the same date, the target pipeline producing T+0 and the feature pipeline producing [T-3,T-2,T-1].
I can see that PipelineLightningDataModule for example seems to support getting a set of pipelines as inputs, so I guess (I have not tried sorry) that feeding data from 2 pipelines is possible, if this is a way we would want to recommend.

Another thing that could be changes is the way to pass the deltas. We could support free form specification, e.g. if we convert the input using pandas.to_timedelta.

feature_pipeline = pyearthtools.pipeline.Pipeline(
    target_pipeline,
    TemporalWindow(["-3h", "-2h", "-1h"], merge_method=merge_method),
)

That's just an idea for an alternative syntax.

@HCookie
Copy link
Collaborator

HCookie commented Aug 15, 2025

I'm interested in reviewing this, as I do think I over complicated the temporal pipeline step in the initial design.
However, my availability over the next two weeks is quite low, so it may be a bit.

@gemmaellen
Copy link
Collaborator

I think that makes sense as an interface. I might suggest adding some example usage to the docstring, both for instantiation and for getitem. In particular, it would be useful to make it clear exactly how date_of_interest relates to b_indexes.

Also, there's a small typo: "The resultant sequences may be left unmerged (i.e. a list of retrieved results for each timetime)". I believe you mean "datetime" rather than "timetime" :)

@tennlee
Copy link
Collaborator Author

tennlee commented Aug 21, 2025

I'll try to get to this tomorrow or early next week. I don't mind if we end up with a few implementations while we determine the absolute best interface, and I don't mind if there is some API change if a better design is arrived at. I will take into account the comments left thus far, code something up, probably merge it in so I can use it, but I'll put documentation in indicating it's still subject to change until we're all happy with it. Thanks everyone.

Update documentation to make it clearer that evaluation is 'coming soon' rather than being done already
Ruff and black had a minor disagreement, this should clear CI, I will circle back to our ruff/black config later.
@tennlee
Copy link
Collaborator Author

tennlee commented Oct 29, 2025

Thanks for the feedback all. I've made slight revisions. I haven't added the option to do just one grouping. I can see the use case but for me this class is very specifically about the sequence-to-sequence handling. I'd be happy to add another class for the single-grouping situation and call it something like TemporalGrouping or something. I'd be happy to receive PRs as suggestions for further improvements, but this is useful now and I think in terms of people's available time, it's better to merge the current functionality and then look to the future.

@tennlee tennlee merged commit c20f2d6 into ACCESS-Community-Hub:develop Oct 29, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants