Closed
Description
Hi authors, thank you for the great work on OneVision-Encoder.
I have a question about the “codec-style patch selection” mentioned in the paper/website.
From the codebase, I see that the model forward supports sparse inputs via patch_positions,
but I could not find an explicit implementation of the codec-style patch selection pipeline
(e.g., how patches are selected from raw videos).
Could you please clarify:
- Is the codec-style patch selection implemented anywhere in this repo, or are users expected to provide patch_positions externally?
- If it is external, do you plan to release the preprocessing/selection code (or a reference implementation)?
- What signals are used to estimate patch importance during training (e.g., motion, residuals, saliency)?
I am trying to better understand and possibly reproduce the codec-style pipeline.
Thanks for any clarification!
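For concreteness, here is a minimal sketch of what I imagine such a selector might look like: rank patches by inter-frame residual energy (a rough proxy for the motion/residual signals a codec exposes) and keep the top fraction as sparse patch positions. All function names, the keep_ratio parameter, and the (t, row, col) output convention are my own assumptions, not taken from this repo:

```python
import numpy as np

def select_patches_by_residual(video, patch_size=16, keep_ratio=0.25):
    """Hypothetical codec-style selector: keep patches with the largest
    inter-frame residual energy.

    video: (T, H, W, C) array; H and W must be divisible by patch_size.
    Returns an (N, 3) int array of (t, row, col) patch indices.
    """
    T, H, W, C = video.shape
    ph, pw = H // patch_size, W // patch_size
    v = video.astype(np.float32)
    # Residual against the previous frame; frame 0 is compared to itself,
    # so its residual is zero (an I-frame would be handled differently).
    prev = np.concatenate([v[:1], v[:-1]], axis=0)
    resid = np.abs(v - prev)                              # (T, H, W, C)
    # Sum residual energy inside each non-overlapping patch.
    patches = resid.reshape(T, ph, patch_size, pw, patch_size, C)
    energy = patches.sum(axis=(2, 4, 5))                  # (T, ph, pw)
    flat = energy.reshape(-1)
    k = max(1, int(flat.size * keep_ratio))
    top = np.argsort(flat)[-k:]                           # highest-energy patches
    t, r, c = np.unravel_index(top, (T, ph, pw))
    return np.stack([t, r, c], axis=1)

# Example: 4 frames of 32x32 RGB -> 16 patches total, keep 25% = 4 patches.
positions = select_patches_by_residual(np.random.rand(4, 32, 32, 3))
```

Is something along these lines (perhaps reading residuals/motion vectors directly from the compressed bitstream instead of recomputing them) roughly what the paper means, or is the actual signal different?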