Question about codec-style patch selection implementation #80

@Ray-ui

Description

Hi authors, thank you for the great work on OneVision-Encoder.

I have a question about the “codec-style patch selection” mentioned in the paper/website.
From the codebase, I can see that the model's forward pass supports sparse inputs via patch_positions,
but I could not find an explicit implementation of the codec-style patch selection pipeline itself
(e.g., how patches are selected from raw videos).

Could you please clarify:

  1. Is the codec-style patch selection implemented anywhere in this repo, or are users expected
    to provide patch_positions externally?
  2. If it is external, do you plan to release the preprocessing/selection code (or a reference implementation)?
  3. What signals are used to estimate patch importance during training (e.g., motion, residuals, saliency)?

I am trying to better understand and possibly reproduce the codec-style pipeline.
Thanks for any clarification!
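To make question 3 concrete: below is a minimal sketch of the kind of pipeline I imagine, purely as an illustration of my current understanding, not what the repo actually does. All names (select_patches, the residual-energy heuristic, the keep_ratio parameter) are my own assumptions; I'd be happy to learn how the real selection differs.

```python
import numpy as np

def select_patches(video, patch=16, keep_ratio=0.25):
    """Hypothetical codec-style selection (NOT from this repo):
    rank patches by inter-frame residual energy and keep the top
    fraction, yielding (frame, row, col) patch positions.

    video: (T, H, W) float array of grayscale frames.
    """
    T, H, W = video.shape
    gh, gw = H // patch, W // patch  # patch-grid dimensions
    positions = []
    for t in range(1, T):
        # Residual against the previous frame, as a cheap motion proxy.
        residual = np.abs(video[t] - video[t - 1])
        # Sum residual magnitude within each (patch x patch) block.
        energy = residual[:gh * patch, :gw * patch]
        energy = energy.reshape(gh, patch, gw, patch).sum(axis=(1, 3))
        # Keep the top keep_ratio fraction of patches for this frame.
        k = max(1, int(keep_ratio * gh * gw))
        top = np.argsort(energy.ravel())[::-1][:k]
        positions.extend((t, int(i) // gw, int(i) % gw) for i in top)
    return positions
```

Is this roughly the right shape, e.g. residual/motion-driven ranking feeding patch_positions, or does the actual pipeline use codec-level signals (motion vectors, DCT residuals) extracted during decoding?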
