To properly test this approach, we need an env that (1) produces renderings, and (2) has train tasks that we can easily solve. Currently, the only envs that satisfy (1) are burger and kitchen, but neither satisfies (2). Ideally, we create or reuse a super-simple env that adds state renderings under state.simulator_state["images"] (perhaps we can do this for something like cover?). A rough sketch of the idea is below.
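A minimal, self-contained sketch of what "adding renderings to the state" could look like. The `State` class, `render_state`, and `attach_rendering` below are illustrative stand-ins, not the actual classes in the codebase; the only piece taken from this note is the convention of storing images in `simulator_state["images"]`.

```python
# Illustrative sketch only: these classes stand in for the real env/State
# classes, which have different constructors and richer interfaces.
from dataclasses import dataclass, field
from typing import Any, Dict, List

import numpy as np


@dataclass
class State:
    """Stand-in for a State: feature values plus a simulator_state dict
    that can carry auxiliary data such as renderings."""
    data: Dict[str, float]
    simulator_state: Dict[str, Any] = field(default_factory=dict)


def render_state(state: State, width: int = 32, height: int = 32) -> np.ndarray:
    """Trivial placeholder renderer: a blank RGB image whose first pixel
    encodes one (hypothetical) state feature, so tests have something
    deterministic to check."""
    img = np.zeros((height, width, 3), dtype=np.uint8)
    img[0, 0, 0] = int(state.data.get("block_x", 0.0) * 255) % 256
    return img


def attach_rendering(state: State) -> State:
    """Store the rendering under simulator_state["images"], following the
    convention described for the burger/kitchen envs."""
    images: List[np.ndarray] = [render_state(state)]
    state.simulator_state["images"] = images
    return state


if __name__ == "__main__":
    s = attach_rendering(State(data={"block_x": 0.4}))
    print(s.simulator_state["images"][0].shape)  # (32, 32, 3)
```

For a real test env (e.g. a cover variant), the same hook would presumably live in the env's state-creation or simulation step, so every state the agent sees already carries its rendering.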