CUDA Error When Running Inference #9

@junbangliang

Description

Hi,

Thanks for sharing the great work. When I tried running inference.py, I got the following error:

[2025-07-08 15:13:42,413] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-07-08 15:13:43,226] [INFO] [logging.py:107:log_dist] [Rank -1] [TorchCheckpointEngine] Initialized with serialization = False
Loading Checkpoint ...
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:02<00:00,  1.55it/s]
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
Pointing task detected. We automatically add a pointing prompt for inference.
##### INPUT #####
What is shown in this image?. Your answer should be formatted as a list of tuples, i.e. [(x1, y1), (x2, y2), ...], where each tuple contains the x and y coordinates of a point satisfying the conditions above. The coordinates should indicate the normalized pixel locations of the points in the image.
###############
Thinking disabled.
Running inference ...
/pytorch/aten/src/ATen/native/cuda/TensorCompare.cu:110: _assert_async_cuda_kernel: block: [0,0,0], thread: [0,0,0] Assertion `probability tensor contains either `inf`, `nan` or element < 0` failed.
Traceback (most recent call last):
  File "/media/dell/NvmeDrive1/RoboBrain2.0/inference.py", line 227, in <module>
    pred = model.inference(prompt, image, task="pointing", plot=False, enable_thinking=False, do_sample=True)
  File "/media/dell/NvmeDrive1/RoboBrain2.0/inference.py", line 97, in inference
    generated_ids = self.model.generate(**inputs, max_new_tokens=768, do_sample=do_sample, temperature=temperature)
  File "/home/dell/miniconda3/envs/vlm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/dell/miniconda3/envs/vlm/lib/python3.10/site-packages/transformers/generation/utils.py", line 2465, in generate
    result = self._sample(
  File "/home/dell/miniconda3/envs/vlm/lib/python3.10/site-packages/transformers/generation/utils.py", line 3422, in _sample
    while self._has_unfinished_sequences(this_peer_finished, synced_gpus, device=input_ids.device):
  File "/home/dell/miniconda3/envs/vlm/lib/python3.10/site-packages/transformers/generation/utils.py", line 2616, in _has_unfinished_sequences
    elif this_peer_finished:
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Could you please look into this?
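
For what it's worth, here is a minimal debugging sketch based on the hints in the traceback. The `model` and `inputs` names below are placeholders for the objects that `inference.py` builds internally, not the repository's actual API:

```python
import os

# Surface the device-side assert at the failing kernel instead of a
# later API call, as the traceback suggests. This must be set before
# any CUDA context is created.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch

def debug_generate(model, inputs):
    # Greedy decoding bypasses torch.multinomial, the op that asserts
    # when the probability tensor contains `inf`, `nan`, or a value < 0.
    # If this succeeds, the sampled logits are likely overflowing.
    with torch.no_grad():
        return model.generate(**inputs, max_new_tokens=768, do_sample=False)
```

If greedy decoding runs cleanly, the failure is probably numerical: passing `torch_dtype=torch.bfloat16` (or `torch.float32`) to `from_pretrained` when loading the checkpoint is a common workaround for `inf`/`nan` logits under fp16.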
