CUDA Error When Running Inference #9

@junbangliang

Description

Hi,

Thanks for sharing the great work. When I tried running inference.py, I got the following error:

[2025-07-08 15:13:42,413] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-07-08 15:13:43,226] [INFO] [logging.py:107:log_dist] [Rank -1] [TorchCheckpointEngine] Initialized with serialization = False
Loading Checkpoint ...
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:02<00:00,  1.55it/s]
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
Pointing task detected. We automatically add a pointing prompt for inference.
##### INPUT #####
What is shown in this image?. Your answer should be formatted as a list of tuples, i.e. [(x1, y1), (x2, y2), ...], where each tuple contains the x and y coordinates of a point satisfying the conditions above. The coordinates should indicate the normalized pixel locations of the points in the image.
###############
Thinking disabled.
Running inference ...
/pytorch/aten/src/ATen/native/cuda/TensorCompare.cu:110: _assert_async_cuda_kernel: block: [0,0,0], thread: [0,0,0] Assertion `probability tensor contains either `inf`, `nan` or element < 0` failed.
Traceback (most recent call last):
  File "/media/dell/NvmeDrive1/RoboBrain2.0/inference.py", line 227, in <module>
    pred = model.inference(prompt, image, task="pointing", plot=False, enable_thinking=False, do_sample=True)
  File "/media/dell/NvmeDrive1/RoboBrain2.0/inference.py", line 97, in inference
    generated_ids = self.model.generate(**inputs, max_new_tokens=768, do_sample=do_sample, temperature=temperature)
  File "/home/dell/miniconda3/envs/vlm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/dell/miniconda3/envs/vlm/lib/python3.10/site-packages/transformers/generation/utils.py", line 2465, in generate
    result = self._sample(
  File "/home/dell/miniconda3/envs/vlm/lib/python3.10/site-packages/transformers/generation/utils.py", line 3422, in _sample
    while self._has_unfinished_sequences(this_peer_finished, synced_gpus, device=input_ids.device):
  File "/home/dell/miniconda3/envs/vlm/lib/python3.10/site-packages/transformers/generation/utils.py", line 2616, in _has_unfinished_sequences
    elif this_peer_finished:
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Could you please look into this?
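
For what it's worth, here is a minimal debugging sketch based on the hints in the traceback. The `model` and `inputs` names below are placeholders for the objects that `inference.py` builds internally, not the repository's actual API:

```python
import os

# Surface the device-side assert at the failing kernel instead of a
# later API call, as the traceback suggests. This must be set before
# any CUDA context is created.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch

def debug_generate(model, inputs):
    # Greedy decoding bypasses torch.multinomial, the op that asserts
    # when the probability tensor contains `inf`, `nan`, or a value < 0.
    # If this succeeds, the sampled logits are likely overflowing.
    with torch.no_grad():
        return model.generate(**inputs, max_new_tokens=768, do_sample=False)
```

If greedy decoding runs cleanly, the failure is probably numerical: passing `torch_dtype=torch.bfloat16` (or `torch.float32`) to `from_pretrained` when loading the checkpoint is a common workaround for `inf`/`nan` logits under fp16.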
