
LLaDA2.0-mini torch error #28

@AIxyz

Description

python3 /code/dInfer/benchmarks/benchmark.py --model_name /mnt/tenant-home_speed/shared/models/huggingface/LLaDA2.0-mini--572899f-C8 --gen_len 2048 --block_length 32 --gpu 0 --parallel_decoding threshold --threshold 0.9 --cache prefix --use_bd

I'm running dInfer v0.2.0 on a single H800-80G to test LLaDA2.0-mini, but it fails with the error below. What could be the reason? @zheng-da
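If it helps diagnose, the traceback below itself suggests rerunning with verbose Dynamo logging to capture the internal stack trace. A sketch of such a rerun with the same arguments (TORCHDYNAMO_VERBOSE and TORCH_LOGS are standard PyTorch env vars; nothing dInfer-specific is assumed):

# Rerun with the verbose flags the error message mentions, to get the full Inductor stack trace.
TORCHDYNAMO_VERBOSE=1 TORCH_LOGS="+dynamo" python3 /code/dInfer/benchmarks/benchmark.py \
  --model_name /mnt/tenant-home_speed/shared/models/huggingface/LLaDA2.0-mini--572899f-C8 \
  --gen_len 2048 --block_length 32 --gpu 0 \
  --parallel_decoding threshold --threshold 0.9 --cache prefix --use_bd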

/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml  # type: ignore[import]
INFO 12-21 23:08:34 [__init__.py:216] Automatically detected platform cuda.                                                                                                                   
The input args are listed as follows: Namespace(model_name='/mnt/tenant-home_speed/shared/models/huggingface/LLaDA2.0-mini--572899f-C8', gpu='0', gen_len=2048, prefix_look=0, after_look=0, block_length=32, threshold=0.9, warmup_times=0, low_threshold=0.3, cont_weight=0, parallel_decoding='threshold', use_credit=False, exp_name='exp', cache='prefix', use_tp=False, use_shift=False, use_bd=True, model_type='llada2')
started 1 0 0 Namespace(model_name='/mnt/tenant-home_speed/shared/models/huggingface/LLaDA2.0-mini--572899f-C8', gpu='0', gen_len=2048, prefix_look=0, after_look=0, block_length=32, threshold=0.9, warmup_times=0, low_threshold=0.3, cont_weight=0, parallel_decoding='threshold', use_credit=False, exp_name='exp', cache='prefix', use_tp=False, use_shift=False, use_bd=True, model_type='llada2', tp_size=1, port_offset=0)
WARNING 12-21 23:08:36 [__init__.py:3804] Current vLLM config is not set.                                                                                                                     
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0                                                                                                    
WARNING 12-21 23:08:36 [__init__.py:3804] Current vLLM config is not set.                                                                                                                     
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0                                                                                                    
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0                                                                                                    
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0                                                                                                    
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0                                                                                                    
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0                                                                                                    
INFO 12-21 23:08:36 [parallel_state.py:1165] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0                                                                 
[Loading model]                                                                                                                                                                               
EP Enabled: True                                                                                                                                                                              
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 38.08it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14813/14813 [00:00<00:00, 1478270.36it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:13<00:00,  1.45it/s]
unused_keys []                                                                                                                                                                                
not_inited_keys []                                                                                                                                                                            
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 259/259 [00:00<00:00, 348.66it/s]
/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py:282: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
  warnings.warn(                                                                                                                                                                              
[rank0]: Traceback (most recent call last):                                                                                                                                                   
[rank0]:   File "/code/dInfer/benchmarks/benchmark.py", line 209, in <module>                                                                                                                 
[rank0]:     main(1, 0, gpus[0], args)                                                                                                                                                        
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context                                                                           
[rank0]:     return func(*args, **kwargs)                                                                                                                                                     
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                     
[rank0]:   File "/code/dInfer/benchmarks/benchmark.py", line 135, in main                                                                                                                     
[rank0]:     dllm.generate(input_ids, gen_length=gen_length, block_length=block_length)                                                                                                       
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
[rank0]:     return func(*args, **kwargs)                                                                                                                                                     
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                     
[rank0]:   File "/code/dInfer/python/dinfer/decoding/generate_uniform.py", line 1074, in naive_batching_generate                                                                              
[rank0]:     self.block_runner.prefill(self.model, x[:, :prefill_length], kv_cache, pos_ids[:, :prefill_length], bd_attn_mask[:,:prefill_length,:prefill_length], self.prefilling_limit, block_length)
[rank0]:   File "/code/dInfer/python/dinfer/decoding/generate_uniform.py", line 168, in prefill                                                                                               
[rank0]:     output = model(prefilling_x.clone(memory_format=torch.contiguous_format), use_cache=True, attention_mask=attn_mask, position_ids=pos_ids.clone(memory_format=torch.contiguous_format))
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl                                                                        
[rank0]:     return self._call_impl(*args, **kwargs)                                                                                                                                          
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                          
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl                                                                                
[rank0]:     return forward_call(*args, **kwargs)                                                                                                                                             
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                             
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 749, in compile_wrapper                                                                           
[rank0]:     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1                                                                                                            
[rank0]:     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                         
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 923, in _compile_fx_inner
[rank0]:     raise InductorError(e, currentframe()).with_traceback(
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 907, in _compile_fx_inner
[rank0]:     mb_compiled_graph = fx_codegen_and_compile(
[rank0]:                         ^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1578, in fx_codegen_and_compile
[rank0]:     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1236, in codegen_and_compile
[rank0]:     _recursive_post_grad_passes(gm, is_inference=is_inference)
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 504, in _recursive_post_grad_passes
[rank0]:     post_grad_passes(gm, is_inference) 
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 208, in post_grad_passes
[rank0]:     GraphTransformObserver(gm, "decompose_auto_functionalized").apply_graph_pass(
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/fx/passes/graph_transform_observer.py", line 85, in apply_graph_pass
[rank0]:     return pass_fn(self.gm.graph)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 1232, in decompose_auto_functionalized
[rank0]:     graph_pass.apply(graph)
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/pattern_matcher.py", line 1963, in apply
[rank0]:     entry.apply(m, graph, node)
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/pattern_matcher.py", line 1115, in apply
[rank0]:     self.handler(match, *match.args, **match.kwargs)
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 1230, in _
[rank0]:     match.replace_by_example(decomp, flat_args, run_functional_passes=False)
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/pattern_matcher.py", line 309, in replace_by_example
[rank0]:     assert len(graph_with_eager_vals.graph.nodes) == len(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: torch._inductor.exc.InductorError: AssertionError: 

[rank0]: Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"

[rank0]:[W1221 23:09:28.578579446 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
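Since the assertion is raised inside Inductor's decompose_auto_functionalized post-grad pass during torch.compile, one way to check whether the model itself is fine is to bypass compilation and run in eager mode. A sketch, assuming the benchmark honors the standard PyTorch kill switch (TORCHDYNAMO_DISABLE=1 is a documented torch._dynamo toggle; whether dInfer exposes its own compile flag is an assumption):

# Bypass TorchDynamo/Inductor entirely so the forward pass runs in eager mode.
TORCHDYNAMO_DISABLE=1 python3 /code/dInfer/benchmarks/benchmark.py \
  --model_name /mnt/tenant-home_speed/shared/models/huggingface/LLaDA2.0-mini--572899f-C8 \
  --gen_len 2048 --block_length 32 --gpu 0 \
  --parallel_decoding threshold --threshold 0.9 --cache prefix --use_bd

If the eager run succeeds, that would point at an Inductor pattern-matcher bug (or a torch version mismatch) rather than at the model or the dInfer decoding logic.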
