I use dInfer v0.2.0 on 1×H800-80G to test LLaDA2.0-mini with the command below, but it fails with the following error. What could be the reason? @zheng-da

python3 /code/dInfer/benchmarks/benchmark.py --model_name /mnt/tenant-home_speed/shared/models/huggingface/LLaDA2.0-mini--572899f-C8 --gen_len 2048 --block_length 32 --gpu 0 --parallel_decoding threshold --threshold 0.9 --cache prefix --use_bd
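The tail of the traceback recommends re-running with verbose Dynamo logging; for reference, that would be the same command with two logging environment variables added (both are standard PyTorch switches, quoted directly from the error message):

```sh
# Same benchmark run, with the verbose-logging variables the error message
# itself suggests, to capture the internal stack trace behind the InductorError.
TORCHDYNAMO_VERBOSE=1 TORCH_LOGS="+dynamo" \
python3 /code/dInfer/benchmarks/benchmark.py \
  --model_name /mnt/tenant-home_speed/shared/models/huggingface/LLaDA2.0-mini--572899f-C8 \
  --gen_len 2048 --block_length 32 --gpu 0 \
  --parallel_decoding threshold --threshold 0.9 --cache prefix --use_bd
```

The full log: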
/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
import pynvml # type: ignore[import]
INFO 12-21 23:08:34 [__init__.py:216] Automatically detected platform cuda.
The input args are listed as follows: Namespace(model_name='/mnt/tenant-home_speed/shared/models/huggingface/LLaDA2.0-mini--572899f-C8', gpu='0', gen_len=2048, prefix_look=0, after_look=0, block_length=32, threshold=0.9, warmup_times=0, low_threshold=0.3, cont_weight=0, parallel_decoding='threshold', use_credit=False, exp_name='exp', cache='prefix', use_tp=False, use_shift=False, use_bd=True, model_type='llada2')
started 1 0 0 Namespace(model_name='/mnt/tenant-home_speed/shared/models/huggingface/LLaDA2.0-mini--572899f-C8', gpu='0', gen_len=2048, prefix_look=0, after_look=0, block_length=32, threshold=0.9, warmup_times=0, low_threshold=0.3, cont_weight=0, parallel_decoding='threshold', use_credit=False, exp_name='exp', cache='prefix', use_tp=False, use_shift=False, use_bd=True, model_type='llada2', tp_size=1, port_offset=0)
WARNING 12-21 23:08:36 [__init__.py:3804] Current vLLM config is not set.
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
WARNING 12-21 23:08:36 [__init__.py:3804] Current vLLM config is not set.
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
INFO 12-21 23:08:36 [parallel_state.py:1165] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
[Loading model]
EP Enabled: True
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 38.08it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14813/14813 [00:00<00:00, 1478270.36it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:13<00:00, 1.45it/s]
unused_keys []
not_inited_keys []
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 259/259 [00:00<00:00, 348.66it/s]
/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py:282: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
warnings.warn(
[rank0]: Traceback (most recent call last):
[rank0]: File "/code/dInfer/benchmarks/benchmark.py", line 209, in <module>
[rank0]: main(1, 0, gpus[0], args)
[rank0]: File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/code/dInfer/benchmarks/benchmark.py", line 135, in main
[rank0]: dllm.generate(input_ids, gen_length=gen_length, block_length=block_length)
[rank0]: File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/code/dInfer/python/dinfer/decoding/generate_uniform.py", line 1074, in naive_batching_generate
[rank0]: self.block_runner.prefill(self.model, x[:, :prefill_length], kv_cache, pos_ids[:, :prefill_length], bd_attn_mask[:,:prefill_length,:prefill_length], self.prefilling_limit, block_length)
[rank0]: File "/code/dInfer/python/dinfer/decoding/generate_uniform.py", line 168, in prefill
[rank0]: output = model(prefilling_x.clone(memory_format=torch.contiguous_format), use_cache=True, attention_mask=attn_mask, position_ids=pos_ids.clone(memory_format=torch.contiguous_format))
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 749, in compile_wrapper
[rank0]: raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 923, in _compile_fx_inner
[rank0]: raise InductorError(e, currentframe()).with_traceback(
[rank0]: File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 907, in _compile_fx_inner
[rank0]: mb_compiled_graph = fx_codegen_and_compile(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1578, in fx_codegen_and_compile
[rank0]: return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1236, in codegen_and_compile
[rank0]: _recursive_post_grad_passes(gm, is_inference=is_inference)
[rank0]: File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 504, in _recursive_post_grad_passes
[rank0]: post_grad_passes(gm, is_inference)
[rank0]: File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 208, in post_grad_passes
[rank0]: GraphTransformObserver(gm, "decompose_auto_functionalized").apply_graph_pass(
[rank0]: File "/usr/local/lib/python3.12/dist-packages/torch/fx/passes/graph_transform_observer.py", line 85, in apply_graph_pass
[rank0]: return pass_fn(self.gm.graph)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 1232, in decompose_auto_functionalized
[rank0]: graph_pass.apply(graph)
[rank0]: File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/pattern_matcher.py", line 1963, in apply
[rank0]: entry.apply(m, graph, node)
[rank0]: File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/pattern_matcher.py", line 1115, in apply
[rank0]: self.handler(match, *match.args, **match.kwargs)
[rank0]: File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 1230, in _
[rank0]: match.replace_by_example(decomp, flat_args, run_functional_passes=False)
[rank0]: File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/pattern_matcher.py", line 309, in replace_by_example
[rank0]: assert len(graph_with_eager_vals.graph.nodes) == len(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: torch._inductor.exc.InductorError: AssertionError:
[rank0]: Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
[rank0]:[W1221 23:09:28.578579446 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
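For context: the assertion fires inside Inductor's decompose_auto_functionalized post-grad pass (pattern_matcher.replace_by_example) while torch.compile is compiling the prefill forward, so it looks like a torch/Inductor compilation failure rather than a problem with the checkpoint itself. (The earlier TF32 UserWarning is unrelated to the crash; it only suggests torch.set_float32_matmul_precision('high') for speed.) A hedged triage sketch, not a fix: PyTorch's standard TORCHDYNAMO_DISABLE switch bypasses torch.compile entirely, which would show whether the same run completes in eager mode and thus isolate the failure to the Inductor pass:

```sh
# Hedged triage, not a fix: TORCHDYNAMO_DISABLE=1 turns off TorchDynamo/Inductor
# so the model runs eagerly. If this completes, the failure is isolated to
# Inductor's post-grad pass (likely a torch-version interaction).
TORCHDYNAMO_DISABLE=1 \
python3 /code/dInfer/benchmarks/benchmark.py \
  --model_name /mnt/tenant-home_speed/shared/models/huggingface/LLaDA2.0-mini--572899f-C8 \
  --gen_len 2048 --block_length 32 --gpu 0 \
  --parallel_decoding threshold --threshold 0.9 --cache prefix --use_bd
```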