[Cleanup] Refactor profiling env vars into a CLI config (#29912)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
docs/contributing/profiling.md (+10 −13 lines)
@@ -5,16 +5,15 @@

 ## Profile with PyTorch Profiler

-We support tracing vLLM workers using the `torch.profiler` module. You can enable tracing by setting the `VLLM_TORCH_PROFILER_DIR` environment variable to the directory where you want to save the traces: `VLLM_TORCH_PROFILER_DIR=/mnt/traces/`. Additionally, you can control the profiling content by specifying the following environment variables:
-
-- `VLLM_TORCH_PROFILER_RECORD_SHAPES=1` to enable recording Tensor Shapes, off by default
-- `VLLM_TORCH_PROFILER_WITH_PROFILE_MEMORY=1` to record memory, off by default
-- `VLLM_TORCH_PROFILER_WITH_STACK=1` to enable recording stack information, on by default
-- `VLLM_TORCH_PROFILER_WITH_FLOPS=1` to enable recording FLOPs, off by default
-- `VLLM_TORCH_PROFILER_USE_GZIP=0` to disable gzip-compressing profiling files, on by default
-- `VLLM_TORCH_PROFILER_DUMP_CUDA_TIME_TOTAL=0` to disable dumping and printing the aggregated CUDA self time table, on by default
-
-The OpenAI server also needs to be started with the `VLLM_TORCH_PROFILER_DIR` environment variable set.
+We support tracing vLLM workers using the `torch.profiler` module. You can enable the torch profiler by passing `--profiler-config` when launching the server, setting its entry `profiler` to `'torch'` and `torch_profiler_dir` to the directory where you want to save the traces. Additionally, you can control the profiling content by specifying the following additional arguments in the config:
+
+- `torch_profiler_record_shapes` to enable recording Tensor Shapes, off by default
+- `torch_profiler_with_memory` to record memory, off by default
+- `torch_profiler_with_stack` to enable recording stack information, on by default
+- `torch_profiler_with_flops` to enable recording FLOPs, off by default
+- `torch_profiler_use_gzip` to control gzip-compressing profiling files, on by default
+- `torch_profiler_dump_cuda_time_total` to control dumping and printing the aggregated CUDA self time table, on by default

 When using `vllm bench serve`, you can enable profiling by passing the `--profile` flag.
@@ -40,8 +39,7 @@ Refer to [examples/offline_inference/simple_profiling.py](../../examples/offline
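For illustration, the config-based flow introduced in this diff might be assembled as follows. This is a minimal sketch, assuming `--profiler-config` accepts a single JSON object whose keys match the entries listed above; the trace directory and enabled options are placeholder values, not part of the PR.

```python
import json

# Hypothetical profiler config; key names follow the entries described
# in the diff above, values are illustrative placeholders.
profiler_config = {
    "profiler": "torch",                   # select the torch profiler
    "torch_profiler_dir": "/mnt/traces/",  # where trace files are written
    "torch_profiler_record_shapes": True,  # off by default
    "torch_profiler_with_memory": True,    # off by default
}

# Serialize to a single JSON argument for the server CLI, e.g.:
#   vllm serve <model> --profiler-config "$CONFIG_JSON"
config_json = json.dumps(profiler_config)
print(config_json)
```

This replaces the old pattern of exporting several `VLLM_TORCH_PROFILER_*` environment variables with one structured argument at launch time.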