@DrXiao DrXiao commented Jan 18, 2026

The proposed changes aim to resolve #236 so that we can observe the performance of the bootstrapping process whenever shecc's source code changes.

Benchmarking script

First, a new Python script (tests/bench.py) is introduced to run the bootstrapping process and calculate the average execution time and maximum resident set size (max RSS). The usage of this new script is shown below:

usage: bench.py [-h] [--hostcc {cc,gcc,clang}] [--arch {arm,riscv}] [--dynlink] [--output-json OUTPUT_JSON] [--runs RUNS]

Run benchmarks for shecc

options:
  -h, --help            show this help message and exit
  --hostcc {cc,gcc,clang}
                        Host C Compiler (default: gcc)
  --arch {arm,riscv}    Target architecture (default: arm)
  --dynlink             Enable dynamic linking (default: static linking)
  --output-json OUTPUT_JSON
                        Output JSON file name (default: out/benchmark.json)
  --runs RUNS           Number of runs (default: 5)
$ ./tests/bench.py                      # default configuration          
$ ./tests/bench.py --hostcc clang       # Use clang as host C compiler
$ ./tests/bench.py --runs 10            # Repeat the bootstrapping for 10 runs
$ ./tests/bench.py --arch arm --dynlink # Arm + Dynamic linking
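
The core measurement loop can be sketched in Python roughly as follows. This is a minimal illustration, not the actual tests/bench.py implementation; the build command and the JSON field names here are assumptions:

```python
import resource
import subprocess
import time


def run_once(cmd):
    """Run one bootstrap build, returning (wall-clock seconds, max RSS)."""
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    elapsed = time.monotonic() - start
    # On Linux, ru_maxrss for waited-for children is reported in kilobytes
    # and reflects the peak across all children of this process.
    max_rss_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return elapsed, max_rss_kb


def bench(cmd, runs=5):
    """Repeat the build and aggregate results, similar in spirit to bench.py."""
    samples = [run_once(cmd) for _ in range(runs)]
    return {
        "runs": runs,  # hypothetical field names, not bench.py's real schema
        "avg_time_sec": sum(t for t, _ in samples) / runs,
        "max_rss_kb": max(r for _, r in samples),
    }
```

In the real script, `cmd` would be the bootstrap `make` invocation, and the resulting dictionary would be dumped to the file given by --output-json.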

Combined with make:

$ make bench                        # default configuration
$ make bench CC=clang               # Use clang as host C compiler
$ make bench ARCH=arm DYNLINK=1     # Arm + Dynamic linking
$ make bench BENCH_RUNS=10          # 10 runs
$ make bench \
  BENCH_OUTPUT_JSON=out.json        # Output JSON file name

$ make all-bench
$ make all-bench CC=clang

New workflow

This new workflow is still a work in progress. It will integrate the GitHub Action for Continuous Benchmarking, allowing us to view the statistics in a separate GitHub repository, similar to rv32emu.

The current workflow definition checks whether shecc's source code has changed. If so, the workflow continues with the subsequent steps, installing the dependencies and running the benchmarking script via make.

TODO:

  • Complete the workflow definitions and commit messages.
  • Add a new GitHub repository to store the benchmarking results (e.g., shecc-bench).
    • This may require the maintainer of sysprog21 to create the repository.
    • After the repository is created, another task will be initiated to add a webpage that visualizes the results.

Summary by cubic

Adds a benchmarking script and a CI workflow to measure shecc bootstrapping performance (average time and max RSS) and spot regressions when core code changes. Addresses #236.

  • New Features
    • Added tests/bench.py to run bootstrapping multiple times and output JSON. Supports --hostcc (cc/gcc/clang), --arch (arm/riscv), --dynlink, --runs, --output-json.
    • Added Makefile targets: bench and all-bench, with BENCH_RUNS and BENCH_OUTPUT_JSON for simple config across architectures and link modes.
    • Added GitHub Actions workflow that runs on push/PR/manual, skips merge commits, checks for core source changes, runs on ubuntu-24.04-arm, installs deps, and runs make bench with gcc (static and dynamic). Prints JSON for future continuous benchmarking.
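
For the planned continuous-benchmarking repository, a consumer of the emitted JSON might look like the following sketch. The field names are assumptions for illustration, not taken from bench.py's actual output schema:

```python
import json


def load_benchmark(path):
    """Read one benchmark result file, returning (avg seconds, max RSS in KiB)."""
    with open(path) as f:
        data = json.load(f)
    # Hypothetical field names; adjust to bench.py's real JSON schema.
    return data["avg_time_sec"], data["max_rss_kb"]
```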

Written for commit 96c3fd3.

jserv commented Jan 18, 2026

GitHub Actions provides standard GitHub-hosted runners for Ubuntu on Arm64. This makes it practical to evaluate Arm32 execution on Arm64 using QEMU user-mode emulation, which typically offers better performance than running Arm32 emulation on x86-64 hosts.

@DrXiao force-pushed the benchmark branch 3 times, most recently from 32c4e18 to 6e0b0ba on January 19, 2026 14:19
Since any source code changes may affect the performance of the
bootstrapping process, this commit adds a new Python script to measure
its performance.

The Python script provides the following arguments:
- Host C compiler (--hostcc)
- Target architecture (--arch)
- Enable dynamic linking (--dynlink)
- Number of runs (--runs)
- Output JSON file name (--output-json)

After executing the script, it repeatedly runs the bootstrapping process
for the compiler, calculates the average execution time and maximum
memory usage during the runs, and then stores the results in the
specified file while displaying them to the user via standard output.

Additionally, Makefile also adds two new targets:
- 'bench': run the benchmarking script using the given configuration.
  e.g.:
  $ make bench                        # default configuration
  $ make bench CC=clang               # Use clang as host C compiler
  $ make bench ARCH=arm DYNLINK=1     # Arm + Dynamic linking
  $ make bench BENCH_RUNS=10          # 10 runs
  $ make bench \
    BENCH_OUTPUT_JSON=out.json        # Output json filename

- 'all-bench': run the benchmarking script for the following
  configurations:
  - (ARCH, DYNLINK) = (arm, static)
  - (ARCH, DYNLINK) = (riscv, static)
  - (ARCH, DYNLINK) = (arm, dynamic)

  CC can be optionally determined by the user to use GCC or Clang.
  e.g.:
  $ make all-bench
  $ make all-bench CC=clang
@DrXiao force-pushed the benchmark branch 2 times, most recently from 744e3c7 to 9beb7b1 on January 19, 2026 14:50
DrXiao commented Jan 19, 2026

GitHub Actions provides standard GitHub-hosted runners for Ubuntu on Arm64.

The new workflow has been modified to use arm64 runners. Since no C source files were changed, the workflow does not execute the third and fourth steps.

Reference: result in my repository

DrXiao commented Jan 19, 2026

Because GitHub does not provide RISC-V runners, we can only measure the bootstrapping performance of shecc targeting the RISC-V architecture by running RISC-V emulation on x86-64 hosts.
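
Driving such emulation from Python can be sketched with QEMU's user-mode binary (qemu-riscv32, shipped in the qemu-user package); the binary path passed in is an assumption for illustration:

```python
import shutil
import subprocess


def run_riscv32(binary, args=()):
    """Execute a 32-bit RISC-V ELF on an x86-64 host via QEMU user-mode emulation."""
    qemu = shutil.which("qemu-riscv32")
    if qemu is None:
        raise RuntimeError("qemu-riscv32 not found; install the qemu-user package")
    return subprocess.run([qemu, binary, *args], check=True)
```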

DrXiao commented Jan 20, 2026

I created a new benchmark-test branch based on the benchmark branch to test all steps by removing the if conditions.
-> result in my repository

Run make bench CC=gcc
env printf "ARCH=arm" > .session.mk
==> config: (HOSTCC, ARCH, DYNLINK)=(gcc, arm, static)
==> runs: 5
==> output_json: out/benchmark-gcc-arm-static.json
Running (1/5)...
Running (2/5)...
Running (3/5)...
Running (4/5)...
Running (5/5)...

======================================================================
Benchmark results
Config     : (HOSTCC, ARCH, DYNLINK)=(gcc, arm, static)
Output file: out/benchmark-gcc-arm-static.json
======================================================================
    Average Execution Time        : 6.4354237546 second (5 runs)
    Maximum Resident Set Size     : 346512 KBytes (5 runs)
======================================================================
env printf "ARCH=arm" > .session.mk
==> config: (HOSTCC, ARCH, DYNLINK)=(gcc, arm, dynamic)
==> runs: 5
==> output_json: out/benchmark-gcc-arm-dynamic.json
Running (1/5)...
Running (2/5)...
Running (3/5)...
Running (4/5)...
Running (5/5)...

======================================================================
Benchmark results
Config     : (HOSTCC, ARCH, DYNLINK)=(gcc, arm, dynamic)
Output file: out/benchmark-gcc-arm-dynamic.json
======================================================================
    Average Execution Time        : 4.7024199772000035 second (5 runs)
    Maximum Resident Set Size     : 333264 KBytes (5 runs)
======================================================================

We can observe that the workflow uses Arm64 runners together with QEMU to execute bench.py successfully. The execution results are shown above.

jserv commented Jan 21, 2026

We can observe that the workflow uses Arm64 runners together with QEMU to execute bench.py successfully.

Would it be worthwhile to switch to Arm64-based runners? If so, submit a pull request to implement the transition.

Development

Successfully merging this pull request may close these issues.

Import a GitHub Action for benchmarking purposes

2 participants