ggml rope v2 demo #59
base: master
Conversation
Note: Reviews paused.
Walkthrough

This PR adds a composable RoPE (Rotary Position Embedding) operation to GGML, introducing a new `GGML_OP_ROPE_COMP` op.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~30 minutes
Pre-merge checks and finishing touches: ❌ Failed checks (2 warnings, 1 inconclusive)
@coderabbitai not working, something is wrong here. find the mistake
@coderabbitai these are unrelated errors. focus on the main change, the rope algorithm
Tip: For best results, initiate chat on the files or code changes.

@ngxson I found the bug in the RoPE algorithm! 🎯

Location: `ggml_rope_comp_set_yarn` in `ggml/src/ggml.c`.

The problem: the variable names `yarn_high` and `yarn_low` are assigned the wrong correction dimensions. The bug is in `ggml_rope_comp_set_yarn`:

```c
// WRONG - variable names are swapped!
float yarn_high = floorf(ggml_rope_yarn_corr_dim(n_dims, n_ctx_orig, beta_fast, freq_base));
float yarn_low  = ceilf(ggml_rope_yarn_corr_dim(n_dims, n_ctx_orig, beta_slow, freq_base));
```

With typical values (beta_fast=32.0, beta_slow=1.0), `ggml_rope_yarn_corr_dim` returns a smaller correction dimension for `beta_fast` than for `beta_slow`. But the code assigns them backwards! The variable named `yarn_high` ends up holding the low bound, and `yarn_low` the high bound.

The fix:

```c
// CORRECT - swap the variable names
float yarn_low  = floorf(ggml_rope_yarn_corr_dim(n_dims, n_ctx_orig, beta_fast, freq_base));
float yarn_high = ceilf(ggml_rope_yarn_corr_dim(n_dims, n_ctx_orig, beta_slow, freq_base));
```

This matches the original `ggml_rope_yarn_corr_dims` helper. With the names swapped, the YaRN ramp region is inverted and the intended scaling is effectively disabled.
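For reference, here is a minimal standalone check of the numbers behind that claim. The `corr_dim` formula below mirrors `ggml_rope_yarn_corr_dim` from ggml.c; the `n_dims`, `n_ctx_orig`, and `freq_base` values are illustrative, not taken from a specific model.

```c
// Standalone sketch: corr_dim(beta_fast) is the *smaller* dimension, so it
// belongs in yarn_low; corr_dim(beta_slow) is the larger one (yarn_high).
#include <math.h>
#include <stdio.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

static float corr_dim(int n_dims, int n_ctx_orig, float n_rot, float base) {
    return n_dims * logf(n_ctx_orig / (n_rot * 2 * (float) M_PI)) / (2 * logf(base));
}

int main(void) {
    const int   n_dims     = 128;
    const int   n_ctx_orig = 4096;
    const float freq_base  = 10000.0f;

    const float from_beta_fast = floorf(corr_dim(n_dims, n_ctx_orig, 32.0f, freq_base)); // beta_fast
    const float from_beta_slow = ceilf (corr_dim(n_dims, n_ctx_orig,  1.0f, freq_base)); // beta_slow

    printf("corr_dim(beta_fast) = %.0f (low bound)\n",  from_beta_fast);  // ~20
    printf("corr_dim(beta_slow) = %.0f (high bound)\n", from_beta_slow);  // ~46
    return 0;
}
```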
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
ggml/src/ggml.c (2)
993-1055: Add missing `GGML_OP_SYMBOL` entry for `GGML_OP_ROPE_COMP` to avoid null symbol lookups

You've added `"ROPE_COMP"` to `GGML_OP_NAME` and bumped `GGML_OP_COUNT` to 96, but `GGML_OP_SYMBOL` still only has `"rope(x)"` and `"rope_back(x)"`. For `GGML_OP_ROPE_COMP` this means `ggml_op_symbol(op)` will return a null pointer, which is later passed to `%s` formatters (e.g. in `ggml_graph_print` / DOT dumping), risking UB or crashes.

Recommend adding a symbol in the corresponding slot:

```diff
-    "soft_max_back(x)",
-    "rope(x)",
-    "rope_back(x)",
+    "soft_max_back(x)",
+    "rope(x)",
+    "rope_comp(x)", // GGML_OP_ROPE_COMP
+    "rope_back(x)",
```

and keeping the static_assert at 96 as you already did.

6913-6941: `GGML_OP_ROPE_COMP` backward pass is not implemented and will abort during training

`ggml_build_backward_expand` treats `GGML_OP_ROPE_COMP` identically to `GGML_OP_ROPE` in the `ignore_src[1]` section (lines 6933-6934), but `ggml_compute_backward` has a `case GGML_OP_ROPE:` (line 6663) with gradient computation while `GGML_OP_ROPE_COMP` has no corresponding case.

If a training graph includes a `GGML_OP_ROPE_COMP` node whose output influences a loss, the backward pass will reach that node and hit the default case, aborting with: `"unsupported ggml op for backward pass: GGML_OP_ROPE_COMP"`.

Either:
- Add a `case GGML_OP_ROPE_COMP:` implementation to `ggml_compute_backward` (mirroring `GGML_OP_ROPE` if the computation is identical), or
- Exclude `GGML_OP_ROPE_COMP` from gradient computation in `ggml_build_backward_expand` and document it as inference-only.
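A minimal stand-in sketch of the first option, showing only the dispatch shape. The enum, handler, and dispatch function here are illustrative placeholders, not ggml's actual backward-pass signature; the point is simply that the new op can fall through to the same gradient path if the forward math is identical.

```c
// Stand-in sketch: let ROPE_COMP share the ROPE gradient handler.
#include <stdio.h>

enum op { OP_ROPE, OP_ROPE_COMP, OP_OTHER };

static void rope_backward(void) { printf("accumulate RoPE gradient\n"); }

static void compute_backward(enum op o) {
    switch (o) {
        case OP_ROPE:
        case OP_ROPE_COMP:   // fall through: same gradient as regular RoPE (if forward math matches)
            rope_backward();
            break;
        default:
            printf("unsupported op for backward pass\n");
            break;
    }
}

int main(void) {
    compute_backward(OP_ROPE_COMP);   // no longer hits the unsupported branch
    return 0;
}
```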
🧹 Nitpick comments (3)
ggml/include/ggml.h (1)
1862-1896: New RoPE composable API is coherent; consider minor comment/API polish

The new `ggml_rope_ordering` enum and the trio `ggml_rope_comp` / `ggml_rope_comp_set_yarn` / `ggml_rope_comp_set_multi` form a clear, composable RoPE configuration surface and match the implementations described in `ggml.c`. Function naming and parameter ordering are consistent with existing RoPE helpers.

Two minor polish points you may want to address (non-blocking):
- The comment `// demo new RoPE API (NOT yet to be merged)` will be misleading once this PR is merged. Consider rewording to something like `// experimental RoPE composable API` or removing it.
- `ggml_rope_comp` uses `int32_t n_dims` while the older RoPE APIs nearby use plain `int` for `n_dims`. It's not wrong, but if you care about API surface consistency you could standardize on one type across the RoPE functions in a follow-up.

If you'd like, I can propose a small follow-up diff that harmonizes the RoPE comments and `n_dims` types across these declarations.
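As a usage illustration of the composable flow, a minimal sketch assuming the declarations from this PR's `ggml.h`. The `ggml_rope_comp` call follows the signature quoted elsewhere in this review; the `ggml_rope_comp_set_yarn` argument list is only hinted at in a comment because its exact order is not reproduced here. Tensor shapes and hyperparameter values are illustrative only.

```c
// Minimal sketch of the intended composable usage (assumes this PR's ggml.h).
#include "ggml.h"

struct ggml_tensor * build_rope_comp(struct ggml_context * ctx,
                                     struct ggml_tensor  * cur,   // e.g. [n_embd_head, n_head, n_tokens]
                                     struct ggml_tensor  * pos) { // F32 positions, length a multiple of cur->ne[2]
    const int32_t n_dims    = 128;
    const float   freq_base = 10000.0f;

    // base rotation with NEOX ordering
    struct ggml_tensor * out = ggml_rope_comp(ctx, cur, pos, n_dims, freq_base,
                                              GGML_ROPE_ORDERING_NEOX);

    // YaRN (and M-RoPE / freq factors) are then layered onto the same node, e.g.:
    //   ggml_rope_comp_set_yarn(out, /* n_ctx_orig, freq_scale, attn_factor,
    //                                   beta_fast, beta_slow, ramp_factor */ ...);
    // (exact argument order per the ggml.h declaration in this PR)

    return out;
}
```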
ggml/src/ggml-cpu/ops.cpp (1)
5859-5863: Thread partitioning and `idx_offset` leave early heads untouched and reduce parallelism

Two related points around `idx_offset`:

- You derive `nr` from `ggml_nrows(dst)` (i.e. `ne1 * ne2 * ne3`), but the work loop only iterates heads in `[idx_offset, ne1)`. Effective rows are `nr_eff = (ne1 - idx_offset) * ne2 * ne3`. Using `nr` for `dr` / `ir0` / `ir1` means that when `idx_offset` is non-zero, some threads will get no work even though others still have rows to process. This is correct but wastes parallelism for partial-head use.
- Because both the rotation and the "copy remaining channels" loop are nested under `for (int64_t i1 = idx_offset; i1 < ne1; i1++)`, heads with index `< idx_offset` are never written in `dst` at all. That's fine if the current API never sets `idx_offset > 0` (or guarantees that those heads are filled elsewhere), but if/when you start using non-zero `idx_offset` from `ggml_rope_comp_set_multi`, those heads will contain stale/undefined data.

If you plan to actually use `idx_offset > 0`, it would be safer to:
- Assert `0 <= idx_offset && idx_offset <= ne1`, and
- Either:
  - base `nr` on `(ne1 - idx_offset)` for proper thread partitioning, and/or
  - explicitly copy `src0` → `dst` for heads `< idx_offset` before the main loop.

Also applies to: 5925-5943
ggml/src/ggml.c (1)
4271-4349: Tighten ggml_rope_comp param invariants and reuse existing YARN helpers

The layout and use of `op_params` between `ggml_rope_comp` and `ggml_rope_comp_set_yarn` is internally consistent, but a couple of small hardening tweaks would reduce footguns:

- `n_dims` is used as a divisor (`-2.0f / n_dims`) and, for `GGML_ROPE_ORDERING_NEOX`, as `n_dims / 2`, so consider guarding it:

```diff
 GGML_API struct ggml_tensor * ggml_rope_comp(
         struct ggml_context * ctx,
         struct ggml_tensor  * a,
         struct ggml_tensor  * b,
         int32_t               n_dims,
         float                 freq_base,
         enum ggml_rope_ordering ordering) {
-    GGML_ASSERT(ggml_is_vector(b));
+    GGML_ASSERT(n_dims > 0);
+    GGML_ASSERT(ggml_is_vector(b));
@@
-    if (ordering == GGML_ROPE_ORDERING_NEOX) {
+    if (ordering == GGML_ROPE_ORDERING_NEOX) {
+        GGML_ASSERT((n_dims % 2) == 0);
         idx_pair  = n_dims / 2;
         idx_scale = 2;
     }
```

- `ggml_rope_comp_set_yarn` duplicates and slightly diverges from `ggml_rope_yarn_corr_dims` (no clamping to `[0, n_dims - 1]`). If you want behavior consistent with existing YARN RoPE, you can rely on that helper:

```diff
-    float yarn_high = floorf(ggml_rope_yarn_corr_dim(n_dims, n_ctx_orig, beta_fast, freq_base));
-    float yarn_low  = ceilf(ggml_rope_yarn_corr_dim(n_dims, n_ctx_orig, beta_slow, freq_base));
+    float dims[2];
+    ggml_rope_yarn_corr_dims(n_dims, n_ctx_orig, freq_base, beta_fast, beta_slow, dims);
+    float yarn_high = dims[0];
+    float yarn_low  = dims[1];
```

This keeps all RoPE+YARN variants using the same correction-dimension logic while still defaulting YARN off (all zeros) in the initial `ggml_rope_comp` call. Given your preference to keep things simple and defer more exhaustive error handling, feel free to treat the additional asserts as optional if you don't expect adversarial inputs. Based on learnings.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- ggml/include/ggml.h (2 hunks)
- ggml/src/ggml-cpu/ggml-cpu.c (3 hunks)
- ggml/src/ggml-cpu/ops.cpp (1 hunks)
- ggml/src/ggml-cpu/ops.h (1 hunks)
- ggml/src/ggml.c (5 hunks)
- src/models/mistral3.cpp (2 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{cpp,h,hpp}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{cpp,h,hpp}: Always format C++ code using `git clang-format` before committing, following .clang-format configuration (4-space indentation, 120 column limit, braces on same line for functions, pointer/reference alignment in middle)
Minimize external dependencies; avoid adding new external dependencies unless absolutely necessary
Files:
- ggml/src/ggml-cpu/ops.h
- src/models/mistral3.cpp
- ggml/include/ggml.h
- ggml/src/ggml-cpu/ops.cpp
**/*.{cpp,h,hpp,py}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Ensure cross-platform compatibility by testing code changes on Linux, macOS, and Windows when possible
Files:
- ggml/src/ggml-cpu/ops.h
- src/models/mistral3.cpp
- ggml/include/ggml.h
- ggml/src/ggml-cpu/ops.cpp
src/**/*.cpp
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Prioritize performance optimization in core library implementations in
src/, as this is a performance-critical inference library
Files:
src/models/mistral3.cpp
🧠 Learnings (2)
📓 Common learnings
Learnt from: ngxson
Repo: ngxson/llama.cpp PR: 14
File: examples/lookahead/lookahead.cpp:97-98
Timestamp: 2025-03-13T22:56:47.417Z
Learning: ngxson prefers to prioritize simplification in some cases and defer adding error handling to a later time, as indicated by their response to a suggestion about adding error checks for llama_decode_ext calls.
📚 Learning: 2025-11-29T22:55:53.865Z
Learnt from: CR
Repo: ngxson/llama.cpp PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-29T22:55:53.865Z
Learning: Applies to src/llama.cpp : Maintain core library implementation in `src/llama.cpp` with careful attention to API contracts and backward compatibility
Applied to files:
ggml/src/ggml-cpu/ops.cpp
🧬 Code graph analysis (6)
ggml/src/ggml-cpu/ops.h (1)
ggml/src/ggml-cpu/ops.cpp (2)
- ggml_compute_forward_rope_comp (5948-5968)
- ggml_compute_forward_rope_comp (5948-5950)
ggml/src/ggml-cpu/ggml-cpu.c (1)
ggml/src/ggml-cpu/ops.cpp (2)
- ggml_compute_forward_rope_comp (5948-5968)
- ggml_compute_forward_rope_comp (5948-5950)
src/models/mistral3.cpp (1)
ggml/src/ggml.c (2)
- ggml_rope_comp (4271-4323)
- ggml_rope_comp_set_yarn (4325-4349)
ggml/src/ggml.c (1)
ggml/src/ggml-impl.h (1)
- ggml_set_op_params (145-149)
ggml/include/ggml.h (1)
ggml/src/ggml.c (2)
- ggml_rope_comp (4271-4323)
- ggml_rope_comp_set_yarn (4325-4349)
ggml/src/ggml-cpu/ops.cpp (2)
ggml/src/ggml.c (1)
- ggml_nrows (1231-1235)

ggml/src/ggml-sycl/rope.cpp (2)
- rope_yarn_ramp (13-16)
- rope_yarn_ramp (13-13)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: windows-latest-cmake-hip
- GitHub Check: windows-2022-cmake-cuda (12.4)
- GitHub Check: windows-latest-cmake (llvm-arm64, arm64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/ar...
- GitHub Check: windows-latest-cmake-sycl
- GitHub Check: macOS-latest-cmake-tvos
- GitHub Check: macOS-latest-cmake-ios
- GitHub Check: ubuntu-24-cmake-vulkan-deb
- GitHub Check: ubuntu-22-cmake-hip
- GitHub Check: server-windows
🔇 Additional comments (6)
ggml/src/ggml-cpu/ops.h (1)
63-65: Forward prototype for `ROPE_COMP` looks consistent

The new `ggml_compute_forward_rope_comp` declaration matches the implementation in `ggml/src/ggml-cpu/ops.cpp` and is placed correctly alongside the other RoPE forward ops. No issues from an ABI or style perspective.

ggml/include/ggml.h (1)

516-519: `GGML_OP_ROPE_COMP` placement in `ggml_op` enum is reasonable

Adding `GGML_OP_ROPE_COMP` immediately after `GGML_OP_ROPE` keeps RoPE-related ops grouped and is consistent with how other op families are organized. As long as the corresponding op-name/symbol lookup tables and any serialization that depends on enum order are updated (which appears to be done in ggml.c per the PR summary), this looks fine.

You may want to double-check that any on-disk or inter-process formats that encode `enum ggml_op` by ordinal (if any) are not relying on the previous numeric values of entries after `GGML_OP_ROPE`. If they do, bumping a format version or using symbolic encoding might be safer.

src/models/mistral3.cpp (1)

41-41: LGTM! Rope factors retrieval commented out as part of API migration.

The migration from `ggml_rope_ext` to `ggml_rope_comp` removes the need for explicit rope_factors retrieval. The new API handles rope configuration through `ggml_rope_comp_set_yarn`.

ggml/src/ggml-cpu/ggml-cpu.c (3)

1866-1869: LGTM! Forward routing for ROPE_COMP correctly integrated.

The case statement properly routes `GGML_OP_ROPE_COMP` to `ggml_compute_forward_rope_comp`, following the established pattern for operation dispatch.

2301-2301: LGTM! Task scheduling correctly treats ROPE_COMP as parallelizable.

The operation is properly grouped with `GGML_OP_ROPE` and `GGML_OP_ROPE_BACK`, allowing multi-threaded execution with `n_tasks = n_threads`.

2820-2820: LGTM! Work size estimation consistent with ROPE operations.

The work buffer allocation for `GGML_OP_ROPE_COMP` correctly mirrors the pattern used by `GGML_OP_ROPE` and `GGML_OP_ROPE_BACK`, allocating `F32` temporary storage sized by `node->ne[0] * n_tasks`.
Actionable comments posted: 0
♻️ Duplicate comments (1)
ggml/src/ggml-cpu/ops.cpp (1)
5893-5905: Yarn ramp still called with swapped (high, low) arguments

`rope_yarn_ramp` is declared as `rope_yarn_ramp(const float low, const float high, const int i0)` and other call sites pass `(low, high, i0)`. Here `init_cache` does:

```c
float ramp_mix = rope_yarn_ramp(yarn_high, yarn_low, i0) * ramp_factor;
```

which inverts low/high and breaks the intended ramp region, effectively disabling correct YaRN scaling for `GGML_OP_ROPE_COMP`. This is the same issue previously flagged in an earlier review and is still present.

Change the call to pass low first, high second:

```diff
-    float ramp_mix = rope_yarn_ramp(yarn_high, yarn_low, i0) * ramp_factor;
+    float ramp_mix = rope_yarn_ramp(yarn_low, yarn_high, i0) * ramp_factor;
```
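To see concretely what the swapped call does, here is a standalone sketch. The ramp body mirrors the `rope_yarn_ramp` helper referenced above (as found in the CPU/SYCL backends at the time of writing); the low/high values are the typical ones discussed earlier in this thread.

```c
// Swapping (low, high) degenerates the YaRN ramp into a step function.
#include <stdio.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define MAX(a, b) ((a) > (b) ? (a) : (b))

static float rope_yarn_ramp(const float low, const float high, const int i0) {
    const float y = (i0 / 2 - low) / MAX(0.001f, high - low);
    return 1.0f - MIN(1.0f, MAX(0.0f, y));
}

int main(void) {
    const float yarn_low = 20.0f, yarn_high = 46.0f;
    for (int i0 = 0; i0 < 128; i0 += 32) {
        printf("i0=%3d  ramp(low, high)=%.3f  ramp(high, low)=%.3f\n",
               i0,
               rope_yarn_ramp(yarn_low,  yarn_high, i0),   // intended call: smooth ramp over [low, high]
               rope_yarn_ramp(yarn_high, yarn_low,  i0));  // swapped: collapses to a hard step
    }
    return 0;
}
```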
🧹 Nitpick comments (1)
ggml/src/ggml-cpu/ops.cpp (1)
5835-5852: `sections` parameters are parsed but currently unused (pending M-RoPE support)

`sections[0..3]` are copied out of `dst->op_params` but never used, and there's only a `// TODO M-RoPE` comment. This is harmless but will likely trigger an unused-variable warning.

If M-RoPE support isn't wired in yet, consider either:
- Removing `sections` and its `memcpy` lines for now, or
- Marking them explicitly unused (e.g., `GGML_UNUSED(sections);`) until the multi-rope logic is implemented.
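A tiny stand-alone illustration of the second option. The macro here is a local stand-in for ggml's `GGML_UNUSED`, which, at the time of writing, is a simple `(void)` cast.

```c
// Silence -Wunused-variable until the M-RoPE logic consumes `sections`.
#define MY_UNUSED(x) (void)(x)   // stand-in for GGML_UNUSED

int main(void) {
    int sections[4] = {0, 0, 0, 0};   // parsed from op_params but not consumed yet (TODO M-RoPE)
    MY_UNUSED(sections);
    return 0;
}
```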
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- ggml/src/ggml-cpu/ops.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{cpp,h,hpp}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{cpp,h,hpp}: Always format C++ code using `git clang-format` before committing, following .clang-format configuration (4-space indentation, 120 column limit, braces on same line for functions, pointer/reference alignment in middle)
Minimize external dependencies; avoid adding new external dependencies unless absolutely necessary
Files:
ggml/src/ggml-cpu/ops.cpp
**/*.{cpp,h,hpp,py}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Ensure cross-platform compatibility by testing code changes on Linux, macOS, and Windows when possible
Files:
ggml/src/ggml-cpu/ops.cpp
🧠 Learnings (2)
📓 Common learnings
Learnt from: ngxson
Repo: ngxson/llama.cpp PR: 14
File: examples/lookahead/lookahead.cpp:97-98
Timestamp: 2025-03-13T22:56:47.417Z
Learning: ngxson prefers to prioritize simplification in some cases and defer adding error handling to a later time, as indicated by their response to a suggestion about adding error checks for llama_decode_ext calls.
📚 Learning: 2025-11-29T22:55:53.865Z
Learnt from: CR
Repo: ngxson/llama.cpp PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-29T22:55:53.865Z
Learning: Applies to src/llama.cpp : Maintain core library implementation in `src/llama.cpp` with careful attention to API contracts and backward compatibility
Applied to files:
ggml/src/ggml-cpu/ops.cpp
🧬 Code graph analysis (1)
ggml/src/ggml-cpu/ops.cpp (3)
ggml/src/ggml.c (1)
- ggml_nrows (1231-1235)

ggml/src/ggml-sycl/rope.cpp (2)
- rope_yarn_ramp (13-16)
- rope_yarn_ramp (13-13)

ggml/src/ggml-cpu/llamafile/sgemm.cpp (4)
- p (1046-1051)
- p (1046-1046)
- p (1053-1062)
- p (1053-1053)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
- GitHub Check: build-cmake-pkg / linux
- GitHub Check: build-linux-cross / debian-13-loongarch64-vulkan-cross
- GitHub Check: build-linux-cross / ubuntu-24-riscv64-cpu-spacemit-ime-cross
- GitHub Check: windows-latest-cmake (vulkan-x64, x64, -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=OFF -DLLAMA_BUILD...
- GitHub Check: ggml-ci-x64-cpu-high-perf
- GitHub Check: ggml-ci-x64-cpu-low-perf
- GitHub Check: ubuntu-22-cmake-sycl
- GitHub Check: ubuntu-22-cmake-musa
- GitHub Check: windows-msys2 (UCRT64, ucrt-x86_64, Release)
- GitHub Check: ubuntu-22-cmake-hip
- GitHub Check: ubuntu-24-cmake-webgpu
- GitHub Check: ubuntu-22-cmake-sycl-fp16
- GitHub Check: ubuntu-latest-cmake-sanitizer (UNDEFINED, Debug)
- GitHub Check: ubuntu-latest-cmake-sanitizer (THREAD, Debug)
- GitHub Check: server-windows
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
ggml/src/ggml.c (2)
1051-1158: Add missing GGML_OP_SYMBOL entry for ROPE_COMP between rope(x) and rope_back(x)

The GGML_OP_NAME table includes "ROPE_COMP" between "ROPE" and "ROPE_BACK", but GGML_OP_SYMBOL is missing the corresponding "rope_comp(x)" entry. This causes array index misalignment where ggml_op_symbol(GGML_OP_ROPE_COMP) returns "rope_back(x)" and shifts all subsequent symbols by one position, breaking graph dumps and logging that rely on ggml_op_symbol().
Add "rope_comp(x)", at the correct position in GGML_OP_SYMBOL to align both tables.
6919-6943: Backward pass: ROPE_COMP is wired in graph building but missing compute implementation

In `ggml_build_backward_expand`, you correctly added `GGML_OP_ROPE_COMP` to the list where `ignore_src[1] = true`, matching `GGML_OP_ROPE` (positions are non-differentiable). This handles the graph-wiring phase.

However, `ggml_compute_backward` has no `case GGML_OP_ROPE_COMP:`, only `case GGML_OP_ROPE:`. If a training graph includes `GGML_OP_ROPE_COMP` on any path from a loss to trainable parameters, the backward pass will hit the default branch and call `GGML_ABORT("unsupported ggml op for backward pass")`.

Either:
- Add a `GGML_OP_ROPE_COMP` case in `ggml_compute_backward` that reuses or parallels the `GGML_OP_ROPE` gradient logic (since both compute rope embeddings with similar parameterization), or
- If this operation is strictly inference-only, assert or document that constraint explicitly to prevent misuse in training.
♻️ Duplicate comments (1)
src/models/mistral3.cpp (1)
78-85: Use model hyperparameters instead of hardcoded YaRN beta values.

Lines 82 and 85 hardcode `beta_fast=32.0f` and `beta_slow=1.0f` in the `ggml_rope_comp_set_yarn` calls. As noted in the previous review, these should use `hparams.yarn_beta_fast` and `hparams.yarn_beta_slow` to respect model-specific YaRN configurations loaded from model metadata.
🧹 Nitpick comments (5)
ggml/src/ggml-metal/ggml-metal-ops.cpp (1)
373-376: ROPE_COMP Metal path wiring and arg decoding look correct

- The new `GGML_OP_ROPE_COMP` case in `ggml_metal_op_encode_impl` correctly forwards to `ggml_metal_op_rope_comp`, returning a single fused node, consistent with the existing ROPE path.
- `ggml_metal_op_rope_comp` mirrors `ggml_metal_op_rope` structurally: dimension asserts (`ne10 % ne02 == 0`, `ne10 >= ne02`), threadgroup sizing via `nth = std::min(1024, ne00)`, and dispatch over `(ne01, ne02, ne03, nth, 1, 1)`.
- The op-params decoding (indices 0-3 as ints, 4-9 as floats, 10-13 as section ints) matches the `ggml_rope_comp` / `ggml_rope_comp_set_yarn` layout from `ggml.c`, and those values are passed one-for-one into `ggml_metal_kargs_rope_comp`, including `src2 = (op->src[2] != nullptr)`.

If you want to tighten readability later, consider introducing a small helper/local `const int32_t * p = op->op_params;` and using it for the `memcpy` calls, to mirror other Metal ops and reduce repeated casts, but functionally this is fine as is.

Also applies to: 3223-3311
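A small standalone illustration of that decoding pattern: keep one typed pointer into the params block and `memcpy` each field out of it instead of repeating the cast at every call site. `op_params` here is a local stand-in for `dst->op_params`; the offsets follow the layout described in this review (0-3 ints, 4-9 floats, 10-13 ints).

```c
// Decode mixed int/float fields from a packed int32_t params array.
#include <stdint.h>
#include <string.h>
#include <stdio.h>

int main(void) {
    int32_t op_params[16] = {0};      // stand-in for dst->op_params
    op_params[0] = 128;               // n_dims
    float theta_scale = 0.865964f;    // ~ powf(10000.0f, -2.0f / 128)
    memcpy(&op_params[4], &theta_scale, sizeof(float));

    const int32_t * p = op_params;    // single typed view of the params

    int32_t n_dims;
    float   theta;
    memcpy(&n_dims, p + 0, sizeof(int32_t));
    memcpy(&theta,  p + 4, sizeof(float));

    printf("n_dims=%d theta_scale=%f\n", n_dims, theta);
    return 0;
}
```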
ggml/src/ggml-cpu/ops.cpp (2)
5879-5884: Guard freq_factors use when n_dims < ne0

`freq_factors` is indexed as `freq_factors[i0/2]` in `init_cache`, and you only assert `src2->ne[0] >= n_dims / 2`. However the loop currently runs to `i0 < ne0`, while `rotate_pairs` is only called over `n_dims`. If a future call uses `n_dims < ne0` with `src2->ne[0] == n_dims / 2` (matching the assertion), this will read past the end of `freq_factors`.

If you adopt the change above to bound the cache loop by `n_dims` instead of `ne0`, this risk disappears and the assertion remains correct. Otherwise, you should either tighten the assertion to `src2->ne[0] >= ne0 / 2` or explicitly clamp the loop to `min(ne0, 2*src2->ne[0])`.

5835-5853: Unused `sections` / TODO M-RoPE

`sections[4]` is read from `dst->op_params` but never used in `ggml_compute_forward_rope_comp_flt`, and the body contains `// TODO M-RoPE` comments. This is fine for a demo, but it will produce unused-variable warnings and ROPE_COMP currently ignores any multi-rope configuration passed via `ggml_rope_comp_set_multi`.

Either mark `sections` as `[[maybe_unused]]` for now or wire it into the M-RoPE implementation in a follow-up; also consider clarifying in comments that ROPE_COMP currently supports only the simple (non-M-RoPE) case so downstream users know what to expect.

ggml/src/ggml.c (2)
4271-4324: ggml_rope_comp op construction and metadata layout look sound

The new `ggml_rope_comp` API:
- Validates shapes/types reasonably (`b` is F32 vector, length multiple of `a->ne[2]` for M-RoPE).
- Encodes NORMAL vs NEOX ordering using `idx_pair` / `idx_scale` in a way that matches typical RoPE conventions.
- Derives `theta_scale = powf(freq_base, -2.0f / n_dims)` as expected.
- Packs op_params in a compact, forward-compatible layout (dims, ordering, theta, YaRN/mrope slots), and duplicates `a` to keep shape/type consistent with existing RoPE ops.

Given how tightly this layout couples to CPU/Metal kernels, it would be good to keep the offsets documented somewhere (even just a short comment block) so future changes don't silently desync the layouts.
4326-4352: YaRN low/high correction range now matches helper and fixes the earlier swap

`ggml_rope_comp_set_yarn` now:
- Reads `n_dims` from `op_params[0]`.
- Computes:
  - `start = floor(corr_dim(beta_fast))`
  - `end = ceil(corr_dim(beta_slow))`
  - `yarn_low = MAX(0, start)`
  - `yarn_high = MIN(n_dims - 1, end)`
- Writes `yarn_high` and `yarn_low` back at offsets 5 and 6, plus `freq_scale`, `attn_factor`, and `ramp_factor` at 7-9.

This is consistent with `ggml_rope_yarn_corr_dims`'s semantics (dims[0] = low, dims[1] = high) and addresses the previous high/low swap that broke the YaRN ramp. The zero-clamp and `n_dims - 1` clamp also look correct.

One minor suggestion: consider adding a short comment here pointing to `ggml_rope_yarn_corr_dims` and documenting the `[5..9]` slots, since this contract is shared implicitly with the backends.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- ggml/include/ggml.h (2 hunks)
- ggml/src/ggml-cpu/ops.cpp (1 hunks)
- ggml/src/ggml-metal/ggml-metal-common.cpp (1 hunks)
- ggml/src/ggml-metal/ggml-metal-device.cpp (1 hunks)
- ggml/src/ggml-metal/ggml-metal-device.h (1 hunks)
- ggml/src/ggml-metal/ggml-metal-device.m (1 hunks)
- ggml/src/ggml-metal/ggml-metal-impl.h (1 hunks)
- ggml/src/ggml-metal/ggml-metal-ops.cpp (2 hunks)
- ggml/src/ggml-metal/ggml-metal-ops.h (1 hunks)
- ggml/src/ggml-metal/ggml-metal.metal (3 hunks)
- ggml/src/ggml.c (5 hunks)
- src/models/mistral3.cpp (2 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{cpp,h,hpp}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{cpp,h,hpp}: Always format C++ code using `git clang-format` before committing, following .clang-format configuration (4-space indentation, 120 column limit, braces on same line for functions, pointer/reference alignment in middle)
Minimize external dependencies; avoid adding new external dependencies unless absolutely necessary
Files:
- ggml/src/ggml-metal/ggml-metal-device.cpp
- src/models/mistral3.cpp
- ggml/src/ggml-metal/ggml-metal-device.h
- ggml/src/ggml-metal/ggml-metal-ops.h
- ggml/src/ggml-cpu/ops.cpp
- ggml/include/ggml.h
- ggml/src/ggml-metal/ggml-metal-ops.cpp
- ggml/src/ggml-metal/ggml-metal-common.cpp
- ggml/src/ggml-metal/ggml-metal-impl.h
**/*.{cpp,h,hpp,py}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Ensure cross-platform compatibility by testing code changes on Linux, macOS, and Windows when possible
Files:
- ggml/src/ggml-metal/ggml-metal-device.cpp
- src/models/mistral3.cpp
- ggml/src/ggml-metal/ggml-metal-device.h
- ggml/src/ggml-metal/ggml-metal-ops.h
- ggml/src/ggml-cpu/ops.cpp
- ggml/include/ggml.h
- ggml/src/ggml-metal/ggml-metal-ops.cpp
- ggml/src/ggml-metal/ggml-metal-common.cpp
- ggml/src/ggml-metal/ggml-metal-impl.h
src/**/*.cpp
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Prioritize performance optimization in core library implementations in
src/, as this is a performance-critical inference library
Files:
src/models/mistral3.cpp
🧠 Learnings (2)
📓 Common learnings
Learnt from: ngxson
Repo: ngxson/llama.cpp PR: 14
File: examples/lookahead/lookahead.cpp:97-98
Timestamp: 2025-03-13T22:56:47.417Z
Learning: ngxson prefers to prioritize simplification in some cases and defer adding error handling to a later time, as indicated by their response to a suggestion about adding error checks for llama_decode_ext calls.
📚 Learning: 2025-11-29T22:55:53.865Z
Learnt from: CR
Repo: ngxson/llama.cpp PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-29T22:55:53.865Z
Learning: Applies to src/llama.cpp : Maintain core library implementation in `src/llama.cpp` with careful attention to API contracts and backward compatibility
Applied to files:
ggml/src/ggml-cpu/ops.cpp
🧬 Code graph analysis (6)
ggml/src/ggml-metal/ggml-metal-device.cpp (1)
ggml/src/ggml.c (1)
- ggml_type_name (1283-1285)
ggml/src/ggml-metal/ggml-metal-device.h (1)
ggml/src/ggml-metal/ggml-metal-device.cpp (2)
- ggml_metal_library_get_pipeline_rope_comp (1478-1493)
- ggml_metal_library_get_pipeline_rope_comp (1478-1478)
ggml/src/ggml-metal/ggml-metal-ops.h (1)
ggml/src/ggml-metal/ggml-metal-ops.cpp (2)
- ggml_metal_op_rope_comp (3223-3311)
- ggml_metal_op_rope_comp (3223-3223)
ggml/src/ggml-cpu/ops.cpp (2)
ggml/src/ggml.c (1)
- ggml_nrows (1231-1235)

ggml/src/ggml-sycl/rope.cpp (2)
- rope_yarn_ramp (13-16)
- rope_yarn_ramp (13-13)
ggml/include/ggml.h (1)
ggml/src/ggml.c (2)
- ggml_rope_comp (4271-4324)
- ggml_rope_comp_set_yarn (4326-4352)
ggml/src/ggml.c (1)
ggml/src/ggml-impl.h (1)
- ggml_set_op_params (145-149)
🪛 GitHub Actions: EditorConfig Checker
ggml/src/ggml-metal/ggml-metal.metal
[error] 4134-4134: Trailing whitespace detected by editorconfig-checker.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
- GitHub Check: ggml-ci-x64-cpu-high-perf
- GitHub Check: ggml-ci-x64-cpu-low-perf
- GitHub Check: ubuntu-22-cmake-musa
- GitHub Check: ubuntu-latest-cmake-cuda
- GitHub Check: ubuntu-24-cmake-vulkan-deb
- GitHub Check: android-build
- GitHub Check: android-ndk-build (arm64-snapdragon, --preset arm64-android-snapdragon-release)
- GitHub Check: ubuntu-22-cmake-sycl-fp16
- GitHub Check: ubuntu-22-cmake-sycl
- GitHub Check: ubuntu-22-cmake-hip
- GitHub Check: ubuntu-latest-cmake-sanitizer (THREAD, Debug)
- GitHub Check: ubuntu-cpu-cmake (x64, ubuntu-22.04)
- GitHub Check: ubuntu-latest-cmake-rpc
- GitHub Check: macOS-latest-cmake-x64
- GitHub Check: ubuntu-latest-llguidance
- GitHub Check: ubuntu-latest-cmake-sanitizer (ADDRESS, Debug)
- GitHub Check: ubuntu-latest-cmake-sanitizer (UNDEFINED, Debug)
- GitHub Check: macOS-latest-cmake-arm64
- GitHub Check: server-windows
🔇 Additional comments (10)
ggml/src/ggml-metal/ggml-metal-common.cpp (1)
264-264: LGTM!

The addition of `GGML_OP_ROPE_COMP` to the reorder-safe operations list is appropriate and consistent with the treatment of `GGML_OP_ROPE`.

ggml/src/ggml-metal/ggml-metal-device.m (1)

1032-1032: LGTM!

The `GGML_OP_ROPE_COMP` operation is correctly enabled for Metal backend support, falling through to the same logic as `GGML_OP_ROPE`.

ggml/src/ggml-metal/ggml-metal-ops.h (1)

74-74: LGTM!

The function declaration for `ggml_metal_op_rope_comp` follows the established pattern and is appropriately placed alongside the existing `ggml_metal_op_rope` declaration.

ggml/src/ggml-metal/ggml-metal-device.h (1)

139-139: LGTM!

The pipeline getter declaration for ROPE_COMP is consistent with the existing pipeline getter patterns and is appropriately positioned alongside the ROPE pipeline getter.
ggml/src/ggml-metal/ggml-metal-device.cpp (1)

1478-1493: LGTM!

The `ggml_metal_library_get_pipeline_rope_comp` implementation correctly follows the established pattern for Metal pipeline getters. The function properly constructs the kernel name based on the tensor type and handles pipeline compilation when needed.

ggml/src/ggml-metal/ggml-metal.metal (2)

4415-4431: Kernel alias and host_name exports look consistent.

`kernel_rope_comp_t` correctly aliases `kernel_rope_comp<float>`, and the f32/f16 `host_name` instantiations follow the existing pattern for the other RoPE kernels.
4102-4153: Fix trailing whitespace and verify YaRN implementation details.

The kernel logic appears structurally sound, but several RoPE v2 / YaRN details need alignment with the CPU implementation:

- i_dim exponent: The formula `i_dim = (i0 / 2) + 1` as the exponent in `pow(args.theta_scale, i_dim)` should be verified against the CPU `ggml_rope_comp` implementation. If the CPU uses 0-based indexing (i.e., `i0/2`), this off-by-one shifts all frequency values on Metal.
- YaRN magnitude scaling: The YaRN formula includes `mscale = (0.1 * ln(freq_scale) + 1) * mscale_param` to adjust magnitudes in the ramp region. This kernel applies `* args.attn_factor` but never multiplies by this additional mscale term. Confirm that `attn_factor` passed from the host is already pre-scaled to encode the YaRN magnitude correction (this is a known workaround pattern); otherwise, the GPU path will produce different attention outputs than CPU YaRN for extrapolated positions.
- src2 indexing: Verify that `freq_factor = src2[i0/2]` aligns with how src2 is laid out on the host, specifically whether src2 uses logical RoPE pair indexing (`i0/2`) or follows the memory layout stride (`idx_scale`).
- Non-rotated tail: The else branch uses `i0 * args.nb00` / `i0 * args.nb0` instead of `ic * args.nb00` (where `ic = i0 / idx_scale`). This assumes the unrotated dimensions are contiguously stored in the standard layout; if they share the composable layout with rotated dims, use `ic` consistently.

Also, remove trailing whitespace on the blank line after `sin_theta` (around line 4134) to fix the EditorConfig CI failure.
517-572: ROPE_COMP op and composable RoPE API surface are consistent with the implementation
- Adding
GGML_OP_ROPE_COMPnext toGGML_OP_ROPEkeeps the op enum organized by functionality; downstream code that switches on ops can now distinguish the composable variant cleanly.enum ggml_rope_ordering { GGML_ROPE_ORDERING_NORMAL, GGML_ROPE_ORDERING_NEOX }is a sensible, small enum specifically forggml_rope_comp, and is intentionally separate from the olderGGML_ROPE_TYPE_*bitfield macros used in the legacymodeargument.- The declarations for
ggml_rope_comp,ggml_rope_comp_set_yarn, andggml_rope_comp_set_multimatch the ggml.c snippet (parameter order and types), so the header and implementation are in sync, including the YaRN and M‑RoPE extension hooks.No issues from an API or header‑consistency standpoint.
Also applies to: 1863-1895
ggml/src/ggml-metal/ggml-metal-impl.h (1)

263-295: Host-side `ggml_metal_kargs_rope_comp` struct layout is correct and aligns with shader usage

The Metal kernel file includes `ggml-metal-impl.h` directly (line 8), ensuring the shader uses the identical struct definition from the host. The struct is populated via designated initializers in `ggml_metal_op_rope_comp()` with all fields in the correct order and types, and the kernel accesses all fields (dims, strides, rotation params, yarn factors, and frequency scaling) as expected. No ABI mismatch exists.
993-995: ROPE_COMP op registration looks consistent with existing opsAdding
"ROPE_COMP"between"ROPE"and"ROPE_BACK"and bumpingGGML_OP_COUNTto 96 keeps the op-name table coherent with the enum ordering. Nothing else to change here as long as other op tables (symbols, backends) are kept in sync, which you mostly did below.If you want to double‑check enum/op-name alignment across the codebase, you can run a quick grep on
GGML_OP_ROPE_COMPusages.Also applies to: 1049-1049
Actionable comments posted: 2
🧹 Nitpick comments (2)
tests/test-rope.cpp (2)
147-153: Non-portable format specifier for `int64_t`.

Using `%lld` for `int64_t` is not portable (`int64_t` is not guaranteed to be `long long` on every platform). Use `PRId64` from `<cinttypes>` for cross-platform compatibility.

🔎 Proposed fix

Add the include at the top of the file:

```cpp
#include <cinttypes>
```

Then update the format string:

```diff
 std::string vars() {
     char buf[256];
     snprintf(buf, sizeof(buf),
-        "type=%d ne=(%lld,%lld,%lld,%lld) n_dims=%d mode=%d fs=%f ef=%f af=%f ff=%d v=%d inplace=%d",
-        type, ne_a[0], ne_a[1], ne_a[2], ne_a[3], n_dims, mode, fs, ef, af, ff ? 1 : 0, v, inplace ? 1 : 0);
+        "type=%d ne=(%" PRId64 ",%" PRId64 ",%" PRId64 ",%" PRId64 ") n_dims=%d mode=%d fs=%f ef=%f af=%f ff=%d v=%d inplace=%d",
+        type, ne_a[0], ne_a[1], ne_a[2], ne_a[3], n_dims, mode, fs, ef, af, ff ? 1 : 0, v, inplace ? 1 : 0);
     return std::string(buf);
 }
```

As per coding guidelines, ensure cross-platform compatibility.
340-400: Memory leak risk on assertion failure.

Raw `new` / `delete` is used for test cases. If the assertion at line 450 fails, remaining elements in `test_cases` won't be cleaned up. For test code this is acceptable, but using stack-allocated `std::vector<test_rope>` or smart pointers would be safer.

🔎 Alternative using value semantics

```diff
-    std::vector<test_rope *> test_cases;
+    std::vector<test_rope> test_cases;
     ...
-    test_cases.emplace_back(new test_rope(type, {128, 32, 2, 1}, ...));
+    test_cases.emplace_back(type, {128, 32, 2, 1}, ...);
     ...
-    std::vector<test_rope *> comp_cases;
-    for (auto & tc : test_cases) {
-        auto tc_comp = new test_rope(*tc);
-        tc_comp->use_comp = true;
-        comp_cases.push_back(tc_comp);
-    }
+    std::vector<test_rope> comp_cases;
+    for (auto & tc : test_cases) {
+        comp_cases.push_back(tc);
+        comp_cases.back().use_comp = true;
+    }
```

Then access via `test_cases[i]` directly instead of through pointers.

Also applies to: 446-450
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- tests/test-rope.cpp (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{cpp,h,hpp}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{cpp,h,hpp}: Always format C++ code using `git clang-format` before committing, following .clang-format configuration (4-space indentation, 120 column limit, braces on same line for functions, pointer/reference alignment in middle)
Minimize external dependencies; avoid adding new external dependencies unless absolutely necessary
Files:
tests/test-rope.cpp
**/*.{cpp,h,hpp,py}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Ensure cross-platform compatibility by testing code changes on Linux, macOS, and Windows when possible
Files:
tests/test-rope.cpp
🧠 Learnings (1)
📓 Common learnings
Learnt from: ngxson
Repo: ngxson/llama.cpp PR: 14
File: examples/lookahead/lookahead.cpp:97-98
Timestamp: 2025-03-13T22:56:47.417Z
Learning: ngxson prefers to prioritize simplification in some cases and defer adding error handling to a later time, as indicated by their response to a suggestion about adding error checks for llama_decode_ext calls.
🧬 Code graph analysis (1)
tests/test-rope.cpp (1)
ggml/src/ggml.c (2)
- ggml_rope_multi (4122-4141)
- ggml_rope_ext (4175-4193)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: build-linux-cross / ubuntu-24-riscv64-cpu-spacemit-ime-cross
- GitHub Check: build-linux-cross / debian-13-loongarch64-vulkan-cross
- GitHub Check: build-linux-cross / debian-13-loongarch64-cpu-cross
- GitHub Check: build-cmake-pkg / linux
- GitHub Check: windows-latest-cmake (llvm-arm64-opencl-adreno, arm64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN...
- GitHub Check: windows-latest-cmake (llvm-arm64, arm64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/ar...
- GitHub Check: windows-latest-cmake (vulkan-x64, x64, -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=OFF -DLLAMA_BUILD...
- GitHub Check: windows-latest-cmake (cpu-x64 (static), x64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmak...
- GitHub Check: windows-latest-cmake (openblas-x64, x64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/x6...
- GitHub Check: ubuntu-24-wasm-webgpu
- GitHub Check: windows-msys2 (UCRT64, ucrt-x86_64, Release)
- GitHub Check: windows-msys2 (CLANG64, clang-x86_64, Release)
- GitHub Check: ubuntu-24-cmake-webgpu
- GitHub Check: macOS-latest-cmake-arm64
- GitHub Check: ubuntu-24-cmake-vulkan
- GitHub Check: ubuntu-latest-cmake-sanitizer (ADDRESS, Debug)
- GitHub Check: ubuntu-24-cmake-vulkan-deb
- GitHub Check: ubuntu-latest-cmake-rpc
- GitHub Check: macOS-latest-cmake-arm64-webgpu
- GitHub Check: server-windows
🔇 Additional comments (3)
tests/test-rope.cpp (3)
176-184: MROPE/Vision/IMROPE tests are always skipped for `rope_comp`.

`_ggml_rope_multi` returns `nullptr` when `use_comp = true`, causing all multi-rope variants (MROPE, Vision, IMROPE) to be skipped in the comparison tests. This is likely intentional for this demo, but worth documenting or removing those test cases from the matrix to avoid unnecessary iterations.

If this is expected behavior for the demo, consider adding a comment explaining that `rope_comp` doesn't yet support multi-rope modes, or filter out those test cases earlier to avoid wasted iterations.

410-425: Both graphs share the same context, so tensor initialization overwrites shared data.

Both `tc_rope` and `tc_comp` build graphs in the same `ctx0`, and both call `initialize_tensors(ctx0)`. Since both graphs reference the same underlying tensors (e.g., input `a`, `pos`), the second initialization overwrites values set by the first. If this is intentional to ensure both graphs use identical inputs, it works, but if different inputs were expected, this is a bug.

Verify that sharing the same tensor data between both graphs is the intended behavior. If both should use identical inputs (which seems correct for NMSE comparison), consider initializing tensors only once after both graphs are built.

🔎 Suggested clarification

```diff
     ggml_tensor * out0 = tc_rope->build_graph(ctx0);
     ggml_tensor * out1 = tc_comp->build_graph(ctx0);
     if (out0 == nullptr || out1 == nullptr) {
         GGML_PRINT("test_rope_comp \x1b[33mSKIPPED\x1b[0m: %s\n", tc_rope->vars().c_str());
         ggml_free(ctx0);
         delete tc_comp;
         delete tc_rope;
         continue;
     }
-    tc_rope->initialize_tensors(ctx0);
-    tc_comp->initialize_tensors(ctx0);
+    // Initialize tensors once - both graphs share the same input tensors
+    tc_rope->initialize_tensors(ctx0);
```

588-589: LGTM!

Integration of `test_rope_comp()` into the test suite is properly placed after existing tests.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ggml/src/ggml.c (1)
941-1050: `GGML_OP_SYMBOL` array is out of sync with `GGML_OP_NAME` after adding `ROPE_COMP`.

You inserted `"ROPE_COMP"` into `GGML_OP_NAME` between `"ROPE"` and `"ROPE_BACK"` (Lines 993-995), but `GGML_OP_SYMBOL` still only has `"rope(x)"` and `"rope_back(x)"` with no `"rope_comp(x)"` entry (Lines 1101-1105). Because `GGML_OP_SYMBOL` is indexed by `enum ggml_op`, every symbol from `ROPE_COMP` onward is now shifted/mismatched (e.g., `GGML_OP_ROPE_COMP` reports `"rope_back(x)"`, `GGML_OP_ROPE_BACK` reports `"clamp(x)"`, etc.).

This doesn't break computation but makes graph dumps and debugging misleading.

Insert a `"rope_comp(x)"` symbol at the corresponding index (after `"soft_max_back(x)"` / before `"rope(x)"` or wherever matches your final enum ordering) so `GGML_OP_SYMBOL[GGML_OP_ROPE_COMP]` is correct and all following entries realign.
♻️ Duplicate comments (1)
tests/test-rope.cpp (1)
379-425: Use integer loop variable for `v` to match `test_rope` field type.

`test_rope::v` is an `int` used as a bit flag (`if (v & 1)` in `build_input`), but the loop currently declares `float v : { 0, 1 }`. This relies on implicit float→int conversion at the constructor call and is unnecessarily confusing for a bit-flag.

Consider changing to `for (int v : { 0, 1 }) { ... }` to keep the type consistent and avoid accidental misuse later.
🧹 Nitpick comments (4)
ggml/include/ggml.h (1)
1869-1875: Consider expanding parameter documentation.

The declaration is clean and the comment on line 1872 about the F32 requirement is helpful. Consider adding brief documentation for the other parameters (especially `n_dims` and `freq_base`) to match the documentation style of other complex operations in this file.

ggml/src/ggml-cpu/ops.cpp (1)
5862-5876: Thread partitioning ignores `idx_offset`, leading to poor load balancing (but correct results)

`nr` is computed as `ggml_nrows(dst)` (i.e., `ne1 * ne2 * ne3`), but the inner loop over heads starts at `i1 = idx_offset`. The global row counter `ir` only increments for heads `i1 >= idx_offset`, so when `idx_offset > 0` the effective number of processed rows is smaller than `nr`. This makes `ir0` / `ir1` ranges (and thus `dr`) too large and can leave higher-indexed threads with zero work while thread 0 does everything.

Numerically this is still correct, but it hurts parallelism when you use `idx_offset` (e.g., skipping some heads).

Consider basing the work partitioning on an effective row count:

```c
const int64_t nr_eff = (ne1 - idx_offset) * ne2 * ne3;
```

- Use `nr_eff` to compute `dr`, `ir0`, `ir1`.
- Either:
  - Map `(i3, i2, i1 >= idx_offset)` to a compact `[0, nr_eff)` index when incrementing `ir`, or
  - Pre-advance `ir` by `idx_offset * ne2 * ne3` before the loops.

This will keep the behavior the same while distributing work more evenly across threads when `idx_offset` is non-zero.
tests/test-rope.cpp (1)

440-472: Test harness only exercises forward ROPE vs ROPE_COMP, not gradients.

`test_rope_comp()` builds two forward graphs (classic rope vs rope_comp) and compares NMSE, which is good for catching numerical issues in the RoPE implementation. However, it does not exercise backward/gradient paths for `GGML_OP_ROPE_COMP`.

If you plan to use `ROPE_COMP` in training or fine-tuning, consider adding at least one small gradient test (e.g., build a trivial loss on top of the rope_comp output and run `ggml_build_backward_expand`) to ensure backward integration is correct and stays correct over time.

ggml/src/ggml.c (1)
4271-4340: `ggml_rope_comp` op-param layout: overall structure OK but symbol/comment consistency matters.

The `ggml_rope_comp(...)` builder:
- Validates `b` is a F32 vector and length is a multiple of `a->ne[2]` (M-RoPE compatible).
- Encodes `n_dims`, `idx_pair`, `idx_scale`, `idx_offset`, and `theta_scale` plus a block of floats/ints into `op_params`, then sets `op = GGML_OP_ROPE_COMP` and `src[0]=a`, `src[1]=b`, `src[2]=NULL`.

This matches how other ops use `op_params` and is structurally fine. Just make sure the comments on the param slots match how the CPU/Metal kernels interpret them (especially slots 5-9 used by YARN), otherwise future refactors will be error-prone.

If kernels treat slot 5 as `yarn_low` and slot 6 as `yarn_high` (see comment on `ggml_rope_comp_set_yarn`), the comments here should be updated accordingly for clarity.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- ggml/include/ggml.h (2 hunks)
- ggml/src/ggml-cpu/ops.cpp (1 hunks)
- ggml/src/ggml-metal/ggml-metal.metal (3 hunks)
- ggml/src/ggml.c (5 hunks)
- tests/test-rope.cpp (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{cpp,h,hpp}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{cpp,h,hpp}: Always format C++ code using `git clang-format` before committing, following .clang-format configuration (4-space indentation, 120 column limit, braces on same line for functions, pointer/reference alignment in middle)
Minimize external dependencies; avoid adding new external dependencies unless absolutely necessary
Files:
- tests/test-rope.cpp
- ggml/src/ggml-cpu/ops.cpp
- ggml/include/ggml.h
**/*.{cpp,h,hpp,py}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Ensure cross-platform compatibility by testing code changes on Linux, macOS, and Windows when possible
Files:
- tests/test-rope.cpp
- ggml/src/ggml-cpu/ops.cpp
- ggml/include/ggml.h
🧠 Learnings (2)
📓 Common learnings
Learnt from: ngxson
Repo: ngxson/llama.cpp PR: 14
File: examples/lookahead/lookahead.cpp:97-98
Timestamp: 2025-03-13T22:56:47.417Z
Learning: ngxson prefers to prioritize simplification in some cases and defer adding error handling to a later time, as indicated by their response to a suggestion about adding error checks for llama_decode_ext calls.
📚 Learning: 2025-11-29T22:55:53.865Z
Learnt from: CR
Repo: ngxson/llama.cpp PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-29T22:55:53.865Z
Learning: Applies to src/llama.cpp : Maintain core library implementation in `src/llama.cpp` with careful attention to API contracts and backward compatibility
Applied to files:
ggml/src/ggml-cpu/ops.cpp
🧬 Code graph analysis (3)
tests/test-rope.cpp (3)

tests/test-backend-ops.cpp (46)
- buf (231-231), type (2306-2325), mode (5455-5459), v (2661-2665), v (5316-5320), ctx (1090-1090), ctx (1130-1134), ctx (1130-1130), ctx (1165-1172), ctx (1165-1165), ctx (1176-1180), ctx (1176-1176), ctx (1182-1186), ctx (1182-1182), ctx (1188-1192), ctx (1188-1188), ctx (1194-1198), ctx (1194-1194), ctx (1200-1204), ctx (1200-1200), ctx (1415-1415), a (1104-1106), a (1104-1104), a (5025-5084), a (5025-5025), ne (3039-3041), ne (3892-3894), ne (5410-5412), t (1082-1084), t (1082-1082), t (1136-1145), t (1136-1136), t (1147-1150), t (1147-1147), t (2339-2342), t (2339-2339), t (2397-2400), t (2397-2397), t (2577-2579), t (2577-2577), t (2612-2614), t (2612-2612), t (2701-2703), t (2701-2701), nmse (274-287), nmse (274-274)

ggml/src/ggml.c (6)
- ggml_rope_multi (4122-4141), ggml_cast (3434-3446), ggml_rope_comp (4271-4324), ggml_rope_comp_set_yarn (4342-4368), ggml_rope_comp_set_freq_factors (4326-4340), ggml_rope_ext (4175-4193)

tools/mtmd/clip.cpp (2)
- x (2247-2249), x (2247-2247)

ggml/include/ggml.h (1)

ggml/src/ggml.c (3)
- ggml_rope_comp (4271-4324), ggml_rope_comp_set_freq_factors (4326-4340), ggml_rope_comp_set_yarn (4342-4368)

ggml/src/ggml.c (2)

ggml/src/ggml-cpu/repack.cpp (8)
- params (1813-1826), params (1813-1813), params (1828-1875), params (1828-1833), params (1877-2008), params (1877-1877), params (2010-2142), params (2010-2010)

ggml/src/ggml-impl.h (1)
- ggml_set_op_params (145-149)
🪛 GitHub Actions: EditorConfig Checker
ggml/src/ggml-metal/ggml-metal.metal
[error] 4133-4133: Trailing whitespace detected by editorconfig-checker.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: android-build
- GitHub Check: windows-latest-cmake-hip
- GitHub Check: windows-latest-cmake (llvm-arm64, arm64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/ar...
- GitHub Check: windows-latest-cmake-sycl
- GitHub Check: macOS-latest-cmake-ios
- GitHub Check: ios-xcode-build
- GitHub Check: ubuntu-cpu-cmake (arm64, ubuntu-22.04-arm)
- GitHub Check: ubuntu-cpu-cmake (x64, ubuntu-22.04)
- GitHub Check: macOS-latest-cmake-arm64-webgpu
- GitHub Check: ubuntu-latest-cmake-sanitizer (UNDEFINED, Debug)
- GitHub Check: ubuntu-latest-cmake-sanitizer (THREAD, Debug)
- GitHub Check: ubuntu-latest-cmake-rpc
- GitHub Check: ubuntu-latest-cmake-sanitizer (ADDRESS, Debug)
- GitHub Check: ubuntu-latest-llguidance
- GitHub Check: ubuntu-24-cmake-vulkan-deb
- GitHub Check: macOS-latest-cmake-x64
- GitHub Check: ubuntu-24-cmake-vulkan
- GitHub Check: ubuntu-24-wasm-webgpu
- GitHub Check: macOS-latest-cmake-arm64
- GitHub Check: server-windows
🔇 Additional comments (6)
ggml/src/ggml-metal/ggml-metal.metal (1)
4414-4429: New `kernel_rope_comp` typedef and host bindings look consistent

The `kernel_rope_comp_t` typedef and the `kernel_rope_comp_f32` / `kernel_rope_comp_f16` host_name instantiations mirror the pattern used by the existing ROPE kernels and correctly expose both float and half variants. No further changes needed here.

ggml/include/ggml.h (4)
1863-1866: LGTM! Clean API design.

The new `enum ggml_rope_ordering` provides a cleaner, type-safe alternative to the existing `GGML_ROPE_TYPE_*` macros. The composable API appropriately separates ordering semantics from other RoPE configuration.

1877-1880: LGTM!

The function signature follows GGML conventions for modifying existing computation graph nodes. The composable pattern allows optional configuration after initial RoPE node creation.

1894-1898: LGTM!

The function signature is consistent with the existing M-RoPE API (`ggml_rope_multi`). Using the `GGML_MROPE_SECTIONS` constant ensures consistency across the codebase.

517-517: The backward compatibility concern raised in this review is not applicable to the `ggml_op` enum. Unlike `ggml_type` (which is serialized in GGUF model files and has a documented policy to "always add types at the end"), `ggml_op` values are not persisted in model files; they exist only for runtime tensor operation tracking. Inserting `GGML_OP_ROPE_COMP` between `GGML_OP_ROPE` and `GGML_OP_ROPE_BACK` does not break binary compatibility with existing model files. The enum entry has been properly added to both the `GGML_OP_NAME` and `GGML_OP_SYMBOL` lookup arrays.

Likely an incorrect or invalid review comment.
ggml/src/ggml.c (1)
4342-4368: The parameter order is correct. Slot 5 holds `yarn_high` and slot 6 holds `yarn_low`, properly aligned with compute backend struct layouts and the `rope_yarn_ramp(low, high, i0)` function signature. For typical YARN settings where `yarn_low ≤ yarn_high`, the ramp calculation works as intended.
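For reference, here is the kind of layout comment suggested earlier in this thread, reconstructed from how the CPU kernel decodes `dst->op_params` (see the `ggml_compute_forward_rope_comp_flt` excerpt at the end of this page). Treat it as documentation of the current layout, not a normative spec.

```c
// ggml_rope_comp op_params layout (as decoded by the CPU kernel):
//   op_params[0]       (i32) n_dims
//   op_params[1]       (i32) idx_pair
//   op_params[2]       (i32) idx_scale
//   op_params[3]       (i32) idx_offset
//   op_params[4]       (f32) theta_scale
//   op_params[5]       (f32) yarn_high
//   op_params[6]       (f32) yarn_low
//   op_params[7]       (f32) freq_scale
//   op_params[8]       (f32) attn_factor
//   op_params[9]       (f32) ramp_factor
//   op_params[10..13]  (i32) sections[4]   (M-RoPE, currently unused)
```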
Actionable comments posted: 1
♻️ Duplicate comments (2)
tests/test-rope.cpp (1)
384-384: Potential regression: `float v` should be `int v`.

A previous review identified that the loop variable `v` should be `int` (not `float`) since it represents a binary flag (0 or 1) passed to constructors expecting `int`. That comment was marked as addressed in commits 1affe19 to 509dfeb, but the current code still declares `float v`. Verify whether this is a regression or an annotation timing issue.

🔎 Proposed fix (from previous review)

```diff
-    for (float v : { 0, 1 }) {
+    for (int v : { 0, 1 }) {
```

ggml/src/ggml.c (1)
6933-6958: ROPE_COMP is marked in backward graph expansion but still has no backward implementation

`GGML_OP_ROPE_COMP` is now added to the list of ops whose `src[1]` gradients are ignored (same treatment as `GGML_OP_ROPE`), but `ggml_compute_backward` still lacks a `case GGML_OP_ROPE_COMP`. Any training graph containing this op will still fall into the default branch and abort with "unsupported ggml op for backward pass."

If you intend ROPE_COMP to be trainable, you'll need to add a dedicated backward case mirroring the `GGML_OP_ROPE` path (adapted to the compact representation). If it is inference-only, consider adding explicit documentation or an assertion to make that clear.
🧹 Nitpick comments (2)
ggml/src/ggml.c (2)
941-1159: `GGML_OP_SYMBOL` is missing an entry for `ROPE_COMP`, misaligning symbols for subsequent ops

You added `ROPE_COMP` to `GGML_OP_NAME` and bumped `GGML_OP_COUNT` to 96, but `GGML_OP_SYMBOL` still has only `"rope(x)"` followed directly by `"rope_back(x)"`. This means `ggml_op_symbol(GGML_OP_ROPE_COMP)` will return the wrong symbol, and all later ops' symbols are shifted.

Minimal fix: insert a `"rope_comp(x)"` symbol after `"rope(x)"` so indices stay aligned.

Proposed diff for `GGML_OP_SYMBOL` around RoPE entries:

```diff
-    "soft_max(x)",
-    "soft_max_back(x)",
-    "rope(x)",
-    "rope_back(x)",
-    "clamp(x)",
+    "soft_max(x)",
+    "soft_max_back(x)",
+    "rope(x)",
+    "rope_comp(x)",
+    "rope_back(x)",
+    "clamp(x)",
```

4269-4368: ROPE_COMP helpers look consistent; consider guarding NEOX `n_dims` evenness

The `ggml_rope_comp`, `_set_freq_factors`, and `_set_yarn` helpers are internally consistent with the existing RoPE/YARN utilities: the parameter layout is documented, the theta-scale formula matches standard RoPE, and the low/high YaRN dims now follow the same `beta_fast` / `beta_slow` ordering as `ggml_rope_yarn_corr_dims`.

One small robustness improvement would be to assert that `n_dims` is even when using `GGML_ROPE_ORDERING_NEOX`, since `idx_pair = n_dims / 2` and `idx_scale = 2` implicitly assume pairs of dimensions.

Optional assert for NEOX ordering:

```diff
     int32_t idx_pair  = 1;
     int32_t idx_scale = 1;
     if (ordering == GGML_ROPE_ORDERING_NEOX) {
+        GGML_ASSERT((n_dims % 2) == 0);
         idx_pair  = n_dims / 2;
         idx_scale = 2;
     }
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- ggml/src/ggml-cpu/ops.cpp (1 hunks)
- ggml/src/ggml.c (5 hunks)
- tests/test-rope.cpp (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{cpp,h,hpp}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{cpp,h,hpp}: Always format C++ code using `git clang-format` before committing, following .clang-format configuration (4-space indentation, 120 column limit, braces on same line for functions, pointer/reference alignment in middle)
Minimize external dependencies; avoid adding new external dependencies unless absolutely necessary
Files:
- tests/test-rope.cpp
- ggml/src/ggml-cpu/ops.cpp
**/*.{cpp,h,hpp,py}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Ensure cross-platform compatibility by testing code changes on Linux, macOS, and Windows when possible
Files:
- tests/test-rope.cpp
- ggml/src/ggml-cpu/ops.cpp
🧠 Learnings (2)
📓 Common learnings
Learnt from: ngxson
Repo: ngxson/llama.cpp PR: 14
File: examples/lookahead/lookahead.cpp:97-98
Timestamp: 2025-03-13T22:56:47.417Z
Learning: ngxson prefers to prioritize simplification in some cases and defer adding error handling to a later time, as indicated by their response to a suggestion about adding error checks for llama_decode_ext calls.
📚 Learning: 2025-11-29T22:55:53.865Z
Learnt from: CR
Repo: ngxson/llama.cpp PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-29T22:55:53.865Z
Learning: Applies to src/llama.cpp : Maintain core library implementation in `src/llama.cpp` with careful attention to API contracts and backward compatibility
Applied to files:
ggml/src/ggml-cpu/ops.cpp
🧬 Code graph analysis (2)
tests/test-rope.cpp (1)
ggml/src/ggml.c (5)
- ggml_rope_multi (4122-4141)
- ggml_rope_comp (4271-4324)
- ggml_rope_comp_set_yarn (4342-4368)
- ggml_rope_comp_set_freq_factors (4326-4340)
- ggml_rope_ext (4175-4193)
ggml/src/ggml.c (2)
ggml/src/ggml-cpu/llamafile/sgemm.cpp (14)
- a (1151-1253), a (1151-1151), a (2188-2274), a (2188-2188), b (600-602), b (600-600), b (604-606), b (604-604), b (608-612), b (608-608), b (614-617), b (614-614), b (964-966), b (964-964)

ggml/src/ggml-impl.h (1)
- ggml_set_op_params (145-149)
🔇 Additional comments (6)
tests/test-rope.cpp (6)
131-160: LGTM! Well-structured test configuration.The
test_ropestruct effectively encapsulates all test parameters and provides clear diagnostic output throughvars(). Theuse_compflag design allows elegant comparison between standard and composable RoPE paths.
161-222: LGTM! Wrapper logic correctly routes between rope implementations.

The wrappers elegantly switch between standard and composable RoPE:

_ggml_rope_multi returns nullptr for rope_comp (not yet supported), handled gracefully by the test skip logic.
_ggml_rope_ext implements the full rope_comp path with proper type casting, ordering selection, YaRN configuration, and frequency factor attachment.
228-315: LGTM! Input and graph building logic is sound.

The methods correctly handle tensor creation and graph construction across all RoPE variants (NORMAL, NEOX, MROPE, VISION, IMROPE). The dimensional splits (n_dims/2 for vision, n_dims for regular MROPE) and section configurations are intentional and correct. Commented-out paths (inplace, backward) are acceptable for incremental development.
317-365: LGTM! Tensor initialization handles F32 and F16 correctly.

The F16 support addition properly converts float values using ggml_fp32_to_fp16. The initialize_tensors method appropriately distinguishes between position tensors (randomized within context), frequency factors (tight range around 1.0), and general parameters (uniform distribution).
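For readers unfamiliar with the F16 path, the conversion the comment refers to typically looks like the sketch below. This is a generic illustration (the helper name and buffer handling are assumptions, and it presumes the tensor data lives in host memory as in the CPU tests), not the actual initialize_tensors code:

```cpp
#include <cstring>
#include <vector>

#include "ggml.h"

// Hypothetical helper (set_f16_tensor is not part of test-rope.cpp): fill an
// F16 tensor from float values, converting each element with ggml_fp32_to_fp16.
static void set_f16_tensor(ggml_tensor * t, const std::vector<float> & values) {
    GGML_ASSERT(t->type == GGML_TYPE_F16);
    GGML_ASSERT((size_t) ggml_nelements(t) == values.size());

    std::vector<ggml_fp16_t> buf(values.size());
    for (size_t i = 0; i < values.size(); ++i) {
        buf[i] = ggml_fp32_to_fp16(values[i]);   // per-element fp32 -> fp16
    }
    std::memcpy(t->data, buf.data(), buf.size() * sizeof(ggml_fp16_t));
}
```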
368-479: LGTM! Comprehensive test suite with sound comparison logic.

The test driver generates extensive coverage across RoPE variants, data types, and parameter combinations. The NMSE-based comparison between the standard and composable RoPE paths is appropriate, with graceful handling of unimplemented paths (nullptr skip) and clear pass/fail logging. The final assertion ensures all tests must pass.
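For reference, NMSE here means normalized mean squared error between the two outputs. A standalone sketch of such a metric (not the exact helper used in test-rope.cpp) might look like this:

```cpp
#include <cstddef>

// Hypothetical NMSE helper (not the exact function used in test-rope.cpp):
// sum of squared differences normalized by the reference signal's energy.
static double nmse(const float * ref, const float * out, size_t n) {
    double err  = 0.0;
    double norm = 0.0;
    for (size_t i = 0; i < n; ++i) {
        const double d = (double) out[i] - (double) ref[i];
        err  += d * d;
        norm += (double) ref[i] * (double) ref[i];
    }
    return err / (norm + 1e-12);   // epsilon guards against an all-zero reference
}
```

A driver would then treat a pair of outputs as matching when the returned value stays below a small threshold, typically with a looser bound for F16 than for F32.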
616-616: LGTM! Clean integration of the new test suite.

The call to test_rope_comp() at the end of main ensures the comprehensive rope/rope_comp comparison tests run after the existing rope tests.
```cpp
template<typename T> //float or ggml_fp16_t
static void ggml_compute_forward_rope_comp_flt(
        const ggml_compute_params * params,
        ggml_tensor * dst,
        const bool forward) {

    const ggml_tensor * src0 = dst->src[0];
    const ggml_tensor * src1 = dst->src[1];
    const ggml_tensor * src2 = dst->src[2];

    GGML_ASSERT(src0->type == GGML_TYPE_F32 || src0->type == GGML_TYPE_F16);
    GGML_ASSERT(src1->type == GGML_TYPE_F32);

    int32_t n_dims, idx_pair, idx_scale, idx_offset;
    float theta_scale, yarn_high, yarn_low, freq_scale, ramp_factor, attn_factor;
    int32_t sections[4];

    memcpy(&n_dims,      (int32_t *)dst->op_params +  0, sizeof(int32_t));
    memcpy(&idx_pair,    (int32_t *)dst->op_params +  1, sizeof(int32_t));
    memcpy(&idx_scale,   (int32_t *)dst->op_params +  2, sizeof(int32_t));
    memcpy(&idx_offset,  (int32_t *)dst->op_params +  3, sizeof(int32_t));
    memcpy(&theta_scale, (int32_t *)dst->op_params +  4, sizeof(float));
    memcpy(&yarn_high,   (int32_t *)dst->op_params +  5, sizeof(float));
    memcpy(&yarn_low,    (int32_t *)dst->op_params +  6, sizeof(float));
    memcpy(&freq_scale,  (int32_t *)dst->op_params +  7, sizeof(float));
    memcpy(&attn_factor, (int32_t *)dst->op_params +  8, sizeof(float));
    memcpy(&ramp_factor, (int32_t *)dst->op_params +  9, sizeof(float));
    memcpy(&sections[0], (int32_t *)dst->op_params + 10, sizeof(int32_t));
    memcpy(&sections[1], (int32_t *)dst->op_params + 11, sizeof(int32_t));
    memcpy(&sections[2], (int32_t *)dst->op_params + 12, sizeof(int32_t));
    memcpy(&sections[3], (int32_t *)dst->op_params + 13, sizeof(int32_t));

    GGML_TENSOR_UNARY_OP_LOCALS

    GGML_ASSERT(nb0 == nb00);
    GGML_ASSERT(nb0 == sizeof(T));

    const int ith = params->ith;
    const int nth = params->nth;

    const int nr = ggml_nrows(dst);

    GGML_ASSERT(n_dims <= ne0);
    GGML_ASSERT(n_dims % 2 == 0);

    // rows per thread
    const int dr = (nr + nth - 1)/nth;

    // row range for this thread
    const int ir0 = dr*ith;
    const int ir1 = MIN(ir0 + dr, nr);

    // row index used to determine which thread to use
    int ir = 0;

    // TODO M-RoPE

    const float * freq_factors = NULL;
    if (src2 != NULL) {
        GGML_ASSERT(src2->type == GGML_TYPE_F32);
        GGML_ASSERT(src2->ne[0] >= n_dims / 2);
        freq_factors = (const float *) src2->data;
    }

    // backward process uses inverse rotation by cos and sin.
    // cos and sin build a rotation matrix, where the inverse is the transpose.
    // this essentially just switches the sign of sin.
    const float sin_sign = forward ? 1.0f : -1.0f;

    const float * pos = (const float *) src1->data;

    auto init_cache = [&](float * cache, float p) -> void {
        for (int64_t i0 = 0; i0 < ne0; i0 += 2) {
            const float freq_factor = freq_factors ? freq_factors[i0/2] : 1.0f;

            float theta = p * powf(theta_scale, i0/2) / freq_factor;
            const float theta_extrap = theta;
            const float theta_interp = freq_scale * theta;
            if (ramp_factor != 0.0f) {
                const float ramp_mix = rope_yarn_ramp(yarn_low, yarn_high, i0) * ramp_factor;
                theta = theta_interp * (1 - ramp_mix) + theta_extrap * ramp_mix;
            } else {
                theta = theta_interp;
            }
            cache[i0 + 0] = cosf(theta) * attn_factor;
            cache[i0 + 1] = sinf(theta) * attn_factor * sin_sign;
        }
```
Limit init_cache loop to n_dims to avoid potential freq_factors OOB when ne0 > n_dims.
init_cache currently iterates i0 up to ne0, but freq_factors is only asserted to have src2->ne[0] >= n_dims / 2. If ne0 > n_dims and src2->ne[0] == n_dims / 2, the access freq_factors[i0/2] reads past the end of freq_factors for i0 >= n_dims. Those extra cache entries are not used (since rotate_pairs is called with n_dims), so it is both safer and slightly cheaper to bound the loop by n_dims instead of ne0.
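To make the bound concrete, here is a small numeric illustration with assumed sizes, chosen only to show where the index leaves the valid range:

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    // Hypothetical sizes: head size 128, of which only 64 dims are rotated,
    // so src2 may legitimately carry just 64/2 = 32 frequency factors.
    const int64_t ne0    = 128;
    const int64_t n_dims = 64;
    const int64_t n_ff   = n_dims / 2;   // valid indices: 0 .. 31

    for (int64_t i0 = 0; i0 < ne0; i0 += 2) {
        const int64_t idx = i0 / 2;      // reaches 63 at i0 == 126
        if (idx >= n_ff) {
            std::printf("i0=%3lld -> freq_factors[%lld] would be out of bounds\n",
                        (long long) i0, (long long) idx);
        }
    }
    return 0;
}
```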
Consider tightening the loop as follows:
Proposed fix: use n_dims in init_cache:

```diff
-    auto init_cache = [&](float * cache, float p) -> void {
-        for (int64_t i0 = 0; i0 < ne0; i0 += 2) {
+    auto init_cache = [&](float * cache, float p) -> void {
+        for (int64_t i0 = 0; i0 < n_dims; i0 += 2) {
             const float freq_factor = freq_factors ? freq_factors[i0/2] : 1.0f;

             float theta = p * powf(theta_scale, i0/2) / freq_factor;
             const float theta_extrap = theta;
             const float theta_interp = freq_scale * theta;
             if (ramp_factor != 0.0f) {
                 const float ramp_mix = rope_yarn_ramp(yarn_low, yarn_high, i0) * ramp_factor;
                 theta = theta_interp * (1 - ramp_mix) + theta_extrap * ramp_mix;
             } else {
                 theta = theta_interp;
             }
             cache[i0 + 0] = cosf(theta) * attn_factor;
             cache[i0 + 1] = sinf(theta) * attn_factor * sin_sign;
         }
     };
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
template<typename T> //float or ggml_fp16_t
static void ggml_compute_forward_rope_comp_flt(
        const ggml_compute_params * params,
        ggml_tensor * dst,
        const bool forward) {

    const ggml_tensor * src0 = dst->src[0];
    const ggml_tensor * src1 = dst->src[1];
    const ggml_tensor * src2 = dst->src[2];

    GGML_ASSERT(src0->type == GGML_TYPE_F32 || src0->type == GGML_TYPE_F16);
    GGML_ASSERT(src1->type == GGML_TYPE_F32);

    int32_t n_dims, idx_pair, idx_scale, idx_offset;
    float theta_scale, yarn_high, yarn_low, freq_scale, ramp_factor, attn_factor;
    int32_t sections[4];

    memcpy(&n_dims,      (int32_t *)dst->op_params +  0, sizeof(int32_t));
    memcpy(&idx_pair,    (int32_t *)dst->op_params +  1, sizeof(int32_t));
    memcpy(&idx_scale,   (int32_t *)dst->op_params +  2, sizeof(int32_t));
    memcpy(&idx_offset,  (int32_t *)dst->op_params +  3, sizeof(int32_t));
    memcpy(&theta_scale, (int32_t *)dst->op_params +  4, sizeof(float));
    memcpy(&yarn_high,   (int32_t *)dst->op_params +  5, sizeof(float));
    memcpy(&yarn_low,    (int32_t *)dst->op_params +  6, sizeof(float));
    memcpy(&freq_scale,  (int32_t *)dst->op_params +  7, sizeof(float));
    memcpy(&attn_factor, (int32_t *)dst->op_params +  8, sizeof(float));
    memcpy(&ramp_factor, (int32_t *)dst->op_params +  9, sizeof(float));
    memcpy(&sections[0], (int32_t *)dst->op_params + 10, sizeof(int32_t));
    memcpy(&sections[1], (int32_t *)dst->op_params + 11, sizeof(int32_t));
    memcpy(&sections[2], (int32_t *)dst->op_params + 12, sizeof(int32_t));
    memcpy(&sections[3], (int32_t *)dst->op_params + 13, sizeof(int32_t));

    GGML_TENSOR_UNARY_OP_LOCALS

    GGML_ASSERT(nb0 == nb00);
    GGML_ASSERT(nb0 == sizeof(T));

    const int ith = params->ith;
    const int nth = params->nth;

    const int nr = ggml_nrows(dst);

    GGML_ASSERT(n_dims <= ne0);
    GGML_ASSERT(n_dims % 2 == 0);

    // rows per thread
    const int dr = (nr + nth - 1)/nth;

    // row range for this thread
    const int ir0 = dr*ith;
    const int ir1 = MIN(ir0 + dr, nr);

    // row index used to determine which thread to use
    int ir = 0;

    // TODO M-RoPE

    const float * freq_factors = NULL;
    if (src2 != NULL) {
        GGML_ASSERT(src2->type == GGML_TYPE_F32);
        GGML_ASSERT(src2->ne[0] >= n_dims / 2);
        freq_factors = (const float *) src2->data;
    }

    // backward process uses inverse rotation by cos and sin.
    // cos and sin build a rotation matrix, where the inverse is the transpose.
    // this essentially just switches the sign of sin.
    const float sin_sign = forward ? 1.0f : -1.0f;

    const float * pos = (const float *) src1->data;

    auto init_cache = [&](float * cache, float p) -> void {
        for (int64_t i0 = 0; i0 < n_dims; i0 += 2) {
            const float freq_factor = freq_factors ? freq_factors[i0/2] : 1.0f;

            float theta = p * powf(theta_scale, i0/2) / freq_factor;
            const float theta_extrap = theta;
            const float theta_interp = freq_scale * theta;
            if (ramp_factor != 0.0f) {
                const float ramp_mix = rope_yarn_ramp(yarn_low, yarn_high, i0) * ramp_factor;
                theta = theta_interp * (1 - ramp_mix) + theta_extrap * ramp_mix;
            } else {
                theta = theta_interp;
            }
            cache[i0 + 0] = cosf(theta) * attn_factor;
            cache[i0 + 1] = sinf(theta) * attn_factor * sin_sign;
        }
```
🤖 Prompt for AI Agents
In ggml/src/ggml-cpu/ops.cpp around lines 5822 to 5907, the init_cache lambda
iterates i0 up to ne0 which can index freq_factors out of bounds when ne0 >
n_dims and src2 only supplies n_dims/2 entries; change the loop bound to use
n_dims (i0 < n_dims) so the cache is only initialized for the used dimension
range and avoid reading past freq_factors, while leaving the rest of the logic
unchanged.
@coderabbitai pause |
Actionable comments posted: 0
♻️ Duplicate comments (2)
ggml/src/ggml-metal/ggml-metal.metal (1)
4119-4147: Fix freq_factor indexing to respect idx_scale (align with other RoPE kernels and the CPU path)

In kernel_rope_comp, freq_factor is currently indexed with i0/2:

```metal
const float freq_factor = args.src2 ? ((device const float *) src2)[i0/2] : 1.0f;
...
const int ic = i0 / args.idx_scale;
```

But src/dst access uses ic = i0 / args.idx_scale. When args.idx_scale != 2, i0/2 and ic diverge, so the frequency factors are taken from the wrong positions relative to the channels being rotated. All other RoPE kernels index src2 with the same index used for data access (ic), so this inconsistency will produce incorrect rotations in the Metal ROPE_COMP path.

Compute ic first and use it for freq_factor to keep the RoPE/YaRN math coherent with the CPU implementation and the existing Metal kernels:

Proposed fix:
```diff
-    for (int i0 = 2*tiitg; i0 < args.ne0; i0 += 2*tptg.x) {
-        if (i0 < args.n_dims) {
-            const float freq_factor = args.src2 ? ((device const float *) src2)[i0/2] : 1.0f;
+    for (int i0 = 2*tiitg; i0 < args.ne0; i0 += 2*tptg.x) {
+        if (i0 < args.n_dims) {
+            const int   ic = i0 / args.idx_scale;
+            const float freq_factor = args.src2 ? ((device const float *) src2)[ic] : 1.0f;
@@
-            const float cos_theta = cos(theta) * args.attn_factor;
-            const float sin_theta = sin(theta) * args.attn_factor;
-
-            const int ic = i0 / args.idx_scale;
+            const float cos_theta = cos(theta) * args.attn_factor;
+            const float sin_theta = sin(theta) * args.attn_factor;
```

This change makes freq_factor selection match the layout implied by idx_scale and the way src/dst are addressed.

ggml/src/ggml-cpu/ops.cpp (1)
5880-5904: Potential out-of-bounds read on freq_factors when ne0 > n_dims

init_cache iterates i0 up to ne0, but freq_factors is only asserted to have src2->ne[0] >= n_dims / 2. If ne0 > n_dims and src2->ne[0] == n_dims / 2, the access freq_factors[i0/2] for i0 >= n_dims can read past the end of freq_factors. Those extra cache entries are not used (since rotate_pairs is bounded by n_dims), so this is pure risk without benefit.

Tighten the loop to n_dims to avoid undefined behavior:

Proposed fix:

```diff
-    auto init_cache = [&](float * cache, float p) -> void {
-        for (int64_t i0 = 0; i0 < ne0; i0 += 2) {
+    auto init_cache = [&](float * cache, float p) -> void {
+        for (int64_t i0 = 0; i0 < n_dims; i0 += 2) {
             const float freq_factor = freq_factors ? freq_factors[i0/2] : 1.0f;
@@
             cache[i0 + 0] = cosf(theta) * attn_factor;
             cache[i0 + 1] = sinf(theta) * attn_factor * sin_sign;
         }
     };
```
🧹 Nitpick comments (2)
ggml/src/ggml-cpu/ops.cpp (2)
5893-5903: Optional: avoid powf per dimension in init_cache

theta is recomputed as p * powf(theta_scale, i0/2) for every dimension. The existing ggml_rope_cache_init path maintains theta multiplicatively (theta *= theta_scale), which is cheaper and avoids repeated powf calls.

Not a correctness issue, but you could mirror that pattern here for consistency and perf:

Possible refactor sketch:

```diff
-    auto init_cache = [&](float * cache, float p) -> void {
-        for (int64_t i0 = 0; i0 < n_dims; i0 += 2) {
-            const float freq_factor = freq_factors ? freq_factors[i0/2] : 1.0f;
-
-            float theta = p * powf(theta_scale, i0/2) / freq_factor;
+    auto init_cache = [&](float * cache, float p) -> void {
+        float theta = p;
+        for (int64_t i0 = 0; i0 < n_dims; i0 += 2) {
+            const float freq_factor = freq_factors ? freq_factors[i0/2] : 1.0f;
+
+            float theta_extrap = theta / freq_factor;
-            const float theta_extrap = theta;
-            const float theta_interp = freq_scale * theta;
+            const float theta_interp = freq_scale * theta_extrap;
@@
-            cache[i0 + 0] = cosf(theta) * attn_factor;
-            cache[i0 + 1] = sinf(theta) * attn_factor * sin_sign;
+            cache[i0 + 0] = cosf(theta) * attn_factor;
+            cache[i0 + 1] = sinf(theta) * attn_factor * sin_sign;
+
+            theta *= theta_scale;
         }
     };
```

(Adjust naming to taste; the core idea is to reuse theta as in ggml_rope_cache_init.)
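To sanity-check the equivalence this refactor relies on, a standalone snippet (independent of ggml, with arbitrary values) can compare the two ways of generating the angle sequence:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>

int main() {
    // Arbitrary illustrative values: theta_scale plays the role of the
    // per-pair decay factor, p the position.
    const float theta_scale = 0.85f;
    const float p           = 7.0f;
    const int   n_dims      = 16;

    float theta_mul = p;      // multiplicative accumulation, as in ggml_rope_cache_init
    float max_diff  = 0.0f;

    for (int i0 = 0; i0 < n_dims; i0 += 2) {
        // per-dimension power, as in the current init_cache
        const float theta_pow = p * std::pow(theta_scale, (float) (i0/2));
        max_diff = std::max(max_diff, std::fabs(theta_pow - theta_mul));
        theta_mul *= theta_scale;   // advance to the next pair's angle
    }

    // the two sequences agree up to float rounding
    std::printf("max |pow - multiplicative| = %g\n", (double) max_diff);
    return 0;
}
```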
5877-5879: M‑RoPE / sections parameters are currently unused
sections[] is read from op_params and there are // TODO M-RoPE comments, but the current implementation of ggml_compute_forward_rope_comp_flt never uses sections or applies multi-section RoPE logic. If higher-level code passes nontrivial section configs or expects M‑RoPE behavior, those settings will be silently ignored.

If this is intentional for the demo, consider documenting that ROPE_COMP currently handles only the simple (non‑M‑RoPE) layouts, or wire sections through to a multi-section init_cache to match the standard ggml_mrope_cache_init behavior.

Also applies to: 5921-5937
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
ggml/src/ggml-cpu/ops.cpp (1 hunks)
ggml/src/ggml-metal/ggml-metal.metal (3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{cpp,h,hpp}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{cpp,h,hpp}: Always format C++ code using git clang-format before committing, following .clang-format configuration (4-space indentation, 120 column limit, braces on same line for functions, pointer/reference alignment in middle)
Minimize external dependencies; avoid adding new external dependencies unless absolutely necessary
Files:
ggml/src/ggml-cpu/ops.cpp
**/*.{cpp,h,hpp,py}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Ensure cross-platform compatibility by testing code changes on Linux, macOS, and Windows when possible
Files:
ggml/src/ggml-cpu/ops.cpp
🧠 Learnings (2)
📓 Common learnings
Learnt from: ngxson
Repo: ngxson/llama.cpp PR: 14
File: examples/lookahead/lookahead.cpp:97-98
Timestamp: 2025-03-13T22:56:47.417Z
Learning: ngxson prefers to prioritize simplification in some cases and defer adding error handling to a later time, as indicated by their response to a suggestion about adding error checks for llama_decode_ext calls.
📚 Learning: 2025-11-29T22:55:53.865Z
Learnt from: CR
Repo: ngxson/llama.cpp PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-29T22:55:53.865Z
Learning: Applies to src/llama.cpp : Maintain core library implementation in `src/llama.cpp` with careful attention to API contracts and backward compatibility
Applied to files:
ggml/src/ggml-cpu/ops.cpp
🧬 Code graph analysis (1)
ggml/src/ggml-cpu/ops.cpp (2)
ggml/src/ggml-cuda/rope.cu (1)
rope_yarn_ramp (15-15)
ggml/src/ggml-sycl/rope.cpp (2)
rope_yarn_ramp (13-16), rope_yarn_ramp (13-13)
🔇 Additional comments (1)
ggml/src/ggml-metal/ggml-metal.metal (1)
4417-4433: ROPE_COMP kernel typedef and host bindings look consistent

The kernel_rope_comp_t typedef and the kernel_rope_comp_f32/kernel_rope_comp_f16 host_name instantiations follow the same pattern as the existing RoPE kernels (norm, neox, multi, vision) and correctly expose both float and half variants. No issues here.
✅ Actions performed

Reviews paused.
Summary by CodeRabbit
Release Notes
New Features
Tests