42 commits
e916d21
Add Apriel2 conversion documentation and supernet pruning examples
tscholak Jan 10, 2026
34fe694
Add vLLM model implementation for Apriel2 with plugin-based registration
tscholak Jan 12, 2026
527b692
Refactor vLLM Apriel2 weight loading and add test script
tscholak Jan 15, 2026
86622a5
Merge branch 'main' into feature/vllm-apriel2-models
tscholak Jan 15, 2026
8689afd
Merge branch 'main' into feature/vllm-apriel2-models
tscholak Jan 15, 2026
e8b93e0
apriel2 modeling bug
oleksost Jan 16, 2026
303206b
kda fix
oleksost Jan 16, 2026
cd7f314
Require CUDA kernels with no silent fallbacks
tscholak Jan 17, 2026
9af693c
Fix GDN chunk mode to use initial_state from cache
tscholak Jan 17, 2026
fe64f6c
Add extended cache tests for GDN and KDA equivalence
tscholak Jan 17, 2026
f6989c6
Merge cache and non-cache equivalence tests with proper Apriel2Cache
tscholak Jan 17, 2026
05cdfdd
Refactor mixer tests and fix KDA mode selection
tscholak Jan 17, 2026
66f4696
Remove redundant test_causal_conv1d.py
tscholak Jan 17, 2026
24f6133
Consolidate cache.py into modeling_apriel2.py
tscholak Jan 17, 2026
f25c24e
Enable bf16 tests and fix dtype handling
tscholak Jan 17, 2026
5669a35
Refactor test_mixer_equivalence.py: extract config fixtures and helpers
tscholak Jan 17, 2026
068d001
Merge branch 'fix/require-cuda-kernels-no-fallbacks' into feature/vll…
tscholak Jan 18, 2026
b6449db
chunked prefil mode recurrent state
oleksost Jan 19, 2026
d4d93e1
Fix rope_theta parameter and improve test coverage
tscholak Jan 19, 2026
c3a6b44
Fix GDN/KDA bugs, require CUDA kernels, add cache-aware tests (#451)
tscholak Jan 19, 2026
da1225d
Merge oo/apriel_modeling_bug: GDN/KDA fixes and recurrent state fix
tscholak Jan 19, 2026
5575466
Merge main: apriel2 modeling bug fixes (#450)
tscholak Jan 19, 2026
d62d38a
Merge main: gRMS norm default to silu (#452)
tscholak Jan 19, 2026
8146b45
Fix vLLM KDA norm activation: read from config instead of hardcoding …
tscholak Jan 20, 2026
c7741db
Add vLLM kernel flags and debugging for Apriel2 GDN alignment
tscholak Jan 20, 2026
5c16df1
Add recurrent state debugging for vLLM vs TF comparison
tscholak Jan 20, 2026
806b6a8
Add pure GDN surgery configs and improve debug logging
tscholak Jan 21, 2026
f788afa
Consolidate vLLM debug flags to top-level module constants
tscholak Jan 21, 2026
bde94fc
Remove conditional branch in GDN head expansion for torch.compile
tscholak Jan 21, 2026
639f42d
Re-enable compilation_config in test script
tscholak Jan 21, 2026
d9d1c26
Unify decoder layer forward signatures to eliminate isinstance dispatch
tscholak Jan 21, 2026
63d137d
Add shape invariants to Apriel2Model for torch.compile compatibility
tscholak Jan 21, 2026
707a59d
Comment out debug code to enable torch.compile compatibility
tscholak Jan 21, 2026
8f61023
Add statistical testing infrastructure to test_apriel2.py
tscholak Jan 21, 2026
4ebb282
Add stochastic mixer support for vLLM Apriel2 models
tscholak Jan 21, 2026
cc02501
Refactor unified page size machinery and remove dead code
tscholak Jan 21, 2026
9b2c42d
Add caching for unified page size computation
tscholak Jan 21, 2026
8eca6f3
Unify mixer signatures to use output buffer pattern for placement swi…
tscholak Jan 22, 2026
2972996
Use vLLM plugin system for Apriel2 registration and consolidate tests
tscholak Jan 22, 2026
7ac0262
Remove unused _l2norm functions
tscholak Jan 23, 2026
1582813
Merge origin/main into feature/vllm-apriel2-models
tscholak Jan 23, 2026
eb9360d
Fix apriel2 multimodal test to use AutoModelForImageTextToText
tscholak Jan 23, 2026
19 changes: 19 additions & 0 deletions fast_llm_external_models/apriel2/examples/pure_gdn_step1.yaml
@@ -0,0 +1,19 @@
# Step 1: Convert fixed -> pattern with all GDN blocks
#
# Sets main_mixer_name to gdn for all layers
# Run before pure_gdn_step2.yaml
#
# Usage:
#   python convert.py /tmp/apriel2-0.5b-dev /tmp/apriel2-0.5b-pure-gdn \
#     -s examples/pure_gdn_step1.yaml \
#     -s examples/pure_gdn_step2.yaml

decoder:
  type: pattern
  # Single block type - all layers use GDN
  pattern: [gdn_block]

  blocks:
    gdn_block:
      mixer:
        main_mixer_name: gdn
18 changes: 18 additions & 0 deletions fast_llm_external_models/apriel2/examples/pure_gdn_step2.yaml
@@ -0,0 +1,18 @@
# Step 2: Unwrap stochastic -> pure GDN
#
# Converts stochastic mixer to non-stochastic GDN for all layers
# Run after pure_gdn_step1.yaml
#
# Usage:
#   python convert.py /tmp/apriel2-0.5b-dev /tmp/apriel2-0.5b-pure-gdn \
#     -s examples/pure_gdn_step1.yaml \
#     -s examples/pure_gdn_step2.yaml

decoder:
  blocks:
    gdn_block:
      mixer:
        type: gdn
        init: transfer
        convolution_layer:
          kernel_size: 4
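
The two example files are meant to be applied in order (step 1, then step 2). The following is a minimal sketch of how that ordering could play out, assuming each `-s` file acts as a partial config that is recursively merged over the result of the previous one. This is an illustration only: `deep_merge` and `apply_surgeries` are hypothetical helpers, not functions from `convert.py`, and the real converter may also transfer weights and drop or rewrite keys in ways a plain merge does not.

```python
from copy import deepcopy


def deep_merge(base, override):
    """Return a new dict with `override` recursively merged into `base`."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, and new keys: the override wins.
            merged[key] = deepcopy(value)
    return merged


def apply_surgeries(config, surgeries):
    """Apply partial configs in order (hypothetical helper, assumed semantics)."""
    for surgery in surgeries:
        config = deep_merge(config, surgery)
    return config


# Contents of pure_gdn_step1.yaml, written as a Python dict.
STEP1 = {
    "decoder": {
        "type": "pattern",
        "pattern": ["gdn_block"],
        "blocks": {"gdn_block": {"mixer": {"main_mixer_name": "gdn"}}},
    }
}

# Contents of pure_gdn_step2.yaml, written as a Python dict.
STEP2 = {
    "decoder": {
        "blocks": {
            "gdn_block": {
                "mixer": {
                    "type": "gdn",
                    "init": "transfer",
                    "convolution_layer": {"kernel_size": 4},
                }
            }
        }
    }
}

# Start from an empty config purely for illustration; a real run starts
# from the source checkpoint's config.
result = apply_surgeries({}, [STEP1, STEP2])
mixer = result["decoder"]["blocks"]["gdn_block"]["mixer"]
```

Under this assumed merge, the step 2 keys (`type`, `init`, `convolution_layer`) land in the same `gdn_block` mixer that step 1 introduced, which is why the step 2 file can omit `type: pattern` and `pattern`.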