Why does the Llama architecture use LLAMA_ROPE_TYPE_NORM? #18127
I've found that Hugging Face implements NeoX-style RoPE for Llama-based models (as quoted below). However, in llama.cpp these models are assigned LLAMA_ROPE_TYPE_NORM, and when I manually switch the RoPE type to ROPE_TYPE_NEOX and run the PPL evaluation in llama.cpp, performance degrades severely. Could anyone explain the underlying reason for applying a different RoPE type than PyTorch?
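For context, the two layouts differ only in which coordinates get paired: NORM rotates adjacent pairs `(x[2i], x[2i+1])`, while NeoX-style (what HF's `rotate_half` computes) rotates split-half pairs `(x[i], x[i + d/2])`. A minimal numpy sketch (the function names `rope_norm`/`rope_neox` are mine, not llama.cpp's) showing that the two coincide once the channels are interleaved:

```python
import numpy as np

def rope_norm(x, theta):
    # "Normal" RoPE: rotate adjacent pairs (x[2i], x[2i+1]) by theta[i].
    out = x.copy()
    for i in range(x.shape[-1] // 2):
        c, s = np.cos(theta[i]), np.sin(theta[i])
        x0, x1 = x[..., 2 * i], x[..., 2 * i + 1]
        out[..., 2 * i] = x0 * c - x1 * s
        out[..., 2 * i + 1] = x0 * s + x1 * c
    return out

def rope_neox(x, theta):
    # NeoX-style RoPE (HF's rotate_half): rotate pairs (x[i], x[i + d/2]).
    out = x.copy()
    half = x.shape[-1] // 2
    for i in range(half):
        c, s = np.cos(theta[i]), np.sin(theta[i])
        x0, x1 = x[..., i], x[..., i + half]
        out[..., i] = x0 * c - x1 * s
        out[..., i + half] = x0 * s + x1 * c
    return out

d = 8
theta = 10000.0 ** (-np.arange(d // 2) * 2.0 / d)  # angles at position 1
q = np.random.default_rng(0).standard_normal(d)

# Interleave the two halves: [0, d/2, 1, d/2+1, ...]
perm = np.arange(d).reshape(2, d // 2).T.reshape(-1)
assert np.allclose(rope_norm(q[perm], theta), rope_neox(q, theta)[perm])
```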
Because Q/K gets permuted on conversion: llama.cpp/convert_hf_to_gguf.py, lines 2506 to 2509 in 3d86c6c
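The referenced converter code is the Q/K row permutation. A paraphrased sketch of that helper (check the pinned lines for the exact version in the repo):

```python
import torch

def permute(weights: torch.Tensor, n_head: int, n_head_kv: int | None) -> torch.Tensor:
    # Under GQA the K projection has n_head_kv heads rather than n_head.
    if n_head_kv is not None and n_head != n_head_kv:
        n_head = n_head_kv
    # Per head, view the output rows as (2 halves, head_dim/2) and swap to
    # (head_dim/2, 2 halves): row 2i of the result is original row i and
    # row 2i+1 is original row i + head_dim/2, i.e. the two halves of each
    # head get interleaved.
    return (weights.reshape(n_head, 2, weights.shape[0] // n_head // 2, *weights.shape[1:])
                   .swapaxes(1, 2)
                   .reshape(weights.shape))
```

Because the Q/K rows stored in the GGUF are already interleaved this way, llama.cpp's adjacent-pair (NORM) rotation ends up rotating exactly the coordinates that HF's `rotate_half` (NeoX) rotation would have. Switching the runtime RoPE type to NEOX without reconverting applies the split-half rotation on top of already-permuted weights, so mismatched pairs get rotated, which is why your PPL collapses.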