Hello!
I'm working on a TL1 implementation for the BitNet. I'm using the weights in the .safetensors here: https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-bf16.
I'm quantizing the weights with the absmean() method, following your paper.
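For reference, this is my understanding of the absmean recipe from the b1.58 paper (please correct me if I've misread it):

$$\gamma = \frac{1}{nm}\sum_{ij}\lvert W_{ij}\rvert,\qquad \widetilde{W} = \mathrm{RoundClip}\!\left(\frac{W}{\gamma + \epsilon},\,-1,\,1\right)$$

Here is a snippet of my code: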
```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

std::vector<int8_t> bitnet_158_quantize(const std::vector<float>& weight_array,
                                        float* weight_scale, int M, int K) {
    const float epsilon = 1e-7f;
    const int size = static_cast<int>(weight_array.size());

    // gamma = mean of |W| over the whole [M, K] matrix (absmean)
    float sum_abs = 0.0f;
    for (int m = 0; m < M; m++) {
        for (int k = 0; k < K; k++) {
            sum_abs += std::fabs(weight_array[m * K + k]);
        }
    }
    float gamma = sum_abs / (M * K);
    // gamma = 4.365f;
    weight_scale[0] = gamma;

    // W_q = clip(round(W / (gamma + eps)), -1, 1)
    std::vector<int8_t> quantized_w(size);
    for (int m = 0; m < M; m++) {
        for (int k = 0; k < K; k++) {
            const int idx = m * K + k;
            const float normalized = weight_array[idx] / (gamma + epsilon);
            const float rounded = std::round(normalized);
            quantized_w[idx] = static_cast<int8_t>(
                std::max(-1.0f, std::min(1.0f, rounded)));
        }
    }
    return quantized_w;
}
```
The problem I'm having is that the ternary weights produce results that differ hugely from the original .safetensors. The cosine similarity between my MatMul output using the ternary weights and the output using the reference weights falls below 0.7, which I don't believe preserves the signal, and it leads to numerical explosion in the deeper layers.
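For context, the check I'm running is essentially the following (simplified; the two arguments are the flattened float outputs of the reference matmul and my TL1 kernel):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity between two equally sized output tensors.
float cosine_similarity(const std::vector<float>& a, const std::vector<float>& b) {
    float dot = 0.0f, na = 0.0f, nb = 0.0f;
    for (std::size_t i = 0; i < a.size(); i++) {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    return dot / (std::sqrt(na) * std::sqrt(nb) + 1e-12f);
}
```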
Things I have checked:
- Yes, I'm using the absmean() method as described in your paper.
- Yes, I'm using a global weight scale, which is later applied as a multiplier in my LUT kernel.
- My LUT kernel produces correct results when tested with weights initialized uniformly at random from {-1, 0, 1} and a global weight scale of 1.0 (see the sketch after this list), so I know the other parts of my algorithm are solid.
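This is roughly how I generate the weights for that sanity test:

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Sanity-test weights: uniformly random ternary values in {-1, 0, 1},
// used with a global weight scale fixed to 1.0.
std::vector<int8_t> random_ternary(int M, int K, uint32_t seed = 42) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> dist(-1, 1);
    std::vector<int8_t> w(static_cast<std::size_t>(M) * K);
    for (auto& v : w) v = static_cast<int8_t>(dist(rng));
    return w;
}
```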
Please let me know how I should quantize this BF16 model for TL1. Does it need a block/tile weight scale? Do I need to customize the ggml code to add some magic sauce somewhere?
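In case a per-block scale is the answer, this is the variant I would try (purely hypothetical on my part, not from your paper; block size 32 along K, assuming K is a multiple of the block size):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical per-block variant: one absmean scale per `block_size` weights
// along K, instead of a single global gamma for the whole matrix.
std::vector<int8_t> bitnet_quantize_blocked(const std::vector<float>& w,
                                            std::vector<float>& scales,
                                            int M, int K, int block_size = 32) {
    const float epsilon = 1e-7f;
    const int blocks_per_row = K / block_size;  // assumes K % block_size == 0
    std::vector<int8_t> q(w.size());
    scales.assign(static_cast<std::size_t>(M) * blocks_per_row, 0.0f);
    for (int m = 0; m < M; m++) {
        for (int b = 0; b < blocks_per_row; b++) {
            // Per-block gamma = mean of |W| over this block.
            float sum_abs = 0.0f;
            for (int k = 0; k < block_size; k++)
                sum_abs += std::fabs(w[m * K + b * block_size + k]);
            const float gamma = sum_abs / block_size;
            scales[m * blocks_per_row + b] = gamma;
            // Quantize the block with its own scale.
            for (int k = 0; k < block_size; k++) {
                const int idx = m * K + b * block_size + k;
                const float r = std::round(w[idx] / (gamma + epsilon));
                q[idx] = static_cast<int8_t>(std::max(-1.0f, std::min(1.0f, r)));
            }
        }
    }
    return q;
}
```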
thanks!
David