
Mistral 7B v0.3 Support - Resolves part of #2 #28

Open
davidjpyu wants to merge 1 commit into Infini-AI-Lab:v0.1.0 from davidjpyu:v0.1.0

Conversation

@davidjpyu

As we discussed earlier, Mistral was implemented concurrently on both sides. After comparing the two, there is no meaningful difference in mistral.py, mistral_layer.py, or templates.py, so only auto_model.py is updated here; it has been tested and its output answers correctly.

Although everything works fine overall, one thing to note: generate.py works well with DEVICE = "cuda:0" on line 20, but fails when another GPU is selected, e.g. by changing that line to DEVICE = "cuda:1". The failure occurs at cache.py line 79:
hidden_states = flashinfer.single_prefill_with_kv_cache(
with message:
  File "/home/zhuominc/anaconda3/envs/junpuy_test/lib/python3.10/site-packages/flashinfer/prefill.py", line 186, in single_prefill_with_kv_cache
    packed_custom_mask = packbits(
  File "/home/zhuominc/anaconda3/envs/junpuy_test/lib/python3.10/site-packages/flashinfer/quantization.py", line 65, in packbits
    return _kernels.packbits(x, bitorder)
RuntimeError: PackBits failed with error code an illegal memory access was encountered

I'm not sure whether this is expected behavior or something that needs further investigation.
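
One possible workaround to try, as a sketch only: it assumes the illegal memory access comes from flashinfer launching its kernels on the current CUDA device rather than on the device of its tensor arguments, and the tensor shapes below are placeholders, not the shapes used in cache.py.

import torch
import flashinfer

DEVICE = "cuda:1"
# Assumption: making cuda:1 the current device before any flashinfer call
# may avoid the cross-device illegal memory access seen above.
torch.cuda.set_device(DEVICE)

# Placeholder shapes: [seq_len, num_heads, head_dim] (default NHD layout).
q = torch.randn(128, 32, 128, dtype=torch.float16, device=DEVICE)
k = torch.randn(256, 32, 128, dtype=torch.float16, device=DEVICE)
v = torch.randn(256, 32, 128, dtype=torch.float16, device=DEVICE)

out = flashinfer.single_prefill_with_kv_cache(q, k, v, causal=True)

If this makes cuda:1 work, the underlying issue is likely a device-context mismatch between the tensors and the kernel launch rather than anything specific to the Mistral changes in this PR.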

