The version of llama.cpp in the python package is from 4 months ago, while the version in node-llama-cpp is from a few days ago.
I've seen a few fixes implemented in llama.cpp over the last few months that improve stability and correctness, which might explain the difference you're seeing here. It would help if you could compare the performance with older versions of node-llama-cpp, specifically 3.12.3, since it uses a llama.cpp version released shortly after the one used in the python package.

How did you measure the speed in node-llama-cpp? Have you excluded the time it takes to load the model and the context?
Also, the first token might take some time to generate since things are still loading during…
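
For reference, a rough way to time only the generation in node-llama-cpp, keeping model and context loading out of the measurement, could look something like the sketch below (based on the v3 API; the model path and prompt are placeholders):

```typescript
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();

// Load the model and create the context up front,
// so their load time isn't counted in the generation speed
const model = await llama.loadModel({
    modelPath: "path/to/model.gguf" // placeholder
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});

// Time only the prompt processing + generation
const startTime = performance.now();
const response = await session.prompt("Write a short story about a robot.");
const elapsedSeconds = (performance.now() - startTime) / 1000;

const generatedTokens = model.tokenize(response).length;
console.log(response);
console.log(`~${(generatedTokens / elapsedSeconds).toFixed(2)} tokens/sec over ${elapsedSeconds.toFixed(2)}s`);
```

Note that this still includes prompt processing in the measured time, so the first run will look slower; running the same prompt a second time (or measuring from the first generated token) gives a better picture of the raw generation speed.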
