Install Docker if it is not already present on the board:

curl -fsSL https://get.docker.com -o get-docker.sh && sudo sh get-docker.sh

Start the container. On an RK3588 board:

docker run -it --name deepseek-r1-1.5b-fp16 --privileged --net=host --device /dev/dri --device /dev/dma_heap --device /dev/rknpu --device /dev/mali0 -v /dev:/dev ghcr.io/lj-hao/rk3588-deepseek-r1-distill-qwen:1.5b-fp16-latest

On an RK3576 board:

docker run -it --name deepseek-r1-1.5b-fp16 --privileged --net=host --device /dev/dri --device /dev/dma_heap --device /dev/rknpu --device /dev/mali0 -v /dev:/dev ghcr.io/lj-hao/rk3576-deepseek-r1-distill-qwen:1.5b-fp16-latest

Note: Once the service is running, you can open http://localhost:8080/docs or http://localhost:8080/redoc to view the API documentation.
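Once the container is up, a quick way to confirm the server is reachable from a script is to request the documented /docs page; the following is a minimal sketch using the requests package, not part of the container image itself:

import requests

# Simple reachability check for the service started above, using the /docs page
# mentioned in the note; any 200 response means the server is up.
resp = requests.get("http://localhost:8080/docs", timeout=5)
print("Server is up" if resp.ok else f"Unexpected status: {resp.status_code}")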
Non-streaming request:

curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "rkllm-model",
    "messages": [
      {"role": "user", "content": "Where is the capital of China?"}
    ],
    "temperature": 1,
    "max_tokens": 512,
    "top_k": 1,
    "stream": false
  }'

Streaming request (set "stream": true and pass -N so curl does not buffer the output):

curl -N http://127.0.0.1:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "rkllm-model",
"messages": [
{"role": "user", "content": "Where is the capital of China?"}
],
"temperature": 1,
"max_tokens": 512,
"top_k": 1,
"stream": true
}'import openai
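With "stream": true the server returns the reply as OpenAI-style server-sent events (lines prefixed with data:, ending with data: [DONE]); this format is assumed here from the OpenAI-compatible API rather than stated explicitly by this guide. A minimal sketch that consumes the stream with the requests package instead of the openai client:

import json
import requests

# Assumed OpenAI-compatible SSE stream: each event is a line "data: {...}",
# terminated by "data: [DONE]".
payload = {
    "model": "rkllm-model",
    "messages": [{"role": "user", "content": "Where is the capital of China?"}],
    "max_tokens": 512,
    "stream": True,
}
with requests.post("http://127.0.0.1:8080/v1/chat/completions", json=payload, stream=True) as r:
    r.raise_for_status()
    for line in r.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        delta = json.loads(data)["choices"][0]["delta"].get("content")
        if delta:
            print(delta, end="", flush=True)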
You can also call the server from Python with the openai client library (pip install openai). Non-streaming example:

import openai

# Configure the OpenAI client to use your local server
client = openai.OpenAI(
    base_url="http://localhost:8080/v1",  # Point to your local server
    api_key="dummy-key"  # The API key can be anything for this local server
)

# Test the API
response = client.chat.completions.create(
    model="rkllm-model",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Where is the capital of China?"}
    ],
    temperature=0.7,
    max_tokens=512
)

print(response.choices[0].message.content)
Streaming example:

import openai

# Configure the OpenAI client to use your local server
client = openai.OpenAI(
    base_url="http://localhost:8080/v1",  # Point to your local server
    api_key="dummy-key"  # The API key can be anything for this local server
)

# Test the API with streaming
response_stream = client.chat.completions.create(
    model="rkllm-model",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Where is the capital of China?"}
    ],
    temperature=0.7,
    max_tokens=512,
    stream=True  # Enable streaming
)

# Process the streaming response
for chunk in response_stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)

Note: A rough estimate of a model's inference speed covers both TTFT (time to first token) and TPOT (time per output token).

Note: You can run python test_inference_speed.py --help to view the available options.
To run the inference speed test, create a virtual environment, install the dependency, and run the script:

python -m venv .env && source .env/bin/activate
pip install requests
python test_inference_speed.py
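For reference, below is a minimal sketch of how TTFT and TPOT can be measured against the streaming endpoint. It is an illustration of the idea, not the contents of test_inference_speed.py; the model name, endpoint, and prompt simply follow the examples above:

import time
import openai

client = openai.OpenAI(base_url="http://localhost:8080/v1", api_key="dummy-key")

start = time.time()
first_token_time = None
chunk_count = 0

# Stream a completion and record when the first chunk arrives.
stream = client.chat.completions.create(
    model="rkllm-model",
    messages=[{"role": "user", "content": "Where is the capital of China?"}],
    max_tokens=512,
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        chunk_count += 1
        if first_token_time is None:
            first_token_time = time.time()
end = time.time()

# TTFT = delay until the first streamed chunk; TPOT = average time per
# subsequent chunk. Each chunk is counted as one token here, which is an
# approximation (a chunk may contain more than one token).
if first_token_time is not None and chunk_count > 1:
    ttft = first_token_time - start
    tpot = (end - first_token_time) / (chunk_count - 1)
    print(f"TTFT: {ttft:.3f} s, TPOT: {tpot * 1000:.1f} ms/chunk over {chunk_count} chunks")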