add Llama Stack quickstart guide and notebook demo #107
Conversation
Walkthrough

Adds three new documentation artifacts: a user guide for creating an AI agent with LlamaStack, a sample YAML stack configuration, and a Jupyter quickstart notebook demonstrating server startup, agent/tool setup, streaming, session handling, and a FastAPI chat endpoint.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User as User
    participant Notebook as FastAPI\n(Notebook/API)
    participant Agent as Agent
    participant Model as Model\n(LLM)
    participant Tool as Tool\n(get_weather)
    User->>Notebook: POST /chat {message}
    Notebook->>Agent: Start session / enqueue message
    Agent->>Model: Request response (streaming)
    Agent->>Tool: Invoke get_weather(...) if tool required
    Tool-->>Agent: Return tool result
    Model-->>Agent: Stream tokens/results
    Agent-->>Notebook: Stream aggregated response
    Notebook-->>User: Stream partial/final responses
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@docs/public/llama-stack/llama_stack_config.yaml`:
- Around line 1-60: The metadata_store block omits an explicit db_path; add a
db_path entry to metadata_store mirroring the pattern used for vector_io and
files so it reads metadata_store: type: sqlite and db_path:
${env.SQLITE_STORE_DIR:~/.llama/distributions/llama-stack-demo}/registry.db
(update the metadata_store section in the YAML to include this db_path key).
In `@docs/public/llama-stack/llama_stack_quickstart.ipynb`:
- Around line 462-467: Update the notebook metadata kernelspec so the kernel
name and display_name reflect the Llama Stack quickstart (e.g., change
kernelspec.name from "langchain-demo" and kernelspec.display_name from "Python
(langchain-demo)" to a clearer identifier like "llama-stack" and "Python (Llama
Stack)" respectively) by editing the kernelspec block in the notebook metadata.
- Around line 122-148: The docstring for get_weather promises wind speed but the
returned dict only contains city, temperature, and humidity; update the function
to include wind speed by extracting it from the parsed API response (e.g.,
current['windspeedKmph'] or current['windspeedMiles'] depending on desired
units) and add a 'wind_speed' key to the returned dictionary, or alternatively
remove the "wind speed" mention from the docstring to make it match the existing
return value.
- Around line 194-208: Agent creation uses model_id which may be undefined if
the model listing try block failed; move the Agent(...) creation (the Agent
instantiation that references model_id, client, get_weather and instructions)
inside the try block that sets model_id or add an early exit/conditional guard
after the except (e.g., return or raise) so Agent(...) is only called when
model_id is successfully set; ensure you reference the same Agent(...) call and
the model_id assignment to relocate or gate the creation.
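The three fixes above are easier to follow with the target shape in view, so here are minimal sketches. First, the corrected `metadata_store` block for `llama_stack_config.yaml`, reusing the env-default pattern the prompt cites for `vector_io` and `files`:

```yaml
# Sketch of the fixed metadata_store section (path taken from the prompt above).
metadata_store:
  type: sqlite
  db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/llama-stack-demo}/registry.db
```

For the kernelspec fix, the notebook metadata would end up looking like this; the identifiers are the examples suggested above, not mandated names:

```json
"kernelspec": {
  "display_name": "Python (Llama Stack)",
  "language": "python",
  "name": "llama-stack"
}
```

And for `get_weather`, a sketch of the first option (adding wind speed rather than trimming the docstring), assuming the notebook's existing parsing of the wttr.in `j1` response into a `current` dict:

```python
# Assumes `current` is the parsed wttr.in current_condition entry.
return {
    'city': city,
    'temperature': f"{current['temp_C']}°C",
    'humidity': f"{current['humidity']}%",
    'wind_speed': f"{current['windspeedKmph']} km/h",  # the key the docstring promises
}
```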
🧹 Nitpick comments (2)
docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md (1)
41-44: Consider varying the link descriptions. All four resource links begin with "Llama Stack", which creates repetition. You could vary the wording:
💡 Suggested rewording
```diff
-- [Llama Stack Documentation](https://llamastack.github.io/docs) - The official Llama Stack documentation covering all usage-related topics, API providers, and core concepts.
-- [Llama Stack Core Concepts](https://llamastack.github.io/docs/concepts) - Deep dive into Llama Stack architecture, API stability, and resource management.
-- [Llama Stack GitHub Repository](https://github.com/llamastack/llama-stack) - Source code, example applications, distribution configurations, and how to add new API providers.
-- [Llama Stack Example Apps](https://github.com/llamastack/llama-stack-apps/) - Official examples demonstrating how to use Llama Stack in various scenarios.
+- [Official Documentation](https://llamastack.github.io/docs) - Covers all usage-related topics, API providers, and core concepts.
+- [Core Concepts Guide](https://llamastack.github.io/docs/concepts) - Deep dive into architecture, API stability, and resource management.
+- [GitHub Repository](https://github.com/llamastack/llama-stack) - Source code, example applications, and distribution configurations.
+- [Example Applications](https://github.com/llamastack/llama-stack-apps/) - Official examples demonstrating various use cases.
```

docs/public/llama-stack/llama_stack_quickstart.ipynb (1)
325-343: Consider session management for the chat endpoint. The `/chat` endpoint creates a new session for every request (line 328). For a demo this works, but in production:
- Sessions accumulate without cleanup
- Conversation context is lost between requests
For a production-ready version, consider reusing sessions or implementing session cleanup:
```python
# Option 1: Single shared session (simple approach)
_session_id = None

@api_app.post("/chat")
async def chat(request: ChatRequest):
    global _session_id
    if _session_id is None:
        _session_id = agent.create_session('fastapi-weather-session')
    # ... rest of the code using _session_id
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md
- docs/public/llama-stack/llama_stack_config.yaml
- docs/public/llama-stack/llama_stack_quickstart.ipynb
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-01-13T11:25:34.596Z
Learnt from: jing2uo
Repo: alauda/knowledge PR: 104
File: docs/en/solutions/How_to_Migrate_VirtualMachine_From_VMware.md:131-172
Timestamp: 2026-01-13T11:25:34.596Z
Learning: In VMware migration documentation (docs/en/solutions), when describing the Forklift Operator workflow for VMware, specify that the VMware provider secret should set insecureSkipVerify=true to accommodate self-signed certificates commonly used in enterprise vCenter/ESXi environments. Include a note on the security trade-offs and ensure readers understand this is for internal, controlled environments.
Applied to files:
docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md
📚 Learning: 2025-12-30T08:43:22.275Z
Learnt from: davidwtf
Repo: alauda/knowledge PR: 95
File: docs/public/langchain/langchain_quickstart.ipynb:163-171
Timestamp: 2025-12-30T08:43:22.275Z
Learning: When reviewing LangChain notebooks, validate the agent creation pattern: ensure code uses create_agent(model=..., tools=[...], system_prompt="...") with model being a chat model instance (e.g., ChatOpenAI) and system_prompt provided as a string describing system instructions. This reflects a standard agent construction and can be applied across LangChain notebook files.
Applied to files:
docs/public/llama-stack/llama_stack_quickstart.ipynb
🪛 LanguageTool
docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md
[style] ~43-~43: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... stability, and resource management. - [Llama Stack GitHub Repository](https://github...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~44-~44: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...s, and how to add new API providers. - [Llama Stack Example Apps](https://github.com/...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🪛 Ruff (0.14.11)
docs/public/llama-stack/llama_stack_quickstart.ipynb
42-42: Do not catch blind exception: Exception
(BLE001)
43-43: Use explicit conversion flag
Replace with conversion flag
(RUF010)
59-59: Abstract raise to an inner function
(TRY301)
59-59: Create your own exception
(TRY002)
59-59: Avoid specifying long messages outside the exception class
(TRY003)
73-73: Abstract raise to an inner function
(TRY301)
73-73: Create your own exception
(TRY002)
73-73: Avoid specifying long messages outside the exception class
(TRY003)
78-78: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (3)
docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md (1)
1-44: Documentation is clear and comprehensive. The guide covers all essential aspects: overview, prerequisites, quickstart instructions, and additional resources. The structure is logical and user-friendly.
docs/public/llama-stack/llama_stack_quickstart.ipynb (2)
85-106: LGTM! Import setup is appropriate for notebook environments. The custom path manipulation to include `~/packages` aligns with the `--target ~/packages` flag used in the pip install cell, which is a valid pattern for restricted notebook environments.
358-384: LGTM! Server startup approach is appropriate for notebook demo. The daemon thread approach with clear comments about production alternatives is well-documented. Using `daemon=True` ensures cleanup when the kernel restarts.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
| "except Exception as e:\n", | ||
| " print(f'Failed to get model list: {e}')\n", | ||
| " print('Make sure the server is running')\n", | ||
| "\n", | ||
| "\n", | ||
| "# Create Agent\n", | ||
| "print('Creating Agent...')\n", | ||
| "agent = Agent(\n", | ||
| " client,\n", | ||
| " model=model_id,\n", | ||
| " instructions='You are a helpful weather assistant. When users ask about weather, use the get_weather tool to query weather information, then answer based on the query results.',\n", | ||
| " tools=[get_weather],\n", | ||
| ")\n", | ||
| "\n", | ||
| "print('Agent created successfully')" |
Agent creation may fail if model listing failed.
The agent creation at lines 199-208 uses `model_id`, which is only defined inside the try block (line 191). If the model listing fails, `model_id` will be undefined and agent creation will raise a `NameError`.
🔧 Suggested fix: Move agent creation inside the try block or add early exit
```diff
 except Exception as e:
     print(f'Failed to get model list: {e}')
     print('Make sure the server is running')
+    raise  # Re-raise to prevent subsequent cells from failing

 # Create Agent
```

Or wrap agent creation in a conditional:

```diff
+if 'model_id' in dir():
     # Create Agent
     print('Creating Agent...')
     agent = Agent(
         client,
         model=model_id,
         instructions='You are a helpful weather assistant...',
         tools=[get_weather],
     )
     print('Agent created successfully')
+else:
+    print('Skipping agent creation - no model available')
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "except Exception as e:\n", | |
| " print(f'Failed to get model list: {e}')\n", | |
| " print('Make sure the server is running')\n", | |
| "\n", | |
| "\n", | |
| "# Create Agent\n", | |
| "print('Creating Agent...')\n", | |
| "agent = Agent(\n", | |
| " client,\n", | |
| " model=model_id,\n", | |
| " instructions='You are a helpful weather assistant. When users ask about weather, use the get_weather tool to query weather information, then answer based on the query results.',\n", | |
| " tools=[get_weather],\n", | |
| ")\n", | |
| "\n", | |
| "print('Agent created successfully')" | |
| except Exception as e: | |
| print(f'Failed to get model list: {e}') | |
| print('Make sure the server is running') | |
| raise # Re-raise to prevent subsequent cells from failing | |
| # Create Agent | |
| print('Creating Agent...') | |
| agent = Agent( | |
| client, | |
| model=model_id, | |
| instructions='You are a helpful weather assistant. When users ask about weather, use the get_weather tool to query weather information, then answer based on the query results.', | |
| tools=[get_weather], | |
| ) | |
| print('Agent created successfully') |
🤖 Prompt for AI Agents
In `@docs/public/llama-stack/llama_stack_quickstart.ipynb` around lines 194 - 208,
Agent creation uses model_id which may be undefined if the model listing try
block failed; move the Agent(...) creation (the Agent instantiation that
references model_id, client, get_weather and instructions) inside the try block
that sets model_id or add an early exit/conditional guard after the except
(e.g., return or raise) so Agent(...) is only called when model_id is
successfully set; ensure you reference the same Agent(...) call and the model_id
assignment to relocate or gate the creation.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md`:
- Around line 68-70: Remove the leading space in the environment variable value
for VLLM_URL so the provider URL is correct; locate the VLLM_URL entry (the
name: VLLM_URL and its value field) and change the value from "
https://api.deepseek.com/v1" to "https://api.deepseek.com/v1" (no leading
whitespace).
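Concretely, the corrected env entry would read (a sketch of just the fixed lines):

```yaml
- name: VLLM_URL
  value: "https://api.deepseek.com/v1"   # no leading space before the URL
```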
🧹 Nitpick comments (2)
docs/public/llama-stack/llama_stack_quickstart.ipynb (2)
129-148: Docstring promises “description” but the return payload omits it. Either remove "description" from the docstring or include it in the returned dict to avoid confusing tool consumers.
♻️ Proposed fix (add description)
```diff
-    return {
-        'city': city,
-        'temperature': f"{current['temp_C']}°C",
-        'humidity': f"{current['humidity']}%",
-    }
+    description = current.get('weatherDesc', [{}])[0].get('value', '')
+    return {
+        'city': city,
+        'temperature': f"{current['temp_C']}°C",
+        'description': description,
+        'humidity': f"{current['humidity']}%",
+    }
```
199-202: Prefer bare `raise` to preserve traceback. `raise e` resets the traceback and makes debugging harder.

♻️ Proposed fix

```diff
-    raise e
+    raise
```
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md`:
- Around line 68-73: Replace the inline VLLM_API_TOKEN value with a Kubernetes
Secret reference instead of plaintext: update the env var entry for
VLLM_API_TOKEN in the LlamaStackDistribution CRD/spec so it uses
valueFrom.secretKeyRef (referencing your Secret name and key) rather than
setting value: XXX; ensure the Secret contains the token and that the container
spec (where VLLM_URL, VLLM_MAX_TOKENS, and VLLM_API_TOKEN are defined)
references that secret via valueFrom.secretKeyRef to securely inject the token
at runtime.
In `@docs/public/llama-stack/llama_stack_quickstart.ipynb`:
- Around line 128-150: The get_weather function builds a wttr.in URL using the
raw city string which breaks for spaces and non-ASCII characters; update
get_weather to percent-encode the city before interpolating into url (e.g., use
urllib.parse.quote or quote_plus or pass city as a query/path-encoded parameter)
so city names like "New York" and Unicode names are valid; ensure the encoding
is applied to the city variable used in the url =
f'https://wttr.in/{city}?format=j1' construction and keep the existing
timeout/response handling.
- Around line 331-349: The endpoint is defined as async but calls blocking
functions (agent.create_turn and AgentEventLogger.log); change the FastAPI route
handler from "async def chat(request: ChatRequest)" to a synchronous "def
chat(request: ChatRequest)" so FastAPI runs it in a threadpool, keep the body
logic the same (call agent.create_turn(...) and iterate
logger.log(response_stream) directly) and remove any awaits or async-only
constructs; ensure the decorator remains `@api_app.post`("/chat") and the function
name chat, and keep returning the {"response": full_response} dict.
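For the token fix, a minimal sketch of the Secret-based injection; the Secret name `llama-stack-secrets` and key `vllm-api-token` are illustrative placeholders, not names from the guide:

```yaml
- name: VLLM_API_TOKEN
  valueFrom:
    secretKeyRef:
      name: llama-stack-secrets   # hypothetical Secret holding the token
      key: vllm-api-token
```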
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md`:
- Around line 116-120: The extraction step uses tar with --strip-components=1
into ~/python312 but doesn't ensure the target directory exists; update the
documentation step that currently shows "tar -xzf /tmp/python312.tar.gz -C
~/python312 --strip-components=1" to create the directory first (use mkdir -p
~/python312) before running the tar command so extraction won't fail.
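The fixed step would simply read:

```bash
mkdir -p ~/python312
tar -xzf /tmp/python312.tar.gz -C ~/python312 --strip-components=1
```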
♻️ Duplicate comments (2)
docs/public/llama-stack/llama_stack_quickstart.ipynb (2)
137-139: URL-encode the city parameter to handle spaces and unicode characters. City names like "New York" or non-ASCII names will produce invalid URLs. Use `urllib.parse.quote` to encode the city before interpolating into the URL.

🔧 Proposed fix
```diff
+    from urllib.parse import quote
-    url = f'https://wttr.in/{city}?format=j1'
+    url = f'https://wttr.in/{quote(city)}?format=j1'
```
331-349: Use a sync endpoint instead of `async` for blocking I/O. `agent.create_session()`, `agent.create_turn()`, and `AgentEventLogger.log()` are synchronous blocking calls. Using `async def` here blocks the event loop and prevents concurrent request handling. Change to a sync `def` endpoint; FastAPI will automatically run it in a threadpool.

🔧 Proposed fix

```diff
 @api_app.post("/chat")
-async def chat(request: ChatRequest):
+def chat(request: ChatRequest):
     """Chat endpoint that uses the Llama Stack Agent"""
```
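Putting the fix together, a sketch of the full sync handler; the session and turn calls are the ones already used in the notebook, while the event-aggregation line is illustrative (it assumes logger events expose a `content` attribute):

```python
@api_app.post("/chat")
def chat(request: ChatRequest):  # plain def: FastAPI runs blocking handlers in a threadpool
    """Chat endpoint that uses the Llama Stack Agent"""
    session_id = agent.create_session('fastapi-weather-session')
    response_stream = agent.create_turn(
        messages=[{'role': 'user', 'content': request.message}],
        session_id=session_id,
    )
    full_response = ''
    for event in AgentEventLogger().log(response_stream):
        if getattr(event, 'content', None):  # assumption: log events carry text content
            full_response += event.content
    return {"response": full_response}
```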
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.