
Conversation

**@davidwtf** (Contributor) commented Jan 15, 2026

Summary by CodeRabbit

  • Documentation
    • Added a comprehensive guide for creating AI agents with LlamaStack: overview, prerequisites, deployment, quickstart, FAQ, and resources.
    • Added an example stack configuration showcasing providers, storage/persistence backends, environment fallbacks, and model registration.
    • Added an interactive Quickstart notebook with end-to-end examples: tool definition, agent creation, session handling, streaming responses, and a FastAPI deployment example.


**@coderabbitai** (bot) commented Jan 15, 2026

Walkthrough

Adds three new documentation artifacts: a user guide for creating an AI agent with LlamaStack, a sample YAML stack configuration, and a Jupyter quickstart notebook demonstrating server startup, agent/tool setup, streaming, session handling, and a FastAPI chat endpoint.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **User guide**<br>`docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md` | New guide covering overview, prerequisites, operator-based deployment (upload/install/server CR), quickstart notebook workflow, sample LlamaStackDistribution YAML, env var notes, FAQ (Python 3.12), and resources. |
| **Stack configuration**<br>`docs/public/llama-stack/llama_stack_config.yaml` | New YAML (version: 2) defining APIs (inference, agents, safety, tool_runtime, vector_io, files), provider mappings (remote OpenAI/DeepSeek, inline providers), persistence backends (sqlite, KV, SQL), env var fallbacks, and a model entry (deepseek/deepseek-chat). |
| **Notebook / Examples**<br>`docs/public/llama-stack/llama_stack_quickstart.ipynb` | New Jupyter notebook with server startup/install steps, a `@client_tool` weather tool (`get_weather`), agent creation and session usage, streaming via `AgentEventLogger`, and a FastAPI example including `ChatRequest` and a chat endpoint. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant User as User
  participant Notebook as FastAPI (Notebook/API)
  participant Agent as Agent
  participant Model as Model (LLM)
  participant Tool as Tool (get_weather)

  User->>Notebook: POST /chat {message}
  Notebook->>Agent: Start session / enqueue message
  Agent->>Model: Request response (streaming)
  Agent->>Tool: Invoke get_weather(...) if tool required
  Tool-->>Agent: Return tool result
  Model-->>Agent: Stream tokens/results
  Agent-->>Notebook: Stream aggregated response
  Notebook-->>User: Stream partial/final responses
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped through docs and YAML trails,
I fetched the weather, followed agent tales,
A notebook sparked and streamed reply,
FastAPI waved the messages by,
I nibbled notes and gave a joyful cry 🌿✨

🚥 Pre-merge checks | ✅ 3 passed

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title "add Llama Stack quickstart guide and notebook demo" directly summarizes the main changes: a new quickstart guide, configuration file, and Jupyter notebook for Llama Stack. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage; check skipped. |





**@coderabbitai** (bot) left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI agents

In `@docs/public/llama-stack/llama_stack_config.yaml`:

- Around lines 1-60: The metadata_store block omits an explicit db_path. Add a db_path entry to metadata_store mirroring the pattern used for vector_io and files, so it reads type: sqlite with db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/llama-stack-demo}/registry.db (a corrected sketch follows this list).

In `@docs/public/llama-stack/llama_stack_quickstart.ipynb`:

- Around lines 462-467: Update the notebook metadata kernelspec so the kernel name and display_name reflect the Llama Stack quickstart: change kernelspec.name from "langchain-demo" and kernelspec.display_name from "Python (langchain-demo)" to clearer identifiers such as "llama-stack" and "Python (Llama Stack)".

- Around lines 122-148: The docstring for get_weather promises wind speed, but the returned dict only contains city, temperature, and humidity. Either extract wind speed from the parsed API response (e.g., current['windspeedKmph'] or current['windspeedMiles'], depending on the desired units) and add a 'wind_speed' key to the returned dictionary, or remove the "wind speed" mention from the docstring so it matches the existing return value.

- Around lines 194-208: Agent creation uses model_id, which may be undefined if the model-listing try block failed. Move the Agent(...) instantiation (the one referencing model_id, client, get_weather, and the instructions string) inside the try block that sets model_id, or add an early exit/conditional guard after the except (e.g., return or raise) so Agent(...) is only called when model_id is successfully set.
🧹 Nitpick comments (2)
docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md (1)

41-44: Consider varying the link descriptions.

All four resource links begin with "Llama Stack", which creates repetition. You could vary the wording:

💡 Suggested rewording

```diff
-- [Llama Stack Documentation](https://llamastack.github.io/docs) - The official Llama Stack documentation covering all usage-related topics, API providers, and core concepts.
-- [Llama Stack Core Concepts](https://llamastack.github.io/docs/concepts) - Deep dive into Llama Stack architecture, API stability, and resource management.
-- [Llama Stack GitHub Repository](https://github.com/llamastack/llama-stack) - Source code, example applications, distribution configurations, and how to add new API providers.
-- [Llama Stack Example Apps](https://github.com/llamastack/llama-stack-apps/) - Official examples demonstrating how to use Llama Stack in various scenarios.
+- [Official Documentation](https://llamastack.github.io/docs) - Covers all usage-related topics, API providers, and core concepts.
+- [Core Concepts Guide](https://llamastack.github.io/docs/concepts) - Deep dive into architecture, API stability, and resource management.
+- [GitHub Repository](https://github.com/llamastack/llama-stack) - Source code, example applications, and distribution configurations.
+- [Example Applications](https://github.com/llamastack/llama-stack-apps/) - Official examples demonstrating various use cases.
```
docs/public/llama-stack/llama_stack_quickstart.ipynb (1)

325-343: Consider session management for the chat endpoint.

The /chat endpoint creates a new session for every request (line 328). For a demo this works, but in production:

  1. Sessions accumulate without cleanup
  2. Conversation context is lost between requests

For a production-ready version, consider reusing sessions or implementing session cleanup:

```python
# Option 1: Single shared session (simple approach)
_session_id = None

@api_app.post("/chat")
async def chat(request: ChatRequest):
    global _session_id
    if _session_id is None:
        _session_id = agent.create_session('fastapi-weather-session')
    # ... rest of the code using _session_id
```
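A second option, sketched under the assumption of a hypothetical per-caller key (a user_id field that is not part of the notebook's ChatRequest), keeps one session per caller so conversation context survives across requests:

```python
# Option 2 (sketch): one session per caller. The `user_id` field is
# hypothetical - the notebook's ChatRequest does not define it.
_sessions: dict[str, str] = {}

@api_app.post("/chat")
def chat(request: ChatRequest):
    key = getattr(request, 'user_id', 'default')  # hypothetical field
    if key not in _sessions:
        _sessions[key] = agent.create_session(f'weather-session-{key}')
    # ... rest of the code using _sessions[key] as the session_id
```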
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 98fc418 and f51adf9.

📒 Files selected for processing (3)
  • docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md
  • docs/public/llama-stack/llama_stack_config.yaml
  • docs/public/llama-stack/llama_stack_quickstart.ipynb
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-01-13T11:25:34.596Z
Learnt from: jing2uo
Repo: alauda/knowledge PR: 104
File: docs/en/solutions/How_to_Migrate_VirtualMachine_From_VMware.md:131-172
Timestamp: 2026-01-13T11:25:34.596Z
Learning: In VMware migration documentation (docs/en/solutions), when describing the Forklift Operator workflow for VMware, specify that the VMware provider secret should set insecureSkipVerify=true to accommodate self-signed certificates commonly used in enterprise vCenter/ESXi environments. Include a note on the security trade-offs and ensure readers understand this is for internal, controlled environments.

Applied to files:

  • docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md
📚 Learning: 2025-12-30T08:43:22.275Z
Learnt from: davidwtf
Repo: alauda/knowledge PR: 95
File: docs/public/langchain/langchain_quickstart.ipynb:163-171
Timestamp: 2025-12-30T08:43:22.275Z
Learning: When reviewing LangChain notebooks, validate the agent creation pattern: ensure code uses create_agent(model=..., tools=[...], system_prompt="...") with model being a chat model instance (e.g., ChatOpenAI) and system_prompt provided as a string describing system instructions. This reflects a standard agent construction and can be applied across LangChain notebook files.

Applied to files:

  • docs/public/llama-stack/llama_stack_quickstart.ipynb
🪛 LanguageTool
docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md

[style] ~43-~43: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... stability, and resource management. - [Llama Stack GitHub Repository](https://github...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~44-~44: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...s, and how to add new API providers. - [Llama Stack Example Apps](https://github.com/...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🪛 Ruff (0.14.11)

docs/public/llama-stack/llama_stack_quickstart.ipynb

- 42-42: Do not catch blind exception: `Exception` (BLE001)
- 43-43: Use explicit conversion flag; replace with conversion flag (RUF010)
- 59-59: Abstract `raise` to an inner function (TRY301)
- 59-59: Create your own exception (TRY002)
- 59-59: Avoid specifying long messages outside the exception class (TRY003)
- 73-73: Abstract `raise` to an inner function (TRY301)
- 73-73: Create your own exception (TRY002)
- 73-73: Avoid specifying long messages outside the exception class (TRY003)
- 78-78: Do not catch blind exception: `Exception` (BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (3)
docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md (1)

1-44: Documentation is clear and comprehensive.

The guide covers all essential aspects: overview, prerequisites, quickstart instructions, and additional resources. The structure is logical and user-friendly.

docs/public/llama-stack/llama_stack_quickstart.ipynb (2)

85-106: LGTM! Import setup is appropriate for notebook environments.

The custom path manipulation to include ~/packages aligns with the --target ~/packages flag used in the pip install cell, which is a valid pattern for restricted notebook environments.


358-384: LGTM! Server startup approach is appropriate for notebook demo.

The daemon thread approach with clear comments about production alternatives is well-documented. Using daemon=True ensures cleanup when the kernel restarts.
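The pattern being endorsed is roughly the following sketch, with run_server standing in for the notebook's actual startup call (an assumption, since that cell is not quoted here):

```python
import threading

def run_server():
    # placeholder for the notebook's actual Llama Stack server startup
    ...

# daemon=True ties the thread to the kernel process, so the server is
# torn down automatically when the kernel restarts; fine for a demo,
# but run the server as a separate process in production.
threading.Thread(target=run_server, daemon=True).start()
```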


Comment on lines 194 to 208 (original notebook source):

```json
"except Exception as e:\n",
"    print(f'Failed to get model list: {e}')\n",
"    print('Make sure the server is running')\n",
"\n",
"\n",
"# Create Agent\n",
"print('Creating Agent...')\n",
"agent = Agent(\n",
"    client,\n",
"    model=model_id,\n",
"    instructions='You are a helpful weather assistant. When users ask about weather, use the get_weather tool to query weather information, then answer based on the query results.',\n",
"    tools=[get_weather],\n",
")\n",
"\n",
"print('Agent created successfully')"
```

⚠️ Potential issue | 🟡 Minor

Agent creation may fail if model listing failed.

The agent creation at lines 199-208 uses model_id which is only defined inside the try block (line 191). If the model listing fails, model_id will be undefined and agent creation will raise a NameError.

🔧 Suggested fix: Move agent creation inside the try block or add an early exit

```diff
 except Exception as e:
     print(f'Failed to get model list: {e}')
     print('Make sure the server is running')
+    raise  # Re-raise to prevent subsequent cells from failing


 # Create Agent
```

Or wrap agent creation in a conditional:

```diff
+if 'model_id' in dir():
     # Create Agent
     print('Creating Agent...')
     agent = Agent(
         client,
         model=model_id,
         instructions='You are a helpful weather assistant...',
         tools=[get_weather],
     )
     print('Agent created successfully')
+else:
+    print('Skipping agent creation - no model available')
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Removed (original notebook source):

```json
"except Exception as e:\n",
"    print(f'Failed to get model list: {e}')\n",
"    print('Make sure the server is running')\n",
"\n",
"\n",
"# Create Agent\n",
"print('Creating Agent...')\n",
"agent = Agent(\n",
"    client,\n",
"    model=model_id,\n",
"    instructions='You are a helpful weather assistant. When users ask about weather, use the get_weather tool to query weather information, then answer based on the query results.',\n",
"    tools=[get_weather],\n",
")\n",
"\n",
"print('Agent created successfully')"
```

Replacement:

```python
except Exception as e:
    print(f'Failed to get model list: {e}')
    print('Make sure the server is running')
    raise  # Re-raise to prevent subsequent cells from failing


# Create Agent
print('Creating Agent...')
agent = Agent(
    client,
    model=model_id,
    instructions='You are a helpful weather assistant. When users ask about weather, use the get_weather tool to query weather information, then answer based on the query results.',
    tools=[get_weather],
)

print('Agent created successfully')
```
🤖 Prompt for AI Agents

In `@docs/public/llama-stack/llama_stack_quickstart.ipynb` around lines 194-208: agent creation uses model_id, which may be undefined if the model-listing try block failed. Move the Agent(...) instantiation (the one referencing model_id, client, get_weather, and the instructions string) inside the try block that sets model_id, or add an early exit/conditional guard after the except (e.g., return or raise) so Agent(...) is only called when model_id is successfully set.


**@coderabbitai** (bot) left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md`:

- Around lines 68-70: Remove the leading space in the environment variable value for VLLM_URL so the provider URL is correct: locate the VLLM_URL entry (the name: VLLM_URL and its value field) and change the value from " https://api.deepseek.com/v1" to "https://api.deepseek.com/v1" (no leading whitespace). A corrected sketch follows.
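The corrected entry, sketched in the env-list shape the guide's LlamaStackDistribution example appears to use (the surrounding spec is assumed):

```yaml
env:
  - name: VLLM_URL
    value: "https://api.deepseek.com/v1"  # no leading space before https
```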
🧹 Nitpick comments (2)
docs/public/llama-stack/llama_stack_quickstart.ipynb (2)

129-148: Docstring promises “description” but the return payload omits it.

Either remove “description” from the docstring or include it in the returned dict to avoid confusing tool consumers.

♻️ Proposed fix (add description)

```diff
-        return {
-            'city': city,
-            'temperature': f"{current['temp_C']}°C",
-            'humidity': f"{current['humidity']}%",
-        }
+        description = current.get('weatherDesc', [{}])[0].get('value', '')
+        return {
+            'city': city,
+            'temperature': f"{current['temp_C']}°C",
+            'description': description,
+            'humidity': f"{current['humidity']}%",
+        }
```

199-202: Prefer bare `raise` to preserve the traceback.

`raise e` re-raises with a modified traceback and makes debugging harder.

♻️ Proposed fix

```diff
-    raise e
+    raise
```

**@davidwtf** requested a review from a team, January 26, 2026 15:11

**@coderabbitai** (bot) left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents

In `@docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md`:

- Around lines 68-73: Replace the inline VLLM_API_TOKEN value with a Kubernetes Secret reference instead of plaintext: update the VLLM_API_TOKEN env var entry in the LlamaStackDistribution CRD/spec to use valueFrom.secretKeyRef (referencing your Secret name and key) rather than setting value: XXX. Ensure the Secret contains the token and that the container spec (where VLLM_URL, VLLM_MAX_TOKENS, and VLLM_API_TOKEN are defined) references it via valueFrom.secretKeyRef so the token is injected securely at runtime (a sketch follows this list).

In `@docs/public/llama-stack/llama_stack_quickstart.ipynb`:

- Around lines 128-150: The get_weather function builds a wttr.in URL from the raw city string, which breaks for spaces and non-ASCII characters. Percent-encode the city before interpolating it into the URL (e.g., with urllib.parse.quote or quote_plus) so city names like "New York" and Unicode names produce valid URLs; apply the encoding to the city variable used in url = f'https://wttr.in/{city}?format=j1' and keep the existing timeout/response handling.

- Around lines 331-349: The endpoint is defined as async but calls blocking functions (agent.create_turn and AgentEventLogger.log). Change the FastAPI route handler from "async def chat(request: ChatRequest)" to a synchronous "def chat(request: ChatRequest)" so FastAPI runs it in a threadpool; keep the body logic the same (call agent.create_turn(...) and iterate logger.log(response_stream) directly) and remove any awaits or async-only constructs. The decorator remains @api_app.post("/chat"), the function name stays chat, and it still returns the {"response": full_response} dict.
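A sketch of the Secret-backed variant from the first item above, assuming a hypothetical Secret named llama-stack-secrets with a vllm-api-token key (both names are placeholders, not from the guide):

```yaml
env:
  - name: VLLM_URL
    value: "https://api.deepseek.com/v1"
  - name: VLLM_API_TOKEN
    valueFrom:
      secretKeyRef:
        name: llama-stack-secrets  # hypothetical Secret name
        key: vllm-api-token        # hypothetical key
```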


**@coderabbitai** (bot) left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md`:

- Around lines 116-120: The extraction step runs tar with --strip-components=1 into ~/python312 but doesn't ensure the target directory exists. Update the step that currently shows "tar -xzf /tmp/python312.tar.gz -C ~/python312 --strip-components=1" to create the directory first (mkdir -p ~/python312) before running tar so extraction won't fail. The corrected step is sketched below.
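The corrected step, exactly as the prompt describes:

```sh
mkdir -p ~/python312
tar -xzf /tmp/python312.tar.gz -C ~/python312 --strip-components=1
```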
♻️ Duplicate comments (2)
docs/public/llama-stack/llama_stack_quickstart.ipynb (2)

137-139: URL-encode the city parameter to handle spaces and unicode characters.

City names like "New York" or non-ASCII names will produce invalid URLs. Use urllib.parse.quote to encode the city before interpolating into the URL.

🔧 Proposed fix

```diff
+        from urllib.parse import quote
-        url = f'https://wttr.in/{city}?format=j1'
+        url = f'https://wttr.in/{quote(city)}?format=j1'
```

331-349: Use a sync endpoint instead of async for blocking I/O.

agent.create_session(), agent.create_turn(), and AgentEventLogger.log() are synchronous blocking calls. Using async def here blocks the event loop and prevents concurrent request handling. Change to a sync def endpoint—FastAPI will automatically run it in a threadpool.

🔧 Proposed fix

```diff
 @api_app.post("/chat")
-async def chat(request: ChatRequest):
+def chat(request: ChatRequest):
     """Chat endpoint that uses the Llama Stack Agent"""
```
