add Llama Stack quickstart guide and notebook demo #107
Conversation
Walkthrough

Adds three new documentation artifacts: a user guide for creating an AI agent with LlamaStack, a sample YAML stack configuration, and a Jupyter quickstart notebook demonstrating server startup, agent/tool setup, streaming, session handling, and a FastAPI chat endpoint.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User as User
    participant Notebook as FastAPI\n(Notebook/API)
    participant Agent as Agent
    participant Model as Model\n(LLM)
    participant Tool as Tool\n(get_weather)
    User->>Notebook: POST /chat {message}
    Notebook->>Agent: Start session / enqueue message
    Agent->>Model: Request response (streaming)
    Agent->>Tool: Invoke get_weather(...) if tool required
    Tool-->>Agent: Return tool result
    Model-->>Agent: Stream tokens/results
    Agent-->>Notebook: Stream aggregated response
    Notebook-->>User: Stream partial/final responses
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@docs/public/llama-stack/llama_stack_config.yaml`:
- Around line 1-60: The metadata_store block omits an explicit db_path; add a
db_path entry to metadata_store mirroring the pattern used for vector_io and
files so it reads metadata_store: type: sqlite and db_path:
${env.SQLITE_STORE_DIR:~/.llama/distributions/llama-stack-demo}/registry.db
(update the metadata_store section in the YAML to include this db_path key).
In `@docs/public/llama-stack/llama_stack_quickstart.ipynb`:
- Around line 462-467: Update the notebook metadata kernelspec so the kernel
name and display_name reflect the Llama Stack quickstart (e.g., change
kernelspec.name from "langchain-demo" and kernelspec.display_name from "Python
(langchain-demo)" to a clearer identifier like "llama-stack" and "Python (Llama
Stack)" respectively) by editing the kernelspec block in the notebook metadata.
- Around line 122-148: The docstring for get_weather promises wind speed but the
returned dict only contains city, temperature, and humidity; update the function
to include wind speed by extracting it from the parsed API response (e.g.,
current['windspeedKmph'] or current['windspeedMiles'] depending on desired
units) and add a 'wind_speed' key to the returned dictionary, or alternatively
remove the "wind speed" mention from the docstring to make it match the existing
return value.
- Around line 194-208: Agent creation uses model_id which may be undefined if
the model listing try block failed; move the Agent(...) creation (the Agent
instantiation that references model_id, client, get_weather and instructions)
inside the try block that sets model_id or add an early exit/conditional guard
after the except (e.g., return or raise) so Agent(...) is only called when
model_id is successfully set; ensure you reference the same Agent(...) call and
the model_id assignment to relocate or gate the creation.
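The three fixes above are easier to follow with the target shape in view, so here are minimal sketches. First, the corrected `metadata_store` block for `llama_stack_config.yaml`, reusing the env-default pattern the prompt cites for `vector_io` and `files`:

```yaml
# Sketch of the fixed metadata_store section (path taken from the prompt above).
metadata_store:
  type: sqlite
  db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/llama-stack-demo}/registry.db
```

For the kernelspec fix, the notebook metadata would end up looking like this; the identifiers are the examples suggested above, not mandated names:

```json
"kernelspec": {
  "display_name": "Python (Llama Stack)",
  "language": "python",
  "name": "llama-stack"
}
```

And for `get_weather`, a sketch of the first option (adding wind speed rather than trimming the docstring), assuming the notebook's existing parsing of the wttr.in `j1` response into a `current` dict:

```python
# Assumes `current` is the parsed wttr.in current_condition entry.
return {
    'city': city,
    'temperature': f"{current['temp_C']}°C",
    'humidity': f"{current['humidity']}%",
    'wind_speed': f"{current['windspeedKmph']} km/h",  # the key the docstring promises
}
```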
🧹 Nitpick comments (2)
docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md (1)
41-44: Consider varying the link descriptions. All four resource links begin with "Llama Stack", which creates repetition. You could vary the wording:
💡 Suggested rewording
```diff
-- [Llama Stack Documentation](https://llamastack.github.io/docs) - The official Llama Stack documentation covering all usage-related topics, API providers, and core concepts.
-- [Llama Stack Core Concepts](https://llamastack.github.io/docs/concepts) - Deep dive into Llama Stack architecture, API stability, and resource management.
-- [Llama Stack GitHub Repository](https://github.com/llamastack/llama-stack) - Source code, example applications, distribution configurations, and how to add new API providers.
-- [Llama Stack Example Apps](https://github.com/llamastack/llama-stack-apps/) - Official examples demonstrating how to use Llama Stack in various scenarios.
+- [Official Documentation](https://llamastack.github.io/docs) - Covers all usage-related topics, API providers, and core concepts.
+- [Core Concepts Guide](https://llamastack.github.io/docs/concepts) - Deep dive into architecture, API stability, and resource management.
+- [GitHub Repository](https://github.com/llamastack/llama-stack) - Source code, example applications, and distribution configurations.
+- [Example Applications](https://github.com/llamastack/llama-stack-apps/) - Official examples demonstrating various use cases.
```

docs/public/llama-stack/llama_stack_quickstart.ipynb (1)
325-343: Consider session management for the chat endpoint. The `/chat` endpoint creates a new session for every request (line 328). For a demo this works, but in production:
- Sessions accumulate without cleanup
- Conversation context is lost between requests
For a production-ready version, consider reusing sessions or implementing session cleanup:
```python
# Option 1: Single shared session (simple approach)
_session_id = None

@api_app.post("/chat")
async def chat(request: ChatRequest):
    global _session_id
    if _session_id is None:
        _session_id = agent.create_session('fastapi-weather-session')
    # ... rest of the code using _session_id
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md
- docs/public/llama-stack/llama_stack_config.yaml
- docs/public/llama-stack/llama_stack_quickstart.ipynb
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-01-13T11:25:34.596Z
Learnt from: jing2uo
Repo: alauda/knowledge PR: 104
File: docs/en/solutions/How_to_Migrate_VirtualMachine_From_VMware.md:131-172
Timestamp: 2026-01-13T11:25:34.596Z
Learning: In VMware migration documentation (docs/en/solutions), when describing the Forklift Operator workflow for VMware, specify that the VMware provider secret should set insecureSkipVerify=true to accommodate self-signed certificates commonly used in enterprise vCenter/ESXi environments. Include a note on the security trade-offs and ensure readers understand this is for internal, controlled environments.
Applied to files:
docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md
📚 Learning: 2025-12-30T08:43:22.275Z
Learnt from: davidwtf
Repo: alauda/knowledge PR: 95
File: docs/public/langchain/langchain_quickstart.ipynb:163-171
Timestamp: 2025-12-30T08:43:22.275Z
Learning: When reviewing LangChain notebooks, validate the agent creation pattern: ensure code uses create_agent(model=..., tools=[...], system_prompt="...") with model being a chat model instance (e.g., ChatOpenAI) and system_prompt provided as a string describing system instructions. This reflects a standard agent construction and can be applied across LangChain notebook files.
Applied to files:
docs/public/llama-stack/llama_stack_quickstart.ipynb
🪛 LanguageTool
docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md
[style] ~43-~43: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... stability, and resource management. - [Llama Stack GitHub Repository](https://github...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~44-~44: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...s, and how to add new API providers. - [Llama Stack Example Apps](https://github.com/...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🪛 Ruff (0.14.11)
docs/public/llama-stack/llama_stack_quickstart.ipynb
42-42: Do not catch blind exception: Exception
(BLE001)
43-43: Use explicit conversion flag
Replace with conversion flag
(RUF010)
59-59: Abstract raise to an inner function
(TRY301)
59-59: Create your own exception
(TRY002)
59-59: Avoid specifying long messages outside the exception class
(TRY003)
73-73: Abstract raise to an inner function
(TRY301)
73-73: Create your own exception
(TRY002)
73-73: Avoid specifying long messages outside the exception class
(TRY003)
78-78: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (3)
docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md (1)
1-44: Documentation is clear and comprehensive. The guide covers all essential aspects: overview, prerequisites, quickstart instructions, and additional resources. The structure is logical and user-friendly.
docs/public/llama-stack/llama_stack_quickstart.ipynb (2)
85-106: LGTM! Import setup is appropriate for notebook environments. The custom path manipulation to include `~/packages` aligns with the `--target ~/packages` flag used in the pip install cell, which is a valid pattern for restricted notebook environments.
358-384: LGTM! Server startup approach is appropriate for notebook demo. The daemon thread approach with clear comments about production alternatives is well-documented. Using `daemon=True` ensures cleanup when the kernel restarts.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
| "except Exception as e:\n", | ||
| " print(f'Failed to get model list: {e}')\n", | ||
| " print('Make sure the server is running')\n", | ||
| "\n", | ||
| "\n", | ||
| "# Create Agent\n", | ||
| "print('Creating Agent...')\n", | ||
| "agent = Agent(\n", | ||
| " client,\n", | ||
| " model=model_id,\n", | ||
| " instructions='You are a helpful weather assistant. When users ask about weather, use the get_weather tool to query weather information, then answer based on the query results.',\n", | ||
| " tools=[get_weather],\n", | ||
| ")\n", | ||
| "\n", | ||
| "print('Agent created successfully')" |
Agent creation may fail if model listing failed.
The agent creation at lines 199-208 uses `model_id`, which is only defined inside the try block (line 191). If the model listing fails, `model_id` will be undefined and agent creation will raise a `NameError`.
🔧 Suggested fix: Move agent creation inside the try block or add early exit
```diff
 except Exception as e:
     print(f'Failed to get model list: {e}')
     print('Make sure the server is running')
+    raise  # Re-raise to prevent subsequent cells from failing

 # Create Agent
```

Or wrap agent creation in a conditional:

```diff
+if 'model_id' in dir():
     # Create Agent
     print('Creating Agent...')
     agent = Agent(
         client,
         model=model_id,
         instructions='You are a helpful weather assistant...',
         tools=[get_weather],
     )
     print('Agent created successfully')
+else:
+    print('Skipping agent creation - no model available')
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "except Exception as e:\n", | |
| " print(f'Failed to get model list: {e}')\n", | |
| " print('Make sure the server is running')\n", | |
| "\n", | |
| "\n", | |
| "# Create Agent\n", | |
| "print('Creating Agent...')\n", | |
| "agent = Agent(\n", | |
| " client,\n", | |
| " model=model_id,\n", | |
| " instructions='You are a helpful weather assistant. When users ask about weather, use the get_weather tool to query weather information, then answer based on the query results.',\n", | |
| " tools=[get_weather],\n", | |
| ")\n", | |
| "\n", | |
| "print('Agent created successfully')" | |
| except Exception as e: | |
| print(f'Failed to get model list: {e}') | |
| print('Make sure the server is running') | |
| raise # Re-raise to prevent subsequent cells from failing | |
| # Create Agent | |
| print('Creating Agent...') | |
| agent = Agent( | |
| client, | |
| model=model_id, | |
| instructions='You are a helpful weather assistant. When users ask about weather, use the get_weather tool to query weather information, then answer based on the query results.', | |
| tools=[get_weather], | |
| ) | |
| print('Agent created successfully') |
🤖 Prompt for AI Agents
In `@docs/public/llama-stack/llama_stack_quickstart.ipynb` around lines 194 - 208,
Agent creation uses model_id which may be undefined if the model listing try
block failed; move the Agent(...) creation (the Agent instantiation that
references model_id, client, get_weather and instructions) inside the try block
that sets model_id or add an early exit/conditional guard after the except
(e.g., return or raise) so Agent(...) is only called when model_id is
successfully set; ensure you reference the same Agent(...) call and the model_id
assignment to relocate or gate the creation.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md`:
- Around line 68-70: Remove the leading space in the environment variable value
for VLLM_URL so the provider URL is correct; locate the VLLM_URL entry (the
name: VLLM_URL and its value field) and change the value from "
https://api.deepseek.com/v1" to "https://api.deepseek.com/v1" (no leading
whitespace).
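Concretely, the corrected env entry would read (a sketch of just the fixed lines):

```yaml
- name: VLLM_URL
  value: "https://api.deepseek.com/v1"   # no leading space before the URL
```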
🧹 Nitpick comments (2)
docs/public/llama-stack/llama_stack_quickstart.ipynb (2)
129-148: Docstring promises “description” but the return payload omits it. Either remove "description" from the docstring or include it in the returned dict to avoid confusing tool consumers.
♻️ Proposed fix (add description)
```diff
-    return {
-        'city': city,
-        'temperature': f"{current['temp_C']}°C",
-        'humidity': f"{current['humidity']}%",
-    }
+    description = current.get('weatherDesc', [{}])[0].get('value', '')
+    return {
+        'city': city,
+        'temperature': f"{current['temp_C']}°C",
+        'description': description,
+        'humidity': f"{current['humidity']}%",
+    }
```
199-202: Prefer bare `raise` to preserve traceback. `raise e` resets the traceback and makes debugging harder.

♻️ Proposed fix

```diff
-    raise e
+    raise
```
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md`:
- Around line 68-73: Replace the inline VLLM_API_TOKEN value with a Kubernetes
Secret reference instead of plaintext: update the env var entry for
VLLM_API_TOKEN in the LlamaStackDistribution CRD/spec so it uses
valueFrom.secretKeyRef (referencing your Secret name and key) rather than
setting value: XXX; ensure the Secret contains the token and that the container
spec (where VLLM_URL, VLLM_MAX_TOKENS, and VLLM_API_TOKEN are defined)
references that secret via valueFrom.secretKeyRef to securely inject the token
at runtime.
In `@docs/public/llama-stack/llama_stack_quickstart.ipynb`:
- Around line 128-150: The get_weather function builds a wttr.in URL using the
raw city string which breaks for spaces and non-ASCII characters; update
get_weather to percent-encode the city before interpolating into url (e.g., use
urllib.parse.quote or quote_plus or pass city as a query/path-encoded parameter)
so city names like "New York" and Unicode names are valid; ensure the encoding
is applied to the city variable used in the url =
f'https://wttr.in/{city}?format=j1' construction and keep the existing
timeout/response handling.
- Around line 331-349: The endpoint is defined as async but calls blocking
functions (agent.create_turn and AgentEventLogger.log); change the FastAPI route
handler from "async def chat(request: ChatRequest)" to a synchronous "def
chat(request: ChatRequest)" so FastAPI runs it in a threadpool, keep the body
logic the same (call agent.create_turn(...) and iterate
logger.log(response_stream) directly) and remove any awaits or async-only
constructs; ensure the decorator remains `@api_app.post`("/chat") and the function
name chat, and keep returning the {"response": full_response} dict.
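For the token fix, a minimal sketch of the Secret-based injection; the Secret name `llama-stack-secrets` and key `vllm-api-token` are illustrative placeholders, not names from the guide:

```yaml
- name: VLLM_API_TOKEN
  valueFrom:
    secretKeyRef:
      name: llama-stack-secrets   # hypothetical Secret holding the token
      key: vllm-api-token
```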
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md`:
- Around line 116-120: The extraction step uses tar with --strip-components=1
into ~/python312 but doesn't ensure the target directory exists; update the
documentation step that currently shows "tar -xzf /tmp/python312.tar.gz -C
~/python312 --strip-components=1" to create the directory first (use mkdir -p
~/python312) before running the tar command so extraction won't fail.
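The fixed step would simply read:

```bash
mkdir -p ~/python312
tar -xzf /tmp/python312.tar.gz -C ~/python312 --strip-components=1
```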
♻️ Duplicate comments (2)
docs/public/llama-stack/llama_stack_quickstart.ipynb (2)
137-139: URL-encode the city parameter to handle spaces and unicode characters. City names like "New York" or non-ASCII names will produce invalid URLs. Use `urllib.parse.quote` to encode the city before interpolating into the URL.

🔧 Proposed fix
```diff
+    from urllib.parse import quote
-    url = f'https://wttr.in/{city}?format=j1'
+    url = f'https://wttr.in/{quote(city)}?format=j1'
```
331-349: Use a sync endpoint instead of `async` for blocking I/O. `agent.create_session()`, `agent.create_turn()`, and `AgentEventLogger.log()` are synchronous blocking calls. Using `async def` here blocks the event loop and prevents concurrent request handling. Change to a sync `def` endpoint; FastAPI will automatically run it in a threadpool.

🔧 Proposed fix

```diff
 @api_app.post("/chat")
-async def chat(request: ChatRequest):
+def chat(request: ChatRequest):
     """Chat endpoint that uses the Llama Stack Agent"""
```
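Putting the fix together, a sketch of the full sync handler; the session and turn calls are the ones already used in the notebook, while the event-aggregation line is illustrative (it assumes logger events expose a `content` attribute):

```python
@api_app.post("/chat")
def chat(request: ChatRequest):  # plain def: FastAPI runs blocking handlers in a threadpool
    """Chat endpoint that uses the Llama Stack Agent"""
    session_id = agent.create_session('fastapi-weather-session')
    response_stream = agent.create_turn(
        messages=[{'role': 'user', 'content': request.message}],
        session_id=session_id,
    )
    full_response = ''
    for event in AgentEventLogger().log(response_stream):
        if getattr(event, 'content', None):  # assumption: log events carry text content
            full_response += event.content
    return {"response": full_response}
```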
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.