Conversation

@vladimirivic vladimirivic commented Jan 26, 2025


Summary:
We want to use the `Accept` header to negotiate content.

Sending this header in every request causes the server to return chunked events, even without the `stream=True` param.

```
llama-stack-client inference chat-completion --message="Hello there"

{"event":{"event_type":"start","delta":"Hello"}}

{"event":{"event_type":"progress","delta":"!"}}

{"event":{"event_type":"progress","delta":" How"}}

{"event":{"event_type":"progress","delta":" are"}}

{"event":{"event_type":"progress","delta":" you"}}

{"event":{"event_type":"progress","delta":" today"}}
```
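The chunks above can be reassembled client-side by concatenating the `delta` fields. A minimal sketch (the chunk strings are copied verbatim from the output above):

```python
import json

# Each streamed line is a JSON object with an event carrying a text delta;
# joining the deltas in order reconstructs the full assistant message.
chunks = [
    '{"event":{"event_type":"start","delta":"Hello"}}',
    '{"event":{"event_type":"progress","delta":"!"}}',
    '{"event":{"event_type":"progress","delta":" How"}}',
    '{"event":{"event_type":"progress","delta":" are"}}',
    '{"event":{"event_type":"progress","delta":" you"}}',
    '{"event":{"event_type":"progress","delta":" today"}}',
]
message = "".join(json.loads(c)["event"]["delta"] for c in chunks)
```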

Test Plan:

```
pip install .

llama-stack-client configure --endpoint={endpoint} --api-key={api-key}

llama-stack-client inference chat-completion --message="Hello there"
ChatCompletionResponse(completion_message=CompletionMessage(content='Hello! How can I assist you today?', role='assistant', stop_reason='end_of_turn', tool_calls=[]), logprobs=None)
```
```python
        timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,
    ) -> InferenceChatCompletionResponse | Stream[InferenceChatCompletionResponse]:
        extra_headers = {"Accept": "text/event-stream", **(extra_headers or {})}
        if stream is True:
```
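The header merge on the `extra_headers` line follows a common dict-spread pattern: the default `Accept` is listed first, so a caller-supplied value wins. A minimal sketch of just that merge (the `with_sse_accept` helper name is illustrative, not part of the client):

```python
# Sketch of the extra_headers merge from the diff above: the default
# Accept comes first, and the caller's headers are spread after it,
# so an explicit caller-provided Accept overrides the default.
def with_sse_accept(extra_headers=None):
    return {"Accept": "text/event-stream", **(extra_headers or {})}
```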

this should be `if stream`, but the higher-level issue is that this is generated code. We need to make sure we always auto-apply this patch after generation (see stainless_sync.sh), or find another way.
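For context on the `if stream` suggestion: in generated clients of this style, `stream` can be `True`, `False`, or a `NotGiven` sentinel, and `if stream is True` matches only the literal `True`. A minimal sketch (this `NotGiven` is a stand-in, assuming the real sentinel is falsy, as is conventional in such clients):

```python
# Stand-in sentinel: assumed falsy, like the NOT_GIVEN used in the diff.
class NotGiven:
    def __bool__(self):
        return False

NOT_GIVEN = NotGiven()

def is_streaming(stream):
    # `if stream:` semantics — covers True, and rejects False and the sentinel.
    return bool(stream)
```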

@yanxi0830

Is this still needed after #108?


ashwinb commented Jan 31, 2025

@yanxi0830 yes, this is still needed, but for a different reason: this header is sent by the client to the server, not the other way round.
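Server-side, the negotiation described in the summary amounts to checking the request's `Accept` header in addition to the `stream` param. A minimal sketch (the `should_stream` helper is hypothetical, not actual server code):

```python
# Hypothetical server-side check: stream when the client either passed
# stream=True or announced it accepts SSE via the Accept header.
def should_stream(headers, stream_param=False):
    return bool(stream_param) or headers.get("Accept") == "text/event-stream"
```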

@vladimirivic vladimirivic deleted the pr98 branch February 1, 2025 02:06