# Python: Adds NVIDIA NIM example #1198
**Status:** Closed

## Commits (6)
- `137fb8b` python: Adds NVIDIA NIM with OpenAI Chat Client example (thegovind)
- `c33bbcc` Updates environment variable names (thegovind)
- `b54fa35` Simplifies NVIDIA NIM agent example (thegovind)
- `9ce3e7e` Add function calling and nemotron agent example (thegovind)
- `eb3309e` Merge branch 'main' into gok/feat/nvidia (thegovind)
- `ca57d85` fix minor lint issue (thegovind)
# NVIDIA NIM Examples

This folder contains examples demonstrating how to use NVIDIA NIM (NVIDIA Inference Microservices) models with the Agent Framework through Azure AI Foundry.

## Prerequisites

Before running these examples, you need to set up NVIDIA NIM models on Azure AI Foundry. Follow the detailed instructions in the [NVIDIA Developer Blog](https://developer.nvidia.com/blog/accelerated-ai-inference-with-nvidia-nim-on-azure-ai-foundry/#deploy_a_nim_on_azure_ai_foundry).

### Quick Setup Steps

1. **Access Azure AI Foundry Portal**
   - Navigate to [ai.azure.com](https://ai.azure.com)
   - Ensure you have a Hub and Project available

2. **Deploy NVIDIA NIM Model**
   - Select **Model Catalog** from the left sidebar
   - In the **Collections** filter, select **NVIDIA**
   - Choose a NIM microservice (e.g., Llama 3.1 8B Instruct NIM)
   - Click **Deploy**
   - Choose a deployment name and VM type
   - Review pricing and terms of use
   - Click **Deploy** to launch the deployment

3. **Get API Credentials**
   - Once deployed, note your endpoint URL and API key
   - The endpoint URL should include `/v1` (e.g., `https://<endpoint>.<region>.inference.ml.azure.com/v1`)

## Examples

| File | Description |
|------|-------------|
| [`nvidia_nim_agent_example.py`](nvidia_nim_agent_example.py) | Complete example demonstrating how to use NVIDIA NIM models with the Agent Framework. Shows both streaming and non-streaming responses with tool-calling capabilities. |
| [`nvidia_nim_chat_client.py`](nvidia_nim_chat_client.py) | Custom chat client implementation that handles NVIDIA NIM's specific message format requirements. |

## Environment Variables

Set the following environment variables before running the examples:

- `OPENAI_BASE_URL`: Your Azure AI Foundry endpoint URL (e.g., `https://<endpoint>.<region>.inference.ml.azure.com/v1`)
- `OPENAI_API_KEY`: Your Azure AI Foundry API key
- `OPENAI_CHAT_MODEL_ID`: The NVIDIA NIM model to use (e.g., `nvidia/llama-3.1-8b-instruct`)

## Running the Example

After setting up your NVIDIA NIM deployment and environment variables, you can run the example:

```bash
# Navigate to the examples directory
cd python/samples/getting_started/agents/nvidia

# Activate the virtual environment (if using one)
source ../../../.venv/bin/activate

# Set your environment variables
export OPENAI_BASE_URL="https://your-endpoint.region.inference.ml.azure.com/v1"
export OPENAI_API_KEY="your-api-key"
export OPENAI_CHAT_MODEL_ID="nvidia/llama-3.1-8b-instruct"

# Run the example
python nvidia_nim_agent_example.py
```

The example will demonstrate:

- Chat completion with NVIDIA NIM models
- Function calling capabilities
- Tool integration

## API Compatibility

NVIDIA NIM models deployed on Azure AI Foundry expose an OpenAI-compatible API, making them easy to integrate with existing OpenAI-based applications and frameworks. The models support:

- Standard OpenAI Chat Completion API
- Streaming and non-streaming responses
- Tool-calling capabilities
- System and user messages

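Because the endpoint is OpenAI-compatible, a chat-completion request body follows the familiar shape. A minimal sketch of assembling one (the `build_chat_request` helper, the system prompt, and the model name are illustrative):

```python
def build_chat_request(model_id: str, user_text: str, stream: bool = False) -> dict:
    """Assemble a minimal OpenAI-style chat-completion request body."""
    return {
        "model": model_id,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            # NVIDIA NIM expects `content` to be a plain string, not a list of parts.
            {"role": "user", "content": user_text},
        ],
        "stream": stream,
    }
```

The resulting dict is what would typically be POSTed to the endpoint's `chat/completions` route with the API key as a bearer token, or sent through the official OpenAI SDK.
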
### Message Format Differences

NVIDIA NIM models expect the `content` field in messages to be a simple string, not an array of content objects like the standard OpenAI API. The example uses a custom `NVIDIANIMChatClient` that handles this conversion automatically.

**Standard OpenAI format:**

```json
{
  "role": "user",
  "content": [{"type": "text", "text": "Hello"}]
}
```

**NVIDIA NIM format:**

```json
{
  "role": "user",
  "content": "Hello"
}
```

## Available Models

NVIDIA NIM microservices support a wide range of models, including:

- **Meta Llama 3.1 8B Instruct NIM**
- **Meta Llama 3.3 70B NIM**
- **NVIDIA Nemotron models**
- **Community models**
- **Custom AI models**

For the complete list of available models, check the Model Catalog in Azure AI Foundry.

## Additional Resources

- [NVIDIA NIM on Azure AI Foundry Documentation](https://developer.nvidia.com/blog/accelerated-ai-inference-with-nvidia-nim-on-azure-ai-foundry/)
- [NVIDIA NIM Microservices](https://developer.nvidia.com/nim)
- [Azure AI Foundry Portal](https://ai.azure.com)
- [OpenAI SDK with NIM on Azure AI Foundry](https://developer.nvidia.com/blog/accelerated-ai-inference-with-nvidia-nim-on-azure-ai-foundry/#openai_sdk_with_nim_on_azure_ai_foundry)

## `python/samples/getting_started/agents/nvidia/nvidia_nim_agent_example.py` (155 additions, 0 deletions)

````python
# Copyright (c) Microsoft. All rights reserved.

import asyncio
import os
from random import randint
from typing import Annotated

from nvidia_nim_chat_client import NVIDIANIMChatClient

"""
NVIDIA NIM Agent Example

This sample demonstrates using NVIDIA NIM (NVIDIA Inference Microservices) models
deployed on Azure AI Foundry with the Agent Framework. It uses a custom chat client
that handles NVIDIA NIM's specific message format requirements.

## Prerequisites - Deploy NVIDIA NIM on Azure AI Foundry

Before running this example, you must first deploy an NVIDIA NIM model on Azure AI Foundry.
Follow the detailed instructions at:
https://developer.nvidia.com/blog/accelerated-ai-inference-with-nvidia-nim-on-azure-ai-foundry/#deploy_a_nim_on_azure_ai_foundry

### Quick Setup Steps:

1. **Access Azure AI Foundry Portal**
   - Navigate to https://ai.azure.com
   - Ensure you have a Hub and Project available

2. **Deploy NVIDIA NIM Model**
   - Select "Model Catalog" from the left sidebar
   - In the "Collections" filter, select "NVIDIA"
   - Choose a NIM microservice (e.g., Llama 3.1 8B Instruct NIM)
   - Click "Deploy"
   - Choose a deployment name and VM type
   - Review pricing and terms of use
   - Click "Deploy" to launch the deployment

3. **Get API Credentials**
   - Once deployed, note your endpoint URL and API key
   - The endpoint URL should include '/v1' (e.g., 'https://<endpoint>.<region>.inference.ml.azure.com/v1')

## Environment Variables

Set the following environment variables before running this example:

- OPENAI_BASE_URL: Your Azure AI Foundry endpoint URL (e.g., 'https://<endpoint>.<region>.inference.ml.azure.com/v1')
- OPENAI_API_KEY: Your Azure AI Foundry API key
- OPENAI_CHAT_MODEL_ID: The NVIDIA NIM model to use (e.g., 'nvidia/llama-3.1-8b-instruct')

## Running the Example

After setting up your NVIDIA NIM deployment and environment variables:

```bash
# Set your environment variables
export OPENAI_BASE_URL="https://your-endpoint.region.inference.ml.azure.com/v1"
export OPENAI_API_KEY="your-api-key"
export OPENAI_CHAT_MODEL_ID="nvidia/llama-3.1-8b-instruct"

# Run the example
python nvidia_nim_agent_example.py
```

The example will demonstrate:
- Chat completion with NVIDIA NIM models
- Function calling capabilities
- Tool integration

## API Compatibility

NVIDIA NIM models deployed on Azure AI Foundry expose an OpenAI-compatible API, making them
easy to integrate with existing OpenAI-based applications and frameworks. The models support:

- Standard OpenAI Chat Completion API
- Streaming and non-streaming responses
- Tool calling capabilities
- System and user messages

For more information, see:
https://developer.nvidia.com/blog/accelerated-ai-inference-with-nvidia-nim-on-azure-ai-foundry/#openai_sdk_with_nim_on_azure_ai_foundry
"""


def get_weather(
    location: Annotated[str, "The location to get the weather for."],
) -> str:
    """Get the weather for a given location."""
    conditions = ["sunny", "cloudy", "rainy", "stormy"]
    return f"The weather in {location} is {conditions[randint(0, 3)]} with a high of {randint(10, 30)}°C."


def get_ai_insights(
    topic: Annotated[str, "The AI topic to get insights about."],
) -> str:
    """Get AI insights about a specific topic."""
    insights = [
        f"AI is revolutionizing {topic} through advanced machine learning techniques.",
        f"The future of {topic} lies in AI-powered automation and intelligent systems.",
        f"Recent breakthroughs in AI have significantly impacted {topic} development.",
        f"AI applications in {topic} are becoming more sophisticated and widespread.",
    ]
    return insights[randint(0, len(insights) - 1)]


async def first_example() -> None:
    """First example with function calling."""
    print("=== Response ===")

    agent = NVIDIANIMChatClient(
        api_key=os.environ["OPENAI_API_KEY"],
        base_url=os.environ["OPENAI_BASE_URL"],
        model_id=os.environ["OPENAI_CHAT_MODEL_ID"],
    ).create_agent(
        name="NVIDIAAIAgent",
        instructions="You are a helpful AI assistant powered by NVIDIA NIM models. You can provide weather information and AI insights.",
        tools=[get_weather, get_ai_insights],
    )

    query = "What's the weather like in Seattle and tell me about AI in healthcare?"
    print(f"User: {query}")
    result = await agent.run(query)
    print(f"Agent: {result}\n")


async def second_example() -> None:
    """Second example with function calling."""
    print("=== Response ===")

    agent = NVIDIANIMChatClient(
        api_key=os.environ["OPENAI_API_KEY"],
        base_url=os.environ["OPENAI_BASE_URL"],
        model_id=os.environ["OPENAI_CHAT_MODEL_ID"],
    ).create_agent(
        name="NVIDIAAIAgent",
        instructions="You are a helpful AI assistant powered by NVIDIA NIM models. You can provide weather information and AI insights.",
        tools=[get_weather, get_ai_insights],
    )

    query = "What's the weather like in Portland and give me insights about AI in autonomous vehicles?"
    print(f"User: {query}")
    result = await agent.run(query)
    print(f"Agent: {result}\n")


async def main() -> None:
    print("=== NVIDIA NIM Agent Example ===")
    print("This example demonstrates using NVIDIA NIM models deployed on Azure AI Foundry")
    print("with the Agent Framework using a custom chat client.\n")

    await first_example()
    await second_example()


if __name__ == "__main__":
    asyncio.run(main())
````

## `python/samples/getting_started/agents/nvidia/nvidia_nim_chat_client.py` (79 additions, 0 deletions)

```python
# Copyright (c) Microsoft. All rights reserved.

from collections.abc import Sequence
from typing import Any

from agent_framework.openai import OpenAIChatClient
from agent_framework._types import (
    ChatMessage,
    Contents,
    FunctionApprovalRequestContent,
    FunctionApprovalResponseContent,
    FunctionCallContent,
    FunctionResultContent,
    Role,
    TextContent,
    prepare_function_call_results,
)


class NVIDIANIMChatClient(OpenAIChatClient):
    """Custom OpenAI Chat Client for NVIDIA NIM models.

    NVIDIA NIM models expect the 'content' field in messages to be a simple string,
    not an array of content objects like the standard OpenAI API. This client handles
    the conversion from the agent framework's content format to NVIDIA NIM's expected format.
    """

    def _openai_chat_message_parser(self, message: ChatMessage) -> list[dict[str, Any]]:
        """Parse a chat message into the NVIDIA NIM format.

        NVIDIA NIM expects:
        - content: string (not an array of objects)
        - role: string
        """
        all_messages: list[dict[str, Any]] = []

        for content in message.contents:
            # Skip approval content - it's internal framework state, not for the LLM
            if isinstance(content, (FunctionApprovalRequestContent, FunctionApprovalResponseContent)):
                continue

            args: dict[str, Any] = {
                "role": message.role.value if isinstance(message.role, Role) else message.role,
            }

            if message.additional_properties:
                args["metadata"] = message.additional_properties

            if isinstance(content, FunctionCallContent):
                if all_messages and "tool_calls" in all_messages[-1]:
                    # If the last message already has tool calls, append to it
                    all_messages[-1]["tool_calls"].append(self._openai_content_parser(content))
                else:
                    args["tool_calls"] = [self._openai_content_parser(content)]  # type: ignore
            elif isinstance(content, FunctionResultContent):
                args["tool_call_id"] = content.call_id
                if content.result is not None:
                    args["content"] = prepare_function_call_results(content.result)
                elif content.exception is not None:
                    # Send the exception message to the model
                    args["content"] = "Error: " + str(content.exception)
            elif isinstance(content, TextContent):
                # For NVIDIA NIM, content should be a simple string, not an array
                if "content" not in args:
                    args["content"] = content.text
                else:
                    # If there's already content, append to it
                    args["content"] += content.text
            else:
                # For other content types, convert to string representation
                if "content" not in args:
                    args["content"] = str(content)
                else:
                    args["content"] += str(content)

            if "content" in args or "tool_calls" in args:
                all_messages.append(args)

        return all_messages
```

## Review Comments

**moonbox3:** Directed to @eavanvalkenburg: we have other clients based on the `OpenAIChatClient`; why wouldn't we want to bring this `NVIDIANIMChatClient` into its own package, instead of it being purely in the samples?

**Reply:** Good question @moonbox3, I've asked Shawn for his opinion.