Build AI-powered data assistants using OpenAI's GPT models and CData Connect AI. Query your live data sources using natural language conversations.
This project provides a Python framework for building conversational AI applications that can interact with your data through CData Connect AI. It uses the Model Context Protocol (MCP) to enable OpenAI's GPT models to discover and query your connected data sources.
Key Features:
- Natural language queries against 300+ data sources (Google Sheets, Salesforce, Snowflake, etc.)
- Automatic tool discovery via MCP protocol
- Multi-turn conversation support
- Streaming responses
- Easy-to-use Python API
Learn more about Embedded Cloud for AI:
- Visit the Embedded Cloud website
- Watch our introductory video
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ │ │ │ │ │
│ Your Python │────▶│ CData Connect │────▶│ Data Sources │
│ Application │ │ AI MCP Server │ │ (300+ types) │
│ │◀────│ │◀────│ │
└─────────────────┘ └──────────────────┘ └─────────────────┘
│ │
│ Tool Discovery │
│ & Execution │
▼ │
┌─────────────────┐ │
│ │ │
│ OpenAI API │──────────────┘
│ (GPT-4, etc.) │ Natural Language
│ │ to SQL Translation
└─────────────────┘
- Python 3.9+ - Download from python.org or install via your package manager
- pip - Python's package installer (included with Python 3.4+). Verify with
pip --versionorpip3 --version - An OpenAI API key
- A CData Connect AI account (free trial available)
- Clone the repository:
git clone https://github.com/CDataSoftware/connectai-openai-agent.git
cd connectai-openai-agent- Install dependencies:
pip install -r requirements.txt- Configure your environment:
cp .env.example .envEdit .env with your credentials:
OPENAI_API_KEY=your_openai_api_key
CDATA_EMAIL=your_email@example.com
CDATA_PAT=your_personal_access_token- Sign up at CData Connect AI
- Add a data source connection (e.g., Google Sheets)
- Go to Settings > Access Tokens > Create PAT
- Copy the token (it's only shown once!)
from dotenv import load_dotenv
from src.connectai_openai import Config, MCPAgent
load_dotenv()
# Create agent from environment variables
config = Config.from_env()
agent = MCPAgent(config)
# Ask questions about your data
response = agent.chat("What data sources do I have connected?")
print(response)
response = agent.chat("Show me the tables in my Google Sheets connection")
print(response)
response = agent.chat("Query the top 5 accounts by revenue")
print(response)Run the example chat application:
python examples/basic_chat.pyfrom dotenv import load_dotenv
from src.connectai_openai import Config, MCPAgent
load_dotenv()
config = Config.from_env()
agent = MCPAgent(config)
# Explore available data
agent.chat("List all my data connections")
agent.chat("What tables are in my Google Sheets?")
agent.chat("Show me the columns in the account table")
# Query the data
response = agent.chat("""
Show me all accounts with revenue over $1 million,
sorted by revenue descending
""")
print(response)from dotenv import load_dotenv
from src.connectai_openai import Config, MCPAgent
load_dotenv()
config = Config.from_env()
agent = MCPAgent(config)
# Stream the response
for chunk in agent.chat_stream("Analyze the health of my top 5 customers"):
print(chunk, end="", flush=True)# Query across multiple connected sources
agent.chat("Compare sales data from Salesforce with usage data from Google Sheets")Configuration class for credentials and settings.
# From environment variables
config = Config.from_env()
# Or explicit values
config = Config(
openai_api_key="sk-...",
cdata_email="user@example.com",
cdata_pat="your-pat-token",
openai_model="gpt-4o", # optional
mcp_server_url="https://mcp.cloud.cdata.com/mcp" # optional
)AI agent with tool calling capabilities.
agent = MCPAgent(
config,
instructions="Custom system prompt...", # optional
max_tool_iterations=10 # optional
)
# Methods
response = agent.chat("Your question")
for chunk in agent.chat_stream("Your question"):
print(chunk)
agent.clear_history()
tools = agent.get_available_tools()Low-level MCP client for direct tool access.
from src.connectai_openai import Config, MCPClient
config = Config.from_env()
client = MCPClient(config)
# Discover tools
tools = client.list_tools()
# Execute tools directly
catalogs = client.get_catalogs()
schemas = client.get_schemas("MyConnection")
tables = client.get_tables("MyConnection", "GoogleSheets")
columns = client.get_columns("MyConnection", "GoogleSheets", "account")
results = client.query_data("SELECT * FROM [MyConnection].[GoogleSheets].[account]")The agent automatically has access to these CData Connect AI tools:
| Tool | Description |
|---|---|
getCatalogs |
List available data source connections |
getSchemas |
Get schemas for a specific catalog |
getTables |
Get tables in a schema |
getColumns |
Get column metadata for a table |
queryData |
Execute SQL queries |
getProcedures |
List stored procedures |
getProcedureParameters |
Get procedure parameter details |
executeProcedure |
Execute stored procedures |
getInstructions |
Get driver-specific instructions and best practices for a data source |
When querying data, use fully qualified table names:
SELECT * FROM [CatalogName].[SchemaName].[TableName]Example:
SELECT [Name], [Revenue]
FROM [demo_organization].[GoogleSheets].[account]
WHERE [Revenue] > 1000000
ORDER BY [Revenue] DESCTo get started quickly, copy our sample Google Sheet with customer data:
- account: Company information (name, industry, revenue)
- opportunity: Sales pipeline data
- tickets: Support ticket information
- usage: Product usage metrics
- Verify your CData email and PAT are correct in
.env - Ensure the PAT hasn't expired
- Check that your Connect AI account is active
- Confirm you have at least one data source connected in Connect AI
- Check that your user has permissions to access the connection
- Use fully qualified table names:
[Catalog].[Schema].[Table] - Verify column names exist using
getColumns - Check SQL syntax (Connect AI uses SQL-92 standard)
- CData Connect AI Documentation
- CData Connect AI Prompt Library
- OpenAI API Documentation
- Model Context Protocol
MIT License - see LICENSE for details.