Tuning the AI

This guide covers configuring AI models, optimizing performance, managing costs, and customizing AI behavior in Querri.

Querri uses Large Language Models (LLMs) for:

  • Data analysis: Automated insights and pattern detection
  • Natural language queries: Chat-based data exploration
  • Code generation: Python/SQL code for data transformations
  • Visualization recommendations: Suggested charts and dashboards
  • Data cleaning: Anomaly detection and data quality suggestions

Querri supports two AI providers:

  1. Azure OpenAI (Recommended for enterprise)

    • Enterprise-grade security
    • Regional deployment options
    • SLA guarantees
    • Dedicated capacity
  2. OpenAI (Alternative)

    • Direct OpenAI API access
    • Latest models
    • Pay-as-you-go pricing

Configure Azure OpenAI in .env-prod:

# Azure OpenAI Endpoint
AZURE_OPENAI_ENDPOINT=https://yourcompany.openai.azure.com
# API Key
AZURE_OPENAI_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
# API Version
AZURE_OPENAI_API_VERSION=2024-02-15-preview
# Model Deployments
STANDARD_MODEL=gpt-4o # Standard operations
FAST_MODEL=gpt-4o-mini # Quick responses
SMART_MODEL=gpt-4o # Complex analysis

Note: Model names refer to your Azure deployment names, not OpenAI model IDs.
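A minimal sketch of how these settings might be consumed from Python, using the AzureOpenAI client from the openai package (the message content here is illustrative):

import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
)

response = client.chat.completions.create(
    model=os.getenv("STANDARD_MODEL"),  # Azure deployment name, not an OpenAI model ID
    messages=[{"role": "user", "content": "Summarize this dataset"}],
)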

Alternatively, configure direct OpenAI access in .env-prod:

# OpenAI API Key
OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
# Model selection
STANDARD_MODEL=gpt-4o
FAST_MODEL=gpt-4o-mini
SMART_MODEL=gpt-4o
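
The direct-OpenAI equivalent is a sketch along these lines; the client reads OPENAI_API_KEY from the environment by default, and model takes an OpenAI model ID rather than a deployment name:

import os
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY
response = client.chat.completions.create(
    model=os.getenv("STANDARD_MODEL"),  # OpenAI model ID
    messages=[{"role": "user", "content": "Summarize this dataset"}],
)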

Querri uses different models for different tasks:

Standard model (STANDARD_MODEL)

Use case: most data analysis tasks. Recommended: gpt-4o. Characteristics:

  • Balanced speed and capability
  • Good for general analysis
  • Cost-effective for routine tasks

Used for:

  • Data transformations
  • Basic analysis
  • Chart generation
  • Query interpretation

Fast model (FAST_MODEL)

Use case: quick responses and simple tasks. Recommended: gpt-4o-mini. Characteristics:

  • Fast response time
  • Lower cost per token
  • Good for straightforward tasks

Used for:

  • Simple queries
  • UI suggestions
  • Quick calculations
  • Error message interpretation

Smart model (SMART_MODEL)

Use case: complex reasoning and advanced analysis. Recommended: gpt-4o or o1-preview. Characteristics:

  • Advanced reasoning
  • Better for complex problems
  • Higher cost, slower

Used for:

  • Multi-step analysis
  • Complex data transformations
  • Advanced visualizations
  • Strategic insights

Change model configuration:

# Edit .env-prod
nano .env-prod
# Update model deployments
STANDARD_MODEL=gpt-4o
FAST_MODEL=gpt-4o-mini
SMART_MODEL=o1-preview # Upgrade to o1 for advanced reasoning
# Restart services
docker compose restart server-api hub

Different models have different context windows:

  • GPT-4o: 128K tokens (~96K words)
  • GPT-4o-mini: 128K tokens
  • O1-preview: 128K tokens

Configure context limits:

# Maximum tokens for prompts
MAX_PROMPT_TOKENS=8000
# Maximum tokens for responses
MAX_COMPLETION_TOKENS=4000
# Total context budget
MAX_TOTAL_TOKENS=12000

In code (server-api/api/utils/ai.py):

import os
import tiktoken  # one option for token counting (an assumption; any tokenizer works)

# Configure token limits
config = {
    "max_prompt_tokens": int(os.getenv("MAX_PROMPT_TOKENS", 8000)),
    "max_completion_tokens": int(os.getenv("MAX_COMPLETION_TOKENS", 4000)),
}

# Truncate context if needed (keeps the most recent tokens)
def truncate_context(context: str, max_tokens: int) -> str:
    enc = tiktoken.encoding_for_model("gpt-4o")
    tokens = enc.encode(context)
    if len(tokens) <= max_tokens:
        return context
    return enc.decode(tokens[-max_tokens:])

Reduce context size to lower costs:

  1. Include only relevant data:

    # Bad: Include entire dataset
    context = f"Analyze this data: {entire_dataframe}"
    # Good: Include sample and schema
    context = f"Schema: {df.dtypes}\nSample: {df.head(10)}"
  2. Use data summaries:

    # Include summary statistics instead of raw data
    context = f"Summary: {df.describe()}"
  3. Implement sliding window:

    # Keep only recent conversation history
    MAX_HISTORY_MESSAGES = 10
    context = conversation_history[-MAX_HISTORY_MESSAGES:]

Querri AI can use various tools for data analysis:

# Enable/disable AI tools
ENABLE_CODE_INTERPRETER=true # Python code execution
ENABLE_WEB_SEARCH=false # Web search (requires API)
ENABLE_FILE_UPLOAD=true # File analysis
ENABLE_DATA_VISUALIZATION=true # Chart generation
ENABLE_SQL_GENERATION=true # SQL query generation

The code interpreter tool controls Python code execution:

server-api/querri_core/ai/tools.py:

tools = []
if os.getenv("ENABLE_CODE_INTERPRETER", "true").lower() == "true":
    tools.append({
        "type": "code_interpreter",
        "settings": {
            "timeout": 30,  # seconds
            "max_memory_mb": 512,
            "allowed_packages": [
                "pandas",
                "numpy",
                "matplotlib",
                "seaborn",
                "scikit-learn",
            ],
        },
    })

Configure available functions:

# Define available functions
functions = [
    {
        "name": "execute_sql",
        "description": "Execute SQL query on dataset",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string"}
            }
        }
    },
    {
        "name": "create_visualization",
        "description": "Create data visualization",
        "parameters": {
            "type": "object",
            "properties": {
                "chart_type": {"type": "string"},
                "x_axis": {"type": "string"},
                "y_axis": {"type": "string"}
            }
        }
    }
]
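
The definitions above are not wired to a request; a minimal sketch of passing them to the chat completions API and dispatching the model's tool calls (execute_sql and create_visualization are assumed helpers, not shown here):

import json

# Wrap each definition in the chat-completions "tools" format
tool_specs = [{"type": "function", "function": fn} for fn in functions]

response = client.chat.completions.create(
    model=os.getenv("STANDARD_MODEL"),
    messages=messages,
    tools=tool_specs,
)

# Execute whatever the model asked for
for call in response.choices[0].message.tool_calls or []:
    args = json.loads(call.function.arguments)
    if call.function.name == "execute_sql":
        result = execute_sql(**args)  # assumed helper
    elif call.function.name == "create_visualization":
        result = create_visualization(**args)  # assumed helper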

Customize AI behavior via system prompts:

server-api/querri_core/ai/prompts.py:

# Default system prompt
DEFAULT_SYSTEM_PROMPT = """
You are a data analysis assistant for Querri. Your role is to help users
analyze data, create visualizations, and derive insights.

Guidelines:
- Be concise and accurate
- Show your work (include code/queries)
- Explain findings in business terms
- Suggest relevant visualizations
- Flag data quality issues

When analyzing data:
1. Understand the question
2. Examine the data structure
3. Perform analysis
4. Present insights clearly
"""

# Custom prompt per organization
def get_system_prompt(org_id: str) -> str:
    custom = db.organizations.find_one(
        {"_id": org_id},
        {"ai_settings.system_prompt": 1},
    )
    # Guard against a missing organization record, then fall back to the default
    return (custom or {}).get("ai_settings", {}).get(
        "system_prompt",
        DEFAULT_SYSTEM_PROMPT,
    )
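
A call site would then seed each conversation with the organization's prompt, along these lines:

messages = [
    {"role": "system", "content": get_system_prompt(org_id)},
    {"role": "user", "content": user_question},
]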

Configure custom prompts per organization:

// Update organization AI settings
db.organizations.updateOne(
  {_id: "org_xxxxxxxxxxxxx"},
  {
    $set: {
      "ai_settings.system_prompt": `
You are a healthcare data analyst. Focus on patient outcomes,
compliance with HIPAA, and clinical insights. Always consider:
- Patient privacy
- Statistical significance
- Clinical relevance
`,
      "ai_settings.temperature": 0.3,  // Lower for more consistent results
      "ai_settings.max_tokens": 2000
    }
  }
)

Control AI randomness:

# Temperature (0.0 - 2.0)
# Lower = more deterministic, higher = more creative
AI_TEMPERATURE=0.7
# Top P sampling
AI_TOP_P=0.9
# Frequency penalty (reduce repetition)
AI_FREQUENCY_PENALTY=0.0
# Presence penalty (encourage new topics)
AI_PRESENCE_PENALTY=0.0

Usage in code:

response = client.chat.completions.create(
    model=os.getenv("STANDARD_MODEL"),
    messages=messages,
    temperature=float(os.getenv("AI_TEMPERATURE", 0.7)),
    top_p=float(os.getenv("AI_TOP_P", 0.9)),
    frequency_penalty=float(os.getenv("AI_FREQUENCY_PENALTY", 0.0)),
    presence_penalty=float(os.getenv("AI_PRESENCE_PENALTY", 0.0)),
)

Control output format:

# JSON mode for structured output (the word "JSON" must appear in the
# messages, or the API rejects the request)
response = client.chat.completions.create(
    model=os.getenv("STANDARD_MODEL"),
    messages=[
        {"role": "system", "content": "You are a helpful assistant. Respond in JSON."},
        {"role": "user", "content": "Analyze sales data"}
    ],
    response_format={"type": "json_object"}
)

# Structured output schema
schema = {
    "type": "object",
    "properties": {
        "insights": {"type": "array"},
        "recommendations": {"type": "array"},
        "visualizations": {"type": "array"}
    },
    "required": ["insights"]
}
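
JSON mode guarantees syntactically valid JSON but does not enforce the schema above; a minimal validation sketch (assuming the jsonschema package is available):

import json
from jsonschema import validate  # assumption: jsonschema is installed

parsed = json.loads(response.choices[0].message.content)
validate(instance=parsed, schema=schema)  # raises ValidationError on mismatch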

Track AI costs:

from datetime import datetime

def calculate_cost(usage, model):
    # Pricing per 1K tokens (example rates; check your provider's current pricing)
    pricing = {
        "gpt-4o": {"prompt": 0.005, "completion": 0.015},
        "gpt-4o-mini": {"prompt": 0.00015, "completion": 0.0006},
        "o1-preview": {"prompt": 0.015, "completion": 0.060}
    }
    rates = pricing.get(model, pricing["gpt-4o"])
    cost = (
        (usage.prompt_tokens / 1000) * rates["prompt"] +
        (usage.completion_tokens / 1000) * rates["completion"]
    )
    return round(cost, 6)

# Log token usage
db.ai_usage.insert_one({
    "user_email": user_email,
    "organization_id": org_id,
    "model": model_name,
    "prompt_tokens": response.usage.prompt_tokens,
    "completion_tokens": response.usage.completion_tokens,
    "total_tokens": response.usage.total_tokens,
    "cost_usd": calculate_cost(response.usage, model_name),
    "timestamp": datetime.now()
})
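
To make logging automatic, one option is a thin wrapper around the completion call (a sketch; log_ai_usage is a hypothetical helper wrapping the insert_one above):

def tracked_completion(messages, user_email, org_id, model_name=None):
    """Run a chat completion and record its token usage and cost."""
    model_name = model_name or os.getenv("STANDARD_MODEL")
    response = client.chat.completions.create(model=model_name, messages=messages)
    log_ai_usage(response, user_email, org_id, model_name)  # hypothetical helper
    return response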

Set organizational usage limits:

// Set monthly token quota
db.organizations.updateOne(
  {_id: "org_xxxxxxxxxxxxx"},
  {
    $set: {
      "ai_settings.quotas": {
        "tokens_per_month": 1000000,  // 1M tokens/month
        "cost_limit_usd": 100,        // $100/month
        "requests_per_day": 1000
      }
    }
  }
)

Enforce quotas:

def check_quota(org_id: str) -> bool:
    # Get organization quotas
    org = db.organizations.find_one(
        {"_id": org_id},
        {"ai_settings.quotas": 1},
    )
    quotas = (org or {}).get("ai_settings", {}).get("quotas", {})

    # Get current month usage
    month_start = datetime.now().replace(
        day=1, hour=0, minute=0, second=0, microsecond=0
    )
    usage = db.ai_usage.aggregate([
        {
            "$match": {
                "organization_id": org_id,
                "timestamp": {"$gte": month_start}
            }
        },
        {
            "$group": {
                "_id": None,
                "total_tokens": {"$sum": "$total_tokens"},
                "total_cost": {"$sum": "$cost_usd"}
            }
        }
    ])
    # aggregate() returns a cursor; materialize it before indexing
    results = list(usage)
    current = results[0] if results else {"total_tokens": 0, "total_cost": 0}

    # Check limits
    if current["total_tokens"] > quotas.get("tokens_per_month", float("inf")):
        return False
    if current["total_cost"] > quotas.get("cost_limit_usd", float("inf")):
        return False
    return True
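
A call site can then gate every AI request on the quota check (a sketch; QuotaExceededError is a hypothetical exception, and tracked_completion is the wrapper sketched earlier):

class QuotaExceededError(Exception):
    pass

def guarded_completion(messages, user_email, org_id):
    if not check_quota(org_id):
        raise QuotaExceededError(f"AI quota exhausted for org {org_id}")
    return tracked_completion(messages, user_email, org_id)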

To reduce AI spend:

  1. Use appropriate models:

    • Fast model for simple tasks
    • Standard model for most work
    • Smart model only when needed
  2. Optimize prompts:

    • Be specific and concise
    • Avoid unnecessary context
    • Use examples sparingly
  3. Cache responses:

    # Cache common queries (use a stable hash; Python's built-in hash()
    # varies across processes)
    import hashlib
    cache_key = f"ai_response:{hashlib.sha256(prompt.encode()).hexdigest()}"
    cached = redis.get(cache_key)
    if cached:
        return cached
    response = ai_completion(prompt)
    redis.setex(cache_key, 3600, response)  # Cache for 1 hour
    return response
  4. Implement rate limiting:

    # Limit AI requests per user
    rate_limit = f"ai_rate:{user_email}"
    requests = redis.incr(rate_limit)
    if requests == 1:
        redis.expire(rate_limit, 3600)  # Reset every hour
    if requests > 100:  # Max 100 requests/hour
        raise RateLimitError("AI rate limit exceeded")

View AI usage and costs:

// Monthly cost by organization
db.ai_usage.aggregate([
  {
    $match: {
      timestamp: {
        $gte: new Date(new Date().getFullYear(), new Date().getMonth(), 1)
      }
    }
  },
  {
    $group: {
      _id: "$organization_id",
      total_cost: {$sum: "$cost_usd"},
      total_tokens: {$sum: "$total_tokens"},
      request_count: {$sum: 1}
    }
  },
  {$sort: {total_cost: -1}}
])

// Cost by user (last 30 days)
db.ai_usage.aggregate([
  {
    $match: {
      timestamp: {$gte: new Date(Date.now() - 30*24*60*60*1000)}
    }
  },
  {
    $group: {
      _id: "$user_email",
      total_cost: {$sum: "$cost_usd"},
      avg_tokens_per_request: {$avg: "$total_tokens"}
    }
  },
  {$sort: {total_cost: -1}}
])

Process multiple AI requests concurrently:

import asyncio
from typing import List

async def batch_ai_completion(prompts: List[str]) -> List[str]:
    # ai_completion_async is assumed to exist; see the sketch below
    tasks = [ai_completion_async(prompt) for prompt in prompts]
    return await asyncio.gather(*tasks)

# Usage (inside an async context)
results = await batch_ai_completion([
    "Analyze dataset A",
    "Analyze dataset B",
    "Analyze dataset C"
])
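
The ai_completion_async helper is not shown above; a minimal sketch using the async OpenAI client (an assumption about how it would be implemented):

import os
from openai import AsyncOpenAI

async_client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def ai_completion_async(prompt: str) -> str:
    response = await async_client.chat.completions.create(
        model=os.getenv("STANDARD_MODEL"),
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content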

Stream AI responses for better UX:

def stream_ai_response(prompt: str):
    response = client.chat.completions.create(
        model=os.getenv("STANDARD_MODEL"),
        messages=[{"role": "user", "content": prompt}],
        stream=True
    )
    for chunk in response:
        if chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content
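
Consuming the generator is straightforward, for example printing tokens as they arrive:

for token in stream_ai_response("Summarize the sales data"):
    print(token, end="", flush=True)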

Batch multiple questions:

# Instead of multiple API calls...
result1 = ai_completion("What is the average?")
result2 = ai_completion("What is the max?")
result3 = ai_completion("What is the min?")

# ...batch into a single call
batch_prompt = """
Answer these questions about the data:
1. What is the average?
2. What is the maximum?
3. What is the minimum?
Format as JSON: {"average": ..., "max": ..., "min": ...}
"""
result = ai_completion(batch_prompt)

Troubleshooting:

Issue: Rate limit errors from the AI provider

Solutions:

  1. Implement exponential backoff:

    import time
    from openai import RateLimitError

    def ai_completion_with_retry(prompt, max_retries=3):
        for attempt in range(max_retries):
            try:
                return ai_completion(prompt)
            except RateLimitError:
                time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s
        raise Exception("Max retries exceeded")
  2. Reduce request frequency

  3. Upgrade API tier (Azure/OpenAI)

Issue: AI providing irrelevant or incorrect answers

Solutions:

  1. Improve prompts:

    • Be more specific
    • Provide examples
    • Add constraints
  2. Adjust temperature:

    • Lower for factual tasks (0.3)
    • Higher for creative tasks (0.9)
  3. Use a better model:

    • Upgrade to SMART_MODEL for complex tasks

Issue: AI costs exceeding budget

Solutions:

  1. Audit usage by user/org
  2. Implement stricter quotas
  3. Switch to smaller models where appropriate
  4. Cache frequent queries
  5. Optimize prompts to reduce tokens

Best practices:

  1. Model Selection:

    • Use fast model by default
    • Upgrade to standard/smart only when needed
    • Test model performance vs. cost tradeoffs
  2. Prompt Engineering:

    • Clear, specific instructions
    • Include relevant context only
    • Use examples when helpful
    • Specify output format
  3. Cost Control:

    • Set organizational quotas
    • Monitor usage regularly
    • Alert on unusual patterns
    • Implement caching
  4. Performance:

    • Use async/parallel processing
    • Stream responses for UX
    • Batch similar requests
    • Cache common queries
  5. Quality Assurance:

    • Validate AI responses
    • Log problematic outputs
    • Gather user feedback
    • Iterate on prompts