Tuning the AI

This guide covers configuring AI models, optimizing performance, managing costs, and customizing AI behavior in Querri.

Querri uses Large Language Models (LLMs) for:

  • Data analysis: Automated insights and pattern detection
  • Natural language queries: Chat-based data exploration
  • Code generation: Python/SQL code for data transformations
  • Visualization recommendations: Suggested charts and dashboards
  • Data cleaning: Anomaly detection and data quality suggestions

Querri supports two AI providers:

  1. Azure OpenAI (Recommended for enterprise)

    • Enterprise-grade security
    • Regional deployment options
    • SLA guarantees
    • Dedicated capacity
  2. OpenAI (Alternative)

    • Direct OpenAI API access
    • Latest models
    • Pay-as-you-go pricing

Configure Azure OpenAI in .env-prod:

# Azure OpenAI Endpoint
AZURE_OPENAI_ENDPOINT=https://yourcompany.openai.azure.com
# API Key
AZURE_OPENAI_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
# API Version
AZURE_OPENAI_API_VERSION=2024-02-15-preview
# Model Deployments
STANDARD_MODEL=gpt-4o # Standard operations
FAST_MODEL=gpt-4o-mini # Quick responses
SMART_MODEL=gpt-4o # Complex analysis

Note: Model names refer to your Azure deployment names, not OpenAI model IDs.
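A minimal sketch of how these settings might be consumed from Python, using the AzureOpenAI client from the openai package (the message content here is illustrative):

import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
)

response = client.chat.completions.create(
    model=os.getenv("STANDARD_MODEL"),  # Azure deployment name, not an OpenAI model ID
    messages=[{"role": "user", "content": "Summarize this dataset"}],
)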

Alternatively, configure direct OpenAI access in .env-prod:

# OpenAI API Key
OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
# Model selection
STANDARD_MODEL=gpt-4o
FAST_MODEL=gpt-4o-mini
SMART_MODEL=gpt-4o
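
The direct-OpenAI equivalent is a sketch along these lines; the client reads OPENAI_API_KEY from the environment by default, and model takes an OpenAI model ID rather than a deployment name:

import os
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY
response = client.chat.completions.create(
    model=os.getenv("STANDARD_MODEL"),  # OpenAI model ID
    messages=[{"role": "user", "content": "Summarize this dataset"}],
)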

Querri uses different models for different tasks:

Standard model (STANDARD_MODEL)

Use case: most data analysis tasks. Recommended: gpt-4o. Characteristics:

  • Balanced speed and capability
  • Good for general analysis
  • Cost-effective for routine tasks

Used for:

  • Data transformations
  • Basic analysis
  • Chart generation
  • Query interpretation

Fast model (FAST_MODEL)

Use case: quick responses and simple tasks. Recommended: gpt-4o-mini. Characteristics:

  • Fast response time
  • Lower cost per token
  • Good for straightforward tasks

Used for:

  • Simple queries
  • UI suggestions
  • Quick calculations
  • Error message interpretation

Smart model (SMART_MODEL)

Use case: complex reasoning and advanced analysis. Recommended: gpt-4o or o1-preview. Characteristics:

  • Advanced reasoning
  • Better for complex problems
  • Higher cost, slower

Used for:

  • Multi-step analysis
  • Complex data transformations
  • Advanced visualizations
  • Strategic insights

Change model configuration:

# Edit .env-prod
nano .env-prod
# Update model deployments
STANDARD_MODEL=gpt-4o
FAST_MODEL=gpt-4o-mini
SMART_MODEL=o1-preview # Upgrade to o1 for advanced reasoning
# Restart services
docker compose restart server-api hub

Different models have different context windows:

  • GPT-4o: 128K tokens (~96K words)
  • GPT-4o-mini: 128K tokens
  • O1-preview: 128K tokens

Configure context limits:

# Maximum tokens for prompts
MAX_PROMPT_TOKENS=8000
# Maximum tokens for responses
MAX_COMPLETION_TOKENS=4000
# Total context budget
MAX_TOTAL_TOKENS=12000

In code (server-api/api/utils/ai.py):

import os
import tiktoken  # one option for token counting (an assumption; any tokenizer works)

# Configure token limits
config = {
    "max_prompt_tokens": int(os.getenv("MAX_PROMPT_TOKENS", 8000)),
    "max_completion_tokens": int(os.getenv("MAX_COMPLETION_TOKENS", 4000)),
}

# Truncate context if needed (keeps the most recent tokens)
def truncate_context(context: str, max_tokens: int) -> str:
    enc = tiktoken.encoding_for_model("gpt-4o")
    tokens = enc.encode(context)
    if len(tokens) <= max_tokens:
        return context
    return enc.decode(tokens[-max_tokens:])

Reduce context size to lower costs:

  1. Include only relevant data:

    # Bad: Include entire dataset
    context = f"Analyze this data: {entire_dataframe}"
    # Good: Include sample and schema
    context = f"Schema: {df.dtypes}\nSample: {df.head(10)}"
  2. Use data summaries:

    # Include summary statistics instead of raw data
    context = f"Summary: {df.describe()}"
  3. Implement sliding window:

    # Keep only recent conversation history
    MAX_HISTORY_MESSAGES = 10
    context = conversation_history[-MAX_HISTORY_MESSAGES:]

Querri AI can use various tools for data analysis:

# Enable/disable AI tools
ENABLE_CODE_INTERPRETER=true # Python code execution
ENABLE_WEB_SEARCH=false # Web search (requires API)
ENABLE_FILE_UPLOAD=true # File analysis
ENABLE_DATA_VISUALIZATION=true # Chart generation
ENABLE_SQL_GENERATION=true # SQL query generation

The code interpreter tool controls Python code execution:

server-api/querri_core/ai/tools.py:

tools = []
if os.getenv("ENABLE_CODE_INTERPRETER", "true").lower() == "true":
    tools.append({
        "type": "code_interpreter",
        "settings": {
            "timeout": 30,  # seconds
            "max_memory_mb": 512,
            "allowed_packages": [
                "pandas",
                "numpy",
                "matplotlib",
                "seaborn",
                "scikit-learn",
            ],
        },
    })

Configure available functions:

# Define available functions
functions = [
    {
        "name": "execute_sql",
        "description": "Execute SQL query on dataset",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string"}
            }
        }
    },
    {
        "name": "create_visualization",
        "description": "Create data visualization",
        "parameters": {
            "type": "object",
            "properties": {
                "chart_type": {"type": "string"},
                "x_axis": {"type": "string"},
                "y_axis": {"type": "string"}
            }
        }
    }
]
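
The definitions above are not wired to a request; a minimal sketch of passing them to the chat completions API and dispatching the model's tool calls (execute_sql and create_visualization are assumed helpers, not shown here):

import json

# Wrap each definition in the chat-completions "tools" format
tool_specs = [{"type": "function", "function": fn} for fn in functions]

response = client.chat.completions.create(
    model=os.getenv("STANDARD_MODEL"),
    messages=messages,
    tools=tool_specs,
)

# Execute whatever the model asked for
for call in response.choices[0].message.tool_calls or []:
    args = json.loads(call.function.arguments)
    if call.function.name == "execute_sql":
        result = execute_sql(**args)  # assumed helper
    elif call.function.name == "create_visualization":
        result = create_visualization(**args)  # assumed helper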

Customize AI behavior via system prompts:

server-api/querri_core/ai/prompts.py:

# Default system prompt
DEFAULT_SYSTEM_PROMPT = """
You are a data analysis assistant for Querri. Your role is to help users
analyze data, create visualizations, and derive insights.

Guidelines:
- Be concise and accurate
- Show your work (include code/queries)
- Explain findings in business terms
- Suggest relevant visualizations
- Flag data quality issues

When analyzing data:
1. Understand the question
2. Examine the data structure
3. Perform analysis
4. Present insights clearly
"""

# Custom prompt per organization
def get_system_prompt(org_id: str) -> str:
    custom = db.organizations.find_one(
        {"_id": org_id},
        {"ai_settings.system_prompt": 1},
    )
    # Guard against a missing organization record, then fall back to the default
    return (custom or {}).get("ai_settings", {}).get(
        "system_prompt",
        DEFAULT_SYSTEM_PROMPT,
    )
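
A call site would then seed each conversation with the organization's prompt, along these lines:

messages = [
    {"role": "system", "content": get_system_prompt(org_id)},
    {"role": "user", "content": user_question},
]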

Configure custom prompts per organization:

// Update organization AI settings
db.organizations.updateOne(
  {_id: "org_xxxxxxxxxxxxx"},
  {
    $set: {
      "ai_settings.system_prompt": `
You are a healthcare data analyst. Focus on patient outcomes,
compliance with HIPAA, and clinical insights. Always consider:
- Patient privacy
- Statistical significance
- Clinical relevance
`,
      "ai_settings.temperature": 0.3,  // Lower for more consistent results
      "ai_settings.max_tokens": 2000
    }
  }
)

Control AI randomness:

# Temperature (0.0 - 2.0)
# Lower = more deterministic, higher = more creative
AI_TEMPERATURE=0.7
# Top P sampling
AI_TOP_P=0.9
# Frequency penalty (reduce repetition)
AI_FREQUENCY_PENALTY=0.0
# Presence penalty (encourage new topics)
AI_PRESENCE_PENALTY=0.0

Usage in code:

response = client.chat.completions.create(
    model=os.getenv("STANDARD_MODEL"),
    messages=messages,
    temperature=float(os.getenv("AI_TEMPERATURE", 0.7)),
    top_p=float(os.getenv("AI_TOP_P", 0.9)),
    frequency_penalty=float(os.getenv("AI_FREQUENCY_PENALTY", 0.0)),
    presence_penalty=float(os.getenv("AI_PRESENCE_PENALTY", 0.0)),
)

Control output format:

# JSON mode for structured output (the word "JSON" must appear in the
# messages, or the API rejects the request)
response = client.chat.completions.create(
    model=os.getenv("STANDARD_MODEL"),
    messages=[
        {"role": "system", "content": "You are a helpful assistant. Respond in JSON."},
        {"role": "user", "content": "Analyze sales data"}
    ],
    response_format={"type": "json_object"}
)

# Structured output schema
schema = {
    "type": "object",
    "properties": {
        "insights": {"type": "array"},
        "recommendations": {"type": "array"},
        "visualizations": {"type": "array"}
    },
    "required": ["insights"]
}
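
JSON mode guarantees syntactically valid JSON but does not enforce the schema above; a minimal validation sketch (assuming the jsonschema package is available):

import json
from jsonschema import validate  # assumption: jsonschema is installed

parsed = json.loads(response.choices[0].message.content)
validate(instance=parsed, schema=schema)  # raises ValidationError on mismatch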

Track AI costs:

from datetime import datetime

def calculate_cost(usage, model):
    # Pricing per 1K tokens (example rates; check your provider's current pricing)
    pricing = {
        "gpt-4o": {"prompt": 0.005, "completion": 0.015},
        "gpt-4o-mini": {"prompt": 0.00015, "completion": 0.0006},
        "o1-preview": {"prompt": 0.015, "completion": 0.060}
    }
    rates = pricing.get(model, pricing["gpt-4o"])
    cost = (
        (usage.prompt_tokens / 1000) * rates["prompt"] +
        (usage.completion_tokens / 1000) * rates["completion"]
    )
    return round(cost, 6)

# Log token usage
db.ai_usage.insert_one({
    "user_email": user_email,
    "organization_id": org_id,
    "model": model_name,
    "prompt_tokens": response.usage.prompt_tokens,
    "completion_tokens": response.usage.completion_tokens,
    "total_tokens": response.usage.total_tokens,
    "cost_usd": calculate_cost(response.usage, model_name),
    "timestamp": datetime.now()
})
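
To make logging automatic, one option is a thin wrapper around the completion call (a sketch; log_ai_usage is a hypothetical helper wrapping the insert_one above):

def tracked_completion(messages, user_email, org_id, model_name=None):
    """Run a chat completion and record its token usage and cost."""
    model_name = model_name or os.getenv("STANDARD_MODEL")
    response = client.chat.completions.create(model=model_name, messages=messages)
    log_ai_usage(response, user_email, org_id, model_name)  # hypothetical helper
    return response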

Set organizational usage limits:

// Set monthly token quota
db.organizations.updateOne(
  {_id: "org_xxxxxxxxxxxxx"},
  {
    $set: {
      "ai_settings.quotas": {
        "tokens_per_month": 1000000,  // 1M tokens/month
        "cost_limit_usd": 100,        // $100/month
        "requests_per_day": 1000
      }
    }
  }
)

Enforce quotas:

def check_quota(org_id: str) -> bool:
    # Get organization quotas
    org = db.organizations.find_one(
        {"_id": org_id},
        {"ai_settings.quotas": 1},
    )
    quotas = (org or {}).get("ai_settings", {}).get("quotas", {})

    # Get current month usage
    month_start = datetime.now().replace(
        day=1, hour=0, minute=0, second=0, microsecond=0
    )
    usage = db.ai_usage.aggregate([
        {
            "$match": {
                "organization_id": org_id,
                "timestamp": {"$gte": month_start}
            }
        },
        {
            "$group": {
                "_id": None,
                "total_tokens": {"$sum": "$total_tokens"},
                "total_cost": {"$sum": "$cost_usd"}
            }
        }
    ])
    # aggregate() returns a cursor; materialize it before indexing
    results = list(usage)
    current = results[0] if results else {"total_tokens": 0, "total_cost": 0}

    # Check limits
    if current["total_tokens"] > quotas.get("tokens_per_month", float("inf")):
        return False
    if current["total_cost"] > quotas.get("cost_limit_usd", float("inf")):
        return False
    return True
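
A call site can then gate every AI request on the quota check (a sketch; QuotaExceededError is a hypothetical exception, and tracked_completion is the wrapper sketched earlier):

class QuotaExceededError(Exception):
    pass

def guarded_completion(messages, user_email, org_id):
    if not check_quota(org_id):
        raise QuotaExceededError(f"AI quota exhausted for org {org_id}")
    return tracked_completion(messages, user_email, org_id)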

To reduce AI spend:

  1. Use appropriate models:

    • Fast model for simple tasks
    • Standard model for most work
    • Smart model only when needed
  2. Optimize prompts:

    • Be specific and concise
    • Avoid unnecessary context
    • Use examples sparingly
  3. Cache responses:

    # Cache common queries (use a stable hash; Python's built-in hash()
    # varies across processes)
    import hashlib
    cache_key = f"ai_response:{hashlib.sha256(prompt.encode()).hexdigest()}"
    cached = redis.get(cache_key)
    if cached:
        return cached
    response = ai_completion(prompt)
    redis.setex(cache_key, 3600, response)  # Cache for 1 hour
    return response
  4. Implement rate limiting:

    # Limit AI requests per user
    rate_limit = f"ai_rate:{user_email}"
    requests = redis.incr(rate_limit)
    if requests == 1:
        redis.expire(rate_limit, 3600)  # Reset every hour
    if requests > 100:  # Max 100 requests/hour
        raise RateLimitError("AI rate limit exceeded")

View AI usage and costs:

// Monthly cost by organization
db.ai_usage.aggregate([
  {
    $match: {
      timestamp: {
        $gte: new Date(new Date().getFullYear(), new Date().getMonth(), 1)
      }
    }
  },
  {
    $group: {
      _id: "$organization_id",
      total_cost: {$sum: "$cost_usd"},
      total_tokens: {$sum: "$total_tokens"},
      request_count: {$sum: 1}
    }
  },
  {$sort: {total_cost: -1}}
])

// Cost by user (last 30 days)
db.ai_usage.aggregate([
  {
    $match: {
      timestamp: {$gte: new Date(Date.now() - 30*24*60*60*1000)}
    }
  },
  {
    $group: {
      _id: "$user_email",
      total_cost: {$sum: "$cost_usd"},
      avg_tokens_per_request: {$avg: "$total_tokens"}
    }
  },
  {$sort: {total_cost: -1}}
])

Process multiple AI requests concurrently:

import asyncio
from typing import List

async def batch_ai_completion(prompts: List[str]) -> List[str]:
    # ai_completion_async is assumed to exist; see the sketch below
    tasks = [ai_completion_async(prompt) for prompt in prompts]
    return await asyncio.gather(*tasks)

# Usage (inside an async context)
results = await batch_ai_completion([
    "Analyze dataset A",
    "Analyze dataset B",
    "Analyze dataset C"
])
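
The ai_completion_async helper is not shown above; a minimal sketch using the async OpenAI client (an assumption about how it would be implemented):

import os
from openai import AsyncOpenAI

async_client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def ai_completion_async(prompt: str) -> str:
    response = await async_client.chat.completions.create(
        model=os.getenv("STANDARD_MODEL"),
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content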

Stream AI responses for better UX:

def stream_ai_response(prompt: str):
    response = client.chat.completions.create(
        model=os.getenv("STANDARD_MODEL"),
        messages=[{"role": "user", "content": prompt}],
        stream=True
    )
    for chunk in response:
        if chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content
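
Consuming the generator is straightforward, for example printing tokens as they arrive:

for token in stream_ai_response("Summarize the sales data"):
    print(token, end="", flush=True)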

Batch multiple questions:

# Instead of multiple API calls...
result1 = ai_completion("What is the average?")
result2 = ai_completion("What is the max?")
result3 = ai_completion("What is the min?")

# ...batch into a single call
batch_prompt = """
Answer these questions about the data:
1. What is the average?
2. What is the maximum?
3. What is the minimum?
Format as JSON: {"average": ..., "max": ..., "min": ...}
"""
result = ai_completion(batch_prompt)

Troubleshooting:

Issue: Rate limit errors from the AI provider

Solutions:

  1. Implement exponential backoff:

    import time
    from openai import RateLimitError

    def ai_completion_with_retry(prompt, max_retries=3):
        for attempt in range(max_retries):
            try:
                return ai_completion(prompt)
            except RateLimitError:
                time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s
        raise Exception("Max retries exceeded")
  2. Reduce request frequency

  3. Upgrade API tier (Azure/OpenAI)

Issue: AI providing irrelevant or incorrect answers

Solutions:

  1. Improve prompts:

    • Be more specific
    • Provide examples
    • Add constraints
  2. Adjust temperature:

    • Lower for factual tasks (0.3)
    • Higher for creative tasks (0.9)
  3. Use a better model:

    • Upgrade to SMART_MODEL for complex tasks

Issue: AI costs exceeding budget

Solutions:

  1. Audit usage by user/org
  2. Implement stricter quotas
  3. Switch to smaller models where appropriate
  4. Cache frequent queries
  5. Optimize prompts to reduce tokens

Best practices:

  1. Model Selection:

    • Use fast model by default
    • Upgrade to standard/smart only when needed
    • Test model performance vs. cost tradeoffs
  2. Prompt Engineering:

    • Clear, specific instructions
    • Include relevant context only
    • Use examples when helpful
    • Specify output format
  3. Cost Control:

    • Set organizational quotas
    • Monitor usage regularly
    • Alert on unusual patterns
    • Implement caching
  4. Performance:

    • Use async/parallel processing
    • Stream responses for UX
    • Batch similar requests
    • Cache common queries
  5. Quality Assurance:

    • Validate AI responses
    • Log problematic outputs
    • Gather user feedback
    • Iterate on prompts