Tuning the AI
This guide covers configuring AI models, optimizing performance, managing costs, and customizing AI behavior in Querri.
AI Architecture
Querri uses Large Language Models (LLMs) for:
- Data analysis: Automated insights and pattern detection
- Natural language queries: Chat-based data exploration
- Code generation: Python/SQL code for data transformations
- Visualization recommendations: Suggested charts and dashboards
- Data cleaning: Anomaly detection and data quality suggestions
AI Providers
Querri supports two AI providers:

1. Azure OpenAI (Recommended for enterprise)
   - Enterprise-grade security
   - Regional deployment options
   - SLA guarantees
   - Dedicated capacity

2. OpenAI (Alternative)
   - Direct OpenAI API access
   - Latest models
   - Pay-as-you-go pricing
Model Configuration
Azure OpenAI Setup
Configure Azure OpenAI in .env-prod:
```bash
# Azure OpenAI Endpoint
AZURE_OPENAI_ENDPOINT=https://yourcompany.openai.azure.com

# API Key
AZURE_OPENAI_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

# API Version
AZURE_OPENAI_API_VERSION=2024-02-15-preview

# Model Deployments
STANDARD_MODEL=gpt-4o       # Standard operations
FAST_MODEL=gpt-4o-mini      # Quick responses
SMART_MODEL=gpt-4o          # Complex analysis
```
Note: Model names refer to your Azure deployment names, not OpenAI model IDs.
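These variables are consumed through the OpenAI Python SDK. As a minimal sketch (standard openai-library usage, not Querri-specific code):

```python
import os
from openai import AzureOpenAI

# Build a client from the .env-prod values above
client = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
)

# With Azure, `model` is the deployment name configured above
response = client.chat.completions.create(
    model=os.getenv("STANDARD_MODEL"),
    messages=[{"role": "user", "content": "Summarize this dataset."}],
)
print(response.choices[0].message.content)
```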
OpenAI Setup (Alternative)
Configure direct OpenAI access:
```bash
# OpenAI API Key
OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

# Model selection
STANDARD_MODEL=gpt-4o
FAST_MODEL=gpt-4o-mini
SMART_MODEL=gpt-4o
```
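Direct access uses the same SDK with the plain OpenAI client; again a generic sketch, not Querri's actual wiring:

```python
import os
from openai import OpenAI

# Direct OpenAI access; model names are real OpenAI model IDs here
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
```
Model Selection Strategy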
Querri uses different models for different tasks (a routing sketch follows the tier descriptions below):
Standard Model (STANDARD_MODEL)
Use case: Most data analysis tasks
Recommended: gpt-4o
Characteristics:
- Balanced speed and capability
- Good for general analysis
- Cost-effective for routine tasks
Used for:
- Data transformations
- Basic analysis
- Chart generation
- Query interpretation
Fast Model (FAST_MODEL)
Use case: Quick responses, simple tasks
Recommended: gpt-4o-mini
Characteristics:
- Fast response time
- Lower cost per token
- Good for straightforward tasks
Used for:
- Simple queries
- UI suggestions
- Quick calculations
- Error message interpretation
Smart Model (SMART_MODEL)
Use case: Complex reasoning, advanced analysis
Recommended: gpt-4o or o1-preview
Characteristics:
- Advanced reasoning
- Better for complex problems
- Higher cost, slower
Used for:
- Multi-step analysis
- Complex data transformations
- Advanced visualizations
- Strategic insights
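To make the tiering concrete, here is a hypothetical routing helper; the names and tiers are illustrative, not Querri's actual routing logic:

```python
import os

# Hypothetical tier-to-model mapping; adjust to match your deployments
MODEL_BY_TIER = {
    "fast": os.getenv("FAST_MODEL", "gpt-4o-mini"),
    "standard": os.getenv("STANDARD_MODEL", "gpt-4o"),
    "smart": os.getenv("SMART_MODEL", "gpt-4o"),
}

def pick_model(tier: str) -> str:
    """Map a task tier to the configured model, defaulting to standard."""
    return MODEL_BY_TIER.get(tier, MODEL_BY_TIER["standard"])
```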
Updating Models
Change model configuration:
```bash
# Edit .env-prod
nano .env-prod

# Update model deployments
STANDARD_MODEL=gpt-4o
FAST_MODEL=gpt-4o-mini
SMART_MODEL=o1-preview   # Upgrade to o1 for advanced reasoning

# Restart services
docker compose restart server-api hub
```
Context Window Configuration
Section titled “Context Window Configuration”Context Window Sizes
Different models have different context windows:
- GPT-4o: 128K tokens (~96K words)
- GPT-4o-mini: 128K tokens
- o1-preview: 128K tokens
Managing Context
Configure context limits:
```bash
# Maximum tokens for prompts
MAX_PROMPT_TOKENS=8000

# Maximum tokens for responses
MAX_COMPLETION_TOKENS=4000

# Total context budget
MAX_TOTAL_TOKENS=12000
```
In code (server-api/api/utils/ai.py):
```python
import os

# Configure token limits
config = {
    "max_prompt_tokens": int(os.getenv("MAX_PROMPT_TOKENS", 8000)),
    "max_completion_tokens": int(os.getenv("MAX_COMPLETION_TOKENS", 4000)),
}

# Truncate context if needed (rough sketch: ~4 characters per token)
def truncate_context(context: str, max_tokens: int) -> str:
    max_chars = max_tokens * 4
    return context if len(context) <= max_chars else context[:max_chars]
```
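The character heuristic above is approximate. For exact counts, a tokenizer such as tiktoken can be swapped in; this is an assumption, since Querri may count tokens differently:

```python
import tiktoken

# Assumes tiktoken is installed; encoding_for_model maps a model ID
# to its tokenizer (gpt-4o resolves to o200k_base in recent versions)
def truncate_context_exact(context: str, max_tokens: int, model: str = "gpt-4o") -> str:
    enc = tiktoken.encoding_for_model(model)
    tokens = enc.encode(context)
    if len(tokens) <= max_tokens:
        return context
    return enc.decode(tokens[:max_tokens])
```
Context Optimization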
Reduce context size to lower costs:

1. Include only relevant data:
   ```python
   # Bad: Include entire dataset
   context = f"Analyze this data: {entire_dataframe}"

   # Good: Include sample and schema
   context = f"Schema: {df.dtypes}\nSample: {df.head(10)}"
   ```

2. Use data summaries:
   ```python
   # Include summary statistics instead of raw data
   context = f"Summary: {df.describe()}"
   ```

3. Implement a sliding window:
   ```python
   # Keep only recent conversation history
   MAX_HISTORY_MESSAGES = 10
   context = conversation_history[-MAX_HISTORY_MESSAGES:]
   ```
Tool Availability
Section titled “Tool Availability”Configuring AI Tools
Querri AI can use various tools for data analysis:
```bash
# Enable/disable AI tools
ENABLE_CODE_INTERPRETER=true     # Python code execution
ENABLE_WEB_SEARCH=false          # Web search (requires API)
ENABLE_FILE_UPLOAD=true          # File analysis
ENABLE_DATA_VISUALIZATION=true   # Chart generation
ENABLE_SQL_GENERATION=true       # SQL query generation
```
Code Interpreter
Controls Python code execution:
```python
import os

tools = []

if os.getenv("ENABLE_CODE_INTERPRETER", "true").lower() == "true":
    tools.append({
        "type": "code_interpreter",
        "settings": {
            "timeout": 30,        # seconds
            "max_memory_mb": 512,
            "allowed_packages": [
                "pandas", "numpy", "matplotlib",
                "seaborn", "scikit-learn"
            ]
        }
    })
```
Function Calling
Configure available functions:
```python
# Define available functions
functions = [
    {
        "name": "execute_sql",
        "description": "Execute SQL query on dataset",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string"}
            }
        }
    },
    {
        "name": "create_visualization",
        "description": "Create data visualization",
        "parameters": {
            "type": "object",
            "properties": {
                "chart_type": {"type": "string"},
                "x_axis": {"type": "string"},
                "y_axis": {"type": "string"}
            }
        }
    }
]
```
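A minimal sketch of wiring these definitions into a chat completion via the OpenAI tools parameter; the dispatch loop is illustrative, not Querri's actual handler:

```python
response = client.chat.completions.create(
    model=os.getenv("STANDARD_MODEL"),
    messages=[{"role": "user", "content": "Plot revenue by month"}],
    tools=[{"type": "function", "function": f} for f in functions],
)

# The model returns tool calls instead of prose when a function fits
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)  # dispatch here
```
Agent Behavior Customization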
System Prompts
Customize AI behavior via system prompts:
server-api/querri_core/ai/prompts.py:
```python
# Default system prompt
DEFAULT_SYSTEM_PROMPT = """You are a data analysis assistant for Querri.
Your role is to help users analyze data, create visualizations, and derive insights.

Guidelines:
- Be concise and accurate
- Show your work (include code/queries)
- Explain findings in business terms
- Suggest relevant visualizations
- Flag data quality issues

When analyzing data:
1. Understand the question
2. Examine the data structure
3. Perform analysis
4. Present insights clearly"""

# Custom prompt per organization
def get_system_prompt(org_id: str) -> str:
    custom = db.organizations.find_one(
        {"_id": org_id},
        {"ai_settings.system_prompt": 1}
    )
    # find_one returns None when the org is missing; fall back safely
    return (custom or {}).get("ai_settings", {}).get(
        "system_prompt",
        DEFAULT_SYSTEM_PROMPT
    )
```
Organization-Specific Prompts
Configure custom prompts per organization:
```javascript
// Update organization AI settings
db.organizations.updateOne(
  {_id: "org_xxxxxxxxxxxxx"},
  {
    $set: {
      "ai_settings.system_prompt": `
        You are a healthcare data analyst.
        Focus on patient outcomes, compliance with HIPAA, and clinical insights.
        Always consider:
        - Patient privacy
        - Statistical significance
        - Clinical relevance
      `,
      "ai_settings.temperature": 0.3,  // Lower for more consistent results
      "ai_settings.max_tokens": 2000
    }
  }
)
```
Temperature and Sampling
Control AI randomness:
```bash
# Temperature (0.0 - 2.0)
# Lower = more deterministic, higher = more creative
AI_TEMPERATURE=0.7

# Top P sampling
AI_TOP_P=0.9

# Frequency penalty (reduce repetition)
AI_FREQUENCY_PENALTY=0.0

# Presence penalty (encourage new topics)
AI_PRESENCE_PENALTY=0.0
```
Usage in code:
```python
response = client.chat.completions.create(
    model=os.getenv("STANDARD_MODEL"),
    messages=messages,
    temperature=float(os.getenv("AI_TEMPERATURE", 0.7)),
    top_p=float(os.getenv("AI_TOP_P", 0.9)),
    frequency_penalty=float(os.getenv("AI_FREQUENCY_PENALTY", 0.0)),
    presence_penalty=float(os.getenv("AI_PRESENCE_PENALTY", 0.0)),
)
```
Response Formatting
Control output format:
```python
# JSON mode for structured output
response = client.chat.completions.create(
    model=os.getenv("STANDARD_MODEL"),
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Analyze sales data"}
    ],
    response_format={"type": "json_object"}
)

# Structured output schema
schema = {
    "type": "object",
    "properties": {
        "insights": {"type": "array"},
        "recommendations": {"type": "array"},
        "visualizations": {"type": "array"}
    },
    "required": ["insights"]
}
```
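To have the API enforce the schema rather than merely request JSON, the json_schema response format can be used; a sketch, assuming a model/deployment that supports structured outputs:

```python
# Enforce the schema defined above via structured outputs
response = client.chat.completions.create(
    model=os.getenv("STANDARD_MODEL"),
    messages=[{"role": "user", "content": "Analyze sales data"}],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "analysis", "schema": schema},
    },
)
```
Cost Management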
Token Usage Tracking
Track AI costs:
```python
from datetime import datetime

# Log token usage
db.ai_usage.insert_one({
    "user_email": user_email,
    "organization_id": org_id,
    "model": model_name,
    "prompt_tokens": response.usage.prompt_tokens,
    "completion_tokens": response.usage.completion_tokens,
    "total_tokens": response.usage.total_tokens,
    "cost_usd": calculate_cost(response.usage, model_name),
    "timestamp": datetime.now()
})

def calculate_cost(usage, model):
    # Pricing per 1K tokens (example rates)
    pricing = {
        "gpt-4o": {"prompt": 0.005, "completion": 0.015},
        "gpt-4o-mini": {"prompt": 0.00015, "completion": 0.0006},
        "o1-preview": {"prompt": 0.015, "completion": 0.060}
    }
    rates = pricing.get(model, pricing["gpt-4o"])
    cost = (
        (usage.prompt_tokens / 1000) * rates["prompt"]
        + (usage.completion_tokens / 1000) * rates["completion"]
    )
    return round(cost, 6)
```
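For intuition, at these example rates a request with 10,000 prompt tokens and 2,000 completion tokens on gpt-4o costs (10 × $0.005) + (2 × $0.015) = $0.08.
Usage Quotas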
Set organizational usage limits:
```javascript
// Set monthly token quota
db.organizations.updateOne(
  {_id: "org_xxxxxxxxxxxxx"},
  {
    $set: {
      "ai_settings.quotas": {
        "tokens_per_month": 1000000,  // 1M tokens/month
        "cost_limit_usd": 100,        // $100/month
        "requests_per_day": 1000
      }
    }
  }
)
```
Enforce quotas:
```python
from datetime import datetime

def check_quota(org_id: str) -> bool:
    # Get organization quotas
    org = db.organizations.find_one(
        {"_id": org_id},
        {"ai_settings.quotas": 1}
    )
    quotas = org.get("ai_settings", {}).get("quotas", {})

    # Get current month usage
    month_start = datetime.now().replace(day=1, hour=0, minute=0, second=0)
    usage = db.ai_usage.aggregate([
        {
            "$match": {
                "organization_id": org_id,
                "timestamp": {"$gte": month_start}
            }
        },
        {
            "$group": {
                "_id": None,
                "total_tokens": {"$sum": "$total_tokens"},
                "total_cost": {"$sum": "$cost_usd"}
            }
        }
    ])

    # An empty cursor means no usage this month
    results = list(usage)
    current = results[0] if results else {"total_tokens": 0, "total_cost": 0}

    # Check limits
    if current["total_tokens"] > quotas.get("tokens_per_month", float("inf")):
        return False
    if current["total_cost"] > quotas.get("cost_limit_usd", float("inf")):
        return False

    return True
```
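A hypothetical call site (function and error names are illustrative) gates each request on the quota check before spending tokens:

```python
def guarded_ai_completion(org_id: str, prompt: str) -> str:
    # Block the request before any tokens are spent
    if not check_quota(org_id):
        raise RuntimeError(f"Monthly AI quota exceeded for {org_id}")
    return ai_completion(prompt)
```
Cost Optimization Strategies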
1. Use appropriate models:
   - Fast model for simple tasks
   - Standard model for most work
   - Smart model only when needed

2. Optimize prompts:
   - Be specific and concise
   - Avoid unnecessary context
   - Use examples sparingly

3. Cache responses:
   ```python
   # Cache common queries
   cache_key = f"ai_response:{hash(prompt)}"
   cached = redis.get(cache_key)
   if cached:
       return cached
   response = ai_completion(prompt)
   redis.setex(cache_key, 3600, response)  # Cache 1 hour
   return response
   ```

4. Implement rate limiting (see the Redis client sketch after this list):
   ```python
   # Limit AI requests per user
   rate_limit = f"ai_rate:{user_email}"
   requests = redis.incr(rate_limit)
   if requests == 1:
       redis.expire(rate_limit, 3600)  # Reset every hour
   if requests > 100:  # Max 100 requests/hour
       raise RateLimitError("AI rate limit exceeded")
   ```
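Both snippets assume an initialized Redis client; a sketch, with host and port depending on your deployment:

```python
import redis as redis_lib

# Shadow the module name so the snippets above read naturally
redis = redis_lib.Redis(host="localhost", port=6379, decode_responses=True)
```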
Monitoring Costs
View AI usage and costs:
```javascript
// Monthly cost by organization
db.ai_usage.aggregate([
  {
    $match: {
      timestamp: {
        $gte: new Date(new Date().getFullYear(), new Date().getMonth(), 1)
      }
    }
  },
  {
    $group: {
      _id: "$organization_id",
      total_cost: {$sum: "$cost_usd"},
      total_tokens: {$sum: "$total_tokens"},
      request_count: {$sum: 1}
    }
  },
  { $sort: {total_cost: -1} }
])

// Cost by user
db.ai_usage.aggregate([
  {
    $match: {
      timestamp: { $gte: new Date(Date.now() - 30*24*60*60*1000) }
    }
  },
  {
    $group: {
      _id: "$user_email",
      total_cost: {$sum: "$cost_usd"},
      avg_tokens_per_request: { $avg: "$total_tokens" }
    }
  },
  { $sort: {total_cost: -1} }
])
```
Performance Optimization
Parallel Processing
Process multiple AI requests concurrently:
```python
import asyncio
from typing import List

async def batch_ai_completion(prompts: List[str]) -> List[str]:
    tasks = [
        ai_completion_async(prompt)
        for prompt in prompts
    ]
    return await asyncio.gather(*tasks)

# Usage
results = await batch_ai_completion([
    "Analyze dataset A",
    "Analyze dataset B",
    "Analyze dataset C"
])
```
Streaming Responses
Stream AI responses for better UX:
```python
def stream_ai_response(prompt: str):
    response = client.chat.completions.create(
        model=os.getenv("STANDARD_MODEL"),
        messages=[{"role": "user", "content": prompt}],
        stream=True
    )

    for chunk in response:
        if chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content
```
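If the serving layer is FastAPI (an assumption; the endpoint path is illustrative), the generator plugs directly into StreamingResponse:

```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.get("/ai/stream")
def stream_endpoint(prompt: str):
    # Chunks are flushed to the client as the model produces them
    return StreamingResponse(stream_ai_response(prompt), media_type="text/plain")
```
Request Batching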
Batch multiple questions:
```python
# Instead of multiple API calls
result1 = ai_completion("What is the average?")
result2 = ai_completion("What is the max?")
result3 = ai_completion("What is the min?")

# Batch into single call
batch_prompt = """Answer these questions about the data:
1. What is the average?
2. What is the maximum?
3. What is the minimum?

Format as JSON: {"average": ..., "max": ..., "min": ...}"""
result = ai_completion(batch_prompt)
```
Troubleshooting AI Issues
API Rate Limits
Issue: Rate limit errors from AI provider
Solutions:
1. Implement exponential backoff:
   ```python
   import time

   def ai_completion_with_retry(prompt, max_retries=3):
       for attempt in range(max_retries):
           try:
               return ai_completion(prompt)
           except RateLimitError:
               wait_time = 2 ** attempt
               time.sleep(wait_time)
       raise Exception("Max retries exceeded")
   ```
2. Reduce request frequency
3. Upgrade API tier (Azure/OpenAI)
Poor Quality Responses
Issue: AI providing irrelevant or incorrect answers
Solutions:
1. Improve prompts:
   - Be more specific
   - Provide examples
   - Add constraints

2. Adjust temperature:
   - Lower for factual tasks (0.3)
   - Higher for creative tasks (0.9)

3. Use a better model:
   - Upgrade to SMART_MODEL for complex tasks
High Costs
Issue: AI costs exceeding budget
Solutions:
- Audit usage by user/org
- Implement stricter quotas
- Switch to smaller models where appropriate
- Cache frequent queries
- Optimize prompts to reduce tokens
Best Practices
1. Model Selection:
   - Use fast model by default
   - Upgrade to standard/smart only when needed
   - Test model performance vs. cost tradeoffs

2. Prompt Engineering:
   - Clear, specific instructions
   - Include relevant context only
   - Use examples when helpful
   - Specify output format

3. Cost Control:
   - Set organizational quotas
   - Monitor usage regularly
   - Alert on unusual patterns
   - Implement caching

4. Performance:
   - Use async/parallel processing
   - Stream responses for UX
   - Batch similar requests
   - Cache common queries

5. Quality Assurance:
   - Validate AI responses
   - Log problematic outputs
   - Gather user feedback
   - Iterate on prompts
Next Steps
- Monitoring & Usage - Track AI usage and costs
- Environment Configuration - Configure AI settings
- Troubleshooting - Resolve AI-related issues