Monitoring & Usage

This guide covers monitoring Querri’s health, tracking usage metrics, managing resources, and analyzing performance.
Usage Tracking Interface
Accessing Usage Metrics
The usage dashboard is available at:
```
https://app.yourcompany.com/settings/usage
```

Requirements: Admin access
Usage Dashboard
The usage page displays:
- User activity - Active users, login frequency
- Resource usage - Projects, dashboards, files
- Storage metrics - Total storage, per-user breakdown
- API usage - API calls, rate limits
- AI usage - Token consumption, model usage
- Integration activity - Sync frequency, data volume
Service Health Monitoring
Health Check Endpoints
Querri provides multiple health check endpoints:
External Health Check
```shell
# Healthz service (external monitoring)
curl http://localhost:8180/healthz
```

Response:
{ "status": "healthy", "timestamp": "2024-01-15T10:00:00Z", "services": { "web-app": "up", "server-api": "up", "hub": "up", "mongo": "up", "redis": "up" }}Hub Service Health
```shell
# Hub internal health check
curl http://localhost:8888/hub/healthz
```

API Health
```shell
# Server API health
curl http://localhost:8888/api/health
```

Service Status via Docker
```shell
# View all service status
docker compose ps

# Check specific service health
docker inspect --format='{{.State.Health.Status}}' querri-hub
```
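The Health.Status field is populated only for services that define a healthcheck. As a sketch, a compose-level healthcheck looks like this (Querri's shipped compose file defines its own; the interval, endpoint, and service name here are assumptions):

```yaml
services:
  hub:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8888/hub/healthz"]
      interval: 30s
      timeout: 5s
      retries: 3
```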
```shell
# View service logs
docker compose logs --tail=100 -f server-api
```

Automated Health Monitoring
Set up automated health checks with monitoring tools:
Uptime Monitoring (Uptime Robot, Pingdom)
Configure an HTTP monitor:
```
URL: https://app.yourcompany.com/api/health
Interval: 5 minutes
Expected: Status 200
Alert on: Status != 200 or response time > 5s
```

Nagios/Icinga Check
```shell
#!/bin/bash
RESPONSE=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8180/healthz)

if [ "$RESPONSE" -eq 200 ]; then
  echo "OK - Querri is healthy"
  exit 0
else
  echo "CRITICAL - Querri health check failed (HTTP $RESPONSE)"
  exit 2
fi
```

Resource Monitoring
MongoDB Monitoring
Database Metrics
```shell
# Connect to MongoDB
docker compose exec mongo mongosh -u querri -p
```

```javascript
// Database statistics
use querri
db.stats()

// Collection sizes
db.projects.stats()
db.files.stats()
db.users.stats()

// Current operations
db.currentOp()
```

Storage Usage
```javascript
// Total database size
db.stats().dataSize

// Storage by collection
db.getCollectionNames().forEach(function(collection) {
  var stats = db[collection].stats();
  print(collection + ": " + (stats.size / 1024 / 1024).toFixed(2) + " MB");
});

// File storage usage
db.files.aggregate([
  {
    $group: {
      _id: null,
      total_size_mb: { $sum: { $divide: ["$size", 1024 * 1024] } },
      file_count: { $sum: 1 }
    }
  }
])
```

Index Performance
```javascript
// List indexes
db.projects.getIndexes()

// Index statistics
db.projects.aggregate([{ $indexStats: {} }])
```
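If $indexStats reveals a frequent query pattern with no supporting index, one can be created in mongosh; a sketch (the field names are illustrative assumptions, not necessarily Querri's actual schema):

```javascript
// Example: support lookups by owner, sorted by creation date
db.projects.createIndex({ user_email: 1, created_at: -1 })
```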
```javascript
// Slow queries (enable profiling first)
db.setProfilingLevel(1, { slowms: 100 })  // Log queries > 100ms
db.system.profile.find().sort({ ts: -1 }).limit(10)
```

Redis Monitoring
```shell
# Connect to Redis
docker compose exec redis redis-cli
```

```
# Server information
INFO

# Memory usage
INFO memory

# Key statistics
INFO keyspace

# List all keys (use with caution in production)
KEYS *
```
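KEYS blocks the server while it walks the entire keyspace; on a busy production instance, SCAN is the safer way to enumerate keys incrementally (the session:* pattern is illustrative):

```
# Iterate keys without blocking; repeat with the returned cursor until it is 0
SCAN 0 MATCH session:* COUNT 100
```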
```
# Monitor real-time commands
MONITOR
```

Redis Memory Analysis
```shell
# Memory stats
redis-cli INFO memory | grep used_memory_human

# Largest keys
redis-cli --bigkeys

# Key expiration info
redis-cli TTL session:user@company.com
```

Container Resource Usage
```shell
# Docker container stats
docker stats --no-stream

# Specific service resources
docker stats querri-server-api --no-stream

# Detailed container metrics
docker inspect querri-server-api | grep -A 20 "State"
```

Disk Space Monitoring
```shell
# MongoDB volume usage
docker volume inspect querri_mongodb_data

# Docker system disk usage
docker system df -v

# Container disk usage
du -sh /var/lib/docker/volumes/querri_mongodb_data
```

Log Aggregation
Service Logs
Viewing Logs
```shell
# All services
docker compose logs

# Specific service
docker compose logs server-api

# Follow logs in real-time
docker compose logs -f --tail=100 server-api

# Filter by time
docker compose logs --since 1h server-api

# Multiple services
docker compose logs web-app server-api hub
```

Log Levels
Configure log verbosity in .env-prod:
```shell
# Application log level
LOG_LEVEL=INFO  # DEBUG, INFO, WARNING, ERROR, CRITICAL
```

For Traefik:
```yaml
log:
  level: INFO  # DEBUG, INFO, WARN, ERROR
accessLog:
  enabled: true
```

Centralized Logging
Option 1: ELK Stack (Elasticsearch, Logstash, Kibana)
docker-compose.logging.yml:
```yaml
elasticsearch:
  image: elasticsearch:8.11.0
  environment:
    - discovery.type=single-node
  ports:
    - "9200:9200"

logstash:
  image: logstash:8.11.0
  volumes:
    - ./logstash/pipeline:/usr/share/logstash/pipeline
  depends_on:
    - elasticsearch

kibana:
  image: kibana:8.11.0
  ports:
    - "5601:5601"
  depends_on:
    - elasticsearch
```

logstash/pipeline/docker.conf:
```
input {
  gelf {
    port => 12201
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "querri-logs-%{+YYYY.MM.dd}"
  }
}
```

Update the main docker-compose.yml:
```yaml
services:
  server-api:
    logging:
      driver: gelf
      options:
        gelf-address: "udp://localhost:12201"
        tag: "server-api"
```

Option 2: Loki (Lightweight)
```yaml
loki:
  image: grafana/loki:2.9.0
  ports:
    - "3100:3100"

promtail:
  image: grafana/promtail:2.9.0
  volumes:
    - /var/lib/docker/containers:/var/lib/docker/containers:ro
    - ./promtail-config.yml:/etc/promtail/config.yml
```

Application Logging
Structured Logging
Python services use structured JSON logging:
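One way records like the log call below end up as one JSON object per line is a formatter along these lines (a hand-rolled sketch; the actual formatter inside Querri may differ):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    # Context keys passed via extra={} that we want to surface
    CONTEXT_KEYS = ("user_email", "project_id", "organization_id")

    def format(self, record):
        entry = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Copy any known context fields attached to the record
        for key in self.CONTEXT_KEYS:
            if hasattr(record, key):
                entry[key] = getattr(record, key)
        return json.dumps(entry)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.getLogger().addHandler(handler)
```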
```python
import logging
import json

logger = logging.getLogger(__name__)

logger.info(
    "Project created",
    extra={
        "user_email": "user@company.com",
        "project_id": "abc123",
        "organization_id": "org_123"
    }
)
```

Log Retention
Configure log rotation:
Docker log rotation (in /etc/docker/daemon.json):

```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
```

```shell
# Restart Docker daemon
sudo systemctl restart docker
```

Performance Metrics
Application Performance Monitoring (APM)
Sentry Integration
Configure Sentry for error tracking:
```shell
PUBLIC_SENTRY_ORG_ID="your_org_id"
PUBLIC_SENTRY_PROJECT_ID="your_project_id"
PUBLIC_SENTRY_KEY="your_sentry_key"
SENTRY_AUTH_TOKEN="your_auth_token"
```

View errors in the Sentry dashboard:
- Error frequency and trends
- Stack traces
- User impact
- Performance issues
Custom Metrics
Track custom application metrics:
```python
from datetime import datetime

# Track AI usage (db is a pymongo Database handle)
db.metrics.insert_one({
    "metric": "ai_tokens_used",
    "value": 1247,
    "user_email": "user@company.com",
    "model": "gpt-4o",
    "timestamp": datetime.now()
})

# Track API latency
db.metrics.insert_one({
    "metric": "api_latency_ms",
    "value": 234,
    "endpoint": "/api/projects",
    "method": "GET",
    "timestamp": datetime.now()
})
```

Database Performance
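One housekeeping note: a metrics collection like the one above grows without bound. A TTL index caps retention automatically (a suggestion, not a Querri default; the 30-day window is an assumption):

```javascript
// mongosh: expire metrics documents 30 days after their timestamp
db.metrics.createIndex({ timestamp: 1 }, { expireAfterSeconds: 30 * 24 * 60 * 60 })
```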
Query Performance
Section titled “Query Performance”// Enable profilingdb.setProfilingLevel(2) // Log all queries
// View slow queriesdb.system.profile.find({millis: {$gt: 100}}).sort({ts: -1}).limit(10)
// Explain query performancedb.projects.find({user_email: "user@company.com"}).explain("executionStats")Connection Pool Monitoring
Section titled “Connection Pool Monitoring”# MongoDB connection pool statsfrom pymongo import MongoClient
client = MongoClient(connection_string)pool_stats = client.server_info()print(f"Connections: {pool_stats['connections']}")API Performance
Response Time Tracking
Section titled “Response Time Tracking”// Average API response time by endpointdb.api_logs.aggregate([ { $match: { timestamp: { $gte: new Date(Date.now() - 24*60*60*1000) } } }, { $group: { _id: "$endpoint", avg_response_ms: {$avg: "$response_time_ms"}, count: {$sum: 1} } }, { $sort: {avg_response_ms: -1} }])Rate Limit Monitoring
Section titled “Rate Limit Monitoring”// API calls per user (last 24 hours)db.api_logs.aggregate([ { $match: { timestamp: { $gte: new Date(Date.now() - 24*60*60*1000) } } }, { $group: { _id: "$user_email", api_calls: {$sum: 1} } }, { $sort: {api_calls: -1} }])Usage Analytics
User Activity
Active Users
Section titled “Active Users”// Active users (last 30 days)db.users.find({ last_login: { $gte: new Date(Date.now() - 30*24*60*60*1000) }}).count()
// Daily active usersdb.audit_log.aggregate([ { $match: { event_type: "authentication", timestamp: { $gte: new Date(Date.now() - 30*24*60*60*1000) } } }, { $group: { _id: { $dateToString: { format: "%Y-%m-%d", date: "$timestamp" } }, unique_users: {$addToSet: "$user_email"} } }, { $project: { date: "$_id", user_count: {$size: "$unique_users"} } }, { $sort: {date: -1} }])Feature Usage
```javascript
// Most used features
db.audit_log.aggregate([
  {
    $match: {
      timestamp: { $gte: new Date(Date.now() - 30*24*60*60*1000) }
    }
  },
  {
    $group: {
      _id: "$action",
      count: { $sum: 1 }
    }
  },
  { $sort: { count: -1 } }
])
```

Resource Usage
Projects by User
```javascript
// Projects per user
db.projects.aggregate([
  {
    $group: {
      _id: "$created_by",
      project_count: { $sum: 1 }
    }
  },
  { $sort: { project_count: -1 } }
])
```

Storage by User
```javascript
// Storage usage per user
db.files.aggregate([
  {
    $group: {
      _id: "$uploaded_by",
      total_bytes: { $sum: "$size" },
      file_count: { $sum: 1 }
    }
  },
  {
    $project: {
      user_email: "$_id",
      total_mb: { $divide: ["$total_bytes", 1024*1024] },
      file_count: 1
    }
  },
  { $sort: { total_mb: -1 } }
])
```

AI Usage Tracking
```javascript
// AI token usage by user
db.ai_usage.aggregate([
  {
    $match: {
      timestamp: { $gte: new Date(Date.now() - 30*24*60*60*1000) }
    }
  },
  {
    $group: {
      _id: "$user_email",
      total_tokens: { $sum: "$tokens" },
      requests: { $sum: 1 },
      cost_usd: { $sum: "$cost_usd" }
    }
  },
  { $sort: { total_tokens: -1 } }
])
```

Alerting
Alert Configuration
Set up alerts for critical conditions:
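Check scripts like the ones in this section are typically driven by cron; a crontab sketch (the script paths are assumptions):

```
# Disk-space check every 5 minutes, health check every minute
*/5 * * * * /opt/querri/monitoring/disk_alert.sh
* * * * * /opt/querri/monitoring/health_alert.sh
```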
Disk Space Alert
Section titled “Disk Space Alert”#!/bin/bashTHRESHOLD=80USAGE=$(df -h /var/lib/docker | awk 'NR==2 {print $5}' | sed 's/%//')
if [ $USAGE -gt $THRESHOLD ]; then echo "CRITICAL: Disk usage is ${USAGE}%" # Send alert (email, Slack, PagerDuty, etc.) curl -X POST https://hooks.slack.com/services/YOUR/WEBHOOK/URL \ -H 'Content-Type: application/json' \ -d "{\"text\": \"🚨 Querri disk usage: ${USAGE}%\"}"fiService Down Alert
Section titled “Service Down Alert”#!/bin/bashHEALTH=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8180/healthz)
if [ "$HEALTH" -ne 200 ]; then echo "CRITICAL: Health check failed (HTTP $HEALTH)" # Send alert curl -X POST https://hooks.slack.com/services/YOUR/WEBHOOK/URL \ -H 'Content-Type: application/json' \ -d '{"text": "🚨 Querri health check failed"}'fiMongoDB Replication Lag
Section titled “MongoDB Replication Lag”// Check replication lag (if using replica set)rs.printSlaveReplicationInfo()
// Alert if lag > 10 secondsconst lag = rs.status().members[1].optimeDate - rs.status().members[0].optimeDate;if (lag > 10000) { // milliseconds // Send alert}Alert Channels
Email Alerts
Section titled “Email Alerts”# Send email alertfrom api.utils.email import send_email
send_email( to="admin@company.com", subject="Querri Alert: High Disk Usage", body="Disk usage has exceeded 80%")Slack Alerts
```shell
# Slack webhook
curl -X POST https://hooks.slack.com/services/YOUR/WEBHOOK/URL \
  -H 'Content-Type: application/json' \
  -d '{"text": "🚨 Querri Alert: Service degraded"}'
```

PagerDuty Integration
Section titled “PagerDuty Integration”import requests
def send_pagerduty_alert(message): payload = { "routing_key": "YOUR_INTEGRATION_KEY", "event_action": "trigger", "payload": { "summary": message, "severity": "error", "source": "querri-monitoring" } } requests.post( "https://events.pagerduty.com/v2/enqueue", json=payload )Dashboards
Grafana Integration
Set up Grafana for visual monitoring:
```yaml
grafana:
  image: grafana/grafana:10.0.0
  ports:
    - "3000:3000"
  volumes:
    - grafana-storage:/var/lib/grafana
    - ./grafana/dashboards:/etc/grafana/provisioning/dashboards

prometheus:
  image: prom/prometheus:v2.45.0
  ports:
    - "9090:9090"
  volumes:
    - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
    - prometheus-storage:/prometheus
```

Metrics Export
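The compose fragment above mounts ./prometheus/prometheus.yml but does not show the file itself; a minimal scrape config sketch (the job name, target, and port are assumptions):

```yaml
scrape_configs:
  - job_name: "querri-server-api"
    scrape_interval: 15s
    static_configs:
      - targets: ["server-api:8000"]  # assumed service name and metrics port
```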
Export metrics to Prometheus:
```python
# Prometheus metrics endpoint (assuming FastAPI, which matches the @app.get decorator)
from fastapi import FastAPI, Response
from prometheus_client import Counter, Histogram, generate_latest

app = FastAPI()

api_requests = Counter('api_requests_total', 'Total API requests')
api_latency = Histogram('api_latency_seconds', 'API latency')

@app.get("/metrics")
def metrics():
    return Response(generate_latest(), media_type="text/plain")
```

Troubleshooting Performance Issues
High CPU Usage
Section titled “High CPU Usage”-
Identify process:
Terminal window docker stats --no-stream -
Check service logs:
Terminal window docker compose logs server-api | grep -i error -
Review query performance:
db.currentOp({"secs_running": {$gte: 5}}) -
Scale service:
Terminal window SERVER_API_REPLICAS=8 docker compose up -d --scale server-api=8
High Memory Usage
1. Check container memory:

```shell
docker stats --format "table {{.Name}}\t{{.MemUsage}}"
```

2. Review MongoDB memory:

```javascript
db.serverStatus().mem
```

3. Clear the Redis cache:

```shell
# Warning: FLUSHALL removes ALL keys, including active sessions
docker compose exec redis redis-cli FLUSHALL
```
Slow Response Times
1. Check API latency:

```javascript
db.api_logs.find().sort({ response_time_ms: -1 }).limit(10)
```

2. Review database indexes:

```javascript
db.projects.getIndexes()
```

3. Analyze slow queries:

```javascript
db.system.profile.find({ millis: { $gt: 100 } })
```
Best Practices
- Set up automated monitoring - Don’t rely on manual checks
- Establish baselines - Know normal performance metrics
- Alert on trends - Catch issues before they become critical
- Regular reviews - Weekly review of metrics and logs
- Document incidents - Keep record of issues and resolutions
- Capacity planning - Monitor growth and plan for scaling
- Test alerts - Ensure alert system is working
Next Steps
- Backup & Maintenance - Backup and recovery procedures
- Troubleshooting - Common issues and solutions
- Security & Permissions - Monitor security events
- AI Tuning - Optimize AI performance and costs