
Monitoring & Usage

This guide covers monitoring Querri’s health, tracking usage metrics, managing resources, and analyzing performance.

The usage dashboard is available at:

https://app.yourcompany.com/settings/usage

Requirements: Admin access

The usage page displays:

  • User activity - Active users, login frequency
  • Resource usage - Projects, dashboards, files
  • Storage metrics - Total storage, per-user breakdown
  • API usage - API calls, rate limits
  • AI usage - Token consumption, model usage
  • Integration activity - Sync frequency, data volume

Querri provides multiple health check endpoints:

# Healthz service (external monitoring)
curl http://localhost:8180/healthz

Response:

{
  "status": "healthy",
  "timestamp": "2024-01-15T10:00:00Z",
  "services": {
    "web-app": "up",
    "server-api": "up",
    "hub": "up",
    "mongo": "up",
    "redis": "up"
  }
}
# Hub internal health check
curl http://localhost:8888/hub/healthz
# Server API health
curl http://localhost:8888/api/health
# View all service status
docker compose ps
# Check specific service health
docker inspect --format='{{.State.Health.Status}}' querri-hub
# View service logs
docker compose logs --tail=100 -f server-api

Set up automated health checks with monitoring tools:

Configure an HTTP monitor:

URL: https://app.yourcompany.com/api/health
Interval: 5 minutes
Expected: Status 200
Alert on: Status != 200 or response time > 5s
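
If you prefer to script this check yourself, here is a minimal sketch using Python and the requests library that alerts on both a non-200 status and a response time above 5 seconds. The URL and thresholds are assumptions taken from the monitor spec above; the check_querri_health.sh script below does the same status check with curl.

# Sketch of a scripted HTTP monitor. Assumes the `requests` library and the
# health endpoint shown above; adjust URL and thresholds for your deployment.
import sys
import requests

HEALTH_URL = "https://app.yourcompany.com/api/health"
MAX_RESPONSE_SECONDS = 5

try:
    response = requests.get(HEALTH_URL, timeout=MAX_RESPONSE_SECONDS * 2)
except requests.RequestException as exc:
    print(f"CRITICAL - health check failed: {exc}")
    sys.exit(2)

if response.status_code != 200:
    print(f"CRITICAL - unexpected status {response.status_code}")
    sys.exit(2)

if response.elapsed.total_seconds() > MAX_RESPONSE_SECONDS:
    print(f"WARNING - slow response: {response.elapsed.total_seconds():.1f}s")
    sys.exit(1)

print("OK - Querri is healthy")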
check_querri_health.sh
#!/bin/bash
RESPONSE=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8180/healthz)

if [ "$RESPONSE" -eq 200 ]; then
  echo "OK - Querri is healthy"
  exit 0
else
  echo "CRITICAL - Querri health check failed (HTTP $RESPONSE)"
  exit 2
fi
# Connect to MongoDB
docker compose exec mongo mongosh -u querri -p

// Inside mongosh: database statistics
use querri
db.stats()

// Collection sizes
db.projects.stats()
db.files.stats()
db.users.stats()

// Current operations
db.currentOp()
// Total database size
db.stats().dataSize

// Storage by collection
db.getCollectionNames().forEach(function(collection) {
  var stats = db[collection].stats();
  print(collection + ": " + (stats.size / 1024 / 1024).toFixed(2) + " MB");
});
// File storage usage
db.files.aggregate([
  {
    $group: {
      _id: null,
      total_size_mb: { $sum: { $divide: ["$size", 1024 * 1024] } },
      file_count: { $sum: 1 }
    }
  }
])
// List indexes
db.projects.getIndexes()
// Index statistics
db.projects.aggregate([{$indexStats: {}}])
// Slow queries (enable profiling first)
db.setProfilingLevel(1, {slowms: 100}) // Log queries > 100ms
db.system.profile.find().sort({ts: -1}).limit(10)
# Connect to Redis
docker compose exec redis redis-cli
# Server information
INFO
# Memory usage
INFO memory
# Key statistics
INFO keyspace
# List all keys (use with caution in production)
KEYS *
# Monitor real-time commands
MONITOR
# Memory stats
redis-cli INFO memory | grep used_memory_human
# Largest keys
redis-cli --bigkeys
# Key expiration info
redis-cli TTL session:user@company.com
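
To collect these Redis metrics from a script instead of the CLI, a minimal sketch with the redis Python client is shown below. It assumes Redis is reachable on localhost:6379 with no password; adjust host, port, and credentials for your deployment.

# Sketch: pull Redis memory and client metrics with redis-py.
import redis

r = redis.Redis(host="localhost", port=6379)

memory = r.info("memory")
print("Used memory:", memory["used_memory_human"])
print("Peak memory:", memory["used_memory_peak_human"])
print("Connected clients:", r.info("clients")["connected_clients"])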
# Docker container stats
docker stats --no-stream
# Specific service resources
docker stats querri-server-api --no-stream
# Detailed container metrics
docker inspect querri-server-api | grep -A 20 "State"
# MongoDB volume usage
docker volume inspect querri_mongodb_data
# Docker system disk usage
docker system df -v
# Container disk usage
du -sh /var/lib/docker/volumes/querri_mongodb_data
# All services
docker compose logs
# Specific service
docker compose logs server-api
# Follow logs in real-time
docker compose logs -f --tail=100 server-api
# Filter by time
docker compose logs --since 1h server-api
# Multiple services
docker compose logs web-app server-api hub

Configure log verbosity in .env-prod:

# Application log level
LOG_LEVEL=INFO # DEBUG, INFO, WARNING, ERROR, CRITICAL
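
Services read this variable at startup. As a rough sketch of how a Python service might apply it (the exact wiring inside Querri's services may differ):

# Sketch: apply LOG_LEVEL from the environment; variable name matches .env-prod.
import logging
import os

logging.basicConfig(
    level=os.environ.get("LOG_LEVEL", "INFO").upper(),
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)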

For Traefik:

traefik/traefik.yml
log:
  level: INFO # DEBUG, INFO, WARN, ERROR
accessLog:
  enabled: true

Option 1: ELK Stack (Elasticsearch, Logstash, Kibana)


docker-compose.logging.yml:

services:
  elasticsearch:
    image: elasticsearch:8.11.0
    environment:
      - discovery.type=single-node
    ports:
      - "9200:9200"

  logstash:
    image: logstash:8.11.0
    volumes:
      - ./logstash/pipeline:/usr/share/logstash/pipeline
    depends_on:
      - elasticsearch

  kibana:
    image: kibana:8.11.0
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch

logstash/pipeline/docker.conf:

input {
  gelf {
    port => 12201
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "querri-logs-%{+YYYY.MM.dd}"
  }
}

Update main docker-compose.yml:

services:
  server-api:
    logging:
      driver: gelf
      options:
        gelf-address: "udp://localhost:12201"
        tag: "server-api"
Option 2: Grafana Loki and Promtail

services:
  loki:
    image: grafana/loki:2.9.0
    ports:
      - "3100:3100"

  promtail:
    image: grafana/promtail:2.9.0
    volumes:
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - ./promtail-config.yml:/etc/promtail/config.yml

Python services use structured JSON logging:

import logging
import json

logger = logging.getLogger(__name__)

logger.info(
    "Project created",
    extra={
        "user_email": "user@company.com",
        "project_id": "abc123",
        "organization_id": "org_123"
    }
)
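
The snippet above assumes a JSON formatter is attached to the root logger. One way to set that up with the standard library alone is sketched below; Querri's services may use a dedicated logging library instead, but the idea is the same.

# Sketch of a JSON log formatter: emit one JSON object per record and carry
# through the structured fields passed via `extra={...}`.
import json
import logging


class JsonFormatter(logging.Formatter):
    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for key in ("user_email", "project_id", "organization_id"):
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.getLogger().addHandler(handler)
logging.getLogger().setLevel(logging.INFO)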

Configure log rotation:

/etc/docker/daemon.json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}

Then restart the Docker daemon:

sudo systemctl restart docker

Configure Sentry for error tracking:

.env-prod
PUBLIC_SENTRY_ORG_ID="your_org_id"
PUBLIC_SENTRY_PROJECT_ID="your_project_id"
PUBLIC_SENTRY_KEY="your_sentry_key"
SENTRY_AUTH_TOKEN="your_auth_token"
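
On the backend these values are consumed by the Sentry SDK at startup. As a rough sketch, assuming the standard sentry-sdk package and that the DSN follows Sentry's usual https://&lt;key&gt;@o&lt;org_id&gt;.ingest.sentry.io/&lt;project_id&gt; convention:

# Sketch: initialize the Sentry SDK from the environment variables above.
# The DSN format and sample rate are assumptions; adjust for your Sentry setup.
import os
import sentry_sdk

dsn = (
    f"https://{os.environ['PUBLIC_SENTRY_KEY']}"
    f"@o{os.environ['PUBLIC_SENTRY_ORG_ID']}.ingest.sentry.io/"
    f"{os.environ['PUBLIC_SENTRY_PROJECT_ID']}"
)

sentry_sdk.init(
    dsn=dsn,
    traces_sample_rate=0.1,  # sample 10% of transactions for performance data
    environment="production",
)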

View errors in Sentry Dashboard:

  • Error frequency and trends
  • Stack traces
  • User impact
  • Performance issues

Track custom application metrics:

# Track AI usage (Python; `db` is a pymongo database handle)
from datetime import datetime

db.metrics.insert_one({
    "metric": "ai_tokens_used",
    "value": 1247,
    "user_email": "user@company.com",
    "model": "gpt-4o",
    "timestamp": datetime.now()
})

# Track API latency
db.metrics.insert_one({
    "metric": "api_latency_ms",
    "value": 234,
    "endpoint": "/api/projects",
    "method": "GET",
    "timestamp": datetime.now()
})
// Enable profiling
db.setProfilingLevel(2) // Log all queries
// View slow queries
db.system.profile.find({millis: {$gt: 100}}).sort({ts: -1}).limit(10)
// Explain query performance
db.projects.find({user_email: "user@company.com"}).explain("executionStats")
# MongoDB connection stats (Python; pymongo)
from pymongo import MongoClient

client = MongoClient(connection_string)
server_status = client.admin.command("serverStatus")
print(f"Connections: {server_status['connections']}")
// Average API response time by endpoint
db.api_logs.aggregate([
  {
    $match: {
      timestamp: { $gte: new Date(Date.now() - 24*60*60*1000) }
    }
  },
  {
    $group: {
      _id: "$endpoint",
      avg_response_ms: { $avg: "$response_time_ms" },
      count: { $sum: 1 }
    }
  },
  { $sort: { avg_response_ms: -1 } }
])
// API calls per user (last 24 hours)
db.api_logs.aggregate([
  {
    $match: {
      timestamp: { $gte: new Date(Date.now() - 24*60*60*1000) }
    }
  },
  {
    $group: {
      _id: "$user_email",
      api_calls: { $sum: 1 }
    }
  },
  { $sort: { api_calls: -1 } }
])
// Active users (last 30 days)
db.users.find({
  last_login: { $gte: new Date(Date.now() - 30*24*60*60*1000) }
}).count()
// Daily active users
db.audit_log.aggregate([
  {
    $match: {
      event_type: "authentication",
      timestamp: { $gte: new Date(Date.now() - 30*24*60*60*1000) }
    }
  },
  {
    $group: {
      _id: { $dateToString: { format: "%Y-%m-%d", date: "$timestamp" } },
      unique_users: { $addToSet: "$user_email" }
    }
  },
  {
    $project: {
      date: "$_id",
      user_count: { $size: "$unique_users" }
    }
  },
  { $sort: { date: -1 } }
])
// Most used features
db.audit_log.aggregate([
  {
    $match: {
      timestamp: { $gte: new Date(Date.now() - 30*24*60*60*1000) }
    }
  },
  {
    $group: {
      _id: "$action",
      count: { $sum: 1 }
    }
  },
  { $sort: { count: -1 } }
])
// Projects per user
db.projects.aggregate([
  {
    $group: {
      _id: "$created_by",
      project_count: { $sum: 1 }
    }
  },
  { $sort: { project_count: -1 } }
])
// Storage usage per user
db.files.aggregate([
  {
    $group: {
      _id: "$uploaded_by",
      total_bytes: { $sum: "$size" },
      file_count: { $sum: 1 }
    }
  },
  {
    $project: {
      user_email: "$_id",
      total_mb: { $divide: ["$total_bytes", 1024*1024] },
      file_count: 1
    }
  },
  { $sort: { total_mb: -1 } }
])
// AI token usage by user
db.ai_usage.aggregate([
  {
    $match: {
      timestamp: { $gte: new Date(Date.now() - 30*24*60*60*1000) }
    }
  },
  {
    $group: {
      _id: "$user_email",
      total_tokens: { $sum: "$tokens" },
      requests: { $sum: 1 },
      cost_usd: { $sum: "$cost_usd" }
    }
  },
  { $sort: { total_tokens: -1 } }
])
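
To run these reports on a schedule rather than in mongosh, a minimal pymongo sketch is shown below. The connection string, database name, and collection name mirror the examples above but are assumptions; adjust them for your deployment.

# Sketch: run the AI token usage report with pymongo and print a summary.
from datetime import datetime, timedelta

from pymongo import MongoClient

client = MongoClient("mongodb://querri:<password>@localhost:27017")
db = client["querri"]

since = datetime.now() - timedelta(days=30)
pipeline = [
    {"$match": {"timestamp": {"$gte": since}}},
    {"$group": {
        "_id": "$user_email",
        "total_tokens": {"$sum": "$tokens"},
        "requests": {"$sum": 1},
    }},
    {"$sort": {"total_tokens": -1}},
]

for row in db.ai_usage.aggregate(pipeline):
    print(f"{row['_id']}: {row['total_tokens']} tokens over {row['requests']} requests")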

Set up alerts for critical conditions:

check_disk_space.sh
#!/bin/bash
THRESHOLD=80
USAGE=$(df -h /var/lib/docker | awk 'NR==2 {print $5}' | sed 's/%//')

if [ $USAGE -gt $THRESHOLD ]; then
  echo "CRITICAL: Disk usage is ${USAGE}%"
  # Send alert (email, Slack, PagerDuty, etc.)
  curl -X POST https://hooks.slack.com/services/YOUR/WEBHOOK/URL \
    -H 'Content-Type: application/json' \
    -d "{\"text\": \"🚨 Querri disk usage: ${USAGE}%\"}"
fi
check_service_health.sh
#!/bin/bash
HEALTH=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8180/healthz)

if [ "$HEALTH" -ne 200 ]; then
  echo "CRITICAL: Health check failed (HTTP $HEALTH)"
  # Send alert
  curl -X POST https://hooks.slack.com/services/YOUR/WEBHOOK/URL \
    -H 'Content-Type: application/json' \
    -d '{"text": "🚨 Querri health check failed"}'
fi
// Check replication lag (if using a replica set)
rs.printSecondaryReplicationInfo()  // formerly rs.printSlaveReplicationInfo()

// Alert if lag > 10 seconds
const lag = rs.status().members[1].optimeDate - rs.status().members[0].optimeDate;
if (lag > 10000) { // milliseconds
  // Send alert
}
# Send email alert
from api.utils.email import send_email

send_email(
    to="admin@company.com",
    subject="Querri Alert: High Disk Usage",
    body="Disk usage has exceeded 80%"
)
# Slack webhook
curl -X POST https://hooks.slack.com/services/YOUR/WEBHOOK/URL \
-H 'Content-Type: application/json' \
-d '{"text": "🚨 Querri Alert: Service degraded"}'
import requests

def send_pagerduty_alert(message):
    payload = {
        "routing_key": "YOUR_INTEGRATION_KEY",
        "event_action": "trigger",
        "payload": {
            "summary": message,
            "severity": "error",
            "source": "querri-monitoring"
        }
    }
    requests.post(
        "https://events.pagerduty.com/v2/enqueue",
        json=payload
    )

Set up Grafana for visual monitoring:

docker-compose.monitoring.yml
services:
  grafana:
    image: grafana/grafana:10.0.0
    ports:
      - "3000:3000"
    volumes:
      - grafana-storage:/var/lib/grafana
      - ./grafana/dashboards:/etc/grafana/provisioning/dashboards

  prometheus:
    image: prom/prometheus:v2.45.0
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-storage:/prometheus

volumes:
  grafana-storage:
  prometheus-storage:

Export metrics to Prometheus:

# Prometheus metrics endpoint (FastAPI-style; adjust the Response import to your framework)
from fastapi import Response
from prometheus_client import Counter, Histogram, generate_latest

api_requests = Counter('api_requests_total', 'Total API requests')
api_latency = Histogram('api_latency_seconds', 'API latency')

@app.get("/metrics")
def metrics():
    return Response(generate_latest(), media_type="text/plain")
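
To actually populate those metrics, each handler (or a shared middleware) must update them. A hypothetical instrumented endpoint might look like this; list_projects_from_db is a placeholder for real handler logic:

# Hypothetical example: count requests and time the handler with prometheus_client.
@app.get("/api/projects")
def list_projects():
    api_requests.inc()
    with api_latency.time():
        return list_projects_from_db()  # placeholder for the real handler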
Troubleshooting common issues:

High CPU or resource usage:

  1. Identify the busiest containers:

     docker stats --no-stream

  2. Check service logs:

     docker compose logs server-api | grep -i error

  3. Review query performance:

     db.currentOp({"secs_running": {$gte: 5}})

  4. Scale the service:

     SERVER_API_REPLICAS=8 docker compose up -d --scale server-api=8
High memory usage:

  1. Check container memory:

     docker stats --format "table {{.Name}}\t{{.MemUsage}}"

  2. Review MongoDB memory:

     db.serverStatus().mem

  3. Clear the Redis cache:

     docker compose exec redis redis-cli FLUSHALL
Slow response times:

  1. Check API latency:

     db.api_logs.find().sort({response_time_ms: -1}).limit(10)

  2. Review database indexes:

     db.projects.getIndexes()

  3. Analyze slow queries:

     db.system.profile.find({millis: {$gt: 100}})
Monitoring best practices:

  1. Set up automated monitoring - Don’t rely on manual checks
  2. Establish baselines - Know normal performance metrics
  3. Alert on trends - Catch issues before they become critical
  4. Regular reviews - Weekly review of metrics and logs
  5. Document incidents - Keep record of issues and resolutions
  6. Capacity planning - Monitor growth and plan for scaling
  7. Test alerts - Ensure alert system is working