Skip to content

Database Best Practices for Analytics

When connecting a database to Querri, a little preparation goes a long way. This guide covers how administrators can set up databases to give analytics users a clean, fast, and secure experience—without exposing them to the complexity of raw normalized tables.

Most production databases are normalized—data is split across many tables to reduce redundancy and maintain integrity. This is great for applications but frustrating for analytics.

Consider a typical e-commerce database:

orders → order_items → products → categories
↓ ↓
customers → addresses → regions
customer_segments

To answer “What’s our revenue by product category and customer segment?”, an analyst would need to join 6 tables. In a natural language tool, this creates two problems:

  1. Complexity — The AI needs to understand all the relationships
  2. Performance — Multi-table joins on large datasets can be slow
  3. Confusion — Users see dozens of tables and don’t know where to start

Create pre-joined views that present data the way analysts think about it:

-- PostgreSQL / MySQL
CREATE VIEW analytics.order_summary AS
SELECT
o.order_id,
o.order_date,
o.total_amount,
c.customer_name,
c.email,
cs.segment_name as customer_segment,
p.product_name,
cat.category_name,
oi.quantity,
oi.unit_price,
oi.line_total
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN customer_segments cs ON c.segment_id = cs.segment_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
JOIN categories cat ON p.category_id = cat.category_id;

Now users see one order_summary view instead of six tables. They can immediately ask:

  • “Show revenue by customer segment”
  • “What are the top products by category?”
  • “Revenue trend by month”

No joins required—the view handles it.

Commonly joined data If analysts always need customer info with orders, pre-join them.

Aggregated summaries Daily/weekly/monthly rollups that are expensive to compute on the fly.

Filtered subsets “Active customers only” or “Last 2 years of orders” views run faster than filtering the full table every time.

Calculated fields Profit margins, customer lifetime value, days since last order—compute once in the view.

Denormalized dimensions Flatten hierarchies (Product → Subcategory → Category → Department) into single columns.

Ad-hoc deep dives Sometimes analysts need the raw detail.

Data that changes structure frequently Views on volatile schemas break easily.

Very large fact tables Views on billion-row tables may not perform better than direct access.

Use clear, descriptive names:

-- Good: Clear what it contains
CREATE VIEW analytics.daily_sales_summary ...
CREATE VIEW analytics.customer_lifetime_value ...
CREATE VIEW analytics.product_performance ...
-- Avoid: Cryptic abbreviations
CREATE VIEW v_dss_01 ...
CREATE VIEW rpt_cust_ltv ...
CREATE VIEW analytics.customer_metrics AS
SELECT
c.customer_id,
c.customer_name,
c.email,
c.created_at as signup_date,
-- Calculated fields analysts always need
COUNT(o.order_id) as total_orders,
SUM(o.total_amount) as lifetime_value,
AVG(o.total_amount) as avg_order_value,
MIN(o.order_date) as first_order_date,
MAX(o.order_date) as last_order_date,
DATEDIFF(day, MAX(o.order_date), CURRENT_DATE) as days_since_last_order,
-- Derived segments
CASE
WHEN COUNT(o.order_id) >= 10 THEN 'VIP'
WHEN COUNT(o.order_id) >= 5 THEN 'Regular'
WHEN COUNT(o.order_id) >= 2 THEN 'Repeat'
ELSE 'New'
END as customer_tier
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name, c.email, c.created_at;

For dashboards that always show monthly or weekly data:

CREATE VIEW analytics.monthly_revenue AS
SELECT
DATE_TRUNC('month', order_date) as month,
COUNT(*) as order_count,
SUM(total_amount) as revenue,
COUNT(DISTINCT customer_id) as unique_customers,
SUM(total_amount) / COUNT(DISTINCT customer_id) as revenue_per_customer
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '3 years'
GROUP BY DATE_TRUNC('month', order_date);

Consider Materialized Views for Large Data

Section titled “Consider Materialized Views for Large Data”

For expensive aggregations, use materialized views (pre-computed and stored):

PostgreSQL:

CREATE MATERIALIZED VIEW analytics.product_performance AS
SELECT
p.product_id,
p.product_name,
p.category,
SUM(oi.quantity) as total_units_sold,
SUM(oi.line_total) as total_revenue,
COUNT(DISTINCT o.customer_id) as unique_buyers
FROM products p
JOIN order_items oi ON p.product_id = oi.product_id
JOIN orders o ON oi.order_id = o.order_id
WHERE o.order_date >= CURRENT_DATE - INTERVAL '1 year'
GROUP BY p.product_id, p.product_name, p.category;
-- Refresh periodically (e.g., nightly)
REFRESH MATERIALIZED VIEW analytics.product_performance;

SQL Server:

-- Use indexed views for automatic refresh
CREATE VIEW analytics.product_performance WITH SCHEMABINDING AS
SELECT
p.product_id,
p.product_name,
SUM(oi.line_total) as total_revenue,
COUNT_BIG(*) as row_count
FROM dbo.products p
JOIN dbo.order_items oi ON p.product_id = oi.product_id
GROUP BY p.product_id, p.product_name;
CREATE UNIQUE CLUSTERED INDEX IX_product_performance
ON analytics.product_performance(product_id);

Restricting Access to Specific Tables and Views

Section titled “Restricting Access to Specific Tables and Views”

Don’t give analytics users access to everything. Create a focused, clean environment.

Keep analytics views separate from raw tables:

PostgreSQL:

-- Create analytics schema
CREATE SCHEMA analytics;
-- Create analytics user
CREATE USER querri_analytics WITH PASSWORD 'secure_password';
-- Grant access ONLY to analytics schema
GRANT USAGE ON SCHEMA analytics TO querri_analytics;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics TO querri_analytics;
ALTER DEFAULT PRIVILEGES IN SCHEMA analytics
GRANT SELECT ON TABLES TO querri_analytics;
-- Explicitly deny access to other schemas (optional, for extra security)
REVOKE ALL ON SCHEMA public FROM querri_analytics;

MySQL:

-- Create analytics database (MySQL doesn't have schemas like PostgreSQL)
CREATE DATABASE analytics;
-- Create analytics user with access only to analytics database
CREATE USER 'querri_analytics'@'%' IDENTIFIED BY 'secure_password';
GRANT SELECT ON analytics.* TO 'querri_analytics'@'%';
FLUSH PRIVILEGES;

SQL Server:

-- Create analytics schema
CREATE SCHEMA analytics;
-- Create login and user
CREATE LOGIN querri_analytics WITH PASSWORD = 'secure_password';
USE your_database;
CREATE USER querri_analytics FOR LOGIN querri_analytics;
-- Grant access only to analytics schema
GRANT SELECT ON SCHEMA::analytics TO querri_analytics;
-- Deny access to other schemas
DENY SELECT ON SCHEMA::dbo TO querri_analytics;

If you don’t want to use a separate schema, grant access table by table:

PostgreSQL:

-- Create user with no default permissions
CREATE USER querri_analytics WITH PASSWORD 'secure_password';
GRANT CONNECT ON DATABASE your_database TO querri_analytics;
GRANT USAGE ON SCHEMA public TO querri_analytics;
-- Grant access to specific tables only
GRANT SELECT ON orders TO querri_analytics;
GRANT SELECT ON customers TO querri_analytics;
GRANT SELECT ON products TO querri_analytics;
-- User cannot see: invoices, payments, user_credentials, etc.

MySQL:

CREATE USER 'querri_analytics'@'%' IDENTIFIED BY 'secure_password';
-- Grant access to specific tables only
GRANT SELECT ON your_database.orders TO 'querri_analytics'@'%';
GRANT SELECT ON your_database.customers TO 'querri_analytics'@'%';
GRANT SELECT ON your_database.products TO 'querri_analytics'@'%';
FLUSH PRIVILEGES;

SQL Server:

CREATE USER querri_analytics FOR LOGIN querri_analytics;
-- Grant access to specific tables only
GRANT SELECT ON dbo.orders TO querri_analytics;
GRANT SELECT ON dbo.customers TO querri_analytics;
GRANT SELECT ON dbo.products TO querri_analytics;

The fewer tables users see, the faster they can find what they need.

Don’t expose:

  • Migration tracking tables (schema_migrations, __drizzle_migrations)
  • Audit logs (unless specifically needed)
  • Session/cache tables
  • Internal configuration tables

Remove or mask columns analysts don’t need:

-- View without sensitive columns
CREATE VIEW analytics.customers_safe AS
SELECT
customer_id,
customer_name,
-- Mask email
CONCAT(LEFT(email, 2), '***@***', RIGHT(email, 4)) as email_masked,
city,
state,
country,
segment,
created_at
-- Excluded: password_hash, ssn, credit_card_last4, etc.
FROM customers;

Move historical data out of main tables:

-- Main table: recent data only
CREATE VIEW analytics.orders_recent AS
SELECT * FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '2 years';
-- Archive table: older data (separate, if needed)
CREATE VIEW analytics.orders_archive AS
SELECT * FROM orders_archive;

Here’s a complete example of an analytics-ready setup:

-- Create schema
CREATE SCHEMA analytics;
-- Core views
CREATE VIEW analytics.orders AS
SELECT
o.order_id,
o.order_date,
o.total_amount,
o.status,
c.customer_id,
c.customer_name,
c.segment as customer_segment,
c.city,
c.state,
c.country
FROM raw.orders o
JOIN raw.customers c ON o.customer_id = c.customer_id
WHERE o.order_date >= CURRENT_DATE - INTERVAL '3 years';
CREATE VIEW analytics.order_items AS
SELECT
oi.order_id,
oi.line_number,
p.product_id,
p.product_name,
p.category,
p.subcategory,
oi.quantity,
oi.unit_price,
oi.line_total
FROM raw.order_items oi
JOIN raw.products p ON oi.product_id = p.product_id;
CREATE VIEW analytics.customer_summary AS
SELECT
c.customer_id,
c.customer_name,
c.segment,
c.city,
c.state,
c.country,
c.created_at as signup_date,
COUNT(o.order_id) as total_orders,
COALESCE(SUM(o.total_amount), 0) as lifetime_value,
MAX(o.order_date) as last_order_date
FROM raw.customers c
LEFT JOIN raw.orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name, c.segment,
c.city, c.state, c.country, c.created_at;
CREATE VIEW analytics.monthly_summary AS
SELECT
DATE_TRUNC('month', order_date) as month,
COUNT(*) as orders,
SUM(total_amount) as revenue,
COUNT(DISTINCT customer_id) as customers
FROM raw.orders
WHERE order_date >= CURRENT_DATE - INTERVAL '3 years'
GROUP BY DATE_TRUNC('month', order_date);
-- Create analytics user
CREATE USER querri_analytics WITH PASSWORD 'secure_password';
GRANT USAGE ON SCHEMA analytics TO querri_analytics;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics TO querri_analytics;

Now analytics users see 4 clean views instead of dozens of normalized tables.

Before connecting a database to Querri:

  • Identify key analytics tables — What do analysts actually need?
  • Create pre-joined views — For commonly combined data
  • Add calculated fields — Metrics analysts always compute
  • Create an analytics schema — Separate from raw tables
  • Set up a dedicated user — With minimal permissions
  • Grant access to views only — Not raw tables
  • Exclude sensitive columns — Mask or remove PII
  • Consider materialized views — For expensive aggregations
  • Document the views — What each one contains and when to use it
  • Test with Querri — Verify tables appear and queries work