Recurring Analysis

Recurring analysis allows you to automate data pipelines, ensuring your analysis stays up-to-date without manual intervention. This guide covers how to schedule step execution, automate data refreshes, chain tasks together, and handle errors gracefully.

In Querri, projects consist of steps that transform and analyze data. You can schedule individual steps or entire projects to run automatically.

Benefits:

  • Always fresh data: Your analysis updates automatically
  • Save time: No manual re-running of steps
  • Consistency: Same process runs every time
  • Early detection: Catch data issues quickly
  • Pipeline automation: Build end-to-end data workflows

Use Cases:

  • Daily ETL (Extract, Transform, Load) processes
  • Hourly data synchronization
  • Weekly report data preparation
  • Monthly aggregation and rollups
  • Real-time monitoring and alerts

Scheduling Individual Steps:

When to use:

  • Step has expensive computation
  • Step pulls data from external source
  • Step needs to run at specific times
  • Step feeds into a dashboard

How to schedule:

  1. Open your project
  2. Navigate to the step you want to automate
  3. Click on “Step Settings” (gear icon)
  4. Select “Schedule Execution”
  5. Configure frequency and time
  6. Enable the schedule

Example: Daily Data Import

Step: Import Sales Data from Database
Schedule: Every day at 2:00 AM EST
Cron: 0 2 * * *
Timezone: America/New_York
Why 2 AM?
- Database load is low
- Data from previous day is complete
- Finished before morning dashboards refresh
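
If you want to double-check a cron expression before enabling a schedule, a quick local sanity check like the sketch below can help. It uses the third-party croniter package (pip install croniter) and the standard zoneinfo module; neither is part of Querri, and the expression and timezone are simply the ones from the example above.

from datetime import datetime
from zoneinfo import ZoneInfo
from croniter import croniter   # third-party: pip install croniter

schedule = "0 2 * * *"                          # every day at 2:00 AM
base = datetime.now(ZoneInfo("America/New_York"))

# Print the next three fire times to confirm the expression and timezone.
itr = croniter(schedule, base)
for _ in range(3):
    print(itr.get_next(datetime))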

Scheduling Entire Projects:

When to use:

  • Multiple steps need to run in sequence
  • You have a complete data pipeline
  • All steps depend on each other
  • You want to ensure consistency across steps

How to schedule:

  1. Open your project
  2. Click “Project Settings”
  3. Select “Schedule Project Execution”
  4. Choose whether to run all steps or only specific steps
  5. Configure frequency and time
  6. Enable the schedule

Example: Weekly Analysis Pipeline

Project: Weekly Sales Analysis
Steps to run: All (5 steps)
1. Import data from database
2. Clean and transform data
3. Calculate metrics
4. Generate visualizations
5. Export to dashboard
Schedule: Every Monday at 6:00 AM EST
Cron: 0 6 * * 1
Timezone: America/New_York
Estimated duration: 15 minutes
Dashboard refresh: 6:30 AM (after project completes)

Conditional Execution:

Run steps only when certain conditions are met:

Data-Driven Conditions:

Run step only if:
- New data is available (check last modified timestamp)
- Row count has changed
- Specific file exists
- API returns new records
Example:
Step: Process New Orders
Condition: new_orders_count > 0
If condition is false: Skip step, continue to next
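
As a rough illustration of this pattern, the standalone Python sketch below checks a source file's modified timestamp and skips the step when nothing has changed. The file path and the way the last-run timestamp is stored are placeholders for illustration, not Querri settings.

import os
import time

SOURCE_FILE = "orders.csv"                  # hypothetical source file
last_run_timestamp = time.time() - 3600     # pretend the last run was an hour ago

def new_data_available(path: str, since: float) -> bool:
    # True if the file exists and was modified after the last successful run.
    return os.path.exists(path) and os.path.getmtime(path) > since

if new_data_available(SOURCE_FILE, last_run_timestamp):
    print("New data detected: run 'Process New Orders'")
else:
    print("No new data: skip step, continue to next")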

Time-Based Conditions:

Run step only if:
- It's the last day of the month
- It's a specific day of week
- It's within a date range
Example:
Step: Monthly Rollup
Condition: DAY_OF_MONTH = LAST_DAY
Schedule: 0 23 28-31 * * (runs on days 28-31)
Execution logic: Check if today is actually the last day
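
The cron window above fires on days 28-31, so the step itself still needs to confirm that today really is the last day of the month. A minimal check using only the Python standard library might look like this (illustrative, independent of Querri):

import calendar
from datetime import date

def is_last_day_of_month(d: date) -> bool:
    # monthrange returns (weekday of the 1st, number of days in the month).
    return d.day == calendar.monthrange(d.year, d.month)[1]

today = date.today()
if is_last_day_of_month(today):
    print("Run Monthly Rollup")
else:
    print("Not the last day of the month: skip")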

Dependency-Based Conditions:

Run step only if:
- Previous step completed successfully
- Specific flag is set
- Upstream data source is available
Example:
Step: Calculate Derived Metrics
Depends on: Step 2 (Data Cleaning)
If Step 2 failed: Skip this step, send alert
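
In pseudocode terms, the dependency check amounts to looking up the upstream step's status before running. The status dictionary and alert function below are stand-ins for illustration; they are not Querri APIs.

step_status = {"Data Cleaning": "success"}      # hypothetical status lookup

def send_alert(message: str) -> None:
    print("ALERT:", message)                    # stand-in for a real notification

if step_status.get("Data Cleaning") == "success":
    print("Run 'Calculate Derived Metrics'")
else:
    send_alert("Data Cleaning failed; skipping 'Calculate Derived Metrics'")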

Keep your data fresh with automated refresh strategies.

Incremental Refresh:

Instead of reprocessing all data, refresh only what has changed:

Benefits:

  • Faster execution
  • Lower resource usage
  • Reduced database load
  • More frequent updates possible

Implementation:

Step: Sync Customer Data
Type: Incremental Refresh
Filter: updated_at > ${last_run_timestamp}
On each run:
1. Get timestamp of last successful run
2. Query only records updated since then
3. Merge new/updated records with existing data
4. Update last_run_timestamp
Example query:
SELECT *
FROM customers
WHERE updated_at > '${last_run_timestamp}'
OR created_at > '${last_run_timestamp}'
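
The four-step loop above can be sketched in plain Python. The version below uses the standard-library sqlite3 module so it runs anywhere; in a real pipeline the connection would point at your source database and the watermark would come from Querri's tracked ${last_run_timestamp}.

import sqlite3

def incremental_sync(conn: sqlite3.Connection, last_run_timestamp: str) -> str:
    # 1-2. Query only records created or updated since the last successful run.
    rows = conn.execute(
        "SELECT * FROM customers WHERE updated_at > ? OR created_at > ?",
        (last_run_timestamp, last_run_timestamp),
    ).fetchall()

    # 3. Merge new/updated records with the existing data (details omitted).
    print(f"Merging {len(rows)} new/updated customer records")

    # 4. Advance the watermark so the next run only sees newer changes.
    new_watermark = conn.execute(
        "SELECT MAX(updated_at) FROM customers"
    ).fetchone()[0]
    return new_watermark or last_run_timestamp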

Tracking Incremental State:

Querri automatically tracks:
- Last run timestamp
- Last processed record ID
- Checkpoint markers
- Watermarks for streaming data
You can access these in your steps:
${last_run_timestamp}
${last_processed_id}
${checkpoint}

Full Refresh:

Replace all data with fresh data:

When to use:

  • Data source is small
  • Complete accuracy is critical
  • Source data may have historical changes
  • Incremental logic is complex

Implementation:

Step: Load Product Catalog
Type: Full Refresh
Process:
1. Truncate existing table
2. Load all current data
3. Apply transformations
4. Validate completeness
Schedule: Daily at 3:00 AM
Duration: ~5 minutes for 10,000 products
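
A full refresh is simple enough to sketch in a few lines. The example below again uses the standard-library sqlite3 module for portability; the products table, its columns, and the validation check are illustrative only, and the transformation step (step 3) is omitted.

import sqlite3

def full_refresh(conn: sqlite3.Connection, fresh_rows: list[tuple]) -> None:
    with conn:                                    # one transaction for the reload
        conn.execute("DELETE FROM products")      # 1. truncate existing table
        conn.executemany(                         # 2. load all current data
            "INSERT INTO products (sku, name, price) VALUES (?, ?, ?)",
            fresh_rows,
        )
    # 4. Validate completeness before downstream steps use the table.
    count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
    assert count == len(fresh_rows), "full refresh loaded an unexpected row count"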

Hybrid Refresh:

Combine full and incremental refresh:

Example Strategy:

Incremental refresh: Every hour
Full refresh: Once per day (overnight)
Hourly incremental:
- Fast updates throughout the day
- Captures recent changes
- May miss historical corrections
Daily full refresh:
- Ensures complete accuracy
- Catches any missed updates
- Serves as data quality check

Automatically determine the best refresh strategy:

Querri’s smart refresh logic:

Decision tree:
1. Check data source size
   - If < 1,000 rows: Use full refresh
   - If 1,000 rows or more: Check change rate
2. Check change rate
   - If > 50% of data changes: Use full refresh
   - If 50% or less changes: Use incremental refresh
3. Check time since last full refresh
   - If > 7 days: Force full refresh
   - Else: Use the strategy determined above
4. Monitor and adapt
   - Track execution time
   - Compare full vs. incremental accuracy
   - Adjust strategy automatically
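
Written out as plain Python, the decision tree reads as a few ordered checks. The function below is a direct translation for illustration; in practice the inputs would come from execution metadata rather than hard-coded arguments.

def choose_refresh_strategy(row_count: int,
                            change_rate: float,
                            days_since_full_refresh: int) -> str:
    # 3. A stale full refresh overrides everything else.
    if days_since_full_refresh > 7:
        return "full"
    # 1. Small sources are cheap enough to reload completely.
    if row_count < 1000:
        return "full"
    # 2. High churn makes incremental bookkeeping not worth it.
    if change_rate > 0.5:
        return "full"
    return "incremental"

print(choose_refresh_strategy(row_count=250_000, change_rate=0.03,
                              days_since_full_refresh=2))   # -> incremental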

Create complex workflows by chaining multiple automated tasks.

Sequential Chains:

Tasks run one after another, in order:

Example: Daily Reporting Pipeline

Chain: Daily Sales Reporting
Task 1: Import Data (2:00 AM)
└─> Success: Proceed to Task 2
└─> Failure: Send alert, stop chain
Task 2: Transform Data (2:15 AM - after Task 1)
└─> Success: Proceed to Task 3
└─> Failure: Send alert, stop chain
Task 3: Calculate Metrics (2:30 AM - after Task 2)
└─> Success: Proceed to Task 4
└─> Failure: Send alert, stop chain
Task 4: Refresh Dashboard (2:45 AM - after Task 3)
└─> Success: Send summary email
└─> Failure: Send alert
Task 5: Send Email Report (3:00 AM - after Task 4)
└─> Success: Chain complete
└─> Failure: Log error
Total estimated time: 1 hour
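
The control flow of a sequential chain is easy to see in code: run each task in order, stop at the first failure, and alert. The task bodies and alert function below are placeholders, not Querri APIs.

def import_data():        print("importing data")
def transform_data():     print("transforming data")
def calculate_metrics():  print("calculating metrics")
def refresh_dashboard():  print("refreshing dashboard")
def send_email_report():  print("sending email report")

def send_alert(task_name: str, error: Exception) -> None:
    print(f"ALERT: {task_name} failed: {error}")

CHAIN = [import_data, transform_data, calculate_metrics,
         refresh_dashboard, send_email_report]

for task in CHAIN:
    try:
        task()
    except Exception as exc:          # any failure stops the rest of the chain
        send_alert(task.__name__, exc)
        break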

Parallel Chains:

Tasks run simultaneously when they don't depend on each other:

Example: Multi-Source Data Pipeline

Chain: Multi-Source Analytics
Start at 2:00 AM:
├─> Branch 1: Import from Database
│ └─> Duration: 10 minutes
├─> Branch 2: Import from API
│ └─> Duration: 15 minutes
└─> Branch 3: Import from File Upload
  └─> Duration: 5 minutes
Wait for all branches to complete (by 2:15 AM)
Then: Merge Data (Task 4)
└─> Depends on: Tasks 1, 2, and 3
└─> Starts: When all branches complete
└─> Duration: 10 minutes
Finally: Refresh Dashboards (Task 5)
└─> Starts: 2:25 AM (after Task 4)
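
The timing above follows naturally from a fan-out/merge structure. As a rough sketch, the standard-library concurrent.futures module runs the three imports in parallel and blocks until all of them finish before merging; the import functions are stand-ins, not Querri APIs.

from concurrent.futures import ThreadPoolExecutor

def import_from_database(): return "database rows"
def import_from_api():      return "API rows"
def import_from_upload():   return "uploaded file rows"

with ThreadPoolExecutor() as pool:
    futures = [pool.submit(fn) for fn in
               (import_from_database, import_from_api, import_from_upload)]
    # result() blocks, so this acts as the "wait for all branches" gate.
    results = [f.result() for f in futures]

print("Merging:", results)    # Task 4 starts only after every branch completes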

Conditional Branching:

Different paths based on results or conditions:

Example: Data Quality Pipeline

Chain: Data Quality Workflow
Task 1: Import Data
└─> Check data quality
If quality score > 90%:
├─> Task 2a: Standard Processing
└─> Task 3a: Update Production Dashboard
If quality score 70-90%:
├─> Task 2b: Extra Validation
├─> Task 3b: Send Warning Email
└─> Task 4b: Update Dashboard with Warning Flag
If quality score < 70%:
├─> Task 2c: Reject Data
├─> Task 3c: Send Alert Email
└─> Task 4c: Use Previous Day's Data
All paths converge:
Task 5: Log Results and Archive
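
The same branching logic, written as plain Python so the thresholds and paths are explicit (task names are illustrative only):

def route_by_quality(quality_score: float) -> list[str]:
    if quality_score > 90:
        path = ["Standard Processing", "Update Production Dashboard"]
    elif quality_score >= 70:
        path = ["Extra Validation", "Send Warning Email",
                "Update Dashboard with Warning Flag"]
    else:
        path = ["Reject Data", "Send Alert Email", "Use Previous Day's Data"]
    # All paths converge on the same final task.
    return path + ["Log Results and Archive"]

print(route_by_quality(95))
print(route_by_quality(82))
print(route_by_quality(40))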

Fan-Out: One task triggers multiple independent tasks

Task: Process Daily Orders
├─> Create invoices
├─> Update inventory
├─> Send shipping notifications
├─> Update customer records
└─> Generate analytics reports
All run in parallel, no dependencies

Fan-In: Multiple tasks feed into one task

Monthly Close Process:
├─> Calculate Sales Revenue
├─> Calculate Operating Expenses
├─> Calculate Cost of Goods Sold
└─> Calculate Other Income
All feed into:
└─> Generate Financial Statements

Robust error handling ensures your automated workflows recover gracefully from failures.

Automatically retry failed tasks:

Simple Retry:

Configuration:
Max retries: 3
Delay between retries: 5 minutes
Backoff: None (fixed delay)
Example:
Attempt 1: Fails at 2:00 AM
Attempt 2: Retries at 2:05 AM - Fails
Attempt 3: Retries at 2:10 AM - Fails
Attempt 4: Retries at 2:15 AM - Succeeds

Exponential Backoff:

Configuration:
Max retries: 5
Initial delay: 1 minute
Backoff multiplier: 2
Example:
Attempt 1: Fails at 2:00 AM
Attempt 2: Retries at 2:01 AM (1 min delay) - Fails
Attempt 3: Retries at 2:03 AM (2 min delay) - Fails
Attempt 4: Retries at 2:07 AM (4 min delay) - Fails
Attempt 5: Retries at 2:15 AM (8 min delay) - Fails
Attempt 6: Retries at 2:31 AM (16 min delay) - Succeeds
Good for: Temporary network issues, rate limiting
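
A generic retry loop that matches the 1-2-4-8-16 minute schedule above looks like this. It is a standard-library sketch of the pattern, not Querri's internal implementation; the task argument stands in for the real step.

import time

def run_with_backoff(task, max_retries=5, initial_delay=60, multiplier=2):
    delay = initial_delay
    for attempt in range(max_retries + 1):        # first attempt + retries
        try:
            return task()
        except Exception as exc:
            if attempt == max_retries:
                raise                             # retries exhausted
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {delay}s")
            time.sleep(delay)
            delay *= multiplier                   # 60s, 120s, 240s, ...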

Intelligent Retry:

Configuration:
Retry based on error type
Transient errors (retry):
- Network timeout
- Database connection error
- Rate limit exceeded
- Temporary service unavailable
Permanent errors (don't retry):
- Authentication failure
- Permission denied
- Invalid query syntax
- Data validation error
Example:
Error: "Connection timeout"
Action: Retry with backoff
Error: "Invalid SQL syntax"
Action: Don't retry, send alert immediately
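
Classifying errors before retrying can be as simple as a lookup by exception type. The classes below are rough standard-Python analogues for the categories above; map them to whatever your data sources actually raise.

TRANSIENT = (TimeoutError, ConnectionError)     # retry with backoff
PERMANENT = (PermissionError, ValueError)       # alert immediately, don't retry

def handle_failure(exc: Exception) -> str:
    if isinstance(exc, TRANSIENT):
        return "retry with backoff"
    if isinstance(exc, PERMANENT):
        return "don't retry, send alert immediately"
    return "unknown error type: alert and investigate"

print(handle_failure(TimeoutError("Connection timeout")))   # retry with backoff
print(handle_failure(ValueError("Invalid SQL syntax")))     # don't retry, alert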

What to do when retries are exhausted:

Use Previous Data:

If current data fetch fails:
└─> Use yesterday's data
└─> Add warning flag
└─> Note in dashboard: "Data from previous day"
Good for: Daily reports where stale data is better than no data

Use Default Values:

If metric calculation fails:
└─> Use default or average value
└─> Mark as estimated
└─> Schedule manual review
Good for: Non-critical metrics, forecasting

Skip and Continue:

If optional step fails:
└─> Log the error
└─> Continue with rest of pipeline
└─> Generate report with partial data
Good for: Optional enrichment, bonus visualizations

Stop and Alert:

If critical step fails:
└─> Stop entire pipeline
└─> Send immediate alert
└─> Don't update dashboards with partial data
└─> Wait for manual intervention
Good for: Financial reports, compliance data, critical operations

Stay informed when things go wrong:

Alert Levels:

Info: Step took longer than usual (send daily summary)
Warning: Step failed but will retry (send if not resolved in 1 hour)
Error: Step failed after retries (send immediately)
Critical: Entire pipeline failed (send immediately + SMS)

Notification Content:

Subject: [ERROR] Daily Sales Pipeline Failed
Pipeline: Daily Sales Reporting
Step: Import Sales Data
Failed at: 2:15 AM EST
Error message: Connection timeout to sales database
Retry count: 3 (exhausted)
Impact: Dashboard not updated, report not sent
Last successful run: Yesterday 2:00 AM
Action required:
1. Check database connectivity
2. Verify credentials
3. Manually trigger pipeline once resolved
View logs: ${log_url}
View pipeline: ${pipeline_url}

Escalation:

Failure detected: 2:15 AM
└─> Send email to on-call engineer
Still failing after 30 minutes (2:45 AM):
└─> Send email to manager
└─> Create support ticket
Still failing after 1 hour (3:15 AM):
└─> Send SMS to on-call engineer
└─> Escalate to senior engineer
Still failing after 2 hours (4:15 AM):
└─> Page team lead
└─> Create incident

Recovering from failures gracefully:

Checkpoint and Resume:

Process: Load 1 million customer records
Every 10,000 records:
└─> Save checkpoint
If process fails at record 75,432:
└─> Resume from checkpoint 70,000
└─> Don't reprocess records 1-70,000
Benefits:
- Faster recovery
- No duplicate processing
- Incremental progress
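
A checkpoint can be as simple as a small JSON file recording the last processed record. The sketch below illustrates the pattern; the file name, batch size, and processing loop are placeholders, and Querri's own checkpoint storage works independently of this.

import json
import os

CHECKPOINT_FILE = "load_customers.checkpoint.json"   # hypothetical location

def load_checkpoint() -> int:
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            return json.load(f)["last_record"]
    return 0

def save_checkpoint(last_record: int) -> None:
    with open(CHECKPOINT_FILE, "w") as f:
        json.dump({"last_record": last_record}, f)

def load_customers(total_records: int = 1_000_000, batch: int = 10_000) -> None:
    start = load_checkpoint()                 # e.g. resumes at 70,000 after a crash
    for record_id in range(start, total_records):
        # ... process one record here ...
        if (record_id + 1) % batch == 0:
            save_checkpoint(record_id + 1)    # records up to here are done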

Transaction Rollback:

Pipeline: Update Customer Database
Begin transaction
└─> Step 1: Update customer records
└─> Step 2: Update order records
└─> Step 3: Update inventory
If any step fails:
└─> Rollback all changes
└─> Database returns to state before pipeline started
Ensures: Data consistency, no partial updates

Compensation Actions:

Forward process failed:
└─> Execute reverse process
Example:
Uploaded file → Process failed
Compensation: Delete uploaded file, clean temporary tables
Example:
Sent notification → Processing failed
Compensation: Send "Correction" notification

Track the health of your automated analysis:

Execution Metrics:

  • Success rate (% of successful runs)
  • Average execution time
  • Peak execution time
  • Execution time trend

Data Metrics:

  • Records processed per run
  • Data freshness (time since last update)
  • Data quality scores
  • Anomaly detection

Resource Metrics:

  • CPU usage
  • Memory usage
  • Database query count
  • API call count

Identify Bottlenecks:

Pipeline: Daily Sales Analysis
Total time: 45 minutes
Step 1: Import data - 5 min (11%)
Step 2: Transform data - 35 min (78%) ← Bottleneck
Step 3: Calculate metrics - 3 min (7%)
Step 4: Update dashboard - 2 min (4%)
Optimization target: Step 2

Optimization Strategies:

For slow data import:
- Add database indexes
- Use incremental instead of full refresh
- Parallelize imports from multiple sources
For slow transformations:
- Optimize SQL queries
- Use materialized views
- Cache intermediate results
- Process in batches
For resource constraints:
- Schedule during off-peak hours
- Increase allocated resources
- Distribute work across multiple workers

Best Practices:

  1. Start simple: Begin with single-step automation, add complexity gradually
  2. Test thoroughly: Run manually multiple times before automating
  3. Handle errors gracefully: Always have retry logic and fallback strategies
  4. Monitor actively: Set up alerts and check dashboards regularly
  5. Document everything: Note why automation was set up and how it works
  6. Use checkpoints: For long-running processes, save progress frequently
  7. Validate output: Add data quality checks to automated pipelines
  8. Schedule wisely: Avoid peak hours, allow buffer time between dependent tasks
  9. Version your logic: Keep track of changes to automated steps
  10. Plan for maintenance: Schedule downtime for updates and improvements

Build robust, reliable automated data pipelines with Querri!