Recurring Analysis

Recurring analysis allows you to automate data pipelines, ensuring your analysis stays up-to-date without manual intervention. This guide covers how to schedule step execution, automate data refreshes, chain tasks together, and handle errors gracefully.

In Querri, projects consist of steps that transform and analyze data. You can schedule individual steps or entire projects to run automatically.

Benefits:

  • Always fresh data: Your analysis updates automatically
  • Save time: No manual re-running of steps
  • Consistency: Same process runs every time
  • Early detection: Catch data issues quickly
  • Pipeline automation: Build end-to-end data workflows

Use Cases:

  • Daily ETL (Extract, Transform, Load) processes
  • Hourly data synchronization
  • Weekly report data preparation
  • Monthly aggregation and rollups
  • Real-time monitoring and alerts

Scheduling Individual Steps:

When to use:

  • Step has expensive computation
  • Step pulls data from external source
  • Step needs to run at specific times
  • Step feeds into a dashboard

How to schedule:

  1. Open your project
  2. Navigate to the step you want to automate
  3. Click on “Step Settings” (gear icon)
  4. Select “Schedule Execution”
  5. Configure frequency and time
  6. Enable the schedule

Example: Daily Data Import

Step: Import Sales Data from Database
Schedule: Every day at 2:00 AM EST
Cron: 0 2 * * *
Timezone: America/New_York
Why 2 AM?
- Database load is low
- Data from previous day is complete
- Finished before morning dashboards refresh
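
If you want to double-check a cron expression before enabling a schedule, a quick local sanity check like the sketch below can help. It uses the third-party croniter package (pip install croniter) and the standard zoneinfo module; neither is part of Querri, and the expression and timezone are simply the ones from the example above.

from datetime import datetime
from zoneinfo import ZoneInfo
from croniter import croniter   # third-party: pip install croniter

schedule = "0 2 * * *"                          # every day at 2:00 AM
base = datetime.now(ZoneInfo("America/New_York"))

# Print the next three fire times to confirm the expression and timezone.
itr = croniter(schedule, base)
for _ in range(3):
    print(itr.get_next(datetime))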

Scheduling Entire Projects:

When to use:

  • Multiple steps need to run in sequence
  • You have a complete data pipeline
  • All steps depend on each other
  • You want to ensure consistency across steps

How to schedule:

  1. Open your project
  2. Click “Project Settings”
  3. Select “Schedule Project Execution”
  4. Choose whether to run all steps or only specific steps
  5. Configure frequency and time
  6. Enable the schedule

Example: Weekly Analysis Pipeline

Project: Weekly Sales Analysis
Steps to run: All (5 steps)
1. Import data from database
2. Clean and transform data
3. Calculate metrics
4. Generate visualizations
5. Export to dashboard
Schedule: Every Monday at 6:00 AM EST
Cron: 0 6 * * 1
Timezone: America/New_York
Estimated duration: 15 minutes
Dashboard refresh: 6:30 AM (after project completes)

Conditional Execution:

Run steps only when certain conditions are met:

Data-Driven Conditions:

Run step only if:
- New data is available (check last modified timestamp)
- Row count has changed
- Specific file exists
- API returns new records
Example:
Step: Process New Orders
Condition: new_orders_count > 0
If condition is false: Skip step, continue to next
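
As a rough illustration of this pattern, the standalone Python sketch below checks a source file's modified timestamp and skips the step when nothing has changed. The file path and the way the last-run timestamp is stored are placeholders for illustration, not Querri settings.

import os
import time

SOURCE_FILE = "orders.csv"                  # hypothetical source file
last_run_timestamp = time.time() - 3600     # pretend the last run was an hour ago

def new_data_available(path: str, since: float) -> bool:
    # True if the file exists and was modified after the last successful run.
    return os.path.exists(path) and os.path.getmtime(path) > since

if new_data_available(SOURCE_FILE, last_run_timestamp):
    print("New data detected: run 'Process New Orders'")
else:
    print("No new data: skip step, continue to next")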

Time-Based Conditions:

Run step only if:
- It's the last day of the month
- It's a specific day of week
- It's within a date range
Example:
Step: Monthly Rollup
Condition: DAY_OF_MONTH = LAST_DAY
Schedule: 0 23 28-31 * * (runs on days 28-31)
Execution logic: Check if today is actually the last day
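
The cron window above fires on days 28-31, so the step itself still needs to confirm that today really is the last day of the month. A minimal check using only the Python standard library might look like this (illustrative, independent of Querri):

import calendar
from datetime import date

def is_last_day_of_month(d: date) -> bool:
    # monthrange returns (weekday of the 1st, number of days in the month).
    return d.day == calendar.monthrange(d.year, d.month)[1]

today = date.today()
if is_last_day_of_month(today):
    print("Run Monthly Rollup")
else:
    print("Not the last day of the month: skip")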

Dependency-Based Conditions:

Run step only if:
- Previous step completed successfully
- Specific flag is set
- Upstream data source is available
Example:
Step: Calculate Derived Metrics
Depends on: Step 2 (Data Cleaning)
If Step 2 failed: Skip this step, send alert
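
In pseudocode terms, the dependency check amounts to looking up the upstream step's status before running. The status dictionary and alert function below are stand-ins for illustration; they are not Querri APIs.

step_status = {"Data Cleaning": "success"}      # hypothetical status lookup

def send_alert(message: str) -> None:
    print("ALERT:", message)                    # stand-in for a real notification

if step_status.get("Data Cleaning") == "success":
    print("Run 'Calculate Derived Metrics'")
else:
    send_alert("Data Cleaning failed; skipping 'Calculate Derived Metrics'")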

Keep your data fresh with automated refresh strategies.

Incremental Refresh:

Instead of reprocessing all data, refresh only what has changed:

Benefits:

  • Faster execution
  • Lower resource usage
  • Reduced database load
  • More frequent updates possible

Implementation:

Step: Sync Customer Data
Type: Incremental Refresh
Filter: updated_at > ${last_run_timestamp}
On each run:
1. Get timestamp of last successful run
2. Query only records updated since then
3. Merge new/updated records with existing data
4. Update last_run_timestamp
Example query:
SELECT *
FROM customers
WHERE updated_at > '${last_run_timestamp}'
OR created_at > '${last_run_timestamp}'
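
The four-step loop above can be sketched in plain Python. The version below uses the standard-library sqlite3 module so it runs anywhere; in a real pipeline the connection would point at your source database and the watermark would come from Querri's tracked ${last_run_timestamp}.

import sqlite3

def incremental_sync(conn: sqlite3.Connection, last_run_timestamp: str) -> str:
    # 1-2. Query only records created or updated since the last successful run.
    rows = conn.execute(
        "SELECT * FROM customers WHERE updated_at > ? OR created_at > ?",
        (last_run_timestamp, last_run_timestamp),
    ).fetchall()

    # 3. Merge new/updated records with the existing data (details omitted).
    print(f"Merging {len(rows)} new/updated customer records")

    # 4. Advance the watermark so the next run only sees newer changes.
    new_watermark = conn.execute(
        "SELECT MAX(updated_at) FROM customers"
    ).fetchone()[0]
    return new_watermark or last_run_timestamp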

Tracking Incremental State:

Querri automatically tracks:
- Last run timestamp
- Last processed record ID
- Checkpoint markers
- Watermarks for streaming data
You can access these in your steps:
${last_run_timestamp}
${last_processed_id}
${checkpoint}

Full Refresh:

Replace all data with fresh data:

When to use:

  • Data source is small
  • Complete accuracy is critical
  • Source data may have historical changes
  • Incremental logic is complex

Implementation:

Step: Load Product Catalog
Type: Full Refresh
Process:
1. Truncate existing table
2. Load all current data
3. Apply transformations
4. Validate completeness
Schedule: Daily at 3:00 AM
Duration: ~5 minutes for 10,000 products
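
A full refresh is simple enough to sketch in a few lines. The example below again uses the standard-library sqlite3 module for portability; the products table, its columns, and the validation check are illustrative only, and the transformation step (step 3) is omitted.

import sqlite3

def full_refresh(conn: sqlite3.Connection, fresh_rows: list[tuple]) -> None:
    with conn:                                    # one transaction for the reload
        conn.execute("DELETE FROM products")      # 1. truncate existing table
        conn.executemany(                         # 2. load all current data
            "INSERT INTO products (sku, name, price) VALUES (?, ?, ?)",
            fresh_rows,
        )
    # 4. Validate completeness before downstream steps use the table.
    count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
    assert count == len(fresh_rows), "full refresh loaded an unexpected row count"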

Hybrid Refresh:

Combine full and incremental refresh:

Example Strategy:

Incremental refresh: Every hour
Full refresh: Once per day (overnight)
Hourly incremental:
- Fast updates throughout the day
- Captures recent changes
- May miss historical corrections
Daily full refresh:
- Ensures complete accuracy
- Catches any missed updates
- Serves as data quality check

Automatically determine the best refresh strategy:

Querri’s smart refresh logic:

Decision tree:
1. Check data source size
   - If < 1,000 rows: Use full refresh
   - If 1,000 rows or more: Check change rate
2. Check change rate
   - If > 50% of data changes: Use full refresh
   - If 50% or less changes: Use incremental refresh
3. Check time since last full refresh
   - If > 7 days: Force full refresh
   - Else: Use the strategy determined above
4. Monitor and adapt
   - Track execution time
   - Compare full vs. incremental accuracy
   - Adjust strategy automatically
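
Written out as plain Python, the decision tree reads as a few ordered checks. The function below is a direct translation for illustration; in practice the inputs would come from execution metadata rather than hard-coded arguments.

def choose_refresh_strategy(row_count: int,
                            change_rate: float,
                            days_since_full_refresh: int) -> str:
    # 3. A stale full refresh overrides everything else.
    if days_since_full_refresh > 7:
        return "full"
    # 1. Small sources are cheap enough to reload completely.
    if row_count < 1000:
        return "full"
    # 2. High churn makes incremental bookkeeping not worth it.
    if change_rate > 0.5:
        return "full"
    return "incremental"

print(choose_refresh_strategy(row_count=250_000, change_rate=0.03,
                              days_since_full_refresh=2))   # -> incremental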

Create complex workflows by chaining multiple automated tasks.

Sequential Chains:

Tasks run one after another, in order:

Example: Daily Reporting Pipeline

Chain: Daily Sales Reporting
Task 1: Import Data (2:00 AM)
└─> Success: Proceed to Task 2
└─> Failure: Send alert, stop chain
Task 2: Transform Data (2:15 AM - after Task 1)
└─> Success: Proceed to Task 3
└─> Failure: Send alert, stop chain
Task 3: Calculate Metrics (2:30 AM - after Task 2)
└─> Success: Proceed to Task 4
└─> Failure: Send alert, stop chain
Task 4: Refresh Dashboard (2:45 AM - after Task 3)
└─> Success: Send summary email
└─> Failure: Send alert
Task 5: Send Email Report (3:00 AM - after Task 4)
└─> Success: Chain complete
└─> Failure: Log error
Total estimated time: 1 hour
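
The control flow of a sequential chain is easy to see in code: run each task in order, stop at the first failure, and alert. The task bodies and alert function below are placeholders, not Querri APIs.

def import_data():        print("importing data")
def transform_data():     print("transforming data")
def calculate_metrics():  print("calculating metrics")
def refresh_dashboard():  print("refreshing dashboard")
def send_email_report():  print("sending email report")

def send_alert(task_name: str, error: Exception) -> None:
    print(f"ALERT: {task_name} failed: {error}")

CHAIN = [import_data, transform_data, calculate_metrics,
         refresh_dashboard, send_email_report]

for task in CHAIN:
    try:
        task()
    except Exception as exc:          # any failure stops the rest of the chain
        send_alert(task.__name__, exc)
        break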

Parallel Chains:

Tasks run simultaneously when they don't depend on each other:

Example: Multi-Source Data Pipeline

Chain: Multi-Source Analytics
Start at 2:00 AM:
├─> Branch 1: Import from Database
│ └─> Duration: 10 minutes
├─> Branch 2: Import from API
│ └─> Duration: 15 minutes
└─> Branch 3: Import from File Upload
  └─> Duration: 5 minutes
Wait for all branches to complete (by 2:15 AM)
Then: Merge Data (Task 4)
└─> Depends on: Tasks 1, 2, and 3
└─> Starts: When all branches complete
└─> Duration: 10 minutes
Finally: Refresh Dashboards (Task 5)
└─> Starts: 2:25 AM (after Task 4)
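
The timing above follows naturally from a fan-out/merge structure. As a rough sketch, the standard-library concurrent.futures module runs the three imports in parallel and blocks until all of them finish before merging; the import functions are stand-ins, not Querri APIs.

from concurrent.futures import ThreadPoolExecutor

def import_from_database(): return "database rows"
def import_from_api():      return "API rows"
def import_from_upload():   return "uploaded file rows"

with ThreadPoolExecutor() as pool:
    futures = [pool.submit(fn) for fn in
               (import_from_database, import_from_api, import_from_upload)]
    # result() blocks, so this acts as the "wait for all branches" gate.
    results = [f.result() for f in futures]

print("Merging:", results)    # Task 4 starts only after every branch completes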

Conditional Branching:

Different paths based on results or conditions:

Example: Data Quality Pipeline

Chain: Data Quality Workflow
Task 1: Import Data
└─> Check data quality
If quality score > 90%:
├─> Task 2a: Standard Processing
└─> Task 3a: Update Production Dashboard
If quality score 70-90%:
├─> Task 2b: Extra Validation
├─> Task 3b: Send Warning Email
└─> Task 4b: Update Dashboard with Warning Flag
If quality score < 70%:
├─> Task 2c: Reject Data
├─> Task 3c: Send Alert Email
└─> Task 4c: Use Previous Day's Data
All paths converge:
Task 5: Log Results and Archive
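
The same branching logic, written as plain Python so the thresholds and paths are explicit (task names are illustrative only):

def route_by_quality(quality_score: float) -> list[str]:
    if quality_score > 90:
        path = ["Standard Processing", "Update Production Dashboard"]
    elif quality_score >= 70:
        path = ["Extra Validation", "Send Warning Email",
                "Update Dashboard with Warning Flag"]
    else:
        path = ["Reject Data", "Send Alert Email", "Use Previous Day's Data"]
    # All paths converge on the same final task.
    return path + ["Log Results and Archive"]

print(route_by_quality(95))
print(route_by_quality(82))
print(route_by_quality(40))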

Fan-Out: One task triggers multiple independent tasks

Task: Process Daily Orders
├─> Create invoices
├─> Update inventory
├─> Send shipping notifications
├─> Update customer records
└─> Generate analytics reports
All run in parallel, no dependencies

Fan-In: Multiple tasks feed into one task

Monthly Close Process:
├─> Calculate Sales Revenue
├─> Calculate Operating Expenses
├─> Calculate Cost of Goods Sold
└─> Calculate Other Income
All feed into:
└─> Generate Financial Statements

Robust error handling ensures your automated workflows recover gracefully from failures.

Automatically retry failed tasks:

Simple Retry:

Configuration:
Max retries: 3
Delay between retries: 5 minutes
Backoff: None (fixed delay)
Example:
Attempt 1: Fails at 2:00 AM
Attempt 2: Retries at 2:05 AM - Fails
Attempt 3: Retries at 2:10 AM - Fails
Attempt 4: Retries at 2:15 AM - Succeeds

Exponential Backoff:

Configuration:
Max retries: 5
Initial delay: 1 minute
Backoff multiplier: 2
Example:
Attempt 1: Fails at 2:00 AM
Attempt 2: Retries at 2:01 AM (1 min delay) - Fails
Attempt 3: Retries at 2:03 AM (2 min delay) - Fails
Attempt 4: Retries at 2:07 AM (4 min delay) - Fails
Attempt 5: Retries at 2:15 AM (8 min delay) - Fails
Attempt 6: Retries at 2:31 AM (16 min delay) - Succeeds
Good for: Temporary network issues, rate limiting
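
A generic retry loop that matches the 1-2-4-8-16 minute schedule above looks like this. It is a standard-library sketch of the pattern, not Querri's internal implementation; the task argument stands in for the real step.

import time

def run_with_backoff(task, max_retries=5, initial_delay=60, multiplier=2):
    delay = initial_delay
    for attempt in range(max_retries + 1):        # first attempt + retries
        try:
            return task()
        except Exception as exc:
            if attempt == max_retries:
                raise                             # retries exhausted
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {delay}s")
            time.sleep(delay)
            delay *= multiplier                   # 60s, 120s, 240s, ...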

Intelligent Retry:

Configuration:
Retry based on error type
Transient errors (retry):
- Network timeout
- Database connection error
- Rate limit exceeded
- Temporary service unavailable
Permanent errors (don't retry):
- Authentication failure
- Permission denied
- Invalid query syntax
- Data validation error
Example:
Error: "Connection timeout"
Action: Retry with backoff
Error: "Invalid SQL syntax"
Action: Don't retry, send alert immediately
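
Classifying errors before retrying can be as simple as a lookup by exception type. The classes below are rough standard-Python analogues for the categories above; map them to whatever your data sources actually raise.

TRANSIENT = (TimeoutError, ConnectionError)     # retry with backoff
PERMANENT = (PermissionError, ValueError)       # alert immediately, don't retry

def handle_failure(exc: Exception) -> str:
    if isinstance(exc, TRANSIENT):
        return "retry with backoff"
    if isinstance(exc, PERMANENT):
        return "don't retry, send alert immediately"
    return "unknown error type: alert and investigate"

print(handle_failure(TimeoutError("Connection timeout")))   # retry with backoff
print(handle_failure(ValueError("Invalid SQL syntax")))     # don't retry, alert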

What to do when retries are exhausted:

Use Previous Data:

If current data fetch fails:
└─> Use yesterday's data
└─> Add warning flag
└─> Note in dashboard: "Data from previous day"
Good for: Daily reports where stale data is better than no data

Use Default Values:

If metric calculation fails:
└─> Use default or average value
└─> Mark as estimated
└─> Schedule manual review
Good for: Non-critical metrics, forecasting

Skip and Continue:

If optional step fails:
└─> Log the error
└─> Continue with rest of pipeline
└─> Generate report with partial data
Good for: Optional enrichment, bonus visualizations

Stop and Alert:

If critical step fails:
└─> Stop entire pipeline
└─> Send immediate alert
└─> Don't update dashboards with partial data
└─> Wait for manual intervention
Good for: Financial reports, compliance data, critical operations

Stay informed when things go wrong:

Alert Levels:

Info: Step took longer than usual (send daily summary)
Warning: Step failed but will retry (send if not resolved in 1 hour)
Error: Step failed after retries (send immediately)
Critical: Entire pipeline failed (send immediately + SMS)

Notification Content:

Subject: [ERROR] Daily Sales Pipeline Failed
Pipeline: Daily Sales Reporting
Step: Import Sales Data
Failed at: 2:15 AM EST
Error message: Connection timeout to sales database
Retry count: 3 (exhausted)
Impact: Dashboard not updated, report not sent
Last successful run: Yesterday 2:00 AM
Action required:
1. Check database connectivity
2. Verify credentials
3. Manually trigger pipeline once resolved
View logs: ${log_url}
View pipeline: ${pipeline_url}

Escalation:

Failure detected: 2:15 AM
└─> Send email to on-call engineer
Still failing after 30 minutes (2:45 AM):
└─> Send email to manager
└─> Create support ticket
Still failing after 1 hour (3:15 AM):
└─> Send SMS to on-call engineer
└─> Escalate to senior engineer
Still failing after 2 hours (4:15 AM):
└─> Page team lead
└─> Create incident

Recovering from failures gracefully:

Checkpoint and Resume:

Process: Load 1 million customer records
Every 10,000 records:
└─> Save checkpoint
If process fails at record 75,432:
└─> Resume from checkpoint 70,000
└─> Don't reprocess records 1-70,000
Benefits:
- Faster recovery
- No duplicate processing
- Incremental progress
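
A checkpoint can be as simple as a small JSON file recording the last processed record. The sketch below illustrates the pattern; the file name, batch size, and processing loop are placeholders, and Querri's own checkpoint storage works independently of this.

import json
import os

CHECKPOINT_FILE = "load_customers.checkpoint.json"   # hypothetical location

def load_checkpoint() -> int:
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            return json.load(f)["last_record"]
    return 0

def save_checkpoint(last_record: int) -> None:
    with open(CHECKPOINT_FILE, "w") as f:
        json.dump({"last_record": last_record}, f)

def load_customers(total_records: int = 1_000_000, batch: int = 10_000) -> None:
    start = load_checkpoint()                 # e.g. resumes at 70,000 after a crash
    for record_id in range(start, total_records):
        # ... process one record here ...
        if (record_id + 1) % batch == 0:
            save_checkpoint(record_id + 1)    # records up to here are done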

Transaction Rollback:

Pipeline: Update Customer Database
Begin transaction
└─> Step 1: Update customer records
└─> Step 2: Update order records
└─> Step 3: Update inventory
If any step fails:
└─> Rollback all changes
└─> Database returns to state before pipeline started
Ensures: Data consistency, no partial updates

Compensation Actions:

Forward process failed:
└─> Execute reverse process
Example:
Uploaded file → Process failed
Compensation: Delete uploaded file, clean temporary tables
Example:
Sent notification → Processing failed
Compensation: Send "Correction" notification

Track the health of your automated analysis:

Execution Metrics:

  • Success rate (% of successful runs)
  • Average execution time
  • Peak execution time
  • Execution time trend

Data Metrics:

  • Records processed per run
  • Data freshness (time since last update)
  • Data quality scores
  • Anomaly detection

Resource Metrics:

  • CPU usage
  • Memory usage
  • Database query count
  • API call count

Identify Bottlenecks:

Pipeline: Daily Sales Analysis
Total time: 45 minutes
Step 1: Import data - 5 min (11%)
Step 2: Transform data - 35 min (78%) ← Bottleneck
Step 3: Calculate metrics - 3 min (7%)
Step 4: Update dashboard - 2 min (4%)
Optimization target: Step 2

Optimization Strategies:

For slow data import:
- Add database indexes
- Use incremental instead of full refresh
- Parallelize imports from multiple sources
For slow transformations:
- Optimize SQL queries
- Use materialized views
- Cache intermediate results
- Process in batches
For resource constraints:
- Schedule during off-peak hours
- Increase allocated resources
- Distribute work across multiple workers

Best Practices:

  1. Start simple: Begin with single-step automation, add complexity gradually
  2. Test thoroughly: Run manually multiple times before automating
  3. Handle errors gracefully: Always have retry logic and fallback strategies
  4. Monitor actively: Set up alerts and check dashboards regularly
  5. Document everything: Note why automation was set up and how it works
  6. Use checkpoints: For long-running processes, save progress frequently
  7. Validate output: Add data quality checks to automated pipelines
  8. Schedule wisely: Avoid peak hours, allow buffer time between dependent tasks
  9. Version your logic: Keep track of changes to automated steps
  10. Plan for maintenance: Schedule downtime for updates and improvements

Build robust, reliable automated data pipelines with Querri!