Recurring Analysis
Recurring analysis allows you to automate data pipelines, ensuring your analysis stays up-to-date without manual intervention. This guide covers how to schedule step execution, automate data refreshes, chain tasks together, and handle errors gracefully.
Scheduling Step Execution
In Querri, projects consist of steps that transform and analyze data. You can schedule individual steps or entire projects to run automatically.
Why Schedule Step Execution?
Benefits:
- Always fresh data: Your analysis updates automatically
- Save time: No manual re-running of steps
- Consistency: Same process runs every time
- Early detection: Catch data issues quickly
- Pipeline automation: Build end-to-end data workflows
Use Cases:
- Daily ETL (Extract, Transform, Load) processes
- Hourly data synchronization
- Weekly report data preparation
- Monthly aggregation and rollups
- Real-time monitoring and alerts
Scheduling a Single Step
When to use:
- Step has expensive computation
- Step pulls data from external source
- Step needs to run at specific times
- Step feeds into a dashboard
How to schedule:
- Open your project
- Navigate to the step you want to automate
- Click on “Step Settings” (gear icon)
- Select “Schedule Execution”
- Configure frequency and time
- Enable the schedule
Example: Daily Data Import
```
Step: Import Sales Data from Database
Schedule: Every day at 2:00 AM EST
Cron: 0 2 * * *
Timezone: America/New_York

Why 2 AM?
- Database load is low
- Data from previous day is complete
- Finished before morning dashboards refresh
```
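To sanity-check a schedule like this outside Querri, you can compute when the next run will fire in the configured timezone. A minimal sketch using only the Python standard library (the cron expression itself is interpreted by the scheduler; this only illustrates the timezone math):

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

def next_run(hour: int = 2, tz: str = "America/New_York") -> datetime:
    """Next occurrence of the scheduled hour in the pipeline's timezone."""
    now = datetime.now(ZoneInfo(tz))
    run = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if run <= now:  # today's 2:00 AM already passed, so use tomorrow's slot
        run += timedelta(days=1)
    return run

print(next_run())  # the upcoming 2:00 AM in America/New_York
```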
Scheduling an Entire Project
When to use:
- Multiple steps need to run in sequence
- You have a complete data pipeline
- All steps depend on each other
- You want to ensure consistency across steps
How to schedule:
- Open your project
- Click “Project Settings”
- Select “Schedule Project Execution”
- Choose whether to run all steps or only specific steps
- Configure frequency and time
- Enable the schedule
Example: Weekly Analysis Pipeline
```
Project: Weekly Sales Analysis
Steps to run: All (5 steps)
  1. Import data from database
  2. Clean and transform data
  3. Calculate metrics
  4. Generate visualizations
  5. Export to dashboard

Schedule: Every Monday at 6:00 AM EST
Cron: 0 6 * * 1
Timezone: America/New_York

Estimated duration: 15 minutes
Dashboard refresh: 6:30 AM (after project completes)
```
Conditional Step Execution
Run steps only when certain conditions are met:
Data-Driven Conditions:
Run step only if:
- New data is available (check last modified timestamp)
- Row count has changed
- Specific file exists
- API returns new records
Example:
```
Step: Process New Orders
Condition: new_orders_count > 0
If condition is false: Skip step, continue to next
```
Time-Based Conditions:
Run step only if:
- It's the last day of the month
- It's a specific day of week
- It's within a date range
Example:
```
Step: Monthly Rollup
Condition: DAY_OF_MONTH = LAST_DAY
Schedule: 0 23 28-31 * * (runs on days 28-31)
Execution logic: Check if today is actually the last day
```
Dependency-Based Conditions:
Run step only if:
- Previous step completed successfully
- Specific flag is set
- Upstream data source is available
Example:
```
Step: Calculate Derived Metrics
Depends on: Step 2 (Data Cleaning)
If Step 2 failed: Skip this step, send alert
```
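The same skip-or-run logic can be expressed in plain code. A minimal sketch of the data-driven condition, where `count_new_orders` and `process_new_orders` are hypothetical placeholders for your own data access and step logic:

```python
from datetime import datetime, timedelta

def count_new_orders(since: datetime) -> int:
    """Placeholder: replace with a real query against your orders source."""
    return 42  # pretend 42 orders arrived since `since`

def process_new_orders() -> None:
    """Placeholder for the actual step logic."""
    print("Processing new orders...")

def run_if_new_data(last_run: datetime) -> str:
    """Data-driven condition: run the step only when new orders exist."""
    if count_new_orders(since=last_run) == 0:
        print("No new orders since last run; skipping step.")
        return "skipped"
    process_new_orders()
    return "completed"

print(run_if_new_data(datetime.now() - timedelta(hours=1)))
```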
Data Refresh Automation
Keep your data fresh with automated refresh strategies.
Incremental Refresh
Instead of reprocessing all data, refresh only what’s changed:
Benefits:
- Faster execution
- Lower resource usage
- Reduced database load
- More frequent updates possible
Implementation:
```
Step: Sync Customer Data
Type: Incremental Refresh
Filter: updated_at > ${last_run_timestamp}

On each run:
1. Get timestamp of last successful run
2. Query only records updated since then
3. Merge new/updated records with existing data
4. Update last_run_timestamp
```

Example query:

```sql
SELECT *
FROM customers
WHERE updated_at > '${last_run_timestamp}'
   OR created_at > '${last_run_timestamp}'
```

Tracking Incremental State:

Querri automatically tracks:
- Last run timestamp
- Last processed record ID
- Checkpoint markers
- Watermarks for streaming data

You can access these in your steps:

```
${last_run_timestamp}
${last_processed_id}
${checkpoint}
```
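A minimal sketch of the incremental pattern, using an in-memory SQLite table and a Python dict as stand-ins for the real source and target (in Querri the `${last_run_timestamp}` variable would supply the watermark):

```python
import sqlite3
from datetime import datetime, timezone

def incremental_sync(conn, target, last_run_timestamp):
    """Pull only rows changed since the last run and merge them into `target`."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? OR created_at > ?",
        (last_run_timestamp, last_run_timestamp),
    ).fetchall()
    for cust_id, name, updated_at in rows:
        target[cust_id] = {"name": name, "updated_at": updated_at}  # upsert
    return datetime.now(timezone.utc).isoformat()  # new watermark for next run

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, "
                 "created_at TEXT, updated_at TEXT)")
    conn.execute("INSERT INTO customers VALUES "
                 "(1, 'Acme', '2024-01-02T00:00:00', '2024-01-05T00:00:00')")
    store = {}
    watermark = incremental_sync(conn, store, "2024-01-01T00:00:00")
    print(store, watermark)
```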
Full Refresh
Replace all data with fresh data:
When to use:
- Data source is small
- Complete accuracy is critical
- Source data may have historical changes
- Incremental logic is complex
Implementation:
```
Step: Load Product Catalog
Type: Full Refresh
Process:
1. Truncate existing table
2. Load all current data
3. Apply transformations
4. Validate completeness

Schedule: Daily at 3:00 AM
Duration: ~5 minutes for 10,000 products
```
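A minimal sketch of the truncate-and-load pattern, with SQLite standing in for the real catalog source and target; running the whole load in one transaction keeps the table from being left half-loaded:

```python
import sqlite3

def full_refresh(target: sqlite3.Connection, fresh_rows: list[tuple]) -> None:
    """Replace the whole product table: truncate, load, validate."""
    with target:  # one transaction: either the full load lands or nothing does
        target.execute("DELETE FROM products")                  # 1. truncate
        target.executemany(
            "INSERT INTO products (sku, name, price) VALUES (?, ?, ?)",
            fresh_rows,                                          # 2. load current data
        )
        count = target.execute("SELECT COUNT(*) FROM products").fetchone()[0]
        if count != len(fresh_rows):                             # 4. validate completeness
            raise RuntimeError(f"Expected {len(fresh_rows)} rows, loaded {count}")

if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE products (sku TEXT, name TEXT, price REAL)")
    full_refresh(db, [("A-1", "Widget", 9.99), ("A-2", "Gadget", 19.99)])
    print(db.execute("SELECT COUNT(*) FROM products").fetchone()[0], "products loaded")
```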
Hybrid Refresh Strategy
Combine full and incremental refresh:
Example Strategy:
```
Incremental refresh: Every hour
Full refresh: Once per day (overnight)

Hourly incremental:
- Fast updates throughout the day
- Captures recent changes
- May miss historical corrections

Daily full refresh:
- Ensures complete accuracy
- Catches any missed updates
- Serves as data quality check
```
Smart Refresh
Automatically determine the best refresh strategy:
Querri’s smart refresh logic:
```
Decision tree:
1. Check data source size
   - If < 1000 rows: Use full refresh
   - If > 1000 rows: Check change rate

2. Check change rate
   - If > 50% of data changes: Use full refresh
   - If < 50% changes: Use incremental refresh

3. Check time since last full refresh
   - If > 7 days: Force full refresh
   - Else: Use determined strategy

4. Monitor and adapt
   - Track execution time
   - Compare full vs incremental accuracy
   - Adjust strategy automatically
```
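The first three checks translate directly into code. A minimal sketch mirroring the thresholds above (the inputs would come from your own run metadata):

```python
from datetime import datetime, timedelta

def choose_refresh_strategy(row_count, change_rate, last_full_refresh):
    """Return 'full' or 'incremental' following the decision tree above."""
    # 3. A stale full refresh wins regardless of the other checks.
    if datetime.now() - last_full_refresh > timedelta(days=7):
        return "full"
    # 1. Small sources are cheap enough to reload entirely.
    if row_count < 1000:
        return "full"
    # 2. If most of the data changed anyway, incremental saves nothing.
    if change_rate > 0.5:
        return "full"
    return "incremental"

print(choose_refresh_strategy(50_000, 0.05, datetime.now() - timedelta(days=2)))
```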
Chaining Automated Tasks
Create complex workflows by chaining multiple automated tasks.
Sequential Chains
Tasks run one after another in order:
Example: Daily Reporting Pipeline
Chain: Daily Sales Reporting
```
Task 1: Import Data (2:00 AM)
  └─> Success: Proceed to Task 2
  └─> Failure: Send alert, stop chain

Task 2: Transform Data (2:15 AM - after Task 1)
  └─> Success: Proceed to Task 3
  └─> Failure: Send alert, stop chain

Task 3: Calculate Metrics (2:30 AM - after Task 2)
  └─> Success: Proceed to Task 4
  └─> Failure: Send alert, stop chain

Task 4: Refresh Dashboard (2:45 AM - after Task 3)
  └─> Success: Send summary email
  └─> Failure: Send alert

Task 5: Send Email Report (3:00 AM - after Task 4)
  └─> Success: Chain complete
  └─> Failure: Log error

Total estimated time: 1 hour
```
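A minimal sketch of the stop-on-failure pattern, with placeholder functions standing in for the real tasks:

```python
def import_data():
    print("importing data")

def transform_data():
    print("transforming data")

def calculate_metrics():
    print("calculating metrics")

def run_chain(tasks):
    """Run tasks in order; stop the chain and alert on the first failure."""
    for name, task in tasks:
        try:
            task()
        except Exception as exc:
            print(f"ALERT: {name} failed ({exc}); stopping chain")
            return False
    print("Chain complete")
    return True

run_chain([
    ("Import Data", import_data),
    ("Transform Data", transform_data),
    ("Calculate Metrics", calculate_metrics),
])
```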
Parallel Chains
Tasks run simultaneously when they don’t depend on each other:
Example: Multi-Source Data Pipeline
Chain: Multi-Source Analytics
Start at 2:00 AM:
```
├─> Branch 1: Import from Database
│     └─> Duration: 10 minutes
│
├─> Branch 2: Import from API
│     └─> Duration: 15 minutes
│
└─> Branch 3: Import from File Upload
      └─> Duration: 5 minutes

Wait for all branches to complete (by 2:15 AM)

Then: Merge Data (Task 4)
  └─> Depends on: Tasks 1, 2, and 3
  └─> Starts: When all branches complete
  └─> Duration: 10 minutes

Finally: Refresh Dashboards (Task 5)
  └─> Starts: 2:25 AM (after Task 4)
```
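A minimal sketch of the fan-out/fan-in timing using `concurrent.futures`, with placeholder import functions; the merge step starts only after every branch has returned:

```python
from concurrent.futures import ThreadPoolExecutor

def import_from_database():
    return ["db_rows"]

def import_from_api():
    return ["api_rows"]

def import_from_files():
    return ["file_rows"]

def run_parallel_chain():
    """Fan the three imports out in parallel, then fan in to a single merge."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(fn) for fn in
                   (import_from_database, import_from_api, import_from_files)]
        # The branches have no dependencies on each other, so they run concurrently;
        # .result() blocks until each branch finishes (the "wait" step).
        results = [f.result() for f in futures]
    merged = [row for branch in results for row in branch]  # Merge Data (Task 4)
    print(f"Merged {len(merged)} row batches; refreshing dashboards next")

run_parallel_chain()
```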
Conditional Chains
Different paths based on results or conditions:
Example: Data Quality Pipeline
Chain: Data Quality Workflow
```
Task 1: Import Data
  └─> Check data quality

If quality score > 90%:
  ├─> Task 2a: Standard Processing
  └─> Task 3a: Update Production Dashboard

If quality score 70-90%:
  ├─> Task 2b: Extra Validation
  ├─> Task 3b: Send Warning Email
  └─> Task 4b: Update Dashboard with Warning Flag

If quality score < 70%:
  ├─> Task 2c: Reject Data
  ├─> Task 3c: Send Alert Email
  └─> Task 4c: Use Previous Day's Data

All paths converge:
Task 5: Log Results and Archive
```
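The branching itself is just a conditional on the quality score. A minimal sketch that returns the task names for the chosen path rather than running real tasks:

```python
def route_by_quality(quality_score: float) -> list[str]:
    """Pick the branch to run based on the data quality score (0-100)."""
    if quality_score > 90:
        branch = ["standard_processing", "update_production_dashboard"]
    elif quality_score >= 70:
        branch = ["extra_validation", "send_warning_email",
                  "update_dashboard_with_warning_flag"]
    else:
        branch = ["reject_data", "send_alert_email", "use_previous_days_data"]
    return branch + ["log_results_and_archive"]  # all paths converge here

print(route_by_quality(82))
```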
Fan-Out and Fan-In Patterns
Fan-Out: One task triggers multiple independent tasks:
```
Task: Process Daily Orders
  ├─> Create invoices
  ├─> Update inventory
  ├─> Send shipping notifications
  ├─> Update customer records
  └─> Generate analytics reports

All run in parallel, no dependencies
```

Fan-In: Multiple tasks feed into one task:
Monthly Close Process:
```
├─> Calculate Sales Revenue
├─> Calculate Operating Expenses
├─> Calculate Cost of Goods Sold
└─> Calculate Other Income

All feed into:
  └─> Generate Financial Statements
```
Error Handling
Robust error handling ensures your automated workflows recover gracefully from failures.
Retry Logic
Automatically retry failed tasks:
Simple Retry:
```
Configuration:
  Max retries: 3
  Delay between retries: 5 minutes
  Backoff: None (fixed delay)

Example:
  Attempt 1: Fails at 2:00 AM
  Attempt 2: Retries at 2:05 AM - Fails
  Attempt 3: Retries at 2:10 AM - Fails
  Attempt 4: Retries at 2:15 AM - Succeeds
```
Exponential Backoff:
```
Configuration:
  Max retries: 5
  Initial delay: 1 minute
  Backoff multiplier: 2

Example:
  Attempt 1: Fails at 2:00 AM
  Attempt 2: Retries at 2:01 AM (1 min delay) - Fails
  Attempt 3: Retries at 2:03 AM (2 min delay) - Fails
  Attempt 4: Retries at 2:07 AM (4 min delay) - Fails
  Attempt 5: Retries at 2:15 AM (8 min delay) - Fails
  Attempt 6: Retries at 2:31 AM (16 min delay) - Succeeds

Good for: Temporary network issues, rate limiting
```
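A minimal sketch of the backoff loop (delays in seconds; `task` is any callable that raises on failure):

```python
import time

def run_with_backoff(task, max_retries=5, initial_delay=60, multiplier=2):
    """Retry `task` with exponentially growing delays between attempts."""
    delay = initial_delay
    for attempt in range(1, max_retries + 2):  # initial attempt + max_retries
        try:
            return task()
        except Exception as exc:
            if attempt > max_retries:
                raise  # retries exhausted; let the caller alert
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay}s")
            time.sleep(delay)
            delay *= multiplier
```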
Intelligent Retry:
```
Configuration: Retry based on error type

Transient errors (retry):
  - Network timeout
  - Database connection error
  - Rate limit exceeded
  - Temporary service unavailable

Permanent errors (don't retry):
  - Authentication failure
  - Permission denied
  - Invalid query syntax
  - Data validation error

Example:
  Error: "Connection timeout"
  Action: Retry with backoff

  Error: "Invalid SQL syntax"
  Action: Don't retry, send alert immediately
```
Fallback Strategies
What to do when retries are exhausted:
Use Previous Data:
```
If current data fetch fails:
  └─> Use yesterday's data
  └─> Add warning flag
  └─> Note in dashboard: "Data from previous day"

Good for: Daily reports where stale data is better than no data
```
Use Default Values:
```
If metric calculation fails:
  └─> Use default or average value
  └─> Mark as estimated
  └─> Schedule manual review

Good for: Non-critical metrics, forecasting
```
Skip and Continue:
```
If optional step fails:
  └─> Log the error
  └─> Continue with rest of pipeline
  └─> Generate report with partial data

Good for: Optional enrichment, bonus visualizations
```
Stop and Alert:
```
If critical step fails:
  └─> Stop entire pipeline
  └─> Send immediate alert
  └─> Don't update dashboards with partial data
  └─> Wait for manual intervention

Good for: Financial reports, compliance data, critical operations
```
Error Notifications
Stay informed when things go wrong:
Alert Levels:
```
Info: Step took longer than usual (send daily summary)
Warning: Step failed but will retry (send if not resolved in 1 hour)
Error: Step failed after retries (send immediately)
Critical: Entire pipeline failed (send immediately + SMS)
```
Notification Content:
```
Subject: [ERROR] Daily Sales Pipeline Failed

Pipeline: Daily Sales Reporting
Step: Import Sales Data
Failed at: 2:15 AM EST
Error message: Connection timeout to sales database
Retry count: 3 (exhausted)
Impact: Dashboard not updated, report not sent
Last successful run: Yesterday 2:00 AM

Action required:
1. Check database connectivity
2. Verify credentials
3. Manually trigger pipeline once resolved

View logs: ${log_url}
View pipeline: ${pipeline_url}
```
Escalation:
```
Failure detected: 2:15 AM
  └─> Send email to on-call engineer

Still failing after 30 minutes (2:45 AM):
  └─> Send email to manager
  └─> Create support ticket

Still failing after 1 hour (3:15 AM):
  └─> Send SMS to on-call engineer
  └─> Escalate to senior engineer

Still failing after 2 hours (4:15 AM):
  └─> Page team lead
  └─> Create incident
```
Error Recovery
Recover from failures gracefully:
Checkpoint and Resume:
```
Process: Load 1 million customer records

Every 10,000 records:
  └─> Save checkpoint

If process fails at record 75,432:
  └─> Resume from checkpoint 70,000
  └─> Don't reprocess records 1-70,000

Benefits:
  - Faster recovery
  - No duplicate processing
  - Incremental progress
```
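A minimal sketch of checkpointing to a local file, with a hypothetical checkpoint path and a placeholder `process` function:

```python
import json
import os

CHECKPOINT_FILE = "load_customers.checkpoint"  # hypothetical path

def process(record):
    """Placeholder: replace with the real per-record transformation/load."""
    pass

def load_records(records, checkpoint_every=10_000):
    """Process records, checkpointing so a rerun resumes where it left off."""
    start = 0
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            start = json.load(f)["last_index"]   # resume from the last checkpoint

    for i in range(start, len(records)):
        process(records[i])
        if (i + 1) % checkpoint_every == 0:
            with open(CHECKPOINT_FILE, "w") as f:
                json.dump({"last_index": i + 1}, f)

    if os.path.exists(CHECKPOINT_FILE):
        os.remove(CHECKPOINT_FILE)               # clean up after a complete run
```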
Transaction Rollback:
```
Pipeline: Update Customer Database

Begin transaction
  └─> Step 1: Update customer records
  └─> Step 2: Update order records
  └─> Step 3: Update inventory

If any step fails:
  └─> Rollback all changes
  └─> Database returns to state before pipeline started

Ensures: Data consistency, no partial updates
```
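A minimal sketch using SQLite's transaction support, with made-up table and column names; any exception inside the `with` block rolls back every update:

```python
import sqlite3

def update_customer_database(conn):
    """Run all three updates in one transaction; any failure rolls them all back."""
    try:
        with conn:  # sqlite3 commits on success, rolls back on exception
            conn.execute("UPDATE customers SET status = 'active' WHERE id = 1")
            conn.execute("UPDATE orders SET processed = 1 WHERE customer_id = 1")
            conn.execute("UPDATE inventory SET qty = qty - 1 WHERE sku = 'A-1'")
    except sqlite3.Error:
        # Every update above was rolled back; the database is unchanged.
        print("Pipeline failed; database returned to its previous state")
        raise
```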
Compensation Actions:
```
Forward process failed:
  └─> Execute reverse process

Example: Uploaded file → Process failed
Compensation: Delete uploaded file, clean temporary tables

Example: Sent notification → Processing failed
Compensation: Send "Correction" notification
```
Monitoring Recurring Analysis
Track the health of your automated analysis:
Key Metrics
Execution Metrics:
- Success rate (% of successful runs)
- Average execution time
- Peak execution time
- Execution time trend
Data Metrics:
- Records processed per run
- Data freshness (time since last update)
- Data quality scores
- Anomaly detection
Resource Metrics:
- CPU usage
- Memory usage
- Database query count
- API call count
Performance Optimization
Identify Bottlenecks:
```
Pipeline: Daily Sales Analysis
Total time: 45 minutes

Step 1: Import data       -  5 min (11%)
Step 2: Transform data    - 35 min (78%)  ← Bottleneck
Step 3: Calculate metrics -  3 min (7%)
Step 4: Update dashboard  -  2 min (4%)

Optimization target: Step 2
```
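A minimal sketch of per-step timing that surfaces the bottleneck, using `time.sleep` stand-ins for the real steps:

```python
import time

def profile_pipeline(steps):
    """Time each step and report its share of total runtime, slowest first."""
    durations = {}
    for name, step in steps:
        start = time.perf_counter()
        step()
        durations[name] = time.perf_counter() - start

    total = sum(durations.values())
    for name, seconds in sorted(durations.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {seconds:.2f}s ({seconds / total:.0%})")

profile_pipeline([
    ("Import data", lambda: time.sleep(0.10)),       # stand-ins for real steps
    ("Transform data", lambda: time.sleep(0.50)),
    ("Calculate metrics", lambda: time.sleep(0.05)),
])
```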
Optimization Strategies:

For slow data import:
- Add database indexes
- Use incremental instead of full refresh
- Parallelize imports from multiple sources

For slow transformations:
- Optimize SQL queries
- Use materialized views
- Cache intermediate results
- Process in batches

For resource constraints:
- Schedule during off-peak hours
- Increase allocated resources
- Distribute work across multiple workers
Best Practices
- Start simple: Begin with single-step automation, add complexity gradually
- Test thoroughly: Run manually multiple times before automating
- Handle errors gracefully: Always have retry logic and fallback strategies
- Monitor actively: Set up alerts and check dashboards regularly
- Document everything: Note why automation was set up and how it works
- Use checkpoints: For long-running processes, save progress frequently
- Validate output: Add data quality checks to automated pipelines
- Schedule wisely: Avoid peak hours, allow buffer time between dependent tasks
- Version your logic: Keep track of changes to automated steps
- Plan for maintenance: Schedule downtime for updates and improvements
Next Steps
- Scheduling Basics - Master scheduling for your recurring analysis
- Monitoring Automations - Track and optimize your automated workflows
- Automated Reports - Combine recurring analysis with automated reporting
Build robust, reliable automated data pipelines with Querri!