How to Connect Your Data
Real analysis often requires combining data from multiple sources. Whether you need to match customers to their orders, merge monthly reports into one view, or enrich your data with classifications, Querri makes connecting data simple through natural language prompts.
This guide covers two fundamental ways to connect data:
- Joining — matching rows from different datasets based on a shared key
- Stacking — combining similar datasets with the same columns into one
Joining Data: Matching on a Key
Section titled “Joining Data: Matching on a Key”Joining connects rows from two datasets when they share a common value—like a customer ID, product code, or email address.
When to Use Joins
Section titled “When to Use Joins”| Scenario | Example |
|---|---|
| Add details to your data | Match orders to customer info |
| Filter based on another list | Keep only products in your catalog |
| Enrich with classifications | Add industry labels to companies |
| Combine related tables | Connect invoices to payments |
How to Think About Joins
Section titled “How to Think About Joins”Imagine you have two lists:
- List A: Your main data (e.g., 10,000 orders)
- List B: Reference data you want to add (e.g., 500 customers)
A join finds matching rows and combines them. The key question is: what happens when there’s no match?
| Join Type | Plain English | When to Use |
|---|---|---|
| Left Join | ”Keep all of List A, add matching info from List B” | You want all your main records, even if some don’t match |
| Inner Join | ”Only keep rows that exist in both lists” | You only want records that have matches on both sides |
| Full Join | ”Keep everything from both lists” | You want a complete picture of all data |
Example Prompts for Joining
Section titled “Example Prompts for Joining”Basic join — add customer details to orders:
"Join the orders data with customer info on customer_id"Left join — keep all orders even if customer is unknown:
"Add customer details to orders, keeping all orders even if there's no customer match"Inner join — only orders with known customers:
"Show me only orders where we have customer information"Join on multiple keys:
"Match sales to inventory on both product_id and warehouse_id"Join with different column names:
"Join orders to customers where orders.cust_id matches customers.customer_id"Joining to Enrich Data
Section titled “Joining to Enrich Data”A common pattern is joining to add classifications or lookups:
"I have a list of companies with industries I classified.Join those industry labels back to my full transaction data on company_name"This is especially powerful after using the Researcher tool to classify a subset of your data.
Fuzzy Matching: When Keys Don’t Match Exactly
Section titled “Fuzzy Matching: When Keys Don’t Match Exactly”Real-world data is messy. “Acme Corp”, “ACME Corporation”, and “Acme Corp.” are the same company, but won’t match in a standard join.
Querri’s Data Query tool supports fuzzy matching using text similarity functions.
When to Use Fuzzy Matching
Section titled “When to Use Fuzzy Matching”| Scenario | Example |
|---|---|
| Name variations | ”John Smith” vs “Smith, John” |
| Company name differences | ”Microsoft” vs “Microsoft Corporation” |
| Typos in data entry | ”Califronia” vs “California” |
| Address matching | ”123 Main St” vs “123 Main Street” |
Fuzzy Matching Example Prompts
Section titled “Fuzzy Matching Example Prompts”Match companies with similar names:
"Join these two company lists, matching on company name with fuzzy matching.Use a similarity threshold of 0.8 to catch spelling variations."Find duplicate customers:
"Find potential duplicate customers where names are similar but not identical.Show me pairs with similarity above 0.85 so I can review them."Match addresses approximately:
"Match shipping addresses to our warehouse locations,allowing for minor differences in street abbreviations"Similarity Thresholds Guide
Section titled “Similarity Thresholds Guide”Different data types need different thresholds:
| Data Type | Recommended Threshold | Notes |
|---|---|---|
| Person names | 0.85 - 0.95 | Higher to avoid false matches |
| Company names | 0.80 - 0.90 | Allow for Inc/Corp/LLC variations |
| Addresses | 0.70 - 0.85 | Street abbreviations vary widely |
| Product names | 0.75 - 0.85 | Model numbers need precision |
| Catching typos | 0.90+ | Very similar strings only |
Tips for Better Fuzzy Matches
Section titled “Tips for Better Fuzzy Matches”-
Standardize first — Remove extra spaces, convert to lowercase
"Clean up company names by trimming whitespace and converting to uppercase,then do the fuzzy match" -
Add exact filters — Reduce comparison volume with exact criteria
"Fuzzy match customer names, but only compare customers in the same state" -
Review results — Fuzzy matching can produce false positives
"Show me the fuzzy matches with similarity scores so I can verify them"
Stacking Data: Combining Similar Datasets
Section titled “Stacking Data: Combining Similar Datasets”Stacking (also called concatenating or appending) combines datasets that have the same structure—like monthly reports, regional data, or yearly exports.
When to Use Stacking
Section titled “When to Use Stacking”| Scenario | Example |
|---|---|
| Combine time periods | January + February + March sales |
| Merge regional data | North + South + East + West territories |
| Consolidate file exports | Multiple CSV exports from same system |
| Append new records | Add this week’s data to historical data |
How to Think About Stacking
Section titled “How to Think About Stacking”Unlike joining (which adds columns), stacking adds rows:
Dataset A: 1,000 rows (January sales)Dataset B: 1,200 rows (February sales)Dataset C: 950 rows (March sales)─────────────────────────────────────Stacked: 3,150 rows (Q1 sales)The key requirement: columns should match (or at least overlap significantly).
Example Prompts for Stacking
Section titled “Example Prompts for Stacking”Basic stack — combine monthly data:
"Stack January, February, and March sales data into one dataset"Stack with source tracking:
"Combine all quarterly reports and add a column showing which quarter each row came from"Stack files with slightly different columns:
"Combine these three regional exports.They have mostly the same columns but some regions have extra fields—just include all columns."Append new data to existing:
"Add this week's transactions to the historical transactions data"Tips for Successful Stacking
Section titled “Tips for Successful Stacking”-
Check column alignment first
"Show me the columns in each dataset so I can verify they match" -
Add a source column — Track where rows came from
"Stack these files and add a 'source_file' column" -
Handle column differences — Decide what to do with mismatched columns
"Stack these datasets, filling in blanks where columns don't exist in some files" -
Watch for duplicates — Stacking can create duplicates if data overlaps
"Stack the datasets and remove any duplicate rows based on transaction_id"
Comparison: Joining vs. Stacking
Section titled “Comparison: Joining vs. Stacking”| Aspect | Joining | Stacking |
|---|---|---|
| What it does | Adds columns from another dataset | Adds rows from another dataset |
| Requires | A matching key (ID, name, code) | Similar column structure |
| Result size | Usually same row count as main data | Sum of all input rows |
| Use when | Enriching or filtering data | Combining similar data |
| Example | Orders + Customer details | Jan sales + Feb sales + Mar sales |
Common Patterns
Section titled “Common Patterns”Pattern 1: Classify Then Join Back
Section titled “Pattern 1: Classify Then Join Back”When you need to classify values that repeat across many rows:
-
Extract unique values:
"Get unique company names from the transactions data" -
Classify the unique list:
"Classify each company by industry" -
Join classifications back:
"Join the industry classifications to the full transactions data on company name"
This avoids classifying the same company hundreds of times.
Pattern 2: Stack Then Analyze
Section titled “Pattern 2: Stack Then Analyze”When your data arrives in pieces:
-
Stack all sources:
"Combine Q1, Q2, Q3, and Q4 sales files into one dataset" -
Clean if needed:
"Remove duplicates and standardize the date format" -
Analyze the combined data:
"Show me monthly revenue trends for the full year"
Pattern 3: Enrich From Multiple Sources
Section titled “Pattern 3: Enrich From Multiple Sources”When you need data from several reference tables:
-
Start with main data:
"Load the transactions data" -
Join customer info:
"Add customer details by matching customer_id" -
Join product info:
"Add product categories by matching product_id" -
Analyze enriched data:
"Show revenue by customer segment and product category"
Troubleshooting
Section titled “Troubleshooting””My join returned no results”
Section titled “”My join returned no results””Cause: Keys don’t match exactly (different formats, extra spaces, case differences).
Fix: Standardize keys first or use fuzzy matching:
"Clean up the ID columns by trimming spaces and converting to uppercase, then join"“My join has way more rows than expected”
Section titled ““My join has way more rows than expected””Cause: One-to-many relationship—one record in List A matches multiple in List B.
Fix: This might be correct! If not, deduplicate first:
"Remove duplicate customer_ids from the customer list before joining"“Some columns are missing after stacking”
Section titled ““Some columns are missing after stacking””Cause: Column names differ slightly between datasets.
Fix: Rename columns to match before stacking:
"Rename 'sales_amount' to 'revenue' in the Q1 data, then stack with Q2"“Fuzzy matching is too slow”
Section titled ““Fuzzy matching is too slow””Cause: Comparing every row to every other row is computationally expensive.
Fix: Add exact filters to reduce comparisons:
"Fuzzy match companies, but only compare those in the same country first"Next Steps
Section titled “Next Steps”- Try the Researcher Tool for AI-powered classification
- Learn Prompting Best Practices for better results
- Explore Aggregating Data for summarizing results