How to Connect Your Data

Real analysis often requires combining data from multiple sources. Whether you need to match customers to their orders, merge monthly reports into one view, or enrich your data with classifications, Querri makes connecting data simple through natural language prompts.

This guide covers two fundamental ways to connect data:

Joining — matching rows from different datasets based on a shared key
Stacking — combining similar datasets with the same columns into one

Joining Data: Matching on a Key

Joining connects rows from two datasets when they share a common value—like a customer ID, product code, or email address.

When to Use Joins

Scenario	Example
Add details to your data	Match orders to customer info
Filter based on another list	Keep only products in your catalog
Enrich with classifications	Add industry labels to companies
Combine related tables	Connect invoices to payments

How to Think About Joins

Imagine you have two lists:

List A: Your main data (e.g., 10,000 orders)
List B: Reference data you want to add (e.g., 500 customers)

A join finds matching rows and combines them. The key question is: what happens when there’s no match?

Join Type	Plain English	When to Use
Left Join	”Keep all of List A, add matching info from List B”	You want all your main records, even if some don’t match
Inner Join	”Only keep rows that exist in both lists”	You only want records that have matches on both sides
Full Join	”Keep everything from both lists”	You want a complete picture of all data

Example Prompts for Joining

Basic join — add customer details to orders:

"Join the orders data with customer info on customer_id"

Left join — keep all orders even if customer is unknown:

"Add customer details to orders, keeping all orders even if there's no customer match"

Inner join — only orders with known customers:

"Show me only orders where we have customer information"

Join on multiple keys:

"Match sales to inventory on both product_id and warehouse_id"

Join with different column names:

"Join orders to customers where orders.cust_id matches customers.customer_id"

Joining to Enrich Data

A common pattern is joining to add classifications or lookups:

"I have a list of companies with industries I classified.
Join those industry labels back to my full transaction data on company_name"

This is especially powerful after using the Researcher tool to classify a subset of your data.

Fuzzy Matching: When Keys Don’t Match Exactly

Real-world data is messy. “Acme Corp”, “ACME Corporation”, and “Acme Corp.” are the same company, but won’t match in a standard join.

Querri’s Data Query tool supports fuzzy matching using text similarity functions.

When to Use Fuzzy Matching

Scenario	Example
Name variations	”John Smith” vs “Smith, John”
Company name differences	”Microsoft” vs “Microsoft Corporation”
Typos in data entry	”Califronia” vs “California”
Address matching	”123 Main St” vs “123 Main Street”

Fuzzy Matching Example Prompts

Match companies with similar names:

"Join these two company lists, matching on company name with fuzzy matching.
Use a similarity threshold of 0.8 to catch spelling variations."

Find duplicate customers:

"Find potential duplicate customers where names are similar but not identical.
Show me pairs with similarity above 0.85 so I can review them."

Match addresses approximately:

"Match shipping addresses to our warehouse locations,
allowing for minor differences in street abbreviations"

Similarity Thresholds Guide

Different data types need different thresholds:

Data Type	Recommended Threshold	Notes
Person names	0.85 - 0.95	Higher to avoid false matches
Company names	0.80 - 0.90	Allow for Inc/Corp/LLC variations
Addresses	0.70 - 0.85	Street abbreviations vary widely
Product names	0.75 - 0.85	Model numbers need precision
Catching typos	0.90+	Very similar strings only

Tips for Better Fuzzy Matches

Standardize first — Remove extra spaces, convert to lowercase

"Clean up company names by trimming whitespace and converting to uppercase,
then do the fuzzy match"

Add exact filters — Reduce comparison volume with exact criteria

"Fuzzy match customer names, but only compare customers in the same state"

Review results — Fuzzy matching can produce false positives

"Show me the fuzzy matches with similarity scores so I can verify them"

Stacking Data: Combining Similar Datasets

Stacking (also called concatenating or appending) combines datasets that have the same structure—like monthly reports, regional data, or yearly exports.

When to Use Stacking

Scenario	Example
Combine time periods	January + February + March sales
Merge regional data	North + South + East + West territories
Consolidate file exports	Multiple CSV exports from same system
Append new records	Add this week’s data to historical data

How to Think About Stacking

Unlike joining (which adds columns), stacking adds rows:

Dataset A: 1,000 rows (January sales)
Dataset B: 1,200 rows (February sales)
Dataset C: 950 rows (March sales)
─────────────────────────────────────
Stacked:   3,150 rows (Q1 sales)

The key requirement: columns should match (or at least overlap significantly).

Example Prompts for Stacking

Basic stack — combine monthly data:

"Stack January, February, and March sales data into one dataset"

Stack with source tracking:

"Combine all quarterly reports and add a column showing which quarter each row came from"

Stack files with slightly different columns:

"Combine these three regional exports.
They have mostly the same columns but some regions have extra fields—just include all columns."

Append new data to existing:

"Add this week's transactions to the historical transactions data"

Tips for Successful Stacking

Check column alignment first

"Show me the columns in each dataset so I can verify they match"

Add a source column — Track where rows came from

"Stack these files and add a 'source_file' column"

Handle column differences — Decide what to do with mismatched columns

"Stack these datasets, filling in blanks where columns don't exist in some files"

Watch for duplicates — Stacking can create duplicates if data overlaps

"Stack the datasets and remove any duplicate rows based on transaction_id"

Comparison: Joining vs. Stacking

Aspect	Joining	Stacking
What it does	Adds columns from another dataset	Adds rows from another dataset
Requires	A matching key (ID, name, code)	Similar column structure
Result size	Usually same row count as main data	Sum of all input rows
Use when	Enriching or filtering data	Combining similar data
Example	Orders + Customer details	Jan sales + Feb sales + Mar sales

Common Patterns

Pattern 1: Classify Then Join Back

When you need to classify values that repeat across many rows:

Extract unique values:

"Get unique company names from the transactions data"

Classify the unique list:
```
"Classify each company by industry"
```

Join classifications back:

"Join the industry classifications to the full transactions data on company name"

This avoids classifying the same company hundreds of times.

Pattern 2: Stack Then Analyze

When your data arrives in pieces:

Stack all sources:

"Combine Q1, Q2, Q3, and Q4 sales files into one dataset"

Clean if needed:

"Remove duplicates and standardize the date format"

Analyze the combined data:

"Show me monthly revenue trends for the full year"

Pattern 3: Enrich From Multiple Sources

When you need data from several reference tables:

Start with main data:
```
"Load the transactions data"
```

Join customer info:

"Add customer details by matching customer_id"

Join product info:

"Add product categories by matching product_id"

Analyze enriched data:

"Show revenue by customer segment and product category"

Troubleshooting

”My join returned no results”

Cause: Keys don’t match exactly (different formats, extra spaces, case differences).

Fix: Standardize keys first or use fuzzy matching:

"Clean up the ID columns by trimming spaces and converting to uppercase, then join"

“My join has way more rows than expected”

Cause: One-to-many relationship—one record in List A matches multiple in List B.

Fix: This might be correct! If not, deduplicate first:

"Remove duplicate customer_ids from the customer list before joining"

“Some columns are missing after stacking”

Cause: Column names differ slightly between datasets.

Fix: Rename columns to match before stacking:

"Rename 'sales_amount' to 'revenue' in the Q1 data, then stack with Q2"

“Fuzzy matching is too slow”

Cause: Comparing every row to every other row is computationally expensive.

Fix: Add exact filters to reduce comparisons:

"Fuzzy match companies, but only compare those in the same country first"

Next Steps

Try the Researcher Tool for AI-powered classification
Learn Prompting Best Practices for better results
Explore Aggregating Data for summarizing results