Categorize Tool
The Categorize tool automatically discovers categories in your text data when you don’t know what the groups should be ahead of time. It uses AI-powered clustering to find natural groupings, then generates human-readable labels for each one.
What the Categorize Tool Does
The Categorize tool analyzes a text column in your data, finds patterns across all rows, and groups similar items together. Under the hood, it:
- Embeds your text into numerical representations that capture meaning
- Clusters similar items together using density-based algorithms
- Labels each cluster with a descriptive, human-readable category name
- Optionally generates sub-categories for finer-grained detail
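The steps above can be illustrated with a toy pipeline. This is a minimal sketch, not the tool's actual implementation: the bag-of-words `embed`, the small hand-written `dbscan`, the `STOPWORDS` set, and the most-frequent-word `name_cluster` are all hypothetical stand-ins for the learned embeddings, density-based clustering, and AI-generated labels the tool uses.

```python
import math
from collections import Counter

# Hypothetical stopword list, just large enough for the sample data below.
STOPWORDS = {"the", "a", "on", "for", "my", "not", "after"}

def embed(text):
    """Toy embedding: bag-of-words counts (a real system would use a learned model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def dbscan(vectors, min_sim=0.3, min_pts=2):
    """Simplified DBSCAN: points with at least min_pts neighbors (self included) seed clusters."""
    n = len(vectors)
    neighbors = [
        [j for j in range(n) if cosine(vectors[i], vectors[j]) >= min_sim]
        for i in range(n)
    ]
    labels = [None] * n
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # noise, unless a later cluster claims it as a border point
            continue
        labels[i] = cluster_id
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])
        cluster_id += 1
    return labels

def name_cluster(texts, top=2):
    """Placeholder label: the most frequent non-stopwords, ties broken alphabetically."""
    counts = Counter(w for t in texts for w in t.lower().split() if w not in STOPWORDS)
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return " / ".join(ranked[:top])

texts = [
    "app crashes on login",
    "login crashes the app",
    "crashes after login screen",
    "refund for my order",
    "order refund not received",
    "want a refund for order",
    "great weather today",
]
labels = dbscan([embed(t) for t in texts])
names = {
    c: name_cluster([t for t, lab in zip(texts, labels) if lab == c])
    for c in sorted(set(labels)) if c != -1
}
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
print(names)   # {0: 'crashes / login', 1: 'order / refund'}
```

The last row shares no words with the others, so it lands outside every cluster (label `-1`). Rows like this are the "ambiguous" items the tool later assigns to the nearest group.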
Key capabilities:
- Discover themes in survey responses, support tickets, or feedback
- Group similar items without defining categories upfront
- Choose your granularity — broad themes, medium topics, or specific sub-categories
- Handle ambiguous items — assigns low-confidence items to the nearest group with a confidence score
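One way to picture the nearest-group assignment is shown below. It is a sketch under stated assumptions: the guide does not say how confidence is computed, so cosine similarity to a cluster centroid is used here purely for illustration, and `assign_nearest` and the `centroids` dictionary are hypothetical names.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0

def assign_nearest(vector, centroids):
    """Return (category, confidence, estimated) for an item that fit no cluster.

    Assumption: confidence is the similarity to the nearest centroid,
    clamped to the 0-1 range the output columns use.
    """
    best = max(centroids, key=lambda name: cosine(vector, centroids[name]))
    confidence = max(0.0, round(cosine(vector, centroids[best]), 2))
    return best, confidence, True  # estimated=True marks a nearest-group assignment

# Two made-up cluster centroids in a 2-dimensional embedding space.
centroids = {"Billing": [1.0, 0.0], "Login issues": [0.0, 1.0]}
result = assign_nearest([3.0, 4.0], centroids)
print(result)  # ('Login issues', 0.8, True)
```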
When to Use It
| Scenario | Example |
|---|---|
| Discovering feedback themes | “What topics are customers talking about in these reviews?” |
| Grouping support tickets | “Categorize these support tickets by theme” |
| Exploring survey responses | “Find the main themes in our open-ended survey responses” |
| Topic discovery | “What kinds of feature requests do we get?” |
| Content organization | “Group these articles by topic” |
When NOT to Use It
The Categorize tool isn’t the right choice when:
- You already know the categories — use the Researcher to classify into predefined groups
- Your data isn’t text-based — Categorize works on text columns, not numbers or dates
- You have fewer than 20 rows — there isn’t enough data to discover meaningful clusters
- You need exact keyword matching — use filters or formulas instead
How to Use It
Describe what you want to explore in natural language. The agent recognizes when your request calls for category discovery and invokes the Categorize tool automatically.
Effective Prompts
Theme discovery:
- “What are the main themes in these customer reviews?”
- “Categorize these support tickets by topic”
- “Group these survey responses into themes”
Exploratory analysis:
- “What kinds of feedback are people leaving?”
- “Find patterns in these product descriptions”
- “Discover the main topics in these comments”
Specific data:
- “Categorize the feature requests in the description column”
- “Group these job postings by role type”
- “Find themes in the notes field”
The Two-Phase Workflow
The Categorize tool uses a preview-then-execute approach so you can choose the right level of detail before processing your full dataset.
Phase 1: Choose Your Granularity
When you ask to categorize data, Querri analyzes a sample and presents three options:
| Level | Description | Best For |
|---|---|---|
| Broad | ~10–15 high-level themes | Executive summaries, quick overviews |
| Medium | ~30–80 topic areas | Working analysis, dashboards |
| Specific | Fine-grained with sub-categories | Detailed breakdowns, deep dives |
Each option includes a tailored description based on your actual data, so you can pick the level that fits your analysis.
Phase 2: Full Execution
After you choose a granularity level, the tool processes your entire dataset. You’ll see progress updates as it works through the pipeline. This can take a few minutes for larger datasets.
Understanding Your Results
Output Columns
The Categorize tool adds new columns to your data:
| Column | Description |
|---|---|
| Category column (e.g., `theme`) | The discovered category label for each row |
| `{column}_confidence` | How well the row fits its assigned category (0–1) |
| `{column}_estimated` | `true` if the row was ambiguous and assigned to the nearest group |
| `{column}_detail` | Sub-category within the main group (only with Specific granularity) |
Reading Confidence Scores
- High confidence (0.8–1.0): The row is a clear fit for its category
- Medium confidence (0.5–0.8): Reasonable fit, but the row has some overlap with other categories
- Low confidence (below 0.5): The row was ambiguous; check the `_estimated` flag
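If you export categorized data and work with it in a script, the bands above can be applied as in the following sketch. It assumes the category column is named `theme`, so the output columns are `theme_confidence` and `theme_estimated`. It also assumes a score of exactly 0.8 counts as high and exactly 0.5 as medium, since the ranges above share those endpoints. The sample rows are made up.

```python
def confidence_band(score):
    """Map a 0-1 confidence score to a band.

    Assumption: 0.8 and 0.5 belong to the higher band at each boundary.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence must be between 0 and 1, got {score}")
    if score >= 0.8:
        return "high"
    if score >= 0.5:
        return "medium"
    return "low"

# Illustrative rows only; real exports will contain your own data.
rows = [
    {"theme": "Billing", "theme_confidence": 0.93, "theme_estimated": False},
    {"theme": "Login issues", "theme_confidence": 0.61, "theme_estimated": False},
    {"theme": "Billing", "theme_confidence": 0.34, "theme_estimated": True},
]

bands = [confidence_band(r["theme_confidence"]) for r in rows]
# Keep only rows that were clustered directly rather than assigned to the nearest group.
clean_rows = [r for r in rows if not r["theme_estimated"]]

print(bands)            # ['high', 'medium', 'low']
print(len(clean_rows))  # 2
```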
Preparing Your Data for Best Results
Ensure You Have Enough Text
The tool needs meaningful text to cluster. Short labels or single-word entries won’t produce useful groupings.
Works well:
- Customer feedback paragraphs
- Support ticket descriptions
- Survey open-ended responses
- Product reviews
Won’t work well:
- Single-word tags
- Numeric codes
- Empty or null-heavy columns
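Before categorizing, a quick pre-check on exported data can tell you whether a column is likely to work. The sketch below uses the 20-row minimum stated in this guide. The definition of "meaningful" (a string of at least three words) is an assumption made for illustration, not the tool's actual rule, and the function names are hypothetical.

```python
MIN_ROWS = 20  # minimum number of usable rows stated in this guide

def is_meaningful(value, min_words=3):
    """Assumed heuristic: a string with at least `min_words` whitespace-separated words."""
    if not isinstance(value, str):
        return False
    return len(value.split()) >= min_words

def has_enough_text(values, min_rows=MIN_ROWS):
    """True if the column has at least `min_rows` meaningful entries."""
    return sum(is_meaningful(v) for v in values) >= min_rows

sample = [
    "Checkout page froze twice",
    "bug",                          # single-word tag
    None,                           # null
    "",                             # empty
    "12345",                        # numeric code
    "Love the new export feature",
]
usable = [v for v in sample if is_meaningful(v)]

print(usable)                   # ['Checkout page froze twice', 'Love the new export feature']
print(has_enough_text(sample))  # False
```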
Clean Up Before Categorizing
Better input produces better categories:
Remove irrelevant rows:
"Filter out rows where the description is empty"
Focus on the right subset:
"Filter to feedback from 2025, then categorize by theme"
Deduplicate if needed:
"Remove duplicate descriptions, then categorize"
Let the Tool Pick the Right Column
The Categorize tool automatically detects which column contains the most meaningful text. It prioritizes columns with names like `description`, `text`, `body`, `message`, `feedback`, and `review`. If your text is in a differently named column, just mention it in your prompt:
"Categorize the data based on the comments column"
Tips for Best Results
Start Broad, Then Go Specific
If you’re exploring unfamiliar data, start with the Broad granularity to get an overview. You can always re-run with Specific once you understand the landscape.
Reduce Row Count First
Processing fewer rows is faster and often produces cleaner categories. Filter, deduplicate, or sample before categorizing:
"Filter to the last 6 months of tickets, then categorize by theme"
Combine Relevant Columns
If the information you need is spread across multiple columns, the tool can analyze them together. Mention which columns matter:
"Categorize based on both the subject and description fields"
Use Results for Further Analysis
Once your data is categorized, use the new columns for downstream analysis:
"Show a bar chart of ticket count by theme"
"What's the average satisfaction score per category?"
"Which themes have the most critical-priority tickets?"
Real-World Examples
Example 1: Customer Feedback Themes
Starting data: 3,000 customer feedback responses
Step 1: Prepare
"Filter to responses where feedback is not empty"
Step 2: Categorize
"Categorize this feedback by theme"
Step 3: Choose granularity: select Medium for a working analysis
Step 4: Analyze
"Show the top 10 themes by volume as a bar chart"
Example 2: Support Ticket Topics
Starting data: 8,000 support tickets with subject and body
Step 1: Focus
"Filter to tickets from Q4 2025"
Step 2: Categorize
"Categorize these tickets by topic using the subject and body"
Step 3: Choose granularity: select Specific for sub-categories
Step 4: Drill down
"Show ticket count by category and detail as a stacked bar chart"
Example 3: Survey Response Exploration
Starting data: 1,500 open-ended survey responses
Step 1: Categorize
"What are the main themes in these survey responses?"
Step 2: Choose granularity: select Broad for an executive summary
Step 3: Summarize
"Create a table showing each theme, its row count, and a representative example"
Troubleshooting
“Not enough text data”
Cause: Fewer than 20 rows with meaningful text content.
Fix:
- Check for empty or null values: “How many rows have empty descriptions?”
- Make sure the right column is being used: “Use the notes column instead”
- Add more data if available
“Categories are too broad or too narrow”
Cause: The granularity level doesn’t match your needs.
Fix: Re-run with a different granularity. Start with Medium if unsure.
“Some rows are marked as estimated”
Cause: Those rows didn’t fit neatly into any cluster and were assigned to the nearest one.
Fix: This is expected. Check the `_confidence` column: low-confidence estimated rows may genuinely be outliers or mixed topics. You can filter them out for cleaner analysis:
"Filter to rows where estimated is false"
“Processing is taking a long time”
Cause: Large dataset or high granularity level.
Fix:
- Reduce row count by filtering or deduplicating first
- Use Broad granularity for faster results
- The tool shows progress updates — longer processing often means better categories
Next Steps
- Researcher Tool: for classifying into known categories
- Forecaster Tool: for time series predictions
- Aggregating Data: summarize categorized results
- Creating Visualizations: chart your category breakdowns
- Dashboard Basics: add category analysis to dashboards