Categorize Tool

The Categorize tool automatically discovers categories in your text data when you don’t know what the groups should be ahead of time. It uses AI-powered clustering to find natural groupings, then generates human-readable labels for each one.

What the Categorize Tool Does

The Categorize tool analyzes a text column in your data, finds patterns across all rows, and groups similar items together. Under the hood, it:

Embeds your text into numerical representations that capture meaning
Clusters similar items together using density-based algorithms
Labels each cluster with a descriptive, human-readable category name
Optionally generates sub-categories for finer-grained detail
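
The embed, cluster, and label steps above can be illustrated with a toy sketch. This is not Querri's implementation: the word-count "embedding", the simplified density-based grouping, the most-frequent-word label, and every function name and threshold below are assumptions chosen only to make the three steps concrete.

```python
import math
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "is", "to", "and", "my", "i", "it", "of", "in", "for", "on", "was"}

def embed(text):
    """Toy embedding: word counts. Real systems use learned vector embeddings."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return Counter(w for w in words if w and w not in STOPWORDS)

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster(vectors, min_sim, min_pts):
    """Simplified density-based grouping: dense neighborhoods grow into clusters, -1 is noise."""
    n = len(vectors)
    neighbors = [[j for j in range(n) if cosine(vectors[i], vectors[j]) >= min_sim]
                 for i in range(n)]
    labels = [None] * n
    next_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may later join a cluster as a border point
            continue
        labels[i] = next_id
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = next_id  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = next_id
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])
        next_id += 1
    return labels

def label(cluster_vectors):
    """Toy label: the most frequent word in the cluster (ties broken alphabetically)."""
    totals = Counter()
    for v in cluster_vectors:
        totals.update(v)
    return min(totals, key=lambda w: (-totals[w], w))

def categorize(texts, min_sim=0.2, min_pts=2):
    vectors = [embed(t) for t in texts]
    ids = cluster(vectors, min_sim, min_pts)
    names = {cid: label([v for v, c in zip(vectors, ids) if c == cid])
             for cid in set(ids) - {-1}}
    return [names.get(cid) for cid in ids]  # None marks a row that fit no cluster

feedback = [
    "Shipping was slow and the package arrived late",
    "Late shipping, package took two weeks",
    "Shipping delay on my order",
    "The app crashes when I open settings",
    "App crashes on login every time",
    "Login screen crashes the app",
    "Please add dark mode",
]
print(categorize(feedback))
# ['shipping', 'shipping', 'shipping', 'app', 'app', 'app', None]
```

In this sketch the last row fits no cluster and comes back as None. The Categorize tool instead assigns such rows to the nearest group and reports a confidence score.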

Key capabilities:

Discover themes in survey responses, support tickets, or feedback
Group similar items without defining categories upfront
Choose your granularity — broad themes, medium topics, or specific sub-categories
Handle ambiguous items: low-confidence items are assigned to the nearest group, with a confidence score

When to Use It

Scenario	Example
Discovering feedback themes	“What topics are customers talking about in these reviews?”
Grouping support tickets	“Categorize these support tickets by theme”
Exploring survey responses	“Find the main themes in our open-ended survey responses”
Topic discovery	“What kinds of feature requests do we get?”
Content organization	“Group these articles by topic”

When NOT to Use It

The Categorize tool isn’t the right choice when:

You already know the categories — use the Researcher to classify into predefined groups
Your data isn’t text-based — Categorize works on text columns, not numbers or dates
You have fewer than 20 rows — there isn’t enough data to discover meaningful clusters
You need exact keyword matching — use filters or formulas instead

How to Use It

Describe what you want to explore in natural language. The agent recognizes when your request calls for category discovery and invokes the Categorize tool automatically.

Effective Prompts

Theme discovery:

“What are the main themes in these customer reviews?”
“Categorize these support tickets by topic”
“Group these survey responses into themes”

Exploratory analysis:

“What kinds of feedback are people leaving?”
“Find patterns in these product descriptions”
“Discover the main topics in these comments”

Specific data:

“Categorize the feature requests in the description column”
“Group these job postings by role type”
“Find themes in the notes field”

The Two-Phase Workflow

The Categorize tool uses a preview-then-execute approach so you can choose the right level of detail before processing your full dataset.

Phase 1: Choose Your Granularity

When you ask to categorize data, Querri analyzes a sample and presents three options:

Level	Description	Best For
Broad	~10–15 high-level themes	Executive summaries, quick overviews
Medium	~30–80 topic areas	Working analysis, dashboards
Specific	Fine-grained with sub-categories	Detailed breakdowns, deep dives

Each option includes a tailored description based on your actual data, so you can pick the level that fits your analysis.

Phase 2: Full Execution

After you choose a granularity level, the tool processes your entire dataset. You’ll see progress updates as it works through the pipeline — this can take a few minutes for larger datasets.

Understanding Your Results

Output Columns

The Categorize tool adds new columns to your data:

Column	Description
Category column (e.g., `theme`)	The discovered category label for each row
`{column}_confidence`	How well the row fits its assigned category (0–1)
`{column}_estimated`	`true` if the row was ambiguous and assigned to the nearest group
`{column}_detail`	Sub-category within the main group (only with Specific granularity)

Reading Confidence Scores

High confidence (0.8–1.0): The row is a clear fit for its category
Medium confidence (0.5 up to 0.8): Reasonable fit, but the row has some overlap with other categories
Low confidence (below 0.5): The row was ambiguous. Check the `{column}_estimated` flag to see whether it was assigned to the nearest group
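
If you export the categorized data, the bands above can be applied in a few lines. The sketch below is only an illustration: it assumes rows exported as dictionaries, a category column named `theme` (so the companion columns are `theme_confidence` and `theme_estimated`), and hypothetical function names.

```python
def confidence_band(score):
    """Map a 0-1 confidence score to the bands described above."""
    if score >= 0.8:
        return "high"
    if score >= 0.5:
        return "medium"
    return "low"

def rows_to_review(rows, column="theme"):
    """Return rows worth a manual look: low confidence or flagged as estimated.

    The column prefix "theme" is an assumption; substitute your own column name.
    """
    conf_key = f"{column}_confidence"
    est_key = f"{column}_estimated"
    return [r for r in rows if confidence_band(r[conf_key]) == "low" or r[est_key]]

rows = [
    {"theme": "Billing", "theme_confidence": 0.92, "theme_estimated": False},
    {"theme": "Billing", "theme_confidence": 0.61, "theme_estimated": False},
    {"theme": "Login", "theme_confidence": 0.34, "theme_estimated": True},
]
for row in rows_to_review(rows):
    print(row["theme"], row["theme_confidence"])
```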

Preparing Your Data for Best Results

Ensure You Have Enough Text

The tool needs meaningful text to cluster. Short labels or single-word entries won’t produce useful groupings.

Works well:

Customer feedback paragraphs
Support ticket descriptions
Survey open-ended responses
Product reviews

Won’t work well:

Single-word tags
Numeric codes
Empty or null-heavy columns

Clean Up Before Categorizing

Better input produces better categories:

Remove irrelevant rows:

"Filter out rows where the description is empty"

Focus on the right subset:

"Filter to feedback from 2025, then categorize by theme"

Deduplicate if needed:

"Remove duplicate descriptions, then categorize"

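If you prepare a file before uploading it, the same cleanup can be done in a short script. This is a minimal sketch under stated assumptions: rows are dictionaries, the text lives in a `description` key, and the function names are hypothetical. The 20-row minimum comes from this guide.

```python
MIN_ROWS = 20  # minimum number of text rows this guide says is needed

def clean_rows(rows, text_key="description"):
    """Drop rows with empty text and rows whose text duplicates an earlier row."""
    seen = set()
    cleaned = []
    for row in rows:
        text = (row.get(text_key) or "").strip()
        if not text:
            continue  # empty or null description
        key = " ".join(text.lower().split())  # ignore case and extra whitespace
        if key in seen:
            continue  # duplicate description
        seen.add(key)
        cleaned.append(row)
    return cleaned

def has_enough_rows(rows):
    return len(rows) >= MIN_ROWS

raw = [
    {"description": "Refund not received"},
    {"description": "   "},
    {"description": None},
    {"description": "refund  not received"},
    {"description": "App is slow"},
]
cleaned = clean_rows(raw)
print(len(cleaned), has_enough_rows(cleaned))
```
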
Let the Tool Pick the Right Column

The Categorize tool automatically detects which column contains the most meaningful text. It prioritizes columns with names like `description`, `text`, `body`, `message`, `feedback`, and `review`. If your text is in a differently named column, just mention it in your prompt:

"Categorize the data based on the comments column"

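To make the idea of name-based column detection concrete, here is a rough sketch. It is not the tool's actual logic: treating the names listed above as an ordered priority list, falling back to the column with the longest average text, and the function name itself are all assumptions.

```python
# Names taken from this guide; treating them as an ordered priority list is an assumption.
PREFERRED_NAMES = ("description", "text", "body", "message", "feedback", "review")

def pick_text_column(rows, requested=None):
    """Pick a text column: an explicit request wins, then preferred names, then longest text."""
    columns = list(rows[0].keys())
    if requested in columns:
        return requested
    by_lower = {c.lower(): c for c in columns}
    for name in PREFERRED_NAMES:
        if name in by_lower:
            return by_lower[name]

    def avg_len(col):
        values = [r[col] for r in rows if isinstance(r.get(col), str)]
        return sum(len(v) for v in values) / len(values) if values else 0

    return max(columns, key=avg_len)  # hypothetical fallback

reviews = [{"id": 1, "Feedback": "Love the new dashboard", "notes": "n/a"}]
survey = [{"id": 1, "comments": "Checkout page is confusing", "tag": "ux"}]
print(pick_text_column(reviews))
print(pick_text_column(reviews, requested="notes"))
print(pick_text_column(survey))
```
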
Tips for Best Results

Start Broad, Then Go Specific

If you’re exploring unfamiliar data, start with the Broad granularity to get an overview. You can always re-run with Specific once you understand the landscape.

Reduce Row Count First

Processing fewer rows is faster and often produces cleaner categories. Filter, deduplicate, or sample before categorizing:

"Filter to the last 6 months of tickets, then categorize by theme"

Combine Relevant Columns

If the information you need is spread across multiple columns, the tool can analyze them together. Mention which columns matter:

"Categorize based on both the subject and description fields"

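One simple way to picture analyzing two columns together is to join them into a single text per row before categorizing. The sketch below assumes that approach, which may differ from what the tool does internally; the field names and the function name are hypothetical.

```python
def combine_columns(row, keys=("subject", "description"), sep=". "):
    """Join the non-empty values of the given columns into one text string."""
    parts = [str(row[k]).strip() for k in keys if row.get(k)]
    return sep.join(p for p in parts if p)

ticket = {"subject": "Refund", "description": " Charged twice this month "}
print(combine_columns(ticket))
# Refund. Charged twice this month
```
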
Use Results for Further Analysis

Once your data is categorized, use the new columns for downstream analysis:

"Show a bar chart of ticket count by theme"
"What's the average satisfaction score per category?"
"Which themes have the most critical-priority tickets?"

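The same kind of summary can be computed outside the tool from exported rows. This is a minimal sketch that assumes each row is a dictionary with a `theme` column and a numeric `satisfaction` column; both names and the function name are placeholders.

```python
from collections import defaultdict

def summarize_by_theme(rows, theme_key="theme", score_key="satisfaction"):
    """Return the row count and average score for each category."""
    scores = defaultdict(list)
    for row in rows:
        scores[row[theme_key]].append(row[score_key])
    return {theme: {"count": len(vals), "avg": sum(vals) / len(vals)}
            for theme, vals in scores.items()}

tickets = [
    {"theme": "Billing", "satisfaction": 4},
    {"theme": "Billing", "satisfaction": 2},
    {"theme": "Login", "satisfaction": 5},
]
print(summarize_by_theme(tickets))
# {'Billing': {'count': 2, 'avg': 3.0}, 'Login': {'count': 1, 'avg': 5.0}}
```
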
Real-World Examples

Example 1: Customer Feedback Themes

Starting data: 3,000 customer feedback responses

Step 1: Prepare

"Filter to responses where feedback is not empty"

Step 2: Categorize

"Categorize this feedback by theme"

Step 3: Choose granularity — select Medium for a working analysis

Step 4: Analyze

"Show the top 10 themes by volume as a bar chart"

Example 2: Support Ticket Topics

Starting data: 8,000 support tickets with subject and body

Step 1: Focus

"Filter to tickets from Q4 2025"

Step 2: Categorize

"Categorize these tickets by topic using the subject and body"

Step 3: Choose granularity — select Specific for sub-categories

Step 4: Drill down

"Show ticket count by category and detail as a stacked bar chart"

Example 3: Survey Response Exploration

Starting data: 1,500 open-ended survey responses

Step 1: Categorize

"What are the main themes in these survey responses?"

Step 2: Choose granularity — select Broad for an executive summary

Step 3: Summarize

"Create a table showing each theme, its row count, and a representative example"

Troubleshooting

“Not enough text data”

Cause: Fewer than 20 rows with meaningful text content.

Fix:

Check for empty or null values: “How many rows have empty descriptions?”
Make sure the right column is being used: “Use the notes column instead”
Add more data if available

“Categories are too broad or too narrow”

Cause: The granularity level doesn’t match your needs.

Fix: Re-run with a different granularity. Start with Medium if unsure.

“Some rows are marked as estimated”

Cause: Those rows didn’t fit neatly into any cluster and were assigned to the nearest one.

Fix: This is expected. Check the `{column}_confidence` column: estimated rows with low confidence may genuinely be outliers or mixed topics. You can filter them out for cleaner analysis:

"Filter to rows where estimated is false"

“Processing is taking a long time”

Cause: Large dataset or high granularity level.

Fix:

Reduce row count by filtering or deduplicating first
Use Broad granularity for faster results
The tool shows progress updates — longer processing often means better categories

Next Steps

Researcher Tool — for classifying into known categories
Forecaster Tool — for time series predictions
Aggregating Data — summarize categorized results
Creating Visualizations — chart your category breakdowns
Dashboard Basics — add category analysis to dashboards