Researcher Tool
The Researcher tool uses AI to process each row of your data individually, transforming unstructured text into structured columns. It’s perfect for classification, entity extraction, sentiment analysis, and any task where you need to apply intelligent analysis to every record in your dataset.

What the Researcher Does
The Researcher applies an AI model to each row of your data to add new columns with computed values; a conceptual sketch of this row-by-row pattern follows the list below. Unlike standard transformations that use formulas, the Researcher understands context and can:
- Classify data into categories (industry, sentiment, priority level)
- Extract entities from text (company names, dates, amounts, locations)
- Standardize information (normalize company names, categorize products)
- Analyze sentiment or tone of text content
- Summarize or rewrite content in a consistent format
- Identify patterns that would be difficult to capture with rules
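Conceptually, this row-by-row pattern resembles applying a per-row function in a dataframe library. The sketch below is purely illustrative: `classify_industry` is a hypothetical stand-in for the AI call the Researcher makes internally, not the tool's actual API.

```python
import pandas as pd

def classify_industry(description: str) -> str:
    """Hypothetical stand-in for the AI judgment the Researcher applies per row."""
    # In the real tool, an AI model reads the row's text and returns a label.
    # Here we return a fixed placeholder so the sketch runs on its own.
    return "Technology" if "software" in description.lower() else "Other"

df = pd.DataFrame({
    "company": ["Acme Soft", "Green Farms"],
    "description": ["Cloud software for accounting", "Organic produce distributor"],
})

# The Researcher effectively applies an AI judgment to every row
# and writes the result into a new column.
df["Industry"] = df["description"].apply(classify_industry)
print(df)
```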
When to Use It
Use the Researcher when you need to:
- Turn free-text fields into structured categories
- Extract specific information embedded in descriptions
- Classify or score items based on their content
- Clean and standardize messy text data
- Add intelligent metadata to your records
When NOT to Use It
The Researcher isn’t needed for:
- Simple calculations (use formulas instead)
- Lookups or joins (use standard data operations)
- Filtering based on exact values
- Aggregations and groupings
Understanding Row Limits and Processing Time
Your subscription level determines how many rows you can process with the Researcher at once. The Researcher analyzes each row individually using AI, which means:
- Processing takes time — expect several minutes for larger datasets
- Costs scale with rows — each row uses AI processing capacity
- Results improve with focus — smaller, cleaner datasets often produce better classifications
How to Prompt for the Researcher
Simply describe what you want to extract or classify in natural language. The agent recognizes when your request requires row-by-row AI analysis and automatically invokes the Researcher.
Effective Prompts
Classification:
- “Classify each company by industry”
- “Categorize these support tickets by priority: low, medium, high, critical”
- “Tag each review as positive, negative, or neutral”
- “Identify the business type for each vendor”
Extraction:
- “Extract the main technology mentioned in each company description”
- “Pull out any monetary amounts from the notes field”
- “Find the primary contact name in each record”
- “Extract all email addresses mentioned in the text”
Standardization:
- “Standardize the company names to their official names”
- “Normalize product categories into these buckets: Electronics, Clothing, Home, Food, Other”
- “Clean up the job titles to standard roles”
Analysis:
- “Score each customer review on a scale of 1-5”
- “Identify the main complaint theme for each support ticket”
- “Determine if each email is a sales inquiry, support request, or general question”
Be Explicit About Output Columns
The more specific you are about what columns you want and their values, the better results you’ll get.
Vague (might work):
- “Analyze these companies”
Better:
- “Add an ‘Industry’ column classifying each company”
Best — list your categories explicitly:
- “Add an ‘Industry’ column with one of: Technology, Healthcare, Finance, Retail, Manufacturing, Services, Other”
Request Multiple Columns at Once
Don’t run the Researcher multiple times when you can get everything in one pass:
Inefficient (3 separate runs):
"Classify each company by industry""Add a company size column""Extract the primary product"Efficient (1 run):
"Add these columns:- 'Industry': Technology, Healthcare, Finance, Retail, Manufacturing, Services, Other- 'Company_Size': Startup, SMB, Mid-Market, Enterprise- 'Primary_Product': The main product or service offered"This is faster, uses less processing capacity, and keeps your workflow clean.

Specify Value Formats
Tell the Researcher exactly what format you want:
- “Use title case for all values”
- “Return only single-word categories”
- “Use numbers 1-5, not words”
- “Format as ‘Yes’ or ‘No’”
- “Keep responses under 50 words”
The Two-Phase Workflow
The Researcher uses a preview-then-execute workflow to help you validate results before processing your entire dataset.
Phase 1: Preview (30 Rows)
When you request a Researcher operation, the agent:
- Analyzes your data structure
- Creates an AI prompt for the task
- Processes 30 sample rows
- Shows you the results in a preview table

At this point, review the preview carefully:
- Are the new columns what you expected?
- Are the values accurate and consistent?
- Are there any obvious misclassifications?
Phase 2: Confirmation and Full Execution
After reviewing the preview, you’ll see options like:
- “Run for all X rows” - Process your entire dataset
- “Filter first” - Reduce rows before processing
- “Cancel” - Start over with a different approach

Choose the option that makes sense based on your preview review.
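If it helps to picture the workflow in code, the pattern is simply: run the enrichment on a small sample, inspect it, then commit to the full dataset. In this sketch, `enrich` is a hypothetical stand-in for a Researcher run, not the product's API.

```python
import pandas as pd

def enrich(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for a Researcher run over the given rows."""
    out = frame.copy()
    out["Industry"] = "Other"  # placeholder value so the sketch runs on its own
    return out

df = pd.DataFrame({"description": ["Cloud software", "Organic produce", "Retail bank"]})

preview = enrich(df.head(30))   # Phase 1: process a small sample first
print(preview)                  # review the preview before committing

full = enrich(df)               # Phase 2: run on everything only once it looks right
```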
Preparing Your Data for Best Results
The quality of your Researcher results depends heavily on your input data. Follow these data science principles for optimal outcomes.
Reduce Row Count Before Processing
Processing fewer rows is faster, cheaper, and often produces better results. Consider:
Filtering to relevant rows:
"Filter to customers from 2024 only, then classify by industry"Aggregating first:
"Group by company name first, then classify each unique company"Removing duplicates:
"Remove duplicate company names, then add industry classification"Ensure Relevant Columns Are Present
Ensure Relevant Columns Are Present
The Researcher analyzes the content of your rows to make decisions. Make sure the columns containing the information needed for classification are included in your data.
For company classification:
- Company name
- Description or about text
- Website URL
- Products/services offered
For sentiment analysis:
- The actual text to analyze
- Optional: context like date or source
For entity extraction:
- The text field containing the entities
- Any contextual columns that help interpretation
Handle Missing Data First
Empty or null values produce poor Researcher results. Before running the Researcher:
- Check for empty values: “How many rows have empty descriptions?”
- Filter them out: “Filter to rows where description is not empty”
- Or fill them: “Replace empty descriptions with ‘No description available’”
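For reference, the same three checks expressed in pandas look roughly like this (the `description` column name is an assumption):

```python
import pandas as pd

df = pd.DataFrame({"description": ["Cloud software vendor", None, ""]})

# 1. Check how many rows have an empty or missing description.
empty = df["description"].isna() | (df["description"].str.strip() == "")
print(empty.sum(), "rows with empty descriptions")

# 2. Filter them out...
clean = df[~empty]

# 3. ...or fill them with a placeholder instead.
filled = df.assign(description=df["description"].where(~empty, "No description available"))
```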
Consolidate Text When Needed
If the information you need to analyze is spread across multiple columns, consider combining them first:
"Combine the 'name', 'description', and 'notes' columns into a single 'full_text' column, then classify by industry"Tips for Better Classification Results
Tips for Better Classification Results
Provide Category Examples
When you want specific categories, list them:
Good:
- “Classify into: SaaS, Hardware, Consulting, Other”
Even Better:
- “Classify into these categories:
- SaaS: Cloud software companies
- Hardware: Physical product manufacturers
- Consulting: Professional services firms
- Other: Anything else”
Use Consistent Granularity
Decide on the level of detail you need and stick to it:
Too granular:
- “Enterprise Cloud Infrastructure SaaS”
Too broad:
- “Technology”
Just right (for most use cases):
- “Cloud Services” or “SaaS”
Handle Ambiguous Cases
Tell the Researcher what to do with edge cases:
- “If unsure, classify as ‘Unknown’”
- “When a company spans multiple industries, use the primary one”
- “For conglomerates, use their largest business unit”
Tips for Better Extraction Results
Be Specific About What to Extract
Vague:
- “Extract information from each row”
Specific:
- “Extract the founding year from each company description”
- “Find the CEO name mentioned in each press release”
- “Pull out all technology stack mentions (e.g., React, Python, AWS)”
Specify the Output Format
Tell the Researcher exactly how to format extracted data:
- “Extract as a comma-separated list”
- “Return only the first match”
- “Format dates as YYYY-MM-DD”
- “Use ‘N/A’ if not found”
Handle Multiple Matches
Clarify what to do when multiple values exist:
- “Return all matches as a comma-separated list”
- “Return only the first occurrence”
- “Return the most recent date”
- “Return the largest amount”
Data Science Best Practices
Sample Size Considerations
For classification tasks:
- 50-100 rows is usually enough to validate your approach works
- 500+ rows gives statistically meaningful results for analysis
- For 1,000+ rows, filter or aggregate first whenever possible
Feature Engineering
Consider adding columns that help the AI make better decisions:
For company classification:
- Company website domain (helps identify industry)
- Employee count range (helps identify company size category)
- Geographic region (helps with regional business patterns)
For sentiment analysis:
- Text length (very short text is harder to classify)
- Source type (support tickets vs. reviews have different patterns)
- Date (sentiment patterns may vary over time)
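Some of these helper columns are easy to derive before running the Researcher. The pandas sketch below is illustrative only; the column names are assumptions:

```python
import pandas as pd

df = pd.DataFrame({
    "website": ["https://acmesoft.io/about", "https://greenfarms.example.com"],
    "review_text": ["Great tool", "The delivery was late and support never answered"],
})

# Domain often hints at the industry; text length hints at how reliable sentiment will be.
df["domain"] = df["website"].str.extract(r"https?://([^/]+)", expand=False)
df["text_length"] = df["review_text"].str.len()
```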
Validation Strategies
After running the Researcher:
- Spot-check results: “Show me 20 random rows with the new classification”
- Check distributions: “Count how many rows are in each category”
- Examine edge cases: “Show me rows classified as ‘Other’ or ‘Unknown’”
- Verify accuracy: Manually check 10-20 random classifications
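If you export the results, the distribution and spot-check steps map directly onto pandas (the `Industry` column name is an assumption):

```python
import pandas as pd

results = pd.DataFrame({"Industry": ["Technology", "Other", "Technology", "Retail"]})

# Check the distribution: a category holding almost all rows (or almost none) is a red flag.
print(results["Industry"].value_counts())

# Spot-check a random sample by hand.
print(results.sample(n=3, random_state=0))
```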
Iterative Refinement
If results aren’t quite right:
- Review misclassifications: “Show me rows where industry is ‘Other’”
- Refine your prompt: Add more specific category definitions
- Re-run with improvements: “Re-classify industry using these updated categories…”
Real-World Examples
Section titled “Real-World Examples”Example 1: Classifying Companies by Industry
Starting data: 5,000 companies with name and description
Step 1: Reduce data
"Remove duplicate company names based on the name column"Result: 3,200 unique companies
Step 2: Filter empty descriptions
"Filter to companies where description is not empty"Result: 2,800 companies
Step 3: Run Researcher
"Add an 'Industry' column classifying each company into:Technology, Healthcare, Financial Services, Retail, Manufacturing,Professional Services, or Other. Base the classification on thecompany name and description."Step 4: Validate
"Show me the count of companies in each industry"Example 2: Extracting Entities from Support Tickets
Starting data: 10,000 support tickets with subject and body
Step 1: Focus on recent tickets
"Filter to tickets from the last 30 days"Result: 1,200 tickets
Step 2: Run Researcher
"Add these columns:- 'Category': Bug, Feature Request, How-To Question, Account Issue, Other- 'Product_Area': Which product area is affected- 'Urgency': Low, Medium, High, Critical based on the language used"Step 3: Analyze results
"Create a bar chart showing ticket count by Category and Urgency"Example 3: Sentiment Analysis of Reviews
Starting data: 8,000 product reviews
Step 1: Aggregate by product
"First, let me see how many reviews per product"Step 2: Sample if needed
"Filter to the top 20 products by review count"Result: 2,400 reviews
Step 3: Run Researcher
"Add a 'Sentiment' column with values: Positive, Neutral, Negative.Also add a 'Main_Theme' column identifying the primary topic discussed."Step 4: Visualize
"Show a stacked bar chart of sentiment by product"Troubleshooting
Troubleshooting
“My classifications are inconsistent”
Cause: The prompt is too vague or categories overlap.
Fix: Be more explicit about category definitions and edge cases.
“Processing is taking too long”
Cause: Too many rows being processed.
Fix: Filter, aggregate, or deduplicate your data first.
“I’m getting ‘ERROR’ in result columns”
Cause: The AI couldn’t parse the row content or the prompt was unclear.
Fix: Check for malformed data and simplify your extraction request.
“Results don’t match what I expected”
Cause: The 30-row preview is only a sample and may not represent your full dataset.
Fix: Run validation queries on the full results and iterate on your prompt.
Next Steps
- Learn about the Forecaster Tool — for time series predictions
- Learn How to Connect Your Data — joining, stacking, and combining datasets
- Review Prompting Best Practices for general tips