Researcher Tool

The Researcher tool uses AI to process each row of your data individually, transforming unstructured text into structured columns. It’s perfect for classification, entity extraction, sentiment analysis, and any task where you need to apply intelligent analysis to every record in your dataset.

Querri automatically suggesting extraction from a text field

The Researcher applies an AI model to each row of your data to add new columns with computed values. Unlike standard transformations that use formulas, the Researcher understands context and can:

  • Classify data into categories (industry, sentiment, priority level)
  • Extract entities from text (company names, dates, amounts, locations)
  • Standardize information (normalize company names, categorize products)
  • Analyze sentiment or tone of text content
  • Summarize or rewrite content in a consistent format
  • Identify patterns that would be difficult to capture with rules
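
Conceptually, this row-by-row pass looks like the short sketch below. It is an illustration only, not Querri's internals; the classify_industry function is a hypothetical stand-in for the AI call the platform makes for each row.

# Illustrative sketch only, not Querri's implementation. classify_industry
# stands in for the AI call the platform makes per row.
def classify_industry(description):
    # The real tool asks an AI model to read the row and return a category;
    # here a trivial keyword check keeps the sketch runnable.
    return "Technology" if "software" in description.lower() else "Other"

rows = [
    {"company": "Acme Analytics", "description": "Cloud software for finance teams"},
    {"company": "Bolt Logistics", "description": "Regional freight and warehousing"},
]

# The Researcher effectively does this for every row: read the row's content,
# ask the model a question about it, and write the answer into a new column.
for row in rows:
    row["Industry"] = classify_industry(row["description"])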

Use the Researcher when you need to:

  • Turn free-text fields into structured categories
  • Extract specific information embedded in descriptions
  • Classify or score items based on their content
  • Clean and standardize messy text data
  • Add intelligent metadata to your records

The Researcher isn’t needed for:

  • Simple calculations (use formulas instead)
  • Lookups or joins (use standard data operations)
  • Filtering based on exact values
  • Aggregations and groupings

Understanding Row Limits and Processing Time

Your subscription level determines how many rows you can process with the Researcher at once. The Researcher analyzes each row individually using AI, which means:

  • Processing takes time — expect several minutes for larger datasets
  • Costs scale with rows — each row uses AI processing capacity
  • Results improve with focus — smaller, cleaner datasets often produce better classifications
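
As a rough back-of-envelope estimate (the per-row latency and concurrency below are assumptions, not Querri's actual throughput):

# Rough estimate only; real per-row latency depends on the model and your data.
rows = 2_800
seconds_per_row = 1.5    # assumed average AI processing time per row
concurrency = 10         # assumed number of rows processed in parallel

estimated_minutes = rows * seconds_per_row / concurrency / 60
print(f"~{estimated_minutes:.0f} minutes")    # ~7 minutes under these assumptions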

Simply describe what you want to extract or classify in natural language. The agent recognizes when your request requires row-by-row AI analysis and automatically invokes the Researcher.

Classification:

  • “Classify each company by industry”
  • “Categorize these support tickets by priority: low, medium, high, critical”
  • “Tag each review as positive, negative, or neutral”
  • “Identify the business type for each vendor”

Extraction:

  • “Extract the main technology mentioned in each company description”
  • “Pull out any monetary amounts from the notes field”
  • “Find the primary contact name in each record”
  • “Extract all email addresses mentioned in the text”

Standardization:

  • “Standardize the company names to their official names”
  • “Normalize product categories into these buckets: Electronics, Clothing, Home, Food, Other”
  • “Clean up the job titles to standard roles”

Analysis:

  • “Score each customer review on a scale of 1-5”
  • “Identify the main complaint theme for each support ticket”
  • “Determine if each email is a sales inquiry, support request, or general question”

The more specific you are about what columns you want and their values, the better results you’ll get.

Vague (might work):

  • “Analyze these companies”

Better:

  • “Add an ‘Industry’ column classifying each company”

Best — list your categories explicitly:

  • “Add an ‘Industry’ column with one of: Technology, Healthcare, Finance, Retail, Manufacturing, Services, Other”

Don’t run the Researcher multiple times when you can get everything in one pass:

Inefficient (3 separate runs):

"Classify each company by industry"
"Add a company size column"
"Extract the primary product"

Efficient (1 run):

"Add these columns:
- 'Industry': Technology, Healthcare, Finance, Retail, Manufacturing, Services, Other
- 'Company_Size': Startup, SMB, Mid-Market, Enterprise
- 'Primary_Product': The main product or service offered"

This is faster, uses less processing capacity, and keeps your workflow clean.

Extraction results showing multiple structured fields extracted at once

Tell the Researcher exactly what format you want:

  • “Use title case for all values”
  • “Return only single-word categories”
  • “Use numbers 1-5, not words”
  • “Format as ‘Yes’ or ‘No’”
  • “Keep responses under 50 words”

The Researcher uses a preview-then-execute workflow to help you validate results before processing your entire dataset.

When you request a Researcher operation, the agent:

  1. Analyzes your data structure
  2. Creates an AI prompt for the task
  3. Processes 30 sample rows
  4. Shows you the results in a preview table

Extraction results showing structured fields like Theme, Urgency Level, and Action Required

At this point, review the preview carefully:

  • Are the new columns what you expected?
  • Are the values accurate and consistent?
  • Are there any obvious misclassifications?

After reviewing the preview, you’ll see options like:

  • “Run for all X rows” - Process your entire dataset
  • “Filter first” - Reduce rows before processing
  • “Cancel” - Start over with a different approach

Extracted data joined to the original dataset with recommended next steps

Choose the option that makes sense based on your preview review.
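
In code terms, preview-then-execute simply means: process a small sample, inspect it, then commit to the full run. The sketch below is a generic illustration of that pattern, not the tool's internals; run_researcher is a hypothetical stand-in for the per-row AI pass.

import random

def run_researcher(rows, prompt):
    # Hypothetical stand-in: in Querri, an AI model answers the prompt for each row.
    return [{**row, "Industry": "Technology"} for row in rows]

def preview_then_execute(rows, prompt, sample_size=30):
    # 1. Process a small random sample first.
    sample = random.sample(rows, min(sample_size, len(rows)))
    preview = run_researcher(sample, prompt)

    # 2. Inspect the preview before committing.
    for row in preview[:5]:
        print(row)

    # 3. Only run the full dataset once the preview looks right.
    if input(f"Run for all {len(rows)} rows? (y/n) ") == "y":
        return run_researcher(rows, prompt)
    return None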

The quality of your Researcher results depends heavily on your input data. Follow these data science principles for optimal outcomes.

Processing fewer rows is faster, cheaper, and often produces better results. Consider:

Filtering to relevant rows:

"Filter to customers from 2024 only, then classify by industry"

Aggregating first:

"Group by company name first, then classify each unique company"

Removing duplicates:

"Remove duplicate company names, then add industry classification"

The Researcher analyzes the content of your rows to make decisions. Make sure the columns containing the information needed for classification are included in your data.

For company classification:

  • Company name
  • Description or about text
  • Website URL
  • Products/services offered

For sentiment analysis:

  • The actual text to analyze
  • Optional: context like date or source

For entity extraction:

  • The text field containing the entities
  • Any contextual columns that help interpretation

Empty or null values produce poor Researcher results. Before running the Researcher:

  1. Check for empty values: “How many rows have empty descriptions?”
  2. Filter them out: “Filter to rows where description is not empty”
  3. Or fill them: “Replace empty descriptions with ‘No description available’”
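
Outside the tool, the same checks are one-liners in pandas (the description column name is an assumption; adjust to your data):

import pandas as pd

df = pd.read_csv("companies.csv")    # assumed input file

# 1. How many rows have nothing to analyze?
print(df["description"].isna().sum(), "rows have empty descriptions")

# 2. Either drop them before classifying...
non_empty = df.dropna(subset=["description"])

# 3. ...or fill them with an explicit placeholder the AI can recognize.
df["description"] = df["description"].fillna("No description available")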

If the information you need to analyze is spread across multiple columns, consider combining them first:

"Combine the 'name', 'description', and 'notes' columns into a single 'full_text' column, then classify by industry"

When you want specific categories, list them:

Good:

  • “Classify into: SaaS, Hardware, Consulting, Other”

Even Better:

  • “Classify into these categories:
    • SaaS: Cloud software companies
    • Hardware: Physical product manufacturers
    • Consulting: Professional services firms
    • Other: Anything else”

Decide on the level of detail you need and stick to it:

Too granular:

  • “Enterprise Cloud Infrastructure SaaS”

Too broad:

  • “Technology”

Just right (for most use cases):

  • “Cloud Services” or “SaaS”

Tell the Researcher what to do with edge cases:

  • “If unsure, classify as ‘Unknown’”
  • “When a company spans multiple industries, use the primary one”
  • “For conglomerates, use their largest business unit”

Vague:

  • “Extract information from each row”

Specific:

  • “Extract the founding year from each company description”
  • “Find the CEO name mentioned in each press release”
  • “Pull out all technology stack mentions (e.g., React, Python, AWS)”

Tell the Researcher exactly how to format extracted data:

  • “Extract as a comma-separated list”
  • “Return only the first match”
  • “Format dates as YYYY-MM-DD”
  • “Use ‘N/A’ if not found”

Clarify what to do when multiple values exist:

  • “Return all matches as a comma-separated list”
  • “Return only the first occurrence”
  • “Return the most recent date”
  • “Return the largest amount”

For classification tasks:

  • 50-100 rows is usually enough to validate that your approach works
  • 500+ rows gives statistically meaningful results for analysis
  • 1,000+ rows: filter or aggregate first when possible

Consider adding columns that help the AI make better decisions:

For company classification:

  • Company website domain (helps identify industry)
  • Employee count range (helps identify company size category)
  • Geographic region (helps with regional business patterns)

For sentiment analysis:

  • Text length (very short text is harder to classify)
  • Source type (support tickets vs. reviews have different patterns)
  • Date (sentiment patterns may vary over time)

After running the Researcher:

  1. Spot-check results: “Show me 20 random rows with the new classification”
  2. Check distributions: “Count how many rows are in each category”
  3. Examine edge cases: “Show me rows classified as ‘Other’ or ‘Unknown’”
  4. Verify accuracy: Manually check 10-20 random classifications
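
If you export the classified data, the same checks take a few lines of pandas (the file name and the Industry column are assumptions based on the earlier examples):

import pandas as pd

results = pd.read_csv("classified_companies.csv")    # assumed export of the results

# 1. Spot-check a random sample.
print(results.sample(min(20, len(results)))[["name", "Industry"]])

# 2. Check the distribution for suspiciously large or empty categories.
print(results["Industry"].value_counts())

# 3. Examine the edge cases the model was unsure about.
print(results[results["Industry"].isin(["Other", "Unknown"])])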

If results aren’t quite right:

  1. Review misclassifications: “Show me rows where industry is ‘Other’”
  2. Refine your prompt: Add more specific category definitions
  3. Re-run with improvements: “Re-classify industry using these updated categories…”

Example 1: Classifying Companies by Industry

Starting data: 5,000 companies with name and description

Step 1: Reduce data

"Remove duplicate company names based on the name column"

Result: 3,200 unique companies

Step 2: Filter empty descriptions

"Filter to companies where description is not empty"

Result: 2,800 companies

Step 3: Run Researcher

"Add an 'Industry' column classifying each company into:
Technology, Healthcare, Financial Services, Retail, Manufacturing,
Professional Services, or Other. Base the classification on the
company name and description."

Step 4: Validate

"Show me the count of companies in each industry"

Example 2: Extracting Entities from Support Tickets

Starting data: 10,000 support tickets with subject and body

Step 1: Focus on recent tickets

"Filter to tickets from the last 30 days"

Result: 1,200 tickets

Step 2: Run Researcher

"Add these columns:
- 'Category': Bug, Feature Request, How-To Question, Account Issue, Other
- 'Product_Area': Which product area is affected
- 'Urgency': Low, Medium, High, Critical based on the language used"

Step 3: Analyze results

"Create a bar chart showing ticket count by Category and Urgency"

Example 3: Analyzing Product Review Sentiment

Starting data: 8,000 product reviews

Step 1: Aggregate by product

"First, let me see how many reviews per product"

Step 2: Sample if needed

"Filter to the top 20 products by review count"

Result: 2,400 reviews

Step 3: Run Researcher

"Add a 'Sentiment' column with values: Positive, Neutral, Negative.
Also add a 'Main_Theme' column identifying the primary topic discussed."

Step 4: Visualize

"Show a stacked bar chart of sentiment by product"

Troubleshooting

“My classifications are inconsistent”

Cause: The prompt is too vague or categories overlap. Fix: Be more explicit about category definitions and edge cases.

“Processing is taking too long”

Cause: Too many rows being processed. Fix: Filter, aggregate, or deduplicate your data first.

“I’m getting ‘ERROR’ in result columns”

Cause: The AI couldn’t parse the row content or the prompt was unclear. Fix: Check for malformed data and simplify your extraction request.

“Results don’t match what I expected”

Cause: The preview looked good but full results differ. Fix: The 30-row preview is a sample. Run validation queries on the full results and iterate.