Industry Prediction: Basics

Introduction

Baselayer's Industry Prediction product provides accurate business classification by analyzing multiple data sources and returning standardized industry codes (NAICS, MCC, SIC) along with confidence metrics and risk indicators.

For detailed information about industry classification systems (NAICS, MCC, and SIC codes), their structures, and how they differ, see our Industry Classification Systems guide.

This guide explains how Industry Prediction works, how the accuracy score is calculated, and how to effectively use these predictions in your verification workflows.


Why industry classification matters

Accurate industry classification is fundamental to:

  • Risk management: Identifying prohibited or high-risk business types
  • Regulatory compliance: Meeting card network requirements and AML obligations
  • Underwriting decisions: Assessing industry-specific risk factors
  • Portfolio management: Segmenting and monitoring businesses by sector

What you'll learn

  • How the accuracy score is calculated and what it represents
  • Which data sources and signals contribute to predictions
  • How to interpret different accuracy levels and respond appropriately
  • Best practices for using industry predictions in automated workflows
  • How to handle edge cases and low-confidence predictions

Understanding the Accuracy Score

The accuracy field represents Baselayer's confidence level in the predicted industry classification, expressed as a decimal value between 0 and 1 (or 0% to 100%).

What accuracy represents:

  • High accuracy (0.75–1.00): Strong, reliable prediction based on multiple consistent signals
  • Medium accuracy (0.50–0.74): Moderate confidence; prediction based on limited or mixed signals
  • Low accuracy (0.00–0.49): Weak prediction; insufficient or conflicting data

Accuracy is not a measure of how "correct" the prediction is; it is a measure of how confident Baselayer is based on the available data. A business with a rich online presence and clear operational indicators will typically receive a higher accuracy score than one with a minimal digital footprint.
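The three bands above can be expressed as a small helper. This is a minimal sketch, not part of the Baselayer API: the function name accuracy_band and the returned labels are assumptions made for illustration, and only the 0.75 and 0.50 cut-offs come from this guide.

```python
def accuracy_band(accuracy: float) -> str:
    """Map an accuracy score in [0, 1] to the band names used in this guide.

    Hypothetical helper: only the thresholds (0.75, 0.50) come from the guide.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError(f"accuracy must be between 0 and 1, got {accuracy}")
    if accuracy >= 0.75:
        return "high"
    if accuracy >= 0.50:
        return "medium"
    return "low"


if __name__ == "__main__":
    for score in (0.92, 0.68, 0.42):
        print(score, accuracy_band(score))
```

Keeping the thresholds in one function makes it easier to adjust them later if your risk tolerance calls for a stricter or looser cut-off.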


Data Inputs & Signals

Baselayer's Industry Prediction model analyzes multiple data sources to classify a business. The strength and consistency of these signals directly influence the accuracy score.

Primary data sources

Baselayer's Industry Prediction is based entirely on publicly available information:

1. Website content

  • Homepage text, product descriptions, and service offerings
  • Business descriptions and "about us" content
  • Navigation structure and site organization
  • Meta descriptions and page titles

When a business has a well-developed website with clear descriptions of its products or services, predictions are typically more accurate.

2. Social media profiles & online listings

  • Business profiles on platforms like Yelp, Google Business, and Facebook
  • Review platforms with business descriptions
  • Professional networks (LinkedIn company pages)
  • Online directories and local listings

The consistency of industry information across platforms strengthens prediction confidence.

3. Public records & licenses

  • Publicly available professional licenses (where applicable)
  • Business registrations in public databases
  • Industry-specific certifications or permits found online

These provide authoritative signals when available.

4. Business name analysis

  • Industry-specific keywords in the business name
  • Common naming patterns for certain sectors
  • Geographic or service-type indicators

Names like "Mike's Plumbing Co" or "Main Street Dental" provide strong classification signals.


How signals combine

Strong prediction (high accuracy):

  • Multiple data sources align on the same industry
  • Website contains detailed, industry-specific content
  • Business name includes clear industry indicators
  • Social profiles and online listings show consistent industry information

Weak prediction (low accuracy):

  • Limited or no website content
  • Generic business name without industry signals
  • Conflicting information across sources
  • Minimal online presence


How Accuracy is Calculated

Baselayer uses a proprietary machine learning model to predict industry codes and calculate accuracy scores. Here's how the process works:

Step 1: Data collection

When you request an Industry Prediction, Baselayer:

  1. Analyzes the business name and address provided
  2. Discovers and scrapes the business website (if available or found)
  3. Searches for social media profiles and online listings
  4. Looks for public records such as business licenses or certifications

Step 2: Signal extraction

The model extracts industry-relevant signals from each data source:

  • Text analysis: Keywords, phrases, and descriptions indicating business activity
  • Structural analysis: Website organization, navigation patterns, and content types
  • Consistency analysis: How well information aligns across different sources

Step 3: Classification & confidence scoring

The model:

  1. Predicts the most likely NAICS code based on extracted signals
  2. Maps to corresponding MCC and SIC codes
  3. Identifies 4–8 keywords representing core business activities
  4. Calculates an accuracy score based on:
    • Signal strength: Quality, quantity and clarity of available data
    • Signal consistency: Agreement between different data sources

Step 4: Validation & output

The prediction is validated against:

  • Known industry patterns and classification rules
  • Card network risk indicators (Mastercard, Visa)
  • Historical accuracy for similar business profiles

The final output includes the predicted codes, accuracy score, keywords, and risk indicators.


Interpreting Accuracy Levels

Understanding what different accuracy scores mean helps you make appropriate decisions about automation and review workflows.

High accuracy: 0.75–1.00

What it means:

  • Strong, reliable prediction based on multiple consistent data sources
  • The business has a clear digital presence with identifiable industry indicators
  • High likelihood that the predicted industry is correct

Typical scenarios:

  • Established businesses with detailed websites
  • Companies with industry-specific names (e.g., "Smith & Associates Law Firm")
  • Businesses with consistent information across multiple platforms
  • Organizations with clear product catalogs or service descriptions

Recommended action:

  • Safe for automated decisioning
  • Use directly in prohibited industry checks
  • Trust the predicted NAICS/MCC/SIC codes for risk assessment

Example: A plumbing company with:

  • Website showing services, service areas, and pricing
  • Business name "ABC Plumbing & Heating Services"
  • Google Business profile with reviews
  • LinkedIn company page describing HVAC services

Accuracy: 0.92 — Very confident the business is in NAICS 238220 (Plumbing, Heating, and Air-Conditioning Contractors)


Medium accuracy: 0.50–0.74

What it means:

  • Moderate confidence based on limited or mixed signals
  • Some industry indicators present but not comprehensive
  • Prediction is likely directionally correct but may lack specificity

Typical scenarios:

  • Early-stage businesses with basic websites
  • Generic business names without industry keywords
  • Limited online presence or sparse website content
  • Mixed signals across different data sources

Recommended action:

  • Use with caution in automated workflows
  • Consider manual review for high-risk decisions
  • Supplement with additional verification if needed
  • Focus on broader industry categories (2-digit or 4-digit NAICS) rather than specific 6-digit codes

Example: A consulting firm with:

  • Simple website with "About" and "Contact" pages but limited service descriptions
  • Generic business name "Apex Solutions LLC"
  • No social media presence

Accuracy: 0.68 — Likely consulting/professional services but unclear which specific type


Low accuracy: 0.00–0.49

What it means:

  • Weak prediction based on insufficient or conflicting data
  • The business has minimal digital footprint
  • Prediction should not be trusted for automated decisions

Typical scenarios:

  • Very new businesses without established online presence
  • Conflicting information across different sources
  • Generic holding companies or multi-industry operations

Recommended action:

  • Do not use for automated decisioning
  • Require manual review or additional documentation
  • Ask the applicant for clarification on business activities

Example: A startup with:

  • Recently registered business name "TechCo Inc."
  • No website or undeveloped domain
  • No online profiles or directory listings
  • Secretary of State (SOS) records with vague purpose: "technology services"

Accuracy: 0.42 — Insufficient data to confidently predict industry


Response Fields Explained

Industry Prediction returns several fields that work together to provide comprehensive classification information.

Core prediction fields

code (NAICS code)

  • The 6-digit North American Industry Classification System code
  • Most granular and specific industry classification
  • Hierarchical structure: first 2 digits = sector, 4 digits = industry group, 6 digits = specific industry
  • Example: 236115 = New Single-Family Housing Construction
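Because the hierarchy is encoded in the leading digits, the broader levels can be derived by slicing the 6-digit code. The sketch below assumes the code arrives as a string; the function name naics_levels and the dictionary keys are illustrative and not field names returned by the API.

```python
def naics_levels(code: str) -> dict[str, str]:
    """Split a 6-digit NAICS code into the hierarchy levels described above."""
    if len(code) != 6 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected a 6-digit NAICS code, got {code!r}")
    return {
        "sector": code[:2],          # first 2 digits
        "industry_group": code[:4],  # first 4 digits
        "industry": code,            # full 6-digit code
    }


if __name__ == "__main__":
    print(naics_levels("236115"))
```

For 236115 this yields sector 23, industry group 2361, and industry 236115.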

title

  • Human-readable description of the NAICS code
  • Official NAICS title from the classification system
  • Example: "New Single-Family Housing Construction"

accuracy

  • Baselayer's confidence in the prediction (0–1 scale)
  • Higher values indicate stronger, more reliable predictions
  • Recommended threshold: ≥ 0.75 for automated use

keywords

  • Array of 4–8 keywords representing core business activities
  • Extracted from website content, business descriptions, and other sources
  • Useful for detecting sensitive terms within allowed industries

Alternative code formats

mcc_codes[]

  • Array of Merchant Category Code objects
  • Each object includes:
    • code: 4-digit MCC (e.g., "1799")
    • title: MCC description
    • mastercard_risk: Boolean indicating Mastercard high-risk status
    • visa_risk_tier: Visa risk classification (1, 2, 3, or null)

sic_code

  • Standard Industrial Classification code (4-digit format)
  • Older U.S. system that predates NAICS, still used by some organizations
  • Example: 1711 = Plumbing, Heating, and Air-Conditioning Contractors

Risk assessment fields

risk_level

  • Baselayer's normalized risk assessment: low, medium, or high
  • Based on industry characteristics, regulatory considerations, and fraud patterns
  • Independent of card network risk indicators

mastercard_risk

  • Boolean indicating if Mastercard considers the MCC high-risk
  • Based on Mastercard's Business Risk Assessment and Mitigation (BRAM) program
  • true = high-risk per Mastercard guidelines

visa_risk_tier

  • Visa's risk tier classification for the MCC
  • Possible values: "1" (high risk), "2" (standard risk), "3" (emerging high risk), or null
  • Based on Visa's Merchant Category Monitoring Program (MCMP)

Using Risk Indicators

Risk indicators help you align with card network requirements and implement appropriate controls.

When to use each indicator

Accuracy score

  • Use to determine trustworthiness of the prediction itself
  • Controls whether you can safely automate decisions based on the industry code
  • Does not assess industry risk - only prediction confidence

Risk level

  • Use for general risk assessment across your portfolio
  • Helps prioritize monitoring and review resources
  • Based on Baselayer's analysis of fraud patterns and industry characteristics

Mastercard risk & Visa risk tier

  • Use for card network compliance and payment processing decisions
  • Essential for payment facilitators, ISOs, and merchant acquirers
  • May trigger additional monitoring, reserves, or program restrictions

Best Practices

These practices will help you effectively implement Industry Prediction in your verification workflows:

Set appropriate accuracy thresholds

Recommended baseline: 0.75

Baselayer's internal validation shows predictions with accuracy ≥ 0.75 are reliable for automated decisioning. However, we recommend adjusting this threshold based on your risk tolerance.


Create tiered review policies

Don't treat all accuracy levels the same way:

Auto-approve zone (≥ 0.75):

  • Trust the predicted industry code
  • Apply prohibited industry checks automatically
  • Use risk indicators for decisioning

Review zone (0.50–0.74):

  • Flag for quick analyst review
  • Cross-reference with application data
  • Request clarification if industry is unclear
  • Use broader NAICS categories (2-digit or 4-digit) for policy checks

Manual verification zone (< 0.50):

  • Request additional information about business activities
  • Do not auto-decline based on prediction
  • Consider alternative verification methods
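The three zones can be sketched as a routing function. This is an illustration under stated assumptions rather than a prescribed workflow: the function name route, the prohibited_hit flag, and the action labels are hypothetical, and declining on a prohibited-industry hit in the high-accuracy zone is one possible policy choice.

```python
def route(accuracy: float, prohibited_hit: bool) -> str:
    """Return a hypothetical workflow action for one prediction.

    accuracy:       the accuracy field from the prediction (0 to 1)
    prohibited_hit: result of your own prohibited-industry check (assumed input)
    """
    if accuracy >= 0.75:
        # High-accuracy zone: the predicted code is trusted, so the
        # prohibited-industry check can drive the decision automatically.
        return "decline" if prohibited_hit else "approve"
    if accuracy >= 0.50:
        # Review zone: flag for a quick analyst review.
        return "analyst_review"
    # Manual verification zone: never auto-decline on a weak prediction.
    return "manual_verification"


if __name__ == "__main__":
    print(route(0.92, prohibited_hit=False))
    print(route(0.68, prohibited_hit=True))
    print(route(0.42, prohibited_hit=True))
```

Note that a prohibited-industry hit only leads to an automatic decline when accuracy is at least 0.75; below that, the case always goes to a person.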

Establish prohibited industry lists at multiple levels

Use NAICS hierarchy strategically:

2-digit codes (broad sectors): Ban entire economic sectors when appropriate (e.g., 71 = Arts, Entertainment, and Recreation)

4-digit codes (industry groups): Target specific industry groups while allowing related activities (e.g., 7132 = Gambling Industries)

6-digit codes (specific industries): Precision targeting for niche restrictions (e.g., prohibiting 713210 = Casinos while allowing 713290 = Other Gambling Industries)

Example policy:

Prohibited 2-digit NAICS: ["71", "92"]
Prohibited 4-digit NAICS: ["4539", "8129"] 
Prohibited 6-digit NAICS: ["453998", "812990", "812199"]

This allows flexible risk management without overly broad restrictions.


Monitor keywords for nuanced risk

Even within allowed industries, certain keywords may indicate prohibited or high-risk activities:

Common sensitive keywords:

  • Pain (opioid risk in healthcare)
  • CBD, cannabis, hemp (regulated substances)
  • Vape, tobacco, nicotine (age-restricted products)
  • Escort, adult, massage (potential TOSA risk)
  • Crypto, coin, token (virtual currency)
  • Gambling, betting, casino (gaming activities)

Implementation approach: Create a keyword watchlist and flag businesses if a prohibited keyword is found.

Baselayer has a complete list of recommended keywords that can be shared upon request.
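A watchlist check against the keywords array might look like the following sketch. The watchlist is copied from the examples above and is not Baselayer's complete recommended list; the function name flagged_keywords and the whole-word matching rule are assumptions.

```python
import re

# Sample watchlist built from the sensitive keywords listed in this guide.
WATCHLIST = {
    "pain", "cbd", "cannabis", "hemp", "vape", "tobacco", "nicotine",
    "escort", "adult", "massage", "crypto", "coin", "token",
    "gambling", "betting", "casino",
}


def flagged_keywords(keywords: list[str]) -> list[str]:
    """Return the predicted keywords that contain a watchlist term as a whole word."""
    hits = []
    for keyword in keywords:
        # Split each keyword phrase into lowercase word tokens.
        tokens = set(re.findall(r"[a-z0-9]+", keyword.lower()))
        if tokens & WATCHLIST:
            hits.append(keyword)
    return hits


if __name__ == "__main__":
    print(flagged_keywords(["Pain management", "physical therapy", "CBD oil"]))
```

Matching on whole words keeps a term like "pain" from flagging unrelated keywords such as "house painting". Substring matching would catch more variants at the cost of more false positives.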


Leverage network risk indicators

For payment processing use cases:

Mastercard compliance: If mastercard_risk: true, the MCC is considered high-risk under Mastercard's Business Risk Assessment and Mitigation (BRAM) program. This may require:

  • Additional underwriting and monitoring
  • Higher reserves or transaction limits
  • Specialized high-risk processing programs

Visa compliance: Use visa_risk_tier to apply appropriate controls:

  • Tier 1: High-risk, stringent requirements
  • Tier 2: Standard monitoring
  • Tier 3: Emerging high-risk, enhanced oversight

Always consult current card network rules, as these change periodically.
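As a rough sketch, the network indicators on each mcc_codes entry can be collected into review flags. The field names follow this guide; the function name network_flags, the flag strings, and the sample data are hypothetical, and the sample values are not actual card network designations. The tier is converted to a string so that either 1 or "1" is handled.

```python
VISA_TIER_LABELS = {
    "1": "high_risk",
    "2": "standard_risk",
    "3": "emerging_high_risk",
}


def network_flags(mcc_codes: list[dict]) -> list[str]:
    """Collect card network risk flags from a list of MCC objects."""
    flags = []
    for mcc in mcc_codes:
        code = mcc.get("code")
        if mcc.get("mastercard_risk") is True:
            flags.append(f"{code}: mastercard_high_risk")
        tier = mcc.get("visa_risk_tier")
        if tier is not None:  # null means no Visa tier is assigned
            label = VISA_TIER_LABELS.get(str(tier), "unknown_tier")
            flags.append(f"{code}: visa_{label}")
    return flags


if __name__ == "__main__":
    # Illustrative values only.
    sample = [
        {"code": "7995", "mastercard_risk": True, "visa_risk_tier": "1"},
        {"code": "1799", "mastercard_risk": False, "visa_risk_tier": None},
    ]
    print(network_flags(sample))
```

Each flag can then be mapped to whatever controls your program applies, such as additional monitoring or reserves.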


Troubleshooting & Edge Cases

When accuracy is unexpectedly low

Possible causes:

  1. Website is new or under development

    • Newly launched sites may not have enough content indexed
    • "Coming soon" pages provide no classification signals
  2. Business operates primarily offline

    • Local service businesses without strong online presence
    • Traditional brick-and-mortar operations
  3. Generic or misleading business name

    • Names like "ABC Solutions" or "Main Street Enterprises" provide no industry clues
  4. Conflicting signals across sources

    • Website describes one type of business, online listings indicate another
    • Multiple lines of business with no clear primary activity

When the predicted industry seems wrong

Verification steps:

  1. Review the keywords field: often provides insight into why a particular industry was predicted
  2. Check if accuracy is low: predictions below 0.75 should be treated with skepticism
  3. Look for mixed signals: the website may describe multiple activities, and the model chose the most prominent
  4. Consider franchise vs. corporate distinctions: corporate websites may not reflect individual franchisee operations

When to request Order.WebsiteAnalysis with Industry Prediction

For best accuracy, request Industry Prediction together with Website Analysis:

{
  "options": ["Order.NaicsPrediction", "Order.WebsiteAnalysis"]
}

Website Analysis provides additional content that strengthens industry predictions, especially for:

  • Businesses with detailed websites
  • Service-based companies where operations aren't obvious from the name
  • Companies in ambiguous or overlapping industry categories

Key Takeaways

  1. Accuracy ≥ 0.75 is reliable for automated decisioning in most use cases
  2. Accuracy represents confidence, not correctness - it measures data quality and signal strength
  3. Low accuracy doesn't mean wrong; it means there is insufficient data to be confident
  4. Combine accuracy with risk indicators (keywords, Mastercard risk, Visa tier) for comprehensive screening
  5. Use NAICS hierarchy strategically: 2-digit for broad bans, 4-digit for targeted restrictions, 6-digit for precision

Additional Resources

API Documentation

Related Guides

Support

For questions about Industry Prediction or to report systematic accuracy issues, contact Baselayer support or your account team.