Online Presence: Best Practices
Decisioning patterns and example workflows for using Online Presence data in your onboarding process.
Introduction
Online presence signals are most valuable when applied with context. A missing website means something different for a 20-year-old concrete contractor than for a company claiming to be an established e-commerce platform. A low review volume is expected for a B2B firm and suspicious for a restaurant.
This guide covers how to use each sub-product effectively, how to combine them into a coherent decisioning workflow, and how to detect fraud patterns observed in the field. It also covers sector-specific patterns. For field definitions and response shapes, see Online Presence: Basics.
Working examples appear throughout this guide. We use two real Web Presence responses side by side:
- ✅ Lucali: a well-known Brooklyn pizza restaurant. Clean, established, legitimate. Submitted website `lucali.com` and email [email protected], both on the corporate domain, both consistent with the discovered identity.
- 🚩 Hartwell Legal Group: a law firm application where the submitted website is a lookalike domain registered 15 days before the application, impersonating an established firm with a 4-year-old domain. Every fraud signal in this guide fires on this example.
Website Analysis
What to look for
Is the site real and operational? Use website_build_status and parked together. A status of coming_soon or inactive, or parked: true, indicates a placeholder or undeveloped domain. Treat these as early-stage signals, not automatic flags - but weight them heavily if the business claims years of operation. Note that many legitimate SMBs (local contractors, restaurants, service businesses) operate with no website at all, relying instead on Google Business, Yelp, or Facebook. A missing website should be informational, not a red flag. Evaluate it alongside the rest of the digital footprint: strong review presence and consistent contact data across platforms can more than compensate.
Is the domain legitimate? Flag if ssl_validity.is_valid is false. An invalid or missing SSL certificate is a meaningful risk signal, especially for businesses claiming to transact online.
Does the contact information align? Baselayer discovers emails and phone numbers from the website. Cross-check website_analysis.emails[] and phone_numbers[] against the submitted application data. Leverage email_match and phone_number_match fields. A business with email_deliverable: true and a professional domain on the discovered website, but a free email or an email on a different domain in the application, is a synthetic identity or impersonation signal worth flagging.
Does the submitted website match the discovered one? Check business_website_match. A value of false means the domain you were given doesn't match what Baselayer found. Compare the two objects directly - divergence in parked, website_build_status, or whois_record.domain_age_months between submitted and discovered domains is itself a risk signal.
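The comparison of the submitted and discovered objects can be sketched as a small diff helper. This is a minimal illustration, not Baselayer's implementation: the function name `analysis_divergence` and the 12-month age-gap tolerance are assumptions, and both arguments are assumed to be dictionaries shaped like `website_analysis` and `input_website_analysis`.

```python
def analysis_divergence(found, submitted, max_age_gap_months=12):
    """Return {field: (found_value, submitted_value)} for fields that diverge.

    max_age_gap_months is an illustrative tolerance, not a documented threshold.
    """
    diffs = {}
    for field in ("parked", "website_build_status"):
        if found.get(field) != submitted.get(field):
            diffs[field] = (found.get(field), submitted.get(field))

    found_age = (found.get("whois_record") or {}).get("domain_age_months")
    submitted_age = (submitted.get("whois_record") or {}).get("domain_age_months")
    if found_age is not None and submitted_age is not None:
        if abs(found_age - submitted_age) > max_age_gap_months:
            diffs["domain_age_months"] = (found_age, submitted_age)
    return diffs
```

An empty result means the two objects agree on these fields; any entry is a reason to look more closely at `business_website_match`.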
How substantial is the website? website_structure_metrics returns depth (maximum link depth: "0", "1", or "2+") and breadth (total unique pages discovered). A real operating business almost always has some structure. A site with depth: "0" and breadth: "1" is not automatically suspicious, but combined with other weak signals it suggests a site put up quickly rather than built over time.
Key thresholds
| Signal | Threshold | Action |
|---|---|---|
| `website_build_status` | ≠ `active` or `parked: true` | Flag: site may be placeholder or inactive |
| `ssl_validity.is_valid` | `false` | Flag: insecure or spoofed site |
| `whois_record.domain_age_months` | < 6 and business claims maturity | Flag: newly registered domain for an older business |
| `business_website_match` | `false` | Flag: submitted domain not verified |
| `website_structure_metrics` | `depth: "0"` and `breadth: "1"` | Weak signal: evaluate alongside other indicators |
| Submitted email domain vs. discovered `email_deliverable` domain | Mismatch (e.g., free email or unrelated domain vs. deliverable corporate domain) | Flag: potential impersonation or synthetic identity |

| Field | ✅ Lucali | 🚩 Hartwell Legal Group |
|---|---|---|
| `website_build_status` | `active` (found); `active` (submitted) | `active` (found); `active` (submitted) |
| `parked` | `false` (both) | `false` (both) |
| `ssl_validity.is_valid` | `true` (both) | `true` (submitted); `null` (found) |
| `domain_age_months` (found) | 213 | 50, consistent with firm founded 4 years ago |
| `domain_age_months` (submitted) | 213, matches found domain | 0, domain registered 15 days before application |
| `business_website_match` | `true`: submitted `lucali.com` matches found domain | `false`: submitted domain does not match found domain |
| `email_deliverable` (found) | `true` | `true` |
| Submitted email domain | `lucali.com`, matches found domain | `hartwelllegalgrp.com`, different from found `hartwelllegal.com` |
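The key thresholds for a single website analysis object can be applied with a short helper. This is a sketch under stated assumptions: the function name, the flag labels, and the `claims_maturity` parameter are hypothetical, and the input is assumed to be a dictionary using the field names from this guide. Because the stored type of `breadth` is not confirmed here, the code compares it as a string.

```python
def website_flags(analysis, claims_maturity=False):
    """Return a list of illustrative flag labels for one website analysis dict."""
    flags = []

    status = analysis.get("website_build_status")
    # A missing status is treated as unknown rather than as a flag
    if (status is not None and status != "active") or analysis.get("parked") is True:
        flags.append("SITE_NOT_OPERATIONAL")

    ssl = analysis.get("ssl_validity") or {}
    if ssl.get("is_valid") is False:
        flags.append("SSL_INVALID")

    whois = analysis.get("whois_record") or {}
    age = whois.get("domain_age_months")
    if age is not None and age < 6 and claims_maturity:
        flags.append("NEW_DOMAIN_FOR_MATURE_BUSINESS")

    metrics = analysis.get("website_structure_metrics") or {}
    if str(metrics.get("depth")) == "0" and str(metrics.get("breadth")) == "1":
        flags.append("THIN_SITE_WEAK_SIGNAL")

    return flags
```

Run it once on `website_analysis` and once on `input_website_analysis`; the submitted object is where a newly registered lookalike domain shows up.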
Domain Impersonation Patterns
This section covers fraud patterns that are only detectable by comparing the submitted domain against the independently discovered domain. They require input_website_analysis to be present, which means the applicant's website must be included in the request. Submit the website collected in your application form, or the email domain from the applicant.
Domain age vs. business age
The most reliable impersonation signal available. Legitimate businesses typically have domains as old as, or older than, the business itself. A business with years of history applying with a brand-new domain is a serious red flag.
The pattern in practice: A fraudster registers a lookalike domain days or weeks before submitting an application. The name is close enough to pass a casual read: a transposed letter, an added word, a slightly different TLD. They set up email on that domain to establish a communication channel with the FI, intercepting correspondence and redirecting funds while the real business owner never finds out.
Baselayer's website_analysis will return the legitimate, long-standing domain. input_website_analysis exposes the fraud by revealing that the submitted domain was registered days ago. Use whois_record.domain_created_at from both objects alongside months_in_business from the Business Search response.
| Condition | Action |
|---|---|
| Submitted domain < 90 days old AND business > 24 months old | Flag for manual review |
| Submitted domain < 30 days old AND business > 24 months old | High-risk: escalate immediately |
| Found domain age roughly consistent with business age | Positive corroborating signal |

```python
from datetime import datetime, timezone

def check_domain_age_risk(web_presence_response, months_in_business):
    input_analysis = web_presence_response.get("input_website_analysis")
    # Explicit None check so a brand-new business (0 months) is not treated as missing data
    if not input_analysis or months_in_business is None:
        return "INSUFFICIENT_DATA"

    whois = input_analysis.get("whois_record")
    if not whois or not whois.get("domain_created_at"):
        return "NO_WHOIS_DATA"  # Treat as elevated risk for established businesses

    # Accept a trailing "Z"; datetime.fromisoformat rejects it before Python 3.11
    created_at = datetime.fromisoformat(whois["domain_created_at"].replace("Z", "+00:00"))
    if created_at.tzinfo is None:
        created_at = created_at.replace(tzinfo=timezone.utc)  # Assume UTC when no offset is given
    domain_age_months = (datetime.now(timezone.utc) - created_at).days / 30

    if months_in_business <= 24:
        return "PASS"
    if domain_age_months < 1:
        return "HIGH_RISK"  # Registered under 30 days ago: escalate immediately
    if domain_age_months < 6:
        return "FLAG"  # New domain for an established business: manual review required
    return "PASS"
```

| Field | ✅ Lucali | 🚩 Hartwell Legal Group |
|---|---|---|
| `input_website_analysis` | Present, submitted `lucali.com` | Present, submitted `hartwelllegalgrp.com` |
| Submitted domain `domain_created_at` | `2008-08-22`, 213 months ago | `2026-05-09`, 15 days before application |
| Submitted domain `domain_age_months` | 213, consistent with found domain | 0 |
| Found domain `domain_age_months` | 213, matches submitted domain | 50, consistent with a firm founded ~4 years ago |
| `check_domain_age_risk` result | `PASS`: submitted and found domains align | `HIGH_RISK` |

The Hartwell Legal Group response makes the fraud pattern unmistakable: a firm claiming years of operation, a found website with a 50-month-old domain, and a submitted website registered 15 days before the application on a domain that closely mirrors the real one (`hartwelllegalgrp.com` vs. `hartwelllegal.com`). Lucali, by contrast, shows the expected clean pattern: submitted and found domains are the same, and their ages align.
Lookalike domain names
Beyond domain age, inspect the submitted and found domain names directly for visual similarity. Common patterns:
- Added or transposed characters: `hartwelllegalgrp.com` vs. `hartwelllegal.com`
- Hyphenation: `hartwell-legal.com` vs. `hartwelllegal.com`
- TLD substitution: `.net`, `.co`, `.org` variants of a `.com` domain
- Word additions: `hartwelllegalgroup.com`, `hartwelllegalservices.com`

When `business_website_match: false` and the two domain names are visually similar, treat this as a near-certain impersonation attempt regardless of other signals.
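Visual similarity can be approximated in code. The sketch below is one possible heuristic, not a Baselayer feature: `is_lookalike`, the 0.8 similarity threshold, and the 5-character minimum for prefix matches are all assumptions. It uses Python's standard `difflib.SequenceMatcher` and a naive label extraction that drops only the last dot-separated suffix, so multi-part suffixes such as `.co.uk` would need a proper public-suffix lookup.

```python
from difflib import SequenceMatcher

def _host(domain):
    """Lowercase the domain and drop a leading 'www.'."""
    host = domain.strip().lower()
    return host[4:] if host.startswith("www.") else host

def _label(domain):
    """Naive name label: remove the final dot-suffix and any hyphens."""
    return _host(domain).rsplit(".", 1)[0].replace("-", "")

def is_lookalike(submitted, found, threshold=0.8):
    """True when two different domains have suspiciously similar names."""
    if _host(submitted) == _host(found):
        return False  # Same domain, nothing to compare
    a, b = _label(submitted), _label(found)
    if a == b:
        return True  # Differs only by TLD or hyphenation
    shorter, longer = sorted((a, b), key=len)
    if len(shorter) >= 5 and longer.startswith(shorter):
        return True  # Word or characters appended to the real name
    return SequenceMatcher(None, a, b).ratio() >= threshold
```

A `True` result is only meaningful alongside `business_website_match: false`; on its own, name similarity between unrelated businesses is common.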
Free email vs. deliverable corporate domain
When the applicant submits an email address on a domain that differs from the discovered business domain — and the discovered domain has email_deliverable: true — that combination is a strong impersonation signal. If an established business has a working corporate domain with functioning email, there is no legitimate reason for a representative to apply using an email address on a different domain.
The most likely explanation: the real business exists, but the applicant does not work there. Note that this signal applies beyond free email providers. In the Hartwell Legal Group example, the submitted email is on the newly-registered lookalike domain (hartwelllegalgrp.com), not a free provider — but the mismatch against the discovered hartwelllegal.com domain is equally telling.

```python
from urllib.parse import urlparse

FREE_EMAIL_DOMAINS = {
    "gmail.com", "yahoo.com", "icloud.com", "hotmail.com",
    "outlook.com", "aol.com", "protonmail.com", "me.com"
}

def check_email_impersonation(submitted_email, found_website_url, website_analysis):
    if not submitted_email or not website_analysis:
        return "INSUFFICIENT_DATA"

    if not website_analysis.get("email_deliverable", False):
        return "PASS"  # Found domain not deliverable, so the signal does not apply

    if "@" not in submitted_email:
        return "INSUFFICIENT_DATA"
    submitted_domain = submitted_email.rsplit("@", 1)[-1].strip().lower()
    if not submitted_domain:
        return "INSUFFICIENT_DATA"

    if submitted_domain in FREE_EMAIL_DOMAINS:
        return "FLAG"  # Free email while corporate domain is operational

    # Extract the found domain; prepend "//" so scheme-less input such as "lucali.com" still parses
    found_domain = None
    if found_website_url:
        url = found_website_url if "//" in found_website_url else "//" + found_website_url
        found_domain = (urlparse(url).hostname or "").lower()
        # str.lstrip("www.") strips any leading "w" or "." characters, so remove the prefix explicitly
        if found_domain.startswith("www."):
            found_domain = found_domain[4:]

    if found_domain and submitted_domain != found_domain:
        return "FLAG"  # Email on a different domain than the discovered business domain
    return "PASS"
```

| Field | ✅ Lucali | 🚩 Hartwell Legal Group |
|---|---|---|
| Submitted email | [email protected], corporate domain | [email protected], lookalike domain |
| Found domain | `lucali.com` | `hartwelllegal.com` |
| `email_deliverable` (found) | `true` | `true` |
| Email domain matches found domain | Yes: `lucali.com` = `lucali.com` | No: submitted email is on the lookalike domain |
| `check_email_impersonation` result | `PASS` | `FLAG` |
Industry Prediction
What to look for
Set a confidence floor. accuracy ≥ 0.75 is the recommended threshold for automated decisioning. Predictions below this are typically based on thin or conflicting online data. Use them directionally (at the 2-digit or 4-digit NAICS level) but require manual review before acting on the 6-digit code.
Maintain prohibited industry lists at multiple levels. Use NAICS hierarchy strategically:
- 2-digit codes to ban entire sectors (e.g., `71` = Arts, Entertainment, and Recreation)
- 4-digit codes to target industry groups (e.g., `7132` = Gambling Industries)
- 6-digit codes for precision targeting (e.g., `713210` = Casinos while allowing `713290` = Other Gambling Industries)

Prohibited and restricted NAICS and keyword lists are available from your Baselayer account representative.
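Because NAICS codes are hierarchical, a multi-level prohibited list reduces to prefix matching. The following is a minimal sketch: the function name `is_prohibited_naics` and the optional `allowed_codes` carve-out are illustrative assumptions, and the example lists are not a recommended policy.

```python
def is_prohibited_naics(code, prohibited_prefixes, allowed_codes=frozenset()):
    """True if the code falls under any prohibited 2-, 4-, or 6-digit prefix.

    allowed_codes lets a specific 6-digit code through even when a broader
    prefix would otherwise block it.
    """
    code = str(code).strip()
    if code in allowed_codes:
        return False
    return any(code.startswith(prefix) for prefix in prohibited_prefixes)

# Block casinos specifically while leaving the rest of gambling untouched
assert is_prohibited_naics("713210", {"713210"})
assert not is_prohibited_naics("713290", {"713210"})

# Block the whole gambling industry group, with one carve-out
assert is_prohibited_naics("713210", {"7132"}, allowed_codes={"713290"})
assert not is_prohibited_naics("713290", {"7132"}, allowed_codes={"713290"})
```

Storing the codes as strings matters: a leading digit is significant, and integer comparison cannot express prefixes.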
Scan keywords[] for sensitive terms. Even within permitted industries, certain keywords may indicate restricted activities — opioid risk in healthcare (pain, opioid), regulated substances (CBD, cannabis, vape), or potential TOSA risk (escort, adult). Maintaining a keyword watchlist catches high-risk niches inside approved sectors. A complete recommended keyword list is available from your account representative.
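A keyword watchlist check might look like the sketch below. It assumes `keywords[]` is a list of strings; the function name `match_keywords` and the sample watchlist are hypothetical and much shorter than a production list. Matching is done on whole words so that a short watchlist term does not fire inside an unrelated longer word.

```python
def match_keywords(keywords, watchlist):
    """Return the keywords that equal, or contain as a whole word, a watchlist term."""
    watch = {term.lower() for term in watchlist}
    hits = []
    for keyword in keywords or []:
        lowered = keyword.lower()
        words = lowered.replace("-", " ").split()
        if lowered in watch or any(word in watch for word in words):
            hits.append(keyword)
    return hits

SAMPLE_WATCHLIST = {"pain", "opioid", "cbd", "cannabis", "vape"}  # illustrative only
```

Returning the matched keywords, rather than a boolean, gives the analyst the reason for the flag.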
Use mcc_codes[] for card network compliance. For payment processing use cases, check mastercard_risk and visa_risk_tier inside each entry of mcc_codes[]. If any MCC has mastercard_risk: true or visa_risk_tier: "1", apply the relevant network compliance controls. Note that a single industry prediction may return multiple MCC codes — check all of them.
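Checking every MCC entry, rather than only the first, can be sketched as follows. The function name `risky_mcc_entries` is hypothetical, and each entry is assumed to be a dictionary carrying `mastercard_risk` and `visa_risk_tier`. The tier is compared as a string because this guide writes it as `"1"`; the actual response type should be confirmed against the API reference.

```python
def risky_mcc_entries(mcc_codes):
    """Return every MCC entry that carries a card-network risk indicator."""
    risky = []
    for entry in mcc_codes or []:
        mastercard = entry.get("mastercard_risk") is True
        visa_tier_one = str(entry.get("visa_risk_tier")) == "1"
        if mastercard or visa_tier_one:
            risky.append(entry)
    return risky
```

A non-empty result means at least one predicted MCC needs network compliance controls, even if the other entries are clean.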
Note: Baselayer uses 2017 NAICS codes across all industry prediction and classification endpoints. The 2017 revision is the basis for all returned codes, filters, and industry-related fields. If you're cross-referencing against another system, confirm it also uses the 2017 standard.
Tiered review policy
| Accuracy | Treatment |
|---|---|
| ≥ 0.75 | Safe for automated decisioning. Apply prohibited industry checks and network risk indicators automatically. |
| 0.50 – 0.74 | Use directionally. Flag for analyst review. Apply prohibited checks at 2-digit or 4-digit NAICS level only. |
| < 0.50 | Do not use for automated decisions. Request additional information about business activities. |

| Field | ✅ Lucali | 🚩 Hartwell Legal Group |
|---|---|---|
| `accuracy` | 0.95 | 0.98 |
| `code` | `722511`: Full-Service Restaurants | `541110`: Offices of Lawyers |
| `keywords` | restaurant, pizzeria, pizza, calzone, dine-in, table service | law firm, attorney, legal representation, civil litigation, family law |
| `risk_level` | low | low |
| `mastercard_risk` | false | false |

Note: industry prediction is clean on both.
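The tiered review policy maps directly onto a small function. This is a sketch with hypothetical names: `prediction_treatment` and the treatment labels are not part of the API. The second return value is the deepest NAICS level at which the policy allows prohibited-list checks for that tier.

```python
def prediction_treatment(accuracy):
    """Map prediction accuracy to (treatment, max NAICS digits to act on)."""
    if accuracy is None:
        return ("REQUEST_INFO", None)
    if accuracy >= 0.75:
        return ("AUTOMATE", 6)        # Full 6-digit code is usable
    if accuracy >= 0.50:
        return ("ANALYST_REVIEW", 4)  # Act at the 2- or 4-digit level only
    return ("REQUEST_INFO", None)     # Do not automate on this prediction

def usable_code(code, accuracy):
    """Truncate the predicted code to the level the tier permits, or None."""
    _, digits = prediction_treatment(accuracy)
    return str(code)[:digits] if digits else None
```

For a mid-confidence prediction of `713210`, `usable_code` yields `7132`, which can then be checked against 2-digit and 4-digit prohibited prefixes.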
Social Profiles
Social profiles are returned at `business.social_profiles[]` on `POST /searches` (requires `Order.Enhanced`) and at `found_social_profiles[]` on `POST /web_presence_requests` (requires `Order.SocialMedia`). The object structure is identical.
What to look for
Filter by confidence first. Only profiles with confidence: high should influence automated decisions. Medium and low confidence profiles may belong to unrelated businesses with similar names.
Verify submitted social profiles. When you submit known social profiles in the request, Baselayer returns a social_profiles_match[] array. Treat this as an additional confidence signal alongside email_match and phone_number_match: a confirmed match is a positive identity signal that corroborates the applicant's submitted data. A submitted profile that was not confirmed is informational rather than diagnostic, but warrants closer review. social_profiles_match[] is available on POST /web_presence_requests.
Cross-reference contact data across platforms. Compare business_website and email fields across discovered profiles and the submitted application. Inconsistencies between platform data and application data — especially different websites or unrelated email domains — can indicate impersonation.
Look for business-oriented signals. is_business_account: true on Instagram and is_business_page: true on Facebook are strong legitimacy indicators. is_private: true on Instagram is low-confidence and should not contribute to approval decisions.
Weigh by sector. High follower counts and review presence matter more for consumer-facing brands than B2B firms. For B2B, LinkedIn is typically the most relevant platform; for retail and hospitality, Instagram and Facebook carry more weight. Adjust thresholds accordingly.
Check found_on[]. Profiles with found_on: ["FOUND_WEBSITE"] were linked directly to the business's website, a stronger signal of ownership than profiles discovered through general search alone.

| Field | ✅ Lucali | 🚩 Hartwell Legal Group |
|---|---|---|
| Profiles found | Instagram (279K followers), Facebook (13K followers) | Facebook (7 followers, `found_on: ["FOUND_WEBSITE"]`) |
| `is_business_page` | true (both) | true |
| `business_website` in profile | `lucali.com` | `hartwelllegal.com`, not the submitted domain |
| `found_on` | `["FOUND_WEBSITE"]` | `["FOUND_WEBSITE"]` |

The Hartwell example shows an important corroborating signal: the Facebook page linked from the found website (`hartwelllegal.com`) points back to `hartwelllegal.com`, not to the submitted `hartwelllegalgrp.com`. The real business presents the same identity across platforms, and that identity directly contradicts the submitted application.
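The confidence filter and the cross-platform website check can be combined in one pass over the profiles. The sketch below assumes each profile is a dictionary with the field names used in this section; `host_of`, `social_profile_signals`, and the signal labels are hypothetical.

```python
from urllib.parse import urlparse

def host_of(url):
    """Lowercased hostname without a leading 'www.', or None if absent."""
    if not url:
        return None
    # Prepend "//" so scheme-less values such as "lucali.com" still parse
    parsed = urlparse(url if "//" in url else "//" + url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host or None

def social_profile_signals(profiles, submitted_website):
    """Collect signals from high-confidence profiles only."""
    submitted_host = host_of(submitted_website)
    signals = set()
    for profile in profiles or []:
        if str(profile.get("confidence", "")).lower() != "high":
            continue  # Medium and low confidence: ignore for automation
        if profile.get("is_business_account") or profile.get("is_business_page"):
            signals.add("BUSINESS_ACCOUNT")
        profile_host = host_of(profile.get("business_website"))
        if submitted_host and profile_host and profile_host != submitted_host:
            signals.add("WEBSITE_MISMATCH")
    return sorted(signals)
```

On the Hartwell example this would surface both a business-page signal and a website mismatch, which is the pattern to expect when a real firm is being impersonated.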
Reviews
Reviews are returned at `business.reviews[]` on `POST /searches` (requires `Order.Enhanced`) and at `found_reviews[]` on `POST /web_presence_requests` (requires `Order.ReviewSummary` or `Order.ReviewFull`). The object structure is identical.
What to look for
Check Google open_state for operational status. The `metadata.open_state` field is populated only for Google reviews (`source: "google"`) and reflects the current status of the business as reported by Google Maps. This is one of the most direct operational signals available. Critical values:
- `"Permanently closed"`: the business has been marked closed on Google. A major red flag for any application; escalate regardless of other signals.
- `"Temporarily closed"`: the business is not currently operating. Worth clarifying with the applicant.
- `"Closed · Opens [time]"`: outside operating hours, but active. Normal.
- `"Open"`: currently open. Positive signal.

```python
def check_google_open_state(found_reviews):
    # open_state is only populated on Google entries, so skip every other source
    for review in found_reviews or []:
        if review.get("source") != "google":
            continue
        metadata = review.get("metadata") or {}
        open_state = (metadata.get("open_state") or "").lower()
        if "permanently closed" in open_state:
            return "HIGH_RISK"
        if "temporarily closed" in open_state:
            return "FLAG"
    return "PASS"
```

Combine rating and volume. A high rating with low volume (< 10) is weak evidence. A low rating with meaningful volume (> 20) is a strong negative signal. Use both together.
Only trust confidence: high for automation. Medium and low confidence review profiles may belong to an unrelated business. Filter before scoring.
Cross-verify identity. Check address, phone_number, and business_website across review platforms and your submitted application data. In impersonation cases, review data will consistently point to the real business — not the submitted application details.
Apply sector awareness. Consumer-facing businesses (retail, hospitality, healthcare, food service) typically have substantial review presence. B2B firms often have few or no reviews — absence is not a risk signal for these sectors.

| Field | ✅ Lucali | 🚩 Hartwell Legal Group |
|---|---|---|
| `open_state` | `"Closed · Opens 5 PM"`: dinner-only restaurant, normal | `"Closed · Opens 9 AM Mon"`: professional office, normal |
| Google review volume | 3,100 | 26, low but appropriate for a small law firm |
| Google rating | 4.2 | 4.5 |
| `business_website` in review | `lucali.com` | `hartwelllegal.com`, the real domain, not the submitted one |
| `phone_number` in review | Matches application | Does not match submitted phone number |

In the Hartwell example, the review data independently confirms the real business identity: phone number and website in the Google listing point to the legitimate firm, directly contradicting the submitted application.
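The rating and volume guidance in this section can be expressed as a single function. This is a sketch: the name `review_sentiment` and the outcome labels are hypothetical, the 3.0 rating cutoff is taken from the example workflow in this guide, and the function takes plain numbers because the exact field names for rating and review count are not restated here.

```python
def review_sentiment(rating, volume):
    """Classify a review profile using rating and volume together."""
    if rating is None or volume is None:
        return "NO_DATA"
    if rating < 3.0 and volume > 20:
        return "FLAG"           # Consistent negative feedback
    if volume < 10:
        return "WEAK_EVIDENCE"  # Too few reviews to trust the rating either way
    return "OK"
```

Apply it only to review profiles that already passed the confidence filter, and interpret `NO_DATA` by sector: it is expected for many B2B firms.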
Directory Listings
Directory listings require `Order.DirectoryListing` to be included in your request options.
What to look for
Use people[] for officer cross-referencing. Directory listings often include officers or principals. Cross-reference these against submitted officer_names and business.business_officers[] from the KYB search. An officer appearing consistently across directories, the website, and SoS records is a strong identity signal.
Cross-reference contact details against the application. As with reviews, directory listings independently record the business's contact information. In impersonation cases, directory data will point to the real business — phone numbers, addresses, and websites that differ from what was submitted.
Use listing count directionally. There is no hard threshold, but a business appearing in multiple independent directories signals an established real-world presence. Zero directory listings for a claimed 5+ year operation is worth noting, particularly for local service businesses, contractors, and professional services.
Weight directory presence by sector. Local service businesses, contractors, and professional services firms typically appear in directories. Digital-native or B2B businesses may have fewer listings — absence is not a red flag for these sectors.

| Field | ✅ Lucali | 🚩 Hartwell Legal Group |
|---|---|---|
| Listings found | n/a | 1, a legal directory listing |
| `category` | n/a | `"Law Firm"`, consistent with `541110` |
| `business_website` in listing | n/a | `hartwelllegal.com`, the real domain |
| `phone_number` in listing | n/a | Does not match submitted phone number |

Again, the directory independently confirms the real business identity and contradicts the submitted application.
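Officer cross-referencing usually needs some name normalization before comparison, since directories often format names differently from application forms. The sketch below assumes the names from `people[]` have already been extracted into a list of strings; the shape of `people[]` entries is not specified in this guide, and `normalize_name` and `officer_overlap` are hypothetical helpers. Sorting the name tokens makes "Last, First" and "First Last" compare equal, at the cost of ignoring word order.

```python
import re

SUFFIXES = {"jr", "sr", "ii", "iii", "esq"}

def normalize_name(name):
    """Lowercase, strip punctuation and common suffixes, and sort the tokens."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    tokens = [token for token in cleaned.split() if token not in SUFFIXES]
    return " ".join(sorted(tokens))

def officer_overlap(directory_names, submitted_officer_names):
    """Return the submitted officers that also appear in the directory names."""
    listed = {normalize_name(name) for name in directory_names or []}
    return [name for name in submitted_officer_names or []
            if normalize_name(name) in listed]
```

This exact-match approach will miss middle initials and nicknames, so an empty result should prompt review rather than count as a mismatch.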
Putting It All Together
Web presence signals work best as a layered system. Each sub-product adds a different dimension:
- Website Analysis answers: is this business real and operational online?
- Industry Prediction answers: is this business in a permitted sector, and how confident are we?
- Social Profiles answers: does this business have a consistent public identity across platforms?
- Reviews answers: do real customers interact with this business, and is their experience consistent with the application?
- Directory Listings answers: is this business recognized by third-party sources, and does the data they hold align?
**No single signal should drive a decision alone.**
The Hartwell Legal Group example illustrates this precisely: industry prediction is clean (0.98 accuracy, low risk, no prohibited keywords), the found website is active and legitimate, and the business has positive reviews. The fraud only becomes visible when you compare the submitted domain against the discovered one - and then cross-reference contact data across reviews and directories to see the contradiction.
Lucali, by contrast, passes every check cleanly: submitted domain matches the found domain, domain age is consistent with business age, and the submitted email is on the same corporate domain.
The most reliable approach:
- Start with website legitimacy. `website_build_status`, `parked`, `ssl_validity.is_valid`, and domain age are fast checks that catch the most obvious cases.
- Run impersonation checks if a website was submitted. Domain age vs. business age and email domain mismatch catch the fraud patterns that website legitimacy checks alone miss. Leverage fields like `email_match` or `phone_number_match` to cross-reference the submitted information. These are the highest-value checks in this guide.
- Classify with industry prediction. A confidence score and prohibited-list check against `industry_prediction.code` and `keywords[]` does the heavy lifting for compliance screening.
- Add social and review context where relevant. Sector-appropriate, confidence-filtered, cross-referenced against application data.
- Use directory listings to corroborate. Particularly valuable for local businesses, contractors, and sole proprietors. In impersonation cases, directory data will independently confirm the real business identity.
Example Workflow
This workflow is a starting template. Tune thresholds and logic based on your product type, risk appetite, and customer base.
Step 1: Website legitimacy
| Check | Logic | Signal |
|---|---|---|
| Website operational | website_build_status ≠ active or parked: true | ℹ️ Informational — evaluate alongside review and social presence |
| No website found | found_website is null | ℹ️ Informational — common for SMBs; weight other signals more heavily |
| SSL validity | ssl_validity.is_valid ≠ true | 🚩 Flag — may indicate spoofed or insecure site |
| Domain age vs. business age (submitted) | domain_age_months < 1 and months_in_business > 24 | 🚨 High-risk — escalate immediately |
| Domain age vs. business age (submitted) | domain_age_months < 6 and months_in_business > 24 | 🚩 Flag — new domain for an established business |
| Submitted website unverified | business_website_match: false | 🚩 Flag — submitted domain not confirmed |
| Email domain mismatch | Found domain has email_deliverable: true; submitted email is on a free provider or different domain | 🚩 Flag — potential impersonation |
| Address confirmed online | business_address_match_sources includes FOUND_WEBSITE, REVIEW, or DIRECTORY | ✅ Positive identity signal |
Step 2: Industry alignment
| Check | Logic | Signal |
|---|---|---|
| Low prediction confidence | accuracy < 0.75 | 🔍 Review — low confidence prediction |
| Prohibited industry | industry_prediction.code matches restricted NAICS list | 🚩 Flag — prohibited sector |
| Sensitive keywords | keywords[] includes terms from restricted keyword list | 🚩 Flag — potentially noncompliant operations |
| Card network risk | Any mcc_codes[].mastercard_risk: true or mcc_codes[].visa_risk_tier: "1" | 🚩 Flag — network-restricted MCC |
Step 3: Social profile verification (if relevant for your sector)
| Check | Logic | Signal |
|---|---|---|
| Confidence threshold | Profile confidence ≠ high | ⚪ Ignore for automation |
| Business account | is_verified: true or is_business_account: true | ✅ Positive legitimacy signal |
| Submitted profile verified | social_profiles_match[] entry returns matched: true for a submitted profile | ✅ Positive identity signal |
| Cross-platform consistency | business_website in profile differs from submitted domain | 🚩 Flag — profiles point to a different domain than submitted |
Step 4: Reviews (if relevant for your sector)
| Check | Logic | Signal |
|---|---|---|
| Google operational status | metadata.open_state contains "permanently closed" | 🚨 High-risk — escalate immediately |
| Google operational status | metadata.open_state contains "temporarily closed" | 🚩 Flag — investigate |
| Negative sentiment | rating < 3.0 and volume > 20 | 🚩 Flag — consistent negative feedback |
| Identity mismatch | business_website or phone_number in reviews differs from application data | 🚩 Flag — review data points to a different business identity |
Step 5: Directory listings (if relevant for your sector)
| Check | Logic | Signal |
|---|---|---|
| Officer cross-reference | people[] entries match submitted officer_names | ✅ Positive identity signal |
| Contact data mismatch | business_website or phone_number in listing differs from application data | 🚩 Flag — directory points to a different business identity |
Decisioning summary
Approve when:
- Website is active, secure, and consistent with application data
- Submitted domain age is consistent with business age (or no website submitted)
- Industry prediction `accuracy ≥ 0.75`, predicted code is permitted, no flagged keywords
- No identity inconsistencies across sub-products
Review when:
- No website found and no compensating signals (reviews, social profiles, or directory listings) exist
- Website signals are mixed (e.g., active but recently registered)
- Social or review data is absent in a sector where it would be expected
Flag when:
- Submitted domain recently registered for a business claiming years of operation
- Email domain mismatch suggesting impersonation or synthetic identity
- `business_website_match: false` combined with domain age discrepancy or visual domain similarity
- Review or directory data points to a different website or phone number than submitted
- Industry code or keywords match a prohibited list
- Any MCC has `mastercard_risk: true` or `visa_risk_tier: "1"`
- Google `open_state` indicates `"Permanently closed"`
- Consistent identity discrepancies across website, social profiles, reviews, and directories
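One way to roll individual check outcomes up into the approve, review, or flag decision is to take the most severe result. This is a sketch, not a prescribed policy: `decide`, the severity ordering, and the `FLAG_ESCALATE` label are assumptions, and unknown or missing outcomes are deliberately treated as needing review rather than as passes.

```python
SEVERITY = {
    "PASS": 0,
    "INSUFFICIENT_DATA": 1,
    "NO_WHOIS_DATA": 1,
    "REVIEW": 1,
    "FLAG": 2,
    "HIGH_RISK": 3,
}
DECISION = {0: "APPROVE", 1: "REVIEW", 2: "FLAG", 3: "FLAG_ESCALATE"}

def decide(results):
    """results maps a check name to its outcome string.

    Returns (decision, names of the checks that drove it).
    """
    # Unrecognized outcomes default to severity 1 so they are never silently approved
    worst = max((SEVERITY.get(outcome, 1) for outcome in results.values()), default=1)
    reasons = sorted(
        name for name, outcome in results.items()
        if worst > 0 and SEVERITY.get(outcome, 1) == worst
    )
    return DECISION[worst], reasons
```

For example, feeding in the Hartwell outcomes for domain age (`HIGH_RISK`) and email impersonation (`FLAG`) alongside a clean industry result yields an escalation driven by the domain age check.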
Sector-specific guidance
| Sector | What matters most |
|---|---|
| Consumer retail / e-commerce | Active website with SSL; domain age; impersonation checks; social presence (Instagram, Facebook); Google and Yelp reviews with meaningful volume |
| Restaurants / hospitality | Review volume and rating (Google, Yelp, TripAdvisor); Google open_state; Facebook check-ins; address consistency across platforms |
| B2B / professional services | LinkedIn company page (confidence: high); website legitimacy; domain impersonation checks; directory listings; officer cross-reference |
| Contractors / local services | Directory presence (industry associations, chamber of commerce, licensing registries); website analysis; address match sources |
| Healthcare | Website legitimacy; keyword scan for restricted terms (full list available from your account representative); license-based directory listings |
Related guides
- Online Presence: Basics — data model, available orderables, and integration path
- Web Presence API Reference — full endpoint documentation
- Industry Prediction Guide — deep dive on NAICS accuracy, MCC codes, and prohibited industry logic
- Sole Proprietorship Verification — full sole prop workflow
- Business Search: Best Practices — combining online presence signals with core KYB verification
