Disposable Email Detection: Protecting Your Platform from Temporary Addresses

Disposable/temporary inboxes hurt activation funnels, referral programs, trial abuse protection, and deliverability. Below is a pragmatic approach with detection signals, sample code, SQL, and ops guidance.

Disposable Email Detection Overview

Why It Matters

Lower user quality and LTV due to throwaway accounts that never convert.
Increased fraud and promo abuse — free trials, coupons, and referral bonuses get exploited.
Bounce risk and sender reputation damage — high bounce rates tank deliverability.
Operational overhead — support tickets, manual reviews, and account cleanup.

Detection Signals

1. Curated Disposable Domain Lists

Maintain a table of known disposable providers. Sources include:

Public lists: GitHub repos like disposable-email-domains (1000+ domains).
Internal discovery: Track domains that appear in signups but have no MX records or suspicious patterns.
Community feeds: abuse.ch, Spamhaus, or custom crawlers.

# Example: check if domain is in disposable list (-x matches the whole line, -F treats the dot literally)
curl -fsS https://raw.githubusercontent.com/disposable-email-domains/disposable-email-domains/master/disposable_email_blocklist.conf | grep -qxF "mailinator.com" && echo "disposable"
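
The same lookup can be done in application code. The sketch below is a minimal illustration, assuming the list has already been loaded into a Set; the names blocklist, domainOf, and isListedDisposable are hypothetical. It also walks up the parent domains, since disposable providers often hand out addresses on subdomains.

```typescript
// Hypothetical in-memory blocklist; in practice this would be loaded from the
// curated list or from your own table of disposable domains.
const blocklist = new Set<string>(['mailinator.com', '10minutemail.com'])

// Extract the lowercased domain part of an address, or null if there is none.
function domainOf(email: string): string | null {
  const at = email.lastIndexOf('@')
  if (at < 1 || at === email.length - 1) return null
  return email.slice(at + 1).trim().toLowerCase()
}

// Check the domain and each parent domain, so that "inbox.mailinator.com"
// matches a list entry for "mailinator.com".
function isListedDisposable(email: string, list: ReadonlySet<string>): boolean {
  const domain = domainOf(email)
  if (!domain) return false
  const labels = domain.split('.')
  // Stop before the bare TLD: "com" on its own should never match.
  for (let i = 0; i < labels.length - 1; i++) {
    if (list.has(labels.slice(i).join('.'))) return true
  }
  return false
}

console.log(isListedDisposable('a@mailinator.com', blocklist))
console.log(isListedDisposable('a@inbox.mailinator.com', blocklist))
console.log(isListedDisposable('a@example.com', blocklist))
```

Exact Set lookups keep this check cheap enough to run on every signup before any slower DNS or WHOIS signal is consulted.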

2. MX Record Anomalies

Disposable services often have:

No MX records (delivery then falls back to the domain's A/AAAA record, which established mail hosts rarely rely on).
Suspicious MX targets (e.g., pointing to temp mail services).
Missing SPF/DMARC (common for throwaway providers).

# Quick MX check for a domain
dig +short MX temp-mail.org
# Returns nothing or suspicious entries

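# SPF/DMARC check: SPF is a TXT record on the domain itself, DMARC lives on the _dmarc subdomain
dig +short TXT temp-mail.org | grep -i "v=spf1"
dig +short TXT _dmarc.temp-mail.org | grep -i "v=DMARC1"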
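
For use inside a signup flow, the three DNS checks can be gathered in one call. This is a sketch under stated assumptions: DnsResolver, MailDnsSignals, mailDnsSignals, and the stub resolver are hypothetical names introduced here. The resolver is passed in so the example runs without network access.

```typescript
// Minimal resolver shape, an assumption made for this sketch.
interface DnsResolver {
  resolveMx(host: string): Promise<Array<{ exchange: string; priority: number }>>
  resolveTxt(host: string): Promise<string[][]>
}

interface MailDnsSignals {
  mxCount: number
  hasSpf: boolean
  hasDmarc: boolean
}

// Treat lookup failures (NXDOMAIN, no data) as "no records" instead of throwing.
async function orEmpty<T>(lookup: Promise<T[]>): Promise<T[]> {
  try {
    return await lookup
  } catch {
    return []
  }
}

async function mailDnsSignals(domain: string, dns: DnsResolver): Promise<MailDnsSignals> {
  const [mx, rootTxt, dmarcTxt] = await Promise.all([
    orEmpty(dns.resolveMx(domain)),
    orEmpty(dns.resolveTxt(domain)),
    orEmpty(dns.resolveTxt(`_dmarc.${domain}`)), // DMARC is published on the _dmarc subdomain
  ])
  // Each TXT record arrives as an array of chunks that must be joined.
  const joined = (records: string[][]) => records.map(chunks => chunks.join(''))
  return {
    mxCount: mx.length,
    hasSpf: joined(rootTxt).some(txt => txt.toLowerCase().startsWith('v=spf1')),
    hasDmarc: joined(dmarcTxt).some(txt => txt.toLowerCase().startsWith('v=dmarc1')),
  }
}

// Stub resolver with made-up records so the sketch runs offline.
const stubDns: DnsResolver = {
  resolveMx: async host =>
    host === 'example.test' ? [{ exchange: 'mx.example.test', priority: 10 }] : [],
  resolveTxt: async host => {
    if (host === 'example.test') return [['v=spf1 ', '-all']]
    if (host === '_dmarc.example.test') return [['v=DMARC1; p=reject']]
    throw new Error('ENOTFOUND')
  },
}

mailDnsSignals('example.test', stubDns).then(signals => console.log(signals))
```

In Node.js, the resolveMx and resolveTxt functions exported by node:dns/promises have this shape, so an object holding those two functions can be passed in place of the stub. A missing record is a weak signal on its own, so treat these fields as inputs to a score and not as a verdict.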

3. Domain Age and Patterns

New domains (< 30 days old) are suspicious.
Frequently abused TLDs (.tk, .ml, and .cf were long free to register, which made them popular for throwaway domains).
Ephemeral patterns (domains that disappear after days).

# Check domain age
whois temp-mail.org | grep -i "Creation Date\|Registry Expiry Date" | head -2
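
Turning that WHOIS output into a number is mostly text parsing. The sketch below assumes the raw WHOIS text has already been fetched; domainAgeDays is a hypothetical helper, and the "Creation Date:" field name matches the grep above but varies between registries.

```typescript
// Pull the first "Creation Date:" value out of raw WHOIS text and return the
// domain age in whole days, or null when no parsable date is present.
// WHOIS output is not standardized, so other registries may need other field names.
function domainAgeDays(whoisText: string, now: Date = new Date()): number | null {
  const match = /^\s*Creation Date:\s*(\S+)/im.exec(whoisText)
  if (!match) return null
  const created = Date.parse(match[1] ?? '')
  if (Number.isNaN(created)) return null
  const msPerDay = 24 * 60 * 60 * 1000
  return Math.floor((now.getTime() - created) / msPerDay)
}

// Made-up WHOIS excerpt for illustration.
const sampleWhois = [
  'Domain Name: EXAMPLE.ORG',
  'Creation Date: 2024-01-01T00:00:00Z',
  'Registry Expiry Date: 2026-01-01T00:00:00Z',
].join('\n')

const ageInDays = domainAgeDays(sampleWhois, new Date('2024-01-21T12:00:00Z'))
console.log(ageInDays, ageInDays !== null && ageInDays < 30)
```

Returning null for unparsable output keeps "unknown age" separate from "new domain", so a registry that hides dates is not scored as suspicious by accident.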

4. ASN and Hosting Intelligence

Disposable services often cluster on:

High-risk ASNs (known for spam/VPN hosting).
Cloud providers with lax verification.
Geographic clusters (e.g., certain data centers).

-- Example: flag registrations coming from ASNs on your own watchlist
select r.user_id, r.email_domain, g.asn, g.country
from user_registrations r
join ip_geo g on g.ip = r.ip
where g.asn in (13335, 15169, 16276)  -- Placeholders (Cloudflare, Google, OVH): substitute the ASNs you actually see abuse from
and r.created_at > now() - interval '24 hours';

Implementation (Node.js + SQL)

1) Maintain a disposable domains table

create table if not exists disposable_domains (
  domain text primary key,
  source text,  -- 'public-list', 'internal-discovery', 'manual'
  confidence_score int check (confidence_score between 0 and 100),  -- 100 = definitely disposable
  first_seen timestamptz default now(),
  last_updated timestamptz default now(),
  is_active boolean default true
);

-- Index for fast lookups
create index idx_disposable_domains_active on disposable_domains(domain) where is_active = true;

-- Example upsert (run via ETL or cron)
insert into disposable_domains(domain, source, confidence_score)
values ('mailinator.com','public-list',95),
       ('10minutemail.com','public-list',90),
       ('guerrillamail.com','public-list',85)
on conflict (domain) do update set
  confidence_score = excluded.confidence_score,
  last_updated = now();
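
The ETL step that feeds this table can stay small. The following sketch assumes a newline-delimited list as input and PostgreSQL-style numbered placeholders ($1, $2, ...); parseBlocklist and buildUpsert are hypothetical helpers, not part of any library.

```typescript
interface DisposableDomainRow {
  domain: string
  source: string
  confidence_score: number
}

// Turn a newline-delimited list into rows for the disposable_domains table.
function parseBlocklist(text: string, source: string, confidence: number): DisposableDomainRow[] {
  const seen = new Set<string>()
  const rows: DisposableDomainRow[] = []
  for (const rawLine of text.split(/\r?\n/)) {
    const domain = rawLine.trim().toLowerCase()
    // Skip blanks, comments, and anything that does not look like a hostname.
    if (!domain || domain.startsWith('#')) continue
    if (!/^[a-z0-9-]+(\.[a-z0-9-]+)+$/.test(domain)) continue
    // Deduplicate: one upsert statement cannot touch the same row twice.
    if (seen.has(domain)) continue
    seen.add(domain)
    rows.push({ domain, source, confidence_score: confidence })
  }
  return rows
}

// Build one parameterized statement that mirrors the upsert above.
function buildUpsert(rows: DisposableDomainRow[]): { text: string; values: Array<string | number> } {
  if (rows.length === 0) throw new Error('no rows to upsert')
  const values: Array<string | number> = []
  const tuples = rows.map((row, i) => {
    values.push(row.domain, row.source, row.confidence_score)
    return `($${i * 3 + 1}, $${i * 3 + 2}, $${i * 3 + 3})`
  })
  const text =
    `insert into disposable_domains(domain, source, confidence_score) values ${tuples.join(', ')} ` +
    `on conflict (domain) do update set confidence_score = excluded.confidence_score, last_updated = now()`
  return { text, values }
}

const parsedRows = parseBlocklist(
  '# comment\nMailinator.com\n\nmailinator.com\nnot a domain\n10minutemail.com',
  'public-list',
  90
)
console.log(parsedRows.map(r => r.domain))
console.log(buildUpsert(parsedRows).text)
```

Passing values as parameters instead of concatenating them into the SQL string keeps a malformed list entry from ever reaching the query text.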

Practical Implementation Examples

Machine Learning Classifier

// Advanced machine learning classifier for disposable email detection
interface EmailFeatures {
  domain: string
  domainLength: number
  hasNumbers: boolean
  hasHyphens: boolean
  tld: string
  subdomainCount: number
  entropy: number
  mxRecordCount: number
  spfRecordExists: boolean
  dmarcRecordExists: boolean
  domainAge: number // in days
  registrationPattern: string
  suspiciousKeywords: string[]
  similarityToKnownDisposable: number
  trafficPatternScore: number
  geographicRisk: number
}

interface MLModel {
  weights: Record<string, number>
  bias: number
  featureNames: string[]
  threshold: number
  accuracy: number
  lastUpdated: number
}

interface PredictionResult {
  isDisposable: boolean
  confidence: number
  features: EmailFeatures
  modelVersion: string
  explanation: string[]
}

class DisposableEmailClassifier {
  private model: MLModel
  private featureExtractor: FeatureExtractor
  private knownDisposableDomains: Set<string> = new Set()
  private trainingData: Array<{ features: EmailFeatures; label: boolean }> = []

  constructor() {
    this.model = this.loadModel()
    this.featureExtractor = new FeatureExtractor()
    this.loadKnownDisposableDomains()
  }

  async predict(email: string): Promise<PredictionResult> {
    const domain = this.extractDomain(email)
    if (!domain) {
      return {
        isDisposable: false,
        confidence: 0,
        features: {} as EmailFeatures,
        modelVersion: this.model.lastUpdated.toString(),
        explanation: ['Invalid email format']
      }
    }

    // Check against known disposable domains first (fast path)
    if (this.knownDisposableDomains.has(domain)) {
      return {
        isDisposable: true,
        confidence: 95,
        features: {} as EmailFeatures,
        modelVersion: this.model.lastUpdated.toString(),
        explanation: ['Domain found in known disposable list']
      }
    }

    // Extract features for ML prediction
    const features = await this.featureExtractor.extractFeatures(domain)

    // Get ML prediction
    const mlScore = this.predictWithModel(features)

    // Combine with rule-based checks
    const ruleBasedScore = this.calculateRuleBasedScore(domain, features)

    // Ensemble prediction
    const combinedScore = (mlScore * 0.7) + (ruleBasedScore * 0.3)
    const isDisposable = combinedScore > this.model.threshold

    // Generate explanation
    const explanation = this.generateExplanation(features, mlScore, ruleBasedScore)

    return {
      isDisposable,
      confidence: Math.min(100, combinedScore * 100),
      features,
      modelVersion: this.model.lastUpdated.toString(),
      explanation
    }
  }

  async train(features: EmailFeatures[], labels: boolean[]): Promise<void> {
    // Simple gradient descent training
    const learningRate = 0.01
    const epochs = 100

    for (let epoch = 0; epoch < epochs; epoch++) {
      let totalError = 0

      for (let i = 0; i < features.length; i++) {
        const prediction = this.predictWithModel(features[i])
        const error = prediction - (labels[i] ? 1 : 0)

        totalError += Math.abs(error)

        // Update weights
        for (const featureName of this.model.featureNames) {
          // Number() maps boolean features to 0/1 explicitly instead of relying on implicit coercion
          const featureValue = Number(features[i][featureName as keyof EmailFeatures])
          this.model.weights[featureName] -= learningRate * error * featureValue
        }

        this.model.bias -= learningRate * error
      }

      // Early stopping
      if (totalError < 0.01) break
    }

    this.model.lastUpdated = Date.now()
    this.model.accuracy = this.evaluateModel(features, labels)

    // Save updated model
    this.saveModel()
  }

  private predictWithModel(features: EmailFeatures): number {
    let score = this.model.bias

    for (const featureName of this.model.featureNames) {
      const weight = this.model.weights[featureName] || 0
      // Number() maps boolean features to 0/1 explicitly instead of relying on implicit coercion
      const value = Number(features[featureName as keyof EmailFeatures])
      score += weight * value
    }

    // Sigmoid activation
    return 1 / (1 + Math.exp(-score))
  }

  private calculateRuleBasedScore(domain: string, features: EmailFeatures): number {
    let score = 0

    // Domain length heuristic
    if (features.domainLength < 5 || features.domainLength > 20) score += 0.3

    // TLD risk assessment
    // features.tld is stored without the leading dot (e.g. 'tk'), so add it before comparing
    const riskyTLDs = ['.tk', '.ml', '.cf', '.ga', '.gq']
    if (riskyTLDs.includes('.' + features.tld)) score += 0.4

    // Numbers in domain
    if (features.hasNumbers) score += 0.2

    // Entropy (randomness) - disposable domains often have high entropy
    if (features.entropy > 3.5) score += 0.3

    // MX record anomalies
    if (features.mxRecordCount === 0) score += 0.5
    if (features.mxRecordCount > 5) score += 0.2

    // Missing SPF/DMARC
    if (!features.spfRecordExists || !features.dmarcRecordExists) score += 0.2

    // Domain age
    if (features.domainAge < 30) score += 0.4

    // Suspicious keywords
    if (features.suspiciousKeywords.length > 0) score += 0.3

    // Similarity to known disposable
    if (features.similarityToKnownDisposable > 0.8) score += 0.4

    return Math.min(1, score)
  }

  private generateExplanation(features: EmailFeatures, mlScore: number, ruleScore: number): string[] {
    const explanations: string[] = []

    if (mlScore > 0.7) explanations.push('High ML confidence score')
    if (ruleScore > 0.6) explanations.push('Multiple rule-based indicators triggered')

    if (features.domainLength < 5) explanations.push('Unusually short domain name')
    if (features.domainLength > 20) explanations.push('Unusually long domain name')

    if (features.entropy > 3.5) explanations.push('High domain name entropy (appears random)')

    if (features.mxRecordCount === 0) explanations.push('No MX records found')

    if (!features.spfRecordExists) explanations.push('Missing SPF record')

    if (features.domainAge < 30) explanations.push('Very new domain registration')

    if (features.suspiciousKeywords.length > 0) {
      explanations.push(`Suspicious keywords detected: ${features.suspiciousKeywords.join(', ')}`)
    }

    return explanations
  }

  private loadModel(): MLModel {
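    // In production, load from database or file. The weights below are illustrative
    // placeholders and assume features scaled to comparable ranges; a raw domainAge
    // in days would otherwise dominate the weighted sum.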
    return {
      weights: {
        domainLength: -0.1,
        hasNumbers: 0.3,
        hasHyphens: 0.2,
        entropy: 0.4,
        mxRecordCount: -0.2,
        spfRecordExists: -0.3,
        dmarcRecordExists: -0.2,
        domainAge: -0.4,
        similarityToKnownDisposable: 0.5,
        trafficPatternScore: 0.3,
        geographicRisk: 0.2
      },
      bias: -2.0,
      featureNames: [
        'domainLength', 'hasNumbers', 'hasHyphens', 'entropy',
        'mxRecordCount', 'spfRecordExists', 'dmarcRecordExists',
        'domainAge', 'similarityToKnownDisposable', 'trafficPatternScore', 'geographicRisk'
      ],
      threshold: 0.6,
      accuracy: 0.92,
      lastUpdated: Date.now()
    }
  }

  private saveModel(): void {
    // Save model to database or file
    console.log('Model saved with accuracy:', this.model.accuracy)
  }

  private loadKnownDisposableDomains(): void {
    // Load from database or external API
    const disposableDomains = [
      'mailinator.com', '10minutemail.com', 'guerrillamail.com',
      'tempmail.com', 'throwaway.email', 'dispostable.com'
    ]

    disposableDomains.forEach(domain => this.knownDisposableDomains.add(domain))
  }

  private extractDomain(email: string): string | null {
    const atIndex = email.lastIndexOf('@')
    if (atIndex < 0) return null
    return email.slice(atIndex + 1).toLowerCase()
  }

  private evaluateModel(features: EmailFeatures[], labels: boolean[]): number {
    let correct = 0

    for (let i = 0; i < features.length; i++) {
      const prediction = this.predictWithModel(features[i])
      const predicted = prediction > this.model.threshold

      if (predicted === labels[i]) correct++
    }

    // Avoid returning NaN for an empty evaluation set
    return features.length === 0 ? 0 : correct / features.length
  }
}

class FeatureExtractor {
  async extractFeatures(domain: string): Promise<EmailFeatures> {
    const features: EmailFeatures = {
      domain,
      domainLength: domain.length,
      hasNumbers: /\d/.test(domain),
      hasHyphens: domain.includes('-'),
      tld: domain.split('.').pop() || '',
      subdomainCount: Math.max(0, domain.split('.').length - 2), // labels beyond "name.tld"
      entropy: this.calculateEntropy(domain),
      mxRecordCount: await this.getMXRecordCount(domain),
      spfRecordExists: await this.checkSPFRecord(domain),
      dmarcRecordExists: await this.checkDMARCRecord(domain),
      domainAge: await this.getDomainAge(domain),
      registrationPattern: this.analyzeRegistrationPattern(domain),
      suspiciousKeywords: this.findSuspiciousKeywords(domain),
      similarityToKnownDisposable: this.calculateSimilarityToKnownDisposable(domain),
      trafficPatternScore: await this.analyzeTrafficPatterns(domain),
      geographicRisk: await this.assessGeographicRisk(domain)
    }

    return features
  }

  private calculateEntropy(domain: string): number {
    const charCounts = new Map<string, number>()

    for (const char of domain) {
      charCounts.set(char, (charCounts.get(char) || 0) + 1)
    }

    let entropy = 0
    const length = domain.length

    for (const count of charCounts.values()) {
      const probability = count / length
      entropy -= probability * Math.log2(probability)
    }

    return entropy
  }

  private async getMXRecordCount(domain: string): Promise<number> {
    // In production, use DNS lookup
    // For demo, simulate based on domain patterns
    if (domain.includes('temp') || domain.includes('mail')) return 0
    return Math.floor(Math.random() * 3) + 1
  }

  private async checkSPFRecord(domain: string): Promise<boolean> {
    // In production, query DNS TXT records for a "v=spf1" entry
    return Math.random() > 0.3 // demo placeholder: true about 70% of the time
  }

  private async checkDMARCRecord(domain: string): Promise<boolean> {
    // In production, query DNS TXT records at _dmarc.<domain>
    return Math.random() > 0.5 // demo placeholder: true about 50% of the time
  }

  private async getDomainAge(domain: string): Promise<number> {
    // In production, use WHOIS lookup
    // For demo, simulate based on domain characteristics
    if (domain.length < 10) return Math.floor(Math.random() * 30) + 1 // 1-30 days
    if (domain.includes('temp')) return Math.floor(Math.random() * 7) + 1 // 1-7 days
    return Math.floor(Math.random() * 365) + 30 // 30-395 days
  }

  private analyzeRegistrationPattern(domain: string): string {
    // Analyze domain registration patterns
    if (domain.length < 8) return 'short'
    if (domain.includes('temp') || domain.includes('mail')) return 'temporary'
    if (/\d{4,}/.test(domain)) return 'numeric'
    if (domain.split('.').length > 2) return 'multi_subdomain'
    return 'standard'
  }

  private findSuspiciousKeywords(domain: string): string[] {
    const suspiciousWords = [
      'temp', 'mail', 'throwaway', 'disposable', 'fake', 'test',
      'demo', 'sample', 'example', 'trash', 'junk', 'spam'
    ]

    return suspiciousWords.filter(word => domain.includes(word))
  }

  private calculateSimilarityToKnownDisposable(domain: string): number {
    // Calculate string similarity to known disposable domains
    const knownDisposable = ['mailinator', 'tempmail', 'guerrillamail', '10minutemail']

    let maxSimilarity = 0

    for (const disposable of knownDisposable) {
      const similarity = this.calculateStringSimilarity(domain, disposable)
      maxSimilarity = Math.max(maxSimilarity, similarity)
    }

    return maxSimilarity
  }

  private calculateStringSimilarity(str1: string, str2: string): number {
    // Simple Levenshtein distance ratio
    const longer = str1.length > str2.length ? str1 : str2
    const shorter = str1.length > str2.length ? str2 : str1

    if (longer.length === 0) return 1.0

    const editDistance = this.levenshteinDistance(longer, shorter)
    return (longer.length - editDistance) / longer.length
  }

  private levenshteinDistance(str1: string, str2: string): number {
    const matrix = Array(str2.length + 1).fill(null).map(() => Array(str1.length + 1).fill(null))

    for (let i = 0; i <= str1.length; i++) matrix[0][i] = i
    for (let j = 0; j <= str2.length; j++) matrix[j][0] = j

    for (let j = 1; j <= str2.length; j++) {
      for (let i = 1; i <= str1.length; i++) {
        const indicator = str1[i - 1] === str2[j - 1] ? 0 : 1
        matrix[j][i] = Math.min(
          matrix[j][i - 1] + 1,     // deletion
          matrix[j - 1][i] + 1,     // insertion
          matrix[j - 1][i - 1] + indicator // substitution
        )
      }
    }

    return matrix[str2.length][str1.length]
  }

  private async analyzeTrafficPatterns(domain: string): Promise<number> {
    // Analyze traffic patterns for the domain
    // In production, use historical traffic data

    // For demo, simulate based on domain characteristics
    if (domain.includes('temp')) return 0.8 // High risk
    if (domain.length < 10) return 0.6 // Medium risk
    return 0.2 // Low risk
  }

  private async assessGeographicRisk(domain: string): Promise<number> {
    // Assess geographic risk based on domain registration location
    // In production, use WHOIS data or IP geolocation

    // For demo, simulate based on TLD
    const highRiskTLDs = ['.ru', '.cn', '.ir', '.kp']
    const tld = domain.split('.').pop() || ''

    if (highRiskTLDs.includes('.' + tld)) return 0.8
    return 0.3
  }
}

// Integration with email validation service
const disposableClassifier = new DisposableEmailClassifier()

// Enhanced email validation with ML
export async function validateEmailWithML(email: string): Promise<{
  isValid: boolean
  isDisposable: boolean
  confidence: number
  riskLevel: 'low' | 'medium' | 'high' | 'critical'
  explanation: string[]
  recommendations: string[]
}> {
  // validateEmail is assumed to be the basic syntax check defined elsewhere in the codebase
  const basicValidation = await validateEmail(email)

  if (!basicValidation.isValid) {
    return {
      isValid: false,
      isDisposable: false,
      confidence: 100,
      riskLevel: 'low',
      explanation: ['Invalid email format'],
      recommendations: ['Please enter a valid email address']
    }
  }

  // Run ML classification
  const mlPrediction = await disposableClassifier.predict(email)

  // Determine risk level
  let riskLevel: 'low' | 'medium' | 'high' | 'critical' = 'low'
  if (mlPrediction.confidence > 90) riskLevel = 'critical'
  else if (mlPrediction.confidence > 70) riskLevel = 'high'
  else if (mlPrediction.confidence > 40) riskLevel = 'medium'

  // Generate recommendations
  const recommendations: string[] = []
  if (mlPrediction.isDisposable) {
    recommendations.push('Please use a permanent email address')
    recommendations.push('Consider using Gmail, Outlook, or your work email')

    if (riskLevel === 'critical') {
      recommendations.push('This email domain is known to be disposable')
    } else {
      recommendations.push('This email domain shows suspicious characteristics')
    }
  }

  return {
    isValid: true,
    isDisposable: mlPrediction.isDisposable,
    confidence: mlPrediction.confidence,
    riskLevel,
    explanation: mlPrediction.explanation,
    recommendations
  }
}

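// Express.js route handlers for email validation
// (assumes Express is installed; in a real project the import sits at the top of the file)
import express from 'express'

const app = express()
app.use(express.json()) // needed so req.body is populated for JSON requests

app.post('/api/validate-email', async (req, res) => {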
  try {
    const { email } = req.body

    if (!email) {
      return res.status(400).json({ error: 'Email address required' })
    }

    const validation = await validateEmailWithML(email)

    res.json({
      email,
      validation,
      timestamp: new Date().toISOString()
    })

  } catch (error) {
    console.error('Email validation error:', error)
    res.status(500).json({ error: 'Validation service unavailable' })
  }
})

// Batch validation endpoint
app.post('/api/validate-emails', async (req, res) => {
  try {
    const { emails } = req.body

    if (!Array.isArray(emails)) {
      return res.status(400).json({ error: 'Emails array required' })
    }

    const validations = await Promise.all(
      emails.map(email => validateEmailWithML(email))
    )

    const results = emails.map((email, index) => ({
      email,
      validation: validations[index]
    }))

    res.json({
      results,
      summary: {
        total: emails.length,
        valid: results.filter(r => r.validation.isValid).length,
        disposable: results.filter(r => r.validation.isDisposable).length,
        highRisk: results.filter(r => r.validation.riskLevel === 'high' || r.validation.riskLevel === 'critical').length
      },
      timestamp: new Date().toISOString()
    })

  } catch (error) {
    console.error('Batch email validation error:', error)
    res.status(500).json({ error: 'Batch validation service unavailable' })
  }
})

console.log('Disposable email ML classifier initialized')
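
One detail the classifier above leaves open is feature scaling: predictWithModel multiplies raw values by weights, so a domain age measured in days can swamp every other term. A minimal sketch of one way to encode features before scoring follows. RawFeatures, encode, logisticScore, the scaling constants, and the weights are all illustrative assumptions, not tuned values.

```typescript
// Subset of the numeric and boolean features used by the classifier above.
interface RawFeatures {
  hasNumbers: boolean
  entropy: number
  mxRecordCount: number
  spfRecordExists: boolean
  domainAge: number // days
}

// Map every feature into [0, 1] so no single feature dominates the weighted sum.
function encode(f: RawFeatures): Record<keyof RawFeatures, number> {
  const clamp01 = (x: number) => Math.min(1, Math.max(0, x))
  return {
    hasNumbers: f.hasNumbers ? 1 : 0,
    entropy: clamp01(f.entropy / 5),
    mxRecordCount: clamp01(f.mxRecordCount / 5),
    spfRecordExists: f.spfRecordExists ? 1 : 0,
    // Log scale: the gap between 1 and 30 days matters more than 1000 vs 1030.
    domainAge: clamp01(Math.log1p(f.domainAge) / Math.log1p(3650)),
  }
}

// Logistic regression score: sigmoid of bias plus the weighted feature sum.
function logisticScore(x: Record<string, number>, weights: Record<string, number>, bias: number): number {
  let z = bias
  for (const [name, value] of Object.entries(x)) z += (weights[name] ?? 0) * value
  return 1 / (1 + Math.exp(-z))
}

// Hypothetical weights, for illustration only.
const demoWeights = { hasNumbers: 0.8, entropy: 1.5, mxRecordCount: -1.0, spfRecordExists: -1.2, domainAge: -3.0 }

const freshDomain = encode({ hasNumbers: true, entropy: 3.9, mxRecordCount: 0, spfRecordExists: false, domainAge: 3 })
const establishedDomain = encode({ hasNumbers: false, entropy: 2.8, mxRecordCount: 2, spfRecordExists: true, domainAge: 4000 })

console.log(logisticScore(freshDomain, demoWeights, 0.5).toFixed(2))
console.log(logisticScore(establishedDomain, demoWeights, 0.5).toFixed(2))
```

With inputs on a shared scale, the gradient updates in train() are also less likely to diverge at a fixed learning rate.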

Real-Time Pattern Analysis

// Real-time pattern analysis for detecting emerging disposable email threats
interface EmailPattern {
  id: string
  pattern: string
  type: 'domain_pattern' | 'registration_pattern' | 'behavioral_pattern' | 'network_pattern'
  confidence: number
  frequency: number
  firstSeen: number
  lastSeen: number
  riskScore: number
  affectedDomains: string[]
  indicators: string[]
}

interface PatternAnalysisResult {
  suspiciousDomains: string[]
  emergingPatterns: EmailPattern[]
  trendAnalysis: {
    direction: 'increasing' | 'decreasing' | 'stable'
    changeRate: number
    confidence: number
  }
  recommendations: string[]
}

class RealTimePatternAnalyzer {
  private patternBuffer: Map<string, EmailPattern> = new Map()
  private domainActivity: Map<string, { count: number; lastSeen: number; patterns: string[] }> = new Map()
  private analysisWindow = 24 * 60 * 60 * 1000 // 24 hours
  private minPatternFrequency = 5
  private subscribers: Array<(result: PatternAnalysisResult) => void> = []

  constructor() {
    this.startPatternAnalysis()
  }

  // Analyze email domain for suspicious patterns
  async analyzeDomain(domain: string): Promise<{
    isSuspicious: boolean
    patterns: string[]
    riskScore: number
    recommendations: string[]
  }> {
    const analysis = {
      isSuspicious: false,
      patterns: [] as string[],
      riskScore: 0,
      recommendations: [] as string[]
    }

    // Update domain activity
    this.updateDomainActivity(domain)

    // Check for known patterns
    const detectedPatterns = await this.detectPatterns(domain)

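    for (const pattern of detectedPatterns) {
      // Record every detection, keyed by pattern name, so that getPatternAnalysis()
      // and analyzeTrend() have data to work with
      const seen = this.patternBuffer.get(pattern.pattern)
      if (seen) {
        seen.frequency++
        seen.lastSeen = pattern.lastSeen
        if (!seen.affectedDomains.includes(domain)) seen.affectedDomains.push(domain)
      } else {
        this.patternBuffer.set(pattern.pattern, pattern)
      }
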
      if (pattern.riskScore > 50) {
        analysis.isSuspicious = true
        analysis.patterns.push(pattern.pattern)
        analysis.riskScore = Math.max(analysis.riskScore, pattern.riskScore)

        if (pattern.riskScore > 80) {
          analysis.recommendations.push('Block domain immediately')
          analysis.recommendations.push('Monitor for similar patterns')
        } else if (pattern.riskScore > 60) {
          analysis.recommendations.push('Require additional verification')
          analysis.recommendations.push('Add to watchlist')
        }
      }
    }

    // Check for behavioral anomalies
    const behavioralScore = await this.analyzeBehavioralPatterns(domain)
    if (behavioralScore > 70) {
      analysis.isSuspicious = true
      analysis.patterns.push('behavioral_anomaly')
      analysis.riskScore = Math.max(analysis.riskScore, behavioralScore)
      analysis.recommendations.push('Investigate account activity')
    }

    return analysis
  }

  // Subscribe to pattern analysis results
  subscribe(callback: (result: PatternAnalysisResult) => void): () => void {
    this.subscribers.push(callback)

    return () => {
      const index = this.subscribers.indexOf(callback)
      if (index > -1) {
        this.subscribers.splice(index, 1)
      }
    }
  }

  // Get comprehensive pattern analysis
  async getPatternAnalysis(timeframe: number = this.analysisWindow): Promise<PatternAnalysisResult> {
    const cutoff = Date.now() - timeframe

    // Filter recent patterns, highest risk first
    const recentPatterns = Array.from(this.patternBuffer.values())
      .filter(pattern => pattern.lastSeen > cutoff)
      .sort((a, b) => b.riskScore - a.riskScore)

    // Identify suspicious domains
    const suspiciousDomains = await this.identifySuspiciousDomains(cutoff)

    // Analyze trends
    const trendAnalysis = await this.analyzeTrend(cutoff)

    // Generate recommendations
    const recommendations = this.generateAnalysisRecommendations(recentPatterns, suspiciousDomains)

    const result: PatternAnalysisResult = {
      suspiciousDomains,
      emergingPatterns: recentPatterns.slice(0, 10), // Top 10 patterns
      trendAnalysis,
      recommendations
    }

    // Notify subscribers
    this.subscribers.forEach(callback => {
      try {
        callback(result)
      } catch (error) {
        console.error('Error in pattern analysis subscriber:', error)
      }
    })

    return result
  }

  private async detectPatterns(domain: string): Promise<EmailPattern[]> {
    const patterns: EmailPattern[] = []

    // Domain pattern analysis
    const domainPatterns = await this.detectDomainPatterns(domain)
    patterns.push(...domainPatterns)

    // Registration pattern analysis
    const registrationPatterns = await this.detectRegistrationPatterns(domain)
    patterns.push(...registrationPatterns)

    // Network pattern analysis
    const networkPatterns = await this.detectNetworkPatterns(domain)
    patterns.push(...networkPatterns)

    return patterns
  }

  private async detectDomainPatterns(domain: string): Promise<EmailPattern[]> {
    const patterns: EmailPattern[] = []

    // Pattern 1: Random-looking domains
    const entropy = this.calculateEntropy(domain)
    if (entropy > 3.5) {
      patterns.push({
        id: `random_domain_${Date.now()}`,
        pattern: 'high_entropy_domain',
        type: 'domain_pattern',
        confidence: Math.min(100, entropy * 20),
        frequency: 1,
        firstSeen: Date.now(),
        lastSeen: Date.now(),
        riskScore: Math.min(100, entropy * 25),
        affectedDomains: [domain],
        indicators: ['high_entropy', 'random_character_distribution']
      })
    }

    // Pattern 2: Sequential domains (like temp123.com)
    if (/temp\d+\.com/.test(domain) || /mail\d+\.com/.test(domain)) {
      patterns.push({
        id: `sequential_domain_${Date.now()}`,
        pattern: 'sequential_domain_pattern',
        type: 'domain_pattern',
        confidence: 85,
        frequency: 1,
        firstSeen: Date.now(),
        lastSeen: Date.now(),
        riskScore: 80,
        affectedDomains: [domain],
        indicators: ['sequential_numbering', 'temp_mail_pattern']
      })
    }

    // Pattern 3: Known disposable TLDs
    const riskyTLDs = ['.tk', '.ml', '.cf', '.ga', '.gq']
    const tld = domain.split('.').pop() || ''
    if (riskyTLDs.includes('.' + tld)) {
      patterns.push({
        id: `risky_tld_${Date.now()}`,
        pattern: 'risky_tld_pattern',
        type: 'domain_pattern',
        confidence: 90,
        frequency: 1,
        firstSeen: Date.now(),
        lastSeen: Date.now(),
        riskScore: 85,
        affectedDomains: [domain],
        indicators: ['high_risk_tld', 'known_disposable_tld']
      })
    }

    return patterns
  }

  private async detectRegistrationPatterns(domain: string): Promise<EmailPattern[]> {
    const patterns: EmailPattern[] = []

    // In production, this would use WHOIS data
    // For demo, simulate based on domain characteristics

    // Pattern: Very new domains (less than 30 days)
    // Demo stand-in: name length and the 'temp' keyword substitute for a real WHOIS age lookup
    if (domain.length < 10 || domain.includes('temp')) {
      patterns.push({
        id: `new_domain_${Date.now()}`,
        pattern: 'new_domain_registration',
        type: 'registration_pattern',
        confidence: 75,
        frequency: 1,
        firstSeen: Date.now(),
        lastSeen: Date.now(),
        riskScore: 70,
        affectedDomains: [domain],
        indicators: ['recent_registration', 'suspicious_timing']
      })
    }

    // Pattern: Bulk registration patterns
    if (/\d{3,}/.test(domain)) {
      patterns.push({
        id: `bulk_registration_${Date.now()}`,
        pattern: 'bulk_registration_pattern',
        type: 'registration_pattern',
        confidence: 80,
        frequency: 1,
        firstSeen: Date.now(),
        lastSeen: Date.now(),
        riskScore: 75,
        affectedDomains: [domain],
        indicators: ['bulk_registration', 'automated_registration']
      })
    }

    return patterns
  }

  private async detectNetworkPatterns(domain: string): Promise<EmailPattern[]> {
    const patterns: EmailPattern[] = []

    // In production, this would analyze network traffic patterns
    // For demo, simulate based on domain characteristics

    // Pattern: High-risk hosting patterns
    if (domain.includes('free') || domain.includes('hosting')) {
      patterns.push({
        id: `hosting_pattern_${Date.now()}`,
        pattern: 'suspicious_hosting',
        type: 'network_pattern',
        confidence: 70,
        frequency: 1,
        firstSeen: Date.now(),
        lastSeen: Date.now(),
        riskScore: 65,
        affectedDomains: [domain],
        indicators: ['free_hosting', 'suspicious_infrastructure']
      })
    }

    return patterns
  }

  private async analyzeBehavioralPatterns(domain: string): Promise<number> {
    const activity = this.domainActivity.get(domain)

    if (!activity || activity.count < 10) return 0

    // Analyze behavioral indicators
    let riskScore = 0

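    // High frequency in short time
    // NOTE: only lastSeen is tracked and it is refreshed on every call, so timeSpan is
    // always near zero and this reduces to a raw count check; tracking a firstSeen
    // timestamp would let it measure a true rate
    const timeSpan = Date.now() - activity.lastSeen
    if (timeSpan < 60 * 60 * 1000 && activity.count > 50) { // intended: 50+ uses within an hour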
      riskScore += 40
    }

    // Rapid sequential access pattern (recorded by updateDomainActivity as 'rapid_access')
    if (activity.patterns.includes('rapid_access')) {
      riskScore += 30
    }

    // Geographic dispersion (unusual for disposable)
    if (activity.patterns.includes('geographic_dispersion')) {
      riskScore += 20
    }

    return Math.min(100, riskScore)
  }

  private updateDomainActivity(domain: string): void {
    const current = this.domainActivity.get(domain) || {
      count: 0,
      lastSeen: 0,
      patterns: []
    }

    // Detect access patterns before lastSeen is overwritten
    const now = Date.now()
    if (current.count > 0 && now - current.lastSeen < 1000) { // Less than 1 second between accesses
      if (!current.patterns.includes('rapid_access')) current.patterns.push('rapid_access')
    }

    current.count++
    current.lastSeen = now

    this.domainActivity.set(domain, current)
  }

  private async identifySuspiciousDomains(cutoff: number): Promise<string[]> {
    const suspiciousDomains: string[] = []

    for (const [domain, activity] of this.domainActivity.entries()) {
      if (activity.lastSeen < cutoff) continue

      let suspiciousScore = 0

      // High activity volume
      if (activity.count > 100) suspiciousScore += 30

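      // Recent first appearance
      // NOTE: no firstSeen timestamp is stored, so this difference is always 0 and the
      // check always passes; store firstSeen in domainActivity to make it meaningful
      if (activity.lastSeen - activity.lastSeen < 24 * 60 * 60 * 1000) suspiciousScore += 20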

      // Suspicious patterns
      if (activity.patterns.length > 0) suspiciousScore += 25

      if (suspiciousScore > 60) {
        suspiciousDomains.push(domain)
      }
    }

    return suspiciousDomains.slice(0, 50) // Cap the result at 50 domains (unsorted)
  }

  private async analyzeTrend(cutoff: number): Promise<{
    direction: 'increasing' | 'decreasing' | 'stable'
    changeRate: number
    confidence: number
  }> {
    const recentPatterns = Array.from(this.patternBuffer.values())
      .filter(pattern => pattern.lastSeen > cutoff)

    if (recentPatterns.length < 10) {
      return { direction: 'stable', changeRate: 0, confidence: 50 }
    }

    // Simple trend analysis based on pattern frequency over time
    const now = Date.now()
    const windowSize = 6 * 60 * 60 * 1000 // 6 hours

    const recentWindow = recentPatterns.filter(p => now - p.lastSeen < windowSize)
    const olderWindow = recentPatterns.filter(p => now - p.lastSeen >= windowSize)

    const recentAvg = recentWindow.reduce((sum, p) => sum + p.frequency, 0) / recentWindow.length || 0
    const olderAvg = olderWindow.reduce((sum, p) => sum + p.frequency, 0) / olderWindow.length || 0

    let direction: 'increasing' | 'decreasing' | 'stable' = 'stable'
    let changeRate = 0

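    if (olderAvg === 0) {
      // No baseline to compare against: report growth without dividing by zero
      if (recentAvg > 0) direction = 'increasing'
    } else if (recentAvg > olderAvg * 1.2) {
      direction = 'increasing'
      changeRate = (recentAvg - olderAvg) / olderAvg
    } else if (recentAvg < olderAvg * 0.8) {
      direction = 'decreasing'
      changeRate = (olderAvg - recentAvg) / olderAvg
    }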

    return {
      direction,
      changeRate: Math.round(changeRate * 100) / 100,
      confidence: 75 // Simplified confidence score
    }
  }

  private generateAnalysisRecommendations(patterns: EmailPattern[], suspiciousDomains: string[]): string[] {
    const recommendations: string[] = []

    if (suspiciousDomains.length > 20) {
      recommendations.push('High number of suspicious domains detected')
      recommendations.push('Consider tightening domain validation rules')
    }

    const highRiskPatterns = patterns.filter(p => p.riskScore > 80)
    if (highRiskPatterns.length > 5) {
      recommendations.push('Multiple high-risk patterns detected')
      recommendations.push('Enable enhanced monitoring and alerting')
    }

    if (patterns.some(p => p.type === 'network_pattern')) {
      recommendations.push('Network-level anomalies detected')
      recommendations.push('Review infrastructure security')
    }

    if (recommendations.length === 0) {
      recommendations.push('Pattern analysis shows normal activity')
    }

    return recommendations
  }

  private calculateEntropy(domain: string): number {
    const charCounts = new Map<string, number>()

    for (const char of domain) {
      charCounts.set(char, (charCounts.get(char) || 0) + 1)
    }

    let entropy = 0
    const length = domain.length

    for (const count of charCounts.values()) {
      const probability = count / length
      entropy -= probability * Math.log2(probability)
    }

    return entropy
  }

  private startPatternAnalysis(): void {
    // Run pattern analysis every 5 minutes
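    setInterval(() => {
      // Catch failures so a rejected analysis does not become an unhandled rejection
      this.getPatternAnalysis().catch(error => {
        console.error('Scheduled pattern analysis failed:', error)
      })
    }, 5 * 60 * 1000)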

    // Clean up old data every hour
    setInterval(() => {
      this.cleanupOldData()
    }, 60 * 60 * 1000)
  }

  private cleanupOldData(): void {
    const cutoff = Date.now() - this.analysisWindow

    // Remove old patterns
    for (const [id, pattern] of this.patternBuffer.entries()) {
      if (pattern.lastSeen < cutoff) {
        this.patternBuffer.delete(id)
      }
    }

    // Remove old domain activity
    for (const [domain, activity] of this.domainActivity.entries()) {
      if (activity.lastSeen < cutoff) {
        this.domainActivity.delete(domain)
      }
    }
  }
}

// Integration with pattern analysis
const patternAnalyzer = new RealTimePatternAnalyzer()

// API endpoints for pattern analysis
app.get('/api/patterns/analysis', async (req, res) => {
  try {
    const timeframe = parseInt(req.query.timeframe as string) || 24 * 60 * 60 * 1000 // 24 hours default
    const analysis = await patternAnalyzer.getPatternAnalysis(timeframe)

    res.json({
      ...analysis,
      timeframe,
      timestamp: new Date().toISOString()
    })

  } catch (error) {
    console.error('Pattern analysis error:', error)
    res.status(500).json({ error: 'Pattern analysis unavailable' })
  }
})

// Subscribe to pattern analysis updates
app.ws('/api/patterns/stream', (ws: any) => {
  const unsubscribe = patternAnalyzer.subscribe((result) => {
    ws.send(JSON.stringify({
      type: 'pattern_analysis',
      data: result,
      timestamp: new Date().toISOString()
    }))
  })

  ws.on('close', () => {
    unsubscribe()
  })
})

// Analyze specific domain
app.post('/api/patterns/analyze-domain', async (req, res) => {
  try {
    const { domain } = req.body

    if (!domain) {
      return res.status(400).json({ error: 'Domain required' })
    }

    const analysis = await patternAnalyzer.analyzeDomain(domain)

    res.json({
      domain,
      analysis,
      timestamp: new Date().toISOString()
    })

  } catch (error) {
    console.error('Domain analysis error:', error)
    res.status(500).json({ error: 'Domain analysis unavailable' })
  }
})

console.log('Real-time pattern analyzer initialized')
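
The calculateEntropy method above is easier to reason about with a few concrete inputs. The following is a minimal standalone sketch of the same Shannon entropy calculation; the `labelEntropy` helper, which scores only the first label of a domain, is a hypothetical addition and not part of the analyzer class.

```typescript
// Shannon entropy in bits per character, mirroring calculateEntropy above.
function shannonEntropy(text: string): number {
  const counts = new Map<string, number>()
  for (const ch of text) {
    counts.set(ch, (counts.get(ch) ?? 0) + 1)
  }

  // Count code points so the length matches the for...of iteration above
  const length = [...text].length
  let entropy = 0

  for (const count of counts.values()) {
    const probability = count / length
    entropy -= probability * Math.log2(probability)
  }

  return entropy
}

// Hypothetical helper: score only the first label so the TLD does not dilute the result.
function labelEntropy(domain: string): number {
  const label = domain.toLowerCase().split('.')[0] ?? ''
  return label.length === 0 ? 0 : shannonEntropy(label)
}
```

A string of one repeated character scores 0 bits, and a string of four distinct characters scores 2 bits. Randomly generated labels tend to score higher than dictionary words of the same length, so entropy is best treated as one weak signal among several rather than a verdict on its own.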

Automated Domain Discovery

// Automated system for discovering new disposable email domains
interface DomainDiscoveryConfig {
  crawlInterval: number // minutes
  maxDomainsPerCrawl: number
  verificationTimeout: number // seconds
  similarityThreshold: number
  minConfidenceScore: number
  externalSources: string[]
}

interface DiscoveredDomain {
  domain: string
  source: string
  discoveryMethod: 'crawler' | 'similarity' | 'external_api' | 'user_report'
  confidence: number
  verificationStatus: 'pending' | 'verified' | 'failed' | 'confirmed_disposable'
  firstSeen: number
  lastVerified: number
  mxRecords: string[]
  spfRecord: string | null
  dmarcRecord: string | null
  similarTo: string[]
  riskFactors: string[]
}

interface CrawlResult {
  newDomains: DiscoveredDomain[]
  verifiedDisposable: DiscoveredDomain[]
  failedVerifications: string[]
  crawlStats: {
    domainsCrawled: number
    pagesProcessed: number
    avgResponseTime: number
    errorRate: number
  }
}

class AutomatedDomainDiscovery {
  private discoveredDomains: Map<string, DiscoveredDomain> = new Map()
  private knownDisposableDomains: Set<string> = new Set()
  private crawler: DomainCrawler
  private verifier: DomainVerifier
  private similarityEngine: SimilarityEngine
  private config: DomainDiscoveryConfig
  private subscribers: Array<(result: CrawlResult) => void> = []

  constructor(config: DomainDiscoveryConfig) {
    this.config = config
    this.crawler = new DomainCrawler()
    this.verifier = new DomainVerifier()
    this.similarityEngine = new SimilarityEngine()
    this.loadKnownDisposableDomains()
    this.startDiscoveryProcess()
  }

  // Subscribe to discovery results
  subscribe(callback: (result: CrawlResult) => void): () => void {
    this.subscribers.push(callback)

    return () => {
      const index = this.subscribers.indexOf(callback)
      if (index > -1) {
        this.subscribers.splice(index, 1)
      }
    }
  }

  // Manually trigger domain discovery
  async triggerDiscovery(): Promise<CrawlResult> {
    console.log('Starting manual domain discovery...')

    const crawlResult = await this.performDiscoveryCrawl()
    await this.processDiscoveryResults(crawlResult)

    // Notify subscribers
    this.subscribers.forEach(callback => {
      try {
        callback(crawlResult)
      } catch (error) {
        console.error('Error in domain discovery subscriber:', error)
      }
    })

    return crawlResult
  }

  // Get current discovery status
  getDiscoveryStatus(): {
    totalDiscovered: number
    pendingVerification: number
    confirmedDisposable: number
    lastCrawlTime: number
    nextScheduledCrawl: number
    systemHealth: 'healthy' | 'degraded' | 'unhealthy'
  } {
    const totalDiscovered = this.discoveredDomains.size
    const pendingVerification = Array.from(this.discoveredDomains.values())
      .filter(d => d.verificationStatus === 'pending').length
    const confirmedDisposable = Array.from(this.discoveredDomains.values())
      .filter(d => d.verificationStatus === 'confirmed_disposable').length

    let systemHealth: 'healthy' | 'degraded' | 'unhealthy' = 'healthy'
    if (pendingVerification > 1000) systemHealth = 'degraded'
    if (pendingVerification > 5000) systemHealth = 'unhealthy'

    return {
      totalDiscovered,
      pendingVerification,
      confirmedDisposable,
      lastCrawlTime: Date.now() - (5 * 60 * 1000), // 5 minutes ago for demo
      nextScheduledCrawl: Date.now() + (this.config.crawlInterval * 60 * 1000),
      systemHealth
    }
  }

  private async performDiscoveryCrawl(): Promise<CrawlResult> {
    const startTime = Date.now()
    const result: CrawlResult = {
      newDomains: [],
      verifiedDisposable: [],
      failedVerifications: [],
      crawlStats: {
        domainsCrawled: 0,
        pagesProcessed: 0,
        avgResponseTime: 0,
        errorRate: 0
      }
    }

    try {
      // Crawl disposable email provider lists
      const crawledDomains = await this.crawler.crawlDisposableProviders()

      result.crawlStats.domainsCrawled = crawledDomains.length
      result.crawlStats.pagesProcessed = crawledDomains.length * 2 // Rough estimate

      // Process each discovered domain
      for (const domain of crawledDomains.slice(0, this.config.maxDomainsPerCrawl)) {
        const discoveredDomain = await this.processDiscoveredDomain(domain, 'crawler')
        result.newDomains.push(discoveredDomain)

        // Attempt immediate verification for high-confidence domains
        if (discoveredDomain.confidence > 80) {
          const verification = await this.verifier.verifyDomain(domain)
          discoveredDomain.verificationStatus = verification.isDisposable ? 'confirmed_disposable' : 'verified'
          discoveredDomain.lastVerified = Date.now()

          if (verification.isDisposable) {
            result.verifiedDisposable.push(discoveredDomain)
          }
        }
      }

      // Find similar domains to known disposable ones
      const similarDomains = await this.findSimilarDomains()
      for (const domain of similarDomains) {
        if (!this.discoveredDomains.has(domain)) {
          const discoveredDomain = await this.processDiscoveredDomain(domain, 'similarity')
          result.newDomains.push(discoveredDomain)
        }
      }

      // Check external APIs for new disposable domains
      const externalDomains = await this.checkExternalSources()
      for (const domain of externalDomains) {
        if (!this.discoveredDomains.has(domain)) {
          const discoveredDomain = await this.processDiscoveredDomain(domain, 'external_api')
          result.newDomains.push(discoveredDomain)
        }
      }

      // Calculate crawl statistics
      const totalTime = Date.now() - startTime
      result.crawlStats.avgResponseTime = totalTime / Math.max(result.newDomains.length, 1)
      result.crawlStats.errorRate = result.failedVerifications.length / Math.max(result.newDomains.length, 1)

    } catch (error) {
      console.error('Discovery crawl error:', error)
      result.crawlStats.errorRate = 1.0
    }

    return result
  }

  private async processDiscoveredDomain(domain: string, method: DiscoveredDomain['discoveryMethod']): Promise<DiscoveredDomain> {
    const discoveredDomain: DiscoveredDomain = {
      domain,
      source: method,
      discoveryMethod: method,
      confidence: await this.calculateDiscoveryConfidence(domain, method),
      verificationStatus: 'pending',
      firstSeen: Date.now(),
      lastVerified: 0,
      mxRecords: [],
      spfRecord: null,
      dmarcRecord: null,
      similarTo: [],
      riskFactors: []
    }

    // Perform basic DNS checks
    const dnsInfo = await this.verifier.getDNSInfo(domain)
    discoveredDomain.mxRecords = dnsInfo.mxRecords
    discoveredDomain.spfRecord = dnsInfo.spfRecord
    discoveredDomain.dmarcRecord = dnsInfo.dmarcRecord

    // Analyze risk factors
    discoveredDomain.riskFactors = await this.analyzeRiskFactors(domain, dnsInfo)

    // Find similar domains
    discoveredDomain.similarTo = await this.similarityEngine.findSimilarDomains(domain)

    this.discoveredDomains.set(domain, discoveredDomain)

    return discoveredDomain
  }

  private async calculateDiscoveryConfidence(domain: string, method: string): Promise<number> {
    let confidence = 50 // Base confidence

    // Method-based confidence boost
    switch (method) {
      case 'crawler':
        confidence += 30
        break
      case 'similarity':
        confidence += 20
        break
      case 'external_api':
        confidence += 25
        break
      case 'user_report':
        confidence += 15
        break
    }

    // Domain-based confidence adjustments
    if (domain.length < 8) confidence += 10 // Short domains are suspicious
    if (domain.length > 20) confidence -= 10 // Very long domains are less likely disposable

    if (/\d{3,}/.test(domain)) confidence += 15 // Numeric sequences are suspicious

    if (domain.includes('temp') || domain.includes('mail')) confidence += 20

    // TLD-based confidence
    const riskyTLDs = ['.tk', '.ml', '.cf', '.ga', '.gq']
    const tld = domain.split('.').pop() || ''
    if (riskyTLDs.includes('.' + tld)) confidence += 25

    return Math.min(100, Math.max(0, confidence))
  }

  private async analyzeRiskFactors(domain: string, dnsInfo: any): Promise<string[]> {
    const riskFactors: string[] = []

    // MX record anomalies
    if (dnsInfo.mxRecords.length === 0) {
      riskFactors.push('no_mx_records')
    }

    if (dnsInfo.mxRecords.length > 3) {
      riskFactors.push('multiple_mx_records')
    }

    // Missing SPF/DMARC
    if (!dnsInfo.spfRecord) {
      riskFactors.push('missing_spf')
    }

    if (!dnsInfo.dmarcRecord) {
      riskFactors.push('missing_dmarc')
    }

    // Domain characteristics
    if (domain.length < 10) {
      riskFactors.push('short_domain')
    }

    if (/\d{4,}/.test(domain)) {
      riskFactors.push('numeric_sequence')
    }

    if (domain.includes('temp') || domain.includes('disposable')) {
      riskFactors.push('suspicious_keywords')
    }

    return riskFactors
  }

  private async findSimilarDomains(): Promise<string[]> {
    const similarDomains: string[] = []

    // Find domains similar to known disposable ones
    for (const knownDisposable of this.knownDisposableDomains) {
      const similar = await this.similarityEngine.findSimilarDomains(knownDisposable)
      similarDomains.push(...similar.filter(domain => !this.knownDisposableDomains.has(domain)))
    }

    // Remove duplicates and limit results
    return [...new Set(similarDomains)].slice(0, 50)
  }

  private async checkExternalSources(): Promise<string[]> {
    const externalDomains: string[] = []

    for (const source of this.config.externalSources) {
      try {
        const domains = await this.fetchFromExternalSource(source)
        externalDomains.push(...domains)
      } catch (error) {
        console.error(`Error fetching from source ${source}:`, error)
      }
    }

    return [...new Set(externalDomains)].slice(0, 100)
  }

  private async fetchFromExternalSource(source: string): Promise<string[]> {
    // In production, implement actual API calls
    // For demo, return simulated data

    const mockSources: Record<string, string[]> = {
      'github_disposable_list': [
        'newdisposable1.com', 'tempdomain2.org', 'mailtest3.net'
      ],
      'abuse_ch_api': [
        'spamdomain4.com', 'fakeemail5.org'
      ],
      'custom_crawler': [
        'tempmail6.com', 'disposable7.net'
      ]
    }

    return mockSources[source] || []
  }

  private loadKnownDisposableDomains(): void {
    // Load from database or external sources
    const knownDomains = [
      'mailinator.com', '10minutemail.com', 'guerrillamail.com',
      'tempmail.com', 'throwaway.email', 'dispostable.com'
    ]

    knownDomains.forEach(domain => this.knownDisposableDomains.add(domain))
  }

  private startDiscoveryProcess(): void {
    // Schedule regular discovery crawls
    setInterval(async () => {
      await this.triggerDiscovery()
    }, this.config.crawlInterval * 60 * 1000)

    // Background verification of pending domains
    setInterval(async () => {
      await this.processPendingVerifications()
    }, 30 * 1000) // Every 30 seconds
  }

  private async processPendingVerifications(): Promise<void> {
    const pendingDomains = Array.from(this.discoveredDomains.values())
      .filter(d => d.verificationStatus === 'pending')
      .slice(0, 10) // Process 10 at a time

    for (const domain of pendingDomains) {
      try {
        const verification = await this.verifier.verifyDomain(domain.domain)

        if (verification.isDisposable) {
          domain.verificationStatus = 'confirmed_disposable'
          this.knownDisposableDomains.add(domain.domain)
        } else {
          domain.verificationStatus = 'verified'
        }

        domain.lastVerified = Date.now()

      } catch (error) {
        console.error(`Verification failed for ${domain.domain}:`, error)
        domain.verificationStatus = 'failed'
      }
    }
  }

  private async processDiscoveryResults(result: CrawlResult): Promise<void> {
    // Add new domains to database
    for (const domain of result.newDomains) {
      await this.saveDiscoveredDomain(domain)
    }

    // Update known disposable domains
    for (const domain of result.verifiedDisposable) {
      this.knownDisposableDomains.add(domain.domain)
      await this.updateDisposableDomain(domain.domain, 95)
    }

    console.log(`Discovery completed: ${result.newDomains.length} new domains, ${result.verifiedDisposable.length} confirmed disposable`)
  }

  private async saveDiscoveredDomain(domain: DiscoveredDomain): Promise<void> {
    // Save to database
    console.log(`Saving discovered domain: ${domain.domain} (confidence: ${domain.confidence})`)
  }

  private async updateDisposableDomain(domain: string, confidence: number): Promise<void> {
    // Update disposable domains table
    console.log(`Updating disposable domain: ${domain} (confidence: ${confidence})`)
  }
}

class DomainCrawler {
  async crawlDisposableProviders(): Promise<string[]> {
    const discoveredDomains: string[] = []

    // In production, crawl actual disposable email provider websites
    // For demo, return simulated results

    const mockProviders = [
      'https://tempmail.com',
      'https://10minutemail.com',
      'https://guerrillamail.com',
      'https://mailinator.com'
    ]

    for (const provider of mockProviders) {
      try {
        // Simulate crawling provider website for domain extraction
        const domains = await this.extractDomainsFromProvider(provider)
        discoveredDomains.push(...domains)
      } catch (error) {
        console.error(`Failed to crawl ${provider}:`, error)
      }
    }

    return [...new Set(discoveredDomains)] // Remove duplicates
  }

  private async extractDomainsFromProvider(providerUrl: string): Promise<string[]> {
    // In production, use actual web scraping
    // For demo, return simulated domain extraction

    const mockDomains: Record<string, string[]> = {
      'https://tempmail.com': ['tempmail.com', 'tempmail.net', 'tempmail.org'],
      'https://10minutemail.com': ['10minutemail.com', '10minutemail.net'],
      'https://guerrillamail.com': ['guerrillamail.com', 'guerrillamail.net'],
      'https://mailinator.com': ['mailinator.com', 'mailinator.net']
    }

    return mockDomains[providerUrl] || []
  }
}

class DomainVerifier {
  async verifyDomain(domain: string): Promise<{
    isDisposable: boolean
    confidence: number
    verificationMethod: string
    details: Record<string, any>
  }> {
    // Perform comprehensive domain verification
    const results = await Promise.all([
      this.checkDNSRecords(domain),
      this.checkDomainRegistration(domain),
      this.checkWebPresence(domain),
      this.checkSMTPAvailability(domain)
    ])

    const [dnsResult, registrationResult, webResult, smtpResult] = results

    // Combine verification results
    const combinedScore = this.combineVerificationScores(results)
    const isDisposable = combinedScore > 0.7

    return {
      isDisposable,
      confidence: combinedScore * 100,
      verificationMethod: 'multi_factor',
      details: {
        dns: dnsResult,
        registration: registrationResult,
        web: webResult,
        smtp: smtpResult
      }
    }
  }

  async getDNSInfo(domain: string): Promise<{
    mxRecords: string[]
    spfRecord: string | null
    dmarcRecord: string | null
  }> {
    // In production, use actual DNS lookups
    // For demo, simulate DNS responses

    const mockDNS: Record<string, any> = {
      'tempmail.com': {
        mxRecords: [],
        spfRecord: null,
        dmarcRecord: null
      },
      'mailinator.com': {
        mxRecords: [],
        spfRecord: null,
        dmarcRecord: null
      },
      'gmail.com': {
        mxRecords: ['gmail-smtp-in.l.google.com'],
        spfRecord: 'v=spf1 include:_spf.google.com ~all',
        dmarcRecord: 'v=DMARC1; p=reject'
      }
    }

    return mockDNS[domain] || {
      mxRecords: ['mail.' + domain],
      spfRecord: 'v=spf1 mx -all',
      dmarcRecord: null
    }
  }

  private async checkDNSRecords(domain: string): Promise<number> {
    const dnsInfo = await this.getDNSInfo(domain)

    let score = 0

    // MX records check
    if (dnsInfo.mxRecords.length === 0) score += 0.4
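    // 'mail' alone matches most legitimate MX hosts (mail.example.com), so match narrower keywords
    if (dnsInfo.mxRecords.some(mx => mx.includes('temp') || mx.includes('disposable'))) score += 0.3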

    // SPF check
    if (!dnsInfo.spfRecord) score += 0.2

    // DMARC check
    if (!dnsInfo.dmarcRecord) score += 0.1

    return Math.min(1, score)
  }

  private async checkDomainRegistration(domain: string): Promise<number> {
    // In production, use WHOIS API
    // For demo, simulate based on domain characteristics

    if (domain.length < 10) return 0.3 // Short domains are suspicious
    if (domain.includes('temp')) return 0.4
    if (/\d{3,}/.test(domain)) return 0.3

    return 0.1 // Low suspicion for normal domains
  }

  private async checkWebPresence(domain: string): Promise<number> {
    // In production, check if website exists and analyze content
    // For demo, simulate web presence check

    if (domain.includes('temp') || domain.includes('mail')) return 0.5
    return 0.1
  }

  private async checkSMTPAvailability(domain: string): Promise<number> {
    // In production, attempt SMTP connection
    // For demo, simulate SMTP check

    if (domain.includes('temp')) return 0.6 // High likelihood of SMTP issues
    return 0.1
  }

  private combineVerificationScores(results: number[]): number {
    return results.reduce((sum, score) => sum + score, 0) / results.length
  }
}

class SimilarityEngine {
  async findSimilarDomains(domain: string): Promise<string[]> {
    const similarDomains: string[] = []

    // Generate variations of the domain
    const variations = this.generateDomainVariations(domain)

    // Check which variations exist (in production, use DNS lookup)
    for (const variation of variations) {
      if (await this.domainExists(variation)) {
        similarDomains.push(variation)
      }
    }

    // Find domains with similar characteristics
    const characteristicSimilar = await this.findByCharacteristics(domain)
    similarDomains.push(...characteristicSimilar)

    return [...new Set(similarDomains)].slice(0, 20) // Limit results
  }

  private generateDomainVariations(domain: string): string[] {
    const variations: string[] = []
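    const parts = domain.split('.')

    if (parts.length >= 2) {
      const name = parts[0]
      // Keep everything after the first label so multi-part suffixes (e.g. co.uk) survive
      const tld = parts.slice(1).join('.')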

      // Add numbers
      for (let i = 1; i <= 10; i++) {
        variations.push(`${name}${i}.${tld}`)
      }

      // Add prefixes
      const prefixes = ['temp', 'mail', 'test', 'demo']
      prefixes.forEach(prefix => {
        variations.push(`${prefix}${name}.${tld}`)
      })

      // TLD variations
      const tlds = ['com', 'net', 'org', 'info', 'biz']
      tlds.forEach(newTld => {
        if (newTld !== tld) {
          variations.push(`${name}.${newTld}`)
        }
      })
    }

    return variations
  }

  private async domainExists(domain: string): Promise<boolean> {
    // In production, perform actual DNS lookup
    // For demo, simulate based on domain patterns

    if (domain.includes('temp') && domain.includes('123')) return true
    if (domain.includes('mail') && /\d/.test(domain)) return true

    return Math.random() > 0.8 // 20% chance of existing
  }

  private async findByCharacteristics(domain: string): Promise<string[]> {
    // Find domains with similar characteristics (length, patterns, etc.)
    // In production, use database queries

    const similar: string[] = []

    if (domain.length < 10) {
      similar.push('shortdomain1.com', 'shortdomain2.net')
    }

    if (/\d/.test(domain)) {
      similar.push('numericdomain3.com', 'numberdomain4.net')
    }

    return similar.filter(domain => Math.random() > 0.7) // Random subset
  }
}

// Initialize automated domain discovery
const discoveryConfig: DomainDiscoveryConfig = {
  crawlInterval: 60, // Every hour
  maxDomainsPerCrawl: 100,
  verificationTimeout: 30,
  similarityThreshold: 0.8,
  minConfidenceScore: 70,
  externalSources: [
    'github_disposable_list',
    'abuse_ch_api',
    'custom_crawler'
  ]
}

const domainDiscovery = new AutomatedDomainDiscovery(discoveryConfig)

// API endpoints for automated discovery
app.get('/api/discovery/status', (req, res) => {
  const status = domainDiscovery.getDiscoveryStatus()

  res.json({
    ...status,
    timestamp: new Date().toISOString()
  })
})

// Trigger manual discovery
app.post('/api/discovery/trigger', async (req, res) => {
  try {
    const result = await domainDiscovery.triggerDiscovery()

    res.json({
      ...result,
      timestamp: new Date().toISOString()
    })

  } catch (error) {
    console.error('Manual discovery error:', error)
    res.status(500).json({ error: 'Discovery failed' })
  }
})

// Subscribe to discovery results
app.ws('/api/discovery/stream', (ws: any) => {
  const unsubscribe = domainDiscovery.subscribe((result) => {
    ws.send(JSON.stringify({
      type: 'discovery_result',
      data: result,
      timestamp: new Date().toISOString()
    }))
  })

  ws.on('close', () => {
    unsubscribe()
  })
})

// Get discovered domains
app.get('/api/discovery/domains', (req, res) => {
  const status = req.query.status as string || 'all'
  const limit = parseInt(req.query.limit as string) || 100

  const domains = Array.from(domainDiscovery['discoveredDomains'].values())

  let filteredDomains = domains

  if (status !== 'all') {
    filteredDomains = domains.filter(d => d.verificationStatus === status)
  }

  res.json({
    domains: filteredDomains.slice(0, limit),
    total: filteredDomains.length,
    timestamp: new Date().toISOString()
  })
})

console.log('Automated domain discovery system initialized')
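
The configuration above defines a similarityThreshold of 0.8, but the demo SimilarityEngine never applies it. One way such a threshold could be used is a normalized edit-distance comparison against known disposable domains. The sketch below assumes Levenshtein distance as the metric; the function names are hypothetical and the source does not say which metric the author intended.

```typescript
// Classic two-row Levenshtein edit distance between two strings.
function levenshtein(a: string, b: string): number {
  // prev[j] = distance between the first i-1 chars of a and the first j chars of b
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j)

  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i]
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution
      )
    }
    prev = curr
  }

  return prev[b.length]
}

// Similarity in [0, 1]; 1 means identical strings.
function domainSimilarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length)
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest
}

// Hypothetical helper: known disposable domains that the candidate closely resembles.
function findLookalikes(candidate: string, known: string[], threshold = 0.8): string[] {
  const normalized = candidate.toLowerCase()
  return known.filter(k => k !== normalized && domainSimilarity(normalized, k) >= threshold)
}
```

For example, `findLookalikes('mailinat0r.com', ['mailinator.com', 'guerrillamail.com'])` would flag only `mailinator.com`, since the two differ by a single character. Comparing every candidate against a large list is quadratic, so in practice this would likely run only on domains that already tripped another signal.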

Implementation (Node.js + SQL)

1) Maintain a disposable domains table

create table if not exists disposable_domains (
  domain text primary key,
  source text,  -- 'public-list', 'internal-discovery', 'manual'
  confidence_score int check (confidence_score between 0 and 100),  -- 100 = definitely disposable
  first_seen timestamptz default now(),
  last_updated timestamptz default now(),
  is_active boolean default true
);

-- Index for fast lookups
create index idx_disposable_domains_active on disposable_domains(domain) where is_active = true;

-- Example upsert (run via ETL or cron)
insert into disposable_domains(domain, source, confidence_score)
values ('mailinator.com','public-list',95),
       ('10minutemail.com','public-list',90),
       ('guerrillamail.com','public-list',85)
on conflict (domain) do update set
  confidence_score = excluded.confidence_score,
  last_updated = now();
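
The upsert above is meant to be fed by an ETL or cron job. Before loading a public list, it usually helps to normalize it first. The following is a rough sketch of that step, assuming a plain-text list with one domain per line and `#` comments; the function names and the default confidence of 90 are illustrative assumptions, not values from a specific feed.

```typescript
interface DisposableDomainRow {
  domain: string
  source: string
  confidenceScore: number
}

// Normalize a raw one-domain-per-line list: trim, lowercase, drop comments,
// blanks, and malformed entries, and remove duplicates (first occurrence wins).
function parseDomainList(raw: string): string[] {
  const seen = new Set<string>()

  for (const line of raw.split(/\r?\n/)) {
    const domain = line.trim().toLowerCase()
    if (domain === '' || domain.startsWith('#')) continue
    // Loose syntactic check only; this does not validate against the DNS
    if (!/^[a-z0-9-]+(\.[a-z0-9-]+)+$/.test(domain)) continue
    seen.add(domain)
  }

  return [...seen]
}

// Shape parsed domains into rows matching the disposable_domains columns.
function toRows(domains: string[], source: string, confidenceScore = 90): DisposableDomainRow[] {
  return domains.map(domain => ({ domain, source, confidenceScore }))
}
```

The resulting rows can then be passed to whatever batch insert the database client provides, using the same `on conflict (domain) do update` clause shown above.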

2) Check on registration server-side

import { sql } from '@/lib/db'

export interface EmailValidationResult {
  isValid: boolean
  isDisposable: boolean
  confidence: number
  signals: string[]
  recommendation: 'allow' | 'block' | 'review'
}

export async function validateEmail(email: string): Promise<EmailValidationResult> {
  const domain = extractDomain(email)
  if (!domain) {
    return {
      isValid: false,
      isDisposable: false,
      confidence: 0,
      signals: ['invalid_email_format'],
      recommendation: 'block'
    }
  }

  // Check against disposable domains
  const disposableCheck = await sql`
    select confidence_score, source
    from disposable_domains
    where domain = ${domain} and is_active = true
  `

  if (disposableCheck.length > 0) {
    const { confidence_score, source } = disposableCheck[0]
    return {
      isValid: true,
      isDisposable: true,
      confidence: confidence_score,
      signals: [`disposable_domain_${source}`],
      recommendation: confidence_score > 80 ? 'block' : 'review'
    }
  }

  // Additional checks (MX, SPF, etc.) could go here
  return {
    isValid: true,
    isDisposable: false,
    confidence: 0,
    signals: [],
    recommendation: 'allow'
  }
}

export function extractDomain(email: string): string | null {
  const at = email.lastIndexOf('@')
  if (at < 0) return null
  return email.slice(at + 1).toLowerCase()
}
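
The validateEmail function above leaves a placeholder for additional MX checks. A possible shape for that step is sketched below. The resolver is injected so the classification logic stays testable without network access; in Node.js, `dns.promises.resolveMx` from `node:dns` has a compatible signature. The signal names follow the risk factor strings used earlier, and `null_mx` and `mx_lookup_failed` are hypothetical additions.

```typescript
interface MxRecord {
  exchange: string
  priority: number
}

type MxResolver = (domain: string) => Promise<MxRecord[]>

// Turn a list of MX records into zero or more risk signals.
function mxSignals(records: MxRecord[]): string[] {
  if (records.length === 0) return ['no_mx_records']

  // A "null MX" (RFC 7505) declares that the domain accepts no mail at all.
  // Assumption: the resolver reports it as an empty or "." exchange.
  const isNullMx = records.every(r => r.exchange === '' || r.exchange === '.')
  if (isNullMx) return ['null_mx']

  return []
}

// Resolve and classify; lookup errors become signals instead of exceptions.
async function checkMx(domain: string, resolveMx: MxResolver): Promise<string[]> {
  try {
    return mxSignals(await resolveMx(domain))
  } catch (err) {
    // Node's resolver rejects with these codes when the name or record type is absent
    const code = (err as { code?: string }).code
    return code === 'ENODATA' || code === 'ENOTFOUND'
      ? ['no_mx_records']
      : ['mx_lookup_failed']
  }
}
```

Inside validateEmail, the returned signals could be appended to the `signals` array and used to move the recommendation from 'allow' to 'review'. A missing MX record is a hint rather than proof, since mail can still be delivered to a domain's address record.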

3) DNS/MX quick validation (CLI for ops)

#!/bin/bash
# disposable-check.sh - Quick domain validation

DOMAIN="$1"

if [ -z "$DOMAIN" ]; then
  echo "Usage: $0 <domain>"
  exit 1
fi

echo "Checking domain: $DOMAIN"

# MX records
MX=$(dig +short MX "$DOMAIN")
if [ -z "$MX" ]; then
  echo "⚠️  No MX records found"
else
  echo "✅ MX records: $MX"
fi

# SPF record
SPF=$(dig +short TXT "$DOMAIN" | grep -i "v=spf1" | head -1)
if [ -z "$SPF" ]; then
  echo "⚠️  No SPF record found"
else
  echo "✅ SPF: $SPF"
fi

# Domain age (rough estimate)
WHOIS=$(whois "$DOMAIN" | grep -i "Creation Date" | head -1)
if [ -z "$WHOIS" ]; then
  echo "⚠️  Cannot determine domain age"
else
  echo "✅ $WHOIS"
fi

# Known disposable check
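# -x matches the whole line and -F treats the domain as a literal string, so dots are not wildcards
if curl -s "https://raw.githubusercontent.com/disposable-email-domains/disposable-email-domains/master/domains.txt" | grep -qxF "$DOMAIN"; then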
  echo "🚨 KNOWN DISPOSABLE DOMAIN"
fi

4) Optional SMTP reachability probe

import { createConnection } from 'net'

export async function checkSMTPCapability(domain: string): Promise<{
  canConnect: boolean
  supportsTLS: boolean
  error?: string
}> {
  return new Promise((resolve) => {
    const client = createConnection(25, domain)

    let response = ''
    let supportsTLS = false

    client.setTimeout(5000) // 5 second timeout

    let ehloSent = false

    client.on('data', (data) => {
      response += data.toString()
      if (!ehloSent && response.includes('220') && response.includes('ESMTP')) {
        // Send EHLO once to check TLS support
        ehloSent = true
        client.write('EHLO example.com\r\n')
      }
      if (response.includes('STARTTLS')) {
        supportsTLS = true
      }
    })

    client.on('timeout', () => {
      client.destroy()
      resolve({ canConnect: false, supportsTLS: false, error: 'timeout' })
    })

    client.on('error', (err) => {
      resolve({ canConnect: false, supportsTLS: false, error: err.message })
    })

    client.on('connect', () => {
      // Wait for banner and check
      setTimeout(() => {
        client.destroy()
        resolve({ canConnect: true, supportsTLS })
      }, 1000)
    })
  })
}

User Experience and Policy

Soft Blocks vs Hard Blocks

Prefer soft blocks with clear messaging:

// Example soft block response
const softBlockResponse = {
  success: false,
  error: {
    code: 'DISPOSABLE_EMAIL',
    message: 'We detected a temporary email address. Please use a permanent email to continue.',
    suggestion: 'Try Gmail, Outlook, or your work email address.'
  }
}

Hard blocks only for high-confidence cases (95%+). For medium confidence (50-80%), use:

Step-up verification: SMS, phone, or payment method.
Delayed activation: Email verification required after signup.
Rate limiting: Limit actions until email is verified.
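
The thresholds above can be captured in a single decision function so that every signup path applies the same policy. This is a minimal sketch with hypothetical names. The text does not say what happens between 80% and 95%; treating that band as a soft block is an assumption made here, in line with the preference for soft blocks.

```typescript
type SignupAction = 'allow' | 'step_up' | 'soft_block' | 'hard_block'

// Map a disposable-confidence score (0-100) to a signup action.
function decideSignupAction(confidence: number): SignupAction {
  if (confidence >= 95) return 'hard_block' // high-confidence cases only
  if (confidence > 80) return 'soft_block'  // assumption: band not covered by the text
  if (confidence >= 50) return 'step_up'    // SMS, delayed activation, or rate limiting
  return 'allow'
}
```

Keeping the cut-offs in one place also makes them easier to tune once conversion and abuse data from the monitoring queries is available.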

Exception Handling

-- Temporary allowlist for business-critical cases
create table email_allowlist (
  email_pattern text primary key,  -- 'user@company.com' or '%@trusted-domain.com'
  reason text,
  expires_at timestamptz,
  added_by text,
  is_active boolean default true
);

-- Check if email is allowlisted
select exists(
  select 1 from email_allowlist
  where (email_pattern = :email or :email like email_pattern)
  and is_active = true
  and (expires_at is null or expires_at > now())
) as is_allowlisted;
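
If the allowlist is small, it may be cached in the application and checked before the disposable lookup runs. The sketch below mirrors the SQL check above under that assumption. The `AllowlistEntry` shape and function names are hypothetical, and the LIKE conversion handles only `%` and `_`, not backslash escapes.

```typescript
interface AllowlistEntry {
  emailPattern: string // exact address or SQL LIKE pattern such as '%@trusted-domain.com'
  isActive: boolean
  expiresAt: Date | null
}

// Convert a SQL LIKE pattern to an anchored regular expression.
function likeToRegExp(pattern: string): RegExp {
  // Escape regex metacharacters first; % and _ are not among them
  const escaped = pattern.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
  return new RegExp('^' + escaped.replace(/%/g, '.*').replace(/_/g, '.') + '$')
}

// True when an active, unexpired entry matches the email.
function isAllowlisted(email: string, entries: AllowlistEntry[], now: Date = new Date()): boolean {
  return entries.some(entry =>
    entry.isActive &&
    (entry.expiresAt === null || entry.expiresAt.getTime() > now.getTime()) &&
    likeToRegExp(entry.emailPattern).test(email)
  )
}
```

A pattern without wildcards behaves as an exact match, which covers the `email_pattern = :email` branch of the query. Matching is case-sensitive, as with LIKE in PostgreSQL, so emails and patterns should be lowercased consistently before comparison.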

Monitoring and Alerting

Weekly Trends

-- Disposable email share by week
with weekly_stats as (
  select
    date_trunc('week', created_at) as week,
    count(*) as total_signups,
    sum(case when is_disposable then 1 else 0 end) as disposable_count
  from user_registrations
  where created_at >= now() - interval '12 weeks'
  group by 1
),
weekly_rates as (
  -- Compute the percentage in its own CTE: a select list cannot reference its own aliases
  select
    week,
    total_signups,
    disposable_count,
    round(100.0 * disposable_count / total_signups, 2) as disposable_percentage
  from weekly_stats
)
select
  week,
  total_signups,
  disposable_count,
  disposable_percentage,
  -- Trend indicator
  lag(disposable_percentage) over (order by week) as prev_percentage,
  case
    when disposable_percentage > lag(disposable_percentage) over (order by week) * 1.5
    then '📈 SPIKE'
    when disposable_percentage < lag(disposable_percentage) over (order by week) * 0.7
    then '📉 DROP'
    else '➡️ STABLE'
  end as trend
from weekly_rates
order by week desc;

Real-time Alerts

#!/bin/bash
# disposable-alert.sh - Monitor for spikes in disposable usage

# Config
THRESHOLD_PERCENT=15  # Alert if >15% of signups are disposable
CHECK_HOURS=1         # Check last hour
DB_HOST="localhost"
DB_NAME="analytics"

# Query current rate
CURRENT_RATE=$(psql -h "$DB_HOST" -d "$DB_NAME" -tA -c "
  select coalesce(
    100.0 * sum(case when is_disposable then 1 else 0 end) / count(*),
    0
  )
  from user_registrations
  where created_at > now() - interval '$CHECK_HOURS hours'
")

# Check threshold
if (( $(echo "$CURRENT_RATE > $THRESHOLD_PERCENT" | bc -l) )); then
  echo "$(date): ALERT - Disposable rate at ${CURRENT_RATE}% (threshold: ${THRESHOLD_PERCENT}%)"
  # Send Slack notification, email, or trigger PagerDuty
  # (SLACK_WEBHOOK_URL is expected to be set in the environment)
  curl -X POST -H 'Content-type: application/json' \
    --data "{\"text\":\"🚨 Disposable email spike: ${CURRENT_RATE}% in last ${CHECK_HOURS}h\"}" \
    "$SLACK_WEBHOOK_URL"
fi

Dashboard Metrics

Track these KPIs:

Daily/weekly disposable % — target <5%.
Top disposable domains — identify new threats.
Conversion rates — disposable vs permanent email users.
Bounce rates — correlation with disposable usage.

-- Top disposable domains in last 30 days
select
  email_domain,
  count(*) as usage_count,
  max(created_at) as last_seen
from user_registrations
where is_disposable = true
  and created_at > now() - interval '30 days'
group by 1
order by 2 desc
limit 20;

FAQ and Edge Cases

Corporate Testing Domains

Many companies sign up with internal test addresses such as test@company.com or qa@internal.company.com, which can trip pattern-based checks.

Solution: Temporary allowlist with expiration:

// Add to allowlist for 30 days
await sql`
  insert into email_allowlist (email_pattern, reason, expires_at, added_by)
  values ('qa@internal.company.com', 'Corporate testing', now() + interval '30 days', 'admin')
`

Catch-all Corporate Domains

Large organizations often have catch-all domains where any email goes to a central inbox.

Detection: High volume + SMTP reachability + low bounce rates.

-- Identify potential catch-all domains
select
  email_domain,
  count(*) as signup_count,
  avg(bounce_rate) as avg_bounce_rate
from user_registrations
where created_at > now() - interval '30 days'
group by 1
having count(*) > 100  -- High volume
  and avg(bounce_rate) < 0.05  -- Low bounces
order by 2 desc;

Internationalized Domains (IDN)

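Domains can contain non-ASCII characters, which DNS represents in punycode (e.g., münchen.de → xn--mnchen-3ya.de). A lookup against the raw Unicode form will miss a list entry stored in punycode, and vice versa.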

Solution: Normalize to punycode before checking:

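// Node's built-in url.domainToASCII handles the punycode conversion
// (the standalone built-in punycode module is deprecated in Node)
import { domainToASCII } from 'node:url'

export function normalizeDomain(domain: string): string {
  const lower = domain.trim().toLowerCase()
  // domainToASCII returns an empty string for invalid domains,
  // so fall back to the lowercased input in that case
  return domainToASCII(lower) || lower
}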

False Positives

Common issues:

Legitimate temp emails: Alumni associations, conference registrations.
Corporate aliases: noreply@company.com used for notifications.
Educational institutions: Student email forwarding.

Mitigation:

Manual review queues for edge cases.
Allowlist management for known good domains.
Confidence scoring vs binary decisions.

-- Manual review queue for borderline cases
select user_id, email, confidence_score, created_at
from user_registrations
where is_disposable = true
  and confidence_score between 50 and 80  -- Medium confidence
  and created_at > now() - interval '24 hours'
order by created_at desc;
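
One way a confidence_score like the one queried above could be produced is a weighted sum of the detection signals, capped at 100. The sketch below is illustrative only: the DomainSignals shape, the function name, and especially the weights are assumptions and would need tuning against labeled signups.

```typescript
// Hypothetical signal bundle collected for one email domain.
interface DomainSignals {
  onDisposableList: boolean    // domain appears in the curated disposable list
  hasMxRecords: boolean        // MX lookup returned at least one record
  domainAgeDays: number | null // null when registration data is unavailable
  highRiskAsn: boolean         // mail host sits on a high-risk ASN
}

// Illustrative weights, not recommended values.
const WEIGHTS = { disposableList: 70, missingMx: 20, newDomain: 15, highRiskAsn: 10 }

function confidenceScore(signals: DomainSignals): number {
  let score = 0
  if (signals.onDisposableList) score += WEIGHTS.disposableList
  if (!signals.hasMxRecords) score += WEIGHTS.missingMx
  // Domains younger than 30 days count as new; unknown age adds nothing
  if (signals.domainAgeDays !== null && signals.domainAgeDays < 30) score += WEIGHTS.newDomain
  if (signals.highRiskAsn) score += WEIGHTS.highRiskAsn
  return Math.min(100, score) // cap so the score stays on a 0-100 scale
}
```

With these example weights, a list hit alone scores 70 and lands in the manual review band, while a list hit combined with missing MX records and a new domain reaches the cap.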

Best Practices

1. Layered Defense: Combine multiple signals rather than relying on one.

2. Regular Updates: Refresh disposable domain lists weekly.

3. A/B Testing: Test different thresholds and policies.

4. User Education: Clear messaging about why permanent emails are preferred.

5. Monitoring: Set up alerts for spikes and trends.

6. Privacy Compliance: Ensure checks comply with regional laws (GDPR, CCPA).

Integration Examples

Express.js Middleware

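import express from 'express'
import { validateEmail } from './email-validator'

const app = express()
app.use(express.json()) // populate req.body for JSON requests
// req.session below assumes a session middleware (e.g. express-session) is registered

app.post('/api/signup', async (req, res) => {
  const { email } = req.body

  // Reject missing or non-string input before running the detection checks
  if (typeof email !== 'string' || email.length === 0) {
    return res.status(400).json({ success: false, error: { message: 'Email is required.' } })
  }

  const validation = await validateEmail(email)

  if (validation.recommendation === 'block') {
    return res.status(400).json(validation)
  }

  if (validation.recommendation === 'review') {
    // Queue for manual review or step-up auth
    req.session.pendingReview = validation
  }

  // Continue with signup...
})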

Python FastAPI

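from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

from .email_validator import validate_email

app = FastAPI()


class SignupRequest(BaseModel):
    email: str


@app.post("/signup")
async def signup(payload: SignupRequest):
    # A bare `email: str` parameter would be read from the query string;
    # the Pydantic model makes FastAPI read it from the JSON body instead
    validation = await validate_email(payload.email)

    if validation["recommendation"] == "block":
        raise HTTPException(
            status_code=400,
            detail=validation["error_message"]
        )

    if validation["recommendation"] == "review":
        # Trigger additional verification
        pass

    return {"message": "Signup successful"}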

This comprehensive approach balances fraud prevention with user experience, ensuring legitimate users aren't unnecessarily blocked while protecting your platform from abuse.
