Disposable Email Detection: Protecting Your Platform from Temporary Addresses

Implement advanced techniques to detect and handle disposable email addresses while maintaining user experience.

August 17, 2025
26 min read
Email Validation



Disposable/temporary inboxes hurt activation funnels, referral programs, trial abuse protection, and deliverability. Below is a pragmatic approach with detection signals, sample code, SQL, and ops guidance.


Disposable Email Detection Overview



Why It Matters


  • Lower user quality and LTV due to throwaway accounts that never convert.
  • Increased fraud and promo abuse — free trials, coupons, and referral bonuses get exploited.
  • Bounce risk and sender reputation damage — high bounce rates tank deliverability.
  • Operational overhead — support tickets, manual reviews, and account cleanup.

Detection Signals


1. Curated Disposable Domain Lists


Maintain a table of known disposable providers. Sources include:

  • Public lists: GitHub repos like disposable-email-domains (1000+ domains).
  • Internal discovery: Track domains that appear in signups but have no MX records or suspicious patterns.
  • Community feeds: abuse.ch, Spamhaus, or custom crawlers.

# Example: check if domain is in disposable list (-x matches the whole line, -F treats the pattern as a literal string)
curl -s https://raw.githubusercontent.com/disposable-email-domains/disposable-email-domains/master/domains.txt | grep -qxF "mailinator.com" && echo "disposable"
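
In application code the same lookup is usually done against an in-memory set. The following is a minimal sketch, assuming the list is a plain text file with one domain per line; `parseBlocklist` and `findDisposableMatch` are hypothetical helper names, and the parent-domain walk is an assumption about how you want subdomains treated.

```typescript
// Build a lookup set from blocklist text (one domain per line, "#" starts a comment).
function parseBlocklist(text: string): Set<string> {
  const domains = new Set<string>()
  text.split(/\r?\n/).forEach(line => {
    const entry = line.trim().toLowerCase()
    if (entry.length > 0 && !entry.startsWith('#')) domains.add(entry)
  })
  return domains
}

// Returns the matching blocklist entry, or null. The exact domain is checked first,
// then each parent domain, so "inbox.mailinator.com" matches "mailinator.com".
function findDisposableMatch(domain: string, blocklist: Set<string>): string | null {
  const labels = domain.trim().toLowerCase().replace(/\.$/, '').split('.')
  // Stop before the bare TLD: a one-label suffix such as "com" is never checked.
  for (let i = 0; i < labels.length - 1; i++) {
    const candidate = labels.slice(i).join('.')
    if (blocklist.has(candidate)) return candidate
  }
  return null
}

const sampleBlocklist = parseBlocklist('# sample entries\nmailinator.com\n10minutemail.com\n')
console.log(findDisposableMatch('Inbox.Mailinator.com', sampleBlocklist)) // "mailinator.com"
console.log(findDisposableMatch('notmailinator.com', sampleBlocklist))    // null
```

Matching whole labels rather than substrings avoids flagging unrelated domains that merely contain a listed name.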

2. MX Record Anomalies


Disposable services often have:

  • No MX records (mail then falls back to the domain's A/AAAA record, or the domain cannot receive mail at all).
  • Suspicious MX targets (e.g., pointing to temp mail services).
  • Missing SPF/DMARC (common for throwaway providers).

# Quick MX check for a domain
dig +short MX temp-mail.org
# Returns nothing or suspicious entries

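# SPF check (published as a TXT record on the domain itself)
dig +short TXT temp-mail.org | grep -i "v=spf1"
# DMARC check (published as a TXT record on the _dmarc subdomain)
dig +short TXT _dmarc.temp-mail.org | grep -i "v=DMARC1"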
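
To turn these lookups into signals, the record data can be summarized by a pure function. This is a sketch under stated assumptions: `summarizeDnsSignals` and `knownTempMailHosts` are hypothetical names, the MX input uses the `{ exchange, priority }` shape that Node's `dns.promises.resolveMx` returns, and TXT records are passed as already-joined strings (Node's `resolveTxt` returns chunks that would need joining first).

```typescript
// Same shape as the objects Node's dns.promises.resolveMx returns.
interface MxRecord { exchange: string; priority: number }

interface DnsSignals {
  hasMx: boolean         // at least one usable MX record
  nullMx: boolean        // RFC 7505 null MX ("MX 0 ."): the domain accepts no mail
  hasSpf: boolean
  hasDmarc: boolean
  suspiciousMx: string[] // MX hosts that fall under a watched temp-mail host
}

function stripDot(host: string): string {
  return host.trim().toLowerCase().replace(/\.$/, '')
}

// apexTxt: TXT records of the domain itself; dmarcTxt: TXT records of _dmarc.<domain>.
function summarizeDnsSignals(
  mx: MxRecord[],
  apexTxt: string[],
  dmarcTxt: string[],
  knownTempMailHosts: string[]
): DnsSignals {
  const exchanges = mx.map(record => stripDot(record.exchange))
  const nullMx = exchanges.length === 1 && exchanges[0] === ''
  const watched = knownTempMailHosts.map(stripDot)
  const suspiciousMx = exchanges.filter(host =>
    host !== '' && watched.some(w => host === w || host.endsWith('.' + w))
  )
  return {
    hasMx: exchanges.length > 0 && !nullMx,
    nullMx,
    hasSpf: apexTxt.some(txt => /^v=spf1(\s|$)/i.test(txt.trim())),
    hasDmarc: dmarcTxt.some(txt => /^v=DMARC1\s*(;|$)/i.test(txt.trim())),
    suspiciousMx
  }
}

console.log(summarizeDnsSignals(
  [{ exchange: 'mx1.tempmail-host.example', priority: 10 }],
  ['v=spf1 -all'],
  [],
  ['tempmail-host.example']
))
```

Keeping the DNS queries outside the function makes the scoring logic easy to unit test with fixed inputs.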

3. Domain Age and Patterns


  • New domains (< 30 days old) are suspicious.
  • Known disposable TLDs (.tk, .ml, .cf are often abused).
  • Ephemeral patterns (domains that disappear after days).

# Check domain age
whois temp-mail.org | grep -i "Creation Date\|Registry Expiry Date" | head -2
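
The age check itself is simple date arithmetic once a creation date is available. The sketch below assumes the WHOIS output contains a `Creation Date:` line with an ISO 8601 timestamp; WHOIS formats vary by registry, so treat the parser as illustrative. The function names are hypothetical, and the 30-day default mirrors the threshold in the list above.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000

// Pull the first "Creation Date: <timestamp>" line out of raw WHOIS text.
function parseCreationDate(whoisText: string): Date | null {
  const match = /^\s*Creation Date:\s*(\S+)/im.exec(whoisText)
  if (!match) return null
  const parsed = new Date(match[1] || '')
  return isNaN(parsed.getTime()) ? null : parsed
}

function domainAgeInDays(created: Date, now: Date): number {
  return Math.floor((now.getTime() - created.getTime()) / MS_PER_DAY)
}

// true = younger than the threshold, false = older, null = age unknown.
function isNewDomain(whoisText: string, now: Date, thresholdDays: number = 30): boolean | null {
  const created = parseCreationDate(whoisText)
  if (created === null) return null
  return domainAgeInDays(created, now) < thresholdDays
}

const sampleWhois = 'Domain Name: EXAMPLE.ORG\nCreation Date: 2025-08-01T00:00:00Z\n'
console.log(isNewDomain(sampleWhois, new Date('2025-08-17T00:00:00Z'))) // true (16 days old)
```

Returning `null` for an unknown age keeps "no data" distinct from "old domain", so a missing WHOIS record is not silently treated as safe.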

4. ASN and Hosting Intelligence


Disposable services often cluster on:

  • High-risk ASNs (known for spam/VPN hosting).
  • Cloud providers with lax verification.
  • Geographic clusters (e.g., certain data centers).

-- Example: flag high-risk ASNs in user registrations
select user_id, email_domain, asn, country
from user_registrations r
join ip_geo g on g.ip = r.ip
where g.asn in (13335, 15169, 16276)  -- Placeholder ASNs (Cloudflare, Google, OVH); replace with ASNs your own data shows are risky
and r.created_at > now() - interval '24 hours';
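
Rather than hard-coding a list, one option is to derive risky ASNs from your own signups. The following sketch ranks ASNs by the share of registrations that used a known disposable domain; `disposableRatioByAsn`, the `Registration` shape, and the `minSignups` cutoff are all assumptions for illustration, and the ASNs in the example come from the range reserved for documentation.

```typescript
interface Registration { emailDomain: string; asn: number }
interface AsnStats { asn: number; total: number; disposable: number; ratio: number }

// Rank ASNs by the fraction of their signups that used a disposable domain.
// ASNs with fewer than minSignups registrations are skipped as too noisy.
function disposableRatioByAsn(
  registrations: Registration[],
  disposableDomains: Set<string>,
  minSignups: number
): AsnStats[] {
  const totals = new Map<number, { total: number; disposable: number }>()
  registrations.forEach(reg => {
    const entry = totals.get(reg.asn) || { total: 0, disposable: 0 }
    entry.total += 1
    if (disposableDomains.has(reg.emailDomain.toLowerCase())) entry.disposable += 1
    totals.set(reg.asn, entry)
  })

  const stats: AsnStats[] = []
  totals.forEach((counts, asn) => {
    if (counts.total >= minSignups) {
      stats.push({ asn, total: counts.total, disposable: counts.disposable, ratio: counts.disposable / counts.total })
    }
  })
  return stats.sort((a, b) => b.ratio - a.ratio)
}

const sampleRegistrations: Registration[] = [
  { emailDomain: 'mailinator.com', asn: 64496 },
  { emailDomain: 'mailinator.com', asn: 64496 },
  { emailDomain: 'example.org', asn: 64496 },
  { emailDomain: 'example.org', asn: 64497 },
  { emailDomain: 'example.net', asn: 64497 }
]
console.log(disposableRatioByAsn(sampleRegistrations, new Set(['mailinator.com']), 2))
```

Large cloud and CDN networks also carry plenty of legitimate traffic, so a ratio like this is better used as one input to a score than as a block rule.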

Implementation (Node.js + SQL)


1) Maintain a disposable domains table


create table if not exists disposable_domains (
  domain text primary key,
  source text,  -- 'public-list', 'internal-discovery', 'manual'
  confidence_score int check (confidence_score between 0 and 100),  -- 100 = definitely disposable
  first_seen timestamptz default now(),
  last_updated timestamptz default now(),
  is_active boolean default true
);

-- Index for fast lookups
create index idx_disposable_domains_active on disposable_domains(domain) where is_active = true;

-- Example upsert (run via ETL or cron)
insert into disposable_domains(domain, source, confidence_score)
values ('mailinator.com','public-list',95),
       ('10minutemail.com','public-list',90),
       ('guerrillamail.com','public-list',85)
on conflict (domain) do update set
  confidence_score = excluded.confidence_score,
  last_updated = now();
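
When the upsert runs from Node.js, the statement can be generated with numbered placeholders instead of string-concatenated values. This is a sketch only: `buildDisposableUpsert` is a hypothetical helper, and the returned `{ text, values }` shape is assumed to suit a PostgreSQL client that accepts parameterized queries. Rows are normalized and deduplicated first, because PostgreSQL rejects an `on conflict do update` statement that touches the same row twice.

```typescript
interface DisposableDomainRow { domain: string; source: string; confidenceScore: number }

function buildDisposableUpsert(
  rows: DisposableDomainRow[]
): { text: string; values: Array<string | number> } {
  // Normalize and dedupe by domain; when a domain repeats, the later row wins.
  const byDomain = new Map<string, DisposableDomainRow>()
  rows.forEach(row => {
    const domain = row.domain.trim().toLowerCase()
    if (domain.length > 0) {
      byDomain.set(domain, { domain, source: row.source, confidenceScore: row.confidenceScore })
    }
  })
  const unique: DisposableDomainRow[] = []
  byDomain.forEach(row => unique.push(row))
  if (unique.length === 0) throw new Error('no domains to upsert')

  // One "($1, $2, $3)" tuple per row, with placeholders numbered across rows.
  const values: Array<string | number> = []
  const tuples = unique.map((row, index) => {
    values.push(row.domain, row.source, row.confidenceScore)
    const base = index * 3
    return `($${base + 1}, $${base + 2}, $${base + 3})`
  })
  const text =
    'insert into disposable_domains(domain, source, confidence_score) values ' +
    tuples.join(', ') +
    ' on conflict (domain) do update set' +
    ' confidence_score = excluded.confidence_score, last_updated = now()'
  return { text, values }
}

const upsert = buildDisposableUpsert([
  { domain: ' Mailinator.COM ', source: 'public-list', confidenceScore: 95 },
  { domain: 'yopmail.com', source: 'manual', confidenceScore: 90 }
])
console.log(upsert.text)
console.log(upsert.values)
```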

Practical Implementation Examples


Machine Learning Classifier


// Advanced machine learning classifier for disposable email detection
interface EmailFeatures {
  domain: string
  domainLength: number
  hasNumbers: boolean
  hasHyphens: boolean
  tld: string
  subdomainCount: number
  entropy: number
  mxRecordCount: number
  spfRecordExists: boolean
  dmarcRecordExists: boolean
  domainAge: number // in days
  registrationPattern: string
  suspiciousKeywords: string[]
  similarityToKnownDisposable: number
  trafficPatternScore: number
  geographicRisk: number
}

interface MLModel {
  weights: Record<string, number>
  bias: number
  featureNames: string[]
  threshold: number
  accuracy: number
  lastUpdated: number
}

interface PredictionResult {
  isDisposable: boolean
  confidence: number
  features: EmailFeatures
  modelVersion: string
  explanation: string[]
}

class DisposableEmailClassifier {
  private model: MLModel
  private featureExtractor: FeatureExtractor
  private knownDisposableDomains: Set<string> = new Set()
  private trainingData: Array<{ features: EmailFeatures; label: boolean }> = []

  constructor() {
    this.model = this.loadModel()
    this.featureExtractor = new FeatureExtractor()
    this.loadKnownDisposableDomains()
  }

  async predict(email: string): Promise<PredictionResult> {
    const domain = this.extractDomain(email)
    if (!domain) {
      return {
        isDisposable: false,
        confidence: 0,
        features: {} as EmailFeatures,
        modelVersion: this.model.lastUpdated.toString(),
        explanation: ['Invalid email format']
      }
    }

    // Check against known disposable domains first (fast path)
    if (this.knownDisposableDomains.has(domain)) {
      return {
        isDisposable: true,
        confidence: 95,
        features: {} as EmailFeatures,
        modelVersion: this.model.lastUpdated.toString(),
        explanation: ['Domain found in known disposable list']
      }
    }

    // Extract features for ML prediction
    const features = await this.featureExtractor.extractFeatures(domain)

    // Get ML prediction
    const mlScore = this.predictWithModel(features)

    // Combine with rule-based checks
    const ruleBasedScore = this.calculateRuleBasedScore(domain, features)

    // Ensemble prediction
    const combinedScore = (mlScore * 0.7) + (ruleBasedScore * 0.3)
    const isDisposable = combinedScore > this.model.threshold

    // Generate explanation
    const explanation = this.generateExplanation(features, mlScore, ruleBasedScore)

    return {
      isDisposable,
      confidence: Math.min(100, combinedScore * 100),
      features,
      modelVersion: this.model.lastUpdated.toString(),
      explanation
    }
  }

  async train(features: EmailFeatures[], labels: boolean[]): Promise<void> {
    // Simple gradient descent training
    const learningRate = 0.01
    const epochs = 100

    for (let epoch = 0; epoch < epochs; epoch++) {
      let totalError = 0

      for (let i = 0; i < features.length; i++) {
        const prediction = this.predictWithModel(features[i])
        const error = labels[i] ? prediction - 1 : prediction - 0

        totalError += Math.abs(error)

        // Update weights
        for (const featureName of this.model.featureNames) {
          // Number() maps boolean features to 0/1 so they can be multiplied by a weight
          const featureValue = Number(features[i][featureName as keyof EmailFeatures])
          this.model.weights[featureName] -= learningRate * error * featureValue
        }

        this.model.bias -= learningRate * error
      }

      // Early stopping
      if (totalError < 0.01) break
    }

    this.model.lastUpdated = Date.now()
    this.model.accuracy = this.evaluateModel(features, labels)

    // Save updated model
    this.saveModel()
  }

  private predictWithModel(features: EmailFeatures): number {
    let score = this.model.bias

    for (const featureName of this.model.featureNames) {
      const weight = this.model.weights[featureName] || 0
      // Number() maps boolean features to 0/1 so they can be multiplied by a weight
      const value = Number(features[featureName as keyof EmailFeatures])
      score += weight * value
    }

    // Sigmoid activation
    return 1 / (1 + Math.exp(-score))
  }

  private calculateRuleBasedScore(domain: string, features: EmailFeatures): number {
    let score = 0

    // Domain length heuristic
    if (features.domainLength < 5 || features.domainLength > 20) score += 0.3

    // TLD risk assessment
    const riskyTLDs = ['.tk', '.ml', '.cf', '.ga', '.gq']
    // features.tld has no leading dot (e.g. 'tk'), so add one before comparing
    if (riskyTLDs.includes('.' + features.tld)) score += 0.4

    // Numbers in domain
    if (features.hasNumbers) score += 0.2

    // Entropy (randomness) - disposable domains often have high entropy
    if (features.entropy > 3.5) score += 0.3

    // MX record anomalies
    if (features.mxRecordCount === 0) score += 0.5
    if (features.mxRecordCount > 5) score += 0.2

    // Missing SPF/DMARC
    if (!features.spfRecordExists || !features.dmarcRecordExists) score += 0.2

    // Domain age
    if (features.domainAge < 30) score += 0.4

    // Suspicious keywords
    if (features.suspiciousKeywords.length > 0) score += 0.3

    // Similarity to known disposable
    if (features.similarityToKnownDisposable > 0.8) score += 0.4

    return Math.min(1, score)
  }

  private generateExplanation(features: EmailFeatures, mlScore: number, ruleScore: number): string[] {
    const explanations: string[] = []

    if (mlScore > 0.7) explanations.push('High ML confidence score')
    if (ruleScore > 0.6) explanations.push('Multiple rule-based indicators triggered')

    if (features.domainLength < 5) explanations.push('Unusually short domain name')
    if (features.domainLength > 20) explanations.push('Unusually long domain name')

    if (features.entropy > 3.5) explanations.push('High domain name entropy (appears random)')

    if (features.mxRecordCount === 0) explanations.push('No MX records found')

    if (!features.spfRecordExists) explanations.push('Missing SPF record')

    if (features.domainAge < 30) explanations.push('Very new domain registration')

    if (features.suspiciousKeywords.length > 0) {
      explanations.push(`Suspicious keywords detected: ${features.suspiciousKeywords.join(', ')}`)
    }

    return explanations
  }

  private loadModel(): MLModel {
    // In production, load from database or file
    return {
      weights: {
        domainLength: -0.1,
        hasNumbers: 0.3,
        hasHyphens: 0.2,
        entropy: 0.4,
        mxRecordCount: -0.2,
        spfRecordExists: -0.3,
        dmarcRecordExists: -0.2,
        domainAge: -0.4,
        similarityToKnownDisposable: 0.5,
        trafficPatternScore: 0.3,
        geographicRisk: 0.2
      },
      bias: -2.0,
      featureNames: [
        'domainLength', 'hasNumbers', 'hasHyphens', 'entropy',
        'mxRecordCount', 'spfRecordExists', 'dmarcRecordExists',
        'domainAge', 'similarityToKnownDisposable', 'trafficPatternScore', 'geographicRisk'
      ],
      threshold: 0.6,
      accuracy: 0.92,
      lastUpdated: Date.now()
    }
  }

  private saveModel(): void {
    // Save model to database or file
    console.log('Model saved with accuracy:', this.model.accuracy)
  }

  private loadKnownDisposableDomains(): void {
    // Load from database or external API
    const disposableDomains = [
      'mailinator.com', '10minutemail.com', 'guerrillamail.com',
      'tempmail.com', 'throwaway.email', 'dispostable.com'
    ]

    disposableDomains.forEach(domain => this.knownDisposableDomains.add(domain))
  }

  private extractDomain(email: string): string | null {
    const atIndex = email.lastIndexOf('@')
    if (atIndex < 0) return null
    return email.slice(atIndex + 1).toLowerCase()
  }

  private evaluateModel(features: EmailFeatures[], labels: boolean[]): number {
    let correct = 0

    for (let i = 0; i < features.length; i++) {
      const prediction = this.predictWithModel(features[i])
      const predicted = prediction > this.model.threshold

      if (predicted === labels[i]) correct++
    }

    return correct / features.length
  }
}

class FeatureExtractor {
  async extractFeatures(domain: string): Promise<EmailFeatures> {
    const features: EmailFeatures = {
      domain,
      domainLength: domain.length,
      hasNumbers: /\d/.test(domain),
      hasHyphens: domain.includes('-'),
      tld: domain.split('.').pop() || '',
      subdomainCount: Math.max(0, domain.split('.').length - 2), // labels beyond "name.tld"
      entropy: this.calculateEntropy(domain),
      mxRecordCount: await this.getMXRecordCount(domain),
      spfRecordExists: await this.checkSPFRecord(domain),
      dmarcRecordExists: await this.checkDMARCRecord(domain),
      domainAge: await this.getDomainAge(domain),
      registrationPattern: this.analyzeRegistrationPattern(domain),
      suspiciousKeywords: this.findSuspiciousKeywords(domain),
      similarityToKnownDisposable: this.calculateSimilarityToKnownDisposable(domain),
      trafficPatternScore: await this.analyzeTrafficPatterns(domain),
      geographicRisk: await this.assessGeographicRisk(domain)
    }

    return features
  }

  private calculateEntropy(domain: string): number {
    const charCounts = new Map<string, number>()

    for (const char of domain) {
      charCounts.set(char, (charCounts.get(char) || 0) + 1)
    }

    let entropy = 0
    const length = domain.length

    for (const count of charCounts.values()) {
      const probability = count / length
      entropy -= probability * Math.log2(probability)
    }

    return entropy
  }

  private async getMXRecordCount(domain: string): Promise<number> {
    // In production, use DNS lookup
    // For demo, simulate based on domain patterns
    if (domain.includes('temp') || domain.includes('mail')) return 0
    return Math.floor(Math.random() * 3) + 1
  }

  private async checkSPFRecord(domain: string): Promise<boolean> {
    // In production, query DNS TXT records
    return Math.random() > 0.3 // 70% of domains have SPF
  }

  private async checkDMARCRecord(domain: string): Promise<boolean> {
    // In production, query DNS TXT records for _dmarc.domain
    return Math.random() > 0.5 // 50% of domains have DMARC
  }

  private async getDomainAge(domain: string): Promise<number> {
    // In production, use WHOIS lookup
    // For demo, simulate based on domain characteristics
    if (domain.length < 10) return Math.floor(Math.random() * 30) + 1 // 1-30 days
    if (domain.includes('temp')) return Math.floor(Math.random() * 7) + 1 // 1-7 days
    return Math.floor(Math.random() * 365) + 30 // 30-394 days
  }

  private analyzeRegistrationPattern(domain: string): string {
    // Analyze domain registration patterns
    if (domain.length < 8) return 'short'
    if (domain.includes('temp') || domain.includes('mail')) return 'temporary'
    if (/\d{4,}/.test(domain)) return 'numeric'
    if (domain.split('.').length > 2) return 'multi_subdomain'
    return 'standard'
  }

  private findSuspiciousKeywords(domain: string): string[] {
    const suspiciousWords = [
      'temp', 'mail', 'throwaway', 'disposable', 'fake', 'test',
      'demo', 'sample', 'example', 'trash', 'junk', 'spam'
    ]

    return suspiciousWords.filter(word => domain.includes(word))
  }

  private calculateSimilarityToKnownDisposable(domain: string): number {
    // Calculate string similarity to known disposable domains
    const knownDisposable = ['mailinator', 'tempmail', 'guerrillamail', '10minutemail']

    let maxSimilarity = 0

    for (const disposable of knownDisposable) {
      const similarity = this.calculateStringSimilarity(domain, disposable)
      maxSimilarity = Math.max(maxSimilarity, similarity)
    }

    return maxSimilarity
  }

  private calculateStringSimilarity(str1: string, str2: string): number {
    // Simple Levenshtein distance ratio
    const longer = str1.length > str2.length ? str1 : str2
    const shorter = str1.length > str2.length ? str2 : str1

    if (longer.length === 0) return 1.0

    const editDistance = this.levenshteinDistance(longer, shorter)
    return (longer.length - editDistance) / longer.length
  }

  private levenshteinDistance(str1: string, str2: string): number {
    const matrix = Array(str2.length + 1).fill(null).map(() => Array(str1.length + 1).fill(null))

    for (let i = 0; i <= str1.length; i++) matrix[0][i] = i
    for (let j = 0; j <= str2.length; j++) matrix[j][0] = j

    for (let j = 1; j <= str2.length; j++) {
      for (let i = 1; i <= str1.length; i++) {
        const indicator = str1[i - 1] === str2[j - 1] ? 0 : 1
        matrix[j][i] = Math.min(
          matrix[j][i - 1] + 1,     // deletion
          matrix[j - 1][i] + 1,     // insertion
          matrix[j - 1][i - 1] + indicator // substitution
        )
      }
    }

    return matrix[str2.length][str1.length]
  }

  private async analyzeTrafficPatterns(domain: string): Promise<number> {
    // Analyze traffic patterns for the domain
    // In production, use historical traffic data

    // For demo, simulate based on domain characteristics
    if (domain.includes('temp')) return 0.8 // High risk
    if (domain.length < 10) return 0.6 // Medium risk
    return 0.2 // Low risk
  }

  private async assessGeographicRisk(domain: string): Promise<number> {
    // Assess geographic risk based on domain registration location
    // In production, use WHOIS data or IP geolocation

    // For demo, simulate based on TLD
    const highRiskTLDs = ['.ru', '.cn', '.ir', '.kp']
    const tld = domain.split('.').pop() || ''

    if (highRiskTLDs.includes('.' + tld)) return 0.8
    return 0.3
  }
}

// Integration with email validation service
const disposableClassifier = new DisposableEmailClassifier()

// Enhanced email validation with ML
export async function validateEmailWithML(email: string): Promise<{
  isValid: boolean
  isDisposable: boolean
  confidence: number
  riskLevel: 'low' | 'medium' | 'high' | 'critical'
  explanation: string[]
  recommendations: string[]
}> {
  // validateEmail is assumed to be an existing syntax/format validator defined elsewhere
  const basicValidation = await validateEmail(email)

  if (!basicValidation.isValid) {
    return {
      isValid: false,
      isDisposable: false,
      confidence: 100,
      riskLevel: 'low',
      explanation: ['Invalid email format'],
      recommendations: ['Please enter a valid email address']
    }
  }

  // Run ML classification
  const mlPrediction = await disposableClassifier.predict(email)

  // Determine risk level
  let riskLevel: 'low' | 'medium' | 'high' | 'critical' = 'low'
  if (mlPrediction.confidence > 90) riskLevel = 'critical'
  else if (mlPrediction.confidence > 70) riskLevel = 'high'
  else if (mlPrediction.confidence > 40) riskLevel = 'medium'

  // Generate recommendations
  const recommendations: string[] = []
  if (mlPrediction.isDisposable) {
    recommendations.push('Please use a permanent email address')
    recommendations.push('Consider using Gmail, Outlook, or your work email')

    if (riskLevel === 'critical') {
      recommendations.push('This email domain is known to be disposable')
    } else {
      recommendations.push('This email domain shows suspicious characteristics')
    }
  }

  return {
    isValid: true,
    isDisposable: mlPrediction.isDisposable,
    confidence: mlPrediction.confidence,
    riskLevel,
    explanation: mlPrediction.explanation,
    recommendations
  }
}

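// Express.js route for email validation (assumes the express package is installed)
import express from 'express'
const app = express()
app.use(express.json()) // parse JSON bodies so req.body.email is populated
app.post('/api/validate-email', async (req, res) => {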
  try {
    const { email } = req.body

    if (!email) {
      return res.status(400).json({ error: 'Email address required' })
    }

    const validation = await validateEmailWithML(email)

    res.json({
      email,
      validation,
      timestamp: new Date().toISOString()
    })

  } catch (error) {
    console.error('Email validation error:', error)
    res.status(500).json({ error: 'Validation service unavailable' })
  }
})

// Batch validation endpoint
app.post('/api/validate-emails', async (req, res) => {
  try {
    const { emails } = req.body

    if (!Array.isArray(emails)) {
      return res.status(400).json({ error: 'Emails array required' })
    }

    const validations = await Promise.all(
      emails.map(email => validateEmailWithML(email))
    )

    const results = emails.map((email, index) => ({
      email,
      validation: validations[index]
    }))

    res.json({
      results,
      summary: {
        total: emails.length,
        valid: results.filter(r => r.validation.isValid).length,
        disposable: results.filter(r => r.validation.isDisposable).length,
        highRisk: results.filter(r => r.validation.riskLevel === 'high' || r.validation.riskLevel === 'critical').length
      },
      timestamp: new Date().toISOString()
    })

  } catch (error) {
    console.error('Batch email validation error:', error)
    res.status(500).json({ error: 'Batch validation service unavailable' })
  }
})

console.log('Disposable email ML classifier initialized')
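
One practical caveat with a linear model like the one above is feature scale. A raw domain age in days, multiplied by a weight such as -0.4, dominates the logit and pushes the sigmoid to roughly 0 for any domain older than a few weeks, whatever the other features say. A common remedy is to scale each input into a fixed range before applying the weights. The sketch below illustrates the effect; the `scale` helper, the feature ranges, and the reduced feature set are assumptions for illustration, not tuned values.

```typescript
function sigmoid(z: number): number {
  return 1 / (1 + Math.exp(-z))
}

// Clamp a raw value into [min, max] and map it onto [0, 1].
function scale(value: number, min: number, max: number): number {
  if (max <= min) return 0
  return Math.min(1, Math.max(0, (value - min) / (max - min)))
}

interface RawFeatures {
  domainLength: number
  hasNumbers: boolean
  entropy: number
  mxRecordCount: number
  domainAge: number // in days
}

// The ranges below are illustrative assumptions.
function toModelInputs(raw: RawFeatures): Record<string, number> {
  return {
    domainLength: scale(raw.domainLength, 0, 63),
    hasNumbers: raw.hasNumbers ? 1 : 0,
    entropy: scale(raw.entropy, 0, 5),
    mxRecordCount: scale(raw.mxRecordCount, 0, 5),
    domainAge: scale(raw.domainAge, 0, 365)
  }
}

function logisticScore(inputs: Record<string, number>, weights: Record<string, number>, bias: number): number {
  let z = bias
  Object.keys(weights).forEach(name => {
    z += (weights[name] || 0) * (inputs[name] || 0)
  })
  return sigmoid(z)
}

const demoWeights: Record<string, number> = {
  domainLength: -0.1, hasNumbers: 0.3, entropy: 0.4, mxRecordCount: -0.2, domainAge: -0.4
}
const demoRaw: RawFeatures = { domainLength: 14, hasNumbers: false, entropy: 3.1, mxRecordCount: 2, domainAge: 400 }

// Unscaled: the age term alone adds -0.4 * 400 = -160 to the logit, so the score is effectively 0.
const unscaledScore = logisticScore(
  { domainLength: 14, hasNumbers: 0, entropy: 3.1, mxRecordCount: 2, domainAge: 400 },
  demoWeights,
  -2.0
)
// Scaled: every input lies in [0, 1], so no single feature swamps the others.
const scaledScore = logisticScore(toModelInputs(demoRaw), demoWeights, -2.0)
console.log({ unscaledScore, scaledScore })
```

If the inputs are rescaled like this, the weights and bias would need to be retrained on the scaled features, since the values shown in `loadModel` were written against raw inputs.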

Real-Time Pattern Analysis


// Real-time pattern analysis for detecting emerging disposable email threats
interface EmailPattern {
  id: string
  pattern: string
  type: 'domain_pattern' | 'registration_pattern' | 'behavioral_pattern' | 'network_pattern'
  confidence: number
  frequency: number
  firstSeen: number
  lastSeen: number
  riskScore: number
  affectedDomains: string[]
  indicators: string[]
}

interface PatternAnalysisResult {
  suspiciousDomains: string[]
  emergingPatterns: EmailPattern[]
  trendAnalysis: {
    direction: 'increasing' | 'decreasing' | 'stable'
    changeRate: number
    confidence: number
  }
  recommendations: string[]
}

class RealTimePatternAnalyzer {
  private patternBuffer: Map<string, EmailPattern> = new Map()
  private domainActivity: Map<string, { count: number; lastSeen: number; patterns: string[] }> = new Map()
  private analysisWindow = 24 * 60 * 60 * 1000 // 24 hours
  private minPatternFrequency = 5
  private subscribers: Array<(result: PatternAnalysisResult) => void> = []

  constructor() {
    this.startPatternAnalysis()
  }

  // Analyze email domain for suspicious patterns
  async analyzeDomain(domain: string): Promise<{
    isSuspicious: boolean
    patterns: string[]
    riskScore: number
    recommendations: string[]
  }> {
    const analysis = {
      isSuspicious: false,
      patterns: [] as string[],
      riskScore: 0,
      recommendations: [] as string[]
    }

    // Update domain activity
    this.updateDomainActivity(domain)

    // Check for known patterns
    const detectedPatterns = await this.detectPatterns(domain)

    for (const pattern of detectedPatterns) {
      if (pattern.riskScore > 50) {
        analysis.isSuspicious = true
        analysis.patterns.push(pattern.pattern)
        analysis.riskScore = Math.max(analysis.riskScore, pattern.riskScore)

        if (pattern.riskScore > 80) {
          analysis.recommendations.push('Block domain immediately')
          analysis.recommendations.push('Monitor for similar patterns')
        } else if (pattern.riskScore > 60) {
          analysis.recommendations.push('Require additional verification')
          analysis.recommendations.push('Add to watchlist')
        }
      }
    }

    // Check for behavioral anomalies
    const behavioralScore = await this.analyzeBehavioralPatterns(domain)
    if (behavioralScore > 70) {
      analysis.isSuspicious = true
      analysis.patterns.push('behavioral_anomaly')
      analysis.riskScore = Math.max(analysis.riskScore, behavioralScore)
      analysis.recommendations.push('Investigate account activity')
    }

    return analysis
  }

  // Subscribe to pattern analysis results
  subscribe(callback: (result: PatternAnalysisResult) => void): () => void {
    this.subscribers.push(callback)

    return () => {
      const index = this.subscribers.indexOf(callback)
      if (index > -1) {
        this.subscribers.splice(index, 1)
      }
    }
  }

  // Get comprehensive pattern analysis
  async getPatternAnalysis(timeframe: number = this.analysisWindow): Promise<PatternAnalysisResult> {
    const cutoff = Date.now() - timeframe

    // Filter recent patterns
    const recentPatterns = Array.from(this.patternBuffer.values())
      .filter(pattern => pattern.lastSeen > cutoff)

    // Identify suspicious domains
    const suspiciousDomains = await this.identifySuspiciousDomains(cutoff)

    // Analyze trends
    const trendAnalysis = await this.analyzeTrend(cutoff)

    // Generate recommendations
    const recommendations = this.generateAnalysisRecommendations(recentPatterns, suspiciousDomains)

    const result: PatternAnalysisResult = {
      suspiciousDomains,
      emergingPatterns: recentPatterns.slice(0, 10), // Top 10 patterns
      trendAnalysis,
      recommendations
    }

    // Notify subscribers
    this.subscribers.forEach(callback => {
      try {
        callback(result)
      } catch (error) {
        console.error('Error in pattern analysis subscriber:', error)
      }
    })

    return result
  }

  private async detectPatterns(domain: string): Promise<EmailPattern[]> {
    const patterns: EmailPattern[] = []

    // Domain pattern analysis
    const domainPatterns = await this.detectDomainPatterns(domain)
    patterns.push(...domainPatterns)

    // Registration pattern analysis
    const registrationPatterns = await this.detectRegistrationPatterns(domain)
    patterns.push(...registrationPatterns)

    // Network pattern analysis
    const networkPatterns = await this.detectNetworkPatterns(domain)
    patterns.push(...networkPatterns)

    return patterns
  }

  private async detectDomainPatterns(domain: string): Promise<EmailPattern[]> {
    const patterns: EmailPattern[] = []

    // Pattern 1: Random-looking domains
    const entropy = this.calculateEntropy(domain)
    if (entropy > 3.5) {
      patterns.push({
        id: `random_domain_${Date.now()}`,
        pattern: 'high_entropy_domain',
        type: 'domain_pattern',
        confidence: Math.min(100, entropy * 20),
        frequency: 1,
        firstSeen: Date.now(),
        lastSeen: Date.now(),
        riskScore: Math.min(100, entropy * 25),
        affectedDomains: [domain],
        indicators: ['high_entropy', 'random_character_distribution']
      })
    }

    // Pattern 2: Sequential domains (like temp123.com)
    if (/temp\d+\.com/.test(domain) || /mail\d+\.com/.test(domain)) {
      patterns.push({
        id: `sequential_domain_${Date.now()}`,
        pattern: 'sequential_domain_pattern',
        type: 'domain_pattern',
        confidence: 85,
        frequency: 1,
        firstSeen: Date.now(),
        lastSeen: Date.now(),
        riskScore: 80,
        affectedDomains: [domain],
        indicators: ['sequential_numbering', 'temp_mail_pattern']
      })
    }

    // Pattern 3: Known disposable TLDs
    const riskyTLDs = ['.tk', '.ml', '.cf', '.ga', '.gq']
    const tld = domain.split('.').pop() || ''
    if (riskyTLDs.includes('.' + tld)) {
      patterns.push({
        id: `risky_tld_${Date.now()}`,
        pattern: 'risky_tld_pattern',
        type: 'domain_pattern',
        confidence: 90,
        frequency: 1,
        firstSeen: Date.now(),
        lastSeen: Date.now(),
        riskScore: 85,
        affectedDomains: [domain],
        indicators: ['high_risk_tld', 'known_disposable_tld']
      })
    }

    return patterns
  }

  private async detectRegistrationPatterns(domain: string): Promise<EmailPattern[]> {
    const patterns: EmailPattern[] = []

    // In production, this would use WHOIS data
    // For demo, simulate based on domain characteristics

    // Pattern: Very new domains (less than 30 days)
    if (domain.length < 10 || domain.includes('temp')) {
      patterns.push({
        id: `new_domain_${Date.now()}`,
        pattern: 'new_domain_registration',
        type: 'registration_pattern',
        confidence: 75,
        frequency: 1,
        firstSeen: Date.now(),
        lastSeen: Date.now(),
        riskScore: 70,
        affectedDomains: [domain],
        indicators: ['recent_registration', 'suspicious_timing']
      })
    }

    // Pattern: Bulk registration patterns
    if (/\d{3,}/.test(domain)) {
      patterns.push({
        id: `bulk_registration_${Date.now()}`,
        pattern: 'bulk_registration_pattern',
        type: 'registration_pattern',
        confidence: 80,
        frequency: 1,
        firstSeen: Date.now(),
        lastSeen: Date.now(),
        riskScore: 75,
        affectedDomains: [domain],
        indicators: ['bulk_registration', 'automated_registration']
      })
    }

    return patterns
  }

  private async detectNetworkPatterns(domain: string): Promise<EmailPattern[]> {
    const patterns: EmailPattern[] = []

    // In production, this would analyze network traffic patterns
    // For demo, simulate based on domain characteristics

    // Pattern: High-risk hosting patterns
    if (domain.includes('free') || domain.includes('hosting')) {
      patterns.push({
        id: `hosting_pattern_${Date.now()}`,
        pattern: 'suspicious_hosting',
        type: 'network_pattern',
        confidence: 70,
        frequency: 1,
        firstSeen: Date.now(),
        lastSeen: Date.now(),
        riskScore: 65,
        affectedDomains: [domain],
        indicators: ['free_hosting', 'suspicious_infrastructure']
      })
    }

    return patterns
  }

  private async analyzeBehavioralPatterns(domain: string): Promise<number> {
    const activity = this.domainActivity.get(domain)

    if (!activity || activity.count < 10) return 0

    // Analyze behavioral indicators
    let riskScore = 0

    // High cumulative volume on a domain that was active within the last hour
    // (count is a running total, not a per-hour window)
    const timeSpan = Date.now() - activity.lastSeen
    if (timeSpan < 60 * 60 * 1000 && activity.count > 50) {
      riskScore += 40
    }

    // Rapid access pattern (the 'rapid_access' flag is set in updateDomainActivity)
    if (activity.patterns.includes('rapid_access')) {
      riskScore += 30
    }

    // Geographic dispersion (unusual for disposable)
    if (activity.patterns.includes('geographic_dispersion')) {
      riskScore += 20
    }

    return Math.min(100, riskScore)
  }

  private updateDomainActivity(domain: string): void {
    const current = this.domainActivity.get(domain) || {
      count: 0,
      lastSeen: 0,
      patterns: []
    }

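    const now = Date.now()
    const previousSeen = current.lastSeen // read before it is overwritten below
    current.count++
    current.lastSeen = now

    // Detect access patterns
    if (current.count > 1) {
      const timeSinceLast = now - previousSeen
      // Less than 1 second between accesses; record the flag once per domain
      if (timeSinceLast < 1000 && !current.patterns.includes('rapid_access')) {
        current.patterns.push('rapid_access')
      }
    }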

    this.domainActivity.set(domain, current)
  }

  private async identifySuspiciousDomains(cutoff: number): Promise<string[]> {
    const suspiciousDomains: string[] = []

    for (const [domain, activity] of this.domainActivity.entries()) {
      if (activity.lastSeen < cutoff) continue

      let suspiciousScore = 0

      // High activity volume
      if (activity.count > 100) suspiciousScore += 30

      // Recently active (the activity record has no first-seen timestamp, so lastSeen is the closest proxy)
      if (Date.now() - activity.lastSeen < 24 * 60 * 60 * 1000) suspiciousScore += 20

      // Suspicious patterns
      if (activity.patterns.length > 0) suspiciousScore += 25

      if (suspiciousScore > 60) {
        suspiciousDomains.push(domain)
      }
    }

    return suspiciousDomains.slice(0, 50) // Top 50 suspicious domains
  }

  private async analyzeTrend(cutoff: number): Promise<{
    direction: 'increasing' | 'decreasing' | 'stable'
    changeRate: number
    confidence: number
  }> {
    const recentPatterns = Array.from(this.patternBuffer.values())
      .filter(pattern => pattern.lastSeen > cutoff)

    if (recentPatterns.length < 10) {
      return { direction: 'stable', changeRate: 0, confidence: 50 }
    }

    // Simple trend analysis based on pattern frequency over time
    const now = Date.now()
    const windowSize = 6 * 60 * 60 * 1000 // 6 hours

    const recentWindow = recentPatterns.filter(p => now - p.lastSeen < windowSize)
    const olderWindow = recentPatterns.filter(p => now - p.lastSeen >= windowSize)

    const recentAvg = recentWindow.reduce((sum, p) => sum + p.frequency, 0) / recentWindow.length || 0
    const olderAvg = olderWindow.reduce((sum, p) => sum + p.frequency, 0) / olderWindow.length || 0

    let direction: 'increasing' | 'decreasing' | 'stable' = 'stable'
    let changeRate = 0

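    if (recentAvg > olderAvg * 1.2) {
      direction = 'increasing'
      // With an empty older window olderAvg is 0; report a 100% rise instead of dividing by zero
      changeRate = olderAvg > 0 ? (recentAvg - olderAvg) / olderAvg : 1
    } else if (recentAvg < olderAvg * 0.8) {
      direction = 'decreasing'
      changeRate = (olderAvg - recentAvg) / olderAvg
    }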

    return {
      direction,
      changeRate: Math.round(changeRate * 100) / 100,
      confidence: 75 // Simplified confidence score
    }
  }

  private generateAnalysisRecommendations(patterns: EmailPattern[], suspiciousDomains: string[]): string[] {
    const recommendations: string[] = []

    if (suspiciousDomains.length > 20) {
      recommendations.push('High number of suspicious domains detected')
      recommendations.push('Consider tightening domain validation rules')
    }

    const highRiskPatterns = patterns.filter(p => p.riskScore > 80)
    if (highRiskPatterns.length > 5) {
      recommendations.push('Multiple high-risk patterns detected')
      recommendations.push('Enable enhanced monitoring and alerting')
    }

    if (patterns.some(p => p.type === 'network_pattern')) {
      recommendations.push('Network-level anomalies detected')
      recommendations.push('Review infrastructure security')
    }

    if (recommendations.length === 0) {
      recommendations.push('Pattern analysis shows normal activity')
    }

    return recommendations
  }

  private calculateEntropy(domain: string): number {
    const charCounts = new Map<string, number>()

    for (const char of domain) {
      charCounts.set(char, (charCounts.get(char) || 0) + 1)
    }

    let entropy = 0
    const length = domain.length

    for (const count of charCounts.values()) {
      const probability = count / length
      entropy -= probability * Math.log2(probability)
    }

    return entropy
  }

  private startPatternAnalysis(): void {
    // Run pattern analysis every 5 minutes
    setInterval(async () => {
      await this.getPatternAnalysis()
    }, 5 * 60 * 1000)

    // Clean up old data every hour
    setInterval(() => {
      this.cleanupOldData()
    }, 60 * 60 * 1000)
  }

  private cleanupOldData(): void {
    const cutoff = Date.now() - this.analysisWindow

    // Remove old patterns
    for (const [id, pattern] of this.patternBuffer.entries()) {
      if (pattern.lastSeen < cutoff) {
        this.patternBuffer.delete(id)
      }
    }

    // Remove old domain activity
    for (const [domain, activity] of this.domainActivity.entries()) {
      if (activity.lastSeen < cutoff) {
        this.domainActivity.delete(domain)
      }
    }
  }
}

// Integration with pattern analysis
const patternAnalyzer = new RealTimePatternAnalyzer()

// API endpoints for pattern analysis
app.get('/api/patterns/analysis', async (req, res) => {
  try {
    const timeframe = parseInt(req.query.timeframe as string) || 24 * 60 * 60 * 1000 // 24 hours default
    const analysis = await patternAnalyzer.getPatternAnalysis(timeframe)

    res.json({
      ...analysis,
      timeframe,
      timestamp: new Date().toISOString()
    })

  } catch (error) {
    console.error('Pattern analysis error:', error)
    res.status(500).json({ error: 'Pattern analysis unavailable' })
  }
})

// Subscribe to pattern analysis updates
app.ws('/api/patterns/stream', (ws: any) => {
  const unsubscribe = patternAnalyzer.subscribe((result) => {
    ws.send(JSON.stringify({
      type: 'pattern_analysis',
      data: result,
      timestamp: new Date().toISOString()
    }))
  })

  ws.on('close', () => {
    unsubscribe()
  })
})

// Analyze specific domain
app.post('/api/patterns/analyze-domain', async (req, res) => {
  try {
    const { domain } = req.body

    if (!domain) {
      return res.status(400).json({ error: 'Domain required' })
    }

    const analysis = await patternAnalyzer.analyzeDomain(domain)

    res.json({
      domain,
      analysis,
      timestamp: new Date().toISOString()
    })

  } catch (error) {
    console.error('Domain analysis error:', error)
    res.status(500).json({ error: 'Domain analysis unavailable' })
  }
})

console.log('Real-time pattern analyzer initialized')
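
As a standalone illustration of the entropy signal, the sketch below restates the Shannon entropy calculation from `calculateEntropy` as a free function. The name `shannonEntropy` is hypothetical, and any cut-off applied to its result would be an assumption to tune against real signup data.

```typescript
// Shannon entropy in bits per character, mirroring calculateEntropy above.
function shannonEntropy(text: string): number {
  const chars = Array.from(text) // iterate by code point, not UTF-16 unit
  if (chars.length === 0) return 0

  // Count how often each character occurs
  const counts = new Map<string, number>()
  for (const ch of chars) {
    counts.set(ch, (counts.get(ch) ?? 0) + 1)
  }

  // Sum -p * log2(p) over the observed character frequencies
  let entropy = 0
  for (const count of counts.values()) {
    const p = count / chars.length
    entropy -= p * Math.log2(p)
  }
  return entropy
}
```

A string of one repeated character scores 0, and four distinct characters score 2 bits per character. Random-looking domain labels tend to score higher than dictionary words of similar length, which is why entropy is useful as one signal among several rather than on its own.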

Automated Domain Discovery


// Automated system for discovering new disposable email domains
interface DomainDiscoveryConfig {
  crawlInterval: number // minutes
  maxDomainsPerCrawl: number
  verificationTimeout: number // seconds
  similarityThreshold: number
  minConfidenceScore: number
  externalSources: string[]
}

interface DiscoveredDomain {
  domain: string
  source: string
  discoveryMethod: 'crawler' | 'similarity' | 'external_api' | 'user_report'
  confidence: number
  verificationStatus: 'pending' | 'verified' | 'failed' | 'confirmed_disposable'
  firstSeen: number
  lastVerified: number
  mxRecords: string[]
  spfRecord: string | null
  dmarcRecord: string | null
  similarTo: string[]
  riskFactors: string[]
}

interface CrawlResult {
  newDomains: DiscoveredDomain[]
  verifiedDisposable: DiscoveredDomain[]
  failedVerifications: string[]
  crawlStats: {
    domainsCrawled: number
    pagesProcessed: number
    avgResponseTime: number
    errorRate: number
  }
}

class AutomatedDomainDiscovery {
  private discoveredDomains: Map<string, DiscoveredDomain> = new Map()
  private knownDisposableDomains: Set<string> = new Set()
  private crawler: DomainCrawler
  private verifier: DomainVerifier
  private similarityEngine: SimilarityEngine
  private config: DomainDiscoveryConfig
  private subscribers: Array<(result: CrawlResult) => void> = []

  constructor(config: DomainDiscoveryConfig) {
    this.config = config
    this.crawler = new DomainCrawler()
    this.verifier = new DomainVerifier()
    this.similarityEngine = new SimilarityEngine()
    this.loadKnownDisposableDomains()
    this.startDiscoveryProcess()
  }

  // Subscribe to discovery results
  subscribe(callback: (result: CrawlResult) => void): () => void {
    this.subscribers.push(callback)

    return () => {
      const index = this.subscribers.indexOf(callback)
      if (index > -1) {
        this.subscribers.splice(index, 1)
      }
    }
  }

  // Manually trigger domain discovery
  async triggerDiscovery(): Promise<CrawlResult> {
    console.log('Starting manual domain discovery...')

    const crawlResult = await this.performDiscoveryCrawl()
    await this.processDiscoveryResults(crawlResult)

    // Notify subscribers
    this.subscribers.forEach(callback => {
      try {
        callback(crawlResult)
      } catch (error) {
        console.error('Error in domain discovery subscriber:', error)
      }
    })

    return crawlResult
  }

  // Get current discovery status
  getDiscoveryStatus(): {
    totalDiscovered: number
    pendingVerification: number
    confirmedDisposable: number
    lastCrawlTime: number
    nextScheduledCrawl: number
    systemHealth: 'healthy' | 'degraded' | 'unhealthy'
  } {
    const totalDiscovered = this.discoveredDomains.size
    const pendingVerification = Array.from(this.discoveredDomains.values())
      .filter(d => d.verificationStatus === 'pending').length
    const confirmedDisposable = Array.from(this.discoveredDomains.values())
      .filter(d => d.verificationStatus === 'confirmed_disposable').length

    let systemHealth: 'healthy' | 'degraded' | 'unhealthy' = 'healthy'
    if (pendingVerification > 1000) systemHealth = 'degraded'
    if (pendingVerification > 5000) systemHealth = 'unhealthy'

    return {
      totalDiscovered,
      pendingVerification,
      confirmedDisposable,
      lastCrawlTime: Date.now() - (5 * 60 * 1000), // 5 minutes ago for demo
      nextScheduledCrawl: Date.now() + (this.config.crawlInterval * 60 * 1000),
      systemHealth
    }
  }

  private async performDiscoveryCrawl(): Promise<CrawlResult> {
    const startTime = Date.now()
    const result: CrawlResult = {
      newDomains: [],
      verifiedDisposable: [],
      failedVerifications: [],
      crawlStats: {
        domainsCrawled: 0,
        pagesProcessed: 0,
        avgResponseTime: 0,
        errorRate: 0
      }
    }

    try {
      // Crawl disposable email provider lists
      const crawledDomains = await this.crawler.crawlDisposableProviders()

      result.crawlStats.domainsCrawled = crawledDomains.length
      result.crawlStats.pagesProcessed = crawledDomains.length * 2 // Rough estimate

      // Process each discovered domain
      for (const domain of crawledDomains.slice(0, this.config.maxDomainsPerCrawl)) {
        const discoveredDomain = await this.processDiscoveredDomain(domain, 'crawler')
        result.newDomains.push(discoveredDomain)

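        // Attempt immediate verification for high-confidence domains
        if (discoveredDomain.confidence > 80) {
          try {
            const verification = await this.verifier.verifyDomain(domain)
            discoveredDomain.verificationStatus = verification.isDisposable ? 'confirmed_disposable' : 'verified'
            discoveredDomain.lastVerified = Date.now()

            if (verification.isDisposable) {
              result.verifiedDisposable.push(discoveredDomain)
            }
          } catch (error) {
            // Record the failure so one bad domain does not abort the whole crawl
            console.error(`Verification failed for ${domain}:`, error)
            discoveredDomain.verificationStatus = 'failed'
            result.failedVerifications.push(domain)
          }
        }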
      }

      // Find similar domains to known disposable ones
      const similarDomains = await this.findSimilarDomains()
      for (const domain of similarDomains) {
        if (!this.discoveredDomains.has(domain)) {
          const discoveredDomain = await this.processDiscoveredDomain(domain, 'similarity')
          result.newDomains.push(discoveredDomain)
        }
      }

      // Check external APIs for new disposable domains
      const externalDomains = await this.checkExternalSources()
      for (const domain of externalDomains) {
        if (!this.discoveredDomains.has(domain)) {
          const discoveredDomain = await this.processDiscoveredDomain(domain, 'external_api')
          result.newDomains.push(discoveredDomain)
        }
      }

      // Calculate crawl statistics
      const totalTime = Date.now() - startTime
      result.crawlStats.avgResponseTime = totalTime / Math.max(result.newDomains.length, 1)
      result.crawlStats.errorRate = result.failedVerifications.length / Math.max(result.newDomains.length, 1)

    } catch (error) {
      console.error('Discovery crawl error:', error)
      result.crawlStats.errorRate = 1.0
    }

    return result
  }

  private async processDiscoveredDomain(domain: string, method: DiscoveredDomain['discoveryMethod']): Promise<DiscoveredDomain> {
    const discoveredDomain: DiscoveredDomain = {
      domain,
      source: method,
      discoveryMethod: method,
      confidence: await this.calculateDiscoveryConfidence(domain, method),
      verificationStatus: 'pending',
      firstSeen: Date.now(),
      lastVerified: 0,
      mxRecords: [],
      spfRecord: null,
      dmarcRecord: null,
      similarTo: [],
      riskFactors: []
    }

    // Perform basic DNS checks
    const dnsInfo = await this.verifier.getDNSInfo(domain)
    discoveredDomain.mxRecords = dnsInfo.mxRecords
    discoveredDomain.spfRecord = dnsInfo.spfRecord
    discoveredDomain.dmarcRecord = dnsInfo.dmarcRecord

    // Analyze risk factors
    discoveredDomain.riskFactors = await this.analyzeRiskFactors(domain, dnsInfo)

    // Find similar domains
    discoveredDomain.similarTo = await this.similarityEngine.findSimilarDomains(domain)

    this.discoveredDomains.set(domain, discoveredDomain)

    return discoveredDomain
  }

  private async calculateDiscoveryConfidence(domain: string, method: string): Promise<number> {
    let confidence = 50 // Base confidence

    // Method-based confidence boost
    switch (method) {
      case 'crawler':
        confidence += 30
        break
      case 'similarity':
        confidence += 20
        break
      case 'external_api':
        confidence += 25
        break
      case 'user_report':
        confidence += 15
        break
    }

    // Domain-based confidence adjustments
    if (domain.length < 8) confidence += 10 // Short domains are suspicious
    if (domain.length > 20) confidence -= 10 // Very long domains are less likely disposable

    if (/\d{3,}/.test(domain)) confidence += 15 // Numeric sequences are suspicious

    if (domain.includes('temp') || domain.includes('mail')) confidence += 20

    // TLD-based confidence
    const riskyTLDs = ['.tk', '.ml', '.cf', '.ga', '.gq']
    const tld = domain.split('.').pop() || ''
    if (riskyTLDs.includes('.' + tld)) confidence += 25

    return Math.min(100, Math.max(0, confidence))
  }

  private async analyzeRiskFactors(domain: string, dnsInfo: any): Promise<string[]> {
    const riskFactors: string[] = []

    // MX record anomalies
    if (dnsInfo.mxRecords.length === 0) {
      riskFactors.push('no_mx_records')
    }

    if (dnsInfo.mxRecords.length > 3) {
      riskFactors.push('multiple_mx_records')
    }

    // Missing SPF/DMARC
    if (!dnsInfo.spfRecord) {
      riskFactors.push('missing_spf')
    }

    if (!dnsInfo.dmarcRecord) {
      riskFactors.push('missing_dmarc')
    }

    // Domain characteristics
    if (domain.length < 10) {
      riskFactors.push('short_domain')
    }

    if (/\d{4,}/.test(domain)) {
      riskFactors.push('numeric_sequence')
    }

    if (domain.includes('temp') || domain.includes('disposable')) {
      riskFactors.push('suspicious_keywords')
    }

    return riskFactors
  }

  private async findSimilarDomains(): Promise<string[]> {
    const similarDomains: string[] = []

    // Find domains similar to known disposable ones
    for (const knownDisposable of this.knownDisposableDomains) {
      const similar = await this.similarityEngine.findSimilarDomains(knownDisposable)
      similarDomains.push(...similar.filter(domain => !this.knownDisposableDomains.has(domain)))
    }

    // Remove duplicates and limit results
    return [...new Set(similarDomains)].slice(0, 50)
  }

  private async checkExternalSources(): Promise<string[]> {
    const externalDomains: string[] = []

    for (const source of this.config.externalSources) {
      try {
        const domains = await this.fetchFromExternalSource(source)
        externalDomains.push(...domains)
      } catch (error) {
        console.error(`Error fetching from source ${source}:`, error)
      }
    }

    return [...new Set(externalDomains)].slice(0, 100)
  }

  private async fetchFromExternalSource(source: string): Promise<string[]> {
    // In production, implement actual API calls
    // For demo, return simulated data

    const mockSources: Record<string, string[]> = {
      'github_disposable_list': [
        'newdisposable1.com', 'tempdomain2.org', 'mailtest3.net'
      ],
      'abuse_ch_api': [
        'spamdomain4.com', 'fakeemail5.org'
      ],
      'custom_crawler': [
        'tempmail6.com', 'disposable7.net'
      ]
    }

    return mockSources[source] || []
  }

  private loadKnownDisposableDomains(): void {
    // Load from database or external sources
    const knownDomains = [
      'mailinator.com', '10minutemail.com', 'guerrillamail.com',
      'tempmail.com', 'throwaway.email', 'dispostable.com'
    ]

    knownDomains.forEach(domain => this.knownDisposableDomains.add(domain))
  }

  private startDiscoveryProcess(): void {
    // Schedule regular discovery crawls
    setInterval(async () => {
      await this.triggerDiscovery()
    }, this.config.crawlInterval * 60 * 1000)

    // Background verification of pending domains
    setInterval(async () => {
      await this.processPendingVerifications()
    }, 30 * 1000) // Every 30 seconds
  }

  private async processPendingVerifications(): Promise<void> {
    const pendingDomains = Array.from(this.discoveredDomains.values())
      .filter(d => d.verificationStatus === 'pending')
      .slice(0, 10) // Process 10 at a time

    for (const domain of pendingDomains) {
      try {
        const verification = await this.verifier.verifyDomain(domain.domain)

        if (verification.isDisposable) {
          domain.verificationStatus = 'confirmed_disposable'
          this.knownDisposableDomains.add(domain.domain)
        } else {
          domain.verificationStatus = 'verified'
        }

        domain.lastVerified = Date.now()

      } catch (error) {
        console.error(`Verification failed for ${domain.domain}:`, error)
        domain.verificationStatus = 'failed'
      }
    }
  }

  private async processDiscoveryResults(result: CrawlResult): Promise<void> {
    // Add new domains to database
    for (const domain of result.newDomains) {
      await this.saveDiscoveredDomain(domain)
    }

    // Update known disposable domains
    for (const domain of result.verifiedDisposable) {
      this.knownDisposableDomains.add(domain.domain)
      await this.updateDisposableDomain(domain.domain, 95)
    }

    console.log(`Discovery completed: ${result.newDomains.length} new domains, ${result.verifiedDisposable.length} confirmed disposable`)
  }

  private async saveDiscoveredDomain(domain: DiscoveredDomain): Promise<void> {
    // Save to database
    console.log(`Saving discovered domain: ${domain.domain} (confidence: ${domain.confidence})`)
  }

  private async updateDisposableDomain(domain: string, confidence: number): Promise<void> {
    // Update disposable domains table
    console.log(`Updating disposable domain: ${domain} (confidence: ${confidence})`)
  }
}

class DomainCrawler {
  async crawlDisposableProviders(): Promise<string[]> {
    const discoveredDomains: string[] = []

    // In production, crawl actual disposable email provider websites
    // For demo, return simulated results

    const mockProviders = [
      'https://tempmail.com',
      'https://10minutemail.com',
      'https://guerrillamail.com',
      'https://mailinator.com'
    ]

    for (const provider of mockProviders) {
      try {
        // Simulate crawling provider website for domain extraction
        const domains = await this.extractDomainsFromProvider(provider)
        discoveredDomains.push(...domains)
      } catch (error) {
        console.error(`Failed to crawl ${provider}:`, error)
      }
    }

    return [...new Set(discoveredDomains)] // Remove duplicates
  }

  private async extractDomainsFromProvider(providerUrl: string): Promise<string[]> {
    // In production, use actual web scraping
    // For demo, return simulated domain extraction

    const mockDomains: Record<string, string[]> = {
      'https://tempmail.com': ['tempmail.com', 'tempmail.net', 'tempmail.org'],
      'https://10minutemail.com': ['10minutemail.com', '10minutemail.net'],
      'https://guerrillamail.com': ['guerrillamail.com', 'guerrillamail.net'],
      'https://mailinator.com': ['mailinator.com', 'mailinator.net']
    }

    return mockDomains[providerUrl] || []
  }
}

class DomainVerifier {
  async verifyDomain(domain: string): Promise<{
    isDisposable: boolean
    confidence: number
    verificationMethod: string
    details: Record<string, any>
  }> {
    // Perform comprehensive domain verification
    const results = await Promise.all([
      this.checkDNSRecords(domain),
      this.checkDomainRegistration(domain),
      this.checkWebPresence(domain),
      this.checkSMTPAvailability(domain)
    ])

    const [dnsResult, registrationResult, webResult, smtpResult] = results

    // Combine verification results
    const combinedScore = this.combineVerificationScores(results)
    const isDisposable = combinedScore > 0.7

    return {
      isDisposable,
      confidence: combinedScore * 100,
      verificationMethod: 'multi_factor',
      details: {
        dns: dnsResult,
        registration: registrationResult,
        web: webResult,
        smtp: smtpResult
      }
    }
  }

  async getDNSInfo(domain: string): Promise<{
    mxRecords: string[]
    spfRecord: string | null
    dmarcRecord: string | null
  }> {
    // In production, use actual DNS lookups
    // For demo, simulate DNS responses

    const mockDNS: Record<string, any> = {
      'tempmail.com': {
        mxRecords: [],
        spfRecord: null,
        dmarcRecord: null
      },
      'mailinator.com': {
        mxRecords: [],
        spfRecord: null,
        dmarcRecord: null
      },
      'gmail.com': {
        mxRecords: ['gmail-smtp-in.l.google.com'],
        spfRecord: 'v=spf1 include:_spf.google.com ~all',
        dmarcRecord: 'v=DMARC1; p=reject'
      }
    }

    return mockDNS[domain] || {
      mxRecords: ['mail.' + domain],
      spfRecord: 'v=spf1 mx -all',
      dmarcRecord: null
    }
  }

  private async checkDNSRecords(domain: string): Promise<number> {
    const dnsInfo = await this.getDNSInfo(domain)

    let score = 0

    // MX records check
    if (dnsInfo.mxRecords.length === 0) score += 0.4
    if (dnsInfo.mxRecords.some(mx => mx.includes('temp') || mx.includes('mail'))) score += 0.3

    // SPF check
    if (!dnsInfo.spfRecord) score += 0.2

    // DMARC check
    if (!dnsInfo.dmarcRecord) score += 0.1

    return Math.min(1, score)
  }

  private async checkDomainRegistration(domain: string): Promise<number> {
    // In production, use WHOIS API
    // For demo, simulate based on domain characteristics

    if (domain.length < 10) return 0.3 // Short domains are suspicious
    if (domain.includes('temp')) return 0.4
    if (/\d{3,}/.test(domain)) return 0.3

    return 0.1 // Low suspicion for normal domains
  }

  private async checkWebPresence(domain: string): Promise<number> {
    // In production, check if website exists and analyze content
    // For demo, simulate web presence check

    if (domain.includes('temp') || domain.includes('mail')) return 0.5
    return 0.1
  }

  private async checkSMTPAvailability(domain: string): Promise<number> {
    // In production, attempt SMTP connection
    // For demo, simulate SMTP check

    if (domain.includes('temp')) return 0.6 // High likelihood of SMTP issues
    return 0.1
  }

  private combineVerificationScores(results: number[]): number {
    return results.reduce((sum, score) => sum + score, 0) / results.length
  }
}

class SimilarityEngine {
  async findSimilarDomains(domain: string): Promise<string[]> {
    const similarDomains: string[] = []

    // Generate variations of the domain
    const variations = this.generateDomainVariations(domain)

    // Check which variations exist (in production, use DNS lookup)
    for (const variation of variations) {
      if (await this.domainExists(variation)) {
        similarDomains.push(variation)
      }
    }

    // Find domains with similar characteristics
    const characteristicSimilar = await this.findByCharacteristics(domain)
    similarDomains.push(...characteristicSimilar)

    return [...new Set(similarDomains)].slice(0, 20) // Limit results
  }

  private generateDomainVariations(domain: string): string[] {
    const variations: string[] = []
    const parts = domain.split('.')

    if (parts.length >= 2) {
      const name = parts[0]
      // Keep multi-label suffixes such as co.uk together
      const tld = parts.slice(1).join('.')

      // Add numbers
      for (let i = 1; i <= 10; i++) {
        variations.push(`${name}${i}.${tld}`)
      }

      // Add prefixes
      const prefixes = ['temp', 'mail', 'test', 'demo']
      prefixes.forEach(prefix => {
        variations.push(`${prefix}${name}.${tld}`)
      })

      // TLD variations
      const tlds = ['com', 'net', 'org', 'info', 'biz']
      tlds.forEach(newTld => {
        if (newTld !== tld) {
          variations.push(`${name}.${newTld}`)
        }
      })
    }

    return variations
  }

  private async domainExists(domain: string): Promise<boolean> {
    // In production, perform actual DNS lookup
    // For demo, simulate based on domain patterns

    if (domain.includes('temp') && domain.includes('123')) return true
    if (domain.includes('mail') && /\d/.test(domain)) return true

    return Math.random() > 0.8 // 20% chance of existing
  }

  private async findByCharacteristics(domain: string): Promise<string[]> {
    // Find domains with similar characteristics (length, patterns, etc.)
    // In production, use database queries

    const similar: string[] = []

    if (domain.length < 10) {
      similar.push('shortdomain1.com', 'shortdomain2.net')
    }

    if (/\d/.test(domain)) {
      similar.push('numericdomain3.com', 'numberdomain4.net')
    }

    return similar.filter(() => Math.random() > 0.7) // Random subset
  }
}

// Initialize automated domain discovery
const discoveryConfig: DomainDiscoveryConfig = {
  crawlInterval: 60, // Every hour
  maxDomainsPerCrawl: 100,
  verificationTimeout: 30,
  similarityThreshold: 0.8,
  minConfidenceScore: 70,
  externalSources: [
    'github_disposable_list',
    'abuse_ch_api',
    'custom_crawler'
  ]
}

const domainDiscovery = new AutomatedDomainDiscovery(discoveryConfig)

// API endpoints for automated discovery
app.get('/api/discovery/status', (req, res) => {
  const status = domainDiscovery.getDiscoveryStatus()

  res.json({
    ...status,
    timestamp: new Date().toISOString()
  })
})

// Trigger manual discovery
app.post('/api/discovery/trigger', async (req, res) => {
  try {
    const result = await domainDiscovery.triggerDiscovery()

    res.json({
      ...result,
      timestamp: new Date().toISOString()
    })

  } catch (error) {
    console.error('Manual discovery error:', error)
    res.status(500).json({ error: 'Discovery failed' })
  }
})

// Subscribe to discovery results
app.ws('/api/discovery/stream', (ws: any) => {
  const unsubscribe = domainDiscovery.subscribe((result) => {
    ws.send(JSON.stringify({
      type: 'discovery_result',
      data: result,
      timestamp: new Date().toISOString()
    }))
  })

  ws.on('close', () => {
    unsubscribe()
  })
})

// Get discovered domains
app.get('/api/discovery/domains', (req, res) => {
  const status = req.query.status as string || 'all'
  const limit = parseInt(req.query.limit as string) || 100

  const domains = Array.from(domainDiscovery['discoveredDomains'].values())

  let filteredDomains = domains

  if (status !== 'all') {
    filteredDomains = domains.filter(d => d.verificationStatus === status)
  }

  res.json({
    domains: filteredDomains.slice(0, limit),
    total: filteredDomains.length,
    timestamp: new Date().toISOString()
  })
})

console.log('Automated domain discovery system initialized')
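
The `similarityThreshold` setting in `discoveryConfig` is not consumed by the mock `SimilarityEngine` above. One way it could be applied, sketched here as an assumption rather than as the article's method, is a normalized Levenshtein similarity between a candidate domain and each known disposable domain. The helper names `levenshtein`, `similarity`, and `isLookalike` are hypothetical.

```typescript
// Classic dynamic-programming edit distance, keeping only two rows in memory.
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j)
  for (let i = 1; i <= a.length; i++) {
    const curr = [i]
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution
      )
    }
    prev = curr
  }
  return prev[b.length]
}

// Similarity in [0, 1]: 1 means identical strings.
function similarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length)
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest
}

// True when the candidate is close to, but not the same as, a known domain.
function isLookalike(candidate: string, known: Iterable<string>, threshold = 0.8): boolean {
  for (const domain of known) {
    if (candidate !== domain && similarity(candidate, domain) >= threshold) return true
  }
  return false
}
```

With a threshold of 0.8, `mailinator1.com` counts as a lookalike of `mailinator.com` (one edit over 15 characters), while `mailinator.net` does not (three edits over 14 characters). TLD swaps therefore still need the explicit variation logic shown earlier.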

Implementation (Node.js + SQL)


1) Maintain a disposable domains table


create table if not exists disposable_domains (
  domain text primary key,
  source text,  -- 'public-list', 'internal-discovery', 'manual'
  confidence_score int check (confidence_score between 0 and 100),  -- 100 = definitely disposable
  first_seen timestamptz default now(),
  last_updated timestamptz default now(),
  is_active boolean default true
);

-- Index for fast lookups
create index idx_disposable_domains_active on disposable_domains(domain) where is_active = true;

-- Example upsert (run via ETL or cron)
insert into disposable_domains(domain, source, confidence_score)
values ('mailinator.com','public-list',95),
       ('10minutemail.com','public-list',90),
       ('guerrillamail.com','public-list',85)
on conflict (domain) do update set
  confidence_score = excluded.confidence_score,
  last_updated = now();

2) Check on registration server-side


import { sql } from '@/lib/db'

export interface EmailValidationResult {
  isValid: boolean
  isDisposable: boolean
  confidence: number
  signals: string[]
  recommendation: 'allow' | 'block' | 'review'
}

export async function validateEmail(email: string): Promise<EmailValidationResult> {
  const domain = extractDomain(email)
  if (!domain) {
    return {
      isValid: false,
      isDisposable: false,
      confidence: 0,
      signals: ['invalid_email_format'],
      recommendation: 'block'
    }
  }

  // Check against disposable domains
  const disposableCheck = await sql`
    select confidence_score, source
    from disposable_domains
    where domain = ${domain} and is_active = true
  `

  if (disposableCheck.length > 0) {
    const { confidence_score, source } = disposableCheck[0]
    return {
      isValid: true,
      isDisposable: true,
      confidence: confidence_score,
      signals: [`disposable_domain_${source}`],
      recommendation: confidence_score > 80 ? 'block' : 'review'
    }
  }

  // Additional checks (MX, SPF, etc.) could go here
  return {
    isValid: true,
    isDisposable: false,
    confidence: 0,
    signals: [],
    recommendation: 'allow'
  }
}

export function extractDomain(email: string): string | null {
  const at = email.lastIndexOf('@')
  if (at < 0) return null
  return email.slice(at + 1).toLowerCase()
}
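
The `validateEmail` function leaves the MX and SPF checks as a placeholder. A minimal sketch of how those signals might be gathered with Node's built-in `dns` promises API follows. The `DnsResolver` interface, `collectDnsSignals`, and the signal names are assumptions for illustration. Missing SPF or DMARC is a weak signal by itself, since many legitimate small domains publish neither.

```typescript
import { promises as dns } from 'node:dns'

// Minimal resolver surface so tests can inject a fake instead of hitting DNS.
interface DnsResolver {
  resolveMx(host: string): Promise<{ exchange: string; priority: number }[]>
  resolveTxt(host: string): Promise<string[][]>
}

async function collectDnsSignals(domain: string, resolver: DnsResolver = dns): Promise<string[]> {
  const signals: string[] = []

  // Treat lookup failures (ENOTFOUND, ENODATA, and so on) as "no records"
  const mx = await resolver.resolveMx(domain).catch(() => [])
  if (mx.length === 0) signals.push('no_mx_records')

  // TXT records arrive as arrays of chunks; join the chunks before matching
  const txt = await resolver.resolveTxt(domain).catch(() => [] as string[][])
  if (!txt.some(chunks => chunks.join('').toLowerCase().startsWith('v=spf1'))) {
    signals.push('missing_spf')
  }

  // DMARC policies are published on the _dmarc subdomain, not the apex
  const dmarc = await resolver.resolveTxt(`_dmarc.${domain}`).catch(() => [] as string[][])
  if (!dmarc.some(chunks => chunks.join('').toLowerCase().startsWith('v=dmarc1'))) {
    signals.push('missing_dmarc')
  }

  return signals
}
```

The returned strings could be appended to `signals` in `EmailValidationResult` and used to move a borderline address from `allow` to `review`.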

3) DNS/MX quick validation (CLI for ops)


#!/bin/bash
# disposable-check.sh - Quick domain validation

DOMAIN="$1"

if [ -z "$DOMAIN" ]; then
  echo "Usage: $0 <domain>"
  exit 1
fi

echo "Checking domain: $DOMAIN"

# MX records
MX=$(dig +short MX "$DOMAIN")
if [ -z "$MX" ]; then
  echo "⚠️  No MX records found"
else
  echo "✅ MX records: $MX"
fi

# SPF record
SPF=$(dig +short TXT "$DOMAIN" | grep -i "v=spf1" | head -1)
if [ -z "$SPF" ]; then
  echo "⚠️  No SPF record found"
else
  echo "✅ SPF: $SPF"
fi

# Domain age (rough estimate)
WHOIS=$(whois "$DOMAIN" | grep -i "Creation Date" | head -1)
if [ -z "$WHOIS" ]; then
  echo "⚠️  Cannot determine domain age"
else
  echo "✅ $WHOIS"
fi

# Known disposable check
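# -x matches the whole line and -F treats the domain as a literal string, so dots are not regex wildcards
if curl -s "https://raw.githubusercontent.com/disposable-email-domains/disposable-email-domains/master/domains.txt" | grep -qxF "$DOMAIN"; then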
  echo "🚨 KNOWN DISPOSABLE DOMAIN"
fi
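
The shell check above matches only the exact domain. A sketch of the same lookup in application code, assuming the list is a plain-text file with one domain per line, might also walk up the parent domains so that `inbox.mailinator.com` matches a `mailinator.com` entry. `parseBlocklist` and `matchesBlocklist` are hypothetical names, and the `#` comment handling is an assumption about the file format.

```typescript
// Parse a newline-delimited domain list into a Set, skipping blanks and # comments.
function parseBlocklist(text: string): Set<string> {
  const domains = new Set<string>()
  for (const line of text.split(/\r?\n/)) {
    const entry = line.trim().toLowerCase()
    if (entry && !entry.startsWith('#')) domains.add(entry)
  }
  return domains
}

// Check the domain and each parent domain, stopping before the bare TLD.
// Returns the matching blocklist entry, or null when nothing matches.
function matchesBlocklist(domain: string, blocklist: Set<string>): string | null {
  const labels = domain.toLowerCase().split('.')
  for (let i = 0; i < labels.length - 1; i++) {
    const candidate = labels.slice(i).join('.')
    if (blocklist.has(candidate)) return candidate
  }
  return null
}
```

Loading the list once into a `Set` and refreshing it on a schedule avoids downloading the file on every check, which the shell version does.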

4) Optional SMTP reachability probe


import { createConnection } from 'net'

export async function checkSMTPCapability(domain: string): Promise<{
  canConnect: boolean
  supportsTLS: boolean
  error?: string
}> {
  return new Promise((resolve) => {
    // Probes the domain itself; in practice, resolve its MX host first and connect to that
    const client = createConnection(25, domain)

    let response = ''
    let supportsTLS = false

    client.setTimeout(5000) // 5 second timeout

    let ehloSent = false

    client.on('data', (data) => {
      response += data.toString()
      if (!ehloSent && response.includes('220') && response.includes('ESMTP')) {
        // Send EHLO once to check TLS support (SMTP lines end with CRLF)
        ehloSent = true
        client.write('EHLO example.com\r\n')
      }
      if (response.includes('STARTTLS')) {
        supportsTLS = true
      }
    })

    client.on('timeout', () => {
      client.destroy()
      resolve({ canConnect: false, supportsTLS: false, error: 'timeout' })
    })

    client.on('error', (err) => {
      resolve({ canConnect: false, supportsTLS: false, error: err.message })
    })

    client.on('connect', () => {
      // Wait for banner and check
      setTimeout(() => {
        client.destroy()
        resolve({ canConnect: true, supportsTLS })
      }, 1000)
    })
  })
}

User Experience and Policy


Soft Blocks vs Hard Blocks


Prefer soft blocks with clear messaging:

// Example soft block response
const softBlockResponse = {
  success: false,
  error: {
    code: 'DISPOSABLE_EMAIL',
    message: 'We detected a temporary email address. Please use a permanent email to continue.',
    suggestion: 'Try Gmail, Outlook, or your work email address.'
  }
}

Reserve hard blocks for high-confidence cases (95%+). For medium confidence (50-80%), use:

  • Step-up verification: SMS, phone, or payment method.
  • Delayed activation: Email verification required after signup.
  • Rate limiting: Limit actions until email is verified.
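
The policy above can be sketched as a small decision function. The text does not say how to treat scores between 80% and 95%, so the `soft_block` band below is an assumption, as are the type and function names. All thresholds are parameters so they can be tuned.

```typescript
type SignupAction = 'allow' | 'step_up' | 'soft_block' | 'hard_block'

interface PolicyThresholds {
  stepUpAt: number    // start of the medium-confidence band
  softBlockAt: number // assumption: band between medium confidence and a hard block
  hardBlockAt: number // high-confidence cases only
}

const defaultThresholds: PolicyThresholds = { stepUpAt: 50, softBlockAt: 80, hardBlockAt: 95 }

// Map a disposable-confidence score (0-100) to a signup action.
function decideSignupAction(
  confidence: number,
  isAllowlisted = false,
  t: PolicyThresholds = defaultThresholds
): SignupAction {
  if (isAllowlisted) return 'allow' // explicit exceptions always win
  if (confidence >= t.hardBlockAt) return 'hard_block'
  if (confidence >= t.softBlockAt) return 'soft_block'
  if (confidence >= t.stepUpAt) return 'step_up'
  return 'allow'
}
```

Here `step_up` stands for any of the three measures listed above, and `soft_block` corresponds to returning a response like `softBlockResponse`.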

Exception Handling


-- Temporary allowlist for business-critical cases
create table email_allowlist (
  email_pattern text primary key,  -- 'user@company.com' or '%@trusted-domain.com'
  reason text,
  expires_at timestamptz,
  added_by text,
  is_active boolean default true
);

-- Check if email is allowlisted
select exists(
  select 1 from email_allowlist
  where (email_pattern = :email or :email like email_pattern)
  and is_active = true
  and (expires_at is null or expires_at > now())
) as is_allowlisted;
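
For services that cache the allowlist in memory, the `LIKE` semantics of the query above can be approximated in application code. This sketch assumes patterns use only the `%` and `_` wildcards with no escape character, and it lowercases both sides, which is a deliberate difference from PostgreSQL's case-sensitive `LIKE`. The `AllowlistEntry` shape and function names are hypothetical.

```typescript
interface AllowlistEntry {
  emailPattern: string
  expiresAt: Date | null
  isActive: boolean
}

// Convert a SQL LIKE pattern into an anchored RegExp:
// % becomes ".*", _ becomes ".", everything else is matched literally.
function likeToRegExp(pattern: string): RegExp {
  const body = Array.from(pattern.toLowerCase())
    .map(ch => {
      if (ch === '%') return '.*'
      if (ch === '_') return '.'
      return ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
    })
    .join('')
  return new RegExp(`^${body}$`)
}

// True when any active, unexpired entry matches the email.
function isAllowlisted(email: string, entries: AllowlistEntry[], now: Date = new Date()): boolean {
  const normalized = email.toLowerCase()
  return entries.some(entry =>
    entry.isActive &&
    (entry.expiresAt === null || entry.expiresAt > now) &&
    likeToRegExp(entry.emailPattern).test(normalized)
  )
}
```

Because `_` is a single-character wildcard in `LIKE`, an exact-address pattern such as `john_doe@company.com` also matches `johnxdoe@company.com`, both here and in the SQL version.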

Monitoring and Alerting



-- Disposable email share by week
with weekly_stats as (
  select
    date_trunc('week', created_at) as week,
    count(*) as total_signups,
    sum(case when is_disposable then 1 else 0 end) as disposable_count
  from user_registrations
  where created_at >= now() - interval '12 weeks'
  group by 1
),
weekly_rates as (
  -- Compute the percentage in its own CTE: a column alias cannot be
  -- referenced from the same select list that defines it
  select
    *,
    round(100.0 * disposable_count / total_signups, 2) as disposable_percentage
  from weekly_stats
),
with_prev as (
  select
    *,
    lag(disposable_percentage) over (order by week) as prev_percentage
  from weekly_rates
)
select
  week,
  total_signups,
  disposable_count,
  disposable_percentage,
  prev_percentage,
  -- Trend indicator (the first week has no previous value and falls through to STABLE)
  case
    when disposable_percentage > prev_percentage * 1.5 then '📈 SPIKE'
    when disposable_percentage < prev_percentage * 0.7 then '📉 DROP'
    else '➡️ STABLE'
  end as trend
from with_prev
order by week desc;
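
To make the trend arithmetic concrete, here is a small sketch that applies the same 1.5x and 0.7x multipliers as the query above. The function name `classifyTrend` is hypothetical, and treating a missing previous week as stable is an assumption that mirrors how a SQL comparison against NULL falls through to the ELSE branch.

```typescript
type Trend = 'SPIKE' | 'DROP' | 'STABLE'

// Classify week-over-week movement of the disposable percentage.
function classifyTrend(current: number, previous: number | null): Trend {
  if (previous === null) return 'STABLE'       // first week: nothing to compare against
  if (current > previous * 1.5) return 'SPIKE' // more than 50% above last week
  if (current < previous * 0.7) return 'DROP'  // more than 30% below last week
  return 'STABLE'
}

console.log(classifyTrend(9, 4))   // SPIKE (9 > 4 * 1.5 = 6)
console.log(classifyTrend(6, 10))  // DROP (6 < 10 * 0.7 = 7)
console.log(classifyTrend(12, 10)) // STABLE
```

Because the comparison is relative, a move from 1% to 2% counts as a spike; a minimum absolute threshold may be worth adding for low-volume weeks.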

Real-time Alerts


#!/bin/bash
# disposable-alert.sh - Monitor for spikes in disposable usage

# Config
THRESHOLD_PERCENT=15  # Alert if >15% of signups are disposable
CHECK_HOURS=1         # Check last hour
DB_HOST="localhost"
DB_NAME="analytics"

# Query current rate
CURRENT_RATE=$(psql -h "$DB_HOST" -d "$DB_NAME" -tA -c "
  select coalesce(
    100.0 * sum(case when is_disposable then 1 else 0 end) / count(*),
    0
  )
  from user_registrations
  where created_at > now() - interval '$CHECK_HOURS hours'
")

# Check threshold
if (( $(echo "$CURRENT_RATE > $THRESHOLD_PERCENT" | bc -l) )); then
  echo "$(date): ALERT - Disposable rate at ${CURRENT_RATE}% (threshold: ${THRESHOLD_PERCENT}%)"
  # Send Slack notification, email, or trigger PagerDuty
  curl -X POST -H 'Content-type: application/json' \
    --data "{\"text\":\"🚨 Disposable email spike: ${CURRENT_RATE}% in last hour\"}" \
    "$SLACK_WEBHOOK_URL"
fi

Dashboard Metrics


Track these KPIs:

  • Daily/weekly disposable % — target <5%.
  • Top disposable domains — identify new threats.
  • Conversion rates — disposable vs permanent email users.
  • Bounce rates — correlation with disposable usage.

-- Top disposable domains in last 30 days
select
  email_domain,
  count(*) as usage_count,
  max(created_at) as last_seen
from user_registrations
where is_disposable = true
  and created_at > now() - interval '30 days'
group by 1
order by 2 desc
limit 20;
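
The conversion-rate KPI listed above can also be computed in application code when the data is already loaded. This is a sketch under assumptions: the `Registration` shape and the meaning of `converted` are hypothetical and would depend on how your platform defines conversion.

```typescript
// Hypothetical record shape; field names are illustrative.
interface Registration {
  isDisposable: boolean
  converted: boolean // e.g. the user activated or became a paying customer
}

interface ConversionComparison {
  disposableRate: number | null // null when the cohort is empty
  permanentRate: number | null
}

// Share of converted users in a cohort, or null for an empty cohort.
function conversionRate(rows: Registration[]): number | null {
  if (rows.length === 0) return null
  return rows.filter((r) => r.converted).length / rows.length
}

// Compare conversion between disposable and permanent email users.
function compareConversion(rows: Registration[]): ConversionComparison {
  return {
    disposableRate: conversionRate(rows.filter((r) => r.isDisposable)),
    permanentRate: conversionRate(rows.filter((r) => !r.isDisposable)),
  }
}
```

Returning null for an empty cohort avoids reporting a misleading 0% on days with no disposable signups.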

FAQ and Edge Cases


Corporate Testing Domains


Many companies sign up with test addresses such as test@company.com or qa@internal.company.com.


Solution: Temporary allowlist with expiration:

// Add to allowlist for 30 days
await sql`
  insert into email_allowlist (email_pattern, reason, expires_at, added_by)
  values ('qa@internal.company.com', 'Corporate testing', now() + interval '30 days', 'admin')
`

Catch-all Corporate Domains


Large organizations often run catch-all domains, where mail sent to any address at the domain is accepted and routed to a central inbox.


Detection: High volume + SMTP reachability + low bounce rates.


-- Identify potential catch-all domains
select
  email_domain,
  count(*) as signup_count,
  avg(bounce_rate) as avg_bounce_rate
from user_registrations
where created_at > now() - interval '30 days'
group by 1
having count(*) > 100  -- High volume
  and avg(bounce_rate) < 0.05  -- Low bounces
order by 2 desc;

Internationalized Domains (IDN)


Domains can contain non-ASCII characters (e.g., münchen.de → xn--mnchen-3ya.de).


Solution: Normalize to punycode before checking:

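// Uses Node's built-in url.domainToASCII in place of the punycode module,
// whose built-in version is deprecated and has no named `punycode` export.
import { domainToASCII } from 'node:url'

// Convert a domain to lowercase ASCII (punycode) form before list lookups.
export function normalizeDomain(domain: string): string {
  const lower = domain.trim().toLowerCase()
  // domainToASCII returns '' for invalid input instead of throwing
  return domainToASCII(lower) || lower
}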
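
Once the domain is in ASCII form, the list lookup itself is a set membership test. The sketch below assumes the disposable list is held in a `Set` of lowercase ASCII domains; `extractAsciiDomain` and `isListedDisposable` are hypothetical helper names, and `domainToASCII` is Node's built-in from `node:url`.

```typescript
import { domainToASCII } from 'node:url'

// Extract the domain part of an address and convert it to lowercase ASCII (punycode).
// Returns null when the input has no usable domain.
function extractAsciiDomain(email: string): string | null {
  const at = email.lastIndexOf('@')
  if (at < 1 || at === email.length - 1) return null
  const ascii = domainToASCII(email.slice(at + 1).trim().toLowerCase())
  return ascii === '' ? null : ascii // domainToASCII returns '' for invalid domains
}

// Check the normalized domain against a set of known disposable domains.
function isListedDisposable(email: string, disposable: ReadonlySet<string>): boolean {
  const domain = extractAsciiDomain(email)
  return domain !== null && disposable.has(domain)
}

console.log(extractAsciiDomain('user@münchen.de')) // xn--mnchen-3ya.de
```

Storing list entries in punycode form as well keeps both sides of the comparison consistent.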

False Positives


Common issues:

  • Legitimate temp emails: Alumni associations, conference registrations.
  • Corporate aliases: noreply@company.com used for notifications.
  • Educational institutions: Student email forwarding.

Mitigation:

  • Manual review queues for edge cases.
  • Allowlist management for known good domains.
  • Confidence scoring vs binary decisions.

-- Manual review queue for borderline cases
select user_id, email, confidence_score, created_at
from user_registrations
where is_disposable = true
  and confidence_score between 50 and 80  -- Medium confidence
  and created_at > now() - interval '24 hours'
order by created_at desc;
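
Confidence scoring, as opposed to a binary decision, could be sketched as a weighted sum of signals. The signal names and the weights (70, 20, 10) below are illustrative assumptions rather than calibrated values; in practice they would be tuned against labeled signups.

```typescript
// Hypothetical detection signals; names and weights are illustrative assumptions.
interface Signals {
  onDisposableList: boolean // domain appears in a curated disposable list
  hasMxRecords: boolean     // domain publishes at least one MX record
  hasSpfOrDmarc: boolean    // domain publishes SPF or DMARC TXT records
}

// Combine signals into a 0-100 confidence that the address is disposable.
function scoreDisposable(s: Signals): number {
  let score = 0
  if (s.onDisposableList) score += 70 // strongest single signal
  if (!s.hasMxRecords) score += 20
  if (!s.hasSpfOrDmarc) score += 10
  return Math.min(score, 100)
}

console.log(scoreDisposable({ onDisposableList: true, hasMxRecords: true, hasSpfOrDmarc: true })) // 70
```

With these example weights, a list match alone scores 70 and lands in the 50-80 review band, while a list match combined with missing MX and SPF/DMARC records reaches 100.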

Best Practices


1. Layered Defense: Combine multiple signals rather than relying on one.

2. Regular Updates: Refresh disposable domain lists weekly.

3. A/B Testing: Test different thresholds and policies.

4. User Education: Clear messaging about why permanent emails are preferred.

5. Monitoring: Set up alerts for spikes and trends.

6. Privacy Compliance: Ensure checks comply with regional laws (GDPR, CCPA).
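
For the weekly refresh in item 2, a minimal sketch might parse a newline-delimited list into a set and flag it as stale after seven days. It assumes one domain per line with optional `#` comment lines; `parseDomainList` and `isStale` are hypothetical names, and fetching and storing the list are left out.

```typescript
// Parse a newline-delimited domain list into a Set.
// Blank lines and lines starting with '#' are skipped; entries are lowercased.
function parseDomainList(text: string): Set<string> {
  const domains = new Set<string>()
  for (const line of text.split(/\r?\n/)) {
    const entry = line.trim().toLowerCase()
    if (entry !== '' && !entry.startsWith('#')) domains.add(entry)
  }
  return domains
}

const WEEK_MS = 7 * 24 * 60 * 60 * 1000

// True when the list was last refreshed more than a week ago.
function isStale(lastRefreshed: Date, now: Date = new Date()): boolean {
  return now.getTime() - lastRefreshed.getTime() > WEEK_MS
}

const list = parseDomainList('# sample\nMailinator.com\n\ntemp-mail.org\n')
console.log(list.size) // 2
```

Building the new set fully before swapping it in avoids serving lookups from a half-loaded list.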


Integration Examples


Express.js Middleware


import express from 'express'
import { validateEmail } from './email-validator'

const app = express()
app.use(express.json()) // parse JSON bodies so req.body is populated

app.post('/api/signup', async (req, res) => {
  const { email } = req.body

  const validation = await validateEmail(email)

  if (validation.recommendation === 'block') {
    return res.status(400).json(validation)
  }

  if (validation.recommendation === 'review') {
    // Queue for manual review or step-up auth
    // (req.session assumes session middleware such as express-session is installed)
    req.session.pendingReview = validation
  }

  // Continue with signup...
})

Python FastAPI


from fastapi import FastAPI, HTTPException
from .email_validator import validate_email  # project-local helper

app = FastAPI()

@app.post("/signup")
async def signup(email: str):  # a bare str parameter is read from the query string
    validation = await validate_email(email)

    if validation["recommendation"] == "block":
        raise HTTPException(
            status_code=400,
            detail=validation["error_message"]
        )

    if validation["recommendation"] == "review":
        # Trigger additional verification
        pass

    return {"message": "Signup successful"}

This comprehensive approach balances fraud prevention with user experience, ensuring legitimate users aren't unnecessarily blocked while protecting your platform from abuse.


Tags: disposable-email, temporary-addresses, platform-protection, user-quality