Disposable Email Detection: Protecting Your Platform from Temporary Addresses
Implement advanced techniques to detect and handle disposable email addresses while maintaining user experience.
Disposable/temporary inboxes hurt activation funnels, referral programs, trial abuse protection, and deliverability. Below is a pragmatic approach with detection signals, sample code, SQL, and ops guidance.
Disposable Email Detection Overview
Why It Matters
- Lower user quality and LTV due to throwaway accounts that never convert.
- Increased fraud and promo abuse — free trials, coupons, and referral bonuses get exploited.
- Bounce risk and sender reputation damage — high bounce rates tank deliverability.
- Operational overhead — support tickets, manual reviews, and account cleanup.
Detection Signals
1. Curated Disposable Domain Lists
Maintain a table of known disposable providers. Sources include:
- Public lists: GitHub repos like disposable-email-domains (1000+ domains).
- Internal discovery: Track domains that appear in signups but have no MX records or suspicious patterns.
- Community feeds: Abuse.ch, Spamhaus, or custom crawlers.
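For application code, the same list check might look like the following TypeScript sketch. It assumes the list has already been fetched as newline-separated text; `parseDomainList` and `isListedDomain` are hypothetical helper names, not part of any library. It also treats a subdomain of a listed domain as listed, which is an assumption you may want to relax.

```typescript
// Hypothetical helpers for illustration; not part of any library.

/** Parse a newline-separated list, skipping blank lines and '#' comments. */
function parseDomainList(text: string): Set<string> {
  const domains = new Set<string>()
  for (const line of text.split(/\r?\n/)) {
    const entry = line.trim().toLowerCase()
    if (entry !== '' && !entry.startsWith('#')) domains.add(entry)
  }
  return domains
}

/** True if the domain or any parent domain (excluding the bare TLD) is listed. */
function isListedDomain(domain: string, listed: Set<string>): boolean {
  const labels = domain.trim().toLowerCase().split('.')
  // For "a.b.example.com" check that name, then "b.example.com", then "example.com".
  for (let i = 0; i < labels.length - 1; i++) {
    if (listed.has(labels.slice(i).join('.'))) return true
  }
  return false
}

const sampleList = parseDomainList('# sample entries\nmailinator.com\n10minutemail.com\n')
console.log(isListedDomain('mailinator.com', sampleList))       // true
console.log(isListedDomain('inbox.mailinator.com', sampleList)) // true
console.log(isListedDomain('example.com', sampleList))          // false
```

Matching whole labels rather than substrings avoids flagging an unrelated domain that merely contains a listed name.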
# Example: check if domain is in disposable list
curl -s https://raw.githubusercontent.com/disposable-email-domains/disposable-email-domains/master/domains.txt | grep -qxF "mailinator.com" && echo "disposable"
2. MX Record Anomalies
Disposable services often have:
- No MX records (mail delivery then falls back to the domain's A/AAAA record, which established providers rarely rely on).
- Suspicious MX targets (e.g., pointing to temp mail services).
- Missing SPF/DMARC (common for throwaway providers).
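A rough TypeScript sketch of collecting these three signals follows. The `DnsResolver` interface is a hypothetical abstraction introduced here so the logic can run without network access; its method shapes are modelled on the `resolveMx` and `resolveTxt` functions of Node's `dns` promises API. Treating every lookup failure as "no records" is a simplification; a production version would likely handle timeouts separately.

```typescript
// Record shape modelled on what Node's dns promises API returns for MX lookups.
interface MxRecord { exchange: string; priority: number }

// Hypothetical resolver interface so the logic can be exercised without the network.
interface DnsResolver {
  resolveMx(domain: string): Promise<MxRecord[]>
  resolveTxt(domain: string): Promise<string[][]>
}

interface MailDnsSignals {
  mxCount: number
  hasSpf: boolean
  hasDmarc: boolean
}

/** TXT records arrive as arrays of chunks; join the chunks before matching. */
function hasTxtPrefix(records: string[][], prefix: string): boolean {
  return records.some(chunks => chunks.join('').toLowerCase().startsWith(prefix))
}

/** Simplification: any lookup failure (NXDOMAIN, no data, timeout) counts as "no records". */
async function orEmpty<T>(lookup: Promise<T[]>): Promise<T[]> {
  try {
    return await lookup
  } catch {
    return []
  }
}

async function collectMailDnsSignals(domain: string, dns: DnsResolver): Promise<MailDnsSignals> {
  const [mx, spfTxt, dmarcTxt] = await Promise.all([
    orEmpty(dns.resolveMx(domain)),
    orEmpty(dns.resolveTxt(domain)),            // SPF: TXT on the domain itself
    orEmpty(dns.resolveTxt(`_dmarc.${domain}`)) // DMARC: TXT on the _dmarc subdomain
  ])
  return {
    mxCount: mx.length,
    hasSpf: hasTxtPrefix(spfTxt, 'v=spf1'),
    hasDmarc: hasTxtPrefix(dmarcTxt, 'v=dmarc1')
  }
}
```

In a Node.js service, the promises-based `dns` module should be usable as the `dns` argument, since it exposes `resolveMx` and `resolveTxt` with matching shapes.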
# Quick MX check for a domain
dig +short MX temp-mail.org
# Returns nothing or suspicious entries
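# SPF lives in the domain's own TXT records; DMARC is published at _dmarc.<domain>
dig +short TXT temp-mail.org | grep -i "v=spf1"
dig +short TXT _dmarc.temp-mail.org | grep -i "v=DMARC1"
3. Domain Age and Patterns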
- New domains (< 30 days old) are suspicious.
- Known disposable TLDs (.tk, .ml, .cf are often abused).
- Ephemeral patterns (domains that disappear after days).
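The age and TLD checks can be sketched in TypeScript as below, assuming the registration date has already been obtained from a WHOIS or RDAP lookup. The helper names are hypothetical; the 30-day threshold and the TLD list come from the bullets above.

```typescript
// Hypothetical helpers; the threshold and TLD list mirror the bullets above.
const RISKY_TLDS = new Set(['tk', 'ml', 'cf'])
const MS_PER_DAY = 24 * 60 * 60 * 1000

/** Whole days between the registration date and the reference date. */
function domainAgeDays(createdAt: Date, referenceDate: Date = new Date()): number {
  return Math.floor((referenceDate.getTime() - createdAt.getTime()) / MS_PER_DAY)
}

/** Last label of the domain, lower-cased, without the dot. */
function tldOf(domain: string): string {
  return domain.toLowerCase().split('.').pop() ?? ''
}

interface AgeSignals { ageDays: number; isNew: boolean; riskyTld: boolean }

function ageSignals(domain: string, createdAt: Date, referenceDate: Date = new Date()): AgeSignals {
  const ageDays = domainAgeDays(createdAt, referenceDate)
  return { ageDays, isNew: ageDays < 30, riskyTld: RISKY_TLDS.has(tldOf(domain)) }
}

const referenceDate = new Date('2024-03-01T00:00:00Z')
console.log(ageSignals('freshmail.tk', new Date('2024-02-20T00:00:00Z'), referenceDate))
// { ageDays: 10, isNew: true, riskyTld: true }
```

Passing the reference date explicitly keeps the function deterministic, which makes the threshold easy to unit test.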
# Check domain age
whois temp-mail.org | grep -i "Creation Date\|Registry Expiry Date" | head -2
4. ASN and Hosting Intelligence
Disposable services often cluster on:
- High-risk ASNs (known for spam/VPN hosting).
- Cloud providers with lax verification.
- Geographic clusters (e.g., certain data centers).
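One way to spot such clusters in application code is to count recent signups per ASN and review the busiest ones. The TypeScript sketch below assumes each signup has already been enriched with an ASN; the `Signup` shape and `busiestAsns` name are hypothetical, and the ASNs in the sample data are from the range reserved for documentation.

```typescript
// Hypothetical shape: a signup already enriched with the ASN of its source IP.
interface Signup { emailDomain: string; asn: number }

/** Count signups per ASN and return those at or above minCount, busiest first. */
function busiestAsns(signups: Signup[], minCount: number): Array<{ asn: number; count: number }> {
  const counts = new Map<number, number>()
  for (const s of signups) counts.set(s.asn, (counts.get(s.asn) ?? 0) + 1)
  return Array.from(counts.entries())
    .map(([asn, count]) => ({ asn, count }))
    .filter(entry => entry.count >= minCount)
    .sort((a, b) => b.count - a.count)
}

const sampleSignups: Signup[] = [
  { emailDomain: 'temp123.com', asn: 64500 },
  { emailDomain: 'temp456.com', asn: 64500 },
  { emailDomain: 'example.com', asn: 64501 },
  { emailDomain: 'temp789.com', asn: 64500 }
]
console.log(busiestAsns(sampleSignups, 2)) // [ { asn: 64500, count: 3 } ]
```

A high count alone is weak evidence, since large cloud and CDN networks carry plenty of legitimate traffic; it is better used to prioritise review than to block outright.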
-- Example: flag high-risk ASNs in user registrations
select user_id, email_domain, asn, country
from user_registrations r
join ip_geo g on g.ip = r.ip
where g.asn in (13335, 15169, 16276) -- Placeholder ASNs (Cloudflare, Google, OVH); substitute the ones your own data flags as risky
and r.created_at > now() - interval '24 hours';
Implementation (Node.js + SQL)
1) Maintain a disposable domains table
create table if not exists disposable_domains (
domain text primary key,
source text, -- 'public-list', 'internal-discovery', 'manual'
confidence_score int check (confidence_score between 0 and 100), -- 100 = definitely disposable
first_seen timestamptz default now(),
last_updated timestamptz default now(),
is_active boolean default true
);
-- Index for fast lookups
create index idx_disposable_domains_active on disposable_domains(domain) where is_active = true;
-- Example upsert (run via ETL or cron)
insert into disposable_domains(domain, source, confidence_score)
values ('mailinator.com','public-list',95),
('10minutemail.com','public-list',90),
('guerrillamail.com','public-list',85)
on conflict (domain) do update set
confidence_score = excluded.confidence_score,
last_updated = now();
Practical Implementation Examples
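Once a row from the disposable_domains table has been fetched for a signup's domain, the confidence score can drive a graduated response instead of a hard block. The TypeScript sketch below assumes the row shape from the table definition above; the cut-off values and the `actionForRow` name are hypothetical and should be tuned against your own false-positive data.

```typescript
// Row shape follows the disposable_domains table defined above.
interface DisposableDomainRow {
  domain: string
  source: string
  confidence_score: number // 0-100
  is_active: boolean
}

type SignupAction = 'allow' | 'challenge' | 'block'

// Hypothetical cut-offs for illustration only.
const BLOCK_AT = 90
const CHALLENGE_AT = 60

/** Map a lookup result (undefined when the domain is not in the table) to an action. */
function actionForRow(row: DisposableDomainRow | undefined): SignupAction {
  if (!row || !row.is_active) return 'allow'
  if (row.confidence_score >= BLOCK_AT) return 'block'
  if (row.confidence_score >= CHALLENGE_AT) return 'challenge'
  return 'allow'
}

console.log(actionForRow({ domain: 'mailinator.com', source: 'public-list', confidence_score: 95, is_active: true }))    // block
console.log(actionForRow({ domain: 'guerrillamail.com', source: 'public-list', confidence_score: 85, is_active: true })) // challenge
console.log(actionForRow(undefined))                                                                                      // allow
```

The 'challenge' tier (for example, requiring email confirmation before a trial starts) keeps the cost of a false positive low for legitimate users.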
Machine Learning Classifier
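The classifier in this section is a logistic regression: a weighted sum of features passed through a sigmoid. Such a model is sensitive to feature scale. With a weight of -0.4 on age in days, a 100-day-old domain contributes -40 to the sum and swamps every other signal. The TypeScript sketch below shows one way to scale features into [0, 1] first; the caps (365 days, 30 characters), the weights, and all function names are hypothetical.

```typescript
type FeatureVector = Record<string, number>

function sigmoid(z: number): number {
  return 1 / (1 + Math.exp(-z))
}

/** Clamp a raw value into [0, 1] given an assumed maximum. */
function scale(value: number, max: number): number {
  return Math.min(1, Math.max(0, value / max))
}

/** Weighted sum of features plus bias, squashed to a probability. */
function logisticScore(features: FeatureVector, weights: FeatureVector, bias: number): number {
  let z = bias
  for (const name of Object.keys(weights)) {
    z += (weights[name] ?? 0) * (features[name] ?? 0)
  }
  return sigmoid(z)
}

// Hypothetical caps: ages beyond a year and names beyond 30 characters are treated alike.
function toFeatureVector(raw: { domainAgeDays: number; domainLength: number; hasNumbers: boolean }): FeatureVector {
  return {
    domainAge: scale(raw.domainAgeDays, 365),
    domainLength: scale(raw.domainLength, 30),
    hasNumbers: raw.hasNumbers ? 1 : 0 // booleans become 0/1 explicitly
  }
}

// Illustrative weights only; real values would come from training.
const demoWeights: FeatureVector = { domainAge: -2.0, domainLength: -0.5, hasNumbers: 0.8 }
const freshScore = logisticScore(toFeatureVector({ domainAgeDays: 3, domainLength: 12, hasNumbers: true }), demoWeights, 0.5)
const agedScore = logisticScore(toFeatureVector({ domainAgeDays: 4000, domainLength: 12, hasNumbers: false }), demoWeights, 0.5)
console.log(freshScore > agedScore) // true: the newer, digit-bearing domain scores higher
```

With every input bounded, the learned weights stay comparable to each other and gradient descent with a fixed learning rate is less likely to diverge.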
// Advanced machine learning classifier for disposable email detection
interface EmailFeatures {
domain: string
domainLength: number
hasNumbers: boolean
hasHyphens: boolean
tld: string
subdomainCount: number
entropy: number
mxRecordCount: number
spfRecordExists: boolean
dmarcRecordExists: boolean
domainAge: number // in days
registrationPattern: string
suspiciousKeywords: string[]
similarityToKnownDisposable: number
trafficPatternScore: number
geographicRisk: number
}
interface MLModel {
weights: Record<string, number>
bias: number
featureNames: string[]
threshold: number
accuracy: number
lastUpdated: number
}
interface PredictionResult {
isDisposable: boolean
confidence: number
features: EmailFeatures
modelVersion: string
explanation: string[]
}
class DisposableEmailClassifier {
private model: MLModel
private featureExtractor: FeatureExtractor
private knownDisposableDomains: Set<string> = new Set()
private trainingData: Array<{ features: EmailFeatures; label: boolean }> = []
constructor() {
this.model = this.loadModel()
this.featureExtractor = new FeatureExtractor()
this.loadKnownDisposableDomains()
}
async predict(email: string): Promise<PredictionResult> {
const domain = this.extractDomain(email)
if (!domain) {
return {
isDisposable: false,
confidence: 0,
features: {} as EmailFeatures,
modelVersion: this.model.lastUpdated.toString(),
explanation: ['Invalid email format']
}
}
// Check against known disposable domains first (fast path)
if (this.knownDisposableDomains.has(domain)) {
return {
isDisposable: true,
confidence: 95,
features: {} as EmailFeatures,
modelVersion: this.model.lastUpdated.toString(),
explanation: ['Domain found in known disposable list']
}
}
// Extract features for ML prediction
const features = await this.featureExtractor.extractFeatures(domain)
// Get ML prediction
const mlScore = this.predictWithModel(features)
// Combine with rule-based checks
const ruleBasedScore = this.calculateRuleBasedScore(domain, features)
// Ensemble prediction
const combinedScore = (mlScore * 0.7) + (ruleBasedScore * 0.3)
const isDisposable = combinedScore > this.model.threshold
// Generate explanation
const explanation = this.generateExplanation(features, mlScore, ruleBasedScore)
return {
isDisposable,
confidence: Math.min(100, combinedScore * 100),
features,
modelVersion: this.model.lastUpdated.toString(),
explanation
}
}
async train(features: EmailFeatures[], labels: boolean[]): Promise<void> {
// Simple gradient descent training
const learningRate = 0.01
const epochs = 100
for (let epoch = 0; epoch < epochs; epoch++) {
let totalError = 0
for (let i = 0; i < features.length; i++) {
const prediction = this.predictWithModel(features[i])
const error = labels[i] ? prediction - 1 : prediction - 0
totalError += Math.abs(error)
// Update weights
for (const featureName of this.model.featureNames) {
const featureValue = features[i][featureName as keyof EmailFeatures] as number
this.model.weights[featureName] -= learningRate * error * featureValue
}
this.model.bias -= learningRate * error
}
// Early stopping
if (totalError < 0.01) break
}
this.model.lastUpdated = Date.now()
this.model.accuracy = this.evaluateModel(features, labels)
// Save updated model
this.saveModel()
}
private predictWithModel(features: EmailFeatures): number {
let score = this.model.bias
for (const featureName of this.model.featureNames) {
const weight = this.model.weights[featureName] || 0
const value = features[featureName as keyof EmailFeatures] as number
score += weight * value
}
// Sigmoid activation
return 1 / (1 + Math.exp(-score))
}
private calculateRuleBasedScore(domain: string, features: EmailFeatures): number {
let score = 0
// Domain length heuristic
if (features.domainLength < 5 || features.domainLength > 20) score += 0.3
// TLD risk assessment
const riskyTLDs = ['.tk', '.ml', '.cf', '.ga', '.gq']
// features.tld has no leading dot, so add one before comparing
if (riskyTLDs.includes('.' + features.tld)) score += 0.4
// Numbers in domain
if (features.hasNumbers) score += 0.2
// Entropy (randomness) - disposable domains often have high entropy
if (features.entropy > 3.5) score += 0.3
// MX record anomalies
if (features.mxRecordCount === 0) score += 0.5
if (features.mxRecordCount > 5) score += 0.2
// Missing SPF/DMARC
if (!features.spfRecordExists || !features.dmarcRecordExists) score += 0.2
// Domain age
if (features.domainAge < 30) score += 0.4
// Suspicious keywords
if (features.suspiciousKeywords.length > 0) score += 0.3
// Similarity to known disposable
if (features.similarityToKnownDisposable > 0.8) score += 0.4
return Math.min(1, score)
}
private generateExplanation(features: EmailFeatures, mlScore: number, ruleScore: number): string[] {
const explanations: string[] = []
if (mlScore > 0.7) explanations.push('High ML confidence score')
if (ruleScore > 0.6) explanations.push('Multiple rule-based indicators triggered')
if (features.domainLength < 5) explanations.push('Unusually short domain name')
if (features.domainLength > 20) explanations.push('Unusually long domain name')
if (features.entropy > 3.5) explanations.push('High domain name entropy (appears random)')
if (features.mxRecordCount === 0) explanations.push('No MX records found')
if (!features.spfRecordExists) explanations.push('Missing SPF record')
if (features.domainAge < 30) explanations.push('Very new domain registration')
if (features.suspiciousKeywords.length > 0) {
explanations.push(`Suspicious keywords detected: ${features.suspiciousKeywords.join(', ')}`)
}
return explanations
}
private loadModel(): MLModel {
// In production, load from database or file
return {
weights: {
domainLength: -0.1,
hasNumbers: 0.3,
hasHyphens: 0.2,
entropy: 0.4,
mxRecordCount: -0.2,
spfRecordExists: -0.3,
dmarcRecordExists: -0.2,
domainAge: -0.4,
similarityToKnownDisposable: 0.5,
trafficPatternScore: 0.3,
geographicRisk: 0.2
},
bias: -2.0,
featureNames: [
'domainLength', 'hasNumbers', 'hasHyphens', 'entropy',
'mxRecordCount', 'spfRecordExists', 'dmarcRecordExists',
'domainAge', 'similarityToKnownDisposable', 'trafficPatternScore', 'geographicRisk'
],
threshold: 0.6,
accuracy: 0.92,
lastUpdated: Date.now()
}
}
private saveModel(): void {
// Save model to database or file
console.log('Model saved with accuracy:', this.model.accuracy)
}
private loadKnownDisposableDomains(): void {
// Load from database or external API
const disposableDomains = [
'mailinator.com', '10minutemail.com', 'guerrillamail.com',
'tempmail.com', 'throwaway.email', 'dispostable.com'
]
disposableDomains.forEach(domain => this.knownDisposableDomains.add(domain))
}
private extractDomain(email: string): string | null {
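const atIndex = email.lastIndexOf('@')
// Reject input with no '@', an empty local part, or an empty domain
if (atIndex < 1 || atIndex === email.length - 1) return null
return email.slice(atIndex + 1).toLowerCase()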
}
private evaluateModel(features: EmailFeatures[], labels: boolean[]): number {
let correct = 0
for (let i = 0; i < features.length; i++) {
const prediction = this.predictWithModel(features[i])
const predicted = prediction > this.model.threshold
if (predicted === labels[i]) correct++
}
// Avoid NaN when called with an empty data set
return features.length === 0 ? 0 : correct / features.length
}
}
class FeatureExtractor {
async extractFeatures(domain: string): Promise<EmailFeatures> {
const features: EmailFeatures = {
domain,
domainLength: domain.length,
hasNumbers: /\d/.test(domain),
hasHyphens: domain.includes('-'),
tld: domain.split('.').pop() || '',
subdomainCount: Math.max(0, domain.split('.').length - 2), // labels beyond the name and a single-label TLD
entropy: this.calculateEntropy(domain),
mxRecordCount: await this.getMXRecordCount(domain),
spfRecordExists: await this.checkSPFRecord(domain),
dmarcRecordExists: await this.checkDMARCRecord(domain),
domainAge: await this.getDomainAge(domain),
registrationPattern: this.analyzeRegistrationPattern(domain),
suspiciousKeywords: this.findSuspiciousKeywords(domain),
similarityToKnownDisposable: this.calculateSimilarityToKnownDisposable(domain),
trafficPatternScore: await this.analyzeTrafficPatterns(domain),
geographicRisk: await this.assessGeographicRisk(domain)
}
return features
}
private calculateEntropy(domain: string): number {
const charCounts = new Map<string, number>()
for (const char of domain) {
charCounts.set(char, (charCounts.get(char) || 0) + 1)
}
let entropy = 0
const length = domain.length
for (const count of charCounts.values()) {
const probability = count / length
entropy -= probability * Math.log2(probability)
}
return entropy
}
private async getMXRecordCount(domain: string): Promise<number> {
// In production, use DNS lookup
// For demo, simulate based on domain patterns
if (domain.includes('temp') || domain.includes('mail')) return 0
return Math.floor(Math.random() * 3) + 1
}
private async checkSPFRecord(domain: string): Promise<boolean> {
// In production, query DNS TXT records
return Math.random() > 0.3 // 70% of domains have SPF
}
private async checkDMARCRecord(domain: string): Promise<boolean> {
// In production, query DNS TXT records for _dmarc.domain
return Math.random() > 0.5 // 50% of domains have DMARC
}
private async getDomainAge(domain: string): Promise<number> {
// In production, use WHOIS lookup
// For demo, simulate based on domain characteristics
if (domain.length < 10) return Math.floor(Math.random() * 30) + 1 // 1-30 days
if (domain.includes('temp')) return Math.floor(Math.random() * 7) + 1 // 1-7 days
return Math.floor(Math.random() * 365) + 30 // 30-395 days
}
private analyzeRegistrationPattern(domain: string): string {
// Analyze domain registration patterns
if (domain.length < 8) return 'short'
if (domain.includes('temp') || domain.includes('mail')) return 'temporary'
if (/\d{4,}/.test(domain)) return 'numeric'
if (domain.split('.').length > 2) return 'multi_subdomain'
return 'standard'
}
private findSuspiciousKeywords(domain: string): string[] {
const suspiciousWords = [
'temp', 'mail', 'throwaway', 'disposable', 'fake', 'test',
'demo', 'sample', 'example', 'trash', 'junk', 'spam'
]
return suspiciousWords.filter(word => domain.includes(word))
}
private calculateSimilarityToKnownDisposable(domain: string): number {
// Calculate string similarity to known disposable domains
const knownDisposable = ['mailinator', 'tempmail', 'guerrillamail', '10minutemail']
let maxSimilarity = 0
for (const disposable of knownDisposable) {
const similarity = this.calculateStringSimilarity(domain, disposable)
maxSimilarity = Math.max(maxSimilarity, similarity)
}
return maxSimilarity
}
private calculateStringSimilarity(str1: string, str2: string): number {
// Simple Levenshtein distance ratio
const longer = str1.length > str2.length ? str1 : str2
const shorter = str1.length > str2.length ? str2 : str1
if (longer.length === 0) return 1.0
const editDistance = this.levenshteinDistance(longer, shorter)
return (longer.length - editDistance) / longer.length
}
private levenshteinDistance(str1: string, str2: string): number {
const matrix = Array(str2.length + 1).fill(null).map(() => Array(str1.length + 1).fill(null))
for (let i = 0; i <= str1.length; i++) matrix[0][i] = i
for (let j = 0; j <= str2.length; j++) matrix[j][0] = j
for (let j = 1; j <= str2.length; j++) {
for (let i = 1; i <= str1.length; i++) {
const indicator = str1[i - 1] === str2[j - 1] ? 0 : 1
matrix[j][i] = Math.min(
matrix[j][i - 1] + 1, // deletion
matrix[j - 1][i] + 1, // insertion
matrix[j - 1][i - 1] + indicator // substitution
)
}
}
return matrix[str2.length][str1.length]
}
private async analyzeTrafficPatterns(domain: string): Promise<number> {
// Analyze traffic patterns for the domain
// In production, use historical traffic data
// For demo, simulate based on domain characteristics
if (domain.includes('temp')) return 0.8 // High risk
if (domain.length < 10) return 0.6 // Medium risk
return 0.2 // Low risk
}
private async assessGeographicRisk(domain: string): Promise<number> {
// Assess geographic risk based on domain registration location
// In production, use WHOIS data or IP geolocation
// For demo, simulate based on TLD
const highRiskTLDs = ['.ru', '.cn', '.ir', '.kp']
const tld = domain.split('.').pop() || ''
if (highRiskTLDs.includes('.' + tld)) return 0.8
return 0.3
}
}
// Integration with email validation service
const disposableClassifier = new DisposableEmailClassifier()
// Enhanced email validation with ML
export async function validateEmailWithML(email: string): Promise<{
isValid: boolean
isDisposable: boolean
confidence: number
riskLevel: 'low' | 'medium' | 'high' | 'critical'
explanation: string[]
recommendations: string[]
}> {
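// validateEmail is assumed to be the application's existing syntax-level validator (not shown here)
const basicValidation = await validateEmail(email)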
if (!basicValidation.isValid) {
return {
isValid: false,
isDisposable: false,
confidence: 100,
riskLevel: 'low',
explanation: ['Invalid email format'],
recommendations: ['Please enter a valid email address']
}
}
// Run ML classification
const mlPrediction = await disposableClassifier.predict(email)
// Determine risk level
let riskLevel: 'low' | 'medium' | 'high' | 'critical' = 'low'
if (mlPrediction.confidence > 90) riskLevel = 'critical'
else if (mlPrediction.confidence > 70) riskLevel = 'high'
else if (mlPrediction.confidence > 40) riskLevel = 'medium'
// Generate recommendations
const recommendations: string[] = []
if (mlPrediction.isDisposable) {
recommendations.push('Please use a permanent email address')
recommendations.push('Consider using Gmail, Outlook, or your work email')
if (riskLevel === 'critical') {
recommendations.push('This email domain is known to be disposable')
} else {
recommendations.push('This email domain shows suspicious characteristics')
}
}
return {
isValid: true,
isDisposable: mlPrediction.isDisposable,
confidence: mlPrediction.confidence,
riskLevel,
explanation: mlPrediction.explanation,
recommendations
}
}
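// Express.js route for email validation
// Assumes `app` is an Express application with JSON body parsing enabled,
// e.g. const app = express(); app.use(express.json())
app.post('/api/validate-email', async (req, res) => {
try {
const { email } = req.body
if (typeof email !== 'string' || email.length === 0) {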
return res.status(400).json({ error: 'Email address required' })
}
const validation = await validateEmailWithML(email)
res.json({
email,
validation,
timestamp: new Date().toISOString()
})
} catch (error) {
console.error('Email validation error:', error)
res.status(500).json({ error: 'Validation service unavailable' })
}
})
// Batch validation endpoint
app.post('/api/validate-emails', async (req, res) => {
try {
const { emails } = req.body
if (!Array.isArray(emails)) {
return res.status(400).json({ error: 'Emails array required' })
}
const validations = await Promise.all(
emails.map(email => validateEmailWithML(email))
)
const results = emails.map((email, index) => ({
email,
validation: validations[index]
}))
res.json({
results,
summary: {
total: emails.length,
valid: results.filter(r => r.validation.isValid).length,
disposable: results.filter(r => r.validation.isDisposable).length,
highRisk: results.filter(r => r.validation.riskLevel === 'high' || r.validation.riskLevel === 'critical').length
},
timestamp: new Date().toISOString()
})
} catch (error) {
console.error('Batch email validation error:', error)
res.status(500).json({ error: 'Batch validation service unavailable' })
}
})
console.log('Disposable email ML classifier initialized')
Real-Time Pattern Analysis
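Real-time detection of this kind usually rests on counting how often a domain appears within a recent time window. The TypeScript sketch below shows a simple sliding-window counter that stores timestamps per key; `SlidingWindowCounter` is a hypothetical class for illustration, and the clock is passed in explicitly so the behaviour can be tested.

```typescript
// Hypothetical helper: counts events per key inside a fixed-length sliding window.
class SlidingWindowCounter {
  private events = new Map<string, number[]>()
  private readonly windowMs: number

  constructor(windowMs: number) {
    this.windowMs = windowMs
  }

  /** Record one event for the key at time `at` and return the count inside the window. */
  record(key: string, at: number = Date.now()): number {
    const cutoff = at - this.windowMs
    // Keep only timestamps still inside the window, then add the new one.
    const recent = (this.events.get(key) ?? []).filter(t => t > cutoff)
    recent.push(at)
    this.events.set(key, recent)
    return recent.length
  }

  /** Count events for the key inside the window without recording a new one. */
  count(key: string, at: number = Date.now()): number {
    const cutoff = at - this.windowMs
    return (this.events.get(key) ?? []).filter(t => t > cutoff).length
  }
}

const hourlyCounter = new SlidingWindowCounter(60 * 60 * 1000)
hourlyCounter.record('temp123.com', 0)
hourlyCounter.record('temp123.com', 10 * 60 * 1000)
console.log(hourlyCounter.record('temp123.com', 30 * 60 * 1000)) // 3
console.log(hourlyCounter.count('temp123.com', 65 * 60 * 1000))  // 2 (the event at t=0 has expired)
```

Storing individual timestamps costs memory in proportion to traffic; at higher volumes, fixed-size time buckets are a common alternative.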
// Real-time pattern analysis for detecting emerging disposable email threats
interface EmailPattern {
id: string
pattern: string
type: 'domain_pattern' | 'registration_pattern' | 'behavioral_pattern' | 'network_pattern'
confidence: number
frequency: number
firstSeen: number
lastSeen: number
riskScore: number
affectedDomains: string[]
indicators: string[]
}
interface PatternAnalysisResult {
suspiciousDomains: string[]
emergingPatterns: EmailPattern[]
trendAnalysis: {
direction: 'increasing' | 'decreasing' | 'stable'
changeRate: number
confidence: number
}
recommendations: string[]
}
class RealTimePatternAnalyzer {
private patternBuffer: Map<string, EmailPattern> = new Map()
private domainActivity: Map<string, { count: number; lastSeen: number; patterns: string[] }> = new Map()
private analysisWindow = 24 * 60 * 60 * 1000 // 24 hours
private minPatternFrequency = 5
private subscribers: Array<(result: PatternAnalysisResult) => void> = []
constructor() {
this.startPatternAnalysis()
}
// Analyze email domain for suspicious patterns
async analyzeDomain(domain: string): Promise<{
isSuspicious: boolean
patterns: string[]
riskScore: number
recommendations: string[]
}> {
const analysis = {
isSuspicious: false,
patterns: [] as string[],
riskScore: 0,
recommendations: [] as string[]
}
// Update domain activity
this.updateDomainActivity(domain)
// Check for known patterns
const detectedPatterns = await this.detectPatterns(domain)
for (const pattern of detectedPatterns) {
if (pattern.riskScore > 50) {
analysis.isSuspicious = true
analysis.patterns.push(pattern.pattern)
analysis.riskScore = Math.max(analysis.riskScore, pattern.riskScore)
if (pattern.riskScore > 80) {
analysis.recommendations.push('Block domain immediately')
analysis.recommendations.push('Monitor for similar patterns')
} else if (pattern.riskScore > 60) {
analysis.recommendations.push('Require additional verification')
analysis.recommendations.push('Add to watchlist')
}
}
}
// Check for behavioral anomalies
const behavioralScore = await this.analyzeBehavioralPatterns(domain)
if (behavioralScore > 70) {
analysis.isSuspicious = true
analysis.patterns.push('behavioral_anomaly')
analysis.riskScore = Math.max(analysis.riskScore, behavioralScore)
analysis.recommendations.push('Investigate account activity')
}
return analysis
}
// Subscribe to pattern analysis results
subscribe(callback: (result: PatternAnalysisResult) => void): () => void {
this.subscribers.push(callback)
return () => {
const index = this.subscribers.indexOf(callback)
if (index > -1) {
this.subscribers.splice(index, 1)
}
}
}
// Get comprehensive pattern analysis
async getPatternAnalysis(timeframe: number = this.analysisWindow): Promise<PatternAnalysisResult> {
const cutoff = Date.now() - timeframe
// Filter recent patterns
const recentPatterns = Array.from(this.patternBuffer.values())
.filter(pattern => pattern.lastSeen > cutoff)
// Identify suspicious domains
const suspiciousDomains = await this.identifySuspiciousDomains(cutoff)
// Analyze trends
const trendAnalysis = await this.analyzeTrend(cutoff)
// Generate recommendations
const recommendations = this.generateAnalysisRecommendations(recentPatterns, suspiciousDomains)
const result: PatternAnalysisResult = {
suspiciousDomains,
emergingPatterns: recentPatterns.slice(0, 10), // Top 10 patterns
trendAnalysis,
recommendations
}
// Notify subscribers
this.subscribers.forEach(callback => {
try {
callback(result)
} catch (error) {
console.error('Error in pattern analysis subscriber:', error)
}
})
return result
}
private async detectPatterns(domain: string): Promise<EmailPattern[]> {
const patterns: EmailPattern[] = []
// Domain pattern analysis
const domainPatterns = await this.detectDomainPatterns(domain)
patterns.push(...domainPatterns)
// Registration pattern analysis
const registrationPatterns = await this.detectRegistrationPatterns(domain)
patterns.push(...registrationPatterns)
// Network pattern analysis
const networkPatterns = await this.detectNetworkPatterns(domain)
patterns.push(...networkPatterns)
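// Record each detection so getPatternAnalysis() and analyzeTrend() have data to work with
for (const pattern of patterns) {
const key = `${pattern.pattern}:${domain}`
const existing = this.patternBuffer.get(key)
if (existing) {
existing.frequency++
existing.lastSeen = pattern.lastSeen
} else {
this.patternBuffer.set(key, pattern)
}
}
return patterns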
}
private async detectDomainPatterns(domain: string): Promise<EmailPattern[]> {
const patterns: EmailPattern[] = []
// Pattern 1: Random-looking domains
const entropy = this.calculateEntropy(domain)
if (entropy > 3.5) {
patterns.push({
id: `random_domain_${Date.now()}`,
pattern: 'high_entropy_domain',
type: 'domain_pattern',
confidence: Math.min(100, entropy * 20),
frequency: 1,
firstSeen: Date.now(),
lastSeen: Date.now(),
riskScore: Math.min(100, entropy * 25),
affectedDomains: [domain],
indicators: ['high_entropy', 'random_character_distribution']
})
}
// Pattern 2: Sequential domains (like temp123.com)
if (/temp\d+\.com/.test(domain) || /mail\d+\.com/.test(domain)) {
patterns.push({
id: `sequential_domain_${Date.now()}`,
pattern: 'sequential_domain_pattern',
type: 'domain_pattern',
confidence: 85,
frequency: 1,
firstSeen: Date.now(),
lastSeen: Date.now(),
riskScore: 80,
affectedDomains: [domain],
indicators: ['sequential_numbering', 'temp_mail_pattern']
})
}
// Pattern 3: Known disposable TLDs
const riskyTLDs = ['.tk', '.ml', '.cf', '.ga', '.gq']
const tld = domain.split('.').pop() || ''
if (riskyTLDs.includes('.' + tld)) {
patterns.push({
id: `risky_tld_${Date.now()}`,
pattern: 'risky_tld_pattern',
type: 'domain_pattern',
confidence: 90,
frequency: 1,
firstSeen: Date.now(),
lastSeen: Date.now(),
riskScore: 85,
affectedDomains: [domain],
indicators: ['high_risk_tld', 'known_disposable_tld']
})
}
return patterns
}
private async detectRegistrationPatterns(domain: string): Promise<EmailPattern[]> {
const patterns: EmailPattern[] = []
// In production, this would use WHOIS data
// For demo, simulate based on domain characteristics
// Pattern: Very new domains (less than 30 days)
if (domain.length < 10 || domain.includes('temp')) {
patterns.push({
id: `new_domain_${Date.now()}`,
pattern: 'new_domain_registration',
type: 'registration_pattern',
confidence: 75,
frequency: 1,
firstSeen: Date.now(),
lastSeen: Date.now(),
riskScore: 70,
affectedDomains: [domain],
indicators: ['recent_registration', 'suspicious_timing']
})
}
// Pattern: Bulk registration patterns
if (/\d{3,}/.test(domain)) {
patterns.push({
id: `bulk_registration_${Date.now()}`,
pattern: 'bulk_registration_pattern',
type: 'registration_pattern',
confidence: 80,
frequency: 1,
firstSeen: Date.now(),
lastSeen: Date.now(),
riskScore: 75,
affectedDomains: [domain],
indicators: ['bulk_registration', 'automated_registration']
})
}
return patterns
}
private async detectNetworkPatterns(domain: string): Promise<EmailPattern[]> {
const patterns: EmailPattern[] = []
// In production, this would analyze network traffic patterns
// For demo, simulate based on domain characteristics
// Pattern: High-risk hosting patterns
if (domain.includes('free') || domain.includes('hosting')) {
patterns.push({
id: `hosting_pattern_${Date.now()}`,
pattern: 'suspicious_hosting',
type: 'network_pattern',
confidence: 70,
frequency: 1,
firstSeen: Date.now(),
lastSeen: Date.now(),
riskScore: 65,
affectedDomains: [domain],
indicators: ['free_hosting', 'suspicious_infrastructure']
})
}
return patterns
}
private async analyzeBehavioralPatterns(domain: string): Promise<number> {
const activity = this.domainActivity.get(domain)
if (!activity || activity.count < 10) return 0
// Analyze behavioral indicators
let riskScore = 0
// High volume: count accumulates over the whole retention window, so this checks for
// 50+ uses by a domain that was last seen within the past hour
const timeSpan = Date.now() - activity.lastSeen
if (timeSpan < 60 * 60 * 1000 && activity.count > 50) {
riskScore += 40
}
// Rapid sequential access pattern
if (activity.patterns.includes('sequential_access')) {
riskScore += 30
}
// Geographic dispersion (unusual for disposable)
if (activity.patterns.includes('geographic_dispersion')) {
riskScore += 20
}
return Math.min(100, riskScore)
}
private updateDomainActivity(domain: string): void {
const current = this.domainActivity.get(domain) || {
count: 0,
lastSeen: 0,
patterns: []
}
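// Measure the gap since the previous lookup before overwriting lastSeen
const now = Date.now()
const timeSinceLast = now - current.lastSeen
current.count++
current.lastSeen = now
// Detect access patterns: less than 1 second between consecutive accesses
if (current.count > 1 && timeSinceLast < 1000 && !current.patterns.includes('rapid_access')) {
current.patterns.push('rapid_access')
}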
this.domainActivity.set(domain, current)
}
private async identifySuspiciousDomains(cutoff: number): Promise<string[]> {
const suspiciousDomains: string[] = []
for (const [domain, activity] of this.domainActivity.entries()) {
if (activity.lastSeen < cutoff) continue
let suspiciousScore = 0
// High activity volume
if (activity.count > 100) suspiciousScore += 30
// Recently active (activity records keep only lastSeen, not a first-seen timestamp)
if (Date.now() - activity.lastSeen < 24 * 60 * 60 * 1000) suspiciousScore += 20
// Suspicious patterns
if (activity.patterns.length > 0) suspiciousScore += 25
if (suspiciousScore > 60) {
suspiciousDomains.push(domain)
}
}
return suspiciousDomains.slice(0, 50) // Top 50 suspicious domains
}
private async analyzeTrend(cutoff: number): Promise<{
direction: 'increasing' | 'decreasing' | 'stable'
changeRate: number
confidence: number
}> {
const recentPatterns = Array.from(this.patternBuffer.values())
.filter(pattern => pattern.lastSeen > cutoff)
if (recentPatterns.length < 10) {
return { direction: 'stable', changeRate: 0, confidence: 50 }
}
// Simple trend analysis based on pattern frequency over time
const now = Date.now()
const windowSize = 6 * 60 * 60 * 1000 // 6 hours
const recentWindow = recentPatterns.filter(p => now - p.lastSeen < windowSize)
const olderWindow = recentPatterns.filter(p => now - p.lastSeen >= windowSize)
const recentAvg = recentWindow.reduce((sum, p) => sum + p.frequency, 0) / recentWindow.length || 0
const olderAvg = olderWindow.reduce((sum, p) => sum + p.frequency, 0) / olderWindow.length || 0
let direction: 'increasing' | 'decreasing' | 'stable' = 'stable'
let changeRate = 0
if (recentAvg > olderAvg * 1.2) {
direction = 'increasing'
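// With no older baseline the ratio is undefined, so report 0 rather than Infinity
changeRate = olderAvg > 0 ? (recentAvg - olderAvg) / olderAvg : 0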
} else if (recentAvg < olderAvg * 0.8) {
direction = 'decreasing'
changeRate = (olderAvg - recentAvg) / olderAvg
}
return {
direction,
changeRate: Math.round(changeRate * 100) / 100,
confidence: 75 // Simplified confidence score
}
}
private generateAnalysisRecommendations(patterns: EmailPattern[], suspiciousDomains: string[]): string[] {
const recommendations: string[] = []
if (suspiciousDomains.length > 20) {
recommendations.push('High number of suspicious domains detected')
recommendations.push('Consider tightening domain validation rules')
}
const highRiskPatterns = patterns.filter(p => p.riskScore > 80)
if (highRiskPatterns.length > 5) {
recommendations.push('Multiple high-risk patterns detected')
recommendations.push('Enable enhanced monitoring and alerting')
}
if (patterns.some(p => p.type === 'network_pattern')) {
recommendations.push('Network-level anomalies detected')
recommendations.push('Review infrastructure security')
}
if (recommendations.length === 0) {
recommendations.push('Pattern analysis shows normal activity')
}
return recommendations
}
private calculateEntropy(domain: string): number {
const charCounts = new Map<string, number>()
for (const char of domain) {
charCounts.set(char, (charCounts.get(char) || 0) + 1)
}
let entropy = 0
const length = domain.length
for (const count of charCounts.values()) {
const probability = count / length
entropy -= probability * Math.log2(probability)
}
return entropy
}
private startPatternAnalysis(): void {
// Run pattern analysis every 5 minutes
setInterval(async () => {
await this.getPatternAnalysis()
}, 5 * 60 * 1000)
// Clean up old data every hour
setInterval(() => {
this.cleanupOldData()
}, 60 * 60 * 1000)
}
private cleanupOldData(): void {
const cutoff = Date.now() - this.analysisWindow
// Remove old patterns
for (const [id, pattern] of this.patternBuffer.entries()) {
if (pattern.lastSeen < cutoff) {
this.patternBuffer.delete(id)
}
}
// Remove old domain activity
for (const [domain, activity] of this.domainActivity.entries()) {
if (activity.lastSeen < cutoff) {
this.domainActivity.delete(domain)
}
}
}
}
// Integration with pattern analysis
const patternAnalyzer = new RealTimePatternAnalyzer()
// API endpoints for pattern analysis
app.get('/api/patterns/analysis', async (req, res) => {
try {
const timeframe = parseInt(req.query.timeframe as string, 10) || 24 * 60 * 60 * 1000 // 24 hours default
const analysis = await patternAnalyzer.getPatternAnalysis(timeframe)
res.json({
...analysis,
timeframe,
timestamp: new Date().toISOString()
})
} catch (error) {
console.error('Pattern analysis error:', error)
res.status(500).json({ error: 'Pattern analysis unavailable' })
}
})
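// Subscribe to pattern analysis updates
// app.ws() is not part of Express itself; this assumes the express-ws package has been applied to `app`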
app.ws('/api/patterns/stream', (ws: any) => {
const unsubscribe = patternAnalyzer.subscribe((result) => {
ws.send(JSON.stringify({
type: 'pattern_analysis',
data: result,
timestamp: new Date().toISOString()
}))
})
ws.on('close', () => {
unsubscribe()
})
})
// Analyze specific domain
app.post('/api/patterns/analyze-domain', async (req, res) => {
try {
const { domain } = req.body
if (!domain) {
return res.status(400).json({ error: 'Domain required' })
}
const analysis = await patternAnalyzer.analyzeDomain(domain)
res.json({
domain,
analysis,
timestamp: new Date().toISOString()
})
} catch (error) {
console.error('Domain analysis error:', error)
res.status(500).json({ error: 'Domain analysis unavailable' })
}
})
console.log('Real-time pattern analyzer initialized')
Automated Domain Discovery
// Automated system for discovering new disposable email domains
interface DomainDiscoveryConfig {
crawlInterval: number // minutes
maxDomainsPerCrawl: number
verificationTimeout: number // seconds
similarityThreshold: number
minConfidenceScore: number
externalSources: string[]
}
interface DiscoveredDomain {
domain: string
source: string
discoveryMethod: 'crawler' | 'similarity' | 'external_api' | 'user_report'
confidence: number
verificationStatus: 'pending' | 'verified' | 'failed' | 'confirmed_disposable'
firstSeen: number
lastVerified: number
mxRecords: string[]
spfRecord: string | null
dmarcRecord: string | null
similarTo: string[]
riskFactors: string[]
}
interface CrawlResult {
newDomains: DiscoveredDomain[]
verifiedDisposable: DiscoveredDomain[]
failedVerifications: string[]
crawlStats: {
domainsCrawled: number
pagesProcessed: number
avgResponseTime: number
errorRate: number
}
}
class AutomatedDomainDiscovery {
private discoveredDomains: Map<string, DiscoveredDomain> = new Map()
private knownDisposableDomains: Set<string> = new Set()
private crawler: DomainCrawler
private verifier: DomainVerifier
private similarityEngine: SimilarityEngine
private config: DomainDiscoveryConfig
private subscribers: Array<(result: CrawlResult) => void> = []
constructor(config: DomainDiscoveryConfig) {
this.config = config
this.crawler = new DomainCrawler()
this.verifier = new DomainVerifier()
this.similarityEngine = new SimilarityEngine()
this.loadKnownDisposableDomains()
this.startDiscoveryProcess()
}
// Subscribe to discovery results
subscribe(callback: (result: CrawlResult) => void): () => void {
this.subscribers.push(callback)
return () => {
const index = this.subscribers.indexOf(callback)
if (index > -1) {
this.subscribers.splice(index, 1)
}
}
}
// Manually trigger domain discovery
async triggerDiscovery(): Promise<CrawlResult> {
console.log('Starting manual domain discovery...')
const crawlResult = await this.performDiscoveryCrawl()
await this.processDiscoveryResults(crawlResult)
// Notify subscribers
this.subscribers.forEach(callback => {
try {
callback(crawlResult)
} catch (error) {
console.error('Error in domain discovery subscriber:', error)
}
})
return crawlResult
}
// Get current discovery status
getDiscoveryStatus(): {
totalDiscovered: number
pendingVerification: number
confirmedDisposable: number
lastCrawlTime: number
nextScheduledCrawl: number
systemHealth: 'healthy' | 'degraded' | 'unhealthy'
} {
const totalDiscovered = this.discoveredDomains.size
const pendingVerification = Array.from(this.discoveredDomains.values())
.filter(d => d.verificationStatus === 'pending').length
const confirmedDisposable = Array.from(this.discoveredDomains.values())
.filter(d => d.verificationStatus === 'confirmed_disposable').length
let systemHealth: 'healthy' | 'degraded' | 'unhealthy' = 'healthy'
if (pendingVerification > 1000) systemHealth = 'degraded'
if (pendingVerification > 5000) systemHealth = 'unhealthy'
return {
totalDiscovered,
pendingVerification,
confirmedDisposable,
lastCrawlTime: Date.now() - (5 * 60 * 1000), // 5 minutes ago for demo
nextScheduledCrawl: Date.now() + (this.config.crawlInterval * 60 * 1000),
systemHealth
}
}
private async performDiscoveryCrawl(): Promise<CrawlResult> {
const startTime = Date.now()
const result: CrawlResult = {
newDomains: [],
verifiedDisposable: [],
failedVerifications: [],
crawlStats: {
domainsCrawled: 0,
pagesProcessed: 0,
avgResponseTime: 0,
errorRate: 0
}
}
try {
// Crawl disposable email provider lists
const crawledDomains = await this.crawler.crawlDisposableProviders()
result.crawlStats.domainsCrawled = crawledDomains.length
result.crawlStats.pagesProcessed = crawledDomains.length * 2 // Rough estimate
// Process each discovered domain
for (const domain of crawledDomains.slice(0, this.config.maxDomainsPerCrawl)) {
const discoveredDomain = await this.processDiscoveredDomain(domain, 'crawler')
result.newDomains.push(discoveredDomain)
// Attempt immediate verification for high-confidence domains
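if (discoveredDomain.confidence > 80) {
try {
const verification = await this.verifier.verifyDomain(domain)
discoveredDomain.verificationStatus = verification.isDisposable ? 'confirmed_disposable' : 'verified'
discoveredDomain.lastVerified = Date.now()
if (verification.isDisposable) {
result.verifiedDisposable.push(discoveredDomain)
}
} catch (error) {
// Record the failure so the errorRate computed below reflects it
console.error(`Verification failed for ${domain}:`, error)
discoveredDomain.verificationStatus = 'failed'
result.failedVerifications.push(domain)
}
}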
}
// Find similar domains to known disposable ones
const similarDomains = await this.findSimilarDomains()
for (const domain of similarDomains) {
if (!this.discoveredDomains.has(domain)) {
const discoveredDomain = await this.processDiscoveredDomain(domain, 'similarity')
result.newDomains.push(discoveredDomain)
}
}
// Check external APIs for new disposable domains
const externalDomains = await this.checkExternalSources()
for (const domain of externalDomains) {
if (!this.discoveredDomains.has(domain)) {
const discoveredDomain = await this.processDiscoveredDomain(domain, 'external_api')
result.newDomains.push(discoveredDomain)
}
}
// Calculate crawl statistics
const totalTime = Date.now() - startTime
result.crawlStats.avgResponseTime = totalTime / Math.max(result.newDomains.length, 1)
result.crawlStats.errorRate = result.failedVerifications.length / Math.max(result.newDomains.length, 1)
} catch (error) {
console.error('Discovery crawl error:', error)
result.crawlStats.errorRate = 1.0
}
return result
}
private async processDiscoveredDomain(domain: string, method: DiscoveredDomain['discoveryMethod']): Promise<DiscoveredDomain> {
const discoveredDomain: DiscoveredDomain = {
domain,
source: method,
discoveryMethod: method,
confidence: await this.calculateDiscoveryConfidence(domain, method),
verificationStatus: 'pending',
firstSeen: Date.now(),
lastVerified: 0,
mxRecords: [],
spfRecord: null,
dmarcRecord: null,
similarTo: [],
riskFactors: []
}
// Perform basic DNS checks
const dnsInfo = await this.verifier.getDNSInfo(domain)
discoveredDomain.mxRecords = dnsInfo.mxRecords
discoveredDomain.spfRecord = dnsInfo.spfRecord
discoveredDomain.dmarcRecord = dnsInfo.dmarcRecord
// Analyze risk factors
discoveredDomain.riskFactors = await this.analyzeRiskFactors(domain, dnsInfo)
// Find similar domains
discoveredDomain.similarTo = await this.similarityEngine.findSimilarDomains(domain)
this.discoveredDomains.set(domain, discoveredDomain)
return discoveredDomain
}
private async calculateDiscoveryConfidence(domain: string, method: string): Promise<number> {
let confidence = 50 // Base confidence
// Method-based confidence boost
switch (method) {
case 'crawler':
confidence += 30
break
case 'similarity':
confidence += 20
break
case 'external_api':
confidence += 25
break
case 'user_report':
confidence += 15
break
}
// Domain-based confidence adjustments
if (domain.length < 8) confidence += 10 // Short domains are suspicious
if (domain.length > 20) confidence -= 10 // Very long domains are less likely disposable
if (/\d{3,}/.test(domain)) confidence += 15 // Numeric sequences are suspicious
if (domain.includes('temp') || domain.includes('mail')) confidence += 20
// TLD-based confidence
const riskyTLDs = ['.tk', '.ml', '.cf', '.ga', '.gq']
const tld = domain.split('.').pop() || ''
if (riskyTLDs.includes('.' + tld)) confidence += 25
return Math.min(100, Math.max(0, confidence))
}
private async analyzeRiskFactors(domain: string, dnsInfo: any): Promise<string[]> {
const riskFactors: string[] = []
// MX record anomalies
if (dnsInfo.mxRecords.length === 0) {
riskFactors.push('no_mx_records')
}
if (dnsInfo.mxRecords.length > 3) {
riskFactors.push('multiple_mx_records')
}
// Missing SPF/DMARC
if (!dnsInfo.spfRecord) {
riskFactors.push('missing_spf')
}
if (!dnsInfo.dmarcRecord) {
riskFactors.push('missing_dmarc')
}
// Domain characteristics
if (domain.length < 10) {
riskFactors.push('short_domain')
}
if (/\d{4,}/.test(domain)) {
riskFactors.push('numeric_sequence')
}
if (domain.includes('temp') || domain.includes('disposable')) {
riskFactors.push('suspicious_keywords')
}
return riskFactors
}
private async findSimilarDomains(): Promise<string[]> {
const similarDomains: string[] = []
// Find domains similar to known disposable ones
for (const knownDisposable of this.knownDisposableDomains) {
const similar = await this.similarityEngine.findSimilarDomains(knownDisposable)
similarDomains.push(...similar.filter(domain => !this.knownDisposableDomains.has(domain)))
}
// Remove duplicates and limit results
return [...new Set(similarDomains)].slice(0, 50)
}
private async checkExternalSources(): Promise<string[]> {
const externalDomains: string[] = []
for (const source of this.config.externalSources) {
try {
const domains = await this.fetchFromExternalSource(source)
externalDomains.push(...domains)
} catch (error) {
console.error(`Error fetching from source ${source}:`, error)
}
}
return [...new Set(externalDomains)].slice(0, 100)
}
private async fetchFromExternalSource(source: string): Promise<string[]> {
// In production, implement actual API calls
// For demo, return simulated data
const mockSources: Record<string, string[]> = {
'github_disposable_list': [
'newdisposable1.com', 'tempdomain2.org', 'mailtest3.net'
],
'abuse_ch_api': [
'spamdomain4.com', 'fakeemail5.org'
],
'custom_crawler': [
'tempmail6.com', 'disposable7.net'
]
}
return mockSources[source] || []
}
private loadKnownDisposableDomains(): void {
// Load from database or external sources
const knownDomains = [
'mailinator.com', '10minutemail.com', 'guerrillamail.com',
'tempmail.com', 'throwaway.email', 'dispostable.com'
]
knownDomains.forEach(domain => this.knownDisposableDomains.add(domain))
}
private startDiscoveryProcess(): void {
// Schedule regular discovery crawls
setInterval(async () => {
await this.triggerDiscovery()
}, this.config.crawlInterval * 60 * 1000)
// Background verification of pending domains
setInterval(async () => {
await this.processPendingVerifications()
}, 30 * 1000) // Every 30 seconds
}
private async processPendingVerifications(): Promise<void> {
const pendingDomains = Array.from(this.discoveredDomains.values())
.filter(d => d.verificationStatus === 'pending')
.slice(0, 10) // Process 10 at a time
for (const domain of pendingDomains) {
try {
const verification = await this.verifier.verifyDomain(domain.domain)
if (verification.isDisposable) {
domain.verificationStatus = 'confirmed_disposable'
this.knownDisposableDomains.add(domain.domain)
} else {
domain.verificationStatus = 'verified'
}
domain.lastVerified = Date.now()
} catch (error) {
console.error(`Verification failed for ${domain.domain}:`, error)
domain.verificationStatus = 'failed'
}
}
}
private async processDiscoveryResults(result: CrawlResult): Promise<void> {
// Add new domains to database
for (const domain of result.newDomains) {
await this.saveDiscoveredDomain(domain)
}
// Update known disposable domains
for (const domain of result.verifiedDisposable) {
this.knownDisposableDomains.add(domain.domain)
await this.updateDisposableDomain(domain.domain, 95)
}
console.log(`Discovery completed: ${result.newDomains.length} new domains, ${result.verifiedDisposable.length} confirmed disposable`)
}
private async saveDiscoveredDomain(domain: DiscoveredDomain): Promise<void> {
// Save to database
console.log(`Saving discovered domain: ${domain.domain} (confidence: ${domain.confidence})`)
}
private async updateDisposableDomain(domain: string, confidence: number): Promise<void> {
// Update disposable domains table
console.log(`Updating disposable domain: ${domain} (confidence: ${confidence})`)
}
}
class DomainCrawler {
async crawlDisposableProviders(): Promise<string[]> {
const discoveredDomains: string[] = []
// In production, crawl actual disposable email provider websites
// For demo, return simulated results
const mockProviders = [
'https://tempmail.com',
'https://10minutemail.com',
'https://guerrillamail.com',
'https://mailinator.com'
]
for (const provider of mockProviders) {
try {
// Simulate crawling provider website for domain extraction
const domains = await this.extractDomainsFromProvider(provider)
discoveredDomains.push(...domains)
} catch (error) {
console.error(`Failed to crawl ${provider}:`, error)
}
}
return [...new Set(discoveredDomains)] // Remove duplicates
}
private async extractDomainsFromProvider(providerUrl: string): Promise<string[]> {
// In production, use actual web scraping
// For demo, return simulated domain extraction
const mockDomains: Record<string, string[]> = {
'https://tempmail.com': ['tempmail.com', 'tempmail.net', 'tempmail.org'],
'https://10minutemail.com': ['10minutemail.com', '10minutemail.net'],
'https://guerrillamail.com': ['guerrillamail.com', 'guerrillamail.net'],
'https://mailinator.com': ['mailinator.com', 'mailinator.net']
}
return mockDomains[providerUrl] || []
}
}
class DomainVerifier {
async verifyDomain(domain: string): Promise<{
isDisposable: boolean
confidence: number
verificationMethod: string
details: Record<string, any>
}> {
// Perform comprehensive domain verification
const results = await Promise.all([
this.checkDNSRecords(domain),
this.checkDomainRegistration(domain),
this.checkWebPresence(domain),
this.checkSMTPAvailability(domain)
])
const [dnsResult, registrationResult, webResult, smtpResult] = results
// Combine verification results
const combinedScore = this.combineVerificationScores(results)
const isDisposable = combinedScore > 0.7
return {
isDisposable,
confidence: combinedScore * 100,
verificationMethod: 'multi_factor',
details: {
dns: dnsResult,
registration: registrationResult,
web: webResult,
smtp: smtpResult
}
}
}
async getDNSInfo(domain: string): Promise<{
mxRecords: string[]
spfRecord: string | null
dmarcRecord: string | null
}> {
// In production, use actual DNS lookups
// For demo, simulate DNS responses
const mockDNS: Record<string, any> = {
'tempmail.com': {
mxRecords: [],
spfRecord: null,
dmarcRecord: null
},
'mailinator.com': {
mxRecords: [],
spfRecord: null,
dmarcRecord: null
},
'gmail.com': {
mxRecords: ['gmail-smtp-in.l.google.com'],
spfRecord: 'v=spf1 include:_spf.google.com ~all',
dmarcRecord: 'v=DMARC1; p=reject'
}
}
return mockDNS[domain] || {
mxRecords: ['mail.' + domain],
spfRecord: 'v=spf1 mx -all',
dmarcRecord: null
}
}
private async checkDNSRecords(domain: string): Promise<number> {
const dnsInfo = await this.getDNSInfo(domain)
let score = 0
// MX records check
if (dnsInfo.mxRecords.length === 0) score += 0.4
if (dnsInfo.mxRecords.some(mx => mx.includes('temp') || mx.includes('mail'))) score += 0.3
// SPF check
if (!dnsInfo.spfRecord) score += 0.2
// DMARC check
if (!dnsInfo.dmarcRecord) score += 0.1
return Math.min(1, score)
}
private async checkDomainRegistration(domain: string): Promise<number> {
// In production, use WHOIS API
// For demo, simulate based on domain characteristics
if (domain.length < 10) return 0.3 // Short domains are suspicious
if (domain.includes('temp')) return 0.4
if (/\d{3,}/.test(domain)) return 0.3
return 0.1 // Low suspicion for normal domains
}
private async checkWebPresence(domain: string): Promise<number> {
// In production, check if website exists and analyze content
// For demo, simulate web presence check
if (domain.includes('temp') || domain.includes('mail')) return 0.5
return 0.1
}
private async checkSMTPAvailability(domain: string): Promise<number> {
// In production, attempt SMTP connection
// For demo, simulate SMTP check
if (domain.includes('temp')) return 0.6 // High likelihood of SMTP issues
return 0.1
}
private combineVerificationScores(results: number[]): number {
return results.reduce((sum, score) => sum + score, 0) / results.length
}
}
class SimilarityEngine {
async findSimilarDomains(domain: string): Promise<string[]> {
const similarDomains: string[] = []
// Generate variations of the domain
const variations = this.generateDomainVariations(domain)
// Check which variations exist (in production, use DNS lookup)
for (const variation of variations) {
if (await this.domainExists(variation)) {
similarDomains.push(variation)
}
}
// Find domains with similar characteristics
const characteristicSimilar = await this.findByCharacteristics(domain)
similarDomains.push(...characteristicSimilar)
return [...new Set(similarDomains)].slice(0, 20) // Limit results
}
private generateDomainVariations(domain: string): string[] {
const variations: string[] = []
const parts = domain.split('.')
if (parts.length >= 2) {
const name = parts[0]
const tld = parts[1]
// Add numbers
for (let i = 1; i <= 10; i++) {
variations.push(`${name}${i}.${tld}`)
}
// Add prefixes
const prefixes = ['temp', 'mail', 'test', 'demo']
prefixes.forEach(prefix => {
variations.push(`${prefix}${name}.${tld}`)
})
// TLD variations
const tlds = ['com', 'net', 'org', 'info', 'biz']
tlds.forEach(newTld => {
if (newTld !== tld) {
variations.push(`${name}.${newTld}`)
}
})
}
return variations
}
private async domainExists(domain: string): Promise<boolean> {
// In production, perform actual DNS lookup
// For demo, simulate based on domain patterns
if (domain.includes('temp') && domain.includes('123')) return true
if (domain.includes('mail') && /\d/.test(domain)) return true
return Math.random() > 0.8 // 20% chance of existing
}
private async findByCharacteristics(domain: string): Promise<string[]> {
// Find domains with similar characteristics (length, patterns, etc.)
// In production, use database queries
const similar: string[] = []
if (domain.length < 10) {
similar.push('shortdomain1.com', 'shortdomain2.net')
}
if (/\d/.test(domain)) {
similar.push('numericdomain3.com', 'numberdomain4.net')
}
return similar.filter(domain => Math.random() > 0.7) // Random subset
}
}
// Initialize automated domain discovery
const discoveryConfig: DomainDiscoveryConfig = {
crawlInterval: 60, // Every hour
maxDomainsPerCrawl: 100,
verificationTimeout: 30,
similarityThreshold: 0.8,
minConfidenceScore: 70,
externalSources: [
'github_disposable_list',
'abuse_ch_api',
'custom_crawler'
]
}
const domainDiscovery = new AutomatedDomainDiscovery(discoveryConfig)
// API endpoints for automated discovery
app.get('/api/discovery/status', (req, res) => {
const status = domainDiscovery.getDiscoveryStatus()
res.json({
...status,
timestamp: new Date().toISOString()
})
})
// Trigger manual discovery
app.post('/api/discovery/trigger', async (req, res) => {
try {
const result = await domainDiscovery.triggerDiscovery()
res.json({
...result,
timestamp: new Date().toISOString()
})
} catch (error) {
console.error('Manual discovery error:', error)
res.status(500).json({ error: 'Discovery failed' })
}
})
// Subscribe to discovery results
app.ws('/api/discovery/stream', (ws: any) => {
const unsubscribe = domainDiscovery.subscribe((result) => {
ws.send(JSON.stringify({
type: 'discovery_result',
data: result,
timestamp: new Date().toISOString()
}))
})
ws.on('close', () => {
unsubscribe()
})
})
// Get discovered domains
app.get('/api/discovery/domains', (req, res) => {
const status = req.query.status as string || 'all'
const limit = parseInt(req.query.limit as string) || 100
const domains = Array.from(domainDiscovery['discoveredDomains'].values())
let filteredDomains = domains
if (status !== 'all') {
filteredDomains = domains.filter(d => d.verificationStatus === status)
}
res.json({
domains: filteredDomains.slice(0, limit),
total: filteredDomains.length,
timestamp: new Date().toISOString()
})
})
console.log('Automated domain discovery system initialized')
Implementation (Node.js + SQL)
1) Maintain a disposable domains table
create table if not exists disposable_domains (
domain text primary key,
source text, -- 'public-list', 'internal-discovery', 'manual'
confidence_score int check (confidence_score between 0 and 100), -- 100 = definitely disposable
first_seen timestamptz default now(),
last_updated timestamptz default now(),
is_active boolean default true
);
-- Index for fast lookups
create index idx_disposable_domains_active on disposable_domains(domain) where is_active = true;
-- Example upsert (run via ETL or cron)
insert into disposable_domains(domain, source, confidence_score)
values ('mailinator.com','public-list',95),
('10minutemail.com','public-list',90),
('guerrillamail.com','public-list',85)
on conflict (domain) do update set
confidence_score = excluded.confidence_score,
last_updated = now();
2) Check on registration server-side
import { sql } from '@/lib/db'
export interface EmailValidationResult {
isValid: boolean
isDisposable: boolean
confidence: number
signals: string[]
recommendation: 'allow' | 'block' | 'review'
}
export async function validateEmail(email: string): Promise<EmailValidationResult> {
const domain = extractDomain(email)
if (!domain) {
return {
isValid: false,
isDisposable: false,
confidence: 0,
signals: ['invalid_email_format'],
recommendation: 'block'
}
}
// Check against disposable domains
const disposableCheck = await sql`
select confidence_score, source
from disposable_domains
where domain = ${domain} and is_active = true
`
if (disposableCheck.length > 0) {
const { confidence_score, source } = disposableCheck[0]
return {
isValid: true,
isDisposable: true,
confidence: confidence_score,
signals: [`disposable_domain_${source}`],
recommendation: confidence_score > 80 ? 'block' : 'review'
}
}
// Additional checks (MX, SPF, etc.) could go here
return {
isValid: true,
isDisposable: false,
confidence: 0,
signals: [],
recommendation: 'allow'
}
}
export function extractDomain(email: string): string | null {
const at = email.lastIndexOf('@')
if (at < 0) return null
return email.slice(at + 1).toLowerCase()
}
3) DNS/MX quick validation (CLI for ops)
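The script below covers ad hoc checks from a shell. For the "Additional checks (MX, SPF, etc.)" placeholder in validateEmail, the same MX lookup can run in-process. The following is a minimal sketch using Node's built-in dns module; the names summarizeMx, lookupMx, and the dns_lookup_failed signal are illustrative assumptions, not part of the code above.

```typescript
import { resolveMx } from 'node:dns/promises'

// Shape of the records returned by dns.promises.resolveMx
interface MxRecord {
  exchange: string
  priority: number
}

interface MxSummary {
  hasMx: boolean
  primaryExchange: string | null // host with the lowest priority value
  signals: string[]
}

// Pure helper so the classification can be tested without network access
export function summarizeMx(records: MxRecord[]): MxSummary {
  if (records.length === 0) {
    return { hasMx: false, primaryExchange: null, signals: ['no_mx_records'] }
  }
  const sorted = [...records].sort((a, b) => a.priority - b.priority)
  return { hasMx: true, primaryExchange: sorted[0].exchange.toLowerCase(), signals: [] }
}

export async function lookupMx(domain: string): Promise<MxSummary> {
  try {
    return summarizeMx(await resolveMx(domain))
  } catch (err) {
    const code = (err as { code?: string }).code
    // ENODATA / ENOTFOUND: no MX data for the name, or the name does not exist
    if (code === 'ENODATA' || code === 'ENOTFOUND') return summarizeMx([])
    // Timeouts and server failures are inconclusive, so they get their own signal
    return { hasMx: false, primaryExchange: null, signals: ['dns_lookup_failed'] }
  }
}
```

Keeping the lookup failure separate from no_mx_records avoids penalizing a legitimate domain for a transient resolver problem.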
#!/bin/bash
# disposable-check.sh - Quick domain validation
DOMAIN="$1"
if [ -z "$DOMAIN" ]; then
echo "Usage: $0 <domain>"
exit 1
fi
echo "Checking domain: $DOMAIN"
# MX records
MX=$(dig +short MX "$DOMAIN")
if [ -z "$MX" ]; then
echo "⚠️ No MX records found"
else
echo "✅ MX records: $MX"
fi
# SPF record
SPF=$(dig +short TXT "$DOMAIN" | grep -i "spf" | head -1)
if [ -z "$SPF" ]; then
echo "⚠️ No SPF record found"
else
echo "✅ SPF: $SPF"
fi
# Domain age (rough estimate)
WHOIS=$(whois "$DOMAIN" | grep -i "Creation Date" | head -1)
if [ -z "$WHOIS" ]; then
echo "⚠️ Cannot determine domain age"
else
echo "✅ $WHOIS"
fi
# Known disposable check
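# -x matches the whole line and -F treats the domain as a literal string, so dots are not regex wildcards
if curl -s "https://raw.githubusercontent.com/disposable-email-domains/disposable-email-domains/master/domains.txt" | grep -qxF "$DOMAIN"; then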
echo "🚨 KNOWN DISPOSABLE DOMAIN"
fi
4) Optional SMTP reachability probe
import { createConnection } from 'net'
export async function checkSMTPCapability(domain: string): Promise<{
canConnect: boolean
supportsTLS: boolean
error?: string
}> {
return new Promise((resolve) => {
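// Note: in practice, connect to the domain's highest-priority MX host; the bare domain often does not accept mail itself
const client = createConnection(25, domain)
let response = ''
let supportsTLS = false
let ehloSent = false
client.setTimeout(5000) // 5 second timeout
client.on('data', (data) => {
response += data.toString()
// Send EHLO once, after the 220 greeting, so the server lists its extensions
if (!ehloSent && response.startsWith('220')) {
ehloSent = true
client.write('EHLO example.com\r\n')
}
if (response.includes('STARTTLS')) {
supportsTLS = true
}
})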
client.on('timeout', () => {
client.destroy()
resolve({ canConnect: false, supportsTLS: false, error: 'timeout' })
})
client.on('error', (err) => {
resolve({ canConnect: false, supportsTLS: false, error: err.message })
})
client.on('connect', () => {
// Wait for banner and check
setTimeout(() => {
client.destroy()
resolve({ canConnect: true, supportsTLS })
}, 1000)
})
})
}
User Experience and Policy
Soft Blocks vs Hard Blocks
Prefer soft blocks with clear messaging:
// Example soft block response
const softBlockResponse = {
success: false,
error: {
code: 'DISPOSABLE_EMAIL',
message: 'We detected a temporary email address. Please use a permanent email to continue.',
suggestion: 'Try Gmail, Outlook, or your work email address.'
}
}
Hard blocks only for high-confidence cases (95%+). For medium confidence (50-80%), use:
- Step-up verification: SMS, phone, or payment method.
- Delayed activation: Email verification required after signup.
- Rate limiting: Limit actions until email is verified.
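The tiers above can be written down as a single decision function so the policy lives in one place. This is a sketch under stated assumptions: the 95 cutoff and the 50-80 band come from the text, while treating 80-95 as a soft block and anything under 50 as allow is an assumption made here to close the gaps. The SignupAction type and decideSignupAction name are hypothetical.

```typescript
type SignupAction = 'allow' | 'step_up' | 'soft_block' | 'hard_block'

// confidence is the 0-100 score from the disposable_domains lookup
export function decideSignupAction(isDisposable: boolean, confidence: number): SignupAction {
  // Assumption: low-confidence hits are allowed through and only monitored
  if (!isDisposable || confidence < 50) return 'allow'
  // From the text: hard blocks only at 95%+
  if (confidence >= 95) return 'hard_block'
  // Assumption: the 80-95 gap gets the soft-block message shown earlier
  if (confidence > 80) return 'soft_block'
  // From the text: 50-80 triggers step-up verification, delayed activation, or rate limiting
  return 'step_up'
}
```

A caller would map soft_block to the DISPOSABLE_EMAIL response and step_up to whichever of the three measures fits the product.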
Exception Handling
-- Temporary allowlist for business-critical cases
create table email_allowlist (
email_pattern text primary key, -- 'user@company.com' or '%@trusted-domain.com'
reason text,
expires_at timestamptz,
added_by text,
is_active boolean default true
);
-- Check if email is allowlisted
select exists(
select 1 from email_allowlist
where (email_pattern = :email or :email like email_pattern)
and is_active = true
and (expires_at is null or expires_at > now())
) as is_allowlisted;
Monitoring and Alerting
Weekly Trends
-- Disposable email share by week
with weekly_stats as (
select
date_trunc('week', created_at) as week,
count(*) as total_signups,
sum(case when is_disposable then 1 else 0 end) as disposable_count
from user_registrations
where created_at >= now() - interval '12 weeks'
group by 1
),
weekly_pct as (
-- Compute the percentage in its own CTE: a select list cannot reference its own column aliases
select
week,
total_signups,
disposable_count,
round(100.0 * disposable_count / total_signups, 2) as disposable_percentage
from weekly_stats
)
select
week,
total_signups,
disposable_count,
disposable_percentage,
-- Trend indicator
lag(disposable_percentage) over (order by week) as prev_percentage,
case
when disposable_percentage > lag(disposable_percentage) over (order by week) * 1.5
then '📈 SPIKE'
when disposable_percentage < lag(disposable_percentage) over (order by week) * 0.7
then '📉 DROP'
else '➡️ STABLE'
end as trend
from weekly_pct
order by week desc;
Real-time Alerts
#!/bin/bash
# disposable-alert.sh - Monitor for spikes in disposable usage
# Config
THRESHOLD_PERCENT=15 # Alert if >15% of signups are disposable
CHECK_HOURS=1 # Check last hour
DB_HOST="localhost"
DB_NAME="analytics"
# Query current rate
CURRENT_RATE=$(psql -h "$DB_HOST" -d "$DB_NAME" -tA -c "
select coalesce(
100.0 * sum(case when is_disposable then 1 else 0 end) / count(*),
0
)
from user_registrations
where created_at > now() - interval '$CHECK_HOURS hours'
")
# Check threshold
if (( $(echo "$CURRENT_RATE > $THRESHOLD_PERCENT" | bc -l) )); then
echo "$(date): ALERT - Disposable rate at ${CURRENT_RATE}% (threshold: ${THRESHOLD_PERCENT}%)"
# Send Slack notification, email, or trigger PagerDuty
curl -X POST -H 'Content-type: application/json' --data "{\"text\":\"🚨 Disposable email spike: ${CURRENT_RATE}% in last hour\"}" "$SLACK_WEBHOOK_URL"
fi
Dashboard Metrics
Track these KPIs:
- Daily/weekly disposable % — target <5%.
- Top disposable domains — identify new threats.
- Conversion rates — disposable vs permanent email users.
- Bounce rates — correlation with disposable usage.
-- Top disposable domains in last 30 days
select
email_domain,
count(*) as usage_count,
max(created_at) as last_seen
from user_registrations
where is_disposable = true
and created_at > now() - interval '30 days'
group by 1
order by 2 desc
limit 20;
FAQ and Edge Cases
Corporate Testing Domains
Many companies use test addresses such as test@company.com or qa@internal.company.com.
Solution: Temporary allowlist with expiration:
// Add to allowlist for 30 days
await sql`
insert into email_allowlist (email_pattern, reason, expires_at, added_by)
values ('qa@internal.company.com', 'Corporate testing', now() + interval '30 days', 'admin')
`
Catch-all Corporate Domains
Large organizations often have catch-all domains where any email goes to a central inbox.
Detection: High volume + SMTP reachability + low bounce rates.
-- Identify potential catch-all domains
select
email_domain,
count(*) as signup_count,
avg(bounce_rate) as avg_bounce_rate
from user_registrations
where created_at > now() - interval '30 days'
group by 1
having count(*) > 100 -- High volume
and avg(bounce_rate) < 0.05 -- Low bounces
order by 2 desc;
Internationalized Domains (IDN)
Domains with non-ASCII characters (e.g., münchen.de → xn--mnchen-3ya.de).
Solution: Normalize to punycode before checking:
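// Userland punycode package (default export); the trailing slash avoids Node's deprecated built-in module
import punycode from 'punycode/'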
export function normalizeDomain(domain: string): string {
try {
return punycode.toASCII(domain.toLowerCase())
} catch {
return domain.toLowerCase()
}
}
False Positives
Common issues:
- Legitimate temp emails: Alumni associations, conference registrations.
- Corporate aliases: noreply@company.com used for notifications.
- Educational institutions: Student email forwarding.
Mitigation:
- Manual review queues for edge cases.
- Allowlist management for known good domains.
- Confidence scoring vs binary decisions.
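For the allowlist item, the lookup against email_allowlist can also run in application code, for example against a cached copy of the table. The sketch below mirrors the SQL check from the Exception Handling section (exact match or LIKE pattern, active, not expired). The AllowlistEntry shape and function names are hypothetical, and lowercasing both sides is an assumption; the SQL version is case-sensitive.

```typescript
interface AllowlistEntry {
  emailPattern: string // e.g. 'user@company.com' or '%@trusted-domain.com'
  expiresAt: Date | null
  isActive: boolean
}

// Translate a SQL LIKE pattern into an anchored RegExp:
// '%' matches any run of characters, '_' matches exactly one character
function likeToRegExp(pattern: string): RegExp {
  const body = pattern
    .split('')
    .map(ch => {
      if (ch === '%') return '.*'
      if (ch === '_') return '.'
      // Escape everything that is special in a regular expression
      return ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
    })
    .join('')
  return new RegExp(`^${body}$`)
}

export function isAllowlisted(
  email: string,
  entries: AllowlistEntry[],
  now: Date = new Date()
): boolean {
  const candidate = email.trim().toLowerCase()
  return entries.some(entry => {
    if (!entry.isActive) return false
    // Same rule as the SQL: null never expires, otherwise expires_at must be in the future
    if (entry.expiresAt !== null && entry.expiresAt.getTime() <= now.getTime()) return false
    return likeToRegExp(entry.emailPattern.toLowerCase()).test(candidate)
  })
}
```

Because '_' is a LIKE wildcard, a pattern such as 'john_doe@company.com' also matches 'johnXdoe@company.com'; this holds for the SQL check as well.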
-- Manual review queue for borderline cases
select user_id, email, confidence_score, created_at
from user_registrations
where is_disposable = true
and confidence_score between 50 and 80 -- Medium confidence
and created_at > now() - interval '24 hours'
order by created_at desc;
Best Practices
1. Layered Defense: Combine multiple signals rather than relying on one.
2. Regular Updates: Refresh disposable domain lists weekly.
3. A/B Testing: Test different thresholds and policies.
4. User Education: Clear messaging about why permanent emails are preferred.
5. Monitoring: Set up alerts for spikes and trends.
6. Privacy Compliance: Ensure checks comply with regional laws (GDPR, CCPA).
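As one way to picture the layered-defense point, independent signals can be folded into a single 0-100 score. The sketch below uses a noisy-OR combination (one minus the product of each signal's miss probability), which is a technique chosen here for illustration and not something the code above implements. The weights are made-up placeholders to be tuned against real data; listed_disposable is a hypothetical signal name, while the others match the risk factors used earlier.

```typescript
// Illustrative weights only: each is the assumed probability that the
// signal alone indicates a disposable address
const SIGNAL_WEIGHTS: Record<string, number> = {
  listed_disposable: 0.9,
  no_mx_records: 0.5,
  suspicious_keywords: 0.3,
  numeric_sequence: 0.2,
  missing_spf: 0.15,
  missing_dmarc: 0.1,
}

// Noisy-OR: score = 1 - product(1 - weight) over the distinct signals present
export function combineSignals(signals: string[]): number {
  const distinct = Array.from(new Set(signals))
  let probabilityNoneFire = 1
  for (const signal of distinct) {
    const weight = SIGNAL_WEIGHTS[signal] ?? 0 // unknown signals contribute nothing
    probabilityNoneFire *= 1 - weight
  }
  return Math.round((1 - probabilityNoneFire) * 100)
}
```

With this shape, adding a weak signal never lowers the score and no combination can exceed 100, so the result can feed the same confidence thresholds used for blocking and review.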
Integration Examples
Express.js Middleware
import { validateEmail } from './email-validator'
app.post('/api/signup', async (req, res) => {
const { email } = req.body
const validation = await validateEmail(email)
if (validation.recommendation === 'block') {
return res.status(400).json(validation)
}
if (validation.recommendation === 'review') {
// Queue for manual review or step-up auth
req.session.pendingReview = validation
}
// Continue with signup...
})
Python FastAPI
from fastapi import HTTPException
from .email_validator import validate_email
@app.post("/signup")
async def signup(email: str):
    validation = await validate_email(email)
    if validation["recommendation"] == "block":
        raise HTTPException(
            status_code=400,
            detail=validation["error_message"]
        )
    if validation["recommendation"] == "review":
        # Trigger additional verification
        pass
    return {"message": "Signup successful"}
This comprehensive approach balances fraud prevention with user experience, ensuring legitimate users aren't unnecessarily blocked while protecting your platform from abuse.