bayesian-spam-filtering

A statistical method that calculates the probability an email is spam based on the occurrence of tokens, using Bayes' Rule to combine evidence.
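As a sketch of how the evidence is combined, assume each token's presence is independent of the others and that spam and non-spam are equally likely beforehand (both are simplifying assumptions, and the symbols below are introduced here for illustration). If p_i is the probability that a message containing token i is spam, Bayes' rule gives the combined probability for a message with n tokens:

```latex
P(\text{spam} \mid w_1, \dots, w_n)
  = \frac{\prod_{i=1}^{n} p_i}
         {\prod_{i=1}^{n} p_i + \prod_{i=1}^{n} (1 - p_i)},
\qquad p_i = P(\text{spam} \mid w_i)
```

For example, two tokens with p_1 = 0.9 and p_2 = 0.8 give 0.72 / (0.72 + 0.02), or about 0.97.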

2 chapters across 2 books

Hackers & Painters: Big Ideas from the Computer Age (2008) Paul Graham

Chapter 8. A Plan for Spam

Paul Graham argues that spam can be effectively stopped using content-based filters, specifically Bayesian statistical methods that analyze the probability of individual words indicating spam. He details the construction and tuning of such filters, emphasizing the importance of minimizing false positives and customizing filters per user to adapt to evolving spam tactics. The chapter critiques simpler heuristic filters and highlights the advantages of probabilistic approaches over arbitrary scoring systems.

Hackers & Painters (2008) Paul Graham

Chapter 8. A Plan for Spam

Paul Graham argues that spam can be effectively stopped using content-based filters, specifically Bayesian statistical methods that calculate the probability of an email being spam based on word occurrences. He explains how traditional feature-based filters are limited by false positives and the difficulty of identifying all spam features, while Bayesian filters adapt to individual users' email patterns and provide probabilistic assessments. The chapter details the implementation of such filters, their advantages, and the importance of personalized training corpora to maintain effectiveness against evolving spam.
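The idea of scoring a message from per-token probabilities learned on a user's own mail can be sketched roughly as follows. This is a minimal illustration, not the filter described in the chapter: the function names, the tokenizer pattern, the add-one smoothing, and the tiny training corpora are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of letters, digits, $, ' and -.
    return re.findall(r"[a-z0-9$'-]+", text.lower())


def train(messages):
    # Count, for each token, the number of messages it appears in.
    counts = Counter()
    for msg in messages:
        counts.update(set(tokenize(msg)))
    return counts


def token_spam_prob(token, spam_counts, ham_counts, n_spam, n_ham):
    # Add-one smoothing (an assumption) keeps unseen tokens away from 0 and 1.
    p_tok_spam = (spam_counts[token] + 1) / (n_spam + 2)
    p_tok_ham = (ham_counts[token] + 1) / (n_ham + 2)
    # Bayes' rule with equal priors for spam and non-spam.
    return p_tok_spam / (p_tok_spam + p_tok_ham)


def spam_probability(text, spam_counts, ham_counts, n_spam, n_ham):
    # Combine per-token probabilities assuming tokens are independent.
    # Work in log space so long messages do not underflow.
    log_spam = 0.0
    log_ham = 0.0
    for tok in set(tokenize(text)):
        p = token_spam_prob(tok, spam_counts, ham_counts, n_spam, n_ham)
        log_spam += math.log(p)
        log_ham += math.log(1.0 - p)
    # prod(p) / (prod(p) + prod(1 - p)) == 1 / (1 + exp(log_ham - log_spam))
    diff = max(-700.0, min(700.0, log_ham - log_spam))
    return 1.0 / (1.0 + math.exp(diff))


# Toy per-user corpora, invented for illustration only.
spam_corpus = ["win free money now", "free prize click now"]
ham_corpus = ["meeting notes attached", "lunch tomorrow at noon"]
spam_counts = train(spam_corpus)
ham_counts = train(ham_corpus)

score_spammy = spam_probability(
    "free money", spam_counts, ham_counts, len(spam_corpus), len(ham_corpus)
)
score_hammy = spam_probability(
    "meeting tomorrow", spam_counts, ham_counts, len(spam_corpus), len(ham_corpus)
)
```

Because the counts come from one user's own spam and non-spam mail, retraining on a different mailbox changes every token probability, which is the sense in which such a filter is personalized. A real filter would also need a decision threshold chosen to keep false positives rare.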