Rspamd - fast opensource spam filter

FOSDEM 2016

Rspamd is a fast, open source (BSD licensed) spam filtering system that uses a scoring system to filter messages. In this presentation, I will speak about its internal architecture, performance optimizations, security issues, the algorithms used and general spam filtering problems.
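
To make the idea of score-based filtering concrete, here is a minimal Python sketch. It is not rspamd's implementation: the rule names, weights, match conditions and the threshold are all hypothetical and chosen only for illustration. Each rule that matches contributes its weight, and the total is compared against a threshold.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Rule:
    name: str                       # hypothetical symbol name
    weight: float                   # positive = spammy, negative = hammy
    matches: Callable[[str], bool]  # predicate over the raw message text


# Hypothetical rules and weights, for illustration only.
RULES = [
    Rule("SUBJ_ALL_CAPS", 2.0, lambda msg: msg.split("\n", 1)[0].isupper()),
    Rule("HAS_FREE_MONEY", 4.5, lambda msg: "free money" in msg.lower()),
    Rule("KNOWN_SENDER", -3.0, lambda msg: "from: alice@example.org" in msg.lower()),
]

SPAM_THRESHOLD = 6.0  # hypothetical threshold


def score_message(msg: str) -> Tuple[float, List[str]]:
    """Return the total score and the names of the rules that fired."""
    fired = [rule for rule in RULES if rule.matches(msg)]
    return sum(rule.weight for rule in fired), [rule.name for rule in fired]


def classify(msg: str) -> str:
    """Label a message by comparing its total score with the threshold."""
    total, _ = score_message(msg)
    return "spam" if total >= SPAM_THRESHOLD else "ham"
```

A negative weight lets a trusted signal offset spammy ones, which is one reason a summed score is more flexible than a single yes/no rule.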

In this presentation, I will describe rspamd, a fast spam filtering system used by many companies that process large volumes of e-mail. I'll show performance comparison graphs and walk through the algorithms and rspamd's internal architecture. The talk consists of four parts:

  • Introduction to spam filtering:

    • What are the most popular spam types so far: advertising, fraud, cloaked spam, image spam
    • What techniques are mainly used to fight spam: adaptive filtering and machine learning, pattern matching, blacklists and whitelists
    • Why is it so difficult to write a good spam filtering system: spam for one person could be a useful message for another
    • What is wrong with spam: ethical and security considerations
  • Architecture description:

    • What makes rspamd different from other state-of-the-art spam filters: internal architecture, plugins and rules
    • Which algorithms are optimal to fight spam: OSB Bayes, shingles for fuzzy hashes
  • Performance optimizations used:

    • How to write a spam filter that can filter hundreds of messages per second on commodity hardware: event-based architecture
    • Global and local optimizations used: abstract syntax tree optimizations, branch cutting, greedy optimizations
    • Why the standard approaches are broken: zero-terminated strings, many POSIX functions
    • Other high-performance technologies: Hyperscan, PCRE JIT, Aho-Corasick tries, radix tries, LuaJIT
  • Security discussion:

    • Why encryption is important for all traffic in the network
    • Why email security is absolutely essential at all stages
    • What makes TLS not so easy to introduce in a datacentre: latency issues, hard to use with events, complicated model of trust
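
As an illustration of the OSB (orthogonal sparse bigrams) tokenization named in the architecture part of the outline, here is a minimal Python sketch. It shows the general technique only, not rspamd's code: the function name, the tuple layout and the default window size are assumptions made for this example. Each token is paired with each earlier token inside a sliding window, and the distance between the two is kept as part of the feature.

```python
from typing import List, Tuple


def osb_features(tokens: List[str], window: int = 5) -> List[Tuple[str, int, str]]:
    """Generate OSB features as (earlier_token, distance, current_token).

    For every token, pair it with each of the up to (window - 1) tokens
    that precede it. The distance keeps "buy cheap" distinct from
    "buy <skip> cheap".
    """
    features = []
    for i, tok in enumerate(tokens):
        for dist in range(1, window):
            j = i - dist
            if j < 0:
                break  # ran off the start of the message
            features.append((tokens[j], dist, tok))
    return features


# Example with a small window so the output stays readable.
pairs = osb_features(["buy", "cheap", "pills", "now"], window=3)
```

These features, rather than single words, would then be fed to a Bayesian classifier, so that word combinations carry weight even when other words are inserted between them.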

In conclusion, this talk covers many state-of-the-art spam filtering methods, along with related topics such as writing high-performance systems and securing email processing systems.

Speakers: Vsevolod Stakhov