Rspamd: a fast open source spam filter
Rspamd is a fast, open source (BSD-licensed) spam filtering system that uses a scoring system to filter messages. In this presentation, I will talk about its internal architecture, performance optimizations, security issues, the algorithms it uses and general spam filtering problems.
In this presentation, I will describe rspamd, a fast spam filtering system used by many companies that process large volumes of e-mail. I'll show performance comparison graphs and explain the algorithms and internal architecture of rspamd. The talk consists of four parts:
Introduction to spam filtering:
- What are the most common types of spam today: advertising, fraud, cloaked spam, image spam
- What techniques are mainly used to fight spam: adaptive filtering and machine learning, pattern matching, blacklists and whitelists
- Why is it so difficult to write a good spam filtering system: one person's spam can be a useful message for another
- What is wrong with spam: ethical and security considerations
Architecture description:
- What makes rspamd different from other state-of-the-art spam filters: internal architecture, plugins and rules
- Which algorithms work best against spam: OSB (orthogonal sparse bigrams) Bayes, shingles for fuzzy hashing
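As a rough illustration of the OSB idea: each token is paired with each of the preceding few tokens, and the distance between the two is kept as part of the feature, so "buy cheap" and "buy ... pills" become distinct features. The sketch below assumes a window of 5 tokens and feeds the features into a toy two-class naive Bayes log-odds score with Laplace smoothing that ignores class priors. The names `osb_features` and `OsbBayes` are hypothetical; this is not rspamd's actual tokenizer, classifier or statistics backend.

```python
import math
from collections import Counter


def osb_features(tokens, window=5):
    """Orthogonal sparse bigrams: pair each token with each of the previous
    window-1 tokens, keeping the distance between them in the feature."""
    features = []
    for i, later in enumerate(tokens):
        for d in range(1, window):
            j = i - d
            if j < 0:
                break  # no earlier token at this distance
            features.append((tokens[j], later, d))
    return features


class OsbBayes:
    """Toy two-class naive Bayes over OSB features with Laplace smoothing.
    Hypothetical helper for illustration only (no class priors)."""

    def __init__(self, window=5):
        self.window = window
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.totals = {"spam": 0, "ham": 0}

    def learn(self, tokens, label):
        # Count every OSB feature of a training message under its label.
        feats = osb_features(tokens, self.window)
        self.counts[label].update(feats)
        self.totals[label] += len(feats)

    def spam_log_odds(self, tokens):
        """Sum of log P(f|spam) / P(f|ham): > 0 leans spam, < 0 leans ham."""
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"])) or 1
        score = 0.0
        for f in osb_features(tokens, self.window):
            p_spam = (self.counts["spam"][f] + 1) / (self.totals["spam"] + vocab)
            p_ham = (self.counts["ham"][f] + 1) / (self.totals["ham"] + vocab)
            score += math.log(p_spam / p_ham)
        return score
```

Keeping the distance in each feature lets the classifier capture some word-order context that plain single-word tokens lose, at the cost of up to window-1 features per token.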
Performance optimizations used:
- How to write a spam filter that can process hundreds of messages per second on commodity hardware: event-based architecture
- Global and local optimizations used: abstract syntax tree optimizations, branch cutting, greedy optimizations
- Why the standard approaches are broken: zero-terminated strings, many POSIX functions
- Other high-performance technologies: Hyperscan, PCRE JIT, Aho-Corasick tries, radix tries, LuaJIT
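Aho-Corasick, listed above, finds many fixed strings in a single pass over the text, which makes it a good fit for rules that search for large keyword sets. The following is a minimal textbook sketch (the class name `AhoCorasick` is hypothetical, and this is not rspamd's own multi-pattern matcher): it builds a trie of the patterns, adds failure links with a breadth-first pass, and reports every match together with its start offset.

```python
from collections import deque


class AhoCorasick:
    """Textbook Aho-Corasick automaton over Python strings (illustrative sketch)."""

    def __init__(self, patterns):
        self.patterns = list(patterns)
        self.goto = [{}]   # goto[state][char] -> next state (trie edges)
        self.fail = [0]    # failure link for each state
        self.out = [[]]    # indices of patterns that end in each state

        # 1. Build the trie of all patterns.
        for idx, pat in enumerate(self.patterns):
            if not pat:
                continue  # skip empty patterns, they would match everywhere
            state = 0
            for ch in pat:
                if ch not in self.goto[state]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[state][ch] = len(self.goto) - 1
                state = self.goto[state][ch]
            self.out[state].append(idx)

        # 2. Breadth-first pass: the failure link of a state points to the
        #    longest proper suffix of its string that is also a trie prefix.
        #    Depth-1 states keep the root (0) as their failure link.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit matches reachable through the failure link.
                self.out[nxt] += self.out[self.fail[nxt]]

    def search(self, text):
        """Return (start_offset, pattern) for every match, ordered by end offset."""
        hits = []
        state = 0
        for pos, ch in enumerate(text):
            # Follow failure links until the character can be consumed.
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for idx in self.out[state]:
                pat = self.patterns[idx]
                hits.append((pos - len(pat) + 1, pat))
        return hits
```

For example, `AhoCorasick(["he", "she", "his", "hers"]).search("ushers")` returns `[(1, 'she'), (2, 'he'), (2, 'hers')]`. Matching time grows with the text length plus the number of matches, not with the number of patterns, which is why a single automaton can serve thousands of keywords.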
Security discussion:
- Why encryption is important for all traffic in the network
- Why email security is absolutely essential at all stages
- What makes TLS hard to deploy in a datacentre: latency, difficulty of integrating it with event-based code, a complicated trust model
In summary, this talk covers many state-of-the-art spam filtering methods, as well as related topics such as writing high-performance systems and securing email processing systems.
Speakers:
Vsevolod Stakhov