Periskop: Exception Monitoring at Scale
This talk is aimed at engineers who operate distributed systems (such as microservices) and are interested in monitoring exceptions at scale. We introduce the open-source project "Periskop", a pull-based exception monitoring service built at SoundCloud and inspired by Prometheus.
- The problems we encountered with the traditional push-based model for exception monitoring:
  - Thundering-herd issues during bad deployments
  - Difficulty navigating large volumes of logs to identify exceptions
- An alternative pull-based model that scales well with the number of exceptions and instances:
  - Aggregation plus sampling of concrete occurrences
  - Limitations and trade-offs (short-lived processes and fork-based application servers)
- An implementation of this model in the open-source project "Periskop":
  - Initial development
  - Server and client libraries
  - Newly added features and roadmap (push gateway, federation, time-series visualization, integrations)
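
To make the "aggregation plus sampling" idea concrete, here is a minimal Python sketch of what a client in a pull-based model might do. It is an illustration under stated assumptions, not Periskop's actual client API: the class `ExceptionAggregator`, its methods `report` and `snapshot`, the grouping key, and the JSON layout are all hypothetical names chosen for this example.

```python
import json
import traceback
from collections import deque


class ExceptionAggregator:
    """Hypothetical in-process aggregator (not Periskop's real client API)."""

    def __init__(self, max_samples=3):
        self.max_samples = max_samples
        self._groups = {}  # aggregation key -> {"total_count", "samples"}

    def report(self, exc):
        # Assumption: group by exception class and the location of the raise,
        # so repeated occurrences of the same error collapse into one entry.
        frames = traceback.extract_tb(exc.__traceback__)
        location = f"{frames[-1].filename}:{frames[-1].lineno}" if frames else "unknown"
        key = f"{type(exc).__name__}@{location}"
        group = self._groups.setdefault(
            key, {"total_count": 0, "samples": deque(maxlen=self.max_samples)}
        )
        group["total_count"] += 1
        # A bounded deque keeps only the most recent occurrences as samples.
        group["samples"].append(str(exc))
        return key

    def snapshot(self):
        # What a scraping server would pull: one entry per aggregation key.
        return json.dumps(
            [
                {
                    "key": key,
                    "total_count": group["total_count"],
                    "samples": list(group["samples"]),
                }
                for key, group in self._groups.items()
            ]
        )
```

In this sketch the size of the scraped payload grows with the number of distinct aggregation keys times `max_samples`, not with the number of occurrences, which is why a burst of identical exceptions after a bad deployment does not translate into a burst of traffic to the monitoring server.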
Speakers:
Jorge Creixell
Marc Tuduri