The State of Monitoring – Insights from Monitorama PDX 2015

SVT places a high value on the continuing education of its employees, which opens up some great opportunities. One of them came in June, when four of us from the Webcore team attended Monitorama 2015 in Portland, USA.

Monitorama describes itself as an Open Source monitoring conference. By monitoring in the context of IT, we generally mean the technology, process and science of measuring how our technical systems, e.g. our web servers, are actually doing. Are they up? Are they overloaded? In a way, one could compare IT monitoring to all the little gauges and screens in a jet cockpit that tell the pilots their position and how the plane is doing. That includes the plane’s “black box”, whose historical recordings help figure out the root cause of a problem after something has gone wrong.

Without monitoring, we are basically flying our IT systems blind. Nor will we be able to analyse our failures after the fact and learn from them. This is why monitoring is at the heart of an ongoing paradigm shift in the IT world that we have given the sloppy moniker “DevOps”. It is basically a new ideal of how people from different IT disciplines work together in a more open and collaborative way.

Monitoring gets SVT’s Webcore team excited

At SVT Interaktiv, we are big believers in DevOps and monitoring. Part of the Webcore team’s responsibilities is to look after the monitoring infrastructure for everybody in the division. So Monitorama was a great opportunity for us to learn about and discuss the state of the art of IT monitoring, as well as all those Open Source tools that have made monitoring much more accessible to anyone in recent years. In spite of the long and tiring air travel, I have to say that I have seldom felt a conference trip to be as worthwhile as this one. Of course, it was a great experience getting to know the lively city of Portland, Oregon, and sampling some of its great food and microbrewery culture under a clear blue sky. The real highlights, however, were the conference itself and the community surrounding it.

To start with, Monitorama 2015 was very well organised. About 570 attendees took their seats in front of the main stage of the Gerding Theatre to follow a three-day programme. The organisers made an effort to make it very easy to continue the discussion after the talks, be it at the organised after parties or in the chat rooms that all attendees and speakers had access to. Because the conference was single-track, everybody had heard the same talks, which gave those discussions common ground.

Ines Sombra presenting her talk “From zero to capacity planning” at Monitorama PDX 2015

We picked out a few main themes in the technical discussions during the conference. One of the main buzzwords at Monitorama 2015 was stream processing, a model that provides concepts and vocabulary to better describe how our current monitoring systems work. I think this is simply a sign of a maturing ecosystem.

Another interesting discussion revolved around the question of push versus pull monitoring. Should monitoring systems pull information from the systems they watch, or should these systems push their data into the monitoring system themselves? It’s a trade-off between a scheduling problem and a load balancing problem, and both approaches have their merits. Which one is better for you really depends on your infrastructure and expertise.
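
To make the distinction a bit more concrete, here is a minimal sketch in Python. It is not taken from any talk or from any real monitoring tool: the MonitoringStore class, its method names and the metric names are all made up for illustration, and everything runs in memory instead of over a network.

```python
import time
from typing import Callable, Dict, List, Tuple

Sample = Tuple[str, float, float]  # (metric name, value, unix timestamp)


class MonitoringStore:
    """Hypothetical in-memory stand-in for a monitoring backend."""

    def __init__(self) -> None:
        self.samples: List[Sample] = []
        self.targets: Dict[str, Callable[[], float]] = {}

    def receive(self, name: str, value: float) -> None:
        # Push model: the monitored system calls this whenever it has data.
        self.samples.append((name, value, time.time()))

    def register_target(self, name: str, read: Callable[[], float]) -> None:
        # Pull model: the store is told where a value can be read from.
        self.targets[name] = read

    def scrape_all(self) -> None:
        # Pull model: the store itself decides when to read every target.
        for name, read in self.targets.items():
            self.samples.append((name, read(), time.time()))


store = MonitoringStore()

# Push: a (made-up) web server reports its own latency measurement.
store.receive("web1.request_latency_ms", 42.0)

# Pull: a (made-up) worker only exposes a value; the store fetches it.
queue_depth = {"value": 7.0}
store.register_target("worker1.queue_depth", lambda: queue_depth["value"])
store.scrape_all()

names = [name for name, _, _ in store.samples]
print(names)
```

In the push case the monitored system is in control of when data arrives, while in the pull case the monitoring system has to know about every target and decide when to visit it.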

Automated anomaly detection was discussed quite a bit. The advantages are clear: why not have a computer, rather than a human, look at all those graphs and alert you when something suspicious is happening? Unfortunately, this is a really hard problem to solve, and generally applicable solutions simply do not seem to be there just yet.
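
As a rough illustration of the basic idea, and of why it is harder than it looks, here is a deliberately naive sketch in Python. It is not one of the algorithms presented at the conference: the function name, the window size, the threshold and the sample data are all assumptions chosen for this example. It flags a value when it lies far away from the average of the values just before it.

```python
from statistics import mean, stdev
from typing import List


def find_anomalies(values: List[float], window: int = 5,
                   threshold: float = 3.0) -> List[int]:
    """Return the indices of values that lie more than `threshold` standard
    deviations from the mean of the `window` values preceding them."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change as a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up response times in milliseconds with one obvious spike.
response_times_ms = [100, 102, 99, 101, 100, 103, 98, 250, 101, 100]
print(find_anomalies(response_times_ms))  # prints [7]
```

Even on this toy data the limits show: once the spike has entered the window, it inflates the standard deviation so much that a second spike shortly afterwards could go unnoticed, and a metric with a daily rhythm or a slow trend would need a different approach altogether.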

On a more practical level, one of the strong trends in modern monitoring continues to be widening the perspective from a traditional per-host view to include a higher-level oversight, with a focus on services, clusters and business metrics.

A frequently repeated piece of advice was to be conscious of the target audience of a monitoring system. An infrastructure/monitoring team should not be the only or even the main user of its own monitoring system. The ideal outcome would be that all functions of your organisation rely heavily on your monitoring system, be it developers, designers or business people. This raises the requirements on the productisation of monitoring systems, with a focus on usability and self-service. These systems need to be easy and safe to use.

Personally, I have come to realise that monitoring is much more of a big data problem than I had previously thought. It needs to be more widely understood that monitoring big and complex systems itself requires a lot of resources. As Roy Rapoport put it, relaying an internal joke at his company: “Netflix is a sophisticated monitoring system that also happens to be streaming video at times”. The EC2 bill for their monitoring system is apparently the highest of any system at Netflix.

Kyle Kingsbury and his hand-drawn slides from his talk “Working with Riemann” at Monitorama PDX 2015

As you can see, there were a lot of insights to be gained at Monitorama. In my opinion, however, the most striking aspect of the conference was that it showed just how inclusive the community is. Despite the small niche of the IT world that the conference covered, attendees and speakers came from diverse backgrounds and brought expertise in distinct disciplines: from developers presenting their new stream processing library to designers talking about good dashboard user interfaces, and from social scientists reporting on their research into user behaviour to data analysts unveiling their latest anomaly detection algorithm.

Diversity was definitely one of the key concerns of the organisers, who set not only an interdisciplinary content agenda but also a very clear anti-discrimination policy. The gender distribution among both speakers and attendees at Monitorama was certainly a lot more even than what one has come to expect from technology conferences in the past. This surely validates the organisers’ approach. On a higher level though, I think it shows that the open and communicative culture of the DevOps movement can be an important stepping stone towards fixing the male-dominated IT industry.

Without doubt, Monitorama offered an opportunity to see what the current DevOps movement is actually about: communicating and collaborating with people from different backgrounds to make IT better and more humane, both for creators and users. Personally, I find this a great time and a great area of the software industry to be working in.

 

P.S.: Many of the Monitorama PDX 2015 talks have been recorded and can be watched here:
https://vimeo.com/search?q=monitorama+pdx+2015