Automating all the things – More Continuous Delivery at SVTi
Our Developer Experience vision
Team autonomy is one of our most fundamental values here at SVTi. Having independent, self-responsible teams that each own distinct products fits well with an organisation built around our different services, be it the Bolibompa app or SVT Play. With everything built around team autonomy, the challenges in such an organisation, unsurprisingly, lie in consistency and synergy between the teams. Because, even though we’d take autonomy over synergy if in doubt, we definitely need a pretty solid level of synergy between the teams: not everybody should reinvent the wheel all the time. This is what we in the Developer Experience (DX) team here at SVTi work on. Our vision:
Make SVTi the best place for developers to create.
The general recipe boils down to making it easy for everybody to do the right thing: offering general-purpose building blocks and collaborating on how to utilise them, without forcing everyone to work in the same way. Kind of in the spirit of the half-built houses of Constitución, Chile. This also means that we often deduce the common use cases after the fact and iteratively improve common infrastructure and tooling, instead of indulging in the illusion that we could anticipate everything beforehand.
One such piece of common infrastructure that we work with and recently iterated over is our Continuous Delivery (CD) toolbox.
The Continuous Delivery toolbox
One Jenkins per team
For years now, every team at SVTi has had their own Jenkins VM running on Linux that they use to automate their Continuous Delivery pipelines. Separate Jenkins instances allow for better performance and let the individual teams adopt new Jenkins technologies at their own pace. They can change their configuration and install or update plugins without fear of breaking some other team’s working automation. Less stepping on each other’s feet, simply put.
This also encourages team ownership of their Jenkins machines. We at DX merely provide a base setup, pre-installed and pre-configured with useful defaults that fit our documented best practices (our tool to tame the historical Jenkins plugin sprawl). Every team actually owns their Jenkins instance though: they have SSH, sudo and admin access on their machine and are responsible for keeping their OS and plugins updated.
At DX, all Jenkins instances are automatically provisioned and continuously updated using Ansible playbooks. The VM creation is currently only semi-automated and teams have to go through some manual steps, for example connecting their Jenkins instance to their desired Slack channel for notifications. But about 95% of an instance’s setup (including things that might need to be updated, like the authentication setup) is automated. A number that we deem satisfactory, since the Jenkins VMs have a pretty long lifetime.
The basic setup that a team gets includes:
- Pre-installed OS packages for CD automation
- Pre-configured authentication via our directory server
- Pre-installed and preconfigured Jenkins plugins
- Secrets for access to repositories and deployment platforms
- Automatic security updates including nightly reboot to apply kernel updates
Apart from moving from Ubuntu 14.04 to CentOS 7, we had two goals for this iteration of the Jenkins setup: making it easier for the teams to use newer Pipeline-as-code features and improving the security of our Jenkins setup. We also moved from the Jenkins latest release line to the Jenkins LTS release repository, as a sane default for teams that want a lower-maintenance Jenkins setup.
Improving Jenkins security
Separate build agent
One of the major steps to improve the security of the setup was to move to a master and build agent model. In master-only Jenkins setups, all build jobs can obviously only run on the only available node, the master. By default, these build jobs have exactly the same system permissions as the Jenkins master web server, since both run as the `jenkins` Linux user. That means build jobs can read and write everything in Jenkins’ data folder, which contains everything from build stats to authentication configuration. A hypothetically compromised build job, let’s say via a malicious `gradle` build plugin, could thereby easily read security-sensitive configuration or install malicious plugins in Jenkins. Even worse, since the `jenkins` user needs access to persisted secrets and keys, the compromised code could read those and thus spread to other systems that the Jenkins machine has access to.
Obviously, one can run build jobs as a different user, but we can do better. Containerisation like Docker could help here, but container isolation is still not quite as secure as a VM, which is why we opted to use a separate VM as the default build agent. That way we can confine sensitive information like SSH deploy keys to the Jenkins master. Deploy jobs that need access to those secrets can be run specifically on the master by supplying the build with a label; everything else runs on the build agent VM by default. Since our deploy operations are only a small set of well-known commands, this reduces the exposure of these secrets to a minimum.
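As a sketch in Scripted Pipeline terms, this split could look as follows. The node labels, Gradle task and deploy script are illustrative assumptions, not our actual setup:

```groovy
// Build on the default agent VM; only the deploy stage runs
// on the master, where the SSH deploy keys live.
node('build-agent') {
    stage('Build') {
        checkout scm
        sh './gradlew clean build'
    }
}
node('master') {
    stage('Deploy') {
        // Only this small, well-known set of commands ever
        // executes on the node holding the secrets.
        sh './deploy.sh production'
    }
}
```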
Obviously, the build jobs running on the build agent VM also need access to some secrets in order to pull code from source code and binary repositories. For that we use the Jenkins Credentials Store, which persists secrets on the master machine in encrypted form. When we need a secret on the build agent, we explicitly export that secret to that specific build job. Some Jenkins plugins, like the `git` plugin, have built-in support for being supplied with a credentials id from the Credentials Store. In JobDSL (see below) code this can look something like this:
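A minimal sketch, where the job name, repository URL and the credentials id `github-deploy-key` are hypothetical placeholders for entries in the Credentials Store:

```groovy
// JobDSL: the git plugin accepts a credentials id directly,
// so the secret itself never appears in the job definition.
job('example-build') {
    scm {
        git {
            remote {
                url('git@github.com:example/project.git')
                credentials('github-deploy-key')
            }
        }
    }
    steps {
        gradle('build')
    }
}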
When using shell commands or plugins that do not have Credentials Store integration, you can use credentials binding to export specific credentials to environment variables inside a job. Here is an example in declarative pipeline code:
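A sketch of what that can look like; the credentials id `artifactory-user` and the Gradle properties are illustrative assumptions:

```groovy
// Declarative Pipeline: withCredentials exports the secret as
// environment variables for the duration of the block only.
pipeline {
    agent any
    stages {
        stage('Publish') {
            steps {
                withCredentials([usernamePassword(
                        credentialsId: 'artifactory-user',
                        usernameVariable: 'REPO_USER',
                        passwordVariable: 'REPO_PASS')]) {
                    sh './gradlew publish -PrepoUser=$REPO_USER -PrepoPass=$REPO_PASS'
                }
            }
        }
    }
}
```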
Credentials binding is a bit of a compromise though: be aware that if you have several executors on the same machine, jobs running in parallel can see each other’s environment variables using system tools like `ps`. To mitigate that, you could turn off the `jenkins` user’s access to `/proc`, or simply use only one executor per build agent.
The state of Jenkins Pipeline as Code
The long-standing best practice at SVTi has been to use the JobDSL plugin to automate our Continuous Delivery pipelines as code. JobDSL is a bit of a hack, but by now it is a mature and proven hack that essentially lets you script the configuration of your Jenkins jobs by writing Groovy DSL (domain-specific language) scripts, which are checked into source control along with the project code. These scripts contain quite a bit of boilerplate, as you can see in the example below:
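A sketch of such a JobDSL script chaining jobs into a pipeline; the job names, repository URL and commands are illustrative, not our actual configuration:

```groovy
// JobDSL: a pipeline is assembled by wiring individual jobs
// together via downstream triggers, one job block per stage.
job('project-build') {
    scm {
        git('git@github.com:example/project.git')
    }
    triggers {
        scm('H/5 * * * *') // poll for changes
    }
    steps {
        gradle('clean build')
    }
    publishers {
        downstream('project-test', 'SUCCESS')
    }
}

job('project-test') {
    steps {
        gradle('integrationTest')
    }
    publishers {
        downstream('project-deploy', 'SUCCESS')
    }
}

job('project-deploy') {
    label('master') // run where the deploy keys live
    steps {
        shell('./deploy.sh staging')
    }
}
```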
However, JobDSL is proven, and when combined with other plugins like the Delivery Pipeline visualisation, it makes for a powerful tool for building sophisticated pipelines.
Jenkins scripted pipeline
At some point, while the shift from continuous integration to continuous delivery was already well under way, the Jenkins developers started building a new engine, in an extension called the Workflow plugin, that would make pipelines, rather than monolithic build jobs, the first-class abstraction of work. This effort was renamed simply Pipeline and has come pre-installed since Jenkins 2.0, making it the officially endorsed way of doing Pipeline as code in Jenkins.
In comparison to these native pipeline capabilities, JobDSL clearly appears as the hack it is. Jenkins Pipeline no longer chains old Jenkins jobs; it is a completely new abstraction that was, and still is, incompatible with many existing plugins. Thankfully, CloudBees open-sourced their admittedly rather basic, formerly proprietary-licensed pipeline visualisation plugin as part of Jenkins 2, so you at least have a way of visualising these pipelines.
With Jenkins Pipeline, one no longer needs seeder jobs either: pipeline jobs (with multi-branch support) pick up script files conventionally called `Jenkinsfile`, written in a new Groovy DSL for automating pipelines in an imperative way. No longer does one need to pass parameters to downstream jobs; variables and code from Source Code Management (but not workspaces) are scoped to the lifecycle of the whole pipeline. More sophisticated use cases, like try/catch flow control and advanced parallelisation, are easy to implement. Have a look at the example below of a pipeline written in the Scripted Pipeline Groovy DSL:
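A sketch of what such a `Jenkinsfile` can look like; the node labels, Gradle tasks and Slack notification step are illustrative assumptions:

```groovy
// Scripted Pipeline: stages, parallel branches and try/catch
// flow control in one script, spanning multiple nodes.
node('build-agent') {
    stage('Build') {
        checkout scm
        sh './gradlew clean build'
        stash name: 'artifacts', includes: 'build/libs/**'
    }
    stage('Test') {
        parallel(
            'unit tests': {
                sh './gradlew test'
            },
            'integration tests': {
                sh './gradlew integrationTest'
            }
        )
    }
}
try {
    node('master') {
        stage('Deploy') {
            unstash 'artifacts'
            sh './deploy.sh staging'
        }
    }
} catch (err) {
    // Notify, then rethrow so the build is marked as failed.
    slackSend(color: 'danger', message: "Deploy failed: ${err}")
    throw err
}
```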
The classic Jenkins interface has the following visualisation for such pipelines:
Jenkins declarative pipeline
Just when one was hoping for the dust to settle around the new Jenkins Pipeline, CloudBees (the company behind a lot of Jenkins development) decided to invent yet another Groovy DSL. It is still based on the same Pipeline engine as the scripted pipeline above, but this time it comes in a declarative form. It is more limited and opinionated about which use cases are supported, which makes it more concise and easier to read, values that we highly appreciate here at DX. It certainly produces the nicest-looking pipeline code on Jenkins to date.
One of the main reasons for the move seems to be that CloudBees are developing a brand new user interface for Jenkins, with Pipeline at its centre. It sports a much more modern look at pipelines and features such niceties as pull request integration (currently only with GitHub). The effort is dubbed Blue Ocean and is currently at the release candidate stage. The interface can be previewed by installing the Blue Ocean plugin in an existing, modern Jenkins instance.
So the declarative pipeline DSL is modelled so that it is easy for Jenkins to visualise pipelines written in it. This limits how parallelism can be implemented: you can only run steps on the same node in parallel, as opposed to stages on different nodes. This has proven to be a problem for a lot of our more sophisticated pipelines, especially since we use different nodes for deployment and building. We also rely on stages as scopes for our credentials bindings, something that you cannot do with steps. So declarative pipeline currently has limited applicability for many of our use cases.
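A sketch of that limitation; the agent label and Gradle tasks are illustrative assumptions:

```groovy
// Declarative Pipeline: the parallel branches are steps inside
// a single stage and run on that stage's one agent. There is
// no way here to fan stages out across different nodes.
pipeline {
    agent { label 'build-agent' }
    stages {
        stage('Test') {
            steps {
                parallel(
                    'unit': { sh './gradlew test' },
                    'lint': { sh './gradlew check' }
                )
            }
        }
    }
}
```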
The level of support for visualising the more complex capabilities of scripted pipelines is currently a bit hard to make out. I guess we have to wait and see and hope for Blue Ocean and the Pipeline DSL dialects to mature.
So which DSL to pick then?
Our old and new best-practice recommendation is still to use JobDSL: it’s mature and proven, and our teams know how to work with it. Currently, it also offers the most sophisticated visualisation for what many of us want to do with it. However, we expect this to change in the not-so-distant future. Right now it is great for us that the same Jenkins setup can support the old, mature way of working as well as experimenting with the new one.
The decision whether Pipeline is mature enough is ultimately up to each team. Some teams have already migrated some of their pipelines to Scripted Pipeline; some have dabbled and decided that the effort is not worth it just yet. If Pipeline ultimately proves to be better, teams will migrate.
What’s happening with Jenkins is definitely interesting. Jenkins has been and still is a bit messy with all its competing plugins and hacks. However, I can definitely see the project going in the right direction with standardising around Pipeline. Jenkins and its huge community tend to carve out a way for themselves one way or the other. It probably won’t ever be perfect, but it has certainly been pretty good for a whole lot of things for a long time.
Generally, the continuous delivery space has been accelerating with increased competition. The times when Jenkins was the unrivalled king of open source Continuous Integration are over – and that is a good thing.
I actually suspect that the next big shift might not come from alternative standalone CD servers, but from integrated development platforms with built-in CD capabilities. Having your CD system automatically integrate, test and, on success, merge pull requests enables automating a lot more of the development process than we traditionally have. Sure, this is nothing really new and you can definitely do it nicely with Jenkins (at least if you are using GitHub as your SCM), but integrated systems could definitely have an edge here.
And this leads us directly into our next project here at DX: We are having a close look at GitLab as a candidate for on-premise code hosting – and I’m definitely looking forward to exploring their built-in continuous delivery capabilities in detail!