Anatomy of a DevOps Orchestration Engine: (III) Agents

Previously: (II) Architecture

In Maestro we typically run a Maestro master server and multiple Maestro agents. Each agent is a small service where the actual work happens: it processes the work sent by the master via ActiveMQ and executes plugins with the data received.

Architecture

The two main goals of the agent are load distribution and support for heterogeneous compositions. The more agents running, the more compositions can be executed in parallel, and compositions can target specific agents based on their capabilities, such as architecture, operating system,… which is a must for development environments. For simplicity each agent can only run one composition at a time, but you can have multiple agent processes running on a single server.

The agent uses Puppet's Facter to gather the machine facts (operating system, memory size, cloud provider data,…) and sends all that information to the master, which can use it to filter which compositions run on the agent. For instance I may want to run a composition on a Windows agent, or on an agent that has some specific piece of software installed. Facter supports external facts, so it is really easy to add new filtering capabilities rather than being limited to what Facter provides out of the box: drop a small text file into /etc/facter/facts.d/ and Facter will report it to the master server.
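
For example, a hypothetical external fact (the file name and values below are made up) is just a key=value file:

# /etc/facter/facts.d/maestro.txt (illustrative example)
agent_pool=build
has_docker=true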

Agents are installed alongside all the tools that may be needed: Git to clone repos, Jenkins Swarm to reuse the agents as Jenkins slaves, or MCollective agents to allow the agent itself to be updated automatically with Puppet when new manifests are deployed to the Puppet master. In our internal environment any commit to Puppet manifests or modules automatically triggers our rspec-puppet tests, the deployment of those manifests to the Puppet master, and a cascading Puppet update of all the machines in our staging environment using MCollective. All our Puppet modules are likewise built and tested on each commit, and a new version is published to the Puppet Forge automatically using rspec-puppet and Puppet Blacksmith.
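
As an illustration, an rspec-puppet spec is only a few lines; this sketch assumes a hypothetical ntp module, not one of our actual modules:

# spec/classes/ntp_spec.rb -- minimal rspec-puppet sketch for a hypothetical ntp module
require 'spec_helper'

describe 'ntp' do
  it { should compile }
  it { should contain_service('ntpd').with_ensure('running') }
end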

Maestro also supports manually assigning agents to pools, and matching compositions with agent pools, so compositions can be limited to run in a predefined set of agents.

The agent process is written in Ruby and runs under JRuby on the JVM, which gives it support for multiple operating systems and architectures and makes it easy to write extensions in Java or Ruby. It connects to the master's Composition Execution Engine through ActiveMQ, using STOMP for messaging.
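
Conceptually, the agent side of that connection looks like the following sketch using the Ruby stomp gem; the broker host, credentials and queue name are made up, not Maestro's actual ones:

require 'stomp'

# Connect to the master's ActiveMQ broker (host and credentials are illustrative)
client = Stomp::Client.new('maestro', 'secret', 'master.example.com', 61613)

# Wait for workloads sent by the master and acknowledge them once processed
client.subscribe('/queue/agent.example.work', :ack => 'client') do |msg|
  puts "Received workload: #{msg.body}"
  client.acknowledge(msg)
end

client.join   # block while the subscription is active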

Plugins

Plugins are small pieces of code written in Java or Ruby that run in the agent to execute the actual work. We have made all our plugins available on GitHub so they can be used as examples to create new plugins for custom tasks.

Plugins can be added to Maestro at runtime and automatically show up in the composition editor. The plugin manifest defines the plugin images, which tasks are provided, and which fields each task has. Based on the workload received, the agent downloads and executes the plugin, which simply reads the fields from the workload and does the actual work, whatever it might be, sending output back to LuCEE and populating the composition context.

For instance, the Fog plugin can manage multiple clouds, such as EC2, where it can start and stop instances. The plugin receives the fields defined in the composition (credentials, image id,…), calls the EC2 API, streams the status to the Maestro output (successfully created, instance id,…) and puts some data (ids of the instances created, public IPs,…) in the composition context for other tasks to use. All of that in less than 100 lines of code.
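
A stripped-down sketch of such a plugin task is shown below; the class, method and field names are illustrative, loosely based on the public plugins, so check the GitHub repositories for the exact API:

# Illustrative Maestro plugin task; names are not the exact API
require 'maestro_plugin'

class StartInstanceWorker < Maestro::MaestroWorker
  def provision
    image = get_field('image_id')              # fields defined in the plugin manifest
    write_output("Starting instance from image #{image}\n")

    instance_id = 'i-12345678'                 # placeholder: the cloud API call goes here

    # Put data in the composition context so later tasks (e.g. deprovision) can use it
    save_output_value('instance_ids', [instance_id])
  end
end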

The context is important to avoid redefining field values and to provide meaningful defaults, so if you have a provision task and a deprovision task, the values in the latter are inherited from the former.

Agent cloud manager

The agent cloud manager is a service that runs on Google Compute Engine and watches a number of Maestro installations to provide automatic agent scaling. Based on preconfigured parameters such as min/max number of agents for each agent pool, max waiting time,… and the current status of each agent pool queue, the service can start new machines from specific images, suspend them (destroy the instance but keep the disk), or completely destroy them.
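
The decision logic is conceptually simple; here is a purely hypothetical sketch (the method and parameter names are made up):

# Hypothetical scaling check for one agent pool; all names and rules are illustrative
def scale(pool)
  if pool.queued_compositions > 0 && pool.max_wait_exceeded? && pool.size < pool.max_agents
    start_agent(pool)                        # boot a new VM or resume a suspended one
  elsif pool.idle_agents.any? && pool.size > pool.min_agents
    suspend_agent(pool.idle_agents.first)    # destroy the instance but keep the disk
  end
end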

We are also giving Docker a try instead of using full VMs, and have created a couple of interesting CentOS-based Docker images for developers: a Jenkins swarm slave image and a build agent image that includes everything we use for development: Java, Ant, Maven, RVM (with 1.9, 2.0, 2.1, JRuby), Git, Svn, all configurable with credentials at runtime.

Anatomy of a DevOps Orchestration Engine: (II) Architecture


Previously: (I) Workflow

The Maestro architecture is basically defined by a master server and multiple agents, written in Java and Ruby (JRuby) for the backend and in JavaScript with AngularJS for the frontend, integrating several open source services. It is quite heterogeneous, with multiple languages, build tools, packages,… using the best tool for the job in each part of the stack.

Architecture

Master

The master services include

  • Maestro REST API
  • End user web interface
  • Composition Execution Engine (LuCEE)
  • ActiveMQ for STOMP messaging
  • PostgreSQL (or MySQL)
  • MongoDB

Maestro REST API

The REST API is a webapp written in Java, using Spring, packaged with a Jetty server. It is documented with Swagger annotations, which automatically generate a really nice web interface that allows trying all the operations from the browser.

It handles caching and security, based on LDAP or database records, and delegates to the Composition Execution Engine (LuCEE), typically through the LuCEE REST API but also via STOMP messaging to avoid continuous polling.

It also implements handlers to execute compositions on commit callbacks from GitHub, Git, SVN,…

End user web interface

The end user UI is written in AngularJS using the AngularJS Bootstrap components and Less stylesheets. It connects to the REST API, so everything that can be done through the webapp can also be automated using the REST API (automation, automation, automation!). I have found Angular really nice to work with, aside from the complicated service, factory, provider,… abstractions, with good modularity and the ability to reuse third-party plugins.

It is built with Maven and Grunt (better suited for the JavaScript parts), using Bower to manage all the JavaScript dependencies (Angular core, Bootstrap, the Ladda button spinner,…), and Karma + PhantomJS for headless UI tests without needing a real browser.

Composition Execution Engine (LuCEE)

LuCEE is a webapp that manages the execution of compositions, sending/receiving work to/from the agents through ActiveMQ STOMP queues and storing state in the PostgreSQL database. LuCEE uses the Ruote workflow engine for work scheduling, and manages the composition queue and agent routing: it basically checks which compositions need to be executed and decides on which agent to execute them, based on composition requirements, free agents, and other factors, e.g. prioritizing previously used agents that likely have a cached copy of sources and dependencies to speed things up.

It is written in Ruby, which made it quick to implement a first version, with a simple REST API using Sinatra and a STOMP connector to send messages to the Maestro REST webapp through ActiveMQ.
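
Conceptually, the Sinatra plus STOMP combination boils down to something like this sketch; the route and queue name are made up, not LuCEE's actual ones:

require 'sinatra'
require 'json'
require 'stomp'

# Illustrative only: queue a composition for execution when the endpoint is hit
stomp = Stomp::Client.new('maestro', 'secret', 'localhost', 61613)

post '/compositions/:id/execute' do
  workload = { :composition_id => params[:id] }.to_json
  stomp.publish('/queue/compositions.pending', workload)
  status 202
end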

It is packaged as a JRuby war with Warbler, and both LuCEE and the REST API wars are run in the same Jetty server, all packaged as an RPM for easier deployment.

ActiveMQ

ActiveMQ handles all the communication between LuCEE, the REST API webapp, and the agents using multiple STOMP queues. All the communication between LuCEE and the agents, such as workloads, agent output, agent status,… is sent over a queue so it can easily scale across a high number of agents.

LuCEE also pushes changes in the database to the REST API webapp so it can update the caches without needing continuous polling.

PostgreSQL

LuCEE uses PostgreSQL (or MySQL, or any other SQL database supported through Ruby DataMapper) as its main storage to save compositions, projects, tasks,… The SQL database is also used by the REST API webapp to store permissions and user data when not using LDAP.
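
A minimal DataMapper sketch (the connection string and model are illustrative, not LuCEE's actual schema, and it assumes the dm-postgres-adapter gem):

require 'dm-core'
require 'dm-migrations'

DataMapper.setup(:default, 'postgres://maestro:secret@localhost/maestro')

class Composition
  include DataMapper::Resource
  property :id,   Serial
  property :name, String, :required => true
end

DataMapper.finalize
DataMapper.auto_upgrade!   # create or upgrade the tables to match the models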

MongoDB

We found that in order to build more complex dashboards and reports we needed to store all sorts of unstructured data from the plugins, from run time or status to anything a plugin developer may want, such as the GitHub payload received or a test stacktrace. That data is sent by the agents to LuCEE and then stored in MongoDB, and can be queried directly (all your data belong to you) or through a reporting pane in the webapp.
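
Storing and querying that data is straightforward; for example, with the Ruby mongo driver (a sketch using the 2.x API, with made-up database, collection and field names):

require 'mongo'

client = Mongo::Client.new(['localhost:27017'], :database => 'maestro')
events = client[:plugin_events]

events.insert_one(:composition => 'acceptance-tests', :task => 'fog_provision',
                  :instance_ids => ['i-12345678'], :duration => 42.1)

# All your data belong to you: query it directly
events.find(:composition => 'acceptance-tests').each { |doc| puts doc }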

Next: (III) Agents

Anatomy of a DevOps Orchestration Engine: (I) Workflow

At MaestroDev we have been building what may be called, for lack of a better name, a DevOps Orchestration Engine, and it is long overdue to talk about what we have been doing there and, most importantly, how.

The basic purpose of the application is to tie together the different systems involved in a Continuous Delivery cycle: Continuous Integration server, SCM, build tools, packaging tools, cloud resources, notification systems,… and streamline the process through these different tools. So it hooks into a bunch of popular tools and orchestrates the interactions between them. An example:

[Screenshot: an example composition]

This workflow, or as we call it, composition, will

  1. download a war file from a Maven repository (previously built by Jenkins)
  2. start an Amazon EC2 instance with Tomcat preinstalled
  3. deploy the war
  4. checkout the acceptance tests from Git
  5. run some tests with Maven (Selenium tests using SauceLabs) against that instance
  6. wait for a user to confirm before moving to the next step (to record the human approval or to do some extra manual tests if needed)
  7. destroy the Amazon EC2 instance

Maestro provides a nice web UI that gives visibility over the composition execution and aggregates the logs from all the tools that run during the composition in a single place.

[Screenshot: composition execution in the Maestro web UI, with the aggregated log]

 

But the real power comes from combining compositions, as there are tasks for typical flows, such as forking and joining compositions, calling another composition in case of failure, or waiting for a composition to finish.

[Screenshot: five compositions tied together]

Here we have a more complex setup with five compositions tied together.

  • * – A composition that calls compositions 1 and 2.
  • 1 – A Jenkins build
  • 2 – The acceptance tests composition mentioned before
  • 2a – Notification composition in case the acceptance tests fail
  • 3 – Deployment to production

So you can see that compositions are not just limited to build, test, deploy. The tasks can be combined as needed to build your specific process.

Tasks are contributed by plugins, easily written in Ruby or Java, that define what fields are needed in the UI and what to do with those fields and the composition context. Maestro includes a lot of prebuilt tasks, publicly available on GitHub, from executing shell scripts to Jenkins job creation or Amazon Route 53 record management, but anything can be plugged in.

All the tasks share a common context and use sensible defaults, so if the SCM checkout path is not defined it creates a specific working directory for the composition, and that is reused by the Maven, Ant,… plugins to avoid copying and pasting the fields. That's also why an EC2 deprovision task doesn't need any configuration if there was a provision task earlier in the composition: by default it will just deprovision the instances started previously in the composition.

You can take a look at our Maestro public instance, showing some examples and builds of public projects, mostly Puppet modules that are automatically built and deployed to the Puppet Forge, plus the build and release compositions for the Maestro plugins. In the next posts I'll be talking about the technologies used and the distributed architecture of Maestro.

Next: (II) Architecture

MaestroDev named DevOps “Cool Vendor” by Gartner


Warning, some self-promotion ahead! 🙂

Gartner has published their annual list of Cool Vendors, including a section for DevOps, where we are one of the 5 selected companies.

I'm not a big fan of these analyst things, but I'm quite proud of being included in such a short list, right next to the people from CFEngine, Opscode and Puppet Labs, who are very active in the DevOps space and, in the case of Puppet Labs, whose products we use heavily for automation.

MaestroDev, an innovation leader in DevOps Orchestration, has been included in the list of “Cool Vendors in DevOps, 2012” report by Gartner, Inc.

And thanks to our great customers too!

Keith Campbell, CTO, Informatics, said “The Maestro product has automated our build process all the way through packaging. We are using our same toolset, but the Maestro Composition engine gives us consistency and speed that we did not have before. With Maestro, we are planning our development-cloud environment as well — reducing our build cost even further because we can dynamically integrate hybrid resources and external services into our workflows.”

You can check out the rest of the press release at the MaestroDev blog, and the Gartner Cool Vendors report.

Automatically download and install VirtualBox guest additions in Vagrant

So, are you already using Vagrant to manage your VirtualBox VMs?

Then you have probably realized already how annoying it is to keep the VBox guest additions up to date in your VMs.

Don’t worry, you can update them with just one command or automatically on each start using the Vagrant-vbguest plugin.

Installation

Requires vagrant 0.9.4 or later (including 1.0)

Since Vagrant v1.0.0 the preferred installation method is using the provided packages or installers.

Therefore if you installed Vagrant as a package (rpm, deb, dmg,…)

vagrant gem install vagrant-vbguest

Or if you installed vagrant using RubyGems (gem install vagrant):

gem install vagrant-vbguest

Usage

By default the plugin will check which version of the guest additions is installed in the VM every time it is started with vagrant up. Note that the check is not run when resuming a box.

In any case, it can be disabled in the Vagrantfile

Vagrant::Config.run do |config|
  # set auto_update to false if you do NOT want to check for the correct
  # additions version when booting this machine
  config.vbguest.auto_update = false
end

If it detects an outdated version, it will automatically install the matching version from the VirtualBox installation, located at

  • linux : /usr/share/virtualbox/VBoxGuestAdditions.iso
  • Mac : /Applications/VirtualBox.app/Contents/MacOS/VBoxGuestAdditions.iso
  • Windows : %PROGRAMFILES%/Oracle/VirtualBox/VBoxGuestAdditions.iso

The location can be overridden with the iso_path parameter in your Vagrantfile, and can point to an HTTP server

Vagrant::Config.run do |config|
  config.vbguest.iso_path = "#{ENV['HOME']}/Downloads/VBoxGuestAdditions.iso"
  # or
  config.vbguest.iso_path = "http://company.server/VirtualBox/$VBOX_VERSION/VBoxGuestAdditions.iso"
end

If you have disabled the automatic update, it is still easy to manually update the VirtualBox Guest Additions version, just by running from the command line

vagrant vbguest

Learning Puppet or Chef? Check out Vagrant!

If you are starting to use Puppet or Chef, you must have Vagrant.

Learning Puppet can be a tedious task: setting up the master and agents, writing your first manifests,… A good way to start is using Vagrant, an Oracle VirtualBox command line automation tool that allows easy Puppet and Chef provisioning on VirtualBox VMs.

Vagrant projects are composed of base boxes, specifically configured for Vagrant with Puppet/Chef, the vagrant username and password, and anything else you may want to add, plus the configuration to apply to those base boxes, defined with Puppet or Chef. That way several projects can share the same base boxes, where the only difference is the Puppet/Chef definitions. For instance a database VM and a web server VM can both use the same base box and just have different Puppet manifests, and when Vagrant starts them it will apply the specific configuration. That also makes it easy to share boxes and configuration files across teams: for instance, having one base box with the Linux flavor used by a team, we can keep just the Puppet manifests for the different configurations in source control, where anybody from Operations to Developers can use them.

There is a list of available VMs or base boxes ready to use with Vagrant at www.vagrantbox.es. But you can build your own and share it anywhere, as boxes are just (big) VirtualBox VM files, either with VeeWee or by changing an existing base box and rebundling it with vagrant package.

Usage

Once you have installed Vagrant and VirtualBox, vagrant init will create a sample Vagrantfile, the project definition file that can be customized.

$ vagrant init myproject

Then in the Vagrantfile you can change the default box settings, and add basic Puppet provisioning

config.vm.box = "centos-6"
config.vm.box_url = "https://vagrant-centos-6.s3.amazonaws.com/centos-6.box"

config.vm.provision :puppet do |puppet|
  puppet.manifests_path = "manifests"
  puppet.manifest_file = "site.pp"
end

In manifests/site.pp you can try any Puppet manifest.

file { '/etc/motd':
  content => "Welcome to your Vagrant-built virtual machine! Managed by Puppet.\n"
}

vagrant up will download the box the first time, start the VM, and apply any configuration defined in Puppet

$ vagrant up

vagrant ssh will open a shell into the box. Under the hood Vagrant redirects host port 2222 to port 22 on the Vagrant box

$ vagrant ssh

The VM can be suspended and resumed at any time

$ vagrant suspend
$ vagrant resume

and later on destroyed, which will delete all the VM files.

$ vagrant destroy

And then we can start again from scratch with vagrant up, getting a completely new VM where we can make any mistakes 🙂

Introduction to Puppet

Enough philosophical posts; let's get started with some practical Puppet.

Manifests

Puppet configuration files are called manifests, and are written in a Ruby-like DSL. Puppet provides types and functions to manage typical resources (files, services, users, groups,…), and new ones can be defined through extensions called modules.

The standard types that can be used are listed in the Puppet reference. There is a cheat sheet available (pdf) with the main ones.

Resources are grouped into classes, which can later be easily reused.

class maven ($version, $user, $home) {
  exec { 'maven-untar':
    command => 'tar xf /tmp/x.tgz',
    cwd     => '/opt',
    creates => "/opt/apache-maven-${version}",
    path    => ["/bin"],
  } ->
  file { '/usr/bin/mvn':
    ensure => link,
    target => "/opt/apache-maven-${version}/bin/mvn",
  }
  file { '/usr/local/bin/mvn':
    ensure  => absent,
    require => Exec["maven-untar"],
  }
  file { "${home}/.mavenrc":
    mode    => '0600',
    owner   => $user,
    content => template('maven/mavenrc.erb'),
    require => User[$user],
  }
}

Infrastructure IS code. For example, we can specify that we want the openssh-server package installed

package { 'openssh-server':
  ensure => present,
}

Declarative model

Puppet uses a declarative model, where we define state, not process. We define that a service must be running and Puppet will start it if it is not running, or do nothing if it already is.

service { 'ntp':
  name   => 'ntpd',
  ensure => running,
}

There is no scripting, we don’t make the service start, just define whether it should be running. This is key to understand how puppet works. A side effect is that variables can only be assigned once, so they are pretty much like constants.

Architecture

Puppet is arranged in a master-agent architecture. The master serves the manifests and files, and the agents poll the master at specific intervals to get their configuration. The master does not push anything to the clients.

Agents identify themselves with the master using SSL, so the first time an agent tries to connect to the master its certificate needs to be approved (in the default configuration), and that's usually a source of problems.

File structure

Puppet configuration files are usually in /etc/puppet.

The main files there are manifests/site.pp, which defines the configurations, and manifests/nodes.pp, which defines how those configurations apply to the different nodes or agents, generally based on their hostname or other properties.

Site

class dave {
  user { 'dave':
    ensure     => present,
    uid        => '507',
    gid        => 'admin',
    shell      => '/bin/zsh',
    home       => '/home/dave',
    managehome => true,
  }
  file { '/tmp/test1':
    ensure  => present,
    content => "Hi.",
  }
}

Nodes

node 'someserver.domain.com' {
  class { 'dave': }
}

More information

More information about types, resources, manifests, variables,… can be found in Learning Puppet from Puppet Labs.

Infrastructure as Code

DevOps is not about the tools

That’s true, in the same way that agile is not about the tools either, it’s a set of ideas, concepts, best practices,…

Nice, but… how can I successfully implement it?

Tools can enable change in behavior and eventually change culture [Patrick Debois]

[Image: printer in 1568]

The same way the Gutenberg printing press was a tool that enabled a cultural change, or that Agile development wouldn't be possible without Continuous Integration servers, DevOps relies on some tools to implement its principles.

Unfortunately, the same way everyone thinks of themselves as being intelligent enough, and every tool out there is magically cloud enabled, now every tool claims to be DevOps.

However, there is agreement that tools that allow us to deal with infrastructure as code are key to implementing DevOps concepts.

It’s all been invented already, now it’s standardized with tools like Chef or Puppet. Before, you could write your own scripts to automate server installation, configuration,… but everyone would do it their very own way.

Now there’s some common language used by Puppet or by Chef, that allows to share and reuse configuration as modules or recipes.

Infrastructure as Code, a key concept

The concept that infrastructure should be treated as code is really powerful. Server configuration, packages installed, relationships with other servers,… should be modeled with code to be automated and have a predictable outcome, removing manual steps prone to errors. Doesn’t sound bad, does it?

But new solutions bring new challenges, and when infrastructure is code we face the same problems faced by developers.

  • What version of the infrastructure are we using in production?
  • How can we ensure that when an issue is found it gets fixed and redeployed?
  • How can we test the infrastructure as we develop it?

That’s why when dealing with infrastructure as code we should follow development best practices.

For instance we can (and should!)

  • tag, branch and release the code that defines our servers.
  • have a lifecycle that covers different stages of the infrastructure code, i.e. dev, QA, production.
  • continuously test our infrastructure as we make changes (see the Rakefile sketch below).
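
For Puppet modules, for instance, that can be wired up with a small Rakefile; a minimal sketch, assuming the puppetlabs_spec_helper and puppet-blacksmith gems:

# Rakefile -- minimal sketch assuming puppetlabs_spec_helper and puppet-blacksmith
require 'puppetlabs_spec_helper/rake_tasks'   # adds spec, lint,... tasks for rspec-puppet
require 'puppet_blacksmith/rake_tasks'        # adds module:bump, module:tag, module:push,... tasks

# `rake spec` runs the rspec-puppet tests on every change; the module:* tasks
# bump the version, tag the release and push the module to the Puppet Forge.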

Is DevOps killing the Operations team?

[Image: "To make error is human. To propagate error to all server in automatic way is DevOps"]

You hear everywhere about DevOps and how it is all about automation, and how manual steps should be removed from Operations. Starting to worry about your Ops job?

On one hand, yes, you should worry.

My job is to make other people’s jobs unnecessary.

While I was working on Maven, the goal was to automate and standardize all the build steps so there would be no more need for a magician build master who is the only one who knows how to build the software. All Maven projects are built in the same way and there is no need for any manual steps. That ended the build master job, as they knew it, in many companies. Those who were interested enough moved on to more useful tasks, like setting up continuous integration servers, integrating new quality assurance tools, adding metrics,…

So, on the other hand, no, you shouldn’t worry as long as you want to explore new areas, because there’s still plenty to improve. Stop doing tedious manual tasks and focus on what’s really important.

You should just worry about the NoOps guys 😉

DevOps: how we got here

Developer toolset

From the developer point of view, there are some tools involved in the source-to-deploy process

  • Source control management tools: Subversion, Git, Mercurial, Perforce,…
  • Build tools: Maven, Ant, Ivy, Buildr, Gradle, Rake,…
  • Continuous Integration tools: Continuum, Jenkins, Hudson, Bamboo,…
  • Repository (Artifact) management tools: Archiva, Nexus, Artifactory,…

[Comic: The #1 programmer excuse for legitimately slacking off: My code is compiling]

When everything is put together, we can have a CI server that automatically builds the changes from the SCM as they are committed, deploys the result of the build to an artifact repository, or sends a notification if there is any error. Everything fully automated. A change is made in the SCM, the CI server kicks in, builds and runs all sorts of tests (unit, functional, integration,…) while you go off for a sword fight with your coworkers.

Now what? Somebody emails the tarball, zip file,… to the operations team? Oh no, that would be too bad. Just send them the URL to download it… And even better, send some instructions, a changelog, an upgrade task list,…

What developers do today to specify deployments and target environments is not enough. 

The simplest solutions are often the cleverest. They are also usually wrong.

Using tools like Maven in the Java world or Bundler in Rubyland you can explicitly list all the dependencies and versions you need. But there are some critical dependencies that are never set.
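
A Gemfile, for instance, pins the language-level dependencies precisely (an illustrative example), yet says nothing about the underlying system:

# Gemfile -- illustrative example of declaring application dependencies
source 'https://rubygems.org'

gem 'sinatra', '~> 1.4'
gem 'pg',      '~> 0.17'   # the PostgreSQL client gem still needs libpq installed on the OS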

It is just too simple.

Packages installed, C libraries, databases, all sorts of OS- and service-level configuration,… That's the next level of dependencies that should be explicitly listed and automated.

For example, think about versions of libc, postgresql, number of connections allowed, ports opened, opened file descriptors limit,…

Operations

Requirements

From the point of view of the operations team, the list of requirements is complex: operating system, kernel version, config files, packages installed,…

And then multiply that by several stages that most likely won't have exactly the same configuration.

  • dev
  • QA
  • pre-production
  • production

Deployment

Deployment of the artifacts produced by the development team is always a challenge

  • How do I deploy this?
  • Reading the documentation provided by the development team?
  • Executing some manual steps?

That is obviously prone to errors

Cloud

It’s nothing new but it has increased with the proliferation of Cloud based environments, making it easier and easier to run dozens or hundreds of servers at any point in time. Even knowing how to deploy to one server, how is it deployed to all those servers? what connections need to be established between servers? how is it going to affect the network?