Previously: (II) Architecture
In Maestro we typically use a Maestro master server and multiple Maestro agents. Each Maestro Agent is just a small service where the actual work happens, it processes the work sent by the master, via ActiveMQ, and executes the plugins with the data received.
The two main goals of the agent are load distribution and heterogeneous composition support. The more agents running, the more compositions that can be executed in parallel, and compositions can target specific agents based on its features, such as architecture, operating system,… which is a must for development environments. For simplicity each agent can only run one composition at a time, but you could have multiple agent processes running in a single server.
It uses Puppet Facter to gather the machine facts (operating system, memory size, cloud provider data,…) and sends all that information to the master, that can use it to filter what compositions run in the agent. For instance I may want to run a composition in a Windows agent, or in an agent that has some specific piece of software installed. Facter supports external facts so it is really easy to add new filtering capabilities, and not be just limited to what Facter provides out of the box. A small text file can be added to /etc/facter/facts.d/ and Facter would report it to the master server.
Agents are installed alongside with all the tools that may be needed, from Git, to clone repos, to Jenkins swarm to reuse the agents as Jenkins slaves, or mcollective agents to allow updating the agent itself automatically with Puppet when new manifests are deployed to the Puppet master. In our internal environment any commit to Puppet manifests or modules automatically trigger our rspec-puppet tests, the deployment of those manifests to the Puppet master, and a cascading Puppet update of all the machines in our staging environment using MCollective. All our Puppet modules are likewise built and tested on each commit and a new version published to the Puppet Forge automatically using rspec-puppet and Puppet Blacksmith.
Maestro also supports manually assigning agents to pools, and matching compositions with agent pools, so compositions can be limited to run in a predefined set of agents.
The agent process is written in Ruby and runs under JRuby in the JVM, thus supporting multiple operating systems and architectures, and the ability to write extensions in Java or Ruby easily. It connects to the master’s Composition Execution Engine through ActiveMQ using STOMP for messaging.
Plugins are small pieces of code written in Java or Ruby that run in the agent to execute the actual work. We have made all plugins available in GitHub so they can be used as examples to create new plugins for custom tasks.
Plugins can be added to Maestro at runtime and automatically show up in the composition editor. The plugin manifest defines the plugin images, what tasks are defined, and what fields in each task. Based on the workload received, the agent downloads and executes the plugin, which just accesses the fields in the workload and do the actual work, whatever it might be, sending output back to LuCEE and populating the composition context.
For instance the Fog plugin can manage multiple clouds, such as EC2, where it can start and stop instances. The plugin receives the fields defined in the composition (credentials, image id,…), calls the EC2 API, streams the status to the Maestro output (successfully created, instance id,…) and puts some data (ids of the instances created, public ips,…) in the composition context for other tasks to use. All of that in less than 100 lines of code.
The context is important to avoid redefining field values and provide some meaningful defaults, so if you have a provision task and a deprovision task, the values in the the latter are inherited from the former.
Agent cloud manager
The agent cloud manager is a service that runs on Google Compute Engine and watches a number of Maestro installations to provide automatic agent scaling. Based on preconfigured parameters such as min/max number of agents for each agent pool, max waiting time,… and the current status of each agent pool queue, the service can start new machines from specific images, suspend them (destroy the instance but keep the disk), or completely destroy them.
We are also giving a try to Docker instead of using full vms and have created a couple interesting Docker images on CentOS for developers, a Jenkins swarm slave image and a build agent image that includes everything we use at development: Java, Ant, Maven, RVM (with 1.9, 2.0, 2.1, JRuby), Git, Svn, all configurable with credentials at runtime.