DevOps: how we got here

Developer toolset

From the developer’s point of view, several kinds of tools are involved in the source-to-deploy process:

  • Source control management tools: Subversion, Git, Mercurial, Perforce,…
  • Build tools: Maven, Ant, Ivy, Buildr, Gradle, Rake,…
  • Continuous Integration tools: Continuum, Jenkins, Hudson, Bamboo,…
  • Repository (Artifact) management tools: Archiva, Nexus, Artifactory,…

The #1 programmer excuse for legitimately slacking off: My code is compiling

When everything is set up together, we can have a CI server that automatically builds the changes from the SCM as they are committed, deploys the result of the build to an artifact repository, or sends a notification if there is any error. Everything is fully automated. A change is made to SCM, the CI server kicks in, builds and runs all sorts of tests (unit, functional, integration,…) while you go off for a sword fight with your coworkers.
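
As a rough illustration of that flow, here is a minimal Python sketch. It is not how Jenkins or any other CI server is implemented; `run_pipeline` and the `build`, `test_suites`, `deploy` and `notify` callables are hypothetical names made up for this example, standing in for whatever your build tool, test runner, artifact repository and mail system actually do.

```python
from typing import Callable, List


def run_pipeline(
    commit: str,
    build: Callable[[str], str],
    test_suites: List[Callable[[str], bool]],
    deploy: Callable[[str], None],
    notify: Callable[[str], None],
) -> bool:
    """Build a commit, run every test suite, then deploy or notify.

    Returns True if the artifact reached the repository, False otherwise.
    """
    try:
        # Hypothetical build step: returns the name of the built artifact.
        artifact = build(commit)
    except Exception as error:
        notify(f"build failed for {commit}: {error}")
        return False

    # Unit, functional, integration,... suites run in order; stop at the first failure.
    for suite in test_suites:
        if not suite(artifact):
            notify(f"{suite.__name__} failed for {commit}")
            return False

    # Only artifacts that passed every suite are published.
    deploy(artifact)
    return True
```

You would call it with one callable per step, for instance `run_pipeline("abc123", build, [unit_tests, integration_tests], deploy, notify)`, where all of those are functions you provide. The point is only that each commit either ends up as a deployed artifact or as a notification, with nobody in between.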

Now what? Does somebody email the tarball, zip file,… to the operations team? Oh no, that would be too bad. Just send them the URL to download it… And even better, send some instructions, a changelog, an upgrade task list,…

What developers do today to specify deployments and target environments is not enough. 

The simplest solutions are often the cleverest. They are also usually wrong.

Using tools like Maven in the Java world or Bundler in Rubyland you can explicitly list all the dependencies and versions you need. But there are some critical dependencies that are never declared.

It is just too simple.

Packages installed, C libraries, databases, all sorts of OS and service level configuration,… That’s the next level of dependencies that should be explicitly listed and automated.

For example, think about the versions of libc and PostgreSQL, the number of connections allowed, the ports opened, the limit on open file descriptors,…
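
To make that a bit more concrete, here is a minimal Python sketch of what listing those dependencies explicitly could look like. Everything in it is an assumption for illustration: the keys, the values in `REQUIRED` and the `find_mismatches` helper are invented, and a real setup would collect the actual values from the server instead of receiving them in a dictionary.

```python
from typing import Any, Dict, List

# Hypothetical environment requirements for an application.
# The keys and values are made up for this example.
REQUIRED: Dict[str, Any] = {
    "postgresql_version": "9.0",
    "max_connections": 200,
    "open_ports": [5432, 8080],
    "open_files_limit": 4096,
}


def find_mismatches(required: Dict[str, Any], actual: Dict[str, Any]) -> List[str]:
    """Compare declared requirements against what a server reports.

    Uses plain equality for simplicity; a real check would probably
    accept version ranges or minimum values.
    """
    problems = []
    for key, expected in required.items():
        found = actual.get(key)
        if found != expected:
            problems.append(f"{key}: expected {expected!r}, found {found!r}")
    return problems
```

Once the requirements live in a file like this they can be versioned with the application and checked automatically in every environment, instead of being discovered in production.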

Operations

Requirements

From the point of view of the operations team, the set of requirements is complex: operating system, kernel version, config files, packages installed,…

And then multiply that by several stages that most likely won’t have exactly the same configuration.

  • dev
  • QA
  • pre-production
  • production

Deployment

Deployment of the artifacts produced by the development team is always a challenge:

  • How do I deploy this?
  • Reading the documentation provided by the development team?
  • Executing some manual steps?

That is obviously prone to errors.

Cloud

It’s nothing new, but the problem has grown with the proliferation of cloud-based environments, which make it easier and easier to run dozens or hundreds of servers at any point in time. Even knowing how to deploy to one server, how is it deployed to all those servers? What connections need to be established between servers? How is it going to affect the network?

About: DevOps

This is (hopefully) the start of a series of posts about DevOps, based on my presentation From Dev to DevOps.

The Agile movement established a series of development practices that are quite common nowadays, or at least highly desired:

  • planning
  • iterative development
  • continuous integration
  • release soon, release often

But what happens after the development cycle? The process of building software or products does not stop there. Even if you are an agile shop, you may hit a wall when trying to move between Development, QA and Operations.

DevOps is a door in that wall.

DevOps is the intersection between Development, QA and Operations. You can also think of it as DevQAOps.

Old issues fixed, new issues come up

As some issues were solved by Agile methodologies, new issues arose. Agile development generates a higher number of releases (release early, release often) than waterfall methodologies, and that pushes the problem from the development teams to operations, who have to deal with more frequent deliveries to production.

DevOps addresses

Fear of change

Once a product is working, changes are seen as a risk by business, developers, QA, operations,… Reducing the time it takes to make (and revert!) changes increases the confidence to make changes more frequently.

Risky deployments

Automation, automation, automation! Development is not the only thing that needs to be automated. No more docs with instructions, manual steps, individual knowledge, “deployment masters” or hacks. All the steps up to production need to be automated too, including deployments, to reduce the risk of manual mistakes.

It works on my machine!

With the proliferation of virtualization, cloud, PaaS, stacks,… it is absolutely necessary to be able to replicate environments for development, QA, pre-production,… so that the proper level of testing is conducted as the product (software + environment) progresses through the lifecycle.

Siloisation

Teams split into Dev, QA, Ops,… put up walls between them and cause unnecessary friction, with each team pushing for its own benefit and issues having to be escalated to account for the overall business good. Getting all the teams at the same table more often reduces the friction.

Dev Change vs. Ops stability

Developers are usually measured by the number of features implemented, while operations teams are measured by performance and stability in production. These two different goals make the teams clash on many occasions. Bring issues up early and involve all the teams in the decisions.

Read more at What Is This Devops Thing, Anyway?

DevOps is an extension of Agile

Does this sound familiar?

Individuals and interactions over processes and tools

Working software over comprehensive documentation

Customer collaboration over contract negotiation

Responding to change over following a plan

Exactly, that’s the Agile manifesto, which is still perfectly valid, with some clarifications, if we get into detail:

  • Our highest priority is to satisfy the customer through early and continuous delivery of valuable products.
  • Welcome changing requirements, even late in any part of the Dev/QA/Ops cycle. Agile processes harness change for the customer’s competitive advantage.
  • Deliver working products frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale.
  • Business people and development/QA/Operations teams must work together daily throughout the project.
  • The most efficient and effective method of conveying information to and within a Dev/QA/Ops team is face-to-face conversation.
  • Agile processes promote sustainable development. The sponsors, developers, QA and Ops teams, and users should be able to maintain a constant pace indefinitely.

What does that mean exactly?

  • software/working software. Read it as product, meaning software running in production
  • late in development. Late in the development, QA or operations phases, not just development
  • developers. Don’t consider only developers; QA and operations teams need to be involved too

The key part is the involvement of not just the development team but the QA and Operations teams.

Game theory

Game theory is the study of strategic situations, such as games, where one person’s success depends on the choices of others. If you have watched A Beautiful Mind, you’ll remember the scene where John Nash tries to pick up a girl at a bar.

What does it have to do with DevOps? The success of the Dev team greatly depends on the actions of the Ops team: good software deployed incorrectly will give the wrong impression. Likewise, the success of the Ops team depends on the work of the Dev team, as poorly developed software will cause all sorts of performance issues.

If every team tries to maximize its own benefit, that is not necessarily a good thing for the overall business. Imagine the Ops team not pushing new features because they don’t want to risk production stability. Or the Dev team pushing new features to production without proper testing because that’s a QA problem, not theirs, and they are only measured on the number of features pushed.
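
A toy payoff table can make the point explicit. The Python sketch below is only an illustration: the numbers in `PAYOFFS` are invented, shaped like the classic prisoner’s dilemma, and the names (`best_dev_choice`, `best_ops_choice`, `business_value`) are hypothetical. Each entry is a pair (Dev score, Ops score) for one combination of choices.

```python
from typing import Dict, Tuple

CHOICES = ("collaborate", "selfish")

# Invented payoffs: (dev_score, ops_score) for each (dev_choice, ops_choice).
PAYOFFS: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("collaborate", "collaborate"): (3, 3),
    ("collaborate", "selfish"): (0, 5),
    ("selfish", "collaborate"): (5, 0),
    ("selfish", "selfish"): (1, 1),
}


def best_dev_choice(ops_choice: str) -> str:
    """Choice that maximizes the Dev score, given what Ops does."""
    return max(CHOICES, key=lambda dev: PAYOFFS[(dev, ops_choice)][0])


def best_ops_choice(dev_choice: str) -> str:
    """Choice that maximizes the Ops score, given what Dev does."""
    return max(CHOICES, key=lambda ops: PAYOFFS[(dev_choice, ops)][1])


def business_value(dev_choice: str, ops_choice: str) -> int:
    """Overall value for the business: the sum of both scores."""
    return sum(PAYOFFS[(dev_choice, ops_choice)])
```

With these made-up numbers, being selfish is the best individual answer for each team whatever the other one does, yet `business_value("collaborate", "collaborate")` is higher than `business_value("selfish", "selfish")`. Measuring each team only on its own score leads to the worse outcome for the business.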

Working in a distributed team

Reading this post from James Governor, No Need to Commute, Ever, which he started from a job offer that Genuitec posted on Twitter, I felt like writing a bit about my experience working on distributed teams.

@genuitec No need to commute, ever. See more of your family, work with talented people: Genuitec is growing, developers apply today

That is exactly the Open Source community model. About 7 years ago it worked pretty well for me on the OSS projects I participated in. When I later joined Mergere, where we provided services on top of Apache Maven, it was pretty clear that if we wanted the best people we’d need to hire them wherever they were, and the advantage with OSS contributors is that there’s no need for a resume: you can see exactly what their work is like. So there were people on the team working in Los Angeles, Sydney, Paris, Florida, Philippines,… It was a bit painful to get everybody together at the same time when we wanted to, as we were spread all across the world, but that also kept the number of meetings and their length down, as it is in everybody’s best interest to make an effort to communicate offline.

There’s a big difference between working remotely and working in a distributed team though. When some of the team members work remotely but a number of them don’t, you have an issue: there are gonna be interactions that happen in person and never make it to the mailing lists, issue tracker, IRC,… But when the whole team is distributed (or most of it anyway) all the communication flows through the same channels, with the added plus that everything is documented and you can go back to previous conversations.

Working at the GooglePlex is nice, sure, but imagine what you can do with all those hours commuting to work, the ability to work from anywhere, eat at home, see your family more often,… does free food make up for that?

And what’s the advantage for companies? You can reduce expenses, but more importantly you can offer employees something they won’t get at most other companies. Who wouldn’t prefer working remotely to having to go to an office? And you get access to great people all over the world instead of in one specific area.

Just make sure you take some of the cost savings and use them to get the whole team together as soon as possible, and for a few days from time to time. Getting beers together does help human interactions 🙂

New challenges from DevOps: development cycle for your infrastructure

One of the main ideas behind DevOps adoption is the concept of “infrastructure as code”. Tools like Puppet or Chef allow you to programmatically define your infrastructure and the provisioning of your servers: what packages are installed, what the content of files is,…

If server provisioning is a key point in operations, then code management becomes key too once you start coding your servers. You need source control for your infrastructure, you need tags, versioning, dependencies between components,… You need development, testing, QA, release,… for your infrastructure!

Imagine an environment where you have some server stack running in production using Puppet, with a manifest that defines packages and files on that server, and many servers running the same configuration. That Puppet definition must be in source control.

Now a security fix or a new version of a package must be installed on all the servers. Do you just want to change the manifest and push it out to all the running servers? Doesn’t sound like a great idea, does it? Hey, we have been tuning development best practices over the years for use cases just like this one.

What you want to do is create a new branch where you can make that change, and test it on some server that is not in production. Let’s call it the development environment. Original, isn’t it?

The change works as expected, your app still works, great! Now you can probably merge that branch of the Puppet manifests into trunk, possibly with changes made by other people. At some point you will want to test them all together in a production-like environment, maybe with several servers in a cluster, load balancing, etc… and, very importantly, with the next version of the application that is going to be deployed. You create a new tag and version to be able to identify it later, and deploy to that environment. Let’s call it QA or staging.

What this whole cycle allows you to do is clearly define what is running in each environment, using versions, and easily find issues between deployments, using source control, with the ability to roll back to a known working configuration if needed.
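
The bookkeeping behind that can be sketched in a few lines of Python. This is just a toy model under my own assumptions, not something Puppet or your SCM provides: `EnvironmentHistory` and its methods are hypothetical names, and in practice the tags would live in source control and the deployment would be done by your provisioning tool.

```python
from typing import Dict, List


class EnvironmentHistory:
    """Track which tagged infrastructure version each environment runs."""

    def __init__(self) -> None:
        # Environment name -> list of deployed tags, oldest first.
        self._history: Dict[str, List[str]] = {}

    def deploy(self, environment: str, tag: str) -> None:
        """Record that a tagged version was deployed to an environment."""
        self._history.setdefault(environment, []).append(tag)

    def current(self, environment: str) -> str:
        """Return the tag currently running in an environment."""
        return self._history[environment][-1]

    def rollback(self, environment: str) -> str:
        """Go back to the previously deployed tag and return it."""
        versions = self._history[environment]
        if len(versions) < 2:
            raise ValueError(f"no earlier version to roll back to in {environment}")
        versions.pop()
        return versions[-1]
```

For example, after `history.deploy("qa", "1.0")` and `history.deploy("qa", "1.1")`, calling `history.rollback("qa")` takes QA back to the tag `1.0`, which is exactly the known working configuration mentioned above.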

After all, if you deal with infrastructure as code you should use code development best practices, and you’ll get the same benefits.