Cheap backups with Amazon Glacier

Last week Amazon announced Amazon Glacier, which stores your files at $0.01 per GB/month. That’s quite a good deal, considering that S3 goes for $0.093 per GB/month with reduced redundancy, and Dropbox at its best is $0.825/GB if you commit to 100 GB for a full year, although obviously they fill very different use cases.

To get that pricing there are some drawbacks that make it useful only for storing files that don’t need to be retrieved often, i.e. backups for disaster recovery. Downloading or listing files in Glacier takes more than four hours, so that gives you an idea. Behind the scenes it uses Amazon SQS (Simple Queue Service) and SNS (Simple Notification Service) to handle download and inventory requests, so you can do extra things like getting an email when your request is ready.
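As a rough sketch of that flow with the AWS SDK for Java (not the exact code the tool below uses; the vault name, SNS topic ARN and endpoint are placeholders), kicking off an inventory retrieval job looks something like this:

import java.io.File;
import com.amazonaws.auth.PropertiesCredentials;
import com.amazonaws.services.glacier.AmazonGlacierClient;
import com.amazonaws.services.glacier.model.InitiateJobRequest;
import com.amazonaws.services.glacier.model.InitiateJobResult;
import com.amazonaws.services.glacier.model.JobParameters;

public class GlacierInventoryExample {
    public static void main(String[] args) throws Exception {
        // accessKey/secretKey loaded from the properties file described in the Configuration section below
        PropertiesCredentials credentials = new PropertiesCredentials(
                new File(System.getProperty("user.home"), "AwsCredentials.properties"));

        AmazonGlacierClient client = new AmazonGlacierClient(credentials);
        client.setEndpoint("https://glacier.us-east-1.amazonaws.com");

        // Ask Glacier to prepare the vault inventory; the job takes hours and the
        // SNS topic gets notified when the output is ready to download
        InitiateJobRequest request = new InitiateJobRequest()
                .withVaultName("pictures")
                .withJobParameters(new JobParameters()
                        .withType("inventory-retrieval")
                        .withSNSTopic("arn:aws:sns:us-east-1:123456789012:glacier"));

        InitiateJobResult result = client.initiateJob(request);
        System.out.println("Inventory job started: " + result.getJobId());
        // Once notified, fetch the inventory JSON with a GetJobOutputRequest for this job id
    }
}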

I have created glacier-cli, using the Java API, to upload, download, delete and list files stored in Glacier from the command line, as for now Amazon only provides the APIs and some examples. Make sure you save the output when uploading files, as you will need the archive ids later on when you want to download them.

Get the code from GitHub.

Glacier-CLI

Building

mvn clean package

Configuration

Create $HOME/AwsCredentials.properties with your AWS keys

secretKey=…
accessKey=…

Commands

  • upload vault_name file1 file2 …
  • download vault_name archiveId output_file
  • delete vault_name archiveId
  • inventory vault_name

Command line options

 -output <file_name>   File to save the inventory to. Defaults to 'glacier.json'
 -queue <queue_name>   SQS queue to use for inventory retrieval. Defaults to 'glacier'
 -region <region>      AWS region to use as the web service endpoint. Defaults to 'us-east-1'
 -topic <topic_name>   SNS topic to use for inventory retrieval. Defaults to 'glacier'

Examples

Upload file1 and file2 to vault pictures

java -jar glacier-1.0-jar-with-dependencies.jar upload pictures file1 file2

Download archive with id xxx from vault pictures to file pic.tar (takes >4 hours)

java -jar glacier-1.0-jar-with-dependencies.jar download pictures xxx pic.tar

Delete archive with id xxx from vault pictures

java -jar glacier-1.0-jar-with-dependencies.jar delete pictures xxx

Get the inventory for vault pictures (takes >4 hours)

java -jar glacier-1.0-jar-with-dependencies.jar inventory pictures

Upload file1 and file2 to vault pictures in Europe region

java -jar glacier-1.0-jar-with-dependencies.jar -region eu-west-1 upload pictures file1 file2

Introduction to Amazon Web Services Identity and Access Management

Using AWS Identity and Access Management you can create separate users and permissions to use any AWS service, for instance EC2, and avoid giving other people your Amazon username, password or private key.

You can set very granular permissions on users, groups, specific resources, or a combination of them. It can get really complex very quickly! But there are several very common use cases that IAM is useful for, for instance sharing an AWS account among a team of developers.

Getting started

You can go through the Getting Started Guide, but I’ll save you some time:

Download the IAM command line tools

Store your AWS credentials in a file, e.g. ~/account-key

AWSAccessKeyId=AKIAIOSFODNN7EXAMPLE
AWSSecretKey=wJalrXUtnFEMI/K7MDENG/bPxRfiCYzEXAMPLEKEY

Configure environment variables

export AWS_IAM_HOME=<path_to_cli>
export PATH=$AWS_IAM_HOME/bin:$PATH
export AWS_CREDENTIAL_FILE=~/account-key

Creating an admin group

Once you have IAM set up, the next step is to create an Admins group to add yourself to.

iam-groupcreate -g Admins

Create a policy in a file, e.g. MyPolicy.txt

{
   "Statement":[{
      "Effect":"Allow",
      "Action":"*",
      "Resource":"*"
      }
   ]
}

Upload the policy

iam-groupuploadpolicy -g Admins -p AdminsGroupPolicy -f MyPolicy.txt

Creating an admin user

Create an admin user with

iam-usercreate -u YOUR_NAME -g Admins -k -v

The response looks similar to this:

AKIAIOSFODNN7EXAMPLE
wJalrXUtnFEMI/K7MDENG/bPxRfiCYzEXAMPLEKEY
arn:aws:iam::123456789012:user/YOUR_NAME
AIDACKCEVSQ6C2EXAMPLE

The first line is your Access Key ID and the second is your Secret Access Key, followed by the user ARN and the user id. You need to save these keys.

Save your Access Key ID and Secret Access Key to a file, e.g. ~/YOUR_NAME_cred.txt. You can use those credentials from now on instead of the global AWS credentials for the whole account.
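The file uses the same format as the account-key file from before:

AWSAccessKeyId=AKIAIOSFODNN7EXAMPLE
AWSSecretKey=wJalrXUtnFEMI/K7MDENG/bPxRfiCYzEXAMPLEKEY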

export AWS_CREDENTIAL_FILE=~/YOUR_NAME_cred.txt

Creating a dev group

Let’s create an example dev group where the users will have read-only access to EC2 operations.

 iam-groupcreate -g dev

Now we need to set the group policy to allow all EC2 Describe* actions, which let users see data but not change it. Create a file MyPolicy.txt with these contents:

{
  "Statement": [
     {
       "Sid": "EC2AllowDescribe",
       "Action": [
         "ec2:Describe*"
       ],
       "Effect": "Allow",
       "Resource": "*"
     }
   ]
 }

Now upload the policy

iam-groupuploadpolicy -g dev -p devGroupPolicy -f MyPolicy.txt

Creating dev users

To create a new AWS user under the dev group

iam-usercreate -u username -g dev -k -v

Create a login profile for the user to log into the web console

iam-useraddloginprofile -u username -p password

The user can now access the AWS console at

https://your_AWS_Account_ID.signin.aws.amazon.com/console/ec2

Or you can make life easier by creating an alias

 iam-accountaliascreate -a maestrodev

and now the console is available at

https://maestrodev.signin.aws.amazon.com/console/ec2

About Policies

AWS policy files can be really complex. The AWS Policy Generator helps as a starting point and shows what actions can be used, but it won’t help you make policies easier to read (using wildcards) or apply them to specific resources. Amazon could have provided a better generator tool that lets you choose your own resources (users, groups, S3 buckets,…) from an easy to use interface, instead of having to look up all sorts of crazy AWS identifiers. Hopefully they will provide a comprehensive tool as part of the AWS Console.
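For example, to scope a policy to a specific resource, here a hypothetical S3 bucket, you have to write the ARN by hand:

{
   "Statement":[{
      "Effect":"Allow",
      "Action":["s3:GetObject","s3:PutObject"],
      "Resource":"arn:aws:s3:::my-bucket/*"
      }
   ]
}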

There is more information available at the IAM User Guide.

Update

Just after I wrote this post, Amazon made IAM available in the AWS Management Console, which makes using IAM way easier.

Javagruppen 2011: Build and test in the cloud slides

Last week I spent some good days in Denmark for the Javagruppen annual conference, as I mentioned in a previous post. It’s a small conference, which lets you cover any questions the attendees have and choose what to talk about based on their specific interests.

I talked about creating an Apache Continuum + Selenium grid on EC2 for massively multi-environment and parallel build and test. You can find the slides below, although it’s mostly a talk/visual presentation.

The location was great, a hotel with a spa in Jutland, and the people and the other speakers were very nice too. My advice: go to Denmark, but try to do it in summer :) I’m sure it makes a difference, although it’s pretty cool to be in a hot tub outside at 0ºC (32ºF).

And you can find some trip pictures on Flickr.

Nyhavn panorama

Speaking at Javagruppen, the Danish JUG annual conference

The guys at Javagruppen, the Danish JUG, are doing their annual conference on February 11th and 12th.

The theme for this year is “Java, a cloudy affair”, and I’ll be speaking on building and testing in the cloud, using Apache Maven, Continuum, TestNG, Selenium,… and how to take full advantage of cloud features for software development, aligned with my previous talks.

This year the conference will be in a 5-star hotel and spa in the middle of Denmark, and I gotta say I look forward to it; it seems they know how to choose a location (last year they did it at a castle).

You can still sign up if you want to go.

Comwell Kellers Park

InterOp New York and ApacheCON Atlanta

One week ago I was at InterOp New York, where we announced the release of Maestro 3 and talked to conference attendees and analysts about the product and the ideas behind it. We got some coverage on Maestro 3, including a video interview with InformationWeek, which covers some of the ideas behind the product, though not all of them given the format and duration of the interview, but I thought it was worth posting anyway.

We are doing webinars this Wednesday the 3rd and on the 10th, showcasing how it can help organizations improve their build-test-release-deploy process, so if you are interested you can check the times and sign up on our Build Through Deploy in a Single Interface – Introduction to Maestro 3 page.

Tomorrow I’ll be in Atlanta for ApacheCON, along with Brett Porter, who is doing a Maven training today. I’ll be at the BarCamp on Tuesday and the Maven Meetup on Wednesday for sure. BTW, the BarCamp and meetups are free for everybody, so if you are in the Atlanta area you can come by even if you don’t attend the show.

At Interop NYC, Maestro Dev showcased its latest Maestro 3 app dev choreography environment — a cloud-based solution that stitches together the discrete steps (build, test, deploy, etc.) of application development, using existing tools for each step.

Cloud Computing opportunities in the development (build-test-deploy) space

You hear the word “cloud” everywhere: running applications in the cloud, scaling with the cloud,… but not so often from the development lifecycle perspective: code, commit, test, deploy to QA, release, etc. Yet the cloud brings fundamental changes to that side of things too.

The scenario

If you belong to, or manage, a group of developers, you are probably doing at least some sort of automated builds with continuous integration. You have continuous integration servers building your code on specific schedules, not as often as you would like, ideally whenever developers commit changes. The number of projects grows and you add more servers for the new projects, mixing and matching environments for different needs (Linux, Windows, OS X,…).

The problem and the solution

The architecture we use for our Maestro 3 product is composed of one server that handles all the development lifecycle assets. Behind the scenes we use proven open source projects: Apache Continuum for distributed builds, Apache Archiva for repository management, Sonar for reporting, and Selenium for multi-environment integration and load testing. On top of that we add the Morph mCloud private cloud solution, which is also based on open source projects such as Eucalyptus and Puppet.

We have multiple Continuum build agents doing builds, and multiple Selenium agents for webapp integration testing, as well as several application servers for continuous deployment and testing.

  • limited capacity
    • problem: your hardware is limited, and provision and setup of new servers requires a considerable amount of time
    • solution: assets are dynamic, you can spin up new virtual machines when you need them and shuffle infrastructure around in a matter of minutes with just a few clicks from the web interface. The hybrid cloud approach means you can start new servers in a public cloud, such as Amazon EC2, if you really need to
  • capacity utilization
    • problem: you need to set up less active projects on the same server as more active ones to make sure servers are neither under- nor over-utilized
    • solution: infrastructure is shared across all projects. If a project needs it more often than another then it’s there to be used
  • scheduling conflicts
    • problem: at specific times, i.e. releases, you need to stop automatic jobs to ensure resources are available for those builds
    • solution: smart queue management can differentiate between build types (e.g. continuous builds, release builds) and prioritize accordingly
  • location dependence
    • problem: you need to manage the infrastructure, knowing where each server is and what it is building
    • solution: a central view of all the development assets for easier management: build agents, test agents or application servers
  • continuous growth
    • problem: new projects are being added while you are trying to manage and optimize your current setup
    • solution: because infrastructure is shared, adding new projects is just a matter of scaling the cloud out, without assigning infrastructure to specific projects
  • complexity in process
    • problem: multiply all of the above by the number of different stages in your promotion process: development environment, QA, staging, production
    • solution: you can keep independent networks in the cloud while sharing resources like virtual machine templates for your stack for instance
  • long time-to-market
    • problem: the transition from development to QA to production is a pain point because releases and promotion are not automated
    • solution: compositions (workflows) allow you to design and automate the steps from development to production, including manual approvals
  • complexity in organization
    • problem: in large organizations, multiply that by the number of separate entities, departments or groups that have their own separate structure
    • solution: by enabling self-provisioning you can give developers or groups quotas to start servers from prebuilt templates as they need them, in a matter of minutes

Why a private cloud?

  • cost effectiveness: development infrastructure is running continuously. Global development teams make use of it 24×7
  • bandwidth usage: the traffic between your source control system and the cloud can be expensive, because it’s continuously getting the code for building
  • security restrictions: most companies don’t like their code being exported anywhere outside their firewall. Companies that need to comply with regulations (e.g. PCI) have strong requirements on external networks
  • performance: in a private cloud you can optimize the hardware for your specific scenario, reducing the number of VMs needed for the same job compared to public cloud providers
  • heterogeneous environments: if you need to develop in different environments, chances are the public cloud service won’t be able to provide all of them

The new challenges

  • parallelism, you need to know the dependencies between components to know what needs to be built before what
  • stickiness, or how to take advantage of the state of the agents by starting builds on the same ones when possible, e.g. agents that built a project before can do a source control update instead of a full checkout, or may already have the dependencies in the filesystem
  • asset management, when you have an increasing number of services running, stopping and starting as needed, you need to know what’s running and where, not only at the hardware level but at the service level: build agents, test agents and deployment servers.

The new vision

  • you can improve continuous integration as developers check in code, because the barrier to adding new infrastructure is minimal, as long as you have enough hardware in your cloud or use external cloud services, which means problems are found sooner
  • developers have access to the infrastructure they need to do their jobs, for instance starting an exact copy of the production environment to fix an issue, using a cloud template they can get up and running in minutes and tear down at the end, without incurring high hardware costs
  • less friction and easier interaction with IT people, as developers can self-provision infrastructure, if necessary trading virtual machines they no longer need for the ones they do need

By leveraging the cloud you can solve existing problems in your development lifecycle, and at the same time you will be able to do things you would not even have considered before because the technology made them complicated or impossible. Definitely something worth checking out for large development teams.

Maestro 3 is going to be released this week at InterOp New York (come over and say hi if you are around) but we are already demoing the development snapshots to clients and at conferences like JavaOne.

Maven, Amazon EC2 and SpringSource Cloud Foundry

You may have heard about the just announced SpringSource Cloud Foundry and how it is based on the CloudTools project, which includes a Maven plugin to deploy Java EE applications to Amazon EC2, starting the images as part of the build process.

Some time ago I started another Maven plugin, the Amazon EC2 Maven plugin, which allows you to start and stop EC2 AMIs as part of your build process. Unlike CloudTools, it’s a lower level plugin that can start any AMI, a very different goal.

My use case? Starting Selenium Grid Remote Control images for different environments and browsers before the integration tests start, waiting for the images to come online, running the integration tests, and shutting down the images. Check my previous Enterprise Build and Test in the Cloud entry for more details.

You could also have AMIs with your web server, database,… pre-installed, start one, deploy to any container of your choice using the Maven Cargo plugin, and shut down the image at the end of the tests.

The plugin exposes all the configuration options that the EC2 API does, because it’s based on the Typica EC2 library: start any number of images, associate Elastic IPs, choose availability zones,…

Hope you find it useful.

JavaOne slides: Enterprise Build and Test in the Cloud

I have uploaded the slides from my talk Enterprise Build and Test in the Cloud at JavaOne in San Francisco.

You can also check the code, and an introduction in previous posts:

Enterprise build and Test in the Cloud with Selenium I
and
Enterprise build and Test in the Cloud with Selenium II.

Follow me on twitter

JavaOne talk: Enterprise Build and Test in the Cloud

I’ll be presenting Enterprise Build and Test in the Cloud at JavaOne in San Francisco on Wednesday, June 3rd at 11:05am in Esplanade 301, and I will be around the whole week.

You can check the slides from the previous talk at ApacheCON, the code, and an introduction in previous posts

Enterprise build and Test in the Cloud with Selenium I
and
Enterprise build and Test in the Cloud with Selenium II.

Follow me on twitter

Enterprise Build and Test in the Cloud code available

The code accompanying the slides Enterprise Build and Test in the Cloud is available at the appfuse-selenium github page.

It provides a Selenium test environment for Maven projects, using AppFuse as an example, and allows you to run Selenium tests as part of the Maven build, either in a specific container and browser or launching the tests in parallel in several browsers at the same time.

For more information check my slides on Enterprise Build and Test in the Cloud and the blog entries Enterprise Build and Test in the Cloud with Selenium I and Enterprise Build and Test in the Cloud with Selenium II.

By default it’s configured to launch three browsers in parallel: Internet Explorer, Firefox 2 and Firefox 3.

Check src/test/resources/testng.xml for the configuration.
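For reference, a TestNG suite that runs the same tests against several browsers in parallel looks roughly like the sketch below; the parameter name, browser values and package are illustrative, not necessarily what the project actually uses:

<!-- minimal sketch: the real testng.xml may differ in parameter names and classes -->
<suite name="appfuse-selenium" parallel="tests" thread-count="3">
  <test name="iexplore">
    <parameter name="browser" value="*iexplore"/>
    <packages>
      <package name="org.appfuse.webapp.selenium"/>
    </packages>
  </test>
  <test name="firefox">
    <parameter name="browser" value="*firefox"/>
    <packages>
      <package name="org.appfuse.webapp.selenium"/>
    </packages>
  </test>
</suite>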

With the single browser option you can do:

  • Testing in Jetty 6 and Firefox

    • mvn install

  • Testing in Internet Explorer

    • mvn install -Pjetty6x,iexplore

  • Testing with any browser

    • mvn install -Pjetty6x,otherbrowser -DbrowserPath=path/to/browser/executable

  • Start the server (no tests running, good for recording tests)

    • mvn package cargo:start