Building a TripIt Visualizer in a few hours with Google Antigravity

We travel to see the world, but we track our travels to remember it. For years, I relied on TripIt to keep my travel history organized. But the API is no longer available, and in their UI it is hard to find older trips or get interesting stats and visualizations.

Instead of losing that history, I spent a recent holiday afternoon building a custom solution: a TripIt Data Visualization site.

The initial motivation was simple: data ownership. I knew I could still get a JSON export of my data (thanks to GDPR), but a JSON file isn’t exactly “visual.”

🌍 See it in Action (Try the Sample Data)

You don’t need your own data export to see how it works. I’ve included a “Sample Data” mode so anyone can explore the dashboard immediately.

👉 Check out the demo here: tripit.csanchez.org

A web dashboard titled "Viewing Sample Data" showing four key statistics cards: 23 total trips, 34 flights (123,420 miles), 23 unique countries visited, and 206 days traveling. Below the stats is a dark world map with visited countries like the USA, Brazil, Australia, and parts of Europe highlighted in bright blue.
A dark-themed interactive world map displaying "Flight Paths." Numerous purple arced lines connect cities across the globe, showing a dense network of flight routes primarily originating from Europe and connecting to North America, South America, Africa, and Asia.
A "Yearly Statistics" bar chart tracking travel data from 2007 to 2026. The chart uses multi-colored bars to compare Days, Flights, Miles, and Trips per year, showing a significant peak in travel around 2019 and a visible dip during 2020-2021.
A clean user interface showing a list of "Past Trips" with cards for Auckland Nature 2025, Sydney NYE 2025, Oslo Summer 2024, and Summer in Europe 2024. Each card displays the destination, date range, duration in days, and number of flights.

The Goal: Beyond the Itinerary

I wanted more than just a list of past trips. I wanted a comprehensive dashboard that felt like a mix of the best travel apps out there:

  • Countries Visited: A high-level view of global coverage.
  • Deep Statistics: Yearly and monthly breakdowns of flights, miles, and airline preferences.
  • Interactive Flight Paths: View every take-off and landing as a beautiful arc on a global map.
  • Country Tracking: Automatically highlights every country you’ve visited based on your trip history.
  • Deep-Dive Statistics: Automated breakdowns of your travels by year, month, and even specific airlines.
  • Privacy-First Architecture: Your data never leaves your browser. There is no backend server storing your history; it’s all processed locally using your TripIt export.
  • Zero Friction: Don’t have your data yet? You can explore every feature using the Gemini-generated Sample Data.
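The flight miles and yearly breakdowns in the list above come down to a great-circle distance per flight, summed by year. Here is a minimal Python sketch of that idea. The flight record layout (`year`, `origin`, `destination`) is a hypothetical shape chosen for illustration, not the TripIt export schema, and this is not necessarily how the site computes its numbers.

```python
import math
from collections import defaultdict

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error near antipodal points.
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(min(1.0, a)))


def miles_per_year(flights):
    """Sum flight distance by year.

    Each flight is a hypothetical dict:
    {"year": int, "origin": (lat, lon), "destination": (lat, lon)}
    """
    totals = defaultdict(float)
    for flight in flights:
        totals[flight["year"]] += great_circle_miles(
            *flight["origin"], *flight["destination"]
        )
    return dict(totals)
```

The same per-flight distance is what you would use to draw the arcs on the map, since an arc between two airports follows the great circle.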

The Secret Sauce: Building with Google Antigravity & Gemini

The most remarkable part of this project wasn’t the code itself, but how fast it came together. The entire project took just a few hours, and complex visualizations were trivial to add. That ease is also a hook: when adding a feature is this easy, you keep adding more and more of them!

I used the Antigravity browser, which allowed for a feedback loop with Gemini. Instead of manually debugging CSS or layout issues, I could:

  1. Ask Gemini for a design change or a new feature.
  2. The AI would “see” the current state of the app via screenshots.
  3. It would provide the fix or the code block instantly based on the visual context.

This “visual-first” development meant I could spend more time on the logic of the data and less time wrestling with the UI.

Privacy First: Your Data Stays Yours

I wanted to ensure this tool was 100% private.

  • Since there is no API, you need to request a JSON export from TripIt (thank you, GDPR!).
  • Your data never leaves your browser. It’s processed locally, visualized, and saved for the next time you open the page.

📁 GitHub: carlossg/tripit-view

Managing the Machine: A Practical Look at Google Antigravity

By now, most of us have experimented with AI coding. But while tools like Cursor have mastered the “co-pilot” experience—keeping you in the flow of writing code—Google Antigravity (released late 2025) is trying something different. It is an agentic IDE where your primary job is reviewing “Artifacts” rather than syntax.

One of the killer features is the Browser Integration.

The Browser as a “Feedback Loop”

In Antigravity, the built-in browser isn’t just for documentation; it’s a sandbox for the AI agents. When an agent implements a UI change or a new route, it doesn’t just hope the code works. It can:

  • Self-Verify: Open a headless window to check if a component actually renders.
  • Visual Debugging: Use DOM inspection to fix styling bugs that text-based models usually struggle with.
  • Artifact Generation: It produces browser recordings and screenshots so you can verify the “Mission” was successful without having to manually run the build yourself.

It’s less about “the AI browsing the web” and more about the AI having a way to prove its work to you before you merge.


The Reality of the Free Tier: Knowing Your Quotas

The biggest change in the 2026 version of Antigravity is the move to Weekly Quotas. Instead of a daily reset that might cut you off mid-task, you now have a larger “bucket” of usage that refreshes weekly.

Here is how those limits are structured for Free Tier users:

Category | Quota Type | Key Detail
Gemini Models | Dedicated per Model | Gemini 3 Pro and Gemini 3 Flash each have their own separate weekly allowance. If you run out of Pro, you can still use Flash for lighter tasks.
Claude Models | Shared Pool | All non-Google models (Claude 3.5/4.5 Sonnet and Opus) share a single combined bucket. High-intensity work with Opus will drain the same quota you use for Sonnet.
Completions | Unlimited | Standard inline tab-completions do not count toward your weekly agent/chat limits.

The Strategy: Staggering Your Usage

Because of this “separate vs. shared” structure, the smartest way to use Antigravity is to stagger your models:

  1. Use Gemini 3 Pro for your initial architectural planning and browser-heavy UI tasks.
  2. Switch to the Claude pool only when you need a “second opinion” on complex logic or specific refactoring patterns where Anthropic’s reasoning shines.
  3. Save Gemini 3 Flash for documentation questions and basic unit test generation to preserve your “Heavy Reasoning” buckets.

Final Verdict

Antigravity isn’t a “Cursor killer”—it’s a different workflow. While Cursor is for when you want to be the pilot, Antigravity is for when you want to be the air traffic controller. Just keep a close eye on your Claude shared pool; if you’re not careful, a single complex feature build on Monday can leave you without Claude access for the rest of the week.

Self-Healing Rollouts: Automating Production Fixes with Agentic AI and Argo Rollouts

Rolling out changes to all users at once in production is risky—we’ve all learned this lesson at some point. But what if we could combine progressive delivery techniques with AI agents to automatically detect, analyze, and fix deployment issues? In this article, I’ll show you how to implement self-healing rollouts using Argo Rollouts and agentic AI to create a fully automated feedback loop that can fix production issues while you grab a coffee.

The Case for Progressive Delivery

Progressive Delivery is a term that encompasses deployment strategies designed to avoid the pitfalls of all-or-nothing deployments. The concept gained significant attention after the CrowdStrike incident, where a faulty update took down a substantial portion of the internet. Their post-mortem revealed a crucial lesson: they should have deployed to progressive “rings” or “waves” of customers, with time between deployments to gather metrics and telemetry.

The key principles of progressive delivery are:

  • Avoiding downtime: Deploy changes gradually with quick rollback capabilities
  • Limiting the blast radius: Only a small percentage of users are affected if something goes wrong
  • Shorter time to production: Safety nets enable faster, more confident deployments

As I like to say: “If you haven’t automatically destroyed something by mistake, you’re not automating enough.”

Progressive Delivery Techniques

Rolling Updates

Kubernetes provides rolling updates by default. As new pods come up, old pods are gradually deleted, automatically shifting traffic to the new version. If issues arise, you can roll back quickly, affecting only the percentage of traffic that hit the new pods during the update window.

Blue-Green Deployment

This technique involves deploying a complete copy of your application (the “blue” version) alongside the existing production version (the “green” version). After testing, you switch all traffic to the new version. While this provides quick rollbacks, it requires twice the resources and switches all traffic at once, potentially affecting all users before you can react.

Canary Deployment

Canary deployments offer more granular control. You deploy a new version alongside the stable version and gradually increase the percentage of traffic going to the new version—perhaps starting with 5%, then 10%, and so on. You can route traffic based on various parameters: internal employees, IP ranges, or random percentages. This approach allows you to detect issues early while minimizing user impact.
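To make the percentage-based routing concrete, here is a minimal Python sketch of one common way to split traffic: hash a stable identifier into one of 100 buckets and send the lowest buckets to the canary. The function name and the choice of a user ID as the key are assumptions for illustration. In a real cluster this decision is normally made by the ingress controller or service mesh, not by application code.

```python
import hashlib


def routes_to_canary(user_id: str, canary_percent: int) -> bool:
    """Return True if this user should be served by the canary version.

    The user ID is hashed into a bucket from 0 to 99. Because the bucket
    is stable per user, anyone in the canary at 5% is still in it at 10%.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < canary_percent
```

Hashing instead of picking at random keeps each user on the same version between requests, which makes canary logs easier to compare against the stable version.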

Feature Flags

Feature flags provide even more granular control at the application level. You can deploy code with new features disabled by default, then enable them selectively for specific user groups. This decouples deployment from feature activation, allowing you to:

  • Ship faster without immediate risk
  • Enable features for specific customers or user segments
  • Quickly disable problematic features without redeployment

You can implement feature flags using dedicated services like OpenFeature or simpler approaches like environment variables.
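For the simpler environment variable approach, a sketch could look like the following. The `FEATURE_` prefix and the set of accepted truthy values are conventions I am assuming here, not part of any standard.

```python
import os


def flag_enabled(name, environ=None, default=False):
    """Check a feature flag stored in an environment variable.

    A flag called "new_checkout" is read from FEATURE_NEW_CHECKOUT
    (a hypothetical naming convention). Unset flags fall back to `default`.
    """
    env = os.environ if environ is None else environ
    raw = env.get("FEATURE_" + name.upper())
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}
```

Passing `environ` explicitly makes the function easy to test. The trade-off versus a flag service is that changing a flag means restarting the process with a new environment.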

Progressive Delivery in Kubernetes

Kubernetes provides two main architectures for traffic routing:

Service Architecture

The traditional approach uses load balancers directing traffic to services, which then route to pods based on labels. This works well for basic scenarios but lacks flexibility for advanced routing.

Ingress Architecture

The Ingress layer provides more sophisticated traffic management. You can route traffic based on domains, paths, headers, and other criteria, enabling fine-grained control essential for canary deployments.

Enter Argo Rollouts

Argo Rollouts is a Kubernetes controller that provides advanced deployment capabilities including blue-green deployments, canary releases, analysis, and experimentation. It’s a powerful tool for implementing progressive delivery in Kubernetes environments.

How Argo Rollouts Works

The architecture includes:

  1. Rollout Controller: Manages the deployment process
  2. Rollout Object: Defines the deployment strategy and analysis configuration
  3. Analysis Templates: Specify metrics and success criteria
  4. Replica Sets: Manage stable and canary versions with automatic traffic shifting

When you update a Rollout, it creates separate replica sets for stable and canary versions, gradually increasing canary pods while decreasing stable pods based on your defined rules. If you’re using a service mesh or advanced ingress, you can implement fine-grained routing—sending specific headers, paths, or user segments to the canary version.
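When no traffic router is configured, the canary weight is approximated with pod counts. The Python sketch below shows the arithmetic, assuming both replica sets round up as the Argo Rollouts documentation describes for basic canaries. It is an approximation for intuition only: the real controller also takes settings such as surge and unavailability limits into account.

```python
def replica_split(total_replicas: int, canary_weight: int):
    """Approximate (canary, stable) pod counts for a given canary weight.

    Both sides are rounded up, so their sum can briefly exceed
    total_replicas while a step is in progress.
    """
    if not 0 <= canary_weight <= 100:
        raise ValueError("canary_weight must be between 0 and 100")
    # -(-a // b) is integer ceiling division.
    canary = -(-total_replicas * canary_weight // 100)
    stable = -(-total_replicas * (100 - canary_weight) // 100)
    return canary, stable
```

For example, with 10 replicas and a 15% weight this gives 2 canary pods and 9 stable pods, because 1.5 and 8.5 both round up.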

Analysis Options

Argo Rollouts supports various analysis methods:

  • Prometheus: Query metrics to determine rollout health
  • Datadog: Integration with Datadog monitoring
  • Kubernetes Jobs: Run custom analysis logic—check databases, call APIs, or perform any custom validation

The experimentation feature is particularly interesting. We considered using it to test Java upgrades: deploy a new Java version, run it for a few hours gathering metrics on response times and latency, then decide whether to proceed with the full rollout—all before affecting real users.

Adding AI to the Mix

Now, here’s where it gets interesting: what if we use AI to analyze logs and automatically make rollout decisions?

The AI-Powered Analysis Plugin

I developed a plugin for Argo Rollouts that uses Large Language Models (specifically Google’s Gemini) to analyze deployment logs and make intelligent decisions about whether to promote or rollback a deployment. The workflow is:

  1. Log Collection: Gather logs from stable and canary versions
  2. AI Analysis: Send logs to an LLM with a structured prompt
  3. Decision Making: The AI responds with a promote/rollback recommendation and confidence level
  4. Automated Action: Argo Rollouts automatically promotes or rolls back based on the AI’s decision

The prompt asks the LLM to:

  • Analyze canary behavior compared to the stable version
  • Respond in JSON format with a boolean promotion decision
  • Provide a confidence level (0-100%)

For example, if the confidence threshold is set to 50%, any recommendation with confidence above 50% is executed automatically.
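The decision step can be sketched in a few lines of Python. The JSON field names (`promote`, `confidence`) and the "inconclusive" outcome are assumptions for illustration. The plugin's actual response schema, and how it treats low-confidence answers, may differ.

```python
import json


def decide(llm_response: str, threshold: float = 50.0) -> str:
    """Turn an LLM answer into "promote", "rollback", or "inconclusive".

    Expects JSON like {"promote": true, "confidence": 90}; these field
    names are hypothetical. Anything malformed, or any confidence at or
    below the threshold, is treated as inconclusive.
    """
    try:
        data = json.loads(llm_response)
    except json.JSONDecodeError:
        return "inconclusive"
    if not isinstance(data, dict):
        return "inconclusive"
    promote = data.get("promote")
    confidence = data.get("confidence")
    if not isinstance(promote, bool):
        return "inconclusive"
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return "inconclusive"
    if confidence <= threshold:
        return "inconclusive"
    return "promote" if promote else "rollback"
```

Validating the response strictly matters here because the model output drives a production action, and LLMs do not always return the exact JSON you asked for.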

The Complete Self-Healing Loop

But we can go further. When a rollout fails and rolls back, the plugin automatically:

  1. Creates a GitHub Issue: The LLM generates an appropriate title and detailed description of the problem, including log analysis and recommended fixes
  2. Assigns a Coding Agent: Labels the issue to trigger agents like Jules, GitHub Copilot, or similar tools
  3. Automatic Fix: The coding agent analyzes the issue, creates a fix, and submits a pull request
  4. Continuous Loop: Once merged, the new version goes through the same rollout process
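The first two steps of the loop above amount to creating an issue with the right label. As a rough Python sketch, this builds the JSON body for GitHub's "create an issue" REST endpoint, which accepts `title`, `body`, and `labels`. The label value and the body layout are assumptions for illustration, and the HTTP call itself is left out.

```python
def build_issue_payload(title: str, analysis: str, agent_label: str = "jules") -> dict:
    """Build the request body for creating a GitHub issue after a rollback.

    `agent_label` is a hypothetical label name used to hand the issue
    to a coding agent; use whatever label your agent is configured for.
    """
    body = "\n\n".join([
        "Canary analysis",
        analysis.strip(),
        "This issue was created automatically after a rollback.",
    ])
    return {"title": title.strip(), "body": body, "labels": [agent_label]}
```

In the workflow described here, both the title and the analysis text would come from the LLM rather than from hand-written strings.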

Live Demo Results

In my live demonstration, I showed this complete workflow in action:

Successful Deployment: When deploying a working version (changing from “blue” to “green”), the rollout progressed smoothly through the defined steps (20%, 40%, 60%, 80%, 100%) at 10-second intervals. The AI analyzed the logs and determined: “The stable version consistently returns 100 blue, the canary version returns 100 green, both versions return 200 status codes. Based on the logs, the canary version seems stable.”

Failed Deployment: When deploying a broken version that returned random colors and threw panic errors, the system:

  • Detected the issue during the canary phase
  • Automatically rolled back to the stable version
  • The AI analysis identified: “The canary version returns a mix of colors (purple, blue, green, orange, yellow) along with several panic errors due to runtime error index out of range with length zero”
  • Provided a confidence level of 95% that the deployment should not be promoted
  • Automatically created a GitHub issue with detailed analysis
  • Assigned the issue to Jules (coding agent)
  • Within 3-5 minutes, received a pull request with a fix

The coding agents (I demonstrated both Jules and GitHub Copilot) analyzed the code, identified the problem in the getColor() function, fixed the bug, added tests, and created well-documented pull requests with proper commit messages.

Technical Implementation

The Rollout Configuration

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: canary-demo
spec:
  strategy:
    canary:
      analysis:
        templates:
          - templateName: canary-analysis-ai

The Analysis Template

The template configures the AI plugin to check every 10 seconds and require a confidence level above 50% for promotion:

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: canary-analysis-ai
spec:
  metrics:
    - name: success-rate
      interval: 10s
      successCondition: result > 0.50
      provider:
        plugin:
          argoproj-labs/metric-ai:
            model: gemini-2.0-flash
            githubUrl: https://github.com/carlossg/rollouts-demo
            extraPrompt: |
              Ignore color changes.

Agent-to-Agent Communication

The plugin supports two modes:

  1. Inline Mode: The plugin directly calls the LLM, makes decisions, and creates GitHub issues
  2. Agent Mode: Uses agent-to-agent (A2A) communication to call specialized agents with domain-specific knowledge and tools

Agent mode is particularly powerful because you can build agents that understand your specific problem space, with access to internal databases, monitoring tools, or other specialized resources.

The Future of Self-Healing Systems

This approach demonstrates the practical application of AI agents in production environments. The key insight is creating a continuous feedback loop:

  1. Deploy changes progressively
  2. Automatically detect issues
  3. Roll back when necessary
  4. Generate detailed issue reports
  5. Let AI agents propose fixes
  6. Review and merge fixes
  7. Repeat

The beauty of this system is that it works continuously. You can have multiple issues being addressed simultaneously by different agents, working 24/7 to keep your systems healthy. As humans, we just need to review and ensure the proposed fixes align with our intentions.

Practical Considerations

While this technology is impressive, it’s important to note:

  • AI isn’t perfect: The agents don’t always get it right on the first try (as demonstrated when the AI ignored my instruction about color variations)
  • Human oversight is still crucial: Review pull requests before merging
  • Start simple: Begin with basic metrics before adding AI analysis
  • Tune your confidence thresholds: Adjust based on your risk tolerance
  • Monitor the monitors: Ensure your analysis systems are reliable

Getting Started

If you want to implement similar systems:

  1. Start with Argo Rollouts: Learn basic canary deployments without AI
  2. Implement analysis: Use Prometheus or custom jobs for analysis
  3. Add AI gradually: Experiment with AI analysis for non-critical deployments
  4. Build the feedback loop: Integrate issue creation and coding agents
  5. Iterate and improve: Refine your prompts and confidence thresholds

Conclusion

Progressive delivery isn’t new, but combining it with agentic AI creates powerful new possibilities for self-healing systems. While we’re not at full autonomous production management yet, we’re getting closer. The technology exists today to automatically detect, analyze, and fix many production issues without human intervention.

As I showed in the demo, you can literally watch the system detect a problem, roll back automatically, create an issue, and have a fix ready for review—all while you’re having coffee. That’s the future I want to work toward: systems that heal themselves and learn from their mistakes.

Serverless Jenkins Pipelines with Google Cloud Run

The Jenkinsfile-Runner-Google-Cloud-Run project is a Docker image for running Jenkins pipelines on Google Cloud Run (a container-native, serverless platform). It processes a GitHub webhook, git clones the repository, and executes the Jenkinsfile in that git repository. It offers high scalability and pay-per-use pricing, with zero cost when not in use.

This image allows Jenkinsfile execution without needing a persistent Jenkins master running in the same way as Jenkins X Serverless, but using the Google Cloud Run platform instead of Kubernetes.
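To illustrate the webhook-to-checkout flow, here is a small Python sketch that turns a GitHub push event into the git commands to run. The project itself is written in Java, so this is not its code. The `repository.clone_url` and `after` fields are part of GitHub's push event payload, while the function name and the `/tmp/workspace` directory are assumptions for illustration.

```python
import json


def build_plan(payload_json: str, workdir: str = "/tmp/workspace"):
    """Return the git commands needed to check out the pushed commit.

    Commands are returned as argument lists (suitable for subprocess.run)
    rather than executed, so the sketch has no side effects.
    """
    event = json.loads(payload_json)
    clone_url = event["repository"]["clone_url"]
    commands = [["git", "clone", clone_url, workdir]]
    commit = event.get("after")  # SHA of the head commit after the push
    if commit:
        commands.append(["git", "-C", workdir, "checkout", commit])
    return commands
```

After these commands, the runner would execute the Jenkinsfile found at the root of the working directory.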

Google Cloud Run vs Project Fn vs AWS Lambda

I wrote three flavors of Jenkinsfile Runner: this one for Google Cloud Run, one for Fn Project, and one for AWS Lambda.

The image is similar to the other ones. The main difference between Lambda and Google Cloud Run is in the packaging, as Lambda layers are limited in size and are expanded in /opt while Google Cloud Run allows any custom Dockerfile where you can install whatever you want in a much easier way.

This image extends the Jenkinsfile Runner image instead of doing a Maven build with it as a dependency, as that simplifies classpath management.

Limitations

Max build duration is 15 minutes, but we can use a timeout value of up to 60 minutes by using gcloud beta.

Current implementation limitations:

  • checkout scm does not work, change it to sh 'git clone https://github.com/carlossg/jenkinsfile-runner-example.git'

Example

See the jenkinsfile-runner-example project for an example.

When the PRs are built Jenkins writes a comment back to the PR to show status, as defined in the Jenkinsfile, and totally customizable.

Check the PRs at carlossg/jenkinsfile-runner-example

Extending

You can add your plugins to plugins.txt. You could also add the Configuration as Code plugin for configuration, example at jenkins.yaml.

Other tools can be added to the Dockerfile.

Installation

GitHub webhook execution will time out if the call takes too long, so we also create a Node.js Google Cloud Function (index.js) that forwards the request to Google Cloud Run and returns the response to GitHub while the build runs.

Building

Build the package

mvn verify 
docker build -t jenkinsfile-runner-google-cloud-run .

Publishing

Both the function and the Google Cloud Run need to be deployed.

Set GITHUB_TOKEN_JENKINSFILE_RUNNER to a token that allows posting PR comments. A more secure way would be to use Google Cloud Secret Manager.

export GITHUB_TOKEN_JENKINSFILE_RUNNER=... 
PROJECT_ID=$(gcloud config get-value project 2> /dev/null) 
make deploy

Note the function url and use it to create a GitHub webhook of type json.

Execution

To test the Google Cloud Run execution

URL=$(gcloud run services describe jenkinsfile-runner \
  --platform managed \
  --region us-east1 \
  --format 'value(status.address.url)')

curl -v -H "Content-Type: application/json" ${URL}/handle \
  -d @src/test/resources/github.json

Logging

gcloud logging read \
  "resource.type=cloud_run_revision AND resource.labels.service_name=jenkinsfile-runner" \
  --format "value(textPayload)" --limit 100

or

gcloud alpha logging tail \
  "resource.type=cloud_run_revision AND resource.labels.service_name=jenkinsfile-runner" \
  --format "value(textPayload)"

GitHub events

Add a GitHub json webhook to your git repo pointing to the Google Cloud Function URL that you can get with

gcloud functions describe jenkinsfile-runner-function \
  --format 'value(httpsTrigger.url)'

Testing

The image can be run locally

docker run -ti --rm -p 8080:8080 \
  -e GITHUB_TOKEN=${GITHUB_TOKEN_JENKINSFILE_RUNNER} \
  jenkinsfile-runner-google-cloud-run
curl -v -H "Content-Type: application/json" \
  -X POST http://localhost:8080/handle \
  -d @src/test/resources/github.json

More information in the Jenkinsfile-Runner-Google-Cloud-Run GitHub page.

Google Container Registry Service Account Permissions

While testing Jenkins X I hit an issue that puzzled me. I use Kaniko to build Docker images and push them into Google Container Registry. But the push to GCR was failing with

INFO[0000] Taking snapshot of files...
error pushing image: failed to push to destination gcr.io/myprojectid/croc-hunter:1: DENIED: Token exchange failed for project 'myprojectid'. Caller does not have permission 'storage.buckets.get'. To configure permissions, follow instructions at: https://cloud.google.com/container-registry/docs/access-control

During installation Jenkins X creates a GCP Service Account based on the name of the cluster (in my case jx-rocks) called jxkaniko-jx-rocks with roles:

  • roles/storage.admin
  • roles/storage.objectAdmin
  • roles/storage.objectCreator

More roles are added if you install Jenkins X with Vault enabled.

A key is created for the service account and added to Kubernetes as secrets/kaniko-secret containing the service account key json, which is later on mounted in the pods running Kaniko as described in their instructions.

After checking the service account and its roles over and over, they all seemed correct in the GCP console, but the Kaniko build was still failing. I found a Stack Overflow post claiming that the permissions were cached if you previously had a service account with the same name (WAT?), so I tried a new service account with the same permissions and a different name, and that worked. Weird. So I created a script to replace the service account with a new one and update the Kubernetes secret.

ACCOUNT=jxkaniko-jx-rocks
PROJECT_ID=myprojectid

# delete the existing service account and policy binding
gcloud -q iam service-accounts delete ${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com
gcloud -q projects remove-iam-policy-binding ${PROJECT_ID} --member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com --role roles/storage.admin
gcloud -q projects remove-iam-policy-binding ${PROJECT_ID} --member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com --role roles/storage.objectAdmin
gcloud -q projects remove-iam-policy-binding ${PROJECT_ID} --member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com --role roles/storage.objectCreator

# create a new one
gcloud -q iam service-accounts create ${ACCOUNT} --display-name ${ACCOUNT}
gcloud -q projects add-iam-policy-binding ${PROJECT_ID} --member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com --role roles/storage.admin
gcloud -q projects add-iam-policy-binding ${PROJECT_ID} --member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com --role roles/storage.objectAdmin
gcloud -q projects add-iam-policy-binding ${PROJECT_ID} --member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com --role roles/storage.objectCreator

# create a key for the service account and update the secret in Kubernetes
gcloud -q iam service-accounts keys create kaniko-secret --iam-account=${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com
kubectl create secret generic kaniko-secret --from-file=kaniko-secret

That worked too, so I still have no idea why it was failing, but at least now I’ll remember how to manually clean up and recreate the service account.

Serverless Jenkins Pipelines with Fn Project

The Jenkinsfile-Runner-Fn project is a Fn Project (a container native, cloud agnostic serverless platform) function to run Jenkins pipelines. It will process a GitHub webhook, git clone the repository and execute the Jenkinsfile in that git repository. It allows scalability and pay per use with zero cost if not used.

This function allows Jenkinsfile execution without needing a persistent Jenkins master running in the same way as Jenkins X Serverless, but using the Fn Project platform (and supported providers like Oracle Functions) instead of Kubernetes.

Fn Project vs AWS Lambda

The function is very similar to the one in jenkinsfile-runner-lambda, with just a small change in the signature. The main difference between Lambda and Fn is in the packaging: Lambda layers are limited in size and are expanded in /opt, while Fn allows a custom Dockerfile where you can install whatever you want much more easily. You just need to include the function code and entrypoint from fnproject/fn-java-fdk.

Oracle Functions

Oracle Functions is a cloud service providing Project Fn function execution (currently in limited availability). jenkinsfile-runner-fn function runs in Oracle Functions, with the caveat that it needs a syslog server running somewhere to get the logs (see below).

Limitations

Current implementation limitations:

  • checkout scm does not work, change it to sh 'git clone https://github.com/carlossg/jenkinsfile-runner-fn-example.git'
  • Jenkinsfile must use /tmp for any tool that needs writing files, see the example

Example

See the jenkinsfile-runner-fn-example project for an example that is tested and works.

Extending

You can add your plugins to plugins.txt. You could also add the Configuration as Code plugin for configuration.

Other tools can be added to the Dockerfile.

Installation

Install Fn

Building

Build the function

mvn clean package

Publishing

Create and deploy the function locally

fn create app jenkinsfile-runner
fn --verbose deploy --app jenkinsfile-runner --local

Execution

Invoke the function

cat src/test/resources/github.json | fn invoke jenkinsfile-runner jenkinsfile-runner

Logging

Get the logs for the last execution

fn get logs jenkinsfile-runner jenkinsfile-runner \
$(fn ls calls jenkinsfile-runner jenkinsfile-runner | grep 'ID:' | head -n 1 | sed -e 's/ID: //')

Syslog

Alternatively, start a syslog server to see the logs

docker run -d --rm -it -p 5140:514 --name syslog-ng balabit/syslog-ng:latest
docker exec -ti syslog-ng tail -f /var/log/messages-kv.log

Update the function to send logs to the syslog server

fn update app jenkinsfile-runner --syslog-url tcp://logs-01.loggly.com:514

GitHub events

Add a GitHub json webhook to your git repo pointing to the function url.

More information in the Jenkinsfile-Runner-Fn GitHub page.

Running Jenkins Pipelines in AWS Lambda

The Jenkinsfile-Runner-Lambda project is an AWS Lambda function to run Jenkins pipelines. It will process a GitHub webhook, git clone the repository and execute the Jenkinsfile in that git repository. It allows huge scalability with 1000+ concurrent builds and pay per use with zero cost if not used.

This function allows Jenkinsfile execution without needing a persistent Jenkins master running in the same way as Jenkins X Serverless, but using AWS Lambda instead of Kubernetes. All the logs are stored in AWS CloudWatch and are easily accessible.

Why???

Why not?

I mean, it could make sense to run Jenkinsfiles in Lambda when you are building AWS related stuff, like creating an artifact and uploading it to S3.

Limitations

Lambda limitations:

  • 15 minutes execution time
  • 3008MB of memory
  • git clone and generated artifacts must fit in the 500MB provided

Current implementation limitations:

  • checkout scm does not work, change it to sh 'git clone https://github.com/carlossg/jenkinsfile-runner-lambda-example.git'
  • Jenkinsfile must add /usr/local/bin to PATH and use /tmp for any tool that needs writing files, see the example

Extending

Three lambda layers are created:

  • jenkinsfile-runner: the main library
  • plugins: minimal set of plugins to build a Jenkinsfile
  • tools: git, openjdk, maven

You can add your plugins in a new layer as a zip file inside a plugins dir to be expanded in /opt/plugins. You could also add the Configuration as Code plugin and configure the Artifact Manager S3 to store all your artifacts in S3.

Other tools can be added as new layers, and they will be expanded in /opt. You can find a list of scripts for inspiration in the lambci project (gcc,go,java,php,python,ruby,rust) and bash, git and zip (git is already included in the tools layer here)

The layers are built with Docker, installing jenkinsfile-runner, tools and plugins under /opt which is where Lambda layers are expanded. These files are then zipped for upload to Lambda.

Installation

Create a lambda function jenkinsfile-runner using the Java 8 runtime. Use the layers built in target/layer-* and target/jenkinsfile-runner-lambda-*.jar as the function. You can use make publish to create them.

Set

  • handler: org.csanchez.jenkins.lambda.Handler::handleRequest
  • memory: 1024MB
  • timeout: 15 minutes

aws lambda create-function \
    --function-name jenkinsfile-runner \
    --handler org.csanchez.jenkins.lambda.Handler::handleRequest \
    --zip-file fileb://target/jenkinsfile-runner-lambda-1.0-SNAPSHOT.jar \
    --runtime java8 \
    --region us-east-1 \
    --timeout 900 \
    --memory-size 1024 \
    --layers output/layers.json

Exposing the Lambda Function

From the lambda function configuration page add an API Gateway trigger. Select Create a new API and choose the security level. Save the function and you will get an HTTP API endpoint.

Note that to achieve asynchronous execution (GitHub webhooks execution will time out if your webhook takes too long) you would need to configure API Gateway to send the payload to SNS and then lambda to listen to SNS events. See an example.

GitHub events

Add a GitHub json webhook to your git repo pointing to the lambda api gateway url.

More information in the Jenkinsfile-Runner-Lambda GitHub page.

Google Cloud Next Recap

Several interesting announcements from last week's Google Next conference.

Knative, a new OSS project built by Google, Red Hat, IBM and others to build, deploy, and manage modern serverless workloads on Kubernetes. It is built upon Istio, which has 1.0 coming soon along with a managed Istio on GCP. It includes a build primitive to manage source-to-Kubernetes flows, which can be used independently. Maybe it is the new standard to define sources and builds in Kubernetes. Read more from Mark Chmarny.

GKE on premise, a Google-configured version of Kubernetes with multi-cluster management, running on top of VMware’s vSphere.

Another Kubernetes-related mention was the gVisor pod sandbox, with experimental support for Kubernetes, to allow running sandboxed containers in a Kubernetes cluster. Very interesting for multi-tenant clusters and Docker image builds.

Cloud Functions are now Generally Available, and more serverless features are launched:

Serverless containers allow you to run container-based workloads in a fully managed environment and still only pay for what you use. Sign up for an early preview of serverless containers on Cloud Functions to run your own containerized functions on GCP with all the benefits of serverless.

A new GKE serverless add-on lets you run serverless workloads on Kubernetes Engine with a one-step deploy. You can go from source to containers instantaneously, auto-scale your stateless container-based workloads, and even scale down to zero.

Cloud Build, a fully-managed CI/CD platform that lets you build and test applications in the cloud. It takes an interesting approach where all the pipeline steps are containers themselves, so it is reasonably easy to extend. It integrates with GitHub for repos with a Dockerfile (let’s see if it lasts long after the Microsoft acquisition).

Other interesting announcements include:

  • Edge TPU, a tiny ASIC chip designed to run TensorFlow Lite ML models at the edge.
  • Shielded VMs – untampered virtual machines
  • Titan Security Key, a FIDO security key with firmware developed by Google. Google security was giving away both NFC and Bluetooth keys at the conference, a good replacement for YubiKeys, especially for mobile devices.

Running a JVM in a Container Without Getting Killed II

A follow up to Running a JVM in a Container Without Getting Killed

Java 10 has improved container integration.
There is no need to add extra flags: the JVM will use 1/4 of the container memory for heap.

$ docker run -m 1GB openjdk:10 java -XshowSettings:vm \
    -version
VM settings:
    Max. Heap Size (Estimated): 247.50M
    Using VM: OpenJDK 64-Bit Server VM

openjdk version "10.0.1" 2018-04-17
OpenJDK Runtime Environment (build 10.0.1+10-Debian-4)
OpenJDK 64-Bit Server VM (build 10.0.1+10-Debian-4, mixed mode)

Java 10 obsoletes the -XX:MaxRAM parameter, as the JVM will correctly detect the value.

You can still use the -XX:MaxRAMFraction=1 option to squeeze all the memory from the container.

$ docker run -m 1GB openjdk:10 java -XshowSettings:vm \
    -XX:MaxRAMFraction=1 -version
OpenJDK 64-Bit Server VM warning: Option MaxRAMFraction was deprecated in version 10.0 and will likely be removed in a future release.
VM settings:
    Max. Heap Size (Estimated): 989.88M
    Using VM: OpenJDK 64-Bit Server VM

openjdk version "10.0.1" 2018-04-17
OpenJDK Runtime Environment (build 10.0.1+10-Debian-4)
OpenJDK 64-Bit Server VM (build 10.0.1+10-Debian-4, mixed mode)

But it can be risky if your container uses off-heap memory, as almost all the container memory is allocated to heap. You would have to either set -XX:MaxRAMFraction=2 and use only 50% of the container memory for heap, or resort to -Xmx.
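The arithmetic behind these options is simple enough to sketch. The Python helper below is hypothetical and assumes the heap ceiling is roughly the container memory limit divided by MaxRAMFraction; the JVM's own estimate comes out slightly lower, as the 247.50M and 989.88M figures above show.

```python
def approx_max_heap_mb(container_limit_mb: float, max_ram_fraction: int = 4) -> float:
    """Approximate max heap for a container limit and a MaxRAMFraction value.

    Assumption: heap ceiling ~= limit / fraction. The default fraction of 4
    matches the "1/4 of the container memory" behaviour described above.
    """
    if max_ram_fraction < 1:
        raise ValueError("max_ram_fraction must be >= 1")
    return container_limit_mb / max_ram_fraction


limit_mb = 1024  # the 1GB container used in the examples above
for fraction in (4, 2, 1):
    heap = approx_max_heap_mb(limit_mb, fraction)
    # Whatever is not heap is what remains for off-heap memory.
    print(f"MaxRAMFraction={fraction}: heap ~{heap:.0f}MB, off-heap headroom ~{limit_mb - heap:.0f}MB")
```

With a fraction of 1 the headroom drops to roughly zero, which is the risk described above.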

Serverless CI/CD with AWS ECS Fargate

Amazon AWS has recently launched ECS Fargate to “run containers without having to manage servers or clusters”.

This got me interested enough to patch the Jenkins ECS plugin to run Jenkins agents as containers using the Fargate model, instead of the previous model where you would still need to create and manage VM instances to run the containers.

How does it work?

With the Jenkins ECS plugin you can configure a “Cloud” item that will launch all your agents on ECS Fargate, matching jobs to different container templates using labels. This means you can have unlimited agents with no machines to manage and just pay for what you use.

Some tips on the configuration:

  • Some options need to be configured in order to launch in Fargate, such as the subnet, the security group, and assigning a public IP to the container.
  • Agents need to adhere to some predefined CPU and memory configurations. For instance, for 1 vCPU you can only use 2GB to 8GB in 1GB increments.
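A quick way to catch an invalid agent size before launching is to check it against a table of allowed combinations. The Python sketch below is hypothetical and not part of the plugin; only the 1 vCPU row comes from the tip above, and the other rows would need to be filled in from the AWS documentation.

```python
# Allowed memory sizes (GB) per vCPU setting.
# Only the 1 vCPU row is taken from the text; other rows are intentionally
# left out and should be added from the AWS documentation.
ALLOWED_MEMORY_GB = {
    1: range(2, 9),  # 2GB to 8GB in 1GB increments
}


def is_valid_fargate_size(vcpu: int, memory_gb: int) -> bool:
    """Return True if memory_gb is an allowed size for the given vCPU count."""
    allowed = ALLOWED_MEMORY_GB.get(vcpu)
    if allowed is None:
        raise ValueError(f"no memory table defined for {vcpu} vCPU")
    return memory_gb in allowed
```

For example, `is_valid_fargate_size(1, 4)` is accepted while `is_valid_fargate_size(1, 16)` is rejected.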

Pricing

Price per vCPU is $0.00001406 per second ($0.0506 per hour) and per GB memory is $0.00000353 per second ($0.0127 per hour).

If you compare the price with an m5.xlarge instance (4 vCPU, 16 GB) that costs $0.192 per hour, the same capacity would cost you $0.4056 per hour in Fargate, more than twice as much, ouch! You could build something similar and cheaper with Kubernetes using the cluster autoscaler, provided you can achieve a high utilization of the machines.
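The comparison can be reproduced with a few lines of Python. This is just the arithmetic from the prices quoted above, with a hypothetical helper name; the prices are the ones in this post and will have changed since.

```python
VCPU_PER_HOUR = 0.0506       # USD per vCPU per hour, as quoted above
MEMORY_GB_PER_HOUR = 0.0127  # USD per GB of memory per hour, as quoted above


def fargate_hourly_cost(vcpus: float, memory_gb: float) -> float:
    """Hourly Fargate cost for a task with the given vCPU and memory size."""
    return vcpus * VCPU_PER_HOUR + memory_gb * MEMORY_GB_PER_HOUR


fargate = fargate_hourly_cost(4, 16)  # same capacity as the instance above
ec2 = 0.192                           # USD per hour for the instance above
print(f"Fargate: ${fargate:.4f}/hour, EC2: ${ec2:.3f}/hour, ratio: {fargate / ec2:.2f}x")
```

The vCPU part is 4 × $0.0506 = $0.2024 and the memory part is 16 × $0.0127 = $0.2032, which add up to the $0.4056 figure.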

While I was writing this post, someone beat me to submitting a PR to the ECS plugin to add Fargate support.