Visualising Continuous Deployment

An excellent lean coach at my workplace encouraged me to post this quick article about how [macro|micro]-services teams can quickly visualise how well their continuous deployment implementation is working. A picture says a thousand words and all that… so:


 

I forgot to grab a snap of it today, so here's one Gus posted earlier this week. I'll add more later.

Too many teams, too few monitors

We had 10 teams all trying to integrate. Normally we could list 5 or 6 teams vertically using the standard Jenkins Build Pipeline View, but we were short on screen real estate here – apparently a ticket to solve that was floating around somewhere 😉

To conserve space I checked out a copy of Dashing and re-organised the screen into what you see now, but with just three colours:

  • Red: Build has failed
  • Grey: Building
  • Light Green: No current failures

But… I got a lot of questions and just general confusion. So I added the Production version to the screen too, indicating that x.x.x had been successfully deployed to Prod. It still lacked something though: you glanced at it and came away more confused than when you started.

What message is it supposed to impart?

This got me thinking: we're a CI/CD-driven floor, so what we really need to know is how well we are honouring that principle –

is the latest stable build in Production?

 

More colours to the rescue

  • Red: Build has failed – stage indicated – e.g. Acceptance (probably Cloudstack ;-))
  • Grey: Building with % indicator
  • *Light Green: All Good, last stable build is in Production
  • Less Light Green: Could be better, maybe 2–3 stable builds since the Prod release
  • Getting Yellow:  More work to be done here
  • Yellowy: 5+ stable builds since last release – let the floggings commence

*Colours are approximate; it's a (dodgy) function mapping the distance from Production to an RGB value.

Now it seems to relay more information: I can quickly see which teams are making good progress – and there's probably a lot more you could do with this.
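
For the curious, here's a minimal sketch of the kind of distance-to-colour mapping the footnote above is talking about. This is not the actual Dashing widget code – the endpoint colours, the cap at 5 builds and the linear interpolation are all my own assumptions for illustration.

object BuildColours {

  // Map "stable builds since the last Production release" onto an RGB hex string:
  // 0 builds behind -> light green, 5+ builds behind -> yellow, linearly
  // interpolated in between.
  def colourFor(buildsBehindProd: Int): String = {
    val cap = 5
    val t = math.min(math.max(buildsBehindProd, 0), cap).toDouble / cap

    val lightGreen = (144, 238, 144) // #90ee90
    val yellow     = (240, 230, 60)

    def lerp(a: Int, b: Int): Int = math.round(a + (b - a) * t).toInt

    f"#${lerp(lightGreen._1, yellow._1)}%02x${lerp(lightGreen._2, yellow._2)}%02x${lerp(lightGreen._3, yellow._3)}%02x"
  }
}

A team two stable builds behind gets a colour partway between the two endpoints; the dashboard tile just renders whatever hex string comes back.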

 

Getting started load testing

Tools for Analysis

I find it’s easier to get something done if I can see the results quick. This post , maybe series (wishful thinking) covers that. A simple quick Docker setup to pipe stats into a InfluxDB and graph them with  Grafana – giving you nice graphs with real data quickly.  See below,docker-clients

Eventually, I’ll add Prometheus and update the clients to allow use of ZipKin for distributed tracing analysis.

Docker to satisfy the Instant Gratification Monkey

For prototyping, or even Prod for the explorers, I find Docker gives the best return on time. You can spin up complex networks in minutes; here's what we're trying to build.

[image: monitoring1]

Setup

The aim is to generate some load using Gatling. Once we get a baseline from each client, we can try adding load balancers or even on-demand instances to see how much more performance we can eke out.
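
One cheap way to find that baseline – not what the simulation below does, which uses a constant arrival rate – is Gatling's ramped open injection profile: keep increasing the arrival rate and note where the response-time graphs start to bend. A minimal sketch, assuming Gatling 2.2's DSL and that the sample app answers a plain GET on its todos resource:

import scala.concurrent.duration._

import io.gatling.core.Predef._
import io.gatling.http.Predef._

class BaselineRamp extends Simulation {

  val httpConf = http.baseURL("http://localhost:8080") // client under test

  // A single illustrative request; swap in the real scenario steps.
  val scn = scenario("Baseline")
    .exec(http("List TODOs").get("/AngularJSRestful/rest/todos"))

  // Ramp the arrival rate from 1 to 20 users/sec over 5 minutes and watch
  // where response times start to degrade.
  setUp(
    scn.inject(rampUsersPerSec(1) to 20 during (5 minutes))
  ).protocols(httpConf)
}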

Load Generation

Gatling is a good candidate for this: it simulates multiple users per thread, making it more resource-efficient than JMeter, and it provides a good DSL for intuitive request generation.

There is a recording tool provided, similar to JMeter's. You hit go, point your browser to proxy through it, then navigate through the sequence of clicks that will provide the test 'scenario'.

I wanted to dabble with Scala so I took a manual approach – there's a first cut of it below. The steps are as follows:

  1. Download the payload file from the OAT server and regexp-replace all the IDs (offline – a rough sketch of this step follows the list).
  2. Load all requests from the file.
  3. For each user we configure, stagger their arrival, substitute the unique ID with a regexp replace, and send each message in the payload file, pausing realistically as we proceed.
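
That first step is a one-off outside the simulation; a throwaway version of it might look like the sketch below. The input filename and the shape of the IDs being replaced are assumptions – the only things taken from the real setup are the "_" placeholder convention and the incidents.txt output path.

import java.nio.file.{Files, Paths}

import scala.io.Source

// One-off preprocessing: swap whatever owner IDs the OAT export contains for the
// "_" placeholder, which the simulation later replaces with ${owner_id}.
object PrepareIncidents extends App {
  val raw = Source.fromFile("oat-export.json").getLines()        // assumed input file

  val cleaned = raw
    .map(_.replaceAll("\"owner\":\"\\d+\"", "\"owner\":\"_\""))  // assumed ID shape
    .mkString("\n")

  Files.write(Paths.get("user-files/data/incidents.txt"), cleaned.getBytes("UTF-8"))
}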

Gatling Source (2.2.0)

import scala.collection.mutable.MutableList
import scala.concurrent.duration._
import scala.io.Source

import io.gatling.core.Predef._
import io.gatling.core.structure.ChainBuilder
import io.gatling.http.Predef._

class ToDo extends Simulation {

  val ids = csv("user-files/data/ids.csv") // user id feeder (not yet wired into the scenario)
  val destinationUrl: String = System.getProperty("destinationUrl", "http://localhost:8080")
  val duration = 60 // 3600 + 1800
  val SETUP_URL = "/AngularJSRestful/rest/todos/158"
  val MAPPING_URL = "/AngularJSRestful/rest/todos"

  val httpConf = http
    .baseURL(destinationUrl)
    .acceptHeader("application/json, text/plain, */*")
    .acceptLanguageHeader("en-US,en;q=0.5")
    .acceptEncodingHeader("gzip, deflate")
    .userAgentHeader("Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0")

  // One payload message per line; spread them evenly over the test duration.
  val incidents = Source.fromFile("user-files/data/incidents.txt").getLines.toList
  var chainList: MutableList[ChainBuilder] = new MutableList[ChainBuilder]()
  val pausetime = duration / incidents.size
  println(s":::>> Pausetime = ${pausetime} seconds")

  // Build the full chain: one setup request, then one POST per incident.
  def go() = {
    chainList += setupRequest()
    for (incident <- incidents) {
      chainList += generateWebRequest("New TODO", "${id}", Map("incident" -> incident))
    }
    chainList
  }

  // Fetch an existing TODO and stash a digit from the response as the owner id.
  def setupRequest() = {
    exec(http(SETUP_URL).get(SETUP_URL).queryParam("x", "1")
      .check(regex("[0-9]")
        .saveAs("owner_id"))
    ).pause(10)
  }

  // POST one payload line, substituting the saved owner id for the "_" placeholder.
  def generateWebRequest(requestName: String, sort: String, queryParamMap: Map[String, String]) = {
    val pl = queryParamMap.get("incident").getOrElse("{}").replace("_", "${owner_id}")
    exec(http(requestName).post(MAPPING_URL).header("Content-Type", "application/json").body(StringBody(pl))).pause(pausetime)
  }

  val scn = scenario("REST Client Publishing")
    .exec(go())

  setUp(
    scn.inject(constantUsersPerSec(20) during (15 seconds))
  ).protocols(httpConf)

}

 

The payload file from ./user-files/data/incidents.txt

{"name":"Alan","owner":"_","priority":"Low"}
{"name":"Alan","owner":"_","priority":"High"}
{"name":"Alan","owner":"_","priority":"Low"}
{"name":"Alan","owner":"_","priority":"Low"}
{"name":"Alan","owner":"_","priority":"Medium"}
{"name":"Alan","owner":"_","priority":"Low"}
{"name":"Alan1","owner":"_","priority":"Low"}
{"name":"Alan2","owner":"_","priority":"Low"}
{"name":"Alan","owner":"_","priority":"Low"}
{"name":"Alan2","owner":"_","priority":"Low"}
{"name":"Alan2","owner":"_","priority":"Low"}
{"name":"Alan","owner":"_","priority":"Low"}
{"name":"Alan3","owner":"_","priority":"Low"}
{"name":"Alan2","owner":"_","priority":"Low"}
{"name":"Alan","owner":"_","priority":"Low"}
{"name":"Alan4","owner":"_","priority":"Low"}
{"name":"Alan","owner":"_","priority":"Low"}
{"name":"Alan","owner":"_","priority":"Low"}
{"name":"Alan","owner":"_","priority":"Low"}
{"name":"Alan","owner":"_","priority":"Low"}

 

Clients / Target

You can see from the Scala source above that each client exposes a RESTful interface on 8080, so each client will need a Tomcat instance along with a RESTful app – I borrowed Allen Fang's from GitHub.

FROM tomcat:8.0

# collectl gathers the OS stats; vim-tiny is just for tweaking inside the container
RUN apt-get -y update
RUN apt-get -y install collectl-utils vim-tiny

# Run collectl as a daemon, exporting cpu/disk/memory/network stats in graphite
# format to the (yet to be created) influx host on port 2003
RUN sed -i 's!^DaemonCommands.*!DaemonCommands = -f /var/log/collectl -P -m -scdmn --export graphite,influx:2003,p=.os,s=cdmn!' /etc/collectl.conf

# Deploy the sample REST app and start both collectl and Tomcat
COPY app/AngularJS-RESTful-Sample/target/AngularJSRestful.war /usr/local/tomcat/webapps/AngularJSRestful.war
CMD service collectl start && /usr/local/tomcat/bin/catalina.sh start && while true; do sleep 5; done;

 

Above, I configure the client using a pre-rolled Tomcat 8 image from Docker Hub, and add collectl and vim for tweaking in case I get it wrong.

After this, collectl is told to post its data on disk, network, memory and CPU, all in graphite format, to our yet-to-be-created InfluxDB on influx:2003.

Finally, copy the WAR file to $TOMCAT_HOME/webapps and start the services.

InfluxDB

This one is far less involved; it essentially just sits on the network listening for the screams from our overloaded clients.

# easy huh!
FROM tutum/influxdb

 

This will start Influx with the Graphite listener on port 2003, the HTTP API on 8086 and an admin interface on http://$IP:8083. See the Docker Compose section below.

Influx Setup

We need to add a database called graphitedb* once the host is up, so the clients have somewhere to write to. You can log in to the admin UI and create one – with root/root, I think – or use curl to send a request:

curl -G http://localhost:8086/query --data-urlencode "q=CREATE DATABASE graphitedb"

 

  • This database name is specified in the config for InfluxDB

Listening for Graphite data from the clients

In order for the clients to post their data to Influx, we need to enable the following in /config/config. To do this I ran the image, copied the config directory created by the instance to my local drive, made the modification below, and then mounted it as the config for the instance created under Docker Compose:

docker cp config/ 430b87a00999:/var/influxdb/config/

 

Enable this in /config/config

### Controls one or many listeners for Graphite data.
###
[[graphite]]
  enabled = true
  bind-address = ":2003"
  protocol = "tcp"
  consistency-level = "one"
  separator = "."
  database = "graphitedb"
  # These next lines control how batching works. You should have this enabled
  # otherwise you could get dropped metrics or poor performance. Batching
  # will buffer points in memory if you have many coming in.
  # batch-size = 1000 # will flush if this many points get buffered
  # batch-timeout = "1s" # will flush at least this often even if we haven't hit buffer limit
  batch-size = 1000
  batch-timeout = "1s"
  templates = [
     # filter + template
     #"*.app env.service.resource.measurement",
     # filter + template + extra tag
     #"stats.* .host.measurement* region=us-west,agent=sensu",
     # default template. Ignore the first graphite component "servers"
     "instance.profile.measurement*"
  ]
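
As an aside on that last template: InfluxDB's Graphite templates map the dot-separated components of each incoming metric name onto tags and a measurement, so "instance.profile.measurement*" stores the first component as an instance tag, the second as a profile tag, and joins the rest into the measurement name. If collectl emits something like clienta.os.cputotals.user (the exact metric names are an assumption here), that lands as measurement cputotals.user tagged with instance=clienta and profile=os – which is what lets Grafana break the same measurement out per client later.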

Grafana

Even easier: Grafana will be available on port 3000. See the Docker Compose section below.

FROM grafana/grafana

 

Docker Compose

To wire all these together, we'll use a docker-compose file. The directory structure for the above looks as follows:

 

[image: directory structure]

 

docker-compose.yml

version: "2"

services:
  grafana:
    build: grafana
    ports: ["3000:3000"]
    container_name: 'grafana'
    environment:
      -  GF_SECURITY_ADMIN_PASSWORD=secret
    links:
      - influx
      - clienta
      - clientb
      - clientc


  influx:
    build: influx
    ports: ["8083:8083","8086:8086"]
    container_name: 'influx'
    volumes:
      - '/var/influxdb:/data'
      - '/var/influxdb/config:/config'

  clienta:
    hostname: 'clienta'
    build: client
    ports: ["8080:8080"]
    container_name: 'clienta'

  clientb:
    hostname: 'clientb'
    build: client
    ports: ["8081:8080"]
    container_name: 'clientb'

  clientc:
    hostname: 'clientc'
    build: client
    ports: ["8084:8080"]
    container_name: 'clientc'

 

Running Load Tests

 

Now if you set up Gatling, create the file above and run

./gatling.sh -s ToDo

We can get graphs like the following:

[image: docker-grafana]

 

It’s hard to tell here without Prometheus or something similar for monitoring but the Tomcat container dropped about 6 requests about 3 minutes in – more than likely thread pool exhaustion – next post we’ll add monitoring.


 

Latest Buildbot (0.9.0b6) on CentOS 7

Shortly after I wrote that last post, I realised there was a newer version of  Buildbot, and it has a far nicer UI. I’ve linked to the new Dockerfiles below.


The main changes in the Dockerfile are

  • forcing a version from pip
  • manually installing waterfall & console plugins

The main changes in the master.cfg are:

  • Removing old WWW config
  • Removing old Auth settings

Both of the above are replaced by a short block of code:

# www
c['buildbotURL'] = "http://localhost:8020/"
c['www'] = dict(port=8020,
                plugins=dict(waterfall_view={}, console_view={}))

which is enabled by a new Web Server module. Note the port change.

New Master Dockerfile, no change to slave: http://pastebin.com/Gz30Qe0f