Getting started with load testing

Tools for Analysis

I find it’s easier to get something done if I can see the results quickly. This post, maybe a series (wishful thinking), covers that: a simple Docker setup to pipe stats into InfluxDB and graph them with Grafana, giving you nice graphs with real data quickly. See below.

docker-clients

Eventually, I’ll add Prometheus and update the clients to allow the use of Zipkin for distributed tracing analysis.

Docker to satisfy the Instant Gratification Monkey

For prototyping, or even Prod for the adventurous, I find Docker gives the fastest turnaround. You can spin up complex networks in minutes. Here’s what we’re trying to build:

monitoring1

Setup

The aim is to generate some load using Gatling. Once we get a baseline from each client, we can try adding load balancers or even on-demand instances to see how much more performance we can eke out.

Load Generation

Gatling is a good candidate for this: it simulates multiple users per thread, making it far more resource-efficient than JMeter, and it provides a good DSL for writing requests intuitively.

There is a recording tool provided, similar to JMeter’s. You hit go, point your browser to proxy through it, then click through the sequence of pages that will make up the test ‘scenario’.

I wanted to dabble with Scala so I took a manual approach – there’s a first cut of it below.  The steps are as follows:

  1. Download a payload file from the OAT server and regexp-replace all the ids (offline; a rough sketch follows this list).
  2. Load all the requests from that file.
  3. For each user we configure, stagger their arrival, substitute the unique id with a regexp replace, and send each message in the payload file, pausing realistically as we proceed.
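
Step 1 is a one-off, offline job, so it isn’t part of the simulation itself. Here’s a rough sketch of what mine looked like; the file names and the id pattern are made up for illustration, so adjust them to whatever your OAT dump actually contains:

import java.io.PrintWriter
import scala.io.Source

// Offline prep: swap the environment-specific owner ids in the raw dump for the
// "_" placeholder that the simulation later replaces with the captured ${owner_id}.
// Input/output paths and the regex are assumptions; adjust to your own dump.
object PrepareIncidents extends App {
  val raw     = Source.fromFile("raw-incidents.txt").getLines()
  val cleaned = raw.map(_.replaceAll("\"owner\"\\s*:\\s*\"\\d+\"", "\"owner\":\"_\""))

  val out = new PrintWriter("user-files/data/incidents.txt")
  try cleaned.foreach(line => out.println(line)) finally out.close()
}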

Gatling Source (2.2.0)

import scala.collection.mutable.MutableList
import scala.concurrent.duration._
import scala.io.Source

import io.gatling.core.Predef._
import io.gatling.core.structure.ChainBuilder
import io.gatling.http.Predef._

class ToDo extends Simulation {

  val ids = csv("user-files/data/ids.csv") // feeder, not used in this first cut
  val destinationUrl: String = System.getProperty("destinationUrl", "http://localhost:8080")
  val duration = 60 // 3600 + 1800
  val SETUP_URL = "/AngularJSRestful/rest/todos/158"
  val MAPPING_URL = "/AngularJSRestful/rest/todos"

  val httpConf = http
    .baseURL(destinationUrl)
    .acceptHeader("application/json, text/plain, */*")
    .acceptLanguageHeader("en-US,en;q=0.5")
    .acceptEncodingHeader("gzip, deflate")
    .userAgentHeader("Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0")

  // One JSON payload per line; spread the requests evenly over the run
  val incidents = Source.fromFile("user-files/data/incidents.txt").getLines.toList
  val chainList: MutableList[ChainBuilder] = new MutableList[ChainBuilder]()
  val pausetime = duration / incidents.size
  println(s":::>> Pausetime = ${pausetime} seconds")

  // Build the chain: one setup request, then a POST per payload line
  def go() = {
    chainList += setupRequest()
    for (incident <- incidents) {
      chainList += generateWebRequest("New TODO", "${id}", Map("incident" -> incident))
    }
    chainList
  }

  // GET an existing todo and capture an id to use as the owner in later POSTs
  def setupRequest() = {
    exec(http("Setup").get(SETUP_URL).queryParam("x", "1")
      .check(regex("[0-9]").saveAs("owner_id"))
    ).pause(10 seconds)
  }

  // POST one payload, substituting the captured owner_id for the "_" placeholder
  // (the sort parameter is unused in this first cut)
  def generateWebRequest(requestName: String, sort: String, queryParamMap: Map[String, String]) = {
    val pl = queryParamMap.get("incident").getOrElse("{}").replace("_", "${owner_id}")
    exec(http(requestName).post(MAPPING_URL)
      .header("Content-Type", "application/json")
      .body(StringBody(pl))
    ).pause(pausetime seconds)
  }

  val scn = scenario("REST Client Publishing").exec(go())

  setUp(
    scn.inject(constantUsersPerSec(20) during (15 seconds))
  ).protocols(httpConf)
}

 

The payload file, ./user-files/data/incidents.txt:

{"name":"Alan","owner":"_","priority":"Low"}
{"name":"Alan","owner":"_","priority":"High"}
{"name":"Alan","owner":"_","priority":"Low"}
{"name":"Alan","owner":"_","priority":"Low"}
{"name":"Alan","owner":"_","priority":"Medium"}
{"name":"Alan","owner":"_","priority":"Low"}
{"name":"Alan1","owner":"_","priority":"Low"}
{"name":"Alan2","owner":"_","priority":"Low"}
{"name":"Alan","owner":"_","priority":"Low"}
{"name":"Alan2","owner":"_","priority":"Low"}
{"name":"Alan2","owner":"_","priority":"Low"}
{"name":"Alan","owner":"_","priority":"Low"}
{"name":"Alan3","owner":"_","priority":"Low"}
{"name":"Alan2","owner":"_","priority":"Low"}
{"name":"Alan","owner":"_","priority":"Low"}
{"name":"Alan4","owner":"_","priority":"Low"}
{"name":"Alan","owner":"_","priority":"Low"}
{"name":"Alan","owner":"_","priority":"Low"}
{"name":"Alan","owner":"_","priority":"Low"}
{"name":"Alan","owner":"_","priority":"Low"}

 

Clients / Target

You can see from the Scala source above that each client exposes a RESTful interface on 8080, so each client will need a Tomcat instance along with a RESTful app. I borrowed Allen Fang’s sample from GitHub.

FROM tomcat:8.0
RUN apt-get -y update
RUN apt-get -y install collectl-utils vim-tiny

RUN sed -i 's!^DaemonCommands.*!DaemonCommands = -f /var/log/collectl -P -m -scdmn --export graphite,influx:2003,p=.os,s=cdmn!' /etc/collectl.conf
COPY app/AngularJS-RESTful-Sample/target/AngularJSRestful.war /usr/local/tomcat/webapps/AngularJSRestful.war
CMD service collectl start && /usr/local/tomcat/bin/catalina.sh start && while true; do sleep 5; done;

 

Above, I build the client from the pre-rolled Tomcat 8 image on Docker Hub, adding collectl and vim for tweaking in case I get something wrong.

collectl is then configured to post its disk, network, memory and CPU data, in graphite format, to our yet-to-be-created InfluxDB on influx:2003.

Finally, copy the WAR file to $TOMCAT_HOME/webapps and start the services.

InfluxDB

This one is far less involved; it essentially just sits on the network listening for the screams from our overloaded clients.

# easy huh!
FROM tutum/influxdb

 

This will start Influx with the admin interface available on http://$IP:8083 and the HTTP API on 8086; the Graphite listener the clients talk to on 2003 is enabled in the config below. See the Docker Compose section below.

Influx Setup

We need to add a database called graphitedb* once the host is up, so the clients have somewhere to write to. You can log in and create one (with root/root, I think), or use curl to send a request:

curl -G http://localhost:8086/query --data-urlencode "q=CREATE DATABASE graphitedb"
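
To confirm it took, list the databases the same way:

curl -G http://localhost:8086/query --data-urlencode "q=SHOW DATABASES"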

 

  • This database name is specified in the config for InfluxDB

Listening for Graphite data from the clients

In order for the clients to post their data to Influx, we need to enable the following in /config/config. To do this I ran the Dockerfile, copied the config directory created by the instance out to my local drive, made the modification below, and then mounted that directory as the config for the instance created under Docker Compose.

➜ ~ docker cp 430b87a00999:/config /var/influxdb/

 

Enable this in /config/config

### Controls one or many listeners for Graphite data.
###
[[graphite]]
  enabled = true
  bind-address = ":2003"
  protocol = "tcp"
  consistency-level = "one"
  separator = "."
  database = "graphitedb"
  # These next lines control how batching works. You should have this enabled
  # otherwise you could get dropped metrics or poor performance. Batching
  # will buffer points in memory if you have many coming in.
  # batch-size = 1000 # will flush if this many points get buffered
  # batch-timeout = "1s" # will flush at least this often even if we haven't hit buffer limit
  batch-size = 1000
  batch-timeout = "1s"
  templates = [
     # filter + template
     #"*.app env.service.resource.measurement",
     # filter + template + extra tag
     #"stats.* .host.measurement* region=us-west,agent=sensu",
     # default template. Ignore the first graphite component "servers"
     "instance.profile.measurement*"
  ]
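
Before wiring the clients in, it’s handy to check the listener is actually ingesting something. Here’s a rough smoke test in Scala; the host, the reachability of port 2003 and the metric path are all assumptions on my part (2003 isn’t published to the host in the compose file below, so either add it to the ports list or run this from inside the network):

import java.io.PrintWriter
import java.net.Socket

// Push one fake metric at the Graphite listener using Graphite's plaintext
// format: "<path> <value> <epoch-seconds>\n".
// Host/port and the metric path are assumptions, not part of the original setup.
object GraphiteSmokeTest extends App {
  val socket = new Socket("localhost", 2003)
  val out    = new PrintWriter(socket.getOutputStream, true) // autoflush

  val now = System.currentTimeMillis() / 1000
  // With the "instance.profile.measurement*" template above, this should land in
  // graphitedb as measurement "user", tagged instance=clienta, profile=cputotals.
  out.println(s"clienta.cputotals.user 42 $now")

  out.close()
  socket.close()
}

If the point arrives, a quick SELECT * FROM "user" against graphitedb in the admin UI on 8083 should show it.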

Grafana

Even easier: Grafana will be available on 3000. See Docker Compose section below

FROM grafana/grafana
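
Once the stack is up, Grafana still needs to be pointed at Influx: log in on port 3000 as admin (the password is set via GF_SECURITY_ADMIN_PASSWORD in the compose file below) and add an InfluxDB data source with URL http://influx:8086 and database graphitedb.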

 

Docker Compose

To wire all these together, we’ll use a docker-compose file. The directory structure for the above looks as follows:

 

dirsturc

 

docker-compose.yml

version: "2"

services:
  grafana:
    build: grafana
    ports: ["3000:3000"]
    container_name: 'grafana'
    environment:
      -  GF_SECURITY_ADMIN_PASSWORD=secret
    links:
      - influx
      - clienta
      - clientb
      - clientc


  influx:
    build: influx
    ports: ["8083:8083","8086:8086"]
    container_name: 'influx'
    volumes:
      - '/var/influxdb:/data'
      - '/var/influxdb/config:/config'

  clienta:
    hostname: 'clienta'
    build: client
    ports: ["8080:8080"]
    container_name: 'clienta'

  clientb:
    hostname: 'clientb'
    build: client
    ports: ["8081:8080"]
    container_name: 'clientb'

  clientc:
    hostname: 'clientc'
    build: client
    ports: ["8084:8080"]
    container_name: 'clientc'
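
Bring the whole stack up from this directory with:

docker-compose up -d --build

Compose builds the client, Influx and Grafana images and puts everything on the same network.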

 

Running Load Tests

 

Now, set up Gatling, drop in the simulation above, and run:

./gatling.sh -s ToDo
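
Gatling drops its own HTML report under results/ as well, but the interesting part here is what the containers are reporting to Influx.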

We can get graphs like the following:

docker-grafana

 

It’s hard to tell here without Prometheus or something similar monitoring the JVM, but the Tomcat container dropped about 6 requests roughly 3 minutes in, more than likely thread pool exhaustion. Next post we’ll add that monitoring.

VisualVM Basics

We recently deployed AppDynamics at work and, not wanting to leave all the good stuff to the new tools, I thought I’d have a look at VisualVM. I’d been meaning to look at it for a while; it has shipped with the JDK for the last couple of versions too.

I wrote some bad code and cracked it open to get a feel for the tool. I made the performance bottleneck obvious so that it would be easy to find in VisualVM’s output (the code is at the bottom of this post).

Download VisualVM or use the one shipped with your JDK $JAVA_HOME/bin/jvisualvm

Running VisualVM, I see my application on the left-hand side (Illustration A, 1 and 2); double-clicking it opens a tabbed pane on the right (Illustration A, 3).

 

Illustration A

From here I can choose various profiling options. I’m going to use the Profiler, as my program is trivial and I don’t care about instrumentation overhead. Selecting the Profiler tab, I profile Memory, after editing the settings and selecting the option to Profile Stack Traces (Illustration A, 4 and 6).

The char[] allocation here is clearly marked by VisualVM; this is the large String I am building up in badAppend(..). (Profile the CPU to see the method execution info.)

Interestingly, there is a TreeMap coming in strongly behind the expected char[]. To investigate its origins, I can right-click the list item during profiling and select Take Snapshot and Show Stack Trace (in reality I had to take the snapshot first and then right-click; my target program crashes out otherwise). The stack trace presented identifies the garbage collector (Illustration A, 5), which would have been working hard. Take a look at the heap usage in the first tab, Monitor, and you’ll see it climbing to a peak, then dropping as the garbage collector reclaims the thousands of unnecessary objects we are allocating in createDataSize.

Illustration B

heapusage

 

Notice the pattern of the peaks here. I’m wondering: is this a sweep of the Eden space, followed by some admin, then a sweep of the survivor spaces?

 

I couldn’t find the VM arguments in VisualVM*, so I ran jps -v on the command line, giving me this:

4529 Main -Xms128m -Xmx750m -XX:MaxPermSize=350m -XX:ReservedCodeCacheSize=96m -XX:+UseCodeCacheFlushing -ea -Dsun.io.useCanonCaches=false -Djava.net.preferIPv4Stack=true -Djb.vmOptionsFile=/home/alan/apps/idea-IU-129.1359/bin/idea64.vmoptions -Xbootclasspath/a:/home/alan/apps/idea-IU-129.1359/bin/../lib/boot.jar -Didea.paths.selector=IntelliJIdea12 -Djb.restart.code=88
11096 Jps -Dapplication.home=/home/alan/apps/jdk1.8.0 -Xms8m

 

* Actually there they are, just below the Heap Tab in Illustration B

 

The -Xmx750m is my maximum heap size; when we near this, the GC must intervene. Next time I’ll investigate the length of these GC runs. In theory they should be quick, as we are following the ‘expected’ many-short-lived-objects pattern, meaning most collection happens in the small Eden space, with less in the survivor spaces. To check this I installed the excellent VisualGC plugin:

Illustration C

VisualGC Plugin

I can see the Eden space being GC’d quite often and the survivor spaces seemingly less often, but climbing. I think I took this screenshot early in the run, which would explain the increasing allocation rate.
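
To put numbers on the GC runs themselves, rerunning the program with -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps would print the duration of each collection to the console alongside the space sizes.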

 

Watching the full animation of the VisualGC plugin is well worth it. This VM is using a parallel GC and we can see it in action as the program runs. When the Eden space fills, a GC occurs (at about 50% of max Eden space on this run) and S0 (Survivor 0) is populated with a ‘second generation’ of objects, freeing up room in Eden, the cheapest space to collect; S1 is often purged at this point too. This lets the application optimistically run on with a ‘healthy’ Eden space available. The GC people wouldn’t be catering for these intentional heap-filling executions! When we quickly fill Eden again, S0 is purged, promoting its objects to their third generation and into S1. My large heap / old gen fills far more slowly relative to the other spaces. This cycle repeats as the program runs, which I believe explains the shape of the heap allocation graph in Illustration B.

None of this is science, just me passing time!

I’m using Java 1.8 and Visual VM 1.3.8

package tests;

// Deliberately inefficient String building, used as an obvious target for VisualVM.
// Inspired by: http://stackoverflow.com/questions/2474486/create-a-java-variable-string-of-a-specific-size-mbs
public class SimpleStringHogger {

    public static void main(String... args) {
        new SimpleStringHogger().createDataSize(Integer.valueOf(args[0]));
        System.out.println("Done");
    }

    // Builds a String roughly msgSize * 1024 characters long, one character at a
    // time, allocating a new wrapper (and a new String) on every iteration.
    private StringWrapper createDataSize(int msgSize) {
        StringWrapper data = new StringWrapper("a");
        while (data.length() < (msgSize * 1024) - 6) {
            data = data.badAppend("s");
        }
        return data;
    }

    private class StringWrapper {
        private String s;

        private StringWrapper(String s) {
            this.s = s;
        }

        // The bottleneck: each concatenation copies the whole backing char[] into a new String.
        public StringWrapper badAppend(String s) {
            return new StringWrapper(this.s + s);
        }

        public int length() {
            return s.length();
        }
    }
}