Data is power in the world of performance engineering. Be it test results, system metrics, tunable knobs or record shattering benchmarks, data is everywhere. Read on to know how we at Red Hat are leveraging the power of Elasticsearch with Browbeat to lead the next revolution in performance engineering.
Do Numbers Mean Anything?
Test data and benchmarks can yield valuable insights into how a system performs under given conditions. The words ‘given conditions’ are really important here: saying “I got 2 Gbps throughput from my network interface card” doesn’t quantify performance meaningfully. Without details about the system architecture, CPUs, line rate and a whole bunch of other things, the number 2 Gbps doesn’t speak for itself.
The Future of Performance Engineering is NOW
Performance engineers have been using spreadsheets to record, track and analyze data forever. As powerful as they are, anyone who has worked with spreadsheets knows how cumbersome it is to organize them. With Browbeat, we have taken technologies that are already out there like Elasticsearch and integrated them into our framework, thereby creating a convenient way to store, organize and analyze test results without the hassles of using CSVs and spreadsheets.
Think that is cool? Wait until you hear how we solve the problem of coming up with names or tags like ‘Westmere-32-core-Mellanox-Connectx3-40Gbps-Neutron-api-workers-32-Run-1-20160714-164017’ to associate test result data with the ‘given conditions’.
We use Ansible to gather data about how the system and software under test are configured, so you can simply query Elasticsearch for something like “neutron_workers: 32” and see all results that match that criterion. Since each test result is bundled with metadata about how the environment was configured, it doesn’t require any effort from the user to associate test results with environment conditions. With this level of automation in place, an engineer gets more done, faster and smarter, every day by spending time on the actual analysis and interpretation of results rather than on organizing and tracking them.
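To make that concrete, here is a minimal sketch of the kind of query this enables, written against the Elasticsearch query DSL. The field name “neutron_workers” follows the example above; real Browbeat metadata keys depend on how the facts were flattened, and the client call shown in the comment is the standard elasticsearch-py usage, not Browbeat code.

```python
def build_metadata_query(field, value):
    """Build a query matching every result indexed with the given metadata value."""
    return {"query": {"match": {field: value}}}

query = build_metadata_query("neutron_workers", 32)

# With the official elasticsearch-py client, this could be submitted as, e.g.:
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch(["elastic-host:9200"])
#   hits = es.search(index="browbeat-*", body=query)
# Every hit carries the full environment metadata alongside the result data.
```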
Tools of the Trade
Let’s dig a bit deeper into how we package a solution in Browbeat using Ansible, Elasticsearch and Kibana to gain quick and priceless insights into OpenStack Performance. For starters let’s briefly define the technologies used in this solution:
Ansible, an automation platform for configuring and managing servers
Elasticsearch, an enterprise class data store and search engine
Kibana, browser-based analytics and search dashboard for Elasticsearch
Currently, Browbeat is able to index and analyze data from Rally and Shaker test results, so let’s see how we do it.
The entire engineering/code behind this work can be broken down into three parts: gathering configuration details of the cloud as metadata, massaging the tool results, and indexing the data in Elasticsearch to visualize it via Kibana. Let’s look at each of these parts in detail.
Gathering Cloud Configuration as Metadata
This task is accomplished using Ansible Facts. Facts are information gathered from remote systems by Ansible. By default, Ansible gathers a lot of data about the system it is talking to, including but not limited to network interfaces, operating system, kernel and CPU architecture. While this is a lot of useful data, the missing piece is the configuration of the software under test, which in our case is OpenStack. Since we weren’t getting that piece for free from Ansible, we wrote playbooks that go through the OpenStack configuration files on all nodes and set these configuration parameters as Ansible Facts. With the OpenStack configuration loaded into Ansible Facts, we dump the facts from all the nodes onto the node on which Browbeat is running.
Although we initially tried passing all of the Ansible facts as metadata, we soon realized that this overwhelmed Elasticsearch by creating tons of unique keys, which seemed to be a bad idea anyway. So we wrote a Python class to massage the facts JSON, flatten it out and cleverly grab only the facts of interest into three different JSON files: one each for hardware metadata (hardware details of the nodes), software metadata (OpenStack configuration) and environment metadata (number of controllers, etc.). The Elasticsearch connector class adds the data in each of these files as metadata to every single test result, which gives us the power to associate results with the ‘given conditions’ – the way the cloud was set up. The idea is to run the playbook to gather this metadata every time the configuration changes; if nothing has changed at all, these files can be reused for every run of Browbeat.
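The flattening-and-selecting step can be sketched roughly like this. The fact names and prefixes below are illustrative, not Browbeat’s actual fact layout, and the real class does more cleanup than this toy version.

```python
def flatten(d, parent=""):
    """Recursively flatten a nested dict into {"a.b.c": value} form."""
    out = {}
    for k, v in d.items():
        key = f"{parent}.{k}" if parent else k
        if isinstance(v, dict):
            out.update(flatten(v, key))
        else:
            out[key] = v
    return out

def select(flat, prefixes):
    """Keep only the facts whose keys start with one of the given prefixes."""
    return {k: v for k, v in flat.items() if k.startswith(tuple(prefixes))}

# Hypothetical facts dump from one node.
facts = {
    "ansible_processor_count": 32,
    "ansible_memtotal_mb": 131072,
    "openstack": {"neutron": {"api_workers": 16}},
}
flat = flatten(facts)
hardware = select(flat, ["ansible_processor", "ansible_memtotal"])
software = select(flat, ["openstack."])
```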
Massaging Tool Results
Sending results from the tool to Elasticsearch as-is is a bad idea, since it limits the data’s usability and searchability and thereby diminishes its value. We went through several prototypes of how we index the JSON output from the tool, and each iteration improved the value we get out of the result data. By value, I mean how granular the searches can be made and whether we can get the visualizations for the use cases we want. Querying by action name or scenario name, visualizing a scenario with stats for each atomic action, and keeping count of all errors and successful results have all been made possible by the manipulations we do to the native JSON output by Rally.
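As an illustration, the reshaping amounts to exploding each Rally iteration into one flat, searchable record per atomic action. The input structure below is heavily abbreviated; real Rally JSON carries many more fields, but the idea is the same.

```python
def rally_records(scenario, raw_results):
    """Yield one flat record per atomic action per iteration."""
    for iteration in raw_results:
        for action, duration in iteration["atomic_actions"].items():
            yield {
                "scenario": scenario,
                "action": action,
                "duration": duration,
                "error": bool(iteration.get("error")),
            }

# Abbreviated, hypothetical Rally output for one iteration.
raw = [{"atomic_actions": {"neutron.create_router": 1.9,
                           "neutron.list_routers": 0.4},
        "error": []}]
records = list(rally_records("create-and-list-routers", raw))
```

Each record can now be queried by scenario or action name on its own, which is exactly what the per-atomic-action charts need.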
It’s a whole new story with Shaker. Like Rally’s, the JSON output by Shaker was not constructed with indexing into Elasticsearch in mind. Moreover, the fact that Shaker results have to be represented as ‘Throughput vs Time’ makes it even more challenging, given that the JSON doesn’t present data with unique timestamps per data point. The JSON contains throughput sampled at 1-second intervals over 60 seconds as a list, presented as one record per guest involved in the network testing. With our first prototype, we were able to successfully send and retrieve result data from Elasticsearch, but on visualizing in Kibana, we were not seeing what we hoped to see. Instead of a ‘Throughput vs Time’ chart, we saw just one vertical bar, which represented the average of all the throughput values. On further investigation, we understood that with the data modeled that way, we were limited by Kibana.
We found a slick workaround in the form of fake timestamps. We pre-populate a set of timestamps during each Browbeat run, associate each throughput value in a record with one of these timestamps, and pass the data along to Elasticsearch. On visualizing the data in Kibana, we were able to see a ‘Throughput vs Time’ chart. Kibana was still showing us the average value per timestamp, but now we had multiple timestamps (one per data point). A few other things also became easier to visualize with our fake-timestamp model. For example, consider a Shaker test with concurrency set to 4. In this scenario, Shaker launches 4 master-slave VM pairs and does throughput testing concurrently. The results JSON contains one record per master VM (server) per concurrency, and by using the same set of timestamps for each record, we can see aggregate and average throughput charts for the 4 VMs, thereby getting a true sense of the network capacity of the cloud.
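The fake-timestamp trick can be sketched as follows: give each of the 60 one-second samples its own synthetic timestamp, and reuse the same timestamps across concurrent VM records so Kibana can aggregate their values per point in time. Field names here are illustrative.

```python
from datetime import datetime, timedelta

def fake_timestamps(start, samples, interval_s=1):
    """One synthetic timestamp per sample, spaced interval_s seconds apart."""
    return [start + timedelta(seconds=i * interval_s) for i in range(samples)]

def to_points(vm, throughputs, timestamps):
    """Pair each throughput sample with its synthetic timestamp."""
    return [{"vm": vm, "timestamp": ts.isoformat(), "throughput_mbps": t}
            for ts, t in zip(timestamps, throughputs)]

start = datetime(2016, 7, 14, 16, 40, 17)
stamps = fake_timestamps(start, 60)
# Two concurrent VMs share the same timestamps, so Kibana can sum or
# average their throughput per timestamp across records.
vm1 = to_points("vm-1", [900.0] * 60, stamps)
vm2 = to_points("vm-2", [850.0] * 60, stamps)
```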
Indexing the Results in Elasticsearch and Visualizing them via Kibana
Well, this last step is the moment of truth. We have a Python class that takes the result data passed to it, bundles it with the metadata generated earlier, and pushes the data as a record to Elasticsearch. The index is of the form “[browbeat-rally-]YYYY.MM.DD” or “[browbeat-shaker-]YYYY.MM.DD”, depending on the tool. So every record passed to Elasticsearch now has metadata about the cloud, metadata about the tool run (concurrency, times, etc. in the case of Rally; progression, Heat template, etc. in the case of Shaker) and the actual result data. Rally results are passed as one record per atomic action, and Shaker results as one record per data point (that’s a lot of records per test, but it helps us get the visualizations we need).
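A rough sketch of this step: build the date-based index name and bundle the three metadata files with each result document. The helper below only constructs the document; the commented-out line shows how the standard elasticsearch-py `index()` call would push it. Field names are illustrative, not Browbeat’s exact schema.

```python
from datetime import date

def index_name(tool, day):
    """e.g. browbeat-rally-2016.07.14"""
    return f"browbeat-{tool}-{day.strftime('%Y.%m.%d')}"

def bundle(result, hardware, software, environment):
    """Attach the cloud metadata to a single result record."""
    doc = dict(result)
    doc["hardware-metadata"] = hardware
    doc["software-metadata"] = software
    doc["environment-metadata"] = environment
    return doc

name = index_name("rally", date(2016, 7, 14))
doc = bundle({"action": "keystone.authenticate", "duration": 1.2},
             {"cpu_count": 32},
             {"neutron_workers": 16},
             {"controllers": 3})
# es.index(index=name, body=doc)  # actual push via elasticsearch-py (sketch)
```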
The real value in this work is realized when trying to analyze and make inferences from test results, as the tools in their native form only report results for a particular run, without any reference to the environment they ran in. With Browbeat we can track trends and identify bottlenecks by tuning parameters in OpenStack and visualizing the resulting performance in Kibana.
Let us now look at a few dashboards that were built from result data that Browbeat sent over to Elasticsearch.
The above dashboard tracks the response times for a Keystone authenticate scenario vs concurrency when Keystone is configured in HTTPD. We could easily isolate test runs that had Keystone running in HTTPD because the Keystone configuration was passed as metadata.
The chart above shows the response times for various atomic actions involved in creating and listing a Neutron router with Neutron API workers tuned to 16, 32, 48 and 64 (left to right). It can be seen that a worker count of 16 provides lower average response times (lower being better), contrary to the general belief that optimal performance is obtained by tuning workers to the core count (32 in this case). Again, this comparison has only been made possible because we pass the Neutron configuration as metadata to Elasticsearch.
Let’s look at a sample Kibana visualization for Shaker.
In the above chart, we have been able to query Elasticsearch to provide the results for a scenario with 8 VMs concurrently blasting traffic, across two compute nodes, when the iptables hybrid firewall driver was being used for Neutron. It is interesting to note that we were able to plot the sum of throughputs across all VMs since results from each VM have been indexed with the same exact time stamp, allowing Kibana to aggregate results.
I am a Browbeat user and I do not care about what happens under the hood. What should be my workflow to make use of the Elasticsearch integration?
In the Browbeat configuration file, you would need to enable Elasticsearch and provide the IP address and port number of the Elasticsearch instance.
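The relevant fragment of the Browbeat configuration file looks something like the following. The exact key names can vary between Browbeat versions, so treat this as illustrative and check the sample browbeat-config.yaml shipped with your version.

```yaml
# Illustrative fragment -- key names may differ across Browbeat versions.
elasticsearch:
  enabled: true
  host: elk.example.com   # IP or hostname of your Elasticsearch instance
  port: 9200
```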
Before kicking off the Browbeat Test suite, you would need to run,
ansible-playbook -i hosts gather/site.yml
This playbook dumps all the metadata we need, which is further processed by the Metadata python class. After running the playbook, you would start Browbeat the usual way,
./browbeat.py rally shaker
Whenever you tune any parameters in OpenStack or make any other changes on the nodes, be sure to run the playbook again, as that ensures Browbeat always indexes results with the most recent, correct metadata.
Some lessons learned
- Spend enough time modeling your data and getting the mapping right, since that can mean the difference between getting 100% of the value from your data and no value at all, even though the data is sitting there in Elasticsearch.
- Always visualize the data in Kibana to make sure what you are seeing is what you intend to see. Visualizing the data is the easiest way to ensure you got the data modeling/mapping right.
- It is more than likely that you won’t get the mapping right the first time; it is an iterative process that needs some refining.
- Data is good. Metadata is great. But when you bundle them, the possibilities are infinite.
Hope you have a great time playing with performance data!