Practical DevOps for Big Data/Quality Testing

Introduction

Quality testing (QT) of data-intensive applications (DIAs) aims at verifying that prototypes of a DIA deliver the level of scalability, efficiency, and robustness expected by the end user. Common Big Data technologies such as Apache Storm, Spark, Hadoop, Cassandra, and MongoDB differ considerably from each other, so it is important to realize that QT of a complex DIA requires multiple tools.

For example, the Cassandra and MongoDB databases recently became supported by Apache JMeter, a state-of-the-art load generation tool in the open source domain. MongoDB is supported natively, whereas Cassandra is supported through an external plugin. JMeter also offers partial support for Apache Hadoop and HBase. This means that the research challenges associated with load injection for these technologies are limited, since a practical solution already exists, JMeter being the leading open source solution for stress testing. For this reason, this chapter mainly discusses QT in the scope of stream-based DIAs, which has received limited attention in the open source community.

Motivation

Several challenges arise in designing quality testing tools[1]. There is no formalized methodology in this respect for DIAs, and in our view the key challenges may be summarized as follows.

  • Load injection actuator design. In traditional load injection, workload submission relies on a client-server model in which external worker threads submit jobs to the application. This model is, however, not natural for platforms like Storm, where message injection is done by dedicated software components, called spouts, that are part of the application itself and inject specific units of work packaged as messages that flow through the system. In systems of this kind, the load injection actuators should be implemented not as external elements, but as elements internal to the application. There is therefore a conceptual difference between developing QT tools for databases and developing similar tools for streaming platforms. For example, being internal to the application, the resource usage of the workload generator in a streaming application adds to the resource consumption of the application itself. This may not be acceptable for database systems, but it is natural for streaming applications, where the injection of messages consumes a separate pool of resources. While an external implementation may also be considered, for example by providing a service equivalent to the ones used by Twitter to push data to external parties, a considerable advantage of the internal implementation is that the load injection actuator has direct access to the data structures used within the DIA, including the custom message structure defined by the developer. This makes it much simpler for the end user to define the streaming workload using the same Java classes developed for the DIA itself and to automate the execution of the experiments. One important implication of this approach, however, is that load injection must be provided as an embedded library that can be linked into the application code, with the developer defining the test workload through the IDE (see the sketch after this list).
  • Scalability testing. A requirement for any load injection tool is the ability to test whether a system scales, i.e., whether it delivers Quality-of-Service (QoS) in terms of performance and scalability when the parallelism level of the workload is increased (e.g., more users, greater data velocity, greater data volumes). A good implementation of the QT tools is therefore expected to generate additional load, as required by the end user, until one or more resources saturate. QT supports this form of scalability testing.
  • Streaming workload generation. Streaming-based DIAs typically analyze incoming messages for specific topics and trigger certain actions when pre-defined conditions are met on the payload of the messages. One research challenge is therefore to provide adequate tool support that makes it easy for the user to generate time-varying workloads and time-varying message contents. Ideally, such workloads should resemble as closely as possible the real workloads seen by the developer's company in production. For example, in the case of Twitter streaming data, one would like to generate workloads that are similar to ordinary Twitter feeds, but that are controllable in terms of the arrival rates of individual kinds of messages (e.g., the arrival rate of original tweets vs. the arrival rate of retweets, or the arrival rate of messages from a specific user) and the frequency of appearance of messages that trigger specific actions from the DIA. We have systematically investigated this problem throughout the research activity of QT, and we have developed a dedicated mathematical theory to address it.
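
To make the idea of an internal load injection actuator concrete, below is a minimal sketch of a custom Storm spout that replays messages from a trace file packaged with the application jar. It uses the standard Apache Storm spout API; the class name and the TraceLoader helper are illustrative and are not part of QT-LIB.

package example;

import java.util.List;
import java.util.Map;

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

// Illustrative internal load injection spout: replays JSON messages
// loaded from a resource file shipped inside the application jar.
public class TraceReplaySpout extends BaseRichSpout {
	private SpoutOutputCollector collector;
	private List<String> messages; // trace messages, loaded once in open()
	private int next = 0;

	@Override
	public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
		this.collector = collector;
		this.messages = TraceLoader.loadFromResource("test.json"); // hypothetical helper
	}

	@Override
	public void nextTuple() {
		// Cycle over the trace, restarting from the beginning when it runs out.
		collector.emit(new Values(messages.get(next)));
		next = (next + 1) % messages.size();
	}

	@Override
	public void declareOutputFields(OutputFieldsDeclarer declarer) {
		declarer.declare(new Fields("json"));
	}
}

Because the spout runs inside the topology, its resource usage is accounted to the application itself, which is precisely the behaviour discussed in the first bullet above.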

Existing Solutions

There is a variety of load testing tools available for distributed systems, both commercial (e.g., IBM Rational, HP LoadRunner, Quali, Parasoft) and open source (e.g., Grinder, JMeter, Selenium). However, such tools have been developed mainly for traditional web applications, where the incoming load is web traffic and users interact with the application front end (the website). In the case of DIAs, the incoming data (e.g., from databases or streams) are the main workload drivers. While database testing is fairly common, there is a shortage of open source solutions for injecting data streams into a DIA. Before introducing the DICE solution to this problem, we discuss existing load testing tools in more detail below.

  • Grinder. This is a framework for distributed testing of Java applications. In addition to testing HTTP ports, it can test web services (SOAP/REST) and services exposed via protocols such as JMS, JDBC, SMTP, FTP, and RMI. The tool executes test scripts written in Java, Jython, or Clojure. Tests can be dynamic, in the sense that the script can decide the next action to perform based on the outcome of the previous one. The tool is mature, having been first released in 2003, and is quite popular, with around 400 downloads per week. A limitation of Grinder is the lack of an organised community of volunteers to support the tool.
  • Apache JMeter. This is a distributed load testing tool that offers features similar to Grinder's in terms of load injection capabilities. When combined with Markov4JMeter, the tool can also reproduce probabilistic behaviour. Interestingly, it features native support for MongoDB (a NoSQL database). The tool excels in extensibility, which is achieved through plug-ins. A limitation of JMeter is that its web navigation may not be representative on websites where Javascript/AJAX play a major role for performance, since JMeter does not feature the same range of Javascript capabilities as a full browser. JMeter is supported by the large Apache Software Foundation community.
  • Selenium. This is a popular library that automates the control of a web browser, such as Internet Explorer or Firefox. It allows tests to be executed through a native Java API. Furthermore, test scripts can be recorded by adding the Selenium plug-in to the browser and executing the operations manually. An extension, formerly called SeleniumGrid and now integrated into the stable branch of the main Selenium tool, makes it possible to perform distributed load testing.
  • Chaos Monkey. This is an open source service that runs on Amazon Web Services. It is used to create failures within Auto Scaling Groups and to help identify failures within cloud applications. The design is flexible and can be expanded to work with additional cloud providers and to provide additional functionality. The service has a fully configurable schedule for when actions will be taken; by default, it runs on non-holiday weekdays between 9am and 3pm. A similar tool for Microsoft Azure, called WazMonkey, has been developed in .NET; it offers functionality analogous to Chaos Monkey's, such as rebooting or reimaging role instances at random within a given Azure deployment.

How the tool works

The DICE QT tool is the combination of two independent modules:

  • QT-LIB: a Java library to define load injection experiments in Storm-based and Kafka-based applications.
  • QT-GEN: a tool for the generation of workload traces that can be injected into the system using QT-LIB.

QT-LIB is offered as a jar file that can be packaged in any Java-based Storm topology, providing the custom spouts for load injection and workload modulation. The tool can acquire the external data to be pushed to the DIA either from MongoDB, for large datasets that can be exported as JSON files (e.g., with MongoDB's mongoexport utility), or from external text files. The JSON files use a syntax compatible with MongoDB. Below, we provide more details on the individual modules.

QT-LIB

The QT-LIB module offers a set of custom spouts that automate workload injection in DIAs. The conceptual working of these spouts is simple: as soon as the DIA is deployed, the spouts automatically generate tuples according to the user-specified workload. A load generation interface is available for the end user to specify an input data source (e.g., CSV, MongoDB); this interface provides the input tuples for the spout to inject into the system and can easily be adapted to other databases or other textual or binary input sources. Even though QT-LIB becomes part of the topology under test (both in the code and in the logical architecture), the system under test remains the collection of bolts that process the data; the bolts and the relationships among them are defined in the DIA code. A possible wiring is sketched below.
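
As an illustration of how such a spout sits in front of the system under test, the following sketch wires a load-injection spout and two application bolts together using Storm's standard TopologyBuilder API. TraceReplaySpout is the sketch from the Motivation section, and ParseTweetBolt/CountBolt stand in for the application's own bolts; none of these class names come from QT-LIB itself.

package example;

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class QtEnabledTopology {
	public static void main(String[] args) throws Exception {
		TopologyBuilder builder = new TopologyBuilder();

		// The QT spout replaces the production data source and replays the test trace.
		builder.setSpout("qt-load", new TraceReplaySpout(), 1);

		// The system under test: the application's own bolts, unchanged.
		builder.setBolt("parse", new ParseTweetBolt(), 2).shuffleGrouping("qt-load");
		builder.setBolt("count", new CountBolt(), 2).fieldsGrouping("parse", new Fields("user"));

		// Run the experiment on a local cluster for one minute, then shut down.
		LocalCluster cluster = new LocalCluster();
		cluster.submitTopology("qt-test", new Config(), builder.createTopology());
		Thread.sleep(60_000);
		cluster.shutdown();
	}
}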

QT-GEN

We now discuss the most advanced feature of QT: the ability to automatically generate artificial workloads from a given trace of Storm messages. The idea is to distinguish a set of message types within a given workload trace and produce a novel trace that has characteristics similar to the initial trace, without being identical to it, and with arbitrary user-specified arrival rates. QT assumes the availability of a JSON file in which each entry represents a message to be serialized into tuples.

The QT-GEN tool accepts as input a JSON file including an arbitrarily long sequence of such messages, parses it to extract the sequence of issue timestamps, and then generates a new sequence of messages, alternated in such a way as to preserve a correlation among the user-defined types similar to that seen in the original trace. As a simple example, for a Twitter trace the QT tool can generate a new trace with:

  • The same ratio of tweets and retweets as in the original trace, with the latter being identified by a specific field in the Twitter JSON data structure.
  • A statistically similar temporal spacing between tweets and retweets, so that volumes of tweets and retweets and the frequency and correlation of their peaks are also reproduced in the artificial workload.

The QT tool can statistically analyze this correlation and reproduce it to the best possible extent in artificial workloads, using a class of traffic models called Marked Markovian Arrival Processes, a special class of Hidden Markov Models. Markovian Arrival Processes (MAPs) are a class of stochastic processes used to model the arrivals from a traffic stream. Marked MAPs (MMAPs) are an extension of MAPs that allows modeling arrivals of multiple types of traffic. The QT-GEN library provides the backend of the QT tool and is designed for fitting marked traces with MMAPs and for the successive generation of representative traces. The latter problem is nontrivial due to the typically complex and nonlinear relationships between the statistical descriptors of the marked trace and the parameters of the MMAP.
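
For readers unfamiliar with these models, the standard notation is worth a brief sketch; this is textbook material, not a description of QT-GEN internals. An MMAP with $C$ message classes is parameterized by square matrices $D_0, D_1, \ldots, D_C$ over the states of a background Markov chain:

\[
Q = D_0 + \sum_{c=1}^{C} D_c
\]

where $D_0$ contains the rates of state transitions that do not produce an arrival, $D_c$ (for $c \ge 1$) contains the rates of transitions that emit a message of class $c$ (e.g., $c=1$ for tweets, $c=2$ for retweets), and $Q$ is the infinitesimal generator of the background chain. Fitting then amounts to choosing these matrices so that the per-class arrival rates, inter-arrival variability, and correlations match the statistics measured in the original trace.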

QT Methodology

The overall methodology for quality testing is illustrated in the figure below for a reference Storm-based application:

[Figure: Testing the quality of big data applications with DICE]

The main methodological steps are as follows:

  1. Submission of an application (i.e., a topology) to the cloud testbed
  2. Tests with representative workloads and collection of initial logs (e.g., in JSON format)
  3. Submission of the logs to the QT-GEN workload generator
  4. Production of a new workload to be injected into the system using the custom QT-LIB spout
  5. Submission of the QT-enabled topology with the QT-LIB spout
  6. Automated execution of the test
  7. Visualization of the results either on the monitoring platform (D-MON) or in the delivery service (Jenkins dashboard)

Open Challenges

The solution outlined in this chapter takes care of load injection in a stream processing system with the goal of automatically exposing performance and reliability anomalies. Although it is possible to model labeled data, current load-testing solutions used in stream-processing systems do not provide the ability to programmatically specify the data content of the message payloads. Clearly, the response of a stream processing system also depends on the content of the data itself, so it is foreseeable that more advanced tools could extend the proposed approach to encompass content, data concepts, and the like.

Another aspect not explored in this chapter is the scalability of the quality testing toolchain itself, which is relevant to data-intensive applications designed to run on tens or hundreds of nodes. Such configurations require the quality testing process itself to be distributed, in order to avoid bottlenecks in input stream generation. Commercial solutions for cloud-based load testing are available, but their use in the context of Big Data technologies is still in its infancy.

Application domain: known uses

Generating an artificial Twitter trace for load testing

Let us consider a stream processing system that obtains data from a Twitter stream. Naturally, in order to test the system, we need some Twitter data. Unfortunately, Twitter and other social media platforms require payment to obtain and use their datasets. While we could buy a dataset or replay a toy one, it would be desirable to have a free tool that generates artificial datasets similar to the real ones. QT offers this capability.

The starting point is the JSON template for messages coming from Twitter or a similar platform. For example, a message may look like the following JSON document:

{ "className" : "gr.iti.mklab.framework.abstractions.socialmedia.items.TwitterItem",
"reference" : "Twitter#00000000000000000",
"source" : "Twitter" ,
"title" : "@YYY: 4 DAYS TO GO #WorldCups",
"tags" : [ "WorldCups"],
"uid" : "Twitter#000000000",
"mentions" : [ "Twitter#111111111" , "Twitter#22222222222"],
"referencedUserId" : "Twitter#138372303",
"pageUrl" : "https://twitter.com/CinemaMag/statuses/123456789123456789",
"links" : [ "http://fifa.to/xyxyxyxy"],
"publicationTime" : 1401310121000,
"insertionTime" : 0,
"mediaIds" : [ "Twitter#123456789123456789"],
"language" : "en",
"original" : false,
"likes" : 0,
"shares" : 0,
"comments" : 0,
"_id" : "Twitter#123456789123456789"}

Note in particular that the JSON document contains a timestamp field, here "publicationTime", and several fields that may be used to classify the type of tweet, e.g., "original", which distinguishes two categories: tweets and retweets.

In real Twitter-based DIAs, a message like this is received immediately after publication. Our goal is to produce a new JSON trace, starting from a given Twitter trace, with the following properties:

  • The arrival rate of tweets in the new trace is identical to that in the original Twitter trace
  • The artificial and original traces have a similar probability of generating a burst of tweets

The QT-GEN tool ensures these two properties. Given the trace, the timestamp field (e.g., "publicationTime") and the classification field (e.g., "original"), it outputs a new JSON trace that is similar to the original Twitter trace, but nonetheless not identical to it.
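
To give a feel for the preprocessing involved, the following is a minimal sketch, based on the Jackson library, of how the timestamp and classification fields could be extracted from such a trace before fitting; it illustrates the idea and is not QT-GEN's actual code. For simplicity, it assumes the trace file is a single JSON array of messages like the template above.

package example;

import java.io.File;
import java.util.ArrayList;
import java.util.List;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

// Illustrative preprocessing: extract (timestamp, class) pairs from a JSON
// trace so that per-class inter-arrival times can be computed and fitted.
public class TraceParser {
	public static void main(String[] args) throws Exception {
		ObjectMapper mapper = new ObjectMapper();
		JsonNode trace = mapper.readTree(new File("twitter-trace.json")); // hypothetical file name

		List<Long> tweetTimes = new ArrayList<>();
		List<Long> retweetTimes = new ArrayList<>();
		for (JsonNode msg : trace) {
			long ts = msg.get("publicationTime").asLong();
			// "original" == true marks a tweet; false marks a retweet.
			if (msg.get("original").asBoolean()) {
				tweetTimes.add(ts);
			} else {
				retweetTimes.add(ts);
			}
		}
		System.out.printf("tweets: %d, retweets: %d%n", tweetTimes.size(), retweetTimes.size());
		// The per-class timestamp sequences would then feed the MMAP fitting.
	}
}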

Load testing a Kafka pipeline

We now illustrate the practical use of QT-LIB with Kafka. Below we present a compact example that is included in the QT-LIB distribution. Initially, as with Storm, we construct a QTLoadInjector factory that will assemble the load injector. We also specify the name of the input trace file, which is assumed to be packaged within the jar file of the data-intensive application.

package com.github.diceproject.qt.examples;

import com.github.diceproject.qt.QTLoadInjector;
import com.github.diceproject.qt.producer.RateProducer;

public class KafkaRateProducer {
	
	public static void main(String[] args) {
		QTLoadInjector QT = new QTLoadInjector();
		String input_file = "test.json"; // this is assumed to be a resource of the project jar file

We are now ready to instantiate a QT load injector for Kafka, which is called a RateProducer:

		RateProducer RP;
		RP=QT.getRateProducer();
		String topic = "dice3"; // the topic gets created automatically
		String bootstrap_server = "localhost:9092";

Here we have assumed that our target topic is called dice3 and that it is exposed by a Kafka instance available on the local machine on port 9092. This is the port of the so-called Kafka bootstrap server and should not be confused with the port of the Zookeeper instance associated with Kafka. We can now use the run() method to start the load testing experiment immediately.

		RP.setMessageCount(101);
		RP.run(bootstrap_server, topic, input_file);
	}
}

This will read JSON messages, similar to the ones created by QT-GEN, from the test.json file declared earlier. The file must be shipped within the application JAR, so that QT-LIB can open it as a local resource. The example generates 101 messages to illustrate the behavior of QT-LIB when the trace runs out, since in this case the trace is composed of 100 messages: QT-LIB cyclically restarts the trace from the beginning, resending the first JSON message as the 101st message.
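
To verify that the injected messages actually reached the topic, one can read them back with a plain Kafka consumer. Below is a minimal sketch using the standard kafka-clients API; the consumer group id is arbitrary, and several polls are issued since a single poll need not return the whole trace.

package com.github.diceproject.qt.examples;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;

// Minimal verification consumer: reads back the messages injected into
// the "dice3" topic and counts them.
public class KafkaTraceChecker {
	public static void main(String[] args) {
		Properties props = new Properties();
		props.put("bootstrap.servers", "localhost:9092");
		props.put("group.id", "qt-check"); // arbitrary consumer group
		props.put("auto.offset.reset", "earliest"); // read the topic from the start
		props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
		props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

		int count = 0;
		try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
			consumer.subscribe(Collections.singletonList("dice3"));
			for (int i = 0; i < 5; i++) { // a single poll may return only part of the trace
				count += consumer.poll(Duration.ofSeconds(2)).count();
			}
		}
		System.out.println("Messages read: " + count); // expect 101 after the run above
	}
}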

Conclusion

The main achievements of DICE QT are as follows:

  • DICE QT allows load injection in Storm-based applications using custom spouts that can reproduce tabulated arrival rates of messages or inject, in a controlled manner, messages stored in a text file or in an external database, i.e., MongoDB.
  • DICE QT can analyze a workload trace composed of different types of messages and fit the observed inter-arrival times of these messages into a special class of Hidden Markov Models, called MMAPs, from which new, statistically similar traces can be generated to assess the system under varying load scenarios.
  • DICE QT can also inject load into Kafka topics, thereby supporting a broad variety of targets, including Spark applications.

References

  1. André B. Bondi (2015). Foundations of Software and System Performance Engineering: Process, Performance Modeling, Requirements, Testing, Scalability, and Practice. Addison-Wesley.