Practical DevOps for Big Data/Platform-Independent Modelling

From Wikibooks, open books for an open world


DICE provides software architects with a set of core concepts, at the DPIM layer, for specifying the fundamental architectural elements that make up a Data-Intensive Application (DIA) during the DIA design phase. Designers can use these core elements to quickly assemble the structural view of their Big Data application, highlighting and addressing concerns such as data flow and essential high-level processing properties (e.g., processing rate and the properties provided and required by each component) as well as key data-processing needs (e.g., batch or streaming).

DPIM Profile

DPIM includes all concepts relevant to structuring a DIA. At the DPIM level we define the high-level topology of the application and its QoS requirements. Elements of the DPIM meta-model fall into two categories:

  1. Active DIA elements;
  2. Passive DIA elements.
Figure 1: Picture of the DICE DPIM Profile

More specifically, the meta-model in Figure 1 shows that DIA elements are essentially aggregates of two sets of components. The first is the "ComputationNode", which is responsible for carrying out computational tasks such as map or reduce in MapReduce. One of its most important attributes is "computationType", which states the kind of big-data processing performed, i.e., batch or stream processing. The ComputationNode is further specialized into "SourceNode" and "Visualization" nodes. The SourceNode represents the source of the data entering the application to be processed, and its attribute "sourceType" further specifies the characteristics of that source. Since the ultimate goal of a big-data application is to process data of high volume and velocity, SourceNode and ComputationNode are included in DPIM as essential parts of every DIA: the SourceNode is the entry point of data into the application, and the ComputationNode is where the data are processed.

Visualization here means presenting data through graphs computed by data-intensive means, so that the extracted knowledge is conveyed more intuitively and effectively. Although the visualization of big data could be performed by a separate application, it is modelled here as a specialization of ComputationNode, because visualization is ultimately a data-intensive computation task. The FilterNode is another specialization of ComputationNode; its role is to perform any pre-processing or post-processing of the data that may be needed.
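As a rough illustration, the ComputationNode hierarchy described above can be mirrored as plain Python classes. This is a hypothetical sketch, not the DICE profile itself (which is a UML profile): the class name VisualizationNode, the snake_case attribute names, the free-text source_type value and the describe helper are assumptions chosen for readability. The point it shows is that every specialization inherits computationType from ComputationNode.

```python
from dataclasses import dataclass
from enum import Enum


class ComputationType(Enum):
    """Processing style captured by the computationType attribute."""
    BATCH = "batch"
    STREAM = "stream"


@dataclass
class ComputationNode:
    """Carries out a computational task, e.g. a map or reduce step."""
    name: str
    computation_type: ComputationType


@dataclass
class SourceNode(ComputationNode):
    """Entry point through which data enters the application."""
    source_type: str = "unspecified"  # hypothetical free-text value


@dataclass
class VisualizationNode(ComputationNode):
    """Renders processed data as graphs; modelled as a computation."""


@dataclass
class FilterNode(ComputationNode):
    """Pre- or post-processes data flowing between other nodes."""


def describe(node: ComputationNode) -> str:
    # Every specialization is still a ComputationNode, so it exposes
    # the inherited computation_type attribute.
    return f"{type(node).__name__} {node.name}: {node.computation_type.value}"


# Hypothetical example topology: names are illustrative only.
pipeline = [
    SourceNode("tweets", ComputationType.STREAM, source_type="social feed"),
    FilterNode("dedup", ComputationType.STREAM),
    ComputationNode("word-count", ComputationType.STREAM),
    VisualizationNode("dashboard", ComputationType.BATCH),
]
for node in pipeline:
    print(describe(node))
```

Using inheritance here simply reflects the "is a specialization of" relation in the meta-model; a real DICE model would express the same relation through UML generalization between stereotypes.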

The second key element in the DICE profile is the "StorageNode". As its name suggests, the StorageNode represents the element responsible for storing data, for either the long or the short term. It is associated with the "Channel", which represents a communication channel in the application. The specification of a Channel captures the channel's restrictions and constraints, as well as characteristics related to data transmission such as information rate and taps. The StorageNode concept in DPIM mainly corresponds to a "database" in the model; in some cases it can also be a "filesystem". The Channel in DPIM represents "Governance and Data Integration", which mainly includes the technologies responsible for transferring data, such as message broker systems. The remaining elements in the model are "DataSpecification" and "QoSRequiredProperty", annotation stubs for specifying, respectively, the type and format of the data and the QoS requirements of the system and their evaluation.
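To make the role of a channel constraint concrete, the following hypothetical sketch connects producers to StorageNodes through Channels and flags channels whose offered data rate exceeds their maximum rate. It is only an illustration under stated assumptions: the field names, the "messages per second" unit, the Flow class and the overloaded check are inventions for this example and are not part of the DICE profile.

```python
from dataclasses import dataclass


@dataclass
class DataSpecification:
    """Annotation stub describing the type and format of the data."""
    data_type: str
    data_format: str


@dataclass
class StorageNode:
    """Stores data for the short or long term (e.g. a database or filesystem)."""
    name: str
    kind: str  # hypothetical values: "database" or "filesystem"


@dataclass
class Channel:
    """Connector such as a message broker, constrained by a maximum rate."""
    name: str
    max_rate: float  # hypothetical unit: messages per second
    data: DataSpecification


@dataclass
class Flow:
    """Hypothetical helper: a producer sending data over a channel into storage."""
    producer: str
    channel: Channel
    sink: StorageNode
    offered_rate: float  # same hypothetical unit as Channel.max_rate


def overloaded(flows: list[Flow]) -> list[str]:
    """Names of channels whose offered rate exceeds their maximum rate."""
    return [f.channel.name for f in flows if f.offered_rate > f.channel.max_rate]


# Illustrative values only.
events = DataSpecification("event", "JSON")
broker = Channel("broker", max_rate=1000.0, data=events)
db = StorageNode("events-db", "database")
print(overloaded([Flow("tweets", broker, db, offered_rate=1500.0)]))
```

A check of this kind is the sort of evaluation the QoSRequiredProperty annotations are meant to support; here it is reduced to a single rate comparison for clarity.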

Table 1 summarizes the current list of stereotypes of the DICE Profile for the DPIM level.

Stereotype | Description (this stereotype is for model elements representing...)
DpimComputationNode | DIA components with a computation throughput, a type of data processing and, possibly, the expected target technology.
DpimFilterNode | Filter nodes that extend the general DpimComputationNode with input and output ratios.
DpimSourceNode | DIA components with a given storage volume, type of generated data and data generation rate.
DpimStorageNode | DIA components with a resource multiplicity, type of stored data and speed, expressed as a maximum operations rate.
DpimChannel | Connectors that have a maximum speed and that are subject to failures and to the propagation of errors.
DpimScenario | An execution scenario of the DIA, which defines the quality properties of interest and the scenario's quality requirements.
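The attributes listed in Table 1 can feed a back-of-the-envelope quality check. The sketch below is hypothetical: the Python field names and units, the reading of the filter ratios as "output_ratio records emitted per input_ratio records consumed", the assumption that storage capacity scales linearly with multiplicity, the assumption of one storage operation per record, and the utilization bound used for DpimScenario are all assumptions for illustration, not the DICE semantics of these stereotypes.

```python
from dataclasses import dataclass


@dataclass
class DpimSourceNode:
    generation_rate: float  # records per second (hypothetical unit)


@dataclass
class DpimFilterNode:
    input_ratio: float   # records consumed per step (assumed meaning)
    output_ratio: float  # records emitted per step (assumed meaning)

    def output_rate(self, input_rate: float) -> float:
        # Assumption: output_ratio records leave for every input_ratio records in.
        return input_rate * self.output_ratio / self.input_ratio


@dataclass
class DpimStorageNode:
    multiplicity: int     # number of storage resources
    max_ops_rate: float   # max operations per second per resource (assumed)

    def capacity(self) -> float:
        # Assumption: load is shared evenly and capacity adds up linearly.
        return self.multiplicity * self.max_ops_rate


@dataclass
class DpimScenario:
    max_utilization: float  # hypothetical quality requirement, e.g. 0.8


def storage_utilization(src: DpimSourceNode, flt: DpimFilterNode,
                        store: DpimStorageNode) -> float:
    """Fraction of storage capacity used, assuming one operation per record."""
    return flt.output_rate(src.generation_rate) / store.capacity()


def meets_requirement(src: DpimSourceNode, flt: DpimFilterNode,
                      store: DpimStorageNode, scenario: DpimScenario) -> bool:
    return storage_utilization(src, flt, store) <= scenario.max_utilization


# Illustrative numbers only.
source = DpimSourceNode(generation_rate=10_000.0)
dedup = DpimFilterNode(input_ratio=4.0, output_ratio=1.0)
scenario = DpimScenario(max_utilization=0.8)
for replicas in (3, 4):
    store = DpimStorageNode(multiplicity=replicas, max_ops_rate=1_000.0)
    print(replicas, round(storage_utilization(source, dedup, store), 3),
          meets_requirement(source, dedup, store, scenario))
```

Under these assumptions the filter reduces 10,000 records/s to 2,500 records/s. Three storage resources provide 3,000 operations/s, giving a utilization of about 0.83 that violates the 0.8 bound, whereas four resources provide 4,000 operations/s and a utilization of 0.625.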