Practical DevOps for Big Data/Closing Remarks

From Wikibooks, open books for an open world

This book has been motivated by the problems that organizations face today in developing Big Data systems. Most organizations are under strong market pressure, and their supporting ICT departments are struggling to accelerate the delivery of applications and services while preserving the stability of production and operations. On the one hand, ICT operators lack an understanding of application internals, including the system architecture and the design decisions behind its components. On the other hand, development teams are unaware of operational details, including the infrastructure and its benefits and limitations. These issues are even more pronounced for Big Data systems. Recent years have seen rapidly growing interest in data-intensive technologies such as Hadoop/MapReduce, NoSQL, and stream processing. These technologies are important in many application domains, from predictive analytics and environmental monitoring to e-government and smart cities. However, the time to market and the cost of ownership of such applications remain considerably high.

This book has explained how DevOps practices can address the isolation that commonly exists between Development and Operations. We have illustrated the DICE methodology, a practical approach to the software engineering of Big Data systems. DICE builds on the idea that design artefacts, such as UML and TOSCA models, can play a pivotal role in organizing architectural and requirements knowledge within an organization and in sharing that knowledge across the design-time and run-time tools of the DevOps toolchain. Most commercially available DevOps tools narrow their focus to continuous integration and continuous delivery. DICE instead implements a more ambitious form of DevOps, in which the unification of Dev and Ops is not limited to the delivery phase but spans the whole application lifecycle, with both developers and operators relying on the models to carry out their activities. Because its toolchain covers the entire application lifecycle, including the feedback of production monitoring data to designers and developers, the proposed DevOps methodology promotes the iterative enhancement of an application driven by quantitative monitoring data.

One of the lessons we have learned in defining the DICE methodology is that different actors have very different expectations about the role that models should play. Designers see models as a core element of their work, so they find it natural to work with model-driven engineering tools. The same does not come naturally to most operators, who use abstractions closer to the operating system and to the system administrator's way of thinking. Interestingly, when the models are presented in textual form, as with TOSCA orchestration and topology models, operators tend to be at ease with them. This apparent contradiction shows that a methodology such as DICE can be fully based on models behind the scenes. Even so, different actors in the toolchain should be exposed to the models at different levels, so that each of them is at ease with the tools they use. Resolving such cultural differences between Dev and Ops is indeed one of the main challenges in delivering a sound DevOps methodology. We believe that the DICE methodology presented in this book is one of the first methodologies for developing data-intensive applications that attempts to tackle this problem systematically.