Practical DevOps for Big Data/Related Work

Importance of Big Data for the Business

Data are not only part of almost all economic and social activities of our society; they have come to be viewed as an essential resource for all sectors, organisations, countries and regions. But why is this a reality? It is expected that by 2020 there will be more than 16 zettabytes of useful data (16 trillion GB). This volume, together with characteristics like velocity, variety and socioeconomic value, flags a paradigm shift towards a data-driven socioeconomic model, and suggests a growth of 236% per year from 2013 to 2020. Thus, the data explosion is indeed a reality that Europe must both face and engage with through an organised, forceful, user-centric and goal-oriented approach. Data exploitation can be the spearhead of innovation, drive cutting-edge technologies, increase competitiveness and create social and financial impact.

DICE Value Proposition

The value proposition of a business initiative is the most critical factor that must be defined early in the project. A value proposition should be a brief but comprehensive statement of a project, addressing questions such as: What value is delivered to the stakeholders? Which business problems or requirements are solved, and which needs are satisfied? What bundles of products and services are offered to each stakeholder segment?

To reach a strong, to-the-point value proposition that refers to DICE as a whole and brings all its innovations to the surface, the tangible assets produced during the project's lifetime are identified. Even though it is strongly suggested to continuously monitor the market and revise these assets according to the achievements of the exploitation activities during the project's lifecycle, the initial analysis should consider the project's core objectives and envisioned results. These include the innovation, the motivation and the added value offered to DICE stakeholders.

Finally, this analysis should facilitate the definition of other important parts of the DICE exploitation strategy, namely the project's market segments, trends and target groups. A qualitative comparison between the features of existing developments and those expected of the DICE ecosystem, with respect to stakeholders' needs, is the key to the project's long-term sustainability.

To define the DICE Value Proposition, the DICE exploitation team allocated several tasks to all partners of the consortium. The partners filled in several templates describing the tools they are implementing in the context of DICE, the technologies they are bringing in, the services they are offering, similar initiatives and the advantages over them, etc. In the following chapters, we present these contributions in detail.

The DICE Value Proposition is the following: DICE delivers innovative development methods and tools to strengthen the competitiveness of small and medium Independent Software Vendors in the market of business-critical data-intensive applications (DIA). Leveraging the DevOps paradigm and today's innovative Big Data technologies, DICE offers a complete open source solution for the Quality-Driven Development of cloud-based DIA.

Positioning DICE to the market

The ultimate goal of analysing this information is to position DICE to the market. Which market domains does it make sense for the innovative DICE ecosystem to penetrate? What are the current trends, and is DICE positioned within them? All these questions need to be answered in the early phase of the project, so as to steer the development phase towards creating something innovative that will ensure its long-term sustainability. Only promising ideas ensure a project's viability after its funding has ended.

From a bird’s eye view, the following figure positions DICE to the market by presenting its “macro” and “micro” market domains.

DICE Market Positioning

Business requirements and DICE

DICE can be seen as part of the DevOps movement: it provides a set of tools that facilitate the flow of information from Dev to Ops, mediated by a model-driven approach, and it provides operations monitoring and anomaly detection capabilities that facilitate the flow of information from Ops to Dev. Moreover, DICE offers the following tenets in the context of DevOps:

  • Thanks to the DICE profile, it enables DIA design with proper operational annotations that facilitate design-time simulation and optimization. ‘Dev’ and ‘Ops’ can work side by side to improve and enhance application design using simulation and optimization tools that offer refactoring of architectural designs.
  • It includes monitoring, together with tools that exploit the monitoring data to detect anomalies and to optimize the configuration of applications. The monitoring data is also exploited by tools that feed back into the architectural design to identify bottlenecks and refactor the architecture.
  • DICE also offers tools for automated delivery and continuous integration, which facilitate integrating the design-time and runtime tools in DICE and provide DevOps automation.

Organizations now face the challenge of exploiting data more intelligently, using data-intensive applications that can analyse huge volumes of data in a short amount of time. However, Big Data systems are complex systems involving many different frameworks that are entangled together to process the data. How will software vendors perform at increasingly high levels amid exploding complexity? How will they accelerate time to market without reducing quality of service? This complexity raises the design and operational requirements of Big Data applications, and demands a common approach in which development and operations teams are able to react in real time throughout the application lifecycle and assure quality and performance needs. Big Data system development therefore introduces a number of technical challenges and requirements for system developers and operators that will severely impact the sustainability of business models, especially for software vendors and SMEs that have limited resources and lack the strength to influence the market. In particular, the DICE team identified the following problems in practice:

  • Lack of automated tools for quality-aware development and continuous delivery of products that have been verified in terms of quality assurance. In other words, there exist many different frameworks that software vendors can adopt to develop their data-intensive applications, but there is no automated mechanism that lets them verify software quality across such a varied technological stack.
  • Lack of automated tools for quality-driven architecture improvements. To develop a data-intensive application, software vendors need to adopt an architectural style in order to integrate the different components of the system. During development, the complexity of such a data architecture increases, and there is no automated tool that enables designers to improve the architecture based on performance bottlenecks and reliability issues detected by monitoring the system on real platforms.
  • Lack of tools and methods interoperable with heterogeneous process maturity in industry. Developing and maintaining data-intensive applications is key to a diverse IT market, spanning from big industrial players to small and medium enterprises. However, there is a huge variety of software processes across this market, and such diversity is seldom taken into account from a methodological and technological standpoint. For example, very few evaluations show the applicability of certain data-intensive design methods within IT companies consistent with Capability Maturity Model Integration (CMMI) Level 4 (Quantitatively Managed) or Level 2 (Managed), yet the industrial adoption of those design methods may depend greatly on such evaluations. DICE proposes a lightweight model-driven method for designing and deploying data-intensive applications, where models become assets that go hand in hand with the intensive coding or prototyping procedures already established in industry (e.g., in CMMI Level 5 players). DICE models assist developers in obtaining the desired QoS properties in parallel with preparing their data-intensive business logic. This methodological approach offers concrete integration with any CMMI level, from young, upstarting SMEs with initial maturity (which may use DICE in its simplest form, e.g., working with Agile methods) to large corporate IT enterprises that may require fine-grained and framework-specific support to realise quality.

Software vendors have always found ways to adapt to new conditions and help organizations reach their goals. In recent years, DevOps practices have gained significant momentum. Organizations across industries have recognized the need to close the gap between development and operations if they want to remain innovative and responsive to today’s growing business demands. This paradigm focuses on a new way of doing things: an approach that tries to improve collaboration and communication between developers and operators while automating the process of software delivery and infrastructure changes. Therefore, a new ecosystem is needed to facilitate such automated processes by offering the following capabilities:

  • Business Needs and Technical Quality of Service: All types of organizations need to move fast and to align IT assets with their needs (by creating new assets or adapting existing ones). Therefore, ICT departments need to ensure that their ICT platforms and infrastructures are flexible enough to adapt to business changes, regardless of the type of IT infrastructure used to create those data-intensive applications, whether they are built using open source software or data services offered by public cloud providers. In other words, they should avoid technological lock-in as much as possible.
  • Architecture Refactoring: Efficiency, reliability and safety of applications should be monitored in testing and production environments, with metrics and data that feed back directly and quickly to development teams for faster testing, improvement and adjustment to meet service level goals.
  • Application Monitoring and Anomaly Detection: Access to live data gathered by monitoring engines should also provide performance and reliability data for observed applications in production, in order to identify outliers and detect any anomaly that may harm continuous business processes that depend on such data-intensive applications. Moreover, such monitoring data should assist application architects and developers in building toward an optimized target infrastructure.
  • Efficiency and optimal configuration: Big Data applications typically consume expensive resources that are offered to companies via public clouds. It is therefore of utmost importance to optimize such applications so that they demand fewer resources and produce more output. Organizations thus need automated tools to optimally configure such applications without having to hire experts to optimize their Big Data applications.
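
To make the anomaly detection capability in the list above more concrete, the following is a minimal sketch of flagging outliers in a series of monitoring samples. It is not the method used by the DICE tools: the function name find_anomalies, the threshold of 3.5 and the sample latency values are all hypothetical, and the technique shown is a generic modified z-score based on the median absolute deviation.

```python
from statistics import median


def find_anomalies(samples, threshold=3.5):
    """Return the indices of samples whose modified z-score exceeds threshold.

    Hypothetical helper for illustration only; uses the median absolute
    deviation (MAD), which is less sensitive to outliers than the mean
    and standard deviation.
    """
    med = median(samples)
    mad = median(abs(x - med) for x in samples)
    if mad == 0:
        # All samples are (nearly) identical: nothing can be flagged.
        return []
    return [
        i
        for i, x in enumerate(samples)
        if 0.6745 * abs(x - med) / mad > threshold
    ]


# Made-up response times in milliseconds, as a monitoring engine might report.
latencies_ms = [102, 98, 105, 101, 99, 100, 480, 103, 97, 104]
print(find_anomalies(latencies_ms))  # prints [6]
```

In a real deployment the samples would come from the monitoring platform rather than a hard-coded list, and the threshold would need tuning per metric.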