Practical DevOps for Big Data

From Wikibooks, open books for an open world


To do: how this book came into being, how the idea for this book was developed, acknowledgments, etc.


  • Netfective Technology - Youssef Ridene, Joas Yannick Kinouani, Laurie-Anne Parant
  • Imperial College London - Giuliano Casale, Chen Li, Lulai Zhu, Tatiana Ustinova
  • Politecnico di Milano - Danilo Ardagna, Marcello Bersani, Elisabetta Di Nitto, Eugenio Gianniti, Michele Guerriero, Matteo Rossi, Damian Andrew Tamburri, Safia Kalwar, Francesco Marconi
  • IeAT - Gabriel Iuhasz, Dana Petcu, Ioan Dragan
  • XLAB -
  • flexiOPS - Craig Sheridan, Frederick Leighton
  • ATC -
  • Prodevelop - Ismael Torres, Christophe Joubert, Marc Gil
  • Universidad Zaragoza - Simona Bernardi, Abel Gómez, José Merseguer, Diego Pérez, José-Ignacio Requeno

How to Read This Book

This book is about a methodology for constructing big data applications. A methodology exists to solve software development problems. It consists of development processes (workflows, ways of doing things) and tools that help put them into practice. Its guiding principle is to make the work easier and to ensure the satisfaction of the stakeholders involved in a software project, end-users and maintainers included. Our methodology addresses the problem of reusing complex, hard-to-learn big data technologies to build good-quality big data systems effectively and efficiently. To do so, it takes inspiration from two other successful methodologies: DevOps and model-driven engineering. As prerequisites, we assume the reader has a general understanding of software engineering and, on the tooling side, is familiar with the Unified Modeling Language (UML) and the Eclipse IDE.

The book is composed of eight parts. Part I is an introduction (Chapter 1) followed by a review of the state of the art (Chapter 2). Part II sets forth our methodology (Chapter 3) and reviews some UML diagrams convenient for modelling big data systems (Chapter 4). Part III shows how to adapt UML to support a stepwise refinement approach, in which models become increasingly detailed and precise. Each chapter (Chapters 5, 6, and 7) is dedicated to one of our three refinement steps. Part IV focuses on model analysis. Models enable designers to study the system carefully without needing an implementation of it: a model checker (Chapter 8) may verify whether the system, as modelled, satisfies given quality of service requirements; a simulator (Chapter 9) may explore its possible behaviours; and an optimiser (Chapter 10) may find the best one. Part V explains how models serve to automatically install (Chapter 11), configure (Chapter 12), and test (Chapters 13 and 14) the modelled big data technologies. Part VI describes the collection of runtime performance data (Chapter 15) in order to detect anomalies (Chapter 16) and violations of quality requirements (Chapter 17), and to rethink models accordingly (Chapter 18). Part VII presents three case studies of this methodology (Chapters 19, 20, and 21). Finally, Part VIII concludes (Chapter 22) and outlines future work (Chapter 23).


  1. Introduction
    1. Introduction (50% developed)
    2. Related Work (0% developed)
  2. DevOps and Big Data Modelling
    1. Methodology (50% developed)
    2. Review of UML Diagrams Useful for Big Data (0% developed)
  3. Modelling Abstractions
    1. Introduction to Modelling (0% developed)
    2. Platform-Independent Modelling (0% developed)
    3. Technology-Specific Modelling (0% developed)
    4. Deployment-Specific Modelling (0% developed)
  4. Formal Quality Analysis
    1. Quality Verification (0% developed)
    2. Quality Simulation (0% developed)
    3. Quality Optimisation (0% developed)
  5. From Models to Production
    1. Delivery (0% developed)
    2. Configuration Optimisation (0% developed)
    3. Quality Testing (0% developed)
    4. Fault Injection (0% developed)
  6. From Production Back to Models
    1. Monitoring (0% developed)
    2. Anomaly Detection (0% developed)
    3. Trace Checking (0% developed)
    4. Iterative Enhancement (0% developed)
  7. Case Studies
    1. Fraud Detection (50% developed)
    2. Maritime Operations (0% developed)
    3. News and Media (0% developed)
  8. Conclusion
    1. Conclusion (0% developed)
    2. Future Work (0% developed)
  9. Appendices
    1. Glossary (0% developed)
    2. Index (0% developed)