Big Data and DevOps

When a new IT buzzword emerges, we tend to analyze its relationship with other aspects of IT. Today we are going to review the relationship between two IT buzzwords: Big Data and DevOps.

What is DevOps

Dev comes from the word “development” and Ops from the word “operations”, but as you can see in figure 1, there is also a QA piece that did not make it into the new name. Here’s the definition of DevOps from Wikipedia: “It is a culture, movement or practice that emphasizes the collaboration and communication of both software developers and other information-technology (IT) professionals while automating the process of software delivery and infrastructure changes.” Communication and collaboration are the key to DevOps, and good QA is the key to good communication between the dev team and the ops team.

[Figure 1: devops.png]

To better understand what DevOps is, I recommend the following blog post from Marc Hornbeek, Principal Practice Architect on DevOps for Trace3: 7 Pillars of DevOps: Essential Practice for Enterprise Success. It highlights the 7 key aspects of DevOps and how they lead to a successful enterprise DevOps practice.

  1. Collaborative Culture – Dev, Ops and QA teams need to align goals and create cooperative procedures.
  2. Designed for DevOps – The basis of DevOps design is a modular, immutable architecture using microservices.
  3. Continuous Integration – Must have minimal impact on production.
  4. Continuous Testing – Cover all pipelines and avoid bottlenecks.
  5. Continuous Monitoring – Ensure full coverage of all pipelines to avoid bottlenecks.
  6. Continuous Delivery – It’s a non-stop practice to support ever-changing business.
  7. Elastic Infrastructure – Virtualized environment/Cloud

[Figure 2: DevOps7Pillars]

The sequence of steps in a DevOps process is called a pipeline. The names of the individual stages vary, but they are all similar to the ones in figure 2. They cover the key steps of the software development life cycle:

Design – Create – Merge – Build – Bind – Deliver – Deploy
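To make the idea of a pipeline concrete, here is a minimal Python sketch that runs the stages listed above in order and stops at the first failure. Only the stage names come from this post; the `run_pipeline` and `make_demo_pipeline` helpers and the pass/fail stage functions are hypothetical placeholders, not the API of any real CI/CD tool.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Stage names come from the list above; everything else here is hypothetical.
STAGES = ["Design", "Create", "Merge", "Build", "Bind", "Deliver", "Deploy"]

# A stage receives a shared context and reports success (True) or failure (False).
Stage = Callable[[Dict[str, object]], bool]


def run_pipeline(stages: List[Tuple[str, Stage]], context: Dict[str, object]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that completed successfully.
    """
    completed: List[str] = []
    for name, stage in stages:
        if not stage(context):
            break  # a failed stage blocks everything downstream
        completed.append(name)
    return completed


def make_demo_pipeline(failing_stage: Optional[str] = None) -> List[Tuple[str, Stage]]:
    """Build placeholder stages; the stage named `failing_stage` (if any) fails."""
    def make_stage(name: str) -> Stage:
        return lambda context: name != failing_stage
    return [(name, make_stage(name)) for name in STAGES]


if __name__ == "__main__":
    print(run_pipeline(make_demo_pipeline(), {}))
    print(run_pipeline(make_demo_pipeline(failing_stage="Build"), {}))
```

In a real setup each placeholder would be replaced by an actual job (a build, a test run, a deployment), but the ordering and the stop-on-failure behavior would stay the same.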

DevOps Trends

Before moving on to big data, I would like to point out a few trends in DevOps:

Agile Methodology – Rapid development of solutions. Agile development is closely related to DevOps. Collaboration and communication between the developers who build and test applications and the IT teams responsible for deploying and maintaining IT systems and operations make quick iterations of development and deployment possible.

Virtualized Infrastructure – Providing scalability and elasticity, with shared infrastructure resources that scale up or down as required. Deploying solutions in the cloud is the right direction.

Continuous Deployment – Continuously test solutions and continuously improve them. Let me emphasize this one more time: we need to continuously provide upgrades to support the ever-changing business.

Big Data and Consulting

A Big Data consultant should always coordinate with DevOps, whether the client is building a big data team, establishing big data infrastructure, or working on a big data analytics or development project.

  • Building teams – Skill sets review, talent acquisition, and training.
  • Big data infrastructure – Set up infrastructure in the cloud or on premises. Install Hadoop, NoSQL databases, and third-party tools and platforms.
  • Big data project – As consultants, we should never keep ownership of projects or code. We need to be able to transfer code, knowledge and support to the client.

Depending on the function of the group being consulted, whether business or IT, the recommendations can be at a strategic level or at an operational level. Either way, the consultant should bridge the gaps between business and IT.

They should also bridge the gaps between the IT teams who are the gatekeepers of the data and the data scientists and data analysts who need the infrastructure to run analytics.

A key to success for Big Data DevOps is the operationalization of predictive models to achieve continuous analytics.

DevOps for Big Data


Basically, DevOps for Big Data can be divided into three categories: data infrastructure, data engineering and data analytics.

DevOps for Data Infrastructure – Provisioning data nodes, deploying clusters, installing tools, and applying security policies.

Big data technologies such as Hadoop and Spark are becoming more mature and more popular. Maintaining a group of seasoned developers and architects who understand how the technology is implemented and have worked with the open source versions can help keep an edge and guide the team in the right direction.

With virtual infrastructure becoming more and more popular, it’s almost a must-have to be able to work on elastic cluster provisioning, monitoring and auto scaling in the cloud.
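As an illustration of what an auto scaling decision can look like, here is a minimal Python sketch of a simple proportional rule: scale the node count by the ratio of observed to target utilization, then clamp it to configured bounds. The function name, the parameters, and the default values are all assumptions made for this example; they are not taken from any particular cloud provider or cluster manager.

```python
import math


def desired_node_count(
    current_nodes: int,
    cpu_utilization: float,
    target_utilization: float = 0.5,  # assumed target, tune per workload
    min_nodes: int = 3,               # assumed lower bound for the cluster
    max_nodes: int = 20,              # assumed upper bound (budget cap)
) -> int:
    """Return how many nodes the cluster should have.

    Proportional rule: if nodes are twice as busy as the target,
    ask for twice as many nodes, within [min_nodes, max_nodes].
    """
    if current_nodes < 1:
        raise ValueError("current_nodes must be at least 1")
    if target_utilization <= 0:
        raise ValueError("target_utilization must be positive")
    raw = math.ceil(current_nodes * cpu_utilization / target_utilization)
    return max(min_nodes, min(max_nodes, raw))


if __name__ == "__main__":
    # Four nodes running at 75% CPU with a 50% target.
    print(desired_node_count(current_nodes=4, cpu_utilization=0.75))
```

A production setup would typically also smooth the utilization metric over a time window and add a cool-down period between scaling actions, so that the cluster does not oscillate.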

DevOps for Data Engineering – Defining data structures, building ETL, creating APIs, and providing a data-platform-as-a-service to support data scientists.

We need to consider the following when planning DevOps for data engineering.

  1. As with any other DevOps project, a signed-off project plan and design document need to be obtained, so that we have a clear scope for the initiative and can provide better estimates. (For big data projects, your client will think really big.)
  2. Is it truly a big data project? Do not make it the goal to re-create an RDBMS in Hadoop. (The ROI is way too small.)
  3. It is common practice to create a data warehouse in Hadoop, and the money saved by not adding more Oracle and Teradata servers can be used to set up the Big Data infrastructure. The ultimate goal of introducing a big data environment, however, is to support data analytics.
  4. We can recommend products or help build solutions, but in the end, the client needs to be able to achieve self-service.

DevOps for Data Analytics – Building models and turning prototypes into operational solutions.

Data science is also development. Data scientists need to write code and test results in order to find a solution, and that solution needs to be operationalized.


We have discussed the relationship between DevOps and Big Data in this blog. How big data will change the landscape of DevOps is another interesting topic, which we will talk about next time.