Introduction

The client required an observability platform covering monitoring, logging, tracing, and visualization for a Big Data cluster and Kubernetes with ML and deep learning, where they can monitor, analyze, and observe the cluster and data pipelines in real time and receive alerts when any critical activity occurs in the cluster.

The client also needs utilization and access reports for the cluster, based on HDFS quota and the number of files, as well as directory access reports (maximum and minimum usage over a time span).

Business Challenge

Defining the four pillars of observability:

  • Monitoring
  • Alerting/visualization
  • Distributed systems Tracing Infrastructure
  • Log aggregation/analytics

Technology Requirements

  • Building a reactive platform for Big Data analytics using Apache Flink and Scala
  • Microservices architecture on Kubernetes, with tracing and monitoring
  • Log aggregation on object storage
  • An alerting system that processes alerts directly
  • An analytics platform that can detect data anomalies and aggregate logs (showing all related log files in one place) for any given period, saving the development team time and effort

Defining the KPIs

KPIs, or Key Performance Indicators, are essential metrics about your system.

Some commonly used KPIs are:

  • Number of Users
  • Requests Per Second
  • Response Time
  • Latency
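As a rough illustration of how the KPIs above could be derived from raw request records, here is a minimal Python sketch. The `Request` record, its field names, and the `summarize` helper are hypothetical names introduced for this example; they are not part of the platform described here, and the 95th percentile uses the simple nearest-rank method.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class Request:
    """Hypothetical request record; field names are assumptions."""
    user: str           # identifier of the calling user
    response_ms: float  # time taken to serve the request, in milliseconds


def summarize(requests, window_seconds):
    """Compute a few KPIs over one observation window."""
    if not requests:
        raise ValueError("at least one request is required")
    durations = sorted(r.response_ms for r in requests)
    # Nearest-rank 95th percentile: smallest value with >= 95% of samples at or below it.
    rank = max(1, math.ceil(0.95 * len(durations)))
    return {
        "users": len({r.user for r in requests}),
        "requests_per_second": len(requests) / window_seconds,
        "mean_response_ms": mean(durations),
        "p95_response_ms": durations[rank - 1],
    }


sample = [
    Request("a", 100.0),
    Request("b", 200.0),
    Request("a", 300.0),
    Request("c", 400.0),
]
kpis = summarize(sample, window_seconds=2)
```

In a streaming setup the same aggregation would typically run per time window rather than over an in-memory list.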

According to SRE practice, the focus should be on:

  • Number of errors
  • Mean Time to Detect and Mean Time to Restore (MTTD/MTTR)
  • Application Performance Index (Apdex)

Where:

  • MTTD: Mean Time To Detect
  • MTTR: Mean Time To Restore
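For the Apdex score mentioned above, the standard definition counts responses at or below a target threshold T as satisfied, those between T and 4T as tolerating, and the rest as frustrated; the score is (satisfied + tolerating / 2) / total. A minimal Python sketch follows. The function name `apdex` and the sample values are illustrative assumptions, not taken from the platform itself.

```python
def apdex(response_times, threshold):
    """Return the Apdex score for response times and a target threshold T.

    Both arguments must use the same time unit (seconds in this example).
    """
    if not response_times:
        raise ValueError("at least one sample is required")
    # Satisfied: response time <= T
    satisfied = sum(1 for t in response_times if t <= threshold)
    # Tolerating: T < response time <= 4T
    tolerating = sum(1 for t in response_times if threshold < t <= 4 * threshold)
    # Frustrated samples (> 4T) count only toward the total.
    return (satisfied + tolerating / 2) / len(response_times)


# Two satisfied, one tolerating, one frustrated sample with T = 0.5 s
score = apdex([0.2, 0.4, 1.0, 3.0], threshold=0.5)
print(score)  # 0.625
```

A score of 1.0 means every sample was satisfied; the choice of T is application-specific.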

     

Calculating MTTD and MTTR involves recording three key event times:

  • Problem start time (start)
  • Problem detection time (detect)
  • Problem resolution time (resolve)
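The three event times above are enough to compute both metrics. The Python sketch below is one way to do it; the `Incident` record and the helper names are hypothetical. It assumes MTTD is the average of (detect - start) and MTTR is the average of (resolve - start). Some teams measure MTTR from detection instead, and the source does not specify which convention applies.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical incident record holding the three event times."""
    start: datetime    # problem start time
    detect: datetime   # problem detection time
    resolve: datetime  # problem resolution time


def _mean(deltas):
    """Average a non-empty list of timedeltas."""
    return sum(deltas, timedelta()) / len(deltas)


def mttd(incidents):
    """Mean Time To Detect: average of (detect - start)."""
    return _mean([i.detect - i.start for i in incidents])


def mttr(incidents):
    """Mean Time To Restore, assumed here to be measured from problem start."""
    return _mean([i.resolve - i.start for i in incidents])


incidents = [
    Incident(datetime(2020, 1, 1, 10, 0), datetime(2020, 1, 1, 10, 5), datetime(2020, 1, 1, 10, 35)),
    Incident(datetime(2020, 1, 2, 12, 0), datetime(2020, 1, 2, 12, 15), datetime(2020, 1, 2, 12, 45)),
]
print(mttd(incidents))  # 0:10:00
print(mttr(incidents))  # 0:40:00
```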

Defining Monitoring:

  • Observe and monitor the progress or state of something over a span of time
  • Keep under systematic review
  • Maintain constant surveillance over

Defining Observability:

  • Provide extremely granular insights into the performance of systems along with rich context
  • Provide clarity into implicit failure modes
  • Provide on-the-fly generation of the information required for debugging

Solution

To build an extensible platform for observability and monitoring of microservices, Kubernetes, and Big Data, we adopted the approach described by Cindy Sridharan. Building continuous security, compliance, and automation is a necessity for cloud-native applications, covering continuous integration, testing, deployment, and delivery, as well as the DevOps pipeline for enterprises.

Monitoring Levels

  • Infrastructure Monitoring
  • Data Pipeline Monitoring
  • Applications/Jobs Monitoring
