Persistent Storage and CSI Initiative for Docker and Kubernetes

Overview of Persistent Storage

Persistent storage is a critical requirement for running stateful containers. Kubernetes is an open-source system for automating the deployment and management of containerized applications.

There are many storage options available. In this blog, we discuss the most widely used solutions that can run on-premises or in the cloud: GlusterFS, CephFS, Ceph RBD, OpenEBS, NFS, GCE Persistent Disk, AWS EBS, and Azure Disk.


Prerequisites for Implementing Storage Solutions

To follow this guide, you need:

Kubernetes

Kubernetes is one of the best open-source orchestration platforms for deployment, autoscaling and management of containerized applications.

Kubectl

Kubectl is a command line utility used to manage Kubernetes clusters, either remotely or locally. Refer to the official kubectl documentation to configure it for your cluster.
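For example, once kubectl is configured you can confirm that it can reach the cluster:

# kubectl cluster-info

# kubectl get nodes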

Container Registry

A container registry is the repository where we store and distribute Docker images. There are several registries available online, including Docker Hub, Google Cloud, Microsoft Azure, and AWS Elastic Container Registry (ECR).
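For example, building an image and pushing it to Docker Hub looks like this (the repository name myorg/myapp is only a placeholder):

# docker build -t myorg/myapp:1.0 .

# docker push myorg/myapp:1.0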

Container storage is ephemeral: all data inside a container is lost when the container crashes or restarts. Persistent storage is therefore necessary for stateful applications like MySQL, Apache, and PostgreSQL, so that we don't lose our data when a container stops.

3 Types of Storage

Storage is generally consumed in three forms: file storage, block storage, and object storage. The solutions below expose one or more of these types to containers.

7 Emerging Storage Technologies

OpenEBS Container Storage

OpenEBS is a purely container-based storage platform for Kubernetes. With OpenEBS, persistent storage for stateful containers is easy to consume, and disk provisioning is automated.

It is a scalable storage solution which can run anywhere, from cloud to on-premises hardware.

Ceph Storage Cluster

Ceph is an advanced, scalable software-defined storage system that fits today's requirements well, providing object storage, block storage, and a file system on a single platform.

Ceph can also be used with Kubernetes: we can use either CephFS or Ceph RBD as persistent storage for Kubernetes pods.

GlusterFS Storage Cluster

GlusterFS is a scalable network file system suitable for cloud storage. Like Ceph, it is software-defined storage that runs on commodity hardware, but it provides only a file system, making it comparable to CephFS.

GlusterFS can deliver more speed than Ceph because it uses a larger block size: GlusterFS uses 128 KB blocks whereas Ceph uses 64 KB blocks.

AWS EBS Block Storage

Amazon EBS provides persistent block storage volumes that attach to EC2 instances. AWS offers several EBS volume types, so we can choose storage based on requirements such as the number of IOPS and the storage media (SSD/HDD).
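As a quick sketch with the AWS CLI (the size, IOPS, and Availability Zone below are only placeholders), a provisioned-IOPS volume can be created like this:

# aws ec2 create-volume --volume-type io1 --iops 1000 --size 100 --availability-zone us-east-1a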

We mount AWS EBS volumes into Kubernetes pods for persistent block storage using the awsElasticBlockStore volume type. EBS volumes are automatically replicated within their Availability Zone for durability and high availability.
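As a minimal sketch (the volume ID is hypothetical, and the volume must exist in the same Availability Zone as the node), a pod can mount an existing EBS volume directly with the in-tree awsElasticBlockStore volume type:

apiVersion: v1
kind: Pod
metadata:
  name: web-ebs-test
spec:
  containers:
    - name: web
      image: nginx
      volumeMounts:
        - mountPath: /usr/share/nginx/html
          name: ebs-vol
  volumes:
    - name: ebs-vol
      awsElasticBlockStore:
        volumeID: vol-0abcd1234ef567890   # hypothetical EBS volume ID
        fsType: ext4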

GCEPersistentDisk Storage

GCEPersistentDisk is durable, high-performance block storage on Google Cloud Platform. We can use it with Google Compute Engine or Google Kubernetes Engine (GKE).

We can choose between HDD and SSD disks and can grow a volume as the need increases. GCEPersistentDisk data is stored redundantly for durability and high availability.
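For example, an existing disk can be grown in place with gcloud (the disk name, size, and zone here are placeholders):

# gcloud compute disks resize my-data-disk --size=20GB --zone=us-east1-b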

We mount a GCEPersistentDisk into Kubernetes pods for persistent block storage using the gcePersistentDisk volume type, as shown in the deployment section below.

Azure Disk Storage

An Azure Disk is also durable, high-performance block storage, like AWS EBS and GCEPersistentDisk. It offers a choice of SSD or HDD, along with features such as point-in-time backups and easy migration.

An azureDisk volume is used to mount an Azure data disk into a pod. Azure Disks are replicated for high availability and durability.
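As a minimal sketch (the disk name and resource URI below are hypothetical placeholders), a pod can mount an existing managed disk directly with the azureDisk volume type:

apiVersion: v1
kind: Pod
metadata:
  name: web-azure-test
spec:
  containers:
    - name: web
      image: nginx
      volumeMounts:
        - mountPath: /usr/share/nginx/html
          name: azure-vol
  volumes:
    - name: azure-vol
      azureDisk:
        kind: Managed
        diskName: myDataDisk                # hypothetical managed disk name
        diskURI: /subscriptions/<subscription-id>/resourceGroups/<rg>/providers/Microsoft.Compute/disks/myDataDisk   # hypothetical resource URI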

Network File System Storage

NFS is one of the oldest network file systems, providing the ability to share a single file system over the network with multiple machines.

There are several NAS appliances available for high performance, or we can configure our own system to act as an NFS server. We use NFS for persistent pod storage when the data needs to be shared across multiple instances.
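As a minimal sketch (reusing the nfs.server1 export shown in the deployment section below), a pod can mount an NFS share directly with the nfs volume type:

apiVersion: v1
kind: Pod
metadata:
  name: web-nfs-test
spec:
  containers:
    - name: web
      image: nginx
      volumeMounts:
        - mountPath: /usr/share/nginx/html
          name: nfs-vol
  volumes:
    - name: nfs-vol
      nfs:
        server: nfs.server1   # NFS server used in the deployment example below
        path: /data/nfs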


Deployment of Storage Solutions

Now we will walk through deployments of the storage solutions described above, starting with Ceph.

Ceph Deployment

For Ceph we need an existing Ceph cluster, deployed either on bare metal or in Docker containers. We then install the Ceph client on each Kubernetes host.
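On Debian/Ubuntu Kubernetes hosts, for example, the client tools come from the ceph-common package (the package manager differs on other distributions):

# apt-get install -y ceph-common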

For Ceph RBD we create a dedicated pool and a client user restricted to that pool, then store the user key and the admin key as Kubernetes secrets so that pods and the RBD provisioner can authenticate against the cluster:

# ceph osd pool create kube 200 200

# ceph auth get-or-create client.kubep mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=kube' -o /etc/ceph/ceph.client.kube.keyring

# ceph --cluster ceph auth get-key client.kubep

# kubectl create secret generic ceph-secret-kube --type="kubernetes.io/rbd" --from-literal=key='AQBvPvNZwfPoIBAAN9EjWaou6S4iLVg/meA0YA==' --namespace=default

# sudo ceph --cluster ceph auth get-key client.admin

# kubectl create secret generic ceph-secret --type="kubernetes.io/rbd" --from-literal=key='AQAbM/NZAA0KHhAAdpCHwG62kE0zKGHnGybzgg==' --namespace=ceph-storage


apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rbd
provisioner: kubernetes.io/rbd
parameters:
  monitors: 192.168.122.110:6789
  adminId: admin
  adminSecretName: ceph-secret
  adminSecretNamespace: ceph-storage
  pool: kube
  userId: kubep
  userSecretName: ceph-secret-kube
  fsType: ext4
  imageFormat: "2"
  imageFeatures: "layering"

# kubectl create -f ceph-rbd-storage.yml --namespace=ceph-storage


apiVersion: v1
kind: PersistentVolume
metadata:
  name: apache-pv
  namespace: ceph-storage
spec:
  capacity:
    storage: 100Mi
  accessModes:
    - ReadWriteMany
  rbd:
    monitors:
      - 192.168.122.110:6789
    pool: kube
    image: myvol
    user: admin
    secretRef:
      name: ceph-secret
    fsType: ext4
    readOnly: false

# kubectl create -f ceph-pv.yml --namespace=ceph-storage


kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: apache-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 500Mi

# kubectl create -f ceph-pvc.yml --namespace=ceph-storage


apiVersion: v1
kind: ReplicationController
metadata:
  name: apache
spec:
  replicas: 1
  selector:
    app: apache
  template:
    metadata:
      name: apache
      labels:
        app: apache
    spec:
      containers:
        - name: apache
          image: bitnami/apache
          ports:
            - containerPort: 80
          volumeMounts:
            - mountPath: /var/www/html
              name: apache-vol
      volumes:
        - name: apache-vol
          persistentVolumeClaim:
            claimName: apache-pvc

# kubectl create -f apache-pod.yml --namespace=ceph-storage

To use CephFS instead, we create the data and metadata pools, create a filesystem on top of them, and store the Ceph admin key in a Kubernetes secret:

# ceph osd pool create cephfs_data 200

# ceph osd pool create cephfs_metadata 200

# ceph fs new newfs cephfs_metadata cephfs_data

# sudo ceph --cluster ceph auth get-key client.admin

# kubectl create secret generic ceph-secret --type="kubernetes.io/rbd" --from-literal=key='AQDkTeBZLDwlORAA6clp1vUBTGbaxaax/Mwpew==' --namespace=default


apiVersion: v1
kind: ReplicationController
metadata:
  name: apache
spec:
  replicas: 1
  selector:
    app: apache
  template:
    metadata:
      name: apache
      labels:
        app: apache
    spec:
      containers:
        - name: apache
          image: apache
          ports:
            - containerPort: 80
          volumeMounts:
            - mountPath: /var/www/html
              name: mypvc
      volumes:
        - name: mypvc
          cephfs:
            monitors:
              - 192.168.100.26:6789
            user: admin
            secretRef:
              name: ceph-secret


GlusterFS Deployment

We can deploy a GlusterFS cluster either on bare-metal servers or in containers using Heketi. After the deployment, we create a GlusterFS volume:

# mkdir -p /data/brick1/myvol

# gluster volume create myvol replica 2 node1:/data/brick1/myvol node2:/data/brick1/myvol

# gluster volume start myvol


kind: Endpoints
apiVersion: v1
metadata:
  name: glusterfs-cluster
subsets:
  - addresses:
      - ip: 10.240.106.152
    ports:
      - port: 1
  - addresses:
      - ip: 10.240.79.157
    ports:
      - port: 1

# kubectl create -f gluster-endpoint.yaml


kind: Service
apiVersion: v1
metadata:
  name: glusterfs-cluster
spec:
  ports:
  - port: 1

# kubectl create -f glusterfs-service.yaml


apiVersion: v1
kind: Pod
metadata:
  name: glusterfs
spec:
  containers:
  - name: glusterfs
    image: apache
    volumeMounts:
    - mountPath: "/var/www/html"
      name: glusterfsvol
  volumes:
  - name: glusterfsvol
    glusterfs:
      endpoints: glusterfs-cluster
      path: myvol
      readOnly: true

# kubectl create -f apache-pod.yaml


NFS Deployment

For an NFS server to be consumed by Kubernetes pods, we first create a persistent volume by adding the following content to nfs-pv.yaml:


apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv 
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany 
  persistentVolumeReclaimPolicy: Retain 
  nfs: 
    path: /data/nfs 
    server: nfs.server1
    readOnly: false

# kubectl create -f nfs-pv.yaml


apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi

# kubectl create -f nfs-pvc.yaml


apiVersion: v1
kind: ReplicationController
metadata:
  name: apache
spec:
  replicas: 1
  selector:
    app: apache
  template:
    metadata:
      name: apache
      labels:
        app: apache
    spec:
      containers:
        - name: apache
          image: apache
          ports:
            - containerPort: 80
          volumeMounts:
            - mountPath: /var/www/html
              name: nfs-vol
      volumes:
        - name: nfs-vol
          persistentVolumeClaim:
            claimName: nfs-pvc

# kubectl create -f apache-server.yaml

AWS EBS Deployment

To use Amazon EBS volumes in a Kubernetes pod, we first have to make sure that the cluster nodes are EC2 instances, that the cluster is configured with the AWS cloud provider, and that the nodes have IAM permissions to create and attach EBS volumes. Then we define a storage class:


kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: slow
provisioner: kubernetes.io/aws-ebs
parameters:
  type: io1
  zones: us-east-1d, us-east-1c
  iopsPerGB: "10"

# kubectl create -f aws-ebs-storage.yaml


kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: aws-ebs
  annotations:
    volume.beta.kubernetes.io/storage-class: slow
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

# kubectl create -f aws-ebs.yaml


apiVersion: v1
kind: ReplicationController
metadata:
  name: apache
spec:
  replicas: 1
  selector:
    app: apache
  template:
    metadata:
      name: apache
      labels:
        app: apache
    spec:
      containers:
        - name: apache
          image: apache
          ports:
            - containerPort: 80
          volumeMounts:
            - mountPath: /var/www/html
              name: aws-ebs-storage
      volumes:
        - name: aws-ebs-storage
          persistentVolumeClaim:
            claimName: aws-ebs

# kubectl create -f apache-web-ebs.yaml


Azure Disk Deployment


kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
  name: slow
provisioner: kubernetes.io/azure-disk
parameters:
  skuName: Standard_LRS
  location: eastus

# kubectl create -f sc-azure.yaml


kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: azure-disk-storage
  annotations:
    volume.beta.kubernetes.io/storage-class: slow
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

# kubectl create -f azure-pvc.yaml


apiVersion: v1
kind: ReplicationController
metadata:
  name: apache
spec:
  replicas: 1
  selector:
    app: apache
  template:
    metadata:
      name: apache
      labels:
        app: apache
    spec:
      containers:
        - name: apache
          image: apache
          ports:
            - containerPort: 80
          volumeMounts:
            - mountPath: /var/www/html
              name: azure-disk
      volumes:
        - name: azure-disk
          persistentVolumeClaim:
            claimName: azure-disk-storage

# kubectl create -f apache-web-azure.yaml


GCEPersistentDisk Deployment

To use a GCEPersistentDisk in a Kubernetes pod, we first have to make sure that the cluster nodes are Google Compute Engine VM instances and that a persistent disk (here gce-disk) exists in the same zone as the nodes.
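For example, the disk referenced below (gce-disk) can be created ahead of time with gcloud; the size and zone are placeholders and must match the zone of your nodes:

# gcloud compute disks create gce-disk --size=10GB --zone=us-east1-b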


kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
  name: fast
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd

# kubectl create -f gcepd-storage.yaml


apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: apache
spec:
  template:
    metadata:
      name: apache
      labels:
        app: apache
    spec:
      containers:
        - name: apache
          image: bitnami/apache
          ports:
            - containerPort: 80
          volumeMounts:
            - mountPath: /opt/bitnami/apache/htdocs
              name: apache-vol
      volumes:
        - name: apache-vol
          gcePersistentDisk:
            pdName: gce-disk
            fsType: ext4

# kubectl create -f apache-gce.yaml


OpenEBS Deployment

To deploy OpenEBS, we apply the openebs-operator manifest and then the default storage classes:

# kubectl create -f https://raw.githubusercontent.com/openebs/openebs/master/k8s/openebs-operator.yaml

# kubectl create -f https://raw.githubusercontent.com/openebs/openebs/master/k8s/openebs-storageclasses.yaml


apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: jupyter-server
  namespace: default
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: jupyter-server
    spec:
      containers:
        - name: jupyter-server
          imagePullPolicy: Always
          image: satyamz/docker-jupyter:v0.4
          ports:
            - containerPort: 8888
          env:
            - name: GIT_REPO
              value: https://github.com/vharsh/plot-demo.git
          volumeMounts:
            - name: data-vol
              mountPath: /mnt/data
      volumes:
        - name: data-vol
          persistentVolumeClaim:
            claimName: jupyter-data-vol-claim
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: jupyter-data-vol-claim
spec:
  storageClassName: openebs-jupyter
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5G
---
apiVersion: v1
kind: Service
metadata:
  name: jupyter-service
spec:
  ports:
    - name: ui
      port: 8888
      nodePort: 32424
      protocol: TCP
  selector:
    name: jupyter-server
  sessionAffinity: None
  type: NodePort


# kubectl create -f demo-openebs-jupyter.yaml

This creates the PVC and OpenEBS dynamically provisions the persistent volume for the Jupyter pod.
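We can verify that everything came up with kubectl:

# kubectl get pods

# kubectl get pvc

# kubectl get pv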


Features Comparison for Storage Solutions

 

| Storage Technology | ReadWriteOnce | ReadOnlyMany | ReadWriteMany | Deployed On | Internal Provisioner | Format | Provider | Capacity per Disk | Network Intensive |
|---|---|---|---|---|---|---|---|---|---|
| CephFS | Yes | Yes | Yes | On-Premises/Cloud | No | File | Ceph Cluster | Up to petabytes | Yes |
| Ceph RBD | Yes | Yes | No | On-Premises/Cloud | Yes | Block | Ceph Cluster | Up to petabytes | Yes |
| GlusterFS | Yes | Yes | Yes | On-Premises/Cloud | Yes | File | GlusterFS Cluster | Up to petabytes | Yes |
| AWS EBS | Yes | No | No | AWS | Yes | Block | AWS | 16 TB | No |
| Azure Disk | Yes | No | No | Azure | Yes | Block | Azure | 4 TB | No |
| GCE Persistent Disk | Yes | No | No | Google Cloud | Yes | Block | Google Cloud | 64 TB (4 TB for local SSD) | No |
| OpenEBS | Yes | Yes | No | On-Premises/Cloud | Yes | Block | OpenEBS Cluster | Depends on the underlying disk | Yes |
| NFS | Yes | Yes | Yes | On-Premises/Cloud | No | File | NFS Server | Depends on the shared disk | Yes |
 


Concluding Persistent Storage Solutions

Each storage solution described above offers different features, performance, and flexibility, so choose according to your requirements. Persistent storage is necessary for stateful workloads such as MySQL, PostgreSQL, and WordPress sites.

There are a lot of options available for persistent storage, and this is where we can help you make the right decision. Reach out to us with your requirements so that we can discuss them and help you out.


How Can Don Help You?

Our DevOps Consulting Services provide DevOps Assessment and Audit of your existing Infrastructure, Development Environment and Integration.

Our DevOps Professional Services include:

Cloud Infrastructure Solutions

Get Cloud Consulting Services, Cloud Infrastructure Services, Cloud Migration Solutions, Application Migration to Cloud and Cloud Management Services all under one roof. Don offers Cloud Infrastructure Solutions on leading Cloud Service Providers including Microsoft Azure, Google Cloud and AWS, as well as on container environments such as Docker and Kubernetes.

Enterprise Kubernetes Services

Make your Cloud Native Transformation with Kubernetes. Unify your Container Management Solutions into a Kubernetes Solution with support for Multi-Cloud Environments including AWS, Google Compute Engine, Google Kubernetes Engine, Microsoft Azure and more.

Enterprise Continuous Monitoring Solutions

Our DevOps Solutions enable visibility into the Continuous Delivery pipeline with Continuous Monitoring and Alerting for infrastructure, processes, applications and hosts, along with our product for Log Analytics.