NoSQL Databases - Overview, Types and Selection Criteria

NoSQL Database Explained

As the name implies, NoSQL (also called Not-only-SQL) databases let developers store and manage unstructured data and perform complex analytical operations on it.

A wide range of NoSQL databases is now available, and developers can choose among them according to their requirements. Companies and developers no longer need to stay confined to a single kind of database platform.


Key Characteristics of NoSQL Database

Application developers have long faced problems caused by the mismatch between the in-memory data structures of applications and the relational data structures of the database. With NoSQL databases, developers do not need to convert in-memory structures to relational structures. They also use the database as an integration point for the application.

Organizations are shifting to NoSQL databases to achieve higher scalability, higher speed, and continuous availability.

Features of NoSQL Database 

NoSQL databases can easily be spread across multiple data centers and cloud availability zones.

For example, Adobe runs DataStax Enterprise with an Apache Cassandra database cluster spread across two data centers, so that its customers can read and write data quickly no matter where they are located.

A NoSQL database like Cassandra offers a much more flexible data model that can easily store structured, semi-structured and unstructured data.


Moving From Relational Database to NoSQL Database

Many teams begin with NoSQL by creating a new application and starting from the ground up. For an application that was originally built on SQL, however, this raises the issue of an application rewrite.

Some choose to augment an existing application by adding a NoSQL component to it. This often happens with applications that have outgrown their RDBMS due to scaling issues, the need for better availability, or other issues. Part of the application continues to use the existing RDBMS, while other components are modified to use the NoSQL database.

Some systems simply prove too costly to keep on an RDBMS, or cannot support an increase in user concurrency. In these cases, a full replacement with a NoSQL database is done.


Requirements To Move From RDBMS To NoSQL Database

Performance and scalability - The two are at odds with each other; improving one can degrade the other. For performance, the question is how we execute the same set of requests, over the same set of data with -


Types of NoSQL Databases

NoSQL covers several types of data stores, each with a different way of storing data. Some of them are explained below:

Key-Value Database

Not all key-value databases are the same; there are significant differences between them.

For example, Memcached data is not persistent, while Riak data is. If we use Memcached to cache user preferences, all of the data is lost when the node goes down and must be refreshed from the source system.

If we use Riak, we may not need to worry about losing data, but we do need to think about how to update it. It is important not only to decide on a key-value database based on your requirements, but also to decide which key-value database to use.

Uses of Key-Value Store

Key/value stores are used when we have to access the following data -

For example - In a shopping mall, the information about each product is stored under a particular key. When a product is scanned, its barcode is used as the key to access all the information for that product.

This key/value database allows us to read and write values as follows -
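A minimal sketch of such reads and writes, assuming the usual get/put/delete interface of a key-value store, could look like the Python below. The class name KeyValueStore, its method names, and the barcode and product values are hypothetical; they illustrate the access pattern and are not the API of any particular product.

```python
from typing import Any, Dict, Optional


class KeyValueStore:
    """Toy in-memory key-value store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # Write (or overwrite) the value stored under the key.
        self._data[key] = value

    def get(self, key: str) -> Optional[Any]:
        # Read the value for the key; None if the key is absent.
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # Remove the key and report whether it existed.
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
# The barcode acts as the key; the product record is an opaque value.
store.put("8901234567890", {"name": "Green tea", "price": 4.50})
product = store.get("8901234567890")
```

The store never looks inside the value, which is why the only way to reach a record in this sketch is through its key.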


Document Database

Queries - Unlike key-value stores, where there is no way to query the data other than by key, document databases can query on the contents of a document. We can also perform range queries on the basis of a key.

Transactions - Most document databases support transactions at the level of a single document.

Schemaless - Schemaless means that no schema is required to store the data. Each document can differ in the number of fields it contains. Documents are typically stored in JSON or a JSON-like format.

Scaling up - In this kind of database, each document is independent of the others, and joins are not supported. This makes it easy to shard the data across multiple nodes that work independently of each other.
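The properties above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any specific document database works: the documents, the field names, and the helpers find and shard_for are all hypothetical.

```python
from typing import Any, Dict, List

# Each document is a self-contained, JSON-like record; fields may differ.
documents: Dict[str, Dict[str, Any]] = {
    "u1": {"name": "Joe", "city": "Paris", "tags": ["admin"]},
    "u2": {"name": "Jill", "email": "jill@example.com"},
    "u3": {"name": "Jack", "city": "Paris"},
}


def find(field: str, value: Any) -> List[str]:
    """Return the ids of documents whose `field` equals `value`."""
    return sorted(
        doc_id for doc_id, doc in documents.items() if doc.get(field) == value
    )


def shard_for(doc_id: str, shard_count: int) -> int:
    """Pick a shard from the document id alone; no joins are involved."""
    return sum(doc_id.encode("utf-8")) % shard_count


in_paris = find("city", "Paris")
```

Because a document carries everything needed to answer a query about it, the shard can be chosen from the id without consulting any other document.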

Column-family Database

Column-family databases store data in column families as rows, where each row has many columns associated with a row key. A column family is a group of related data that is usually accessed together.
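As a rough illustration of this layout, the row key, column family, and column can be modeled as nested dictionaries in Python. The row keys, family names, and the helper read_family are hypothetical and chosen only for this sketch.

```python
from typing import Dict

# row key -> column family -> column name -> value
Row = Dict[str, Dict[str, str]]

table: Dict[str, Row] = {
    "user:1": {
        "profile": {"first_name": "Joe", "last_name": "Smith"},
        "contact": {"phone": "555-1234"},
    },
    "user:2": {
        "profile": {"first_name": "Jack", "last_name": "Williams"},
        # Rows need not have the same columns in every family.
        "contact": {"phone": "555-5668", "email": "jwilliams@gmail.com"},
    },
}


def read_family(row_key: str, family: str) -> Dict[str, str]:
    """Fetch all the related columns of one family for one row."""
    return table.get(row_key, {}).get(family, {})


profile = read_family("user:1", "profile")
```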

Cassandra is faster and more scalable for write operations than other column-family databases because data is spread across the cluster.

4 Major Benefits of Column Family Database


Graph Database

Graph databases store data in the form of nodes and edges: entities are represented as nodes, and relationships are represented as edges. A node is an instance of an object in the application. Relationships, known as edges, can also have their own properties. Edges are directional, and the direction is significant in representing the relationship between the nodes.

Properties added to the edges of a graph database help us to query it. For example, a query might ask for the distance between two cities or the current age of a particular person. Such queries can be answered with the properties of edges. The distance between two cities can be found from the start node, the end node, and the distance property on the edge that links these two vertices.

The current age of a person can be found with the help of the birth date and the current date. So the age can be answered with an age property on the edge that links the two vertices containing the person's birth date and the current date.
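The distance example can be sketched as follows in Python. The node names, the ROAD_TO relationship, the distance_km property, and the distances themselves are made-up values for illustration; a real graph database would expose this through its own query language.

```python
from typing import Dict, List, Optional, Tuple

# Nodes are entities with their own properties.
nodes: Dict[str, Dict[str, str]] = {
    "city_a": {"label": "City"},
    "city_b": {"label": "City"},
    "city_c": {"label": "City"},
}

# Directed edges: (start node, relationship, end node, edge properties).
Edge = Tuple[str, str, str, Dict[str, float]]
edges: List[Edge] = [
    ("city_a", "ROAD_TO", "city_b", {"distance_km": 120.0}),
    ("city_b", "ROAD_TO", "city_c", {"distance_km": 80.0}),
]


def distance(start: str, end: str) -> Optional[float]:
    """Return the distance property of the edge from start to end, if any."""
    for src, rel, dst, props in edges:
        if src == start and dst == end and rel == "ROAD_TO":
            return props["distance_km"]
    return None


def neighbors(node: str) -> List[str]:
    """Follow outgoing edges only, since edges are directional."""
    return [dst for src, _rel, dst, _props in edges if src == node]
```

In this sketch, distance("city_b", "city_a") finds nothing because only the edge from city_a to city_b was stored, which shows why the direction of an edge matters.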

Some of the Graph databases are mentioned here -


Choosing Right NoSQL Database

The first consideration is the type of data that you need to store. NoSQL databases differ mainly in the data model that they use. The fit between the data model of the NoSQL solution and the target application can make or break the success of the project you are building.

The next question is how large the application is expected to grow and how much data scale support will be needed. Some NoSQL databases are memory based and do not scale across multiple machines, whereas others, like Cassandra, scale linearly across many machines.

Choosing Data Model


What is the CAP Theorem?

The CAP theorem concerns three properties of distributed systems: consistency (C), availability (A) and partition tolerance (P). It demonstrates that no distributed system can guarantee C, A, and P simultaneously.

Consistency - When data is stored on multiple nodes, all the nodes should see the same data. That is, when the data is updated at one node, the same update should also be made at the other nodes storing that data.

For example, a read operation returns the value of the most recent write operation, so all nodes return the same data.

A system is said to be consistent if a transaction starts with the system in a consistent state and ends with the system in a consistent state. In this model, a system can shift into an inconsistent state during a transaction, but the entire transaction is rolled back if there is an error at any stage in the process.

Availability - To achieve a high degree of availability, the system should remain operational all the time, so that we can get a response at any time. Whenever a user makes a request, the user should get a response regardless of the state of the system.

Partition tolerance - The system should continue to work despite message loss or partial failure. A partition-tolerant system can sustain any amount of network failure that does not result in a failure of the entire network.
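The tension between these three properties can be seen in a toy simulation. The Python below is only a sketch under simplifying assumptions: two replicas, a partition flag that cuts replication between them, and a mode switch that decides whether writes are refused (CP) or accepted locally (AP). The class names Replica and TwoNodeStore are hypothetical.

```python
class Replica:
    def __init__(self) -> None:
        self.value = None


class TwoNodeStore:
    """Two replicas and a switchable network partition (toy model)."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False

    def write(self, value) -> bool:
        """Write through replica a; return True if the write was accepted."""
        if not self.partitioned:
            self.a.value = self.b.value = value  # normal replication
            return True
        if self.mode == "CP":
            return False  # refuse the write: consistent but not available
        self.a.value = value  # AP: accept locally, replica b is now stale
        return True

    def consistent(self) -> bool:
        return self.a.value == self.b.value


cp, ap = TwoNodeStore("CP"), TwoNodeStore("AP")
for node_store in (cp, ap):
    node_store.write("v1")
    node_store.partitioned = True
cp_accepted = cp.write("v2")
ap_accepted = ap.write("v2")
```

During the partition, the CP store rejects the second write and both replicas still hold the same value, while the AP store accepts it and its replicas diverge until the partition heals.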

Storage systems that fall under CP (consistency with partition tolerance) include MongoDB, Redis, AppFabric Caching and MemcacheDB.

Databases that provide partition tolerance are those that store their data on multiple nodes.

Relational data models are required to follow the ACID (Atomicity, Consistency, Isolation, Durability) properties. With NoSQL databases, however, it is not possible for a data store to provide all of C, A and P.

NoSQL data stores fall under one of the following categories, since it is not possible to provide all three properties -

Consistent, Available (CA) Systems have trouble with partitions. Examples of CA systems include -

Traditional RDBMSs like Postgres, MySQL etc (relational)

Consistent, Partition-Tolerant (CP) Systems have trouble with availability while keeping data consistent across partitioned nodes. Examples of CP systems include -

Available, Partition-Tolerant (AP) Systems achieve “eventual consistency” through replication and verification. Examples of AP systems include -


Column Family vs Column Store Database

Bigtable, HBase, Hypertable, and Cassandra are described as column-store databases because of their ability to store and access column families separately. This makes them appear to be in the same category as column stores like Sybase IQ, C-Store, Vertica, Vectorwise, MonetDB, ParAccel and Infobright, which are also able to access columns separately.

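Column store databases are categorized into two groups: Group A (Bigtable, HBase, Hypertable and Cassandra) and Group B (Sybase IQ, C-Store, Vertica, Vectorwise, MonetDB, ParAccel and Infobright).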

Group A and Group B differ on the following parameters -

Group A uses a multi-dimensional map, in which a row name, a column name and a timestamp are sufficient to uniquely map to a value in the database. Group A does not use a relational data model.
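The multi-dimensional map can be pictured as a dictionary keyed by a (row name, column name, timestamp) tuple. The Python below is a simplified sketch of that idea, not the storage format of any Group A system; the helper names put and get_latest, the column name, and the sample values are hypothetical.

```python
from typing import Dict, Optional, Tuple

# (row name, column name, timestamp) -> value
Cell = Tuple[str, str, int]
cells: Dict[Cell, str] = {}


def put(row: str, column: str, timestamp: int, value: str) -> None:
    """Store one version of a value; older versions are kept."""
    cells[(row, column, timestamp)] = value


def get_latest(row: str, column: str) -> Optional[str]:
    """Return the value with the highest timestamp for (row, column)."""
    versions = [
        (ts, value)
        for (r, c, ts), value in cells.items()
        if r == row and c == column
    ]
    if not versions:
        return None
    return max(versions, key=lambda version: version[0])[1]


put("user1", "contact:phone", 1, "555-1234")
put("user1", "contact:phone", 2, "555-9999")
latest = get_latest("user1", "contact:phone")
```

Including the timestamp in the key means an update adds a new entry instead of overwriting the old one.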

Group B databases use a traditional relational data model. Because Group A does not, many people end up saying that column store databases are not relational.

Databases in Group A store parts of a data entity or row in separate column families and can access these column families separately. However, a column family may consist of many columns, and the columns within a column family are not independently accessible.

Databases in Group B store each column separately, so the columns are independently accessible.

Like Group B, Group A is useful for queries that access only a subset of table attributes. The only difference is that Group B stores each column separately, whereas Group A stores column families.

Interface - Group A is part of the NoSQL family and does not have a SQL interface. Group B supports standard SQL interfaces.

Group B is optimized for read-mostly analytical workloads. These systems have fast load times but poor performance for updates.

This suits data warehouses, which require complex queries made up mainly of read operations and rarely perform updates.

Databases in Group A can handle a much higher rate of updates.

Group B stores the data for the table in the above image in the following form -

(ID): 1, 2, 3, 4, 5, 6

(First Name): Joe, Jack, Jill, James, Jasmine, Justin

(Last Name): Smith, Williams, Davis, Miller, Wilson, Taylor

(Phone): 555-1234, 555-5668, 555-5432, NULL, 555-6527, 55-8247

(E-mail): jsmith@gmail.com, jwilliams@gmail.com, NULL, jmiller@gmail.com, NULL, jtaylor@gmail.com

Each value is stored by itself, without information about which row or column it came from. We can work out which row a value belongs to by counting the number of values above it in the same column.

For example, to find the last name for the ID whose value is 4, we check the fourth value in the last name column. It is therefore compulsory to fill empty fields with NULL values, to maintain the order and get the correct value.

So Group B takes much less storage space than Group A.

By storing just column values, without column names or row names, databases in Group B optimize performance by reading the data for each column and then applying aggregations to it.
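The positional layout described above can be sketched in Python with one list per column. The values are adapted from the listing above, with None standing in for NULL; the helper names last_name_for and count_non_null are hypothetical, and the code illustrates the idea only, not the internals of any Group B product.

```python
from typing import List, Optional

# One list per column; position i in every list belongs to the same row.
ids: List[int] = [1, 2, 3, 4, 5, 6]
last_names: List[str] = ["Smith", "Williams", "Davis", "Miller", "Wilson", "Taylor"]
emails: List[Optional[str]] = [
    "jsmith@gmail.com", "jwilliams@gmail.com", None,
    "jmiller@gmail.com", None, "jtaylor@gmail.com",
]


def last_name_for(row_id: int) -> str:
    """Find the row's position in the id column, then read the same position."""
    position = ids.index(row_id)
    return last_names[position]


def count_non_null(column: List[Optional[str]]) -> int:
    """Aggregate over a single column without touching any other column."""
    return sum(1 for value in column if value is not None)


name = last_name_for(4)
email_count = count_non_null(emails)
```

If the None placeholders were dropped from the e-mail column, every later value would shift up by one position and be attributed to the wrong row.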


NoSQL Challenges

Companies struggle with the mental switch from the relational to the NoSQL data model. Projects can be made or broken depending on whether the team has correctly modeled the data to make the most of the NoSQL database's capabilities. Database professionals therefore need to be trained in, and acquainted with, the data model of the NoSQL database they choose.

Some NoSQL databases use a master-slave architecture, which can scale read operations only to some extent, whereas a peer-to-peer architecture can scale both reads and writes.

NoSQL databases are still new, so most developers are still learning how to use them, although this situation will resolve over time. For now, it is easier to find an RDBMS expert than a NoSQL developer.

A 2012 article said that “NoSQL” equals no security. Its author cited the lack of security features in NoSQL databases.


Conclusion

NoSQL databases provide many benefits in terms of consistency, availability and partition tolerance. They also make it easy to store graph data, which SQL databases do not readily support. Despite that, there are some problems with NoSQL databases.

One drawback is that NoSQL databases are bound by the CAP theorem: a database can obtain only two of the three properties and has to give up the third. On the other hand, NoSQL databases let us store data in denormalized form.

