PaaS databases available on Azure

Working with Azure means you can use on of databases it provides in PaaS model. If you use them you don’t have to install and configure i.e. SQL Server on some kind of (virtual or not) machine. Databases in PaaS model have a lot of other virtues but let’s not go there in this post, today I want to write about choices you can and should make when choosing which database solution you should use, especially if you’re confused with everything Azure has to offer.

First of all, I’ll completely ignore databases that are available on marketplace (RavenDB, Cassandra) or ones you could setup on Azure VMs (basically anything) and are not fully supported by Microsoft. I’ll also skip lately introduced PostresSQL and MySQL databases in PaaS model, I haven’t had time for playing with them and as far as I know, they are working same as normal installations of those.

Azure SQL

Azure SQL is and is not classic MS SQL in PaaS model. It is because it’s still good, old, relational DB with almost all functionalities you’ve seen in MS SQL Server so far. It isn’t because it does have some differences. I could and probably will write a post about those differences alone, generally speaking databases are a lot more independent beings in Azure SQL. For example, you can setup Azure SQL Firewall in the scope of the entire server or just single database so your IP number can have access to a single database but not to any other resource stored on the same server. You can’t use ‘USE’ in your T-SQL because cross-database queries are not possible now

I could and probably will write a post about those differences alone, generally speaking databases are a lot more independent beings in Azure SQL. For example, you can setup Azure SQL Firewall in the scope of the entire server or just single database so your IP number can have access to a single database but not to any other resource stored on the same server. You can’t use ‘USE’ in your T-SQL because cross-database queries are not possible now. Along with that, some DMVs are not accessible on Azure SQL at all. Instead, you should use their database scoped variants or some other DMV at all.

There were also some other changes, they didn’t just remove stuff we know. Most notable of them are in my opinion Elastic Pools (big topic for solo post), Geo-replication where you can set up some replicas in other regions automatically, wonderful out of the box backup where you can restore your database state from exact point in time (as long as it’s in timeframe supported for you Tier) and some other great features.

When should you use Azure SQL? In any scenario that requires using of MS SQL Server and/or migrating existing database solution from it. So whenever you need a mature solution with strict schema and normalised, relational data model – Azure SQL is a way to go.

Redis

Redis on Azure works just as the normal one. It’s key-value in-memory data storage which persists on underlying disk. In-memory part means it’s really fast.

When should you use Redis? Most common usage is any kind of distributed cache or session storage for some kind of web application. Because value can be of type other that string (hashes, sets, lists, queues etc.) it can be used as i.e. message queue.

Cosmos DB – General

Lately, MS introduced Cosmos DB which is a successor to NoSQL Document DB which offered only document database with it’s own API or alternatively MongoDB API (as far as I know it’s fully compatible with any existing Mongo code).

A lot was changed and now Cosmos DB is multi-model, non-relational data store which allows using differently structured data (document, key-value/column-oriented, graph). That means 3 (4 if we’ll separate key-value from column model which I would rather do not) different models in one product. Let’s start with general virtues of Cosmos DB and then let’s look at specific APIs.

First of all Cosmos DB is globally distributed storage and it can be scaled horizontally almost to infinity. More than that, by default every single property is indexed for querying. That means latency and throughput are both great.

Cosmos DB has five different consistency settings. Before that, we have had a choice between eventual consistency – where data could be different if we queried our DB from different regions at the same time and change wasn’t distributed through all nodes yet and strong consistency – where data would have guaranteed consistency at the price of performance. Soon I’ll write some more about Cosmos DB and then I’ll introduce new consistency options properly, now I’ll just mention them – Session (consistency guaranteed in scope of operations made by single user and eventual consistency in any other case), Bound Staleness (consistency is guaranteed in configured window – data will not be older than x operations and/or t seconds) and Consistent Prefix (very close to eventual consistency but data will never be out of order i.e. if you have bank account every financial operation will be shown in order it was made, nothing will be skipped even if state we’ll see is not entirely consistent).

Cosmos DB is also the only (at the moment) service with SLA that states availability at 99,99%, latency at 99th percentile, guaranteed consistency and level of throughput of your choosing. You can take a look at this SLA here.

When to use Cosmos DB? Anytime you need a database solution that can be scaled out and/or globally distributed with ease and you don’t need to normalise data or constraint it with any strict schema.

What about choosing one of the models and APIs? It’s really simple, you use key-value where you don’t need nested properties, collections etc. just simplest store possible. You use DocumentDB whenever your schema should be a little bit richer and JSON format is applicable while using this model you can also choose MongoDB model which could be useful if you’re already familiar with Mongo or just migrating some Mongo bases solution to the cloud.

Last model and API is Gremlin which is graph database. This model is ideal for scenarios where anything can be related to anything. The best example I’ve heard so far is LinkedIn when a user is some kind of unique entity, every place he worked at is unique entity too and every school he finished is entity too. That gives us a lot of unique entities and they could be connected in almost any way imaginable. If your data looks like that, graph databases (and in Cosmos DB it’s Gremlin) is the way to go.

Summary

I hope that if you’ve wondered about various data storage solutions that are available to you on Azure, you know them now. There are of course many more databases and you can install them on Azure VM (or even outside of Azure) and use them normally but in my opinion, Azure offers almost everything you could need for any case of a data model, at least if we exclude very specific ones with special needs.

I’m very interested in Document DB, especially in Gremlin which matches my pet-project Me 2.0 data model in probably one hundred percent. That means I’ll probably write more about it soon (and rewrite 75% of my app along with the way ;)).