Azure Cosmos DB For Your Budget Conscious Projects: Creating a Repository to Store Multiple Entities in One Collection using the MongoDB API

I have been very excited about Cosmos DB since the day of its public preview. The idea of a schema-less, NoSQL document database, which is globally distributed and promises low latency at the 99th percentile is a dream come true for lots of developers. So I did the what anyone would do, installed the emulator and jumped right in. At first try, I created a new collection for each entity that I have in my project, these collections all reside in a single database. This seemed quite logical to me, since the documents inside each of these collections need a unique Id field. However, it was only when I moved everything to production that I realized how the pricing model for Cosmos DB works.

In case you haven’t tried it yet, let me explain how Cosmos DB is priced and why that can be a problem for small projects; Cosmos DB is billed for both storage and throughput, and when you provision a new Cosmos DB collection, you have to choose – as of writing – a minimum of 400 RUs/second (Request Unit per second) per collection. So if your project needs to store 4 different entities in 4 different collection, you will have to pay for at least 400 * 4 RUs/second, which can add up rather quickly. This can be a problem if you already have a project that is written with the MongoDB API, because chances are that you’re using multiple collections to logically and physically separate your entities in your database, and migrating your solution to Cosmos DB without any modifications can yield a very expensive bill. If you want to know more, this article explains the problem more in details.

In this article, I try to create a solution to store all of your different entities under one collection in Cosmos DB, without sacrificing the uniqueness of the ‘Id’ field of each entity and the querying benefits while keeping our business logic isolated and abstracted from the underlying infrastructure, so that it seems that each entity has its own collection. This also allowed me to migrate my project that used a collection per entity to one collection for all entities with very minimal changes.

Setup

In order to store all documents in a single collection while making sure that the Id field stays unique throughout the entire collection, I decided to prefix each entity with its table name and a delimiter. For example, if we have a table for users “User” and the Id for each user is their email address, the Ids of the entities in the collection will look like this:

  1. User|[email protected]
  2. User|[email protected]

However, this model poses an issue as we can’t filter on the ‘Id’ field without adding the prefix first, and in order to get the actual Id value, we have to remove the prefix. In small projects, this wouldn’t be a big issue, but as the project grows, the hassle of always adding and removing the prefix makes the code look ugly and hard to maintain, not to mention that it’s also very easy to create bugs.

We need to isolate this in a repository class that handles all basic database operations, and this is what we will be creating in this article.

The Repository

The complete source code for this project can be found here.

Start with installing the MongoDb driver Nuget package by running the following Nuget command in the Package Manager:
Install-Package MongoDB.Driver
Then we need to create a new interface for our generic documents, create a new interface and call it “IDbDocument”. The only thing we need in this interface is the Id property that all of our entities will share:

public interface IDbDocument
{
string Id { get; set; }
}

We then need to add the interface of our generic repository, create a new interface and call it “IRepository”, the repository will only have async methods and it will have methods for all basic CRUD operations:

public interface IRepository<T> where T : IDbDocument
{
Task<T> GetByIdAsync(string id);
Task<List<T>> GetAllAsync();
Task<List<T>> WhereAsync(Expression<Func<T, bool>> exp);
Task<T> InsertOrUpdateAsync(T newItem);
Task<bool> DeleteAsync(string id);
Task<bool> DeleteAsync(T item);
Task<long> DeleteAllAsync();
}

Restricting the generic type T to IDbDocument, which we created earlier allows us to use the ‘Id’ field in the repository.
Create a new class and let’s name it “MongoDbRepo”, this will be our concrete implementation of the interface. In order to execute operations on the database, we need an instance of “IMongoCollection”, so we’ll add a private readonly field of type “IMongoCollection” and initialize it in the constructor.

public class MongoDbRepo<T> : IRepository<T> where T : IDbDocument
{
private readonly IMongoCollection<T> collection;

public MongoDbCollection(string connectionString, string dbName)
{
//Create a MongoClient from the connectionString
var mongoClient = new MongoClient(connectionString);

//Use the database name for both the database name and collection name
//Get the database by name, or create one if it doesn't already exist
var mongoDatabase = mongoClient.GetDatabase(dbName);
//Get the collection by name, or create one if it doesn't already exist
collection = mongoDatabase.GetCollection<T>(dbName);
}
}

Create a private static read only string for the table prefix value to prefix the Id’s of the entities.

private static readonly string tablePrefix = typeof(T).Name + "|";

To implement the first method “GetByIdAsync”, we need to append the table prefix to the id parameter, and lastly we need to remove the prefix before returning the entity to the user. This way the business logic can be completely isolated and abstracted from the table prefix before the Id field, while keeping the query performance and uniqueness of the ‘Id’ field.

public async Task<T> GetByIdAsync(string id)
{
//First append the table name to the Id to find the corresponding entity
id = AppendTableNameToId(id);

//Find the item by Id
var result = await collection.FindAsync(a => a.Id == id);
var item = result.SingleOrDefault();

if (item != null)
{
//We have to remove the table name from the Id of the returned item to keep the abstraction.
item.Id = RemoveTableNameFromId(item.Id);
}

return item;
}

private string AppendTableNameToId(string id)
{
//Only append the table name if it doesn't already exist.
//In case the method is called twice.
if (!id.StartsWith(tablePrefix))
id = tablePrefix + id;

return id;
}

private string RemoveTableNameFromId(string id)
{
//Remove the table prefix if it exists.
if (id.StartsWith(tablePrefix))
id = id.Remove(0, tablePrefix.Length);

return id;
}

In order to be able to get all the entities that belong to the same table, we need to do a range scan on the string ‘Id’ field. In the example below:

  1. User|[email protected]
  2. User|[email protected]
  3. Company|Microsoft
  4. Company|Google

To get all the users, we need to check the Ids that are bigger or equal to “User|” and exactly smaller than “User}” (‘}’ is the character that directly follows ‘|’ in the ASCII table).
Create a new private static read only string for the upper bound of the table prefix value.

private static readonly string tablePrefixUpperBound =
tablePrefix.Remove(tablePrefix.Length - 1, 1) + (char)(tablePrefix[tablePrefix.Length - 1] + 1);

The rest of the methods’ implementation follow a similar logic, we need to always make sure to append the prefix to the Id before accessing the database and removing it before returning the entity to the user

public async Task<List<T>> GetAllAsync()
{
//Query all entities that are prefixed by the table name
var result = await collection.FindAsync(a => a.Id.CompareTo(tablePrefix) >= 0 && a.Id.CompareTo(tablePrefixUpperBound) < 0);

//We have to remove the table name from the Ids of the returned items to keep the abstraction.
result.ForEach(a => a.Id = RemoveTableNameFromId(a.Id));

return result;
}

public async Task<T> InsertOrUpdateAsync(T newItem)
{
//First append the table name to the Id.
newItem.Id = AppendTableNameToId(newItem.Id);

T item = await GetByIdAsync(newItem.Id);
//If there are no items in the collection with the same Id, insert the new item.
if (item == null)
{
await collection.InsertOneAsync(newItem);
//We have to remove the table name that we added from the Id of the returned item to keep the abstraction.
newItem.Id = RemoveTableNameFromId(newItem.Id);
return newItem;
}
//Otherwise, replace the current item.
else
{
await collection.ReplaceOneAsync(a => a.Id == newItem.Id, newItem);
//We have to remove the table name that we added from the Id of the returned item to keep the abstraction.
newItem.Id = RemoveTableNameFromId(newItem.Id);
return newItem;
}
}

public async Task<bool> DeleteAsync(string id)
{
//First append the table name to the key to find the corresponding entity
id = AppendTableNameToId(id);

//Deletes the entity that matches the expression
return await collection.DeleteOneAsync(a => a.Id == id);
}

public async Task<bool> DeleteAsync(T item)
{
//First append the table name to the Id to find the corresponding entity
var id = AppendTableNameToId(item.Id);

return await DeleteAsync(id);
}

public async Task<long> DeleteAllAsync()
{
//Deletes all the entities that are prefixed with the table name.
return await collection.DeleteManyAsync(a => a.Id.CompareTo(tablePrefix) >= 0 && a.Id.CompareTo(tablePrefixUpperBound) < 0);
}

Advanced Filtering

So far we’ve implemented almost all of the methods needed for the repository. We can now insert, update, get entities by Id, get all entities, delete all, delete by Id, etc. However, we still can’t create complex queries using filters on other fields than the Id field. The WhereAsync method that is declared in the interface does exactly that, you can pass any predicate to this method and it will return a list of items that match that predicate.

Task<List<T>> WhereAsync(Expression<Func<T, bool>> exp);

Since the predicate can include filters on any field including the Id, we need to be able to modify the predicate and add the table prefix before passing it to the MongoDb API. We also need to restrict the search to only scan the table entities instead of the whole document set that are stored in the collection.

Expression trees are immutable objects, which means that they cannot be modified directly. To change an expression tree, you must create a copy of an existing expression tree and make the required changes in the new copy.

We will first create a new class that inherits from “ExpressionVisitor” and name it “ModifyIdVisitor” to traverse the expression tree and add the table prefix whenever the ‘Id’ field is found on the left side of a binary expression or in a method call, e.g. Id == “value” or Id.StartsWith(“value”) respectively.

private class ModifyIdVisitor : ExpressionVisitor
{
/// <summary>
/// Will be called when the expression is a binary expression. E.g. Id == "value" or Id.CompareTo("value") > 0
/// </summary>
/// <param name="node"></param>
/// <returns></returns>
protected override Expression VisitBinary(BinaryExpression node)
{
MemberExpression propertyExpression = null;

//E.g. Id.CompareTo("value") > 0
if (node.Left is MethodCallExpression methodCallExpression)
{
propertyExpression = methodCallExpression.Object as MemberExpression;

}
//E.g. Id == "value"
else if (node.Left is MemberExpression)
{
propertyExpression = node.Left as MemberExpression;
}

//Checks if the member of the expression is the field 'Id'
if (propertyExpression?.Member?.Name == "Id")
{
//Pass the expression the IdValueUpdateVisitor to add the table prefix to the expression constant
var visitor = new IdValueUpdateVisitor();
node = visitor.Visit(node) as BinaryExpression;
}

return base.VisitBinary(node);
}

/// <summary>
/// Will be called when the expression is a method call (A method that returns a boolean in our particular case)
/// E.g. Id.StartsWith("value") or Id.Contains("value")
/// </summary>
/// <param name="node"></param>
/// <returns></returns>
protected override Expression VisitMethodCall(MethodCallExpression node)
{
//We only want to modify the expression value if the method is 'StartsWith',
//because the entities in the collection are all prefixed with the table name
if(node.Method.Name == "StartsWith")
{
var visitor = new IdValueUpdateVisitor();
node = visitor.Visit(node) as MethodCallExpression;
}

return base.VisitMethodCall(node);
}
}

/// <summary>
/// Adds the table prefix to the constant of the expression
/// </summary>
private class IdValueUpdateVisitor : ExpressionVisitor
{
protected override Expression VisitConstant(ConstantExpression node)
{
//Since the Id is always a string, we only want to change the constant of type 'string'.
if (node.Type == typeof(string))
{
//Appends the table prefix if it doesn't already exist.
var newConstant = node.Value.ToString();
if (!newConstant.StartsWith(tablePrefix))
newConstant = tablePrefix + newConstant;

//Creates a new constant expression with the new value.
node = Expression.Constant(newConstant);
}

return base.VisitConstant(node);
}
}

Then in order to restrict the query to only search and scan the entities that belong to the same table, we need to add a new expression to the old expression using a logical AND, but in order for the expressions to be added together correctly, they need to share the same parameter object. Therefore, we will create another class that inherits from “ExpressionVisitor” and name it “ParameterUpdateVisitor” to update the parameter of the old expression to the new parameter.

private class ParameterUpdateVisitor : ExpressionVisitor
{
private readonly ParameterExpression oldParameter;
private readonly ParameterExpression newParameter;

public ParameterUpdateVisitor(ParameterExpression oldParameter, ParameterExpression newParameter)
{
this.oldParameter = oldParameter;
this.newParameter = newParameter;
}

/// <summary>
/// Will be called for each parameter in the expression.
/// Will replace all old parameters with a new parameters
/// </summary>
/// <param name="node"></param>
/// <returns></returns>
protected override Expression VisitParameter(ParameterExpression node)
{
if (ReferenceEquals(node, oldParameter))
return newParameter;

return base.VisitParameter(node);
}
}

Then bring it all together in the following method:

private Expression<Func<T, bool>> AddGlobalTableFilters(Expression<Func<T, bool>> exp)
{
//Checks if the expression has queries over the Id field, and modifies them by adding the table prefix to the values.
var idModifierVisitor = new ModifyIdVisitor();
exp = idModifierVisitor.Visit(exp) as Expression<Func<T, bool>>;

//Creates a new expression that restrict the query to only run over the entities that belong to this table
Expression<Func<T, bool>> newExp = a => a.Id.CompareTo(tablePrefix) >= 0 && a.Id.CompareTo(tablePrefixUpperBound) < 0;

//Updates the new expression to match the parameter of the original expression, so that they can be added together later.
var visitor = new ParameterUpdateVisitor(newExp.Parameters.First(), exp.Parameters.First());
newExp = visitor.Visit(newExp) as Expression<Func<T, bool>>;

//Logical AND between the new expression and the old expression.
var binExp = Expression.And(newExp.Body, exp.Body);

//Returns the composit expression.
return Expression.Lambda<Func<T, bool>>(binExp, newExp.Parameters);
}

Lastly, we can now implement the WhereAsync method:

public async Task<List<T>> WhereAsync(Expression<Func<T, bool>> exp)
{
//Adds a filter to only query the entities that belong to this table.
//In case the user included some queries on the Id field of the entities, they will also be modified to include the table prefix.
//Since all entities from all tables live on the same collection, we don't want to run the query over the entire collection.
var newExp = AddGlobalTableFilters(exp);

var result = await collection.FindAsync(newExp);

//We have to remove the table name from the Ids of the returned items to keep the abstraction.
result.ForEach(a => a.Id = RemoveTableNameFromId(a.Id));

return result;
}

I hope that this article can help you with your next project or to migrate an existing project to take advantage of the features of Cosmos DB. There is still work to be done to make the advanced querying work for more cases, such as comparing the Id field to another field in the entity. Nonetheless, this is a blueprint of how to successfully add entities into the same collection without cluttering your business logic with the prefixes.

Thank you for reading, and feel free to let me know what you think or if you have any suggestions.