At Alteryx we're starting a project with some specific business needs. We need an extremely fast and scalable database, hence Mongo. But we also need to package our product for on-premise installations, which I hear means we also have to support certain SQL databases.
...I don't actually understand why enterprises insist on using SQL. I'm told that enterprise DBAs want control over everything, and they don't want to learn new products like MongoDB. To me, it seems that purchased third-party products would be exempt from DBA optimizations & other meddling. But I guess I wouldn't know what it takes to be an enterprise DBA, so I'll shut up about this now. Just my thoughts...
Since relational databases are very different from document-oriented databases, I decided to use NHibernate as an ORM, since its developers have already figured out a lot of the hard problems. I chose NHibernate over Entity Framework mainly because I already know NHibernate, and I know that it has good support across many databases. Nothing against EF in particular.
I've been working on this for a week or so. I've gotten pretty deep into the details, so I thought a blog post would be a good way to step back and think about what I've done and where I'm going. The design is mostly mine (of course, I stand on the shoulders of giants) and really just ties together robust frameworks.
Convention Based Object Model
In order to remain agnostic toward relational/document structure, I decided that there would have to be some basic assumptions or maxims. I like the idea of convention-based frameworks and I really think it's the best way to go about building this kind of infrastructure. Also, conventions are a great way to enforce assumptions and keep things simple.
IDs Are Platform Dependent
It's not something I really thought about before this. In relational databases we'll often use an integer as the object ID. They're nice because they're small, simple, and sequential. However, Mongo assumes that you want to be extremely distributed. Dense sequential IDs (like int identity) run into all kinds of race conditions and collisions in distributed environments (unless you choose a master ID-assigner, which kind of ruins the point of being distributed).
MongoDB uses a long (12-byte) semi-sequential identifier, the ObjectId. It's semi-sequential in that the value starts with a timestamp, so an ID generated later will generally sort after the IDs generated before it, but it isn't simply the previous ID +1. Regardless, it's impractical to use regular integers in Mongo and also a little impractical to use long semi-sequential numbers in SQL.
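To make "semi-sequential" a bit more concrete, here is a rough sketch of how a 12-byte ID of this shape can be put together: a 4-byte timestamp, 5 random bytes chosen once per process, and a 3-byte counter. This follows the layout in MongoDB's ObjectId documentation as I understand it (older drivers used a machine ID and process ID in the middle bytes instead). SketchObjectId is a made-up name for illustration; it is not the driver's implementation.

```csharp
using System;
using System.Security.Cryptography;
using System.Threading;

// Hypothetical illustration of the ObjectId layout, not the Mongo C# driver's code.
public static class SketchObjectId
{
    // 5 random bytes, chosen once per process.
    private static readonly byte[] ProcessRandom = CreateRandom(5);

    // The counter starts at a random value and is incremented for every new ID.
    private static int _counter = BitConverter.ToInt32(CreateRandom(4), 0);

    private static byte[] CreateRandom(int length)
    {
        var bytes = new byte[length];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(bytes);
        }
        return bytes;
    }

    public static byte[] Next(DateTimeOffset now)
    {
        var id = new byte[12];

        // Bytes 0-3: seconds since the Unix epoch, big-endian, so later IDs sort later.
        uint seconds = (uint)now.ToUnixTimeSeconds();
        id[0] = (byte)(seconds >> 24);
        id[1] = (byte)(seconds >> 16);
        id[2] = (byte)(seconds >> 8);
        id[3] = (byte)seconds;

        // Bytes 4-8: the per-process random value.
        Array.Copy(ProcessRandom, 0, id, 4, 5);

        // Bytes 9-11: the low 24 bits of the counter, big-endian.
        int count = Interlocked.Increment(ref _counter) & 0xFFFFFF;
        id[9] = (byte)(count >> 16);
        id[10] = (byte)(count >> 8);
        id[11] = (byte)count;

        return id;
    }
}
```

Because the timestamp comes first, no central ID-assigner is needed: two machines can generate IDs at the same moment and still (very probably) not collide, which is the property dense integer identities can't give you.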
As a result, I chose to use System.Object as the type for all identifiers. With some tweaking, NHibernate can be configured to treat these object-typed IDs as integers with native auto-increment. The Mongo C# driver also supports object-typed IDs with client-side assignment.
Ideally, I would like to write some sort of IdType struct that contains an enumeration and an object value (I'm thinking along the lines of a discriminated union here). This would make IDs more distinctive and make it easier to attach extension methods or additional APIs. I'd also like to make IDs protected by default (instead of public).
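Something along these lines is what I have in mind for the IdType idea. It's only a sketch: the IdKind enumeration, its members, and the factory method names are all made up for illustration, and the Mongo case is held as a plain hex string so the example doesn't depend on the driver.

```csharp
using System;

// Hypothetical: which kind of identifier the value holds.
public enum IdKind
{
    Integer,   // e.g. a SQL identity column
    ObjectId   // e.g. a 12-byte Mongo ID, kept here as 24 hex characters
}

// Hypothetical: a poor man's discriminated union of (kind, value).
public readonly struct IdType : IEquatable<IdType>
{
    public IdKind Kind { get; }
    public object Value { get; }

    private IdType(IdKind kind, object value)
    {
        Kind = kind;
        Value = value;
    }

    public static IdType FromInteger(int value)
    {
        return new IdType(IdKind.Integer, value);
    }

    public static IdType FromObjectId(string hex)
    {
        // Only the length is checked here; a real version would validate the hex digits too.
        if (hex == null || hex.Length != 24)
            throw new ArgumentException("Expected 24 hex characters (12 bytes).", nameof(hex));
        return new IdType(IdKind.ObjectId, hex.ToLowerInvariant());
    }

    // Two IDs are equal only if both the kind and the boxed value match.
    public bool Equals(IdType other)
    {
        return Kind == other.Kind && Equals(Value, other.Value);
    }

    public override bool Equals(object obj)
    {
        return obj is IdType other && Equals(other);
    }

    public override int GetHashCode()
    {
        return ((int)Kind * 397) ^ (Value == null ? 0 : Value.GetHashCode());
    }

    public override string ToString()
    {
        return Kind + ":" + Value;
    }
}
```

Having a dedicated type (rather than a bare object) is what would give extension methods something to hang on to, and the Kind tag lets the data layer tell an integer 42 apart from a Mongo ID without guessing from the runtime type.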
The Domain Object
I also created a root object for all persistent objects to derive from. This is a fairly common pattern, especially in frameworks where there is a lot of generic or meta-programming.
I had DomainObject implement an IDomainObject interface so that in all my meta-programming I can refer to IDomainObject. That way there shouldn't ever be a corner case where we can't or shouldn't descend from DomainObject but have to anyway (separate implementation from interface).
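Roughly, the object model looks like the sketch below. The names IDomainObject, DomainObject, User and Name are the ones used in this post, but the members shown (Email, First, Last) are placeholders made up for this example, and the real classes may carry more.

```csharp
// The interface that all the generic/meta-programming code refers to.
public interface IDomainObject
{
    // Typed as object so it can hold an integer in SQL or a Mongo ID.
    object Id { get; set; }
}

// The default root for persistent objects.
public abstract class DomainObject : IDomainObject
{
    // Members are virtual so NHibernate can generate proxies for lazy loading.
    public virtual object Id { get; set; }
}

// Placeholder members: a value object with no ID of its own. It could be mapped
// as a component in NHibernate or stored as a sub-document in Mongo.
public class Name
{
    public virtual string First { get; set; }
    public virtual string Last { get; set; }
}

// Placeholder members: a top-level persistent object.
public class User : DomainObject
{
    public virtual string Email { get; set; }
    public virtual Name Name { get; set; }
}
```

Note that Name does not descend from DomainObject here. That is an assumption of this sketch: it is treated as part of its owning User rather than as something with its own identity.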
The User and Name objects are simple, just as you'd expect from any NHibernate object model. The idea is to keep them simple and keep business and data logic elsewhere.
Are You Interested?
From what I can tell, we're breaking new ground on this project. It doesn't seem like too many people have tried to make a framework that supports both relational and document data stores. Initially I was hesitant to support both, but I think there are some excellent side effects that I will outline in upcoming posts.
What I've written about so far is only a small fraction of what it took to get this on its feet. Someone once said that you should open source (almost) everything. So, if you (or anyone you know) would like to see the full uncensored code for this, let me know so I can start corporate conversations in that direction.
Also, if at a later point you decide to refactor a sub-document into its own top-level document collection in Mongo, you have to add IDs to the new documents. I would consider this type of refactoring to usually be a performance tuning task (similar to creating indexes). So naturally it's a concern of the data layer, not the model or business logic.
The trouble with actually making it protected is that so many frameworks expect the ID to be exposed. That's probably because relational databases always expect you to have an ID, so many MVC frameworks are designed with that maxim. We're using WCF, so we might actually be able to get away from that concept.