Today's topic is object-relational mapping (ORM): what it is and why it is an essential part of software applications. In a later post I will talk specifically about the ORM framework I regularly use on my projects and consult on for my clients: LlblgenPro.
I don't claim to be an expert on the inner details and design complexity of ORM tools. I am writing as an ORM tool user, sharing my experience on the matter.
It's all about data
The vast majority of software applications deal with data, and that data is their core business value. Lose or corrupt it and you are out of business. In some cases data is an even more critical asset that must also be protected against theft and misuse by malicious people.
Data is relational
In the vast majority of cases data is stored in relational database systems. The first reason is historical: database systems have been developed over many years and have evolved into fast, scalable, reliable and widely used systems. For such a critical business asset this is not something that will change easily, even though object databases have emerged and are used in some cases. Secondly, the relational representation of data, along with the commonly used query language (SQL), is well suited to querying and consolidating large amounts of data in a concurrent multi-user mode, computing aggregate information, ensuring transactional operations and enforcing data integrity.
Code is object oriented
Object oriented development is now widely accepted and used for its well-known benefits, which I will not detail here. Through object oriented design and techniques, code becomes a close abstraction of real world and business domain concepts. Many design patterns have been identified that help solve common design problems and leverage the power of object oriented design to build robust, flexible solutions.
Impedance mismatch and data access layer
Now we can spot the challenge: there is a mismatch between application code, which is object oriented, and the data it needs to work on, which is represented and stored in a relational format. From a separation of concerns point of view, we want this mismatch to be encapsulated and hidden from both the database layer and the application code layer. It is indeed a best practice in software architecture to separate concerns and loosely couple the various aspects of a software project (more on this in a future post).
In our specific case we want an independent relational database layer that is created, maintained and optimized by a database specialist who does not need to be concerned at all with the way the data will be accessed and used. At the same time we want application developers to apply their creativity and knowledge to building flexible, robust and maintainable object oriented application code, without having to deal with the technical intricacies of converting relational data to objects and back.
This is where we introduce the data access layer, whose purpose is to encapsulate all the tricky details and logic needed to perform the bi-directional transformation between relational data and object oriented application data. From now on, I will refer to object oriented application data as business objects or entities.
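To make the idea concrete, here is a minimal sketch of such a layer. It is written in Python with the standard sqlite3 module rather than the .NET stack this post has in mind, and the customer table, the Customer entity and the CustomerDataAccess class are all hypothetical names invented for the illustration.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    """Business entity: what the application code works with."""
    id: int
    name: str
    email: str


class CustomerDataAccess:
    """Data access layer: the only place that knows about tables and SQL."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def load(self, customer_id: int) -> Optional[Customer]:
        # Relational -> object: one row becomes one entity.
        row = self.conn.execute(
            "SELECT id, name, email FROM customer WHERE id = ?",
            (customer_id,),
        ).fetchone()
        return Customer(*row) if row else None

    def save(self, customer: Customer) -> None:
        # Object -> relational: entity fields become column values.
        self.conn.execute(
            "UPDATE customer SET name = ?, email = ? WHERE id = ?",
            (customer.name, customer.email, customer.id),
        )
        self.conn.commit()


# Hypothetical schema and data, kept in memory just for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Alice', 'alice@example.com')")

dal = CustomerDataAccess(conn)
alice = dal.load(1)
alice.email = "alice@example.org"  # business code touches objects only
dal.save(alice)
```

The business code never sees a row or a SQL statement, and the database never sees an object; everything that knows about both sides lives in one class.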
OK, so can't I develop my own data access layer?
As developers, most of us have at some point written our own data access layer, taking care of communicating with the database, retrieving and exposing data to the business layers and performing the common CRUD (create/read/update/delete) actions against the database. After all, at first this does not seem like such a big deal, does it? But it very quickly becomes clear that writing proper, efficient and robust data access code is a huge amount of work. For instance, how about managing primary key / foreign key relationships? How about dealing with strongly typed data rather than simple raw data tables? What about server side pagination? What about complex retrieval of data across multiple tables?
The other fact is that data access code is highly technical plumbing code and a good candidate for being made generic. That justifies using a common set of tools, frameworks and runtime libraries to do the work for us, so that we can concentrate on the core business implementation. So here we are: after writing one's own data access layer on a real, reasonably sized project, one quickly concludes that leveraging an existing ORM tool makes sense!
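As an illustration of how quickly the plumbing grows, here is a sketch of a single hand-written query that follows a foreign key, returns typed objects and pages on the server side. It uses Python and sqlite3, and the customer and orders tables, the OrderSummary class and the fetch_orders_page function are assumptions made up for this example.

```python
import sqlite3
from dataclasses import dataclass
from typing import List


@dataclass
class OrderSummary:
    """Typed result instead of a raw row."""
    order_id: int
    customer_name: str
    total: float


def fetch_orders_page(conn: sqlite3.Connection, page: int, page_size: int) -> List[OrderSummary]:
    """Return one zero-based page of orders joined to their customer."""
    rows = conn.execute(
        """
        SELECT o.id, c.name, o.total
        FROM orders AS o
        JOIN customer AS c ON c.id = o.customer_id  -- follow the foreign key
        ORDER BY o.id                               -- stable order for paging
        LIMIT ? OFFSET ?                            -- paging done by the database
        """,
        (page_size, page * page_size),
    ).fetchall()
    return [OrderSummary(*row) for row in rows]


# Hypothetical schema and data, kept in memory just for the demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        total REAL NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES
        (1, 1, 10.0), (2, 2, 20.0), (3, 1, 30.0), (4, 2, 40.0), (5, 1, 50.0);
    """
)

second_page = fetch_orders_page(conn, page=1, page_size=2)
```

This is one query for one screen. Multiply it by every entity, every filter and every relationship in a real project and the amount of repetitive code becomes obvious.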
So tell me about ORM tools!
One important point to note is that a data access layer is not necessarily an ORM layer. In other words, a basic data access layer that only encapsulates the database connections, builds the database commands and fetches data that is then used by the business layer (probably as ADO.NET datasets) is not a proper ORM layer. The ORM layer goes one step further by providing (or connecting to, through a mapping configuration) a business entity model on top of the actual data access code. It will most probably also offer advanced features such as an object oriented query language (which can now be exposed through LINQ, Microsoft's Language Integrated Query extension to the .NET languages), web service and distributed application support, auditing, database schema change management, multi-database support, data-binding support, etc.
Using an ORM can therefore save an awful lot of work and give you, for free, many extra features that you can use to improve or extend your applications. The only remaining question is which ORM tool to select from the many that exist.
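The difference can be sketched in a few lines. The example below is a toy written in Python with sqlite3, not the API of any real ORM: the MAPPING dictionary, the find function and the t_cust table are all hypothetical. It only shows the principle of a mapping configuration sitting between an entity model and the data access code, with queries expressed in terms of entity attributes rather than columns.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Customer:
    """Entity model: attribute names chosen by the application developer."""
    id: int
    full_name: str


# Hypothetical mapping configuration: entity -> (table, {attribute: column}).
MAPPING = {
    Customer: ("t_cust", {"id": "c_id", "full_name": "c_nm"}),
}


def find(conn, entity_cls, **criteria):
    """Load entities of `entity_cls` whose attributes equal the given criteria."""
    table, columns = MAPPING[entity_cls]
    # Identifiers come from the trusted mapping; values are always parameters.
    select_list = ", ".join(columns.values())
    sql = f"SELECT {select_list} FROM {table}"
    if criteria:
        sql += " WHERE " + " AND ".join(f"{columns[attr]} = ?" for attr in criteria)
    rows = conn.execute(sql, tuple(criteria.values())).fetchall()
    attributes = list(columns)
    return [entity_cls(**dict(zip(attributes, row))) for row in rows]


# Hypothetical legacy-style schema, kept in memory just for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_cust (c_id INTEGER PRIMARY KEY, c_nm TEXT NOT NULL)")
conn.executemany("INSERT INTO t_cust VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])

bobs = find(conn, Customer, full_name="Bob")
```

The caller writes find(conn, Customer, full_name="Bob") and never mentions t_cust or c_nm. A real ORM tool does the same thing on a much larger scale, with relationships, change tracking and the other features listed above.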
Looking at what exists, ORM tools come in different flavours. You will find:
1. Simple generators of entity objects and data access code (which I don't consider fully fledged, powerful ORM tools). An example would be Code Smith templates.
2. An entity object layer code generator plus runtime libraries for data access. An example of such a tool is LlblgenPro.
3. A mapping configuration that connects an existing data store to an existing entity object layer, plus runtime libraries for data access. An example of such a tool is NHibernate.
Please note that an ORM tool can also be data centric, model centric or both. A data centric ORM tool needs an existing database schema and generates its entity business model from it; a model centric ORM tool is able to generate a database from an object entity model. A hybrid ORM tool is able to connect an existing entity object model to an existing database through a mapping configuration file.
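The data centric approach can be illustrated with a small sketch: read the schema, then derive the entity class from it. This is a toy in Python using sqlite3 and dataclasses, under the assumption of a hypothetical product table. The entity_from_table function and the type table are invented for the example and say nothing about how LlblgenPro or any other tool works internally.

```python
import sqlite3
from dataclasses import make_dataclass

# Assumed, deliberately tiny mapping from SQLite column types to Python types.
SQLITE_TO_PYTHON = {"INTEGER": int, "TEXT": str, "REAL": float}


def entity_from_table(conn: sqlite3.Connection, table: str) -> type:
    """Build an entity class whose fields mirror the columns of `table`."""
    # The table name is interpolated, so it must come from trusted code.
    columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    fields = [
        (name, SQLITE_TO_PYTHON.get(col_type.upper(), object))
        for _cid, name, col_type, *_rest in columns
    ]
    return make_dataclass(table.capitalize(), fields)


# Hypothetical schema and data, kept in memory just for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.execute("INSERT INTO product VALUES (1, 'Widget', 9.5)")

Product = entity_from_table(conn, "product")  # generated, not hand-written
widget = Product(*conn.execute("SELECT id, name, price FROM product").fetchone())
```

A model centric tool would go the other way, starting from a hand-written Product class and emitting the CREATE TABLE statement.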
Personally, I don't find the model centric approach and its generated databases relevant. The database should IMHO be designed separately. There are techniques for designing efficient, well performing databases, and I don't believe automatic generation of the schema is appropriate, except maybe for a quick proof of concept prototype or very low load applications. Designing an appropriate database schema depends too much on the application's business domain, on how the database will be used and on the load the application must support.
I personally find the data centric approach to be the best compromise for most applications. The constraint is that your business model should then be quite close to your data model, which I believe is the case in most business applications. It may not be the case for warehouse databases designed and optimized for reporting purposes, but there are other techniques than business application development to manage their data, and a more appropriate scenario would then be to use a specialized data reporting server application (like Microsoft SQL Server Reporting Services). Another case in which the entity data model might not fit the database schema is very high load databases whose schema has been deeply optimized for performance.
I can see two ways to overcome the constraint of keeping the entity model and the database schema close, and to allow some divergence between the models. One is to use a hybrid ORM tool that connects an entity object model to a database schema through a mapping configuration. What I don't like about this solution is that you have to write your own entity data model, where a data centric ORM tool could generate it for you automatically. The other option is to use a data centric ORM tool and add database views to the schema that encapsulate and hide the differences between your models. Of course you then have to handle updateable views, or use triggers to manage write operations on the views, but hey, that would be the subject of another post!
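Just to give a flavour of the view option, here is a minimal sketch run through Python's sqlite3 module. It uses SQLite syntax, where INSTEAD OF triggers make a view writable; other database systems have their own rules for updateable views. The t_cust table, the customer view and both triggers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Schema as the database specialist designed it.
    CREATE TABLE t_cust (
        c_id   INTEGER PRIMARY KEY,
        c_nm   TEXT NOT NULL,
        c_actv INTEGER NOT NULL DEFAULT 1
    );

    -- View shaped like the entity model we want the ORM tool to generate from.
    CREATE VIEW customer AS
        SELECT c_id AS id, c_nm AS name FROM t_cust WHERE c_actv = 1;

    -- Triggers redirect writes on the view to the underlying table.
    CREATE TRIGGER customer_insert INSTEAD OF INSERT ON customer
    BEGIN
        INSERT INTO t_cust (c_id, c_nm) VALUES (NEW.id, NEW.name);
    END;

    CREATE TRIGGER customer_update INSTEAD OF UPDATE ON customer
    BEGIN
        UPDATE t_cust SET c_nm = NEW.name WHERE c_id = OLD.id;
    END;
    """
)

# The data access code only ever talks to the view.
conn.execute("INSERT INTO customer (id, name) VALUES (?, ?)", (1, "Alice"))
conn.execute("UPDATE customer SET name = ? WHERE id = ?", ("Alice Smith", 1))

stored = conn.execute("SELECT c_id, c_nm, c_actv FROM t_cust").fetchall()
```

A data centric ORM tool pointed at the customer view would generate an entity with id and name, and would never need to know about the column names or the activity flag behind it.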
So now you know my preferences. Of course every project has its own specifics and constraints, and the choice of an ORM strategy must be reconsidered for every new project. But I always try to save myself as much development work as I can, so whenever possible I go for a data centric ORM, because it can generate both the entity object model and the actual data access layer for you, saving a huge development effort! My favorite ORM tool, which I will talk about in a future post, is LlblgenPro.