In this blog post I will show you how you can allow SQL Server to tap into Couchbase directly using regular SQL commands, and use Couchbase Buckets to share data between two or more SQL Server databases.
SQL Server is a powerful database server when it comes to relational data structures for cloud and enterprise applications, and Couchbase is becoming a leading platform for NoSQL distributed databases offering extreme scale and simple geo replication. Many application developers want the benefits of relational data storage and the scale and ease of use of a NoSQL data store. But without the right tools, it is very difficult to combine both technologies seamlessly without creating data silos. In addition, while many DBAs will manage SQL Server effectively, they are usually unable to assist with data integration projects that need to bring both databases together.
In some organizations, it is important to be able to share data between multiple SQL Server databases. This is usually done with built-in features, such as Replication, or Log Shipping for example. However these options can be cumbersome to implement and have specific limitations, such as the database versions being used by both SQL Server databases. More importantly, these features are not as effective when the two databases are geographically separated, when both databases are considered read/write, or when the two databases cannot connect to each other directly.
How It Works
In order to understand how Couchbase and SQL Server can work together, we must first explore the essential differences between the two systems. While both Couchbase and SQL Server are database servers, Couchbase is traditionally used as a document-centric platform storing files as JSON documents accessed using application code, while SQL Server stores data in a proprietary format that is accessed using the SQL language from applications or by analysts running reports.
One of the major differences between the two systems is related to the format of the data being stored. In Couchbase, data is stored as a JSON document structure, which can have properties, arrays and objects. This means that some JSON documents have depth in their data structure that is not easily represented using rows and columns. SQL Server on the other hand stores and services data in rows and columns, and as a result is not a friendly data store for JSON documents. As a result, in order to exchange data between the two database servers, a data transformation needs to take place, between JSON documents and rows and columns.
In addition to the storage format, Couchbase uses the REST protocol to service data; REST commands are sent as HTTP requests. This differs from SQL Server, which services data using a proprietary protocol called TDS (Tabular Data Stream). This difference in protocols means that SQL Server cannot natively connect directly to a Couchbase server.
Enzo™ Unified is designed to solve the data structure and protocol differences outlined above, providing SQL Server a bridge to communicate natively with Couchbase in real-time, using Linked Server (a feature of SQL Server allowing remote querying of other database systems). With Enzo Unified, SQL Server can connect to Couchbase, fetch data, and modify data inside Couchbase using the SQL language, as if Couchbase understood the TDS protocol and the SQL language of SQL Server.
From an installation standpoint, Enzo Unified is a Windows Service that understands the TDS protocol natively; as such it can be co-hosted with SQL Server on the same machine, or installed on a separate server altogether.
The ability to communicate to Couchbase using the SQL language directly from SQL Server creates unique opportunities to build real-time solutions with less complexity. Let’s take a look at an example on how useful this can be.
Let’s assume we have two SQL Server databases installed at two physical locations that cannot connect directly over the network: one in the USA and one in Europe; these two SQL Server databases hold website users that have previously registered, and we want SQL Server to keep its own copy of locally registered users: the US database will hold users from North America, and the European database will hold users from the old continent. This design provides good performance because each website has its dedicated database in the same geographic region.
However, when a new user registers, we want to make sure that the email address provided is unique across all users worldwide. Indeed, a new user should not be allowed to use an existing email address during the registration process. Instead, if a user from Europe logs into the US website, the website should prevent the creation of a new user.
In this example we will be using Couchbase to store a copy of the web user records centrally. Because Couchbase is a distributed database, it can be installed both in the USA and in Europe; its buckets can easily be replicated across geographic regions automatically with simple configuration settings (replication in Couchbase is suggested here to improve the performance or requests, but it not technically necessary). To simplify our diagram, we will show Couchbase as a single unit; however Couchbase is best installed with multiple instances for performance and availability. An instance of Enzo Unified is installed in each geographic location; Enzo Unified will serve as a real-time bridge between SQL Server and Couchbase, and present the Couchbase bucket as a table to SQL Server. Finally, let’s assume that Couchbase is installed in the cloud so that it can be accessed from both locations (see my blog post on how to install Couchbase in Windows Azure).
Accessing Couchbase From SQL Server
In order to share user data across multiple SQL Server databases, SQL Server must first be able to read and write to Couchbase. This is done through Enzo Unified, either by connecting directly to it using ADO.NET or SQL Server Management Studio (SSMS), or through SQL Server by registering Enzo Unified as a Linked Server. The default bucket from Couchbase is used to store web users: the id will contain the user email, and additional properties are saved (first name, last name, and age). As you can see below, Couchbase contains a single document for user email@example.com.
Let’s fetch this record from SSMS.
First, we need to register the schema of this document so that Enzo Unified can represent it as a database table (in Enzo, tables are not used to store data; they provide a bridge to underlying services, so they are called virtual tables). The following command (‘createclass’) creates a new virtual table called ‘webusers’; notice the list of columns we define for this virtual table (id, type, firstname, lastname and age). We will be using the column ‘id’ as the email address of the user, and the type property will be hard-coded as ‘user’.
exec couchbase.createclass 'webusers', 'default', 'string id,string type,string firstname,string lastname,int age'
Once the virtual table has been created, it can be accessed as a regular SQL table (called ‘webusers’), and through stored procedures (insertwebusers, deletewebusers,updatewebusers and selectwebusers).
Next, we will register Enzo Unified as a Linked Server to allow triggers, stored procedures, and views to access Couchbase directly. Generally speaking it is not required to create a Linked Server against Enzo Unified; a developer can connect directly to Enzo Unified using ADO.NET for example. However a Linked Server is necessary for SQL Server to access Couchbase natively (from triggers for example). Once the Linked Server has been registered, we can simply access the Couchbase documents with a simple SQL syntax; for example the following SQL command could be part of a trigger to check if the email address provided already exists, as such:
This record is accessible from both locations (USA and Europe). The performance of this call is primarily tied to the network bandwidth; very little information is actually transmitted over the network. This command ran in less than once second in the US with Couchbase installed in Windows Azure and SSMS running from my office.
When a new web user registers from either website, a trigger could simply call a stored procedure on Enzo Unified to insert a record in Couchbase. This stored procedure was created during the ‘createclass’ call made previously. An INSERT command on the ‘webusers’ table is also possible.
At this point, we have two users registered. Note that SQL Server can read and write to Couchbase, so it is possible to delete and update records just as easily.
Retrieving Data Using Couchbase Views
An interesting feature of Couchbase is the ability to create Views, which are a projection of a subset of documents that can be accessed for reading. Enzo Unified supports Couchbase views as well.
Then in SSMS, we simply declare the view as being available in Enzo Unified. After connecting directly to Enzo Unified, you declare the view as such:
exec couchbase.defineview 'emails', 'default', 'emails', 'emails'
The syntax of the ‘defineview’ command in Enzo is beyond the scope of this blog; this command essentially creates a new virtual table called ‘emails’ in Enzo, which we can now call directly. This virtual table is read-only (the associated insert,update and delete commands are not created). You may notice that we are not using a Linked Server in the following SQL call; this is a direct, native call to Enzo Unified from SSMS, so we do not need to use the registered Linked Server because the call is not made by SQL Server.
This blog introduces you to the fundamental differences between SQL Server and Couchbase, and how Enzo Unified for Couchbase can enhance applications by leveraging the two database platforms. Enzo Unified enables SQL Server to tap into the power of in-memory tables provided by Couchbase, and Couchbase buckets can be updated directly through database events to keep data synchronized. In this example, I showed you how data can be shared by two geographically separated SQL Server databases through native SQL commands to enable data validation without complex replication or synchronization jobs.
About Enzo Unified
Enzo Unified is a data platform that helps companies reduce development and data integration project timelines, improve data quality, and increase operational efficiency by solving some of the most complex real-time data consumption challenges. For more information, contact firstname.lastname@example.org, or visit http://www.enzounified.com.