Over on the Microsoft.Public.SqlServer.XML newsgroup, one Chris Kilmer asked a good question about how to shred a single XML document into multiple tables using multiple stored procedures. Chris's goal with this was to have each stored procedure update one table and pass the XML remaining nodes of the first document off to the next stored procedure. This is actually fairly easy except for one thing: maintaining referential activity between inserted elements.
The XML involved is fairly simple:
<Invoice>
<InvoiceID>1</InvoiceID>
<Customer>
<FName>Bill</FName>
<LName>Gates</LName>
<Address>
<Street>1 Microsoft Way</Street>
<City>Redmond</City>
<Zip>98052</Zip>
</Address>
</Customer>
</Invoice>
We want to shred this data using SQL Server 2005's XML datatype's Nodes method when and where we can. The database schema has three tables scripted as such.
create table dbo.invoices(pkid tinyint identity(10,1) primary key,InvoiceID int)
create table dbo.customers(pkid tinyint identity(20,1) primary key,invoicePKID tinyint constraint fkCustomersInvoices foreign key references dbo.invoices(pkid) on delete cascade on update cascade,fname varchar(20),lname varchar(20))
create table dbo.addresses(pkid tinyint identity(30,1) primary key,customerPKID tinyint constraint fkAddressesCustomers foreign key references dbo.customers(pkid) on delete cascade on update cascade,street varchar(50),locale varchar(50),postalCode varchar(25))
So no big deal with that. While we could argue that there's better designs, this one works for now. Our first stored procedure really needs to do three things:
- Extract the InvoiceID and insert that info the dbo.invoices table
- Get the identity value for the just inserted row as we'll use that we'll need that when we insert the customer's detail into dbo.customers
- We need to get the Customer element as a fresh XML document to pass on the next stored procedure
Here's the code for a procedure that does that.
create procedure dbo.shred1(@in xml)
as
begin
begin try
begin tran
declare @id tinyint
declare @c table(pkid tinyint)
declare @cust xml
-- Get the invoiceID from the XML and insert that into the invoices table,
-- capturing the PKID for row just inserted.
insert into dbo.invoices(invoiceid)
output inserted.pkid into @c
select c.cust.value('InvoiceID[1]','int')
from @in.nodes('/Invoice') as c(cust)
-- Get the Customer sub-element
select @cust = @in.query('/Invoice/Customer')
-- Get that PKID and insert it into the the Customer element
-- so we can pass that along to the next procedure.
select @id = pkid from @c
set @cust.modify('insert <invoiceID>{sql:variable("@id")}</invoiceID> into /Customer[1]')
exec dbo.shred2 @cust
commit
end try
begin catch
rollback
end catch
end
A few comments need to be made about this example as there's some interesting things being done.
- The inbound and outbound instances really should be bound to an XML Schema collection to make sure they valid. That's not shown here to save time, space and detailing of schema collection behavior.
- This method works great if the inbound instance (@in) contains a single invoice only. If it doesn't, things get more complicated in terms of getting the host row identity value into the outbound XML instance.
- We didn't have to embed the hosting row's identity value in the passed-down instance. We've done that here to keep in-line with Chris's request to pass XML instance between the procedures. However, a positive from this is that we could somewhat more easily resolve the issue above since the we could pass a single XML instance containing all of the customers.
- I wouldn't normally choose to use the .Nodes method to shred a single XML instance, however, Chris said that this is what he does so I'm showing that for the sake of consistency. Here there's no need to do that, per se, as we could just as easily and performantly pick element values out resorting to a .Nodes call.
- Although you can shred XML using what we're looking at here, once you get more than four or five XQuery method calls in a T-SQL Server, you're probably better to have put the instance in a table with a primary XML index on the column holding the instance. Otherwise, you're generating a Node Table for each method call which isn't cheap.
The real meat of Chris's question in the post was this: “However, I do not understand how to retreive the Customer node from the Invoice xml as a new xml doc and then send the new xml doc to the next sproc.“ Getting the Customer element is fairly easy -- declare a variable of type XML and then use a SET query calling the .Query method on the source instance to select the desired elements. Ala:
select @cust = @in.query('/Invoice/Customer')
Since we already have our desired instance as an XML instance, we can simply pass it to the next Stored Procedure as parameter thusly:
exec dbo.shred2 @cust
Our next stored procedure is similar. It takes in the customer's data, parses out their name dumps that into the dbo.customers table. The address portion of the data is extracted as passed along to the next procedure. Again, this has some of the same considerations in terms of performance as noted for the first procedure.
create procedure dbo.shred2(@cust xml)
as
begin
declare @c table(pkid tinyint);
declare @addr xml;
declare @id tinyint;
-- Shred out the customer information and
-- insert that into the customers table,
-- capturing the PKID.
insert into dbo.customers(fname,lname,invoicePKID)
output inserted.pkid into @c
select c.x.value('FName[1]','varchar(20)'),
c.x.value('LName[1]','varchar(20)'),
c.x.value('invoiceID[1]','tinyint')
from @cust.nodes('/Customer') as c(x);
-- Get the ID for that just inserted row and embed it into Address Data.
select @id = pkid from @c
select @addr = @cust.query('/Customer/Address')
set @addr.modify('insert <customerID>{sql:variable("@id")}</customerID> into /Address[1]')
-- Shred the Address data
exec dbo.shred3 @addr
end
go
Our last stored procedure is pretty straight forward, and the same comments apply:
create procedure dbo.shred3(@addr xml)
as
begin
insert into dbo.addresses(street,locale,postalCode,customerPKID)
select a.f.value('Street[1]','varchar(50)'),
a.f.value('City[1]','varchar(50)'),
a.f.value('Zip[1]','varchar(25)'),
a.f.value('customerID[1]','tinyint')
from @addr.nodes('/Address') as a(f)
end
go
The only real pain here was figuring a good way to capture the identity value for the just inserted row. Sure, saving off SCOPE_IDENTITY() is option, but this example also demonstrates that can get the identity using the new OUTPUT clause. More, SCOPE_IDENTITY wouldn't have worked well enough if there had been multiple Invovices serialized into the first Stored Procedure's input XML as it returns the last identity generated, not all of the identities generated.