Note: Cross posted from Coding The Document.
Our recent theme has been the things that need to be made more consistent across the Office products so that document generation is more efficient for developers. This time we will focus on the differences between the way text is marked up in Word and in PowerPoint.
I have found a number of subtle but important differences in the way each product writes text under the Open XML standard. Those differences carry straight through to the Open XML SDK's API.
These differences show up in features such as text color, bolding, underlining and bulleted lists. The main difference seems to be that the Word team took a more object-based approach, whereas the PowerPoint team favored attributes. The result is that the PowerPoint definition ends up with more moving parts. To illustrate this, let's look at setting the color of a group of text and bolding it.
Text color is handled in Word simply by applying a Color object to the run properties. PowerPoint requires a SolidFill object with child SchemeColor and LuminanceModulation objects to get the same effect.
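To make the comparison concrete, here is a minimal sketch that builds both run structures as raw XML with System.Xml.Linq rather than with the SDK classes. The SDK's Color, SolidFill, SchemeColor and LuminanceModulation correspond to w:color, a:solidFill, a:schemeClr and a:lumMod. The ColorMarkup class, its method names and the sample color values are hypothetical. The element names and namespace URIs are the standard WordprocessingML and DrawingML ones.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class ColorMarkup
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static readonly XNamespace A = "http://schemas.openxmlformats.org/drawingml/2006/main";

    // Word: a single w:color element (hex RGB in w:val) inside the run properties.
    public static XElement WordRun(string text, string hexColor) =>
        new XElement(W + "r",
            new XElement(W + "rPr",
                new XElement(W + "color", new XAttribute(W + "val", hexColor))),
            new XElement(W + "t", text));

    // PowerPoint (DrawingML): a:solidFill > a:schemeClr > a:lumMod.
    // lumMod is in thousandths of a percent, so 75000 means 75% luminance.
    public static XElement PowerPointRun(string text, string schemeColor, int lumMod) =>
        new XElement(A + "r",
            new XElement(A + "rPr", new XAttribute("lang", "en-US"),
                new XElement(A + "solidFill",
                    new XElement(A + "schemeClr", new XAttribute("val", schemeColor),
                        new XElement(A + "lumMod", new XAttribute("val", lumMod))))),
            new XElement(A + "t", text));

    // Number of elements nested under the run-properties element (rPr is the run's first child).
    public static int PropertyElementCount(XElement run) =>
        run.Elements().First().Descendants().Count();
}
```

For the same visual result, PropertyElementCount returns 1 for the Word run and 3 for the PowerPoint run. That is the "more moving parts" point expressed as a number.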
The difference in the way text is bolded is very similar. In Word the run properties are given a Bold object, but in PowerPoint bold is a boolean attribute of the run properties.
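Here is the same contrast for bold, again sketched as raw XML with hypothetical helper names. Word needs a w:b child element, while PowerPoint uses a b attribute on a:rPr, so even a simple "is this bold?" check has to be written twice. The on/off values accepted here follow my reading of the ST_OnOff type (Word) and xsd:boolean (DrawingML); treat those value lists as assumptions.

```csharp
using System.Xml.Linq;

public static class BoldMarkup
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static readonly XNamespace A = "http://schemas.openxmlformats.org/drawingml/2006/main";

    // Word: bold is a child element <w:b/> of <w:rPr>.
    public static XElement WordBoldRunProperties() =>
        new XElement(W + "rPr", new XElement(W + "b"));

    // PowerPoint: bold is the b="1" attribute on <a:rPr>.
    public static XElement PowerPointBoldRunProperties() =>
        new XElement(A + "rPr", new XAttribute("b", "1"));

    // Word: <w:b/> with no w:val means on; "0", "false" or "off" turn it off.
    public static bool IsWordBold(XElement rPr)
    {
        XElement b = rPr.Element(W + "b");
        if (b == null) return false;
        string val = (string)b.Attribute(W + "val");
        return val == null || !(val == "0" || val == "false" || val == "off");
    }

    // PowerPoint: the b attribute is a boolean, so "1" or "true" means bold.
    public static bool IsPowerPointBold(XElement rPr)
    {
        string val = (string)rPr.Attribute("b");
        return val == "1" || val == "true";
    }
}
```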
So what is the impact on the developer? Code reuse for those of us who have to generate both documents and presentations is next to nothing. On top of that, the learning curve is practically doubled.
I realize that the two products have evolved along separate paths with isolated teams, but the time for that type of development is long past. The Open XML standard should be unified wherever possible across the Office applications and allow for greater interaction between the products. Ultimately, synchronizing these tools will lead to greater adoption.
Word 2007 has two built-in methods for tagging content. If you go to the Developer tab, you will find the ribbon has a section for Controls and a section for XML. The Controls are also referred to as Content Controls. The XML section allows you to define schemas that can be applied to your document and is sometimes called Custom XML.
Both of these constructs can be used when you are coding an application that needs to identify a part of a document and take some action on it. Content Controls are represented by SdtBlock and SdtRun in the Open XML SDK 2, whereas Custom XML is represented by CustomXmlBlock and CustomXmlRun.
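In the underlying WordprocessingML, both constructs are plain elements. A Content Control is a w:sdt whose identifier lives in w:sdtPr/w:tag/@w:val. A Custom XML tag is a w:customXml whose name is held in the w:element attribute. The sketch below uses a hypothetical TagFinder helper built on System.Xml.Linq rather than the SDK classes, and locates each kind by name.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class TagFinder
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Content Controls (SdtBlock / SdtRun in the SDK), matched by their Tag property.
    public static IEnumerable<XElement> ContentControlsByTag(XContainer doc, string tag) =>
        doc.Descendants(W + "sdt")
           .Where(sdt => (string)sdt.Element(W + "sdtPr")?.Element(W + "tag")?.Attribute(W + "val") == tag);

    // Custom XML (CustomXmlBlock / CustomXmlRun in the SDK), matched by element name.
    public static IEnumerable<XElement> CustomXmlByElement(XContainer doc, string elementName) =>
        doc.Descendants(W + "customXml")
           .Where(cx => (string)cx.Attribute(W + "element") == elementName);
}
```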
Content Controls caused me a lot of confusion when I first investigated them, since there are multiple types. For my purposes only one really makes sense, and that is Rich Text. It can be applied to any area of a document regardless of content. Even with this capability, I found that Content Controls didn't fit my needs.
I realize that Microsoft views Content Controls as more reliable because they are not tied to a schema that defines a strict document, but that isn't how we use the schema. Our schemas are a list of custom identifiers that users can reliably apply to a document.
As an example, the name of a company may need to be inserted 100 times at arbitrary locations throughout a document. This doesn't fit a tightly structured schema, but it is also risky to have the person editing the template type an identifier by hand for each Content Control.
When processing a template, accurate naming is just as important as, if not more important than, identifying the type of object contained in a tag. Add to this the fact that you may need to identify sections by business usage, and those sections can contain multiple child objects such as charts, tables, images and paragraphs.
Between lawsuits and a change in approach by Microsoft, it seems that Custom XML is going away. Microsoft needs to supply more guidance on what approaches should be taken for marking up templates in a consistent fashion. They also need to come up with a solution for what I see as a major weakness of Content Controls for this type of usage.
Lastly, both features are only available in Word. We do more template work with PowerPoint than with Word, which puts our development at a disadvantage. People want this data to “present” and therefore need to be able to generate “presentations”. Getting similar functionality in PowerPoint is essential. Hopefully we will see these improvements going forward.
I have spent the last several months developing solutions with Office 2007 and the Office Open XML SDK 2. Our client has requirements that cross the suite, from PowerPoint presentations to Word documents. The Open XML standard, which defines the structure of these documents, is very powerful. My biggest frustration is the lack of consistent capabilities between the products.
Since we are doing document generation based on templates, it is very important that the code can consistently identify any part of a document, whether that is a section of text, a chart, a table or an image. While Word 2007 has Content Controls and Custom XML (2007 only) that can be used to mark up a document, similar features are not available in PowerPoint. This is a major issue for us, since the majority of our templated work is in PowerPoint.
For me, a key to a successful solution is that the markup be implemented consistently in all of the Office applications. It should also give an end user a way to add a tag to a document without the risk of mislabeling it through human error. This is one of the drawbacks of Content Controls. Another thing that makes CustomXml more attractive is that you can use just one type of control to encapsulate content (more on Content Controls versus CustomXml in the next few days). Content controls come in a variety of tightly typed kinds. In other situations this may be a plus, but the developer should be able to define the type of objects used for tagging.
Further, because something as simple as a Text object lives in a different namespace even within the same document type, we have to write duplicate code for dealing with text in charts, document paragraphs and embedded spreadsheets. If I were to design it, this shared functionality would be abstracted into its own namespace. I want to be able to write clean, reusable code.
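Here is a minimal sketch of what such shared handling could look like when working on raw XML with System.Xml.Linq. The text element is named t in the WordprocessingML, DrawingML and SpreadsheetML namespaces, and the SDK exposes these as three separate Text classes. A single helper therefore has to know about all three. The OpenXmlText name is hypothetical, and the helper only collects text; it does not edit it.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class OpenXmlText
{
    static readonly XNamespace[] TextNamespaces =
    {
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main", // w:t in document paragraphs
        "http://schemas.openxmlformats.org/drawingml/2006/main",        // a:t in slides and chart rich text
        "http://schemas.openxmlformats.org/spreadsheetml/2006/main",    // t in worksheets and shared strings
    };

    // Returns the value of every <t> element from any of the three namespaces, in document order.
    public static IEnumerable<string> AllText(XContainer part) =>
        part.Descendants()
            .Where(e => e.Name.LocalName == "t" && TextNamespaces.Contains(e.Name.Namespace))
            .Select(e => e.Value);
}
```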
Ultimately, the teams within the Office suite need to start working together the way the language teams have begun to within Visual Studio. The same tagging tools should be available in Word, PowerPoint, Excel and OneNote, and they should be represented the same way in the XML they produce.
The Chicago Architects Group will be holding its next meeting on February 16th. Please come and join us and get involved in our architect community.
Register
Presenter: Chris McAvoy
Topic: Amazon Cloud Services
Location: Illinois Technology Association
200 S. Wacker Dr., Suite 1500
Room A
Chicago, IL 60606
Time: 5:30 - Doors open at 5:00
We had a great turnout this evening and some wonderful discussion. I really enjoyed presenting as well as seeing a lot of people I haven’t seen in a long time. At the same time there were a lot of new faces. Some of the input from tonight will definitely go to improving this talk if I present it again in the future. If anyone has comments, feel free to leave them here.
The next talk I am going to work on is Document Generation Frameworks. This will mainly be around Office Open XML, but it will include ODF if I can fit it in.
Look for announcements for the February and March Chicago Architects Group events. In the meantime, visit the links below for information from tonight’s meeting.
The slides can be found here.
The code is available here.
Last month, when I spoke at ArcSummit for nPlus1.org about design patterns, object-oriented programming and dependency injection, Chris Woodruff and I sat down and recorded an episode of Thirsty Developer with Dave Bost and Clark Sell. Check it out here.
It seems that Microsoft has lost a lawsuit over its Custom XML implementation for Office Open XML. This was about the worst news I could get just before the holidays, since I have been working on creating solutions around just that feature. To me it is the most flexible part of the OOXML standard, and it crosses all types of Office documents. I can only hope that they settle this dispute so we can get back to moving document generation processes forward.
I am starting to write a new blog with my co-worker Andy Schwantes on open document standards and development. I will be cross-posting much of the content here. Check it out.
In an effort to better serve the Chicago architecture community, here is a preview of upcoming topics.
January – Dependency Injection and Inversion of Control Containers
February – Amazon Cloud Services
March – Data Integration Architecture
Future:
Document Generation Architecture
Shared strings are the way Excel reduces redundant data in a worksheet. They are also important if you are working with charts in Word documents or PowerPoint slide decks. Instead of inserting string constants into a cell, you give it the index of the string in the SharedStringTable and mark the cell as holding a shared string reference.
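Here is what that looks like in the worksheet XML, as a sketch using System.Xml.Linq rather than the SDK. The SharedStringLookup helper is hypothetical. A cell such as `<c r="A1" t="s"><v>1</v></c>` stores the index 1, and its display text is read from the matching `<si>` element of the shared string table (`<sst>`).

```csharp
using System.Linq;
using System.Xml.Linq;

public static class SharedStringLookup
{
    static readonly XNamespace S = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Returns the text a cell displays, given the workbook's <sst> element.
    public static string CellText(XElement cell, XElement sst)
    {
        string value = (string)cell.Element(S + "v");
        // Only t="s" cells hold an index; other types (numbers, t="str" formula results)
        // keep their value in <v> directly. Inline strings (t="inlineStr") are not handled.
        if ((string)cell.Attribute("t") != "s")
            return value;

        XElement item = sst.Elements(S + "si").ElementAt(int.Parse(value));
        // Join every <t> so rich-text items (several <r> runs) come back whole,
        // skipping phonetic hints stored under <rPh>.
        return string.Concat(item.Descendants(S + "t")
                                 .Where(t => !t.Ancestors(S + "rPh").Any())
                                 .Select(t => t.Value));
    }
}
```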
So what does it take to work with the SharedStringTable?
The first thing you need to do is retrieve the existing shared string table. This is a fairly simple operation as is demonstrated below.
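```csharp
// Requires using System.Linq; First() throws if the workbook has no shared string table.
SharedStringTablePart sharedStrings = tempSpreadsheet.WorkbookPart.GetPartsOfType<SharedStringTablePart>().First();
```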
Another operation that you will want to do is finding strings within the table since the entire purpose of using it is to eliminate redundancy. I handled this by creating a simple method after converting the SharedStringItems of the table into a List<T>.
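```csharp
using System.Collections.Generic;
using spreadsheet = DocumentFormat.OpenXml.Spreadsheet;

// Returns the index of the first SharedStringItem whose text matches, or -1 if none does.
private static int GetIndexOfSharedString(List<spreadsheet.SharedStringItem> items, string text)
{
    for (int index = 0; index < items.Count; index++)
    {
        // InnerText joins every Text element, so rich-text items are compared as a whole.
        if (items[index].InnerText.Trim() == text.Trim())
        {
            return index;
        }
    }
    return -1;
}
```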
Lastly, you will want to add new strings to the table. This is fairly straightforward, but note the last three lines: if you don’t update the counts, you will find your new items never save.
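```csharp
// sharedItem is declared earlier; text is the string being added (name assumed for illustration).
// The item needs a Text child, otherwise the new entry is empty.
sharedItem = new spreadsheet.SharedStringItem(new spreadsheet.Text(text));
sharedStrings.SharedStringTable.AppendChild<spreadsheet.SharedStringItem>(sharedItem);
// Count and UniqueCount are optional attributes, so a missing value is treated as 0.
sharedStrings.SharedStringTable.Count = (sharedStrings.SharedStringTable.Count?.Value ?? 0) + 1;
sharedStrings.SharedStringTable.UniqueCount = (sharedStrings.SharedStringTable.UniqueCount?.Value ?? 0) + 1;
sharedStrings.SharedStringTable.Save();
```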
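For readers not using the SDK, the same find-or-add flow can be sketched directly on the `<sst>` element with System.Xml.Linq. GetOrAdd is a hypothetical helper. It compares text exactly, with no Trim. It also assumes that each call corresponds to one new cell reference, so `count` (total references) always grows, while `uniqueCount` grows only when a new `<si>` is appended.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class SharedStringWriter
{
    static readonly XNamespace S = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Returns the index of text in the table, appending a new <si> if it is not there yet.
    public static int GetOrAdd(XElement sst, string text)
    {
        // Assumption: every call stands for one new cell reference, so count always grows.
        sst.SetAttributeValue("count", ((int?)sst.Attribute("count") ?? 0) + 1);

        var items = sst.Elements(S + "si").ToList();
        int index = items.FindIndex(si => si.Value == text);
        if (index >= 0)
            return index; // reuse the existing entry

        sst.Add(new XElement(S + "si", new XElement(S + "t", text)));
        sst.SetAttributeValue("uniqueCount", ((int?)sst.Attribute("uniqueCount") ?? 0) + 1);
        return items.Count; // index of the item just appended
    }
}
```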
That is really all there is to it.
(Note: for those who have complained about the code formatting, I hope this is better.)
The Chicago Architects Group will be holding its next meeting on January 19th. Please come and join us and get involved in our architect community.
Register
Presenter: Tim Murphy
Topic: Dependency Injection and Inversion of Control Containers
Location: Illinois Technology Association
200 S. Wacker Dr., Suite 1500
Room A
Chicago, IL 60606
Time: 5:30 - Doors open at 5:00
On December 7th I presented at the nPlus1.org ArcSummit. My talk was on Dependency Injection and Inversion of Control containers. Thank you to all those who attended.
When we were done Chris Woodruff and I were asked to record an episode of The Thirsty Developer. It was a great experience seeing how these shows are put together and being able to just sit down and talk with Dave Bost, Clark Sell and Chris Woodruff. I’ll post again when the episode comes out.
The slides for the presentation are available here.
The code is available here.
In the further adventures of OOXML we have been looking at different approaches to tagging content in a Word template to be programmatically replaced.
Initially we looked at simple in-line text as a user-defined tagging system. The problem is that this is very error prone. The user has to enter the tag exactly the same way every time. On top of that, if the user backspaces while typing the tag or spell check flags the tag, the tag text will be split across multiple runs in the XML. This is less than desirable.
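The split-run problem is easy to reproduce on raw XML. In this sketch the `[[CompanyName]]` tag syntax and the helper names are hypothetical. A search inside each w:t misses the tag once Word has broken it across runs, while joining the paragraph's text first finds it. Detecting the tag this way is the easy half. Replacing it still means rewriting the runs involved, which is not shown here.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class InlineTagSearch
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Naive search: looks for the whole tag inside each individual w:t.
    public static bool FoundInSingleRun(XElement paragraph, string tag) =>
        paragraph.Descendants(W + "t").Any(t => t.Value.Contains(tag));

    // Joining every w:t in the paragraph first finds the tag even when Word split it.
    public static bool FoundInParagraphText(XElement paragraph, string tag) =>
        string.Concat(paragraph.Descendants(W + "t").Select(t => t.Value)).Contains(tag);
}
```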
Content Controls were the next option we explored. They give you specific object types to search for, but the user still has to fill in the Tag property for each control, so the user error factor is not eliminated. The other problem is that each control can be of a different type. This is a better option, but still not optimal.
Ultimately we settled on a Custom XML schema. We found that custom XML elements can wrap all of the different types of content within a document equally well. The user error factor is also removed, because you highlight your content and then select the tag from a list. In the end you have a reliable template format and simpler code for locating tags.
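Here is a minimal sketch, using System.Xml.Linq, of how locating and filling tags reduces to one query when every tag is a w:customXml element. The Fill helper is hypothetical. So is its policy of writing the value into the first w:t of each tagged region and blanking the rest, which keeps the first run's formatting. It only handles text, not tagged charts, tables or images.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class CustomXmlFiller
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Fills every region tagged with elementName; returns how many tagged regions were found.
    public static int Fill(XContainer document, string elementName, string value)
    {
        var regions = document.Descendants(W + "customXml")
                              .Where(cx => (string)cx.Attribute(W + "element") == elementName)
                              .ToList();
        foreach (var region in regions)
        {
            var texts = region.Descendants(W + "t").ToList();
            if (texts.Count == 0) continue; // nothing textual to replace
            texts[0].Value = value;
            foreach (var t in texts.Skip(1)) t.Value = string.Empty;
        }
        return regions.Count;
    }
}
```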
More to come.
As far as I have seen, content controls in Office 2007 render to either an SdtRun or an SdtBlock object. The nice thing is that both of these inherit from SdtElement. This allows you to take the query from my earlier post, replace SdtBlock with SdtElement, and have a universal retrieval. Of course, as with any tool, you need to be careful not to take it too far. Depending on the structure of your document, this may not do what you need.
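At the XML level the reason is simple. Block-level and run-level content controls are both serialized as w:sdt, which is why one query covers both (with the SDK, something along the lines of `Descendants<SdtElement>()` on the document). The raw-XML sketch below uses a hypothetical AllTags helper. It also returns nested controls and yields null for controls without a tag, which is one example of the query reaching further than you may want.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ContentControlList
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Tag value of every content control in document order, block-level and inline alike.
    public static List<string> AllTags(XContainer document) =>
        document.Descendants(W + "sdt")
                .Select(sdt => (string)sdt.Element(W + "sdtPr")?.Element(W + "tag")?.Attribute(W + "val"))
                .ToList();
}
```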
I try not to be too much of a reposter, but I got a little nostalgic on this one. I remember when the first version of this tome came out and I got a free copy when I visited Redmond for the Guided Design conference. You may not agree with everything you find in here, but it is definitely worth the read to see what Microsoft thinks architecture is.