Performing a Case-Insensitive Search in an XML Document

XML is one of the most widely adopted standards of recent years for storing and retrieving data. XML documents can hold as much information as required and, being flat files, still stay relatively light on disk. To some extent, XML has become a replacement for databases.

However, XML imposes some restrictions (good ones, really).

XML is case sensitive. Man, MAN, man and mAn are all different names as far as XML is concerned.

Every start tag needs a matching end tag. So a <Customer> tag definitely needs a </Customer> tag to mark its end.

Security, scalability and maintainability are issues that still need to be looked into.

A variety of XML parsers have been built and used over the years, and standards such as XPath and XQuery have been defined for querying XML documents much as one would query a database table.

.NET provides many methods and APIs for working with XML documents, and its extensive support for querying and displaying XML data is one of its major advantages over its predecessors.

XPath, a W3C standard for querying XML documents, is supported in .NET. You can search through an XML document using an XPath expression and then retrieve or modify the values it finds.

However, since XML is case sensitive, searches for "Whidbey", "WHIDBEY" and "wHiDBey" will not produce the same results. The text has to match exactly, character for character, for the XPath expression to identify the node.
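To see the exact-match behaviour concretely, here is a minimal sketch using XmlDocument. The <products> document, the name attribute and the CountExact helper are made up purely for illustration; they are not part of the Books.xml sample.

```csharp
using System;
using System.Xml;

public static class CaseSensitiveDemo
{
    // Hypothetical two-item document: the same name stored in two different casings.
    const string Xml =
        "<products><product name=\"Whidbey\"/><product name=\"WHIDBEY\"/></products>";

    // Counts the <product> nodes whose name attribute equals the given text exactly.
    // Assumes the text contains no apostrophe, which would break the XPath literal.
    public static int CountExact(string name)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Xml);
        return doc.SelectNodes("/products/product[@name='" + name + "']").Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountExact("Whidbey"));  // matches only the first node
        Console.WriteLine(CountExact("wHiDBey"));  // matches nothing
    }
}
```

Each spelling matches only the node stored with exactly that casing, which is the limitation the rest of this post works around.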

To implement a case-insensitive search, we need to modify the XPath expression slightly using the translate function. Let us look at a sample XPath expression.

Consider the following Books.xml:

<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>

<book category="Cooking">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>

<book category="xml">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>

<book category="CHILDREN">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>

<book category="XML">
<title lang="en">XQuery Kick Start</title>
<author>James McGovern</author>
<author>Per Bothner</author>
<author>Kurt Cagle</author>
<author>James Linn</author>
<author>Vaidyanathan Nagarajan</author>
<year>2003</year>
<price>49.99</price>
</book>

<book category="xMl">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>


Suppose we want to find all books in the above XML document that fall under the category "XML", and we don't care about the case in which the category is written in the file (xml, XML, xMl, etc.). The following XPath expression retrieves those nodes irrespective of the case of the category.

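string searchtext = "Xml";

// translate() folds @category to lower case; the keyword is lower-cased in C# before it is spliced in.
string xpath = "/bookstore/book[translate(@category, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz') = '" + searchtext.ToLower() + "']";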

As the snippet shows, we assign the search keyword "Xml" to the variable searchtext and convert it to lower case with the ToLower() method of the string class while building the XPath expression. Inside the expression, the translate function maps every upper-case letter in the @category value to its lower-case counterpart, so the comparison succeeds whether the category is stored as "xml", "XML" or "xMl".

Upon executing the above XPath expression against an XmlDocument or XPathDocument, all books that fall under the category "XML" are retrieved, whether the category is written as "xml", "XML" or "xMl".
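Putting the pieces together, a minimal sketch of the whole search might look like the following. It assumes a trimmed-down copy of Books.xml embedded as a string so that the example is self-contained. The class name, the FindTitles helper and the use of ToLowerInvariant() in place of ToLower() (to sidestep culture-specific lower-casing of the keyword) are my own choices for illustration. Note also that the translate mapping here only folds the letters A to Z; accented or non-Latin letters are compared as they are.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class CaseInsensitiveSearch
{
    const string Upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const string Lower = "abcdefghijklmnopqrstuvwxyz";

    // Trimmed-down copy of Books.xml (only category and title kept) so the sketch runs on its own.
    const string BooksXml = @"<bookstore>
  <book category=""Cooking""><title>Everyday Italian</title></book>
  <book category=""xml""><title>Learning XML</title></book>
  <book category=""CHILDREN""><title>Harry Potter</title></book>
  <book category=""XML""><title>XQuery Kick Start</title></book>
  <book category=""xMl""><title>Learning XML</title></book>
</bookstore>";

    // Returns the titles of all books whose category matches, ignoring ASCII case.
    public static List<string> FindTitles(string category)
    {
        var doc = new XmlDocument();
        doc.LoadXml(BooksXml);

        // Lower-case the keyword in C#; translate() lower-cases @category inside XPath.
        // Assumes the keyword contains no apostrophe, which would break the XPath literal.
        string keyword = category.ToLowerInvariant();
        string xpath = "/bookstore/book[translate(@category, '" + Upper + "', '" + Lower
                     + "') = '" + keyword + "']";

        var titles = new List<string>();
        foreach (XmlNode book in doc.SelectNodes(xpath))
        {
            titles.Add(book.SelectSingleNode("title").InnerText);
        }
        return titles;
    }

    public static void Main()
    {
        foreach (string title in FindTitles("Xml"))
        {
            Console.WriteLine(title);
        }
    }
}
```

Searching for "Xml", "XML" or "xml" should return the same three books from this sample, while a keyword such as "cooking" should return only the one Cooking title.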

Thus, we can implement a case-insensitive search through XML documents using XPath's translate function.

Cheers and Happy XPathing !!!

posted @ Monday, September 12, 2005 1:17 PM

Comments on this entry:

# re: Performing a Case In-sensitive search in an XML Document

Left by Shahid H. Faruqi at 1/13/2006 8:26 PM
Thank you very much. It was very helpful.

# re: Performing a Case In-sensitive search in an XML Document

Left by Tim at 7/6/2006 11:02 AM
Brilliant - thanks very much for this post Harish!!!

# re: Performing a Case In-sensitive search in an XML Document

Left by Achintya Jha at 11/14/2006 7:20 PM
Works great!!! Thanks

# re: Performing a Case In-sensitive search in an XML Document

Left by Gaurav Sawant at 11/28/2006 6:39 AM
thanks a lot for this helpful tip. Saved me a lot of effort. Thanks again

# re: Performing a Case In-sensitive search in an XML Document

Left by Mike at 2/7/2007 10:35 AM
Indeed, very helpful! Thanks!

# re: Performing a Case In-sensitive search in an XML Document

Left by nabeel at 3/8/2007 1:17 PM
awesome dude

# re: Performing a Case In-sensitive search in an XML Document

Left by Remi at 3/8/2007 9:30 PM
I don't understand how that works since you do not know the locale of the text. Transforming the case of text is locale dependent, so it seems to me that your trick only works for english....

# re: Performing a Case In-sensitive search in an XML Document

Left by Ed Graham at 2/10/2009 7:21 PM
Very nice. It's a shame that SQL Server's implementation of XQuery does not include the translate() function ... :<

# re: Performing a Case In-sensitive search in an XML Document

Left by Pawan Gupta at 7/1/2009 6:26 AM
Thanks, really it helped.
