An ASP.NET Blog
I work for Microsoft and help people and businesses make better use of technology to realize their full potential. The opinions mentioned herein are solely mine and do not reflect those of my employer.

Selecting Distinct Values using XPath Expression

Monday, April 25, 2005 7:46 AM
Hi,

XPath is a flexible standard for querying an XML document.

Consider the following CD catalog XML as an example:

<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
<cd country="USA">
<title>Empire Burlesque</title>
<artist>Bob Dylan</artist>
<price>10.90</price>
</cd>
<cd country="UK">
<title>Hide your heart</title>
<artist>Bonnie Tyler</artist>
<price>9.90</price>
</cd>
<cd country="USA">
<title>Greatest Hits</title>
<artist>Dolly Parton</artist>
<price>9.90</price>
</cd>
<cd country="UK">
<title>Hide your heart</title>
<artist>Bonnie Tyler</artist>
<price>9.90</price>
</cd>
</catalog>

In the above XML, to get the distinct country names we can use the preceding-sibling axis: a cd is kept only if no earlier cd sibling has the same country attribute, so each country name is selected just once.

/catalog/cd[not(@country=preceding-sibling::cd/@country)]/@country

The above expression selects each distinct country name only once, i.e. USA and UK, in document order.
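
For readers who want to check the result outside an XPath engine, here is a minimal Python sketch of what the predicate does. It is only an illustration: Python's standard xml.etree.ElementTree does not support the preceding-sibling axis, so the loop imitates the predicate by hand. The function name distinct_countries and the trimmed-down sample data are made up for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the catalog above (only the parts the query looks at).
CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title></cd>
  <cd country="UK"><title>Hide your heart</title></cd>
  <cd country="USA"><title>Greatest Hits</title></cd>
  <cd country="UK"><title>Hide your heart</title></cd>
</catalog>"""


def distinct_countries(xml_text):
    """Imitate /catalog/cd[not(@country=preceding-sibling::cd/@country)]/@country."""
    root = ET.fromstring(xml_text)
    cds = root.findall("cd")
    result = []
    for index, cd in enumerate(cds):
        country = cd.get("country")
        if country is None:
            continue  # no @country attribute, so nothing to select
        # Attribute values of the <cd> siblings that come before this one.
        earlier = {prev.get("country") for prev in cds[:index]}
        if country not in earlier:
            result.append(country)
    return result


print(distinct_countries(CATALOG))  # ['USA', 'UK']
```

Note that every cd is compared against all of its earlier siblings, so the cost grows roughly with the square of the number of cd elements.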

Cheers.

Feedback

# re: Selecting Distinct Values using XPath Expression

Suppose the XML was this instead:
<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
<cd>
<country>USA</country>
<title>Empire Burlesque</title>
<artist>Bob Dylan</artist>
<price>10.90</price>
</cd>
<cd>
<country>UK</country>
<title>Hide your heart</title>
<artist>Bonnie Tyler</artist>
<price>9.90</price>
</cd>
<cd>
<country>USA</country>
<country>CHINA</country>
<title>Greatest Hits</title>
<artist>Dolly Parton</artist>
<price>9.90</price>
</cd>
<cd>
<country>UK</country>
<title>Hide your heart</title>
<artist>Bonnie Tyler</artist>
<price>9.90</price>
</cd>
</catalog>

This apparently won't work:
/catalog/cd[not(country=preceding-sibling::cd/country)]/country

"CHINA" was skipped. How would I resolve this? 5/23/2005 7:14 AM | Nathan

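One possible explanation for the skipped CHINA, offered as a sketch rather than a definitive answer: the predicate is applied to each cd as a whole, and in XPath 1.0 a comparison between node-sets is true if any pair of nodes matches. The third cd contains USA, which has already appeared, so the whole cd is rejected and CHINA goes with it. Moving the test down to each country element, for example /catalog/cd/country[not(. = preceding::country)], should avoid that. The Python below imitates both variants by hand, since the standard library's ElementTree has no preceding axes. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the comment above (country elements only).
CATALOG = """<catalog>
  <cd><country>USA</country></cd>
  <cd><country>UK</country></cd>
  <cd><country>USA</country><country>CHINA</country></cd>
  <cd><country>UK</country></cd>
</catalog>"""


def per_cd_filter(root):
    """Imitate /catalog/cd[not(country=preceding-sibling::cd/country)]/country."""
    cds = root.findall("cd")
    kept = []
    for index, cd in enumerate(cds):
        earlier = {c.text for prev in cds[:index] for c in prev.findall("country")}
        own = [c.text for c in cd.findall("country")]
        # A node-set comparison succeeds if ANY pair matches, so a single
        # repeated country is enough to reject the whole <cd>.
        if not any(name in earlier for name in own):
            kept.extend(own)
    return kept


def per_country_filter(root):
    """Imitate /catalog/cd/country[not(. = preceding::country)]."""
    seen = set()
    kept = []
    for country in root.iter("country"):  # visits elements in document order
        if country.text not in seen:
            kept.append(country.text)
        seen.add(country.text)
    return kept


root = ET.fromstring(CATALOG)
print(per_cd_filter(root))       # ['USA', 'UK']
print(per_country_filter(root))  # ['USA', 'UK', 'CHINA']
```
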
# re: Selecting Distinct Values using XPath Expression

I want to query distinct values from XML file 7/11/2005 10:06 AM | Thiyagu

# re: Selecting Distinct Values using XPath Expression

Have you resolved the problem in the case above, where CHINA is skipped, so that you get the distinct values for the tag? If you have solved it, please let me know.

Thank you! 8/13/2005 4:58 AM | Himanshu Chawla

# re: Selecting Distinct Values using XPath Expression

distinct-values(//catalog/cd/price)
This will select the distinct prices. 8/25/2005 2:47 AM | apilkoirala
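
A caveat worth noting: distinct-values is an XPath 2.0 function, so an engine that only implements XPath 1.0 will not recognise it. Where that is the case, the same result can be computed in host code. The Python sketch below is one way to do it. The function name distinct_prices and the trimmed sample data are assumptions for illustration, and the values are compared as plain strings.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the original catalog (price elements only).
CATALOG = """<catalog>
  <cd><price>10.90</price></cd>
  <cd><price>9.90</price></cd>
  <cd><price>9.90</price></cd>
  <cd><price>9.90</price></cd>
</catalog>"""


def distinct_prices(xml_text):
    """Rough stand-in for distinct-values(//catalog/cd/price)."""
    root = ET.fromstring(xml_text)
    prices = [price.text for price in root.findall("cd/price")]
    # dict.fromkeys drops duplicates while keeping first-occurrence order.
    return list(dict.fromkeys(prices))


print(distinct_prices(CATALOG))  # ['10.90', '9.90']
```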

# re: Selecting Distinct Values using XPath Expression

To select distinct values/nodes that are repeated on many levels, amend the testing node set as follows:

<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
.<genre title="Rock">
..<cd country="USA">
...<title>Empire Burlesque</title>
...<artist>Bob Dylan</artist>
...<price>10.90</price>
..</cd>
..<cd country="UK">
...<title>Hide your heart</title>
...<artist>Bonnie Tyler</artist>
...<price>9.90</price>
..</cd>
.</genre>
.<genre title="Country">
..<cd country="USA">
...<title>Greatest Hits</title>
...<artist>Dolly Parton</artist>
...<price>9.90</price>
..</cd>
.</genre>
.<genre title="Rock and Roll">
..<cd country="UK">
...<title>Hide your heart</title>
...<artist>Bonnie Tyler</artist>
...<price>9.90</price>
..</cd>
.</genre>
</catalog>

/catalog/genre/cd[not(@country = (preceding-sibling::cd | ../preceding-sibling::genre/cd)/@country)]/@country 6/28/2006 7:20 AM | David Wood
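
A rough Python check of this idea, assuming the cd elements are reached through their genre parent (that is, a path starting at /catalog/genre/cd). As before, ElementTree cannot evaluate the preceding-sibling axis, so the union in the predicate is imitated by hand. The function name distinct_countries_by_genre is made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the nested catalog above (attributes only).
CATALOG = """<catalog>
  <genre title="Rock">
    <cd country="USA"/>
    <cd country="UK"/>
  </genre>
  <genre title="Country">
    <cd country="USA"/>
  </genre>
  <genre title="Rock and Roll">
    <cd country="UK"/>
  </genre>
</catalog>"""


def distinct_countries_by_genre(xml_text):
    """Imitate the predicate
    not(@country = (preceding-sibling::cd | ../preceding-sibling::genre/cd)/@country)
    for every /catalog/genre/cd."""
    root = ET.fromstring(xml_text)
    genres = root.findall("genre")
    result = []
    for g_index, genre in enumerate(genres):
        # ../preceding-sibling::genre/cd : cds that live in earlier genres
        from_earlier_genres = [
            cd for prev in genres[:g_index] for cd in prev.findall("cd")
        ]
        cds = genre.findall("cd")
        for c_index, cd in enumerate(cds):
            # preceding-sibling::cd : earlier cds inside the same genre
            candidates = cds[:c_index] + from_earlier_genres
            earlier = {c.get("country") for c in candidates}
            if cd.get("country") not in earlier:
                result.append(cd.get("country"))
    return result


print(distinct_countries_by_genre(CATALOG))  # ['USA', 'UK']
```

If the engine supports it, the preceding axis may be a shorter way to express the same test, e.g. not(@country = preceding::cd/@country), since it covers earlier cd elements at any level.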

# re: Selecting Distinct Values using XPath Expression

Find the distinct node names...

*[not( name()=name(following-sibling::*) )]

returns the last nodes in the set with unique names. Use preceding-sibling to get the first node, but be prepared for a significant performance hit. 11/1/2006 11:54 PM | John Mee

# re: Selecting Distinct Values using XPath Expression

*[not( name()=name(following-sibling::*) )]

does not return the distinct node names, because name(following-sibling::*) returns only the name of the first sibling after the current node. The expression therefore works only when all consecutive nodes share the same name, as in this case:
<cd>...</cd>
<cd>...</cd>
<cd>...</cd>

But it fails when the node names alternate, like this:
<key>...</key>
<string>...</string>
<key>...</key>
<string>...</string>
<key>...</key>
<string>...</string>

For these nodes, the query still returns all 6 nodes.

Do you have any idea how to solve this? 2/28/2008 11:13 AM | Diiar
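
One way to look at this, as a sketch only: when name() is given a node-set it uses just the first node in document order, so the predicate compares each element with its immediate next sibling and nothing else. As far as I can tell, comparing against every later sibling needs something beyond a plain XPath 1.0 expression, such as XPath 2.0's distinct-values(*/name()) or a few lines of host code. The Python below imitates the original predicate and then shows a host-code alternative. The dict wrapper element, the sample values, and both function names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper and values around the alternating key/string nodes.
PLIST = """<dict>
  <key>a</key><string>1</string>
  <key>b</key><string>2</string>
  <key>c</key><string>3</string>
</dict>"""


def next_sibling_only(parent):
    """Imitate *[not(name()=name(following-sibling::*))]."""
    children = list(parent)
    kept = []
    for index, child in enumerate(children):
        # name(node-set) looks at the first node only, i.e. the next sibling;
        # with no following sibling it yields the empty string.
        next_name = children[index + 1].tag if index + 1 < len(children) else ""
        if child.tag != next_name:
            kept.append(child.tag)
    return kept


def distinct_names(parent):
    """Distinct child element names, in first-occurrence order."""
    return list(dict.fromkeys(child.tag for child in parent))


root = ET.fromstring(PLIST)
print(len(next_sibling_only(root)))  # 6
print(distinct_names(root))          # ['key', 'string']
```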
