James Michael Hare

...hare-brained ideas from the realm of software development...
posts - 137 , comments - 1099 , trackbacks - 0

My Links

News

Welcome to my blog! I'm a Sr. Software Development Engineer in Seattle, WA. I've been doing C++/C#/Java development for over 18 years, but have definitely learned that there is always more to learn!

All thoughts and opinions expressed in my blog and my comments are my own and do not represent the thoughts of my employer.

Blogs I Read

MCC Logo MVP Logo

Follow BlkRabbitCoder on Twitter

Tag Cloud

Archives

Post Categories

C#/.NET Fundamentals: Choosing the Right Collection Class

The .NET Base Class Library (BCL) has a wide array of collection classes at your disposal which make it easy to manage collections of objects. While it's great to have so many classes available, it can be daunting to choose the right collection to use for any given situation. As hard as it may be, choosing the right collection can be absolutely key to the performance and maintainability of your application!

This post will look at breaking down any confusion between each collection and the situations in which they excel. We will be spending most of our time looking at the System.Collections.Generic namespace, which is the recommended set of collections.

The Generic Collections: System.Collections.Generic namespace

The generic collections were introduced in .NET 2.0 in the System.Collections.Generic namespace. This is the main body of collections you should tend to focus on first, as they will tend to suit 99% of your needs right up front.

It is important to note that the generic collections are unsynchronized. This decision was made for performance reasons because depending on how you are using the collections its completely possible that synchronization may not be required or may be needed on a higher level than simple method-level synchronization. Furthermore, concurrent read access (all writes done at beginning and never again) is always safe, but for concurrent mixed access you should either synchronize the collection or use one of the concurrent collections.

So let's look at each of the collections in turn and its various pros and cons, at the end we'll summarize with a table to help make it easier to compare and contrast the different collections.

The Associative Collection Classes

Associative collections store a value in the collection by providing a key that is used to add/remove/lookup the item. Hence, the container associates the value with the key. These collections are most useful when you need to lookup/manipulate a collection using a key value. For example, if you wanted to look up an order in a collection of orders by an order id, you might have an associative collection where they key is the order id and the value is the order.

The Dictionary<TKey,TVale> is probably the most used associative container class. The Dictionary<TKey,TValue> is the fastest class for associative lookups/inserts/deletes because it uses a hash table under the covers. Because the keys are hashed, the key type should correctly implement GetHashCode() and Equals() appropriately or you should provide an external IEqualityComparer to the dictionary on construction. The insert/delete/lookup time of items in the dictionary is amortized constant time - O(1) - which means no matter how big the dictionary gets, the time it takes to find something remains relatively constant. This is highly desirable for high-speed lookups. The only downside is that the dictionary, by nature of using a hash table, is unordered, so you cannot easily traverse the items in a Dictionary in order.

The SortedDictionary<TKey,TValue> is similar to the Dictionary<TKey,TValue> in usage but very different in implementation. The SortedDictionary<TKey,TValye> uses a binary tree under the covers to maintain the items in order by the key. As a consequence of sorting, the type used for the key must correctly implement IComparable<TKey> so that the keys can be correctly sorted. The sorted dictionary trades a little bit of lookup time for the ability to maintain the items in order, thus insert/delete/lookup times in a sorted dictionary are logarithmic - O(log n). Generally speaking, with logarithmic time, you can double the size of the collection and it only has to perform one extra comparison to find the item. Use the SortedDictionary<TKey,TValue> when you want fast lookups but also want to be able to maintain the collection in order by the key.

The SortedList<TKey,TValue> is the other sorted associative container class in the generic containers. Once again SortedList<TKey,TValue>, like SortedDictionary<TKey,TValue>, uses a key to sort key-value pairs. Unlike SortedDictionary, however, items in a SortedList are stored as sorted array of items. This means that insertions and deletions are linear - O(n) - because deleting or adding an item may involve shifting all items up or down in the list. Lookup time, however is O(log n) because the SortedList can use a binary search to find any item in the list by its key. So why would you ever want to do this? Well, the answer is that if you are going to load the SortedList up-front, the insertions will be slower, but because array indexing is faster than following object links, lookups are marginally faster than a SortedDictionary. Once again I'd use this in situations where you want fast lookups and want to maintain the collection in order by the key, and where insertions and deletions are rare.

The Non-Associative Containers

The other container classes are non-associative. They don't use keys to manipulate the collection but rely on the object itself being stored or some other means (such as index) to manipulate the collection.

The List<T> is a basic contiguous storage container. Some people may call this a vector or dynamic array. Essentially it is an array of items that grow once its current capacity is exceeded. Because the items are stored contiguously as an array, you can access items in the List<T> by index very quickly. However inserting and removing in the beginning or middle of the List<T> are very costly because you must shift all the items up or down as you delete or insert respectively. However, adding and removing at the end of a List<T> is an amortized constant operation - O(1). Typically List<T> is the standard go-to collection when you don't have any other constraints, and typically we favor a List<T> even over arrays unless we are sure the size will remain absolutely fixed.

The LinkedList<T> is a basic implementation of a doubly-linked list. This means that you can add or remove items in the middle of a linked list very quickly (because there's no items to move up or down in contiguous memory), but you also lose the ability to index items by position quickly. Most of the time we tend to favor List<T> over LinkedList<T> unless you are doing a lot of adding and removing from the collection, in which case a LinkedList<T> may make more sense.

The HashSet<T> is an unordered collection of unique items. This means that the collection cannot have duplicates and no order is maintained. Logically, this is very similar to having a Dictionary<TKey,TValue> where the TKey and TValue both refer to the same object. This collection is very useful for maintaining a collection of items you wish to check membership against. For example, if you receive an order for a given vendor code, you may want to check to make sure the vendor code belongs to the set of vendor codes you handle. In these cases a HashSet<T> is useful for super-quick lookups where order is not important. Once again, like in Dictionary, the type T should have a valid implementation of GetHashCode() and Equals(), or you should provide an appropriate IEqualityComparer<T> to the HashSet<T> on construction.

The SortedSet<T> is to HashSet<T> what the SortedDictionary<TKey,TValue> is to Dictionary<TKey,TValue>. That is, the SortedSet<T> is a binary tree where the key and value are the same object. This once again means that adding/removing/lookups are logarithmic - O(log n) - but you gain the ability to iterate over the items in order. For this collection to be effective, type T must implement IComparable<T> or you need to supply an external IComparer<T>.

Finally, the Stack<T> and Queue<T> are two very specific collections that allow you to handle a sequential collection of objects in very specific ways. The Stack<T> is a last-in-first-out (LIFO) container where items are added and removed from the top of the stack. Typically this is useful in situations where you want to stack actions and then be able to undo those actions in reverse order as needed. The Queue<T> on the other hand is a first-in-first-out container which adds items at the end of the queue and removes items from the front. This is useful for situations where you need to process items in the order in which they came, such as a print spooler or waiting lines.

So that's the basic collections. Let's summarize what we've learned in a quick reference table.

Collection

Ordering

Contiguous Storage?

Direct Access?

Lookup Efficiency

Manipulate

Efficiency

Notes

Dictionary

Unordered

Yes

Via Key

Key:

O(1)

O(1)

Best for high performance lookups.

SortedDictionary Sorted No Via Key Key:
O(log n)
O(log n)

Compromise of Dictionary speed and ordering, uses binary search tree.

SortedList

Sorted

Yes

Via Key

Key:

O(log n)

O(n)

Very similar to SortedDictionary, except tree is implemented in an array, so has faster lookup on preloaded data, but slower loads.

List

User has precise control over element ordering

Yes

Via Index

Index: O(1)

Value: O(n)

O(n)

Best for smaller lists where direct access required and no sorting.

LinkedList

User has precise control over element ordering

No

No

Value:

O(n)

O(1)

Best for lists where inserting/deleting in middle is common and no direct access required.

HashSet

Unordered

Yes

Via Key

Key:

O(1)

O(1)

Unique unordered collection, like a Dictionary except key and value are same object.

SortedSet

Sorted

No

Via Key

Key:

O(log n)

O(log n)

Unique sorted collection, like SortedDictionary except key and value are same object.

Stack

LIFO

Yes

Only Top

Top: O(1)

O(1)*

Essentially same as List<T> except only process as LIFO

Queue

FIFO

Yes

Only Front

Front: O(1)

O(1)

Essentially same as List<T> except only process as FIFO

 

The Original Collections: System.Collections namespace

The original collection classes are largely considered deprecated by developers and by Microsoft itself. In fact they indicate that for the most part you should always favor the generic or concurrent collections, and only use the original collections when you are dealing with legacy .NET code.

Because these collections are out of vogue, let's just briefly mention the original collection and their generic equivalents:

  • ArrayList
    • A dynamic, contiguous collection of objects.
    • Favor the generic collection List<T> instead.
  • Hashtable
    • Associative, unordered collection of key-value pairs of objects.
    • Favor the generic collection Dictionary<TKey,TValue> instead.
  • Queue
    • First-in-first-out (FIFO) collection of objects.
    • Favor the generic collection Queue<T> instead.
  • SortedList
    • Associative, ordered collection of key-value pairs of objects.
    • Favor the generic collection SortedList<T> instead.
  • Stack
    • Last-in-first-out (LIFO) collection of objects.
    • Favor the generic collection Stack<T> instead.

In general, the older collections are non-type-safe and in some cases less performant than their generic counterparts. Once again, the only reason you should fall back on these older collections is for backward compatibility with legacy code and libraries only.

The Concurrent Collections: System.Collections.Concurrent namespace

The concurrent collections are new as of .NET 4.0 and are included in the System.Collections.Concurrent namespace. These collections are optimized for use in situations where multi-threaded read and write access of a collection is desired.

The concurrent queue, stack, and dictionary work much as you'd expect. The bag and blocking collection are more unique. Below is the summary of each with a link to a blog post I did on each of them.

Summary

The .NET BCL has lots of collections built in to help you store and manipulate collections of data. Understanding how these collections work and knowing in which situations each container is best is one of the key skills necessary to build more performant code. Choosing the wrong collection for the job can make your code much slower or even harder to maintain if you choose one that doesn’t perform as well or otherwise doesn’t exactly fit the situation.

Remember to avoid the original collections and stick with the generic collections.  If you need concurrent access, you can use the generic collections if the data is read-only, or consider the concurrent collections for mixed-access if you are running on .NET 4.0 or higher.

 

  

Print | posted on Thursday, June 16, 2011 7:11 PM | Filed Under [ My Blog C# Software .NET Fundamentals ]

Feedback

Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

I haven't read it yet.. but it looks great!
6/17/2011 4:03 AM | Angelo Chiello
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

@Kam: Very true. There are a lot of incidental ones like that and SynchronizedCollection and ReadOnlyCollection. Was just starting with the basics :-)
6/17/2011 8:46 AM | James Michael Hare
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

Great work man.

Cheers.
6/17/2011 8:55 AM | Abhishek Sur
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

@Abhishek: Thanks!
6/17/2011 9:32 AM | James Michael Hare
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

Awesome summary!!! I printed out copies of your article and handed them out to the devs on my team.
6/17/2011 12:22 PM | Keith Beller
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

@Keith: Wow, very cool. Glad it was useful!
6/18/2011 11:09 PM | James Michael Hare
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

truel..great blog..
6/22/2011 6:46 AM | sachin
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

Very nice informative post
6/24/2011 1:01 PM | tauqir
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

Thank for your great information!
6/30/2011 9:16 PM | Nguyen Trung Ha
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

What a great article - thank you for posting! This is one to print & hang at work for all devs to read. Very nice.
7/2/2011 8:08 AM | cori
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

Oh how funny - I wrote that before I read Keith's comment.
7/2/2011 8:11 AM | cori
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

Great writeup, I'm gonna keep a copy of this printed out for quick reference. Thanks!

Maybe an interesting followup to this could be discussing how to pass collections around, and when and why you should use Collection<T> and ReadOnlyCollection<T>. FxCop warns you if you don't use those, but it's always a little fuzzy why.
7/3/2011 3:40 AM | Bzz
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

nice post!!
It is also usefull to know that when you are using hashtable .net framework load some unnessessary assemblies for about 20-30 MB in memory
7/5/2011 3:03 AM | Oleg
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

GREAT POST! I always wanted to know the exact diferences between this collections regarding to performance and under the hood funcitoning. :)
7/5/2011 6:03 AM | HART
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

I was looking for more comments on Collection<T> in comparison to List<T> and Array<T>. What about IEnumberable<T> or IQueryable<T>. Surely there are more collection types to choose from that were not mentioned here.
7/6/2011 9:54 AM | Kid
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

I was mainly focusing on the collections in System.Collections.Generic, Collection<T> does warrant some mention, of course, since it's very useful as a base class for any custom collections you want to create, but List<T> is generally faster since it's optimized since its not intended to be inheritable.

IEnumerable and IQueryable aren't collections, but are interfaces so you wouldn't choose them as a collection to contain objects, per se.
7/6/2011 2:09 PM | James Michael Hare
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

this is very informative. Thank for sharing
7/10/2011 8:30 PM | Samit
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

Problem using the Dictionary<K,V>; it's not serializable . Do you know of an alternative ?

Type 'System.Collections.Generic.Dictionary`2[[System.Int32, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[System.String, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]' is not supported for serialization/deserialization of a dictionary, keys must be strings or objects.
7/15/2011 3:46 PM | Larry
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

@Larry: Are you using the XmlSerializer? If you use the DataContractSerializer, it works (.NET 3.5 and above) which can serialize to XML or JSON.

Alternatively, you can write your own sub-class that performs custom serialization, but this can get hairy.
7/15/2011 5:01 PM | James Michael Hare
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

What are your thoughts on HybridDictionary? If memory serves me correctly, it is optimized to use either a Dictionary or an array under the hood based on the number of elements in the list.
7/19/2011 3:26 PM | Joe
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

@Joe: My gut was to avoid them because they were a part of the 1.1 collections, and if it were still necessary, they would have been re-included as a generic in the 2.0 generic collections.

But I went ahead and did some performance tests, and each time the Dictionary<TKey,TValue> came out ahead of the HybridDictionary for both small and large sized collections.
7/19/2011 4:05 PM | James Michael Hare
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

To the person dropping vitriolic anonymous comments on my blog. I'm perfectly open to constructive criticism if I mislabel something incorrectly, but please do it in a constructive way. Those comments were just demeaning and insulting.
7/23/2011 8:48 AM | James Michael Hare
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

I cleaned up an inaccuracy in the summary table. I had a column that listed whether the collection was ordered or not, and a simple yes/no was ambiguous here since it doesn't distinguish between ordered and sorted. I changed the column to ordering and made the text more explicit.

For a more detailed discussion on ordered vs sorted see here:

http://stackoverflow.com/questions/1084146/what-is-the-difference-between-an-ordered-and-a-sorted-collection
7/23/2011 9:32 AM | James Michael Hare
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

Thanks for the info. We have an application that uses HybridDictionary's quite a bit but like you pointed out, it was build off of the 1.1 framework. I have been slowing upgrading it to use the Dictionary generic and appreciate the info so I know that I am on the right path.
7/25/2011 1:29 PM | Joe
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

@Joe: No problem! Glad I could help!
7/25/2011 3:21 PM | James Michael Hare
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

Fantastic stuff. I have been coding dot.net for nearly a year now and just started refactoring some of the first code i wrote. I was using alot of arraylist's and have been replacing with generic List<> methods. This article has greatly re-inforced why i started on the change. It will be pinned on my 'people smarter than me say this' cork board. (I use that when my boss questions why i am doing something a particular way.)

Its great to know i am heading in the right direction.
Cheers from
8/3/2011 6:42 PM | Sam Virtue
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

@Sam: Thanks! Yep, once you start using the generic collections you really don't feel like ever going back :-)
8/4/2011 9:15 AM | James Michael Hare
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

Great Article, i liked the way you write.
10/27/2011 2:53 AM | Siddique
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

@Siddique: Thanks!
11/1/2011 10:23 AM | James Michael Hare
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

Good Article to find all of Collections together.
Thanks!
11/7/2011 6:26 AM | MH
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

Very good post. I was searching exactly for this one today.
11/23/2011 5:44 AM | Kulbhushan
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

Great article!
Would be terrific with links to documentation (msdn or expert sites)
11/23/2011 8:22 AM | user
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

@user: Good thought, I'll work on adding MSDN links for each collection.
11/23/2011 11:04 AM | James Michael Hare
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

Thank you

Wonderful blog :)
12/6/2011 4:21 AM | ManojMathe
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

Very well explained. Great article!
12/31/2011 5:38 PM | Juan
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

James, I have a NOOB-type question. I'm tasked with creating an object that caches international currency conversion rates. The idea is that if a particular day is requested that isn't in the public cache, it makes a database call to get that day and adds it to the cache. That way if the same date is called subsequent times, the data will be available without a database call.

Given that scenario, what type of collection would you recommend looking into?

Thanks,
Ed
1/16/2012 9:48 AM | EdFromOhio
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

@Ed: Dictionary is the standard for caching, though you'll have to build more logic into it for eviction, etc. There is a Cache in .NET as well if you want a full featured cache.
1/17/2012 10:14 AM | James Michael Hare
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

Good... Keep it up. cheers

Rohit
xpode.com
2/22/2012 1:11 AM | Rohit
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

I think there is an error. If there is a way to access a SortedSet using a key, I don't know what it is. I believe it should be listed as accessible by index.
4/20/2012 9:54 AM | mse
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

In sets, the item is the key itself. So perhaps via key is a bit of a misnomer.
4/20/2012 10:06 AM | James Michael Hare
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

Really good brush up. I have a copy pinned to my desk
6/7/2012 9:45 AM | Sri
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

Very good explanation in short
11/17/2012 11:46 AM | dipak narsangani
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

Superb explanation.
Concise and to the point.
Will be using as a reference material.
12/19/2012 12:58 PM | Sandeep
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

@Sandeep: Thanks!
12/19/2012 1:48 PM | James Michael Hare
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

To be even more precise SortedDictionary uses Red/Black Tree(self balancing binary search tree) as internal structure.
1/3/2013 12:30 PM | chin
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

Further The red/black tree is implemented via a SortedList
1/3/2013 4:32 PM | chin
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

type above it is a SortedSet<T>
1/3/2013 4:34 PM | chin
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

Hi,
Very good article. What about Tuple, MultiMap, Lazy..?
1/14/2013 6:10 AM | Serge
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

Awesome article... Used to refresh my knowledge
2/21/2013 12:53 AM | Ranjith
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

Excellent! Everything one may need is on a single page! Sophisticated things made easy to understand. Thank you, sir!
3/24/2013 9:18 AM | Eugene
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

Excellent article !

Thank you very much it will be useful for me ;)
5/22/2013 7:22 AM | Cyril
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

This is awesome!! Really helpful for me and developers. Thanks for sharing with us. Following links also helped me to complete my task.

http://www.mindstick.com/Articles/ffc462a1-c822-4664-847a-3d05d80804ee/?Collections%20in%20C

http://msdn.microsoft.com/en-us/library/ybcx56wz(v=vs.80).aspx
6/21/2013 3:34 AM | Alex Leblois
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

Awesome!!! You've hit the nail on the head!!!
9/28/2013 7:37 AM | Srividya
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

I've created a few simple performance tests to see how it really performs: https://github.com/Kwull/performance-dictionary
10/10/2013 3:01 PM | Uladzimir Kazakevich
Gravatar

# re: C#/.NET Fundamentals: Choosing the Right Collection Class

I've done some writing up and comparisons of C# collections myself. You can find them at C# Collection comparison
10/21/2013 1:57 PM | quitecode
Post A Comment
Title:
Name:
Email:
Comment:
Verification:
 

Powered by: