James Michael Hare

...hare-brained ideas from the realm of software development...

Welcome to my blog! I'm a Sr. Software Development Engineer in the Seattle area, who has been performing C++/C#/Java development for over 20 years, but have definitely learned that there is always more to learn!

All thoughts and opinions expressed in my blog and my comments are my own and do not represent the thoughts of my employer.

C# Fundamentals: The Differences Between Struct and Class

This week in C# Fundamentals, we'll tackle the basic differences between C# struct and class types. Sure, this has been discussed many times by many different people, but I believe it's one of those subjects that needs to be brought up over and over again, both to help people new to the language and to refresh people who may have forgotten all the minutiae.

Introduction:

So, what is the difference between a struct and a class? Well, if you have only ever been exposed to the Java world, there is no concept of a struct, as all complex types are classes. But if you've only ever been exposed to the C++ world, you would probably answer that C++ structs and classes are identical in every way save one: the members of a struct are public by default and the members of a class are private by default.

So, with C++ and Java both counted in the ancestry of C#, which route did the designers of C# take when they went to implement class and struct? Well, it turns out that C# struck out on its own and took neither side. A C# struct is not simply a public-defaulted class (as in C++), but the designers didn't eliminate it either (as in Java).

C# is unique among the triumvirate in its differences between what it calls a struct and what it calls a class. While there are a lot of good articles and books that accurately describe the difference, there are also unfortunately a lot of generalizations and incomplete descriptions of the differences.

Repetition is the key to learning. By repeating the differences here, I hope that those who are new to C# will learn something, those who are intermediate will refresh themselves, and those who are advanced can point out any holes I may have missed.

Semantics:

First of all, it is important to note that there is a strong semantic difference in people's minds between struct and class that needs to be considered. Typically, when you tell someone you've defined a class, this tends to trigger the idea that you've created a fully-fledged object-oriented type that follows all the basic OO rules (encapsulation, polymorphism, etc.) and has its actual fields well hidden and protected.

However, when you say you've defined a struct, a lot of times people get the image of a very flat data structure with simple fields (or properties) and very few operations. That is, many people think of structs as lightweight, flat types and classes as full-blown object-oriented types.

So, regardless of what the language describes as the differences (which we will get to next section), just keep in mind that people tend to think of classes as object-oriented types and structs as flat "data-holder" types.

Syntactical Comparison:

Now, semantics aside, there are a lot of things that are similar between struct and class in C#, but there are also a fair number of surprising differences. Let’s look at a table that sums them up:

 

 

| Feature | Struct | Class | Notes |
|---|---|---|---|
| Is a reference type? | No | Yes | |
| Is a value type? | Yes | No | |
| Can have nested types (enum, class, struct)? | Yes | Yes | |
| Can have constants? | Yes | Yes | |
| Can have fields? | Yes* | Yes | Struct instance fields cannot have initializers (before C# 10); they are automatically initialized to their default values. |
| Can have properties? | Yes | Yes | |
| Can have indexers? | Yes | Yes | |
| Can have methods? | Yes | Yes | |
| Can have events? | Yes* | Yes | Structs, like classes, can have events, but care must be taken that you don't subscribe to a copy of a struct instead of the struct you intended. |
| Can have static members (constructors, fields, methods, properties, etc.)? | Yes | Yes | |
| Can inherit? | No* | Yes* | Classes can inherit from other classes (or from object by default). Structs always inherit from System.ValueType and are implicitly sealed. |
| Can implement interfaces? | Yes | Yes | |
| Can overload constructor? | Yes* | Yes | Overloading the constructor of a struct does not hide the default constructor. |
| Can define default constructor? | No* | Yes | The struct default constructor initializes all instance fields to their default values. Before C# 10 it cannot be replaced; C# 10 and later allow an explicit parameterless constructor. |
| Can overload operators? | Yes | Yes | |
| Can be generic? | Yes | Yes | |
| Can be partial? | Yes | Yes | |
| Can be sealed? | Always* | Yes | Structs are always sealed and can never be inherited from. |
| Can be referenced in instance members using the this keyword? | Yes* | Yes | In structs, this is a variable holding the value itself; in classes, it is a read-only reference. |
| Needs new operator to create instance? | No* | Yes | C# classes must be instantiated using new. Structs do not require this: while new can be used on a struct to call a constructor, you can elect not to use new and initialize the fields yourself, but you must initialize all fields and the fields must be public! |

Many of the items in this table should be fairly self-explanatory. As you can see, the majority of behavior is identical for both class and struct types. However, there are several differences which can become potential pitfalls if not respected and understood!
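One of the less obvious rows in the table is the one about this. As a small sketch (the Counter type and its members are made up purely for illustration), a struct method can assign a whole new value to this, something a class method is not allowed to do:

```csharp
using System;

// Hypothetical type, for illustration only.
public struct Counter
{
    public int Count;

    public void Increment()
    {
        Count++;
    }

    public void Reset()
    {
        // Legal in a struct: 'this' is a variable holding the value itself,
        // so the whole value can be replaced. In a class, 'this' is read-only
        // and the same line would not compile.
        this = new Counter();
    }
}

public static class Program
{
    public static void Main()
    {
        Counter c = new Counter();
        c.Increment();
        c.Increment();
        Console.WriteLine(c.Count); // 2

        c.Reset();
        Console.WriteLine(c.Count); // 0
    }
}
```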

Value vs. Reference Type

This is probably the one key difference most people will point to as the primary difference between struct and class in C#: a struct is a value type, while a class is a reference type. Not surprisingly, it is also the source of several potential pitfalls, and it has several ramifications:

With value types:

  • Value type assignments copy the whole value - this copies all members of one value to another, making two complete instances.
  • Value types passed by parameter or returned from methods/properties copy whole value - this behavior is the same as value assignment.
  • Value types assigned to an object are boxed – that is, they are surrounded by an object and then passed by reference.
  • Value types are destroyed when they pass out of scope - local variables and parameters are typically cleaned up when scope is exited, members of an enclosing type are cleaned up when the enclosing type is cleaned up.
  • Value types may be created on the stack or heap as appropriate - typically parameters and locals are created on the stack, members of an enclosing class are typically on the heap.
  • Value types cannot be null - default value of members that are primitive is zero, default value of members that are a struct is an instance with all struct members defaulted.
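As a rough sketch of two of these points (default values and boxing), consider the snippet below. The Point type here is just a stand-in defined for the example:

```csharp
using System;

// Stand-in value type for the example.
public struct Point
{
    public int X;
    public int Y;
}

public static class Program
{
    public static void Main()
    {
        // default value: every field is zeroed, a struct variable is never null
        Point origin = default(Point);
        Console.WriteLine("[{0},{1}]", origin.X, origin.Y); // [0,0]

        // boxing: assigning to object wraps a COPY of the value in a heap object
        Point p = new Point { X = 5, Y = 10 };
        object boxed = p;

        // changing the original afterwards does not touch the boxed copy
        p.X = 0;

        Point unboxed = (Point)boxed;
        Console.WriteLine("[{0},{1}]", unboxed.X, unboxed.Y); // [5,10]
    }
}
```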

In contrast, with reference types:

  • Reference type assignment only copies the reference - no object is copied, and no reference counts are kept; the garbage collector works by tracing which objects are still reachable.
  • Reference types passed by parameter or returned from methods/properties pass a reference - the reference is copied, but both references refer to the same original object.
  • Reference types are only destroyed when garbage collected - that is, some time after the object is determined to be unreachable.
  • Reference types are generally on the heap - it’s always possible the compiler or runtime may optimize, but in general you should always think of them as heap objects.
  • Reference types can be null - the default value of members that are reference types is null.

So what does all this mean? Remember that in C#, as in Java, all classes are reference types that are garbage collected. Therefore, any time you assign a reference, pass a reference, and so on, you are really passing a handle to the original object. This would be roughly similar in C++ to creating all your objects on the heap using the new keyword and wrapping them in a Boost (or C++11) shared_ptr. The difference is that shared_ptr counts references and deletes the object immediately when the count goes to zero, whereas in Java and C# nothing is counted: an unreachable object simply becomes eligible for collection at the next collection cycle.

Potential Pitfall: Struct Assignments Create Copy

Because structs are value types, any assignment of a struct makes a complete copy of the value. Consider the following code snippet:

    using System;

    public class PointClass
    {
        public int X { get; set; }
        public int Y { get; set; }
    }

    public struct PointStruct
    {
        public int X { get; set; }
        public int Y { get; set; }
    }

    public static class Program
    {
        public static void Main()
        {
            // assignment from one reference to another simply copies the reference.
            // In this case both variables refer to one single object
            var pointClass = new PointClass { X = 5, Y = 10 };

            var copyOfPointClass = pointClass;

            // modifying through one reference is visible through all references to the same object
            copyOfPointClass.X = 0;

            // this will output [0,10] even though it's our original reference because they are both referring to the same object
            Console.WriteLine("Original pointClass is [{0},{1}]", pointClass.X, pointClass.Y);

            // ...

            // assignment from one struct to another makes a complete copy, so there are two separate PointStruct each
            // with their own X and Y
            var pointStruct = new PointStruct { X = 5, Y = 10 };

            var copyOfPointStruct = pointStruct;

            // modifying the copy changes only the copy
            copyOfPointStruct.X = 0;

            // the output will be [5,10] because the original pointStruct is unaffected by changes to copyOfPointStruct
            Console.WriteLine("Original pointStruct is [{0},{1}]", pointStruct.X, pointStruct.Y);
        }
    }

Note that in the case of the class, when you modify the object that change is visible to all references to the same object. But with structs, even though the syntax is identical, each variable is a complete and separate point and assignments simply copy over all the fields from one separate instance to another.

Potential Pitfall: Structs Are Passed/Returned by Value

It’s actually misleading to say that C# is a pass-by-reference language. If this were true, you could pass an int into a method, change that int, and have the original be modified. C# is really a pass-by-value language. Unless you explicitly ask otherwise with ref or out, all arguments are passed by value, even references to objects.

Why then, can we modify an object when it’s passed into a method? The explanation is fairly simple. Keep in mind that in C#, when you pass an object into a method, you don’t pass the object itself but its reference, and that reference is passed by value. You can think of this as pass-by-reference-by-value for reference types. Want proof? Try this:

    using System;
    using System.Collections.Generic;

    public static class Program
    {
        public static void Main()
        {
            List<string> fruits = new List<string> { "Apple", "Orange", "Banana" };

            ChangeListToVegetables(fruits);

            fruits.ForEach(fruit => Console.WriteLine(fruit));
        }

        public static void ChangeListToVegetables(List<string> list)
        {
            // re-points only the local copy of the reference
            list = new List<string> { "Carrot", "Asparagus", "Broccoli" };
        }
    }

If we run this, we don't get back a list of vegetables, but our original list of fruit: Apple, Orange, Banana.

Why? Because even though List<T> is a reference type, what we pass into the function is not really the list itself, but a reference to the list instance called fruits. When fruits is passed into ChangeListToVegetables(), a new local variable named list is created which holds a copy of that reference, so both variables refer to the same object.

Now, you assign list to a new list of vegetables instead. This makes the local variable list refer away from the list instance that contains fruits and refer to the list object containing vegetables instead. But note that fruits is unaffected by this and still refers to the original object. If C# were truly pass-by-reference, it would have changed fruits, but it doesn’t because the reference itself is passed by value (i.e. as a copy of the reference).

Think of it this way. References are like address labels in a way. When you pass an address label to a method for processing, someone there makes a copy of it so you can have your address label back. They can then use that address label to deliver data to your object, but they can’t change the address on your label because they only have their own copy!

In C++, this would be like passing a pointer by value into a function. You can change the contents of the memory that the pointer points to, but you cannot change the address of the original pointer passed in.

Thus, we could do this:

    using System;
    using System.Collections.Generic;

    public static class Program
    {
        public static void Main()
        {
            List<string> fruits = new List<string> { "Apple", "Orange", "Banana" };

            ChangeListToVegetables(fruits);

            fruits.ForEach(fruit => Console.WriteLine(fruit));
        }

        public static void ChangeListToVegetables(List<string> list)
        {
            // modifies the object that both references refer to
            list.Clear();
            list.Add("Carrot");
            list.Add("Asparagus");
            list.Add("Broccoli");
        }
    }

And we would get the vegetables output because we didn’t modify the reference, we modified what the reference referred to! Note that if you really wanted, you can change the value of a reference passed into a method by making it a ref or out parameter, which in C++ terms would be like passing a reference to a pointer (or a pointer to a pointer).
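To make the ref case concrete, here is a minimal sketch of the first (reassigning) version of the method with the parameter changed to ref. Apart from the added ref keywords it mirrors the earlier listing:

```csharp
using System;
using System.Collections.Generic;

public static class Program
{
    public static void Main()
    {
        List<string> fruits = new List<string> { "Apple", "Orange", "Banana" };

        // the caller must also say 'ref', which makes the aliasing explicit
        ChangeListToVegetables(ref fruits);

        // prints Carrot, Asparagus, Broccoli
        fruits.ForEach(item => Console.WriteLine(item));
    }

    public static void ChangeListToVegetables(ref List<string> list)
    {
        // 'list' is now an alias for the caller's variable, so this
        // assignment re-points the caller's reference as well
        list = new List<string> { "Carrot", "Asparagus", "Broccoli" };
    }
}
```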

So why this long, circuitous talk on parameter passing? It’s to illustrate one of the major differences between struct and class: when you pass a struct instance into a method, or return a struct instance from a method or a property, you are passing/returning by value, and since structs are value types (not reference types), that means you create a copy of the whole struct.

In other words, any changes you make to a struct passed into a method or returned from a method or property will only affect the copy of that value and not the original.

Remember that example before with the List<string> and how we could modify it (though we couldn’t point away from it) in the method? Well let’s try that with a struct:

    using System;

    public struct Point
    {
        public int X { get; set; }
        public int Y { get; set; }
    }

    public static class Program
    {
        public static void Main()
        {
            Point p = new Point { X = 5, Y = 10 };
            ChangeToOrigin(p);
            Console.WriteLine("p is now: [{0},{1}]", p.X, p.Y);
        }

        public static void ChangeToOrigin(Point p)
        {
            // this p is a copy of the caller's struct
            p.X = 0;
            p.Y = 0;
        }
    }

If we run this, p is still 5,10. Why? Because we don’t pass a reference to the original struct, we pass a copy of the WHOLE struct. Thus when we modify X and Y in ChangeToOrigin(), we are really modifying the X and Y property of the copy of p in the function.

Once again, if you really want to modify a struct passed into a method, you can pass by ref or out.
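As a sketch of what that looks like, here is the same example with the parameter declared as ref. Only the two ref keywords differ from the listing above:

```csharp
using System;

public struct Point
{
    public int X { get; set; }
    public int Y { get; set; }
}

public static class Program
{
    public static void Main()
    {
        Point p = new Point { X = 5, Y = 10 };

        // 'ref' passes the caller's variable itself rather than a copy
        ChangeToOrigin(ref p);

        Console.WriteLine("p is now: [{0},{1}]", p.X, p.Y); // p is now: [0,0]
    }

    public static void ChangeToOrigin(ref Point p)
    {
        // these writes land in the caller's struct
        p.X = 0;
        p.Y = 0;
    }
}
```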

Potential Pitfall: Structs Returned from Property Getters are Copies

This creates a pitfall: a property getter that returns a struct cannot be used to modify one of the members of that struct. Trying to do so is a compile error, because the compiler realizes you’re changing a temporary copy that will immediately be thrown away. Consider the following snippet:

    public struct Point
    {
        public int X { get; set; }
        public int Y { get; set; }
    }

    public struct Rectangle
    {
        public Point UpperLeft { get; set; }
        public Point LowerRight { get; set; }
    }

    public static class Program
    {
        public static void Main()
        {
            Rectangle r = new Rectangle
                {
                    UpperLeft = new Point { X = 5, Y = 5 },
                    LowerRight = new Point { X = 10, Y = 2 }
                };

            // This is a compile error (CS1612) because UpperLeft returns a COPY of UpperLeft,
            // which can't modify the original. Uncomment to see the error:
            // r.UpperLeft.X = 0;

            // if you really wanted to change UpperLeft, you'd have to change it all at once:
            r.UpperLeft = new Point { X = 0, Y = 5 };
        }
    }

Note that r.UpperLeft returns a copy of the Point, and for us to call a setter on a member of that copy is logically unsound because that copy is immediately discarded if not assigned to something, so we’d in essence be modifying something that was about to be thrown away.
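If you only want to change one member, one possible workaround (a sketch, reusing the same Point and Rectangle shapes) is to make the copy explicit: read the property into a local variable, modify the local, and assign it back.

```csharp
using System;

public struct Point
{
    public int X { get; set; }
    public int Y { get; set; }
}

public struct Rectangle
{
    public Point UpperLeft { get; set; }
    public Point LowerRight { get; set; }
}

public static class Program
{
    public static void Main()
    {
        Rectangle r = new Rectangle
        {
            UpperLeft = new Point { X = 5, Y = 5 },
            LowerRight = new Point { X = 10, Y = 2 }
        };

        // take an explicit copy; a local is a real variable, so it can be modified
        Point upperLeft = r.UpperLeft;
        upperLeft.X = 0;

        // the rectangle has not changed yet, only the local copy has
        Console.WriteLine(r.UpperLeft.X); // 5

        // write the modified copy back through the setter
        r.UpperLeft = upperLeft;
        Console.WriteLine(r.UpperLeft.X); // 0
    }
}
```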

If these had been created with classes instead of struct, it would have worked because we wouldn’t return back a copy of the Point, but a (copy of a) reference to the original Point, which we can of course modify since it will continue to exist.

Note that this also means that if you try to subscribe to an event in a struct returned from a property, you are subscribing to the copy and not to the original. The same is true of any other copy of a struct, since each copy carries its own list of event handlers:

    using System;

    public struct Point
    {
        private int _x;
        private int _y;

        public event System.EventHandler Changed;

        public int X
        {
            get { return _x; }
            set { _x = value; OnChanged(); }
        }

        public int Y
        {
            get { return _y; }
            set { _y = value; OnChanged(); }
        }

        private void OnChanged()
        {
            if (Changed != null)
            {
                Changed(this, new System.EventArgs());
            }
        }
    }

    public static class Program
    {
        public static void Main()
        {
            var firstPoint = new Point { X = 5, Y = 10 };
            var secondPoint = firstPoint;

            firstPoint.Changed += (s, e) => Console.WriteLine("A point has changed");

            // note that event handlers are only on the individual copy you subscribe to,
            // so this change to firstPoint raises the event...
            firstPoint.X = 0;

            // ...but this change to secondPoint prints nothing, because secondPoint
            // is a separate copy that never had a handler attached.
            secondPoint.X = 0;
        }
    }

 

Potential Pitfall: Structs Always Have Implicit Default Constructor

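Remember that with classes, if you do not define a constructor, C# generates a default (parameterless) one that initializes all members to their default values. As soon as you define any constructor of your own on a class, that generated default constructor goes away. Structs behave differently: their implicit default constructor is always available, no matter which other constructors you define: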

    using System;

    public struct PointStruct
    {
        public int X { get; private set; }
        public int Y { get; private set; }

        // creating a non-default constructor in a struct and not
        // specifying a default does NOT remove default constructor
        // on struct. Chaining to this() zeroes the auto-property backing
        // fields first, which older compilers require before using the setters.
        public PointStruct(int x, int y) : this()
        {
            X = x;
            Y = y;
        }
    }

    public class PointClass
    {
        public int X { get; private set; }
        public int Y { get; private set; }

        // creating a non-default constructor and not specifying a default
        // removes the default constructor from a class
        public PointClass(int x, int y)
        {
            X = x;
            Y = y;
        }
    }

    public static class Program
    {
        public static void Main()
        {
            // compile error, PointClass() default constructor does not exist.
            // Uncomment to see the error:
            // var pointClass = new PointClass();

            // not an error, PointStruct() default constructor cannot be removed.
            var pointStruct = new PointStruct();
        }
    }

So what does the default constructor do for a struct? It initializes all members to their default values: null for reference members and zero for primitive members. Interestingly enough, you don't have to call a constructor for a struct at all. Calling the constructor simply initializes all members for you. You can instantiate a struct without a constructor call, but you must assign every field yourself before you can use it, which in practice means all fields must be public (in my mind this is enough to warrant never doing this):

    using System;

    public struct Point
    {
        public int X;
        public int Y;
    }

    public static class Program
    {
        public static void Main()
        {
            // notice no constructor call, before you can use this, you must set all fields
            // or you'll get a compile error, this requires that all fields (not properties, fields!)
            // be public or you can't access them to set them!
            Point p;

            p.X = 0;
            p.Y = 5;

            // if X or Y had not been set above, accessing p would be a compile error
            Console.WriteLine("Point is: [{0},{1}]", p.X, p.Y);
        }
    }
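Zeroed struct values can also show up without any new on the struct at all. As a short sketch (reusing the PointStruct shape from earlier, with : this() added so it compiles on older compilers too), both default(T) and the elements of a freshly allocated array are fully defaulted, and the two-argument constructor is never involved:

```csharp
using System;

public struct PointStruct
{
    public int X { get; private set; }
    public int Y { get; private set; }

    public PointStruct(int x, int y) : this()
    {
        X = x;
        Y = y;
    }
}

public static class Program
{
    public static void Main()
    {
        // default(T) yields the all-zero value of the struct
        PointStruct viaDefault = default(PointStruct);

        // every element of a new array starts out as the all-zero value too
        PointStruct[] points = new PointStruct[3];

        Console.WriteLine("[{0},{1}]", viaDefault.X, viaDefault.Y); // [0,0]
        Console.WriteLine("[{0},{1}]", points[2].X, points[2].Y);   // [0,0]
    }
}
```

This is one more reason to design a struct so that its all-zero state is a valid value.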

 

Potential Pitfall: Structs Cannot (Easily) Change Default Values Of Members

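An interesting corollary of the implicit default constructor and the fact that struct instance members can't be initialized is that there is no way to automatically set a member in a struct to any value other than the default. (This describes C# before version 10, which added struct field initializers and explicit parameterless constructors.)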

For example, say you had a struct called Rectangle and it had a bool field called _isVisible and you wanted to default this to true.  There is no easy way to do this with a struct:

    public struct Rectangle
    {
        // compile error before C# 10, can't initialize an instance member of a struct
        public bool _isVisible = true;

        // compile error before C# 10, can't change the default constructor of a struct
        public Rectangle()
        {
            _isVisible = true;
        }
    }

Because there is no way to hide the default constructor for a struct, and (before C# 10) no way to change it or to initialize a struct field, you are pretty much stuck with the default value of false. Even in newer versions, default(Rectangle) and array elements bypass any constructor and are still zeroed. There are patterns you could use to get around this, but they really convolute the type:

    public struct Rectangle
    {
        // backing fields are private; callers go through IsVisible
        private bool _isVisible;
        private bool _isInitialized;

        public bool IsVisible
        {
            get
            {
                // lazily apply the "real" default the first time the value is read
                if (!_isInitialized)
                {
                    _isInitialized = true;
                    _isVisible = true;
                }

                return _isVisible;
            }

            set
            {
                _isInitialized = true;
                _isVisible = value;
            }
        }
    }
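A simpler alternative, sketched below under the assumption that a single flag is all you need, is to store the inverse of the value so that the zeroed state already means what you want. The _isHidden field name is invented for this example:

```csharp
using System;

public struct Rectangle
{
    // stored inverted (hypothetical field name): the zeroed default, false,
    // then means "not hidden", i.e. visible
    private bool _isHidden;

    public bool IsVisible
    {
        get { return !_isHidden; }
        set { _isHidden = !value; }
    }
}

public static class Program
{
    public static void Main()
    {
        Rectangle r = new Rectangle();
        Console.WriteLine(r.IsVisible); // True

        r.IsVisible = false;
        Console.WriteLine(r.IsVisible); // False

        // works for default(T) as well, since no constructor is needed
        Console.WriteLine(default(Rectangle).IsVisible); // True
    }
}
```

This avoids the extra field and the getter that mutates state, at the cost of a slightly less obvious field name.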

 

Potential Pitfall: Structs Can Worsen Performance if Incorrectly Used

A lot of times, people hear that structs are deterministic in terms of clean-up, and thus think that they should be used all over the place to save pressure on the garbage collector. But this is not always the case!

Once again I implore you to consider the rules of optimization:

  1. For novices: don't optimize.
  2. For experts: don't optimize yet.

A lot of times, if you come out of the gate optimizing, you can actually hurt your performance. Keep in mind that structs are always assigned, passed, and returned by value (unless you use ref or out). This means that if you overuse struct you can be creating a ton of copies, which takes time and space, especially if your struct size is non-trivial (larger than 16 bytes)! Whereas with classes, whenever you assign one reference to another, or pass or return a class instance, you are only copying a reference (4 bytes in a 32-bit process, 8 bytes in a 64-bit one), so the cost of passing, assigning, and returning class instances is very cheap (which is why in C++ people tend to pass large values by const &).
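To put rough numbers on this, the sketch below compares the size of a reference with the size of two made-up structs. Small and Large are hypothetical types, and Marshal.SizeOf reports the marshaled size, which is used here only as a reasonable proxy for simple structs like these:

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical types, for illustration only.
public struct Small
{
    public int X;
    public int Y;
}

public struct Large
{
    public long A;
    public long B;
    public long C;
    public long D;
}

public static class Program
{
    public static void Main()
    {
        // size of a reference: 4 in a 32-bit process, 8 in a 64-bit process
        Console.WriteLine("Reference: {0} bytes", IntPtr.Size);

        // every assignment or by-value pass of these copies this many bytes
        Console.WriteLine("Small: {0} bytes", Marshal.SizeOf<Small>()); // Small: 8 bytes
        Console.WriteLine("Large: {0} bytes", Marshal.SizeOf<Large>()); // Large: 32 bytes
    }
}
```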

This means that in many ways passing and using a class can actually be more performant.  The key is to really know the differences of what each does and when each makes sense.  For 99% of the time, though, you should favor class unless you have a very compelling reason not to.

Summary and Recommendations:

So when does a struct make sense?  Keep in mind once again that I'd say your default frame of mind should be to always start with a class unless you have an overriding reason.  Microsoft made the garbage collector extremely performant and you should feel confident relying on it.  That said, here are some guidelines straight from Microsoft's Value Type Recommendations:

  • structs should be very small types - typically 16 bytes (roughly 4 int-sized members) or less, which keeps copy times down.
  • structs should be immutable - because structs are copied, you save yourself a lot of heartache if you just make them immutable (once assigned, cannot be changed, like System.DateTime).
  • structs should have value semantics - a struct should logically represent a single value, a flat piece of data, whereas with class we think about a full-featured object-oriented type.
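As one possible illustration of these guidelines taken together (a sketch, not the only way to do it), here is a small immutable struct. The WithX helper is an invented name for a method that returns a modified copy instead of changing the value in place:

```csharp
using System;

public struct Point
{
    // two ints: 8 bytes, well under the 16-byte guideline
    private readonly int _x;
    private readonly int _y;

    public Point(int x, int y)
    {
        _x = x;
        _y = y;
    }

    public int X { get { return _x; } }
    public int Y { get { return _y; } }

    // "mutators" return a new value instead of changing this one
    public Point WithX(int x)
    {
        return new Point(x, _y);
    }
}

public static class Program
{
    public static void Main()
    {
        Point p = new Point(5, 10);
        Point moved = p.WithX(0);

        Console.WriteLine("[{0},{1}]", p.X, p.Y);         // [5,10]
        Console.WriteLine("[{0},{1}]", moved.X, moved.Y); // [0,10]
    }
}
```

Because nothing can change after construction, it no longer matters whether you are holding the original or a copy, which sidesteps most of the pitfalls above.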

Once again, I truly believe your first choice should be to lean towards class and not over-optimize. But if you do find yourself needing to add some performance, you have a type that follows the guidelines above, and you're afraid you're allocating so many instances that you're adding undue pressure on the garbage collector, then consider the struct, but consider it carefully. It's a valuable tool and a good part of the language, but it has its own set of pitfalls and behaviors that you should understand so that you choose it when it best fits.

Posted on Thursday, July 29, 2010 8:57 PM | Filed Under [ My Blog, C#, Software, .NET, Fundamentals, Little Pitfalls ]