James Michael Hare

Welcome to my blog! I'm a Sr. Software Development Engineer in the Seattle area, who has been performing C++/C#/Java development for over 20 years, but have definitely learned that there is always more to learn!

All thoughts and opinions expressed in my blog and my comments are my own and do not represent the thoughts of my employer.


10 Things C++ Developers Learning C# Should Know

After taking a lot of time with C# fundamentals, I decided to go down a different road this week and talk about some of the differences in C# that can be troublesome to people who are used to C++ development but are learning C#. 

My first post on this blog, months ago, was a simple piece on how I divorced C++ as my first-love language (here).  This is not to say that C++ is no longer a valuable language; in fact, as far as object-oriented languages go, C++ is still king of performance.  That said, C++ has a lot of quirks and idiosyncrasies due to its age and backward-compatibility with C, and these have caused it to take a back seat to C# and Java for business programming, where time-to-market and maintainability are more important than having the absolute best performance.

C# especially has made huge inroads into the business development market because it offers a lot of the power of C++ without a lot of the danger.  The JIT does a very good job of optimizing the code so much so that in most business programs the performance is very near that of C++, and more importantly it's easier for those less experienced to develop and maintain code that is quicker to market and much less error-prone.

So, given this industry direction, a lot of C++ developers have begun to train themselves in Java and/or C# as well.  The problem is that C++ developers grow used to certain techniques and assumptions that just don't carry over completely to C#.

This article does not describe all the differences and even skips some obvious ones (single vs. multiple inheritance, for example), nor are these necessarily the most important.  It's just a set of 10 I've been thinking about, and more articles may follow if people find these useful.

1. Garbage Collection is non-deterministic

This one should be obvious, so I put it right up front.  A lot of times the idea of a Garbage Collector watching after you rubs C++ developers the wrong way, but it truly is there to protect and to serve you.  In addition, with .NET 4.0 the GC has become even more optimized to the point that, unless you are creating tons of transient objects, you should barely notice it -- and in those rare cases you do actually notice it there are things you can do to reduce pressure on the GC (such as re-using class instances or using structs).

In C++, objects are either deleted explicitly (for example by calling delete) if they were dynamically allocated, or are destroyed implicitly if they were created on the stack and go out of scope.  In C#, however, things work differently.  C# divides the type system into reference types and value types.  Value types include all the primitives, enums, and structs; reference types include all classes (as well as interfaces, delegates, and arrays).

Basically, value types are always cleaned up when their scope is destroyed (note that it is a mistake to say value types are always on the stack; if a value type is a member of a reference type, obviously it will be on the heap).  Reference types, however, are nearly always created on the heap (the only real exceptions are if the JIT optimizes it in some way to use the stack).  The CLR does not keep a count of references to each object; instead, the Garbage Collector periodically traces which objects are still reachable from live references.

This means that an object is not cleaned up at the moment all references to it pass out of scope.  In C++0x or in the Boost library, reference-counted smart pointers will clean up dynamic memory immediately when the last reference is removed.  In C#, however, the reality is much more complex than that.  All that you know for certain is that sometime after an object is no longer reachable, the object will be cleaned up.

Now, this may sound somewhat loose, but the reality is the Garbage Collector is really very good at finding out when memory is no longer in use, and it uses a generational algorithm to classify objects with their expected lifetimes and promote them to a next generation if needed.  This is so that it spends most of its time cleaning up short-term, transient objects efficiently and leaves longer-lived objects around for later and rarer in-depth collection cycles.  The end result is that you have a highly optimized GC that you should barely ever notice or care about.

As such, you should never assume you know when an object will be collected, and you should avoid attempting to force the GC to collect early, as this often interferes with its very efficient algorithm.  If you do wish to release resources deterministically in your class, implement the IDisposable interface.  This allows you to use your class in using blocks, which call Dispose automatically when the block exits.
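As a minimal sketch of that idea (the TempResource class and its messages are hypothetical, invented only for illustration), a class that implements IDisposable can be wrapped in a using block so that cleanup happens at a known point rather than whenever the GC gets around to it:

```csharp
using System;

// Hypothetical resource wrapper, used only for illustration.
public sealed class TempResource : IDisposable
{
    private bool _disposed;

    public TempResource()
    {
        Console.WriteLine("Resource acquired");
    }

    public void Dispose()
    {
        // Dispose may be called more than once, so make it idempotent.
        if (_disposed) return;
        _disposed = true;
        Console.WriteLine("Resource released");
    }
}

public static class Program
{
    public static void Main()
    {
        using (var resource = new TempResource())
        {
            Console.WriteLine("Working with the resource");
        }   // Dispose() is called here, even if the block exits via an exception

        Console.WriteLine("After the using block");
    }
}
```

The compiler expands the using block into a try/finally that calls Dispose, which is about as close as C# gets to a C++ stack object's destructor running at end of scope.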

2. Structs are not "public classes"

In C++, structs and classes are nearly identical, the only difference being that struct in C++ has public access by default, and the class is private by default.  Otherwise, they are identical in C++, they can both have constructors, methods, destructors, etc.  In C#, though, this is not the case.

In C# classes are reference types which are garbage collected, and structs are value types which are immediately collected when their enclosing scope or type is destroyed.  But more than that, in C# structs are meant to be very light-weight complex types that behave like primitive values in that they are passed by value and copied on assignment.
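A small sketch of the copy-on-assignment difference may help here.  The PointStruct and PointClass types are made up for this illustration; they are identical except for the struct/class keyword:

```csharp
using System;

// Two hypothetical types that differ only in being a struct vs. a class.
public struct PointStruct { public int X; }
public class PointClass { public int X; }

public static class Program
{
    public static void Main()
    {
        var s1 = new PointStruct { X = 1 };
        var s2 = s1;        // assignment copies the whole value
        s2.X = 99;          // changes only the copy

        var c1 = new PointClass { X = 1 };
        var c2 = c1;        // assignment copies the reference
        c2.X = 99;          // changes the one shared object

        Console.WriteLine(s1.X);   // prints 1
        Console.WriteLine(c1.X);   // prints 99
    }
}
```

In C++ both versions would behave like the struct case, since assignment copies the object unless you explicitly use pointers or references.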

Really, there are a lot of differences and they should not be confused or considered near-identical (as in C++).  For the complete set of differences, you can visit an older C# Fundamentals post of mine about The Differences Between Struct and Class, but I'll re-copy a table from that post here that sums it up pretty well.

 

| Feature | Struct | Class | Notes |
|---|---|---|---|
| Is a reference type? | No | Yes* | |
| Is a value type? | Yes | No | |
| Can have nested types (enum, class, struct)? | Yes | Yes | |
| Can have constants? | Yes | Yes | |
| Can have fields? | Yes* | Yes | Struct instance fields cannot be initialized, will automatically initialize to default value. |
| Can have properties? | Yes | Yes | |
| Can have indexers? | Yes | Yes | |
| Can have methods? | Yes | Yes | |
| Can have events? | Yes* | Yes | Structs, like classes, can have events, but care must be taken that you don’t subscribe to a copy of a struct instead of the struct you intended. |
| Can have static members (constructors, fields, methods, properties, etc.)? | Yes | Yes | |
| Can inherit? | No* | Yes* | Classes can inherit from other classes (or object by default). Structs always inherit from System.ValueType and are sealed implicitly. |
| Can implement interfaces? | Yes | Yes | |
| Can overload constructor? | Yes* | Yes | Struct overload of constructor does not hide default constructor. |
| Can define default constructor? | No | Yes | The struct default constructor initializes all instance fields to default values and cannot be changed. |
| Can overload operators? | Yes | Yes | |
| Can be generic? | Yes | Yes | |
| Can be partial? | Yes | Yes | |
| Can be sealed? | Always* | Yes | Structs are always sealed and can never be inherited from. |
| Can be referenced in instance members using this keyword? | Yes* | Yes | In structs, this is a value variable; in classes, it is a readonly reference. |
| Needs new operator to create instance? | No* | Yes | C# classes must be instantiated using new. However, structs do not require this. While new can be used on a struct to call a constructor, you can elect not to use new and init the fields yourself, but you must init all fields and the fields must be public! |

 

3. Do not ignore C# warnings!

We've all had that bit of legacy C++ code that is laced with warnings that we promise ourselves we will clean up someday.  Usually, these warnings tend to grow and grow to where we just find ourselves ignoring them.  If you are one of those folks who diligently clean up your C++ warnings, congratulations and bravo!  If you're like the other 95% of us though, you should shift your attitude about warnings in C#.

C# generally only warns you when something is potentially wrong (like the accidental implicit hiding in #4 below).  Thus you should always, always, always clean up your C# warnings and, if possible, turn on the "Warnings as Errors" compiler option.  This treats every warning as an error and will not allow the build to complete until it is fixed.

There really are no ignorable warnings from the C# compiler, so get in the habit of fixing them as they occur.  The good news is that the C# compiler emits far, far fewer warnings than a typical C++ compiler.

4. You must be explicit when you override, otherwise you hide

In C++, when you want a method to be overridable in a sub-class, you declare it to be virtual.  This allows the runtime to defer decision on which implementation to call until runtime when the actual object passed in can be examined:

    #include <iostream>
    using namespace std;

    class Shape
    {
    public:
        virtual void Draw() { cout << "Drawing a Shape" << endl; }
    };

    class Rectangle : public Shape
    {
    public:
        // overriding the Shape::Draw method
        // repeating the virtual keyword is not necessary in C++
        void Draw() { cout << "Drawing a Rectangle" << endl; }
    };

    void DrawShapeOnCanvas(Shape* pShape)
    {
        // ...
        // if this were not virtual, it would always call Shape::Draw()
        // but since it is virtual, the runtime will look at what type
        // pShape actually points to (Shape or Rectangle) and call the
        // appropriate Draw()
        pShape->Draw();
    }

In C#, this is mostly true as well.  You can use the virtual keyword to mark a method as virtual and hence make it overridable (though interestingly enough, in Java instance methods are overridable by default and the virtual keyword is not used, thus there is no hiding, per se).  The difference in C# is that, unlike C++, the default behavior when you declare a method with the same signature in a sub-class is not to override but to hide:

    using System;

    public class Shape
    {
        // In C#, you also use virtual to mark that a method can be
        // overridden in a sub-class
        public virtual void Draw()
        {
            Console.WriteLine("Drawing Shape");
        }
    }

    public class Rectangle : Shape
    {
        // however, unlike C++, the default behavior is to hide, not override
        // (the compiler warns that this hides the inherited Shape.Draw)
        public void Draw()
        {
            Console.WriteLine("Drawing Rectangle");
        }
    }

    public static class Program
    {
        public static void Main()
        {
            Rectangle r = new Rectangle();
            Shape s = r;

            // displays Drawing Rectangle because Rectangle's Draw
            // hides Shape's Draw
            r.Draw();

            // displays Drawing Shape: through a Shape reference,
            // Shape's original Draw is exposed
            s.Draw();
        }
    }

 So in C# when you go to override, remember to use the override keyword, otherwise you're just hiding:

    public class Rectangle : Shape
    {
        public override void Draw()
        {
            // ...
        }
    }

The nice thing is, C# gives you a warning on the implicit hiding, which, if you obey point #3, you will fix.  For more information, I have a blog post on the dangers of implicit hiding here.

5. Generics are not quite Templates

It's very tempting to think that C# generics are the same as C++ templates.  True, they both allow you to write more generic code by allowing you to specify a type parameter, but there are a few notable differences.

C++ templates allow anything to be substituted as long as the result compiles.  This means you can call a method or perform an operation on the type parameter, and as long as the type supplied at compile time supports that usage it will succeed; otherwise the compile will fail.

    template <typename T> T Increment(const T& argument)
    {
        // this will only compile if there is an operator +
        // defined between T and int.
        return argument + 1;
    }

Also with C++ templates, a copy of the template code is created for every type passed to it.  Thus a list<int>, list<string>, and list<char> end up producing 3 distinct copies of the list<T> code.

With C# generics, you cannot perform any operation against the type parameter that isn't available on the least common denominator.  If you do not have a where clause restricting the type, this only allows you to access methods available on object.  If you do use a where clause, you can access any methods and operations available on the type specified in the where clause.
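To make the where clause concrete, here is a minimal sketch.  The Util class and its Max method are hypothetical names chosen for this example; the constraint on IComparable<T> is what makes the CompareTo call legal:

```csharp
using System;

public static class Util
{
    // Without the constraint, first.CompareTo(second) would not compile,
    // because only object's members are visible on an unconstrained T.
    public static T Max<T>(T first, T second) where T : IComparable<T>
    {
        return first.CompareTo(second) >= 0 ? first : second;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(Util.Max(3, 7));                // prints 7
        Console.WriteLine(Util.Max("apple", "banana"));   // prints banana
    }
}
```

Contrast this with the C++ Increment template: there the compiler checks each use of the template against the actual type, while in C# the generic method itself must compile against whatever the constraints promise.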

This can cause some interesting behavior:

    using System;

    public static class Comparer<T> where T : class
    {
        public static bool AreSame(T first, T second)
        {
            // with only a class constraint, this binds to object's ==
            return first == second;
        }
    }

    public static class Program
    {
        public static void Main()
        {
            string first = "This";
            string second = "his";

            Console.WriteLine("Are these the same: " +
                Comparer<string>.AreSame(first, "T" + second));
        }
    }

Notice above that T is only constrained to be a reference type (where T : class); without that constraint the == would not even compile, because the compiler cannot know whether T supports it.  With the constraint, all the compiler knows about T is that it is some kind of object, so any methods we call or operators we use are resolved against object.  Thus we get object's == (which is a reference compare) instead of string's == (which checks for string equality), and the program prints False even though both strings contain "This".  Overriding still applies (if we had used .Equals instead of ==, it would have worked), but overloading does not (all operators are overloads, not overrides).
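For comparison, here is a sketch of an Equals-based version.  The ValueComparer name is made up for this example; EqualityComparer<T>.Default is the standard framework helper that routes to the type's own equality logic (IEquatable<T> or the overridden Equals) and also copes with nulls:

```csharp
using System;
using System.Collections.Generic;

public static class ValueComparer<T>
{
    public static bool AreSame(T first, T second)
    {
        // Uses the type's own Equals (a virtual override), not operator ==,
        // so no constraint on T is needed.
        return EqualityComparer<T>.Default.Equals(first, second);
    }
}

public static class Program
{
    public static void Main()
    {
        string second = "his";

        // Two distinct string objects with the same contents.
        Console.WriteLine(ValueComparer<string>.AreSame("This", "T" + second)); // prints True
        Console.WriteLine(ValueComparer<int>.AreSame(4, 4));                    // prints True
    }
}
```

Because Equals is a virtual method, the call is resolved against the runtime type, which is exactly the overriding-versus-overloading distinction described above.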

Also, with C# generics, you will get one copy of the generic code for each value type it is called with, but all reference types share the same copy of the code, since all references take up exactly the same size.  Hence, a List<int>, List<double>, List<string>, List<Stopwatch> only creates 3 copies of the generic code since string and Stopwatch are both reference types and share the same code generation.

6. Even primitives are objects

This may come as a shock to C++ developers, but everything is an object in C#; yes, even non-reference types.  This means that every int, every double, every struct, and every class instance is an object.  This means they can be converted to/from object using typical casting:

    // storing a value type as a reference type is called boxing
    object x = 7;

    // taking a reference type back to a value type is unboxing
    int y = (int)x;

    // beware of non-exact casts when unboxing though:
    // this throws because x is an int, not a double. When you
    // unbox you have to know the exact type to pull out; you can't
    // convert as part of the unbox
    double d = (double)x;    // throws InvalidCastException

    // this, however, will work because you are unboxing and then converting
    double d2 = (double)(int)x;

Note that I mean it when I say primitives are objects: they aren't just boxed to become object, they are fully considered objects (value type objects, of course).  Boxing/unboxing only occurs when you convert a value type to a reference type, such as object or an interface the value type implements.  If you want proof that primitives are considered objects, note that you can call object methods directly off primitive literals or instances:

    // can call Int32's ToString method off a primitive literal
    string s = 425.ToString();

7. Don't over optimize, you may defeat yourself

I won't get into a lot of code details here.  This is just a short note because it is very tempting, coming from C++, to continue to try to squeeze every last drop out of performance by over-optimizing your code base.  This often not only makes your code less readable and maintainable, but in some cases may actually HURT your efficiency.

For example, some developers like to write long methods to avoid method call times.  This was perhaps a more valid optimization back in the early days of computing than it is now, yet you still hear people harping on the overhead of a method call, which is actually very trivial on today's modern hardware.  If you cram all your code into large methods, you deny the CLR JIT the chance to decide which code to compile, whereas if you have your code divided into smaller methods, it will only compile those methods that are actually called.

Consider a large method spanning multiple screens that handles every possible error case under the sun.  Even if these cases never or rarely happen, you will always incur the cost of their compilation.  If, however, those rarer cases are in their own methods for readability and good coding practices, you allow the JIT to delay compilation until their first call, which can improve performance.

Yes, this is a fairly trivial example, but the same is true elsewhere.  If you find yourself pre-optimizing your code to squeeze out performance at the expense of readability and maintainability, stop!  The compiler and runtime already do a lot of optimizing for you!  Remember the rules of optimization:

  1. Beginners: Don't Optimize.
  2. Experts: Don't Optimize yet.

This is so true.  Make sure your design is optimal (decisions like list vs. dictionary or TCP vs. UDP) for the given problem, but only concentrate on micro-optimizing code when you determine there is a measurable bottleneck.  Otherwise you're wasting time and reducing readability in the process.  I go into more details in a post here if you are interested.

8. C# destructors are not C++ destructors

Yes, C# allows you to declare a destructor, but this is not a destructor in the C++ sense.  The C# destructor is really a finalizer.  This means that the destructor is not called when the reference passes out of scope, but when the Garbage Collector actually runs the finalizer on the object. 

In general, you should avoid creating destructors for C# classes, because it adds an extra step in the garbage collection cycle where the object will not get collected as soon as it should be.  As such, you should always make sure that you only declare destructors when you have unmanaged resources that need to be cleaned up. 

Incidentally, you should never clean up managed resources in a C# destructor, because you have no guarantee they will still be in a usable state at the time of finalization.  If your class has the only remaining reference to, say, a Canvas class, and your class is collectable, then it's possible that the Canvas will be collected in the same pass.  To make matters even more confusing, you cannot guarantee order of finalization, so it's possible the Canvas will be finalized before your class is; thus to access any managed resources during finalization is inviting disaster.

So, if and only if you have unmanaged resources, you should create a C# destructor for your class, but only then! If you want to release managed resources, you should do so by implementing IDisposable and the Dispose pattern.
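The usual shape of the Dispose pattern can be sketched as follows.  The NativeBuffer class, its fields, and the choice of Marshal.AllocHGlobal as the unmanaged resource are all assumptions made for this illustration, not code from any real library:

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical class owning one unmanaged and one managed resource.
public class NativeBuffer : IDisposable
{
    private IntPtr _buffer;          // unmanaged memory
    private MemoryStream _scratch;   // managed, disposable resource
    private bool _disposed;

    public NativeBuffer(int size)
    {
        _buffer = Marshal.AllocHGlobal(size);
        _scratch = new MemoryStream();
    }

    // Deterministic cleanup: called by user code or a using block.
    public void Dispose()
    {
        Dispose(true);
        // Cleanup is done, so the finalizer no longer needs to run.
        GC.SuppressFinalize(this);
    }

    // The C# "destructor" (finalizer): a safety net for unmanaged memory only.
    ~NativeBuffer()
    {
        Dispose(false);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (_disposed) return;

        if (disposing)
        {
            // Only touch managed objects when called from Dispose(),
            // never from the finalizer.
            _scratch.Dispose();
        }

        // Unmanaged memory is released on both paths.
        Marshal.FreeHGlobal(_buffer);
        _buffer = IntPtr.Zero;
        _disposed = true;
    }
}

public static class Program
{
    public static void Main()
    {
        using (var buffer = new NativeBuffer(1024))
        {
            Console.WriteLine("Using the buffer");
        }   // Dispose() runs here; the finalizer is suppressed
    }
}
```

The disposing flag is what enforces the rule above: the finalizer path frees only the unmanaged memory, while the Dispose path may also release managed objects because they are known to still be valid.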

9. Casting is not quite the same

In C++ casting comes in all shapes and sizes.  The old C-style casts will perform a conversion if one is defined, or if not will attempt to re-interpret based on the new type.  If this reinterpretation is invalid, the result is undefined.  In the following example, if Rectangle is truly derived from Shape and if s really does point to a Rectangle (it does in this case), the cast will succeed.  If not, it will treat the Shape * as a Rectangle * whether it is valid or not.

    Shape* s = new Rectangle();
    Rectangle* r = (Rectangle*)s;

Because of the inherent dangers with C-style casting, C++ came up with several new casts:

  • static_cast<T>(value) - attempts to convert value to T if a conversion exists; if none exists, it is a compiler error.
  • const_cast<T>(value) - used to remove const-ness of the value.
  • dynamic_cast<T>(value) - used to attempt to downcast value to a subclass at runtime; if the cast is not valid it returns null for pointers (or throws std::bad_cast for references).
  • reinterpret_cast<T>(value) - the most dangerous part of the old C-style cast; it simply reinterprets the value as a type T.

In C#, however, there are only two casts:

  • (T)value - looks like the old C-style cast, but this cast performs a conversion if type T and the type of value are convertible, or if value is truly a type T (or derived from it) will perform an up/down cast if valid, or throws an InvalidCastException if not.
  • value as T - like (T)value for reference and nullable types, except it will not throw if the cast is invalid, but will return null instead.

The as cast is one of the gems of C#, as it essentially gives you the power of the C++ dynamic_cast on pointers: a failed cast yields null rather than an exception you have to catch. These two are equivalent:

    public Rectangle ConvertToRectangleUsingCast(Shape s)
    {
        try
        {
            return (Rectangle)s;
        }
        catch (InvalidCastException)
        {
            return null;
        }
    }

    public Rectangle ConvertToRectangleUsingAsCast(Shape s)
    {
        return s as Rectangle;
    }

Note that C#, unlike C++, will never let you cast something to a type that is invalid.  It will either throw or return null depending on which cast you use.
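Here is a short, self-contained sketch of both casts side by side.  The empty Shape, Rectangle, and Circle classes are stand-ins defined only so the example compiles on its own:

```csharp
using System;

public class Shape { }
public class Rectangle : Shape { }
public class Circle : Shape { }

public static class Program
{
    public static void Main()
    {
        Shape s = new Circle();

        // as: an invalid cast quietly yields null
        Rectangle r = s as Rectangle;
        Console.WriteLine(r == null);       // prints True

        // (T): the same invalid cast throws
        try
        {
            r = (Rectangle)s;
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("Cast threw InvalidCastException");
        }

        // as needs a type that can hold null, so for value types
        // you must target the nullable form
        object boxed = 7;
        int? n = boxed as int?;
        Console.WriteLine(n.HasValue);      // prints True
    }
}
```

A reasonable rule of thumb is to use as when a failed cast is an expected outcome you will check for, and (T) when a failed cast indicates a bug that should surface as an exception.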

10. Interfaces are not pure virtual abstract classes

If you want to define an interface in Java or C#, this is already a part of the language.  However, in C++ the only way to define interfaces is to make them abstract classes with all pure virtual methods:

    // this class is abstract because at least one method is.
    class Shape
    {
    public:
        // pure virtual methods are how you define abstract methods in C++
        virtual void Draw() = 0;
    };

Because interfaces in C++ are defined this way, they have the virtue of automatically being virtual methods which can be overridden further down the class hierarchy.  This is not immediately true in C#, consider:

    using System;

    public interface IShape
    {
        void Draw();
    }

    public class Quadrilateral : IShape
    {
        public void Draw()
        {
            Console.WriteLine("Drawing quadrilateral.");
        }
    }

    public class Rectangle : Quadrilateral
    {
        public void Draw()
        {
            Console.WriteLine("Drawing rectangle.");
        }
    }

    public static class Program
    {
        public static void Main()
        {
            Rectangle r = new Rectangle();
            Quadrilateral q = r;
            IShape s = r;

            r.Draw();
            q.Draw();
            s.Draw();
        }
    }

So what do you expect from this?  Well, the answer is:

    Drawing rectangle.
    Drawing quadrilateral.
    Drawing quadrilateral.

Whoa!  Why didn't it say rectangle three times?  Because when Quadrilateral implemented IShape, it did so with a normal (non-virtual) method.  This means that Rectangle's Draw hides Quadrilateral's Draw; it is not an override that will get looked up at runtime!  This can cause a lot of confusion.

Thus, if you want your interface implementation to be overridden in a sub-class, you should implement the interface virtually and override as below:

    using System;

    public interface IShape
    {
        void Draw();
    }

    public class Quadrilateral : IShape
    {
        public virtual void Draw()
        {
            Console.WriteLine("Drawing quadrilateral.");
        }
    }

    public class Rectangle : Quadrilateral
    {
        public override void Draw()
        {
            Console.WriteLine("Drawing rectangle.");
        }
    }

    public static class Program
    {
        public static void Main()
        {
            Rectangle r = new Rectangle();
            Quadrilateral q = r;
            IShape s = r;

            r.Draw();
            q.Draw();
            s.Draw();
        }
    }

Now that we use virtual when we implement and override when we override, we get "Drawing rectangle." printed three times, as we expected before.

Summary

So these are just a few of the many differences between C++ and C# that C++ developers should keep in mind as they move to C#.  As I said before, this is just the tip of the iceberg, and hopefully if folks find this useful I will continue it with follow ups because there are many, many more.

 


 

Posted on Thursday, August 12, 2010 5:34 PM | Filed Under [ My Blog C# C++ Software .NET Fundamentals ]
