.NET Nomad What I've learned along the way

LINQ Overview, part zero

LINQ Overview, part one (Extension Methods)

LINQ Overview, part two (Lambda Expressions)

 

Note: I realize it has been a really long time since I've posted anything.  It is both exciting and humbling that I continue to receive such positive feedback on these articles.  In fact, that is why I am trying to put in the effort and finish off this series before moving on to more recent topics.  This nomad has been on some interesting journeys these past months, and I am really excited to share what I've learned.

Data types, and why they matter

In the world of computer programming, data typing is one of the most hotly debated issues.  There are two major schools of thought as far as type systems go.

The first is static typing, in which the programming environment requires the programmer to write her code so that the type of the data being manipulated is known at compile-time.  The purpose, of course, is that if the environment knows the type of data being manipulated it can, in theory, better enforce the rules that govern how the data should be manipulated and produce an error if the rules are not followed.  It would be a mistake, however, to assume that every check happens at compile-time.  For example, the .NET CLR is a statically typed programming environment, but casts from object and facilities such as reflection mean that some type checking is done at run-time as opposed to compile-time.  Take a look at the following code sample:

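static void TypeChecking()
{
   string s = "a string for all seasons";
   double d = s;       // compile-time error CS0029: cannot implicitly convert string to double

   object so = s;      // compiles: anything can be stored in an object
   d = (double)so;     // compiles, but throws InvalidCastException at runtime
}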

If you look at line two of the method body, you'll see we are trying to assign a string value to a double variable.  This trips up the compiler, since a string cannot be implicitly converted to a double.  This is an example of compile-time type checking, and it demonstrates why such checking is useful: in this case it would have saved us a few minutes of debugging time.

On line four we are assigning our string value to an object variable.  Since anything can be stored in an object variable, we are essentially masking the type of the value from the compiler.  This means that on the subsequent line our explicit cast of so to double doesn't cause a compilation error.  Basically, the compiler doesn't have enough information to determine whether the cast will fail, because it doesn't know for certain what data type will be stored in so until runtime.  The type check still happens, though; it simply moves to runtime, where the CLR sees that so holds a string rather than a double and throws an InvalidCastException.  Don't minimize the damage that this can cause.  What if a line like this was lying around in some production code and that method was never properly tested? You'd wind up with an exception in your production application and that's bad!
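If you do have to pull a value back out of an object variable, you can ask about its type before you commit to the cast.  The following is only a minimal sketch of that idea; the CastingSketch class and its method names are made up for illustration and are not part of the sample above.

```csharp
using System;

// Hypothetical helper class, not part of the article's sample code.
public static class CastingSketch
{
    // Returns the double stored in 'value', or null when 'value' holds
    // anything else.  The 'is' test is evaluated at runtime.
    public static double? TryGetDouble(object value)
    {
        if (value is double)
            return (double)value;
        return null;
    }

    // Performs the unguarded cast from the article and reports whether
    // the runtime type check rejected it.
    public static bool CastThrows(object value)
    {
        try
        {
            double ignored = (double)value;
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }
}
```

The guarded version trades an exception for a null that the caller has to handle, which is usually the easier failure to reason about.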

The second major school of thought with regard to typing is referred to as dynamic typing and, as its name implies, dynamic type systems are a little bit more flexible about what they will accept before the program runs.  Dynamic type systems, especially if you have a solid background in traditionally statically typed environments, may be a little harder to grok at first, but the most important thing to understand when working with a dynamic type system is that it doesn't consider a variable as having a type; only values have types.

As with anything in modern programming languages, everything is shades of gray.  For example, type systems are also described as strongly typed or weakly typed, a distinction that is separate from the static versus dynamic one.  The terms are used loosely, but a weakly typed language generally lets a program treat a value as a different type without complaint, through implicit conversions or unchecked casts, while a strongly typed language rejects such operations or demands an explicit conversion.

The "why they matter" bit should be self-evident.  You simply cannot write effective code in a high-level language like C#, Python, or even C without interacting with the type system, and therefore you can't write effective code without understanding the type system.  As a language, C# is currently tied to the CLR and therefore its syntax is designed to help you write statically typed code, more often than not code intended to provide compile-time type checks.  However, as the language and the developer community using it have evolved, there has been a greater call to at least simulate some aspects of dynamic languages.
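The "only values have types" idea can be roughly imitated in C# with an object variable.  Treat this as an approximation only: the variable below still has the static type object, so this is not real dynamic typing, and the ValueTypesSketch name is invented for the example.

```csharp
using System;

// Hypothetical illustration; an object variable stands in for an
// untyped variable in a dynamic language.
public static class ValueTypesSketch
{
    // Stores three different kinds of value in the same variable and
    // records the runtime type name of each value.
    public static string[] DescribeRebinding()
    {
        object slot = "a string for all seasons";
        string first = slot.GetType().Name;   // type of the string value

        slot = 2.5;
        string second = slot.GetType().Name;  // type of the double value

        slot = 42;
        string third = slot.GetType().Name;   // type of the int value

        return new[] { first, second, third };
    }
}
```

The variable never changes, yet the answer to "what type is it?" follows whatever value happens to be stored at the moment.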

 

The var keyword

Introduced in C# 3.0, the var keyword can be used in place of the type in local variable declarations, provided the declaration includes an initializer for the compiler to infer the type from.  It can't be used as a return type, the type of a parameter in a method or delegate definition, or in place of a property or field's type declaration.  The var keyword relies on C# 3.0's type inference feature.  We first encountered type inference in part one, where we saw that we didn't have to explicitly state our generic parameters as long as we made it obvious to the compiler what the types involved were.  A simple example is as follows:
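Before the fuller example, here is a small sketch of what the compiler infers for a few initializers, along with two declarations it rejects.  The VarSketch class is a made-up name for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of local variable type inference.
public static class VarSketch
{
    // Returns the runtime types of three var-declared locals, which for
    // these initializers match the types the compiler inferred.
    public static Type[] InferredTypes()
    {
        var count = 3;                              // inferred as int
        var title = "The C Programming Language";   // inferred as string
        var prices = new List<decimal>();           // inferred as List<decimal>

        // var nothing;          // does not compile: no initializer to infer from
        // var unknown = null;   // does not compile: null has no type of its own

        return new[] { count.GetType(), title.GetType(), prices.GetType() };
    }
}
```

Once inferred, the type is fixed: assigning a string to count afterwards is a compile-time error, exactly as if you had written int yourself.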

        static void UseVar()
        {
            var list = new List<Book>();
            var book = new Book();

            book.Title = "The C Programming Language";

            list.Add(book);

            foreach (var b in list)
                Console.WriteLine(b.Title);            
        }

In the above we are using var on three separate lines, but all for the same purpose.  We've simply used the var keyword in place of the data type in our variable declarations and let the compiler handle figuring out what the correct type should be.  The first two uses are pretty straightforward and will save you a lot of typing.  The third use of var is also common, but be forewarned, as it will only work well with collections that implement IEnumerable<T>.  Collections that implement only the non-generic IEnumerable interface do not provide the compiler with enough information to infer the type of b, so it falls back to object and b.Title would no longer compile.
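If you are stuck with a non-generic collection, there are two common ways around the problem.  The sketch below uses ArrayList as the non-generic collection; the ForeachVarSketch class and its method names are invented for the example.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration of iterating a non-generic collection.
public static class ForeachVarSketch
{
    // Option 1: name the element type explicitly.  The compiler inserts
    // a cast for each element, which is checked at runtime.
    public static List<int> TitleLengths(ArrayList titles)
    {
        var lengths = new List<int>();
        // 'foreach (var t in titles)' would make t an object, so
        // t.Length would not compile.
        foreach (string t in titles)
            lengths.Add(t.Length);
        return lengths;
    }

    // Option 2: Cast<string>() wraps the collection in an
    // IEnumerable<string>, so var can infer string again.
    public static List<int> TitleLengthsWithCast(ArrayList titles)
    {
        var lengths = new List<int>();
        foreach (var t in titles.Cast<string>())
            lengths.Add(t.Length);
        return lengths;
    }
}
```

Both options throw an InvalidCastException at runtime if the collection contains something that isn't a string, so neither one restores compile-time safety.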

You have to be careful with var as even though the compiler can tell what type you meant when you wrote it, you or the other developers you work with might not be able to in three or four months.  You also have to look out for situations like the following:

        static void UnexpectedInference()
        {
            var d = 2.5;
            float f = d;
        }

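As you can see, you may have meant for d to be a float, but the compiler decided that you were better off with a double, because a literal like 2.5 with no suffix is a double.  The second line then fails to compile, since a double cannot be implicitly converted to a float.  Naturally, if you wrote the following the compiler would have more information as to your intent and act appropriately: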

        static void UnexpectedInference()
        {
            var d = 2.5f;
            float f = d;
        }
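The literal suffix is what drives the inference here.  The following sketch lists a few common literal forms and the types var ends up with; LiteralSketch is a hypothetical name used only for this example.

```csharp
using System;

// Hypothetical illustration of how literal suffixes affect inference.
public static class LiteralSketch
{
    // Returns the types of five var-declared locals, in declaration order.
    public static Type[] LiteralTypes()
    {
        var a = 2.5;     // no suffix: double
        var b = 2.5f;    // f suffix: float
        var c = 2.5m;    // m suffix: decimal
        var e = 25;      // no suffix, fits in an int: int
        var g = 25L;     // L suffix: long

        return new[] { a.GetType(), b.GetType(), c.GetType(), e.GetType(), g.GetType() };
    }

    // The other way to fix the first sample: keep the double and narrow
    // it with an explicit cast, accepting the possible loss of precision.
    public static float Narrow(double value)
    {
        return (float)value;
    }
}
```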

The var keyword is nice and all, but was it added to the language just to stave off carpal tunnel? No, the language designers needed to provide the var keyword so that they could give you something more powerful: anonymous types.

 

Anonymous Types

To see why var was introduced into the language, let's look at the following program.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class Program
    {
        static void Main(string[] args)
        {
            var books = new List<Book>() {
                new Book() { 
                    Title = "The Green Mile",
                    Genre = "Drama",
                    Author = new Author() {
                        Name = "Stephen King",
                        Age = 62,
                        Titles = 1000
                    }
                },
                new Book() {
                    Title = "Pandora's Star",
                    Genre = "Science Fiction",
                    Author = new Author() {
                        Name = "Peter F. Hamilton",
                        Age = 49,
                        Titles = 200
                    }
                }
            };

            var kings = from b in books
                       where b.Author.Name == "Stephen King"
                       select new { Author = b.Author.Name, BookTitle = b.Title };

            foreach (var k in kings)
            {
                Console.WriteLine("{0} wrote {1}", k.Author, k.BookTitle);
            }            

            Console.ReadLine();
        }
    }

    public class Author
    {
        public string Name { get; set; }
        public int Age { get; set; }
        public int Titles { get; set; }
    }

    public class Book
    {
        public Author Author { get; set; }
        public string Title { get; set; }
        public string Genre { get; set; }
    }

The above code is pretty straightforward, so let's focus on the LINQ query:

           var kings = from b in books
                       where b.Author.Name == "Stephen King"
                       select new { Author = b.Author.Name, BookTitle = b.Title };

Do you notice anything interesting? WOAH! We just used the new operator to instantiate an object without saying what type it was.  In fact, you couldn't tell the compiler what type it was even if you wanted to.  If you can't write out the name of the type, then how are you supposed to declare a variable of that type? Oh, right, we have the var keyword, which uses type inference to figure out the correct data type to use!

So, how does C# 3.0 provide such a cool feature? It is supposed to be a statically typed language, right? Well, the answer to that isn't actually complicated in the least.  Think about it.  What do compilers do? They read in a program definition and then generate code in, typically, a lower-level language like machine code or, in the CLR's case, intermediate language (IL).  We have already established that type inference is being used to determine the correct data type for our var variable above, but the obvious problem is that we didn't define the type it winds up using.  After reading in our query the compiler can tell we are defining a class that has two properties, Author and BookTitle.  Further, it knows we are assigning System.String values to both properties.  Therefore, it can deduce the exact class definition required for our code to work in a statically type checked way.  If you were to fire up Reflector you'd be able to find the following class definition:

    [DebuggerDisplay(@"\{ Author = {Author}, BookTitle = {BookTitle} }", Type="<Anonymous Type>"), CompilerGenerated]
    internal sealed class <>f__AnonymousType0<<Author>j__TPar, <BookTitle>j__TPar>
    {
        // Fields
        [DebuggerBrowsable(DebuggerBrowsableState.Never)]
        private readonly <Author>j__TPar <Author>i__Field;
        [DebuggerBrowsable(DebuggerBrowsableState.Never)]
        private readonly <BookTitle>j__TPar <BookTitle>i__Field;

        // Methods
        [DebuggerHidden]
        public <>f__AnonymousType0(<Author>j__TPar Author, <BookTitle>j__TPar BookTitle);
        [DebuggerHidden]
        public override bool Equals(object value);
        [DebuggerHidden]
        public override int GetHashCode();
        [DebuggerHidden]
        public override string ToString();

        // Properties
        public <Author>j__TPar Author { get; }
        public <BookTitle>j__TPar BookTitle { get; }
    }

First of all, the above looks pretty funky.  In fact, if you copy and paste it into Visual Studio it isn't going to compile, because the generated names contain characters such as < and > that are not legal in C# identifiers.  The important thing to realize is that the class was built for you at compile-time, not at runtime.  Another interesting point is that the compiler is smart enough to reuse this class if your code, within the same assembly, calls for a second anonymous type with the same property names and types in the same order.

All in all, anonymous types work like any other internal sealed class, with the added restriction that their properties are read-only.  See, I told you that this stuff isn't magic!
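You can check the reuse claim for yourself by comparing the runtime types of a few anonymous objects.  This is just a quick sketch; the AnonymousTypeReuse class and its method names are hypothetical.

```csharp
using System;

// Hypothetical illustration of when the compiler reuses an anonymous type.
public static class AnonymousTypeReuse
{
    // Same property names, same property types, same order.
    public static bool SameShapeSharesType()
    {
        var first = new { Author = "Stephen King", BookTitle = "The Green Mile" };
        var second = new { Author = "Peter F. Hamilton", BookTitle = "Pandora's Star" };
        return first.GetType() == second.GetType();
    }

    // Same property names and types, but declared in a different order.
    public static bool DifferentOrderSharesType()
    {
        var first = new { Author = "Stephen King", BookTitle = "The Green Mile" };
        var swapped = new { BookTitle = "The Green Mile", Author = "Stephen King" };
        return first.GetType() == swapped.GetType();
    }

    // The generated Equals override compares property values.
    public static bool EqualValuesAreEqual()
    {
        var first = new { Author = "Stephen King", BookTitle = "The Green Mile" };
        var copy = new { Author = "Stephen King", BookTitle = "The Green Mile" };
        return first.Equals(copy);
    }
}
```

The first and third methods return true and the second returns false: property order is part of what identifies an anonymous type, and the generated Equals is the reason two separately created objects with the same values compare as equal.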
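If you don't have a decompiler handy, plain reflection will confirm most of what the generated class shows.  The following is a rough sketch; AnonymousTypeInspector and its members are names I made up for the example.

```csharp
using System;
using System.Linq;
using System.Runtime.CompilerServices;

// Hypothetical helper for inspecting a compiler-generated anonymous type.
public static class AnonymousTypeInspector
{
    // Creates an anonymous object shaped like the one in the query and
    // returns its runtime type.
    public static Type MakeSampleType()
    {
        var sample = new { Author = "Stephen King", BookTitle = "The Green Mile" };
        return sample.GetType();
    }

    // True when the type carries the [CompilerGenerated] attribute.
    public static bool IsCompilerGenerated(Type type)
    {
        return type.IsDefined(typeof(CompilerGeneratedAttribute), false);
    }

    // Names of the public properties that have a getter but no setter,
    // sorted so the result does not depend on reflection order.
    public static string[] ReadOnlyPropertyNames(Type type)
    {
        return type.GetProperties()
                   .Where(p => p.CanRead && !p.CanWrite)
                   .Select(p => p.Name)
                   .OrderBy(n => n, StringComparer.Ordinal)
                   .ToArray();
    }
}
```

For the sample type, IsSealed and IsNotPublic are both true, which matches the internal sealed declaration above, and both properties show up as read-only.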

 

LINQ Tie In

Anonymous types are cool and all, but due to the restrictions on their use I am sure you'll find, as I have, that they are most useful in LINQ queries.  The way in which you would use an anonymous type in conjunction with LINQ is as follows:

           var kings = from b in books
                       where b.Author.Name == "Stephen King"
                       select new { Author = b.Author.Name, BookTitle = b.Title };

As you can see above, we are using an anonymous type in the select part of the query to define what our result set looks like.  In database terminology this is called a projection, and our example above is really no different than selecting a subset of columns from a table in a relational database.

At this point you may be saying, "So what? In the example you could have just selected a book object and gotten access to the author using normal dot notation".  You'd be correct in that instance; however, consider the LINQ query below:

            var kings = from b in books
                        join a in authors on b.AuthorID equals a.ID
                        where a.Name == "Stephen King"
                        select new { Author = a.Name, BookTitle = b.Title };

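In the above, we are essentially joining two collections of in-memory objects on what amounts to a foreign key, i.e. they both know the ID of the author.  Note that this assumes a different object model from the earlier program: the book carries an AuthorID instead of a reference to an Author object, and the authors live in their own collection.  Hopefully you can now see the utility of anonymous types.  If we had simply selected the books, we would then need to do a subsequent query in order to find the corresponding author.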
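Since the Book and Author classes from earlier don't have ID properties, here is a self-contained sketch of the join.  The AuthorRecord and BookRecord classes, the JoinSketch class, and the TitlesBy method are all assumptions made for this example rather than types from the original program.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record types that relate to each other by ID instead of
// by object reference.
public class AuthorRecord
{
    public int ID { get; set; }
    public string Name { get; set; }
}

public class BookRecord
{
    public int AuthorID { get; set; }
    public string Title { get; set; }
}

public static class JoinSketch
{
    // Joins books to authors on the author ID, projects each match into
    // an anonymous type, and formats one line per match.
    public static List<string> TitlesBy(string authorName,
                                        IEnumerable<BookRecord> books,
                                        IEnumerable<AuthorRecord> authors)
    {
        var matches = from b in books
                      join a in authors on b.AuthorID equals a.ID
                      where a.Name == authorName
                      select new { Author = a.Name, BookTitle = b.Title };

        // The anonymous type cannot be named in a method signature, so it
        // is converted to strings before leaving the method.
        return matches
            .Select(m => string.Format("{0} wrote {1}", m.Author, m.BookTitle))
            .ToList();
    }
}
```

Notice the conversion at the end.  Because an anonymous type has no name you can write down, it works best when it is consumed inside the method that created it, which is one of the restrictions mentioned at the start of this section.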

The only other place I see a lot of value in using anonymous types is to flatten out a set of related objects for the purpose of databinding or other UI rendering.

 

Summary

We have now examined the anonymous type feature added to C# 3.0 as well as how to use the new var keyword.  It is my guess that you'll find yourself using var quite frequently, but only if you do a lot of LINQ will you be using it for anonymous types.  The next segment will focus on the actual architecture of LINQ; once you understand that, there really isn't anything mysterious left.

Posted on Thursday, October 29, 2009 3:28 PM


Comments on this post: LINQ Overview, part three (Anonymous Types)

# re: LINQ Overview, part three (Anonymous Types)
Ive found what is missing here!

thanks again
Left by clasificado on Jul 09, 2010 7:41 PM



Copyright © newman | Powered by: GeeksWithBlogs.net