Charles Young

  Home  |   Contact  |   Syndication    |   Login
  198 Posts | 64 Stories | 512 Comments | 373 Trackbacks

News

Twitter












Article Categories

Archives

Post Categories

Image Galleries

Alternative Feeds

BizTalk Bloggers

BizTalk Sites

CEP Bloggers

CMS Bloggers

Fun

Other Bloggers

Rules Bloggers

SharePoint Bloggers

Utilities

WF Bloggers

The C# compiler is a pretty good thing, but it has limitations. One limitation that has given me a headache this evening is its inability to guard against cycles in structs.  As I learn to think and programme in a more functional style, I find that I am beginning to rely more and more on structs to pass data.  This is natural when programming in the functional style, but structs can be damned awkward blighters.

Here is a classic gotcha.  The following code won't compile, and in this case, the compiler does its job and tells you why with a nice CS0523 error:

    struct Struct1
    {
        public Struct2 AStruct2
    }

    struct Struct2
    {
        public Struct1 AStruct1
    }

Structs are value types and are automatically instantiated and initialized as stack objects.  If this code were compiled and run, Struct1 would be initialised with a Struct2 which would be initialised with a Struct1 which would be initialised with a Struct2, etc., etc.  We would blow the stack.

Well, actually, if the compiler didn't capture this error, we wouldn't get a stack overflow because at runtime the type loader would spot the problem and refuse to load the types.  I know this because the compiler does a really rather poor job of spotting cycles.

Consider the following.  You can use auto-properties, in which case the compiler generates backing fields in the background.  This does nothing to eliminate the problem.  However, it does hide the cycle from the compiler.  The following code will therefore compile!

    struct Struct1
    {
        public Struct2 AStruct2 { get; set; }
    }

    struct Struct2
    {
        public Struct1 AStruct1 { get; set; }
    }

At run-time it will blow up in your face with a 'Could not load type <T> from assembly' (80131522) error.  Very unpleasent.

ReSharper helps a little.  It can spot the issue with the auto-property code and highlight it, but the code still compiles.  However, ReSharper quickly runs out of steam, as well.   Here is a daft attempt to avoid the cycle using a nullable type:

    struct Struct1
    {
        public Struct2? Struct2 { get; set; }
    }

    struct Struct2
    {
        public Struct1 Struct1 { get; set; }
    }

Of course, this won't work (duh - so why did I try?).  System.Nullable<T> is, itself, a struct, so it does not solve the problem at all.  We have simply wrapped one struct in another.  However, the C# compiler can't see the problem, and neither can ReSharper.  The code will compile just fine.  At run-time it will again fail.

If you define generic members on your structs things can easily go awry.  I have a complex example of this, but it would take a lot of explaining as to why I wrote the code the way I did (believe me, I had reason to), so I'll leave it there.

By and large, I get on well with the C# compiler.  However, this is one area where there is clear room for improvement.

Update

Here's one way to solve the problem using a manually-implemented property:

    struct Struct1
    {
        private readonly Func<Struct2> aStruct2Func;

        public Struct1(Struct2 struct2)
        {
            this.aStruct2Func = () => struct2;
        }

        // Let's make this struct immutable!  It's good practice to do so
        // with structs, especially when writing code in the functional style.
        // NB., the private backing field is declared readonly, and we need a
        // constructor to initialize the struct field.  There are more optimal
        // approaches we could use, but this will perform OK in most cases,
        // and is quite elegant.
        public Struct2 AStruct2
        {
            get
            {
                return this.aStruct2Func();
            }
        }
    }

    struct Struct2
    {
        public Struct1 AStruct1 { get; set; }
    }

posted on Tuesday, January 22, 2013 10:06 PM

Feedback

# re: Poor, confused C# compiler 1/23/2013 3:28 PM Dan
Isn't the lambda you have here just closing over the parameter, effectively boxing it?

You could achieve the same thing more directly by boxing it directly by making the backing field type object (I'd probably implement a Boxed<T> type to hold it to hide the type-unsafety).

# re: Poor, confused C# compiler 1/24/2013 10:22 AM David
I'm afraid I can't really agree with "would blow the stack". Firstly, it can go anywhere, depending on how the struct is used (it might as well blow the heap), and secondly, even an infinitely nested zero sized struct would still be zero bytes, so it shouldn't do any harm in the first place.
But still an interesting post, of course

# re: Poor, confused C# compiler 1/24/2013 10:38 AM Rob G
You're going OTT to my mind. Simply design immutable classes (no setters/modifiers) and use them instead. It won't create them on the stack, but that's not a bad thing.

I don't see the relation of struct's to functional style. Hell GC was actually invented to make LISP feasible, and every single functional language I'm aware of utilises GC.

# re: Poor, confused C# compiler 1/24/2013 10:52 AM John Morrison
Try using a more straightforward language like C++ that allows you to explicitly specify a declaration as a value or a reference.

The problem with C# is that classes are declared as references always and structs are declared as values always. Two objects can hold references to each other but they cannot hold each other by value.

# re: Poor, confused C# compiler 1/24/2013 12:46 PM Gary Wheeler
I've never had cause to define something like cyclic struct definitions, and my first reaction is to wrinkle my nose at the code smell arising from such a thing. Can you provide a more concrete example where this is a useful thing to do?

# re: Poor, confused C# compiler 1/24/2013 1:32 PM Andy Bantly
references ... because someone thought pointers smelled bad long ago. I'd say that references in modern languages have grown into ugly beastly things. As first a C and now a C++ person, I'll have to agree with the poster about declaring as a value or reference.

# re: Poor, confused C# compiler 1/24/2013 3:01 PM Eric Johansson
Have you tried using a pointer?

# re: Poor, confused C# compiler 1/24/2013 3:05 PM Greg
I would think that if you have to trick the compiler into doing something, then you probably shouldn't be doing it.

# re: Poor, confused C# compiler 1/24/2013 5:00 PM Dan Sutton
Of course, it could be argued that it *should* be possible to write code which buggers things up -- if you couldn't, it wouldn't be much of a language. You can do the same thing by writing an infinitely recursive procedure, and then you can complain that the compiler is too stupid to know how stupid it is to do that... or else you can just not do things like that in the first place.

# re: Poor, confused C# compiler 1/24/2013 5:25 PM Eugene
Wow! How people struggle when they can't use C/C++ !!


# re: Poor, confused C# compiler 1/24/2013 10:06 PM Vitaliy
C# was not designed as a functional language, some features were added to it to make it feel like its functional but it is still OOP language so is .Net framework, use your tools appropriately

# re: Poor, confused C# compiler 1/24/2013 11:45 PM Charles Young
Loving some of the comments. Yeah, Eugene, I'm way too dumb to use C++. I admit it.

Surely there is merit in having the compiler guard against mistakes. I do make mistakes. My point, simply, is that I would expect the compiler to do a better job than it does of detecting cycles. It could be that the processing required to rigorously detect such cycles would have too great an impact on performance. However, it could equally be that this is a feature of the compiler that hasn’t been very well implemented and deserves some attention.

Rob’s point about using immutable classes in preference to structs is well made. Being in the process of re-educating myself to think and code in a more functional style, I take the point that there is no need to think that structs are necessary to this. However, I can’t completely shake the feeling that structs ‘fit’ the functional way of thinking better than classes.

The code I’m writing is quite involved. It is a visitor for Linq expressions which interprets query comprehensions as rules and ‘converts’ them to logically-equivalent expressions in a target rule language (I’m writing support for two target rule languages, currently – Microsoft’s BRL and W3C RIF PRD with additional annotations). There are a lot of different concerns to be addressed (e.g., providing a clear distinction between bound and free variables), and my first attempts resulted in truly horrible code. Hence, I re-wrote using a functional approach (a textbook state monad) in order to sort out the mess. The result is much cleaner. In an earlier version of the code, I used a class to represent state. As I traverse the expression tree depth-first, I need to ‘remember’ state for earlier nodes for later processing of their additional sub-branches (i.e., I need to label the tree). Being lazy I originally used memberwise cloning to implement this, but this is very inefficient. I ended up replacing this with immutable structs as a more efficient and natural approach. If you know state monads you will know that they are functions that close over a value and return a 2-tuple containing state and a value (in this case, a Linq expression node). I 'cheated' slightly by implementing my monad as a kind of functor that is driven by the visitor. So, my monad ended up being a struct containing a struct. And I made a mistake in the implementation. It’s called a ‘bug’…something the C++ coders commenting on this post will have never encountered, given how very clever they are and how straightforward their language is. The compiler didn’t spot it. I think it should have.

# re: Poor, confused C# compiler 1/25/2013 12:33 AM Charles Young
I rather disagree, Vitaliy. Like so many other languages (even C++ 11), C# has evolved into a multi-paradigm language. Sure, it was originally designed as a Java-esque OO language, and you can still write code in the (pre-Lambda) imperative Java style if you wish. I used to think that OO and functionalism were incompatible, but a quick survey of the modern GPL landscape quickly proves that I was very wrong on that score. The two styles are addressing different concerns. OO is chiefly about modularisation. This can be modularisation of functional or imperative code, or even both at once! Functionalism is chiefly about typing of expressions (as opposed to data) and not at all about modularisation. It is certainly challenging to support the two styles within a single language, but most modern languages are now doing this fairly successfully. This is the result of huge amounts of design. C#'s functionalism is in part designed or influenced by some very well-known (in that community)Haskell guys. I think this is exactly how the modern C# is designed to be used!

I do, of course, agree that C#'s compiler does not pack all the smarts that some functional compilers do, and that the language does not go as far down the functional route as some other languages. However, it remains multi-paradigm.

# re: Poor, confused C# compiler 1/25/2013 1:12 AM hotsleeper
I think the comment about the Code Smell is a worthwhile issue. The fact that you are doing a complex task accounts for most of the complexity in the approch, but the fact that the approch selected is producing this sort of painful solution is perhaps a suggesting that another solution may be easier to maintain.
In my experience, anytime my code is straining the compiler and or language features, its a good time to step back and look for a more straight forward solution.

From your explanation of the task, its sounds like you are traversing an expression in LINQ and converting it literally to one of two domain specific languages. (I assume those languages have "equivilent" expression capability as LINQ." In which case, a functional style is probably optimal for the expression parser, while OOP may be more prefferable for complex state representation. Have you considered trying a number of small visitor agents which can assemble the result from sections of the expression rather than one monolithic visitor that has to manage the whole solution in one bite?

# re: Poor, confused C# compiler 1/25/2013 2:17 AM Charles Young
Thanks, hotsleeper. The code is a work in progress, and as I explained, I've been busy eradicating big-time full-on stench from the original version. I'm certain there are still some smells, but the original pungent odour is much reduced. I made the mistake because I got confused over a related issue to do with use of generics. There is some complexity there due to the way state plays with an OO inheritance approach I have used to separate different concerns in order to maximise code re-use. My current view is that using classes instead of structs would have been a bit simpler and would have avoided allowing me to make the mistake I made (something I think the compiler should have done). Classes could possibly prove more efficient for labelling, but there is a good chance the struct approach will prove faster – important if you are processing very large rule sets. I’ve used interfaces on my structs (there are more than one) which means there is an inevitable degree of boxing going on. I will, at some point, need to implement a version of the code using classes for state and do some careful performance testing. I would have used generics regardless of use of structs or classes, and frankly the code wouldn’t be significantly simpler had I gone down the class route. The code is non-trivial and encapsulates the state monad in an inheritance hierarchy. Maybe this smells, but frankly, there is a degree of inherent complexity that won’t go away whatever approach I use. I’m sure there is still room for improvement.

# re: Poor, confused C# compiler 1/25/2013 7:52 AM Charles Young
And further (@hotsleeper) I did consider using parser combinators. I think this is what you may be suggesting. I wouldn’t want multiple visitors. That would imply visiting the same tree multiple times which would be very inefficient. Of course, I not parsing the C#. My visitor is the equivalent of an AST processor. The lexer and parser are provided by Microsoft. However, my approach not a million miles from parser combinators and does use lots of little functional ‘agents’ to compose the output. That was a major outcome of re-writing the code around a state monad.

Here is a bit more information on how things are structured (more rope for commentators to hang me with). I have an abstract base class that handles expression tree traversal in the classic recursive manner. I derive a class that attempts to interpret the expression tree as a query comprehension (Linq expression trees are by no means limited to queries). It also currently handles variable analysis, but I should probably refactor this into a different class. Linq expression trees are great, but they do a poor job of representing bound and free variables. Variables are central to generating rule sets. The W3C RIF PRD standard is founded on these notions, and many rule languages implement explicit matching syntax over bound and free variables. Next layer up is code to analyse query comprehensions as production rules. Query comprehensions are a great way to represent the left hand side of rules, and yes, rule languages generally offer much the same expressivity as Linq in this regard. Like Linq expression trees, PRD is a semantic tree representation. The tree is shaped rather differently at the root in order to express the quantification of bound variables. This is much the same as ‘from’ syntax in Linq’s syntactic sugar. In Linq, ‘from’s are not associated with the root of the expression tree. They are associated with the other end of the left-most branch which is arguably a better approach than PRD when visiting trees. I also need to handle representation of the right hand side of rules which are lists of actions. I do this using a fluent interface. The class that handles rule representation converts the sequence of Linq expressions into a more detailed sequence of tokens that represent the rule. Finally, I derive a class that composes the token sequence into the target language. At this point, the state monad essentially binds inner functions that do the composition. Each function is focused on handling a single token type. This is essentially your ‘little agent’ idea and makes the code far more manageable. I adopted this approach to tame the horror of my original monolithic non-functional attempt. It separates out concerns nicely, but binds everything together neatly using a state monad.

Post A Comment
Title:
Name:
Email:
Comment:
Verification: