Thursday, September 24, 2009

[Originally Published Apr 2004 - Updated October 2009]

Everyone "understands" that Microsoft's .NET and the CLR are a "garbage collector" based environment; but is it really?

First we must establish what is meant by "garbage" in this context. When an object is created there is (typically) one reference by which it can be accessed (the return value of "new"). While the program executes, other references to the same item may be established, and established references may terminate. When an object can no longer be referenced, it is deemed to be "Garbage". [note: This is a bit of a simplification but will satisfy our needs]

Next we must look at the definition of "collection". Webster's dictionary offers the following:

collection: the act or process of collecting.
collect: to bring together into one body or place.

Now let's look at what happens when a "GC.Collect" occurs... (For simplicity we will look at generation 0, and ignore the impact of "pinned" objects). The object graph is "walked" starting at the rooted references, and any reachable item that is in Generation 0 is marked. When the walk is complete, the live objects are moved to the Gen1 heap, and the Gen0 heap is reset back to the beginning. The result is that the memory occupied by all of the previous Gen0 residents is now available.

This reveals the fundamental problem with calling this process "garbage collection". Absolutely NOTHING is done with the garbage. Specifically there are no operations which involve moving the garbage so it is "brought together in one place".

To see what a "real" garbage collection is, consider an analogy. In one's house, there are likely to be multiple wastebaskets; one in the kitchen, one in the bathroom, and others scattered throughout the residence. On trash day (or earlier if the Wife has anything to say about the matter), one goes through the residence and collects all of the garbage from multiple locations, places it in one bag, and brings it outside to the rubbish container. The amount of work is dependent on the number of original locations of garbage, and the amount of garbage in each location. The number of "precious" (non-garbage) items in the house has absolutely no bearing on the process or the effort it will involve.

But when we look at the .NET situation, the exact opposite is true. It is the number of LIVE objects that impacts the performance, as these are what must be scanned and moved. It does not matter if there is a single small "garbage" object on the heap, or if there are tens of thousands (of varying sizes). Once the live (precious) objects have been moved out of harm's way, it is a single, constant-time operation to reset the heap so that it is ready for new objects.
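The Gen0 walk, and the fact that its cost tracks the live objects rather than the garbage, can be illustrated with a toy model. This is only a sketch and not how the CLR is implemented: the names Obj, collect_gen0 and demo are made up for illustration, marking and moving are merged into a single step, "moving" is reduced to changing a generation number, and any references from older objects into Gen0 are assumed to already be among the roots.

```python
class Obj:
    """A toy managed object: a generation number plus outgoing references."""

    def __init__(self, name):
        self.name = name
        self.gen = 0      # every new object starts in Gen0
        self.refs = []    # references this object holds to other objects


def collect_gen0(roots):
    """Walk from the roots and promote every reachable Gen0 object to Gen1.

    Returns the promoted objects. Unreachable objects are never visited,
    so the amount of garbage does not affect the work done.
    """
    promoted = []
    stack = [obj for obj in roots if obj.gen == 0]
    while stack:
        obj = stack.pop()
        if obj.gen != 0:
            continue      # already promoted during this walk
        obj.gen = 1       # stands in for copying the object to the Gen1 heap
        promoted.append(obj)
        stack.extend(ref for ref in obj.refs if ref.gen == 0)
    return promoted


def demo(garbage_count):
    """Build two live objects plus `garbage_count` unreachable ones, then collect."""
    a, b = Obj("a"), Obj("b")
    a.refs.append(b)
    b.refs.append(a)      # a cycle, to show that the walk still terminates
    garbage = [Obj(f"g{i}") for i in range(garbage_count)]
    promoted = collect_gen0(roots=[a])
    # The garbage was never touched: it is still sitting in "Gen0".
    assert all(g.gen == 0 for g in garbage)
    return len(promoted)
```

In this model demo(1) and demo(10000) both report 2 objects processed; only adding more live objects increases the work.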

This shows that .NET implements a Live Object Preservation pattern, and NOT a garbage collection pattern.

While this entire post may seem like a "semantic quibble", it has serious ramifications when dealing with .NET architecture/design and implementation. In other environments there is NO overhead (aside from the actual memory) to keeping references to heap based objects which will be needed (or even just possibly needed) later. In many cases, the cost of allocating [always higher in a conventional heap than in a CLR heap] and deleting (updating the freelist) far outweighs the memory utilization issue, and so references are kept for an extended period of time.

When this approach is taken in a .NET application, these live objects represent a performance hit every time (neglecting some optimizations) that the GC runs, simply because the GC deals with processing live objects. On the other hand, allocating a (non-large) object in .NET is typically a simple pointer increment, and abandoning it (assuming no finalizer) is a 0 time issue.
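To make the "pointer increment" point concrete, here is a minimal sketch of a bump-pointer heap. The class name BumpHeap and its byte-offset interface are assumptions made for illustration; the real CLR allocator is considerably more involved.

```python
class BumpHeap:
    """Toy Gen0-style heap: allocation only advances a pointer."""

    def __init__(self, size):
        self.size = size
        self.top = 0                 # offset of the next free byte

    def allocate(self, nbytes):
        """Return the offset of a new block, or None if a collection is needed."""
        if self.top + nbytes > self.size:
            return None
        offset = self.top
        self.top += nbytes           # the entire cost of an allocation
        return offset

    def reset(self):
        """Once live objects have been moved out, reclaim everything in one step."""
        self.top = 0
```

Note that there is no "free" method at all: abandoning an object costs nothing, and the space comes back only when the whole heap is reset.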

Over the past few years, I have been involved with a number of projects where clients were complaining that ".NET was slow" and could not meet their performance demands. In the vast majority of cases, this was directly tracked to the implementation not having proper (for .NET) object lifetime management.

addendum: When one looks at environments such as C/C++, the conventional/standard implementations (pre C++0x) do not include "garbage collection". The heap is (typically, and simplified) implemented as a structure containing the "free blocks" of items that were previously deleted. This means that (a pointer to) memory that is no longer in use [i.e. garbage] IS actually MOVED. Each time there is a call to "delete" or "free(...)" there is a synchronous [i.e. it completes before delete/free() returns] collection of information about the garbage.
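A conventional free-list heap of this kind can be sketched as follows. This is a deliberately simplified model under stated assumptions: the class name FreeListHeap is made up, the search is first-fit, adjacent free blocks are not coalesced, and (unlike the real free()) the caller passes the block size back in.

```python
class FreeListHeap:
    """Toy C-style heap: a list of (offset, size) free blocks, searched first-fit."""

    def __init__(self, size):
        self.free_blocks = [(0, size)]

    def malloc(self, nbytes):
        """Search the free list for the first block that is large enough."""
        for i, (offset, size) in enumerate(self.free_blocks):
            if size >= nbytes:
                if size == nbytes:
                    del self.free_blocks[i]            # block used up entirely
                else:
                    # Carve the request off the front of the block.
                    self.free_blocks[i] = (offset + nbytes, size - nbytes)
                return offset
        return None   # no single free block is large enough

    def free(self, offset, nbytes):
        """Record the released block; this bookkeeping completes before returning."""
        self.free_blocks.append((offset, nbytes))
```

Here both malloc and free do real work on every call, and it is the description of the garbage (the free block) that gets gathered into one place.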

In .NET the large object heap [LOH] is used for items which meet or exceed a threshold size [85,000 bytes]. This particular heap IS operated in a manner nearly identical to a C/C++ heap, in that the "live" objects are NOT moved, and it is a set of references to the available memory (garbage) that is manipulated.

 

 

 

[Originally Written October 2004 - Updated September 2009]

Many people state that Microsoft .NET technology provides a "Virtual Machine" environment via the CLR. However, an examination of various definitions of Virtual Machine shows that this is not the best analogy.

For our first example definition, let us look no further than Microsoft's own site:

Virtual Machine: A software-implemented computer that emulates a complete hardware system in a self-contained, isolated software environment and runs its own operating system.

Clearly this does not apply, so let's break down the parts of "a complete hardware system". The three major categories of devices that make up a system are: Memory (some type of storage), Processing, and Input/Output.

When a program written in any language uses the Microsoft implementation of the CLR, it directly utilizes the actual memory presented by the underlying system. All processing is done using native instruction execution on the underlying processor, and all Input/Output is accomplished via the device drivers provided by the underlying operating system.

So, while it is possible (and there are projects attempting to reach this goal) to implement the CLR as a Virtual Machine, it is clear that the current Microsoft implementation provides none of these features.

The feature that the CLR provides is that there is an intermediate stage where the source code has been reduced from the original form into a well defined set of intermediate instructions that are independent of any specific target environment. Additionally, a rich library (the BCL) is provided for addressing many common constructs and providing additional abstractions over lower level functionality.

But remember this intermediate code NEVER executes. It undergoes a second compilation phase to become pure native instructions.

This process actually has a long history. Long, long ago [1970s and 1980s] it was quite common for high level language compilers to emit (either by default or as an option) assembly language SOURCE rather than object code. This output could then be copied to various target machines (with differences in capabilities) and run through the assembler (often with differing configurations or external linkages) to produce an executable that was specifically tailored to the targeted machine.

Although the mechanics are different, this is completely analogous to what happens with a CLR based program.

The result is that while .NET (the CLR) does provide a level of abstraction from the actual executable code, it does not meet the criteria for a "Virtual Machine".

Tuesday, March 17, 2009

As I indicated in a previous post, I was contacted by Newsday (a large regional newspaper from Long Island, New York) about what the MVP program meant for me as a small business owner. The original article is available at:  http://www.newsday.com/business/ny-bzinside166071263mar16,0,4560998.story

 

Saturday, March 07, 2009

Reflections on the 2009 MVP Summit, and an exciting news update from Dynamic Concepts Development Corporation.

Saturday, May 03, 2008

Originally written in May 2008... This post is part of a multi-part series on developing Agile Software. In this installment we will examine some of the issues that can help ensure that class definitions are stable and reusable...

Tuesday, September 30, 2008

It all depends on what easy means...
 

Considering the current economy in the United States, and the crisis on Wall Street, it is no surprise that companies are doing everything they can to manage costs. This is especially true in the IT and Software Development areas. Some companies are putting critical projects on hold, others are drastically scaling back, and many are looking for lower cost alternatives.

Based on postings on various job boards the average posted hourly / per diem contract rate has decreased by nearly 20% from 18 months ago. In many cases this means that the client is attempting to utilize a resource with less experience.

While this does lower the cost on paper, the question really should be "Is it truly saving any money?". Remember that much of the current financial crisis is because of sub-prime mortgages and artificially (sometimes fraudulently) inflated appraisals. The previous economic bubble (the “DotCom” boom/bust) was also largely based on valuations that looked good on paper, without the real value to back up the pricing.

At a software developer’s conference earlier this year, one of the discussion groups focused on this specific issue. Everyone who participated agreed that getting “the best quality per dollar” was the single most important metric; so the discussion turned to “What is Quality?” with respect to the entire software development life cycle. This is where the discussion got very interesting.

Classic metrics such as “Lines of Code per Day”, “Bug Rates”, and “Code Churn” were all discussed and, while useful, were found to be lacking. When metrics such as these are used (especially if one is given a very high priority), the development team will often adjust its behavior to provide “good numbers”. If you want lots of code per day, write the code fast; skip the analysis, let the QA team do ALL the testing. If you want a very low bug rate; spend hours on each line of code, and let the delivery schedule slip. If you want very low code churn; don’t check anything into the source control system and let the team members become severely out-of-sync.

After the discussion had been going on a while, I proposed an alternative. Instead of trying to determine what quality WAS, could we define “What Indicates a LACK OF QUALITY?”. I also proposed a one word answer to the question: SURPRISES.

If you look at any of the current design methodologies, they all promise to help with one thing, predictability. This is impossible if there are too many surprises. The surprises can take many forms ranging from simple bugs not being detected until late in the design cycle to fatal flaws in the underlying architecture or design. Sometimes these surprises are not revealed until the product is released, or a maintenance update is attempted.

Many years ago “The Rules of Ten” were published. This stated that an issue found at any step in the design cycle would cost 10 times as much to fix at the next stage. While the exact number may be debatable, the basic premise that addressing issues as early as possible is the most cost-effective approach is generally agreed upon.
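As a quick illustration of how fast that multiplier compounds, here is a small sketch. The stage names and the starting cost of 100 are purely hypothetical, and the factor of 10 is the rule's own (debatable) number.

```python
def cost_to_fix(base_cost, stages_late, factor=10):
    """Cost of fixing an issue found `stages_late` stages after it could first have been caught."""
    return base_cost * factor ** stages_late


# Hypothetical stage names, used only to label the output.
STAGES = ["design", "coding", "testing", "release"]

for stages_late, stage in enumerate(STAGES):
    print(f"{stage:<8} {cost_to_fix(100, stages_late):>8}")
```

Under these assumptions, an issue that would cost 100 to fix during design costs 100,000 if it is not found until release.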

The statement of “You Get What You Pay For” also applies to the issue of quality. The experienced professional will know, from direct experience, what works and what only appears to work. This goes far beyond specific technical knowledge of a language or tool. It also involves the project management structure, team dynamics, and other factors in the specific environment. What works extremely well at one company can fail miserably at another. Less experienced people, while often possessing excellent technical skills at the tactical level, typically lack the ability to consider the more strategic view.

In the end, the company that hires employees or engages consultants must make the decision between a low cost on paper (salary/rate) versus a lower overall development cost (also known as Total Cost of Ownership or TCO). They must ask questions such as “How do I improve reliability, and maintainability?”, “How do I ensure on time delivery of a program that meets all of the requirements and goals?”, and most importantly “How do I spend my money in the most effective manner of addressing these issues?”

The CPUWizard is a trademark of Dynamic Concepts Development Corp. We are a full service software consulting firm that has been providing quality software solutions to businesses since 1984. Located in New York, NY, we provide a full range of on-site and remote consulting services. Please visit us on the web at www.dynconcepts.com or e-mail TheCPUWizard@dynconcepts.com

Tuesday, April 22, 2008

Much has been written about the Agile Software Development Processes. While there are many different approaches to this methodology, they all involve being able to rapidly and effectively deal with changing requirements during the software development lifecycle (SDLC). This article looks at software development from a slightly different perspective; creating software that is in and of itself agile....