The Enterprise Library team is trying to improve the performance of the Logging Application Block (LAB) even further. Since version one we did come a long way. It is interesting that even the guidance projects of Microsoft do have change their mind. Lets recap how the LAB has evolved since it initial release in January 2005.
|Released Version ||Logging Application Block (LAB) Characteristics ||ca. Logs/s |
|January 2005 v1.0 ||Log File is opened/closed after every write. Performance was awful but accepted. ||400 Logs/s |
|January 2006 v2.0 ||.NET 2.0 Support, Easier configuration, Log File is flushed after every write. Much better perf. ||3000 Logs/s |
|April 2007 3.0 ||New Application Blocks but no significant changes in the LAB. ||3000 Logs/s |
|May 2008 4.0 ||Log files which are rolled over are finally supported out of the box. The TextFormatter was reworked to make it over 13 times faster as I suggested. ||10 000 Logs/s |
Since customers are still complaining about the performance of the Logging Application Block a performance feature was added to the product backlog:
LAB02: Async logging (text formatting done asynchronously) to cut down on load on primary thread (M)
Looks ok. We have many CPUs by now why not put it on another thread? As Grigory has pointed out there are several problems with that approach. If you switch to another thread you have to fetch the call stack of the calling thread first before doing the thread transition. But since the Stacktrace generation is a fairly expensive operation there is little if nothing at all gained to do it on an another thread. Thread affinity strikes back and destroys your nice lazy init design pattern to initialize only the properties of a log messages which are really configured to be formatted by the message template. So what is the problem here? First I think it is the customers expecting too much from a general purpose application block. The title of the post says it all: Logging is not Tracing!
When you design a big application you need to have good and flexible error reporting perhaps across machines to collect log data in a centralized way. That is a perfect use case for the Logging Application Block where you configure some remote trace listener and send the log data to a central log server which stores its log messages in a database, log file or whatever. If you use out of process communication you are limited by the network performance already which is in the best case several thousand logs/s.
Besides Error Reporting you also need to trace your program flow to find out where the performance bottlenecks are and even more important when an error occurs you have a chance to find out how you did get there. In an ideal world every function would have some tracing enabled with the function duration, passed parameters and how far you did get in your function. This is what performance profilers actually do by using the Profiling API of .NET. If you have ever used such a tool you will notice that the performance of you application will be degraded by a factor 5-100. Doing automatic instrumentation is therefore not an option and you still need to write some code or use an IL weaver to add some tracing code to your app.
I think there has never been made a real distinction between logging and tracing.
- Logging is error reporting.
- Tracing is following your program flow and data.
Logging is always enabled. Tracing on the other hand is normally disabled since it would generate huge amounts of data in very short time. By differentiating these two use cases you can optimize each of them in a different manner. Logging needs to be flexible. Tracing has three main goals: Correctness, performance and performance. This can heavily influence your API design and other design choices to support each scenario as well as possible.
And now back to the original question: Why do customers still complain about the performance of the Logging Application Block (LAB)? I am very certain that these customers use the LAB for tracing and complain about its bad performance. When you try to log enter/leave of some very frequently called functions (millions calls/s) you end up with huge log files and a crawling application. What do you do if your application is slow? Use a profiler and check where most of the time is spent and find that most of the time is spent in the LAB string formatting and disk IO functions. There is little you can do about the disk IO so you leave it and try to optimize the LAB further. That will only push the limit a little and be all in vain until the next customer complains that if he Logs enter/leave of his highly optimized GetHashCode() function his application becomes very slow. How do we get out of this? There is no easy way out of this. It could help to make such performance boundaries more explicit like the WCF team has done it with DataContracts. With explicit I mean one API for logging another one for Tracing. That could help to minimize the pain of users with wrong expectations of the performance of the LAB.
That said I see still quite some potential for the LAB to support tracing better than today:
- Create some TraceFormatter with some quite fixed message format which includes time, pid, tid, category and message for example. No Call Stack since it is too expensive.
- In the formatters cache the the last LogEntry object and the formatted message. If you trace to two different destinations with the same formatter one could reuse the already formatted message if the object reference is the same as last time.
- Format the time in an efficient way like it was shown here. A factor 15 compared to DateTime.ToString should help to make TraceFormatter a really fast one.
- Do not use multithreading. It will make it slower, harder to understand and less reliable. I expect to have 100% reliable logging and tracing. Always. The following simple code should work with no exceptions:
void Main(string  args)
Log.Info("Hello world"); // could be lost if done asynchronously
If you introduce a time latency between the log call and its processing you can loose the most important log message just before your application has terminated.
Perhaps we will see some nice Tracing Application Block in the future if enough customers complain. Just tell Grigory to consider this as well for future version of the Enterprise Library.