SpeedTrace is a relatively unknown profiler made a company called Ipcas. A single professional license does cost 449€+VAT. For the test I did use SpeedTrace 4.5 which is currently Beta. Although it is cheaper than dotTrace it has by far the most options to influence how profiling does work.
First you need to create a tracing project which does configure tracing for one process type. You can start the application directly from the profiler or (much more interesting) it does attach to a specific process when it is started. For this you need to check “Trace the specified …” radio button and enter the process name in the “Process Name of the Trace” edit box. You can even selectively enable tracing for processes with a specific command line. Then you need to activate the trace project by pressing the Activate Project button and you are ready to start VS as usual. If you want to profile the next 10 VS instances that you start you can set the Number of Processes counter to e.g. 10. This is immensely helpful if you are trying to profile only the next 5 started processes.
As you can see there are many more tabs which do allow to influence tracing in a much more sophisticated way. SpeedTrace is
a profiler which does not rely entirely on the profiling Api of .NET. Instead it does modify the IL code (instrumentation on the fly) to write tracing information to disc which can later be analyzed. This approach is not only very fast but it does give you unprecedented analysis capabilities. Once the traces are collected they do show up in your workspace where you can open the trace viewer.
I do skip the other windows because this view is the most useful one. You can sort the methods not only by Wall Clock time but also by CPU consumption and wait time which none of the other products support in their views at the same time. If you want to optimize for CPU consumption sort by CPU time. If you want to find out where most time is spent you need Clock Total time and Clock Waiting. There you can directly see if the method did take long because it did wait on something or it did really execute stuff that did take so long.
Once you have found a method you want to drill deeper you can double click on a method to get to the Caller/Callee view which is similar to the JetBrains Method Grid view. But this time you do see much more. In the middle is the clicked method. Above are the methods that call you and below are the methods that you do directly call. Normally you would then start digging deeper to find the end of the chain where the slow method worth optimizing is located. But there is a shortcut.
You can press the magic button to calculate the aggregation of all called methods. This is displayed in the lower left window where you can see each method call and how long it did take. There you can also sort to see if this call stack does only contain methods (e.g. WCF connect calls which you cannot make faster) not worth optimizing. YourKit has a similar feature where it is called Callees List.
In the Functions tab you have in the context menu also many other useful analysis options
One really outstanding feature is the View Call History Drilldown. When you select this one you get not a sum of all method invocations but a list with the duration of each method call. This is not surprising since SpeedTrace does use tracing to get its timings. There you can get many useful graphs how this method did behave over time. Did it become slower at some point in time or was only the first call slow? The diagrams and the list will tell you that.
That is all fine but what should I do when one method call was slow? I want to see from where it was coming from. No problem select the method in the list hit F10 and you get the call stack. This is a life saver if you e.g. search for serialization problems. Today Serializers are used everywhere. You want to find out from where the 5s XmlSerializer.Deserialize call did come from? Hit F10 and you get the call stack which did invoke the 5s Deserialize call. The CPU timeline tab is also useful to find out where long pauses or excessive CPU consumption did happen. Click in the graph to get the Thread Stacks window where you can get a quick overview what all threads were doing at this time. This does look like the Stack Traces feature in YourKit. Only this time you get the last called method first which helps to quickly see what all threads were executing at this moment. YourKit does generate a rather long list which can be hard to go through when you have many threads.
The thread list in the middle does not give you call stacks or anything like that but you see which methods were found most often executing code by the profiler which is a good indication for methods consuming most CPU time.
This does sound too good to be true? I have not told you the best part yet. The best thing about this profiler is the staff behind it. When I do see a crash or some other odd behavior I send a mail to Ipcas and I do get usually the next day a mail that the problem has been fixed and a download link to the new version. The guys at Ipcas are even so helpful to log in to your machine via a Citrix Client to help you to get started profiling your actual application you want to profile. After a 2h telco I was converted from a hater to a believer of this tool. The fast response time might also have something to do with the fact that they are actively working on 4.5 to get out of the door. But still the support is by far the best I have encountered so far.
The only downside is that you should instrument your assemblies including the .NET Framework to get most accurate numbers. You can profile without doing it but then you will see very high JIT times in your process which can severely affect the correctness of the measured timings. If you do not care about exact numbers you can also enable in the main UI in the Data Trace tab logging of method arguments of primitive types. If you need to know what files at which times were opened by your application you can find it out without a debugger.
Since SpeedTrace does read huge trace files in its reader you should perhaps use a 64 bit machine to be able to analyze bigger traces as well. The memory consumption of the trace reader is too high for my taste. But they did promise for the next version to come up with something much improved.