Dylan Smith

ALM / Architecture / TFS


Tuesday, January 14, 2014 #

In part 1 of this series we looked at Why Agile Fails due to lack of mature Technical Practices.

The second really common reason I see teams fail with Agile is a lack of experience with *successful* agile projects.

What I see far too often is teams that have read all about agile online, and possibly been to some conferences and heard people talk about it.  Maybe some (or all) of the team members go and read some agile books, and maybe they even decide to send all team members on a 1-week Scrum course.

These are all good things, however, there is no substitute for actual experience working on a successful agile team.  There is just no way reading a book, or even taking a course can be a substitute for real-life hands-on experience.

There are a lot of tricky little details that all add up and make the difference between a team being successful or not with Agile.  Things like:

  • How do I split up the work into User Stories that follow INVEST principles?
  • How do we break down the work into meaningful tasks?
  • What do we do when we reach the end of a Sprint and User Stories are only half-done?
  • How do we build trust between the team and the stakeholders and/or upper management? (this is a *really* hard one)
  • What are the problems that we should be focusing on in the Retrospective?
  • How do we prioritize paying down technical debt against our customer-facing User Stories?
  • etc.

A lot of these are problems that books and training may try to tackle, but they are the type of things that you can really only learn by experiencing them.

I think that training is important – possibly even necessary – but it is not sufficient (and I say this even though my company offers this training: http://www.imaginet.com/training).  You *need* to have at least one senior team member who has lived and breathed agile on a successful team: somebody who has seen how tricky Epics can be broken down into effective User Stories, and who can recognize when something is not going well and bring it up in the retrospective.

I think that last point is possibly one of the most important.  Teams with no experience with Agile success have a hard time recognizing when they are doing things poorly.  They often think that’s just the way things are supposed to be, or don’t recognize that they are doing something that is going to hurt them down the road.

 

So my advice for any team that wants to try Agile is to make sure you have somebody senior on the team who has experience living and breathing Agile, and it has to be experience being successful with Agile.  Having somebody whose only Agile experience comes from a team that failed obviously isn’t going to be too helpful.

I’m a consultant who does agile coaching, so one option I’ll of course suggest is bringing in an Agile Coach from one of the many good consulting firms that do this.  My company – Imaginet – does this: http://www.imaginet.com/alm-coaching-and-mentoring

However, hiring a consultant is not the only option here.  Hiring an employee (or recruiting one from another department) with the desired experience would work just as well.  The key point is to make sure you have somebody (preferably several somebodies) that have this experience on your team.


Tuesday, December 3, 2013 #

I’ve worked with a lot of teams who have tried to adopt Agile and failed.  There are many reasons why this happens, but I tend to see a few very common reasons crop up over and over.  In particular there are 3 that I see clients struggle with all the time:

  1. Lack of focus on technical practices
  2. No agile experience on the team
  3. Missing buy-in from upper management

I’m going to try and tackle this in a 3-part blog series.

 

Lack of Focus on Technical Practices

When I talk about clients trying Agile, what I really mean is companies trying Scrum.  The *vast* majority of companies I work with that try to go agile, are really trying to adopt Scrum.  For most companies Agile == Scrum. As I’m sure you all know, Scrum says absolutely nothing about technical practices.

The Waterfall Approach

Let’s step back a moment and look at what happens on countless waterfall projects.  The team goes through an extensive requirements gathering phase, trying to capture everything possible to avoid rework later.  Then there is a lengthy design phase, where software architects come in and try to design an elegant software architecture that can handle the very detailed requirements.  And to be fair, a lot of them do an excellent job at this task.  I have no beefs with the ability of these architects, or the quality of the designs they produce.  They often come up with very effective software architecture/design to satisfy the requirements.

Then the problems start.  As development gets under way, edge cases that were never considered are discovered.  Requirements that don’t make sense are found.  Additional capabilities need to be developed.  None of these things were accounted for in the original design phase.  Rather than re-visiting the architecture, it is contorted into doing things that weren’t originally envisioned, features are tacked on, and technical debt grows.  Despite this, the development team usually manages to make things work and eventually gets to feature-complete.

Then the users get their hands on the software, and all hell breaks loose.  The software doesn’t meet the business needs, requirements were misunderstood, the design doesn’t satisfy the business needs, and lots of rework is required.  Again, none of this rework was taken into account in the original architecture, so it is bastardized some more to make things work.  The dev team that implemented it is probably still all around, and they are pretty familiar with the code base since they just wrote it, so they are able to hack away and make the necessary changes.  Again racking up massive amounts of technical debt.

Then the software goes into maintenance mode.  Over several years more changes are needed, the design is hacked away at.  The original developers move on to greener pastures, and the new developers don’t understand the original design, and hack away some more.  Eventually things get really bad, progress on new features slows to a crawl, the software becomes fragile, people are afraid to touch large areas of the code-base.  And after maybe 5 years (if they’re lucky) the team starts considering a total re-write.  This time we’ll get things right they say.  Inevitably, the same story repeats, and the endless cycle of 5-year rewrites continues.

The “Agile” Approach

What do you think happens when these same teams attempt to adopt Agile/Scrum?  Rather than spending a lot of time gathering extensive requirements up-front, just enough are gathered to get started.  Just enough architecture activities are done, and development proceeds.  Now, rather than developing in isolation for a year, discovering a few missing requirements, then deploying and doing a bunch more rework, requirements never considered in the original (brief) architecture efforts are introduced every 3-week sprint.  Hacks are made, and technical debt accumulates MUCH more quickly due to the rapidly changing nature of requirements on an agile project.

Instead of the 5-year rewrite cycle on Waterfall projects, the agile code-base can often deteriorate within a year.  Sometimes before the 1st major release is even made, the code-base gets to the point where teams have to throw it away and start over.  This process of code-rot and technical debt accumulation is only accelerated on agile projects if nothing is done to aggressively address it.

 

The Solution

The solution is that in order for teams to be successful adopting Agile, they must adopt more mature technical practices in tandem with the Agile work management practices.  In particular, teams must be laser focused on keeping technical debt under control through aggressive refactoring, and automated testing to enable refactoring with confidence.

The team must be continually refactoring and improving the design, as they adapt to changing requirements and demands on the software.  I won’t go over specific practices in detail here, but the common failure of teams is to think that they can adopt the Agile work management practices, without also focusing on adopting (much) more mature technical practices.


Monday, September 16, 2013 #

In TFS 2012 Update 2 Microsoft introduced the ability to tag Work Items (http://msdn.microsoft.com/en-us/library/vstudio/dn132606.aspx).


I absolutely love the idea of tagging Work Items, especially because they allow you to add custom metadata without needing to do any WITD customization.  As I mentioned in a previous post, this is a great help to enable the Single Team Project approach.  However, there are some big problems that IMO make Work Item Tags almost unusable for any large project.

  • Tags in their current form are only really usable in Web Access. When viewing WI’s in VS or MTM you can’t enter/modify tags, and you can only even see them in the results grid; the WI Detail form doesn’t appear to include tags.

  • I can’t query by tags in my WI Queries

  • I can’t merge tags.  Say I had a tag to indicate hot fix bugs, and some people typed “HotFix” while others typed “Hot Fix”.  Even once I discover this, there’s no way to easily correct the tags other than going into each Work Item one at a time and changing it (you can’t use Excel to bulk edit Tags either).

  • I can’t delete tags.  So in the above HotFix example, even if I go through and change every WI to “HotFix” the next time somebody creates a new one they will still see the incorrect “Hot Fix” tag in the dropdown.

  • If I have tags that I’m using to categorize Work Items, say “Priority: High”, “Priority: Medium”, “Priority: Low”, I may want to go and ensure that every WI has a Priority tag assigned.  As far as I can tell there is no way to search/query/filter for this.  I can’t do a “show me all work-items without one of these 3 tags”. You can’t do any kind of “negative” filtering.

  • I can’t filter SSRS reports or generate excel based reports on tags.

  • I can’t put security on tags: who can create tags, who can delete tags, scoping tag groups to teams, etc.

These issues are big enough blockers that I can’t recommend teams use tagging until at least some of them have been resolved.  WI Tagging makes for a great demo, but it needs some more love and care before most of my teams can actually start using it.


Tuesday, September 10, 2013 #

As most of you know it’s probably easier to understand Rocket Science than it is Microsoft Licensing.  Unfortunately, I’ve had to deal with it enough over my career that I have a pretty good grasp of at least how MSDN, Visual Studio and TFS Licensing works.  The best resource for attempting to decipher how it works is the MSDN Licensing White Paper.

However, there are several gotchas with the way licensing works.  These are things that most people don’t even realize, and they run afoul of the rules without knowing it.  There are also some good parts of the licensing rules that lots of people don’t know about.  Let’s start with the good ones:

 

The Good

TFS is Free

For most teams TFS is actually “free”.  MSDN includes a production license for TFS, meaning so long as you have more MSDN Licenses (VS Pro and up) than you have TFS Servers you don’t need to buy any additional licenses (of course you still need to make sure you have CAL’s for everybody, but your MSDN users will all have CAL’s as part of their MSDN). 

 

SQL and SCVMM Exception

TFS requires a SQL Server and if you’re using TFS Lab you need an SCVMM Server.  MSDN includes a “limited use” license that allows you to install SQL Server Standard and SCVMM without needing to purchase licenses for them, on the condition that they are *only* used for TFS (the SQL Server can be used for the TFS Data Tier and the SCVMM Data Tier and the SharePoint Data Tier, so long as the SCVMM/SharePoint instances are used only for TFS).  Note: If you wish to use SQL Server Enterprise you have to purchase a license for that. 

 

TFS Lab Software

If you use TFS Lab functionality, the virtual machines you setup do not require you to purchase any licenses for Windows/SQL/BizTalk/etc (or for the Hyper-V Host and Library servers) – so long as everybody accessing TFS Lab environments has an MSDN license.  This effectively means that TFS Lab is "free" on the condition that only MSDN licensed users access it.

 

The Bad

Cannot Deploy Development VM’s to Same Hosts as Production VM’s

This is really the most shocking one to me.  Any VM that uses MSDN licensed software cannot reside on the same physical host server as any production VM’s.  This would effectively require companies to have separate virtualization infrastructure for production and non-production VM’s.  Almost nobody actually does this (in my experience), and as a result most of them are violating the MSDN licensing rules.  Here is the specific quote from the MSDN Licensing White Paper that specifies this limitation:

“If a physical machine running one or more virtual machines is used entirely for development and test, then the operating system used on the physical host system can be MSDN software. However, if the physical machine or any of the VMs hosted on that physical system are used for other purposes, then both the operating system within the VM and the operating system for the physical host must be licensed separately. The same holds true for other software used on the system—for example, Microsoft SQL Server obtained as MSDN software can only be used to design, develop, test, and demonstrate your programs.”

When I first read that I thought I must be misunderstanding what that means because it just sounded so outrageous.  However, I received confirmation from Microsoft (on the ALM MVP mailing list) that this does in fact mean you must have separate hosts for Production vs. MSDN-Licensed VM’s.

 

Software Installation Requires MSDN

Most teams I work with have several Dev/Test/UAT/etc environments that they use.  These are often provisioned by some central IT Ops group, then the Dev team takes over and installs the necessary software.  These environments should not require any purchase of licenses for Windows/SQL/etc because they will be using MSDN licensed software (this is exactly what MSDN licensing was invented for).  However, when the IT Ops group provisions the machines, that usually involves installing Windows and Windows Updates, and joining the machine to the domain.  What most people don’t realize is that anybody who accesses that machine requires MSDN, even if they are just accessing it to install the OS or join it to the domain.  So unless your IT Ops group has MSDN licenses for their staff (which they usually don’t), they are violating MSDN licensing rules.

 

Multiple TFS CAL’s Required for Consultants/Contractors 

If you are somebody who accesses the TFS Servers belonging to multiple different companies, you might not realize that you actually need to acquire a separate TFS CAL for each company.  For example, I work for Imaginet and Imaginet pays for my MSDN subscription.  The TFS CAL that comes with that MSDN is only valid for connecting to Imaginet TFS Servers.  For each client I visit, I would need a separate CAL to access their TFS Servers, in theory acquired by that client.

 

UAT Exception and Using MTM 

There is a specific exception in MSDN that says MSDN Licensed environments (your Dev/Test/UAT environments) can be used for performing UAT, even by people who do not hold MSDN Licenses.  In this case, my understanding is that UAT is specifically limited to end-users of the application (it does not apply to QA staff, who will require MSDN licenses).  The question came up recently whether those end-users performing UAT are allowed to use MTM to facilitate that UAT process (run Manual Test Cases from MTM and use MTM to record the results).  The answer is that UAT testers can in fact use MTM without an MSDN license (just like any other MSDN software); *however*, they are not allowed to access TFS without a CAL, even when doing UAT.  Since MTM is totally useless without accessing TFS, this effectively means that they cannot use MTM to perform UAT.

The silver lining here, is that if you do wish the UAT testers to use MTM, you don’t need to buy them an expensive MSDN license, you just have to buy them a somewhat cheaper TFS CAL.


Thursday, September 5, 2013 #

I seem to be spending a lot of time lately trying to convince clients that a single Team Project for the entire Enterprise is the way to go.  To most people this seems counter-intuitive.  They tend to create Team Projects for each actual Project and/or Team within their Enterprise.  That just makes sense right?  Indeed, if you look at most books on TFS they will usually have a section with guidance on Scoping Team Projects that usually recommends approaches that result in many Team Projects.  My “go to” TFS book - Professional TFS 2012 - recommends choosing one of 3 approaches: Team Project Per Application, Team Project Per Release, or Team Project Per Team.

However, most TFS experts have come to agreement in the past couple of years that one big Team Project for the entire Company is often the best choice.  The fact that this realization is relatively new explains why you won’t find much literature on it in published TFS books (hopefully that will change in the near future).  Within the group of ALM MVP’s (which includes most people who would be considered TFS experts) most of us agree on this approach.  And in fact most (all?) of the Microsoft TFS Product Team that I’ve talked to about this agree with the Single Team Project approach (they use this approach internally within Microsoft).  Microsoft is actively introducing new TFS features that make this approach easier (more on that below). I also had a conversation with the authors of the Professional TFS 2012 book recently, and they all agreed that their guidance is outdated, and the next edition of that book should focus more on the Single Team Project approach.

The best source of information I’ve found to date on this topic is a handful of blog posts by Martin Hinshelwood (in chronological order):

 

What’s Wrong With Many Team Projects

The core of the problem is that multiple Team Projects introduce constraints and limitations on what you can do, while providing little benefit.  The perceived benefits of multiple Team Projects can usually be realized within a Single Team Project just through different mechanisms, and without the limitations that Team Project boundaries place on you.

What are the reasons that teams typically create multiple Team Projects?  They do so because there are benefits to isolating and separating the various assets of each project/team, and creating a Team Project for each seems like the most intuitive way to achieve this.  If you have two separate projects (by projects here I mean separate Business Projects, typically with different teams working on them), then you typically don’t want members of Project A seeing or being able to modify the assets of Project B (by assets, I mean source code, Work Items, build definitions, etc).  In addition to having security around the assets of each team, you also don’t want things like Reports, Backlogs, Queries, etc. showing assets from multiple projects.  Each team should have its own separate backlog, reports, etc.  And having a Team Project for each seems like an obvious way to achieve this.

However, multiple Team Projects can cause significant hurdles if you ever wish to move data across a Team Project boundary.  Team Projects are intended to isolate the data stored in each one, and there is no easy way to move some data from one Team Project to another.  For source code, you could easily grab a copy of the source code in one project and check it in to another Team Project; however, you would lose all history.  I think you might also be able to do a Move command to move source code from one project to another, but again the history doesn’t come across (at least not in an ideal way).  You can try to use the TFS Integration Platform (TIP) to migrate source code and history, however the TIP is horribly buggy and awkward to use.  So much so that most TFS Experts I know refuse to even try to use it anymore.

The real problems, though, come when talking about Work Items.  There is no obvious way to move Work Items between projects.  Again, you might think the TIP is an option, but just like with source code, the TIP is crippled to the point of being unusable.  Excel is a popular choice: export the WI’s from one Team Project into Excel, then use Excel to import them into the other Team Project.  There are some major issues with the Excel approach:

  • Any HTML fields (such as most description fields) will lose all formatting when being exported to Excel.  This is usually unacceptable.

 

  • If the Work Item Type Definitions between projects differ then you may be in trouble.  At a minimum you will have to go through a mapping exercise to determine how the Work Item Types and fields from one project map to fields in the other project.  Then apply that mapping manually via Excel as part of the Export/Import process.

 

  • Most WI’s only have one valid “starting state”, then defined transitions from that state.  So let’s say your WI workflow enforces that all WI’s start in “New”, then they must go to “Approved”, then finally to “Done”.  Well, how do you migrate WI’s that are already in the Done state?  You can’t just Excel-import them, because you’d effectively be trying to create a new WI directly in the Done state, which isn’t allowed.  You’d have to first import them all as New, then transition them to Approved, then to Done.

 

  • You will lose WI History.  This is often acceptable for most teams, but thought I’d point it out anyways.

 

To summarize, moving data (specifically WI’s) between Team Projects is complex and extremely time-consuming.  It can be done, but usually requires you to hire a TFS Consultant to help you with it, and that can get very expensive.  As a TFS consultant myself, you might think I’d be happy about this.  After all, my company makes a lot of money helping clients with things like this.  But believe me when I tell you, if I never have to do a WI migration ever again, I will be a happy guy.

 

There is still one last important point here to bring this all together: why would you ever want to move data between Team Projects?  If you specifically created a Team Project for each project/team, then you should never need to move data between them, right?  Wrong!  There are a number of reasons why the Team Project structure you create initially may not be appropriate 5 years from now.  Here are some examples of situations I’ve run into with clients:

  • One team starts a project, then gets handed off to a different team later
    • A common example is where one team will develop some software, then hand it off to a different team for support/maintenance.  That maintenance team is often responsible for many projects.  The maintenance team wants all their work items for all their projects in one team project so they can see one “team backlog”, and roll-up reports across projects.  They also may want to see backlog/reports for each specific project under them also.

 

  • You have a single Team that works on multiple Projects
    • An example I’ve seen is you have 4 projects each in their own Team Project, but you have a shared BI team that handles all reporting/ETL/Data Warehouse tasks across all projects.  Each project wants to see all of the project tasks together, and the BI team wants to see a consolidated list of work/backlog that belongs to the BI team.  If each of the 4 projects have their own Team Project, then there is no way for the BI team to see a consolidated backlog of their work.  If you create a separate Team Project for the BI team then there’s no easy way to move BI work items into that project, and if you duplicate them you run into another set of problems.

One last point that is useful to keep in mind is that, in general, it is easier to split a Team Project up into multiple Team Projects later than it is to start with many and combine them.

 

Structuring the Single Team Project

Going with a Single Team Project avoids most of the above problems.  You will obviously never need to move Work Items (or code) between projects.  But now you have the challenge of how to organize data within the Single Team Project to provide separation and isolation.  This is accomplished mostly through a combination of the Area hierarchy and the Teams functionality.  If we imagine we combine what used to be multiple Team Projects into a single Team Project, we end up with many “sub-projects” (not an official term) within the single Team Project.  We can create a root Area for each sub-project, a TFS Team for each sub-project, and a root source control folder for each sub-project.  Then we can use the Area field to filter all reports and queries.  Each Team is tied to the related Area and is used to provide each Team/Sub-Project with its own Product Backlog.  And Security can be granted based on Area and/or Source Control Path.

  

Areas 

You should create a root area for each sub-project.

 

Iterations

Just like with Area hierarchy you should create a root node in the Iteration Hierarchy for each sub-project.  Then you can maintain/manage the sprints/iterations for each sub-project separately.  

 

Source Control 

Create a root folder for each sub-project.

 

TFS Teams 

Create a TFS Team for each sub-project.  You can create a hierarchy of Teams, so each sub-project could potentially have many teams, each with their own Product Backlog, which then roll up into the parent team for that sub-project.

 

TFS Security Groups 

You probably don’t want to use the default TFS Security Groups (Reader, Contributor, Build Administrator, Project Administrator).  If you give somebody the Contributor role, it will make them a Contributor across every sub-project which probably isn’t desired.  What you should do is create those 4 groups for each and every sub-project.  So if you have 5 sub-projects, you should end up with 20 TFS Security Groups (plus the original 4 that aren’t sub-project specific).

 

Work Item Security 

Grant the various sub-project Security Groups permissions for the root Area associated with that sub-project.  This will ensure that only members of that team can edit/view WI's that belong to that team's Area(s).

 

Source Control Security 

Just like Work Items, you should grant the sub-project Security Groups permissions only to the root folder associated with that sub-project.

 

Build Agents/Controllers 

Ideally you would have a Build Controller for each sub-project, then one or more Build Agents/Build Servers for each sub-project.  Having separate build agents/build servers for each sub-project has long been a good practice.  This is because a team typically needs admin rights on their build server, and often needs to install various software/SDK’s/frameworks/etc. onto the build server to support their build process.  This is typically different for each team, so you usually don’t want to share build servers between teams.  The separate Build Controller recommendation is to avoid the possibility of a team accidentally using another team’s build server.  If we were to share a Build Controller, it is too easy to accidentally configure a build to use any build agent regardless of “agent tags”, which will have the unintended behavior of using other teams’/sub-projects’ build servers/agents.  Forcing me to pick a sub-project-specific Build Controller when setting up a build definition makes it harder to accidentally run into this situation.

 

Build Definitions 

When you have a Single Team Project, you are going to end up with a very long list of Build Definitions since all sub-projects’ Build Definitions will be mashed together into one list.  In 2010 there used to be a tool called InMeta Build Explorer that would allow you to introduce virtual folders to organize them.  That tool does not work in TFS 2012/2013, however there are new features built into TFS/VS that make things more manageable.  You can now filter/search through the Build Definition list right in Team Explorer, and you can also set up My Favorites and Team Favorites to make your most common builds more visible.

 

Work Item Queries 

You can create folders to organize Work Item Queries.  You should create a root folder for each sub-project, then assign permissions appropriately so each sub-project Security Group only has permission to its relevant WIQ folder.

 

SharePoint Portal

If your team uses a SharePoint Project Portal then you will ideally want a separate Portal for each sub-project that only shows data for that sub-project.  You can accomplish this by opening up the main Portal (for the Team Project) and creating a new SubSite for each sub-project.  When you create the SubSite, pick the TFS Project Portal Site Template.  Then open up the new Portal and edit each Web Part’s properties to ensure they all filter by the relevant Area for that sub-project.

 

Reporting

For the Reporting component of TFS no changes are needed.  Most (all?) Reports include the ability to filter by Area, so you simply filter by the Area corresponding to the Sub-Project you are interested in.

 

What Are the Downsides to a Single Team Project

There are a couple of downsides to the Single Team Project approach:

  

More Complex Administration 

Adding a new sub-project requires a little leg-work to setup the proper security groups, root areas, WIQ folders, etc.  You generally need to have a central TFS Administrator group/person that understands this process (and probably has it documented) and performs it whenever a new sub-project is needed.  For example creating a new sub-project might consist of the following steps:

  • Create root Area
  • Create TFS Team
  • Create WI Query Folder
  • Create root Source Control folder
  • Create Team-specific Security Groups
  • Setup Build Controller/Agent
  • etc

  

Process Template Customizations 

Any customizations to the Process Template or Work Item Type Definitions will apply to every sub-project.  This is by far the biggest problem.  Any custom fields, or custom WI workflow states, or even custom WI Types will show up in all sub-projects.  Microsoft appears to be rapidly introducing features that help mitigate this problem.  Some examples of ways to mitigate this could be:

  • Ensure all changes go through a central TFS Administration group.  This group can try to minimize customizations by suggesting alternatives (ex. adding a transition reason rather than an entirely new state).  Also you can try to make them more generic (e.g. when adding a custom field to link to a bug in HP QC, you could call it Quality Center ID, or you could call it Reference ID so it can be used across other sub-projects to link to other external systems that aren’t QC).

 

  • Work Item tags allow you to input metadata against a Work Item without needing to customize the WITD.  This is *great* for the Single Team Project approach, because you can now enter sub-project specific information without needing to add customizations that affect other sub-projects.  For example, if one sub-project really wants to designate some Bugs as HotFix, they might try to add a custom field “HotFix?” with a Yes/No dropdown.  The problem is that the HotFix dropdown will show up in all sub-projects, and other sub-projects probably would have no idea what that even means.  Instead you could simply apply a HotFix tag to those WI’s in that specific sub-project only.  Another example is when only a small number of sub-projects want to put a Priority field on Tasks (High, Medium, Low).  Instead of a custom field they could simply use tags: “Priority: High”, “Priority: Medium”, “Priority: Low”.

 

  • Kanban Boards introduced into TFS recently allow you to define different Kanban states/workflows for each Team.  This can often allow you to keep the Workflow states very generic just to support cross-project reporting.  Then the team-specific Kanban states can be used to model each team’s specific workflow.

 

  • Work Item Extensions is a new feature introduced to enable the KanBan Boards.  It is not currently possible for anybody other than Microsoft to take advantage of this feature, but the TFS Team used Work Item Extensions to implement the KanBan features.  It adds custom fields to WI’s dynamically based on the Area Path of the WI.  In theory (and this is my hope), as this feature evolves it will allow us to define custom fields that only apply to a specific Area associated with a specific sub-project.

Sunday, June 2, 2013 #

In Visual Studio 2010 we had “Database Projects” that allowed us to design/develop/deploy databases.  In Visual Studio 2012 this was overhauled and is now part of SSDT (SQL Server Data Tools).  While the core functionality is extremely similar, there are some differences between the two that I’m going to try and describe in this post.

PRO: Visual Table Designer

In 2010 we only had a basic text editor for editing the various .sql files.  In 2012 we get both the T-SQL and graphical table designer side-by-side.

[Screenshot: Table Designer]

PRO: Simpler File/Folder Structure

In 2010 the default folder structure is rather complex (even if you have a trivial database).  In 2012 the default folder structure is much simpler, flatter, and it only creates folders for object types that you actually import (unlike 2010 where it created placeholder folders for all object types even if you didn’t have any).

In addition, 2012 generates fewer files when you do a database import.  It keeps everything related to a specific table in a single file.  In the above example of the Products table, that is 1 file (Products.sql) in 2012; in 2010 it would have been 14 files (table + 10 defaults + 1 pkey + 2 fkeys).  I much prefer the new structure.

[Screenshots: 2012 Folder Structure, DB Folders]

 

PRO: Fewer Build Outputs / Config Files

In 2010 when you build a Database Project it outputs the following files:

  • dbschema – the xml description of the database objects
  • sqlcmdvars – a set of values for any custom sqlcmd variables you may have defined
  • sqldeployment – config values that control the deployment process
  • sqlsettings – database properties
  • postdeployment.sql – post-deployment script
  • predeployment.sql – pre-deployment script
  • deploymanifest – manifest file with links to the other 6 files

In 2012 when you build you get the following output:

  • dacpac – the description of the database objects (this includes pre/post deployment scripts and database properties)
  • publish.xml – the publish profile that contains deployment config values and sqlcmd values

If you want to provide DB deployment files for multiple environments things get even worse in 2010.  In 2012 you would have one dacpac, and a publish profile for each environment.  In 2010 you need separate copies of the sqlcmdvars/sqldeployment/deploymanifest files.  So if we imagine we have 3 environments (DEV, QA, PROD), under 2010 we would require 13 files – under 2012 we require only 4 files.

PRO: Support In SQL Server Tooling

In 2010 the deployment tooling (VSDBCMD.exe) was a custom tool specific to the Database Project system.  In 2012 it uses DACPAC files, which are a pre-existing SQL Server concept.  The SQL Server toolset that DBA’s know and love (SSMS) already includes support for performing operations with DACPAC files.  This may help get buy-in from the DBA’s when you propose giving them a DACPAC rather than a SQL script at deployment time.

PRO: Available in all Visual Studio SKU’s

In 2010 you needed to have at least Visual Studio Premium to take full advantage of Database Projects.  The .sqlproj in 2012 is provided as part of SSDT (SQL Server Data Tools), which is free.  You can use it even if you are only using the free VS Express Edition.

CON: Data Generation Plans are Gone

SSDT does not include any equivalent for the Data Generation Plan feature in 2010 (it also didn’t have Database Unit Testing when it first shipped, but that was added in the December 2012 SSDT Update).

CON: Refactor Doesn’t Include Pre/Post Deployment Scripts

In 2012 we have similar Refactoring support (rename columns/tables) and it will find all references and update them all at the same time.  However, in 2012 it will not update any references in the Pre/Post Deployment Scripts (in 2010 it would).  This is compounded by the fact that my Post-Deployment Scripts tend to be doing a lot more work now that Data Generation Plans are gone (I generate my sample data in my Post Deployment Script instead).

CON: (Some) Deployment Options Missing in 2012

You can specify a bunch of configuration values that control the deployment process (the .sqldeployment file in 2010, the Publish Profile in 2012).  Some options that existed in 2010 are missing in 2012.  The main one that I’ve run into is that the Ignore Column Order option is gone – this allowed you to tell the deployment tooling that if two tables are the same except for the order of their columns, it should do nothing.

CON: Provider Extensibility Model Gone

In 2010 there was a Provider Extensibility Model that allowed 3rd parties to author alternative Providers that could plug-in to the 2010 Database Projects providing support for non-SQL Server Databases.  The only one I know of was the Oracle Provider from Quest (called TOAD Extension For Visual Studio), but it was used by *a lot* of Oracle developers.  In 2012 it is a strictly SQL Server-Only tool.

Conclusion

So which one is better, 2010 or 2012?  While the PRO list is actually longer than my CON list, most of the PRO’s are superficial improvements that I could live without if I had to, or work around (I don’t need a GUI table designer, fewer files/folders is nice but not a killer feature, etc).  The CON’s are generally things that I can’t work around (need to develop against Oracle? Too bad) and can be big headaches.  In my not-so-humble opinion, I feel like the SSDT tooling with VS 2012 is a small step backwards from what we had in 2010.

 

Update:

It was pointed out to me by @Gregory_Ott that I missed one (if anyone knows of any more I missed, leave a comment and I’ll keep updating this post).

CON: Code Analysis Rule Authoring

In 2010 there was an extensibility point allowing you to author your own T-SQL Code Analysis Rules.  In 2012 this is no longer possible (http://social.msdn.microsoft.com/Forums/en-US/ssdt/thread/1bc90f25-d3c7-4f52-a4e3-8e2cec2ff135/).


Sunday, May 19, 2013 #

The past few weeks I’ve been helping a client come up with an Enterprise Architecture, and I realized that I seem to have zeroed in on an EA that I would probably use at most places.

First off, what do I mean by Enterprise Architecture?  I know lots of people use this term to mean different things; for this post I’m using Enterprise Architecture to describe how the various applications and systems in an Enterprise will interconnect and integrate with each other (where necessary).  Effective Enterprise Architecture should enable powerful integration scenarios and application re-use, while encouraging loose coupling to minimize the cost and impact of change on other systems.

The EA described below relies on Web Services and a SOA approach, while also leveraging the PubSub (Publish/Subscribe) pattern common in Message-Based Architecture.

While this Enterprise Architecture describes the external interfaces each system exposes, and how data flows between them, it does not describe the inner workings of each specific System (that’s the Application Architecture). Having a well-thought out Enterprise Architecture enables flexibility in choosing Application Architectures. Within each system you can choose to use a different Application Architecture, or even change a System’s Application Architecture in the future with minimal impact on other systems.

 

Integration at the Database

Most teams I encounter do integration exclusively by reading/writing directly from each System’s database.  Although the majority of software teams out there probably do integration by database, in my experience the majority of software teams also deeply regret this decision.  Integrating at the DB level tightly couples applications to the database schema design, making it risky to ever change that design.  It also limits you to reusing data only, rather than reusing application logic as well.

Service Oriented Architecture

Enterprise Architecture consists of breaking down the software ecosystem into independent Systems, and defining a well-known way for those Systems to integrate and/or exchange data. A common approach to manage this integration is to take a SOA approach, and wrap each system in a (Web) Service with a well-defined Service Contract. This moves the inter-system dependencies to the Service Layer rather than the Database layer. At first glance this would seem to simply move the coupling from the DB to the Service Contract, making it risky to ever change the Service Contract. However, it does enable re-use of application logic in addition to data. But more importantly there are well-known techniques to evolve Service Contracts while maintaining compatibility.
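
For example, a hypothetical Customer system could be wrapped in a WCF-style contract along these lines (the service and DTO names here are purely illustrative, not from any specific system):

using System.Runtime.Serialization;
using System.ServiceModel;

[ServiceContract]
public interface ICustomerService
{
    [OperationContract]
    CustomerDto GetCustomer(int customerId);
}

[DataContract]
public class CustomerDto
{
    [DataMember] public int Id { get; set; }
    [DataMember] public string Name { get; set; }
    [DataMember] public decimal CreditLimit { get; set; }
}

Other systems depend only on this contract; the schema behind it can change freely.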

The most common approach to Service evolution is to expose separate End-Points for each version of the Service Contract. This way if you wish to modify the Service Contract, you publish a new End-Point with the new Service Contract while leaving the End-Point with the previous Service Contract active. Then implement a compatibility layer that translates service calls from the old Service Contract to the new Service Contract. This way the core System logic needs only support the most recent version of the Service Contract. It also provides a convenient place to introduce logging to understand which Systems are still depending on the old Service Contracts and plan the upgrade work required to move them to the new Service Contract, enabling the eventual retirement of older Service Contracts/End-Points.
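
Continuing the hypothetical contract above, a rough sketch of that compatibility layer could look like the following, where the old end-point is kept alive by an adapter that translates calls onto the new contract:

// V2 of the contract, exposed at a separate end-point alongside the original.
[ServiceContract(Namespace = "http://example.com/customers/v2")]
public interface ICustomerServiceV2
{
    [OperationContract]
    CustomerDtoV2 GetCustomer(int customerId);
}

[DataContract]
public class CustomerDtoV2
{
    [DataMember] public int Id { get; set; }
    [DataMember] public string FullName { get; set; }     // renamed field in V2
    [DataMember] public decimal CreditLimit { get; set; }
}

// Compatibility layer: keeps the original ICustomerService end-point alive by
// translating calls onto the V2 implementation, so the core system only has to
// implement the latest contract.
public class CustomerServiceV1Adapter : ICustomerService
{
    private readonly ICustomerServiceV2 _current;

    public CustomerServiceV1Adapter(ICustomerServiceV2 current)
    {
        _current = current;
    }

    public CustomerDto GetCustomer(int customerId)
    {
        // Also a convenient place to log which consumers still use the old contract.
        CustomerDtoV2 latest = _current.GetCustomer(customerId);
        return new CustomerDto { Id = latest.Id, Name = latest.FullName, CreditLimit = latest.CreditLimit };
    }
}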

PubSub Pattern

The introduction of an SOA approach would be a marked improvement over integration at the DB level; however, there are still challenges with a Service-only approach to integration:

  • Availability - Service-level integration introduces availability concerns. Imagine a scenario where System A depends on Systems B, C, and D via their Service Contract. System A either needs to retrieve data from B/C/D to perform some work, or some operation in System A needs to request B/C/D to perform an operation as part of the System A operation (or often both of the above). If any of the B/C/D systems become un-available it will also impact the availability of System A, as now any System A operations that interact with B/C/D will also fail. System A’s availability is now tied to B, C, and D’s availability.

  • Coupling - Let’s imagine that we’re System A developers, and when some important operation in System A occurs, Systems B, C and D need to be notified so they can perform some related action. There are really two options. Either B, C and D can poll System A’s Service, constantly querying to see when the relevant data in A has changed; this will have significant performance impacts on System A. The alternative is for System A to explicitly call some Service method in B/C/D when the relevant operation in System A happens. This not only incurs the availability concerns noted above, but what happens when another team develops System E that also wishes to be notified? Does the System E development team now need to ask System A’s development team to make changes to System A in order for System E to work? This is not a good situation to be in.

My approach is that in addition to wrapping every System with a Web Service, we also borrow from Message-Based Architecture and use a PubSub (Publish/Subscribe) pattern. In this pattern each System would publish “Domain Events” when things of interest occur within the System (ideally a System would publish an event every time any data owned by that System changes). Using one of the readily available messaging frameworks, this makes it easy for any System to subscribe to Events from any other System. Whenever anything of note happens in System A it simply publishes an event with the related data, and any other interested systems can subscribe to that Event and react accordingly. This way System A (from the above example) has no knowledge or dependency on the other Systems that may subscribe to its events (System E developers can create their System without having to ask System A developers to make any changes). If System A goes down it will not affect Systems B/C/D, and if System B/C/D go down it will not affect System A.
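
As a rough sketch (not tied to any particular messaging framework), publishing a Domain Event from System A might look something like this; IMessageBus here is a hypothetical abstraction standing in for whichever messaging framework you choose:

using System;

// A Domain Event published by System A whenever an order is accepted.
public class OrderAcceptedEvent
{
    public int OrderId { get; set; }
    public int CustomerId { get; set; }
    public decimal Total { get; set; }
    public DateTime AcceptedOn { get; set; }
}

// Thin abstraction over whichever messaging framework is chosen.
public interface IMessageBus
{
    void Publish<TEvent>(TEvent domainEvent);
    void Subscribe<TEvent>(Action<TEvent> handler);
}

// Inside System A: publish the event after the state change has been made.
public class OrderService
{
    private readonly IMessageBus _bus;

    public OrderService(IMessageBus bus)
    {
        _bus = bus;
    }

    public void AcceptOrder(int orderId, int customerId, decimal total)
    {
        // ... System A updates its own domain model / database here ...

        _bus.Publish(new OrderAcceptedEvent
        {
            OrderId = orderId,
            CustomerId = customerId,
            Total = total,
            AcceptedOn = DateTime.UtcNow
        });
    }
}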

The other scenario is if System A requires some data from Systems B/C/D to perform an operation in System A. Rather than calling Service methods on B/C/D when that data is required, instead System A can subscribe to the relevant events in System B/C/D when that data changes and System A can maintain its own data cache of data “owned” by B/C/D updating it when the relevant B/C/D events are received. This way the System A operation can complete, even if B/C/D are all unavailable at the time. The important principle to keep in mind with this approach is that a given piece of data can only be “owned” by a single system. System A is free to cache data from B/C/D, but if System A wishes to change any data owned by B/C/D it must “ask” those systems to change it via the B/C/D Web Service. We also need to ensure that the messaging infrastructure we put in place has guaranteed delivery; meaning if System A happens to be down, when it comes back online it will still receive any Events that occurred while it was offline (modern messaging frameworks mostly handle this for us).
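
And on the subscribing side, a sketch of System B maintaining its own read-only cache of data owned by another system, reusing the hypothetical IMessageBus abstraction from above (the event and class names are again just illustrative):

using System.Collections.Generic;

// An event owned and published by the Customer system.
public class CustomerCreditLimitChangedEvent
{
    public int CustomerId { get; set; }
    public decimal NewCreditLimit { get; set; }
}

// Inside System B: maintain a local, read-only cache of credit limits owned by
// the Customer system, updated whenever that system publishes a change event.
public class CustomerCreditLimitCache
{
    private readonly IDictionary<int, decimal> _creditLimits = new Dictionary<int, decimal>();

    public CustomerCreditLimitCache(IMessageBus bus)
    {
        // With guaranteed delivery, events raised while System B was offline
        // are still delivered to this handler once it comes back online.
        bus.Subscribe<CustomerCreditLimitChangedEvent>(Handle);
    }

    private void Handle(CustomerCreditLimitChangedEvent e)
    {
        // Cached copy only; the Customer system still owns this data.
        _creditLimits[e.CustomerId] = e.NewCreditLimit;
    }

    public bool TryGetCreditLimit(int customerId, out decimal creditLimit)
    {
        return _creditLimits.TryGetValue(customerId, out creditLimit);
    }
}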

Each System under this Enterprise Architecture should look like the following:

[Diagram: Custom System]

In this case the “Core Domain” is the actual implementation of that system (we’re assuming in the above diagram that there is some Domain DB contained within it, but that’s not necessary). The actual Application Architecture contained within the Core Domain is irrelevant to the Enterprise Architecture. The “Core Domain” in this case may even be a Commercial Software package such as AX or SAP.

In the case of a Commercial Software system, it often won’t support the Enterprise Architecture proposed here. In that case we need to wrap it with the appropriate integration layer to support the Enterprise Architecture. Consider the below example of a Dynamics AX System:

[Diagram: AX System]

In the case of AX I believe it already exposes a Web Service, however, in the case that it didn’t and AX Integration was performed some other way (e.g. DB integration, copying files in a specific format to a specific directory, etc) we would create a custom Web Service that exposed those integration methods over a Service Contract (we don’t want any non-AX Systems talking directly to the AX Database except AX itself, and any Integration Layer we would write). Likewise, AX doesn’t publish Domain Events (and even if it did it wouldn’t do so using the Messaging framework we chose), again we can write some plumbing code to add support for this. In the above diagram we are writing a custom component “AX Event Generator” that would poll the AX Database looking for interesting changes in data and would raise the appropriate Domain Events that other Systems could subscribe to (some COTS Systems may have some notification system or way to “hook” system events eliminating the need to poll the DB). If we wanted AX to respond to Domain Events from other Systems, we would write a simple Event Consumer component that subscribed to Domain Events from other systems and executed the appropriate action in AX.
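
To make that concrete, a minimal sketch of such an Event Generator could look like the following, reusing the hypothetical IMessageBus abstraction sketched earlier; the polling query and table/column names are illustrative only, not a statement about the actual AX schema:

using System.Data.SqlClient;

// Custom "AX Event Generator": polls the AX database for new records and
// republishes them as Domain Events on the message bus.
public class AxEventGenerator
{
    private readonly string _connectionString;
    private readonly IMessageBus _bus;
    private long _lastSeenRecId;

    public AxEventGenerator(string connectionString, IMessageBus bus)
    {
        _connectionString = connectionString;
        _bus = bus;
    }

    // Called on a timer; table/column names are illustrative only.
    public void PollOnce()
    {
        using (var connection = new SqlConnection(_connectionString))
        {
            connection.Open();
            var command = new SqlCommand(
                "SELECT RecId, SalesId, CustAccount FROM SalesOrders WHERE RecId > @lastSeen",
                connection);
            command.Parameters.AddWithValue("@lastSeen", _lastSeenRecId);

            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    _lastSeenRecId = reader.GetInt64(0);
                    _bus.Publish(new AxSalesOrderCreatedEvent
                    {
                        SalesId = reader.GetString(1),
                        CustomerAccount = reader.GetString(2)
                    });
                }
            }
        }
    }
}

public class AxSalesOrderCreatedEvent
{
    public string SalesId { get; set; }
    public string CustomerAccount { get; set; }
}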

Using this approach, for other Systems to integrate with AX they no longer need to understand AX Database schema, they don’t need to understand any unusual integration mechanism that AX may use, they only need to understand the Web Service Contract and the Domain Events raised by the AX System (just like every other System).

 

Typically System-To-System Communication is done primarily via Domain Events. Clients (i.e. UI’s) primarily communicate via the Web Service(s).


Sunday, May 5, 2013 #

In the last couple of posts I talked about how larger aggregates make enforcing invariants easier, but smaller aggregates reduce concurrency conflicts.  You need to use domain knowledge to choose aggregate boundaries that minimize the chances of invariants spanning aggregates, and minimize the chances that multiple users will be editing the same aggregate simultaneously.

 

In this post I want to cover how I enforce the invariants (hopefully few) that do need to span aggregate boundaries.  As I see it there are basically two choices:

  1. Multi-Aggregate Locking
  2. Minimize-And-React Approach

If you recall, the problem with enforcing these invariants is that you need to acquire a lock on multiple aggregates in order to prevent a race condition.  Let’s look at the example of preventing customer orders that exceed that customer’s credit limit.  In order to enforce that, you’d have some code that checks the invariant when creating new orders.  It would essentially do something like:

if (outstandingOrders.Sum(o => o.Total) + newOrder.Total > customer.CreditLimit)
    RejectOrder(newOrder);
else
    SaveOrder(newOrder);

 

In order to avoid race conditions we need to guarantee that none of the data in the if statement changes between the time the condition is evaluated and the time the order is saved.  In this case that means none of the existing outstanding orders can change, the customer’s credit limit can’t change, and no new outstanding orders can be created.  If we assume that Customer and Order are separate aggregates, that means we need to lock the Customer aggregate and each aggregate corresponding to an outstanding Order.  The tricky part is that we also need to ensure no new Order aggregates for that customer are created.

Most people I talk to about this are surprised at the complexity I seem to be talking about.  They think that they have been writing applications for many years and never had to worry about this consistency stuff.  Indeed, I used to think the same way.  However, it turns out most applications have subtle race conditions such as the above and nobody even realizes it or cares.  And this is perfectly acceptable!  Rare race conditions may not be worth the development effort to eliminate and/or handle.  However, as an application architect, at a minimum I like to know that these race conditions exist so I can make a conscious decision whether I want to invest time dealing with them or not.  Even if the decision is that it’s not worth worrying about, I want to make sure these are explicit decisions, rather than unexpected surprises when they arise later.

 

So having said that, we know we have an invariant that crosses aggregate boundaries, so there is a race condition.  What are our options?  Of course we can just ignore it.  A race condition will rarely occur, and the impact of it occurring (e.g. a customer getting an order approved that exceeds their credit limit) may be acceptable.  Especially in applications that measure their users in dozens, race conditions should be extremely rare.  However, if you measure your users in thousands (or more), the rare race condition may actually occur quite frequently.

 

The first option from above is to implement some form of multi-aggregate locking.  For the Customer Credit Limit example, we would need a way to lock the Customer aggregate (which contains the credit limit data), lock all active orders, and also block the creation of new orders for that customer.  There are a few options for doing this at the application level, and even at the database level (perhaps using transaction isolation levels).  However, I typically try to avoid the complexity involved in this approach.

 

The 2nd option is what I’m calling “Minimize-and-React”.  Rather than trying to eliminate the race condition (via locking), try to minimize it (by doing the check at the last possible moment – as we probably are already doing), then put in place a mechanism to detect when the invariant has been violated and react appropriately.  In a lot of cases the “react” portion should probably just be sending an email to a human to investigate.  In an architecture that uses “Domain Events” you can create what some people call a “Saga” (although not an entirely accurate term).  In this case you would create a Saga for each cross-aggregate invariant you wish to enforce, and have it subscribe to the appropriate events to detect when the invariant has been violated.  Then take the appropriate actions (e.g. send an email to notify somebody, or possibly execute compensating actions).

 

In the Customer Credit Limit example, I could create a Saga that subscribed to the events CustomerCreditLimitChanged and OrderCreated (and probably other events such as CustomerOrderChanged, OrderCancelled, etc.); basically, any events which could impact the evaluation of the invariant.  Since the Saga subscribes to Events, which by definition represent actions that have already occurred, it can detect when a race condition has resulted in a violated invariant without needing any locking in the Domain Model (Aggregates).  So in the Saga I would subscribe to the various events, and in each handler call the code to check whether the invariant has been violated, then take the appropriate action in response – typically either sending a notification to somebody, or taking some correcting/compensating action.
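
A rough sketch of such a Saga, reusing the hypothetical IMessageBus abstraction from the Enterprise Architecture post above (the event names, query service and notifier are all illustrative assumptions):

// Events this Saga cares about: anything that could affect the invariant.
public class OrderCreated                { public int CustomerId { get; set; } }
public class CustomerCreditLimitChanged  { public int CustomerId { get; set; } }

// Read-side services used to re-check the invariant (hypothetical).
public interface IOrderQueries
{
    decimal TotalOutstandingOrders(int customerId);
    decimal CreditLimit(int customerId);
}

public interface INotifier
{
    void Notify(string message);
}

// "Saga" that re-checks the credit limit invariant after the fact, and reacts
// (here by notifying a human) if a race condition slipped through.
public class CustomerCreditLimitSaga
{
    private readonly IOrderQueries _orders;
    private readonly INotifier _notifier;

    public CustomerCreditLimitSaga(IMessageBus bus, IOrderQueries orders, INotifier notifier)
    {
        _orders = orders;
        _notifier = notifier;
        bus.Subscribe<OrderCreated>(e => CheckInvariant(e.CustomerId));
        bus.Subscribe<CustomerCreditLimitChanged>(e => CheckInvariant(e.CustomerId));
    }

    private void CheckInvariant(int customerId)
    {
        decimal outstanding = _orders.TotalOutstandingOrders(customerId);
        decimal creditLimit = _orders.CreditLimit(customerId);

        if (outstanding > creditLimit)
        {
            _notifier.Notify(string.Format(
                "Customer {0} has outstanding orders totalling {1}, exceeding their credit limit of {2}.",
                customerId, outstanding, creditLimit));
        }
    }
}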


Monday, April 15, 2013 #

In the last post we looked at how aggregate boundaries affect our ability to provide consistency guarantees and enforce invariants across our domain model.  What we said is that enforcing an invariant within an aggregate boundary – rather than invariants that span aggregates – is much easier to do.  So based on that we would want to design our software with very large aggregates.  Taken to the extreme we could have the entire domain model within a single aggregate.  This would allow us to easily enforce any invariant without ever needing to worry about consistency across aggregate boundaries.

The downside to having excessively large aggregates is the impact it has on scalability.  I’m not talking about scalability in terms of adding more servers and hardware to increase throughput, but rather scaling the number of users using the system.  When you have large aggregates, that also means that when you “lock” your data to provide consistency guarantees you are locking large amounts of data at once.  In the extreme example of having the entire domain inside one aggregate, you will be locking the entire domain model.  If your system is only ever used by a single user at a time, then that is actually perfectly reasonable.  However, most systems we build are used by multiple users at the same time.  If we had one giant aggregate, that means that anytime anybody changed any data it would increment the version number, and any other edits in progress will get a concurrency exception when they try to save (the concurrency check looks at the version # and sees that somebody else has changed it in the middle of that user’s edit).
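
For illustration, a minimal sketch of that optimistic concurrency check in a repository’s Save method (the table, columns and repository shape are hypothetical, not from any specific framework):

using System;
using System.Data.SqlClient;

public class ConcurrencyException : Exception
{
    public ConcurrencyException(string message) : base(message) { }
}

public class CustomerRepository
{
    private readonly string _connectionString;

    public CustomerRepository(string connectionString)
    {
        _connectionString = connectionString;
    }

    // Optimistic concurrency: the UPDATE only succeeds if the aggregate's version is
    // still the one this user originally loaded.  If somebody else saved a change in
    // the meantime, zero rows are affected and we surface that as a concurrency exception.
    public void Save(int customerId, string serializedState, int expectedVersion)
    {
        using (var connection = new SqlConnection(_connectionString))
        {
            connection.Open();
            var command = new SqlCommand(
                "UPDATE Customers SET State = @state, Version = Version + 1 " +
                "WHERE Id = @id AND Version = @expectedVersion",
                connection);
            command.Parameters.AddWithValue("@state", serializedState);
            command.Parameters.AddWithValue("@id", customerId);
            command.Parameters.AddWithValue("@expectedVersion", expectedVersion);

            if (command.ExecuteNonQuery() == 0)
                throw new ConcurrencyException(
                    "Customer " + customerId + " was changed by another user. Please reload and retry.");
        }
    }
}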

If we start to split up our domain model into smaller aggregates, it reduces the likelihood of concurrency exceptions happening.  If we have each Customer as an Aggregate (containing the Orders), then you will only get concurrency exceptions if two users are trying to edit Orders for the same customer.  If you make Customer and Order separate aggregates you only get concurrency exceptions if two users are trying to edit the same order at the same time.

So now we have two competing desires: larger aggregates give us flexibility for enforcing invariants, while smaller aggregates give us less chance of concurrency exceptions.  We have to make a tradeoff between these two properties.  We have a little more flexibility than just the size of the aggregate though; we can strategically choose how to place those boundaries.  You can have two similarly sized aggregates that encompass different sets of entities, and one of those aggregate boundaries may be better than the other.

What I try to do is choose aggregate boundaries such that most of my system’s invariants will not have to span aggregates, but also try to choose them such that there is a small likelihood that multiple users will be simultaneously updating the same aggregate.  Ultimately this all comes down to examining your business domain, and expected usage patterns of your application in order to make the best decision here.  Let’s look at a couple of examples.

 

Customer / Orders Example

In the Customer/Order example, we’ve talked about 3 different possibilities for Aggregate boundaries:

  1. Single aggregate encompassing the entire domain model
  2. Customer aggregate that contains Order entities
  3. Separate aggregates for Customer and Order

Assuming we have a system that is used by many users simultaneously, we can probably rule out option #1 pretty easily.  In order to decide between #2 and #3 I’d have a discussion with the domain experts, and try to get a feel for the usage patterns for creating/maintaining the Orders data.  Do they have account managers that are responsible for specific customers?  If so it’s unlikely that multiple users will be editing Orders belonging to the same customer, so I would likely go with option #2 because of the benefit of easier invariant enforcement across orders.  If it were more of a call-center type business where anybody can enter orders for any customer I might start considering option #3.  However, I might also start asking about their typical scenarios.  If we’re talking about the system that takes online delivery orders for Pizza Hut, it’s pretty unlikely that multiple orders for the same customer are going to be undergoing changes at the same time (by multiple users).  In fact, I’m having a hard time coming up with any example system that takes customer orders that would commonly have multiple users editing orders for the same customer at the same time.  That would lead me towards option #2 from above.  But the key point I’m trying to make is that the decision should be driven by business/domain knowledge, and take into account the consistency vs scalability tradeoffs.

 

Poker Example

Let’s look at another example: my software to manage a weekly poker league.  In this case I could see a couple of obvious choices:

  1. Single aggregate encompassing the entire domain model
  2. Each Game is an aggregate

If we remember the sample invariants from the last blog post, the examples I used were:

  1. For a completed poker game, total pay-in must equal total pay-out
  2. There can be only one poker game for each week

The first invariant can be enforced easily enough with either choice of aggregate boundaries (all the data involved is contained in a single Game), but the 2nd invariant would span aggregates if we had an aggregate for each game (we need to look at the set of all games in order to validate the invariant).  So there’s a clear consistency advantage for option #1.

If we look at it from a scalability perspective, let’s consider whether we are likely to have multiple users editing data at the same time.  In this example scenario it’s actually pretty reasonable to have the entire domain model as a single aggregate.  The only significant data updates are somebody entering in the results of a new game (once a week), or maybe correcting some past mistakes.  Regardless, it’s unlikely there will be multiple people editing data at the same time, so in this case I would introduce a new entity called League (we need something to act as the aggregate root), and have it contain a collection of all Games.
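To make that concrete, here’s a minimal sketch of what that League aggregate might look like.  The class, property, and method names are illustrative only (this isn’t the actual code from my league manager), but it shows how both sample invariants become trivial to enforce once all the Games live inside one aggregate:

using System;
using System.Collections.Generic;
using System.Linq;

public class Game
{
    public int WeekNumber { get; private set; }
    public decimal TotalPayIn { get; private set; }
    public decimal TotalPayOut { get; private set; }

    public Game(int weekNumber, decimal totalPayIn, decimal totalPayOut)
    {
        WeekNumber = weekNumber;
        TotalPayIn = totalPayIn;
        TotalPayOut = totalPayOut;
    }
}

public class League   // aggregate root introduced just to own the collection of Games
{
    private readonly List<Game> _games = new List<Game>();

    public void RecordGameResult(Game game)
    {
        // Invariant: only one poker game per week - easy, since every Game is inside this aggregate
        if (_games.Any(g => g.WeekNumber == game.WeekNumber))
            throw new InvalidOperationException("A game has already been recorded for that week.");

        // Invariant: for a completed game, total pay-in must equal total pay-out
        if (game.TotalPayIn != game.TotalPayOut)
            throw new InvalidOperationException("Total pay-in must equal total pay-out.");

        _games.Add(game);
    }
}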

If we take this example a little further, let’s imagine we want to offer our poker league manager as SaaS.  Now we have many leagues stored in our domain model.  In that case it doesn’t seem reasonable to have somebody editing one league’s data lock *all* leagues’ data (as it would if the entire domain model was still a single aggregate).  In that case it would seem to make sense to have each separate League be its own Aggregate.  This also appears to work well, as it’s unlikely we would have any invariants that span Leagues.

 

 

In the next post we’ll take a look at what I do when I realize I need an invariant that spans Aggregate boundaries (hint: it’s much more painful than invariants within an Aggregate boundary).


Sunday, April 7, 2013 #

Those who know me know I’m a pretty big fan of the CQRS set of design patterns.  CQRS style architectures typically borrow / build-upon the DDD (Domain Driven Design) set of patterns (in fact before Greg Young coined the term CQRS he was calling it DDDD [Distributed DDD]).  One pattern that’s pretty central in DDD is the concept of Aggregates.  This is the practice of splitting your domain model up into pieces, and these pieces are what we call Aggregates.  Each aggregate may contain several “Entities”, but must contain a specific Entity that is designated as the Aggregate Root.  Examples of Aggregates could be Customer, Product, Order, etc.

 

A lot of people – even people that claim to be doing DDD – will just naturally make almost every entity into its own Aggregate.  They are missing an important design decision around scoping their Aggregate boundaries appropriately.  As per Evans’ DDD book, Aggregates are intended to define the consistency and transactional boundaries of your system.  This has some really significant implications that make it important to choose your Aggregate boundaries with care.  There’s a bunch of literature providing guidance around choosing your Aggregate boundaries, but in this blog post I want to talk a little bit about what I think about when I do this, and provide some examples.

 

Consistency

When designing software you need to understand what consistency guarantees you have (and probably more importantly the guarantees you don’t have).  I see too many intermediate/advanced software developers take on the task of designing/architecting important software, without properly understanding the consistency aspects of the system and the tradeoffs involved.

 

Consistency is being able to guarantee that a given set of data is all from a specific identical point in time (I’m sure there’s a better official definition, but that’s how I think about it).  This is important because most software has a set of invariants (fancy word for “rules”) that you want to enforce across the domain model.  A few examples of invariants might be (I’m in the middle of building some software to manage our weekly poker league, so I hope you like poker related examples):

  • Total value of unpaid orders for a customer must not exceed that customer’s credit limit
  • For a completed poker game, total pay-in must equal total pay-out
  • Username must be unique
  • There can be only one poker game result for each week (weekly league)

 

These are rules that our software system is expected to enforce.  If a customer tries to place an order that would exceed their credit limit the system should reject it.  Likewise, if somebody tries to enter a username that’s already taken the system should reject that to ensure the invariant is kept intact.

   

What might not be immediately obvious is that you need to have some consistency guarantees in order to enforce every single one of those invariants.  “Locking” goes hand in hand with consistency, as that’s typically how you achieve consistency guarantees.  So for the first example (orders + credit limit), in order to enforce that invariant you need to have a consistent data set representing all of that customer’s unpaid orders, *and* you need to be able to acquire some kind of lock, so you can ensure that nobody writes a new order in between the time you do the invariant check (sum(orders) + new_order_cost <= customer.credit_limit) and the time you save the new order.  If you can’t lock that data, you end up with a race condition that could result in the invariant being violated.

   

Most software I encounter uses optimistic concurrency/locking to achieve this.  Usually this means adding a version # to your entities/aggregates, then checking that it hasn’t changed since you retrieved it when saving.  For example, if the user is editing the customer information, the software will keep track of the customer version # that was retrieved when the user started editing, then when they hit save the system will check that the version # in the database hasn’t changed before it writes the updates (if it has changed it will reject the update with some kind of concurrency exception).  You also need some way to “lock” the Customer aggregate/entity to prevent race conditions between the time we check the version # and actually writing the updates.  For a typical system that uses a Relational DB (e.g. SQL Server), you might be able to rely on DB features to enforce the locking and prevent race conditions.  If you’re doing something like Event Sourcing you will need to implement your own or use a 3rd Party Framework that does this for you.
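As a rough illustration, here’s what that version # check might look like using plain ADO.NET against SQL Server.  The table, column, and class names are made up for the example; the point is that the UPDATE only succeeds if the version we originally read is still the current one:

using System.Data;
using System.Data.SqlClient;

public class CustomerRepository
{
    private readonly string _connectionString;

    public CustomerRepository(string connectionString)
    {
        _connectionString = connectionString;
    }

    public void Save(int customerId, string name, int expectedVersion)
    {
        using (var conn = new SqlConnection(_connectionString))
        using (var cmd = new SqlCommand(
            @"UPDATE Customers
                 SET Name = @name, Version = Version + 1
               WHERE CustomerId = @id AND Version = @expectedVersion", conn))
        {
            cmd.Parameters.AddWithValue("@name", name);
            cmd.Parameters.AddWithValue("@id", customerId);
            cmd.Parameters.AddWithValue("@expectedVersion", expectedVersion);

            conn.Open();
            // Zero rows affected means somebody else bumped the version since we loaded it
            if (cmd.ExecuteNonQuery() == 0)
                throw new DBConcurrencyException(
                    "Customer was modified by someone else since it was loaded.");
        }
    }
}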

 

If we come back to the original topic – aggregate boundaries – these come into play because it turns out it’s pretty straightforward to enforce invariants within an aggregate, but if you have an invariant that spans multiple aggregates, it becomes significantly harder.

 

Back to the Customer/Orders example.  If we assume that both Customer and Order are separate Aggregates, then they will each have their own version #’s.  In order to enforce the credit limit invariant we need to get all unpaid orders for that customer, sum up the order totals, and compare with the customer’s credit limit.  To do this properly we need to make sure that the data doesn’t change out from under us while we’re checking the invariant, meaning we would in theory need to lock the customer and every order that we’re looking at.  We would also need to ensure that no new orders for that customer are created in the meantime.  With the simple version # per aggregate implementation, that is simply not supported (at least not without a lot of added complexity).

 

What if we were to change our aggregate boundaries?  Let’s say that Order isn’t a separate aggregate, but instead we have a collection of Order entities contained within the Customer aggregate (the Customer entity is the aggregate root).  Now enforcing the invariant is easy, because all the data necessary is contained within a single Aggregate.  We can easily lock the Customer aggregate (using the single version # we have) and enforce our invariant.
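A minimal sketch of that arrangement, with illustrative names (not production code): the Customer aggregate root owns its Orders, so the invariant check and the change it protects are covered by the aggregate’s single version #:

using System;
using System.Collections.Generic;
using System.Linq;

public class Order
{
    public decimal Total { get; private set; }
    public bool IsPaid { get; private set; }

    public Order(decimal total) { Total = total; }
}

public class Customer   // aggregate root; Orders live inside the aggregate boundary
{
    private readonly List<Order> _orders = new List<Order>();

    public decimal CreditLimit { get; private set; }
    public int Version { get; private set; }   // one version # covers the whole aggregate

    public Customer(decimal creditLimit) { CreditLimit = creditLimit; }

    public void PlaceOrder(decimal orderTotal)
    {
        // Everything needed to check the invariant lives inside this aggregate
        var unpaidTotal = _orders.Where(o => !o.IsPaid).Sum(o => o.Total);
        if (unpaidTotal + orderTotal > CreditLimit)
            throw new InvalidOperationException("Order would exceed the customer's credit limit.");

        _orders.Add(new Order(orderTotal));
        // Persisting the aggregate bumps Version, and the optimistic concurrency check catches races
    }
}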

   

There’s certainly techniques for enforcing invariants that cross aggregate boundaries, but it definitely adds complexity (more on this later).

   

If we only consider consistency guarantees when designing our aggregate boundaries, then we would want to make our aggregates as large as possible.  The bigger the aggregate, the more power and flexibility we have to easily enforce invariants.  If we take it to the extreme, we could make our entire domain model a single aggregate, with one version # for the entire domain.  However, consistency isn’t the only consideration.  We need to make a tradeoff between Consistency and Availability/Scalability.

 

In the next post I’ll take a look at how Availability / Scalability comes into play when choosing Aggregate boundaries, and take a look at options for enforcing invariants that span aggregate boundaries.


Sunday, March 24, 2013 #

I’ve been working with a lot of clients over the past couple years helping them adopt TFS Lab Management.  One discussion that always comes up is how to architect the infrastructure required to run TFS Lab.  I’m going to try and put down in writing the advice I usually give so I have somewhere to point people to in the future.

There are 3 main components in TFS Lab:

  • Hyper-V Host(s) – A server to host the running Virtual Machines (and yes, it must be Hyper-V)
  • Library Server(s) – A place to store the VM Templates, and stored VM’s.  This is essentially just a network file share.
  • SCVMM Server – The centralized server that manages all this infrastructure.

The Hyper-V host must be a physical server (no you can’t create a VMWare VM and run Hyper-V inside of that – well, my co-worker actually had a client that got that to work, but the performance was horrendous so don’t do that).  This means that setting up TFS Lab for the first time will require you to purchase/acquire at least one physical server.

If your datacenter is run off VMWare like so many of my clients are – it’s usually not a big deal to purchase a server specifically for TFS Lab that sits off in a corner running Hyper-V.  In fact, if you run your datacenter on Hyper-V already, I’d still recommend isolating your TFS Lab away from your main virtualization infrastructure (i.e. setup a new SCVMM and hosts, don’t try to reuse your existing one).

 

Single-Server Deployment

Most of my clients start by adopting TFS Lab for one specific project, with the intention that if they like what they see they will scale up its use across the rest of their projects in the future.  What I usually recommend to get started is to purchase a single beefy server (more below on typical server specs and price), and run all components off that one server.

[Diagram: Single Server]

For a single team project this works great.  One nice thing is that because everything is on a single server, your network infrastructure won’t be a factor in performance.

Note: SCVMM requires a SQL Server to store its configuration data.  This is not pictured here, but I will almost always use the same SQL Server that TFS uses for its configuration/collection databases (i.e. not any of the servers pictured here).

Notice I host the SCVMM instance inside of a Virtual Machine (and SCVMM manages the host that it is actually running in – sounds kind of wacky but it works fine).  This is contrary to the Microsoft guidance.  My co-workers and I have set up TFS Lab for many clients, typically putting SCVMM inside of a VM, and have had no issues.  In fact there are some important benefits you gain by doing this.  Most importantly, it becomes easier to move the SCVMM server off to a different physical host down the road (as we will do in the examples below).  If you install the SCVMM server on the physical machine (like so many people tend to do), when it comes time to scale out your Lab infrastructure it is much harder to re-locate that SCVMM server elsewhere.

 

 

Physical Hardware Advice

For teams starting with TFS Lab the typical server hardware I recommend is around a $15,000 server.  Nowadays, that should translate to a server with around 16 physical cores (32 logical with HyperThreading), 128 GB of RAM, 6x 2TB 7.2k RPM SATA drives (2TB RAID 1 Array for native OS + Library, 4TB RAID 10 Array for Host).  Obviously, this depends on the size of the team, and complexity/size of the environment required for your application.  But probably 90% of the teams I help setup Lab for the first time end up going with around a $15k server to start.  Note: There is Microsoft guidance somewhere that recommends not to use Hosts with more than 48 GB of RAM; that guidance is outdated and misleading IMO, and I suggest you disregard it.

See my post on Lab Capacity Planning for a more detailed approach to determining hardware requirements.

I usually create Lab VM’s with 1 CPU + 4 GB RAM, and typically budget about 100GB per VM for the VHD + Snapshots.  Leaving some resources for the host OS, the above machine specs would allow you to create ~30 VM’s.
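To put rough numbers on that: 30 VMs at 4 GB each is 120 GB of RAM, leaving about 8 GB of the 128 GB for the host OS, and 30 VMs at ~100 GB each is roughly 3 TB, which fits comfortably in the 4 TB RAID 10 host array – so RAM ends up being the limiting factor at around 30 VMs.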

Note: TFS Lab performance is very dependent on the disk subsystem.  You want to maximize the number of spindles to increase parallelism, so I usually advise going with many cheaper 7.2k RPM drives to maximize spindles and data density.  For ultimate performance SSD is an obvious choice, but TFS Lab still requires a significantly large amount of storage, making SSD too expensive for most teams (hopefully that should change in the next couple of years).  There are 2 scenarios where performance can be an issue: the large transfers from Library->Host that occur when you deploy a new Environment, and the operation of an existing environment.  I tend to focus on the latter, which means paying close attention to the disks used by the Host(s).  I spent a bunch of time recently working with a client to diagnose performance issues; to benchmark disk performance the SQLIO tool and this article are priceless.  Also be careful when it comes to SAN.  SAN storage tends to be much more expensive than locally attached disk (especially in the capacities we’re usually dealing with for TFS Lab – watching the face of a SAN Engineer when requesting 10 TB of storage is a fun exercise regardless =)), and there are many more moving parts in a SAN, which means more potential bottlenecks.

 

 

Scaling To a 2nd Team

Let’s imagine that we’ve got a single-server TFS Lab setup and running, the team using it loves it, and a 2nd team (separate TFS Team Project) wants to start using it.

Sure, you could share the same single-server setup across multiple team projects.  But unless money/hardware is extremely tight I wouldn’t recommend doing that.  The problem is that both teams will be sharing the same finite set of hardware resources (CPU, Memory, Disk), and there usually isn’t much visibility across teams.  What can happen is Team A spins up a bunch of Lab Environments in the morning, then when Team B tries to spin some up in the afternoon they get errors about No Suitable Host Available, because Team A has already consumed the host’s available resources.  You can combat this by either over-provisioning the hardware so that it’s unlikely it will get maxed out, or by ensuring that the teams sharing the same Host(s) communicate well to avoid stepping on each other’s toes.

Instead, what I prefer to do is have dedicated hardware for each Team Project.  Specifically dedicated Host(s) and Library for each Team Project.  The SCVMM instance will still be shared between all Team Projects. If we take the above Single-Server/Single-Team-Project architecture, and scale it out for a 2nd Team Project it might look like below:

[Diagram: Two Team Projects]

In this scenario, what I’ve done is dedicate the original server (#1) to Team Project A, and bought 2 new servers: another $15k server for Team Project B, and a smaller/cheaper server (~$2500) dedicated to run SCVMM. 

I moved the SCVMM Virtual Machine off of Server #1 onto the new Server #3.  Because SCVMM was in a VM, this migration is extremely simple.  SCVMM is a shared resource across all Team Projects, so I don’t want it to reside on any hardware dedicated to a specific Team Project.  In this scenario the server that hosts the SCVMM Virtual Machine doesn’t even need to be Hyper-V; this is the one case where I don’t mind hosting the VM in the organization’s primary virtualization infrastructure (even if that’s VMWare).

Also of note is that I configure multiple Libraries (one per Team Project), and for this scenario where each Team Project only has a single host server, I place the library on the same physical server as the host.  This has the benefit that the large Library->Host and Host->Library transfers never need to hit the network.

 

 

Scaling a Single Large Team Project

The other important scenario is when you have a single Team Project that outgrows the single-server deployment.  They simply need more resources (# of VM’s, CPU, RAM, Disk) than the original single-server can provide.  In this case I aim for an architecture something like this:

[Diagram: One Big Team Project]

As in the previous example I’ve moved the SCVMM VM off to its own host.  Since the Library is now shared between several hosts, I also move that off to some centralized location.  In the image above I have it located on the physical server that hosts the SCVMM VM; however you could also place it on its own dedicated physical server (if I had multiple Team Projects I would definitely do this, as you’ll see below), or you could place it in a VM hosted on the same host (Server #3) or elsewhere.  You want to pay attention to the network routing between your Library and Hosts.  There will be large transfers happening (potentially hundreds of gigs at once), so you want the network between them to be fast and short.  Typically you want to ensure that the hosts and Library are connected to the same physical network switch (I’m starting to see more people putting a 10GigE switch in place just for TFS Lab, even if the rest of their network is still slower).

 

 

Mature TFS Lab Infrastructure

The final example is combining these various scenarios together into an organization that has many Team Projects, some large enough to require multiple hosts, and some where a single-server will suffice:

[Diagram: Multiple Team Projects]

 

 

Configuring TFS to Dedicate Hosts/Libraries to Team Projects

Something to note is that the TFS Admin Console only allows you to assign Host Groups and Libraries to Team Project Collections, but not to individual Team Projects.  It’s still possible, but you have to use the command-line rather than the TFS Admin GUI.  You have to first assign your Host Group(s) and Libraries to the Team Project Collection in the GUI (make sure to turn Auto-Provision off).  Then you have to run the following commands to assign the various host groups and libraries to specific Team Projects:

TFSLabConfig CreateTeamProjectHostGroup

TFSLabConfig CreateTeamProjectLibraryShare


Sunday, March 10, 2013 #

I’ve spent a bunch of time lately with clients helping them understand why their applications are so slow and how to improve performance.  This often comes down to their use (or misuse) of ORM frameworks such as nHibernate and/or Entity Framework.  I think this probably stems from the fact that ORM’s have gone mainstream somewhat recently, and most developer teams realize they should be using one, but they have never really learned the intricacies of how to use one properly.

 

The first thing I do is pull out SQL Profiler and run through some common scenarios in their application and just get a rough count of how many DB queries happen in each application scenario.  A lot of teams are surprised when they see hundreds or thousands of queries being executed as a result of a single button click in their application.

 

In my experience teams seem to be suffering from one of two problems, either loading too much data at once (eager loading), or loading too little (lazy loading).  The lazy loading problem is probably more common, but the eager loading scenario is easier to explain so I’ll start with that.

 

 

Eager Loading

I’ve run into a few code-bases where they have explicitly turned off lazy-loading in nHibernate (lazy loading is the default behavior).  Unless you explicitly partition your domain model (e.g. using Aggregate boundaries like DDD proposes), not using lazy loading can result in massive amounts of data being retrieved from the DB for seemingly simple scenarios.  Think of your Domain Model as a giant object graph, where you have many types of objects, most with links to other objects.  When you ask nHibernate for any object, it will automatically retrieve the object you asked for from the DB, *plus* any linked objects, and any objects linked from those, and on and on, until it has populated an entire object graph into memory for you.  When you have any non-trivial domain model, this can be a huge amount of data.  Let’s look at an example:

[Diagram: Object Graph]

If we have lazy loading turned off and do a simple operation like asking nHibernate to give us an Invoice with a specific Id, nHibernate will go and retrieve that row from the Invoices table, but it will also get the related InvoiceBatch object, and all of the InvoiceItems, and for each InvoiceItem it will retrieve the Shipment object and the Product object, and for each Product it will get the Product Group, and so on.

 

It can get really bad if you have circular references in your domain model – which is fairly common because it is so convenient for writing business logic (e.g. the Invoice object has a collection of InvoiceItems, and the InvoiceItem also contains an Invoice object).  In our example, let’s assume that InvoiceBatch contains a collection of child Invoice objects, and each Invoice contains an InvoiceBatch object.  When we ask nHibernate for a single Invoice, it will populate the InvoiceBatch object, which will in turn populate the Invoices collection and all objects related to every Invoice in that collection.  Let’s imagine another example: if we have an Employee object that has a property referencing the Manager (also an Employee object), and also has a collection of Employees representing the Subordinates, then when you retrieve any Employee it will also retrieve the Manager Employee object, then his Manager, and his Manager, and so on until you get up to the top (CEO); then it will get all of the CEO’s subordinates, and all of their subordinates, and so on.  Ultimately, this means anytime you ask nHibernate to get a single Employee it is actually retrieving *all* employees, along with any other related objects.
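Since the object graph diagram doesn’t survive in this format, here’s a rough reconstruction of it as C# classes, along with the Customer/Office/Address/Contact classes used later in this post.  Property names are assumptions; the virtual members are what allow nHibernate to generate its lazy-loading proxies:

using System.Collections.Generic;

public class Invoice
{
    public virtual int Id { get; set; }
    public virtual InvoiceBatch InvoiceBatch { get; set; }
    public virtual IList<InvoiceItem> InvoiceItems { get; set; }
    public virtual Customer Customer { get; set; }
}

public class InvoiceBatch { public virtual IList<Invoice> Invoices { get; set; } }

public class InvoiceItem
{
    public virtual Invoice Invoice { get; set; }      // circular reference back to the parent
    public virtual Shipment Shipment { get; set; }
    public virtual Product Product { get; set; }
}

public class Product { public virtual ProductGroup ProductGroup { get; set; } }
public class ProductGroup { }
public class Shipment { }

public class Customer { public virtual IList<Office> Offices { get; set; } }

public class Office
{
    public virtual Address Address { get; set; }
    public virtual IList<Contact> Contacts { get; set; }
}

public class Address { public virtual string City { get; set; } }
public class Contact { }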

 

 

Lazy Loading

The solution to this is to either partition your domain model in some way (e.g. Aggregates as per DDD), or use Lazy Loading (the default in nHibernate).  Lazy Loading works by only retrieving the Invoice object from the DB, then loading any sub-objects only if and when you attempt to access them (aka lazily).  This ensures that only the minimal set of data that you need to do your work is retrieved from the Database.  nHibernate does its lazy loading in a way that is mostly transparent to the developer: when you ask nHibernate for an Invoice object, it is actually generating a dynamic proxy object that looks like an Invoice (it inherits from Invoice), but has some hooks in there to allow nHibernate to intercept any property access so it can lazy load them as needed.

 

However, Lazy Loading has its own problems, and these are probably more common due to the fact that lazy loading is on by default.  This problem is commonly called the Select N+1 problem.  Let’s say I had a screen with a grid displaying a list of Invoices, and one of the fields in that grid is Invoice.Customer.Offices[0].Address.City.  What will happen is nHibernate will execute a single query to retrieve all the Invoices I ask for, but then when I try to render it into the grid I’ll have to loop through each Invoice and access the Customer property (which will trigger nHibernate to fire off a SQL query), then access the Customer.Offices collection (another query), then the Office.Address (another query), and finally retrieve the City for display.  These queries will happen separately for every Invoice displayed in the grid.  So if I have 30 invoices displayed in the grid, I could potentially have 91 SQL queries executed (1 for the invoices plus 3 for each of the 30 rows).  And that is a relatively simple scenario; in a more complex (realistic) application this problem can become a serious performance concern.

 

What we need is a middle-ground between the first scenario (load everything), and the 2nd scenario (load minimal, and lazy load everything else).  Most modern ORM frameworks will have support for programmatic “eager loading”.  Usually you will have some kind of Repository layer/class in your application.  This is where you want to put this code.  You’ll still leave lazy loading turned on in all your nHibernate mappings, but then in your repository functions you can tell it specifically how much of the object graph should be loaded up-front (and the rest will still be lazy loaded if/when accessed).  With nHibernate this is done via the Fetch/FetchMany/ThenFetch/ThenFetchMany methods.

 

Let’s take the previous example where we want to display Invoices in a grid and include a column that displays Invoice.Customer.Offices[0].Address.City.  What we’d like to have happen is for nHibernate to load up this data for all 30 invoices in as few queries as possible (ideally one).  Previously we would have retrieved the list of invoices by doing a simple session.Query (assuming our grid is displaying all Invoices).  Now we might have code in our Repository that looks something like this:

[Code screenshot: Original GetAllInvoices]
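The code screenshot doesn’t survive in this format, so here’s a sketch of what that original repository method most likely looked like (class and member names are assumptions, using the entity classes sketched earlier):

using System.Collections.Generic;
using System.Linq;
using NHibernate;
using NHibernate.Linq;

public class InvoiceRepository
{
    private readonly ISession _session;

    public InvoiceRepository(ISession session) { _session = session; }

    public IList<Invoice> GetAllInvoices()
    {
        // Only the Invoice rows are loaded here; everything else lazy loads on access
        return _session.Query<Invoice>().ToList();
    }
}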

I’m going to modify this method to let nHibernate know that I want all Invoices, but I also want it to go ahead and populate the Customer, Offices, and Address objects at the same time.

[Code screenshot: Eager GetAllInvoices]
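Again the screenshot is gone, so here’s a sketch of the eager-loading version of that same method (same repository class as above, only the method changes).  It uses nHibernate’s LINQ Fetch extensions to pull Customer -> Offices -> Address in the same query so the grid column doesn’t trigger lazy loads:

public IList<Invoice> GetAllInvoices()
{
    return _session.Query<Invoice>()
        .Fetch(i => i.Customer)
        .ThenFetchMany(c => c.Offices)
        .ThenFetch(o => o.Address)
        .ToList();
}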

I’ll let you look up how the Fetch functions work yourself. This should result in a single SQL query that loads all the necessary data (instead of the previous 91 queries).  You may have many areas of your app that require a list of invoices, and some require more or less of the object graph to be loaded.  Often you will see several methods in the InvoiceRepository that return a list of all Invoices, but the different methods will eager load different subsets of the object graph for different uses.

 

When I’m trying to optimize the lazy/eager loading behavior of my code, I’ll find myself spending a bunch of time going through in the debugger with SQL Profiler open, and seeing what code is triggering lazy-loading queries, and starting to build up a list of all the pieces of the object graph I might want to eager load.

 

 

nHibernate Gotcha – Don’t Do Multiple FetchMany

There is a major gotcha to be aware of (at least with the nHibernate eager loading).  If you try to eager load more than one collection in a single query the results won’t be what you expect.  Put another way: never use more than one FetchMany/ThenFetchMany in a single query.  Let’s look at an example: let’s say we wanted to load all Invoices, and also eager load all Offices and Contacts for the related Customers.  We might try writing code like this:

[Code screenshot: Broken GetAllInvoices]
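A sketch of roughly what that broken code would look like (again, same repository class, names assumed) – note the two collection fetches (Offices and Contacts) in a single query, which is exactly the thing to avoid:

public IList<Invoice> GetAllInvoices()
{
    // DON'T DO THIS: two collection fetches in one query produce a cartesian product
    return _session.Query<Invoice>()
        .Fetch(i => i.Customer)
        .ThenFetchMany(c => c.Offices)
        .ThenFetchMany(o => o.Contacts)
        .ToList();
}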

If you look at the SQL being executed it’s actually doing something like this:

SELECT ...
FROM Invoices LEFT OUTER JOIN Customers ON Invoices.CustomerId = Customers.CustomerId
              LEFT OUTER JOIN Offices ON Offices.CustomerId = Customers.CustomerId
              LEFT OUTER JOIN Contacts ON Contacts.OfficeId = Offices.OfficeId

What happens is that as you add more relationships that are collections, the result set grows large since SQL is doing a cartesian product of all the collections (so if you have 10 invoices, with 10 customers, and each customer has 4 offices, and each office has 7 contacts, you get a resultset of 280 rows).  nHibernate doesn’t deal with this well (it won’t complain, it will just result in an incorrect object graph returned to you – which makes the problem even worse IMO).  I believe, in this example if you examined the resulting object graph, each Customer would show itself as having 28 offices (when in fact it should have 4 offices, with 7 contacts in each).

 

Luckily, there is a solution.  nHibernate essentially has its own in-memory cache scoped to the session.  When it goes to lazy load something, it will first look in this cache to see if the object has already been loaded, and if so it can skip querying the database.  (Note: I’m not sure if this is exactly how nHibernate works under the covers, but this is how I conceptualize it.)  What we can do is give nHibernate a few queries, and tell it to use them to pre-populate its internal cache.  nHibernate is even smart enough to execute all these queries within a single round-trip to the database.  Now whenever we find ourselves wanting multiple FetchMany calls, we can just break that down into multiple queries that nHibernate will use to populate its cache.  Here’s the previous example re-written to actually work:

[Code screenshot: ToFuture GetAllInvoices]
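A sketch of the working version, using nHibernate’s ToFuture to batch the two queries into one round-trip (names assumed; the first query is kept simple and just pre-loads all Offices with their Contacts):

public IList<Invoice> GetAllInvoices()
{
    // Query 1: pre-load Offices and their Contacts into the session cache
    _session.Query<Office>()
        .FetchMany(o => o.Contacts)
        .ToFuture();

    // Query 2: Invoices with their Customer and the Customer's Offices
    var invoices = _session.Query<Invoice>()
        .Fetch(i => i.Customer)
        .ThenFetchMany(c => c.Offices)
        .ToFuture();

    // Both queries hit the DB in a single round-trip when the results are enumerated
    return invoices.ToList();
}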

In this case I’m executing the first query which will retrieve all Office objects (related to Customers that are linked to our Invoices), and it will eager load each Office’s Contacts collection.  Then I do a separate query to retrieve all Invoices, their related Customer object, and each Customer’s Offices collection.  Both of these SQL queries will be executed as part of a single round-trip (assuming your DB supports that – SQL Server does).  The Contacts will be present in nHibernate cache, so no lazy-loading is required to access them.

 

If you have a significant portion of the object-graph that you want to eager load, the “fetch code” can get a little complex.  The silver lining is that even if you get it wrong you’re not going to break anything; it just means things will be inefficiently lazy loaded when they should have been eager loaded, but your application behavior should still be correct, just slow (so long as you obey the single FetchMany per query rule).

 

To finish this post off, here’s what code might look like if you wanted to eager load the entire object graph from the first graphic (note: this code is not tested):

[Code screenshot: Complete GetAllInvoices]
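The original screenshot isn’t available, so here’s an untested sketch along the same lines, reconstructed from the diagram description (property names are guesses).  One collection fetch per query, with an extra future query to pre-load the InvoiceItems side of the graph:

public IList<Invoice> GetAllInvoices()
{
    // Pre-load InvoiceItems along with their Shipment, Product, and ProductGroup
    _session.Query<InvoiceItem>()
        .Fetch(ii => ii.Shipment)
        .Fetch(ii => ii.Product)
        .ThenFetch(p => p.ProductGroup)
        .ToFuture();

    // Main query: Invoices with their InvoiceBatch and InvoiceItems collection
    var invoices = _session.Query<Invoice>()
        .Fetch(i => i.InvoiceBatch)
        .FetchMany(i => i.InvoiceItems)
        .ToFuture();

    return invoices.ToList();
}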


Sunday, February 24, 2013 #

Just sitting in the Seattle airport finally returning home from my first MVP Summit (well in truth I’m flying directly to my next client, no home till next weekend).

As I said this was my first time attending an MVP Summit, so I didn’t know exactly what to expect.  It turned out to be an incredible week, and gives me a new appreciation for the term “drinking from a firehose”.  I’m told that your experience can be very different depending on what Product Group you are associated with.  I’m lucky to be with the Visual Studio ALM group which I’m told is one of the most involved and open of them all.

The week was split up like so:

Mon-Wed – Scheduled sessions/presentations/discussion with the Product Group.

Thu – MVP-2-MVP Day

Fri – Office Hours with the ALM Product Group

And lots and lots of parties in between!

 

The Product Group sessions were 3 jam-packed days where first Brian Harry, then various feature teams got up in front of the room and filled us in on the vision of the product(s) going forward.  I think everything presented/discussed was all new non-public information about feature sets coming in vNext and even vNext+1.  Most of the features being discussed were so early that there is no working code to demo, the discussions revolved around powerpoint slides and storyboards (and sometimes we were discussing features so far in the future that storyboards don’t even exist yet).

These weren’t your typical conference sessions though.  There was lots of interaction, probably half the sessions took the format where the product group just put up a topic for discussion and let the audience drive the discussion around what we felt was needed to solve whatever problem was under discussion (or if the problem even existed in the first place).  We did a bunch of live polls, where various teams would give us a bunch of potential features and get us to rank which ones were most important to us.  And just in general, the audience was very actively involved throughout every day (I swear there were some sessions where the audience did more talking than the presenter(s)).

 

On Thursday we did the MVP2MVP day organized by Neno Loje.  This was a day with back to back 20 min sessions from 9am – 5pm (with no breaks!).  Whoever thought this up originally, kudos to them.  The great thing about this, is it gives the ALM MVP’s a chance to present various topics of interest to other ALM MVP’s.  Unlike conference sessions, where you can’t assume deep knowledge, these sessions can cut away all the fluff because everybody in the audience is already an expert, so you get to focus on just the interesting stuff.  A lot of these sessions were ALM/TFS related projects that various MVP’s are working on.  Some examples off the top of my head:

Friday’s schedule came together basically over the course of the week, thanks to Chuck.  He rounded up the various product owners that we MVP’s said we’d like to sit down and chat with.  These were informal sessions, no powerpoints or demos.  Just frank discussion and the opportunity for us to ask questions or give feedback.

 

And of course there are massive amounts of partying – er, “networking” – that go on:

Sunday was the get-together of all the Canadian MVP’s.  Us Canadians know how to “network”!

[Photo]

Monday was the MVP Welcome Party at the Hyatt, followed up by a minor house party at Ted Neward’s house with 120 of his closest friends (thanks Ted!).  Ted piled up a mountain of tech books he no longer wanted at the door for anybody to take:

[Photo]

 

Tuesday had no official events planned, so the Imaginet Crew decided to head out for a quiet night, with some Indoor Sky Diving:

 

Wednesday night was the official MVP Party, where MS rented out the entire Seahawks stadium for the night:

[Photo]

 

And of course this included Karaoke by Darcy:

and by James Chambers:

[Photo]

 

And to carry on a tradition started last year, we ended the night with some Lobster:

[Photo]

 

I decided to end my weekend with a trip to Crystal Mountain for some snowboarding, which turned out to be a great choice as there was a big dump of snow on Friday night:

[Photo]


Sunday, February 10, 2013 #

There’s been some chatter lately about an old debate: Feature Branches vs Feature Toggles.  I used to be firmly in the Feature Branches camp, but about a year ago (at the ALM Summit) I became convinced that Feature Toggles are a better choice in a lot of cases.

Feature branches are fairly common.  It is the practice of creating a separate branch for each major feature, or perhaps choosing a group of features for each “feature branch”.

[Diagram: Feature Branching]

Teams usually adopt the practice of feature branching because of the increased flexibility it gives them for doing releases (often fixed-date releases vs fixed-scope releases).  By creating, say, 5 feature branches for the 5 major features the team is working on, if only 3 of them are ready for release when the target release date rolls around, then only those 3 feature branches get merged into MAIN, and the other 2 feature branches continue development and will be merged into MAIN when complete, for inclusion in some future release.

 

 

A team I used to work with would not have fixed-date releases, but would let the completed features sort of “pile up” until the powers that be decided there were enough completed features to justify a release (or until some really valuable feature was completed, perhaps).  Because we were using feature branches, we had the flexibility to release at any point in time, since the half-finished features were isolated off in their own branches.

 

 

The problem with feature branches is there can be a lot of pain related to merges.  Ideally, the developers will frequently merge from MAIN->FeatureBranch (maybe every day if there are any changes to merge in); this way whenever a developer/team completes their feature and merges it into MAIN, all other feature branches will get those changes ASAP.  However, the whole idea behind feature branches is that the code for each feature is isolated in its own branch until it’s ready to be released, at which point it is merged into MAIN.  This results in one big changeset in MAIN that represents all the code for that feature (possibly weeks of work).  As soon as that feature branch is merged in, all other feature branches will have to merge that potentially massive changeset into their feature branches, resulting in a lot of potential merge pain.

 

 

Now, most of the people in the Pro Feature Branch camp will tell you that you deal with this by keeping your feature branches small, ideally a day or two.  But in reality that is going to be difficult for most teams.  While they might like to do that, they are a long way from being mature enough to do that.  And I don’t buy that it’s entirely a team maturity issue; some features are simply large, and the teams don’t want to have to break them down into tiny releasable features.  Take for example the recent Git changes the TFS team implemented.  They have been working on that feature for many months.  Even if they could break it down into many 1-2 day features, I doubt that would have been desirable for them.

 

 

Not only that, but there is a subtle backwards incentive system at play when it comes to feature branches.  If I make my feature branch really large and long-lived, it’s not *me* that feels the pain.  It’s everybody else when I finally merge it to MAIN.

 

 

So what ends up happening is that feature branches are a mechanism that teams use to explicitly defer integrating/merging their changes together (on purpose!).  This is completely opposite to the practice of *continuous* integration that most of us would probably say is desirable (even if some of us don’t really understand what it truly means).  Fowler wrote about this in detail some time ago.  By isolating various teams’/developers’ changes off in feature branches, we are explicitly deferring integration.

 

 

So what’s the other option?  We said at the start that teams use feature branches to achieve flexibility around releasing.  If we just go back to having everybody working in one branch, sure we will have achieved continuous integration, but we’re back with the release problems that we tried to solve with Feature Branches.

 

 

This is where Feature Toggles come into play.  What if we did all work in the same branch, but used some mechanism to turn features on/off (say via flags in a config file)?  Now when I first heard this suggested, I instinctively told the guy he was a goofball; it sounds like it would result in a giant spaghetti mess of code, with if statements surrounding all kinds of crap, leading to a maintenance nightmare.  However, this doesn’t need to be the case.  For one, the “toggle” is only in place while the feature is under development.  Once the feature is complete all toggle code and any config file entries are removed (in fact, when I break down the User Story into tasks, I’ll make sure the last task on the list is “Remove Toggle”).  It does require some conscious thinking about how to develop the feature in such a way that it can be hidden from the user and turned on/off.  In my experience, this turns out to be easier than you may think.  The vast majority of features can be hidden behind a feature flag if you give it a little thought; and for those few that can’t, you can still always create a feature branch if you wish.
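As a minimal sketch, a toggle can be as simple as an appSettings flag and a static helper (the names here are invented for illustration):

using System.Configuration;   // requires a reference to System.Configuration

public static class FeatureToggles
{
    // Reads <add key="FeatureToggle.NewCheckout" value="true" /> from app.config/web.config
    public static bool NewCheckoutEnabled
    {
        get { return ConfigurationManager.AppSettings["FeatureToggle.NewCheckout"] == "true"; }
    }
}

// In the UI layer, the half-finished feature stays hidden unless the toggle is on:
// if (FeatureToggles.NewCheckoutEnabled)
//     ShowNewCheckoutMenuItem();
// Once the feature ships, the "Remove Toggle" task deletes this class, the config entry, and the if check.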

 

 

Some people worry about the implications of including code for unfinished features in released software (even if hidden behind a toggle).  I don’t think this is as big of a deal as some people think.  My release process will typically involve creating a Release Candidate branch, and performing pre-release testing on that build of the software to ensure it meets the release quality standards.  If there are any ill-effects resulting from including unfinished features behind a toggle in these Release Candidates, they should hopefully be discovered during this testing phase.

 

 

This gives you the benefits of *true* continuous integration.  If somebody writes some code that breaks another developer’s code, the guy that wrote the breaking code will find out immediately, and he will be the one responsible for fixing it.  Unlike feature branches, where if I write some code that breaks another developer’s feature-under-development, I don’t know about it because the other dev’s code is off in its own branch.  I merge my code to MAIN, the other dev merges my code into his branch, and his branch becomes broken; only now it’s his job to fix it, not mine, even though I’m the one that wrote the problem code.

 

 

There are some other benefits that you get along with feature toggles.  For one, you have a super-easy rollback plan if deploying a new feature goes south: simply turn off the feature toggle (assuming you leave the toggle in place until after the first release).  You can do phased roll-outs – how about turning on major new features just for a subset of your users, to get some feedback before turning it on for everybody?  The TFS team at Microsoft uses feature toggles extensively.  They turn on a lot of features in advance (for the TF Service cloud offering) for MVP’s to try out and provide feedback on.

 

 

Conclusion

To summarize, teams usually adopt feature branches to provide release flexibility.  But they have the undesired effect of deferring integration, and causing lots of merge pain.  By using feature toggles you can have continuous integration, while still retaining the release flexibility of feature branches.  Some care must be taken to implement features in a way that supports toggles, and you must be disciplined about removing toggles once features are complete.  You get the added benefit of being able to easily do phased roll-outs on a feature-by-feature basis too!


Sunday, January 27, 2013 #

For probably over a year now I’ve been hearing lots of hype around package managers and NuGet in particular.  I’ve never really “got it” – that is until last week.  So what, NuGet will download the nHibernate assemblies for me.  I can do that myself easily enough, why on earth do I need a specialized tool to do that for me?!  But it will download not only nHibernate, but all of nHibernate’s dependencies too!  Big deal, that’s never been an issue for me before, usually these 3rd party packages come with all the necessary dependencies (nHibernate includes ANTLR, Moq includes Castle, etc).  I challenged a couple people I respect to convince me that I need NuGet, and despite their best efforts I was never convinced.

Last week I was working with a client who has multiple teams, working on multiple components/products, in parallel, but they all get released at once as part of one big release.  All these various components and teams are writing code with dependencies on other teams’ components.  And each component is evolving independently of the others, but eventually they all need to come up with a final version that works with all the other final versions that will make up the release.  I like to compare it to the TFS team at Microsoft.  When they were developing TFS 2012, they wanted it to work with SQL 2012, Visual Studio 2012, and .Net 4.5, none of which actually existed at the time TFS 2012 dev was underway (they were all under development also).  Trying to develop against a dependency that is also a moving target presents a number of problems, and it turns out NuGet can be a tremendous help here.

The problem is there are many projects (let’s call them packages) that have interdependencies between them, and are potentially developed by different teams and on different release cycles.  The challenge is we want to ship an updated product that contains updated versions of many of the packages, and we need to be confident that they all work well together.

The common approach to handling this is to treat each package as a separate project, and any dependencies on other projects are treated as external dependencies (similar to 3rd party dependencies like Log4Net).  This is usually handled by having a lib folder within your source tree, and checking in the binaries for the external dependencies.  This allows each project team to make an explicit decision about which version of their external dependency they are going to develop against, and to choose if and when to update to the latest version, to reduce disruption to their development cycle.

Another important practice: if a team is developing a package which other teams depend upon, they often want to control which versions of their package are available for other teams to consume.  They don’t want every check-in to produce a build that other teams can potentially consume, because often these builds will have half-finished features in them.  Typically a team will want to set a higher quality bar for which builds they share with the rest of the world to depend upon.  This is usually handled by using a DEV branch and a MAIN branch.  The quality standard for code to get into the MAIN branch is higher (no half-finished features), and every MAIN build is available for other teams to consume.  Typically a team will update MAIN at the end of each Sprint.

There is another more subtle problem that becomes more significant as the number of packages and dependency graph between them grows. Let’s look at an example:

[Diagram: Dependency Example 1]

Imagine that all 4 packages are at version 1.0 to start with, and we belong to the dev team for A.  We are working towards shipping 2.0 of our product, which will include updated versions of all 4 packages, but there are 4 teams each working on a different package.

Our source tree for A contains a lib folder with sub-folders for B and C. And since B and C each depend on D we can either have a separate sub-folder for D and reference it directly from A, or we can include copies of the D binaries in the lib sub-folders for both B and C. Let’s assume that we do the latter, and both the B and the C sub-folders contain the binaries for D.

Team D finishes its work and makes 2.0 available in its MAIN branch.  Team B immediately updates their project to pull in the D-2.0 binaries and updates their code to work with it (B-2.0).  However, Team C has not yet pulled in the updated D-2.0 binaries (and possibly won’t for a while still).  As Team A developers we want to pull in the updated B package so we can recompile against it and ensure our A code still works properly.  However, we also depend on C, which is 1.0 and depends on D-1.0.  So which version of the D binaries should we use?  We can’t have both D-1.0 and D-2.0.  If we have D-1.0 it will potentially break B; if we have D-2.0 it will potentially break C.  What are we to do?  It’s versioning hell!

The way teams deal with this is to use versioning policies (either explicitly or implicitly). Rather than saying B depends on D-2.0 and C depends on D-1.0, what you need are versioning policies that say B depends on D-2.0 *and up*, and C depends on D-1.0 and up. This way the above scenario can be made to work by using D-2.0. In order to implement this “versioning policy” there are a few options:

1. Update the B and C csproj files so they don’t demand a specific version and don’t reference a strong-name for the D assembly. Then include the D-2.0 binaries in your A source tree.

2. Leave the csproj files alone, and instead introduce a “binding redirect” that indicates that anything that references D-1.0 should instead use D-2.0. This can be configured in the app.config/web.config.
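For example, option 2 might look something like this in the A project’s app.config/web.config (the assembly name, versions, and public key token below are placeholders):

<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity name="D" publicKeyToken="0123456789abcdef" culture="neutral" />
        <!-- Anything compiled against D-1.0 will load D-2.0 instead -->
        <bindingRedirect oldVersion="1.0.0.0" newVersion="2.0.0.0" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>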

Using a package manager such as NuGet can automate a lot of this work for you. Instead of having to manually walk the dependency graph to figure out which version of D binaries you require (transitive dependencies), NuGet will do this for you and automatically download all the necessary binaries for both direct and transitive dependencies according to the various versioning policies specified.  Then you simply check them in to your lib folder (usually called packages when using NuGet instead of lib).  NuGet not only walks the dependency graph to figure out the appropriate versions of all binaries you require, but it will create the appropriate binding redirects for you in your config file also.

All you need to do is create some automated builds on the various MAIN branches that will publish each package to your own private NuGet server.  Now when a team wishes to update their dependencies they simply update the versioning policy to specify a newer version number, then let NuGet do the necessary work, and check in any updated binaries.
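As one way to express such a versioning policy, a project’s packages.config can constrain updates via the allowedVersions attribute (the package ids and version ranges below are placeholders); nuget update will then stay within that range when pulling newer builds:

<?xml version="1.0" encoding="utf-8"?>
<packages>
  <package id="B" version="2.0.0" />
  <!-- Accept any D from 2.0 up to (but not including) 3.0 -->
  <package id="D" version="2.0.0" allowedVersions="[2.0,3.0)" />
</packages>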

Note: Everything above assumes that there are no cycles in your dependency graph.  If there are, you’re basically screwed.  Refactor to eliminate cycles immediately.