SAFe and Weighted Shortest Job First (WSJF)

In 2012, when Dean Leffingwell launched the Scaled Agile Framework (SAFe), the influence of Don Reinertsen’s teachings on its design was obvious. In particular, SAFe specifies Don’s recommended scheduling method: Weighted Shortest Job First (WSJF). Whatever you think of the rest of SAFe, it deserves credit for pushing organisations forward in a few specific areas:

  1. It’s fabulous that a much wider audience is being exposed to the importance of scheduling methods, and that WSJF will be more widely used for prioritisation and queue management. Perhaps organisations implementing SAFe will become less blind to queues, and to the cost of those queues?
  2. SAFe also suggests that WSJF should be applied at feature level, which matches our experience. Perhaps when they start to see the distribution of value in their backlogs they will start to realise that large-batch projects are a terrible vehicle for developing software.
  3. SAFe also suggests using Cost of Delay as the weighting in WSJF. Perhaps this means that Product Owners will surface their assumptions about where the value lies, and design better experiments to test and learn about what is valuable, rather than relying on gut-feel? Perhaps managers around the world will start talking about, and focusing on, Cost of Delay instead of far less interesting things like productivity, velocity, and estimates of dates and cost. And perhaps teams will finally have a way of making vastly better trade-off decisions, and of discovering, nurturing and delivering value more quickly.
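To make the scheduling idea concrete, here is a minimal Python sketch of WSJF as Reinertsen describes it: divide each item’s Cost of Delay (money per week) by its duration, and do the highest score first. The feature names and numbers are invented, and the sketch assumes one team working through the backlog one item at a time with a constant Cost of Delay per item. A brute-force check over every possible order confirms that, under those assumptions, WSJF gives the lowest total delay cost.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Feature:
    name: str
    cost_of_delay: float  # value lost per week the feature is not delivered (e.g. $/week)
    duration: float       # weeks of work for one team


def wsjf(feature: Feature) -> float:
    """WSJF score: Cost of Delay divided by duration (higher goes first)."""
    return feature.cost_of_delay / feature.duration


def total_delay_cost(schedule) -> float:
    """Cost of Delay accrued by each feature until it is delivered,
    assuming one team builds the features strictly one after another."""
    elapsed = 0.0
    total = 0.0
    for feature in schedule:
        elapsed += feature.duration
        total += feature.cost_of_delay * elapsed
    return total


# Hypothetical backlog; names and numbers are invented for illustration.
backlog = [
    Feature("A", cost_of_delay=10_000, duration=4),
    Feature("B", cost_of_delay=6_000, duration=1),
    Feature("C", cost_of_delay=20_000, duration=10),
]

by_wsjf = sorted(backlog, key=wsjf, reverse=True)
# Brute-force check over every possible order (fine for a tiny backlog).
cheapest = min(permutations(backlog), key=total_delay_cost)

print([f.name for f in by_wsjf], total_delay_cost(by_wsjf))  # ['B', 'A', 'C'] 356000.0
print(total_delay_cost(cheapest))                             # 356000.0
```

Ordering by Cost of Delay divided by duration is the classic rule for minimising weighted completion time on a single resource, which is why the exhaustive search agrees with it here. Note that the total only means something because Cost of Delay is expressed in money per week.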

I’m super excited about the last one in particular. In my experience, this is a really powerful lever. I’m sure Don agrees, as he could hardly have stated it more clearly when he said:

If you only quantify one thing, quantify the Cost of Delay.

In fact, this very quote of Don’s is at the top of the page where SAFe explains Cost of Delay, so I’m sure they’ve paid attention to it. Let’s take a look at how SAFe explains Cost of Delay and how it recommends organisations go about quantifying it. I’ve done this in a number of settings, so I’m really interested to see how they approach it…

SAFe’s interpretation of Cost of Delay

SAFe teaches that Cost of Delay is made up of three primary elements, which add up to the Cost of Delay. Let’s look at each of these and I’ll do a little thinking-out-loud along the way:

User-Business Value: Do our users prefer this over that? What is the revenue impact on our business? Is there a potential penalty or other negative impact if we delay?

We are on fairly solid ground with “revenue impact”. (This maps well to the Increase Revenue and Protect Revenue buckets in the value framework we use to get people quantifying Cost of Delay.) I’m not so sure about the user-preference question. If you are talking about internal users (a common case in SAFe’s target market), their preference is often a fairly weak indicator of value. Asking if there is a potential penalty “if we delay” seems to overlap with the next parameter, but perhaps they mean fines, loss of licence or loss of the ability to operate? This could be clearer, I think.

Time Criticality: How does the user/business value decay over time? Is there a fixed deadline? Will they wait for us or move to another solution? What is the current effect on customer satisfaction?

The first three questions are all great questions. I would put customer satisfaction under the previous heading of Business Value. What’s not clear, though, is how the answers to these questions should be treated. Is a “fixed deadline” high criticality, or low? The way this should work is that the Cost of Delay for something with a fixed deadline is initially zero, right up until the point when you need to have started it. After that point, the Cost of Delay is possibly all of the revenue tied to that date. In many cases, leaving it until the option expires is too risky, so the Cost of Delay may ramp up some time beforehand to reflect that risk. All work should have some degree of urgency, or “time criticality”, though: if there’s no urgency, you’re better off investing elsewhere. It’s not clear from this how SAFe treats time criticality, and I’d love to see some examples of how they’ve actually used this model in the wild.
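Here is a rough sketch of that fixed-deadline profile in Python. Everything in it is hypothetical: the function name, the linear shape of the risk ramp, and the choice to have the ramp reach the full value at the last responsible start are my assumptions, not part of SAFe or a standard formula. It returns the cost incurred if work starts in a given week.

```python
def fixed_deadline_delay_cost(start_week: float, last_start_week: float,
                              ramp_weeks: float, value_at_risk: float) -> float:
    """Cost incurred if work on a fixed-deadline item starts in `start_week`.

    Hypothetical model for illustration:
    - starting after `last_start_week` means the deadline is missed and
      `value_at_risk` is lost in full;
    - over the `ramp_weeks` leading up to `last_start_week`, the risk of
      missing grows linearly, so the expected cost ramps from 0 to full;
    - before that, delaying the start costs nothing.
    """
    if start_week > last_start_week:
        return float(value_at_risk)
    ramp_begins = last_start_week - ramp_weeks
    if ramp_weeks <= 0 or start_week <= ramp_begins:
        return 0.0
    return value_at_risk * (start_week - ramp_begins) / ramp_weeks


# Hypothetical item: must start by week 8, and the risk starts to bite 4 weeks earlier.
profile = [fixed_deadline_delay_cost(w, last_start_week=8, ramp_weeks=4,
                                     value_at_risk=100_000) for w in range(0, 12, 2)]
print(profile)  # [0.0, 0.0, 0.0, 50000.0, 100000.0, 100000.0]
```

The answer to “high or low criticality?” is therefore “it depends on when you ask”: the same item is harmless to defer in week 2 and ruinous to defer in week 10.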

Risk Reduction-Opportunity Enablement Value: What else does this do for our business? Does it reduce the risk of this or future delivery? Is there value in the information we will receive? Will this feature enable new business opportunities?

Again, these are good questions, but in my view they are all either a subset of value or factors that affect the likelihood of delivering it. Surely it would be easier to simply fold them into the first parameter? Reducing risk is primarily about avoiding cost, or in some way changing the asymmetry of the payoff function. We can also put a price-tag on information. (Some might even argue that all value is simply a filtration of a set of information.) Enabling new business opportunities seems to be about Increasing Revenue. Treating these types of value as something separate from “Business Value” risks them being weighed differently from it. This seems convoluted and confused to me, but perhaps that’s just me.

Overall, in my view, the definition of these parameters could do with some simplification and clarification. I have a couple more questions though:

Why is it additive?

In my understanding, Cost of Delay is a way of combining Value and Urgency. If either of these two parameters is zero, then the Cost of Delay is zero. Adding “Time Criticality” to “Business Value” doesn’t make economic sense to me. In theory, I could have something with zero business value that is highly time critical, and it would still score. The result is that the SAFe algorithm will almost certainly give you a suboptimal schedule.
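A toy comparison shows the problem. The item names and 0 to 10 scores below are hypothetical, and multiplying value by urgency is a deliberately simplified stand-in for the combination described above, not a full Cost of Delay model. The additive version ranks a worthless-but-urgent item above one with real value.

```python
# Hypothetical backlog items scored 0-10 on two dimensions (illustrative numbers only).
items = {
    "urgent but worthless": {"value": 0, "urgency": 9},
    "modest value, modest urgency": {"value": 4, "urgency": 4},
    "high value, some urgency": {"value": 8, "urgency": 3},
}


def additive_score(value: int, urgency: int) -> int:
    """Components summed, as in SAFe's model (risk reduction left out for brevity)."""
    return value + urgency


def multiplicative_score(value: int, urgency: int) -> int:
    """Value scaled by urgency: zero if either factor is zero."""
    return value * urgency


def rank(score) -> list:
    """Item names, highest score first."""
    return sorted(items, key=lambda name: score(**items[name]), reverse=True)


print(rank(additive_score))
# ['high value, some urgency', 'urgent but worthless', 'modest value, modest urgency']
print(rank(multiplicative_score))
# ['high value, some urgency', 'modest value, modest urgency', 'urgent but worthless']
```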

Why Relative?

More worrying is the stance of “we needn’t worry about the absolute numbers”. I can only assume this decision was made by people whose experience is primarily in I.T. It is a shame that the Lean and Agile community seems to think that the primary purpose of Cost of Delay is prioritisation. There is much more to it than that. Relative estimates simply don’t deliver any of the potential benefits I hoped for above. Even very roughly quantified estimates (perhaps in fairly large ranges) would be better than this. There are a few other issues with relative estimates of value to consider:

  1. It’s very difficult to remember the thinking behind the relative positioning of more than 20 or so items. Applying Fibonacci doesn’t make sense to me either. For estimates of cost or effort we have an obvious optimism bias, which gets worse as the task gets larger, primarily due to conflation. This is why the probability distribution of lead-time tends to follow a Weibull shape: it’s the conflation of uncertainties. There is no evidence (that I’ve seen) of the same skew applying to the value side of the equation, though. Our overestimates tend to be balanced out by our underestimates; it is actually the Black Swans that dominate the value side of the equation. On what basis is Fibonacci used? (I fear this obsession with Fibonacci is turning us all into sons of simpletons.)
  2. Relative estimates don’t help much if you have more than one stakeholder. One person’s relative score of 2 is another person’s 8. If you’re managing three or four stakeholders, then relative estimation becomes either an escalation to the HiPPO or a painful horse-trading exercise. In my experience, this is a huge pain point for most large organisations, which I believe are SAFe’s target market.
  3. Relative estimates don’t encourage us to surface assumptions. Actually quantifying the Cost of Delay has the wonderful effect of forcing us to switch into the more analytical System 2 thinking in order to question our gut-feel System 1. As we teach in the workshops we have run with various clients and at conferences, the point of this is not really the resulting number, although any quantified Cost of Delay is better than none. It is more about surfacing the assumptions and identifying the most uncertain part of the whole enterprise: “where is the value?”. Talking about, writing down and making visible our Cost of Delay estimates is like sunlight disinfecting our backlogs. We are not shooting for certainty; rather, we should at the very least share those uncertainties far and wide so that people can design more effective experiments to discover whether the value is there or not.
  4. Hiding behind relative estimates does nothing to change the focus of the conversation. Abdicating from quantification on the value side of the equation means that the focus tends to drift toward the duration or cost side. As we all know, that is mostly a waste of everyone’s time and effort, not least because of the information asymmetry, to say nothing of the uncertainty.

As Jason Yip has already pointed out, the key ingredient in all of this is the quantified Cost of Delay. Without it, most of the benefits evaporate. You simply can’t make trade-off decisions with relative value estimates. The cost of queues will still be invisible. The cost of large batches will still not be obvious. Assumptions about value will likely remain hidden. We will still be stuck negotiating estimates of dates and cost and obsessing over pointless things like “velocity”. In short, this is not the Cost of Delay I know.
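To illustrate what a quantified number buys you, the sketch below puts a price on one large batch. The feature names, dollar figures and durations are hypothetical; it assumes one team building the features in list order and a constant Cost of Delay per feature, with delay measured from today.

```python
# Hypothetical features: (name, Cost of Delay in $ per week, build time in weeks).
features = [("login", 5_000, 2), ("search", 8_000, 2), ("export", 3_000, 2)]


def delay_cost_incremental(features) -> int:
    """Each feature ships as soon as it is built (one team, in list order)."""
    elapsed = 0
    cost = 0
    for _name, cod_per_week, weeks in features:
        elapsed += weeks
        cost += cod_per_week * elapsed
    return cost


def delay_cost_big_batch(features) -> int:
    """Nothing ships until every feature is built, then everything ships at once."""
    release_week = sum(weeks for _name, _cod, weeks in features)
    return sum(cod for _name, cod, _weeks in features) * release_week


batching_cost = delay_cost_big_batch(features) - delay_cost_incremental(features)
print(batching_cost)  # 36000
```

With relative scores in place of dollars per week, the same arithmetic produces a number with no unit, so there is nothing to weigh it against.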

In conclusion

As I said at the beginning, it’s great that SAFe has adopted WSJF and Cost of Delay. Unfortunately, many of the potential positives I’ve listed above seem to have been watered down or muddied, to the point where, in my opinion, we are mostly left with muddy water and very little substance. To be clear, I’m all for simplifying: I’ve been doing this for real with organisations and at conferences over the last five years. If SAFe really is interested in improving its Framework, then I’m sure they’ll reach out and ask for input, and we’d be more than happy to help.

UPDATE: I’ve made the smallest, simplest suggestion to improve SAFe’s approach to Cost of Delay here.

If you’re interested, I’ve also published a qualitative approach to Cost of Delay, for those who are afraid of numbers.