WSJF – Weighted Shortest Job First

Weighted Shortest Job First (WSJF) was born for mainframe job scheduling

What is WSJF?

Weighted Shortest Job First is a scheduling algorithm (or, if you prefer, a prioritisation method) that maximises the amount of whatever you choose to “weight” by that flows through a constrained resource in a given time period.

Where did WSJF come from?

Back in the 1970s, when computing resources were both extremely expensive and severely constrained, Shortest Job First was used to schedule batch jobs in a way that made the best use of a scarce resource. More recently, Don Reinertsen proposed and popularised Weighted Shortest Job First as a way of maximising the return on investment in software teams.

There are only two ingredients required for WSJF: the weighting of each job and the duration of each job. The weighting that Don specifically recommends for product development is Cost of Delay. (Here is a 3-minute video that explains Cost of Delay.) The other ingredient, duration, is how “short” each job is. What we need to know is “How long will this job block the pipeline for?” For that, we need some forecast of the Duration for which this job will be in-process. This matters because while a job is in-process it is blocking other work from being started. During that time, we are incurring the Cost of Delay. What Don proposes is a specific form of Weighted Shortest Job First: Cost of Delay divided by Duration.

There are other possible forms of Weighted Shortest Job First, depending on what you are trying to optimise for. You could choose to weight by technical risk, market risk, stakeholder influence, length of time waiting, or any other factor or combination that you might want to maximise. Most organisations, though, are most interested in maximising their economic return. In that case, it is Cost of Delay that you should weight by.

CD3: Cost of Delay Divided by Duration

Saying “Weighted Shortest Job First” or, more specifically, “Cost of Delay Divided by Duration” more than a few times gets rather tiring, so to ease communication we can shorten this to CD3 – at least for the case where we are weighting by Cost of Delay. This also has the benefit of communicating the more important component in the algorithm: the Cost of Delay.

When using CD3, the priority order of features or projects is determined by dividing the estimated Cost of Delay by the estimated duration: the higher the resulting score, the higher the priority.
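
The scoring rule above can be sketched in a few lines of Python. This is only an illustrative sketch: the `Feature` class and the `prioritise` function are names made up for this example, and the figures are the three features from the worked example later in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical record for one item of work."""
    name: str
    cost_of_delay: float  # estimated dollars per week
    duration: float       # estimated weeks the item blocks the pipeline

    @property
    def cd3(self) -> float:
        """Cost of Delay divided by Duration."""
        return self.cost_of_delay / self.duration


def prioritise(features):
    """Return the features ordered by CD3 score, highest first."""
    return sorted(features, key=lambda f: f.cd3, reverse=True)


features = [
    Feature("A", 1_000, 5),
    Feature("B", 4_000, 1),
    Feature("C", 5_000, 2),
]

for f in prioritise(features):
    print(f"Feature {f.name}: CD3 = {f.cd3:,.0f}")
```

With these inputs the order comes out as B, then C, then A. Both inputs are estimates, so the score is only as good as the Cost of Delay and Duration forecasts behind it.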

One of the benefits of the CD3 version of WSJF is that it enables us to use a common measure to compare opportunities with different value and urgency, as well as take account of the situation where the duration differs. CD3 optimises the return on investment by minimising the total Delay Cost incurred given a set of potential options. In most product development settings, capacity is relatively inflexible and difficult to scale. This means that understanding how long the pipeline is blocked is often quite valuable information.

Because CD3 uses Duration in the denominator, it also has the benefit of encouraging the breakdown of work into smaller batches. Breaking down work is one of the easiest and most effective improvements we can make in terms of getting more value, faster flow and better quality.

WSJF: Cost of Delay Divided by Duration

Let’s take a look at an example to help us understand how and why WSJF/CD3 improves Return on Investment. Consider the following three features, for which we will compare the outcome of two alternative scheduling approaches.

Feature      Cost of Delay   Duration   CD3 Score
Feature A    $1,000/week     5 weeks    200
Feature B    $4,000/week     1 week     4,000
Feature C    $5,000/week     2 weeks    2,500

Using First-in, First-out (FIFO)

We could choose to work on and deliver these features one at a time in the order they arrived. A, then B, then C. This is called First In, First Out (FIFO). It is a common scheduling approach in manufacturing. After all, the person asking for Feature A will have been waiting for the longest time so it could make sense for us to serve them first. Then we will move on to B, and then C.

For the 5 weeks we are working on Feature A we incur the Cost of Delay of all three features: $1,000/week + $4,000/week + $5,000/week. This adds up to $10,000/week over 5 weeks, giving us a Delay Cost incurred of $50,000.

Once we’ve delivered Feature A we can then move on to developing Feature B. For the 1 week this takes us to deliver we incur the Cost of Delay of Features B and C: $4,000/week + $5,000/week = $9,000/week for 1 week – an additional $9,000, bringing us to a total of $59,000 worth of Delay Cost incurred.

Finally, we can start working on Feature C, incurring its Cost of Delay of $5,000/week for the two weeks it takes to build. That is another $10,000 of Delay Cost to add to our previous $59,000, for a total of $69,000 Delay Cost incurred.
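
The same arithmetic can be written as a short Python sketch. It assumes, as the walkthrough does, that features are built strictly one at a time and that every feature not yet delivered keeps incurring its weekly Cost of Delay. The function name `total_delay_cost` and the tuple layout are illustrative choices made for this example.

```python
def total_delay_cost(schedule):
    """Sum the Delay Cost incurred when jobs run one at a time, in order.

    `schedule` is a list of (name, cost_of_delay_per_week, duration_weeks).
    While a job is in progress, it and every job still waiting incur
    their Cost of Delay.
    """
    total = 0
    # Weekly Cost of Delay of everything not yet delivered.
    outstanding = sum(cod for _, cod, _ in schedule)
    for name, cod, weeks in schedule:
        incurred = outstanding * weeks
        total += incurred
        print(f"Feature {name}: ${outstanding:,}/week x {weeks} week(s) = ${incurred:,}")
        outstanding -= cod  # delivered, so it stops incurring Cost of Delay
    return total


# FIFO: the features in the order they arrived.
fifo = [("A", 1_000, 5), ("B", 4_000, 1), ("C", 5_000, 2)]
print(f"Total Delay Cost (FIFO): ${total_delay_cost(fifo):,}")
```

Passing the same three features in a different order lets you cost any alternative schedule with the same function.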

Using WSJF: Cost of Delay Divided by Duration (CD3)

Let’s consider another way of processing these Features. If we develop the features based on whichever has the highest CD3 score we would do Feature B first, followed by Feature C, and finally Feature A.

For the 1 week we are working on Feature B we incur Cost of Delay of $(4,000 + 5,000 + 1,000)/week. Delay Cost after 1 week = $10,000
For the following 2 weeks we are working on Feature C we incur Cost of Delay of $(5,000 + 1,000)/week. Delay Cost = $12,000
For the 5 weeks we are working on Feature A we incur Cost of Delay of $1,000/week. Delay Cost = $5,000

Total Delay Cost using CD3 = $27,000

Comparing

Using FIFO resulted in a total Delay Cost of $69,000. Using CD3 gave us a total Delay Cost of $27,000 – a 61% decrease in the Delay Cost incurred. As you can see, using the CD3 version of Weighted Shortest Job First to order your backlog can make quite a big difference.
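
As a sanity check on this comparison, the sketch below costs every possible ordering of the three example features and confirms that the CD3 ordering is the cheapest of the six. It is a brute-force check for this one example under the same one-at-a-time assumption, not a general proof, and the names `FEATURES` and `delay_cost` are made up for illustration.

```python
from itertools import permutations

# name -> (Cost of Delay in $/week, Duration in weeks), from the example table
FEATURES = {"A": (1_000, 5), "B": (4_000, 1), "C": (5_000, 2)}


def delay_cost(order):
    """Total Delay Cost when the features are built one at a time in `order`."""
    total = 0
    outstanding = sum(cod for cod, _ in FEATURES.values())
    for name in order:
        cod, weeks = FEATURES[name]
        total += outstanding * weeks  # everything undelivered is still waiting
        outstanding -= cod            # this feature is now delivered
    return total


# Cost every possible ordering of the three features.
costs = {order: delay_cost(order) for order in permutations(FEATURES)}

# The ordering CD3 recommends: highest Cost of Delay / Duration first.
cd3_order = tuple(
    sorted(FEATURES, key=lambda n: FEATURES[n][0] / FEATURES[n][1], reverse=True)
)
best_order = min(costs, key=costs.get)

fifo_cost = costs[("A", "B", "C")]
cd3_cost = costs[cd3_order]
reduction = (fifo_cost - cd3_cost) / fifo_cost

for order, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(" -> ".join(order), f"${cost:,}")
print(f"CD3 order is the cheapest: {cd3_order == best_order}")
print(f"Reduction versus FIFO: {reduction:.0%}")
```

For these figures the cheapest ordering is B, C, A at $27,000 and the most expensive is A, C, B at $72,000, so FIFO is not even the worst option available.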

What this simple comparison doesn’t show, though, are the human effects of giving preference to work that is smaller, and the impact that has on flow and lead-times. As we’ve shown in our Experience Report, using CD3 also encourages the breaking down of batches as a way of influencing priority. This means that instead of rewarding larger batches, which often claim a higher benefits case, we are encouraging people to minimise the size of the things they ask for. This has huge benefits in a number of areas, not least of which are significantly faster lead-times and better throughput, thanks to reduced variability in the work items being processed.

Scaled Agile Framework, WSJF and Cost of Delay

It is a great endorsement of WSJF that Dean Leffingwell’s Scaled Agile Framework (SAFe) recommends using it for prioritisation. SAFe also emphasises that WSJF should be applied at feature level – which also matches our experience of using Cost of Delay. (One of the most damaging concepts to ever take hold in organisations is that large-batch projects are a suitable vehicle for developing software). SAFe should be commended for encouraging organisations further along in both of these areas.

Having said that, the current teaching of WSJF in SAFe leaves plenty of room for improvement – in particular how they treat Cost of Delay. In their attempt to simplify Cost of Delay, SAFe may have unintentionally made it more complicated, and rendered it useless for much of the value that Cost of Delay can bring. Here are my thoughts about how SAFe interprets Cost of Delay. Long before I put pixels to screen, Jason Yip shared his views here.

My main plea would be to make sure you don’t boil Cost of Delay down to simply a better way to prioritise – there is much more to it than that!

UPDATE:

I have since proposed a qualitative approach to Cost of Delay, which could be used in WSJF. Based on this, I’ve also proposed the smallest possible tweak to the SAFe version of WSJF to make it at least coherent and reflective of the underlying concepts.

Comments 2

  1. A great write-up, and I completely agree with the value of CD3 in driving the best business decisions. For me this is obviously an improvement on the FIFO approach most commonly adopted in the IT domain.
    I’m confused as to why more businesses don’t follow a CD3 approach, though. Do you think there is a lack of exposure (which SAFe goes some way to address), or is it the slight overhead of considering the value of work and the cultural changes around smaller work packages? (In many organisations we bundle the work together to give the largest ROI number we can.)
