Value, Flow, Feedback

“Velocity” needs to die. Alternative measures?

In spending time with senior Executives across lots of different organisations, a word they often latch on to – and have a tendency to misuse – is “Velocity”. I’ve lost count of the number of times I’ve heard someone say something along the lines of “We need to increase our Velocity.” This betrays a fundamental misunderstanding of a) what velocity actually is, and b) the fact that velocity going up really doesn’t give you what you want.

Unfortunately, in trying to correct this thinking, we don’t really have a good alternative for people to use. Maybe that’s a good thing? “Agility” is a nebulous term that gets used and abused. I generally try to avoid using it at all because of this ambiguity. Nevertheless, I know when I’m beat, and in the spirit of being a pragmatic Engineer who still has to operate in a world full of uncertainty, I think we can do better than <nothing> as an alternative. So, what could we replace it with?

What does Agile really mean?

Setting aside the platitudes in the Agile Manifesto for a minute, I’ve previously made the assertion that true agility is a function of Speed and Optionality.

One of the most agile animals we know would probably be the humble House Fly. It is quick (in terms of acceleration) and it can change direction extremely quickly – which gives it all the optionality in the world – at least in the three-dimensional world it’s playing in.

And yet the fly can outmaneuver any human-built craft at low speeds. Buzzing annoyingly across a room, a housefly reaches speeds of up to 10 kilometers per hour at twice the acceleration of gravity. When turning, it is even more impressive: the fly can execute six full turns per second, reaching its top angular speed in just two-hundredths of a second. It can fly straight up, down, or backward, and somersault to land upside down on a ceiling. If it hits a window or a wall sideways, which it often does, the fly will lose lift and begin to fall. But its wings keep beating, and within a few milliseconds, the fly recovers its lift and can move off in the opposite direction.

What about for Product Development? How might we measure the ability to accelerate and change direction? I’ve previously articulated three foundational principles behind “agility” that I use to guide and assess how an organisation is operating – and how it is improving over time:

  1. Deliver Value Early and Often
  2. Optimise End-to-End Flow
  3. Discover Quality through Fast Feedback

These three align with Speed and Optionality: delivering early and often provides the optionality to change direction. If you are fast end to end, you can develop and deliver an improvement (through iteration or increment) in fairly short order. And the third point is critical: without feedback you can’t safely operate in a complex environment. It’s the feedback that gives you the information (OODA Loop style) with which to decide how to deploy your capabilities of speed and optionality. This isn’t Agile for agile’s sake; it’s agility in the practical sense of being able to actually improve, both as a team and as a product or user/customer experience.

Ok, so with that as a starting point. How might we measure “agility” in our organisations?

First, some caveats: This is supposed to be a starter for ten. I know full well that it is not perfect. No measure is. Nor does it need to be perfect – the incumbent certainly isn’t! My reasoning is that the current vacuum of alternatives to “Velocity” is leading Execs (and others) to focus on completely the wrong thing. Without an alternative, we are stuck with Availability Bias: drunk people looking for their keys under the streetlamp because that’s where the light is, NOT because it’s where they dropped them. To perhaps labour the point: we don’t need a laser-pointer shining on the exact spot. The mission is to direct attention onto areas where focus is more likely to lead to Good Things™ happening.

As always, if you can see a way to improve this, please share!

VEO: Value Early and Often

At its simplest, I think this could be a combination of two things. The “Value Early” part would be the elapsed time from when a team decides to focus on a problem area to when they have developed a good enough MVP to test. Does it take 9 months to get to an MVP? Or is it closer to 3 months? Let’s call this Time to Market, or TTM.

TTM = Time to Market — Elapsed time in weeks from a team starting to explore a problem space to when the first MVP is being used by Customers/Users.
If the MVP takes 9 months, then TTM = 39
If the MVP takes 3 months, then TTM = 13

We also need something to cover the “and Often” part. I’d suggest something like Release Frequency as a half-decent measure of this. So, what is the elapsed time between releases? Is it Quarterly? Or Monthly? Weekly? Daily? Multiple times a day? Clearly the shorter the better, as a general rule.

RF = Release Frequency — Elapsed time in weeks between releases to customers/users.
If Quarterly Releases, then RF = 13
If Monthly Releases, then RF = 4.3
If Daily (weekday) Releases, then RF = 0.2

I’m not sure how to combine these two parameters, but adding them together doesn’t seem logical. I’m gonna multiply them for now. So…

Value Early and Often, VEO = TTM x RF
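
To make the arithmetic concrete, here’s a minimal sketch in Python; the veo() function and its parameter names are purely my own illustration, nothing standard:

```python
# VEO = TTM x RF, both measured in weeks. Lower is better.

def veo(ttm_weeks: float, rf_weeks: float) -> float:
    """Value Early and Often score: Time to Market x Release Frequency."""
    return ttm_weeks * rf_weeks

before = veo(ttm_weeks=39, rf_weeks=13)  # 9-month MVP, quarterly releases -> 507
after = veo(ttm_weeks=26, rf_weeks=6)    # 6-month MVP, 6-weekly releases -> 156
print(before, after)                                   # 507 156
print(f"{(before - after) / before:.0%} improvement")  # 69%, the ~70% quoted below
```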

So, let’s imagine how the conversation between a Senior Exec and, say, a Delivery Manager might go with this:

Exec: “So, what’s our Value Early and Often Score?”

Delivery Manager: “Well, we’ve seen an improvement over the last quarter. Our TTM has gone from 39 weeks down to 26 weeks. This is mostly from roughly doubling our Release Frequency, which was previously Quarterly: we are now down from a release every 13 weeks to a release to production every 6 weeks. There’s been a huge effort in making releases cheaper, and in Continuous Delivery, to make this possible.”

Exec: “Sounds good – what’s the overall Score now?”

DM: “From 507 down to 156. A 70% improvement!”

Exec: “That’s amazing. We should celebrate that achievement. Do you need anything from me to get it down even further?”

DM: “Not really. But we have already plucked a lot of the low-hanging fruit. To get it down further we really need to invest in improving the quality and coverage of our Unit Testing. For that, the teams have suggested that we increase the capacity allocated to this to 20%. We’re shooting for a TTM of 13 weeks and an RF of 4 weeks, which would bring us down to 52.”

Exec: “If that’s what they suggest, we should look at what impact that might have on various roadmaps. I’m happy to signal to the Product Management community that this is important – but the decision is really up to the individual teams.”

With me so far? Let’s keep going and see where we end up…

E2EF: End to End Flow

This may seem like a repeat of TTM, but I’m assuming that the MVP is a bigger batch. What I’m interested in is how long it takes for an individual Feature (or “story”, if those represent something of value to the user/customer). We’re looking now not at the batch, but at how quickly one item in the batch goes from backlog to done.

For this, we can use the fairly standard definition of Cycletime, but ideally we would measure it End-to-End. The other thing to avoid here is measuring only how long each part takes. If teams are building car doors, but not integrating those doors with the rest of the pieces needed to deliver an increment, an iteration or some valuable information, it’s not really “end to end” in my view.

For many orgs, you can get a fairly decent dataset on this by looking at Jira Control Charts. Or just timestamp when an item is pulled from the backlog and when it’s “done”. It’s quite important that this also include the time spent in the last mile of development, from “code complete” to when it’s fully integrated and considered good enough to ship.

CT = End to End Cycletime — Elapsed time in days from pulling a “Ready for Dev” story or feature into WIP through to “Done Done”, i.e. production-level, ready-to-ship quality.
If using Scrum with 2-week sprints, this should be less than 14 days.

We could make this more complicated by looking at Mean Time to Recovery and a load of other useful metrics, but for now let’s just keep it simple.

End-to-End Flow, E2EF = CT
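
If you’d rather compute this from raw timestamps than read it off Jira’s charts, here’s a sketch; the dates are invented for illustration, and it reports the 70th percentile, since that’s the figure quoted in the dialogue below:

```python
# End-to-end Cycletime per story: days from "pulled into WIP" to "Done Done".
from datetime import date
from statistics import quantiles

stories = [  # (pulled, done) timestamps -- illustrative data only
    (date(2024, 1, 2), date(2024, 1, 20)),
    (date(2024, 1, 8), date(2024, 2, 2)),
    (date(2024, 1, 15), date(2024, 2, 13)),
    (date(2024, 1, 22), date(2024, 2, 9)),
]

cycle_times = [(done - pulled).days for pulled, done in stories]
# quantiles(..., n=10) returns the nine deciles; index 6 is the 70th percentile.
ct_p70 = quantiles(cycle_times, n=10)[6]
print(f"E2EF (70th-percentile cycletime): {ct_p70:.0f} days")
```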

Again, let’s imagine how the conversation between a Senior Exec and, say, a Delivery Manager might go with this:

Exec: “So, what’s our End-to-End Flow Score?”

Delivery Manager: “Well, to make the more frequent releases possible we’ve had to improve our Continuous Integration setup, shaving off half of the “last mile” to get to a production-like environment. We also added two UX designers where we had queues building up. From this and other improvements that have come out of team-level retrospectives, our E2E story cycletime has gone from a little over 4 weeks down to under 3 weeks for 70% of stories.”

Exec: “Sounds good – so a drop from 28 down to 21 days?”

DM: “Yeah – there’s of course some variation in that, but that’s the trend for ~70% of stories.”

Exec: “Understood. A 25% improvement is pretty good. What’s next on this?”

DM: “The teams think they can maybe get this down to under a fortnight. The bottleneck for most teams has shifted from downstream to upstream – so we’re starting to look at our Definitions of Ready to see if we can tighten that up to smooth the flow through the teams but without shifting more work upstream.”

Exec: “Perfect. Again, let me know if there’s anything I can do to support that.”

So now we have covered two of the three principles in some way, shape or form. I’d argue this last one is perhaps the most important though, so stay with me…

FFL: Fast Feedback Loops

This one is more tricky. We’re in the realms of SNR (Signal-to-Noise Ratio), False Positives and False Negatives, Test Pyramids and Broken Windows Theory. Some feedback loops contain almost no information whatsoever.

Good quality feedback loops are also nested, so if a quality problem makes it through an earlier feedback loop without being picked up, hopefully one of the many broader outside loops will catch it before a user or customer is affected.

With all those caveats, how might we objectively measure fast feedback loops? We also want it to be relatively simple – we’re competing with “velocity” on the simplicity scale after all. So, what’s a half-decent starting point?

How about if we chose three fairly common feedback loops and used the cycletime for each of those – measured from when we first start working on something to when we get some sort of quality-related feedback that would tell us whether we are likely heading in the right direction?

We of course already have a couple of key feedback loops covered by the above measures (TTM is the speed of feedback from users/customers, and CT is the speed of feedback for an individual feature or story). What feedback loops nested inside those two might we be able to objectively measure?

Here’s a “starter for ten” set of three that might be worth trying:

FL^3 = the cycletime of three nested feedback loops, each measured from the pull of a story:
a) UT: local Unit Tests run (< 1 day?)
b) SIT: System Integration Tests (< 5 days?)
c) DSR: Demo/System Review (< 14 days?)

Again, I’m not sure how we might combine these three, but let’s say we multiply them together? Measured in Days? (Really not sure about this!) I’m also going to leave out, for simplicity’s sake, any measure of how often the team reflect on their own ways of working (AKA Retrospectives) as a core feedback loop for continuous improvement for the team itself rather than the Product they are working on. If we were to include it, it might be a fourth loop? Seems too easy to game that one though…

Fast Feedback Loops, FFL = UT x SIT x DSR
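
As before, here’s a minimal sketch of the multiplication, using the numbers that appear in the dialogue below (all three loop times in days; the ffl() function is my own illustration):

```python
# FFL = UT x SIT x DSR, each the elapsed days from pulling a story to
# that feedback loop completing. Lower is better; keep the units consistent.

def ffl(ut_days: float, sit_days: float, dsr_days: float) -> float:
    """Fast Feedback Loops score: Unit Tests x System Integration Tests x Demo/Review."""
    return ut_days * sit_days * dsr_days

before = ffl(ut_days=1, sit_days=28, dsr_days=14)    # 392
after = ffl(ut_days=0.5, sit_days=14, dsr_days=14)   # 98
print(before, after)
```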

So how might the conversation go on this topic?

Exec: “So, the Value Early and Often and E2E Flow Scores are showing improvement. What about our Fast Feedback Loops Score?”

Delivery Manager: “Glad you asked. I’ve already mentioned the investments we’ve made in speeding up Continuous Integration. The feedback loop from Start to SIT Completed is down from 4 weeks to less than 2 weeks on average. Unit Tests are now running 50% faster too, so that’s down from nightly to half a day from when we start a story. It’s not reflected in the Score, but we’ve put a huge effort into refactoring broken tests and improving the SNR of the tests that we do have. We can now go from a broken build to a fix much faster, since we have better logging of bugs when they arise. Time to Demo hasn’t changed – we’re still doing these once a fortnight.”

Exec: “OK – so what’s the Fast Feedback Loops Score?”

DM: “From 392 (= 1 x 28 x 14) down to 98 (= 0.5 x 14 x 14)”

Exec: “That’s great. This is so important – what are the next steps on Fast Feedback Loops?”

DM: “Some of the teams feel they are ready to experiment with weekly Demos. We’re trying to convince some stakeholders of the importance of early feedback, but there’s some grumbling about this taking too much time.”

Exec: “OK. I’ll have a chat at my next weekly with my peers and explain the importance of fast feedback. With the improvements in Release Frequency and Time To Market, I’m sure we’ve earned the right to ask for them to cooperate – at least until they see the benefits for themselves.”

I could probably write a whole book on the importance of Fast Feedback Loops with a high-quality Signal-to-Noise Ratio, and on different ways to measure quality in these terms. Until I do, go and check out Steve Smith’s book “Measuring Continuous Delivery”.

Combining all three: VEO, E2EF, FFL

Ideally, we wouldn’t combine these. Each measure is worthy of attention separately, and combining them not only mixes Apples and Oranges (not to mention potentially very different units) in strange ways, it also dilutes and oversimplifies. If I were forced to combine them, again, I’d probably multiply. If you wanted a higher number to represent “more agility”, you’d just take the inverse, something like this…

“Value, Flow, Feedback” Score = 1,000,000 / [VEO x E2EF x FFL]

Using the examples above (which may seem low to some of you – but you have to meet people where they are!) the overall score would look something like this:

VFF(Before) = 1,000,000 / [507 x 28 x 392] = 1,000,000 / 5,564,832 = 0.18 (Yeah, pretty low, right?)

VFF(After) = 1,000,000 / [156 x 21 x 98] = 1,000,000 / 321,048 = 3.1 (Better!)

VFF(Goal?) = 1,000,000 / [52 x 14 x 49] = 1,000,000 / 35,672 = 28 (Much Better!)
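
And a sketch tying the whole thing together, reproducing the three worked examples above:

```python
# Combined VFF score: invert so that higher means "more agility"; the
# 1,000,000 scaling factor just lifts the result into a readable range.

def vff(veo_score: float, e2ef_score: float, ffl_score: float) -> float:
    return 1_000_000 / (veo_score * e2ef_score * ffl_score)

print(f"Before: {vff(507, 28, 392):.2f}")  # 0.18
print(f"After:  {vff(156, 21, 98):.1f}")   # 3.1
print(f"Goal:   {vff(52, 14, 49):.0f}")    # 28
```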

How could teams use this? Well, for starters, baseline where you are today. For each measure, what improvements could you make? This way you would have an objective, consistent measure – and a way to communicate where you have come from, where you are today and what your goal is. I’d argue this would be better than the dozens of different qualitative Agile Assessments I’ve seen, which are typically based on some judgement about whether a team has adopted a particular set of agile practices, rather than whether those practices are actually working.

Pitfalls?

I could probably write another whole blog post on how each of these could be – and is likely to be – misused and abused. Not least of which is the likely application of targets to each of them. To which the economist Charles Goodhart would say: “When a measure becomes a target, it ceases to be a good measure.”

Would “VFF Score” be worse than “Velocity” though? Probably not.

Thoughts? @ me! Joshua J. Arnold.