Security defect triage in delivery projects

The guys at Recx asked me to look at a draft of their recent blog post, “The Business v Security Bugs – Risk Management of Software Security Vulnerabilities by ISVs”. In it they describe some of the business constraints and influences on security defect triage for Independent Software Vendors (ISVs), and make the case that the triage decision is ultimately a business decision, not a technical security decision.

I was happy to do it as I’ve known the guys at Recx for a long time and they are a great little British security company with some seriously deep technical security skills. They have a lot of experience working through ISV security defect triage processes both as external security researchers and as internal product security managers.

Recx make the following statement:

Much has been written about end user organization software vulnerability/patch risk management, but little in the context of software security from an independent software vendor release risk management perspective.

While I would agree that security patch cycles for the operational management of existing systems are a well-discussed topic, with good practices agreed upon and regularly audited by external auditors, I would argue that the same is not true for security defect triage in system development.

My experience is as a client of system integrators (SIs) delivering complete systems built from COTS components and subsystems, with SI-developed custom glue code and business rules, rather than from within an ISV. Even so, I think the concerns are very similar when conducting security defect triage.

Choosing to fix security defects tends to have just two negative impacts on system delivery projects: costs and delays. Delays, for the reasons set out below, mostly translate into costs.

Sometimes you need to buy new hardware or software to mitigate a security defect, but on large delivery projects it is rare for hardware or software licensing to be the driving cost of the project as a whole. There are always exceptions, but most of the cost of a large system delivery project is in the people needed to design, build and test the system. For example, if you need a new high-availability firewall pair on a larger (multi-million pound) project, the hardware and software costs are likely to be lost in the noise. The real cost concerns are the extra activities in the design, build and test phases and the uplift in the planned support costs.

The biggest cost of delaying a release is that it extends the time the various teams need to be stood up; sometimes you may be paying for people to sit idle while you’re waiting for the fix to be developed. If you decide to re-baseline and stop paying for these people until they’re needed, there is a good chance, given how ‘lean’ most system integrators run, that they will be picked up by other paying projects. You then lose their knowledge and experience of the project, which adds to your delay as you rebuild the team and cope with the skilling up that brand new resources need.

There are also political dimensions to delays that have to be considered. It’s a lot easier to tell a stakeholder at the start of a six-month delivery project that it is delayed by three weeks than to tell them three days before go-live. Last-minute notification of delays can significantly damage future trust relationships with those stakeholders, and if this is just one project in a programme of projects, that can have a major impact on the overall programme. Stakeholder trust is critical for large programmes, especially when significant business change is underway.

The business, which in this case could be the client and/or the system integrator depending on where the cost of the change ends up, will always be looking at the bottom line. That is a strong influence, especially where project managers are measured on their ability to deliver benefits on time and to cost.

A more far-sighted project manager will see a fix pushed into the future roadmap as a risk to the future operation of the system. If the fix is added to a standard patch cycle update then the impact is likely to be manageable, whereas a future point release or an out-of-cycle critical patch puts the availability of the operational system at risk and may inconvenience stakeholders once they are using the system.

The security risk managers in the business will want to see that the risks are being managed in a pragmatic and ultimately effective manner, and will worry about the need to fix the defect at speed in operation if an active attack is discovered. However, it is also often the case that the initial users of the system are a small subset of the final user base, so the initial months of operation put fewer users and/or information assets at risk than the final system is designed to support. A pragmatic risk manager may therefore be prepared to accept increased risk up to a certain level of usage, allowing time for a fix to be developed and deployed post go-live.

At the end of the day a judgement has to be made balancing the following variables:

  • New hardware / software costs
  • Immediate financial cost of delay
  • Immediate political cost of delay
  • Risk of exploitation over time
  • Impact of exploitation over time or by number of users
  • Increased cost of a delayed fix
  • Increased risk of deployment to a live system in the future
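To make the trade-off concrete, here is a rough sketch of how those variables might be put side by side for two options, fixing before go-live or deferring the fix. Everything in it is an assumption for illustration: the names (TriageOption, total_cost), the idea of expressing political cost and live-deployment risk in money, the simple independent-months exploitation model, and all of the figures. Different stakeholders would plug in very different numbers, so treat it as a way of structuring the conversation rather than a way of getting the answer.

```python
from dataclasses import dataclass


@dataclass
class TriageOption:
    """One way of handling a security defect (all fields are illustrative)."""
    name: str
    new_kit_cost: float                 # new hardware / software costs
    delay_weeks: float                  # delay to go-live caused by this option
    weekly_team_cost: float             # cost of keeping the teams stood up
    political_cost: float               # assumed money-equivalent of lost trust
    monthly_exploit_probability: float  # chance of exploitation in any one month
    exploit_impact: float               # cost if the defect is exploited
    months_exposed: int                 # months live before the fix lands
    later_fix_cost: float               # increased cost of a delayed fix
    live_deployment_risk_cost: float    # assumed cost of patching a live system


def expected_exploit_cost(opt: TriageOption) -> float:
    # Probability of at least one exploitation over the exposed months,
    # assuming each month is independent (a simplification).
    p = 1 - (1 - opt.monthly_exploit_probability) ** opt.months_exposed
    return p * opt.exploit_impact


def total_cost(opt: TriageOption) -> float:
    # Sum of the judgement variables, each expressed in money.
    return (
        opt.new_kit_cost
        + opt.delay_weeks * opt.weekly_team_cost
        + opt.political_cost
        + expected_exploit_cost(opt)
        + opt.later_fix_cost
        + opt.live_deployment_risk_cost
    )


# Invented figures, purely to show the shape of the comparison.
fix_now = TriageOption("fix before go-live", 20_000, 3, 50_000, 30_000,
                       0.0, 0.0, 0, 0.0, 0.0)
defer = TriageOption("defer to post go-live patch", 0.0, 0, 50_000, 0.0,
                     0.02, 1_000_000, 6, 60_000, 25_000)

for option in (fix_now, defer):
    print(f"{option.name}: {total_cost(option):,.0f}")
```

With these invented figures the two options come out very close together, which is rather the point: small changes in any one estimate, particularly the ones that are hardest to quantify, can flip the decision.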

However, the ‘business’ will likely include a range of stakeholders: the system integrator, the project manager, the various representatives of the user communities, other business partners and the security risk managers, none of whom is likely to have the same priorities when judging those variables. I think that is an important point: the ‘business’ is a diverse group with varying and often conflicting priorities. Often the guy shouting the loudest isn’t the all-encompassing representative of the ‘business’ he claims to be, just the noisiest and pushiest of the throng of stakeholders.

A really good question at this point is: “Who owns the final decision?”

If you don’t know that at the start of the project, you’ll find it painful to work out when you hit a significant security defect.