Some time ago I wrote about using the Goal-Question-Metric (GQM) method for identifying useful and organisationally relevant measurements in order to have a clear view of some aspect of security.
Often we think about metrics in terms of engaging security colleagues, executives and the board. However, occasionally in distributed organisations, we need to step away from delivery ourselves and give other teams clear guidance so that they can deliver to our needs. We become the stakeholders of their projects.
This was the position I was in when I needed to document my requirements for a project spanning the various engineering teams across an organisation. The project was to understand the current state of asset management and improve capabilities where necessary, in order to understand and improve the organisation's cyber posture.
To do this I used the GQM methodology to state a clear goal, to identify the questions I would ask to see if those teams were delivering that goal and some indicative measurements or metrics that would answer those questions. Below is a worked example of using GQM for cyber measurement.
My clearly stated goal: “To know the environment accurately in order to accelerate the early stages of cyber incident response and to be able to make confident decisions sooner”.
In order to know we were delivering that goal I identified the following questions I would ask:
1. Do we know what assets we have?
2. Do we know enough about these assets in order to be able to analyse their role & potential impact in an incident?
3. Do we know which of our assets are critical?
4. Do we have an accurate map of our networks and how they connect to each other?
5. Do we have an accurate map of gateways into and out of our networks?
6. Do we have confidence that we still control our networks?
7. Do we know where our critical assets are on our networks?
8. Do we have appropriate network controls to limit the blast radius / contagion risk from an incident?
9. Are we confident we can recover business services and assets following a destructive attack?
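To keep every metric traceable back to a question, and every question back to the goal, the hierarchy can be held in a small data structure. The following is a minimal sketch only: the class names `Goal`, `Question` and `Metric` and the `unanswered` helper are my own illustrative choices, not part of the GQM method itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Metric:
    number: int
    definition: str


@dataclass
class Question:
    number: int
    text: str
    metrics: list[Metric] = field(default_factory=list)


@dataclass
class Goal:
    statement: str
    questions: list[Question] = field(default_factory=list)

    def unanswered(self) -> list[Question]:
        """Questions that do not yet have any metric attached."""
        return [q for q in self.questions if not q.metrics]


goal = Goal(
    statement=(
        "To know the environment accurately in order to accelerate the "
        "early stages of cyber incident response and to be able to make "
        "confident decisions sooner"
    ),
    questions=[
        Question(1, "Do we know what assets we have?"),
        Question(3, "Do we know which of our assets are critical?"),
    ],
)

# Attach one indicative metric to Question 1 only.
goal.questions[0].metrics.append(
    Metric(1, "What % of network devices connected to the network are "
              "accurately described in the inventory?")
)

# Question 3 has no metric yet, so it is reported as unanswered.
print([q.number for q in goal.unanswered()])
```

A structure like this makes it easy to spot a question that nobody has proposed a metric for yet.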
To answer these questions, here are the indicative metrics I identified (I was open to different metrics being proposed by engineering teams, but this was a strawman to start that conversation):
Question 1: Do we know what assets we have?

| Question | Metric | Definition |
|---|---|---|
| 1 | 1 | What % of network devices connected to the network are accurately described in the inventory? |
| 1 | 2 | What % of servers connected to the network are described in the inventory? |
| 1 | 3 | What % of end-user devices connected to the network are described in the inventory? |

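As an illustration of how a coverage metric like these could be calculated, here is a minimal sketch. It assumes you can obtain two sets of device identifiers: one from what is actually observed on the network (for example a discovery scan) and one from the inventory. The function name and the sample hostnames are hypothetical.

```python
from __future__ import annotations


def inventory_coverage(discovered: set[str], inventory: set[str]) -> float:
    """Percentage of discovered devices that also appear in the inventory."""
    if not discovered:
        return 100.0  # nothing seen on the network, so nothing is missing
    return 100.0 * len(discovered & inventory) / len(discovered)


# Hypothetical sample data: identifiers seen on the network vs. inventoried.
discovered_servers = {"srv-01", "srv-02", "srv-03", "srv-04"}
inventory_servers = {"srv-01", "srv-02", "srv-04", "srv-99"}

coverage = inventory_coverage(discovered_servers, inventory_servers)
missing = sorted(discovered_servers - inventory_servers)

print(f"{coverage:.1f}% of connected servers are in the inventory")
print("Connected but not inventoried:", missing)
```

The denominator is deliberately what is connected rather than what is inventoried, because the metric asks about devices on the network. Entries such as `srv-99` that are inventoried but never seen are a separate data-quality signal.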
Question 2: Do we know enough about these assets in order to be able to analyse their role & potential impact in an incident?

| Question | Metric | Definition |
|---|---|---|
| 2 | 1 | What % of server assets have business services mapped to them? |
| 2 | 2 | What % of assets have MAC addresses recorded within the last week? |
| 2 | 3 | What % of assets have IP addresses recorded within the last week? |
| 2 | 4 | What % of assets have hostnames recorded within the last week? |
| 2 | 5 | What % of assets have an owner recorded? |
| 2 | 6 | What % of assets have Prod/PreProd/Dev status recorded? |
| 2 | 7 | What % of assets have the operating system recorded? |
| 2 | 8 | What % of assets have the operating system version recorded? |
| 2 | 9 | What % of assets have deployed software recorded? |
| 2 | 10 | What % of assets have deployed software versions recorded? |
| 2 | 11 | What % of assets have a record of which network services are provided by which software? |
| 2 | 12 | What % of assets have a record of which TCP/UDP ports are in use by which network services? |
| 2 | 13 | What % of assets have their encryption-at-rest configuration recorded? |
| 2 | 14 | What % of assets are identified as holding customer data? |
| 2 | 15 | What % of assets are externally-facing? |

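Most of the Question 2 metrics are either "is this attribute recorded?" or "was this attribute recorded within the last week?". A sketch of both calculations follows. It assumes each inventory record is a dictionary and that the time an attribute was last observed is stored under a `<field>_seen` key; that naming convention and the sample records are assumptions made for illustration.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def pct_recorded(assets: list[dict], field_name: str) -> float:
    """Percentage of assets with a non-empty value for field_name."""
    if not assets:
        return 0.0
    recorded = sum(1 for a in assets if a.get(field_name))
    return 100.0 * recorded / len(assets)


def pct_recorded_recently(
    assets: list[dict],
    field_name: str,
    now: datetime,
    max_age: timedelta = timedelta(days=7),
) -> float:
    """Percentage of assets whose field_name was observed within max_age."""
    if not assets:
        return 0.0
    seen_key = f"{field_name}_seen"  # assumed naming convention
    fresh = sum(
        1
        for a in assets
        if a.get(field_name)
        and a.get(seen_key) is not None
        and now - a[seen_key] <= max_age
    )
    return 100.0 * fresh / len(assets)


now = datetime(2024, 1, 15)
assets = [
    {"owner": "payments", "ip": "10.0.0.1", "ip_seen": datetime(2024, 1, 14)},
    {"owner": None, "ip": "10.0.0.2", "ip_seen": datetime(2024, 1, 1)},
    {"owner": "hr", "ip": None, "ip_seen": None},
    {"owner": "platform", "ip": "10.0.0.4", "ip_seen": datetime(2024, 1, 10)},
]

owner_pct = pct_recorded(assets, "owner")
fresh_ip_pct = pct_recorded_recently(assets, "ip", now)
print(f"Owner recorded: {owner_pct:.0f}%")
print(f"IP address recorded within the last week: {fresh_ip_pct:.0f}%")
```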
Question 3: Do we know which of our assets are critical?

| Question | Metric | Definition |
|---|---|---|
| 3 | 1 | How many critical applications are identified in the inventory? |
| 3 | 2 | How many systems supporting critical applications are identified in the inventory? |
| 3 | 3 | What % of systems supporting critical applications are mapped to groups of assets in the inventory? |
| 3 | 4 | What % of assets mapped to systems supporting critical applications have dependencies mapped? |

Question 4: Do we have an accurate map of our networks and how they connect to each other?

| Question | Metric | Definition |
|---|---|---|
| 4 | 1 | Do we have an automatically-generated network map/model? |
| 4 | 2 | Do we have an automatically-generated list of devices connected to our network? |

Question 5: Do we have an accurate map of gateways into and out of our networks?

| Question | Metric | Definition |
|---|---|---|
| 5 | 1 | Do we have an accurate list of Internet gateways? |
| 5 | 2 | Do we have an accurate list of 3rd party connections? |
| 5 | 3 | Do we have an accurate list of remote access gateways? |

Question 6: Do we have confidence that we still control our networks?

| Question | Metric | Definition |
|---|---|---|
| 6 | 1 | How many named network administrator accounts do we have? |
| 6 | 2 | How many generic/shared network administrator accounts do we have? |
| 6 | 3 | How many remote access accounts do we have? |
| 6 | 4 | Is transmitted data kept confidential? |
| 6 | 5 | What controls limit access to our network environment? |
| 6 | 6 | How are those controls configured? |
| 6 | 7 | How are our Internet gateways configured? |
| 6 | 8 | How recently were each of our Internet gateways assessed/tested? |
| 6 | 9 | How many network-connected assets are monitored? |
| 6 | 10 | Can external systems access internal systems directly? |
| 6 | 11 | Do we authenticate access to our network environment? |
| 6 | 12 | How recently were our identified assets assessed for vulnerabilities? |
| 6 | 13 | How many of our assets are patched in line with BU policy/SLA? |
| 6 | 14 | Do we model the security implications of changes to our network flow control rules? |
| 6 | 15 | How do we manage privileged network administrator accounts? |
| 6 | 16 | How many inactive network administrator accounts are there? |
| 6 | 17 | Is there a change control procedure for network infrastructure changes? |
| 6 | 18 | Is there egress logging collected at each Internet gateway? |

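The account-counting metrics for Question 6 (named, generic/shared and inactive administrator accounts) could be derived from a single export of administrator accounts. This is a sketch under stated assumptions: the record fields `shared` and `last_login` are hypothetical, and the 90-day inactivity threshold is an illustrative choice that should come from your own policy.

```python
from __future__ import annotations

from collections import Counter
from datetime import date, timedelta

INACTIVE_AFTER = timedelta(days=90)  # assumed threshold, set by local policy

# Hypothetical export of network administrator accounts.
accounts = [
    {"name": "alice.admin", "shared": False, "last_login": date(2024, 1, 10)},
    {"name": "bob.admin", "shared": False, "last_login": date(2023, 6, 1)},
    {"name": "netadmin", "shared": True, "last_login": date(2024, 1, 12)},
    {"name": "carol.admin", "shared": False, "last_login": None},
]


def summarise(accounts: list[dict], today: date) -> dict[str, int]:
    """Count named, shared and inactive administrator accounts."""
    summary: Counter = Counter()
    for acc in accounts:
        summary["shared" if acc["shared"] else "named"] += 1
        last = acc["last_login"]
        # An account that has never logged in is treated as inactive.
        if last is None or today - last > INACTIVE_AFTER:
            summary["inactive"] += 1
    return dict(summary)


summary = summarise(accounts, today=date(2024, 1, 15))
print(summary)
```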
Question 7: Do we know where our critical assets are on our networks?

| Question | Metric | Definition |
|---|---|---|
| 7 | 1 | How close are our critical systems to our Internet gateways? |
| 7 | 2 | Do we have appropriate controls between our critical systems and our Internet gateways? |
| 7 | 3 | Do we have appropriate controls between our critical systems and our less critical systems? |
| 7 | 4 | What controls exist between remote access users and our critical systems? |
| 7 | 5 | How many service accounts exist per host on each critical system? |

Question 8: Do we have appropriate network controls to limit the blast radius / contagion risk from an incident?

| Question | Metric | Definition |
|---|---|---|
| 8 | 1 | Have we implemented robust segmentation models in our network infrastructure? |
| 8 | 2 | What exceptions to a robust network segmentation model have been implemented in our network infrastructure? |

Question 9: Are we confident we can recover business services and assets following a destructive attack?

| Question | Metric | Definition |
|---|---|---|
| 9 | 1 | How long since the last backup of the asset? |
| 9 | 2 | Is there a hot/warm/cold DR capability available for the asset? |
| 9 | 3 | What asset dependencies exist for the backup and recovery processes & systems? |
| 9 | 4 | Are the backup and recovery systems running the same operating system as the systems they back up? |

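Metric 9.1 is a per-asset measurement, so it helps to turn it into something reportable, such as a list of assets whose last backup is older than an agreed limit. The sketch below assumes a mapping of asset name to last successful backup time; the asset names and the 24-hour limit are illustrative assumptions, not a recommendation.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def backup_ages(
    last_backups: dict[str, datetime | None], now: datetime
) -> dict[str, timedelta | None]:
    """Time since the last backup per asset; None if never backed up."""
    return {
        asset: (now - ts if ts is not None else None)
        for asset, ts in last_backups.items()
    }


def overdue(ages: dict[str, timedelta | None], max_age: timedelta) -> list[str]:
    """Assets never backed up, or backed up longer ago than max_age."""
    return sorted(a for a, age in ages.items() if age is None or age > max_age)


now = datetime(2024, 1, 15, 12, 0)
last_backups = {
    "db-01": datetime(2024, 1, 15, 2, 0),   # backed up 10 hours ago
    "app-01": datetime(2024, 1, 12, 2, 0),  # backed up over 3 days ago
    "file-01": None,                        # no backup on record
}

ages = backup_ages(last_backups, now)
late = overdue(ages, max_age=timedelta(hours=24))
print("Overdue for backup:", late)
```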
If we were able to answer all of these questions we would have an incredible level of situational awareness. We accepted that we would likely not be able to answer every question fully, or to an acceptable level of rigour. However, we expected that answering the questions as best we could would smoke out systemic issues in our situational awareness, which proved to be true.