Harnessing hindsight

Assessing risk, resilience and operational incidents in airlines

Carl Macrae examines how near-miss incident reports are used to oversee and manage risk in civil aviation.

Operational breakdowns are serious in any industry, but in airlines they can be catastrophic. Air accidents, like any organisational collapse, emerge from a complex combination of factors: from gaps in company policy, to clumsy or inappropriate procedures, to the errors of operational personnel. Long considered a safety-critical industry, airlines have been at the forefront of developing a range of approaches to manage these operational risks. One key strategy is analysing and learning from past events and minor incidents. Incident reporting programmes allow personnel at the operational 'sharp-end' to report mishaps, failures or concerns of relevance to flight safety. These can then be analysed and the findings acted on. Developed and used in the aviation industry for the past three decades, incident reporting programmes are increasingly emerging in other domains, most recently healthcare and finance.

Challenges of practice 

The focus of scholarly attention, however, remains largely on the design and implementation of reporting programmes: how to establish trust and encourage reporting, how to design databases and information systems, and how to structure causal analysis models and risk analysis tools. Research has rarely examined the practices of assessing and managing incident reports - where the interpretive challenges of risk analysis and management come to the fore.

A large airline can see around eight thousand reports a year from its pilots alone, with similarly high levels of reporting from engineering, ground and cabin crews. But these reports tend to be brief, truncated accounts of what are often complex organisational events. Reports can be 'one-liners', caricatured on occasion as simply saying that 'something went bang as we landed the aeroplane'. Generally, they concern minor operational fluctuations, hiccups and anomalies that typically result in little or no adverse outcome, and are compensated for or worked around - for instance, an error inputting data into a flight computer that is noticed and corrected during a subsequent cross-check. And they span a wide range of operational issues and areas, concerning literally anything that operational personnel decide to report: from circuit-breakers left tripped after a service to inappropriate advice given by a technical department.

Finally, programmes are run by independent organisational units that report to board level, but are separate to line management and have no executive capacity. This encourages reporting by personnel, and allows incident analysis to be removed from operational and commercial pressures. But it also introduces a challenge: these units have no direct authority to enforce action.

Examining the interpretive practices of the flight safety investigators who assess and manage incident reports in airlines provides specific insights into how minor events are analysed and learnt from. It also holds broader implications for how risks can be identified and made sense of in other complex organisational settings. The interpretive practice of investigators can be explained in terms of three analytical concepts: resilience, vigilance and participation. These three concepts respectively characterise how risks are practically understood, identified and acted on in this setting.

Resilience 

Operational safety and risk have typically been defined in terms of adverse consequences: a standard metric for risk is the severity and likelihood of a harmful outcome. Likewise, safety is often defined as the avoidance or absence of adverse events. Yet these approaches are found to be of limited use in this safety-critical setting. Investigators take the view that organisational activity is inherently imperfect. Errors and failures are a normal feature of operations: people will make mistakes as part of their daily work, and components will fail as part of their natural lifecycle. Accordingly, investigators assume that the potential for catastrophe is ever present, as small failures and events could combine in complex, unforeseen ways. The only means of guaranteeing absolute safety, as far as they are concerned, is to keep the aircraft locked in a hangar. In light of these assumptions, investigators differentiate relative safety and unacceptable risk within an interpretive framework that can be characterised as organisational risk resilience: the organisational capacity to protect operations from the potential of minor mishaps developing into major breakdowns.

Airline operations are replete with risk controls and safety defences such as routines of cross-checking and reading back instructions, training for out-of-ordinary conditions, and automated warning systems. These defences provide resilience to errors and failures. But acceptable safety requires not merely resilience in this typical sense of 'bouncing back' from actual mishaps. Rather, it requires resilience to the risks of minor operational failures escalating, by ensuring systems of defences remain in place beyond any actually called upon. Further, these defences and risk controls are viewed by investigators as social and organisational processes. So, for instance, the automated warning system that alerts pilots to terrain hazards is understood as a network of practical activities encompassing maintenance work, the ability of flight crew to notice the warning and respond appropriately, the provision of effective training, and the development of appropriate procedures and policy - and not merely as a technical system that is in place or not. Operational incidents are therefore used to diagnose where and how processes of organisational resilience are degraded, rather than to attempt predictions of future catastrophes.

Vigilance 

Making sense of incidents and identifying risks is, at core, about using and developing organisational knowledge. Current models of incident analysis and risk assessment focus on the incident data: categorising, classifying, abstracting and quantifying it. In practice, investigators interpret incidents by drawing on their extensive operational experience of organisational risks. Risks are identified through an interaction between what reports say and what investigators know. This includes, for instance, understanding the broader operational context surrounding an event, being aware of any similar problems or incidents experienced elsewhere in the industry, and knowing the operational history of the implicated processes - such as when and why they were developed. The aim of incident analysis, as investigators see it, is to oversee and know about the risks that currently exist. However, one of their most basic assumptions is that their knowledge of risk is always partial and limited. Some risks will always lie outside the bounds of their current knowledge. As such, they continually work to expose these unknown, latent risks. Their approach, which can be characterised as interpretive vigilance, is based on humility and scepticism towards the safety of operations, the information they receive, and their own interpretations of risk. This interpretive work is directed at identifying weak and fleeting signs of ignorance, in the form of suspicions or doubts. Four distinct interpretive tactics are used to construct these suspicions, based on identifying patterns of failure, drawing relations between major issues and minor events, perceiving novelty in unrecognised forms of failure, and finding discrepancies in operational practices - or their knowledge of them.

Participation 

Incident reports are used not only as a source of risk data, but as specific opportunities to investigate and act on particular aspects of operations. But, as investigators have no direct authority to enforce action, they work to co-opt local specialists and personnel throughout the organisation to investigate, reflect and act on the risks implied by incidents. These means of addressing risks can be characterised as the creation of participative networks around risks. Investigators aim to influence and effect organisational action by setting a safety agenda, through initiating local investigations and publishing regular reports and reviews. Their primary tactics are to pose questions about safety and to publicise signs of potential problems, prompting local specialists to examine and review the implicated operational activities. In the case of more complex risks, this often involves bringing together networks of experts from different organisational units and operational areas. In this way, investigators co-ordinate distributed processes of organisational learning around numerous concrete and specific indications of risk: operational incidents. Knowledge is developed and change effected through the active participation and engagement of organisational personnel.

Lessons for theory from practice 

What implications does this examination of practice hold for current theory? First, it suggests that current models of risk management, and methods of risk analysis, could be productively extended by more fully attending to the 'positive' face of operational risk - the organisational practices and social processes that underpin organisational resilience - so moving beyond the current focus on predicting and avoiding failures, errors and harm. Second, it emphasises the central place of knowledge - and its dark side, ignorance - in dealing with risk. Assessing small moments of operational failure is an interpretive process that draws on forms of knowledge that are not readily quantified or formalised, such as the particulars, specifics and details garnered from practical operational experience, or vicarious knowledge of similar events experienced by other organisations. And identifying signs of ignorance, in the form of suspicions that arise from subtle relations and mismatches between current knowledge and organisational events, equally appears to offer a useful proxy for identifying latent risks. Third, it points to the importance of institutional designs that balance the tensions between central oversight and local participation and action, and that establish organisational spaces for collective enquiry and sensemaking around risk events.

Carl Macrae is an ESRC Postdoctoral Fellow at CARR
