The ‘Worst case’ isn’t ever the worst

I am intrigued (and appalled) by the oil spill in the Gulf of Mexico, and by the news that the spill was worse than BP’s worst-case estimate. As soon as anyone talks about worst-case estimates I get deeply sceptical, since the phrase doesn’t really mean anything. Probabilities of certain events are fine, but the implication that you are so smart that you can work out and cap the ‘worst case’ is wildly arrogant. Still, I know that the offshore oil industry is pretty focussed on safety, so I thought I would find out more about the expected level of safety. I hoped to use it as an example of how things can go wrong even when you try very hard to prevent them, and so as a caution in other arenas, where much, much less work is done before people declare themselves confident in the worst-case outcome.

However, having spent over three hours on what I thought would be a one-hour poke around, I have come away a whole lot less comfortable. So I am not going to write anything more on the trap of the ‘worst case’, and will instead describe what made me uncomfortable. I looked at a report on blow-out preventers (BOPs), one on the overall risks associated with BOPs, and the actual Exploration Plan filed by BP.

Blow-out preventer failures

Blow-out preventers fail, and rather frequently for something that is supposed to be a key control on disaster: reports have shown 117 failures in the US in two years, and 138 in four years in wells off Brazil, Norway, Italy and Albania (see here). These apparently bad statistics may not be all they seem, though, since they include all failures, not just catastrophic ones. Even so, this failure has clearly spooked some in the oil industry (e.g. see here).

As an engineer, I am sure people tried hard to make these things effective, and I was intrigued to see how it is done, so I had a look at a paper written in 2004 for the US Minerals Management Service (MMS) on the pipe-shearing mechanisms that form the last line of defence (here). These pipe shears are designed to cut the drill pipe and then shut and seal the well. The paper has several rather worrying quotes:-

“As stated in a mini shear study recently done for the MMS, only three recent new-build rigs out of fourteen were found able to shear pipe at their maximum rated water depths. Only half of the operators accepting a new-build rig chose to require a shear ram test during commissioning or acceptance. This grim snapshot illustrates the lack of preparedness in the industry to shear and seal a well with the last line of defense against a blowout.”


“While the industry is fully aware of the inability of sealing shear rams to shear tool joints, it is unclear as to whether the industry fully appreciates the fact that internal upsets can also be problematic. Going forward, internal upsets as well as tool joints should be taken into account when considering the hang off location, ram space out and resulting shear point of the drill pipe. We do not want to attempt to shear at an internal upset or tool joint; to do so will probably not only be unsuccessful, it will also most likely damage or destroy the shear blades.”

I can’t find definitive information on the joints, but they appear to be about 18 inches long. In a 31-foot length of pipe, that means that if you have to operate the shear at a random point on the pipe (as I suspect you might in an emergency), you have roughly a 5% chance of hitting a joint and being unable to shear, even if shearing the main body of the pipe were perfect (which it isn’t). A failure in the Yucatan peninsula in 1979, the only known failure in real use, suffered exactly this type of problem, though with drill collars rather than joints:-
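The 5% figure is simply the joint length divided by the pipe length. Here is a minimal sketch that checks it, both directly and by simulating a shear point that lands uniformly at random along one length of pipe. The 18-inch joint and 31-foot length are the rough figures above; the uniform landing point is an assumption for illustration, not a model of how a real string sits in a BOP stack.

```python
import random

JOINT_LENGTH_IN = 18        # approximate tool-joint length (rough figure from the post)
PIPE_LENGTH_IN = 31 * 12    # one 31-foot length of drill pipe, in inches

# Fraction of each pipe length occupied by a joint.
analytic = JOINT_LENGTH_IN / PIPE_LENGTH_IN

def simulate_joint_hits(trials, seed=1):
    """Estimate the chance that a uniformly random shear point lands on a joint.

    Assumes (for illustration only) that the joint occupies the first
    JOINT_LENGTH_IN inches of each length and every position is equally likely.
    """
    rng = random.Random(seed)
    hits = sum(rng.uniform(0, PIPE_LENGTH_IN) < JOINT_LENGTH_IN for _ in range(trials))
    return hits / trials

print(f"analytic:  {analytic:.1%}")
print(f"simulated: {simulate_joint_hits(200_000):.1%}")
```

Both figures come out a little under 5%, which is where the "1 in 20" below comes from.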

“As in other disasters, multiple issues occurred and wrong directions were taken, but the shear rams were activated at one point and did fail to shear. Reportedly, they were pulling the drill string too quickly without proper fluid replacement and the well started coming in. They had no choice but to close the shear rams; unfortunately, drill collars were in the stack and shearing failed. The situation deteriorated from this point. This incident started the development of shear rams that could shear casing and/or drill collars.”

Pity they didn’t succeed, eh? Or, if they have, that the result hasn’t been made mandatory. And I can’t resist one rather geeky quote (but I’ll translate below):-

“Many statistical procedures assume (including the discussion regarding standard deviations above) that a variable is normally distributed; therefore, it is appropriate to determine/address whether data is normally distributed or not. To address whether our data of the Actual Shear minus Distortion Energy calculated shear was normal, the Chi-Square goodness-of-fit test was performed on the data. The results determined a Chi-Square of 32.41 and a P-Value of < 0.0001; these numbers are such that a normal distribution cannot be assumed. A further test called the Lilliefors test was run on the data, which also concluded that the data was not normally distributed. None-the-less, the assumption of normalcy allows us to better understand the nature of our data.”

This reads like the finest non-liquid horse product. My translation: ‘we tried two ways to show the data followed a normal (aka Gaussian) distribution, found that it didn’t, but carried on as if it did, since we had no way of doing the statistics if it wasn’t normal. We pretended that this helped us understand risk.’ Of course, they may have done more work than this, so it may not be true in this case, but it is a hugely common type of failure, and of massive import when doing statistics about low-probability events like, say, having to shear a drill pipe in an emergency! The tails of a distribution are exactly where a wrongly assumed normal curve is least reliable.
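Why does this matter so much for rare events? Here is a minimal sketch using only Python’s standard library. It draws a hypothetical right-skewed sample (a lognormal stand-in, not the paper’s shear data, and it does not rerun the paper’s Chi-Square or Lilliefors tests), pretends the data are normal by fitting a Gaussian with the same mean and standard deviation, and compares the tail probability the normal fit predicts with the fraction of samples that actually exceed the threshold.

```python
import math
import random
from statistics import fmean, stdev

def tail_comparison(threshold, n=100_000, seed=42):
    """Compare an empirical upper-tail probability with a fitted-normal estimate.

    The samples are a hypothetical lognormal(0, 1) stand-in for skewed data;
    nothing here comes from the shear-ram paper itself.
    """
    rng = random.Random(seed)
    samples = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]

    # What the data actually say: fraction of samples above the threshold.
    empirical = sum(x > threshold for x in samples) / n

    # What you get by pretending the data are normal with the same mean and SD.
    # erfc keeps precision far out in the tail, where 1 - cdf would round to 0.
    z = (threshold - fmean(samples)) / stdev(samples)
    normal_estimate = 0.5 * math.erfc(z / math.sqrt(2))
    return empirical, normal_estimate

empirical, normal_estimate = tail_comparison(threshold=20.0)
print(f"empirical  P(X > 20): {empirical:.2e}")
print(f"normal-fit P(X > 20): {normal_estimate:.2e}")
```

On skewed data like this, the normal fit puts the chance of exceeding the threshold many orders of magnitude below what the sample shows, which is precisely the error that matters when the threshold is "more force than the shear can deliver".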

“The ideal for the industry would be a shear force predictor that would be successful for shearing the pipe in question close to 97% of the time.”

This statement may mean that they want maths that provides a base figure, on top of which a safety factor will be applied to allow for the experiments being carried out under, well, experimental conditions. But I have no idea how the safety factor would be determined, and the maths doesn’t look robust, per the note above. Being charitable, let me assume that the shears work almost all the time on normal pipe, and return to the joints. Their existence alone means that there is at least a 1 in 20 chance of failing to shear the pipe if you cannot position it so that the shear avoids a joint, and maybe more in disaster scenarios, where the pipe could jam on a joint or similar. That doesn’t sound great for a last line of defence. My instinct would be to fit two shears, spaced so that you know one of them will hit pipe it can actually cut, but this isn’t what seems to be done.
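The two-shear idea can be checked with a little geometry. This is a hypothetical sketch: it assumes the joint occupies the first 1.5 feet (18 inches) of every 31-foot length, that the two shears sit at fixed heights a given distance apart, and that the pipe string could be stopped at any offset. It sweeps the offsets and reports how often both shears would land on a joint at once.

```python
JOINT_PITCH_FT = 31.0   # length of one pipe joint-to-joint (rough figure from the post)
JOINT_LENGTH_FT = 1.5   # assumed tool-joint length, about 18 inches

def on_joint(position_ft, pitch=JOINT_PITCH_FT, joint=JOINT_LENGTH_FT):
    """True if a shear at this position (measured along the string) hits a joint.

    Assumes the joint occupies the first `joint` feet of every `pitch`-foot length.
    """
    return (position_ft % pitch) < joint

def both_blocked_fraction(spacing_ft, steps=31_000):
    """Fraction of string offsets where BOTH shears, spacing_ft apart, hit joints."""
    blocked = 0
    for i in range(steps):
        offset = JOINT_PITCH_FT * i / steps   # evenly spaced offsets over one length
        if on_joint(offset) and on_joint(offset + spacing_ft):
            blocked += 1
    return blocked / steps

for spacing in (0.0, 1.0, 5.0, 31.0):
    print(f"spacing {spacing:>4} ft: both shears on a joint "
          f"{both_blocked_fraction(spacing):.1%} of the time")
```

A spacing of zero is the single-shear case (about 4.8%). Under these assumptions, any spacing that is not within a joint length of a whole multiple of the pipe length leaves one shear with plain pipe to cut, while spacing the shears exactly one pipe length apart buys nothing.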

BOP risk assessments

I couldn’t find anything robust on failure mode analysis for BOPs. It may exist, and it may be done well. But I was able to find an M.Sc. thesis (here) on sub-surface vs. surface BOPs. I do not know anything about the accuracy of the document, or indeed its veracity; it could simply have been made up and not reflect real-world practice. But in terms of poor failure mode analysis it is fairly typical, so it is interesting to look at. It starts with a rather theoretical assessment of risk, but then notes:-

“The above formulas (Eq. 3.5 to 3.11) only apply to very reliable systems, which in most cases describe the ideal system conditions or work with generic models. For unreliable or real systems, the formulas are inaccurate; therefore the assumptions are not valid and stochastic simulation must be considered.”

So far so good. But then we get to the model’s assumptions. Page 36 includes a table of expected failure rates and consequences to feed into the model. These are described as a reasonably robust data set, but it is immediately apparent that there is nothing in it that looks like the 5% failure rate that comes from an attempt to shear across a joint in an emergency (that would translate to 5x10E-3 in the nomenclature used in the paper, if you want to look). The closest is human error at 1%. This looks like a data set issue: the key factor is not how often the BOP fails over some unit of time, but how often it fails per use. By analogy, I have no interest in hearing about the safety of the airbag in my car based on the time I spent driving without crashing. I am interested in its safety when it is used. And whilst the failure rate in an emergency might not really be 5%, as you might sometimes have a chance to position the drill pipe first, it doesn’t feel remotely safe to assume it is orders of magnitude lower. To be clear, the analysis here doesn’t even try to work out the failure rate in this type of scenario; it simply takes the rate of isolated failures over time.
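The per-use versus per-time distinction can be made concrete. A common approximation in functional-safety work puts the average probability that a periodically tested component has already failed silently when it is called on at roughly λτ/2, where λ is the hidden failure rate and τ the test interval. The sketch below compares that time-based figure with one that also includes a per-use failure mode, such as shearing across a joint. The failure rate and test interval are hypothetical round numbers, not values from the thesis, and the two failure modes are assumed to be independent.

```python
def pfd_time_based(failure_rate_per_year, test_interval_years):
    """Average probability of failure on demand from a time-based rate alone.

    Uses the single-channel approximation PFD ~ lambda * tau / 2 for a component
    whose hidden failures are only revealed at periodic tests.
    """
    return failure_rate_per_year * test_interval_years / 2.0

def pfd_with_demand_failure(failure_rate_per_year, test_interval_years, p_fail_on_use):
    """Add a per-use failure mode (e.g. the shear point landing on a tool joint).

    The ram fails if it has already failed silently since the last test, or if
    the act of using it fails; the two are treated as independent.
    """
    p_time = pfd_time_based(failure_rate_per_year, test_interval_years)
    return 1.0 - (1.0 - p_time) * (1.0 - p_fail_on_use)

LAMBDA = 0.01             # hypothetical: one hidden failure per 100 ram-years
TAU = 1.0 / 12            # hypothetical: monthly function test
P_JOINT = 18 / (31 * 12)  # chance a random shear point lands on an 18-inch joint

print(f"time-based only: {pfd_time_based(LAMBDA, TAU):.4f}")
print(f"with joint hits: {pfd_with_demand_failure(LAMBDA, TAU, P_JOINT):.4f}")
```

With these made-up inputs the time-based view suggests the ram fails on demand well under one time in a thousand, while adding the joint problem pushes that to nearly one in twenty. The per-use term dominates, which is why a data set built from failures over time can miss it entirely.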

This seems a fairly fundamental issue with the analysis, and it means that, whilst the thesis talks about a stochastic model, it looks more like a Monte Carlo analysis based on the reported frequency of unrelated events. No real attempt to model failure modes and linked events seems to have been made. And even with this analysis, the failure rate shown is 4% per year (see page 50). In other words, one failure could be expected every 25 years, which seems a surprisingly high level of risk tolerance.
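It is worth spelling out what 4% a year means. Here is a minimal sketch, assuming (as a simplification) that each year is independent and carries the same 4% chance of failure.

```python
def years_between_failures(annual_probability):
    """Mean wait until the first failure for a constant annual failure probability."""
    return 1.0 / annual_probability

def prob_at_least_one(annual_probability, years):
    """Chance of one or more failures over a span, assuming independent years."""
    return 1.0 - (1.0 - annual_probability) ** years

P_ANNUAL = 0.04  # the thesis's modelled annual failure rate (page 50)

print(f"mean interval: {years_between_failures(P_ANNUAL):.0f} years")
for span in (5, 10, 25):
    print(f"P(at least one failure in {span} years): "
          f"{prob_at_least_one(P_ANNUAL, span):.0%}")
```

Over a 25-year span that is nearly a two-in-three chance of at least one failure, per BOP.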

A final note from this document, which appears to show an anchoring bias:-

“The CODAM dataset gave a pessimistic result, unlike the PARLOC dataset, whose results were optimistic. Although neither of the results might be correct, they can provide a range for the actual value of reliability.”

You can’t say that either is pessimistic or optimistic – that implies you know the real answer.  All you really know is that they were different – they could both be highly optimistic.  The only minor crumb of comfort I can find is in the suggested future work which shows awareness of the need for a more robust failure mode analysis :-

“Only primary failures from each component were taken into consideration for this study, because the purpose was to have a preliminary assessment on whether it would be positive or not to implement a high-pressure riser. Future work should include secondary and tertiary failures to take into account chain events and their consequences.”

This is good to see, but it fails to note that such an analysis may invalidate the initial one, whose conclusions are stated far less equivocally.

BP’s Exploration Plan for Deepwater Horizon

Let’s get to the actual Exploration Plan that BP filed for this well. You can find a copy here. It clearly does not regard this as a high-risk operation, to wit:-

“A scenario for the potential blow-out of the well from which BP would expect to have the highest volume of liquid hydrocarbons is not required for the operations proposed in this EP.”

Lots on hurricanes, though! All in all, it gives a somewhat dubious worst-case estimate with no source or logic, which feels poor. Apart from that, I am guessing it is similar to hundreds of other such documents, and broadly says ‘we’ll comply with the requirements’. If the requirements were robust and applicable, that seems reasonable enough. In other words, it didn’t have to be this rig, or this well, or BP: it could have been anyone, anywhere, if the basic technology can’t cope with the circumstances.

It is worth noting that the BOP has apparently operated at least partially in this case, since the flow of oil is lower than it would otherwise have been (see here). Much has been made of the rate (210,000 gallons/day) exceeding BP’s ‘worst case estimate’ of 162,000 gallons/day, and the gap is even starker if 210,000 is LOWER than the flow would have been had the BOP not partially operated. But it is notable that the EP puts BP’s total disaster plan capacity in the area at 300,000 gallons/day. It might have been useful if BP had actually shown the range of outcomes rather than the spuriously precise 162,000 gallons/day.

This entry was posted in Cognitive biases, Environment.

3 Responses to The ‘Worst case’ isn’t ever the worst

  1. Pingback: Part deux – ‘worst case’ = self-delusion | Greg Pyes blog

  2. pat says:

    Oil industry still uses old statistical techniques in safety analysis.
    Full-blown Bayesian belief networks based on the Pearl calculus, which can handle non-normal and multi-modal distributions as chained events, are not used yet.
    As far as I can see this is the case across most of the ‘safety’ industry. I’m not sure though as I only work in the oil sector.

  3. Pingback: Early view on the real oil leak failures | Greg Pyes blog
