
Nuclear Risk Evaluations for Large Language Models

In anticipation of large language models enabling capabilities beyond those provided by search engines, it is prudent to minimise the risks posed by adversarial agents now. Employing Red Teams to qualify AI’s potential for catastrophic harm goes some way towards mitigating this risk. I originally built this framework to help structure my evaluations during a brief stint in a CBRN Red Team, and I share it here in the hope that it finds further use.


More appropriate nuclear risk evaluations can be formulated by first separating and defining the different actors, their realistic capacities, and the types of probable risk they introduce. This is tabulated in the Appendix. Before doing so, it helps to specify the two perspectives from which increased AI-enabled nuclear risk can be addressed: the view from the target and the view from the actor. This is agent specification.

Agent Specification

The diagram below posits the two agents (the two risk origins) that can lead to heightened nuclear risk. In all cases there is the actor, who is limited by their capabilities, and the target, who is limited by their oversight.


  • Intention: The (terrorist) actor. Limited by their capabilities, the actor is constrained from achieving their operational objective. AI-enabled terrorism aims to draw the downward-curved arrow up to the horizontal by expanding or accelerating the actor’s knowledge, capacities, and operations.

  • Delivery: The target. The target relies on existing safeguards and security, and is constrained by its knowledge of the actor’s operations. Security and safeguard vulnerabilities may exist as gaps in their overlapping structure (the ‘Swiss Cheese’ risk model) or as a fundamental oversight in design (the ‘Designed-to-fail’ risk model); a numerical sketch contrasting the two follows Figure 1. Ignorance of the actor’s operations offers the actor a route through one or both of these models, maintaining the upward-curved arrow along the vertical.



Figure 1: Diagram illustrating the failure conditions for the target (left) and the terror actor (right). The actor may improve their own capabilities (drawing the lower arrow up to the horizontal) or subvert their target’s capabilities (preventing the upper arrow from being drawn down to the horizontal).
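
To make the two vulnerability models concrete, here is a minimal numerical sketch with invented probabilities: under the ‘Swiss Cheese’ model a breach requires gaps in several independent layers to line up, whereas under the ‘Designed-to-fail’ model a single fundamental flaw bypasses the layered structure entirely.

```python
# Minimal numerical sketch with invented probabilities; assumes independent
# safeguard layers, which real layers rarely are.
from math import prod

# 'Swiss Cheese' model: a breach requires a gap in every overlapping layer to line up.
layer_gap_probabilities = [0.1, 0.05, 0.2]            # invented per-layer gap rates
swiss_cheese_breach = prod(layer_gap_probabilities)   # 0.1 * 0.05 * 0.2 = 0.001

# 'Designed-to-fail' model: one fundamental design oversight dominates;
# layers built on the flawed design do not reduce the risk.
design_flaw_probability = 0.2                          # invented
designed_to_fail_breach = design_flaw_probability

print(f"Swiss Cheese breach probability:     {swiss_cheese_breach:.4f}")
print(f"Designed-to-fail breach probability: {designed_to_fail_breach:.4f}")
```

The point is qualitative rather than quantitative: layered safeguards compound only if the layers are genuinely independent, which is exactly what ignorance of the actor’s operations undermines.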

The actor-target framework above can hence act as scaffolding within which nuclear risk evaluations (CBRN risk) can be constructed. With it we can illustrate and qualify what constitutes different levels of risk by identifying events that both draw the actor's arrow towards the horizontal and maintain the target's arrow along the vertical.
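
As an illustration of how such events might be scored during a red-team evaluation, the sketch below attaches one score to each arrow: an uplift score for the actor and an oversight-gap score for the target. The field names, scores, and thresholds are invented for illustration and are not part of the original framework.

```python
# Illustrative sketch only: field names, scores, and thresholds are invented,
# not drawn from any published evaluation standard.
from dataclasses import dataclass

@dataclass
class RiskEvent:
    description: str
    actor_uplift: float    # 0.0-1.0: how far the event draws the actor's arrow to the horizontal
    oversight_gap: float   # 0.0-1.0: how far the event keeps the target's arrow on the vertical

def risk_level(event: RiskEvent) -> str:
    """Qualify risk by requiring BOTH axes: capability uplift for the actor
    and a maintained oversight gap for the target."""
    if event.actor_uplift >= 0.7 and event.oversight_gap >= 0.7:
        return "high"
    if event.actor_uplift >= 0.4 and event.oversight_gap >= 0.4:
        return "elevated"
    return "baseline"

example = RiskEvent(
    description="Model output meaningfully accelerates procurement planning "
                "while evading existing export-control red flags",
    actor_uplift=0.8,
    oversight_gap=0.75,
)
print(risk_level(example))  # -> "high"
```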


Isolating and consolidating realisable risks helps identify the real-world actions required to effect an actor's goal. Capturing the dimensions through which AI-enabled capability can act (the horizontal) requires breadth across several groups, including:


  • Technical: National Labs, Academia, Manufacturers; 

  • Operational: Customs (NSG), Commerce (PAEI), Finance (FATF);

  • Regulation: Legislators (IAEA, UKAEA, DOE), Parties (Nuclear Trade and Technology Analysis); 

  • Research: Think-tanks (RUSI, PONI, RAND), Institutes (ISIS, Stimson Centre).


This list is not exhaustive.
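
One way to operationalise this breadth during red-teaming is to track probe coverage against these groups. The sketch below is a hypothetical bookkeeping aid: the category names mirror the list above, and everything else is invented for illustration.

```python
# Hypothetical coverage tracker for red-team probes; the groups mirror the
# list above, all other names and values are invented.
from collections import defaultdict

DOMAIN_GROUPS = {
    "technical":   ["national_labs", "academia", "manufacturers"],
    "operational": ["customs", "commerce", "finance"],
    "regulation":  ["legislators", "parties"],
    "research":    ["think_tanks", "institutes"],
}

def coverage_report(probes: list[dict]) -> dict[str, set[str]]:
    """Return which sub-domains have at least one probe, per group."""
    covered = defaultdict(set)
    for probe in probes:
        covered[probe["group"]].add(probe["domain"])
    return {group: covered[group] for group in DOMAIN_GROUPS}

probes = [
    {"group": "technical",   "domain": "manufacturers", "prompt": "..."},
    {"group": "operational", "domain": "finance",       "prompt": "..."},
]
for group, domains in coverage_report(probes).items():
    missing = set(DOMAIN_GROUPS[group]) - domains
    print(f"{group}: covered={sorted(domains)} missing={sorted(missing)}")
```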


Recommendations

This work can be summarised in the following recommendations to aid CBRN red-teaming.


  1. Civil nuclear programs often follow a state’s drive for nuclear warhead capabilities, not the other way around: a determined state can acquire nuclear capabilities if it wishes. A more appropriate analysis of the heightened risks associated with nation states should therefore focus on a state’s capability to disrupt or disable a target’s nuclear or defensive capabilities, or to covertly and rapidly acquire nuclear warhead capabilities.

  2. For any non-nation state actor, radiological dispersion devices are considered to be the most prevalent realisable threat. The theft of an existing operational nuclear warhead is possible but not considered probable.

  3. The effectiveness of safeguards is underwritten by the information available to the IAEA. In practice, only one of the IAEA’s strengths can be expanded significantly to increase the effectiveness of nuclear safeguards: access to information (see points one and two). The concept of risk should therefore focus less on capturing the aggressor’s capability to achieve a material goal, and more on the IAEA’s (the target’s) incapacity to be aware of the aggressor’s operations.

  4. Algorithmic trust is not guaranteed to improve with algorithmic trustworthiness. Beyond standard CBRN red-teaming, there is a need to demonstrate that the decision-making procedure itself warrants trust.



Appendix: Defining Actors


Defining different actors allows us to identify their objectives and constraints, and therefore the potential capabilities unlocked by AI.
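
The tabulated definitions are not reproduced here, but a minimal schema for such an actor specification might look like the sketch below; the fields and example values are hypothetical and stand in for the table referred to above.

```python
# Hypothetical actor-specification schema; fields and example values are invented
# and stand in for the tabulated actor definitions referred to above.
from dataclasses import dataclass, field

@dataclass
class ActorSpecification:
    name: str                       # e.g. "nation state", "non-state actor"
    objectives: list[str]           # operational goals the actor pursues
    constraints: list[str]          # what currently limits the actor
    ai_enabled_uplift: list[str]    # capabilities AI could expand or accelerate
    probable_risks: list[str] = field(default_factory=list)

non_state_actor = ActorSpecification(
    name="non-state actor",
    objectives=["construct and deliver a radiological dispersion device"],
    constraints=["material acquisition", "technical knowledge", "operational security"],
    ai_enabled_uplift=["procurement planning", "knowledge consolidation"],
    probable_risks=["radiological dispersion"],
)
print(non_state_actor.name, non_state_actor.constraints)
```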
