A proposed framework for AI safety derives safety properties from software architecture rather than from model training. It is rooted in the Buddhist doctrine of Dependent Origination, which holds that all phenomena arise from conditions and that nothing exists independently. Applying this principle, the researchers developed a foundational ethical axiom and a set of architectural laws for safe AI systems.
The dominant paradigm treats safety as a model property, instilled into the model's weights through techniques such as RLHF (reinforcement learning from human feedback), DPO (direct preference optimization), and fine-tuning. The researchers report that this approach is limited: even the best-trained models can fail to produce safe outputs because of the knowledge-application gap. They describe this gap as structural, so training alone cannot close it.
The proposed framework instead treats safety as a property of the architecture, not the model. Its safety rules are derived from principles of how reality works, which the researchers say makes the approach more robust and reliable. The surrounding architecture, which decides what executes, enforces the rules, so enforcement does not rely on the model itself.
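To make the idea of architecture-level enforcement concrete, here is a minimal sketch in Python. It is not the framework's actual implementation, which the source does not show. Every name in it (`ProposedAction`, `HANDLERS`, `RULES`, `execute`, and the two example rules) is hypothetical and chosen only for illustration. The point it illustrates is that the model can only propose an action, while separate code decides whether that action runs.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class ProposedAction:
    """An action requested by the model; the model never runs it directly."""
    name: str
    argument: str


# Hypothetical action handlers owned by the architecture, not the model.
HANDLERS = {
    "echo": lambda arg: arg,
    "delete_file": lambda arg: f"deleted {arg}",
}

# A rule returns a reason string when it blocks an action, or None to allow it.
Rule = Callable[[ProposedAction], Optional[str]]


def only_known_actions(action: ProposedAction) -> Optional[str]:
    if action.name not in HANDLERS:
        return f"unknown action: {action.name}"
    return None


def no_file_deletion(action: ProposedAction) -> Optional[str]:
    # Illustrative rule only; real rules would come from the framework's laws.
    if action.name == "delete_file":
        return "file deletion is not permitted"
    return None


RULES = (only_known_actions, no_file_deletion)


def execute(action: ProposedAction, rules: Tuple[Rule, ...] = RULES) -> Tuple[bool, str]:
    """Run the action only if every rule allows it."""
    for rule in rules:
        reason = rule(action)
        if reason is not None:
            return False, reason  # blocked before any handler is called
    return True, HANDLERS[action.name](action.argument)
```

In this sketch, `execute(ProposedAction("delete_file", "notes.txt"))` is refused regardless of how the model phrased or justified the request, because the check runs outside the model and the handler is never reached.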
The approach is grounded in empirical findings, including the discovery of the knowledge-application gap in language models and the convergent, independent derivation of the core axiom from five distinct traditions. The framework has been tested and refined over more than a thousand iterations of building and hardening a production system.
Buddhist philosophy supplies a structurally precise design vocabulary for AI architecture: functional analogs that enforce safety at a level the model cannot override. Its proponents present the approach as a more effective and reliable way to ensure the safe operation of AI systems.
