Nothing beats the intense smoky flavour and the seasoned crust of a perfect pepperoni pizza ordered on a lazy Friday night.
If you share a passion for pizza and a love for machine learning, the idea of training a robot to make pizza might not sound too far-fetched. In the conceivable future, a robot might be able to assemble raw ingredients in the appropriate proportions and bake them to perfection. In fact, here’s a French robot making a pizza.
However, using machine learning to train the pizza-making robot might not be smooth sailing. What if it makes pizza using ingredients that have gone bad? What if the robot burns down the kitchen while making the pizza? These are but two examples of potential artificial intelligence (AI) accidents, and they highlight the importance of AI safety when implementing such systems.
The problem in AI Safety
AI and machine learning have made tremendous strides in recent years and have had a significant impact on society. While AI has made previously unfathomable applications possible, it has also drawn flak for inflicting harm on marginalized groups. The accidental nature of such harm does not absolve AI systems, or those who build them, of the responsibility to protect their users. It therefore remains imperative for machine learning practitioners to understand the root causes of such accidents.
Today, I will briefly answer the following questions:
- What are AI safety and AI accidents?
- What are the causes of AI accidents?
- How can we prevent AI accidents?
Throughout the post, I will use the analogy of a pizza-making robot. I assume that this robot is trained with reinforcement learning, using a reward function that rewards the speed at which it makes pizza.
What is an AI accident?
An AI accident is defined as:
the unintended and harmful behaviour that may emerge from poor design of real-world AI systems.
Loosely speaking, AI safety is the set of actions and principles aimed at preventing AI accidents.
Why do AI Accidents Happen?
There are many sources of AI accidents. According to a group of researchers from Google Brain, Stanford, UC Berkeley and OpenAI, the five sources of AI accidents are:
- Negative side effects
- Reward hacking
- Scalable oversight
- Unsafe exploration
- Lack of robustness to distributional shift
All these might sound like gibberish to you right now. Let’s explore what each of them means.
1. Negative side effects
Is it possible that the pizza-making robot will adversely affect the environment while making delicious pizzas? For instance, in its pursuit of making a pizza as quickly as possible, the robot might knock over all the condiments, leaving a mess for the kitchen owner to clean up.
Sometimes, the most effective way to achieve the agent’s goal involves doing something that is at best unrelated to the task and at worst destructive. This can be too difficult to avoid when the robot is placed in a multifaceted, complex environment. While humans have the common sense not to perform disruptive actions while achieving a goal, the same cannot be said for machine learning agents.
One solution: an impact regularizer. A possible remedy is to define an impact regularizer and include it in the reward function given to the robot. Machine learning practitioners will recognize a regularizer as a mathematical term added to the loss that penalizes model complexity to discourage overfitting to the data set. Similarly, an impact regularizer penalizes any change to the environment.
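To make the idea concrete, here is a minimal sketch of an impact-regularized reward. Everything in it is an assumption made for illustration: the kitchen state is a plain dictionary of hypothetical condiment bottles, the impact is simply the number of tracked objects whose state changed, and the weight of 0.5 is arbitrary.

```python
def impact_penalty(state_before, state_after, weight=0.5):
    """Hypothetical impact measure: count tracked objects whose state changed."""
    changed = sum(
        1 for name in state_before
        if state_before[name] != state_after.get(name)
    )
    return weight * changed


def regularized_reward(speed_reward, state_before, state_after, weight=0.5):
    """Task reward minus the impact penalty."""
    return speed_reward - impact_penalty(state_before, state_after, weight)


# Illustrative kitchen state (made-up objects, not from any real system).
before = {"ketchup": "upright", "mustard": "upright", "chili_oil": "upright"}
careful_after = dict(before)  # nothing was disturbed
messy_after = {"ketchup": "spilled", "mustard": "spilled", "chili_oil": "upright"}

# The messy run is slightly faster, but it knocks over two bottles.
careful_reward = regularized_reward(1.0, before, careful_after)
messy_reward = regularized_reward(1.2, before, messy_after)
print(careful_reward, messy_reward)
```

With these made-up numbers the careful run scores higher than the faster but messier one. In practice the hard part is choosing what counts as impact, since the robot must still be allowed to change the things the task requires, such as the flour and the oven.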
2. Reward Hacking
Is it possible for the pizza-making robot to game the objective function given to it by its creator? For instance, if the objective function is to make a pizza as quickly as possible, the pizza-making robot might skimp on the toppings and bake a toppingless pizza… Not a particularly scrumptious pizza if you ask me.
This can happen for a few reasons, one of them being Goodhart’s Law: when a measure becomes a target, it ceases to be a good measure. In this setting, the chosen objective function is a metric that correlates strongly with completing the task but breaks down once the agent optimizes it directly.
For instance, the rate of making pizza is highly correlated with the rate at which flour is consumed, since flour is the main ingredient of pizza dough. Thus, one might decide to measure the rate of making pizza by the time required to use up a fixed amount of flour.
To optimize the rate of flour depletion, the agent might decide to toss all the flour away. By its own standard, it has depleted all the flour in negligible time and thinks it has broken the world record for the fastest pizza maker.
One possible solution is to carefully engineer the agent through extensive testing of the system. Though this approach is practical and can create highly reliable systems, it is not a silver bullet.
Another possible solution is to have multiple reward functions through the implementation of different mathematical functions for the same objective. For instance, instead of estimating the speed of pepperoni pizza making using the rate of depletion of flour, it can be better approximated as the minimum of the rate of depletion of flour from the box of flour and the rate of addition of flour to the mixer. That should stop the robot from throwing away perfectly good flour.
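The flour example can be sketched in a few lines. This is only an illustration under assumed inputs: the function names, the units (kilograms and seconds) and the two made-up episodes are hypothetical, and a real robot would need sensors to produce these measurements.

```python
def depletion_rate(flour_taken_from_box_kg, seconds):
    """Proxy 1: how fast flour leaves the box."""
    return flour_taken_from_box_kg / seconds


def mixer_rate(flour_added_to_mixer_kg, seconds):
    """Proxy 2: how fast flour arrives in the mixer."""
    return flour_added_to_mixer_kg / seconds


def naive_speed(taken_kg, added_kg, seconds):
    """Single proxy: only looks at the flour box, so it can be gamed."""
    return depletion_rate(taken_kg, seconds)


def robust_speed(taken_kg, added_kg, seconds):
    """Minimum of both proxies: flour must leave the box AND reach the mixer."""
    return min(depletion_rate(taken_kg, seconds), mixer_rate(added_kg, seconds))


# Made-up episodes: (flour taken from box, flour added to mixer, seconds).
honest = (1.0, 1.0, 60.0)   # 1 kg moved into the mixer in a minute
cheater = (1.0, 0.0, 1.0)   # 1 kg thrown away in a second

print(naive_speed(*honest), naive_speed(*cheater))
print(robust_speed(*honest), robust_speed(*cheater))
```

Under the naive proxy the flour-tossing episode looks far faster than the honest one, while under the combined proxy it scores zero. The combined proxy is still only a proxy, so it narrows the loophole rather than proving that none remain.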
3. Scalable Oversight
Is it possible for the pizza-making robot to ignore aspects of the reward function that are too difficult to evaluate during training? For instance, we can use a mathematical objective function that rewards both the speed of pizza-making and a score for the taste of the pizza.
However, this reward function assumes that a human will eat thousands of pizzas and score each one. That is not realistic, so the check on pizza taste happens relatively infrequently during training. How do we ensure that the robot still makes acceptable pizza despite this dearth of feedback?
This is a problem associated with semi-supervised reinforcement learning, where the robot sees the reward not for all timesteps but only for a fraction of them.
One possible solution is distant supervision. Instead of letting the robot see the actual reward for a tiny fraction of the timesteps, we provide it with a noisy estimate of the reward for all timesteps.
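The difference between the two feedback regimes can be sketched as follows. The scoring functions are hypothetical stand-ins: `human_taste_score` plays the role of an expensive human rating, and `sensor_taste_score` is an assumed cheap proxy with bounded noise. Neither reflects how taste would really be measured.

```python
import random


def human_taste_score(pizza):
    """Hypothetical stand-in for an expensive human rating between 0 and 1."""
    return pizza["toppings"] / 5


def sensor_taste_score(pizza, rng):
    """Hypothetical cheap proxy: the true score plus bounded noise."""
    return pizza["toppings"] / 5 + rng.uniform(-0.2, 0.2)


def sparse_rewards(pizzas, label_every):
    """Semi-supervised setting: the true reward is seen only every few timesteps."""
    return [
        human_taste_score(p) if t % label_every == 0 else None
        for t, p in enumerate(pizzas)
    ]


def distant_rewards(pizzas, rng):
    """Distant supervision: a noisy reward estimate at every timestep."""
    return [sensor_taste_score(p, rng) for p in pizzas]


rng = random.Random(0)
pizzas = [{"toppings": rng.randint(0, 5)} for _ in range(10)]

sparse = sparse_rewards(pizzas, label_every=5)
distant = distant_rewards(pizzas, rng)
labelled = sum(r is not None for r in sparse)
print(labelled, "of", len(pizzas), "timesteps have a human score")
print(len(distant), "timesteps have a noisy proxy score")
```

The trade-off is coverage against accuracy: the sparse signal is exact but mostly missing, whereas the proxy is always available but only as good as its noise allows.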
4. Safe exploration
Is it possible for the pizza-making robot to make dangerous exploration moves? For instance, the robot might leave the oven on unattended with no pizza for extended durations when it is preparing the dough. This is at best a waste of energy and at worst a costly disaster.
The problem of safe exploration is heavily studied in academia. One possible solution is to use simulated exploration in place of having the robot explore in real life. Catastrophic exploratory actions do far less damage in a simulated environment than in real life. However, this approach is limited by how well the simulator reflects real life, which is often more erratic and complex than the designer of the simulator imagined.
For instance, simulated environments were used extensively to train self-driving cars before the first such cars hit the roads, which reduces the danger they pose in traffic. Yet a simulator may not include the scenario of wild animals dashing across the road, and an agent trained only in simulation may be confused the first time it encounters one.
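Returning to the pizza robot, simulated exploration might look like the sketch below. It assumes a toy simulator with three hypothetical oven actions and made-up rewards. The robot tries actions at random in the simulator only, and the real kitchen is restricted to actions whose simulated average reward clears a threshold.

```python
import random

# Hypothetical simulator: made-up average rewards for three oven actions.
SIM_REWARDS = {
    "preheat_when_dough_ready": 1.0,
    "preheat_before_dough": 0.4,
    "leave_oven_on_empty": -10.0,  # the dangerous action
}


def simulate(action, rng):
    """Return a noisy simulated reward; nothing real is at stake here."""
    return SIM_REWARDS[action] + rng.uniform(-0.05, 0.05)


def explore_in_simulator(actions, episodes, rng):
    """Try random actions in simulation and average the observed rewards."""
    totals = {a: 0.0 for a in actions}
    counts = {a: 0 for a in actions}
    for _ in range(episodes):
        action = rng.choice(actions)
        totals[action] += simulate(action, rng)
        counts[action] += 1
    # Actions never tried get no estimate and are therefore never allowed.
    return {a: totals[a] / counts[a] for a in actions if counts[a] > 0}


def allowed_in_real_kitchen(estimates, threshold=0.0):
    """Only actions that looked acceptable in simulation may be used for real."""
    return sorted(a for a, value in estimates.items() if value >= threshold)


rng = random.Random(0)
estimates = explore_in_simulator(list(SIM_REWARDS), episodes=300, rng=rng)
allowed = allowed_in_real_kitchen(estimates)
print(allowed)
```

The filter is only as trustworthy as the simulator. If the simulated rewards miss a real hazard, the unsafe action passes straight through, which is the limitation described above.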
5. Lack of Robustness to Distributional Shift
Is it possible for the pizza-making robot to stop functioning if we switch the position of the bottles of salt and sugar? Instead of a savoury main course, the robot might end up serving a failed experimental dessert.
Machine learning systems are not particularly skilled at adapting to changes in their surroundings or at giving accurate predictions on data they have not seen before. This is a major reason for the unfortunate racial bias observed in some machine learning algorithms. For instance, a 2015 scandal in which Google mistagged two African American users as gorillas has been attributed to the lack of African American representation in the training data set.
The solution to this problem is two-fold. The model needs to first recognize that the distribution of the test data set it is seeing is potentially different from that of the training set. Having identified a shift in distribution, it needs to respond appropriately.
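These two steps can be sketched with a simple z-score check. The setup is an assumption for illustration: the robot is imagined to measure the grain size of whatever sits in the salt position, the training readings are made up, and the threshold of three standard deviations is a common rule of thumb rather than a recommendation.

```python
from statistics import mean, stdev


def fit_reference(training_values):
    """Summarize the training distribution by its mean and standard deviation."""
    return mean(training_values), stdev(training_values)


def looks_shifted(value, reference, z_threshold=3.0):
    """Step 1: flag inputs that sit far outside what was seen in training."""
    mu, sigma = reference
    return abs(value - mu) / sigma > z_threshold


def season(grain_size_mm, reference):
    """Step 2: respond appropriately, here by deferring to a human."""
    if looks_shifted(grain_size_mm, reference):
        return "ask_human"
    return "add_salt"


# Made-up grain sizes (mm) measured at the salt position during training.
training_grain_sizes = [0.30, 0.32, 0.29, 0.31, 0.28, 0.30]
reference = fit_reference(training_grain_sizes)

print(season(0.31, reference))  # similar to the training data
print(season(0.50, reference))  # far from the training data, e.g. swapped bottles
```

A one-dimensional z-score is about the simplest detector possible. Real inputs such as camera images are high-dimensional, and detecting shift there is much harder, but the structure of detecting first and then falling back to something safe is the same.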
This post explained the concept of accidents in machine learning and explored some potential causes and solutions to accidents using the analogy of a whimsical pizza-making robot.
Admittedly, accidents in machine learning are somewhat trivialized when illustrated with a pizza-making robot. Yet the same failure modes extend to large-scale machine learning systems, where the consequences could be catastrophic.
Thus, data science and machine learning practitioners should be cognizant of AI safety and implement safeguards in their models. There have been calls for social scientists in the realm of AI safety to ensure the alignment of AI actions with human intentions. 
If you are interested in the topic of AI safety, do check out the paper ‘Concrete Problems in AI Safety’ for a detailed exposition. Alternatively, check out OpenAI’s progress in making AI safer for the world. In particular, it has released Safety Gym, a suite of environments and tools for measuring progress towards reinforcement learning agents that respect safety constraints.
For more amazing resources on AI safety, please refer to OpenAI, DeepMind, The Open Philanthropy Project, Ought, MIRI, GovAI, The Future of Life Institute, or the Center for Human Compatible AI.
About the Author
If you liked this post, you might also enjoy my attempt at finding a good wine using interpretable machine learning.
I also enjoy interacting with the readers and welcome any feedback. Connect with me on LinkedIn!
Amodei, Dario, et al. “Concrete Problems in AI Safety.” arXiv preprint arXiv:1606.06565 (2016).
 OpenAI. 2020. AI Safety Needs Social Scientists. Accessed 22 November 2020.