Innate values describe agents' intrinsic motivations, which reflect their inherent interests and preferences for pursuing goals and drive them to develop diverse skills that satisfy their various needs. Traditional reinforcement learning (RL) learns from interaction based on reward feedback from the environment. In real scenarios, however, rewards are generated by agents' innate value systems, which differ vastly across individuals according to their needs and requirements. In other words, if the AI agent is viewed as a self-organizing system, developing its awareness by balancing internal and external utilities according to its needs in different tasks is a crucial problem for individuals learning to support others and to integrate into a community with safety and harmony in the long term. To address this gap, we propose a new RL model, termed innate-values-driven RL (IVRL), which builds on combined motivation models and expected utility theory to mimic the agent's complex behaviors as it evolves through decision-making and learning.
Combined Motivations and Characteristics:
Psychological Motivation Models:
We assume that all the AI agents (such as robots) interact in the same working scenario, and that their external environment includes all the other group members and the mission setting. In contrast, the internal environment consists of the individual's perception components, including various sensors (such as Lidar and cameras); the critic module, which performs intrinsic motivation analysis and innate-values generation; the RL brain, which makes decisions based on the reward feedback and the description of the current state (both internal and external) provided by the critic module; and the actuators, comprising all the manipulators and operators that execute the RL brain's decisions as action sequences and strategies.
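A minimal Python sketch of how these four components could be wired together is given below; the class and method names are illustrative assumptions for this sketch, not the original implementation.

```python
import numpy as np

class Perception:
    """Aggregates raw sensor readings (e.g., Lidar, camera) into a state vector."""
    def observe(self, raw_sensors: dict) -> np.ndarray:
        # Illustrative: flatten and concatenate whatever the sensors provide.
        return np.concatenate([np.ravel(v) for v in raw_sensors.values()])

class Critic:
    """Analyzes intrinsic motivation and generates the innate-value reward."""
    def __init__(self, needs_weights: np.ndarray):
        self.needs_weights = needs_weights          # one weight per need

    def innate_reward(self, need_utilities: np.ndarray) -> float:
        # Expected-utility-style weighted combination of per-need utilities.
        return float(self.needs_weights @ need_utilities)

class RLBrain:
    """Selects actions from the critic's reward signal and the current state."""
    def __init__(self, n_actions: int):
        self.n_actions = n_actions

    def act(self, state: np.ndarray) -> int:
        # Placeholder policy; a real agent would use a learned policy or value network.
        return int(np.random.randint(self.n_actions))

class Actuators:
    """Executes the RL brain's decisions as action sequences."""
    def execute(self, action: int) -> None:
        print(f"executing action {action}")
```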
The Proposed IVRL Model Based on Expected Utility Theory:
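As a rough sketch of how expected utility theory could enter the reward, one plausible formulation (the notation below is assumed for this sketch, not necessarily the authors' exact definition) treats the innate-value reward as a needs-weighted combination of per-need utilities:

```latex
% Illustrative innate-value reward: a needs-weighted sum of per-need utilities
R_t = \sum_{i=1}^{N} w_i \, u_i(s_t, a_t),
\qquad \sum_{i=1}^{N} w_i = 1, \quad w_i \ge 0
```

where N is the number of needs, u_i(s_t, a_t) is the utility the agent obtains for need i by taking action a_t in state s_t, and w_i is the corresponding (task-dependent) needs weight.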
The architecture of the IVRL DQN and Actor-Critic models:
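As a rough illustration of the DQN variant, the PyTorch sketch below assumes the network predicts one utility value per (action, need) pair and combines them with softmax-normalized needs weights; the layer sizes and weighting scheme are assumptions for this sketch, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class IVDQN(nn.Module):
    """Illustrative IVRL-style DQN head: per-need utilities combined by needs weights."""
    def __init__(self, state_dim: int, n_actions: int, n_needs: int):
        super().__init__()
        self.n_actions, self.n_needs = n_actions, n_needs
        self.backbone = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        # One utility estimate per (action, need) pair.
        self.utility_head = nn.Linear(256, n_actions * n_needs)
        # Learnable (or task-specified) needs weights, normalized with softmax.
        self.needs_logits = nn.Parameter(torch.zeros(n_needs))

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.backbone(state)
        utilities = self.utility_head(h).view(-1, self.n_actions, self.n_needs)
        weights = torch.softmax(self.needs_logits, dim=0)   # weights sum to 1
        # Innate-value Q: expected utility over needs for each action.
        return (utilities * weights).sum(dim=-1)             # shape: (batch, n_actions)
```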
Considering that the VIZDoom testbed allows us to customize the experimental environment and define various utilities for different tasks, and that it runs cross-platform, we selected it to evaluate our IVRL model. We chose four scenarios: Defend the Center, Defend the Line, Deadly Corridor, and Arens, and compared our models with several benchmark algorithms, such as DQN, DDQN, A2C, and PPO. These models were trained on an NVIDIA GeForce RTX 3080 Ti GPU with 16 GiB of RAM.
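For reference, a minimal interaction loop with the ViZDoom Python API for one of these scenarios might look like the following; the config path and the random policy are placeholders for this sketch, whereas the actual experiments train the IVRL agents described above.

```python
import random
from vizdoom import DoomGame

# Load the "Defend the Center" scenario (adjust the config path to your installation).
game = DoomGame()
game.load_config("scenarios/defend_the_center.cfg")
game.set_window_visible(False)
game.init()

# One-hot actions over the scenario's available buttons (e.g., turn left, turn right, attack).
actions = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

for episode in range(5):
    game.new_episode()
    while not game.is_episode_finished():
        state = game.get_state()                   # screen buffer + game variables
        reward = game.make_action(random.choice(actions))
    print(f"episode {episode}: total reward {game.get_total_reward()}")

game.close()
```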
Four Testing Scenarios (Defend the Center, Defend the Line, Deadly Corridor, and Arens):
In our experiments, we found that selecting suitable utilities to constitute the agent's innate-values system is critically important for building its reward mechanism, which determines the training speed and sample efficiency. Moreover, differences in the selected utilities may cause irrelevant experiences to disrupt the learning process, and this perturbation leads to high oscillations of both the innate-value rewards and the needs weights.
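The following toy illustration (all utility values and weights are invented for demonstration) shows how adding an irrelevant utility channel to the weighted combination can inflate the variance of the innate-value reward, which is consistent with the oscillations observed above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Task-relevant utilities per step (e.g., health kept, ammo used, targets hit).
relevant = rng.normal(loc=[1.0, 0.5, 2.0], scale=0.1, size=(1000, 3))
# An irrelevant utility channel behaves like noise for this task.
irrelevant = rng.normal(loc=0.0, scale=2.0, size=(1000, 1))

w_relevant = np.array([0.3, 0.2, 0.5])          # needs weights over relevant utilities
w_with_noise = np.array([0.25, 0.15, 0.4, 0.2]) # weights once the noisy need is included

r_clean = relevant @ w_relevant
r_noisy = np.hstack([relevant, irrelevant]) @ w_with_noise

print("reward std without irrelevant utility:", r_clean.std())
print("reward std with irrelevant utility:   ", r_noisy.std())
```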
The Performance Comparison of IV-DQN and IV-A2C Agents with DQN, DDQN, PPO, and A2C in VIZDoom:
The innate value system serves as a unique reward mechanism that drives agents to develop diverse actions or strategies satisfying their various needs in the system. It also shapes agents' different personalities and characteristics through their interactions. From the environmental perspective, because tasks have different properties, agents need to adjust their innate value systems (needs weights) to adapt to different tasks' requirements. These experiences also shape their intrinsic values in the long term, much as humans build their value systems over their lives.
This research introduces a new RL model, termed innate-values-driven reinforcement learning (IVRL), from the perspective of individual intrinsic motivation. It is based on expected utility theory and models the complex behaviors of agents as they interact and evolve. By adjusting the needs weights in its innate-values system, the agent can adapt to different tasks, exhibiting the corresponding characteristics to maximize rewards efficiently. In other words, by interacting with other agents and environments, the agent builds its unique value system to adapt to them, much as "the individual is the product of their own environment."
Moreover, in the multi-agent setting, organizing agents with similar interests and innate values for a mission can optimize group utilities and reduce costs effectively, just as "birds of a feather flock together" in human society. Especially when combined with AI agents' capacity to aid decision-making, this will open up new horizons in human-multi-agent collaboration. This potential is essential in interactions between human agents and intelligent agents when establishing stable and reliable cooperative relationships, particularly in adversarial and rescue mission environments.
This work is supported by the NSF Foundational Research in Robotics (FRR) Award 2348013.