Emergencies such as natural disasters and terrorist attacks, characterized by their sudden onset, have always been part of the world's reality. In these situations, collective human behavior such as crowd stampedes may be triggered, and stampedes sometimes lead to fatalities as people are crushed or trampled. In this paper, a dynamic model for the evacuation of pedestrians during emergencies is proposed that takes into account information transmission within the crowd. The model, based on complex adaptive systems theory, assumes that information, including psychological and behavioral cues, influences the interactions of pedestrians. We investigate two factors: (1) the systemic conditions of information transmission and (2) the intensity of beneficial information affecting the interactions. The parameters of these factors are analyzed through proven theorems to capture the dynamic features of the pedestrian crowd, and critical values for effective evacuation are obtained. Finally, a numerical evacuation example is designed to validate the practicality of the model and the mathematical analysis.
A common phenomenon in everyday life is that, when a strange event occurs or is announced, an ordinary crowd can change completely, showing intense emotions and sometimes uncontrollable and violent emergent behavior. These emotions and behaviors that disturb the organization of a crowd are the concern of our study; we attempt to predict such critical situations and to support the right decisions at the right time. Most existing models of crowd disasters follow physical or cognitive approaches: they study pedestrian flow, collision avoidance, and the like, using quantities such as walking speed and angle of vision. In this work, by contrast, we follow a behavioral-rules approach and model emergent emotion, behavior and influence in a crowd, taking into account in particular the personalities of its members. For this purpose, we combine the OCEAN (Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism) personality model with the OCC (Ortony, Clore, and Collins) emotional model to indicate the susceptibility of each of the five personality factors to each emotion. We then propose an approach that first uses fuzzy logic to quantify the critical emotions of crowd members at the announcement or occurrence of unusual events. Next, we model behavior and action tendencies using probability theory. Finally, influence among the members of the crowd is modeled using the neighborhood principle and cellular automata.
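To make the fuzzy quantification step concrete, here is a minimal sketch: the rule, the triangular membership functions, and the 0-1 scales are illustrative assumptions, not the paper's actual fuzzy system.

```python
def tri(x, a, b, c):
    """Triangular fuzzy membership function with support [a, c] and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fear_intensity(neuroticism, event_severity):
    """Hypothetical rule: fear is HIGH when neuroticism is HIGH AND the
    event is SEVERE (min as fuzzy AND; inputs and output in [0, 1])."""
    high_neuroticism = tri(neuroticism, 0.4, 1.0, 1.6)
    severe_event = tri(event_severity, 0.4, 1.0, 1.6)
    return min(high_neuroticism, severe_event)

# e.g. a highly neurotic person facing a severe event:
# fear_intensity(0.9, 0.8) -> ~0.67
```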
In machine learning, learning a task is expensive (many training samples are needed), so it is of general interest to be able to reuse knowledge across tasks. This is the case in aerial robotics applications, where an autonomous aerial robot cannot interact with the environment hazard-free. Prototype generation is a well-known technique commonly used in supervised learning to reduce the number of samples needed to learn a task; however, little is known about how such techniques can be used in reinforcement learning. In this work we propose an algorithm that, in order to learn a new (target) task, first generates new samples, called prototypes, based on samples acquired previously in a known (source) task. The approach uses Gaussian processes to learn a continuous multidimensional transition function, making the method capable of reasoning directly in continuous state and action domains. We base prototype generation on a careful selection of a subset of samples from the source task (using known filtering techniques) and on transforming these samples using the (little) knowledge acquired in the target task. Experimental evidence gathered in standard reinforcement learning benchmark tasks, as well as in a challenging quadcopter-to-helicopter transfer task, suggests that prototype generation is feasible and, furthermore, that the choice of filtering technique matters less than a correct transformation model.
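A minimal sketch of the idea, assuming scikit-learn's Gaussian process regressor; the names (fit_transition_model, generate_prototypes, target_map) and the specific filtering rule are illustrative, and the paper's actual filtering and transformation steps may differ.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def fit_transition_model(states, actions, next_states):
    """Learn a continuous transition function (s, a) -> s' with a GP."""
    X = np.hstack([states, actions])
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(X, next_states)
    return gp

def generate_prototypes(source_samples, target_map, keep=0.5):
    """Filter a subset of source-task samples, then transform them with the
    (little) knowledge acquired in the target task (target_map is assumed)."""
    s, a, s_next = source_samples
    # Filtering step (placeholder): keep the samples with the smallest
    # transition magnitude; the paper relies on known filtering techniques.
    scores = np.linalg.norm(s_next - s, axis=1)
    idx = np.argsort(scores)[: int(keep * len(s))]
    # Transformation step: map source transitions into the target task's space.
    return target_map(s[idx]), a[idx], target_map(s_next[idx])
```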
In this paper, the role of adaptive group cohesion in a cooperative multi-agent source localization problem is investigated. A distributed source localization algorithm is presented for a homogeneous team of simple agents, each of which uses a single sensor to sense the gradient and two sensors to sense its neighbors. The algorithm is a set of individualistic and social behaviors, where the individualistic behavior is as simple as an agent keeping its previous heading and is not by itself sufficient to localize the source. Source localization is instead achieved as an emergent property of the agents' adaptive interactions with their neighbors and the environment. Since a single agent is incapable of localizing the source, maintaining team connectivity at all times is crucial. Two simple temporal sampling behaviors, intensity-based adaptation and connectivity-based adaptation, ensure an efficient localization strategy with minimal agent breakaways. The agent behaviors are simultaneously optimized using a two-phase evolutionary optimization process. The optimized behaviors are approximated with analytical models, and the resulting collective behavior is validated for robustness to sensor and actuator noise, strong multi-path interference due to environment variability, sensitivity to initialization distance, and loss of the source signal.
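As a toy illustration of how a heading can emerge from blending an individualistic behavior with a social one, consider the sketch below; the cohesion rule and the weight w_social are assumptions for illustration, not the paper's evolved behaviors.

```python
import numpy as np

def agent_heading(prev_heading, neighbor_bearings, w_social=0.5):
    """Blend the individualistic behavior (keep the previous heading) with a
    social cohesion term (head toward the circular mean of neighbor bearings)."""
    if len(neighbor_bearings) == 0:
        return prev_heading                      # no neighbors sensed: keep going
    cohesion = np.arctan2(np.mean(np.sin(neighbor_bearings)),
                          np.mean(np.cos(neighbor_bearings)))
    # Combine the two headings as unit vectors to avoid angle-wrap artifacts.
    vx = (1 - w_social) * np.cos(prev_heading) + w_social * np.cos(cohesion)
    vy = (1 - w_social) * np.sin(prev_heading) + w_social * np.sin(cohesion)
    return np.arctan2(vy, vx)
```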
Consensus formation is investigated for multi-agent systems in which agents' beliefs are both vague and uncertain. Vagueness is represented by a third truth state meaning borderline, which is combined with a probabilistic model of uncertainty. A belief combination operator is then proposed that exploits borderline truth values to enable agents with conflicting beliefs to reach a compromise. A number of simulation experiments are carried out in which agents apply this operator in pairwise interactions, under the bounded-confidence restriction that the two agents' beliefs must be sufficiently consistent with each other before agreement can be reached. As well as studying the consensus operator in isolation, we also investigate scenarios in which agents are influenced either directly or indirectly by the state of the world. For the former, we conduct simulations that combine consensus formation with belief updating based on evidence. For the latter, we investigate the effect of assuming that the closer an agent's beliefs are to the truth, the more visible they are in the consensus building process. In all cases, applying the consensus operator results in the population converging to a single shared belief that is both crisp and certain. Furthermore, simulations that combine consensus formation with evidential updating converge more quickly to a shared opinion, which is closer to the actual state of the world than in simulations where beliefs change only as a result of directly receiving new evidence. Finally, if agent interactions are guided by belief quality, measured as similarity to the true state of the world, then applying the consensus operator alone results in the population converging to a high-quality shared belief.
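One plausible instantiation of such an operator on truth values {0 = false, 0.5 = borderline, 1 = true}, in which the borderline state mediates outright conflict; both the combination rule and the inconsistency measure below are assumptions consistent with the description, not necessarily the paper's exact definitions.

```python
def combine(t1, t2):
    """Three-valued combination: agreement is kept, borderline yields to a
    definite value, and outright conflict (0 vs 1) compromises at borderline."""
    if t1 == t2:
        return t1
    if 0.5 in (t1, t2):
        return t1 + t2 - 0.5      # the definite value wins
    return 0.5                     # 0 vs 1 -> borderline compromise

def consensus(belief1, belief2, threshold=0.5):
    """Pairwise consensus under bounded confidence: merge only if the two
    belief vectors are sufficiently consistent (measure assumed)."""
    conflict = sum(abs(a - b) == 1 for a, b in zip(belief1, belief2)) / len(belief1)
    if conflict > threshold:
        return belief1, belief2    # too inconsistent: no agreement reached
    merged = [combine(a, b) for a, b in zip(belief1, belief2)]
    return merged, merged
```

Note how combine(0, 1) = 0.5: a borderline value is created precisely where the agents disagree, which is what allows later interactions to pull the pair toward a crisp shared belief.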
To ensure cooperation in the Prisoner's Dilemma, individuals may require prior commitments from others, subject to compensations when agreements to cooperate are violated. Alternatively, individuals may prefer to behave reactively, without arranging prior commitments, by simply punishing those who misbehave. Both mechanisms have been shown to promote the emergence of cooperation, and they are complementary: each has specific limitations that the other can overcome. On the one hand, costly punishment requires an excessive effect-to-cost ratio to be successful, and this ratio can be significantly reduced by arranging a prior commitment with a more limited compensation. On the other hand, commitment-proposing strategies can be suppressed by free-riding strategies that commit only when someone else pays the cost of arranging the deal, and these free riders in turn can be dealt with more effectively by reactive punishers. Using methods from evolutionary game theory, we present an analytical model showing that there is a wide range of settings for which the combined strategy outperforms either strategy by itself, leading to significantly higher levels of cooperation. Interestingly, the improvement is most significant when the cost of arranging commitments is sufficiently high and the penalty reaches a certain threshold, thereby overcoming the weaknesses of both mechanisms.
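Analyses of this kind rest on standard evolutionary game theory machinery such as the replicator dynamics, sketched below; the 3-strategy payoff matrix here is a placeholder for illustration, not the paper's derived payoffs.

```python
import numpy as np

def replicator_step(x, A, dt=0.01):
    """One Euler step of the replicator dynamics
    x_i' = x_i * ((A x)_i - x . A x), the standard EGT update."""
    f = A @ x                       # fitness of each strategy
    return x + dt * x * (f - x @ f)

# Placeholder payoff matrix (values illustrative):
# rows/cols = [commitment proposer, reactive punisher, defector].
A = np.array([[3.0, 3.0, 0.5],
              [3.0, 3.0, 1.0],
              [4.0, 0.5, 1.0]])

x = np.ones(3) / 3                  # start from a uniform strategy mix
for _ in range(10_000):
    x = replicator_step(x, A)       # x converges toward a stable mix
```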
This paper presents a framework for integrating intrinsic motivation with particle swarm optimisation. Intrinsically motivated particle swarm optimisation can be used for adaptive task allocation when the nature of the target task is not well understood in advance or can change over time. We first present a general framework in which a computational model of motivation generates a dynamic fitness function to focus the attention of the particle swarm. We then discuss two approaches to modelling motivation in this framework: a computational model of curiosity using an unsupervised neural network, and a model of novelty based on background subtraction. We introduce metrics for evaluating intrinsically motivated particle swarm optimisation and test our algorithm as an approach to task allocation in a workplace hazard mitigation scenario. We found that both proposed motivation techniques work well for generating a fitness function that can locate hazards without requiring a precise definition of a hazard. We also found that particle swarm optimisation can converge on optima in the generated fitness landscape in some, but not all, of our simulations.
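A minimal sketch of particle swarm optimisation driven by a time-varying, motivation-generated fitness function; the parameter values are conventional defaults, and in a strongly dynamic setting personal bests would also need periodic re-evaluation.

```python
import numpy as np

def motivated_pso(motivated_fitness, dim=2, n=20, iters=200,
                  w=0.7, c1=1.4, c2=1.4):
    """Standard PSO, except the fitness may change each iteration
    (motivated_fitness(x, t)), so the swarm re-focuses its attention."""
    x = np.random.uniform(-1, 1, (n, dim))
    v = np.zeros((n, dim))
    pbest = x.copy()
    pbest_f = np.array([motivated_fitness(p, 0) for p in x])
    for t in range(iters):
        f = np.array([motivated_fitness(p, t) for p in x])
        improved = f > pbest_f                   # maximisation
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        g = pbest[np.argmax(pbest_f)]            # global best
        r1, r2 = np.random.rand(n, dim), np.random.rand(n, dim)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = x + v
    return pbest[np.argmax(pbest_f)]
```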
Visual object recognition plays a crucial role in animals that rely on visual information. In this study, we address the prey-predator recognition problem by optimizing artificial convolutional neural networks, based on neuroethological studies of toads. After supervised optimization of the overall network, the network performed reasonably well even in the presence of various types of image noise. We then modulated the optimized network, based on the computational theory of classical conditioning and a reinforcement learning algorithm, to adapt it to environmental changes. This adaptation was implemented with separate modules realizing the "innate" and "acquired" terms of the output. The modulated network exhibited behaviors similar to those of real toads. The neural basis of amphibian visual information processing and the behavioral modulation mechanism have been studied extensively by biologists, and recent advances in parallel distributed processing technologies may enable us to develop fully autonomous, adaptive artificial agents with high-dimensional input spaces through end-to-end training.
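A speculative PyTorch sketch of the two-module output structure; the layer sizes and the assumed 28x28 grayscale input are illustrative, not the paper's actual architecture.

```python
import torch.nn as nn

class PreyPredatorNet(nn.Module):
    """Shared convolutional features feed an 'innate' head (frozen after
    supervised training) and an 'acquired' head (updated by conditioning
    / reinforcement); the two terms are summed into one response."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(          # assumes 1x28x28 input
            nn.Conv2d(1, 8, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten())
        self.innate = nn.Linear(16 * 4 * 4, 1)    # fixed after optimization
        self.acquired = nn.Linear(16 * 4 * 4, 1)  # modulated by experience

    def forward(self, x):
        h = self.features(x)
        return self.innate(h) + self.acquired(h)
```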
At present, due in part to our insufficient understanding of the traumatic experience, we are unable to account for the fact that while some people develop post-traumatic symptoms following a traumatic event, others do not. This article suggests that by adopting the enactive approach to perception, according to which perceiving is a way of acting, we may be able to improve our understanding of the traumatic experience and of the factors that result in the development of post-traumatic symptoms. The central argument presented in this paper is that when the options of fight or flight are unavailable as coping/defense mechanisms, one freezes (the freeze response). In this situation, the ability to master one's movements is damaged and, in radical cases, the ability to move is lost altogether; as a result, the sensorimotor loop may collapse. This, in turn, leads to distorted perception and, in consequence, memory disorders may develop.
The multiple pursuers and evaders game may be represented as a Markov game. Under this modeling, each player can be interpreted as a decentralized unit that has to work independently in order to complete a task. This is a distributed multiagent decision problem for which several solutions have already been proposed; however, most require some form of central coordination. In this paper, we instead model each player as a learning automaton and let the players evolve and adapt in order to solve the difficult problem at hand. We also show that, under the proposed learning process, the players' policies converge to an equilibrium point. Simulations of scenarios with multiple pursuers and evaders are presented to show the feasibility of the approach.
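A classic learning automaton used in such game settings is the linear reward-inaction (L_R-I) scheme, sketched below; whether the paper uses exactly this update is an assumption, but L_R-I is the standard scheme known to converge to equilibrium points in many games.

```python
import random

class LearningAutomaton:
    """Linear reward-inaction (L_R-I): action probabilities move toward the
    chosen action only when the environment returns a reward; on a penalty
    the probabilities are left unchanged."""
    def __init__(self, n_actions, lr=0.05):
        self.p = [1.0 / n_actions] * n_actions
        self.lr = lr

    def choose(self):
        return random.choices(range(len(self.p)), weights=self.p)[0]

    def update(self, action, rewarded):
        if rewarded:
            # p_a += lr*(1 - p_a) for the chosen action; p_j *= (1 - lr) otherwise.
            self.p = [pi + self.lr * ((1.0 if i == action else 0.0) - pi)
                      for i, pi in enumerate(self.p)]
```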
In swarm robotics, aggregation refers to the gathering of spatially distributed robots into a single aggregate. Aggregation can be classified as cue-based or self-organized: in cue-based aggregation, a cue in the environment points to the aggregation area, whereas in self-organized aggregation no cue is present. In this paper, we propose a novel fuzzy-based method for cue-based aggregation built on the state-of-the-art BEECLUST algorithm. In particular, we propose three different methods: naïve, which uses a deterministic decision-making mechanism; vector-averaging, which uses a vectorial summation of all perceived inputs; and fuzzy, which uses a fuzzy logic controller. To compare the methods, we used different experimental settings: one-source and two-source environments under both static and dynamic conditions. We observed that the fuzzy method outperformed all the others and was the most robust to noise.
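For reference, the core of BEECLUST is a cue-dependent waiting time after robot-robot encounters: the stronger the local cue, the longer a robot waits before turning away and moving on. The sketch below uses the commonly cited saturating form with illustrative parameters; the fuzzy variant shown is purely hypothetical, not the paper's controller.

```python
def beeclust_wait(s, w_max=60.0, theta=4000.0):
    """BEECLUST-style waiting time (seconds) as a saturating function of the
    locally sensed cue intensity s; w_max and theta are illustrative."""
    return w_max * s**2 / (s**2 + theta)

def fuzzy_wait(s, noise_margin=0.1, w_max=60.0, theta=4000.0):
    """Hypothetical fuzzy-flavored variant: average the waiting times of
    neighbouring intensity readings to soften the effect of sensor noise."""
    low, high = s * (1 - noise_margin), s * (1 + noise_margin)
    return 0.5 * (beeclust_wait(low, w_max, theta)
                  + beeclust_wait(high, w_max, theta))
```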
Eastern gray squirrels produce alarm calls: vocalizations used in the presence of danger that influence the behavior of some receivers. This influence is possible because the rate, duration, and structure of the alarm calls carry information about the threat and the caller. Gray squirrels' mix of different structural call types (kuks and quaas) could also carry information about internal states of the caller. Hidden Markov models (HMMs) are ideal tools for investigating whether hidden states explain the frequencies of kuks versus quaas throughout an alarm call sequence. In this study, we compare the ability of an iid (independent and identically distributed) model and of two- to six-state HMMs to represent observed sequences of kukking, quaaing, and periods of silence. Audio recordings of 44 gray squirrels were collected, and the first 30 s of each alarm call sequence was coded from spectrograms. A number of HMMs were fitted, and the fit of each model to the observed sequences was assessed using Akaike's information criterion (AIC), the Bayesian information criterion (BIC), and Monte Carlo methods. The five-state HMM fit the observed call frequencies better than the other models, suggesting that the squirrels' alarm calling is shaped by a more complex temporal sequencing of acoustic units.
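A sketch of this model-selection procedure using the hmmlearn library; the CategoricalHMM API (recent hmmlearn versions) and the symbol coding 0 = kuk, 1 = quaa, 2 = silence are assumptions for illustration.

```python
import numpy as np
from hmmlearn import hmm

def fit_and_score(sequences, n_states, n_symbols=3):
    """Fit an n_states HMM over coded call sequences and return (AIC, BIC)."""
    X = np.concatenate(sequences).reshape(-1, 1)   # stacked integer symbols
    lengths = [len(s) for s in sequences]          # one length per squirrel
    model = hmm.CategoricalHMM(n_components=n_states, n_iter=200,
                               random_state=0)
    model.fit(X, lengths)
    ll = model.score(X, lengths)                   # total log-likelihood
    # Free parameters: transitions + initial distribution + emissions.
    k = (n_states * (n_states - 1) + (n_states - 1)
         + n_states * (n_symbols - 1))
    aic = 2 * k - 2 * ll
    bic = k * np.log(len(X)) - 2 * ll
    return aic, bic

# Compare two- to six-state models; lower AIC/BIC indicates a better fit:
# scores = {n: fit_and_score(coded_sequences, n) for n in range(2, 7)}
```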
The learning walks of ants offer an excellent opportunity to study the interaction between brain, body and environment from which adaptive behaviour emerges. Learning walks are a behaviour with the specific function of storing visual information around a goal in order to simplify the computational problem of visual homing, that is, navigating back to the goal. However, it is not known at present why learning walks take the stereotypical shapes they do. Here we investigate how learning-walk form, the visual surroundings, and the interaction between the two affect homing performance in a range of virtual worlds when a simple view-based homing algorithm is used. We show that the ideal form for a learning walk is environment-specific. We also demonstrate that the distant panorama and small objects at an intermediate distance, particularly when the panorama is obscured, are important aspects of the visual environment, both when determining the ideal learning walk and when using stored views to navigate. Implications are discussed in the context of behavioural research into the learning walks of ants.
One of the main assertions of sensorimotor contingency theory is that sensory experience is not generated by activating an internal representation of the outside world through sensory signals; rather, it corresponds to a mode of exploration and is hence an active process. Perception and sensory awareness emerge from using the structure of changes in the sensory input resulting from these exploratory actions, called sensorimotor contingencies (SMCs), for planning, reasoning, and goal achievement. Using a previously developed computational model of SMCs, we show how an artificial agent can plan ahead with SMCs and use them for action guidance. Our main assumption is that SMCs are associated with a utility for the agent, and that the agent selects actions that maximize this utility. We analyze the properties of the resulting actions in a robot that is endowed with several sensory modalities and controlled by our model in a simple environment. The results demonstrate that its actions avoid aversive events, and that it can achieve a low-level form of spatial awareness that is resilient to the complete loss of a sensory modality.
Reinforcement learning (RL) in the context of artificial agents is typically used to produce behavioral responses as a function of the reward obtained through interaction with the environment. When the problem consists of learning the shortest path to a goal, it is common to use reward functions yielding a fixed value after each decision, for example a positive value when the target location is reached and a negative value at each intermediate step. However, this fixed strategy may be too simplistic to let agents adapt to dynamic environments in which resources vary over time. By contrast, there is significant evidence that most living beings internally modulate reward value as a function of their context to expand their range of adaptivity. Inspired by the potential of this operation, we present a review of its underlying processes and introduce a simplified formalization for artificial agents. The performance of this formalism is tested by monitoring the adaptation of an agent, endowed with a motivated actor-critic model embedding our formalization of value and constrained by physiological stability, to environments with different resource distributions. Our main result shows that the manner in which reward is internally processed as a function of the agent's motivational state strongly influences the adaptivity of the generated behavioral cycles and the agent's physiological stability.
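A minimal sketch of the general idea, assuming a drive-based modulation and a tabular actor-critic; the function names and the specific modulation rule are illustrative, not the paper's formalization.

```python
import numpy as np

def modulated_reward(raw_reward, drives):
    """Hypothetical internal reward: scale the external outcome by the
    agent's total physiological deficit (e.g. hunger + thirst), so the
    same resource is worth more when the relevant need is urgent."""
    urgency = np.sum(drives)
    return raw_reward * (1.0 + urgency)

def actor_critic_step(V, theta, s, a, s_next, raw_r, drives,
                      alpha=0.1, gamma=0.95):
    """One tabular actor-critic update driven by the modulated reward."""
    r = modulated_reward(raw_r, drives)
    delta = r + gamma * V[s_next] - V[s]   # TD error on the internal reward
    V[s] += alpha * delta                  # critic: value update
    theta[s, a] += alpha * delta           # actor: action-preference update
    return delta
```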
In this paper we investigate whether selective attention enables the development of action selection (i.e. the ability to choose among conflicting actions afforded by the current agent/environment context). Carrying out a series of experiments in which neuro-robots were evolved for the ability to forage so as to maximize the energy extracted from ingested substances, we observed that effective action and action-selection capacities can develop even in the absence of internal mechanisms specialized for action selection. However, comparing the results obtained in experimental conditions in which the robots were or were not provided with internal modulatory connections demonstrates that selective attention enables the development of a more effective action-selection capacity and of more effective and integrated action capacities.
Efficient skill acquisition is crucial for creating versatile robots. One intuitive way to teach a robot new tricks is to demonstrate a task and enable the robot to imitate the demonstrated behavior. This approach is known as imitation learning. Classical methods of imitation learning, such as inverse reinforcement learning or behavioral cloning, suffer substantially from the correspondence problem when the teacher's actions (i.e. motor commands, torques or forces) are not observed or when the teacher's body differs considerably, e.g., in its actuation. To address these drawbacks we propose to learn a robot-specific controller that directly matches robot trajectories with observed ones. We present a novel and robust model-based approach that solves this probabilistic trajectory matching problem via policy search. For this purpose, we learn a probabilistic model of the system, which we exploit for mental rehearsal of the current controller by predicting future trajectories; these internal simulations allow a controller to be learned without permanently interacting with the real system, which reduces the overall interaction time. Using long-term predictions from the learned model, we train robot-specific controllers that reproduce the expert's distribution of demonstrations without needing to observe motor commands during the demonstration. The strength of our approach is that it addresses the correspondence problem in a principled way. Our method achieves a higher learning speed than both model-based imitation learning based on dynamic movement primitives and trial-and-error learning systems with hand-crafted cost functions. We successfully applied our approach to imitating human behavior using a tendon-driven compliant robotic arm. Moreover, we demonstrate the generalization ability of our approach in a multi-task learning setup.
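A trajectory-matching objective of this kind can be written as a divergence between the demonstrated and the predicted trajectory distributions; assuming both are approximated as Gaussians (an assumption on our part, not a detail confirmed by the abstract), a sketch of such a policy-search cost is:

```python
import numpy as np

def kl_gaussians(mu0, S0, mu1, S1):
    """KL(N0 || N1) between two Gaussian trajectory distributions, usable as
    a policy-search cost: minimizing it pulls the controller's predicted
    trajectories (N1) toward the expert demonstrations (N0)."""
    d = len(mu0)
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(S1_inv @ S0)
                  + diff @ S1_inv @ diff
                  - d
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))
```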
Game theory is commonly used to study social behavior in cooperative or competitive situations. One socioeconomic game, Stag Hunt, captures the trade-off between social and individual benefit by offering the option to hunt a low-payoff hare alone or a high-payoff stag cooperatively; its payoff matrix, which favors cooperation, encourages the creation of social contracts. When subjects play Stag Hunt against fixed-strategy computer agents, the social component is degraded because subjects cannot dynamically affect the outcomes of iterated games, as they could when playing against another subject. Playing with an adapting agent, however, has the potential to evoke unique and complex reactions, because such an agent can change its own strategy based on its experience over time, both within and between games. In the present study, 40 subjects played the iterated Stag Hunt against five agents differing in strategy: exclusive hare hunting, exclusive stag hunting, random, Win-Stay-Lose-Shift, and adapting. The results indicated that the adapting agent caused subjects to spend more time and effort in each game, exhibiting a more complicated path to their destination. This suggests that adapting agents behave more like human opponents, evoking more natural social responses in subjects.
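For concreteness, here is a sketch of an iterated Stag Hunt with a Win-Stay-Lose-Shift agent; the payoff values and the aspiration level are assumptions for illustration, not the study's exact parameters.

```python
# Illustrative payoffs (assumed): hare pays a safe 2 regardless of the
# partner; stag pays 4, but only if both players hunt it together.
def stag_hunt(a1, a2):
    def pay(me, other):
        if me == 'hare':
            return 2
        return 4 if other == 'stag' else 0
    return pay(a1, a2), pay(a2, a1)

class WinStayLoseShift:
    """Repeat the last action if its payoff met the aspiration level,
    otherwise switch (an aspiration of 3 is an assumed setting)."""
    def __init__(self, aspiration=3):
        self.action, self.aspiration = 'stag', aspiration

    def observe(self, my_payoff):
        if my_payoff < self.aspiration:   # a "loss": shift to the other action
            self.action = 'hare' if self.action == 'stag' else 'stag'

# One round against a stag hunter:
# agent = WinStayLoseShift(); p, _ = stag_hunt(agent.action, 'stag'); agent.observe(p)
```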
Cockroaches' shelter-seeking strategy may initially look like an undirected random search, but we show that cockroaches are attracted to darkened shelters: they arrive at a shelter in about half the time control cockroaches take to reach the same location with no shelter present. We identified six statistically significant trends in the behavior of 134 cockroaches during 1-min naïve walking trials with four different shelter configurations. By combining these trends into a model, we built a stochastic algorithm that significantly biases a simulated agent toward a target location. We call this model RAMBLER (Randomized Algorithm Mimicking Biased Lone Exploration in Roaches). RAMBLER could be adapted for a mobile robot equipped with an onboard camera and antenna-like contact sensors.
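A hypothetical sketch of a RAMBLER-like step, reduced to a random walk with a weak pull toward the shelter; the six empirically derived behavioral trends of the actual model are not reproduced here, and all parameter values are assumptions.

```python
import math
import random

def rambler_like_step(x, y, heading, target, turn_sd=0.6, bias=0.3, speed=1.0):
    """Mostly random turning (exploration) plus a small attraction toward
    the (dark) shelter at `target`; returns the updated pose."""
    to_target = math.atan2(target[1] - y, target[0] - x)
    heading += random.gauss(0.0, turn_sd)             # random exploration
    # Weak pull toward the shelter, with the angle difference wrapped to (-pi, pi].
    heading += bias * math.atan2(math.sin(to_target - heading),
                                 math.cos(to_target - heading))
    return x + speed * math.cos(heading), y + speed * math.sin(heading), heading
```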
To follow a goal-directed behavior, an autonomous agent must be able to acquire knowledge about the causality between its motor actions and the corresponding sensory feedback. Since the complexity of such sensorimotor relationships directly determines the required cognitive resources, this work argues that it is important to keep an agent's sensorimotor relationships simple. This implies that the agent should be designed such that sensory consequences can be described and predicted in a simplified manner. Living organisms implement this paradigm by adapting their sensory and motor systems specifically to their behavior and environment; as a result, they can predict sensorimotor consequences with a strongly limited amount of (expensive) nervous tissue. In this context, the present work proposes that advantageous artificial sensory and motor layouts can be evolved by rewarding the ability to predict self-induced stimuli through simple sensorimotor relationships. The experiments consider a simulated agent recording realistic visual stimuli from natural images. The results demonstrate the ability of the proposed method to (i) synthesize visual sensorimotor structures adapted to an agent's environment and behavior, and (ii) serve as a computational model for testing hypotheses about the development of biological visual sensorimotor systems.
Using a simulated driving station, 36 participants applied the brake as quickly as possible following the activation of a red light under each of six conditions: (1) the control (braking only), (2) 72 dBA music stimulus, (3) 86 dBA music stimulus, (4) cell phone conversation, (5) cell phone conversation and 72 dBA music, and (6) cell phone conversation and 86 dBA music. Participants were distracted by the cell phone conversation, as demonstrated by slower response time and reaction time (RT). The addition of the music stimulus, even at 86 dBA, did not exacerbate the deficits. Braking movement time was faster, and peak braking force greater, when the cell phone conversation was present than when it was absent. Participants appear to have anticipated impaired RT and adapted unconsciously by executing a more rapid movement to the brake pedal. Also, participants appear to have compensated for slower RT by applying greater braking force. The adaptive behavior observed in the experiment is discussed in the context of unconscious goal pursuit and neuromotor noise theory.
Humans can effortlessly perceive an object they encounter for the first time in a possibly cluttered scene and memorize its appearance for later recognition. Such performance is still difficult to achieve with artificial vision systems because it is not clear how to define the concept of objectness in its full generality. In this paper we propose a paradigm that integrates the robot’s manipulation and sensing capabilities to detect a new, previously unknown object and learn its visual appearance. By making use of the robot’s manipulation capabilities and force sensing, we introduce additional information that can be utilized to reliably separate unknown objects from the background. Once an object has been identified, the robot can continuously manipulate it to accumulate more information about it and learn its complete visual appearance. We demonstrate the feasibility of the proposed approach by applying it to the problem of autonomous learning of visual representations for viewpoint-independent object recognition on a humanoid robot.