ON CALCULATING THE VALUE OF A DIFFERENTIAL GAME IN THE CLASS OF COUNTER STRATEGIES

For a linear dynamic system with control and disturbance, a feedback control problem is considered, in which the Euclidean norm of a set of deviations of the system’s motion from given targets at given times is optimized. The problem is formalized into a differential game in “strategy-counterstrategy” classes. A game value computing procedure, which reduces the problem to a recursive construction of upper convex hulls of auxiliary functions, is justified. Results of numerical simulations are presented.


Introduction
In this paper, a linear dynamical system subjected to the actions of control and disturbance is considered. A feedback control problem with optimization of a quality index is posed. The quality index is given in the form of the Euclidean norm of a set of deviations of the system's motion from given targets at given instants of time. The "saddle point condition in a small game" [1, p. 79] (see also [5, p. 46]), also known as the Isaacs condition [2] (see condition (2.7) below), is not assumed. Within the game-theoretic approach [1-10], the problem is formalized as a positional differential game in "strategy-counter strategy" classes (see, e.g., [1, p. 78], [5, p. 20]).
Based on the methods of [4,5], a procedure that reduces the considered problem under condition (2.7) to the recurrent construction of upper convex hulls of auxiliary functions was given in [9,10]. In the present paper, the applicability of that procedure is proved for the case when condition (2.7) is not imposed. To achieve this, we follow the idea of unification of differential games [3] and use constructions of characteristic inclusions from the theory of minimax solutions of Hamilton-Jacobi equations [6] (see also [7,8]).
Results of numerical simulations are presented.

Problem Statement
Consider a dynamical system described by the following equation:

$$\frac{dx}{dt} = A(t)x + f(t, u, v), \quad t \in [t_0, \vartheta], \quad x \in \mathbb{R}^n, \quad u \in P, \quad v \in Q. \qquad (1.1)$$

Here $t$ is time, $x$ is a phase vector, $u$ is a control vector, $v$ is a disturbance vector; $t_0$ and $\vartheta$ are fixed instants of time ($t_0 < \vartheta$); $P$ and $Q$ are given compact sets; the matrix function $A(t)$ is continuous on $[t_0, \vartheta]$, and the vector function $f(t, u, v)$ is continuous on $[t_0, \vartheta] \times P \times Q$. A current position of system (1.1) is a pair $(t, x) \in [t_0, \vartheta] \times \mathbb{R}^n$. Denote

Here and below, the symbol $\|\cdot\|$ denotes the Euclidean vector norm. Define a set $K$ of possible positions:

where $R_0 > 0$ is some fixed number. Let a position $(t_*, x_*) \in K$, $t_* < \vartheta$, and an instant $t^* \in (t_*, \vartheta]$ be given. We assume that admissible control and disturbance realizations are Borel measurable functions $u[t_*[\cdot]t^*) = \{u(t) \in P,\ t_* \le t < t^*\}$ and $v[t_*[\cdot]t^*) = \{v(t) \in Q,\ t_* \le t < t^*\}$. From the position $(t_*, x_*)$, such realizations uniquely generate the motion of system (1.1) as an absolutely continuous vector function $x[t_*[\cdot]t^*] = \{x(t) \in \mathbb{R}^n,\ t_* \le t \le t^*\}$.

The aim of the control is to make the quality index $\gamma$ (1.4) as small as possible. In solving this problem, it is convenient to also consider the problem of forming the disturbance actions that are most unfavorable from the control's point of view, that is, aimed at maximizing $\gamma$.
According to [1, p. 75; 5, p. 51], these two problems may be combined into an antagonistic positional differential game of two players in "strategy-counter strategy" classes. A control action $u$ is interpreted as an action of the first player, and a disturbance action $v$ as an action of the second player. An admissible strategy $u(\cdot)$ of the first player is an arbitrary function $u(\cdot) = \{u(t, x, \varepsilon) \in P,\ (t, x) \in K,\ \varepsilon > 0\}$. An admissible counter strategy of the second player is an arbitrary function $v(\cdot) = \{v(t, x, u, \varepsilon) \in Q,\ (t, x) \in K,\ u \in P,\ \varepsilon > 0\}$ which, for fixed $(t, x) \in K$ and $\varepsilon > 0$, is Borel measurable with respect to $u \in P$. Here $\varepsilon > 0$ is the accuracy parameter (see, e.g., [1, p. 68], [5, p. 47]).

Procedure for Calculating the Game Value
In accordance with [10], consider the following procedure for calculating the value of differential game (1.1), (1.4). Let $t_* \in [t_0, \vartheta)$. Choose a partition of the time segment $[t_*, \vartheta]$:

In further considerations of partitions like (2.1), we will assume that they contain the instants $t^{[i]}$ from (1.4) that belong to $[t_*, \vartheta]$.

Here and below, the symbol $\langle \cdot, \cdot \rangle$ denotes the inner product of vectors.
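As an illustration only, the following Python sketch builds a partition of a time segment that contains prescribed evaluation instants. The function names `build_partition` and `diameter`, the uniform base grid, and the sample instants are assumptions made for this sketch; they are not taken from [10].

```python
import math


def build_partition(t_start, theta, step, eval_instants):
    """Sorted nodes of a partition of [t_start, theta].

    The nodes form a uniform grid whose spacing does not exceed `step`,
    augmented with every evaluation instant lying in [t_start, theta].
    """
    if theta <= t_start:
        return [theta]  # degenerate partition consisting of a single node
    # Number of uniform subintervals; the small tolerance absorbs
    # floating-point noise in the division.
    n = max(1, math.ceil((theta - t_start) / step - 1e-12))
    nodes = {round(t_start + (theta - t_start) * j / n, 12) for j in range(n + 1)}
    nodes.update(round(t, 12) for t in eval_instants if t_start <= t <= theta)
    return sorted(nodes)


def diameter(partition):
    """Largest distance between neighbouring nodes (0.0 for a single node)."""
    return max((b - a for a, b in zip(partition, partition[1:])), default=0.0)


# Hypothetical data: the instant 0.4 is not a node of the uniform grid,
# and 1.5 lies outside the segment, so it is ignored.
partition = build_partition(0.0, 1.0, 0.25, [0.4, 1.0, 1.5])
print(partition)
print(diameter(partition))
```

Adding instants to a uniform grid can only shorten the subintervals, so the diameter of the resulting partition does not exceed the requested step.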
Step by step, in reverse order, starting from the last point of the partition $\Delta_k$ (2.1), define sets $G_j(t_*, \tau_j \pm 0)$ of vectors $m \in \mathbb{R}^n$ and scalar functions $\varphi_j(t_*, \tau_j \pm 0, m)$, $m \in G_j(t_*, \tau_j \pm 0)$.

where the superscript $\top$ denotes matrix transposition. Further constructions are carried out according to the following recurrent relations. Assume that for $j + 1$, $1 \le j \le k$, the sets $G_{j+1}(t_*, \tau_{j+1} \pm 0)$ and the functions $\varphi_{j+1}(t_*, \tau_{j+1} \pm 0, m)$, $m \in G_{j+1}(t_*, \tau_{j+1} \pm 0)$, are already defined. Then, for the current $j$, let us define

where the symbol $\psi(\cdot)^{*}_{G}$ denotes the upper convex hull of the function $\psi(\cdot)$ on the set $G$, i.e., the minimal concave function that majorizes $\psi(\cdot)$ for $m \in G$.
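To make the notion of the upper convex hull concrete, the following Python sketch computes the minimal concave majorant of a function sampled on a one-dimensional grid. This is a simplified illustration: in the procedure the hull is taken over vectors $m \in \mathbb{R}^n$, whereas the sketch assumes a scalar argument. The name `upper_concave_envelope` and the sample values are hypothetical.

```python
def upper_concave_envelope(xs, ys):
    """Values at xs of the least concave function majorizing the points (xs[i], ys[i]).

    Assumes that xs is strictly increasing.
    """
    hull = []  # indices of the vertices of the upper hull, left to right
    for i in range(len(xs)):
        # Remove the last vertex while it lies on or below the chord
        # joining its predecessor to the new point.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (xs[b] - xs[a]) * (ys[i] - ys[a]) - (ys[b] - ys[a]) * (xs[i] - xs[a])
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    # Interpolate linearly between consecutive hull vertices.
    envelope = [float(y) for y in ys]
    for a, b in zip(hull, hull[1:]):
        for i in range(a, b + 1):
            w = (xs[i] - xs[a]) / (xs[b] - xs[a])
            envelope[i] = (1 - w) * ys[a] + w * ys[b]
    return envelope


# Hypothetical samples: the dip at the middle node is filled in by the hull.
grid = [0, 1, 2, 3, 4]
values = [0, 3, 1, 3, 0]
hull_values = upper_concave_envelope(grid, values)
print(hull_values)
```

A function that is already concave on the grid is returned unchanged, which gives a simple sanity check for the construction.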
Next, if the instant $\tau_j$ is not equal to any of the instants $t^{[i]}$ from (1.4), then we set

where the maximum is calculated over all triples $\{\nu, m_*, l\}$ that, according to (2.3), correspond to the given vector $m \in G_j(t_*, \tau_j - 0)$. Let us denote

For $t_* = \vartheta$, we formally assume that $\Delta_k$ denotes a degenerate partition which contains only

(2.5)

Theorem 1. For any number $\xi > 0$ there exists a number $\delta > 0$ such that, for any initial position $(t_*, x_*) \in K$ and any partition $\Delta_k$ (2.1) with diameter at most $\delta$ such that the instants $t^{[i]}$ from (1.4) are contained in this partition, the following inequality (2.6) holds:

In [10], the statement of this theorem was proved under the assumption that the following saddle point condition in a small game holds:

The aim of this paper is to prove Theorem 1 without using condition (2.7).
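The role of the saddle point condition can be illustrated numerically. Assuming that condition (2.7) has the standard Isaacs form, namely that the minimum over $u \in P$ of the maximum over $v \in Q$ of $\langle s, f(t, u, v) \rangle$ coincides with the corresponding maximin for every vector $s$, the following Python sketch compares the two quantities on finite sets. The sets `P` and `Q` and both functions `separated` and `coupled` are hypothetical and are not related to the example of this paper.

```python
def inner(a, b):
    """Inner product of two vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))


def small_game_values(s, f, P, Q):
    """Return (minimax, maximin) of <s, f(u, v)> over finite sets P and Q."""
    minimax = min(max(inner(s, f(u, v)) for v in Q) for u in P)
    maximin = max(min(inner(s, f(u, v)) for u in P) for v in Q)
    return minimax, maximin


# Hypothetical two-point sets of control and disturbance values.
P = Q = (-1.0, 1.0)


def separated(u, v):
    """Control and disturbance enter additively: the two values coincide."""
    return (u + v,)


def coupled(u, v):
    """Control and disturbance are multiplied: the two values differ."""
    return (u * v,)


s = (1.0,)
print(small_game_values(s, separated, P, Q))  # (0.0, 0.0)
print(small_game_values(s, coupled, P, Q))    # (1.0, -1.0)
```

When the two values differ, the result of the small game depends on which player is allowed to react to the other's action, which is the situation addressed by the "strategy-counter strategy" formalization.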

The u- and v-stability properties of the value e(·)
In [10], inequality (2.6) is proved on the basis of the u- and v-stability properties of the value $e(\cdot)$ (2.4) with respect to system (1.1). However, when condition (2.7) does not hold, a stricter u-stability property is necessary (see, e.g., [1, p. 208]). If one tries to prove this stricter property by following the scheme from [10], the following substantial difficulty arises. When the disturbance action $v$ is formed in response to admissible realizations $u = u(t)$ by the rule $v = v_*(u(t))$, where the function $v_*: P \to Q$ is Borel measurable, the reachable set of system (1.1) may lack compactness. Therefore, below we consider an auxiliary z-model, establish the proximity of motions of system (1.1) and the z-model, and prove an appropriate u-stability property of the value $e(\cdot)$ with respect to the z-model. The v-stability property does not depend on condition (2.7), so below we use this property as stated in [10].
Let $S \subset \mathbb{R}^n$ be the unit sphere and $q \in S$. Motions of the auxiliary z-model are described by the following differential inclusion:

Here $\lambda_2 \ge 0$ is the constant from (1.2). Note that similar differential inclusions are used to define minimax solutions of Hamilton-Jacobi equations (see, e.g., [6, p. 14], [8]). A position of z-model (3.1) is a pair $(t, z) \in [t_0, \vartheta] \times \mathbb{R}^n$. Define a set $K_z$ of possible positions of the z-model:

where $\alpha > 0$ is some fixed number and $\lambda_0$ is the constant defined in (1.2). It can be proved that, for any $(t, z, q) \in [t_0, \vartheta] \times \mathbb{R}^n \times S$, the set $F^*(t, z, q)$ is nonempty, convex, and compact in $\mathbb{R}^n$, and the multivalued mapping $[t_0, \vartheta] \times \mathbb{R}^n \times S \ni (t, z, q) \mapsto F^*(t, z, q) \subset \mathbb{R}^n$ is continuous in the Hausdorff metric. Therefore (see, e.g., [11]), for any position $(t_*, z_*) \in K_z$, $t_* < \vartheta$, any $t^* \in (t_*, \vartheta]$, and any $q \in S$, differential inclusion (3.1) has at least one solution $z[t_*[\cdot]t^*] = \{z(t) \in \mathbb{R}^n,\ t_* \le t \le t^*\}$ that satisfies the equality $z(t_*) = z_*$. Each such solution determines a motion of z-model (3.1) that starts from the position $(t_*, z_*)$. For any such motion, the inclusion $(t, z(t)) \in K_z$, $t \in [t_*, t^*]$, holds. Moreover, according to [11], for any fixed $q$ the reachability set of differential inclusion (3.1) at the instant $t^*$ from the position $(t_*, z_*)$ is a convex compact set in $\mathbb{R}^n$.
Lemma 2 (property of u-stability with respect to the z-model). Let $(t_*, z_*) \in K_z$, $t_* < \vartheta$, and let a partition $\Delta_k$ (2.1) be chosen. Let $t^* = \tau_2$ be the second instant of the partition $\Delta_k$. Then, for any $q_* \in S$, there exists a motion $z[t_*[\cdot]t^*]$ of z-model (3.1) with $q = q_*$ that starts from the initial position $(t_*, z_*)$ and satisfies the following inequality:

Here $\Delta^*_{k^*}$ is the partition of the time segment $[t^*, \vartheta]$ induced by the instants of the partition $\Delta_k$:

Proof. The proof of this lemma is similar to the proof of the u-stability property from [10], with the reachability set of system (1.1) replaced by the reachability set of differential inclusion (3.1).

Lemma 3 (property of v-stability). Let $(t_*, x_*) \in K$, $t_* < \vartheta$, and let a partition $\Delta_k$ (2.1) be chosen. Let $t^* = \tau_2$ be the second instant of the partition $\Delta_k$ and let $\Delta^*_{k^*}$ be partition (3.7). Then, for any constant control realization $u_*[t_*[\cdot]t^*) = \{u_*(t) = u_* \in P,\ t_* \le t < t^*\}$, there exists an admissible disturbance realization $v[t_*[\cdot]t^*)$ such that, for the motion $x[t_*[\cdot]t^*]$ of system (1.1) generated from the position $(t_*, x_*)$ by these realizations, the following inequality holds:

Proof. The proof of this lemma is given in [10].

In the proof of Theorem 1, the following fact from [10] is used. For any position $(t_*, z_*) \in K_z$, $t_* < \vartheta$, and any partition $\Delta_k$ (2.1), the following relations (3.8) hold:

where the value of $h(t_*)$ is determined according to (1.5).
For $j = 1$, inequality (4.5) is derived from relation (4.2). Given that inequality (4.5) is valid for $j$, $1 \le j \le k$, let us prove it for $j + 1$. Choose a vector $q^e_j = q^e_j(\tau_j, x(\tau_j), \varepsilon) \in S$ from the condition

By Lemma 2, for $q = q^e_j$ there exists a motion $z^{(j)}[\tau_j[\cdot]\tau_{j+1}]$ of z-model (3.1) that starts from the position $(\tau_j, z_j)$ and for which the following inequality (4.7) holds:

By Lemma 1, due to the choice of the number $\delta_* > 0$ and taking into account definition (4.4) of the strategy $u^e(\cdot)$, choice (4.6) of the vector $q^e_j$, and inequality (4.3), we obtain

Hence, taking into consideration definition (4.2) of accompanying points, we derive

From (4.7) and (4.8) we conclude

then $h(\tau_{j+1}) = h(\tau_j)$, and the validity of inequality (4.5) for $j + 1$ follows from inequality (4.9), equality (3.8), and the induction hypothesis.
Remark 1. In a similar way, with clear modifications, it can be checked that if the minimum and maximum operations in the definition of the function $\Delta\psi_j(\cdot)$ (2.2) are interchanged, then the value $e(\cdot)$ (2.4) constructed on the basis of this modified procedure approximates the value function of differential game (1.1), (1.4) in "counter strategy-strategy" classes.

Remark 2. On the basis of the value $e(\cdot)$ (2.4), by means of the extremal shift to accompanying points [1,5], one can construct ζ-optimal control laws of the players (see [5,13]) that guarantee inequalities (1.6) and (1.7).

Example
The example considered below is based on a model problem from [12, pp. 49-58] (see also [5, Section 38]). Consider a dynamical system described by the following equation:

The initial condition $x(0) = (1, -1, 1, 1)$ and quality index (5.14) are given. The control problem for system (5.13) with quality index (5.14) was solved by means of the constructions described above. The results of numerical modeling are as follows. In the numerical experiments, we used a uniform partition of the time segment $[0, 4]$ with step $\delta = 0.02$ and the accuracy parameter value $\varepsilon = 0.2$. The a priori calculated value of differential game (5.13), (5.14) in "strategy-counter strategy" classes was $\rho_u \approx 2.46$, while in "counter strategy-strategy" classes it was $\rho_v \approx 1.52$. In the picture on the left, the thin curve depicts the motion trajectory of system (5.13) formed as a result of the actions of the ζ-optimal control laws of the first and second players in "strategy-counter strategy" classes. The realized value of quality index (5.14) was

$$\gamma = \left( |-1.55 + 0.5|^2 + |-0.91 + 2|^2 + |-1.16|^2 + |0.75 - 2|^2 \right)^{1/2} \approx 2.28 \approx \rho_u.$$
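The arithmetic in the last formula can be checked directly. The short Python sketch below evaluates the Euclidean norm of the four deviations reported above; the helper name `quality_index` is introduced here for illustration only.

```python
import math


def quality_index(deviations):
    """Euclidean norm of a collection of scalar deviations."""
    return math.sqrt(sum(d * d for d in deviations))


# Deviations of the realized motion from the targets, as reported in the text.
deviations = [-1.55 + 0.5, -0.91 + 2, -1.16, 0.75 - 2]
gamma = quality_index(deviations)
print(round(gamma, 2))  # 2.28
```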
In the picture on the right, the thin curve depicts the motion trajectory of system (5.13) formed as a result of the actions of the ζ-optimal control law of the second player in "strategy-counter strategy" classes, while the control actions of the first player were chosen randomly. The realized value of the quality index was $\gamma \approx 4.51 > \rho_u$. The thick curve depicts the motion trajectory formed as a result of the actions of the ζ-optimal control law of the first player in "counter strategy-strategy" classes, while the control actions of the second player were chosen randomly. The realized value of the quality index was $\gamma \approx 0.12 < \rho_v$.
The targets are shown in the pictures by small black squares. Points on the trajectories correspond to the instants at which the quality index is evaluated.