Design of Activation Functions for Inference of Fuzzy Cognitive Maps: Application to Clinical Decision Making in Diagnosis of Pulmonary Infection

In Keun Lee; Hwa Sun Kim; Hune Cho

doi:10.4258/hir.2012.18.2.105

Abstract

Objectives

Fuzzy cognitive maps (FCMs) representing causal knowledge of relationships between medical concepts have been used as prediction tools for clinical decision making. The activation functions used for inference in FCMs are very important factors in helping physicians make correct decisions. Therefore, in order to increase the visibility of inference results, we propose a method for designing certain types of activation functions by considering the characteristics of FCMs.

Methods

The activation functions, such as the sinusoidal-type function and linear function, were designed by calculating the domain range that the functions reach during the inference process of FCMs. The designed activation functions were then applied to decision making through inference with an FCM model representing the causal knowledge of pulmonary infections.

Results

Even though sinusoidal-type functions oscillate and linear functions increase monotonically over the entire domain, the designed activation functions make the inference stable because the proposed method takes into account the part of the domain where the function is actually used during inference. Moreover, the designed functions provide more visible numeric results than other functions do.

Conclusions

Comparing inference results derived using activation functions designed with the proposed method and results derived using activation functions designed with the existing method, we confirmed that the proposed method could be more appropriately used for designing activation functions for the inference process of an FCM for clinical decision making.

I. Introduction

Various artificial intelligence techniques have been used for decision making in medical activities, such as diagnosis and therapy recommendations [1]. However, these techniques are useful only when the physician's knowledge is well represented in a form that computers can realize and use. Moreover, a large number of parameters, such as medical symptoms and laboratory test results, makes it difficult to implement these techniques on computers [2]. To overcome this problem, over the past few decades, various methods, e.g., neural networks [3,4], Bayesian networks [5], ontology [6,7], and fuzzy cognitive maps (FCMs) [2,8,9], have been proposed to represent a physician's knowledge and to support clinical decision making. In particular, FCMs [2,8-13] can efficiently handle complex modeling problems when assessing clinical decision making tasks [2]. FCMs represent causal knowledge between events and are used as tools that can predict results for current states of events by inference. FCMs have been used not only in clinical decision making but also in system control, game theory, information analysis, etc. [10]. The advantages of FCMs are that they can be represented in matrix form and that their inference process involves numerical matrix computations. During the inference process, the values of nodes in FCMs can fall outside the range [0, 1]; therefore, activation functions are used to keep the node values within the range. Consequently, the activation function used for the inference of FCMs is an important factor in determining the results of the inference.

Several research efforts have been conducted on activation functions for the inference process of FCMs [12-14] where a sigmoid function, hyperbolic tangent function, step function, and threshold linear function have been considered as activation functions, and [13] showed that the sigmoid function offers significantly greater advantages than the other functions. Moreover, Lee and Kwon [12] suggested a method for determining the λ value of a sigmoid function, as shown in Equation (1), to design an activation function that adapts to an FCM model.

f(χ) = 1 / (1 + e^(-λχ))    (1)

During inference, the concept values of FCMs are restricted to v∈(0, 1) by the sigmoid function. One characteristic of a sigmoid function is that its domain is (-∞, ∞) whereas its range is (0, 1), i.e., we cannot obtain "0" or "1" as a concept value after inference. Moreover, while a sigmoid function with λ = 5 [13] is known to normalize well into [0, 1], the slope of the function around χ = 0 differs greatly from that around χ = 1. Therefore, a sigmoid function is not suitable for use as a normalization function.
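The slope difference can be checked numerically. The following is a minimal sketch, assuming the standard sigmoid form 1/(1 + e^(-λχ)); the helper names `sigmoid` and `slope` are illustrative and do not come from the paper.

```python
import math

def sigmoid(x, lam):
    """Standard sigmoid with steepness lam (assumed form)."""
    return 1.0 / (1.0 + math.exp(-lam * x))

def slope(x, lam):
    """First derivative of the sigmoid: lam * f(x) * (1 - f(x))."""
    s = sigmoid(x, lam)
    return lam * s * (1.0 - s)

lam = 5.0
slope_at_0 = slope(0.0, lam)
slope_at_1 = slope(1.0, lam)
print(f"f(1) = {sigmoid(1.0, lam):.4f}")
print(f"slope at 0 = {slope_at_0:.4f}, slope at 1 = {slope_at_1:.4f}")
```

With λ = 5 the slope is 1.25 at χ = 0 but only about 0.033 at χ = 1, so inputs near 1 are compressed into almost the same output value.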

As shown in Figure 1, the sinusoidal-type function shown in Equation (2) at interval [-βπ/2, βπ/2], and the linear function shown in Equation (3) are better normalization functions than a sigmoid function. Moreover, the range of a sinusoidal-type function is [0, 1], where the domain is restricted to the interval [-βπ/2, βπ/2], and therefore, we can obtain "0" and "1" as the concept values after inference.

(2)

(3)

Therefore, in this paper, we propose a method for designing activation functions for the inference of FCMs, which is different from the method suggested in [12]. Moreover, we apply the designed function to a clinical decision regarding the prediction of a pulmonary infection model [2].

II. Methods

1. Model Description and Preliminaries

For convenience, we will use the following notations and definitions throughout this paper.

Notations. N, R, Rⁿ, and R^n×m denote a set of natural numbers, a real number space, a real n-space, and a set of real n×m matrices, respectively. The superscript "T" denotes a vector and matrix transposition (i.e., if u∈Rⁿ, then u^T = [u_i]_1≤i≤n, and if A=[a_ij]_n×m∈R^n×m, then A^T = [a_ij]_m×n, where 1≤i≤n, 1≤j≤m, and n, m∈N). For all u∈Rⁿ, let ∥u∥ denote the Euclidean vector norm (i.e., ∥u∥ = (u^T·u)^1/2). For all A∈R^n×m, let ∥A∥ denote the spectral norm (i.e., ∥A∥= (the maximum eigenvalue of A^T·A)^1/2). If hir-18-105-i001

is a state vector of a system, then hir-18-105-i002

denotes an equilibrium state vector of the system. If f:R→R, then f'(·) and f^-1(·) are the first derivative and inverse function of f(·), respectively. If u∈[u₁, u₂] for any u, u₁, u₂∈R, then I_max (u) and I_min (u) stand for the maximum and minimum values of u, respectively, i.e., I_max(u)=u₂ and I_min(u)=u₁.

The following descriptions show mathematical models representing the characteristics and inference process of an FCM as defined in [9,12,15].

Definition 1 (Components of FCM). (refer to [15]) Suppose C_i and C_j are concepts in an FCM, and v_i and v_j are the values of C_i and C_j belonging to [0, 1], respectively, when i, j∈N = {1,2,..., n} and n∈N is the number of concepts. Then, weight w_ij is defined as a real number in [-1, 1]. We deem the weight as positive, negative, or having no causality from C_i to C_j when w_ij>0, w_ij<0, and w_ij=0, respectively.

Definition 2 (Inference process of FCM). (refer to [15]) For every i∈N and any j∈N, let C_i be the causal concepts that influence concept C_j. Then, for every j∈N and all iteration steps of k≥0 during inference process of the FCM,

(4)

where ρ₁, ρ₂∈(0, 1] and v_j^(k)∈[0, 1] represents the value of C_j at the k-th iteration step. Moreover, f:R→R is an activation function that restricts v_j^(k+1) to the interval [0, 1]. Equation (4) is also represented in vector form as

(5)

where

and w=[w_ij]_n×n, where 1≤i, j≤n, and are called a state vector and weight matrix, respectively. Moreover, hir-18-105-i006
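The inference process of Definition 2 can be sketched in a few lines of code. This is a minimal sketch under a stated assumption: the update is taken to be v_j^(k+1) = f(ρ₁·v_j^(k) + ρ₂·Σ_i w_ij·v_i^(k)), which is one common reading of this kind of model and may differ in detail from Equation (4). The weight matrix, the initial vector, and the function names are hypothetical, and a sigmoid with λ = 1 is used only for illustration.

```python
import math

def sigmoid(x, lam=1.0):
    return 1.0 / (1.0 + math.exp(-lam * x))

def fcm_step(w, v, f, rho1=1.0, rho2=1.0):
    """One inference step; w[i][j] is the causal weight from C_i to C_j."""
    n = len(v)
    nxt = []
    for j in range(n):
        incoming = sum(w[i][j] * v[i] for i in range(n))
        nxt.append(f(rho1 * v[j] + rho2 * incoming))
    return nxt

def fcm_infer(w, v0, f, rho1=1.0, rho2=1.0, tol=1e-9, max_iter=1000):
    """Iterate until the state vector changes by less than tol."""
    v = list(v0)
    for k in range(1, max_iter + 1):
        nxt = fcm_step(w, v, f, rho1, rho2)
        if max(abs(a - b) for a, b in zip(nxt, v)) < tol:
            return nxt, k
        v = nxt
    return v, max_iter

# Hypothetical two-concept model, not taken from the paper.
w = [[0.0, 0.5], [-0.5, 0.0]]
v_star, steps = fcm_infer(w, [1.0, 0.0], sigmoid)
print(v_star, steps)
```

Because the sigmoid keeps every value strictly inside (0, 1), the returned state can approach but never equal 0 or 1, which is the limitation that motivates the other activation functions considered here.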

As in [12], we transform the model from Equations (4) and (5) into the form described in the following definition.

Definition 3 (Transformation). (refer to [12]) For every j∈N and all k≥0, let hir-18-105-i007

and

in Equation (4); then,

(6)

where

We also consider unipolar, nonlinear, and continuous functions as activation functions of FCMs. Therefore, we assume that the activation functions satisfy the following conditions:

Assumption 1. The function f:R→R is bounded; i.e., 0≤f(u)≤M for all u∈R and any M∈R such that M>0.

Assumption 2. The function f:R→R satisfies the Lipschitz condition with a Lipschitz constant, L > 0; i.e., |f(u₁)-f(u₂)|≤L|u₁-u₂| for all u₁, u₂∈R.

Lee and Kwon [12] suggested a bound of L of activation functions satisfying Assumptions 1 and 2, as shown in Equation (7), which guarantees the global exponential stability of Equation (5) during an inference process of an FCM.

(7)

Moreover, as shown in Equation (8), the bound of λ was derived using Equation (7) and the property in which the maximum value of the derivative of a sigmoid function occurs when χ = 0.

(8)

Consequently, the λ value is determined by adapting to the weight matrix w of an FCM model, and a sigmoid function whose λ value satisfies inequality (8) guarantees the stability of Equation (5).
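The property used in this derivation, that the derivative of a sigmoid function peaks at χ = 0, can be illustrated as follows. This is a sketch assuming the standard sigmoid form, whose peak slope is λ/4; `max_lambda` is a hypothetical helper that converts any given upper bound on the Lipschitz constant into the corresponding λ, and the value 0.5 passed to it is arbitrary rather than the bound of Equation (7).

```python
import math

def sigmoid(x, lam):
    return 1.0 / (1.0 + math.exp(-lam * x))

def sigmoid_slope(x, lam):
    s = sigmoid(x, lam)
    return lam * s * (1.0 - s)

def max_lambda(lipschitz_bound):
    """Largest lam whose peak slope lam/4 does not exceed the given bound."""
    return 4.0 * lipschitz_bound

lam = 5.0
grid = [i / 100.0 for i in range(-300, 301)]
peak_x = max(grid, key=lambda x: sigmoid_slope(x, lam))
print(peak_x, sigmoid_slope(peak_x, lam))
print(max_lambda(0.5))
```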

2. Design of Sinusoidal-Type Activation Functions

As mentioned in the introduction, a sinusoidal-type function may not seem appropriate as an activation function because it oscillates over the domain (-∞, ∞). From another viewpoint, the sinusoidal-type function shown in Equation (2), restricted to the domain [-βπ/2, βπ/2], could be better than a sigmoid function as a normalization function, because it increases monotonically there and has a gentler slope than a sigmoid function does. Also, the range of a sinusoidal-type function is [0, 1], unlike a sigmoid function, whose range is (0, 1). Therefore, we need to find the value of β. Intuitively, βπ/2 and -βπ/2 should be the maximum and minimum values that χ can reach during inference, respectively. Consequently, finding the maximum and minimum values of χ is a way to find the value of β. Since a sinusoidal-type function will be used as an activation function for Equation (5) in this paper, the domain values will be the elements of vector x^(k) in Definition 3. To find the bound of x^(k) resulting from inference of an FCM using Equation (5), we give the following lemmas.

Lemma 1. Let x^(k) and M be the same as in Definition 3 and Assumption 1, respectively. Then, for all k≥0, the following inequality is satisfied.

(9)

Proof. Let the right-hand side of Equation (6) be φ(x) as follows:

(10)

If we use the Euclidean norm for both terms of Equation (10), we can then derive the following inequality.

Therefore, we have

(11)

If we suppose there exists x^(-1) such that v⁽⁰⁾=f(x^(-1)), then inequality (11) is represented as inequality (9), because every element of f(·) is bounded by M from Assumption 1. □

Lemma 2. Let x_j^(k) be the same as in Definition 3. Then, for all 1≤j≤n, |x_j^(k)| ≤ (ρ₁+ρ₂∥w∥)n^1/2 M.

Proof. Consider a vector hir-18-105-i013

. Then,

. Here, |

| is the maximum absolute value among other elements within the unit circle in the vector norm. Thus, from inequality (11), we have hir-18-105-i012

. □

Lemmas 1 and 2 give the domain range of a sinusoidal-type activation function used in the inference process of FCMs. Therefore, we give the following theorem for the design of the sinusoidal-type activation function.

Theorem 1. Let x_j^(k) and M be the same as in Definition 3 and Assumption 1, respectively. If there exists an inverse function, f^-1(·), of an activation function, f(·), the following equation is satisfied for all k≥0 and any j∈N.

(12)

Proof. If x_j^(k) is the j-th element of x^(k), by Lemma 2 we know the range of x_j^(k) to be [-(ρ₁+ρ₂∥w∥)n^1/2 M, (ρ₁+ρ₂∥w∥)n^1/2 M]. Also, x_j^(k) can be represented as f^-1(v_j^(k+1)) because f(·) is invertible. Therefore, we finally have Equation (12). □

Note 1. If we know the ranges of x_j^(k) and v_j^(k+1) for all k≥0 and j∈N, we can design activation functions satisfying Assumptions 1 and 2 by assigning I_max(x_j^(k)) and I_min(x_j^(k)) to I_max(v_j^(k+1)) and I_min(v_j^(k+1)), respectively.

The following corollary shows how to actually design a sinusoidal-type activation function using Theorem 1.

Corollary 1. Let β be the same as in Equation (2) and ρ₁, ρ₂, w, n, and M be the same as in Theorem 1. Then, in the designed sinusoidal-type activation function, the value of β is

(13)

Proof. We can easily derive the following equation, which is the inverse function of Equation (2).

Thus, we have

If I_max(x_j^(k)) and I_min(x_j^(k)) are assigned to I_max(v_j^(k+1))=1 and I_min(v_j^(k+1))=0, respectively, then β is computed as

The following corollary shows that the designed sinusoidal-type activation function guarantees the global exponential stability of the inference process of an FCM.

Lemma 3. (refer to [12]) If x_j^(k) is the same as in Definition 3, and L is a Lipschitz constant as shown in Assumption 2, then for all j∈N, |f'(x_j^(k))|≤L.

Corollary 2. The inference process of an FCM using Equation (5) is globally exponentially stable when the activation function is a sinusoidal-type, as in Equation (2), where β=1.5708/((ρ₁+ρ₂∥w∥)n^1/2).

Proof. The sinusoidal-type function of Equation (2) satisfies Assumption 1, and the maximum value of the first derivative of the function occurs when χ=0. Therefore, the range of β is calculated using inequality (7) and Lemma 3 as follows:

(14)

This guarantees the global exponential stability of Equation (5). If M = 1 in Equation (13), we finally have

This inequality is satisfied for all n∈N. Therefore, Equation (2), where β=1.5708/((ρ₁+ρ₂∥w∥)n^1/2), also guarantees the global exponential stability of Equation (5). □
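The design rule of Corollary 2 can be sketched as follows, with M = 1. Two things here are assumptions rather than statements from the paper: the sinusoidal-type function is taken to have the form (1 + sin(βχ))/2, chosen so that the β of Corollary 2 maps the largest reachable input onto 1 (Equation (2) may be parameterized differently), and the spectral norm ∥w∥ = 0.5 is a hypothetical value supplied by the caller.

```python
import math

def design_beta(w_norm, n, rho1=1.0, rho2=1.0):
    """Return beta from Corollary 2 and the input bound it is built on (M = 1)."""
    bound = (rho1 + rho2 * w_norm) * math.sqrt(n)
    return (math.pi / 2.0) / bound, bound

def sinusoidal(x, beta):
    """Assumed sinusoidal-type form with range [0, 1] on [-bound, bound]."""
    return 0.5 * (1.0 + math.sin(beta * x))

# Hypothetical model: two concepts, spectral norm of w equal to 0.5.
beta, bound = design_beta(w_norm=0.5, n=2)
print(f"beta = {beta:.4f}, bound = {bound:.4f}")
print(sinusoidal(-bound, beta), sinusoidal(0.0, beta), sinusoidal(bound, beta))
```

Under these assumptions the designed function stays monotonic over every input the inference can produce, and it reaches exactly 0 and 1 at the two ends of that interval.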

We next give an example that confirms the stability of the inference process using the designed sinusoidal-type activation function.

Example 1. Let ρ₁=ρ₂=1 and M=1 in Definition 2 and Assumption 1. Also, suppose that weight matrix w and initial state vector v⁽⁰⁾ are

The following three kinds of sinusoidal-type activation functions with different values of β are considered in this example:

(i) β₁ = 0.4652, calculated by the proposed method, where n=2:

(ii) β₂=0.8376, which is within the range of β calculated by inequality (14), and guarantees the global exponential stability of the inference process,

0<β<0.83766392031338.

(iii) β₃ = 1.0870, which is out of the range of β, and does not guarantee the global exponential stability of the inference process.
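The three values of β can be cross-checked against each other with simple arithmetic. This sketch inverts the formula of Corollary 2 to recover ρ₁+ρ₂∥w∥ from β₁, then compares 2/(ρ₁+ρ₂∥w∥) with the reported upper limit. The factor 2 is an assumption of this sketch, not a restatement of inequality (14); the variable names are illustrative.

```python
import math

beta_1, beta_2, beta_3 = 0.4652, 0.8376, 1.0870
reported_limit = 0.83766392031338  # upper limit of beta quoted in the example
n = 2

# Invert beta_1 = (pi/2) / ((rho1 + rho2*||w||) * sqrt(n)).
s = (math.pi / 2.0) / (beta_1 * math.sqrt(n))
candidate_limit = 2.0 / s  # assumed form of the stability limit

within_limit = [b < reported_limit for b in (beta_1, beta_2, beta_3)]
print(f"rho1 + rho2*||w|| ~ {s:.4f}")
print(f"candidate limit ~ {candidate_limit:.4f}, reported {reported_limit:.4f}")
print(within_limit)
```

The candidate limit agrees with the reported one to about four decimal places, which is as close as the four-digit rounding of β₁ allows, and only β₃ falls outside the limit.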

Using the designed sinusoidal-type activation functions, the inferences in Equation (5) based on w and v⁽⁰⁾ are performed. After inference, the following results are obtained: in (i) the state vector converged to [0.9501 0.5346], in (ii) it converged to [0.9369 0.6396], and in (iii) it oscillated between [0.7706 0.9451] and [0.7355 0.9401].

Figure 2 shows the designed activation functions and trajectories of the concept values during inference. The results in (i) and (ii) are stable, but some oscillation is observed at the start of the trajectory in the result in (ii), as shown in Figure 2D. In contrast, the result in (iii) is not stable. Comparing the result in (i) with those in (ii) and (iii), it is reasonable to conclude that this oscillation in the trajectories is caused by activation functions whose designed domain is narrower than the range that |x_j^(k)| can reach, where j∈{1, 2}.

3. Design of Linear Activation Functions

A linear function, as shown in Equation (3), is not by itself appropriate as an activation function for inference of FCMs, because it increases monotonically without bound, with both domain and range equal to (-∞, ∞). However, if we know the domain range that the linear function reaches during the inference process, we can design the linear function as an activation function. That is, the following corollary shows a way to design a linear-type activation function that satisfies the following condition.

Assumption 3. The function f:R→R is bounded; i.e., 0≤f(u)≤M for all u,u₁,u₂∈R such that u∈[u₁, u₂] and u₁<u₂, and any M∈R such that M>0.

Corollary 3. Let α be the same as in Equation (3) and ρ₁, ρ₂, w, n, and M be the same as in Theorem 1. Then, in the designed linear-type activation function, the value of α is

Proof. We derive the following equation, which is the inverse function of Equation (3).

Thus, we have

If I_max(x_j^(k)) and I_min(x_j^(k)) are assigned to I_max(v_j^(k+1))=1 and I_min(v_j^(k+1))=0, respectively, then α is computed as
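The linear design step can be sketched as follows. The sketch assumes the affine form f(χ) = αχ + 1/2, which is the straight line that maps a symmetric input interval [-B, B] onto [0, 1]; this form and the name `design_linear` are assumptions and may not match the parameterization of Equation (3). The bound B = 2.0 is a hypothetical value standing in for the largest |χ| reached during inference.

```python
def design_linear(bound):
    """Return alpha and f such that f(-bound) = 0 and f(bound) = 1."""
    if bound <= 0:
        raise ValueError("bound must be positive")
    alpha = 1.0 / (2.0 * bound)

    def f(x):
        return alpha * x + 0.5

    return alpha, f

alpha, f = design_linear(bound=2.0)
print(alpha, f(-2.0), f(0.0), f(2.0))
```

As long as the inputs stay inside [-B, B], the outputs stay inside [0, 1], which is the bounded behavior required by Assumption 3.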

4. Design of FCM on Pulmonary Infection

To apply the designed sinusoidal-type and linear activation functions to an FCM model for a clinical decision, we refer to the FCM model designed in [2], which represents causal knowledge of pulmonary infections. However, the characteristics of the FCM in [2] are different from those of the FCM considered in this paper. That is, in contrast to Definition 1, the concept values in [2] were bounded to the interval [-1, 1], and a bipolar activation function was used for the inference process. Therefore, we customize the FCM model designed in [2], as shown in Figure 3, by adding seven concepts (C26-1, C27-1, C28-1, C29-1, C30-1, C31-1, and C32-1) representing the negative values of the concepts (C26, C27, C28, C29, C30, C31, and C32, respectively), as was done in [11]. For instance, if the concept value of C26 is "v_C26=-1," then D1 is affected by the amount v_C26×w_C26,D1=-1×0.7. However, according to Definition 1, we cannot represent the concept value "-1." To exert a negative influence on D1, we create a new concept, C26-1, which carries the negative value of concept C26, i.e., if "sputum culture" is "-1" then "v_C26-1=1" because C26-1 carries the meaning of the negative concept value. Moreover, we assign the weight "w_C26-1,D1=-0.7" between concepts C26-1 and D1, i.e., even though the value of C26-1 is positive, D1 is affected negatively by v_C26-1×w_C26-1,D1=1×(-0.7).
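The effect of adding a negative concept can be sketched as follows. The function names are hypothetical, and splitting intermediate values with max(0, v) and max(0, -v) is an assumption of this sketch; the text above only describes the case v = -1.

```python
def split_bipolar(value):
    """Map a bipolar value in [-1, 1] to (positive concept, negative concept)."""
    if not -1.0 <= value <= 1.0:
        raise ValueError("value must lie in [-1, 1]")
    return max(0.0, value), max(0.0, -value)

def influence(value, weight):
    """Influence on the target using the concept pair and weights (w, -w)."""
    pos, neg = split_bipolar(value)
    return pos * weight + neg * (-weight)

print(split_bipolar(-1.0))   # (0.0, 1.0)
print(influence(-1.0, 0.7))  # -0.7, the same as -1 * 0.7
```

Both concept values remain in [0, 1] as Definition 1 requires, while the influence on D1 is the same as in the bipolar model.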

We deal with the two scenarios described in [2]. That is, this experiment aims to show how a physician decides which patient has the more serious pulmonary infection, based on the observed symptoms and laboratory test results of the patients in the following scenarios.

Scenario 1: An immunocompromised patient (A23 = 1) with a high fever (A4 = 0.7), loss of appetite (A5 = 1), and high systolic blood pressure (A13 = 0.7), with radiologic evidence present in his/her chest x-rays (A16 = 1), a small number of WBCs (A22 = 0.4), a negative sputum culture (A26-1 = 1), and negative antigen (A32-1 = 1).

Scenario 2: An older patient (A25 = 0.8) with a low fever (A4 = 0.3), altered mental status (A12 = 0.4), high oxygen requirements (A9 = 0.8), a normal white blood cell (leukocyte) count (A22 = 0), positive sputum culture (A26 = 1), negative blood culture (A28-1 = 1), and negative gram stain (A31-1 = 1).

III. Results

According to [2], physicians decided that the patient in scenario 1 has a more serious pulmonary infection than the patient in scenario 2, and the inference of the FCM gave the same result as the physicians' decision. In this experiment, therefore, we compared the results of the inference process of the FCMs with four activation functions: (i) the sigmoid function designed in [12], (ii) a sinusoidal-type function designed using the method proposed in [12], (iii) a sinusoidal-type function designed using the proposed method, and (iv) a linear function designed using the proposed method. In these inferences, we provided the same stimulus as in [16].

According to the two scenarios, we created the following initial state vectors:

Scenario 1: v⁽⁰⁾ = [0 0 0 0.7 1 0 0 0 0 0 0 0 0 0.7 0 1 0 0 0 0 0 0.4 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0],

Scenario 2: v⁽⁰⁾ = [0 0 0 0.3 0 0 0 0 0.8 0 0 0.4 0 0 0 0 0 0 0 0 0 0 0 0 0.8 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0].

Figure 4 shows the designed activation function and trajectories of the values of concept "D1: Severity" at each inference of the two scenarios.

For (i), the values of concept D1 for scenarios 1 and 2 converged to 0.9983 and 0.9972, respectively. As shown in Figure 4A, the slope is almost flat around the maximum input values reached in the two scenarios, and thus the gap between the converged concept values is very small, at 0.0011. Even though it was difficult to make a decision based on the results in (i), the results showed that the patient in scenario 1 has a more severe condition than the patient in scenario 2, which is the same result as that determined in [2].

For (ii), the values of concept D1 for scenarios 1 and 2 converged to 0.5691 and 0.6688, respectively. However, in the designed activation function, as shown in Figure 4C, the maximum input values reached in the two scenarios exceeded the value that yields I_max(v). As a result, the patient in scenario 2 appears to have a more severe condition than the patient in scenario 1, which is contrary to (i). That is, the result is not useful for decision making.

For (iii), the values of concept D1 for scenarios 1 and 2 converged to 0.6556 and 0.6398, respectively. Unlike in cases (i) and (ii), the designed activation function looks almost linear around the maximum input values reached in the two scenarios. Moreover, the gap between the converged concept values is the largest among the results, at 0.0158. That is, comparing (i) and (iii), we can see that the results of the latter make it more convenient for a physician to make a decision and to reach a correct result.

For (iv), the values of concept D1 for scenarios 1 and 2 converged to 0.5487 and 0.5435, respectively. The gap between the converged concept values, at 0.0052, is larger than that in (i).
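The comparison across the four activation functions reduces to simple arithmetic on the converged values of D1 reported above. The sketch below recomputes the signed gap (scenario 1 minus scenario 2) and checks whether each result agrees with the decision in [2] that scenario 1 is more severe; the dictionary keys are shorthand labels, not names from the paper.

```python
# Converged D1 values (scenario 1, scenario 2) as reported in the text.
results = {
    "(i) sigmoid, method of [12]": (0.9983, 0.9972),
    "(ii) sinusoidal, method of [12]": (0.5691, 0.6688),
    "(iii) sinusoidal, proposed": (0.6556, 0.6398),
    "(iv) linear, proposed": (0.5487, 0.5435),
}

gaps = {name: round(s1 - s2, 4) for name, (s1, s2) in results.items()}
agrees = {name: gap > 0 for name, gap in gaps.items()}

for name in results:
    print(f"{name}: gap = {gaps[name]:+.4f}, agrees with [2]: {agrees[name]}")
```

A positive gap means the inference ranks scenario 1 as more severe; only case (ii) yields a negative gap.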

As a result, we can see that although a sinusoidal-type activation function can be designed using the method proposed in [12], it occasionally provides incorrect results for decision making. Therefore, we can conclude that the method proposed in this paper is more appropriate for designing sinusoidal-type activation functions than the method proposed in [12].

IV. Discussion

There exist various methods to support physicians' clinical decision making, such as fuzzy methods, neural networks, decision trees, and FCMs. However, the usability of these methods depends strongly on the features of the clinical field, because their knowledge models differ from each other and each method has its own strengths and weaknesses. Even within the same method, the results of decision making may differ according to the knowledge model. Thus, users in the clinical field (e.g., physicians) only refer to the results from these methods when they make clinical decisions. The aim of such methods in the clinical field is to show the results to the users as clearly as possible.

In this paper, we focused on clinical decision making based on FCMs, which are good models of the causal knowledge of relationships between medical concepts and provide prediction results based on the current status of a concept through an inference process. Therefore, activation functions used for the inference process are very important factors that support physicians in making the right decision. In other words, for physicians to make a final decision, how well the physicians' knowledge is represented as an FCM model is not the only important factor. The inference process of that model is also important in the application of clinical decision making. In general, sigmoid functions have been used as activation functions for the inference process of FCMs; however, the design of such an activation function depends greatly on the experience of experts because, during inference, the slope varies considerably within the domain range of the function.

Therefore, we proposed a method for designing sinusoidal-type and linear activation functions by calculating the domain range that the activation function reaches during the inference process of FCMs. Even though sinusoidal-type functions oscillate and linear functions increase monotonically over the entire domain, the designed activation functions make the inference stable because the proposed method takes into account the part of the domain where the function is actually used during inference. Moreover, because a sinusoidal-type function designed by the proposed method provides a gentler slope than a sigmoid function does, it can be used as a normalization function. We applied the designed functions to an FCM model that represents the causal knowledge of pulmonary infections. Comparing the activation functions designed using the proposed method with activation functions designed using an existing method, we confirmed that the proposed method can be appropriately used for designing the activation functions for the inference process of an FCM for clinical decision making.

This study dealt with only two kinds of functions and applied them to only one example of decision making, using a knowledge model designed in another study. In future research, we will consider other types of functions, such as the hyperbolic tangent function, and apply the functions to a wider variety of FCM models in the medical field.