### I. Introduction

### II. Methods

### 1. Data Source

Step 1: collecting questionnaire-based information on health history and behavior;

Step 2: collecting physical and physiological data through standardized physical measurements;

Step 3: taking blood samples for biochemical measurements and laboratory examinations of lipid and glucose status, performed by trained personnel.

### 2. Data Pre-processing and Dealing with Missing Values

*p* = 0.561); persons with at least one missing variable were removed from the analysis. Table 1 shows the demographic and clinical characteristics of the participants.
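The complete-case approach described above (dropping every participant with at least one missing variable) can be sketched in Python with pandas; the variable names below are hypothetical toy columns, not the study's actual measurements:

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset; column names are illustrative only.
df = pd.DataFrame({
    "age":     [52,   61,   np.nan, 45],
    "glucose": [5.6,  np.nan, 6.1,  5.2],
    "bmi":     [24.1, 27.3, 22.8,  30.0],
})

# Complete-case analysis: remove any row with at least one missing value.
complete = df.dropna()
print(len(df), "participants before,", len(complete), "after removal")
```

`dropna()` with default arguments removes a row when any column is missing, which matches the "at least one missing variable" criterion.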

### 3. Data Mining Algorithms

#### 1) Neural networks

a sigmoid function (*f*(*x*) = 1 / (1 + exp(−*x*))) in the hidden layer and a linear function (the *x*'s are predictor variables and the *w*'s are input weights) in the output layer. The functional form of the MLP can be written as

$${y}_{j}=f\left(\sum _{i=1}^{N}{w}_{ji}{x}_{i}+{b}_{j}\right),$$

where ${x}_{i}$ is the *i*-th nodal value in the previous layer, ${y}_{j}$ is the *j*-th nodal value in the present layer, ${b}_{j}$ is the bias of the *j*-th node in the present layer, ${w}_{ji}$ is a weight connecting ${x}_{i}$ and ${y}_{j}$, $N$ is the number of nodes in the previous layer, and *f* is the activation function in the present layer [23].
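The nodal computation above can be illustrated with a minimal NumPy sketch of one forward pass; the layer sizes, weights, and inputs are arbitrary toy values, not the network configuration used in the study:

```python
import numpy as np

def sigmoid(x):
    # Hidden-layer activation: f(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def layer_forward(x_prev, W, b, f):
    # y_j = f(sum_i w_ji * x_i + b_j): one MLP layer
    return f(W @ x_prev + b)

# Toy network: 3 inputs -> 2 hidden nodes (sigmoid) -> 1 output (linear)
rng = np.random.default_rng(0)
x = rng.normal(size=3)                                # predictor variables
W1, b1 = rng.normal(size=(2, 3)), rng.normal(size=2)  # hidden-layer weights/biases
W2, b2 = rng.normal(size=(1, 2)), rng.normal(size=1)  # output-layer weights/biases

hidden = layer_forward(x, W1, b1, sigmoid)
output = layer_forward(hidden, W2, b2, lambda z: z)   # linear output layer
```

Each call to `layer_forward` is exactly one application of the formula: a weighted sum over the previous layer's nodes plus a bias, passed through that layer's activation.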

#### 2) Support vector machines

*b* and ${\left\{{\mathit{w}}_{\mathit{i}}\right\}}_{\mathit{i}=1}^{\mathit{D}}$ denote coefficients that have to be estimated from the data, and ${\left\{\left({\mathit{x}}_{\mathit{i}},{\mathit{y}}_{\mathit{i}}\right)\right\}}_{\mathrm{i}=1}^{\mathrm{N}}$ are a set of samples where ${y}_{i}\in \{+1,-1\}$. The constraints are ${y}_{i}({w}^{T}\mathrm{\Phi }({x}_{i})+b)\ge 1-{\xi }_{i}$, where ${\xi }_{i}\ge 0$, $i=1,\dots ,n$; the training data are mapped to a higher-dimensional space by the function $\mathrm{\Phi }$, and *C* is a user-defined penalty parameter on the training error that controls the trade-off between classification errors and the complexity of the model. Therefore, the decision function (predictor) is $f(x)=\mathrm{sign}({w}^{T}\mathrm{\Phi }(x)+b)$, where $x$ is any testing vector [11].
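A minimal sketch of this formulation with scikit-learn's `SVC` is shown below; the two-cluster dataset is synthetic and the parameter values (`C=1.0`, RBF kernel) are illustrative defaults, not the settings used in the study:

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic two-class data; labels are in {+1, -1} as in the formulation above.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([-1] * 50 + [+1] * 50)

# C is the penalty parameter trading off training errors vs. model complexity.
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

# f(x) = sign(w^T Phi(x) + b): the sign of the decision function is the class.
pred = np.sign(clf.decision_function(X))
```

`decision_function` returns the signed margin $w^{T}\Phi(x)+b$, so taking its sign reproduces the predictor defined above.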

$K({x}_{i},{x}_{j})=\mathrm{\Phi }{({x}_{i})}^{T}\mathrm{\Phi }({x}_{j})$, and the four most widely used kernel functions are the linear ($K({x}_{i},{x}_{j})={x}_{i}^{T}{x}_{j}$); radial basis function (RBF) ($K({x}_{i},{x}_{j})=\mathrm{exp}(-\gamma {\Vert {x}_{i}-{x}_{j}\Vert }^{2})$, where $\gamma $ is the kernel parameter); polynomial ($K({x}_{i},{x}_{j})={({x}_{i}^{T}{x}_{j}+1)}^{d}$, where $d>0$ is the degree of the polynomial kernel); and sigmoid ($K({x}_{i},{x}_{j})=\mathrm{tanh}({w}_{i}^{T}{x}_{j}+1)$).
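The four kernels listed above can be written directly in NumPy; the parameter values (`gamma`, `d`, and the sigmoid weight vector) are illustrative choices, not tuned settings from the study:

```python
import numpy as np

def linear(xi, xj):
    # K(xi, xj) = xi^T xj
    return xi @ xj

def rbf(xi, xj, gamma=0.5):
    # K(xi, xj) = exp(-gamma * ||xi - xj||^2); gamma is the kernel parameter
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

def poly(xi, xj, d=3):
    # K(xi, xj) = (xi^T xj + 1)^d; d > 0 is the polynomial degree
    return (xi @ xj + 1) ** d

def sigmoid_kernel(xi, xj, w):
    # K(xi, xj) = tanh(w^T xj + 1), with a user-chosen weight vector w
    return np.tanh(w @ xj + 1)

xi, xj = np.array([1.0, 0.0]), np.array([0.0, 1.0])
```

For these orthogonal unit vectors, the linear kernel is 0, the polynomial kernel collapses to 1, and the RBF kernel evaluates to $\exp(-\gamma\cdot 2)$.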

#### 3) Random forests

#### 4) Fuzzy c-means

$${J}_{m}=\sum _{i=1}^{n}\sum _{j=1}^{c}{u}_{ij}^{m}{\Vert {x}_{i}-{c}_{j}\Vert }^{2},\phantom{\rule{1em}{0ex}}1\le m<\mathrm{\infty },$$

where *m* is any real number greater than 1, called the fuzziness index, which controls the fuzziness of the membership of each observation; ${x}_{i}$ is the *i*-th *d*-dimensional observed data point; ${u}_{ij}$ is the degree of membership of ${x}_{i}$ in cluster *j* (${u}_{ij}\in [0,1]$, $\sum _{\mathit{j}=1}^{\mathit{c}}{\mathit{u}}_{\mathit{ij}}=1$ $\forall i=1,2,\dots ,n$, $\forall j=1,2,\dots ,c$); ${c}_{j}$ is the *d*-dimensional center of the cluster; and $\Vert \cdot \Vert $ is any norm, such as the Euclidean distance, expressing the similarity between any observed data point and the center of the cluster [2].
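Given the definitions above, the objective and the standard membership update of fuzzy c-means can be sketched in NumPy; the data points and cluster centers below are toy values, and the membership formula is the usual closed-form update, stated here as an assumption since the text does not spell it out:

```python
import numpy as np

def fcm_memberships(X, centers, m=2.0):
    # Standard FCM update: u_ij = 1 / sum_k (||x_i - c_j|| / ||x_i - c_k||)^(2/(m-1))
    # m > 1 is the fuzziness index; rows of the result sum to 1.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)  # (n, c)
    d = np.fmax(d, 1e-12)                  # guard against division by zero
    ratio = d[:, :, None] / d[:, None, :]  # (n, c, c): d_ij / d_ik
    return 1.0 / np.sum(ratio ** (2.0 / (m - 1.0)), axis=2)

def fcm_objective(X, centers, U, m=2.0):
    # J_m = sum_i sum_j u_ij^m * ||x_i - c_j||^2
    d2 = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) ** 2
    return np.sum(U ** m * d2)

# Toy data: two tight groups and two cluster centers.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
C = np.array([[0.0, 0.0], [5.0, 5.0]])
U = fcm_memberships(X, C)
```

Points near a center receive a membership close to 1 for that cluster, and each row of `U` satisfies the constraint $\sum_{j=1}^{c} u_{ij} = 1$ from the definition above.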