### INTRODUCTION

_{2} will lead to increased pollen output, longer pollen seasons, and greater human exposure to pollen, noting that both the allergenicity of pollen and the associated health risk will increase. D'Amato *et al*.8 also predicted that the pollen season will become longer as a result of global warming. Pollen allergy forecasting can be viewed as a response to this growing risk of pollen allergy, and Germany, Japan, Korea, England, the United States, and other countries now provide allergy forecasts for their major allergenic pollens.

*et al*.,9 which used independent multiple regression models for 7 South Korean cities. To broaden this scope, Kim *et al*.4 then developed a single model covering all of South Korea, based on robust multiple regression with a Weibull probability distribution. However, this expanded, unified model still underestimated pollen concentrations and could not predict high concentrations well. In addition, the pollen seasons it predicted were longer than those actually observed. We believe the main causes of these problems are that the training data contain more low than high concentrations and that a regression model cannot properly capture the nonlinear relationship between weather factors and pollen concentrations. To address this, we used a machine learning method, a deep neural network (DNN), to model the nonlinear relationship between weather factors and pollen concentrations and thereby improve on the existing method. In addition, we incorporated a bootstrap aggregating (bagging) ensemble to prevent overfitting and underestimation by the DNN model.

*Betula*, Iglesias-Otero *et al*.,12 who studied *Plantago*, and Astray *et al*.,13 who studied *Castanea*. These studies used networks with a single hidden layer, and each network was built with data from a single site as the target, so these models could make predictions only for that site. In addition, many of these ANN models, such as the one proposed by Astray *et al*.,13 used the pollen concentration observed on the previous day as an input, and such data are generally not available for daily operational forecasts.

### MATERIALS AND METHODS

### Pollen observational network in Korea

^{−3}) based on the air intake rate (10 L min^{−1}, i.e., 14.4 m^{3} of air per day), the intake area (14 mm × 2 mm), the collection tape's daily impact area (14 mm × 48 mm), and the area observed under the microscope.
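The conversion from a microscope count to a daily concentration can be sketched as follows. This is a minimal sketch assuming the standard volumetric (Hirst-type) calculation: the count is scaled from the observed area up to the whole daily trace and divided by the volume of air sampled that day. The function names are hypothetical, and the observed area is left as a parameter because its value is not given here; the intake area sets the trace width but does not enter this formula directly.

```python
# Hypothetical helpers; assumes the standard volumetric conversion.
FLOW_L_PER_MIN = 10.0          # sampler intake flow rate given in the text
MINUTES_PER_DAY = 24 * 60
TRACE_AREA_MM2 = 14.0 * 48.0   # tape area impacted in one day (14 mm x 48 mm)


def daily_air_volume_m3(flow_l_per_min=FLOW_L_PER_MIN):
    """Air drawn through the sampler in one day, in cubic metres (1 m^3 = 1000 L)."""
    return flow_l_per_min * MINUTES_PER_DAY / 1000.0


def pollen_concentration(count, observed_area_mm2):
    """Grains per m^3 from a count made over part of the daily trace.

    The count is scaled up from the observed area to the whole trace, then
    divided by the volume of air sampled that day.
    """
    if not 0 < observed_area_mm2 <= TRACE_AREA_MM2:
        raise ValueError("observed area must be positive and within the daily trace")
    grains_on_trace = count * TRACE_AREA_MM2 / observed_area_mm2
    return grains_on_trace / daily_air_volume_m3()


if __name__ == "__main__":
    # Hypothetical count: 30 grains seen over 10% of the daily trace.
    print(round(pollen_concentration(30, 0.1 * TRACE_AREA_MM2), 2))  # prints 20.83
```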

^{−3}) based on the daily pollen concentration and the daily allergic reactions reported by sensitized allergy patients.4

### Weather data

*et al*.,4 the weather variables used in our analysis were daily maximum temperature, daily minimum temperature, growing degree days (*GDD*: mean temperature above 0°C accumulated from January 1), the daily difference in *GDD* (*dGDD*: *GDD* on the current day minus *GDD* on the previous day), daily mean relative humidity, daily mean wind speed, and daily precipitation. To describe the nonlinear relationship between weather conditions and pollen concentration, Kim *et al*.4 fit the input variables to a Weibull probability density function (PDF); here, we fit *GDD* to a Weibull PDF. The input variables ultimately used in this study are shown in Table 1.
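The derived temperature inputs can be illustrated with a short sketch. It assumes that *GDD* is the running sum of daily mean temperatures above 0°C starting on January 1, that *GDD* before January 1 is 0 (so the first day's *dGDD* equals its *GDD*), and that the Weibull transform is the standard two-parameter PDF. The shape and scale values in the usage lines are placeholders, not the fitted parameters used in this study.

```python
import math
from itertools import accumulate


def growing_degree_days(daily_mean_temps):
    """Cumulative GDD from January 1: running sum of daily mean temperature above 0 deg C.

    Days whose mean temperature is at or below 0 deg C add nothing.
    """
    return list(accumulate(max(t, 0.0) for t in daily_mean_temps))


def daily_gdd_difference(gdd):
    """dGDD: GDD on the current day minus GDD on the previous day.

    GDD before January 1 is taken as 0, so the first day's dGDD equals its GDD.
    """
    if not gdd:
        return []
    return [gdd[0]] + [cur - prev for prev, cur in zip(gdd, gdd[1:])]


def weibull_pdf(x, shape, scale):
    """Two-parameter Weibull probability density; 0 for x < 0."""
    if x < 0:
        return 0.0
    if x == 0 and shape < 1:
        return math.inf  # the density diverges at 0 when shape < 1
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


if __name__ == "__main__":
    temps = [-2.0, 1.5, 3.0, 0.0, 4.5]  # hypothetical daily means (deg C)
    gdd = growing_degree_days(temps)
    print(gdd)                          # [0.0, 1.5, 4.5, 4.5, 9.0]
    print(daily_gdd_difference(gdd))    # [0.0, 1.5, 3.0, 0.0, 4.5]
    # shape and scale are placeholders; in practice they would be fitted to data.
    print([weibull_pdf(g, shape=2.0, scale=5.0) for g in gdd])
```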

##### Table 1

### Oak pollen data

^{−3}). The risk grades are similarly skewed: mild, moderate, severe, and extreme occur with frequencies of 92.7%, 3.5%, 2.2%, and 1.6%, respectively. Training on data skewed toward low concentrations produces a model that underestimates concentrations and cannot predict high concentrations, which in turn degrades its ability to issue pollen risk warnings. To reduce underestimation, we varied the sampling size by risk grade when drawing the data used for model training.
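Grade-dependent sampling of this kind can be sketched as a stratified draw with replacement. This is a minimal sketch: the function name, the record layout, and the per-grade sample sizes in the usage lines are hypothetical, since the actual sizes used for each grade are not stated here.

```python
import random
from collections import Counter, defaultdict


def resample_by_grade(records, grade_of, n_per_grade, seed=None):
    """Build a training set with a chosen number of records per risk grade.

    records:     training records, e.g. (concentration, grade) tuples
    grade_of:    function returning a record's risk grade
    n_per_grade: mapping grade -> number of records to draw, with replacement
    """
    rng = random.Random(seed)
    pools = defaultdict(list)
    for rec in records:
        pools[grade_of(rec)].append(rec)

    drawn = []
    for grade, n in n_per_grade.items():
        if pools[grade]:  # skip grades that have no records to draw from
            drawn.extend(rng.choices(pools[grade], k=n))
    rng.shuffle(drawn)
    return drawn


if __name__ == "__main__":
    # Hypothetical skewed data: (concentration, grade) pairs.
    data = ([(5, "mild")] * 90 + [(60, "moderate")] * 5
            + [(200, "severe")] * 3 + [(600, "extreme")] * 2)
    sizes = {"mild": 40, "moderate": 20, "severe": 20, "extreme": 20}  # illustrative only
    sample = resample_by_grade(data, lambda r: r[1], sizes, seed=1)
    print(Counter(g for _, g in sample))
```

Drawing with replacement lets the rare severe and extreme grades be oversampled beyond their raw counts, which is one way to counter the low-concentration bias described above.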

### Training and test data

### DNN-based concentration model for oak pollen

*B* bootstrap sample sets from the training set, training a DNN model on each sample set through pre-training and fine-tuning, taking the mean of the DNN models' predicted values as the final prediction, and finally classifying the risk according to the pollen risk grades (Fig. 1). Here, we used a truncated mean of the predicted values, removing the top and bottom 5% to reduce the effect of outliers. We tested the sensitivity of the model's errors to the number of bootstrap sample sets, *B*, and used the results to set its value. As *B* increased, both the mean absolute error (MAE) and the root mean squared error (RMSE) decreased, converging for *B* above 30. Ultimately, a bootstrap DNN model with *B* = 30 was used as the pollen concentration model.
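The bootstrap aggregation and truncated-mean step can be sketched as below. This is a minimal sketch, not the authors' implementation: to stay self-contained it swaps in a one-variable least-squares line (`fit_line`, hypothetical) where the study trains a pre-trained and fine-tuned DNN. The number of values trimmed from each end is taken as `int(cut * n)`, which is an assumption; with *B* = 30 and a 5% cut, one prediction is dropped from each end.

```python
import random
from statistics import mean


def trimmed_mean(values, cut=0.05):
    """Mean after dropping the lowest and highest `cut` fraction of the values."""
    vals = sorted(values)
    k = int(cut * len(vals))  # number removed from each end
    return mean(vals[k:len(vals) - k])


def bagged_predict(train, x_new, fit, n_models=30, cut=0.05, seed=0):
    """Bootstrap aggregating: fit one model per bootstrap sample, combine by trimmed mean."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_models):
        sample = rng.choices(train, k=len(train))  # bootstrap sample, with replacement
        model = fit(sample)
        preds.append(model(x_new))
    return trimmed_mean(preds, cut)


def fit_line(sample):
    """Stand-in learner (NOT a DNN): ordinary least squares y = a + b*x."""
    xs = [x for x, _ in sample]
    ys = [y for _, y in sample]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


if __name__ == "__main__":
    # Hypothetical noisy training pairs (x, y).
    rng = random.Random(42)
    train = [(x, 2 * x + 1 + rng.gauss(0, 1)) for x in range(50)]
    print(bagged_predict(train, 10.0, fit_line, n_models=30))
```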

### Evaluation of the new model

^{−3} or higher was set as the season's starting date, while the last day on which the pollen concentration was observed (or predicted) to be 10 grains m^{−3} or lower was the season's ending date.
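Season detection from a daily series can be sketched as a threshold scan. This is a minimal sketch under an assumed reading of the definition: the season runs from the first day at or above the threshold to the last day at or above it. The threshold is a required parameter because the start criterion's value is not given here; the function name is hypothetical.

```python
def pollen_season(concentrations, threshold):
    """Return (start_day, end_day, length_in_days) using 0-based day indices, or None.

    Assumed reading: the season starts on the first day with a concentration at or
    above `threshold` and ends on the last such day.
    """
    days = [i for i, c in enumerate(concentrations) if c >= threshold]
    if not days:
        return None  # the threshold was never reached
    start, end = days[0], days[-1]
    return start, end, end - start + 1


if __name__ == "__main__":
    # Hypothetical daily concentrations (grains m^-3).
    daily = [0, 3, 12, 40, 8, 25, 9, 2]
    print(pollen_season(daily, threshold=10))  # (2, 5, 4)
```

Applying the same function to observed and predicted series gives directly comparable start dates, end dates, and season lengths.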

##### (Equation 1)

$$\mathrm{APE} = \frac{100}{n}\sum_{i=1}^{n}\left|\frac{P_i - O_i}{O_i}\right|,$$

where $P_i$ is the predicted value, $O_i$ is the observed value, and $n$ is the number of prediction/observation pairs.

### RESULTS

### Pollen concentration and risk grade

^{−3} in the regression, SVR, and DNN models, respectively. The mean MAE was 23.26, 23.14, and 17.25 grains m^{−3} in the regression, SVR, and DNN models, respectively. In terms of overall MAPE, RMSE, and MAE, the DNN model performed best and the regression model performed worst.
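The three error measures compared here can be written out directly. This is a minimal sketch: `ape` follows Equation 1, and `mae` and `rmse` use their usual definitions. Equation 1 is undefined when an observed value is 0, so this sketch skips such days; that handling is an assumption, not a documented choice of the study.

```python
import math


def ape(pred, obs):
    """Equation 1: (100 / n) * sum(|P_i - O_i| / O_i); days with O_i == 0 are skipped."""
    pairs = [(p, o) for p, o in zip(pred, obs) if o != 0]
    if not pairs:
        raise ValueError("no non-zero observations")
    return 100.0 / len(pairs) * sum(abs(p - o) / o for p, o in pairs)


def mae(pred, obs):
    """Mean absolute error."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def rmse(pred, obs):
    """Root mean squared error."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


if __name__ == "__main__":
    predicted = [110, 90, 50]  # hypothetical values (grains m^-3)
    observed = [100, 100, 50]
    print(ape(predicted, observed), mae(predicted, observed), rmse(predicted, observed))
```

RMSE penalizes large misses more heavily than MAE, so a model that underestimates high-concentration days tends to look worse on RMSE than on MAE.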

##### Table 2

*i.e*., all 3 models had low accuracy. However, in terms of underestimating the extreme grade by only one grade (*i.e*., predicting it as severe), the DNN model performed best, at 54.6% (Table 3).
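The exact-hit and one-grade-underestimate rates for a given grade can be computed from paired observed and predicted grades. This is a minimal sketch; the function name and the example grade sequences are hypothetical, and only the ordering of the four grades comes from the text.

```python
GRADES = ["mild", "moderate", "severe", "extreme"]  # ordered from low to high risk


def grade_rates(observed, predicted, grade):
    """For days observed at `grade`, return (exact-hit %, one-grade-under %), or None."""
    idx = GRADES.index(grade)
    preds = [p for o, p in zip(observed, predicted) if o == grade]
    if not preds:
        return None  # no days observed at this grade
    hits = sum(p == grade for p in preds)
    under = sum(idx > 0 and GRADES.index(p) == idx - 1 for p in preds)
    n = len(preds)
    return 100.0 * hits / n, 100.0 * under / n


if __name__ == "__main__":
    obs = ["extreme"] * 4 + ["mild"]
    pred = ["extreme", "severe", "severe", "moderate", "mild"]
    print(grade_rates(obs, pred, "extreme"))  # (25.0, 50.0)
```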

##### Table 3

### Pollen season

### DISCUSSION

*et al*.23 used an ANN and a support vector machine on data collected with a laser beam and photodetector to automatically identify 8 types of pollen. Oteros *et al*.24 automatically identified pollen by comparing microscope images of samples from collection devices against 58 criteria based on an image library. Other research on automatic detection has been carried out by Kawashima *et al*.,25 O'Connor *et al*.,26 and Wagner and Macher.27

*et al*.13 and Iglesias-Otero *et al*.12 used ANN models that took the previous day's pollen concentration and weather data as input variables to improve pollen concentration prediction. In the present study, we tested a prototype model that uses the previous day's pollen concentration, and its predictions were very similar to the observed pollen concentrations. Automatic observation systems, which could make the previous day's concentrations available in time for forecasting, would therefore be expected to improve the model's predictive power. Until such data are routinely available, however, the current input set, which does not use the previous day's observations, is the best option for daily operational pollen forecasting.

*et al*.,35 Rosa *et al*.,36 and Papa *et al*.37 Using HS to optimize a DNN, with the number of hidden layers and the number of neurons in each layer as the search parameters, is one approach that might determine DNN structures more efficiently.
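Assuming HS here refers to harmony search, a search over the number of hidden layers and neurons per layer could look like the basic integer-valued sketch below. It is not the method of the cited studies. `toy_validation_error` is a hypothetical stand-in for training a DNN and measuring its validation error, and the bounds, bandwidths, and rates (`hms`, `hmcr`, `par`) are illustrative choices.

```python
import random


def harmony_search(objective, bounds, bandwidths, hms=5, hmcr=0.9, par=0.3,
                   iterations=300, seed=0):
    """Minimise `objective` over integer vectors using basic harmony search.

    bounds:     list of inclusive (low, high) integer bounds per parameter
    bandwidths: maximum step per parameter for pitch adjustment
    hms: harmony memory size; hmcr: memory-consideration rate; par: pitch-adjustment rate
    """
    rng = random.Random(seed)
    memory = [[rng.randint(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    scores = [objective(h) for h in memory]

    for _ in range(iterations):
        new = []
        for d, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                value = rng.choice(memory)[d]           # memory consideration
                if rng.random() < par:                  # pitch adjustment
                    value += rng.randint(-bandwidths[d], bandwidths[d])
                    value = min(max(value, lo), hi)
            else:
                value = rng.randint(lo, hi)             # random selection
            new.append(value)
        score = objective(new)
        worst = max(range(hms), key=lambda i: scores[i])
        if score < scores[worst]:                       # replace the worst harmony
            memory[worst], scores[worst] = new, score

    best = min(range(hms), key=lambda i: scores[i])
    return memory[best], scores[best]


def toy_validation_error(params):
    """HYPOTHETICAL stand-in for training a DNN and measuring validation error."""
    layers, neurons = params
    return (layers - 3) ** 2 + ((neurons - 64) / 16) ** 2


if __name__ == "__main__":
    best, err = harmony_search(toy_validation_error, bounds=[(1, 6), (8, 256)],
                               bandwidths=[1, 16])
    print(best, err)
```

In a real search, each call to the objective would mean training and validating a full network, so the number of iterations would be the main cost to budget.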