Comparison of various estimation methods for the parameters of Michaelis-Menten equation based on in vitro elimination kinetic simulation data

Yong-Soon Cho; Hyeong-Seok Lim

doi:10.12793/tcp.2018.26.1.39

Abstract

The Michaelis-Menten equation is one of the best-known models describing the enzyme kinetics of in vitro drug elimination experiments, and takes a form of equation relating reaction rate (V) to the substrate concentration ([S]) via the maximum reaction rate (V_max) and the Michaelis constant (K_m). The current study was conducted to compare the accuracy and precision of the parameter estimates in the Michaelis-Menten equation from various estimation methods using simulated data. One thousand replicates of simulated [S] over serial time data were generated using the results of a previous study, incorporating additive or combined error models as a source of random variables in the Monte-Carlo simulation using R. From each replicate of simulated data, V_max and K_m were estimated by five different methods, including traditional linearization methods and nonlinear ones without linearization using NONMEM. The relative accuracy and precision of the estimated parameters were compared by the median values and their 90% confidence intervals. Overall, V_max and K_m estimation by nonlinear methods (NM) provided the most accurate and precise results from the tested 5 estimation methods. The superiority of parameter estimation by NM was even more evident in the simulated data incorporating the combined error model. The current simulation study suggests that NMs using a program such as NONMEM provide more reliable and accurate parameter estimates of the Michaelis-Menten equation than traditional linearization methods in in vitro drug elimination kinetic experiments.

Go to :

Introduction

The characterization of drug elimination kinetic studies is an essential part of the development process, and it requires accurate and precise estimation of the parameters of the relevant kinetic models. Michaelis-Menten kinetics is one of the bestknown models of enzyme kinetics in in vitro drug elimination or drug-drug interaction experiments. The Michaelis-Menten equation (M-M equation) consists of two parameters, the maximum reaction rate (V_max) and the Michaelis constant (K_m) describing the rate of enzymatic reactions by relating reaction rate (V) to the concentration of a substrate ([S]).[1] By definition, Km is equal to the concentration of the substrate at half V_max. It can also be thought of as a measure of how well a substrate complexes with a given enzyme, otherwise known as its binding affinity. An equation with a low Km value indicates a large binding affinity, as the reaction will approach V_max more rapidly. An equation with a high Km indicates that the enzyme does not bind as efficiently with the substrate, and Vmax will only be reached if the substrate concentration is high enough to saturate the enzyme. Therefore, K_m is an intrinsic parameter of enzyme-catalyzed reactions and it is significant for its biological functions.[2]

$V = \frac{V m a x \times [S]}{K m + [S]}$

Two most commonly used methods for determining the parameters of the M-M equation are the Lineweaver-Burk plot and the Eadie-Hofstee plot, both of which are linearization methods transforming the original nonlinear M-M equation into a linear one, and the data is then fit by a linear regression, which can be displayed as a straight line in 2-dimensional graph. In spite of the simplicity and familiarity of linear regression method, these approaches have some pitfalls. The most common problem with these transformations is the fact that transformed data does not usually satisfy the basic assumptions of linear regression, that is, the distribution of data around the straight regression line follows a Gaussian distribution, and the standard deviation of the distribution is equal at each value of the independent variables. For this reason, there has been increased interest in fitting full time course kinetic data to nonlinear equations based on numerical integration of rate equations without linearization.[3 4 5]

Simulation studies are useful for comparing various estimation methods using virtual in vitro enzyme kinetic data. In this study, we compared the accuracy and precision of parameter estimation for the M-M equation determined by various estimation methods, including traditional linearizing methods and nonlinear regression methods.

Go to :

Methods

Simulation Data Generation

Invertase is the enzyme which catalyzes the reaction which results in the inversion of optical rotation from positive for sucrose to net negative for the sum of fructose plus glucose. Based on the M-M equation and the studied kinetics of invertase (V_max=0.76 mM/min and K_m=16.7 mM), a set of virtual substrate concentration ([S]) data without considering residual errors was simulated for 5 initial substrate concentrations of 20.8, 41.6, 83, 1667.7, and 333 mM and at pre-specified times thereafter using the deSolve package in R 3.3.3.[1]

$\frac{{d[S]}_{p r e d}}{dt} = - \frac{V m a x \times {[S]}_{p r e d}}{K m + {[S]}_{p r e d}}$

where [S]pred is the predicted substrate concentration without error.

The observation time points for each concentration in the simulation were as follows; 0, 17, 27, 38, 62, 95, and 1372 min for initial concentrations of 20.8 mM; 0, 10, 30, 61, 90, 112, 132, and 154 min for 41.6 mM; 0, 50, 90, 125, 151, and 208 min for 83 mM; 0, 8, 16, 28, 52, 82, and 103 min for 166.7 mM; and finally 0, 7,14, 26, 49, 75, 117, and 1052 min for 333 mM.

A set of final virtual experimental data was constructed from the error-free data set by incorporating an additive error model or a combined (additive + proportional) error model.

Additive error model: [S]_i = [S]_pred + ϵ1_i

Combined error model: [S]_i = [S]_pred + ϵ1_i + [S]_pred × ϵ2_i

where [S]_pred is an error free-substrate concentration calculated from the M-M equation, [S] is an error incorporating substrate concentration, ϵ1 is a random variable following a normal distribution with a mean of 0 and a standard deviation of 0.04 (calculated as approximately half of the lowest value of [S]_pred), ϵ2 is a random variable following a normal distribution with mean of 0 and standard deviation of 0.1 (an arbitrarily selected high value), and i is the index of simulation replicates.

A Monte-Carlo simulation with 1,000 replicates for each scenario was carried out using R

M-M Parameter Estimation by Various Estimation Methods and Analysis of the Estimation Results

The V_max and K_m values were estimated from the original simulation dataset or the manipulated simulation data using the following 5 different estimation methods, which are summarized in Table 1.

Table 1

Summary of estimation methods

Estimation Method	Description	Dataset (response variable-explanatory variable)	Regression	Model
LB	Lineweaver-Burk plot	Vi-[S]^*	Linear	$\frac{1}{Vi} = \frac{{Km}^{†}}{{Vmax}^{§}} × \frac{1}{[S]} + \frac{1}{Vmax}$
EH	Eadie-Hofstee plot	Vi-[S]	Linear	$V i = V m a x - K m \times \frac{Vi}{[S]}$
NL	Nonlinear regression to fit Vi-[S] data	Vi-[S]	Nonlinear	$Vi = \frac{Vmax \times [S]}{Km + [S]}$
ND	Nonlinear regression to fit V_ND-[S]_ND^‡	V_ND-[S]_ND	Nonlinear	$V_{N D} = \frac{Vmax \times {[S]}_{N D}}{Km + {[S]}_{N D}}$
NM	Nonlinear regression to fit [S]-time data	[S]-time	Nonlinear	$\frac{d[S]}{dt} = - \frac{V m a x \times [S]}{K m + [S]}$

^*Vi: the initial velocity calculated from substrate concentration versus time ([S]-time) data. For the detailed calculation of Vi, see the text; [S]: substrate concentration ^‡V_ND: the velocity calculated by the average rate of change for substrate concentrations ([S]) of adjacent time points, [S]_ND: the corresponding substrate concentrations to V_ND calculated by the average of substrate concentrations ([S]) of adjacent time points ^†K_m: the Michaelis constant ^§V_max: the maximum reaction rate.

1) Lineweaver-Burk plot (LB)
2) Eadie-Hofstee plot (EH)
3) Nonlinear regression to fit Vi-[S] data (NL), where Vi is the initial velocity.
4) Nonlinear regression to fit V_ND-[S]_ND (ND), where V_ND is the velocity calculated by the average rate of change for the substrate concentrations ([S]) at adjacent time points and [S]_ND is the corresponding substrate concentrations to VND calculated from the averaged substrate concentrations ([S]) of adjacent time points.
5) Nonlinear regression to fit [S]-time data (NM)

A dataset suitable for each estimation method was prepared from the original simulated dataset as follows.

For the estimation methods using initial velocity, i.e. LB, EH, and NL, Vi values were calculated from the original simulated substrate concentration versus time ([S]-time) data using linear regression in R. For each [S], regressions were repeated using the first 3 points, then the first 4 points, and the first 5 points, etc. and the adjusted R² was computed as follows:

Vi = -1 × (the slope calculated from linear regression)

where n is the number of data points in the regression and R2 is the square of the correlation coefficient, respectively. The slope calculated from the regression with the largest adjusted R² value was selected. However, if the adjusted R² did not improve, but was within 0.0001 of the largest adjusted R² value, the regression with the larger number of points was used. The slope was then transformed to a positive value to indicate the velocity of the reaction product as follows:

$A d j u s t e d R^{2} = 1 - \frac{(1 - R^{2}) \times (n - 1)}{(n - 2)}$

For LB and EH, additional manipulation was performed after the Vi values were obtained. For LB, the reciprocal of Vi and [S] was taken and 1/Vi versus 1/[S] data was produced. For EH, Vi/[S] was computed and Vi versus Vi/[S] data was created.

For ND, a velocity (V_ND) (was computed from the average rate of change of [S] for adjacent time points and the corresponding substrate concentration ([S]_ND) (was obtained averaging the substrate concentrations ([S]) of adjacent time points.

$V_{N D} = \frac{{[S]}_{j + 1} - {[S]}_{j}}{t_{j + 1} - t_{j}}$

${[S]}_{N D} = \frac{{[S]}_{j + 1} + {[S]}_{j}}{2}$

where is the index of time points.

For NM, the original simulation dataset by itself was used with no data manipulation.

After manipulating a dataset to a suitable form for each estimation method, estimation was conducted by first order conditional estimation with interactions in NONMEM (version 7.3, ICON Development Solutions, Ellicott City, MD, USA). The models for each analysis methods were as follows and the residual variabilities were applied according to the error model used for the simulation.

$For LB, \frac{1}{Vi} = \frac{Km}{Vmax} \times \frac{1}{[S]} + \frac{1}{Vmax}$

$F o r E H, V i = V m a x - K m \times \frac{Vi}{[S]}$

$F o r N L, V i = \frac{Vmax \times [S]}{Km + [S]}$

$F o r N D, V_{N D} = \frac{Vmax \times {[S]}_{N D}}{Km + {[S]}_{N D}}$

$F o r N M, \frac{d[S]}{dt} = - \frac{V m a x \times [S]}{K m + [S]}$

The median estimated values and 90% confidence intervals (CIs) of V_max and K_m for a particular fitting method were subsequently calculated for the purposes of comparison.

Go to :

Results

Data Fitting by Various Estimation Methods

One thousand replicates of simulated [S]-time data pointswere created for each error scenario, i.e., the additive error and combined error models. The profiles of randomly selected data are presented in Figure 1. Each set of simulation data was fitted by not only traditional estimation methods such as LB and EH but also by nonlinear fitting methods such as NL, ND, and NM using NONMEM. The results fitted by these estimation methods to the selected simulation data of Figure 1 are displayed in Figure 2 and 3. For the simulation data incorporating additive error, there was no minimization failures in all methods while 17 simulations in EH, 14 simulations in NL, 22 simulations in ND, and 383 simulations in NM had minimization failure in the simulation data incorporating combined error. The causes of all failures were rounding errors associated with SIGDIGIT of NONMEM. Not only estimates with successful minimization but those with minimization failure due to rounding error were included in the posthoc analysis.

Figure 1

The substrate concentration ([S])-time profiles of randomly selected simulation data incorporating additive error (a) or combined error (b). The simulated data is shown as differently shaped points for each initial [S].

Figure 2

Curves fitted using 5 different fitting methods, a)-e), for the selected simulation data incorporating additive error in Figure 1. The simulated data are shown as points and the prediction using each fitting methods are shown as dashed lines in a)-e). LB is Lineweaver-Burk plot, EH is Eadie-Hofstee plot, NL is nonlinear regression to the initial velocity (Vi) versus substrate concentration (Vi-[S]) data, ND is nonlinear regression to the velocity versus substrate concentration (V-[S]) data directly transformed from [S]-time data, and NM is nonlinear direct fitting method to substrate concentration ([S])-time data.

Figure 3

Curves fitted using 5 different fitting methods, a)-e), for the selected simulation data incorporating combined error in Figure 1. The simulated data are shown as points and the prediction using each fitting methods are shown as dashed lines in a)-e). LB is Lineweaver-Burk plot, EH is Eadie-Hofstee plot, NL is nonlinear regression to the initial velocity (Vi) versus substrate concentration (Vi-[S]) data, ND is nonlinear regression to the velocity versus substrate concentration (V-[S]) data directly transformed from [S]-time data, and NM is nonlinear direct fitting method to substrate concentration ([S])-time data.

Comparison of M-M parameters determined by Various Estimation Methods

Median values and 90% CIs of the M-M parameters estimated by 5 different methods for the simulation data using additive error and combined error models are presented in Table 2 and 3, respectively. The box-and-whisker plot of these parameters for the two error models are displayed in Figure 4 and 5. The relative accuracy and precision of the estimated parameters were assessed by the median values and 90% CIs. Overall, regardless of the error model used in the simulation data, the median values of both V_max and K_m estimated by NM were closest to the original parameters, and the 90% CIs of NM were also the smallest among the tested estimation methods.

Figure 4

Box-and-whisker plot of Vmax (a) and Km (b) estimated by 5 different fitting methods for simulation data incorporating additive error. Red dashed lines denote true parameter values. LB is the Lineweaver-Burk plot, EH is the Eadie-Hofstee plot, NL is nonlinear regression to the initial velocity (Vi) versus substrate concentration (Vi-[S]) data, ND is nonlinear regression to the velocity versus substrate concentration (V-[S]) data directly transformed from [S]-time data, and NM is nonlinear direct fitting method to substrate concentration ([S])-time data.

Figure 5

Box-and-whisker plot of Vmax (a) and Km (b) estimated by 5 different fitting methods for simulation data incorporating combined error. Red dashed lines denote true parameter values. LB is Lineweaver-Burk plot, EH is Eadie-Hofstee plot, NL is nonlinear regression to the initial velocity (Vi) versus substrate concentration (Vi-[S]) data, ND is nonlinear regression to the velocity versus substrate concentration (V-[S]) data directly transformed from [S]-time data, and NM is nonlinear direct fitting method to substrate concentration ([S])-time data.

Table 2

Michaelis-Menten equation parameters estimated by different metods in the simulation data incorporating additive error

Estimation Methods^*	Vmax		Km
Estimation Methods^*	Median	90%CI	Median	90%CI
LB	0.758	0.756~0.76	22.076	21.568~22.574
EH	0.758	0.756~0.76	22.089	21.59~22.576
NL	0.761	0.759~0.762	22.457	21.969~22.944
ND	0.733	0.731~0.735	16.659	16.566~16.748
NM	0.76	0.759~0.761	16.698	16.638~16.768

^*LB is the Lineweaver-Burk plot, EH is the Eadie-Hofstee plot, NL is nonlinear regression to the initial velocity (Vi) versus substrate concentration (Vi-[S]) data, ND is nonlinear regression to the velocity versus substrate concentration (V-[S]) data directly transformed from [S]-time data, and NM is nonlinear direct fitting method to the substrate concentration ([S])-time data.

Table 3

Michaelis-Menten equation parameters estimated by different methods in the simulation data incorporating combined error

Estimation Methods^*	Vmax		Km
Estimation Methods^*	Median	90%CI	Median	90%CI
LB	0.49	0.405~2.398	9.077	0.167~93.829
EH	0.634	0.431~2.949	19.396	0.167~111.753
NL	0.632	0.438~60480.69	24.522	4.892~4934281
ND	3.67	0.844~26.508	164.313	45.15~205286.2
NM	0.764	0.616~0.972	16.762	14.046~22.281

For the simulation data incorporating additive error, estimated Vmax values were generally unbiased and precise in all the tested methods, although estimation by ND was the most unreliable. With regard to Km, the estimation by NM was the closest to the true value, followed by ND, EH, LB, and NL. Estimations by NM and ND were also the most precise, and the other results were similar.

For the simulation data incorporating combined error, both the estimated V_max and K_m showed a relatively poor accuracy and precision when compared the estimations of the simulation data incorporating additive error, except for the accuracy of estimation by NM. With regard to V_max, the estimation by NM was the most accurate, followed by EH, NL, LB, and ND. With regards to K_m, the estimation by NM was the most accurate and was followed by EH, LB, NL, and ND. The estimation by NM was also the most precise and was followed by LB, EH, ND, and NL for both parameters. Notably, for the simulation data incorporating combined error, while the methods other than NM exhibited considerably unreliable estimations, NM estimation represented a comparable accuracy to that of the additive error data, although the variability of the estimations was slightly increased.

Go to :

Discussion

Undoubtedly, the M-M theory is the foundation upon which most current quantitative analysis of enzymes is built.[6] Advances have been made which describe more complex behavior, such as allosteric regulation or cooperativity in the framework of the M-M theory or outside it, but the basic model is still widely used for most single-substrate enzyme kinetics. The classical approach to deriving the parameters of M-M, i.e.,V_max and K_m, is through initial-rate measurements. Under the quasisteady state approximation, the M-M equation and corresponding nonlinear differential equations collapse into a single hyperbolic function. The initial stages of an array of experiments can describe this relation between reaction speed and substrate concentration. This yields the maximal reaction rate, V_max, and Michaelis constant, K_m, which corresponds to the substrate concentration at half of V_max.[7] Unfortunately, an accurate estimation requires a large amount of data as well as substrate concentrations much higher than K_m, which are often difficult to obtain. Thus, there is great value in directly determining the M-M constants from the [S]-time curve.[8] Here, we have conducted a simulation study to see if direct nonlinear fitting method is reliable compared to the classical approach or some other methods derived from the classical approach.

In this study, 1000 replicates of simulation data were generated based on the original M-M data with assay errors according to 2 different residual error models. The simulation data were fitted by 5 different methods which consisted of 1 nonlinear direct fitting method for [S]-time data (NM) and 4 linear or nonlinear fitting method for transformed V-[S] data (LB, EH, NL, ND).

During data generation process, we incorporated additive error or combined error model as the residual error model into the simulation data. This is because these two models generally reflect the assay errors arising from pharmacokinetic and pharmacodynamic data and the errors originating from model misspecification of experiment data. For dense pharmacokinetic data, the combined error model broadly reflects assay errors, whereas for pharmacodynamic data, additive error model may be sufficient. Exponential and proportional models are generally avoided because of the tendency to overweight low concentrations.

In posthoc analysis, estimates obtained with minimization failure as well as those with successful minimization were used. The causes of all failures were rounding errors associated with SIGDIGIT of NONMEM. Software programs other than NONMEM usually judge minimization success or failure regardless of SIGDIGIT. SIGDIGIT of NONMEM is calculated in the scaled and transformed parameter (STP) space and there are usually more actual significant digits in the original scale.[9] Additionally, the median values and 90% CI for the parameters estimated from the only minimization successful data were almost same as those of the parameters estimated from the whole data set including rounding errors. Considering this, estimates obtained from NONMEM runs with minimization failure due to rounding error seems to be reasonable.

When comparing the median values and 90% CI of the parameters estimated from each fitting method, estimation by NM was the most accurate and precise among the 5 methods, for both error models. That is, the use of nonlinear regression and untransformed data yielded the most reliable estimation of enzyme kinetic parameters. In particular, the superiority of parameter estimation of nonlinear regression directly to [S]-time data was clearly revealed in the simulated data incorporating combined error. When a combined error was applied, NM method performed fairly well and gave comparable answer to those of the simulation data incorporating additive error, especially for point estimates, whereas the other methods gave considerably poorer answers. In the combined error model, % CV (coefficient of variation) is the greatest at the lowest concentration and decreases with the increment of concentration. In the in vitro experiments conducted at low initial concentration with larger assay variability, the difference of biases in the parameter estimation between NM method and the other method can be maximized, because NM method uses dense, serial concentration over time data, providing relatively unbiased estimates, whereas the other methods uses only one or few velocity data, resulting in more biased estimates (as shown in Figure 3). The combined error can represent complex scenarios in which experimental errors such as variation in the measured volume of substrates and enzymes, variations during substrate dilution and imprecisions in the instrumentation are mixed. Although this level of error was higher than is likely in practice, the differences between the 5 methods are seen most clearly using this error model. In addition to reliable estimation, NM bypasses the need to subsequently fit the concentration dependence of the measured rate after fitting the primary kinetic data ([S]-time) to a simplified function, which is the process performed for the other methods.

It was apparent from this study that the ND method gave answers that were the least accurate and generally imprecise. The unreliable estimation of ND method can partly explained by the unequal variance of velocity and the correlated error between velocities induced by the unequal time intervals of the simulation data.

When comparing the linear transformation methods (LB and EH) using linearized Vi-[S] data, EH plots and LB plots gave similarly reliable answers in the additive error model but the EH plot exhibited more accurate and similarly precise estimation in the combined error model, which was consistent with previous investigations.[10] The LB plot has the disadvantage of compressing data points at high substrate concentrations into a small region and emphasizing the points at lower substrate concentrations, which may alter the slope of the line due to the long lever arm effect of the reciprocal plot, leading to a poor estimation of Vmax and Km in the combined error model.[7] On the other hand, the EH plot bypasses the unequal weighting of errors in the reciprocal plot, an implicit fault of LB plot, because it does not use a reciprocal plot and separates the two kinetic parameters, V_max/K_m and V_max as 1/slope and intercept, respectively, thus giving more reliable answers than the LB plot. Although the EH plot produces more reliable estimates, the presence of the dependent variable, velocity, on both axes makes rigorous error analysis difficult. In all linear transformation methods, linear regression was used to estimate the slope and intercept of the straight line, and afterwards V_max and K_m were calculated from the straight line parameters. Although these methods are very useful for data visualization and are still widely employed in enzyme kinetic studies, linear transformation makes them prone to errors, as shown in this study.

When comparing the linear transformation methods (LB and EH) and the nonlinear regression method using transformed Vi-[S] data (NL), NL generally gave similarly accurate but poorer answers than LB and EH, especially in the combined error model. This is likely to be result of the bias caused by transformation from [S]-time to Vi-[S] in which the only first 10-20% of the reaction data was used, although NL is more suitable for fitting to Vi-[S] data itself.

For nonlinear regression, we used NONMEM, the estimation method of which is based on maximum likelihood estimation (MLE) and STP whereas other software packages usually used in enzyme kinetics, such as Excel and GraphPrism, perform fitting based on least square estimation (LSE) and original scaled and untransformed parameters.[11 12] The latter require a good initial guess of parameters and there is no guarantee of convergence to the global minimum.[13 14] The former is generally insensitive to initial values and reaches the global minimum in our experience. Thus, although we did not intensively compare NONMEM-like estimation software and the other software in this study, we recommend using NONMEM-like estimation software for nonlinear regression.

In conclusion, estimating the actual parameters of the M-M equation from data is the aspiration of all researchers involved in enzyme kinetics research. This remains to be perfected though various estimation methods have been used. In this study, the superiority of direct nonlinear regression to [S]-time data was clearly revealed through comparing the accuracy and precision of the parameters of the M-M equation, as determined by various estimation methods from the simulated data, especially simulated data incorporating combined error, which might represent complex experimental situations. Our results suggest that direct nonlinear fitting of [S]-time data using program such as NONMEM can provide a better parameter estimation with regard to accuracy and precision in enzyme kinetics, which is the basis of in vitro drug-drug interaction study.

Go to :