INTRODUCTION
Total testosterone is an important biomarker of various testosterone-related endocrine disorders, and a reliable method for measuring total testosterone is essential [
1]. Immunoassays are available for measuring total testosterone on automated immunochemistry platforms. However, they are subject to significant interference and show positive bias at low concentrations, which are commonly observed in testosterone-deficient men, women, and children. Moreover, agreement among the different immunoassays is poor. Mass spectrometry (MS) methods exhibit higher accuracy and lower variability than immunoassays, especially at low analyte concentrations. The very low concentrations of total testosterone in certain populations therefore necessitate the use of MS-based methods. The need for accurate measurements in patients with low testosterone concentrations has led to position and consensus statements from the Endocrine Society, along with efforts and guidelines to evaluate and improve testosterone testing [
2-
5].
A previous study evaluated selected performance characteristics of the following five automated immunoassays for total testosterone: the Architect 2nd Generation Testosterone assay on the Architect i2000sr system (Abbott Diagnostics, Abbott Park, IL, USA), the Testosterone (TSTO) assay on the ADVIA Centaur system (Siemens, Malvern, PA, USA), the Testosterone assay on the Beckman Coulter UniCel DxI 800 system (Beckman Coulter, Brea, CA, USA), the Testosterone II assay on the Roche Modular E170 system (Roche Diagnostics, Indianapolis, IN, USA), and the Total Testosterone assay on the Immulite 2000 XPi system (Siemens) [
6]. In that study, some methods showed acceptable performance, improving upon earlier testosterone assays; however, at concentrations below 1.9 nmol/L (0.55 ng/mL), which was the upper reference limit for women by the LC-MS/MS method used in that study, all methods showed positive biases, ranging from 14.2% to 63.8%.
To improve consistency among different methods, the Centers for Disease Control and Prevention (CDC) established the Hormone Standardization (HoSt) program for testosterone assays. Standardization of total testosterone measurements in serum is established through method comparison and bias estimation between the CDC reference laboratory and the testing laboratory [
7].
The CDC HoSt program consists of two phases. In phase 1, 40 samples with assigned total testosterone concentrations are sent to the participating laboratory. The laboratory can use these 40 samples to perform a bias assessment and adjust its calibration as needed before the start of phase 2. In phase 2, the laboratory receives four sets of 10 samples with unknown concentrations over the course of 12 months. Certification is issued on the basis of an acceptable overall mean bias criterion of ±6.4%, which is derived from biological variability data. Twenty assays were certified by the CDC HoSt program as of December 2021. Among these, 16 are LC-MS/MS assays and four are chemiluminescence immunoassays (CLIAs) from Siemens [
8]. These data indirectly indicate that standardization of serum testosterone measurement has been achieved to some extent among LC-MS/MS methods but not yet among immunoassays.
We developed and validated an ultra-performance LC-MS/MS (UPLC-MS/MS) assay for quantifying low total testosterone concentrations and compared its results with immunoassay measurements. Finally, we participated in an Accuracy-Based Survey (ABS) of the College of American Pathologists (CAP, Northfield, IL, USA) and in the CDC HoSt program to demonstrate the accuracy of our LC-MS/MS method.
MATERIALS AND METHODS
This method was developed and evaluated in the Special Chemistry Department of GCLabs (Yongin-si, Gyeonggi-do, Korea) from November 2020 to August 2021. Experiments involving human subjects were performed according to the Declaration of Helsinki (2019) and approved by the Institutional Review Board (IRB) of GCLabs (IRB No. GCL-2021-1050-01).
Reagents and chemicals
HPLC-grade methanol and water (Tedia, Fairfield, OH, USA), HPLC-grade acetonitrile (Fisher, Pittsburgh, PA, USA), formic acid (Fluka, Muskegon, MI, USA), and tert-butyl methyl ether (MTBE) (Acros, Fair Lawn, NJ, USA) were used. Calibrators (6Plus1 Multilevel Serum Calibrator Set MassChrom Steroid Panel 2) and quality control (QC) materials (MassCheck Steroid Panel 2 Serum Control Levels I, II, and III) were purchased from Chromsystems (Gräfelfing, Bayern, Germany). Certified reference material (CRM) testosterone (1 mg/mL [3.47 mmol/L] in acetonitrile, molecular weight 288.42 g/mol) was purchased from Cerilliant (Round Rock, TX, USA). The internal standard (IS; testosterone-2,3,4-13C3, 0.1 mg/mL [0.35 mmol/L] in methanol, molecular weight 291.40 g/mol) was purchased from Sigma-Aldrich (St. Louis, MO, USA). The standard reference material (Hormones in Frozen Human Serum, Male Serum and Female Serum, SRM 971a) was purchased from the National Institute of Standards and Technology (NIST) (Gaithersburg, MD, USA). Isolute SLE+ 400 μL Supported Liquid Extraction (SLE) plates were purchased from Biotage (Uppsala, Uppsala, Sweden).
Calibrator, QC material, and IS preparation
The calibrators were reconstituted by adding 3.0 mL of distilled water (DW) to the vials, and calibration curves were constructed using seven concentrations of each calibrator (0, 0.05, 0.25, 0.96, 2.94, 5.78, and 11.50 ng/mL [0, 0.16, 0.86, 3.33, 10.19, 20.04, and 39.88 nmol/L, respectively]). Three QC samples of low, medium, and high concentrations (0.20, 1.46, and 7.92 ng/mL [0.69, 5.06, and 27.46 nmol/L, respectively]) were prepared by adding 3.0 mL of DW to the vials. NIST SRM 971a materials stored at –80°C were thawed and used as internal QC materials. The IS at 0.1 mg/mL (0.35 mmol/L) in methanol was diluted with acetonitrile to yield a 100-mL working solution with a final concentration of 50 ng/mL (173.37 nmol/L).
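The unit conversions and the IS dilution described above can be sketched as follows. This is an illustrative calculation only, assuming the molecular weight of 288.42 g/mol stated for the CRM and a simple C1·V1 = C2·V2 dilution; the function names are hypothetical, and small rounding differences from the reported values are possible.

```python
TESTOSTERONE_MW = 288.42  # g/mol, as stated for the CRM


def ng_per_ml_to_nmol_per_l(conc_ng_ml, mw=TESTOSTERONE_MW):
    """Convert a mass concentration (ng/mL) to a molar concentration (nmol/L).

    ng/mL equals ug/L; dividing by g/mol gives umol/L, and x1000 gives nmol/L.
    """
    return conc_ng_ml * 1000.0 / mw


def stock_volume_ul(stock_ng_ml, final_ng_ml, final_volume_ml):
    """Volume of stock (uL) needed for a dilution, from C1 * V1 = C2 * V2."""
    return final_ng_ml * final_volume_ml / stock_ng_ml * 1000.0


if __name__ == "__main__":
    # Calibrator level at 0.96 ng/mL expressed in nmol/L
    print(round(ng_per_ml_to_nmol_per_l(0.96), 2))
    # 0.1 mg/mL (100,000 ng/mL) IS stock diluted to 50 ng/mL in 100 mL
    print(round(stock_volume_ul(100_000, 50, 100), 1))
```

Under these assumptions, about 50 μL of the IS stock would be needed per 100 mL of working solution.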
Sample preparation
Twenty microliters of IS working solution was transferred into each well of a 96-well plate. Then, 200 μL of calibrators, controls, NIST materials, and sera from study subjects were added into each well. The solution in each well was mixed by pipetting up and down 10 times using a 12-channel pipette. A SLE plate was placed on a deep-well plate to collect the extraction solvent. The mixed samples were loaded onto the SLE plate and drawn into the plate by applying vacuum using a vacuum pump. Then, 800 µL of MTBE solvent was added for extraction under vacuum. The solvent was evaporated to dryness and reconstituted in 100 μL of 50% methanol by vortexing. Twenty microliters of the reconstituted eluate was injected into the LC-MS/MS instrument.
LC-MS/MS
The ExionLC UPLC system (Sciex, Framingham, MA, USA) equipped with a Kinetex C18 column (3.0×100 mm, 2.6 μm; Phenomenex, Torrance, CA, USA) was used for LC. The mobile phases were 0.1% formic acid in DW (A) and 0.1% formic acid in acetonitrile (B). The following gradient was applied to the column: 0 minutes, 60% A and 40% B; 0.5 minutes, 60% A and 40% B; 4 minutes, 5% A and 95% B; 4.5 minutes, 5% A and 95% B; 4.6 minutes, 60% A and 40% B; and 7 minutes, 60% A and 40% B. The total run time was 7 minutes; the flow rate was 0.30 mL/min.
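For readers who want to reproduce the gradient programmatically, the time points above can be written as a small table. The sketch below assumes linear ramps between consecutive set points, which the text does not state explicitly; the names `GRADIENT` and `percent_b` are hypothetical.

```python
# (time in minutes, % mobile phase B); % A is 100 - % B
GRADIENT = [
    (0.0, 40.0),
    (0.5, 40.0),
    (4.0, 95.0),
    (4.5, 95.0),
    (4.6, 40.0),
    (7.0, 40.0),
]


def percent_b(t_min, program=GRADIENT):
    """Return % B at time t_min, assuming linear ramps between set points."""
    if not program[0][0] <= t_min <= program[-1][0]:
        raise ValueError("time is outside the gradient program")
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise ValueError("gradient program is not sorted by time")


if __name__ == "__main__":
    for t in (0.25, 2.25, 4.25, 6.0):
        b = percent_b(t)
        print(f"{t:.2f} min: {100 - b:.1f}% A, {b:.1f}% B")
```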
We used the Triple Quad 6500+ (Sciex) MS/MS system equipped with an electrospray ionization source operated in positive ion mode. Nitrogen gas was used for nebulization, desolvation, and collision. The analytes were monitored in the multiple reaction monitoring (MRM) mode. The transitions and other MS/MS settings are shown in
Table 1. Quantitation was performed based on the ratio of the integrated peak area of testosterone to that of the IS and was calculated using Analyst Instrument Control and Data Processing Software (version 1.7.1 with HotFix1, AB Sciex).
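Quantitation by analyte-to-IS peak area ratio can be illustrated with a minimal sketch. The regression model and weighting are not specified in the text, so an unweighted linear fit is assumed here; the area ratios are hypothetical values generated for illustration, and the actual calculation was performed by the Analyst software.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantify(area_analyte, area_is, slope, intercept):
    """Back-calculate a concentration from the analyte/IS peak area ratio."""
    ratio = area_analyte / area_is
    return (ratio - intercept) / slope


# Calibrator concentrations (ng/mL) from the calibrator set described above
CAL_NG_ML = [0, 0.05, 0.25, 0.96, 2.94, 5.78, 11.50]
# Hypothetical area ratios (not measured data): ratio = 0.001 + 0.2 * conc
HYPOTHETICAL_RATIOS = [0.001 + 0.2 * c for c in CAL_NG_ML]

SLOPE, INTERCEPT = fit_line(CAL_NG_ML, HYPOTHETICAL_RATIOS)

if __name__ == "__main__":
    # A sample whose analyte peak area is half that of the IS
    print(round(quantify(500.0, 1000.0, SLOPE, INTERCEPT), 3))
```

Because the same amount of IS is added to every well, dividing by the IS area compensates for well-to-well differences in extraction recovery and ionization.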
Assay performance analysis
Analytical performance characteristics, including precision, accuracy, linearity, lower limit of quantitation (LLOQ), carryover, matrix effects, and stability, were evaluated, and method comparisons were conducted in accordance with the US Food and Drug Administration Center for Drug Evaluation and Research (CDER) Bioanalytical Method Validation Guidance for Industry [
9], CLSI guidelines [
10,
11], a previous study [
12], and review articles on LC-MS/MS laboratory development and operation [
13,
14].
Intra-run precision was assessed using five replicates in a single run for five days, and inter-run precision was assessed using 20 separate runs over 20 days, with a single run per day and five analyte concentrations. These five concentrations comprised the three QC materials and the two NIST SRM 971a Hormones in Frozen Human Serum materials: 0.32 ng/mL (1.12 nmol/L) with an uncertainty limit of ±0.005 ng/mL (0.01 nmol/L) for the Female Serum and 5.81 ng/mL (20.14 nmol/L) with an uncertainty limit of ±0.090 ng/mL (0.31 nmol/L) for the Male Serum. Accuracy was assessed using the same NIST SRM 971a materials. The acceptance limits were a CV of ≤5.3% for precision and a bias within ±6.4% of the nominal concentrations for accuracy. Analytical performance goals for testosterone measurement on the basis of biological variability were set according to the CDC HoSt program participant protocol acceptance criteria [
8], which were taken from the study by Yun,
et al. [
15]. The linearity of the response was assessed by mixing a 1/800 dilution of NIST SRM 971a Male Serum as the low-concentration material and 50 ng/mL (173.37 nmol/L) CRM as the high-concentration material to achieve final concentrations of 0.007, 0.02, 0.03, 0.06, 12.54, 25.03, 37.52, and 50.00 ng/mL (0.02, 0.05, 0.10, 0.20, 43.50, 86.79, 130.09, and 173.37 nmol/L, respectively). The LLOQ was evaluated using NIST SRM 971a Male Serum diluted with DW, with <20% CV (eight concentrations×five times). Carryover was evaluated according to the following equation, with four serial measurements of two levels of calibrators (calibrators 1 and 6):
Here, L indicates the low-concentration calibrator and H the high-concentration calibrator. The acceptance criterion for carryover was within ±1.0%.
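The precision and accuracy criteria described above (CV ≤5.3% and bias within ±6.4% of the nominal concentration) can be checked with a short calculation. The replicate values below are hypothetical and serve only to illustrate the arithmetic; the function names are not taken from any software used in this study.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation (%) using the sample standard deviation."""
    return stdev(values) / mean(values) * 100.0


def bias_percent(values, nominal):
    """Percent bias of the mean measured value relative to the nominal value."""
    return (mean(values) - nominal) / nominal * 100.0


def is_acceptable(values, nominal, cv_limit=5.3, bias_limit=6.4):
    """Apply the CV and bias acceptance limits described in the text."""
    return (cv_percent(values) <= cv_limit
            and abs(bias_percent(values, nominal)) <= bias_limit)


if __name__ == "__main__":
    # Hypothetical replicates (ng/mL) for the NIST SRM 971a Female Serum,
    # whose certified value is 0.32 ng/mL
    replicates = [0.33, 0.32, 0.33, 0.34, 0.33]
    print(f"CV = {cv_percent(replicates):.2f}%")
    print(f"bias = {bias_percent(replicates, 0.32):.2f}%")
    print("acceptable:", is_acceptable(replicates, 0.32))
```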
Ion suppression was evaluated by the post-column infusion method. In brief, a standard testosterone solution with a concentration of 200 ng/mL (693.48 nmol/L) was continuously infused directly into the mass detector at a flow rate of 7 μL/min, while DW and six extracted participant samples were injected into the UPLC column at a flow rate of 0.2 mL/min. If there was a significant change in the detection level, ion suppression was considered to have occurred at the point at which the change was observed.
To evaluate the stability of testosterone concentrations in serum, 10 serum samples stored at –20°C were exposed to one to three repeated freeze-thaw cycles, and testosterone concentrations were measured after each cycle. In addition, 10 serum samples were stored at 4°C and –20°C, and testosterone concentrations were measured at 3, 7, 14, and 28 days and at 3, 7, 14, 28, and 60 days, respectively. The acceptance criterion was that the accuracy (% nominal) at each level should be within ±15%.
Verification of reference intervals (RIs)
After comparing the lower limits of all reference values from Mayo Clinic Laboratories, Quest Diagnostics, and ARUP Laboratories based on LC-MS/MS and the in-kit reference values of a chemiluminescent microparticle immunoassay (CMIA) and electrochemiluminescent immunoassay (ECLIA) (
Supplemental Data Table S1), we transferred the lowest reference value limits among them. For verification of the transferred RIs, we measured the testosterone concentrations in residual samples from 30 healthy men and 30 healthy women aged 20–49 years and 30 healthy men and 30 healthy women older than 50 years. The results were statistically analyzed and considered appropriate if 27 out of 30 (in the 95% confidence interval) measurements were within the desired RI according to CLSI guidelines [
16].
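The verification rule described above (at least 27 of 30 measurements within the transferred RI) amounts to a simple count. The sketch below uses hypothetical values and a hypothetical function name; it is not the EP Evaluator procedure used in this study.

```python
def verify_reference_interval(values, lower, upper, min_inside=27):
    """Count values inside [lower, upper] and apply the 27-of-30 rule."""
    if len(values) != 30:
        raise ValueError("the rule described here assumes 30 measurements")
    inside = sum(1 for v in values if lower <= v <= upper)
    return inside, inside >= min_inside


if __name__ == "__main__":
    # Hypothetical results (ng/mL) checked against an RI of 0.08-0.48 ng/mL
    results = [0.20] * 28 + [0.60, 0.05]
    inside, verified = verify_reference_interval(results, 0.08, 0.48)
    print(f"{inside} of 30 within the RI; verified: {verified}")
```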
Method comparison of LC-MS/MS and CMIA
We measured serum testosterone concentrations in 40 random subjects having testosterone concentrations of 0.04–16.00 ng/mL (0.14–55.48 nmol/L) by LC-MS/MS and CMIA using the Architect 2nd Generation Testosterone assay on the Architect i2000sr system. Additionally, we measured serum testosterone concentrations in 160 subjects (40 men, 40 women, 40 boys [<20 years], and 40 girls [<20 years]) with unknown clinical histories, having testosterone concentrations <0.48 ng/mL (1.67 nmol/L), which is the upper reference limit for women <50 years by LC-MS/MS.
Statistical analysis
EP Evaluator (Data Innovations, Burlington, VT, USA) was used for validation of the selected RIs and for method comparisons using Passing–Bablok regression analysis and percent bias plots. P-values <0.05 were considered significant.
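For readers unfamiliar with Passing–Bablok regression, a minimal sketch of the point estimates is given below. It follows the commonly described procedure (shifted median of all pairwise slopes, then the median of the residual offsets) and omits confidence intervals; it is not the EP Evaluator implementation, and the data in the usage example are hypothetical.

```python
from statistics import median


def passing_bablok(x, y):
    """Return (slope, intercept) point estimates of Passing-Bablok regression."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    slopes = []
    n = len(x)
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx = x[j] - x[i]
            dy = y[j] - y[i]
            if dx == 0 and dy == 0:
                continue  # identical points carry no slope information
            if dx == 0:
                slopes.append(float("inf") if dy > 0 else float("-inf"))
                continue
            s = dy / dx
            if s == -1:
                continue  # slopes of exactly -1 are excluded
            slopes.append(s)
    if not slopes:
        raise ValueError("no usable pairwise slopes")
    slopes.sort()
    count = len(slopes)
    shift = sum(1 for s in slopes if s < -1)  # offset correcting the median
    upper_index = count // 2 + shift
    if upper_index >= count:
        raise ValueError("data are not positively correlated")
    if count % 2 == 1:
        slope = slopes[upper_index]
    else:
        slope = 0.5 * (slopes[upper_index - 1] + slopes[upper_index])
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


if __name__ == "__main__":
    # Hypothetical paired results (ng/mL): reference method vs. test method
    reference = [0.10, 0.25, 0.50, 1.00, 4.00, 8.00]
    test = [0.14, 0.30, 0.53, 1.02, 4.05, 7.90]
    slope, intercept = passing_bablok(reference, test)
    print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

Unlike ordinary least squares, this approach does not assume that the comparison method is free of error, which is why it is widely used for method comparisons.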
RESULTS
Representative UPLC-MS/MS chromatograms of serum total testosterone are shown in
Fig. 1. The inter- and intra-run imprecision ranged from 0.48% to 2.81% CV, which is lower than the minimal requirement based on biological variation. The percentage bias for accuracy ranged from 1.55% to 3.85% and was within the acceptable range (
Table 2). The linearity range was 0.008–52.16 ng/mL (0.03–180.84 nmol/L), with R²=0.9999. The LLOQ was 0.008 ng/mL (0.03 nmol/L). The measured mean concentrations at the eight levels prepared using SRM 971a Male Serum, with the % bias and CV (%) in parentheses, were as follows: 0.008 ng/mL (0.03 nmol/L) (6.85, 10.73), 0.02 ng/mL (0.06 nmol/L) (8.97, 11.32), 0.03 ng/mL (0.10 nmol/L) (4.83, 8.58), 0.06 ng/mL (0.21 nmol/L) (2.62, 0.92), 12.58 ng/mL (43.62 nmol/L) (0.30, 1.25), 26.18 ng/mL (90.77 nmol/L) (4.60, 1.28), 39.19 ng/mL (135.88 nmol/L) (4.45, 2.52), and 52.16 ng/mL (180.84 nmol/L) (4.31, 2.38). Although the diluted sample does not represent the authentic matrix, the absence of ionization suppression supports the determination of the LLOQ. There was no carryover effect. No significant ion suppression or enhancement was observed at the corresponding retention time (
Supplemental Data Fig. S1). Testosterone in serum was stable over three freeze-thaw cycles, for 28 days at 4°C, and for 60 days at –20°C.
After validation of the selected RIs, we determined them to be as follows: for men aged 20–49 years, 2.49–8.36 ng/mL (8.63–28.99 nmol/L); for men older than 50 years, 1.93–7.40 ng/mL (6.69–25.66 nmol/L); for women aged 20–49 years, 0.08–0.48 ng/mL (0.29–1.67 nmol/L); and for women older than 50 years, 0.03–0.41 ng/mL (0.10–1.41 nmol/L).
The results of the method comparison of UPLC-MS/MS and the Architect CMIA are shown in
Fig. 2 and
Supplemental Data Table S2. The correlation was good (R=0.989, slope=0.995) in 40 random subjects having total testosterone concentrations of 0.04–16.00 ng/mL (0.14–55.48 nmol/L); however, the CMIA showed positive percent bias compared with LC-MS/MS in men, women, boys, and girls having testosterone concentrations <0.48 ng/mL (1.67 nmol/L), i.e., 20.36% in all (N=160), 64.46% in men, 7.04% in women, 29.99% in boys, and 15.76% in girls.
Our results in the external proficiency tests of the CAP ABS-B and the CDC HoSt program were all acceptable.
DISCUSSION
We developed and validated a UPLC-MS/MS method for quantifying low total testosterone concentrations in serum. All assay performance characteristics were satisfactory. Most importantly, the LC-MS/MS method had a lower LLOQ and a wider linearity range than conventional immunoassays. The LLOQ of the LC-MS/MS method was 0.008 ng/mL (0.03 nmol/L), which is approximately 2.5 times lower than that of the 2nd Generation Testosterone assay on the Architect i2000sr system (0.02 ng/mL [0.08 nmol/L] according to the manufacturer). The linearity range of our method was 0.008–52.16 ng/mL (0.03–180.84 nmol/L), approximately three times wider than that of the Architect CMIA, which is 0.04–18.62 ng/mL (0.13–64.56 nmol/L) according to the manufacturer [
17]. In fact, one male participant had a testosterone concentration of 0.02 ng/mL (0.07 nmol/L) as measured by LC-MS/MS, which was not detected by the CMIA. Furthermore, the Architect CMIA showed positive percent bias compared to our method, which is consistent with a previous finding of 19.2% positive bias for the Architect CMIA [
6].
In a study by Moal,
et al. [
18], none of the five immunoassays tested demonstrated sufficiently reliable results at testosterone concentrations <1.00 ng/mL (3.47 nmol/L), whereas LC-MS/MS precisely measured low testosterone concentrations. Kushnir,
et al. [
19] developed an LC-MS/MS method for measuring testosterone in women and children. The LLOQ of their method was 0.04 nmol/L (0.01 ng/mL). The authors suggested that the sensitivity and specificity of their method were adequate for the analysis of testosterone in samples from women and children. The LLOQ of our method is even lower and thus adequate for the analysis of testosterone in samples from women and children with low testosterone concentrations. Testosterone concentrations measured by CLIA on the Immulite 2000 system (Siemens) tended to be higher than those obtained by LC-MS/MS [
20]. However, two samples with low testosterone concentrations among 35 samples were not detected by CLIA but were detected by LC-MS/MS. This is in line with our findings.
In an inter-laboratory comparison study of serum total testosterone measurements using LC-MS/MS, the variability of total testosterone results among MS assays was substantially lower than that reported for immunoassays [
21]. Our results in the external proficiency tests were all acceptable. In the CAP Y-A 2021 Ligand-Special program, results are considered acceptable when they are within the peer group mean±3 SD, which is the allowable limit according to the CAP [
22]. In the CAP ABS-B 2021, 22 of the 66 participating laboratories measured testosterone concentrations using MS (44 laboratories used EIA); however, as this was an educational challenge program, CAP did not report our grade [
23]. By our own evaluation against a bias goal of 6.4%, all of our results were acceptable. Accuracy-based proficiency testing can significantly contribute to improving testosterone testing by providing reliable data on accuracy in patient care to laboratories, assay manufacturers, and standardization programs [
24]. MS methods in the 22 abovementioned laboratories showed accurate median concentrations with narrow ranges, whereas the EIAs showed variable median concentrations with wider ranges according to the manufacturers [
23]. Our results in the CDC HoSt program were also satisfactory. The mean bias was 4.8% in phase 1 and 3.2% in phase 2 (Q3–4 of 2021), both of which are within the acceptable overall mean bias of ±6.4%. The HoSt certificate will be issued for one year of performance after the additional consecutive tests conducted in each quarter are passed.
In the future, LC-MS/MS should be applied not only as a routine primary measurement method for low testosterone concentrations but also as a reference measurement method in national projects in which the accuracy of the results is of utmost importance. We are currently participating in the Biomarker Panel Data Production program supervised by the National Biobank of Korea, in which serum testosterone concentrations from healthy men and women are measured using UPLC-MS/MS.
We recommend defining a medical decision point for choosing between immunoassays and LC-MS/MS when measuring low testosterone concentrations. Below the medical decision point, testosterone concentrations would be better measured using LC-MS/MS than using immunoassays. Based on our findings and those of other studies, the concentrations below which various immunoassays show positive bias, and which can therefore serve as medical decision points for selecting the measurement method, are as follows: 0.48 ng/mL (1.67 nmol/L) (the present study), 0.55 ng/mL (1.90 nmol/L) [
6], 1.00 ng/mL (3.47 nmol/L) [
18], and 0.50 ng/mL (1.73 nmol/L) [
19]. Thus, serum testosterone concentrations of 0.48–1.00 ng/mL (1.67–3.47 nmol/L) can be considered medical decision points to select the best measurement method in clinical laboratories. We suggest a limit of 0.48 ng/mL (1.67 nmol/L), which is the upper limit of the RI for women of 20–49 years of age based on UPLC-MS/MS.
There are several Korean studies on LC-MS/MS-based measurement of testosterone concentrations [
25-
27]. Testosterone in neonates has been measured using dried blood spot multiplexed steroid profiling using LC-MS/MS [
25]. Serum testosterone has been measured for monitoring chemical castration agents [
26] and for serum steroid profiling in healthy children and adults [
27] using LC-MS/MS. However, fewer than three institutions provide LC-MS/MS clinical services, and only in limited situations.
In conclusion, we developed a UPLC-MS/MS method for measuring low total testosterone concentrations, compared it with an immunoassay, and demonstrated its adequate performance characteristics. Method accuracy was demonstrated by the UPLC-MS/MS analysis of SRM materials and by participation in several accuracy-based proficiency testing programs. Our method has a lower LLOQ and a wider linearity range than conventional immunoassays. It enables precise measurement of low total testosterone concentrations, especially in children and women, in Korea.