The aim of this study was to evaluate inter-observer variability for strain ultrasound elastography (USE) and to compare the diagnostic performance of gray-scale ultrasound (US) combined with USE against that of gray-scale US alone. Three observers from different institutions evaluated gray-scale US images and USE video files of 443 cytopathologically proven benign or malignant thyroid nodules over a 3-month period. Inter-observer variability did not differ significantly between USE using the Asteria criteria and gray-scale US; however, USE using the Rago criteria had the lowest inter-observer agreement (p < 0.043). For all three observers, adding USE to gray-scale US increased sensitivity (81.3%-88.3%, 75.4%-85.4%) compared with gray-scale US alone (70.4%-80.8%), but decreased specificity (51.7%-59.1%, 59.1%-73.9%) compared with gray-scale US alone (69.0%-82.8%). USE and gray-scale US had comparable inter-observer variability. However, adding USE to gray-scale US provided only limited additional diagnostic yield over gray-scale US alone.