Can IHC Staining Quality be Quantified?

The Role and Challenges of IHC Data
Tissue-based immunohistochemical (IHC) data are important. Sometimes they are the only type of data available to support critical decisions in research, business, or diagnostics. That should be reason enough to care a great deal about data quality, yet data quality remains an area with room for improvement. To be specific, data quality in this context should be understood as sensitivity, specificity, and reproducibility.

Traditionally, IHC-stained tissue sections are read and interpreted by a pathologist using a microscope. It is well known that manual assessment of biomarker expression is associated with significant inter- and intra-reader variability. A recent multicenter study of Ki67 quantified the inter-reader variability associated with this biomarker (1). Studies have shown that image analysis, when used correctly, can be a useful tool for reducing certain types of errors associated with biomarker assessment. As a consequence, regulators, healthcare administrators, reviewers of scientific journal contributions, oncologists, and pathologists themselves are increasingly requesting quantitative image analysis for biomarker assessment.

Yet image analysis does not eliminate or reduce the single most important source of error associated with this type of data: variability in staining quality. This source of error is often neglected, or at least underestimated, but it will have to be addressed effectively for IHC-based data to stay relevant and important.

It would be a serious mistake to believe that image analysis can (or should) be used to compensate for insufficient staining quality. But interestingly, it turns out that image analysis can be used for quantification of staining quality.

Data Quality Quantified
With all the relevant disclaimers, HER2 gene amplification is a widely recognized Gold Standard for HER2. That allows us to explore the relationship between staining quality and data quality (sensitivity and specificity) for HER2 protein expression in a quantitative way.
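As an illustration of what sensitivity and specificity mean here, the following minimal sketch compares IHC-based HER2 calls with gene amplification status. The function name and the six example tumors are made up for illustration and are not data from the study.

```python
def sensitivity_specificity(ihc_positive, amplified):
    """Return (sensitivity, specificity) of IHC calls, using
    gene amplification status as the reference."""
    pairs = list(zip(ihc_positive, amplified))
    tp = sum(1 for p, a in pairs if p and a)          # true positives
    fn = sum(1 for p, a in pairs if not p and a)      # false negatives
    tn = sum(1 for p, a in pairs if not p and not a)  # true negatives
    fp = sum(1 for p, a in pairs if p and not a)      # false positives
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical calls for six tumors (illustrative values only).
ihc_positive = [True, True, False, False, True, False]
amplified = [True, True, True, False, False, False]

sens, spec = sensitivity_specificity(ihc_positive, amplified)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this made-up example, one amplified tumor is missed and one non-amplified tumor is called positive, so both measures come out at two thirds.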

This relationship was explored in a retrospective study presented by Dr. Anja Brügmann at the ECP meeting in London in 2014. The study was based on the NordiQC quality assessment scheme run B12 for HER2 protein expression, involving 228 diagnostic pathology labs (2).

As part of the quality run, the staining quality for each lab was assessed by the five NordiQC assessors, and given one of four marks: Optimal, Good, Borderline, or Poor.

The findings were as follows:

  • Only about 60% of the labs were providing staining of Optimal quality, according to the assessors and the evaluation criteria established for the quality run.
  • For labs with insufficient staining (Borderline or Poor), the sensitivity and specificity with respect to gene amplification dropped by as much as 50 percentage points.
  • Outside the window of optimal staining, the inter-lab variability increased by as much as a factor of 6.
  • The good news is that within the window of optimal staining quality, the inter-lab variability (even across reagent vendors) was comparable to what has been published for gene expression assays.

It is worth noting that most of these 228 pathology labs had been participating in the NordiQC scheme for several years. Many of them had previously received protocol recommendations (with varying levels of compliance). Most were using FDA-approved Ready-To-Use kits and autostainers. All of the participating labs routinely assess HER2 protein expression as a core service. The results reported for other EQA schemes are very similar.

Extrapolating Results and Conclusions
As described above, variability in staining quality has a major impact on data quality for HER2 protein expression. There is no reason to believe that the impact is any smaller for other IHC-based biomarkers. In the NordiQC external quality scheme, about 30% of all stains are marked insufficient (www.nordiqc.org). Moreover, in many research applications, more exotic markers are used under far less controlled circumstances than those described above.

In other words, insufficient staining may seriously compromise our ability to make meaningful inferences and ultimately undermine our ability to make informed decisions based on tissue-based IHC data. Therefore, organizations such as NordiQC, cIQc, UKNEQAS, and CAP are making important (heroic!) efforts to offer EQA and proficiency testing schemes for diagnostic pathology labs. Their impact is significant, positive, and measurable. Still, we see diagnostic pathology labs provide insufficient staining quality and, as a consequence, data quality that is less than optimal.

This is obviously a persistent problem, and the underlying reasons are still discussed within and among EQA organizations. It has been suggested that the lack of objective (quantitative) standards may lead to lack of compliance with protocol recommendations. Another possible explanation is the limited availability of quality runs for various biomarkers, which is due to the limited bandwidth of EQA organizations (3).

We do not know whether addressing these particular limitations will help improve compliance with protocol recommendations and overall staining quality. But we think it is worth asking those questions and exploring whether image analysis can play a role here.

Staining Quality Quantified
A first step is to quantify IHC staining quality. The method we developed for this assumes a reference block with a set of reference formalin-fixed, paraffin-embedded (FFPE) tumors. The reference tumors typically express the biomarker of interest at high, intermediate, and low positive levels, as well as negative. This is standard in most EQA schemes.

In order to establish reference levels of expression for each reference tumor, a limited number of sections from the reference block are cut and stained using optimized, standardized staining protocols. Preferably these sections are stained by the reference lab. The biomarker expression is determined for each reference tumor in each of the stained (reference) sections. The reference level for a given reference tumor is then determined as an average (or median) expression level across sections.
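The averaging step can be sketched as follows. This is a minimal illustration that uses the median and assumes each stained reference section has already been reduced to one expression value per reference tumor. The tumor labels, the function name, and the numbers are hypothetical.

```python
from statistics import median


def reference_levels(reference_sections):
    """Median expression per reference tumor across the reference sections.

    reference_sections: list of dicts mapping tumor label -> measured expression.
    """
    tumors = reference_sections[0].keys()
    return {t: median(s[t] for s in reference_sections) for t in tumors}


# Three hypothetical reference sections cut from the same block.
sections = [
    {"high": 0.92, "intermediate": 0.55, "low": 0.20, "negative": 0.01},
    {"high": 0.88, "intermediate": 0.60, "low": 0.24, "negative": 0.02},
    {"high": 0.90, "intermediate": 0.50, "low": 0.22, "negative": 0.00},
]

levels = reference_levels(sections)
print(levels)
```

The median is used here because it is less sensitive to a single outlying section than the mean. Either choice is consistent with the description above.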

The staining quality for a test section is now determined as the “distance” between the reference expression levels and the test levels (i.e. measured across all tumors). Of course, a number of distance metrics may be defined, but some turn out to be more relevant than others.
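One candidate distance is the root-mean-square difference across the reference tumors, sketched below. This is only an assumed example of such a metric. The text does not specify which metric the method actually uses, and all names and values are hypothetical.

```python
import math


def staining_distance(reference, test):
    """Root-mean-square difference between test and reference
    expression levels, taken across all reference tumors."""
    tumors = sorted(reference)
    squared = [(test[t] - reference[t]) ** 2 for t in tumors]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical reference levels and two hypothetical test sections.
reference = {"high": 0.90, "intermediate": 0.55, "low": 0.22, "negative": 0.01}
close_to_reference = {"high": 0.89, "intermediate": 0.53, "low": 0.21, "negative": 0.01}
weakly_stained = {"high": 0.60, "intermediate": 0.20, "low": 0.02, "negative": 0.00}

d_close = staining_distance(reference, close_to_reference)
d_weak = staining_distance(reference, weakly_stained)
print(f"close: {d_close:.3f}, weak: {d_weak:.3f}")
```

A smaller distance means the test section reproduces the reference expression levels more closely. A section that stains too weakly across all tumors ends up far from the reference.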

HER2 protein expression was calculated based on HER2 Connectivity (4). Based on this approach, and a distance metric defined across all reference tumors, we demonstrated concordance with the five NordiQC assessors in discriminating between sufficient and insufficient staining, with an area under the Receiver Operating Characteristic (ROC) curve of ~0.97.
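For readers unfamiliar with the area under the ROC curve, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen insufficient stain has a larger distance than a randomly chosen sufficient one, with ties counted as one half. The distances and assessor labels are made up and do not reproduce the study data.

```python
def roc_auc(scores, is_insufficient):
    """Area under the ROC curve for separating insufficient from
    sufficient stains using a distance score (higher = worse)."""
    pos = [s for s, y in zip(scores, is_insufficient) if y]
    neg = [s for s, y in zip(scores, is_insufficient) if not y]
    if not pos or not neg:
        raise ValueError("need at least one stain in each class")
    # Count pairs where the insufficient stain scores higher; ties count 0.5.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical distances and assessor marks (True = Borderline or Poor).
distances = [0.02, 0.05, 0.04, 0.30, 0.03, 0.25]
insufficient = [False, False, False, True, True, True]

auc = roc_auc(distances, insufficient)
print(f"AUC = {auc:.3f}")
```

An AUC of 1.0 would mean the distance separates the two groups perfectly, and 0.5 would mean it does no better than chance.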

This method will be simple to implement in quality schemes. It provides an objective measure of staining quality that is easy to interpret. Because it automates aspects of the quality grading, it could make it feasible to increase the frequency of quality runs, or possibly even to provide on-demand staining quality assessments. Such a principle would also be easy for bio-pharmaceutical companies to implement in multi-center trials for controlling staining quality.

The method has been patented by Visiopharm, and is currently under development as a module for our Oncotopix® digital pathology platform.

Watch our webinar: Can IHC Staining Quality be Quantified?


This webinar will discuss how Visiopharm has developed and patented a method for the quantification of staining quality. It is simple to implement in quality schemes and provides an objective measure of staining quality that is easy to interpret. It helps automate aspects of the quality grading, and could make it feasible to increase the frequency of quality runs, or even to provide on-demand quality assessments. Such a principle would also be easy for bio-pharmaceutical companies to implement for controlling staining quality in multi-center trials.

References:

(1) Virtual double staining in the assessment of proliferation markers; Rasmus Røge, Aalborg University Hospital; NordiQC conference 2013; See handout.

(2) Image analysis of breast cancer HER2 protein expression used in assessment of staining quality; Brügmann, Anja Høegh; Grunkin, M.; Nielsen, Søren; Jensen, V.; Heikkilä, P.; Gaspar, V.; Vyberg, Mogens; Virchows Archiv, Vol. 465, Suppl. 1, OFP-07-014, 2014, p. S20. Accessed at http://link.springer.com/article/10.1007/s00428-014-1618-2

(3) NordiQC director, prof. Mogens Vyberg and NordiQC manager, biomedical scientist Søren Nielsen (personal communication)

(4) Digital image analysis of membrane connectivity is a robust measure of HER2 immunostains; Brügmann et al.; Breast Cancer Res Treat; February 2012, Volume 132, Issue 1, pp. 41-49