DATA QUALITY ASSURANCE AND SCORING

Quality assurance

Antibody validation

    Immunohistochemistry (IH)

    Immunofluorescence (IF)

    Immunofluorescence siRNA (IF siRNA)

    Western blot (WB)

    Protein array (PA)

Reliability score

    Immunohistochemistry (IH)

    Immunofluorescence (IF)

RNA approval - cells




Quality assurance

The usefulness of antibodies in different assays depends on both the sensitivity and the specificity of epitope binding. The quality of antibodies in the database is monitored through a number of quality assurance steps. Below is a list of measures taken to ensure that the quality of produced and utilized PrEST antibodies is acceptable. All PrEST antibodies must pass steps 1-3 in order to be used for immunohistochemistry. Steps 4-5 provide a basis for evaluating and scoring antibody validity. All antibodies that provide a reasonable pattern of immunoreactivity are added to the Human Protein Atlas portal. Feedback from the research community is appreciated and needed for continuous curation of the data.

Quality assurance steps for PrEST antibodies generated within the Human Protein Atlas project:

  1. Plasmid inserts are sequenced to assure that the correct PrEST sequence is cloned.
  2. Size of resulting recombinant protein (including the specific PrEST) is analyzed using mass spectrometry to assure that the correct antigen has been produced and purified.
  3. To control for cross-reactivity, affinity purified antibodies are tested for sensitivity and specificity on protein arrays consisting of glass slides with spotted PrEST fragments.
  4. Antibody specificity is analyzed using Western blot in a standardized setup. Total protein lysates from a limited number of tissues (liver and tonsil), cell lines (RT4 and U-251 MG), and human plasma are used to evaluate the antibody target binding in a Western blot setting. Antibodies with a non-supportive routine WB have been revalidated using an over-expression lysate (VERIFY Tagged Antigen(TM), OriGene Technologies, Rockville, MD) as a positive control.
  5. Immunohistochemical staining of normal and cancer tissue is examined by trained pathologists to assure plausible immunohistochemical staining properties.

For commercially available antibodies (CABs), immunohistochemistry has been performed in a manner similar to that used for HPA antibodies. These antibodies have also been tested on Western blots. For each commercially available antibody, a link to the antibody provider is given on the "Antibody/Antigen" page.





Antibody validation

The antibody validation indicates how well the quality assurance data supports the specificity of the antibody towards the expected human target protein in various assays.

For antibodies supplied through commercial or other academic sources, we provide Western blot validation, immunofluorescence validation and immunohistochemistry validation based on literature conformity and, for immunohistochemistry validation, also on RNA consistency. For further validation, we refer to the quality controls provided by the respective company.


Immunohistochemistry (IH)

The result of the immunostaining of each antibody is compared with available gene/RNA/protein characterization data, resulting in two different validations: Literature conformity and RNA consistency. Literature conformity is based on conformance of the expression pattern to available gene/protein characterization data in scientific literature and data from bioinformatic predictions. Extensive or sufficient gene/protein data requires that there is evidence of existence on a protein level and that a substantial quantity of published experimental data is available from literature and public databases. Limited protein/gene data does not require evidence of existence on a protein level and refers to genes for which only bioinformatic predictions and scarce published experimental data is available. RNA consistency is based on a comparison of immunohistochemistry data with the internally generated RNA-Seq data.

The different options of literature conformity are:

  • Consistent with extensive gene/protein characterization data
  • Consistent with gene/protein characterization data
  • Partly consistent with extensive gene/protein characterization data
  • Partly consistent with gene/protein characterization data
  • Not done
  • No available gene/protein characterization data
  • Not consistent with gene/protein characterization data

RNA consistency is scored as follows:

  • Consistent with RNA-Seq data
  • Mainly consistent with RNA-Seq data
  • Mainly not consistent with RNA-Seq data
  • Not at all consistent with RNA-Seq data
  • Cannot be evaluated
  • Not done






Immunofluorescence (IF)

For each cell line, the observed staining is assigned a validation score, classified as either Supportive, Uncertain or Non-supportive based on concordance with available experimental gene/protein characterization data in the UniProtKB/Swiss-Prot database. The validation scores for the three cell lines are then merged into one of the same main categories (Supportive, Uncertain or Non-supportive) to represent the antibody staining in all analyzed cell lines.

Validation scores for Immunofluorescence:

Supportive

  • One/multiple location(s) supported by experimental gene/protein characterization data and supported by ≥1 other antibody.
  • One/multiple location(s) with no available experimental gene/protein characterization data or partly supported and partly conflicting data, but supported by ≥1 other antibody.
  • One/multiple location(s) supported by experimental gene/protein characterization data.
  • Multiple locations partly supported (at least one) by experimental gene/protein characterization data.
  • One/multiple location(s) in cytoplasm (e.g. Golgi apparatus, mitochondria) supported by experimental evidence for cytoplasmic localization.

Uncertain

  • Location not consistent with experimental gene/protein characterization data, but supported by ≥1 other antibody.
  • One/multiple location(s) with no available experimental gene/protein characterization data.
  • Not decisive - One/multiple location(s) where experimental gene/protein characterization data is partly supporting and partly conflicting.
  • No staining.
  • One/multiple location(s) supported by experimental gene/protein characterization data but showing dissimilar staining to ≥1 other antibody.

Non-supportive

  • Location not consistent with experimental gene/protein characterization data.
  • Location not consistent with experimental gene/protein characterization data and showing dissimilar staining to ≥1 other antibody.
  • One/multiple location(s) with no available experimental gene/protein characterization data or partly supported and partly conflicted, but showing dissimilar staining to ≥1 other antibody.

The validation of multi-targeting antibodies (antibodies targeting proteins encoded by two or more genes) is based on the conformance of the expression pattern to available gene/protein characterization data. Similarity between paired antibodies is not taken into account due to the complexity of multiple gene targets.

Validation scores for Immunofluorescence - multi-targeting antibodies:

Supportive

  • The multi-targeting antibody (targeting proteins encoded by two or more genes) yielding a staining pattern consistent with available gene/protein characterization data for all of the genes.
  • The multi-targeting antibody (targeting proteins encoded by two or more genes) yielding a staining pattern partly consistent with available gene/protein characterization data for all of the genes.

Uncertain

  • The multi-targeting antibody yielding a staining pattern with no available gene/protein characterization data.
  • The multi-targeting antibody yielding a staining pattern consistent with available gene/protein characterization data for at least one of the genes but not all.
  • The multi-targeting antibody not yielding a staining pattern.

Non-supportive

  • The multi-targeting antibody yielding a staining pattern not consistent with available gene/protein characterization data.






Immunofluorescence siRNA (IF siRNA)

For each siRNA validation assay a validation score is assigned based on the decrease in antibody-based staining intensity upon target protein downregulation.

Validation scores for immunofluorescence siRNA validation:

Supportive

  • Signal downregulation > 25 % by both siRNAs.
  • Signal downregulation > 25 % by one siRNA and > 10 % by the other.
  • Signal downregulation > 25 % by one siRNA.
  • Signal downregulation < 10 % by one/two siRNAs.
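
As an illustration of how the thresholds above could be applied, the following Python sketch computes a percentage downregulation from two intensity values and reports which listed criterion a pair of siRNAs matches. The formula (relative decrease against a control intensity), the function names and the order in which the criteria are checked are assumptions made for this example; the text above does not specify them, and the sketch does not assign a validation category.

```python
def percent_downregulation(control_intensity, knockdown_intensity):
    """Relative decrease in staining intensity, in percent.

    Assumption: downregulation is measured against a control intensity.
    """
    if control_intensity <= 0:
        raise ValueError("control_intensity must be positive")
    return 100.0 * (control_intensity - knockdown_intensity) / control_intensity


def matching_criterion(down_a, down_b):
    """Return the first listed criterion matched by two siRNA results.

    Inputs are percentages. Criteria are checked in the order listed above
    (an assumption); None is returned when no criterion applies.
    """
    high, low = max(down_a, down_b), min(down_a, down_b)
    if low > 25:
        return "> 25 % by both siRNAs"
    if high > 25 and low > 10:
        return "> 25 % by one siRNA and > 10 % by the other"
    if high > 25:
        return "> 25 % by one siRNA"
    if low < 10:
        return "< 10 % by one/two siRNAs"
    return None  # e.g. both values between 10 % and 25 %
```

For example, intensities of 100 (control) and 60 (knockdown) correspond to a 40 % downregulation under the assumed formula.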






Western blot (WB)

Supportive

  • Bands corresponding to the predicted size in kDa (+/-20%).
  • Band of predicted size in kDa (+/-20%) with additional bands present.

Uncertain

  • Single band larger than predicted size in kDa (+20%) but partly supported by predicted transmembrane region, signal peptide or by other available data.
  • No bands detected.
  • Single band differing more than +/-20% from predicted size in kDa and not supported by predicted transmembrane region, signal peptide or by other available data.

Non-supportive

  • Weak band of predicted size in kDa (+/-20%) but with additional bands of higher intensity also present.
  • Only bands not corresponding to the predicted size.
  • Target too small/large to be analyzed with the present setup.

For antibodies showing non-supportive Western blot data the corresponding image is not shown.
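
The +/-20% size window used in the criteria above can be sketched in Python as follows. This is a minimal illustration, not the scoring procedure itself: the function names are hypothetical, the window limits are assumed to be inclusive, and the sketch only describes a lane in terms of band sizes. Band intensity and supporting data (transmembrane region, signal peptide), which the criteria also consider, are not modelled.

```python
def within_predicted_size(observed_kda, predicted_kda, tolerance=0.20):
    """Return True if a band lies within +/-20% of the predicted size."""
    if predicted_kda <= 0:
        raise ValueError("predicted_kda must be positive")
    # Assumption: the limits of the window are treated as inclusive.
    return abs(observed_kda - predicted_kda) <= tolerance * predicted_kda


def describe_bands(band_sizes_kda, predicted_kda):
    """Describe a lane by band size only (this is not a validation score)."""
    if not band_sizes_kda:
        return "no bands detected"
    matching = [b for b in band_sizes_kda
                if within_predicted_size(b, predicted_kda)]
    if len(matching) == len(band_sizes_kda):
        return "bands of predicted size only"
    if matching:
        return "band of predicted size with additional bands"
    return "only bands not corresponding to the predicted size"
```

For a predicted size of 50 kDa, the window under these assumptions spans 40 to 60 kDa, so a band at 52 kDa matches and a band at 120 kDa does not.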





Protein array (PA)

Supportive

  • Pass with single peak corresponding to interaction only with its own antigen.

Uncertain

  • Pass with quality comment low specificity (binding to 1-2 PrESTs >15% and <40%).

Non-supportive

  • No or weak signal.
  • Low specificity (one antigen with >40% signal or more than two antigens with signal >15%).

Antibodies that are validated as non-supportive are not published.
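
The protein array thresholds above can be expressed as a short Python sketch. Several points are assumptions made for illustration: off-target signals are given as percentages (the reference for the percentage is not stated above), "no or weak signal" is passed in as a flag because no cut-off is given, and a value of exactly 40% is grouped with the 15-40% band since the text leaves that boundary undefined. The function name is hypothetical.

```python
def protein_array_score(off_target_percent, own_antigen_signal_ok=True):
    """Classify a protein array result from off-target binding percentages.

    off_target_percent: signals for PrESTs other than the antibody's own
    antigen, in percent (assumed scale).
    own_antigen_signal_ok: False models "no or weak signal" (assumed flag).
    """
    if not own_antigen_signal_ok:
        return "Non-supportive"
    above_15 = [p for p in off_target_percent if p > 15]
    above_40 = [p for p in above_15 if p > 40]
    # One antigen above 40%, or more than two antigens above 15%.
    if above_40 or len(above_15) > 2:
        return "Non-supportive"
    # Binding to 1-2 PrESTs in the 15-40% range.
    if above_15:
        return "Uncertain"
    # Single peak: interaction only with the antibody's own antigen.
    return "Supportive"
```

Under these assumptions, off-target signals of 20% and 30% give "Uncertain", while a third off-target signal above 15% moves the result to "Non-supportive".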





Reliability score

A reliability score is set for all genes and indicates the level of reliability of the analysed protein expression pattern based on available protein/RNA/gene characterization data.

Immunohistochemistry (IH)

Genes with more than one antibody

The reliability score for genes with more than one antibody is selected manually, based on the output of the annotated protein expression. Experienced personnel evaluate the performance of respective antibodies and compare the staining pattern with available protein/RNA/gene characterization data as well as similarity between paired antibodies and internally generated RNA-Seq data. A similar immunostaining pattern between paired antibodies implies that two or more antibodies directed towards the same protein target show the same cellular and subcellular distribution pattern in a vast majority of analyzed normal tissues. A partly similar immunostaining pattern implies that two or more antibodies directed towards the same protein target show the same cellular and subcellular distribution pattern in a majority of analyzed normal tissues but that the distribution of positivity differs between antibodies in a subset of analyzed tissues.

The following criteria are needed for a gene with two or more antibodies to yield a Supportive reliability score:

  • Paired antibodies with similar or partly similar immunostaining pattern, and RNA-Seq data consistent or mainly consistent with protein expression data.

OR

  • Paired antibodies with similar or partly similar immunostaining pattern, consistent with extensive gene/RNA/protein characterization data with support for the cellular distribution of immunoreactivity, RNA-Seq data unavailable or cannot be evaluated.
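
The two alternative criteria above can be written as a small Python predicate. This is only a sketch of the stated conditions: the score itself is selected manually, and the string labels, the parameter names and the function name are hypothetical simplifications introduced for this example.

```python
def meets_paired_supportive_criteria(similarity, rna, literature=None):
    """Check the stated Supportive criteria for genes with paired antibodies.

    similarity: "similar", "partly similar" or "dissimilar"
    rna: "consistent", "mainly consistent", "unavailable",
         "cannot be evaluated", or another RNA consistency label
    literature: "consistent with extensive data" when the staining agrees
         with extensive characterization data (assumed label)
    """
    if similarity not in {"similar", "partly similar"}:
        return False
    # First alternative: RNA-Seq data consistent or mainly consistent.
    if rna in {"consistent", "mainly consistent"}:
        return True
    # Second alternative: RNA-Seq missing or not evaluable, but the staining
    # is consistent with extensive characterization data.
    return (rna in {"unavailable", "cannot be evaluated"}
            and literature == "consistent with extensive data")
```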

Genes with only one antibody

For genes with only one antibody, the two antibody validations, literature conformity and RNA consistency, are used together for automatic generation of the reliability score of the gene, which is either supportive (premium) or uncertain (not premium). Genes in the RNA categories tissue enriched, group enriched and tissue enhanced, which have higher expression in a small number of tissues than in other analyzed tissues, automatically include positive and negative controls and are hence handled slightly differently with respect to the criteria.

The following criteria are needed for a gene with only one antibody to yield a Supportive reliability score:

  • Protein expression data consistent or mainly consistent with RNA-Seq data.
  • Protein expression consistent with available gene/protein characterization data.
  • For the RNA categories tissue enriched, group enriched and tissue enhanced, also protein expression partly consistent with available gene/protein characterization data or no available gene/protein characterization data in combination with protein expression consistent with RNA-Seq data is acceptable for a supportive reliability score.
  • For the RNA category tissue enriched also the combination of no available gene/protein characterization data with protein expression mainly consistent with RNA-Seq data is acceptable for a supportive reliability score.
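
Since this score is generated automatically, the criteria above lend themselves to a short Python sketch. The reading encoded here is an assumption: the first two bullets are treated as jointly required, and in the third bullet the requirement of consistency with RNA-Seq data is applied to both the "partly consistent" and the "no data" case. The labels and the function name are hypothetical simplifications.

```python
ENRICHED_CATEGORIES = {"tissue enriched", "group enriched", "tissue enhanced"}


def single_antibody_reliability(literature, rna, rna_category=None):
    """Sketch of the reliability score for a gene with one antibody.

    literature: "consistent", "partly consistent", "no data", "not consistent"
    rna: "consistent", "mainly consistent", or another RNA consistency label
    rna_category: RNA category of the gene, if any
    """
    # Base case: both validations support the staining (assumed to be joint).
    if literature == "consistent" and rna in {"consistent", "mainly consistent"}:
        return "Supportive"
    # Relaxed case for tissue enriched, group enriched and tissue enhanced.
    if (rna_category in ENRICHED_CATEGORIES
            and literature in {"partly consistent", "no data"}
            and rna == "consistent"):
        return "Supportive"
    # Further relaxed case for tissue enriched only.
    if (rna_category == "tissue enriched"
            and literature == "no data"
            and rna == "mainly consistent"):
        return "Supportive"
    return "Uncertain"
```

Under this reading, a group enriched gene with no characterization data and only mainly consistent RNA-Seq data stays "Uncertain", whereas the same combination for a tissue enriched gene is "Supportive".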





Immunofluorescence (IF)

The reliability of the annotated protein expression data is scored as supportive or uncertain depending on similarity in immunostaining patterns and consistency with available experimental gene/protein characterization data in the UniProtKB/Swiss-Prot database. Assays referred to in the reliability scores are Western blot (WB) and siRNA. If siRNA validation supports a subcellular localization, it is always considered supportive. The reliability scores are based on the following criteria:

Supportive

  • Two independent antibodies yielding similar or partly similar staining patterns.
  • Two independent antibodies yielding dissimilar staining patterns, both supported by experimental gene/protein characterization data.
  • One antibody yielding a staining pattern supported by experimental gene/protein characterization data.
  • One antibody yielding a staining pattern with no available experimental gene/protein characterization data, but supported by other assay within the protein atlas.
  • One or more independent antibodies yielding staining patterns not consistent with experimental gene/protein characterization data, but supported by siRNA assay.

Uncertain

  • Two independent antibodies yielding partly similar staining patterns but not consistent with experimental protein/gene characterization data.
  • Two independent antibodies yielding dissimilar staining patterns with no available, or partly supportive/partly conflicting, experimental gene/protein characterization data.
  • One antibody yielding a staining pattern with no available, or partly supportive/partly conflicting, experimental gene/protein characterization data.





RNA approval - cells

Antibodies used for the analysis of protein expression in cell lines were validated by comparison of immunohistochemical staining results with available transcript data in 44 cell lines. For two cell lines (LP-1 and Hth83) transcript data is missing.

Several different approval criteria were applied in order to adequately assess the quality of each antibody. The criteria are listed in the table below. Spearman correlation between continuous values of IHC quantification and FPKM values across the set of cell lines constitutes one of the basic strategies. In addition, we also compare categorized expression levels (low, medium and high) set by arbitrary threshold values, in order to avoid the difficulty of comparing continuous numbers generated with two methods offering vastly different levels of accuracy and sensitivity. In brief, the approval is performed automatically from generated expression values, and is designed as a funnel in which the antibodies are tried against the selection criteria with a descending level of stringency. Antibodies approved according to more stringent criteria are denoted "supportive antibodies" (marked with a star in the Human Protein Atlas), while the remaining antibodies are denoted "uncertain".

Approval category | Criteria | Supportive
Expression lymphoid cell lines | ≥20% of lymphoid cell lines medium/high (the same for RNA and protein) AND 100% of remaining cell lines no/low. | Yes
Expression myeloid cell lines | ≥25% of myeloid cell lines medium/high (the same for RNA and protein) AND 100% of remaining cell lines no/low. | Yes
Expression hematopoietic cell lines | ≥40% of hematopoietic cell lines medium/high, one solid tumor cell line allowed to be moderate, remaining solid tumor cell lines no/low. | Yes
Expression solid tumor cell lines | ≥40% of solid tumor cell lines medium/high, one hematopoietic cell line allowed to be moderate, remaining hematopoietic cell lines no/low. | Yes
Expression epithelial cell lines | ≥20% of 15 epithelial cell lines medium/high (the same for RNA and protein) AND 100% of remaining cell lines no/low. | Yes
Expression single cell line | Only 1 cell line medium/high (the same for RNA and protein) AND ≤50% of remaining cell lines low (the rest no expression). | Yes
Expression subset of cell lines | 2-10 cell lines medium/high (the same for RNA and protein) AND ≤50% of remaining cell lines low (the rest no expression). | Yes
Correlation ≥0.65 | Spearman correlation ≥0.65 across 44 cell lines. | Yes
All high/all medium/all low | Either high, medium or low expression across all 44 cell lines. One cell line allowed to deviate in each category.* | Yes
Congruent expression levels | Congruent detection (RNA and protein) of no/low and medium/high expression in all 44 cell lines. | Yes
Congruent expression, highest/lowest | Protein and transcript data reveal detection of expression above threshold in the same cell lines. Additional criteria on cell lines with highest/lowest expression.** | Yes
Correlation ≥0.55, highest/lowest | Spearman correlation ≥0.55 across 44 cell lines; in addition, either the cell line with the highest level or the lowest level of expression must be congruent (RNA and protein). | No
Congruent expression | Protein and transcript data reveal detection of expression above threshold in the same cell lines. | No
All no expression | No expression detected across all 44 cell lines. | No
All no expression/low expression | No or low expression detected in all 44 cell lines. | No
* For all no, one cell line is allowed to show low expression.
For all low, one cell line is allowed to show no or medium expression.
For all medium, one cell line is allowed to show low or high expression.
For all high, one cell line is allowed to show medium expression.
** Transcript and protein data must congruently identify the:
  cell line with the highest expression AND the three cell lines with the lowest expression OR
  cell line with the lowest expression AND the three cell lines with the highest expression OR
  the seven cell lines with the highest expression OR
  the seven cell lines with the lowest expression.
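
The correlation-based criteria can be illustrated with a self-contained Python sketch that computes a Spearman correlation (Pearson correlation of ranks) between IHC quantification and FPKM values and compares it with the 0.65 and 0.55 thresholds from the table. The function names are hypothetical, and the handling of ties by average ranks is an assumption; the text does not state how the correlation is computed in practice. The additional highest/lowest congruence check required for the 0.55 threshold is not implemented here.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def correlation_criterion(ihc_values, fpkm_values):
    """Return the correlation criterion met by the data, if any."""
    rho = spearman(ihc_values, fpkm_values)
    if rho >= 0.65:
        return "Correlation >=0.65"
    if rho >= 0.55:
        # The table additionally requires a highest/lowest congruence check.
        return "Correlation >=0.55 (highest/lowest check still required)"
    return None
```

In use, the two input lists would hold one value per cell line, in the same cell line order, for the 44 cell lines with transcript data.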
