ASSAYS AND ANNOTATION

Immunohistochemistry - tissues

Immunohistochemistry - cells

Immunofluorescence - cells

Immunofluorescence - siRNA validation

Western blot

Protein array

RNA

Evidence




Immunohistochemistry - tissues

Description

The protein atlas contains histological images obtained from sections of human tissues. The images represent a view similar to what is seen in a microscope when examining sections of tissue on glass slides. Each antibody in the database has been used for immunohistochemical staining of both normal and cancer tissue. The specific binding of an antibody to its corresponding antigen results in a brown color. The tissue section is also counterstained with hematoxylin to enable visualization of microscopical features. Hematoxylin staining is unspecific and results in a blue coloring of both cells and extracellular material. The immunohistochemical protocol is standardized (available for download) and performed in an identical manner every time; the only variables are the primary antibody dilution (optimized for every individual antibody) and the secondary antibody (host species dependent).

Tissue microarrays make it possible to immunohistochemically stain a large number and variety of normal and cancer tissues (movie about tissue microarray production and immunohistochemical staining). The generated tissue microarrays include samples from 144 individuals corresponding to 44 different normal tissue types, and samples from 216 cancer patients corresponding to 20 different types of cancer. Each sample is represented by 1 mm tissue cores, resulting in a total of 576 images for each antibody. Normal tissues are represented by samples from three individuals each (except for endometrium, skin, soft tissue and stomach, which are represented by samples from six individuals each), one core per individual, and protein expression is annotated in 83 different normal cell types present in these tissues. For cancer tissues, two cores are sampled from each individual and protein expression is annotated in tumor cells. Normally, a small fraction of the 576 images is missing for each antibody due to technical issues. Specimens containing normal and cancer tissue have been collected and sampled from anonymized paraffin embedded material of surgical specimens, in accordance with approval from the local ethics committee.
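
The image count given above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the constant names are illustrative only and are not part of any protein atlas software.

```python
# Counts taken from the description above; all names are illustrative.
NORMAL_TISSUE_TYPES = 44
TYPES_WITH_SIX_DONORS = 4        # endometrium, skin, soft tissue, stomach
DONORS_DEFAULT = 3
DONORS_EXTENDED = 6
CORES_PER_NORMAL_DONOR = 1

CANCER_PATIENTS = 216
CORES_PER_CANCER_PATIENT = 2

# Normal tissues: 40 types with three donors, 4 types with six donors.
normal_donors = ((NORMAL_TISSUE_TYPES - TYPES_WITH_SIX_DONORS) * DONORS_DEFAULT
                 + TYPES_WITH_SIX_DONORS * DONORS_EXTENDED)
normal_cores = normal_donors * CORES_PER_NORMAL_DONOR

# Cancer tissues: two cores per patient.
cancer_cores = CANCER_PATIENTS * CORES_PER_CANCER_PATIENT

total_images = normal_cores + cancer_cores
print(normal_donors, normal_cores, cancer_cores, total_images)
# prints: 144 144 432 576
```

The result agrees with the 144 individuals and 576 images per antibody stated in the text.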

Since specimens are derived from surgical material, normal is here defined as non-neoplastic and morphologically normal. It is not always possible to obtain fully normal tissues, and thus several of the tissues denoted as normal will include alterations due to inflammation, degeneration and tissue remodeling. In rare tissues, hyperplasia or benign proliferations are included as exceptions. It should also be noted that within normal morphology there may exist inter-individual differences and variations due to primary diseases, age, sex etc. Such differences may also affect protein expression and thereby immunohistochemical staining patterns.

Samples from cancer are also derived from surgical material. Due to subgroups and heterogeneity of tumors within each cancer type, included cases represent a typical mix of specimens from surgical pathology. The inclusion of tumors is based on availability and representativity; however, an effort has been made to include high and low grade malignancies where applicable. In certain tumor groups, subtypes have been included, e.g. breast cancer includes both ductal and lobular cancer, lung cancer includes both squamous cell carcinoma and adenocarcinoma, and liver cancer includes both hepatocellular and cholangiocellular carcinoma. Tumor heterogeneity and inter-individual differences may be reflected in diverse expression of proteins, resulting in variable immunohistochemical staining patterns.

Annotation

In order to provide an overview of protein expression patterns, all images of immunohistochemically stained tissues were manually annotated by a board certified pathologist or by specially educated personnel, followed by verification by a pathologist. The pathologists are experienced in interpretation of tissue morphology under the microscope and have used specially designed software to view and annotate the histological images. Annotation of each normal and cancer tissue was performed using fixed guidelines for classification of immunohistochemical outcome. Each tissue was examined for representativity and immunoreactivity, and the tissue specific cell types included in each normal tissue, or the tumor cells included in the cancer tissues, were annotated. Basic annotation parameters included an evaluation of i) staining intensity (negative, weak, moderate or strong), ii) fraction of stained cells (rare, <25%, 25-75% or >75%) and iii) subcellular localization (nuclear and/or cytoplasmic/membranous). The manual annotation also provides a summarizing text describing the staining pattern for each antibody.

The terminology and ontology used is compliant with standards used in pathology and medical science. SNOMED classification has been used for assignment of topography and morphology. SNOMED classification also underlies the given original diagnosis from which normal as well as cancer samples were collected.

A histological dictionary used in the annotation is available as a PDF-document, containing images which are immunohistochemically stained with antibodies included in the protein atlas. The dictionary displays subtypes of cells distinguishable from each other and also shows specific expression patterns in different intracellular structures. Annotation dictionary: screen usage (15MB), printing (95MB).

Knowledge-based annotation

Annotated protein expression aims to create a comprehensive knowledge-based map of protein expression patterns in normal human tissues and cells. The conflation of data from two or more antibodies directed towards the same protein target (non-overlapping epitopes), evaluation of the performance of the respective antibodies and a review of available protein/RNA/gene characterization data allow for a knowledge-based interpretation of the distribution pattern and relative abundance of proteins in various tissues. An annotation of protein expression is possible for all genes for which there are two or more antibodies directed towards the corresponding protein target.

The immunohistochemical staining pattern in normal tissues, subjectively annotated based on the experienced evaluation of positive immunohistochemical signals in defined subpopulations of cells within a tissue context, provides the foundation for the subsequent annotated protein expression. The microscopical images and previous annotations of the 83 included normal cell types are reviewed simultaneously and compared between different antibodies towards the same protein target. The annotation data from the different antibodies are merged into a single expression profile for each protein. In addition to accounting for performance of antibodies and available protein/RNA/gene characterization data, the review also takes sub-optimal experimental procedures into consideration. This includes immunostaining errors such as sub-optimal titration of the primary antibody and suspected cross-reactivity, as well as the fact that multiple immunostainings have been performed on non-consecutive tissue microarray sections, allowing for differences in immunohistochemical staining patterns caused by inter-individual and inter-specimen variations. The final annotated protein expression is considered a best estimate and as such reflects the most probable histological distribution and relative expression level for each evaluated protein. It is displayed as a high, medium, low or not detected level of expression.




Immunohistochemistry - cells

Description

As a complement to the representation of normal and cancer tissue, the protein atlas displays images of a selection of widely used and well characterized human cell lines as well as cell samples from healthy individuals and leukemia/lymphoma patients.

A cell microarray has been used to enable immunohistochemical staining of a panel of cell lines and cell samples. Duplicates from 46 cell lines, 10 leukemia blood cell samples and 2 samples of PBMC give a total of 116 cell images per antibody. Included cell lines are derived from DSMZ, ATCC or academic research groups (kindly provided by cell line founders). Information regarding sex and age of the donor, tissue origin and source is listed here. All cells are fixed in 4% paraformaldehyde and dispersed in agarose prior to paraffin embedding and immunohistochemical staining.

The cell microarray (CMA) enables representation of leukemia and lymphoma cell lines, covering major hematopoietic neoplasms and even different stages of differentiation. Cell lines from solid tumors are also included in the CMA. A subset originates from solid tumors not represented in the tissue microarrays (TMAs), e.g. sarcoma, choriocarcinoma and small cell lung carcinoma, and the remaining cell lines are derived from tumor types also represented in the TMAs.

The immunohistochemical protocols used result in a brown-black staining, localized where an antibody has bound to its corresponding antigen. The section is furthermore histochemically counterstained with hematoxylin to enable visualization of microscopical features. Hematoxylin staining is unspecific, and results in a blue coloring of both cells and extracellular material.

Annotation

In order to provide an overview of protein expression patterns, all images of immunohistochemically stained cell lines are annotated using automated image recognition software. The image analysis software, TMAx (Beecher Instruments, Sun Prairie, WI, USA), built on an object-oriented image analysis engine from Definiens, utilizes rule-based operations and multiple iterative segmentation processes together with fuzzy logic to identify cells and immunohistochemical stain deposits.

Output parameters from the software, always displayed in conjunction with the annotated images, are:

  • number of objects defined as cells in the image
  • staining intensity (negative, weak, moderate and strong)
  • fraction (%) of positive cells

In addition, two overlay images with additional numerical information are presented to facilitate interpretation. The information displayed includes:

  • Cell: object based view representing fraction (%) of immunostained cells. The color code for each cell represents a range of immunoreactivity, blue (negative/very weak), yellow (weak/moderate), orange (moderate/strong) and red (strong) cells. This classification is based on areas of different intensities within each object (cell). This differs slightly from the subjective classification provided by manual annotation of cells in normal and cancer tissue.
  • Area: area-based view representing immunostained areas (%) within cells. The color code represents a range of immunoreactivity, yellow (weak/moderate), green (moderate/strong) and red (strong). Negative/very weak areas are transparent. The intensity score is generated from the total of this area based analysis.




Immunofluorescence - cells

Description

As a complement to the immunohistochemically stained cells and tissues, the protein atlas displays high resolution, multicolor images of immunofluorescently stained cells. This provides spatial information on protein expression patterns on a fine cellular and subcellular level.

Originally three cell lines, U-2 OS, A-431 and U-251 MG, originating from different human tissues, were chosen to be included in the immunofluorescent analysis. Since 2012, the cell line panel has been expanded to include additional cell lines: A-549, BJ, CACO-2, HaCaT, HEK 293, HeLa, Hep-G2, MCF-7, PC-3, RH-30, RT-4, SH-SY5Y, SiHa, SK-MEL-30 and TIME. To increase the probability that a large number of proteins are expressed, the cell lines were selected from different lineages, e.g. tumor cell lines from mesenchymal, epithelial and glial tumors. The selection was furthermore based on morphological characteristics, widespread use and the multitude of publications using these cell lines. Information regarding sex and age of the donor, cellular origin and source is listed here. For each antibody, two suitable cell lines from the cell line panel are now selected for the immunofluorescent analysis, based on RNA sequencing data. The third cell line chosen for each antibody is always U-2 OS, in order to localize the whole human proteome on a subcellular level in one cell line.

In addition to the human cell lines, the mouse cell line NIH 3T3 is also stained. This is only done for the antibodies corresponding to genes where the mouse and human genes are orthologues.

Besides the HPA antibodies, the cells are also stained with reference markers in order to facilitate the annotation of the subcellular distribution of the protein targeted by the HPA antibody. The following probes/organelles are used as references: (i) DAPI for the nucleus, (ii) anti-tubulin antibody as internal control and marker of microtubules, and (iii) anti-calreticulin or anti-KDEL for the endoplasmic reticulum (ER).

The resulting confocal images are single slice images representing one optical section of the cells. The microscope settings are optimized for each sample. The different organelle probes are displayed as different channels in the multicolor images; the HPA antibody staining is shown in green, nuclear stain in blue, microtubules in red and ER in yellow.

Annotation

In order to provide an interpretation of the staining patterns, all images of immunofluorescently stained cell lines are manually annotated. For each cell line and antibody the intensity and subcellular location of the staining is described. The staining intensity is classified as negative, weak, moderate or strong based on the laser power and detector gain settings used for image acquisition in combination with the visual appearance of the image. The subcellular location is further combined with parameters describing the staining characteristics (e.g. smooth, granular, speckled or fibrous).

Knowledge-based annotation

Knowledge-based annotation of subcellular location aims to provide an interpretation of the subcellular location of a protein in three human cell lines. The conflation of immunofluorescence data from two or more antibodies directed towards the same protein and a review of available protein/gene characterization data, allows for a knowledge-based interpretation of the subcellular location.




Immunofluorescence - siRNA validation

Description

To validate the protein subcellular localization determined with the HPA-antibody, the staining procedure is repeated on siRNA transfected U-2 OS cells.

A reverse solid phase transfection protocol is used to coat cell seeding surfaces with siRNA and transfection reagents prior to cell seeding. After siRNA transfection has occurred, cells are fixed and stained according to the standard protocol. For each antibody, the assay is performed in duplicate using siRNAs from two different providers, and the results are compared to negative control cells transfected with scrambled siRNA.

Images are automatically acquired using objectives with 10x and 40x magnification. An automated image analysis protocol segments the cells and extracts features from all acquired images, after which statistical software automatically compares the population median staining intensity between siRNA coated and negative control samples.

Relative Fluorescence Intensity (RFI) denotes the percentage of remaining staining intensity in siRNA down regulation wells.
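
As an illustration, RFI can be sketched as the ratio of population medians expressed as a percentage. The exact formula used by the analysis software is not given here, so the formula, the function name and the intensity values below are assumptions.

```python
from statistics import median


def relative_fluorescence_intensity(sirna_intensities, control_intensities):
    """Remaining staining intensity in siRNA wells, as a percentage of control.

    Assumed formula: 100 * median(siRNA cells) / median(control cells).
    """
    control_median = median(control_intensities)
    if control_median <= 0:
        raise ValueError("control median intensity must be positive")
    return 100.0 * median(sirna_intensities) / control_median


# Hypothetical per-cell staining intensities (arbitrary units).
sirna_cells = [20.0, 25.0, 30.0, 35.0, 40.0]
control_cells = [80.0, 90.0, 100.0, 110.0, 120.0]

rfi = relative_fluorescence_intensity(sirna_cells, control_cells)
print(rfi)  # prints: 30.0
```

In this made-up example, 30% of the control signal remains after siRNA treatment, i.e. a 70% down-regulation of the signal.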

Annotation

For each antibody, the statistical analysis is performed on whichever of the three segmented cell areas (nucleus, cytoplasm or whole cells) best matches the antibody staining. Based on the RFI values, an siRNA validation score is set, grouping the siRNA assays according to the amount of signal down-regulation.




Western blot

Description

Western blot analysis of antibody specificity has been done using a routine sample setup composed of IgG/HSA-depleted human plasma and protein lysates from a limited number of human tissues and cell lines. Antibodies with a non-supportive routine WB have been revalidated using an over-expression lysate (VERIFY Tagged Antigen(TM), OriGene Technologies, Rockville, MD) as a positive control. Antibody binding was visualized by chemiluminescence detection in a CCD-camera system using a peroxidase (HRP) labeled secondary antibody.

Antibodies included in the Human Protein Atlas have been analyzed without further efforts to optimize the procedure. It therefore cannot be excluded that certain observed binding properties are due to technical rather than biological reasons, and that further optimization could result in a different outcome.




Protein array

Description

All purified antibodies are analyzed on antigen microarrays. The specificity profile for each antibody is determined based on the interaction with 384 different antigens including its own target. The antigens present on the arrays are consecutively exchanged in order to correspond to the next set of 384 purified antibodies. Each microarray is divided into 14 replicated subarrays, enabling the analysis of 14 antibodies simultaneously. The antibodies are detected through a fluorescently labeled secondary antibody and a dual color system is used in order to verify the presence of the spotted proteins. A specificity profile plot is generated for each antibody, where the signal from the binding to its own antigen is compared to the unspecific binding to all the other antigens. The vast majority of antibodies are given a pass, but a fraction are failed either due to low signal or low specificity.




RNA

Description

In total, 44 cell lines and 32 tissues have been analyzed by RNA-seq to estimate the transcript abundance of each protein-coding gene.

For cell lines, early-split samples were used as duplicates and total RNA was extracted using the RNeasy mini kit. Information regarding cellular origin and source of each cell line is listed here.

For normal tissue, specimens were collected with consent from patients and all samples were anonymized in accordance with approval from the local ethics committee (ref #2011/473) and Swedish rules and legislation. All tissues were collected from the Uppsala Biobank and RNA samples were extracted from frozen tissue sections.

For a total of 91 cell line samples and 122 tissue samples, mRNA sequencing was performed on Illumina HiSeq2000 and 2500 machines (Illumina, San Diego, CA, USA) using the standard Illumina RNA-Seq protocol with a read length of 2x100 bases. Transcript abundance estimation was performed using Tophat v2.0.8b and Cufflinks v2.1.1. For each gene, FPKM values ('number of Fragments Per Kilobase gene model and Million reads') were calculated, and the average FPKM value for replicate samples was used as the abundance score. The threshold level to detect presence of a transcript for a particular gene was set to > 1 FPKM.
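
The abundance score and detection threshold can be illustrated with a short sketch. The FPKM function below is the textbook definition only; Cufflinks estimates abundances with a statistical model, so this is not a reimplementation of it. Function names, gene names and values are hypothetical.

```python
from statistics import mean

DETECTION_THRESHOLD_FPKM = 1.0  # from the text: detected if > 1 FPKM


def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Textbook FPKM: fragments per kilobase of gene model per million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def abundance_score(replicate_fpkm):
    """Average FPKM over the replicate samples."""
    return mean(replicate_fpkm)


def is_detected(score):
    """Apply the > 1 FPKM detection threshold."""
    return score > DETECTION_THRESHOLD_FPKM


# 500 fragments on a 2000 bp gene model with 25 million mapped fragments.
example_fpkm = fpkm(500, 2000, 25_000_000)
print(example_fpkm)  # prints: 10.0

# Hypothetical replicate FPKM values for two genes.
replicates = {"GENE_A": [0.5, 1.0], "GENE_B": [12.0, 18.0]}
for gene, values in replicates.items():
    score = abundance_score(values)
    print(gene, score, is_detected(score))
# prints: GENE_A 0.75 False
# prints: GENE_B 15.0 True
```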

The RNA-seq data was used to classify all genes according to their tissue-specific expression into one of six different categories, defined based on the total set of all FPKM values in 32 tissues:

  • tissue enriched (expression in one tissue at least five-fold higher than all other tissues)
  • group enriched (five-fold higher average FPKM level in a group of two to seven tissues compared to all other tissues)
  • tissue enhanced (five-fold higher average FPKM level in one or more tissues compared to the mean FPKM of all tissues)
  • expressed in all (> 1 FPKM in all tissues)
  • not detected (< 1 FPKM in all tissues)
  • mixed (detected in 1-31 tissues and none of the above categories)

An additional category "elevated", containing all genes in the first three categories (tissue enriched, group enriched and tissue enhanced), has been used for some parts of the analysis.
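
The six categories can be sketched as a small decision function. The rule order, the handling of values exactly equal to 1 FPKM, and the reading of "all other tissues" as the highest remaining tissue are assumptions made for this illustration; the actual classification pipeline may differ.

```python
from statistics import mean

DETECTION_FPKM = 1.0   # detection threshold from the text
FOLD = 5.0             # five-fold criterion from the text
MAX_GROUP_SIZE = 7     # groups of two to seven tissues


def classify_gene(fpkm_by_tissue):
    """Assign one of the six categories to a gene (illustrative sketch only)."""
    values = sorted(fpkm_by_tissue.values(), reverse=True)
    if len(values) < 2:
        raise ValueError("need FPKM values for at least two tissues")
    # Assumption: a gene with no tissue above 1 FPKM counts as not detected.
    if values[0] <= DETECTION_FPKM:
        return "not detected"
    # One tissue at least five-fold above every other tissue.
    if values[0] >= FOLD * values[1]:
        return "tissue enriched"
    # Average of the top 2..7 tissues at least five-fold above the next tissue.
    for size in range(2, min(MAX_GROUP_SIZE, len(values) - 1) + 1):
        if mean(values[:size]) >= FOLD * values[size]:
            return "group enriched"
    # At least one tissue five-fold above the mean of all tissues.
    if values[0] >= FOLD * mean(values):
        return "tissue enhanced"
    if values[-1] > DETECTION_FPKM:
        return "expressed in all"
    return "mixed"


def is_elevated(category):
    """The additional 'elevated' category groups the first three categories."""
    return category in {"tissue enriched", "group enriched", "tissue enhanced"}


# Hypothetical four-tissue profiles (the real analysis uses 32 tissues).
print(classify_gene({"liver": 100.0, "kidney": 5.0, "lung": 4.0, "skin": 3.0}))
# prints: tissue enriched
print(classify_gene({"liver": 60.0, "kidney": 50.0, "lung": 4.0, "skin": 3.0}))
# prints: group enriched
print(classify_gene({"liver": 10.0, "kidney": 9.0, "lung": 8.0, "skin": 7.0}))
# prints: expressed in all
```

Sorting the values first keeps the group test simple: the strongest candidate group of a given size is always the top-ranked tissues, and the tissue ranked directly after the group is the highest of the remaining ones.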

FPKM thresholds were further set for categorization of transcript expression levels into low, medium or high RNA abundance.

Abundance      FPKM tissue   FPKM cell line
Not detected   0-1           0-1
Low            1-10          1-20
Medium         10-50         20-50
High           >50           >50
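
The thresholds in the table can be expressed as a lookup function. The table does not say which category a value exactly on a boundary (e.g. 10 FPKM in tissue) belongs to; the sketch below assumes that upper bounds are inclusive, consistent with the > 1 FPKM detection threshold. All names are illustrative.

```python
# Upper FPKM bound of the "Low" category, per sample type (from the table).
LOW_UPPER_BOUND = {"tissue": 10.0, "cell line": 20.0}
NOT_DETECTED_UPPER_BOUND = 1.0
MEDIUM_UPPER_BOUND = 50.0


def abundance_category(fpkm, sample_type):
    """Map an FPKM value to Not detected / Low / Medium / High.

    Assumption: boundary values fall into the lower category.
    """
    if sample_type not in LOW_UPPER_BOUND:
        raise ValueError(f"unknown sample type: {sample_type!r}")
    if fpkm < 0:
        raise ValueError("FPKM cannot be negative")
    if fpkm <= NOT_DETECTED_UPPER_BOUND:
        return "Not detected"
    if fpkm <= LOW_UPPER_BOUND[sample_type]:
        return "Low"
    if fpkm <= MEDIUM_UPPER_BOUND:
        return "Medium"
    return "High"


print(abundance_category(15.0, "tissue"))     # prints: Medium
print(abundance_category(15.0, "cell line"))  # prints: Low
print(abundance_category(75.0, "tissue"))     # prints: High
```

The same FPKM value can thus fall into different categories depending on whether it was measured in a tissue or a cell line.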




Evidence

Description

Protein evidence is calculated for each gene based on three different sources: UniProt protein existence (UniProt evidence); a Human Protein Atlas antibody- or RNA based score (HPA evidence); and evidence based on two proteogenomics studies (MS evidence). In addition, for each gene, a protein evidence summary score is based on the maximum level of evidence in all three independent evidence scores (Evidence summary).

All scores are classified into the following categories:

  • Evidence at protein level
  • Evidence at transcript level
  • No evidence
  • Not available

UniProt evidence is based on UniProt protein existence data, which uses five types of evidence for the existence of a protein. All genes in the classes "experimental evidence at protein level" or "experimental evidence at transcript level" are classified into the first two evidence categories, whereas genes from the "inferred from homology", "predicted", or "uncertain" classes are classified as "No evidence". Genes where the gene identifier could not be mapped to UniProt from Ensembl version 75.37 are classified as "Not available".

The HPA evidence is calculated based on the manual curation of Western blot, tissue profiling and subcellular location as well as transcript profiling using RNA-seq. All genes with supportive protein reliability in one or more of the three methods immunohistochemistry, immunofluorescence or Western blot are classified as "Evidence at protein level". For the remaining genes, all genes detected at FPKM > 1 in at least one of the tissues or cell lines used in the RNA-seq analysis are classified as "Evidence at transcript level". A small number of genes lack RNA-seq data due to software error and are classified as "Not available". All remaining genes are classified as "No evidence".

MS evidence is based on two proteogenomics studies, Kim et al 2014 and Ezkurdia et al 2014. Each gene detected by at least one of the MS-based studies is classified as "Evidence at protein level" and all remaining genes as "Not available".
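
The three evidence scores and the summary score can be sketched as follows. The ranking used to take the maximum (protein level above transcript level above "No evidence" above "Not available") is an assumption, as are all function and parameter names.

```python
PROTEIN = "Evidence at protein level"
TRANSCRIPT = "Evidence at transcript level"
NO_EVIDENCE = "No evidence"
NOT_AVAILABLE = "Not available"

# Assumed ranking, lowest to highest, used for the summary score.
RANK = [NOT_AVAILABLE, NO_EVIDENCE, TRANSCRIPT, PROTEIN]

UNIPROT_CLASS_TO_EVIDENCE = {
    "experimental evidence at protein level": PROTEIN,
    "experimental evidence at transcript level": TRANSCRIPT,
    "inferred from homology": NO_EVIDENCE,
    "predicted": NO_EVIDENCE,
    "uncertain": NO_EVIDENCE,
}


def uniprot_evidence(protein_existence_class):
    """Map a UniProt protein existence class; None means the gene could not be mapped."""
    if protein_existence_class is None:
        return NOT_AVAILABLE
    return UNIPROT_CLASS_TO_EVIDENCE[protein_existence_class.lower()]


def hpa_evidence(has_supportive_assay, max_fpkm):
    """has_supportive_assay: supportive reliability in IHC, IF or Western blot.
    max_fpkm: highest FPKM over all tissues and cell lines, or None if RNA-seq data is missing.
    """
    if has_supportive_assay:
        return PROTEIN
    if max_fpkm is None:
        return NOT_AVAILABLE
    if max_fpkm > 1.0:
        return TRANSCRIPT
    return NO_EVIDENCE


def ms_evidence(detected_in_any_study):
    """Detected in at least one of the two proteogenomics studies."""
    return PROTEIN if detected_in_any_study else NOT_AVAILABLE


def evidence_summary(*scores):
    """Maximum level of evidence over the individual scores."""
    return max(scores, key=RANK.index)


# Hypothetical gene: predicted in UniProt, RNA detected, not seen by MS.
summary = evidence_summary(
    uniprot_evidence("predicted"),
    hpa_evidence(False, 3.2),
    ms_evidence(False),
)
print(summary)  # prints: Evidence at transcript level
```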
