1 Personalized Analysis of Cancer Data: From Genes to Pathways (and Back)
Eytan Domany, Dept of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel
Anna Git, Carlos Caldas, Yotam Drier, Michal Sheffer, Anna Livshits, Gari Fuks
Montpellier, June 2017
2 BREAST CANCER: THE CHALLENGE
PERSONALIZED PROGNOSTIC AND PREDICTIVE MEDICINE, FOR BETTER TREATMENT OF CANCER
MEASURE (IN A SAMPLE FROM THE TUMOR) GENOME-WIDE HIGH-THROUGHPUT DATA (MUTATIONS, EXPRESSION, METHYLATION, SNP, DNA COPY NUMBER, ETC.), AND USE IT FOR
PROGNOSIS (PREDICT OUTCOME, AGGRESSIVENESS)
PREDICTING RESPONSE TO THERAPY OF INDIVIDUAL PATIENTS/TUMORS
BREAST CANCER - THE MAIN QUESTIONS:
1. CHEMOTHERAPY: YES/NO? IF YES, WHICH?
SEARCH FOR "OMIC" MOLECULAR SIGNATURES, SINCE THE CLASSICAL CLINICAL CRITERIA (NIH, ST. GALLEN, NPI) LEAD TO OVERTREATMENT
3 CHEMOTHERAPY ??? No CHEMO if Low Risk
[Plot: breast cancer death rate per 100,000 per year]
INCIDENCE: ABOUT 1 OUT OF 9 WOMEN AFFECTED.
EARLY DISCOVERY: SMALL TUMOR (< 2 cm), HAS NOT SPREAD TO LYMPH NODES, LOWEST GRADE, STAGE. GRADES: 1, 2, 3.
TREATMENT: SURGICAL REMOVAL OF TUMOR + RADIOTHERAPY + HORMONAL THERAPY IF ER+; Herceptin IF HER2+.
THE YES/NO DECISION ON CHEMOTHERAPY WAS TAKEN ON THE BASIS OF CLINICAL PARAMETERS: NIH, St Gallen, NPI CRITERIA => SEVERE OVERTREATMENT
Estrogen Receptor: resides in the cell cytosol and nucleus; upon binding of the hormone (ligand), the complex moves to the nucleus and facilitates transcription or acts as a TF. 2/3 of patients are ER+. Treatment with tamoxifen reduces recurrence by 10-15%.
Progesterone Receptor (PgR): receptor for another hormone.
Surgery and radiation: local. Hormonal therapy: targeted. Chemo: systemic, much less targeted.
4 A SUCCESSFUL GENE EXPRESSION BASED PROGNOSTIC SIGNATURE FOR EARLY-DISCOVERY BREAST CANCER:
Van't Veer et al. Nature 2002, List = 70 genes (Amsterdam signature)
ANOTHER ONE: Wang et al. Lancet 2005, List = 76 genes (Rotterdam signature)
[Venn diagram: the 70-gene and 76-gene lists share 3 genes]
IS THE SMALL OVERLAP DUE TO DIFFERENT PLATFORMS, DIFFERENT POPULATIONS OF PATIENTS, DIFFERENT TYPES OF ANALYSIS? NO!! (Ein-Dor et al. Bioinformatics 2005; Michiels et al. Lancet 2005)
Several microarray studies yielded gene sets whose expression profiles successfully predicted survival (Ramaswamy et al., 2003; Sorlie et al., 2001; van 't Veer et al., 2002). Problem: complete lack of agreement between the gene lists.
5 A SUCCESSFUL GENE EXPRESSION BASED PROGNOSTIC SIGNATURE FOR EARLY-DISCOVERY BREAST CANCER:
Van't Veer et al. Nature 2002, List = 70 genes (Amsterdam signature, Mammaprint)
ANOTHER ONE: Wang et al. Lancet 2005, List = 76 genes (Rotterdam signature)
[Venn diagram: the 70-gene and 76-gene lists share 3 genes]
INHERENT LACK OF ROBUSTNESS OF PROGNOSTIC GENE LISTS* (GENES WERE RANKED BY CORRELATION OF EXPRESSION WITH OUTCOME; THE TOP 70 MADE THE LIST)
*Ein-Dor et al. Bioinformatics 2005, PNAS 2006; Michiels et al. Lancet 2005
6 PROGNOSTIC PERFORMANCE OF THE 70 TOP-RANKED GENES (Van't Veer)
THE PROGNOSTIC VALUE OF THE SELECTED 70 GENES IS SIMILAR TO THAT OF MOST OTHER RANDOM SETS OF 70 GENES
Ein-Dor et al. Bioinformatics 21:171 (2005)
7 A SUCCESSFUL GENE EXPRESSION BASED PROGNOSTIC SIGNATURE FOR EARLY-DISCOVERY BREAST CANCER:
Van't Veer et al. Nature 2002, List = 70 genes (Amsterdam signature, Mammaprint)
ANOTHER ONE: Wang et al. Lancet 2005, List = 76 genes (Rotterdam signature)
[Venn diagram: the 70-gene and 76-gene lists share 3 genes]
INHERENT LACK OF ROBUSTNESS OF RANKED GENE LISTS*
HOW MANY SAMPLES ARE NEEDED TO GET A ROBUST LIST OF 70?** (ROBUST = 50% OVERLAP, f = 0.50) => PHYSICS!!!
*Ein-Dor et al. Bioinformatics 2005; **Ein-Dor et al. PNAS 2006
8 N_TOP = αN_g (= 70) [N_g = #genes, n = #samples]
P(f) IS GIVEN AS A SUM OVER 2N_g BINARY VARIABLES, COUPLED BY 3 CONSTRAINTS; USE SADDLE POINT INTEGRATION AND A LARGE-N_g EXPANSION TO GET A GAUSSIAN:
[Figure: Gaussian P(f), with f* and Σ marked]
f_n*(α): TYPICAL OVERLAP; Σ: ST.D. OF THE DISTRIBUTION
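The Gaussian result referred to on this slide can be written out as below. This is a sketch of the general form only: the subscript on \Sigma_n and the explicit normalization are notational assumptions, and the closed-form expressions for f_n^*(\alpha) and \Sigma_n (given in Ein-Dor et al. PNAS 2006) are not reproduced here.

```latex
P_n(f) \;\approx\; \frac{1}{\sqrt{2\pi\,\Sigma_n^{2}}}\,
\exp\!\left[-\,\frac{\bigl(f - f_n^{*}(\alpha)\bigr)^{2}}{2\,\Sigma_n^{2}}\right],
\qquad
\mathrm{Prob}\bigl[\,f > f_n^{*}\,\bigr] = \tfrac{1}{2}.
```

For a Gaussian the mean and the median coincide, which is why f_n^* can be read as the "typical" overlap.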
9 HOW MANY PATIENTS ARE NEEDED TO HAVE f_n* = 0.5 (TYPICAL f)?
TYPICAL OVERLAP f_n*: Prob[f > f_n*] = 0.5
[Plot: f_n* versus n (samples)]
Van't Veer needs 2200 training samples to get 50% typical overlap between two lists of top 70 genes (ranked by correlation with outcome).
Ein-Dor et al. PNAS 2006
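The dependence of the overlap f on the number of training samples can be illustrated numerically. The sketch below is not the analytical calculation of Ein-Dor et al.; it is a small simulation on synthetic data, and every parameter in it (2000 genes, 200 weakly informative genes, effect size 0.3) is an assumption chosen only for illustration. It ranks genes by correlation with a binary outcome in two independent cohorts and reports the median overlap of the two top-70 lists.

```python
import numpy as np

def top_ranked_genes(X, y, n_top):
    """Set of the n_top gene indices with the largest |Pearson correlation| to y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum())
    )
    return set(np.argsort(-np.abs(corr))[:n_top].tolist())

def median_overlap(n_samples, n_genes=2000, n_top=70, n_informative=200,
                   effect=0.3, n_rep=10, seed=0):
    """Median overlap f between top-gene lists from two independent synthetic cohorts."""
    rng = np.random.default_rng(seed)
    overlaps = []
    for _ in range(n_rep):
        gene_lists = []
        for _ in range(2):  # two independent training sets of the same size
            # balanced binary outcome (hypothetical good/poor prognosis labels)
            y = rng.permutation(np.arange(n_samples) % 2).astype(float)
            X = rng.standard_normal((n_samples, n_genes))
            # weak true signal in the first n_informative genes (assumed effect size)
            X[:, :n_informative] += effect * (y[:, None] - 0.5)
            gene_lists.append(top_ranked_genes(X, y, n_top))
        overlaps.append(len(gene_lists[0] & gene_lists[1]) / n_top)
    return float(np.median(overlaps))

if __name__ == "__main__":
    for n in (40, 100, 400):
        print(n, median_overlap(n))
```

With many weakly informative genes and few samples, the two lists share only a small fraction of their genes; the overlap grows as n increases, which is the qualitative behaviour the slide describes.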
10 A SUCCESSFUL GENE EXPRESSION BASED PROGNOSTIC SIGNATURE FOR EARLY-DISCOVERY BREAST CANCER:
Van't Veer et al. Nature 2002, List = 70 genes (Amsterdam signature)
ANOTHER ONE: Wang et al. Lancet 2005, List = 76 genes (Rotterdam signature)
[Venn diagram: the 70-gene and 76-gene lists share 3 genes]
INHERENT LACK OF ROBUSTNESS OF RANKED GENE LISTS*
HOW MANY SAMPLES ARE NEEDED TO GET A ROBUST LIST OF 70?**
NO IMPROVEMENT OVER "CLASSICAL" METHODS (DESPITE CLAIMS) (BUT SEE Cardoso NEJM 2016)
NO BIOLOGICAL INSIGHT GAINED
*Ein-Dor et al. Bioinformatics 2005; **Ein-Dor et al. PNAS 2006; Drier PLoS ONE 2011; Dupuy & Simon JNCI 2007; Dowsett et al. JCO 2013; Domany Cancer Res 2014
11 CURRENTLY USED (TRANSCRIPTOMIC) PREDICTORS:
PREDICTOR                            # GENES   PLATFORM     SAMPLE
Mammaprint + Classical (Adjuvant!)   70        Microarray   Fresh-Frozen
Oncotype Dx (Knowledge-based)        21        qRT-PCR      Formalin-Fixed Paraffin-Embedded
Prosigna (PAM 50)                    50        Nanostring
LIMITED SUCCESS; DID NOT REPLACE CLASSICAL CLINICAL VARIABLES
12 FAILURES - WHY? SOME OF THE REASONS (1. CULTURAL AND 2. TECHNICAL):
1. THE FIELD WAS DOMINATED BY TWO EXTREMES:
a. USE NO EXISTING BIOLOGICAL/CLINICAL KNOWLEDGE (turn ignorance into a virtue), or
b. DEMAND/ASSUME FULL DETAILED MECHANISTIC KNOWLEDGE (don't dare talk to me unless you know and use all details)
2. FEW POINTS (TUMORS, ) IN HIGH-DIMENSIONAL SPACES (GENES: – 10,000): "CURSE OF DIMENSIONALITY"
SINGLE-GENE BASED DESCRIPTION = "ATOMISTIC" APPROACH
13 WHAT'S WRONG WITH THIS CAR?
"ATOMISTIC" APPROACH: MEASURE SOME PROPERTY (e.g. TEMPERATURE) OF EVERY SINGLE COMPONENT. 12,000 NUMBERS CHARACTERIZE THE "STATE" OF EACH CAR.
TRY TO DETERMINE THE FEATURES THAT CAN BE USED TO TELL HEALTHY CARS FROM SICK ONES.
NO EXISTING KNOWLEDGE ABOUT CARS IS USED.
14 A "PHENOMENOLOGICAL" "SYSTEMS" APPROACH
[Figure: car systems: TRANSMISSION, COOLING, ENGINE, BRAKES]
MEASURE FOR EACH SYSTEM ONE NUMBER THAT INDICATES THE DEVIATION OF THIS SYSTEM'S FUNCTIONING FROM NORMAL.
EACH CAR IS CHARACTERIZED BY A SET OF SUCH "SYSTEM-LEVEL INDICATORS" (ABOUT 100). USE THESE TO SEPARATE HEALTHY FROM SICK CARS.
15 PATHWAY (OR BIOLOGICAL PROCESS) BASED ANALYSIS: THE IDEA
a. USE EXPRESSION (OR ANY OTHER) HIGH-THROUGHPUT DATA FROM A LARGE NUMBER OF SAMPLES.
Drier, Sheffer & Domany PNAS 2013
16 a. USE EXPRESSION DATA
1. GLIOBLASTOMA: TCGA, Nature 2008: 435 TUMOR AND 10 NORMAL; REMBRANDT, Mol Cancer Res 2009: 228 GLIOBLASTOMA, 28 NORMAL
2. COLON CANCER: Sheffer et al, PNAS 2009: 52 NORMAL, 49 POLYPS, 182 PRIMARY TUMORS, 30 METASTASES; Sveen et al, Genome Med 2011: NORMAL, 76 PRIMARY; Kogo et al, Cancer Res 2011: 9 NORMAL, 132 PRIMARY
3. BREAST CANCER: METABRIC, Curtis et al Nature 2012: TUMORS, 144 NORMAL; TCGA: 988 TUMORS, 106 NORMALS
4. THYROID CANCER: TCGA PROJECT: 58 NORMAL, 482 CANCER
5. KIDNEY CANCER: TCGA: 870 TUMOR, 120 NORMAL
17 PATHWAY (OR BIOLOGICAL PROCESS) BASED ANALYSIS: THE IDEA
a. USE EXPRESSION (OR ANY OTHER) HIGH-THROUGHPUT DATA FROM A LARGE NUMBER OF SAMPLES.
b. USE BIOLOGICAL KNOWLEDGE: LISTS OF ( ) GENES THAT BELONG TO A BIOLOGICAL PROCESS OR PATHWAY P
Drier, Sheffer & Domany PNAS 2013
18 b. USE EXISTING KNOWLEDGE: ASSIGNMENT OF GENES TO PATHWAYS P
USE KEGG, BioCarta (FROM MSigDB) AND NCI-Nature Pathway Interaction DATABASES
[Histogram: number of pathways versus number of genes in pathway]
TYPICALLY: TENS OF GENES IN A PATHWAY, HUNDREDS OF SAMPLES => "CURSE OF DIMENSIONALITY" IS ELIMINATED
GBM: GENE SETS PASSED FILTERS (>3 VARYING GENES)
19 PATHWAY (OR BIOLOGICAL PROCESS) BASED ANALYSIS: THE IDEA
a. USE EXPRESSION (OR ANY OTHER) HIGH-THROUGHPUT DATA FROM A LARGE NUMBER OF SAMPLES.
b. USE BIOLOGICAL KNOWLEDGE: LISTS OF ( ) GENES THAT BELONG TO A BIOLOGICAL PROCESS OR PATHWAY P
c. DERIVE FOR EACH SAMPLE i AND PATHWAY P A "PATHWAY DEREGULATION SCORE" D(i,P)
Drier, Sheffer & Domany PNAS 2013
20 c. FOR EACH SAMPLE i AND PATHWAY P: CALCULATING THE PATHWAY DEREGULATION SCORE (PDS)
1. Consider pathway P; identify the d_P genes that belong to it. Sample i is represented by a point X_i in the space of the expression values of these genes.
[Figure: KEGG APOPTOSIS PATHWAY, d_P = 33 GENES, COLON DATA; point X_i marked]
21 c. PATHWAY DEREGULATION SCORE (PDS)
2. Calculate the Principal Curve (Hastie & Stuetzle 1989) of the cloud of points formed by the full sample set.
[Figure: point X_i marked]
22 c. PATHWAY DEREGULATION SCORE (PDS)
3. Project every sample onto the principal curve; the projection of sample i is Y_i. The extremal point of the curve near the Normal samples is the Reference Point N.
[Figure labels: Normal samples; Reference Point N; Tumor i at X_i with projection Y_i]
23 c. PATHWAY DEREGULATION SCORE (PDS)
4. The distance of Y_i from N, measured along the principal curve, is D_i(P), the Deregulation Score of pathway P in sample i.
[Figure labels: Normal samples; Reference Point N; Tumor i at X_i with projection Y_i]
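The four steps can be sketched in code. The version below is a deliberate simplification and not the Pathifier implementation: it replaces the Hastie-Stuetzle principal curve with a straight line (the first principal component), so "distance along the curve" becomes distance along that line. The function name and its arguments are hypothetical.

```python
import numpy as np

def pds_linear_sketch(X, is_normal):
    """Simplified deregulation score for one pathway.

    X         : (n_samples, d_P) expression values of the pathway's genes
    is_normal : boolean mask marking the normal samples
    Returns one non-negative score per sample.
    ASSUMPTION: a straight line (first principal component) stands in for
    the principal curve used in the method described on the slides.
    """
    X = np.asarray(X, dtype=float)
    is_normal = np.asarray(is_normal, dtype=bool)

    # Step 1: each sample is a point in the space of the pathway's genes
    # (standardized so that no single gene dominates).
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - X.mean(axis=0)) / std

    # Step 2: fit the "curve" (here: first principal direction via SVD).
    _, _, vt = np.linalg.svd(Z, full_matrices=False)

    # Step 3: project every sample onto it; t[i] locates the projection Y_i.
    t = Z @ vt[0]
    # Orient the line so that the normal samples sit near its low end.
    if t[is_normal].mean() > t[~is_normal].mean():
        t = -t
    reference = t.min()  # extremal point on the normal side, playing the role of N

    # Step 4: distance of Y_i from N along the line.
    return t - reference
```

A curved fit matters when deregulation follows a bent trajectory in gene space; the linear stand-in is only meant to make the projection-and-distance logic concrete.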
24 PATHWAY (OR BIOLOGICAL PROCESS) BASED ANALYSIS: THE IDEA
a. USE EXPRESSION (OR ANY OTHER) HIGH-THROUGHPUT DATA FROM A LARGE NUMBER OF SAMPLES.
b. USE BIOLOGICAL KNOWLEDGE: LISTS OF ( ) GENES THAT BELONG TO A BIOLOGICAL PROCESS OR PATHWAY P
c. DERIVE FOR EACH SAMPLE i AND PATHWAY P A "PATHWAY DEREGULATION SCORE" D(i,P)
d. DO THIS FOR N_P ~ FEW HUNDRED PATHWAYS
e. A SAMPLE IS REPRESENTED IN TERMS OF ITS N_P PATHWAY DEREGULATION SCORES => DESCRIBED BY N_P PARAMETERS
f. PERFORM ALL ANALYSIS USING THESE "SYSTEM-LEVEL" VARIABLES WITH CLEAR BIOLOGICAL MEANING.
Drier, Sheffer & Domany PNAS 2013
25 PDS OF 552 PATHWAYS: EACH SAMPLE (144 NORMAL, 997 BREAST TUMOR) IS REPRESENTED BY 552 SUCH PATHWAY-BASED SCORES
[Figure: PDS matrix, pathways versus samples; marked entry: Sample 789, Pathway 143]
Livshits et al Oncotarget 2015
26 THE METABRIC BREAST CANCER DATASET (Curtis et al Nature 2012)
Using expression data from TUMOR samples (997 in "Discovery set", 995 in "Validation") and 144 NORMAL samples:
Calculate (using "Pathifier" analysis*) a Pathway Deregulation Score (PDS) for 552 pathways/biological processes, for each sample (Discovery + Normal).
D(P,i) = PDS of pathway P in sample i; it represents the extent to which pathway P is deregulated in sample i.
*Drier, Sheffer & Domany PNAS 2013
27 PERFORM ANALYSIS IN THIS SPACE: REORDERING* SAMPLES (AND PATHWAYS) REVEALS STRUCTURE IN DATA**
[Figure: PDS matrix before and after reordering; marked entry: Sample 789, Pathway 143]
*Tsafrir et al Bioinformatics (2005) **Livshits et al Mol Onc (2015)
28 f. PERFORM ANALYSIS IN THIS SPACE: REORDERING SAMPLES (AND PATHWAYS) REVEALS STRUCTURE IN DATA
[Figure: reordered PDS matrix; marked: Sample 789]
10 SAMPLE CLUSTERS, 7 PATHWAY CLUSTERS
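Why reordering exposes block structure can be seen with a very crude stand-in for the sorting method cited on the slides (Tsafrir et al.). The sketch below is not that method: it simply sorts the rows (pathways) and columns (samples) of a score matrix by their coordinates on the first singular vectors. The function name is hypothetical.

```python
import numpy as np

def reorder_by_first_pc(D):
    """Return (row_order, col_order) that sort a pathways-by-samples matrix D
    along its first left/right singular vectors.

    ASSUMPTION: a one-dimensional ordering is enough to bring similar samples
    next to each other; the published analysis uses a dedicated sorting method.
    """
    D = np.asarray(D, dtype=float)
    C = D - D.mean(axis=1, keepdims=True)   # center each pathway across samples
    u, _, vt = np.linalg.svd(C, full_matrices=False)
    row_order = np.argsort(u[:, 0])         # ordering of pathways
    col_order = np.argsort(vt[0])           # ordering of samples
    return row_order, col_order

# Usage: D_sorted = D[np.ix_(row_order, col_order)] gives the matrix to display.
```

Samples with similar deregulation profiles end up adjacent, so clusters of samples and of pathways show up as contiguous blocks in the displayed matrix.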
29 552 by 1141 PDS-MATRIX CAPTURES KNOWN SUBTYPES
30 APPLY METHOD TO BREAST CANCER: FOCUS ON TWO GROUPS OF BASAL / TRIPLE NEGATIVE TUMORS
[Figure: PDS matrix with the immune pathways marked; marked entry: Sample 789, Pathway 143]
BASAL/TN SUBTYPE: HIGH AND LOW IMMUNE INVOLVEMENT
DIFFERENT OUTCOME/SURVIVAL FOR THE TWO GROUPS!
31 CLINICAL SIGNIFICANCE: FOR BASAL / TN SUBTYPE, HIGH IMMUNE INVOLVEMENT => BETTER SURVIVAL
[Plot: survival rate versus time (years), p=0.013]
Basal tumors with HIGH IMMUNE system involvement: better survival
Basal tumors with LOW IMMUNE system involvement: worse survival
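Survival curves of the kind shown on this slide are commonly estimated with the Kaplan-Meier product-limit estimator. The slides do not say which estimator or which test produced the curves and the p-value, so the sketch below is an assumption about the standard approach, and the function name is hypothetical.

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  : follow-up time of each patient (e.g. in years)
    events : True if the patient died at that time, False if censored
    Returns a list of (time, survival probability) at each observed event time.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    survival = 1.0
    curve = []
    for t in np.unique(times[events]):          # distinct event times, ascending
        at_risk = np.sum(times >= t)            # still under observation at t
        deaths = np.sum((times == t) & events)  # events exactly at t
        survival *= 1.0 - deaths / at_risk
        curve.append((float(t), float(survival)))
    return curve
```

Computing one such curve per group (high versus low immune involvement) and comparing them with a two-group test yields the kind of plot and p-value reported here.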
32 BIOLOGICAL INTERPRETATION: HIGH IMMUNE INVOLVEMENT (PDS) => HIGH TIL LEVEL
HIGH IMMUNE PDS => high level of Tumor Infiltrating Lymphocytes (TIL)
Highest correlation with TIL levels: for T-CELL related PATHWAYS and cell-specific signatures => T cells. BIOMARKER!
[Plot: survival rate versus time (years), p=0.013] PROGNOSTIC BIOMARKER?
Alexe et al (2007): no difference in survival between TN tumors with high/low immune involvement
33 PREDICTIVE BIOMARKER: FOR BASAL/TN SUBTYPE, IMMUNE INVOLVEMENT => BETTER RESPONSE TO THERAPY
Alexe et al (2007): TN PATIENTS DID NOT RECEIVE CHEMOTHERAPY
METABRIC (2012): MAJORITY OF TN WERE TREATED (anthracyclines).
IS THE DIFFERENCE IN OUTCOME DUE TO TREATMENT?
ALL BASAL PATIENTS: p=0.013
BASAL PATIENTS, TREATED: p=0.006
BASAL PATIENTS, UNTREATED: p=0.66
THE DIFFERENCE IN SURVIVAL BETWEEN BASAL PATIENTS WITH HIGH vs LOW IMMUNE INVOLVEMENT IS OBSERVED ONLY FOR PATIENTS WHO RECEIVED CHEMOTHERAPY. PREDICTIVE BIOMARKER?
34 PREDICTIVE BIOMARKER: FOR BASAL SUBTYPES, IMMUNE INVOLVEMENT => BETTER RESPONSE TO THERAPY
BASAL PATIENTS WITH LOW IMMUNE INVOLVEMENT: p=0.006
BASAL PATIENTS WITH HIGH IMMUNE INVOLVEMENT: p=0.56
POSSIBLE INTERPRETATION 1: ANTHRACYCLINES ARE KILLING BASAL PATIENTS WITH LOW IMMUNE INVOLVEMENT, AND HAVE NO EFFECT ON PATIENTS WITH HIGH IMMUNE INVOLVEMENT. SHOCKING!!
INTERPRETATION 2: HIGH RISK PATIENTS (BAD INDICATORS) WERE SENT TO CHEMO. IF LOW IMMUNE: CHEMO DID NOT HELP. IF HIGH IMMUNE: CHEMO DID HELP!
35 PREDICTIVE BIOMARKER: FOR BASAL SUBTYPES, IMMUNE INVOLVEMENT => BETTER RESPONSE TO THERAPY
                        CT    No CT   Total
Cluster 7 (Low Imm)     46    16      62
Cluster 8 (High Imm)    36    29      65
Total                   82    45      127
WE USED CT/NO CT AS A PROXY FOR (CLASSICAL) HIGH/LOW RISK.
HIGH IMMUNE INVOLVEMENT/TIL INDICATES GOOD RESPONSE OF HIGH-RISK BASAL/TN PATIENTS TO ANTHRACYCLINES.
DO NOT TREAT (WITH ANTHRACYCLINES) HIGH RISK BASAL/TN PATIENTS WITH LOW TIL. PREDICTIVE BIOMARKER!
Anthracyclines & immune system: Zitvogel, Cell Death & Differ. (2014), Nat. Med. (2014), Oncoimmunology (2014)
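The counts in the table can be checked and summarized with a few lines of arithmetic. The sketch below only restates the numbers from the slide; the summary quantities (fraction treated per cluster, odds ratio) are added here for illustration and are not reported in the talk.

```python
# Counts copied from the slide: chemotherapy (CT) versus no chemotherapy.
table = {
    "Cluster 7 (Low Imm)": {"CT": 46, "No CT": 16},
    "Cluster 8 (High Imm)": {"CT": 36, "No CT": 29},
}

def row_summary(counts):
    """Row total and fraction of patients who received chemotherapy."""
    total = counts["CT"] + counts["No CT"]
    return {"total": total, "fraction_ct": counts["CT"] / total}

def column_totals(tbl):
    """Totals over both clusters for each treatment column."""
    return {col: sum(row[col] for row in tbl.values()) for col in ("CT", "No CT")}

def odds_ratio(tbl):
    """Odds of having been treated in the low-immune cluster relative to the high-immune one."""
    low, high = tbl["Cluster 7 (Low Imm)"], tbl["Cluster 8 (High Imm)"]
    return (low["CT"] * high["No CT"]) / (low["No CT"] * high["CT"])

if __name__ == "__main__":
    for name, counts in table.items():
        print(name, row_summary(counts))
    print(column_totals(table), odds_ratio(table))
```

The row and column totals reproduce the slide (62, 65, 82, 45, 127), and the fraction treated is higher in the low-immune cluster (46/62) than in the high-immune cluster (36/65).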
36 SUGGESTED DECISION PIPELINE:
1. IDENTIFY TRIPLE NEGATIVE PATIENTS (HISTOCHEMISTRY)
2. USE CLINICAL (OR OTHER) INDICATORS TO IDENTIFY HIGH RISK PATIENTS, CANDIDATES FOR CHEMOTHERAPY
3. FOR HIGH-RISK PATIENTS: MEASURE T-CELL INFILTRATE LEVEL IN TUMOR, or OTHER (SINGLE-GENE BASED) BIOMARKER. CANDIDATES: SYK, CD14, CXCR3, CXCL9
4. IF LOW TIL: DO NOT TREAT WITH ANTHRACYCLINES*
CXCR3: G-protein coupled receptor; induces a variety of cellular processes in leukocytes; binds CXCL9 + two more ligands
CD14: surface antigen expressed on monocytes/macrophages
CXCL9: T cell chemoattractant induced by IFN gamma; binds to CXCR3
SYK: non-receptor tyrosine kinase expressed in hematopoietic cells; couples activated receptors to downstream signaling
*Livshits et al Oncotarget (2015)
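The pipeline can be written down as a small decision function. This is only an encoding of the slide logic under stated assumptions: the final step follows the conclusion stated on slide 35 (low TIL: do not treat with anthracyclines), all names and return strings are hypothetical, and how "high risk" and "high TIL" are thresholded is left open, as it is in the talk.

```python
from dataclasses import dataclass

@dataclass
class Patient:
    triple_negative: bool   # step 1: from histochemistry
    high_risk: bool         # step 2: from clinical (or other) indicators
    til_high: bool          # step 3: T-cell infiltrate level or a single-gene proxy

def suggest(patient: Patient) -> str:
    """Walk through the suggested pipeline and return the branch reached."""
    if not patient.triple_negative:
        return "outside the scope of this pipeline"
    if not patient.high_risk:
        return "low risk: no chemotherapy"
    if patient.til_high:
        return "high risk, high TIL: candidate for anthracyclines"
    return "high risk, low TIL: do not treat with anthracyclines"

if __name__ == "__main__":
    print(suggest(Patient(triple_negative=True, high_risk=True, til_high=False)))
```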
37 TAKE-HOME LESSONS*:
1. DO NOT USE IGNORANCE-BASED "TOP RANKED" SINGLE GENE LISTS: THEY ARE UNSTABLE** AND MOSTLY DEVOID OF BIOLOGICAL MEANING***.
2. CHARACTERIZE TUMORS BY KNOWLEDGE-BASED, SYSTEM-LEVEL VARIABLES# (Pathway Deregulation Scores).
3. LOWER AIMS: NO SILVER BULLET THAT WORKS FOR ALL BREAST CANCER SUBTYPES AND ALL CHEMOTHERAPIES.
4. GENOMIC BIOMARKERS SHOULD COMPLEMENT CLASSICAL CLINICAL RISK INDICATORS (NOT REPLACE THEM).
* Domany Cancer Res (2014)
** Ein-Dor et al Bioinformatics (2005), PNAS (2006); Michiels et al Lancet (2005)
*** Drier et al PLoS ONE (2011)
# Drier et al PNAS (2013); Shi et al Annals Onc (2016)
38 SUPPORT (CURRENT AND PAST)
The Leir Charitable Foundation
German-Israeli cooperation Project (DIP)
Israel Ministry of Science (IMoS)
Israel Ministry of Industry and Commerce/NOFAR
Israel Science Foundation (ISF)
National Cancer Institute - NIH Program Project Grant
EC Research Grants
Minerva, Wolfson, Mario Negri Foundations
39 THANKS FOR LISTENING & APOLOGIES FOR RUNNING OVER TIME