Biostatistics Qian, Wenfeng.

Author: Erick Carr

1 Biostatistics Qian, Wenfeng

2 Myself: Qian, Wenfeng (钱文峰), Institute of Genetics & Developmental Biology, CAS; Center for Molecular Systems Biology

3 My group

4 My research: Single cell genetics; kinetics of gene expression; variations among isogenic cells; protein synthesis/degradation; transcriptional/translational burst; quantitative functional genomics

5 Genotype-Phenotype Map

6 DNA coding rules 01101000 01100101 01101100 01101100 hello world hello world ATGCGCATATGCGCATTGCGCATATGCGCATG ……………… GCGCATATGCGCATG

7 My education: 2006, B.S., Peking University, Biological Sciences; 2012, Ph.D., University of Michigan, Evolutionary Genetics

8 Course introduction: Applied biostatistics. Examples, examples, and examples. Try to make it not too heavy.

9 Statistics Statistics is the study of the collection, organization, analysis, interpretation and presentation of data.

10 Schedule. March 21: Probability; March 23: Introduction to R; March 30: Hypothesis testing; April 1: Analysis of variance; April 6: Regression and correlation; April 11: Plots with R; April 19: Presentations (+ a report = final exam). Shaohuan Wu

11 R language: Standard statistical tool in science. You will need to bring your laptop to class, with R installed.

12 Download R

13 RStudio

14 Exam Final exam is a report based on the use of statistics in a small project. The report should be words. Ten-minute (including 2 min Q & A) oral defense of the report in front of the class.

15 PPT: Will be uploaded to my lab website after each class: qianlab.genetics.ac.cn. Words in red: waiting for your response. Words in green: the beginning of a new example.

16 Textbooks

17 Textbook: Handbook of Biological Statistics. An R Companion for the Handbook of Biological Statistics: rcompanion.org/documents/RCompanionBioStatistics.pdf. Other references: Biometry by Sokal & Rohlf; What is a p-value anyway? by Andrew Vickers.

18 Before next class: Handbook of Biological Statistics, any relevant materials on pages 1-28 (before class II). An R Companion for the Handbook of Biological Statistics, pages 1-13 (before class II).

19 Your introduction

20 Statistics is the base of most sciences. The definition of modern science?

21 What is science? A theory in the empirical sciences can never be proven, but it can be falsified, meaning that it can and should be scrutinized by decisive experiments (Karl Popper). Hypothesis testing.

22 All swans are white

23 Science is about rejecting null hypotheses: Aristotle, Galilei, the Leaning Tower of Pisa

24 Science is about rejecting null hypotheses: Einstein, eclipse

25 In biology (genetics): Mixing of traits vs. Mendelian genetics. Mendelian genetics: two copies of a gene that can be separated in the next generation, generating the 3:1 ratio (Mendel). Other examples?

26 Deterministic vs stochastic events. Deterministic events: if I toss a coin, I will get a face up; I will get up tomorrow morning; a child will grow up. Stochastic events: head or tail?; the exact time point (minute and second) I would wake up naturally; the height and weight of the child. Other examples?

27 Phenomena in biology are more likely to be stochastic than physical phenomena. In the physical world: the sun rises; planets move; water boils.

28 In biology: Weight and height; disease; life span; the outcome of your exam. Reason?

29 Reasons for stochasticity in life: Traits are determined by both genes and environments. The environment is heterogeneous. Most traits are affected by multiple genes, and each gene has a minor impact. Developmental strategy (body plan). The life sciences involve a huge number of factors, which makes stochasticity ubiquitous.

30 Regression to the mean: In statistics, regression toward (or to) the mean is the phenomenon that if a variable is extreme on its first measurement, it will tend to be closer to the average on its second measurement. A positive gene in your screen may not appear the next time. The best student in college could become ordinary later in his/her career. Why?

31 How do we describe stochasticity? Distribution!

32 Density function

33 Density function Cumulative density function

34 Normal distribution: The bell shape. Appears everywhere in biology. Why? Traits are determined by both genes and environments; many genes with minor effects; additivity. What if not?

36 The probability of a person being taller than 1.9 meters, if height follows a normal distribution with mean = 1.75 and standard deviation = 0.06

37 Descriptive statistics: Algebraic mean (μ); variance (σ²); standard deviation (σ)

38 Normal distribution

39 The probability of a person being taller than 1.9 meters, if height follows a normal distribution with mean = 1.75 and standard deviation = 0.05: P = 1 − NORMDIST(1.9, 1.75, 0.05, 1) ≈ 0.13% (a standard deviation of 0.06 would give ≈ 0.6%)
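
The tail probability on this slide can be checked outside Excel. Below is a minimal Python sketch, assuming only the standard-library `statistics.NormalDist` class. The deck quotes both 0.05 and 0.06 as the standard deviation, so both are evaluated; the function and variable names are illustrative, not from the slides.

```python
from statistics import NormalDist

MEAN_HEIGHT = 1.75  # meters, from the slide


def prob_taller_than(threshold, sd):
    """Upper-tail probability P(height > threshold) under a normal model."""
    return 1.0 - NormalDist(mu=MEAN_HEIGHT, sigma=sd).cdf(threshold)


# Same quantity as 1 - NORMDIST(1.9, 1.75, sd, 1) in Excel.
p_sd_005 = prob_taller_than(1.9, 0.05)  # 1.9 m is 3 standard deviations above the mean
p_sd_006 = prob_taller_than(1.9, 0.06)  # 1.9 m is 2.5 standard deviations above the mean

print(f"sd = 0.05: P(height > 1.9) = {p_sd_005:.2%}")
print(f"sd = 0.06: P(height > 1.9) = {p_sd_006:.2%}")
```

With these inputs the first value is about 0.13% and the second about 0.62%.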

40 The height is more than 1.9 meters. If height follows a normal distribution with mean = 1.75 and standard deviation = 0.05, what is the probability of a height below 1.2 meters?

41 The height is more than 1.9 meters. If height follows a normal distribution with mean = 1.75 and standard deviation = 0.05, what is the probability of a height below 1.2 meters? What if this number is different from your intuition?

42 The probability of a person being taller than 1.9 meters. If height follows a normal distribution with mean = 1.75 and standard deviation = 0.05, what is the probability of a height between 1.7 and 1.75 meters?
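
The two follow-up questions (below 1.2 m, and between 1.7 and 1.75 m) can be worked the same way. This is a sketch under the slide's normal model with mean 1.75 and standard deviation 0.05. The cumulative probability is written with `math.erfc` so that the extreme lower tail does not round to zero. The helper name `normal_cdf` is an illustrative choice.

```python
from math import erfc, sqrt

MEAN, SD = 1.75, 0.05  # meters, from the slides


def normal_cdf(x, mean=MEAN, sd=SD):
    """P(X < x) for a normal distribution, using erfc to keep far tails nonzero."""
    return 0.5 * erfc((mean - x) / (sd * sqrt(2)))


p_below_1_2 = normal_cdf(1.2)                   # 11 standard deviations below the mean
p_between = normal_cdf(1.75) - normal_cdf(1.7)  # from 1 sd below the mean up to the mean

print(f"P(height < 1.2)        = {p_below_1_2:.2e}")
print(f"P(1.7 < height < 1.75) = {p_between:.4f}")
```

The first number is of order 1e-28, which suggests that a normal model may describe the extreme tails of real height data poorly. The second is about 0.34.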

43 Can you draw: A density curve of the standard normal distribution in Excel? A cumulative density curve of the standard normal distribution in Excel?

44 Bill Gates’ visit to a bar: Median

45 Bill Gates’ revisit to the bar: Interquartile range; boxplot

46 How do we treat stochastic data? At a summer tea party in Cambridge, England, a guest states that tea poured into milk tastes different from milk poured into tea. Her notion is shouted down by the scientific minds of the group. But one man, Ronald Fisher, proposes to scientifically test the hypothesis.

48 How to test the hypothesis? H0: The order of milk and tea makes no difference

49 How to test the hypothesis? H0: The order of milk and tea makes no difference. 10 cups of drink, mixed out of the lady's sight. Let the lady tell the order of milk and tea. If H0 is correct, what is the probability that the lady gets all 10 guesses correct?

50 How to test the hypothesis? If H0 is correct, what is the probability that the lady gets all 10 guesses correct?

51 How to test the hypothesis? If H0 is correct, what is the probability that the lady gets all 10 guesses correct? 0.1%. It is unlikely that an event with such a low probability happens in a single test. Thus, the most likely scenario is that H0 is incorrect, and there is a difference between the two orders.
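
The 0.1% figure follows from multiplying ten independent 50/50 guesses. A short sketch, assuming that under H0 each cup is an independent guess with probability 0.5 of being right (the constant names are illustrative):

```python
N_CUPS = 10
P_GUESS = 0.5  # chance of guessing one cup correctly if H0 is true

# Ten independent correct guesses in a row.
p_all_correct = P_GUESS ** N_CUPS

print(f"P(all {N_CUPS} correct | H0) = {p_all_correct:.4%}")
```

This is 1/1024, a little under 0.1%.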

52 What is a P-value? The p-value is defined as the probability of obtaining a result equal to or "more extreme" than what was actually observed, assuming that the null hypothesis (the model) is true. The P-value can be used in statistics to reject a null hypothesis.

53 What if… Among 10 tests, the lady succeeded in 8 of them?

54 Binomial distribution: First child, boy or girl (B or G); second, B or G; third, B or G. Eight possibilities: BBB, BBG, BGB, BGG, GBB, GBG, GGB, GGG. What is the probability of having 2 B in 3 children?

55 Binomial distribution: $P(X = k) = \binom{n}{k} p^{k} (1-p)^{n-k}$, with n = 3, k = 2, p = 0.5
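
The formula can be checked against a direct count of the eight families listed earlier. A minimal sketch using the standard library; `binomial_pmf` is an illustrative helper name.

```python
from itertools import product
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Formula with n = 3 children, k = 2 boys, p = 0.5.
p_formula = binomial_pmf(2, 3, 0.5)

# Cross-check by listing the eight equally likely families (BBB, BBG, ...).
families = ["".join(f) for f in product("BG", repeat=3)]
two_boys = [f for f in families if f.count("B") == 2]
p_enumerated = len(two_boys) / len(families)

print(two_boys, p_formula, p_enumerated)
```

Both routes give 3/8 = 0.375.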

56 What if… Among 10 tests, the lady succeeded in 8 of them? What is the p-value?
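
One way to answer this is a one-sided binomial p-value: the probability of 8 or more successes out of 10 if each guess is a coin flip. The one-sided choice is an assumption of this sketch (a two-sided test would double the value), and the names are illustrative.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


N_TESTS, OBSERVED = 10, 8

# "Equal to or more extreme" here means 8, 9, or 10 correct guesses under H0 (p = 0.5).
p_value_one_sided = sum(binomial_pmf(k, N_TESTS, 0.5) for k in range(OBSERVED, N_TESTS + 1))

print(f"one-sided p-value = {p_value_one_sided:.4f}")
```

The result is 56/1024, about 0.055.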

57 Probability estimation: Alternatively, we can estimate the probability of success (E); in this case, 80%. We can get a 95% confidence interval (CI). If 0.5 is outside the CI, we conclude that the order makes a difference.

58 Confidence interval

59 How to calculate the confidence interval? For the binomial distribution, the variance is $\sigma^2 = npq$ and the standard deviation is $\sigma = \sqrt{npq}$, where q = 1 − p. In this case, σ = sqrt(10 * 0.8 * 0.2) = 1.26. If we use the normal distribution to approximate the binomial distribution, the 95% confidence interval = [μ − 2σ, μ + 2σ] = [8 − 2.5, 8 + 2.5] = [5.5, 10.5]. 5 is outside the 95% confidence interval.
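
The arithmetic on this slide can be reproduced in a few lines. This sketch follows the slide's normal approximation with a half-width of 2σ; the variable names are illustrative.

```python
from math import sqrt

n, successes = 10, 8
p_hat = successes / n                   # estimated probability of success, 0.8
sigma = sqrt(n * p_hat * (1 - p_hat))   # binomial standard deviation sqrt(npq)

# Approximate 95% interval for the number of successes: mean +/- 2 sigma.
ci_low, ci_high = successes - 2 * sigma, successes + 2 * sigma

expected_under_h0 = n * 0.5             # 5 successes if the lady is guessing
h0_outside = not (ci_low <= expected_under_h0 <= ci_high)

print(f"sigma = {sigma:.2f}, CI = [{ci_low:.2f}, {ci_high:.2f}], H0 outside: {h0_outside}")
```

The upper bound exceeds 10, which is a reminder that the normal approximation is rough for a sample this small.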

60 Law of large numbers: The estimate of the probability, 0.8, may not be accurate … The larger the sample size, the more accurate our estimate is, so that we could potentially distinguish 50% from 60%.
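
To see how sample size affects the ability to separate 60% from 50%, one can tabulate the approximate 95% half-width of an estimated proportion. This sketch assumes the same normal approximation (two standard errors) and illustrative sample sizes of 10, 100, and 1000.

```python
from math import sqrt


def ci_half_width(p, n):
    """Approximate 95% half-width (2 standard errors) for an estimated proportion."""
    return 2 * sqrt(p * (1 - p) / n)


# Half-width around an estimate of 0.6 for several sample sizes.
half_widths = {n: ci_half_width(0.6, n) for n in (10, 100, 1000)}

for n, hw in half_widths.items():
    print(f"n = {n:4d}: 0.6 +/- {hw:.3f}")
```

Under these assumptions the interval around 0.6 still includes 0.5 at n = 10, but no longer does at n = 100.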

61 Applications of this idea: Hold your nose, and you may not be able to tell Coke from Sprite

62 Is a drug effective or not? Other examples?

63 Number of left-handed people: If the probability of being left-handed is 5% in a population, what is the probability that a 50-student class contains exactly 1 left-handed person?

64 Poisson distribution λ = mean = variance

65 Number of left-handed people: Poisson distribution, λ = 50 × 5% = 2.5. $P(X = 1) = \frac{2.5^{1} e^{-2.5}}{1!} \approx 20\%$. How about 0, 2, 3, 4 left-handed people? Application: when the total number is not available.
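
The Poisson probabilities for 0 to 4 left-handed students can be listed next to the exact binomial values for comparison. A minimal sketch with illustrative helper names; the binomial column assumes 50 independent students, each with a 5% chance of being left-handed.

```python
from math import comb, exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * exp(-lam) / factorial(k)


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


CLASS_SIZE, P_LEFT = 50, 0.05
LAMBDA = CLASS_SIZE * P_LEFT  # 2.5 expected left-handed students

for k in range(5):
    print(f"k = {k}: Poisson {poisson_pmf(k, LAMBDA):.4f}, binomial {binomial_pmf(k, CLASS_SIZE, P_LEFT):.4f}")

p_one_poisson = poisson_pmf(1, LAMBDA)
p_one_exact = binomial_pmf(1, CLASS_SIZE, P_LEFT)
```

For exactly one left-handed student, the Poisson value is about 0.205 and the binomial value about 0.202.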

66 Uranium (235U) radiation: Neutron rate 1/sec. What is the probability of having exactly 1 radiation event in the next second?

67 Luria-Delbrück experiment. Question: Did the mutation to resistance happen BECAUSE of the presence of a virus, or even BEFORE adding the virus to the culture?

69 Poisson distribution Luria-Delbrück distribution

71 Luria-Delbrück experiment

77 Intuition is extremely important in statistics

78 Blaise Pascal Pascal's principle

79 Geek’s joke: One day, Einstein, Newton, and Pascal meet up and decide to play a game of hide and seek. Einstein volunteered to be “It.” As Einstein counted, eyes closed, to 100, Pascal ran away and hid, but Newton stood right in front of Einstein and drew a one meter by one meter square on the floor around himself. When Einstein opened his eyes, he immediately saw Newton and said “I found you Newton,” but Newton replied,

80 Einstein, Newton, and Pascal play hide and seek: “No, you found one Newton per square meter. You found Pascal!”

81 Pascal’s problem. The rule of the game: Two people toss a coin, one toss at a time. Player A wins when s/he gets 3 heads; player B wins when s/he gets 3 tails. The game has to stop when A has 2 heads and B has 1 tail because of the King’s call. How to split the bet?

82 Opinions. B: A gets 2/3 and B gets 1/3, because A needs one more head (P = 1/2) and B needs two more tails (P = 1/4). A: A gets 3/4 and B gets 1/4, because B wins only when B gets two tails (P = 1/4); otherwise, A wins (P = 3/4). Who is correct?

83 Conclusion A: A gets 3/4 and B gets 1/4
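
The 3/4 versus 1/4 split can be confirmed by listing what could happen in the next two tosses, which is the most the game could still need. This sketch assumes a single fair coin where every head counts for A and every tail for B; that reading of the rules is an assumption, and the names are illustrative.

```python
from fractions import Fraction
from itertools import product

# A needs 1 more head, B needs 2 more tails, so two further tosses always settle the game.
outcomes = list(product("HT", repeat=2))  # HH, HT, TH, TT, all equally likely

# B wins only if both remaining tosses are tails; any head gives A the third head first.
b_wins = [o for o in outcomes if o == ("T", "T")]

share_b = Fraction(len(b_wins), len(outcomes))
share_a = 1 - share_b

print(f"A's share = {share_a}, B's share = {share_b}")
```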

84 Monty Hall problem Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice? Your guess?

85 Monty Hall problem. Intuitive answer: if the car is not behind door 3, the probabilities of it being behind door 1 and door 2 are equal, P = ½ for both.

86 Solution 1 1/3

87 Solution 2

88 Intuition: Consider 10000 doors … You chose door 1. The host opens 9998 doors for you, and none of them has a car behind it. Do you switch?

89 Monty Hall problem Switch it!
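
The advice to switch can be verified by exact enumeration of the three-door game. This sketch assumes the standard rules: the car is equally likely to be behind any door, the host always opens a goat door other than the player's pick, and when two such doors exist he chooses between them at random. The door indices and names are illustrative.

```python
from fractions import Fraction

doors = range(3)
pick = 0  # the player's first choice (door No. 1 in the slide)

p_win_stay = Fraction(0)
p_win_switch = Fraction(0)

for car in doors:  # each car position has probability 1/3
    # The host may open any door that is neither the player's pick nor the car.
    openable = [d for d in doors if d != pick and d != car]
    for opened in openable:
        weight = Fraction(1, 3) / len(openable)
        remaining = next(d for d in doors if d not in (pick, opened))
        p_win_stay += weight * (car == pick)
        p_win_switch += weight * (car == remaining)

print(f"P(win | stay) = {p_win_stay}, P(win | switch) = {p_win_switch}")
```

Staying wins with probability 1/3 and switching with probability 2/3.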

90 The probability of the same birthday in a class: Consider a class with 50 people. What is the probability that at least two students have the same birthday? Your guess?

91 The probability that all have different birthdays: The first person: 1; the second person: 364/365; the third person: 363/365; …; the 50th person: 316/365. P = 0.03

92 The answer: The probability that all have different birthdays, P = 0.03. The probability that at least two students have the same birthday, 1 – P = 0.97
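
The product of fractions is easy to compute directly. A minimal sketch, assuming 365 equally likely birthdays and ignoring leap years; the function name is illustrative.

```python
from math import prod


def prob_shared_birthday(n_people, days=365):
    """P(at least two of n_people share a birthday), assuming equally likely days."""
    # The i-th person (counting from 0) must avoid the i birthdays already taken.
    p_all_different = prod((days - i) / days for i in range(n_people))
    return 1 - p_all_different


p_class_of_50 = prob_shared_birthday(50)
print(f"P(shared birthday among 50) = {p_class_of_50:.3f}")
```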

93 The success of an experiment: Two people, A and B, are doing an experiment in my lab. According to historical records, the success rate for A is 0.8, and that for B is 0.7. Each of them does the experiment once. What is the probability of at least one success?

94 The success of an experiment: Consider the probability that both of them fail, 0.2 * 0.3 = 0.06. P = 1 − 0.2 * 0.3 = 0.94

95 The success of an experiment: Consider the probability that both of them fail, 0.2 * 0.3 = 0.06. P = 1 − 0.2 * 0.3 = 0.94. Any problems here?

96 The success of an experiment: Consider the probability that both of them fail, 0.2 * 0.3 = 0.06. P = 1 − 0.2 * 0.3 = 0.94. Any problems here? It depends on whether the two people are doing the experiments independently! Do they use the same set of reagents? If so, then A’s failure increases the probability of B’s failure.
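
The effect of dependence can be made concrete with a small calculation. In this sketch the independent case uses the slide's numbers, while the dependent case uses a purely hypothetical value, P(B fails | A fails) = 0.6, chosen only to illustrate shared reagents; it does not come from the slides.

```python
P_SUCCESS_A, P_SUCCESS_B = 0.8, 0.7
p_fail_a, p_fail_b = 1 - P_SUCCESS_A, 1 - P_SUCCESS_B

# Independent experiments: P(both fail) = P(A fails) * P(B fails).
p_any_independent = 1 - p_fail_a * p_fail_b

# Hypothetical dependence (illustrative number, not from the slides):
# if A fails, B fails 60% of the time instead of 30%.
P_B_FAILS_GIVEN_A_FAILS = 0.6
p_any_dependent = 1 - p_fail_a * P_B_FAILS_GIVEN_A_FAILS

print(f"independent: {p_any_independent:.2f}, dependent (hypothetical): {p_any_dependent:.2f}")
```

With the hypothetical dependence, the chance of at least one success drops from 0.94 to 0.88.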

97 The conditional probability P(A|B): the probability of A given B. Example: the probability of a girl given that the first child in the family is a boy, P(the second child is a girl | the first child is a boy). If independent, P(2nd girl | 1st boy) = P(girl).

98 Probability of infection: A test can detect 95% of the people with the infection (true positive). There is a 1% probability of a false positive. The frequency of the infection is 0.5%. What is the probability of infection, given a positive test result?

99 Bayes' theorem: $P(A_i \mid B) = \frac{P(A_i)\,P(B \mid A_i)}{\sum_{j=1}^{\infty} P(A_j)\,P(B \mid A_j)}$. A_i = infected; B = positive in the test; P(A_i | B) = the probability of infection given a positive test.
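
Applied to the infection example, the sum in the denominator has only two terms: infected and not infected. A minimal sketch, assuming the 1% false-positive probability applies to uninfected people; the variable names are illustrative.

```python
PRIOR_INFECTED = 0.005       # frequency of the infection in the population
P_POS_GIVEN_INFECTED = 0.95  # true-positive rate of the test
P_POS_GIVEN_HEALTHY = 0.01   # false-positive rate (assumed to apply to uninfected people)

# Numerator of Bayes' theorem: P(infected) * P(positive | infected).
joint_infected = PRIOR_INFECTED * P_POS_GIVEN_INFECTED

# Denominator: total probability of a positive test over both states.
p_positive = joint_infected + (1 - PRIOR_INFECTED) * P_POS_GIVEN_HEALTHY

p_infected_given_positive = joint_infected / p_positive
print(f"P(infected | positive) = {p_infected_given_positive:.3f}")
```

Under these numbers only about a third of positive results correspond to a real infection, because the infection is rare.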

100 Autosomal single-locus disease: Patients; normal individuals; ?

102 The probability of a 4th girl in the family, given that the first 3 are all girls. Your opinion?

103 Genetics or stochasticity. Model I: for some genetic reason, only sperm carrying the X chromosome survive. Model II: the births of sons and daughters are equally likely. For a family with 3 daughters, which model is more likely?

104 Genetics or stochasticity. Model I: for some genetic reason, only sperm carrying the X chromosome survive. Model II: the births of sons and daughters are equally likely. How to calculate it quantitatively?

105 Genetics or stochasticity. Model I: for some genetic reason, only sperm carrying the X chromosome survive. Model II: the births of sons and daughters are equally likely. LOD score: log10 of odds. LOD = log10(P(obs. | model I) / P(obs. | model II))

106 Genetics or stochasticity. Model I: genetics. Model II: by chance. LOD = log10(P(obs. | model I) / P(obs. | model II)). P(obs. | model I) = 1. P(obs. | model II) = 1/8. LOD = log10(1 / (1/8)) = log10(8) = 0.9. Threshold: > 3 or < −3.
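
The LOD score for three daughters, and the family size at which it would pass the threshold of 3, can be computed directly. A minimal sketch using the two models from the slides; the function name is illustrative.

```python
from math import log10


def lod_all_daughters(n_daughters):
    """LOD of model I (only X-bearing sperm survive) vs model II (sons and daughters equally likely)."""
    p_obs_model_1 = 1.0                  # daughters are certain under model I
    p_obs_model_2 = 0.5 ** n_daughters   # each birth is a daughter with probability 1/2
    return log10(p_obs_model_1 / p_obs_model_2)


lod_three = lod_all_daughters(3)

# Smallest all-daughter family whose LOD exceeds the threshold of 3.
n_needed = next(n for n in range(1, 100) if lod_all_daughters(n) > 3)

print(f"LOD for 3 daughters = {lod_three:.2f}; daughters needed for LOD > 3: {n_needed}")
```

Three daughters give a LOD of about 0.9, well short of the threshold; ten daughters in a row would be needed to exceed it.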