Introduction to Linear Regression

Author: Trevor Lee

1 Introduction to Linear Regression: heart rate versus exercise time

2 DISCLAIMER & USAGE
The content of this presentation is for informational purposes only and is intended for students attending Louisiana Tech University only. The authors of this information do not make any claims as to the validity or accuracy of the information or methods presented. Any procedures demonstrated here are potentially dangerous and could result in damage and injury. Louisiana Tech University, its officers, employees, agents and volunteers, are not liable or responsible for any injuries, illness, damage or losses which may result from your using the materials or ideas, or from your performing the experiments or procedures depicted in this presentation. The Living with the Lab logos should remain attached to each slide, and the work should be attributed to Louisiana Tech University. If you do not agree, then please do not view this content.
boosting application-focused learning through student ownership of learning platforms

3 Linear regression
Linear regression provides a repeatable way to quantify the relationship between two variables, even when significant uncertainty and measurement error exist. Typical applications include environmental data, medical data, and process parameters.

4 π‘π‘π‘š=π‘π‘’π‘Žπ‘‘π‘  π‘œπ‘£π‘’π‘Ÿ 10 π‘ π‘’π‘π‘œπ‘›π‘‘ π‘π‘’π‘Ÿπ‘–π‘œπ‘‘βˆ™6Collect some data to see how linear regression works We know that our heart rate increases as we begin to exercise Heart rate is usually expressed in beats per minute (bpm) We can record our pulse over a short period of time to estimate heart rate we’ll collect over a 10 second period π‘π‘π‘š=π‘π‘’π‘Žπ‘‘π‘  π‘œπ‘£π‘’π‘Ÿ 10 π‘ π‘’π‘π‘œπ‘›π‘‘ π‘π‘’π‘Ÿπ‘–π‘œπ‘‘βˆ™6 The variation of heart rate during exercise is complex and depends on many factors (fitness, the level of exertion, the duration of exercise, what you’ve been eating/drinking, etc.) We will assume that heart rate is initially linear with the duration of exercise just to collect some data this could serve as a starting point for a systematic study of heart rate during exercise

5 Collect pulse after doing jumping jacks
Heart rate is collected five times: once before jumping, then after each 10 second set of jumping jacks.
1. Measure pulse for 10 seconds (have a partner write down the number of beats).
2. Do jumping jacks for 10 seconds (10 seconds of total exercise), then STOP and measure pulse for 10 seconds.
3. Do jumping jacks for 10 seconds (20 seconds of total exercise), then STOP and measure pulse for 10 seconds.
4. Do jumping jacks for 10 seconds (30 seconds of total exercise), then STOP and measure pulse for 10 seconds.
5. Do jumping jacks for 10 seconds (40 seconds of total exercise), then STOP and measure pulse for 10 seconds.
[Figure: timeline of alternating "jump" and "STOP" intervals, labeled with jumping time (s) and total time (s).]

6 Logistics
Choose one or two people per table to do jumping jacks. This is voluntary; don't do the jumping jacks if there is any reason why this activity could be harmful to you. The people who are jumping should get away from tripping hazards and other people (clear a space around your table and keep yourself under control while exercising).
Your instructor will keep track of time and tell you when to jump and when to collect heart rate; a cell phone, watch, or online stopwatch can be used. We need about 7 to 10 sets of data from the entire class, so not everybody will get to exercise. We'll analyze and plot this data using Excel.
The heart rate collected will include some error, so just be as accurate as possible. Collect pulse as soon as you stop jumping. After 10 seconds, call out the number of pulses collected over 10 seconds to your partner(s) and start jumping again.

7 Enter heart rate data into Excel
time (s) | student 1 (bpm) | student 2 (bpm) | student 3 (bpm) | student 4 (bpm) | student 5 (bpm) | student 6 (bpm) | student 7 (bpm) | student 8 (bpm)
10
20
30
40
Multiply the number of pulses collected over 10 seconds by 6 to get beats per minute (bpm), and report bpm to your instructor. Build a spreadsheet on your computer along with the instructor.

8 Plot data for the entire class in Excel
Make a scatter plot using symbols only, with no lines. Time is the independent variable and is plotted on the x-axis. Heart rate is the dependent variable and is plotted on the y-axis. The title of the plot is always listed as “y versus x”, which is “heart rate versus exercise time” for this problem.

9 Make a hand plot for one data set
Your instructor will select one student’s data that is typical of the data for the entire class; we will analyze this data. Make a hand plot on your own paper as shown below (use proper format!!). Draw a “best fit” line through the data; just use your judgment.
[Figure: example hand plot titled “heart rate versus exercise time” with a “best fit line” drawn through the points. Use data from your class, not this data.]

10 Find an equation to fit the data
Assume the data is linear. Pick two points from your data (or make up two points by picking them from your line). Compute the slope as rise over run, $m = \Delta y / \Delta x$. Write the equation in slope-intercept form as $y = m \cdot x + b$.
Analysis of our equations: Compare your answer with others in the class. If you chose the same two points to define your “best fit” line, then your equations should be the same; choosing different points gives different equations. Linear regression, which can be derived using calculus, gives us the same equation every time. Linear regression takes the guesswork out of finding best fit lines.
Example (use data from your class):
find the slope: $m = \frac{\Delta y}{\Delta x} = \frac{120 - 82}{40 - 10} = \frac{38}{30} \approx 1.27$
find the y-intercept by plugging in one of the data points: $b = y - m \cdot x = 120 - \frac{38}{30} \cdot 40 \approx 69.3$ (using the unrounded slope; the rounded 1.27 would give 69.2)
write the equation: $\text{heart rate} = 1.27 \cdot \text{time} + 69.3$, where heart rate is in bpm and time is in seconds.
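The two-point method can be sketched in Python as follows. `line_through_points` is a hypothetical helper name; the points (10, 82) and (40, 120) are the ones used in the example above.

```python
def line_through_points(p1, p2):
    """Return (m, b) for the line y = m*x + b through two points (x, y)."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the two points need different x values")
    m = (y2 - y1) / (x2 - x1)  # slope: rise over run, delta y / delta x
    b = y2 - m * x2            # intercept: plug one point into y = m*x + b
    return m, b


m, b = line_through_points((10, 82), (40, 120))
print(f"heart rate = {m:.2f}*time + {b:.1f}")  # heart rate = 1.27*time + 69.3
```

Rounding happens only when printing, so the intercept is computed from the exact slope 38/30.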

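11 Understanding linear regression
[Figure: data points $(x_i, y_i)$ scattered about a best fit line $y^{\text{fit}} = m \cdot x + b$ on x-y axes, with the vertical distance $y_i - y_i^{\text{fit}}$ marked for one data point i.]
For data point i at $(x_i, y_i)$, the best fit line predicts $y_i^{\text{fit}} = m \cdot x_i + b$, so the error at that point is $y_i - y_i^{\text{fit}}$. Linear regression generates the best line by minimizing the sum of the squares of these errors over all data points to find the optimum values of m and b:
$$\text{minimize } \sum_{i=1}^{n} \left(y_i - y_i^{\text{fit}}\right)^2 = \sum_{i=1}^{n} \left(y_i - m \cdot x_i - b\right)^2$$
We call this least squares linear regression.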
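To make the least squares criterion concrete, the sketch below computes the sum of squared errors for a given m and b. The data points are an assumption: they are reconstructed from the column sums in the worked example on the next slide, with the 0 s reading inferred as 67 bpm so that the heart rates add up to 451.

```python
def sum_squared_errors(xs, ys, m, b):
    """Least squares objective: sum over i of (y_i - y_i_fit)^2,
    where y_i_fit = m * x_i + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


# Assumed data set (time in s, heart rate in bpm), reconstructed from the
# column sums of the worked example; the 0 s value of 67 bpm is inferred.
times = [0, 10, 20, 30, 40]
bpm = [67, 82, 86, 96, 120]

# Two-point line through (10, 82) and (40, 120) versus the regression line
print(f"{sum_squared_errors(times, bpm, 38 / 30, 208 / 3):.1f}")  # 209.0
print(f"{sum_squared_errors(times, bpm, 1.2, 66.2):.1f}")         # 104.8
```

For this data the regression line has about half the squared error of the two-point line, even though the two-point line passes exactly through two of the points.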

12 Finding m and b
$$m = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2} \qquad b = \frac{\sum y_i - m \sum x_i}{n} \qquad y = m \cdot x + b$$
Example (n = 5):
$x_i$ (s) | $y_i$ (bpm) | $x_i y_i$ | $x_i^2$
0 | 67 | 0 | 0
10 | 82 | 820 | 100
20 | 86 | 1720 | 400
30 | 96 | 2880 | 900
40 | 120 | 4800 | 1600
sums: $\sum x_i = 100$, $\sum y_i = 451$, $\sum x_i y_i = 10220$, $\sum x_i^2 = 3000$
$$m = \frac{5 \cdot 10220 - 100 \cdot 451}{5 \cdot 3000 - 100^2} = \frac{6000}{5000} = 1.2$$
$$b = \frac{451 - 1.2 \cdot 100}{5} = 66.2$$
$$\text{heart rate} = 1.2 \cdot \text{time} + 66.2$$
Repeat the above procedure for the data set selected in your class, and compare the m and b that you get with your classmates. Doing this by hand is good practice for the exam. Then repeat this process again for another data set.
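The formulas for m and b translate directly into code. This is a minimal sketch: `fit_from_sums` and `least_squares_fit` are hypothetical helper names, and the raw data points are the assumed five-point set behind the hand calculation (the 0 s reading of 67 bpm is inferred from $\sum y_i = 451$).

```python
def fit_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Least squares slope m and intercept b from the four column sums."""
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    m = (n * sum_xy - sum_x * sum_y) / denominator
    b = (sum_y - m * sum_x) / n
    return m, b


def least_squares_fit(xs, ys):
    """Compute the column sums from raw data, then apply fit_from_sums."""
    return fit_from_sums(
        len(xs),
        sum(xs),
        sum(ys),
        sum(x * y for x, y in zip(xs, ys)),
        sum(x * x for x in xs),
    )


# Sums from the hand calculation: n = 5, 100, 451, 10220, 3000
m, b = fit_from_sums(5, 100, 451, 10220, 3000)
print(f"heart rate = {m:.1f}*time + {b:.1f}")  # heart rate = 1.2*time + 66.2

# Same fit starting from the (assumed) raw data points
print(least_squares_fit([0, 10, 20, 30, 40], [67, 82, 86, 96, 120]))
```

Splitting the work into "compute sums" and "apply formulas" mirrors the hand method, so each intermediate value can be checked against the table.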

13 Repeat for all of the class data
Reformat your spreadsheet to have single x and y columns as shown (5 lines for each student’s heart rate data). Find the sums $\sum x_i$, $\sum y_i$, $\sum x_i \cdot y_i$, and $\sum x_i^2$ and plug them into the equations for m and b to find the best fit line. Try to do these calculations in Excel; it’s tricky due to fixed cell references and the placement of parentheses. Create a plot of all data in Excel, and plot the best fit line without any symbols over the data points. See the next page for an example.
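Reshaping the class table into single x and y columns amounts to repeating the time column once per student. The sketch below assumes the wide table is stored as a dictionary mapping a student label to that student's five bpm readings at 0, 10, 20, 30, and 40 s; `to_long_format`, `TIMES_S`, and the student label are hypothetical. Only one student is filled in here (the assumed five-point set from the hand calculation), so the fit should reproduce m = 1.2 and b = 66.2; in class, add one entry per student.

```python
TIMES_S = [0, 10, 20, 30, 40]  # total exercise time at each pulse measurement


def to_long_format(wide, times=TIMES_S):
    """Flatten {student: [bpm at each time]} into single x and y lists,
    giving one row per (student, time) pair."""
    xs, ys = [], []
    for student, readings in wide.items():
        if len(readings) != len(times):
            raise ValueError(f"{student}: expected {len(times)} readings")
        xs.extend(times)
        ys.extend(readings)
    return xs, ys


def least_squares_fit(xs, ys):
    """Least squares m and b using the sum formulas for m and b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b


wide = {"student 1": [67, 82, 86, 96, 120]}  # add one entry per student
xs, ys = to_long_format(wide)
m, b = least_squares_fit(xs, ys)
print(f"{len(xs)} rows: heart rate = {m:.2f}*time + {b:.2f}")
```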

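14 Details of solving previous problem in Excel
Don’t look at these tips unless you get stuck!!
In the example spreadsheet, x is in column B, y in column C, $x \cdot y$ in column D, and $x^2$ in column E, with the data in rows 5 to 24 and the column sums in row 26; m is in cell C28 and b is in cell C29.
Slope m (cell C28): =(COUNT(B5:B24)*D26-B26*C26)/(COUNT(B5:B24)*E26-B26^2)
Use these data points to plot the best-fit line: =C$28*B5+C$29 (the $ signs keep the references to C28 and C29 fixed when the formula is filled down).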