A data visualization course for undergraduate data science students Silas Bergen Winona State University; Winona, MN, USA CAUSE Webinar December 12, 2016.

Author: pranay devisetty

2 Context
Spring 2015: Winona State launches Data Science major. Core consists of:
Statistics:
• Introductory statistics
• Intermediate statistics
• Regression analysis
Computer science:
• Algorithms and Problem-Solving I and II
• Databases and Management Systems
Data science:
• Introductory Data Science (DSCI 210)
• Data Summary and Visualization (DSCI 310)
• Management of Structured Data (DSCI 325)
Slide labels: STAT, CS, STAT

3 Data Science Curriculum: Context
• DSCI 210 (Introduction to Data Science)
  o Principles of data aggregation and summarization
  o Introduction to data wrangling tools
  o Primary tools: Excel, R
  o Unstructured data “light” (rvest and web-scraping, text analysis)
  o Prerequisite for DSCI 310
• DSCI 310 (Data visualization)
• DSCI 325 (Management of structured data)
  o “Data Science II”
  o More advanced data management techniques
  o Heavy use of dplyr and SAS

4 DSCI 310: Course Content

5 1. Visualization theory and best practices (~3 weeks)
• Draws from Tufte, Cleveland & McGill, Wickham
• How we perceive information (elementary perceptual tasks)
• Critique-based; no designs
• Assessments: regular visualization critiques

6 1. Visualization theory and best practices: Critique example
1. What are the variables encoded in this visualization, and what are the EPTs used to encode them?
2. Do you think the bar/area chart in the lower left is effective? If so, explain why. If it could be improved, explain how.
3. How is funding encoded? There is one relationship I would really be interested to explore, but which this visualization makes very difficult. Can you think what that might be, and how could this relationship be visualized effectively?
http://mmviz.blogspot.co.uk/2015/03/visualizing-shelter-problem.html

7 1. Visualization theory and best practices: Critique example
• What are the variables, their types, and the EPTs used to encode them?
• List at least 3 insights you can glean from this visualization.
• Do you think this chord chart is more beautiful or confusing? Does the surrounding text help or hinder your ability to navigate this visualization?
• How could you visualize these data with a stacked bar chart? Would this be an improvement, or not?
http://bl.ocks.org/nbremer/raw/75c76f4be60fce435aba/

8 2. Visualization design tasks (~7 weeks)
• Data visualization:
  o Univariate visualizations
  o Multivariate visualizations
  o Geographic visualizations
  o Interactive & Static dashboards
• Tools: Tableau (primary); ggplot
• Assessments: “Design tasks”
  o Students provided with data set and set of questions to answer visually
  o Students submit draft to peer grading website CrowdGrader.org for peer feedback
  o Must review 3 peer visualizations according to provided rubric
  o Dual-edged benefit: feedback on their own visualization + chance to see peer designs
  o Students get time to incorporate feedback; submit final draft to me for grading

9 Aside: Tableau
PROS
• Point-and-click interface
• Easy interactivity (crosstalk-style)
• Tableau Public for online sharing
• Students can “master” by semester’s end
CONS
• Easy interactivity
• Only Tableau Public is free, generally
• Minimal data manipulation and analytic capabilities
• Less flexibility than other options

10 2. Visualization design: Design task example
Using the American Community Survey data, create 3 visualizations that address the following question: how does the inequity in median income between employed men and employed women vary across the country? The 3 visualizations should be:
1. A map with median income inequality encoded by color.
2. A map with median income inequality encoded by size.
3. A third visualization, of your own design, which is not a map.
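Before either map can be drawn, "inequality" has to be reduced to one number per state. The Python sketch below shows two candidate measures. It is only an illustration: the state names and dollar figures are invented (they are not ACS values), the function names are hypothetical, and the course itself used Tableau and ggplot rather than Python.

```python
# Hypothetical state-level median incomes for employed men and women.
# These numbers are made up for illustration; they are NOT ACS estimates.
median_income = {
    "State A": {"men": 50000, "women": 40000},
    "State B": {"men": 45000, "women": 41000},
    "State C": {"men": 80000, "women": 68000},
}


def earnings_ratio(men, women):
    """Women's median income as a fraction of men's (1.0 means parity)."""
    return women / men


def dollar_gap(men, women):
    """Men's median income minus women's, in dollars."""
    return men - women


ratios = {s: earnings_ratio(v["men"], v["women"]) for s, v in median_income.items()}
gaps = {s: dollar_gap(v["men"], v["women"]) for s, v in median_income.items()}

# Most unequal state first under each measure.
ranked_by_ratio = sorted(ratios, key=ratios.get)            # lowest ratio first
ranked_by_gap = sorted(gaps, key=gaps.get, reverse=True)    # largest gap first

print(ranked_by_ratio)
print(ranked_by_gap)
```

With these invented figures the two measures disagree about which state is most unequal, so the choice of measure changes what the color or size encoding on the map will emphasize.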

11 2. Visualization design: Design task example

12 2. Visualization design: Design task example
Teen pregnancy rates (TPRs) using CDC data
1. How have TPRs changed over time, both on the state-by-state level and overall?
2. Are there racial gaps in the TPRs? How have any gaps changed over time?
3. Which states have the best TPRs, and which states have the worst?
4. (Related to Q3): Which states have improved the most from 2003 to 2014?

13 2. Visualization design: Design task example
Aggregation challenge:
Column of misdirection
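The slide does not spell out what the aggregation challenge was. One common version, assumed here purely for illustration, is a data set that carries a precomputed rate column alongside the underlying counts: averaging the rate column across states gives a different "overall" rate than pooling the counts. The rows, column meanings, and function names below are hypothetical and are not taken from the CDC data.

```python
# Hypothetical rows: (state, year, teen births, teen female population).
# Invented numbers for illustration only; not CDC figures.
rows = [
    ("State A", 2014, 200, 10000),     # 20 per 1,000
    ("State B", 2014, 4000, 100000),   # 40 per 1,000
]


def rate_per_1000(births, population):
    """Rate expressed per 1,000 population."""
    return 1000 * births / population


def naive_mean_rate(rows):
    """Average the per-state rates, ignoring population size."""
    rates = [rate_per_1000(births, pop) for _, _, births, pop in rows]
    return sum(rates) / len(rates)


def pooled_rate(rows):
    """Sum the counts first, then compute a single rate."""
    total_births = sum(births for _, _, births, _ in rows)
    total_pop = sum(pop for _, _, _, pop in rows)
    return rate_per_1000(total_births, total_pop)


print(naive_mean_rate(rows))
print(pooled_rate(rows))
```

Under this assumption, the pooled rate weights each state by its population, while the naive mean treats a small state and a large state as equal; which one answers "overall" depends on the question being asked.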

14 2. Visualization design: Design task example

15 3. Projects (~5 weeks)
• 1 midterm group project
  o Data provided to students
• 1 final individual project
  o Data found by students
• Oral presentations

16 Group Projects: Examples
Task: create a visual report of the “Mortality in Local Jails and State Prisons” BJS report.
http://www.bjs.gov/latestreleases.cfm

17 Group Projects: Examples

18 Group Projects: Examples
Individual Project #2: Visualize The Current’s 2015 YTD playlist (www.thecurrent.org/playlist/)

19 Group Projects: Examples

20 Final projects: Examples

21-26 http://tabsoft.co/1UmY2UQ

27 Worked well:
• Peer review + editing cycle for all design tasks
• “The best of” for each design task
Biggest challenges:
• Teaching creativity (“Take Art 101” in student evaluations)
• Emphasizing insight over data dumping and the curse of the click

28 THANK YOU! [email protected]