1 An Important Question…
Hey, this is new and exciting!
Yeah, seen this all before.
2 Two Predictions…
“Books will soon be obsolete in the public schools. Scholars will be instructed through the eye. … Our school system will be completely changed inside of ten years.”
Thomas Edison
“In from three to eight years we will have a machine with the general intelligence of an average human being. I mean a machine that will be able to read Shakespeare, grease a car, play office politics, tell a joke, have a fight. At that point the machine will begin to educate itself with fantastic speed. In a few months it will be at genius level and a few months after that its powers will be incalculable.”
Marvin Minsky
3 The Problem of Hyperbole
The mechanics of hyperbole in four easy steps:
1. Pick a technology.
2. Make a world-changing/disrupting prediction.
3. Cherry-pick some case studies to ‘prove’ your point.
4. Write a book.
Big data is no exception.
4 A New Tool…
“The discovery of hitherto unknown, but commercially valuable, information in large databases.”
5 A New Tool
Classical statistics:
Predetermined questions (hypotheses);
Designed experiment/survey and/or used existing datasets;
Structured data;
Well-established tools;
Known unknowns.
Data mining:
No predetermined questions;
Used large existing sets;
New tools: machine learning;
Unknown unknowns.
6 Evolution of Analytics/Big Data…
Timeline: 1990s to today
Terminology: Knowledge Discovery in (Large) Databases (KDD); Data mining; Knowledge Discovery; Data Analytics; Business Analytics; Business Intelligence
Data: Structured, then Structured & Unstructured; Single Source, then Multiple Sources; “Large”, then “Big”
Claims: Modest, then Extravagant!
Big Data as the Answer to Everything
7 The Problem of Hyperbole
Time for some cooling off?
8 The Dark Side
9 The Dark Side
Threats/risks include:
De-anonymisation
Large-scale data losses
Unethical usage
Citizens’ privacy
Discrimination
Exclusion
Illegal usage
Misplaced trust
and not least…
Misguided public policies.
10 The Dark Side
“(predatory programs based on big data) … distort higher education, drive up debt, spur mass incarceration, pummel the poor at nearly every juncture and undermine democracy.”
Remember: it’s not just the data, it’s the algorithms.
11 The Problem of Open Government Data
Who Owns the Data?
The Problem of Licensing
Both Raise Complex Questions
12 Concluding Reflections
Yes, this is new and yes, it may turn out to be valuable.
It would be wise not to expect miracles.
We need to think carefully about how we use this technology.
It is one part of a much larger and more complex set of issues.
If you torture data long enough, it will confess to anything.