Transparent Data Supply

Slide 1: Reference Model for Transparent Data Supply
User interaction metadata for improved information traceability
The 20th Annual MIT International Conference on Information Quality (ICIQ 2015)
Sami Laine, Aalto University
Carol Lee, Northeastern University

Slide 2: Over 10 years of healthcare IS research and development
Personal background combines technical, human, and healthcare perspectives:
- University of Turku, Finland: information systems; empirical field studies in healthcare settings
- Turku University Hospital, Finland: healthcare data warehousing; project management, system and service design
- Aalto University, Finland: usability research; healthcare data and information quality
- Siili Solutions, Finland: information management consulting

Slide 3: Accurate data is an important issue for research and practice
Traditionally, data and information quality research and management have emphasized:
- Quality controls
- Standardization
- Managerial traceability (e.g. IP Maps)
- Technical traceability (e.g. data lineage)
The key lesson of TDQM is that information suppliers and manufacturers need to expand their knowledge of how and why consumers use information.

Slide 4: One should understand the actual circumstances and human behavior in the original data entry situations
Fundamental problems in data provenance:
- Lack of context
- Lack of human factors
Solutions to contextual data provenance:
- Managerial methods (e.g. CEIP Maps)
- Technical methods (e.g. Open Provenance)
Data provenance still remains a major challenge:
- Data customer dissatisfaction
- Technical provenance limitations

Slide 5: Research agenda
- What exactly are these "context factors" and "human factors"?
- Why are these factors so important to know?
We aimed to point out the context and human factors that should be recognized when interpreting the actual meaning and quality of situated data.

Slide 6: Research background (design science research framework)
Environment (relevance):
- Business needs: high-quality data (primary requirement); transparent provenance (supporting requirement)
- Technology needs: procurement guidelines (method artifact); transparent software systems (instantiation artifact)
Design research:
- Develop/build: explanatory design theory (ISDT terminology), or reference model artifact (DSR terminology)
- Justify/evaluate: explanatory case study (general requirements); scenarios (general components)
Knowledge base (rigor):
- Foundations: TDQM (data and information quality research); Context of Use (human-computer interaction); software layers (software engineering)
- Methodologies: explanatory case study (empirical research); DSR & ISDT (constructive approach)
Contributions:
- Business: the reference model points out provenance gaps
- Scientific: the reference model extends TDQM

Slide 7: Explanatory design theories provide functional explanations as to why a solution has certain components
Design product:
- Kernel theories: theories from social or natural sciences governing design requirements
- Meta-requirements: describe the class of goals to which the theory applies
- Meta-design: describes a class of artifacts hypothesized to meet the meta-requirements
- Testable hypotheses for the design product: used to test whether a proposal matches the meta-requirements
Design process:
- Kernel theories: theories from natural and social sciences governing the design process itself
- Design method: description of procedure(s) for artifact construction
- Testable hypotheses for the design process: used to test whether the design method results in an artifact satisfying the meta-requirements

Slide 8: Context of use at data entry
Contextual human action, shaped by the user, task, tool, and environment, can be interpreted from two alternative perspectives:
Prescriptive planning (aims at improvement):
- Data entry can be managed by quality control.
- Semantic heterogeneity can be managed by standardization.
Situated action (aims at understanding):
- Data entry is affected by complex circumstances.
- Descriptions are inherently vague and incomplete.
References:
- Wang, R. Y., Lee, Y. W., Pipino, L. L., and Strong, D. M. (1998). Manage Your Information as a Product. Sloan Management Review, 39(4).
- Suchman, L. A. (1987). Plans and Situated Actions: The Problem of Human-Machine Communication. New York, NY, USA: Cambridge University Press.

Slide 9: Research question
"08:53"
What does this really mean?

Slide 10: How exactly is a registration timestamp value such as "08:53" created in hospital processes?
Patient flow: Arrival → Registration → Treatments → Discharge → Departure

Slide 11: Kernel theories were selected from three different disciplines
- Context of Use: according to Context of Use, all human action is affected by subtle characteristics of users, tasks, tools, and environments.
- Total Data Quality Management
- Software Layers

Slide 12: How exactly is a registration timestamp value such as "08:53" created in hospital processes?

Meaning                          | User                          | Task              | Tool                    | Environment
"arrival at location"            | Patient                       | Self-registration | Barcode card            | Current unit
"available service at reception" | Secretary (current user)      | Registration      | EPR & key press         |
"midnight at previous day"       |                               |                   | EPR & manual adjustment |
"will leave at this time"        | Secretary (at previous unit)  | Discharge         |                         | Previous unit
"will be picked up at this time" |                               |                   |                         |
"is leaving unit now"            |                               |                   |                         |

(Blank cells were merged with the cell above on the original slide.)

Slide 13: Even a simple data element can be complex information!
The same data value can mean completely different things, yet every instance looks identical at the data layer. A registration at "08:53" may mean:
- "Arrival at location"
- "Available service at reception"
- "Midnight at previous day"
- "Will leave at this time"
- "Will be picked up at this time"
- "Is leaving unit now"
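
To make slide 13's point concrete, here is a minimal Python sketch (all class and variable names are hypothetical, not from the paper): two records that are identical at the data layer are disambiguated only by context-of-use metadata.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RegistrationRecord:
    patient_id: str
    timestamp: str  # all the data layer stores, e.g. "08:53"

@dataclass(frozen=True)
class ContextOfUse:
    user: str         # who entered the value
    task: str         # what they were doing
    tool: str         # which instrument produced the value
    environment: str  # where the entry happened

# Interpretation depends on the context of use, not on the stored value
# (mapping taken from slide 12; abbreviated here).
MEANINGS = {
    ("Patient", "Self-registration"): "arrival at location",
    ("Secretary (current user)", "Registration"): "available service at reception",
    ("Secretary (at previous unit)", "Discharge"): "will leave at this time",
}

a = RegistrationRecord("p1", "08:53")
b = RegistrationRecord("p2", "08:53")
ctx_a = ContextOfUse("Patient", "Self-registration", "Barcode card", "Current unit")
ctx_b = ContextOfUse("Secretary (at previous unit)", "Discharge", "EPR & key press", "Previous unit")

assert a.timestamp == b.timestamp          # identical at the data layer
print(MEANINGS[(ctx_a.user, ctx_a.task)])  # arrival at location
print(MEANINGS[(ctx_b.user, ctx_b.task)])  # will leave at this time
```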

Slide 14: Kernel theories were selected from three different disciplines
- Context of Use
- Total Data Quality Management: TDQM defines data supply to consist of the data creation, collection, and recording phases.
- Software Layers

Slide 15: The meaning can change, and the quality will change, across the data creation, collection, and recording phases!
CREATE → COLLECT → RECORD
The creation and collection phases are often not known, so the actual meaning and quality of the recorded data cannot be known.
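
A minimal sketch of the three TDQM data supply phases, assuming a hypothetical provenance trail (the function names and the trail structure are illustrative, not part of TDQM itself): each phase appends an entry instead of silently transforming the value, so the hand-offs stop being unknown.

```python
from datetime import datetime, timezone

def create(raw_value: str) -> dict:
    """Data creation: the original event is observed."""
    return {"value": raw_value,
            "trail": [("create", raw_value, datetime.now(timezone.utc).isoformat())]}

def collect(item: dict) -> dict:
    """Data collection: the value may be rounded, coded, or re-keyed."""
    item["value"] = item["value"][:5]  # e.g. seconds are silently dropped here
    item["trail"].append(("collect", item["value"], datetime.now(timezone.utc).isoformat()))
    return item

def record(item: dict) -> dict:
    """Data recording: normally the only phase visible in the database."""
    item["trail"].append(("record", item["value"], datetime.now(timezone.utc).isoformat()))
    return item

entry = record(collect(create("08:53:41")))
print(entry["value"])  # "08:53" -- all the database usually shows
print(entry["trail"])  # the phase-by-phase history that is usually lost
```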

Slide 16: Kernel theories were selected from three different disciplines
- Context of Use
- Total Data Quality Management
- Software Layers: software systems are based on three layers: data source, business logic, and user interface.

Slide 17: The meaning can change, and the quality will change, across the user interface, application logic, and database layers!
USER INTERFACE → APPLICATION LOGIC → DATABASE
The user interface and application layers are often not known, so the actual meaning and quality of the recorded data cannot be known.
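
The same idea on the technical axis, as a minimal sketch (hypothetical functions, assuming the "midnight at previous day" manual adjustment from slide 12): inspecting only the database row makes the application-layer rewrite invisible.

```python
def ui_layer(keyed_text: str) -> str:
    # User interface: what the user actually typed or scanned.
    return keyed_text.strip()

def application_layer(time_text: str, adjust_to_midnight: bool) -> str:
    # Application logic: business rules may rewrite the value, e.g. the
    # manual "midnight at previous day" adjustment.
    return "00:00" if adjust_to_midnight else time_text

def database_layer(value: str) -> dict:
    # Data layer: only the final value is persisted.
    return {"registration_time": value}

row = database_layer(application_layer(ui_layer(" 08:53 "), adjust_to_midnight=True))
print(row)  # {'registration_time': '00:00'} -- the UI entry "08:53" is lost
```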

Slide 18: Dimensional axes of the reference model
One should know these to understand the actual meaning and quality of situated data:
- Organizational axis (data supply phases): CREATE, COLLECT, RECORD
- Technical axis (software layers): USER INTERFACE, APPLICATION, DATABASE
- Context-of-use factors: USER, TASK, TOOL, ENVIRONMENT

Slide 19: There are significant unknown contextual variations and hidden quality problems within and between the axes!
(The same grid as slide 18: CREATE/COLLECT/RECORD against USER INTERFACE/APPLICATION/DATABASE, with the USER, TASK, TOOL, and ENVIRONMENT factors.)
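
One way to picture this claim is to lay the two axes out as a grid and count what is actually documented. This is only an illustrative sketch of the reference model's structure, with hypothetical names, not the paper's formal notation.

```python
PHASES = ["create", "collect", "record"]                 # organizational axis
LAYERS = ["user interface", "application", "database"]   # technical axis
FACTORS = ["user", "task", "tool", "environment"]        # context-of-use factors

# Each (phase, layer) cell can hold metadata about each context factor.
grid = {(p, l): {f: None for f in FACTORS} for p in PHASES for l in LAYERS}
grid[("create", "user interface")]["user"] = "Patient"
grid[("create", "user interface")]["tool"] = "Barcode card"

# Cells left as None are exactly the "unknown contextual variations".
unknown = sum(v is None for cell in grid.values() for v in cell.values())
print(f"{unknown} of {len(grid) * len(FACTORS)} context facts are unknown")
```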

Slide 20: One should understand the black boxes!
Focusing only on the data recorded in databases can make you blind to semantic heterogeneity and alternative error profiles. Across the CREATE/COLLECT/RECORD phases and the USER INTERFACE and APPLICATION layers, all six meanings ("arrival at location", "available service at reception", "midnight at previous day", "will leave at this time", "will be picked up at this time", "is leaving unit now") collapse into one identical DATABASE record: registration at "08:53".

Slide 21: Contextual metadata could help us understand what situated data really means!
- Metadata flows: along the CREATE → COLLECT → RECORD phases
- Metadata sources: the USER INTERFACE, APPLICATION, and DATABASE layers
- Metadata categories: the USER, TASK, TOOL, and ENVIRONMENT factors

Slide 22: Data supply should fulfill these requirements to provide transparent and accurate information
- Quality controls: in data entry situations, errors and environmental interference should be minimized by effective quality control.
- Precise semantics: semantic meanings should be precise across all contexts rather than generalized into common concepts that leave room for ambiguity.
- Documented contexts: data and metadata should capture the contextual variations that are currently lost from data sets built on the technical data layer and the organizational recording phase alone. In practice, the contexts of data creation and collection should be captured and stored for later use in addition to the current recording phase; technically, the contextual properties of the user interface and application layers should be stored in addition to the properties of the data layer.
- Automatic supply: the emphasis should be on automatic documentation of primary events rather than manual documentation for secondary purposes. A sketch of this idea follows the list.
- Traceable contexts: data and metadata should support traceability across the documented contexts. Original data creation situations should be traceable from the recorded data through links between all three production process roles and all three technical layers, because semantic mismatches cannot be recognized within a single layer, only in comparison to other layers or roles.
- Openness: data and metadata should be opened transparently to secondary users. Open Data or an Open API should not be a black box that hides the contextual details and variations of previous data supply roles or technical layers.
Reference: Laine, S., Lee, C., and Nieminen, M. (2015). "Transparent Data Supply for Open Information Production Processes." In Proceedings of the European Conference on Information Systems (ECIS), Münster, Germany.
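
As an illustration of the "Automatic supply" and "Documented contexts" requirements, here is a minimal Python sketch (the decorator, its parameters, and the captured fields are all hypothetical, not an API from the paper): contextual metadata is attached automatically at the primary data entry event rather than documented manually afterwards.

```python
import functools
import getpass
import socket
from datetime import datetime, timezone

def with_context(task: str, tool: str):
    """Wrap a data-entry function so each call also emits its context of use."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            value = fn(*args, **kwargs)
            context = {
                "user": getpass.getuser(),            # who entered the value
                "task": task,                         # what they were doing
                "tool": tool,                         # which instrument was used
                "environment": socket.gethostname(),  # where the entry happened
                "recorded_at": datetime.now(timezone.utc).isoformat(),
            }
            return value, context                     # data plus situation metadata
        return wrapper
    return decorator

@with_context(task="Registration", tool="EPR & key press")
def register_arrival(time_text: str) -> str:
    return time_text

print(register_arrival("08:53"))
```

In a real EPR the context would of course come from the session and device registry rather than the operating system, but the principle is the same: the metadata is a by-product of the event itself, not a separate documentation task.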

Slide 23: Variables of the reference model
Metadata on these factors should be collected, stored, and delivered to expose unknown contextual variations:
- Quality maturity
- Precise semantics
- Complete context
- Automated mechanisms
- Traceable dimensions
- Etc.
Situation metadata spans the same grid as before: CREATE/COLLECT/RECORD against USER INTERFACE/APPLICATION/DATABASE, with the USER, TASK, TOOL, and ENVIRONMENT factors.

Slide 24: The reference model extends the theoretical framework of TDQM to situated interactions in data supply situations
In the information production process (data supply, data manufacturing, data consumption), existing methods cover data manufacturing lineage (IP Maps, CEIP Maps); the suggested extension covers data supply lineage: the CREATE/COLLECT/RECORD phases, the USER INTERFACE/APPLICATION/DATABASE layers, and the USER/TASK/TOOL/ENVIRONMENT factors.
The reference model suggests that additional metadata about data supply situations should be collected, stored, and delivered for secondary users, covering:
- data creation, collection, and recording
- user interface structures, application rules, and database structures
Such additional metadata might later reveal hidden quality problems that simply cannot be recognized from a single perspective (e.g. a single axis or context factor):
- semantic heterogeneity
- biased error profiles
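
A last sketch of why single-perspective inspection fails, using values from the running "08:53" example (the trail structure is hypothetical): a semantic mismatch only becomes visible when the recorded value is compared across the documented layers.

```python
# Layer-by-layer trail for one recorded value (hypothetical structure).
trail = {
    "user interface": "08:53",  # what was actually keyed in
    "application":    "00:00",  # after the manual "midnight" adjustment
    "database":       "00:00",  # what secondary users see
}

# Within any single layer the value looks fine; disagreement across layers
# is what reveals that the stored value no longer means what was entered.
if len(set(trail.values())) > 1:
    print("Layers disagree: the recorded value hides a semantic change.")
```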

Slide 25: We present an alternative paradigm for data and information quality research
(The same comparison as slide 8: prescriptive planning, which aims at improvement, versus situated action, which aims at understanding, across the user, task, tool, and environment of data entry.)
From the situated data supply perspective, data standards and quality controls are only partial factors for data consumers to take into account. There is a need to collect, store, and measure contextual information that is out of the scope of plans and standards. The fundamental problem is that the significance of contextual details can never be fully determined beforehand; their impact may only become evident afterwards. The situated action perspective therefore emphasizes comprehensive contextual awareness and unbounded sensitivity to unexpected events. Data quality could then be observed contextually rather than in relation to plans and standards. This is not a replacement for traditional data quality management but a complementary approach that focuses on interpreting the unexpected rather than on achieving planned outcomes.

Slide 26: There is a need for constructs, models, methods, and instantiations for transparent data supply
- Research on transparency: completed (ECIS 2015), covering empirical requirements (empirical relevance) and kernel theories (theoretical grounding)
- Research on the reference model: completed (ICIQ 2015), covering meta-requirements and meta-design
- Future research, following the IT artifact types of design science research (Hevner et al., 2004): constructs ("Data Supply Lineage"), models ("Reference Model"), methods ("Software Engineering Methods"), instantiations ("Metadata Mining Software")
Are you interested?

Slide 27: Thanks for your attention! Questions?
Sami Laine
Aalto University, Department of Computer Science and Engineering, Finland
https://www.researchgate.net/profile/Sami_Laine/