1 Plan for Today’s Lecture(s): Intro to Resource Description; Resource Properties and Description; Creating Resource Descriptions
2 INFO 202 “Information Organization & Retrieval” Fall 2015. University of California, Berkeley, School of Information. Robert J. Glushko (@rjglushko), 23 September 2015. Lecture 8.1 – Introduction to Resource Description
3 An Overview of Resource Description: What is a resource description? Why do we describe resources? What resource properties should be described? How are resource descriptions created? What makes a good resource description?
4 What is a Resource Description? Information that is created intentionally and associated with a resource to enable it to be organized and interacted with. A resource description is also a resource from the perspective of any other resource that uses it. Structured descriptions of information resources are often called “metadata.”
5 Concert Ticket: Both a Resource and a Resource Description
6 A Resource
7 Metadog, not Metadata (in Les Baux-de-Provence, 8 August 2013)
8 Why We Describe Resources: We describe resources so we can refer to them, select them, organize them, interact with them, and maintain them. Each purpose might require different descriptions and different methods of using them. Different resource domains can have characteristic or standard resource descriptions (or description categories).
9 Kimra’s Kitchen Chair
10 Resource Descriptions as Substitutes: A resource description is often a functional substitute for the resource it describes when the latter can’t be accessed or used. Computer-processable descriptions of physical resources are especially good substitutes because they enable manipulations and interactions that would otherwise be impossible. At much larger scope and scale…
11 Why We Describe Resources: But different types of resources must also have differentiating properties; otherwise there would be no reason to distinguish them as different types. Over time, as a collection of resources grows and as requirements for interactions change, the reasons for describing resources will also change, and the descriptions must change as well. We often combine descriptions... and we often compare them.
12 Objectives of Bibliographic Description: Finding a resource that you know exists. Identifying a resource to make sure you have the one you were looking for. Selecting a resource from a set of candidates in a collection. Obtaining the resource if what you have at this point is just a resource description.
13 Complications: The properties of resources that are easiest to describe are not always the most useful ones, especially for information resources. For non-text information resources this problem is magnified because the content is often in a semantically opaque format that cannot usefully be analyzed by people. Business strategy and economics strongly influence the extent of resource description.
14 Stop and Think: Real Estate Advertisements. Real estate advertisements are notorious for their creatively optimistic descriptions; a house that is advertised as being “convenient to transportation” is most likely next to a busy highway or bus route, and a house in a “secluded location” is in a remote and desolate part of town. How would you describe the house or apartment or room where you live in a way that turns its negatives into positives?
15 Lecture 8.2 – Resource Properties and Description
16 Classifying Resource Properties
17 Intrinsic Properties: Intrinsic properties are inherent in a resource. Some are static, never changing their values. Others are dynamic, but they change “from the inside of the resource,” not “from the outside” by actions or efforts of outside agents (like “developmental” properties: age, skill, experience). Intrinsic properties can sometimes be used as an identifier, like a computed "signature" for a document or media object.
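The computed "signature" idea can be illustrated with a cryptographic hash of a resource's content. This is a minimal sketch, assuming SHA-256 is an acceptable signature function; the lecture does not name an algorithm, and the helper name `content_signature` and the sample byte strings are hypothetical.

```python
import hashlib


def content_signature(content: bytes) -> str:
    """Return a hex digest computed only from the resource's own bytes."""
    # Assumption: SHA-256 stands in for whatever signature scheme is used.
    return hashlib.sha256(content).hexdigest()


original = b"Information that is created intentionally"
exact_copy = b"Information that is created intentionally"
edited = b"Information that is created accidentally"

# Identical content yields an identical signature, wherever the copy lives.
same = content_signature(original) == content_signature(exact_copy)
# A change to the content yields a different signature.
differs = content_signature(original) != content_signature(edited)
print(same, differs)
```

Because the signature depends only on the content, it behaves like an intrinsic static property: renaming or moving the file, which are extrinsic changes, leaves it unchanged.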
18 Physical Properties 1: Physical or perceptible properties are those "at the surface" that are immediately apparent. For "natural" things, these physical properties often make excellent descriptors because they are intrinsic or inherent rather than assigned. They occur in consistent, predictable, and correlated combinations, which makes them reliable aids to identifying instances or types of things (the “joints of nature”).
19 Distinctive Physical Properties
20 Picture and Icon Signs as Resource Descriptions
21 Physical Properties 2: For "manmade" things, surface properties are less predictable ("innovation" enables the separation of form and function). For "information" things in physical form, the correlation between appearance and content varies across the Document Type Spectrum. For "born digital" resources, the appearance is often extrinsic because it is assigned or associated (a style sheet, for example) instead of being an inherent part of the resource when it was created.
22 Which Properties Best Identify a Thing? A Type of Thing? The properties best for identifying an instance might be those that aren't typical of its class.
23 Extrinsic Properties: Extrinsic properties are assigned to a resource rather than being inherent in it. Some extrinsic properties are static: assigned names or identifiers don’t usually change. Other extrinsic properties change often: the current location, popularity, price, etc. of a resource.
24 Cultural Properties: Cultural properties are those that derive from conventional language use or culture and often involve analogy. Because they derive from knowledge of culture or language, they might be unintelligible to people who don't have the same perspective and experience... and they might lose their cultural salience.
25 “Holbein” Carpets. [Images: The Ambassadors, 1533; Anatolian Carpet, 16th Century; Anatolian Carpet, 16th Century] Where does Holbein rank on a list of most important painters of all time?
26 Contextual Properties: Contextual properties relate to the situation or context in which the thing being described exists. Dey et al. (2001) define context as “any information that characterizes a situation related to the interactions between users, applications, and the surrounding environment.” This open-ended definition implies many contextual properties that might be used in a description. Since context changes, context-based descriptors might be appropriate when assigned but not later; see "persistence" and "effectivity."
27 Contextual Description == Private Language? A "guest" tag makes sense to the photographer, but no one else would use it to describe this photo if they don’t share the context.
28 Structural Properties: The internal and external structure of a thing can be a useful part of its description (sometimes these are intrinsic, and other times extrinsic). The number and arrangement of component parts (e.g., the number of chapters or pages in a book). The number and type of connections with other resources (e.g., the number of Facebook friends or Twitter followers).
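Both kinds of structural property reduce to counting parts and counting connections. The following is a small sketch under invented data: the chapter names, page counts, and "follows" pairs are all hypothetical and stand in for a real book or social network.

```python
from collections import Counter

# Hypothetical book: chapter titles mapped to page counts.
chapters = {"Foundations": 24, "Activities": 31, "Resources": 40}

# Hypothetical "follows" links as (follower, followed) pairs.
follows = [
    ("ana", "ben"),
    ("cy", "ben"),
    ("ben", "ana"),
    ("dee", "ana"),
    ("cy", "ana"),
]

# Internal structure: number and size of component parts.
chapter_count = len(chapters)
page_count = sum(chapters.values())

# External structure: number of incoming connections per resource.
follower_counts = Counter(followed for _, followed in follows)

print(chapter_count, page_count, follower_counts["ana"], follower_counts["ben"])
```

The part counts change only if the book itself is revised, while the follower counts change through the actions of other people, which is one way to see why structural properties can be intrinsic in some cases and extrinsic in others.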
29 Meaning is Structure: Could you describe this without using structural concepts?
30 The Social Network of Jesus
31 Properties & Principles of Kitchen Organization: Intrinsic static properties: storing your pots, frying pans, and baking pans with each set nested by size. Extrinsic static properties: a spice rack with the spices arranged in alphabetical order. Intrinsic dynamic properties: arranging your milk and other perishable goods by expiration date, a “useful life remaining” property that decreases to zero as the expiration date approaches. Extrinsic dynamic properties: putting the most frequently used condiments or spices at the front of a refrigerator or pantry shelf.
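The “useful life remaining” property can be computed and used as an arrangement key. This is a rough sketch with made-up items and dates; the function name `useful_life_remaining` and the fixed value of `today` are assumptions chosen so the example is reproducible.

```python
from datetime import date

# Hypothetical perishables with their expiration dates.
perishables = {
    "milk": date(2015, 9, 26),
    "yogurt": date(2015, 10, 2),
    "eggs": date(2015, 9, 30),
}


def useful_life_remaining(expires: date, today: date) -> int:
    """Days left before expiration, never below zero."""
    return max((expires - today).days, 0)


today = date(2015, 9, 23)  # fixed date so the example is reproducible

# Arrange the shelf so that items with the least life remaining come first.
shelf_order = sorted(
    perishables,
    key=lambda item: useful_life_remaining(perishables[item], today),
)
print(shelf_order)
```

The property value changes every day without anyone acting on the item, which is what makes it intrinsic and dynamic rather than extrinsic.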
32 Properties & Principles of Document Organization: Intrinsic static properties: author, date published, words in the text. Extrinsic static properties: ISBN, LOC classifications. Intrinsic dynamic properties: effectivity (e.g., laws and regulations). Extrinsic dynamic properties: links/citations to and from other documents.
33 Stop and Think: Principles of Arrangement in a Clothing Store. Intrinsic static properties: … Extrinsic static properties: … Intrinsic dynamic properties: … Extrinsic dynamic properties: …
34 Properties vs. Relationships: But while we can express structure within and between resources as properties, there is “better language” for expressing this as RELATIONSHIPS: arrangement, proximity, connectivity, part-whole, … (see TDO Chapter 5).
35 Lecture 8.3 – Creating Description Vocabularies and Assigning Descriptions
36 Process of Describing Resources: (1) Identify / scope the resources to be described. (2) Determine the purposes or uses of the descriptions. (3) Study the resource(s) to identify descriptive properties. (4) Design the description vocabulary. (5) Design the description form and implementation. (6) Create the descriptions (either "by hand" or by some automated / computational process). (7) Evaluate the descriptions. (TDO Figure 4-3)
37 Scoping and Description: How explicit and thorough these 7 steps need to be depends on what we are calling "scope." Do we know who the users are? Are the describers the users? Do we have all the resources, a representative set, or just the first set... do we know what else is coming?
38 Designing the Description Vocabulary: Good descriptions use terms that their intended users might use. Good descriptions don't contain details that aren't necessary. Good descriptions are created systematically and follow standards.
39 Stop and Think: Description and Expertise. Everyone knows something about trees, but some people know more than others, and their particular experience and perspective influence how they describe trees. What kind of properties and descriptions would be used by university students? By research botanists? By landscape designers? By park maintenance workers? By indigenous people who live in tropical rain forests?
40 The Need for Controlled Vocabularies: The words people use to describe things or concepts are "embodied" in their context and experiences... so they are often different or even "bad" with respect to the words used by others. These naturally-occurring words are an "uncontrolled vocabulary."
41 The Need for Controlled Vocabularies: Searches that use an uncontrolled vocabulary will not succeed when their terms fail to match resources described or indexed using a controlled vocabulary. Agreeing on the words we use in descriptions will improve recall, but it means that we must use only a subset of the words we would otherwise use.
42 What is a Controlled Vocabulary? A controlled vocabulary is a standardized set of terms (such as subject headings, names, classifications, etc.) assigned by organizers / cataloguers / indexers of resources. A CV can be a content standard for the values used in (or as) descriptive elements. A CV can be thought of as a fixed or closed dictionary in which everything must be defined using the same set of terms.
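The effect of a controlled vocabulary on search can be sketched with a lookup table that maps the words people naturally use to one preferred term. Everything here is an assumption made for illustration: the terms, the resource identifiers, and the function names `raw_search` and `controlled_search` are hypothetical, not taken from any real subject heading list.

```python
# Hypothetical controlled vocabulary: each entry term maps to one preferred term.
PREFERRED_TERM = {
    "automobile": "automobile",
    "car": "automobile",
    "auto": "automobile",
    "motion picture": "motion picture",
    "movie": "motion picture",
    "film": "motion picture",
}

# Hypothetical index: resources are described only with preferred terms.
INDEX = {
    "r1": {"automobile"},
    "r2": {"motion picture"},
    "r3": {"automobile", "motion picture"},
}


def raw_search(query: str) -> set:
    """Match the searcher's own word directly against the index."""
    return {rid for rid, terms in INDEX.items() if query in terms}


def controlled_search(query: str) -> set:
    """Map the searcher's word to its preferred term before matching."""
    preferred = PREFERRED_TERM.get(query.strip().lower())
    if preferred is None:
        return set()  # the word is outside the vocabulary
    return raw_search(preferred)


print(raw_search("car"), controlled_search("car"))
```

A search for "car" finds nothing when matched directly, because the indexers used "automobile"; routing the query through the vocabulary recovers both matching resources. A word that is not in the vocabulary at all, such as "truck" here, still returns nothing, which is the cost of restricting descriptions to a closed set of terms.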
43 Types of Controlled Vocabularies: Dictionaries. Authoritative names. Authority control for places and time periods. Identifiers. Code lists. Subject heading lists (like the Library of Congress Subject Headings, LCSH). Thesauri. Classification systems.
44 Horizontal and Vertical Vocabularies: Horizontal: concepts that are common to all (or a large number of) domains; examples: XCBL, UBL. Vertical: a particular industry or vertical market; specialized product or process semantics; sometimes called “domain-specific languages.”
45 “Dimensionality Reduction” in Descriptions: Creating a controlled vocabulary reduces the number of descriptive terms that are or can be assigned to something. When achieved by computational techniques, "dimensionality reduction" goes by names like "principal components analysis," "orthogonal decomposition," "latent semantic analysis," "factor analysis," “feature extraction,” and others. These techniques analyze the correlations between descriptors and transform a large set into a smaller set of uncorrelated ones. But the resulting descriptors are statistical constructs and might not have any clear interpretation.
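To make the statistical idea concrete, here is a minimal sketch of principal components analysis for just two correlated descriptors, using the closed-form eigen-decomposition of a 2x2 covariance matrix. The data (page counts and word counts in thousands for five imaginary books) and the function name `pca_2d` are assumptions for illustration; real analyses involve many more descriptors and would normally use a numerical library.

```python
import math


def pca_2d(points):
    """Principal components analysis for two descriptors, in closed form.

    Assumes at least two points that are not all identical.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Sample covariance matrix [[a, b], [b, c]].
    a = sum(dx * dx for dx, _ in centered) / (n - 1)
    c = sum(dy * dy for _, dy in centered) / (n - 1)
    b = sum(dx * dy for dx, dy in centered) / (n - 1)

    # Eigenvalues of a symmetric 2x2 matrix.
    half_trace = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    top, bottom = half_trace + radius, half_trace - radius

    # Unit vector along the first principal component.
    theta = 0.5 * math.atan2(2 * b, a - c)
    direction = (math.cos(theta), math.sin(theta))

    # One score per resource replaces its two original descriptor values.
    scores = [dx * direction[0] + dy * direction[1] for dx, dy in centered]
    explained = top / (top + bottom)
    return direction, scores, explained


# Hypothetical books: (pages, thousands of words).
books = [(100, 30), (200, 61), (300, 88), (400, 122), (500, 149)]
direction, scores, explained = pca_2d(books)
print(round(explained, 4))
```

Because pages and word counts rise together, a single derived score captures nearly all of the variation in both. That score is a weighted blend of the two descriptors rather than a property anyone would name, which illustrates why such constructs might not have a clear interpretation.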
46 Description Form and Implementation: Format Matters!
47 Creating Resource Descriptions: By professionals: "institutional" descriptions that follow standards. By authors: best knowledge of purpose and intended audience. By users: most variable, influenced by social purposes. By automated processes: most reliable, primarily technical / objective properties, but semantic description capability is emerging.
48 Evaluating Resource Descriptions: Quality should be evaluated with respect to intended purposes... but what if different stakeholders have different purposes? Creating resource descriptions can be costly; is it worth it? How and when should the costs be measured? Computational description is vastly less expensive, but is it good enough? How can we incent "community" or "crowdsourced" description to be of better quality?
49 Description Properties & Principles in Persona Design; Evaluating Personas: A persona is a resource instance (a user model) created to stand for a resource type (a class of users). How much detail should a persona have? How much should this detail be based on "real" demographic, psychographic, or marketing data? Should personas evolve, or should they be static? Intrinsic static properties: … Extrinsic static properties: … Intrinsic dynamic properties: … Extrinsic dynamic properties: …
50 Improved Methods for Creating Personas: Pruitt & Grudin, "Personas: Practice and Theory," Proceedings of the Conference on Designing for User Experiences, 2003. Use market research and existing user research about actual or potential customer segments to determine which segments should be "enriched" into personas. Begin with the existing market data and research, and don't "enrich" until this foundation work has been incorporated. Use standard documents as templates or metamodels to guide the collection and use of the persona source information so that personas are consistent in the kinds of descriptions they contain. Validate and prioritize personas by using them to guide subsequent usability and market research.
51 Data-based vs. Intuitive Personality Judgments: Youyou, W., Kosinski, M., & Stillwell, D. (2015). Computer-based personality judgments are more accurate than those made by humans. Proceedings of the National Academy of Sciences, 112(4). Use Facebook “likes” of articles, videos, artists, and other items to predict a person’s self-ratings on openness, conscientiousness, extraversion, agreeableness, and neuroticism. A computer could predict the subject's personality more accurately than a work colleague by analyzing just 10 likes, better than a friend with 70, a family member with 150, and a spouse with 300. The authors: “computers could be able to infer our psychological traits and react accordingly, leading to the emergence of emotionally intelligent and socially skilled machines.”
52 Readings for Next Lecture: Kent, Chapters 2 and 3. Naumann, Felix and Herschel, Melanie. An Introduction to Duplicate Detection, Chapter 1, 2010.
53 Midterm #1 Available from noon Friday to midnight Sunday? You can choose any 60-minute time period to complete the exam. Answer any 3 of the 5 questions. Answer ALL PARTS OF THE QUESTION.