Rob Gleasure robgleasure.com

Author: Moses Riley

1 IS6146 Databases for Management Information Systems
Lecture 11: The Semantic Web and Semi-Structured Data - Introducing XML
Rob Gleasure, robgleasure.com

2 IS6146 Today’s session
- From Web 1.0 to Web 2.0
- From Web 2.0 to ‘The Semantic Web’ (Web 3.0)
- Semi-structured and unstructured data
- Introduction to XML

3 The 2000s and Web 2.0
What is Web 2.0? There is no single clear definition…
- “Web 2.0 is the business revolution in the computer industry caused by the move to the Internet as a platform, and an attempt to understand the rules for success on that new platform” (Tim O’Reilly)
- “A mix of technology and business process that facilitates conversational marketing” (Bottle PR)
- “Web 2.0 isn’t a thing… it’s a state of mind”
- “The web is changing from a document delivery system to an application platform” (Andy Budd)

4 The 2000s and Web 2.0 (continued)
What is Web 2.0?
Web 2.0 is largely a reaction to the mistakes of 2000 and the “dot-com collapse”. What happened?
- Naïve business models, often solving non-existent problems
- Technology for technology’s sake
- An assumption that technology could solve non-technology problems
- Overestimation of usability and accessibility
- Lack of respect for standards

5 The 2000s and Web 2.0 (continued)
Web 1.0 vs. 2.0
Web 1.0 was about    Web 2.0 is about
Reading              Writing
Companies            Communities
HTML                 XML
Home pages           Blogs
Portals              RSS
Taxonomy             Tags
Wires                Wireless

6 The 2000s and Web 2.0 (continued)
Web 1.0 vs. 2.0
Web 1.0 was about          Web 2.0 is about
Owning                     Sharing
IPOs                       Trade sales
Windows, Netscape, etc.    Google, Facebook, etc.
Web forms                  Web applications
Screen scraping            APIs
Dialup                     Broadband
Hardware costs             Bandwidth costs

7 The 2000s and Web 2.0 (continued)
Web 2.0 & the Architecture of Participation
- Providing a service, not a product
- Encourage user contribution
- Collective intelligence
- Make it easy to re-use and re-mix
- Customer self-service
- Community and sense of ownership

8 The Semantic Web
More recently, this has changed again
- There is so much content out there that the problem becomes making sense of it
- Move towards ‘the semantic web’ (Web 3.0)
- The challenge for users is to
  - Distil content down to the stuff of interest
  - Distribute content without overwhelming everyone

9 The Semantic Web and the Movement of Information
The shift away from broadcast models of communication towards more collaborative models also means that information moves differently
This means the channels to users change for digital enterprises
- Research channels
- Marketing channels
- Sales channels

10 The Semantic Web and the Movement of Information
- As anyone can contribute, the quality of individual contributions suffers
- Remember: what any one person contributes to the crowd may be poor; it’s the aggregation that creates value
- This means most people don’t search the web passively for new things: it would take too long and there’s too much rubbish out there
- People connect with family, friends, and other people whom they trust as a source of information, and content then spreads among networks
- Think of how class-relevant information spreads among the class through Facebook

11 Example: Retweets During the Australian Federal Election Campaign
[Image]

12 Example: News Spreading About the Discovery of the Higgs Boson

13 One Day…
- In 1906 Galton was at a fair when he came upon a contest asking individuals to guess the weight of the meat on an ox
- Galton saw the data from the guesses as an opportunity to demonstrate the futility of democracy, as he believed that experts (e.g. butchers) would get very close, while the public would be miles off
- The median guess was 1,207lb and the mean was 1,197lb
- The correct weight was 1,198lb
- The crowd as a whole was closer than any individual expert

14 The Wisdom of Crowds
- James Surowiecki (2004) used Galton’s story as a jumping-off point to coin the term ‘wisdom of crowds’
- The wisdom of crowds does not describe a collective, interdependent form of cognition; instead it describes the aggregation and consolidation of cognition by independently-minded individuals
- This allows individual biases and skewed personal experiences to be averaged out

15 The Wisdom of Crowds
Surowiecki argues there are basically three types of crowd wisdom
Crowd cognition
- Experts, non-experts, and individuals with varying experiences may use different biases and reference points to solve some mental problem
- Statistically, this means that they are each prone to different errors
- Taken together, these errors average and cancel out one another
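The averaging argument can be sketched in a few lines of Python. The guesses below are invented for illustration (they are not Galton's data) and were chosen so that over- and under-estimates roughly balance; real crowds will not always cancel this neatly.

```python
from statistics import mean, median

# Hypothetical guesses (in lb) for an ox whose true weight is 1,198lb.
# These numbers are made up for illustration; they are not Galton's data.
TRUE_WEIGHT = 1198
guesses = [950, 1050, 1120, 1180, 1210, 1250, 1300, 1340, 1400]

# Aggregate the independent guesses.
crowd_mean = mean(guesses)
crowd_median = median(guesses)

# Compare each individual's error with the error of the aggregated guess.
individual_errors = [abs(g - TRUE_WEIGHT) for g in guesses]
crowd_error = abs(crowd_mean - TRUE_WEIGHT)

print(f"mean guess: {crowd_mean:.0f}lb, median guess: {crowd_median}lb")
print(f"crowd error: {crowd_error:.0f}lb")
print(f"best individual error: {min(individual_errors)}lb")
print(f"average individual error: {mean(individual_errors):.0f}lb")
```

In this toy data set the mean lands closer to the true weight than any single guess, because the individual errors point in different directions.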

16 The Wisdom of Crowds
Surowiecki argues there are basically three types of crowd wisdom
Crowd coordination
- Individuals in a crowd typically use less thorough analysis than they might if solving a problem individually, e.g. if you were solely and personally responsible for selecting your TDs, you would probably spend more time thinking about it than you do now
- This means individual behaviour is faster, more dynamic, and more responsive
- Less need for upfront planning and top-down control

17 The Wisdom of Crowds
Surowiecki argues there are basically three types of crowd wisdom
Crowd cooperation
- Individuals are freer to establish trust when there is no hierarchy dictating terms to them
- Individuals can self-organise to interact with those with whom they feel the most connection
- Emergent standards and repeated processes fill the void when structure is absent, e.g. 4Chan

18 Wise Crowds vs Groupthink
Surowiecki also notes that not all crowds are wise; irrationality snowballs in some crowds, e.g. stock market bubbles
Wise crowds have 4 characteristics
- Diversity of Opinion
  - Creativity demands diversity; if an individual is not viewing the problem in a way that is somehow different, their contribution is minimal
- Independence of Opinion
  - Diversity is stifled if individuals feel pressured to conform or experts create a culture of graded respect

19 Wise Crowds vs Groupthink (continued)
- Decentralisation
  - People with specialised skills or knowledge are allowed to draw on those skills or knowledge
  - Diversity becomes decreasingly useful if certain portions of the crowd are ignored
- Aggregation
  - Information cascades result if decisions are made in sequence, as individuals gravitate towards existing opinions
  - Intelligence needs to be gathered privately, then combined

20 Signs of Groupthink
A move from crowd wisdom towards groupthink usually manifests several warning signs
- Little discussion of alternatives
- Little discussion of risk
- Little information search
- Little discussion of contingency
- The same people dominate discussions

21 Folksonomies
- Content is tagged non-hierarchically and without explicit relationships between tags
- Consensus is formed around the best tags to capture the meaning and usefulness of specific content, which is then searchable accordingly
- Sometimes this is explicit, other times implicit

22 Example: Reddit.com
‘The front page of the Internet’, founded in 2005
- Basically a bulletin board: someone starts a thread on the main board or in a subreddit
- Other users vote the thread up or down, as well as comment on it, and those comments also get voted up or down
- The whole web is distilled into hierarchically organised content, opinions, and insights
- Question: how does the organisation match your interests? Remember, no one is really ‘average’ or ‘normal’

23 Example: Delicious.com
- A site/browser extension where users can add links to content that they wish to ‘bookmark’
- Users tag each of their bookmarks with freely chosen index terms
- Users can click these tags to see links that have been tagged similarly by other users
- Suggested content is also presented to users, based on trends among other users
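A folksonomy of this kind can be sketched as a flat tag index. The users, URLs, and tags below are all hypothetical, and this is only an illustration of the idea, not how Delicious was implemented.

```python
from collections import Counter, defaultdict

# Hypothetical bookmarks: (user, url, freely chosen tags). All values are invented.
bookmarks = [
    ("ann", "https://example.com/xml-intro", {"xml", "tutorial"}),
    ("bob", "https://example.com/xml-intro", {"xml", "markup"}),
    ("cat", "https://example.com/json-vs-xml", {"xml", "json"}),
    ("dan", "https://example.com/xml-intro", {"xml", "tutorial"}),
]

# Flat index from tag to URLs: no hierarchy and no relationships between tags.
urls_by_tag = defaultdict(set)
# Per-URL tag counts: the most frequent tags stand in for the emerging consensus.
tag_counts_by_url = defaultdict(Counter)

for user, url, tags in bookmarks:
    for tag in tags:
        urls_by_tag[tag].add(url)
        tag_counts_by_url[url][tag] += 1

# "Clicking" a tag: every link that any user has filed under it.
print(sorted(urls_by_tag["xml"]))
# Consensus tags for one link, most popular first.
print(tag_counts_by_url["https://example.com/xml-intro"].most_common(2))
```

Nobody declares that `tutorial` is a kind of `xml` resource; the only structure is which tags people happen to attach to which links, and how often.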

24 Why Would Individuals Bother Labelling and Sharing Content?
- It is easy and enjoyable
  - It’s easy and pretty quick to do
  - We get social feedback that helps us ensure we’re in line with the consensus
- It’s part of our nature
  - When we’re interested in something we typically want to show people, e.g. ‘this new song/movie/site/product I came across’ or ‘this picture you forgot I took on that night out where I look totally fine and you look like you’re in bits, LOLz’

25 Semi-Structured and Unstructured Data
Unstructured data
- Any type, with no predictable format or sequence
- Does not follow any obvious rules
- Examples include text, images, video or sound
Semi-structured data (sometimes called ‘self-organising’)
- Organized semantically (with meaning)
- Similar entities are grouped together
- Entities in the same group may not have the same attributes
- Order of attributes may not be important and not all attributes may be required
- No overarching logical system relating entities
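A small sketch of what semi-structured records look like in practice, using plain Python dictionaries. The `contacts` group and its attribute names are invented for illustration.

```python
# Hypothetical "contact" entities grouped together. They are similar, but they do
# not share one fixed set of attributes and the attribute order differs.
contacts = [
    {"name": "Anne", "email": "anne@example.com"},
    {"phone": "555-0100", "name": "Paul"},
    {"name": "Jane", "email": "jane@example.com", "tags": ["family", "cork"]},
]

# Every attribute that appears on at least one entity.
all_attributes = set().union(*(c.keys() for c in contacts))
# Attributes that appear on every entity.
shared_attributes = set.intersection(*(set(c) for c in contacts))

print(sorted(all_attributes))
print(sorted(shared_attributes))

# Code reading such data has to tolerate missing attributes.
emails = [c.get("email", "(none)") for c in contacts]
print(emails)
```

A relational table would force every row to carry the same columns; here only `name` happens to be common to all three entities.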

26 Semi-Structured and Unstructured Data
[Image]

27 Semi-Structured Data Technologies
Basically come in two flavours
- XML (eXtensible Markup Language)
- JSON (JavaScript Object Notation)
XML is slightly more cumbersome than JSON, but also more powerful thanks to
- The ability to support a greater range of data types
- The support of several embedded web technologies, i.e. XML Schema, XSLT, and XPath/XQuery
We’ll focus on XML in this course
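To make the comparison concrete, here is the same hypothetical record written in both notations and read with Python's standard library. The `item`, `name`, and `quantity` names are assumptions made for this sketch.

```python
import json
import xml.etree.ElementTree as ET

# The same invented record in both notations.
json_text = '{"name": "Tea", "quantity": 2}'
xml_text = "<item><name>Tea</name><quantity>2</quantity></item>"

# JSON maps directly onto a dictionary, with a native number for quantity.
from_json = json.loads(json_text)

# XML parses into a tree of elements; without a schema every value is text.
item = ET.fromstring(xml_text)
from_xml = {child.tag: child.text for child in item}

print(from_json)
print(from_xml)
```

The XML version is wordier, and this schema-less parse returns `quantity` as the string `"2"`. Richer typing in XML comes from attaching an XML Schema rather than from the document syntax itself.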

28 Break?

29 What is XML?
- XML stands for eXtensible Mark-up Language
- XML has its roots in SGML (Standard Generalized Markup Language), whose origins go back to the 1960s
- XML is an open W3C Standard (completely platform and vendor independent)
- XML was not designed to replace HTML
  - XML was designed to describe data
  - HTML/CSS were designed to format data

30 What is XML?
- Doesn’t really “do” anything…
- XML tags are not predefined and must be created
  - This allows you to create your own language according to your own semantics
- XML files have a .xml extension
- XML is supported by a family of technologies, such as DTD, XML Schema Definition Language, XSLT/XSL-FO, XPath, XPointer/XLink, and DOM/SAX
- XML’s biggest asset is its widespread support!

31 What is XML used for?
- XML to separate data from HTML
  - Dynamic content is becoming more and more popular on the web
  - In combination with JavaScript, the data content of HTML pages can be updated and extracted quickly and safely
- XML to simplify data sharing and transport
  - There are many incompatible ways to store data amongst different applications
  - XML data is stored in plain text format in a standardised layout
  - This means that incompatible systems can communicate and share data with little or no initial configuration

32 What is XML used for?
XML to create new internet languages. Examples include:
- XHTML, an XML-based version of HTML
- WSDL for describing available web services
- WML, a markup language for handheld devices (used with WAP)
- RSS for news feeds
- RDF and OWL for describing resources and ontologies
- SMIL for describing multimedia presentations for the web

33 The Structure of XML
- XML documents must be ‘well formed’ and, ideally, ‘valid’
- XML documents have a nested ‘tree’ structure
  - It starts at "the root" and branches to "the leaves" (all XML documents are required to have a root element of some description)
- XML elements must have a closing tag
- Note that unlike some other web-based markup languages (e.g. HTML), XML tags are case sensitive and white space in text is preserved
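The tree structure can be seen by parsing a small document and walking it from the root to the leaves. This is a minimal sketch using Python's `xml.etree.ElementTree`; the `library`/`book` document is invented for illustration.

```python
import xml.etree.ElementTree as ET

# A hypothetical document: one root element, with nested branches and leaves.
xml_text = """<library>
  <book>
    <title>  XML Basics  </title>
    <author>A. Writer</author>
  </book>
</library>"""

root = ET.fromstring(xml_text)

def walk(element, depth=0):
    """Return (depth, tag) pairs in document order, starting at the root."""
    pairs = [(depth, element.tag)]
    for child in element:
        pairs.extend(walk(child, depth + 1))
    return pairs

# Print the tree with one level of indentation per level of nesting.
for depth, tag in walk(root):
    print("  " * depth + tag)

# White space inside text content is kept exactly as written.
title_text = root.find("book/title").text
print(repr(title_text))
```

Note that the parser hands back the title with its leading and trailing spaces intact; trimming them is the application's decision, not the parser's.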

34 XML Elements
- An element can contain text and/or other elements
- An element may have one or more attributes
- An element’s name can contain letters, numbers or other characters. However, it can’t start with a number, a punctuation character, or the letters xml. It also can’t contain spaces.

35 XML Attributes
- XML elements can have attributes in the opening tag
- Attribute values must always be quoted
- Attributes are useful for small pieces of metadata (such as IDs) but are not suitable for actual data, as
  - They can’t contain multiple values simultaneously
  - You can’t nest other data within them
  - They’re difficult to expand later on
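A short sketch of the attribute-versus-element trade-off, again with `xml.etree.ElementTree`. The `staff`/`employee` document, the `id` value, and the phone numbers are all made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: the ID is metadata, so it sits in a quoted attribute;
# the repeating phone numbers are data, so they are child elements.
xml_text = """<staff>
  <employee id="e101">
    <name>Jane Smith</name>
    <phone>021-555-0100</phone>
    <phone>087-555-0199</phone>
  </employee>
</staff>"""

root = ET.fromstring(xml_text)
employee = root.find("employee")

# An attribute holds exactly one string value per name.
employee_id = employee.get("id")

# Child elements can repeat, so several values are easy to represent.
phones = [phone.text for phone in employee.findall("phone")]

print(employee_id)
print(phones)
```

Storing both phone numbers in a single attribute would force the application to invent its own separator and parsing rules, which is the kind of expansion problem the slide warns about.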

36 Well Formed and Valid XML Documents
An XML document is ‘well formed’ once it adheres to all of these syntactic rules, i.e. it
- includes the proper xml declaration at the beginning
- has a root element
- has closing tags for each element
- has case sensitive tags
- has properly nested elements
- has properly quoted attribute values
W3schools have a validator online at
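Most of these rules can be checked by simply asking a parser to read the document. The helper below, `is_well_formed`, is a hypothetical name for this sketch, and the sample documents are invented. One caveat: Python's parser treats the XML declaration as optional, so this check does not enforce the first rule in the list.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Invented samples, each breaking at most one of the syntactic rules.
samples = {
    "ok": '<?xml version="1.0"?><note id="1"><to>Anne</to></note>',
    "missing closing tag": "<note><to>Anne</note>",
    "case mismatch": "<note><To>Anne</to></note>",
    "bad nesting": "<note><to><b>Anne</to></b></note>",
    "unquoted attribute": "<note id=1><to>Anne</to></note>",
    "two root elements": "<note/><note/>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well formed' if ok else 'NOT well formed'}")
```

A check like this only covers well-formedness; whether the document is also valid depends on a DTD or schema.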

37 Example of a Well Formed XML Document
Text content of the example: Paul; Anne; Shopping list; Tea; Biscuits; Milk
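The markup of this example did not survive, so the following is only a guess at its shape. The element names (`note`, `to`, `from`, `heading`, `items`, `item`) and the assignment of Paul and Anne to recipient and sender are assumptions; only the text values come from the slide.

```python
import xml.etree.ElementTree as ET

# A possible reconstruction. Element names are assumed, not taken from the slide.
xml_text = """<?xml version="1.0"?>
<note>
  <to>Paul</to>
  <from>Anne</from>
  <heading>Shopping list</heading>
  <items>
    <item>Tea</item>
    <item>Biscuits</item>
    <item>Milk</item>
  </items>
</note>"""

# Parsing succeeds only if the document is well formed.
note = ET.fromstring(xml_text)

heading = note.find("heading").text
shopping_items = [item.text for item in note.find("items")]

print(note.tag)
print(heading)
print(shopping_items)
```

Whatever the original tag names were, the document satisfies the rules from the previous slide: a declaration, a single root element, matching and properly nested closing tags, and consistent case.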

38 XML Namespaces
- Sometimes unrelated concepts will be named similarly, e.g. order records may have a number element that refers to quantity, while employees may have a number element that refers to their phone extension
- We can tell these apart by giving such elements a unique prefix, so that the number element from orders and the number element from staff each carry a different prefix

39 XML Namespaces (continued)
- This relies on our declaring the prefix using the xmlns attribute in the parent node for conflicting elements, i.e. xmlns:prefix="URL"
- E.g. xmlns:e="http://www.foo.com/employees" (text content of the example: Jane Smith; Philange; 10)
- The URL doesn’t need to point anywhere; it just needs to be unique so that the namespace is unique
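A sketch of how two `number` elements can coexist once each is placed in its own namespace. The `e` prefix and the employees URL come from the slide; the `o` prefix, the orders URL, the `records` root, and the values are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# The "o" prefix, its URL, and the data values are hypothetical.
xml_text = """<records xmlns:o="http://www.foo.com/orders"
         xmlns:e="http://www.foo.com/employees">
  <o:order>
    <o:number>10</o:number>
  </o:order>
  <e:employee>
    <e:name>Jane Smith</e:name>
    <e:number>4021</e:number>
  </e:employee>
</records>"""

root = ET.fromstring(xml_text)

# Map prefixes to namespace URLs for use in search paths.
ns = {
    "o": "http://www.foo.com/orders",
    "e": "http://www.foo.com/employees",
}

order_quantity = root.find("o:order/o:number", ns).text
employee_extension = root.find("e:employee/e:number", ns).text

# Internally the parser stores the full URL, not the prefix.
expanded_tag = root.find("e:employee/e:number", ns).tag

print(order_quantity, employee_extension)
print(expanded_tag)
```

The expanded tag shows why the URL only has to be unique: it is used as an identifier inside the element name and is never fetched.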

40 XML Schemas and the Document Type Definition
- Once a document is ‘well formed’, we can then check whether it is ‘valid’
- A valid XML document conforms to the rules of a Document Type Definition (DTD) or XML Schema
- We’re going to focus on schemas, as
  - XML Schemas are extensible to future additions
  - XML Schemas are richer and more powerful than DTDs
  - XML Schemas are written in XML
  - XML Schemas support data types
  - XML Schemas support namespaces

41 XML Schemas and XML Docs
[Diagram: XML Document, XML Document, XML Document]

42 XML Schemas An XML Schema may look something like:                        
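The slide's own schema was lost, so the following is an illustrative stand-in rather than the original. The `order` element and its children are hypothetical. Because a schema is itself written in XML, an ordinary XML parser can read it, which is what the sketch does. Python's standard library does not validate documents against a schema, so this only lists what the schema declares.

```python
import xml.etree.ElementTree as ET

# Namespace that identifies XML Schema vocabulary.
XS = "http://www.w3.org/2001/XMLSchema"

# A hypothetical schema for an "order" document (not the one from the slide).
xsd_text = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="product" type="xs:string"/>
        <xs:element name="number" type="xs:positiveInteger"/>
        <xs:element name="orderDate" type="xs:date"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# The schema is well-formed XML, so it parses like any other document.
schema_root = ET.fromstring(xsd_text)

# Collect each declared element with its data type (None for complex types).
declared = {
    element.get("name"): element.get("type")
    for element in schema_root.iter(f"{{{XS}}}element")
}

for name, data_type in declared.items():
    print(f"{name}: {data_type or 'complex type'}")
```

The `type` attributes are where the data typing comes from: a validating tool would reject an order whose `number` is not a positive integer or whose `orderDate` is not a date.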

43 Want to read more? Links and references for XML

44 Want to read more?
- Tim O’Reilly’s seminal paper on Web 2.0
- Discussion of Web 2.0 style
- Article on gamification in Web 2.0
- An archive of older sites, useful for seeing how things have changed

45 Want to read more (continued)?
- Study of Twitter information sharing during the London riots https://www.asis.org/Bulletin/Dec-11/DecJan12_Tonkin_Pfeiffer_Tourte.pdf
- How TNT Made The Biggest Viral Video Ad Of The Year—In Belgium most-viral-videos-of-this-year ?op=1&IR=T
- Analysis of Twitter use during the Australian federal election campaign, 2013 interaction-on-twitter/
- Analysis of Twitter activity surrounding discovery of the Higgs higgs-gossip.html