Semantic Search-SLA-20140609-Busch

Taxonomy Strategies
The Search for Meaning and Semantics: Taxonomies Get It Done
Joseph Busch – Why Semantics Matter
June 9, 2014
Copyright 2014 Taxonomy Strategies. All rights reserved.
Agenda
• Why semantics matter (… a quick review from 2001)
• What is semantic search, SKOS and Linked Data?
• Some semantic search examples?
Taxonomy Strategies The business of organized information
Why Semantics Matter
May 20, 2001
When you own a Rembrandt you can spell his name any way you want.
But when you want to find a Rembrandt … you better spell his name correctly.
Vocabulary resources can help find the right artist even if their name is typed incorrectly.
Users cannot type in the complex queries needed to find all the relevant items... But this can be done automatically.
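A minimal sketch of how such a complex query might be built automatically from a vocabulary of name variants. The variant table, the spellings in it, and the function names are illustrative assumptions, not data or code from any real vocabulary service.

```python
# Hypothetical variant-name table: preferred name -> known alternate spellings.
# The spellings are invented for illustration only.
NAME_VARIANTS = {
    "Rembrandt van Rijn": [
        "Rembrandt",
        "Rembrant",
        "Rembrandt Harmensz. van Rijn",
        "Rijn, Rembrandt van",
    ],
}


def build_variant_index(variants):
    """Map every lowercased spelling (preferred or alternate) to its preferred name."""
    index = {}
    for preferred, alternates in variants.items():
        for name in [preferred, *alternates]:
            index[name.lower()] = preferred
    return index


def expand_query(user_query, variants):
    """Return an OR query over all known spellings, or the quoted input if unknown."""
    cleaned = user_query.strip()
    preferred = build_variant_index(variants).get(cleaned.lower())
    if preferred is None:
        return f'"{cleaned}"'
    names = [preferred, *variants[preferred]]
    return " OR ".join(f'"{name}"' for name in names)


if __name__ == "__main__":
    # The user types a misspelling; the system issues the long query for them.
    print(expand_query("rembrant", NAME_VARIANTS))
```

The user types one short, possibly misspelled name, and the vocabulary supplies the long boolean query that nobody would type by hand.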
Complex queries are even more important when you search the entire web.
So you find Rembrandt the Dutch guy...
… And not Rembrandt the toothpaste.
Getty Vocabularies Linked Data Services
February 19, 2014
Agenda
• Why semantics matter
• What is semantic search, SKOS and Linked Data?
• Some semantic search examples?
Search Failure
• 19% Character errors (Young et al.)
• 40% Vocabulary errors (Seaman, Norgard, et al.)
• 20% Index confusion
• 21% Successful (Nielsen)
Semantic search solution
• Semantic search improves search accuracy by inferring the contextual meaning of terms via:
  – Disambiguation
  – Part of speech (POS) analysis
  – Synonyms, variations and quasi-synonyms
  – Concept matching
  – Natural language query analysis
  – Key sentence detection
• Generate more consistent content to search on.
• Correct user errors.
• Map the language of users to the language of the target content.
• Augment search results with linked data.
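The disambiguation step can be sketched as a context-overlap score: each sense of an ambiguous term carries a few context words, and the sense sharing the most words with the query wins. This is only one simple way to approach it; the sense labels and context words below are invented for illustration, and a production system would likely use richer signals.

```python
import re

# Hypothetical senses of the ambiguous term, each with made-up context words.
SENSES = {
    "Rembrandt (Dutch painter)": {"painting", "painter", "dutch", "portrait", "etching", "museum"},
    "Rembrandt (toothpaste)": {"toothpaste", "whitening", "teeth", "dental", "mint"},
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def disambiguate(query, senses):
    """Return the sense with the largest context-word overlap, or None if nothing overlaps."""
    words = tokenize(query)
    best_sense, best_score = None, 0
    for sense, context in senses.items():
        score = len(words & context)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


if __name__ == "__main__":
    print(disambiguate("Rembrandt self portrait etching", SENSES))
    print(disambiguate("rembrandt whitening toothpaste", SENSES))
```

When the query has no context words at all, the sketch returns None rather than guessing, which is the point at which a search interface might ask the user to choose.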
What do semantics do for search?
Function | Description
Related search | Query corrections … did you mean?
Concept search | Query expansion with synonyms, abbreviations, acronyms, etc. … do you also want?
Ontology-based search | Query expansion with narrower or broader terms; scoping exhaustive search results
Faceted search | Dynamic filtering of search results; online shopping
Clustering | Dynamically bucketing search results into predefined categories
Stored queries | RSS feeds, alerts, SDI (selective dissemination of information), etc.
Personalization | Weighting search results based on explicit profiles and implicit data (where you’ve been and what you’ve done)
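As a rough illustration of the "did you mean?" query correction, the sketch below compares a query against a controlled vocabulary using Python's standard `difflib.get_close_matches`. The vocabulary list, the function name, and the 0.8 similarity cutoff are assumptions chosen for the example.

```python
import difflib

# Hypothetical controlled vocabulary of artist names.
VOCABULARY = ["Rembrandt", "Vermeer", "Van Gogh", "Mondrian"]


def did_you_mean(query, vocabulary, cutoff=0.8):
    """Suggest the closest vocabulary term, or None if the query is exact or nothing is close."""
    by_lowercase = {term.lower(): term for term in vocabulary}
    needle = query.strip().lower()
    if needle in by_lowercase:
        return None  # already a known term, no correction needed
    matches = difflib.get_close_matches(needle, list(by_lowercase), n=1, cutoff=cutoff)
    return by_lowercase[matches[0]] if matches else None


if __name__ == "__main__":
    suggestion = did_you_mean("Rembrant", VOCABULARY)
    if suggestion:
        print(f"Did you mean: {suggestion}?")
```

Matching against a managed vocabulary rather than against raw document text keeps the suggestions limited to terms the collection actually uses.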
What is SKOS?
• SKOS (Simple Knowledge Organization System) provides the basis for any user, tool, or program to identify, define and link concept vocabularies.
Relationship | Definition
Concept | A unit of thought, an idea, meaning, or category of objects or events. A Concept is independent of the terms used to label it.
Preferred Label | A preferred lexical label for the resource, such as a term used in a digital asset management system.
Alternate Label | An alternative label for the resource, such as a synonym or quasi-synonym.
Broader Concept | Hierarchical link between two Concepts where one Concept is more general than the other.
Narrower Concept | Hierarchical link between two Concepts where one Concept is more specific than the other.
Related Concept | Link between two Concepts where the two are inherently "related", but neither is in any way more general than the other.
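One way to picture these SKOS elements is as a small data structure. The sketch below is an assumption-laden illustration rather than an official SKOS binding: the `Concept` class and helper names are hypothetical, and the sample values reuse the Transportation Research Thesaurus identifiers that appear in this deck.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A concept identified by a URI, independent of the labels attached to it."""
    uri: str
    pref_label: str
    alt_labels: list = field(default_factory=list)
    broader: list = field(default_factory=list)   # URIs of more general concepts
    related: list = field(default_factory=list)   # URIs of associated concepts


def build_label_index(concepts):
    """Map each lowercased preferred or alternate label to its concept."""
    index = {}
    for concept in concepts:
        for label in [concept.pref_label, *concept.alt_labels]:
            index[label.lower()] = concept
    return index


def narrower_of(concept, concepts):
    """Narrower is the inverse of broader, so derive it instead of storing it twice."""
    return [c for c in concepts if concept.uri in c.broader]


parking = Concept("trt:Brdd", "Parking")
fringe = Concept(
    "trt:Brddf",
    "Fringe parking",
    alt_labels=["Park and ride", "P&R system"],
    broader=["trt:Brdd"],
)

if __name__ == "__main__":
    index = build_label_index([parking, fringe])
    # A user's term resolves to the concept, and from there to the preferred label.
    print(index["park and ride"].pref_label)
```

Because the label index points at concepts rather than strings, any synonym a user types leads to the same preferred label and the same hierarchy.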
[Figure: one CONCEPT identified in two vocabularies, lc:sh85052028 and trt:Brddf, each with prefLabel "Fringe parking" and several altLabels; trt:Brddf has broader concept trt:Brdd, whose prefLabel is "Parking".]
Subject | Predicate | Object
lc:sh85052028 | skos:prefLabel | Fringe parking
lc:sh85052028 | skos:altLabel | Park and ride systems
lc:sh85052028 | skos:altLabel | Park and ride
lc:sh85052028 | skos:altLabel | Park & ride
lc:sh85052028 | skos:altLabel | Park-n-ride
trt:Brddf | skos:prefLabel | Fringe parking
trt:Brddf | skos:altLabel | Park and ride
trt:Brddf | skos:altLabel | P&R system
trt:Brddf | skos:broader | trt:Brdd
trt:Brdd | skos:prefLabel | Parking
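The subject, predicate, object statements for "Fringe parking" can be held as plain tuples and queried by pattern. This is a minimal sketch, not an RDF library: the function names are hypothetical, and the last two statements follow the diagram's reading (trt:Brddf has broader trt:Brdd, whose preferred label is Parking), which is an interpretation of the slide.

```python
# Statements transcribed from the slide as (subject, predicate, object) tuples.
TRIPLES = [
    ("lc:sh85052028", "skos:prefLabel", "Fringe parking"),
    ("lc:sh85052028", "skos:altLabel", "Park and ride systems"),
    ("lc:sh85052028", "skos:altLabel", "Park and ride"),
    ("lc:sh85052028", "skos:altLabel", "Park & ride"),
    ("lc:sh85052028", "skos:altLabel", "Park-n-ride"),
    ("trt:Brddf", "skos:prefLabel", "Fringe parking"),
    ("trt:Brddf", "skos:altLabel", "Park and ride"),
    ("trt:Brddf", "skos:altLabel", "P&R system"),
    # Interpretation of the diagram, see the note above.
    ("trt:Brddf", "skos:broader", "trt:Brdd"),
    ("trt:Brdd", "skos:prefLabel", "Parking"),
]

LABEL_PREDICATES = {"skos:prefLabel", "skos:altLabel"}


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for s, p, o in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


def concepts_for_label(triples, label):
    """Find every concept that carries the label, preferred or alternate, ignoring case."""
    wanted = label.strip().lower()
    return sorted({s for s, p, o in triples if p in LABEL_PREDICATES and o.lower() == wanted})


def preferred_label(triples, concept):
    """Return the concept's preferred label, or None if it has none."""
    found = match(triples, subject=concept, predicate="skos:prefLabel")
    return found[0][2] if found else None


if __name__ == "__main__":
    print(concepts_for_label(TRIPLES, "Park and ride"))
    broader = match(TRIPLES, subject="trt:Brddf", predicate="skos:broader")[0][2]
    print(preferred_label(TRIPLES, broader))
```

A search for "Park and ride" finds the concept in both vocabularies, which is what lets one user term reach content indexed under either scheme.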
Why SKOS?
According to Alistair Miles* (SKOS co-author)
• Ease of combination with other standards
  • Vocabularies are used in a great variety of contexts.
    – E.g., databases, faceted navigation, website browsing, linked open data, spellcheckers, etc.
  • Vocabularies are re-used in combination with other vocabularies.
    – E.g., Library of Congress Subject Headings + Transportation Research Thesaurus; USPS states + USPS zip codes + US Congressional districts; etc.
• Flexibility and extensibility to cope with variations in structure and style
  • Variations between types of vocabularies
    – E.g., list vs. classification scheme
  • Variations within types of vocabularies
    – E.g., Z39.19-2005 monolingual controlled vocabularies and the Transportation Research Thesaurus
* Head of Epidemiological Informatics at Oxford University Wellcome Trust Centre for Human Genetics (formerly OUP Senior Computing Officer)
Why SKOS? (2)
• Publish managed vocabularies so they can readily be consumed by applications
  • Identify the concepts
    – What are the named entities?
  • Describe the relationships
    – Labels, definitions and other properties
  • Publish the data
    – Convert data structure to standard format
    – Put files on an HTTP server (or load statements into an RDF server)
• Ease of integration with external applications
  • Use web services to use or link to a published concept, or to one or more entire vocabularies.
    – E.g., Google Maps API, NY Times article search API, Linked open data, etc.
• A W3C standard like HTML, CSS, XML and RDF, RDFS, and OWL.
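The "convert data structure to standard format" step might look like the sketch below, which writes concepts out as SKOS in Turtle syntax using only string formatting. The SKOS namespace is the standard one; the `trt` namespace URI, the function names, and the choice of English language tags are assumptions made for illustration, and a real project would more likely use an RDF library.

```python
PREFIXES = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    # Hypothetical namespace: a placeholder, not the thesaurus's published URI.
    "trt": "http://example.org/trt/",
}


def turtle_literal(text, lang="en"):
    """Quote a label as a language-tagged Turtle literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def concept_to_turtle(uri, pref_label, alt_labels=(), broader=()):
    """Serialize one concept as a Turtle statement block."""
    statements = ["a skos:Concept", f"skos:prefLabel {turtle_literal(pref_label)}"]
    statements += [f"skos:altLabel {turtle_literal(label)}" for label in alt_labels]
    statements += [f"skos:broader {target}" for target in broader]
    return f"{uri} " + " ;\n    ".join(statements) + " ."


def to_turtle_document(concepts):
    """Emit prefix declarations followed by one block per concept."""
    header = "\n".join(f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items())
    body = "\n\n".join(concept_to_turtle(**concept) for concept in concepts)
    return f"{header}\n\n{body}\n"


CONCEPTS = [
    {"uri": "trt:Brdd", "pref_label": "Parking"},
    {
        "uri": "trt:Brddf",
        "pref_label": "Fringe parking",
        "alt_labels": ["Park and ride", "P&R system"],
        "broader": ["trt:Brdd"],
    },
]

if __name__ == "__main__":
    # The resulting text could be saved as a .ttl file and placed on a web server.
    print(to_turtle_document(CONCEPTS))
```

Once the vocabulary is in a standard serialization, any application that reads RDF can consume it without knowing how the vocabulary was managed internally.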
Agenda
• Why semantics matter
• What is semantic search, SKOS and Linked Data?
• Some semantic search examples?
Taxonomy browser
Taxonomy-powered search results
Oracle.com top-level taxonomy
[Figure: top-level taxonomy diagram. Nodes: Person, Organization, Location, Content Type, Product Line, Technology, Audience, Products, Application, Industry Solution. Edge labels: "Has a" and "Is a".]
Oracle event finder
http://events.oracle.com/
• Filter on Location and Language
• More filters based on this result
• Subscribe to RSS feed based on the criteria set on this page
• Results shown on Google Maps UI
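Filtering on facets such as Location and Language, then offering further filters based on the result, can be sketched with a list of tagged records. The event records and function names below are invented for illustration and do not describe how the Oracle site is implemented.

```python
from collections import Counter

# Hypothetical event records tagged with two facets.
EVENTS = [
    {"title": "Database Day", "location": "San Francisco", "language": "English"},
    {"title": "Cloud Forum", "location": "Berlin", "language": "German"},
    {"title": "Java Meetup", "location": "Berlin", "language": "English"},
]


def filter_events(events, **selected):
    """Keep the events whose facet values equal every selected facet value."""
    return [
        event
        for event in events
        if all(event.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(events, facet):
    """Count the values of one facet in the current result set, to offer as further filters."""
    return Counter(event[facet] for event in events)


if __name__ == "__main__":
    in_berlin = filter_events(EVENTS, location="Berlin")
    print(facet_counts(in_berlin, "language"))
    print([e["title"] for e in filter_events(EVENTS, location="Berlin", language="English")])
```

Counting facet values on the filtered set, not the full set, is what makes the remaining filters reflect the current result.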
APS Taxonomy browser
Linked data example
A faceted taxonomy of concepts in physics:
APS Taxonomy
  Broad Subject Areas
  Methods & Theories
  Phenomena
  Physical Systems
    Astronomical systems
    Atomic-scale objects
    Beams
    Complex systems
    Dynamical systems
    Electric & magnetic fields
    Engineered materials
    Fundamental particles
    Gases (delete)
    Information systems
    Liquids (delete)
    Materials
      Materials by Composition
      Materials by Dimensionality
      Materials by Property
      Materials by Structure
    Nonlinear systems
    Nuclei
    Plasma
    Quasiparticles
Elements of the periodic table, and common isotopes:
Elements by Group
  Group 1
  Group 2
  Group 3
  Group 4
  Group 5
  Group 6
  Group 7
  Group 8
  Group 9
  Group 10
  Group 11
  Group 12
    Cadmium
    Copernicium
    Mercury
      194Hg, 196Hg, 198Hg, 199Hg, 200Hg, 201Hg, 202Hg, 204Hg
    Zinc
  Group 13
  Group 14
  Group 15
  Group 16
  Group 17
  Group 18
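A hierarchy like this supports query expansion with narrower terms: a search on "Group 12" can also retrieve items tagged with its elements and their isotopes. The sketch below assumes the nesting suggested by the slide layout (Group 12 containing Cadmium, Copernicium, Mercury and Zinc, with the isotopes under Mercury); the dictionary and function name are illustrative, not the actual APS data model.

```python
# Narrower-term map read off the slide; the nesting is an interpretation of its layout.
NARROWER = {
    "Group 12": ["Cadmium", "Copernicium", "Mercury", "Zinc"],
    "Mercury": ["194Hg", "196Hg", "198Hg", "199Hg", "200Hg", "201Hg", "202Hg", "204Hg"],
}


def expand_narrower(term, narrower):
    """Return the term followed by all of its transitive narrower terms, depth-first."""
    expanded, seen, stack = [], set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against cycles or terms reachable by two paths
        seen.add(current)
        expanded.append(current)
        # Push children reversed so they are visited in their listed order.
        stack.extend(reversed(narrower.get(current, [])))
    return expanded


if __name__ == "__main__":
    print(" OR ".join(f'"{t}"' for t in expand_narrower("Group 12", NARROWER)))
```

Expanding in the other direction, from an isotope up to its element and group, would use the same traversal over a broader-term map.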
Paper submission tagging (prototype)
Joseph A Busch
Mobile 415-377-7912
[email protected]
QUESTIONS
Session description
• Semantic search is a phrase increasingly used in the popular as well as the professional literature. What does it look like, and how will it work? Panelists will present their visions of semantic search. The program is designed to be interactive, with audience participation through suggestions for functions and features they see in the future.
  – What is semantic search?
  – What are the components of semantic search?
  – How can it be used in libraries?