YAGO – A Core of Semantic Knowledge
Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum
(Max-Planck Institute for Computer Science Saarbrücken/Germany)
Fabian M. Suchanek
YAGO - A Core of Semantic Knowledge
Overview
- Motivation
- The Yago ontology
- Content
- Model
- Extension
- Conclusion
The Truth about Elvis
Elvis is alive!
He works as an astronaut in NASA's special security program
Usual solution
Ask Google: Which NASA astronaut was born when Elvis was born?
The search yields only rubbish.
Reasons:
1. Google participates in the conspiracy
2. Google searches Web sites, not knowledge
Solution: An ontology
[Figure: a small ontology graph, built up over three slides]
Classes: entity, person, astronaut (linked by "subclass")
Individuals: Elvis, an unknown individual "?", the year 1935
Relations: subclass, is a, born, means
Words: "Elvis Presley", "The King" (linked to the individual by "means")
The unknown individual "?" is an astronaut and was born in 1935, like Elvis.
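The example graph can be written down as a set of (subject, relation, object) triples. The following is a minimal sketch in Python, not YAGO's actual storage format. The identifiers `Elvis` and `Unknown` (for the "?" node), the helper names, and the exact edges are assumptions read off the drawing.

```python
# Minimal sketch: the slide's example graph as (subject, relation, object) triples.
# "Elvis" and "Unknown" (the "?" node) are illustrative names, not YAGO identifiers.
FACTS = {
    ("person", "subclass", "entity"),
    ("astronaut", "subclass", "person"),
    ("Elvis", "is a", "person"),
    ("Unknown", "is a", "astronaut"),
    ("Elvis", "born", "1935"),
    ("Unknown", "born", "1935"),
    ('"Elvis Presley"', "means", "Elvis"),
    ('"The King"', "means", "Elvis"),
}


def objects(subject, relation):
    """Return all objects o such that (subject, relation, o) is a fact."""
    return {o for (s, r, o) in FACTS if s == subject and r == relation}


def superclasses(cls):
    """Follow "subclass" edges upwards and collect every ancestor class."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for parent in objects(current, "subclass"):
            if parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found


def classes_of(individual):
    """Direct classes of an individual plus all their superclasses."""
    direct = objects(individual, "is a")
    return direct.union(*(superclasses(c) for c in direct))
```

For instance, `classes_of("Unknown")` collects astronaut, person and entity, and `objects('"The King"', "means")` resolves the word to the individual it denotes.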
Where do we get the ontology from?
Previous approaches:
- Assemble the ontology manually (WordNet, SUMO, GeneOntology)
  Problem: usually low coverage (MPI is in none of these)
- Extract the ontology from corpora, e.g. the Web (KnowItAll, Espresso, Snowball, LEILA)
  Problem: usually low accuracy (50%-92%)
YAGO approach:
Assemble the ontology from Wikipedia (=> good coverage)
Use the category system of Wikipedia (=> good accuracy)
Exploiting the Wikipedia category system
[Figure: the Wikipedia article on Elvis Presley with its category list; the article body on the slide is placeholder text]
- Exploit relational categories: 1935_births yields (Elvis, born, 1935)
- Exploit conceptual categories: American_singers yields (Elvis, is a, American_singer)
- Avoid administrative categories: Disputed_articles must not yield (Elvis, is a, Disputed_article)
- Avoid thematic categories: Rock'n_Roll_Music must not yield (Elvis, is a, Rock'n_Roll_Music)
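The four kinds of categories can be told apart with simple rules. The sketch below is only an illustration: the slides do not spell out YAGO's actual heuristics, so the regular expression, the hand-written stop list and the naive plural test are stand-in assumptions, as are the function names.

```python
import re

# Hypothetical stand-in rules; the slides do not spell out YAGO's actual heuristics.
RELATIONAL = re.compile(r"^(\d{4})_births$")   # e.g. 1935_births
ADMINISTRATIVE = {"Disputed_articles"}         # hand-written stop list


def facts_from_category(entity, category):
    """Turn one Wikipedia category of `entity` into zero or one facts."""
    match = RELATIONAL.match(category)
    if match:
        # Relational category: the name encodes a relation and its argument.
        return [(entity, "born", match.group(1))]
    if category in ADMINISTRATIVE:
        # Administrative category: describes the article, not the entity.
        return []
    if category.endswith("s"):
        # Naive plural test: treat plural names as conceptual categories (classes).
        return [(entity, "is a", category[:-1])]
    # Everything else is treated as thematic and ignored.
    return []


def facts_from_article(entity, categories):
    """Collect the facts from all categories of one article."""
    facts = []
    for category in categories:
        facts.extend(facts_from_category(entity, category))
    return facts
```

Applied to the four categories from the slides, only 1935_births and American_singers produce facts. A real system would need a proper linguistic test in place of the trailing-"s" check.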
The Upper Model
[Figure: Elvis is an American_singer, born 1935. Open question "?": how does American_singer connect to upper classes such as person and entity?]
The Upper Model: From Wikipedia?
[Figure: the Wikipedia categories above American_singer are People_by_occupation, Social_group and Business. Open question "?": do they give a usable class hierarchy for Elvis, an American_singer born 1935?]
The Upper Model: From WordNet?
[Figure: WordNet candidates for the class above American_singer: Person#3, Singer#1, ..., Singer#17]
[Figure: for the category American_singers_of_Jewish_origin, the WordNet candidates are Person#3, Origin#7, Singer#1, ..., Singer#17]
The YAGO ontology
[Figure: the resulting graph, combining Wikipedia and WordNet]
(Singer#1, subclass, Person#3)
(American_singer, subclass, Singer#1)
(Elvis, is a, American_singer)
(Elvis, born, 1935)
("singer", means, Singer#1)
("Elvis Presley", means, Elvis)
The YAGO ontology: Accuracy
Relation        Accuracy
subclass        97.70% +/- 1.59%
is a            94.54% +/- 2.36%
familyName      97.81% +/- 1.75%
givenName       97.62% +/- 2.08%
establishedIn   90.84% +/- 4.28%
bornInYear      93.14% +/- 3.71%
diedInYear      98.72% +/- 1.30%
locatedIn       98.41% +/- 1.52%
politicianOf    92.43% +/- 3.93%
writtenInYear   94.35% +/- 3.33%
hasWonPrize     98.47% +/- 1.57%
The YAGO ontology: Number of Facts
Ontologies should not be judged purely by the number of facts! This is just an informational overview.
Ontology     Number of facts
KnowItAll           30,000
SUMO                60,000
WordNet            200,000
OpenCyc            300,000
Cyc              2,000,000
Yago             6,000,000
The Yago Model: Why binary is not enough
(Elvis, is_a, singer)
- But only from 1953 to 1977
- We know this from Wikipedia
A binary relation cannot carry these extra arguments. Giving each fact an identifier lets other facts refer to it:
#1 (Elvis, is_a, singer)
#2 (#1, time, 1953-1977)
#3 (#1, source, Wikipedia)
The Yago model formally
A YAGO ontology over
- a set of relations R
- a set of common entities C
- a set of fact identifiers I
is a function
$I \to (R \cup C \cup I) \times R \times (R \cup I \cup C)$
Example:
#1 (Elvis, is_a, singer)
#2 (#1, time, 1953-1977)
#3 (#1, source, Wikipedia)
We can talk about
- facts: (#1, source, Wikipedia)
- additional arguments: (#1, time, 1953-1977)
- relations: (time, hasRange, time_interval)
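The definition can be read as a mapping from fact identifiers to triples whose components are relations, common entities or other fact identifiers. The sketch below is one possible rendering in Python, not YAGO's implementation. The concrete sets, the identifier `#4` for the hasRange fact, and the helper names are assumptions for illustration.

```python
# Minimal sketch of the model: a mapping from fact identifiers to triples.
# The sets below are illustrative; "#4" is an assumed identifier for the hasRange fact.
RELATIONS = {"is_a", "time", "source", "hasRange"}
ENTITIES = {"Elvis", "singer", "1953-1977", "Wikipedia", "time_interval"}

ontology = {
    "#1": ("Elvis", "is_a", "singer"),
    "#2": ("#1", "time", "1953-1977"),
    "#3": ("#1", "source", "Wikipedia"),
    "#4": ("time", "hasRange", "time_interval"),
}


def is_well_formed(onto):
    """Check that every triple lies in (R u C u I) x R x (R u I u C)."""
    allowed = RELATIONS | ENTITIES | set(onto)
    return all(
        s in allowed and r in RELATIONS and o in allowed
        for (s, r, o) in onto.values()
    )


def facts_about(onto, fact_id):
    """Return the (relation, object) pairs that annotate the fact `fact_id`."""
    return {(r, o) for (s, r, o) in onto.values() if s == fact_id}
```

Here `facts_about(ontology, "#1")` returns the time and source annotations of the fact that Elvis is a singer, which is exactly what a plain binary relation could not express.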
The Yago model: Logical aspects
Axioms:
(x, is_a, y), (y, subclass, z) => (x, is_a, z)
...
Example: (Elvis, is_a, singer) and (singer, subclass, person) yield (Elvis, is_a, person).
Starting from an ontology f1, f2, f3, f4, f5:
- Derive facts with the axioms: f1, f2, f3, f4, f5, f6, f7, f8, f9, f10 (finite, unique)
- Eliminate derivable facts: f1, f2, f3 (finite, unique)
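Deriving and eliminating facts can be sketched as a fixpoint computation. The code below implements only the one axiom shown on the slide and assumes that the subclass relation has no cycles. The function names are illustrative, and the greedy elimination is a simple stand-in rather than the procedure from the technical report.

```python
def derive(facts):
    """Apply (x, is_a, y), (y, subclass, z) => (x, is_a, z) until nothing changes."""
    closure = set(facts)
    changed = True
    while changed:
        changed = False
        for (x, r1, y) in list(closure):
            if r1 != "is_a":
                continue
            for (y2, r2, z) in list(closure):
                if r2 == "subclass" and y2 == y and (x, "is_a", z) not in closure:
                    closure.add((x, "is_a", z))
                    changed = True
    return closure


def eliminate(facts):
    """Drop every fact that the remaining facts can re-derive."""
    base = set(facts)
    for fact in sorted(facts):
        rest = base - {fact}
        if fact in derive(rest):
            base = rest
    return base
```

Starting from (Elvis, is_a, singer), (singer, subclass, person) and (person, subclass, entity), `derive` adds the two implied is_a facts, and `eliminate` applied to the result returns the three original facts.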
Extending the Ontology
Whom did Elvis marry?
Pattern: X married Y
Text: "Elvis married Priscilla"
Answer: Priscilla
Extending the Ontology with LEILA
Whom did Elvis marry?
Pattern: X (subj) married Y (obj)
Text: "Elvis, the great rock star, married Priscilla"
Answer: Priscilla
Extending the Ontology
[Figure: Ontology (YAGO) and Information Extraction (LEILA)]
The Truth about Elvis
Which astronaut was born in the same year as Elvis?
http://www.mpi-inf.mpg.de/~suchanek/downloads/yago/
Enter your Yago Query:
"Elvis Presley" bornInYear $year
$astro bornInYear $year
$astro isa astronaut
Result: 20 results
Which astronaut codenamed "Roger" was born in the same year as Elvis?
Enter your Yago Query:
"Elvis Presley" bornInYear $year
$astro bornInYear $year
"Roger" givenNameOf $astro
$astro isa astronaut
Result: $astro = Roger_Chaffee
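A query of this kind is a list of triple patterns that share variables. The following sketch shows one simple way to evaluate such patterns over a list of facts. It is not the Yago query engine. The small fact base, the entity names and the function names are assumptions, and unlike the slide it writes the entity `Elvis_Presley` directly instead of resolving the word "Elvis Presley".

```python
# Tiny illustrative fact base; entity names are assumptions, not a YAGO export.
FACTS = [
    ("Elvis_Presley", "bornInYear", "1935"),
    ("Roger_Chaffee", "bornInYear", "1935"),
    ("Roger_Chaffee", "isa", "astronaut"),
    ("Roger", "givenNameOf", "Roger_Chaffee"),
    ("Neil_Armstrong", "bornInYear", "1930"),
    ("Neil_Armstrong", "isa", "astronaut"),
]


def match(pattern, fact, binding):
    """Extend `binding` so that `pattern` equals `fact`, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, fact):
        if term.startswith("$"):
            # Variable: bind it, or check it against its earlier binding.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def query(patterns, facts=FACTS):
    """Return every variable binding that satisfies all patterns."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for fact in facts
            if (extended := match(pattern, fact, binding)) is not None
        ]
    return bindings
```

Each pattern narrows the list of candidate bindings, so the three-line query from the slide becomes three passes over the facts. On this toy fact base the only astronaut sharing a birth year with Elvis is Roger_Chaffee.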
Conclusions
- Yago is based on a logically clean model
- Yago has an accuracy of around 95%
- Yago is 3 times larger than the largest competitor
- Elvis is alive
Reference
For all details, please refer to our technical report
"Yago – A Core of Semantic Knowledge"
(Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum)
available at http://www.mpii.mpg.de/~suchanek
BibTeX:
@TECHREPORT{yagotr,
AUTHOR = {Suchanek, Fabian and Kasneci, Gjergji and Weikum, Gerhard},
TITLE = {Yago: A Core of Semantic Knowledge},
TYPE = {Research Report},
INSTITUTION = {Max-Planck-Institut f{\"u}r Informatik},
ADDRESS = {Stuhlsatzenhausweg 85, 66123 Saarbr{\"u}cken, Germany},
NUMBER = {MPI-I-2006-5-006},
YEAR = {2006}
}