Using Ontologies to Make Smart Cities Smarter
Rosario Uceda-Sosa, Biplav Srivastava and Bob Schloss
IBM Research
June 2012
A Semantic Data Model for Smart Cities
A semantic data model (an ontology) of a city, if it is complete and authoritative, (1) simplifies the development of applications that require integrated access to city data sources and (2) enables solution reuse as we move from one city to the next.
Whether or not ETL is used for data consolidation, a semantic data model (3) can extend the metadata with new categories (SanitationServices, CrimesAgainstProperty) without modifying the application or the data sources.
[Figure: an Application Developer/Consultant works against a Semantic Data Model over the city data sources (optionally consolidated through [ETL]), illustrating 2. Reuse and 3. Metadata Extensions.]
An ontology can make a city interconnected and smart, but it must assume that:
1. Cities have their own data sources, not necessarily connected, and may not want to consolidate them.
2. Cities have non-standard organizations, departments and competencies.
… but what is an ontology, anyway?
What do you think?
In Computer Science, “An ontology is a formal explicit description of concepts in a domain of discourse (classes (sometimes called
concepts)), properties of each concept describing various features and attributes of the concept (slots (sometimes called roles or
properties)), and restrictions on slots (facets (sometimes called role restrictions)). An ontology together with a set of individual
instances of classes constitutes a knowledge base. In reality, there is a fine line where the ontology ends and the knowledge base
begins.” [Noy, 2000]
This is not to be confused with ontologies (and/or taxonomies) in Philosophy or the Life Sciences.
In a Smart City domain, we’re concerned with modeling the city data (city activity data, city departments, assets, KPIs), not the city itself (the full set of spatial and temporal relations between people and objects in the city). Ontologies help us structure and reason about city events, entities and services.
Ontology = Classes + Relations + Constraints
Knowledge Base = Ontology + Instances + (Standard) Inference and Rules
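The two equations above can be made concrete with a minimal Python sketch. It uses a toy in-memory representation, not OWL. Subclass axioms play the role of the classes, and one relation with a domain restriction plays the role of relations and constraints. Class membership by subsumption stands in for the standard inference. The class names (ServiceRequest, SanitationRequest, CrimeIncident) and the handledBy relation are hypothetical.

```python
from collections import defaultdict

# Ontology: classes (as subclass axioms), relations and a constraint.
# All names here are hypothetical, chosen to echo the city domain.
SUBCLASS_OF = {
    "ServiceRequest": "Event",
    "CrimeIncident": "Event",
    "SanitationRequest": "ServiceRequest",
}
RELATIONS = {"handledBy"}
DOMAIN = {"handledBy": "ServiceRequest"}  # constraint: who may have handledBy


def superclasses(cls):
    """Return cls followed by all of its ancestors (subsumption)."""
    chain = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


class KnowledgeBase:
    """Ontology + instances + a standard inference (class membership)."""

    def __init__(self):
        self.types = {}                # instance -> asserted class
        self.links = defaultdict(set)  # (relation, subject) -> objects

    def add_instance(self, name, cls):
        self.types[name] = cls

    def add_link(self, subject, relation, obj):
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        required = DOMAIN.get(relation)
        if required is not None and required not in superclasses(self.types[subject]):
            raise ValueError(f"{subject} is not a {required}")
        self.links[(relation, subject)].add(obj)

    def instances_of(self, cls):
        """Inferred members: instances whose asserted class is subsumed by cls."""
        return sorted(n for n, c in self.types.items() if cls in superclasses(c))


kb = KnowledgeBase()
kb.add_instance("req42", "SanitationRequest")
kb.add_instance("crime7", "CrimeIncident")
print(kb.instances_of("Event"))           # ['crime7', 'req42']
print(kb.instances_of("ServiceRequest"))  # ['req42']
```

req42 is never asserted to be an Event. That fact is inferred from the subclass chain.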
Not all ontologies are created equal
In practice, ontologies are used, together with inferencing engines and rules, for a variety of purposes. If we think of them as schemas, they can play several different roles:
The uses range from ontologies as a normative schema to ontologies as an integrative schema that depends on the instances.
Use | Purpose | Instances | Inferencing | Examples
As a deductive system | Deductive system (axioms + deductive rules) | Part of the knowledge base | Defined by rules | Expert systems, planning, optimization
As a data blueprint | Constrain a domain | Must conform to the normative schema determined by the ontology | Subsumption, class inferencing | Biomedical and life sciences (FMA, RadLex)
As a data classifier | Classify open data | Unknown formats | Subsumption, class inferencing | Tag ontologies (MOAT, Echarte, SCOT, NAO, etc.)
As a data integrator | Integrate a pre-defined model with existing data sources | Instances are mapped; no constraint enforcement | Subsumption, class and entity inferencing | SCRIBE
As a data mapping vocabulary | Mapping to/from existing data sources | Mined instances determine the ontology/schema | Subsumption, class inferencing | D2RQ (a tool)
SCRIBE belongs to the fourth category: it has no constraints and was designed to support the programming of tools that let domain experts work with entities that are natural to them (even if the recorded data is actually distributed).
What makes a good ontology for data integration?
A good ontology is a useful ontology: one that both humans and systems can process.
Human usability:
• Communicable: naming, natural language support, etc.
• Concise: a simple way to describe the key entities of the model that is still able to infer many facts.
• Consistent: naming conventions and modeling patterns.
• Authoritative to domain experts.
• Documented: not just descriptions, but also provenance.
• Managed and maintained by people throughout the model lifecycle.
• Reusable in similar domains, for similar instances.
System usability:
• Scalable, so large amounts of data can be parsed, stored and retrieved.
• Efficient query and inferencing.
• Programmable solutions, in both open and closed data paradigms.
• Open infrastructure and tools.
The SCRIBE Model of Cities
Scribe design decisions
Human usability (criterion: SCRIBE design decision):
• Communicable (naming, natural language support, etc.): natural language naming, user-readable labels.
• Concise (a simple way to describe the key entities of the model, yet able to infer many facts): anchor classes (events, services, assets, KPIs); a simple and expressive OWL sublanguage; relation taxonomies.
• Consistent (naming conventions and modeling patterns): clear boundaries between classes and instances.
• Authoritative to domain experts: alignment with standards.
• Documented (not just descriptions, but also provenance): a wealth of annotations.
• Managed and maintained by people throughout the model lifecycle: class stewards; involvement of domain experts and end users.
• Reusable in similar domains, for similar instances: mechanisms for modularizing extensions and customizations.
System usability:
• Scalable, so large amounts of data can be parsed, stored and retrieved: caching mechanisms for DB data (?)
• Efficient query and inferencing: ontology-based inferencing (?)
• Programmable solutions, in both open and closed data paradigms: data adapters and schema exploring (?)
• Open infrastructure and tools: Jena, DB2DRQ, Ruby on Rails, etc.
SCRIBE data model
SCRIBE is a non-normative, authoritative, modular, extensible semantic model for Smarter Cities.
It consists of a Core Model of common classes (events and messages, stakeholders, departments, services, city landmarks and resources, KPIs, etc.), plus extensions by domain and customizations by city.
[Figure: SCRIBE layers: the SCRIBE Core Model (common building blocks), Extensions (AssetManagement, Features, BuildingAndParcel, Transportation, Water, Weather) and City Customization (organization/operation profile).]
Simple language:
• Classes, inheritance and relations, plus inferencing (extensions and customizations add further classes, inheritance and relations)
• Based on standards (OWL-QL, SPARQL)
• Mappable to UML
• Metadata annotations and tagging
Authoritative:
• Aligned with standards (CAP, NIEM, MISA/MRM, UCore)
• Validated with customer scenarios
• Validated with open city data
The key concepts of the SCRIBE Ontology
1. Describes messages, events and services as they flow through the system
[Figure: a Message (e.g., Advisory), an Event (e.g., Storm, RoadWork), a WorkItem (e.g., RoadWorkWI), a Protocol (e.g., InfrastructureWorkP) and an Asset (e.g., pipe, valve), connected by before/after triggers.]
2. Represents types of city services (not the city organization itself), so that the administrative structure of a city can be assembled from SCRIBE building blocks.
[Figure: a CityServiceArea and an Agency (e.g., WhitePlainsTraffic), connected by an Owns relation.]
City and Government Standards and SCRIBE
While most of the standards relevant to Smarter Cities are message exchange models (CAP, UCore, NIEM) or business planning models (MISA/MRM), SCRIBE integrates (1) message-based models with (2) asset management and (3) services and their KPIs in an extensible model.
Standard | Core entities | Advantages | Issues | Representational language
CAP | Alert, message certainty, security, urgency | Simple to implement and read; established standard | Subject and related resources are underdefined | XML
UCore | Incident | Extension mechanisms defined; supported by DoD, DHS, DoJ | Not mature enough; incomplete | XML
NIEM | People, places, events and things | Tools for search and subset extraction (SSGT); established standard; well-defined extension process (IEPD) | Large (4000 concepts) and cumbersome (even with support tools); not deep in any domain | XML with schema substitution for inheritance
MISA/MRM | Program, service, outcome, target group | International, municipality based | Represents the administration and business planning of a city, not its operation; cumbersome to extend | XML (RDFS?)
Smarter City Standards and SCRIBE
(1) A message is an event (with publishers/subscribers or requestors/responders), and its subject is an (external or processing) event. In principle, a message could refer to another message.
[Figure: SCRIBE core classes Entity (Person, Organization, item), Role, Stakeholder (Person, Organization), Organization (CityOrganization), ServiceArea (Public Safety, etc.), Event, ExternalEvent, Message, WorkItem and Asset. They are linked by relations including causes, isStakeholder, hasRole, subject and is-a (1). Assets are Maximo-based, messages CAP-based and stakeholders NIEM-based (overlap, superset, etc.). Example instances include Tom Travis (Planner), Transportation Dept, Stakeholder1, a RoadRepair WorkOrder and the intersection of Main and Hamilton.]
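Note (1) says a message is both an event and something whose subject is an event. The following Python sketch uses plain dataclasses, not the OWL model. It shows why the is-a link lets one message refer to another: the subject slot accepts any Event, and a Message is an Event. The field names (publisher, subscribers) and the root_subject helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    identifier: str


@dataclass
class Message(Event):
    """A message is an event (is-a) whose subject is another event."""
    publisher: str = ""
    subscribers: List[str] = field(default_factory=list)
    subject: Optional[Event] = None  # external or processing event


def root_subject(message: Message) -> Optional[Event]:
    """Follow subject links through messages that refer to other messages."""
    subject = message.subject
    seen = set()
    while isinstance(subject, Message) and id(subject) not in seen:
        seen.add(id(subject))  # guard against cyclic references
        subject = subject.subject
    return subject


storm = Event("storm-1")
advisory = Message("msg-1", publisher="WeatherService", subject=storm)
relay = Message("msg-2", publisher="OperationsCenter",
                subscribers=["PublicWorks"], subject=advisory)
print(isinstance(relay, Event))      # True
print(root_subject(relay) is storm)  # True
```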
The SCRIBE Metadata
Inferencing and object properties
There are three types of ‘horizontal’ relations:
• HasAttribute (inverse: attributeOf) for properties and attributes (name, identifier, etc.)
• HasAggregateMember (inverse: aggregateMemberOf) for parts or members (e.g., hasChild, or a process that has process steps as members)
• AssociatedTo (its own inverse) for everything else
Because new relations are placed under one of these three, we can do inferencing on extensions to SCRIBE.
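Here is a minimal Python sketch of inferencing over the relation taxonomy. It uses a toy triple set, not an RDF store. It assumes that every relation an extension introduces is declared as a child of one of the three horizontal relations. A query phrased with a horizontal relation (or its inverse) then also returns facts recorded with the extension relation. The extension relations hasProcessStep and locatedNear and the instance names are hypothetical. In RDF/OWL the same effect would come from rdfs:subPropertyOf and owl:inverseOf.

```python
# Relation taxonomy: child relation -> parent relation.
# hasChild comes from the text; hasProcessStep and locatedNear are hypothetical.
PARENT = {
    "hasChild": "HasAggregateMember",
    "hasProcessStep": "HasAggregateMember",
    "locatedNear": "AssociatedTo",
}
# Inverses of the three horizontal relations.
INVERSE = {
    "HasAttribute": "attributeOf",
    "HasAggregateMember": "aggregateMemberOf",
    "AssociatedTo": "AssociatedTo",  # its own inverse
}


def ancestors(relation):
    """relation followed by its parent relations up the taxonomy."""
    chain = [relation]
    while relation in PARENT:
        relation = PARENT[relation]
        chain.append(relation)
    return chain


def infer(triples):
    """Add super-relation facts and their inverses to the asserted triples."""
    inferred = set(triples)
    for s, p, o in triples:
        for relation in ancestors(p):
            inferred.add((s, relation, o))
            if relation in INVERSE:
                inferred.add((o, INVERSE[relation], s))
    return inferred


def query(triples, relation):
    """All (subject, object) pairs linked by relation, asserted or inferred."""
    return sorted((s, o) for s, p, o in infer(triples) if p == relation)


facts = {
    ("RoadRepairProtocol", "hasProcessStep", "InspectSite"),
    ("WaterMain12", "locatedNear", "MainAndHamilton"),
}
print(query(facts, "HasAggregateMember"))  # [('RoadRepairProtocol', 'InspectSite')]
print(query(facts, "aggregateMemberOf"))   # [('InspectSite', 'RoadRepairProtocol')]
```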
SCRIBE tooling
SCRIBE is written using standard RDF/OWL editors and software (Jena).
Model tooling, used by the Application Developer/Consultant and the EndUser:
• Edit, extend and customize the model: standard OWL/XML tools (TopBraid, Protégé, Pellet, SPARQL, etc.)
• Integrate with data: MIDO, DB2RQL, R2DQ, etc.
• Query/navigate the model and data: form-based queries? Record-based navigation?
Implementation: a simple subset of OWL, directly mappable to UML.
Content: a semantic model of events, city assets, geography and resources, city organization and services, KPIs, processes.
[Figure: SCRIBE Core Model, City Customization, City Data Catalog, MIDO and Database Schema.]
SCRIBE is also:
a. A modeling process.
b. Tools to make the model usable. The first tool we’ve worked on, MIDO (Mapping Instance Data to Ontologies), allows the mapping of existing data to the SCRIBE model and is part of the process of customizing SCRIBE to a new city.
Customizing Scribe in different cities
Scribe is NOT closed. We know that cities have different organizations, different service levels and different KPIs. The Scribe model is designed to provide the building blocks (service types, city departments, KPI taxonomies, CAP messages) that can be customized to define the overall operations of a city.
[Figure: the Scribe CORE, informed by standards (CAP, NIEM, MISA/MRM, etc.) and by scenarios/data (cities’ open data). MIDO maps city data to Scribe and populates the model with instance data. Washington D.C., Chicago and Dublin each have their own services, departments, assets and KPIs.]
Scenario
311 events in Washington D.C.
Suppose a Smarter City application that manages city operations wants to display citizen complaints (311 calls) on a map, filtered by a few user-defined constraints (times, locations, type of call, etc.).
An excerpt of the 311 incident table (from DC Open Data) is shown below. Among the data we have:
• Identifier
• Type of service (code + description)
• Time (ServiceOrderDate, ServiceResolution date, etc.)
• Place (Lon/Lat, Ward, PSA, District, etc.)
• The agency that should handle the request
• Various qualifiers (enumerated types): priority, resolution, etc.
How to map 311 events to an existing model
The application may access the 311 table directly, querying incidents according to given criteria:
A. “SELECT SERVICEREQUESTID, SERVICETYPECODE, LATITUDE, LONGITUDE, WARD, DISTRICT, PSA, DATEREPORTED FROM DC311 WHERE SomeConstraintHere”
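Option A can be sketched with Python’s built-in sqlite3 module. The DC311 table here is an in-memory stand-in with the columns from the query, filled with made-up rows. The ward-and-date filter is a hypothetical choice for SomeConstraintHere. The point is the coupling: table and column names live in application code, so a second data source means a second query.

```python
import sqlite3


def open_demo_db():
    """In-memory stand-in for the DC311 table; the rows are made up."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE DC311 (SERVICEREQUESTID TEXT, SERVICETYPECODE TEXT, "
        "LATITUDE REAL, LONGITUDE REAL, WARD INTEGER, DISTRICT TEXT, "
        "PSA TEXT, DATEREPORTED TEXT)"
    )
    conn.executemany(
        "INSERT INTO DC311 VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        [
            ("SR-0001", "SANITATION", 38.90, -77.03, 2, "D1", "PSA-101", "2012-05-01"),
            ("SR-0002", "STREETLIGHT", 38.91, -77.02, 6, "D1", "PSA-102", "2012-05-03"),
            ("SR-0003", "SANITATION", 38.88, -77.00, 6, "D5", "PSA-501", "2012-05-04"),
        ],
    )
    return conn


def requests_in_ward(conn, ward, since):
    """Approach A: the application hard-codes the table and column names."""
    cur = conn.execute(
        "SELECT SERVICEREQUESTID, SERVICETYPECODE, LATITUDE, LONGITUDE, WARD, "
        "DISTRICT, PSA, DATEREPORTED FROM DC311 "
        "WHERE WARD = ? AND DATEREPORTED >= ? ORDER BY DATEREPORTED",
        (ward, since),
    )
    return cur.fetchall()


rows = requests_in_ward(open_demo_db(), 6, "2012-05-02")
print([r[0] for r in rows])  # ['SR-0002', 'SR-0003']
```

ISO-formatted date strings compare correctly as text, which is why the date filter works without a date type.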
OR
The application may define an intermediate (data model) layer that:
B. Defines a ServiceRequest object that knows how to retrieve all the data from one or more tables.
C. Defines two objects: ServiceRequest, which holds the data common to all service requests, and DC311SvceReq, which captures the information specific to DC.
[Figure: Event (ID); ServiceRequest (Type, DateOrdered, Lon/Lat); DC311SvceReq (Ward, …), related to ServiceRequest by IS-A.]
Notice that in (C), inheritance can also be applied to locations (wards, districts, addresses and Lon/Lat points are all ways to describe a location). We could push the model further and introduce more abstractions, say, an event class that captures ID, Time, Location and Type.
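Option C, with the location inheritance just described, might look like this in Python. This is a sketch under assumptions: the field names and the on_or_after helper are hypothetical, and dataclasses stand in for whatever object layer the application would use. Code written against ServiceRequest and Location keeps working when DC-specific subclasses are added.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Location:
    """Any way of describing where something happened."""


@dataclass
class LonLat(Location):
    lon: float
    lat: float


@dataclass
class Ward(Location):
    number: int


@dataclass
class ServiceRequest:
    """Data common to all service requests, whatever the city."""
    identifier: str
    service_type: str
    date_ordered: date
    location: Location


@dataclass
class DC311SvceReq(ServiceRequest):
    """Information specific to DC (here, just the ward)."""
    ward: Ward


def on_or_after(requests: List[ServiceRequest], day: date) -> List[ServiceRequest]:
    """Generic application code: it knows only the ServiceRequest base class."""
    return [r for r in requests if r.date_ordered >= day]


req = DC311SvceReq("SR-0001", "SANITATION", date(2012, 5, 1),
                   LonLat(-77.03, 38.90), Ward(2))
print(isinstance(req, ServiceRequest), isinstance(req.ward, Location))  # True True
print(len(on_or_after([req], date(2012, 5, 1))))  # 1
```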
Mapping 911 (crime) incidents
Now suppose that the application also wants to visualize crime incidents. The corresponding open data table is shown below. Notice that it looks similar to DC311… but not quite:
• IDs have a different format
• Time is ReportDateTime, which includes a time of day, not just a date
• Offenses do not have codes
• There is no referring agency
From the point of view of the application:
A. We can create another query for the DC911 table and consolidate the information at the application level (which requires recompiling).
B. We can add types and data to the object model, but this bloats the objects.
C. We can use the inheritance hierarchy to refactor the information in the model. If the model is well thought out, the changes are minimal… but we’ll need inferencing and infrastructure to keep the graphs… We’ll be replicating RDF/OWL.
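Here is what the refactoring in option C has to absorb, sketched in Python as two adapters. They normalize a DC311 row and a DC911 row into one event record, assuming dict rows. Apart from SERVICEREQUESTID, SERVICETYPECODE, DATEREPORTED and ReportDateTime, the column names (SERVICETYPE, AGENCY, CCN, OFFENSE) are hypothetical. Each difference listed above becomes a line of code:
- a source prefix keeps the two ID formats apart;
- a date is widened to a datetime;
- optional fields cover the missing codes and agency.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass
class CityEvent:
    identifier: str
    event_type: str
    when: datetime
    type_code: Optional[str] = None  # 911 offenses have no codes
    agency: Optional[str] = None     # 911 rows have no referring agency


def from_dc311(row: dict) -> CityEvent:
    reported = date.fromisoformat(row["DATEREPORTED"])  # date only
    return CityEvent(
        identifier="311:" + row["SERVICEREQUESTID"],  # prefix keeps ID formats apart
        event_type=row["SERVICETYPE"],                # hypothetical column
        when=datetime(reported.year, reported.month, reported.day),
        type_code=row["SERVICETYPECODE"],
        agency=row.get("AGENCY"),                     # hypothetical column
    )


def from_dc911(row: dict) -> CityEvent:
    return CityEvent(
        identifier="911:" + row["CCN"],               # hypothetical column
        event_type=row["OFFENSE"],                    # hypothetical column
        when=datetime.fromisoformat(row["ReportDateTime"]),  # has a time of day
    )


rows311 = [{"SERVICEREQUESTID": "SR-0001", "SERVICETYPE": "Illegal dumping",
            "SERVICETYPECODE": "DUMPING", "DATEREPORTED": "2012-05-01",
            "AGENCY": "DPW"}]
rows911 = [{"CCN": "12000077", "OFFENSE": "THEFT",
            "ReportDateTime": "2012-05-02T21:15:00"}]
events = sorted([from_dc311(r) for r in rows311] + [from_dc911(r) for r in rows911],
                key=lambda e: e.when)
print([e.identifier for e in events])  # ['311:SR-0001', '911:12000077']
```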
The right data integration point: a semantic model approach
… And there are net benefits to a model-driven, semantic approach:
1. Applications can be coded ‘in the abstract’. E.g., display all current events, regardless of whether they are 311 or 911 events.
2. Applications can refine the metadata without having to touch the code or the underlying data. E.g., display all sanitation requests.
3. Applications can be shielded from the details of the databases, as in the case of implicit joins. E.g., display the names of the dispatchers associated with active requests.
The SCRIBE model captures enough information about events to allow a small customization to work.
Step 1. Customizing SCRIBE for Washington D.C.
SCRIBE captures the basics of events, service types, dates, etc., but we don’t expect the model to be comprehensive. For example, we didn’t model all the types of services in the 311 table.
To customize SCRIBE, we created a new file for DC that imports the core model.
We may want to customize SCRIBE for a variety of reasons:
• SuperCans is a DC-specific program, and it will likely remain in the DC-specific classes.
• CollectingIllegalDumping and SeasonalCollection were not contemplated in the core, and they may be marked for promotion at a later date (using the modelPromotion annotation).
• We may add a new data property to a core class, such as a DC-specific identifier.
Note that constraints and rules in the DC model do not need to be reflected in the mapping to SCRIBE.
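The customization step can be sketched in Python as modules that import one another. This uses a toy representation, not OWL files and owl:imports. The DC module adds classes under core classes, a DC-specific data property, and modelPromotion annotations. The core class SanitationService, the property name dcServiceRequestId and the annotation value "true" are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Module:
    """A toy stand-in for an ontology file that may import other files."""
    name: str
    imports: List["Module"] = field(default_factory=list)
    classes: Dict[str, Optional[str]] = field(default_factory=dict)  # class -> parent
    data_properties: Dict[str, str] = field(default_factory=dict)    # property -> domain class
    annotations: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def all_classes(self) -> Dict[str, Optional[str]]:
        """Classes visible in this module: imported ones plus local ones."""
        merged: Dict[str, Optional[str]] = {}
        for module in self.imports:
            merged.update(module.all_classes())
        merged.update(self.classes)
        return merged

    def undefined_references(self) -> List[str]:
        """Parents or property domains that no visible module defines."""
        known = self.all_classes()
        missing = {p for p in known.values() if p is not None and p not in known}
        missing |= {c for c in self.data_properties.values() if c not in known}
        return sorted(missing)

    def promotion_candidates(self) -> List[str]:
        """Local classes flagged for promotion to the core."""
        return sorted(c for c, notes in self.annotations.items()
                      if notes.get("modelPromotion") == "true")


core = Module("scribe-core", classes={
    "Event": None, "ServiceRequest": "Event", "SanitationService": "ServiceRequest"})
dc = Module(
    "scribe-dc",
    imports=[core],
    classes={"SuperCans": "SanitationService",
             "CollectingIllegalDumping": "SanitationService",
             "SeasonalCollection": "SanitationService"},
    data_properties={"dcServiceRequestId": "ServiceRequest"},
    annotations={"CollectingIllegalDumping": {"modelPromotion": "true"},
                 "SeasonalCollection": {"modelPromotion": "true"}},
)
print(dc.undefined_references())  # []
print(dc.promotion_candidates())  # ['CollectingIllegalDumping', 'SeasonalCollection']
```

The core module is never modified. The DC additions are visible only through the module that imports it.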
Step 2. Mapping instance data to the model
Next, we map the data in each column either to a data property (transferring the data into that data property, as in the case of SERVICEREQUESTID) or to a class (to match an enumerated type, which SCRIBE represents as a taxonomy of classes).
[Figure: a ServiceRequest with its ServiceRequestID, associatedTo a ServiceType, which hasDescriptor a ServiceTypeDescriptor (codeData).]
This mapping is done through a mapping model and tool called MIDO, whose details are not covered here. However, we can assume that the columns in the two tables have been mapped to the SCRIBE model and that the instance data can be accessed through the SCRIBE model.
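MIDO itself is not described here, so the following is only a hypothetical Python sketch of the two mapping kinds. A column is either copied into a data property, or its value selects a class from the taxonomy. The service codes in MAPPING are made up. Missing columns are skipped rather than rejected, in line with the no-constraint, data-integrator role described for SCRIBE.

```python
# Column -> ("data_property", property name) or ("class", {code: class}).
# The codes and class choices are hypothetical, not DC's real service codes.
MAPPING = {
    "SERVICEREQUESTID": ("data_property", "serviceRequestID"),
    "SERVICETYPECODE": ("class", {
        "DUMPING": "CollectingIllegalDumping",
        "SEASONAL": "SeasonalCollection",
        "SUPERCAN": "SuperCans",
    }),
}


def map_row(row, mapping, base_class="ServiceRequest"):
    """Turn one table row into an individual: a set of classes plus data properties."""
    individual = {"types": {base_class}, "properties": {}}
    for column, (kind, target) in mapping.items():
        value = row.get(column)
        if value is None:
            continue  # missing data is tolerated, not rejected
        if kind == "data_property":
            individual["properties"][target] = value
        elif kind == "class":
            cls = target.get(value)
            if cls is not None:  # unmapped codes leave only the base class
                individual["types"].add(cls)
        else:
            raise ValueError(f"unknown mapping kind: {kind}")
    return individual


row = {"SERVICEREQUESTID": "SR-0001", "SERVICETYPECODE": "DUMPING", "WARD": "2"}
mapped = map_row(row, MAPPING)
print(sorted(mapped["types"]))  # ['CollectingIllegalDumping', 'ServiceRequest']
print(mapped["properties"])     # {'serviceRequestID': 'SR-0001'}
```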
Step 3. Query through the model. Query abstract classes
The data from DC Service Requests and Crime Incidents can now be queried together as events, not just as
service requests or criminal incidents.
Query: all events in DC, with type, district and ward
…
Notice that some of the data is missing in the original table… That’s still OK.
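The abstract query relies on subsumption: a 311 request and a crime incident are both Events because their classes sit under Event in the taxonomy. Here is a Python sketch that returns None wherever the source table had no value. It assumes a single-parent taxonomy and dict instances; the class and instance names are hypothetical.

```python
from typing import Optional

# Single-parent class taxonomy; names are hypothetical.
SUBCLASS_OF = {
    "ServiceRequest": "Event",
    "CrimeIncident": "Event",
    "SanitationService": "ServiceRequest",
    "CollectingIllegalDumping": "SanitationService",
    "Theft": "CrimeIncident",
}


def is_a(cls: Optional[str], ancestor: str) -> bool:
    """True if cls equals ancestor or sits below it in the taxonomy."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def select(individuals, cls, columns):
    """Rows for every individual of cls (or a subclass); missing values become None."""
    return [tuple(ind.get(col) for col in columns)
            for ind in individuals if is_a(ind["type"], cls)]


individuals = [
    {"id": "311:SR-0001", "type": "CollectingIllegalDumping", "district": "D1", "ward": 2},
    {"id": "911:12000077", "type": "Theft", "ward": 6},  # no district recorded
]
for row in select(individuals, "Event", ("id", "type", "district", "ward")):
    print(row)
# ('311:SR-0001', 'CollectingIllegalDumping', 'D1', 2)
# ('911:12000077', 'Theft', None, 6)
```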
Step 3. Query through the model. Annotation Metadata
As shown previously, the inferencing in the ontology can be leveraged in a query.
Query: public sanitation service requests
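Refining the metadata without touching code or data can be sketched in Python. We add a category to the class graph and rerun an unchanged query. This sketch assumes a multi-parent class graph held in a dict. The category PublicSanitationService and the with_category helper are hypothetical. In OWL the same change would be a few new rdfs:subClassOf statements.

```python
from collections import deque


def subsumed_by(cls, ancestor, parents):
    """Breadth-first walk up a class graph in which a class may have several parents."""
    queue, seen = deque([cls]), {cls}
    while queue:
        current = queue.popleft()
        if current == ancestor:
            return True
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False


def instances_of(individuals, cls, parents):
    """The unchanged application query."""
    return sorted(i["id"] for i in individuals if subsumed_by(i["type"], cls, parents))


def with_category(parents, category, parent, members):
    """Copy of the class graph with a new category; no code or data is modified."""
    extended = {c: set(ps) for c, ps in parents.items()}
    extended.setdefault(category, set()).add(parent)
    for member in members:
        extended.setdefault(member, set()).add(category)
    return extended


parents = {
    "ServiceRequest": {"Event"},
    "CollectingIllegalDumping": {"ServiceRequest"},
    "SeasonalCollection": {"ServiceRequest"},
    "StreetlightRepair": {"ServiceRequest"},
}
individuals = [
    {"id": "311:SR-0001", "type": "CollectingIllegalDumping"},
    {"id": "311:SR-0002", "type": "StreetlightRepair"},
    {"id": "311:SR-0003", "type": "SeasonalCollection"},
]
print(instances_of(individuals, "PublicSanitationService", parents))  # []
extended = with_category(parents, "PublicSanitationService", "ServiceRequest",
                         ["CollectingIllegalDumping", "SeasonalCollection"])
print(instances_of(individuals, "PublicSanitationService", extended))
# ['311:SR-0001', '311:SR-0003']
```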
Step 3. Query through the model. Implicit join
Everything in a semantic model is connected. The service request can be linked to the name of the department’s dispatcher.
Query: select the events associated with the Department of Public Works, together with its dispatcher
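An implicit join is just a path through the graph: event to department to dispatcher to name. Here is a Python sketch over a toy triple set. The hasDispatcher and hasName relations and all instances are hypothetical. A SPARQL engine would evaluate the same path as a basic graph pattern.

```python
# Toy triples; hasDispatcher, hasName and all instances are hypothetical.
TRIPLES = {
    ("311:SR-0001", "associatedTo", "DeptOfPublicWorks"),
    ("311:SR-0002", "associatedTo", "DeptOfTransportation"),
    ("DeptOfPublicWorks", "hasDispatcher", "person-17"),
    ("person-17", "hasName", "Pat Example"),
}


def objects(triples, subject, predicate):
    """Objects linked from subject by predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def subjects(triples, predicate, obj):
    """Subjects linked to obj by predicate."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


def events_with_dispatcher(triples, department):
    """Follow event -> department -> dispatcher -> name without an explicit join."""
    names = [name
             for person in objects(triples, department, "hasDispatcher")
             for name in objects(triples, person, "hasName")]
    return [(event, name)
            for event in subjects(triples, "associatedTo", department)
            for name in names]


print(events_with_dispatcher(TRIPLES, "DeptOfPublicWorks"))
# [('311:SR-0001', 'Pat Example')]
```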
Lessons Learned
Human usability (criterion: lesson learned):
• Communicable (naming, natural language support, etc.): key to management and model validation.
• Concise (a simple way to describe the key entities of the model, yet able to infer many facts): the balance between a simple language (RDF), conciseness and inferencing power is key to usability. Map to UML.
• Consistent (naming conventions and modeling patterns): use a relation taxonomy to infer relations despite extensions.
• Authoritative to domain experts: merging standards is not enough. Alignment with standards allows a consistent model.
• Documented (not just descriptions, but also provenance): limited benefit to end users unless coupled with sample instances or data entry forms.
• Managed and maintained by people throughout the model lifecycle: people are not always available for the full lifecycle.
• Reusable in similar domains, for similar instances: mechanisms for promoting changes to the core.
System usability:
• Scalable, so large amounts of data can be parsed, stored and retrieved: it is not clear whether data should remain in the RDB.
• Efficient query and inferencing: impact analysis queries may require a few seconds. This is OK.
• Programmable solutions, in both open and closed data paradigms: a standard library of data adapters and mappings to SCRIBE is needed.
• Open infrastructure and tools: we used Jena, DB2DRQ, Ruby on Rails.
For more information
http://researcher.ibm.com/view_project.php?id=2505
References
• A Direct Mapping of Relational Data to RDF, W3C Working Draft 24 March 2011, http://www.w3.org/TR/2011/WD-rdb-direct-mapping-20110324/
• R2RML: RDB to RDF Mapping Language, W3C Working Draft 24 March 2011, http://www.w3.org/TR/r2rml/
• The D2RQ Platform v0.7 - Treating Non-RDF Relational Databases as Virtual RDF Graphs, 2009-08-10, http://www4.wiwiss.fu-berlin.de/bizer/d2rq/spec/
• Hannes Bohring and Soren Auer, Mapping XML to Ontologies, citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.59.8897
• T. Rodrigues, P. Rosa, J. Cardoso, Mapping XML to Existing OWL Ontologies, citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.59.292
• DB2OWL: A Tool for Automatic Database-to-Ontology Mapping, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.97.5970
• Municipal Information Systems Association/Municipal Reference Model (MISA/MRM), http://www.misa.on.ca/en/
• National Information Exchange Model (NIEM), http://www.niem.gov/
• D. Gonzales, C. Ohlandt, E. Landree, C. Wong, R. Bitar and J. Hollywood, The Universal Core Information Exchange Framework: Assessing its Implications for Acquisition Programs, RAND report, 2011, http://www.rand.org/content/dam/rand/pubs/technical_reports/2011/RAND_TR885.sum.pdf
• D. Allemang, J. Hendler, Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL, Morgan Kaufmann, 2008.
• Noy, McGuinness, Ontology Development 101: A Guide to Creating Your First Ontology, http://www.ksl.stanford.edu/people/dlm/papers/ontology-tutorial-noy-mcguinness-abstract.html
