Five Simple Steps to Experience the Power of a Knowledge Graph, using Virtuoso

Kingsley Uyi Idehen
OpenLink Virtuoso Weblog
Mar 7, 2018

For individuals and organizations alike, computing remains challenged by the pervasiveness of data silos, something I’ve written about in various posts over the years. Common to all those posts is my use of Virtuoso (our multi-model RDBMS, Data Access Middleware, and Data Virtualization platform) to demonstrate how data de-silo-fication can be achieved in an unobtrusive manner; i.e., you don’t have to “rip & replace” existing infrastructure in order to take advantage of what a Knowledge Graph has to offer.

Creating and Exploiting a Knowledge Graph in Five Simple Steps

Note: You can skip steps 1 through 4, if you choose to use the live Virtuoso instance behind our public URIBurner Service.

  1. Obtain and Install Virtuoso — for on-premise use on Windows, macOS, or Linux; Docker Container, preconfigured Amazon EC2 AMI in the AWS Cloud, or preconfigured Virtual Machine in the Microsoft Azure Cloud.
  2. Start the Virtuoso Server — by following the documentation for the Windows or macOS native UI; or by using the Linux command-line, virtuoso -c virtuoso.ini.
  3. Perform basic Virtuoso configuration — primarily, by changing the default passwords for the dba and dav “super-user” accounts using the HTML-based Administrator (the Conductor) or the SQL command-line
  4. Install Virtuoso productivity modules — e.g., the Extract, Load, and Transform Middleware for RDF-based Linked Open Data (a/k/a the Sponger) and the Faceted Browser — via their respective Virtuoso Application Distribution archives (VADs)
  5. Install one or more of our browser extensions — such as the OpenLink Structured Data Sniffer (OSDS), the OpenLink Structured Data Editor (OSDE), and/or the OpenLink Data Explorer (ODE)— which turn Safari, IE, Firefox, Chrome, Opera and related browsers into Linked-Data-aware user agents.

You are now ready to commence exploitation of Virtuoso-powered data access, integration, and management!

Knowledge Graph Exploitation

Whether deployed to laptop, desktop workstation, or server, Virtuoso’s built-in “Sponger” middleware provides ETL (Extract, Transform, and Load) services that can analyze document content (from a variety of data source types) and then generate descriptions of said documents in 5-Star Linked Open Data form (i.e., web-like structured data constructed using RDF-language-based digital sentences/statements). RDF documents generated by the Sponger are available in a wide variety of document types including RDF-Turtle, JSON-LD, HTML5+Microdata, HTML+RDFa, RDF/JSON, RDF-XML, CSV, OData/Atom, and OData/JSON.

Entity Description Page Returned by the URIBurner ETL Service

Installation of the Sponger ETL Middleware module equips your Virtuoso instance with a live service endpoint identified by the URL pattern, http://{your-instance-cname}/sponger/. This endpoint resolves to an HTML page that includes an input field for capturing URLs that identify the documents against which you seek to perform ETL operations.

If you prefer, rather than working through the form, you can construct sponger service URLs by hand, using the pattern: http://{your-instance-cname}/about/html/{document-url}. For instance, if using our URIBurner instance to sponge the DBpedia page about DBpedia itself, this translates to http://linkeddata.uriburner.com/about/html/http://dbpedia.org/resource/DBpedia.
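The hand-built URL pattern above can be sketched in a few lines of Python. This is only an illustration: the helper name `sponger_url` and its `scheme` parameter are made up for this example, and the document URL is appended unmodified, exactly as in the URIBurner example.

```python
def sponger_url(host: str, document_url: str, scheme: str = "http") -> str:
    """Build a Sponger description-page URL for a document URL.

    `sponger_url` is a hypothetical helper, not part of Virtuoso. It simply
    fills in the pattern {scheme}://{host}/about/html/{document-url}.
    """
    # The document URL is appended as-is, matching the pattern in the text.
    return f"{scheme}://{host}/about/html/{document_url}"


# Reproduces the URIBurner example given in the text.
print(sponger_url("linkeddata.uriburner.com",
                  "http://dbpedia.org/resource/DBpedia"))
```

As a general URL caveat, a document URL containing its own `#` fragment may need percent-encoding before being appended, since browsers do not send fragments to the server.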

Simplest of all, you can leverage the functionality provided by our OpenLink Data Explorer (ODE) or OpenLink Structured Data Sniffer (OSDS) browser extensions to immediately invoke analysis of the page currently in focus (i.e., what’s currently displayed in your browser’s address bar), simply by clicking the ODE or OSDS toolbar icon.

Live Examples

The following live examples are based on our URIBurner Service, a publicly accessible instance of the Virtuoso ETL middleware. Every example based on this service can also be experienced through your own public or private Virtuoso instance.

Data Source URLs

Sponger ETL Service URL Examples

Basic entity description pages —

Deeper follow-your-nose pages, for deeper exploration and serendipitous discovery of additional relevant information —

What’s Happening Here?

A Knowledge Graph comprises collections of RDF sentences that describe any number of things. In this case, we are describing documents, which naturally includes describing the topics those documents cover.

To effectively describe anything using sentences, we must have an identification mechanism in place that facilitates how we identify (denote) the subject, predicate, and object of each sentence. This is where hyperlinks (specifically, HTTP URIs) come into powerful use, enabling us to easily look-up what entities are identified by the sentence subject, predicate, and object.
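To make the idea of a hyperlink-based sentence concrete, here is a minimal Python sketch that holds one subject, predicate, object sentence and prints it as an N-Triples line. The `Triple` type and `to_ntriples` helper are invented for this illustration, the `example.org` subject is hypothetical, and the sketch only handles sentences whose three terms are all HTTP URIs (no literals).

```python
from typing import NamedTuple

# The standard URI for rdf:type (written as "a" in SPARQL and Turtle).
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class Triple(NamedTuple):
    """One RDF sentence: each position is denoted by an HTTP URI."""
    subject: str
    predicate: str
    object: str


def to_ntriples(triple: Triple) -> str:
    """Serialize a URI-only triple as a single N-Triples line."""
    return f"<{triple.subject}> <{triple.predicate}> <{triple.object}> ."


# Hypothetical sentence: "this article is a web page".
sentence = Triple(
    subject="http://example.org/article",   # illustrative document URI
    predicate=RDF_TYPE,
    object="http://schema.org/WebPage",     # illustrative type URI
)
print(to_ntriples(sentence))
```

Because every term is an HTTP URI, each part of the sentence can itself be looked up, which is what makes the follow-your-nose exploration described in this post possible.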

The “deceptively simple” act of constructing sentences using hyperlinks is what produces a Knowledge Graph deployed using Linked Data principles. This is also referred to as an Entity Relationship Graph when visualized using a Graphical Notation (as opposed to a Linear Notation). Naturally, the more sentences you collate, the deeper your Web becomes.

Virtuoso can handle all of this for you. No canned data is required, because Virtuoso starts generating and storing useful data (entity relationships expressed in RDF sentences) the moment you direct it to describe a document, whether published to the external Web or to an internal Enterprise Intranet.

What follows are some additional details about the kinds of Linked Data documents that Virtuoso can generate.

“Basic” Entity Description Pages

As with the live DBpedia and DBpedia-Live instance deployments of Virtuoso, we refer to this kind of page as being “basic” only because its link traversal (i.e., HTTP URI lookup or de-reference) doesn’t include deep expansion of class instances (i.e., rdf:type attribute/property [relation] values).

The annotated diagram below depicts an entity description document (as generated by the Sponger) using EAV (Entity, Attribute, Value) terminology.

Entity Description visualization using Entity, Attribute, and Value (EAV) Terminology

The annotated diagram below presents the same entity description document (again, as generated by the Sponger), this time using SPO (Subject, Predicate, Object) terminology.

Entity Description visualization using Subject, Predicate, and Object (SPO) Terminology

“Deeper” Linked Data Follow-Your-Nose Entity Description Pages

This type of page is “deeper” (relative to the “basic”) simply because its link traversal (i.e., its HTTP URI lookups or de-references) does include deep expansion of rdf:type attribute/property (relation) values.
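The difference between the two page types can be illustrated with a toy Python model. This is one possible reading of “expansion of rdf:type values”, not a description of how Virtuoso implements it: the graph, the `example.org` URIs, and the `describe` function are all hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Hypothetical toy graph of (subject, predicate, object) sentences.
GRAPH = [
    ("http://example.org/article", RDF_TYPE, "http://example.org/Article"),
    ("http://example.org/article", "http://example.org/mentions",
     "http://example.org/thing"),
    ("http://example.org/Article", RDFS + "label", "Article"),
    ("http://example.org/Article", RDFS + "subClassOf",
     "http://example.org/Document"),
]


def describe(entity, graph, expand_types=False):
    """Return sentences about `entity`.

    With expand_types=True, also include the sentences about each of the
    entity's rdf:type values (the toy stand-in for a "deeper" page).
    """
    sentences = [t for t in graph if t[0] == entity]
    if expand_types:
        # Collect the type URIs first, then look each one up in the graph.
        type_values = [o for (_, p, o) in sentences if p == RDF_TYPE]
        for type_uri in type_values:
            sentences += [t for t in graph if t[0] == type_uri]
    return sentences


basic = describe("http://example.org/article", GRAPH)
deeper = describe("http://example.org/article", GRAPH, expand_types=True)
print(len(basic), "sentences on the basic page;",
      len(deeper), "on the deeper page")
```

In this toy model, the “deeper” result is the “basic” result plus whatever is known about the entity's types.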

The annotated diagram below depicts an entity description document (as generated by the Sponger) using EAV (Entity, Attribute, Value) terminology.

Entity Description visualization using Entity, Attribute, and Value (EAV) Terminology

The annotated diagram below depicts the same entity description document (again as generated by the Sponger) using SPO (Subject, Predicate, Object) terminology.

Entity Description visualization using Subject, Predicate, and Object (SPO) Terminology

In addition to Faceted Browsing for Knowledge Graph exploration, you also have the ability to use SPARQL to generate alternative exploration starting points.

Knowledge Graph Interaction using SPARQL

Now that you’ve populated your Virtuoso instance via a dynamic ETL processing pipeline, you can perform intelligent operations using the SPARQL and/or SQL query languages.

This SPARQL Query produces a dynamically-generated HTML document that provides an index of hyperlinks that denote Sample Entities grouped by Entity Type.

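# For each entity type in the named graph, return one sample entity and the instance count.
SELECT DISTINCT (SAMPLE(?s) AS ?sample) (COUNT(*) AS ?count) ?EntityType
FROM <http://www.wired.co.uk/article/the-webs-greatest-minds-on-how-to-fix-it>
WHERE { ?s a ?EntityType }
GROUP BY ?EntityType
# Most populous entity types first.
ORDER BY DESC(?count)

Query Result HTML Document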
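If you would rather send this query over HTTP than paste it into a query form, the request URL can be assembled as in the Python sketch below. It makes several assumptions: that the endpoint lives at `/sparql` on your instance (the URIBurner URL is used as a stand-in), that the endpoint accepts a `format` parameter to select HTML output, and that `sparql_request_url` is just a helper name invented here. The `query` parameter comes from the SPARQL Protocol, and the query is written in standard SPARQL 1.1 syntax.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

QUERY = """\
SELECT DISTINCT (SAMPLE(?s) AS ?sample) (COUNT(*) AS ?count) ?EntityType
FROM <http://www.wired.co.uk/article/the-webs-greatest-minds-on-how-to-fix-it>
WHERE { ?s a ?EntityType }
GROUP BY ?EntityType
ORDER BY DESC(?count)
"""


def sparql_request_url(endpoint: str, query: str,
                       result_format: str = "text/html") -> str:
    """Build a GET URL carrying the query in the `query` parameter.

    The `format` parameter is an assumption about the endpoint; check what
    your own instance accepts before relying on it.
    """
    return endpoint + "?" + urlencode({"query": query,
                                       "format": result_format})


# Assumed endpoint location; substitute your own instance's SPARQL URL.
url = sparql_request_url("http://linkeddata.uriburner.com/sparql", QUERY)
print(url)

# Decoding the URL again shows the query survives percent-encoding intact.
decoded = parse_qs(urlsplit(url).query)
print(decoded["format"][0])
```

Opening the printed URL in a browser should then return the query results, provided the endpoint and format assumptions hold for your instance.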

Virtuoso’s Unique Architecture

Data de-silo-fication is the fundamental value proposition of Virtuoso. Upon installation of Virtuoso (whether on your local desktop or a remote server), you are immediately equipped with a data-junction-box that enables conceptual virtualization of heterogeneous data.

Because of this immediate empowerment, there are no “canned demos” based on unrealistic “canned data”. The moment you install Virtuoso with its Sponger Middleware and Faceted Browsing modules enabled, you are ready for a different kind of experience with data, one that has been expected for a long time but never delivered in such an unobtrusive manner.

Data Virtualization

Virtuoso’s conceptual harmonization of heterogeneous data is also known as “data virtualization.” Virtuoso delivers this virtualization so well because it is a full-blown relational database management system (RDBMS) that’s been equipped with a built-in data virtualization engine that supports a wide variety of open-standards-based protocols.

Virtuoso’s Unique Open-Standards-Compliant Multi-Model RDBMS and Data Virtualization Architecture [live clickable architecture diagram edition]

Broad support of open standards ensures data virtualization doesn’t come at the cost of product lock-in. We want our customers and prospects to consider Virtuoso only because it’s the “best of class” solution in its problem space, and not because of proprietary language or other lock-in.

Conclusion

Historically, the shortage of high-level productivity tools, and a general unawareness that any such tools exist, have compounded the confusion that swirls around the notion and utility of a Knowledge Graph.

As demonstrated here, Virtuoso is both a productivity tool and a highly sophisticated platform that scales from tiny client setups to high-end server deployments. It puts these distractions of yore to rest by enabling any end-user, power-user, systems-integration professional, enterprise architect, willing executive, or developer to fully exploit the power and promise of a Semantic Web, by leveraging a simple software installation that doesn’t require anyone to write a single line of code!

Kingsley Uyi Idehen
OpenLink Virtuoso Weblog

CEO, OpenLink Software —High-Performance Data Centric Technology Providers.