The Many Shades of Knowledge Graphs: Let Me Count the Ways

One of the most significant developments in the current resurgence of statistical Artificial Intelligence is the emphasis it places on knowledge graphs. These repositories have grown alongside the contemporary pervasiveness of machine learning for numerous reasons, from their aptitude for preparing training datasets for this technology to pairing it with AI’s knowledge base, its symbolic reasoning side, for consummate AI.

Consequently, graph technologies are becoming fairly ubiquitous in a broadening array of solutions from Business Intelligence mechanisms to Digital Asset Management platforms. With tools like GraphQL gaining credence across the data landscape as well, it’s not surprising many consider knowledge graphs one of the core technologies shaping modern AI deployments.

As such, it’s imperative to understand that not all graphs are equal; different types and functions are ascribed to the various graphs vying with one another for the knowledge graph title. Since the critical function of knowledge graphs is to make data exchangeable across systems while detecting relationships between data elements, one of the pivotal aspects of their definition is that “knowledge graphs are built on top of ontologies and taxonomies and terminology systems,” maintained Franz CEO Jans Aasman.

“You can’t have a knowledge graph without it. Otherwise, you just have a graph application where people just make up the names for the nodes, they link up a few nodes, and that’s it.”

Understanding the constructs within knowledge graphs pertaining to the interrelation of ontologies, properties, semantics, metadata management, introspection, corpuses and more is crucial to successfully aligning data in ways that maximize AI deployments—from machine learning to symbolic reasoning.

Labeled Property Graphs

According to Cambridge Semantics VP of Product Steve Sarsfield, traditionally there were only “two types of knowledge graphs: RDF [Resource Description Framework] and Labeled Property Graphs”. LPGs are the most rudimentary form of knowledge graphs because they provide relationship detection between data elements, but lack the uniformity of nomenclature Aasman alluded to, which is foundational to these solutions.

LPG advantages include the fact they don’t require significant upfront time modeling data and enable users to quickly add data properties—which are useful for reification, provenance, and machine learning confidence scores. Nevertheless, LPGs aren’t predicated on consistent ways to identify concepts, nodes, and data’s meaning. They function as silos that can’t share data between graphs or organizations.

“With property graphs, you can only add those properties to relationships,” commented TopQuadrant CEO Irene Polikoff. “With the RDF approach you can add it to anything.”
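To make the property idea concrete, here is a minimal sketch of a labeled property graph in plain Python. It is not the API of any particular graph database; the class names, the `TAGGED_AS` relationship, and the `confidence` and `asserted_by` properties are all hypothetical, chosen to mirror the machine learning confidence scores mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    start: str
    end: str
    rel_type: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy in-memory labeled property graph (illustrative only)."""

    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node_id, label, **properties):
        self.nodes[node_id] = Node(node_id, label, properties)

    def add_edge(self, start, end, rel_type, **properties):
        # Properties attach directly to the relationship itself.
        self.edges.append(Edge(start, end, rel_type, properties))

    def neighbors(self, node_id, rel_type=None):
        return [
            e.end
            for e in self.edges
            if e.start == node_id and (rel_type is None or e.rel_type == rel_type)
        ]


graph = PropertyGraph()
graph.add_node("doc1", "Document")
graph.add_node("topic1", "Topic")
# Hypothetical provenance and confidence recorded on the relationship.
graph.add_edge("doc1", "topic1", "TAGGED_AS", confidence=0.87, asserted_by="classifier-v1")
```

Note that `doc1`, `Topic`, and `TAGGED_AS` are names made up on the spot. Nothing ties them to an agreed vocabulary, which is the silo problem described above.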

Semantic Graphs

LPGs are often contrasted with semantic graphs; the former focus on nodes while the latter focus on the edges (relationships) between nodes. Although it’s possible to create proprietary semantic graphs, these databases are exemplified by RDF graphs that leverage universal standards for data identifiers, vocabularies, and taxonomies.

This confluence is largely responsible for harmonizing data regardless of differences at point of origination—which is highly beneficial for involving the various vectors for meaningful machine learning deployments. Specific applications in which semantic knowledge graphs excel include:

  • Data Engineering: Knowledge graphs play a pivotal role in assembling diverse data into a common data model for building machine learning models. According to Lore IO CEO Digvijay Lamba, a “knowledge graph is really how AI does the mapping” to get different data into a uniform model. In this use case, knowledge graphs align different data concepts that machine learning algorithms map to a common model, significantly reducing the effort required to wrangle data for data science.
  • RDF*: The emerging RDF* standard is a hybrid between LPGs and RDF that enables the latter to swiftly include properties in a standards-based environment. When using machine learning for data management staples like tagging or classifying documents for regulatory compliance, adding descriptors about confidence or probability is necessary because “it wasn’t just something a person said; it was something the computer figured out,” Polikoff remarked. “You would want to keep that likelihood or the link to the underlying reason.”
  • Introspection: Semantic knowledge graphs are prized within the AI space for their penchant for question answering, which is instrumental for supporting natural language technology applications. “The idea of introspection is that the knowledge graph can tell you what it knows,” mentioned TopQuadrant CTO Ralph Hodgson. “You can ask it what do you know about this.” According to Hodgson, ad-hoc question asking isn’t supported by property graphs; it’s not supported by relational technologies, either.
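The RDF* and introspection points above can be sketched with a toy triple store in plain Python. This is a stand-in for illustration, not real RDF* syntax or the API of any RDF library; the `TripleStore` class, the `ex:` identifiers, and the `confidence` annotation are assumptions.

```python
from collections import defaultdict


class TripleStore:
    """Toy store of (subject, predicate, object) statements."""

    def __init__(self):
        self.triples = set()
        # Statement-level annotations, in the spirit of RDF*:
        # metadata is attached to a whole triple.
        self.annotations = defaultdict(dict)

    def add(self, subject, predicate, obj, **annotations):
        triple = (subject, predicate, obj)
        self.triples.add(triple)
        if annotations:
            self.annotations[triple].update(annotations)
        return triple

    def describe(self, resource):
        """Introspection: everything the graph states about a resource."""
        return sorted(t for t in self.triples if resource in (t[0], t[2]))

    def predicates(self):
        """Introspection: which kinds of relationships the graph knows."""
        return sorted({p for _, p, _ in self.triples})


store = TripleStore()
store.add("ex:doc1", "rdf:type", "ex:Document")
# A machine-generated tag keeps its (hypothetical) likelihood.
tagged = store.add("ex:doc1", "ex:taggedAs", "ex:Compliance", confidence=0.87)

about_doc1 = store.describe("ex:doc1")
```

Here `store.describe("ex:doc1")` plays the role of asking the graph "what do you know about this," and `store.annotations[tagged]` returns the likelihood kept alongside the computer-derived statement.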

Taxonomies, Vocabularies

Implicit to the introspection Hodgson mentioned is a clarification of the meaning of different entities aligned in knowledge graphs. As Aasman observed, “The world’s big: there’s many concepts that you need to agree on before you can build a useful knowledge graph.”

Vocabularies define the terms used in knowledge graphs, while “taxonomies are also a special part of the knowledge graph because they use graphs as their model and they are organized, hierarchical concepts,” Polikoff explained. Vocabularies and taxonomies form the basis of the terminology systems Aasman referred to, enabling organizations to specify the words used for the various entities represented in data.

This facet of a knowledge graph, which Polikoff implied is sometimes considered a type of knowledge graph in itself, plays a vital role in the ability of these tools to “contain all of the different synonyms of different things, like how can different things be worded in the world,” Lamba denoted. Agreeing on this terminology is foundational to these graphs’ capacity for question answering and machine reasoning.
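As a rough illustration of how a vocabulary and a taxonomy work together, the sketch below maps synonyms to a single concept identifier and walks a hierarchy of broader concepts. The `Vocabulary` class and the `concept:` identifiers are hypothetical, loosely modeled on the preferred-label and broader-concept pattern of standards such as SKOS.

```python
class Vocabulary:
    """Toy terminology system: synonyms plus a concept hierarchy."""

    def __init__(self):
        self.synonyms = {}  # lowercased label -> concept id
        self.broader = {}   # concept id -> parent concept id

    def add_concept(self, concept_id, labels, broader=None):
        for label in labels:
            self.synonyms[label.lower()] = concept_id
        if broader is not None:
            self.broader[concept_id] = broader

    def resolve(self, term):
        """Return the agreed concept for any known wording, else None."""
        return self.synonyms.get(term.strip().lower())

    def ancestors(self, concept_id):
        """Walk up the taxonomy, guarding against accidental cycles."""
        chain, seen = [], {concept_id}
        current = self.broader.get(concept_id)
        while current is not None and current not in seen:
            chain.append(current)
            seen.add(current)
            current = self.broader.get(current)
        return chain


vocab = Vocabulary()
vocab.add_concept("concept:CardiovascularDisease", ["cardiovascular disease"])
vocab.add_concept(
    "concept:HeartDisease", ["heart disease"], broader="concept:CardiovascularDisease"
)
vocab.add_concept(
    "concept:MyocardialInfarction",
    ["myocardial infarction", "heart attack"],
    broader="concept:HeartDisease",
)
```

With this in place, `vocab.resolve("Heart Attack")` and `vocab.resolve("myocardial infarction")` return the same concept, so data worded differently at the source still lands on one node.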

Ontologies

Ontologies are another subset of knowledge graphs that supplement taxonomies by “providing schema, structure, and rules to the rest of the data in the knowledge graph,” Polikoff revealed. The scope of ontologies ranges from the basics of what Lamba referred to as a “common data model” to more extensive applications pertaining to “complex properties and axioms, and they [ontologies] represent the knowledge of a domain in great detail,” Hodgson posited. This latter application is critical for sophisticated AI deployments, notwithstanding ontologies’ shared modeling capabilities.

Still, those capabilities are responsible for harmonizing all data for reasoning and machine learning. Aasman recalled a use case in which a large hospital system leveraged taxonomies and ontologies for unifying “patient data from all over the place, whether it’s in a data warehouse, separate databases, in their ICU from various means, HL7 [Health Level Seven] streams.” The healthcare provider can then run machine learning on the resultant patient entity trees, or use them to build more complex AI models.
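The harmonization step can be sketched as follows, under heavy simplification. The ontology here is reduced to one class with typed properties, and the source names, field names, and values are all invented for illustration; a real ontology (and real HL7 data) would be far richer.

```python
# Hypothetical ontology fragment: one class and its typed properties.
ONTOLOGY = {"Patient": {"patient_id": str, "birth_date": str, "heart_rate": int}}

# Hypothetical per-source mappings from local field names to ontology properties.
SOURCE_MAPPINGS = {
    "warehouse": {"pat_id": "patient_id", "dob": "birth_date"},
    "icu_monitor": {"PID": "patient_id", "HR": "heart_rate"},
}


def to_common_model(source, record):
    """Rename a source record's fields to ontology properties and type-check them."""
    mapping = SOURCE_MAPPINGS[source]
    schema = ONTOLOGY["Patient"]
    unified = {}
    for field_name, value in record.items():
        prop = mapping.get(field_name)
        if prop is None:
            continue  # field is not covered by the ontology; skip it
        if not isinstance(value, schema[prop]):
            raise TypeError(f"{prop} expects {schema[prop].__name__}")
        unified[prop] = value
    return unified


def merge_patient(records):
    """Combine records from several sources into one patient entity."""
    merged = {}
    for source, record in records:
        merged.update(to_common_model(source, record))
    return merged


patient = merge_patient(
    [
        ("warehouse", {"pat_id": "001", "dob": "1970-01-01", "billing_code": "X"}),
        ("icu_monitor", {"PID": "001", "HR": 72}),
    ]
)
```

The merged `patient` dictionary is a flat stand-in for the patient entity trees mentioned above: one record per patient, expressed in the ontology's terms, ready to feed a model.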

Corpuses

According to Polikoff, a corpus is “another type of graph…it’s documents or instances across documents.” Knowledge graph corpuses are a means of augmenting enterprise knowledge with external sources.

These undertakings also include what Polikoff termed a data asset collection involving metadata. Such corpuses are vital to knowledge graphs’ propensity for enriching cognitive computing endeavors like reasoning, one of the fundamental characteristics of machine intelligence.

Aasman specified that when harmonizing various patient data for “reasoning, you want to reason with a combination of the patient data with the life sciences biological knowledge.” In this case, a corpus would include knowledge from medical journals or research studies. Knowledge graphs can include this information for intelligent inferences that derive new facts by reasoning about (or combining) others, fortifying machine intelligence.
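Deriving new facts by combining others can be sketched as a small forward-chaining loop. This is a generic illustration, not the reasoner of any product named here; the rule format, the predicates, and the placeholder identifiers (`drug:X`, `condition:Y`) are assumptions and carry no medical meaning.

```python
def forward_chain(facts, rules):
    """Apply join rules until no new facts appear.

    Each rule (first_pred, second_pred, derived_pred) reads:
    if (a first_pred b) and (c second_pred b) then (a derived_pred c).
    """
    known = set(facts)
    while True:
        new = set()
        for first_pred, second_pred, derived_pred in rules:
            for a, p1, b in known:
                if p1 != first_pred:
                    continue
                for c, p2, d in known:
                    if p2 == second_pred and d == b:
                        new.add((a, derived_pred, c))
        new -= known
        if not new:
            return known
        known |= new


# Enterprise data: what the hospital knows about its patient.
patient_facts = {("patient:001", "hasCondition", "condition:Y")}
# Corpus data: a statement assumed to come from external literature.
corpus_facts = {("drug:X", "contraindicatedFor", "condition:Y")}

rules = [("hasCondition", "contraindicatedFor", "shouldAvoid")]

all_facts = forward_chain(patient_facts | corpus_facts, rules)
inferred = all_facts - patient_facts - corpus_facts
```

Neither source alone states that the patient should avoid the drug; the `inferred` set only appears once the patient data and the corpus sit in the same graph.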

Access Graphs

The universal standards of accomplished knowledge graphs are invaluable for sharing information. By unifying data’s meaning based on standardized terminology and data models, it’s possible to enlarge knowledge graphs by incorporating others in them. Across the enterprise, this capability enables different departments (sales, marketing, research and development, etc.) to profit from each other’s graphs with comprehensive analytics or reasoning on each relevant datum.

In these instances, organizations could leverage an access graph “to bring those graphs together,” Hodgson divulged. However, in the likely event some data shouldn’t be viewed by certain users without specific security or governance clearance, “you can put an access graph on top of another graph, which will have those reification statements in it that prohibits the view of things,” Hodgson said.
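One way to picture an access graph layered on another graph is as a set of statements about statements. The sketch below assumes a hypothetical `hiddenFrom` predicate and made-up roles and data; it is one possible reading of Hodgson's description, not TopQuadrant's implementation.

```python
def visible_triples(data_graph, access_graph, role):
    """Return the data triples a role may see.

    Each access statement has the form (data_triple, "hiddenFrom", role),
    i.e. a reification-style statement whose subject is another statement.
    """
    hidden = {
        statement
        for statement, predicate, restricted_role in access_graph
        if predicate == "hiddenFrom" and restricted_role == role
    }
    return data_graph - hidden


contract = ("customer:42", "hasContractValue", "250000")
region = ("customer:42", "locatedIn", "region:EMEA")

sales_graph = {contract, region}
access_graph = {(contract, "hiddenFrom", "role:marketing")}

marketing_view = visible_triples(sales_graph, access_graph, "role:marketing")
finance_view = visible_triples(sales_graph, access_graph, "role:finance")
```

The data graph itself is untouched. The marketing view simply omits the contract value, while a role with no restrictions sees the full graph.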

Metadata and More

The broadening sphere of knowledge graphs encompasses much more than just property graphs and semantic graphs. Although most knowledge graphs involve one of these two approaches (or their hybrid, RDF*), these platforms are also composed of different types of sub-graphs, which function as components or applications of their capabilities. Ontologies, taxonomies, corpuses and access graphs are all examples of these subset graph types within the knowledge graph framework that align disparate data for singular use cases, such as training machine learning models.

As Polikoff implied, another common application of this technology is for metadata graphs. These graphs are helpful for intricate network monitoring in complex hybrid and multi-cloud environments, in which users can observe metadata about the functionality of virtual machines, individual applications, databases, and their pipelines. Such a graph is primed for deploying “machine learning to start predicting what machine or VM is going to go down based on a bunch of characteristics of the current running processes in the machine, the application, and the load,” Aasman indicated.

Contributor

Jelani Harper is an editorial consultant servicing the information technology market. He specializes in data-driven applications focused on semantic technologies, data governance, and analytics.

Opinions expressed by contributors are their own.
