Why ontology is so important I: Knowledge Hoarding

Caohongliu
2 min read · May 18, 2021


In my previous stories about Knowledge Graphs, I talked about ontology. An ontology (put simply) defines the concepts and relationships (and imposes constraints) that are permissible in a KG. It is often used as a data schema with a shared vocabulary.

Ontology plays a very important role in data integration, data understanding, and reasoning within a Knowledge Graph. In this story, I will explain how an ontology can help deal with knowledge hoarding inside a company.

Knowledge Hoarding

Knowledge hoarding; image from a data.world presentation: https://www.youtube.com/watch?v=ZWM-Dlw3VCM&list=PLDhh0lALedc7LC_5wpi5gDnPRnu1GSyRG&index=2

The figure above briefly shows what knowledge hoarding is. Even if we assume that all the colleagues from different business units are friendly and willing to share their knowledge and data, knowledge hoarding can still exist. It can arise in many ways: naming that is hard to understand, data that is application-centric, documentation that is non-existent, not detailed, or ambiguous, unknown data quality, and so on.

For example, a team builds an application for clients and decides to collect client data (in compliance with GDPR). When they define the data collection protocol (hopefully there is one), they list 200 features that are important to the marketing people, including date, time zone, seg1, seg2, etc. Without documentation, people from other teams will never understand what seg1 or seg2 means. Even with detailed documentation, ambiguity can still exist. For example, is the date in local time or in a specific time zone like UTC+1?
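To make this concrete, here is a minimal sketch in plain Python of how a small shared vocabulary turns an opaque column name into something self-describing. The feature names, labels, and definitions below are all hypothetical illustrations, not taken from any real system:

```python
# Hypothetical illustration: a tiny shared vocabulary covering a few of the
# 200 features. The names (date, seg1, ...) and definitions are made up.
vocabulary = {
    "date": {
        "label": "event date",
        "definition": "date the client event was recorded",
        "format": "ISO 8601, normalized to UTC",  # removes the UTC-vs-local ambiguity
    },
    "seg1": {
        "label": "primary marketing segment",
        "definition": "primary client segment assigned by the marketing team",
    },
}

def describe(column: str) -> str:
    """Return the shared definition of a column, or flag it as undocumented."""
    entry = vocabulary.get(column)
    if entry is None:
        return f"{column}: UNDOCUMENTED -- ask the owning team"
    desc = f"{column}: {entry['label']} ({entry['definition']})"
    if "format" in entry:
        desc += f" [{entry['format']}]"
    return desc

for col in ["date", "time_zone", "seg1", "seg2"]:
    print(describe(col))
```

With the vocabulary in place, seg1 answers itself; without an entry, seg2 is at least loudly flagged as undocumented rather than silently misinterpreted.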

Time to knowledge

In the previous example, when data scientists from other teams want to use this dataset, they need to spend a lot of time talking to the marketing people about the meaning of each feature, and to the developers about the data collection protocol and implementation details. And if a developer has left the company, some questions can never be answered. The time to knowledge for this dataset increases.

Now let’s imagine we have 10 datasets from 10 different teams/business units. They contain complementary information, and we need to merge them to get a holistic view of the clients. If each of these 10 datasets has its own data schema and documentation, how many people are needed just for the data understanding and schema matching before the data can be merged? How long will the marketing and finance people have to wait for an answer from the data?
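As a rough sketch of why a shared vocabulary helps here, assume two hypothetical teams (marketing and finance) each map their local column names onto shared ontology terms; schema matching then becomes a dictionary lookup instead of a series of meetings. All names below are made up for illustration:

```python
# Hypothetical per-team mappings: local column name -> shared ontology term.
team_mappings = {
    "marketing": {"cust_id": "clientId", "seg1": "primarySegment"},
    "finance":   {"client_no": "clientId", "arr": "annualRevenue"},
}

def to_canonical(team: str, record: dict) -> dict:
    """Rename a record's keys to the shared ontology terms for that team."""
    mapping = team_mappings[team]
    return {mapping.get(key, key): value for key, value in record.items()}

marketing_row = to_canonical("marketing", {"cust_id": 42, "seg1": "premium"})
finance_row = to_canonical("finance", {"client_no": 42, "arr": 120000})

# Both rows now share the "clientId" key, so combining them is trivial.
merged = {**marketing_row, **finance_row}
print(merged)  # {'clientId': 42, 'primarySegment': 'premium', 'annualRevenue': 120000}
```

The point is that each team only has to map its own schema to the ontology once, rather than every pair of teams negotiating a mapping with each other.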

What can ontology do?

An ontology defines the entities and their relationships (and imposes constraints) for a domain. If the domain experts in the company, together with an ontologist, define a company domain ontology that is shared by all the teams, the knowledge hoarding situation discussed above will not happen. And if all the datasets in the company follow the ontology, the time to knowledge will be reduced significantly, and the data scientists will be freed to do more valuable tasks and work more efficiently.
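Here is a tiny sketch of the "imposes constraints" part, assuming a hypothetical Client concept: once property types and required fields are declared in one shared place, any dataset can be checked against them automatically. This is a toy validator for illustration only, not a real ontology tool such as OWL or SHACL:

```python
# Hypothetical shared ontology fragment: a "Client" concept with
# typed, documented properties. Names and rules are made up.
ONTOLOGY = {
    "Client": {
        "clientId": {"type": int, "required": True},
        "primarySegment": {"type": str, "required": False},
    }
}

def validate(concept: str, record: dict) -> list:
    """Check a record against the shared ontology; return a list of violations."""
    spec = ONTOLOGY[concept]
    errors = []
    for prop, rules in spec.items():
        if prop not in record:
            if rules["required"]:
                errors.append(f"missing required property: {prop}")
            continue
        if not isinstance(record[prop], rules["type"]):
            errors.append(f"{prop}: expected {rules['type'].__name__}")
    return errors

print(validate("Client", {"clientId": 42}))           # []
print(validate("Client", {"primarySegment": "gold"}))  # ['missing required property: clientId']
```

Because every team validates against the same declarations, data quality issues surface at publication time instead of during someone else's analysis months later.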
