Disambiguation is the process of identifying which meaning of a word or term is intended in a given context.


Words can have different meanings depending on the context in which they are used. One often-cited example is the word “mercury”, which can refer to the element, the planet, or the Roman god. A headline like “Little Known Facts About Mercury” does not reveal whether the article is about astronomy, chemistry, or Roman mythology; context is required to make that determination.

Increasingly, natural language processing (NLP), a subcategory of artificial intelligence, is used for disambiguation. At a high level, the algorithms work much as humans do: they consider the context, specifically other terms that make it possible to home in on the meaning. In the example above, a contextual mention of Zeus would suggest mythology, and a mention of other elements would suggest chemistry. A mention of “Venus”, however, would not distinguish between astronomy and mythology, so further context would be needed.
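The idea of using surrounding terms as context can be illustrated with a small Python sketch. It is a deliberately simplified toy, not how a production NLP system works: the `SENSE_CUES` word lists and the `disambiguate` function are hypothetical names invented for this illustration, and real systems learn such associations from data rather than from hand-written lists.

```python
import re

# Hypothetical cue words for each sense of "mercury".
# These lists are illustrative assumptions, not a real lexicon.
SENSE_CUES = {
    "astronomy": {"planet", "orbit", "solar", "venus"},
    "chemistry": {"element", "metal", "liquid", "lead"},
    "mythology": {"god", "roman", "zeus", "venus"},
}


def disambiguate(text):
    """Return the sense(s) whose cue words overlap most with the text.

    An empty list means the text contains no cue words at all;
    more than one sense means the context is still ambiguous.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {sense: len(words & cues) for sense, cues in SENSE_CUES.items()}
    best = max(scores.values())
    if best == 0:
        return []
    return sorted(sense for sense, score in scores.items() if score == best)


print(disambiguate("Little Known Facts About Mercury"))  # []
print(disambiguate("Mercury and Zeus"))                  # ['mythology']
print(disambiguate("Mercury and Venus"))                 # ['astronomy', 'mythology']
```

The last call mirrors the “Venus” case: the cue word belongs to two senses, so the sketch returns both and signals that more context is needed.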


Disambiguation in scientific publications

Authorship of scientific publications poses a particularly difficult disambiguation challenge. Databases such as PubMed or Monocl Professional need to assign an activity, e.g. a publication, to the correct author even though that author might not have a unique name. In other words, disambiguation is required to determine which of the many possible John Smiths or Wang Fangs is the true author of a scientific publication, the PI on a clinical trial, or a member of a medical society advisory board, to name just a few examples.

NLP algorithms tackle this problem by looking at context, specifically other publications and activities that provide information about the therapeutic area, scientific discipline, or speciality an expert generally publishes on. Additional information such as affiliation and the network of collaborators can be used to further refine the process.
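One way to picture how these signals might be combined is the Python sketch below. It is an assumption-laden illustration, not the method used by PubMed or Monocl: the `AuthorProfile` and `Publication` classes, the weights, and the threshold are all hypothetical, and topic overlap is measured with a simple Jaccard similarity in place of a real NLP model.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorProfile:
    """A known expert with previously attributed activities (hypothetical schema)."""
    author_id: str
    name: str
    topics: set = field(default_factory=set)
    affiliations: set = field(default_factory=set)
    collaborators: set = field(default_factory=set)


@dataclass
class Publication:
    """A new publication whose author still has to be resolved (hypothetical schema)."""
    author_name: str
    topics: set
    affiliation: str
    coauthors: set


def jaccard(a, b):
    """Share of items two sets have in common (0.0 if either set is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_score(pub, profile):
    """Combine topic, affiliation, and collaborator evidence into one score.

    The weights 0.5 / 0.2 / 0.3 are arbitrary values chosen for illustration.
    """
    topic = jaccard(pub.topics, profile.topics)
    affiliation = 1.0 if pub.affiliation in profile.affiliations else 0.0
    network = jaccard(pub.coauthors, profile.collaborators)
    return 0.5 * topic + 0.2 * affiliation + 0.3 * network


def best_match(pub, profiles, threshold=0.3):
    """Return the best-scoring namesake, or None if no one clears the threshold."""
    candidates = [p for p in profiles if p.name == pub.author_name]
    if not candidates:
        return None
    best = max(candidates, key=lambda p: match_score(pub, p))
    return best if match_score(pub, best) >= threshold else None


# Two experts who share a name but work in different fields (made-up data).
profiles = [
    AuthorProfile("smith-1", "John Smith", {"cardiology"},
                  {"Sample Hospital"}, {"C. Diaz"}),
    AuthorProfile("smith-2", "John Smith", {"oncology", "immunotherapy"},
                  {"Example University"}, {"A. Lee", "B. Chen"}),
]
paper = Publication("John Smith", {"oncology"}, "Example University", {"A. Lee"})

match = best_match(paper, profiles)
print(match.author_id if match else "unresolved")  # smith-2
```

Returning `None` below the threshold reflects a common design choice in this kind of matching: leaving an activity unassigned is usually preferable to attributing it to the wrong expert.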


Why disambiguation matters for medical affairs

For both the selection of and engagement with external experts, medical affairs professionals need a deep understanding of the experts they are interacting with. An expert database such as Monocl Professional, which provides broad information about professional activities (publications, board positions, speaking engagements, grants, funding, social media engagement, etc.), greatly facilitates these tasks.

The value of such a database is maximized when expert data and activities are attributed to the correct expert.

At Monocl, a dedicated team is responsible for disambiguating all data integrated into Monocl Professional, ensuring that our customers have a highly robust database as a basis for their work.

For a better understanding of how Monocl uses NLP algorithms to ensure robust data, please read our blog post “What’s in a name?”