Words as Gatekeepers: Measuring Discipline-specific Terms and Meanings in Scholarly Publications

Lucy Li
Published in AI2 Blog · 3 min read · May 8, 2023


Screen grabs from S2ORC that depict the measurement of jargon used in research papers.
We measure scholarly jargon, which consists of discipline-specific word types (blue) and senses (orange), in the Semantic Scholar Open Research Corpus (S2ORC). The top left excerpt is from an optoelectronics paper by Satishkumar et al. (2000). We link these measurements to two key social implications involving audience design and scientific success.

Scholarly text is often laden with jargon, or specialized language that can facilitate efficient communication within fields but hinder understanding for outsiders. Jargon naturally evolves so that researchers and scholars can convey meaning succinctly, but it can be a barrier between fields, and between scientists and the general public.

For example, words such as junction, diode, and bias are specific to the field of optoelectronics, as shown in the figure above. In particular, bias is overloaded with different meanings, or senses, across fields, as it can refer to social discrimination, statistical misestimation, or electric currents. In our paper, we use a natural language processing (NLP) approach called word sense induction to disentangle words’ senses, and show that they can be as specialized as field-specific word types. We define jargon as both discipline-specific words and discipline-specific meanings. See our Findings of ACL 2023 paper for a detailed description of how we operationalize and validate our measure of jargon.
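As a rough illustration of what "discipline-specific word type" can mean in practice, the sketch below scores each word in each field with normalized pointwise mutual information (NPMI). The choice of NPMI is an assumption made for this example, since the post itself does not name the metric; the function name `type_npmi` and the toy token lists are hypothetical. See the paper for the actual operationalization, including how senses are induced and scored.

```python
import math
from collections import Counter


def type_npmi(tokens_by_field):
    """Score how strongly each word is associated with each field.

    tokens_by_field maps a field name to a flat list of tokens drawn
    from that field's abstracts. Returns {field: {word: npmi}}.
    NPMI is an assumed stand-in metric for this sketch.
    """
    field_counts = {f: Counter(toks) for f, toks in tokens_by_field.items()}
    word_totals = Counter()
    for counts in field_counts.values():
        word_totals.update(counts)
    total = sum(word_totals.values())

    scores = {}
    for field, counts in field_counts.items():
        p_field = sum(counts.values()) / total
        scores[field] = {}
        for word, n in counts.items():
            p_joint = n / total                # P(word, field)
            p_word = word_totals[word] / total  # P(word)
            if p_joint == 1.0:
                # Degenerate corpus: one field containing one word type.
                scores[field][word] = 1.0
                continue
            pmi = math.log(p_joint / (p_word * p_field))
            # Dividing by -log P(word, field) bounds the score to [-1, 1].
            scores[field][word] = pmi / -math.log(p_joint)
    return scores


# Toy data, invented for illustration only.
toy = {
    "optoelectronics": ["diode", "junction", "bias", "the"],
    "sociology": ["bias", "the", "the", "survey"],
}
toy_scores = type_npmi(toy)
```

In this toy corpus, `diode` scores higher for optoelectronics than `bias`, which appears in both fields, and `the` scores lowest. A type-level score like this cannot tell apart the different meanings of `bias`, which is why senses need to be scored separately.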

Examples of discipline-specific word types. The disciplines listed are international trade, natural language processing, immunology, operating system, agronomy, chromatography, industrial organization, computer network, telecommunications, dentistry, and horticulture. The jargon includes wto, trade, fdi, ftas, antidumping, nlp, corpora, treebank, disambiguation, corpus, treg, cd4, immune, il, th2, kernel, performance, network, and root.
Examples of discipline-specific word types (above), and discipline-specific word senses (below). Can you figure out what overloaded meanings the words in the bottom half have across their two disciplines? See the full tables in our paper to check if your intuition is right!

We measure jargon in English abstracts across three hundred fields of study from the Semantic Scholar Open Research Corpus (S2ORC). We find that while the biological sciences use very distinctive word types, such as names of molecules and chemicals, subfields in math, technology, physics, and economics tend to reuse existing words with specialized meanings. For example, mathematicians repurpose common words such as power, pole, union, surface, and origin.

We connect these measurements of scholarly jargon to two key social implications, to showcase the utility of our metrics for “science of science” research and computational sociolinguistics, which is the study of how social factors relate to language.

The x-axis is the index, or where we are in the abstract, from the beginning at 0 to the 100th word. The y-axis is the average maximum “jargony-ness” of the word at that index. The gap between different journal types is larger for abstracts in engineering and computer science than for those in medicine and biology.

First, we measure audience design, or whether scholars decrease their use of jargon depending on who they write for. We find that most fields reduce jargon when publishing in general-purpose, multidisciplinary journals such as Nature, but some fields do so more than others. For example, in the above figure, computer science adjusts its published content based on venue more so than medicine and biology do. A possible explanation for this behavior is that general-purpose venues have a history of being led and dominated by biological and physical sciences.¹ So, though “general-purpose” venues may intend to be for all of science,² some fields are expected to adapt their language more so than others.
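The curves in the figure above can be approximated with a few lines of code. The sketch below assumes each token in an abstract already has a type score and a sense score, and it reads "maximum" as the larger of those two scores for that token. That reading is an assumption, and `positional_jargon_profile` is a hypothetical helper, not code from the paper.

```python
def positional_jargon_profile(abstracts, max_len=100):
    """Average per-position jargon score over a set of abstracts.

    abstracts is a list of abstracts; each abstract is a list of
    (type_score, sense_score) pairs, one pair per token.
    Returns a list of length max_len whose entry i is the mean of
    max(type_score, sense_score) over all abstracts that have a
    token at position i, or None if no abstract is that long.
    """
    sums = [0.0] * max_len
    counts = [0] * max_len
    for tokens in abstracts:
        for i, (type_score, sense_score) in enumerate(tokens[:max_len]):
            sums[i] += max(type_score, sense_score)
            counts[i] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

Computing one profile for abstracts in general-purpose venues and another for abstracts in discipline-specific venues, within the same field, would give the two curves whose gap the figure compares.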

A table whose columns “types” and “senses” show regression coefficients for the fractions of discipline-specific words or senses in abstracts. The dependent variables are citation count and interdisciplinary impact. Significantly negative coefficients are highlighted, and “# obv.” is the number of observations. The magnitudes of the coefficients are not comparable across rows, since each row is a separate regression. “Bonferroni correction” refers to a type of statistical correction to account for multiple comparisons.

Second, we examine how discipline-specific language is associated with two distinct measures of scientific success: citation counts and interdisciplinary impact. Interdisciplinary impact measures the diversity of fields that cite a paper. We ran separate regression models for each field, to see how the relationship between jargon and success may differ across them. Although the direction of correlation between jargon and citation rates varies, jargon is nearly always negatively correlated with interdisciplinary impact.³
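To make "diversity of fields that cite a paper" concrete, here is a minimal sketch that uses Shannon entropy over the fields of citing papers. Entropy is a simple stand-in chosen for this example and may differ from the interdisciplinarity measure used in the paper; `citing_field_entropy` and the field labels are hypothetical.

```python
import math
from collections import Counter


def citing_field_entropy(citing_fields):
    """Shannon entropy (in nats) of the fields that cite one paper.

    citing_fields is a list with one field label per citing paper.
    Returns 0.0 when there are no citations or when every citation
    comes from the same field; larger values mean the citations are
    spread more evenly over more fields.
    """
    counts = Counter(citing_fields)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(-(n / total) * math.log(n / total) for n in counts.values())


# Invented examples: same citation count, different spread of fields.
narrow = citing_field_entropy(["physics", "physics", "physics", "physics"])
broad = citing_field_entropy(["physics", "biology", "economics", "physics"])
```

Both toy papers have four citations, but only the second reaches multiple fields, so it gets the higher diversity score. This is the sense in which citation count and interdisciplinary impact are distinct outcomes.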

Combined, our findings suggest that though some fields do not reduce their use of jargon as much as others in general-purpose venues, this practice may impede interdisciplinary communication. This opens a potential opportunity for the reconsideration of abstract writing norms, especially for venues that intend to bridge disciplines.

[1] PLOS One’s founding letter and Nature’s initial launch of Scientific Reports are two examples of general-purpose venues’ origins.

[2] For example, see Nature’s “Aim and Scope”.

[3] Our study is not causal, but provides a path forward for future studies around the effects of jargon on interdisciplinary connections.

