By training a machine learning model on the abstracts of previously published journal articles and live presentations, UDig designed a recommendation system that automates the matching of peer reviewers for the American Geophysical Union.

Challenges

Scientists from around the world submit articles for publication by the American Geophysical Union (AGU). Each article must first pass peer review, but selecting reviewers for submitted content relied heavily on manual effort to identify suitable authors. As a result, the pool of scientists most often chosen as peer reviewers narrowed, and certain socioeconomic groups became overrepresented.

By using the methodology developed by UDig, AGU ensures an equitable distribution of peer reviewers, with representation across many demographics.

Solution

Using the table of abstracts from previously published journal articles and live presentations, UDig designed an NLP-backed recommendation system. The NLP portion consisted of two models: a term frequency-inverse document frequency (TF-IDF) model and a Doc2Vec model. TF-IDF is a weighting scheme used in information retrieval to reflect how relevant a term is to a particular document. A word that occurs many times within a document receives more weight, since it is likely meaningful to that document. At the same time, a word that also occurs frequently across the other documents in the corpus receives less weight, because it is probably just a common word, such as the stopwords “the” or “for”.
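The weighting described above can be sketched in a few lines of plain Python. This is a minimal illustration of the textbook formula (term frequency multiplied by log(N / document frequency)), not UDig's implementation; the sample abstracts are invented, and whitespace splitting stands in for real text normalization.

```python
import math
from collections import Counter


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per tokenized document.

    tf  = count of the term in the document / number of tokens in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        weights.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Invented example abstracts (not AGU data).
abstracts = [
    "the seismic wave velocity of the mantle",
    "the ocean heat content and the climate",
    "the seismic hazard for the coastal city",
]
w = tfidf([a.split() for a in abstracts])

print(w[0]["the"])                 # → 0.0   ("the" is in every abstract, so idf = log(3/3) = 0)
print(round(w[0]["seismic"], 3))   # → 0.058 (in 2 of 3 abstracts, modest weight)
print(round(w[0]["mantle"], 3))    # → 0.157 (unique to this abstract, highest weight)
```

Libraries such as scikit-learn's `TfidfVectorizer` use a smoothed idf and L2-normalize each vector by default, so their exact numbers differ from this bare formula, but the ordering effect is the same: common words are pushed down and distinctive words are pushed up.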

Doc2Vec converts variable-length text, from sentences to entire documents, into fixed-length numerical vectors that capture semantic information and some word-order context. In our Doc2Vec model, the abstracts formed the training corpus, and each abstract was tagged with its abstract ID, which represents the article's associated authors.

After text normalization, the modeling phase began: hyperparameter tuning, training, and evaluation of results. Both the Doc2Vec and TF-IDF models compute the similarity between a target abstract and the rest of the corpus, and the abstract with the highest similarity score became our recommendation. We then randomly selected 20 target abstracts and produced 40 recommendations in total: one from TF-IDF and one from Doc2Vec for each target abstract.
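The recommendation step can be sketched for the TF-IDF side as follows: build a vector per abstract, score every other abstract against the target with cosine similarity, and keep the highest-scoring one. This is a hedged, self-contained illustration rather than UDig's code. The abstract IDs, texts, and the `authors_by_abstract` lookup are hypothetical, excluding the target from its own candidates is an assumption, and the Doc2Vec model is not reproduced here.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Map each abstract ID to a sparse {term: tf-idf weight} vector."""
    n_docs = len(docs)
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    vectors = {}
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        vectors[doc_id] = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(target_id: str, vectors: dict[str, dict[str, float]]) -> str:
    """Return the ID of the abstract most similar to the target (excluding itself)."""
    target = vectors[target_id]
    scores = {
        other_id: cosine(target, vec)
        for other_id, vec in vectors.items()
        if other_id != target_id
    }
    return max(scores, key=scores.get)


# Hypothetical sample data: IDs, abstract text, and author names are invented.
abstracts = {
    "A1": "seismic wave velocity in the upper mantle",
    "A2": "mantle convection and seismic wave anisotropy",
    "A3": "ocean heat content and sea level rise",
    "A4": "sea level rise from ocean warming and ice melt",
}
authors_by_abstract = {
    "A1": ["Author W"], "A2": ["Author X"], "A3": ["Author Y"], "A4": ["Author Z"],
}

vectors = tfidf_vectors({doc_id: text.split() for doc_id, text in abstracts.items()})

# Randomly pick target abstracts (the case study used 20; this toy corpus has only 4).
targets = random.Random(0).sample(sorted(abstracts), k=2)
for target_id in targets:
    match = recommend(target_id, vectors)
    # The authors of the best-matching abstract become the candidate reviewers.
    print(target_id, "->", match, authors_by_abstract[match])
```

Keying everything by abstract ID keeps the path from a recommendation to its authors a single dictionary lookup. The selection rule (rank the corpus by similarity, keep the top match) stays the same whichever model produces the vectors; only the vector representation changes.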

AGU then had 21 reviewers assess the recommendations for relevance. The feedback clearly showed that the TF-IDF model outperformed the Doc2Vec model.

Technology Used

  • Python
  • AWS
  • Postgres