
Peer Recommendation Engine Designed for American Geophysical Union

By training a machine learning algorithm on the abstracts of previously published journal articles and live presentations, UDig designed a recommendation system to automate the matching of peer reviewers for the American Geophysical Union (AGU).

How We Went from Ideas to Impact

  • By using the methodology developed by UDig, AGU ensures an equitable distribution of Peer Reviewers with representation across many demographics.

The Idea

Scientists from around the world submit articles to be published by the American Geophysical Union (AGU). Each article must first survive peer review, but the process of selecting reviewers relied heavily on manual effort to find appropriate candidates. As a result, the pool of scientists and authors most often selected to provide peer reviews narrowed over time, leading to an overrepresentation of certain socioeconomic groups.

The Impact

Using the abstracts from previously published journal articles and live presentations, UDig designed an NLP-backed recommendation system. The NLP portion consisted of a term frequency-inverse document frequency (TF-IDF) model and a Doc2Vec model. TF-IDF is a measure used in information retrieval; it is intended to reflect a term's relevance within a particular document. The idea behind TF-IDF is to assign importance to a word that occurs multiple times within a document, since that word is likely meaningful there. At the same time, if the word also occurs frequently across the other documents in the corpus, it is assigned less weight, as it may simply be a common word such as the stopwords “the” or “for”.
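To make the TF-IDF weighting concrete, here is a minimal pure-Python sketch (not AGU's production code; the toy abstracts are invented for illustration). Each document's weight for a term is its term frequency multiplied by the log of the inverse document frequency, so corpus-wide words like “the” score zero:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute TF-IDF weight vectors for a small corpus of tokenized documents."""
    n = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Weight = term frequency * inverse document frequency.
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

corpus = [
    "seismic waves travel through the mantle".split(),
    "ocean waves transfer heat to the atmosphere".split(),
    "the mantle convects over geologic time".split(),
]
vecs = tfidf_vectors(corpus)
# "the" appears in every document, so its IDF is log(3/3) = 0.
assert vecs[0]["the"] == 0.0
# "seismic" appears only in document 0, so it carries positive weight there.
assert vecs[0]["seismic"] > 0
```

In practice a library such as scikit-learn's `TfidfVectorizer` would handle tokenization, smoothing, and normalization, but the weighting intuition is the same.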

Doc2Vec’s purpose is to convert words or entire documents into numerical representations, preserving the order and semantic information of arbitrarily sized text. In our Doc2Vec model, we used the abstract as the text corpus and the abstract ID to represent the article’s associated authors. After text normalization, the modeling phase began, consisting of hyperparameter tuning, training, and result evaluation. Both the Doc2Vec and TF-IDF models compute similarity between the target document and the corpus; the abstract with the highest similarity score output by each model represents our recommendation. Next, we randomly selected 20 target abstracts for recommendations and output 40 total recommendations: one from TF-IDF and one from Doc2Vec for each target abstract.
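The final ranking step is the same for either model: score every corpus abstract against the target and recommend the most similar one. A hedged sketch of that step, using cosine similarity over sparse weight vectors (the vectors and abstract labels below are hypothetical, not AGU data):

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend(target_vec, corpus_vecs):
    """Return the index of the corpus abstract most similar to the target."""
    scores = [cosine(target_vec, v) for v in corpus_vecs]
    return max(range(len(scores)), key=scores.__getitem__)

target = {"mantle": 0.8, "convection": 0.5}
corpus = [
    {"ocean": 0.9, "heat": 0.4},         # hypothetical abstract 0
    {"mantle": 0.7, "plume": 0.3},       # hypothetical abstract 1
    {"atmosphere": 0.6, "carbon": 0.5},  # hypothetical abstract 2
]
assert recommend(target, corpus) == 1
```

The recommended abstract's ID then points back to its authors as candidate reviewers. A Doc2Vec implementation (e.g. gensim's `Doc2Vec` with `TaggedDocument`) would produce dense vectors rather than sparse ones, but the argmax-over-similarity ranking is identical.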

AGU then had 21 different reviewers analyze the recommendations for relevance. The feedback was clear that TF-IDF outperformed the Doc2Vec model. By using the methodology developed by UDig, AGU ensures an equitable distribution of Peer Reviewers with representation across many demographics.

  • How We Did It
    Automated Taxonomy Creation, Recommendation Engines
  • Tech Stack
    Python, AWS, Postgres
