Developing a Common Dataset: A Necessary Challenge for Associations 

At UDig we have the pleasure of working closely with an association in the healthcare space. Like many associations, their small staff works hard for a cause they passionately believe in. Perhaps none work harder than their Data QA and Analytics team, who tirelessly and repetitively fix data elements coming from their member programs to form a cohesive view of their world. Day in and day out, the data team works diligently to compare “apples to apples,” despite being handed a gigantic “bowl of fruit.”

They are a visionary organization, however, and set about changing that paradigm. Knowing they needed to reduce the effort required to compare data elements while enriching the dataset to allow for more robust measurements, the team began by working closely with their members, Electronic Health Records vendors, state oversight representatives, and other stakeholders to develop a common dataset: a single language they could all speak, one that would not only have functional impact (i.e., less time spent crunching data) but also provide crucial insight into their offerings and translate that knowledge into improved outcomes for patients and providers alike.

Anyone who has ever developed a common dataset (or even a data dictionary for a single entity, for that matter) is probably thinking, “that’s way easier said than done!” You would be correct. Our client’s journey began five years ago, and the road ahead remains rocky, covered in fog, and full of other traveling metaphors that convey the difficulty of shifting to a common dataset. The development of the dataset faces numerous hurdles: from “data fatigue” (as one member described the numerous regulatory and other bodies to which they submit data) to shifting definitions of data elements set by governing bodies outside of our client’s control.

Still, if a Common Dataset isn’t attempted now, then when? To quote their CEO, “Let’s not let the great be the enemy of the good.” In a perfect world, everyone who touches this dataset would speak the same language, and the heroic effort of data analysts could instead go toward forward-thinking work: identifying trends and looking for operational efficiencies that can impact the bottom line. In reality, the initial rollout will likely be met with resistance at many different levels until the value is realized (or at least the vision is shared).

How, then, does an organization even set about developing a common dataset? 

  • Begin by building consensus on the need for the dataset. Identify and quantify the value such an effort would produce.
  • Next, identify the set of crucial data required to make the effort viable. Think carefully about what data is “nice to have” versus an absolute requirement, particularly since excess data will increase the complexity of data cleanup and integration.
  • Now that the set of data has been identified, work to clearly articulate how that data should look. This should take the form of both a business glossary (i.e., plain-English definitions of the data*) and a functional data dictionary (i.e., a set of technical requirements clearly indicating acceptable parameters, values, etc.); a brief sketch of such a dictionary follows below.
  • Finally, socialize the dataset and iterate on what was developed.

*Don’t bite off more than you can chew here! In my experience, this is the most difficult phase of any data project: I’ve seen people nearly come to blows over the definition of a particular data attribute. Compromise is your friend here, and a neutral facilitator can greatly improve the experience for all involved!
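
To make that glossary-versus-dictionary distinction concrete, here is a minimal sketch of what a few functional data dictionary entries might look like in code. The field names, types, and allowed values below are purely hypothetical illustrations rather than our client’s actual definitions; the point is simply that each element pairs a plain-English description with machine-checkable rules.

```python
# A minimal, hypothetical data dictionary. Field names, types, and allowed
# values are illustrative only, not an actual client definition.
DATA_DICTIONARY = {
    "patient_age": {
        "description": "Patient age in whole years at time of service.",
        "type": int,
        "min": 0,
        "max": 120,
        "required": True,
    },
    "visit_type": {
        "description": "Category of the member program visit.",
        "type": str,
        "allowed_values": {"primary_care", "behavioral_health", "dental"},
        "required": True,
    },
    "referral_source": {
        "description": "Where the patient was referred from, if known.",
        "type": str,
        "required": False,
    },
}
```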

Now comes the fun part: the technical implementation of the Common Dataset capture and analytics mechanisms. This could be, of course, the subject of numerous blogs, white papers, late-night phone calls, and intense hand-wringing sessions. It’s worth noting that how smoothly this phase goes depends heavily on the strength of the previous steps: well-defined data is much easier to work with. Working with vendors to automate the data acquisition process will save countless hours spent “hand jamming” the data, while the data definitions mentioned above will inform a robust quality process that keeps junk data out of your analytics.
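
As one illustration of how those definitions can drive a quality process, the sketch below checks incoming records against a dictionary shaped like the hypothetical one above and routes anything suspect to a review queue instead of the analytics layer. It is a simplified example under assumed field rules, not a production pipeline.

```python
def validate_record(record: dict, dictionary: dict) -> list[str]:
    """Return a list of human-readable problems found in a single record."""
    problems = []
    for field, rules in dictionary.items():
        value = record.get(field)
        if value is None:
            if rules.get("required"):
                problems.append(f"missing required field '{field}'")
            continue
        if not isinstance(value, rules["type"]):
            problems.append(f"'{field}' should be {rules['type'].__name__}")
            continue
        if "allowed_values" in rules and value not in rules["allowed_values"]:
            problems.append(f"'{field}' has unexpected value '{value}'")
        if "min" in rules and value < rules["min"]:
            problems.append(f"'{field}' is below the minimum of {rules['min']}")
        if "max" in rules and value > rules["max"]:
            problems.append(f"'{field}' is above the maximum of {rules['max']}")
    return problems


def triage(records: list[dict], dictionary: dict):
    """Split incoming records into clean rows and rows needing human review."""
    clean, needs_review = [], []
    for record in records:
        issues = validate_record(record, dictionary)
        if issues:
            needs_review.append((record, issues))
        else:
            clean.append(record)
    return clean, needs_review
```

Plugged into an automated acquisition feed, a check like this surfaces problem records to the data team for correction rather than letting them quietly distort downstream measurements.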

We’ve now spoken with countless associations who are in much the same place. They know they need to compare apples to apples, but their members are all enjoying dramatically different kinds of fruit. Finding common ground will not only benefit all of their members but also provide faster, more meaningful results. Your data isn’t getting any smaller; more and more is generated every single day. Standardizing a set of data now can reduce manual effort, improve data quality, provide insights, and optimize processes that affect the bottom line.