ETL Data Prep: Spring Cleaning For Data

Digging In on Dataverse by Lavastorm

The outcome of any analytics project is limited by the quality of data available to the organization. Most companies are familiar with data quality issues and how significantly they can hinder efforts to make data-driven decisions. Companies that can implement robust data management practices will have a competitive advantage in every industry as they reap the benefits of trusted, reliable data.

Key points about data quality:

So you get it: data quality is IMPORTANT. But what can you do today to start tackling data quality challenges in your organization? One practical first step is to find a self-service ETL data prep tool that works for you. There are countless options out there, but today I’m writing about Dataverse by Lavastorm.

Dataverse by Lavastorm

Dataverse is a web-based desktop application designed for data processing, integration, and analytics. It can import data from many standard sources (Excel, databases, SharePoint, XML, MongoDB, and others) and export the processed data to several formats.

Dataverse, like many other ETL products, comes in a basic free version and a paid version with enhanced functionality. Compared with other freeware, the free version is extensive and may be sufficient for many use cases across a wide range of industries. Dataverse’s simple interface lets users visualize data transformations quickly and easily. There are hundreds of built-in functions, and users can define their own custom functions as well. You can build data flows in Dataverse that help you identify and correct errors in your data, such as duplicate records, incorrect date formats, and missing fields.
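To make those three error types concrete, here is a minimal sketch of the same checks in plain Python. It is not how Dataverse implements them, and it does not use any Dataverse API; the sample data, the column names (customer_id, signup_date, email), and the list of accepted date formats are all assumptions made up for illustration.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample data; the column names are assumptions for illustration.
RAW = """customer_id,signup_date,email
101,2017-03-01,a@example.com
102,03/15/2017,b@example.com
101,2017-03-01,a@example.com
103,2017-04-02,
"""

# Date formats we are willing to accept (an assumption, adjust to your data).
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def normalize_date(value):
    """Return the date as ISO YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean(rows):
    """Drop exact duplicates, normalize dates, and log rows with problems."""
    seen = set()
    cleaned, issues = [], []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        key = tuple(row.values())
        if key in seen:
            issues.append((line_no, "duplicate record"))
            continue  # keep only the first copy
        seen.add(key)

        # Flag empty fields but keep the row so someone can review it.
        missing = [name for name, value in row.items() if not (value or "").strip()]
        if missing:
            issues.append((line_no, "missing: " + ", ".join(missing)))

        iso = normalize_date(row["signup_date"] or "")
        if iso is None:
            issues.append((line_no, "unrecognized date"))
        else:
            row = dict(row, signup_date=iso)
        cleaned.append(row)
    return cleaned, issues


cleaned, issues = clean(csv.DictReader(io.StringIO(RAW)))
for line_no, problem in issues:
    print(f"line {line_no}: {problem}")
```

The sketch keeps rows with missing fields and only reports them, since whether to drop, default, or repair such rows is a business decision rather than a technical one.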

One of the major limitations of the Dataverse freeware is a cap on the number of rows that can be processed at one time: the maximum is 2 million rows. The paid versions of Dataverse offer unlimited rows as well as enhancements like security integration, API support, and automation.
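If you are not sure whether a file fits under that cap, you can count its rows before loading it. The snippet below is a rough sketch for CSV input; the helper names are hypothetical, and it assumes the header row does not count toward the limit, which the Dataverse documentation would need to confirm.

```python
import csv
import io

FREE_TIER_ROW_CAP = 2_000_000  # the cap quoted above


def count_data_rows(handle):
    """Count CSV records in an open text file, excluding the header row."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header (assumed not to count toward the cap)
    return sum(1 for _ in reader)


def fits_free_tier(handle, cap=FREE_TIER_ROW_CAP):
    """Return True if the file's data rows are within the given cap."""
    return count_data_rows(handle) <= cap


# Small in-memory example; for a real file, pass open(path, newline="").
sample = io.StringIO("id,name\n1,Ann\n2,Bob\n")
row_count = count_data_rows(sample)
print(row_count)
```

Using csv.reader rather than counting raw lines means a quoted field that contains a line break is still counted as one record.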

Tips to get started with Dataverse freeware:

  • Product available for download HERE
  • Simple tutorial videos posted by Dataverse HERE
  • Make sure you adhere to the technical setup requirements listed on the download site or you will experience reduced performance
  • Application includes a thorough embedded help directory; additional resources can be accessed on their community page HERE

Alternative technologies to Dataverse (list is not comprehensive):

  • Talend Open Studio, free (very limited capabilities compared to paid version)
  • CloverETL Community, free (very limited capabilities compared to paid version)
  • Pentaho
  • Informatica
  • SSIS
  • Oracle Data Integrator
  • IBM InfoSphere DataStage

Data quality is just one piece of a modern data management strategy. Data management challenges can be daunting and hard to fix on your own. Fortunately, you don’t have to go it alone. UDig is here to help. With expertise in Data Governance, Data Integration, Data Architecture, and BI & Analytics, we’re ready to back you up. Contact us to set up a conversation and see how we can help!

About The Author

Michelle Pegler is a Senior Consultant on the Data team.