Your Privacy

This site uses cookies to enhance your browsing experience and deliver personalized content. By continuing to use this site, you consent to our use of cookies.
COOKIE POLICY

Conquering New Data Sources | SQL Databases

Conquering New Data Sources | SQL Databases
Back to insights

Data analysts are sometimes challenged to quickly understand and discover valuable insights from new data sets or data sources. This can be an overwhelming task, especially if there’s no metadata documentation available and/or no subject-matter experts available to help. When business leaders and executives are looking for fast answers to their questions, you need to be ready to respond with high quality BI and analytics. Here are some best practices to quickly familiarize yourself with uncharted data sources in SQL databases.

Explore the Schema

You’ll often need to begin by selecting sample data from many different tables. This will help you get an initial idea of what information is available and how/where it’s stored in the database. Make note of the tables that seem to hold data critical to the analysis you want to perform. Run COUNT and COUNT DISTINCT functions on all critical tables to discover which column(s) define a unique row in that table. For these critical tables, identify which columns seem most pertinent to your analysis and which columns might link up with other tables of interest. Innovative business insights are usually derived from data in many tables across the database, so table joins are important to determine at this stage.

Run Simple Statistics on Critical Tables

When you’ve identified the tables containing data critical to your analysis, you should explore those tables further with some high-level statistics. With sales data, for example, you might want to look at daily/monthly/yearly totals for sales revenue, number of customers, etc. You might look at the breakdowns by customer type, market region, product SKU, etc. Use the MIN and MAX functions to see the range of values in critical numeric or date columns. This will help you identify data that seems out of bounds and should potentially be excluded from your analysis (e.g. dates in the future, default dates like 1/1/1900, negative or blank revenue values).

Check Findings with Accepted Business Knowledge

Once you have an idea of where the critical data is stored and some high-level summary statistics about that data, it’s a good time to compare your findings with accepted business knowledge. Do the statistics you’re pulling align with published reports within the organization? If not, can you speak with the owners of those reports to understand the discrepancies? Is there a reporting application available that can help you check the summary statistics or even validate column definitions/values on individual records?

Document What You’ve Learned

It’s always a good idea to take notes as you go through this data exploration. As you complete the initial exploration, you may want to add structure and details to your notes so they can serve as good reference documents for you in the future. More than likely, you’ll end up sharing your documentation with others in your company who need to access and utilize the same data. Standard reference documentation is an important part of data management and governance that can help ensure everyone is interpreting and using the data in a consistent way.

Start your Analysis

Now that you have a better understanding of the available data, you are ready to begin your actual analysis! It’s important to continue to tread lightly at this point since you aren’t truly an expert on this data yet. When you believe you’ve discovered important insights, it’s prudent to run your findings by a knowledgeable peer for a sanity check before sharing more broadly. If the peer uncovers any incorrect logic or assumptions in your analysis, make sure you update your metadata documentation appropriately.

 

Digging In

  • Data & Analytics

    Unlocking the Full Potential of a Customer 360: A Comprehensive Guide

    In today’s fast-paced digital economy, understanding your customer has never been more critical. The concept of a customer 360 view has emerged as a revolutionary approach to gaining a comprehensive understanding of consumers by integrating data from different touchpoints to offer a holistic view.  A customer 360 view is about taking an overarching approach to […]

  • Data & Analytics

    Microsoft Fabric: A New Unified Data Platform

    MicroPopular data services and tools often specialize in specific aspects of the data analytics pipeline, serving teams in the data lifecycle. For instance, Snowflake addresses large-scale data warehousing challenges, while Databricks focuses on data engineering and science. Power BI and Tableau have become standard tools for business intelligence tasks. So, where does Microsoft Fabric create […]

  • Data & Analytics

    Improve Member Experience: Maximize Engagement & Value for Associations

    As you know, member engagement is key to providing value and retaining members over time. However, you must also recognize that member needs and preferences are evolving rapidly, especially as they desire more seamless digital experiences. Additionally, member expectations for personalized, omnichannel interactions have risen in recent years, and this means that associations must strategically […]

  • Data & Analytics

    A Guide to Data Strategy Success in Your Association

    While countless organizations aim to harness the potential of data, few possess a clear strategy to transform raw information into actionable insights that fuel their operations and marketing efforts. Don’t fall into the trap of investing in limited, tactical solutions.

  • Data & Analytics

    ChatGPT & Your Data Strategy – Revolution or Evolution?

    You would be hard-pressed to find a single person who was not some degree of impressed when they first tried out ChatGPT. After its public release, the conversation in the tech space seemingly changed overnight about how AI would change everything. But much like past hot topics in the tech world – such as the […]

  • Data & Analytics

    Revamping Data Pipeline Infrastructure to Increase Owner Satisfaction at Twiddy

    In an ever-evolving technological landscape, embracing new methodologies is vital for enhancing efficiency. Our data and analytics interns recently undertook a significant overhaul of one of Twiddy’s data pipeline infrastructures, implementing Airbyte pipelines with Kestra orchestration to replace an existing Java application. Motivated by several challenges with the previous system, most importantly a complete loss […]