
Jump-start your Cloud Migration: Matillion Data Loader


As more organizations migrate to the cloud from legacy and traditional databases and warehouses, managers want to know the best route: how do we modernize our data quickly and effectively, in a manner that balances ease of use and cost while keeping data structures familiar for end users (analysts and developers)?

Lucky for us, Matillion recently launched Matillion Data Loader. Matillion is a cloud-based, drag-and-drop ELT application, and Data Loader is its solution for streamlined (easy and low-cost) data pipelines for cloud migration.

Matillion Data Loader will allow us to migrate an existing MySQL database and its structure to our Snowflake warehouse. This is called the “lift and shift” approach, and though we may only need the E (extract) and L (load) of ELT for now, we want to use an application (or set of applications) that provides flexibility in case we want to leverage the cloud for our transformations in the future.

To show just how easy it is, we’ll walk through an example of configuring a pipeline. In our example, we will migrate data from an existing MySQL database, ultimately landing our data in a Snowflake environment.

In order to build our pipeline, we’ll first need to sign up for an account. 

After signing up and logging into the system, you’ll see a mostly blank screen with directions to add a pipeline.

Matillion Data Loader - Pipeline

In order to start leveraging a pipeline, you’ll need to configure source and destination data sources. In our case, this will be MySQL to Snowflake. We’re not limited to MySQL; Matillion Data Loader comes with a wide range of source databases and APIs. Destinations are currently available for the Amazon Redshift, Google BigQuery, and Snowflake cloud data warehouses.

Matillion Data Loader Screenshots

During configuration of the destination data source, you’ll need the following information:

  1. Account: You can find this information via the URL in your browser window while logged into Snowflake. Make sure to ONLY include the information after https:// and before snowflakecomputing.com. For example, if my URL reads https://my-account.us-east-1.snowflakecomputing.com, then I would simply put my-account.us-east-1 in this section.
  2. Username: If possible, I would suggest creating a new user dedicated to Matillion, as this makes it easier to track usage and compute credits (see the sketch after this list).
  3. Password Type: Your organization may have security requirements to use private keys over a password; both are available options.
  4. Password: You’ll add this via the Manage button, as shown below.
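If you do create a dedicated user, the Snowflake setup can be as simple as the following sketch. MATILLION_ROLE, MATILLION_USER, and the placeholder password are hypothetical names used for illustration, not values required by Matillion:

    -- Hypothetical names; substitute your own role, user, and password.
    CREATE ROLE IF NOT EXISTS MATILLION_ROLE;
    CREATE USER IF NOT EXISTS MATILLION_USER
      PASSWORD = '<strong-password-here>'
      DEFAULT_ROLE = MATILLION_ROLE;
    GRANT ROLE MATILLION_ROLE TO USER MATILLION_USER;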

Matillion Data Loader - Destination
You will need to add a password via the Manage button. Doing so will make adding future pipelines more straightforward, eliminating the need to enter your password each time the pipeline runs.

Matillion Data Loader - Destination 2

Next, you’ll need to identify the appropriate role, warehouse, database, and schema. I recommend testing to ensure connectivity before moving on.
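Whichever role you choose will need the usual usage grants on those objects. A minimal sketch, reusing the hypothetical MATILLION_ROLE from above with placeholder warehouse, database, and schema names:

    -- Placeholder object names; substitute your own.
    GRANT USAGE ON WAREHOUSE LOAD_WH TO ROLE MATILLION_ROLE;
    GRANT USAGE ON DATABASE ANALYTICS_DB TO ROLE MATILLION_ROLE;
    GRANT USAGE, CREATE TABLE ON SCHEMA ANALYTICS_DB.PUBLIC TO ROLE MATILLION_ROLE;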

Congrats! You’ve now configured your destination. Next, we’ll move on to our source, which is a similar process.

When navigating through the source configuration, you’ll need the following information: 

  1. JDBC Connection String / URL and database (e.g., jdbc:mysql://mysqlhoststring/database)
  2. Username (ideally a dedicated, read-only user; see the sketch after this list)
  3. Password (add a new password via the Manage button)
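On the MySQL side, giving Matillion its own read-only credentials keeps the extraction from ever modifying source data. A minimal sketch, with a hypothetical user name and the database name from the JDBC example:

    -- Hypothetical read-only user; tighten the '%' host pattern to fit your network.
    CREATE USER 'matillion'@'%' IDENTIFIED BY '<strong-password-here>';
    GRANT SELECT ON database.* TO 'matillion'@'%';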

Click Next until you see the table-selection menu, which will allow you to decide which tables you’d like to load data from.

After selecting your tables, you will have the option of defining field types as well as choosing a column for each table to serve as an incremental field. Each table is limited to a single incremental field. Incremental fields allow for faster data loading, as only the newest rows are migrated instead of performing full table refreshes.
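Conceptually, an incremental field works as a high-water mark: each run pulls only the rows whose value exceeds the maximum seen on the previous run. As a rough illustration (not Matillion’s actual generated SQL), assuming a hypothetical orders table with an updated_at column:

    -- Illustrative only: fetch rows past the high-water mark from the previous run.
    SELECT *
    FROM orders
    WHERE updated_at > '2020-01-01 12:34:56';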

Click Next, and define the database, schema and warehouse (for compute purposes) you want your pipeline to use.  You’ll need to do this twice—once for staging (where data will live temporarily as Matillion loads the data incrementally) and then again as your target (where data will ultimately land for analysis). 
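If you’d like to keep those two areas clearly separated, one simple arrangement is a dedicated staging schema alongside the target schema. The names below are hypothetical, for illustration only:

    -- Hypothetical layout: one schema for Matillion's temporary staging tables,
    -- one for the final tables analysts will query.
    CREATE SCHEMA IF NOT EXISTS ANALYTICS_DB.MATILLION_STAGING;
    CREATE SCHEMA IF NOT EXISTS ANALYTICS_DB.REPORTING;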

Finally, you’ll be presented with scheduling options—configure these as needed but note that each run will leverage compute (and will consume Snowflake usage credits).  Additionally, you can choose to receive failure notices should a pipeline fail. 
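Since every scheduled run spins up the warehouse you selected, it’s worth keeping an eye on consumption. One way to check is via Snowflake’s ACCOUNT_USAGE share; the warehouse name below is a placeholder:

    -- Credits consumed by the load warehouse over the past week
    -- (ACCOUNT_USAGE views can lag real time by a few hours).
    SELECT START_TIME, WAREHOUSE_NAME, CREDITS_USED
    FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
    WHERE WAREHOUSE_NAME = 'LOAD_WH'
      AND START_TIME >= DATEADD('day', -7, CURRENT_TIMESTAMP())
    ORDER BY START_TIME DESC;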

Congratulations! You’ve now configured your first pipeline! The tool presents you with a view that displays historical run information, the run frequency, and the number of rows migrated. Should you ever need to turn off the pipeline or change its frequency, you can do so using the menu in the top left.

Now that we’ve built our pipeline, we can leverage the power of our Snowflake warehouse to drive our company’s dashboards, analysis, and advanced analytics projects; the possibilities are virtually endless. That said, we’ll need a well-thought-out organizational data strategy as the foundation before doing so.

Reach out to UDig to explore the multitude of cloud tools that allow organizations to get started moving data quickly with minimal cost or support. 
