The Grouparoo Blog
Teams are centralizing their data in their data warehouse by loading data in and transforming it as necessary. Increasingly, we are seeing teams turn to dbt to do this transforming. The idea is to write *.sql files that, when run in the right order, create useful rollup tables or materialized views of the data.
We've been asked by teams using dbt how Grouparoo can then sync their data to their cloud-based apps. This is what some are calling Reverse ETL. Reverse ETL is taking data from the warehouse and writing it back to line-of-business tools.
I've created an example project to show how the tools work together.
Approach
A beautiful thing about dbt is its simplicity. All it takes to apply the SQL transforms is to execute dbt run
. This can be scheduled to run every so often as a cron job.
While Grouparoo can run as an application to continuously sync and serve its web UI, you can also execute grouparoo run
. This will execute everything that needs to sync and exit.
By putting these two commands together, we can transform the data and sync it to the configured tools. In the example project, this means putting ./transform_and_sync
in your cron instead of dbt run
. The right data ends up in the destionation(s) of our choosing.
That's it!
Example project
The example project has instructions to seed users
and purchases
into an test warehouse like Postgres. Then it has a few transforms to roll up customer data into a few materialized views. One of these views is who we should sync to a destination, Mailchimp.
The configuration in the Grouparoo project sets up the sync pipeline:
- declares the source of the data as the warehouse
- notes which columns create properties
- creates segmentation groups like high value Spanish speakers from these properties
- maps the profile data and groups to be Mailchimp contacts and tags
After running the cron job, all 1000 users are in Mailchimp CRM based on the mappings defined. In this case, they have their first names filled out and are tagged with they are a high value Spanish speaker.
dbt plugin
In order to make this work smoothly, we created a dbt plugin for Grouparoo.
Grouparoo configuration files can be JSON and completely declarative, but they can also be any Javascript code. With this in mind, we created dbt helpers to make Grouparoo smarter about how to connect to the data warehouse.
Instead of repeating the credentials, the app knows how to read them from the various ways that dbt can be set up.
const { dbtProfile } = require("@grouparoo/dbt");
exports.default = async function buildConfig() {
// fetch warehouse connection details from parent dbt profile
const { type, options } = await dbtProfile();
return [
{
class: "app",
id: "warehouse",
name: "Warehouse",
type,
options,
},
];
};
This also includes the ability to automatically change environments based on the dbt setup, so if you have different dev
and prod
databases or warehouses, Grouparoo will use the right one using the current dbt profile.
We look forward to expanding on this plugin based on how people use it. Let us know in Github or Slack how else you want to integrate.
Tagged in Engineering Product
See all of Brian Leonard's posts.
Brian is the CEO and co-founder of Grouparoo, an open source data framework that easily connects your data to business tools. Brian is a leader and technologist who enjoys hanging out with his family, traveling, learning new things, and building software that makes people's lives easier.
Learn more about Brian @ https://www.linkedin.com/in/brianl429
Get Started with Grouparoo
Start syncing your data with Grouparoo Cloud
Start Free TrialOr download and try our open source Community edition.