Matomo is the most established open-source analytics platform on the market. It carved out a real alternative to Google Analytics years before the privacy wave made it fashionable. It’s battle-tested, packed with features, and trusted by organizations that take data ownership seriously. If you’re running Matomo today, you probably chose it for good reasons.
But if you’ve operated a Matomo instance at any meaningful scale, you also know that there’s a gap between what Matomo tracks and what you can actually do with that data. The tracking is solid. The infrastructure underneath it is where things get painful.
If you already know the problem and just want the solution: d8a speaks the Matomo protocol natively. Add one line to your existing tracking snippet and your events start flowing into BigQuery, ClickHouse, or any other warehouse via object storage - without touching your Matomo setup.
_paq.push(['addTracker', 'https://d8a.example.com/matomo.php', '<property_id>']);
That’s Matomo’s own addTracker API. No d8a SDK, no changes to your existing tracking code. Full setup and schema details are further below.
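In context, a typical Matomo snippet with the extra line might look like this. The d8a endpoint and `<property_id>` are placeholders, and the `window` shim exists only so the sketch also runs outside a browser; everything else is standard Matomo boilerplate:

```javascript
// Standard Matomo tracking queue with one added line.
// Shim so this sketch runs outside a browser too (in a page, window already exists).
var window = globalThis.window || globalThis;

var _paq = window._paq = window._paq || [];
_paq.push(['trackPageView']);
_paq.push(['enableLinkTracking']);
_paq.push(['setTrackerUrl', 'https://matomo.example.com/matomo.php']); // existing Matomo endpoint
_paq.push(['setSiteId', '1']);

// The one added line: mirror every event to d8a via Matomo's own addTracker API.
// URL and property id are placeholders.
_paq.push(['addTracker', 'https://d8a.example.com/matomo.php', '<property_id>']);
```

Order matters only in the usual Matomo sense: the queue is replayed once matomo.js loads, so the added line can sit anywhere alongside your existing pushes.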
If you want the full picture - why Matomo’s storage layer struggles at scale, what the schema looks like in a warehouse, and how to pick the right backend - read on.
These aren’t design flaws born of carelessness. Matomo was built over many years to be easy to install, and it succeeded at that. But the technology choices that made it accessible also created ceilings that show up once your traffic grows or your data team starts asking harder questions.
Matomo uses MySQL (or MariaDB) as its data store. MySQL is a solid transactional database, but analytics queries are fundamentally different: they scan large ranges of rows, aggregate across time windows, and filter on high-cardinality dimensions. MySQL’s row-oriented storage handles none of those access patterns well.
Matomo works around this by pre-computing reports via an archiving cron job (core:archive). At low volume, this works fine. At 10M+ events per month, the cron can take hours to complete. Until it finishes, dashboards show stale or incomplete data. And any on-the-fly segment that hasn’t been pre-processed bypasses the archive entirely and hits raw log_* tables directly, often timing out or locking the database for other queries.
Matomo’s MySQL schema is a product of years of incremental development without the kind of schema review that keeps things navigable. Data is spread across many linked tables:
- log_visit - one row per visit (session), with 70+ columns
- log_link_visit_action - links visits to individual actions (page views, events, downloads, etc.)
- log_action - a lookup table for action names and URLs, normalized to save storage space
- log_conversion - goal conversions, linked back to visits
- log_conversion_item - individual ecommerce items within conversions
- archive_numeric_* and archive_blob_* - pre-aggregated report data, stored as serialized PHP arrays and partitioned by month

Getting a simple answer like “what were the top landing pages for sessions that included a purchase” requires joining log_visit to log_link_visit_action to log_action to log_conversion, understanding which idaction_* column maps to which action type, and knowing that action names are stored by reference ID rather than as readable strings. Most data teams take one look at this and decide it’s not worth the effort.
The archiving cron is not optional overhead. It’s a core architectural requirement. Matomo’s reporting API reads from the archive_* tables, not from raw logs. If the cron fails, stops, or falls behind, your reports stop updating.
Creating a new custom segment triggers a full historical reprocessing job that can run for hours on large datasets. Deleting the archive tables (sometimes necessary to fix corruption) means re-archiving everything from scratch. The cron becomes a single point of failure that someone on your team has to monitor and maintain.
The initial install is genuinely easy: PHP, MySQL, a web server, and you’re running. But operating Matomo in production adds layers:
Database tuning alone means sizing innodb_buffer_pool_size, monitoring the slow query log, and managing table partitioning (which Matomo doesn’t handle out of the box).

Each piece is manageable on its own. Together, they create a maintenance surface that quietly expands until someone on the team is spending a significant chunk of their week just keeping Matomo healthy. That’s often why organizations migrate to Matomo Cloud. Not because they wanted to give up control, but because the ops burden outgrew their willingness to maintain it.
Matomo has a comprehensive API, but it’s a reporting API, not a raw-data API. It returns pre-aggregated metrics and dimensions, not event-level rows. If you want raw event data for a custom analysis pipeline, BI tool, or ML model, your options are limited: query the MySQL log_* tables directly (with all the schema complexity described above), use a raw data export plugin, or set up a MySQL replication pipeline. None of these are simple, and none give you the data in a shape that’s easy to work with downstream.
None of the above should overshadow what Matomo does well, and it does a lot well.
The UI is genuinely useful. Heatmaps, session recordings, funnels, A/B testing, form analytics, tag manager - all integrated in one interface. For marketing and product teams that need answers without writing SQL, Matomo delivers in a way that most open-source analytics tools simply don’t. That matters, and it’s why many organizations keep Matomo running even when they’ve outgrown parts of it.
Privacy compliance is best-in-class. First-party data collection, a cookieless tracking option, built-in consent management, GDPR and CCPA tooling out of the box. For organizations in regulated industries or privacy-sensitive markets, Matomo has been the go-to choice for good reason.
The tracking is solid. The matomo.js tracker is mature, well-tested, and covers page views, events, ecommerce, site search, content tracking, media, goals, and custom dimensions. The implementation ecosystem runs deep: tag managers, SPA support, server-side tracking, log analytics.
The addTracker API is brilliant. This is Matomo’s own mechanism for sending the same tracking events to multiple endpoints simultaneously. It’s native, documented, and requires zero changes to your existing tracking code. It also happens to be exactly what makes the next section possible.
d8a speaks the Matomo tracking protocol natively. Add a single line to your existing tracking snippet:
_paq.push(['addTracker', 'https://d8a.example.com/matomo.php', '<property_id>']);
Every event your Matomo tracker already fires - page views, custom events, ecommerce orders, goal conversions, site search, content tracking - gets duplicated to d8a automatically. Your Matomo instance keeps working exactly as before.
For the full setup guide, including the complete tracking snippet, see Setting up Matomo as a d8a source.
d8a stores everything in a single events table. Every tracked field, every session-scoped metric, every ecommerce attribute is a dedicated, named column. No joins, no action lookup tables, no serialized PHP blobs.
The contrast with Matomo’s MySQL schema is stark:
- Page data: getting a readable page URL out of Matomo means joining log_visit to log_link_visit_action to log_action and filtering by action type, because action names are stored as integer IDs in a lookup table rather than readable strings. In d8a, page_location, page_title, and page_referrer are plain string columns: SELECT page_location FROM events.
- Ecommerce items: Matomo keeps them in a log_conversion_item table with columns like idaction_sku, idaction_name, and idaction_category that reference integer IDs in log_action. In d8a, ecommerce_items is a structured array column on the event row itself, with sku, name, category_1 through category_5, price, and quantity as named fields.
- Session context: answering session-level questions in Matomo means joining log_link_visit_action back to log_visit and then to log_action to resolve action names. In d8a, session_total_page_views, session_duration, session_source, session_entry_page_location, and dozens of other session-scoped columns are available directly on every event row.

No window functions. No CTEs. No self-joins. If you want the landing page for sessions where users made a purchase, it’s a WHERE clause, not a subquery. Session counters like session_total_purchases and session_total_outbound_clicks are already on every row.
Order-level fields (revenue, tax, shipping, discount) land as typed columns. Items are stored as a structured array on the event row itself, with names and categories fully resolved - no joining back to a lookup table to make the data readable.
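On the tracking side, these are the standard Matomo ecommerce calls. A sketch of an order that would land as typed order columns plus an ecommerce_items array (SKUs, prices, and quantities are invented for illustration; the shim only makes the snippet runnable outside a browser):

```javascript
// Shim for illustration; in a page, _paq is the usual Matomo queue.
var window = globalThis.window || globalThis;
var _paq = window._paq = window._paq || [];

// Matomo's standard ecommerce API: queue the items, then the order.
// addEcommerceItem arguments: sku, name, category, price, quantity.
_paq.push(['addEcommerceItem', 'SKU-001', 'Espresso Beans', 'Coffee', 14.5, 2]);
_paq.push(['addEcommerceItem', 'SKU-042', 'Pour-Over Kit', 'Gear', 39.0, 1]);

// trackEcommerceOrder arguments: orderId, grandTotal, subTotal, tax, shipping, discount.
// Revenue, tax, shipping, and discount land as typed columns; the two items
// above become the ecommerce_items array on the same event row.
_paq.push(['trackEcommerceOrder', 'ORDER-1001', 74.0, 68.0, 4.0, 2.0, 0]);
```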
Matomo’s cvar, _cvar, and dimensionN parameters are parsed into structured arrays (custom_variables and custom_dimensions) with named fields. If you rely on custom dimensions in Matomo, they carry over intact without any custom extraction logic.
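For reference, these values arrive through Matomo's standard custom dimension and custom variable calls. A sketch, with the dimension index and values invented for illustration:

```javascript
// Shim for illustration; in a page, _paq is the usual Matomo queue.
var window = globalThis.window || globalThis;
var _paq = window._paq = window._paq || [];

// Custom dimension: index (as configured in Matomo), value.
// Sent on the wire as dimension1=pro; parsed into d8a's custom_dimensions array.
_paq.push(['setCustomDimension', 1, 'pro']);

// Custom variable: index, name, value, scope ('visit' or 'page').
// Sent as the cvar/_cvar parameter; parsed into d8a's custom_variables array.
_paq.push(['setCustomVariable', 1, 'plan', 'pro', 'visit']);

_paq.push(['trackPageView']);
```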
For the full schema reference, see Matomo protocol database schema. For details on how each Matomo tracking parameter maps to d8a columns, see the Matomo tracking protocol reference.
d8a supports three storage backends, and the choice is yours:
BigQuery. Serverless, pay-per-query, scales to petabytes. Plug into Looker Studio, Looker, Power BI, or any BI tool that speaks SQL.
ClickHouse. A columnar OLAP database purpose-built for analytics workloads. Self-hosted or managed. Extremely fast for time-series aggregations and high-cardinality filtering.
Object storage (files). d8a writes event data as structured files that can be loaded into any data warehouse or data lake. Snowflake, Redshift, Databricks, DuckDB, Synapse - if your organization already runs a warehouse, object storage gets your Matomo data there. This is the universal option: regardless of what your data infrastructure looks like, there’s a path.
None of these require pre-aggregation, archiving crons, or background queues. Every report is calculated at query time against the raw event table. The warehouse does what it was designed to do.
d8a also offers a fully managed cloud that connects to your own data warehouse, so you don’t need to run any d8a infrastructure yourself.
The recommended setup is to run d8a in parallel with your existing Matomo instance. Keep Matomo for the teams and workflows that depend on it. Add d8a for warehouse-grade querying and long-term data storage.
If at some point you decide you no longer need Matomo’s UI, you can also point setTrackerUrl directly at d8a and stop running the Matomo instance entirely. Matomo’s matomo.js tracker is available from public CDNs, so you don’t even need a Matomo server to keep using the tracker. But that’s an option for teams that have already moved on from Matomo’s interface, not the starting recommendation.
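A hedged sketch of what that standalone setup could look like: matomo.js pulled from a CDN and setTrackerUrl pointed straight at d8a, with no Matomo server involved. The CDN URL, endpoint, and property id are placeholders, and the shims only keep the sketch runnable outside a browser:

```javascript
// Sketch only: no Matomo server, just the tracker script and a d8a endpoint.
var window = globalThis.window || globalThis;
var document = window.document; // undefined outside a browser; guarded below

var _paq = window._paq = window._paq || [];
_paq.push(['setTrackerUrl', 'https://d8a.example.com/matomo.php']); // d8a, not Matomo
_paq.push(['setSiteId', '<property_id>']);
_paq.push(['trackPageView']);
_paq.push(['enableLinkTracking']);

// Inject matomo.js from a public CDN only when a DOM is available.
if (document) {
  var g = document.createElement('script');
  g.async = true;
  g.src = 'https://cdn.example.com/matomo.js'; // placeholder CDN URL
  document.head.appendChild(g);
}
```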
Once you’re comfortable with d8a handling your warehouse pipeline, you might start noticing the limits of Matomo’s event model itself. Matomo requires every custom event to have at least a category (e_c) and an action (e_a). That’s mandatory, even when neither concept is meaningful for what you’re tracking. If you just want to fire a signup_completed or trial_started event, you still have to invent a category and action to make Matomo accept it.
Beyond that, the full four-slot format - category, action, name, and value - doesn’t accommodate arbitrary custom parameters. Think subscription_tier, experiment_variant, content_score, onboarding_step, or any dimension specific to your business. In Matomo, you’d have to shoehorn these into custom dimensions or custom variables, each with its own slot limits and configuration overhead.
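Concretely, a plain lifecycle event has to be wrapped in Matomo's four-slot trackEvent call. A sketch; the 'lifecycle' category below is filler invented only because the slot is mandatory:

```javascript
// Shim for illustration; in a page, _paq is the usual Matomo queue.
var window = globalThis.window || globalThis;
var _paq = window._paq = window._paq || [];

// trackEvent signature: category, action, [name], [value].
// 'lifecycle' carries no meaning here; it exists to satisfy the required e_c slot.
_paq.push(['trackEvent', 'lifecycle', 'signup_completed']);

// Any extra business context has to squeeze into the optional name slot or a
// single numeric value; arbitrary key-value parameters have nowhere to go.
_paq.push(['trackEvent', 'lifecycle', 'trial_started', 'pro', 14]);
```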
d8a also supports its own native tracking protocol, which follows a GA4-style event model: a named event with an open set of key-value parameters. No forced category/action/name/value structure. You define the parameters that matter for your business, and they land as queryable columns or in the params array for custom ones. If you outgrow what Matomo’s protocol can express, the upgrade path is already there - no pipeline changes, no warehouse migration. Just a different tracker on the same d8a backend.
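As a sketch of the difference in shape only: the object below illustrates a GA4-style event under such a model, not d8a's actual wire format, which is defined by its protocol reference:

```javascript
// Hypothetical event shape under a GA4-style model: one event name and an
// open bag of parameters, no forced category/action/name/value slots.
const signupEvent = {
  name: 'signup_completed',
  params: {
    subscription_tier: 'pro', // business-specific dimensions, no slot limits
    experiment_variant: 'B',
    onboarding_step: 4
  }
};

// The same intent forced through Matomo's event model, for contrast:
// the category is invented just to satisfy the required slot.
const matomoShape = ['trackEvent', 'lifecycle', 'signup_completed'];
```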
If you’re running Matomo and want your tracking data in a real data warehouse, this is the simplest path to get there. One line of JavaScript. A flat schema that any analyst can query. Your choice of warehouse.
The setup guide is one page. The schema is fully documented. The code is open source. And if you’d rather not run infrastructure at all, d8a cloud connects to your own warehouse.