The State of Open Source Analytics

Divine Data Team
#analytics #open-source #data-sovereignty #GA4

The open-source analytics market is currently undergoing its biggest shift in a decade, driven primarily by the GA4 exodus and the tightening grip of privacy regulations like GDPR and CCPA.

For years, the market was stagnant: you either used Google Analytics or, if you were a privacy die-hard, you struggled with a self-hosted instance of Matomo. Today, the ecosystem has split into three distinct philosophies, each shaped by modern infrastructure requirements.

The Privacy Minimalists (e.g., Plausible, Umami) emerged as a reaction against the bloat of ad-tech. These tools are built on the premise that most website owners don’t need complex attribution models; they just need to know how many people visited. They prioritize lightweight scripts and legal compliance over data depth, and give up comprehensive behavioral tracking in exchange.

The Product Engineers (e.g., PostHog) stopped caring about marketing hits and started caring about user behavior. This segment leverages modern columnar databases like ClickHouse to process massive datasets, blurring the line between web analytics and application debugging. They treat analytics as a core product development tool rather than a marketing afterthought.

The Infrastructure Builders (e.g., d8a.tech) represent the newest wave. These tools recognize that mature organizations don’t just want a dashboard; they want data ownership. They decouple the collection of data from the visualization of data, allowing engineers to build their own pipelines without the “Google Tax” that comes with proprietary platforms.

The market is moving away from the old LAMP stack (Linux, Apache, MySQL, PHP) toward high-concurrency languages like Go and Elixir, paired with columnar databases. A row-oriented database behind a PHP application struggles with analytical queries over the event volumes of the modern web, which makes architectural modernization essential rather than optional.


Open Source Web/App Analytics Comparison in 2025

| Platform | License | Key Focus | Tech Stack | Link |
|---|---|---|---|---|
| d8a.tech | MIT | The “Headless” GA4 Alternative. A GA4-compatible data collection engine that gives you full data ownership. | Go (Golang) | https://d8a.tech/ |
| Plausible | AGPLv3 | Privacy-First Minimalist. No cookies, no GDPR consent banners needed. Extremely lightweight script. | Elixir / ClickHouse | https://plausible.io/ |
| Matomo | GPLv3 | The Legacy Standard. The closest UI/UX equivalent to the old Google Analytics (UA). Packed with features but resource-heavy. | PHP / MySQL | https://matomo.org/ |
| PostHog | MIT | Product OS. Built for engineers to track product usage, feature flags, and session recordings. | Python / ClickHouse | https://posthog.com/ |
| Umami | MIT | Simple & Fast. A modern, cleaner alternative to Matomo. Easy to self-host (Vercel/Netlify friendly). | Next.js / Postgres | https://umami.is/ |

Where They Excel vs. Where They Suck (The “No Marketing BS” Breakdown)

Here is the breakdown of the operational headaches and wins you will encounter with these tools.

1. Matomo

Matomo excels at compliance and comprehensive features. If your Legal team demands on-premise data and your Marketing team demands Heatmaps, Tag Manager, and Funnels in one interface, Matomo is the only tool in this comparison that checks every box.

However, Matomo’s bottleneck is MySQL. Instead of querying raw logs at read time, its archiving process pre-calculates report blobs via cron jobs. At 10M+ events, this cron job can take hours to run, and if it fails, your dashboard sits empty. Additionally, creating a custom segment on the fly (for example, filtering for users from Japan using Chrome) bypasses the pre-calculated data and hits raw MySQL rows directly. On large datasets, this query can time out, lock your database, and crash the reporting interface. You end up forced to aggressively delete old raw data just to keep the system responsive.
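The two query paths described above can be sketched with an in-memory SQLite database. This is only an illustration: the table and column names (`raw_visits`, `archive_daily`) are hypothetical and are not Matomo's actual schema. The point is that the standard report reads a handful of pre-computed rows, while an ad hoc segment has to scan the raw rows.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny database with raw visits and an (empty) archive table.

    Table and column names are hypothetical, not Matomo's real schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE raw_visits (day TEXT, country TEXT, browser TEXT);
        CREATE TABLE archive_daily (day TEXT PRIMARY KEY, visits INTEGER);
        """
    )
    rows = [
        ("2025-01-01", "JP", "Chrome"),
        ("2025-01-01", "JP", "Firefox"),
        ("2025-01-01", "US", "Chrome"),
        ("2025-01-02", "JP", "Chrome"),
    ]
    conn.executemany("INSERT INTO raw_visits VALUES (?, ?, ?)", rows)
    return conn


def run_archiver(conn: sqlite3.Connection) -> None:
    """Batch job: collapse raw rows into one pre-computed row per day."""
    conn.execute("DELETE FROM archive_daily")
    conn.execute(
        """
        INSERT INTO archive_daily (day, visits)
        SELECT day, COUNT(*) FROM raw_visits GROUP BY day
        """
    )


def daily_report(conn: sqlite3.Connection) -> list:
    """Fast path: reads only the pre-computed archive."""
    return conn.execute(
        "SELECT day, visits FROM archive_daily ORDER BY day"
    ).fetchall()


def segment_report(conn: sqlite3.Connection, country: str, browser: str) -> list:
    """Slow path: the archive has no country or browser columns,
    so an ad hoc segment must scan the raw rows."""
    return conn.execute(
        """
        SELECT day, COUNT(*) FROM raw_visits
        WHERE country = ? AND browser = ?
        GROUP BY day ORDER BY day
        """,
        (country, browser),
    ).fetchall()
```

Until `run_archiver` has completed, `daily_report` returns nothing, which mirrors the empty dashboard after a failed cron run. On four rows the segment scan is instant; the cost only shows up when `raw_visits` holds millions of rows.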

2. Plausible Analytics

Plausible excels at handling traffic spikes. It runs on Elixir and ClickHouse, a database purpose-built for analytics. You can hit the front page of Hacker News with 100k visitors in an hour, and Plausible won’t even stutter. The tracking script is under 1KB, so it never slows down your site load time.

The trade-off is that Plausible reports aggregates rather than per-user session logs, and it deliberately avoids persistent visitor identifiers, so deeper questions are out of reach. You cannot request specific user journeys, for example the exact path of someone who bought the Premium plan. Complex attribution modeling is off the table, and when you need to debug a specific conversion drop-off, Plausible gives you a general number rather than the granular “who” or “how” details. It is strictly a tool for high-level trend analysis.
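The gap between an aggregate view and a per-visitor log can be shown in a few lines. This is a generic sketch, not Plausible's storage model; the `visitor` and `path` field names and the sample events are made up for illustration.

```python
from collections import Counter

# Hypothetical event log: one entry per pageview, in arrival order.
events = [
    {"visitor": "a", "path": "/"},
    {"visitor": "a", "path": "/pricing"},
    {"visitor": "a", "path": "/checkout/premium"},
    {"visitor": "b", "path": "/"},
    {"visitor": "b", "path": "/pricing"},
]


def aggregate_pageviews(events: list) -> Counter:
    """Aggregate view: pageviews per path. Visitor identity is discarded."""
    return Counter(e["path"] for e in events)


def journey(events: list, visitor: str) -> list:
    """Per-visitor view: the ordered path of one visitor.

    This needs the raw log with an identifier; it cannot be derived
    from the output of aggregate_pageviews().
    """
    return [e["path"] for e in events if e["visitor"] == visitor]
```

The aggregate answers "how many people saw the pricing page", while `journey(events, "a")` answers "what did the Premium buyer do first". A tool that only exposes the first form cannot answer the second.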

3. PostHog

PostHog excels at debugging user behavior by correlating quantitative data with qualitative insights. You can see a chart showing a drop in signups, click a point on the graph, and immediately watch session recordings of the users who failed to convert. This integration of analytics with product debugging is unbeatable for Product Engineers.

The downside is operational complexity. Self-hosting PostHog at scale is not a hobby project: it requires managing a cluster involving ClickHouse, Kafka, ZooKeeper, and Redis. If a Kafka consumer lags, your data arrives late. If ZooKeeper gets out of sync, the entire cluster degrades. This represents a heavy infrastructure commitment compared to deploying a simple Go binary or PHP script.
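Consumer lag is the kind of number you end up monitoring in such a setup. Per partition, it is the newest offset written to the log minus the offset the consumer group has committed. The sketch below is plain arithmetic over hypothetical offset values; it does not call any Kafka client, and the function names and alert threshold are assumptions.

```python
def consumer_lag(end_offsets: dict, committed: dict) -> dict:
    """Per-partition lag: newest offset in the log minus committed offset.

    Both arguments map partition number -> offset. A partition with no
    committed offset yet is treated as fully unread.
    """
    return {
        partition: max(end - committed.get(partition, 0), 0)
        for partition, end in end_offsets.items()
    }


def is_lagging(lag: dict, threshold: int) -> bool:
    """True when the total backlog across partitions exceeds the threshold."""
    return sum(lag.values()) > threshold
```

With `end_offsets = {0: 1500, 1: 900}` and `committed = {0: 1500, 1: 400}`, partition 1 is 500 events behind, which means those 500 events are not yet visible in any dashboard.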

4. Umami

Umami excels at the Vercel workflow. You can deploy it for free on a Vercel hobby plan with a Supabase Postgres backend in roughly 3 minutes. The UI is snappy, modern, and dark-mode friendly, feeling like a tool built in 2024 rather than 2010.

The limitation is depth of analysis. Umami sits in an awkward middle ground—it’s not as privacy-rigid as Plausible (it allows more tracking), but it lacks the deep filtering capabilities of Matomo. If you have complex questions like comparing retention rates between cohorts, Umami essentially tells you to export the CSV and analyze it yourself.
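To give a sense of what "analyze it yourself" involves, here is a minimal sketch of a cohort retention calculation over an exported CSV. The column names (`visitor_id`, `cohort`, `weeks_since_first_visit`) and the sample data are hypothetical; a real export would first need to be reshaped into this form.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export: one row per (visitor, week in which they were active).
SAMPLE_CSV = """visitor_id,cohort,weeks_since_first_visit
u1,2025-W01,0
u1,2025-W01,1
u2,2025-W01,0
u3,2025-W02,0
u3,2025-W02,1
u4,2025-W02,0
u4,2025-W02,1
"""


def retention(csv_text: str, week: int) -> dict:
    """Share of each cohort that was active `week` weeks after first visit."""
    members = defaultdict(set)   # cohort -> all visitors in the cohort
    retained = defaultdict(set)  # cohort -> visitors active in the given week
    for row in csv.DictReader(io.StringIO(csv_text)):
        members[row["cohort"]].add(row["visitor_id"])
        if int(row["weeks_since_first_visit"]) == week:
            retained[row["cohort"]].add(row["visitor_id"])
    return {
        cohort: len(retained[cohort]) / len(visitors)
        for cohort, visitors in members.items()
    }
```

Calling `retention(SAMPLE_CSV, 1)` compares week-1 retention between the two cohorts in the sample, which is the sort of question the built-in dashboard does not answer.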

5. d8a.tech

d8a.tech excels at high-performance ingestion. Because it’s a compiled Go binary, it processes events with ease. You can throw massive concurrency at it on a cheap VPS, and it simply writes to disk or database without complaint. It is the most efficient way to get GA4-compatible data into a private warehouse where you maintain full control.

The catch is that it’s headless—there’s no login button to see a pie chart immediately after installation. You essentially become the Product Manager of your analytics infrastructure. You have to write the SQL queries, set up visualization tools like Superset or Grafana, and define business logic yourself. When the marketing team asks where the bounce rate is, you have to define the formula for bounce rate in SQL. It is a tool for builders, not consumers.
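As an example of what defining that formula looks like, the sketch below computes a bounce rate in SQL against an in-memory SQLite table. Everything here is an assumption for illustration: the `events` table and its `session_id` and `event_name` columns are hypothetical rather than d8a.tech's actual schema, and "bounce" is defined the Universal Analytics way, as a session with exactly one page view. GA4 itself defines bounce rate as the share of non-engaged sessions, so your team has to choose which definition it wants.

```python
import sqlite3

# Bounce rate = sessions with exactly one page_view / all sessions
# that have at least one page_view. Schema is hypothetical.
BOUNCE_RATE_SQL = """
WITH per_session AS (
    SELECT session_id, COUNT(*) AS pageviews
    FROM events
    WHERE event_name = 'page_view'
    GROUP BY session_id
)
SELECT 1.0 * SUM(CASE WHEN pageviews = 1 THEN 1 ELSE 0 END) / COUNT(*)
FROM per_session
"""


def demo_connection() -> sqlite3.Connection:
    """Build a small in-memory events table with four sessions."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (session_id TEXT, event_name TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [
            ("s1", "page_view"),                        # bounce
            ("s2", "page_view"), ("s2", "page_view"),
            ("s3", "page_view"),                        # bounce
            ("s4", "page_view"), ("s4", "page_view"), ("s4", "page_view"),
        ],
    )
    return conn


def bounce_rate(conn: sqlite3.Connection) -> float:
    """Run the bounce-rate query and return the single resulting value."""
    (rate,) = conn.execute(BOUNCE_RATE_SQL).fetchone()
    return rate
```

In the sample data two of the four sessions have a single page view. The same query shape should carry over to a warehouse with minor dialect changes, and it can be saved as a dataset or panel query in Superset or Grafana.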

Ready to see what analytics looks like when you don’t have to compromise? The code is open source. The architecture is proven. Your data stays yours.

