
Platforms Built — Data & Microservices

I have architected and delivered end-to-end data platforms and microservice ecosystems that enabled self-service analytics, operational resilience, and measurable business outcomes. These platforms span data lakes, governed semantic layers, real-time event streams, and microservices-based reporting systems for large industrial and SaaS customers.

Enterprise Data Platform Modernization

Consolidated siloed on-prem and cloud data into a governed data lake using Apache Iceberg on object storage (Parquet format). The platform provided (see the sketch after this list):

  • ACID-compliant table formats with time-travel and schema evolution (Iceberg).
  • Parquet storage for performant columnar analytics and reduced storage cost.
  • Object storage integration (MinIO / S3 compatible) for cheap, scalable storage.
  • Data cataloging, dataset lineage, and role-based access for governed self-service.
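
To make the table-format choices concrete, here is a minimal PySpark sketch of registering an Iceberg table on S3-compatible storage. The catalog name (`lake`), warehouse path, and table schema are illustrative assumptions, not the production setup.

```python
from pyspark.sql import SparkSession

# The Iceberg runtime must be on the classpath, e.g. via
# spark.jars.packages=org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:<version>.
# For MinIO, the usual s3a endpoint/credential settings are also required.
spark = (
    SparkSession.builder.appName("lakehouse-sketch")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "s3a://analytics/warehouse")
    .getOrCreate()
)

spark.sql("CREATE NAMESPACE IF NOT EXISTS lake.sales")

# ACID table with hidden partitioning; columns can be added or renamed later
# via ALTER TABLE without rewriting data (schema evolution).
spark.sql("""
    CREATE TABLE IF NOT EXISTS lake.sales.orders (
        order_id BIGINT,
        region   STRING,
        amount   DECIMAL(12, 2),
        ts       TIMESTAMP
    )
    USING iceberg
    PARTITIONED BY (days(ts), region)
""")

# Time travel: read the table as of an earlier point in time.
spark.sql(
    "SELECT COUNT(*) FROM lake.sales.orders TIMESTAMP AS OF '2024-01-01 00:00:00'"
)
```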

Impact: reduced analytical ETL time by 60–80%, enabled trusted metrics across teams, and accelerated dashboard delivery for business stakeholders.

Microservices Reporting Platform

Designed a microservices-first reporting architecture to replace a monolithic reporting stack. Key elements (an ingestion sketch follows the list):

  • Event-driven ingestion with Kafka (high-throughput buffering & replayability).
  • Stream processing (Spark/Flink) for near-real-time aggregation and feature extraction.
  • Isolated reporting microservices exposing semantic APIs consumed by BI tools.
  • Multi-tenant design with quota and governance boundaries for each tenant/region.
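
As a sketch of the ingestion edge, here is a minimal kafka-python producer writing tenant-keyed events to a reporting topic. The broker address, topic name, and event shape are illustrative assumptions.

```python
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["kafka:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",  # durable writes, so the log can be replayed downstream
)

# Keying by tenant keeps per-tenant ordering within a partition, which the
# downstream stream processors rely on for correct aggregation.
event = {"tenant": "acme", "metric": "orders_created", "value": 1}
producer.send("reporting.events", key=b"acme", value=event)
producer.flush()
```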

Outcome: report latency cut from hours to minutes, elastic scale to handle peak ingestion, and a clear separation between platform and BI layers for faster feature rollout.

SaaS Activation, License Monitoring & Optimization

Built SaaS lifecycle services for activation/deactivation, usage telemetry, and license optimization (used within OpenLM product initiatives); a reclamation-policy sketch follows the list.

  • Centralized license registry with usage events streamed to analytics.
  • Automation for license provisioning and policy-driven deactivation.
  • Cost dashboards and recommendations to reclaim unused licenses.
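
A simplified sketch of the policy-driven reclamation logic; the registry model and the 30-day idle threshold are illustrative assumptions, not the product's actual schema or policy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class LicenseRecord:
    user: str
    product: str
    last_used: datetime

def reclaim_candidates(registry: list[LicenseRecord],
                       idle_days: int = 30) -> list[LicenseRecord]:
    """Return licenses idle longer than the policy threshold."""
    cutoff = datetime.utcnow() - timedelta(days=idle_days)
    return [lic for lic in registry if lic.last_used < cutoff]

registry = [
    LicenseRecord("alice", "cad-suite", datetime(2024, 1, 2)),
    LicenseRecord("bob", "cad-suite", datetime.utcnow()),
]
for lic in reclaim_candidates(registry):
    # In the real service this feeds the provisioning/deactivation API.
    print(f"deactivate {lic.product} for {lic.user}")
```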

Result: measurable license cost reduction for customers and improved governance over SaaS spend.

Operational Observability & CI/CD

Delivered platform reliability through robust CI/CD pipelines, infrastructure as code, and observability stacks (a tracing sketch follows the list):

  • Automated builds, tests, and canary deployments for microservices (GitHub Actions / Jenkins).
  • Logging, metrics, and tracing (Prometheus + Grafana + OpenTelemetry) for end-to-end visibility.
  • Alerting and runbook integration to reduce MTTR and support SRE workflows.
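
A minimal OpenTelemetry tracing sketch in Python; a console exporter stands in for the OTLP collector that would feed Grafana in production, and the service and span names are illustrative.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import (
    BatchSpanProcessor,
    ConsoleSpanExporter,
)

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("reporting-service")

with tracer.start_as_current_span("generate-report") as span:
    span.set_attribute("tenant", "acme")  # illustrative attribute
    # ... report generation work happens here ...

provider.shutdown()  # flush pending spans before exit
```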

Effect: stable rollouts with predictable lead times and a visible reduction in incident durations.

Governance, Security & Data Contracts

Implemented governance controls and data contracts to enable safe self-service (a contract-check sketch follows the list):

  • Row/column-level access controls and attribute-based access for datasets.
  • Data contracts and schema checks preventing downstream breakages.
  • Secure credentials management and auditing for object stores and service principals.
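
One way to enforce such a contract at the publish boundary, sketched here with the `jsonschema` library; the contract fields are illustrative examples.

```python
from jsonschema import validate, ValidationError

ORDERS_CONTRACT = {
    "type": "object",
    "required": ["order_id", "region", "amount"],
    "properties": {
        "order_id": {"type": "integer"},
        "region": {"type": "string"},
        "amount": {"type": "number"},
    },
    "additionalProperties": False,  # new fields require a contract change
}

def publish(event: dict) -> None:
    try:
        validate(instance=event, schema=ORDERS_CONTRACT)
    except ValidationError as err:
        # Reject at the boundary instead of breaking downstream consumers.
        raise RuntimeError(f"contract violation: {err.message}") from err
    # ... hand off to the event bus ...

publish({"order_id": 42, "region": "apac", "amount": 99.5})
```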

This provided the confidence for analytics teams to ship new dashboards without manual gatekeeping.

Case Study: Migrating to Apache Iceberg + Parquet

At one of the largest privately owned steel companies in Indonesia, I led the migration of analytics data from traditional relational data stores into a modern data lakehouse architecture based on Apache Iceberg with Parquet as the columnar format. The project required balancing cost, governance, and analytical performance across multiple regions.

Objectives

  • Unify siloed departmental data into a central, queryable store.
  • Enable schema evolution without downtime or re-ingestion.
  • Reduce report generation time for management dashboards.
  • Lay a foundation for real-time analytics pipelines.

Approach

  • Deployed MinIO for object storage with S3 compatibility.
  • Adopted Apache Iceberg table format for ACID guarantees and time-travel queries.
  • Re-partitioned raw data into Parquet files for efficient scanning (sketched after this list).
  • Introduced Spark SQL as the query engine, connected to BI tools (Power BI / QuickSight).
  • Implemented metadata refresh jobs and catalog sync for governance.
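
The re-partitioning step, sketched with Spark's DataFrameWriterV2 API. The JDBC connection details and table names are placeholders, and `spark` is assumed to be an Iceberg-configured session as in the earlier sketch.

```python
from pyspark.sql.functions import col

# Read the legacy table over JDBC (connection details are placeholders).
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://legacy-db:5432/erp")
    .option("dbtable", "public.sales_orders")
    .option("user", "etl")
    .option("password", "change-me")
    .load()
)

# Rewrite as a partitioned Iceberg table; Iceberg manages the Parquet
# file layout and the table metadata.
(
    df.writeTo("lake.sales.orders")
    .using("iceberg")
    .partitionedBy(col("region"))
    .createOrReplace()
)
```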

Results

  • Batch ETL windows reduced from 8 hours to ~2 hours.
  • Ad-hoc query performance improved by 60–70%.
  • Enabled reproducible analysis with time-travel queries.
  • Storage costs reduced by leveraging compressed Parquet.

Migration Checklist

  • 🔍 Assess current data volume, schema complexity, and retention needs.
  • 📂 Identify landing zone (object storage, MinIO/S3, HDFS).
  • ⚙️ Define partition strategy (by time, region, product, etc.).
  • 🧩 Convert legacy tables to Parquet and validate them against existing queries.
  • 📑 Register tables in Iceberg catalog (Glue/Hive/REST catalog).
  • 🛠️ Update ETL jobs to write to Iceberg instead of RDBMS.
  • 📊 Connect BI tools (Power BI, QuickSight, Superset) to new tables.
  • 🔄 Monitor performance, optimize file sizes, run compaction, and tune caching (see the maintenance sketch below).
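
For the final checklist item, a sketch of routine Iceberg maintenance via its Spark procedures (these require the Iceberg SQL extensions to be enabled); the target file size and snapshot cutoff are illustrative.

```python
# Requires spark.sql.extensions =
#   org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions

# Compact small files toward a target size (~128 MB here).
spark.sql("""
    CALL lake.system.rewrite_data_files(
        table => 'sales.orders',
        options => map('target-file-size-bytes', '134217728')
    )
""")

# Expire old snapshots to bound metadata and storage growth.
spark.sql("""
    CALL lake.system.expire_snapshots(
        table => 'sales.orders',
        older_than => TIMESTAMP '2024-01-01 00:00:00'
    )
""")
```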

This case study demonstrated how open formats and modern table management systems can transform traditional industrial analytics into a future-ready lakehouse platform.

Apache Iceberg & Parquet Migration Architecture

Figure: Migration from legacy RDBMS to Iceberg + Parquet Lakehouse (S3/MinIO).

Representative Tech Stack

Kafka, Spark/Flink, Apache Iceberg, Parquet, MinIO / S3, PostgreSQL / ClickHouse, Docker & Kubernetes, Prometheus/Grafana, OpenTelemetry, GitHub Actions / Jenkins, Python, Java, and Node.js.

Business Outcomes — Sample Metrics

  • Report latency: from ~3–6 hours → 2–5 minutes for core business reports.
  • ETL window shrink: 60–80% reduction in batch processing time (Parquet + Iceberg optimizations).
  • License spend: single-digit to mid-teens percentage reductions in wasted SaaS licensing via automated reclamation.
  • MTTR (incidents): decreased by ~40% after observability and runbook automation.

How I work with clients / teams

I partner with product owners, architects, and operations to translate business outcomes into a phased technical roadmap. Early emphasis is placed on:

  • Clearly defined SLAs and success metrics.
  • Small, deliverable-aligned milestones to show early ROI.
  • Automation and governance that scale without bottlenecks.