I have architected and delivered end-to-end data platforms and microservice ecosystems that enabled self-service analytics, operational resilience, and measurable business outcomes. These platforms span data lakes, governed semantic layers, real-time event streams, and microservices-based reporting systems for large industrial and SaaS customers.
Enterprise Data Platform Modernization
Consolidated siloed on-prem and cloud data into a governed data lake using Apache Iceberg on object storage (Parquet format). The platform provided:
- ACID-compliant table formats with time-travel and schema evolution (Iceberg).
- Parquet storage for performant columnar analytics and reduced storage cost.
- Object storage integration (MinIO / S3 compatible) for cheap, scalable storage.
- Data cataloging, dataset lineage, and role-based access for governed self-service.
Impact: reduced analytical ETL time by 60–80%, enabled trusted metrics across teams, and accelerated dashboard delivery for business stakeholders.
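To make the lake layout concrete, here is a minimal PySpark sketch of an Iceberg catalog over S3-compatible object storage, with a Parquet-backed partitioned table and a time-travel query. The catalog name `lake`, the bucket path, and the table schema are illustrative placeholders (S3 credential and endpoint settings are omitted), not the production configuration.

```python
# Minimal sketch: an Iceberg catalog ("lake") over MinIO/S3 with a Parquet-backed,
# partitioned table and a time-travel query. All names and paths are illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-lakehouse-sketch")
    # Enables Iceberg SQL extras such as CALL procedures and MERGE.
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")                           # object-store-backed catalog
    .config("spark.sql.catalog.lake.warehouse", "s3a://analytics-warehouse/")  # placeholder MinIO/S3 bucket
    .getOrCreate()
)

spark.sql("CREATE NAMESPACE IF NOT EXISTS lake.sales")

# Iceberg writes Parquet data files by default; partition transforms keep scans narrow.
spark.sql("""
    CREATE TABLE IF NOT EXISTS lake.sales.orders (
        order_id BIGINT,
        region   STRING,
        amount   DOUBLE,
        order_ts TIMESTAMP
    )
    USING iceberg
    PARTITIONED BY (days(order_ts), region)
""")

# Time travel: list snapshots, then re-run an analysis against an earlier state
# for reproducibility (TIMESTAMP AS OF syntax is available in Spark 3.3+).
spark.sql("SELECT snapshot_id, committed_at FROM lake.sales.orders.snapshots").show()
spark.sql("SELECT count(*) FROM lake.sales.orders TIMESTAMP AS OF '2024-01-01 00:00:00'").show()
```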
Microservices Reporting Platform
Designed a microservices-first reporting architecture to replace a monolithic reporting stack. Key elements:
- Event-driven ingestion with Kafka (high-throughput buffering & replayability).
- Stream processing (Spark/Flink) for near-real-time aggregation and feature extraction.
- Isolated reporting microservices exposing semantic APIs consumed by BI tools.
- Multi-tenant design with quota and governance boundaries for each tenant/region.
Outcome: report latency cut from hours to minutes, elastic scale to handle peak ingestion, and a clear separation between platform and BI layers for faster feature rollout.
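As an illustration of the ingestion-to-aggregation path, the sketch below reads a hypothetical JSON-encoded `usage-events` Kafka topic with Spark Structured Streaming and produces windowed per-tenant aggregates. The topic name, event fields, and console sink are assumptions for the example, not the production topology.

```python
# Minimal sketch: near-real-time aggregation over a Kafka topic with Spark
# Structured Streaming. Brokers, topic, fields, and sink are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("reporting-stream-sketch").getOrCreate()

event_schema = (
    StructType()
    .add("tenant_id", StringType())
    .add("metric", StringType())
    .add("value", DoubleType())
    .add("event_ts", TimestampType())
)

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")   # placeholder brokers
    .option("subscribe", "usage-events")               # placeholder topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# 5-minute tumbling-window aggregates per tenant, tolerating 10 minutes of late data.
agg = (
    events.withWatermark("event_ts", "10 minutes")
    .groupBy(F.window("event_ts", "5 minutes"), "tenant_id", "metric")
    .agg(F.sum("value").alias("total"), F.count(F.lit(1)).alias("events"))
)

# Reporting microservices would serve these aggregates from a query store;
# the console sink here is just for the sketch.
query = agg.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```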
SaaS Activation, License Monitoring & Optimization
Built SaaS lifecycle services for activation/deactivation, usage telemetry, and license optimization (used within OpenLM product initiatives).
- Centralized license registry with usage events streamed to analytics.
- Automation for license provisioning and policy-driven deactivation.
- Cost dashboards and recommendations to reclaim unused licenses.
Outcome: measurable license cost reduction for customers and improved governance over SaaS spend.
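A simplified sketch of the reclamation policy logic is below. The field names and the 45-day idle threshold are illustrative assumptions for the example, not the actual product's policy engine or API.

```python
# Minimal sketch of a policy-driven reclamation pass: flag licenses whose last
# recorded usage is older than an idle threshold. Fields and threshold are illustrative.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

@dataclass
class LicenseUsage:
    license_id: str
    assigned_to: str
    last_used_at: Optional[datetime]   # None = never used since assignment

def reclaim_candidates(usage: List[LicenseUsage], idle_days: int = 45) -> List[LicenseUsage]:
    """Return licenses eligible for policy-driven deactivation/reclamation."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=idle_days)
    return [u for u in usage if u.last_used_at is None or u.last_used_at < cutoff]

# Example: idle or never-used seats would be surfaced on the cost dashboard.
usage = [
    LicenseUsage("lic-001", "alice", datetime(2024, 1, 5, tzinfo=timezone.utc)),
    LicenseUsage("lic-002", "bob", None),
    LicenseUsage("lic-003", "carol", datetime.now(timezone.utc)),
]
print([u.license_id for u in reclaim_candidates(usage)])
```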
Operational Observability & CI/CD
Delivered platform reliability through robust CI/CD pipelines, infra-as-code, and observability stacks:
- Automated builds, tests, and canary deployments for microservices (GitHub Actions / Jenkins).
- Logging, metrics, and tracing (Prometheus + Grafana + OpenTelemetry) for end-to-end visibility.
- Alerting and runbook integration to reduce MTTR and support SRE workflows.
Effect: stable rollouts with predictable lead times and a visible reduction in incident durations.
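For illustration, this is roughly how a reporting microservice exposes request metrics for Prometheus scraping using the official Python client; the metric names, endpoint label, and port are placeholders.

```python
# Minimal sketch: exposing request counters and latency histograms from a service
# via the Prometheus Python client. Metric names and port are illustrative.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("report_requests_total", "Report requests served", ["endpoint", "status"])
LATENCY = Histogram("report_request_seconds", "Report request latency", ["endpoint"])

def handle_report_request(endpoint: str) -> None:
    """Simulated handler that records latency and outcome for Prometheus to scrape."""
    with LATENCY.labels(endpoint=endpoint).time():
        time.sleep(random.uniform(0.01, 0.05))       # stand-in for real work
    REQUESTS.labels(endpoint=endpoint, status="200").inc()

if __name__ == "__main__":
    start_http_server(9000)                          # serves /metrics for Prometheus
    while True:
        handle_report_request("/reports/daily-sales")
```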
Governance, Security & Data Contracts
Implemented governance controls and data contracts to enable safe self-service:
- Row/column-level access controls and attribute-based access for datasets.
- Data contracts and schema checks preventing downstream breakages.
- Secure credentials management and auditing for object stores and service principals.
This provided the confidence for analytics teams to ship new dashboards without manual gatekeeping.
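As a sketch of the contract checks, the example below validates events against a JSON Schema data contract before they are published. The contract fields are illustrative; in practice the schema lives in a registry and the check runs in CI and at the producer boundary.

```python
# Minimal sketch: validating producer output against a published data contract
# so schema drift is caught before it breaks downstream consumers.
from jsonschema import Draft7Validator

ORDER_EVENT_CONTRACT = {
    "type": "object",
    "required": ["order_id", "region", "amount", "order_ts"],
    "properties": {
        "order_id": {"type": "integer"},
        "region": {"type": "string"},
        "amount": {"type": "number", "minimum": 0},
        "order_ts": {"type": "string", "format": "date-time"},
    },
    "additionalProperties": False,   # new fields require a contract change, not a surprise
}

validator = Draft7Validator(ORDER_EVENT_CONTRACT)

def check_event(event: dict) -> list:
    """Return human-readable contract violations for one event (empty list = OK)."""
    return [e.message for e in validator.iter_errors(event)]

# A silently renamed field is caught before it reaches dashboards.
bad = {"orderId": 42, "region": "apac", "amount": 10.5, "order_ts": "2024-05-01T00:00:00Z"}
print(check_event(bad))
```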
Case Study: Migrating to Apache Iceberg + Parquet
At one of the largest privately owned steel companies in Indonesia, I led the migration of analytics data from traditional relational data stores into a modern data lakehouse architecture based on Apache Iceberg with Parquet as the columnar format. The project required balancing cost, governance, and analytical performance across multiple regions.
Objectives
- Unify siloed departmental data into a central, queryable store.
- Enable schema evolution without downtime or re-ingestion.
- Reduce report generation time for management dashboards.
- Lay a foundation for real-time analytics pipelines.
Approach
- Deployed MinIO for object storage with S3 compatibility.
- Adopted Apache Iceberg table format for ACID guarantees and time-travel queries.
- Re-partitioned raw data into Parquet files for efficient scanning.
- Introduced Spark SQL as the query engine, connected to BI tools (Power BI / QuickSight).
- Implemented metadata refresh jobs and catalog sync for governance.
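The core re-platforming step looked roughly like the sketch below: lift a legacy table over JDBC and rewrite it as a partitioned Iceberg/Parquet table. Connection details, table names, and the partition column are placeholders, and the `lake` Iceberg catalog is assumed to be configured as in the earlier sketch.

```python
# Minimal sketch of one migration step: legacy RDBMS table -> partitioned
# Iceberg/Parquet table. Connection details and names are illustrative only.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdbms-to-iceberg-sketch").getOrCreate()
# (Assumes the "lake" Iceberg catalog is configured as in the earlier sketch.)

legacy_orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://legacy-db:5432/erp")   # placeholder source
    .option("dbtable", "public.orders")
    .option("user", "reporter")
    .option("password", "…")                                  # injected via secrets in practice
    .load()
)

legacy_orders.createOrReplaceTempView("staging_orders")

spark.sql("CREATE NAMESPACE IF NOT EXISTS lake.erp")

# CTAS into Iceberg: Parquet data files with daily partitions, ready for BI engines.
spark.sql("""
    CREATE TABLE IF NOT EXISTS lake.erp.orders
    USING iceberg
    PARTITIONED BY (days(order_ts))
    AS SELECT * FROM staging_orders
""")
```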
Results
- Batch ETL windows reduced from 8 hours to ~2 hours.
- Ad-hoc query performance improved by 60–70%.
- Enabled reproducible analysis with time-travel queries.
- Storage costs reduced by leveraging compressed Parquet.
Migration Checklist
- 🔍 Assess current data volume, schema complexity, and retention needs.
- 📂 Identify landing zone (object storage, MinIO/S3, HDFS).
- ⚙️ Define partition strategy (by time, region, product, etc.).
- 🧩 Convert legacy tables to Parquet, validate against queries.
- 📑 Register tables in Iceberg catalog (Glue/Hive/REST catalog).
- 🛠️ Update ETL jobs to write to Iceberg instead of RDBMS.
- 📊 Connect BI tools (Power BI, QuickSight, Superset) to new tables.
- 🔄 Monitor performance, optimize file sizes, compaction, and caching.
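For the last checklist item, routine table maintenance can be scripted with Iceberg's Spark procedures. This assumes the Iceberg SQL extensions are enabled on the session (as in the first sketch); the catalog, table name, and cutoff timestamp are illustrative.

```python
# Minimal sketch: Iceberg table maintenance via Spark procedures
# (requires IcebergSparkSessionExtensions on the session).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-maintenance-sketch").getOrCreate()

# Compact small files produced by frequent ETL runs into ~128 MB Parquet files.
spark.sql("""
    CALL lake.system.rewrite_data_files(
        table => 'erp.orders',
        options => map('target-file-size-bytes', '134217728')
    )
""")

# Expire old snapshots past a cutoff to bound metadata and storage growth.
spark.sql("""
    CALL lake.system.expire_snapshots(
        table => 'erp.orders',
        older_than => TIMESTAMP '2024-05-01 00:00:00'
    )
""")
```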
This case study demonstrated how open formats and modern table management systems can transform traditional industrial analytics into a future-ready lakehouse platform.
Figure: Migration from legacy RDBMS to Iceberg + Parquet Lakehouse (S3/MinIO).
Representative Tech Stack
Kafka, Spark/Flink, Apache Iceberg, Parquet, MinIO / S3, PostgreSQL / ClickHouse, Docker & Kubernetes, Prometheus/Grafana, OpenTelemetry, GitHub Actions / Jenkins, Python, Java, and Node.js.
Business Outcomes — Sample Metrics
- Report latency: from ~3–6 hours → 2–5 minutes for core business reports.
- ETL window shrink: 60–80% reduction in batch processing time (Parquet + Iceberg optimizations).
- License spend: single-digit to mid-teens percent reduction in wasted SaaS licensing via automated reclamation.
- MTTR (incidents): decreased by ~40% after observability and runbook automation.
How I work with clients / teams
I partner with product owners, architects, and operations to translate business outcomes into a phased technical roadmap. Early emphasis is placed on:
- Clearly defined SLAs and success metrics.
- Small, deliverable-aligned milestones to show early ROI.
- Automation and governance that scale without bottlenecks.