Posts

Building an In-House Data Science Platform

The in-house data science platform (DSP) is more than a technological investment; it is a strategic move to position the organization at the forefront of data-driven innovation. This centralized ecosystem is designed to support the end-to-end data science workflow, providing a collaborative space for data scientists, engineers, and analysts.

Sun Dec 03 2023

Snapshot Migration Using Spark on Kubernetes

Snapshot migration is a technique used to transfer data from one system to another by capturing a point-in-time snapshot of the source data. This method is particularly useful when dealing with large datasets, ensuring data consistency and minimizing downtime during the migration process. Leveraging Spark on Kubernetes adds scalability and flexibility to the migration pipeline, making it an ideal choice for modern data engineering workflows.

Sat Nov 11 2023

Airflow on Kubernetes with Helm

Airflow is a robust open-source platform that lets you programmatically author, schedule, and monitor workflows. Combining the power of Airflow with the resilience and scalability of Kubernetes, we can create a highly reliable data pipeline management system.

Sat Oct 21 2023

Blue/Green Deployment on Kubernetes

Blue/green deployment offers a seamless and controlled approach to updating and releasing applications, ensuring smooth transitions and efficient management of different application versions.

Sat Oct 14 2023

Migrating Data From Many Sources With Change Data Capture on GCP (Part 1)

CDC provides real-time or near-real-time movement of data by moving and processing data continuously as new database events occur.

Sun May 28 2023

Mount Filestore for Use With Google Kubernetes Engine Clusters

Filestore instances are fully managed file servers on Google Cloud that can be connected to Compute Engine VMs, GKE clusters, external datastores such as Google Cloud VMware Engine, and your on-premises machines.

Sat May 20 2023

BigQuery resource management

Our goal in this project is to optimize resources and allocate them to the right workloads. By minimizing waste and maximizing resource utilization, we can give BigQuery users at Tiki the easiest and most optimized experience for their work.

Mon Aug 22 2022

Storage Feature Engineering for real-time prediction in MoMo

Feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data.

Fri Jul 09 2021

Data Discovery 1.0 in MoMo

Data discovery is the process of collecting data from various sources and detecting patterns and outliers with the help of guided advanced analytics and visual navigation of data, thus enabling the consolidation of all business information.

Thu Jul 08 2021