Data Platform Engineer

Division: Korea
Job Group: Tech/Product
Experience Level: All experience levels welcome
Job Type: Full-time
Location: Seoul Office, 561 Seolleung-ro, Gangnam-gu, Seoul

RLWRLD is a leading Physical AI company developing a Robotics Foundation Model (RFM) that enables robots to perceive, reason, and act in the real world like humans.


Building on deep research capabilities in AI and robotics and a strong data collaboration network with industrial partners in Japan, Korea, and beyond, RLWRLD is rapidly advancing our RFM to enable precise manipulation by high-degree-of-freedom robotic hands. The company is also collaborating with world-class research groups and partners in robotics and sensor solutions to develop AI models that can be practically deployed across industries such as manufacturing, logistics, and services.


Having raised approximately KRW 60 billion in cumulative seed funding from leading domestic and global venture capital firms and major corporations, RLWRLD continues to attract exceptional talent who are eager to drive innovation across AI, robotics technology, and business.








About the Product Organization


At RLWRLD, our Product Organization is responsible for developing all core products, spanning planning, development, and research.


We are building foundational technologies such as:

  • Robotics Foundation Model (RFM)
  • APIs/SDKs to deliver RFM functionality
  • Data pipeline & teleoperation tools
  • Training systems for model learning
  • Benchmark systems to test performance
  • Robot control systems
  • Infra stack (GPU orchestration, compute management)


Our team includes both research and software engineers, working fluidly across AI model development and software infrastructure. We collaborate closely with academic researchers, robotics hardware partners, and internal business developers to deliver cutting-edge robotics solutions.




Position Overview


In robotics AI development, the data pipeline is not merely a supporting system; it is core infrastructure that directly determines the speed of model experimentation and the stability of training.


This role focuses on automating and optimizing the entire flow of large-scale multimodal data generated by robots, from collection and preprocessing through storage, loading, and training. The systems you build will materially improve overall development productivity, enabling robotics model researchers and engineers to focus on experimentation without data bottlenecks.


We are looking for passionate and exceptional individuals who can ensure reliable multimodal data flow and significantly accelerate the pace of robotics development.




Key Responsibilities

  • Training Data Pipeline Design and Performance Optimization
      - Design end-to-end data pipelines covering training data collection through loading
      - Minimize processing latency so that model and engineering teams can use data immediately after offloading
      - Analyze bottlenecks in large-scale data processing and optimize throughput and latency
  • Automation of Large-Scale Data Collection and Preprocessing
      - Build collection pipelines for multimodal robot sensor data, including cameras, depth sensors, IMUs, joint states, and force/torque sensors
      - Automate and parallelize preprocessing tasks such as noise removal, time alignment, synchronization, and format conversion
      - Design idempotent preprocessing architectures that support safe data reprocessing
  • Data Storage Architecture Design and Optimization
      - Design storage architectures for large-scale time-series and structured data
      - Define schemas and partitioning strategies for robot log formats such as Parquet, MCAP, and Protobuf
      - Optimize storage cost and performance for training, analytics, and backtesting workloads
  • Data Loading and Training Integration Optimization
      - Design data loading architectures that minimize I/O bottlenecks during model training
      - Optimize access to large-scale training datasets through sampling strategies, sharding, and caching
      - Ensure stable data loading in distributed (multi-GPU / multi-node) training environments
  • Data Pipeline Integration and Automation
      - Integrate data systems with robot testing and simulation pipelines
      - Build automated data processing and validation workflows integrated with CI/CD environments
      - Implement data quality checks, failure detection, and automated recovery logic
  • Monitoring and Operational Stability
      - Build monitoring systems for data throughput, latency, and failure rates
      - Establish rapid root-cause analysis and response processes for pipeline failures
      - Design observability and alerting systems for long-term, stable operation
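To illustrate the kind of idempotent, reprocessing-safe step described above, here is a minimal hypothetical sketch (not RLWRLD's actual pipeline; the function and cleaning logic are illustrative): output paths are derived from a content hash of the input, so reruns skip already-finished work, and writes are published atomically so a crash never leaves a partial output.

```python
import hashlib
import json
from pathlib import Path

def process_episode(raw: dict, out_dir: Path) -> Path:
    """Idempotent preprocessing step: the output path is a content hash
    of the raw record, so rerunning on the same input is a no-op."""
    key = hashlib.sha256(json.dumps(raw, sort_keys=True).encode()).hexdigest()[:16]
    out_path = out_dir / f"{key}.json"
    if out_path.exists():          # already processed -> safe to skip
        return out_path
    # Stand-in for real cleaning (noise removal, time alignment, ...)
    cleaned = {k: v for k, v in raw.items() if v is not None}
    tmp = out_path.with_suffix(".tmp")
    tmp.write_text(json.dumps(cleaned, sort_keys=True))
    tmp.rename(out_path)           # atomic publish: no partial outputs
    return out_path
```

Because reprocessing is a skip rather than a duplicate write, the same batch job can be retried after a failure without corrupting downstream datasets.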





Required Qualifications

  • Programming and Data Processing Skills
      - Experience with Python-based data processing (e.g., Pandas, Polars, PySpark)
      - Hands-on experience with large-scale datasets and performance optimization
      - Understanding of SQL/NoSQL database design, indexing, and query tuning
  • Data Engineering and Infrastructure Experience
      - Experience building and operating large-scale data pipelines or data platforms
      - Experience with distributed file systems and object storage (e.g., HDFS, AWS S3)
      - Experience operating data systems in containerized environments (e.g., Docker)
  • Automation and Operations Expertise
      - Experience automating ETL pipelines and batch/streaming data processing
      - Experience operating data workflows in CI/CD environments (e.g., Jenkins, GitLab CI)
      - Experience with automated testing and monitoring of data pipelines



Preferred Qualifications

  • ML/DL and MLOps Experience
      - Understanding of data requirements in machine learning and deep learning training pipelines
      - Experience managing large-scale training datasets and building reproducible experimentation environments
      - Experience with automated retraining, data versioning, and experiment tracking
  • Robotics Sensor Data Expertise
      - Understanding of robot sensor data characteristics, including RGB/depth cameras, IMUs, and joint states
      - Experience handling ROS/rosbag or similar robotics data formats
  • Performance and Systems Optimization
      - Experience with high-performance I/O, parallel processing, and cache design
      - Experience resolving data loading bottlenecks in GPU-based training environments
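One common pattern behind the sharding work mentioned above is assigning each training worker a disjoint subset of shard files, so no two workers duplicate I/O. A minimal hypothetical sketch (function name and round-robin policy are illustrative assumptions, not a specific library API):

```python
from typing import Sequence

def shard_for_worker(files: Sequence[str], rank: int, world_size: int) -> list[str]:
    """Deterministic round-robin sharding: worker `rank` out of
    `world_size` workers gets every world_size-th file, so the shards
    are disjoint and together cover the full dataset."""
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    # Sort first so every worker agrees on the assignment regardless
    # of filesystem listing order.
    return [f for i, f in enumerate(sorted(files)) if i % world_size == rank]
```

In practice this is the same idea implemented by distributed samplers in mainstream training frameworks; the deterministic sort is what keeps multi-node runs reproducible.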




Working Conditions

  • Work Location: 561 Seolleung-ro, Gangnam-gu, Seoul (RUBINA Building, Yeoksam-dong)
  • Employment Type: Full-time
  • Probationary Period
      - A three-month probationary period applies upon employment.
      - During this period, your work attitude and performance will be evaluated.
      - Depending on the evaluation results, the probationary period may be extended or the employment offer withdrawn.



How to Apply

  • Application Materials:
      - Resume in English or Korean
      - (Optional) Portfolio, research materials, or project documents showcasing your capabilities
  • Application Deadline: Rolling basis



Hiring Process

  • Document Screening → 1st Interview → 2nd Interview → 3rd Interview → Final Offer
  • Candidates who pass the document screening will be contacted individually.
  • Additional interview rounds or a coding test may be conducted if necessary.



Work Environment & Support

  • Flexible Work Schedule: Adjust your working hours autonomously to match your personal rhythm.
  • Equipment & Software Support: We provide job-specific equipment and essential software required for your role.
  • Office Amenities: Enjoy our in-office snack bar and coffee machines.
  • Holiday & Birthday Gifts: Small gifts are provided for holidays and birthdays.
  • Health Checkup Support: We support your well-being through regular health checkups.