Data Platform Engineer

Division: Korea
Job Group: Tech/Product
Experience Level: All experience levels welcome
Job Type: Full-time
Location: Seoul Office, 561 Seolleung-ro, Gangnam-gu, Seoul

RLWRLD is a leading Physical AI company developing a Robotics Foundation Model (RFM) that enables robots to perceive, reason, and act in the real world like humans.


Building on deep research capabilities in AI and robotics and a strong data collaboration network with industrial partners in Japan, Korea, and beyond, RLWRLD is rapidly advancing our RFM to enable precise manipulation by high-degree-of-freedom robotic hands. The company is also collaborating with world-class research groups and partners in robotics and sensor solutions to develop AI models that can be practically deployed across industries such as manufacturing, logistics, and services.


Having raised approximately KRW 60 billion in cumulative seed funding from leading domestic and global venture capital firms and major corporations, RLWRLD continues to attract exceptional talent who are eager to drive innovation across AI, robotics technology, and business.








About the Product Organization


At RLWRLD, our Product Organization is responsible for developing all core products, spanning planning, development, and research.


We are building foundational technologies such as:

  • Robotics Foundation Model (RFM)
  • APIs/SDKs to deliver RFM functionality
  • Data pipeline & teleoperation tools
  • Training systems for model learning
  • Benchmark systems to test performance
  • Robot control systems
  • Infra stack (GPU orchestration, compute management)


Our team includes both research and software engineers, working fluidly across AI model development and software infrastructure. We collaborate closely with academic researchers, robotics hardware partners, and internal business developers to deliver cutting-edge robotics solutions.




Position Overview


In robotics AI development, the data pipeline is not merely a supporting system; it is core infrastructure that directly determines the speed of model experimentation and the stability of training.


This role focuses on automating and optimizing the entire flow of large-scale multimodal data generated by robots, from collection and preprocessing through storage, loading, and training. The systems you build will materially improve overall development productivity, enabling robotics model researchers and engineers to focus on experimentation without data bottlenecks.


We are looking for passionate and exceptional individuals who can ensure reliable multimodal data flow and significantly accelerate the pace of robotics development.




Key Responsibilities

  • Training Data Pipeline Design and Performance Optimization
      - Design end-to-end data pipelines covering training data collection through loading
      - Minimize processing latency so that model and engineering teams can use data immediately after offloading
      - Analyze bottlenecks in large-scale data processing and optimize throughput and latency
  • Automation of Large-Scale Data Collection and Preprocessing
      - Build collection pipelines for multimodal robot sensor data, including cameras, depth sensors, IMUs, joint states, and force/torque sensors
      - Automate and parallelize preprocessing tasks such as noise removal, time alignment, synchronization, and format conversion
      - Design idempotent preprocessing architectures that support safe data reprocessing
  • Data Storage Architecture Design and Optimization
      - Design storage architectures for large-scale time-series and structured data
      - Define schemas and partitioning strategies for robot log formats such as Parquet, MCAP, and Protobuf
      - Optimize storage cost and performance for training, analytics, and backtesting workloads
  • Data Loading and Training Integration Optimization
      - Design data loading architectures that minimize I/O bottlenecks during model training
      - Optimize access to large-scale training datasets through sampling strategies, sharding, and caching
      - Ensure stable data loading in distributed (multi-GPU / multi-node) training environments
  • Data Pipeline Integration and Automation
      - Integrate data systems with robot testing and simulation pipelines
      - Build automated data processing and validation workflows integrated with CI/CD environments
      - Implement data quality checks, failure detection, and automated recovery logic
  • Monitoring and Operational Stability
      - Build monitoring systems for data throughput, latency, and failure rates
      - Establish rapid root-cause analysis and response processes for pipeline failures
      - Design observability and alerting systems for long-term, stable operation
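To illustrate the kind of idempotent, reprocessing-safe step described above, here is a minimal hypothetical sketch (not RLWRLD's actual pipeline; the function and cleaning logic are illustrative): output paths are derived from a content hash of the input, so reruns skip already-finished work, and writes are published atomically so a crash never leaves a partial output.

```python
import hashlib
import json
from pathlib import Path

def process_episode(raw: dict, out_dir: Path) -> Path:
    """Idempotent preprocessing step: the output path is a content hash
    of the raw record, so rerunning on the same input is a no-op."""
    key = hashlib.sha256(json.dumps(raw, sort_keys=True).encode()).hexdigest()[:16]
    out_path = out_dir / f"{key}.json"
    if out_path.exists():          # already processed -> safe to skip
        return out_path
    # Stand-in for real cleaning (noise removal, time alignment, ...)
    cleaned = {k: v for k, v in raw.items() if v is not None}
    tmp = out_path.with_suffix(".tmp")
    tmp.write_text(json.dumps(cleaned, sort_keys=True))
    tmp.rename(out_path)           # atomic publish: no partial outputs
    return out_path
```

Because reprocessing is a skip rather than a duplicate write, the same batch job can be retried after a failure without corrupting downstream datasets.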





Required Qualifications

  • Programming and Data Processing Skills
      - Experience with Python-based data processing (e.g., Pandas, Polars, PySpark)
      - Hands-on experience with large-scale datasets and performance optimization
      - Understanding of SQL/NoSQL database design, indexing, and query tuning
  • Data Engineering and Infrastructure Experience
      - Experience building and operating large-scale data pipelines or data platforms
      - Experience with distributed file systems and object storage (e.g., HDFS, AWS S3)
      - Experience operating data systems in containerized environments (e.g., Docker)
  • Automation and Operations Expertise
      - Experience automating ETL pipelines and batch/streaming data processing
      - Experience operating data workflows in CI/CD environments (e.g., Jenkins, GitLab CI)
      - Experience with automated testing and monitoring of data pipelines



Preferred Qualifications

  • ML/DL and MLOps Experience
      - Understanding of data requirements in machine learning and deep learning training pipelines
      - Experience managing large-scale training datasets and building reproducible experimentation environments
      - Experience with automated retraining, data versioning, and experiment tracking
  • Robotics Sensor Data Expertise
      - Understanding of robot sensor data characteristics, including RGB/depth cameras, IMUs, and joint states
      - Experience handling ROS/rosbag or similar robotics data formats
  • Performance and Systems Optimization
      - Experience with high-performance I/O, parallel processing, and cache design
      - Experience resolving data loading bottlenecks in GPU-based training environments
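One common pattern behind the sharding work mentioned above is assigning each training worker a disjoint subset of shard files, so no two workers duplicate I/O. A minimal hypothetical sketch (function name and round-robin policy are illustrative assumptions, not a specific library API):

```python
from typing import Sequence

def shard_for_worker(files: Sequence[str], rank: int, world_size: int) -> list[str]:
    """Deterministic round-robin sharding: worker `rank` out of
    `world_size` workers gets every world_size-th file, so the shards
    are disjoint and together cover the full dataset."""
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    # Sort first so every worker agrees on the assignment regardless
    # of filesystem listing order.
    return [f for i, f in enumerate(sorted(files)) if i % world_size == rank]
```

In practice this is the same idea implemented by distributed samplers in mainstream training frameworks; the deterministic sort is what keeps multi-node runs reproducible.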




Working Conditions

  • Work Location: 561 Seolleung-ro, Gangnam-gu, Seoul (RUBINA Building, Yeoksam-dong)
  • Employment Type: Full-time
  • Probationary Period
      - A three-month probationary period applies upon employment.
      - During this period, your work attitude and performance will be evaluated.
      - Depending on the evaluation results, the probationary period may be extended or the employment offer withdrawn.



How to Apply

  • Application Materials:
      - Resume in English or Korean
      - (Optional) Portfolio, research materials, or project documents showcasing your capabilities
  • Application Deadline: Rolling basis



Hiring Process

  • Document Screening → 1st Interview → 2nd Interview → 3rd Interview → Final Offer
  • Candidates who pass the document screening will be contacted individually.
  • Additional interview rounds or a coding test may be conducted if necessary.



Work Environment & Support

  • Flexible Work Schedule: Adjust your working hours autonomously to match your personal rhythm.
  • Equipment & Software Support: We provide job-specific equipment and essential software required for your role.
  • Office Amenities: Enjoy our in-office snack bar and coffee machines.
  • Holiday & Birthday Gifts: Small gifts are provided for holidays and birthdays.
  • Health Checkup Support: We support your well-being through regular health checkups.