Reinforcement Learning Research Engineer
Division
Korea
Job group
Tech/Product
Experience Level
Open to all experience levels
Job Types
Full-time
Locations
Seoul Office, 561 Seolleung-ro, Gangnam-gu, Seoul

RLWRLD is a leading Physical AI company developing a Robotics Foundation Model (RFM) that enables robots to perceive, reason, and act in the real world like humans.


Building on deep research capabilities in AI and robotics and a strong data-collaboration network with industrial partners in Japan, Korea, and beyond, RLWRLD is rapidly advancing our RFM to enable precise manipulation by high-degree-of-freedom robotic hands. The company is also collaborating with world-class research groups and partners in robotics and sensor solutions to develop AI models that can be practically deployed across industries such as manufacturing, logistics, and services.


Having raised approximately KRW 60 billion in cumulative seed funding from leading domestic and global venture capital firms and major corporations, RLWRLD continues to attract exceptional talent who are eager to drive innovation across AI, robotics technology, and business.








About the Product Organization


At RLWRLD, the Product Organization is responsible for all core products, spanning planning, development, and research.


We are building foundational technologies such as:

  • Robotics Foundation Model (RFM)
  • APIs/SDKs to deliver RFM functionality
  • Data pipeline & teleoperation tools
  • Training systems for model learning
  • Benchmark systems to test performance
  • Robot control systems
  • Infra stack (GPU orchestration, compute management)


Our team includes both research and software engineers, working fluidly across AI model development and software infrastructure. We collaborate closely with academic researchers, robotic hardware partners, and internal business developers to deliver cutting-edge robotics solutions.



Position Overview


We are seeking a real-world robot learning innovator who goes beyond simulation to tackle the complexity of real industrial environments with physical robots.


This role tackles core challenges that extend beyond laboratory-scale algorithms, continuously improving policies in real-robot environments through offline-to-online reinforcement learning (RL) strategies powered by large-scale data. In particular, you will design large end-to-end Vision-Language-Action (VLA) models that integrate vision, language, and action, and optimize them for real robot systems to deliver intelligent control models that operate reliably in real-world deployments.


We are looking for individuals who go beyond architecture design, overcoming real-world uncertainty through data to deliver high-performance control policies for next-generation robotics.




Key Responsibilities

  • Development of High-Performance VLA-Based Control Policies
    • Research and develop RL algorithms optimized for high-capacity generative models such as diffusion, flow-matching, and autoregressive models
    • Design and implement reinforcement learning methods tailored to high-dimensional representation learning
    • Push beyond the limitations of imitation learning by developing RL techniques that enable complex behaviors and robust handling of edge cases that are difficult to achieve with imitation alone
  • Building Practical Offline-to-Online RL Pipelines
    • Data-efficient RL: Develop sample-efficient offline-to-online RL algorithms that maximize performance using large-scale offline datasets with minimal real-robot interaction
    • Scalable pipelines: Design robust training and deployment pipelines that carry RL models beyond the research stage into production systems, enabling continuous application and improvement
  • Advanced Multimodal Reward Modeling
    • Complex task reward design: Research reward models that precisely evaluate success and progress in complex manipulation tasks using multimodal data (vision, tactile signals, language, etc.)
    • Human-in-the-loop & scalable supervision: Develop mechanisms that convert real-world industrial feedback into effective learning signals
  • Real-Robot-Centered Validation and Cross-Functional Collaboration
    • Real-world validation: Deploy developed models on real robot manipulators, analyze performance data, and prioritize real-world applicability
    • Cross-functional collaboration: Work closely with systems and hardware engineers so that algorithmic advances translate into optimal end-to-end robot system performance, including latency and stability
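The offline-to-online pipeline described above can be sketched in miniature. This is an illustrative outline only, not RLWRLD's actual system: `env_step` and `update` are hypothetical stand-ins for real rollout collection and a gradient step, and the mixed replay buffer reflects the common practice of retaining offline data during online fine-tuning for stability.

```python
import random

def offline_to_online(offline_data, env_step, update, online_steps=1000, utd=1):
    """Minimal offline-to-online RL loop (illustrative sketch).

    Phase 1 pretrains purely on the offline dataset; Phase 2 keeps
    training while mixing freshly collected online transitions into
    the same replay buffer. `utd` is the update-to-data ratio: the
    number of gradient steps taken per environment step.
    """
    buffer = list(offline_data)

    # Phase 1: offline pretraining on the static dataset.
    for _ in range(len(buffer)):
        update(random.choice(buffer))

    # Phase 2: online fine-tuning with a mixed offline/online buffer.
    for _ in range(online_steps):
        buffer.append(env_step())  # collect one real-robot transition
        for _ in range(utd):
            update(random.choice(buffer))

    return buffer
```

Keeping the offline data in the buffer during Phase 2 is one common way to limit distribution shift when the policy first acts on the real robot; actual pipelines add many safeguards (conservatism, safety filters) beyond this skeleton.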



Required Qualifications

  • Deep Learning & Generative Model Expertise
    • Strong understanding of modern architectures such as Transformers, diffusion models, and flow matching
    • Proven ability to implement and optimize these models for robotics control objectives
  • VLA or Large-Scale VLM Experience
    • Experience designing decision-making and control policies that integrate multimodal data
    • Hands-on experience applying large-scale models to real robotic tasks
  • Reinforcement Learning (RL) and Imitation Learning (IL) Proficiency
    • Deep understanding of, and practical experience with, offline RL and offline-to-online RL algorithms (e.g., CQL, IQL)
    • Experience with advanced imitation learning techniques beyond behavior cloning
    • Experience optimizing policies in high-dimensional action spaces
  • Programming and Development Environment Proficiency
    • Strong programming skills in Python with frameworks such as PyTorch or JAX
    • Ability to integrate models into real-world robotic systems
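As a point of reference for the IQL algorithm named above: its value-function update centers on an asymmetric (expectile) regression loss, which weights positive TD-style errors more heavily so the value estimate tracks an upper expectile of Q. A minimal PyTorch sketch follows; the function name and default `tau` are illustrative, not part of any RLWRLD codebase.

```python
import torch

def expectile_loss(diff: torch.Tensor, tau: float = 0.7) -> torch.Tensor:
    """Asymmetric L2 loss used in IQL's value update (sketch).

    `diff` is target_q - v(s). With tau > 0.5, positive errors are
    weighted by tau and negative errors by (1 - tau), pushing V(s)
    toward an upper expectile of the Q-value distribution.
    """
    weight = torch.full_like(diff, 1.0 - tau)
    weight[diff > 0] = tau
    return (weight * diff.pow(2)).mean()
```

With `tau = 0.5` this reduces to ordinary mean-squared error; raising `tau` makes the value estimate increasingly optimistic, which is what lets IQL extract a policy without querying out-of-distribution actions.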




Preferred Qualifications

  • Real-World Robot Learning Experience
    • Experience successfully deploying end-to-end control models on real robot manipulators without relying solely on simulation
  • Robotics-Focused Mathematics and Optimization Knowledge
    • Deep insight into the mathematical foundations of reinforcement learning, including dynamics, probability theory, and non-convex optimization
  • Top-Tier Research Credentials
    • First-author publications or presentations at leading AI and robotics conferences such as NeurIPS, ICML, ICLR, CVPR, RSS, ICRA, or IROS
  • Large-Scale Model Training and Infrastructure Experience
    • Experience with distributed training and optimization of large-parameter models in GPU cluster environments (multi-GPU, multi-node)
  • MLOps and Data Engineering Capabilities
    • Experience building pipelines that systematically manage large-scale interaction data generated by real robots and feed it back into training for continuous improvement




Working Conditions

  • Work Location: 561 Seolleung-ro, Gangnam-gu, Seoul (RUBINA Building, Yeoksam-dong)
  • Employment Type: Full-time
  • Probationary Period:
    • A three-month probationary period applies upon employment.
    • During this period, work attitude and performance will be evaluated.
    • Depending on the evaluation results, the probationary period may be extended or the employment offer withdrawn.



How to Apply

  • Application Materials:
    • Resume in English or Korean
    • (Optional) Portfolio, research materials, or project documents showcasing your capabilities
  • Application Deadline: Rolling basis



Hiring Process

  • Document Screening → 1st Interview → 2nd Interview → 3rd Interview → Final Offer
  • Candidates who pass the document screening will be contacted individually.
  • Additional coffee chats or a coding test may be conducted if necessary.



Work Environment & Support

  • Flexible Work Schedule: Adjust your working hours autonomously to match your personal rhythm.
  • Equipment & Software Support: We provide job-specific equipment and essential software required for your role.
  • Office Amenities: Enjoy our in-office snack bar and coffee machines.
  • Holiday & Birthday Gifts: Small gifts are provided for holidays and birthdays.
  • Health Checkup Support: We support your well-being through regular health checkups.