Reinforcement Learning Research Engineer
Division
Korea
Job group
Tech/Product
Experience Level
Open to all experience levels
Job Types
Full-time
Locations
Seoul Office, 561 Seolleung-ro, Gangnam-gu, Seoul

RLWRLD is a leading Physical AI company developing a Robotics Foundation Model (RFM) that enables robots to perceive, reason, and act in the real world like humans.


Building on deep research capabilities in AI and robotics and a strong data-collaboration network with industrial partners in Japan, Korea, and beyond, RLWRLD is rapidly advancing our RFM to enable precise manipulation by high-degree-of-freedom robotic hands. The company is also collaborating with world-class research groups and partners in robotics and sensor solutions to develop AI models that can be practically deployed across industries such as manufacturing, logistics, and services.


Having raised approximately KRW 60 billion in cumulative seed funding from leading domestic and global venture capital firms and major corporations, RLWRLD continues to attract exceptional talent who are eager to drive innovation across AI, robotics technology, and business.








About the Product Organization


At RLWRLD, the Product Organization is responsible for all core products, spanning planning, development, and research.


We are building foundational technologies such as:

  • Robotics Foundation Model (RFM)
  • APIs/SDKs to deliver RFM functionality
  • Data pipeline & teleoperation tools
  • Training systems for model learning
  • Benchmark systems to test performance
  • Robot control systems
  • Infra stack (GPU orchestration, compute management)


Our team includes both research and software engineers, working fluidly across AI model development and software infrastructure. We collaborate closely with academic researchers, robotic hardware partners, and internal business developers to deliver cutting-edge robotics solutions.



Position Overview


We are seeking a real-world robot learning innovator who goes beyond simulation to tackle the complexity of real industrial environments with physical robots.


This role tackles core challenges that extend beyond laboratory-scale algorithms, continuously improving policies in real-robot environments through offline-to-online reinforcement learning (RL) strategies powered by large-scale data. In particular, you will design large end-to-end Vision-Language-Action (VLA) models that integrate vision, language, and action, and optimize them for real robot systems to deliver intelligent control models that operate reliably in real-world deployments.


We are looking for individuals who go beyond architecture design, overcoming real-world uncertainty through data to deliver high-performance control policies for next-generation robotics.




Key Responsibilities

  • Development of High-Performance VLA-Based Control Policies
    • Research and develop RL algorithms optimized for high-capacity generative models such as diffusion, flow-matching, and autoregressive models
    • Design and implement reinforcement learning methods tailored to high-dimensional representation learning
    • Push beyond the limitations of imitation learning by developing RL techniques that enable complex behaviors and robust handling of edge cases that are difficult to achieve with imitation alone
  • Building Practical Offline-to-Online RL Pipelines
    • Data-efficient RL: Develop sample-efficient offline-to-online RL algorithms that maximize performance using large-scale offline datasets with minimal real-robot interaction
    • Scalable pipelines: Design robust training and deployment pipelines that carry RL models beyond the research stage into production systems, enabling continuous application and improvement
  • Advanced Multimodal Reward Modeling
    • Complex task reward design: Research reward models that precisely evaluate success and progress in complex manipulation tasks using multimodal data (vision, tactile signals, language, etc.)
    • Human-in-the-loop & scalable supervision: Develop mechanisms that convert real-world industrial feedback into effective learning signals
  • Real-Robot-Centered Validation and Cross-Functional Collaboration
    • Real-world validation: Deploy developed models on real robot manipulators, analyze performance data, and prioritize real-world applicability
    • Cross-functional collaboration: Work closely with systems and hardware engineers so that algorithmic advances translate into optimal end-to-end robot system performance, including latency and stability
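The offline-to-online pipeline described above can be sketched in miniature. This is an illustrative outline only, not RLWRLD's actual system: `env_step` and `update` are hypothetical stand-ins for real rollout collection and a gradient step, and the mixed replay buffer reflects the common practice of retaining offline data during online fine-tuning for stability.

```python
import random

def offline_to_online(offline_data, env_step, update, online_steps=1000, utd=1):
    """Minimal offline-to-online RL loop (illustrative sketch).

    Phase 1 pretrains purely on the offline dataset; Phase 2 keeps
    training while mixing freshly collected online transitions into
    the same replay buffer. `utd` is the update-to-data ratio: the
    number of gradient steps taken per environment step.
    """
    buffer = list(offline_data)

    # Phase 1: offline pretraining on the static dataset.
    for _ in range(len(buffer)):
        update(random.choice(buffer))

    # Phase 2: online fine-tuning with a mixed offline/online buffer.
    for _ in range(online_steps):
        buffer.append(env_step())  # collect one real-robot transition
        for _ in range(utd):
            update(random.choice(buffer))

    return buffer
```

Keeping the offline data in the buffer during Phase 2 is one common way to limit distribution shift when the policy first acts on the real robot; actual pipelines add many safeguards (conservatism, safety filters) beyond this skeleton.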



Required Qualifications

  • Deep Learning & Generative Model Expertise
    • Strong understanding of modern architectures such as Transformers, diffusion models, and flow matching
    • Proven ability to implement and optimize these models for robotics control objectives
  • VLA or Large-Scale VLM Experience
    • Experience designing decision-making and control policies that integrate multimodal data
    • Hands-on experience applying large-scale models to real robotic tasks
  • Reinforcement Learning (RL) and Imitation Learning (IL) Proficiency
    • Deep understanding of, and practical experience with, offline RL and offline-to-online RL algorithms (e.g., CQL, IQL)
    • Experience with advanced imitation learning techniques beyond behavior cloning
    • Experience optimizing policies in high-dimensional action spaces
  • Programming and Development Environment Proficiency
    • Strong programming skills in Python with frameworks such as PyTorch or JAX
    • Ability to integrate models into real-world robotic systems
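As a point of reference for the IQL algorithm named above: its value-function update centers on an asymmetric (expectile) regression loss, which weights positive TD-style errors more heavily so the value estimate tracks an upper expectile of Q. A minimal PyTorch sketch follows; the function name and default `tau` are illustrative, not part of any RLWRLD codebase.

```python
import torch

def expectile_loss(diff: torch.Tensor, tau: float = 0.7) -> torch.Tensor:
    """Asymmetric L2 loss used in IQL's value update (sketch).

    `diff` is target_q - v(s). With tau > 0.5, positive errors are
    weighted by tau and negative errors by (1 - tau), pushing V(s)
    toward an upper expectile of the Q-value distribution.
    """
    weight = torch.full_like(diff, 1.0 - tau)
    weight[diff > 0] = tau
    return (weight * diff.pow(2)).mean()
```

With `tau = 0.5` this reduces to ordinary mean-squared error; raising `tau` makes the value estimate increasingly optimistic, which is what lets IQL extract a policy without querying out-of-distribution actions.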




Preferred Qualifications

  • Real-World Robot Learning Experience
    • Experience successfully deploying end-to-end control models on real robot manipulators without relying solely on simulation
  • Robotics-Focused Mathematics and Optimization Knowledge
    • Deep insight into the mathematical foundations of reinforcement learning, including dynamics, probability theory, and non-convex optimization
  • Top-Tier Research Credentials
    • First-author publications or presentations at leading AI and robotics conferences such as NeurIPS, ICML, ICLR, CVPR, RSS, ICRA, or IROS
  • Large-Scale Model Training and Infrastructure Experience
    • Experience with distributed training and optimization of large-parameter models in GPU cluster environments (multi-GPU, multi-node)
  • MLOps and Data Engineering Capabilities
    • Experience building pipelines that systematically manage large-scale interaction data generated by real robots and feed it back into training for continuous improvement




Working Conditions

  • Work Location: 561 Seolleung-ro, Gangnam-gu, Seoul (RUBINA Building, Yeoksam-dong)
  • Employment Type: Full-time
  • Probationary Period:
    • A three-month probationary period applies upon employment.
    • During this period, work attitude and performance will be evaluated.
    • Depending on the evaluation results, the probationary period may be extended or the employment offer withdrawn.



How to Apply

  • Application Materials:
    • Resume in English or Korean
    • (Optional) Portfolio, research materials, or project documents showcasing your capabilities
  • Application Deadline: Rolling basis



Hiring Process

  • Document Screening → 1st Interview → 2nd Interview → 3rd Interview → Final Offer
  • Candidates who pass the document screening will be contacted individually.
  • Additional coffee chats or a coding test may be conducted if necessary.



Work Environment & Support

  • Flexible Work Schedule: Adjust your working hours autonomously to match your personal rhythm.
  • Equipment & Software Support: We provide job-specific equipment and essential software required for your role.
  • Office Amenities: Enjoy our in-office snack bar and coffee machines.
  • Holiday & Birthday Gifts: Small gifts are provided for holidays and birthdays.
  • Health Checkup Support: We support your well-being through regular health checkups.