
AI Learns How To Play Physically Simulated Tennis At Grandmaster Level By Watching Tennis Matches

FurkanGozukara edited this page Oct 25, 2025 · 1 revision


A system has been developed that can learn a range of physically simulated tennis skills from a vast collection of broadcast video demonstrations of tennis play. The system employs hierarchical models that combine a low-level imitation policy and a high-level motion planning policy to control the character's movements based on motion embeddings learned from the broadcast videos. By utilizing simple rewards and without the need for explicit annotations of stroke types, the system is capable of learning complex tennis shotmaking skills and stringing together multiple shots into extended rallies.
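The hierarchical split described above can be illustrated with a minimal toy sketch. Everything here is a hypothetical stand-in: the linear maps play the role of the learned high-level planner, the motion-embedding decoder, and the low-level imitation policy, and the dimensions are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions; the real system uses full-body poses and game state.
POSE_DIM, LATENT_DIM, TORQUE_DIM = 8, 4, 8

# Hypothetical stand-ins for the learned networks (fixed linear maps here).
W_plan = rng.standard_normal((LATENT_DIM, POSE_DIM + 3))  # high-level planner
W_dec = rng.standard_normal((POSE_DIM, LATENT_DIM))       # embedding decoder
W_imit = rng.standard_normal((TORQUE_DIM, 2 * POSE_DIM))  # imitation policy

def high_level_policy(pose, ball_state):
    """Map character pose + ball state to a latent motion command."""
    return np.tanh(W_plan @ np.concatenate([pose, ball_state]))

def decode_motion(z):
    """Decode a latent command into a target kinematic pose."""
    return W_dec @ z

def low_level_policy(pose, target_pose):
    """Imitation controller: track the target pose with joint torques."""
    return W_imit @ np.concatenate([pose, target_pose])

pose = np.zeros(POSE_DIM)
ball = np.array([1.0, 0.5, -0.2])
for _ in range(3):                       # a few control steps
    z = high_level_policy(pose, ball)    # plan in the latent motion space
    target = decode_motion(z)            # target motion from the embedding
    torques = low_level_policy(pose, target)
    pose = pose + 0.1 * torques[:POSE_DIM]  # stand-in for physics simulation

print(pose.shape)  # (8,)
```

The key point is the division of labor: the high-level policy only chooses where to steer in the latent motion space, while the low-level policy handles the physics of tracking the decoded motion.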

To account for the low quality of motions extracted from the broadcast videos, the system utilizes physics-based imitation to correct estimated motion and a hybrid control policy that overrides erroneous aspects of the learned motion embedding with corrections predicted by the high-level policy. The resulting controllers for physically-simulated tennis players are able to hit the incoming ball to target positions accurately using a diverse array of strokes (such as serves, forehands, and backhands), spins (including topspins and slices), and playing styles (such as one/two-handed backhands and left/right-handed play).
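The hybrid control idea can be sketched in a few lines. The joint layout and wrist indices below are invented for illustration; the real character has many more degrees of freedom.

```python
import numpy as np

# Hypothetical indices of the wrist degrees of freedom in a toy 8-DoF pose.
WRIST_DOFS = [6, 7]

def hybrid_target(embedding_pose, wrist_correction):
    """Full-body target comes from the motion embedding; the wrist DoFs
    are overridden by corrections predicted by the high-level policy."""
    target = embedding_pose.copy()
    target[WRIST_DOFS] = wrist_correction
    return target

embedding_pose = np.linspace(0.0, 0.7, 8)  # pose decoded from the embedding
wrist_correction = np.array([0.25, -0.1])  # high-level policy's prediction
target = hybrid_target(embedding_pose, wrist_correction)
print(target[WRIST_DOFS])  # wrist DoFs now match the correction
```

Only the joints whose estimates are unreliable (the wrist, due to occlusion and motion blur in broadcast footage) are overridden; the rest of the body still follows the learned embedding.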

Overall, the system is able to synthesize two physically simulated characters playing extended tennis rallies with simulated racket and ball dynamics, demonstrating the effectiveness of the approach.

Paper link ⤵️

https://research.nvidia.com/labs/toronto-ai/vid2player3d/

Our Discord server ⤵️

https://bit.ly/SECoursesDiscord

If I have been of assistance to you and you would like to show your support for my work, please consider becoming a patron 🥰 ⤵️

https://www.patreon.com/SECourses

Technology & Science: News, Tips, Tutorials, Tricks, Best Applications, Guides, Reviews ⤵️

https://www.youtube.com/playlist?list=PL_pbwdIyffsnkay6X91BWb9rrfLATUMr3

Playlist of StableDiffusion Tutorials, Automatic1111 and Google Colab Guides, DreamBooth, Textual Inversion / Embedding, LoRA, AI Upscaling, Pix2Pix, Img2Img ⤵️

https://www.youtube.com/playlist?list=PL_pbwdIyffsmclLl0O144nQRnezKlNdx3

00:00:00 Introduction to the new AI technology that learns to play tennis

00:00:18 Permission to upload the video

00:00:26 The paper's video begins with its introduction

00:01:08 Motion capture has been the most common source of motion data for character animation

00:02:13 System Overview

00:03:07 Approach

00:05:00 Complex and Diverse Skills

00:06:05 Task Performance

00:06:46 Styles from Different Players

00:07:16 Two-Player Rallies

00:08:13 Ablation of Physics Correction

00:08:36 Ablation of Hybrid Control

00:08:58 Effects of Removing Residual Force Control

Computer animation faces a major challenge in developing controllers for physics-based character simulation and control. In recent years, a combination of deep reinforcement learning (DRL) and motion imitation techniques has yielded simulated characters with lifelike motions and athletic abilities. However, these systems typically rely on costly motion capture (mocap) data as a source of kinematic motions to imitate. Fortunately, video footage of athletic events is abundant and offers a rich source of in-activity motion data. This inspired a research paper by Zhang et al. that explores how video data can be leveraged to learn tennis skills.

The authors seek to answer several key questions: How can large-scale video databases of 3D tennis motion be used to produce controllers that play full tennis rallies with simulated racket and ball dynamics? How can state-of-the-art methods in data-driven, physically-based character animation learn skills from video data? And how can character controllers acquire a diverse set of skills without explicit skill annotations?

To tackle these challenges, the authors propose a system that builds upon recent ideas in hierarchical physics-based character control. Their approach involves leveraging motions produced by physics-based imitation of example videos to learn a rich motion embedding for tennis actions. They then train a high-level motion controller that steers the character in the latent motion space to achieve higher-level task objectives, with low-level movements controlled by the imitation controller.
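The motion embedding itself is a conditional VAE fit to the physics-corrected motions: conditioned on the previous pose, sampling different latent trajectories yields diverse continuations. The sketch below uses a hypothetical linear decoder in place of the paper's learned networks, just to show the sampling structure.

```python
import numpy as np

rng = np.random.default_rng(1)
POSE_DIM, LATENT_DIM = 8, 4

# Hypothetical conditional-VAE decoder: next pose from (previous pose,
# latent sample). The real system uses deep networks trained on
# physics-corrected tennis motions.
W_z = rng.standard_normal((POSE_DIM, LATENT_DIM)) * 0.3
W_c = np.eye(POSE_DIM)  # condition on the previous pose

def decode(prev_pose, z):
    return W_c @ prev_pose + W_z @ z

init_pose = np.zeros(POSE_DIM)
# Same initial pose, different latent trajectories -> diverse motions.
motion_a, motion_b = [init_pose], [init_pose]
for _ in range(5):
    motion_a.append(decode(motion_a[-1], rng.standard_normal(LATENT_DIM)))
    motion_b.append(decode(motion_b[-1], rng.standard_normal(LATENT_DIM)))

print(np.allclose(motion_a[-1], motion_b[-1]))  # False: different latents
```

During task training, the high-level policy replaces the random sampling: it picks the latent sequence that makes the decoded motion accomplish the tennis objective.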

The system also addresses motion quality issues in the learned motion embedding that are caused by perception errors in the video-based pose estimation.

@article{zhang2023vid2player3d,
  author = {Zhang, Haotian and Yuan, Ye and Makoviychuk, Viktor and Guo, Yunrong and Fidler, Sanja and Peng, Xue Bin and Fatahalian, Kayvon},
  title = {Learning Physically Simulated Tennis Skills from Broadcast Videos},
  journal = {ACM Trans. Graph.},
  issue_date = {August 2023},
  numpages = {14},
  doi = {10.1145/3592408},
  publisher = {ACM},
  address = {New York, NY, USA},
  keywords = {physics-based character animation, imitation learning, reinforcement learning},
}

Video Transcription

  • 00:00:00 Greetings everyone. A group of brilliant  researchers has recently published a new  

  • 00:00:05 research paper that enables AI to learn physically  simulated tennis skills from broadcast videos.  

  • 00:00:11 Today I am excited to share their paper's amazing  supplementary video in 4K super upscaled format. I  

  • 00:00:18 have obtained permission from the primary author  of the paper and you can also find the link to  

  • 00:00:24 the paper in the video's description. In this  paper, we present a system that allows physically  

  • 00:00:29 simulated characters to learn diverse and complex  tennis skills from broadcast tennis videos.  

  • 00:00:37 Our simulated characters can hit  consecutive incoming tennis balls  

  • 00:00:41 with a variety of tennis skills such as serve,  forehand and backhand, topspin, and slice.  

  • 00:00:49 And the motions we generate resemble those of  human players. The controllers can also be trained  

  • 00:00:55 using different players' motion data, enabling the characters to adopt different playing styles.

  • 00:01:08 Motion capture has been the most common source  of motion data for character animation. While  

  • 00:01:13 MoCap is able to record high-quality data, it  can be difficult to use these systems to record  

  • 00:01:18 athletic motion, which can require large  capture volumes and highly skilled actors.  

  • 00:01:25 On the other hand, human athletes are frequently  recorded in videos, especially for sports. These  

  • 00:01:32 videos have the potential to be a valuable  source of data for character animation by  

  • 00:01:37 providing a vast volume of in-activity data of highly specialized athletic motion. Despite being

  • 00:01:44 large scale, the motions estimated from videos are usually of lower quality compared to mocap data.

  • 00:01:51 While prior works have demonstrated  learning skills from videos,  

  • 00:01:55 they are limited to reproducing short video clips.  State-of-the-art, data-driven animation techniques  

  • 00:02:02 typically require high-quality motion data.  Directly applying these methods to video data  

  • 00:02:08 may not produce natural human-like motions,  and motions may not be precise enough to hit  

  • 00:02:13 incoming tennis balls close to desired  locations. To enable characters to learn  

  • 00:02:19 skills from sports videos, we present a video  imitation system that consists of four stages.  

  • 00:02:27 First, we estimate kinematic motions from source  video clips. Secondly, a low-level imitation  

  • 00:02:34 policy is trained to imitate the kinematic  motion for controlling the low-level behaviors  

  • 00:02:39 of the simulated character and generate physically  corrected motion. Next, we fit conditional VAEs to  

  • 00:02:47 the corrected motion to learn a motion embedding  that produces human-like tennis motions. Finally,  

  • 00:02:54 a high-level motion planning policy is trained  to generate target kinematic motion from the  

  • 00:02:59 motion embedding, and then control a physically  simulated character to perform a desired task.  

  • 00:03:09 To build our tennis motion data set from  raw videos, we use a combination of 2D and  

  • 00:03:15 3D pose estimators to reconstruct the player's poses and root trajectories.

  • 00:03:23 However, the estimated kinematic motions  are pretty noisy, with jittering and foot  

  • 00:03:28 skating artifacts. More importantly, the wrist  motion for controlling the racket is inaccurate,  

  • 00:03:36 since it is difficult to estimate the wrist or the  racket motion due to occlusion and motion blur.  

  • 00:03:47 To address these artifacts, we train a low-level  imitation policy to control a physically  

  • 00:03:54 simulated character to track these noisy kinematic  motions and output physically corrected motions.  

  • 00:04:01 The resulting motions after correction are more  physically plausible and stable compared to the  

  • 00:04:06 original kinematic motions. With the corrected  motion dataset, we can construct a kinematic  

  • 00:04:14 motion embedding by fitting conditional VAEs to  the motion data. Given the same initial pose,  

  • 00:04:21 diverse motions can be generated by sampling different latent trajectories.

  • 00:04:29 An additional benefit of the  motion embedding is that it can  

  • 00:04:32 help smooth the motions and mitigate some of the  jittering artifacts in the original motion data.  

  • 00:04:43 To address the inaccuracies in the wrist joint for  precise control of the racket, we propose a hybrid  

  • 00:04:48 control structure where the full body motion  is controlled by the reference trajectories  

  • 00:04:53 from the motion embedding, while the wrist motion  is directly controlled by the high-level policy.  

  • 00:05:03 With our system, various tennis  skills can be learned such as serve,  

  • 00:05:07 forehand topspin, backhand topspin,  and backhand slice. These skills are  

  • 00:05:15 learned using data from a right-handed  player who used a one-handed backhand.  

  • 00:05:27 The simulated character can hit fast-coming  tennis balls with diverse and complex skills.  

  • 00:05:38 When given a target spin direction,  

  • 00:05:40 such as a backspin, the character  will hit the ball with a slice.  

  • 00:05:50 Here we visualize the skills with the  character model used in our physics simulation.  

  • 00:06:09 The simulated characters can hit incoming tennis  balls close to random target locations with high  

  • 00:06:15 precision. They can hit the same incoming  tennis ball to various target locations,  

  • 00:06:25 or hit different incoming  tennis balls to the same target.  

  • 00:06:31 In the extreme cases, the simulated characters can  still complete the task with exceptional skill,  

  • 00:06:37 such as hitting consecutive balls that land on  the court edges. When constructing the motion  

  • 00:06:44 embedding with different players' motion, the  simulated character can learn tennis skills in  

  • 00:06:53 different styles, such as a two-hand backhand  swing learned using data from a right-handed  

  • 00:06:58 player who used a two-hand backhand,  or holding the racket with the left  

  • 00:07:04 hand learned using data from a left-handed  player who also used a two-hand backhand.  

  • 00:07:18 The learned controllers can further generate novel  animations of tennis rallies between two players.  

  • 00:07:26 This rally is generated using controllers  learned from two right-handed players.  

  • 00:07:42 This rally is generated using controllers learned  

  • 00:07:45 from a left-handed player  and a right-handed player.  

  • 00:08:12 The physics correction is essential for  constructing a good motion embedding for  

  • 00:08:16 generating natural tennis motions. Directly  training the embedding from the uncorrected  

  • 00:08:21 kinematic motions will result in physically  implausible motion that exhibits artifacts  

  • 00:08:27 such as foot skating and jittering. It also  decreases precision when hitting the tennis balls.  

  • 00:08:35 The proposed hybrid control is  crucial for precisely controlling  

  • 00:08:39 the tennis racket. Without the hybrid  control to correct the wrist motions,  

  • 00:08:43 the simulated character may hit the ball,  but fail to return it close to the target.  

  • 00:09:20 More details are available in the  paper. Thank you for watching.
