AI Learns How To Play Physically Simulated Tennis At Grandmaster Level By Watching Tennis Matches
Full tutorial link > https://www.youtube.com/watch?v=m8W4l-peEBk
A system has been developed that can learn a range of physically simulated tennis skills from a vast collection of broadcast video demonstrations of tennis play. The system employs hierarchical models that combine a low-level imitation policy and a high-level motion planning policy to control the character's movements based on motion embeddings learned from the broadcast videos. By utilizing simple rewards and without the need for explicit annotations of stroke types, the system is capable of learning complex tennis shotmaking skills and stringing together multiple shots into extended rallies.
To account for the low quality of motions extracted from the broadcast videos, the system utilizes physics-based imitation to correct estimated motion and a hybrid control policy that overrides erroneous aspects of the learned motion embedding with corrections predicted by the high-level policy. The resulting controllers for physically-simulated tennis players are able to hit the incoming ball to target positions accurately using a diverse array of strokes (such as serves, forehands, and backhands), spins (including topspins and slices), and playing styles (such as one/two-handed backhands and left/right-handed play).
Overall, the system is able to synthesize two physically simulated characters playing extended tennis rallies with simulated racket and ball dynamics, demonstrating the effectiveness of the approach.
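The hybrid control described above can be illustrated with a minimal toy sketch in plain Python. All names and joint indices here are hypothetical, chosen only to show the idea: the full-body target pose comes from the motion-embedding decoder, while the unreliable wrist channels are overridden by corrections from the high-level policy.

```python
# Toy sketch of hybrid control (assumed structure, not the paper's code):
# the decoded full-body pose is kept, but the wrist joints are directly
# overwritten by the high-level policy's corrections.

WRIST_JOINTS = (20, 21)  # hypothetical joint indices for the racket wrist

def hybrid_target_pose(decoded_pose, wrist_correction):
    """Combine the decoder output with the high-level wrist override.

    decoded_pose: list of joint angles produced from the motion embedding.
    wrist_correction: dict {joint_index: corrected_angle} predicted by the
    high-level policy, replacing the erroneous decoded wrist motion.
    """
    pose = list(decoded_pose)   # copy; full body follows the embedding
    for joint, angle in wrist_correction.items():
        pose[joint] = angle     # wrist is set directly by the policy
    return pose
```

The rest of the body still moves according to the learned motion embedding, so only the channels that perception estimates poorly are taken over by the task-driven policy.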
Paper link
https://research.nvidia.com/labs/toronto-ai/vid2player3d/
Our Discord server
https://bit.ly/SECoursesDiscord
If I have been of assistance to you and you would like to show your support for my work, please consider becoming a patron on 🥰
https://www.patreon.com/SECourses
Technology & Science: News, Tips, Tutorials, Tricks, Best Applications, Guides, Reviews
https://www.youtube.com/playlist?list=PL_pbwdIyffsnkay6X91BWb9rrfLATUMr3
Playlist of StableDiffusion Tutorials, Automatic1111 and Google Colab Guides, DreamBooth, Textual Inversion / Embedding, LoRA, AI Upscaling, Pix2Pix, Img2Img
https://www.youtube.com/playlist?list=PL_pbwdIyffsmclLl0O144nQRnezKlNdx3
00:00:00 Introduction to amazing new AI technology that can learn to play tennis
00:00:18 The permission to upload video
00:00:26 The video of the paper starts with introduction
00:01:08 Motion capture has been the most common source of motion data for character animation
00:02:13 System Overview
00:03:07 Approach
00:05:00 Complex and Diverse Skills
00:06:05 Task Performance
00:06:46 Styles from Different Players
00:07:16 Two-Player Rallies
00:08:13 Ablation of Physics Correction
00:08:36 Ablation of Hybrid Control
00:08:58 Effects of Removing Residual Force Control
Computer animation faces a major challenge in developing controllers for physics-based character simulation and control. In recent years, a combination of deep reinforcement learning (DRL) and motion imitation techniques has yielded simulated characters with lifelike motions and athletic abilities. However, these systems typically rely on costly motion capture (mocap) data as a source of kinematic motions to imitate. Fortunately, video footage of athletic events is abundant and offers a rich source of in-activity motion data. This inspired a research paper by Zhang et al. that explores how video data can be leveraged to learn tennis skills.
The authors seek to answer several key questions, including how to use large-scale video databases of 3D tennis motion to produce controllers that can play full tennis rallies with simulated racket and ball dynamics, how to use state-of-the-art methods in data-driven and physically-based character animation to learn skills from video data, and how to learn character controllers with a diverse set of skills without explicit skill annotations.
To tackle these challenges, the authors propose a system that builds upon recent ideas in hierarchical physics-based character control. Their approach involves leveraging motions produced by physics-based imitation of example videos to learn a rich motion embedding for tennis actions. They then train a high-level motion controller that steers the character in the latent motion space to achieve higher-level task objectives, with low-level movements controlled by the imitation controller.
The system also addresses motion quality issues caused by perception errors in the learned motion embedding.
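As a rough illustration of the hierarchical structure described above, here is a minimal toy sketch in Python. The decoder and tracking functions are simple linear stand-ins, not the authors' conditional VAE or physics controller; they only show how the high-level policy plans in latent space while the low-level controller tracks the resulting target motion.

```python
# Toy sketch of the hierarchical loop (assumed structure, not the
# authors' implementation): a high-level policy picks latent actions,
# a decoder maps them to target poses, and a low-level controller
# moves the simulated character toward each target.

def decode_motion(prev_pose, z):
    """Stand-in for the cVAE decoder: next target pose = previous + latent."""
    return [p + dz for p, dz in zip(prev_pose, z)]

def low_level_track(sim_pose, target_pose, gain=0.5):
    """Stand-in for the imitation policy: a PD-like step toward the target."""
    return [s + gain * (t - s) for s, t in zip(sim_pose, target_pose)]

def rollout(init_pose, latents):
    """Run the two-level loop over a sequence of high-level latent actions."""
    target, sim = list(init_pose), list(init_pose)
    for z in latents:
        target = decode_motion(target, z)   # high level: plan in latent space
        sim = low_level_track(sim, target)  # low level: imitate the plan
    return sim
```

In the actual system the low-level step is a physics simulation driven by a learned imitation policy, so the character's motion stays physically plausible even when the planned kinematic target is imperfect.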
@article{zhang2023vid2player3d,
author = {Zhang, Haotian and Yuan, Ye and Makoviychuk, Viktor and Guo, Yunrong and Fidler, Sanja and Peng, Xue Bin and Fatahalian, Kayvon},
title = {Learning Physically Simulated Tennis Skills from Broadcast Videos},
journal = {ACM Trans. Graph.},
issue_date = {August 2023},
numpages = {14},
doi = {10.1145/3592408},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {physics-based character animation, imitation learning, reinforcement learning},
}
00:00:00 Greetings everyone. A group of brilliant researchers has recently published a new
00:00:05 research paper that enables AI to learn physically simulated tennis skills from broadcast videos.
00:00:11 Today I am excited to share their paper's amazing supplementary video in 4K super upscaled format. I
00:00:18 have obtained permission from the primary author of the paper and you can also find the link to
00:00:24 the paper in the video's description. In this paper, we present a system that allows physically
00:00:29 simulated characters to learn diverse and complex tennis skills from broadcast tennis videos.
00:00:37 Our simulated characters can hit consecutive incoming tennis balls
00:00:41 with a variety of tennis skills such as serve, forehand and backhand, topspin, and slice.
00:00:49 And the motions we generate resemble those of human players. The controllers can also be trained
00:00:55 using different players' motion data, enabling the characters to adopt different playing styles.
00:01:08 Motion capture has been the most common source of motion data for character animation. While
00:01:13 MoCap is able to record high-quality data, it can be difficult to use these systems to record
00:01:18 athletic motion, which can require large capture volumes and highly skilled actors.
00:01:25 On the other hand, human athletes are frequently recorded in videos, especially for sports. These
00:01:32 videos have the potential to be a valuable source of data for character animation by
00:01:37 providing a vast volume of in-activity data of highly specialized athletic motion. Despite being
00:01:44 large scale, the motions estimated from videos are usually of lower quality compared to mocap data.
00:01:51 While prior works have demonstrated learning skills from videos,
00:01:55 they are limited to reproducing short video clips. State-of-the-art, data-driven animation techniques
00:02:02 typically require high-quality motion data. Directly applying these methods to video data
00:02:08 may not produce natural human-like motions, and motions may not be precise enough to hit
00:02:13 incoming tennis balls close to desired locations. To enable characters to learn
00:02:19 skills from sports videos, we present a video imitation system that consists of four stages.
00:02:27 First, we estimate kinematic motions from source video clips. Secondly, a low-level imitation
00:02:34 policy is trained to imitate the kinematic motion for controlling the low-level behaviors
00:02:39 of the simulated character and generate physically corrected motion. Next, we fit conditional VAEs to
00:02:47 the corrected motion to learn a motion embedding that produces human-like tennis motions. Finally,
00:02:54 a high-level motion planning policy is trained to generate target kinematic motion from the
00:02:59 motion embedding, and then control a physically simulated character to perform a desired task.
00:03:09 To build our tennis motion data set from raw videos, we use a combination of 2D and
00:03:15 3D pose estimators to reconstruct the player's poses and root trajectories.
00:03:23 However, the estimated kinematic motions are pretty noisy, with jittering and foot
00:03:28 skating artifacts. More importantly, the wrist motion for controlling the racket is inaccurate,
00:03:36 since it is difficult to estimate the wrist or the racket motion due to occlusion and motion blur.
00:03:47 To address these artifacts, we train a low-level imitation policy to control a physically
00:03:54 simulated character to track these noisy kinematic motions and output physically corrected motions.
00:04:01 The resulting motions after correction are more physically plausible and stable compared to the
00:04:06 original kinematic motions. With the corrected motion dataset, we can construct a kinematic
00:04:14 motion embedding by fitting conditional VAEs to the motion data. Given the same initial pose,
00:04:21 diverse motions can be generated by sampling different trajectories of latents.
00:04:29 An additional benefit of the motion embedding is that it can
00:04:32 help smooth the motions and mitigate some of the jittering artifacts in the original motion data.
00:04:43 To address the inaccuracies in the wrist joint for precise control of the racket, we propose a hybrid
00:04:48 control structure where the full-body motion is controlled by the reference trajectories
00:04:53 from the motion embedding, while the wrist motion is directly controlled by the high-level policy.
00:05:03 With our system, various tennis skills can be learned such as serve,
00:05:07 forehand topspin, backhand topspin, and backhand slice. These skills are
00:05:15 learned using data from a right-handed player who used a one-handed backhand.
00:05:27 The simulated character can hit fast-coming tennis balls with diverse and complex skills.
00:05:38 When given a target spin direction,
00:05:40 such as a backspin, the character will hit the ball with a slice.
00:05:50 Here we visualize the skills with the character model used in our physics simulation.
00:06:09 The simulated characters can hit incoming tennis balls close to random target locations with high
00:06:15 precision. They can hit the same incoming tennis ball to various target locations,
00:06:25 or hit different incoming tennis balls to the same target.
00:06:31 In extreme cases, the simulated characters can still complete the task with exceptional skill,
00:06:37 such as hitting consecutive balls that land on the court edges. When constructing the motion
00:06:44 embedding with different players' motion, the simulated character can learn tennis skills in
00:06:53 different styles, such as a two-hand backhand swing learned using data from a right-handed
00:06:58 player who used a two-hand backhand, or holding the racket with the left
00:07:04 hand learned using data from a left-handed player who also used a two-hand backhand.
00:07:18 The learned controllers can further generate novel animations of tennis rallies between two players.
00:07:26 This rally is generated using controllers learned from two right-handed players.
00:07:42 This rally is generated using controllers learned
00:07:45 from a left-handed player and a right-handed player.
00:08:12 The physics correction is essential for constructing a good motion embedding for
00:08:16 generating natural tennis motions. Directly training the embedding from the uncorrected
00:08:21 kinematic motions will result in physically implausible motion that exhibits artifacts
00:08:27 such as foot skating and jittering. It also decreases precision when hitting the tennis balls.
00:08:35 The proposed hybrid control is crucial for precisely controlling
00:08:39 the tennis racket. Without the hybrid control to correct the wrist motions,
00:08:43 the simulated character may hit the ball, but fail to return it close to the target.
00:09:20 More details are available in the paper. Thank you for watching.
