Mind Blowing Dream To Video Could Be Coming With Stable Diffusion Video Rebuild From Brain Activity
Full tutorial link > https://www.youtube.com/watch?v=dmzdoMnuloo
In this groundbreaking video, we delve into the realm of mind-video and brain-activity reconstruction, bringing you an in-depth discussion of a new research paper titled "Cinematic Mindscapes: High-quality Video Reconstruction from Brain Activity". This may open the door to the dream-to-video era.
If I have been of assistance to you and you would like to show your support for my work, please consider becoming a patron on Patreon 🥰
https://www.patreon.com/SECourses
Technology & Science: News, Tips, Tutorials, Tricks, Best Applications, Guides, Reviews
https://www.youtube.com/playlist?list=PL_pbwdIyffsnkay6X91BWb9rrfLATUMr3
Playlist of #StableDiffusion Tutorials, Automatic1111 and Google Colab Guides, DreamBooth, Textual Inversion / Embedding, LoRA, AI Upscaling, Pix2Pix, Img2Img
https://www.youtube.com/playlist?list=PL_pbwdIyffsmclLl0O144nQRnezKlNdx3
Research Paper
https://arxiv.org/pdf/2305.11675.pdf
This fascinating research explores the intersection of neurology, machine learning, and video generation, aiming to understand and recreate visual experiences directly from brain signals. Using advanced techniques such as masked brain modeling, multimodal contrastive learning, and co-training with an augmented Stable Diffusion model, the MinD-Video approach seeks to convert functional Magnetic Resonance Imaging (fMRI) data into high-quality videos.
We dissect the various components of the MinD-Video methodology, focusing on the fMRI encoder and the video generative model. We also discuss the paper's innovative use of progressive learning and explain the pre-processing of the fMRI data for efficient results.
Further, we explore how the research addresses the challenges of time delays and individual variations in brain activity. We go in depth on each stage of the progressive learning applied to the fMRI encoder, from general to semantic-related features and from large-scale pre-training to contrastive learning.
Discover how the Stable Diffusion model is adapted for video generation, and how scene-dynamic sparse causal attention ensures smooth video transitions. We also cover the use of adversarial guidance in controlling the diversity of generated videos and how attention maps help visualize the learning process.
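To make the flow described above concrete, here is a minimal Python sketch of the overall pipeline. Every name in it (FMRIEncoder, reconstruct_video, the video_diffusion sampler) is a hypothetical placeholder rather than the authors' actual code; it only illustrates how a window of fMRI frames becomes a conditioning embedding for a temporally inflated diffusion model.

```python
# Hypothetical sketch of the MinD-Video data flow; not the authors' real API.
import torch

class FMRIEncoder(torch.nn.Module):
    """Placeholder encoder: maps a sliding window of fMRI frames to a CLIP-like embedding."""
    def __init__(self, num_voxels: int, embed_dim: int = 768, window: int = 3):
        super().__init__()
        self.proj = torch.nn.Linear(num_voxels * window, embed_dim)

    def forward(self, fmri_window: torch.Tensor) -> torch.Tensor:
        # fmri_window: (batch, window, num_voxels) -> (batch, embed_dim)
        return self.proj(fmri_window.flatten(start_dim=1))

def reconstruct_video(fmri_window, encoder, video_diffusion, num_frames=8):
    """Sketch only: encode fMRI, then condition a temporally inflated diffusion model on it."""
    cond = encoder(fmri_window)                                  # semantic conditioning embedding
    return video_diffusion.sample(cond, num_frames=num_frames)   # hypothetical sampler call

encoder = FMRIEncoder(num_voxels=2000)
embedding = encoder(torch.randn(1, 3, 2000))  # one window of 3 fMRI frames
print(embedding.shape)                        # torch.Size([1, 768])
```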
Perfect for anyone interested in neuroscience, machine learning or video generation, this video provides a comprehensive overview of a cutting-edge approach in brain-activity reconstruction. Expand your knowledge and join the discussion as we explore the future of mind-video.
For a more detailed understanding, the link to the full research paper is provided in the description. Stay curious, keep learning, and don't forget to like, comment, and subscribe for more exciting content.
Abstract
Reconstructing human vision from brain activities has been an appealing task that helps to understand our cognitive process. Even though recent research has seen great success in reconstructing static images from non-invasive brain recordings, work on recovering continuous visual experiences in the form of videos is limited. In this work, we propose MinD-Video that learns spatiotemporal information from continuous fMRI data of the cerebral cortex progressively through masked brain modeling, multimodal contrastive learning with spatiotemporal attention, and co-training with an augmented Stable Diffusion model that incorporates network temporal inflation. We show that high-quality videos of arbitrary frame rates can be reconstructed with MinD-Video using adversarial guidance. The recovered videos were evaluated with various semantic and pixel-level metrics. We achieved an average accuracy of 85% in semantic classification tasks and 0.19 in structural similarity index (SSIM), outperforming the previous state-of-the-art by 45%. We also show that our model is biologically plausible and interpretable, reflecting established physiological processes.
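As a quick illustration of the pixel-level metric quoted above, here is a minimal sketch of computing SSIM between a ground-truth frame and a reconstructed frame with scikit-image; the frames below are random placeholders, not real reconstructions.

```python
# Minimal sketch: SSIM between a reconstructed frame and its ground-truth frame.
import numpy as np
from skimage.metrics import structural_similarity

ground_truth = np.random.rand(256, 256, 3)   # placeholder ground-truth frame
reconstructed = np.random.rand(256, 256, 3)  # placeholder reconstructed frame

score = structural_similarity(
    ground_truth, reconstructed,
    channel_axis=-1,   # color images
    data_range=1.0,    # frame values are in [0, 1]
)
print(f"SSIM: {score:.3f}")
```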
Introduction
Life unfolds like a film reel, each moment seamlessly transitioning into the next, forming a “perpetual theater” of experiences. This dynamic narrative shapes our perception, explored through the naturalistic paradigm, painting the brain as a moviegoer engrossed in the relentless film of experience. Understanding the information hidden within our complex brain activities is a major puzzle in cognitive neuroscience. Recreating human vision from brain recordings, especially using non-invasive tools like functional Magnetic Resonance Imaging (fMRI), is an exciting but difficult task. Non-invasive methods, while less intrusive, capture limited information and are susceptible to various interferences such as noise. Furthermore, the acquisition of neuroimaging data is a complex, costly process. Despite these complexities, progress has been made, notably in learning valuable fMRI features with limited fMRI-annotation pairs.
#MinDVideo #fMRI
00:00:00 Greetings everyone.
00:00:01 It looks like Dream to Video is coming soon.
00:00:03 Today, I will introduce you to a new research paper: MinD-Video.
00:00:07 Cinematic Mindscapes: High-quality Video Reconstruction from Brain Activity.
00:00:12 The research paper focuses on reconstructing high-quality videos from brain activity, aiming to understand the cognitive process and visual perception.
00:00:20 The proposed approach, called MinD-Video, utilizes masked brain modeling, multimodal contrastive learning, and co-training with an augmented Stable Diffusion model to learn spatiotemporal information from continuous functional Magnetic Resonance Imaging (fMRI) data.
00:00:37 The paper focuses on reconstructing human vision from brain recordings, particularly using non-invasive tools like fMRI.
00:00:44 The unique challenge of reconstructing dynamic visual experiences from fMRI data is addressed, considering the time delays in capturing brain activity and the variations in hemodynamic response across individuals.
00:00:56 The MinD-Video methodology consists of two modules: an fMRI encoder and a video generative model.
00:01:04 The fMRI encoder progressively learns from brain signals, starting with general visual fMRI features obtained through large-scale unsupervised learning with masked brain modeling.
00:01:16 Semantic-related features are then distilled using multimodal contrastive learning in the Contrastive Language-Image Pre-Training (CLIP) space.
00:01:24 The augmented Stable Diffusion model is employed for video generation, with scene-dynamic sparse causal attention to handle scene changes and temporal constraints.
00:01:33 The fMRI data captured during visual stimuli is pre-processed to identify the regions of interest (ROIs) in the visual cortex.
00:01:43 The activated voxels are determined through statistical tests, and the top 50% most significant voxels are selected.
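As a rough illustration of this voxel-selection step, the sketch below keeps the most significant half of the voxels using a generic one-sample t-test; the exact statistical test and thresholds used in the paper may differ.

```python
# Hedged sketch of ROI voxel selection: keep the top 50% most significant voxels.
import numpy as np
from scipy import stats

def select_top_voxels(fmri, keep_fraction=0.5):
    """fmri: (num_timepoints, num_voxels) array of visual-cortex signals."""
    # Per-voxel significance of activation across time (generic stand-in test).
    t_vals, p_vals = stats.ttest_1samp(fmri, popmean=0.0, axis=0)
    num_keep = int(fmri.shape[1] * keep_fraction)
    keep_idx = np.argsort(p_vals)[:num_keep]   # smaller p-value = more significant
    return fmri[:, keep_idx], keep_idx

signals = np.random.randn(300, 4000)           # placeholder fMRI time series
selected, idx = select_top_voxels(signals)
print(selected.shape)                          # (300, 2000)
```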
00:01:50 Progressive learning is employed as an efficient training scheme for the fMRI encoder.
00:01:55 The encoder undergoes multiple stages to learn fMRI features progressively, starting from general features to more specific and semantic-related features.
00:02:05 Large-scale pre-training with masked brain modeling is utilized to learn general features of the visual cortex.
00:02:11 An autoencoder architecture is trained on the Human Connectome Project dataset using the visual cortex regions defined by a parcellation method.
00:02:19 The goal of this pre-training is to obtain rich and compact embeddings that describe the original fMRI data effectively.
00:02:27 Spatiotemporal attention is introduced to process multiple fMRI frames in a sliding window, considering the time delays caused by the hemodynamic response.
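Below is a minimal sketch of the masked brain modeling idea, assuming the fMRI window has already been split into patch vectors; the mask ratio, shapes, and the tiny MLP are placeholders rather than the paper's actual ViT-style architecture.

```python
# Minimal sketch of masked brain modeling: hide most patches, reconstruct them.
import torch
import torch.nn as nn

class MaskedBrainAutoencoder(nn.Module):
    def __init__(self, patch_dim=256, hidden=512):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(patch_dim, hidden), nn.GELU())
        self.decoder = nn.Linear(hidden, patch_dim)

    def forward(self, patches, mask_ratio=0.75):
        # patches: (batch, num_patches, patch_dim) from a sliding window of fMRI frames
        mask = torch.rand(patches.shape[:2], device=patches.device) < mask_ratio
        visible = patches.masked_fill(mask.unsqueeze(-1), 0.0)   # zero out masked patches
        recon = self.decoder(self.encoder(visible))
        loss = ((recon - patches) ** 2)[mask].mean()             # loss only on masked positions
        return loss, recon

model = MaskedBrainAutoencoder()
window = torch.randn(4, 64, 256)   # placeholder: batch of 4 patchified fMRI windows
loss, _ = model(window)
loss.backward()
```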
00:02:37 The augmented fMRI encoder is further trained using multimodal contrastive learning.
00:02:42 Triplets consisting of fMRI, video, and caption are used for training.
00:02:47 Videos are downsampled and captioned with the BLIP model.
00:02:51 Contrastive learning is applied to pull the fMRI embeddings closer to a shared CLIP space, which contains rich semantic information.
00:02:59 The aim is to make the fMRI embeddings more understandable by the generative model during conditioning.
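The sketch below shows a CLIP-style symmetric contrastive (InfoNCE) loss that pulls fMRI embeddings toward the CLIP embeddings of their paired captions; the embedding dimension, batch size, and temperature are assumptions for illustration only.

```python
# Hedged sketch of the multimodal contrastive stage (InfoNCE in CLIP space).
import torch
import torch.nn.functional as F

def clip_style_contrastive_loss(fmri_emb, clip_emb, temperature=0.07):
    """fmri_emb, clip_emb: (batch, dim) embeddings of matched fMRI/caption pairs."""
    fmri_emb = F.normalize(fmri_emb, dim=-1)
    clip_emb = F.normalize(clip_emb, dim=-1)
    logits = fmri_emb @ clip_emb.t() / temperature        # (batch, batch) similarities
    targets = torch.arange(len(logits), device=logits.device)
    # Symmetric cross-entropy: match fMRI -> caption and caption -> fMRI.
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

loss = clip_style_contrastive_loss(torch.randn(8, 768), torch.randn(8, 768))
print(loss.item())
```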
00:03:04 The Stable Diffusion model is used as the base generative model, modified to handle video generation.
00:03:11 Scene-dynamic sparse causal attention is employed to condition each video frame on its previous two frames, allowing for scene changes while ensuring video smoothness.
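A simplified sketch of that idea is below: queries come from the current frame while keys and values come from its previous two frames. The real module sits inside the diffusion U-Net's attention blocks, so this standalone function is only illustrative.

```python
# Hedged sketch of sparse causal attention over video frames.
import torch
import torch.nn.functional as F

def sparse_causal_attention(frame_feats, q_proj, k_proj, v_proj):
    """frame_feats: (num_frames, num_tokens, dim) spatial features per video frame."""
    outputs = []
    for i in range(frame_feats.shape[0]):
        prev = [max(i - 2, 0), max(i - 1, 0)]           # indices of the previous two frames
        q = q_proj(frame_feats[i])                      # queries from the current frame
        kv_src = torch.cat([frame_feats[j] for j in prev], dim=0)
        k, v = k_proj(kv_src), v_proj(kv_src)           # keys/values from previous frames
        attn = F.softmax(q @ k.t() / q.shape[-1] ** 0.5, dim=-1)
        outputs.append(attn @ v)
    return torch.stack(outputs)

dim = 64
q_proj, k_proj, v_proj = (torch.nn.Linear(dim, dim) for _ in range(3))
out = sparse_causal_attention(torch.randn(6, 16, dim), q_proj, k_proj, v_proj)
print(out.shape)   # torch.Size([6, 16, 64])
```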
00:03:20 Adversarial guidance is introduced to control the diversity of generated videos based on positive and negative conditions.
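A minimal sketch of such guidance at sampling time, written in the style of classifier-free guidance with a negative condition; the unet callable and guidance scale are placeholders, and the paper's exact formulation may differ.

```python
# Hedged sketch of adversarial (positive/negative) guidance during sampling.
import torch

def guided_noise_prediction(unet, latents, t, positive_cond, negative_cond, scale=7.5):
    """unet(latents, t, cond) -> predicted noise; conditions are embedding tensors."""
    eps_pos = unet(latents, t, positive_cond)   # pull toward the desired semantics
    eps_neg = unet(latents, t, negative_cond)   # push away from the unwanted ones
    return eps_neg + scale * (eps_pos - eps_neg)
```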
00:03:27 The generative module is trained with the target dataset using text conditioning.
00:03:32 The paper aims to understand the biological principles of the decoding process.
00:03:37 Attention maps from different layers of the fMRI encoder are visualized to observe the transition from capturing local relations to recognizing global, abstract features.
00:03:47 The attention maps are projected back to brain surface maps, enabling the observation of each brain region's contributions and the learning progress through each training stage.
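Below is a hedged sketch of how such attention maps could be collected with forward hooks, assuming the encoder uses torch.nn.MultiheadAttention modules that return attention weights; projecting the maps back onto the cortical surface would be done separately with standard neuroimaging tooling.

```python
# Hedged sketch: gather per-layer attention weights from an encoder with hooks.
import torch

def collect_attention_maps(encoder, fmri_window):
    maps = {}

    def make_hook(name):
        def hook(module, inputs, output):
            # torch.nn.MultiheadAttention returns (attn_output, attn_weights);
            # attn_weights is averaged over heads by default.
            if isinstance(output, tuple) and output[1] is not None:
                maps[name] = output[1].detach()
        return hook

    handles = [m.register_forward_hook(make_hook(n))
               for n, m in encoder.named_modules()
               if isinstance(m, torch.nn.MultiheadAttention)]
    with torch.no_grad():
        encoder(fmri_window)
    for h in handles:
        h.remove()
    return maps   # {layer_name: (batch, query_tokens, key_tokens)}
```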
00:03:57 To learn more, please check the description for the link to the paper.
