
Google’s DeepMind Creates AI To Generate Videos From A Single Frame


Google’s DeepMind has unveiled a neural network that can invent short videos from a single image frame, and how it pulls this off is worth a closer look.

As DeepMind noted on Twitter, the artificial intelligence model, named “Transframer” (a riff on “transformer,” the common AI architecture behind tools that generate text from partial prompts), “excels in video prediction and view synthesis” and is able to “generate 30 [second] videos from a single image.”

As the Transframer website notes, the AI builds its perspective videos by predicting target images from “context images” — in brief, extensive training data lets it “imagine” what an object, such as one of the chairs in DeepMind’s examples, would look like from different views and angles. Transframer can also unify a wide array of tasks, including image segmentation, view synthesis, and video interpolation.
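Conceptually, view synthesis means combining what is known from the context images to predict an unseen viewpoint. The toy sketch below illustrates only that idea with a simple distance-weighted blend; the function name `predict_target_view` and the heuristic itself are hypothetical, not DeepMind’s actual learned model.

```python
import numpy as np

def predict_target_view(context_images, context_views, query_view):
    """Toy stand-in for view synthesis: blend context images, weighting
    each by how close its camera pose is to the queried pose.
    (Transframer itself uses a learned generative model, not this heuristic.)"""
    poses = np.asarray(context_views, dtype=float)
    query = np.asarray(query_view, dtype=float)
    # nearer viewpoints contribute more to the predicted image
    weights = np.exp(-np.linalg.norm(poses - query, axis=1))
    weights /= weights.sum()
    stack = np.stack([np.asarray(im, dtype=float) for im in context_images])
    # weighted sum over the stack of context images -> one predicted image
    return np.tensordot(weights, stack, axes=1)
```

A real model replaces the blending heuristic with a transformer trained on many scenes, which is what lets it hallucinate genuinely unseen angles rather than interpolate.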

Transframer performs well across several video-generation benchmarks. The research team reports state-of-the-art results, describing the model as among the strongest and most competitive on few-shot view synthesis, and says it can produce coherent 30-second videos from a single image.
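Those 30-second videos come from rolling the model forward one frame at a time, with each new frame conditioned on the most recent ones. A minimal sketch of that autoregressive loop, where `step_fn` is a hypothetical stand-in for the trained next-frame model and 25 fps is an assumed frame rate:

```python
def generate_video(first_frame, step_fn, num_frames=30 * 25):
    """Autoregressive rollout: repeatedly predict the next frame from a
    short window of recent frames. `step_fn` stands in for a trained
    next-frame model; 30 s at an assumed 25 fps gives 750 frames."""
    frames = [first_frame]
    while len(frames) < num_frames:
        # condition each prediction on up to the last 3 frames
        frames.append(step_fn(frames[-3:]))
    return frames
```

Because each frame feeds the next prediction, small errors can accumulate over the rollout, which is why producing a *coherent* 30-second clip from one image is a notable claim.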


The model is impressive in that it appears to apply artificial depth perception and perspective to render what an image would look like if someone were to “move” around it, raising the possibility of entire video games built on machine-learning techniques instead of conventional rendering.

The proposed model also yielded promising results on eight tasks, including image classification, semantic segmentation, and optical flow prediction with no task-specific architectural components. 

Transframer can also be applied to tasks that require learning conditional structure from text or a single image, enabling applications such as video modelling, novel view synthesis, and multi-task vision.


By Saloni Behl

