Meta is challenging Google and Microsoft with ImageBind, a multi-sensory artificial intelligence (AI) that learns more like humans do. This new AI combines six types of data and, according to Meta, outperforms models trained on a single modality. Let's dive into this fascinating world and discover how this technology could change the way we perceive the world.
Multi-sensory AI for a complete experience
Meta takes a giant step forward in the race for the best artificial intelligence by introducing ImageBind. This AI model seeks to learn like humans, taking a multi-sensory approach that spans images, text, video and audio, as well as depth, thermal and inertial measurement unit (IMU) data.
ImageBind is part of Meta's initiative to create multimodal systems that can learn from different types of data. The AI doesn't just understand a single modality; it can relate one modality to the others. For example, from a photograph it can infer the likely sound, shape, and temperature of the objects pictured, and how they are moving.
An AI that outperforms other models
According to Meta, ImageBind outperforms AI models trained for a single modality. Unlike generative AIs like ChatGPT or Midjourney, it links six types of data into one multidimensional index (a shared embedding space). Researchers can use any of these modalities as input and cross-reference one against another.
Association learning: the key to success
ImageBind stands out for its use of a learning concept similar to that of humans. Meta explains that "as humans absorb information from the world, we innately use multiple senses." The company notes that simply viewing a still image can evoke an entire sensory experience, such as the sound of the scene it depicts.
“ImageBind uses the binding property of images, which means that they coexist with a variety of modalities and can serve as a bridge to connect them,” Meta adds. For example, this AI can link text to an image using web data or link motion to video using video data captured by wearable cameras with IMU sensors.
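The idea behind this "bridge" is that every modality is mapped into one shared embedding space, so cross-modal matching reduces to a nearest-neighbor search. The sketch below illustrates the principle with hand-made toy vectors and cosine similarity; the values are invented for illustration and are not real ImageBind outputs or API calls.

```python
import numpy as np

def normalize(v: np.ndarray) -> np.ndarray:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    return v / np.linalg.norm(v)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two unit vectors."""
    return float(np.dot(a, b))

# Toy embeddings in a shared 4-dimensional space (hypothetical values).
# In ImageBind the space has far more dimensions, but the principle is
# the same: related content lands nearby, regardless of modality.
image_beach   = normalize(np.array([0.9, 0.1, 0.0, 0.1]))  # photo of a beach
text_beach    = normalize(np.array([0.8, 0.2, 0.1, 0.0]))  # caption "waves on a beach"
audio_waves   = normalize(np.array([0.85, 0.0, 0.2, 0.1])) # recording of surf
audio_traffic = normalize(np.array([0.0, 0.9, 0.1, 0.3]))  # recording of traffic

# Cross-modal retrieval: the beach image is closer to the sound of waves
# than to traffic noise, even though image and audio were never paired
# directly -- the image acts as the bridge between modalities.
print(cosine(image_beach, audio_waves) > cosine(image_beach, audio_traffic))
print(cosine(image_beach, text_beach) > cosine(text_beach, audio_traffic))
```

Both comparisons print `True`: proximity in the shared space, not modality-specific training pairs, is what lets one modality retrieve another.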
Promising applications for all
Research shows that Meta's model improves with only a few training examples. While the initial results are promising, it will still be some time before we see applications similar to ChatGPT built on ImageBind. That hasn't stopped the company from talking up the technology's possibilities, however.
For example, ImageBind could generate an audio track tailored to a video of the sea that you recorded while on vacation. Or create a virtual reality experience that simulates a boat trip and adds all the necessary elements to make it immersive. Designers could even create animated shorts from an image and an audio file.
An open source project for collaborative development
Meta has announced that ImageBind will be open source, with the repository available to anyone on GitHub. Unlike OpenAI, the tech giant confirmed that it will stick to its strategy of opening its code to everyone, so that the community can improve it and catch errors. This initiative promises to stimulate innovation and accelerate the development of this AI for the benefit of all.