Meta CEO Mark Zuckerberg continues to invest heavily in cutting-edge technologies, including artificial intelligence. His company has launched ImageBind, an AI model that learns a joint representation across six different modalities: images and video, text, audio, depth, thermal data and inertial motion (IMU) readings.
With this tool, Meta sets itself apart from its main competitors, whose models typically process only text, images, video and audio. ImageBind, by contrast, moves closer to the way humans combine sensory information by learning to link multiple modalities in a single embedding space. And according to Zuckerberg's company, this is just the beginning.
Focus on ImageBind, Meta's new AI tool
ImageBind is designed to help machines understand the environment around them more deeply. According to Meta, it can even upgrade existing AI models to support input from any of the six modalities, enabling audio-based search, cross-modal search, multimodal arithmetic and cross-modal generation.
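The key idea behind these applications is that every modality lands in the same embedding space, so vectors from different senses can be compared or even added together. The sketch below illustrates this with made-up toy vectors and NumPy; it is not Meta's code, and a real system would obtain the embeddings from the ImageBind model itself rather than hard-coding them.

```python
import numpy as np

def normalize(v):
    """L2-normalize so cosine similarity reduces to a dot product."""
    return v / np.linalg.norm(v)

# Hypothetical toy embeddings (3-D here; real ImageBind vectors are much
# higher-dimensional). Values are invented purely for illustration.
img_dog_on_beach = normalize(np.array([0.9, 0.1, 0.4]))
audio_clips = {
    "barking": normalize(np.array([0.8, 0.0, 0.1])),  # dog-like sound
    "waves":   normalize(np.array([0.1, 0.2, 0.9])),  # ocean sound
    "traffic": normalize(np.array([0.0, 0.9, 0.1])),  # unrelated sound
}

# Audio-based search: pick the sound whose embedding is closest to the image.
best = max(audio_clips, key=lambda name: img_dog_on_beach @ audio_clips[name])
print(best)  # the barking clip lies nearest the dog image in the shared space

# Multimodal arithmetic: combining an image vector with an audio vector
# yields a new query ("dog" + "ocean sound") for retrieval or generation.
query = normalize(img_dog_on_beach + audio_clips["waves"])
```

Because all modalities share one space, the same dot-product machinery serves search, arithmetic and generation; only the encoder producing the vectors changes per modality.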
The tool has been released as open source, and developers can download it from Meta's repository to use or test.
Meta challenges the limits of multimodal learning
The Meta team is convinced that this multimodal approach to AI is the future. Meta wants to take it a step further by adding other modalities, including touch, speech, smell and fMRI brain signals. For now, research in this area is still ongoing, and a lot of work remains to be done.
Meta notes, “There is still much to be discovered about multimodal learning. The AI research community has yet to effectively quantify scaling behaviors that only appear in larger models and understand their applications.”
Meta continues to push the boundaries of artificial intelligence with projects such as ImageBind and LLaMA, its family of large language models for natural language processing. In this way, Meta is contributing to the growth of AI by striving to make it smarter and more responsive.