Techfullnews

Meta Introduces Spirit LM Open Source Multimodal AI Model


Just in time for Halloween 2024, Meta has launched Meta Spirit LM, its first open-source multimodal language model capable of handling both text and speech as input and output. The model directly challenges similar AI technologies such as OpenAI’s GPT-4o and Hume’s EVI 2, along with dedicated text-to-speech (TTS) and speech-to-text (ASR) tools like ElevenLabs.

The Future of AI Agents

Created by Meta’s Fundamental AI Research (FAIR) team, Spirit LM seeks to enhance AI voice systems by offering more natural and expressive speech generation. It also tackles multimodal tasks, including automatic speech recognition (ASR), text-to-speech (TTS), and speech classification.

For the time being, however, Spirit LM is available only for non-commercial use under Meta’s FAIR Noncommercial Research License. Researchers may modify and experiment with the model, but any commercial usage or redistribution must adhere to the noncommercial stipulations.

A New Approach to Speech and Text AI

Most traditional AI voice models first convert spoken words into text using ASR, then process that text through a language model and finally use TTS to produce the spoken output. While this approach works, it often fails to capture the full emotional and tonal range of natural human speech.
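The cascaded pipeline described above can be sketched in a few lines of Python. Everything here is an illustrative stand-in, not a real API: each stage is a stub, but the structure shows why prosody is lost — the middle stage only ever sees text.

```python
# Illustrative sketch of a traditional cascaded voice pipeline.
# Each stage is a stub; real systems would call ASR, LLM, and TTS models.

def asr(audio: dict) -> str:
    """Speech-to-text: returns only the words, discarding prosody."""
    return audio["words"]  # any pitch/emotion in the audio is dropped here

def language_model(text: str) -> str:
    """Text-only language model: never sees how the words were spoken."""
    return f"Echo: {text}"

def tts(text: str) -> dict:
    """Text-to-speech: has to guess at a neutral delivery."""
    return {"words": text, "prosody": "neutral"}

# An excited utterance enters the pipeline...
spoken_input = {"words": "we won the game", "prosody": "excited"}
reply = tts(language_model(asr(spoken_input)))

# ...but the excitement never reaches the output.
print(reply)  # {'words': 'Echo: we won the game', 'prosody': 'neutral'}
```

Because the only bridge between the speech stages is a plain string, the "excited" tone present in the input cannot influence the generated reply — the limitation Spirit LM's single-model design is meant to remove.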

Meta Spirit LM addresses this issue by integrating phonetic, pitch, and tone tokens, allowing it to generate more expressive and emotionally nuanced speech. The model is available in two variants:

Spirit LM Base: Focuses on phonetic tokens for speech generation and processing.

Spirit LM Expressive: Incorporates pitch and tone tokens to convey emotional cues such as excitement or sadness, bringing an added layer of expressiveness to speech.

Both variants are trained on datasets that include both speech and text, allowing Spirit LM to excel in cross-modal tasks like converting text to speech and vice versa, all while preserving the natural nuances of speech.
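The core idea behind both variants — one token stream mixing text and speech units, with the Expressive variant adding pitch and style tokens to that same stream — can be sketched as below. All token names here are made up for illustration; they are not Spirit LM's actual vocabulary.

```python
# Illustrative sketch of an interleaved text/speech token stream.
# Token names are hypothetical, chosen only to show the structure.

text_tokens = ["[TEXT]", "hello", "world"]
phonetic_tokens = ["[SPEECH]", "hu_42", "hu_17", "hu_99"]  # speech units (Base)
expressive_extras = ["pitch_3", "style_happy"]             # Expressive variant only

def interleave(text, speech, extras=None):
    """Build one modality-tagged sequence for a single language model."""
    seq = list(text) + list(speech)
    if extras:
        seq += list(extras)  # pitch/style tokens ride in the same stream
    return seq

base_seq = interleave(text_tokens, phonetic_tokens)
expr_seq = interleave(text_tokens, phonetic_tokens, expressive_extras)

print(base_seq[-1])  # last token is a phonetic unit: hu_99
print(expr_seq[-1])  # Expressive appends style cues:  style_happy
```

Because one model reads the whole sequence, tonal information never has to be flattened into plain text between stages — which is what lets the Expressive variant carry cues like excitement or sadness through to the output.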

Fully Open-Source for Noncommercial Use


Consistent with Meta’s dedication to open research, Meta Spirit LM has been released for non-commercial research purposes. Developers and researchers have full access to the model weights, code, and accompanying documentation to advance their own projects and experiment with new applications.

Mark Zuckerberg, Meta’s CEO, has emphasized the importance of open-source AI, expressing that AI holds the potential to significantly enhance human productivity and creativity, and drive forward innovations in fields like medicine and science.

Potential Applications of Spirit LM Open Source

Meta Spirit LM is designed to handle a wide range of multimodal tasks, such as:

Automatic Speech Recognition (ASR): Converting spoken words into written text.
Text-to-Speech (TTS): Transforming written text into spoken words.
Speech Classification: Recognizing and categorizing speech based on content or emotional tone.

The Spirit LM Expressive model takes things further by not only recognizing emotions in speech but also generating responses that reflect emotional states like joy, surprise, or anger. This opens doors for more lifelike and engaging AI interactions in areas like virtual assistants and customer service systems.

Meta’s Larger AI Research Vision

Meta Spirit LM is part of a larger set of open tools and models that Meta FAIR has released. This includes advancements like the Segment Anything Model (SAM) 2.1 for image and video segmentation, widely used in fields such as medical imaging and meteorology, as well as research aimed at improving the efficiency of large language models.

Meta’s broader mission is to develop advanced machine intelligence (AMI) while ensuring that AI tools remain accessible to a global audience. For over a decade, the FAIR team has led research that aims to benefit not just the tech world but society at large.

What Lies Ahead for Meta Spirit LM Open Source?

With Spirit LM, Meta is pushing the boundaries of what AI can achieve in integrating speech and text. By making the model open source and focusing on more human-like, expressive interaction, Meta is giving the research community the opportunity to explore new ways AI can bridge the gap between humans and machines.

Whether in ASR, TTS, or other AI-driven systems, Spirit LM represents a significant leap forward, shaping a future where AI-powered conversations and interactions feel more natural and engaging than ever before.


About two years ago, Mark Zuckerberg rebranded his company from Facebook to Meta with a focus on building the “metaverse,” a virtual reality realm. Since 2021, however, the metaverse’s popularity has waned: companies like Disney have shut down their metaverse divisions, and crypto-based metaverse startups have struggled. In 2022, Meta’s Reality Labs division reported an operating loss of $13.7 billion.

Despite these setbacks, Zuckerberg reaffirmed his commitment to the metaverse at Meta Connect 2023, albeit with a shift in emphasis. Previously, he envisioned the metaverse as a fully digital world; now he emphasizes a blend of the physical and digital realms. Zuckerberg envisions a future where people can be physically present with friends while others join digitally as avatars or holograms, creating a seamless experience. He also sees scenarios where AI entities, embodied as holograms, assist with various tasks during meetings or gatherings.

While these ideas aren’t entirely new for Meta, the presentation marks a departure from Zuckerberg’s 2021 vision. Back then, he promised a decade where most people would immerse themselves in a 3D version of the internet using Meta’s Horizon Worlds platform. The latest keynote, however, shifts the focus to incorporating the virtual into everyday living spaces, such as solving puzzles or playing games in one’s living room.


The Horizon Worlds platform did make an appearance, but attention was also directed toward AI advancements. Zuckerberg highlighted new features powered by Meta’s AI technology, including AI chatbots for brainstorming and AI assistants integrated into Instagram, Messenger, and WhatsApp. These AI innovations are intended to propel the metaverse forward, although how directly they relate to the term is not entirely clear.

While Meta faces challenges in redefining the metaverse and introducing AI features, the company cannot afford to abandon the concept. The term “metaverse” is intrinsic to its identity. However, the ambiguity surrounding the definition of the metaverse may work to Meta’s advantage, allowing them to encompass various technologies, from VR to AI, under the metaverse umbrella. Ultimately, whatever Meta pursues may be considered the metaverse, particularly from Mark Zuckerberg’s perspective.

Google has announced that it will be discontinuing the Basic HTML view in Gmail starting in January 2024. This is a significant change, as the Basic HTML view is a simplified version of Gmail that is often used by people with slow internet connections, older browsers, or visual impairments.

A Google representative acknowledged the change, saying that Basic HTML views “were replaced by their modern successors 10+ years ago and do not include full Gmail feature functionality.” However, many blind and visually impaired users have raised concerns about the accessibility of Gmail’s Standard view.


Pratik Patel, an executive leadership coach who is blind, said in an email that many blind and partially sighted people find Gmail’s Standard view difficult to use “due to complex usage patterns, inaccessible design elements, and inefficient navigation.” He added that “people often find it quicker to accomplish tasks via the HTML interface as opposed to the standard one.”


Google has said that it is committed to giving users leading accessibility options, and that the Standard Gmail view is screen-reader compatible in all languages supported by Gmail. However, Patel argues that the Standard view is less usable due to inconsistencies and design decisions that go against established user interaction patterns.

What can be done?

Google has said that it will be notifying users about the change and how to switch to the Standard view before the Basic view is disabled. However, it is important to note that the Standard view is not fully accessible to all users.

If you are a blind or visually impaired user who relies on the Basic HTML view in Gmail, there are a few things you can do:

  • Contact Google and express your concerns. The more feedback Google receives about this issue, the more likely they are to take action.
  • Use a third-party email client. There are a number of third-party email clients that are more accessible than Gmail’s Standard view. Some popular options include Thunderbird, Outlook, and Mailspring.
  • Consider switching to a different email provider. There are a few email providers that offer a more accessible experience than Gmail. One popular option is ProtonMail, which is known for its strong focus on privacy and security.

It is important to note that there is no perfect solution for blind and visually impaired users. However, by taking the steps above, you can help to ensure that you have access to the email services you need.
