{"id":92144,"date":"2025-03-14T02:57:32","date_gmt":"2025-03-14T02:57:32","guid":{"rendered":"https:\/\/neclink.com\/index.php\/2025\/03\/14\/sesame-the-startup-behind-the-viral-virtual-assistant-maya-releases-its-base-ai-model\/"},"modified":"2025-03-14T02:57:32","modified_gmt":"2025-03-14T02:57:32","slug":"sesame-the-startup-behind-the-viral-virtual-assistant-maya-releases-its-base-ai-model","status":"publish","type":"post","link":"https:\/\/neclink.com\/index.php\/2025\/03\/14\/sesame-the-startup-behind-the-viral-virtual-assistant-maya-releases-its-base-ai-model\/","title":{"rendered":"Sesame, the startup behind the viral virtual assistant Maya, releases its base AI model"},"content":{"rendered":"<p> <br \/>\n<\/p>\n<div>\n<p id=\"speakable-summary\" class=\"wp-block-paragraph\">AI company\u00a0<a href=\"https:\/\/www.sesame.com\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Sesame<\/a>\u00a0has released the base model that powers Maya, the\u00a0<a href=\"https:\/\/www.theverge.com\/news\/621022\/sesame-voice-assistant-ai-glasses-oculus-brendan-iribe\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">impressively realistic voice assistant<\/a>.<\/p>\n<p class=\"wp-block-paragraph\">The model, which is 1 billion parameters in size (\u201cparameters\u201d referring to individual components of the model), is under an Apache 2.0 license, meaning it can be used commercially with few restrictions. Called CSM-1B, the model generates \u201cRVQ audio codes\u201d from text and audio inputs, according to <a href=\"https:\/\/huggingface.co\/sesame\/csm-1b\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Sesame\u2019s description on the AI dev platform Hugging Face<\/a>.<\/p>\n<p class=\"wp-block-paragraph\">RVQ refers to \u201cresidual vector quantization,\u201d a technique for encoding audio into discrete tokens called codes. 
RVQ is used <a href=\"https:\/\/drscotthawley.github.io\/blog\/posts\/2023-06-12-RVQ.html\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">in a number of recent AI audio technologies<\/a>, including Google\u2019s SoundStream and Meta\u2019s Encodec.<\/p>\n<p class=\"wp-block-paragraph\">CSM-1B uses a model from <a href=\"https:\/\/techcrunch.com\/2024\/09\/08\/meta-llama-everything-you-need-to-know-about-the-open-generative-ai-model\/\">Meta\u2019s Llama family<\/a> as its backbone paired with an audio \u201cdecoder\u201d component. A fine-tuned variant of CSM powers Maya, Sesame says.<\/p>\n<p class=\"wp-block-paragraph\">\u201cThe model open-sourced here is a base generation model,\u201d Sesame writes in CSM-1B\u2019s <a href=\"https:\/\/huggingface.co\/sesame\/csm-1b\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Hugging Face<\/a> and <a href=\"https:\/\/github.com\/SesameAILabs\/csm?tab=readme-ov-file\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">GitHub<\/a> repositories. \u201cIt is capable of producing a variety of voices, but it has not been fine-tuned on any specific voice [\u2026] The model has some capacity for non-English languages due to data contamination in the training data, but it likely won\u2019t do well.\u201d<\/p>\n<p class=\"wp-block-paragraph\">It\u2019s unclear what data Sesame used to train CSM-1B. The company didn\u2019t say. <\/p>\n<p class=\"wp-block-paragraph\">It\u2019s worth noting the model has no real safeguards to speak of. 
Sesame has an honor system and merely urges developers and users not to use the model to mimic a person\u2019s voice without their consent, create misleading content like fake news, or engage in \u201charmful\u201d or \u201cmalicious\u201d activities.<\/p>\n<p class=\"wp-block-paragraph\">I tried <a href=\"https:\/\/huggingface.co\/spaces\/sesame\/csm-1b\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">the demo<\/a> on Hugging Face, and cloning my voice took less than a minute. From there, it was easy to generate speech to my heart\u2019s desire, including on controversial topics like the election and Russian propaganda.<\/p>\n<figure class=\"wp-block-embed is-type-rich is-provider-soundcloud wp-block-embed-soundcloud\"\/>\n<p class=\"wp-block-paragraph\">Consumer Reports recently warned that many popular AI-powered voice cloning tools on the market <a href=\"https:\/\/www.consumerreports.org\/media-room\/press-releases\/2025\/03\/consumer-reports-assessment-of-ai-voice-cloning-products\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">don\u2019t have \u201cmeaningful\u201d safeguards<\/a> to prevent fraud or abuse.<\/p>\n<p class=\"wp-block-paragraph\">Sesame, co-founded by Oculus co-creator Brendan Iribe, went viral in late February for its assistant tech, which comes close to clearing uncanny valley territory. Maya and Sesame\u2019s other assistant, Miles, take breaths and speak with disfluencies, and can be interrupted while speaking, <a href=\"https:\/\/techcrunch.com\/2024\/08\/17\/openais-new-voice-mode-let-me-talk-with-my-phone-not-to-it\/\">much like OpenAI\u2019s Voice Mode<\/a>. <\/p>\n<p class=\"wp-block-paragraph\">Sesame has raised an undisclosed amount of capital from Andreessen Horowitz, Spark Capital, and Matrix Partners. In addition to building voice assistant tech, the company says it\u2019s prototyping AI glasses \u201cdesigned to be worn all day\u201d that\u2019ll be equipped with its custom models. 
<\/p>\n<\/div>\n<p><br \/>\n<br \/><a href=\"https:\/\/techcrunch.com\/2025\/03\/13\/sesame-the-startup-behind-the-viral-virtual-assistant-maya-releases-its-base-ai-model\/\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>AI company\u00a0Sesame\u00a0has released the base model that powers Maya, the\u00a0impressively realistic voice assistant. The model, which is 1 billion parameters in size (\u201cparameters\u201d referring to<\/p>\n","protected":false},"author":1,"featured_media":92145,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[149],"tags":[],"class_list":["post-92144","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-business"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/posts\/92144","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/comments?post=92144"}],"version-history":[{"count":0,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/posts\/92144\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/media\/92145"}],"wp:attachment":[{"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/media?parent=92144"}],"w
p:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/categories?post=92144"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/tags?post=92144"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}