{"id":21294,"date":"2023-08-30T10:17:52","date_gmt":"2023-08-30T10:17:52","guid":{"rendered":"https:\/\/web3unplugged.io\/blog\/?p=21294"},"modified":"2023-08-30T10:17:54","modified_gmt":"2023-08-30T10:17:54","slug":"g42s-inception-releases-arabic-language-open-source-al-model-jais","status":"publish","type":"post","link":"https:\/\/web3unplugged.io\/blog\/g42s-inception-releases-arabic-language-open-source-al-model-jais\/","title":{"rendered":"G42&#8217;s Inception Releases Arabic Language Open-Source AI Model &#8216;Jais&#8217;"},"content":{"rendered":"\n<p>Inception, a G42 company dedicated to pushing the boundaries of AI, announced the open-source release of \u201cJais\u201d, the world\u2019s highest-quality Arabic Large Language Model. Jais is a 13-billion-parameter model trained on a newly developed 395-billion-token Arabic and English dataset.<\/p>\n\n\n\n<p>With a name inspired by the UAE\u2019s highest peak, Jais will bring the advantages of generative AI across the Arabic-speaking world. The model results from a collaboration between Inception, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) and Cerebras Systems. It was trained on Condor Galaxy, the recently announced multi-exaFLOP AI supercomputer built by G42 and Cerebras.<\/p>\n\n\n\n<p>Jais\u2019 release marks a significant milestone in the realm of AI for the Arabic world. It is a model homegrown in the UAE\u2019s capital, Abu Dhabi, offering more than 400 million Arabic speakers the opportunity to harness the potential of generative AI. It will facilitate and expedite innovation, highlighting Abu Dhabi\u2019s leading position as a hub for AI, innovation, cultural preservation, and international collaboration.<\/p>\n\n\n\n<p>By open-sourcing Jais, Inception aims to engage the scientific, academic, and developer communities to accelerate the growth of a vibrant Arabic language AI ecosystem. 
This can serve as a model for other languages currently underrepresented in mainstream AI.<\/p>\n\n\n\n<p>\u201cWe believe that innovation thrives when we collaborate,\u201d said Andrew Jackson, CEO of Inception. \u201cWith this release, we are setting a new standard for AI advancement in the Middle East and ensuring that the Arabic language, with its depth and heritage, finds its voice within the AI landscape. Jais is a testament to our commitment to excellence and dedication to democratising AI and promoting innovation.\u201d<\/p>\n\n\n\n<p>Jais outperforms existing Arabic models by a sizable margin. It is also competitive with English models of similar size despite being trained on significantly less English data. This exciting result shows that the model\u2019s English component learned from the Arabic data and vice versa, opening a new era in LLM development and training.<\/p>\n\n\n\n<p>MBZUAI President and University Professor Eric Xing said, \u201cDeveloping such a high-calibre Arabic LLM demanded cutting-edge AI research in addition to an in-depth and nuanced understanding of the Arabic language, its diversity and heritage, and the growing importance of LLMs across all echelons of society. Thanks to our research and partnerships with Inception and other top regional and global organisations, MBZUAI will continue pioneering LLMs that are efficient, effective, and accurate.\u201d<\/p>\n\n\n\n<p>Jais is a transformer-based large language model that incorporates many cutting-edge features, including ALiBi position embeddings, which enable the model to extrapolate to much longer inputs, providing better context handling and accuracy. 
Other state-of-the-art techniques include SwiGLU and maximal update parameterisation to improve the model\u2019s training efficiency and accuracy.<\/p>\n\n\n\n<p>Jais\u2019 training, fine-tuning, and evaluation were undertaken by an Inception\/MBZUAI joint team on the Condor Galaxy 1 (CG-1), the recently announced, state-of-the-art AI supercomputer co-developed by G42 and Cerebras Systems. The 13-billion parameter open-source model was trained on a unique and purpose-built dataset of 116 billion Arabic tokens designed to capture Arabic\u2019s complexity, nuance, and richness. It also included 279 billion English word tokens to increase the model\u2019s performance through cross-language transfer. Inception and MBZUAI will continue to expand and refine Jais as its user community grows.<\/p>\n\n\n\n<p>\u201cOur strategic partnership with G42 is already delivering pioneering results. A few weeks ago, we introduced the first multi-exaFLOP AI supercomputer, Condor Galaxy 1 (CG-1). Now, the partnership delivers another key breakthrough: the leading Arabic LLM for the open-source community,\u201d said Andrew Feldman, co-founder and CEO of Cerebras Systems. \u201cAt Cerebras, our passion is building groundbreaking technology. One of the great rewards is seeing the innovative ways it is used. Jais is a significant contribution to the international open-source community. It is also a testament to how incredibly easy CG-1 is to use and enables extremely rapid AI model development.\u201d<\/p>\n\n\n\n<p>Jais is available for download on Hugging Face. Users can also try Jais online upon registering interest on Jais\u2019 website and receiving an invite to access the playground environment.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Inception, a G42 company dedicated to pushing the boundaries of AI, announced the open-source release of \u201cJais\u201d, the world\u2019s highest-quality Arabic Large Language Model. 
Jais is a 13-billion parameter model trained on a newly developed 395-billion-token Arabic and English dataset. With a name inspired by UAE\u2019s highest peak, Jais will bring the advantages of generative [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":21296,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_seopress_robots_primary_cat":"none","_seopress_titles_title":"","_seopress_titles_desc":"","_seopress_robots_index":"","footnotes":""},"categories":[2],"tags":[],"class_list":["post-21294","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-news"],"rttpg_featured_image_url":{"full":["https:\/\/web3unplugged.io\/blog\/wp-content\/uploads\/2023\/08\/Untitled-2-2.jpg",2500,1437,false],"landscape":["https:\/\/web3unplugged.io\/blog\/wp-content\/uploads\/2023\/08\/Untitled-2-2.jpg",2500,1437,false],"portraits":["https:\/\/web3unplugged.io\/blog\/wp-content\/uploads\/2023\/08\/Untitled-2-2.jpg",2500,1437,false],"thumbnail":["https:\/\/web3unplugged.io\/blog\/wp-content\/uploads\/2023\/08\/Untitled-2-2-150x150.jpg",150,150,true],"medium":["https:\/\/web3unplugged.io\/blog\/wp-content\/uploads\/2023\/08\/Untitled-2-2-300x172.jpg",300,172,true],"large":["https:\/\/web3unplugged.io\/blog\/wp-content\/uploads\/2023\/08\/Untitled-2-2-1024x589.jpg",1024,589,true],"1536x1536":["https:\/\/web3unplugged.io\/blog\/wp-content\/uploads\/2023\/08\/Untitled-2-2-1536x883.jpg",1536,883,true],"2048x2048":["https:\/\/web3unplugged.io\/blog\/wp-content\/uploads\/2023\/08\/Untitled-2-2-2048x1177.jpg",2048,1177,true],"post-thumbnail":["https:\/\/web3unplugged.io\/blog\/wp-content\/uploads\/2023\/08\/Untitled-2-2.jpg",731,420,false],"graptor-sq-xs":["https:\/\/web3unplugged.io\/blog\/wp-content\/uploads\/2023\/08\/Untitled-2-2.jpg",100,57,false]},"rttpg_author":{"display_name":"Admin 
CG","author_link":"https:\/\/web3unplugged.io\/blog\/author\/admin-cg\/"},"rttpg_comment":0,"rttpg_category":"<a href=\"https:\/\/web3unplugged.io\/blog\/category\/news\/\" rel=\"category tag\">news<\/a>","rttpg_excerpt":"Inception, a G42 company dedicated to pushing the boundaries of AI, announced the open-source release of \u201cJais\u201d, the world\u2019s highest-quality Arabic Large Language Model. Jais is a 13-billion parameter model trained on a newly developed 395-billion-token Arabic and English dataset. With a name inspired by UAE\u2019s highest peak, Jais will bring the advantages of generative&hellip;","_links":{"self":[{"href":"https:\/\/web3unplugged.io\/blog\/wp-json\/wp\/v2\/posts\/21294","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/web3unplugged.io\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/web3unplugged.io\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/web3unplugged.io\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/web3unplugged.io\/blog\/wp-json\/wp\/v2\/comments?post=21294"}],"version-history":[{"count":1,"href":"https:\/\/web3unplugged.io\/blog\/wp-json\/wp\/v2\/posts\/21294\/revisions"}],"predecessor-version":[{"id":21297,"href":"https:\/\/web3unplugged.io\/blog\/wp-json\/wp\/v2\/posts\/21294\/revisions\/21297"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/web3unplugged.io\/blog\/wp-json\/wp\/v2\/media\/21296"}],"wp:attachment":[{"href":"https:\/\/web3unplugged.io\/blog\/wp-json\/wp\/v2\/media?parent=21294"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/web3unplugged.io\/blog\/wp-json\/wp\/v2\/categories?post=21294"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/web3unplugged.io\/blog\/wp-json\/wp\/v2\/tags?post=21294"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}