ai models – TheNewsHub
https://thenewshub.in

Microsoft Phi-4 Open-Source Small Language Model Introduced; Claimed to Outperform Gemini 1.5 Pro
https://thenewshub.in/2024/12/13/microsoft-phi-4-open-source-small-language-model-introduced-claimed-to-outperform-gemini-1-5-pro/
Fri, 13 Dec 2024 11:56:22 +0000

Microsoft on Friday released its Phi-4 artificial intelligence (AI) model. The company's latest small language model (SLM) joins its open-source Phi family of foundational models, arriving eight months after the release of Phi-3 and four months after the introduction of the Phi-3.5 series. The tech giant claims the SLM is better at solving complex reasoning-based queries in areas such as mathematics, and it is also said to excel at conventional language processing.

Microsoft’s Phi-4 AI Model to Be Available via Hugging Face

So far, every Phi series release has been accompanied by a mini variant; this time, however, no mini model accompanies Phi-4. In a blog post, Microsoft highlighted that Phi-4 is currently available on Azure AI Foundry under a Microsoft Research Licence Agreement (MSRLA). The company plans to make it available on Hugging Face next week as well.

The company also shared benchmark scores from its internal testing. Based on these, the new AI model significantly improves on the capabilities of the previous generation. The tech giant claimed that Phi-4 outperforms Gemini 1.5 Pro, a much larger model, on the math competition problems benchmark. It also published detailed benchmark results in a technical paper on the pre-print server arXiv.

On safety, Microsoft stated that Azure AI Foundry comes with a set of capabilities to help organisations measure, mitigate, and manage AI risks across the development lifecycle for traditional machine learning and generative AI applications. Additionally, enterprise users can use Azure AI Content Safety features, such as prompt shields and groundedness detection, as content filters.

Developers can also add these safety capabilities into their applications via a single application programming interface (API). The platform can monitor applications for quality and safety, adversarial prompt attacks, and data integrity and provide developers with real-time alerts. This will be available to those Phi users who access it via Azure.
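To illustrate what such a content-filter decision might look like in application code, here is a minimal, hypothetical sketch. The category names are modelled on Azure AI Content Safety's harm categories, but the threshold values and the helper function itself are illustrative assumptions, not Azure's actual API or defaults:

```python
# Hypothetical content-filter gate modelled on a category/severity
# output like Azure AI Content Safety's. Categories and thresholds
# here are illustrative, not the service's real defaults.
BLOCK_THRESHOLDS = {
    "Hate": 2,
    "SelfHarm": 2,
    "Sexual": 2,
    "Violence": 4,
}

def should_block(severities: dict) -> bool:
    """Return True if any category's severity meets its blocking threshold."""
    return any(
        severities.get(category, 0) >= threshold
        for category, threshold in BLOCK_THRESHOLDS.items()
    )

print(should_block({"Hate": 0, "Violence": 1}))  # False: below every threshold
print(should_block({"Violence": 6}))             # True: Violence >= 4
```

In a real deployment, the severity scores would come from the moderation API's analysis response rather than being constructed by hand.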

Notably, smaller language models are often post-trained on synthetic data, allowing them to gain knowledge quickly and run more efficiently. However, the results of such post-training are not always consistent in real-world use cases.

Google Introduces PaliGemma 2 Family of Open Source AI Vision-Language Models
https://thenewshub.in/2024/12/06/google-introduces-paligemma-2-family-of-open-source-ai-vision-language-models/
Fri, 06 Dec 2024 10:59:01 +0000

Google introduced the successor to its PaliGemma artificial intelligence (AI) vision-language model on Thursday. Dubbed PaliGemma 2, the family of AI models improves upon the capabilities of the older generation. The Mountain View-based tech giant said the vision-language model can see, understand, and interact with visual input such as images and other visual assets. It is built on the Gemma 2 small language models (SLMs), which were released in August. Interestingly, the tech giant claimed the model can analyse emotions in uploaded images.

Google PaliGemma 2 AI Model

In a blog post, the tech giant detailed the new PaliGemma 2 AI model. While Google has several vision-language models, PaliGemma was the first such model in the Gemma family. Vision models differ from typical large language models (LLMs) in that they have additional encoders that analyse visual content and convert it into a data form the language model can process. This way, vision models can technically “see” and understand the external world.

One benefit of a smaller vision model is that it can be used in a large number of applications, as smaller models are optimised for speed and accuracy. With PaliGemma 2 being open source, developers can build its capabilities into their apps.

The PaliGemma 2 comes in three different parameter sizes of 3 billion, 10 billion, and 28 billion. It is also available in 224p, 448p, 896p resolutions. Due to this, the tech giant claims that it is easy to optimise the AI model’s performance for a wide range of tasks. Google says it generates detailed, contextually relevant captions for images. It can not only identify objects but also describe actions, emotions, and overall narrative of the scene.
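The size/resolution grid above can be sketched as a small checkpoint-selection helper. The repository naming pattern used here is an assumption modelled on Hugging Face conventions for the PaliGemma 2 release, not an official specification, so verify the exact names on the model hub before use:

```python
# Sketch: picking a PaliGemma 2 checkpoint from the parameter-size /
# resolution grid described above. The "google/paligemma2-{N}b-pt-{R}"
# naming pattern is an assumption to verify on Hugging Face.
PARAM_SIZES = (3, 10, 28)      # billions of parameters
RESOLUTIONS = (224, 448, 896)  # square input resolution in pixels

def checkpoint_id(params_b: int, resolution: int) -> str:
    """Map a (size, resolution) pair from the 3x3 grid to a repo name."""
    if params_b not in PARAM_SIZES or resolution not in RESOLUTIONS:
        raise ValueError("unsupported size/resolution combination")
    return f"google/paligemma2-{params_b}b-pt-{resolution}"

print(checkpoint_id(3, 224))  # google/paligemma2-3b-pt-224
```

A smaller size at a lower resolution favours speed; the 28-billion-parameter model at 896x896 favours accuracy on detail-heavy tasks such as document understanding.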

Google highlighted that the tool can be used for chemical formula recognition, music score recognition, spatial reasoning, and chest X-ray report generation. The company has also published a paper on the pre-print server arXiv.

Developers and AI enthusiasts can download the PaliGemma 2 model and its code from Hugging Face and Kaggle. The AI model supports frameworks such as Hugging Face Transformers, Keras, PyTorch, JAX, and Gemma.cpp.

Amazon Web Services (AWS) Announces Nova Family of Multimodal AI Models
https://thenewshub.in/2024/12/04/amazon-web-services-aws-announces-nova-family-of-multimodal-ai-models/
Wed, 04 Dec 2024 08:07:49 +0000

Amazon Web Services (AWS), the tech giant's cloud computing division, introduced the Nova family of artificial intelligence (AI) models on Tuesday at its ongoing re:Invent conference. There are five large language models (LLMs) under the Nova branding, three of which are capable of text generation only. Apart from these, Nova also includes an image-generation model and a video-generation model. The company said the new generation of AI models comes with improved intelligence and competitive pricing, and is currently available on Amazon Bedrock.

AWS Introduces Nova AI Models

In a post, Amazon detailed the new generation of AI models. Five LLMs have been introduced as part of the Nova series so far, and company CEO Andy Jassy highlighted that a sixth model, dubbed Nova Premier, will launch in 2025.

Among the five models, three (Nova Micro, Nova Lite, and Nova Pro) generate only text, though they differ in other respects. Micro accepts only text as input and provides the lowest-latency responses in the series. It has a context window of 128,000 tokens.

Nova Lite, on the other hand, accepts images, videos, and text as inputs but only generates text. Nova Pro is the most capable multimodal model of the trio and can handle a wider range of tasks than the other two. Both of these models have a context window of 300,000 tokens.
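As a rough illustration of how a text request to one of these models might be assembled, here is a sketch that builds a Bedrock Converse-style payload locally. The model ID (`amazon.nova-micro-v1:0`) and the payload shape follow Amazon Bedrock's Converse API conventions, but treat both as assumptions to verify against the AWS documentation:

```python
# Sketch of a Bedrock Converse-style text request for a Nova model.
# Model ID and payload shape are assumptions modelled on the
# Converse API; no network call is made here.
def build_converse_request(model_id: str, prompt: str, max_tokens: int = 512) -> dict:
    """Assemble the keyword arguments for a single-turn text request."""
    return {
        "modelId": model_id,
        "messages": [
            {"role": "user", "content": [{"text": prompt}]},
        ],
        "inferenceConfig": {"maxTokens": max_tokens},
    }

request = build_converse_request("amazon.nova-micro-v1:0", "Summarise this article.")
# An actual call would look like:
#   boto3.client("bedrock-runtime").converse(**request)
print(request["messages"][0]["role"])  # user
```

For Nova Lite and Nova Pro, the `content` list could also carry image or video blocks alongside the text, matching their multimodal inputs.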

Apart from these, there are two more models in the Nova series that Amazon calls “creative content generation models”. The first is Nova Canvas, an image-generation model that accepts text and images as inputs; the company touts it as a tool for advertising, marketing, and entertainment.

Finally, Nova Reel is a video-generation model that can produce short videos from text and image prompts, and it lets users control camera motion with natural language prompts. All of these models are available to the company's enterprise clients via the Amazon Bedrock platform.

Lightricks Introduces Open-Source LTX Video AI Model With Real-Time Video Generation Capability
https://thenewshub.in/2024/11/25/lightricks-introduces-open-source-ltx-video-ai-model-with-real-time-video-generation-capability/
Mon, 25 Nov 2024 12:48:26 +0000

Lightricks, a software company focused on image and video editing, released an open-source artificial intelligence (AI) video model in preview last week. Dubbed LTX Video, the AI model can generate medium-resolution videos in real time. While real-time video generation exists in a few AI video models, this is claimed to be the first to be open-sourced. The company also stated that once the full version is released, it will be free for both personal and commercial use and can be integrated into LTX Studio.

Lightricks Introduces Open-Source AI Video Model

In a series of posts on X (formerly known as Twitter), Lightricks detailed its open-source AI model. LTX Video accepts both text and images as input and can generate five-second videos at 768x512-pixel resolution. While the preview model caps output at this medium resolution, it offers near real-time generation with about four seconds of wait time. However, that generation speed requires hardware such as the Nvidia H100 GPU.

The company claims the AI model can generate dynamic videos with high prompt adherence and does not require high-end resources to run; to run LTX Video locally, users will need a GPU comparable to the Nvidia RTX 4090. Lightricks also highlighted that the model's architecture is based on the Diffusion Transformer but uses only two billion parameters to keep its size small.
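The preview limits described above (768x512 output, five-second clips) can be captured in a small pre-flight check. This helper is purely hypothetical, written to make the stated constraints concrete; it is not part of LTX Video's actual API:

```python
# Hypothetical pre-flight check against the LTX Video preview limits
# stated above: 768x512 resolution, five-second clips. The helper
# and its behaviour are illustrative, not the model's real interface.
MAX_WIDTH, MAX_HEIGHT = 768, 512
MAX_SECONDS = 5.0

def validate_request(width: int, height: int, seconds: float) -> bool:
    """Return True if the requested clip fits within the preview caps."""
    return (
        0 < width <= MAX_WIDTH
        and 0 < height <= MAX_HEIGHT
        and 0 < seconds <= MAX_SECONDS
    )

print(validate_request(768, 512, 5))    # True: exactly at the caps
print(validate_request(1920, 1080, 5))  # False: exceeds the resolution cap
```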

LTX Video is currently available to download on GitHub, Hugging Face, and ComfyUI. Users who want to test the model's capabilities before downloading it can try it on its Fal.ai model page.

The AI model can also be integrated with a wide range of external editing tools to further fine-tune generated videos. The company plans to release the full version as open source for both personal and commercial use, and to integrate it with LTX Studio, its AI-powered storyboard platform.

OpenAI Reportedly Planning to Release The Successor to GPT-4 Before the End of This Year
https://thenewshub.in/2024/10/25/openai-reportedly-planning-to-release-the-successor-to-gpt-4-before-the-end-of-this-year/
Fri, 25 Oct 2024 14:13:42 +0000

OpenAI is reportedly planning to release the next generation of its artificial intelligence (AI) model before the end of the year. As per a report, the company's next frontier model will be significantly more powerful and capable than GPT-4. The large language model is said to be internally called Orion. While the company is reportedly planning a December release, the AI model will reportedly not be made publicly available at first; instead, it could first be accessed by the enterprises OpenAI works closely with.

OpenAI’s Next Generation of AI Model

The Verge reported that the AI firm is targeting December 2024 for the launch of the next generation of its frontier large language model. Citing people familiar with the matter, the publication claimed the model is internally called Orion. Notably, it was earlier rumoured to be called Strawberry, but that name turned out to refer to the GPT-4o AI model.

While OpenAI has released the GPT-4 Turbo and GPT-4o models since GPT-4, neither was an entirely new model; both were upgraded and tweaked versions built on the GPT-4 architecture. The next iteration, which could be called GPT-5, is expected to feature a new architecture and new capabilities.

As per the report, the model codenamed Orion will not be released directly to the public, unlike the company's previous AI model releases. Instead, it will reportedly be shared with companies OpenAI works closely with, so those enterprises can build their own products and features before the model becomes publicly available.

Microsoft could be one of them, given that it is a major investor in the company. Interestingly, the report also claims the Orion model could be hosted on Azure servers by November. However, the official name of the AI model has not been confirmed yet.

Additionally, the report claims the Orion model could be up to 100 times more powerful than GPT-4 and is expected to introduce improved agentic AI capabilities. Eventually, OpenAI wants to merge its AI models to create artificial general intelligence (AGI), the report added.
