ai agent – TheNewsHub

OpenAI Reportedly Planning to Launch AI Agents That Can Control Tasks on Computer

Ashish Singh — Thu, 14 Nov 2024 11:03:57 +0000

OpenAI is reportedly planning to release artificial intelligence (AI) agents that can carry out tasks on computer systems. As per a report, the company has been working on several agent-related research projects, one of which, dubbed “Operator”, can execute multi-step actions on computers. The AI agents are said to be slated for release in January 2025 as a research preview for developers. The company is reportedly planning to offer access to its AI agents via a native application programming interface (API), which developers can use to build software and apps.

OpenAI’s AI Agents

AI agents have become a recent trend in the AI space. These are smaller AI models that have a limited but specialised knowledge base and can use specific software to execute actions such as mimicking keystrokes and button clicks. Due to the specialised nature of the models, they can complete tasks with accuracy and speed.

According to a Bloomberg report, OpenAI has developed a new AI agent dubbed Operator that can complete tasks on computers. Citing people familiar with the matter, the publication claimed that users will be able to assign the AI agent complicated tasks such as writing code or booking tickets, and it will be able to perform them.

On Wednesday, OpenAI executives reportedly revealed plans to release the tool in January 2025 as a research preview. The company is said to be creating a new API through which developers will have access to it.

Notably, OpenAI is reportedly working on several agent-related research projects, which are near completion. One such agent is said to be capable of executing tasks in a web browser. Details about the other projects are currently not known.

OpenAI CEO Sam Altman mentioned AI agents as the company’s focus earlier this month during a question and answer session on Reddit. Replying to a user, he said, “We will have better and better models. But I think the thing that will feel like the next giant breakthrough will be agents.”

Anthropic, OpenAI’s competitor, released native AI agents last month. Dubbed Computer Use, these agents can understand and interact with computers, essentially allowing them to control and complete tasks on PCs. These agents are built on an upgraded version of Claude 3.5 Sonnet.

Microsoft Introduces Magentic-One Generalist Multi-Agent AI System That Can Complete Complex Tasks

Ashish Singh — Wed, 06 Nov 2024 10:15:32 +0000

Microsoft introduced a new multi-agent artificial intelligence (AI) system dubbed Magentic-One on Monday. The tech giant called it a high-performing system that can activate multiple AI agents to complete complex tasks via web browsers or locally on a device. It is based on a new framework that allows an AI model to access multiple modalities and capabilities to complete tasks such as booking a ticket, purchasing a product online, or editing a document stored on the device. Notably, Microsoft’s Magentic-One is an open-source project and is accessible to researchers and developers.

Microsoft Introduces Magentic-One

Generative AI has brought a huge leap in machine intelligence and in the capability to generate outputs across text, image, audio, and video formats. However, while modern AI systems are great at retrieving information, they remain poor at reasoning, especially when it comes to solving problems and completing tasks.

This is why AI agents, which can be understood as miniature software capable of executing an action, have become an important extension of large language models (LLMs). Microsoft’s Magentic-One works on the same principle, as detailed in a research paper. The company describes it as a “high-performing generalist agentic system” designed to complete complex multi-step tasks such as software engineering, data analysis, scientific research, and web navigation.

Magentic-One has a multi-agent architecture, which means one LLM can activate several agents to complete a task. For this, the AI system activates a lead agent dubbed the Orchestrator. It directs four other agents, each of which specialises in one task.

The workflow of Magentic-One
Photo Credit: Microsoft

For instance, if the system is asked to book a ticket for a movie, the Orchestrator could trigger a vision agent that can look at the screen and process the visual information. Another agent might understand web browsers and handle navigation. A third could break down the prompt into actionable steps, and a fourth might handle financial transactions. By dividing the task among multiple such specialised agents, both the accuracy and the speed of completion are increased.
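The division of labour described above can be illustrated with a minimal Python sketch. This is not Microsoft's code or API: the `Orchestrator` class, the four agent functions, and the hard-coded plan are all hypothetical names made up for illustration, and a real system would ask an LLM to produce the plan rather than returning a fixed list.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical agent type: takes a step description, returns a result string.
Agent = Callable[[str], str]


def planner_agent(step: str) -> str:
    # Stand-in for an agent that breaks a prompt into actionable steps.
    return f"planner: {step}"


def browser_agent(step: str) -> str:
    # Stand-in for an agent that handles web navigation.
    return f"browser: {step}"


def vision_agent(step: str) -> str:
    # Stand-in for an agent that processes what is on the screen.
    return f"vision: {step}"


def payments_agent(step: str) -> str:
    # Stand-in for an agent that handles financial transactions.
    return f"payments: {step}"


@dataclass
class Orchestrator:
    """Lead agent that routes each step of a task to a specialised agent."""

    agents: Dict[str, Agent]

    def plan(self, task: str) -> List[Tuple[str, str]]:
        # Assumption: the plan is fixed here; a real system would generate it.
        return [
            ("planner", f"split '{task}' into steps"),
            ("browser", "open the ticketing site"),
            ("vision", "locate the seat map on screen"),
            ("payments", "pay for the selected seats"),
        ]

    def run(self, task: str) -> List[str]:
        # Dispatch each planned step to the agent registered under that name.
        results: List[str] = []
        for agent_name, step in self.plan(task):
            results.append(self.agents[agent_name](step))
        return results


def build_orchestrator() -> Orchestrator:
    return Orchestrator(
        agents={
            "planner": planner_agent,
            "browser": browser_agent,
            "vision": vision_agent,
            "payments": payments_agent,
        }
    )


if __name__ == "__main__":
    for line in build_orchestrator().run("book a movie ticket"):
        print(line)
```

Each agent in this sketch only echoes its step, but the structure shows the idea: the lead agent owns the plan and the routing, while each specialised agent handles a single kind of work.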

The open-source Magentic-One AI system is available on GitHub. It is available to researchers and developers, and can also be used for commercial purposes under a custom Microsoft licence. Alongside it, Microsoft has released AutoGenBench, a tool that evaluates the performance of AI agents. It comes with built-in controls for repetition and isolation to thoroughly test the agents.

(Except for the headline, this story has not been edited by NDTV staff and is published from a press release)
