Amazon has introduced Nova Act, an AI agent designed to perform tasks autonomously within web browsers. Developed by the Amazon AGI Lab in San Francisco, Nova Act is the company's entry into agentic AI, letting users automate multi-step processes such as booking tickets, ordering groceries, or navigating complex websites.
Nova Act interacts with web pages much as a human user would: clicking buttons, filling out forms, scrolling through pages, and selecting dates on calendars. It relies on visual understanding to interpret and act on site elements, which lets it handle tasks that often trip up other AI systems.
The AI agent breaks down workflows into smaller steps to complete complex processes. For example, it can search for items, add them to a cart, check out, and even answer questions based on on-screen activity.
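The step-by-step decomposition described above can be illustrated with a minimal Python sketch. It is not the Nova Act SDK: `StubAgent`, `act`, and `run_workflow` are hypothetical names invented for this example, and the stub only records each instruction instead of driving a browser. It shows the general pattern of passing an agent one small instruction at a time and stopping when a step fails.

```python
from dataclasses import dataclass, field


@dataclass
class StubAgent:
    """Hypothetical stand-in for a browser agent; it only records instructions."""
    log: list[str] = field(default_factory=list)

    def act(self, instruction: str) -> bool:
        # A real agent would drive the browser here; the stub just logs the step.
        self.log.append(instruction)
        return True


def run_workflow(agent: StubAgent, steps: list[str]) -> int:
    """Run steps in order, stopping at the first failure. Returns steps completed."""
    completed = 0
    for step in steps:
        if not agent.act(step):
            break
        completed += 1
    return completed


# The shopping example from the text, expressed as small, ordered steps.
steps = [
    "search for a coffee maker",
    "add the first result to the cart",
    "proceed to checkout",
]
agent = StubAgent()
done = run_workflow(agent, steps)
print(f"{done}/{len(steps)} steps completed")
```

Keeping each instruction small makes it easy to see which step failed and to retry or adjust it in isolation.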
Accompanying the launch is the Nova Act SDK, which allows developers to prototype their own AI agents tailored to specific tasks. Developers can access this toolkit on nova.amazon.com, where Amazon also showcases its Nova foundation models.
Nova Act will power features in Amazon’s upcoming Alexa+ upgrade, enhancing the voice assistant with generative AI capabilities for browser-based tasks. This integration positions Alexa+ as a more versatile tool for everyday needs.
Amazon claims Nova Act achieves over 90% accuracy on its internal evaluations of browser-based tasks, outperforming comparable tools from competitors such as OpenAI and Anthropic. According to the company, those internal tests cover interactions that are typically hard for agents, including pop-ups and drop-down menus.
Nova Act builds on the foundation models Amazon launched in December 2024: Nova Micro, Lite, and Pro, which specialize in text generation and context understanding. The same portfolio also includes the visual generation models Canvas (images) and Reel (videos).