Google DeepMind at NeurIPS 2024

Research

Published: 5 December 2024

Advancing adaptive AI agents, empowering 3D scene creation, and innovating LLM training for a smarter, safer future

Next week, AI researchers worldwide will gather for the 38th Annual Conference on Neural Information Processing Systems (NeurIPS), taking place December 10-15 in Vancouver.

Two papers led by Google DeepMind researchers will be recognized with Test of Time awards for their “undeniable influence” on the field. Ilya Sutskever will present on Sequence to Sequence Learning with Neural Networks, which was co-authored with Google DeepMind VP of Drastic Research, Oriol Vinyals, and Distinguished Scientist Quoc V. Le. Google DeepMind Scientists Ian Goodfellow and David Warde-Farley will present on Generative Adversarial Nets.

We’ll also show how we translate our foundational research into real-world applications, with live demonstrations including Gemma Scope, AI for music generation, weather forecasting and more.

Teams across Google DeepMind will present more than 100 new papers on topics ranging from AI agents and generative media to innovative learning approaches.

Building adaptive, smart, and safe AI agents

LLM-based AI agents are showing promise in carrying out digital tasks via natural language commands. Yet their success depends on precise interaction with complex user interfaces, which requires extensive training data. With AndroidControl, we share the most diverse control dataset to date, with over 15,000 human-collected demos across more than 800 apps. AI agents trained using this dataset showed significant performance gains, which we hope will help advance research into more general AI agents.

For AI agents to generalize across tasks, they need to learn from each experience they encounter. We present a method for in-context abstraction learning (ICAL) that helps agents grasp key task patterns and relationships from imperfect demos and natural language feedback, enhancing their performance and adaptability.

A frame from a video demonstration of someone making a sauce, with individual elements identified and numbered. ICAL is able to extract the important aspects of the process.

Developing agentic AI that works to fulfill users’ goals can help make the technology more useful, but alignment is critical when developing AI that acts on our behalf. To that end, we propose a theoretical method to measure an AI system’s goal-directedness, and also show how a model’s perception of its user can influence its safety filters. Together, these insights underscore the importance of robust safeguards to prevent unintended or unsafe behaviors, ensuring that AI agents’ actions remain aligned with safe, intended uses.

Advancing 3D scene creation and simulation

As demand for high-quality 3D content grows across industries like gaming and visual effects, creating lifelike 3D scenes remains costly and time-intensive. Our recent work introduces novel 3D generation, simulation, and control approaches, streamlining content creation for faster, more flexible workflows.

Producing high-quality, realistic 3D assets and scenes often requires capturing and modeling thousands of 2D photos. We showcase CAT3D, a system that can create 3D content in as little as a minute from any number of images, even just one image or a text prompt. CAT3D accomplishes this with a multi-view diffusion model that generates additional consistent 2D images from many different viewpoints, and uses those generated images as input for traditional 3D modelling techniques. Results surpass previous methods in both speed and quality.

CAT3D enables 3D scene creation from any number of generated or real images.

Left to right: Text-to-image-to-3D, a real photo to 3D, several photos to 3D.

Simulating scenes with many rigid objects, like a cluttered tabletop or tumbling Lego bricks, also remains computationally intensive. To overcome this roadblock, we present a new technique called SDF-Sim that represents object shapes in a scalable way, speeding up collision detection and enabling efficient simulation of large, complex scenes.

A complex simulation of hundreds of objects falling and colliding, accurately modelled using SDF-Sim.
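To illustrate why a distance-based shape representation can make collision checks cheap, here is a minimal sketch. It assumes "SDF" stands for signed distance function and uses an analytic sphere in place of whatever representation SDF-Sim actually uses; the function names and the sample scene are hypothetical and are not taken from SDF-Sim.

```python
import math


def sphere_sdf(point, center, radius):
    """Signed distance from a point to a sphere's surface.

    Negative inside the sphere, zero on the surface, positive outside.
    """
    return math.dist(point, center) - radius


def penetrating_points(surface_points, sdf):
    """Return the sample points of one object that lie inside another object.

    Each check is a single SDF evaluation, so the cost grows with the number
    of sample points rather than with the number of mesh triangles.
    """
    return [p for p in surface_points if sdf(p) < 0.0]


def unit_sphere(point):
    # Hypothetical obstacle: a unit sphere centred at the origin.
    return sphere_sdf(point, (0.0, 0.0, 0.0), 1.0)


# Hypothetical points sampled from the surface of a second object.
samples = [(0.5, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 0.9, 0.0)]
contacts = penetrating_points(samples, unit_sphere)
```

In this toy scene, the first and third sample points fall inside the sphere and would be treated as contacts, while the second lies outside it.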

AI image generators based on diffusion models struggle to control the 3D position and orientation of multiple objects. Our solution, Neural Assets, introduces object-specific representations that capture both appearance and 3D pose, learned through training on dynamic video data. Neural Assets enables users to move, rotate, or swap objects across scenes, making it a useful tool for animation, gaming, and virtual reality.

Given a source image and object 3D bounding boxes, we can translate, rotate, and rescale the object, or transfer objects or backgrounds between images.

Improving how LLMs learn and respond

We’re also advancing how LLMs train, learn, and respond to users, improving performance and efficiency on several fronts.

With larger context windows, LLMs can now learn from potentially thousands of examples at once, an approach known as many-shot in-context learning (ICL). This process boosts model performance on tasks like math, translation, and reasoning, but often requires high-quality, human-generated data. To make training more cost-effective, we explore methods to adapt many-shot ICL that reduce reliance on manually curated data.

There is so much data available for training language models that the main constraint for teams building them becomes the available compute. We address an important question: with a fixed compute budget, how do you choose the right model size to achieve the best results?
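As a rough illustration of the trade-off behind that question, the sketch below splits a fixed budget between model size and training tokens. It is not the method of the work referenced above. It assumes the widely used approximation that training cost is about 6 FLOPs per parameter per token, and a fixed tokens-per-parameter ratio of 20; both defaults, and the function name, are assumptions chosen for the example.

```python
import math


def compute_optimal_size(flops_budget, tokens_per_param=20.0,
                         flops_per_param_token=6.0):
    """Split a training budget C between parameters N and tokens D.

    Assumes C ~= k * N * D with D = r * N, which gives N = sqrt(C / (k * r)).
    Both k (flops_per_param_token) and r (tokens_per_param) are assumptions.
    """
    params = math.sqrt(flops_budget / (flops_per_param_token * tokens_per_param))
    tokens = tokens_per_param * params
    return params, tokens


# Hypothetical budget of 1.2e20 training FLOPs.
params, tokens = compute_optimal_size(1.2e20)
```

Under these assumptions, the example budget works out to a model of roughly one billion parameters trained on roughly twenty billion tokens. Changing the assumed ratio shifts the balance between a larger model and more data.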

Another innovative approach, which we call Time-Reversed Language Models (TRLM), explores pretraining and finetuning an LLM to work in reverse. When given traditional LLM responses as input, a TRLM generates queries that might have produced those responses. When paired with a traditional LLM, this method not only helps ensure responses follow user instructions better, but also improves the generation of citations for summarized text, and enhances safety filters against harmful content.

Curating high-quality data is vital for training large AI models, but manual curation is difficult at scale. To address this, our Joint Example Selection (JEST) algorithm optimizes training by identifying the most learnable data within larger batches, enabling up to 13× fewer training rounds and 10× less computation, outperforming state-of-the-art multimodal pretraining baselines.
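The idea of picking the most learnable data from a larger batch can be sketched as follows. This is a simplified illustration, not the JEST algorithm itself: it assumes learnability is scored as the current model's loss minus a reference model's loss, and it scores examples independently rather than jointly. The function names and loss values are hypothetical.

```python
def learnability(learner_loss, reference_loss):
    """Assumed score: high when the learner still struggles with an example
    that a reference model already handles well."""
    return learner_loss - reference_loss


def select_most_learnable(learner_losses, reference_losses, keep):
    """Return the indices of the `keep` highest-scoring examples in a batch."""
    scores = [learnability(l, r)
              for l, r in zip(learner_losses, reference_losses)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


# Hypothetical per-example losses for a super-batch of four examples.
learner_losses = [2.0, 0.3, 1.5, 2.2]
reference_losses = [1.9, 0.2, 0.4, 0.6]
chosen = select_most_learnable(learner_losses, reference_losses, keep=2)
```

Here the last two examples are kept: the learner's loss on them is high while the reference model's is low. The first example is hard for both models, and the second is already easy for the learner, so neither is selected.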

Planning tasks are another challenge for AI, particularly in stochastic environments, where outcomes are influenced by randomness or uncertainty. Researchers use various inference types for planning, but there’s no consistent approach. We demonstrate that planning itself can be viewed as a distinct type of probabilistic inference and propose a framework for ranking different inference techniques based on their planning effectiveness.

Bringing together the global AI community

We’re proud to be a Diamond Sponsor of the conference, and support Women in Machine Learning, LatinX in AI and Black in AI in building communities around the world working in AI, machine learning and data science.

If you’re at NeurIPS this year, swing by the Google DeepMind and Google Research booths to explore cutting-edge research in demos, workshops and more throughout the conference.
