IC Colloquium

Monday 21 October 2019 @ 16:15, room BC 420


Memory-Efficient Adaptive Optimization for Humungous-Scale Learning

Yoram Singer

Princeton University


Abstract

Adaptive gradient-based optimizers such as AdaGrad and Adam are among the methods of choice in modern machine learning. These methods maintain second-order statistics for each model parameter, which at least doubles the memory footprint of training compared with storing the parameters alone. In behemoth-size applications, this memory overhead restricts both the size of the model and the number of examples in a mini-batch. I start by giving a general overview of adaptive gradient methods. I then describe a novel, simple, and flexible adaptive optimization method with sublinear memory cost that retains the benefits of classical adaptive methods. I give convergence guarantees for the method and demonstrate its effectiveness in training some of the largest deep models.
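The abstract does not spell out the new method. As an illustration only, the sketch below contrasts full AdaGrad, which keeps one accumulator per parameter, with one known way of getting sublinear memory for a matrix parameter: keep a single accumulator per row and per column, and use the smaller of the two as each entry's estimate (the "cover set" idea behind the SM3 optimizer). Whether the talk's method works exactly this way is an assumption, and the helper names `adagrad_step` and `cover_set_step` are hypothetical.

```python
import math

# Hypothetical helper names; an illustrative sketch, not the method from the talk.


def adagrad_step(w, g, acc, lr=0.1, eps=1e-8):
    """Full AdaGrad on an m x n matrix: one accumulator per parameter (m * n floats)."""
    for i in range(len(w)):
        for j in range(len(w[0])):
            acc[i][j] += g[i][j] ** 2  # running sum of squared gradients
            w[i][j] -= lr * g[i][j] / (math.sqrt(acc[i][j]) + eps)


def cover_set_step(w, g, row_acc, col_acc, lr=0.1, eps=1e-8):
    """Row/column cover-set variant: one accumulator per row and per column (m + n floats)."""
    new_row = [0.0] * len(row_acc)
    new_col = [0.0] * len(col_acc)
    for i in range(len(w)):
        for j in range(len(w[0])):
            # Per-entry estimate: the smaller covering accumulator plus the new squared gradient.
            nu = min(row_acc[i], col_acc[j]) + g[i][j] ** 2
            w[i][j] -= lr * g[i][j] / (math.sqrt(nu) + eps)
            # Each row/column accumulator keeps the largest estimate among the entries it covers.
            new_row[i] = max(new_row[i], nu)
            new_col[j] = max(new_col[j], nu)
    row_acc[:] = new_row
    col_acc[:] = new_col


# Toy 2 x 3 parameter matrix updated with the same gradients by both optimizers.
w_full = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]
w_cover = [row[:] for row in w_full]
acc = [[0.0] * 3 for _ in range(2)]      # 6 accumulator floats
row_acc, col_acc = [0.0] * 2, [0.0] * 3  # 5 accumulator floats
grads = [
    [[0.1, -0.2, 0.3], [0.4, 0.0, -0.1]],
    [[-0.3, 0.1, 0.2], [0.2, -0.4, 0.0]],
]
for g in grads:
    adagrad_step(w_full, g, acc)
    cover_set_step(w_cover, g, row_acc, col_acc)

# The row/column estimate never falls below AdaGrad's per-entry sum of squares.
for i in range(2):
    for j in range(3):
        assert min(row_acc[i], col_acc[j]) >= acc[i][j] - 1e-12

print(len(row_acc) + len(col_acc), "vs", sum(len(r) for r in acc))  # prints "5 vs 6"
```

For an m x n weight matrix the full accumulator holds m * n numbers while the row/column version holds m + n, e.g. 2,000 instead of 1,000,000 for a 1000 x 1000 layer. The price is that each entry's estimate is only an upper bound on AdaGrad's sum of squared gradients, so the effective step sizes can be smaller.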


Biography

Yoram Singer is a professor of Computer Science at Princeton University. He was a member of the technical staff at AT&T Research (1995-1999), an associate professor at the Hebrew University (1999-2007), and a Principal Scientist at Google (2005-2019). At Google, he implemented and launched Google’s Domain Spam classifier, used for all search queries from 2004 to 2017; co-founded the Sibyl system, which served YouTube predictions from 2008 to 2018; and founded both the Principles Of Effective Machine-learning group and Google’s AI Lab at Princeton. He co-chaired COLT’04 and NIPS’04. He is a fellow of AAAI.

Host: Martin Jaggi


The video of the talk will be available on the IC Memento afterwards.