- IC Colloquium
- Monday 21 October 2019 @ 16:15, room BC 420
- Memory-Efficient Adaptive Optimization for Humungous-Scale Learning
- Yoram Singer
- Princeton University
- Adaptive gradient-based optimizers such as AdaGrad and Adam are among the methods of choice in modern machine learning. These methods maintain second-order statistics of each model parameter, thus doubling the memory footprint of the optimizer. In behemoth-size applications, this memory overhead restricts the size of the model being used as well as the number of examples in a mini-batch. I start by giving a general overview of adaptive gradient methods. I then describe a novel, simple, and flexible adaptive optimization method with sublinear memory cost that retains the benefits of classical adaptive methods. I give convergence guarantees for the method and demonstrate its effectiveness in training some of the largest deep models.
- Yoram Singer is a professor of Computer Science at Princeton University. He was a member of the technical staff at AT&T Research 1995-1999, an associate professor at the Hebrew University 1999-2007, and a Principal Scientist at Google 2005-2019. At Google, he implemented and launched Google’s Domain Spam classifier used for all search queries 2004-2017, co-founded the Sibyl system, which served YouTube predictions 2008-2018, and founded both the Principles Of Effective Machine-learning group and Google’s AI Lab at Princeton. He co-chaired COLT’04 and NIPS’04. He is a fellow of AAAI.
- Host: Martin Jaggi
- A video of the talk will be available on the IC Memento afterwards.