Large-Scale Optimization: Beyond Stochastic Gradient Descent and Convexity[MP4] [1:55:55] [2017/06/08]Stochastic optimization lies at the heart of machine learning, and its cornerstone is stochastic gradient descent (SGD), a staple introduced over 60 years ago! Recent years have, however, brought an…
Machine Learning and the Law Symposium Session 1[MP4] [1:53:30] [2017/04/25]Advances in machine learning and artificial intelligence mean that predictions and decisions of algorithms are already in use in many important situations under legal or regulatory control, and this…
Nuts and Bolts of Building Applications using Deep Learning[MP4] [2:06:54] [2017/04/25]How do you get deep learning to work in your business, product, or scientific study? The rise of highly scalable deep learning techniques is changing how you can best approach AI problems. This…
Machine Learning and the Law Symposium Session 3[MP4] [1:31:28] [2017/04/25]Advances in machine learning and artificial intelligence mean that predictions and decisions of algorithms are already in use in many important situations under legal or regulatory control, and this…
Machine Learning and the Law Symposium Session 2[MP4] [1:47:33] [2017/04/24]Advances in machine learning and artificial intelligence mean that predictions and decisions of algorithms are already in use in many important situations under legal or regulatory control, and this…
Recurrent Neural Networks and Other Machines that Learn Algorithms Symposium Session 3[MP4] [1:28:15] [2017/03/17]Soon after the birth of modern computer science in the 1930s, two fundamental questions arose: 1. How can computers learn useful programs from experience, as opposed to being programmed by human…
Recurrent Neural Networks and Other Machines that Learn Algorithms Symposium Session 1[MP4] [1:53:55] [2017/03/02]Soon after the birth of modern computer science in the 1930s, two fundamental questions arose: 1. How can computers learn useful programs from experience, as opposed to being programmed by human…
Recurrent Neural Networks and Other Machines that Learn Algorithms Symposium Session 2[MP4] [1:46:35] [2017/02/28]Soon after the birth of modern computer science in the 1930s, two fundamental questions arose: 1. How can computers learn useful programs from experience, as opposed to being programmed by human…
Deep Reinforcement Learning Through Policy Optimization[MP4] [1:59:06] [2017/01/24]Deep Reinforcement Learning (Deep RL) has seen several breakthroughs in recent years. In this tutorial we will focus on recent advances in Deep RL through policy gradient methods and actor-critic methods.…
Theory and Algorithms for Forecasting Non-Stationary Time Series[MP4] [1:45:04] [2017/01/24]Time series appear in a variety of key real-world applications such as signal processing, including audio and video processing; the analysis of natural phenomena such as local weather, global…
Crowdsourcing: Beyond Label Generation[MP4] [1:50:18] [2017/01/24]This tutorial will showcase some of the most innovative uses of crowdsourcing that have emerged in the past few years. While some have clear and immediate benefits to machine learning, we will also…
ML Foundations and Methods for Precision Medicine and Healthcare[MP4] [2:08:17] [2017/01/24]Electronic health records and high throughput measurement technologies are changing the practice of healthcare to become more algorithmic and data-driven. This offers an exciting opportunity for…
Variational Inference: Foundations and Modern Methods[MP4] [1:53:04] [2017/01/24]One of the core problems of modern statistics and machine learning is to approximate difficult-to-compute probability distributions. This problem is especially important in probabilistic modeling,…
Generative Adversarial Networks[MP4] [1:55:53] [2017/01/24]Generative adversarial networks (GANs) are a recently introduced class of generative models, designed to produce realistic samples. This tutorial is intended to be accessible to an audience who has no…
Predictive Learning[MP4] [0:56:52] [2017/01/24]Deep learning has been at the root of significant progress in many application areas, such as computer perception and natural language processing. But almost all of these systems currently use…
Intelligent Biosphere[MP4] [0:49:42] [2017/01/24]The biosphere is a stupendously complex and poorly understood system, which we depend on for our survival, and which we are attacking on every front. Worrying. But what has that got to do with machine…
Value Iteration Networks[MP4] [0:19:40] [2017/01/24]We introduce the value iteration network (VIN): a fully differentiable neural network with a "planning module" embedded within. VINs can learn to plan, and are suitable for predicting outcomes that…
Tractable Operations for Arithmetic Circuits of Probabilistic Models[MP4] [0:19:15] [2017/01/24]We consider tractable representations of probability distributions and the polytime operations they support. In particular, we consider a recently proposed arithmetic circuit representation, the…
Testing for Differences in Gaussian Graphical Models: Applications to Brain Connectivity[MP4] [0:16:51] [2017/01/24]Functional brain networks are well described and estimated from data with Gaussian Graphical Models (GGMs), e.g. using sparse inverse covariance estimators. Comparing functional connectivity of…
SDP Relaxation with Randomized Rounding for Energy Disaggregation[MP4] [0:21:36] [2017/01/24]We develop a scalable, computationally efficient method for the task of energy disaggregation for home appliance monitoring. In this problem the goal is to estimate the energy consumption of each…
Bayesian Intermittent Demand Forecasting for Large Inventories[MP4] [0:18:01] [2017/01/24]We present a scalable and robust Bayesian method for demand forecasting in the context of a large e-commerce platform, paying special attention to intermittent and bursty target statistics. Inference…
Synthesis of MCMC and Belief Propagation[MP4] [0:17:26] [2017/01/24]Markov Chain Monte Carlo (MCMC) and Belief Propagation (BP) are the most popular algorithms for computational inference in Graphical Models (GM). In principle, MCMC is an exact probabilistic method…
Deep Learning for Predicting Human Strategic Behavior[MP4] [0:19:19] [2017/01/24]Predicting the behavior of human participants in strategic settings is an important problem in many domains. Most existing work either assumes that participants are perfectly rational, or attempts to…
Using Fast Weights to Attend to the Recent Past[MP4] [0:21:02] [2017/01/24]Until recently, research on artificial neural networks was largely restricted to systems with only two types of variable: Neural activities that represent the current or recent input and weights that…
Sequential Neural Models with Stochastic Layers[MP4] [0:20:18] [2017/01/24]How can we efficiently propagate uncertainty in a latent state representation with recurrent neural networks? This paper introduces stochastic recurrent neural networks which glue a deterministic…
Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences[MP4] [0:20:55] [2017/01/24]Recurrent Neural Networks (RNNs) have become the state-of-the-art choice for extracting patterns from temporal sequences. Current RNN models are ill-suited to process irregularly sampled data…
Graphons, mergeons, and so on![MP4] [0:17:10] [2017/01/24]In this work we develop a theory of hierarchical clustering for graphs. Our modelling assumption is that graphs are sampled from a graphon, which is a powerful and general model for generating graphs…
Hierarchical Clustering via Spreading Metrics[MP4] [0:17:40] [2017/01/24]We study the cost function for hierarchical clusterings introduced by [Dasgupta, 2015] where hierarchies are treated as first-class objects rather than deriving their cost from projections into flat…
Clustering with Same-Cluster Queries[MP4] [0:18:05] [2017/01/24]We propose a Semi-Supervised Active Clustering (SSAC) framework, where the learner is allowed to interact with a domain expert, asking whether two given instances belong to the same…
Unsupervised Feature Extraction by Time-Contrastive Learning and Nonlinear ICA[MP4] [0:22:50] [2017/01/24]Nonlinear independent component analysis (ICA) provides an appealing framework for unsupervised feature learning, but the models proposed so far are not identifiable. Here, we first propose a new…
Fast and Provably Good Seedings for k-Means[MP4] [0:19:44] [2017/01/24]Seeding - the task of finding initial cluster centers - is critical in obtaining high-quality clusterings for k-Means. However, k-means++ seeding, the state-of-the-art algorithm, does not…
Supervised learning through the lens of compression[MP4] [0:16:52] [2017/01/24]This work continues the study of the relationship between sample compression schemes and statistical learning, which has been mostly investigated within the framework of binary classification. We…
MetaGrad: Multiple Learning Rates in Online Learning[MP4] [0:16:36] [2017/01/24]In online convex optimization it is well known that certain subclasses of objective functions are much easier than arbitrary convex functions. We are interested in designing adaptive methods that can…
Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning[MP4] [0:15:52] [2017/01/24]We study the sampling-based planning problem in Markov decision processes (MDPs) that we can access only through a generative model, usually referred to as Monte-Carlo planning. Our objective is to…
Global Analysis of Expectation Maximization for Mixtures of Two Gaussians[MP4] [0:18:58] [2017/01/24]Expectation Maximization (EM) is among the most popular algorithms for estimating parameters of statistical models. However, EM, which is an iterative algorithm based on the maximum likelihood…
Machine Learning and Likelihood-Free Inference in Particle Physics[MP4] [0:50:37] [2017/01/24]Particle physics aims to answer profound questions about the fundamental building blocks of the Universe through enormous data sets collected at experiments like the Large Hadron Collider at CERN.…
Matrix Completion has No Spurious Local Minimum[MP4] [0:18:26] [2017/01/24]Matrix completion is a basic machine learning problem that has wide applications, especially in collaborative filtering and recommender systems. Simple non-convex optimization algorithms are popular…
Large-Scale Price Optimization via Network Flow[MP4] [0:18:59] [2017/01/24]This paper deals with price optimization, which is to find the best pricing strategy that maximizes revenue or profit, on the basis of demand forecasting models. Though recent advances in regression…
Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks[MP4] [0:18:17] [2017/01/24]We study the problem of synthesizing a number of likely future frames from a single input image. In contrast to traditional methods, which have tackled this problem in a deterministic or…
Supervised Word Mover's Distance[MP4] [0:21:30] [2017/01/24]Accurately measuring the similarity between text documents lies at the core of many real world applications of machine learning. These include web-search ranking, document recommendation,…
Beyond Exchangeability: The Chinese Voting Process[MP4] [0:19:34] [2017/01/24]Many online communities present user-contributed responses, such as reviews of products and answers to questions. User-provided helpfulness votes can highlight the most useful responses, but voting is…
Protein contact prediction from amino acid co-evolution using convolutional networks for…[MP4] [0:20:12] [2017/01/24]Proteins are the "building blocks of life", the most abundant organic molecules, and the central focus of most areas of biomedicine. Protein structure is strongly related to protein…
Deep Learning without Poor Local Minima[MP4] [0:19:19] [2017/01/24]In this paper, we prove a conjecture published in 1989 and also partially address an open problem announced at the Conference on Learning Theory (COLT) 2015. For an expected loss function of a deep…
Learning to Poke by Poking: Experiential Learning of Intuitive Physics[MP4] [0:21:00] [2017/01/24]We investigate an experiential learning paradigm for acquiring an internal model of intuitive physics. Our model is evaluated on a real-world robotic manipulation task that requires displacing objects…
Learning What and Where to Draw[MP4] [0:21:37] [2017/01/24]Generative Adversarial Networks (GANs) have recently demonstrated the capability to synthesize compelling real-world images, such as room interiors, album covers, manga, faces, birds, and flowers.…
Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks[MP4] [0:22:55] [2017/01/24]We present weight normalization: a reparameterization of the weight vectors in a neural network that decouples the length of those weight vectors from their direction. By reparameterizing the weights…
Achieving the KS threshold in the general stochastic block model with linearized acyclic belief…[MP4] [0:15:04] [2017/01/24]The stochastic block model (SBM) has long been studied in machine learning and network science as a canonical model for clustering and community detection. In recent years, new developments have…
Orthogonal Random Features[MP4] [0:18:42] [2017/01/24]We present an intriguing discovery related to Random Fourier Features: replacing multiplication by a random Gaussian matrix with multiplication by a properly scaled random orthogonal matrix…
Poisson-Gamma dynamical systems[MP4] [0:16:40] [2017/01/24]This paper presents a dynamical system based on the Poisson-Gamma construction for sequentially observed multivariate count data. Inherent to the model is a novel Bayesian nonparametric prior that…
The Multiscale Laplacian Graph Kernel[MP4] [0:20:34] [2017/01/24]Many real world graphs, such as the graphs of molecules, exhibit structure at multiple different scales, but most existing kernels between graphs are either purely local or purely global in character.…
Stochastic Online AUC Maximization[MP4] [0:14:40] [2017/01/24]Area under ROC (AUC) is a metric which is widely used for measuring the classification performance for imbalanced data. It is of theoretical and practical interest to develop online learning…
Without-Replacement Sampling for Stochastic Gradient Methods[MP4] [0:19:55] [2017/01/24]Stochastic gradient methods for machine learning and optimization problems are usually analyzed assuming data points are sampled with replacement. In contrast, sampling without replacement is far less…
Regularized Nonlinear Acceleration[MP4] [0:18:52] [2017/01/24]We describe a convergence acceleration technique for generic optimization problems. Our scheme computes estimates of the optimum from a nonlinear average of the iterates produced by any optimization…
Generalization of ERM in Stochastic Convex Optimization: The Dimension Strikes Back[MP4] [0:16:21] [2017/01/24]In stochastic convex optimization the goal is to minimize a convex function $F(x) \doteq \E_{f\sim D}[f(x)]$ over a convex set $\K \subset \R^d$ where $D$ is some unknown distribution and each …
Bayesian Optimization with Robust Bayesian Neural Networks[MP4] [0:14:50] [2017/01/24]Bayesian optimization is a prominent method for optimizing expensive-to-evaluate black-box functions and is widely applied to tuning the hyperparameters of machine learning algorithms. Despite…
Learning About the Brain: Neuroimaging and Beyond[MP4] [0:51:37] [2017/01/24]Quantifying mental states and identifying "statistical biomarkers" of mental disorders from neuroimaging data is an exciting and rapidly growing research area at the intersection of…
Reproducible Research: the Case of the Human Microbiome[MP4] [0:55:53] [2017/01/24]Modern data sets usually present multiple levels of heterogeneity, some apparent such as the necessity of combining trees, graphs, contingency tables and continuous covariates, others concern latent…
Interpretable Distribution Features with Maximum Testing Power[MP4] [0:22:45] [2017/01/24]Two semimetrics on probability distributions are proposed, given as the sum of differences of expectations of analytic functions evaluated at spatial or frequency locations (i.e., features). The…
Examples are not enough, learn to criticize! Criticism for Interpretability[MP4] [0:20:39] [2017/01/24]Example-based explanations are widely used in the effort to improve the interpretability of highly complex distributions. However, prototypes alone are rarely sufficient to represent the gist of the…
Showing versus doing: Teaching by demonstration[MP4] [0:19:03] [2017/01/24]People often learn from others' demonstrations, and classic inverse reinforcement learning (IRL) algorithms have brought us closer to realizing this capacity in machines. In contrast, teaching by…
Relevant sparse codes with variational information bottleneck[MP4] [0:17:53] [2017/01/24]In many applications, it is desirable to extract only the relevant aspects of data. A principled way to do this is the information bottleneck (IB) method, where one seeks a code that maximises…
Dense Associative Memory for Pattern Recognition[MP4] [0:24:11] [2017/01/24]A model of associative memory is studied, which stores and reliably retrieves many more patterns than the number of neurons in the network. We propose a simple duality between this dense associative…
Deep Learning Symposium Session 3[MP4] [1:21:49] [2017/01/24]Deep Learning algorithms attempt to discover good representations, at multiple levels of abstraction. Deep Learning is a topic of broad interest, both to researchers who develop new algorithms and…
Deep Learning Symposium Session 2[MP4] [1:05:19] [2017/01/24]Deep Learning algorithms attempt to discover good representations, at multiple levels of abstraction. Deep Learning is a topic of broad interest, both to researchers who develop new algorithms and…
Deep Learning Symposium Session 1[MP4] [1:54:39] [2017/01/24]Deep Learning algorithms attempt to discover good representations, at multiple levels of abstraction. Deep Learning is a topic of broad interest, both to researchers who develop new algorithms and…