My Picks From NeurIPS 2020

NeurIPS 2020 was virtual this year. As a result, not only the talks but also the networking and poster sessions were held online. I got to experience gather.town for the first time. It felt like playing a video game at times. I changed my avatar many times :D
All the keynotes had sign language interpretation. I thought it was cool!
Below are some of the talks that I enjoyed watching or reading.

Keynotes:

  • The Genomic Bottleneck: A Lesson from Biology
    • Humans' success over other animals is perhaps due to better priors
    • The neocortex in the brain is a repetitive structure
    • Humans have broken the cultural barrier through language, and thus transfer knowledge from one generation to another
  • Causal Learning

  • You Can’t Escape Hyperparameters and Latent Variables: Machine Learning as a Software Engineering Enterprise
    • We are compiler hackers
    • How can we make AI that makes our lives better?
    • Pay attention to what AI you are designing and putting out into the world
    • Be aware of the bias in the data and how it was collected
    • Pay attention to how your model will be used
    • Tradeoffs make some things better and some things worse. Code bless us all!
  • Feedback control loop
    • Multi-agent learning games
    • Non-stationary environment
    • (1) Stabilize and shape behavior
      • Two player and multi-player/agent learning
      • Zero-sum game
      • Allocation game
      • Learning rule example: replicator dynamics
      • Can natural learning rules lead to a Nash equilibrium?
        • Update rule depends on a user’s own strategy
        • Uncoupled learning rule
        • Anti-coordination game
      • Introduce auxiliary states for higher order learning
      • Anticipatory learning - take into account where things are heading
      • Optimization - optimistic gradient ascent (a toy sketch appears at the end of the keynote notes)
    • (2) Robustness to variation
      • Internal: thrust, drag
      • External: weather
      • Nominal analysis
      • Contractive game: allow auxiliary dynamics over long term
      • Partially observed Markov process
      • PAC verification: (1) completeness, (2) soundness
    • (3) Track command signals
      • Forecasting and no-regret
      • Distributional forecast
  • Robustness, Verification, Privacy: Addressing Machine Learning Adversaries
    • Privacy
    • Robustness
    • Verifiability
      • ML is a risk assessment system
      • Build trust
      • Who builds the ML system and who verifies the ML code is correct?
      • Verifying the training distribution can be framed as verifying a hypothesis
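
As a quick illustration of the optimistic-gradient idea from the feedback-control keynote, here is a toy sketch of optimistic gradient descent-ascent on a bilinear two-player zero-sum game. The game, step size, and iteration count are my own choices, not from the talk.

```python
import numpy as np

# Toy bilinear zero-sum game f(x, y) = x^T A y: player x minimizes, player y
# maximizes. Plain simultaneous gradient descent-ascent cycles around the
# equilibrium here; the optimistic update (extrapolating with the previous
# gradient) converges to the equilibrium at the origin.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
x, y = np.ones(2), np.ones(2)
prev_gx, prev_gy = np.zeros(2), np.zeros(2)
lr = 0.1

for _ in range(2000):
    gx, gy = A @ y, A.T @ x          # gradients of f with respect to x and y
    x = x - lr * (2 * gx - prev_gx)  # optimistic descent step for x
    y = y + lr * (2 * gy - prev_gy)  # optimistic ascent step for y
    prev_gx, prev_gy = gx, gy

print(np.round(x, 4), np.round(y, 4))  # both players approach the zero vector
```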

Meta learning

  • Continual Deep learning
    • Catastrophic forgetting as new data arrives
    • Some weights are regularized while others are not
    • Function regularization
      • Identify few crucial examples
      • (1) Convert neural network to GP
      • (2) Identify memorable past examples (on the boundary?)
      • (3) Train with function regularization on the memorable past
    • ELBO - variational objective
  • [Adversarially Robust Few-Shot Learning: A Meta-Learning Approach]
    • Adversarial examples only in the outer loop
    • Meta-learner is adversarially robust
    • Query data matters when doing adversarial training
    • The setting is few-shot learning
    • Outer loop queries with adversarial examples
  • Online continual learning
    • Online data
    • Reservoir sampling (a minimal sketch follows this list)
    • Replay buffer
    • C-MAML
      • Inner and outer loop model
    • Adaptive learning rate schedule - per parameter
      • Conservative gradient updates relative to the previous task
      • gradient clipping and masking
    • Single pass and multiple pass - how many times an example is used
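
Since the online continual learning talk leans on reservoir sampling to maintain the replay buffer, here is a minimal sketch of a reservoir-sampled buffer. The class and its interface are illustrative, not the authors' code.

```python
import random

class ReservoirBuffer:
    """Replay buffer where every example seen so far has equal probability
    of being retained, without storing the whole stream (Algorithm R)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []
        self.n_seen = 0

    def add(self, example):
        self.n_seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            # Keep the new example with probability capacity / n_seen,
            # replacing a uniformly random slot.
            j = random.randrange(self.n_seen)
            if j < self.capacity:
                self.buffer[j] = example

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```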

Reinforcement learning

Normalizing flows

ML Compiler

Equivariant networks

  • Symmetry: a transformation that leaves some aspect of the object invariant
  • Transformation
  • GNN - the symmetry is permutation of the adjacency matrix (a quick check is sketched below)
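
To make the adjacency-matrix symmetry point concrete, here is a small numpy check that a bare-bones message-passing layer is permutation-equivariant: relabeling the nodes (and permuting the adjacency matrix accordingly) permutes the output rows the same way. The layer is a generic placeholder, not a model from the talks.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 3
A = rng.integers(0, 2, size=(n, n))
A = np.triu(A, 1); A = A + A.T            # symmetric adjacency, no self-loops
X = rng.normal(size=(n, d))               # node features
W = rng.normal(size=(d, d))               # shared weight matrix

def gnn_layer(A, X, W):
    # Aggregate self + neighbor features, then apply a shared linear map and ReLU.
    return np.maximum(0, (A + np.eye(len(A))) @ X @ W)

P = np.eye(n)[rng.permutation(n)]          # random permutation matrix
out = gnn_layer(A, X, W)
out_perm = gnn_layer(P @ A @ P.T, P @ X, W)
print(np.allclose(P @ out, out_perm))      # True: output permutes with the nodes
```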

Representation learning

  • Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning
    • Self-supervised learning on par with SimCLR
    • Image augmentation
    • Prediction network and a target network
    • The target network's weights are an exponential moving average of the prediction network's weights; training maximizes the cosine similarity between the outputs of the two networks (a sketch follows this list)
  • Space-Time Correspondence as a Contrastive Random Walk
    • Contrastive learning
    • Data augmentation has hyperparameters and requires supervision
    • Dynamics in Video
    • Mining correspondence
    • Temporal coherence
    • Latent views - intermediate views
      • Palindrome views
      • Random walk from a query frame to a target node and edge strength
      • Encoder
      • Pairwise similarity and softmax function
      • Transition probability
      • Each step of the random walk is a contrastive learning task (also sketched after this list)
      • Self-supervision is also applied using palindrome frame sequences
      • Edge dropout improves object level correspondence
    • Label propagation
    • The method outperforms colorization-based methods
  • Learning Physical Graph Representations from Visual Scenes
  • Multi-label Contrastive Predictive Coding
  • Learning optimal representation
    • Consider a subset of classifiers that are accessible
    • Minimality causes generalization
  • [Manifold in Graph embedding]
    • Representation of each point in the embedding space
    • Singular value decomposition
  • [Hebbian Memory Network]
    • Biology inspired
    • Single layer memory module
    • Based on associative memory implemented by plasticity
    • Memory module is differentiable
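
For BYOL, here is a hedged sketch of the online/target setup described above: the loss pulls the normalized outputs of the two networks toward each other, and the target weights track the online weights via an exponential moving average. The tiny MLPs, the noise "augmentations", and the hyperparameters are placeholders; the actual method also uses a separate predictor head and real image augmentations.

```python
import torch
import torch.nn.functional as F

online = torch.nn.Sequential(torch.nn.Linear(32, 32), torch.nn.ReLU(), torch.nn.Linear(32, 16))
target = torch.nn.Sequential(torch.nn.Linear(32, 32), torch.nn.ReLU(), torch.nn.Linear(32, 16))
target.load_state_dict(online.state_dict())
for p in target.parameters():
    p.requires_grad_(False)           # the target network is never trained directly

opt = torch.optim.SGD(online.parameters(), lr=0.05)
tau = 0.99                            # EMA decay for the target network

for step in range(100):
    x = torch.randn(8, 32)
    v1 = x + 0.1 * torch.randn_like(x)   # stand-ins for two augmented views
    v2 = x + 0.1 * torch.randn_like(x)
    pred, targ = online(v1), target(v2).detach()
    # Negative cosine similarity between the online prediction and the target output.
    loss = -F.cosine_similarity(pred, targ, dim=-1).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    # Target weights = exponential moving average of the online weights.
    with torch.no_grad():
        for p_t, p_o in zip(target.parameters(), online.parameters()):
            p_t.mul_(tau).add_((1 - tau) * p_o)
```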
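
For the contrastive random walk, here is a sketch of how pairwise similarity plus a softmax gives transition probabilities between nodes of adjacent frames, chained along a palindrome sequence so that the training target is the identity. The random features and the temperature are stand-ins for the encoder outputs.

```python
import numpy as np

def transition_matrix(feats_t, feats_t1, temperature=0.07):
    """Row-stochastic transition probabilities from frame t nodes to frame t+1 nodes."""
    feats_t = feats_t / np.linalg.norm(feats_t, axis=1, keepdims=True)
    feats_t1 = feats_t1 / np.linalg.norm(feats_t1, axis=1, keepdims=True)
    sim = feats_t @ feats_t1.T / temperature           # pairwise cosine similarity
    sim -= sim.max(axis=1, keepdims=True)              # numerical stability
    probs = np.exp(sim)
    return probs / probs.sum(axis=1, keepdims=True)

# Palindrome walk: go forward through the frames and back again; the model is
# trained so that each node returns to where it started (identity target).
frames = [np.random.randn(10, 64) for _ in range(4)]   # 10 nodes per frame
walk = frames + frames[-2::-1]
P = np.eye(10)
for a, b in zip(walk[:-1], walk[1:]):
    P = P @ transition_matrix(a, b)
# Cross-entropy between the rows of P and the identity matrix is the loss.
```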

Vision application

  • Rethinking Pre-training and Self-training
    • How do you incorporate unlabeled data into your task?
    • Pretraining and transfer weights for downstream task
      • Its value diminishes with more labeled data
    • Self-training: co-training / SimCLR
    • SimCLR and pre-training
    • When pre-training hurts, self-training helps
    • Does self training have any limit?
    • Joint training helps
    • Pre-training and self-training on the same task are additive (a generic self-training sketch follows this list)
  • Neural Sparse Voxel Fields - NSVF
    • Voxel embedding
    • Ray-voxel intersection and ray marching
    • Progressive initialization with self-pruning
  • 3D Multi-bodies: Fitting Sets of Plausible 3D Human Models to Ambiguous Image Data
    • Set of plausible meshes
    • 3D reconstruction from 2D has to address ambiguity
    • n-quantization for multiple hypotheses, normalizing flows, and a reprojection loss to address mode degeneration and sparse gradients
    • Normalizing flows convert a complex distribution, such as one over 3D meshes, into a simpler distribution, such as a multivariate Gaussian
  • Do Adversarially Robust ImageNet Models Transfer Better?
    • What features a model learns depends on:
      • Convolutional prior
      • Data augmentation
      • Loss function
    • Adversarial robustness
      • Train model with adversarial examples
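
The self-training loop discussed in "Rethinking Pre-training and Self-training" can be sketched generically as teacher pseudo-labeling followed by student training. Logistic regression, the synthetic data, and the 0.9 confidence threshold below are stand-ins for the paper's large vision models and training recipe.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_lab = rng.normal(size=(100, 5))
y_lab = (X_lab[:, 0] > 0).astype(int)       # toy labeled set
X_unlab = rng.normal(size=(1000, 5))        # unlabeled pool

# 1) Train a teacher on the labeled data.
teacher = LogisticRegression().fit(X_lab, y_lab)

# 2) Pseudo-label the unlabeled data and keep only confident predictions.
probs = teacher.predict_proba(X_unlab)
confident = probs.max(axis=1) > 0.9
pseudo_y = probs.argmax(axis=1)[confident]

# 3) Train a student on labeled + confidently pseudo-labeled data.
X_student = np.concatenate([X_lab, X_unlab[confident]])
y_student = np.concatenate([y_lab, pseudo_y])
student = LogisticRegression().fit(X_student, y_student)
```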

Evaluation

  • Accuracy is not enough
    • Precision and Recall
    • Ranking
      • Expected search length: number of non-relevant items examined before finding k relevant items
      • R-precision: Inverse of expected search length.
      • Reciprocal rank: how quickly a single relevant item is found (1 / rank of the first relevant item)
      • Average precision: precision averaged over the ranks at which relevant items appear (both metrics are sketched after this list)
      • rank-based precision
      • expected utility
    • Behavior
      • online setting
      • Explicit feedback
        • Expensive, and raises privacy concerns
      • Implicit feedback
        • can be noisy and biased
      • Use logs to calculate offline metrics
      • Implicit
        • Short-term
          • page level
        • long-term
          • session - level
        • clicks, dwell time, eye movement
        • zooming in and out
        • Good abandonment
        • Slate evaluation
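
For reference, minimal implementations of two of the ranking metrics above, following their standard definitions (my own helpers, not code from the tutorial). `ranked` is a list of 0/1 relevance judgments in ranked order.

```python
def reciprocal_rank(ranked):
    """1 / rank of the first relevant item (0 if nothing is relevant)."""
    for i, rel in enumerate(ranked, start=1):
        if rel:
            return 1.0 / i
    return 0.0

def average_precision(ranked):
    """Mean of precision@k taken at each rank k where a relevant item appears."""
    hits, precisions = 0, []
    for i, rel in enumerate(ranked, start=1):
        if rel:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(precisions) if precisions else 0.0

print(reciprocal_rank([0, 0, 1, 0, 1]))    # 0.333... (first hit at rank 3)
print(average_precision([0, 0, 1, 0, 1]))  # mean(1/3, 2/5) = 0.366...
```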

Causal learning

  • Causal intervention in Semantic segmentation
    • It is expensive to collect labels for semantic segmentation
    • Context adjustment
      • Context is a confounder between the input data and the label; some objects always co-occur, such as a couch and a TV
      • To deconfound it, we need input data with the same label drawn from any context, e.g., two car pictures from different contexts (causal structure figure)
      • Use causal intervention to cut the connection from C (context) to X (input)
      • (1) Classification
      • (2) Pseudo-mask for semantic segmentation
      • (3) Confounder set used to break the edge from the confounder
        • Average segmentation mask of each class (a hedged sketch of the backdoor adjustment follows this list)
      • Is there a notion of time?
  • [Causal imitation learning]
    • Reward signal is unobserved
    • Behavior cloning and inverse reinforcement learning
    • The environment is modeled as a structural causal model, with an expert and a learner
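
A heavily hedged sketch of the backdoor-adjustment idea above: build a confounder set from the class-average pseudo-masks, then approximate P(y | do(x)) = Σ_c P(y | x, c) P(c) by conditioning the classifier on each confounder entry. The function names, the feature concatenation, and the explicit class prior are illustrative; the paper's actual architecture differs.

```python
import numpy as np

def build_confounder_set(pseudo_masks, labels, num_classes):
    """Average the pseudo segmentation masks belonging to each class."""
    return np.stack([pseudo_masks[labels == c].mean(axis=0) for c in range(num_classes)])

def backdoor_adjusted_probs(image_feat, confounder_set, class_prior, classifier):
    """Average the classifier's output over the confounder set, weighted by P(c)."""
    per_confounder = np.stack([
        classifier(np.concatenate([image_feat, conf.ravel()]))  # condition on one confounder entry
        for conf in confounder_set
    ])
    return (class_prior[:, None] * per_confounder).sum(axis=0)
```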

Sparsity

  • Gaussian weights
  • Constant expansion between layers
  • Uniform concentration

Day 2

Day 1

  • Make boats fly
  • Optimize foils for lap time
  • Use simulator to optimize design choices
  • Objective is to evaluate as many design choices as possible
  • Boats have more inputs than F1 cars
  • Use RL to optimize the design choices
  • Boat simulator: loosely defined goal, imperfect knowledge of the environment and complex dynamics
  • RL controls 14 inputs. Optimizes the velocity made good.
    • The autopilot (an RL-trained agent) follows the path and performs maneuvers
    • Curriculum learning, custom initialization, sharing the experience replay buffer across workers, domain randomization, sample efficiency, recovering from spot-instance interruptions, using different seeds
    • Encapsulate the simulator in a gym environment (a sketch follows this list)
    • Rewards
    • Autopilot/RL evaluates a design
    • A genetic-search-like method searches for optimal design choices
    • Removes the uncertainty of human input; no need to perform many laps and average the output
  • The Challenges and Latest Advances in Causal AI
    • A causal model helps adapt to data drift in a dynamic world
    • It helps discover causal drivers and avoid spurious correlations
    • Works with small data
    • causalLens is an ML library that helps users build causal models
    • Reading list:
      • Beyond structural causal models: causal constraints models by Blom et al. 2019
      • Necessary and sufficient conditions for causal feature selection in time series by Mastakouri et al. 2020
      • DYNOTEARS: structure learning from time-series data by Pamfil et al. 2020
  • Mining and Learning with Graphs at Scale
  • Graphs tell us about our data: from local data to global data
  • Computation on Multi-modal data
  • Scaling: MapReduce, distributed hash tables, GraphTensor for TensorFlow, jraph
  • Widely used features: graph building and clustering, semi-supervised learning, GNNs and embedding
  • Graph building: LSH, semi-supervised learning, local search, and auto-encoders
  • Clustering
  • Information propagation
  • Signals and topological analysis
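
Here is a sketch of what "encapsulate the simulator in a gym environment" might look like. `BoatSimulator` and its methods are hypothetical placeholders; only the 14 continuous controls and the velocity-made-good reward come from the talk.

```python
import gym
import numpy as np

class BoatEnv(gym.Env):
    """Wrap a black-box boat simulator in the standard Gym interface."""

    def __init__(self, simulator):
        self.sim = simulator
        # 14 continuous control inputs, normalized to [-1, 1].
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(14,), dtype=np.float32)
        # Placeholder state vector; the real observation layout is unknown to me.
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(32,), dtype=np.float32)

    def reset(self):
        return self.sim.reset()                     # initial boat state

    def step(self, action):
        state = self.sim.apply_controls(action)     # advance the simulator one tick
        reward = self.sim.velocity_made_good()      # reward: velocity made good
        done = self.sim.finished()
        return state, reward, done, {}
```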

GNN and COVID-19

  • Spatial and temporal information was used to model COVID-19 spread
  • Spatial information is used to build the graph; one graph is built for each day, and each day's graph receives information from the previous day's graph
  • Model predicts case count

Graphs for privacy

  • Federated learning of cohorts
  • Clusters of k users; users within a cluster have similar browsing behavior
  • Affinity hierarchical clustering algorithm

Causal inference

  • Random trial
  • Clustering: group nodes into control and treatment groups
  • Correlation clustering beats balanced clustering

Grale: Building graph at scale

  • Semi-supervised learning and inference on the unlabeled nodes
  • Different types of relationship. Finding the right relationship is important
  • Multi-modal features
  • Bucket nodes using LSH; within each bucket, generate training data (whether two nodes belong to the same graph) and build the graph (a sketch follows this list)
  • Grale is used at YouTube to find malicious actors
    • Abusive vs. non-abusive items
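
A sketch of the LSH-bucketing step in the spirit of the Grale description above: random-hyperplane signatures group similar nodes into buckets, and only within-bucket pairs are passed to the pairwise model. This is illustrative, not Grale's actual implementation.

```python
import numpy as np
from collections import defaultdict

def lsh_buckets(embeddings, num_planes=8, seed=0):
    """Hash each node embedding to a bit signature using random hyperplanes."""
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(num_planes, embeddings.shape[1]))
    signatures = (embeddings @ planes.T > 0).astype(int)
    buckets = defaultdict(list)
    for node_id, sig in enumerate(signatures):
        buckets[tuple(sig)].append(node_id)
    return buckets

def candidate_pairs(buckets):
    """Only pairs that share a bucket are scored by the pairwise edge model."""
    for nodes in buckets.values():
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                yield nodes[i], nodes[j]
```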

Clustering at scale

  • Affinity Hierarchical clustering
  • MapReduce
  • Randomized Composable Core-sets
Written on December 6, 2020