My Picks From NeurIPS 2020

NeurIPS 2020 was virtual this year. As a result, not only the talks but also the networking and poster sessions were held online. I got to experience gather.town for the first time. It felt like playing a video game at times. I changed my avatar many times :D
All the keynotes had sign language interpretation. I thought it was cool!
Below are some of the talks that I enjoyed watching or reading.

Keynotes:

  • The Genomic Bottleneck: A Lesson from Biology
    • Humans' success over other animals is perhaps due to better priors
    • The neocortex in the brain is a repetitive structure
    • Humans have broken the cultural barrier through language, and thus transfer knowledge from one generation to another
  • Causal Learning

  • You Can’t Escape Hyperparameters and Latent Variables: Machine Learning as a Software Engineering Enterprise
    • We are compiler hackers
    • How can we make AI that makes our lives better?
    • Pay attention to what AI you are designing and putting out into the world
    • Be aware of the bias in the data and how it was collected
    • Pay attention to how your model will be used
    • Tradeoffs make some things better and some things worse. Code bless us all!
  • Feedback control loop
    • Multi-agent learning games
    • Non-stationary environment
    • (1) Stabilize and shape behavior
      • Two player and multi-player/agent learning
      • Zero-sum game
      • Allocation game
      • Learning rule example: replicator dynamics
      • Can natural learning rules lead to a Nash equilibrium?
        • Update rule depends on a user’s own strategy
        • Uncoupled learning rule
        • Anti-coordination game
      • Introduce auxiliary states for higher order learning
      • Anticipatory learning - take into account where things are heading
      • Optimization - optimistic gradient ascent (a toy sketch appears at the end of the keynote notes)
    • (2) Robustness to variation
      • Internal: thrust, drag
      • External: weather
      • Nominal analysis
      • Contractive game: allow auxiliary dynamics over long term
      • Partially observed Markov process
      • PAC verification: (1) completeness, (2) soundness
    • (3) Track command signals
      • Forecasting and no-regret
      • Distributional forecast
  • Robustness, Verification, Privacy: Addressing Machine Learning Adversaries
    • Privacy
    • Robustness
    • Verifiability
      • ML is a risk assessment system
      • Build trust
      • Who builds the ML system and who verifies the ML code is correct?
      • Verifying the training distribution can be framed as verifying a hypothesis
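
As a quick illustration of the optimistic-gradient idea from the feedback-control keynote, here is a toy sketch of optimistic gradient descent-ascent on a bilinear two-player zero-sum game. The game, step size, and iteration count are my own choices, not from the talk.

```python
import numpy as np

# Toy bilinear zero-sum game f(x, y) = x^T A y: player x minimizes, player y
# maximizes. Plain simultaneous gradient descent-ascent cycles around the
# equilibrium here; the optimistic update (extrapolating with the previous
# gradient) converges to the equilibrium at the origin.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
x, y = np.ones(2), np.ones(2)
prev_gx, prev_gy = np.zeros(2), np.zeros(2)
lr = 0.1

for _ in range(2000):
    gx, gy = A @ y, A.T @ x          # gradients of f with respect to x and y
    x = x - lr * (2 * gx - prev_gx)  # optimistic descent step for x
    y = y + lr * (2 * gy - prev_gy)  # optimistic ascent step for y
    prev_gx, prev_gy = gx, gy

print(np.round(x, 4), np.round(y, 4))  # both players approach the zero vector
```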

Meta learning

  • Continual Deep learning
    • Catastrophic forgetting as new data arrives
    • Some weights are regularized while others are not
    • Function regularization
      • Identify few crucial examples
      • (1) Convert neural network to GP
      • (2) Identify memorable past examples (on the boundary?)
      • (3) Train with function regularization on the memorable past
    • ELBO - variational objective
  • [Adversarially Robust Few-Shot Learning: A Meta-Learning Approach]
    • Adversarial examples only in the outer loop
    • Meta-learner is adversarially robust
    • Query data matters when doing adversarial training
    • The setting is few-shot learning
    • Outer loop queries with adversarial examples
  • Online continual learning
    • Online data
    • Reservoir sampling (a minimal sketch follows this list)
    • Replay buffer
    • C-MAML
      • Inner and outer loop model
    • Adaptive learning rate schedule - per parameter
      • Conservative gradient updates relative to the previous task
      • gradient clipping and masking
    • Single pass and multiple pass - how many times an example is used
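
Since the online continual learning talk leans on reservoir sampling to maintain the replay buffer, here is a minimal sketch of a reservoir-sampled buffer. The class and its interface are illustrative, not the authors' code.

```python
import random

class ReservoirBuffer:
    """Replay buffer where every example seen so far has equal probability
    of being retained, without storing the whole stream (Algorithm R)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []
        self.n_seen = 0

    def add(self, example):
        self.n_seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            # Keep the new example with probability capacity / n_seen,
            # replacing a uniformly random slot.
            j = random.randrange(self.n_seen)
            if j < self.capacity:
                self.buffer[j] = example

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```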

Reinforcement learning

Normalizing flows

ML Compiler

Equivariant networks

  • Symmetry: a transformation that leaves some aspect of the object invariant
  • Transformation
  • GNN - the symmetry is permutation of the adjacency matrix (a quick check is sketched below)
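
To make the adjacency-matrix symmetry point concrete, here is a small numpy check that a bare-bones message-passing layer is permutation-equivariant: relabeling the nodes (and permuting the adjacency matrix accordingly) permutes the output rows the same way. The layer is a generic placeholder, not a model from the talks.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 3
A = rng.integers(0, 2, size=(n, n))
A = np.triu(A, 1); A = A + A.T            # symmetric adjacency, no self-loops
X = rng.normal(size=(n, d))               # node features
W = rng.normal(size=(d, d))               # shared weight matrix

def gnn_layer(A, X, W):
    # Aggregate self + neighbor features, then apply a shared linear map and ReLU.
    return np.maximum(0, (A + np.eye(len(A))) @ X @ W)

P = np.eye(n)[rng.permutation(n)]          # random permutation matrix
out = gnn_layer(A, X, W)
out_perm = gnn_layer(P @ A @ P.T, P @ X, W)
print(np.allclose(P @ out, out_perm))      # True: output permutes with the nodes
```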

Representation learning

  • Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning
    • Self-supervised learning on par with SimCLR
    • Image augmentation
    • Prediction network and a target network
    • The target network's weights are an exponential moving average of the prediction network's weights; training maximizes the cosine similarity between the outputs of the two networks (a sketch follows this list)
  • Space-Time Correspondence as a Contrastive Random Walk
    • Contrastive learning
    • Data augmentation has hyperparameters and requires supervision
    • Dynamics in Video
    • Mining correspondence
    • Temporal coherence
    • Latent views - intermediate views
      • Palindrome views
      • Random walk from a query frame to a target node and edge strength
      • Encoder
      • Pairwise similarity and softmax function
      • Transition probability
      • Each step of the random walk is a contrastive learning task (also sketched after this list)
      • Self-supervision is also applied using palindrome frame sequences
      • Edge dropout improves object level correspondence
    • Label propagation
    • The method outperforms colorization-based methods
  • Learning Physical Graph Representations from Visual Scenes
  • Multi-label Contrastive Predictive Coding
  • Learning optimal representation
    • Consider a subset of classifiers that are accessible
    • Minimality causes generalization
  • [Manifold in Graph embedding]
    • Representation of each point in the embedding space
    • Singular value decomposition
  • [Hebbian Memory Network]
    • Biology inspired
    • Single layer memory module
    • Based on associative memory implemented by plasticity
    • Memory module is differentiable
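
For BYOL, here is a hedged sketch of the online/target setup described above: the loss pulls the normalized outputs of the two networks toward each other, and the target weights track the online weights via an exponential moving average. The tiny MLPs, the noise "augmentations", and the hyperparameters are placeholders; the actual method also uses a separate predictor head and real image augmentations.

```python
import torch
import torch.nn.functional as F

online = torch.nn.Sequential(torch.nn.Linear(32, 32), torch.nn.ReLU(), torch.nn.Linear(32, 16))
target = torch.nn.Sequential(torch.nn.Linear(32, 32), torch.nn.ReLU(), torch.nn.Linear(32, 16))
target.load_state_dict(online.state_dict())
for p in target.parameters():
    p.requires_grad_(False)           # the target network is never trained directly

opt = torch.optim.SGD(online.parameters(), lr=0.05)
tau = 0.99                            # EMA decay for the target network

for step in range(100):
    x = torch.randn(8, 32)
    v1 = x + 0.1 * torch.randn_like(x)   # stand-ins for two augmented views
    v2 = x + 0.1 * torch.randn_like(x)
    pred, targ = online(v1), target(v2).detach()
    # Negative cosine similarity between the online prediction and the target output.
    loss = -F.cosine_similarity(pred, targ, dim=-1).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    # Target weights = exponential moving average of the online weights.
    with torch.no_grad():
        for p_t, p_o in zip(target.parameters(), online.parameters()):
            p_t.mul_(tau).add_((1 - tau) * p_o)
```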
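
For the contrastive random walk, here is a sketch of how pairwise similarity plus a softmax gives transition probabilities between nodes of adjacent frames, chained along a palindrome sequence so that the training target is the identity. The random features and the temperature are stand-ins for the encoder outputs.

```python
import numpy as np

def transition_matrix(feats_t, feats_t1, temperature=0.07):
    """Row-stochastic transition probabilities from frame t nodes to frame t+1 nodes."""
    feats_t = feats_t / np.linalg.norm(feats_t, axis=1, keepdims=True)
    feats_t1 = feats_t1 / np.linalg.norm(feats_t1, axis=1, keepdims=True)
    sim = feats_t @ feats_t1.T / temperature           # pairwise cosine similarity
    sim -= sim.max(axis=1, keepdims=True)              # numerical stability
    probs = np.exp(sim)
    return probs / probs.sum(axis=1, keepdims=True)

# Palindrome walk: go forward through the frames and back again; the model is
# trained so that each node returns to where it started (identity target).
frames = [np.random.randn(10, 64) for _ in range(4)]   # 10 nodes per frame
walk = frames + frames[-2::-1]
P = np.eye(10)
for a, b in zip(walk[:-1], walk[1:]):
    P = P @ transition_matrix(a, b)
# Cross-entropy between the rows of P and the identity matrix is the loss.
```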

Vision application

  • Rethinking Pre-training and Self-training
    • How do you incorporate unlabeled data into your task?
    • Pretraining and transfer weights for downstream task
      • Its value diminishes with more labeled data
    • Self-training: co-training / SimCLR
    • SimCLR and pre-training
    • When pre-training hurts, self-training helps
    • Does self training have any limit?
    • Joint training helps
    • Pre-training and self-training on the same task are additive (a generic self-training sketch follows this list)
  • Neural Sparse Voxel Fields - NSVF
    • Voxel embedding
    • Ray-voxel intersection and ray marching
    • Progressive initialization with self-pruning
  • 3D Multi-bodies: Fitting Sets of Plausible 3D Human Models to Ambiguous Image Data
    • Set of plausible meshes
    • 3D reconstruction from 2D has to address ambiguity
    • n-quantization for multiple hypotheses, normalizing flows, and a reprojection loss to address mode degeneration and sparse gradients
    • Normalizing flows convert a complex distribution, such as one over 3D meshes, into a simpler distribution, such as a multivariate Gaussian
  • Do Adversarially Robust ImageNet Models Transfer Better?
    • What features a model learns depends on:
      • Convolutional prior
      • Data augmentation
      • Loss function
    • Adversarial robustness
      • Train model with adversarial examples
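
The self-training loop discussed in "Rethinking Pre-training and Self-training" can be sketched generically as teacher pseudo-labeling followed by student training. Logistic regression, the synthetic data, and the 0.9 confidence threshold below are stand-ins for the paper's large vision models and training recipe.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_lab = rng.normal(size=(100, 5))
y_lab = (X_lab[:, 0] > 0).astype(int)       # toy labeled set
X_unlab = rng.normal(size=(1000, 5))        # unlabeled pool

# 1) Train a teacher on the labeled data.
teacher = LogisticRegression().fit(X_lab, y_lab)

# 2) Pseudo-label the unlabeled data and keep only confident predictions.
probs = teacher.predict_proba(X_unlab)
confident = probs.max(axis=1) > 0.9
pseudo_y = probs.argmax(axis=1)[confident]

# 3) Train a student on labeled + confidently pseudo-labeled data.
X_student = np.concatenate([X_lab, X_unlab[confident]])
y_student = np.concatenate([y_lab, pseudo_y])
student = LogisticRegression().fit(X_student, y_student)
```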

Evaluation

  • Accuracy is not enough
    • Precision and Recall
    • Ranking
      • Expected search length: number of non-relevant items examined before finding k relevant items
      • R-precision: Inverse of expected search length.
      • Reciprocal rank: how quickly a single relevant item is found (1 / rank of the first relevant item)
      • Average precision: precision averaged over the ranks at which relevant items appear (both metrics are sketched after this list)
      • rank-based precision
      • expected utility
    • Behavior
      • online setting
      • Explicit feedback
        • Expensive, and raises privacy concerns
      • Implicit feedback
        • can be noisy and biased
      • Use logs to calculate offline metrics
      • Implicit
        • Short-term
          • page level
        • long-term
          • session - level
        • clicks, dwell time, eye movement
        • zooming in and out
        • Good abandonment
        • Slate evaluation
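
For reference, minimal implementations of two of the ranking metrics above, following their standard definitions (my own helpers, not code from the tutorial). `ranked` is a list of 0/1 relevance judgments in ranked order.

```python
def reciprocal_rank(ranked):
    """1 / rank of the first relevant item (0 if nothing is relevant)."""
    for i, rel in enumerate(ranked, start=1):
        if rel:
            return 1.0 / i
    return 0.0

def average_precision(ranked):
    """Mean of precision@k taken at each rank k where a relevant item appears."""
    hits, precisions = 0, []
    for i, rel in enumerate(ranked, start=1):
        if rel:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(precisions) if precisions else 0.0

print(reciprocal_rank([0, 0, 1, 0, 1]))    # 0.333... (first hit at rank 3)
print(average_precision([0, 0, 1, 0, 1]))  # mean(1/3, 2/5) = 0.366...
```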

Causal learning

  • Causal intervention in Semantic segmentation
    • It is expensive to collect labels for semantic segmentation
    • Context adjustment
      • Context is a confounder between the input data and the label; some objects always co-occur, such as a couch and a TV
      • To deconfound it, we need input data with the same label drawn from any context, e.g., two car pictures from different contexts (causal structure figure)
      • Use causal intervention to cut the connection from C (context) to X (input)
      • (1) Classification
      • (2) Pseudo-mask for semantic segmentation
      • (3) Confounder set used to break the edge from the confounder
        • Average segmentation mask of each class (a hedged sketch of the backdoor adjustment follows this list)
      • Is there a notion of time?
  • [Causal imitation learning]
    • Reward signal is unobserved
    • Behavior cloning and inverse reinforcement learning
    • The environment is modeled as a structural causal model, with an expert and a learner
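
A heavily hedged sketch of the backdoor-adjustment idea above: build a confounder set from the class-average pseudo-masks, then approximate P(y | do(x)) = Σ_c P(y | x, c) P(c) by conditioning the classifier on each confounder entry. The function names, the feature concatenation, and the explicit class prior are illustrative; the paper's actual architecture differs.

```python
import numpy as np

def build_confounder_set(pseudo_masks, labels, num_classes):
    """Average the pseudo segmentation masks belonging to each class."""
    return np.stack([pseudo_masks[labels == c].mean(axis=0) for c in range(num_classes)])

def backdoor_adjusted_probs(image_feat, confounder_set, class_prior, classifier):
    """Average the classifier's output over the confounder set, weighted by P(c)."""
    per_confounder = np.stack([
        classifier(np.concatenate([image_feat, conf.ravel()]))  # condition on one confounder entry
        for conf in confounder_set
    ])
    return (class_prior[:, None] * per_confounder).sum(axis=0)
```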

Sparsity

  • Gaussian weights
  • Constant expansion between layers
  • Uniform concentration

Day 2

Day 1

  • Make boats fly
  • Optimize foils for lap time
  • Use simulator to optimize design choices
  • Objective is to evaluate as many design choices as possible
  • Boats have more inputs than F1 cars
  • Use RL to optimize the design choices
  • Boat simulator: loosely defined goal, imperfect knowledge of the environment and complex dynamics
  • RL controls 14 inputs. Optimizes the velocity made good.
    • The autopilot (an RL-trained agent) follows the path and performs maneuvers
    • Curriculum learning, custom initialization, sharing the experience replay buffer across workers, domain randomization, sample efficiency, recovering from spot-instance interruptions, using different seeds
    • Encapsulate the simulator in a gym environment (a sketch follows this list)
    • Rewards
    • Autopilot/RL evaluates a design
    • A genetic-search-like method searches for optimal design choices
    • Removes the uncertainty of human input; no need to perform many laps and average the output
  • The Challenges and Latest Advances in Causal AI
    • A causal model helps adapt to data drift in a dynamic world
    • It helps discover causal drivers and avoid spurious correlations
    • Works with small data
    • causalLens is an ML library that helps users build causal models
    • Reading list:
      • Beyond structural causal models: causal constraints models by Blom et al. 2019
      • Necessary and sufficient conditions for causal feature selection in time series by Mastakouri et al. 2020
      • DYNOTEARS: structure learning from time-series data by Pamfil et al. 2020
  • Mining and Learning with Graphs at Scale
  • Graphs tell us about our data: from local data to global data
  • Computation on Multi-modal data
  • Scaling: MapReduce, distributed hash tables, GraphTensor for TensorFlow, jraph
  • Widely used features: graph building and clustering, semi-supervised learning, GNNs and embedding
  • Graph building: LSH, semi-supervised learning, local search, and auto-encoders
  • Clustering
  • Information propagation
  • Signals and topological analysis
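
Here is a sketch of what "encapsulate the simulator in a gym environment" might look like. `BoatSimulator` and its methods are hypothetical placeholders; only the 14 continuous controls and the velocity-made-good reward come from the talk.

```python
import gym
import numpy as np

class BoatEnv(gym.Env):
    """Wrap a black-box boat simulator in the standard Gym interface."""

    def __init__(self, simulator):
        self.sim = simulator
        # 14 continuous control inputs, normalized to [-1, 1].
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(14,), dtype=np.float32)
        # Placeholder state vector; the real observation layout is unknown to me.
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(32,), dtype=np.float32)

    def reset(self):
        return self.sim.reset()                     # initial boat state

    def step(self, action):
        state = self.sim.apply_controls(action)     # advance the simulator one tick
        reward = self.sim.velocity_made_good()      # reward: velocity made good
        done = self.sim.finished()
        return state, reward, done, {}
```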

GNN and COVID-19

  • Spatial and temporal information was used to model COVID-19 spread
  • Spatial information is used to build the graph; one graph is built for each day, and each day's graph receives information from the previous day's graph
  • Model predicts case count

Graphs for privacy

  • Federated learning of cohorts
  • Clusters of k users; users within a cluster have similar browsing behavior
  • Affinity hierarchical clustering algorithm

Causal inference

  • Random trial
  • Clustering: group nodes into control and treatment groups
  • Correlation clustering beats balanced clustering

Grale: Building graph at scale

  • Semi-supervised learning and inference on the unlabeled nodes
  • Different types of relationship. Finding the right relationship is important
  • Multi-modal features
  • Bucket nodes using LSH; within each bucket, generate training data (whether two nodes belong to the same graph) and build the graph (a sketch follows this list)
  • Grale is used at YouTube to find malicious actors
    • Abusive vs. non-abusive items
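
A sketch of the LSH-bucketing step in the spirit of the Grale description above: random-hyperplane signatures group similar nodes into buckets, and only within-bucket pairs are passed to the pairwise model. This is illustrative, not Grale's actual implementation.

```python
import numpy as np
from collections import defaultdict

def lsh_buckets(embeddings, num_planes=8, seed=0):
    """Hash each node embedding to a bit signature using random hyperplanes."""
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(num_planes, embeddings.shape[1]))
    signatures = (embeddings @ planes.T > 0).astype(int)
    buckets = defaultdict(list)
    for node_id, sig in enumerate(signatures):
        buckets[tuple(sig)].append(node_id)
    return buckets

def candidate_pairs(buckets):
    """Only pairs that share a bucket are scored by the pairwise edge model."""
    for nodes in buckets.values():
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                yield nodes[i], nodes[j]
```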

Clustering at scale

  • Affinity Hierarchical clustering
  • MapReduce
  • Randomized Composable Core-sets
Written on December 6, 2020