Optimization of Deep Generative Models
Problem of Interest
- Probabilistic deep generative models have emerged as a pivotal area of machine learning, aiming to generate samples from complex data distributions and to learn meaningful representations from unlabeled data in applications such as audio, images, video, and text.
- Two common approaches, variational autoencoders (VAEs) and generative adversarial networks (GANs), have demonstrated remarkable capabilities in generating realistic data, but each has limitations: (i) overly simplified posteriors, (ii) training instability, and (iii) a failure to balance meaningful latent representations against inference quality.
Our Method & Results
- We propose the Multi-Adversarial Autoencoder (MAAE) to address these issues. MAAE builds on adversarial autoencoders (AAE), a hybrid of the VAE and GAN, by employing multiple discriminators, each trained independently to play an adversarial game against the encoder in place of the KL penalty in the VAE objective, which yields smoother training (see Figure 1 and the sketch after it).
![Figure 1. The architecture of the Multi-Adversarial Autoencoder (MAAE). (top) A standard autoencoder that reconstructs data from a latent vector via the encoder and decoder; (bottom) multiple discriminators trained to distinguish latent samples from the encoder from prior samples, providing soft-ensemble feedback to the encoder to find a well-matched variational posterior.](https://prod-files-secure.s3.us-west-2.amazonaws.com/5de040a2-d005-4ddc-a63b-6ee9deabb3a1/115f6a82-9f84-435e-b63f-8707c2aa5eee/Untitled.png)
Figure 1. The architecture of the Multi-Adversarial Autoencoder (MAAE). (top) A standard autoencoder that reconstructs data from a latent vector via the encoder and decoder; (bottom) multiple discriminators trained to distinguish latent samples from the encoder from prior samples, providing soft-ensemble feedback to the encoder to find a well-matched variational posterior.
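To make the training procedure concrete, here is a minimal PyTorch sketch of one MAAE-style training step. All architectures, sizes, and hyperparameters are assumptions for illustration, not the paper's configuration; the key idea it shows is that independently trained discriminators replace the KL term of the VAE, and their averaged feedback regularizes the encoder toward the prior.

```python
import torch
import torch.nn as nn

latent_dim, n_disc, batch = 2, 3, 32  # 2D latent to match the 2D-Gaussian prior (assumed sizes)

encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, 784))
discriminators = nn.ModuleList(
    nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, 1))
    for _ in range(n_disc)
)

bce = nn.BCEWithLogitsLoss()
opt_ae = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
opt_d = torch.optim.Adam(discriminators.parameters(), lr=1e-4)

x = torch.rand(batch, 784)  # stand-in batch of flattened images

# (1) Reconstruction: the standard autoencoder pass (top of Figure 1).
z = encoder(x)
recon_loss = nn.functional.mse_loss(decoder(z), x)

# (2) Each discriminator is trained independently to tell prior samples
# (label 1) from encoder outputs (label 0), playing the role of the KL term.
z_prior = torch.randn_like(z)  # samples from the 2D-Gaussian prior
opt_d.zero_grad()
d_loss = sum(
    bce(d(z_prior), torch.ones(batch, 1)) + bce(d(z.detach()), torch.zeros(batch, 1))
    for d in discriminators
)
d_loss.backward()
opt_d.step()

# (3) The encoder tries to fool all discriminators; averaging their feedback
# gives the soft-ensemble signal that pulls the posterior toward the prior.
opt_ae.zero_grad()
g_loss = sum(bce(d(z), torch.ones(batch, 1)) for d in discriminators) / n_disc
(recon_loss + g_loss).backward()
opt_ae.step()
```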
- From another perspective, adjusting the relative strengths of the discriminators during training enables the model to pursue either a better prior-matched or a more semantically meaningful representation, depending on its current state. This adaptive scheme mitigates the trade-off between the two objectives, ultimately leading to more balanced and refined performance (one possible weighting is sketched below).
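Reusing the names from the sketch above, one way to realize such strength adjustment is to weight each discriminator's feedback instead of averaging it uniformly. The softmax weighting below is a hypothetical illustration; the paper's actual scheme may differ.

```python
# Assumed per-discriminator strengths; these could be annealed or adapted
# during training rather than fixed as here.
weights = torch.softmax(torch.tensor([0.5, 1.0, 2.0]), dim=0)
g_loss = sum(
    w * bce(d(z), torch.ones(batch, 1))
    for w, d in zip(weights, discriminators)
)
# Larger total weight pushes the encoder harder toward the prior (inference
# quality); smaller weight leaves more room for reconstruction-driven,
# semantically meaningful structure in the latent space.
```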
- To validate this approach, we first assessed the model's inference quality using the learned representation. We employed a 2D-Gaussian distribution as the prior and evaluated how closely the latent distribution matched it (see Figure 2 (a)-(f)). We then quantified the semantic content of the representation by applying it to a downstream classification task and measuring the resulting error rate (see Figure 2 (g); a sketch of both evaluations follows Figure 2).
![Figure 2. Latent distributions on test data for various VAE-based models and ours (MAAE). MAAE (f) yields a latent space closest to the prior (a), indicating better inference quality. In (g), the lower error rate of a semi-supervised classifier applied to the learned latent vectors shows that MAAE learns meaningful and informative representations.](https://prod-files-secure.s3.us-west-2.amazonaws.com/5de040a2-d005-4ddc-a63b-6ee9deabb3a1/56d587f0-bd26-4d2d-90dc-5a2b61b62121/Untitled.png)
Figure 2. Latent distributions on test data for various VAE-based models and ours (MAAE). MAAE (f) yields a latent space closest to the prior (a), indicating better inference quality. In (g), the lower error rate of a semi-supervised classifier applied to the learned latent vectors shows that MAAE learns meaningful and informative representations.
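The sketch below illustrates the two evaluations, reusing `encoder` and `x` from the first sketch. The specific metric and classifier are assumptions: closeness to the prior is approximated here with an RBF maximum mean discrepancy (MMD), and the semi-supervised probe is a logistic regression on latent vectors with stand-in labels `y`; the paper's exact protocol may differ.

```python
import torch
from sklearn.linear_model import LogisticRegression

def rbf_mmd2(a, b, sigma=1.0):
    """Biased MMD^2 estimate between two sample sets under an RBF kernel."""
    def k(u, v):
        return torch.exp(-torch.cdist(u, v).pow(2) / (2 * sigma**2))
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()

# (a)-(f): how close is the latent distribution to the 2D-Gaussian prior?
with torch.no_grad():
    z_test = encoder(x)                  # latents for (stand-in) test data
prior = torch.randn_like(z_test)         # samples from the prior
print("MMD^2 to prior:", rbf_mmd2(z_test, prior).item())

# (g): error rate of a classifier trained on a small labeled set of latents.
y = torch.randint(0, 10, (z_test.shape[0],))   # hypothetical labels
clf = LogisticRegression(max_iter=1000).fit(z_test.numpy(), y.numpy())
print("error rate:", 1 - clf.score(z_test.numpy(), y.numpy()))
```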
Publications & GitHub