
R3GAN: A Simplified and Stable Baseline for Generative Adversarial Networks (GANs)

By Sana Hassan, MarkTechPost


GANs are often criticized for being difficult to train, with architectures that rely heavily on empirical tricks. Although GANs can generate high-quality images in a single forward pass, their original minimax objective is challenging to optimize, leading to instability and the risk of mode collapse. Alternative objectives have been introduced, but fragile losses remain a persistent problem that hinders progress. Popular models like StyleGAN incorporate tricks such as gradient-penalized losses and minibatch standard deviation to address instability and diversity, but these lack theoretical backing. Compared with diffusion models, GANs also rely on outdated backbones, limiting their scalability and effectiveness.
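For context, the original minimax objective (Goodfellow et al., 2014) referenced above pits the generator G against the discriminator D:

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]$$

Its saddle-point structure is precisely what makes optimization unstable: neither player has a fixed target, so training can oscillate or collapse onto a few modes.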

Researchers from Brown University and Cornell University challenge the belief that GANs require numerous tricks for effective training. They introduce a modern GAN baseline built on a regularized relativistic GAN loss, which addresses mode dropping and convergence issues without ad-hoc solutions. Augmented with zero-centered gradient penalties, this loss provides training stability and local convergence guarantees. By simplifying and modernizing StyleGAN2 with elements such as ResNet-style design, grouped convolutions, and updated initialization, they develop a minimalist model, R3GAN, that surpasses StyleGAN2 and rivals state-of-the-art GANs and diffusion models across multiple datasets while using a simpler architecture.

In designing GAN objectives, balancing stability and diversity is critical. Traditional GANs often face challenges like mode collapse due to their reliance on a single decision boundary to separate real and fake data. Relativistic pairing GANs (RpGANs) address this by evaluating fake samples relative to real ones, promoting better mode coverage. However, RpGANs alone struggle with convergence, particularly with sharp data distributions. Adding zero-centered gradient penalties, R1 (on real data) and R2 (on fake data), ensures stable and convergent training. Experiments on StackedMNIST show that RpGAN with R1 and R2 achieves full mode coverage, outperforming conventional GANs and mitigating gradient explosions.
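To make the objective concrete, here is a minimal PyTorch sketch of a discriminator loss in this style. The softplus pairing function, the penalty weight `gamma`, and the function names are illustrative assumptions based on standard relativistic GAN formulations, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

def rpgan_d_loss(D, real, fake, gamma=1.0):
    """Relativistic pairing loss with zero-centered R1/R2 penalties.

    D maps a batch of images to one logit per sample; `gamma` is an
    assumed penalty weight. A sketch only, not the authors' code.
    """
    real = real.detach().requires_grad_(True)
    fake = fake.detach().requires_grad_(True)
    d_real, d_fake = D(real), D(fake)

    # Relativistic pairing: each fake sample is judged against a real
    # one, which encourages covering all modes of the data.
    loss = F.softplus(d_fake - d_real).mean()

    # R1: zero-centered gradient penalty on real data.
    (g_real,) = torch.autograd.grad(d_real.sum(), real, create_graph=True)
    r1 = g_real.square().sum(dim=(1, 2, 3)).mean()

    # R2: the same penalty applied to fake data.
    (g_fake,) = torch.autograd.grad(d_fake.sum(), fake, create_graph=True)
    r2 = g_fake.square().sum(dim=(1, 2, 3)).mean()

    return loss + 0.5 * gamma * (r1 + r2)

def rpgan_g_loss(D, real, fake):
    """Symmetric relativistic generator loss; gradients flow into `fake`."""
    return F.softplus(D(real) - D(fake)).mean()
```

In line with the paragraph above, applying the zero-centered penalty to both the real batch (R1) and the fake batch (R2) is what stabilizes training in this sketch.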

R3GAN builds a simplified yet advanced baseline for GANs by addressing optimization challenges using RpGAN with R1 and R2 losses. Starting with StyleGAN2, the model progressively strips non-essential components, such as style-based generation techniques and regularization tricks, to create a minimalist backbone. Modernization steps include adopting ResNet-inspired architectures, bilinear resampling, and leaky ReLU activations while avoiding normalization layers and momentum-based optimizers. Further enhancements involve grouped convolutions, inverted bottlenecks, and fix-up initialization to stabilize training without normalization. These updates result in a more efficient and powerful architecture, achieving competitive FID scores with roughly 25M trainable parameters for both the generator and discriminator.
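As a rough illustration of these architectural choices, the block below sketches a normalization-free inverted bottleneck with grouped convolutions in PyTorch. The expansion factor, group count, and zero-initialized output convolution (in the spirit of fix-up initialization) are assumptions, not the paper's exact design:

```python
import torch.nn as nn

class InvertedBottleneck(nn.Module):
    """Sketch of a normalization-free residual block: 1x1 expand,
    grouped 3x3, 1x1 project. Widths and group count are assumptions."""

    def __init__(self, channels, expansion=2, groups=16):
        super().__init__()
        hidden = channels * expansion  # must be divisible by `groups`
        self.expand = nn.Conv2d(channels, hidden, kernel_size=1)
        self.spatial = nn.Conv2d(hidden, hidden, kernel_size=3,
                                 padding=1, groups=groups)
        self.project = nn.Conv2d(hidden, channels, kernel_size=1)
        self.act = nn.LeakyReLU(0.2)
        # Fix-up-style trick: zero-init the last conv so the block starts
        # as an identity mapping, stabilizing training without norm layers.
        nn.init.zeros_(self.project.weight)
        nn.init.zeros_(self.project.bias)

    def forward(self, x):
        h = self.act(self.expand(x))
        h = self.act(self.spatial(h))
        return x + self.project(h)
```

Between such blocks, bilinear resampling would handle up- and downsampling, as noted above, rather than learned transposed convolutions.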

The experiments showcase the gains of Config E, the final R3GAN configuration. On FFHQ-256, Config E achieves an FID of 7.05, outperforming StyleGAN2 and the intermediate configurations through architectural improvements such as inverted bottlenecks and grouped convolutions. On StackedMNIST, it achieves perfect mode recovery with the lowest KL divergence (0.029). On CIFAR-10, FFHQ-64, and ImageNet, it consistently surpasses prior GANs and rivals diffusion models, achieving lower FID with fewer parameters and faster inference (a single forward pass). Despite slightly lower recall than some diffusion models, Config E demonstrates greater sample diversity than other GANs, highlighting its efficiency and effectiveness without relying on pre-trained features.

In conclusion, the study presents R3GAN, a simplified and stable GAN for image generation that uses a regularized relativistic loss (RpGAN + R1 + R2) with proven convergence properties. By focusing on essential components, R3GAN eliminates many ad-hoc techniques commonly used in GANs, enabling a streamlined architecture that achieves competitive FID scores on datasets like StackedMNIST, FFHQ, CIFAR-10, and ImageNet. While not optimized for downstream tasks like image editing or controllable synthesis, it provides a robust baseline for future research. Limitations include the lack of scalability evaluation on higher-resolution or text-to-image tasks and ethical concerns regarding the potential misuse of generative models.


Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.
