Disentangling random and cyclic effects in time-lapse sequences

Time-lapse image sequences offer visually compelling insights into dynamic processes that are too slow to observe in real time. However, playing a long time-lapse sequence back as a video often results in distracting flicker due to random effects, such as weather, as well as cyclic effects, such as the day-night cycle. We introduce the problem of disentangling time-lapse sequences in a way that allows separate, after-the-fact control of overall trends, cyclic effects, and random effects in the images, and describe a technique based on data-driven generative models that achieves this goal. This enables us to “re-render” the sequences in ways that would not be possible with the input images alone. For example, we can stabilize a long sequence to focus on plant growth over many months, under selectable, consistent weather.

Our approach is based on Generative Adversarial Networks (GANs) that are conditioned on the time coordinate of the time-lapse sequence. Our architecture and training procedure are designed so that the networks learn to model random variations, such as weather, using the GAN’s latent space, and to disentangle overall trends and cyclic variations by feeding the conditioning time label to the model using Fourier features with specific frequencies.

We show that our models are robust to defects in the training data, enabling us to remedy some of the practical difficulties in capturing long time-lapse sequences, such as temporary occlusions, uneven frame spacing, and missing frames.
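The abstract describes feeding the time coordinate to the generator as Fourier features at specific frequencies, so that overall trends and cyclic effects arrive through separate inputs. As an illustration only, the sketch below encodes timestamps as a linear trend term plus sine/cosine pairs with one-day and one-year periods. The function name `time_features`, the period constants, the linear trend term, and the `time_of_day` override are all assumptions made for this example, not the authors' exact encoding.

```python
import numpy as np

# Hypothetical cycle periods, in days (assumed for illustration).
DAY_PERIOD = 1.0
YEAR_PERIOD = 365.25


def time_features(t_days, t_start, t_end, time_of_day=None):
    """Encode timestamps as a trend term plus cyclic Fourier features.

    t_days: timestamps in days since a reference midnight (assumed convention).
    Returns an array of shape (..., 5):
        [trend, sin(daily), cos(daily), sin(yearly), cos(yearly)]
    If time_of_day (fraction of a day, e.g. 0.5 for noon) is given, the
    daily phase is replaced by that constant for every timestamp.
    """
    t = np.asarray(t_days, dtype=np.float64)

    # Overall trend: capture time rescaled to [0, 1] over the sequence.
    trend = (t - t_start) / (t_end - t_start)

    # Daily cycle: one full sin/cos period per day.
    day_phase = 2.0 * np.pi * t / DAY_PERIOD
    if time_of_day is not None:
        # Override only the daily phase; trend and season are untouched.
        day_phase = np.full_like(t, 2.0 * np.pi * time_of_day)

    # Yearly cycle: one full period per (assumed) 365.25-day year.
    year_phase = 2.0 * np.pi * t / YEAR_PERIOD

    return np.stack(
        [trend,
         np.sin(day_phase), np.cos(day_phase),
         np.sin(year_phase), np.cos(year_phase)],
        axis=-1,
    )


# Usage: three frames from a hypothetical 200-day capture.
t = np.array([10.25, 95.6, 180.9])
as_captured = time_features(t, t_start=0.0, t_end=200.0)
at_noon = time_features(t, t_start=0.0, t_end=200.0, time_of_day=0.5)
print(as_captured.shape)  # (3, 5)
```

Because each cyclic effect has its own sine/cosine pair in this sketch, overriding only the daily pair (here, fixing it to noon) changes the time of day while leaving the trend and seasonal inputs unchanged. In a conditional GAN, pairing such a vector with a fixed latent code would correspond to the abstract's idea of re-rendering a sequence under consistent weather, assuming the trained model has learned the intended disentanglement.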

SIGGRAPH 2022: Technical Papers

“Disentangling random and cyclic effects in time-lapse sequences” by Härkönen, Aittala, Kynkäänniemi, Laine, Aila, et al. …
