Disentangling random and cyclic effects in time-lapse sequences

Erik Härkönen, Miika Aittala, Tuomas Kynkäänniemi, Samuli Laine, Timo Aila, and Jaakko Lehtinen



    Time-lapse image sequences offer visually compelling insights into dynamic processes that are too slow to observe in real time. However, playing a long time-lapse sequence back as a video often results in distracting flicker due to random effects, such as weather, as well as cyclic effects, such as the day-night cycle. We introduce the problem of disentangling time-lapse sequences in a way that allows separate, after-the-fact control of overall trends, cyclic effects, and random effects in the images, and describe a technique based on data-driven generative models that achieves this goal. This enables us to "re-render" the sequences in ways that would not be possible with the input images alone. For example, we can stabilize a long sequence to focus on plant growth over many months, under selectable, consistent weather.

    Our approach is based on Generative Adversarial Networks (GANs) that are conditioned with the time coordinate of the time-lapse sequence. Our architecture and training procedure are designed so that the networks learn to model random variations, such as weather, using the GAN's latent space, and to disentangle overall trends and cyclic variations by feeding the conditioning time label to the model using Fourier features with specific frequencies.

    We show that our models are robust to defects in the training data, enabling us to amend some of the practical difficulties in capturing long time-lapse sequences, such as temporary occlusions, uneven frame spacing, and missing frames.
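The time-conditioning idea can be illustrated with a minimal sketch: the scalar time label is mapped to sine/cosine pairs at hand-picked cycle lengths (e.g. one day, one year), so that daily and yearly variation enter the model through separate, interpretable inputs. The function name and the specific periods below are illustrative assumptions, not the paper's exact implementation.

```python
import math

def fourier_time_features(t, periods):
    """Encode a scalar time coordinate t (here, in days) as sin/cos pairs
    at fixed cycle lengths. Giving the network features at, say, a 1-day
    and a 365-day period lets it attribute lighting changes and seasonal
    changes to distinct conditioning dimensions."""
    feats = []
    for p in periods:
        angle = 2.0 * math.pi * t / p
        feats.append(math.sin(angle))
        feats.append(math.cos(angle))
    return feats

# Noon on day 10, encoded with a daily and a yearly cycle:
emb = fourier_time_features(10.5, [1.0, 365.0])
```

Holding the slow (yearly) features fixed while varying only the fast (daily) ones would, in this scheme, re-render the same season at different times of day.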


