
Graduate Student: Littek, Alina Raffaella Giulia
Thesis Title: Explainable Anomaly Detection in Surveillance Videos: Autoencoder-based Reconstruction and Error Map Visualization
Advisor: 葉梅珍 (Yeh, Mei-Chen)
Committee Members: 葉梅珍 (Yeh, Mei-Chen), 王科植 (Wang, Ko-Chih), 吳志強 (Wu, Jhih-Ciang)
Oral Defense Date: 2024/05/29
Degree: Master
Department: Department of Computer Science and Information Engineering
Publication Year: 2024
Academic Year of Graduation: 112
Language: English
Number of Pages: 77
Keywords: Machine Learning, Anomaly Detection, Explainability
DOI URL: http://doi.org/10.6345/NTNU202400956
Thesis Type: Academic thesis
  • The ever-increasing volume of surveillance video makes manual monitoring impractical for security applications. Existing automatic anomaly detection methods often rely on computationally expensive processing steps, require substantial labeled training data, and lack interpretability. This project addresses these limitations by proposing an unsupervised, end-to-end deep learning framework for video anomaly detection with built-in explainability. Central to the approach is the autoencoder, which reconstructs video frames and identifies abnormal patterns through analysis of the reconstruction errors. Five lightweight autoencoder architectures are investigated, exploring the effectiveness of 2D and 3D convolutions, denoising techniques, and spatio-temporal layers for capturing both spatial and temporal features directly from raw video data. These models achieve promising performance, with Area Under the Curve (AUC) values ranging from 70% to 95% on the benchmark UCSD Pedestrian datasets, showcasing the potential of lightweight architectures for efficient deployment in diverse environments. Beyond efficient detection, the framework offers two further advantages. First, it extracts spatial and temporal features directly from raw video, which simplifies system design and eliminates the need for complex processing steps. Second, the error maps generated during reconstruction provide inherent interpretability: they make the model's decisions understandable and localize anomalies accurately for human oversight. This transparency is crucial for building trust in anomaly detection systems in real-world surveillance applications. The research establishes a foundation for developing robust and ethical anomaly detection systems, with a focus on lightweight and explainable models.
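
    The reconstruction-error idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names (`error_map`, `frame_score`, `min_max_normalize`, `hottest_pixel`), the use of squared error, and the toy frames are all assumptions made for illustration, and plain nested lists stand in for video frames so that the example has no dependencies. The autoencoder itself is not modeled; its output is simulated by a reconstruction that reproduces only the normal scene.

```python
from typing import List, Sequence, Tuple

Frame = List[List[float]]  # grayscale frame, pixel values in [0, 1]


def error_map(frame: Frame, reconstruction: Frame) -> Frame:
    """Per-pixel squared error between a frame and its reconstruction."""
    return [
        [(p - q) ** 2 for p, q in zip(row, rec_row)]
        for row, rec_row in zip(frame, reconstruction)
    ]


def frame_score(err: Frame) -> float:
    """Mean of the error map: a single anomaly score for the frame."""
    pixels = [value for row in err for value in row]
    return sum(pixels) / len(pixels)


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Rescale the scores of one video to [0, 1]; constant input maps to zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hottest_pixel(err: Frame) -> Tuple[int, int]:
    """(row, column) of the largest error: a crude localization cue."""
    best = max(
        ((value, r, c) for r, row in enumerate(err) for c, value in enumerate(row)),
        key=lambda item: item[0],
    )
    return best[1], best[2]


# Toy data (assumed): a flat "normal" scene and one frame with a bright patch.
normal = [[0.5] * 4 for _ in range(3)]
anomalous = [row[:] for row in normal]
anomalous[1][2] = 1.0

# Simulated autoencoder output: a model trained only on normal frames is
# assumed to reproduce the normal scene regardless of the input.
reconstruction = [row[:] for row in normal]

maps = [error_map(f, reconstruction) for f in (normal, anomalous)]
raw_scores = [frame_score(m) for m in maps]
scores = min_max_normalize(raw_scores)
location = hottest_pixel(maps[1])

print("normalized frame scores:", scores)
print("largest error at (row, col):", location)
```

    In this sketch the error map plays both roles named in the abstract: its mean gives a frame-level anomaly score, and its largest entries indicate where in the frame the anomaly lies.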

    1 Introduction
    2 Background
        2.1 Anomalies and Anomaly Detection
        2.2 Deep Learning Concepts
            2.2.1 Recurrent Neural Network and Long Short-Term Memory
            2.2.2 Convolutional Neural Network
            2.2.3 Autoencoder Model
            2.2.4 Generative Adversarial Network
            2.2.5 Activation and Loss Functions
            2.2.6 Transfer Learning
        2.3 Explainability in Deep Learning
    3 Related Work
        3.1 Traditional Anomaly Detection
        3.2 Deep Learning-based Anomaly Detection
        3.3 Explainability in Anomaly Detection
        3.4 Benchmark Datasets
            3.4.1 UCSD Pedestrian Dataset
            3.4.2 ShanghaiTech Campus Dataset
            3.4.3 CUHK Avenue Dataset
            3.4.4 UCF Crime Dataset
    4 Methodologies
        4.1 Unsupervised End-to-End Anomaly Detection
            4.1.1 Convolutional Autoencoder
            4.1.2 Denoising Autoencoder
            4.1.3 LSTM Autoencoder
        4.2 Temporal Encoding
        4.3 Explainability
        4.4 Evaluation Methods
    5 Experiments
        5.1 Data and Processing
        5.2 End-to-end Training
            5.2.1 2D and 3D Convolutional Autoencoder
            5.2.2 Denoising: Noise Types and Levels
        5.3 Model Testing and Inference
            5.3.1 Reconstruction Error Calculation
            5.3.2 Normalization and Smoothing
            5.3.3 Binary Anomaly Predictions
        5.4 Model Generalization
        5.5 Anomaly Localization for Explainability
    6 Results and Evaluation
        6.1 Generalization Ability
        6.2 Comparison with State-of-the-Art
        6.3 Explainability
    7 Discussion
        7.1 Advantages
        7.2 Limitations
        7.3 Legal, Social, and Ethical Considerations
    8 Conclusion
    9 Future Work
    References

