
Crowd Counting via Attention and Multi-Feature Fused Network, 2023.

Crowd Counting via Attention and Multi-Feature Fused Network

Bruno, A
2023-01-01

Abstract

With the rapid development of the Internet of Everything and artificial intelligence, and with the massive amounts of video surveillance data now available, crowd counting has drawn extensive attention in computer vision. Building on advances in deep learning, convolutional neural networks (CNNs) have been widely applied to improve the effectiveness of crowd counting. However, CNNs struggle to capture the continuous changes in head size within an image, so large variations in scale remain a major obstacle for crowd counting. To address this problem, this paper presents an attention and multi-feature fused network (AMFNet) consisting of a multi-level feature extractor and four attentional density estimator (ADE) modules. Built on a deep network backbone, the multi-level extractor extracts features of different sizes together with various kinds of context information. The ADE modules merge features from different levels to generate a high-quality density map, and a channel attention unit within each ADE module helps identify heads accurately. Together, the four ADE modules exploit the multi-level features and produce a fine-grained density map that copes with varying scales. Experimental results show that AMFNet performs well in dense crowd scenarios and is comparable to mainstream methods in terms of accuracy and robustness.
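The abstract does not specify the internals of an ADE module. The toy sketch below is offered only as an illustration of the three ideas it names:

- merging a coarse feature level into a finer one;
- reweighting channels with an attention unit;
- reducing the result to a density map whose sum gives the estimated count, which is the usual convention in density-map crowd counting.

Everything here is an assumption rather than AMFNet itself. The squeeze-and-excitation style attention, the nearest-neighbour upsampling with additive fusion, the 1x1 projection and the function names (`upsample2x`, `fuse`, `channel_attention`, `to_density`, `count`) are all hypothetical. The weights are hand-picked toy values, not trained parameters.

```python
import math
from typing import List

# Hypothetical layout: a feature map is a channels x height x width nested list.
FeatureMap = List[List[List[float]]]


def upsample2x(x: FeatureMap) -> FeatureMap:
    """Nearest-neighbour 2x upsampling so a coarse level matches a finer one."""
    return [[[v for v in row for _ in range(2)] for row in ch for _ in range(2)] for ch in x]


def fuse(fine: FeatureMap, coarse: FeatureMap) -> FeatureMap:
    """Assumed fusion: upsample the coarse level, then add it elementwise to the fine level."""
    up = upsample2x(coarse)
    return [
        [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(c1, c2)]
        for c1, c2 in zip(fine, up)
    ]


def channel_attention(x: FeatureMap, w1: List[List[float]], w2: List[List[float]]) -> FeatureMap:
    """Squeeze-and-excitation style channel attention (an assumption, not AMFNet's exact unit).

    w1 has shape (hidden, C); w2 has shape (C, hidden).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    s = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: linear -> ReLU -> linear -> sigmoid gives a weight in (0, 1) per channel.
    h = [max(0.0, sum(wi * si for wi, si in zip(row, s))) for row in w1]
    a = [1.0 / (1.0 + math.exp(-sum(wi * hi for wi, hi in zip(row, h)))) for row in w2]
    # Reweight: scale every value in channel c by its attention weight a[c].
    return [[[a[c] * v for v in row] for row in ch] for c, ch in enumerate(x)]


def to_density(x: FeatureMap, weights: List[float]) -> List[List[float]]:
    """1x1 projection from C channels to one density map, clamped to be non-negative."""
    height, width = len(x[0]), len(x[0][0])
    return [
        [max(0.0, sum(weights[c] * x[c][i][j] for c in range(len(x)))) for j in range(width)]
        for i in range(height)
    ]


def count(density: List[List[float]]) -> float:
    """Estimated head count: the sum of all density-map values."""
    return sum(sum(row) for row in density)


# Toy usage with hand-picked (untrained) weights.
fine = [[[0.5, 0.5], [0.5, 0.5]], [[1.0, 0.0], [0.0, 1.0]]]  # 2 channels, 2x2
coarse = [[[0.5]], [[0.0]]]                                  # 2 channels, 1x1
fused = fuse(fine, coarse)
attended = channel_attention(fused, w1=[[1.0, 1.0]], w2=[[0.0], [0.0]])
density = to_density(attended, weights=[1.0, 1.0])
print(count(density))  # prints 3.0
```

A real implementation would use a deep-learning framework with learned convolutions and would apply four such estimators to the extractor's multi-level outputs. Plain Python lists are used here only to keep the sketch dependency-free and easy to trace by hand.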
English
2023
http://hcisj.com/articles/?HCIS202313050
KOREA INFORMATION PROCESSING SOC
13
Korea (Republic of)
International
Anonymous experts
With ISI Impact Factor
Online
Sector INF/01 - Computer Science
Sector ING-INF/05 - Information Processing Systems
7
Files in this item:
13-50 (2).pdf

Open Access

Type: Post-print document
Size: 7.51 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10808/54065