Reducing the stride of the convolution kernel: a simple and effective strategy to increase the performance of CNN in building extraction from remote sensing image
- 1 Faculty of Geographical Science, Beijing Normal University, Beijing, China (201931051048@mail.bnu.edu.cn)
- 2 State Key Laboratory of Remote Sensing Science, Beijing Normal University, Beijing, China (jjwu@bnu.edu.cn)
- 3 Beijing Key Laboratory for Remote Sensing of Environmental and Digital Cities, Beijing Normal University, Beijing, China (jjwu@bnu.edu.cn)
Automatically extracting buildings from remote sensing images (RSI) plays an important role in urban planning, population estimation, disaster emergency response, and other applications. With the development of deep learning, convolutional neural networks (CNNs), which outperform traditional methods, have been widely used for building extraction from RSI. However, two problems remain. First, the low-level features extracted by the shallow layers of the network and the abstract features extracted by its deep layers cannot be fully fused, so building extraction is often inaccurate, especially for buildings with complex structures, irregular shapes, or small sizes. Second, a network contains a very large number of parameters to train, so training occupies substantial computing resources and takes a long time. By analyzing the structure of the CNN, we found that the abstract features extracted by the deep layers have low geospatial resolution but contain more semantic information. These features help determine the category of each pixel but are not sensitive to building boundaries. We also found that the stride of the convolution kernels and the pooling operations reduce the geospatial resolution of the feature maps. This paper therefore proposes a simple and effective strategy to alleviate these two bottlenecks: reduce the stride of the convolution kernels in one of the layers, and reduce the number of convolution kernels. We applied this strategy to the DeepLabv3+ network and evaluated it on both the WHU Building Dataset and the Massachusetts Building Dataset. Compared with the original DeepLabv3+, the modified network performed better: on the WHU Building Dataset, the Intersection over Union (IoU) increased by 1.4% and the F1 score by 0.9%; on the Massachusetts Building Dataset, IoU increased by 3.31% and the F1 score by 2.3%.
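The effect of stride on feature-map resolution, and of kernel count on parameter count, can be illustrated with a small sketch. It uses the standard output-size formula for a convolution and the standard parameter count of a 2D convolution layer. The layer stack, the 512-pixel input, and the channel numbers are hypothetical values chosen for illustration, not the configuration used in the abstract.

```python
def conv_output_size(size, kernel, stride, padding=0, dilation=1):
    """Spatial size (height or width) of a feature map after one convolution."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def conv_param_count(in_channels, out_channels, kernel, bias=True):
    """Number of trainable parameters in a standard 2D convolution layer."""
    weights = kernel * kernel * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def final_size(size, strides, kernel=3, padding=1):
    """Feature-map size after a stack of 3x3 convolutions with the given strides."""
    for stride in strides:
        size = conv_output_size(size, kernel, stride, padding)
    return size


# Hypothetical four-layer stack; these strides are illustrative assumptions.
baseline_strides = [2, 2, 2, 2]
reduced_strides = [2, 2, 1, 2]  # stride of one layer reduced from 2 to 1

print(final_size(512, baseline_strides))  # 32
print(final_size(512, reduced_strides))   # 64

# Halving the number of kernels (output channels) roughly halves the parameters.
print(conv_param_count(256, 256, 3))  # 590080
print(conv_param_count(256, 128, 3))  # 295040
```

In this toy stack, lowering a single stride from 2 to 1 doubles the side length of the final feature map, while halving the number of kernels in a layer roughly halves that layer's parameters.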
How to cite: Chen, M., Wu, J., and Tian, F.: Reducing the stride of the convolution kernel: a simple and effective strategy to increase the performance of CNN in building extraction from remote sensing image, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-10783, https://doi.org/10.5194/egusphere-egu21-10783, 2021.