

Rapid declines in frog populations have been observed worldwide and are regarded as one of the most critical threats to global biodiversity. Recent advances in acoustic sensors provide a novel way to assess frog vocalizations and, in turn, to optimize global protection policy. Specifically, frog populations can be assessed by detecting frog species in the collected recordings.

Previous studies have explored various acoustic features for classifying frog calls. However, few studies have investigated visual features for frog call classification, even though such features have been used successfully in acoustic event detection and speech/speaker recognition.

In this study, various acoustic and visual features are proposed for frog call classification: MPEG-7 audio descriptors, syllable duration, oscillation rate, entropy-related features, linear prediction coding, Mel-frequency cepstral coefficients, local binary patterns, and histograms of oriented gradients.
