Abstract
This paper presents a noninvasive framework for automated dysphagia screening using neck acoustic sensing and machine learning. Dysphagia, a condition characterized by difficulty swallowing, affects a significant portion of the population, particularly older adults and individuals with neurological or oncological conditions. Current diagnostic methods, such as videofluoroscopic swallowing studies (VFSS) and fiberoptic endoscopic evaluation of swallowing (FEES), are invasive and costly, and they require specialized equipment and trained personnel. To address these limitations, the authors developed a machine learning-based system that analyzes acoustic signals captured from the neck during swallowing tasks. The study collected data from 49 participants undergoing FEES, with acoustic signals annotated using the penetration-aspiration scale (PAS) to assess swallowing dysfunction. The proposed model achieved an AUC-ROC of 0.904 across five independent train-test splits. This work demonstrates the feasibility of noninvasive acoustic sensing as a scalable, cost-effective, and practical tool for dysphagia screening and pharyngeal health monitoring.
Methodology
The study collected neck acoustic signals from 49 participants during FEES, a gold-standard swallowing evaluation. The acoustic data were annotated using the penetration-aspiration scale (PAS) to label swallowing events as normal or abnormal. For feature extraction, the authors employed signal processing techniques and pre-trained audio embedding models such as OPERA, then trained machine learning models on these features to classify swallowing abnormalities. Performance was evaluated over five independent train-test splits, with additional experiments assessing the impact of demographic features and model architectures.
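The labeling and splitting steps described above can be sketched as follows. This is a minimal illustration, not the authors' code: the summary does not state which PAS cut-off separates normal from abnormal swallows, nor whether the five splits were made at the participant level, so the `abnormal_from` threshold, the `test_fraction`, and the grouping by participant are all assumptions, and the function names are hypothetical.

```python
import random


def binarize_pas(pas_score, abnormal_from=3):
    """Map a PAS score (1 to 8) to 0 (normal) or 1 (abnormal).

    The cut-off `abnormal_from` is an assumption for illustration;
    the summary does not say which threshold the authors used.
    """
    if not 1 <= pas_score <= 8:
        raise ValueError("PAS scores range from 1 to 8")
    return int(pas_score >= abnormal_from)


def participant_splits(participant_ids, n_splits=5, test_fraction=0.2, seed=0):
    """Yield (train_idx, test_idx) pairs of swallow indices.

    Each split holds out a random subset of participants, so no
    participant contributes swallows to both sides (an assumed design).
    """
    unique = sorted(set(participant_ids))
    n_test = max(1, round(len(unique) * test_fraction))
    rng = random.Random(seed)
    for _ in range(n_splits):
        held_out = set(rng.sample(unique, n_test))
        train_idx = [i for i, p in enumerate(participant_ids) if p not in held_out]
        test_idx = [i for i, p in enumerate(participant_ids) if p in held_out]
        yield train_idx, test_idx
```

Splitting by participant rather than by individual swallow keeps recordings from the same person out of both the training and test sets, which avoids overly optimistic scores when a model can recognize a speaker-specific acoustic signature.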
Results
The proposed system achieved an AUC-ROC of 0.904 for detecting swallowing abnormalities across five independent train-test splits, indicating strong discrimination between normal and abnormal swallows. Features extracted with the pre-trained OPERA model outperformed those from the other baseline models. Adding demographic features such as age and gender had minimal impact on model performance. The study also highlighted the system's ability to generalize across different bolus consistencies, addressing a key limitation of prior research.
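For readers unfamiliar with the metric, the sketch below shows one way an AUC-ROC can be computed and aggregated over several splits. It uses the pairwise-ranking definition (the probability that a randomly chosen abnormal swallow receives a higher score than a randomly chosen normal one). Averaging over splits is an assumption here; the summary does not say how the 0.904 figure was aggregated, and the function names are hypothetical.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (abnormal, normal) pairs ranked correctly.

    labels: 0 for normal, 1 for abnormal. scores: higher means more abnormal.
    Tied scores count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(per_split_results):
    """Average AUC-ROC over an iterable of (labels, scores) test-set pairs."""
    aucs = [auc_roc(labels, scores) for labels, scores in per_split_results]
    return sum(aucs) / len(aucs)
```

An AUC-ROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The metric is independent of any single decision threshold, so it does not by itself give the sensitivity or specificity at a chosen operating point.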
Implications
This work has significant implications for healthcare, offering a noninvasive, portable, and cost-effective tool for dysphagia screening and pharyngeal health monitoring. The system could be deployed in clinical settings or as a point-of-care device, reducing reliance on invasive and resource-intensive diagnostic methods. Additionally, it has the potential to improve early detection and intervention for dysphagia, particularly in high-risk populations such as older adults and patients with neurological or oncological conditions.