Abstract
This paper introduces a novel self-supervised learning framework, Masked Multiscale Reconstruction (MMR), for pretraining photoplethysmography (PPG) foundation models. PPG signals, commonly collected via wearable devices, are inherently multi-scale, with physiological information encoded across both fine-grained waveform morphology and broader rhythmic dynamics. The proposed MMR framework leverages wavelet-based multiresolution decomposition to capture these hierarchical time-frequency features. Specifically, the model is trained to reconstruct masked wavelet coefficients from PPG signals, encouraging the learning of rich, physiologically meaningful embeddings. The authors pretrain their model on a large-scale dataset of approximately 17 million 10-second PPG segments from 32,000 smartwatch users, totaling 48,000 hours of data. The pretrained model demonstrates state-of-the-art performance on 17 out of 19 downstream health-related tasks, outperforming or matching existing PPG foundation models and other self-supervised baselines. Ablation studies further highlight the importance of wavelet-based representations and the impact of design choices such as wavelet family, decomposition scales, and patch size. This work underscores the potential of wavelet-driven approaches for building generalizable and robust PPG foundation models, enabling advancements in digital health applications such as cardiovascular monitoring and stress detection.
Methodology
The authors propose a self-supervised pretraining framework, Masked Multiscale Reconstruction (MMR), built on wavelet-based multiresolution decomposition of PPG signals. Each 10-second segment is decomposed with the Discrete Wavelet Transform (DWT) into coefficients spanning multiple time-frequency scales; a transformer encoder is then trained to reconstruct randomly masked coefficient patches from the visible ones. Pretraining uses approximately 17 million 10-second PPG segments from 32,000 smartwatch users. The model is evaluated on 19 downstream health-related tasks, with systematic ablations over design choices such as wavelet family, number of decomposition scales, and patch size.
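To make the decomposition-and-masking step concrete, the sketch below shows one plausible way to build the pretraining targets with PyWavelets. It is illustrative only: the wavelet family ("db4"), the number of decomposition levels, the patch size, the masking ratio, and the 64 Hz sampling rate are assumptions rather than values reported in the paper, and helper names such as patch_and_mask are hypothetical.

```python
import numpy as np
import pywt

# Minimal sketch of an MMR-style pretraining target (illustrative only).
# Assumptions not confirmed by the paper: 'db4' wavelet, 4 decomposition
# levels, patch size of 16 coefficients, 50% masking ratio, 64 Hz sampling.

def multiscale_coefficients(ppg_segment, wavelet="db4", levels=4):
    """Decompose a 1-D PPG segment into multiresolution wavelet coefficients."""
    # Returns [approx_L, detail_L, ..., detail_1], ordered coarse to fine.
    return pywt.wavedec(ppg_segment, wavelet, level=levels)

def patch_and_mask(coeffs, patch_size=16, mask_ratio=0.5, rng=None):
    """Split each scale into patches and randomly mask a fraction of them.

    The masked patches are the reconstruction targets; an encoder would be
    trained to predict them from the visible patches.
    """
    rng = rng or np.random.default_rng()
    visible, targets, mask_index = [], [], []
    for scale, c in enumerate(coeffs):
        # Pad so each scale splits evenly into fixed-size patches.
        pad = (-len(c)) % patch_size
        c = np.pad(c, (0, pad))
        patches = c.reshape(-1, patch_size)
        masked = rng.random(len(patches)) < mask_ratio
        targets.append(patches[masked])   # what the model must reconstruct
        vis = patches.copy()
        vis[masked] = 0.0                 # placeholder for mask tokens
        visible.append(vis)
        mask_index.append((scale, np.where(masked)[0]))
    return visible, targets, mask_index

# Example: a 10-second PPG segment at an assumed 64 Hz sampling rate.
segment = np.random.randn(640).astype(np.float32)
coeffs = multiscale_coefficients(segment)
visible, targets, mask_index = patch_and_mask(coeffs)
print([t.shape for t in targets])
```

In the full framework, the zeroed patches would typically be replaced by learned mask tokens, and the transformer encoder would reconstruct the masked coefficients jointly across all scales.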
Results
The MMR framework achieves state-of-the-art performance on 17 out of 19 health-related tasks, outperforming or matching existing PPG foundation models, time-series foundation models, and other self-supervised baselines. The learned embeddings capture robust, physiologically meaningful features, as demonstrated through extensive analysis and ablation studies. The results validate the effectiveness of wavelet-based representations for generalizable PPG modeling.
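One common way to assess whether frozen embeddings are physiologically meaningful is to fit a simple probe on top of them for each downstream task. The paper's exact evaluation protocol is not restated here, so the sketch below is a generic linear-probe example; the placeholder embeddings and labels stand in for MMR encoder outputs and one binary downstream task.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Generic linear-probe evaluation of frozen embeddings (illustrative; not the
# paper's exact protocol). `embeddings` stands in for pretrained MMR features
# and `labels` for one binary downstream task (e.g., an arrhythmia flag).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 256))   # placeholder for encoder outputs
labels = rng.integers(0, 2, size=1000)      # placeholder task labels

X_train, X_test, y_train, y_test = train_test_split(
    embeddings, labels, test_size=0.2, random_state=0
)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("AUROC:", roc_auc_score(y_test, probe.predict_proba(X_test)[:, 1]))
```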
Implications
The proposed MMR framework has significant implications for digital health, particularly in wearable technology and continuous cardiovascular monitoring. By leveraging wavelet-based multiscale representations, the model enables robust and generalizable PPG foundation models, which can support a wide range of health-related applications, including blood pressure estimation, arrhythmia detection, and stress monitoring. This work also highlights the potential of wavelet-driven approaches for advancing self-supervised learning in biosignal modeling.