AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

48 Papers today
8h Update frequency
7 Days of history
VIMPO: Value-Implicit Policy Optimization for LLMs
Zhewei Kang, Aosong Feng, Sergey Levine, Dawn Song, Xuandong Zhao
Reinforcement Learning Large Language Models Optimization
  • VIMPO is a critic-free policy optimization method that improves reasoning in LLMs.
  • It derives a policy-implied value function using KL-regularized reinforcement learning principles.
  • The method allows for token-level credit assignment without the instability of a learned critic.
  • Empirical results show VIMPO outperforms existing methods like GRPO, especially under noisy rewards.
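For context, the standard KL-regularized reinforcement learning identities that a "policy-implied value" can build on are sketched below. The symbols ($\beta$ for the KL coefficient, $\pi_{\mathrm{ref}}$ for the reference policy, $Q^*$ and $V^*$ for the optimal soft values) are textbook notation assumed here, not taken from the paper, and the summary does not say how VIMPO itself applies them.

```latex
\begin{align}
J(\pi) &= \mathbb{E}_{\pi}\Big[\sum_t \Big(r_t - \beta \log \frac{\pi(a_t \mid s_t)}{\pi_{\mathrm{ref}}(a_t \mid s_t)}\Big)\Big] \\
\pi^*(a \mid s) &= \pi_{\mathrm{ref}}(a \mid s)\,\exp\!\Big(\tfrac{1}{\beta}\big(Q^*(s,a) - V^*(s)\big)\Big) \\
V^*(s) &= \beta \log \sum_a \pi_{\mathrm{ref}}(a \mid s)\,\exp\!\big(Q^*(s,a)/\beta\big) \\
\beta \log \frac{\pi^*(a \mid s)}{\pi_{\mathrm{ref}}(a \mid s)} &= Q^*(s,a) - V^*(s)
\end{align}
```

The last line shows why a separate critic may be unnecessary in this setting: the scaled log-ratio between the policy and the reference already equals a token-level advantage.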
Spectral Retrieval-Augmented Time-Series Forecasting
Huu Hiep Nguyen, Minh Hoang Nguyen, Dung Nguyen, Hung Le
Time Series
  • Introduction of SpecReTF, a novel retrieval-augmented forecasting architecture.
  • Combines frequency-domain analysis with recency-weighted pattern retrieval.
  • Unified similarity measure integrates Jensen–Shannon divergence and cosine similarity.
  • SpecReTF achieves state-of-the-art forecasting accuracy on benchmark datasets.
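As a rough illustration of how one similarity score might blend Jensen–Shannon divergence with cosine similarity, the sketch below compares two power spectra. The mixing weight `alpha`, the function names, and the use of `1 - JSD` as a similarity are assumptions made for illustration; the summary does not give the paper's actual formula.

```python
import math

def normalize(spectrum):
    """Scale a non-negative spectrum so it sums to 1 (a probability vector)."""
    total = sum(spectrum)
    return [x / total for x in spectrum]

def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, so the result lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the KL sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def combined_similarity(spec_a, spec_b, alpha=0.5):
    """Hypothetical blend: alpha * (1 - JSD) + (1 - alpha) * cosine."""
    p, q = normalize(spec_a), normalize(spec_b)
    jsd_part = 1.0 - js_divergence(p, q)
    return alpha * jsd_part + (1 - alpha) * cosine_similarity(spec_a, spec_b)
```

With this weighting, identical spectra score 1 and spectra with no shared frequency content score 0.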
What Makes Effective Supervision in Latent Chain-of-Thought: An Information-Theoretic Analysis
Xinghao Chen, Chak Tou Leong, Wenjin Guo, Jian Wang, Wenjie Li, Xiaoyu Shen
NLP Large Language Models Theory
  • Latent Chain-of-Thought models face challenges due to weak learning signals from outcome supervision.
  • The dual collapse phenomenon involves gradient attenuation and representational drift in latent spaces.
  • Process supervision can be effectively decomposed into Trajectory and Space Supervision.
  • Generative reconstruction is more effective than geometric compression for preserving information capacity.
Deep Learning for Soil Moisture Estimation: Fusing Satellite Data with Optimally-Lagged Meteorological Features
Adrian Canovas-Rodriguez, Aurora González Vidal, Antonio F. Skarmeta
Time Series
  • Optimal meteorological and inter-depth lags were identified using Cross-Correlation Function (CCF).
  • A per-pixel CNN model showed significant improvement in soil moisture prediction when combined with depth features.
  • The CNN-LSTM hybrid model achieved the best overall performance in held-out data evaluation.
  • Incorporating subsurface depth information was crucial for enhancing prediction accuracy.
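Lag selection with a cross-correlation function can be sketched as follows: shift the driver series by each candidate lag and keep the lag with the strongest correlation to the target. The function names and the toy rainfall and soil-moisture series are hypothetical; the paper's exact CCF procedure is not described in the summary.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def best_lag(driver, target, max_lag):
    """Return (lag, correlation) where driver[t - lag] best matches target[t]."""
    scores = {}
    for lag in range(max_lag + 1):
        x = driver[: len(driver) - lag]   # driver values shifted back by `lag`
        y = target[lag:]                  # aligned target values
        scores[lag] = pearson(x, y)
    lag = max(scores, key=lambda k: abs(scores[k]))
    return lag, scores[lag]

# Toy data: soil moisture responds to rainfall two steps later.
rain = [0, 1, 0, 0, 3, 0, 0, 2, 0, 1, 0, 0]
soil = [0, 0, 0, 1, 0, 0, 3, 0, 0, 2, 0, 1]
lag, score = best_lag(rain, soil, max_lag=4)
```

Here `lag` comes out as 2, matching the delay built into the toy data.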
FlexLAM: Resolving the Bottleneck Trade-off in Latent Action Learning
Takanori Yoshimoto, Yang Hu, Naruya Kondo, Tatsuya Matsushima
Robotics Computer Vision Efficient ML
  • Identification of a bottleneck trade-off in fixed-capacity LAMs affecting action alignment.
  • Introduction of retained-prefix training for variable-length latent actions.
  • FlexLAM outperforms traditional fixed-capacity LAMs across all evaluated token budgets.
  • Supports inference-time token-budget adjustments without retraining.
Physiology-Aware CNN and Zero-Shot Multimodal LLMs for ECG Image Classification: A Comparative Study
Khalil Ahammad, Derek Abbott, Mohsen Dorraki
Computer Vision Large Language Models Multimodal
  • Physiology-aware CNN models outperform zero-shot multimodal LLMs in ECG image classification.
  • LeadGroupECG model effectively captures anatomical relationships among ECG leads.
  • CNN models achieved high ROC-AUC scores, indicating strong classification performance.
  • Zero-shot LLMs showed near-chance performance, highlighting limitations in ECG interpretation.
ReNIO: Reweighting Negative Trajectory Importance for LLM On-Policy Distillation
Chen Lin, Kedi Chen, Wei Zhang
Large Language Models Reinforcement Learning NLP
  • Incorrect student-generated outputs can provide more valuable training signals than correct ones in OPD.
  • ReNIO introduces a prefix-computable reweighting method that emphasizes negative trajectories without needing final-answer labels.
  • The method leverages student-to-teacher probability ratios to identify and weight pivotal tokens leading to incorrect reasoning.
  • ReNIO shows substantial performance improvements in mathematical reasoning and code generation tasks.
Comparing Linear Probes with Mahalanobis Cosine Similarity
Zhuofan Josh Ying, Peter Hase, Nikolaus Kriegeskorte
Interpretability Theory Large Language Models
  • Mahalanobis cosine similarity (MCS) provides a near-perfect linear prediction of out-of-distribution (OOD) AUROC across multiple models and datasets.
  • Theoretical proof establishes the linear relationship between MCS and OOD AUROC under specific conditions.
  • MCS outperforms ECS significantly in terms of correlation with probe performance.
  • The study identifies failure modes for the linearity of MCS and OOD AUROC, enhancing understanding of probe generalization.
Contagion Networks: Evaluator Bias Propagation in Multi-Agent LLM Systems
Zewen Liu
Large Language Models NLP Theory
  • Introduces Contagion Networks to measure evaluator bias propagation in multi-agent LLM systems.
  • Establishes a Cross-Agent Contagion Matrix (ΓN) for quantifying bias spread across agents.
  • Identifies three propagation regimes and demonstrates that homogeneous agents have weaker contagion effects.
  • Finds that increasing evaluator committee size can reduce effective contagion by 72.4%.
Physics-Informed Discovery of Yield Functions in Plasticity via Convex Neural Representations
Hyeonbin Moon, Donghyuk Cho, Jecheon Yu, Jeong Whan Yoon, Seunghwa Ryu
Theory Interpretability Optimization
  • Introduces a physics-informed framework for yield function discovery from displacement and force data.
  • Utilizes a convex neural network to represent yield functions, ensuring convexity and symmetry.
  • Trains the neural yield function using force equilibrium residuals instead of direct stress supervision.
  • Validated against benchmark yield functions using finite element simulations.
When to Trust, How to Distill: Multi-Foundation Model Guidance for Lightweight, Robust Scientific Time Series Forecasting
Rupasree Dey, Abdul Matin, Nathan Orwick, Yao Zhang, Shrideep Pallickara, Sangmi Lee Pallickara
Time Series Efficient ML Theory
  • Introduction of Guard framework for dynamic multi-teacher knowledge distillation.
  • Adaptive mechanisms for selecting teacher models based on input statistics and uncertainty.
  • Significant RMSE reduction compared to traditional distillation methods.
  • Demonstrated effectiveness in four scientific domains despite distributional misalignment.
How Linear Is a Transformer Feed-Forward Block? Per-Block Linear Recoverability Is Learned, Not Architectural
Stuart Whipp
NLP Large Language Models Efficient ML
  • Introduces a novel measure of linear recoverability for Transformer FFN blocks using closed-form least squares.
  • Demonstrates that linear recoverability varies significantly across different FFN blocks and is a learned property rather than an architectural one.
  • Finds that residual nonlinearity is not well captured by low-order multiplicative models.
  • Highlights the potential for targeted compression of FFN blocks based on their linear recoverability profiles.
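To make the idea of linear recoverability concrete, the toy sketch below fits a closed-form least-squares line to a scalar nonlinearity and reports R². This is a one-dimensional stand-in chosen for illustration: the paper measures full FFN blocks with vector inputs and outputs, and the ReLU activation, input ranges, and function names here are assumptions.

```python
def linear_r2(xs, ys):
    """R^2 of the closed-form least-squares fit ys ~ a * xs + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx                 # slope from the normal equations
    b = my - a * mx               # intercept
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def relu(x):
    return max(0.0, x)

xs = [i / 10 for i in range(-30, 31)]          # inputs spanning [-3, 3]
positive = xs[31:]                             # inputs in (0, 3] only

# Inputs on both sides of zero exercise the kink, so a line fits imperfectly.
r2_relu_symmetric = linear_r2(xs, [relu(x) for x in xs])
# Inputs that stay positive see an exactly linear function.
r2_relu_positive = linear_r2(positive, [relu(x) for x in positive])
```

The same function is highly recoverable on one input distribution and less so on another, which is consistent with the claim that recoverability depends on what a block has learned to operate on rather than on its architecture alone.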
Scaling Linear Mode Connectivity and Merging to Billion Parameter Pretrained Transformers
Tianyi Li, Zhiqiang Shen
NLP Large Language Models Computer Vision
  • Introduces a scalable framework for linear mode connectivity in billion-parameter pretrained transformers.
  • Utilizes parameterized weight transformations and a dual learning procedure for effective model merging.
  • Achieves near-zero loss barriers on WikiText and maintains high accuracy on ImageNet during interpolation.
  • Demonstrates the importance of resolving parameter symmetries in enhancing model connectivity.
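A loss barrier along a linear interpolation path can be computed as in the minimal sketch below. The barrier definition used (largest excess of the path loss over the straight line between the endpoint losses) is one common convention and is assumed here; the toy one-parameter losses and all function names are hypothetical.

```python
def interpolate(theta_a, theta_b, t):
    """Point on the straight line between two parameter vectors."""
    return [(1 - t) * a + t * b for a, b in zip(theta_a, theta_b)]

def loss_barrier(loss_fn, theta_a, theta_b, steps=11):
    """Largest gap between the path loss and the interpolated endpoint losses."""
    la, lb = loss_fn(theta_a), loss_fn(theta_b)
    barrier = 0.0
    for i in range(steps):
        t = i / (steps - 1)
        path_loss = loss_fn(interpolate(theta_a, theta_b, t))
        gap = path_loss - ((1 - t) * la + t * lb)
        barrier = max(barrier, gap)
    return barrier

def double_well(theta):
    """Two separate minima at -1 and +1 with a bump between them."""
    return (theta[0] ** 2 - 1.0) ** 2

def convex_bowl(theta):
    """A single basin: no barrier between any two points."""
    return theta[0] ** 2

barrier_double_well = loss_barrier(double_well, [-1.0], [1.0])
barrier_convex = loss_barrier(convex_bowl, [-1.0], [1.0])
```

A near-zero barrier, as in the convex case, is what "linearly mode connected" means in practice; the double-well case shows the kind of bump that merging methods try to remove.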
Causal Gaussian Processes for Robust Treatment Effect Evaluation with Unobserved Confounding
Junzhe Zhang, Jingyuan Chen, Elias Bareinboim
Theory
  • Introduces Causal Gaussian Processes (CGP) for evaluating treatment effects with unobserved confounding.
  • Develops a universal discretization method for approximating causal models in continuous domains.
  • Demonstrates the effectiveness of CGP in mitigating confounding bias in observational data.
  • Provides a framework that requires only basic temporal ordering between treatment and outcome.
Distribution-Aware Diffusion-LLM for Robust Ultra-Long-Term Time Series Forecasting
Falguni Ghosh, Vahid Hashemi, Bernhard Kainz
Time Series Large Language Models Generative Models
  • Introduction of Diffusion-LLM framework that combines LLMs with conditional diffusion models for time series forecasting.
  • Improvement in multimodal alignment and probabilistic modeling through a shared latent space.
  • Significant performance gains in ultra-long-term and few-shot forecasting across multiple benchmarks.
  • Demonstration of DDPMs as effective regularizers for enhancing LLM robustness.
Reward-free Pretraining for Reinforcement Learning via Occupancy Coverage Maximization
Marco Pratticò, Pietro Novelli, Massimiliano Pontil, Carlo Ciliberto
Reinforcement Learning
  • ROVER maximizes state-space coverage for effective exploration in sparse-reward environments.
  • The method employs a learned resolvent world model to estimate occupancy, addressing common estimation challenges.
  • Introduction of a virtual 'sink' state stabilizes learning by managing unsupported state-action regions.
  • Empirical results show ROVER achieves superior coverage and initialization compared to traditional reward-free methods.
Understanding Key Features of Time Series Foundation Models from Epidemic Forecasting
Alireza Jafari, Judy Fox, Geoffrey C. Fox, Madhav Marathe, Aniruddha Adiga
Time Series
  • A systematic evaluation of forecasting models for influenza using influenza-like illness (ILI) and hospitalization data.
  • Mixture-of-experts models outperform other architectures, indicating the benefit of diverse pretrained representations.
  • Numerical transformer-based models are reliable, especially with appropriate pretraining.
  • LLM-based forecasting methods are less effective than traditional numerical approaches.
UltraQuant: 4-bit KV Caching for Context-Heavy Agents
Inesh Chakrabarti, David Limpus, Aditi Ghai Rana, Bowen Bao, Spandan Tiwari, Thiago Crepaldi, Ashish Sirasao
Large Language Models Efficient ML NLP
  • UltraQuant improves 4-bit KV caching for context-heavy agents, addressing memory pressure and GPU utilization.
  • The method incorporates practical design choices, including asymmetric key/value treatment and optimized decode-attention kernels.
  • UltraQuant achieves a 3.47× reduction in time-to-first-token during late rounds and a 1.63× increase in output throughput over FP8 KV caching.
  • The approach emphasizes the importance of serving efficiency metrics in evaluating KV-cache performance.
Pseudo-Feature Padding: A Lightweight Defense Against False Data Injection in Power Grids
Farhin Farhad Riya, Shahinul Hoque, Yingyuan Yang, Jinyuan Sun, Kevin Tomsovic
Theory Efficient ML
  • Introduces a lightweight defense against false data injection attacks (FDIA) on deep neural networks (DNNs) used in cyber-physical systems (CPS).
  • Utilizes pseudo-feature padding to increase input dimensionality and complexity.
  • Model-agnostic approach requiring no modifications to existing DNN architectures.
  • Demonstrates significant improvements in robustness with minimal impact on performance.
Information Lattice Learning as Probabilistic Graphical Model Structure Learning
Haizi Yu, Lav R. Varshney
Theory Interpretability Graph Learning
  • ILL provides a framework for learning interpretable rules from signals, emphasizing low complexity.
  • The probabilistic rules learned through ILL can be interpreted as marginal constraints in PGMs.
  • The information lattice structure aids in understanding the relationships between different abstractions.
  • ILL distinguishes between general and special lifting, impacting the reconstruction of probability distributions.
Post-Training Speech Enhancement Language Models with Perceptual Rewards
Frédéric Berdoz, Luca A. Lanzendörfer, Antonis Asonitis, Roger Wattenhofer
Audio & Speech Reinforcement Learning Optimization
  • Introduction of a post-training stage for autoregressive speech enhancement models using GSPO.
  • Development of a composite reward system that combines multiple perceptual metrics to avoid reward hacking.
  • Achieved state-of-the-art performance on DNS2020 and DNS5 benchmarks.
  • Human evaluations indicate a preference for multi-metric rewards over single-metric approaches.
Alzheimer's Disease Diagnosis using a Multimodal Approach with 3D MRI and PET
Loukas Ilias, Anthi-Maria Vozinaki, Christos Ntanos, Dimitris Askounis
Multimodal
  • Introduces a novel multimodal approach combining 3D MRI and PET for AD diagnosis.
  • Utilizes three fusion strategies and a sparsely gated Mixture-of-Experts classifier.
  • Achieves high classification accuracies across multiple diagnostic tasks.
  • Implements Grad-CAM for enhanced model interpretability.
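Sparse gating in a Mixture-of-Experts layer can be sketched as top-k routing: keep the k largest gate logits, apply a softmax over those, and give every other expert zero weight. This is a generic sketch without the noise term some formulations add; the function names, the scalar experts, and `k=2` are assumptions, not the paper's configuration.

```python
import math

def top_k_gate(logits, k):
    """Softmax over the k largest logits; all other experts get weight 0."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    top = order[:k]
    m = max(logits[i] for i in top)                   # subtract max for stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    return [exps[i] / z if i in exps else 0.0 for i in range(len(logits))]

def moe_output(x, experts, gate_logits, k=2):
    """Weighted sum of the selected experts; unselected experts are not called."""
    gates = top_k_gate(gate_logits, k)
    return sum(g * expert(x) for g, expert in zip(gates, experts) if g > 0.0)

# Four toy experts operating on a scalar input.
experts = [lambda x: x, lambda x: 100 * x, lambda x: -x, lambda x: 0.0]
gates = top_k_gate([2.0, 0.0, 2.0, -1.0], k=2)
output = moe_output(3.0, experts, [2.0, 0.0, 2.0, -1.0], k=2)
```

Only experts 0 and 2 are evaluated in this example, which is where the compute saving of sparse gating comes from.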
VLA-FAIL: Efficient Task Failure Detection for Finetuned Vision-Language-Action Models
Florian Seligmann, Emiliyan Gospodinov, Enes Ulas Dincer, Gerhard Neumann
Robotics Efficient ML Multimodal
  • VLA-FAIL is a lightweight framework for detecting task failures in Vision-Language-Action models.
  • It combines two novel detection methods: LLMD for out-of-distribution state detection and ACC for action consistency monitoring.
  • The framework requires no failure data and incurs minimal computational overhead.
  • AUCPDT is introduced as a new metric to evaluate detection accuracy and latency.
DiT-Reward: Generative Representations for Text-to-Image Reward Modeling
Yuanming Yang, Guoqing Ma, Bo Wang, Yuan Zhang, Wei Tang, Chenyi Li, Haoyang Huang, Nan Duan
Generative Models Reinforcement Learning Multimodal
  • DiT-Reward effectively repurposes a pretrained text-to-image DiT as a reward model.
  • The method outperforms existing models like HPSv3 on multiple preference benchmarks.
  • A lightweight head can extract meaningful preference predictions even when the generative backbone is frozen.
  • Reward performance benefits from representations in the middle-to-late layers of the transformer.
Quantum-classical physics-informed Kolmogorov-Arnold networks for PDEs
Xiang Rao, Yuxuan Shen
Theory Efficient ML
  • Introduction of QCPIKAN, the first quantum-classical physics-informed Kolmogorov-Arnold network for PDEs.
  • Theoretical proof of accelerated convergence rates and reduced numerical dispersion.
  • Validation across three seepage scenarios in porous media demonstrates superior performance.
  • Outperforms existing models in accuracy, error control, and dynamic tracking.
Breaking chains with trees: Deep learning with $O(\log N)$ parallel time complexity
Neeraj Mohan Sushma, Aditya Nagarsekar, Cabrel Teguemne Fokam, Robin Schiewer, Amit Kumar Pal, Anand Subramoney, David Kappel
Efficient ML Computer Vision NLP
  • HBLL allows training of deep neural networks without full backpropagation, improving scalability and parallelism.
  • The framework achieves O(log N) parallel time complexity, significantly enhancing computational efficiency.
  • HBLL demonstrates competitive performance on challenging benchmarks in vision classification and language modeling.
  • The method supports flexible inference by defining subnetworks based on hierarchical paths.
An Empirical Study of OpenPangu Quantization on Ascend NPUs
Tong Shi, Jiacheng Wang, Hui Xie, Ying Li, Aishan Liu, Jinyang Guo, Xianglong Liu
NLP Large Language Models Efficient ML
  • 8-bit weight-only quantization is effectively lossless for OpenPangu models.
  • 4-bit quantization is practical for the 7B model but harmful for the 1B model.
  • Ultra-low precision quantization (2-bit and binary) often results in poor performance.
  • The study provides a comprehensive evaluation of various quantization methods on Ascend NPUs.
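The trend across bit widths can be reproduced on toy data with plain round-to-nearest quantization, as sketched below. Symmetric per-tensor scaling is assumed for simplicity; the study evaluates a range of quantization methods that the summary does not detail, and the weight values and function names here are made up.

```python
def quantize_dequantize(weights, bits):
    """Symmetric per-tensor round-to-nearest quantization, then dequantization."""
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits (one sign bit plus magnitude)")
    qmax = 2 ** (bits - 1) - 1                 # 127 for 8-bit, 7 for 4-bit, 1 for 2-bit
    scale = max(abs(w) for w in weights) / qmax
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return [q * scale for q in quantized]

def mean_abs_error(weights, bits):
    """Average absolute reconstruction error after a quantization round trip."""
    restored = quantize_dequantize(weights, bits)
    return sum(abs(w - r) for w, r in zip(weights, restored)) / len(weights)

weights = [-1.0, -0.6, -0.35, -0.1, 0.05, 0.2, 0.45, 0.8]
err8 = mean_abs_error(weights, 8)
err4 = mean_abs_error(weights, 4)
err2 = mean_abs_error(weights, 2)
```

The error grows as the bit width shrinks, with a much larger jump from 4-bit to 2-bit than from 8-bit to 4-bit on these values. Reconstruction error on weights is only a proxy: it does not by itself predict task accuracy.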
Adversarial Bandit Optimization with Globally Bounded Perturbations to Convex Losses
Zhuoyu Cheng, Kohei Hatano, Eiji Takimoto
Optimization Theory
  • Introduces a model for bandit optimization with C-approximately convex and β-smooth function sequences.
  • Establishes expected regret guarantees that account for adversarial perturbations under a global budget.
  • Demonstrates that sublinear expected regret is achievable even with non-convex losses.
  • Modifies existing bandit algorithms to accommodate the new perturbation model.
Parameterized Representations via Implicit Stochastic Modulation for High-Dimensional and High-Order Neural PDE Solvers
Zhangyong Liang, Huanhuan Gao
Theory Optimization Efficient ML
  • PRISM decouples parameter encoding from the spatial automatic-differentiation (AD) graph, addressing memory growth issues.
  • The architecture enables zero-shot extrapolation for parameterized PDEs without retraining.
  • PRISM supports efficient scaling of high-dimensional PDEs, achieving up to 100,000 dimensions on a single GPU.
  • Variance-aware Lipschitz damping is incorporated to enhance optimization stability.
Learning universal approximations for partial differential equations with Physics-Informed Broad Learning System
Zhiwen Yu, Derong Yang, Liujian Zhang, Kaixiang Yang, Peilin Zhan, Jianmin Lv, Jane You, C. L. Philip Chen
Efficient ML Theory Optimization
  • PIBLS is the first application of Broad Learning System (BLS) to solving PDEs, offering a backpropagation-free computational framework.
  • The framework reformulates PDE solving as a direct least-squares optimization, enhancing computational efficiency.
  • Rigorous mathematical proof establishes PIBLS's universal approximation property for PDE solutions.
  • Experimental results show PIBLS is significantly faster and more accurate than traditional PINNs.
PaAno+: Multiscale Encoding and Cross-Variable Attention for Time Series Anomaly Detection
Youji Zhu, Hongbing Wang, Wenchao Liu, Xiaodong Liu, Xiangguang Xiong
Time Series
  • Introduces PaAno+, a lightweight model for time series anomaly detection.
  • Utilizes multiscale feature extraction and cross-variable attention to improve anomaly detection accuracy.
  • Implements a novel self-supervised learning task for better feature representation.
  • Demonstrates state-of-the-art performance on the TSB-AD benchmark.
Computational Identifiability
Lucius E.J. Bynum, Rajesh Ranganath, Kyunghyun Cho
Theory
  • Introduction of computational identifiability as a practical, computation-bound notion of identifiability.
  • Formalization of the relationship between causal effect estimation and meta-learning.
  • Empirical demonstration of computational identifiability in complex scenarios.
  • Provision of a framework for identifying causal effects with finite samples and error tolerances.
Learning a Normal World Model for Few-Shot Boundary-Calibrated Abnormality Detection
Weizhi Nie, Weichao Liu, Weijie Wang, Yuting Su
Time Series
  • Introduces a normal world modeling framework for few-shot abnormality detection.
  • Develops an entropy-aware normal-world energy for quantitative evaluation of abnormality.
  • Demonstrates strong performance on the NASA C-MAPSS turbofan degradation benchmark.
  • Mechanistic validation tests confirm the model captures the structure of normal behavior.
From Handcrafted Features to Functional Edge Learning: Evolution of EEG Seizure Detection Frameworks
Sepideh Kheirollahi, Mohammad Rasoul Roshanshah
Time Series Interpretability Efficient ML
  • Deep Learning models for EEG analysis face significant challenges in clinical deployment due to their black-box nature and high data requirements.
  • Kolmogorov-Arnold Networks (KANs) offer a new paradigm by using learnable activation functions, enhancing interpretability and efficiency.
  • KANs are more robust to data scarcity and can facilitate cross-patient personalization without extensive retraining.
  • The paper provides a structured analysis of EEG seizure detection methodologies, highlighting the need for transparent and efficient models.
On the Position Bias of On-Policy Distillation
Yan Xie, Sijie Zhu, Tiansheng Wen, Bo Chen, Yifei Wang
Reinforcement Learning Optimization Efficient ML
  • Identifies the position bias phenomenon in OPD, where early tokens provide more valuable supervision than later ones.
  • Proposes IW-OPD, which adjusts token weights based on the accumulated discrepancy between student and teacher distributions.
  • Demonstrates that IW-OPD converges faster and achieves better performance than standard OPD.
  • Shows that the advantages of IW-OPD increase with the mismatch between teacher and student models.
Comparative Study of Neural Surrogate Architectures for Autoregressive Prediction of Internal Battery States
Gihyun Lee, Thorben Menne, Simon Olma, Jakob Hilgert, Sangyoung Park
Time Series Efficient ML Theory
  • The study compares four neural network architectures for predicting internal battery states.
  • The U-Net architecture performs best, reaching a 3% mean final-step nRMSE.
  • The proposed models significantly reduce inference latency, achieving a 5.38× speed-up over traditional numerical solvers.
  • Spatial inductive bias is identified as a critical factor influencing surrogate model performance.
PG-MAP: Joint MAP Optimization for Inference-Time Alignment of Diffusion and Flow-Matching Models
Ruolan Sun, Pawel Polak
Generative Models Optimization Multimodal
  • PG-MAP is the first framework to jointly optimize conditioning and latent variables during inference-time alignment.
  • The framework employs a forward-consistency coupling, allowing coordinated updates across modalities.
  • PG-MAP shows consistent improvements in alignment metrics across different diffusion models.
  • Human evaluations indicate a strong preference for outputs generated using PG-MAP compared to existing baselines.
The Cost Geometry of Belief: finite-resource inference under noisy observation
Laurent Caraffa
Theory
  • Introduces a cost geometry for beliefs based on optimal transport and Fisher information.
  • Establishes that certainty is an unattainable boundary in finite-resource inference.
  • Identifies three key results: a wall of certainty, an honesty condition, and a rigidity in belief geometries.
  • Demonstrates that the Gaussian distribution is the most hyperbolic belief in this framework.
Efficient Network Inference via Hardware-Aware Architecture Search, Model Pruning & Quantization
Lucas Heublein, Mark Deutel, Axel Plinge, Felix Ott
Efficient ML
  • Investigates efficient network inference for GNSS interference characterization under strict resource constraints.
  • Utilizes a deployment-oriented compression pipeline combining pruning and quantization with MCUNet as a baseline.
  • Applies hardware-aware zero-shot NAS to optimize network architecture and pruning configurations.
  • Demonstrates trade-offs between predictive performance and deployment efficiency through experimental evaluations.
Robustness Cannot be Reduced to Regularization: Studying Adversarial Training Beyond the Linear Case
David A. R. Robin, Rafael Pinot, Yann Chevaleyre
Theory Optimization Efficient ML
  • Adversarial training is effective but computationally expensive.
  • No equivalence between adversarial risk and regularized risk exists for two-layer networks.
  • The impossibility of reformulating adversarial risk extends to deeper architectures.
  • The study emphasizes the need for new methodologies in adversarial training beyond linear models.
On the Curse of Dimensionality in Private Sparse Covariance Estimation and PCA
Syamantak Kumar, Shourya Pandey, Purnamrita Sarkar, Kevin Tian
Theory
  • Demonstrates a significant curse of dimensionality in differentially private (DP) covariance estimation and PCA.
  • Establishes poly(k, log d) sample complexity for DP PCA under additional sparsity assumptions.
  • Provides poly(d) lower bounds for both sparse covariance estimation and PCA under DP.
  • First to show an exponential gap between private and non-private sample complexities in sparse estimation.
SamatNext v0.2-B: An Exploratory Study of RMS-Normalized Hybrid Decoders for Curriculum Retention in Small Code Models
Samat Zharassov
NLP Large Language Models
  • SamatNext v0.2-B demonstrates improved retention of prior capabilities compared to a standard Transformer baseline.
  • The hybrid architecture effectively balances retention and plasticity in curriculum learning settings.
  • Despite improvements, both models face challenges with catastrophic forgetting, particularly in early-stage syntax tasks.
  • The study emphasizes the importance of structured curriculum learning in training adaptive models.
When Calibration Fails the Vulnerable Hospital: Federated Conformal Risk Control via Risk-Curve Shrinkage
Nafis Fuad Shahid
Federated Learning Computer Vision Theory
  • Quantifies the marginal-conditional coverage gap in federated CRC for medical segmentation.
  • Proposes a shrinkage-based federated CRC protocol that enhances prediction set efficiency.
  • Demonstrates that naive pooling of calibration scores can lead to critical failures in individual site coverage.
  • Identifies the necessity of finite-sample correction terms to avoid excessive violations.
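The role of a finite-sample correction can be seen in the standard single-site conformal risk control rule, sketched below: the empirical risk is inflated by a term that shrinks with the calibration set size before being compared with the target level. This is the generic rule, not the paper's shrinkage-based federated protocol; the function name, the toy loss table, and the fallback to the largest threshold are assumptions.

```python
def crc_threshold(loss_table, lambdas, alpha, loss_bound=1.0):
    """Smallest lambda whose corrected empirical risk is at most alpha.

    loss_table[i][j] is the loss of calibration example i at lambdas[j].
    Losses are assumed non-increasing in lambda and bounded by loss_bound.
    """
    n = len(loss_table)
    for j, lam in enumerate(lambdas):
        risk = sum(row[j] for row in loss_table) / n
        # Finite-sample correction: inflate the empirical risk before comparing.
        corrected = (n / (n + 1)) * risk + loss_bound / (n + 1)
        if corrected <= alpha:
            return lam
    return lambdas[-1]  # fall back to the most conservative setting

# Four calibration examples, three candidate thresholds.
lambdas = [0.0, 0.5, 1.0]
loss_table = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 0],
    [0, 0, 0],
]
lam_loose = crc_threshold(loss_table, lambdas, alpha=0.45)
lam_strict = crc_threshold(loss_table, lambdas, alpha=0.30)
```

At `alpha=0.30` the uncorrected empirical risk at the middle threshold (0.25) would pass, but the corrected value (0.40) does not, so the rule moves to the more conservative threshold. With only a handful of calibration examples per site, this correction term dominates, which is one reason small sites are the vulnerable ones.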
Short-Term Electricity Demand Forecasting for New England Using a Hybrid Transformer-XGBoost Framework with Weather, Calendar, and COVID-19 Indicators
Reza Ghanavati, Behrooz Mosallaei
Time Series
  • The hybrid Transformer-XGBoost framework significantly outperforms a tabular-only XGBoost model in short-term electricity demand forecasting.
  • COVID-19 indicators initially improved model accuracy but became less relevant as behavioral adaptations occurred post-pandemic.
  • Hyperparameter optimization using Optuna enhanced the model's performance through efficient search strategies.
  • The study emphasizes the importance of considering temporal validity decay in forecasting models affected by structural changes in demand patterns.
Bypassing Minimization Bias: A Shift-Invariant Variance Estimator for Off-Equilibrium Local Learning Coefficients
Yingjia Cai
Theory Optimization
  • Introduction of the Shift-Invariant Variance Estimator (SIVE) to bypass minimization bias in LLC estimation.
  • SIVE structurally eliminates the need for the local minimum by using variance and a noise-debiasing correction.
  • Controlled experiments validate SIVE's effectiveness in recovering geometric signals in off-equilibrium settings.
  • SIVE is scalable to deep neural networks, enabling real-time tracking of structural phase transitions during training.
Fisher-Geometric Sharpness and the Implicit Bias of SGD toward Flat Minima
Md Sakir Ahmed, Kumaresh Sarmah, Hemen Dutta
Theory Optimization
  • Introduces Riemannian sharpness as an invariant measure of flatness under reparametrization.
  • Establishes a connection between SGD's implicit bias and Riemannian flat minima through a derived SDE.
  • Demonstrates a PAC-Bayes generalization bound explicitly controlled by Riemannian sharpness.
  • Empirical validation shows Riemannian sharpness better predicts generalization than Euclidean sharpness.
Computational Methods and Challenges in Cell-Free DNA Analysis for Multi-Cancer Early Detection
Nicko Starkey, Marcin W. Wojewodzic, Krzysztof Rzecki
Multimodal
  • cfDNA is a promising biomarker for non-invasive multi-cancer early detection.
  • The review categorizes computational methods into statistical, machine learning, and deep learning approaches.
  • Multimodal ensemble approaches are identified as having the highest readiness for clinical integration.
  • Standardization of evaluation protocols is crucial for future research and comparison.
Predicting Mergeability of Parameter-Efficient Fine-Tuning Updates
Lin Tang, Wei Zhang, Jing Li, Hongyu Chen, Ming Zhao, Yuxuan Wang
NLP Large Language Models Efficient ML
  • Formalizes adapter mergeability for LoRA, separating single-task utility from post-merge retention.
  • Introduces MergeProbe, a lightweight predictor that estimates mergeability based on early training signals.
  • Demonstrates improved retention in merging adapters across multiple domains compared to existing methods.
  • Shifts the merging process from a post-hoc evaluation to an anticipatory measurement problem.