AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

69 Papers today
8h Update frequency
7 Days of history
Extra-Merge: Tracing the Rank-1 Subspace of Model Merging in Language Model Pre-Training
Wenjie Zhou, Bohan Wang, Hongtao Zhang, Chenxi Jia, Wei Chen, Xueqi Cheng
Large Language Models Optimization Efficient ML
  • Identification of the Rank-1 Subspace phenomenon in merged model trajectories.
  • Introduction of Extra-Merge, a training-free method for loss minimization.
  • Theoretical grounding of the merging process in the context of optimization landscapes.
  • Demonstrated effectiveness across various model scales and optimizers.
MTL-FNO: A Lightweight Multi-Task Fourier Neural Operator for Sparse Field Reconstruction
Siyu Ye, Shihang Li, Zhiqiang Gong, Benrong Zhang, Weien Zhou, Yiyong Huang, Wen Yao
Efficient ML
  • Introduction of MTL-FNO, a lightweight multi-task framework for sparse field reconstruction.
  • Utilizes hard parameter sharing to efficiently capture common features across multiple tasks.
  • Implements low-rank terms for task-specific parameters to achieve model compression.
  • Develops a decoupled optimization scheme for spectral weights to reduce task conflicts.
Auditing and Fixing Economic Validity in Tabular Foundation Models for Discrete Choice
Yingshuo Wang, Xian Sun, Yanhang Li, Zhichao Fan, Zexin Zhuang
Theory
  • TFMs often violate economic principles in discrete choice predictions.
  • A two-stage adapter is proposed to integrate TFM predictions within a utility-maximization framework.
  • The adapter guarantees economic consistency while recovering accuracy gains from TFMs.
  • On tested datasets, the adapter outperformed standard multinomial logit models significantly.
Prism: A Plug-in Reproducible Infrastructure for Scalable Multimodal Continual Instruction Tuning
Jun-Tao Tang, Yu-Cheng Shi, Zhen-Hao Xie, Da-Wei Zhou
Multimodal Large Language Models NLP
  • PRISM introduces a lightweight plugin design that separates algorithm development from MLLM backbone implementation.
  • The framework supports a unified benchmarking suite, facilitating fair comparisons across different methods.
  • PRISM enhances scalability and reproducibility in MCIT research by integrating widely used large-scale training pipelines.
  • The modular architecture allows for easy integration of new methods and benchmarks as standalone plugins.
Step-TP: A Grounded, Step-Level Dataset with Chain-of-Thought Reasoning for LLM-Guided Tensor Program Optimization
Mengfan Liu, Da Zheng, Junwei Su, Chuan Wu
Large Language Models Optimization
  • Step-TP provides step-level supervision for tensor program optimization, enhancing LLM reasoning capabilities.
  • The dataset is designed around principles that ensure token efficiency and interpretable decision-making.
  • Structured chain-of-thought reasoning is integrated to facilitate reliable multi-step optimization.
  • The dataset aims to overcome limitations of existing datasets that primarily focus on outcome-only supervision.
Optimizing Digital Therapeutic Interventions: Online Learning under Endogenous Adherence
Eric Pulick, Stephanie Carpenter, Matthew Buman, Yonatan Mintz
Optimization Theory
  • Introduces a decision support framework for digital therapeutics that models both treatment recommendations and patient adherence.
  • Utilizes a linear dynamical system to capture the time-varying nature of patient engagement and its effects on adherence.
  • Presents the UCB-BOLD algorithm, which achieves sublinear regret in online treatment selection.
  • Demonstrates significant performance improvements over existing benchmarks in managing patient adherence.
Linear and Neural Dueling Bandits with Delayed Feedback
Xiangyi Wang, Pingchen Lu, Jie Mao, Mingze Kong, Zhi Hong, Zhiyong Wang, Zhongxiang Dai
Reinforcement Learning Theory Optimization
  • Introduces two novel algorithms for dueling bandits with delayed feedback: LDB-DF and NDB-DF.
  • Utilizes an Inverse Probability Weighting mechanism to ensure unbiased estimation despite delayed feedback.
  • Establishes theoretical regret bounds for both linear and neural settings.
  • Demonstrates superior performance of proposed methods through extensive experiments on simulated and real-world datasets.
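The inverse probability weighting idea mentioned above can be illustrated generically. This is a minimal sketch, not the paper's LDB-DF or NDB-DF procedure: it assumes each reward arrives independently with a known probability, and the names `ipw_mean`, `expected_ipw_mean`, `rewards`, and `probs` are hypothetical. Dividing each arrived reward by its arrival probability makes the estimate of the mean unbiased, which the second function checks by enumerating every arrival pattern.

```python
from itertools import product

def ipw_mean(rewards, observed, probs):
    """Inverse-probability-weighted estimate of the mean reward.

    rewards[i] is used only if observed[i] is True; probs[i] is the
    (assumed known) probability that reward i has arrived by now.
    """
    total = sum(r / p for r, o, p in zip(rewards, observed, probs) if o)
    return total / len(rewards)

def expected_ipw_mean(rewards, probs):
    """Exact expectation of ipw_mean over every arrival pattern."""
    expectation = 0.0
    for mask in product([False, True], repeat=len(rewards)):
        # Probability of this particular arrival pattern.
        weight = 1.0
        for o, p in zip(mask, probs):
            weight *= p if o else 1.0 - p
        expectation += weight * ipw_mean(rewards, mask, probs)
    return expectation

# Hypothetical toy data: four rewards with different arrival probabilities.
rewards = [1.0, 0.0, 1.0, 1.0]
probs = [0.5, 0.25, 0.8, 0.9]
true_mean = sum(rewards) / len(rewards)
```

Comparing `expected_ipw_mean(rewards, probs)` with `true_mean` shows the two agree up to floating-point error, even though any single arrival pattern may give a very different estimate.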
Metropolis-Scale Resilient and Trustworthy Traffic Flow Inference Using Multi-Source Data
Qishen Zhou, Yifan Zhang, Michail A. Makridis, Anastasios Kouvelas, Yibing Wang, Simon Hu
Time Series Graph Learning Optimization
  • Introduction of TA-ANP framework for traffic state inference.
  • Effective fusion of multi-source data (FCD and fixed-detector measurements).
  • Rapid adaptation to changes in sensing configurations without retraining.
  • Joint handling of multiple GTSI sub-tasks with minimized interference.
Capture-Calibrate-Coach: A Graph-Based Framework for Knowledge Monitoring Estimation and Adaptive Feedback
Gen Li, Li Chen, Cheng Tang, Boxuan Ma, Yuncheng Jiang, Daisuke Deguchi, Takayoshi Yamashita, Atsushi Shimada
Graph Learning
  • Introduces the 3C framework for adaptive learning support focusing on knowledge monitoring.
  • Utilizes large language models to extract learners' perceptions from open-ended self-reports.
  • Employs a heterogeneous graph neural network for inferring latent perceived states.
  • Demonstrates high accuracy in predicting knowledge states and positive feedback reception from users.
MULTISEISMO: A Multimodal Seismic Dataset and Model for Cross-Modal Seismic Understanding
Sai Munikoti, Ian Stewart, Chengping Chai, Lisa Linville, Scott Vasquez, Sameera Horawalavithana, Karl Pazdernik
Multimodal Time Series
  • MULTISEISMO is a large-scale multimodal seismic dataset integrating waveform data, geographical imagery, and metadata.
  • The dataset includes over 16,000 seismic events spanning 13 years, formatted in a standardized JSON structure.
  • MISCE, a multimodal instruction set, enables effective training and evaluation of GMMs on seismic tasks.
  • SeisModal, the first domain-specific multimodal model for seismic analysis, shows superior performance compared to general-purpose models.
Evolving Robustness--Exploration Trade-off in Online Reinforcement Learning via Quantile Bayesian Risk MDPs
Meichen Song, Yuhao Wang, Enlu Zhou
Reinforcement Learning Theory Optimization
  • Introduces a quantile Bayesian risk-aware MDP framework to manage the robustness-exploration trade-off in online RL.
  • Establishes a theoretical foundation for the impact of quantile levels on decision-making under uncertainty.
  • Proposes an adaptive quantile schedule that shifts focus from robustness to exploration as data accumulates.
  • Demonstrates strong empirical performance in environments with varying exploration demands.
Falcon-X: A Time Series Foundation Model for Heterogeneous Multivariate Modeling
Yiding Liu, Yifan Hu, Hongjie Xia, Peiyuan Liu, Hongzhou Chen, Xilin Dai, Zewei Dong, Jiang-Ming Yang
Time Series
  • Falcon-X addresses the limitations of existing TSFMs by enabling effective cross-variate modeling.
  • The model utilizes a Unified Prototype Diff-Attention mechanism for improved semantic alignment.
  • Latent Entity Attention allows for efficient cross-variate interactions in a unified latent space.
  • Falcon-X demonstrates state-of-the-art performance on benchmark datasets for time series forecasting.
RotMoLE: Enhancing Mixture of Low-Rank Experts through Rotational Gating Mechanism
Mengyang Sun, Maochuan Dou, Tao Feng, Dan Zhang, Yihao Wang, Junpeng Liu, Yifan Zhu, Jie Tang
NLP Large Language Models Efficient ML
  • RotMoLE introduces a rotational gating mechanism to enhance expert selection in MoE architectures.
  • The framework allows for complex spatial transformations of expert outputs, improving representation and generalization.
  • Empirical results show significant performance improvements in multi-task and multilingual learning scenarios.
  • RotMoLE leverages low-rank structures to maintain parameter efficiency while enhancing model capabilities.
Spend Your Rollouts Where It Counts: Rollout Allocation for Group-Based RL Post-Training
Woojeong Kim, Ziyi Yang, Jing Nathan Yan, Jialu Liu
Reinforcement Learning Large Language Models Efficient ML
  • Pilot-Commit framework improves rollout allocation efficiency in group-based RL.
  • The framework uses a two-stage process to evaluate prompt informativeness and allocate resources accordingly.
  • Pilot-Commit achieves baseline accuracy with significantly fewer rollouts compared to existing methods.
  • The proposed method adapts to the evolving policy, optimizing the learning signal from prompts.
AME-TS: Anchored Mixture-of-Experts for Time Series Forecasting
Rui Wang, Renhao Xue, Ray Razi, Huan Song, Hannah R. Marlowe
Time Series
  • AME-TS utilizes a structure-guided approach to improve expert specialization in time series forecasting.
  • The model employs a regime predictor to derive interpretable temporal descriptors that inform expert routing.
  • AME-TS achieves a strong accuracy-efficiency tradeoff, outperforming existing models at smaller scales and remaining competitive at larger scales.
  • The routing mechanism in AME-TS is more interpretable and stable compared to traditional MoE architectures.
PromptAudit: Auditing Prompt Sensitivity in LLM-Based Vulnerability Detection
Steffen J. Camarato, Yahya Hmaiti, Mandana Ghadamian, David Mohaisen
Large Language Models NLP
  • Prompt sensitivity significantly affects the performance of LLMs in vulnerability detection.
  • Standard chain-of-thought prompting outperforms other strategies in operational performance.
  • Few-shot prompting benefits are model-dependent and most effective for prompt-sensitive models.
  • Adaptive chain-of-thought and self-consistency can lead to reduced recall and increased abstention.
Reparametrizing Shampoo and SOAP for Subspace Basis Updates and BFloat16 Storage
Alan Milligan, Zikun Xu, Simon Lacoste-Julien, Felix Dangel, Wu Lin
Optimization Efficient ML
  • Introduces a reparametrization of preconditioners in Shampoo-based methods to support BFloat16 storage.
  • Reduces computational overhead by updating only part of the basis through QR decomposition in a subspace.
  • Improves performance of SOAP and KL-SOAP methods, closing the performance gap with KL-Shampoo.
  • Compatible with various subspace selection strategies, enhancing flexibility in optimization.
Innovation: An Almost Characterization of Hallucination
Nishant P. Das, Piyush Srivastava
NLP Large Language Models Theory
  • Introduces 'innovation' as a simpler property related to hallucination in LLMs.
  • Establishes a relationship between innovation and hallucination, showing they are nearly equivalent.
  • Provides new lower bounds on hallucination rates based on the innovation rate.
  • Demonstrates that increasing training data does not eliminate hallucination once innovation occurs.
Variational Inference for Evidential Deep Learning
Jiawei Tang, Xinyan Du, Hui Liu, Junhui Hou, Yuheng Jia
Theory Interpretability Computer Vision
  • Introduces a principled variational framework for Evidential Deep Learning (VI-EDL).
  • Derives an Evidence Lower Bound (ELBO) to control evidence growth and enhance uncertainty quantification.
  • Establishes theoretical generalization guarantees, validating the heuristic parameter setting in conventional EDL.
  • Demonstrates state-of-the-art performance in various applications, including out-of-distribution detection and noise detection.
Self-Improvement Imitation with Biologically Guided Search for Protein Design Under Oracle Budgets
Ashima Khanna, Dominik Grimm
Optimization
  • Introduction of SILO, a framework for protein design under oracle budgets.
  • Utilization of hierarchical edit policy for structured mutation proposals.
  • Implementation of incremental stochastic beam search and UCB-based selection for candidate evaluation.
  • Demonstrated superior performance across multiple protein fitness landscapes compared to existing methods.
Geometric Flow Matching for Molecular Conformation Generation via Manifold Decomposition
Yunqing Liu, Yi Zhou, Wenqi Fan
Generative Models
  • GO-Flow introduces a manifold-aware approach to molecular conformation generation, decomposing the process into translation, rotation, and conformation subspaces.
  • The method employs tailored flow matching objectives that respect the geometry of each subspace, avoiding the need for models to relearn basic geometric constraints.
  • GO-Flow achieves state-of-the-art performance in generating molecular conformations, demonstrating high fidelity with as few as 50 sampling steps.
  • The framework encourages rotation-consistent generation and improves geometric validity by integrating physical inductive biases.
Deep ZakaiJ: Structured Filtering for Jump-Diffusion Time Series Forecasting
Yan Leng, Thibaut Mastrolia, Hao Wang
Time Series
  • Deep ZakaiJ integrates the Zakai nonlinear filtering equation into a neural framework for structured inference in jump-diffusion systems.
  • The encoder employs a three-step process for belief updating, achieving first-order accuracy in filtering evolution.
  • The decoder is designed to parameterize key dynamics conditioned on filtered beliefs, enhancing interpretability.
  • Empirical results show improved distributional quality and well-calibrated predictive intervals across various datasets.
Few-shot Cross-country Generalization of Tabular Machine Learning and Foundation Models for Childhood Anemia Prediction under Distribution Shift
Yusuf Brima, Marcellin Atemkeng, Lansana Hassim Kallon, David Niyukuri, Antoine Vacavant, Samuel Saidu, Ding-Geng Chen
Theory Interpretability
  • TabPFN outperforms traditional models in data-scarce environments for childhood anemia prediction.
  • Predictive performance is constrained by cross-population heterogeneity rather than model architecture.
  • The study highlights the importance of addressing population-level structural challenges in global health.
  • Feature importance analysis identifies child age, altitude, and height-for-age as key predictors of anemia.
Online Learning on Hidden-Convex Losses via Algorithmic Equivalence: Optimal Regret, Geometric Barrier, and Bandit Feedback
Anas Barakat, Andreas Kontogiannis, Vasilis Pollatos, Ioannis Panageas, Antonios Varvitsiotis
Optimization Theory
  • OGD achieves O(√T) regret for hidden-convex losses under exact gradient feedback, matching the optimal rate for adversarial online convex optimization.
  • The paper introduces a necessary-and-sufficient Hessian compatibility condition, expanding the class of reparameterizations that can be used.
  • A lower bound is established, demonstrating that without Hessian compatibility, OGD can incur Ω(T) regret.
  • The analysis is extended to bandit feedback, achieving a regret bound of O(T^(3/4)).
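For context on the O(√T) rate quoted above, here is a minimal sketch of projected online gradient descent (OGD) in the ordinary convex setting, not the paper's hidden-convex setting. The losses f_t(x) = (x - z_t)^2 on the interval [-1, 1], the alternating target sequence, and all function names are assumptions chosen for illustration. With step size D / (G√t), where D is the domain diameter and G bounds the gradient norm, the textbook analysis gives regret of at most (3/2)·G·D·√T.

```python
import math

def project(x, lo=-1.0, hi=1.0):
    """Euclidean projection onto the interval [lo, hi]."""
    return max(lo, min(hi, x))

def ogd_regret(targets, diameter=2.0, grad_bound=4.0):
    """Run projected OGD on f_t(x) = (x - z_t)^2 over [-1, 1].

    Returns (regret, iterates), with regret measured against the best
    fixed point in hindsight. Targets are assumed to lie in [-1, 1],
    so that |f_t'(x)| <= 4 on the domain.
    """
    x, iterates, learner_loss = 0.0, [], 0.0
    for t, z in enumerate(targets, start=1):
        iterates.append(x)
        learner_loss += (x - z) ** 2
        grad = 2.0 * (x - z)
        step = diameter / (grad_bound * math.sqrt(t))
        x = project(x - step * grad)
    # For squared losses, the best fixed point is the projected mean.
    best = project(sum(targets) / len(targets))
    best_loss = sum((best - z) ** 2 for z in targets)
    return learner_loss - best_loss, iterates

# Hypothetical adversarial sequence: targets alternate between +1 and -1.
T = 1000
targets = [1.0 if t % 2 == 0 else -1.0 for t in range(T)]
regret, iterates = ogd_regret(targets)
bound = 1.5 * 2.0 * 4.0 * math.sqrt(T)
```

Here `regret` can be compared against `bound` to see the sublinear guarantee in action; the Ω(T) lower bound in the summary concerns reparameterized losses that violate the paper's compatibility condition, which this toy example does not model.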
SL-BiLEM: Structured Learnable Behavior-in-the-Loop Epidemic Modeling for Forecasting and Policy Evaluation
Haochun Wang, Sendong Zhao, Jingbo Wang, Yanrui Du, Bing Qin, Ting Liu
Time Series
  • SL-BiLEM effectively integrates behavioral dynamics into epidemic modeling, addressing feedback loops caused by human responses to disease spread.
  • The framework shows a 76% improvement over neural-mechanistic baselines and significantly reduces out-of-distribution degradation.
  • SL-BiLEM provides 100% bootstrap confidence interval coverage across synthetic counterfactual experiments, demonstrating its robustness.
  • The model achieves Treatment Effect Accuracy exceeding 0.85, indicating strong performance in counterfactual policy evaluation.
Balancing Plasticity and Stability with Fast and Slow Successor Features
Raymond Chua, Doina Precup, Blake Richards
Reinforcement Learning Robotics Theory
  • Introduces a continual RL setup with smooth, continuous non-stationarity.
  • Demonstrates that performance degradation under non-stationarity is primarily due to instability rather than insufficient plasticity.
  • Proposes a framework integrating Successor Features with multi-timescale synaptic consolidation.
  • Utilizes cross-attention over SFs to provide insights into the distribution of stability and plasticity across temporal dimensions.
Towards the Connection between Activation Sparsity and Flat Minima
Ze Peng, Jian Zhang, Lei Qi, Yang Gao, Yinghuan Shi
Theory Efficient ML
  • Activation sparsity is linked to the flatness of loss landscapes in deep networks.
  • The authors introduce the concept of derivative sparsity, which aids in pruning during backpropagation.
  • Proposed modifications can effectively enhance activation sparsity and reduce computational costs.
  • Empirical results show at least 36% improvement in inference sparsity and 50% in training sparsity over standard Transformers.
Ratio-Variance Regularized Policy Optimization
Yu Luo, Shuo Han, Yihan Hu, Lei Lv, Huaping Liu, Fuchun Sun, Jianye Hao, Dong Li
Reinforcement Learning Large Language Models Robotics
  • R2VPO eliminates the need for binary hard clipping in policy optimization.
  • The method preserves critical gradient signals while down-weighting stale data.
  • R2VPO shows significant performance improvements across diverse tasks, especially with smaller models.
  • The approach enhances sample efficiency by effectively utilizing off-policy data.
SilIF: Silhouette-Augmented Isolation Forest for Unsupervised Transaction Fraud Detection
Venkatakrishnan Gopalakrishnan
Theory Interpretability Efficient ML
  • SilIF enhances Isolation Forest by adding a silhouette-based scoring layer.
  • Demonstrated a statistically significant improvement in fraud detection performance on the IEEE-CIS benchmark.
  • Characterizes conditions under which the silhouette augmentation is beneficial or not.
  • Provides open-source code for reproducibility of results.
From Scores to Gibbs Correctors: Accelerating Uniform-Rate Discrete Diffusion Models
Yuchen Liang, Ness Shroff, Yingbin Liang
Generative Models Theory Efficient ML
  • Introduction of the GADD algorithm, achieving O(polylog(ε^(-1))) sampling complexity.
  • No additional training required beyond standard score estimation.
  • Demonstrated practical advantages in various generative tasks.
  • General framework for analyzing predictor-corrector methods in discrete diffusion models.
Nonlinear Data Integration via Kernel Methods for Data Collaboration Analysis
Yamato Suetake, Yuta Kawakami, Shunnosuke Ikeda, Yuichi Takano
Federated Learning Optimization Theory
  • Proposes a kernel-based integration framework for Data Collaboration analysis.
  • Introduces Linear Kernel Integration (LKI) and Nonlinear Kernel Integration (NKI) to handle nonlinear dimensionality reduction.
  • Incorporates graph regularization and centering constraints to enhance representation quality.
  • Demonstrates improved classification accuracy in image classification tasks using NKI.
LUCoS: Latent Unsupervised Context Selection for Tabular Foundation Models
Oroel Ipas, Guillermo Gomez-Trenado, Rocío Romero-Zaliz, Isaac Triguero
Theory Efficient ML
  • LUCoS improves predictive performance in low-label tabular learning by selecting context instances based on latent embeddings.
  • The method outperforms random selection and traditional tabular space methods across multiple datasets and metrics.
  • Instance selection is crucial for TFMs, especially in cold-start scenarios where no labels are available.
  • LUCoS demonstrates that defining representativeness in a meaningful representation geometry is essential for effective context selection.
Latent Q-Barrier Shielding for Safe In-Context Reinforcement Learning
Minjae Kwon, Amir Moeini, Shangtong Zhang, Lu Feng
Reinforcement Learning
  • Introduction of a latent Q-Barrier shield for safe ICRL deployment.
  • The shield utilizes learned context representations and cost critics without parameter updates.
  • Proven theoretical guarantees for budget-safe continuations using Q-Barrier conditions.
  • Empirical results show improved reward-safety tradeoffs in multiple benchmarks.
Pretrained Approximators for Low-Thrust Trajectory Cost and Reachability
Zhong Zhang, Giacomo Acciarini, Dario Izzo, Hexi Baoyin, Francesco Topputo
Optimization
  • Introduction of machine learning surrogates for low-thrust trajectory design to reduce computational costs.
  • Demonstration of a scaling law where performance improves with increased dataset size and model capacity.
  • Development of a large-scale dataset using a homotopy-ray strategy for mission design.
  • Implementation of a self-similar transformation for generalization across diverse orbital scenarios.
The Kalman Evolve: Closing the Gap in Kalman Filtering via Interpretable Algorithm Discovery
Vasileios Saketos, Ming Xiao
Time Series Optimization Interpretability
  • Kalman Evolve framework optimizes both noise parameters and update structure of the Kalman Filter.
  • Introduces interpretable, non-affine modifications to the classical Kalman filter.
  • Demonstrates that affine updates are structurally suboptimal in nonlinear sensing models.
  • Achieves up to 12% reduction in RMSE compared to strong baselines.
Max-Window Scale Estimation for Near-Lossless HiF8 W8A8 Quantization-Aware Training
Yingying Cheng, Jinquan Shi, Li Zhou, Zhiyang He, Zhaoyi Sun, Fan Zhang, Jie Sun
NLP Large Language Models Efficient ML
  • Identification of two orthogonal failure modes in QAT: amax saturation and catastrophic forgetting.
  • Introduction of a max-algorithm DTS strategy for improved scale estimation.
  • Development of a two-phase training protocol to stabilize learning and preserve pretrained knowledge.
  • Comprehensive failure analysis revealing the limitations of existing scaling methods.
Localizing Memorized Regions in Diffusion Models via Coordinate-Wise Curvature Differences
Gwangho Kim, Sungyoon Lee
Generative Models
  • Introduces a geometric characterization of local memorization in diffusion models.
  • Proposes curvature-difference methods to isolate overfitting-driven memorization.
  • Derives a score-difference proxy that unifies existing memorization detection metrics.
  • Empirical results show improved localization of memorized regions in Stable Diffusion.
Trust Region Q Adjoint Matching
Yonghoon Dong, Kyungmin Lee, Changyeon Kim, Jaehyuk Kim, Jinwoo Shin
Reinforcement Learning Optimization Theory
  • TRQAM introduces a trust-region parameter to stabilize off-policy fine-tuning of pretrained flow policies.
  • The method effectively controls the KL divergence between fine-tuned and pretrained policies, preventing destructive drift.
  • Theoretical results demonstrate that the KL divergence can be explicitly modeled as a function of the trust-region parameter.
  • TRQAM outperforms existing methods in offline RL tasks, achieving a notable success rate of 68%.
HRVConformer: Neonatal Hypoxic-Ischemic Encephalopathy Classification from the Heart Rate signals
Shuwen Yu, William P Marnane, Geraldine B. Boylan, Gordon Lightbody
Time Series
  • HRVConformer processes raw heart rate signals directly, eliminating the need for handcrafted features.
  • The architecture integrates convolutional and Transformer components to enhance classification performance.
  • The model was trained on a comprehensive dataset, demonstrating robust performance metrics.
  • HRVConformer outperformed existing baseline models in HIE classification tasks.
Symbolic Regression via Latent Iterative Refinement
Xieting Chu, Sriram Vishwanath, Vijay Ganesh
Theory Interpretability Optimization
  • Introduces Latent Equation Embedding (LEE) for symbolic regression.
  • Addresses the amortization gap in neural symbolic regression through iterative inference.
  • Combines iterative refinement with gradient descent for improved robustness.
  • Achieves 2-10 times simpler expressions compared to leading baselines.
PRISM: Position-encoded Regressive Inverse Spectral Model for Multilayer Thin-Film Design
Runtian Wang, Renhao Xue, Baige Chen, Hao Wu
Optimization Generative Models Efficient ML
  • PRISM integrates discrete material selection and continuous thickness regression in a single autoregressive transformer model.
  • Introduces spectrum prefix conditioning and cumulative-depth Rotary Position Embeddings to enhance model efficiency and accuracy.
  • Achieves over 50% reduction in MAE compared to other transformer baselines while using only one-fifth of the parameters.
  • State-of-the-art performance with an MAE of 0.010 on in-distribution validation benchmarks.
Probabilistic Recurrent Intention Switching Model
Wenyuan Sheng, Hao Zhu, Joschka Boedecker
Reinforcement Learning Robotics Interpretability
  • PRISM replaces traditional memoryless models with a recurrent neural network for intention switching.
  • The EM objective decomposes into independent subproblems, allowing for efficient reward recovery.
  • PRISM demonstrates high performance on diverse tasks, recovering interpretable intentions without supervision.
  • The framework is the first to apply multi-intention IRL to a large-scale robotic manipulation dataset.
TGFormer: Towards Temporal Graph Transformer with Auto-Correlation Mechanism
Hongjiang Chen, Pengfei Jiao, Ming Du, Xuan Guo, Zhidong Zhao, Di Jin, Xiao Liu
Graph Learning Time Series
  • TGFormer redefines temporal graph learning by treating it as a time-series analysis problem.
  • The Series Transformer layer effectively captures long-term dependencies using a Transformer-based architecture.
  • The auto-correlation mechanism enhances the model's ability to capture periodic patterns with reduced computational complexity.
  • TGFormer consistently outperforms state-of-the-art methods across multiple real-world datasets.
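As a rough illustration of why auto-correlation helps with periodic patterns, the sketch below scores candidate lags of a toy signal and picks the strongest one. It is a generic time-series example and makes no claim about TGFormer's actual layer; the function names and the toy signal are hypothetical.

```python
def autocorrelation(series, lag):
    """Unnormalized autocorrelation of a series at a positive lag."""
    return sum(a * b for a, b in zip(series, series[lag:]))

def dominant_lag(series, max_lag):
    """Return the lag in 1..max_lag with the largest autocorrelation."""
    return max(range(1, max_lag + 1), key=lambda k: autocorrelation(series, k))

# Hypothetical toy signal that repeats every 4 steps.
signal = [0.0, 1.0, 0.0, -1.0] * 8
period = dominant_lag(signal, max_lag=8)
```

On this signal the strongest lag is 4, matching the repetition length. This direct version costs O(N) per lag; FFT-based variants are a common way to score all lags at once more cheaply.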
Separate Aggregation of Split Network for Personalized Federated Learning
Yunseok Kang, Jaeyoung Song
Federated Learning
  • PGFedSplit improves personalization and global generalization in federated learning.
  • The framework utilizes a split architecture with adaptive aggregation scheduling.
  • Clients benefit from a mix of local and synthetic representations to enhance robustness.
  • Extensive experiments show consistent performance improvements over state-of-the-art PFL methods.
ARBITER: Reasoning Trajectory Basins and Majority Vote Failures in Test-Time Sampling
Meng Cai, Lars Kulik, Farhana Choudhury
NLP Large Language Models
  • Identifies wrong-majority failure as a critical issue in consensus decoding of language models.
  • Introduces ARBITER, a framework that accumulates same-model evidence for challenger basins while treating consensus as a prior.
  • Demonstrates that many direct correction strategies degrade performance, while sparse additive evidence can improve accuracy.
  • Achieves notable improvements in accuracy across various models and benchmarks, indicating recoverable headroom in sampled outputs.
ChainzRule: Sample-Efficient, Robust Deep Learning Across Tabular, NLP, and Vision Tasks
Rowan Martinsh
Efficient ML Interpretability NLP
  • ChainzRule architecture utilizes learnable polynomial layers to enhance model flexibility and efficiency.
  • Differential Regularization (DREG) imposes a sensitivity budget, improving robustness and interpretability.
  • CR achieves competitive results across diverse domains with significantly less training data compared to traditional models.
  • The model maintains a consistent gradient tail ratio, indicating reliability and stability during inference.
Context-Instrumental Data Distillation for Kubernetes Manifest Generation: Method and Experimental Evaluation
Andrey Kozachok, Anatoliy Bakaev, Aleksandr Kozachok, Shamil Magomedov, Artem Noev
NLP Large Language Models Efficient ML
  • Introduction of context-instrumental data distillation for SLMs in generating Kubernetes manifests.
  • Focus on the importance of data quality and validation over quantity in training datasets.
  • Achieved a high accuracy rate of 91.5% in generating valid Kubernetes configurations.
  • Demonstrated that strict output format requirements significantly impact result quality.
CurveRL: Principled Distribution-Aware Context Reweighting for LLM Reasoning
Ke Sun, Yizhou Zhao, Jiayi Xin, Qi Long, Weijie Su
NLP Large Language Models Reinforcement Learning
  • CurveRL formulates prompt reweighting as context distribution control, enhancing understanding of optimal weights.
  • The approach uses a distribution-aware utility function in pass-rate quantile space to derive optimal weights.
  • Extensive experiments show CurveRL's superior performance in improving pass@1 and pass@k metrics.
  • The study emphasizes the significance of context-distribution control in RLVR algorithm design.
Explainable Comparison of Feature-Based and Deep Learning Models for TROPOMI Methane Plume Screening
Solomiia Kurchaba, Joannes D. Maasakkers, Berend J. Schuit, Ilse Aben
Computer Vision Interpretability
  • Comparison of feature-based and image-based machine learning models for methane plume classification.
  • Identification of retrieval artifacts in TROPOMI data that resemble methane emissions.
  • Use of SHAP for explainability to interpret model decisions.
  • Evaluation of models under both balanced and imbalanced conditions.
Probabilistic Smoothing with Ratio-Monotone Transforms for Global Optimization
Kukyoung Jang, Taehyun Cho, Junrui Zhang, Ping Xu, Kyungjae Lee
Optimization Theory Efficient ML
  • ProMoT generalizes existing smoothing methods by allowing a broader class of symmetric unimodal kernels and introducing ratio-monotone transformations.
  • The framework preserves the global maximizer and ensures convergence of stationary points near the true optimum without a decreasing smoothing schedule.
  • A leave-one-out variance reduction technique is introduced, improving the iteration complexity of gradient estimation.
  • ProMoT demonstrates improved robustness to hyperparameter tuning compared to traditional Gaussian smoothing methods.
Curriculum Learning for Safety Alignment
Sandeep Kumar, Virginia Smith, Chhavi Yadav
NLP Large Language Models Optimization
  • Introduction of Staged-Competence framework for safety alignment in LLMs.
  • Demonstrated 16% reduction in OOD harmful response rates and 20% reduction in jailbreak attack success rates.
  • Improved data efficiency, achieving baseline safety performance with only 75% of the training data.
  • Identified inconsistencies in popular safety datasets and provided a cleaned dataset for training.
ReMoE: Boosting Expert Reuse through Router Fine-Tuning in Memory-Constrained MoE LLM Inference
Xiongwei Zhu, Xiaojian Liao, Tianyang Jiang, Yusen Zhang, Liang Wang, Limin Xiao
Large Language Models Efficient ML
  • ReMoE enhances expert reuse in Mixture-of-Experts models through router fine-tuning.
  • The method promotes temporal stability in routing to align with cache locality constraints.
  • Experiments show a 26% increase in expert reuse without sacrificing task performance.
  • Real-system evaluations indicate an 8.4% improvement in throughput and significant reductions in processing time.
SemanticZip: A Pilot Framework for Lossy Text Compression with LLMs as Semantic Decompressors
Natalia Trukhina, Vadim Vashkelis
NLP Large Language Models
  • SemanticZip defines a new approach to lossy text compression using LLMs for semantic decompression.
  • The framework distinguishes between protected commitment-preserving channels and lossy channels for safe context compression.
  • A small experimental harness is presented to compare various text representation formats.
  • Results show a coherent gradient of compression and recoverability, with structured representations performing best.
WINDQuant: Weight-Informed Neural Decision-Making for Global Mixed-Precision LLM Quantization
Phong Nam Huu Nguyen, Khoi M. Le, Cong-Duy T Nguyen, Anh Tuan Luu, Thong Thanh Nguyen, Tho Quan
Large Language Models Reinforcement Learning Efficient ML
  • WINDQuant reformulates mixed-precision quantization as a sequential decision-making problem, allowing for adaptive bit-width allocation.
  • The framework operates at a fine-grained column-chunk level, improving precision assignment flexibility.
  • WINDQuant achieves competitive performance in ultra-low-bit settings on LLaMA models without requiring full retraining.
  • The approach integrates reinforcement learning with activation-aware mechanisms for effective quantization.
Function-Valued Causal Influence in Nonlinear Time Series
Valentina V. Kuskova, Dmitry Zaytsev, Michael Coppedge
Time Series
  • Scalar edge scores in nonlinear causal discovery obscure the true nature of causal relationships.
  • Function-valued causal influence provides a more nuanced understanding of causal effects that vary across states.
  • The proposed framework allows for direct estimation of causal response functions from trained models.
  • Synthetic experiments reveal that similar scalar scores can correspond to qualitatively different causal mechanisms.
RealBench: Benchmarking Data-Driven Numerical Weather Forecasting Under Operational Conditions and Extreme Event Challenges
Ruize Li, Zhibin Wen, Tao Han, Hao Chen, Fenghua Ling, Wei Zhang, Song Guo, Lei Bai
Time Series
  • RealBench provides a more accurate evaluation framework for AI weather forecasting models under operational conditions.
  • The benchmark eliminates data leakage by using a strictly out-of-distribution test set drawn from 2025.
  • It integrates real-time operational analysis and extensive in-situ observations for direct performance evaluation.
  • Event-specific metrics are introduced to better assess the forecasting of high-impact extreme weather events.
DriftingMol: Decoder-Coupled Drift for One-Pass Property-Conditional Molecular Generation
Jiangjie Qiu, Yijun Li, Wentao Li, Xiaonan Wang
Generative Models
  • Introduction of a SELFIES latent drifting pipeline for efficient molecular generation.
  • Development of decoder-coupled drift, which utilizes a frozen VAE decoder for gradient preservation.
  • Demonstration of superior performance in property control and uniqueness compared to other drift variants.
  • Validation of the method through extensive ablation studies and protocol-matched conditioning.
TSFMAudit: Data Contamination Auditing in Forecasting Time Series Foundation Models
Hongkai Li, Shifeng Xie, Lefei Shen, Zhuo Li, Mouxiang Chen, Xiaobin Zhang, Han Fu, Jianling Sun, Xiaoxue Ren, Chenghao Liu
Time Series
  • First study on pretraining contamination auditing specifically for Time Series Foundation Models (TSFMs).
  • Introduces TSFMAudit, a framework leveraging probe adaptation dynamics to infer contamination risk.
  • Demonstrates that contaminated datasets show faster loss reduction and less parameter movement during fine-tuning.
  • Evaluated on 6 TSFMs and 187 datasets, outperforming 10 existing contamination auditing methods.
Agile Online Model Selection: Resolving Adaptation Lag via Safeguarded Large Learning Rates
Kei Takemura, Ryuta Matsuno, Keita Sakuma
Theory Optimization
  • Introduces a novel OOMD algorithm that utilizes safeguarded large learning rates to enhance adaptation speed.
  • Employs a dynamic penalty mechanism to manage the risks associated with large learning rates, ensuring theoretical robustness.
  • Demonstrates a significant reduction in adaptation lag, outperforming existing tuning-free algorithms on various datasets.
  • Achieves near-optimal worst-case guarantees while allowing for aggressive adaptation to distribution shifts.
Label-NTK Alignments and A Tighter Convergence Bound in the NTK Regime
Ruchirinkil Marreddy, Chaoyue Liu
Theory Optimization
  • Introduces Label-NTK and Residual-NTK alignments to improve convergence bounds in the NTK regime.
  • Demonstrates that existing convergence guarantees are overly pessimistic due to reliance on the smallest NTK eigenvalue.
  • Establishes a refined convergence bound that closely matches empirical training dynamics.
  • Provides theoretical justification for the observed alignments under mild assumptions.
SIKA-GP: Accelerating Gaussian Process Inference with Sparse Inducing Kernel Approximations for Bayesian Deep Learning
Wenyuan Zhao, Rui Tuo, Chao Tian
Efficient ML Theory Computer Vision
  • SIKA-GP reduces the computational complexity of GP inference to O(log M) using sparse inducing kernel approximations.
  • The method integrates seamlessly with Bayesian neural networks, enhancing scalability for high-dimensional feature representations.
  • Empirical results show significant speedups in training and inference while maintaining predictive accuracy.
  • The approach is applicable to deep feature learning, addressing challenges posed by large-scale datasets.
On the Role of Inductive Bias in Time-Series Pretraining: A Case Study in Learning Generalizable Representations for Clinical Time Series
Sharmita Dey, Diego Paez-Granados
Time Series
  • Introduces PATHOFM, a transformer model for clinical time-series pretraining.
  • Identifies and formalizes three key inductive biases: Local Completion, Temporal Continuity, and Unsupervised In-Context Dynamics.
  • Demonstrates that dynamics-centric mixtures of objectives provide balanced transfer across classification and regression tasks.
  • Highlights the importance of preserving waveform structure while ensuring generalizability across subjects.
Time Series Causal Discovery via Context-Conditioned and Causality-Augmented Pretraining
Biao Ouyang, Tengxue Zhang, Zhihao Zhuang, Yang Shu, Chenjuan Guo, Bin Yang
Time Series
  • PTCD introduces a pretraining paradigm for time-series causal discovery, enhancing adaptability to new datasets.
  • The framework captures both intra-window and inter-window dependencies through a dual-scale iterative attention mechanism.
  • Causal augmentation strategies, including intervention-based tasks and causal mixup, improve generalization and robustness.
  • Extensive experiments show PTCD's superior performance in causal discovery and root cause identification compared to existing methods.
Aligning Few-Step Generative Models by Amortizing Sample-based Variational Inference
Jaewoo Lee, Hyeongyu Kang, Dohyun Kim, Kyuil Sim, Woocheol Shin, Minsu Kim, Taeyoung Yun, Jeongjae Lee, Sanghyeok Choi, Tabitha Edith Lee, Jongchul Ye, Jinkyoo Park
Generative Models Reinforcement Learning Robotics
  • FAV provides a general alignment framework for few-step generative models without restrictive assumptions.
  • The method leverages sample-based variational inference to decouple alignment from specific model families.
  • FAV achieves state-of-the-art performance in both robotic manipulation and image generation tasks.
  • The framework can fine-tune diverse generative models, including GANs and VAEs, across various scales.
GEM: Geometric Entropy Mixing for Optimal LLM Data Curation
Yue Min, Ziyun Qiao, Ruining Chen, Yujun Li
Large Language Models Optimization Interpretability
  • GEM reformulates data curation as a variational problem on a hypersphere, addressing limitations of traditional methods.
  • The framework includes a mixing-balance regularizer to prevent cluster collapse under embedding anisotropy.
  • An MM-based inference algorithm provides provably stable convergence for the regularized objective.
  • GEM achieves linear-time complexity for web-scale deployment through teacher-student distillation.
A PAC-Bayesian View of Generalisation for Physics-Informed Machine Learning
Thien V. Nguyen, Amaury Habrard, Benjamin Guedj
Theory
  • Introduces a PAC-Bayesian framework for physics-informed machine learning (PIML) that provides generalisation guarantees in regression settings with unbounded losses.
  • Develops a multi-task approach that jointly treats data fidelity and physical constraints, leading to tighter generalisation bounds.
  • Establishes a direct connection between physical regularity and statistical performance through input-gradient dependent complexity terms.
  • Proposes a self-bounding-aware learning algorithm that optimises derived bounds and estimates constants in practical settings.
CAFD: Concept-Aware DNN Fault Detection using VLMs
Amin Abbasishahkoo, Mahboubeh Dadkhah, Lionel Briand
Computer Vision NLP Efficient ML
  • Introduction of Concept-Aware Fault Detection (CAFD) for DNNs.
  • Novel Concept Failure Ratio (CFR) metric derived from Vision-Language Models (VLMs).
  • CAFD integrates model-based, distance-based, and concept-based features for improved fault detection.
  • Empirical evaluations show CAFD outperforms existing methods, especially under budget constraints.
Explainable Retinal Imaging for Prediction of Multi-Organ Dysfunction in Type 2 Diabetes
Mini Han Wang, Liting Huang, Wei Hong, Boonthawan Wingwon
Interpretability
  • Developed a framework for predicting multi-organ dysfunction in type 2 diabetes mellitus (T2DM) using routine clinical biomarkers.
  • Gradient boosting model outperformed traditional logistic regression in predicting multi-system dysregulation.
  • SHAP analysis provided insights into the contributions of various biomarkers to multi-system risk.
  • The study highlights the importance of capturing complex interactions among biomarkers for better clinical outcomes.
Learning Latent Dynamical Causal Processes for Single-Cell Perturbation Prediction
Wenkang Jiang, Yuhang Liu, Erdun Gao, Ehsan Abbasnejad, Lina Yao, Javen Qinfeng Shi
Generative Models Theory Time Series
  • Proposes a unified framework that captures both latent causal mechanisms and temporal dynamics in single-cell perturbation responses.
  • Introduces a theoretical analysis ensuring the identifiability of latent causal variables in the proposed model.
  • Develops CITE-VAE, a framework that learns causally meaningful latent dynamics for principled generalization.
  • Empirical results validate the effectiveness of the proposed method, outperforming existing approaches on benchmark datasets.