AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

48 Papers today
8h Update frequency
7 Days of history
Forecasting what Matters: Decision-Focused RL for Controlled EV Charging with Unknown Departure Times
Giuseppe Gabriele, Fabio Pavirani, Seyed Soroush Karimi Madahi, Chris Develder
Reinforcement Learning Optimization Time Series
  • Introduces a decision-focused reinforcement learning framework for EV charging.
  • Addresses the challenge of unknown departure times in EV charging optimization.
  • Implements end-to-end training of the forecaster and RL agent to improve decision quality.
  • Demonstrates up to 14% improvement in total reward and 55% reduction in unsupplied energy.
From Reasoning Traces to Reusable Modules: Understanding Compositional Generalization in Language Model Reasoning
Lingjing Kong, Xin Liu, Guangyi Chen, Martin Q. Ma, Xiangchen Song, Yuekai Sun, Mikhail Yurochkin, Taylor W. Killian, Ruslan Salakhutdinov, Kun Zhang, Eric P. Xing, Zhengzhong Liu
NLP Large Language Models Reinforcement Learning
  • Introduces a hierarchical latent selection model for reasoning in LLMs.
  • Demonstrates the complementary roles of supervised fine-tuning and reinforcement learning.
  • Shows that RL can extract reusable atomic modules from compound reasoning traces.
  • Finds that training on compound traces enhances generalization capabilities.
TRIDENT: Breaking the Hybrid-Safety-Physics Coupling for Provably Safe Multi-Agent Reinforcement Learning
Zijie Meng, Ziwei Li, Yufei Liu, Zhiyu Li, Jiyuan Liu, Wenhua Nie, Bingcai Wei, Miao Zhang
Reinforcement Learning Robotics Theory
  • Introduces TRIDENT, the first MARL framework co-designing hybrid-action, safety, and physics modules.
  • Establishes a coupling lemma that formalizes the interdependencies of hybrid actions, safety, and physics in MARL.
  • Reduces training-time violations by 95.5% relative to MADDPG and by 76.3% relative to MACPO.
  • Demonstrates a 13.5% improvement in reward over the strongest unconstrained baseline.
Online Distributional Prediction via Latent Cluster Geometry Under Drift and Corruption
Navyansh Mahla, Prateek Chanda, Ganesh Ramakrishnan
Theory Optimization Time Series
  • Introduces a latent clustering-configuration view for online distributional prediction.
  • Derives high-probability regret bounds that account for drift and corruption effects.
  • Demonstrates that temporal localization of memory can mitigate stale-geometry issues.
  • Achieves sublinear cumulative Wasserstein regret without requiring a parametric model.
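The cumulative Wasserstein regret in the online distributional prediction summary above adds up a per-round Wasserstein distance between predicted and realized distributions. The sketch below is a toy illustration, not the paper's method. It assumes each round is an equal-size 1-D sample (for which the 1-Wasserstein distance W1 is the mean absolute gap between sorted values), it omits the comparator term that turns cumulative loss into regret, and it uses two hypothetical predictors to show how stale memory hurts under drift.

```python
def wasserstein1_equal_size(xs, ys):
    """W1 between two equal-size 1-D empirical distributions (uniform weights)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    # In 1-D with equal weights, the optimal transport plan pairs values in sorted order.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def cumulative_w1_loss(batches, predict):
    """Running total of W1(prediction, revealed batch) from round 1 onward."""
    totals, running = [], 0.0
    for t in range(1, len(batches)):
        forecast = predict(batches[:t])  # the predictor only sees past rounds
        running += wasserstein1_equal_size(forecast, batches[t])
        totals.append(running)
    return totals


def stale_predictor(history):
    """Hypothetical: always reuse the first round, so its geometry goes stale under drift."""
    return history[0]


def localized_predictor(history):
    """Hypothetical: keep only the most recent round (temporally localized memory)."""
    return history[-1]


# A stream whose distribution drifts upward by 1.0 per round.
drifting = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
print(cumulative_w1_loss(drifting, stale_predictor))      # [1.0, 3.0, 6.0]
print(cumulative_w1_loss(drifting, localized_predictor))  # [1.0, 2.0, 3.0]
```

Under steady drift the stale predictor's cumulative loss grows quadratically with the number of rounds while the localized one grows linearly, which is the intuition behind limiting how far back the memory reaches.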
P-K-GCN: Physics-augmented Koopman-enhanced Graph Convolutional Network for Deep Spatiotemporal Super-resolution
Xizhuo (Cici) Zhang, Zekai Wang, Fei Liu, Bing Yao
Graph Learning Time Series Theory
  • Introduction of P-K-GCN for spatiotemporal super-resolution.
  • Incorporation of physics-based constraints to enhance model fidelity.
  • Use of Koopman operator theory to linearize nonlinear dynamics.
  • Theoretical error-reduction guarantees derived via Rademacher complexity.
Correcting Sensor-Induced Distribution Drift with Wasserstein Adversarial Learning
Saraa Ali, Vladimir Bocharnikov, Fedor Ratnikov, Mikhail Hushchyn, Artem Ryzhikov, Denis Derkach
Generative Models Interpretability
  • Introduces a Wasserstein GAN-inspired method for unsupervised calibration of sensor systems.
  • Demonstrates the ability to recover interpretable degradation parameters from distribution shifts.
  • Validates the approach on both a toy model and real-world high-energy physics data.
  • Shows improved calibration accuracy and correlation with ground-truth aging coefficients.
QueryMarket: Cost-Aware Online Active Learning in Data Markets
Xiwen Huang, Pierre Pinson
Efficient ML Optimization Time Series
  • Introduces QueryMarket, a novel framework for online active learning that incorporates cost and budget constraints.
  • Develops OVBAL, an active learning strategy that estimates label utility and adapts to nonstationary environments.
  • Demonstrates the effectiveness of OVBAL in both synthetic and real-world scenarios, particularly in managing costs.
  • Addresses the limitations of existing online active learning methods by integrating economic considerations.
Explaining Attention with Program Synthesis
Amiri Hayes, Belinda Li, Jacob Andreas
NLP Large Language Models Interpretability
  • Introduces program synthesis as a method for interpreting attention mechanisms in transformer models.
  • Demonstrates that a substantial fraction of attention heads can be approximated by executable programs.
  • Shows that replacing attention heads with synthesized programs incurs minimal performance loss.
  • Highlights the potential for causal validation and model editing using symbolic representations.
The Illusion of Improvement: Reject Inference Strategies in Credit Scoring
Bruno Scarone, Ricardo Baeza-Yates
Theory Interpretability
  • Identification of a structural failure mode in credit scoring models where accuracy improvement masks recall deterioration.
  • Proposal of a controlled exploration strategy that mitigates survival bias without relying on statistical assumptions.
  • Demonstration that standard evaluation metrics can mislead practitioners about model performance.
  • Exploration rates as low as 2-5% suffice to assess the severity of the feedback loop at low cost.
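The controlled exploration described in the reject-inference summary above can be pictured as approving a small random share of applicants the model would otherwise reject, so that their repayment outcomes become observable. This is a minimal sketch under stated assumptions: a single score threshold, a hypothetical `explore_rate` parameter set in the 2-5% range the summary mentions, and hypothetical function names. The paper's actual protocol may differ.

```python
import random


def decide(scores, threshold, explore_rate, seed=0):
    """Approve scores >= threshold; approve a random `explore_rate` share of the rest.

    Returns two parallel lists: approval flags and exploration flags.
    """
    rng = random.Random(seed)
    approved, explored = [], []
    for score in scores:
        if score >= threshold:
            approved.append(True)
            explored.append(False)
        elif rng.random() < explore_rate:
            # Exploratory approval: its outcome shows how would-be rejects actually behave.
            approved.append(True)
            explored.append(True)
        else:
            approved.append(False)
            explored.append(False)
    return approved, explored


def explored_default_rate(explored, defaulted):
    """Default rate among exploratory approvals, or None if nobody was explored.

    Because exploration is randomized, this estimates the default rate of the
    below-threshold population without extra modeling assumptions.
    """
    outcomes = [d for e, d in zip(explored, defaulted) if e]
    return sum(outcomes) / len(outcomes) if outcomes else None
```

Comparing this estimate with the default rate the model assumes for rejected applicants is one way a small exploration budget can indicate how strongly the approved-only feedback loop has biased the model.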
Functional Equivalence in Attention: A Comprehensive Study with Applications to Linear Mode Connectivity
Viet-Hoang Tran, Vinh Khanh Bui, Van-Hoan Trinh, Tan Lai Ngoc, Tan M. Nguyen
Theory
  • Functional equivalence in attention mechanisms is more complex than in traditional architectures.
  • Sinusoidal positional encodings preserve the symmetry of vanilla attention, while RoPE enhances expressivity by reducing symmetry.
  • Positional encodings significantly affect linear mode connectivity in Transformers.
  • An alignment algorithm is used to show that connectivity depends on the choice of positional encoding.
ResAware: Cross-Environment Website Fingerprinting via Resource-Privileged Distillation
Chongru Fan, Wei Wang, Wentao Huang, Zhenquan Ding, Jinqiao Shi, Lei Cui, Zhiyu Hao, Xiaochun Yun
Theory
  • ResAware improves the robustness of Website Fingerprinting models in real-world environments.
  • The framework utilizes a training-rich/inference-poor approach to leverage resource-level features.
  • Significant performance improvements were observed under temporal, spatial, and browser variations.
  • The method enhances existing WF models without requiring stronger attacker capabilities at inference time.
Counterfactual Optimization of Baseball Pitch Sequences and Estimation of Its Impact on Season-Level Statistics
Ryota Takamido, Hiroki Nakamoto
Optimization
  • Optimization of both final and setup pitches can significantly influence season-level performance metrics.
  • A Transformer-based model was developed to predict pitch outcomes based on contextual information.
  • Counterfactual analyses revealed that altering pitch sequences can lead to substantial improvements in K/9 statistics.
  • The analysis yields insights into effective pitch locations and the role of pitch command that can inform strategic decision-making.
Complementary Attention Head Pruning for Efficient Transformers
Yaniv Livertovsky, Shahar Somin, Gonen Singer
NLP Efficient ML Graph Learning
  • CAHP introduces a global graph-based approach to attention head selection for Transformers.
  • The framework eliminates the need for predefined pruning ratios by automatically determining optimal head retention based on performance metrics.
  • CAHP outperforms existing methods, particularly in high-compression settings, while maintaining model accuracy.
  • The method avoids the 'proximity bias' seen in gradient-based pruning, ensuring diverse functional head retention across layers.
Discrete Autoregressive Transformer for Generative Mechanism Synthesis
Anar Nurizada, Anurag Purwar
Generative Models Robotics Optimization
  • Introduces a generative approach to mechanism synthesis using a discrete autoregressive transformer.
  • Addresses the limitations of traditional optimization methods by generating multiple mechanisms for a given coupler curve.
  • Utilizes a large dataset of over one million mechanisms to train the model effectively.
  • Achieves performance competitive with existing methods, supporting the efficacy of the generative approach.
Dual-Channel Grounded World Modeling (DCGWM): Structural Prevention of Objective Interference Collapse via Heterogeneous External Grounding with Inward-Only Gradient Flow
Akshay Hazare
Theory Robotics Multimodal
  • Formalizes Objective Interference Collapse (OIC) as a failure mode in joint latent world modeling.
  • Proposes DCGWM architecture with partitioned latent space to prevent OIC.
  • Introduces Asymmetric Grounding Adherence Loss for managing rollout drift.
  • Establishes theoretical results supporting the architecture's structural properties.
Compute Efficiency and Serial Runtime Tradeoffs for Stochastic Momentum Methods
Depen Morwani, Alexandru Meterez, Pranav Nair, Sham Kakade
Optimization Theory Efficient ML
  • Stochastic momentum methods like HB and ASGD show distinct tradeoffs between compute efficiency and serial runtime.
  • HB maintains SGD-level compute efficiency over a larger batch-size window, allowing for reduced serial runtime.
  • ASGD outperforms HB in small-batch scenarios but trades off compute efficiency for improved serial runtime at larger batch sizes.
  • The paper provides a theoretical framework for understanding the performance of these methods under varying spectral conditions.
Quantum Annealing Enhanced Reinforcement Learning for Accurate Remaining Useful Lifetime Prediction
Manoranjan Gandhudi, Arunkumar V., G. R. Anil, Gangadharan G. R
Reinforcement Learning Optimization Time Series
  • Introduction of Quantum Annealing Enhanced Q-learning (QAQL) for RUL prediction.
  • Q-value updates are reformulated as QUBO problems solved on a quantum processor.
  • Stochastic action selection using the quantum annealer's samples enhances exploration in reinforcement learning.
  • QAQL outperforms 14 classical and quantum baselines across multiple error metrics.
Diagnosing and Repairing Shape-Prior Shortcuts in Long-Range Single-Shot Fringe Projection Profilometry
Adam Haroon, Anush Lakshman, Cody Fleming, Beiwen Li
Computer Vision Robotics Interpretability
  • Identifies limitations of existing single-shot FPP methods in long-range settings.
  • Introduces a novel architecture, PhiCalNet, that improves depth reconstruction accuracy.
  • Demonstrates the effectiveness of mechanistic interpretability and uncertainty quantification in diagnosing and repairing model errors.
  • Achieves a significant reduction in mean absolute error for 3D reconstruction tasks.
Uncertainty Quantification of Engineering Structures by Polynomial Chaos Expansion and Multivariate Active Learning
Qitian Lu, Jafar Jafari-Asl, Panagiotis Spyridis, Lukas Novak
Theory Optimization Efficient ML
  • Introduction of a normalized variance aggregation for evaluating multi-output PCE models.
  • Development of a sequential adaptive sampling method that balances variance contribution and spatial exploration.
  • Demonstration of improved accuracy and stability in surrogate modeling for engineering structures.
  • Comparison with traditional sampling methods, highlighting the advantages of the proposed approach.
Enhanced Graph Neural Networks using K-Hop Gaussian Diffusion
Xuling Zhang, Peng Wang, Daiyan Li, Aoran Huang, Zeiwei Chen, Yongkui Yang
Graph Learning
  • Introduction of the K-Hop Gaussian (KHG) diffusion kernel to enhance GNNs.
  • KHG allows for multi-hop diffusion with Gaussian weighting, balancing local and global information.
  • Demonstrated superiority of KHG over traditional GNNs and existing diffusion kernels in noisy and complex graphs.
  • KHG serves as a modular, plug-and-play component for existing GNN architectures.
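One plausible reading of the "multi-hop diffusion with Gaussian weighting" in the K-Hop Gaussian summary above is a filter that sums powers of a normalized adjacency matrix, weighting hop k by a Gaussian in k. The NumPy sketch below assumes symmetric normalization with self-loops and a Gaussian centered at hop `mu` with width `sigma`. Both choices, and the function names, are illustrative assumptions rather than the paper's exact kernel.

```python
import numpy as np


def gaussian_hop_weights(K, sigma, mu=0.0):
    """Normalized weights w_k proportional to exp(-(k - mu)^2 / (2 sigma^2)), k = 0..K."""
    k = np.arange(K + 1)
    w = np.exp(-((k - mu) ** 2) / (2.0 * sigma ** 2))
    return w / w.sum()


def khg_diffuse(A, X, K=3, sigma=1.0, mu=0.0):
    """Return sum_k w_k P^k X with P = D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])            # self-loops: every node has degree >= 1
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    P = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    H = X.astype(float)                       # H holds P^k X for the current hop k
    out = np.zeros_like(H)
    for w in gaussian_hop_weights(K, sigma, mu):
        out += w * H
        H = P @ H
    return out


# Path graph 0-1-2 with one feature per node.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
X = np.array([[1.0], [0.0], [0.0]])
print(khg_diffuse(A, X, K=2, sigma=1.0))
```

A small `sigma` concentrates weight on the first few hops (local information), while a large `sigma` spreads it almost uniformly over all K hops (more global mixing); because the output keeps the input's shape, such a filter can be dropped in front of an existing GNN layer.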
Amortized Probabilistic Retrieval of Atmospheric CO2 from OCO-2 Spectra Using Deep Learning with Laplace Approximations and Normalizing Flows
Alejandro Calle-Saldarriaga, Felix Jimenez, Jack Grosskreuz, Jiazheng Wang, Jonathan Hobbs, Matthias Katzfuss
Theory Efficient ML Interpretability
  • Introduces a deep learning framework for CO2 retrieval that significantly reduces computational time.
  • Utilizes high-fidelity simulation data to account for model errors and improve accuracy.
  • Implements Laplace approximations and normalizing flows for enhanced uncertainty quantification.
  • Demonstrates superior predictive accuracy compared to existing operational methods.
Learning to Refine Hidden States for Reliable LLM Reasoning
Chia-Hsuan Hsu, Jui-Ming Yao
Large Language Models Reinforcement Learning NLP
  • ReLAR introduces an iterative hidden-state refinement framework for LLMs.
  • The framework allows direct control over internal reasoning trajectories before generating outputs.
  • Reinforcement-learning-based controllers dynamically adjust the refinement process based on task complexity.
  • Experiments show improved coherence and reliability in reasoning tasks with lower inference costs.
Fair Cognitive Impairment Detection Through Unlearning
William Nguyen, Jiali Cheng, Hadi Amiri
Multimodal Audio & Speech
  • Introduction of FMD, a fair MCI detection framework that combines multiple modalities.
  • Utilization of cross-attention fusion for better interaction between speech, text, and image data.
  • Implementation of an unlearning mechanism to mitigate demographic biases in model predictions.
  • Demonstrated improved performance on multilingual benchmarks while reducing subgroup disparities.
PowerOPD: Stabilizing On-Policy Distillation with Bounded Power Transformation
Anhao Zhao, Junlong Tong, Yingqi Fan, Ping Nie, Wenjie Li, Xiaoyu Shen
NLP Large Language Models Efficient ML
  • PowerOPD addresses severe training pathologies in standard on-policy distillation.
  • The method employs a Box-Cox power transformation to create bounded rewards.
  • PowerOPD achieves significant accuracy gains and sample efficiency improvements.
  • The approach reduces wall-clock time and peak GPU memory usage compared to traditional methods.
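The Box-Cox power transformation named in the PowerOPD summary above is (x^λ - 1)/λ for λ ≠ 0 and log x for λ = 0. The summary does not say which quantity PowerOPD transforms, so the sketch below assumes, for illustration only, a probability-like input x in (0, 1]. With λ > 0 the output then lies in [-1/λ, 0], whereas a plain log-probability reward is unbounded below as x approaches 0.

```python
import math


def box_cox(x, lam):
    """Box-Cox power transform; lam = 0 recovers the natural log."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


# For a probability-like input in (0, 1] and lam > 0 the transform is bounded:
# it is 0 at x = 1 and approaches -1/lam as x -> 0, unlike log(x).
for p in (1.0, 0.5, 1e-6, 1e-30):
    print(p, round(box_cox(p, 0.5), 4), round(math.log(p), 2))
```

Smaller λ moves the transform closer to the log (and its unbounded penalties), while larger λ tightens the bound, so λ acts as a knob between faithfulness to a log-ratio reward and training stability.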
ASTEROID: A Spatiotemporal Information Transformer for Forecasting Multi-Step Time Series of Molecular Dynamics
Kexin Wu, Luonan Chen, Renxiao Wang
Time Series
  • ASTEROID predicts multi-step atomic coordinates directly, bypassing traditional iterative integration methods.
  • The framework integrates Spatiotemporal Information Transformation into a Transformer architecture to model complex dependencies.
  • ASTEROID demonstrates superior accuracy and reduced computational costs compared to existing forecasting methods.
  • The model supports iterative multi-step forecasting over extended time scales, enhancing its practical applicability.
Geometric and Stochastic Analysis of Discontinuities in Sparse Mixture-of-Experts
Tho Tran Huu, Huu-Tuan Nguyen, Thien-Hai Nguyen, Nhat-Tri Ho, Viet-Hoang Tran, Tho Quan, Tan Minh Nguyen
Theory Efficient ML NLP
  • Discontinuities in SMoE architectures are classified by order, with lower-order discontinuities being more prevalent.
  • The authors establish that random perturbations in input space will almost surely encounter discontinuities, particularly order-1 ones.
  • A novel smoothing mechanism is proposed to mitigate the effects of discontinuities, enhancing model performance.
  • The analysis provides theoretical insights into the structure and behavior of discontinuities in SMoEs.
A Hybrid LSTM–Vision Transformer Architecture for Predicting HRRR Forecast Errors
David Aaron Evans, Jay C. Rothenberger, Kara J. Sulia, Nick P. Bassill, Chris D. Thorncroft
Time Series Multimodal
  • The hybrid LSTM-ViT framework improves forecast-error prediction skill compared to baseline LSTM models.
  • Incorporating vertically resolved atmospheric profiles enhances the model's ability to capture complex PBL processes.
  • The largest improvements in predictive skill are observed for precipitation forecast errors, achieving a twofold increase over the baseline.
  • The model is particularly effective during periods of enhanced PBL activity and complex atmospheric evolution.
KANLib – A Modular, Extensible and Fast Kolmogorov-Arnold Network Implementation
Julian Hoever, Gregor Schiele
Theory Efficient ML Interpretability
  • KANLib is a modular and extensible framework for Kolmogorov-Arnold Networks.
  • It integrates features from existing KAN implementations to enhance usability and performance.
  • The framework supports various configurations and maintains compatibility with PyTorch.
  • Experimental results demonstrate KANLib's efficiency and predictive accuracy on benchmark datasets.
Self-CTRL: Self-Consistency Training with Reinforcement Learning
Itamar Pres, Laura Ruis, Melat Ghebreselassie, Belinda Z. Li, Jacob Andreas
NLP Large Language Models Reinforcement Learning
  • Self-CTRL optimizes for consistency between language models' self-explanations and their behavior.
  • The method includes two training directions: explanation training and behavior training.
  • In probabilistic reasoning tasks, consistency training significantly improved bias reporting.
  • In a constitutional AI setting, Self-CTRL substantially improved refusal-prediction accuracy.
MorphStrata: Layer-Specific Perturbations for Generating Morphence Students in Time-Series Moving Target Defense
Abhishek Bhardwaj, Arnav Doshi, Anusri Nagarajan, Thanh Quynh Nhu Ta, Mohammad Masum, Robert Chun, Jaydip Sen, Saptarshi Sengupta
Time Series
  • MorphStrata introduces a layer-specific perturbation strategy for time-series forecasting models.
  • The method enhances adversarial robustness while maintaining low computational overhead.
  • Empirical results show significant improvements in adversarial RMSE, especially on high-entropy datasets.
  • The approach demonstrates a positive correlation between student model diversity and defense effectiveness.
EnvRL: Learn from Environment Dynamics in Agentic Reinforcement Learning
Zhitong Wang, Songze Li, Hao Peng, Shuzheng Si, Yi Wang, Maosong Sun, Juanzi Li
Reinforcement Learning Large Language Models
  • EnvRL enhances agentic RL by utilizing environment dynamics as implicit supervision signals.
  • The framework introduces two auxiliary objectives: state prediction and inverse dynamics.
  • Joint optimization of these objectives with the primary RL goal leads to better decision-making.
  • Empirical results show significant improvements in success rates on long-horizon tasks.
TuneAhead: Predicting Fine-tuning Performance Before Full Training Begins
Yuxiang Luo, Haonan Long, Chen Wang, Qiqi Duan, Xiaotian Lin, Yanwei Xu, Yuyu Luo, Weikai Yang, Nan Tang
NLP Large Language Models Efficient ML
  • TuneAhead predicts fine-tuning performance before full training, reducing wasted computational resources.
  • The framework combines static dataset descriptors with dynamic probe features for accurate predictions.
  • SHAP-based attributions provide interpretable diagnostics, helping practitioners understand prediction drivers.
  • TuneAhead outperforms strong baselines in extensive experiments, demonstrating its effectiveness.
Hierarchical Attention via Domain Decomposition
Stephan Köhler, Oliver Rheinbach
Theory Efficient ML NLP
  • Introduction of a hierarchical attention mechanism based on domain decomposition.
  • Demonstrated efficiency in approximating solution operators for a one-dimensional diffusion problem.
  • Outperformed a global low-rank attention baseline in terms of training speed and accuracy.
  • Used significantly fewer parameters than traditional methods.
Monotonic Kolmogorov-Arnold Networks: A Theoretical and Empirical Study of Monotonicity as an Inductive Bias
Mikhail Krasnov, Carolina Fortuna, Blaž Bertalanič
Theory Interpretability Efficient ML
  • MKAN guarantees hard monotonicity for all parameter values through exponential reparameterization and positive edge weights.
  • The representation-cost theorem provides a framework for understanding the dimensionality of monotone realizations of feature extractors.
  • Empirical results indicate that MKAN is competitive with existing monotone neural networks while offering enhanced interpretability.
  • The model successfully recovers ground-truth factors in controlled settings, outperforming traditional methods in terms of Spearman alignment.
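The monotonicity guarantee in the Monotonic KAN summary above rests on a simple fact: a positive-weighted sum of non-decreasing functions is non-decreasing, and writing each weight as exp(θ) keeps it positive for every real θ. The sketch below illustrates that idea with sigmoid basis functions, which are an assumption standing in for whatever basis MKAN actually uses. The function names are hypothetical.

```python
import math
import random


def _sigmoid(z):
    """Numerically stable logistic function (non-decreasing in z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def monotone_edge(x, thetas, centers, width=1.0, bias=0.0):
    """Edge function bias + sum_i exp(theta_i) * sigmoid((x - c_i) / width).

    exp(theta_i) > 0 for any real theta_i, so the function is non-decreasing in x
    whatever values the unconstrained parameters take during training.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    return bias + sum(math.exp(t) * _sigmoid((x - c) / width) for t, c in zip(thetas, centers))


def monotone_unit(xs, edge_params, log_weights):
    """Combine per-input monotone edges with positive weights exp(log_weight)."""
    return sum(
        math.exp(lw) * monotone_edge(x, thetas, centers)
        for x, (thetas, centers), lw in zip(xs, edge_params, log_weights)
    )


# Random, unconstrained parameters still yield a non-decreasing edge function.
rng = random.Random(0)
thetas = [rng.uniform(-3.0, 3.0) for _ in range(5)]
centers = [rng.uniform(-2.0, 2.0) for _ in range(5)]
grid = [i / 10 for i in range(-50, 51)]
values = [monotone_edge(x, thetas, centers) for x in grid]
print(all(a <= b for a, b in zip(values, values[1:])))
```

Because the constraint lives in the parameterization rather than in a penalty term, monotonicity holds exactly at every training step, which is what "hard" monotonicity refers to.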
Towards Anomaly Detection on Relational Data
Shiyuan Li, Yunfeng Zhao, Yue Tan, Qingfeng Chen, Yixin Liu, Shirui Pan
Graph Learning
  • RelAD effectively addresses the challenges of feature redundancy and complex relational dependencies in relational anomaly detection.
  • The framework integrates both attribute and relational edge reconstruction to enhance anomaly detection accuracy.
  • Extensive experiments on benchmark datasets show that RelAD outperforms existing anomaly detection methods.
Target-confidence Recourse Using tSeTlin machines: TRUST
K. Darshana Abeyrathna, Sara El Mekkaoui, Nils Enric Canut Taugbøl, Anuja Vats
Interpretability Optimization Theory
  • TRUST allows users to specify desired prediction confidence levels for counterfactual explanations.
  • The framework uses a Probabilistic Tsetlin Machine (PTM) to enhance the interpretability and robustness of recourse options.
  • Counterfactuals generated by TRUST are more stable and less fragile compared to traditional boundary-based approaches.
  • The methodology provides clause-level attribution, explaining the reliability of different counterfactuals.
On the Residual Scaling of Looped Transformers: Stability and Transferability
Shaowen Wang, Bingrui Li, Ge Zhang, Wenhao Huang, Shen Yan, Jian Li
Theory Optimization Large Language Models
  • The standard scaling ε = 1/√N is insufficient for looped Transformers due to weight sharing.
  • A new scaling ε = 1/N is proposed to stabilize training and control residual growth.
  • The derived parameterization ε = λ/(N√L) allows for effective hyperparameter transfer across different loop counts and depths.
  • Experiments validate that linear residual scaling enhances trainability and maintains optimal learning rates across various configurations.
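A toy calculation, not the derivation in the looped-Transformer paper above, suggests why weight sharing changes the right residual scale. If each of N loop iterations adds ε·u to the residual stream, independent unit-norm updates u (as in an untied stack) add up like a random walk with norm about ε√N, so ε = 1/√N keeps the total O(1). With shared weights the updates can be strongly correlated; in the idealized extreme where every u is identical the norm is εN, which 1/√N lets grow as √N while ε = 1/N holds it at 1. The helper `looped_residual_scale` just evaluates ε = λ/(N√L) from the summary, assuming N is the loop count and L the depth.

```python
import math
import random


def residual_norm_after_loops(num_loops, eps, shared, dim=256, seed=0):
    """Norm of h after num_loops updates h <- h + eps * u, starting from h = 0.

    shared=True reuses one unit vector u every iteration (an idealized stand-in for
    perfectly correlated updates from tied weights); shared=False draws an independent
    random unit vector each iteration (uncorrelated updates, as in an untied stack).
    """
    rng = random.Random(seed)

    def random_unit():
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    fixed = random_unit()
    h = [0.0] * dim
    for _ in range(num_loops):
        u = fixed if shared else random_unit()
        h = [hi + eps * ui for hi, ui in zip(h, u)]
    return math.sqrt(sum(x * x for x in h))


def looped_residual_scale(lam, num_loops, depth):
    """epsilon = lambda / (N * sqrt(L)), with N = loop count and L = depth (assumed meaning)."""
    return lam / (num_loops * math.sqrt(depth))


N = 64
for label, eps in (("1/sqrt(N)", 1 / math.sqrt(N)), ("1/N", 1 / N)):
    print(label,
          round(residual_norm_after_loops(N, eps, shared=True), 3),
          round(residual_norm_after_loops(N, eps, shared=False), 3))
```

Real looped models sit between these two extremes, but the shared-update case shows how a scale tuned for independent layers can let the residual stream grow with the loop count.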
No-Free-Fairness: Fundamental Limits and Trade-offs in Learning Systems
Khoat Than
Theory
  • Unfairness in learning systems arises from intrinsic structural properties rather than just biased data.
  • There exists a fundamental fairness-cost trade-off that limits the ability to achieve both high performance and fairness.
  • Finite sample sizes can lead to unavoidable disparities, even in ideal conditions where fair solutions exist.
  • The choice of model can create inherent limitations in achieving fairness, independent of data quality.
Delta-Based Target Reformulation for Short-Term Electricity Load Forecasting Using LSTM and Transformer Models
Vansh Bansal
Time Series
  • Delta-based target reformulation improves hour-ahead forecasting accuracy by more than 50% in MAPE relative to absolute-target formulations.
  • LSTM and Transformer models benefit from delta targets, especially for short-term predictions.
  • LightGBM is competitive under absolute formulations but suffers from error accumulation when multi-step forecasts are reconstructed from deltas.
  • The study is validated on eight years of real-world data, supporting its operational relevance.
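Delta-based reformulation, as described in the load-forecasting summary above, trains the model to predict the change y_t - y_{t-1} instead of the level y_t, then reconstructs levels by adding the predicted changes to the last observed value. The pure-Python sketch below (function names are hypothetical) also shows why multi-step delta reconstruction accumulates error: a constant per-step bias in the predicted deltas grows linearly with the horizon.

```python
def to_deltas(series):
    """Targets for delta training: y[t] - y[t-1] for t = 1..len(series)-1."""
    return [b - a for a, b in zip(series, series[1:])]


def reconstruct(last_observed, deltas):
    """Turn a sequence of predicted deltas back into levels by cumulative summation."""
    levels, level = [], last_observed
    for d in deltas:
        level += d
        levels.append(level)
    return levels


load = [10, 12, 11, 15]                   # toy hourly load values
deltas = to_deltas(load)                  # [2, -1, 4]
print(reconstruct(load[0], deltas))       # [12, 11, 15]

biased = [d + 0.5 for d in deltas]        # every predicted delta is 0.5 too high
errors = [p - t for p, t in zip(reconstruct(load[0], biased), load[1:])]
print(errors)                             # [0.5, 1.0, 1.5]
```

For a one-step (hour-ahead) forecast only a single delta is added, so this accumulation does not arise, which is consistent with delta targets helping most at short horizons.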
Attention as Frustrated Synchronization
Joshua Nunley
Theory Efficient ML NLP
  • Introduces the Frustrated Synchronization Network (FSN) as a new attention mechanism.
  • FSN replaces the consensus mechanism of standard attention with frustrated synchronization, improving predictive performance.
  • Achieves lower validation loss compared to tuned transformers, especially in long-range copying tasks.
  • Utilizes a coupling kernel that is directly interpretable through synchronization concepts.
Mechanism-Guided Selective Unlearning for RLVR-Induced Reasoning
Chenyu Zhou, Qiliang Jiang, Shuning Wu, Xu Zhou
NLP Large Language Models Reinforcement Learning
  • MAST effectively reduces collateral damage during unlearning compared to full-parameter updates.
  • Mechanism separation is crucial, as SFT and RLVR updates differ significantly in their impact on model behavior.
  • MAST achieves meaningful forgetting while preserving performance on non-target tasks, demonstrating its robustness across different models.
  • Standard evaluation metrics for unlearning may be insufficient, as they can overlook the complexities of reasoning updates.
Performance-Driven Environment Abstraction with Multi-Timescale Learning
Yue Guan, Dipankar Maity, Panagiotis Tsiotras
Reinforcement Learning Robotics Theory
  • Introduces performance-driven environment abstraction optimizing decision quality in MDPs.
  • Establishes performance guarantees separating value-function approximation error from action-sharing error.
  • Develops a multi-timescale reinforcement learning algorithm that adapts policy and abstraction jointly.
  • Empirical results show improved sample efficiency and faster replanning compared to existing methods.
MM++: Unsupervised Scale-Invariant Multilayer OOD Detection via Top-K Gated Feature Fusion
Rahim Hossain, Md Tawheedul Islam Bhuian, Md Farhan Shadiq, Kyoung-Don Kang
Computer Vision
  • MM++ is a fully unsupervised and strictly post-hoc framework for OOD detection.
  • It employs a Top-K gated feature fusion mechanism to select the most informative layers based on entropy density drops.
  • The framework utilizes a Ledoit–Wolf regularized tied covariance matrix for stable distance estimation.
  • MM++ demonstrates superior performance across diverse architectures and challenging datasets, including long-tailed distributions.
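A tied-covariance Mahalanobis detector, the building block behind the MM++ summary above, scores a test feature by its smallest Mahalanobis distance to any class (or cluster) mean, using one covariance shared across all of them. Ledoit–Wolf shrinkage keeps that covariance well conditioned for high-dimensional features. The single-layer sketch below uses scikit-learn's `LedoitWolf` estimator and assumes in-distribution class labels (or cluster assignments) are available to form the means. The Top-K layer gating and fusion that MM++ adds are not reproduced, and the function names are hypothetical.

```python
import numpy as np
from sklearn.covariance import LedoitWolf


def fit_tied_gaussian(features, labels):
    """Per-class means plus one shared (tied) Ledoit-Wolf precision matrix."""
    classes = np.unique(labels)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    # Center each sample on its own class mean, then estimate a single covariance.
    centered = features - means[np.searchsorted(classes, labels)]
    lw = LedoitWolf(assume_centered=True).fit(centered)
    return means, lw.precision_


def mahalanobis_score(x, means, precision):
    """Smallest squared Mahalanobis distance to any class mean; larger means more OOD."""
    diffs = means - x                                   # shape (n_classes, n_features)
    d2 = np.einsum("ci,ij,cj->c", diffs, precision, diffs)
    return float(d2.min())


rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0.0, 1.0, (200, 16)), rng.normal(4.0, 1.0, (200, 16))])
labels = np.repeat([0, 1], 200)
means, precision = fit_tied_gaussian(feats, labels)
print(mahalanobis_score(feats[0], means, precision))          # in-distribution sample
print(mahalanobis_score(np.full(16, -6.0), means, precision))  # far-away probe
```

Because the detector only reads features from a trained network and fits Gaussian statistics afterwards, it is post-hoc in the sense used in the summary: the backbone is never retrained.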
Probing, Fusion, and Trustworthiness: A Systematic Evaluation of Foundation Model Representations for Multimodal Cancer Analysis
Jingyu Hu, Giuseppe Tripodi, Reed Naidoo, Sarah F. McGough, Tapabrata Chakraborti
Multimodal
  • Foundation models can effectively extract representations from multimodal cancer data.
  • Image and omics modalities provide complementary information for predictive tasks.
  • Multimodal fusion strategies can enhance performance, particularly when modalities are balanced.
  • Conformal prediction offers a method for assessing model trustworthiness and uncertainty.
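Split conformal prediction, mentioned in the multimodal cancer-analysis summary above, follows a standard recipe: compute a nonconformity score on held-out calibration data, take a finite-sample-corrected quantile, and return every label whose score falls within it. Under exchangeability this guarantees marginal coverage of at least 1 - α. The sketch below uses the common score 1 - p(true class); it is a generic illustration, not necessarily the score or setup used in this evaluation.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Split-conformal threshold from calibration probabilities and true labels."""
    scores = sorted(1.0 - probs[y] for probs, y in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))   # finite-sample corrected rank (1-based)
    if rank > n:
        return math.inf                       # too little calibration data: include all labels
    return scores[rank - 1]


def prediction_set(probs, qhat):
    """All labels whose nonconformity score 1 - p is within the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= qhat]


# Two-class calibration set; the true label is always 0.
true_class_probs = [1.0, 0.875, 0.75, 0.625, 0.5, 0.375, 0.25, 0.125, 0.0]
cal_probs = [[p, 1.0 - p] for p in true_class_probs]
qhat = conformal_threshold(cal_probs, [0] * 9, alpha=0.5)
print(qhat, prediction_set([0.5, 0.5], qhat), prediction_set([0.2, 0.8], qhat))
```

Larger prediction sets signal lower confidence, so set size can serve as a per-sample trustworthiness indicator alongside the coverage guarantee.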
LLMZero: Discovering Adaptive Training Strategies for RL Post-Training via LLM Agents
Haoyang Fang, Wei Zhu, Boran Han, Alex Zhang, Zhenyu Pan, Shuo Yang, Shuai Zhang, Jiading Gai, Peng Tang, Cuixiong Hu, Xuan Zhu, Huzefa Rangwala, George Karypis, Bernie Wang
Reinforcement Learning Large Language Models Optimization
  • LLMZero discovers adaptive training strategies that significantly outperform traditional fixed schedules.
  • Capacity parameters accumulate monotonically while regularization parameters oscillate, a dynamic that calls for adaptive rather than fixed strategies.
  • The system uses LLM agents to analyze training dynamics and propose coordinated hyperparameter transitions.
  • Improvements range from 9% to 140% over baseline models and 6% to 15% over grid search methods.
Graph Grounded Cross Attention Transformer Neural Network for Structurally Constrained Full Event Sequence Generation in Predictive Process Monitoring
Fang Wang, Ernesto Damiani
Graph Learning Generative Models Time Series
  • GGATN integrates graph neural networks with Transformer architectures for event sequence generation.
  • The model generates full event sequences in a single pass, addressing limitations of autoregressive methods.
  • Experiments show GGATN achieves superior generation quality compared to existing baselines.
  • The architecture preserves global process structure while modeling position-specific dependencies.
Informative Missingness to Generate Irregular Clinical Time Series
Hadi Mehdizavareh, Gabriele Santangelo, Giovanna Nicora, Simon Lebech Cichosz, Arianna Dagliati, Arijit Khan, Riccardo Bellazzi
Generative Models Time Series
  • Introduces a diffusion-based framework for generating clinical time series that captures both lab values and their observation patterns.
  • Highlights the importance of modeling informative missingness in clinical data rather than treating it as a preprocessing artifact.
  • Demonstrates that the generated synthetic data closely matches real patient trajectories, indicating the model's effectiveness.
  • Provides a preprocessing protocol that maintains MNAR-like structure while being compatible with diffusion training.
Perron–Frobenius Operator Matching for Generative Modeling
Shiqi Zhang, Wuwei Wu, Jaemin Oh, Jie Chen, Xiaoning Qian
Generative Models Theory Optimization
  • PFOM generalizes density evolution matching beyond first-order descriptions.
  • Only Kullback–Leibler divergence preserves equality between density-level and sample-conditioned objectives.
  • Nesterov-accelerated training enhances convergence and reduces discretization errors.
  • PFOM demonstrates improved efficiency in generative modeling tasks.