AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

62 Papers today
8h Update frequency
7 Days of history
UOTIP: Unbalanced Optimal Transport Map for Unpaired Inverse Problems
Donggyu Lee, Taekyung Lee, Jaewoong Choi
Optimization Generative Models Computer Vision
  • Introduces UOTIP, the first model for unpaired inverse problems based on Unbalanced Optimal Transport.
  • Demonstrates robustness to multi-level observation noise and adaptability to class imbalance.
  • Proves the existence and uniqueness of the transport map for ill-posed inverse problems.
  • Achieves state-of-the-art performance on unpaired image inverse problem benchmarks.
Robust Recommendation from Noisy Implicit Feedback: A GMM-Weighted Bayes-label Transition Matrix Framework
Zongyu Li, Xuanyu Liu, Gongce Cao, Shirui Sun, Yaqi Fang, Yongshuai Yu
Theory Optimization
  • Introduction of the RGBT framework, which combines GMM with BLTM for robust recommendations.
  • Theoretical guarantees of full sample utilization and low-variance estimation.
  • Demonstrated effectiveness of RGBT in utilizing noisy samples compared to traditional methods.
  • Superior calibration capability of transition matrix over state-of-the-art approaches.
Training Language Agents to Learn from Experience
Yuval Shalev, Zifeng Ding, Mateja Jamnik
NLP Large Language Models Reinforcement Learning
  • Introduction of the In-context Training (ICT) task for evaluating cross-task self-improvement in language agents.
  • Development of a reinforcement learning-based training pipeline for reflectors to learn from experience autonomously.
  • Demonstration of significant performance improvements in language agents across unseen tasks using the proposed framework.
  • Generalization of learned reflections to substantially different environments beyond the training benchmarks.
Robust Subspace-Constrained Quadratic Models for Low-Dimensional Structure Learning
Zheng Zhai, Xiaohui Li
Optimization Theory
  • Introduction of a robust SCQM that accommodates various noise distributions.
  • Development of a gradient descent algorithm with orthogonality-preserving updates.
  • Theoretical analysis showing improved robustness with ℓp loss functions.
  • Extensive experiments confirming superior performance over traditional methods.
LOSCAR-SGD: Local SGD with Communication-Computation Overlap and Delay-Corrected Sparse Model Averaging
Yassine Maziane, Ammar Mahran, Artavazd Maranjyan, Peter Richtárik
Optimization Federated Learning Efficient ML
  • LOSCAR-SGD combines local training, sparse model averaging, communication-computation overlap, and worker-specific local-step counts.
  • The delay-corrected merge rule preserves local progress during communication delays.
  • Theoretical guarantees are provided for convergence in smooth non-convex settings.
  • Empirical results show significant reductions in training time with the proposed method.
Robust Personalized Recommendation under Hidden Confounding in MNAR
Zongyu Li, Wanting Su, Tianyu Xia
Theory Optimization
  • Introduces a novel framework (PUID) for personalized estimation of hidden confounding strength in recommender systems.
  • Develops an entropy-based sensitivity estimator to quantify the influence of unobserved confounders.
  • Proposes a benchmark-guided variant (BPUID) that enhances robustness and predictive accuracy.
  • Demonstrates significant performance improvements over global methods in extensive experiments on real-world datasets.
A New Framework to Analyse the Distributional Robustness of Deep Neural Networks
Divij Khaitan, Subhashis Banerjee
Theory Interpretability Computer Vision
  • Introduces a framework for analyzing distributional robustness in deep neural networks.
  • Uses Bernoulli distributions to model interactions between layer weights and activations.
  • Demonstrates the ability to distinguish between memorization and generalization in neural networks.
  • Shows that distribution shifts negatively impact the separation metrics used for robustness diagnostics.
Q-SYNTH: Hybrid Quantum-Classical Adversarial Augmentation for Imbalanced Fraud Detection
Adam Innan, Mansour El Alami, Nouhaila Innan, Muhammad Shafique, Mohamed Bennai
Generative Models
  • Q-SYNTH is a hybrid quantum-classical framework for fraud detection.
  • It synthesizes minority-class fraud samples to address class imbalance.
  • The framework shows improved statistical fidelity and competitive downstream performance.
  • Q-SYNTH offers a favorable trade-off between distributional fidelity and detection performance.
Compositional Transduction with Latent Analogies for Offline Goal-Conditioned Reinforcement Learning
Junseok Kim, Dohyeong Kim, Mineui Hong, Songhwai Oh
Reinforcement Learning Robotics Theory
  • Introduces analogy transduction for synthesizing goal-reaching behaviors across varying contexts.
  • Proposes a novel task-endogenous analogy representation that captures essential changes for optimal execution.
  • Develops the Compositional Transduction with latent Analogies (CTA) approach for offline GCRL.
  • Demonstrates significant performance improvements over existing methods in empirical evaluations.
Improved Guarantees for Constrained Online Convex Optimization via Self-Contraction
Dhruv Sarkar, Abhishek Sinha
Optimization Theory
  • Introduces a projection-based algorithm for COCO with improved regret and CCV bounds.
  • Achieves O(log T) regret and O(log T) CCV for strongly convex losses.
  • Maintains O(√T) regret while improving CCV to O(√T) for convex losses.
  • Utilizes a novel movement bound related to self-contracted curves.
Ada2MS: A Hybrid Optimization Algorithm Based on Exponential Mixing of Elementwise and Global Second-Moment Estimates
Meng Zhu, Quan Xiao, Weidong Min
Optimization
  • Ada2MS combines the advantages of AdamW and Momentum SGD to improve optimization performance.
  • The algorithm utilizes exponential interpolation between elementwise and global second-moment estimates.
  • Ada2MS maintains stability while gradually introducing SGD-like characteristics during training.
  • Experimental results show that Ada2MS performs competitively in visual tasks.
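The summary does not give the exact Ada2MS update, so the following is only a minimal sketch of one plausible reading of "exponential interpolation": a geometric mix of the per-element second moment with its global mean. The function names, the mixing parameter `alpha`, and the use of the mean as the global estimate are all assumptions made for illustration.

```python
import math

def mixed_second_moment(v_elem, alpha):
    """Geometric mix of elementwise and global second moments (assumed form).

    alpha = 0.0 keeps the per-element estimates (Adam-like scaling);
    alpha = 1.0 replaces them with one shared value (SGD-like scaling).
    """
    v_global = sum(v_elem) / len(v_elem)  # assumption: global estimate = mean
    return [(v ** (1.0 - alpha)) * (v_global ** alpha) for v in v_elem]

def step(params, momentum, v_mixed, lr=1e-3, eps=1e-8):
    """One momentum step, scaled by the square root of the mixed second moment."""
    return [p - lr * m / (math.sqrt(v) + eps)
            for p, m, v in zip(params, momentum, v_mixed)]

# Hypothetical usage: raise alpha over training to move from Adam-like
# per-coordinate scaling toward a single shared step size.
v = [1.0, 4.0]
for alpha in (0.0, 0.5, 1.0):
    print(alpha, mixed_second_moment(v, alpha))
```

Under these assumptions, increasing `alpha` during training would gradually introduce the SGD-like behaviour the summary mentions, since every coordinate ends up divided by the same scale.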
Catching a Moving Subspace: Low-Rank Bandits Beyond Stationarity
Hamed Khosravi, Xiaoming Huo
Theory
  • Introduces a framework for piecewise-stationary low-rank linear contextual bandits.
  • Establishes an identification boundary for recovering moving subspaces under scalar feedback.
  • Develops the SPSC algorithm that interleaves probing and exploitation to adapt to subspace changes.
  • Demonstrates significant performance improvements over existing methods in empirical evaluations.
AVSD: Adaptive-View Self-Distillation by Balancing Consensus and Teacher-Specific Privileged Signals
Duy Nguyen, Hanqi Xiao, Archiki Prasad, Zaid Khan, Anirban Das, Austin Zhang, Sambit Sahu, Hyunji Lee, Elias Stengel-Eskin, Mohit Bansal
NLP Large Language Models Reinforcement Learning
  • AVSD utilizes multiple types of privileged information to enhance self-distillation.
  • The method separates stable consensus signals from view-specific residual signals.
  • AVSD outperforms traditional single-view self-distillation methods on various benchmarks.
  • The approach addresses the limitations of relying on a single privileged view for training.
APEX: Autonomous Policy Exploration for Self-Evolving LLM Agents
Yibo Li, Jiashuo Yang, Zhi Zheng, Zhiyuan Hu, Yuan Sui, Shizun Wang, Yufei He, Bryan Hooi
Large Language Models Reinforcement Learning Optimization
  • APEX introduces a strategy map to maintain an explicit exploration space for LLM agents.
  • The framework effectively addresses exploration collapse by balancing exploration and exploitation.
  • APEX outperforms existing self-evolving agent frameworks across multiple benchmarks.
  • The mechanisms of Fork Discovery and Policy Selection are crucial for enhancing exploratory behavior.
Automated Kernel Discovery Towards Understanding High-dimensional Bayesian Optimization
Taeyoung Yun, Woocheol Shin, Inhyuck Song, Jaewoo Lee, Jinkyoo Park
Optimization
  • Introduces Kernel Discovery, an LLM-driven framework for high-dimensional BO.
  • Employs a two-stage approach for kernel generation and validation.
  • Proposes LOO-CRPS as a robust selection criterion to avoid overfitting.
  • Achieves superior performance on high-dimensional BO benchmarks compared to existing methods.
Efficient Learning of Deep State Space Models via Importance Smoothing
John-Joseph Brady, Nikolas Nusken, Yunpeng Li
Time Series Generative Models Efficient ML
  • Introduction of parallel variational Monte Carlo (PVMC) for training DSSMs.
  • PVMC combines the strengths of variational auto-encoding and DSMC methods.
  • Achieves state-of-the-art results on baseline experiments.
  • Demonstrates a 10× speed-up over existing DSMC methods.
CAdam: Context-Adaptive Moment Estimation for 3D Gaussian Densification in Generative Distillation
SeungJeh Chung, Geonho Park, Misong Kim, HyeongYeop Kang
Generative Models Optimization Computer Vision
  • CAdam reduces Gaussian counts by 85%–97% compared to standard densification methods.
  • The framework addresses the Densification Dilemma by leveraging statistical signal verification.
  • CAdam employs a novel approach that combines momentum-based verification and context-aware selection.
  • The method maintains comparable perceptual quality while improving memory efficiency.
Dynamic Shapley Computation
Xuan Yang, Hsi-Wen Chen, Ming-Syan Chen, Jian Pei
Theory Efficient ML Interpretability
  • D-Shap transforms Shapley computation into a reusable and incremental process.
  • The framework allows for efficient updates in dynamic settings, addressing both task and player changes.
  • Self-valuation enables the construction of the initial Shapley matrix directly from training data.
  • D-Shap achieves substantial reductions in computational costs, making it practical for real-world applications.
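For context on the cost that incremental approaches aim to avoid, here is a minimal sketch of the textbook exact Shapley computation, which averages each player's marginal contribution over every ordering. This is not D-Shap itself; the toy value function and player names are hypothetical.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values by enumerating all n! player orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}

# Hypothetical toy game: a coalition's value is the square of its size.
players = ["a", "b", "c"]
phi = shapley_values(players, lambda s: float(len(s) ** 2))
print(phi)
```

Adding or removing one player here forces a full recomputation over all orderings, which is the kind of from-scratch work a reusable, incremental formulation would try to sidestep.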
The Devil is in the Condition Numbers: Why is GLU Better than non-GLU Structure?
Xingyu Lyu, Qianqian Xu, Zhiyong Yang, Peisong Wen, Qingming Huang
Optimization Theory Large Language Models
  • GLU structure leads to improved spectral conditioning in the NTK regime.
  • Faster convergence of training error is observed with GLU compared to non-GLU models.
  • The generalization gap remains similar between GLU and non-GLU models.
  • The primary advantage of GLU is in accelerating optimization rather than reducing generalization error.
HORST: Composing Optimizer Geometries for Sparse Transformer Training
Tom Jacobs, Rohan Jain, Rebekka Burkholz
Optimization Computer Vision NLP
  • Introduces HORST, a new optimizer that effectively combines stability and sparsity in transformer training.
  • Reveals a geometric dichotomy between steepest descent and mirror descent, impacting optimizer performance.
  • Demonstrates that the entropy mirror map can overwrite the implicit bias of steepest descent optimizers.
  • Shows significant performance improvements over AdamW in both vision and language tasks, especially at higher sparsity levels.
CIG: Exploration via Conditional Information Gain
Tim Joseph, Marcus Fechner, Philipp Stegmaier, Karam Daaboul, J. Marius Zöllner
Reinforcement Learning
  • CIG provides a new intrinsic reward that effectively combines lifelong and episodic exploration signals.
  • The method is scalable to high-dimensional state spaces, overcoming limitations of previous approaches.
  • CIG is evaluated across diverse tasks and consistently outperforms or matches existing exploration strategies.
  • The approach is implemented without additional hyperparameters, simplifying integration into existing frameworks.
Axiomatizing Neural Networks via Pursuit of Subspaces
Mehmet Yamac, Mert Duman, Ugur Akpinar, Felix Rojas Casadiego, Serkan Kiranyaz, Marcel van Gerven, Moncef Gabbouj
Theory Interpretability
  • Introduces the Pursuit of Subspaces (PoS) framework as an axiomatic approach to understanding neural networks.
  • Establishes four geometric axioms that explain how DNNs learn compact representations.
  • Provides a unified interpretation of architectural mechanisms and their roles in representation and generalization.
  • Connects existing neural architectures to a geometric foundation, facilitating the design of explainable models.
Weight Decay Regimes in Grokking Transformers: Cheap Online Diagnostics
Lucky Verma
Theory
  • Weight decay is a critical parameter influencing the transition between memorization, generalization, and collapse in transformers.
  • Two online diagnostics are introduced to track training dynamics effectively and at lower computational costs.
  • The study identifies a critical weight decay threshold (λc = 0.0158) and an empirical power-law exponent for time-to-grok.
  • The findings are consistent across various model architectures, suggesting broader applicability beyond transformers.
AGPO: Adaptive Group Policy Optimization with Dual Statistical Feedback
Miaobo Hu, Shuhao Hu, Bokun Wang, Ruohan Wang, Xin Wang, Xiaobo Guo, Daren Zha, Jun Xiao
Reinforcement Learning Large Language Models Optimization
  • AGPO introduces adaptive clipping and temperature sampling to improve training stability and efficiency.
  • The method utilizes group-level statistics to control update magnitude and exploration dynamically.
  • AGPO outperforms traditional PPO and GRPO methods on multiple benchmarks, demonstrating its effectiveness.
  • The approach is critic-free, simplifying the training process while maintaining performance.
Provably Learning Diffusion Models under the Manifold Hypothesis: Collapse and Refine
Wei Huang, Andi Han, Mingyuan Bai, Huanjian Zhou, Qixin Zhang, Taiji Suzuki, Kenji Fukumizu
Generative Models Theory Efficient ML
  • Introduces Score-induced Latent Diffusion (SiLD) as a two-stage learning framework for diffusion models.
  • Proves convergence guarantees and establishes that sample complexity depends on intrinsic dimension, not ambient dimension.
  • Demonstrates empirical success on various datasets, outperforming VAE-based latent diffusion models.
  • Establishes a novel training strategy that integrates manifold learning and density estimation under a single objective.
Latent Process Generator Matching
Lukas Billera, Hedwig Nora Nordlinder, Ben Murrell
Generative Models Theory Optimization
  • Introduces latent process generator matching, extending generator matching theory to time-dependent latent processes.
  • Allows for learning generators of stochastic processes that match one-time marginal distributions on the image space.
  • Generalizes existing methods by accommodating a wider variety of latent spaces, including continuous and manifold-valued processes.
  • Provides sufficient conditions for valid loss functions, recovering results from previous works as corollaries.
Variance Reduction for Expectations with Diffusion Teachers
Jesse Bettencourt, Xindi Wu, Matan Atzmon, James Lucas, Jonathan Lorraine
Generative Models Optimization Efficient ML
  • Introduction of CARV, a framework for variance reduction in diffusion teacher gradients.
  • Hierarchical Monte Carlo estimator that amortizes expensive computations over cheaper resamples.
  • Significant variance reduction achieved through timestep importance sampling and stratified sampling.
  • Demonstrated 2-3x effective compute multipliers in text-to-3D and attribution tasks.
A Dialogue between Causal and Traditional Representation Learning: Toward Mutual Benefits in a Unified Formulation
Yan Li, Yuewen Sun, Shaoan Xie, Gongxu Luo, Yunlong Deng, Kun Zhang, Guangyi Chen
Theory
  • Introduces a unified formulation for representation learning that includes both task and constraint components.
  • Emphasizes the mutual benefits of integrating causal representation learning with traditional representation learning.
  • Demonstrates through experiments that the effectiveness of causal constraints varies significantly with different task formulations.
  • Clarifies the relationship between CRL and traditional representation learning, promoting better communication and collaboration between the two fields.
Fast and Stable Triangular Inversion for Delta-Rule Linear Transformers
Aleksandros Sobczyk, Gioele Gottardo, Christos K. Matzoros, Mirko De Vita, Filip Skogh, Anastasios Zouzias, Jiawei Zhuang
NLP Large Language Models Efficient ML
  • Introduces a systematic analysis of triangular matrix inversion methods for Delta-Rule Linear Transformers.
  • Highlights the importance of numerical stability in maintaining model accuracy during matrix inversion.
  • Demonstrates significant performance improvements with up to 4.3× speed-up on NPUs compared to existing methods.
  • Focuses on leveraging hardware efficiency through matrix product-rich algorithms.
Weasel: Out-of-Domain Generalization for Web Agents via Importance-Diversity Data Selection
Fatemeh Pesaran zadeh, Seyeon Choi, Xing Han Lù, Siva Reddy, Gunhee Kim
NLP Large Language Models Efficient ML
  • WEASEL is the first data selection approach designed for offline web agent training, focusing on out-of-domain generalization and training efficiency.
  • The method employs a greedy algorithm to optimize trajectory selection based on importance and diversity.
  • Target-centered AXTree pruning is introduced to enhance training efficiency by removing irrelevant content.
  • The approach includes generating style-consistent reasoning traces to improve performance in reasoning-native models.
Behavior-Consistent Deep Reinforcement Learning
Marcel Hussing, Liv G. d'Aliberti, Claas Voelcker, Benjamin Eysenbach, Eric Eaton
Reinforcement Learning Robotics Theory
  • Introduction of behavior-consistent reinforcement learning (BRL) as a new framework.
  • Establishment of a theoretical link between policy divergence and Q-function disagreement.
  • Identification of challenges in high-entropy maximum-entropy RL.
  • Development of Q-value Expectile Disagreement (QED) for improved behavioral consistency.
Chronicle: A Multimodal Foundation Model for Joint Language and Time Series Understanding
Paul Quinlan, Jeremy Levasseur, Qingguo Li, Xiaodan Zhu
Multimodal NLP Time Series
  • Chronicle is the first model to jointly pretrain on text and time series from scratch.
  • It utilizes a shared transformer architecture for both modalities, enhancing cross-domain representation learning.
  • Chronicle achieves competitive performance against state-of-the-art unimodal foundation models.
  • The model sets new benchmarks for frozen-embedding time series classification and multimodal forecasting.
Nonparametric Learning and Earning with One-Point Feedback under Nonstationarity
Xiangyu Yang, Feng Xu, Jian-Qiang Hu, Jiaqiao Hu
Optimization Theory
  • Proposes a nonparametric learning framework for dynamic pricing under nonstationarity.
  • Utilizes one-point feedback for revenue-based gradient approximations.
  • Incorporates a restarting mechanism to adapt to changing market conditions.
  • Introduces a meta-learning layer to handle unknown nonstationarity levels.
Plug-and-Play Spiking Operators: Breaking the Nonlinearity Bottleneck in Spiking Transformers
Xinzhe Yuan, Xiang Peng, Bin Gu, Huan Xiong
NLP Large Language Models Efficient ML
  • Introduces a plug-and-play framework for spiking operators in Transformers.
  • Decomposes nonlinear computations into spike-friendly primitives.
  • Supports common Transformer nonlinearities without fine-tuning.
  • Demonstrates minimal accuracy loss (<1%) across various tasks.
Divide and Contrast: Learning Robust Temporal Features without Augmentation
Abdul-Kazeem Shamba, Kerstin Bach, Gavin Taylor
Time Series
  • Di-COT eliminates the need for data augmentation and multiple encoder passes, reducing computational overhead.
  • The method contrasts overlapping sub-blocks within time-series instances, ensuring meaningful representation learning.
  • Di-COT reformulates temporal contrastive learning as a cross-entropy classification task for dense supervision.
  • The framework achieves state-of-the-art performance on multiple benchmarks while maintaining low training times.
Decomposing MXFP4 quantization error for LLM reinforcement learning: reducible bias, recoverable deadzone, and an irreducible floor
Xiaocan Li, Shiliang Wu, Zheng Shen
Reinforcement Learning Large Language Models Efficient ML
  • Introduces a structural decomposition of MXFP4 quantization error into three components: scale bias, deadzone truncation, and grid noise.
  • Demonstrates that each error component corresponds to specific RL failure modes affecting training outcomes.
  • Proposes targeted corrections for each failure mode, improving the accuracy of RL post-training.
  • Empirical results show significant recovery of accuracy in large language models post-quantization.
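To make the error sources concrete, the sketch below quantizes one block in the MXFP4 style: a shared power-of-two scale plus 4-bit E2M1 element values. It is a simplified illustration, not the paper's code; ties are broken toward the smaller grid value rather than round-to-nearest-even, and how these effects map onto the paper's three components is only my reading of the summary.

```python
import math

# Magnitudes representable by an FP4 E2M1 element.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block):
    """Quantize a block with one shared power-of-two scale (MXFP4-style sketch)."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0] * len(block), 1.0
    # Shared scale: exponent of the block maximum minus the E2M1 max exponent (2).
    scale = 2.0 ** (math.floor(math.log2(amax)) - 2)
    out = []
    for x in block:
        mag = min(abs(x) / scale, 6.0)  # saturate at the largest grid value
        q = min(FP4_GRID, key=lambda g: abs(g - mag))  # nearest grid point
        out.append(math.copysign(q * scale, x))
    return out, scale

# Hypothetical block (the MX format uses 32 elements per block; 4 shown here).
values, scale = quantize_block([6.0, 0.1, -1.3, 0.3])
print(values, scale)
```

In this toy block the entry 0.1 is flushed to zero because it sits below half of the smallest nonzero step, while -1.3 and 0.3 are moved to the nearest grid points.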
OmniISR: A Unified Framework for Centralized and Federated Learning via Intermediate Supervision and Regularization
Wei-Bin Kou, Guangxu Zhu, Ming Tang, Chen Zhang, Lisheng Wu, Lei Zhou, Yujiu Yang
Federated Learning Theory Optimization
  • Introduces a unified framework that integrates centralized and federated learning.
  • Utilizes intermediate supervision and regularization to address optimization challenges.
  • Provides theoretical guarantees for convergence and gradient alignment.
  • Demonstrates significant performance improvements in both CL and FL settings.
Machine-Learning-Enhanced Non-Invasive Testing for MASLD Fibrosis: Shallow-Deep Neural Networks Versus FIB-4, Tabular Foundation Models, and Large Language Models
Athanasios Angelakis, Gabriele De Vito, Eleni-Myrto Trifylli, Filomena Ferrucci
Theory Efficient ML
  • MLE-NIT can improve advanced fibrosis detection in MASLD without requiring additional biomarkers.
  • The s-DNN model outperformed both TabPFN and GPT-4o in external validation cohorts.
  • The study highlights the importance of local calibration and threshold selection for clinical utility.
  • AST and FIB-4 were identified as the most significant variables influencing model performance.
DASH: Fast Differentiable Architecture Search for Hybrid Attention in Minutes on a Single GPU
Weizhe Chen, Miao Zhang, Junpeng Jiang, Yaping Li, Weili Guan, Liqiang Nie
NLP Large Language Models Efficient ML
  • DASH provides a differentiable search framework for hybrid attention architecture design, moving beyond manual and selector-style methods.
  • The framework allows for architecture-only optimization, significantly reducing search time and token usage.
  • DASH consistently outperforms existing hybrid attention design baselines and achieves better performance than Jet-Nemotron models.
  • The method demonstrates that high-quality hybrid architectures can be obtained quickly, paving the way for routine design applications.
Towards Understanding Self-Pretraining for Sequence Classification
Omar Coser, Loredana Zollo, Paolo Soda, Antonio Orvieto
Theory Optimization
  • Self-pretraining (SPT) significantly enhances Transformer model performance in sequence classification tasks.
  • The ability of label supervision to learn useful Attention patterns from random initialization is a central challenge.
  • Learning proximity interactions is identified as a key source of improvements from SPT.
  • SPT gains persist across different model depths, data sources, and pretraining durations.
CRAFT: Conflict-Resolved Aggregation for Federated Training
Ziqi Wang, Qiang Liu, Nils Thuerey
Federated Learning
  • CRAFT reformulates federated aggregation as a constrained least-squares problem to ensure conflict-free updates.
  • The method employs a momentum-like reference direction to preserve useful temporal information during aggregation.
  • Layer-wise adaptation allows for conflict resolution at varying granularities, making it suitable for deep neural networks.
  • Extensive experiments demonstrate improved mean accuracy and reduced accuracy disparity across clients.
Mitigating Label Bias with Interpretable Rubric Embeddings
Calvin Isley, Johann D. Gaebler, Sharad Goel
Interpretability
  • Rubric embeddings provide a framework to mitigate label bias in machine learning models.
  • Traditional black-box embeddings can encode sensitive attributes and replicate biases from historical evaluations.
  • The proposed method shows empirical success in reducing group disparities while improving cohort quality.
  • Rubric embeddings are constructed from expert-defined criteria, ensuring alignment with the desired outcomes.
Gaussian Sheaf Neural Networks
André Ribeiro, Ana Luiza Tenório, Tiago da Silva, Diego Mesquita
Graph Learning Theory
  • Introduction of Gaussian Sheaf Neural Networks (GSNNs) for learning with Gaussian-distributed node features.
  • Development of a new Laplacian operator that generalizes the sheaf Laplacian for Gaussian distributions.
  • GSNNs demonstrate superior performance compared to traditional GNNs on both synthetic and real-world datasets.
  • The framework effectively preserves the geometric and algebraic structure of Gaussian parameters during message passing.
Beyond Numerical Features: CNN-Driven Algorithm Selection via Contour Plots for Continuous Black-Box Optimization
Yiliang Yuan, Xiang Shi, Mustafa Misir
Optimization
  • Introduces a probing-based AAS formulation using contour maps for continuous BBO.
  • Demonstrates the effectiveness of CNNs in predicting solver performance from visual representations.
  • Shows significant performance improvements over traditional single best solver approaches.
  • Competes well with feature-based methods like ELA and Deep-ELA.
Winfree Oscillatory Neural Network
Jiawen Dai, Yue Song
Computer Vision Theory Efficient ML
  • WONN is the first synchronization-based oscillatory architecture to scale competitively to ImageNet-1K.
  • The architecture achieves 80.1% accuracy on Maze-hard using only 1% of the parameters of prior state-of-the-art models.
  • WONN combines geometric inductive biases with flexible synchronization dynamics for improved representation learning.
  • The learned representations exhibit bimodal phase organization, enhancing the model's ability to capture complex structures.
OpenSeisML: Open Large-Scale Real Seismic and well-log Dataset for Generative AI
Ipsita Bhar, Huseyin Tuna Erdinc, Thales Souza, Charles Jones, Felix J. Herrmann
Generative Models
  • Introduction of OpenSeisML, a large-scale dataset for seismic inversion.
  • Automated data curation pipeline enhances reproducibility and efficiency.
  • Dataset includes real seismic and well-log data, addressing the scarcity of high-quality datasets.
  • Supports training of generative models for uncertainty quantification in subsurface properties.
Smaller Abstract State Spaces Enable Cross-Scale Generalization in Reinforcement Learning
Nasehatul Mustakim, Lucas Lehnert
Reinforcement Learning Theory Efficient ML
  • Introduces a theoretical model for OOD generalization in RL agents using POMDPs.
  • Extends state abstraction frameworks to POMDPs and proposes a novel successor-weighted model reduction.
  • Derives a performance loss bound that highlights the relationship between abstract state space size and OOD generalization.
  • Demonstrates that smaller abstract state spaces improve test performance and facilitate generalization to complex tasks.
GraphDiffMed: Knowledge-Constrained Differential Attention with Pharmacological Graph Priors for Medication Recommendation
Krati Saxena, Tomohiro Shibata
Graph Learning Time Series Interpretability
  • Introduction of the first dual-scale application of Differential Attention v2 for medication recommendation.
  • Demonstrated improvements in recommendation quality and safety performance over existing methods.
  • Provided a transparent analysis of the impact of knowledge constraints on safety-performance balance.
  • Showed that higher DDI rates in recommendations can reflect more comprehensive solutions for complex cases.
TriForces: Augmenting Atomistic GNNs for Transferable Representations
Ali Ramlaoui, Alexandre Duval, Hannah Bull, Victor Schmidt, Hugues Talbot, Fragkiskos D. Malliaros, Joseph Musielewicz
Graph Learning
  • TriForces introduces a three-stream architecture for atomistic GNNs, enhancing representation transferability.
  • The framework utilizes self-supervised learning to improve the organization and quality of learned representations.
  • Significant performance improvements were observed on multiple benchmarks without the need for Density Functional Theory (DFT) labels.
  • The model enables efficient similarity retrieval in compositional, structural, or joint embedding spaces.
FedCoE: Bridging Generalization and Personalization via Federated Coordinated Dual-level MoEs
Penglin Dai, Fulian Li, Xincao Xu, Junhua Wang, Lixin Duan, Xiao Wu
Federated Learning
  • FedCoE balances global generalization and local personalization in federated learning.
  • The framework utilizes a dual-level mixture-of-experts architecture to handle heterogeneous data.
  • A shared gating network synchronizes expert selection across clients, addressing gating inconsistency.
  • An adaptive mechanism allows new clients to quickly access global experts, improving cold-start performance.
Choose Wisely and Privately: Proactive Client Selection for Fair and Efficient Federated Learning
Adda Akram Bendoukha, Heber Hwang Arcolezi, Nesrine Kaaniche, Aymen Boudguiga
Federated Learning Optimization Efficient ML
  • Introduction of a Potential Federation Loss (PFL) that balances predictive utility and fairness in client selection.
  • Development of a proactive client selection framework that identifies optimal client subsets before training.
  • Utilization of mutual information to assess data suitability and fairness concerns.
  • Demonstration of improved model performance and efficiency over traditional reactive methods.
Spectral Souping: A Unified Framework for Online Preference Alignment
Yinlam Chow, Guy Tennenholtz, Ted Yun, James Harrison, Arthur Gretton, Andre Barreto, Bo Dai
NLP Large Language Models Reinforcement Learning
  • Introduction of Spectral Souping for online preference alignment in LLMs.
  • Discovery of a universal spectral representation that aids in model merging.
  • Two-phase methodology: offline training of specialized policies and online adaptation.
  • Significant performance improvements over existing methods.
A Deployment Audit of Release-Side Risk in Conformal Triage under Prevalence Shift
Chengze Li, Xiao Liu, Hanrong Zhang, Haiyang Peng, Yanghao Ruan, Huanhuan Ma, Chunyu Miao, Qichao Zhou, Xiangrong Qi, Philip Yu
Theory
  • Introduces a leakage-aware deployment audit for evaluating release-side risk in conformal triage.
  • Demonstrates that traditional metrics can obscure the safety of release decisions under prevalence shift.
  • Identifies the necessity of separating correction, calibration, and evaluation to ensure safety in deployment.
  • Shows that lower review rates can lead to the unsafe release of patients who should not be cleared.
A Machine Learning Framework for Weighted Least Squares GNSS Positioning based on Activation Functions
Pin-Hsun Lee, Harry Leib
Optimization
  • Proposes a machine learning framework to enhance GNSS positioning accuracy.
  • Utilizes activation functions to transform machine learning predicted scores into weights for WLS.
  • Demonstrates significant reductions in positioning errors in urban environments.
  • Shows strong geographical transferability of the proposed algorithm.
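A minimal sketch of the score-to-weight idea follows, assuming a sigmoid as the activation function and a toy linear measurement model rather than real GNSS geometry. The scores, the matrix `A`, and the helper names are hypothetical; the summary does not specify which activation functions or features the authors use.

```python
import numpy as np

def sigmoid(s):
    """Map unbounded scores to weights in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-s))

def weighted_least_squares(A, y, weights):
    """Solve min_x sum_i w_i * (y_i - A[i] @ x)**2 via the weighted normal equations."""
    W = np.diag(weights)
    return np.linalg.solve(A.T @ W @ A, A.T @ W @ y)

# Toy linearized model: y = A @ x_true, with one badly corrupted measurement.
rng = np.random.default_rng(0)
A = rng.normal(size=(8, 4))
x_true = np.array([1.0, -2.0, 0.5, 3.0])
y = A @ x_true
y[0] += 50.0  # stands in for a multipath-style outlier

# Hypothetical model scores: higher means the measurement looks more trustworthy.
scores = np.full(8, 4.0)
scores[0] = -8.0

x_wls = weighted_least_squares(A, y, sigmoid(scores))  # activation-derived weights
x_ols = weighted_least_squares(A, y, np.ones(8))       # unweighted baseline

err_wls = np.linalg.norm(x_wls - x_true)
err_ols = np.linalg.norm(x_ols - x_true)
```

Because the activation squashes the low score of the corrupted measurement to a weight near zero, that measurement contributes little to the weighted solution, and the estimate stays closer to `x_true` than the unweighted baseline.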
Read more
Training distribution determines the ceiling of drug-blind cancer sensitivity prediction
Taekyung Heo
Theory
  • Drug-blind sensitivity prediction has plateaued due to metric artifacts rather than limitations in drug representations.
  • Global Pearson r conflates between-drug potency and within-drug cell sensitivity rankings, masking true performance ceilings.
  • Per-drug Pearson r reveals that no drug representation improves upon cell-only features.
  • Mechanism-of-action (MoA) as a training distribution constraint significantly enhances predictive performance.
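The difference between global and per-drug Pearson r can be seen on a small synthetic example. In the sketch below, which uses made-up numbers and is not data from the paper, the predictions capture each drug's mean potency exactly but carry no information about which cells are more sensitive within a drug.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

# Synthetic setup: 3 drugs with very different mean potency, 4 cell lines each.
drug_means = np.array([0.0, 5.0, 10.0])
true_dev = np.array([-1.0, 0.0, 1.0, 0.0])   # true within-drug cell effects
pred_dev = np.array([0.0, 1.0, 0.0, -1.0])   # predicted effects, orthogonal to the true ones

y_true = drug_means[:, None] + true_dev[None, :]   # shape (n_drugs, n_cells)
y_pred = drug_means[:, None] + pred_dev[None, :]

# Global r pools all (drug, cell) pairs; per-drug r is computed within each drug.
global_r = pearson(y_true.ravel(), y_pred.ravel())
per_drug_r = [pearson(t, p) for t, p in zip(y_true, y_pred)]
```

Here the pooled correlation is high because between-drug potency differences dominate the variance, while the within-drug correlations are zero, which is the kind of conflation the second and third bullets describe.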
Read more
Same Target, Different Basins: Hard vs. Soft Labels for Annotator Distributions
Mirerfan Gheibi, Gashin Ghazizadeh
Computer Vision Theory Optimization
  • Hard-label delivery methods can improve learning outcomes when annotations are sparse.
  • Multipass and SLS methods match soft-label training when full annotator distributions are available.
  • The preservation of the example-to-distribution match is crucial for effective learning.
  • SLS and soft-label cross-entropy optimize the same expected objective, allowing for clearer comparisons.
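The claim that both objectives agree in expectation can be checked numerically. The sketch below assumes that SLS amounts to drawing one hard label per step from the annotator distribution; the summary does not define the acronym, so this reading, along with the toy logits and distribution, is an assumption.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax for a 1-D logit vector."""
    shifted = logits - logits.max()
    return shifted - np.log(np.exp(shifted).sum())

# Toy example: one item, three classes.
annotator_dist = np.array([0.6, 0.3, 0.1])   # soft label from annotators
logits = np.array([2.0, 0.5, -1.0])          # model output for this item
log_q = log_softmax(logits)

# Soft-label cross-entropy: -sum_k p_k * log q_k.
soft_ce = float(-(annotator_dist * log_q).sum())

# Sampled hard labels: draw y ~ p and average the hard-label loss -log q_y.
rng = np.random.default_rng(0)
samples = rng.choice(3, size=200_000, p=annotator_dist)
sampled_hard_ce = float(-log_q[samples].mean())
```

The two values agree up to Monte Carlo error, since the expectation of `-log q_y` under `y ~ p` is exactly the soft-label cross-entropy. Any difference between the methods in practice would therefore come from gradient noise and optimization dynamics, not from the objective being optimized.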
Read more
Learning to Think in Physics: Breaking Shortcut Learning in Scientific Diffusion via Representation Alignment
Haozhe Jia, Pengyu Yin, Wenshuo Chen, Shaofeng Liang, Lei Wang, Bowen Tian, Xiucheng Wang, Nanqian Jia, Yutao Yue
Generative Models
  • Introduction of REPA-P, a teacher-free framework for aligning intermediate representations with physical states.
  • Demonstration of improved convergence and reduced physics residuals across multiple PDE tasks.
  • Validation of the hypothesis that aligning latent features with physical quantities enhances model robustness.
  • Architecture-agnostic approach applicable to both U-Net and Diffusion Transformer backbones.
Read more
Modular Multimodal Classification Without Fine-Tuning: A Simple Compositional Approach
Herman Bergström, Aditya Mehrotra, Rahul G. Krishnan
Multimodal
  • CoMET framework allows for multimodal classification without fine-tuning.
  • PCA is sufficient for effective dimensionality reduction in embeddings.
  • PALPooling improves representation quality without backpropagation.
  • Achieves state-of-the-art results across various multimodal benchmarks.
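For readers unfamiliar with the PCA step, here is a minimal sketch of training-free dimensionality reduction on frozen embeddings using an SVD. The random `embeddings` array, the target dimension `k=8`, and the helper names are placeholders; the summary does not give the CoMET pipeline's actual dimensions or encoders.

```python
import numpy as np

def pca_fit(X, k):
    """Return the feature mean and the top-k principal directions of X."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def pca_transform(X, mean, components):
    """Project centered data onto the principal directions."""
    return (X - mean) @ components.T

# Placeholder for embeddings from a frozen encoder: 100 items, 32 dimensions.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 32))

mean, components = pca_fit(embeddings, k=8)
reduced = pca_transform(embeddings, mean, components)
```

No gradient steps are involved: the projection is computed in closed form, which fits the no-fine-tuning setting the entry describes.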
Read more
TimeSRL: Generalizable Time-Series Behavioral Modeling via Semantic RL-Tuned LLMs -- A Case Study in Mental Health
Yuang Fan, Lilin Xu, Millie Wu, Jingping Nie, Qingyu Chen, Yuzhe Yang, Zhuo Zhang, Xin Liu, Subigya Nepal, Xiaofan Jiang, Xuhai 'Orson' Xu
Time Series Reinforcement Learning Large Language Models
  • TimeSRL introduces a two-stage framework for time-series behavioral modeling that enhances generalizability.
  • The model uses semantic abstractions to improve reasoning over longitudinal behavioral data.
  • TimeSRL achieves state-of-the-art performance in mental health prediction, outperforming traditional ML and LLM methods.
  • The approach demonstrates robustness against distribution shifts across different datasets.
Read more
Less Data, Faster Training: repeating smaller datasets speeds up learning via sampling biases
Jingwen Liu, Ezra Edelman, Surbhi Goel, Bingbin Liu
Optimization Theory Efficient ML
  • A small-vs-large gap, in which repeating a smaller dataset trains faster than using a larger one, appears across various tasks and architectures.
  • Sampling biases from smaller datasets enhance optimization by modulating layer-wise updates, facilitating quicker convergence.
  • Empirical evidence shows that even random labels can yield speedups similar to those with real labels, underscoring the role of sampling bias.
  • Adjustments to initialization and learning rates can significantly reduce the small-vs-large gap, highlighting the importance of parameter-wise interventions.
Read more
Most Transformer Modifications Still Do Not Transfer at 1-3B: A 2020-2026 Update to Narang et al. (2021) with Downstream Evaluation and a Noise Floor
Yang Zhao, Jiahao Lu, Bin Huang, Guhua Zhang, Jie Zhou
Large Language Models NLP Theory
  • Most Transformer modifications do not transfer effectively at larger scales (1-3B parameters).
  • Only two out of 20 modifications showed significant improvements at 1.2B, with one failing at 3B.
  • Downstream evaluation metrics are more reliable than pretraining perplexity for assessing model performance.
  • The gap between validation loss and downstream task accuracy has increased for attention-output modifications.
Read more
Fast Reconstruction of Exact Maxwell Dynamics from Sparse Data
Dan DeGenaro, Xin Li, Obed Amo, Michael Pokojovy, Sarah Adel Bargal, Markus Lange-Hegermann, Bogdan Raiţă
Theory Efficient ML Optimization
  • Introduction of FLASH-MAX, a shallow neural network that predicts electromagnetic fields from sparse data.
  • Each hidden neuron in FLASH-MAX represents an exact solution to Maxwell's equations, ensuring physical validity by construction.
  • Achieves sub-1% relative validation error from about 1,000 observations in seconds, with zero PDE residual.
  • Demonstrates that embedding governing structures into the model improves the trade-off between accuracy and optimization speed.
Read more