AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

70 Papers today
8h Update frequency
7 Days of history
Representation-Guided Discrete Molecular Graph Retrosynthesis
Jiahai Huang, Anjie Qiao, Zhen Wang, Defu Lian, Yutong Lu
Generative Models Graph Learning
  • Introduction of Graph-oriented Representation Guidance (GRG) for molecular graph retrosynthesis.
  • Systematic exploration of design choices for representation guidance, including teacher representations and alignment strategies.
  • GRG achieves state-of-the-art performance on USPTO-50k with improved accuracy and diversity.
  • Representation similarity can be used for lightweight inference-time verification, enhancing output quality.
Read more
Relative Repairability: A Calibration-Based Diagnostic for High-Sparsity Post-Pruning Allocation
Qishi Zhan, Liang He, Minxuan Hu, Ziheng Chen
Efficient ML
  • Introduces Relative Repairability (RR) as a diagnostic for evaluating pruning-induced damage and its recoverability.
  • RR is most effective near the recoverability transition, where traditional allocation methods lose reliability.
  • Demonstrates that RR can outperform existing sparsity allocation methods like ERK and LAMP in specific scenarios.
  • Highlights the importance of considering repairable damage in high-sparsity pruning strategies.
Read more
Hybrid Quantum-Classical Corrective Diffusion Modeling for Meteorological Downscaling
Rui Wang, Edoardo Pasetto, Amer Delilbasic, Morris Riedel, Kristel Michielsen, Gabriele Cavallaro
Generative Models Theory Efficient ML
  • Introduction of a hybrid quantum-classical model for meteorological downscaling.
  • Improved performance metrics (MAE and CRPS) compared to classical models.
  • Preservation of key wind field characteristics while enabling controlled changes.
  • Identification of limitations in real hardware deployment and generalization gaps.
Read more
Courtroom Analogy: New Perspective on Uncertainty-Aware Classification
Taeseong Yoon, Heeyoung Kim
Theory Interpretability Efficient ML
  • Introduces a courtroom analogy for uncertainty-aware classification, framing it as a debate among class advocates.
  • Models each advocate's opinion using Dirichlet distributions with interpretable parameters.
  • Proposes MoDEX, a neural architecture that efficiently predicts uncertainty while maintaining interpretability.
  • Demonstrates strong theoretical properties and state-of-the-art uncertainty quantification (UQ) performance across diverse benchmarks.
Read more
Global Convergence of Wasserstein Policy Gradient for Entropy-Regularized Reinforcement Learning
Zhaoyu Zhu, Rui Gao, Shuang Li
Reinforcement Learning Theory Optimization
  • Develops a global convergence theory for Wasserstein Policy Gradient in entropy-regularized RL.
  • Utilizes Bellman structure to replace traditional convexity arguments in convergence analysis.
  • Establishes relationships between value improvement, Fisher information, and KL divergence.
  • Demonstrates geometric convergence properties despite the non-convex nature of the RL objective.
Read more
The Quantization Benefits of Residual-Free Transformers
Yiping Ji, Mahalakshmi Sabanayagam, Peyman Moghadam, Hemanth Saratchandran, Simon Lucey
Efficient ML Optimization Large Language Models
  • Residual connections in transformers lead to non-Gaussian activations, increasing quantization error.
  • Residual-free transformers maintain near-Gaussian activations, resulting in improved robustness to low-bit quantization.
  • Orthogonal initialization and second-order optimization techniques can effectively train residual-free transformers.
  • The study reveals an accuracy-compressibility trade-off, suggesting architectural changes can enhance quantization performance.
Read more
An Effective-Rank Audit of Alignment-Induced Activation Shifts: Confound Control, Constructive Calibration, and Limits
Yuki Nakamura
NLP Large Language Models Theory
  • Introduces the effective rank of the alignment modification matrix as a continuous measure of activation shifts.
  • Demonstrates confound-controlled measurement to isolate contributions to activation shifts in LLMs.
  • Identifies the distinction between robust and brittle configurations in model calibration.
  • Critiques the limitations of rank-based diagnostics in assessing model safety.
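As a rough illustration of the measure named above: one common definition of effective rank (due to Roy and Vetterli) is the exponential of the Shannon entropy of the normalized singular values. Whether the paper uses exactly this definition is an assumption, and the function name `effective_rank` is hypothetical. The sketch takes singular values as input so it needs only the standard library.

```python
import math


def effective_rank(singular_values):
    """Exponential of the Shannon entropy of the normalized singular values.

    Assumption: this is the Roy-Vetterli definition, not necessarily the
    paper's. Zero singular values contribute nothing to the entropy.
    """
    positive = [s for s in singular_values if s > 0]
    total = sum(positive)
    if total == 0:
        return 0.0
    probs = [s / total for s in positive]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)
```

Four equal singular values give an effective rank of 4, while a single dominant one pulls the value toward 1, which is what makes the measure continuous rather than a hard count.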
Read more
A perspective on fluid mechanical environments for challenges in reinforcement learning
Shruti Mishra, Michael Chang, Vamsi Spandan, Shmuel M. Rubinstein
Reinforcement Learning
  • Fluid mechanics problems serve as a valuable testbed for developing RL agents.
  • Agents can leverage preserved representations in fluid dynamics to learn efficiently.
  • The paper outlines specific problem descriptions for RL agents in fluid environments.
  • Open-source simulators like Dedalus are highlighted for RL method development.
Read more
Aligning Molecular Graph Explanations with Chemical Identity via InChIfied Invariants
Emanuele Guidotti, Sara Puglioli
Graph Learning Interpretability
  • Introduction of InChIfied Invariants for molecular graph featurization.
  • Achieves 99.62% consistency in representations for chemically equivalent graphs.
  • Maintains predictive performance on MoleculeNet tasks while improving explanation consistency.
  • Quantitative analysis shows significant improvement in attribution consistency.
Read more
Graph-based Complexity Forecasts in UK En Route Airspace Using Relevant Aircraft Interactions
Edward Henderson, George De Ath, Nick Pepper
Graph Learning Time Series Optimization
  • Introduces a probabilistic approach to forecast airspace complexity using relevant aircraft interactions.
  • Implements a refined relevant aircraft filter algorithm tailored for the London Middle Sector.
  • Achieves improved prediction accuracy with an F1-score of 0.84, outperforming traditional models.
  • Forecasts ATCO workload up to 45 minutes in advance with significant correlation to actual interactions.
Read more
Hidden-State Privacy Has an Empty Middle
Alexander Okezue Bell
NLP Large Language Models Theory
  • None of the Gaussian release mechanisms tested achieved both moderate utility and privacy against adaptive attackers.
  • A diagonal inverse-Fisher mechanism is identified as the unique minimax-optimal solution in the Gaussian class.
  • A new split-memory transformer architecture outperforms existing models in both privacy and utility metrics.
  • The study highlights the need for redesigning models to achieve better privacy outcomes.
Read more
The Normalized Maximum Likelihood for Regular Non-Smooth Models: Measure-Theoretic Foundations and Geometric Sampling
Trenton Lau, Gary P. T. Choi
Theory Optimization Efficient ML
  • Establishes a well-posed NML stochastic complexity for path-differentiable Lipschitz estimators.
  • Introduces the Propose-and-Project Metropolis-Hastings (PDL-PPMH) sampler for non-differentiable models.
  • Demonstrates the method's robustness in high-dimensional Lasso regression problems.
  • Provides a data-efficient alternative to cross-validation using the exact NML criterion.
Read more
Synheart Capacity: A Theory-Driven Physiological Representation of Cognitive Capacity Dynamics from Wearable Signals
Yisak Debele, Henok Ademtew, Israel Goytom
Multimodal Theory Time Series
  • Introduces a theory-driven framework for modeling cognitive capacity dynamics using wearable physiological signals.
  • Demonstrates significant cross-individual generalization in estimating stress and effort states.
  • Enables differentiation between productive engagement and cognitive overload.
  • Highlights the potential for real-time monitoring of cognitive states to enhance adaptive system interactions.
Read more
Learning Permutation from Structure Without Supervision
Ran Eisenberg, Ofir Lindenbaum
Optimization Computer Vision Theory
  • Introduces a unified framework for unsupervised permutation learning using task-specific structural losses.
  • Develops an entropy-adaptive Gumbel-Sinkhorn method that modulates temperature based on assignment uncertainty.
  • Proposes a new interpretation of adaptive temperature control in terms of optimal transport over the Birkhoff polytope.
  • Demonstrates improved performance in permutation quality and training stability across various tasks.
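For readers unfamiliar with the Gumbel-Sinkhorn relaxation mentioned above, here is a minimal sketch with a fixed temperature. It perturbs a score matrix with Gumbel noise, divides by the temperature, and alternately normalizes rows and columns in log space so the result is close to doubly stochastic. The paper's entropy-adaptive temperature schedule is not reproduced; all function names are hypothetical.

```python
import math
import random


def logsumexp(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def sinkhorn(log_alpha, n_iters=50):
    """Alternate row and column normalization in log space."""
    n = len(log_alpha)
    la = [row[:] for row in log_alpha]
    for _ in range(n_iters):
        for i in range(n):  # make each row sum to 1
            lse = logsumexp(la[i])
            la[i] = [v - lse for v in la[i]]
        for j in range(n):  # make each column sum to 1
            lse = logsumexp([la[i][j] for i in range(n)])
            for i in range(n):
                la[i][j] -= lse
    return [[math.exp(v) for v in row] for row in la]


def sample_gumbel(rng):
    """Draw one standard Gumbel sample, guarding against log(0)."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def gumbel_sinkhorn(scores, temperature=1.0, n_iters=50, rng=None):
    """Soft permutation matrix from noisy scores at a fixed temperature."""
    if rng is None:
        rng = random.Random(0)  # fixed seed, for reproducibility only
    noisy = [[(s + sample_gumbel(rng)) / temperature for s in row]
             for row in scores]
    return sinkhorn(noisy, n_iters)
```

Lower temperatures push the output toward a hard permutation matrix but make the optimization landscape sharper, which is the trade-off an adaptive temperature is meant to manage.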
Read more
'Si'multaneous 'S'patial-'T'emporal Message Passing for Dynamic Graph Representation Learning
Shubhajit Roy, Anirban Dasgupta
Graph Learning
  • Introduces SiST-GNN, a new paradigm for dynamic graph representation learning that integrates spatial and temporal information simultaneously.
  • Addresses the architectural bottleneck in existing dynamic graph neural networks (DGNNs) that limits joint reasoning over topology and evolution.
  • Achieves state-of-the-art performance in link prediction tasks, outperforming prior methods by significant margins.
  • Demonstrates effectiveness in dynamic node classification tasks, matching or exceeding the performance of continuous-time models.
Read more
Polymorphism Is Rotation: Operational Mechanistic Interpretability from a Two-Layer Transformer to Pythia-70m
Jordan F. McCann
Interpretability Large Language Models Theory
  • Introduces the concept of 'polymorphism' in transformer models, highlighting the impact of random rotations on internal coordinates.
  • Demonstrates that a single orthogonal Procrustes rotation can align features between independently trained models without retraining.
  • Challenges the interpretation of high decoder-column cosine similarity as evidence of universality, revealing encoder failures.
  • Validates findings across different model sizes, confirming the robustness of the rotation phenomenon.
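To make the rotation-alignment idea above concrete, here is a toy sketch of the Procrustes problem in two dimensions, where the best rotation has a closed form. In general the orthogonal Procrustes solution comes from an SVD of the cross-covariance between the two feature sets; the 2D special case below is restricted to proper rotations and is an illustration only, not the paper's procedure. Function names are hypothetical.

```python
import math


def best_rotation_angle(source, target):
    """Angle of the 2D rotation minimizing sum ||R x_i - y_i||^2."""
    cross = sum(x1 * y2 - x2 * y1
                for (x1, x2), (y1, y2) in zip(source, target))
    dot = sum(x1 * y1 + x2 * y2
              for (x1, x2), (y1, y2) in zip(source, target))
    return math.atan2(cross, dot)


def rotate(points, angle):
    """Apply a counter-clockwise rotation by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


# Usage: recover a known rotation between two copies of the same points.
source = [(1.0, 0.0), (0.0, 1.0), (2.0, 1.0)]
target = rotate(source, 0.7)
recovered = best_rotation_angle(source, target)
aligned = rotate(source, recovered)
```

The point of the toy is that two coordinate systems can hold the same features and differ only by a rotation, so a single fitted rotation is enough to align them without any retraining.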
Read more
FastKernels: Benchmarking GPU Kernel Generation in Production
Gabriele Oliaro, Yichao Fu, May Jiang, Owen Lu, Junli Wang, Hao Zhang, Zhihao Jia, Samyam Rajbhandari
Large Language Models Optimization Efficient ML
  • FastKernels provides a benchmark-as-framework approach that integrates evaluation and deployment.
  • It features a compositional task hierarchy that allows for the reuse of optimizations across different levels.
  • The framework evaluates kernels with production baselines, capturing real-world tensor inputs and multi-GPU communication patterns.
  • FastKernels covers a wide range of architectures, ensuring broad applicability across various domains.
Read more
Knowledge Graph Modulated Deep Learning for Limited-Sample Clinical Data Analysis
Yuwei Xue, Sakib Mostafa, James Zou, Joseph Liao, Maximilian Diehn, Ash A. Alizadeh, Lei Xing, Md. Tauhidul Islam
Graph Learning
  • Introduction of Graph-in-Graph (GiG) framework for clinical data analysis.
  • GiG allows integration of patient-specific gene expression data with biological knowledge graphs.
  • Significant performance improvements in clinical tasks, particularly in limited-sample settings.
  • The prostate cancer diagnosis task shows an improvement of up to 49 percentage points in macro-F1 score.
Read more
Active Learning for Stochastic Contextual Linear Bandits
Emma Brunskill, Ishani Karmarkar, Zhaoqi Li
Reinforcement Learning Theory Efficient ML
  • Introduces an active learning framework for stochastic contextual linear bandits.
  • Demonstrates that active context sampling can improve sample efficiency over passive methods.
  • Provides theoretical guarantees showing performance improvements by a factor of √d.
  • Empirical results validate the effectiveness of the proposed algorithm in real-world applications.
Read more
Feature Learning in Wide Neural Networks under μP: Identifiability and Sparse-Dictionary Decomposition of the Mean-Field Limit
Akmal Xodarev
Theory
  • Establishes global existence and uniqueness of mean-field limit under µP.
  • Characterizes identifiability of network functions based on active components.
  • Describes sparse-dictionary decomposition of long-time limit measures.
  • Derives total feature-learning-error decomposition into various components.
Read more
Open Multimodal Datasets and Open-Source Software for Data-Driven Modeling of Multiphase Transport and Thermal Systems
Christy Dunlap, Hari Pandey, Stephen Pierson, Daniel Curl, Braden Stevens, Mohammad Ishraq Hossain, Annapurna Parjuli, Chinmaya Joshi, Han Hu
Multimodal Time Series Computer Vision
  • Introduction of the S+TD framework for classifying thermal-fluid datasets.
  • Organization of public NED3 datasets for easier access and usability.
  • Development of open-source software packages to support diverse data analysis tasks.
  • Emphasis on SeqReg for sequence regression applications in thermal-fluid systems.
Read more
Iterative Refinement Neural Operators are Learned Fixed-Point Solvers: A Principled Approach to Spectral Bias Mitigation
Xiaotian Liu, Shuyuan Shang, Xiaopeng Wang, Pu Ren, Yaoqing Yang
Theory Optimization Efficient ML
  • Introduction of the Iterative Refinement Neural Operator (IRNO) to mitigate spectral bias in neural operators.
  • Establishment of a contraction-based analysis for IRNO, ensuring convergence to a unique fixed point.
  • Demonstration of consistent error reduction across multiple physical systems and tasks.
  • Effective spectral bias mitigation and cross-operator transferability, enhancing the versatility of neural operators.
Read more
Continuous-Depth Field Theory for Transformer Patching and Mechanistic Interpretability
David N. Olivieri, Antonio F. Pérez Rodríguez
NLP Large Language Models Interpretability
  • Introduces a field-theoretic framework for Transformer patching.
  • Empirically tests the framework on GPT-2-style models.
  • Establishes a bounded local linear regime for patch effects.
  • Demonstrates structured anisotropic propagation of interventions.
Read more
Looped Diffusion Language Models
Sanghyun Lee, Chunsan Hong, Seungryong Kim, Jonghyun Lee, Jongho Park, Dongmin Park
NLP Generative Models Efficient ML
  • LoopMDM improves training efficiency by reducing FLOPs without adding parameters.
  • Selective looping enhances performance on reasoning tasks, outperforming non-looped models.
  • Adaptive loop counts during inference lead to further gains in compute efficiency.
  • Attention analysis reveals increased interactions among masked positions with looping.
Read more
Anytime Training with Schedule-Free Spectral Optimization
Anuj Apte, Pranav Deshpande, Niraj Kumar, Shouvanik Chakrabarti, Junhyung Lyle Kim
Optimization Theory Large Language Models
  • SF-NorMuon outperforms SF-AdamW and matches tuned AdamW baselines.
  • The method enables high-quality checkpoints at any training point without a fixed horizon.
  • Theoretical guarantees for stationarity and the importance of weight decay for stability are provided.
  • The approach addresses the optimization challenges in continual learning scenarios.
Read more
Active Query Synthesis for Preference Learning
Namrata Nadagouda, Nauman Ahad, Maegan Tucker, Mark A. Davenport
Optimization Robotics Efficient ML
  • Introduction of a confidence-aware response model for pairwise comparisons to enhance query reliability.
  • Development of Info-Synth, an active query synthesis framework that generates optimal continuous queries.
  • Two approximation strategies, Pair M-dist and Pair Opt-dist, for effective query selection in finite pools.
  • Empirical validation of the framework across synthetic data, real-world preference learning, and robotic gain tuning.
Read more
JacQuant: STE-Free Quantization-Aware Training via Learned Jacobian Surrogates
Kai Yi, Vignesh Vivekraja, Harshit Khaitan, Steven Li
Large Language Models Optimization Efficient ML
  • JacQuant replaces the STE's identity Jacobian with a learned surrogate sensitivity map, improving gradient alignment.
  • The framework is computationally efficient, with negligible overhead and compatibility with existing quantization methods.
  • Theoretical convergence guarantees are provided for non-convex objectives under the learned sensitivity model.
  • Empirical results show significant accuracy improvements in ultra-low-bit quantization scenarios on LLM benchmarks.
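For context on the baseline being replaced, here is a minimal sketch of the straight-through estimator (STE) on a one-weight model. It shows only the standard STE, not JacQuant's learned surrogate. A uniform quantizer has zero derivative almost everywhere, so the STE treats the quantizer's derivative as 1 in the backward pass. The function names, step size, and learning rate are illustrative assumptions.

```python
def quantize(w, step=0.5):
    """Round w to the nearest multiple of `step` (uniform quantizer)."""
    return round(w / step) * step


def ste_grad(w, x, y, step=0.5):
    """Gradient of (quantize(w) * x - y)**2 with respect to w under the STE.

    The true derivative of quantize is 0 almost everywhere; the STE
    substitutes the identity, i.e. d quantize / d w := 1.
    """
    residual = quantize(w, step) * x - y
    return 2.0 * residual * x  # implicitly multiplied by the surrogate 1.0


def train(w, x, y, lr=0.1, n_steps=50, step=0.5):
    """Plain gradient descent on the latent weight using STE gradients."""
    for _ in range(n_steps):
        w -= lr * ste_grad(w, x, y, step)
    return w
```

Starting from `w = 0.0` with `x = 1.0` and `y = 1.0`, the latent weight keeps moving even while its quantized value is stuck at a grid point, which is how the STE lets training cross quantization boundaries. A learned surrogate would replace the constant 1 with a data-dependent sensitivity.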
Read more
Building a privacy-preserving Federated Recommender system for mobile devices
Aasheesh Singh
Federated Learning
  • Introduction of a federated learning approach to enhance user privacy in mobile recommendations.
  • Development of a two-stage recommendation pipeline for candidate generation and ranking.
  • Implementation of the system on Kotlin Multiplatform for cross-platform deployment.
  • Utilization of diverse datasets for system validation and performance evaluation.
Read more
Trajectory-Based Difficulty Scoring for Reliable Learning on Tabular Data
Tomer Lavi, Bracha Shapira, Nadav Rappoport
Interpretability
  • Introduction of Trajectory-based Difficulty Score (TDS) for estimating instance difficulty in boosted ensembles.
  • TDS utilizes interpretable trajectory descriptors to predict held-out loss effectively.
  • Shows strong rank correlation with error and outperforms existing baselines in classification tasks.
  • TDS enhances active learning, selective prediction, and conformal prediction workflows.
Read more
On the Impact of Class Imbalance on the Learning Dynamics of Deep Neural Networks: An Intuitive Insight
Ismail B. Mustapha, Shafaatunnur Hasan, Sunday O. Olatunji, Hatem S. Y. Nabus
Theory
  • Class imbalance severely impacts the learning dynamics of DNNs, leading to underfitting of minority classes.
  • DNNs initially focus on the majority class, which results in poor performance on minority class samples.
  • Even when minority samples are learned, the representations are often overfitted and non-generalizable.
  • A systematic investigation of learning patterns is essential for developing effective methods to address class imbalance.
Read more
Batch Normalization Amplifies Memorization and Privacy Risks
Ngoc Phu Doan, Chongyan Gu, Ihsen Alouani
Theory Optimization
  • Batch Normalization significantly increases the memorization of outlier samples in deep neural networks.
  • Models with Batch Normalization show higher susceptibility to membership inference attacks, indicating privacy risks.
  • The study employs a multifaceted approach, combining empirical experiments and theoretical analysis.
  • Theoretical insights reveal that BN amplifies the influence of outlier samples during training.
Read more
GEMQ: Global Expert-Level Mixed-Precision Quantization for MoE LLMs
Jianing Deng, Song Wang, Dongwei Wang, Zijie Liu, Tianlong Chen, Huanrui Yang, Jingtong Hu
Large Language Models Efficient ML NLP
  • GEMQ introduces a global approach to mixed-precision quantization for MoE-LLMs, improving expert importance estimation.
  • The method includes a linear programming formulation for optimal bit-width allocation based on quantization error analysis.
  • Router fine-tuning is employed to adapt to quantized experts, enhancing expert selection accuracy.
  • GEMQ is integrated into a progressive quantization framework, refining expert importance estimation iteratively.
Read more
Merge-Bench: Resolve Merge Conflicts with Large Language Models
Benedikt Schesch, Michael D. Ernst
Large Language Models Reinforcement Learning Optimization
  • Introduction of Merge-Bench, a scalable dataset for merge conflict resolution.
  • Development of LLMergeJ, the first model trained with online reinforcement learning for this task.
  • LLMergeJ outperforms three commercial LLMs in resolving Java merge conflicts.
  • The paper proposes a test-free evaluation paradigm to enhance the reliability of model assessments.
Read more
Capture-Calibrate-Coach: A Graph-Based Framework for Knowledge Monitoring Estimation and Adaptive Feedback
Gen Li, Li Chen, Cheng Tang, Boxuan Ma, Yuncheng Jiang, Daisuke Deguchi, Takayoshi Yamashita, Atsushi Shimada
Graph Learning
  • Introduces a novel framework for knowledge monitoring in educational contexts.
  • Utilizes heterogeneous graph neural networks for inferring latent perceived states.
  • Classifies learners into metacognitive patterns to deliver personalized feedback.
  • Demonstrates high accuracy in predicting learners' perceived knowledge states.
Read more
DiLaDiff: Distilled Latent-Augmented Diffusion for Language Modeling
Jean-Marie Lemercier, Tomas Geffner, Karsten Kreis, Morteza Mardani, Arash Vahdat, Ante Jukić
NLP Generative Models Large Language Models
  • DiLaDiff improves sampling quality and throughput in language modeling by addressing token correlation issues.
  • The model integrates a continuous latent space with a latent diffusion model and a consistency distillation approach.
  • DiLaDiff achieves significant speed improvements, outperforming masked diffusion baselines while maintaining quality.
  • The distillation process allows for efficient latent variable sampling, reducing computational overhead.
Read more
An Open-Source Training Dataset for Foundation Models for Black-box Optimization
Aaron Klein, Herilalaina Rakotoarison, Luca Thale-Bombien, David Salinas
Optimization
  • Introduction of BBO-Pile, the largest open-source dataset for black-box optimization.
  • Dataset consists of 557,100 optimization trajectories from 6 optimizers across 3,095 black-boxes.
  • Foundation models trained on this dataset demonstrate effective imitation of existing optimization methods.
  • Scaling behavior of models is analyzed with respect to parameter count and token budget.
Read more
On the Stability and Realizability of Recurrent Polynomial Surrogate Ternary Logic Gate Networks
Sai Sandeep Damera, Ryan Matheu, Aniruddh G. Puranic, John S. Baras, Calin Belta
Theory Time Series Robotics
  • Introduction of R-DTLGN, a recurrent architecture that operates in Kleene's three-valued logic.
  • Establishment of structural guarantees for graceful degradation under input uncertainty.
  • Development of a novel hardening routine for converting learned networks into logic circuits.
  • Demonstration of principled abstention and input certainty monotonicity in R-DTLGN.
Read more
LLMTabBench: Evaluating LLMs on Binary Tabular Classification From Zero to Few Shots
Daria Grushina, Kseniia Kuvshinova, Alina Kostromina, Aziz Temirkhanov, Mile Mitrovic, Dmitry Simakov
Large Language Models NLP Efficient ML
  • LLMTabBench is introduced as a benchmark for evaluating LLMs in tabular classification with limited labeled data.
  • LLMs demonstrate strong performance in zero-shot settings, often surpassing few-shot learning models.
  • Incorporating few-shot examples can conflict with LLMs' prior knowledge, potentially degrading performance.
  • A data complexity threshold exists, beyond which LLM performance declines and few-shot examples are less effective.
Read more
Test-Time Training Undermines Safety Guardrails
Simone Antonelli, Sadegh Akhondzadeh, Aleksandar Bojchevski
NLP Large Language Models Generative Models
  • Test-time training (TTT) introduces new vulnerabilities that can be exploited to bypass safety filters in models.
  • Three threat models for TTT are identified: self-supervised, few-shot, and generation-phase.
  • TTT can significantly increase the Attack Success Rate (ASR), with rates up to 95% in certain configurations.
  • Standard evaluation methods may overestimate ASR due to TTT-induced overfitting, necessitating a validity-aware evaluation approach.
Read more
Predicting Stock Price Direction on Earnings Announcement Days using Multi-modal Deep Learning
Manuel Noseda, Nathan Soldati, Marco Paina
Multimodal Time Series NLP
  • The study integrates multiple data modalities: firm fundamentals, market dynamics, and news sentiment.
  • LSTM and Transformer models are evaluated against a logistic regression baseline for predicting stock price direction on earnings announcement (EA) days.
  • The Transformer model outperforms the LSTM in identifying volatile price movements, achieving a higher macro F1-score.
  • Incorporating sentiment analysis from financial news significantly improves prediction accuracy.
Read more
Complete-muE: Optimal Hyperparameter Transfer and Scaling for MoE Models
Hongwu Peng, Ohiremen Dibua, Yuanjun Xiong, Yifan Gong, Jianming Zhang, Yan Kang
NLP Large Language Models Generative Models
  • Complete-muE provides a systematic approach for hyperparameter transfer across dense FFN and MoE models.
  • The framework utilizes a two-bridge system to address the complexities of architecture and token count changes.
  • Empirical results show that hyperparameters tuned on a dense model can be effectively transferred to various MoE configurations.
  • The method enables significant convergence speedups in MoE models, reducing the need for extensive hyperparameter searches.
Read more
Push Your Agent: Measuring and Enforcing Quantitative Goal Persistence in Long-Horizon LLM Agents
Yuandao Cai, Yuzhang Zhu, Liyou Gao, Wensheng Tang, Shengchao Qin
NLP Large Language Models
  • Introduces Quantitative Goal Persistence (QGP) as a new evaluation target for language agents.
  • Develops PushBench, a benchmark for measuring agent performance in long-horizon tasks.
  • Evaluates controller-level interventions that improve task completion rates and reduce errors.
  • Demonstrates that traditional success metrics can obscure important failure modes in agent performance.
Read more
Grow-Prune-Freeze Networks: Adaptive & Continual Learning Technique for Olfactory Navigation
Kordel K. France, Ovidiu Daescu
Robotics Reinforcement Learning Theory
  • Introduction of GPF networks for continual learning in dynamic environments.
  • Theoretical grounding in random matrix theory to ensure stability in learning.
  • Empirical success in olfactory navigation tasks with a 94% success rate.
  • Generalization of GPFs to other machine learning tasks.
Read more
Learned Relay Representations for Forward-Thinking Discrete Diffusion Models
Benjamin Rozonoyer, Jacopo Minniti, Dhruvesh Patel, Neil Band, Avishek Joey Bose, Tim G. J. Rudner, Andrew McCallum
NLP Large Language Models Generative Models
  • Introduction of Learned Relay Representations (Relay) for masked diffusion models (MDMs).
  • Relay allows MDMs to retain and utilize latent information across decoding steps.
  • Demonstrated effectiveness on a Sudoku-based planning task and Fast-dLLM v2.
  • Outperformed standard supervised finetuning on coding tasks with reduced inference latency.
Read more
Complement Submodular Information Measures for Balanced and Robust Data Selection
Rishabh Iyer
Optimization Theory
  • Introduction of Complement Submodular Information (CSI) as a new class of submodular objectives.
  • Derivation of complement-aware variants of classical submodular functions with theoretical guarantees.
  • Empirical evidence showing improved performance of CSI objectives in robust subset selection tasks.
  • CSI objectives effectively preserve structural balance across selected subsets and their complements.
Read more
Don't Retrain, Just Reuse: Recovering Dual-Target Molecules from Single-Target Diffusion Models
Qingyuan Zeng, Pengxiang Cai, Zixin Guan, Ziyang Chen, Anglin Liu, Lang Qin, Xinyao Lai, Jintai Chen
Generative Models Optimization
  • Introduces a novel method for dual-target molecular design without retraining generative models.
  • Formulates the problem as a constrained multi-objective optimization over the input space of a frozen model.
  • Proposes REUSE, an evolutionary input-space search framework that balances efficiency and quality.
  • Achieves a 20.9-percentage-point improvement in Dual High Affinity compared to the best prior methods.
Read more
Any-Dimensional Invariant Universality
Shengtai Yao, Eitan Levin, Mateo Díaz
Theory Graph Learning Efficient ML
  • Develops a systematic approach to establish any-dimensional universality in machine learning models.
  • Identifies the importance of an infinite-dimensional limit space for analyzing universality across varying input sizes.
  • Critiques existing architectures for their failure to achieve universality and proposes modifications to restore it.
  • Highlights the role of symmetries and norm choices in proving universality in infinite-dimensional spaces.
Read more
Joint Optimization of Training and Inference in Federated Edge Learning via Constrained Multi-Objective Deep Reinforcement Learning
Zhen Li, Jun Cai, Chao Yang, Haoran Gao
Federated Learning Reinforcement Learning Optimization
  • Introduces a joint optimization framework for federated training and inference in federated edge learning (FEEL) systems.
  • Develops a tandem-queue-inspired mechanism to link inference requests with training data.
  • Proposes the C-MOPPO algorithm to address the NP-hard multi-objective optimization problem.
  • Demonstrates significant performance improvements over baseline methods in terms of accuracy, latency, and energy consumption.
Read more
Scale When Needed: Adaptive Neuron-level Mixed Precision Quantization Aware Training
Ayush K. Varshney, Konstantinos Vandikas, Šarūnas Girdzijauskas, Adam Orucu, Aneta Vulgarakis Feljan
Efficient ML
  • NMP-QAT allows each neuron to learn its own quantization precision, enhancing adaptability.
  • The framework supports both weights-only and weights + activations quantization.
  • Empirical results show improved compression-accuracy trade-offs over traditional mixed-precision methods.
  • NMP-QAT is designed for efficient deployment on resource-constrained 6G edge devices.
Read more
Disentangled Double Machine Learning for Accurate Causal Effect Estimation
Guodu Xiang, Kui Yu, Yujie Wang, Richang Hong, Fuyuan Cao, Jiye Liang
Theory
  • DDML improves causal effect estimation by addressing confounding bias more effectively than traditional methods.
  • The Causal Role Disentanglement strategy enhances nuisance function estimation by separating covariates into distinct causal roles.
  • The Residual Dependence Orthogonalization strategy mitigates residual dependence, improving the precision of causal estimates.
  • Extensive experiments show that DDML outperforms existing algorithms in various datasets, indicating its robustness.
Read more
On the Benefits of Free Exploration for Regret Minimization in Multi-Armed Bandits
Yunlong Hou, Zixin Zhong, Vincent Y. F. Tan
Reinforcement Learning Robotics Theory
  • Introduces a new framework for regret minimization that includes a free exploration phase.
  • Develops the UFE-KLUCB-H algorithm, combining free exploration and regret minimization strategies.
  • Establishes logarithmic scaling of the free exploration budget with respect to the time horizon.
  • Demonstrates significant regret savings through simulations and theoretical analysis.
Read more
It's the humans, not the data: Geopolitical bias in LLMs originates in post-training, amplified by the language of the prompt
Stuart Bladon, Brinnae Bent
Large Language Models NLP
  • Geopolitical bias in LLMs originates from post-training, not pre-training.
  • The language of the prompt significantly amplifies biases in LLM responses.
  • Six out of seven models showed bias shifts towards the preferences of their developers after post-training.
  • The study utilized a paired-scenario forced-choice probe across multiple languages.
Read more
Spectral Probe-Circuits: A Three-Step Recipe for Identifying Attention-Head Circuits in Pretrained Transformers
Yongzhong Xu
NLP Large Language Models Interpretability
  • Introduces a three-step recipe for identifying attention-head circuits in transformers.
  • Demonstrates that a small induction circuit is necessary in all tested models.
  • Shows that the per-head PR signal can predict seed-specific circuits without task labels.
  • Finds a consistent fraction of heads (17-19%) engaged in specialized computation across different model sizes.
Read more
Approaching I/O-optimality for Approximate Attention
Pál András Papp, Aleksandros Sobczyk, Anastasios Zouzias
NLP Large Language Models Efficient ML
  • Introduces a technique for computing attention with I/O costs that depend almost-linearly on sequence length.
  • Establishes tight and nearly-tight bounds for I/O complexity across different parameter regimes.
  • Demonstrates that the proposed method outperforms existing algorithms like FlashAttention in terms of I/O efficiency.
  • Categorizes I/O complexity based on the interplay between fast memory size, feature dimension, and polynomial degree.
Read more
Automated Random Embedding for Practical Bayesian Optimization with Unknown Effective Dimension
Hong Qian, Xiang Shu, Xiang Xia, Xuhui Liu, Yangde Fu, Bei Liang, Huibin Wang, Liang Dou
Optimization
  • DSEBO automatically adjusts subspace dimensions during optimization, addressing the challenge of unknown effective dimensions.
  • The shared embedding technique enhances initialization and convergence in higher-dimensional spaces.
  • Theoretical analysis establishes a regret bound, indicating improved performance over traditional methods.
  • Extensive experiments confirm DSEBO's superiority in high-dimensional optimization tasks.
Read more
SeqRoute: Global Budget-Aware Sequential LLM Routing via Offline Reinforcement Learning
Zhongling Xu, Shunan Zheng, Wei Wang
Large Language Models Reinforcement Learning Optimization
  • SeqRoute addresses the limitations of traditional LLM routing by incorporating sequential decision-making under budget constraints.
  • Hindsight Budget Relabeling (HBR) generates a large dataset enriched with bankruptcy signals, overcoming data starvation.
  • Conservative Q-Learning (CQL) is used to ensure safe exploration and to keep the model from making costly decisions in budget-tight situations.
  • The λ-sweep mechanism enables a single policy to navigate the cost-quality Pareto frontier dynamically during deployment.
Read more
The Readout Shortcut: Positional Number Copying Dominates Arithmetic CoT Readout in Small Language Models
Ming Liu
NLP Large Language Models
  • CoT prompting is crucial for arithmetic tasks in small language models, but the order of steps is less important than previously believed.
  • The study identifies a positional readout mechanism where models copy the trailing number before the answer delimiter, leading to high accuracy.
  • Gold-answer presence significantly boosts model accuracy, indicating a reliance on positional copying over logical reasoning.
  • Different models exhibit varying degrees of content gating and distractor handling, with implications for their architecture-specific behaviors.
Read more
ThriftAttention: Selective Mixed Precision for Long-Context FP4 Attention
Joe Sharratt
NLP Large Language Models Efficient ML
  • ThriftAttention achieves near-FP16 quality at FP4 inference efficiency.
  • The method selectively computes important query-key interactions in higher precision.
  • It recovers up to 89.1% of the quality gap between FP4 and FP16 with only 5% of blocks in FP16.
  • The performance advantage increases with longer context lengths.
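
A "share of the quality gap recovered" figure such as the 89.1% above is usually computed as the method's improvement over the low-precision baseline divided by the full gap between the two baselines. The sketch below assumes that definition; the function name and the scores in the usage note are made up for illustration and are not taken from the paper.

```python
def gap_recovered(score_low, score_high, score_method):
    """Fraction of the gap between a low-precision and a high-precision baseline
    that a method closes. Works for higher-is-better and lower-is-better metrics,
    because the sign of the gap cancels in the ratio."""
    gap = score_high - score_low
    if gap == 0:
        raise ValueError("the two baselines have the same score, so there is no gap")
    return (score_method - score_low) / gap
```

With hypothetical scores of 60.0 for FP4, 70.0 for FP16, and 68.91 for the mixed-precision method, `gap_recovered(60.0, 70.0, 68.91)` evaluates to about 0.891.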
Read more
MindAlign: Bridging EEG, Vision, and Language for Zero-Shot Visual Decoding
Zexuan Chen, Sichao Liu, Runhao Lu, Huichao Qi, Alexandra Woolgar, Xi Vincent Wang, Lihui Wang
Multimodal
  • Introduction of a tri-modal EEG-image-text alignment framework for visual decoding.
  • Significant improvement in decoding accuracy on the Things-EEG2 benchmark compared to prior methods.
  • Utilization of a masked autoencoder for pre-training the EEG encoder, enhancing performance.
  • Demonstration that compact embedding spaces outperform larger models in EEG-to-image retrieval.
Read more
Steered Generation via Gradient-Based Optimization on Sparse Query Features
Sumanta Bhattacharyya, Pedram Rooshenas
NLP Large Language Models Optimization
  • Introduction of Prototype-Based Sparse Steering for controlled text generation.
  • Demonstration of attention query activations as a high-fidelity control mechanism.
  • Validation through experiments in both rigid planning and educational feedback contexts.
  • Establishment of sparse query representations for improved interpretability and control.
Read more
Cluster Frequency Conformal Prediction for Local Coverage
Tomer Lavi, Bracha Shapira, Nadav Rappoport
Theory Interpretability Efficient ML
  • CFCP improves classwise reliability in many-class classification by utilizing local cluster-level label frequencies.
  • The framework preserves standard conformal prediction validity while adapting to local representation structures.
  • CFCP achieves superior class coverage in 15 out of 16 dataset comparisons against strong baselines.
  • The method is computationally efficient and maintains competitive prediction set sizes.
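
As background for the validity guarantee mentioned above, here is a minimal sketch of standard split conformal prediction for classification. It is the generic baseline procedure, not CFCP itself: the cluster-level label frequencies are not modeled, and the function names and the `1 - p(true class)` score are assumptions chosen for illustration.

```python
import math

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Score threshold for target miscoverage alpha from a held-out calibration set.

    cal_probs: list of per-class probability lists; cal_labels: list of true class ids.
    """
    # Nonconformity score: one minus the probability assigned to the true class.
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if rank > n:
        return float("inf")  # too few calibration points: every class is included
    return scores[rank - 1]

def prediction_set(probs, threshold):
    """All classes whose nonconformity score is within the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= threshold]
```

Confident inputs yield small sets and ambiguous inputs yield larger ones. The guarantee is marginal over all test points, which is the gap that classwise and local methods aim to narrow.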
Read more
Deployment-complete benchmarking
El Mustapha Mansouri, Keigo Arai
Theory
  • Deployment-complete benchmarking ensures that benchmark evidence directly supports deployment actions.
  • Mixed fibers indicate missing deployment information, which can lead to unresolved actions.
  • Completion curves quantify the evidence needed to resolve ambiguities in deployment decisions.
  • Traditional benchmarks often fail to provide reliable deployment guidance despite high scores.
Read more
Metropolis-Scale Resilient and Trustworthy Traffic Flow Inference Using Multi-Source Data
Qishen Zhou, Yifan Zhang, Michail A. Makridis, Anastasios Kouvelas, Yibing Wang, Simon Hu
Time Series Graph Learning Optimization
  • Introduction of TA-ANP framework for traffic state inference.
  • Effective fusion of floating car data and fixed-detector measurements.
  • Unified approach to handle multiple traffic inference sub-tasks.
  • Robust uncertainty quantification using neural processes and Monte Carlo Dropout.
Read more
Parametric Prior Mapping Framework for Non-stationary Probabilistic Time Series Forecasting
Jinglin Li, Jun Tan, Qi Fang, Ning Gui
Time Series Generative Models Optimization
  • PPM synergizes parametric estimation with deep generative modeling to capture non-stationary dynamics.
  • Introduces a parametric push-forward mechanism for deriving adaptive conditional priors.
  • Utilizes a hybrid objective combining NLL and MSE for training stability and distributional fidelity.
  • Empirical results show up to 31.2% reduction in CRPS and 44.3% in QICE compared to state-of-the-art models.
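
To make the hybrid objective concrete, the sketch below combines a negative log-likelihood term with a mean-squared-error term on the predicted mean. The Gaussian likelihood, the additive weighting, and the `mse_weight` parameter are assumptions for illustration; the summary does not state the paper's distribution family or weighting.

```python
import math

def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of y under a Normal(mu, sigma^2) forecast."""
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)

def hybrid_loss(ys, mus, sigmas, mse_weight=1.0):
    """Mean NLL plus a weighted MSE on the predicted means (hypothetical weighting)."""
    n = len(ys)
    nll = sum(gaussian_nll(y, m, s) for y, m, s in zip(ys, mus, sigmas)) / n
    mse = sum((y - m) ** 2 for y, m in zip(ys, mus)) / n
    return nll + mse_weight * mse
```

The NLL term rewards calibrated spread. The MSE term keeps the mean anchored even when the predicted variance is large, which is one plausible reading of the training-stability claim.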
Read more
BandVQ: Band-Wise Vector-Quantized EEG Foundation Model
Jamiyan Sukhbaatar, Satoshi Imamura, Toshihisa Tanaka
Time Series
  • Introduction of BandVQ, a band-wise vector-quantized EEG foundation model.
  • Utilization of independent VQ-VAE tokenizers for each EEG frequency band.
  • Incorporation of metadata conditioning and region-based masking to enhance model performance.
  • Pretraining on a large-scale EEG dataset with over 9,200 subjects.
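
The band-wise tokenization idea can be illustrated with plain nearest-codeword vector quantization, where each frequency band has its own codebook. This is a sketch of the quantization step only, with no encoder, decoder, or training. The band names, function names, and codebook values in the usage note are hypothetical.

```python
def quantize(vector, codebook):
    """Return (index, codeword) of the codebook entry nearest to vector in squared L2."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    index = min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))
    return index, codebook[index]

def tokenize_bands(band_features, band_codebooks):
    """Map each band's feature vector to a token id using that band's own codebook."""
    return {
        band: quantize(vec, band_codebooks[band])[0]
        for band, vec in band_features.items()
    }
```

For example, with codebooks `{"alpha": [[0.0, 0.0], [1.0, 1.0]], "beta": [[5.0, 5.0], [-5.0, -5.0]]}`, the features `{"alpha": [0.9, 1.2], "beta": [-4.0, -6.0]}` map to one token per band, each chosen independently of the other band.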
Read more
Reflex: Reinforcement Learning with Reflection Symmetry Exploitation in State-Based Continuous Control
Shuai Zhen, Yifan Zhang, Yuling Wang, Yanhua Yu
Reinforcement Learning Robotics Efficient ML
  • Reflex introduces a reflection symmetry-aware learning paradigm for state-based continuous control tasks.
  • The paper formalizes axial and bilateral reflection types and their transformations.
  • A symmetry regularization term is proposed to enhance policy learning by encouraging invariance under reflection transformations.
  • Reflex is integrated with both on-policy (PPO) and off-policy (SAC) RL algorithms.
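
One common way to write a reflection-symmetry regularizer is to penalize the difference between the policy's action in a mirrored state and the mirrored version of its action in the original state. The sketch below assumes that form, and assumes a reflection is a sign flip on selected coordinates. Neither is confirmed by the summary, and all names are illustrative.

```python
def reflect(vector, flip_indices):
    """Mirror a vector by negating the coordinates listed in flip_indices."""
    return [-x if i in flip_indices else x for i, x in enumerate(vector)]

def symmetry_penalty(policy, states, state_flips, action_flips):
    """Mean squared gap between policy(mirror(s)) and mirror(policy(s)).

    policy: callable mapping a state (list of floats) to an action (list of floats).
    """
    total = 0.0
    for s in states:
        mirrored_action = policy(reflect(s, state_flips))
        expected = reflect(policy(s), action_flips)
        total += sum((a - b) ** 2 for a, b in zip(mirrored_action, expected))
    return total / len(states)
```

Such a penalty would be added to the usual PPO or SAC loss with a small coefficient. It is zero exactly when the policy commutes with the reflection on the sampled states.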
Read more
TGFormer: Towards Temporal Graph Transformer with Auto-Correlation Mechanism
Hongjiang Chen, Pengfei Jiao, Ming Du, Xuan Guo, Zhidong Zhao, Di Jin, Xiao Liu
Graph Learning Time Series
  • TGFormer treats temporal graph learning as a time-series analysis problem, enhancing the modeling of complex temporal patterns.
  • The Series Transformer layer effectively captures long-term dependencies using a Transformer-based architecture.
  • The auto-correlation mechanism (ACoM) captures periodic patterns with reduced computational complexity.
  • TGFormer outperforms existing state-of-the-art methods in precision across multiple datasets.
Read more
When Determinants Are Not Enough: Private Rare Switching
Xingyu Zhou
Reinforcement Learning Theory Optimization
  • The standard rare-switching update rule fails under Gaussian noise due to loss of monotonicity.
  • A new rare-switching rule based on the generalized Rayleigh quotient is proposed.
  • The new rule allows for logarithmic policy updates with controlled regret in private settings.
  • The paper includes a cleaned-up proof of the new rule and its implications.
Read more
Filtered Posterior Mean Collections: A Unified Framework for Analytical Models of Diffusion Generalization
Matthew Niedoba, Berend Zwartsenberg, Frank Wood
Generative Models Computer Vision
  • Introduction of Filtered Posterior Mean Collections (FPMCs) as a unified framework for analytical denoisers.
  • Identification of three principal design axes (query precision, response weights, source distributions) that differentiate prior methodologies.
  • Demonstration of improved performance through soft relaxations and data augmentation strategies.
  • Achieves state-of-the-art sample similarity on the CIFAR-10, FFHQ 64 × 64, and AFHQ 64 × 64 datasets.
Read more
RelPrism: A Multi-Faceted Pre-training Framework with Self-Generated Tasks for Relational Databases
Jinyu Yang, Cheng Yang, Junze Chen, Zedi Liu, Muhan Zhang, Hanyang Peng, Chuan Shi
Graph Learning
  • RelPrism addresses the limitations of existing self-supervised learning methods for relational databases by incorporating multi-faceted information.
  • The framework constructs intrinsic, relational, and hybrid attributes, allowing for a broader perspective during pre-training.
  • Experiments show that RelPrism outperforms state-of-the-art methods, improving ROC-AUC by 4.15% for classification tasks and reducing MAE by 10.75% for regression tasks.
  • The methodology emphasizes the importance of multi-granularity clustering to form pseudo-task pools for effective representation learning.
Read more