AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

59 Papers today
8h Update frequency
7 Days of history
From Uncertain Judgments to Calibrated Rankings: Conformal Elo Estimation for LLM Evaluation
Bora Kargi, David Salinas
Large Language Models NLP Theory
  • Introduces a low-cost evaluation framework for LLMs that avoids reliance on large-scale human annotations.
  • Implements calibrated win probabilities to improve Elo estimation accuracy significantly.
  • Utilizes split conformal prediction to provide distribution-free uncertainty bounds for Elo ratings.
  • Achieves a mean absolute error of 17.9 Elo on held-out models compared to human-derived ratings.
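As a rough illustration of the two ingredients named above, the sketch below pairs the standard Elo win-probability formula with a generic split conformal interval. It is not the paper's pipeline: the calibration ratings are made-up numbers, and `split_conformal_halfwidth` is a hypothetical helper name.

```python
import math

def elo_win_prob(r_a, r_b):
    """Standard Elo expected score of A against B (400-point logistic scale)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

def split_conformal_halfwidth(cal_pred, cal_true, alpha=0.1):
    """Half-width q so that [pred - q, pred + q] covers a new exchangeable
    point with probability at least 1 - alpha (absolute-residual scores)."""
    scores = sorted(abs(p - t) for p, t in zip(cal_pred, cal_true))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return scores[k - 1]

# Hypothetical calibration set: predicted vs. human-derived Elo ratings.
cal_pred = [1010.0, 1100.0, 950.0, 1205.0, 1080.0, 990.0, 1150.0, 1040.0, 1120.0]
cal_true = [1000.0, 1120.0, 940.0, 1180.0, 1085.0, 1020.0, 1140.0, 1050.0, 1100.0]
q = split_conformal_halfwidth(cal_pred, cal_true, alpha=0.2)

new_prediction = 1075.0
interval = (new_prediction - q, new_prediction + q)
```

The interval is distribution-free in the sense that it only assumes the calibration and test models are exchangeable; it says nothing about how the point predictions were produced.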
Predicting Cognitive Load from Speech and Interaction Dynamics in Dyadic Conversations
Tahiya Chowdhury
Audio & Speech Multimodal
  • Cognitive load can be predicted from speech and interaction dynamics in natural dyadic conversations.
  • The study employs a regression approach using a two-head Gated Recurrent Unit encoder, enhancing prediction accuracy.
  • Turn-taking dynamics and speaker participation are critical indicators of cognitive load.
  • The research utilizes a diverse dataset of remote collaborative tasks to assess cognitive load across various contexts.
Crossing the Validation Crisis: Cross-Validation Reduces Benchmarking Variance Surprisingly Well
Célestin Eve, Gaël Varoquaux, Thomas Moreau
Theory Efficient ML NLP
  • Cross-validation significantly reduces benchmarking variance in machine learning evaluations.
  • The concept of sample gain quantifies the benefits of using multiple CV splits.
  • Diminishing returns from additional splits occur later than anticipated, enhancing reliability.
  • A dynamic early-stopping procedure for CV can optimize computational resources.
LongSpike: Fractional Order Spiking State Space Models for Efficient Long Sequence Learning
Xinrui He, Qiyu Kang, Xuhao Li, Zheng-Jun Zha
Time Series Audio & Speech Efficient ML
  • Introduction of LongSpike framework that utilizes fractional-order dynamics for SNNs.
  • Overcomes the memoryless bottleneck of traditional first-order SNNs.
  • Efficient parallel training enabled through a state-space representation.
  • Demonstrated superior performance on long-sequence benchmarks compared to state-of-the-art SNNs.
Exposure Bias as Epistemic Underidentification in Recursive Forecasting
Riku Green, Zahraa S. Abdallah, Telmo M Silva Filho
Theory Time Series
  • Exposure bias is not solely a distribution shift but also an epistemic underidentification problem under partial observability.
  • Induced states and provenance variables are crucial for understanding recursive forecasting failures.
  • Empirical evidence shows distinct induced-state regimes and the impact of fixed induced states on corrective tasks.
  • Closed-loop correction can improve performance by changing the induced states during rollout.
When Does Routing Become Interpretable? Causal Probes on Block Attention Residuals
Aydin Javadov
Interpretability Large Language Models
  • Introduces a routing-ablation framework for Block AttnRes to analyze source families and their contributions.
  • Demonstrates that explicit depth routing does not guarantee mechanistic interpretation.
  • Identifies three localized routing motifs in the trained Block AttnRes model.
  • Finds a significant dissociation between routing mass and causal importance in the model.
Order Is Not Control
Gareth Seneque, Lap-Hang Ho, Nafise Erfanian Saeedi, Jeffrey Molendijk, Tim Elson
Theory Interpretability Large Language Models
  • Control requires a receiver-gated response law, distinguishing it from mere order induction.
  • Empirical evidence from biological systems and LLMs supports the proposed response laws.
  • Interventions can induce structure without guaranteeing control, necessitating validation through receiver admission.
  • The study introduces a stochastic response kernel to formalize the relationship between drives and responses.
Simplex-Constrained Sparse Bagging: Transitioning from Uniform Priors to Sparse Posteriors in Ensemble Learning
Meher Sai Preetam, Meher Bhaskar
Optimization Efficient ML Theory
  • SCSB transitions from uniform priors to sparse posteriors in ensemble learning.
  • Introduces a concave quadratic penalty to address the L1-simplex paradox.
  • Achieves up to 96% ensemble compression with linear inference speedups.
  • Improves probability calibration while preserving or enhancing generalization accuracy.
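One way to read the "L1-simplex paradox" is that an L1 penalty cannot encourage sparsity once the weights are constrained to the probability simplex, because every feasible point has L1 norm exactly 1. The sketch below checks this numerically and shows that a negative squared-L2 term, one common concave quadratic penalty, does separate dense from sparse weightings. Whether SCSB uses exactly this penalty is an assumption.

```python
def l1_norm(w):
    return sum(abs(x) for x in w)

def concave_penalty(w, lam=1.0):
    """Hypothetical concave quadratic penalty: -lam * ||w||_2^2.
    On the simplex it is smallest at the vertices (fully sparse weights)."""
    return -lam * sum(x * x for x in w)

# Three ensemble weightings over four base learners, all on the simplex.
uniform = [0.25, 0.25, 0.25, 0.25]
half_sparse = [0.5, 0.5, 0.0, 0.0]
vertex = [1.0, 0.0, 0.0, 0.0]

l1_values = [l1_norm(w) for w in (uniform, half_sparse, vertex)]
penalties = [concave_penalty(w) for w in (uniform, half_sparse, vertex)]
```

Here `l1_values` is identical for all three weightings, while `penalties` decreases as the weights concentrate on fewer learners, so minimizing a loss plus this penalty favors sparse ensembles.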
Improving Crash Frequency Prediction from Simulated Traffic Conflicts Using Machine Learning Based Microsimulation
Xian Liu, Carlo G. Prato, Gustav Markkula
Theory
  • ML-based behavior models improve the realism of traffic microsimulation compared to traditional rule-based models.
  • Simulated conflicts from the ML model align better with real-world crash data, enhancing prediction accuracy.
  • Direct application of ML-generated crashes for predicting real-world crash frequencies remains challenging.
  • The study highlights the potential of ML to advance traffic safety assessments without extensive calibration.
EPM-JEPA: Operator-Side Experience Modulation in JEPA-Family World Models
Vedant Pandya
Theory Generative Models Time Series
  • EPM-JEPA introduces operator-side experience modulation to adapt JEPA models to distribution shifts.
  • The study compares operand-side injection (EI-JEPA) and operator-side modulation (EPM-JEPA) in a controlled experiment.
  • EPM-JEPA shows a 1.90% improvement over a no-memory baseline, while EI-JEPA underperforms that baseline.
  • The performance trajectory is governed by three independent dynamical processes, indicating complex interactions in model adaptation.
LLM-Powered Personalized Glycemic Assessment in Type 2 Diabetes with Wearable Sensor Data
Yifan Gao, Yanmin Gong, Yun Shi, Yuanxiong Guo
Large Language Models Multimodal Time Series
  • Introduction of GlyLLM, an LLM-powered framework for glycemic assessment.
  • Integration of wearable sensor data with personalized static metadata enhances model performance.
  • GlyLLM outperforms traditional machine learning methods in glucose forecasting and diabetes categorization.
  • Ablation studies reveal the critical role of personal static metadata in glycemic assessment.
MiniPIC: Flexible Position-Independent Caching in <100LOC
Nathan Ordonez, Thomas Parnell
Large Language Models Efficient ML
  • MiniPIC provides a flexible and minimalistic approach to position-independent caching in LLMs.
  • The implementation requires fewer than 100 lines of code changes, making it easy to integrate into existing systems.
  • User-controlled primitives allow for multiple caching strategies without extensive modifications to the inference engine.
  • The proposed method achieves a 49% improvement in prefill throughput and significantly reduces time-to-first-token for cached spans.
Existence Precedes Value: Joint Modeling of Observational Existence and Evolving States in Time Series Forecasting
Yifan Hu, Hongzhou Chen, Peiyuan Liu, Yiding Liu, Zewei Dong, Jiang-Ming Yang
Time Series
  • Timeflies reformulates time series forecasting as a joint problem of observability inference and value estimation.
  • The framework includes dedicated streams for observations and values, enhancing the modeling of historical irregularities.
  • A new benchmark dataset, Shadow, is introduced to evaluate the model's performance in realistic scenarios.
  • The Observation-Value Joint Entropy (OVJE) metric provides a comprehensive evaluation of the model's predictability.
Keep Policy Gradient in Charge: Sibling-Guided Credit Distillation for Long-Horizon Tool-Use Agents
Tianyu Ding, Jianhong Xin, Juan Pablo De la Cruz Weinstein
Reinforcement Learning Robotics Large Language Models
  • Naive self-distillation can degrade long-horizon tool-use performance by reinforcing shortcuts.
  • SGCD introduces a novel approach to credit assignment using sibling rollouts to enhance learning.
  • The method maintains the integrity of the policy gradient update while reshaping token credit.
  • Empirical results show significant improvements in performance on AppWorld and τ 3-airline benchmarks.
Uncertainty Estimation for Molecular Diffusion Models
Paul Seij, Christian A. Naesseth, Stephan Mandt, Metod Jazbec
Generative Models
  • Introduces a method for estimating uncertainty in molecular diffusion models.
  • Utilizes a Laplace approximation to measure noise prediction variability.
  • Demonstrates a negative correlation between uncertainty scores and sample quality metrics.
  • Shows that filtering based on uncertainty can improve generated sample quality.
TWLA: Achieving Ternary Weights and Low-Bit Activations for LLMs via Post-Training Quantization
Zhixiong Zhao, Zukang Xu, Zhixuan Chen, Xing Hu, Zhe Jiang, Dawei Yang
Large Language Models Efficient ML
  • TWLA achieves significant model compression with ternary weights and low-bit activations.
  • The framework effectively addresses heavy-tailed activation distributions, enhancing quantization performance.
  • Three innovative components (E2M-ATQ, KOTMS, ILA-AMP) work together to optimize both weights and activations.
  • TWLA maintains high accuracy while delivering substantial inference acceleration.
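For readers unfamiliar with ternary weights, the sketch below shows a generic threshold-based ternarization: each weight maps to -1, 0, or +1 times a shared scale. It does not reproduce TWLA's E2M-ATQ, KOTMS, or ILA-AMP components; the `threshold_ratio` of 0.7 is a conventional heuristic used here as an assumption.

```python
def ternarize(weights, threshold_ratio=0.7):
    """Map each weight to a code in {-1, 0, +1} plus one shared scale.
    Weights with magnitude at or below the threshold are zeroed."""
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = threshold_ratio * mean_abs
    codes = [1 if w > delta else -1 if w < -delta else 0 for w in weights]
    kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
    scale = sum(kept) / len(kept) if kept else 0.0  # mean magnitude of kept weights
    return codes, scale

def dequantize(codes, scale):
    return [c * scale for c in codes]

codes, scale = ternarize([0.9, -0.8, 0.05, -0.1, 0.7])
approx = dequantize(codes, scale)
```

Each weight then needs under two bits of storage, and matrix products reduce to additions and subtractions followed by one multiplication by `scale`.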
Select and Improve: Understanding the Mechanics of Post-Training for Reasoning
Akshay Krishnamurthy, Audrey Huang, Nived Rajaraman
Reinforcement Learning Large Language Models NLP
  • Identified two core mechanisms of RL post-training: strategy selection and strategy improvement.
  • Performance improvement through RL is dependent on the diversity of pre-training data and the difficulty of RL data.
  • Strategy selection routes problems to existing reasoning patterns, while strategy improvement enhances these patterns.
  • High-quality pre-training data is essential for effective RL training in reasoning tasks.
Multi-Bitwidth Quantization for LLMs Using Additive Codebooks
Liza Babaoglu, Shuangyi Chen, Ashish Khisti
Large Language Models Efficient ML Theory
  • Introduction of Drop-by-Drop, a multi-bitwidth quantization framework for LLMs.
  • Establishes a theoretical foundation linking Gaussian source refinement to quantization methods.
  • Enables inference-time control over model weight precision without retraining.
  • Maintains low perplexity and strong accuracy across various bitwidths.
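The idea of choosing precision at inference time can be pictured with a toy additive (residual) quantizer: each stage encodes the remaining error, and decoding with only the first few stages gives a coarser value. This is a scalar sketch under assumed halving scales, not the Drop-by-Drop algorithm; `encode` and `decode` are hypothetical helper names.

```python
def encode(x, scales):
    """Greedy residual encoding: one sign bit per stage."""
    signs, residual = [], x
    for s in scales:
        sign = 1 if residual >= 0 else -1
        signs.append(sign)
        residual -= sign * s
    return signs

def decode(signs, scales, bits):
    """Reconstruct using only the first `bits` stages (lower precision)."""
    return sum(sign * s for sign, s in zip(signs[:bits], scales[:bits]))

scales = [0.5, 0.25, 0.125, 0.0625]  # assumed: each stage halves the step
x = 0.3
signs = encode(x, scales)
errors = [abs(decode(signs, scales, b) - x) for b in range(1, len(scales) + 1)]
```

For inputs in [-1, 1], the error after `b` stages is bounded by `scales[b - 1]`, so dropping trailing stages trades accuracy for storage without re-encoding.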
Towards Provably Fair Machine Learning: Bayesian Approaches For Consistent and Transparent Predictions
Owen O'Neill, Fintan Costello
Theory
  • Introduces a formal definition of statistical consistency for predictions in machine learning.
  • Develops the Fair Bayesian classifier that ensures consistency across all subgroups.
  • Demonstrates that standard classifiers often yield statistically inconsistent predictions.
  • Achieves zero consistency error while exceeding baseline accuracy on benchmark datasets.
Understanding Truncated Positional Encodings for Graph Neural Networks
James Flora, Mitchell Black, Weng-Keen Wong, Amir Nayyeri
Graph Learning Theory
  • Truncated spectral and walk-based positional encodings have different expressive powers.
  • Truncated spectral PEs can be less expressive than the 1-WL test.
  • k-harmonic distances provide a bridge between spectral and polynomial positional encodings.
  • A combination of truncated PEs yields better performance than using a single family.
The Geometry of Phase Transitions in Generative Dynamics via Projection Caustics
Ryosuke Sakamoto, Kotaro Sakamoto
Generative Models Theory
  • Introduces a geometric framework for understanding phase transitions in generative dynamics.
  • Identifies projection caustics as critical points where generative models exhibit sharp transitions.
  • Develops the Critical Boundary Detector (CBD) to diagnose score-direction instability.
  • Demonstrates the effectiveness of CBD across toy models and latent text-to-image diffusion models.
Forecasting Is Not Attribution: Localizing Decoder Bypass in Graph-Based Neural Marketing Mix Models
Yunbo Wang, Bolbi Liu
Graph Learning Time Series Theory
  • Identification of 'attribution bypass' in graph-based neural marketing mix models.
  • Introduction of DICE-MMM as a two-stage framework for graph learning and diagnostics.
  • Development of CIG and AR-CIG diagnostics to assess decoder alignment with the graph.
  • Empirical evidence showing that low forecasting error does not guarantee valid attribution.
Reliability of Probabilistic Emulation of Physical Systems
Sam F. Greenbury, Radka Jersakova, Paolo Conti, Marjan Famili, Christopher Iliffe Sprague, Edwin Brown, Jason D. McEwen
Generative Models Time Series Theory
  • CRPS-trained ensembles provide more reliable uncertainty estimates than generative models.
  • Generative models trained in ambient space can match the coverage of CRPS ensembles but are computationally expensive.
  • The study introduces AutoCast and AutoSim for benchmarking and dataset generation, respectively.
  • Reliable uncertainty quantification is crucial for effective decision-making in physical system modeling.
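For context, the continuous ranked probability score (CRPS) of a finite ensemble can be estimated with the standard energy form: mean absolute error of the members minus half the mean pairwise spread. The sketch below implements that textbook estimator for a scalar target; how the paper applies it during training is not shown here.

```python
def ensemble_crps(members, obs):
    """CRPS estimate for an ensemble forecast of a scalar observation:
    E|X - y| - 0.5 * E|X - X'|, with expectations over ensemble members."""
    m = len(members)
    accuracy = sum(abs(x - obs) for x in members) / m
    spread = sum(abs(a - b) for a in members for b in members) / (2 * m * m)
    return accuracy - spread

single = ensemble_crps([1.0], 3.0)       # one member: reduces to absolute error
pair = ensemble_crps([0.0, 2.0], 1.0)    # spread around the truth is rewarded
```

Lower is better, and a one-member ensemble recovers the plain absolute error, which is why CRPS is often described as a probabilistic generalization of MAE.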
Bag of Dims: Training-Free Mechanistic Interpretability via Dimension-Level Sign Patterns
Varun Reddy Nalagatla
NLP Large Language Models Interpretability
  • Introduces the Bag of Dims framework for interpretability in transformer models.
  • Demonstrates that sign patterns in hidden states can predict semantic content effectively.
  • Achieves unsupervised discovery of 175 semantic categories with high accuracy.
  • Shows that features are preserved through attention projections and linked to specific neurons.
Navigating the Safety-Fidelity Trade-off: Massive-Variate Time Series Forecasting for Power Systems via Probabilistic Scenarios
Kaijie Xu, Anqi Wang, Xilin Dai
Time Series
  • Introduction of PowerPhase, a benchmark for probabilistic forecasting in power systems with up to 36,964 channels.
  • Incorporation of voltage-safety metrics to evaluate model performance alongside traditional accuracy measures.
  • Identification of a safety-fidelity trade-off in model rankings, emphasizing the need for constraint satisfaction.
  • Development of PowerForge, a scenario-based forecasting model tailored for transmission-scale grids.
Different Layers, Different Manifolds: Module-Wise Weight-Space Geometry in Transformer Optimization
Kirato Yoshihara
NLP Large Language Models Optimization
  • Different transformer modules (attention and MLP layers) prefer different weight-space geometries.
  • Assigning Stiefel geometry to attention layers and DGram geometry to MLP layers yields optimal performance.
  • DGram constraints on attention layers can lead to instability due to singular value growth and softmax saturation.
  • Module-specific optimization strategies are necessary for effective transformer training.
Adaptive Weighted Averaging
Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit
Optimization Theory
  • Introduces adaptive strategies for selecting the maximum among unknown values based on unbiased estimates.
  • Presents the SBern strategy, which is admissible and strictly dominates uniform random selection.
  • Constructs the Speel strategy that dominates any fixed deterministic strategy.
  • Demonstrates impossibility results for dependent observations and the limitations of certain strategies.
Extracting Governing Equations from Latent Dynamics via Multi-View Contrastive Learning
Paolo Muratore, Mackenzie Weygandt Mathis
Theory Time Series Interpretability
  • Introduction of DYSCO, a novel multi-view contrastive learning framework for system identification.
  • Theoretical guarantees for identifying latent dynamical systems under noisy observations.
  • Compatibility with symbolic regression for recovering governing equations.
  • Empirical validation across diverse dynamical regimes, demonstrating robustness to noise.
Selecting Samples on Graphs: A Unified Dataset Pruning Framework for Lossless Training Acceleration
Dongyue Wu, Zilin Guo, Xiaoyu Li, Jiajia Liu, Jingdong Chen, Nong Sang, Changxin Gao
Graph Learning Efficient ML Optimization
  • Introduces a unified graph-based framework for dataset pruning that captures both intrinsic and extrinsic sample values.
  • Formulates dataset pruning as a Maximum Weight Clique Problem (MWCP) and provides a principled greedy solution.
  • Proves formal approximation guarantees for a broad family of importance metrics under mild conditions.
  • Demonstrates significant reduction in training time (over 40%) without sacrificing accuracy on ImageNet-1k.
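To make the Maximum Weight Clique framing concrete, here is a generic greedy heuristic: repeatedly pick the heaviest remaining vertex and keep only its neighbors as candidates. The graph, the weights, and the tie-breaking rule are illustrative assumptions; the paper's own greedy procedure and its approximation guarantees may differ.

```python
def greedy_weight_clique(weights, edges):
    """Greedy heuristic for a heavy clique.
    weights: dict vertex -> weight; edges: iterable of (u, v) pairs."""
    adj = {v: set() for v in weights}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    clique, candidates = [], set(weights)
    while candidates:
        # Heaviest candidate first; ties broken by vertex name for determinism.
        best = max(candidates, key=lambda v: (weights[v], v))
        clique.append(best)
        candidates &= adj[best]  # only mutual neighbors can still join
    return clique

# Hypothetical sample-importance weights and compatibility edges.
weights = {"a": 3, "b": 2, "c": 2, "d": 5}
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
selected = greedy_weight_clique(weights, edges)
total = sum(weights[v] for v in selected)
```

In this toy graph the greedy choice reaches total weight 7, which happens to match the best clique; in general a greedy pass gives no such guarantee without extra conditions.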
MP3: Multi-Period Pattern Pre-training for Spatio-Temporal Forecasting
Lilan Peng, Yandi Liu, Qingren Yao, Chongshou Li, Tianrui Li
Graph Learning Time Series
  • MP3 effectively addresses the temporal mirage phenomenon in spatio-temporal forecasting.
  • The plugin enhances existing STGNNs by learning multi-period patterns from long time series.
  • Experiments show consistent performance improvements across multiple datasets and models.
  • MP3 reduces forecasting errors significantly, demonstrating its robustness and scalability.
MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling
Jiacheng Chen, Xinyu Zhang, Shunkai Zhang, Yanmohan Wang, Lin Li, Tiancheng Qin, Qin Wang, Zhengmao Zhu, Tianle Li, Jingyang Li, Zehan Li, Binyang Jiang, Jin Zhu, Han Ding, Fei Yu, Chenyu Du, Zijian Song, Jiayuan Song, Zhi Zhang, Yunan Huang, Weiyu Cheng, Pengyu Zhao, Yu Cheng
Reinforcement Learning Generative Models Large Language Models
  • MaxProof integrates generative-verifier RL with population-level scaling for mathematical proofs.
  • The framework combines proof generation, verification, and repair capabilities into a single model.
  • MaxProof achieves scores exceeding human gold-medal thresholds in major mathematical competitions.
  • The methodology emphasizes tournament selection from a population of candidate proofs to enhance accuracy.
Positional Encoding in the Context of Memristor-Based Analog Computation for Automatic Speech Recognition
Benedikt Hilmes, Nick Rossenbach, Ralf Schlüter
Audio & Speech Efficient ML NLP
  • Memristors enable efficient analog computation for neural models in NLP.
  • Large output values from positional encodings can degrade performance in memristor-based systems.
  • Adjusting ADC configurations can significantly reduce performance degradation.
  • Relative positional encodings improve model performance, especially under low-precision conditions.
Hölder++: Improving the Quality-Coherence Trade-off in Multimodal VAEs
Huyen Vo, María Martínez-García, Isabel Valera
Generative Models Multimodal
  • Introduces Hölder++ to improve the generative quality-coherence trade-off in multimodal VAEs.
  • First implementation of symmetric Hölder pooling without approximations for multimodal settings.
  • Models distinct shared and private representations to enhance coherence and diversity.
  • Employs hierarchical inference to further disentangle shared and private latent spaces.
CLARITree: Cholesky and Lookahead Accelerations for Regression with Interpretable Piecewise Linear Trees
Yixiao Wang, Hayden McTavish, Varun Babbar, Margo Seltzer, Cynthia Rudin
Efficient ML Interpretability Optimization
  • Introduction of CLARITree, an efficient algorithm for learning sparse, piecewise linear regression trees.
  • Utilization of lookahead-style split optimization to enhance performance.
  • Implementation of rank-one Cholesky updates for maintaining numerical stability during split evaluations.
  • Demonstration of superior performance compared to greedy baselines and scalability to larger datasets.
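The rank-one Cholesky update mentioned above is a standard linear-algebra routine: given a lower-triangular factor L of A, it produces the factor of A + x xᵀ in O(n²) time instead of refactoring from scratch. The pure-Python sketch below follows the textbook algorithm; how CLARITree wires it into split evaluation is not shown and is not assumed.

```python
import math

def chol_update(L, x):
    """Return L' with L' L'^T = L L^T + x x^T, for lower-triangular L."""
    n = len(x)
    L = [row[:] for row in L]  # work on copies
    x = x[:]
    for k in range(n):
        r = math.hypot(L[k][k], x[k])
        c = r / L[k][k]
        s = x[k] / L[k][k]
        L[k][k] = r
        for i in range(k + 1, n):
            L[i][k] = (L[i][k] + s * x[i]) / c
            x[i] = c * x[i] - s * L[i][k]
    return L

def gram(L):
    """Compute L L^T, to check the factor."""
    n = len(L)
    return [[sum(L[i][k] * L[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

# L L^T = [[4, 2], [2, 10]]; adding x x^T with x = [1, 2] gives [[5, 4], [4, 14]].
updated = chol_update([[2.0, 0.0], [1.0, 3.0]], [1.0, 2.0])
```

In a regression-tree setting, moving one sample across a candidate split changes the normal-equations matrix by a rank-one term, which is the kind of situation where this update saves work.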
Viral Proteins Reveal Geometry of Protein Language Models
Arthur Bigot, Harmon Bhasin, Core Francisco Park, Eugene Shakhnovich, Dianzhuo Wang
NLP Large Language Models Interpretability
  • Identification of a dominant nativeness axis in pLM representation space that tracks reconstruction difficulty.
  • Scaling impacts viral protein representation unevenly across different viral families.
  • pLM embeddings retain viral-specific signals beyond nativeness, outperforming shallow classification baselines.
DynamicPTQ: Mitigating Activation Quantization Collapse via Residual-Stream Dynamics
Zimo Zhao, Maolin Wang, Bowen Yu, Bowen Liu, Xiao Han, Xiangyu Zhao
NLP Large Language Models Efficient ML
  • DynamicPTQ addresses the issue of quantization collapse caused by massive activations in LLMs.
  • The paper introduces Jump Ratio and Historical Feature SNR to analyze residual stream dynamics.
  • DynamicPTQ allows for phase-aware mixed-precision quantization, improving model performance.
  • Experiments show significant improvements in perplexity and QA performance with modest memory overhead.
VideoMDM: Towards 3D Human Motion Generation From 2D Supervision
Amir Mann, Gal Michael Harari, Merav Keidar, Or Litany
Computer Vision Generative Models
  • VideoMDM is the first diffusion training framework for 3D human motion using only 2D supervision.
  • The method employs a noisy-teacher scheme to generate approximate 3D poses from 2D inputs.
  • A depth-aware reprojection loss is introduced, which is equivalent to 3D supervision under certain conditions.
  • VideoMDM achieves competitive results on various datasets, demonstrating its effectiveness in generating realistic 3D motions.
WHAR Arena: Benchmarking the State of the Art in Efficient Wearable Human Activity Recognition
Maximilian Burzer, Tobias King, Till Riedel, Michael Beigl, Tobias Röddiger
Time Series Efficient ML
  • Introduction of a standardized benchmarking framework for WHAR.
  • Curation of 30 datasets and 17 models to enable fair comparisons.
  • Evaluation of performance and efficiency metrics on an Android device.
  • Finds that the state of the art in WHAR is distributed across models, with compact models showing better deployment efficiency.
Boosting Direct Preference Optimization with Penalization
Pengwei Sun
NLP Large Language Models Optimization
  • DPOP enhances DPO by incorporating a gated penalty on reference-greedy responses.
  • The penalty is selectively activated based on the likelihood of preferred versus rejected responses.
  • Empirical results show significant improvements in performance over existing methods.
  • DPOP demonstrates the utility of reference-greedy responses as effective training signals.
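As background, the standard DPO objective that DPOP builds on can be written for a single preference pair as below. Only the baseline loss is sketched; DPOP's gated penalty on reference-greedy responses is not reproduced, and the log-probability values are made up for illustration.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one pair: -log sigmoid(beta * margin), where
    margin compares policy-vs-reference log-ratios of the preferred (w)
    and rejected (l) responses."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # Numerically stable softplus(-margin).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

neutral = dpo_loss(-10.0, -12.0, -10.0, -12.0)   # policy equals reference
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)   # policy favors preferred response
```

When the policy matches the reference the loss is log 2, and it falls as the policy raises the preferred response's likelihood relative to the rejected one.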
Boltzmann Attention: Learnable Ising Couplings for Cooperative Attention
Gilhan Kim, Daniel K. Park
NLP Large Language Models Theory
  • Boltzmann Attention introduces learnable pairwise couplings to enhance attention mechanisms.
  • The method outperforms standard softmax attention, especially with longer sequences.
  • A four-way ablation study confirms that improvements are due to the learnable couplings.
  • The Ising model formulation opens avenues for quantum-computing-based training methods.
How Useful is Causal Invariance for Domain Adaptation in Finite-Sample Settings?
Julia Kostin, Kasra Jalaldoust, Elias Bareinboim, Samory Kpotufe, Fanny Yang
Theory
  • Causal invariance can potentially improve supervised domain adaptation by identifying invariant predictors.
  • The effectiveness of causal knowledge in finite-sample settings is influenced by target-risk margins and estimation errors.
  • The study provides a theoretical framework for understanding when causal knowledge can lead to better predictive performance.
  • Real-world causal benchmarks validate the theoretical results regarding the use of causal invariance in domain adaptation.
Distributional Loss for Robust Classification
Kathleen Anderson, Thomas Martinetz
Theory Optimization
  • Introduction of a bimodal Gaussian distribution as a target for classifier outputs.
  • Mitigation of overfitting and improved robustness in classification tasks.
  • Effective in low-data scenarios without requiring additional label information.
  • Minimal modifications needed for integration into standard training pipelines.
Beyond the Commitment Boundary: Probing Epiphenomenal Chain-of-Thought in Large Reasoning Models
Daniel Scalena, Sara Candussio, Luca Bortolussi, Elisabetta Fersini, Malvina Nissim, Gabriele Sarti
NLP Large Language Models Interpretability
  • Introduction of a causal framework for analyzing CoT reasoning traces.
  • Identification of a commitment boundary where models stabilize their answers.
  • Demonstration of epiphenomenal reasoning beyond the commitment boundary.
  • Development of lightweight attention probes for predicting answer-formation stages.
PolyFlow: Safe and Efficient Polytope-Constrained Flow Matching with Constraint Embedding and Projection-free Update
Jianming Ma, Qiyue Yang, Yang Zhang, Liyun Yan, Zhanxiang Cao, Yazhou Zhang, Yue Gao
Generative Models Robotics Optimization
  • PolyFlow embeds constraints directly into the flow model, improving safety and efficiency.
  • The framework eliminates numerical integration errors by reformulating flow matching in a discrete-time setting.
  • A projection-free architecture ensures strict adherence to convex constraints without computational overhead.
  • Experimental validation shows zero constraint violations and superior distribution matching quality.
The Stable Recovery Manifold: Geometric Principles Governing Recoverability in Continual Learning
Ayushman Trivedi, Bhavika Melwani
Theory
  • Introduces the Stable Recovery Manifold (SRM) hypothesis, suggesting that forgotten knowledge is preserved in a low-dimensional subspace.
  • Demonstrates that the dimensionality of the recovery subspace remains constant at approximately 8 principal directions across multiple tasks.
  • Finds that 82% of the variance in recoverability is explained by geometric variables, with principal-angle drift as the dominant predictor.
  • Falsifies the Recoverability Diffusion hypothesis, providing a clearer understanding of the nature of forgetting in continual learning.
μVLA: On Recurrent Memory for Partially Observable Manipulation in VLA Models
Egor Cherepanov, Nikita Kachaev, Daniil Zelezetsky, Aydar Bulatov, Artem Pshenitsyn, Yuri Kuratov, Alexey Skrynnik, Aleksandr I. Panov, Alexey K. Kovalev
Multimodal Robotics Reinforcement Learning
  • Introduces μVLA, a family of recurrent VLA models focused on isolating the effects of recurrence.
  • Demonstrates significant performance improvements in partially observable tasks using recurrent memory.
  • Establishes a controlled experimental framework to evaluate recurrence without auxiliary mechanisms.
  • Identifies specific regimes where minimal recurrence is sufficient and where additional memory structures are needed.
Circuit Synchronization Precedes Generalization: Causal Evidence from Fourier Structure in Grokking Transformers
Achyuthan Sivasankar
Theory
  • Introduction of Frequency Synchronization Degree (FSD) as a metric for Fourier circuit synchronization.
  • FSD predicts grokking 500–3,000 steps in advance across multiple configurations.
  • Causal evidence indicates that the timing of grokking is influenced by regularization parameters.
  • Demonstration that multi-block circuits are necessary for the precursor to generalization.
A Stabilized Path-Space Approach to Diffusion-Based Posterior Sampling
Evan Scope Crafts, Umberto Villa, Saviz Mowlavi, Yanting Ma, Hassan Mansour, Wael H. Ali
Generative Models Optimization Theory
  • Introduces a stabilized path-space framework for diffusion-based posterior sampling.
  • Connects diffusion posterior sampling to stochastic optimal control for better uncertainty quantification.
  • Eliminates biases from initial value functions through a novel time reparameterization.
  • Demonstrates improved accuracy and robustness in sampling through extensive benchmark evaluations.
A Stationary (and Therefore Compatible) Representation is All You Need
Niccolò Biondi, Federico Pernici, Simone Ricci, Alberto Del Bimbo
Theory Computer Vision Efficient ML
  • Stationary representations learned via d-Simplex fixed classifiers ensure compatibility in model updates.
  • Combining cross-entropy loss with contrastive loss captures higher-order dependencies while preserving compatibility.
  • The proposed method achieves state-of-the-art performance in open-set image recognition.
  • Theoretical proof that stationarity implies compatibility strengthens the foundation for future research.
Emerging Flexible Designs for Geospatial Multimodal Foundation Models
Philipe Dias, Waqwoya Abebe, Abhishek Potnis, Aristeidis Tsaris, Dan Lu, Xiao Wang, Dalton Lunga
Computer Vision Multimodal
  • Standardized benchmarking of geospatial foundation models using unified pretraining objectives and evaluation protocols.
  • Insights into the impact of tokenization and fusion strategies on model robustness and spectral reasoning.
  • Identification of trade-offs between model flexibility and performance in varying spectral conditions.
  • Demonstration of Flex's adaptability to heterogeneous bands compared to standard architectures.
ReSET: Accurate Latency-Critical NVFP4 Reasoning via Step-Aware Temperature Scaling
Sihwa Lee, Janghwan Lee, Donghoon Yoo, Jae Gon Kim, Hanyul Ryu, Soojung Ryu, Jungwook Choi
Large Language Models Efficient ML
  • ReSET improves reasoning accuracy by up to 2.6 points over NVFP4 baseline through step-aware temperature scaling.
  • A CUDA-core small-M NVFP4 kernel achieves 1.57–2.49× speedup in kernel-level latency for small-batch decoding.
  • The proposed methods address both accuracy degradation and latency issues in NVFP4 quantization for LRMs.
  • The research highlights the importance of considering step-level uncertainty in reasoning processes.
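Temperature scaling itself is a simple operation on logits: dividing by a temperature below 1 sharpens the sampling distribution, and a temperature above 1 flattens it. The sketch below shows only that basic mechanism; ReSET's step-aware rule for choosing the temperature is not described in the summary and is not modeled here.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a positive temperature."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
baseline = softmax(logits, temperature=1.0)
sharper = softmax(logits, temperature=0.5)
flatter = softmax(logits, temperature=2.0)
```

A per-step schedule would call `softmax` with a different `temperature` at each decoding step, for example lower values where quantization noise makes the next-token distribution less trustworthy; that policy is a hypothetical illustration.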
CausalMoE: A Billion-Scale Multimodal Foundation Model for Granger Causal Discovery with Pattern-Routed Heterogeneous Experts
Bo Liu, Di Dai, Jingwei Liu, Jiarui Jin, Xiaocheng Fang, Guangkun Nie, Hongyan Li, Shenda Hong
Time Series Multimodal Large Language Models
  • CausalMoE addresses the limitations of existing GCD methods by modeling patch-level temporal heterogeneity.
  • The model utilizes a Pattern-Routed Mixture of Heterogeneous Experts to dynamically route time-series patches to specialized experts.
  • Integration of LLMs and VLMs enhances causal discovery by incorporating multimodal semantic information.
  • CausalMoE achieves state-of-the-art performance on fully supervised benchmarks and excels in few-shot scenarios.
Multimodal Graph Negative Learning
Zhengyu Wu, Xu Wang, Hongchao Qin, Xunkai Li, Guang Zeng, Rong-Hua Li, Guoren Wang
Graph Learning Multimodal
  • Introduces GraphMNL, a framework for learning from multimodal attributed graphs.
  • Addresses the challenge of node-level branch semantic imbalance in MAGs.
  • Utilizes Negative Learning to guide inferior branches without imitating biased dominant predictions.
  • Implements graph-aware reliability arbitration to assess branch reliability.
Physics-Informed Neural Networks for Chemotherapy Pharmacokinetics: Benchmarking the Clinical Estimator and Exposing Parameter Identifiability
Riya Bisht, Dhruv Agarwal
Theory
  • PINNs effectively model chemotherapy pharmacokinetics, providing insights into unobservable tissue concentrations.
  • In a linear two-compartment model, PINNs match the performance of traditional nonlinear least-squares estimators while also estimating tissue concentrations.
  • The PINN framework reveals non-identifiability issues in the Michaelis-Menten model that traditional methods fail to address.
  • Sparse tissue observations significantly enhance the parameter recovery capabilities of PINNs.
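The linear two-compartment model referenced above is usually written as two coupled ODEs for the drug amount in a central (plasma) and a peripheral (tissue) compartment. The sketch below integrates the textbook form with forward Euler; the rate constants `k10`, `k12`, `k21` and the dose are illustrative values, and the paper's exact parameterization may differ.

```python
def simulate_two_compartment(dose, k10, k12, k21, dt=0.01, t_end=10.0):
    """Forward-Euler integration of
        dA1/dt = -(k10 + k12) * A1 + k21 * A2   (central / plasma)
        dA2/dt =  k12 * A1 - k21 * A2           (peripheral / tissue)
    after an IV bolus of `dose` into the central compartment."""
    a1, a2 = dose, 0.0
    history = [(0.0, a1, a2)]
    steps = int(round(t_end / dt))
    for i in range(1, steps + 1):
        d1 = -(k10 + k12) * a1 + k21 * a2
        d2 = k12 * a1 - k21 * a2
        a1, a2 = a1 + dt * d1, a2 + dt * d2
        history.append((i * dt, a1, a2))
    return history

# Hypothetical parameters (per hour) and dose (mg).
trajectory = simulate_two_compartment(dose=100.0, k10=0.3, k12=0.2, k21=0.1)
totals = [a1 + a2 for _, a1, a2 in trajectory]
```

Only the central compartment is typically sampled in practice, so the second column of `trajectory` stands in for the unobservable tissue amount that the summary says the PINN estimates.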
Adjusted Cup-Product Neural Layer
Snigdha Chandan Khilar
Theory
  • Introduces a neural layer that computes the adjusted cup product, ensuring gauge invariance.
  • Establishes a necessity theorem indicating that the adjustment term is the sole source of gauge-invariant output.
  • Demonstrates that the adjusted layer generalizes well in scenarios where cup products are not convolution-expressible.
  • Empirical results show improved performance over traditional CNNs in topological tasks.
Normative Robustness as a Frontier for Non-Verifiable Reasoning in LLMs
Elizaveta Tennant, Benjamin Henke, Anita Keshmirian, Murray Shanahan, Verena Rieser, Kristian Lum, Sydney Levine, Julia Haas
NLP Large Language Models
  • Introduces moral robustness as a key metric for evaluating LLMs in non-verifiable reasoning.
  • Develops a scalable, multi-turn evaluation framework for assessing moral reasoning in LLMs.
  • Finds that LLMs' moral judgments can shift based on user preferences and conversation structure.
  • Identifies a failure mode termed moral deliberative sycophancy, where models align their reasoning with user views.
Out-of-Distribution (OOD) Detectors for Open-Set RF Fingerprinting
Sudeepta Mondal, Ganesh Sundaramoorthi
Theory
  • Introduces a unified mathematical framework for OOD detection in RF fingerprinting based on information theory.
  • Demonstrates that OOD detectors can be tuned without access to OOD data.
  • Achieves competitive performance on the POWDER dataset, validating the proposed methods.
  • Addresses the practical challenges of deploying OOD detection in RF environments.
Disparate Impact in Synthetic Data Generation
Paul Andrey, Michaël Perrot, Batiste Le Bars, Marc Tommasi
Generative Models Graph Learning Theory
  • Proposes a new definition of fairness for SDG based on disparate impact.
  • Investigates the causes of disparate impact in SDG, including estimation and sampling errors.
  • Demonstrates the effects of differential privacy on disparate impact across groups.
  • Introduces a group-wise modeling approach to improve utility and fairness in SDG.
Representing Time Series as Structured Programs for LLM Reasoning
Jaeho Kim, Changhun Oh, Seokhyun Lee, Irina Rish, Changhee Lee
Large Language Models Time Series
  • Introduction of T2SP, a structured representation for time series that aligns with LLMs' training modalities.
  • T2SP is deterministic, invertible, and training-free, making it compatible with off-the-shelf LLMs.
  • Demonstrated improvements in reasoning performance, reduced inference time, and lower failure rates in LLM responses.
  • T2SP enables effective time-series editing, captioning, and question answering without the need for fine-tuning.