AI-generated summaries

Today's ML research, without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

59 papers today · 8h update frequency · 7 days of history
Advances in Scientific Machine Learning for Coupled Fluid Flow and Transport
Gabriel F. Barros, Rômulo M. Silva, Alvaro L. G. A. Coutinho
Efficient ML Theory
  • Overview of SciML advancements for coupled fluid flow and transport modeling.
  • Introduction of surrogate modeling techniques using PINNs and β-VAEs.
  • Discussion of computational strategies like Adaptive Mesh Refinement/Coarsening.
  • Illustration of methodologies through benchmark problems like lock-exchange flows.
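β-VAE surrogates like those in this overview are typically trained with a reconstruction term plus a β-weighted KL penalty. Here is a minimal NumPy sketch of that standard objective. It assumes a diagonal-Gaussian encoder and squared-error reconstruction. The paper's actual likelihood, architecture, and β value are not given here, so `beta=4.0` is an arbitrary placeholder:

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """beta-VAE objective: reconstruction error + beta * KL(q(z|x) || N(0, I)).
    mu and log_var parameterize a diagonal-Gaussian encoder (assumed)."""
    # Squared-error reconstruction, summed over features and averaged over the batch.
    recon = np.sum((x - x_recon) ** 2, axis=1).mean()
    # Closed-form KL between N(mu, exp(log_var)) and the standard normal prior.
    kl = -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=1).mean()
    return recon + beta * kl, recon, kl
```

With β > 1, the KL term pushes the latents toward the prior more strongly. This is the usual route to the compact latent spaces that surrogate models rely on.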
Activation- and Influence-Aware Ranks (AIR): Function-Preserving SVD Compression for LLMs
Nico Harder, Daniel Becking, Karsten Mueller, Wojciech Samek
Large Language Models Efficient ML NLP
  • AIR integrates activation and influence metrics for improved SVD-based compression of LLMs.
  • The method achieves over 18% lower perplexity compared to SVD-LLM with 60% parameter retention.
  • AIR requires approximately 90% less calibration data while maintaining model quality.
  • The framework is layer-local and can be combined with end-to-end methods for enhanced performance.
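For context, SVD-based compressors such as SVD-LLM build on plain truncated-SVD factorization of each weight matrix. The sketch below shows only that unweighted building block. AIR's activation- and influence-aware rank allocation is not reproduced. The sketch also assumes "parameter retention" means the two factors together hold at most the stated fraction of the original matrix's parameters:

```python
import numpy as np

def svd_compress(W, retention=0.6):
    """Plain truncated-SVD compression: W (m x n) ~= A @ B with A (m x r) and B (r x n),
    where r is the largest rank whose factors hold at most `retention` * m * n parameters."""
    m, n = W.shape
    r = max(1, int(retention * m * n // (m + n)))
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * S[:r]   # fold the singular values into the left factor
    B = Vt[:r, :]
    return A, B
```

A low-rank factorization only saves parameters when r < mn / (m + n). At 60% retention, a 4096 × 4096 matrix would keep rank 1228 under this accounting.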
Performance Analysis and Optimization of 3D Generative Diffusion Models across GPU Architectures
Jeeho Ryoo, Yongchan Jung, Muhammad Ali Khaliq, Weidong Zhang, Jiatong Han, Byeong Kil Lee
Generative Models Optimization Efficient ML
  • Detailed performance analysis of Med-DDPM across three NVIDIA GPU architectures.
  • Identification of architecture-specific bottlenecks in convolution and normalization kernels.
  • Optimizations include enabling TF32 Tensor Cores and a 3D channels-last memory layout.
  • Achieved up to 100× reduction in SM cycles and dynamic instructions on A100.
Integrating national forest inventory, airborne lidar, and satellite imagery for wall-to-wall mapping of forest structure with computer vision
Luke J. Zachmann, David D. Diaz, Vincent A. Landau, Chelsey Walden-Schreiner, Tony Chang, Nathan E. Rutenbeck, Katharyn A. Duffy, Kiarie Ndegwa, Andreas Gros, Scott Conway, Guy Bayes
Computer Vision
  • Introduction of the VibrantForests framework for comprehensive forest mapping.
  • Utilization of satellite imagery and lidar data to estimate multiple forest attributes.
  • Improved predictive capabilities across diverse forest conditions.
  • Maps are updated annually at a high spatial resolution of 10 meters.
Predicting Mergeability of Parameter-Efficient Fine-Tuning Updates
Lin Tang, Wei Zhang, Jing Li, Hongyu Chen, Ming Zhao, Yuxuan Wang
NLP Large Language Models Efficient ML
  • Introduces a framework for predicting the mergeability of LoRA adapters during early training.
  • Defines mergeability based on single-task utility and post-merge retention.
  • Presents MergeProbe, a lightweight predictor that informs merging decisions.
  • Demonstrates improved performance on the MERGE-PEFT benchmark across multiple domains.
What Makes Effective Supervision in Latent Chain-of-Thought: An Information-Theoretic Analysis
Xinghao Chen, Chak Tou Leong, Wenjin Guo, Jian Wang, Wenjie Li, Xiaoyu Shen
NLP Large Language Models Theory
  • Latent Chain-of-Thought models struggle because weak outcome supervision leads to gradient attenuation and representational drift.
  • The paper introduces two types of supervision: Trajectory Supervision for stepwise reasoning and Space Supervision for semantic structure preservation.
  • The Unified Latent Probe (ULP) is proposed to quantify the mutual information between latent trajectories and reasoning steps.
  • Empirical results show that effective supervision stabilizes training and enhances reasoning accuracy through improved information fidelity.
Low-Energy Reduced RISC-V Instruction Subset Processor for Tsetlin Machine Inference at the Edge
Chanda Gupta, Sanidhya Bhatia, Shaurya Priyadarshi, Himani Panwar, Rishad Shafik, Sudip Roy
Efficient ML
  • Introduction of a programmable RISC-V architecture tailored for Tsetlin Machine inference.
  • Significant performance improvements and energy savings compared to traditional architectures.
  • Demonstration of TM's competitive accuracy against Binarized Neural Networks.
  • Methodology includes instruction profiling and optimizations specific to TM workloads.
MassSpecGym in the Wild: Uncovering and Correcting Evaluation Pitfalls in AI-Driven Molecule Discovery
Hongxuan Liu, Roman Bushuiev, Ivy Lightheart, Mrunali Manjrekar, Anton Bushuiev, Magdalena Lederbauer, Filip Jozefov, Yinkai Wang, Soha Hassoun, Josef Sivic, James Taylor, Runzhong Wang, David Healey, Tomáš Pluskal, Connor W. Coley
Optimization Theory
  • Identification of evaluation pitfalls in 17 of the 26 papers that use MassSpecGym.
  • Categorization of issues into data leakage, shortcut learning, and implementation bugs.
  • Introduction of MassSpecGym v1.5 to address identified failures and improve benchmarking standards.
  • Recommendations for best practices in model evaluation in the context of MS/MS.
Boundary Embedding Shaping with Adaptive Contrastive Learning for Graph Structural Disentanglement
Jiaqing Chen, Zidu Yin, Yichao Cai, Yuhang Liu, Zhen Zhang, Dong Gong, Javen Qinfeng Shi
Graph Learning
  • Identifies spurious structural noise from entangled neighborhoods as a critical issue in graph classification.
  • Introduces Boundary Embedding Shaping (BES) as a framework combining hard example mining and adaptive contrastive learning.
  • Demonstrates that BES effectively sharpens decision boundaries without destabilizing non-boundary nodes.
  • Achieves significant improvements in classification accuracy, particularly for nodes near class boundaries.
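BES pairs hard-example mining with adaptive contrastive learning. The adaptive weighting and boundary-node mining are specific to the paper and are not reproduced here. As a generic reference point, here is the standard InfoNCE contrastive loss on node or graph embeddings. The fixed temperature is an arbitrary choice:

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: row i of `positives` is the positive for row i of `anchors`;
    every other row in the batch serves as a negative."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                # (N, N) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # stabilize the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))
```

Hard-example mining would change which rows enter the batch as negatives, for example embeddings of nearby nodes from other classes. The loss itself stays the same.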
UltraQuant: 4-bit KV Caching for Context-Heavy Agents
Inesh Chakrabarti, David Limpus, Aditi Ghai Rana, Bowen Bao, Spandan Tiwari, Thiago Crepaldi, Ashish Sirasao
Large Language Models Efficient ML NLP
  • UltraQuant introduces a 4-bit KV caching method tailored for context-heavy agent workloads.
  • The approach emphasizes joint measurement of task quality, cache residency, and serving throughput.
  • Key design innovations include asymmetric KV treatment and optimized GPU kernels for enhanced performance.
  • UltraQuant achieves a 3.47× reduction in time-to-first-token in late rounds and a 1.63× increase in output throughput over the FP8 KV baseline.
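UltraQuant's asymmetric key/value treatment and optimized GPU kernels are not described in enough detail here to reproduce. The sketch below shows only a generic building block of 4-bit KV caching: symmetric per-group quantization to the signed range [-8, 7] with one scale per group. The group size of 32 is an assumption. For clarity, values are stored one per int8 rather than packed two per byte:

```python
import numpy as np

def quantize_int4(x, group_size=32):
    """Symmetric per-group 4-bit quantization of a 1-D array (length divisible by group_size)."""
    groups = x.reshape(-1, group_size)
    # Map each group's largest magnitude to 7, the top of the signed 4-bit range.
    scale = np.abs(groups).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero groups
    q = np.clip(np.round(groups / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate values; error per element is at most half of its group's scale."""
    return (q.astype(np.float32) * scale).reshape(-1)
```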
IHBench: Evaluating Post-Interruption Recovery in Voice Agents with Structured Workflows
Ahmad Salimi, Wentao Ma, Yuzhi Tang, Dongming Shen, Mu Li, Alex Smola
Audio & Speech NLP Large Language Models
  • IHBench defines post-interruption recovery as a critical evaluation axis for voice agents.
  • The benchmark includes six types of interruptions and uses a two-axis scoring system.
  • Closed-weight models outperform open-weight models in handling interruptions.
  • The study reveals significant performance gaps in existing models regarding post-interruption recovery.
Multi-Granular Attention-Driven Reinforcement Learning Framework for Web Intelligent Enhancement Systems
Navin Chhibber, Deepak Singh, Anokh Kishore, Nikita Chawla, K. Anguraj
Reinforcement Learning Graph Learning Optimization
  • Development of a graph-based semantic knowledge modeling framework with attention mechanisms.
  • Introduction of an adaptive multi-agent reinforcement learning strategy for optimizing personalized web actions.
  • Incorporation of a continuous online adaptation and feedback integration module for real-time updates.
  • Achieved an accuracy of 80%, outperforming existing methods.
Zero-Inflated Gaussian Distributions Enable Parameter-Space Sparsity in Estimation-of-Distribution Algorithms
Andreas Faust, Sven Nitzsche, Juergen Becker
Optimization
  • Introduction of zero-inflated Gaussian distributions as a sampling law for EDAs in sparse optimization.
  • Joint optimization of sparsity patterns and active values without additional hyperparameters.
  • Identification of latent parameters from observed samples, enhancing the understanding of correlation structures.
  • Empirical results show ZIG-EDA outperforms existing methods in terms of convergence speed and solution quality.
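A zero-inflated Gaussian places a point mass at exactly zero with probability π and a Gaussian N(μ, σ²) on the remaining probability, so sampled candidates are sparse by construction. The sketch below assumes independent coordinates and a simple moment-matching refit on elite samples. ZIG-EDA's handling of correlation structure and latent-parameter identification is not reproduced:

```python
import numpy as np

def sample_zig(pi, mu, sigma, n, rng):
    """Draw n vectors: coordinate j is 0 with probability pi[j], else N(mu[j], sigma[j]^2)."""
    active = rng.random((n, len(mu))) >= pi
    return np.where(active, rng.normal(mu, sigma, size=(n, len(mu))), 0.0)

def fit_zig(elite):
    """Refit (pi, mu, sigma) per coordinate from elite samples (nonzero entries count as active)."""
    active = elite != 0.0
    pi = 1.0 - active.mean(axis=0)
    counts = np.maximum(active.sum(axis=0), 1)
    mu = np.where(active, elite, 0.0).sum(axis=0) / counts
    var = (np.where(active, elite - mu, 0.0) ** 2).sum(axis=0) / counts
    return pi, mu, np.sqrt(var) + 1e-3  # small floor keeps the search from collapsing
```

In an EDA loop, each generation would sample a population with `sample_zig`, keep the best-scoring fraction, and refit with `fit_zig`. Coordinates whose elites are mostly zero drift toward π ≈ 1, so the sparsity pattern and the active values are updated together.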
Quantile of Means: A Bonus-Free Ensemble Method for Minimax Optimal Reinforcement Learning
Asaf Cassel, Aviv Rosenberg
Reinforcement Learning Theory Efficient ML
  • Introduces a quantile-based ensemble method for exploration in finite-horizon MDPs.
  • Achieves instance-optimal variance-dependent regret bounds without requiring count-based bonuses.
  • Improves regret rates in bandit settings by reducing logarithmic factors.
  • Distribution-agnostic approach that adapts to various reward distributions.
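The summary does not spell out the estimator. By analogy with median-of-means, one plausible reading of "quantile of means" works as follows: split an arm's observations among ensemble members, average within each member, and act on an upper quantile of those means instead of adding a count-based bonus. The sketch below follows that assumed reading for a multi-armed bandit. The member count, quantile level, and random split are all hypothetical choices:

```python
import numpy as np

def quantile_of_means(samples, n_members=5, q=0.8, rng=None):
    """Hypothetical optimistic estimate: upper quantile of per-member sample means."""
    if rng is None:
        rng = np.random.default_rng(0)
    samples = np.asarray(samples, dtype=float)
    parts = np.array_split(rng.permutation(len(samples)), n_members)
    member_means = [samples[p].mean() for p in parts if len(p) > 0]
    return float(np.quantile(member_means, q))

def choose_arm(history, **kwargs):
    """Pick the arm with the highest quantile-of-means score; untried arms go first."""
    scores = [np.inf if len(h) == 0 else quantile_of_means(h, **kwargs) for h in history]
    return int(np.argmax(scores))
```

A larger q makes the estimate more optimistic. With q = 0.5, it reduces to a median-of-means estimate.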
Exploring the potential of AlphaEarth and TESSERA embeddings for Fine-scale Local Climate Zone Mapping: A case study across five cities in Switzerland
Htet Yamin Ko Ko, Clement Atzberger
Computer Vision
  • TESSERA embeddings outperform traditional Sentinel-1/2 composites and AlphaEarth for LCZ mapping.
  • The study demonstrates the feasibility of generating fine-scale LCZ maps at 10 m resolution.
  • Embedding-based models can reduce preprocessing time and improve model transferability.
  • Improving reference data quality is crucial for enhancing mapping accuracy.
Train, Retrieve, or Both? A Four-Arm Head-to-Head for Correct Statutory Citation on the Ontario Residential Tenancies Act
Ali Asaria, Tony Salomone, Deep Gandhi
NLP Large Language Models
  • The hybrid SFT+RAG model outperforms both the base and SFT-only models in citation accuracy.
  • Retrieval is crucial for achieving zero hallucinations in statutory citations.
  • The study demonstrates that a smaller, efficient hybrid model can match or exceed the performance of larger, specialized retrieval systems.
  • The findings indicate that more data does not necessarily lead to better performance in this context.
FlexLAM: Resolving the Bottleneck Trade-off in Latent Action Learning
Takanori Yoshimoto, Yang Hu, Naruya Kondo, Tatsuya Matsushima
Robotics Computer Vision Efficient ML
  • FlexLAM introduces variable-length latent actions to overcome fixed-capacity bottlenecks in LAMs.
  • The method employs retained-prefix training to create prefix-valid codes that enhance action alignment.
  • FlexLAM outperforms traditional fixed-capacity LAMs across all evaluated token budgets.
  • The model supports inference-time token-budget adjustments without retraining.
How Transparent is DiffusionGemma?
Joshua Engels, Callum McDougall, Bilal Chughtai, Janos Kramar, Senthoran Rajamanoharan, Cindy Wu, Arthur Conmy, Asic Q Chen, Jean Tarbouriech, Min Ma, Brendan O'Donoghue, João Gabriel Lopes de Oliveira, Rohin Shah, Neel Nanda
NLP Large Language Models Interpretability
  • DiffusionGemma's initial variable transparency is low due to high opaque serial depth.
  • Intermediate states can be made interpretable, reducing opaque serial depth significantly.
  • Algorithmic transparency is more complex for diffusion models compared to autoregressive models.
  • Novel phenomena unique to diffusion models were identified, including non-chronological reasoning.
Judging to Improve: A De-biased VLM-as-3D-Judge Protocol for Single-Image 3D Generation
Ali Asaria, Tony Salomone, Deep Gandhi
Generative Models Optimization Computer Vision
  • Development of an optimization-grade de-biased VLM-as-3D-judge protocol.
  • Identification and rectification of three failure modes in the evaluation process.
  • Empirical findings show that independent samples lack learnable preferences.
  • Lightweight adaptations can achieve parity with strong base models but not exceed them.
A Hybrid GNN-FEM Framework for Phase-Field Fracture Simulation. Physics-Preserving Hybridization for Generalizable Surrogate Modeling
Hyeonbin Moon, Yongjin Choi, Seunghwa Ryu
Graph Learning Efficient ML Theory
  • Integration of GNN with FEM enhances efficiency in phase-field fracture modeling.
  • The framework maintains physical consistency by preserving the incremental solution structure.
  • Strong generalization capabilities across varying problem settings are achieved.
  • Numerical experiments confirm reduced computational costs without sacrificing accuracy.
An Information Theoretic Framework for Graph Novelty Generation via Latent Mixture Modeling
Itsuki Nakagawa, Kenji Yamanishi
Generative Models Graph Learning Theory
  • Introduces a novel framework for graph novelty generation using latent mixture modeling.
  • Imposes novelty and reliability conditions based on the Minimum Description Length principle.
  • Theoretical guarantees on misclassification probabilities for generated samples.
  • Empirical results demonstrate superior control over novelty and reliability compared to existing methods.
Uncertainty-Aware Reward Modeling for Stable RLHF
Licheng Pan, Haocheng Yang, Haoxuan Li, Yichen Sun, Yunsheng Lu, Shijian Wang, Lei Shen, Yuan Lu, Zhixuan Chu, Hao Wang
Reinforcement Learning Large Language Models Optimization
  • Introduces Uncertainty-Aware Reward Modeling (UARM) to address unreliability in reward models.
  • Utilizes quantile-based conformal prediction for calibrated uncertainty estimates.
  • Implements a heteroscedastic advantage reweighting scheme to suppress unreliable samples.
  • Demonstrates significant improvements in reward model calibration and alignment quality.
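UARM relies on quantile-based conformal prediction for calibrated uncertainty. How the paper maps interval widths to advantage weights is not reproduced here. The sketch below is the standard split-conformal calibration step for a quantile regressor (conformalized quantile regression), assuming exchangeable calibration data:

```python
import numpy as np

def cqr_calibrate(q_lo, q_hi, y, alpha=0.1):
    """Return the margin that widens predicted bands [q_lo, q_hi] so that, under exchangeability,
    [q_lo - margin, q_hi + margin] covers a new label with probability >= 1 - alpha."""
    scores = np.maximum(q_lo - y, y - q_hi)   # > 0 when y falls outside the predicted band
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))   # rank of the finite-sample-corrected quantile
    if k > n:
        return np.inf                          # too few calibration points for this alpha
    return float(np.sort(scores)[k - 1])
```

A wider calibrated band around a reward prediction signals a less reliable sample. A heteroscedastic reweighting scheme could down-weight such samples.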
Emyx: Fast and efficient all-atom protein generation
Nicholas J. Williams, Ward Haddadin, Matteo P. Ferla, Constantin Schneider, Nicholas B. Woodall, Ruby Sedgwick, Christian D. Madsen, Andrew L. Hopkins, Edward O. Pyzer-Knapp
Generative Models Efficient ML
  • Emyx introduces a simplified architecture for all-atom protein generation, focusing on geometric constraints.
  • The model outperforms existing state-of-the-art methods in enzyme design benchmarks.
  • Emyx achieves significant reductions in training time and computational resources.
  • The approach bridges flow matching training efficiency with advanced sampling methods from diffusion models.
Federated Bilevel Performative Prediction
Liangxin Qian, Chang Liu, Xuanyu Cao, Jun Zhao, Kwok-Yan Lam
Optimization Federated Learning Theory
  • Introduces federated bilevel performative prediction, integrating decision-dependent distribution shifts into federated learning.
  • Formulates the federated bilevel performatively stable (FBPS) point and establishes conditions for its existence and uniqueness.
  • Develops two algorithms, FBi-RRM and FBi-SGD, with convergence guarantees tailored for performative shifts.
  • Demonstrates improved meta-generalization and stability through experiments on strategic learning tasks.
Sparsity, Superposition, and Forgetting: A Mechanistic Study of Representation Retention in Continual Learning
Jan Wasilewski, Jędrzej Kozal, Michał Woźniak, Bartosz Krawczyk
Theory Interpretability
  • Superposition increases over time with transient dips at task boundaries, indicating boundary-specific interference.
  • Higher feature sparsity leads to more superposition but does not always result in forgetting if representations are strong.
  • Tasks with sparser features exhibit higher effective rank, suggesting broader latent capacity usage.
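The effective-rank finding can be made concrete with the common entropy-based definition (Roy & Vetterli, 2007). Whether the paper uses this exact variant is an assumption. Applied to a matrix of hidden activations (samples × features), it returns a value between 1 and the matrix rank, and that value grows as variance spreads across more directions:

```python
import numpy as np

def effective_rank(X):
    """Entropy-based effective rank: exp of the Shannon entropy of the normalized singular values.
    Assumes X is not the zero matrix."""
    s = np.linalg.svd(X, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]                     # treat 0 * log 0 as 0
    return float(np.exp(-np.sum(p * np.log(p))))
```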
Manifold Bandits: Bayesian Curriculum Learning over the Latent Geometry of Large Language Models
Darrien McKenzie, Nicklas Hansen, Xiaolong Wang
Reinforcement Learning Large Language Models NLP
  • Introduces Bayesian Manifold Curriculum (BMC) for structured problem sampling in RL for LLMs.
  • Frames problem sampling as a manifold-structured bandit problem, emphasizing the relationships between tasks.
  • Identifies trade-offs between productivity, diversity, and utility in adaptive curriculum learning.
  • Demonstrates that focusing solely on difficulty can hinder generalization and performance.
Flow Map Denoisers: Traversing the Distortion-Perception Plane for Inverse Problems
Nicolas Zilberstein, Morteza Mardani, Santiago Segarra
Computer Vision Generative Models Theory
  • Flow maps enable continuous traversal of the distortion-perception plane with a single model.
  • The lookahead parameter controls the tradeoff between MMSE and perceptual quality.
  • The method achieves optimality for Gaussian targets and demonstrates empirical effectiveness for natural images.
  • Integration into a Plug-and-Play framework allows for versatile applications in various inverse problems.
SL-S4Wave: Self-Supervised Learning of Physiological Waveforms with Structured State Space Models
Feng Wu, Harsh Deep, Eric Lehman, Sanyam Kapoor, Guoshuai Zhao, Rahul Krishnan, Gari Clifford, Li-wei H Lehman
Time Series
  • Introduces SL-S4Wave, a self-supervised learning framework tailored for long-sequence physiological waveforms.
  • Combines contrastive learning with a structured state space model to capture both local and long-range dependencies.
  • Demonstrates superior performance in arrhythmia detection with fewer labeled examples and robust performance on long segments.
  • Shows effective transferability to unseen arrhythmia types and generalizability to EEG tasks.
Sensorimotor World Models: Perception for Action via Inverse Dynamics
Petr Ivashkov, Randall Balestriero, Bernhard Schölkopf
Robotics Reinforcement Learning Theory
  • Introduction of sensorimotor world models (SMWM) that prioritize action-relevant representations.
  • Use of inverse dynamics regularization to prevent representation collapse and enhance model stability.
  • Demonstration of the model's ability to learn compact latent spaces and filter out irrelevant information.
  • Competitive performance in planning tasks compared to existing models.
Displacement Is Not Direction: Evaluating Fidelity Metrics for Quantized LLM Deployment
Miloš Nikolić, Ali Hadi Zadeh, Enrique Torres Sanchez, Andreas Moshovos
Large Language Models NLP Efficient ML
  • Identification of a silent zone where fidelity metrics lose ranking power.
  • Decomposition of score differences into volume and direction, highlighting the limitations of KL divergence (KLD).
  • Demonstration that per-prompt KLD has weak predictive power for model selection.
  • Evidence that the collapse of correlation is consistent across various metric variants.
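Per-prompt KLD is usually computed from the full-precision and quantized models' next-token distributions on the same tokens. A minimal NumPy sketch follows. It returns the mean token-level KL(P_ref || P_quant) alongside top-1 agreement. The paper's volume/direction decomposition is not reproduced:

```python
import numpy as np

def log_softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def fidelity_metrics(ref_logits, quant_logits):
    """Mean token-level KL(P_ref || P_quant) and top-1 agreement for one prompt.
    Both inputs have shape (num_tokens, vocab_size)."""
    lp_ref, lp_q = log_softmax(ref_logits), log_softmax(quant_logits)
    kld = np.sum(np.exp(lp_ref) * (lp_ref - lp_q), axis=-1).mean()
    top1 = np.mean(ref_logits.argmax(-1) == quant_logits.argmax(-1))
    return float(kld), float(top1)
```

Scaling all logits by 0.5, for instance, yields a positive KLD while leaving every top-1 prediction unchanged. This is a small case of displacement without a change of direction, loosely echoing the title's distinction.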
ProMUSE: Progressive Multi-modal Uncertainty-guided Staged Evidential Alzheimer Disease Classification
Long Doan, Branden Chen, Ethan Litton, Huan Huang, Jiajing Huang, Yixin Xie, Weihua Zhou, Nandakumar Narayanan, Chen Zhao
Multimodal Efficient ML
  • ProMUSE utilizes a staged approach to integrate multi-modal data for Alzheimer's disease classification.
  • The framework begins with low-cost clinical data and incorporates MRI/PET only when necessary, based on uncertainty thresholds.
  • Experiments show ProMUSE reduces MRI/PET usage by 50-90% while maintaining competitive diagnostic accuracy.
  • The methodology employs Dempster–Shafer theory for effective fusion of modality-specific beliefs and uncertainties.
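ProMUSE fuses modality-specific beliefs with Dempster–Shafer theory. The sketch below implements Dempster's rule of combination over a hypothetical two-class frame {AD, CN}. It also adds a hypothetical staging rule that requests the next modality while the mass on "either class" stays above a threshold. The paper's class set, evidence model, and thresholds are not given here:

```python
from itertools import product

THETA = frozenset({"AD", "CN"})  # hypothetical frame; mass on THETA expresses "don't know"

def dempster_combine(m1, m2):
    """Dempster's rule of combination. Mass functions map frozensets of hypotheses to masses."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to contradictory pairs
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

def needs_imaging(mass, threshold=0.3):
    """Hypothetical staging rule: escalate while the uncertainty mass exceeds the threshold."""
    return mass.get(THETA, 0.0) > threshold
```

For example, take made-up clinical masses {AD: 0.4, CN: 0.2, Θ: 0.4} and MRI masses {AD: 0.7, CN: 0.1, Θ: 0.2}. Combining them gives a conflict of 0.18 and a fused AD mass of 0.64 / 0.82 ≈ 0.78. The uncertainty mass drops from 0.4 to about 0.10.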
Connect the Dots: Training LLMs for Long-Lifecycle Agents with Cross-Domain Generalization Via Reinforcement Learning
Yanxi Chen, Weijie Shi, Yuexiang Xie, Boyi Hu, Yaliang Li, Bolin Ding, Jingren Zhou
Large Language Models Reinforcement Learning
  • Introduction of the CoD framework for training LLMs to enhance long-lifecycle agent capabilities.
  • End-to-end reinforcement learning approach interleaving task-solving and context-updating episodes.
  • Demonstrated improvements in task-solving performance through empirical results.
  • Potential for cross-domain generalization of the CoD meta-capability.
Learner-based Concept Drift Detection: Analysis and Evaluation
Md Moman Ul Haque Khan, Samira Sadaoui
Theory Time Series Efficient ML
  • Concept drift is a significant challenge for machine learning models in dynamic environments.
  • The paper categorizes drift detection methods into SPC, Window-based, and Ensemble-based frameworks.
  • A total of 15 drift detection algorithms are reviewed and empirically evaluated.
  • Synthetic and real-world datasets are used to assess the performance of the detection methods.
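A classic learner-based detector is DDM (Gama et al., 2004). It watches the base learner's online error rate p and its standard deviation s. It signals a warning when p + s exceeds p_min + 2·s_min and a drift when it exceeds p_min + 3·s_min. Whether and how the paper parameterizes DDM is not stated here. The 30-sample warm-up below is a common default and an assumption:

```python
import math

class DDM:
    """Drift Detection Method: tracks a learner's online error rate."""

    def __init__(self, min_samples=30):
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        self.n, self.p = 0, 0.0
        self.p_min, self.s_min = float("inf"), float("inf")

    def update(self, error):
        """error: 1 if the learner misclassified the current example, else 0.
        Returns 'drift', 'warning', or 'stable'."""
        self.n += 1
        self.p += (error - self.p) / self.n                        # running error rate
        s = math.sqrt(max(self.p * (1 - self.p), 0.0) / self.n)     # binomial std. deviation
        if self.n < self.min_samples:
            return "stable"
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s                     # best operating point so far
        if self.p + s > self.p_min + 3 * self.s_min:
            self.reset()                                            # restart statistics after drift
            return "drift"
        if self.p + s > self.p_min + 2 * self.s_min:
            return "warning"
        return "stable"
```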
Spectral DPPs via NEPv: A Scalable Continuous Relaxation of Determinantal MAP for Diversity-Aware Data Selection
Richard Yi Da Xu
Optimization Efficient ML Theory
  • Introduces a continuous relaxation of the DPP-MAP problem, making it scalable for large datasets.
  • Develops an eigenvector-dependent nonlinear eigenvalue problem (NEPv) formulation that allows for efficient iterative solving.
  • Achieves a time complexity of O((ndk + nk²)t), suitable for datasets with millions to billions of candidates.
  • Maintains the diversity objectives of DPPs while reducing computational costs.
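For reference, the discrete problem being relaxed is DPP MAP: pick k items that maximize log det(L_S) for a positive semidefinite similarity kernel L. The sketch below is the classical greedy heuristic, not the paper's NEPv method. It recomputes a determinant for every candidate at every step, so it is far slower than the scalable relaxation summarized above:

```python
import numpy as np

def greedy_dpp_map(L, k):
    """Greedy MAP for a DPP with PSD kernel L: repeatedly add the item that maximizes log det(L_S)."""
    selected = []
    for _ in range(k):
        best, best_val = None, -np.inf
        for i in range(L.shape[0]):
            if i in selected:
                continue
            S = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(S, S)])
            if sign > 0 and logdet > best_val:
                best, best_val = i, logdet
        if best is None:
            break  # no remaining candidate yields a positive determinant
        selected.append(best)
    return selected
```

With L built from item feature vectors as L = X Xᵀ, near-duplicate items produce a nearly singular submatrix. The log-determinant therefore rewards diverse selections.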
Fisher-Geometric Sharpness and the Implicit Bias of SGD toward Flat Minima
Md Sakir Ahmed, Kumaresh Sarmah, Hemen Dutta
Theory Optimization
  • Introduces Riemannian sharpness as a reparametrization-invariant measure of flatness.
  • Establishes a connection between SGD's implicit bias and Riemannian flatness through a PAC-Bayes generalization bound.
  • Demonstrates that mini-batch SGD concentrates probability mass at Riemannian-flat minima.
  • Empirical results show Riemannian sharpness correlates with generalization better than Euclidean sharpness.
Multi-Modal Contrastive Learning for Implicit Earth Embeddings via Location Tying
Jonathan Hecht, Lukas Arzoumanidis, Ziyue Li, Youness Dehbi
Multimodal
  • Introduction of two multimodal contrastive learning architectures: MELT and SALT.
  • Both architectures utilize unpaired geospatial data, expanding beyond traditional two-modality approaches.
  • Performance is primarily limited by the location encoder rather than modality diversity.
  • MELT provides more stable training compared to SALT.
Neural network surrogates with uncertainty quantification for inverse problems in partial differential equations
Christian Jimenez-Beltran, Aretha L. Teckentrup, Antonio Vergari, Konstantinos C. Zygalakis
Theory Efficient ML
  • Introduction of DeepGaLA, a neural network surrogate for PDEs with uncertainty quantification.
  • Demonstrates the effectiveness of DA-MCMC for evaluating posterior approximations.
  • Achieves accuracy comparable to Gaussian-process surrogates while improving efficiency in high-dimensional settings.
  • Incorporates differential-equation constraints, enhancing applicability in nonlinear scenarios.
Multi-Task Bayesian In-Context Learning
Qingyang Zhu, Eric Karl Oermann, Kyunghyun Cho
Theory Efficient ML Time Series
  • Introduces a flexible framework for test-time adaptation in Bayesian inference.
  • Demonstrates robust generalization under out-of-meta-distribution prior shifts.
  • Performs inference significantly faster than classical Bayesian methods.
  • Matches oracle Bayesian predictors across diverse task families.
Computational Methods and Challenges in Cell-Free DNA Analysis for Multi-Cancer Early Detection
Nicko Starkey, Marcin W. Wojewodzic, Krzysztof Rzecki
Multimodal
  • cfDNA is a promising biomarker for non-invasive multi-cancer early detection.
  • The review highlights various computational methods, including machine learning and deep learning approaches, for analyzing cfDNA.
  • Challenges in the field include technical, computational, and methodological issues that need to be addressed for clinical integration.
  • Multimodal ensemble approaches are identified as having the highest readiness for clinical application.
OnDeFog: Online Decision Transformer under Frame Dropping
Daiki Yotsufuji, Kenta Nishihara, Shoma Shimizu, Kento Uchida, Shinichi Shirakawa
Reinforcement Learning
  • OnDeFog combines offline learning mechanisms with online reinforcement learning to handle frame dropping.
  • It addresses the limitations of DeFog, which struggles with generalization due to its offline nature.
  • OnDeFog demonstrates superior performance in environments with high frame dropping rates.
  • The method outperforms DeFog on datasets with a significant amount of low-reward data.
Protein Representation Learning with Secondary-Structure and Energy-Filtered Hydrogen-Bond Graphs
Mohamed Mouhajir, Limei Wang, El Houcine Bergou, Hajar El Hammouti, Lamiae Azizi, Dongqi Fu
Graph Learning
  • Introduction of SSProNet, a graph neural network that incorporates secondary structure and hydrogen-bond interactions for protein representation.
  • Construction of protein graphs based on biophysically grounded topologies that reflect stabilizing forces rather than mere proximity.
  • Augmentation of residue nodes with secondary structure assignments to provide additional structural context.
  • Empirical validation shows consistent performance improvements over traditional methods on various protein-related tasks.
Pseudo-Feature Padding: A Lightweight Defense Against False Data Injection in Power Grids
Farhin Farhad Riya, Shahinul Hoque, Yingyuan Yang, Jinyuan Sun, Kevin Tomsovic
Theory Efficient ML
  • Introduction of a lightweight defense framework that protects DNNs in cyber-physical systems (CPS) against false data injection attacks (FDIA).
  • Utilization of pseudo-feature padding to increase input dimensionality and complexity.
  • Model-agnostic approach requiring no modifications to existing DNN architectures.
  • Demonstrated significant robustness improvements with negligible performance impact.
Efficiently Representing Algorithms With Chain-of-Thought Transformers
Yanhong Li, Anej Svete, Ashish Sabharwal, William Merrill
Theory Efficient ML Large Language Models
  • CoT transformers can simulate Word RAM algorithms with poly-logarithmic overhead.
  • The study provides a direct simulation method that is more efficient than Turing machine-based simulations.
  • Three practical settings for CoT simulation are established: finite-precision transformers, continuous CoT, and hybrid architectures.
  • The results show that CoT transformers can efficiently execute common algorithms such as sorting and Dijkstra's algorithm.
When, Where, and How: Adaptive Binning for Tabular Self-Supervised Learning
Daehwan Kim, Haejun Chung, Ikbeom Jang
Theory Efficient ML Optimization
  • Introduction of Adaptive Binning, a training-adaptive discretization method for tabular self-supervised learning.
  • Feature-wise coarse-to-fine curriculum allows for dynamic refinement of discretization during training.
  • Combines categorical reconstruction with ordinal supervision for improved representation learning.
  • Demonstrated consistent performance improvements across various medical tabular datasets.
Kolmogorov-Arnold Reservoir Computing
Juntian Huang, Jurgen Kurths, Ying Tang
Theory Efficient ML Time Series
  • KARC utilizes explicit basis-function expansions inspired by the Kolmogorov-Arnold representation theorem.
  • It achieves efficient closed-form training while preserving the expressive capacity of Kolmogorov-Arnold networks.
  • KARC outperforms existing reservoir computing methods on benchmarks involving chaotic systems and PDEs.
  • The framework can be integrated with generative diffusion models for enhanced feature forecasting.
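KARC's exact basis functions and reservoir design are not given in the summary. As a loose illustration of the closed-form training it mentions, the sketch below builds explicit per-input polynomial features from delayed inputs and fits a linear readout by ridge regression in a single solve. The polynomial basis is an assumption; Kolmogorov-Arnold-style models often use splines instead:

```python
import numpy as np

def expand(window, degree=3):
    """Additive per-input basis: each delayed input u becomes u, u^2, ..., u^degree, plus a bias."""
    powers = [window ** d for d in range(1, degree + 1)]
    return np.hstack([np.ones((window.shape[0], 1))] + powers)

def fit_readout(series, delays=2, degree=3, ridge=1e-8):
    """Closed-form ridge regression from basis features of the last `delays` values to the next one."""
    X = np.stack([series[i:len(series) - delays + i] for i in range(delays)], axis=1)
    y = series[delays:]
    Phi = expand(X, degree)
    return np.linalg.solve(Phi.T @ Phi + ridge * np.eye(Phi.shape[1]), Phi.T @ y)

def predict(W, window, degree=3):
    """One-step forecast from the most recent `delays` values (oldest first)."""
    return float((expand(np.asarray(window)[None, :], degree) @ W)[0])
```

The readout is the only trained part, so training reduces to one linear solve. For the chaotic logistic map x_{t+1} = 3.9 x_t (1 - x_t), a one-delay, degree-3 basis recovers weights close to [0, 3.9, -3.9, 0].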
Human-like autonomy emerges from self-play and a pinch of human data
Daphne Cornelisse, Julian Hunt, Zixu Zhang, Waël Doulazmi, Kevin Joseph, Jaime Fernández Fisac, Eugene Vinitsky
Reinforcement Learning Robotics
  • Spiced self-play combines self-play RL with minimal human data to improve driving policies.
  • Only 30 minutes of human driving data significantly enhances policy alignment with human behavior.
  • The method avoids extensive reward engineering and domain randomization, simplifying the training process.
  • Policies trained with this approach exhibit lower collision rates and more human-like driving behavior.
When Calibration Fails the Vulnerable Hospital: Federated Conformal Risk Control via Risk-Curve Shrinkage
Nafis Fuad Shahid
Federated Learning Computer Vision Theory
  • Quantifies the marginal-conditional coverage gap in federated conformal risk control (CRC) using real medical data.
  • Proposes a shrinkage-based federated CRC protocol to improve prediction set efficiency.
  • Demonstrates that naive pooling of calibration scores can lead to significant coverage violations.
  • Highlights the necessity of finite-sample correction terms in maintaining coverage guarantees.
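The finite-sample correction is visible in standard conformal risk control (CRC). With n calibration points and losses bounded by B, CRC chooses the smallest λ such that (n / (n + 1))·R̂(λ) + B / (n + 1) ≤ α. A sketch of that single-site rule follows. The paper's federated protocol and risk-curve shrinkage are not reproduced:

```python
import numpy as np

def crc_threshold(losses, lambdas, alpha, B=1.0):
    """Conformal risk control on one calibration set.
    losses[i, j]: loss of calibration example i at lambdas[j], in [0, B] and nonincreasing in lambda.
    lambdas: sorted ascending."""
    n = losses.shape[0]
    corrected = n / (n + 1) * losses.mean(axis=0) + B / (n + 1)  # finite-sample-corrected risk
    ok = np.nonzero(corrected <= alpha)[0]
    return lambdas[ok[0]] if ok.size else lambdas[-1]  # fall back to the most conservative lambda
```

The B / (n + 1) term matters most for a small hospital. With n = 20 and B = 1, it alone contributes about 0.048 to the bound. Pooling scores from larger sites shrinks this term and can mask the correction a small site needs.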
LOKI: Memory-Free Null-Space Constrained Lifelong Knowledge Editing
Masih Eskandar, Miquel Sirera Perelló, Stratis Ioannidis, Jennifer Dy
NLP Large Language Models Efficient ML
  • Introduction of a dynamic layer selection algorithm for knowledge editing.
  • Utilization of null-space projections to preserve past knowledge.
  • Demonstrated superior performance compared to existing lifelong knowledge editing methods.
  • No need for access to previous knowledge or extensive preprocessing.
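Null-space-constrained editing projects each weight update so that it cannot change the layer's outputs on a protected set of key vectors. The sketch below builds that projector from an explicit key matrix K. LOKI is described as memory-free, so it presumably obtains the protected subspace differently. Its dynamic layer-selection algorithm is not reproduced either:

```python
import numpy as np

def null_space_projector(K, tol=1e-10):
    """Projector onto the orthogonal complement of span(K).
    K has shape (d, m): m protected key vectors of dimension d as columns. Returns P with P @ K ~= 0."""
    U, S, _ = np.linalg.svd(K, full_matrices=True)
    rank = int((S > tol * S.max()).sum()) if S.size else 0
    U_null = U[:, rank:]          # orthonormal basis of the complement of span(K)
    return U_null @ U_null.T
```

An update dW (out × d) applied as dW @ P leaves the layer's response to every protected key unchanged, because (dW @ P) @ K ≈ 0.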
Predicting gestational age at birth in the context of preterm birth from multi-modal fetal MRI
Diego Fajardo-Rojas, Megan Hall, Daniel Cromb, Mary A. Rutherford, Lisa Story, Emma C. Robinson, Jana Hutter
Multimodal
  • Developed a machine learning pipeline for predicting gestational age at birth using multi-modal fetal MRI data.
  • Achieved an accuracy of 0.77 and specificity of 0.82 in classifying term and preterm births.
  • Identified key predictive features such as cervical length and placental T2* values.
  • Demonstrated the potential of using regression models for predicting gestational age, expanding the approach beyond classification.
Does Text Actually Help? Uncovering and Resolving Text Collapse in Multimodal Time Series Forecasting
Huu Hiep Nguyen, Minh Hoang Nguyen, Dung Nguyen, Hung Le
Time Series Multimodal
  • Identification of 'text collapse' as a critical failure mode in multimodal time series forecasting.
  • Introduction of REST-TS, a framework that resolves text collapse by supervising the text branch on the residual components.
  • REST-TS achieves state-of-the-art performance across diverse domains without requiring changes to backbone architectures.
  • Effective rank analysis shows that REST-TS enhances the utilization of textual information in forecasting.
Algebraic Dead Directions in LayerNorm Transformers: A Forward-Pass-Only Diagnostic at LLM Scale
Tejas Pradeep Shirodkar, P. J. Narayanan
Theory Large Language Models Optimization
  • Introduces a forward-pass-only method to identify dead directions in LayerNorm transformers.
  • Demonstrates that the inverse-scale direction of LayerNorm lies in the kernel of the activation covariance.
  • Validates the method across 14 pretrained transformers, achieving high accuracy in predictions.
  • Shows that training increases the depth of dead directions, revealing more complex structures.
Convex training of Lipschitz-regularized shallow neural networks
Chao Yin, Antoine Lesage-Landry
Optimization Theory
  • Introduction of a convex training method for Lipschitz-regularized shallow neural networks.
  • The proposed method guarantees that the optimal network is no worse than the initial pre-trained network.
  • Demonstrated improvements in accuracy and robustness against adversarial attacks on real-world datasets.
  • The convex program can be solved efficiently using existing optimization solvers.
Critical Percolation as a Synthetic Data Model for Interpretability
Aryeh Brill, Tom Ingebretsen Carlson
Interpretability
  • Introduces a synthetic data model based on critical percolation clusters to improve interpretability research.
  • The model captures hierarchical and multi-scale structures, reflecting properties of natural data.
  • An efficient algorithm for generating data at arbitrary scales is proposed.
  • Probing experiments show that latent variables can be decoded from neural network activations.
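Critical percolation clusters are easy to sample at small scale. Occupy each site of a square grid independently with probability p at the site-percolation threshold p_c ≈ 0.5927, then label the connected clusters. The naive flood-fill sketch below is for illustration only; the paper's efficient arbitrary-scale generator is not reproduced:

```python
import numpy as np
from collections import deque

P_C_SITE_SQUARE = 0.592746  # site-percolation threshold of the square lattice

def percolation_clusters(size, p=P_C_SITE_SQUARE, seed=0):
    """Label 4-connected clusters of occupied sites on a size x size grid (0 = empty)."""
    rng = np.random.default_rng(seed)
    occupied = rng.random((size, size)) < p
    labels = np.zeros((size, size), dtype=int)
    current = 0
    for i in range(size):
        for j in range(size):
            if occupied[i, j] and labels[i, j] == 0:
                current += 1
                labels[i, j] = current
                queue = deque([(i, j)])
                while queue:  # breadth-first flood fill of one cluster
                    r, c = queue.popleft()
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nr, nc = r + dr, c + dc
                        if 0 <= nr < size and 0 <= nc < size and occupied[nr, nc] and labels[nr, nc] == 0:
                            labels[nr, nc] = current
                            queue.append((nr, nc))
    return labels
```

Cluster labels, sizes, and extents then act as known latent variables that a probe could try to decode from a network's activations.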
On the QUEST for Uncertainty Quantification via Highest Density Regions
Sam Goring, Tom Kuipers, Nicola Paoletti, David S. Watson
Theory
  • QUEST provides a novel framework for uncertainty quantification based on highest density regions.
  • The approach addresses limitations of traditional proper scoring rules in regression tasks.
  • QUEST measures satisfy important axioms from the uncertainty quantification literature.
  • Empirical results indicate that QUEST outperforms standard uncertainty measures in selective prediction tasks.
VERITAS: Verifier-Guided Proof Search for Zero-Shot Formal Theorem Proving
Manish Acharya, Zhenyu Liao, Yueke Zhang, Kevin Leach, Yu Huang, Yifan Zhang
Theory Large Language Models Reinforcement Learning
  • Introduces VERITAS, a zero-shot framework for formal theorem proving.
  • Utilizes a two-phase protocol that incorporates structured verifier feedback into proof generation.
  • Achieves improved theorem solving rates compared to existing methods.
  • Releases VERITAS-CombiBench, a benchmark of 55 combinatorics theorems.
3D-DLP: Self-Supervised 3D Object-Centric Scene Representation Learning
Ellina Zhang, Madhaven Iyengar, Amir Zadeh, Chuan Li, Deepak Pathak, David Held, Tal Daniel
Computer Vision Robotics Reinforcement Learning
  • 3D-DLP is the first self-supervised object-centric representation model for colored 3D voxel data.
  • The model utilizes a compact particle representation that scales effectively to real-world data.
  • Key methodological innovations include an appearance-aware K-means keypoint prior and a chroma reconstruction loss.
  • The learned latent particles are controllable and interpretable, allowing for effective scene manipulation.
Score Approximation for Diffusion Models on Arbitrary Low-Dimensional Structures
Xinhe Mu, Zaijiu Shang, Zhaoqi Zhou, Chuan Zhou, Qi Meng, Guiying Yan, Zhiming Ma
Generative Models Theory Efficient ML
  • Establishes the first score approximation theory for any distribution with compact support, removing continuity and smoothness assumptions.
  • Introduces a discrete-mixture formulation that allows for score approximation with ReLU networks, mitigating the curse of dimensionality.
  • Shows that the required number of network parameters grows as O(ε^(-d/2)) as the approximation error ε decreases, improving upon existing theoretical bounds.
  • Provides a divide-and-conquer strategy to bound discretization error and higher-order derivatives of log p_t(x), enhancing diffusion theory.
The Significance of Style Diversity in Annotation-Free Synthetic Data Generation
Zahra Abbasiantaeb, Zeno Belligoli, Omar Essam, Mohammad Aliannejadi
NLP Large Language Models Generative Models
  • Proposes an annotation-free framework for synthetic dialogue generation using intent definitions.
  • Introduces two stylization models (Univ and Exam) to enhance linguistic style diversity.
  • Demonstrates that style diversity is more critical than topic diversity for synthetic data utility.
  • Achieves up to 93.3% of the performance obtained with human-annotated data in intent classification tasks.
VIMPO: Value-Implicit Policy Optimization for LLMs
Zhewei Kang, Aosong Feng, Sergey Levine, Dawn Song, Xuandong Zhao
Large Language Models Reinforcement Learning Optimization
  • VIMPO offers a critic-free approach to policy optimization that maintains simplicity while enhancing credit assignment.
  • The method derives a closed-form representation of the value function using policy log-ratios and Monte Carlo estimates.
  • VIMPO achieves better training efficiency and higher validation accuracy compared to GRPO, especially under noisy reward conditions.
  • The approach separates reward incorporation from policy improvement, allowing for coherent integration of outcome-based supervision.
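For KL-regularized objectives, the value function has a well-known closed form in terms of the reference policy: V = β log E_{y ~ π_ref}[exp(r(y)/β)]. Rearranging the optimal-policy condition gives r(y) = β log(π(y)/π_ref(y)) + V, which links rewards to policy log-ratios. Whether VIMPO uses exactly this form is an assumption. The sketch shows a Monte Carlo estimate of V and the implied reward:

```python
import numpy as np

def soft_value(rewards, beta):
    """Monte Carlo estimate of V = beta * log E_{y ~ pi_ref}[exp(r(y) / beta)],
    given rewards of completions sampled from the reference policy."""
    z = np.asarray(rewards, dtype=float) / beta
    m = z.max()
    return float(beta * (m + np.log(np.mean(np.exp(z - m)))))  # stable log-mean-exp

def implied_reward(logp_policy, logp_ref, value, beta):
    """Rearranged optimality condition of the KL-regularized objective: r = beta * log(pi / pi_ref) + V."""
    return beta * (np.asarray(logp_policy) - np.asarray(logp_ref)) + value
```

As β grows, the estimate approaches the mean sampled reward. As β shrinks, it approaches the maximum sampled reward.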