AI-generated summaries

Today's ML research, without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

48 papers today · Updated every 8 hours · 7 days of history
Meta-Ensemble Learning with Diverse Data Splits for Improved Respiratory Sound Classification
June-Woo Kim, Miika Toikkanen, Heejoon Koo, Yoon Tae Kim, Doyoung Kwon, Kyunghoon Kim
Audio & Speech
  • Introduction of a meta-ensemble learning method that leverages data diversity through distinct data splits.
  • Evaluation of various meta-model architectures, including feedforward and Transformer-based models.
  • Demonstration that data diversity at the base model level significantly enhances generalization.
  • Achievement of state-of-the-art performance on the ICBHI dataset and robust results on clinical validation datasets.
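The recipe behind the first two bullets is compact enough to sketch: train base models on distinct data splits, then fit a meta-model on their stacked predictions. A minimal scikit-learn version, with a synthetic dataset and simple classifiers standing in for the paper's audio models:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=900, n_features=20, random_state=0)
X_base, y_base = X[:500], y[:500]        # pool for the base models
X_fit, y_fit = X[500:700], y[500:700]    # held out for the meta-model
X_test, y_test = X[700:], y[700:]

# Data diversity: each base model is trained on a different disjoint fold.
bases = [LogisticRegression(max_iter=1000).fit(X_base[idx], y_base[idx])
         for _, idx in KFold(n_splits=4, shuffle=True, random_state=0).split(X_base)]

def stacked(X):
    # concatenate every base model's class probabilities into meta-features
    return np.hstack([b.predict_proba(X) for b in bases])

meta = MLPClassifier(hidden_layer_sizes=(16,), max_iter=3000, random_state=0)
meta.fit(stacked(X_fit), y_fit)
print("meta-ensemble test accuracy:", meta.score(stacked(X_test), y_test))
```
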
GraphPL: Leveraging GNN for Efficient and Robust Modalities Imputation in Patchwork Learning
Xingjian Hu, Zuoyu Yan, Jianhua Zhu, Liangcai Gao, Fei Wang, Tengfei Ma
Graph Learning · Multimodal
  • GraphPL effectively addresses the modality collapse issue prevalent in existing methods.
  • The framework leverages GNNs for dynamic and adaptive fusion of modalities.
  • GraphPL shows an average improvement of 9.2% in imputation tasks across simulated datasets.
  • On the real-world EHR dataset eICU, it achieves an average improvement of 8.7% over baseline methods.
Perfecting Aircraft Maneuvers with Reinforcement Learning
Atahan Cilan, Mahir Demir, Özgün Can Yürütken, Seyyid Osman Sevgili, Ümit Can Bekar
Reinforcement Learning · Robotics
  • The proposed RL algorithm can be applied to any physically acceptable maneuver with available trajectory data.
  • Real pilot data was used, demonstrating that the AI can achieve performance comparable to that of professional pilots.
  • The study addresses stability issues commonly found in supervised learning approaches for aerobatic maneuvers.
  • The algorithm allows for the generation of scaled versions of maneuvers using the same reference trajectory.
Dynamic Regret for Online Regression in RKHS via Discounted VAW and Subspace Approximation
Dmitry B. Rokhlin, Georgiy A. Karapetyants
Theory · Optimization
  • The paper extends the discounted VAW approach to the RKHS setting for online regression.
  • It introduces a general orthogonal truncation method for constructing RKHS from feature expansions.
  • Dynamic regret bounds are derived for both fast and slow regimes based on eigenvalue decay.
  • The method controls approximation errors through uniform projection errors of kernel sections.
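For intuition, here is the finite-dimensional discounted Vovk-Azoury-Warmuth recursion that the paper lifts to an RKHS via feature expansions; the discount factor and regularizer below are illustrative defaults, not the paper's settings:

```python
import numpy as np

def discounted_vaw(features, targets, gamma=0.99, reg=1.0):
    """Discounted Vovk-Azoury-Warmuth: online ridge regression in which
    past observations are geometrically down-weighted to track drift."""
    d = features.shape[1]
    A, b = reg * np.eye(d), np.zeros(d)
    preds = []
    for x, y in zip(features, targets):
        A = gamma * A + np.outer(x, x)   # x_t enters A before predicting (AW trick)
        b = gamma * b
        preds.append(x @ np.linalg.solve(A, b))
        b += y * x
    return np.array(preds)

t = np.arange(300)
X = np.stack([np.ones_like(t, dtype=float), np.sin(0.05 * t)], axis=1)
y = (1 + 0.01 * t) * np.sin(0.05 * t)      # slowly drifting coefficient
print("late-phase MSE:", np.mean((discounted_vaw(X, y)[200:] - y[200:]) ** 2))
```
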
Gradient-Direction Sensitivity Reveals Linear-Centroid Coupling Hidden by Optimizer Trajectories
Yongzhong Xu
Theory · Interpretability · Optimization
  • Traditional optimizer trajectory analysis often fails to capture feature-relevant directions in parameter space.
  • Gradient-based SED provides a more accurate representation of feature formation compared to update-based SED.
  • In multitask settings, gradient aggregation obscures important structure, necessitating task-resolved analysis.
  • Causal interventions reveal that low-rank structures, rather than specific directions, are crucial for understanding feature formation.
Laplace-Bridged Randomized Smoothing for Fast Certified Robustness
Miao Lin, MD Saifur Rahman Mazumder, Feng Yu, Daniel Takabi, Rui Ning
Computer Vision · Efficient ML · Theory
  • Introduction of Laplace-Bridged Smoothing (LBS) as a reformulation of Randomized Smoothing (RS).
  • LBS eliminates the need for noise-augmented training, reducing training costs and improving clean accuracy.
  • Significant reduction in certification costs, achieving speedups of up to 494× on edge devices.
  • Demonstrated stronger certified robustness on CIFAR-10 and ImageNet datasets compared to traditional RS.
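As context for the speedup claims, this is the Monte-Carlo certification loop of classical RS that LBS reformulates (a didactic sketch in the style of Cohen et al., with confidence-interval corrections omitted):

```python
import numpy as np
from scipy.stats import norm

def rs_certified_radius(f, x, sigma=0.25, n=1000, seed=0):
    """Classical RS: vote over Gaussian perturbations, then convert the
    top-class probability into a certified L2 radius."""
    rng = np.random.default_rng(seed)
    votes = np.bincount([f(x + sigma * rng.standard_normal(x.shape))
                         for _ in range(n)], minlength=2)
    p_a = min(votes.max() / n, 1 - 1e-6)   # top-class probability estimate
    return sigma * norm.ppf(p_a) if p_a > 0.5 else 0.0

f = lambda z: int(z.mean() > 0)            # toy binary classifier
print(rs_certified_radius(f, np.full(32, 0.2)))
```
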
Diffusion-Guided Feature Selection via Nishimori Temperature: Noise-Based Spectral Embedding
Vasiliy S. Usatyuk, Denis A. Sapozhnikov, Sergey I. Egorov
Graph Learning · Efficient ML · Theory
  • Introduction of Noise-Based Spectral Embedding (NBSE) for feature selection.
  • Utilization of the Nishimori temperature for determining critical points in the Bethe–Hessian matrix.
  • Achieves up to 70% feature reduction while preserving classification accuracy.
  • Demonstrates robustness against noise through theoretical bounds.
Approximating Uniform Random Rotations by Two-Block Structured Hadamard Rotations in High Dimensions
Tomer Zilca, Gal Mendelson
Theory · Efficient ML
  • Individual coordinates of the two-block structured Hadamard rotation converge uniformly in distribution to those of a uniformly rotated vector.
  • An explicit Kolmogorov-distance bound of order d^{-1/5} is established for one-dimensional marginals.
  • A significant lower bound on the Wasserstein distance shows that the two-block transform is not a globally accurate surrogate for uniform random rotations.
  • The results indicate a clear distinction between marginal behavior and full high-dimensional geometry in terms of approximation quality.
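A plausible reading of "two-block structured" is two composed (random sign-flip, normalized Hadamard) stages, as in fast Johnson–Lindenstrauss-style transforms; the paper's exact construction may differ, so treat this sketch as illustrative only:

```python
import numpy as np
from scipy.linalg import hadamard

def two_block_rotation(x, seed=0):
    """Two composed (random sign-flip, normalized Hadamard) blocks.
    len(x) must be a power of two."""
    d = len(x)
    H = hadamard(d) / np.sqrt(d)              # orthonormal Walsh-Hadamard matrix
    rng = np.random.default_rng(seed)
    for _ in range(2):
        x = H @ (rng.choice([-1.0, 1.0], d) * x)
    return x

x = np.zeros(64)
x[0] = 1.0                                    # maximally non-isotropic input
y = two_block_rotation(x)
print(y[:4], np.linalg.norm(y))               # Gaussian-looking marginals, norm 1
```
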
Sustained Gradient Alignment Mediates Subliminal Learning in a Multi-Step Setting: Evidence from MNIST Auxiliary Logit Distillation Experiment
Chayanon Kitkana, Shivam Arora
Theory · Optimization
  • Positive gradient alignment between trait and distillation gradients persists throughout multi-step training.
  • Removing the trait-aligned component of the distillation gradient effectively stops trait acquisition.
  • Liminal training reduces alignment but does not prevent trait acquisition, highlighting the inadequacy of current mitigation methods.
  • The study provides empirical evidence supporting the causal relationship between gradient alignment and subliminal learning.
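Both key operations, measuring trait/distillation gradient alignment and the projection ablation that stops trait acquisition, reduce to a few lines on flattened gradient vectors (toy vectors here, not the MNIST setup):

```python
import numpy as np

rng = np.random.default_rng(0)
g_trait = rng.standard_normal(10_000)                    # hidden-trait gradient
g_distill = 0.3 * g_trait + rng.standard_normal(10_000)  # partially aligned

cos = g_trait @ g_distill / (np.linalg.norm(g_trait) * np.linalg.norm(g_distill))
print(f"gradient alignment: {cos:.3f}")                  # persistently positive

# Ablation: strip the trait-aligned component before the optimizer step.
g_clean = g_distill - (g_distill @ g_trait / (g_trait @ g_trait)) * g_trait
print(f"alignment after projection: {g_clean @ g_trait:.1e}")  # ~0 by construction
```
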
Negative Ontology of True Target for Machine Learning: Towards Evaluation and Learning under Democratic Supervision
Yongquan Yang
Theory
  • Challenges the traditional assumption of the objective existence of the true target in ML.
  • Introduces Democratic Supervision as a participatory approach to supervision in ML.
  • Defines Multiple Inaccurate True Targets (MIATTs) to facilitate evaluation and learning.
  • Proposes the EL-MIATTs framework for ML-based predictive modeling.
VLM Judges Can Rank but Cannot Score: Task-Dependent Uncertainty in Multimodal Evaluation
Divake Kumar, Sina Tayebati, Devashri Naik, Ranganath Krishnan, Amit Ranjan Trivedi
Multimodal
  • VLMs can effectively rank responses but struggle with reliable absolute scoring.
  • Conformal prediction provides a method to quantify uncertainty in VLM evaluations.
  • Evaluation uncertainty is task-dependent, with significant variations in prediction interval widths.
  • A failure mode called 'ranking-scoring decoupling' is identified, where high ranking correlation does not guarantee reliable scores.
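Split conformal prediction, the uncertainty tool applied here, is short enough to sketch; on a task where the judge's absolute scores are noisy, the calibrated intervals come out wide (synthetic scores for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
truth = rng.uniform(0, 10, 600)                 # ground-truth quality
scores = truth + rng.normal(0, 1.5, 600)        # a noisy judge's absolute scores

res = np.abs(scores[:500] - truth[:500])        # calibration residuals
q = np.quantile(res, np.ceil(501 * 0.9) / 500)  # conformal quantile, alpha = 0.1

lo, hi = scores[500:] - q, scores[500:] + q     # 90% prediction intervals
cov = np.mean((truth[500:] >= lo) & (truth[500:] <= hi))
print(f"interval half-width {q:.2f}, empirical coverage {cov:.2f}")
```
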
Optimization-Free Topological Sort for Causal Discovery via the Schur Complement of Score Jacobians
Rui Wu, Hong Xie
Graph Learning · Theory · Efficient ML
  • Introduces SSTS, an optimization-free algorithm for causal discovery.
  • Establishes a mathematical equivalence between graph marginalization and Schur complement of SJIM.
  • Demonstrates scalability to high-dimensional data (up to 1000 variables) without non-convex optimization.
  • Characterizes the expectation gap in non-linear systems and proposes Block-SSTS to mitigate structural errors.
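The equivalence in the second bullet rests on a standard identity: marginalizing a variable out of a precision-style matrix is a Schur complement. A generic sketch of that operation (SSTS's leaf-selection rule on the SJIM is not reproduced here):

```python
import numpy as np

def marginalize(J, i):
    """Remove variable i from a symmetric matrix via the Schur complement."""
    keep = [k for k in range(len(J)) if k != i]
    A = J[np.ix_(keep, keep)]
    b = J[np.ix_(keep, [i])]
    return A - (b @ b.T) / J[i, i]

# precision matrix of a chain x0 - x1 - x2; marginalizing x2 updates x1's entry
J = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
print(marginalize(J, 2))
```
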
The Last Human-Written Paper: Agent-Native Research Artifacts
Jiachen Liu, Jiaxin Pei, Jintao Huang, Chenglei Si, Ao Qu, Xiangru Tang, Runyu Lu, Lichang Chen, Xiaoyan Bai, Haizhong Zheng, Carl Chen, Zhiyang Chen, Haojie Ye, Yujuan Fu, Zexue He, Zijian Jin, Zhenyu Zhang, Shangquan Sun, Maestro Harmon, John Dianzhuo Wang, Jianqiao Zeng, Jiachen Sun, Mingyuan Wu, Baoyu Zhou, Yuchen You, Shijian Lu, Yiming Qiu, Fan Lai, Yuan Yuan, Yao Li, Junyuan Hong, Ruihao Zhu, Beidi Chen, Alex Pentland, Ang Chen, Mosharaf Chowdhury, Zechen Zhang
Theory
  • Introduction of Agent-Native Research Artifacts (ARA) to preserve the full research process.
  • Identification of 'Storytelling Tax' and 'Engineering Tax' as critical issues in traditional research publication.
  • Development of mechanisms to support the ARA ecosystem, including a Live Research Manager and ARA Compiler.
  • Significant improvements in question-answering accuracy and reproduction success rates using ARAs.
Measuring the Sensitivity of Classification Models with the Error Sensitivity Profile
Andrea Maurino
Theory
  • Introduction of the Error Sensitivity Profile (ESP) for assessing model sensitivity to data errors.
  • Development of the Dirtify tool suite to support the computation of ESP and facilitate data cleaning.
  • Demonstration of the effectiveness of ESP through experiments on two datasets with 14 classification models.
  • ESP provides a detailed, model-specific sensitivity profile that aids in prioritizing data-cleaning efforts.
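The profile itself is easy to reproduce in miniature: inject errors into a growing fraction of test cells and record the accuracy curve; a sketch of the concept, with random-value corruption standing in for Dirtify's richer error types:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X[:700], y[:700])

rng = np.random.default_rng(0)
for rate in (0.0, 0.1, 0.3, 0.5):
    X_dirty = X[700:].copy()
    mask = rng.random(X_dirty.shape) < rate      # corrupt a fraction of cells
    X_dirty[mask] = rng.standard_normal(mask.sum())
    print(f"error rate {rate:.0%}: accuracy {clf.score(X_dirty, y[700:]):.3f}")
```
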
Diffusion Templates: A Unified Plugin Framework for Controllable Diffusion
Zhongjie Duan, Hong Zhang, Yingda Chen
Generative Models · Computer Vision
  • Diffusion Templates provide a unified framework for controllable diffusion, addressing fragmentation in existing methods.
  • The framework allows for the decoupling of model inference from capability injection, enhancing modularity and reusability.
  • A diverse set of Template models is released, covering various controllable generation tasks.
  • The system supports heterogeneous control modules through a standardized interface, improving integration across different architectures.
Meta-Aligner: Bidirectional Preference-Policy Optimization for Multi-Objective LLMs Alignment
Wenzhe Xu, Biao Liu, Yiyang Sun, Xin Geng, Ning Xu
NLP · Large Language Models · Reinforcement Learning
  • Introduction of MEAL, a framework for dynamic preference-policy optimization.
  • Utilization of a preference-weight-net for generating adaptive preference weights.
  • Establishment of a bidirectional feedback loop between preferences and policy responses.
  • Demonstration of improved performance on complex multi-objective benchmarks.
A Unifying Framework for Unsupervised Concept Extraction
Chandler Squires, Pradeep Ravikumar
Theory · Generative Models · Interpretability
  • Introduction of a new theoretical framework for unsupervised concept extraction based on latent concept generative models (LC-GMs).
  • Development of a meta-theorem for identifiability that simplifies proving guarantees for various concept extraction methods.
  • Demonstration of the framework's ability to recover existing identifiability results in related fields.
  • Discussion of the implications for method development, linking concept extraction with generative modeling and amortized posterior inference.
Supernodes and Halos: Loss-Critical Hubs in LLM Feed-Forward Layers
Audrey Cherilyn, Houman Safaai
Large Language Models · NLP · Efficient ML
  • Supernodes account for a significant portion of loss sensitivity in FFN layers, with the top 1% of channels capturing a median of 58.7% of LP mass.
  • Pruning methods that remove supernodes lead to substantial degradation in model performance, highlighting their critical role.
  • The study introduces the concept of 'write halos' around supernodes, which share redundancy and support, aiding in structured pruning.
  • The findings are validated across various LLMs, indicating a consistent pattern of LP concentration and the importance of core channel preservation.
Nautile-370M: Spectral Memory Meets Attention in a Small Reasoning Model
Maixent Chenebaux
NLP · Large Language Models · Reinforcement Learning
  • Introduction of SeqCond Attention (SCA), a novel sequence operator that enhances efficiency and expressiveness in language modeling.
  • Demonstration of SCA's theoretical expressiveness, proving it can replicate outputs of traditional self-attention mechanisms.
  • Development of a hybrid architecture combining SCA and transformer layers, optimizing for reasoning tasks.
  • Implementation of gradient-balanced GRPO and scored self-distillation to improve reinforcement learning outcomes.
Prior-Aligned Data Cleaning for Tabular Foundation Models
Laure Berti-Equille
Reinforcement Learning
  • Introduces L2C2, the first deep RL framework for tabular data cleaning focused on prior alignment.
  • Demonstrates the importance of reward design in RL for data cleaning, with several designs leading to trivial strategies.
  • TFMAwareReward effectively selects distinct cleaning pipelines and improves accuracy on challenging datasets.
  • Parameterized cleaning actions enhance the reward across most datasets.
Cortex-Inspired Continual Learning: Unsupervised Instantiation and Recovery of Functional Task Networks
Kevin McKee, Thomas Hazy, Yicong Zheng, Zacharie Bugaud, Thomas Miconi
Theory · Efficient ML
  • FTN offers a parameter-isolation approach to prevent catastrophic forgetting in continual learning.
  • The three-stage mask configuration allows for unsupervised task detection and rapid recovery of task-specific subnetworks.
  • FTN-Slow achieves nearly zero forgetting across multiple benchmarks, while FTN-Fast balances speed and retention.
  • The method is inspired by biological neural mechanisms, enhancing its structural and functional efficiency.
SFT-then-RL Outperforms Mixed-Policy Methods for LLM Reasoning
Alexis Limozin, Eduard Durech, Torsten Hoefler, Imanol Schlag, Valentina Pyatkin
Large Language Models · Reinforcement Learning · Optimization
  • Identification of critical bugs in popular training frameworks that degrade SFT performance.
  • Correction of these bugs leads to the SFT-then-RL pipeline outperforming mixed-policy methods by significant margins.
  • A truncated SFT-then-RL variant with fewer RL steps still outperforms mixed-policy methods, highlighting efficiency.
  • The findings underscore the importance of rigorous validation across different frameworks in machine learning research.
Parameter Efficiency Is Not Memory Efficiency: Rethinking Fine-Tuning for On-Device LLM Adaptation
Irene Tenison, Stella Ahn, Miriam Kim, Ebtisam Alshehri, Lalana Kagal
NLP · Large Language Models · Efficient ML
  • Parameter efficiency does not equate to memory efficiency in LLM adaptation.
  • LARS introduces a novel approach that reduces activation memory during training.
  • LARS achieves significant memory savings while maintaining competitive performance.
  • The framework is applicable to resource-constrained devices, enhancing LLM deployment.
Scalable Hyperparameter-Divergent Ensemble Training with Automatic Learning Rate Exploration for Large Models
Hailing Cheng, Tao Huang, Chen Zhu, Antonio Alonso
Optimization · Efficient ML
  • HDET allows simultaneous exploration of diverse learning rates across GPU replicas without additional hardware costs.
  • An automatic learning rate controller adapts the learning rate based on inter-replica performance, enhancing training efficiency.
  • The method improves model quality and convergence speed on large-scale training tasks.
  • HDET can be applied to various scalar hyperparameters, broadening its applicability in model training.
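One way to picture the controller, assuming a simple re-centering rule (the paper's adaptation logic may differ): each replica trains with its own learning rate, and the grid re-centers on the best performer after each interval.

```python
import numpy as np

def hdet_interval(center_lr, interval_loss, spread=2.0, n_replicas=4):
    """Run replicas at diverging learning rates; re-center on the best one."""
    lrs = center_lr * spread ** np.linspace(-1, 1, n_replicas)
    losses = [interval_loss(lr) for lr in lrs]   # one replica per GPU in practice
    return lrs[int(np.argmin(losses))]

# assumed toy loss landscape with its optimum near lr = 3e-3
toy_loss = lambda lr: (np.log10(lr) + 2.5) ** 2
lr = 1e-4
for _ in range(5):
    lr = hdet_interval(lr, toy_loss)
print(f"controller settled near lr = {lr:.1e}")
```
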
Safe-Support Q-Learning: Learning without Unsafe Exploration
Yeeun Lim, Narim Jeong, Donghwan Lee
Reinforcement Learning · Robotics · Theory
  • Introduces a behavior policy that enables safe RL without unsafe exploration in both online and offline settings.
  • Proposes a two-stage safe-support Q-learning framework with KL-regularized Bellman targets.
  • Demonstrates a unique fixed point for the safe Bellman operator and provides a method for policy extraction.
  • Achieves stable learning and safer behavior with performance on par or better than existing methods.
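The KL-regularized target in the second bullet has a closed form: the soft value is a behavior-policy-weighted log-sum-exp, so actions outside the safe support (mu = 0) cannot contribute. A tabular toy, with notation assumed:

```python
import numpy as np

def kl_bellman_target(r, q_next, mu_next, gamma=0.99, tau=0.5):
    """KL-regularized target: V(s') = tau * log E_{a~mu}[exp(Q(s',a)/tau)]."""
    v = tau * np.log(np.sum(mu_next * np.exp(q_next / tau)))
    return r + gamma * v

q_next = np.array([1.0, 2.0, -50.0])   # third action is catastrophic
mu_next = np.array([0.5, 0.5, 0.0])    # safe behavior policy never takes it
print(kl_bellman_target(0.1, q_next, mu_next))  # unaffected by the unsafe action
```
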
Intrinsic Mutual Information as a Modulator for Preference Optimization
Peng Liao, Peijia Zheng, Lingbo Li, Shangsong Liang, Lin Chen
NLP · Large Language Models · Optimization
  • RMiPO is a lightweight framework for offline preference optimization that reduces reliance on hyperparameter tuning.
  • The framework leverages intrinsic mutual information for dynamic hyperparameter modulation.
  • RMiPO reduces training overhead by over 15% compared to existing methods.
  • Extensive evaluations demonstrate RMiPO's superior performance on benchmark datasets.
GeoEdit: Local Frames for Fast, Training-Free On-Manifold Editing in Diffusion Models
Yiming Zhang, Sitong Liu, Ke Li, Zhihong Wu, Alex Cloninger, Melvin Leok
Generative Models · Optimization · Computer Vision
  • Introduces a Jacobian-free algorithm for on-manifold editing in diffusion models.
  • Proves a theoretical guarantee for the accuracy of tangent space estimation from perturbed samples.
  • Enables rapid, continuous editing without the need for full re-diffusion or retraining.
  • Demonstrates effective CLIP-guided optimization for semantic image editing.
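The tangent-frame step is essentially local PCA over perturbed samples, which is what makes a Jacobian-free algorithm possible; a generic sketch on a toy linear manifold (GeoEdit's frames live in a diffusion model's latent space):

```python
import numpy as np

def local_tangent_frame(samples, x, k):
    """Estimate a k-dim tangent frame at x from nearby (perturbed) samples."""
    _, _, Vt = np.linalg.svd(samples - x, full_matrices=False)
    return Vt[:k]                              # top right-singular directions

rng = np.random.default_rng(0)
basis = rng.standard_normal((2, 10))           # a toy 2-D manifold in 10-D
pts = 0.01 * rng.standard_normal((200, 2)) @ basis
frame = local_tangent_frame(pts, np.zeros(10), k=2)

Q1, _ = np.linalg.qr(frame.T)
Q2, _ = np.linalg.qr(basis.T)
print(np.linalg.svd(Q1.T @ Q2, compute_uv=False))  # principal-angle cosines ~ 1
```
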
Rethinking Layer Redundancy in Large Language Models: Calibration Objectives and Search for Depth Pruning
Minkyu Kim, Vincent-Daniel Yun, Youngrae Kim, Youngjin Heo, Suin Cho, Seong-hun Kim, Woosang Lim, Gaeul Kwon
NLP · Large Language Models · Efficient ML
  • Layer redundancy in LLMs is influenced by both the model and the evaluation objective.
  • Different calibration objectives yield qualitatively different pruning patterns.
  • Perplexity and downstream accuracy rankings do not consistently align.
  • Search algorithms converge to similar solutions under a fixed objective.
QFlash: Bridging Quantization and Memory Efficiency in Vision Transformer Attention
Sehyeon Oh, Yongin Kwon, Jemin Lee
Computer Vision · Efficient ML
  • QFlash enables fully integer-based softmax computation in Vision Transformers.
  • Achieves significant speedups (up to 8.69×) over existing methods while reducing energy consumption.
  • Addresses critical challenges in integer-only attention mechanisms, including scale explosion and GPU inefficiencies.
  • Maintains competitive accuracy levels, ensuring practical applicability in real inference scenarios.
Causal Representation Learning from General Environments under Nonparametric Mixing
Ignavier Ng, Shaoan Xie, Xinshuai Dong, Peter Spirtes, Kun Zhang
Theory · Graph Learning
  • Introduces a framework for causal representation learning in general environments without restrictive assumptions.
  • Demonstrates the ability to fully recover latent DAGs and identify causal variables under nonparametric mixing.
  • Utilizes third-order derivatives to extract causal ordering information, improving upon existing methods.
  • Validates theoretical results through simulation studies, showcasing practical applicability.
FedSLoP: Memory-Efficient Federated Learning with Low-Rank Gradient Projection
Yutong He, Zhengyang Huang, Jiahe Geng
Federated Learning · Optimization · Efficient ML
  • FedSLoP reduces communication and memory costs in federated learning.
  • The algorithm employs low-rank gradient projections to maintain optimization efficiency.
  • Theoretical convergence guarantees are provided for nonconvex settings.
  • Empirical results demonstrate competitive accuracy against existing methods.
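The mechanism in miniature: project each gradient matrix onto a low-rank basis and transmit only the small factors, GaLore-style; FedSLoP's projector schedule and server-side aggregation are not shown:

```python
import numpy as np

def project_gradient(G, r):
    """Compress a gradient matrix to (projector, rank-r factor) before upload."""
    U, _, _ = np.linalg.svd(G, full_matrices=False)
    P = U[:, :r]
    return P, P.T @ G

rng = np.random.default_rng(0)
# gradients in practice concentrate energy in a few directions; mimic that
G = rng.standard_normal((1024, 8)) @ rng.standard_normal((8, 256))
G += 0.05 * rng.standard_normal(G.shape)

P, G_low = project_gradient(G, r=8)
print("elements sent:", P.size + G_low.size, "vs full:", G.size)
print("relative residual:", np.linalg.norm(G - P @ G_low) / np.linalg.norm(G))
```
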
Feasible-First Exploration for Constrained ML Deployment Optimization in Crash-Prone Hierarchical Search Spaces
Christian Lysenstøen
Optimization
  • Introduces Thermal Budget Annealing (TBA) for feasible-first exploration in deployment optimization.
  • Identifies the importance of early exploration quality in crash-heavy and budget-constrained environments.
  • Presents DeployBench, a benchmark suite for evaluating deployment optimization strategies.
  • Demonstrates improved model family discovery and reduced wasted budget compared to cold-start TPE.
PolyKV: A Shared Asymmetrically-Compressed KV Cache Pool for Multi-Agent LLM Inference
Ishan Patel, Ishan Joshi
Large Language Models · Efficient ML · NLP
  • Introduces a shared-pool architecture for KV cache that allows multiple agents to access a single compressed cache.
  • Achieves a stable 2.91x compression ratio across various configurations and models.
  • Demonstrates significant memory savings, reducing KV cache memory from 19.8 GB to 0.45 GB with minimal performance degradation.
  • Finds that perplexity degradation shrinks at longer context lengths, suggesting an implicit regularization effect.
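A minimal sketch of the shared-pool idea, assuming a prefix-keyed store with plain per-tensor int8 quantization; PolyKV's asymmetric compression scheme is more involved:

```python
import numpy as np

class SharedKVPool:
    """One compressed KV entry per prefix, shared by all agents."""
    def __init__(self):
        self.store = {}                       # prefix -> (int8 tensor, scale)

    def put(self, prefix, kv):
        scale = float(np.abs(kv).max()) / 127 or 1.0
        self.store[prefix] = ((kv / scale).round().astype(np.int8), scale)

    def get(self, prefix):
        q, scale = self.store[prefix]
        return q.astype(np.float32) * scale   # dequantize on read

pool = SharedKVPool()
kv = np.random.default_rng(0).standard_normal((2, 8, 64)).astype(np.float32)
pool.put("shared system prompt", kv)          # agent A writes once
out = pool.get("shared system prompt")        # agent B reuses it, ~4x smaller
print(out.shape, float(np.abs(out - kv).max()))
```
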
BitRL: Reinforcement Learning with 1-bit Quantized Language Models for Resource-Constrained Edge Deployment
Md. Ashiq Ul Islam Sajid, Mohammad Sakib Mahmood, Md. Tareq Hasan, Md Abdur Rahim, Rafat Ara, Md. Arafat Hossain
Reinforcement Learning · Large Language Models · Efficient ML
  • BitRL is the first framework to integrate 1-bit quantized LLMs with reinforcement learning for edge deployment.
  • Achieves 10-16× memory reduction and 3-5× energy efficiency improvements while retaining 85-98% of task performance.
  • Theoretical analysis characterizes quantization effects and identifies value estimation as a critical bottleneck.
  • Real-world validation on commodity hardware (Raspberry Pi 4) confirms the framework's practical utility.
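The weight representation underneath such a framework can be sketched in two lines, assuming BitNet-style binarization (the RL integration and the value-estimation fixes are the paper's actual contribution):

```python
import numpy as np

def binarize(W):
    """1-bit quantization: W ~ alpha * sign(W), with alpha the mean |weight|."""
    alpha = np.abs(W).mean()
    return alpha * np.sign(W)

W = np.random.default_rng(0).standard_normal((256, 256)).astype(np.float32)
err = np.linalg.norm(W - binarize(W)) / np.linalg.norm(W)
print(f"relative error {err:.2f}; storage ~1 bit vs 32 per weight")
```
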
Scaling Multi-Node Mixture-of-Experts Inference Using Expert Activation Patterns
Abhimanyu Bambhaniya, Geonhwa Jeong, Jason Park, Jiecao Yu, Jaewon Lee, Pengchao Wang, Changkyu Kim, Chunqiang Tang, Tushar Krishna
Large Language Models · Efficient ML · Optimization
  • Characterization of expert activation patterns across multiple MoE models reveals critical insights into load imbalance and activation correlation.
  • Proposed workload-aware micro-batch grouping and expert placement strategies significantly reduce inter-node communication overhead.
  • Optimizations lead to a 20% reduction in all-to-all communication volume and a 6% decrease in MoE layer latency.
  • The approach provides both theoretical insights and practical solutions for scaling MoE inference in large multi-node clusters.
Unstable Rankings in Bayesian Deep Learning Evaluation
Qishi Zhan, Minxuan Hu, Guansu Wang, Jiaxin Liu, Liang He
Theory
  • Standard evaluations of Bayesian deep learning methods are unreliable under data scarcity.
  • Method rankings are dataset-dependent and can vary significantly across different datasets.
  • A Bayesian hierarchical model is proposed to treat evaluation metrics as random variables.
  • The Minimum Detectable Difference curve helps assess the reliability of observed performance gaps.
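The instability is easy to reproduce: bootstrap a small test set and watch the ranking between two close methods flip (toy accuracies; the paper's hierarchical model is what formalizes this into an MDD curve):

```python
import numpy as np

rng = np.random.default_rng(0)
correct_a = rng.random(100) < 0.82      # method A, true accuracy 0.82
correct_b = rng.random(100) < 0.80      # method B, true accuracy 0.80

flips = 0
for _ in range(1000):
    idx = rng.integers(0, 100, 100)     # bootstrap resample of the test set
    flips += correct_a[idx].mean() < correct_b[idx].mean()
print(f"B outranks A in {flips / 10:.1f}% of resamples")  # far from 0%
```
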
PathMoG: A Pathway-Centric Modular Graph Neural Network for Multi-Omics Survival Prediction
Di Wang, Chupei Tang, Junxiao Kong, Jixiu Zhai, Moyu Tang, Tianchi Lu
Graph Learning · Multimodal · Interpretability
  • PathMoG reorganizes genomic data into pathway modules to enhance biological relevance in survival prediction.
  • The Hierarchical Omics Modulation (HOM) mechanism allows for better integration of multi-omics data.
  • A dual-level attention mechanism captures complex interactions between pathways and clinical outcomes.
  • PathMoG outperforms existing survival prediction models across multiple cancer types.
Towards interpretable AI with quantum annealing feature selection
Francesco Aldo Venturelli, Emanuele Costa, Sikha O K, Bruno Juliá-Díaz, Miguel A. González Ballester, Alba Cervera-Lierta
Computer Vision · Interpretability · Optimization
  • Introduces a quantum annealing-based feature selection method for CNNs.
  • Enhances interpretability by identifying important feature maps for predictions.
  • Demonstrates improved class disentanglement compared to existing methods.
  • Analyzes the computational behavior of quantum annealing in feature selection.
Evaluating CUDA Tile for AI Workloads on Hopper and Blackwell GPUs
Divakar Kumar Yadav, Tian Zhao, Deepak Kumar
Efficient ML · Optimization · Large Language Models
  • CuTile achieves up to 1,007 TFLOP/s for fused attention on Blackwell B200, outperforming FlashAttention-2 by 2.5x.
  • CuTile is a practical replacement for WMMA in GEMM tasks, requiring significantly less code.
  • CuTile does not outperform cuBLAS for standard GEMM workloads, achieving only 52-79% of its performance.
  • Triton offers better portability across architectures compared to CuTile.
Efficient VQ-QAT and Mixed Vector/Linear quantized Neural Networks
Terry Gou, Puneet Gupta
Efficient ML
  • Introduction of a cosine similarity-based assignment method for VQ model weight compression.
  • Enhancement of Differentiable K-Means with top-1 sampling and a straight-through estimator.
  • Exploration of differentiable neural architecture search for adaptive layer-wise quantization.
  • Demonstrated faster training and better accuracy preservation on ResNet-18 for ImageNet classification.
CoreFlow: Low-Rank Matrix Generative Models
Dongze Wu, Linglingzhi Zhu, Yao Xie
Generative Models
  • CoreFlow learns shared low-rank geometry for matrix distributions, improving training efficiency.
  • The model effectively handles incomplete matrices through masked updates and iterative completion.
  • CoreFlow shows substantial improvements in generation quality in few-sample settings.
  • The approach preserves matrix structure while reducing the effective generative dimension.
Time-varying Interaction Graph ODE for Dynamic Graph Representation Learning
Xiaoyi Wang, Zhiqiang Wang, Jianqing Liang, Xingwang Zhao, Chuangyin Dang, Zhen Jin, Jiye Liang
Graph Learning
  • TI-ODE models inter-node interactions as a time-dependent combination of multiple basis functions.
  • The model captures both the diversity of interaction patterns and their time-varying nature.
  • TI-ODE shows superior robustness compared to traditional models that rely on a unified message-passing mechanism.
  • Experimental results indicate state-of-the-art performance on multiple dynamic graph benchmarks.
EvoTSC: Evolving Feature Learning Models for Time Series Classification via Genetic Programming
Xuanhao Yang, Bing Xue, Mengjie Zhang
Time Series
  • EvoTSC leverages genetic programming to evolve feature learning models for time series classification.
  • The multi-layer program structure integrates expert knowledge to enhance the evolutionary search process.
  • A modified Pareto tournament selection strategy is introduced to reduce overfitting and improve model generalizability.
  • EvoTSC significantly outperforms eleven benchmark methods in various experimental comparisons.
Online combinatorial optimization with stochastic decision sets and adversarial losses
Gergely Neu, Michal Valko
Optimization · Theory
  • Introduces algorithms for online combinatorial optimization with stochastic action availability.
  • Proposes Counting Awake Times for efficient loss estimation.
  • Improves regret bounds from O(T^{4/5}) to O(T^{2/3}) in general, and to O(√T) in restricted settings.
  • Eliminates costly exploration phases present in previous algorithms.
Hybrid JIT-CUDA Graph Optimization for Low-Latency Large Language Model Inference
Divakar Kumar Yadav, Tian Zhao
Large Language Models · Optimization · Efficient ML
  • Introduction of a hybrid JIT-CUDA Graph runtime to optimize LLM inference.
  • Significant reduction in inference latency and variance for short-sequence workloads.
  • Asynchronous graph capture mechanism enhances runtime flexibility.
  • Evaluation shows up to 66.0% reduction in Time-to-First-Token (TTFT).
FlashOverlap: Minimizing Tail Latency in Communication Overlap for Distributed LLM Training
Rezaul Karim, Austin Wen, Wang Zongzuo, Weiwei Zhang, Yang Liu, Walid Ahmed
Large Language Models · Efficient ML · Optimization
  • FlashOverlap eliminates tail latency in communication-computation overlap for distributed LLM training.
  • The method uses peer-to-peer communication to replace traditional collective operations, allowing for fine-grained overlap.
  • FlashOverlap is compatible with various parallelism strategies, enhancing its applicability across different architectures.
  • Experimental evaluations show significant improvements in latency, MFU, and throughput.
Compute Aligned Training: Optimizing for Test Time Inference
Adam Ousherovitch, Ambuj Tewari
NLP · Large Language Models · Reinforcement Learning
  • Introduction of Compute Aligned Training (CAT) to align training objectives with test-time strategies.
  • Derivation of new loss functions that optimize model performance during inference.
  • Empirical evidence showing substantial improvements in test-time scaling over standard training methods.
  • Extension of alignment techniques beyond traditional SFT and RL to include various inference strategies.
Prior-Agnostic Robust Forecast Aggregation
Zhi Chen, Cheng Peng, Wei Tang
Theory
  • Introduces a prior-agnostic robust forecast aggregation framework for unknown state spaces.
  • Develops a closed-form log-odds aggregator that pools forecasts in logit space.
  • Establishes minimax-regret guarantees, demonstrating the complexity of unknown state spaces.
  • Achieves a worst-case regret of 0.0255 under conditionally independent signals.
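The aggregator in the second bullet can be written directly; uniform weights are assumed below, and the paper's minimax analysis of weighting is omitted:

```python
import numpy as np

def log_odds_pool(probs, weights=None):
    """Average forecasts in logit space, then map back to a probability."""
    logits = np.log(probs) - np.log1p(-probs)
    return 1.0 / (1.0 + np.exp(-np.average(logits, weights=weights)))

forecasts = np.array([0.70, 0.80, 0.55])
print(log_odds_pool(forecasts))   # ~0.69, slightly sharper than the 0.68 mean
```
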