AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

48 Papers today
8h Update frequency
7 Days of history
The Significance of Style Diversity in Annotation-Free Synthetic Data Generation
Zahra Abbasiantaeb, Zeno Belligoli, Omar Essam, Mohammad Aliannejadi
NLP Large Language Models Generative Models
  • Introduces an annotation-free framework for synthetic dialogue generation using intent definitions.
  • Demonstrates that style diversity is more crucial than topic diversity for the utility of synthetic data.
  • Presents two novel stylization models (Univ and Exam) for enhancing the linguistic style of generated dialogues.
  • Achieves up to 93.3% accuracy compared to human-annotated data, showcasing the effectiveness of the proposed methods.
Insulin4RL: Real-Time Insulin Management in the Intensive Care Unit for Offline Reinforcement Learning
Thomas Frost, Steve Harris
Reinforcement Learning
  • Insulin4RL dataset features real clinical trajectories with irregular inputs and actions.
  • The dataset is derived from MIMIC-IV and includes over 375,000 labeled decisions.
  • Traditional discretization of EHR data can lead to biased evaluations and maladaptive policies.
  • The paper provides baseline performance metrics and a standardized evaluation protocol for ORL models.
Compositionality Emerges in a Narrow Depth-Connectivity Regime: Architecture Constraints and Solution Manifolds
Dat H. Do, Rushi Shah, Duc V. Le, Dianbo Liu
Theory Optimization Generative Models
  • Compositionality in neural networks emerges in a specific depth-connectivity regime.
  • Sparse networks exhibit compositionality based on retained connections rather than just weight sparsity.
  • The introduction of Similarity-based Pruning (SP) enhances compositional connectivity.
  • A heuristic depth predictor identifies optimal depths for achieving compositionality.
SL-S4Wave: Self-Supervised Learning of Physiological Waveforms with Structured State Space Models
Feng Wu, Harsh Deep, Eric Lehman, Sanyam Kapoor, Guoshuai Zhao, Rahul Krishnan, Gari Clifford, Li-wei H Lehman
Time Series
  • SL-S4Wave combines contrastive learning with structured state space models for improved modeling of physiological waveforms.
  • The framework demonstrates strong label efficiency, requiring fewer labeled examples for high performance.
  • It effectively captures long-range dependencies and noise robustness in multichannel physiological signals.
  • SL-S4Wave outperforms existing state-of-the-art methods in arrhythmia detection and generalizes well to EEG tasks.
Compute Efficiency and Serial Runtime Tradeoffs for Stochastic Momentum Methods
Depen Morwani, Alexandru Meterez, Pranav Nair, Sham Kakade
Optimization Theory Efficient ML
  • Stochastic momentum methods like HB and ASGD have distinct impacts on compute efficiency and serial runtime.
  • HB maintains SGD-level compute efficiency over a larger batch-size window, allowing for reduced serial runtime.
  • ASGD shows improved compute efficiency for small batches but trades this off for better serial runtime at larger batch sizes.
  • The study provides theoretical lower bounds for the performance of HB and ASGD under various spectral conditions.
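For readers unfamiliar with the momentum family compared in this paper, here is a minimal sketch of a heavy-ball update. It assumes "HB" refers to Polyak's heavy-ball momentum; the function name `heavy_ball`, the toy quadratic objective, and the step-size and momentum values are illustrative choices, not taken from the paper. The gradient is deterministic here; a stochastic variant would replace `grad` with a mini-batch estimate.

```python
def heavy_ball(grad, x0, lr=0.1, beta=0.9, steps=500):
    """Minimize a 1-D function with Polyak heavy-ball momentum.

    Update: x_{t+1} = x_t - lr * grad(x_t) + beta * (x_t - x_{t-1}).
    All hyperparameter defaults are illustrative assumptions.
    """
    x_prev, x = x0, x0  # identical iterates, so the first step has no momentum
    for _ in range(steps):
        x_next = x - lr * grad(x) + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x


# Toy objective f(x) = 0.5 * x**2: its gradient is x and its minimizer is 0.
x_final = heavy_ball(lambda x: x, x0=5.0)
```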
Bayesian Anytime Pareto Set Identification for Multi-Objective Multi-Armed Bandits
Lennert Saerens, Bram Silue, Eleni Litsa, Peter Vrancx, Pieter Libin
Optimization Theory Efficient ML
  • Introduction of TTPFTS, the first Bayesian anytime algorithm for MOMAB PSI.
  • Demonstrated efficiency gains in molecular discovery applications compared to traditional methods.
  • Development of a new uncertainty quantification metric for Bayesian MOMAB PSI algorithms.
  • Empirical validation against state-of-the-art algorithms on synthetic benchmarks.
SAGE: Retain-Aware Post-Hoc Sanitization of Final Unlearning Vector
Jingyuan Zhang, Yucheng Bai, Peixi Wen, Zhehao Huang, Zhengbao He, Hanling Tian, Xinwen Cheng, Haiyin Ran, Xiaolin Huang
Large Language Models Optimization Efficient ML
  • SAGE provides a post-hoc solution for improving retention in unlearning processes without rerunning original pipelines.
  • The method quantifies retention damage using retention activation bias and applies spectral sanitization to the final update vector.
  • Empirical results show a consistent improvement in the retain-forget trade-off across multiple unlearning methods and model sizes.
  • SAGE achieves an average retention capability increase of 26.3% while maintaining effective unlearning.
Constrained hybrid modelling to predict microbial dynamics and organic matter turnover in soil systems
Paul Collart, Juergen Gall, Andrea Schnepf, Holger Pagel, Lars Doorenbos
Theory Optimization
  • Introduction of HySoMi, a hybrid modeling framework for soil carbon cycling predictions using genomic data.
  • Integration of ecological constraints into the model to ensure realistic predictions of microbial dynamics.
  • Demonstrated improved performance over traditional models, even with small training datasets.
  • Effective learning of dynamics for unmeasurable components of the soil model.
Neural Additive and Basis Models with Feature Selection and Interactions
Yasutoshi Kishimoto, Kota Yamanishi, Takuya Matsuda, Shinichi Shirakawa
Interpretability Efficient ML Theory
  • Introduction of a feature selection mechanism in NAM and NBM to enhance computational efficiency.
  • Ability to handle high-dimensional datasets and capture feature interactions effectively.
  • Demonstrated better or comparable performance against existing GAMs and other models.
  • Maintains high interpretability while improving throughput over traditional NAM and NBM.
RouteJudge: An Open Platform for Reproducible and Preference-Aware LLM Routing
Guannan Lai, Haoran Hu, Han-Jia Ye
Large Language Models NLP Efficient ML
  • RouteJudge shifts the evaluation focus from model-level response quality to router-level decision quality.
  • The platform allows for preference-aware evaluations through anonymous pairwise comparisons of model responses.
  • ORBIT provides a standardized workflow for developing and assessing LLM routing algorithms.
  • The framework supports continuous expansion of routing methods and encourages reproducibility in evaluations.
A Hybrid LSTM–Vision Transformer Architecture for Predicting HRRR Forecast Errors
David Aaron Evans, Jay C. Rothenberger, Kara J. Sulia, Nick P. Bassill, Chris D. Thorncroft
Time Series Multimodal
  • The hybrid LSTM-ViT framework improves forecast-error prediction skill compared to baseline LSTM models.
  • Incorporating vertically resolved atmospheric profiles enhances the model's ability to capture complex PBL processes.
  • The model achieves significant improvements in predicting precipitation forecast errors, with up to a twofold increase in predictive skill.
  • The approach is applicable across diverse forecasting environments due to the availability of dense surface observations.
Connect the Dots: Training LLMs for Long-Lifecycle Agents with Cross-Domain Generalization Via Reinforcement Learning
Yanxi Chen, Weijie Shi, Yuexiang Xie, Boyi Hu, Yaliang Li, Bolin Ding, Jingren Zhou
Large Language Models Reinforcement Learning
  • Introduction of the CoD framework for training LLMs as long-lifecycle agents.
  • Emphasis on the meta-capability of continuous learning and adaptation in dynamic environments.
  • Development of a specialized RL algorithm for effective credit assignment.
  • Demonstrated improvements in task-solving performance through empirical results.
Evolutionary Two-Stage Hyperparameter Optimization Strategies for Physics-Informed Neural Networks
Fedor Buzaev, Dmitry Efremenko, Egor Bugaev, Andrei Ermakov, Denis Derkach, Daria Pugacheva, Fedor Ratnikov
Optimization
  • Introduces a two-stage evolutionary optimization strategy for hyperparameter tuning in PINNs.
  • Demonstrates that evolutionary algorithms outperform classical methods like Bayesian optimization and grid search.
  • Establishes guidelines for optimal budget distribution between exploration and exploitation phases.
  • Achieves significant improvements in solution accuracy with constrained computational resources.
Algebraic Dead Directions in LayerNorm Transformers: A Forward-Pass-Only Diagnostic at LLM Scale
Tejas Pradeep Shirodkar, P. J. Narayanan
Theory Large Language Models Optimization
  • Introduces a forward-pass-only method to identify dead directions in LayerNorm transformers.
  • Derives a closed-form expression for the dead direction based on the LayerNorm scale parameter.
  • Validates the method across 14 pretrained transformers, achieving high accuracy in predictions.
  • Demonstrates that training increases the depth of dead directions significantly.
Artemis: Anatomy-Resolved inTervention for Eliminating Multimodal NeuroImage confounderS
Siyuan Dai, Yang Du, Kun Zhao, Zhusuyi Chen, Heng Huang, Paul Thompson, Chao Shi, Haoteng Tang, Liang Zhan
Graph Learning Multimodal
  • Artemis addresses region-specific demographic confounding in multimodal neuroimaging.
  • The framework provides a lightweight, plug-in intervention module for existing GNN architectures.
  • Significant improvements in predictive accuracy and AUC metrics were observed across multiple clinical benchmarks.
  • The approach enhances the interpretability of GNN models in clinical neuroscience.
Bounded Context Management for Tabular Foundation Models on Stream Learning
Jinmo Lee, Doyun Choi, Moongi Choi, Jaemin Yoo
Theory Efficient ML Time Series
  • Introduces a future-information view for context management in tabular stream learning.
  • Proposes CURE, a context management policy that enhances prediction accuracy.
  • Demonstrates up to 27.0% relative improvement over classical stream learners.
  • CURE maintains robustness across multiple TFM architectures.
Manifold Bandits: Bayesian Curriculum Learning over the Latent Geometry of Large Language Models
Darrien McKenzie, Nicklas Hansen, Xiaolong Wang
Large Language Models Reinforcement Learning Optimization
  • Introduces Bayesian Manifold Curriculum (BMC) for structured problem sampling in RL for LLMs.
  • Frames problem sampling as a manifold-structured bandit problem, capturing the relationships between tasks.
  • Demonstrates the importance of balancing productivity, diversity, and utility in training strategies.
  • Develops latent task trees to represent the hierarchical structure of task relationships.
AGDN: Learning to Solve Traveling Salesman Problem with Anisotropic Graph Diffusion Network
Bolin Shen, Ziwei Huang, Zhiguang Cao, Yushun Dong
Graph Learning Optimization
  • AGDN addresses critical limitations in existing GNN approaches for TSP, particularly regarding graph sparsification and multi-hop information propagation.
  • The MixScore transition matrix enhances the model's ability to capture informative topological priors.
  • Anisotropic graph diffusion allows for improved information exchange between nodes, addressing the challenges of disconnected optimal node pairs.
  • AGDN outperforms existing methods in various experimental settings while ensuring efficient computation.
Concept Modulation Models: A Unified Framework for Identifiability and Extrapolation
Soheun Yi, Yizhou Lu, Chandler Squires, Pradeep Ravikumar
Generative Models Theory
  • Introduction of Concept Modulation Models (CMMs) as a unified framework for identifiability and extrapolation.
  • CMMs separate attribute-specific indexing from shared modulation mechanisms, enhancing understanding of latent-variable settings.
  • Establishment of algebraic criteria for extrapolation based on attribute potentials.
  • Recovery of existing identifiability and extrapolation results while providing new guarantees for structured attribute spaces.
Towards Modality-imbalanced Federated Graph Learning: A Data Synthesis-based Approach
Zhengyu Wu, Hongchao Qin, Xunkai Li, Zekai Chen, Rong-Hua Li, Guoren Wang
Federated Learning Graph Learning Multimodal
  • Introduces a novel framework, FedMGS, for addressing modality imbalance in federated graph learning.
  • Identifies and characterizes two types of modality imbalance: client-level and node-level.
  • Employs a graph-aware approach to recover missing modalities without compromising data privacy.
  • Demonstrates significant performance improvements over existing methods through extensive experiments.
The Token Is a Group Element: On Lie-Algebra Attention over Matrix Lie Groups
Przemyslaw Musialski
Robotics Computer Vision Theory
  • Introduces the concept of a token as a bare group element in matrix Lie groups.
  • Develops a closed-form attention score based on the negative squared algebra norm of the relative pose.
  • Demonstrates the method's applicability to various matrix Lie groups, including non-compact and non-abelian cases.
  • Shows significant performance improvements over traditional vector-token attention methods.
Robust and Interpretable Adaptation of Equivariant Materials Foundation Models via Sparsity-promoting Fine-tuning
Youngwoo Cho, Seunghoon Yi, Wooil Yang, Sungmo Kang, Young-woo Son, Jaegul Choo, Joonseok Lee, Soo Kyung Kim, Hongkee Yoon
Graph Learning Interpretability Efficient ML
  • Introduction of a sparsity-promoting fine-tuning method for equivariant MLIPs.
  • Achieves high accuracy with minimal parameter updates (0.5% to 3%).
  • Demonstrates versatility across various material property predictions.
  • Provides physically interpretable insights into model representations.
When, Where, and How: Adaptive Binning for Tabular Self-Supervised Learning
Daehwan Kim, Haejun Chung, Ikbeom Jang
Theory Efficient ML Optimization
  • Introduction of Adaptive Binning, a training-adaptive discretization method for tabular SSL.
  • Feature-wise coarse-to-fine curriculum that refines discretization based on learning dynamics.
  • Integration of categorical reconstruction with ordinal supervision for mixed feature types.
  • Demonstrated consistent performance improvements across various medical tabular datasets.
Optimal Deterministic Multicalibration and Omniprediction
Georgy Noarov, Aaron Roth
Theory
  • Introduces a minimax-optimal multicalibration algorithm that outputs deterministic predictors.
  • Demonstrates that deterministic predictors can achieve the same sample complexity as randomized ones.
  • Extends the algorithm to ensure outcome indistinguishability for finite test collections.
  • Provides deterministic omnipredictors and panpredictors, resolving open problems in the field.
Signature filtering: a lightweight enhancement for statistical watermark detection in large language models
Chih-Duo Hong, Yen-Pang Chen, Fang Yu
Large Language Models Optimization NLP
  • Signature filtering improves watermark detection rates significantly, especially in challenging scenarios.
  • The method does not require changes to watermark embedding or text generation processes.
  • It utilizes a mixed-integer linear program to identify disruptive tokens for removal.
  • Empirical results show that detection rates can increase from 8-31% to 78-99% with filtering.
Seed-Guided Semi-Supervised Clustering by A-Contrario Anomaly Detection
Nassir Mohammad
Theory Efficient ML
  • Introduces a statistical duality framework for clustering and anomaly detection.
  • Develops a robust Perception algorithm that eliminates the need for manual parameter tuning.
  • Implements a seed-guided expansion process that integrates expert intent while being resilient to noise.
  • Achieves competitive performance on various datasets with minimal user input.
Be Your Own Teacher: Steering Protein Language Models via Unsupervised Reward Optimization
Lanqing Li, Shentong Mo, Yang Yu, Pheng-Ann Heng
Generative Models Reinforcement Learning Optimization
  • Introduces unsupervised reward optimization for fine-tuning protein language models.
  • Proposes two algorithms, SRO and BRO, that enhance controllability without ground-truth labels.
  • Demonstrates significant performance improvements over competitive baselines in various tasks.
  • Provides new datasets and benchmark tasks for evaluating PLM controllability.
Does VLA Even Know the Basics? Measuring Commonsense and World Knowledge Retention in Vision-Language-Action Models
Nikita Kachaev, Andrey Moskalenko, Matvey Skripkin, Nikita Kurlaev, Daria Pugacheva, Albina Burlova, Mikhail Kolosov, Denis Shepelev, Andrey Kuznetsov, Elena Tutubalina, Aleksandr I. Panov, Alexey K. Kovalev, Vlad Shakhuro
Multimodal Robotics
  • Introduction of Act2Answer, a protocol for evaluating VLA models through action-based answer selection.
  • Creation of a diverse benchmark suite with 1,720 questions across various commonsense and world knowledge categories.
  • Empirical analysis shows VLA models excel at simple tasks but have significant gaps in complex semantic understanding.
  • Co-training with VQA tasks correlates with improved knowledge retention in VLA models.
Semantic Robustness Certification for Vision-Language Models
Peiyu Yang, Paul Montague, Feng Liu, Andrew C. Cullen, Amardeep Kaur, Christopher Leckie, Sarah M. Erfani
Multimodal
  • Introduces a framework for certifying VLM robustness under semantic-level transformations.
  • Uses text prompts as semantic proxies to formalize transformations without needing additional data.
  • Characterizes VLM decision boundaries to determine prediction-invariant intervals.
  • Demonstrates effectiveness through experiments on synthetic and real-world data.
An Information Theoretic Framework for Graph Novelty Generation via Latent Mixture Modeling
Itsuki Nakagawa, Kenji Yamanishi
Generative Models Graph Learning Theory
  • Introduces a novel framework for graph novelty generation using latent mixture modeling.
  • Imposes novelty and reliability conditions based on the Minimum Description Length principle.
  • Theoretical guarantees on misclassification probabilities for novelty and reliability.
  • Empirical results demonstrate superior control over novelty generation compared to existing methods.
ThousandWorlds: A benchmark for climate emulation of potentially habitable exoplanets
Edward T. Stevenson, Mei Ting Mak, Eric Wolf, Denis E. Sergeev, Tobi Hammond, N. J. Mayne, Miles Cranmer
Theory Efficient ML
  • ThousandWorlds is a curated benchmark dataset for exoclimate emulation, containing 1800 simulations from five GCMs.
  • The dataset supports three levels of complexity in regression tasks, catering to both single and multi-simulator scenarios.
  • Evaluation protocols are introduced to measure emulator performance against GCM variability, enhancing scientific utility assessment.
  • Gaussian process methods show superior performance compared to deep learning techniques in this context.
Sensorimotor World Models: Perception for Action via Inverse Dynamics
Petr Ivashkov, Randall Balestriero, Bernhard Schölkopf
Robotics Reinforcement Learning Theory
  • Introduction of Sensorimotor World Model (SMWM) that integrates perception and action.
  • Utilization of inverse dynamics regularization to prevent representation collapse.
  • Training from offline, reward-free trajectories without complex regularizers.
  • Empirical evidence of learned representations tracking controllable dynamics.
Domain-Shift Aware Neural Networks for Unbalance Characterization in Rotating Systems
Bernardo Feijó Junqueira, Claudio Kiyoshi Umezu, Bruno Bilhar Karaziack, Tomaz Junior, Daniel Alves Castello
Theory
  • Introduces a domain-shift aware neural network for estimating unbalance in rotating systems.
  • Utilizes a maximum mean discrepancy strategy for feature alignment across different operational conditions.
  • Demonstrates improved prediction accuracy in the presence of domain shifts.
  • Highlights the challenges of data scarcity and domain discrepancies in SHM.
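As background on the maximum mean discrepancy (MMD) mentioned above, here is a minimal sketch of the standard biased estimator of squared MMD between two samples. The RBF kernel, the `gamma` bandwidth, and the names `rbf` and `mmd2` are assumptions made for illustration; the summary does not say which kernel or estimator the paper uses.

```python
import math


def rbf(a, b, gamma=1.0):
    """RBF kernel exp(-gamma * ||a - b||^2) for two equal-length vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def mmd2(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD between samples xs and ys.

    MMD^2 = mean k(x, x') + mean k(y, y') - 2 * mean k(x, y).
    """
    def mean_kernel(A, B):
        total = sum(rbf(a, b, gamma) for a in A for b in B)
        return total / (len(A) * len(B))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Hypothetical 1-D feature samples from a source and a shifted target domain.
source = [[0.0], [1.0]]
target = [[5.0], [6.0]]
same_domain_gap = mmd2(source, source)
shifted_domain_gap = mmd2(source, target)
```

In a feature-alignment setting, a term like `mmd2(source_features, target_features)` would typically be added to the training loss so that the two feature distributions are pulled together; how the paper weights or computes that term is not stated in the summary.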
Kolmogorov-Arnold Reservoir Computing
Juntian Huang, Jurgen Kurths, Ying Tang
Theory Efficient ML Time Series
  • KARC improves upon traditional reservoir computing by using explicit basis-function expansions.
  • The framework allows for efficient closed-form training while preserving expressive capacity.
  • KARC outperforms existing methods on challenging benchmarks, including chaotic systems and PDEs.
  • The approach can be integrated with generative models, enhancing applications in areas like text-to-image generation.
From Sparse Features to Trustworthy Proxies: Certifying SAE-Based Interpretability
Dibyanayan Bandyopadhyay, Asif Ekbal
NLP Large Language Models Interpretability
  • Introduces a certification framework for assessing the interpretability of frozen language models using sparse autoencoders.
  • Derives a risk bound that decomposes into four measurable terms, providing a clear criterion for trustworthiness.
  • Empirical validation shows non-vacuous bounds for multiple language models at practical sample sizes.
  • Layerwise analysis indicates that later layers are easier to certify, highlighting depth-dependent behavior in model interpretability.
P²CE: Model-Agnostic Plausible Pareto-Optimal Counterfactual Explanations
Arthur Hendricks Mendes de Oliveira, Giovani Valdrighi, Marcos Medeiros Raimundo
Interpretability
  • P²CE is a model-agnostic algorithm that generates optimal counterfactual explanations.
  • The algorithm ensures that explanations are plausible and within the data distribution.
  • P²CE leverages outlier detection and SHAP values to enhance computational efficiency.
  • Empirical evaluations show superior performance compared to existing counterfactual explanation methods.
Latent Confounded Causal Discovery via Lie Bracket Geometry
Sridhar Mahadevan
Graph Learning Theory Optimization
  • Introduces BRIDGE and SKFM algorithms for causal discovery under latent confounding.
  • Establishes that latent confounding obstructs coherent causal information transport.
  • Demonstrates high performance on synthetic data while exposing limitations on real datasets.
  • Combines geometric insights with causal inference to enhance discovery methods.
Alzheimer's Disease Diagnosis using a Multimodal Approach with 3D MRI and PET
Loukas Ilias, Anthi-Maria Vozinaki, Christos Ntanos, Dimitris Askounis
Multimodal
  • Introduces a novel multimodal approach for Alzheimer's diagnosis using 3D MRI and PET.
  • Utilizes three fusion strategies and a Mixture-of-Experts classifier for improved adaptability and performance.
  • Achieves high classification accuracies across multiple diagnostic tasks.
  • Employs Grad-CAM for model interpretability, enhancing trust in clinical applications.
Tracking Representation Dynamics in Large Language Models with Persistent Homology
Naman Malhotra, Jay Ambadkar, Abhinav Gupta, Kushal Kasivel, Abbas Schwarz, Kamillo Ferry, Anthea Monod
NLP Large Language Models Interpretability
  • Persistent homology reveals significant topological changes in LLM representations during early training stages.
  • Different alignment objectives produce distinguishable topological trajectories despite similar behavioral outcomes.
  • Instruction-tuned and pretrained models show qualitatively different evolution patterns in their representations.
  • The study emphasizes the importance of understanding internal representation dynamics beyond behavioral metrics.
Why SWAVE May Not Be All You Need: A Concept-Evolution Retrospective on Complex-Valued Recurrent Language Models
Ramprasath Ganesaraja, Swathika N, Sahil Dilip Panse
NLP Large Language Models Theory
  • SWAVE is a complex-valued recurrent language model that aims to retain information over long contexts without decay.
  • The model underwent three development phases, addressing structural issues and refining its architecture.
  • Key components like ComplexNorm and Wave Propagation Scan were retained, while ineffective concepts were discarded.
  • The paper introduces a formal characterization of 'cos-domination collapse' and provides engineering principles for complex-valued training.
GateMem: Benchmarking Memory Governance in Multi-Principal Shared-Memory Agents
Zhe Ren, Yibo Yang, Yimeng Chen, Zijun Zhao, Benshuo Fu, Zhihao Shu, Bingjie Zhang, Yangyang Xu, Dandan Guo, Shuicheng Yan
Large Language Models NLP
  • GateMem addresses the lack of benchmarks for multi-principal shared-memory environments.
  • The benchmark evaluates memory agents on utility, access control, and active forgetting.
  • No existing methods achieve optimal performance across all governance dimensions.
  • Long-context prompting provides the best governance score but at a high token cost.
Thermodynamic Signatures of Reasoning: Free-Energy and Spectral-Form-Factor Diagnostics for Hallucination Detection in Large Language Models
Salim Khazem
NLP Large Language Models Theory
  • Introduction of Free-Energy Signatures (FES) for hallucination detection in LLMs.
  • FES utilizes thermodynamic potentials and random-matrix theory to analyze attention Laplacians.
  • Theoretical results demonstrate stability, expressiveness, and PAC bounds for FES.
  • Empirical results show FES significantly outperforms existing spectral diagnostics in detecting hallucinations.
Trainable Photonic Measurement for Physics-Informed PDE Learning
Jiale Linghu, Hao Dong, Yangshuai Wang
Theory Optimization Efficient ML
  • Introduction of photonic quantum neural fields for physics-informed learning.
  • Demonstrated significant performance improvements in solving PDEs compared to classical methods.
  • Lower error rates achieved with fewer parameters in challenging regimes.
  • Stability of Fock-probability measurements under noise conditions.
Do Time Series Foundation Model Benchmarks Hide Regime-Dependent Failures? Evidence from Traffic Speed Forecasting
Yingshuo Wang, Xian Sun, Lingdong Kong, Wei Gao, Yanhang Li, Zhichao Fan, Zexin Zhuang
Time Series
  • Aggregate metrics can mask severe regime-dependent failures in TSFMs.
  • Transition-regime MAE is significantly higher than overall MAE, indicating critical performance issues.
  • A historical conditional baseline outperforms TSFMs in transition coverage but not in overall accuracy.
  • Bimodal mixture augmentation (BMA) improves transition coverage while preserving TSFM accuracy.
Execution-State Capsules: Graph-Bound Execution-State Checkpoint and Restore for Low-Latency, Small-Batch, On-Device Physical-AI Serving
Liang Su
NLP Reinforcement Learning Robotics
  • FlashRT provides a low-latency execution environment for on-device AI applications.
  • The execution-state capsule enables efficient checkpointing and restoring of execution states.
  • The proposed system achieves significant speedups in time-to-first-token compared to existing methods.
  • The design focuses on single-stream, low-latency interactions, making it suitable for real-time applications.
Emyx: Fast and efficient all-atom protein generation
Nicholas J. Williams, Ward Haddadin, Matteo P. Ferla, Constantin Schneider, Nicholas B. Woodall, Ruby Sedgwick, Christian D. Madsen, Andrew L. Hopkins, Edward O. Pyzer-Knapp
Generative Models Efficient ML
  • Emyx simplifies the architecture for all-atom protein generation, reducing training costs and improving diversity.
  • The model outperforms existing state-of-the-art methods in enzyme design benchmarks.
  • Emyx achieves high accuracy in global fold recovery and catalytic geometry while being computationally efficient.
Zero-Shot Active Feature Acquisition via LLM-Elicitation
Binyamin Perets, Natalie Mendelson, Shiran Vainberg, Yehuda Chowers, Shai Shen-Orr, Shie Mannor
Large Language Models Optimization Theory
  • Introduces a framework for Zero-Shot Active Feature Acquisition using LLMs.
  • Focuses on eliciting unary deviations and pairwise co-variations as sufficient statistics.
  • Demonstrates effectiveness in binary classification and top-k identification tasks.
  • Outperforms traditional AFA methods in challenging medical scenarios.
Recurrent neural networks approximate continuous functions
Valentin Abadie, Clemens Hutter, Helmut Bölcskei
Theory
  • Introduces the TMNU model to facilitate the approximation of continuous functions using RNNs.
  • Proves that a single ReLU RNN can uniformly approximate any continuous function on [-1, 1] with fixed weights.
  • Establishes convergence rates that align with polynomial approximation rates.
  • Demonstrates that runtime is a necessary resource in the fixed-network approximation paradigm.