AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

48 Papers today
8h Update frequency
7 Days of history
Randomly Initialized Networks Can Learn from Peer-to-Peer Consensus
Esteban Rodríguez-Betancourt, Edgar Casasola-Murillo
Theory Efficient ML Computer Vision
  • Self-distillation can lead to significant improvements in representation learning even without complex mechanisms.
  • A minimal setup with randomly initialized networks can outperform random baselines on tasks like CIFAR-10 classification.
  • Learning dynamics are sensitive to hyperparameters such as learning rate and architecture.
  • The proposed method avoids representational collapse, maintaining stability during training.
Read more
SetFlow: Generating Structured Sets of Representations for Multiple Instance Learning
Nikola Jovišić, Milica Škipina, Vanja Švenda
Generative Models Computer Vision Multimodal
  • SetFlow effectively models entire MIL bags, capturing intra-bag dependencies.
  • The architecture combines flow matching with a Set Transformer design for permutation-invariant inputs.
  • Evaluation on mammography data shows improved performance in classification tasks.
  • Synthetic data generated by SetFlow can compete with real data, highlighting its utility in data-scarce scenarios.
Read more
Forecasting Ionospheric Irregularities on GNSS Lines of Sight Using Dynamic Graphs with Ephemeris Conditioning
Mert Can Turkmen, Eng Leong Tan, Yee Hui Lee
Graph Learning Time Series Optimization
  • Introduces a dynamic graph model for ionospheric forecasting, addressing limitations of gridded data.
  • Employs ephemeris conditioning to leverage predictable satellite trajectories for improved forecasting.
  • Achieves significant performance improvements over traditional persistence models in predicting ionospheric irregularities.
  • Demonstrates the model's robustness under simulated coverage dropout through spatial message passing.
Read more
Representation Before Training: A Fixed-Budget Benchmark for Generative Medical Event Models
Inhyeok Lee, Luke Solo, Michael C. Burkhart, Bashar Ramadan, William F. Parker, Brett K. Beaulieu-Jones
Generative Models Time Series Interpretability
  • Fused code-value tokenization yields significant improvements in clinical outcome predictions.
  • Decile-based quantization is more effective than finer bins under a one-epoch training budget.
  • Event order and admission-relative RoPE encoding can replace time tokens without loss of performance.
  • CLIF remapping maintains model performance while providing a smaller, interpretable token set for multi-site use.
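The decile-based quantization in the second bullet is ordinary equal-frequency binning. A minimal illustrative sketch (the `decile_quantize` helper and the synthetic lab values are our own, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def decile_quantize(values):
    """Map continuous lab/vital values to 10 token ids using decile edges,
    so each quantized bin is roughly equally populated."""
    edges = np.quantile(values, np.linspace(0.1, 0.9, 9))  # 9 interior cut points
    return np.digitize(values, edges)                      # integer ids in 0..9

values = rng.normal(loc=100.0, scale=15.0, size=10_000)    # synthetic lab values
tokens = decile_quantize(values)
counts = np.bincount(tokens, minlength=10)                 # ~1000 values per bin
```

Equal-frequency bins keep every token roughly equally common, which is one plausible reason coarse deciles train well under a one-epoch budget.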
Read more
HEALing Entropy Collapse: Enhancing Exploration in Few-Shot RLVR via Hybrid-Domain Entropy Dynamics Alignment
Zhanyu Liu, Qingguo Hu, Ante Wang, Chenqing Liu, Zhishang Xiang, Hui Li, Delai Qiu, Jinsong Su
Reinforcement Learning Large Language Models NLP
  • HEAL addresses entropy collapse in few-shot RLVR, enhancing exploration diversity.
  • The framework incorporates high-value general-domain data to improve reasoning patterns.
  • Entropy Dynamics Alignment (EDA) aligns entropy dynamics between target and general domains.
  • HEAL achieves performance comparable to full-shot RLVR with significantly fewer samples.
Read more
Covariance-Based Structural Equation Modeling in Small-Sample Settings with p > n
Hiroki Hasegawa, Aoba Tamura, Yukihiko Okada
Theory
  • Introduces a novel estimation principle for covariance-based SEM in small-sample settings with p > n.
  • Reformulates covariance structures into self-covariance and cross-covariance components.
  • Demonstrates improved stability in estimating the sign and direction of structural parameters.
  • Validates the proposed method through experiments on synthetic and real-world data.
Read more
Detecting and Suppressing Reward Hacking with Gradient Fingerprints
Songtao Wang, Quang Hieu Pham, Fangcong Yin, Xinpeng Wang, Jocelyn Qiaochu Chen, Greg Durrett, Xi Ye
Reinforcement Learning Large Language Models Interpretability
  • Introduction of Gradient Fingerprint (GRIFT) for detecting reward hacking in RLVR.
  • GRIFT outperforms existing methods like CoT Monitor and TRACE by over 25% in detection accuracy.
  • Integration of GRIFT into training processes can suppress reward hacking and improve task performance.
  • The method utilizes gradient-based representations to assess the quality of reasoning traces.
Read more
Efficient Federated RLHF via Zeroth-Order Policy Optimization
Deyi Wang, Qining Zhang, Lei Ying
Reinforcement Learning Federated Learning Efficient ML
  • Par-S2ZPO is designed for federated RLHF, focusing on efficiency in communication, computation, and memory.
  • The algorithm employs zeroth-order optimization with binary perturbations to reduce resource requirements.
  • Theoretical analysis establishes a convergence rate that is competitive with centralized methods.
  • Empirical results show Par-S2ZPO outperforms traditional FedAvg-based RLHF methods in various tasks.
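The core idea behind zeroth-order optimization with binary perturbations can be sketched with a generic two-point estimator (this is a standard ZO gradient estimate with Rademacher noise, not Par-S2ZPO itself):

```python
import numpy as np

rng = np.random.default_rng(0)

def zo_grad(f, theta, mu=1e-3, rng=rng):
    """Two-point zeroth-order gradient estimate with a binary (Rademacher)
    perturbation: only function evaluations are needed, no backprop."""
    z = rng.choice([-1.0, 1.0], size=theta.shape)  # binary perturbation direction
    return (f(theta + mu * z) - f(theta - mu * z)) / (2 * mu) * z

# Sanity check on a quadratic, where the true gradient is 2 * theta.
f = lambda t: float(np.sum(t ** 2))
theta = np.array([1.0, -2.0, 0.5])
g = np.mean([zo_grad(f, theta) for _ in range(2000)], axis=0)  # averages to ~2*theta
```

Because only a random sign vector and two scalar losses cross the wire, an estimator like this is naturally cheap in communication and memory, which is the property the paper exploits in the federated setting.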
Read more
Evaluating Temporal and Structural Anomaly Detection Paradigms for DDoS Traffic
Yasmin Souza Lima, Rodrigo Moreira, Larissa F. Rodrigues Moreira, Tereza Cristina M. de B. Carvalho, Flávio de Oliveira Silva
Time Series
  • Proposes a decision framework for selecting between temporal and structural features in DDoS detection.
  • Utilizes lag-1 autocorrelation and PCA cumulative explained variance as diagnostic tools.
  • Demonstrates that structural features often outperform temporal features in DDoS detection.
  • Focuses on the representation of traffic data rather than the choice of detection algorithms.
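The two diagnostics named in the bullets are both standard and easy to compute; a small sketch on synthetic data (the AR(1) "traffic" and feature matrix are stand-ins, not the paper's datasets):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

def lag1_autocorr(x):
    """Lag-1 autocorrelation: how much each sample depends on the previous one."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

# Temporal diagnostic: an AR(1) series (strong temporal structure) vs. white noise.
noise = rng.normal(size=2000)
ar1 = np.zeros(2000)
for t in range(1, 2000):
    ar1[t] = 0.9 * ar1[t - 1] + noise[t]
rho_ar1, rho_noise = lag1_autocorr(ar1), lag1_autocorr(noise)

# Structural diagnostic: PCA cumulative explained variance on feature vectors.
X = rng.normal(size=(500, 20)) @ rng.normal(size=(20, 20))  # correlated features
cum = np.cumsum(PCA().fit(X).explained_variance_ratio_)
k90 = int(np.searchsorted(cum, 0.90)) + 1  # components covering 90% of variance
```

High lag-1 autocorrelation suggests temporal features carry signal; a small `k90` suggests the structural (cross-feature) representation is compact, matching the decision framework's intent.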
Read more
LoRaQ: Optimized Low Rank Approximation for 4-bit Quantization
Yann Bouquet, Alireza Khodamoradi, Sophie Yáng Shen, Kristof Denolf, Mathieu Salzmann
Generative Models Efficient ML Optimization
  • LoRaQ enables fully sub-16 bit quantization, eliminating the need for high-precision branches.
  • The proposed method uses a data-free calibration approach to optimize quantization error compensation.
  • LoRaQ outperforms existing state-of-the-art quantization methods in terms of generative performance.
  • The authors release an open-source PTQ library to support diverse quantization schemes.
Read more
UniMamba: A Unified Spatial-Temporal Modeling Framework with State-Space and Attention Integration
Xingsheng Chen, Xianpei Mu, Deyu Yi, Yilin Yuan, Xingwei He, Bo Gao, Regina Zhang, Pietro Lio, Siu-Ming Yiu
Time Series
  • UniMamba is the first framework to combine state-space dynamics with spatial-temporal attention.
  • The Mamba Variate–Channel Encoding Layer incorporates FFT-Laplace transform and TCN for efficient modeling.
  • UniMamba consistently outperforms Transformer, MLP, and Mamba-based models in accuracy and scalability.
  • The framework effectively captures both global temporal patterns and cross-variable relationships.
Read more
Federated Learning with Quantum Enhanced LSTM for Applications in High Energy Physics
Abhishek Sawaika, Durga Pritam Suggisetti, Udaya Parampalli, Rajkumar Buyya
Federated Learning Theory Efficient ML
  • Introduction of a hybrid quantum-classical LSTM model (QLSTM) for efficient learning in HEP applications.
  • Implementation of a federated learning framework to distribute the learning workload across local servers.
  • Demonstrated significant performance improvements with reduced data and resource requirements compared to baseline models.
  • Achieved results comparable to classical deep learning methods with a model of fewer than 300 parameters.
Read more
ECG-Lens: Benchmarking ML & DL Models on PTB-XL Dataset
Saloni Garg, Ukant Jadia, Amit Sagtani, Kamal Kant Hiran
Time Series
  • Comparison of traditional ML and advanced DL models for ECG classification.
  • Use of raw ECG signals and SWT for data augmentation to improve model performance.
  • ECG-Lens model achieved 80% accuracy and 90% ROC-AUC, outperforming traditional methods.
  • Demonstrates the potential of deep learning in enhancing automated ECG analysis.
Read more
Scalable Neighborhood-Based Multi-Agent Actor-Critic
Tim Goppelsroeder, Rasmus Jensen
Reinforcement Learning
  • Introduction of MADDPG-K, a scalable extension of MADDPG.
  • Critic input is limited to k nearest agents, reducing computational complexity.
  • Empirical results demonstrate competitive performance and faster convergence.
  • Method shows better runtime scaling with an increasing number of agents.
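The truncated critic input is the key mechanism: instead of conditioning on all agents, the critic sees only the k nearest. A minimal sketch of that selection step (the helper and toy positions are illustrative, not the paper's implementation):

```python
import numpy as np

def k_nearest_critic_input(positions, obs, agent_idx, k):
    """Build a critic input from agent `agent_idx`'s own observation plus the
    observations of its k nearest neighbours, replacing the full joint input."""
    d = np.linalg.norm(positions - positions[agent_idx], axis=1)
    d[agent_idx] = np.inf                      # exclude the agent itself
    neighbours = np.argsort(d)[:k]             # indices of the k closest agents
    return np.concatenate([obs[agent_idx], obs[neighbours].ravel()])

positions = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.0, 1.5]])
obs = np.arange(4 * 3, dtype=float).reshape(4, 3)  # 4 agents, 3-dim observations

x = k_nearest_critic_input(positions, obs, agent_idx=0, k=2)
# Input length: own obs (3) + 2 neighbours (6) = 9, independent of agent count
```

Because the critic input size is fixed by k rather than by the number of agents, per-critic cost stops growing as the population does, which is where the runtime scaling comes from.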
Read more
DPrivBench: Benchmarking LLMs' Reasoning for Differential Privacy
Erchi Wang, Pengrun Huang, Eli Chien, Om Thakkar, Kamalika Chaudhuri, Yu-Xiang Wang, Ruihan Wu
Large Language Models Theory
  • DPrivBench is a novel benchmark for assessing LLMs' reasoning on differential privacy.
  • The benchmark includes 720 instances, covering both foundational and advanced DP topics.
  • Current LLMs perform well on basic DP mechanisms but struggle with advanced algorithms.
  • Integrating external references can improve LLM accuracy in DP reasoning.
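A "basic DP mechanism" of the kind the benchmark finds LLMs handle well is the classic Laplace mechanism; a textbook sketch (the count query is a made-up example):

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_mechanism(true_value, sensitivity, epsilon, rng=rng):
    """Epsilon-DP Laplace mechanism: add noise with scale sensitivity/epsilon."""
    return true_value + rng.laplace(scale=sensitivity / epsilon)

count = 42                      # e.g. number of records matching a query
noisy = laplace_mechanism(count, sensitivity=1.0, epsilon=0.5)
```

The advanced-algorithm questions in the benchmark go well beyond this, which is where current models reportedly struggle.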
Read more
Tabular foundation models for in-context prediction of molecular properties
Karim K. Ben Hicham, Jan G. Rittig, Martin Grohe, Alexander Mitsos
Efficient ML
  • TFMs enable in-context learning for molecular property prediction without task-specific fine-tuning.
  • Combining TFMs with CheMeleon embeddings yields significant performance improvements.
  • Molecular representation is crucial for TFM effectiveness, outperforming traditional fingerprints.
  • TFMs reduce computational costs compared to conventional fine-tuning methods.
Read more
The Spectral Geometry of Thought: Phase Transitions, Instruction Reversal, Token-Level Dynamics, and Perfect Correctness Prediction in How Transformers Reason
Yi Liu
NLP Large Language Models Interpretability
  • Large language models exhibit spectral phase transitions during reasoning tasks.
  • Instruction tuning reverses the spectral geometry of reasoning in models.
  • A taxonomy of generation dynamics categorizes models into expansion, compression, and equilibrium.
  • Spectral properties can predict reasoning correctness with high accuracy.
Read more
Univariate Channel Fusion for Multivariate Time Series Classification
Fernando Moro, Vinicius M. A. Souza
Time Series Efficient ML
  • Introduction of Univariate Channel Fusion (UCF) as a lightweight, classifier-agnostic method for MTSC.
  • UCF transforms multivariate time series into a univariate format using simple fusion techniques.
  • Demonstrated competitive accuracy and high efficiency in five diverse real-world case studies.
  • UCF is particularly effective in scenarios with high inter-channel correlation.
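The fusion step itself is deliberately simple; a sketch of two plausible channel fusions (mean and first-principal-component projection; function and method names are ours, not UCF's API):

```python
import numpy as np

def fuse_channels(X, method="mean"):
    """Fuse a multivariate series X of shape (channels, length) into one
    univariate series via a simple, classifier-agnostic reduction."""
    if method == "mean":
        return X.mean(axis=0)
    if method == "pc1":  # project channels onto their first principal direction
        Xc = X - X.mean(axis=1, keepdims=True)
        u, _, _ = np.linalg.svd(Xc, full_matrices=False)
        return u[:, 0] @ Xc
    raise ValueError(method)

X = np.array([[1.0, 2.0, 3.0],
              [3.0, 4.0, 5.0]])   # 2 channels, length-3 series
fused = fuse_channels(X)          # mean fusion -> [2.0, 3.0, 4.0]
```

When channels are highly correlated, such reductions lose little information, consistent with the last bullet.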
Read more
Reasoning-targeted Jailbreak Attacks on Large Reasoning Models via Semantic Triggers and Psychological Framing
Zehao Wang, Lanjun Wang
NLP Large Language Models
  • Introduction of reasoning-targeted jailbreak attacks that compromise the reasoning process of LRMs without changing final answers.
  • Development of the PRJA Framework, which combines semantic analysis and psychological principles to manipulate reasoning steps.
  • Demonstration of high attack success rates (83.6%) against several commercial LRMs, indicating serious security concerns.
  • Emphasis on the importance of safeguarding the reasoning process in sensitive applications like healthcare and education.
Read more
The Topological Trouble With Transformers
Michael C. Mozer, Shoaib Ahmed Siddiqui, Rosanne Liu
NLP Large Language Models Theory
  • Transformers' feedforward architecture limits their ability to track dynamic states effectively.
  • State tracking is crucial for language understanding and reasoning, yet transformers often fail in this regard.
  • The authors propose a taxonomy for recurrent and continuous-thought transformer architectures.
  • Dynamic depth models and externalized state representations are computationally inefficient solutions.
Read more
TwinTrack: Post-hoc Multi-Rater Calibration for Medical Image Segmentation
Tristan Kirscher, Alexandra Ertl, Klaus Maier-Hein, Xavier Coubez, Philippe Meyer, Sylvain Faisan
Computer Vision
  • TwinTrack provides a framework for post-hoc calibration of segmentation probabilities in ambiguous medical imaging tasks.
  • The method utilizes isotonic regression to align predictions with the empirical mean human response (MHR).
  • TwinTrack outperforms traditional calibration methods in terms of calibration metrics and segmentation accuracy.
  • The approach is robust to inter-rater disagreement, providing meaningful probabilistic interpretations of segmentation outputs.
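The isotonic-regression step maps raw scores to the empirical mean human response with a monotone function; a sketch on made-up per-pixel data (the arrays are illustrative, not from the paper):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Hypothetical data: predicted foreground probabilities per pixel, and the
# fraction of raters who marked each pixel foreground (mean human response).
pred = np.array([0.1, 0.2, 0.3, 0.5, 0.7, 0.8, 0.9])
mean_rater = np.array([0.0, 0.0, 0.25, 0.5, 0.5, 0.75, 1.0])

iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
iso.fit(pred, mean_rater)         # monotone map from scores to rater agreement
calibrated = iso.predict(np.array([0.05, 0.5, 0.95]))
```

Because the map is fit post hoc, the segmentation model itself is untouched; only the probabilities it reports are recalibrated against rater agreement.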
Read more
Applications of deep generative models to DNA reaction kinetics and to cryogenic electron microscopy
Chenwei Zhang
Generative Models Graph Learning Multimodal
  • Introduction of ViDa, a deep learning framework for DNA reaction kinetics analysis.
  • Development of Struc2mapGAN for synthesizing high-fidelity cryo-EM density maps.
  • Proposal of improved evaluation metrics for protein structure modeling from cryo-EM maps.
  • Integration of structural embeddings with cryo-EM data using CryoSAMU.
Read more
Towards a Data-Parameter Correspondence for LLMs: A Preliminary Discussion
Ou Wu
Large Language Models Theory Efficient ML
  • Establishes a unified framework linking data-centric and model-centric optimization methods.
  • Identifies three key correspondences: geometric, low-rank, and security-privacy.
  • Demonstrates that cooperative optimization can outperform isolated approaches.
  • Encourages collaboration between data and parameter research communities.
Read more
A Systematic Survey and Benchmark of Deep Learning for Molecular Property Prediction in the Foundation Model Era
Zongru Li, Xingsheng Chen, Honggang Wen, Regina Qianru Zhang, Ming Li, Xiaojin Zhang, Hongzhi Yin, Qiang Yang, Kwok-Yan Lam, Pietro Lio, Siu-Ming Yiu
Graph Learning Multimodal Theory
  • The paper categorizes molecular property prediction methods into four paradigms: Quantum, Descriptor ML, Geometric Deep Learning, and Foundation Models.
  • It highlights the need for improved benchmark designs to address challenges in data curation and evaluation protocols.
  • The authors propose three forward-looking research directions to enhance molecular property prediction methodologies.
  • A comprehensive meta-analysis of over one hundred deep architectures reveals trends in performance across different molecular property prediction tasks.
Read more
Hallucination as Trajectory Commitment: Causal Evidence for Asymmetric Attractor Dynamics in Transformer Generation
G. Aytug Akarlar
NLP Large Language Models Generative Models
  • Hallucination in language models is linked to asymmetric attractor dynamics.
  • Same-prompt bifurcation isolates trajectory dynamics from prompt-level confounds.
  • Activation patching reveals significant asymmetry in the ability to corrupt versus correct trajectories.
  • The study identifies a strong correlation between prompt encoding and hallucination rates.
Read more
Beyond Distribution Sharpening: The Importance of Task Rewards
Sarthak Mittal, Leo Gagnon, Guillaume Lajoie
Reinforcement Learning Large Language Models Optimization
  • Introduces a controlled framework for comparing distribution sharpening and task-reward optimization in RL.
  • Demonstrates that task-reward optimization leads to significant performance improvements, especially on difficult tasks.
  • Challenges the notion that RL fine-tuning primarily enhances existing model preferences through sharpening.
  • Highlights the importance of task rewards in developing new capabilities in large language models.
Read more
Prior-Fitted Functional Flow: In-Context Generative Models for Pharmacokinetics
César Ojeda, Niklas Hartung, Wilhelm Huisinga, Tim Jahn, Purity Kamene Kavwele, Marian Klose, Piyush Kumar, Ramsés J. Sánchez, Darius A. Faroughy
Generative Models Time Series
  • Introduction of Prior-Fitted Functional Flows (PFF) for pharmacokinetics.
  • PFF enables zero-shot predictions and individual forecasting from sparse data.
  • A new open-access dataset was created to calibrate physiological plausibility.
  • PFF outperforms traditional NLME models and the AICMET model in predictive accuracy.
Read more
SCRIPT: Implementing an Intelligent Tutoring System for Programming in a German University Context
Alina Deriyeva, Jesper Dannath, Benjamin Paassen
Generative Models NLP Theory
  • SCRIPT is designed to support Python programming education while conforming to European regulatory standards.
  • The system functions as both a teaching tool and a research platform for ITS development.
  • It employs a four-model architecture to facilitate personalized learning experiences.
  • Initial implementation has been successfully used in a data mining course for exam preparation.
Read more
FL-MHSM: Spatially-adaptive Fusion and Ensemble Learning for Flood-Landslide Multi-Hazard Susceptibility Mapping at Regional Scale
Aswathi Mundayatt, Jaya Sreevalsan-Nair
Theory Interpretability Optimization
  • Introduces a spatially adaptive modeling approach for multi-hazard susceptibility mapping.
  • Combines Early Fusion and Late Fusion techniques with a Mixture of Experts model.
  • Demonstrates improved predictive performance for flood and landslide susceptibility in diverse regions.
  • Highlights the importance of spatial heterogeneity in hazard susceptibility analysis.
Read more
Placing Puzzle Pieces Where They Matter: A Question Augmentation Framework for Reinforcement Learning
Yangyi Fang, Jiaye Lin, Xiaoliang Fu, Cong Qin, Haolin Shi
Reinforcement Learning Large Language Models
  • Introduction of PieceHint, a framework for strategic hint injection in RL training.
  • Focus on identifying critical reasoning steps to enhance model learning.
  • Progressive withdrawal of hints promotes independent reasoning capabilities.
  • Experimental validation shows competitive performance against larger models.
Read more
Hybrid Spectro-Temporal Fusion Framework for Structural Health Monitoring
Jongyeop Kim, Jinki Kim, Doyun Lee
Time Series
  • Introduction of a Hybrid Spectro-Temporal Fusion framework for SHM.
  • Integration of Spectro-Temporal Alignment and Hybrid Spectro-Temporal Fusion for improved vibration analysis.
  • Demonstrated superior performance of the proposed framework over conventional methods.
  • Temporal resolution significantly impacts the performance of machine learning models.
Read more
Sequential KV Cache Compression via Probabilistic Language Tries: Beyond the Per-Vector Shannon Limit
Gregory Magarshak
NLP Large Language Models Theory
  • Introduces sequential KV compression, addressing the limitations of per-vector compression methods.
  • Proposes a two-layer architecture: probabilistic prefix deduplication and predictive delta coding.
  • Achieves a theoretical compression ratio of approximately 914,000× over TurboQuant at the Shannon limit.
  • Demonstrates that compression efficiency improves with increasing context length.
Read more
Wasserstein Distributionally Robust Risk-Sensitive Estimation via Conditional Value-at-Risk
Feras Al Taha, Eilyan Bitar
Optimization Theory Time Series
  • Introduces a distributionally robust approach to risk-sensitive estimation using CVaR.
  • Establishes a framework for minimizing worst-case CVaR over a type-2 Wasserstein ambiguity set.
  • Derives a tractable semidefinite programming formulation for computing affine estimators.
  • Demonstrates improved performance in electricity price forecasting compared to traditional methods.
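The risk measure at the core of the paper is CVaR; its empirical form is just the mean of the worst tail. A sketch of that objective alone (the worst-case-over-Wasserstein-ball machinery and the SDP formulation are not shown):

```python
import numpy as np

def cvar(losses, alpha=0.95):
    """Empirical CVaR_alpha: the mean loss in the worst (1 - alpha) tail."""
    losses = np.sort(np.asarray(losses, dtype=float))
    var = np.quantile(losses, alpha)       # Value-at-Risk threshold
    return float(losses[losses >= var].mean())

losses = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
tail_risk = cvar(losses, alpha=0.8)        # averages the worst 20% of outcomes
```

Unlike the plain mean, CVaR is dominated by rare large losses (here the 100.0 outlier), which is why it suits risk-sensitive estimation tasks such as price forecasting.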
Read more
Learning Affine-Equivariant Proximal Operators
Oriel Savir, Zhenghan Fang, Jeremias Sulam
Optimization Computer Vision Theory
  • Introduction of AE-LPNs that compute exact proximal operators while being equivariant to shifts and scaling.
  • Demonstration of the importance of equivariance in enhancing robustness to noise and affine transformations.
  • Development of conditions for ensuring affine-equivariance in neural network architectures.
  • Validation of AE-LPNs through both synthetic examples and real-world denoising tasks.
Read more
Impact of Nonlinear Power Amplifier on Massive MIMO: Machine Learning Prediction Under Realistic Radio Channel
Marcin Hoffmann, Paweł Kryszkiewicz
Theory Optimization Efficient ML
  • Nonlinear effects of power amplifiers in M-MIMO systems are significant and often overlooked in existing literature.
  • The paper proposes a statistical model and a machine learning model to predict Signal to Distortion Ratio (SDR) under realistic conditions.
  • The ML-based power allocation scheme demonstrates a 12% median gain in user throughput compared to fixed operating point schemes.
  • 3D-Ray Tracing simulations reveal the inadequacy of traditional channel models in accurately capturing nonlinear distortion effects.
Read more
Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling
Fei Wang, Li Shen, Liang Ding, Chao Xue, Ye Liu, Changxing Ding
NLP Large Language Models Optimization Efficient ML
  • AdaLeZO improves ZO optimization by addressing layer sensitivity and computational inefficiencies.
  • The framework uses a Multi-Armed Bandit approach for dynamic perturbation allocation.
  • Inverse Probability Weighting is employed to ensure unbiased gradient estimation.
  • Extensive experiments show significant speedups in wall-clock time without sacrificing accuracy.
Read more
Demystifying the unreasonable effectiveness of online alignment methods
Enoch Hyunwook Kang
Theory Reinforcement Learning Efficient ML
  • Introduces temperature-zero regret as a criterion focusing on the top-ranked response.
  • Proves that greedy online alignment methods achieve bounded (O(1)) cumulative temperature-zero regret.
  • Clarifies that prior logarithmic-regret results are driven by policy randomization rather than failure to identify the best response.
  • Demonstrates the effectiveness of greedy alignment methods in practical applications.
Read more
Modern Structure-Aware Simplicial Spatiotemporal Neural Network
Zhaobo Hu, Vincent Gauthier, Mehdi Naima
Graph Learning Time Series Theory
  • First approach to utilize high-dimensional simplicial complexes for spatiotemporal modeling.
  • Combines spatiotemporal random walks with Temporal Convolutional Networks for improved efficiency.
  • Demonstrates effectiveness across diverse real-world datasets in energy, environmental, and transportation sectors.
  • Achieves competitive performance in both prediction and data imputation tasks.
Read more
QuantSightBench: Evaluating LLM Quantitative Forecasting with Prediction Intervals
Jeremy Qin, Maksym Andriushchenko
NLP Large Language Models Time Series
  • Introduction of QuantSightBench as a benchmark for evaluating LLMs in quantitative forecasting.
  • Proposed use of prediction intervals as a more rigorous evaluation format compared to point estimates.
  • Evaluation of multiple LLMs shows none achieve the 90% coverage target for prediction intervals.
  • Identified systematic overconfidence and calibration issues across evaluated models.
Read more
Generalization Boundaries of Fine-Tuned Small Language Models for Graph Structural Inference
Michal Podstawski
Graph Learning Large Language Models NLP
  • Fine-tuned SLMs exhibit strong ordinal consistency across different graph families.
  • Structural reasoning performance degrades gracefully with increasing graph size, with architecture-specific degradation profiles.
  • Adjacency-list serialization is more effective than edge-list encoding, especially for larger graphs.
  • Node-level properties are estimated most reliably, while global properties pose significant inference challenges.
Read more
Training Time Prediction for Mixed Precision-based Distributed Training
Minchul Kang, Changyong Shin, Jinwoo Jeong, Hyunho Lee, Younghun Go, Gyeongmin Kim, Gyeongsik Yang, Chuck Yoo
Efficient ML
  • Floating-point precision significantly affects training time, with variations up to 2.4x.
  • Existing prediction methods fail to account for precision variations, leading to high prediction errors.
  • The proposed precision-aware predictor achieves a MAPE of 9.8%, significantly improving accuracy.
  • The methodology incorporates operator-level precision and communication overheads for better predictions.
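The reported MAPE of 9.8% uses the standard definition; for reference (the per-iteration times below are hypothetical):

```python
import numpy as np

def mape(actual, predicted):
    """Mean Absolute Percentage Error, the accuracy metric the predictor reports."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)

# Hypothetical measured vs. predicted per-iteration training times (seconds)
err = mape([10.0, 20.0, 40.0], [11.0, 18.0, 42.0])
```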
Read more
Late Fusion Neural Operators for Extrapolation Across Parameter Space in Partial Differential Equations
Eva van Tegelen, Taniya Kapoor, George A.K. van Voorn, Peter van Heijster, Ioannis N. Athanasiadis
Theory Interpretability Optimization
  • Introduction of Late Fusion Neural Operator architecture for improved extrapolation in PDEs.
  • Separation of state dynamics and parameter effects enhances generalization capabilities.
  • Significant performance improvements over existing neural operator methods.
  • Comprehensive benchmarking across diverse PDE problems.
Read more
Evaluating quality in synthetic data generation for large tabular health datasets
Jean-Baptiste Escudié, Benjamin Barnes, Stefan Meisegeier, Klaus Kraywinkel, Fabian Prasser, Nils Körber
Generative Models
  • Introduces a systematic evaluation framework for synthetic data generation in health datasets.
  • Evaluates seven machine learning models on four datasets with varying scales.
  • Proposes a methodology for assessing fidelity in synthesized joint distributions.
  • Highlights challenges in maintaining medical domain adherence during data synthesis.
Read more
Hierarchical Active Inference using Successor Representations
Prashant Rangarajan, Rajesh P. N. Rao
Reinforcement Learning Robotics Theory
  • Introduces a hierarchical model of active inference that leverages successor representations for efficient planning.
  • Demonstrates the ability to learn higher-level abstract states and actions from lower-level representations.
  • Validates the approach on multiple planning and reinforcement learning tasks, showing improved efficiency.
  • Addresses scalability challenges in active inference by utilizing a state-action hierarchy.
Read more
How Much Cache Does Reasoning Need? Depth-Cache Tradeoffs in KV-Compressed Transformers
Xiao Wang
NLP Large Language Models Theory
  • Conjectured product-depth lower bound for k-hop reasoning in Transformers.
  • Establishment of a bandwidth barrier that limits depth lower bounds in high-precision settings.
  • Two-regime error analysis revealing significant differences in performance between adaptive and oblivious cache strategies.
  • Identification of an open problem related to closing the gap between conjectured and proven bounds.
Read more
PINNACLE: An Open-Source Computational Framework for Classical and Quantum PINNs
Shimon Pisnoy, Hemanth Chandravamsi, Ziv Chen, Aaron Goldgewert, Gal Shaviner, Boris Shragner, Steven H. Frankel
Theory Optimization Efficient ML
  • PINNACLE is an open-source framework that integrates classical and quantum PINNs.
  • The framework supports advanced training strategies and multi-GPU acceleration.
  • A comprehensive benchmark study quantifies the impact of various architectural and training enhancements.
  • The results highlight the sensitivity of PINNs to design choices and their computational costs compared to classical methods.
Read more
Neural Garbage Collection: Learning to Forget while Learning to Reason
Michael Y. Li, Jubayer Ibn Hamid, Emily B. Fox, Noah D. Goodman
NLP Large Language Models Reinforcement Learning Efficient ML
  • Introduces Neural Garbage Collection (NGC) for efficient KV cache management in language models.
  • Enables end-to-end learning of memory management and reasoning through reinforcement learning.
  • Achieves 2–3× compression of peak KV cache size while maintaining strong accuracy.
  • Eliminates the need for supervised fine-tuning or proxy objectives in training.
Read more
ProtoTTA: Prototype-Guided Test-Time Adaptation
Mohammad Mahdi Abootorabi, Parvin Mousavi, Purang Abolmaesumi, Evan Shelhamer
Computer Vision NLP Interpretability
  • ProtoTTA enhances robustness of prototype-based models during distribution shifts.
  • The framework minimizes entropy of prototype-similarity distributions for confident activations.
  • Geometric filtering is used to stabilize updates by focusing on reliable samples.
  • Experiments show improved performance across diverse benchmarks compared to standard methods.
Read more