AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

48 Papers today
8h Update frequency
7 Days of history
Failed Reasoning Traces Tell You What Is Fixable (But Not by Reading Them)
Nizar Islah, Istabrak Abbes, Irina Rish, Sarath Chandar, Eilif B. Muller
NLP, Large Language Models, Interpretability
  • Failed reasoning traces encode recoverability structure that can guide effective interventions.
  • Three trajectory features derived from failed traces help cluster failures and characterize post-training methods.
  • A training-free routing rule based on these features improves recovery rates on challenging reasoning problems.
  • The approach allows for diagnostic analysis without requiring access to training-time data or model weights.
A Geometric Characterization of the Stationary Plateau for Two-Layer Neural Networks
Tian Ding, Dawei Li, Ruoyu Sun
Theory, Optimization
  • Introduces a geometric characterization of stationary plateaus in two-layer neural networks.
  • Classifies stationary points on plateaus based on conditions of local minima and saddle points.
  • Identifies the 'inner Hessian' matrix as a key factor in determining local geometry.
  • Demonstrates how neuron splitting affects the nature of stationary points in wider networks.
QuBLAST: A Framework for Quantizing Large Language Models with Block-Level Compression Approach and Activation Scaling Strategy
Pasindu Wickramasinghe, Achyuta Muthuvelan, Rachmad Vidya Wicaksana Putra, Minghao Shao, Muhammad Shafique
NLP, Large Language Models, Efficient ML
  • QuBLAST introduces a block-level compression approach for mixed-precision quantization of LLMs.
  • The activation scaling strategy effectively mitigates the impact of activation outliers.
  • Experimental results show a significant reduction in model size while maintaining performance.
  • QuBLAST is applicable to various LLM architectures, including those with non-conventional attention mechanisms.
Two-Action Apple Tasting with Switching Costs
Tommaso Cesari, Roberto Colomboni
Theory, Optimization
  • The two-action apple-tasting problem is analyzed with switching costs against an oblivious adversary.
  • The expected minimax regret is established as Θ(√T), contradicting the previously assumed Ω(T^(2/3)) lower bound.
  • A normalized formulation of the problem simplifies the analysis and allows for a more straightforward algorithm design.
  • The proposed algorithm utilizes a simple alternating strategy between blind and inspection modes to achieve O(√T) regret.
STRIDE: Training Data Attribution via Sparse Recovery from Subset Perturbations
Rishit Dagli, Abir Harrasse, Luke Zhang, Florent Draye, Amirali Abdullah, Bernhard Schölkopf, Zhijing Jin
NLP, Large Language Models, Interpretability
  • STRIDE models TDA as a sparse recovery problem in activation space, bypassing the need for retraining LLMs.
  • The framework uses lightweight steering operators to mimic changes in model predictions caused by training data subsets.
  • STRIDE outperforms existing TDA methods in terms of accuracy and computational efficiency.
  • The approach facilitates practical applications such as data selection and contamination detection.
Multi-Modal Graph Neural Network with Transformer-Guided Adaptive Diffusion for Preclinical Alzheimer Classification
Jaeyoon Sim, Minjae Lee, Guorong Wu, Won Hwa Kim
Graph Learning, Multimodal, Interpretability
  • Introduction of the GTAD framework, which combines diffusion processes with transformer-guided attention.
  • Improved classification performance for preclinical Alzheimer's disease using multi-modal imaging data.
  • Enhanced interpretability in identifying critical brain regions associated with Alzheimer's disease.
  • Demonstration of the model's effectiveness on structural brain networks from the ADNI study.
When Graph Tokens Sink: A Mechanistic Analysis of Graph Language Models
Ding Zhang, Runtao Zhou, Wenqing Zheng, Rizal Fathony, Bayan Bruss, Chirag Agarwal
Large Language Models, Graph Learning, Interpretability
  • Graph sink tokens emerge as activation-level outliers but do not effectively convey graph information.
  • High activation levels of graph tokens do not correlate with their importance for downstream tasks.
  • Current GLMs exhibit a decoupling between token saliency and semantic utility, indicating potential architectural limitations.
AdaWeather: Adaptively Mixing Probabilistic Weather Forecasts with Logarithmic Regret
Saptarishi Dhanuka, Sarvesh Iyer, Manmeet Singh, Mihir More, Rushil Gupta, Dhruman Gupta, Parthasarathi Mukhopadhyay, Sandeep Juneja
Time Series, Theory, Optimization
  • Introduces AdaWeather, a framework combining probabilistic weather forecasts adaptively.
  • Achieves logarithmic regret compared to the best static mixture of experts.
  • Utilizes a U-Net model for historical pattern learning to enhance forecast accuracy.
  • Demonstrates improved performance in temperature forecasting over existing methods.
A Goal-Set Characterization of Task Composition in the Boolean Task Algebra
Eduardo Terrés-Caballero, Herke van Hoof
Reinforcement Learning, Theory, Efficient ML
  • Formalization of a representational limitation in the BTA framework, reducing training costs from O(log₂ |G|) to constant.
  • Introduction of a new method for task composition that relies on goal sets and requires only array lookups.
  • Empirical validation showing that additional base tasks do not enhance performance upon convergence.
  • Identification of limitations in deterministic BTA composition when applied to stochastic MDPs, necessitating consideration of exponentially many policies.
Edge of Stability Selectively Shapes Learning Across the Data Distribution
Shauna Kwag, Anakha Ganesh, Tomaso Poggio, Pierfrancesco Beneventano
Optimization, Theory
  • EoS is a selective mechanism that redistributes learning across the training data distribution.
  • Two necessary conditions for benefiting from EoS are gradient alignment with the top Hessian eigenvector and sustained gradient magnitude.
  • Controlled perturbations can isolate the effects of alignment and persistence on learning outcomes.
  • The geometric composition of the training data influences which subsets benefit from EoS, affecting generalization behavior.
Learning Empirically Admissible Neural Heuristics for Combinatorial Search
Siddharth Sahay
Reinforcement Learning, Optimization
  • Introduces a framework for learning admissible neural heuristics that guarantees path optimality.
  • Utilizes an underestimating Admissible Bellman Operator and an Asymmetric Loss function to prevent overestimations.
  • Implements a post-hoc calibration safety offset to ensure empirical admissibility.
  • Achieves significant reductions in search node expansions across various combinatorial puzzles.
Reinforcement Learning from Rich Feedback with Distributional DAgger
Rishabh Agrawal, Jacob Fein-Ashley, Paria Rashidinejad
Reinforcement Learning, Large Language Models, Theory
  • DistIL leverages rich feedback beyond binary correctness, improving credit assignment in reinforcement learning.
  • The proposed method guarantees monotonic policy improvement, addressing limitations of existing self-distillation techniques.
  • Empirical results indicate significant performance gains across diverse domains compared to traditional RLVR and self-distillation methods.
Dynamic Multi-Pair Trading Strategy in Cryptocurrency Markets with Deep Reinforcement Learning
Damian Lebiedź, Robert Ślepaczuk
Reinforcement Learning, Time Series, Optimization
  • Introduction of a hybrid trading strategy combining statistical arbitrage with DRL execution policies.
  • Development of a hierarchical pair selection methodology to isolate high-conviction anomalies.
  • Demonstration of significant outperformance of the DRL-enhanced strategy over traditional heuristics.
  • Establishment of a framework for safe reinforcement learning via deterministic shielding.
ConTraIRL: Factorized Contrastive Abstractions for Transferable IRL
Yikang Gui, Bikramjit Banerjee, Prashant Doshi
Reinforcement Learning, Robotics
  • Introduces a dual-encoder architecture for factorized representations of dynamics and goals.
  • Utilizes a dual contrastive objective to enhance reward transfer in IRL.
  • Demonstrates effective few-shot transfer capabilities to unseen dynamics-goal combinations.
  • Empirical results show significant improvements over traditional IRL methods.
dMX: Differentiable Mixed-Precision Assignment for Low-Precision Floating-Point Formats
Giuseppe Franco, Ian Colbert, Pablo Monteagudo-Lago, Felix Marty, Nicholas Fraser
NLP, Large Language Models, Efficient ML
  • Introduces a differentiable framework for mixed-precision quantization of floating-point formats.
  • Utilizes continuous optimization to avoid abrupt transitions in bit-width assignments.
  • Implements a temperature-based annealing mechanism for discretizing learned bit-widths.
  • Achieves superior performance over traditional layer-selection heuristics in LLMs.
Multi-component Causal Tracing in Large Language Models
Zirui Yan, Dennis Wei, Dmitriy A. Katz, Prasanna Sattigeri, Ali Tajer
NLP, Large Language Models, Interpretability
  • Introduces a unified framework for multi-component causal tracing in LLMs.
  • Develops an efficient algorithm that converts combinatorial search into a continuous optimization problem.
  • Demonstrates the identification of critical model components that influence performance metrics.
  • Highlights the non-linear interactions among components, challenging previous linear assumptions.
Regime-Arrival Uncertainty in Generalization Bounds under Distribution Shift
Prince Poudel
Theory
  • Introduces a framework for analyzing generalization bounds under regime-switching environments.
  • Quantifies the risk due to regime composition mismatch using a two-state Markov process.
  • Extends generalization bounds to beta-mixing data with effective sample size considerations.
  • Empirical results show strong correlation between the proposed penalty and deployment gaps.
Smart Picks in the Dark: Towards Efficient RLVR for Reasoning via Tracing Metacognitive Pivots
Guangcheng Zhu, Shenzhi Yang, Haobo Wang, Xing Zheng, Yingfan Ma, Xuening Feng, Zhongqi Chen, Bowen Song, Weiqiang Wang, Gang Chen
Reinforcement Learning, Large Language Models, Efficient ML
  • Introduces PivotTrace, a framework for efficient RLVR that selects unlabeled samples without prior supervision.
  • Utilizes metacognitive pivots to quantify uncertainty and guide adaptive data routing.
  • Achieves superior performance with significantly fewer annotated samples compared to fully supervised models.
  • Demonstrates 2.75 times faster convergence in training processes.
Scenario Generation for Risk-Aware Reinforcement Learning with Probably Approximately Safe Guarantees
Mohit Prashant, Arvind Easwaran
Reinforcement Learning, Generative Models, Optimization
  • Introduces a dual chance-constrained program (CCP) for safety guarantees in RL.
  • Utilizes a variational autoencoder (VAE) to encode state-space distributions and barrier-certificates.
  • Focuses on robust exploration to minimize the risk of encountering unsafe states.
  • Demonstrates the effectiveness of the proposed method through experimental results.
Do Transformers Need Three Projections? Systematic Study of QKV Variants
Ali Kayyam, Anusha Madan Gopal, M Anthony Lewis
Efficient ML, Large Language Models, Computer Vision
  • Systematic evaluation of QKV projection-sharing strategies across diverse tasks.
  • Q-K=V configuration reduces KV cache size by 50% with minimal performance degradation.
  • Projection sharing is complementary to head sharing, enabling significant memory efficiency gains.
  • Task-dependent efficacy of projection-sharing strategies observed.
OpenRFM: Dissecting Relational In-Context Learning
Zhikai Chen, Junyu Yin, Jialiang Gu, Siheng Xiong, Xiaoze Liu, Ruowang Zhang, Keren Zhou, Kai Guo
Theory, Efficient ML, Graph Learning
  • OpenRFM addresses the performance gap between open RFMs and commercial models by enhancing relational in-context learning.
  • The dual-stage ICL architecture combines relation-level and batch-level learning to improve label coverage.
  • A homophily-aware pre-training approach is introduced, mixing synthetic and real data for better model performance.
  • OpenRFM shows a 30% improvement over the RT backbone and surpasses KumoRFMv1 in multiple tasks.
Mitigating Spurious Correlations with Memorization-Guided Dataset De-Biasing
Arda Fazla, Abolfazl Hashemi
Computer Vision, Theory, Efficient ML
  • Proposes a two-stage sample scoring function to mitigate spurious correlations in datasets.
  • Introduces the TCSL and TCSL-CS algorithms for effective sample selection.
  • Demonstrates improved worst-group accuracy using only 10% of the original training data.
  • Highlights the limitations of existing coreset selection methods in handling spurious features.
MAdam: Metric-Aware Multi-Objective Adam
Fengbei Liu, Rachit Saluja, Sunwoo Kwak, Ruibo Wang, Ruining Deng, Heejong Kim, Johannes C. Paetzold, Mert R. Sabuncu
Optimization
  • Identification of systematic mismatches between MOO solvers and the Adam optimizer.
  • Introduction of MAdam, a metric-aware optimization method that resolves these mismatches.
  • Demonstrated improvements in performance across multiple application domains.
  • MAdam maintains compatibility with existing MOO solvers and the Adam optimizer.
Topology-Aware Gaussian Graph Repair for Robust Graph Neural Networks
Anubha Goel, Juho Kanniainen
Graph Learning
  • TAGR improves GNN robustness against noisy and missing edges.
  • The framework combines a Gaussian feature-neighborhood graph with a topology-aware residual correction.
  • TAGR maintains compatibility with standard GNN architectures.
  • The method is lightweight and avoids the complexities of dense graph structure learning.
GRZO: Group-Relative Zeroth-Order Optimization for Large Language Model Fine-Tuning
Liyan Tan, Yequan Zhao, Yifan Yang, Ruijie Zhang, Xinling Yu, Zheng Zhang
NLP, Large Language Models, Optimization
  • GRZO reduces variance in zeroth-order optimization by utilizing multiple pseudo-independent perturbations per mini-batch.
  • The method maintains inference-level memory efficiency while improving convergence rates for large language models.
  • Experimental results show significant improvements in accuracy and memory usage compared to existing ZO methods like MeZO.
  • GRZO can be integrated with other optimization techniques to further enhance performance.
Demystifying Pipeline Parallelism: First Theory for PipeDream
Ivan Ilin, Peter Richtárik
Theory, Optimization, Efficient ML
  • Introduction of Randomized PipeDream (RPD) with a nonconvex convergence guarantee for pipeline model parallelism.
  • Analysis of the scaling behavior of PipeDream, showing that delay grows quadratically with the number of stages.
  • Comparison of PipeDream and LocalSGD, highlighting performance differences based on the training objective.
  • Experimental results indicating that the choice of method depends on the specific task and scaling conditions.
Multi-Modal Machine Learning for Breast Cancer Recurrence Prediction
Jiahao Shao, Xudong Wang, Anam Nawaz Khan, Christopher Brett, Xueping Li, Bing Yao
Multimodal
  • Multi-modal integration of clinical data improves breast cancer recurrence prediction accuracy.
  • The study employs a rule-based extraction mechanism to recover tumor characteristics from unstructured data.
  • Performance is benchmarked against traditional single-source models, showing significant enhancements.
  • Data harmonization addresses issues of fragmentation and inconsistency in electronic health records.
ReLoRA: Knowledge-Reusing Adaptation for Fast Rollout of Evolving LLM Services
Yang Xu, Zihuai Xu, Hongli Xu, Yunming Liao, Zhiwei Yao, Xitong Fu
Large Language Models, Efficient ML, Optimization
  • ReLoRA enables efficient adaptation of LoRA adapters for evolving LLMs.
  • The framework utilizes Bayesian optimization for adaptive initialization.
  • Scheduled regularization is employed to enhance fine-tuning efficiency.
  • ReLoRA significantly reduces time-to-readiness and improves accuracy.
Identifying Gems from Roman RAPIDly
Karan Gandhi, Ashish A. Mahabal, Jacob E. Jencson, Russ R. Laher, Ben Rusholme, Lin Yan, Ryan M. Lau, Schuyler D. Van Dyk, Mansi M. Kasliwal
Time Series
  • Introduction of the RuBR model for classifying astronomical transients.
  • Development of three model variations to handle different training scenarios.
  • Methodology emphasizes domain adaptation for transitioning to real data.
  • Experimental results show effectiveness in distinguishing genuine detections.
Be Fair! Can Machine Learning Engineering Agents Adhere to Fairness Constraints?
Anna Richter, Julia Stoyanovich, Sebastian Schelter
Theory
  • Existing benchmarks for MLE agents do not adequately assess their adherence to fairness and responsibility constraints.
  • The proposed evaluation framework emphasizes domain-centric design and the impact of technical expertise on MLE agent usability.
  • Exploratory experiments show that MLE agents underperform compared to human-designed pipelines in terms of fairness and predictive quality.
  • The study highlights the importance of human oversight in the development of ML pipelines, especially in sensitive applications.
LimiX-2M: Mitigating Low-Rank Collapse and Attention Bottlenecks in Tabular Foundation Models
Yuanrui Wang, Xingxuan Zhang, Han Yu, Mingchao Ming, Gang Ren, Hao Yuan, Li Mao, Yunjia Zhang, Chun Yuan, Peng Cui
Efficient ML, Theory
  • Introduction of RaBEL, a Radial Basis Embedding Layer that improves feature representation and conditioning.
  • Reordering of attention mechanisms to enhance the aggregation of column-level statistics before feature-level attention.
  • LimiX-2M achieves superior performance compared to larger models while being more computationally efficient.
  • Identification and quantification of low-rank collapse issues in traditional tabular foundation models.
The Right Measure for Physics-Constrained Generation: A Co-Area Correction for Posterior-Consistent PDE Inverse Problems
Jian Xu, Delu Zeng, John Paisley, Qibin Zhao
Generative Models, Theory
  • Identifies a systematic bias in physics-constrained generative posteriors due to the omission of the co-area correction.
  • Demonstrates that the bias can inflate posterior errors significantly, particularly in heterogeneous constraint sensitivities.
  • Introduces CoCoS, a new constrained sampler that accurately targets the correct co-area posterior.
  • Validates the necessity of the Fixman correction through controlled experiments against an i.i.d. ground-truth arbiter.
The Impact of Temporal Granularity on Socio-Demographic Inference from Household Load Profiles
Dejan Radovanovic, Maximilian Schirl, Andreas Unterweger, Günther Eibl
Time Series
  • Coarser temporal granularity reduces predictive accuracy but reveals stable performance plateaus.
  • Handcrafted and tsfresh features are competitive with CNN-based embeddings, with XGBoost being the most effective classifier.
  • Static attributes can be inferred from coarse data, while dynamic attributes require fine-grained signals.
  • The study provides insights into the privacy-utility trade-off in smart metering.
Trading Human Curation for Synthetic Augmentation in RLVR
Akshansh, Leonardo Rosa Rodrigues, Michael Korostelev, Youssef Hassan, Mark E. Whiting
Reinforcement Learning, Large Language Models, Efficient ML
  • Synthetic task augmentation can effectively substitute for human curation in RLVR.
  • The cost-adjusted trade rate (ρ_cost) between augmented and human-authored tasks is established and measured.
  • High-share augmentation retains generalization performance comparable to a larger set of human-authored tasks.
  • The study provides insights into the economics of task generation for RLVR.
Testing the Test: Score-Direction Instability in Class-Split Anomaly Detection
Alejandro Ascarate, Leo Lebrat, Rodrigo Santa Cruz, Clinton Fookes, Olivier Salvado
Computer Vision, Theory
  • Identification of score-direction instability in class-split anomaly detection due to normal-anomaly overlap.
  • Introduction of a training-free diagnostic tool, neighborhood class leakage, to assess the reliability of class-split evaluations.
  • Empirical validation across various datasets and representation methods to demonstrate the diagnostic's effectiveness.
  • Highlighting the fragility of benchmark conclusions in anomaly detection when relying on class-split evaluations.
Pruning Deep Neural Networks via the Marchenko–Pastur Distribution
Leonid Berlyand, Theo Bourdais, Houman Owhadi, Yitzchak Shmalo
Efficient ML, Theory, Computer Vision
  • Introduces a Marchenko–Pastur random-matrix approach for pruning DNNs.
  • Achieves accuracy retention with minimal post-pruning fine-tuning (one epoch).
  • Provides deterministic data-path certificates for effective pruning.
  • Demonstrates significant MAC reduction while maintaining competitive accuracy on ImageNet-1k.
Pseudospectral Bounds for Transient Amplification in Coupled Gradient Descent
Ahanaf Hasan Ariq
Optimization, Theory
  • Establishes Kreiss-constant bounds for block-triangular Jacobians in coupled gradient descent.
  • Identifies a critical coupling threshold for spectral instability.
  • Introduces a finite-horizon iteration complexity bound for stochastic coupled descent.
  • Demonstrates the significance of transient amplification in high-dimensional learning dynamics.
An Ensembled Latent Factor Model via Differential Evolution and Gradient Descent Optimization
Rui Zhang, Jinhang Liu, Wenbo Zhang
Optimization
  • Introduction of ELFM-DEGDO, combining differential evolution and gradient descent for latent factor modeling.
  • The model addresses the limitations of traditional gradient descent-only approaches in handling HDI data.
  • Empirical results show ELFM-DEGDO outperforms multiple advanced latent factor models on real datasets.
  • The proposed self-adaptive weighting mechanism effectively fuses strengths from both optimization methods.
Staying Alive: Uncensored Survival Analysis with Tabular Foundation Models
Mariana Vargas Vieyra
Time Series
  • Introduces a training-free method for survival regression using Tabular Foundation Models.
  • Constructs an Accelerated Failure Time model with a single scalar parameter.
  • Implements a non-parametric in-context estimator for imputing right-censored data.
  • Demonstrates competitive performance against traditional survival regression models.
A Close Look At World Model Recovery In Supervised Fine-Tuned LLM Planners
Patrick Emami, Nan Qiang, Peter Graf
Large Language Models, NLP, Interpretability
  • Supervised fine-tuning improves LLMs' ability to encode action validity and state predicates.
  • LLMs can learn internal representations that differentiate valid and invalid actions despite challenges in probability classification.
  • Fine-tuning with diverse state space coverage leads to better world model recovery.
  • The study provides insights into the representation and reasoning capabilities of LLMs in planning contexts.
Graph-Guided Universum Learning in Generalized Eigenvalue Proximal SVMs for Alzheimer's Disease Classification
Yogesh Kumar, Vrushank Ahire, Mudasir Ganaie
Graph Learning
  • Introduction of two graph-guided Universum learning models for Alzheimer's disease classification.
  • Utilization of mild cognitive impairment (MCI) subjects as Universum data to enhance classification between AD and CN.
  • Construction of a graph using Gaussian similarity and Minimum Spanning Tree to capture geometric relationships among Universum samples.
  • Demonstrated superior performance of the proposed models over existing methods, particularly under noisy conditions.
Folded Transport MCMC: Certifiable Quotient Posterior Computation for Symmetric Bayesian Models
Jun Hu
Theory
  • Introduces Folded Transport MCMC (FolT-MCMC) for symmetric Bayesian models.
  • Directly samples from the quotient posterior to avoid label-switching issues.
  • Proves theoretical guarantees for the method's convergence and certification.
  • Demonstrates significant empirical improvements in convergence diagnostics.
QUIVER: Quantum-Informed Views for Enhanced Representations in Large ML Models
Aritra Bal, Michael Binder, Markus Klute, Benedikt Maier, Michael Spannowsky
Multimodal, Theory, Graph Learning
  • QUIVER integrates quantum Fisher information into classical machine learning models to enhance feature representation.
  • The method is architecture-agnostic, allowing integration into various model types such as transformers and graph neural networks.
  • Significant performance improvements were observed on QM9 and JETCLASS benchmark datasets compared to classical baselines.
  • The quantum Fisher view exposes higher-order correlations that classical representations may overlook.
ASymPO: Asymmetric-Scale Policy Optimization for Asynchronous LLM Post-Training Without Behavior Information
Zehua Liu, Yuxuan Yao, Xiaojin Fu, Tao Zhong, Mingxuan Yuan
Reinforcement Learning, Large Language Models, Optimization
  • ASymPO normalizes token loss to stabilize training without behavior-policy probabilities.
  • Identifies a scale-imbalance failure mode in current-policy-only asynchronous RL.
  • Proposes Scaled Policy Optimization (SPO) as a simpler baseline method.
  • Empirical evaluations show ASymPO's effectiveness in asynchronous mathematical reasoning tasks.
Toward Multi-Domain and Long-Tailed Quantization via Feature Alignment and Scaling
Chin-Yuan Yeh, Ting-An Chen, De-Nian Yang, Ming-Syan Chen
Efficient ML
  • Introduction of EmaQ for efficient multi-domain quantization addressing domain shifts.
  • Extension to EmaQ-LT for long-tailed quantization, mitigating majority-class overconfidence.
  • Theoretical convergence guarantees for the proposed quantization methods.
  • Sensitivity-aware weight aggregation to harmonize convergence across diverse domains.
RESCAST-100K: A Comprehensive Dataset for Cross-Domain Residential Load and Indoor Temperature Forecasting
Jainam Dhruva, Yousaf Raza, A.B. Siddique, Simone Silvestri
Time Series
  • Introduction of RESCAST-100K, a comprehensive dataset for residential energy forecasting.
  • Dataset includes 100,000 simulated homes with detailed time series data and exogenous variables.
  • Configuration-driven interface allows for systematic evaluation across various domains.
  • Benchmarking shows that cross-attention and MLP-mixer models outperform traditional architectures.
APIC: Amortized Physics-Informed Calibration using Neural Processes
Aishwarya Venkataramanan, Sai Karthikeya Vemuri, Joachim Denzler
Theory, Generative Models, Time Series
  • APIC combines the probabilistic rigor of KOH with the efficiency of amortized inference.
  • The dual-latent architecture effectively disentangles physical parameters from model discrepancies.
  • APIC enables rapid calibration from sparse observations while quantifying uncertainty.
  • Experimental results show improved predictive performance and reliable parameter recovery across multiple dynamical systems.
Reconciling Causality and Non-Equilibrium Thermodynamics with Hamiltonian Causal Models
Dario Rancati, Max Welling, Francesco Locatello
Theory
  • Introduction of Hamiltonian Causal Models (HCMs) for trajectory-level causal modeling.
  • Separation of immutable equations of motion from intervenable mechanisms.
  • Entropy production as a measurable causal observable that quantifies irreversibility.
  • HCMs accommodate time-dependent, adaptive interventions and feedback loops.