From Data to Equations: Symbolic Regression as a Path to Physical AI

In our previous blog post, we explored the idea of “Physical AI”: using AI not just as a predictive black box, but as a way to bridge the rigor of physical models with the flexibility of machine learning. One of the most promising developments at this intersection is the renewed interest in symbolic regression (SR) as a tool for explainable AI and data-driven system identification.

What is Symbolic Regression?

Symbolic regression (SR) is a form of machine learning that automatically discovers mathematical expressions describing the relationships hidden in data. Unlike black-box approaches, SR produces human-readable and explainable results, offering transparent insights into the underlying system behavior.

This makes SR highly relevant across disciplines – physics, engineering, biology, and beyond [1]. While the idea is not new – Kepler and Newton could even be considered early practitioners of symbolic regression [1] – recent progress has accelerated its impact. The combination of algorithmic advances, greater computing power, and growing demand for transparency in AI has led to a rapid resurgence of interest in SR and its practical use [2].
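The following Python sketch is deliberately tiny and only meant to make the definition concrete. It assumes a hand-picked, hypothetical list of candidate forms y ≈ c · g(x), fits the constant c by least squares, and keeps whichever form has the smallest error. Production SR tools explore a vastly larger space of expressions, often with genetic programming or similar search, and also trade accuracy against simplicity. None of that is modeled here. The data are synthetic and follow Kepler's third law in the form T = a^1.5.

```python
import math

# Hypothetical, hand-picked building blocks g(x). Real SR tools search a much
# larger space of expression trees; this fixed list only illustrates the idea
# of selecting a human-readable equation from data.
CANDIDATES = {
    "x": lambda x: x,
    "x**2": lambda x: x ** 2,
    "x**3": lambda x: x ** 3,
    "x**1.5": lambda x: x ** 1.5,
    "sqrt(x)": math.sqrt,
    "log(x)": math.log,
    "exp(x)": math.exp,
    "1/x": lambda x: 1.0 / x,
}


def fit_scale(f_vals, ys):
    """Closed-form least-squares constant c minimising sum((c*f - y)**2)."""
    denom = sum(f * f for f in f_vals)
    return sum(f * y for f, y in zip(f_vals, ys)) / denom if denom else 0.0


def symbolic_regression(xs, ys):
    """Return (name, c, mse) of the best-fitting model y ≈ c * g(x)."""
    best = None
    for name, g in CANDIDATES.items():
        f_vals = [g(x) for x in xs]
        c = fit_scale(f_vals, ys)
        mse = sum((c * f - y) ** 2 for f, y in zip(f_vals, ys)) / len(xs)
        if best is None or mse < best[2]:
            best = (name, c, mse)
    return best


if __name__ == "__main__":
    # Synthetic, noise-free data following T = a**1.5 (a in AU, T in years).
    # These are not real planetary measurements. Here x stands for a.
    a = [0.5, 1.0, 2.0, 3.0, 5.0, 10.0]
    T = [x ** 1.5 for x in a]
    name, c, mse = symbolic_regression(a, T)
    print(f"best model: T ≈ {c:.3f} * {name}  (x = a, MSE = {mse:.2e})")
```

Because the data were generated from a^1.5 without noise, the search returns that form with c equal to 1. The point is that the result is an equation a person can read and check, not a matrix of weights.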

Co-Author
Prof. Philipp Zech
University of Innsbruck


Philipp Zech is an Assistant Professor of Computer Science with a solid background in Software Engineering, Model-Driven Engineering, Modeling and Simulation, Machine Learning, and Digital Twin Engineering.

Beyond Black Boxes: SR as Iterative, Interpretable Learning

Llorella et al. [3] present an interesting example of this new direction. They demonstrate that SR can serve as a transparent, iterative learning framework that mimics the trial-and-error process of scientific discovery. In their study, data are collected through experiments with physics simulations, while SR continuously proposes and refines mathematical models in real time as the dataset grows. Each new data point can update the mathematical expression, making the entire process visible, iterative, and explainable.
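The loop described above can be sketched in a few lines. This is not Llorella et al.'s implementation. The function simulate_drop is a hypothetical stand-in for a physics simulation (an object falling from rest), and the candidate list is again an assumption chosen for illustration. After every new observation the model is refit, so the current best equation is always visible.

```python
# Hypothetical stand-in for one run of a physics simulation: the distance an
# object falls from rest after t seconds, d(t) = 0.5 * g * t**2, g = 9.81 m/s^2.
def simulate_drop(t):
    return 0.5 * 9.81 * t ** 2


# Illustrative candidate forms d ≈ c * g(t); a real SR engine would search far more.
CANDIDATES = {
    "t": lambda t: t,
    "t**2": lambda t: t ** 2,
    "t**3": lambda t: t ** 3,
    "sqrt(t)": lambda t: t ** 0.5,
}


def best_fit(ts, ds):
    """Fit c for each candidate by least squares and keep the lowest MSE."""
    best = None
    for name, g in CANDIDATES.items():
        f = [g(t) for t in ts]
        c = sum(fi * di for fi, di in zip(f, ds)) / sum(fi * fi for fi in f)
        mse = sum((c * fi - di) ** 2 for fi, di in zip(f, ds)) / len(ts)
        if best is None or mse < best[2]:
            best = (name, c, mse)
    return best


def iterative_discovery(times):
    """Run one 'experiment' per time point (t > 0) and refit after each one."""
    ts, ds, history = [], [], []
    for t in times:
        ts.append(t)
        ds.append(simulate_drop(t))  # collect a new observation
        name, c, _ = best_fit(ts, ds)  # revise the current hypothesis
        history.append((len(ts), name, c))
    return history


if __name__ == "__main__":
    for n, name, c in iterative_discovery([0.5, 1.0, 1.5, 2.0, 2.5, 3.0]):
        print(f"after {n} observation(s): d ≈ {c:.3f} * {name}")
```

With a single observation every candidate fits perfectly, so the first hypothesis is essentially arbitrary. As more points arrive, only the t**2 form keeps fitting, and its constant settles at g/2 = 4.905.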

Why Is This Important?

Llorella et al.’s system capitalizes on continuous feedback: as new evidence is gathered, the model refines its mathematical hypothesis. This reflects the core principles of iterative learning and model selection found in modern AI and machine learning approaches.

Explainable AI: Unlike black-box models, SR provides transparent, step-by-step insight into how models evolve and why specific equations are chosen. This level of interpretability makes SR a strong candidate for domains that require trustworthy and explainable AI behavior.

How Does Symbolic Regression Map to Broader ML/AI Trends?

SR, in the iterative and interactive setting proposed by Llorella et al., is more than a teaching tool: it is a prototype for the next generation of interpretable machine learning. More specifically, we anticipate the following:

Surrogate Modeling & System Identification: SR can act as a “white-box” surrogate model for complex systems, effectively bridging the gap between simulation, experimentation, and theoretical understanding.

Iterative, Collaborative AI: Human experts and AI systems can jointly generate, test, and refine models, mirroring the real process of scientific discovery and engineering design [4].

Explainable AI in Practice: Across regulatory, industrial, and scientific domains, there is a growing need for AI systems that not only predict but also explain. SR is well positioned to meet this need, offering interpretable insights that build trust and transparency into the modeling and learning process.
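To illustrate SR as a white-box surrogate, the sketch below treats a numerical pendulum simulation as the "expensive" model and recovers a closed-form expression for its period. The simulator, the candidate forms, and all parameter values (g = 9.81 m/s², a 0.05 rad release angle, a time step of 10⁻⁴ s) are assumptions made for this example only.

```python
import math


def simulated_period(length, g=9.81, theta0=0.05, dt=1e-4):
    """'Expensive' model: integrate theta'' = -(g/L) sin(theta), measure the period.

    Uses semi-implicit (symplectic) Euler steps. The first two zero crossings
    of theta are located by linear interpolation; they are half a period apart.
    """
    theta, omega, t = theta0, 0.0, 0.0
    crossings = []
    while len(crossings) < 2:
        prev = theta
        omega -= (g / length) * math.sin(theta) * dt
        theta += omega * dt
        t += dt
        if (prev > 0) != (theta > 0):  # sign change: theta crossed zero
            crossings.append(t - dt * theta / (theta - prev))
    return 2.0 * (crossings[1] - crossings[0])


# Illustrative candidate forms T ≈ c * h(L).
CANDIDATES = {"L": lambda L: L, "sqrt(L)": math.sqrt, "L**2": lambda L: L ** 2}


def fit_surrogate(lengths, periods):
    """Least-squares fit of c for each candidate; keep the lowest MSE."""
    best = None
    for name, h in CANDIDATES.items():
        f = [h(L) for L in lengths]
        c = sum(fi * p for fi, p in zip(f, periods)) / sum(fi * fi for fi in f)
        mse = sum((c * fi - p) ** 2 for fi, p in zip(f, periods)) / len(f)
        if best is None or mse < best[2]:
            best = (name, c, mse)
    return best


if __name__ == "__main__":
    lengths = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
    name, c, _ = fit_surrogate(lengths, [simulated_period(L) for L in lengths])
    print(f"surrogate: T ≈ {c:.4f} * {name}")  # small-angle theory: c = 2*pi/sqrt(g)
    # The surrogate is a one-line formula, cheap to evaluate at new lengths.
    estimate = c * CANDIDATES[name](4.0)
    print(f"L = 4.0 m: surrogate {estimate:.4f} s, simulation {simulated_period(4.0):.4f} s")
```

For small swings, textbook theory gives T = 2π√(L/g), so the fitted constant lands near 2π/√9.81 ≈ 2.006. Once found, the surrogate can be evaluated instantly and inspected symbolically, which is what makes it a white-box model.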

As Dong & Zhong summarize, “SR’s unique capability to generate potentially interpretable mathematical expressions … allows [it] to unveil underlying patterns in data, offering profound insights for scientists, engineers, and researchers” [2].

Towards Physical AI: From Educational Labs to System Identification

The lessons from educational settings [3] carry over to scientific and engineering applications. The transparent, feedback-driven modeling process that lets students uncover equations from simulation data is the same mechanism that allows SR to discover, validate, and refine physical laws and governing equations in complex, real-world systems.

Because SR identifies explicit, interpretable mathematical relationships – rather than opaque statistical correlations – it can reveal underlying mechanisms and physical constraints such as conservation laws, and it can incorporate prior domain knowledge [4].

These characteristics make SR particularly well suited for Physical AI: AI systems that not only learn from data but also understand the physical principles at work. Whether used to teach foundational scientific concepts or to optimize and control complex, dynamic cyber-physical systems, SR offers a path towards AI tools that reason with, and about, the laws of nature.
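As a toy illustration of how a search over explicit expressions can surface a conservation law, the sketch below samples a frictionless harmonic oscillator (unit mass and stiffness, so x = cos t and v = -sin t) and checks which of a few hypothetical candidate expressions stays constant along the trajectory. Real methods search open-ended expression spaces and must guard against trivial invariants. Here that guard is simply leaving constants out of the candidate list, and the choice of candidates is itself a crude form of prior knowledge.

```python
import math


def oscillator_states(n=100, dt=0.1):
    """Sample (x, v) from the exact solution of a unit-mass, unit-stiffness
    harmonic oscillator: x = cos t, v = -sin t."""
    return [(math.cos(i * dt), -math.sin(i * dt)) for i in range(n)]


# Hypothetical candidate invariants. Constants are deliberately excluded,
# because a constant expression is trivially "conserved".
CANDIDATES = {
    "x + v": lambda x, v: x + v,
    "x * v": lambda x, v: x * v,
    "x**2 - v**2": lambda x, v: x ** 2 - v ** 2,
    "x**2 + v**2": lambda x, v: x ** 2 + v ** 2,
}


def find_conserved(states):
    """Score each candidate by its spread (max - min) along the trajectory."""
    scores = {}
    for name, f in CANDIDATES.items():
        values = [f(x, v) for x, v in states]
        scores[name] = max(values) - min(values)
    best = min(scores, key=scores.get)  # smallest spread = most nearly constant
    return best, scores


if __name__ == "__main__":
    best, scores = find_conserved(oscillator_states())
    for name, s in sorted(scores.items(), key=lambda kv: kv[1]):
        print(f"{name:12s} spread = {s:.2e}")
    print("most nearly conserved:", best)
```

The winner, x² + v², is twice the oscillator's total energy in these units, the conserved quantity a physicist would expect.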

References

[1] Makke, N. & Chawla, S. (2024). Interpretable scientific discovery with symbolic regression: a review. Artificial Intelligence Review, 57, 2.

[2] Dong, J. & Zhong, J. (2025). Recent Advances in Symbolic Regression. ACM Computing Surveys, 57(11).

[3] Llorella et al. (2024). Fostering scientific methods in simulations through symbolic regressions. Physics Education, 59.

[4] Udrescu, S.-M. & Tegmark, M. (2020). AI Feynman: A physics-inspired method for symbolic regression. Science Advances, 6(16).