Multivariate linear regression versus symbolic regression based on genetic programming. Application to the spectroscopic characterisation of urban wastewater

  1. Carreres-Prieto, Daniel (1)
  2. García, Juan T. (1)
  3. Castillo, Luis G. (1)
  4. Carrillo, José M. (1)
  5. Vigueras-Rodriguez, Antonio (1)

  (1) Universidad Politécnica de Cartagena, Cartagena, Spain


Ingeniería del agua

ISSN: 1134-2196

Year of publication: 2022

Volume: 26

Issue: 4

Pages: 261-277

Type: Article

DOI: 10.4995/IA.2022.18073 (open access)


Characterising urban wastewater in real time is key to ensuring the proper management of water resources and environmental protection. Indirect measurements such as molecular spectroscopy, which provides information on the physicochemical properties of the water, make it possible to estimate the pollutant load of wastewater through mathematical correlation models. This study compares multivariate linear regression models with symbolic regression models based on genetic programming as ways of establishing that correlation. The comparison covers models for total nitrogen, total phosphorus and nitrate nitrogen, using 90 urban wastewater samples. Symbolic regression based on genetic programming improves goodness of fit (R²) by between 72.76% and 146.39% with respect to multivariate linear regression.
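As a rough illustration of the quantities the abstract compares, the sketch below fits a multivariate linear regression by ordinary least squares, computes R², and expresses a change in R² as a percentage of the baseline. It is not the authors' code. The data are synthetic placeholders (90 rows, 5 hypothetical spectral channels), all function names are illustrative, and reading the reported percentages as relative changes in R² is an assumption.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bk]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Evaluate the fitted linear model on the rows of X."""
    return coef[0] + X @ coef[1:]

def relative_improvement(r2_base, r2_new):
    """Percentage change in R² relative to the baseline model (assumed definition)."""
    return 100.0 * (r2_new - r2_base) / r2_base

# Synthetic stand-in data, NOT the study's measurements.
rng = np.random.default_rng(0)
n_samples, n_channels = 90, 5
X = rng.uniform(0.0, 1.0, size=(n_samples, n_channels))
# A deliberately nonlinear target, so a purely linear model cannot fit it exactly.
y = 3.0 * X[:, 0] * X[:, 1] + np.exp(X[:, 2]) + rng.normal(0.0, 0.05, n_samples)

coef = fit_mlr(X, y)
r2_mlr = r_squared(y, predict_mlr(coef, X))
print(f"In-sample R² of the linear model on synthetic data: {r2_mlr:.3f}")
```

Under this reading, a symbolic-regression model with R² of 0.8 against a linear baseline of 0.4 would give `relative_improvement(0.4, 0.8)`, a 100% improvement, which is why the figure can exceed 100%.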

