Publications
-
Automating instrumentation choices for performance problems in distributed applications with VAIF
M. Toslali, E. Ates, A. Ellis, Z. Zhang, D. Huye, L. Liu, S. Puterman, A.K. Coskun, and R.R. Sambasivan.
in ACM Symposium on Cloud Computing (SoCC), Seattle, 2021. GitHub
-
Proctor: A semi-supervised performance anomaly diagnosis framework for production HPC systems
B. Aksar, Y. Zhang, E. Ates, B. Schwaller, O. Aaziz, V.J. Leung, J. Brandt, M. Egele, and A.K. Coskun.
in International Supercomputing Conference (ISC-HPC), 2021.
-
Counterfactual Explanations for Machine Learning on Multivariate Time Series Data
E. Ates, B. Aksar, V.J. Leung, and A.K. Coskun.
in Proceedings of IEEE International Conference on Applied Artificial Intelligence (ICAPAI), 2021. GitHub
-
Automating Telemetry- and Trace-Based Analytics on Large-Scale Distributed Systems
E. Ates
PhD Dissertation, 2020.
-
Praxi: Cloud software discovery that learns from practice
A. Byrne, E. Ates, A. Turk, V. Pchelin, S. Duri, S. Nadgowda, C. Isci, and A.K. Coskun.
to appear in IEEE Trans. on Cloud Computing (TCC), 2020. GitHub
-
An automated, cross-layer instrumentation framework for diagnosing performance problems in distributed applications
E. Ates, L. Sturmann, M. Toslali, O. Krieger, R. Megginson, A.K. Coskun, and R.R. Sambasivan.
in ACM Symposium on Cloud Computing (SoCC), Santa Cruz, 2019.
-
HPAS: An HPC performance anomaly suite for reproducing performance variations
E. Ates, Y. Zhang, B. Aksar, J. Brandt, V.J. Leung, M. Egele, and A.K. Coskun.
in International Conference on Parallel Processing (ICPP), Kyoto, 2019. GitHub
-
Online diagnosis of performance variation in HPC systems using machine learning
O. Tuncer, E. Ates, Y. Zhang, A. Turk, J. Brandt, V.J. Leung, M. Egele, and A.K. Coskun.
in IEEE Trans. on Parallel and Distributed Systems (TPDS), 2019.
-
Understanding simultaneous impact of network QoS and power on HPC application performance
T. Patki, E. Ates, A.K. Coskun, and J.J. Thiagarajan.
in Computational Reproducibility at Exascale (CRE), Dallas, 2018.
-
Tangram: Colocating HPC applications with oversubscription
Q. Xiong, E. Ates, M.C. Herbordt, and A.K. Coskun.
in IEEE High Performance Extreme Computing Conf. (HPEC), Boston, 2018.
-
Taxonomist: Application detection through rich monitoring data (Best Artifact Award)
E. Ates, O. Tuncer, A. Turk, J. Brandt, V.J. Leung, M. Egele, and A.K. Coskun.
in European Conf. on Parallel and Distributed Systems (Euro-Par), Torino, 2018. Artifact
-
Diagnosing performance variations in HPC applications using machine learning (Gauss Award)
O. Tuncer, E. Ates, Y. Zhang, A. Turk, J. Brandt, V.J. Leung, M. Egele, and A.K. Coskun.
in Int. Supercomputing Conf. (ISC-HPC), Frankfurt, 2017.