Performance-Driven Metamorphic Testing of Cyber-Physical Systems

Cyber-Physical Systems (CPSs) are a new generation of systems which integrate software with physical processes. The increasing complexity of these systems, combined with the uncertainty in their interactions with the physical world, makes the definition of effective test oracles especially challenging, facing the well known test oracle problem. Metamorphic testing has shown great potential to alleviate the test oracle problem by exploiting the relations among the inputs and outputs of different executions of the system, so-called metamorphic relations (MRs). In this article, we propose a MR pattern called Performance Variation (PV) for the identification of performance-driven MRs, and we show its applicability in two CPSs from different domains: automated navigation systems and elevator control systems. For the evaluation, we assessed the effectiveness of this approach for detecting failures in an open source simulation-based autonomous navigation system, as well as in an industrial case study from the elevation domain. We derive concrete MRs based on the PV pattern for both case studies and we evaluate their effectiveness with seeded faults. Results show that the approach is effective at detecting over 88% of the seeded faults, while keeping the ratio of false positives at 4% or lower.

DOI: https://doi.org/10.1109/TR.2022.3193070

Authors: Jon Ayerdi, Pablo Valle, Sergio Segura, Aitor Arrieta, Goiuria Sagardui and Maite Arratibel

Title of the source: IEEE Transactions on Reliability

Publisher:  IEEE

Relevant pages:  

Year: 2022

Evaluating System-Level Test Generation for Industrial Software: A Comparison between Manual, Combinatorial and Model-Based Testing

Adequate testing of safety-critical systems is vital to ensure correct functional and non-functional operations. Previous research has shown that testing of such systems requires a lot of effort, thus automated testing techniques have found a certain degree of success. However, automated testing has not replaced the need for manual testing, rather a common industrial practice exhibits a balance between automated and manual testing. In this respect, comparing manual testing with automated testing techniques continues to be an interesting topic to investigate. The need for this investigation is most apparent at system-level testing of industrial systems, where there is a lack of results on how different testing techniques perform with respect to both structural and system-level metrics such as Modified Condition/Decision Coverage (MC/DC) and requirement coverage. In addition to the coverage, the cost of these techniques will also determine their efficiency and thus practical viability. In this paper, we have developed cost models for efficiency measurement and performed an experimental evaluation of manual testing, model-based testing (MBT) and combinatorial testing (CT) in terms of MC/DC and requirement coverage. The evaluation is done in an industrial context of a safety-critical system that controls several functions on-board the passenger trains. We have reported the dominant conditions of MC/DC affected by each technique while generating MC/DC adequate test suites. Moreover, we investigated differences and overlaps of test cases generated by each of the three techniques. The results showed that all test suites achieved 100% requirement coverage except the test suite generated by pairwise testing strategy. However, MBT-generated test suites were more MC/DC adequate and provided a higher number of both similar and unique test cases. Moreover, unique test cases generated by MBT had an observable affect on MC/DC, which will complement manual testing to increase MC/DC coverage. The least dominant MC/DC condition fulfilled by the generated test cases by all three techniques is the ‘independent effect of a condition on the outcomes of a decision’. Lastly, the evaluation also showed CT as the most efficient testing technique amongst the three in terms of time ted outcomes.

Authors: Muhammad Nouman Zafar, Wasif Afzal, Eduard Paul Enoiu

Title of the source: The 3rd ACM/IEEE International Conference on Automation of Software Test 2022

Publisher:  IEEE

Relevant pages:  

Year: 2022

Software test results exploration and visualization with continuous integration and nightly testing

Software testing is key for quality assurance of embedded systems. However, with increased development pace, the amount of test results data risks growing to a level where exploration and visualization of the results are unmanageable. This paper covers a tool, Tim, implemented at a company developing embedded systems, where software development occurs in parallel branches and nightly testing is partitioned over software branches, test systems and test cases. Tim aims to replace a previous solution with problems of scalability, requirements and technological flora. Tim was implemented with a reference group over several months. For validation, data were collected both from reference group meetings and logs from the usage of the tool. Data were analyzed quantitatively and qualitatively. The main contributions from the study include the implementation of eight views for test results exploration and visualization, the identification of four solutions patterns for these views (filtering, aggregation, previews and comparisons), as well as six challenges frequently discussed at reference group meetings (expectations, anomalies, navigation, integrations, hardware details and plots). Results are put in perspective with related work and future work is proposed, e.g., enhanced anomaly detection and integrations with more systems such as risk management, source code and requirements repositories.

Authors: Per Erik Strandberg, Wasif Afzal, Daniel Sundmark

Title of the source: International Journal on Software Tools for Technology Transfer

Publisher:  Springer

Relevant pages:  

Year: 2022

A novel methodology to classify test cases using natural language processing and imbalanced learning

Detecting the dependency between integration test cases plays a vital role in the area of software test optimization. Classifying test cases into two main classes – dependent and independent – can be employed for several test optimization purposes such as parallel test execution, test automation, test case selection and prioritization, and test suite reduction. This task can be seen as an imbalanced classification problem due to the test cases’ distribution. Often the number of dependent and independent test cases is uneven, which is related to the testing level, testing environment and complexity of the system under test. In this study, we propose a novel methodology that consists of two main steps. Firstly, by using natural language processing we analyze the test cases’ specifications and turn them into a numeric vector. Secondly, by using the obtained data vectors, we classify each test case into a dependent or an independent class. We carry out a supervised learning approach using different methods for handling imbalanced datasets. The feasibility and possible generalization of the proposed methodology is evaluated in two industrial projects at Bombardier Transportation, Sweden, which indicates promising results.

Authors: Sahar Tahvili, Leo Hatvani, Enislay Ramentol, Rita Pimentel, Wasif Afzal, Francisco Herrera

Title of the source: Engineering Applications of Artificial Intelligence

Publisher:  ELSEVIER

Relevant pages:  

Year: 2020

Towards a workflow for model-based testing of embedded systems

Model-based testing (MBT) has been previously used to validate embedded systems. However, (i) creation of a model conforming to the behavioural aspects of an embedded system, (ii) generation of executable test scripts and (iii) assessment of test verdict, re-quires a systematic process. In this paper, we have presented a three-phase tool-supported MBT workflow for the testing of an embedded system, that spans from requirements specification to test verdict assessment. The workflow starts with a simplistic, yet practical, application of a Domain-Specific Language (DSL) based on Gherkin-like style, which allows the requirements engineer to specify requirements and to extract information about model elements(i.e. states and transitions). This is done to assist the graphical modelling of the complete system under test (SUT). Later stages of the workflow generates an executable test script that runs on a domain-specific simulation platform. We have evaluated this tool-supported workflow by specifying the requirements, extracting information from the DSL and developing a model of a subsystem of the train control management system developed at Alstom Transport AB in Sweden. The C# test script generated from the SUT model is successfully executed at the Software-in-the-Loop (SIL) execution platform and test verdicts are visualized as a sequence of passed and failed test steps.

Authors: Muhammad Nouman Zafar, Wasif Afzal, Eduard Enoiu

Title of the source: A-TEST 2021: Proceedings of the 12th International Workshop on Automating TEST Case Design, Selection, and Evaluation

Publisher:  Association for Computing Machinery

Relevant pages:  33-40

Year: 2021

Simuloop – Testing Framework for an Industrial Elevator System

ELEVATE is an industrial elevator simulator that can be applied to examine the performance of elevator installations and test scheduling algorithms in a realistic environment. Elevate is a GUI-based application and requires manual steps to perform simulations. In this thesis, we have designed, developed and implemented a program (Simuloop) to facilitate a simulator loop with Elevate. Simuloop enables automatic simulations without manual interference. We propose 2 experiments that apply Simuloop to generate demanding passenger traffic to test an industrial elevator dispatcher. We apply a genetic algorithm and reinforcement learning, respectively. Simuloop is used to give feedback to the algorithms by simulating the passenger traffic. The experiment with the genetic algorithm performs stochastic updates on a lunch peak profile with 948 passengers. The updates are based on varying the passenger weight, entry/exit time to identify patterns that yield a high waiting time. The algorithm is able to increase the average waiting time from 20 to 44.5 seconds. The experiment with reinforcement learning has higher requirements to Simuloop since it depends on frequent feedback to guide the learning. We design a small-scale experiment to train the algorithm to select the arrival and destination floors for passengers. The algorithm is able to increase the cumulative waiting time, suggesting that the experiment is applicable for the use case. Due to the limitations of the pipeline, we conclude that using reinforcement learning is unpractical. The experiments prove that Simuloop can successfully be applied for automatic testing of an elevator system. This opens up the opportunity for performing exhaustive testing without the need for manual steps.

DOI:

Authors: Torbjørn Ruud

Title of the source: Master’s thesis

Publisher:  University of Oslo

Relevant pages:  

Year: 2021

Understanding Digital Twins for Cyber-Physical Systems: A Conceptual Model

Digital Twins (DTs) are revolutionizing Cyber-Physical Systems (CPSs) in many ways, including their development and operation. The significant interest of industry and academia in DTs has led to various definitions of DTs and related concepts, as seen in many recently published papers. Thus, there is a need for precisely defining different DT concepts and their relationships. To this end, we present a conceptual model that captures various DT concepts and their relationships, some of which are from the published literature, to provide a unified understanding of these concepts in the context of CPSs. The conceptual model is implemented as a set of Unified Modeling Language (UML) class diagrams and the concepts in the conceptual model are explained with a running example of an automated warehouse case study from published literature and based on the authors’ experience of working with the real CPS case study in previous projects.

Authors: Tao Yue, Paolo Arcaini and Shaukat Ali

Title of the source: 2021 10th International Symposium On Leveraging Applications of Formal Methods, Verification and Validation (ISoLA)

Publisher:  Springer

Relevant pages:  54-71

Year: 2021

Genetic Algorithm-based Testing of Industrial Elevators under Passenger Uncertainty

Elevators, as other cyber-physical systems, need to deal with uncertainty during their operation due to several factors such as passengers and hardware. Such uncertainties could affect the quality of service promised by elevators and in the worst case lead to safety hazards. Thus, it is important that elevators are extensively tested by considering uncertainty during their development to ensure their safety in operation. To this end, we present an uncertainty testing methodology supported with a tool to test industrial dispatching systems at the Software-in-the-Loop (SiL) test level. In particular, we focus on uncertainties in passenger data and employ a Genetic Algorithm (GA) with specifically designed genetic operators to significantly reduce the quality of service of elevators, thus aiming to find uncertain situations that are difficult to extract by users. An initial experiment with an industrial dispatcher revealed that the GA significantly decreased the quality of service as compared to not considering uncertainties. The results can be used to further improve the implementation of dispatching algorithms to handle various uncertainties.

Authors:
Joritz Galarraga, Aitor Arrieta Marcos, Shaukat Ali, Goiuria Sagardui, Maite Arratibel

Title of the source: 2021 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)

Publisher:  IEEE

Relevant pages:  353-358

Year: 2021

Machine Learning-based Test Oracles for Performance Testing of Cyber-Physical Systems: an Industrial Case Study on Elevators Dispatching Algorithms

The software of systems of elevators needs constant maintenance to deal with new functionality, bug fixes or legislation changes. To automatically validate the software of these systems, a typical approach in industry is to use regression oracles, which execute test inputs both in the software version under test and in a previous software version. However, these practices require a long test execution time and cannot be re-used at different test phases. To deal with these issues, we propose DARIO, a test oracle that relies on regression machine-learning algorithms to detect both functional and non-functional problems of the system. The machine-learning algorithms of this oracle are trained by using data from previously tested versions to predict reference functional and non-functional performance
values of the new versions. An empirical evaluation with an industrial case study demonstrates the feasibility of using our approach. A total of five regression learning algorithms were validated by using mutation testing techniques. For the context of functional bugs, the accuracy when predicting verdicts by DARIO ranged between 95% to 98%, across the different scenarios proposed. For the context of non-functional bugs, were competitive too, having an accuracy when predicting verdicts by DARIO ranging between 83% to 87%.

Authors:
Aitor Gartziandia, Aitor Arrieta, Jon Ayerdi, Miren Illarramendi, Aitor Agirre, Goiuria Sagardui, Maite Arratibel

Title of the source: Journal of Software: Evolution and Process

Publisher: 

Relevant pages: 

Year: 2022