We present the receiver operating characteristic (ROC) analysis and a comparison of the performance metrics calculated in Orange Canvas (see Fig. 3 and Table 1, where AUC is the area under the ROC curve, CA is the classification accuracy, and F1 is the F-score). These models make it possible to predict, for example, defects formed as a result of local overcooling of the crucible walls in the thermal node, which leads to accelerated crystal growth. We also obtained prediction models for the weight of the grown crystal, the formation of blocks, cracks, and bubbles, and the total defect characteristics. The models were obtained for all data sets and were then evaluated for generalization on a separate data set that contains none of the data used at the training stage. During training and testing, we determine the recall and precision of the predictions and analyze the correlation among the features. The results show that the precision of the neural network for defects formed as a result of local overcooling of the crucible was 0.94. The neural network identifies the current situation as a known state and reproduces its response as accurately as possible. Experimental studies of sapphire crystal growth by the Kyropoulos method describe how the defect level depends on time, and neural networks, as a machine learning tool, make it possible to derive new dependencies from these data and predict the quality of the grown crystal. The precision of the SVM and naive Bayes algorithms was 0.857 and 0.801, respectively. The experimental data set has to be extended to refine the models and improve the recall and precision of defect prediction.
Method            AUC    CA     F1     Precision  Recall
Neural network    0.889  0.929  0.930  0.940      0.929
SVM               0.578  0.857  0.857  0.857      0.857
Naive Bayes       0.844  0.786  0.789  0.801      0.786
Tree              0.833  0.786  0.779  0.782      0.786
CN2 rule inducer  0.700  0.714  0.714  0.714      0.714
Random forest     0.889  0.714  0.693  0.706      0.714
AdaBoost          0.544  0.643  0.592  0.607      0.643
kNN               0.544  0.643  0.503  0.413      0.643
Table 1. Comparison of the performance metrics calculated in Orange Canvas.
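Table 1 does not state how precision, recall, and F1 were aggregated over the classes. The sketch below assumes frequency-weighted averaging, which is what Orange's Test and Score widget reports when no single target class is selected. Under this averaging, weighted recall reduces algebraically to CA, since the sum over classes of (n_c / N)(TP_c / n_c) equals the sum of TP_c divided by N. This is consistent with the identical CA and Recall columns for every method in Table 1. The function name and the labels are hypothetical illustrations, not data from this study, and AUC is not computed here.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """CA plus precision, recall and F1 averaged over classes,
    each class weighted by its share of the true labels."""
    n = len(y_true)
    support = Counter(y_true)    # true instances per class
    predicted = Counter(y_pred)  # predictions per class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct hits per class
    scores = {"CA": sum(tp.values()) / n, "Precision": 0.0, "Recall": 0.0, "F1": 0.0}
    for c in set(y_true) | set(y_pred):
        w = support[c] / n  # class weight = relative frequency among true labels
        prec = tp[c] / predicted[c] if predicted[c] else 0.0
        rec = tp[c] / support[c] if support[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores["Precision"] += w * prec
        scores["Recall"] += w * rec
        scores["F1"] += w * f1
    return scores


# Hypothetical labels for five test crystals (not data from this study).
y_true = ["defect", "defect", "ok", "ok", "ok"]
y_pred = ["defect", "defect", "defect", "ok", "ok"]
print(weighted_scores(y_true, y_pred))
```

For these hypothetical labels, CA and Recall both evaluate to 0.8 (up to floating-point rounding). Precision is about 0.867 because the more heavily weighted "ok" class is predicted without false positives.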
Data mining is often concerned with the development of predictive models. To be applied in practice, predictive models have to be integrated into decision support systems. The comparison of the performance metrics calculated in Orange Canvas can be applied to the development of a universal expert system for defect prediction during sapphire crystal growth. The analysis allows experts to find hidden information in the data and to improve the efficiency of prediction. The generalized structure of the expert system for defect prediction is presented in Fig. 4. These investigations allow us to improve the expert system for defect prediction in sapphire crystals. We demonstrate the robustness and predictive power of our method by applying it to defect determination. The designed software is a universal tool for studying the influence of crystal growth parameters on the quality of sapphire crystals, and it can be widely used to estimate and predict defects in growing crystals.
The class diagrams of the expert system for sapphire defect prediction are shown in Fig. 5. The main classes in these diagrams are:
- CCriteria—a class that represents a feature (criterion) by which an object from the subject area can be characterized. This class includes the name of the attribute and its weight, which is needed to evaluate the attribute's effect on the resulting output of the calculation. The class also aggregates the set of possible values of the criterion, stored as a collection of CFeature objects.
- CFeature—a class that represents one particular value from the set of permissible values of a specific attribute. The class allows the user to set the name of the value and its numerical rating, defined in the range from 0 to 1.
- CCriteriaCollection—a class that facilitates work with the set of features available in the developed expert system. This class simplifies the search, selection, addition, and deletion of criteria in the system, and it also includes internal means of checking the features for correctness.
- CPattern—a class representing a solution variant that is presented to the user of the expert system. The class includes a field describing the solution that the user receives after the search. The other attribute characterizing a solution is the range of values defined by the expert, which distinguishes this solution from the others.
- CPatternCollection—a class that represents the set of solutions specified by an expert. The solutions are stored sorted in ascending order, which speeds up and simplifies the search procedure. This class also allows testing the set of solutions for correctness.
- CExpertSystemContainer—a class that encapsulates all static information about features and solutions provided by experts.
- CTuple—a class that contains the attribute values selected by the user of the expert system. As a rule, the accuracy of the solution increases as the user specifies more features. An object of this class is passed to the CChooser instance to find a solution.
- CChooser—a class designed to search for a solution. First, it computes a coefficient scaled to the interval [0, 1] from the sample passed to it. This coefficient is then passed to the CPatternCollection object, which determines the interval into which the number falls and returns the corresponding solution.
- CSerializer—a class that serializes and deserializes all the data stored in the CExpertSystemContainer. This class is needed to save the developed expert system to a file on disk and to load it back.
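To make the relations between these classes concrete, the following is a minimal Python sketch of the class structure; it is not the authors' implementation. The class names come from the description above. All method names, the correctness checks, the JSON format used by CSerializer, and above all the formula CChooser uses for its coefficient (a weight-averaged rating of the selected values) are assumptions made for illustration, since the text only states that the coefficient lies in [0, 1].

```python
import bisect
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CFeature:
    name: str
    rating: float  # numerical rating of this value, expected in [0, 1]


@dataclass
class CCriteria:
    name: str
    weight: float  # influence of the criterion on the resulting output
    features: list = field(default_factory=list)  # permissible CFeature values

    def feature(self, name):
        return {f.name: f for f in self.features}[name]  # KeyError if unknown


class CCriteriaCollection:
    def __init__(self, criteria=()):
        self.by_name = {}  # search and selection of criteria by name
        for c in criteria:
            self.add(c)

    def add(self, criterion):
        # Assumed correctness check: positive weight and ratings within [0, 1].
        if criterion.weight <= 0 or not all(0.0 <= f.rating <= 1.0 for f in criterion.features):
            raise ValueError(f"invalid criterion {criterion.name!r}")
        self.by_name[criterion.name] = criterion

    def remove(self, name):
        del self.by_name[name]


@dataclass
class CPattern:
    description: str  # solution shown to the user
    low: float        # expert-defined interval [low, high)
    high: float


class CPatternCollection:
    def __init__(self, patterns):
        ps = sorted(patterns, key=lambda p: p.low)  # ascending order
        # Assumed correctness check: intervals tile [0, 1] without gaps or overlaps.
        if not ps or ps[0].low != 0.0 or ps[-1].high != 1.0 or any(
                a.high != b.low for a, b in zip(ps, ps[1:])):
            raise ValueError("pattern intervals must cover [0, 1] exactly")
        self.patterns, self.lows = ps, [p.low for p in ps]

    def find(self, k):
        # Binary search over the sorted lower bounds; k = 1.0 maps to the last pattern.
        return self.patterns[max(bisect.bisect_right(self.lows, k) - 1, 0)]


@dataclass
class CExpertSystemContainer:
    criteria: CCriteriaCollection
    patterns: CPatternCollection


@dataclass
class CTuple:
    selection: dict  # criterion name -> name of the value chosen by the user


class CChooser:
    def __init__(self, container):
        self.container = container

    def choose(self, sample):
        if not sample.selection:
            raise ValueError("select at least one feature")
        crit = self.container.criteria.by_name
        # Assumed coefficient: weight-averaged rating of the selected values,
        # which stays in [0, 1] because every rating does.
        total = sum(crit[c].weight for c in sample.selection)
        k = sum(crit[c].weight * crit[c].feature(v).rating
                for c, v in sample.selection.items()) / total
        return self.container.patterns.find(k)


class CSerializer:
    @staticmethod
    def dumps(container):
        return json.dumps({"criteria": [asdict(c) for c in container.criteria.by_name.values()],
                           "patterns": [asdict(p) for p in container.patterns.patterns]})

    @staticmethod
    def loads(text):
        data = json.loads(text)
        criteria = [CCriteria(c["name"], c["weight"], [CFeature(**f) for f in c["features"]])
                    for c in data["criteria"]]
        return CExpertSystemContainer(CCriteriaCollection(criteria),
                                      CPatternCollection([CPattern(**p) for p in data["patterns"]]))
```

For example, take two hypothetical criteria, crucible_wall_overcooling (weight 2; values absent = 0.1, local = 0.9) and growth_rate (weight 1; values nominal = 0.2, accelerated = 0.8), together with two expert intervals [0, 0.5) and [0.5, 1]. Selecting local overcooling at a nominal growth rate gives k = (2 × 0.9 + 1 × 0.2) / 3 ≈ 0.67, so the upper-interval solution is returned. Keeping the lower bounds sorted lets CPatternCollection locate the interval by binary search, which is one plausible reading of the speed-up the text attributes to ascending order. Saving to disk would then amount to writing the string returned by CSerializer.dumps to a file.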
The algorithms for data collection and analysis are designed to cover the following tasks: analysis of the initial technological data; statistical data processing; modeling of the influence of technological parameters on crystal quality; prediction of crystal quality from the initial data; decision making; analysis of the reasons for possible deviations; and model correction based on newly obtained data.
Selected user interface elements of the expert system are shown in Fig. 6.