Mining and discovery of hidden relationships between software source codes and related textual documents

Abstract
Software documentation is normally produced informally. User manuals, user requirements, design documentation, tutorials, support documentation, and similar artifacts are written in unstructured natural language. Recent studies show that 61% of software projects fail or are challenged by increased costs and production time. Various factors contribute to this, and one of the major ones is the lack of links between a software system's source code and its related documents. Because the development team must maintain the software and make future changes to it, the team needs to understand how the various sections of code relate to the documentation. It is therefore important to design a system that links source code to its corresponding textual documentation. This article proposes a model for recovering latent but traceable links between software source code and existing documents based on word extraction and function-name splitting. The contributions of this article are: (1) a model based on extracting words from documents and source code; (2) an algorithm for splitting compound and run-together words and for expanding abbreviations used in the names of functions and variables and in output commands; and (3) a new algorithm for retrieving latent traceable links between source code and documents. Two data sets are used in this research, and the results are reported in terms of recall, precision, and F-measure. The experimental results are promising and indicate that the proposed approach significantly outperforms its counterparts.
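The abstract does not give the details of the splitting or linking algorithms, so the following is only a minimal Python sketch of the general idea under stated assumptions. Identifiers are split on underscores, on camelCase boundaries, and by dictionary-based segmentation of run-together lowercase words. Abbreviations are expanded from a small lookup table, and a candidate link is proposed when the cosine similarity of term-frequency vectors reaches a threshold. The names ABBREVIATIONS, VOCABULARY, segment, split_identifier, recover_links, and evaluate, as well as the 0.3 threshold and the example files, are hypothetical. Cosine similarity is a generic stand-in, not the article's retrieval algorithm, and stop-word removal and stemming are omitted for brevity.

```python
import math
import re
from collections import Counter

# Hypothetical abbreviation table (assumption; the article's expansion source is not described here).
ABBREVIATIONS = {"calc": "calculate", "num": "number", "msg": "message", "usr": "user", "pwd": "password"}

# Hypothetical vocabulary used to segment run-together words such as "getusername".
VOCABULARY = {"get", "set", "user", "name", "password", "send", "message", "total", "price", "calculate", "number"}

# Matches acronym runs before a capitalized word, capitalized/lowercase words, acronyms, and digits.
_CAMEL = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def segment(word: str) -> list[str]:
    """Split a lowercase run-together word into known pieces by dynamic programming.

    Returns [word] unchanged when no complete segmentation exists.
    """
    n = len(word)
    best = [None] * (n + 1)  # best[i] = fewest-piece segmentation of word[:i], or None
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            if best[start] is not None and (piece in VOCABULARY or piece in ABBREVIATIONS):
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n] if best[n] is not None else [word]


def split_identifier(name: str) -> list[str]:
    """Split a code identifier into lowercase words and expand known abbreviations."""
    words = []
    for chunk in re.split(r"[^A-Za-z0-9]+", name):  # underscores, dots, etc.
        for part in _CAMEL.findall(chunk):  # camelCase, PascalCase, digit runs
            for piece in segment(part.lower()):  # run-together lowercase words
                words.append(ABBREVIATIONS.get(piece, piece))
    return words


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recover_links(docs: dict[str, str], code: dict[str, list[str]], threshold: float = 0.3) -> set[tuple[str, str]]:
    """Propose (document, code unit) pairs whose term vectors are similar enough."""
    doc_vecs = {d: Counter(re.findall(r"[a-z]+", text.lower())) for d, text in docs.items()}
    code_vecs = {c: Counter(w for n in names for w in split_identifier(n)) for c, names in code.items()}
    return {(d, c) for d, dv in doc_vecs.items() for c, cv in code_vecs.items() if cosine(dv, cv) >= threshold}


def evaluate(predicted: set, truth: set) -> tuple[float, float, float]:
    """Return precision, recall, and F-measure for a set of proposed links."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Hypothetical toy example: two documents and two code files.
docs = {
    "login.txt": "The user enters a user name and password to log in.",
    "billing.txt": "The system must calculate the total price of an order.",
}
code = {
    "Auth.java": ["getUserName", "check_usr_pwd"],
    "Cart.java": ["calcTotalPrice", "itemCount"],
}
truth = {("login.txt", "Auth.java"), ("billing.txt", "Cart.java")}
links = recover_links(docs, code)
print(evaluate(links, truth))  # (1.0, 1.0, 1.0)
```

In this sketch, precision is the fraction of proposed links that are correct, recall is the fraction of true links that are found, and the F-measure is their harmonic mean, 2PR/(P+R). Splitting "check_usr_pwd" into "check", "user", and "password" is what lets it match the wording of the login document.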