| chris jensen ( @ 2005-12-07 19:23:00 |
Candidacy Paper Abstract
From the paper that won't end, I bring you an abstract! There are three topics: data available in OSSD, processes executed in OSSD, and tools/techniques that might help analyze the data to extract the processes. The horrid length of time it's taking me to write the paper can be blamed directly on this three-pronged attack and the time spent weeding out would-be fourth (and beyond) prongs. Without further ado...
In their 1993 study of software process capture and analysis, Alex Wolf and David Rosenblum argued that a hybrid, manual and tool-aided, approach to capturing process data is necessary because purely automated approaches are “intently biased towards the computerized aspects of processes, while purely manual approaches are inefficient for high volumes of data” [Wolf and Rosenblum, 1993]. Since that statement was made, more and more aspects of development have become computerized. This is especially true in open source software development communities, where most if not all development is computerized. With computerization came tools for analyzing structured aspects of the new data, including workflow mining tools such as Balboa, developed by Dr. Wolf's future student Jonathan Cook. Workflow mining efforts produced many fruitful results in the latter half of the 1990s and beyond, and are well complemented by a flourishing breadth of tools mining other dimensions of software repositories. These tools, however, fail to incorporate unstructured data in software repositories, thereby providing an incomplete characterization of the processes (and other phenomena) they discover. The purpose of this paper is to examine tools and techniques for automating the analysis of semi-structured and unstructured data, in order to complement the results already achievable via structured analysis of software repositories when discovering software development processes. For the reasons above, and others explained below, this study also takes an in-depth look at the data available and the processes enacted in open source software development (OSSD). The goal is not to prove that discovery is completely automatable for any or every process, but merely to see what can be achieved with current unstructured data analysis technology.