31-05-2013, 04:04 PM
Leveraging XML Technologies in Developing Program Analysis Tools
Abstract
XML technologies are quickly becoming ubiquitous within all aspects of computer and information sciences. Both industry and academia have accepted the XML standards and the large number of tools that support manipulation, transformation, querying, and storage of XML objects. Thus, tools and representations based on XML are very attractive with respect to adoption. This paper describes the experiences of the authors in the development and application of srcML, an XML application that supports explicit markup of syntactic information within source code. Additionally, XML technologies are leveraged along with srcML to support various program analysis, fact extraction, and reverse engineering tasks. A short description of these tools is given, along with the motivation behind using an adoption-centric XML approach.
Introduction
While conducting research in program understanding, reverse engineering, and software visualization, we ran into a common technical problem: the need to parse and analyze large amounts of source code. None of the existing program analysis tools worked very well for our particular problems. While many of the existing analysis tools are successful in a number of ways, they are typically difficult to integrate into, or extend for, new research and products. The existing tools are typically tightly coupled with other tools, language dependent, or built around a methodology orthogonal to the specific problem. Additionally, these tools are given little support by the original developers and/or require specific (older) OS versions, platforms, and libraries. These inherent problems have also been described by others [6, 12]. Oftentimes, using and modifying an existing tool is as difficult as building your own from scratch.
srcML
srcML (SouRce Code Markup Language) [3, 4, 8] is an XML application that supports both document and data views of source code. The format adds structural information to raw source code files. The document view of source code is supported by preserving all lexical information from the original source code file, including comments, white space, and preprocessor directives. This permits transformation equality between the srcML representation and the related source code document; that is, converting a source file to srcML and back yields the original text.
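The two views can be illustrated with a minimal Python sketch. The srcML fragment below is hand-written and simplified for illustration: it omits the XML namespace and is not actual output of the srcML translator, and the helper names to_source and function_names are hypothetical. Under those assumptions, concatenating the text nodes recovers the original source (document view), while ordinary XML queries reach the syntactic elements (data view).

```python
import xml.etree.ElementTree as ET

# Original C source, including a comment and its exact white space.
SOURCE = "int add(int a, int b) {\n    // sum the operands\n    return a + b;\n}\n"

# Hand-written, simplified srcML-style markup of SOURCE (assumption:
# namespace omitted, element set reduced; not real translator output).
SRCML = (
    '<unit language="C">'
    '<function><type><name>int</name></type> <name>add</name>'
    '<parameter_list>('
    '<parameter><decl><type><name>int</name></type> <name>a</name></decl></parameter>, '
    '<parameter><decl><type><name>int</name></type> <name>b</name></decl></parameter>'
    ')</parameter_list> '
    '<block>{\n    <comment type="line">// sum the operands</comment>\n    '
    '<return>return <expr><name>a</name> + <name>b</name></expr>;</return>\n}</block>'
    '</function>\n'
    '</unit>'
)


def to_source(srcml_text):
    """Document view: drop the tags and keep every text node in order."""
    root = ET.fromstring(srcml_text)
    return "".join(root.itertext())


def function_names(srcml_text):
    """Data view: collect the name of each marked-up function."""
    root = ET.fromstring(srcml_text)
    # find("name") only matches a direct child, i.e. the function's own name,
    # not the names nested inside its type or parameters.
    return [fn.find("name").text for fn in root.iter("function")]


if __name__ == "__main__":
    print(to_source(SRCML) == SOURCE)
    print(function_names(SRCML))
```

Because the markup only wraps the existing text and never rewrites it, removing the tags is sufficient to regenerate the source in this sketch; the same property is what lets standard XML tools operate on the code without losing comments or formatting.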