DOI: 10.1145/2815833.2815843
research-article

A methodology for constructing the calculation model of scientific spreadsheets

Published: 07 October 2015

ABSTRACT

Spreadsheet models are frequently used by scientists to analyze research data. These models are typically described in a paper or a report, which serves as the single source of information on the underlying research project. Because the calculation workflow in these models is not made explicit, readers cannot fully understand how the research results are calculated or trace them back to the underlying spreadsheets. This paper proposes a methodology for semi-automatically deriving the calculation workflow underlying a set of spreadsheets. The starting point of our methodology is the cell dependency graph, which represents all spreadsheet cells and the connections between them. We automatically aggregate all cells in the graph that represent instances and duplicates of the same quantities, based on an analysis of the formula syntax. Subsequently, we use a set of heuristics, incorporating knowledge of spreadsheet design, computational procedures, and the domain, to select those quantities that are relevant for understanding the calculation workflow. We explain and illustrate our methodology by applying it to three sets of spreadsheets from existing research projects in the domains of environmental and life science. Results from these case studies show that our constructed calculation models approximate the ground-truth calculation workflows in both content and size, but are not a perfect match.
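
The abstract does not spell out how the dependency graph is built or how formula syntax is compared, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes formulas are available as plain A1-style strings, and it treats two cells as instances of the same quantity when their formulas become identical after every reference is rewritten as an offset from the formula's own cell. The helper names (`dependency_graph`, `relative_signature`, `aggregate`) and the sample formulas are hypothetical.

```python
import re
from collections import defaultdict

# Matches simple A1-style references such as "B12". Assumption: no sheet
# names, absolute ($) markers or ranges; function names that end in digits
# (e.g. LOG10) would be misread as references by this simple pattern.
CELL_REF = re.compile(r"\b([A-Z]+)([0-9]+)\b")


def col_to_num(col):
    """Convert a column label ("A", "AB") to a 1-based column number."""
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def dependency_graph(formulas):
    """Map each formula cell to the set of cells its formula references."""
    return {
        cell: {c + r for c, r in CELL_REF.findall(formula)}
        for cell, formula in formulas.items()
    }


def relative_signature(cell, formula):
    """Rewrite every reference as a row/column offset from `cell`."""
    m = CELL_REF.fullmatch(cell)
    col, row = col_to_num(m.group(1)), int(m.group(2))

    def to_offset(ref):
        c, r = col_to_num(ref.group(1)), int(ref.group(2))
        return f"R[{r - row}]C[{c - col}]"

    return CELL_REF.sub(to_offset, formula)


def aggregate(formulas):
    """Group cells whose formulas share the same relative signature."""
    groups = defaultdict(list)
    for cell in sorted(formulas):
        groups[relative_signature(cell, formulas[cell])].append(cell)
    return dict(groups)


# Hypothetical sheet: column C repeats one calculation, D1 sums it.
formulas = {
    "C1": "=A1*B1",
    "C2": "=A2*B2",
    "C3": "=A3*B3",
    "D1": "=SUM(C1,C2,C3)",
}

graph = dependency_graph(formulas)
groups = aggregate(formulas)
for signature, cells in groups.items():
    print(signature, cells)
```

In this sketch the three cells in column C collapse into one group because they share a signature, while D1 stays on its own. Choosing which of the resulting groups matter for the workflow is the job of the heuristics described in the abstract and is not attempted here.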

      • Published in

        ACM Other conferences
        K-CAP '15: Proceedings of the 8th International Conference on Knowledge Capture
        October 2015
        209 pages
        ISBN:9781450338493
        DOI:10.1145/2815833

        Copyright © 2015 ACM


        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

        K-CAP '15 Paper Acceptance Rate: 16 of 56 submissions, 29%
        Overall Acceptance Rate: 55 of 198 submissions, 28%
