July 2009

Master's Thesis & Open-Source Tool

On July 15th, I successfully defended my Master's Thesis in Biomedical Informatics at Vanderbilt University, the culmination of two years of work. The thesis focuses on extracting organizational structure and relationships from the audit logs of clinician information systems. This work has potential applications in improving the delivery of care and the security of patients' private medical data.

As part of this work, I developed an open-source tool for analyzing audit logs. Released under the Apache License 2.0, the Healthcare Organizational Relational Network Extraction Toolkit (HORNET) is a Python framework for plugins that analyze healthcare audit logs. The tool is fully functional but not yet polished enough for use by healthcare administrators.
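To give a rough idea of what "a framework for plugins that analyze audit logs" can look like, here is a minimal sketch of a plugin registry in Python. It is not HORNET's actual API: the AccessEvent record, the AuditLogPlugin base class, the register decorator, and the run_all helper are all hypothetical names chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Type


@dataclass(frozen=True)
class AccessEvent:
    """Hypothetical audit-log record: which user opened which patient's record, and when."""
    user: str
    patient: str
    timestamp: str


class AuditLogPlugin:
    """Hypothetical base class: a plugin consumes audit-log events and returns a result."""
    name = "base"

    def analyze(self, events: Iterable[AccessEvent]):
        raise NotImplementedError


# Hypothetical registry mapping plugin names to plugin classes.
PLUGINS: Dict[str, Type[AuditLogPlugin]] = {}


def register(cls: Type[AuditLogPlugin]) -> Type[AuditLogPlugin]:
    """Class decorator that adds a plugin to the registry under its name."""
    PLUGINS[cls.name] = cls
    return cls


@register
class AccessCountPlugin(AuditLogPlugin):
    """Example plugin: count how many accesses each user made."""
    name = "access_count"

    def analyze(self, events: Iterable[AccessEvent]) -> Counter:
        return Counter(e.user for e in events)


def run_all(events: List[AccessEvent]) -> Dict[str, object]:
    """Run every registered plugin over the same list of events."""
    return {name: cls().analyze(events) for name, cls in PLUGINS.items()}
```

The appeal of this design is that new analyses only need to subclass the base class and register themselves; the code that loads and cleans the logs stays untouched.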

The project is hosted on Google Code (http://code.google.com/p/hornet/), where you can also find the latest documentation.

I am writing a journal article that describes this tool, its methods, and results from Vanderbilt University Medical Center. I will link to the article when it is available; until then, here is my thesis abstract.

A Framework for the Automatic Discovery of Policy from Healthcare Access Logs

by John M. Paulett

Healthcare organizations are often stymied in their efforts to prevent insider attacks that violate patient privacy. Numerous high-profile privacy breaches involving celebrities have brought this deficiency to the public's attention. In response, recent legislation aims to improve this situation by means of regulations and sanctions. While the public and government may demand more privacy safeguards, the current state-of-the-art tools in healthcare security, such as access control and auditing, will still be limited in their ability to solve the issue technically. These technologies are theoretically sound and tested in other industries, yet they remain suboptimal in healthcare because the inherent complexity of modern healthcare organizations leaves no feasible method for generating the policies these systems must act upon.

To address this shortcoming, we present a novel open-source framework that mines, from the access logs of an organization's information systems, low-level statistics on how users interact within the organization. Our framework is scalable and capable of handling real-world data integrity issues. We demonstrate our tool by modeling the Vanderbilt University Medical Center. Additionally, we compare our framework's model with the traditional approach, in which experts attempt to generate a similar model manually.
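The abstract does not spell out which statistics the framework mines. One common way to turn access logs into a relational network, sketched here as an assumption rather than as the thesis's actual method, is to link two users whenever they accessed the same patient's record and to weight each link by the number of patients they share. The co_access_network function and the (user, patient) input format are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set, Tuple


def co_access_network(log: Iterable[Tuple[str, str]]) -> Dict[FrozenSet[str], int]:
    """Build a weighted user-user network from (user, patient) access pairs.

    Two users are linked when both accessed at least one common patient;
    the edge weight is the number of distinct patients they share.
    """
    # Collect the distinct users who touched each patient's record
    # (a set, so repeated accesses by one user count only once).
    users_by_patient: Dict[str, Set[str]] = defaultdict(set)
    for user, patient in log:
        users_by_patient[patient].add(user)

    # Count shared patients for every unordered pair of users.
    weights: Dict[FrozenSet[str], int] = defaultdict(int)
    for users in users_by_patient.values():
        for a, b in combinations(users, 2):
            weights[frozenset((a, b))] += 1
    return dict(weights)
```

For example, if alice and bob both open records p1 and p2 and carol opens only p2, the alice-bob edge gets weight 2 and the other two edges weight 1. Note that the pair loop is quadratic in the number of users per patient, so a system that has to scale to a whole medical center would need to handle heavily accessed records more carefully.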

Programming Clojure Review


When Stuart Halloway's Programming Clojure came out in May, I picked up a copy, and I have since been reading through it and practicing with the Project Euler problems.

First, it is a great book! Second, it introduces a seriously interesting programming language.

Clojure is a Lisp dialect designed to run on the Java Virtual Machine (JVM). This combination is what makes Clojure so powerful: you get a mature virtual machine with access to any existing Java library, combined with the dynamic, functional style of Lisp. Imagine being able to keep using the code and libraries that you and others have spent years developing, from within a new programming environment.

Layering a language on top of the JVM is not a new concept. Jython, JRuby, Groovy, and others did it years ago. But to some extent, these languages serve as a mere face-lift for Java's verbose syntax. They were ported to or created for the JVM to harness the power of existing Java libraries and platforms while providing a prettier language.

While Clojure does offer a new syntax, it makes a much more fundamental contribution to the Java world: strong concurrency primitives. (It should be noted that Scala offers this benefit as well.)

Clojure takes a hard-line approach to the arch-enemy of concurrency: shared mutable state. Clojure makes it easy to write concurrent programs that can execute on multiple processors or cores. This ability comes from several facets of Clojure:

  • Immutable data
  • Preferring "pure" functions by making the programmer explicitly state where shared state is accessed
  • Multiple models for coordinating changes to state, including software transactional memory (refs), atoms, and agents
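To make the atom model concrete: Clojure's swap! updates an atom by applying a function to its current value and retrying if another thread changed the value first. The Python sketch below imitates that compare-and-swap loop. The Atom class is a hypothetical stand-in, not Clojure's implementation; it uses a lock only to make the compare-and-set step itself atomic, while the update function runs outside the lock.

```python
import threading
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class Atom(Generic[T]):
    """Toy analogue of a Clojure atom (hypothetical, not Clojure's code).

    Holds a reference to an immutable value and updates it with
    compare-and-swap: compute a new value from the old one, then install it
    only if no other thread replaced the old value in the meantime.
    """

    def __init__(self, value: T) -> None:
        self._value = value
        self._lock = threading.Lock()  # makes only the compare-and-set step atomic

    def deref(self) -> T:
        return self._value

    def _compare_and_set(self, old: T, new: T) -> bool:
        with self._lock:
            if self._value is old:  # identity check, like a JVM AtomicReference
                self._value = new
                return True
            return False

    def swap(self, f: Callable[[T], T]) -> T:
        """Apply f to the current value, retrying until the update wins."""
        while True:
            old = self.deref()
            new = f(old)  # f may run more than once, so it should be side-effect free
            if self._compare_and_set(old, new):
                return new


# Four threads each increment a shared counter 1,000 times.
counter = Atom(0)


def work() -> None:
    for _ in range(1000):
        counter.swap(lambda n: n + 1)


threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.deref())  # 4000
```

The retry loop is why "pure" functions matter here: an update function with side effects could have those effects repeated whenever a conflict forces a retry.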

Almost anyone who has written threaded Java code knows how difficult it is to ensure that multiple threads can execute in parallel without causing awful race conditions and subtle bugs. Fortunately, Clojure addresses these difficulties with its own concurrency models.

Stuart's book begins by discussing the syntax of Clojure and demonstrating Clojure's ability to interact with regular Java classes. It then moves into the list-based world of Lisp with functional programming techniques, including lazy evaluation, followed by advanced topics: concurrency, macros, and multimethods, Clojure's form of polymorphism. The book concludes with a short chapter on testing Clojure code, working with SQL databases, and doing web development.
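Lazy evaluation means elements of a sequence are computed only when something consumes them, so an infinite sequence is perfectly usable. Python generators behave similarly, so here is a sketch (in Python rather than Clojure) of Project Euler Problem 2, which asks for the sum of the even Fibonacci terms that do not exceed four million, written over an infinite lazy Fibonacci sequence.

```python
from itertools import takewhile
from typing import Iterator


def fibonacci() -> Iterator[int]:
    """Infinite, lazily generated Fibonacci sequence: 1, 2, 3, 5, 8, ..."""
    a, b = 1, 2
    while True:
        yield a
        a, b = b, a + b


def sum_even_fibs(limit: int) -> int:
    """Sum the even Fibonacci terms not exceeding limit.

    Only as many terms are generated as takewhile actually consumes.
    """
    terms = takewhile(lambda n: n <= limit, fibonacci())
    return sum(n for n in terms if n % 2 == 0)


print(sum_even_fibs(4_000_000))  # 4613732
```

The infinite sequence, the cut-off, and the filter are separate pieces composed together, which is the style the book encourages with Clojure's lazy sequences.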

Throughout the book, we build an Ant replacement in Clojure. The most interesting take-away from this running example is its use of actual Clojure code for the build DSL, which removes the need for Ant's build.xml. The code-as-data concept is very elegant, resulting in a DSL that is clear without XML's verbosity.
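Python is not homoiconic, so it cannot treat its own source as data the way a Lisp can, but the core idea behind a build description in the host language instead of XML can be approximated: describe the build with ordinary data structures that the program can both inspect and execute. The BUILD table, the task names, and the run_task helper below are hypothetical and are not the book's Ant replacement.

```python
from typing import Dict, List, Optional, Set

# Hypothetical build description held in ordinary data structures:
# each task names its dependencies and the actions (plain callables) it runs.
build_log: List[str] = []

BUILD: Dict[str, dict] = {
    "init": {"depends": [], "actions": [lambda: build_log.append("mkdir build")]},
    "compile": {"depends": ["init"], "actions": [lambda: build_log.append("compile src")]},
    "jar": {"depends": ["compile"], "actions": [lambda: build_log.append("package jar")]},
}


def run_task(name: str, build: Dict[str, dict], done: Optional[Set[str]] = None) -> Set[str]:
    """Run a task after its dependencies, executing each task at most once.

    Assumes the dependency graph has no cycles.
    """
    done = set() if done is None else done
    if name in done:
        return done
    for dep in build[name]["depends"]:
        run_task(dep, build, done)
    for action in build[name]["actions"]:
        action()
    done.add(name)
    return done


# Because the build is just data, it can be inspected without running anything.
print(sorted(BUILD))  # ['compile', 'init', 'jar']
run_task("jar", BUILD)
print(build_log)  # ['mkdir build', 'compile src', 'package jar']
```

In Clojure the build file can be ordinary code that is also a data structure, so the extra layer of a dictionary interpreted by run_task disappears; this sketch only shows why that pairing of description and execution is convenient.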

I also found the Snake game to be an excellent example of an application sharing state in a safe way using the Clojure transaction primitives.

The book gave me a great appreciation for the Lisp family of languages. The only wart that bothered me about Clojure is that, at times, the programmer must be too aware of the specifics of its implementation on the JVM. For instance, recursion in Clojure is sometimes hampered by the JVM's lack of tail-call optimization, so the programmer must choose the most appropriate work-around (such as the recur special form, lazy sequences, or the trampoline function) for the problem at hand. Regardless, Clojure feels very clean and precise.
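One of those work-arounds, the trampoline, is easy to sketch outside Clojure because CPython also lacks tail-call elimination. Instead of making a tail call, a function returns a zero-argument thunk, and a driver loop keeps calling thunks until a non-callable value comes back, so the stack never grows. The trampoline, is_even, and is_odd functions below are illustrative Python, modeled on the idea behind Clojure's trampoline rather than copied from it.

```python
from typing import Any, Callable


def trampoline(f: Callable[..., Any], *args: Any) -> Any:
    """Call f, then keep calling returned thunks until a non-callable result appears."""
    result = f(*args)
    while callable(result):
        result = result()
    return result


def is_even(n: int) -> Any:
    # Rather than calling is_odd directly (which would grow the stack),
    # return a thunk for the trampoline to invoke.
    return True if n == 0 else (lambda: is_odd(n - 1))


def is_odd(n: int) -> Any:
    return False if n == 0 else (lambda: is_even(n - 1))


# 100,000 mutually recursive steps would overflow CPython's default
# recursion limit as direct calls, but run in constant stack space here.
print(trampoline(is_even, 100_000))  # True
```

The catch, in any language, is that the final result cannot itself be a function, since the driver would mistake it for another thunk; that is exactly the kind of implementation detail the programmer has to keep in mind.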

The book also clearly presents best practices and examples of idiomatic Clojure.

I look forward to trying Clojure out in my projects. As I mentioned, I have been working through the Project Euler problems (my answers are definitely not ideal).

I would highly recommend the book to anyone who works in Java. I also believe it is an excellent introduction to functional programming: I have read Real World Haskell and Programming Erlang with some difficulty, but Programming Clojure just clicked in my mind.