Encore/J: Transparently Recoverable Java for Resilient Distributed Computing

Research Areas: End System Security

Principal Investigator: Antony Hosking

The EncoreJ project is developing tools and libraries for transparent rewriting of Java code, making distributable Java applications resilient in the face of execution node reconfiguration and failure. Developers remain in control of the system, while EncoreJ automatically rewrites compiled Java code as packages are loaded, adding support for creating, accessing, and computing upon local and remote objects, and for resilience in the face of system failures and reconfigurations. EncoreJ also interfaces with a variety of persistence mechanisms (e.g., databases), both to provide fundamental resilience (saving and restoring information) and to coordinate recovery with the mechanisms of the external database.
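As an illustration of what rewriting compiled code "as packages are loaded" can look like on a standard JVM, the sketch below uses the java.lang.instrument hook from the Java platform. This is an assumption chosen for illustration, not a description of EncoreJ's actual mechanism, which this summary does not specify. The class name LoadTimeRewriteSketch, the targetPrefix parameter, and the selectedClasses helper are all hypothetical. The transformer only selects classes and leaves their bytes unchanged; a real rewriter would return modified bytecode at the marked point.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a load-time rewriting hook; not EncoreJ's implementation.
public class LoadTimeRewriteSketch implements ClassFileTransformer {
    // Hypothetical filter: only classes under this internal-form prefix are selected.
    private final String targetPrefix;
    // Records which classes were selected, purely for illustration.
    private final List<String> selected = new ArrayList<>();

    public LoadTimeRewriteSketch(String targetPrefix) {
        this.targetPrefix = targetPrefix;
    }

    // Standard agent entry point: the JVM calls this before the application's main
    // when started with -javaagent and a Premain-Class manifest entry.
    public static void premain(String agentArgs, Instrumentation inst) {
        String prefix = (agentArgs == null) ? "" : agentArgs;
        inst.addTransformer(new LoadTimeRewriteSketch(prefix));
    }

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        // className is in internal form (e.g. "com/example/Account") and may be null.
        if (className == null || !className.startsWith(targetPrefix)) {
            return null; // null means "no transformation performed"
        }
        synchronized (selected) {
            selected.add(className);
        }
        // A real rewriter would parse classfileBuffer here and return new bytes,
        // e.g. redirecting field accesses through a local/remote object layer.
        return null;
    }

    // Returns a snapshot of the class names selected so far.
    public List<String> selectedClasses() {
        synchronized (selected) {
            return new ArrayList<>(selected);
        }
    }
}
```

With such an agent packaged in a jar whose manifest names it as Premain-Class, the application source and its build stay untouched; the hook is attached only when the JVM is launched.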

EncoreJ exploits its resiliency support to make it easy to reconfigure applications as the host platform evolves, adding and removing resources dynamically; e.g., a virtual node might be taken down and replaced by another in order to force work to move to a newly available system. Programmers describe "on the side" (without modifying source code) how to place, move, and replicate objects and computations; the source code remains the primary mechanism for expressing algorithms clearly, without hard-coded details of distribution or resilience.
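To make the "on the side" idea concrete, the following sketch keeps placement decisions in a separate properties-style text rather than in application source. The policy format, the PlacementPolicySketch class, the Placement values, and the example class names are all hypothetical; this summary does not describe EncoreJ's actual policy language.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Properties;

// Hypothetical sketch of a side-file placement policy; not EncoreJ's format.
public class PlacementPolicySketch {
    // Hypothetical placement choices for instances of a class.
    public enum Placement { LOCAL, REMOTE, REPLICATED }

    private final Properties rules = new Properties();

    // Parses policy text of the assumed form "fully.qualified.ClassName=placement".
    public static PlacementPolicySketch parse(String policyText) {
        PlacementPolicySketch policy = new PlacementPolicySketch();
        try {
            policy.rules.load(new StringReader(policyText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return policy;
    }

    // Looks up the placement for a class; classes without a rule default to LOCAL.
    public Placement placementFor(String className) {
        String value = rules.getProperty(className, "LOCAL");
        return Placement.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String policyText =
            "com.example.Account=remote\n" +
            "com.example.Cache=replicated\n";
        PlacementPolicySketch policy = parse(policyText);
        System.out.println(policy.placementFor("com.example.Account"));
        System.out.println(policy.placementFor("com.example.Cache"));
        System.out.println(policy.placementFor("com.example.Report"));
    }
}
```

Because the rules live outside the program, changing where objects are placed means editing the policy text only; the classes named in it need no distribution-specific code.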

The EncoreJ tools and prototype are a platform for research by the wider community working on policies/algorithms for migration, replication, scheduling, etc., in Grid systems. The focus is a convenient and flexible platform, powerful and extensible, without over-commitment to any particular policies or strategies. EncoreJ builds on readily available and standard systems (Java virtual machines and packages) to ensure wide applicability and easy distribution and adoption.

Keywords: Recoverability, Java, resilient distributed computing