Before starting, I want to make clear that I am not a member of the OSGi Alliance nor a participant in any EG. I just happen to have used OSGi since Eclipse started to investigate OSGi as the componentization model at its core. Since then I have grown more and more attached to OSGi and I don’t want to give up any of its features, so I guess you can call me a fanboy if you like. Of course, I have been following the dispute between OSGi and Jigsaw since project Jigsaw was announced, and I have to admit that I was and am not happy with Sun’s approach of skipping the JSR process while hoping to establish a modularization standard beyond the JVM. I already expressed my feelings in an older blog post called [Componentization Wars Part II]. Anyway, I guess there are some things I can’t change, so I’ll try my best to at least help the Jigsaw community benefit from my experiences with OSGi, in the hope that the input will help to create a better system we can all benefit from. In the end only the quality of the technology should count, and we should all work together to make progress. So please consider the rest of this post my humble contribution. Take it or leave it, it’s just an offer.
I thought a lot about the resistance of parts of the Java community to reusing the OSGi standard and starting from what we already have, and I concluded the problem is twofold. First, OSGi initially came from a source outside the JSR process. Well, I know this is debatable, because OSGi was one of the first JSRs and now with JSR 291 it in fact is one, but it is not “developed” within the JSR process, so one can say this is a valid point. Second, modularization is in fact a tough call. Looking around, there is no other language or standard (even beyond Java itself) that has entirely tackled the problem, and quite some time has passed since David Parnas introduced the concept of modularity in 1972. Fan or not, one has to admit that OSGi has already gained a name in terms of modularizing the JVM, and looking at the time required, well, it took over 10 years to get that far. So I think it is fair to say that modularization is not easy to accomplish and to get right. In fact I had my problems (and partially still have) with OSGi as well. Let me explain…
My first contact with OSGi
When I first encountered OSGi I was working at IBM Research, trying to explore ways to improve the modularization approach of [UIMA]. UIMA itself already had the notion of components, or modules if you prefer, but certain features were left to be improved. In particular, the isolation of each module wasn’t enforced, so introducing 3rd party modules into the runtime was a risky business, especially given the fact that these modules had access to basically all resources. You never know what they are using or how they behave in the system. That, for instance, was a reason to separate the processing into another Java process: if that one fails, the entire system is not affected. OSGi in this context wasn’t THE savior, but it helped to make things more robust. No Java based model can provide real isolation of modules on the JVM level; doing so without creating a separate JVM instance is still something entirely missing in Java – one can ALWAYS create an OutOfMemoryError if he or she intends to. The great benefit of OSGi was simple: hiding internal APIs, making dependencies explicit and potentially creating a repository of reusable artifacts, usable without pages of manuals describing how to set up a system suitable for these modules.
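In OSGi, this hiding of internal APIs and declaring of explicit dependencies happens declaratively in the bundle manifest. A minimal sketch (the bundle and package names are invented for illustration) could look like this:

```
Bundle-ManifestVersion: 2
Bundle-SymbolicName: com.example.analysis
Bundle-Version: 1.0.0
Export-Package: com.example.analysis.api;version="1.0.0"
Import-Package: org.osgi.framework;version="[1.5,2.0)"
```

Only `com.example.analysis.api` is visible to other bundles; an internal package like `com.example.analysis.internal` simply does not exist from their point of view, and every external dependency has to be stated up front.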
How OSGi is received (on first encounter)
As you may imagine, one of the most important goals was to hide OSGi as much as possible. No services, no Import-Package (but Require-Bundle on one aggregation bundle), basically reducing OSGi to its minimum. Well, the appreciation was… limited. All the people I was working with were exceptionally bright and determined NLP researchers. They developed highly sophisticated algorithms for analyzing unstructured data, but only a few were software engineers. So the code was… working, but not production ready. Forcing them to apply rules that ultimately made it harder to get things done, and as a result slowed them down, wasn’t something they welcomed very much. This is something I can totally understand and relate to! What is a system worth if you don’t gain anything! Well, the problem here is that they didn’t gain anything, because they just used their own code and, to a limited extent, the code of others, so the overhead of integrating with others was rather limited. The real benefit becomes visible when you start reusing multiple artifacts, potentially in different versions. This is something we (as a community) have been trying to achieve for decades but have failed at so far, if you ask me.
Some reflections about our history
Looking at the history of Java, this feels like a common problem. One tends to address the immediate problems. In principle there is nothing wrong with that, but often, after introducing a new standard, you realize that the actual problem is way more complicated or that you haven’t anticipated all vital use cases, and the trivial approach that looked so attractive at the beginning won’t work in the long run. The problem then is that you don’t want to break backwards compatibility, and eventually this drives the design of your solution, ultimately removing many otherwise possible ways to go. Talking about history: when we look back, jars were introduced as a distribution format for applets. Surely that solved the immediate problem. After a while, Java emerged on the server and yet another problem arose: the separation of different web applications. Well, we know how it ended; eventually we got a different class loading solution for each J2EE vendor, because this part of the specification was left out.
My point is…
Now we have Jigsaw addressing a subset of OSGi’s modularization capabilities, which by itself is perfectly fine (even if it is not OSGi they are going to use). The problem I fear is that soon there will be more requirements, rendering the current approach unfit. In particular, the great benefit of modules should be that you can reuse them without the need to understand all their internals. It was stated that OSGi’s restrictive class loading approach is not suitable for many applications. I tend to agree. I experienced this pain and it wasn’t pleasant. I heard that not allowing split packages is not acceptable, because they are a necessity. Why? Is it because they are good design, or just because current systems are using them? If you want to create modules, you want to make them robust, so except for the exposed (public) API, it shouldn’t matter how a module is implemented. If modules are not forced to have their own class loader, you can never be sure whether or not you’ll have collisions. Yes, it is a pain in the **** to rethink and rewrite existing code! And I also have better things to do than to retrofit some working libraries – no doubt! The question we should ask ourselves, however, is what do we really want? A system seamlessly integrating with existing code, or a new way of thinking which might not be as simple as hoped, but gives us what we need in the long run? Do we want reusable entities or a better provisioning system than plain jars? Don’t get me wrong here, I am not saying OSGi is the solution or should be used at all! I’m just saying that the requirements dictate what you should get, and if you intend to create “real” modules, the lessons OSGi learned are most valuable, even if you eventually take a completely different approach. In fact it has its shortcomings and flaws like any other system.
OSGi is not perfect either
Working with OSGi for a while, I came across several places where I noticed the need for improvement. For instance, OSGi provides a pretty strict module system which, when running, is well defined. Unfortunately, there is a problem with how to get to this state, or even knowing when this state is achieved. The idea behind its model is that, using services, there should not be any dependency on the start order, because everything can change at any time. This is a nice idea, but in the real world it is impossible to achieve. For instance, I am currently working with embedded devices, and due to the limited hardware the start-up can take minutes. Thanks to the dynamics, however, a user might already see a web interface popping up, which seems perfectly usable. However, when using OSGi’s ConfigAdmin service to gather all existing configurations, it can happen that not-yet-started bundles don’t provide the required information, so the resulting configuration is incomplete. Similar things apply to the start order (no, start levels are not sufficient) when using security. It is just not standardized how to ensure that certain bundles are loaded before everything else in order to start a particular runtime with security on. OSGi limits itself to defining the runtime behavior, leaving out configuration issues when moving from one container to another. Basically, the configuration of the start level of each bundle getting loaded is an implementation detail – doesn’t this look familiar from JEE ;-). Also, providing tools to inspect the runtime behavior is hard, because how the container resolves dependencies is not sufficiently defined, so you have to jump through a lot of hoops or use implementation details of a particular implementation. Not really helpful or desirable from a tooling provider’s perspective. I could go on and on, as with every other technology. “Work in progress” is written over every one of them.
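The ConfigAdmin problem boils down to a race that is easy to reproduce even without OSGi. The following plain-Java sketch (all names invented; this is deliberately not the OSGi API) mimics one bundle registering its configuration quickly while another is still starting up, so an early snapshot of “all configurations” is silently incomplete:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

public class EarlySnapshot {
    // Stand-in for the configuration registry (not the real ConfigurationAdmin).
    static final Map<String, String> configs = new ConcurrentHashMap<>();

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch slowBundleStarted = new CountDownLatch(1);

        // The "fast" bundle registers its configuration immediately.
        configs.put("webui", "port=8080");

        // The "slow" bundle registers only after some start-up delay.
        Thread slowBundle = new Thread(() -> {
            try { Thread.sleep(200); } catch (InterruptedException ignored) { }
            configs.put("security", "mode=strict");
            slowBundleStarted.countDown();
        });
        slowBundle.start();

        // A consumer asking "what is configured?" right now sees an
        // incomplete picture - exactly the situation described above.
        Map<String, String> earlyView = Map.copyOf(configs);
        System.out.println("early snapshot: " + earlyView.keySet());

        slowBundleStarted.await();
        System.out.println("late snapshot:  " + configs.keySet());
    }
}
```

There is nothing in the registry itself telling the consumer that more configurations are still on their way, which is why some notion of “start-up completed” would have to be standardized.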
So, you see, there is so much that can still be done to improve our situation. No one is perfect, nor ever will be. We can just try to do our best to progress and learn from our faults.
Properties of a true module system
Well, after talking so much about what’s not right and what can be improved, I guess it is only fair to also draw a rough picture of a system providing the features of true – meaning reusable – modules. To help me out a little, I took the liberty of quoting one of the experts on componentization/modularization (which is not quite the same thing). Clemens Szyperski is currently working for Microsoft (I think) and so has no real relation to Java at all, but his observations and remarks hold true even in our space.
„A software component is a unit of composition with contractually specified interfaces and explicit context dependencies only. A software component can be deployed independently and is subject to composition by third parties.“ [Szyperski et. al. 2002, p.41]
… and a briefer definition of the required properties.
„The characteristic properties of a component are that it:
• is a unit of independent deployment;
• is a unit of third-party composition;
• has no (externally) observable state.“ [Szyperski et. al. 2002, p.36]
I think this sets the stage for further investigation. In particular, the citations above are nice, but pretty vague and open to interpretation. For instance, what does “independent” mean when we’re talking about deployment? What does composition mean in this context? Is it just a descriptor, is it a zip or rpm file containing everything? I don’t know, and I don’t think it actually matters. What matters is that the resulting system is compelling, concise and consistent within itself. Of course, some features seem better than others, but we also need freedom to explore new ways of thinking.
So, thinking about the core features of a (potentially) new module system, I identify the following as the most important:
- Isolation: When talking about isolation, I am basically talking about trust. Having a module system that defines isolation on the module level (which currently can only be achieved by having a custom class loader per module) gives me, as the module designer, the trust that I can create something without the fear of breaking something on the user side just because of an implementation detail that is not part of the public API. With isolation on the module level, I know what I will have to deal with. As they say: Good fences make for good neighbors.
- Information hiding: Yes, this is as old as it gets in software engineering, but it is the most critical part in so many ways. First of all, I understand modules as a way of abstraction: zoom out of the implementation details and just focus on the API. That’s what I want! Black boxes, with only interfaces (or maybe factories) visible. Of course, API contracts purely defined in Java interfaces are not precise enough to provide all required information (like ranges, value lists, limits,…), but it is a starting point. The right documentation should – or for now has to – do the rest.
- Enabling reuse: Currently the silver bullet for decoupling is dependency injection. Although a great way of doing so, it is not enforced. Anyone can just programmatically wire up classes. As a result you get bound to an implementation, which again makes reuse and updates way harder. If I know I have to use a certain API, I usually don’t care who is providing the implementation, and that’s key for every robust system.
- Predictability: Well, even in JEE we usually assume we know how an application will behave once deployed, but in reality that is just not true. Resolution is based on the class path, which can contain basically any classes and libraries, depending on where you deploy your application. Now you have many hard-to-manage factors affecting what gets loaded when. For instance, there can be multiple logging frameworks in different versions present, interfering with the one provided by your application. Depending on when they are found on the class path, they might cause problems or not. A deterministic system that declaratively defines its dependencies will only see the required ones. Everything else is just hidden and can serve other applications if they need it – no interference with each other!
- Flexible, yet safe binding: This is something increasingly important and also not accomplished satisfyingly by any module system I know of – so far. Basically, what one wants is to create an application based on the known dependencies and to be able to fix problems that appear later without the need to redeploy and change the whole application – for instance, when a security vulnerability is detected in one module. The fixed version should be deployable and should tell the runtime which of the existing versions it supersedes, so those can be replaced.
- Robustness: Currently no in-JVM approach can guarantee any runtime behavior, like bounding the consumed memory or the allocated CPU cycles. If you get a malicious module, it can bring down the entire JVM. There are already research projects out there providing such features, so in theory it should be possible to achieve this at the JVM level.
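The isolation point can be demonstrated with nothing but the JDK. In the sketch below (a self-contained illustration, not any module system’s actual mechanism), the same class is compiled once and then defined in two separate class loaders, each standing in for one “module”. The two resulting classes share a name but are distinct runtime types, which is exactly why per-module class loaders prevent internal classes from colliding:

```java
import javax.tools.*;
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.net.URI;
import java.util.List;

public class IsolationDemo {

    // Compile a single class in memory and return its bytecode (requires a JDK).
    static byte[] compile(String className, String source) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        JavaFileObject src = new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override public CharSequence getCharContent(boolean ignoreErrors) { return source; }
        };
        JavaFileManager fm = new ForwardingJavaFileManager<JavaFileManager>(
                javac.getStandardFileManager(null, null, null)) {
            @Override public JavaFileObject getJavaFileForOutput(JavaFileManager.Location location,
                    String name, JavaFileObject.Kind kind, FileObject sibling) {
                return new SimpleJavaFileObject(URI.create("mem:///" + name + ".class"), kind) {
                    @Override public OutputStream openOutputStream() { return out; }
                };
            }
        };
        if (!javac.getTask(null, fm, null, null, null, List.of(src)).call())
            throw new IllegalStateException("compilation failed");
        return out.toByteArray();
    }

    // Each call plays the role of one "module" with its own class loader.
    static Class<?> defineInFreshLoader(String className, byte[] bytecode) {
        return new ClassLoader() {
            Class<?> define() { return defineClass(className, bytecode, 0, bytecode.length); }
        }.define();
    }

    public static void main(String[] args) {
        byte[] bytecode = compile("Greeter", "public class Greeter {}");
        Class<?> inModuleA = defineInFreshLoader("Greeter", bytecode);
        Class<?> inModuleB = defineInFreshLoader("Greeter", bytecode);
        System.out.println(inModuleA.getName().equals(inModuleB.getName())); // true
        System.out.println(inModuleA == inModuleB);                          // false
    }
}
```

Because a class’s runtime identity is the pair of its name and its defining loader, “module A” can never accidentally pick up an internal class of “module B”, no matter how the packages are named.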
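The reuse point can likewise be made concrete in a few lines of plain Java (the `Logger` names here are invented for illustration): a consumer that is handed an interface can be rewired to any provider, while one that calls `new` on a concrete class is bound to that provider forever:

```java
// The only thing a module consumer should ever see: the API.
interface Logger {
    void log(String message);
}

// Two interchangeable providers hidden behind the interface.
class ConsoleLogger implements Logger {
    public void log(String message) { System.out.println("[console] " + message); }
}

class BufferingLogger implements Logger {
    final StringBuilder buffer = new StringBuilder();
    public void log(String message) { buffer.append(message).append('\n'); }
}

// The dependency is injected, so the consumer never names a provider.
class ReportService {
    private final Logger logger;
    ReportService(Logger logger) { this.logger = logger; }
    void run() { logger.log("report generated"); }
}

public class ReuseDemo {
    public static void main(String[] args) {
        // The same consumer works against any implementation...
        new ReportService(new ConsoleLogger()).run();

        BufferingLogger buffered = new BufferingLogger();
        new ReportService(buffered).run();
        System.out.print(buffered.buffer);

        // ...whereas a `new ConsoleLogger()` inside ReportService would have
        // bound it to one provider for good.
    }
}
```

Nothing in plain Java stops an author from taking the second route, which is why a module system that enforces coding against exposed APIs makes implementations replaceable by construction rather than by convention.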
Of course, there is more, but I think the list contains the most important parts. You might have noticed there is no mention of OSGi or the way OSGi does things. I believe there is always more than one way to accomplish your goals, so maybe the OSGi community overlooked a possibility – I honestly don’t know. So if you’re able to come up with any solution that fulfills these requirements, I would be more than happy. Maybe, and only maybe, you can consider looking at some of the approaches OSGi has taken to tackle parts of these problems.
[Componentization Wars Part II]: http://osgi.mjahn.net/2008/12/04/componentization-wars-part-ii-guerrilla-tactics/
[Szyperski et. al. 2002]: Szyperski, Clemens; Gruntz, Dominik; Murer, Stephan: Component Software: Beyond Object-Oriented Programming. Addison-Wesley Professional, 2002 – ISBN 0201745720