Friday, 31 October 2008

Prepare for the Worst but Hope for the Best

When the sun sets tonight, we should expect things to get spooky!

It's best to be prepared for that, so I've been carving my pumpkins early. Be careful what you say, or the cat might get your tongue.

Or, if you've been bad, like ignoring your community, demons might rise up from the depths.

We can always stitch you up after you've been torn to pieces, but it's never quite the same.

Maybe I'll finally spot that elusive alien!

I have one pumpkin left to carve and have yet to decide on what the pumpkin wants to become...

Wednesday, 29 October 2008

How Big is an EObject?

I've been thinking a lot the last while about how best to reduce the footprint for a "bare" EObject. The most basic implementation we have is BasicEObjectImpl. It's the abstract base class that all modeled objects should extend; we can and do add methods to InternalEObject so extending this ensures binary compatibility. While BasicEObjectImpl declares no fields, it implements all reusable logic in a highly factored way. EObjectImpl extends it and declares fields for the commonly used features, leaving a properties holder field to point at an instance that in turn declares fields for the less commonly used features. Unfortunately, calling eContents() and eCrossReferences() causes this holder to be fluffed up. Of course derived classes can and do specialize that to instead always create a new list---these lists are just views---and thereby avoid this, but the general problem is that once the properties holder is allocated, it sticks around. For example, when you turn an object into a proxy while unloading a resource, it shows up to hold the proxy URI. Let's take a closer peek.

Here's a bit of simple math, based on a 32 bit JVM, for the memory footprint of a bare fully fluffed up EObjectImpl. An object with no fields has 8 bytes of heap overhead. Each int field and each non-primitive field has 4 bytes of overhead. As such, EObjectImpl with its 5 fields has minimally 8 + 5 x 4 = 28 bytes. Once you call EObject.eAdapters(), the adapter list is fluffed up. It uses a relatively smart implementation that has no backing array until there is at least one element in the list, but still it has 8 + 4 x 4 = 24 bytes of overhead. Now call EObject.eContents() and EObject.eCrossReferences(). Originally it was assumed that not many clients would use these but in actual fact they ended up being exceedingly powerful and heavily used in the framework itself and by clients. After the calls, the properties holder is fluffed up so add 8 + 6 x 4 = 32 bytes along with, for each list, 8 + 3 x 4 = 20 bytes. Recall the lists are views, so they never change their footprint regardless of the size. Add that all up, and we're at 124 bytes. To put that in perspective, that's as much space as used by a 44 character-length String. Of course I've already pointed out that it's trivial to reduce this to 76 bytes by not caching the contents and cross references lists, but that's still a lot of nuts to stuff in those little cheeks.
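As a rough sketch, the arithmetic above can be written down in a few lines of Java. The class and method names are made up for illustration. The 76 byte figure is reproduced under the assumption that dropping the two cached view lists also drops the two holder fields that pointed at them; the text gives only the total, so that breakdown is a guess.

```java
// Back-of-the-envelope footprint math for a 32 bit JVM, using the figures quoted above.
// FootprintMath and its methods are hypothetical names, not part of EMF.
final class FootprintMath {
    static final int HEADER = 8; // heap overhead of an object with no fields
    static final int FIELD = 4;  // each int field or reference field

    static int objectSize(int fieldCount) {
        return HEADER + fieldCount * FIELD;
    }

    // EObjectImpl plus adapter list, properties holder, and both cached view lists.
    static int fullyFluffedUp() {
        int eObject = objectSize(5);          // 28
        int adapterList = objectSize(4);      // 24
        int propertiesHolder = objectSize(6); // 32
        int viewLists = 2 * objectSize(3);    // 2 x 20
        return eObject + adapterList + propertiesHolder + viewLists;
    }

    // Assumption: without cached views, the holder needs two fewer fields.
    static int withoutCachedViews() {
        return objectSize(5) + objectSize(4) + objectSize(6 - 2);
    }

    public static void main(String[] args) {
        System.out.println("fully fluffed up: " + fullyFluffedUp());
        System.out.println("views not cached: " + withoutCachedViews());
    }
}
```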

I've been thinking a lot about what's the best possible thing we could do to reduce this and came up with the design that's prototyped in 252501. If you want to store data on the object itself---maps could be used, but that will make the footprint worse, not better---minimally you'd need at least one field. That field would reference a structure that holds only the necessary fields. You could trivially use an array, but casting is very expensive. You'd be shocked if you measured it. Unfortunately most people don't measure performance of micro operations and simply assume something that looks trivial is also trivially cheap. Wrong. An array is also not ideal for storing primitives. Ideally you'd want something from which you could fetch the data without down casting from Object. In any case, fewer trips to empty out the cheeks would be good.

Imagine instead having a class for each combination of fields you require. You'd need quite a few of them, unfortunately, but that would be space optimal while also avoiding casting. When you needed to set a field, you'd create whatever class instance is required to hold that field along with the other fields already set, or, when unsetting a field, you'd create a smaller instance without that field. The thought of writing all those classes is unbearably stupid, so of course generating them is the order of the day; work smarter, not harder. That way you can easily modify the pattern without the mind numbing tedium of writing 2^6 classes by hand. I'm pretty happy with the result, though not so happy with the 75K it adds to the jar. I need to think if there are ways to reduce the amount of byte code...
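To make the idea concrete, here is a minimal sketch with only two optional fields instead of six, so there are 2^2 combinations rather than 2^6. All the names (Storage, ContainerStorage, UriStorage, ContainerUriStorage, CompactObject) are hypothetical and hand-written here; this is not the generated code from the prototype, just the shape of the pattern.

```java
// Hypothetical sketch: one storage class per combination of set fields.
// null stands in for the "nothing set" combination.
abstract class Storage {
    // Typed accessors return the default when a subclass does not store the field,
    // so callers read values without casting.
    Object container() { return null; }
    String proxyUri() { return null; }

    // Picks the smallest class that can hold exactly the fields that are set.
    static Storage of(Object container, String proxyUri) {
        if (container == null) {
            return proxyUri == null ? null : new UriStorage(proxyUri);
        }
        return proxyUri == null
            ? new ContainerStorage(container)
            : new ContainerUriStorage(container, proxyUri);
    }
}

final class ContainerStorage extends Storage {
    private final Object container;
    ContainerStorage(Object container) { this.container = container; }
    @Override Object container() { return container; }
}

final class UriStorage extends Storage {
    private final String proxyUri;
    UriStorage(String proxyUri) { this.proxyUri = proxyUri; }
    @Override String proxyUri() { return proxyUri; }
}

final class ContainerUriStorage extends Storage {
    private final Object container;
    private final String proxyUri;
    ContainerUriStorage(Object container, String proxyUri) {
        this.container = container;
        this.proxyUri = proxyUri;
    }
    @Override Object container() { return container; }
    @Override String proxyUri() { return proxyUri; }
}

final class CompactObject {
    private Storage storage; // the only field on the object itself

    Storage storage() { return storage; }
    Object getContainer() { return storage == null ? null : storage.container(); }
    String getProxyUri() { return storage == null ? null : storage.proxyUri(); }

    // Setting or unsetting a field swaps in a storage instance of the right shape.
    void setContainer(Object container) { storage = Storage.of(container, getProxyUri()); }
    void setProxyUri(String proxyUri) { storage = Storage.of(getContainer(), proxyUri); }
}
```

Each setter allocates a fresh storage instance, which trades a little allocation on mutation for a smaller steady-state footprint.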

One cool thing that came out of this is the fact that the eDeliver flag, i.e., whether or not the notifications will be produced, does not require any storage; the implementation class simply returns true or false from its hard coded method as appropriate. The patch in the bugzilla shows the template used to generate this and CompactEObjectImpl shows the template's result as well as how it's used to fully implement a modeled object. Note that I hacked DynamicEObjectImpl to use this new base class only so I could run the core test suite to verify CompactEObjectImpl's correct behavior. It's a good feeling to know all is well.
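The trick of storing a flag in the class rather than in a field might look roughly like the following. The names (AdapterStorage, DeliveringStorage, SilentStorage) are invented for this sketch and the real generated classes surely differ; the point is only that the boolean costs no bytes per instance.

```java
// Hypothetical sketch: the deliver flag is encoded by which class is instantiated.
abstract class AdapterStorage {
    final Object[] adapters; // the data both variants carry

    AdapterStorage(Object[] adapters) { this.adapters = adapters; }

    abstract boolean deliver();

    // Changing the flag means switching to the sibling class.
    abstract AdapterStorage withDeliver(boolean deliver);
}

final class DeliveringStorage extends AdapterStorage {
    DeliveringStorage(Object[] adapters) { super(adapters); }
    @Override boolean deliver() { return true; } // hard coded, no field needed
    @Override AdapterStorage withDeliver(boolean deliver) {
        return deliver ? this : new SilentStorage(adapters);
    }
}

final class SilentStorage extends AdapterStorage {
    SilentStorage(Object[] adapters) { super(adapters); }
    @Override boolean deliver() { return false; } // hard coded, no field needed
    @Override AdapterStorage withDeliver(boolean deliver) {
        return deliver ? new DeliveringStorage(adapters) : this;
    }
}
```

The price is a doubling of the number of storage classes for every flag handled this way, which is exactly the byte code cost mentioned above.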

In combination with this storage approach, I've also created a new array-backed list implementation that avoids ever modifying the backing array; it allocates an array of exactly the required size and never modifies it after populating it correctly. Now, instead of caching a list implementation for the eAdapters, I can cache only the array of adapters themselves, which of course is null for an empty list of adapters. Add to that the trick of checking, when the adapters array is replaced with a new one, if that replacement is equal to the one held by the container, and caching the container's instance instead, and we've ensured that a tree of objects, where each object has the same list of adapters, uses a single shared array instance. This is particularly valuable for ChangeRecorder, EContentAdapter, ECrossReferenceAdapter, and any adapter, like UML's CacheAdapter, that uniformly adds itself to the entire tree of objects. Are you all perked up for the final result?
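Here is a minimal sketch of the sharing trick, assuming a simple tree node with a parent pointer. Node, addAdapter, and removeAdapter are hypothetical names, and the real list implementation is certainly more elaborate; this only shows the never-modify-the-array rule and the check against the container's array.

```java
import java.util.Arrays;

// Hypothetical sketch: adapters are held as an exact-size array that is never
// modified after it is populated; null represents the empty list.
final class Node {
    final Node parent;
    Object[] adapters;

    Node(Node parent) { this.parent = parent; }

    void addAdapter(Object adapter) {
        int size = adapters == null ? 0 : adapters.length;
        // Always build a new array of exactly the required size.
        Object[] grown = size == 0 ? new Object[1] : Arrays.copyOf(adapters, size + 1);
        grown[size] = adapter;
        setAdapters(grown);
    }

    void removeAdapter(Object adapter) {
        if (adapters == null) return;
        int index = Arrays.asList(adapters).indexOf(adapter);
        if (index < 0) return;
        if (adapters.length == 1) { adapters = null; return; }
        Object[] shrunk = new Object[adapters.length - 1];
        System.arraycopy(adapters, 0, shrunk, 0, index);
        System.arraycopy(adapters, index + 1, shrunk, index, adapters.length - index - 1);
        setAdapters(shrunk);
    }

    // If the replacement equals the container's array, cache the container's
    // instance instead, so a uniformly adapted tree shares one array.
    private void setAdapters(Object[] replacement) {
        if (parent != null && parent.adapters != null
                && Arrays.equals(parent.adapters, replacement)) {
            adapters = parent.adapters;
        } else {
            adapters = replacement;
        }
    }
}
```

Because no array is ever changed in place, sharing is safe: a node that later gains or loses an adapter simply moves to a different array.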

If we do the same calculations for this implementation, a fully fluffed up CompactEObjectImpl takes only 8 + 1 x 4 = 12 bytes of storage. I think that's impossible to beat. I know of a large company that will be very happy with this. Isn't open source grand? If we could just reduce the byte code for all those darned storage subclasses, without paying the cost of casting, I'd be ecstatic. Anyone out there with creative ideas? A contribution would make a nice birthday present. Speaking of which, happy birthday Darin.

Thursday, 23 October 2008

Service and Support: The Fuel for Your Ecosystem

I watched Mike's recorded presentation about software ecosystems the other day; I didn't have time to watch it while in Germany but it's definitely worth watching if you haven't already. Whether you're a business person or a developer, it provides a useful perspective of the future. It's closely related to the keynote Ralph presented at MDSD about open collaboration; I think someone might have recorded it, so I'll try to find a link.

I view EMF as a mini-platform in a sense very similar to the Eclipse platform as a whole, as Mike described, as well as a vehicle for open collaboration, as Ralph described. I've tried very hard to make EMF both tightly integrated with Eclipse, as well as a platform that can be used independent of Eclipse. After all, modeling isn't just for implementing tools, it's also for supporting runtimes, so it's important to penetrate all runtimes where Java is a key part of the solution. I've also tried hard to build an open collaborative ecosystem with things like EMFT and the Modeling Project as a whole. I feel that it's been pretty successful because a great many people are involved. Although pride is a sin, I take great pride in the fruit of the community's labor.

Having a world-class platform is a key ingredient for success, but social aspects are far more important than folks tend to think, especially us techies. I've often been guilty of thinking of politics as a four letter word, but I'm starting to come around in my advanced age (my birthday is next week) to seeing it in a different light. Politics is sociology and it pervades everything we do as social creatures, whether we like to admit that or not. It's simply neutral and can be applied for bad ends or for good ends. So don't fool yourself into thinking that a purely technical discussion can avoid involving its roots in our social underpinnings.

Antoine opening bugzilla 237041 and the comments in the EMF newsgroup about it being the best newsgroup and about how others might learn from its example prompted me to climb on my high horse and give a bit of advice. In addition to building quality software, there is one other key ingredient to success: showing respect for your community. How you respond to their problems is paramount in this regard, so don't treat them like mushrooms.

When your community reports a problem, fix it as soon as possible. Be sure to make a distinction between wish list items (I want more goodness) and actual defects (it's broken because it's not working the way it was designed to work). A list of 1000 bugzillas that looks to the community like a list of 1000 defects rather than like a list of 1000 wishes is bad publicity. Having no responses in most of those reports is also bad news. Your community will interpret it as a lack of respect, and no pleas about your lack of resources will help offset that negative perception. An example of where I've failed is 243432; not responding to someone looking to contribute is inexcusable, but I have so little time!

When your community asks questions, answer them as soon as possible. Like Antoine, I prefer people ask for help in the newsgroup rather than using my personal mail, and for goodness sake, not on mailing lists because that effectively gets pushed into my personal mail. I believe no question should go unanswered so I'm happy that folks like Eike Stepper, Martin Taal, and Christian Damus feel the same way. Of course all that support is a lot of work, and some people think I'm a pretty crazy guy to spend all that time on "menial" activities. Too bad for their misguided thinking, because I'm quite sure it's a key to success. Don't leave your community out in the cold.

Here's a simple process you can follow for effectively and quickly dealing with your newsgroups:
  • Take a glance to see if someone else with appropriate skills is needed for the topic; in my case Eike, Martin, or Christian.
  • If not, hit reply.
  • Type in the person's name at the top to make it personal; try not to misspell, because it's rude.
  • Type in "Comments below."
  • Start reading and whenever a thought jumps to mind, type it in right then and there.
  • Upon getting to the end, hope the question has been answered.
  • If not, ask lots of questions so the person asking the questions has to answer yours instead.
  • Rinse, lather, repeat.
  • Avoid infinite loops.
A step that I've missed is to incorporate the answers into the FAQ or some type of wiki recipe; but I have so little time! Don't repeat my mistakes, but do repeat this process for as many newsgroups as possible, because Eclipse's overall success will be reflected in your personal success. Your dedication to high-quality service and support is the fuel for your ecosystem and the success of that ecosystem is the most effective way to achieve maximum influence with your limited individual capacity.

Wednesday, 22 October 2008

What's All This Fuss About Modeling?

Just recently I blogged about how CDO mismanagement led to the world's financial crisis. There's another version explaining the CDO fiasco that's even more facetious than my blogs, so don't read it if expletives offend you. Apparently Eike Stepper has been a really busy guy, because I also discovered that he's caused security problems at Microsoft in an article brought to my attention by Google Alert. Check out this part of the article:
Meanwhile, the lone moderate item is highly technical, involves only XP SP3 and deals with a potential information disclosure exploit in Microsoft Office that can be triggered through the use of a specially crafted Connected Data Objects, or "CDO," URL. With CDO, programmers can upgrade and enhance a code-building facility called the Eclipse Modeling Framework for runtime support using Java or XML. This is a back-end vulnerability that an egghead hacker could really have fun with just to be mischievous, experts say
I would normally be extremely happy to discover that Microsoft is using Eclipse technology in Microsoft Office, but I don't really appreciate CDO's users being characterized as eggheads nor having EMF blamed for Microsoft's security problems, though it's nice to know clients could really have fun with it. I'm thinking about terminating Eike's project before he does any more damage to our reputation. I'm tired of him snowing on my roses.

I wonder if perhaps someone did a Google search on CDO, not being sure what it stands for, and ended up at Wikipedia's CDO link, which naturally led them to the Connected Data Objects link, and from there they jumped to a questionable conclusion. Sadly it's yet another example of misconceptions around modeling, as if I didn't list enough of them already. You'd think the bright ideas would stand out better against the dull background.

Don't forget to register for Eclipse Summit Europe and also don't forget to book the hotel when you do; Nestor Hotel is nice and close so I'd suggest booking as soon as possible. I'll be doing my "stupid modeling" talk at ESE; I'm sure you won't want to miss something so unbearably stupid! I'm really looking forward to the symposia too. I don't know how I'll be able to divide my time between the modeling symposium, where we're bound to create some unbearably stupid new misconceptions, and the runtime and e4 symposia. Oh my goodness, I just noticed there are auto and banking symposia as well. That's way too much goodness for one day, especially when modeling is important for all of them! I'm going to have a chat with Ralph about the fact there is simply too much good content being crammed into too little time. Such a bountiful fountain is just unacceptable for the indecisive among us!

The ESE banking symposium reminds me of Eclipse Banking Day in New York. It's planned for December 9th. The agenda is almost complete. Don't forget to register if you plan to attend. There will be lots of good modeling content, and Jeff and Jochen will be there too. I can't wait to find out about what UBS and Morgan Stanley have been doing! Apparently they've looked beyond all the misconceptions, so I'm going to investigate what type of polarizing filters they have in their glasses to block out the harsh glare.

Tuesday, 21 October 2008

Hand-written and Generated Code: Never the Twain Shall Meet

There are many tools these days that generate code. Before writing such a tool, stop to consider if you really should be generating code in the first place. After all, you're generating code from a model---you might not think of it as a model, but that's what I call it because that's what it is---so if you have enough information to generate the code that realizes the behavior described by the model, you obviously have enough information to emulate the behavior of the model. Byte code has a cost. It causes bloat. Don't produce it if you don't have to. EMF, for example, can emulate an instance of an Ecore model, including a fully functional editor, without generating a single line of code; just try invoking "Create Dynamic Instance..." on any EClass' pop-up. It's a cool thing.
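To illustrate what emulating a model can mean, here is a deliberately tiny sketch in plain Java. It is not EMF's API: ModelClass and DynamicObject are invented names that only loosely mirror the idea of a class description driving generic instances. The model is ordinary data, and a single generic class enforces it at runtime, so no per-class code is generated.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a model class: a name plus typed features.
final class ModelClass {
    final String name;
    final Map<String, Class<?>> features = new LinkedHashMap<>();

    ModelClass(String name) { this.name = name; }

    ModelClass feature(String featureName, Class<?> type) {
        features.put(featureName, type);
        return this;
    }

    DynamicObject newInstance() { return new DynamicObject(this); }
}

// One generic implementation serves every model class.
final class DynamicObject {
    private final ModelClass type;
    private final Map<String, Object> values = new LinkedHashMap<>();

    DynamicObject(ModelClass type) { this.type = type; }

    Object get(String feature) {
        typeOf(feature);
        return values.get(feature);
    }

    void set(String feature, Object value) {
        Class<?> expected = typeOf(feature);
        if (value != null && !expected.isInstance(value)) {
            throw new IllegalArgumentException(
                feature + " expects " + expected.getSimpleName());
        }
        values.put(feature, value);
    }

    // Looks up the declared type, rejecting features the model does not define.
    private Class<?> typeOf(String feature) {
        Class<?> declared = type.features.get(feature);
        if (declared == null) {
            throw new IllegalArgumentException("No feature " + feature + " in " + type.name);
        }
        return declared;
    }
}
```

A call such as `new ModelClass("Person").feature("name", String.class).newInstance()` then yields an object that behaves according to the model, with reflection-style access instead of generated getters and setters.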

If you do have a good reason to generate code, keep in mind that humans will read it. Hand writing bad code is unacceptable, but generating bad code is completely inexcusable. Have you ever seen generated code where every referenced class name is fully qualified? It's clearly the simplest way to avoid name collisions, but it seems disrespectful of the human reader. Generating code that isn't of hand-written quality gives generators a bad reputation so focus on creating a thing of beauty.

In the ideal world, generated code would be complete. It would never need to be sullied from its untouched pristine state. Technically, you would not even need to put it under source code control because of course you can always regenerate it. You'd want to be very careful to version the generator in that case though. And keep in mind that if you don't version control the generated code, all your clients will need to install the right version of the generator tools simply to produce a functional code base. Also, it will be more difficult to detect when changes in the generator produce code that's different from what you've been testing. Treating generated code as if it's ephemeral has definite appeal, but is something to consider carefully.

You've probably noticed that the world is typically not quite ideal, and often far from it. So it's often the case that clients need to tailor what's generated. Sometimes that's even the whole point: the generated code is just scaffolding or a starting point from which to hand code a complete application. It's typically important to be able to invoke the generator again if the input model for it changes. Because many generators will simply overwrite any files they generated the last time, keeping hand written changes separate is obviously important in that case. But many generators also support protected regions where users can write their code such that it will not be overwritten. EMF takes this design to the extreme, effectively inverting it, by marking all the regions that the generator may touch. I like to think it's a bright idea.
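The inverted scheme, where the generator only touches what is marked as its own, can be sketched as a simple merge over named members. This is an illustration under simplifying assumptions, not EMF's actual merge algorithm: Member and MarkerMerge are hypothetical names, members are reduced to a name and a body, and a boolean stands in for the generated marker.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of a class member and whether it still carries the marker.
final class Member {
    final String body;
    final boolean generated; // false once the user removed the marker to take ownership

    Member(String body, boolean generated) {
        this.body = body;
        this.generated = generated;
    }
}

final class MarkerMerge {
    // existing: members currently in the file; fresh: what the generator now produces.
    static Map<String, Member> merge(Map<String, Member> existing, Map<String, String> fresh) {
        Map<String, Member> result = new LinkedHashMap<>();
        // Anything not marked as generated is the user's and always survives.
        for (Map.Entry<String, Member> entry : existing.entrySet()) {
            if (!entry.getValue().generated) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        // Marked members are replaced by fresh output; stale marked members are
        // swept away simply by not being copied; user-owned names are not overwritten.
        for (Map.Entry<String, String> entry : fresh.entrySet()) {
            result.putIfAbsent(entry.getKey(), new Member(entry.getValue(), true));
        }
        return result;
    }
}
```

Compared with protected regions, the default flips: unmarked code is safe, and only marked code is at the generator's mercy.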

There are those who believe that one should never modify generated code. I'm not one of those people, though there are clear advantages to avoiding it. For example, it's really easy to see what you've written yourself versus what was generated for you. JDT's support for filters mitigates that advantage by supporting the same thing dynamically, i.e., it hides everything marked @generated. More importantly, it's possible to delete all the generated stuff to do a clean sweep. That's probably the strongest reason. On the downside, more classes result in more bloat. Even an empty class will take close to 0.5k. Worse yet, if you can't anticipate which files a user will wish to specialize, you're liable to double the number of classes. For example, in the implementation of MOF that preceded EMF, for every EClass Foo, it would generate FooGen, Foo, FooGenImpl, and FooImpl, where Foo extends FooGen, FooGenImpl implements Foo, and FooImpl extends FooGenImpl and implements Foo. The whole design caused significant bloat and just looked very stilted; even the public API was very clearly tainted by the fact a generator was being employed. It's important to realize that small droplets of bloat will tend to add up...
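For readers who haven't seen this style, here is roughly what the four types look like for one modeled class, following the relationships just described. The member names (getName, describe) are invented for the sketch, and FooGenImpl is made abstract here so that it can implement Foo without supplying the hand-written method.

```java
// Generated: the modeled API.
interface FooGen {
    String getName();
}

// Hand-editable: extends the generated interface with custom API.
interface Foo extends FooGen {
    String describe();
}

// Generated: implements the modeled features; regenerated freely.
abstract class FooGenImpl implements Foo {
    private final String name;

    FooGenImpl(String name) { this.name = name; }

    @Override public String getName() { return name; }
}

// Hand-editable: the only place custom behavior lives.
class FooImpl extends FooGenImpl implements Foo {
    FooImpl(String name) { super(name); }

    @Override public String describe() { return "Foo named " + getName(); }
}
```

That is four types for one concept, two of which may well stay empty if the user never specializes anything, which is where the bloat comes from.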

So while some will argue that when it comes to hand written and generated code, never the twain shall meet, I think it's important to keep in mind that, as with most things in life, there are trade-offs to our design decisions. As such, it's more important to explain and understand all the considerations that should be taken into account when making a choice than it is to decide which specific choice is a best practice in general. After all, EMF's generator model generates both the Ecore model and itself, so we're not actually in a position to delete our generated code. We need it to bootstrap the environment. It's a prickly problem.

So while it's often a good practice to separate generated code from hand written code, and it's not necessary to version control generated code in that case, these decisions come at a price.

Sunday, 19 October 2008

A Whirlwind Trip to Germany

Last Saturday I began my whirlwind trip to Germany. My flight left Toronto at 5:30PM and arrived in Frankfurt at 7:00AM. The place was fogged in so badly I couldn't see the runway, or anything else for that matter, until we actually touched down. That was very disconcerting. I'm always surprised that customs in the EU is organized with a separate queue per server given it's a well established fact that a single queue with multiple servers has better average- and worst-case wait times. The Germans have such an organized and precise society---the wine glasses all have a 0.2L mark---you'd just assume this would have been optimized long ago.

I had to purchase tickets for the train, so I had to get in line again; a single queue, thank goodness. I didn't know how to use the automated machines; although it's very handy to have the British flag to switch to English, when all the buttons are labeled in German, that doesn't help nearly as much as you might think. My line moved so slowly that I estimated the arrival at the front of the queue would take more than 1/2 an hour, so the first class lineup became way too attractive to overlook. As such, I queue jumped and was on the desired train five minutes later, headed for Köln where I transferred onto a train to Dortmund. Sightseeing from the train was quite enjoyable; the photos end up with interesting reflection effects.

From Dortmund, I took a taxi to my final destination of Hotel Drei Linden in Lünen; a scrum discussion among eight drivers was needed to determine what to enter into the GPS finder in order to guide the taxi to the final destination. Such helpful people!

Drei Linden is a very quaint hotel, but it turns out there is no internet access on the third floor; keep that in mind if you book there! I immediately started to feel a bit of shortness of breath after having been without internet access already for more than twelve hours. I'd hoped I could find an internet cafe, but apparently Germany doesn't consider Sunday shopping a good thing, so along with the other pedestrian traffic that fine Sunday afternoon, I just pressed my nose to the glass. Lünen is a very photogenic place. I used the tall steeples to avoid getting lost.

It was a super nice warm day for so late into October which made exploring that much more fun. There were so many quaint sights to see along the river.

Even the small details were very pretty.

The old and the new often stood side-by-side in sharp contrast.

I just love pedestrian street malls, especially when the architecture is so stunning.

I had a nice dinner at the sidewalk cafe in front of the hotel. I had to order from a German menu with no translation help because none of the folks spoke a word of English; thank goodness I can read Dutch well enough to recognize the German/Dutch word for shrimp. After dinner, I headed off to the ice cream shop I'd spotted earlier in the day. On the way back, the sun set on my first day in Germany, as internet withdrawal, and the associated inability to write a blog, took a heavy toll.

On Monday morning, Wolfgang picked me up at the hotel and we headed to itemis' main site in Lünen. This is the view of the steeple from the hotel on that gorgeous fall morning.

This is the building itemis is using right now. Notice the unusual shadow on the building?

From within the building, looking out the circular window, you see this.

At NASA they might build space ships, but in Lünen saucers come in for a landing.

I even got to tour the inside of it, but as with NASA, the aliens were all too elusive.

Here's the new building into which itemis will move when it's completed. It's seen with the footing of the saucer in the foreground. It looks like a land-walker from Star Wars.

Everywhere I go, folks seem to be obsessed with guns! Here's the weapon of choice at itemis: a missile launcher.

Peter wasted no time showing me how to lock and load the lethal weapon.

Wolfgang demonstrated an alternative high-powered firing technique.

Meanwhile Jens maintained an aloof facade. It's always the quiet ones who should concern you the most!

Itemis functions a little differently than most organizations. Here's a good way to look at it. Most companies will come up with a strategy and then hunt down the skills and resources to carry it out. Itemis tends to look at the skills and resources it has and builds a strategy around those. It's kind of a novel approach, although seemingly obvious. As a result, the focus is on the strengths of each individual. Like Google, they also believe in encouraging people to innovate on company time, so one day a week, you can work on whatever you like. I expect many good things to come from collaborating with itemis!

On Tuesday night, it was time to head out to Hamburg for MDSD, where I stayed at the Empire Riverside Hotel. It was fantastic and, most importantly, it had internet! Axel Uhl of SAP was staying there as well, so the next morning, he and I had breakfast and a chance to chat for a few hours before the start of the workshop.

The facilities for the workshop were very nice and a great group of people attended.

I had the honor of giving the first keynote presentation on the first day, i.e., what I've been calling my "Stupid Modeling talk." I think it went well. For the rest of the conference people felt the need to apologize each time they used the word "meta" so I felt kind of bad.

My talk was followed by Axel's keynote, which I thought was simply excellent. I've never seen him present before and he's a very good presenter. He talked about all the outstanding issues that modeling needs to address on the journey to wide-spread mainstream adoption.

In the afternoon the talks were in German, so mostly what I got out of that was "blah blah blah swim lanes blah blah blah framework blah blah blah EMF blah blah blah." Because I understand Dutch, when I listen to German I feel like I should understand every word, but I can only pick out the words that match Dutch words and the words borrowed from English.

The whole group had a very nice dinner that night at La Dolce Vita in Elmshorn. Here are Sven and I enjoying our red wine:

Later that evening, I met up with Ralph Müller for a drink and a chat at the bar in my hotel. The next morning I did my two hour introduction to EMF.

After that, Ralph gave a great keynote presentation about Eclipse's open source business model. The Eclipse story is very compelling and Ralph does an excellent job bringing it to life.

In the afternoon, Jan was supposed to present his tutorial on GMF, but he was sick, so I didn't get a chance to see him on this trip. Instead, Robert Wloch, also from itemis, presented the tutorial. Robert did a good job presenting a very complex topic. I was a little frustrated when I tried to follow the tutorial steps, substituting my very simple tree model, and GMF didn't produce a working result.

I finally got to bed at a decent hour that night so the next morning I was all fresh and ready for Sven and Peter's tutorial about Xtext. This was the presentation I enjoyed the most at the workshop. It's quite a complex topic, but by presenting it by way of a complete end-to-end example illustrating the development of a simple textual DSL for state machines, it was made to seem simple. Of course I'm particularly interested in textual DSLs, because XML has worn very thin for me, so I just loved this talk.

In the afternoon, Arno Haase presented Xpand and Xtend.

The combination of Xpand and Xtend has some distinct advantages over JET, particularly Xpand's decomposition into smaller components and the associated aspect-oriented extensibility of those components. I have this obsession about needing the templates to produce exactly the format I want without need for a formatter on the results, so I think Xpand needs a tiny bit of work to make it perfect in my eyes. Arno had a bit of help from Karsten who did all the driving leaving Arno free to present; Karsten was kind enough to order for me the book "Pillars of the Earth" which he says is the best book he's ever read. Isn't that nice!

I think everyone agreed it was a great workshop. I certainly enjoyed it very much. I want to thank Simon Zambrovski in particular for helping to organize such an excellent event, and for shuttling me between the train in Hamburg and the Nordakademie in Elmshorn.

My plane back to Toronto via Frankfurt left at 8:15AM, so I had to get up bright and early. There were some real clowns on the plane!

Just twelve hours later, I was home in time to enjoy a beautiful afternoon in the garden.

I'm already looking forward to my next trip to Germany for ESE and to Paris for MD Day.