Scala Pitfalls For Java Refugees (SP4JR) #0: because what you think you know might be a hammer waiting for a thumb

Java++… or something else?

This article is the first of what should be a really short series, because as you know, Scala is just Java++, and so the differences between the two are almost non-existent.

OK, I don’t think I could have said more wrong things in one sentence, so now, let’s be serious!

Scala is a great language. But at some point, it seems to have been advertised as (only) being a kind of Java++, which is utterly wrong and leads to false expectations, which in turn lead to bad experiences and disappointment.

There is a swarm of reasons for that misconception, from early advertising by some Scala users to the same claims made by Scala detractors, with the added confusion of a lot of ‘similarities’ between the two languages: both are statically typed, are compiled, run on the JVM, share some keywords, are Object Oriented, and so on.

So… where are the differences?

But when you look at the details, where the devil lies, you will see that there are a lot of differences. Even the ‘similarities’ listed above are each more a likeness in wording than in actual concept:

  • they both run on the JVM, whose ‘J’ part is quite heavy. To start with, the JVM is not that friendly to non-Java languages, although it is getting better. Moreover, at the end of the compilation, whatever the language may have been saying, it is the bytecode that speaks, and it can be quite surprising, as we will see with visibility modifiers;
  • they are both Object Oriented, but Scala is both purer than Java in that respect and loaded with a lot of new concepts that have their own subtleties. Sometimes, to fully understand why you get that NullPointerException, you will need to understand these subtleties exactly;
  • they are both statically typed, but I believe that Java is one of the worst things that happened to static typing, and Scala one of the best… So you will have to unlearn your preconceptions and learn again what static typing really is, the first lesson being: the type system is your friend. Use it to its full extent, and stop seeing it as a wall standing between the code you dream of and reality;
  • they share some (a lot of) keywords… but when you only know Java you may be surprised by their semantics in Scala, and face unexpected results, just because you haven’t even considered that return is something completely different in each language.

Learning your new hammer

This series of articles will try to highlight some of these differences, so that you can get, at a glance, an idea of what may be unexpected. Of course, that may spoil some fun and WTF moments, so please do not read the following articles if you want to keep the suspense intact!

The goal is not to teach Scala: there are a lot of excellent tutorials around the net for that – one really good one among others being Scala For Java Refugees, from which I stole the title. So, a basic knowledge of Scala syntax will be assumed.

Hoping it will help, see you in the first article, SP4JR #1: “you don’t want to use return”.

Set up Eclipse workspace in RAM

In the last article, we covered basic optimization paths for Eclipse. But Eclipse still spends an awful amount of time performing I/O, reading and writing large amounts of (often small) files, and you can’t imagine how numerous these files are. Of course, there are all the class and resource files you are editing in your project, but also all the ones generated by compilation, plus a bunch of index files, VCS indexes, and so on. And it’s worse with Scala, where one Scala file can lead to dozens of compiled class files, and where big aspect weaving indexes are maintained.

So, we need to make all this I/O as fast as possible, and the usual solution to that problem is better-performing hardware. If you have a laptop, you see what I mean: laptop hard drives are usually rather slow. That is expected, since it is difficult to find a fast, cheap and battery-efficient hard drive…

So, the first solution is of course to upgrade your hard drive, and if you can afford to buy an SSD, this link seems to show what a dramatic improvement you can get.

If you can’t, or don’t want to (for example for warranty reasons, or because you have an Apple thing), there is another solution: use all that free RAM you have as a hard drive, since it does have good I/O! After all, 4 GB of RAM is pretty common on laptops nowadays, so why not keep one gigabyte of it for your workspace? Ah. I see. RAM is also known to be bad at keeping data between reboots. That will have to be dealt with.

So, the rest of this article describes how to set up a RAM disk on a Linux system and configure scripts that automatically copy your Eclipse workspace from your hard drive to the RAM disk and back, so that you don’t lose all your hard work each time you shut down your computer.

Before starting

Am I going to lose my work every morning?

Well, I hope not. I won’t lie and say there is no risk, and it is certainly higher than the risk of your hard drive crashing. But you’re using source control, no?

More seriously, there is a risk: with the following solution, your workspace is in RAM, and is only periodically copied back to your hard drive. If your computer is abruptly shut down (for example, when your battery runs out), you will lose everything you did since the last synchronization.

To mitigate that, I set the synchronization interval to two minutes, so you won’t lose much work. I have also never used hibernation on my laptop, as I’m not sure how it would work with a ramdisk.

Another risk is that, for some reason, one synchronization goes crazy and starts messing everything up. That is always the worst-case scenario with synchronization setups. I believe that everything here is done to avoid it, but who knows where neutrinos choose to go.

You may also run into some inconvenience at login time, especially if you start Eclipse as soon as you log in and have a lot of data in your workspace. The problem here is that the synchronization may not be finished, so Eclipse sees inconsistent data, for example a missing project. That one is easy to avoid: just wait a little after login, so that you are sure the whole workspace has been copied to the ramdisk.
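Rather than guessing how long to wait, a launcher could poll for the .sync marker that the synchronization script (step 3 below) creates in the RAM disk workspace after its first copy. Here is a minimal bash sketch. The wait_for_sync name and its 60-second default timeout are hypothetical choices and are not part of the setup described in this article.

```shell
#!/bin/bash
# Hypothetical helper: wait until the initial copy to the RAM disk is done,
# i.e. until the synchronization script has created its .sync marker.
# Usage: wait_for_sync /media/ramdisk/workspace && eclipse -data /media/ramdisk/workspace
wait_for_sync() {
  local dir="$1" timeout="${2:-60}" waited=0
  until [[ -e "$dir/.sync" ]]; do
    # Give up after TIMEOUT seconds so a broken sync does not hang the launcher
    if (( waited >= timeout )); then
      echo "workspace in $dir not synchronized after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

The -data option of the Eclipse launcher selects the workspace directory. The usage line assumes the eclipse launcher is on your PATH.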

Lastly, even if it says nothing about the future: I’ve been using this configuration for about 8 months and have never lost anything. Nonetheless, I’m more careful with data in the ramdisk than with other data: I use a source control tool for most of it, and periodically back up everything else.

What expensive software and configuration is needed?

I’m a poor Linux user, with no culture of other exotic OSs like MS Windows or MacOS, so I only know how to configure this solution under Linux. The good news is that it should work on any recent Linux distribution (and even on not-so-recent ones), and perhaps on other Unix-like OSs with some modifications. And perhaps the vast internet has a similar recipe for the OS you are using.

You will also need:

  • enough free RAM (it depends on the size of your workspace; 1 GB seems good, even for big projects);
  • rsync (available from your favorite package manager, aptitude or yum).

So, let’s start the configuration.

1. Set up the RAM disk

On a recent Linux, it’s trivial. We just have to configure and mount a tmpfs:

  • create a directory that will be our mount point, say /media/ramdisk:
% sudo mkdir /media/ramdisk


  • add the tmpfs information to /etc/fstab:
# ramdisk for eclipse
none /media/ramdisk tmpfs defaults,user,size=1G,mode=0777 0 0

#* /media/ramdisk is the mount point previously chosen
#* size=1G is the maximum size of the ramdisk:
#  it only grows as needed, up to that limit.
  • mount the ramdisk (it will be automatic on next reboot)
% sudo mount /media/ramdisk
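Before going further, it may be worth checking that the mount really is a tmpfs and not just a directory on the hard drive. Below is a minimal bash sketch. It assumes GNU coreutils stat, whose -f flag queries the file system and whose %T format prints that file system's type name. The is_tmpfs name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed only when DIR lives on a tmpfs file system.
# stat -f reports on the file system containing DIR, and %T prints its
# type name ("tmpfs" for a RAM disk). Errors (e.g. missing DIR) count as "no".
is_tmpfs() {
  [[ "$(stat -f -c %T "$1" 2>/dev/null)" == tmpfs ]]
}

# Usage: is_tmpfs /media/ramdisk && df -h /media/ramdisk
```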


2. Clean your workspace

Your workspace is likely to contain a surprising number of hard references to its own files. That is at least the case for several Eclipse metadata files, which store full paths. Obviously, that is not going to work well if you move the workspace…

So, clean your projects (won’t hurt), and erase Eclipse’s files:

  • remove .metadata directory in the workspace root;
  • remove other project-specific files, like .project and .settings.
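The two removals above can be scripted. Here is a minimal bash sketch. It assumes every project sits directly under the workspace root; adjust the find depth otherwise. clean_workspace is a hypothetical name. The function deletes exactly the files listed above, so run it only after cleaning your projects from Eclipse.

```shell
#!/bin/bash
# Hypothetical helper: remove the Eclipse files that store absolute paths
# before moving a workspace. Assumes every project is a direct child of
# the workspace root.
# Usage: clean_workspace /full/path/to/your/workspace
clean_workspace() {
  local ws="$1"
  # Workspace-wide metadata
  rm -rf -- "$ws/.metadata"
  # Per-project .project and .settings, exactly one level below the root
  find "$ws" -mindepth 2 -maxdepth 2 \( -name .project -o -name .settings \) \
    -exec rm -rf -- {} +
}
```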

3. Create the synchronization script

So, the goal of the installation is to make your workspace reside in RAM. That seems brilliant, except that RAM is erased on power loss, and you don’t really want to lose all your work each time you shut down your computer: even if you make heavy use of an SCM, cloning a git repository every morning is kind of boring. OK, and batteries are not yet that trustworthy, especially on that 5-year-old laptop you happen to use at work.

What we want is a script that automatically copies the workspace contents from the hard drive to the RAM disk on start-up. It then saves the contents of the RAM disk back to the hard drive, regularly during a session and at its end.

For that, we are going to use rsync, a magical[1] tool built for exactly that purpose.

Before putting that script into place, be aware that it is a destructive synchronization: each file not present in the RAM disk workspace will be erased from the hard drive copy. So never create new files in the hard drive workspace location. You must create them in /media/ramdisk/workspace, and the script will copy them back to the hard drive.

Save the following script somewhere, for example in ~/bin/tmpfs_workspace.sh.

You should change /bin/zsh to whatever shell you are using, most likely /bin/bash. Note that the script is not plain sh compatible: its [[ ]] tests are understood by bash and zsh, but not by a POSIX sh.

You will also have to change /full/path/to/your/workspace to the actual path to your workspace.

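#!/bin/zsh
# Persistent copy of the workspace, on the hard drive
STATIC="/full/path/to/your/workspace"
# Working copy, on the RAM disk
VOLATILE="/media/ramdisk/workspace"
# Create the RAM disk workspace directory if needed (owner-only access)
[[ -r "$VOLATILE" ]] || install -dm700 "$VOLATILE"
if [[ -e "$VOLATILE/.sync" ]]; then
    # Already initialized: save the RAM disk back to the hard drive,
    # deleting files that no longer exist on the RAM disk
    rsync -av --delete --exclude='.sync' "$VOLATILE/" "$STATIC/"
else
    # First run since boot: copy the hard drive workspace into RAM, and only
    # create the marker if the copy succeeded, so that a partial copy is never
    # synced back with --delete
    rsync -av "$STATIC/" "$VOLATILE/" && touch "$VOLATILE/.sync"
fi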

Change its execution permissions:

% chmod +x ~/bin/tmpfs_workspace.sh


Now you can test it:

% ~/bin/tmpfs_workspace.sh


The first time, when /media/ramdisk/workspace does not exist (or at least does not contain the .sync file), the content of the $STATIC directory is copied to $VOLATILE, and the .sync file is created. On later runs, the synchronization goes the other way, from $VOLATILE back to $STATIC.
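The script above assumes that the RAM disk is mounted and that two runs never overlap (cron every two minutes, plus the login and logout calls). A possible hardening is sketched below as bash functions. It adds a mount check with mountpoint and a lock with flock, both usually shipped with util-linux. The function names, the RAMDISK variable and the lock path are hypothetical. The direction rule based on the .sync marker is the same as in the original script.

```shell
#!/bin/bash
# Hypothetical hardened variant of tmpfs_workspace.sh, written as functions.
# RAMDISK, STATIC, VOLATILE and LOCK can be overridden from the environment.
RAMDISK="${RAMDISK:-/media/ramdisk}"
STATIC="${STATIC:-/full/path/to/your/workspace}"
VOLATILE="${VOLATILE:-$RAMDISK/workspace}"
LOCK="${LOCK:-/tmp/tmpfs_workspace.lock}"

# Same rule as the original script: no .sync marker means the RAM disk
# has not been filled yet, so copy disk -> RAM; otherwise save RAM -> disk.
sync_direction() {
  if [[ -e "$1/.sync" ]]; then echo "ram-to-disk"; else echo "disk-to-ram"; fi
}

sync_workspace() {
  # If the tmpfs is not mounted, VOLATILE would be a plain directory on the
  # root file system: refuse to run rather than sync the wrong tree.
  if ! mountpoint -q "$RAMDISK"; then
    echo "$RAMDISK is not mounted, not synchronizing" >&2
    return 1
  fi
  # Exclusive, non-blocking lock on file descriptor 9 (held until the script
  # exits), so a cron run and a login/logout run never copy at the same time.
  exec 9>"$LOCK"
  if ! flock -n 9; then
    echo "another synchronization is running" >&2
    return 0
  fi
  [[ -d "$VOLATILE" ]] || install -dm700 "$VOLATILE"
  case "$(sync_direction "$VOLATILE")" in
    ram-to-disk) rsync -a --delete --exclude='.sync' "$VOLATILE/" "$STATIC/" ;;
    # Only mark the RAM disk as initialized once the copy has succeeded
    disk-to-ram) rsync -a "$STATIC/" "$VOLATILE/" && touch "$VOLATILE/.sync" ;;
  esac
}
```

A script built on these functions would simply end with a call to sync_workspace. It creates .sync only after a successful first copy. A failed copy into RAM is therefore retried on the next run, instead of being pushed back to the hard drive by the --delete branch.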

4. Automate synchronization

Now, we want to automate the script calls:
- once at login, to initialize the RAM disk contents;
- every two minutes or so, thanks to cron, to be more or less safe from power problems;
- at the end of the session.

Cron

% crontab -e



Add:

*/2 *   *   *   *    /path/to/bin/tmpfs_workspace.sh
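If you prefer not to edit the crontab interactively, the same entry can be installed from a script. Here is a minimal bash sketch. add_cron_line is a hypothetical helper: it reads the current crontab on stdin and prints it back with a single entry for the script. It also sends the script's standard output to /dev/null. Cron usually mails whatever output a job produces, and rsync -v is chatty. Errors on stderr still get through.

```shell
#!/bin/bash
# Hypothetical helper: print the crontab read on stdin, with any previous
# line mentioning SCRIPT replaced by a single "every two minutes" entry.
# Usage: crontab -l 2>/dev/null | add_cron_line /path/to/bin/tmpfs_workspace.sh | crontab -
add_cron_line() {
  local script="$1"
  # Keep every other line; grep exits 1 when nothing is left, which is fine
  grep -vF -- "$script" || true
  # Discard stdout so cron does not mail rsync's file list every two minutes
  echo "*/2 * * * * $script >/dev/null"
}
```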


Login/Logout

Again, we are going to create a shell script, ~/bin/tmpfs_syc_login_logout.sh:

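#!/bin/zsh
# login sync: on a fresh RAM disk, copies the workspace from the hard drive
/path/to/bin/tmpfs_workspace.sh
# logout sync: a trap on 0 (EXIT) runs the script again when this shell exits,
# copying the RAM disk back to the hard drive
trap "/path/to/bin/tmpfs_workspace.sh" 0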

Finally, make sure that script is called at login time.

I prefer to put it in my graphical session manager, as I don’t want to call the script each time I start a new shell. Of course, how to do that depends on your session manager.

For E17, I created a new application in the configuration manager, set the script as the executable of that application, and added it to “application to start”.
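One caveat: a trap on 0 fires when the shell running the script exits. When the session manager launches the login/logout script as a program, that shell exits right after the login sync. The ‘logout’ sync therefore actually runs at login, and only cron keeps the hard drive copy up to date after that. A possible workaround, sketched below in bash, keeps the script running for the whole session. It assumes the session manager sends SIGHUP or SIGTERM to the programs it started when you log out, which depends on your desktop. session_sync is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical replacement for the login/logout script: it stays alive for
# the whole session, so its EXIT trap really runs at logout.
# Usage: session_sync /path/to/bin/tmpfs_workspace.sh
session_sync() {
  local sync="$1"
  "$sync"                               # login: copy disk -> RAM
  # EXIT trap: final copy RAM -> disk. printf %q quotes the path now,
  # so the trap does not depend on the local variable later.
  trap "$(printf '%q' "$sync")" EXIT
  # Session end: turn the signal into a normal exit, which fires the EXIT trap
  trap 'exit 0' HUP TERM INT
  while true; do
    # Sleep in the background and wait for it: bash runs a trap as soon as a
    # trapped signal arrives during `wait`, not only after `sleep` ends.
    sleep 60 >/dev/null 2>&1 </dev/null &
    wait $!
  done
}
```

The script started at login would then contain this function followed by a single call such as session_sync /path/to/bin/tmpfs_workspace.sh. The cron entry keeps running alongside it.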

And that’s it! Enjoy the blazing speed of every action that used to read slowly from your disk, like opening a file, JDT weaving indexing, etc.

[1] It must be: I don’t know of any program that works so well without a little magic help. Well, or amazing developers; kudos to them.

Java LDAP SDK for SyncRepl replication showcase

Java LDAP reborn

As you may know, I’m rather fond of the LDAP protocol and its open source server and client implementations.

But I’m also fond of the JVM, and in the not-so-distant past, the only maintained Java LDAP SDK was Sun’s LDAP-JNDI, which is at best an invitation for masochists to fulfil their perversions.

But that time seems to be far behind, and the Java-LDAP world has evolved a lot in the last 5 years. We got two open source LDAP servers fully built in Java (OpenDS and ApacheDS), the Spring framework sub-project Spring-LDAP, and more recently the really good and already mature UnboundID LDAP SDK. Finally, the OpenDS/ApacheDS teams are working together on a new reference Java LDAP API.

That’s nice, because LDAP server implementations are quite mature and really efficient NoSQL stores, running in production in the biggest companies, in critical spots. But LDAP is also a really well thought-out protocol, and it is standardized. That is something generally missing from other NoSQL stores, and it matters a lot, because it brings interoperability.

LDAP replication: introducing SyncRepl

OK, that was just a little context to introduce my latest toy application. NoSQL stores are (most of the time) associated with the idea of replication. Until rather recently, replication was a weak spot of the open source LDAP world, where no real standard had emerged for that need. Each server implementation used its own proprietary protocol, if replication was available at all. But lately, it seems that OpenLDAP’s replication implementation, SyncRepl, is becoming the de facto standard: ApacheDS chose to use it for its own needs.

And that move was possible because, like almost everything in LDAP, SyncRepl is just an extension to the LDAP protocol, specified in RFC 4533.

ApacheDS’ adoption of SyncRepl is major news. It opens the way to master-master replication across different LDAP servers, and to an even stronger and more integrated open source LDAP ecosystem.

But it also becomes really interesting for third-party clients to use SyncRepl for their read-only synchronization needs. For example, it becomes trivial for an email client to replicate only the part of the LDAP directory that holds contact information, with only the attributes it needs. The client then also stays synchronized with every future change to those entries.
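You can get a feel for what such a client receives without writing any Java. OpenLDAP’s ldapsearch command-line tool can act as a minimal SyncRepl consumer through its -E sync=rp option, which requests the RFC 4533 refreshAndPersist mode. It first returns the matching entries, then keeps the connection open and should print changes as they happen, until interrupted. The bash sketch below assumes an OpenLDAP server with the syncprov overlay and anonymous read access. The syncrepl_follow name, the filter and the example DNs are hypothetical. The LDAPSEARCH variable only exists so the command can be printed instead of run.

```shell
#!/bin/bash
# Hypothetical helper: follow the contact entries below BASE_DN with LDAP Sync
# (RFC 4533), asking only for the attributes the client needs.
# Usage: syncrepl_follow ldap://ldap.example.com ou=people,dc=example,dc=com cn mail
syncrepl_follow() {
  local uri="$1" base="$2"
  shift 2                       # remaining arguments: attributes to return
  # -x: simple (here anonymous) bind
  # -E sync=rp: LDAP Sync refreshAndPersist, initial content then live changes
  "${LDAPSEARCH:-ldapsearch}" -x -H "$uri" -b "$base" -E sync=rp \
    '(objectClass=inetOrgPerson)' "$@"
}
# Dry run: LDAPSEARCH=echo syncrepl_follow ldap://localhost dc=example,dc=com mail
```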

Two Java APIs to implement a SyncRepl client

Well, it’s trivial for the client as soon as the LDAP library it uses knows how to handle the SyncRepl extension. And what is really cool for us Java users is that we already have 2 available SDKs that support it! When I said at the beginning that things were changing in the Java/LDAP world, that wasn’t a lie :)

The two SDKs are ApacheDS’ API, since their server uses SyncRepl for its replication system, and UnboundID’s LDAP SDK, which added it less than ten days ago. OK, that was after I asked for it, but the implementation was really fast: thank you Neil Wilson for your hard and great work.

Need for a showcase application

And for things to be really trivial for a client, the best is to have a working example available. That is what I propose with the following showcase application: Syncrepl Web Notifier, Syweno for short.

Syweno has three goals:

  • see how to use the ApacheDS and UnboundID LDAP SDKs to synchronize from a master LDAP server;
  • define common interfaces and utility tools (an API) on top of the two SDKs, to make a client application as easy as possible to implement. That mostly means “hide as much LDAP as we can, and provide a listener kind of interface for the client to process synchronization messages”;
  • build a little client application that uses that API and visually demonstrates how it works. A good candidate is a web page that starts and stops a synchronization and displays, in real time, the updates made on the LDAP master.

So, the source code is available on GitHub here: http://github.com/fanf/syweno

And even if the code is the most interesting thing in that showcase application, here is what the web part looks like:

Ah, and before you look at the code of Syweno and start wondering whether you have forgotten even basic Java syntax, don’t be afraid: it’s coded in Scala (and yes, it runs on the JVM, see the README.txt), using the Liftweb framework. Comet is so easy to use with it, it’s not even fun.

Enjoy!

Normation

87 rue Turbigo
75003 Paris

Tel : 01 83 62 26 96
Fax : 01 83 62 29 38