Monday, October 3, 2011

How to get C like performance in Java


Java makes high level programming simpler and easier at the cost of making low level programming much harder. It has many areas which can be slow. However for every problem there is a solution. Many solutions/hacks require working around Java's protections but if you need low level performance it is still possible.

One way to get C-like performance is to use C via JNI for key sections of code. If you want to avoid using C or JNI there are still ways you can get the performance you want.

 Fast array access

One of the area where Java can be slower is array access. This is because it implicitly does bounds checking. The JVM is smart enough to optimize checks for loops by checking the first and last element, however this doesn't always apply.

One work around is to use the Unsafe class (which is only available on some JVMs, OpenJDK JVMs do) This class has getXxxx() and setXxxx() for each primitive type and gives you direct access to an object, array or direct memory where you have to do the bounds checking. In native code, these are compiled to single machine code instruction. There is also a getObject(), setObject() methods however I suspect that they don't provide as much of a performance improvement (by the time you access the Object as well)

You can check the native code generated for a method by downloading the debug version of the OpenJDK and getting it to print the compiled native code.

Arbitrary memory access

You can use the Unsafe class again for arbitrary access, however a "friendlier" way is to use a DirectByteBuffer and change its address and limit as desired (via reflection or via JNI) This will give you a Buffer which points to a random area of memory such as device buffer.

Using less memory

This is not as much of an issue as it used to be. A 16 GB server costs $1000 and a 1 TB server costs about $70K.

However, cache memory is still a premium and for some applications and its worth cutting memory consumption. A simple thing to do is to use Trove which support primitives in collections efficiently. If you have a large table of data, you can store data by column instead of by row (if you have lots of rows of data, and a few columns). This can improve caching behaviour if you are scanning data by field but don't need all fields.

You can also use Direct memory to store data how you wish. This is what the BigMemory library uses.

Stream based IO is slow and NIO is a pain to use

How can use you have the best of both worlds? Use blocking IO in NIO (which is the default for a Channel) Don't use Selectors unless you need them. In many cases, they just add complexity. Most systems can handle 1K-10K threads efficiently. If you need more connections than that, buy another server, a cheap one cost about $500.

Fast Efficient Strings

Java 6 update 21 has an option -XX:+UseCompressedStrings which can use byte[] instead of char[] for the strings which don't need 16-bit characters. For many applications this saves memory but is slower. (5%-10%)

Instead you can use your own Text type which wraps a byte[], or get you text data from ByteBuffer, CharBuffer or use Unsafe.

Faster Startup times

Java tends to have slow startup times when you load in lots of bloated libraries. If this is really a problem for you load less libraries. Keeping them to a minimum is good practice anyway. Do this and your startup times will be a few seconds (not as fast as C, but likely to be fast enough)

Less GC pauses

Most Java libraries create objects freely and generally this is not a problem.

However this doesn't mean you can't pre-allocate your objects, use Direct ByteBuffers and Object recycling techniques to minimise your object creation. By increasing the Eden size you can have an application which rarely GCs. You may even reduce it to one GC per day (say as a scheduled over night job)
 

Friday, September 23, 2011

Linux Boot Process


The following are the 6 high level stages of a typical Linux boot process.
1. BIOS
  • BIOS stands for Basic Input/Output System
  • Performs some system integrity checks
  • Searches, loads, and executes the boot loader program.
  • It looks for boot loader in floppy, cd-rom, or hard drive. You can press a key (typically F12 of F2, but it depends on your system) during the BIOS startup to change the boot sequence.
  • Once the boot loader program is detected and loaded into the memory, BIOS gives the control to it.
  • So, in simple terms BIOS loads and executes the MBR boot loader.
2. MBR
  • MBR stands for Master Boot Record.
  • It is located in the 1st sector of the bootable disk. Typically /dev/hda, or /dev/sda
  • MBR is less than 512 bytes in size. This has three components 1) primary boot loader info in 1st 446 bytes 2) partition table info in next 64 bytes 3) mbr validation check in last 2 bytes.
  • It contains information about GRUB (or LILO in old systems).
  • So, in simple terms MBR loads and executes the GRUB boot loader.
3. GRUB
  • GRUB stands for Grand Unified Bootloader.
  • If you have multiple kernel images installed on your system, you can choose which one to be executed.
  • GRUB displays a splash screen, waits for few seconds, if you don’t enter anything, it loads the default kernel image as specified in the grub configuration file.
  • GRUB has the knowledge of the filesystem (the older Linux loader LILO didn’t understand filesystem).
  • Grub configuration file is /boot/grub/grub.conf (/etc/grub.conf is a link to this). The following is sample grub.conf of CentOS.
#boot=/dev/sda
default=0
timeout=5
splashimage=(hd0,0)/boot/grub/splash.xpm.gz
hiddenmenu
title CentOS (2.6.18-194.el5PAE)
          root (hd0,0)
          kernel /boot/vmlinuz-2.6.18-194.el5PAE ro root=LABEL=/
          initrd /boot/initrd-2.6.18-194.el5PAE.img
  • As you notice from the above info, it contains kernel and initrd image.
  • So, in simple terms GRUB just loads and executes Kernel and initrd images.
4. Kernel
  • Mounts the root file system as specified in the “root=” in grub.conf
  • Kernel executes the /sbin/init program
  • Since init was the 1st program to be executed by Linux Kernel, it has the process id (PID) of 1. Do a ‘ps -ef | grep init’ and check the pid.
  • initrd stands for Initial RAM Disk.
  • initrd is used by kernel as temporary root file system until kernel is booted and the real root file system is mounted. It also contains necessary drivers compiled inside, which helps it to access the hard drive partitions, and other hardware.
5. Init
  • Looks at the /etc/inittab file to decide the Linux run level.
  • Following are the available run levels
    • 0 – halt
    • 1 – Single user mode
    • 2 – Multiuser, without NFS
    • 3 – Full multiuser mode
    • 4 – unused
    • 5 – X11
    • 6 – reboot
  • Init identifies the default initlevel from /etc/inittab and uses that to load all appropriate program.
  • Execute ‘grep initdefault /etc/inittab’ on your system to identify the default run level
  • If you want to get into trouble, you can set the default run level to 0 or 6. Since you know what 0 and 6 means, probably you might not do that.
  • Typically you would set the default run level to either 3 or 5.
6. Runlevel programs
  • When the Linux system is booting up, you might see various services getting started. For example, it might say “starting sendmail …. OK”. Those are the runlevel programs, executed from the run level directory as defined by your run level.
  • Depending on your default init level setting, the system will execute the programs from one of the following directories.
    • Run level 0 – /etc/rc.d/rc0.d/
    • Run level 1 – /etc/rc.d/rc1.d/
    • Run level 2 – /etc/rc.d/rc2.d/
    • Run level 3 – /etc/rc.d/rc3.d/
    • Run level 4 – /etc/rc.d/rc4.d/
    • Run level 5 – /etc/rc.d/rc5.d/
    • Run level 6 – /etc/rc.d/rc6.d/
  • Please note that there are also symbolic links available for these directory under /etc directly. So, /etc/rc0.d is linked to /etc/rc.d/rc0.d.
  • Under the /etc/rc.d/rc*.d/ direcotiries, you would see programs that start with S and K.
  • Programs starts with S are used during startup. S for startup.
  • Programs starts with K are used during shutdown. K for kill.
  • There are numbers right next to S and K in the program names. Those are the sequence number in which the programs should be started or killed.
  • For example, S12syslog is to start the syslog deamon, which has the sequence number of 12. S80sendmail is to start the sendmail daemon, which has the sequence number of 80. So, syslog program will be started before sendmail.
There you have it. That is what happens during the Linux boot process.





Nine traits of the veteran Unix admin

I came across a very nice article about the Unix admin traits written by Paul Venezia in Infoworld.com. Sharing the entire article as it is. Follow this field guide if you want to understand the rare and elusive hard-core Unix geek.

Veteran Unix admin trait No. 1: We don't use sudo
Much like caps lock is cruise control for cool, sudo is a crutch for the timid. If we need to do something as root, we su to root none of this sudo nonsense. In fact, for Unix-like operating systems that force sudo upon all users, the first thing we do is sudo su - and change the root password so that we can comfortably su - forever more. Using sudo exclusively is like bowling with only the inflatable bumpers in the gutters -- it's safer, but also causes you to not think through your actions fully.


Veteran Unix admin trait No. 2: We use vi, not emacs, and definitely not pico or nano
While we know that emacs is near and dear to the hearts of many Unix admins, it really is the Unix equivalent of Microsoft Word. Vi -- and explicitly vim -- is the true tool for veteran Unix geeks who need to get things done and not muck about with the extraneous nonsense that comes with emacs. Emacs has a built-in game of Tetris, for crying out loud.

I'll grudgingly admit that the bells and whistles in vim such as code folding and syntax highlighting might be considered fluff, but at the end of the day, real Unix work blends extremely well with vi's modal editing concepts. In addition, its svelte size and universal portability make it the One True Editor. Thanks Bill, thanks Bram.

Veteran Unix admin trait No. 3: We wield regular expressions like weapons
To the uninitiated, even the most innocuous regex looks like the result of nauseous keyboard. To us, however, it's pure poetry. The power represented in the complexity of pcre (Perl Compatible Regular Expressions) cannot be matched by any other known tool. If you need to replace every third character in a 100,000-line file, except when it's followed by the numeral 4, regular expressions aren't just a tool for the job -- they're the only tool for the job. Those that shrink from learning regex do themselves and their colleagues a disservice on a daily basis. In just about every Unix shop of reasonable size, you'll find one or two guys regex savants. These poor folks constantly get string snippets in their email accompanied by plaintive requests for a regex to parse them, usually followed by a promise of a round of drinks that never materializes.


Veteran Unix admin trait No. 4: We're inherently lazy
When given a problem that appears to involve lots of manual, repetitive work, we old-school Unix types will always opt to write code to take care of it. This usually takes less time than the manual option, but not always. Regardless, we'd rather spend those minutes and hours constructing an effort that can be referenced or used later, rather than simply fixing the immediate problem. Usually, this comes back to us in spades when a few years later we encounter a similar problem and can yank a few hundred lines of Perl from a file in our home directory, solve the problem in a matter of minutes, and go back to analyzing other code for possible streamlining or playing Angry Birds.


Veteran Unix admin trait No. 5: We prefer elegant solutions
If there are several ways to fix a problem or achieve a goal, we'll opt to spend more time developing a solution that encompasses the actual problem and preventing future issues than simply whipping out a Band-Aid. This is related to the fact that we loathe revisiting a problem we've already marked "solved" in our minds. We figure that if we can eliminate future problems now by thinking a few steps ahead, we'll have less to do down the road. We're usually right.


Veteran Unix admin trait No. 6: We generally assume the problem is with whomever is asking the question
To reach a certain level of Unix enlightenment is to be extremely confident in your foundational knowledge. It also means we never think that a problem exists until we can see it for ourselves. Telling a veteran Unix admin that a file "vanished" will get you a snort of derision. Prove to him that it really happened and he'll dive into the problem tirelessly until a suitable, sensible cause and solution are found. Many think that this is a sign of hubris or arrogance. It definitely is -- but we've earned it.


Veteran Unix admin trait No. 7: We have more in common with medical examiners than doctors
When dealing with a massive problem, we'll spend far more time in the postmortem than the actual problem resolution. Unless the workload allows us absolutely no time to investigate, we need to know the absolute cause of the problem. There is no magic in the work of a hard-core Unix admin; every situation must stem from a logical point and be traceable along the proper lines. In short, there's a reason for everything, and we'll leave no stone unturned until we find it.

To us, it's easy to stop the bleeding by HUPping a process or changing permissions on a file or directory to 777, but that's not the half of it. Why did the process need to be restarted? That shouldn't have been necessary, and we need to know why.

Veteran Unix admin trait No. 8: We know more about Windows than we'll ever let on
Though we may not run Windows on our personal machines or appear to care a whit about Windows servers, we're generally quite capable at diagnosing and fixing Windows problems. This is because we've had to deal with these problems when they bleed over into our territory. However, we do not like to acknowledge this fact, because most times Windows doesn't subscribe to the same deeply logical foundations as Unix, and that bothers us. See traits No. 5 and 6 above.


Veteran Unix admin trait No. 9: Rebooting is almost never an option
Unix boxes don't need reboots. Unless there's absolutely no other option, we'll spend hours fixing a problem with a running system than give it a reboot. Our thinking here is there's no reason why a reboot should ever be necessary other than kernel or hardware changes, and a reboot is simply another temporary approach to fixing the problem. If the problem occurred once and was "fixed" by a reboot, it'll happen again. We'd rather fix the problem than simply pull the plug and wait for the next time.



If some of these traits seem antisocial or difficult to understand from a lay perspective, that's because they are. Where others may see intractable, overly difficult methods, we see enlightenment, born of years of learning, experience, and most of all, logic.


Courtesy: http://www.infoworld.com/t/unix/nine-traits-the-veteran-unix-admin-276