Single, Double And Multiple Dispatch

This post has already been published on code::gallery blog which now has been merged into this blog.

These are mechanisms in object-oriented programming languages for identifying the function or method to be invoked. The "dispatch" in the name refers to dispatching messages to objects, as Smalltalk puts it. It is equivalent to invoking a method of an object.

Single Dispatch

Typically, multiple methods or functions are given the same name because they serve the same purpose. In the single dispatch mechanism, the method to be invoked is determined using the object, usually the type of the object, on which it is invoked. The parameters also play a part, but the parameter types are resolved at compile time, whereas dynamic binding (dynamic dispatch) can be used only for the object on which the method is invoked. This object is also syntactically highlighted, as in obj.behave(the, arguments).
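The difference between the two kinds of resolution can be seen in a minimal C++ sketch. The names Shape, Circle and describe are hypothetical and exist only for this illustration: the virtual call is resolved from the runtime type of the receiver, while the overload is picked from the static type of the argument.

```cpp
#include <cassert>
#include <string>

struct Shape {
    virtual ~Shape() = default;
    // Dynamic dispatch: the override is chosen from the runtime type of the receiver.
    virtual std::string name() const { return "shape"; }
};

struct Circle : Shape {
    std::string name() const override { return "circle"; }
};

// Overloads: the one called is chosen at compile time from the static type of the argument.
std::string describe(const Shape&)  { return "describe(Shape)"; }
std::string describe(const Circle&) { return "describe(Circle)"; }
```

Given a Circle accessed through a const Shape& reference, name() still returns "circle", but describe() on that same reference selects the Shape overload, because only the receiver is dispatched dynamically.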

Most conventional and popular languages, like C++, Java or Smalltalk, inherently support the single dispatch mechanism.

Double Dispatch

Why is anything more than single dispatch being considered? Because the real world requires it. In the real world, the behavior between two objects depends on both of them, not just one. Let's consider an example.

Your behavior would change when you face other humans, the domestic cat or the tiger. This means that your actions are dependent not only on you but also on whom you face. This cannot be incorporated using the single dispatch mechanism.

So we come up with double dispatch. It is in fact a simulation using the single dispatch mechanism, and hence is not completely extensible. Consider the following example code:

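class Human;
class Cat;

class Animal
{
    public:
        virtual ~Animal() {}

        // Entry point: this animal faces another animal of unknown type
        virtual void face (Animal& animal) = 0;

        // Second hop: the argument is the animal facing this one
        virtual void face (Human& human) = 0;
        virtual void face (Cat& cat) = 0;
};

class Human : public Animal
{
    public:
        virtual void face (Animal& animal)
        {
            // *this is known to be a Human; let the other animal resolve its own type
            animal.face(*this);
        }

        virtual void face (Human& human)
        {
            // shakehand
        }

        virtual void face (Cat& cat)
        {
            // both types are known: a cat is facing a human
        }
};

class Cat : public Animal
{
    public:
        virtual void face (Animal& animal)
        {
            // *this is known to be a Cat; let the other animal resolve its own type
            animal.face(*this);
        }

        virtual void face (Human& human)
        {
            // both types are known: a human is facing a cat
        }

        virtual void face (Cat& cat)
        {
            // run
        }
};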

This code works. What is done here is that two calls are used to identify both the types involved. Consider this code:

Cat cat;
Human human;
Animal& acat = cat;
Animal& ahuman = human;
ahuman.face(acat);

Here, when ahuman.face(acat) is invoked, it in turn invokes Cat::face(Human&) at which point both the types are determined. This is the double dispatch mechanism.

However, as you can see, the biggest disadvantage is that the base class Animal has to know all the derived classes. Every time a new animal is added, the interface of Animal has to change, which makes the approach impractical. This is exactly what the Dependency Inversion Principle advises us to avoid.

Multiple Dispatch

So we need multiple dispatch, also called multimethods. The multiple dispatch mechanism considers all parameters equally and hence can provide easier and more extensible implementations. Some of the languages that support multiple dispatch are Common Lisp, Dylan, Nice, Scheme and Slate.

One of the common designs of multiple dispatch is to separate the methods from the class (which contains the structure). This allows for treating all the parameters equally.
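C++ has no built-in multimethods, but the idea of keeping the methods outside the classes can be approximated with a lookup table keyed on the runtime types of both arguments. The following is only a sketch under that assumption: faceTable, registerFace and the behaviour functions are hypothetical names, the "pet the cat" behaviour is invented for the example, and the lookup matches exact types only (no fallback to base classes, unlike real multimethod systems).

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>

struct Animal { virtual ~Animal() = default; };  // polymorphic, so typeid sees the runtime type
struct Human : Animal {};
struct Cat : Animal {};

using Key = std::pair<std::type_index, std::type_index>;
using Handler = std::function<std::string(Animal&, Animal&)>;

// The dispatch table lives outside the classes; Animal knows nothing about it.
std::map<Key, Handler>& faceTable() {
    static std::map<Key, Handler> table;
    return table;
}

// Register a behaviour for the exact pair of types (A, B).
template <typename A, typename B>
void registerFace(std::string (*fn)(A&, B&)) {
    Key key{std::type_index(typeid(A)), std::type_index(typeid(B))};
    faceTable()[key] = [fn](Animal& a, Animal& b) {
        return fn(static_cast<A&>(a), static_cast<B&>(b));
    };
}

// Look up the behaviour using the runtime types of both arguments.
std::string face(Animal& a, Animal& b) {
    Key key{std::type_index(typeid(a)), std::type_index(typeid(b))};
    auto it = faceTable().find(key);
    if (it == faceTable().end()) {
        return std::string("no behaviour registered");
    }
    return it->second(a, b);
}

// Behaviours are free functions; adding a new animal adds entries, not Animal methods.
std::string humanMeetsHuman(Human&, Human&) { return "shake hands"; }
std::string humanMeetsCat(Human&, Cat&)     { return "pet the cat"; }
std::string catMeetsCat(Cat&, Cat&)         { return "run"; }

void registerDefaults() {
    registerFace(&humanMeetsHuman);
    registerFace(&humanMeetsCat);
    registerFace(&catMeetsCat);
}
```

After registerDefaults(), calling face() with a Human and a Cat held through Animal references selects humanMeetsCat, and no class had to list the other types in its interface.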

Some of the conventional languages also support multimethods through extensions – Multimethods for Python, Multimethods for Perl, MultiJava, Ruby, C++ with Multimethods.

Copyright Abhijit Nadgouda.

Overcautious Coding

Michael Feathers illustrates nicely the problem of overcautious coding. The following should probably be coding principles:

Spurious null checks are a symptom of bad code.

The reactive thing to do whenever there is a core dump or crash because of nulls or empty pointers is to add a null check and wash your hands of it. The right thing would be to find the origin of the null and avoid it as much as possible. It is not only that the extra if-endif can cost performance; it can also inadvertently cover up the offending code that generates the null.

It is also easier to program to an interface where the callee does not return nulls, so that the multiple callers don’t have to check for them.
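One way to design such an interface, sketched here in C++ with a hypothetical OrderBook class, is to return a reference to an empty collection instead of a null pointer when there is nothing to return:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical store; the names and layout are assumptions for illustration.
class OrderBook {
public:
    void add(const std::string& customer, int orderId) {
        orders_[customer].push_back(orderId);
    }

    // Returns a reference to an empty list for unknown customers instead of a null pointer.
    const std::vector<int>& ordersFor(const std::string& customer) const {
        static const std::vector<int> none;
        auto it = orders_.find(customer);
        return it == orders_.end() ? none : it->second;
    }

private:
    std::map<std::string, std::vector<int>> orders_;
};

// The caller needs no null check: an unknown customer simply has zero orders.
int countOrders(const OrderBook& book, const std::string& customer) {
    return static_cast<int>(book.ordersFor(customer).size());
}
```

Every caller of ordersFor can iterate or count directly, so the null check disappears from all call sites rather than being repeated in each of them.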

Generate Regular Expressions

Roy Osherove has created Regulator, a regular expression generator. It has a nice visual interface for specifying the data and the parsing rules from which the regular expression is created. If you want to know more about regular expressions head over here. Developers can use this to be more productive in creating regular expressions and then use them in their own programming environment.

The Regulator is an advanced, free regular expressions testing and learning tool written by Roy Osherove.
It allows you to build and verify a regular expression against any text input, file or web, and displays matching, splitting or replacement results within an easy to understand, hierarchical tree.

However, in such applications the real challenge is to create a rich UI that can express all the possible formats of data and all the possible ways of specifying the parsing rules. More fiddling with it will tell us whether it is really up to the challenge, but a brief first look has been a pleasing experience.

You will also find a link to a more ambitious project – Regulazy.

Regulazy is an attempt to build a small “Expert System” for creating .NET Regular Expressions.
It lets the user create an expression based on a real life example of text they would like to parse.
As the user interacts with Regulazy using the mouse, Regulazy offers the user possible expressions that would fit the currently selected text. As the user selects “rules” to apply on various parts of the text,
a regular expression is built automatically in the lower application pane.

Regulazy goes a step further than Regulator and suggests the best matches for real-life data. Users can use sample text to specify the kind of data and see a real-time preview of the results. It is still in the alpha stage.

Regular expressions themselves are technology agnostic, so I really wish that both these tools were not OS or platform specific. I will try to see if they work on other OSs by using Mono.

Importance Of Coding Standards

Dr. Dobb’s Portal has an article on code quality improvement via coding standards conformance. The concept of coding standards is considered a part of development best practices but the way it is implemented is a pet subject for controversy. In some cases improper coding standards policies have been known to be harmful or less productive.

Purpose

The purpose of implementing a coding standard is to make the code homogeneous throughout the project so that it is more maintainable. If code were never looked at again once written, the value of coding standards would diminish. However, the reality is that code has bugs, which might be fixed by someone else, and software evolves through its entire product cycle, which leads to evolution and reuse of the existing code base. If the code is presented consistently, it is easier to read and understand, even for someone else.

This means that a coding standard is not only about naming conventions. It should define what a block of code should look like, and it should define indentation and policies that avoid complex code such as multiple or nested ifs or incomprehensible loops. For example, a switch/select-case instead of a huge block of if-else if-else might be useful in languages like C++ and Java. Today all higher-level languages support multiple constructs for flow control.

For Whom?

Ideally coding standards should be used by everyone in every project. However, their perceived value might change depending on the size and skill set of the team. Even in a one-man project, coding standards help in keeping the code clean over a longer period of time, but in larger projects they are a must. It is a requirement today to have people of varying skills in a project. Coding standards keep everyone on the same page and make the code comprehensible to everyone.

Coding standards become of prime importance for an open and distributed team. A lot of open source projects accept contributions from volunteers and users, and such a project will become a disaster if coding standards are not enforced. A good example is the coding conventions put forth by Velocity.

The Problem

There are two main reasons why coding standards are opposed:

  • Clash with personal style: A lot of engineers develop their own style of programming, which might be best for themselves. However, in a team it might not be very fruitful. An individual should adopt what is good for the team, whether it clashes with personal style or not.
  • Too much work: Following the coding standards feels like too much work if their value is not being seen. And this is true sometimes. Wrong policies can often lead to an increase in time and effort without an equivalent increase in value.

The Solution

The best way to solve these problems is to include all the team members when the coding standards are formulated. If it is not possible to include everyone, all should at least be educated on the coding standards and their advantages and asked for their feedback.

Coding standards, just like software processes, should be customized for every project. Different aspects of the project like team size, team skill set, deadline and project nature (distributed, onsite/offshore, …) should be considered. Using one-for-all coding standards usually results in misalignment between the actions and the project’s goal. Sometimes this can also require customization of project management or organizational techniques.

A project configuration contains multiple policies and coding standards might overlap with some of them, e.g., exception-handling policy is closely related with coding standards. Specific care should be taken so that they don’t clash with each other.

Various tools are available to check compliance with coding standards. However, tools can never guarantee complete compliance. The best thing is to conduct code reviews, either by peers or seniors, and to encourage interaction among the developers.

Resource Acquisition Is Initialization

Software programs have to deal with resources (memory, files, mutexes, semaphores, database connections, …) that a computer provides. To make sure that the resources are available to all the programs and are optimally used, the following steps are required to use a resource:

  • Acquire a resource
  • Use the resource
  • Release the resource

By acquiring a resource the program stakes its claim to it, and by releasing it the program gives that claim up. If the resource is not released, it usually ends up causing memory leaks, deadlocks or sometimes crashes. Memory is the most commonly used resource, and memory leaks are commonly witnessed. Resource Acquisition Is Initialization (RAII) is an idiom that provides a protocol for acquiring and releasing resources.

Encapsulates Allocation and Deallocation

The classes that are written usually provide a level of abstraction over the physical resources available. RAII allows us to encapsulate the acquisition and release of these resources. A popular example is that of using a database connection. If connections are acquired from the connection pool and not released back, the pool eventually runs out of connections and applications cannot continue working with the database. This is more probable if you leave this responsibility to the user of your class (another programmer). Here is an example in C++ that uses RAII to acquire the connection and release it:

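class MyRecords
{
    private:
       // Owned by this object for its whole lifetime
       DBConn* connection;

       // Declared but not defined: copying would release the connection twice
       MyRecords(const MyRecords&);
       MyRecords& operator=(const MyRecords&);

    public:
       MyRecords()
       {
           // Acquire the resource during initialization
           connection = new DBConn();
       }

       ~MyRecords()
       {
           // Release the resource when the object is destroyed
           delete connection;
       }
};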

When this class is used, it guarantees release of the connection when its destructor is called. The destructor runs when a heap-allocated object is deleted, and it is invoked automatically when the object is a local variable that goes out of scope. Your user now does not need to worry about the database connection, nor do you have to worry about whether he/she has released it, a lapse which can lead to undesirable situations.

void useMyRecords()
{
    // Allocated on stack
    MyRecords records;
	
    ....
	
    // Will run out of scope beyond this method
}

The same can be applied to any resource that a program might work with.

Gives Control To You

The name RAII tells just half the story: the idiom also covers releasing the resource. For your user, the two different steps of initializing your object and acquiring the resource have been combined into one, and in fact completely encapsulated. What this also means is that your user does not have to worry about unavoidable circumstances like exceptions. Stack variables automatically go out of scope and their destructors are invoked.
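The exception case can be shown with a small C++ sketch. ScopedConnection and the counter g_connectionsInUse are hypothetical stand-ins for a real pooled resource. The point is only that the destructor runs during stack unwinding, so the resource is released even though the function never reaches its end.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for a pooled resource: counts how many are currently held.
int g_connectionsInUse = 0;

class ScopedConnection {
public:
    ScopedConnection()  { ++g_connectionsInUse; }   // acquire in the constructor
    ~ScopedConnection() { --g_connectionsInUse; }   // release in the destructor
    ScopedConnection(const ScopedConnection&) = delete;
    ScopedConnection& operator=(const ScopedConnection&) = delete;
};

void riskyWork() {
    ScopedConnection conn;
    // No explicit release anywhere: stack unwinding runs ~ScopedConnection().
    throw std::runtime_error("query failed");
}

// Returns false if riskyWork() threw; the connection is released either way.
bool runSafely() {
    try {
        riskyWork();
    } catch (const std::runtime_error&) {
        return false;
    }
    return true;
}
```

After runSafely() returns, g_connectionsInUse is back to zero even though riskyWork() exited through an exception.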

It is not always necessary to acquire a resource in the constructor, it might be done in any other intermediate methods. In this case, care should be taken to check if the resource acquisition was successful before releasing it.

Less and cleaner code is just one of the side effects; the real benefit is that you, as owner of your class, get complete control of the resources that your class uses. This is especially beneficial if you author a library or a framework.

Where is it applicable?

RAII needs to be supported by a language to be used. Generally, a language which uses non-deterministic garbage collection, like Java, does not support RAII. If the language supports use of custom-defined types for local variables (automatic objects in C++), you can take advantage of the automatic invocation of destructors when they go out of scope. It is also supported by object models like COM, which rely on reference counting.

Why Binary Searches Can Break

Binary searches and mergesorts can break, in fact, most of them will break because they follow the same method of calculating the mid-point for the search. It is usually calculated as average of low and high values. Here is what Josh Bloch says (the example uses Java):

The bug is in this line:

int mid = (low + high) / 2;

In Programming Pearls Bentley says that the analogous line “sets m to the average of l and u, truncated down to the nearest integer.” On the face of it, this assertion might appear correct, but it fails for large values of the int variables low and high. Specifically, it fails if the sum of low and high is greater than the maximum positive int value (2^31 - 1). The sum overflows to a negative value, and the value stays negative when divided by two. In C this causes an array index out of bounds with unpredictable results. In Java, it throws ArrayIndexOutOfBoundsException.

Very interesting. Whatever the size of the integer, there will always be a condition in which the sum of low and high overflows it. Josh also gives some suggestions:

So what’s the best way to fix the bug? Here’s one way:

int mid = low + ((high - low) / 2);

Probably faster, and arguably as clear is:

int mid = (low + high) >>> 1;

In C and C++ (where you don’t have the >>> operator), you can do this:

mid = ((unsigned) (low + high)) >> 1;

A classic example where the code design can do the right or the wrong thing. It is important to verify that the code works in all cases. While it is sometimes impossible to consider every case and we have to work with a subset, there is no guarantee that the subset will remain the same in future. That is why, in my opinion, as part of the maintenance of the application, the code should be verified against the new computing environment. Today, software is re-released for changes in business requirements or the business environment, but it is also important to consider the changing computing environment.
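For reference, here is a minimal C++ sketch of a binary search that uses the first of the suggested fixes, low + (high - low) / 2. The function names are hypothetical, and the search uses a half-open range, which is a choice made for this sketch and not part of the quoted code.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Midpoint that cannot overflow as long as 0 <= low <= high.
int safeMid(int low, int high) {
    return low + (high - low) / 2;
}

// Returns the index of key in the sorted vector, or -1 if it is absent.
std::ptrdiff_t binarySearch(const std::vector<int>& sorted, int key) {
    std::size_t low = 0;
    std::size_t high = sorted.size();               // half-open range [low, high)
    while (low < high) {
        std::size_t mid = low + (high - low) / 2;   // same overflow-free form
        if (sorted[mid] < key) {
            low = mid + 1;
        } else if (sorted[mid] > key) {
            high = mid;
        } else {
            return static_cast<std::ptrdiff_t>(mid);
        }
    }
    return -1;
}
```

With low = INT_MAX - 1 and high = INT_MAX, safeMid returns INT_MAX - 1, whereas the naive (low + high) / 2 would overflow, which is undefined behaviour for signed integers in C++.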

Libraries, Toolkits And Frameworks

We come across a mesh of nomenclature regarding libraries everyday – Application Programming Interface (API), libraries, toolkits, frameworks, and probably some more. So, what is the difference?

Some of these terms do overlap. An API is the interface provided by the library, toolkit or framework. This interface usually works like a contract between the creator and the user. Library is a generic term for a packaged group of functionality. Libraries together can form a toolkit or a framework. The Gang of Four (GoF) explain the difference between toolkits and frameworks in Design Patterns – Elements of Reusable Object-Oriented Software:

A toolkit is a set of related and reusable classes designed to provide useful, general purpose functionality. Toolkits don’t impose a particular design on your application; they just provide the functionality that can help your application do its job. They let you as an implementer avoid recoding common functionality. Toolkits emphasize code reuse. They are the object-oriented equivalent of subroutine libraries.

A framework is a set of cooperating classes that make up a reusable design for a specific class of software. The framework dictates the architecture of your application. It will define the overall structure, its partitioning into classes and objects, the key responsibilities thereof, how the classes and objects collaborate, and the thread of control. A framework predefines these design parameters so that you, the application designer/implementer, can concentrate on the specifics of your application. The framework captures the design decisions that are common to its application domain. Frameworks thus emphasize design reuse over code reuse, though a framework will usually include concrete subclasses you can put to work immediately.

You commit to a functionality when you use a toolkit, whereas you commit to the design when you use a framework. That is why it is of prime importance to choose the right option for the job. A wrong choice can ruin the entire effort.
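The contrast can be sketched in a few lines of C++. Everything here is hypothetical and only illustrates the distinction: join is a toolkit-style helper that the application calls when it chooses to, while Application is a framework-style base class that owns the thread of control and calls back into the application's code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toolkit style: a reusable helper the application calls when it chooses to.
std::string join(const std::vector<std::string>& parts, const std::string& sep) {
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i > 0) out += sep;
        out += parts[i];
    }
    return out;
}

// Framework style: the base class fixes the sequence of steps and calls the hooks.
class Application {
public:
    virtual ~Application() = default;

    std::string run() {
        std::vector<std::string> log;
        log.push_back(setUp());
        log.push_back(handleRequest());
        log.push_back(tearDown());
        return join(log, " > ");
    }

protected:
    virtual std::string setUp() { return "setUp"; }
    virtual std::string handleRequest() = 0;   // the application fills this in
    virtual std::string tearDown() { return "tearDown"; }
};

class MyApp : public Application {
protected:
    std::string handleRequest() override { return "MyApp::handleRequest"; }
};
```

MyApp never decides when handleRequest runs; by deriving from Application it has committed to the framework's design, whereas using join commits it to nothing beyond that one function.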

This is also important for the creators. Since a framework provides design reuse, it is imperative that it is flexible. Otherwise, it might limit the programmer more than it enables him/her.

This One Is For Programmers

Curt Hibbs points to some interesting questions asked by Stiff and answered by a who’s who of the programming domain.

They are not golden rules or the word of God, but they can definitely serve as guidelines for some of us, and it is definitely fun to see what the great guys think. The most interesting for me was the question:

What do you think is the most important skill every programmer should possess?

As a fresher, I would have been surprised by the answers. They talk about communication, sense of value, taste, passion and concentration, but none of them mentions knowing the syntax. What we learn from this is that programming goes far beyond code and syntax; it is about understanding concepts and their application. Not that code and syntax are unimportant, but they are not the key to becoming a good developer. Another good thing that comes out is that most of them suggest some mathematical background for computing; after all, the fundamentals have to be learned somewhere. However, not all subscribe to it. In my opinion, it depends on the field one is working in. Programming pure business applications using highly abstracted programming languages or Domain Specific Languages (DSLs) might not require math. However, analysis, optimization and algorithms all have their base in math.

Good questions and good answers, some of which are useful if you have any doubts about programming.

API Reference Heaven

Found a very useful site for API reference – gotAPI (via Curt’s Comments). It is simply great to get all the searchable documentation for scripts and programming languages in one place.

The unique thing about this is that the documentation is obtained directly from the source and is not duplicated. It covers a multitude of languages, right from markup languages to scripts to high-level language frameworks, and there is more to come. Give it to the syntax seekers!

Endianness

Just as we have to read from left to right or right to left depending upon the language we are reading, programs should know how integers are stored before reading binary data. This is endianness. Just as with languages, it is just a matter of convention. The two schemes are big endian and little endian.

In the big endian version, the most significant byte (MSB) is stored in the lowest memory address, whereas in the little endian version the least significant byte (LSB) is stored in the lowest memory address. Here is an example:

The 32-bit integer 2D441B36 has to be stored at memory address 400. In the big endian scheme, 2D is stored in 400, 44 in 401 (1 byte offset) and so on. In the little endian scheme, the bytes are stored in the order 36, 1B, 44 and 2D.
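The layout described above can be inspected directly. This is a small C++ sketch with a hypothetical helper, bytesInMemory, that copies the integer's bytes out in address order; which of the two orders you see depends on the machine running it.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Copies the in-memory bytes of value, lowest address first.
std::array<unsigned char, 4> bytesInMemory(std::uint32_t value) {
    std::array<unsigned char, 4> bytes{};
    std::memcpy(bytes.data(), &value, sizeof value);
    return bytes;
}
```

For bytesInMemory(0x2D441B36u), a big endian host yields 2D 44 1B 36 and a little endian host yields 36 1B 44 2D.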

Unfortunately, both are being actively implemented and programmers need to know the endianness of a system. Here is a program in C to differentiate between big endian and little endian systems:

int isBigEndian()
{
    /* assign 00000001 */
    int sample = 1;
	
    /* convert to equivalent character array */
    char *sample_array = (char *) &sample;
	
    /* if little endian, the lowest memory address will contain
       the least significant byte, i.e., 01 */
    if (sample_array[0] == 1)
    {
        return 0;
    }
    else
    {
        return 1;
    }
}

Endianness becomes a critical issue for portable code when it is targeted at multiple architectures.

In addition to computer architectures, endianness must be considered in network protocols like TCP. If the byte ordering is not considered, it leads to the well-known NUXI problem. Here is a function to swap between the two schemes by using bit-wise operations:

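unsigned int swapByteOrder(unsigned int sample)
{
    /* unsigned avoids sign problems when shifting; assumes a 32-bit int */
    unsigned int sample_reverse = ((sample & 0xff000000u) >> 24) | /* MSB moves to LSB */
                                  ((sample & 0x00ff0000u) >> 8) |
                                  ((sample & 0x0000ff00u) << 8) |
                                  ((sample & 0x000000ffu) << 24);   /* LSB moves to MSB */

    return sample_reverse;
}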
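An alternative to swapping, sketched here in C++ with hypothetical function names, is to never depend on the host's byte order at all when writing data for the network. TCP/IP headers carry multi-byte fields most significant byte first, and shifts operate on values, not on memory layout, so the same code produces the same bytes on any host.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Serialises value most significant byte first, regardless of host endianness.
std::array<std::uint8_t, 4> toBigEndianBytes(std::uint32_t value) {
    std::array<std::uint8_t, 4> bytes{};
    bytes[0] = static_cast<std::uint8_t>(value >> 24);
    bytes[1] = static_cast<std::uint8_t>(value >> 16);
    bytes[2] = static_cast<std::uint8_t>(value >> 8);
    bytes[3] = static_cast<std::uint8_t>(value);
    return bytes;
}

// Rebuilds the integer from bytes received most significant byte first.
std::uint32_t fromBigEndianBytes(const std::array<std::uint8_t, 4>& bytes) {
    return (static_cast<std::uint32_t>(bytes[0]) << 24) |
           (static_cast<std::uint32_t>(bytes[1]) << 16) |
           (static_cast<std::uint32_t>(bytes[2]) << 8)  |
            static_cast<std::uint32_t>(bytes[3]);
}
```

Because neither function reads the integer's memory representation, no endianness test is needed before calling them.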
