
Tuesday, May 27, 2008

GridGain 2.0.3 Released with Grid-Enabled Executor Service

GridGain 2.0.3 has been released. For the most part it is a stability release that underwent a tremendous amount of testing, and as a result it greatly improves the overall fault tolerance and scalability of the product. We tested all sorts of scenarios in which nodes kept joining and leaving the grid at will under significant load, and we introduced a lot of performance improvements.

However, this release does have a new feature I am very excited about - a Grid-Enabled ExecutorService, which executes all the tasks submitted to it on remote grid nodes. Basically, you use it as you would normally use java.util.concurrent.ExecutorService, but you get all the cool GridGain features right out of the box, such as peer class loading, fault tolerance, load balancing, job scheduling, and collision resolution.

Here is a "Hello World" example that shows how simple it is to use it. Let's first create a simple java.util.concurrent.Callable that prints out a string and returns the number of characters in that string:


import java.io.Serializable;
import java.util.concurrent.Callable;

public class ExampleCallable implements Callable<Integer>, Serializable {
    /** String argument. */
    private String arg;

    public ExampleCallable(String arg) {
        this.arg = arg;
    }

    public Integer call() {
        // Print out string passed in.
        System.out.println(arg);

        // Return number of characters.
        return arg.length();
    }
}

Now, let's execute our ExampleCallable on the grid:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

import org.gridgain.grid.*;

public final class GridExecutorExample {
    // Future.get() can throw InterruptedException and ExecutionException,
    // so main declares the broader Exception rather than only GridException.
    public static void main(String[] args) throws Exception {
        // Start grid node.
        GridFactory.start();

        try {
            Grid grid = GridFactory.getGrid();

            // Create new grid-enabled ExecutorService.
            ExecutorService exec = grid.newGridExecutorService();

            // Execute ExampleCallables on the grid.
            Future<Integer> hello = exec.submit(new ExampleCallable("Hello"));
            Future<Integer> world = exec.submit(new ExampleCallable("World"));

            // Print out number of characters from both executions.
            System.out.println("'Hello' character count: " + hello.get());
            System.out.println("'World' character count: " + world.get());

            // Close executor service.
            exec.shutdown();
        }
        finally {
            // Stop grid node.
            GridFactory.stop(true);
        }
    }
}

To make it interesting, let's start a couple of standalone grid nodes by simply executing the gridgain.sh or gridgain.bat script in the GRIDGAIN_HOME/bin folder (you can start them on the same physical box if you like).

Note that we don't need to do any deployment of our code to the grid. All required classes will be peer-class-loaded automatically.

After running our example, you should observe that one node prints the word "Hello" and another node prints the word "World".
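
Since the grid-enabled executor is used through the standard java.util.concurrent.ExecutorService interface, the same kind of Callable can be tried out locally first. The sketch below is a local stand-in only: it assumes a plain JDK thread pool (Executors.newFixedThreadPool) in place of grid.newGridExecutorService(), and the names LocalExecutorSketch, CharCountCallable and totalChars are hypothetical, not part of GridGain.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class LocalExecutorSketch {
    /** Same shape as ExampleCallable: prints the string and returns its length. */
    static class CharCountCallable implements Callable<Integer>, Serializable {
        private static final long serialVersionUID = 1L;

        private final String arg;

        CharCountCallable(String arg) {
            this.arg = arg;
        }

        @Override
        public Integer call() {
            System.out.println(arg);
            return arg.length();
        }
    }

    /** Submits one callable per word and sums the returned character counts. */
    static int totalChars(ExecutorService exec, String... words) throws Exception {
        List<Future<Integer>> futures = new ArrayList<>();
        for (String word : words) {
            futures.add(exec.submit(new CharCountCallable(word)));
        }

        int total = 0;
        for (Future<Integer> future : futures) {
            // get() blocks until the corresponding callable has finished.
            total += future.get();
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: a local two-thread pool stands in for the grid executor.
        ExecutorService exec = Executors.newFixedThreadPool(2);
        try {
            System.out.println("Total character count: " + totalChars(exec, "Hello", "World"));
        }
        finally {
            exec.shutdown();
        }
    }
}
```

Because totalChars depends only on the ExecutorService interface, passing it the executor returned by grid.newGridExecutorService() should be the only change needed to move the work onto the grid.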

Enjoy!

 

Monday, May 19, 2008

GridGain: To Split Or Not To Split

When designing your tasks for execution on the grid you need to decide whether or not you need to split your task into smaller jobs for parallel execution, and what the size of your split should be.

When Not To Split

People often think of a grid as infrastructure for executing long-running tasks. That is not always the case. Compute grids can add a lot of value even for quick, short-running jobs. Imagine, for example, that your application constantly needs to calculate a bunch of statistical metrics and averages on a financial portfolio in real time, say for displaying them in a UI. Although no single calculation may take very long, having all calculations performed concurrently on the same server or thick client can bring it to its knees fairly quickly. A good solution would be to take every calculation and execute it on a separate grid node.

The example above is a good case for when not to split your tasks. When deciding whether to split or not, you should take into consideration the time of local execution. If your task can execute locally fast enough, then don't split it; run it on the grid as a whole. By doing that you get the following benefits:

You remove a single point of failure. If a grid node crashes in the middle of a calculation, GridGain will automatically fail the job over to another node.

You balance the load across your grid nodes. GridGain will automatically load balance your jobs to the least loaded nodes. You can also turn on job stealing and have less loaded nodes steal jobs from more loaded nodes.

You improve the overall scalability of your application. You can add or remove grid nodes on demand whenever your application load peaks or slows down and, hence, keep the response times constant regardless of the load. For example, you can configure your grid to include more nodes in the topology as application load grows.

You get GridGain's simplicity. Here is how simple it can be to execute a portfolio calculation on the grid. Note that all you have to do is attach the @Gridify annotation to your Java method, and that's it!

@Gridify
public void calculatePortfolioPosition() {
...
}

How To Split

Now, let's say you really need to split your task into smaller jobs in order to speed up execution. A good formula for the size of your split is to take the time your task takes to execute locally and divide it by the time you would like to achieve. So if your task executes in 2 seconds and you would like to achieve 100 milliseconds, then your task should split into 2000 ms / 100 ms = 20 jobs. In reality, the execution time will be slightly more than 100 ms, as your jobs will most likely not be exactly equal in size and there is also a slight network communication overhead.

That is not to say that for this example you would only need to have 20 nodes in the grid. Ideally you should have as many nodes as your application needs in order to handle the load - let GridGain pick the 20 most available nodes for execution of the individual jobs within your task.
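
The split-size arithmetic above can be written down as a small helper. This is only a sketch: the names SplitSizeSketch and splitCount are hypothetical and not part of GridGain, and rounding up is an assumption made here so that the job count is never lower than the target time calls for.

```java
public class SplitSizeSketch {
    /**
     * Number of jobs to split a task into: local execution time divided by
     * the desired execution time, rounded up, and never less than one.
     */
    static int splitCount(long localMillis, long targetMillis) {
        if (localMillis <= 0 || targetMillis <= 0) {
            throw new IllegalArgumentException("Times must be positive.");
        }

        // Integer ceiling division: (a + b - 1) / b.
        long jobs = (localMillis + targetMillis - 1) / targetMillis;

        return (int) Math.max(1, jobs);
    }

    public static void main(String[] args) {
        // The example from the text: 2000 ms locally, 100 ms desired.
        System.out.println("Jobs: " + splitCount(2000, 100)); // prints "Jobs: 20"
    }
}
```

A task that already runs faster locally than the target time yields a count of 1, which corresponds to the "don't split" case described earlier.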

For more information, visit our Wiki or watch the "Grid Application In 15 Minutes" screencast.

 

Wednesday, May 14, 2008

Master-Worker in Peer-To-Peer Architecture

We sometimes get questions from users on how to implement the Master-Worker pattern within the peer-to-peer (P2P) architecture of GridGain. When designing our API and our deployment model, we purposely went with a P2P architecture because we wanted to have ultimate freedom in how a grid node is used. As a result, in GridGain a node can act as master or worker or both, depending on your configuration. Moreover, you don't even have to change a single line of code to get this to work.

The following example shows how it can be done. In GridGain every node has a notion of attributes, which it gets at startup. Here is an example that shows how a node can get a "worker" attribute from a Spring XML configuration file at startup:


<bean
    id="grid.cfg"
    class="org.gridgain.grid.GridConfigurationAdapter"
    scope="singleton">
    ...
    <property name="userAttributes">
        <map>
            <!-- Make this node a worker node. -->
            <entry key="segment.worker" value="true"/>
        </map>
    </property>
    ...
</bean>

Now we need to make sure that only worker nodes are passed into the GridTask.map(...) method on the master nodes. To do this, on the master nodes we need to configure GridAttributesTopologySpi, the purpose of which is to filter nodes based on their attributes. Here is what the configuration looks like:

<bean
    id="grid.cfg"
    class="org.gridgain.grid.GridConfigurationAdapter"
    singleton="true">
    ...
    <property name="topologySpi">
        <bean class="org.gridgain.grid.spi.topology.attributes.GridAttributesTopologySpi">
            <property name="attributes">
                <map>
                    <!-- Include only worker nodes. -->
                    <entry key="segment.worker" value="true"/>
                </map>
            </property>
        </bean>
    </property>
    ...
</bean>

That's it! To verify that it works, we can add assertions to our GridTask implementation to make sure that all included nodes are indeed "worker" nodes, as follows:

public class FooBarGridTask
    extends GridTaskAdapter<String, String> {
    ...
    public Map<GridJob, GridNode> map(
        List<GridNode> topology, String arg) {

        Map<GridJob, GridNode> jobs =
            new HashMap<GridJob, GridNode>(topology.size());

        for (GridNode node : topology) {
            String workerAttr =
                node.getAttribute("segment.worker");

            // Assert that worker attribute is present and
            // is assigned value "true".
            // Note: Boolean.parseBoolean parses the string itself, whereas
            // Boolean.getBoolean would look up a JVM system property by that name.
            assert workerAttr != null;
            assert Boolean.parseBoolean(workerAttr);

            jobs.put(new FooBarWorkerJob(arg), node);
        }

        return jobs;
    }
    ...
}

Note that although we segmented the grid into only two segments, masters and workers, you can configure as many segments as you like by providing additional node attributes. For example, you can have several worker groups, each responsible for processing only a certain subset of jobs. Take a look at the "Segmenting Grid Nodes" article on our Wiki for additional examples.
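
To make the idea of attribute-based segments concrete, here is a minimal, self-contained sketch of the filtering logic. It does not use the GridGain API: the Node class is a hypothetical stand-in for a grid node, segment is an illustrative helper, and the "segment.reports" attribute is an invented example of a second worker group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class SegmentFilterSketch {
    /** Hypothetical stand-in for a grid node: an ID plus string attributes. */
    static final class Node {
        final String id;
        final Map<String, String> attrs;

        Node(String id, Map<String, String> attrs) {
            this.id = id;
            this.attrs = attrs;
        }
    }

    /** Returns the nodes whose attribute with the given key is set to "true". */
    static List<Node> segment(List<Node> topology, String key) {
        List<Node> result = new ArrayList<>();

        for (Node node : topology) {
            // parseBoolean(null) is false, so nodes without the attribute are skipped.
            if (Boolean.parseBoolean(node.attrs.get(key))) {
                result.add(node);
            }
        }

        return result;
    }

    public static void main(String[] args) {
        List<Node> topology = Arrays.asList(
            new Node("n1", Collections.singletonMap("segment.worker", "true")),
            new Node("n2", Collections.singletonMap("segment.reports", "true")),
            new Node("n3", Collections.<String, String>emptyMap()));

        for (Node node : segment(topology, "segment.worker")) {
            System.out.println("worker: " + node.id);
        }
    }
}
```

Each additional segment is just another attribute key, so the same filter can select any worker group by passing a different key.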

Enjoy!

 

Wednesday, May 7, 2008

GridGain Vs. Hadoop (Continued)




Recently Hadoop posted a HadoopVsGridgain comparison page on its Wiki. I have always been a big fan of Hadoop. Although I believe that the product is very hard to use and its APIs are far from obvious, I still think they have achieved quite a lot, and the fact that Yahoo Search runs on Hadoop proves that the system works and scales quite well. However, this ridiculous "comparison" threw me off a bit, and the only reason I can think of for putting it up is that GridGain has started cutting significantly into their user base.

Generally such comparisons from vendors look plain silly. Nobody expects a vendor to be fair when talking about a perceived competitor and needless to say many points made on that page are wrong. Moreover, the main differences between the two products are not even touched!

Hadoop comes with the Hadoop Distributed File System (HDFS), which is its main feature. It also comes with a MapReduce component which allows you to work with files stored on HDFS in parallel. HDFS is extremely performant and allows for scalable and fast processing of data that is stored on it. It is great for applications that can afford to put all of their data into HDFS (Yahoo Search); however, it is not at all suited for the vast majority of applications that use conventional databases such as Oracle or MySQL.

GridGain, on the other hand, is a MapReduce computational grid platform, the main feature of which is to split a task into smaller jobs and execute them in parallel on the grid. It handles node discovery and communication, peer class loading, scheduling and job collision resolution, load balancing, data affinity, transparent grid-enabling via AOP, and many other computational features out of the box. GridGain does not come with any file system of its own, but integrates with all major data grid products to provide collocation of data and computations - this is how GridGain is able to process terabytes of data stored in any database or file system.

So, the main difference between GridGain and Hadoop is that Hadoop forces you to migrate all of your data into its proprietary file system, HDFS, whereas GridGain does not and instead allows you to work with your existing databases.

I also want to add that GridGain is far simpler to use, and the APIs it provides are more natural and easier to understand, but take this with a grain of salt as I am definitely biased here :)

 

Monday, May 5, 2008

GridGain And Grid Dynamics Join Forces

GridGain and Grid Dynamics announced a partnership today.

We are looking forward to working together with Grid Dynamics, as there is a lot of synergy between the two companies. Grid Dynamics brings to the table broad grid computing expertise which, together with GridGain's open source product and professional support, will help us deliver best-of-breed, cost-effective solutions to our clients.

You can see the full press release here.

 

