New Java Framework For Data-Intensive Java on Multicore

Posted by: Emilio Bernabei on Fri Dec 22 10:56:18 EST 2006
J2EE and the myriad of other web app frameworks have served us well. Why build a web app from scratch, including bean pooling, threading, connection management, etc., when it's already done for you?

But when Java developers sit down to build a data processing application (financial services data, insurance claims, health informatics, bio-research, the works...) they have nothing. Nada. No help. Let me be more specific about the application here -- it's not an OLTP model. Not SOA or ESB based. This is bulk (GB or TB) data processing when you have minutes to spare, not hours to wait.

With the "multicore arms race" now in full swing, Java developers can no longer wait for CPU clock speed to save their application's poor performance. I blog about it in detail on my blog.

Well, I'm pleased to announce to the Java community that Pervasive DataRush Beta 1 is available for download.

DataRush is a light-weight (less than 3 MB on disk) but extremely powerful parallel processing engine framework. It's 100% Java and runs on Java 5 SE. It handles all the parallel programming for you including horizontal, vertical and pipeline parallelism. In fact, you can code many data processing applications using XML scripting and our out-of-the-box library of Java operators.

We've started benchmarking this framework against well-known algorithms out there and have found that, vs. Perl or non-threaded Java, we can cut the runtime to 1/10th of prior performance time in some cases. Not all Comp Sci problems can be made parallel, so I'm not claiming a magic wand here -- but even with the not-so-parallel algorithms, DataRush gives you pipeline parallelism (each module of your algorithm runs on a separate CPU core while data flows dynamically through them). I've posted one such benchmark on the website and will keep posting as they become available.
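
To make the pipeline parallelism described above concrete, here is a minimal sketch in plain Java. It does not use the DataRush API (none of its classes appear in this post); the class name PipelineSketch, the three stages and the queue capacity of 16 are all assumptions chosen for illustration, and the lambdas require Java 8 or later. Each stage runs on its own thread and hands data to the next stage through a bounded queue:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Three-stage pipeline (generate, square, collect) with one thread per stage. */
class PipelineSketch {
    private static final int END = -1; // end-of-stream marker; real values are >= 0

    /** A stage body that may block on a queue. */
    interface Stage { void run() throws InterruptedException; }

    /** Runs a stage on its own thread and returns that thread. */
    private static Thread start(final Stage stage) {
        Thread t = new Thread(() -> {
            try {
                stage.run();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    static List<Integer> runPipeline(final int count) {
        // Small bounded queues: a fast stage blocks instead of flooding memory.
        final BlockingQueue<Integer> raw = new ArrayBlockingQueue<>(16);
        final BlockingQueue<Integer> squared = new ArrayBlockingQueue<>(16);
        final List<Integer> results = new ArrayList<>();

        Thread source = start(() -> {
            for (int i = 0; i < count; i++) raw.put(i);
            raw.put(END);
        });
        Thread transform = start(() -> {
            for (int v = raw.take(); v != END; v = raw.take()) squared.put(v * v);
            squared.put(END);
        });
        Thread sink = start(() -> {
            for (int v = squared.take(); v != END; v = squared.take()) results.add(v);
        });

        try {
            source.join();
            transform.join();
            sink.join(); // join() makes the sink's writes to results visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
        return results;
    }
}
```

Calling PipelineSketch.runPipeline(5) yields the squares of 0 through 4 in order. Even when a single stage cannot be split across cores, the stages overlap in time, which is the benefit the post attributes to pipeline parallelism.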

Download it.

Try it.

Let me know what you think.

We've just launched the beta program so now is your chance to be heard and have your ideas change the course of DataRush.

Thanks for spreading the word!

Emilio Bernabei
Director of Product Management
Chief Evangelist, DataRush

Message was edited by: joeo@enigmastation.com

Threaded replies

    ·  Simple, powerful and saves time coding. You decide. by Emilio Bernabei on Fri Dec 22 14:15:19 EST 2006
      ·  Re: Simple, powerful and saves time coding. You decide. by Jean-Marie Dautelle on Fri Dec 22 18:24:56 EST 2006
      ·  Give us some code examples ;-) by Fabrizio Giudici on Fri Dec 22 18:27:46 EST 2006
        ·  Simple sort example by Matt Walker on Wed Dec 27 10:12:16 EST 2006
          ·  Re: Simple sort example by Emilio Bernabei on Wed Dec 27 11:03:49 EST 2006
          ·  Simple sort example. by Peter Lawrey on Sat Dec 30 08:57:21 EST 2006
            ·  It's not what you see, but what you DON'T see... by Emilio Bernabei on Sat Dec 30 15:48:46 EST 2006
      ·  I like the break down the ease of learning an API by Peter Lawrey on Sun Dec 24 07:10:47 EST 2006
Message #224524

A simpler (and free) approach to concurrent processing

Posted by: Jean-Marie Dautelle on Fri Dec 22 13:48:05 EST 2006 in response to Message #224487
Concurrent programming should be easy and, if possible, standard (not proprietary). An example of such an open framework is Javolution (only 250 KB), which provides among other things ConcurrentContext to transparently take advantage of multi-cores.

Here too, you don't have to mess with threads or synchronization. Just write your code in a concurrent manner and run it! ConcurrentContext can be disabled at runtime in order to measure its effect on execution speed.

Here is an example of concurrent/recursive quick sort illustrating how simple it is:

<T extends Comparable<? super T>> void quickSort(final FastTable<T> table) {
    final int size = table.size();
    if (size < 100) {
        table.sort(); // Direct quick sort.
    } else {
        // Splits the table in two and sorts both parts concurrently.
        final FastTable<T> t1 = FastTable.newInstance();
        final FastTable<T> t2 = FastTable.newInstance();
        ConcurrentContext.enter();
        try {
            ConcurrentContext.execute(new Logic() {
                public void run() {
                    t1.addAll(table.subList(0, size / 2));
                    quickSort(t1); // Recursive.
                }
            });
            ConcurrentContext.execute(new Logic() {
                public void run() {
                    t2.addAll(table.subList(size / 2, size));
                    quickSort(t2); // Recursive.
                }
            });
        } finally {
            ConcurrentContext.exit(); // Returns once both halves are sorted.
        }
        // Merges results.
        for (int i = 0, i1 = 0, i2 = 0; i < size; i++) {
            if (i1 >= t1.size()) {
                table.set(i, t2.get(i2++));
            } else if (i2 >= t2.size()) {
                table.set(i, t1.get(i1++));
            } else {
                T o1 = t1.get(i1);
                T o2 = t2.get(i2);
                if (o1.compareTo(o2) < 0) {
                    table.set(i, o1);
                    i1++;
                } else {
                    table.set(i, o2);
                    i2++;
                }
            }
        }
        FastTable.recycle(t1);
        FastTable.recycle(t2);
    }
}


Message #224529

Simple, powerful and saves time coding. You decide.

Posted by: Emilio Bernabei on Fri Dec 22 14:15:19 EST 2006 in response to Message #224524
Hey Jean-Marie, how's it going? I really like your direction. I want as many folks in the Java community to benchmark the various frameworks out there on several levels:

a) Speed to learn a framework
b) Speed to code a solution to a problem w/ framework
c) Ability for framework to scale efficiently on multicore

Also, to all the readers out there, I may not have stressed this enough. DataRush is for DATA-INTENSIVE problems. Javolution may be better for you in some cases.

We tried to balance speed of development (hence XML scripting language for expressing dataflow graphs) with the overhead of the framework. In some cases, maybe Javolution is lighter weight... I don't know.

To that end, some things to consider:

1. If you can use DataRush XML scripting to do the same sort operation shown here, is that valuable?

2. Here you see code for merging. DataRush does that for you as implied by the use of our join operator in the XML.

3. Benchmark it! Proof is in the pudding. I leave it to readers to balance design performance and maintenance of code vs. n-th degree speed/throughput needs.

Thanks for downloading DataRush.

Message #224536

Re: Simple, powerful and saves time coding. You decide.

Posted by: Jean-Marie Dautelle on Fri Dec 22 18:24:56 EST 2006 in response to Message #224529
> Hey Jean-Marie, how's it going? I really like your direction. I want as many folks in the Java community to benchmark the various frameworks out there...


Agreed! Multi-core is here to stay and we can expect more and more varied solutions in 2007. Most likely a new JSR will be started and a standard solution will emerge.

BTW: Really nice site with many interesting links.

Message #224537

Give us some code examples ;-)

Posted by: Fabrizio Giudici on Fri Dec 22 18:27:46 EST 2006 in response to Message #224529
Hello. I've been evaluating the "multicore switch" for some time from an architectural perspective on my blog. "Architectural perspective" means to me that an architect must be comfortable with a range of possible solutions, from home-made designs for the simpler cases to the use of an existing framework for more complex cases. One size won't fit all. So it's good that more products come to light, such as DataRush or Javolution (I'm also writing a very small framework which is specifically dedicated to parallel image processing, which is a really specific task).

But it's hard for the architect to build a deep knowledge of these tools, as there are more and more of them and deep knowledge requires time. Both DataRush and Javolution are on my radar (I think I've also been emailed by the CTO at Pervasive) and I'll be trying them ASAP. One suggestion to Emilio: I think that posting a brief piece of code, as Jean-Marie did to give a rough idea of the framework, is a very good thing. I looked for some examples at the Pervasive site, but I couldn't find any. I'd strongly suggest publishing some.

Message #224554

I like the break down the ease of learning an API

Posted by: Peter Lawrey on Sun Dec 24 07:10:47 EST 2006 in response to Message #224529
I agree that one solution will not fit all uses. Different people approach the same problem in different ways.

Essence Java Framework supports multi-threading at the component level and is externally configurable. It uses the built-in Java 5 SE concurrency libraries to support multi-threading and the number of threads in the pool for a component is configured in an external file.
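
As a rough illustration of the idea of an externally configured pool size on top of the standard java.util.concurrent library, here is a minimal sketch. It is not Essence code: the class name ConfiguredPool, the property key pattern "<component>.threads" and the default of one thread are assumptions made up for this example.

```java
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Builds a per-component thread pool whose size comes from external configuration. */
class ConfiguredPool {
    /**
     * Reads the hypothetical key "<component>.threads".
     * Falls back to a single thread when the key is absent.
     */
    static int poolSize(Properties config, String component) {
        String raw = config.getProperty(component + ".threads");
        if (raw == null) {
            return 1;
        }
        int size = Integer.parseInt(raw.trim());
        if (size < 1) {
            throw new IllegalArgumentException(component + ".threads must be >= 1");
        }
        return size;
    }

    /** Creates a fixed-size pool for the named component. */
    static ExecutorService forComponent(Properties config, String component) {
        return Executors.newFixedThreadPool(poolSize(config, component));
    }
}
```

In practice the Properties object would be loaded from a file with Properties.load, so the thread count can be changed without recompiling; the caller remains responsible for shutting the pool down.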

a) Speed to learn a framework.
Essence does not require you to learn a single Essence class. It does, however, use programming by convention. For example, you must have a constructor which takes all arguments rather than setters.
If you want to make full use of Essence, its API is still not large (see "Getting started" and "Javadoc").

b) Speed to code a solution to a problem w/ framework
A sample application with a JMS client and broker is provided. The jar for the JMS client and broker is 39K, or 354K including all required JARs. It has two one-page configuration files (60 lines in total), one for the client and one for the broker.

c) Ability for framework to scale efficiently on multi-core
As Sun have demonstrated, the out-of-the-box libraries are improved from one version of Java to the next. Essence uses these libraries directly rather than creating new ones or a new layer on top, so it can take full advantage of any enhancements (see "Update: Java 6 Leads Out of Box Server Performance").

What does it give you? Transparent high performance clustering across servers, tested in 2-way for 4-way mastering (see "Benchmarks").

Message #224608

Simple sort example

Posted by: Matt Walker on Wed Dec 27 10:12:16 EST 2006 in response to Message #224537
Here is a simple example that sorts ten randomly generated integers:


<?xml version="1.0" encoding="UTF-8"?>
<!--
(c) Copyright 2006 Pervasive Software Inc. All rights reserved.
-->
<AssemblySpecification
    xmlns="http://www.pervasive.com/xmlns/dataflow/sdk"
    schemaVersion="1.0"
    package="tests.learning"
    name="SortTest">
  <Doc>
    <Author>mwalker</Author>
    <DateCreated>2006-12-23</DateCreated>
    <Description>Demonstrates the sort operator.</Description>
  </Doc>
  <Operator>
    <Contract>
    </Contract>
    <Composition>
      <Assembly instance="source" type="com.pervasive.dataflow.operators.source.GenerateRandomRows">
        <Set target="outputType" value="int"/>
        <Set target="rowCount" value="10"/>
      </Assembly>
      <Assembly instance="sort" type="com.pervasive.dataflow.operators.sort.Sort">
        <Set target="partitionCount" value="0"/>
        <Set target="sortAlgorithm" value="merge"/>
        <Link instance="source" source="output" target="input"/>
      </Assembly>
      <Assembly instance="log" type="com.pervasive.dataflow.operators.sink.LogRows">
        <Set target="logFrequency" value="1"/>
        <Link instance="sort" source="output" target="input"/>
      </Assembly>
    </Composition>
  </Operator>
</AssemblySpecification>


Note that I've explicitly set the "partitionCount" property of the sort operator to zero, which means it will automatically assess hardware parallelism and internally exploit that information.

The results of running this code:


C:\workspace\testing>dfe -cp build\dist\testing.jar tests.learning.SortTest
2006-12-23 08:42:11.419 INFO SortTest.log.logType.customize Input type is int
2006-12-23 08:42:11.740 INFO SortTest.sort.sortDispatcher.run Sorted 10 rows in memory
2006-12-23 08:42:11.960 INFO SortTest.log.logRows.run Row 1 is -1261162224
2006-12-23 08:42:11.960 INFO SortTest.log.logRows.run Row 2 is -982258612
2006-12-23 08:42:11.960 INFO SortTest.log.logRows.run Row 3 is 15967868
2006-12-23 08:42:11.960 INFO SortTest.log.logRows.run Row 4 is 175398363
2006-12-23 08:42:11.960 INFO SortTest.log.logRows.run Row 5 is 185780738
2006-12-23 08:42:11.960 INFO SortTest.log.logRows.run Row 6 is 980354890
2006-12-23 08:42:11.960 INFO SortTest.log.logRows.run Row 7 is 1570617054
2006-12-23 08:42:11.960 INFO SortTest.log.logRows.run Row 8 is 1711199546
2006-12-23 08:42:11.960 INFO SortTest.log.logRows.run Row 9 is 1962638078
2006-12-23 08:42:11.960 INFO SortTest.log.logRows.run Row 10 is 2022775704
2006-12-23 08:42:11.960 INFO SortTest.log.logRows.run There were 10 rows sinked
2006-12-23 08:42:11.970 INFO com.pervasive.dataflow.tools.cli.DFECLI.processLeftoverArgs Job runtime: 1.815
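
The "zero means detect it for me" convention used for "partitionCount" above can be pictured with a few lines of plain Java. This is only a guess at the general idea, not DataRush's actual logic, which is not shown in this thread; the class and method names are hypothetical.

```java
/** Resolves a requested partition count, treating 0 as "use the hardware parallelism". */
class PartitionCount {
    /** Testable variant: the number of cores is passed in explicitly. */
    static int resolve(int requested, int availableCores) {
        if (requested < 0) {
            throw new IllegalArgumentException("partition count must be >= 0");
        }
        // 0 is the "auto" setting: fall back to the core count, but never below 1.
        return requested == 0 ? Math.max(1, availableCores) : requested;
    }

    /** Convenience variant that asks the JVM how many processors it can use. */
    static int resolve(int requested) {
        return resolve(requested, Runtime.getRuntime().availableProcessors());
    }
}
```

With this convention the same job description runs unchanged on a 2-core laptop and a 16-core server, while an explicit positive value still lets you pin the degree of parallelism.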


As a word of warning, I'm relatively new to DataRush, so take what I say with a grain of salt. However, it seems to me that coding in DataRush involves a new way of looking at concurrent programming.

In DataRush, you are chiefly concerned with building a network of concurrent processes. DataRush gives you a language (the XML you see above) and a standard library of operators (like sort) from which to construct your apps -- you don't have to be concerned with threading or traditional concurrency at all. In this sense, it is much "higher level," hiding not only the details of the threaded implementation, but also providing you a structure within which to operate: the process network.

Message #224610

Re: Simple sort example

Posted by: Emilio Bernabei on Wed Dec 27 11:03:49 EST 2006 in response to Message #224608
There is much, much more, but I would like to add an important side note here.

The "operators" you use in your process network could be anything from a Java class we provide, a Java class you write or an entirely new "assembly" you previously created using XML.

So you can see how the process network lends itself to reuse and how you don't have to always write Java to create new "operators" -- sometimes it's as simple as an XML dataflow snippet.

The engine will look at the nested process networks, the memory afforded to the JVM and the number of cores available, then create a more detailed (expanded) process network of parallel threads at compile time. NOTE: "compile time" for DataRush is actually during runtime... it's sort of like a pre-processing step done just prior to running the job.

Of course, that's just the beginning. At runtime, the engine has to manage in-memory queues of data as the readers stream data throughout the process networks.

There's so much to say... not enough real estate. I haven't even started into what we call "customizers" -- the way you give intelligence to your custom operators so they can self-assess parallelism strategies.

The docs are fairly robust for a Beta. So...Download.

Message #224694

Simple sort example.

Posted by: Peter Lawrey on Sat Dec 30 08:57:21 EST 2006 in response to Message #224608
The problem with simple examples is that it is often simpler to do the same thing the most obvious way.
For example, which would you say is simpler to integrate into another application such as Tomcat?
At what point would a developer say the example you provided is worth the extra effort to learn, support, etc.?

import java.util.Arrays;
import java.util.Random;

public class RandomIntegers {
    // Fills an array with random integers and sorts it on the calling thread.
    public static Integer[] getSortedRandomIntegers(int num) {
        Random random = new Random();
        Integer[] ret = new Integer[num];
        for (int i = 0; i < num; i++) {
            ret[i] = random.nextInt();
        }
        Arrays.sort(ret);
        return ret;
    }

    // Optional first argument: how many integers to generate (default 10).
    public static void main(String... args) {
        int num = args.length > 0 ? Integer.parseInt(args[0]) : 10;
        System.out.println("Random numbers length= " + num);
        System.out.println(Arrays.asList(getSortedRandomIntegers(num)));
    }
}


Message #224699

It's not what you see, but what you DON'T see...

Posted by: Emilio Bernabei on Sat Dec 30 15:48:46 EST 2006 in response to Message #224694
Peter, I agree that using a 'hello world' example is not the way to justify the ROI of learning DataRush. We were just trying to compare concurrent code constructs. I would say if your web app needs to sort 10 integers -- stick with Arrays.sort() !!

But if you want an example that better illustrates the counter point to using custom code let's use 1 million records. Each record has 200 integers. Now sort the 1 million records 100 times -- once for every 2nd field in the record.
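
To pin down what "sort the records once per field" means in plain Java, here is a small single-threaded sketch. The representation of a record as an int[] and the names FieldSort, byField and sortedBy are assumptions for illustration; the lambda requires Java 8 or later.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Sorts fixed-width integer records by one chosen field. */
class FieldSort {
    /** Orders records by the value stored at the given field index. */
    static Comparator<int[]> byField(final int field) {
        return (a, b) -> Integer.compare(a[field], b[field]);
    }

    /** Returns a sorted copy of the record list; the input order is left untouched. */
    static int[][] sortedBy(int[][] records, int field) {
        int[][] copy = records.clone(); // shallow copy: the rows themselves are shared
        Arrays.sort(copy, byField(field));
        return copy;
    }
}
```

For the scenario above one would call sortedBy(records, f) for f = 0, 2, 4, ... up to 198. Each of those 100 sorts is independent of the others, which is what makes the workload a natural candidate for running on several cores.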

Your code has no concurrency that I can see. Maybe I'm missing it -- I'm not claiming to be a J2EE expert developer. So what would Tomcat do with 2, 4, 8, 16 cores to work with? How would your code look? How would it vary its concurrent thread count based on available processors? Or would it? How would it vary the batching of data if you vary memory from 1 GB to 16 GB?
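
For reference, one hand-written answer to the thread-count question might look like the sketch below: split the array into one chunk per thread, sort the chunks on a java.util.concurrent pool, then merge. This is an illustrative assumption, not code from either framework discussed here; the class name ParallelSortSketch is made up, the merge step is sequential, and memory-based batching is not addressed at all. It requires Java 8 or later for the lambda.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sorts chunks of an int array in parallel, then merges the sorted chunks. */
class ParallelSortSketch {
    /** Uses as many threads as the JVM reports processors. */
    static int[] sort(int[] data) {
        return sort(data, Runtime.getRuntime().availableProcessors());
    }

    static int[] sort(int[] data, int threads) {
        int n = Math.max(1, threads);
        ExecutorService pool = Executors.newFixedThreadPool(n);
        try {
            List<Future<int[]>> parts = new ArrayList<>();
            int chunk = (data.length + n - 1) / n; // ceiling division
            for (int start = 0; start < data.length; start += chunk) {
                final int from = start;
                final int to = Math.min(data.length, start + chunk);
                parts.add(pool.submit(() -> {
                    int[] part = Arrays.copyOfRange(data, from, to);
                    Arrays.sort(part); // each chunk is sorted on a pool thread
                    return part;
                }));
            }
            int[] merged = new int[0];
            for (Future<int[]> part : parts) {
                merged = merge(merged, part.get()); // sequential merge, kept simple
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("sort interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("chunk sort failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    /** Standard two-way merge of two sorted arrays. */
    static int[] merge(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) out[k++] = a[i] <= b[j] ? a[i++] : b[j++];
        while (i < a.length) out[k++] = a[i++];
        while (j < b.length) out[k++] = b[j++];
        return out;
    }
}
```

Even this small version needs chunking, exception handling and a merge step that the one-line Arrays.sort() call does not, which gives a feel for the trade-off being debated in this thread.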

I would have to say sorting is 'boring' and also too simplistic a business problem. Maybe some smart sorting expert implemented the world's best Arrays.sort() method?? I guess now that Sun Java is GPL I could go look :-)

Tomcat is awesome. Don't get me wrong. But it's like using a hammer for a drill's job. Let J2EE do the real-time SOA and the Web App serving and let DataRush do the non-real-time data processing IMHO.

With the chip vendors doubling the number of cores every 18 months now, I would suggest that developers calling Arrays.sort() are not going to get the ROI their CIO is looking for... but maybe your competition will.

Where between 10 and 1 million records is the cross-over? Only a good ROI calculator knows ...hmmm there's another TODO for this holiday season....

All Content Copyright ©2007 TheServerSide