Adam Bosworth, with his 'S4' criteria for mass adoption on the web (simple, sloppy, standard, scalable), seems to have clarified the worse-is-better debate over web data formats. Mike Champion had an interesting comment on RDF and Atom's slopworthiness as compared to RSS over on David Megginson's related entry 'RSS as the HTML for data':
I can only assume that Mike's referring to recent discussion on xml-dev among other things. Over there I agreed that RDF has simplicity/comprehension issues, but pointed out with a few simple examples that RDF is a lot more tolerant of partial and missing information than some people realise. For example Daniel Steinberg, also commenting on Adam Bosworth's keynote, thinks that total agreement is a requirement:
In reality, what Daniel said is not true about RDF - RDF was designed with the unexpected in mind. A lot of this misunderstanding has had to do with early hype about the Semantic Web, which has at times sounded suspiciously like AI reborn. It also has to do with the way the benefits of RDF have been couched - critically, when WS technology adoption was on the up and up, the emphasis of Semantic Web standardization within the W3C was on formalization of the technology rather than useful applications.
RDF receives its robustness and flexibility properties from its design, and two design properties stand out.
First is the graph model that RDF is based on. All RDF data is organized as a graph, different from XML's tree-based document structures and vaguely like relational databases, but without the idea of tables. The beauty of the graph model is that it is 'additive'. That means you can keep merging new items onto the graph without having to create new data structures to support new information. Using RDF as the data model, queries and merging operations end up producing new graphs as their results, in much the same way SQL query results are also tables. More importantly it makes for a clean programming model. It's extensible and uniform. It's also 'subtractive', which means you can take data out of the graph and leave a smaller graph behind just the same way you'd remove an item from a hashmap, but without the hassle of doing something like dropping a column or table in a database (in the developer trenches, adding or dropping database columns can be the stuff of nightmares). For scalability, breaking up large graphs of data into smaller ones allows us to physically distribute datasets.
The most interesting slides in Adam Bosworth's presentation are not just the ones that feature S4, but the diagrams which show queries being divvied across servers (thanks to Mike for sending on the link). While it's known that Google break out their indices across a cluster into what they call 'shards', Bosworth's model looks like the late Gene Kan's Infrasearch query router, now part of the JXTA project. As a counterpoint, Doug Cutting of Lucene and Nutch fame has said, more or less, that there's no great advantage yet to distributing queries across the web in this way over downloading and centralizing the indexes:
Whether downloading the web into a cluster for indexing is the way to go indefinitely remains open if the amount of data being generated exceeds our ability to centralize it. At some point Jim Gray's distributed computing economics might flip in favour of sending the query out after the data rather than trying to localize the data. William Grosso has wondered whether Gray's model invalidates semantic web precepts:
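A minimal sketch of the additive and subtractive properties, assuming a graph is modelled as a plain Python set of (subject, predicate, object) tuples rather than with a real RDF library. The URIs and property names are made up for illustration.

```python
# A graph is modelled here as a set of (subject, predicate, object) triples.
# All identifiers below are illustrative, not taken from a real vocabulary.
feed_a = {
    ("http://example.org/entry/1", "title", "Worse is better"),
    ("http://example.org/entry/1", "author", "Alice"),
}
feed_b = {
    ("http://example.org/entry/1", "summary", "On web data formats"),
    ("http://example.org/entry/2", "title", "Query routing"),
}

# Additive: merging is set union, and the result is just another graph.
merged = feed_a | feed_b

# Subtractive: removing statements leaves a smaller, still valid graph.
without_authors = {t for t in merged if t[1] != "author"}


def match(graph, s=None, p=None, o=None):
    """Return the sub-graph of triples matching the pattern; None is a wildcard."""
    return {
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }


# A query result is itself a graph, so it can be merged or queried again.
entry_1 = match(merged, s="http://example.org/entry/1")
```

Nothing had to be declared up front to accept the summary from the second feed, and the result of match() can be fed straight back into match() or into another union.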
Second is the "open world" assumption of RDF. What that means is that not finding the answer to a query doesn't mean the query is false. For example if searching for an Atom entry's summary finds nothing and you conclude there's no summary for that entry, that's a closed world assumption. But in RSS1.0, which is RDF based, you'd conclude you don't have a summary to hand, not that it doesn't exist. The data might be incomplete at the time of asking. Dan Brickley describes this as 'missing isn't broken':
The case of the Description Logics and ontology worlds coming to the Semantic Web and worrying over queries that will blow up in the engines is much like the case of the enterprise world coming to the Web worrying over type systems and discovery languages. The likeness is not fleeting - both the Semantic Web and Web Services advocates have been busy building competing technology stacks in the last decade. They have valid points and good technology but the need or demand for such precision in the Web context has been overestimated. As Pat Hayes put it:
Pat Hayes is an interesting person to have said that. He's a legend in the world of AI in the way Adam Bosworth is a legend as a software developer. Both have concluded in their own ways that the 'neat' orthodoxies implicit in Web Services and the Semantic Web are futile. Cleaning up the Web is infeasible.
If you come from an SQL/XML background the open world idea of everything being effectively optional is going to seem weird and unworkable, but what it really means is that every addition of data is an extension act - extensibility is intrinsic to the RDF way of doing things, not something that gets bolted on as with mustUnderstand/mustIgnore. The same intrinsic nature goes for distribution of datasets. Since RDF data can be distributed across any number of nodes, the technical challenge is not scaling the database across clusters, it's routing and distributing queries. Query routing is a special case of the kind of packet routing problems that occupy telecoms, peer-to-peer and internet engineers. Adam Bosworth is right, we need Really Simple Querying, but it's a bit early to rule out RDF as a good fit for returning the results or dealing with scale issues.
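To make the last two points concrete, here is a rough sketch of a query being sent to several nodes and the answers merged, with an empty result read the open world way. The node names, their contents and the helper functions are all assumptions for the example. A real router would also have to decide which nodes to ask, which this sketch skips by asking all of them.

```python
# Each node holds part of the data as (subject, predicate, object) triples.
# Node contents and names are invented for the example.
NODES = {
    "node-a": {("entry:1", "title", "Worse is better")},
    "node-b": {("entry:1", "summary", "On web data formats")},
    "node-c": {("entry:2", "title", "Query routing")},
}


def query_node(triples, subject, predicate):
    """Answer from one node's local data only."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}


def route_query(nodes, subject, predicate):
    """Send the same query to every node and merge whatever comes back."""
    values = set()
    for triples in nodes.values():
        values |= query_node(triples, subject, predicate)
    return values


def describe(values):
    """Open world reading: an empty result means 'not known here', not 'false'."""
    return sorted(values) if values else "unknown (no node had an answer)"
```

Asking for the summary of entry:1 finds it on node-b. Asking for the summary of entry:2 comes back as unknown rather than as a claim that no summary exists.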
What Yahoo! Groups has to say sans cookies:
"Your browser is not accepting our cookies. To view this page, please set your browser preferences to accept cookies. (Code 0)"
And yet, how many frameworks encourage or even allow the binding of handlers based on a combination of URL+method? In my experience the protocol independence anti-pattern kicks in at that point, and the request method is the last thing that a developer is encouraged to take into account. The end result are URLs that react identically to any request method. It might be an interesting experiment if Udell tried sending PUT, DELETE, HEAD requests to the same API calls." - Leigh Dodds
Good to see HTTP abuse getting some attention. Maybe the W3C won't bake HTTP subsets into their specs anymore.
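What binding handlers on URL+method could look like, as a small sketch. The paths, handler names and the dispatch function are hypothetical and not taken from any particular framework.

```python
# Handlers are bound to (method, path) pairs rather than to the path alone.
# Paths and handler names are hypothetical.
def get_item(path):
    return 200, "representation of " + path


def delete_item(path):
    return 204, ""


ROUTES = {
    ("GET", "/items/1"): get_item,
    ("DELETE", "/items/1"): delete_item,
}


def dispatch(method, path):
    """Pick a handler by method and path together."""
    handler = ROUTES.get((method, path))
    if handler is not None:
        return handler(path)
    allowed = sorted(m for (m, p) in ROUTES if p == path)
    if allowed:
        # The URL exists but not for this method: say so, do not run GET's code.
        return 405, "Allow: " + ", ".join(allowed)
    return 404, "not found"
```

With this shape a PUT to /items/1 gets a 405 listing the methods that are bound, instead of silently behaving like a GET.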
Malware Evolution, Kaspersky Labs:
[via Steve Loughran]
Came across this one in work yesterday. What's happening is that some of the guys are using Cruisecontrol (2.2) to run a nightly build along with a deploy/smoketest into JBoss; as part of the build JBoss is stopped, and then started via a Java target. This setup works fine when the build is called directly via Ant, but when run from Cruisecontrol, JBoss is not started and the Cruisecontrol cycle hangs.
Here's the Cruisecontrol fragment that calls the ant build file:
<schedule>
  <ant time="0300"
       antscript="C:/Projects/P3/build/cc-build.bat"
       antworkingdir="C:/Projects/P3/build"
       buildfile="cc-build.xml"
       uselogger="true"/>
</schedule>
The target in cc-build.xml being invoked looks like this:
<target name="build" depends="stop.appserver, init, clean, get-code">
  <ant antfile="build.xml"
       dir="${src.localpath}\build" target="nightly-build"/>
  <antcall target="start.appserver"/>
</target>
And the start.appserver target looks like this:
<target name="start.appserver" description="Start the Appserver server." depends="init">
  <java dir="${appserver.home.dir}/bin" classname="org.jboss.Main" fork="true" spawn="true">
    <arg line="-c default"/>
    <jvmarg value="-Xms32m"/>
    <jvmarg value="-Xmx200m"/>
    <classpath>
      <pathelement path="${appserver.home.dir}/bin/run.jar"/>
      <pathelement path="${java.home}/lib/tools.jar"/>
    </classpath>
  </java>
</target>
I suspect the JBoss JVM process is forking out in a way that perhaps has the Cruisecontrol JVM hung waiting for it to return. I haven't had time to really go digging into this but I'm thinking that an exec target might work better instead of Java. Another possibility I suppose is that the running JBoss is not being stopped fully before the new instance is started (but that doesn't happen via Ant). Anyhow, I thought I'd throw it out there to see if anyone had come across this before.
With the current furore over Andrew Tridgell reverse engineering the Bitkeeper wire protocol, it's interesting to note that the argument seems to be not over the wire protocol itself but over the access to the metadata that understanding the protocol enables. Tridgell has done this before with Samba. I imagine Bitmover have every right to claim that the metadata is part of the product (presumably it's generated by the software), but it seems then to be difficult or impossible to manage the code without the metadata. Caveat Emptor then.
If so, the Linux kernel SCM argument ratifies the notion that data is the new lock-in. Who owns data and metadata and who has access to them is an important issue.
From a technical perspective, it's arguable that the higher you go up the programming language stack the fuzzier the distinction between software and data is. If you had to look at a typical Java or C# system, it'd be clear enough for the most part what's data and metadata and what's code. Technologies like annotations make this fuzzier, but not impenetrably so. XSLT scripts can get fuzzy, as can systems utilizing code generation. A significant Lisp system could make for an interesting data ownership argument. Lisp advocates have been preaching code==data for decades. Consider that the configuration files for my emacs editor are in Lisp, or that using Python or Ruby source code to store configuration details (rather than XML) is a common idiom. Down the line, I can imagine a rules driven system based on Topic Maps or RDF data being equally fuzzy.
In short, a lot of innovation in enterprise and commercial software is about blurring the line between data and code. I would love to see those knowledgeable in Open Source, Web, Compliance and IT Governance matters pick up on this issue, and maybe focus less on software licensing. Most RFPs that pass my desk assume that what the data is in a system is largely obvious. They've no doubt been set straight on music, but I would guess that most folks think that they own their IM conversations, their email, their weblogs, and their photos. It's not just that people won't own their data - it's not unfeasible to imagine a situation where a software provider had to turn the code over and give up a strategic technology advantage to enable access to the data.
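The configuration-as-source idiom can be sketched in a few lines. The settings text, the names in it and the load_config helper are invented for the example, and trimming the builtins as done here is a tidiness measure, not a security sandbox.

```python
# Text of a hypothetical settings file written as Python source.
CONFIG_SOURCE = """
port = 8080
hosts = ["alpha", "beta"]
debug = len(hosts) > 1   # configuration that is also computation
"""


def load_config(source):
    """Execute configuration source and return the names it defines."""
    namespace = {}
    # Only len() is exposed to the settings code in this sketch.
    exec(source, {"__builtins__": {"len": len}}, namespace)
    return namespace


config = load_config(CONFIG_SOURCE)
```

The third setting is where the fuzziness shows: it is a value in the configuration, but it only exists because code ran.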
Via Norm Walsh, Dave Pawson has a blog: http://nodesets.blogspot.com/. Seems like he's doing 12 rounds with Tomcat.
In UserDict as object scaffolding I mentioned that:
I've had enough feedback to suggest this is either pointless or a bad idea. So I'll be unlearning it.
A while back, I reviewed the book How To Write Parallel Programs (HTWPP), which is sadly out of print. As an aside I said this:
Just yesterday Bob Prior at the MIT Press made the following comment on that review:
Aside from the possibility of getting such a great book back in print, MIT Press classics already has all kinds of great books up there. Here's a taster:
I had no idea the series existed. And it's not just CS books - economics, philosophy, architecture to name a few subject categories.
(Thanks to all the folks who pointed out that HTWPP is available online. Go read it!)
Then again, with all this talk of dynamic typing, and Python, and Groovy and Ruby on Rails, perhaps we should stop and consider whether the Java world is ready for type freedom. Yow.
(From ScottMcPhee)

Vincent Massol would love reusable Ant tasks:
/**
 * Accommodate Windows bug encountered in both Sun and IBM JDKs.
 * Others possible. If the delete does not work, call System.gc(),
 * wait a little and try again.
 */
private boolean delete(File f) {
    if (!f.delete()) {
        if (Os.isFamily("windows")) {
            System.gc();
        }
        try {
            Thread.sleep(DELETE_RETRY_SLEEP_MILLIS);
        } catch (InterruptedException ex) {
            // Ignore Exception
        }
    }
    return true;
}
Would you have thought about this? Probably not and you would have been right not to as this only happens in some rare occasions."
I've thought of it, sure, because it's bitten me before. Repeatedly. (Here's the rant...) And it's not rare (or at least, not rare enough :). Try writing junit tests which add and remove enough directories or files between setups. Nightmare. Try doing industry standard .do/.done file weirdness and getting the support calls when the files are left lying around. Nightmare. And that System.gc() hack doesn't always work. I'm not even sure it's considered a JDK bug - Java by design is so abstracted from the actual filesystem it can't offer guaranteed side effects for file operations. So you need to treat these things as best effort. Given that gc is also best effort there's still room to fail in the Ant code above (I think). I wrote a countdown once to repeatedly try a deletion and failing that, bail out with an email to ops. That's spinning the CPU more than a sleep() but you get more shots at deletion. These days I'd tend to the idiom which deals with files whose modification time is X milliseconds older than currentTimeMillis. Or if you must, fork a process (btw, how Ant forks processes is great; everyone should re-use that).
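The two idioms mentioned above, the bounded countdown and the "only touch files older than X milliseconds" check, look roughly like this. It is a sketch in Python rather than the original Java, and the function names, attempt count and pause length are assumptions.

```python
import os
import time


def delete_with_countdown(path, attempts=5, pause_seconds=0.1):
    """Try to delete a file a bounded number of times; report success."""
    for _ in range(attempts):
        try:
            os.remove(path)
            return True
        except FileNotFoundError:
            return True  # already gone, which is what we wanted
        except OSError:
            time.sleep(pause_seconds)  # e.g. still locked; wait and retry
    return False  # caller decides what to do next, e.g. alert ops


def older_than(path, min_age_millis, now=None):
    """True if the file was last modified at least min_age_millis ago."""
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) * 1000.0 >= min_age_millis
```

Unlike the quoted Ant snippet, the countdown reports failure to the caller instead of returning true regardless, which is what makes the "bail out with an email" step possible.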
Couldn't agree more about re-using Ant tasks however (exec being a great example):
I think another reason the Tasks are tied to the Ant engine is because Ant doesn't have standard i/o (eg the way Unix pipes do). Task.execute() is void. I use a set/get/execute(in,out,err) idiom a lot for XML pipelining in Java, it was taught to me by Sean McGrath. The reason that works under those circumstances and any component in the pipeline is reusable/reorderable is because the XML in and XML out provides uniform i/o. A Uniform API might not be quite enough - you might have to ask what Ant's answer to | is. Without the i/o abstraction I don't know if you can achieve what Vincent wants in terms of dependency management.
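A stripped-down sketch of the execute(in, out, err) idiom, in Python rather than Java and with plain text standing in for XML. The stage classes and run_pipeline are invented for the example. The point is only that every stage has the same i/o shape.

```python
import io


class Upper:
    """A stage that upper-cases its input; stands in for an XML transform."""

    def execute(self, inp, out, err):
        out.write(inp.read().upper())


class Strip:
    """A stage that drops blank lines and notes each one on err."""

    def execute(self, inp, out, err):
        for line in inp:
            if line.strip():
                out.write(line)
            else:
                err.write("dropped a blank line\n")


def run_pipeline(stages, text):
    """Feed each stage's output to the next one, like a | b | c in a shell."""
    err = io.StringIO()
    for stage in stages:
        inp, out = io.StringIO(text), io.StringIO()
        stage.execute(inp, out, err)
        text = out.getvalue()
    return text, err.getvalue()
```

Because the stages only agree on in, out and err, run_pipeline([Strip(), Upper()], ...) and run_pipeline([Upper(), Strip()], ...) both work without either stage knowing about the other.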
But of course none of this stuff was the original intention of Ant.
antsh: turning Java into shell scripting, task by task!
Jaxen 1.1beta4. I moved some code from a custom API to Jaxen 1.0 a few months ago (even in moratorium a good library). So it's really good news that Jaxen is active and will make a 1.1. I had completely missed that it was active, or that Elliotte was working on it.
The open source world needs more CVS commit RSS feeds - it's way easier to stay on top of releases that way.
Cool: MT-Redland
[via Danny, aka "I've-got-mucho-domains"]
Dare Obasanjo on attention.xml and collaborative filtering:
Collaborative filtering allegedly only works if you have a critical mass of items of interest and users to cross-reference. I heard once this needed to get to the low 1000s to ensure reasonable precision. That was back in 2000, by which time people had figured out how to process large in-memory datacubes in close to real time (ie updates occurring between user sessions).
That's on the server.
What we're not doing is considering how filtering might work on the client. When more specific information about the user is available, it's possible to optimize these algorithms to work with much smaller data sets, and in general to think about different algorithms or hybrid approaches. And it's probable the results can have higher relevance for the user. Commercially, collaboration has worked best for targeting mass goods for individuals, which is why it works well for Amazon.
But the choice of algorithm varies based on the nature of the data (a lot of this stuff tends to be fantastically sensitive to the data and how the data is represented). Think about how useless a Bayesian spam filter would be aggregated across a 100,000 user data set up on Bloglines. It could be much better to work against a couple of users you trust and some candidate data of your own to seed the algorithms.
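As a toy illustration of filtering on the client against a couple of trusted users, here is a sketch that weights each trusted user's likes by how much they overlap with mine. The data, the names and the choice of Jaccard overlap as the similarity measure are all assumptions. It is one simple option, not a claim about what works best.

```python
def jaccard(a, b):
    """Overlap between two sets of liked items, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(mine, trusted):
    """Score items I have not seen by the similarity of the users who liked them."""
    scores = {}
    for likes in trusted.values():
        weight = jaccard(mine, likes)
        for item in likes - mine:
            scores[item] = scores.get(item, 0.0) + weight
    # Highest score first; ties broken by name so the order is stable.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Invented data: my own likes plus two people I trust.
my_likes = {"rdf-primer", "rest-thesis", "atom-draft"}
trusted_users = {
    "alice": {"rdf-primer", "atom-draft", "sparql-intro"},
    "bob": {"rest-thesis", "http-caching"},
}
ranked = recommend(my_likes, trusted_users)
```

Everything here fits in memory on a small device, and none of it leaves the device.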
Probably the reason they all start to sound like Hailstorm is because they all work on the basis that the computation has to be done on the server against large aggregate datasets. One place, one owner. Cue the consequent privacy concerns. A few years ago, when asked how the trust problem could be solved, a senior executive from Egg bank had an immediate answer - "Branding". The extent people will trust your organisation with their information is largely based on their current perception of your organisation. That's not quite the same thing as branding, but you get the idea.
What do you do with all that information you're generating 24x7? How do you convert it to value? Today's answer is to sell it to the people who have something to sell or messages to tell. The money's not in whatever it is you're offering to users to gather up the data in the first place (like search) - the money's in the side effects. And while converting the data into value for you or for those who want to sell something, the users must not think they're being sold out. Or they're gone. Something of a highwire act - and you only get to fall once.
One way out is to allow highly specific user information to inform the filters on the user's device, not someone's VC-backed server farm. Really, that's a social solution.
It could be much more interesting to sell this technology directly to users for 5 dollars and let them run it on their phones against the data of their choice. To do that requires a certain amount of letting go of ways of doing things, right through from client-server technology to business models based on TV and print media. The current situation is hopelessly dependent on those systems of buying and selling.
The social networking phenomenon is interesting insofar as it attempts to join users to users rather than users to services to advertisers. The next step is to get those lumbering servers out of the way and let people interact directly. That will require more imaginative and disruptive business models.
Brian McCallister tells a story about why clarity might be important in a programming language:
This reminds me of Jonathan Sobel's classic paper "Is Scheme Faster than C?". When I linked to that paper here, Jonathan left the following comment:
I have to say, after a few years in the wilderness, coming back to RDF to do some hacking has been both fun and instructive. So, what's changed?
The community. I'm slightly older and a lot less cynical about the whole technology after being very excited about RDF around the turn of the century. I became pretty annoyed at the direction the RDF community was taking starting in 2001; by 2002 I had lost much of my interest. During that time I moaned a lot and generally wasn't very helpful (sorry). The other thing that's changed now is that the community's expectations seem to have settled to something sane, especially around the extent and value of formal approaches on the Internet. The whole DL and formal logic gung-ho seems to have eased up a lot in the last two years, thankfully. No doubt some people felt that was a necessary growing pain for the technology, but it was just as much a pain to have really smart KR people tell you you were wrong, wrong, wrong, at various levels of politeness, when you wanted to get something useful out the door and iterate. Especially tough if you knew your AI history and where the whole KR shebang could end up versus what counts for deployment on the Web.
The tools. The tools are so much better now. I've had Jena in a small-scale production environment for over 6 months, acting as the ham in an XMPP and Hibernate sandwich. It works a treat. At some point they might need to go back and clean up the APIs in a breaking way - there's some junk DNA lying about, understandable as the API has travelled through about 2.5 iterations of RDF at this point. But the core implementation seems to be solid. I find 4Suite to be stable software (tho' I'm not sure the RDF stuff is active anymore - Uche et al have been working on anobind most recently). I've been using rdflib and sparta recently and those are very neat. Sparta is in good shape for a 0.7, and the rdflib API is rather beautiful (tuplespace fans will love it). Dave Beckett's Redland is really impressive; the amount of work that has gone into it is incredible. Short version: the amount of work done by the RDF community in the last couple of years is humbling.
The web. The web is now more machine-oriented than a few years ago. Much more. The RDF community saw this would come to pass before anyone else, I think, but perhaps not quite in the way it has turned out - RSS, WS and REST-as-deployed, rather than intelligent software agents. Even so, those technologies are likely to start creaking on the data front - arguably WS and REST-as-deployed already are at that point. As the networking and application protocol work gets bedded down, the new low-hanging fruit becomes extensible data formats sprinkled with semantic constraint pixie dust rather than type annotations and namespaces (media-types remaining useful). RDF-Forms, some people's re-examination of description languages, and the interest in speech acts are just the beginning.
Shipping. I'm not sure how useful RDF is for explicit data representations over XML and relational tables, but as an internal format for applications and machine level chit-chat it is a decent option that you could be looking at before rolling your own configuration formats. Less code, more data. Now, people will point at how Mozilla's RDF is a millstone, (and they would be right), but we are 5 years on from that - the use idioms are known today. You can even write something approaching sane RDF/XML once you avoid that nasty striping idiom.
Potential. My current work on a desktop client using RDF to manage application state makes me think that a simple reasoner (a la cwm) could get into a mobile device within two years and such a reasoner is possible now for desktop aggregators, albeit being a tough enough programming exercise. And when you're done what's still needed there is a reporting language from which to drive the views. But if you had all that? Then that could push the kinds of things the folks at Nature have been doing right into the client (the way Nature is using RSS is extremely cool, and also well beyond the commercial state of the art). Everyone would get the equivalent of an embedded SQL engine inside their aggregators working over their RSS data. Such reasoners available for consumer-grade software would turn the industry being built on RSS infrastructure on its head, as the ability to innovate with data would accelerate drastically. Imagine being able to cross filter and repurpose data on your phone instead of waiting for Technorati, Amazon or Yahoo! to get round to providing a cool new service. Or put another way, why wait for the services when you can generate the same views locally? (and then SMS them to your mates). The market emphasis could shift from rich clients to rich data very quickly and would, I imagine, force Web2.0 businesses to expose their data much more transparently than happens today (otherwise they don't get to participate in the user's views). If that happens, the current extensibility models available today in RSS and Atom might not offer any competitive advantage - writing new code and upgrading the aggregator is going to be too slow to matter. In this regard I think the WinFS approach was boiling the ocean. WinFS is like EAI for the desktop, when a few hacks and a webserver would get most of the way there. It would have been enough to have reporting and searching for incoming RSS data built into the desktop as a first cut.
A smarter filesystem could have been done later after the approach was proved to work and after you had proxied a My Documents feed behind an IIS daemon.
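To give the "embedded SQL engine inside the aggregator" analogy some shape, here is a sketch using SQLite from Python's standard library over a handful of invented feed entries. It shows only the SQL side of the analogy, not an RDF reasoner. The table layout, the tags and the titles_tagged helper are assumptions.

```python
import sqlite3

# Invented entries standing in for items already parsed out of feeds.
ENTRIES = [
    ("http://example.org/a/1", "Feed A", "RDF revisited", "rdf"),
    ("http://example.org/a/2", "Feed A", "Ant tasks", "java"),
    ("http://example.org/b/1", "Feed B", "Quads and provenance", "rdf"),
]

# An in-memory database: nothing is written to disk or sent to a server.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE entry (link TEXT PRIMARY KEY, feed TEXT, title TEXT, tag TEXT)")
db.executemany("INSERT INTO entry VALUES (?, ?, ?, ?)", ENTRIES)


def titles_tagged(conn, tag):
    """A local 'view': titles across all feeds that carry the given tag."""
    rows = conn.execute(
        "SELECT feed, title FROM entry WHERE tag = ? ORDER BY feed, title", (tag,)
    )
    return rows.fetchall()
```

A new view is a new query, written and run locally, with no aggregator upgrade and no waiting on a service to offer it.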
Anyway, enough analyst-speak :) All in all, I would say this RDF stuff is just about ready for a second look. The big question is whether the world can get past the Semantic Web hype and bluster from years gone by to see the value.
In the HTTPLR protocol, there are a few resources of interest that let us reason about a message exchange:
Outside the protocol proper, we'll also be interested in the following:
So here's the data in RDF/XML format:
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:httplr="http://purl.oclc.org/httplr/"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<httplr:message rdf:about="http://www.dehora.net/test/httplr/sub1/msg1.xml" >
<httplr:exchange rdf:resource="http://www.dehora.net/test/httplr/sub1/msg1.xml?exchange"/>
<httplr:state rdf:resource="http://purl.oclc.org/httplr/state/created/"/>
<httplr:location>file://c:/temp/msg1.xml</httplr:location>
<httplr:auth>digest</httplr:auth>
<dc:format>application/atom+xml</dc:format>
</httplr:message>
</rdf:RDF>
And as a graph:
It turned out to be simpler than I thought to represent. As ever the XML isn't pretty, but the arrangement of information is clean. It does raise some questions that I don't have answers for:
Why is this useful? Well there are a few reasons:
Probably, an example like this will go into the next HTTPLR draft as a non-normative appendix.
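For readers who think in triples rather than RDF/XML, this sketch pulls the statements out of the document above using only the standard library XML parser. It handles this flat, one-level shape and nothing more. It is not a general RDF/XML parser, and the triples function is made up for the example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = """<rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:httplr="http://purl.oclc.org/httplr/"
 xmlns:dc="http://purl.org/dc/elements/1.1/">
 <httplr:message rdf:about="http://www.dehora.net/test/httplr/sub1/msg1.xml" >
  <httplr:exchange rdf:resource="http://www.dehora.net/test/httplr/sub1/msg1.xml?exchange"/>
  <httplr:state rdf:resource="http://purl.oclc.org/httplr/state/created/"/>
  <httplr:location>file://c:/temp/msg1.xml</httplr:location>
  <httplr:auth>digest</httplr:auth>
  <dc:format>application/atom+xml</dc:format>
 </httplr:message>
</rdf:RDF>"""


def uri_of(tag):
    """Turn ElementTree's '{namespace}local' form into a plain URI."""
    return tag[1:].replace("}", "")


def triples(rdf_xml):
    """Yield (subject, predicate, object) for flat, one-level RDF/XML only."""
    root = ET.fromstring(rdf_xml)
    for node in root:
        subject = node.get("{%s}about" % RDF_NS)
        # A typed node element is shorthand for an rdf:type statement.
        yield subject, RDF_NS + "type", uri_of(node.tag)
        for prop in node:
            resource = prop.get("{%s}resource" % RDF_NS)
            obj = resource if resource is not None else prop.text
            yield subject, uri_of(prop.tag), obj


statements = list(triples(DOC))
```

The result is six statements about the one message: its type, its exchange and state resources, and three literal values.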
* This is the kind of thing that REST people are banging on about with regard to self-description, statelessness, and also why uniform interfaces matter. A WS approach would have to expose specific methods to support this. In REST we can carry on with uniform methods.
** And this is the kind of thing that RDF people are banging on about with regard to partial understanding and extensibility. It's also why RDF doesn't need mU or mI.
I have a habit, when working in Python, of starting classes by extending UserDict, usually because I don't have a strong idea of where I'm going just yet. The UserDict acts as a scaffold. So I might start with something like this to fill out against the initial tests:
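from collections import UserDict  # Python 3; in Python 2 this was "from UserDict import UserDict"

# state_UNKNOWN is a module-level constant defined elsewhere in the code.
class ExchangeState(UserDict):
    def __init__(self, msg_url='', exchange_url='', httplr_state=state_UNKNOWN):
        UserDict.__init__(self)
        self['msg_url'] = msg_url            # the URL of the message
        self['httplr_state'] = httplr_state  # the current exchange state
        self['exchange_url'] = exchange_url  # the HTTPLR exchange URL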
As I'm working the code, some of the dict keys will get lifted to object fields:
class ExchangeState(UserDict):
    def __init__(self, msg_url='', exchange_url='', mtype=None, httplr_state=state_UNKNOWN):
        UserDict.__init__(self)
        self['msg_url'] = msg_url            # the URL of the message
        self['httplr_state'] = httplr_state  # the current exchange state
        self['exchange_url'] = exchange_url  # the HTTPLR exchange URL
        self['mtype'] = mtype                # the message mimetype
        self.msg_url = msg_url               # the URL of the message
        self.state = httplr_state            # the current exchange state
Eventually, the dict scaffolding will be taken away, leaving the object:
class ExchangeState:
    def __init__(self, msg_url='', exchange_url='', mtype=None, httplr_state=state_UNKNOWN):
        self.msg_url = msg_url            # the URL of the message
        self.state = httplr_state         # the current exchange state
        self.exchange_url = exchange_url  # the HTTPLR exchange URL
        self.msg_mtype = mtype            # the message mimetype
Of course sometimes there is no lifting, and the class gets left as an extended dictionary. But I'm wondering, does anyone else develop classes like this? I'm finding it a very natural way of working.
Another best post ever from Ryan Tomayko:
In passing: "That is the essence of software engineering I think. Its not about writing cryptic programs to show how smart we as programmers are. Its about finding elegant forms of expression that maximimise our return on behavioural complexity" - Sean's comment Simplicity on the attack
Vaguely related: "A general stopped by to give us a little speech about strategy. In infantry battles, he told us, there is only one strategy: Fire and Motion. You move towards the enemy while firing your weapon." - Joel Spolsky's Fire and Motion
Mark Nottingham is wondering:
I'm talking about RDF datatypes, of course. As far as I can see, they're a special case to the data model; although the datatype itself is identified with a URI, the property 'RDF datatype' isn't, and as a result you can't meaningfully talk about (as in, reason with CWM, or access with most RDF APIs) them using that oh-so-delicious subject, predicate, object triple."
The charter when I was on the RDF wg said, when you got down to it, that RDF had to play nice with XML Schema. That was back when you could remember that XML Schema was meant to be a simple replacement for DTDs and just before people started seeing serious problems with that technology (ie, it may not be sanely implementable). RDF Datatypes attempted to cover that requirement off.
Anyway, if the RDF wg didn't address that, others would, over and over. Some folks are deeply, deeply attached to data typing - Web Services proves that beyond question. It does not matter whether they are needed or even appropriate, people want data to have machine based types. There's a lot to be said for pre-empting that desire. For example Atom is taking much the same form of pre-emption with link types in the use of atom:link[@rel].
The literals are another special case that RDF datatypes try to cater for. XML literals in particular proved to be quite hairy; I seem to recall a few calls with Jeremy Carroll at the time while we were sent off to figure something out.
There's been a lot of back and forth on whether literals should be subjects of RDF statements. Some people think that anything worth talking about should have a name - so name it. Others will point out that a huge amount of legacy blob data exists out there that RDF excludes to some degree. Consider that you 'can't talk meaningfully' about HTTP representations in RDF either; that's probably a bigger problem than datatype inelegance.
However none of this type stuff hurts a whole lot for real work, as RDF processors treat type information as inessential - it's optional metadata. What's likely to break is your application making unwarranted presumptions about what information will be available (if you haven't learned your data typing lesson from Web Services at this point, well... mU :)
Consider another, more significant problem RDF has. I'm currently integrating Sparta (Mark's RDF library) and rdflib into a desktop application, and I can see that soon I'm going to run into the situation where A says X Y Z and B says X Y Z and I will want to preserve the provenance of those two statements as coming from A and B.
The problem here is a straight up loss of information - you can't easily ask 'who said X Y Z?', without the context of the statements. I've never worked on a real-world application of RDF that didn't come up against this issue. Solving it in pure RDF is very clumsy; APIs tend to add a fourth item to the statement, often called 'quads', but that can rope your data to the API in question, which is definitely not the point of using RDF. Plus the meaning of quads isn't necessarily shared between systems. I'm hoping not to have to switch to 4Suite to solve this problem; 4Suite is a big full-featured API and I want to keep things as light as possible. If I can.
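The information loss, and what the fourth item buys you, can be shown with plain tuples. This is a sketch of the general quad idea only, not of how Sparta, rdflib or 4Suite store statements. The function names are invented.

```python
# Each statement carries a fourth element naming who asserted it.
# Sources "A" and "B" follow the text; X, Y, Z and W are placeholders.
quads = {
    ("X", "Y", "Z", "A"),
    ("X", "Y", "Z", "B"),
    ("X", "Y", "W", "A"),
}


def who_said(store, s, p, o):
    """Return the sources that asserted the triple (s, p, o)."""
    return sorted(src for (qs, qp, qo, src) in store if (qs, qp, qo) == (s, p, o))


def as_triples(store):
    """Drop provenance; duplicate statements collapse, which is the information loss."""
    return {(s, p, o) for (s, p, o, _src) in store}
```

With the quads, who_said(quads, "X", "Y", "Z") can name both A and B. After as_triples() the two assertions are a single statement and the question can no longer be answered.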
From the Joseph Heller school of specification:
To which I say: LOL.
Tim Bray and Elliotte Harold are not as amused as I am by the looks of things. Tim thinks these folks are heading down a slippery slope: I wager that slope begins with the Infoset. Amy Lewis has the best comment so far:
Norm Walsh points to a hole in how the XInclude and xml:base specs interact:
Yow. Between this and xml:id|c14n, I wonder if there isn't a process issue with the core XML work. That's twice a group of baseline specs haven't been specced to work properly together. And twice backward compatibility is being preferred over getting things straight. Over time that is going to have to be paid back in some form of technical debt.
It's not so much the size of the specs that matter as the surface area of the interactions between specs. I see Norm has the Dijkstra testing quote at the top of his entry, but I'm not sure Dijkstra had this class of coordination problems in mind.
One thing that's been bothering me for a while: it does not seem that XML integrates well with other XML in the general case. That is, when you move past XML1.0 and into the XML 'family' of specs, things seem to unravel ever so slightly. I worry that the XML family has essential composition problems unless you stick to flat, dictionary-like structures a la Atom or RSS.
The new Jini starter kit is great news - that community recognizes the barrier to adoption. I caught some heat about Jini failing the ten-minute test last year - contrary arguments along the lines of, this is a necessarily complicated problem - I just didn't buy those. They've also sorted out the Jini licensing, by moving to the ASF version 2.0, more good news, as that takes the confusion out of anyone's obligations to Sun in production scenarios.
This comes via Tim Bray, who is doing some cool sounding networking thingy skunknamed Zeppelin - wish he'd tell us more ;) He's wondering whether it's the simplest thing that could possibly work. That depends on what range Zeppelin is meant to operate at. Jini is a LAN/Enterprise range technology that could be pushed out to the WAN if you hacked an XMPP transport underneath it (the beauty of the ASF licence means that you can do this now). The thing that seems to make Jini necessarily complicated beyond that is Java code sharing, with all the attendant security and versioning issues, and discovery (but which could be addressed via zeroconf - maybe).
JXTA went after the internet range and for data sharing, and in terms of the protocols at least, dropped the Java dependency. It's hard to argue that's not a safer bet in the long run. Something surely has to evolve on top of all that bittorrent traffic ;)