trackback fixed
Thanks to Bob Wyman for pointing out my trackback was busted. Normal transmission resumed.
You'd think speccing a feed format would be straightforward, but the way things have been going on the atom-syntax list over the last few days, Atom will have to make a best effort to address the versioning and naming problems before it can proceed.
Now, where's the cache-invalidation thread at?
I think we'd agree in Java-land that cross-platform APIs have been a mistake (except perhaps for SAX). As for the whole factory and dynamic loading model for raw parsers, well that can get extremely messy in Java. Most of us have run into some form of Xerces hell at some point. To be fair that is usually a problem induced by the Java classloading architecture (what architecture?) rather than XML APIs. I suspect that the .NET loading model isn't much better, but .NET has the luxury of having fewer things to load as Dare points out:
Really, who's going to use something other than System.Xml/MSXML?
XML support for Java however is fantastic, after you muddle through the options. To name just a few, SAX2, XOM, JDOM, XmlPull, XmlBeans and Jaxen are all really very good libraries (and open source). To be fair to Sun, the JAX* set of APIs had to evolve piecemeal and thus are not always consistent, coherent or without mistakes - a case of putting the wheels on a moving car. .NET APIs have had the luxury of coming a bit later.
All in all, I see the question of whether or not to use interfaces as a red herring here. It comes down to what value cross-platform APIs have (if any), how dynamic implementation loading is managed in a static context, and whether you actually need multiple implementations in the first place.
I got into an email conversation with Bob Wyman a while back about the PubSub feed aggregator. With his permission I'm blogging about the PubSub architecture and internal processing model.
Bob asked that I not paint him as anti-XML, and I hope I haven't done that - PubSub doesn't strike me as anything other than a great service. For those of you that aren't XML obsessives, Bob has taken some heat in the XML community over the last year for promoting binary infoset approaches. So when I asked if he was using binfosets, he responded:
It's worth noting that all this is internal to PubSub; the public server I/O is XML.
On XML v Binfosets and the processing model:
On PubSub metrics:
The hardware statement is interesting; it seems to align with the Google view of using commodity boxes while keeping the smarts in software.
On scalability:
On the value of XML:
On future interfaces into PubSub:
The main thing I take from Bob's explanations is that PubSub, along with being a fine service, is doing a good job of separating interoperability issues from performance ones, by sticking to XML at the system/web boundary and leveraging ASN.1 PER internally. That helps reduce the XML-Binfoset controversy to a kerfuffle. PubSub is not the only one working along these lines - Antarctica (Tim Bray is on the board) also consumes and produces XML, but internally converts the markup to structures optimized for the algorithms required for generating visual maps. Similarity Systems' Athanor allows you to describe data matching plans in XML, but again converts to optimized data structures when it comes to making matches. The key mistake in interop terms seems to be wanting to distort XML to fit the binary/API worldview, or to replace it wholesale at the system edges.
Brett gets the boot in:
Honestly yes, I am looking to build tools that will generate the RDF (indexes and metadata). I want to scrape RDF metadata from structured data, analogous to the way spiders today scrape indices from unstructured data. It's much the same issue, but I figure the signal to noise ratio will be better in the former - at least I don't see how it could be worse. It already looks like one of the first things I'll have to do is recast HTTP server logs and syslog as RDF triples.
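A rough sketch of what recasting a server log as triples could look like, in Python. It assumes the log is in Common Log Format; the http://example.org/log# vocabulary and the entry numbering are placeholders made up for the example, not a published schema, and the triples are plain tuples rather than anything an RDF toolkit would produce.

```python
import re

# Hypothetical vocabulary: these URIs are placeholders, not a published schema.
LOG = "http://example.org/log#"

# Common Log Format: host ident user [time] "method path protocol" status bytes
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def log_line_to_triples(line, entry_id):
    """Turn one access-log line into (subject, predicate, object) triples."""
    match = CLF.match(line)
    if match is None:
        return []  # noise: skip lines that do not parse
    subject = f"{LOG}entry/{entry_id}"
    return [(subject, LOG + name, value)
            for name, value in match.groupdict().items()]

line = ('192.0.2.1 - - [10/Feb/2004:13:55:36 +0000] '
        '"GET /index.html HTTP/1.0" 200 2326')
triples = log_line_to_triples(line, 1)
for triple in triples:
    print(triple)
```

Each named field in the log line becomes one statement about the entry, which is about as mechanical as metadata generation gets; lines that don't parse are simply dropped.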
Part of this project is about exercising RDF in a domain I understand. After it, I expect to know whether RDF has value outside the academic and standards worlds, and what that value is. I was a huge huge fan of the technology, even serving on the working group for the best part of a year, before becoming deeply disenchanted with where that process and the community at large was going (models, models, models) to the point where I felt I had little to contribute other than ranting from the sidelines. For the record, I'm still a fan, on my third reading of Shelly's book, am waiting for Danny's, and despite my opinions on the process, still have enormous respect for the work RDFCore has done. But I take a strong view that RDF metadata should layer on top of statistical and automated magma, not manual data entry; that is pixie dust. This heterogeneity is what we know works in robotics, reinforcement learning* and hybrid AI, or for any technique that has to live outside a closed environment. So I see much less need for the tidy substrate and attention to good modelling that the current RDF model-think presupposes. I also think the semweb cake is missing or willfully ignoring a key layer that the search engines are thriving in - the environmental noise of the web.
It's not metacrap, it's meta living on crap.
As for the AI pixie dust, I don't see computing RDF from structured data being any more pixieish than computing pagerank from a page or computing a spam filter from spam (did I say I like hybrid techniques? :). The truth is, I'm at least as skeptical as Brett, but it's like being skeptical about what a computer can do in light of the halting problem - yes there's a hard limit, but you can still do something useful before you get there.
* and will be needed for IBM's autonomic computing feedback loops, but I digress...
[roni size: heroes]
How did I miss this:
Awesome. So much better David Parnas, than the fool of a Took who taught me down there (and killed my interest in programming for many years afterwards).
[opm: el capitan]
Web search blows goats. Local search totally blows goats.
For the web case: we need to decentralize search by passing queries around from site to site (trackback chains, mod-pubsub, or hack the bejeesus out of mod_backhand) and allowing sites to generate metadata locally and publish it instead of having spiders reverse engineer from HTML (duh). No matter how fast you can do it; downloading the Web into a cluster and indexing it - in what possible world is that a good idea?
For the local case: same thing, except we do the indexing and monitoring by hanging listeners onto the OS. The plumbing and UI are different, but the index material, metadata and plugin models for listeners and indexers should be much the same. We could do LAN-wide index sharing over zeroconf, that would be fun, as would a tuplespaces model instead of using MQs or interrupts. We can of course upload indices to the web or onto your phone.
Let's use RDF for the data. Having seen that people figure using SOAP envelopes is not insane for UDP discovery broadcasts, content management or systems integration, I figure RDF is as production-worthy a technology as any for search and query. Or possibly an RDF that uses WikiNames instead of URIs.
But basically: a) my continuous build thingy is going to be done in the next two months; b) I can't think of a fun mobile devices project; c) wiki, my favourite web technology, is now owned by Confluence and SnipSnap; d) I badly need better search over all my stuff.
So I'm going to give this 12-18 months. Cool names solicited.
[air: alpha beta gaga]
Someday, your neighbours' brats will try to crack your fridge, run denial of service attacks on the washing machine and own your toaster, perhaps defacing your toast in the process.
Ain't life grand.
>applause<. Finally, it's time to get off CVS.
As long as RDF/XML is the sanctioned syntax.
From the alt-tab-up-enter-alt-tab department.
I've been using the IDEA 4 eap this weekend. So far it's better than 3.0x - the interface is cleaner with a nippier response. I like modules (I think). But it seems IDEA doesn't support Ant 1.6 (specifically import). I've been fooling around with jarfiles for the last hour - this is like being back in 2001. Anyone got it working? I'm loath to go back to entity inclusions, plus I want to move some projects onto 1.6 at work.
[note: I bounced the date forward on this entry. It seems javablogs is picking up my feed again, so perhaps someone out there has a hack for this]
Eugene Kuleshov asks, Why can't IDEs use Ant as their project files?
Somewhat less ambitious: why can't IDEs (and Java itself) use the Ant classpath declaration structure?
[See also: java -cp classpath.xml]
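To show how little is needed to reuse the Ant structure, here is a small Python sketch that reads an Ant-style path declaration and flattens it into a -cp argument. The build file content, the jar names and the classpath_from_ant helper are made up for illustration, and the sketch only understands pathelement entries with a location attribute; filesets, refids and property expansion are ignored.

```python
import os
import xml.etree.ElementTree as ET

# A hand-written example of Ant's <path>/<pathelement> structure.
# The project and jar names are invented for the example.
BUILD_XML = """
<project name="demo">
  <path id="compile.classpath">
    <pathelement location="lib/xom.jar"/>
    <pathelement location="lib/jaxen.jar"/>
    <pathelement location="build/classes"/>
  </path>
</project>
"""

def classpath_from_ant(xml_text, path_id, sep=os.pathsep):
    """Flatten the <path> with the given id into a classpath string."""
    root = ET.fromstring(xml_text)
    for path in root.iter("path"):
        if path.get("id") == path_id:
            locations = [element.get("location")
                         for element in path.iter("pathelement")
                         if element.get("location")]
            return sep.join(locations)
    raise KeyError(path_id)

cp = classpath_from_ant(BUILD_XML, "compile.classpath", sep=":")
print("java -cp", cp, "Main")
```

The separator is passed explicitly here so the output is the same on every platform; leaving it out falls back to the local path separator.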
AI is often said to be largely useless, but if you had done enough of it you would already know this:
Among other things, you would also know that an important lesson folks picked up after the AI winter (whose 15-year anniversary cannot be far away) is that how you model the inert data is key; that's one reason why all the SUO and WebOnt folks are so hung up on getting the ontologies just so, and why there remains a wasteland of decent tools and syntax (they just don't matter as much as abstract data models in the scheme of things). So I guess AI ain't so bad after all; if nothing else it'll keep you out of the weeds.
As for mapping the complicated stuff: we've been doing that for years in Propylon. Our CTO, Sean McGrath, can wax lyrical on this. It's called pipelining, and it's the way to go for systems integrations in general, not just munging a date format (perl will do just fine there). The main advantage of pipelines is the ability to keep recomposing as requirements change. In short - you can keep changing the transformation as fast as the business changes its mind. Try doing that with an XSLT write-only trainwreck.
I see Clemens Vasters has caught the pipelining bug, and that .NET has had it going for a good while, in no small way due to Tim Ewald - WSE 1.0 supported a kind of in-memory pipeline for SOAP Envelopes; for Java folks it's not a million miles away from servlet filter chains.
Perhaps that's not representative, but it does seem that the .NET crowd gets the pipeline model. I'll go out on a limb here - I suspect that has something to do with MS programming culture being less inured to object orientation and object patterns. A key thing for XML pipelining is that you want to separate data from the process acting on that data, which is heresy in some OO circles. The only process really tied to an XML document is schema validation, and even then the behaviour is so data driven, so late bound, that it's hardly worth picking it out. Off the top of my head, I can only think of one OO pattern where it's ok to decouple data from behaviour, and it's the Visitor. It seems that at the system edges, where XML does matter, functional programming and lazy evaluation are the way to go.
The pipeline is the most important pattern/idiom in XML programming. The difference between it and the semantic web outlook, is that any good XML hacker knows that transformation is also primary stuff, not something to be cast aside as a small matter of programming because the model theories can't support it.
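For anyone who hasn't seen the idiom, a minimal sketch in Python follows. It is not Propylon's tooling or WSE; the stage names, the order document shape and the pipeline helper are all invented for the example. Each stage is a generator that knows nothing about its neighbours, so the data stays separate from the processing and nothing runs until the output is consumed.

```python
import xml.etree.ElementTree as ET
from functools import reduce

def parse(docs):
    for text in docs:
        yield ET.fromstring(text)

def drop_invalid(orders):
    # Structural validation: keep only orders that carry a <date> child
    for order in orders:
        if order.find("date") is not None:
            yield order

def normalise_date(orders):
    # Content mapping: rewrite DD/MM/YYYY as ISO YYYY-MM-DD
    for order in orders:
        date = order.find("date")
        day, month, year = date.text.split("/")
        date.text = f"{year}-{month}-{day}"
        yield order

def serialise(orders):
    for order in orders:
        yield ET.tostring(order, encoding="unicode")

def pipeline(*stages):
    """Compose stages left to right; evaluation is lazy until consumed."""
    return lambda source: reduce(lambda stream, stage: stage(stream),
                                 stages, source)

run = pipeline(parse, drop_invalid, normalise_date, serialise)
docs = ["<order><date>14/02/2004</date></order>", "<order/>"]
result = list(run(docs))
print(result)
```

Recomposing when the business changes its mind is a matter of editing the argument list to pipeline: add a stage, drop one, or reorder them, without touching the others.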
I used to say that about SOAP ;)
I'm curious to see where Mark's going with this. Like Mark I have a related issue that I can't discuss for various reasons.
In the HTTP case, many of the header tuples are metadata about the representations. But representations are dark matter - they aren't first class objects on the web since they don't have URIs. Ironically, representations are about as real a thing as you can get on the web (they'll come into your computer if you let them; resources never do that). This 'issue' pops up in RDF circles from time to time. Yet RDF in itself is limited in how it can help with the unnamed parts of web architecture, or anything that doesn't have a URI moniker.
Jon Udell: Analyzing blog content
I've heard this way of using CSS described as semantic markup. But I can see an army of RDFers wishing Jon used URIs instead of free text inside his class attributes. I don't know if CSS will take URI syntax as tokens, but WikiWords would be a good compromise.
Back in the day, before we understood the value of standards that would've been the attitude I'd expect from, say, a large software company ;-)
Anyway...
Russell is rightly annoyed about all this, but he's rightly annoyed at the wrong things and the wrong people, and that's understandable given how we got here. I take the opposite view to Russ; not having mainstream availability of PUT and DELETE is the single most broken aspect of web technology today.
Let's go back. There is a broken spec in this story, but it's not Atom and it's not HTTP. It's, wait for it... HTML. The reason technologies like SOAP and the Midp and Flash only use those verbs is because HTML only allowed POST and GET in forms. That's where the rot started.
What's the big deal? Well, the hoops you have to go through to do basic messaging on the web are, frankly, ridiculous, and that results directly from inheriting the web forms legacy of abusing POST. For example, consider reliable messaging done over the web. The absence of client infrastructure that supports a full verb complement gives leeway to invent a raft of over-complicated, non-interoperable reliable messaging specs. Reliable messaging, by the way, is one area that WS vendors can't seem to agree to standardize - perhaps that's because it's critical to enterprises (read, there's real money in it). But the point is that there should never have been an opportunity to make things complicated in the first place. In my job, we design and build messaging infrastructure, and a lot of it happens over HTTP. There's a good amount of pressure to make that infrastructure fit with web forms technology and existing client stacks. Now, to do RM over the web, and do it cleanly, you want the full complement of HTTP verbs at your disposal (esp. PUT and DELETE). With them you can uniquely name each exchange and use the verbs to create a simple state machine or workflow operating over that exchange. Without them you have to use multiple resources to name one exchange, plus clients and servers will typically have to inspect the URLs to find out what's going on. Operators and software have to be able to manage this, know all the URLs involved in the exchange, plus the private keys you're using to bind them together behind the firewall. Oh, did I mention that you'll have to reinvent these verbs in your content anyway and then get your partners to agree on their meaning? POST-driven exchanges become to a small degree non-scalable, to some degree insecure, and to a large degree hard to manage.
Trust me, it's not an academic issue, and it's not limited to RM; basic content management is in scope too. For those of you that don't monkey about with HTTP for a living, I can sum up the problem of not having PUT and DELETE like this - imagine dealing with a subset of SQL that doesn't support UPDATE and DELETE, or Java Collections that didn't have an add() method. It's an insanely stupid way of working. But if you never knew SQL had UPDATE to begin with, and that it was useful, perhaps that wouldn't be as apparent.
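To make the idea of one named exchange driven by verbs concrete, here is a toy sketch in Python. It is an illustration under assumptions, not the infrastructure we build: the ExchangeStore class, the /exchanges/42 URI and the particular status codes returned are all invented for the example, and there is no real network involved. The point is that PUT and DELETE are idempotent, so a sender that never saw a reply can simply retry.

```python
class ExchangeStore:
    """Toy server side: one URI per message exchange, driven by HTTP verbs."""

    def __init__(self):
        self.exchanges = {}

    def handle(self, method, uri, body=None):
        """Return a (status code, exchange state) pair for a request."""
        if method == "PUT":
            # Idempotent: a retried PUT leaves the same single exchange in place
            created = uri not in self.exchanges
            self.exchanges[uri] = {"state": "received", "body": body}
            return (201 if created else 200, "received")
        if method == "GET":
            exchange = self.exchanges.get(uri)
            if exchange is None:
                return (404, None)
            return (200, exchange["state"])
        if method == "DELETE":
            # The sender acknowledges the outcome; repeating this is harmless
            self.exchanges.pop(uri, None)
            return (204, None)
        return (405, None)

store = ExchangeStore()
uri = "/exchanges/42"
first = store.handle("PUT", uri, "<msg/>")   # sender delivers the message
retry = store.handle("PUT", uri, "<msg/>")   # reply was lost, sender retries
status = store.handle("GET", uri)            # anyone can ask where things stand
done = store.handle("DELETE", uri)           # sender closes the exchange
gone = store.handle("GET", uri)
print(first, retry, status, done, gone)
```

Everything about the exchange hangs off a single URI, and nobody has to inspect a URL or a payload to learn what operation was intended.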
The irony is that while some of us are left compromising with the fallout from uninformed specs, a number of people think that PUT and DELETE are some sort of esoterica that only a spec wonk could care about. And now, over on the Atom list some people are talking about workarounds. To heck with that. Get Sun to fix the Midp and the W3C TAG to fix HTML/XForms. The latter is worth emphasizing - as far as I can tell, this issue isn't even on the TAG's radar.
Russ, sorry; Atom is not the broken spec and the REST folks are not being intransigent nerds (this time). Arguing for a subset of HTTP is not the way to go here, even if it's the expedient way right now for J2ME. Sure there are hundreds of millions of broken clients out there, but what worries me are the next billion clients, not the early adopters.
I've done the usual change-poll-time-and-update-bounce-dance, but for some reason, my blog feed is not being refreshed by javablogs. The last update seems to be ~Jan 31st. I validated the feed, checked against a few aggregators and as far as I can tell there's no encoding weirdness in the feed.
Anyone else having trouble? I tried posting to the javablogs forum but got a response code 500 with a stacktrace... oh well, maybe someone can point the Atlassian massive at this entry instead ;)
I put links to my weblog categories on the frontpage yesterday. Turns out this lets me inbound link to myself in Technorati each time I add an entry. I'm taking the categories off for the time being.
No, but would we ever want to expose such things via a direct binding? Legacy systems living deep in the enterprise, in general, don't seem to require web service interfaces; they require web service gateways with data transformation pipelines that can be dynamically brought into the delivery channels. While I think there are avenues of exploration, deep integrations aren't yet something that can be push-button automated by tools. But there are ways to get the job done faster and better.
Consider this - exposing a 24x7 web interface into a Mon-Fri nightly batch COBOL job system. The COBOL system works correctly and is mission critical to the enterprise in question; not something to be tinkered with. Our answer to that scenario was to accept data and queries as async calls over HTTPS in XML form, encapsulated by a standard XML envelope. The back of the web service is a series of pipelines. The pipelines entail auditing, structural validations, content validations, code mappings, pre-matching, data cleansing, statistical capture and conversion to the job format before leaving the job in a repository. A second process running on a schedule uploads the jobs to VMS via FTP. A third process collects FTP'd responses from the COBOL systems, reconciles the responses with the submissions, and fulfils those back to the sender of the original XML message via another pipeline, as well as publishing the results to other subscribers. The setup has proven flexible, and robust to changes in requirements and semantics.
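The shape of that gateway can be sketched in a few lines of Python. This is a toy under stated assumptions, not the system described: the Gateway class, the country code table and the fixed-width job record are all made up, the FTP upload and response collection are left out, and only a handful of the pipeline stages are represented. What it shows is the asynchronous contract: the caller gets a receipt straight away, and the result is reconciled against the submission later.

```python
import uuid

CODE_MAP = {"IE": "IRL", "UK": "GBR"}  # made-up code mapping table

def validate(msg):
    # Stand-in for the structural and content validation stages
    if "country" not in msg:
        raise ValueError("missing country")
    return msg

def map_codes(msg):
    return {**msg, "country": CODE_MAP.get(msg["country"], msg["country"])}

def to_job_format(msg):
    # Stand-in for conversion to the batch system's record layout
    return f'{msg["id"]:<36}{msg["country"]:<3}'

class Gateway:
    def __init__(self):
        self.audit_log = []   # every accepted submission
        self.repository = []  # jobs waiting for the scheduled upload
        self.pending = {}     # correlation id -> original message

    def submit(self, msg):
        """Accept a message asynchronously: return a receipt, not a result."""
        msg = {**msg, "id": str(uuid.uuid4())}
        self.audit_log.append(msg["id"])
        job = to_job_format(map_codes(validate(msg)))
        self.repository.append(job)
        self.pending[msg["id"]] = msg
        return msg["id"]

    def reconcile(self, response_id, result):
        """Match a batch response to its submission and hand back both."""
        original = self.pending.pop(response_id)
        return {"request": original, "result": result}

gateway = Gateway()
receipt = gateway.submit({"country": "IE"})
reply = gateway.reconcile(receipt, "OK")
print(gateway.repository, reply)
```

The batch side only ever sees the job records in the repository, so it can keep its own hours while the web side stays up around the clock.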
The idea of blowing out the service from the COBOL would be problematic. Herein lies a key issue with the way middleware webservices toolsets prefer to be used - codifying a domain model, then generating web service stubs and WSDL descriptors for deployment to the DMZ tier is, in terms of software process, precisely backwards for deep integrations or repurposing for service oriented architectures. Neither the tools nor the local object models should be driving the integration process; they should be supporting it. There are some subtle gotchas. Web interfaces and batch processes are working to different timeframes; this entails an asynchronous gateway, but also impacts system administration and operation. There is usually no canonical domain model in an enterprise and, perhaps more importantly, no time or possibility for agreement on one - I see this as being a serious issue for efforts like the OMG's MDA and the consequent modelling toolsets coming down the line.
Having built the web service, it would be fine to expose, say, WSDL if someone really wanted it, but this is driven from the service design, not the provisions in some RDB/OO/WSDL mapper. Personally, I would see generating WSDL as being a publication exercise more than a design exercise.
As for the Achilles heel of web services :) I've complained about toolkits, but it's not that (as understanding of enterprise integrations grows, the tools will get better). In early 2004, the Achilles heel of web services is the complexity resulting from the sheer volume and lack of coherence in the web services specs and a lack of architectural guidance from the folks generating them - hence the title of this blog. Witness the current ASF list.
[Update: Mark Baker laments the passing of the W3C Web Services Architecture group. Me too - there was some comfort to be had by having the likes of Mike Champion, Eric Newcomer, and Frank McCabe thrashing this stuff out.]