Which Mobile OS?

Daily Wireless asks, "Which Mobile OS?"

This one?

Scaling XMPP and Pub/Sub

Jack Moffitt: "Sorry, Twitter. Until we see some answers, you don’t have data, just a big mouth."

I think Jack Moffitt, always excellent, is being hard on Alex Payne and the Twitter gang. He is criticising Twitter for restricting access to the firehose - the XMPP stream of events, or "tweets" in Twitter parlance. Jack alludes to a strategic reason for this: Twitter 'own' the data and therefore should own the derivative value obtained from analysing or reorganising the data.

"I don’t know the exact time that they started pruning the list of consumers of the firehose, but to me it seemed like this starting happening after Summize was acquired or around that time. The logical conclusion from this is that Twitter does not want more interesting things being built on top of its data."


I'm guessing the reason is not just that, although we do know that Twitter will be announcing a business plan a few quarters out. The other reason might be scale.

Scale? It's received wisdom that heavy HTTP polling is stupid and wrong, whereas push is more efficient. The problem is there isn't much science or shared field experience on what it means to have a public XMPP data and notification endpoint with a lot of subscribers. When I say a lot, I mean 250,000 to 1M clients holding open connections to your server(s). Issues I've seen are that load balancing becomes a problem, db access costs dominate login times for clients, and XMPP server clustering isn't as far along as I'd thought it was. Scaling XMPP does not appear to be a commodity problem the way HTTP scaling is - you are back down to looking at whether the servers are using epoll/nio; whether load balancing should be done by clients (remember the load balancers actually get in the way); how long it takes to log a user in, set up presence, rosters etc; and what the cluster topology's graph connectivity measure is (S2S doesn't seem to be the answer). It's like being back in 2000 and wistfully reading Dan Kegel's c10k page.

My suspicion is that a service pushing out notifications to a number of subscribers (Sn), where that number is large, is not yet a panacea for web poll scaling issues. There is a latent asymmetry in the cost of pushing events out to increasing numbers of subscribers, even though push is more performant and has lower latency for smaller values of Sn. Service providers will need to look carefully at graph theory, flooding and gossip/propagation models to get pub/sub notifications to meet web scale delivery - and at that point we'll be half-way to either a peer to peer model, or usenet - take your pick ;)
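To put rough numbers on that asymmetry, here's a back-of-the-envelope sketch. It assumes an idealised, loss-free relay model in which every node that already has the event forwards it to `fanout` new nodes per round; the function names and the fanout figure are mine, purely for illustration, and say nothing about how any real XMPP server behaves.

```python
def direct_push_sends(subscribers: int) -> int:
    """Sends the origin must make if it pushes to every subscriber itself."""
    return subscribers


def relay_rounds(subscribers: int, fanout: int) -> int:
    """Rounds needed to reach every subscriber when each informed node
    relays the event to `fanout` uninformed nodes per round (no loss)."""
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    informed = 1  # the origin server
    rounds = 0
    while informed - 1 < subscribers:
        informed += informed * fanout
        rounds += 1
    return rounds


# 1M subscribers: a million sends from one place, or ten relay rounds
# where no single node ever sends more than three messages per round.
origin_sends = direct_push_sends(1_000_000)
rounds_needed = relay_rounds(1_000_000, fanout=3)
```

The point is only that the origin's cost grows linearly with Sn in the direct model, while a relay topology trades that for extra hops, which is where the graph theory comes in.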

Observations on Portable Contacts 1.0 Draft C

The specification is here: http://portablecontacts.net/draft-spec.html. My overall observation is that it's a good idea, but the spec needs a lot of work. Here are my initial observations.

I'm surprised that a data format spec is tied to XRDS and OAuth. This seems unnecessary and brittle, quite apart from my opinions, which are that XRDS is complex, even when it's called "simple", and that OAuth is not a general purpose auth protocol (it excludes user agents). I'm not sure how the "intended goal of widespread adoption" is met by depending on OAuth.

It doesn't define a posting variant. Dear format designers, please learn one lesson from Atom - think about how a resource exposing your format can be created, don't just assume a singularity. This is valid criticism as there is a design intent to be a protocol - "This API defines a language- and platform- neutral protocol for Consumers to request address book, profile, and friends-list information from Service Providers. As a protocol, it is intended to be easy to understand and implement, either as a Service Provider or Consumer, using any language or platform of choice."


The XML root element is "response". Why not "result"? Or "foo"? If it's a protocol, shouldn't there be a symmetric "request" format?

It doesn't have a schema. There are two formats, but no model. How will we add new formats, or will there only ever be two? It doesn't define a processing model for the JSON result, or for the XML. It refers to xs:string, but doesn't reference WXS.

For any address/contacts spec, I always want to know how it maps (or does not map) to vCard. It's a goal, but it's not explained how it's been achieved - the vCard mappings need to be stated.


The last time I looked, none of these were standards - OpenSocial, OAuth, XRDS-Simple, etc.

The "password anti-pattern" is referred to, but not referenced. Why is this a goal anyway? As with subsetting methods, this seems like a layer violation. What a format spec needs to do is state security issues known to be introduced by the format.

"We started with a review of all the major existing contacts APIs and targeted common capabilities that they all share and provide." Which ones? References?

It's clearly tied to HTTP, so more layering problems. It refers to methods - "All requests to the Service Provider are made as HTTP GET operations on a URL deriving from the Base URL specified in Section 5 (Discovery)." This is dubious as it breaks protocol layering and subsets HTTP, while being utterly dependent on HTTP. It subsets the response codes - "Service Providers MAY return additional codes to indicate additional information, but are discouraged from doing so and should instead augment the reason text with existing codes, if possible" - that's not good technical advice. Don't even start me on the path mappings. Also, any serious attempt at unifying contacts formats needs to work over phones, and in/out of directory/mail services.

import antigravity checked in

Revision: 66902

Monounsaturated

Stephan Schmidt: "Then our converter cannot detect that items is a list"

Right. XML is "looser" - it doesn't constrain child elements by design ahead of time (as long as you have the right processing rules). With JSON you have to know whether something is a sequence or not, upfront. Whether this extra coupling matters or not, I'm not sure - JSON isn't just a fat free alternative, it's an optimisation. One outcome is that JSON is never going to replace markup as a documentation format - its sweet spot is for sending down database results from servers.
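Here's a minimal sketch of the converter problem, assuming a naive XML-to-JSON mapping that infers lists from repeated child elements. The `naive_convert` function is hypothetical (it is not Stephan's converter); it just shows why the same element comes out with two different JSON shapes.

```python
import xml.etree.ElementTree as ET


def naive_convert(element):
    """Convert an element to JSON-style data, guessing lists from repetition."""
    children = list(element)
    if not children:
        return element.text
    result = {}
    for child in children:
        value = naive_convert(child)
        if child.tag in result:
            # Second occurrence: only now do we find out it was a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


two = naive_convert(ET.fromstring("<items><item>a</item><item>b</item></items>"))
one = naive_convert(ET.fromstring("<items><item>a</item></items>"))
# two["item"] is a list, one["item"] is a bare string: same schema, two shapes.
```

Without out-of-band knowledge that `item` repeats, the JSON consumer has to handle both shapes, which is the extra coupling.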

That XML is (shock) well thought out was even a surprise to Lisp people, once - try catching missing end tag errors with sexprs sometime.

Magnificent Seven - the value of Atom

Bill Burke: "For me, the value of Atom haven’t really clicked with me yet.  Its just too SOAPy for me.  If you look at ATOM, the ATOM protocol, and how people are talking about using it, they’re really using it as an envelope.  One of the things that attracted me to REST was that I could focus on the problem at hand and ignore bulky middleware protocols (like SOAP) and lean on HTTP as a rich application protocol."

Great question and very important as Atom/AtomPub and things "REST" go through a hype phase. For me, the value of Atom is wrapped up in

atom:id
atom:updated
atom:link
the extension rules (mustIgnore, foreign markup)
the date construct rules
the content encoding rules
unordered elements

The problem with SOAP (as opposed to the WS cruft that followed) was that the minimum envelope defined nothing, the extension rules took the wrong default (mustUnderstand) and content encoding was left as an exercise.

Even if you don't like Atom (or XML for that matter), if your carrier format is going to survive on the web, you need to have addressed these 7 primitives. This is what I tell people who prefer something domain specific and direct instead of trying to map the domain onto abstract formats like Atom and SOAP - square off those and you're 80% there in terms of format quality and robustness. This applies I think to any format for use over the web or in a decentralised system, not just XML. Once a sloppy data format gets into the wild, you can't just refactor the callers, you have to version. And version. And version.
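As a small illustration of a few of these primitives working together, here's a sketch of a consumer that reads atom:id, atom:updated and atom:link from an entry whose elements are in arbitrary order and which carries foreign markup. The sample entry, the `ex:` extension namespace and the `read_entry` helper are all made up for the example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical entry: elements out of the usual order, plus a foreign element.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:ex="http://example.org/ext">
  <ex:rating>5</ex:rating>
  <updated>2008-09-21T10:30:00+00:00</updated>
  <link rel="alternate" href="http://example.org/posts/1"/>
  <id>tag:example.org,2008:post-1</id>
  <title>Hello</title>
</entry>"""


def read_entry(xml_text):
    """Pick out what we understand by name; ignore everything else."""
    entry = ET.fromstring(xml_text)
    links = {
        link.get("rel", "alternate"): link.get("href")
        for link in entry.findall(ATOM + "link")
    }
    return {
        "id": entry.findtext(ATOM + "id"),
        "updated": datetime.fromisoformat(entry.findtext(ATOM + "updated")),
        "links": links,
    }


parsed = read_entry(ENTRY)
```

The consumer never sees `ex:rating`, and it doesn't care where `id` appears, so the producer can add extensions later without a version bump.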

"Maybe I just haven’t seen the light yet.  It took me months to accept REST as a viable way of doing things.  Maybe I just need somebody to yell at me about ATOM."

You'd never know going by Bill's posts to the JSR311 list. I can't wait for the major Java stacks to have decent options for REST style web apps.

Schedule bound

Steve Loughran: "Looking at the other areas of work, I think scheduling will get the most interest from different people. Why? Because its where people like Platform Computing deliver value. It's not the APIs for grid computing, it's in distributing work to chosen machines. The current Job Scheduler works, but it is very simple. Every task worker node has a number of 'slots' -work is assigned to workers with spare slots. The scheduler is location aware, looking for the closest open slot to data, but there is no real examination of how much work a node is really doing, what the expected workload of the new job is (based on past experience), or anything resembling balanced scheduling between users. Over time, that's where there is going to be fun. Watch that space."

I think scheduling is interesting for another reason. Scheduling seems like a natural bottleneck in a master/worker system. I was looking at Hadoop for a project in work a while back (and to see if we could use it for general async/batch work) and while it's easy to get hung up on something like the namenode or reducers, or even the "it takes getting used to" programming model, I kept coming back to the code that decides when to put work into the jobservers - worried that it would dominate the system.
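For a feel of how little the simple approach Steve describes actually does, here's a toy sketch of slot-based, location-aware assignment. This is not Hadoop's code; the `Worker` class, the three-level distance measure and the `assign` function are my own simplifications.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    rack: str
    slots: int
    running: int = 0

    def has_free_slot(self) -> bool:
        return self.running < self.slots


def distance(worker: Worker, data_host: str, data_rack: str) -> int:
    """0 = same node as the data, 1 = same rack, 2 = anywhere else."""
    if worker.name == data_host:
        return 0
    if worker.rack == data_rack:
        return 1
    return 2


def assign(workers, data_host, data_rack):
    """Pick the closest worker with a spare slot; None if every slot is taken."""
    free = [w for w in workers if w.has_free_slot()]
    if not free:
        return None
    chosen = min(free, key=lambda w: distance(w, data_host, data_rack))
    chosen.running += 1
    return chosen
```

Note what's missing: nothing here looks at actual node load, expected job cost, or fairness between users, and every assignment goes through the one function on the master.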

Choosing a Java web framework

Mark Watson: "The problem I am having is that I would very much like to settle on a single framework in order to reduce the effort of staying on top of too many tools and frameworks. [...]

I would like to be able to invest at most 100 hours of study time, and get back up to speed on a single framework, but I am not sure which to choose. GWT is very tempting but GWT does not cover all of the types of web applications and services that I am likely to be contracted to build. Seam looks good as an integrated framework, but I need to set aside a long weekend to give it a good preliminary evaluation."


I think in the Java space right now, there is no web pony, and looking back as far as 2004, there never has been.

Given what I know about Mark (working in Ruby) and his background (AI/Lisp, I have one of his books from college), I would recommend looking at Grails. Grails also has orthodox internals that will appeal to most shops that use Java - Groovy running with Spring, Sitemesh and Hibernate - so for example, the ActiveRecord style approach doesn't become a point of resistance, and even though Groovy is not the fastest language on the block, the framework is offloading plenty of work. Groovy also has the advantage of being syntactically closer to Java than Python/Ruby, which eases adoption - if you squint you might see Javascript, which gives interesting options for who builds server side web pages :)

I don't think Seam is the right choice if your background or productivity sweet spot is a Rails/Django style stack, though I'm sure it will be a popular framework for in-house work or 'post-modern' enterprise projects. JRuby/Rails I think will only get faster, fwiw.

CMIS Specifics

Bex Huff on CMIS: "I have some issues with this, because I feel APP isn't robust enough for large scale syndication. "

AtomPub is a posting protocol, not a syndication protocol.

"There simply is no guarantee of quality of service when you're using "feeds", "

What does "quality of service" mean?

"and polling-based architectures simply don't scale to thousands of enterprise applications. That's the dirty little secret that ReST fanboys don't want you to find out..."

Someone might want to tell that to the web syndication world. I think their web is bigger than your enterprise.

There's probably a point in Bex's post, as ECM can get very complicated, but there'd need to be a lot more precision in criticising web technology like RSS/Atom/AtomPub/HTTP. For example:

Versioning
Synchronisation
Private/restricted content
User varying content
Conflict resolution
Batching
Error codes
Translation
Editing workflows
Composite documents
Multipart posting
Security
Search (including thesauri and vocabularies)
Partial updates
Publishing (and multichannel publishing)
Link verification
Metadata management
Multiformat export

which is the meaty stuff once you get beyond basic CRUD work. But that would require a more detailed post and less handwaving ;)

Repository supersets DAO

Phillip Calçado: "More and more I’ve seem the Repository been used as a fancy name for DAOs. It is very common nowadays to have things named Repository that create SQL/HQL/EJBQL queries or deal with database transactions or connections. Only a DAO with a different name."

This resonates with me. Here's SpringSource's definition of their @Repository annotation: "Indicates that an annotated class is a "Repository" (or "DAO")." That's mixing up two distinct concepts - worse, the Repository pattern is a superset of the DAO, so it's wrong.

Reducing the concept of a Repository to a DAO isn't helpful. It means the industry (or maybe just Spring ;) will need to introduce a third concept down the line or redefine what was meant by "Repository". Granted it doesn't help that the Repository pattern in Domain Driven Design isn't well explained, and you can argue that there's nothing inherently "per table" in the DAO or DataMapper patterns, but in reality that's how most of us use them.

The intent of a Repository is to abstract out the kind of storage you are using, not just the details of a particular relational database. What the Repository-isa-DAO interface gives a developer is a structural decoupling from the database, so you can use HSQL or Mocks for testing, or support multiple backends in production, but it still has the assumption that you are using a single relational database; that tends to mean in code each supposed Repository is mapped onto tables.

[image]

(update: 2008/09/21: the original version of this made it seem as though hibernate shards was at fault; it wasn't ;)

As an example, recently in work I did an extended spike on what will be a shared data system in our products. It'll get called heavily so it probably 'needs to scale' and perhaps support functional splits; in practical terms that meant keeping an eye on the data joins. For that reason I started the spike with Hibernate Shards. What I found is that the Repository-isa-DAO approach falls apart when you need either to carefully order the serialization sequence to put related data on the same physical partition, or to organise your relational data to be treelike with minimal joins (so it could in principle be functionally split later). Normally that logic would be exposed in a Domain Facade and per-object save methods; the problem with the way I was using sharding to start with is that I called saves on one-to-many relationships out of order in what (I thought) would otherwise be reasonable code. In my case I was allowing Hibernate Shards to put objects that should be related to each other on different shards - not good! This is less of a referential integrity problem and more like finding you've hung, drawn and quartered your data and sent it to the four corners of the realm. I don't think a sharding layer should be fixing this up - the application code has to manage data persistence properly; so in these cases the developer's life just got a bit harder.

Some quick fixes later and I had corrected the persistence layer, by making sure the object graph was properly queued. All that logic had to be pushed back behind an interface - the Repository. And this is where the Repository abstraction shines, especially in conjunction with Builders. You give the Repo an object graph and it takes care of the serialisation and the physical allocation of data, and removes the assumption around a single relational database. DAOs by contrast don't remove the single database assumption.

I suspect also that unlike a DAO, a Repository is part of the domain and not a detail of the persistence mechanism that needs to be hidden behind a facade/service layer.
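To show the shape of what I mean, here's a stripped-down sketch of a Repository that accepts a whole object graph and decides placement and save order itself. It is illustrative only and has nothing to do with the Hibernate Shards API: the `Order`/`LineItem` domain, the in-memory dicts standing in for physical databases, and the CRC-based shard choice are all assumptions.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class LineItem:
    sku: str


@dataclass
class Order:
    order_id: str
    items: list = field(default_factory=list)


class ShardedOrderRepository:
    """Callers hand over an object graph; placement and ordering stay in here."""

    def __init__(self, shard_count: int):
        # Each dict stands in for one physical database.
        self.shards = [dict() for _ in range(shard_count)]

    def _shard_for(self, order_id: str) -> int:
        # Stable hash, so the same root always lands on the same shard.
        return zlib.crc32(order_id.encode("utf-8")) % len(self.shards)

    def save(self, order: Order) -> int:
        index = self._shard_for(order.order_id)
        shard = self.shards[index]
        # Parent first, then children, all on the parent's shard.
        shard[("order", order.order_id)] = order
        for position, item in enumerate(order.items):
            shard[("item", order.order_id, position)] = item
        return index

    def find(self, order_id: str):
        return self.shards[self._shard_for(order_id)].get(("order", order_id))
```

The calling code just does `repo.save(order)`; it never gets the chance to save a child before its parent or to scatter one aggregate across partitions.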

Why does this matter? Well, if the Java world goes the route of partitioning and splitting databases or passing off storage to remote systems the way the LAMP crowd have, then it would be good to have explicit idioms and patterns to support those models. And of course it should be said that if you are building on a single database (and that's most of us, most of the time), DAO/DataMapper are enough of a persistence abstraction and Repositories are probably pattern noise - use a DomainFacade and move on.

Where to develop Web Specs 2

August: "Maybe it's time for the W3C to look at the consortium aspect and address concerns about "openness", perhaps by having an auxilary to provide the kind of structure and governance that would not require something like the WHATWG to exist."

September: W3F

 

There is a building

I go to read about CMIS, a spec for an AtomPub based CMS. AtomPub was a purpose of mine. CMIS is not an alternative to JCR. It will be submitted to OASIS.

[image]

I get to a page. In that page there is a link to a zip file. There is a jsessionid in the URL. In the downloaded zip file there are PDF files. In the zip file there is also a folder. In this folder are many files. Some XML, some XSD, some WSDL. One file is called APP.xsd. It is imported by ATOM4CMIS.xsd. ATOM4CMIS.xsd is the source file; it leads to many other files.

This had better be one good spec.

Ching

Google Blogoscoped: "Google Chrome is Google’s open source browser project. As rumored before under the name of “Google Browser”, this will be based on the existing rendering engine Webkit. Furthermore, it will include Google’s Gears project."

That's a mobile play.

[image]

 

 

Embrace Change

David Anderson on InfoQ, Future Directions for Agile. I think maybe this is the best material I've come across on Agile since v2 of Extreme Programming. It floored me.

[image]
I didn't like this presentation to start with, but it sucked me in. Give it 10 minutes (I know, that's *forever*). Agile these days is like food and drink, but not everyone today will know the conditions Agile grew out of and what needed to be fixed. The first half hour is an excellent history lesson. There are two underlying themes. First is that Agile needs to become predictable, a science, and not devolve into a cargo cult. The second is that Agile needs to grow beyond the programming part of the software business.

[image]


The presentation really kicks off about 35 minutes in when some new principles get derived from lean techniques. I'd like to say the answer is to expand towards Lean, because it addresses so many lifecycle, quality and scheduling problems, and this is a landmark presentation, but there's more to it than that.

"in knowledge work problems, coordination costs grow non-linearly with batch size".

[image]

It's funny how things work. The first time I heard of Kanban was in 2001, on a work placement; another student who was in manufacturing explained how crates of parts were pulled from a shelf when production needed them; pulling from that shelf in turn caused part orders to come in from outside. That was a good 5 years before starting my CS degree. At the time I knew nothing about programming; I just thought it was cool seeing how dental floss boxes were made (I was told that the one-piece folding dental floss box itself was an industrial design breakthrough).

"If you multiply what we know we can get out of agile and lean methods and what the SEI and academic communities know you can get from software product lines, you get a 100-fold improvement, an improvement on the same sort of scale that we've seen in manufacturing over a hundred years as they moved from mass manufacturing to lean manufacturing."

Anderson refers to cloud computing and says architecture and design will be back in fashion in a few years. I'd say it already is, but in the guise of splitting out heavy infrastructure work from development, and starting to rely on DSLs, not the old way of trying to make things perfect upfront and visual/4gl programming.

[image]

Last quote: "Get it out of your mind that high maturity and agile are incompatible. If you like maybe CMMI levels 2 and 3, and the antipatterns they create are very incompatible with agile, but high maturity levels, 4 and 5, that's what we've been aiming for all these years"

The presentation ends with some stuff on real option theory (as opposed to the last responsible moment); the part about how people prefer making a decision over being right is worth it alone.

Flickr: API responses as feeds

Kellan Elliott-McCrae: "You can already specify that you want the output format of a Flickr API call to be REST (POX), XML-RPC, SOAP (shudder, not sure that one still works), JSON, or serialized PHP. We always wanted to support formats like KML, or Atom but we were never quite sure how to represent the results of a call to flickr.photos.getInfo() or flickr.photos.licenses.getInfo() as a KML.

Last week we finally got around to pushing out our 80% solution — an experimental response format for API methods that use the standard photos response format that allows you to request API responses as as one of our many feed formats.

You can now get the output of flickr.photos.search(), or flickr.favorites.getList() as Atom, or GeoRSS, or KML, or whatever.

The syntax is "&format=feed-{SOME_FEED_IDENTIFER}" where the feed identifiers follow the same convention you use when fetching…feeds."
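A rough sketch of what building such a request might look like follows. The REST endpoint URL, the `api_key` parameter name and the `feed_request_url` helper are my assumptions rather than anything stated in Kellan's post, and the feed identifier is left as a caller-supplied placeholder because the post doesn't list the actual values.

```python
from urllib.parse import urlencode

# Assumed endpoint; check the Flickr API docs before relying on it.
REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def feed_request_url(method, feed_identifier, api_key, **params):
    """Build an API call URL that asks for the response as a feed."""
    query = {
        "method": method,
        "api_key": api_key,
        "format": "feed-" + feed_identifier,
    }
    query.update(params)
    return REST_ENDPOINT + "?" + urlencode(query)


# Placeholder identifier and key, not real values.
url = feed_request_url(
    "flickr.photos.search", "SOME_FEED_IDENTIFIER", "YOUR_API_KEY", tags="kanban"
)
```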

