The Wall Street Journal




 Carl Bialik examines the way numbers are used, and abused.


Measuring Bolt’s Potential

When Usain Bolt neared the finish line of the 100-meter dash men’s Olympics final in Beijing, it was apparent that he slowed himself down by beginning his celebration early. Thanks to a Norwegian astrophysicist’s curiosity and painstaking analysis of television footage, we know more precisely how much Mr. Bolt slowed down. Yet we’re still a long way from knowing how fast Mr. Bolt could have run had he finished the race in a more conventional fashion.

Usain Bolt
Bolt’s unconventional finish may have added one or two tenths of a second to his time in the 100-meter dash. (AFP/Getty Images)

Hans Kristian Eriksen, a post-doc at the Institute of Theoretical Astrophysics at the University of Oslo, watched Norwegian television coverage of Mr. Bolt running 100 meters in 9.69 seconds, and remembers thinking, “It would have been fun to see what the world record would have been.” He decided to try to answer this question when he read that Mr. Bolt’s coach, Glen Mills, claimed Mr. Bolt could have finished in 9.52 seconds — a bold assertion in an event where shaving a few hundredths of a second off the record is an accomplishment.

Dr. Eriksen started by acquiring video footage of the race. Norway’s Olympics broadcaster provided him with a DVD. “But the most useful one was from NBC, on the Web in fairly high resolution,” Dr. Eriksen told me. Then he used the footage to track the progress of Mr. Bolt and runner-up Richard Thompson, a tenth of a second at a time. He found that Mr. Thompson actually gained ground as Mr. Bolt’s speed dipped below his pursuer’s during the celebration. Had Mr. Bolt’s deceleration matched Mr. Thompson’s after eight seconds, he would have finished in 9.61 seconds. Had Mr. Bolt maintained a higher acceleration during that period of the race, as he did early on, he would have finished in 9.55 seconds.

The resulting range of 9.55 seconds to 9.61 seconds was widely reported. However, because of uncertainty arising from the video footage, each estimate came with a margin of 0.04 seconds, meaning Mr. Bolt’s true potential time, as measured by the study, was between 9.51 seconds and 9.65 seconds, a range wide enough that our naked eyes could have told us as much.
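The arithmetic behind that wider range is simple enough to sketch. The snippet below is only an illustration of how a symmetric margin stretches an interval; the function name `widen` is my own, and the only numbers used are the ones quoted above.

```python
def widen(low, high, margin):
    """Stretch an interval by the same margin at each end.

    Values are rounded to hundredths of a second, the precision
    used for sprint times.
    """
    return round(low - margin, 2), round(high + margin, 2)


# Figures quoted in the post: the two scenario estimates and the
# 0.04-second uncertainty attached to each.
reported_low, reported_high = 9.55, 9.61
margin = 0.04

full_range = widen(reported_low, reported_high, margin)
print(full_range)  # (9.51, 9.65)
```

The reported interval is 0.06 seconds wide; once the margin is applied at both ends, it more than doubles to 0.14 seconds.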

The study’s biggest shortcoming may be its omission of prior runs by Mr. Bolt, which may say more about his typical running pattern in a sprint’s final moments than would a performance by a competitor. “Using other runs would make more sense, definitely,” Dr. Eriksen said, though he points out that this, too, could mislead because runners are affected by their competitors. In any case, he has no plans to study other Bolt races — this study, submitted to the American Journal of Physics, “was mostly for fun,” he said, a point he made clear to the press outlets that interviewed him.

The flurry of press interest was a new experience for Dr. Eriksen, whose day job involves studying the origin of the universe using microwave radiation. “This is far beyond anything I’ve ever seen before,” Dr. Eriksen said.

Further reading: CNBC’s Darren Rovell has suggested that Mr. Bolt may have slowed up to make it easier to break the record again — since some track sponsors pay bonuses for each record-breaking performance.

What Are the Most-Common English Words?

My print column this week is about the math of words: What linguists, dictionary publishers and even Microsoft learn from corpus research. Corpora are large bodies of text, pulled from books, articles, blogs and conversations, that are meant to represent part or all of a language.


What first drew me to the topic was an online quiz challenging me to name the 100 most frequently used words in the English language. That quiz was based on an uncited list on Wikipedia, though it has since been updated. Many other lists are available, each slightly different; you can see and sort five Top 100 lists here – after taking that quiz, of course.

WordCount graphically represents the 86,800 most frequently used words from the British National Corpus. “I thought it was an interesting insight into the way we use language,” Jonathan Harris, an artist who developed the site, told me. “I wanted to make something a bit more playful to bring it to life.” Part of the fun is seeing where words appear further down the list. “Conquistador” is the last of those words to make the cut. And users of the site occasionally email Mr. Harris with curious juxtapositions. For instance, “Microsoft,” “acquire,” “salary” and “tremendous” rank 4,304th through 4,307th.

Speaking of Microsoft, the company develops enormous corpora for more-serious purposes than online quizzes. Among the applications are spell-checking in Word and a new online translation service. And the Oxford English Dictionary is backed by a Web-derived corpus two billion words strong.

Though the use of the Web as a snapshot of language is controversial, it’s clear that the Internet and computer power in general have enhanced the field. “Five hundred Dominican monks worked under Hugues de Saint-Cher to create the first concordance of the bible in 1230; today, a better and more complete job can be done with a few minutes of programming and a few seconds of computer time,” says University of Pennsylvania linguist Mark Liberman.
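To give a sense of how little programming Prof. Liberman's point requires, here is a minimal sketch of a word-frequency count and a bare-bones concordance. It is not the method used by any of the corpora mentioned above; the tokenizer is deliberately crude, and the sample sentence and function names are made up for illustration.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def top_words(text, n):
    """Return the n most frequent words with their counts."""
    return Counter(tokenize(text)).most_common(n)


def concordance(text):
    """Map each word to the list of token positions where it occurs."""
    index = defaultdict(list)
    for position, word in enumerate(tokenize(text)):
        index[word].append(position)
    return dict(index)


# A made-up sentence standing in for a real corpus.
sample = "The cat saw the dog and the dog saw the cat"

print(top_words(sample, 1))        # [('the', 4)]
print(concordance(sample)["dog"])  # [4, 7]
```

A real corpus project would need far more careful decisions about what counts as a word (hyphens, contractions, inflected forms), which is one reason the various Top 100 lists differ.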

Corpora can also be used to track changes in language over time, using a sampling of texts from different time periods.

What do you think? What’s the best way to represent spoken and written language? Are word-frequency lists useful for thinking about and teaching language? What’s your favorite online word quiz? Please let me know in the comments.

Thanks to readers Karen Ash and David Goldenberg for suggesting the topic.

The Immortal Math of Dog Years

My print column this week delves into the longstanding belief that dogs age seven times as fast as humans. Veterinary researchers have long known that dogs age quickly after birth, and that aging varies by breed, but the seven-year rule persists.


Brian Kenny, a Phoenix anthropologist and dog owner who maintains the myth-debunking site Dog Years, thinks he understands the impulse from his own field. “It’s a typical thing in anthropology,” Mr. Kenny told me. “We have complex ideas of how the world works. We see a lot of data pass before our eyes. Often to make it simplified for the public, we come up with a normative concept to help people relate.”

But there is no one-size-fits-all rule for dogs. “Dogs are the most diverse domestic species we have,” Gina Spadafori, syndicated pet-care columnist with Universal Press, told me. “We have cut and tailored and bred them in so many different forms. They’re all the same species, but they’re hardly recognizable as such.”

What do you think? Do you convert your pet’s age to human years? Did you know that the seven-year rule doesn’t really apply? Please let me know in the comments.

Further reading: This chart gives a rough idea of the comparative aging of dogs and cats. Kelly M. Cassidy’s Web site and the Kennel Club in the U.K. have longevity data by breed. This book has more information on the inscription at Westminster Abbey putting the human-dog lifespan ratio at nine.

An Overweight Nation

My print column this week discusses a study published in the journal Obesity that projects recent increases in the prevalence of obesity among Americans into the future, and arrives at some alarming numbers. Among them: By 2048, 100% of American adults could be overweight. The problem with that logic is that a linear trend can’t continue past 100%. For instance, recent trends in abandonment of landlines suggest that more than 100% of American adults will not have landlines by that same year, but as readers of this blog know, only public figures regularly exceed 100%.
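The trouble with straight-line projections can be sketched in a few lines. The two data points below are hypothetical, chosen only to make the arithmetic easy; they are not the figures from the Obesity study, and the function names are my own.

```python
# Hypothetical data points (year, percent overweight), NOT from the study.
YEAR_A, PCT_A = 2000, 60.0
YEAR_B, PCT_B = 2010, 70.0


def linear_projection(year):
    """Extend the straight line through the two data points to a given year."""
    slope = (PCT_B - PCT_A) / (YEAR_B - YEAR_A)  # percentage points per year
    return PCT_A + slope * (year - YEAR_A)


def capped_projection(year):
    """Same line, but clamped to the only range a percentage can occupy."""
    return max(0.0, min(100.0, linear_projection(year)))


print(linear_projection(2048))  # 108.0, an impossible share of the population
print(capped_projection(2048))  # 100.0
```

Clamping fixes the impossible number but not the underlying assumption: a trend that must level off somewhere at or below 100% is unlikely to be a straight line all the way there.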

What do you think? Do the obesity projections seem reasonable? If they help galvanize health officials to fight the problem, does that justify fuzzy numbers? How would you project obesity rates to 2048? Please let me know in the comments.

Beijing’s Murky Pollution Numbers

Beijing’s Air Pollution Index, also called API, is being closely watched this week as the Olympics begin. It’s being reported daily in dispatches about air quality and visibility in the city, and is included on the Online Journal’s Olympics page. But several factors make the index a questionable gauge of the air quality experienced by Olympians. And China’s translation of the index into “blue sky days” tends to understate the level of pollution, as journalists on the ground have noticed.


China considers any index number at 100 or lower to be acceptable. This is in line with developing nations, but would be considered unacceptable in developed countries, according to Kenneth A. Rahn, professor emeritus of oceanography at the University of Rhode Island. If the U.S. used a scale similar to Beijing’s, East Coast cities typically would have index levels around 10 or 20; Beijing’s index has hovered near 100 in recent days. “In U.S. terms, that’s a ridiculously high concentration,” Prof. Rahn said.

For that reason, calling sub-100 days blue-sky days, as China does, is an arbitrary choice, Prof. Rahn said. “When you have a fixed point like 100, that you should not exceed, that creates an artificial duality. This is a legalistic argument. It has nothing to do with what’s going on in the atmosphere.”

The problems extend to what is measured, and how. Beijing’s index covers just three types of pollutants, according to Bill Scotti, who works with international companies on environmental issues in China as director of risk and compliance for Meradia Group. That contrasts with six in Hong Kong. Beijing doesn’t measure carbon monoxide, ozone or respirable suspended particulates — all of which are included in Hong Kong’s index. “Air pollution is measured differently in different regions of the world, and even differently within China itself,” Mr. Scotti told me.

As a practical matter, Beijing’s index really is based on only a single indicator: the concentration in the air of particulate matter with a diameter of less than 10 micrometers (millionths of a meter). That’s because, like some other indexes, including the U.S. air quality index, the Beijing index is equal to the highest index for any single pollutant. And in Beijing, particularly in the summer, the highest index value is usually the one for particulate matter, according to Prof. Rahn.

That’s an unfortunate choice for the index, Prof. Rahn told me, because environmentalists prefer to measure only particles with diameter less than 2.5 micrometers (the U.S. Environmental Protection Agency changed its standards in 2006). The larger particles are “not considered particularly dangerous to one’s health,” Prof. Rahn said.

Including them doesn’t make the China index more stringent; the scales are raised to account for the inclusion of the broader group of particles. It just leaves the index a few years behind current standards in pollution reporting, and clouds the issue further.
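The rule that the overall index equals the highest single-pollutant index is easy to express in code. This is a sketch of that rule only; the pollutant labels other than PM10 and all of the sub-index values are invented for illustration, not actual Beijing readings.

```python
def overall_index(sub_indexes):
    """Return the pollutant with the highest sub-index, and that value.

    Under a max-based rule, the overall index is simply the worst
    single-pollutant reading; the others do not affect it.
    """
    worst = max(sub_indexes, key=sub_indexes.get)
    return worst, sub_indexes[worst]


# Hypothetical sub-index readings for one day.
readings = {"pollutant A": 35, "pollutant B": 50, "PM10": 95}

print(overall_index(readings))  # ('PM10', 95)

# Halving the other two pollutants leaves the reported index unchanged.
cleaner = {"pollutant A": 17, "pollutant B": 25, "PM10": 95}
print(overall_index(cleaner))   # ('PM10', 95)
```

The second call shows why the index behaves like a single-indicator measure: as long as particulate matter has the highest reading, improvements or deterioration in the other pollutants never show up in the headline number.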

Further reading: Steven Q. Andrews has questioned the accuracy of the underlying readings for Beijing pollution, in a Journal op-ed and to a Time reporter. Chinese officials repeatedly have denied any tampering with the data collection. A recent study published by Chinese scientists found that the levels of particulate matter in Beijing this time last year “were much higher” than the national air quality standards. The BBC is doing its own measurements of particulate matter and finding levels that exceed World Health Organization standards. Prof. Rahn has a useful guide to the issue of Beijing air pollution, only viewable in Internet Explorer.

We’re Far Removed From Proof of ‘Six Degrees’ Theory

The notion that there are an average of six degrees of separation between any pair of individuals in the world helped inspire a play and movie, which mangled the numbers by claiming that we’re all connected by no more than six degrees. It also gave rise to related connectedness theories for subsets of the world, among mathematicians or film actors.


If you believed news reports this week, you’d think that the six-degrees theory — first advanced in a test using the mail in 1967 — had been proven. Microsoft researchers announced that a study of the company’s instant-message traffic in June 2006 showed that the average number of people needed to bridge any pair of users was 6.6. The median number was 7, and the longest was 29.

The authors readily agreed with me that this is far from proof of the average distance between any two people on Earth. Some limitations of the study cause it to overstate our connectedness, while others serve to understate it. Still, the Microsoft numbers represent a major advance in our knowledge about the fragmentary nature of the online population, and certainly are more rigorous than the original six-degrees experiment from more than four decades ago.

Some 240 million people used Microsoft’s instant-message product, Messenger, in the month in question — meaning about 96.3% of the world’s population didn’t. Those who didn’t were disproportionately located in Africa and parts of South America, Asia and Eastern Europe. Excluding people can cut both ways: Some of those excluded might have helped bridge the gap between those people in the study. On the other hand, some of those excluded may not have access to telecommunications devices, and these relatively isolated people may be many degrees removed from each other, driving up the numbers. Including only Messenger users accounts for “part of the bias that is there,” Jure Leskovec, a graduate student in machine learning at Carnegie Mellon University and a study co-author, told me.
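The 96.3% figure can be checked with one division. The world-population value below is my assumption (roughly 6.5 billion in mid-2006); the post does not say which estimate was used.

```python
messenger_users = 240_000_000     # figure quoted above
world_population = 6_500_000_000  # assumed mid-2006 estimate, not from the study

share_not_using = 1 - messenger_users / world_population
print(f"{share_not_using:.1%}")  # 96.3%
```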

Meanwhile, the study covered just one month and counted only messages, not buddy-membership, as connections. “Grandparents probably don’t speak to grandchildren using Messenger, yet those are very strong communication links,” Eric Horvitz, principal researcher at Microsoft Research and a study co-author, told me. “Whole sets of real-world communications are removed.”

But improving on the study would be tough. Email networks are even more fragmented, so doing a similar study using only, say, Hotmail addresses would miss a lot of connections that involved Gmail addresses. Phone numbers can be shared among multiple people. And accessing this kind of data raises privacy issues. Typically, Microsoft doesn’t collect this kind of information. The researchers made a request to access just the age, gender and location of users, and it was approved. “It would take a lot of effort to go back and grab more data,” Mr. Horvitz said. Linking the email database with instant messages would pose other privacy hurdles, as well.

Mr. Horvitz mentioned another possibility: Take a sample of the people in the study and examine their other forms of communication to get an estimate of the links that are missed. But Mr. Leskovec cautioned, “Any random sampling will destroy the structure of the network.” That’s because some of the people who connect those in the sample won’t be in the sample.
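Mr. Leskovec's warning about sampling can be seen on a toy network. The sketch below uses a breadth-first search to measure the average number of steps between pairs of people, then repeats the measurement after dropping one person. The five-person chain and all function names are hypothetical; this is not the researchers' code, and "steps" here means links in the chain, which is one of several ways to count degrees of separation.

```python
from collections import deque
from itertools import combinations


def distances_from(graph, start):
    """Breadth-first search: fewest steps from start to each reachable person."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for contact in graph[person]:
            if contact not in dist:
                dist[contact] = dist[person] + 1
                queue.append(contact)
    return dist


def average_separation(graph):
    """Return (mean steps over connected pairs, number of unconnected pairs)."""
    total = connected = unconnected = 0
    for a, b in combinations(sorted(graph), 2):
        steps = distances_from(graph, a).get(b)
        if steps is None:
            unconnected += 1
        else:
            total += steps
            connected += 1
    mean = total / connected if connected else None
    return mean, unconnected


def sample(graph, keep):
    """Keep only the chosen people and the links among them."""
    return {p: {c for c in contacts if c in keep}
            for p, contacts in graph.items() if p in keep}


# A hypothetical chain of five people: a - b - c - d - e.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
         "d": {"c", "e"}, "e": {"d"}}

print(average_separation(chain))                              # (2.0, 0)
print(average_separation(sample(chain, {"a", "b", "d", "e"})))  # (1.0, 4)
```

Dropping the one person in the middle leaves four of the six remaining pairs with no path at all, while the pairs that are still connected look closer than the full network's average. Both distortions come from the same cause: the go-betweens are missing from the sample.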

The study is certainly an improvement on the 1967 experiment conducted by Jeffrey Travers and Stanley Milgram, which would have provided ample material for this blog had it and its author existed at the time. It was based on just 296 people, in Boston and Nebraska. They were asked to try to reach each other by mail, using only close acquaintances as go-betweens to form a chain. Just 64 of the mailings reached their intended target. And an unpublished study by Prof. Milgram found a much greater degree of separation and a much lower rate of completion. As of today we’re closer to the true level of separation, but still far removed from that goal.

 
 

