# Zipf’s Law – modelling the megalopolis

More statistical sensawunda in the urban environment. Remember us mentioning that guy who suggested that cities can be considered as super-organisms? Well, a mathematician chap called Stephen Strogatz dropped into the New York Times blogs to talk about Zipf’s Law and other statistical phenomena that surround our urban environments:

The mathematics of cities was launched in 1949 when George Zipf, a linguist working at Harvard, reported a striking regularity in the size distribution of cities. He noticed that if you tabulate the biggest cities in a given country and rank them according to their populations, the largest city is always about twice as big as the second largest, and three times as big as the third largest, and so on. In other words, the population of a city is, to a good approximation, inversely proportional to its rank. Why this should be true, no one knows.
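For the code-minded, here is a minimal Python sketch of the rule as Strogatz states it: the city at rank r has roughly the largest city's population divided by r. The function name and the 8 million figure are made up for illustration, and real countries only follow the pattern approximately.

```python
def zipf_population(largest_population, rank):
    """Population predicted for the city at `rank` under an idealised Zipf's Law.

    Assumes population is exactly inversely proportional to rank, which
    the quoted article says holds only "to a good approximation".
    """
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    return largest_population / rank


# Hypothetical country whose largest city holds 8 million people.
for rank in range(1, 5):
    print(rank, round(zipf_population(8_000_000, rank)))
```

So the second city comes out at half the size of the first, the third at a third, and so on down the table.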

[…]

For instance, if one city is 10 times as populous as another one, does it need 10 times as many gas stations? No. Bigger cities have more gas stations than smaller ones (of course), but not nearly in direct proportion to their size. The number of gas stations grows only in proportion to the 0.77 power of population. The crucial thing is that 0.77 is less than 1. This implies that the bigger a city is, the fewer gas stations it has per person. Put simply, bigger cities enjoy economies of scale. In this sense, bigger is greener.

The same pattern holds for other measures of infrastructure. Whether you measure miles of roadway or length of electrical cables, you find that all of these also decrease, per person, as city size increases. And all show an exponent between 0.7 and 0.9.

Now comes the spooky part. The same law is true for living things. That is, if you mentally replace cities by organisms and city size by body weight, the mathematical pattern remains the same.
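To put numbers on the gas station example, here is a rough Python sketch of the scaling rule quoted above. It assumes the count grows as population to the power 0.77; the function names are ours, not Strogatz's.

```python
def infrastructure_ratio(population_ratio, exponent=0.77):
    """How many times more infrastructure a city needs when it is
    `population_ratio` times more populous, assuming power-law scaling."""
    return population_ratio ** exponent


def per_capita_ratio(population_ratio, exponent=0.77):
    """The same comparison expressed per person. Below 1 means the
    bigger city gets by with less per head."""
    return population_ratio ** (exponent - 1)


# A city 10 times as populous as another, using the 0.77 exponent quoted above.
print(f"gas stations: {infrastructure_ratio(10):.2f} times as many")
print(f"per person:   {per_capita_ratio(10):.2f} times as many")
```

With an exponent of 0.77, a city ten times the size needs a little under six times the gas stations, which works out at roughly 0.59 as many per person. Set the exponent to 1 and the economy of scale vanishes.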

It looks as if there are a lot of things that mathematical analysis could tell us about the cities we live in. The question is, are these properties inherently emergent, or could we design our urban environments more effectively and adjust some of those efficiency values in the process? [image by tylerdurden1]

# This post will make you 75% more likely to make the right decision on medicines!

No report on a new wonder-drug would be complete without the statistical results of the clinical trials – you know, the bit where it says that people taking Wotdafuxocin were 60% less likely to find captioned cat pictures funny, or something similar. [image by rbrwr]

It will probably come as no surprise to our more cynical readers that these risk reduction numbers – while technically correct – are expressed in a way that maximises the medicine’s apparent benefit to the casual reader:

Those are the figures on risk, expressed as something called the relative risk reduction. It is the biggest possible number for expressing the change in risk. But 54% lower than what? The trial was looking at whether it is worth taking a statin if you are at low risk of a heart attack or a stroke, as a preventive measure: it is a huge market – normal people – but these are people whose baseline risk is already very low.

If you express the same risks from the same trial as an absolute risk reduction, they look less exciting. On placebo, your risk of a heart attack in the trial was 0.37 events per 100 person years; if you were taking rosuvastatin it fell to 0.17. Woohoo.
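The two ways of slicing the same numbers are easy to check for yourself. Here is a small Python sketch using the heart attack rates quoted above; the function name is ours, and the "person-years of treatment per event avoided" figure is our own arithmetic on those rates, not something stated in the quoted article.

```python
def risk_reductions(control_rate, treated_rate):
    """Return (absolute, relative) risk reduction for two event rates
    expressed in the same units."""
    absolute = control_rate - treated_rate
    relative = absolute / control_rate
    return absolute, relative


# Heart attack rates from the trial, in events per 100 person-years.
placebo, rosuvastatin = 0.37, 0.17
absolute, relative = risk_reductions(placebo, rosuvastatin)

print(f"relative risk reduction: {relative:.0%}")
print(f"absolute risk reduction: {absolute:.2f} events per 100 person-years")

# Derived figure (an assumption of this sketch): person-years of treatment
# needed to avoid one heart attack, given rates per 100 person-years.
print(f"person-years per event avoided: {100 / absolute:.0f}")
```

Same trial, same data: a 54% drop in relative terms, or 0.2 fewer heart attacks per 100 person-years in absolute terms, which comes to about 500 person-years of pill-taking per heart attack avoided.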

Other research shows that even when faced with the same risk reduction expressed in two different ways, the majority of people will still pick the one where the number looks bigger. Don’t beat yourself up about it too much, though – it’s not just us patients who fall for the marketing tricks:

The same result has also been found in experiments looking at doctors’ prescribing decisions.

But try to think positive – it’s not often we get placed on an equal footing with our doctors, after all.

# Google search terms can predict flu outbreaks; what next?

You’d have to have been under a pretty large metaphorical internet rock to have missed all the reports about Google Flu Trends that are floating around the web today like sneezed particles of snot, but just in case:

By tracking searches for terms such as ‘cough’, ‘fever’ and ‘aches and pains’ it claims to be able to accurately estimate where flu is circulating.

Google tested the idea in nine regions of the US and found it could accurately predict flu outbreaks between seven and 14 days earlier than the federal centres for disease control and prevention.

So I was thinking, if they can predict flu outbreaks by using search terms as an indicator, what else can be predicted in a similar way? Stats geeks were rinsing comparisons of Obama and McCain as search terms in the run-up to the election, but politics is a bit more complicated than infectious diseases.

Or is it? [image by trumanlo]

# Black swans and the Fourth Quadrant

Statistician and economist Nassim Nicholas Taleb has written an essay on what he calls the Fourth Quadrant, or the statistical “danger zone”. It’s in-depth and I found it technically challenging, but I felt it was well worth the effort in the end. Seriously, go read it; it makes you feel cleverer. The gist:

Statistics can fool you. In fact it is fooling your government right now. It can even bankrupt the system (let’s face it: use of probabilistic methods for the estimation of risks did just blow up the banking system).

Taleb rails against the misuse of statistical economics that has led us to our current economic woes. Also check out the various responses to Taleb’s essay by assorted luminaries.

[essay on Edge.org] [image from BotheredByBees on flickr]