Why Most Published Research Findings are False

Why is it considered noble to ‘reason’ with someone, but ‘rationalizing’ is considered dishonest? How is an ‘excuse’ different from a ‘reason’? The words all describe the same activity, saving the appearances, but the various forms connote differing degrees of relative ‘truthfulness’, something which is hopefully determined independently of ‘reason’.

The modern-day cult of reason holds that truthfulness is a function of lots of good, repeatable observation and data. It’s not a bad assumption, and might even be proven true on the day that we have access to all of the universe’s information and have constructed a logical system with no flaws. But in the real world, it’s pretty darned easy to get bad data, and easy to arrive at the wrong conclusions even when the data is good (or not verifiably bad).

Throughout my career, I’ve worked with a variety of analytical tools used to justify decisions. A minor obsession of mine has been to observe how often business decisions are made on flawed data and results. The first time I observed this was very early in my career, when I discovered some flaws in a spreadsheet model being used to justify an expenditure of several million dollars. The flaws were small, but they cascaded to the final result and completely changed the outcome’s profitability and trend lines. I’ve since seen studies which show that 90% of spreadsheet models used in business contain errors, and that more than half contain errors which significantly impact the result. Feed bad data into these models, and the picture just gets worse. The same kinds of errors can be found in models created with other analytical tools.

Now Marginal Revolution reasons that more than half of published research findings are false. It makes sense, really (and I believe he gives economics too much of a pass, since economics suffers from different problems than medicine, such as dubious repeatability, inability to control important variables, and wide contextual variation).
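As a rough illustration of why the claim is plausible, here is a minimal back-of-the-envelope sketch. It is not taken from the linked post: the function name and all three input numbers (the share of tested hypotheses that are actually true, the statistical power, and the significance threshold) are assumptions chosen only to show the arithmetic.

```python
def positive_predictive_value(prior, power, alpha):
    """Share of 'statistically significant' findings that are actually true.

    prior: assumed fraction of tested hypotheses that are really true
    power: chance that a study detects a true effect
    alpha: chance that a study "detects" an effect that is not there
    """
    true_positives = prior * power
    false_positives = (1 - prior) * alpha
    return true_positives / (true_positives + false_positives)


# Hypothetical field: 1 in 20 tested hypotheses is true, studies have
# 50% power, and the usual 5% significance threshold is used.
long_shot_field = positive_predictive_value(prior=0.05, power=0.5, alpha=0.05)

# Hypothetical field with better odds going in: half of the tested
# hypotheses are true and studies have 80% power.
well_grounded_field = positive_predictive_value(prior=0.5, power=0.8, alpha=0.05)

print(f"long-shot field:     {long_shot_field:.0%} of findings are true")
print(f"well-grounded field: {well_grounded_field:.0%} of findings are true")
```

Under these assumed numbers, most published positives in the long-shot field are false even though every individual study followed the rules. Bias, bad data, and the repeatability problems mentioned above would only push the figure lower.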

I’m not arguing that reason and analysis are useless, however. They are incredibly useful tools. But the level of blind faith placed in these tools, and the lack of healthy scientific skepticism, often have me thinking of the emperor’s new clothes.

More on CSS Hacks

Wow, it’s nice to see the folks at positioniseverything endorsing the demise of CSS hacks in IE7.

Second only to user-agent string detection issues, CSS hacks that used to work in IE6 but no longer work in IE7 (due to CSS fixes) will probably be the most common cause of compatibility issues for IE7, and will cause some short-term pain to existing sites that use the hacks. The long-term benefits of having IE support more of CSS for new site development, however, are large. Thanks to p.i.e. for keeping sight of this important point.

Indentured Servants?

News today is that some in Congress want to raise the H-1B visa limit by 30,000. It’s nice, but not much. I’ve ranted about this before, and in fact the situation has gotten worse.

It’s surprising to me that this issue, known as “retrogression”, hasn’t received more press attention. Basically, due to massive backlogs and delays in the last five years, the effective wait for someone from China or India to receive permanent residence is in some cases approaching ten years. That is ten years of limited career mobility and freedom, amounting to virtual indentured servitude. Not only does the uncertainty make career planning difficult, it interferes with long-term planning for life decisions like home purchases, children’s schooling, and reuniting with spouses.

This situation is hurting our ability to attract and retain the world’s best talent. And the situation is not getting better. IEEE seems to think that it’s just as well if we send back all of the foreigners in tech. It’s one reason I left IEEE. But besides the fact that it’s a matter of basic human rights, booting these immigrants (by policy or sheer ineptitude) is opposed to America’s interests. The native population currently has a negative birthrate, and without immigration the population would actually be declining. So, if you are going to have immigrants living next to you, wouldn’t you rather they be highly-educated and ambitious people with strong community values? In a competitive global economy, you’re going to compete with these people anyway — might as well be making them citizens, rather than propping up governments in other countries less favorable to U.S. interests. I get a feeling that the U.S. government and citizens are turning a blind eye to this terrible situation, perhaps out of a misguided sense that what’s bad for immigrants is good for us. They couldn’t be more wrong.

Who’s the Master?

Google does spell-checking based on how frequently similarly-spelled words appear in the billions of texts they index. Many other sites do spell checking based on official references such as a dictionary. Which is “right”?

It’s surprising how many people will argue that “The Dictionary” is the authoritative source for word spellings (and word definitions, for that matter). A word is just a symbol, created by people for use by people, and it acquires its meaning based on how people use it. A dictionary is written by people who take the pulse of current word usage (like Google does) and capture it in reference form. The mapping between symbols and meanings shifts across time and context.
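To make the usage-based approach concrete, here is a minimal sketch of frequency-driven spelling correction. It is not Google’s actual algorithm, which isn’t described here. The tiny corpus, the function names, and the choice to look only at words one edit away are all assumptions made for illustration.

```python
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def one_edit_away(word):
    """All strings reachable from `word` by one delete, swap, replace, or insert."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    swaps = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + swaps + replaces + inserts)


def correct(word, counts):
    """Return the most frequently used known word within one edit of `word`."""
    if word in counts:
        return word  # people already write it this way
    candidates = [w for w in one_edit_away(word) if w in counts]
    if not candidates:
        return word  # nothing similar has been seen; leave it alone
    return max(candidates, key=lambda w: counts[w])


# Hypothetical stand-in for "the billions of texts they index".
corpus = "the truth is that the dictionary follows usage and usage follows the people"
counts = Counter(corpus.split())

print(correct("teh", counts))
print(correct("usige", counts))
```

Note that nothing in this sketch consults an official word list. The only authority is the count of how often people actually wrote each word, so the "right" spelling changes whenever the corpus does.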

This is self-evident on its face, yet people still default to trusting the authority of the constructed symbol first. In fact, this disease of blithely ceding sovereignty to our hollow creations is epidemic. Logic, reason, science, and even Bayesian networks are all lifeless golems constructed purely to do our bidding. We operate within their parameters when and because it serves us, and only then. Note that I’m not saying logic is to be bent to the will of the individual (though it happens every day), but that logic which doesn’t serve a more fundamental purpose or collective goal is no logic at all. And a word which isn’t rooted in collective usage is no word at all.

Better Living Through Shared CMS

William offers The Answer for Microsoft. I agree with the vision that’s articulated there, and as an industry we are much closer today than 5 years ago. Some quick thoughts, unpolished:

  • Steve Ballmer has pointed out that Office strength comes from the fact that the average information worker spends more than 60% of their working hours inside applications shipped by Microsoft. That is the holy grail for competitors like Google/Yahoo. You want people living their lives inside your apps; you do this by finding new applications that people will spend time in, replicating vulnerable areas of the office suite, and so on.
  • Dare reports from Web 2.0 that only 5% of user page time is spent in Search, while 40% is in content consumption and creation and 40% in communication. Most of the ad revenues are coming from search, since that’s where companies have learned how to monetize, but the fact is that “shared CMS” scenarios are a huge portion of people’s time, and thus a very attractive opportunity for the big web companies.
  • Don’t assume Microsoft doesn’t get it. Wikis are all over inside the company, and the version of OneNote for Office 12 is like a wiki on steroids (allowing simultaneous edits and an offline experience). Microsoft’s main deviation from the ideal, IMO, is in “universal accessibility”. We need license revenues for the client, so you only get the experience if you buy the product. Google has problems with “universal accessibility”, but for different reasons. Everyone still wants a walled garden. It’s getting easier to imagine zero-touch deployment and ad-funded versions of OneNote (which is just one example), so don’t write the vision off.
  • While I believe the vision of “universal accessible shared CMS” is progressing nicely, it’s not going to come from Microsoft alone. And I think that this “universal canvas” is just the first step.
  • The next step, after universal canvas, is something like a universal triple store/cloud. The people who think it’s about shared services and interfaces (or a grid) are wrong. Service-oriented grids may come in the interim, since they’re easier, but there is too much friction. There is nothing universal about a thousand different APIs from different vendors, and the data is the only thing that has value anyway. I’m not saying that SPARQL is the future (it’s not). But I agree with Dare that ning gives a hint of the future. To the extent that the developer interfaces are data-oriented (and simple and universal), you approach the next level. To the extent that the developer interfaces are behavior-oriented and type-bound, you are stuck in an expensive tower of Babel.

Normalizing Citysearch

DeWitt Clinton noticed that Citysearch ratings have inflated in the past few years, and decided to do something about it. His greasemonkey script normalizes the ratings to pick out the true top. Check out his before and after pics.

This approach should be the front-line filter for all semantic web trust problems. I envision a day when all metadata you browse can be normalized and filtered to strip out outliers and match normal statistical models. I’ve also argued for years that you need to be able to cluster based on past activity patterns for you and your circle of friends. This is no different from what we do in real life: your brain subconsciously clusters events into nameless contexts and gives weightings to observations based on context. The web should be the same.
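Here is a minimal sketch of what normalizing inflated ratings can look like. It is not DeWitt’s script, and I’m not claiming his uses the same method. It simply rescales each rating by how far it sits from the average of the set (a z-score), and the venue names and scores are made up.

```python
from statistics import mean, pstdev


def normalize(ratings):
    """Rescale ratings to z-scores: distance from the mean in standard deviations."""
    values = list(ratings.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Every rating is identical, so none stands out from the rest.
        return {name: 0.0 for name in ratings}
    return {name: (score - mu) / sigma for name, score in ratings.items()}


# Hypothetical inflated ratings on a 10-point scale: everything looks great.
ratings = {"Cafe A": 9.6, "Cafe B": 9.4, "Cafe C": 9.5, "Cafe D": 8.3}

scores = normalize(ratings)
top = max(scores, key=scores.get)

for name, z in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {z:+.2f}")
print("true top:", top)
```

On the raw scale all four venues look excellent. After rescaling, the spread between them becomes visible and the one that only looked good because of inflation drops well below zero. The same rescaling could be applied per reviewer or per circle of friends to get the context-based weighting described above.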

Why is Microsoft Afraid of Google?

This is a really bad article. The article asserts two falsehoods: one, that Microsoft is afraid of Google; and two, that Microsoft will win by being a relentless tortoise and copying features.

Let’s talk about the “copying” theory first. Five years ago, Bill Gates shook up the industry by announcing a dual-pronged strategy — all productivity apps seamlessly integrated into the universal canvas of the web, and the “web as a platform”. This wasn’t vapor, this was what I used every day. Five years ago, I did not have Office installed on my machine. I used an app that combined word processing, IM, telephony, and e-mail in a single universal canvas (with cool contextual side-menu), all running in my web browser. We decided not to ship it at that time, but it had nothing to do with product quality or feasibility.

Now fast forward to 2005. A bunch of people who worked on that project are now at Google, and rumors fly around about “bricking over” MSFT by shipping productivity apps on the web. At the same time, pundits run around talking about “web as a platform”, ripping off Bill’s 2000 vision wholesale without giving credit. Give me a break. Clearly what is happening is a bit different than Bill laid out in 2000, but the amount that is exactly the same is stunning (almost depressing; where is the originality and creativity?)

Another example is Google Earth vs. Virtual Earth. VE shipped slightly after Google Earth, but do people really think MSFT saw Google Earth and then “real quick like” copied the whole thing in a month or two? The company must really be invincible. Or take Messenger, which has been shipping new releases three times a year. Google just shipped their first version. Please don’t say we pre-emptively copied Google.

Now, this is not to disrespect Google. Both companies innovate, both copy, and both acquire innovators. I just think it’s crazy to say that any part of Microsoft’s strategy will involve copying Google. At best, it might involve resurrecting ideas that we had little incentive to ship in the past, but which are now relevant as Google tries to ship them.

Which gets to the point about “fear”. My argument is that Microsoft needs Google, to make competition fun again. Does the word lassitude mean anything? Picture a scene of a bunch of generals sitting around having no battles to fight, then the 99 red balloons floating by. “This is what we waited for, this is it boys, this is war!” Maybe even Beavis gripped by ADD, squealing with anticipation at some mayhem about to erupt.

It’s just an image that captures a mood, something that gets lost on the observers who fret about mere words like “winning” or “beating”. This is not a zero-sum game, and the most rational perspective is that both companies (and consumers) will be strengthened by the competition. When a basketball team talks about “stomping” their opponents, and pores over videos of past games, is that fear? And of course, it’s silly to focus on Google; Yahoo is schooling the web on shipping right now, and there are plenty of other strong competitors. If they all use “beat the other guy” rhetoric to galvanize their teams, I call that fun, not fear.

Seriously, look at the fruits of this newly-galvanized competition. The software world is exciting again. Which do you think is more threatening to a large company like Microsoft; being bored to death by lack of worthy competitors, or having major incentive to do new and cool things that get product teams excited and energized about coming to work?