June 18, 2007 / cdsmith

Is TIOBE Fatally Flawed?

Update: As Bogdy mentions in the comments, my reasoning here was based on false assumptions. It still seems clear that ranking APL above Haskell, along with other anomalies, disqualifies TIOBE for any serious purpose, at least past the top ten or so languages. My rankings should be ignored, though.

During a debate at work about using Haskell for a project, a coworker pointed out that Haskell is ranked #41 on the TIOBE index.  On closer inspection, that ranking looks really fishy.  TIOBE is commonly interpreted as measuring the amount of “community”, “buzz”, or “excitement” around a language.  By none of these standards can APL reasonably edge out Haskell.  I dug further.

Summary of findings: the TIOBE is severely broken.  It is falling victim to the fact that search engines grossly overestimate their number of results.  For example, if I search Google for “haskell programming”, as TIOBE does, the resulting page proudly estimates 44,500 results.  However, if I click through the results, I hit the end of the list after only 652.  Nice for marketing Google, perhaps, but it seems the estimate was rather poor.  Similar things happen with other languages.
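To put the gap in numbers, here is a quick sketch that uses only the two figures quoted above for “haskell programming”. The variable names are mine and purely illustrative.

```python
# Figures quoted above for the query "haskell programming".
estimated_results = 44_500   # count shown on Google's first results page
reachable_results = 652      # last result reached by clicking through

# How many times larger the estimate is than the reachable list.
overestimate_factor = estimated_results / reachable_results

# Share of the estimated results that could actually be paged through.
reachable_share = reachable_results / estimated_results

print(f"estimate is {overestimate_factor:.1f}x the reachable count")
print(f"only {reachable_share:.1%} of the estimate was reachable")
```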

TIOBE, despite using several search engines, seems to correlate well with Google’s estimated (i.e., phony) number of results.  It correlates very badly with the actual number of results.  Here’s my corrected TIOBE list, built only from the top 50 languages in the original list.  In order to comply with Google’s terms of service, I painstakingly did this by hand, so I didn’t go any further.
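One way to check a correlation claim like this is a Spearman rank correlation between TIOBE positions and each set of hit counts. The sketch below is not how I produced the list. The five languages and all of their numbers are made-up placeholders, not measurements, and the helper names (`ranks`, `pearson`, `spearman_vs_positions`) are hypothetical.

```python
def ranks(values):
    """1-based ranks where the largest value gets rank 1; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


def spearman_vs_positions(positions, counts):
    """Spearman correlation between list positions (1 = top) and hit counts."""
    return pearson(positions, ranks(counts))


# PLACEHOLDER data for five imaginary languages, NOT real measurements.
tiobe_positions = [1, 2, 3, 4, 5]
estimated_hits = [900_000, 500_000, 300_000, 100_000, 40_000]
clicked_through_hits = [610, 880, 640, 790, 700]

rho_estimated = spearman_vs_positions(tiobe_positions, estimated_hits)
rho_clicked = spearman_vs_positions(tiobe_positions, clicked_through_hits)

print(f"positions vs estimated counts:       {rho_estimated:+.2f}")
print(f"positions vs clicked-through counts: {rho_clicked:+.2f}")
```

A value near +1 means the two orderings agree, and a value near 0 or below means they do not. With real data, the counts would have to be collected by hand, as described above.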

Some of the results are initially surprising, but on reflection they might reasonably be expected.  Languages near the top tend to be those that are somewhat old (more time to write about them) or commonly used, past or present, in business and/or the academic world, because such languages have a reason to have a lot of web pages written about them.  One example: Prolog clearly isn’t a commonly used language, nor one with a lot of community, but it’s taught in the “programming languages” intro courses of just about every computer science department in the world, because they feel better including something besides imperative and functional languages.  Hence, it’s been written about a lot.  One can still see the “big community” effect, though, if only in languages that appear above where you’d expect to see them.

I also split Lisp/Scheme into Lisp and Scheme separately, and dropped Natural because Googling for “natural programming” turned up more irrelevant results than relevant ones.

Without further delay, the “Chris” update to the TIOBE list.

  1. Fortran
  2. COBOL
  3. C
  4. Logo
  5. JavaScript
  6. MATLAB
  7. Prolog
  8. RPG
  9. ML
  10. Pascal
  11. Lingo
  12. Scheme
  13. LISP
  14. REXX
  15. C++
  16. Forth
  17. Smalltalk
  18. Icon
  19. SAS
  20. ABAP
  21. Tcl
  22. IDL
  23. FoxPro
  24. Haskell
  25. Bash
  26. Java
  27. CL
  28. APL
  29. ColdFusion
  30. Delphi
  31. Perl
  32. BASIC
  33. Objective-C
  34. Erlang
  35. Lua
  36. Ada
  37. Awk
  38. ActionScript
  39. VBScript
  40. OCaml
  41. D
  42. Dylan
  43. C#
  44. Python
  45. Ruby
  46. Transact-SQL
  47. PHP
  48. LabVIEW
  49. S-lang
  50. PL/SQL

15 Comments

  1. Bill Mill / Jun 18 2007 8:26 pm

    By a similar standard, in no way is Ada drawing more community, buzz, or attention than Python.

    Let’s just write the whole thing off, shall we?

  2. cdsmith / Jun 18 2007 8:47 pm

    Indeed, Bill, I didn’t mean to say that it was. I only meant to say that in the real world, Google indexes more pages that contain the phrase “ada programming” than “python programming”. When one sees the real list, it busts a lot of misconceptions; including that the number of web pages about something is any kind of indicator of its popularity, community, or liveliness.

  3. cdsmith / Jun 18 2007 8:51 pm

    It may also be the phrase that TIOBE chooses. I’d venture a guess that there are a lot of web pages that talk a lot about Python but never contain the exact phrase “Python programming”. This is partly because that phrase is rather formal, and Python isn’t a very formal language. So yeah, the “new and improved” version isn’t foolproof either.

  4. LeCamarade / Jun 18 2007 11:24 pm

    In my opinion, this kind of ranking just can’t be done by one formula. You’re generally right, cdsmith. Your ranking is correct for your aims. The TIOBE one is wrong for their aims, which would require a lot more-refined querying.

  5. Ivan Tikhonov / Jun 19 2007 1:08 am

    Search “XXX language”. With quotes. These numbers are more accurate. But useless too.

  6. Tony BenBrahim / Jun 19 2007 1:14 am

    Are you telling me you got less than 652 hits for Java? I see you rank Haskell 2 spots above Java.

  7. Bogdy / Jun 19 2007 1:34 am

    Actually, Google limits the results to 1000 for any keyword or search phrase. From these 1000 it cuts the non-relevant ones, such as pages with similar content. So what you get here is very wrong.

    To check my claim, search for “computer” and you’ll see there are fewer than 900 pages containing the word :P.

  8. taw / Jun 19 2007 3:53 am

    Just like Bogdy said, you cannot get more than 1000 results from Google. And after Google removes duplicates from the list you get something like 652 ;-)

  9. Assen / Jun 19 2007 7:40 am

    Your ranking downgraded Lua from #21 to #35, therefore you are a bad person and deserve to die.

  10. Jim Fredricks / Jun 6 2008 5:11 pm

    Give me a break! TIOBE is much more than just a googlefight. Read the TIOBE pages to learn how they generate this. You’re just going to make up your own ranking based on what google’s supposed “real” hit count is after going next, next, next to the end and then make your “own” ranking based on this? You’ve got to be kidding me. What a joke.

  11. Ola Hola / Jun 6 2008 5:25 pm

    Do you think the fact that Java was ranked behind Haskell was a small clue? Or maybe that Logo was ranked 4th. Wow, Haskell programmers (the religious kind) really do live in another world….

  12. Crusher / Jul 22 2008 3:55 am

    I wonder… why ASP.NET does not belong to the list?

  13. Tom / Jul 23 2008 9:03 am

    ASP.NET is not on the list because it’s not a programming language. You can use C#, VB.Net, Boo, or any .NET language to write an ASP.NET web application.
    And that is another problem with the TIOBE index, at least regarding .NET languages. The fact is that the main language names (C#, VB.NET) are seldom used in blog posts. They are used more in basic articles or overviews of the language. .NET blogs are more concerned with some technology within .NET, so there are blogs about ASP.NET, Ajax, WCF, WPF, CLI, but people rarely write about C# or VB.NET itself.

  14. Tom / Jul 23 2008 9:09 am

    As an argument for this, take a look here:

    http://lui.arbingersys.com/index.html

    and you’ll see that graphs that show language USAGE show a much higher rate for C# than the search for “c# programming” does. Also, if they looked at the Codeplex repositories, they’d probably see even more of C# in the big picture.

Trackbacks

  1. Top Posts « WordPress.com
