Is TIOBE Fatally Flawed?
update: As Bogdy mentions in the comments, my reasoning here was based on false assumptions. It still seems clear that ranking APL above Haskell, along with other anomalies, disqualifies TIOBE for any serious purpose, at least past the top ten or so languages. My rankings should be ignored, though.
During a debate at work about using Haskell for a project, a coworker pointed out that Haskell is ranked #41 on the TIOBE. On further investigation, things look really fishy. Common interpretations of TIOBE include the amount of “community”, “buzz”, or “excitement” around a language. By none of these standards can APL reasonably edge out Haskell. I dug further.
Summary of findings: the TIOBE is severely broken. It is falling victim to the fact that search engines grossly overestimate their number of results. For example, if I search Google for “haskell programming”, as TIOBE does, the resulting page proudly estimates 44,500 results. However, if I click through the results, I hit the end of the list after only 652. Nice for marketing Google, perhaps, but it seems the estimate was rather poor. Similar things happen with other languages.
TIOBE, despite using several search engines, seems to correlate well with Google’s estimated (i.e., phony) number of results. It correlates very badly with the actual number of results. Here’s my corrected TIOBE list, built only from the top 50 languages in the original list. In order to comply with Google’s terms of service, I painstakingly did this by hand, so I didn’t go any further.
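The comparison described above (how closely one ranking tracks another) can be made concrete with a Spearman rank correlation. The following is only a sketch: apart from the two Haskell figures quoted earlier (44,500 estimated, 652 reachable), every number is an invented placeholder, and the names `index_score`, `estimated_hits`, and `clicked_hits` are hypothetical, not TIOBE's actual data or method.

```python
def ranks(values):
    """Rank values from largest (rank 1) to smallest; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of entries tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation; assumes neither input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical data for five languages. Only the fourth entry of the two
# hit-count lists (44,500 and 652, for Haskell) comes from the post.
index_score = [20.0, 10.0, 5.0, 1.0, 0.2]                 # invented index ratings
estimated_hits = [900_000, 500_000, 200_000, 44_500, 10_000]  # first-page estimates
clicked_hits = [640, 700, 610, 652, 720]                  # results reachable by paging

print("index vs. estimated:", spearman(index_score, estimated_hits))
print("index vs. clicked:  ", spearman(index_score, clicked_hits))
```

A value near 1 means the two orderings agree, and a value near 0 or below means they do not. With real figures in place of the placeholders, the same two calls would show which hit count an index actually tracks.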
There are some things that are initially surprising, but some thought indicates they may be reasonably expected. Languages near the top tend to be those that are somewhat old (more time to write about them) or commonly used – past or present – in business and/or the academic world. That’s because these languages have a reason to have a lot of web pages written about them. One example: Prolog clearly isn’t a commonly used language nor one with a lot of community, but it’s taught in the introductory “programming languages” course of just about every computer science department in the world, because they feel better including something besides imperative and functional languages. Hence, it’s been written about a lot. One can still see the “big community” effect, though, if only in languages that appear above where you’d expect to see them.
I also split Lisp/Scheme into Lisp and Scheme separately, and dropped Natural because Googling for “natural programming” turned up more irrelevant results than relevant ones.
Without further delay, the “Chris” update to the TIOBE list.
1. Fortran
2. COBOL
3. C
4. Logo
5. JavaScript
6. MATLAB
7. Prolog
8. RPG
9. ML
10. Pascal
11. Lingo
12. Scheme
13. LISP
14. REXX
15. C++
16. Forth
17. Smalltalk
18. Icon
19. SAS
20. ABAP
21. Tcl
22. IDL
23. FoxPro
24. Haskell
25. Bash
26. Java
27. CL
28. APL
29. ColdFusion
30. Delphi
31. Perl
32. BASIC
33. Objective C
34. Erlang
35. Lua
36. Ada
37. Awk
38. ActionScript
39. VBScript
40. Ocaml
41. D
42. Dylan
43. C#
44. Python
45. Ruby
46. Transact-SQL
47. PHP
48. LabView
49. S-lang
50. PL/SQL
By a similar standard, in no way is Ada drawing more community, buzz, or attention than Python.
Let’s just write the whole thing off, shall we?
Indeed, Bill, I didn’t mean to say that it was. I only meant to say that in the real world, Google indexes more pages that contain the phrase “ada programming” than “python programming”. When one sees the real list, it busts a lot of misconceptions; including that the number of web pages about something is any kind of indicator of its popularity, community, or liveliness.
It may also be the phrase that TIOBE chooses. I’d venture a guess that there are a lot of web pages that talk a lot about Python but never contain the exact phrase “Python programming”. This is partly because that phrase is rather formal, and Python isn’t a very formal language. So yeah, the “new and improved” version isn’t foolproof either.
In my opinion, this kind of ranking just can’t be done by one formula. You’re generally right, cdsmith. Your ranking is correct for your aims. The TIOBE one is wrong for their aims, which would require a lot more-refined querying.
Search “XXX language”. With quotes. These numbers are more accurate. But useless too.
Are you telling me you got less than 652 hits for Java? I see you rank Haskell 2 spots above Java.
Actually, Google limits the results to 1000 for any keyword or search phrase. From those 1000 it cuts the non-relevant ones, like pages with similar content. So what you get here is very wrong.
To check my claim, search for “computer” and you’ll see there are fewer than 900 pages containing the word :P.
Just like Bogdy said, you cannot get more than 1000 results from Google. And after Google removes duplicates from the list you get something like 652 ;-)
Your ranking downgraded Lua from #21 to #35, therefore you are a bad person and deserve to die.
Give me a break! TIOBE is much more than just a googlefight. Read the TIOBE pages to learn how they generate this. You’re just going to make up your own ranking based on what google’s supposed “real” hit count is after going next, next, next to the end and then make your “own” ranking based on this? You’ve got to be kidding me. What a joke.
Do you think the fact that Java was ranked behind Haskell was a small clue? Or maybe that Logo was ranked 4th. Wow, Haskell programmers (the religious kind) really do live in another world….
I wonder… why ASP.NET does not belong to the list?
ASP.NET is not on the list because it’s not a programming language. You can use C#, VB.Net, Boo, or any .NET language to write an ASP.NET web application.
And that is another problem with the TIOBE index, at least regarding .NET languages. The fact is that the main language names (C#, VB.NET) are seldom used in blog posts. They are used more in basic articles or overviews of the language. .NET blogs are more concerned with some technology within .NET, so there are blogs about ASP.NET, Ajax, WCF, WPF, CLI, but people rarely write about C# or VB.NET itself.
As an argument for this, take a look here:
http://lui.arbingersys.com/index.html
and you’ll see that the graphs that show language USAGE show a much higher rate for C# than the search for “C# programming” does. Also, if they looked at the Codeplex repositories, they’d probably see even more of C# in the big picture.