When I posted my “Closing a Door” post, I mentioned that a team of moderators would be filtering comments for me. Comments that did not meet my comment policy would not be approved. The moderators also found that some comments simply did not further the conversation, were unclear and confusing because of translation issues, or were just contentless spews of hatred.
The comments on that post are now closed. The moderators approved a total of 254 comments, with 213 comments on my “Closing a Door” post and 39 comments on my follow-up post, “What Makes A Good Community?” The moderators also filtered out 186 comments in total across those two posts. Now that the internet shitstorm is over, I thought it would be interesting to take a peek into the acid-filled well and pull out some metrics.
Of course, I didn’t want to actually read the comments. That would be silly! It would completely defeat the purpose of having comment moderators and let the trolls win. So instead, I used the power of open source to generate the metrics. I used the WordPress Exporter plugin to export all the comments on the two posts as XML. Then I used the Python wpparser library to parse the XML into something sensible. From there, my program wrote the commenters’ names, email addresses, and IP addresses into a CSV file, and I did some manual categorization of that information in Google Docs.
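The export-and-extract step can be sketched in a few lines. The post used the wpparser library; rather than guess at that library's API, this sketch swaps in Python's standard-library xml.etree.ElementTree and reads the WordPress WXR export directly. It assumes the usual WXR element names (wp:comment, wp:comment_author, wp:comment_author_email, wp:comment_author_IP, wp:comment_approved) and matches on local tag names because the version number inside the wp: namespace URI differs between WordPress releases. The helper names `extract_comments` and `rows_to_csv` are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

FIELDS = ["post", "name", "email", "ip", "approved"]


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def child_text(elem, name):
    """Return the stripped text of the first direct child with this local name, or ''."""
    for child in elem:
        if local_name(child.tag) == name:
            return (child.text or "").strip()
    return ""


def extract_comments(wxr_data, approved=None):
    """Collect commenter details from a WXR export (str or bytes).

    If `approved` is given (for example "0" or "1"), keep only comments whose
    wp:comment_approved value equals it. (Assumption: WXR element names as above.)
    """
    root = ET.fromstring(wxr_data)
    rows = []
    for item in root.iter("item"):  # each exported post is an RSS <item>
        post_title = child_text(item, "title")
        for comment in item:
            if local_name(comment.tag) != "comment":
                continue
            status = child_text(comment, "comment_approved")
            if approved is not None and status != approved:
                continue
            rows.append({
                "post": post_title,
                "name": child_text(comment, "comment_author"),
                "email": child_text(comment, "comment_author_email"),
                "ip": child_text(comment, "comment_author_IP"),
                "approved": status,
            })
    return rows


def rows_to_csv(rows):
    """Render commenter rows as CSV text: a header plus one line per comment."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Usage might look like `extract_comments(open("export.xml", "rb").read(), approved="0")`. Which status value the filtered comments carry ("0", "spam", or "trash") depends on how the moderators handled them, so it is worth checking the export first.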
Repeat Offenders or Drive-by Haters?
70% of the 186 filtered comments came from IP addresses that appeared only once. The remaining 30% of comments were generated by 19 different people, who left an average of three comments each. The most persistent troll commented 10 times.
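The repeat-offender numbers come down to counting filtered comments per IP address. The sketch below assumes one IP address per filtered comment (for example, the ip column of the CSV). It groups by IP alone, so a shared or changing IP address is treated as a single "person"; the post's count of 19 people may have been reached differently. The function name `ip_repeat_stats` is hypothetical.

```python
from collections import Counter


def ip_repeat_stats(ips):
    """Summarize how concentrated filtered comments are by IP address.

    `ips` holds one IP address per filtered comment.
    """
    total = len(ips)
    if total == 0:
        return {}
    counts = Counter(ips)
    # IPs that left exactly one comment ("drive-by" comments).
    single = sum(1 for n in counts.values() if n == 1)
    # IPs that left more than one comment ("repeat offenders").
    repeat = {ip: n for ip, n in counts.items() if n > 1}
    repeat_comments = sum(repeat.values())
    return {
        "total": total,
        "single_use_share": single / total,
        "repeat_ips": len(repeat),
        "repeat_share": repeat_comments / total,
        "avg_per_repeat_ip": repeat_comments / len(repeat) if repeat else 0.0,
        "max_from_one_ip": max(counts.values()),
    }
```

Grouping on a combination of IP address and email address instead would be a one-line change to what gets passed into the `Counter`.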
Anonymous Cowards or Brave Truth Tellers?
72% of the 186 filtered comments did not include a full name. Of the commenters who did not include a full name:
- 39 people used just a first name, making up 24% of the comments.
- 25 people used what looks like internet nicks, accounting for 16% of the comments.
- 17 people used various forms of the word “anonymous” in the name field, making up 9% of the comments.
- 12 people used an English word instead of a name, accounting for 8% of the comments.
- 4 people used obviously fake names, accounting for 7% of the comments.
- 8 people used their initials or one letter, accounting for 5% of the comments.
- 5 people used a slur in their name, accounting for 3% of the comments.
- 2 people used a threat in their name, accounting for 1% of the comments. [Edit: make that 3, or 2%]
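The categorization above was done by hand. If the list of names were much longer, a crude first pass could pre-sort the obvious cases before a human looks at them. The sketch below is a hypothetical heuristic, not the method used in this post: it cannot tell a first name from an English word, and it cannot spot fake names, slurs, or threats, so those still need a person. The function names and category labels are illustrative assumptions.

```python
import re
from collections import Counter

# A capitalized word such as "Jane", "Doe", or "McDonald" (assumed pattern).
CAPITALIZED = re.compile(r"[A-Z][A-Za-z'-]+")


def rough_name_category(name):
    """Hypothetical first-pass bucket for a comment's name field."""
    cleaned = name.strip()
    if not cleaned:
        return "empty"
    if "anon" in cleaned.lower():
        return "anonymous"
    compact = cleaned.replace(".", "").replace(" ", "")
    # One letter, or a short all-caps run such as "J.D." or "JD".
    if compact.isalpha() and len(compact) <= 3 and (len(compact) == 1 or compact.isupper()):
        return "initials"
    words = cleaned.split()
    if all(CAPITALIZED.fullmatch(w) for w in words):
        # A single capitalized word could be a first name or an English word.
        return "full name" if len(words) >= 2 else "single word"
    return "nick / other"


def category_shares(names):
    """Share of comments per rough category, given one name entry per comment."""
    if not names:
        return {}
    counts = Counter(rough_name_category(n) for n in names)
    return {cat: n / len(names) for cat, n in counts.items()}
```

Because the shares are computed per comment rather than per person, a single persistent commenter can move a category's percentage much more than its head count, which is why the list above reports both.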
Community Members or Internet Trolls?
38 people used a full name, accounting for 28% of the comments. That means just over a quarter of the comments came from people brave enough to put their real name behind them. (Or a full fake name.) The question becomes: are these people actually part of the open source community? Have they ever interacted on an open source mailing list? To answer these questions, I chose to search for each author name in the Mailing List Archives (MARC), where a variety of open source mailing lists are archived, including the Linux kernel subsystem mailing lists, BSD, database lists, etc.
Of the 38 people who used their real name, 14 had posted to an open source mailing list archived by MARC. They made up 8% of the filtered comments. Ten of those people had sent more than 10 messages to the lists.
[Edit] Of the 25 people who used what looked like internet nicks, 11 may be open source users (see the analysis below in the comments). They accounted for 8% of the filtered comments.
The key takeaway here is that only 16% of the filtered comments were made by open source users and developers. That matters, since the article itself was about open source community dynamics.
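The 16% figure is the sum of two disjoint groups: full-name commenters found in the archives (8%) and nick-using commenters who may be open source users (8%). The arithmetic can be sketched as below, assuming the archive lookups have already been done by hand and recorded per commenter. The `Commenter` record and `open_source_share` function are hypothetical, and a name match in an archive does not by itself prove it is the same person.

```python
from dataclasses import dataclass


@dataclass
class Commenter:
    name: str
    comments: int          # filtered comments left by this person
    archive_messages: int  # archive messages matched to this name (looked up by hand)


def open_source_share(commenters, total_comments):
    """Return (people with any archive messages, people with more than 10,
    share of all filtered comments written by people with any archive messages)."""
    active = [c for c in commenters if c.archive_messages > 0]
    heavy = sum(1 for c in active if c.archive_messages > 10)
    share = sum(c.comments for c in active) / total_comments
    return len(active), heavy, share
```

Running it once for the full-name group and once for the nick group, both against the same total of 186 filtered comments, gives two shares that can simply be added, because no commenter belongs to both groups.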
 Before you scream about privacy, note that my comment policy allows me to collect and potentially publish this information.