When I posted my Closing a Door post, I mentioned that a team of moderators would be filtering comments for me. Comments that did not meet my comment policy would not be approved. Moderators also found that some comments simply did not further the conversation, were unclear and confusing due to translation issues, or were just contentless spews of hatred.
The comments on that post are now closed. The moderators approved a total of 254 comments, with 213 comments on my “Closing a Door” post, and 39 comments on my follow-up post “What Makes A Good Community?” The moderators also filtered out 186 comments total on those two posts. Now that the internet shit storm is over, I thought it would be interesting to take a peek into the acid-filled well in order to pull out some metrics.
Of course, I didn’t want to actually read the comments. That would be silly! It would completely defeat the purpose of having comment moderators and let the trolls win. So, instead I used the power of open source to generate the metrics. I used the WordPress Exporter plugin to export all the comments on the two posts in XML. Then I used the Python wpparser library to parse the XML into something sensible. From there, the program wrote the commenters’ names, email addresses, and IP addresses [1] into a CSV. I did some manual categorization of that information in Google Docs.
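A minimal sketch of that export-to-CSV step, for readers who want to try something similar. This is not the original script: it swaps in Python's standard-library `xml.etree.ElementTree` for wpparser, and it assumes the export uses the WXR 1.2 namespace and the usual `wp:comment_author`, `wp:comment_author_email`, `wp:comment_author_IP`, and `wp:comment_approved` elements. The function names and the sample data are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumption: WXR 1.2 namespace. Older exports may use .../export/1.0/ or 1.1/.
WP_URI = "http://wordpress.org/export/1.2/"
WP_NS = {"wp": WP_URI}


def extract_commenters(root):
    """Return one (name, email, ip, approved) tuple per comment in the export."""
    rows = []
    for comment in root.iter("{%s}comment" % WP_URI):
        def field(tag):
            text = comment.findtext("wp:" + tag, default="", namespaces=WP_NS)
            return (text or "").strip()

        rows.append((
            field("comment_author"),
            field("comment_author_email"),
            field("comment_author_IP"),
            field("comment_approved"),
        ))
    return rows


def write_csv(rows, out):
    """Write the rows to any file-like object, with a header line."""
    writer = csv.writer(out)
    writer.writerow(["name", "email", "ip", "approved"])
    writer.writerows(rows)


# Hypothetical one-comment export, only to show the shape of the data.
SAMPLE = """<rss xmlns:wp="http://wordpress.org/export/1.2/"><channel><item>
  <title>Example post</title>
  <wp:comment>
    <wp:comment_author>Example Name</wp:comment_author>
    <wp:comment_author_email>a@example.com</wp:comment_author_email>
    <wp:comment_author_IP>192.0.2.1</wp:comment_author_IP>
    <wp:comment_approved>0</wp:comment_approved>
  </wp:comment>
</item></channel></rss>"""

rows = extract_commenters(ET.fromstring(SAMPLE))
buffer = io.StringIO()
write_csv(rows, buffer)
```

For a real export file, `ET.parse(path).getroot()` would take the place of `ET.fromstring(SAMPLE)`, and the output would go to an opened file rather than a `StringIO` buffer.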
Repeat Offenders or Drive-by Haters?
70% of the 186 filtered comments were from unique IP addresses. The remaining 30% of comments were generated by 19 different people, who left an average of three comments each. The most persistent troll commented 10 times.
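As a rough sanity check on those numbers: 30% of 186 is about 56 comments, and 56 comments spread over 19 people is a little under three each. The sketch below shows one way such figures could be computed from a list of IP addresses. It assumes that one IP address corresponds to one person, which is only an approximation, and the function name and sample addresses are invented for illustration.

```python
from collections import Counter


def repeat_stats(ips):
    """Summarize how many comments came from one-off vs. repeat IP addresses."""
    if not ips:
        raise ValueError("need at least one comment")
    counts = Counter(ips)
    one_off = sum(1 for n in counts.values() if n == 1)
    repeaters = {ip: n for ip, n in counts.items() if n > 1}
    return {
        # Share of all comments that came from an IP seen exactly once.
        "pct_one_off": 100 * one_off / len(ips),
        # Number of IPs that commented more than once.
        "repeat_commenters": len(repeaters),
        # Mean number of comments among the repeat IPs only.
        "avg_per_repeater": (
            sum(repeaters.values()) / len(repeaters) if repeaters else 0.0
        ),
        # Comment count of the most persistent IP.
        "max_comments": max(counts.values()),
    }


# Made-up addresses from the documentation range, not real data.
example = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.3",
           "192.0.2.4", "192.0.2.4", "192.0.2.4"]
stats = repeat_stats(example)
```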
Anonymous Cowards or Brave Truth Tellers?
72% of the 186 comments did not include a full name. Of the commenters that did not include a full name:
- 39 people used just a first name, making up 24% of the comments.
- 25 people used what looks like internet nicks, accounting for 16% of the comments.
- 17 people used various forms of the word “anonymous” in the name field, making up 9% of the comments.
- 12 people used an English word instead of a name, accounting for 8% of the comments.
- 4 people used obviously fake names, accounting for 7% of the comments.
- 8 people used their initials or one letter, accounting for 5% of the comments.
- 5 people used a slur in their name, accounting for 3% of the comments.
- 2 people used a threat in their name, accounting for 1% of the comments. [Edit: make that 3, or 2%]
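The categories above were assigned by hand, and most of them (first name vs. nick vs. English word, slurs, threats) need human judgment. For anyone repeating the exercise on a larger data set, a crude first pass could pre-sort the easy cases before manual review. The sketch below is purely hypothetical: the category labels and rules are assumptions, and it will misfile names such as two-word nicks as possible full names.

```python
import re


def first_pass_category(name):
    """Crude pre-sort of a commenter name; everything else needs manual review."""
    cleaned = name.strip()
    tokens = cleaned.split()
    # Any word starting with "anon" ("Anonymous", "anon123", "Anon Coward").
    if re.search(r"\banon", cleaned.lower()):
        return "anonymous"
    # Every token is a single letter, optionally followed by a period.
    if tokens and all(re.fullmatch(r"[A-Za-z]\.?", t) for t in tokens):
        return "initials"
    # Two or more words that each start with a letter: maybe a full name.
    if len(tokens) >= 2 and all(t[0].isalpha() for t in tokens):
        return "possible_full_name"
    return "needs_review"


labels = [first_pass_category(n) for n in
          ["Anonymous Coward", "J. S.", "Example Person", "int0xc0ffee", ""]]
```

Anything labeled `possible_full_name` or `needs_review` would still have to be checked by a person.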
Community Members or Internet Trolls?
38 people used a full name, accounting for 28% of the comments. That means approximately 1/3 were brave enough to put their real name behind their comments. (Or a full fake name.) The question becomes, are these people actually a part of the open source community? Are they people who have actually interacted on an open source mailing list before? To answer these questions, I chose to search the author name in the Mailing List Archives (MARC) where a variety of open source mailing lists are archived, including the Linux kernel subsystem mailing lists, BSD, database lists, etc.
Of the 38 people who used their real name, 14 people had interacted on an open source mailing list archived by MARC. They made up 8% of the filtered comments. Ten of those people had more than 10 mails to the lists.
[Edit] Of the 25 people that used what looked like internet nicks, 11 of them may be open source users (see analysis below in the comments). That accounted for 8% of the filtered comments.
The important takeaway here is that only 16% of the filtered comments were made by open source users and developers. This is an important finding, since the article itself was about open source community dynamics.
[1] Before you scream about privacy, note that my comment policy allows me to collect and potentially publish this information.
Out of curiosity, what did you scan for to automatically determine if a comment contained a threat?
And, out of curiosity, do you have metrics for use of major profanity as well?
(Sorry for making an anonymous comment myself.)
The two threats were contained in the names field, and consisted of “Get Raped At Intel” and “Dead”. I did not search the bodies of the comments for further threats. I would like to do statistical analysis of the comments, searching for slurs or cussing eventually.
This is interesting information, thank you for sharing it.
BTW, I had a difficult time finding any link to your comment policy that was not in the body of this post. If I may offer a suggestion: a link to it very near the comment form would be very helpful.
There should be a link to it at the bar at the top of the page. That said, I will try to work my WordPress foo to get a link above the comment box. It might take a bit, since I’m a systems engineer, not a web developer. 🙂
“The important take away here is that only 8% of the filtered comments were made by open source users and developers.”
No, it only means that you were only able to reverse-find those 8%, and nothing more. As a counterexample, pick a random lwn article: it will be filled with handles rather than full real names, and saying that 95% of people on lwn don’t use GNU/Linux because of that would be naive. Same for, say, Ubuntu forum users, etc.
Thank you for the lovely suggestion of how to expand my analysis to include irc nicks! However, the nicks only accounted for 16% of the filtered comments, which means even if I’m *completely wrong* about the people using nicks, only 22% of the filtered comments were from open source users or contributors. However, to soothe your soul, let’s take a look at the list of potential nicks:
H.Trickler – Googling for “H.Trickler Linux” gives me no interesting hits, but “H.Trickler open source” looks like this person leaves comments on open source articles. We’ll include them.
RatBert – eh, there’s a bunch of people who have named their machines RatBert, but I can’t find any person who uses that nick.
soCal – There’s Southern California Linux Expo, which makes it really hard to search for this one. I’ll include it for posterity’s sake.
fi11222 – Oh, here’s a nice unique username! Looking through the first few pages of Christian and music site results, I did find a stackoverflow page. Great! One more commenter who says they use open source.
dolgo – Dolgo is the name of a type of crab apple tree. Might be someone in the security world? I’ll include them too!
laf163 – Might be someone in the postgres world? Sure, ok, let’s include them too.
ibsteve2u – I poked through some comments on articles, but this person seems to be interested in privacy, but not necessarily open source. I’ll include them anyway.
Jazz Fan – Nothing interesting when searching for ‘”Jazz Fan” open source’.
cade – Similar nope.
AssemblyLineHuman – Nope.
Landpaddle – Nope.
capkid – nope
Ralph Rofl – nope
mvr1981 – nope
ArthurTent – eh, this person might use IRC? I’ll include them too.
SPQR – Maybe this person is from the open source turn-based strategy game? Eh, sure, let’s include them too.
Lopp – Might be the last name of a person who spoke at OSCON? Ok, we’ll include them too!
thebigwobby – There’s lots of posts about “wobbly windows” in Linux desktop environments, but no person here.
Pipilangstrumpf – A German name for a type of horse? No results for “Pipilangstrumpf open source”
int0xc0ffee – No Google results at all for this, which is odd… Congrats, you’ve found a completely unique username and/or obscured yourself from Google!
Vegetarianmassacre666 – Nada. I suspect this person thinks I’m vegetarian, which I’m not, but if so, they should possibly be counted in the threats.
Sin2x – Ooh, a math nut! That makes it very hard to find open source results though. I’m not counting this one.
brendafdez – Hey, it’s a bitcoin person. Sure, I’ll include you too, you currency lover!
BSD User – Yeah, that’s pretty obvious. Including them too.
trumpish – Funny, but no open source results.
Now, let’s look at where that wasted 30 minutes of searching got us. I originally reported that only 8% of the filtered comments were from open source users or contributors, and with the additional nick results, that number has moved to *dun-dun-a-dun!* a whopping 16%!
Congratulations, you’ve wasted time that I could have spent enabling graphics on the latest Intel platform, all in order to double the results and prove that 84% of the commenters are still not open source contributors or users. Additionally, the number of commenters that included a threat in their name has moved from 2 to 3 people, and increased to 2%.
Thank you for participating in this week’s internet rathole!
Pipi Langstrumpf is the German name of the main character in a series of Swedish children’s books.
Frankly, I think you were overly generous in your reclassification. The majority of hecklers were probably GG trolls who wouldn’t know a compiler if it hit them in the family jewels, but who pounce on anyone (or at least any woman) they catch writing about discrimination in tech, especially if Randi approves of them. They’ll probably try to “prove” that you don’t even know how to program and that your husband’s brother’s roommate’s uncle’s dog wrote all your code for you.
Hello, Sarah. Maybe some haters also love you – you are 4th in open votes “Person of The Year 2015 in the Linux/OpenSource Community” – https://tlhp.cf/man-of-the-year-2015/ with 708 voices.
All is not so bad as you say.
Heh, I’m glad they renamed it to person of the year! And I’m amused the haters in the comments think I’m in my twenties. I’ll take that as a compliment.
Nicely done. *applause*
Any experiment’s data is less meaningful without a control. Doing the same stats on the posts which the moderators had approved would shed more light.
I think you’ve already done as much analysis here as is warranted, but a more direct way of measuring open source contributions and the corresponding weight their opinions might have in social fora would be to look at number/size of commits to github, bitbucket etc public repositories. I don’t know if any of them provide aggregate statistics already, but if not it seems like a project for some enterprising individual who wanted to make whuffie more of a reality for the average reader.
I’m learning of this story late and can’t comment on Closing A Door posting. I have seen similar attitudes with developers of many projects where there is a guru in charge. Also some system admins. The harshness is a defence, made in substitution for discussing and understanding other points of view. Sometimes it is even a defence to avoid admitting they don’t get it. What is lacking is honesty to reveal they have not thought of everything, and the humility to be an equal human being.
I was never able to be a developer, because to stay on top of the game, it would involve living in a world of code constantly. You can expect this type of specialty to produce or attract people who don’t work with humans as well as they would like. One of the attractions of coding is to have absolute control over what the machine does and how. This is what a coder experiences for many hours and it can be hard to adjust to people again, or if they never did adjust, people are just irritating. It is simpler to imitate the guru and shame others until they comply.
Really like the blog post about making a good community. I feel this discussion will contribute to a ripple effect you won’t see with outcomes in 2016, but is subtly making people more aware where the open source community needs to progress. Thank you for doing this.
I wrote a 7-piece series of articles against verbal abuse and Torvalds for root.cz, a Linux and Open Source specialized website. I am a co-author of the Twibright Links browser and author of the open source Ronja wireless optical link. I also cite this study there. They published 3 articles out of 7 and said they will stop it. I published 33 articles with this company and they never ever refused my article. These articles about verbal abuse have very high comment counts and I believe they are quality written, all arguments and facts based on citations from studies by renowned sources. Now they wasted my money for 4 royalties. I get the impression the verbal abuse culture is really ingrained in IT.
I am arguing in the articles that verbal abuse is violence, that bullying is endemic in IT, that verbal abuse causes both physical and psychiatric damage to the human brain (studies cited are supported by NIMH, the biggest psychiatry research centre in the world). I am giving examples where it leads to loss of productivity and team damage on large OS projects: Sarah Sharp leaving Linux, Theo de Raadt being kicked out of NetBSD.
https://www.root.cz/serialy/dopousti-se-linus-torvalds-verbalniho-zneuzivani/
I think these articles are among the best I ever wrote for this company and they are talking about them as if they were low quality and that I “can write better ones”. Especially my articles about verbal abuse are referencing scientific studies constantly and have a clear structure of introduction, definitions, examples of harm, and proposed solution, so they are kinda in the style of a scientific study article.
They are claiming the readers are receiving the articles negatively (no wonder – so many perpetrators among them, cf. prevalence of bullying in IT). I don’t read the reactions and comments because they are full of verbal abuse themselves.
I perceive this as some kind of censorship and unfair rejection. The article is apparently being rejected because it shows a reality perpetrators of verbal abuse don’t want to see.
I am disgusted by this.
I posted a comment that I wrote articles about how harmful verbal abuse from Torvalds is, about quoting a study, and that the articles were censored. I wrote it politely and in an agreeing way; I didn’t criticize Sarah in any way. I believe the comment was on topic.
Yet it got deleted. I feel unfairly treated. And I feel hypocritically treated because Sarah fights against mistreatment and in my opinion mistreats me at the same time – unfairly deleting a perfectly fine, friendly, supportive, agreeing, on-topic comment.
All comments are moderated, and I’ve been at a conference all week, so my moderation queue has been neglected. I’ve approved your comment, although it’s an exact duplicate of the one you left on http://sarah.thesharps.us/2016/10/29/measuring-the-impact-of-negative-language-on-foss-participation-part-i/#comment-158149 Please don’t leave duplicate comments in the future. Thanks!
This blog’s internal workings unfortunately give the false impression that comments are being deleted when they are not, which is damaging to interpersonal relationships.
After I posted the comment about the article, it always appeared with a remark “under moderation”, even when I revisited the website later the same day. Many days later, it disappeared. I concluded that it got unfairly deleted. I am 100% sure it wasn’t displayed.
So I posted a comment where I say I feel unfairly treated from deleting the agreeing comment. However at the moment I posted it, the original comment, which was previously surely not displayed, got displayed again.
Similarly I noticed Facebook falsely reports messages as read, which makes an impression of rude or disrespectful treatment by the reader party. These technological glitches are unfortunately damaging to interpersonal relationships.
My privacy settings on Facebook don’t allow messages from people I don’t follow. In general, a public tweet is the best way to get a hold of me.