Outreachy progress: 2019-03

Summary of work:

  • Approved 1,013 initial applications total
  • Investigated an issue with KiwiIRC being banned on some IRC servers
  • Fixed some IRC link issues in the Outreachy website
  • Promoted new projects on Twitter and the Outreachy announce list
  • Sent semi-automated emails to remind applicants of the final application deadlines
  • Answered applicants’ questions, as time allowed
  • Communicated with potential Outreachy sponsors
  • Chased down some outstanding invoices from the December 2018 round
  • Communicated with December 2018 interns who had internship extensions

Outreachy progress: 2019-02

Summary of work this month:

  • Created final feedback form for interns and mentors
  • Contacted potential communities for the May to August 2019 round
  • Updated questions on the initial application form
  • Updated the website to the latest stable version of Django 1.11
  • Wrote a blog post announcing changes in eligibility criteria
  • Promoted the round on Twitter, emailed diversity in tech groups, and posted on job boards
  • Reviewed 874 initial application essays

The Outreachy internship program opened applications for the May to August round. Most of my time this month has been spent reviewing the 1,235 initial applications that have been submitted. 😱

We’re definitely getting more applications this round. After the six week application period for the December to March round, we processed 1,817 initial applications. Less than two weeks into this round, we’ve had 1,235 initial applications submitted.

That sounds like a huge number, but that’s where the magic of Django comes in. Django allows us to collect time commitment information from all the applicants. We create a calendar of their time commitments and then see if they have 49 consecutive days free from full-time commitments during the internship period.
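To make the 49-day rule concrete, here is a minimal sketch in plain Python. It is not the actual Outreachy website code: the function name, the way commitments are represented (inclusive start and end dates), and the internship dates in the example are all assumptions for illustration.

```python
from datetime import date, timedelta

# From the post: applicants need 49 consecutive days free
# from full-time commitments during the internship period.
REQUIRED_FREE_DAYS = 49


def has_enough_free_days(internship_start, internship_end, commitments,
                         required=REQUIRED_FREE_DAYS):
    """Return True if the longest run of commitment-free days is long enough.

    `commitments` is a list of (start, end) date pairs, both inclusive,
    describing full-time commitments (a hypothetical representation).
    """
    longest = 0
    current = 0
    day = internship_start
    while day <= internship_end:
        busy = any(start <= day <= end for start, end in commitments)
        # A busy day resets the running streak; a free day extends it.
        current = 0 if busy else current + 1
        longest = max(longest, current)
        day += timedelta(days=1)
    return longest >= required


# Hypothetical internship period, used only for this example.
EXAMPLE_START = date(2019, 5, 20)
EXAMPLE_END = date(2019, 8, 20)

# An applicant with a full-time commitment for all of June.
june_exams = [(date(2019, 6, 1), date(2019, 6, 30))]
eligible = has_enough_free_days(EXAMPLE_START, EXAMPLE_END, june_exams)
```

Walking the calendar one day at a time is slow for long periods, but it keeps overlapping commitments simple to handle, which matters more for a check like this.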

So far, about 181 initial applications have been rejected because applicants had full-time commitments. (The number is usually higher in the December round because students in the northern hemisphere have a shorter break.)

We also check whether people are eligible to work in the countries they’re living in, whether people have participated in Outreachy or Google Summer of Code before, etc. There are 72 applications that were automatically denied because of those kinds of issues.

That leaves 982 applicants who were eligible for Outreachy so far. 😲 And we have to manually review every single applicant essay to see whether supporting this person would align with Outreachy’s program goal to support marginalized people in tech.

We ask specific essay questions to determine whether the applicant is underrepresented. We ask two more essay questions to determine whether they face discrimination or systemic bias in their learning environment or when looking for employment opportunities. Applicants have to demonstrate both characteristics. They have to be underrepresented *and* face discrimination.

It’s quite frankly difficult to spend 5-9 hours a day reading about the discrimination people face. We ask for personal stories, and people open up with some real horror stories. It’s probably re-traumatizing for them. It certainly impacts my mental health. Other people share less specific experiences with discrimination, which is also fine.

Sometimes reading essays introduces me to types of discrimination that are unfamiliar to me. For example, I’ve been reading more about the caste system in India and ethnic/tribal discrimination in Africa. Reading the essays can be a learning experience for me, and I’m glad we have multiple application reviewers from around the world.

One of the hardest things to do is to say no to an initial application.

Sometimes it’s clear from an essay that someone is from a group underrepresented in the technology industry of their country, but their learning environment is supportive and diverse, and they don’t think they’ll face discrimination in the workplace. Outreachy has to prioritize supporting marginalized people in tech, even if that means turning down underrepresented people who have the privilege to not face discrimination.

It’s also difficult because a lot of applicants who aren’t from groups underrepresented in tech equate hardship with discrimination. For example, a man being turned down for a job because they don’t have enough technical experience could be considered hardship. Interviewers assuming a woman doesn’t have technical experience because they’re a woman is discrimination. The end result is the same (you don’t get the job because the interviewer thinks you don’t have technical experience), but the cause (sexism) is different.

Sometimes systemic issues are at play. For example, not having access to your college’s library because you have a mobility device and there’s no elevator is both discrimination and a systemic issue. Some communities face gender violence against women. The violence means parents don’t allow women to travel away to college, and some universities restrict women to their dorms in the evenings. Imagine not being able to study after class, or not having internet in your dorms to do research. The reaction to these systemic issues incorrectly punishes the people who are most likely to face harassment.

It’s frustrating to read about discrimination, but I hope that working with Outreachy mentors gives people an opportunity they wouldn’t otherwise have.

Outreachy Progress: 2019-01

Summary

  • Finished cleaning up the technical debt that kept us from having two Outreachy rounds active at once
  • Added code for gathering internship midpoint feedback
  • Migrated the travel stipend page off the old wiki for Outreachy to the Django website
  • Added a required field for mentors to provide the minimum computer system requirements to contribute to the project
  • Created intern blog post prompts for weeks 5 & 7
  • Followed up on all December 2018 sponsorship invoices

Minimum System Requirements

New for this Outreachy round is asking mentors to provide the minimum system requirements for their project. Many Outreachy applicants have second-hand, 10 year old systems. They may not have the memory to be able to run a virtualized development environment. In the past, we’ve had applicants who tried to follow installation instructions to complete their required contribution, only to have their systems hang.

By requiring mentors to provide minimum system requirements for their projects, we hope to help applicants who can’t afford a newer computer. We also hope that it will help communities think about how they can lower their technology barriers for applicants who face socioeconomic hardship.

Simplifying Language

This month I migrated the travel stipend instructions page from our old wiki to the new travel page. During that migration, I noticed the language in the page was filled with complex vocabulary and longer sentences. That’s how I tend to write, but it’s harder for people who speak English as a second language to read.

I used the Hemingway Editor to cut down on complex sentences. I would recommend that people look at similar tools to simplify the language on their websites.

Debt, debt, and more technical debt

I had hoped that January would be spent contacting Outreachy communities to notify them of the round. Unfortunately, Outreachy website work took priority, as it wasn’t ready for us to accept community sign-ups.

Most of the work was done on cleaning up the technical debt I talked about in my last blog post. The website has to handle having two internship rounds active at once. For example, in January, mentors were submitting feedback for the December 2018 internships, while other mentors were submitting projects for the upcoming May 2019 internships.

A lot of the process was deciding how long to display information on the website. For example, when should mentors be able to choose an applicant as an intern for their project?

Mentors could find a potential candidate very early in the application period, so the very soonest they could choose an intern would be when the application period starts.

Most people might assume that interns can’t be selected after we announce the internships. However, in the past, interns have decided not to participate, so mentors have needed to select another applicant after the interns are announced. The very latest they could select an intern would be five weeks after the internships start, since we can’t extend an internship for more than five weeks.

It’s a complex process to decide these dates. It requires a lot of tribal knowledge of how the Outreachy internship processes work. I’m happy to finally document some of those assumptions into the Outreachy website code.
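As an illustration of the intern selection rule described above, here is a minimal sketch in plain Python. It is not the Outreachy website code. The function name and the dates in the example are assumptions; only the five-week limit and the idea that selection opens with the application period come from this post.

```python
from datetime import date, timedelta

# From the post: an internship can't be extended for more than five weeks,
# so that is the latest point a replacement intern could be selected.
MAX_EXTENSION = timedelta(weeks=5)


def can_select_intern(today, application_opens, internship_starts):
    """Return True if mentors may select an intern on the given day.

    The window opens when the application period starts and closes
    five weeks after the internships start (both ends inclusive).
    """
    return application_opens <= today <= internship_starts + MAX_EXTENSION


# Placeholder dates for illustration only.
APPLICATION_OPENS = date(2019, 2, 18)
INTERNSHIP_STARTS = date(2019, 5, 20)

allowed_mid_round = can_select_intern(
    date(2019, 6, 1), APPLICATION_OPENS, INTERNSHIP_STARTS
)
```

Writing the rule as one named function keeps the "tribal knowledge" in a single place, so every page that needs the window asks the same question instead of repeating the date math.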

Outreachy Progress: 2018-12

One of my resolutions for 2019 is to be more transparent about the work I’ve been doing for Outreachy. Hopefully (fingers crossed) this means you’ll be seeing a blog post once a month.

I’ll also throw in a selfie per month. My face is changing since I’ve been on hormone replacement therapy (testosterone) for about 7 months now. I started to get some peach fuzz around month 5. It’s still patchy, but I’m growing it out anyway so I can see if I can get a beard!

New glasses too!

What is Outreachy?

Outreachy is a three-month internship program. It’s completely remote (both interns and mentors come from around the world). We pay the interns a $5,500 USD stipend for the three months, plus a $500 travel stipend to attend a conference or event related to their internship or free software.

The goal of the internship is to introduce people to free and open source software. Outreachy has projects that involve programming, documentation, graphic design, user experience, user advocacy, and data science.

Outreachy’s other goal is to support people from groups underrepresented in the technology industry. We expressly invite women (both cis and trans), trans men, and genderqueer people to apply. We also expressly invite applications from residents and nationals of the United States of any gender who are Black/African American, Hispanic/Latin@, Native American/American Indian, Alaska Native, Native Hawaiian, or Pacific Islander. Anyone who faces under-representation, systemic bias, or discrimination in the technology industry of their country is invited to apply.

What’s My Role?

I own Otter Tech LLC, which is a diversity and inclusion consulting company. It’s been my full-time job since July 2016. I work with clients (mostly in the technology or free software space) that want to improve their culture and better support people from groups underrepresented in tech. Outreachy is one of my clients.

I am one of five Outreachy organizers. Two of us (Marina Zhurakhinskaya and I) are heavily involved in running the internship application process. Karen Sandler is great at finding funding for us. The whole Outreachy organizers team (including Tony Sebro and Cindy Pallares-Quezada) makes important decisions about the direction of the program.

Outreachy also recently hired two part-time staff members. They’ve been helping Outreachy applicants during the application period, and then also helping Outreachy interns when the internship is running. We don’t have a good name for their role yet, but we’ve sort of settled on “Outreachy Helpers”.

December 2018 Progress

The December 2018 to March 2019 internship round kicked off on December 4. Usually that’s downtime for me as an Outreachy organizer, because mentors and coordinators step up to interact with their interns. In the past, the only real interaction the Outreachy organizers had with interns was if their mentor indicated they were having issues (yikes!). This month was spent increasing the frequency and types of check-ins with interns and mentors.

Outreachy Chat Server

This round, we’re trying something new to have the Outreachy interns talk with Outreachy organizers and with each other. We’ve set up a private invitation-only Zulip chat server, and invited all the Outreachy organizers, interns, mentors, and coordinators. I’ve been doing a bit of community management, participating in discussions, and answering questions that Outreachy interns have as they start their internship. I also ran a text-based discussion and then a video chat for Outreachy interns to do a second week check-in.

I think the Outreachy Zulip chat has worked out well! I see interns connecting across different free software communities, and mentors from other communities helping different interns. Zulip has the concept of “streams” which are basically chat rooms. We have a couple of different streams, like a general chat channel and a channel for asking questions about Outreachy internship procedures. I’m fairly certain that I got more questions on the Zulip chat from interns than we ever got by using email and IRC.

Frequent Feedback

The other thing we’re doing this round is collecting feedback in a different way. In the past, we collected it at two points during the internship. The midpoint was at 6 weeks in and the final feedback was at 12 weeks in. However, this round, we’re collecting it at three points: initial feedback at 2 weeks in, midpoint feedback at 8 weeks in, and final feedback at 12 weeks.

Collecting feedback three times meant more overhead for evaluating feedback and sending the results to our fiscal sponsor, the Software Freedom Conservancy. I wrote code in December to allow the Outreachy internship website to collect feedback from mentors as to whether interns should be paid their initial stipend.

We’re also collecting different feedback this round. I’m collecting feedback from both interns and mentors, based on a suggestion from a former Outreachy intern. Interns and mentors are asked the same questions, like “How long does it take (you/your intern) to respond to questions or feedback?” and “How long does it take (your mentor/you) to respond to questions and feedback?” That way, I can compare people’s self-evaluations with what the other person involved in the internship thinks.

There’s also a freeform text field for interns to give feedback on how their mentor is doing. This is important, because many Outreachy mentors are new to mentoring. They may need to have some coaching to understand how they can be more supportive to their interns. While most of the interns are doing great, I can see that I’m going to need to nudge a couple of mentor and intern pairs in the right direction.

Interviews with Alums

I did video interviews with five Outreachy interns at the Mozilla All Hands in December 2018. I loved interviewing them, because it’s great to hear their personal stories. I’ll be using the footage to create videos to promote the Outreachy program.

I’ve created short-hand transcripts of two of the videos, but haven’t gotten to the other five. Transcripts help for a couple reasons. Most importantly, I can add closed captioning to the finished videos. I also have a searchable text database for when I need to find quotes about a particular topic. Seeing the text allows me to group similar experiences and create a cohesive narrative for the promotional video.

Ramping up for May 2019 Internships

The Outreachy December 2018 to March 2019 internships are just starting, but we’re already thinking of the next round. January is typically the time we start pinging communities to see if they want to be involved in mentoring interns during the February to March application period.

That means we need to have the website ready to handle both a currently running internship cohort, and a new internship round where mentors can submit projects. There’s some technical debt in the Outreachy website code that we need to address before we can list the next round’s internship dates.

The Outreachy website is designed to guide internship applicants through the application process. It’s built with a web framework tool called Django, which is written in Python. Django makes web development easier, because you can define Python classes that represent your data. Django then uses those classes to create a representation in the database. The part of Django that translates Python into database schema is called the ORM (Object Relational Mapper).

For example, the Outreachy website keeps track of internship rounds (the RoundPage class). Each internship round has dates and other information associated with it. For example, it has the dates for when the application period starts and ends, and when the internship starts and ends.

It makes sense to store internship rounds in a database, because all internship rounds have the same kinds of deadlines associated with them. You can do database queries to find particular rounds in the database. For example, the Django Python code to look up the latest round (based on when the interns start their internship) is RoundPage.objects.latest('internstarts').
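To show the idea without needing a Django project, here is a minimal plain-Python stand-in. It is a sketch, not the real RoundPage model. Only the internstarts field name comes from the query above; the Round class, the other field names, and the May 2019 start date are assumptions. The latest() helper imitates what Django's QuerySet.latest() does conceptually, but in memory rather than in a database.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Round:
    """A simplified, hypothetical stand-in for the RoundPage class."""
    name: str
    internstarts: date  # field name taken from the query in the post
    internends: date    # hypothetical field name


def latest(rounds, field):
    """Return the round with the newest value for `field`.

    This mirrors the idea of RoundPage.objects.latest('internstarts'),
    where the database does the sorting instead of Python.
    """
    return max(rounds, key=lambda r: getattr(r, field))


ROUNDS = [
    Round("December 2018", date(2018, 12, 4), date(2019, 3, 4)),
    # Placeholder dates for the upcoming round.
    Round("May 2019", date(2019, 5, 20), date(2019, 8, 20)),
]

newest = latest(ROUNDS, "internstarts")
```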

The work I’ve recently been doing is to deal with the fact that two internship rounds can be active at once. We’re about to open the next internship round for mentors to submit new projects. On February 18, the next application period will open. But the December 2018 round of internships will still be active until March 4.

The Outreachy website’s pages have to deal with displaying data from multiple rounds. For example, on the Outreachy organizers’ dashboard page, I need to be able to send out reminder emails about final mentor feedback for the December 2018 round, while still reviewing and approving new communities to participate in the May 2019 round. Outreachy mentors need to still be able to submit feedback for their current intern in the December 2018 round, while (potentially) submitting a new project for the May 2019 round.
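The core of the problem is that "the current round" is no longer a single thing. Here is a minimal sketch of the alternative question, "which rounds are active today?", in plain Python. It is not the website's code. Each round is reduced to a first and last active day, which is a simplification, and the start of the December 2018 window and the end of the May 2019 window are placeholder values.

```python
from datetime import date

# Each round maps to (first active day, last active day), both inclusive.
# February 18 and March 4 come from the post; the other dates are placeholders.
EXAMPLE_ROUNDS = {
    "December 2018": (date(2018, 12, 4), date(2019, 3, 4)),
    "May 2019": (date(2019, 2, 18), date(2019, 8, 20)),
}


def active_rounds(rounds, today):
    """Return the names of every round that is active on `today`."""
    return [
        name
        for name, (first_day, last_day) in rounds.items()
        if first_day <= today <= last_day
    ]


# During the overlap, pages have to handle more than one round.
overlap = active_rounds(EXAMPLE_ROUNDS, date(2019, 2, 20))
```

Returning a list instead of a single round forces each page to decide what to do when there are two answers, which is exactly the corner case the refactoring has to cover.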

It’s mostly a lot of refactoring and debugging Python code. I’m writing more Django unit tests to deal with corner cases. Sometimes it’s hard to debug when something fails in the unit test, but doesn’t fail in our local deployment copy. I’m fairly new to testing in Django, and I wrote my first test recently! I feel really silly for not starting on the tests sooner, but I’m slowly catching up to things!

What’s Next?

January 2019 is going to be spent contacting communities about participating in the May 2019 to August 2019 round. I have some video footage of Outreachy interns I interviewed at the Tapia conference and Mozilla All Hands, and I hope to put it into a promotional video to inspire people to become mentors. It’s a fun exercise that uses some of the video editing skills I have from making fanvideos.

I’ll also be at FOSDEM in February 2019. If you’re there, find me in either the Software Freedom Conservancy booth on Saturday, or the Community devroom on Sunday. I’ll also be helping out with the Copyleft Conference on Monday.

I’ll be giving a talk at FOSDEM on changing team culture to better support people with impostor syndrome. The goal is not to ask people with impostor syndrome to change, but instead to figure out how to change our culture so that we don’t create or trigger impostor syndrome. The talk is called “Supporting FOSS Community Members with Impostor Syndrome”. The talk will be from 9:10am to 9:40am on Sunday (the first talk slot).


Binaries are for computers

[Note: comments on this post will be moderated by someone other than me.]

Sage in a purple tie and black shirt. CC-BY-NC-ND Sage Sharp

Recently, I’ve come to terms with the fact that I’m non-binary. Non-binary can be a bit of an umbrella term, but for me, being non-binary means I don’t identify as either a man or a woman. For me, gender is more like a 3D space, a universe of different traits. Most people gravitate towards a set of traits that we label as masculine or feminine. I don’t feel a strong pull towards being either really masculine or really feminine.

I’m writing this post for two reasons. The first reason is that representation matters. I know some non-binary people in tech and from the indie comics industry, but I’d love to see those voices and stories promoted. Hopefully being open and honest with people about my identity will help, both to raise awareness, and to give other people the courage to start an exploration into their own gender identity. I know talking with queer friends and reading comics helped me while I was working out my own gender identity.

The second reason I’m writing this is because there’s a couple ways allies can help me as I go through my transition:

  • Use my new name (Sage) and my correct pronouns (they/them)
  • Educate yourself on what it means to be non-binary
  • Think about how you use gender in your software and websites
  • Think about making your events more inclusive towards non-binary folks

Names

I’ve changed my name to Sage Sharp.

I would appreciate it if you could use my new name. If you’re thinking about writing about me, there’s a section on writing about individuals who are transitioning in “The Responsible Communication Style Guide”. You should buy a digital copy!

Pronouns and Titles

I use the pronoun ‘they’. If you’ve never had someone ask for you to use a specific pronoun, this Robot Hugs comic does a good job of explaining how to handle it.

If you have to add a formal title before my last name, please use ‘Mx Sharp’. ‘Mx’ is a gender-neutral honorific, like ‘Mr’, ‘Ms’, or ‘Mrs’. Mx is pronounced in a couple different ways: /ˈməks/, /ˈmɪks/ or /ˈmʌks/ (miks or muks). I like pronouncing it ‘mux’ like the electronics part multiplexer, but pick whichever pronunciation works for you. If you want to get really formal and are reaching for a term like ‘lady/gentlemen’, I prefer the term ‘gentleperson’.

I’ve found positive gender-neutral terms to describe myself with, like “dapper” or “cute”. I cringe every time a stranger attempts to gender me, since they usually settle on feminine forms of address. I wish more people would use gender-neutral terms like “folks”, “friend”, “comrade”, or say “Hello everyone!” or “Hi y’all” and leave it at that.

Being able to write in a gender neutral way is hard and takes a lot of practice. It means shifting from gendered terms like “sister/brother” or “daughter/son” to gender-neutral terms like “sibling” or “kid”. It means getting comfortable with the singular they instead of ‘she’ or ‘he’. Sometimes there isn’t a gender neutral term for a relationship like “aunt/uncle” and it means you have to make up some new term, like “Titi”. There’s some lists of gender-neutral titles but in general, just ask what term to use.

If this is all new and bewildering to you, I recommend the book ‘The ABCs of LGBTQ+’. Another good book is ‘You’re in the wrong bathroom’ which breaks down some common myths about non-binary and trans folks.

Gender Forms

I’m really not looking forward to my gender being listed as ‘other’ or ‘prefer not to say’ on every gender form out there. I don’t even know if I can change my gender in social media, email, credit cards, banking… It’s a giant headache to change my name, let alone hope that technology systems will allow me to change my gender. It’s probably a separate post all by itself, or a topic for my Diversity Deep Dives mailing list.

If you’re a programmer, website designer, or user experience person, ask yourself: Do you even need to collect information about a person’s gender? Could your website use gender neutral language? If you do need to address someone in a gendered way, maybe you just need to ask for their preferred pronouns instead of their gender? Could you drop the use of gendered honorifics, like ‘Miss’, ‘Mrs’ and ‘Mr’, or ‘Sir’ and ‘Madam’?
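For programmers, here is one way those questions might translate into a data model. This is a minimal sketch with hypothetical names, not a recommendation of a specific schema: the profile stores no gender at all, pronouns are optional free text, and greetings use the person's name instead of an honorific.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Profile:
    """A hypothetical user profile that never asks for gender."""
    display_name: str
    # Optional free text, so people can skip it or write what fits them.
    pronouns: Optional[str] = None


def greeting(profile):
    """Greet by name; no 'Mr', 'Mrs', 'Sir', or 'Madam' needed."""
    return f"Hello, {profile.display_name}!"


def byline(profile):
    """Show pronouns only when the person chose to provide them."""
    if profile.pronouns:
        return f"{profile.display_name} ({profile.pronouns})"
    return profile.display_name


sage = Profile("Sage Sharp", "they/them")
sage_byline = byline(sage)
```

Keeping pronouns optional matters: some people are questioning their gender or use different pronouns in different settings, so the software shouldn't force an answer.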

Inclusive Tech Groups

There’s a lot of tech spaces that are designed to help connect people from groups who are underrepresented in tech. Some tech groups are for “women” or “girls” (which can sometimes mean young women, and sometimes means adult women). It’s unclear whether non-binary folks are included in such groups, which puts me in the awkward position of asking all the groups I’m currently involved in if this is still a space for me.

I recommend reading Kat’s post on the design of gender-inclusive tech spaces. If you run a group or event targeted at gender minorities in tech, consider what you mean by “women only” and whether you want to be more inclusive towards non-binary folks that also want a space away from the patriarchy.

I know that in the past, a lot of folks looked up to me as ‘a woman in open source’. Some people felt I was a role model, and some people felt triumphant that I overcame a lot of sexism and a toxic environment to do some pretty badass technical work. Guess what? I’m still a badass. As a non-binary person, I’m still a minority gender in tech. So I’m still going to continue to be my badass self, taking on the patriarchy.

Promote Pronouns

When you meet someone, don’t assume what their pronouns are. As an ally, you can help by introducing yourself and normalizing pronoun usage. E.g. “Hi, my name is Victor, and I use he/him pronouns.”

If you’re a conference organizer, make sure all your name tags have a space for pronoun stickers. Have sheets of pronoun stickers at registration, and make sure the registration volunteers point out the pronoun badge stickers. If someone is confused about what the pronouns are, have a handout on pronouns and gender ready to give them. Wiscon is a conference that does a very good job with pronouns.

Don’t print pronouns collected from the registration system on badges without permission, or force everyone to put a pronoun on their badge. Some people might use different pronouns in conversation with different people, for example, if a person is “out” as non-binary to some friends but not their coworkers. Some people are genderfluid (meaning their feelings about their gender may change over time, even day to day). Some people might be questioning their gender, and not know what pronouns they want yet. Some people may prefer not to have a pronoun sticker at all.

The best practice is to provide space for people who want to provide their pronouns, but don’t force it on everyone.

What if people misgender you?

Some people who knew me under my old name might get confused when you use my new name. It’s perfectly fine to remind them of past work I did under my old name, while emphasizing my new name and pronouns. For example:

“Sage Sharp? Who’s that?”

“Sage is a diversity and inclusion consultant with Otter Tech. They were a Linux kernel developer for 10 years and wrote the USB 3.0 driver. They help run the Outreachy internship program.”

“Oh, you mean Sarah Sharp?”

“Yeah. They changed their name to Sage and they use ‘they’ pronouns now.”

I know it might be hard for people who have known me to get used to my new name and pronoun. You might even slip up in conversation with me. That’s ok, just correct the word and move on with your conversation. No need to apologize or call attention to it. We’re all humans, and retraining the language centers of our brains takes time. As long as I can see you’re trying, we’re cool.

What about your old accounts?

The internet never forgets. There will be old pictures of me, articles about me under my old name, etc. I’m fine with that, because that’s all a part of my past, who I was and the experiences that make me who I am. It’s as much a part of who I am as the tattoo on my arm. I don’t feel sad or weird looking at old pictures of myself. Seeing my longer haircut or myself in more feminine clothing can be surprising because I’ve changed so much, but after that initial reaction what I feel most is empathy for my past self.

At the same time, I’m also not that person any more. I’d like to see current pictures of me with my current name and correct pronoun.

If you see a news article that uses my old name, please let them know about my new name and pronouns. (But if it’s some troll site, don’t engage.) Several photos of my new style can be found here. If you see a social media website that uses my old name, don’t bother emailing me about it. I might have abandoned it, or found the name/gender change process to be too complex. Speaking of email, my old email addresses will still work, but I’ll respond back with my new email address. Please update your phone and email contacts to use the name ‘Sage Sharp’.

Phew, that was a lot to process!

We’ll keep it simple. Hi, my name is Sage Sharp, and I use ‘they’ pronouns. It’s nice to meet you!

Update on Sentiment Analysis of FOSS communities

One of my goals with my new open source project, FOSS Heartbeat, has been to measure the overall sentiment of communication in open source communities. Are the communities welcoming and friendly, hostile, or neutral? Does the bulk of positive or negative sentiment come from core contributors or outsiders? In order to make this analysis scale across multiple open source communities with years of logs, I needed to be able to train an algorithm to recognize the sentiment or tone of technical conversation.

How can machine learning recognize human language sentiment?

One of the projects I’ve been using is the Stanford CoreNLP library, an open source Natural Language Processing (NLP) project. The Stanford CoreNLP takes a set of training sentences (manually marked so that each word and each combined phrase has a sentiment) and it trains a neural network to recognize the sentiment.

The problem with any form of artificial intelligence is that the input into the machine is always biased in some way. For the Stanford CoreNLP, their default sentiment model was trained on movie reviews. That means, for example, that the default sentiment model thinks “Christian” is a very positive word, whereas in an open source project that’s probably someone’s name. The default sentiment model also consistently marks any sentence expressing a neutral technical opinion as having a negative tone. Most people leaving movie reviews either hate or love the movie, and people are unlikely to leave a neutral review analyzing the technical merits of the special effects. Thus, it makes sense that a sentiment model trained on movie reviews would classify technical opinions as negative.

Since the Stanford CoreNLP default sentiment model doesn’t work well on technical conversation, I’ve been creating a new set of sentiment training data that only uses sentences from open source projects. That means that I have to manually modify the sentiment of words and phrases in thousands of sentences that I feed into the new sentiment model. Yikes!

As of today, the Stanford CoreNLP default sentiment model has ~8,000 sentences in their training file. I currently have ~1,200 sentences. While my model isn’t as consistent as the Stanford CoreNLP, it is better at recognizing neutral and positive tone in technical sentences. If you’re interested in the technical details (e.g. specificity, recall, false positives and the like), you can take a look at the new sentiment model’s stats. This blog post will attempt to present the results without diving into guided machine learning jargon.
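For readers who do want a taste of the jargon, here is a minimal sketch of how recall and specificity can be computed for one sentiment label by treating it as "this label versus everything else". The function names are my own for this example, and the tiny data set is made up purely for illustration. These are not the real stats for either model.

```python
def one_vs_rest_counts(actual, predicted, label):
    """Count true/false positives and negatives for a single label."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == label and a == label:
            tp += 1  # model said `label`, and it was right
        elif p == label:
            fp += 1  # model said `label`, but it was something else
        elif a == label:
            fn += 1  # model missed a sentence that really was `label`
        else:
            tn += 1  # correctly not `label`
    return tp, fp, tn, fn


def recall(tp, fn):
    """Share of sentences with the label that the model found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn, fp):
    """Share of sentences without the label that the model left alone."""
    return tn / (tn + fp) if (tn + fp) else 0.0


# Made-up toy data: a model that over-predicts "negative".
toy_actual = ["neutral", "neutral", "positive", "negative", "neutral"]
toy_predicted = ["negative", "neutral", "positive", "negative", "negative"]

tp, fp, tn, fn = one_vs_rest_counts(toy_actual, toy_predicted, "negative")
negative_recall = recall(tp, fn)
negative_specificity = specificity(tn, fp)
```

In the toy data, the model finds every truly negative sentence but also flags neutral ones as negative, which is the same kind of mistake the movie review model makes on technical talk.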

Default vs New Models On Positive Tone

Let’s take a look at an example of a positive code review experience. The left column is from the default sentiment model in Stanford CoreNLP, which was trained on movie reviews. The right column is from the new sentiment model I’ve been training. The colors of the sentence encode what the two models think the overall tone of the sentence is:

  • Very positive
  • Positive
  • Neutral
  • Negative
  • Very negative

Hey @1Niels 🙂 is there a particular reason for calling it Emoji Code?

I think the earlier guide called it emoji name.

A few examples here would help, as well as explaining that the pop-up menu shows the first five emojis whose names contain the letters typed.

(I’m sure you have a better way of explaining this than me :-).

@arpith I called them Emoji code because that’s what they’re called on Slack’s emoji guide and more commonly only other websites as well.

I think I will probably change the section name from Emoji Code to Using emoji codes and I’ll include your suggestion in the last step.

Thanks for the feedback!

The default model trained on movie reviews rated 4 out of 7 sentences as negative and 1 out of 7 as positive. As you can see, it tends to classify neutral technical talk as having a negative tone, including sentences like “I called them Emoji code because that’s what they’re called on Slack’s emoji guide and more commonly only other websites as well.” It did recognize the sentence “Thanks for the feedback!” as positive, which is good.

The new model trained on comments from open source projects rated 1 sentence as negative, 2 as positive, and 1 as very positive. Most of the positive tone of this example comes from the use of smiley faces, which I’ve been careful to train the new model to recognize. Additionally, I’ve been teaching it that exclamation points ending a sentence that is overall positive shift the tone to very positive. I’m pleased to see it pick up on those subtleties.

Default vs New Models On Neutral Tone

Let’s have a look at a neutral tone code review example. Again, the sentence sentiment color key is:

  • Very positive
  • Positive
  • Neutral
  • Negative
  • Very negative

This seems to check resolvers nested up to a fixed level, rather than checking resolvers and namespaces nested to an arbitrary depth.

I think a inline-code is more appropriate here, something like “URL namespace {} is not unique, you may not be able to reverse all URLs in this namespace”.

Errors prevent management commands from running, which is a bit severe for this case.

One of these should have an explicit instance namespace other than inline-code, otherwise the nested namespaces are not unique.

Please document the check in inline-code.

There’s a list of URL system checks at the end.

Again, the default sentiment model trained on movie reviews classifies neutral review comments as negative, ranking 5 out of 6 sentences as negative.

The new model trained on open source communication is a bit mixed on this example, marking 1 sentence as positive and 1 negative, out of 6 sentences. Still, 4 out of 6 sentences were correctly marked as neutral, which is pretty good, given the new model has a training set that is roughly one-seventh the size of the movie review set.

Default vs New Models On Negative Tone

Let’s take a look at a negative example. Please note that this is not a community that I am involved in, and I don’t know anyone from that community. I found this particular example because I searched for “code of conduct”. Note that the behavior displayed on the thread caused the initial contributor to offer to abandon their pull request. A project outsider stated they would recommend their employer not use the project because of the behavior. Another project member came along to ask for people to be more friendly. So quite a number of people thought this behavior was problematic.

Again, the sentiment color code is:

  • Very positive
  • Positive
  • Neutral
  • Negative
  • Very negative

Dude, you must be kidding everyone.

What dawned on you – that for a project to be successful and useful it needs confirmed userbase – was crystal clear to others years ago.

Your “hard working” is little comparing to what other people have been doing for years.

Get humbler, Mr. Arrogant.

If you find this project great, figure out that it is so because other people worked on it before.

Learn what they did and how.

But first learn Python, as pointed above.

Then keep working hard.

And make sure the project stays great after you applied your hands to it.

The default model trained on movie reviews classifies 4 out of 9 sentences as negative and 2 as positive. The new model classifies 2 out of 9 sentences as negative and 2 as positive. In short, it needs more work.

It’s unsurprising that the new model doesn’t recognize negative sentiment very well right now, since I’ve been focusing on making sure it can recognize positive sentiment and neutral talk. The training set currently has 110 negative sentences out of 1205 sentences total. I simply need more negative examples, and they’re hard to find because many subtle personal attacks, insults, and slights don’t use curse words. If you look at the example above, there are no good search terms, aside from the word arrogant, even though the sentences are still put-downs that create an us-vs-them mentality. Despite not using slurs or curse words, many people found the thread problematic.

The best way I’ve settled on to find negative sentiment examples is to look for “communication meta words” or people talking about communication style. My current list of search terms includes words like “friendlier”, “flippant”, “abrasive”, and similar. Some search words like “aggressive” yield too many false positives, because people talk about things like “aggressive optimization”. Once I’ve found a thread that contains those words, I’ll read through it and find the comments that caused the people to ask for a different communication style. Of course, this only works for communities that want to be welcoming. For other communities, searching for the word “attitude” seems to yield useful examples.
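A minimal sketch of that kind of search, assuming the comments have already been downloaded as plain strings. The term list only contains the examples named above, and the function name `find_meta_comments` is hypothetical rather than part of FOSS Heartbeat:

```python
import re

# Search terms mentioned above; a real list would be longer.
META_WORDS = ["friendlier", "flippant", "abrasive", "attitude"]

# One case-insensitive pattern that matches whole words only.
META_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, META_WORDS)) + r")\b", re.IGNORECASE
)


def find_meta_comments(comments):
    """Return (index, matched words) for comments that talk about communication style."""
    hits = []
    for index, text in enumerate(comments):
        words = sorted({match.lower() for match in META_PATTERN.findall(text)})
        if words:
            hits.append((index, words))
    return hits


comments = [
    "Could we be a bit friendlier here?",
    "This is an aggressive optimization.",
    "Drop the attitude.",
]
hits = find_meta_comments(comments)
```

The hits are only candidates. Someone still has to read the thread to find the comments that prompted the request for a different tone.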

Still, it’s a lot of manual labor to identify problematic threads and fish out the negative sentences that are in those threads. I’ll be continuing to make progress on improving the model to recognize negative sentiment, but it would help if people could post links to negative sentiment examples on the FOSS Heartbeat github issue or drop me an email.

Visualizing Sentiment

Although the sentiment model isn’t perfect, I’ve added visualization for the sentiment of several communities on FOSS Heartbeat, including 24pullrequests, Dreamwidth, systemd, elm, fsharp, and opal.

The x-axis is the date. I used the number of neutral comments in an issue or pull request as the y-axis coordinate, with the error bars indicating the number of positive and negative comments. If the thread had two times as many negative comments as positive comments, it was marked as a negative thread. If the thread had two times as many positive comments as negative comments, it was marked as positive. If neither sentiment won, and more than 80% of the comments were neutral, it was marked as neutral. Otherwise the issue or pull request was marked as mixed sentiment.
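Those rules can be sketched as a small function. This is an illustration rather than the FOSS Heartbeat code: the name `classify_thread` is made up, and it assumes that "two times" means "at least twice as many" and that a sentiment has to appear at least once before it can win.

```python
def classify_thread(positive, negative, neutral):
    """Label an issue or pull request from its per-comment sentiment counts."""
    total = positive + negative + neutral
    if total == 0:
        return "neutral"  # assumption: a thread with no comments has no tone
    # Assumption: "two times" means "at least twice as many", and a
    # sentiment must appear at least once before it can win.
    if negative > 0 and negative >= 2 * positive:
        return "negative"
    if positive > 0 and positive >= 2 * negative:
        return "positive"
    if neutral / total > 0.8:
        return "neutral"
    return "mixed"
```

Under these assumptions, a thread with 197 positive, 441 negative, and 1207 neutral comments is labeled negative, because 441 is more than twice 197.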

Here’s an example:

24pullrequests-sentiment

The sentiment graph is from the 24pullrequests repository. It’s a ruby website that encourages programmers to gift code to open source projects during the 24 days in December before Christmas. One of the open source projects you can contribute to is the 24 pull requests site itself (isn’t that meta!). During the year, you’ll see the site admins filing help-wanted enhancements to update the software that runs the website or tweak a small feature. They’re usually closed within a day without a whole lot of back and forth between the main contributors. The mid-year contributions show up as the neutral, low-comment dots throughout the year. When the 24 pull request site admins do receive a gift of code to the website by a new contributor as part of the 24 pull requests period, they’re quite thankful, which you can see reflected in the many positive comments around December and January.

Another interesting example to look at is negative sentiment in the opal community:

opal-negative-sentiment

That large spike with 1207 neutral comments, 197 positive comments, and 441 negative comments is the opal community issue to add a code of conduct. Being able to quickly see which threads are turning into flamewars would be helpful to community managers and maintainers who have been ignoring the issue tracker to get some coding done. Once the sentiment model is better trained, I would love to analyze whether communities become more positive or more neutral after a Code of Conduct is put in place. Tying that data to whether more or fewer newcomers participate after a Code of Conduct is in place may be interesting as well.

There are a lot of real-world problems that sentiment analysis, participation data, and a bit of psychology could help us identify. One common social problem is burnout, which is characterized by an increased workload (stages 1 & 2), working at odd hours (stage 3), and an increase in negative sentiment (stage 6). We have participation data, comment timestamps, and sentiment for those comments, so we would only need some examples of burnout to identify the pattern. By being aware of the burnout stages of our collaborators, we could intervene early to help them avoid a spiral into depression.

A more corporate-focused interest might be to identify issues where their key customers express frustration and anger, and focus their developers on fixing the squeaky wheel. If FOSS Heartbeat were extended to analyze comments on mailing lists, Slack, Discourse, or Mattermost, companies could get a general idea of the sentiment of customers after a new software release. Companies can also use the participation data and data about who is merging code to figure out which projects or parts of their code are not being well-maintained, and assign additional help, as the exercism community did.

Another topic of interest to communities hoping to grow their developer base would be identifying the key factors that cause newcomers to become more active contributors to a project. Is it a positive welcome? A mentor suggesting a newcomer tackle a medium-sized issue by tagging them? Does adding documentation about a particularly confusing area cause more newcomers to submit pull requests to that area of code? Does code review from a particularly friendly person cause newcomers to want to come back? Or maybe code review lag causes them to drop off?

These are the kinds of people-centric community questions I would love to answer by using FOSS Heartbeat. I would like to thank Mozilla for sponsoring the project for the last three months. If you have additional questions you’d love to see FOSS Heartbeat answer, I’m available for contract work through Otter Tech. If you’re thankful about the work I’ve put in so far, you can support me through my patreon.

What open source community question would you like to see FOSS Heartbeat tackle? Feel free to leave a comment.

Impact of bots on github communities

I’ve been digging into contributor statistics for various communities on github as part of my work on FOSS Heartbeat, a project to measure the health of open source communities.

It’s fascinating to see bots show up in the contributor statistics. For example, if you look at github users who comment on issues in the Rust community, you’ll quickly notice two contributors who interact a lot:

rust-bots

bors is a bot that runs pull requests through the rust continuous integration test suite, and automatically merges the code into the master branch if it passes. bors responds to commands issued in pull request comments (of the form ‘@bors r+ [commit ID]’) by community members with permission to merge code into rust-lang/rust.

rust-highfive is a bot that recommends a reviewer based on the contents of the pull request. It then adds a comment that tags the reviewer, who will get a github notification (and possibly an email, if they have that set up).

Both bots have been set up by the Rust community in order to make pull request review smoother. bors is designed to cut down the amount of time developers need to spend running the test suite on code that’s ready to be merged. rust-highfive is designed to make sure the right person is aware of pull requests that may need their experienced eye.

But just how effective are these github bots? Are they really helping the Rust community or are they just causing more noise?

Chances of a successful pull request

bors merged its first pull request on 2013-02-02. The year before bors was introduced, only 330 out of 503 pull requests were merged. The year after, 1574 out of 2311 pull requests were merged. So the Rust community had more than four times as many pull requests to review.

Assuming that the tests bors used were some of the same tests rust developers were running manually, we would expect that pull requests would be rejected at about the same rate (or maybe rejected more, since the automatic CI system would catch more bugs).

To test that assumption, we turn to a statistics method called the Chi squared test. It helps answer the question, “Is there a difference in the success rates of two samples?” In our case, it helps us answer the question, “After bors was used, did the percentage of accepted pull requests change?”

rust-bors-merged

It looks like there’s no statistical difference in the chances of getting a random pull request merged before or after bors started participating. That’s pretty good, considering the number of pull requests submitted quadrupled.
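For readers who want to check the numbers, here is a sketch of the calculation using the merge counts quoted above. It assumes Pearson's chi-squared test on a 2x2 table without a continuity correction, which may differ in detail from what produced the graph. The function name `chi_squared_2x2` is made up for this example.

```python
import math


def chi_squared_2x2(merged_a, total_a, merged_b, total_b):
    """Pearson's chi-squared test (no continuity correction) on two merge rates."""
    observed = [
        [merged_a, total_a - merged_a],
        [merged_b, total_b - merged_b],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed[i][j] - expected) ** 2 / expected

    # With one degree of freedom, the p-value has a closed form.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Merged / total pull requests the year before and the year after bors.
statistic, p_value = chi_squared_2x2(330, 503, 1574, 2311)
print(f"chi-squared = {statistic:.2f}, p = {p_value:.2f}")
```

With one degree of freedom, the statistic would need to exceed about 6.63 to show a difference at the 99% confidence level. The value for these counts falls well short of that.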

Now, what about rust-highfive? Since the bot is supposed to recommend pull request reviewers, we would hope that pull requests would have a higher chance of getting accepted. Let’s look at the chances of getting a pull request merged for the year before and the year after rust-highfive was introduced (2014-09-18).

rust-highfive-merged

So yes, it does seem like rust-highfive is effective at getting the right developer to notice a pull request they need to review and merge.

Impact on time a pull request is open

One of the hopes of a programmer who designs a bot is that it will cut down on the amount of time that the developer has to spend on simple repetitive tasks. A bot like bors is designed to run the CI suite automatically, leaving the developer more time to do other things, like review other pull requests. Maybe that means pull requests get merged faster?

To test the impact of bors on the amount of time a pull request is open, we turn to the Two-means hypothesis test. It tells you whether there’s a statistical difference between the means of two different data sets. In our case, we compare the length of time a pull request is open. The two populations are the pull requests a year before and a year after bors was introduced.

rust-bors-pr-open

We would hope to see the average open time of a pull request go down after bors was introduced, but that’s not what the data shows. The graph shows the length of time actually increased by 1.1 days.
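As a rough sketch of what a two-means test computes, the snippet below calculates Welch's t statistic, a common variant that doesn't assume the two groups have equal variance. Whether the graph above used this exact variant is an assumption, and the sample values are made up for illustration, not Rust data.

```python
import math
from statistics import mean, variance


def two_means_statistic(sample_a, sample_b):
    """Welch's t statistic for the difference between two sample means."""
    standard_error = math.sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    return (mean(sample_b) - mean(sample_a)) / standard_error


# Made-up open times in days, for illustration only (not Rust data).
before = [1.0, 2.0, 3.0, 4.0]
after = [2.0, 3.0, 4.0, 5.0]
t = two_means_statistic(before, after)
```

For samples with thousands of pull requests, a statistic whose absolute value is above roughly 2.58 indicates a difference in means at the 99% confidence level.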

What about rust-highfive? We would hope that a bot that recommends a reviewer would cause pull requests to get closed sooner.

rust-bors-pr-open

The graph shows there’s no statistical evidence that rust-highfive made a difference in the length of time pull requests were open.

These results seemed odd to me, so I did a little bit of digging to generate a graph of the average time a pull request is open for each month:

rust-pr-open-trend

The length of time pull requests are open has been increasing for most of the Rust project history. That explains why comparing pull request age before and after bors showed an increase in the wait time to get a pull request merged. The second line shows the point that rust-highfive was introduced, and we do see a decline in the wait time. Since the decrease is almost symmetrical with the increase the year before, the average was the same for the two years.
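A monthly trend like this can be computed by grouping pull requests by month and averaging how long each one stayed open. The sketch below assumes grouping by the month a pull request was created, which may not match how the graph was built. The data is made up and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def monthly_average_open_days(pull_requests):
    """Map 'YYYY-MM' of creation to the mean number of days PRs stayed open."""
    days_by_month = defaultdict(list)
    for created, closed in pull_requests:
        open_days = (closed - created).total_seconds() / 86400
        days_by_month[created.strftime("%Y-%m")].append(open_days)
    return {
        month: sum(days) / len(days)
        for month, days in sorted(days_by_month.items())
    }


# Made-up (created, closed) pairs, for illustration only.
sample = [
    (datetime(2014, 9, 1), datetime(2014, 9, 3)),
    (datetime(2014, 9, 10), datetime(2014, 9, 14)),
    (datetime(2014, 10, 5), datetime(2014, 10, 6)),
]
averages = monthly_average_open_days(sample)
```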

Summary

What can we conclude about github bots from all these statistics?

At the 99% confidence level, we found no statistical evidence that adding the bors bot to automatically merge changes after they passed the CI tests had an impact on the chances of a random pull request getting merged.

We can say with 99% confidence that rust-highfive increases a Rust developer’s chances of getting code merged, by as much as 11.7%. The bot initially helped lower the amount of time developers had to wait for their pull requests to be merged, but something else changed in May 2015 that caused the wait time to increase again. I’ll note that Rust version 1.0 came out in May 2015. Rust developers may have been more cautious about accepting pull requests after the API was frozen, or the volume of pull requests may have increased. It’s unclear without further study.

This is awesome, can I help?

If you’re interested in metrics analysis for your community, please leave a note in the comments or drop an email to my consulting business, Otter Tech. I could use some help identifying the github usernames for bots in other communities I’m studying:

This blog post is part of a series on open source community metrics analysis:

Part 1: Measuring the Impact of Negative Language on FOSS Participation

You can find the open source FOSS Heartbeat code and FOSS community metrics on github. Thank you to Mozilla, who is sponsoring this research!

Measuring the Impact of Negative Language on FOSS Participation (Part I)

A recent academic paper showed that there were clear differences in the communication styles of two of the top Linux kernel developers (“Differentiating Communication Styles of Leaders on the Linux Kernel Mailing List”). One leader is much more likely to say “thank you” while the other is more likely to jump into a conversation with a “well, actually”.

Many open source contributors have stories of their patches being harshly rejected. Some people are able to “toughen up” and continue participating, and others will move onto a different project. The question is, how many people end up leaving a project due to harsh language? Are people who experience positive language more likely to contribute more to a project? Just how positive do core open source contributors need to be in order to attract newcomers and grow their community? Which community members are good at mentoring newcomers and helping them step into leadership roles?

I’ve been having a whole lot of fun coming up with scientific research methods to answer these questions, and I’d like to thank Mozilla for funding that research through their Participation Experiment program.

How do you measure positive and negative language?

The Natural Language Processing (NLP) field tries to teach computers to parse and derive meaning from human language. When you ask your phone a question like, “How old was Ada Lovelace when she died?” somewhere a server has to run a speech to text algorithm. NLP allows that server to parse the text into a subject “Ada Lovelace” and other sentence parts, which allows the server to respond with the correct answer, “Ada Lovelace died at the age of 36”.

Several open source NLP libraries, including the Natural Language Toolkit (NLTK) and Stanford CoreNLP, also include sentiment analysis. Sentiment analysis attempts to determine the “tone” and objectiveness of a piece of text. I’ll do more of a deep dive into sentiment analysis next month in part II of this blog post. For now, let’s talk about a more pressing question.

How do you define open source participation?

On the surface, this question seems so simple. If you look at any github project page or Linux Foundation kernel report or OpenStack statistics, you’ll see a multitude of graphs analyzing code contribution statistics. How many lines of code do people contribute? How frequently? Did we have new developers contribute this year? Which companies had the most contributions?

You’ll notice a particular emphasis here, a bias if you will. All these measurements are about how much code an individual contributor got merged into a code base. However, open source developers don’t act alone to create a project. They are part of a larger system of contributors that work together.

In order for code or documentation to be merged, it has to be reviewed. In open source, we encourage peer review in order to make sure the code is maintainable and (mostly) free of bugs. Some reports measure the work maintainers do, but they often lack recognition for the efforts of code reviewers. Bug reports are seen as bad, rather than proof that the project is being used and its features are being tested. People may measure the number of closed vs open bug reports, but very few measure and acknowledge the people who submit issues, gather information, and test fixes. Open source projects would be constantly crashing without the contribution of bug reporters.

All of these roles (reviewer, bug reporter, debugger, maintainer) are valuable ways to contribute to open source, but no one measures them because the bias in open source is towards developers. We talk even less about the vital non-coding contributions people do (conference planning, answering questions, fund raising, etc). Those are invaluable but harder to measure and attribute.

For this experiment, I hope to measure some of the less talked-about ways to contribute. I would love to extend this work to the many different contribution methods and tools that open source communities use to collaborate. However, it’s important to start small and develop a good framework for testing hypotheses like mine about negative language impacting open source participation.


How do you measure open source participation?

For this experiment, I’m focusing on open source communities on github. Why? The data is easier to gather than projects that take contributions over mailing lists, because the discussion around a contribution is all in one place, and it’s easy to attribute replies to the right people. Plus, there are a lot of libraries in different languages that provide github API wrappers. I chose to work with the github3.py library because it still looked to be active and it had good documentation.

Of course, gathering all the information from github isn’t easy when you want to do sentiment analysis over every single community interaction. When you do, you’ll quickly run into their API request rate limit of 5,000 requests per hour. There are two projects that archive the “public firehose” of all github events: http://githubarchive.org and http://ghtorrent.org. However, those projects only archive events that happened after 2011 or 2012, and some of the open source communities I want to study are older than that. Plus, downloading and filtering through several terabytes of data would probably take just as long as slurping just the data I need through a smaller straw (and would allow me to avoid awkward conversations with my ISP).
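To get a feel for what that rate limit means in practice, here is a bit of back-of-the-envelope arithmetic. The helper names are made up, and the only real number is the 5,000 requests per hour quoted above.

```python
REQUESTS_PER_HOUR = 5000  # rate limit quoted above


def min_seconds_between_requests(limit_per_hour=REQUESTS_PER_HOUR):
    """Spacing that keeps a steady crawl just under the hourly limit."""
    return 3600 / limit_per_hour


def min_hours_for(request_count, limit_per_hour=REQUESTS_PER_HOUR):
    """Lower bound on crawl time when every request counts against the limit."""
    return request_count / limit_per_hour
```

A steady crawl can make one request every 0.72 seconds at most, so a hypothetical crawl that needs 50,000 requests can’t finish in under 10 hours, no matter how fast the network is.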

For my analysis, I wanted to pull down all open and closed issues and pull requests, along with their comments. For a community like Rust, which has been around since 2010, their data (as of a week or two ago) looks like this:

  • 18,739 issues
  • 18,464 pull requests
  • 182,368 comments on issues and pull requests
  • 31,110 code review comments

Because of some oddities with the github API (did you know that an issue’s JSON data can be for either an issue or a pull request?), it took about 20 hours to pull down the information I needed.
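One way to handle that oddity is to check each record for a `pull_request` key, which github includes in the issue JSON only when the “issue” is really a pull request. The sketch below works on plain dictionaries rather than github3.py objects. The function name and the sample records are made up for illustration.

```python
def split_issues_and_pull_requests(items):
    """Separate issue-endpoint records into plain issues and pull requests."""
    issues, pull_requests = [], []
    for item in items:
        # github marks pull requests with a "pull_request" key.
        if "pull_request" in item:
            pull_requests.append(item)
        else:
            issues.append(item)
    return issues, pull_requests


# Trimmed-down, made-up records shaped like github's issue JSON.
records = [
    {"number": 1, "title": "Crash on startup"},
    {"number": 2, "title": "Fix crash", "pull_request": {"url": "placeholder"}},
]
issues, pull_requests = split_issues_and_pull_requests(records)
```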

I’m still sorting through how exactly I want to graph the data and measure participation over time. I hope to have more to share in a week!

*Edit* The code is available on github, and the reports for various open source communities are also available.

“I was only joking”

There was a very interesting set of tweets yesterday that dissected the social implications of saying, “I was only joking.” To paraphrase:

I’ve been mulling on the application of this analysis of humor with respect to the infamous “Donglegate” incident. Many men in tech responded with anger and fear over a conference attendee getting fired over a sexist joke. “It was only a joke!” they cried.

However, the justification falls flat if we assume that you’re never “just joking” and that jokes define in-groups or out-groups. The sexist joke shared between two white males (who were part of the dominant culture of conferences in 2013) defined them as part of the “in-group” and pushed the African American woman who overheard the “joke” into the “out-group”.

When the woman pushed back against the joke by tweeting about it with a picture of the joker, the people who were part of the in-group who found that joke “funny” were angry. When the joker was fired, it was a sign that they were no longer the favored, dominant group. Fear of loss of social status is a powerful motivator, which is what caused people from the joke’s “in-group” to call for the woman to be fired as well.

Of course, it wasn’t all men who blasted the woman for reacting to a “joke”. There were many women who blasted the reporter for “public shaming”, or who thought the woman was being “too sensitive”, or rushed to reassure men that they had never experienced sexist jokes at conferences. Which brings us to the topic of “chill girls”:

The need for women to fit into a male-dominated tech world means that “chill girls” have to laugh at sexist jokes in order to be part of the “in-group”. To not laugh, or to call out the joker, would be to resign themselves to the “out-group”.

Humans have a fierce need to be socially accepted, and defining in-groups and out-groups is one way to secure that acceptance. This is exemplified in many people’s push back against what they see as too much “political correctness”.

For example, try getting your friends to stop using casually ableist terms like “lame”, “retarded”, “dumb”, or “stupid”. Bonus points if you can get them to remove classist terms like “ghetto” or homophobic statements like “that’s so gay”. What you’ll face are nonsense arguments like, “It’s just a word.” People who call out these terms are berated and no longer “cool”. Unconsciously or consciously, the person will try to preserve the in-groups and out-groups, and their own power from being a part of the in-group.

Stop laughing awkwardly. Your silence is only lending power to oppression. Start calling out people for alienating jokes. Stop preserving the hierarchy of classism, ableism, homophobia, transphobia, and sexism.

Ditch “Culture Fit”

A couple different talks at OSCON got me thinking about the unhealthy results of hiring on the basis of “culture fit”.

drinking-culture
Slide from Casey West’s OSCON talk that says “Never in the history of my career has my ability to drink beer made me better at solving a business problem.”

What is company culture? Is it celebrating with co-workers around the company keg? Or would that exclude non-drinkers? Does your company value honest and direct feedback in meetings? Does that mean introverts and remote workers are talked over? Are long working hours and individual effort rewarded, to the point that people who value family are passed up for promotion?

Oftentimes, teams who don’t have a diverse network end up hiring people who have similar hobbies, backgrounds, and education. Companies need to avoid “group think” and focus on increasing diversity, because studies have shown that gender-diverse companies are 15% more likely to financially outperform other companies, and racially-diverse companies are 35% more likely to outperform. Other studies have shown that diversity can lead to more internal conflict, but the end result is a more productive team.

How do you change your company culture to value a diverse team? It’s much more than simply hiring more diverse people or making people sit through an hour of unconscious bias training. At OSCON, Casey West talked about some examples of company culture that create an inclusive environment where diverse teams can thrive:

  • Blame-free teams
  • Knowledge sharing culture
  • Continuous learning
  • No judgement on asking questions
  • Continuous feedback
  • Curiosity about different cultures
  • Individually defined work-life balance
  • Valuing empathy

For example, if you have a culture where there’s no judgement on asking questions or raising issues and people are naturally curious about different cultures, it’s easy for a team member to suggest a new feature that might make your product appeal to a broader customer base. After years of analyzing teams, Google found that the most productive teams foster a sense of “psychological safety”, a shared belief in expressing ideas without fear of humiliation.

The other problem with “culture fit” is that it’s an unevenly applied standard. An example of this was Kevin Stewart’s OSCON talk called “Managing While Black”. When Kevin emulated the company culture of pushing back on unnecessary requirements and protecting his team, he was told to “work on his personal brand”. White coworkers were reading him as “the angry black guy.” When he dialed it back, he was told he was “so articulate”, which is a non-compliment that relies on the stereotype that all African Americans are either uneducated or recent immigrants.

In both cases, even though his project was successful, Kevin had his team (and his own responsibilities) scaled back. After years of watching less successful white coworkers get promoted, he was told by management that they simply didn’t “see him in a leadership role.” Whether or not people of color emulate the white leadership behavior and corporate culture around them, they are punished because their coworkers are biased towards white leaders.

As a woman in technical leadership positions, I’ve faced similar “culture fit” issues. I’ve been told by one manager that I needed to be the “one true technical voice” (meaning as a leader I need to shout over the mansplainy guys on my team). And yet, when I clearly articulate valid technical or resourcing concerns to management, I’m “dismissive” of their goals. When I was a maintainer in the Linux kernel and adamantly pushed back on a patch that wall-papered over technical debt, I was told by another maintainer to “calm down”. (If you don’t think that’s a gendered slur based on the stereotype that women are “too emotional”, try imagining telling Linus Torvalds to calm down when he gets passionate about technical debt.)

The point is, traditional “cultural fit” narratives and leadership behaviors only benefit the white cis males that created these cultural norms. Culture can be manipulated in the span of a couple years to enforce or change the status quo. For example, computer programming used to be dominated by women, before hiring “personality tests” biased for men who displayed “disinterest in people”.

We need to be deliberate about the company culture we cultivate. By hiring for empathy, looking for coworkers who are curious about different cultures, and rewarding leaders who don’t fit our preconceived notions, we create an inclusive work environment where people are free to be their authentic selves. Project Include has more resources and examples for people who are interested in changing their company’s culture.


Thanks for reading! If you want me to write more posts on diversity in tech, please consider donating to my Patreon.