O17-2020: Team O
by orunsdru007 | updated May 13, 2020
based on O17-2020: Team XX


State the Challenge Number and Project Name:

Challenge 3; the project name is "Eliminating Gender Bias in AI Algorithms."

Project Description:

There is a lot of gender bias in AI algorithms, caused mainly by feeding them biased datasets. The algorithms draw conclusions from that data and become biased toward one gender when making recommendations, such as choosing the right candidate for a job. This project describes an easy and fast way to eliminate gender bias in AI.

Team Introduction:

My name is Dayo Orusnolu and I am working as a team of one. I am currently studying Information and Communication Science. I also write Python code and am currently learning data analysis with Python.

March 27, 2020 at 12:40 PM
Created by amudha
Edited by orunsdru007
Comments (2)
Great, concise project definition. Make sure to add a picture here to make your project stand out. Also, try to find why your solution is unique - are there other solutions similar to yours being used elsewhere?
3 months ago
Cool! It's good to learn about the people behind the project.
3 months ago


What is the need for this project?

Bias in AI algorithms leads to biased results. When an AI is biased toward a particular gender, it produces results that favor that gender. For instance, consider a man and a woman with equal prospects both applying for a programming job. The man would most likely be selected, because the AI learned from the data it was fed that most programmers are men. That is gender bias.

This project outlines steps that can be taken to eliminate gender bias in AI algorithms.

March 29, 2020 at 12:25 PM
Created by amudha
Edited by orunsdru007
Comments (2)
Great use of linking other media to help explain the problem
3 months ago
Concise, objective, great.
3 months ago


Who is affected?

AI algorithms are more often biased against the female gender than the male gender. For instance, suppose a random pool of people applies for a programming position listed by a company. The AI algorithm is more likely to select a male candidate, even when a female candidate has the same qualities, because most people who have held the position in the past were men.

March 29, 2020 at 2:31 PM
Created by amudha
Edited by orunsdru007
Comments (1)
Good use of statistics
3 months ago


What are the causes?

1. Using 'Gender' as one of the major criteria to produce outputs.

When training data is fed to an AI, the AI learns the different genders that exist. It also learns that one gender is more dominant in a field than the other. This creates a bias: the AI will automatically favor the dominant gender in that field, even when both genders have equal prospects for the job.

2. Having different gender values in the gender column.

Unifying the gender column before training is important, as it ensures the AI does not reason in terms of "he" or "she". Instead, it uses other criteria to produce outputs.

March 29, 2020 at 2:53 PM
Created by amudha
Edited by orunsdru007
Comments (1)
Could you elaborate on this? Tell us HOW these causes generate gender bias.
3 months ago


What is the evidence? Who can you interview? What can you find out? What experiment can you run?

A group of researchers from Princeton University and the University of Bath conducted a study testing how ordinary human language, applied to machine learning, results in human-like semantic biases. For this experiment, the authors replicated a set of historically known biased dichotomies of terms, "using a […] purely statistical machine-learning model trained on a standard corpus of text from the Web." "Their results indicate that text corpora [the machine learning system that was tested] contain recoverable and accurate imprints of our historic biases, whether morally neutral as towards insects or flowers, problematic as towards race or gender, or even simply veridical, reflecting the status quo distribution of gender with respect to careers or first names" (ibid.).
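The core measurement in that study (the Word-Embedding Association Test) can be sketched in a few lines. This is a simplified illustration, not the authors' code; it assumes word vectors are already available as numpy arrays:

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two word vectors.
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def association(word_vec, attrs_a, attrs_b):
    # WEAT-style score: how much more strongly one word's vector associates
    # with attribute set A (e.g. male terms) than with set B (female terms).
    return (np.mean([cosine(word_vec, a) for a in attrs_a])
            - np.mean([cosine(word_vec, b) for b in attrs_b]))

# Usage, assuming `vec` maps words to trained embedding vectors:
# association(vec["programmer"], [vec["he"], vec["man"]],
#                                [vec["she"], vec["woman"]])
```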

Also, AIs are marketed with feminine identities, names, and voices. Examples such as Alexa, Siri, and Cortana demonstrate this: even though they offer male voices, the fact that the default setting is female speaks loudly.

March 29, 2020 at 2:59 PM
Created by amudha
Edited by orunsdru007
Comments (1)
Good job!
3 months ago


The O17 challenge's original question asks, "How could public participation in data gathering reduce this bias?"

There is not much the public can do to reduce gender bias in AI beyond raising awareness and reporting issues they notice, because the public is usually not involved in collecting or gathering the data. Eliminating the gender column at the point of collection also has serious drawbacks: for instance, a social or economic benefit designed mainly for women would be difficult to implement if men cannot be separated from women in the data.

In order to eliminate the bias, data scientists and those who feed data into the algorithm are the major stakeholders, because they have the power to transform the data to reduce the bias. In artificial intelligence and machine learning, data is divided into training data and testing data. The training data usually makes up 80% or more of the data, while the testing data is smaller and is used to verify that the machine learned from the training data; this lets the machine learn from as much data, and as many scenarios, as possible. In the case of gender bias, the gender column should be unified: instead of having "male" or "female" values, we could replace them all with "person" or "people". This should be done before any data is fed into the system, so that the machine does not assign specific attributes to a particular gender. It is also a better approach than eliminating the gender column entirely at the point of entry.

Keeping the gender column matters because the same data can be used by governments, NGOs, and organizations to create social and economic benefits for women. For instance, an NGO can create educational awareness about cervical cancer, which affects women, by simply filtering the data for women. The point I am making is that keeping the gender data intact keeps it useful in other spheres.
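To make the unification step concrete, here is a minimal sketch in pandas. The file and column names are hypothetical; the point is that an untouched copy is archived for uses like the NGO scenario above, while only the copy destined for the model is neutralized.

```python
import pandas as pd

# Hypothetical employee dataset; the file and column names are illustrative.
df = pd.read_csv("employees.csv")

# Keep the original data intact for other uses,
# e.g. programs targeted specifically at women.
df_archive = df.copy()

# Unify the gender column before training: every record becomes "person",
# so the model cannot use gender as a signal.
df_train = df.copy()
df_train["gender"] = "person"

df_train.to_csv("employees_neutral.csv", index=False)
```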

March 31, 2020 at 12:47 PM
Created by amudha
Edited by orunsdru007
Comments (1)
Good job in pointing out important trade-offs. Please explain why the "person" column is useful.
3 months ago


What is the Big Idea? What is the value proposition?

The singular idea behind this project is to UNIFY the data in the gender column before it is fed to the AI, so that the AI sees only one gender instead of two. To better understand the impact of this, I will summarize how machine learning works. Machine learning systems are what they eat: since the training material for Natural Language Processing (NLP) is human language, these machines have a tendency to perpetuate our own biases.

When we talk about bias in NLP, the pre-existing biases in our society affect the way we speak and what we speak about, which in turn translates into what is written down, which is ultimately what we use to train machine learning systems. When we train our models on biased data, the bias gets incorporated into the models, where it is confirmed and preserved.

The short version is that words are represented by lists of numbers called word embeddings, which encode information about each word's meaning, usage, and other properties. Computers "learn" these values for every word from training data of many millions of lines of text, where words are used in their natural contexts. Since word embeddings are numbers, they can be visualized as coordinates in a plane, and the distance between words (more precisely, the angle between them) measures their semantic similarity. These relationships can be used to generate analogies.

In the figure, the orange arrows represent royalty and the blue arrows represent gender, capturing the relationship "man is to king as woman is to queen".

But what happens if we want to extend this analogy to other words, say professions?
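Here is a toy sketch of that analogy arithmetic. The vectors below are invented 3-dimensional stand-ins for real embeddings, which have hundreds of dimensions learned from text:

```python
import numpy as np

# Invented 3-d "embeddings"; real word vectors have hundreds of
# dimensions and are learned from text corpora.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
}

def cosine(u, v):
    # Semantic similarity is the cosine of the angle between vectors.
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Analogy arithmetic: king - man + woman should land nearest to queen.
target = vectors["king"] - vectors["man"] + vectors["woman"]
best = max(vectors, key=lambda w: cosine(vectors[w], target))
print(best)  # -> queen
```

With real embeddings trained on web text, extending this arithmetic to professions is exactly where bias appears: researchers have reported analogies of the form "man is to computer programmer as woman is to homemaker".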

March 27, 2020 at 12:40 PM
Created by amudha
Edited by orunsdru007
Comments (2)
Great explanation and use of images!
3 months ago
I appreciate this explanation and illustration, but I think it is missing an answer to the question in one full and objective sentence.
3 months ago


What is the mechanism of beneficial change?

The impact of my approach is measurable: we can compare the AI's outputs from the same data under different conditions. In my approach, I suggest using a single gender-neutral term such as "person" or "human" instead of "male" or "female". With a unified gender column, the AI does not take gender into consideration when producing outputs.

March 27, 2020 at 12:40 PM
Created by amudha
Edited by orunsdru007
Comments (1)
This is all great, Dayo. However, in the Theory of Change section, we are seeking a path to the envisioned outcomes. How will you implement this? In which program? Who will use it/ test it? Is there any company that would be willing to try this out? Could you demonstrate this to showcase it to an audience?
3 months ago


What are the key metrics?

The impact will be measurable because outputs produced from the same data can be compared against each other, with only the gender column modified in one of the two datasets. The idea is to feed the AI data about employees, including their names, years of experience, organization, gender, and so on, and then ask it to suggest the employee best suited for a certain position. The results produced from the original data, which contains both genders, would be compared against the results produced from the unified, single-gender data.

If the outputs differ by more than 70%, we can conclude that the gender column was driving the results, i.e. that the original outputs were biased by gender.
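As a minimal sketch of this comparison, assume each run returns a ranked shortlist of candidate IDs (the lists below are made up for illustration):

```python
# Hypothetical top-10 shortlists from two runs on the same employee data:
# one using the original gender column, one with it unified to "person".
shortlist_original = ["emp03", "emp17", "emp08", "emp22", "emp05",
                      "emp11", "emp30", "emp02", "emp19", "emp07"]
shortlist_neutral  = ["emp14", "emp17", "emp23", "emp05", "emp08",
                      "emp01", "emp30", "emp26", "emp19", "emp09"]

# Percentage of the shortlist that changed once gender carried no signal.
overlap = set(shortlist_original) & set(shortlist_neutral)
pct_changed = 100 * (1 - len(overlap) / len(shortlist_original))
print(f"{pct_changed:.0f}% of the shortlist changed")

# Per the metric above, a change greater than 70% indicates that gender
# was strongly influencing the original outputs.
if pct_changed > 70:
    print("Evidence of gender bias in the original outputs")
```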

March 27, 2020 at 12:40 PM
Created by amudha
Edited by orunsdru007
Comments (1)
Maybe you could try to define the impacts a little more? Would a change in the ratio of male/female tech professionals be traceable to your project?
3 months ago


Who is most likely to be supportive?

People in the data science and analytics field are most likely to support this project, as they interact directly with data and can notice flaws in their outputs and designs. Studies have also found that women in tech are more likely to support efforts like this, as they try to encourage more women to enter the tech world.

March 27, 2020 at 12:40 PM
Created by amudha
Edited by orunsdru007
Comments (2)
Try to expand on your comment "AI tends to portray women as irrational beings compared to men.". Also, try to think of more precise stakeholders than just men/women. Finally, are there scenarios where men can be interested in a lower bias of AI?
3 months ago
I agree with Will that stakeholders need to be specified. This will allow your project to move forward.
Also, I'd contest the notion that women "are affected more". I believe men are affected too, by being privileged. May this have an effect on the path of your project?
3 months ago


Key foes? Who is most likely to oppose?

In the long run, I do not foresee any opposition to this idea, as it creates a level playing field for all genders: outputs are produced based on factors other than a person's gender. Nevertheless, there will always be people who are opposed to change, one way or another.

March 27, 2020 at 12:40 PM
Created by amudha
Edited by orunsdru007
Comments (4)
Try to give more examples and more detailed stakeholders. "Supporters of patriarchy" is a very broad group of people, encompassing all ages, professions, etc., so it is hard to adapt your project to try to mitigate the opposition.
3 months ago
You might need just one signature, or just one supervisor, somewhere, to make this work in their environment. Can you think of who this could be?
3 months ago
By the way, kudos on not supporting patriarchy! No cookie for you though, it's the least I demand from anyone lol
Am looking forward to see this project tested!
3 months ago
Hello, I've made some changes to my week 2 assignment. Kindly review them and get back. Thanks.
3 months ago


What is the user experience?

Let's say I own a company called Essex and have a database of all my employees. A vacancy opens in the IT department of Essex, where males make up 80% of the staff. The vacancy is announced and we receive thousands of applications, which would be difficult and time-consuming to skim through one by one. We create a model and feed it our IT department's datasets. Before doing so, we neutralize the gender column, replacing its values with a gender-neutral term such as "person", to ensure gender is not used as a measure for choosing a candidate for the job.

Once the outputs are produced, we pick the five best fits and interview them. This saves valuable time and money that we would otherwise have spent interviewing thousands of candidates.
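A rough sketch of what Essex's screening step could look like, assuming a historical dataset of past applicants with a "hired" label; all file, column, and feature names are hypothetical:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical files: past applicants with a "hired" label, plus new applicants.
history = pd.read_csv("essex_it_history.csv")
applicants = pd.read_csv("essex_it_applicants.csv")

# Unify the gender column in both datasets before modeling. Once every
# record reads "person", the encoded column is constant and carries no signal.
for df in (history, applicants):
    df["gender"] = "person"

feature_cols = ["years_experience", "num_skills", "gender"]
X_train = pd.get_dummies(history[feature_cols])
X_apply = pd.get_dummies(applicants[feature_cols]).reindex(
    columns=X_train.columns, fill_value=0)

model = RandomForestClassifier(random_state=0)
model.fit(X_train, history["hired"])

# Score the applicants and shortlist the five best fits for interviews.
applicants["score"] = model.predict_proba(X_apply)[:, 1]
print(applicants.nlargest(5, "score")[["applicant_id", "score"]])
```

Because the gender column is unified before encoding, its one-hot column is constant and contributes nothing to the model's decisions.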

March 27, 2020 at 12:40 PM
Created by amudha
Edited by orunsdru007
Comments (1)
Good summary. Are there more details you could provide here? Is there a chance that the metrics themselves are biased, such as years of experience, skill sets, etc.?
3 months ago


Who has to do what to make it happen?

As I stated earlier, the main aim of this project is to use a gender-neutral term, or the same term for both genders, before feeding an AI its training data. Thankfully, this topic is a growing area of research. The 2019 Annual Meeting of the Association for Computational Linguistics (ACL, http://www.acl2019.org/), held in Florence, Italy and attended by many AI enthusiasts, hosted the first-ever Workshop on Gender Bias in Natural Language Processing.

Google has also invested resources in mitigating this problem. In December 2018, it announced that Google Translate would begin returning translations of single words from four languages into English in both feminine and masculine forms.

March 27, 2020 at 12:40 PM
Created by amudha
Edited by orunsdru007
Comments (1)
So the data scientist and the data analyst can choose everything related to their code? Do their bosses, organizations, or teams play a role?
Please specify a real company where you could try this out.
3 months ago


Who are the key partners to execute? Key partners to help others evaluate your value proposition?

Data scientists and data analysts are key partners in executing this project, because they are directly involved in the cleaning, wrangling, and munging of data. I had a video chat with a senior colleague, a data scientist in Nigeria who is part of the Data Science Nigeria team. I explained my project to him; he gave his approval and is ready to render assistance wherever needed. I have also signed up for a five-day online training program by Data Science Nigeria that teaches students and enthusiasts how to train and test their data. This way, I will be able to evaluate the difference between producing results with both genders and producing results with a single unified gender.

During the training, I will be able to meet data scientists with field experience and pose my idea to them for evaluation. This free online training will also help me understand how best to carry out my project.

March 27, 2020 at 12:40 PM
Created by amudha
Edited by orunsdru007
Comments (1)
Got it.
What if you spoke to someone from HR of a company that self-proclaims as equal employer and offered this idea? What are the pros and cons of doing this before or after this online training?
3 months ago


What are the precipitating events?

It is important to start the project as early as possible to find out how effective my approach is, that is, to compare the results produced from the two datasets and see whether there is a large difference between the outputs. If the difference is minimal, I will take another approach.

Also, many machine learning models are at an infant stage and still growing. It is important to get this right now, because most machine learning models cannot unlearn: if an AI is built on gender bias at an early stage, that bias becomes the foundation upon which future outputs are produced.

A well-trained AI system can be used by millions of companies and individuals. Creating a robust AI system takes a great deal of time, computational power, and human resources; if one AI system is well trained and free from bias, it can be reused by others.

March 27, 2020 at 12:40 PM
Created by amudha
Edited by orunsdru007
Comments (1)
When thinking about the world out there, what could be possible reasons for gender bias reduction to happen now?
3 months ago


Who else is in the field?

Others working in this space include data analysts, data scientists, tech enthusiasts, data entry operators, and UX/UI designers. Conferences discussing AI bias and gender inequality are held every year. One such conference is the 2019 Annual Meeting of the Association for Computational Linguistics (ACL), held in Florence, Italy: http://www.acl2019.org/EN/index.xhtml

The conference hosted the first-ever Workshop on Gender Bias in Natural Language Processing. Google has also invested resources in mitigating this problem: in December 2018, it announced that Google Translate would begin returning translations of single words from four languages into English in both feminine and masculine forms.

April 12, 2020 at 1:55 PM
Created by amudha
Edited by orunsdru007
Comments (3)
Although some forms are filled online by individuals, data entry operators still make up a high percentage of data entry points. Interesting. What do data entry operators typically add to a data set?
3 months ago
Who else is pursuing gender equality? Are there organizations that are already making efforts in this sense? Could they maybe use the help of a CS person such as yourself?
3 months ago
You could also look at the tools that companies currently use to hire candidates. Keyword analysis seems to be the standard in many locations already.
3 months ago


I read some articles online that address gender bias in AI and proffer solutions. Some of these articles suggest more inclusion of women and people of color in the field, while others suggest swapping gender roles in the dataset.

https://time.com/5520558/artificial-intelligence-racial-gender-bias/ 

https://www.catalyst.org/research/trend-brief-gender-bias-in-ai/

https://www.forbes.com/sites/falonfatemi/2020/02/17/bridging-the-gender-gap-in-ai/#7bb85c35ee89

Including newcomers in the process is a good idea over the long term, but doing so immediately would slow the team down because of the large knowledge gap between existing and new team members. Swapping gender roles is itself a form of bias, as it puts the affected gender miles ahead of the other.

My approach, which uses a gender-neutral term, saves time and money and improves the quality of the data. The gender column can be changed in one sitting at no monetary cost, and the AI's outputs increase in quality because records are no longer distinguished by gender.

April 12, 2020 at 2:41 PM
Created by amudha
Edited by orunsdru007
Comments (1)
Nice!
Are there other people trying to implement it, though?
There are multiple ways to implement something. Sometimes that's what makes one project work and the other not. Please create a path to implementation, even if it is rough or imaginary at this point.
3 months ago


Physical and intellectual resources needed (besides financial resources)

Physical resources needed for this project include a laptop with an internet connection, the right datasets, and supporting software. Intellectual resources needed are the technical know-how to manipulate data, adequate knowledge of machine learning algorithms, and a programming background. I will use Python for this project.

April 12, 2020 at 3:02 PM
Created by amudha
Edited by orunsdru007
Comments (0)


Next steps? Pilots?

  1. First, I will get the right employee datasets to work with. The datasets will contain columns such as years of experience, job role, skills, gender, and age, although not all columns will be used as measures to produce outputs.
  2. I will clean the datasets to ensure there are no errors or wrong values, so that the machine learns properly and produces correct results.
  3. I will use Amazon SageMaker, a web service provided by Amazon for building, training, and deploying machine learning models.
  4. Specifically, I will use Amazon SageMaker Autopilot, which automatically runs my datasets against different machine learning models and produces outputs based on them.
  5. From the results produced, I can compare whether the gender column has a direct impact on the output; see the sketch below.

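As a sketch of the data preparation for this pilot, assuming pandas and boto3 (the bucket, file, and column names are hypothetical), the two dataset variants can be produced and uploaded to S3, where Autopilot reads its input; one Autopilot job would then be launched per variant and the outputs compared:

```python
import pandas as pd
import boto3

# Load the cleaned employee dataset (file and column names are hypothetical).
df = pd.read_csv("employees_clean.csv")

# Variant A: original data, both genders intact.
df.to_csv("employees_original.csv", index=False)

# Variant B: gender column unified so it carries no signal.
df_neutral = df.copy()
df_neutral["gender"] = "person"
df_neutral.to_csv("employees_neutral.csv", index=False)

# Upload both variants to S3, where SageMaker Autopilot reads its input.
# One Autopilot job would then be launched per variant (e.g. from the
# SageMaker console) and the two models' outputs compared.
s3 = boto3.client("s3")
for name in ("employees_original.csv", "employees_neutral.csv"):
    s3.upload_file(name, "my-sagemaker-bucket", f"autopilot-input/{name}")
```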
March 27, 2020 at 12:40 PM
Created by amudha
Edited by orunsdru007
Comments (0)


Cost structure? Financial Sustainability? Revenue streams?

To sustain this project, I will need to pay for access to Amazon SageMaker Autopilot, a web service provided by Amazon for building, training, and deploying machine learning models.

March 27, 2020 at 12:40 PM
Created by amudha
Edited by orunsdru007
Comments (0)


How might this go wrong? How might the problem evolve? What are the legal, cultural and other impediments?

The main way this project could go wrong is if the outputs from the two datasets do not differ, which is very unlikely. If the outputs show no correlation with the gender column, further research would be carried out to ascertain other causes of gender bias.

March 27, 2020 at 12:40 PM
Created by amudha
Edited by orunsdru007
Comments (0)


How will I promote adoption?

Awareness will be targeted at two groups: data scientists and the general population. For data scientists, my datasets, outputs, and process will be documented on GitHub, so that those interested in the project can begin their own research or continue from where I stopped.

For the general public, awareness will be raised on social media to highlight the problem with AI algorithms. Most people affected by gender bias in AI are not aware of the problem; they will be shown how it affects them.

March 27, 2020 at 12:40 PM
Created by amudha
Edited by orunsdru007
Comments (0)
