State the Challenge Number and Project Name:
Challenge 3. The project name is "Eliminating Gender Bias in AI Algorithms".
There is a lot of gender bias in AI algorithms, mainly caused by training them on biased datasets. These algorithms draw conclusions from the data and become biased towards one gender when making recommendations, such as selecting the right candidate for a job. This project describes an easy and fast way to eliminate gender bias in AI.
My name is Dayo Orusnolu and I am working as a one-person team. I am currently studying Information and Communication Science. I also write Python code and am currently learning data analysis with Python.
What is the need for this project?
Bias in AI algorithms leads to biased results. When an AI is biased towards a particular gender, it produces results that favour that gender. For instance, consider a man and a woman with equal prospects both applying for a programming job. The man would most likely be selected because the AI has learnt, from the data it was fed, that most programmers are men. That is gender bias.
This project would outline steps that can be taken to eliminate gender bias in AI algorithms.
Who is affected?
AI algorithms tend to be biased against women more than men. For instance, if random people apply for a programmer position listed by a company, the AI algorithm is more likely to select a male candidate, even when a female candidate has the same qualities, because most people who have held the position in the past are men.
What are the causes?
1. Using 'Gender' as one of the major criteria to produce outputs.
When training data is fed to an AI, it learns the different genders that exist. It also learns that one gender is more dominant in a field than the other. This creates a bias: the AI will automatically favour the dominant gender in that field, even if candidates of both genders have equal prospects for the job.
2. Having different gender values in the gender column.
Unifying the gender column before training is important, as it ensures the AI does not think in terms of he or she. Instead, it uses other criteria to produce outputs.
What is the evidence? Who can you interview? What can you find out? What experiment can you run?
A group of researchers from Princeton University and the University of Bath conducted a study testing how ordinary human language, applied to machine learning, results in human-like semantic biases. For this experiment, the authors replicated a set of historically known biased dichotomies of different terms, "using a […] purely statistical machine-learning model trained on a standard corpus of text from the Web." "Their results indicate that text corpora [the machine learning system that was tested] contain recoverable and accurate imprints of our historic biases, whether morally neutral as towards insects or flowers, problematic as towards race or gender, or even simply veridical, reflecting the status quo distribution of gender with respect to careers or first names" (ibid.).
Also, AIs are marketed with feminine identities, names and voices. Examples such as Alexa, Siri and Cortana demonstrate this: even though they offer male voice options, the fact that the default setting is female speaks loudly.
The O17 challenge's original question asks, "How could public participation in data gathering reduce this bias?"
There is not much the public can do to reduce gender bias in AI beyond raising awareness and reporting issues they notice, because most of the time the public is not involved in the collection of the data. Eliminating the gender column at the point of collection also has serious disadvantages. For instance, a social or economic benefit designed mainly for women would be difficult to implement if you cannot separate men from women in the data.
In order to eliminate the bias, data scientists and those who feed the data into the algorithm are the major stakeholders, because they have the power to transform the data to reduce the bias. In machine learning, data is divided into training data and testing data. The training data usually makes up 80% or more of the data, while the smaller testing set is used to verify that the machine learned from the training data; this lets the machine learn from as much data, and as many scenarios, as possible. In the case of gender bias, the gender column should be unified: instead of having male or female in the gender column, we could replace both with "person". This should be done before any data is fed into the system, and it ensures the machine does not assign specific attributes to a particular gender. It is also a better approach than eliminating the gender column entirely at the point of entry.
Keeping the gender column matters because the same data can be used by governments, NGOs and other organizations to create social and economic benefits for women. For instance, an NGO can easily run an educational awareness campaign about cervical cancer, which affects women, by filtering the data for women. The point is that keeping the original gender values intact keeps the data useful in other spheres.
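The unification step described above can be sketched in a few lines of pandas. This is a minimal illustration only: the DataFrame, its column names, the values in it, and the 80/20 split are all assumptions made for demonstration, not the project's actual data.

```python
import pandas as pd

# Illustrative employee data; all names, columns and values are assumptions.
df = pd.DataFrame({
    "name": ["Ada", "Bola", "Chidi", "Dayo"],
    "years_experience": [5, 3, 7, 4],
    "gender": ["female", "male", "male", "female"],
})

def unify_gender(data, column="gender", neutral="person"):
    """Replace every value in the gender column with one neutral term,
    so a model trained on the data cannot distinguish records by gender."""
    out = data.copy()
    out[column] = neutral
    return out

neutral_df = unify_gender(df)

# The usual 80/20 train/test split, done AFTER unification.
train = neutral_df.sample(frac=0.8, random_state=0)
test = neutral_df.drop(train.index)

print(neutral_df["gender"].unique())  # ['person']
```

Note that the original `df` is left untouched, so the un-unified data stays available for the other uses (NGO campaigns, government programmes) mentioned above.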
What is The Big Idea? What is the value proposition?
The singular idea behind this project is to UNIFY the data in the gender column before it is fed to the AI, so the AI sees only one gender instead of two. To better understand the impact of this, I will summarize how machine learning works. Machine learning systems are what they eat: since the training material for Natural Language Processing (NLP) is human language, these machines tend to perpetuate our own biases.
When we talk about bias in NLP, both kinds are at play. The pre-existing biases in our society affect the way we speak and what we speak about, which in turn translates into what is written down, which is ultimately what we use to train machine learning systems. When we train our models on biased data, the bias gets incorporated into the models, which allows our own biases to be confirmed and preserved.

The short version is that words are represented by lists of numbers called word embeddings, which encode information about each word's meaning, usage and other properties. Computers "learn" these values for every word from training data of many millions of lines of text, where words are used in their natural contexts. Since word embeddings are numbers, they can be visualized as coordinates in a plane, and the distance between words (more precisely, the angle between them) is a way of measuring their semantic similarity. These relationships can be used to generate analogies.
In the second figure, the orange arrows represent royalty and the blue arrows gender, capturing the relationship "man is to king as woman is to queen".
But what happens if we want to extend this analogy to other words, say professions?
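To make the analogy mechanics concrete, here is a toy sketch. Real embeddings have hundreds of dimensions learnt from text; every number below is invented purely to mirror the figure, with the first coordinate standing for "royalty" and the second for "gender".

```python
import math

# Toy 2-D word embeddings; all values are invented for illustration.
embeddings = {
    "king":  (0.9,  0.7),
    "queen": (0.9, -0.7),
    "man":   (0.1,  0.7),
    "woman": (0.1, -0.7),
}

def cosine_similarity(a, b):
    """Angle-based similarity between two word vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def analogy(a, b, c):
    """Solve 'a is to b as c is to ?' via the vector arithmetic b - a + c,
    returning the nearest word (by angle) to the resulting point."""
    target = tuple(bi - ai + ci for ai, bi, ci in
                   zip(embeddings[a], embeddings[b], embeddings[c]))
    return max((w for w in embeddings if w != c),
               key=lambda w: cosine_similarity(embeddings[w], target))

print(analogy("man", "king", "woman"))  # queen
```

The same arithmetic is what lets a biased corpus produce analogies like "man is to programmer as woman is to homemaker": the mechanism is neutral, the learnt coordinates are not.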
What is the mechanism of beneficial change?
The impact of my approach is measurable, because we can compare the AI's outputs from the same data under two conditions. My approach replaces male and female with a single neutral term such as "person" or "human". With a unified gender column, the AI cannot take gender into consideration when producing outputs.
What are the key metrics?
The impact will be measurable because outputs produced from the same data can be compared against each other, with only the gender column modified in one of the two datasets. The idea is for the AI to take in data about employees, including their names, years of experience, organization, gender and so on, and to suggest the employee best suited for a certain position. The results produced from the original data, which contains both genders, would then be compared against the results produced from the unified data.
If the outputs differ by more than 70%, we can conclude that the original outputs depended heavily on the gender column, i.e. the data was biased.
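One simple way to compute that difference is to take the top picks from each run and measure what fraction changed. The candidate IDs and the two result lists below are hypothetical, just to show the metric.

```python
def output_difference(original_picks, neutral_picks):
    """Fraction of the original recommendations that changed after the
    gender column was unified: 0.0 = identical outputs, 1.0 = no overlap."""
    original, neutral = set(original_picks), set(neutral_picks)
    return 1 - len(original & neutral) / len(original)

# Hypothetical top-5 candidate IDs from the two model runs.
with_gender = ["e01", "e04", "e07", "e09", "e12"]
without_gender = ["e01", "e03", "e07", "e10", "e12"]

diff = output_difference(with_gender, without_gender)
print(f"{diff:.0%} of picks changed")  # 40% in this example
if diff > 0.7:
    print("Outputs depend heavily on gender: the original data is biased.")
```

A set-overlap metric ignores ranking order; if the order of the picks also matters, a rank-aware measure would be a natural refinement.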
Who is most likely to be supportive?
People in the data science and analytics field are most likely to support this project, as they interact directly with data and can notice flaws in their outputs and designs. Studies have also found that women in tech are more likely to support efforts like this, as they try to encourage more women to enter the tech world.
Key foes? Who is most likely to oppose?
In the long run, I do not foresee much opposition to this idea, as it creates a level playing field for all genders: outputs are produced based on factors other than a person's gender. Nevertheless, there will always be people who are opposed to change, one way or another.
What is the user experience?
Let's say I own a company called Essex and I have a database of all my employees. There is a vacancy in the IT department, where males make up 80% of the staff. When the vacancy is announced, we receive thousands of applications, which would be difficult and time-consuming to skim through one by one. So we create a model and feed it our IT department's datasets. Before feeding the datasets to the model, we replace every value in the gender column with a gender-neutral term such as "person", to ensure gender is not used as a measure for choosing a candidate.
Once the outputs are produced, we pick the five best fits and interview them. This saves the valuable time and money we would have spent interviewing thousands of candidates.
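The Essex scenario could look roughly like this in code. The applicant records, field names and scoring rule are all invented for illustration; the point is simply that after unification, the ranking can only draw on non-gender features.

```python
# Hypothetical applicant records after the gender column has been unified.
applicants = [
    {"name": "A. Bello", "gender": "person", "years_experience": 6, "test_score": 88},
    {"name": "B. Eze",   "gender": "person", "years_experience": 4, "test_score": 92},
    {"name": "C. Musa",  "gender": "person", "years_experience": 8, "test_score": 75},
]

def shortlist(records, k=2):
    """Rank applicants on test score then experience; gender carries no
    information because every record holds the same neutral value."""
    return sorted(records,
                  key=lambda r: (r["test_score"], r["years_experience"]),
                  reverse=True)[:k]

for a in shortlist(applicants):
    print(a["name"])  # B. Eze, then A. Bello
```

A real system would learn its ranking from data rather than use a hand-written key, but the property illustrated holds either way: with a constant gender field, no gender signal can influence the shortlist.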
Who has to do what to make it happen?
As I stated earlier, the main aim of this project is to use a gender-neutral term, or the same term for both genders, before feeding the AI its training data. Thankfully, this topic is a growing area of research. The 2019 Annual Meeting of the Association for Computational Linguistics (ACL), http://www.acl2019.org/, which took place in Florence, Italy and was attended by many AI enthusiasts, had its first-ever Workshop on Gender Bias in Natural Language Processing.
Google has also invested resources into mitigating this problem. In December 2018, they announced that Google Translate would begin returning translations of single words from four languages to English in both the feminine and masculine form.
Who are the key partners to execute? Key partners to help others evaluate your value proposition?
Data scientists and data analysts are key partners in executing this project, because they are directly involved in the cleaning, wrangling and munging of data. I had a video chat with a senior colleague who is a data scientist in Nigeria and part of the Data Science Nigeria team. I explained my project to him; he approved of it and is ready to render assistance wherever needed. I have also signed up for a five-day online training program by Data Science Nigeria that teaches students and enthusiasts how to train and test their data. This way, I will be able to evaluate the difference between producing results with two genders and producing results with a single gender.
During the training, I will be able to meet data scientists with field experience and pose my idea to them for evaluation. This free online training will also help me understand how best to carry out my project.
What are the precipitating events?
It is important to start the project as early as possible to find out how effective my approach is, that is, to compare the results produced from the two datasets and see whether there is a large difference between the outputs. If the difference is minimal, I will take another approach.
Also, many machine learning models are at an infant stage and growing. It is important to get this right now, because it is very difficult for most machine learning models to unlearn: if an AI is built on gender bias at an early stage, that bias becomes a foundation upon which future outputs are produced.
An AI system that is well trained can be used by millions of companies and individuals. Creating a robust AI system takes lots of time, computational power and human resources. If one AI system is well trained and free from bias, it can be used by others.
Who else is in the field?
Others working in this space include data analysts, data scientists, tech enthusiasts, data entry operators and UX/UI designers. Conferences discussing AI bias and gender inequality are held each year. One such conference is the 2019 Annual Meeting of the Association for Computational Linguistics (ACL), held in Florence, Italy. http://www.acl2019.org/EN/index.xhtml
The conference had its first-ever Workshop on Gender Bias in Natural Language Processing. Google has also invested resources in mitigating this problem: in December 2018, it announced that Google Translate would begin returning translations of single words from four languages to English in both the feminine and masculine forms.
I have read some online articles that address gender bias in AI and proffer solutions. Some suggest more inclusion of women and people of color in the process, while others suggest swapping gender roles in the dataset.
Including newcomers in the process is a good idea over the long term, but doing so immediately would slow a team down, given the knowledge gap between existing and new members. Swapping gender roles is also itself a form of bias, as it puts the affected gender miles ahead of the other.
My approach, which uses a gender-neutral term, saves time and money and improves the quality of the data: the gender column can be changed in one sitting and costs nothing in monetary terms. The quality of the AI's outputs also increases, as records are no longer distinguished by gender.
Physical and intellectual resources needed (besides financial resources)
Physical resources needed for this project include a laptop with an internet connection, the right datasets and supporting software. Intellectual resources needed are the technical know-how to manipulate data, adequate knowledge of machine learning algorithms, and a programming background. I will use Python for this project.
Next steps? Pilots?
Cost structure? Financial Sustainability? Revenue streams?
In order to sustain this project, I will need to pay for access to Amazon SageMaker Autopilot, a web service provided by Amazon for building, training and deploying machine learning models.
How might this go wrong? How might the problem evolve? What are the legal, cultural and other impediments?
This project might run into a problem if the outputs from the two datasets do not differ, which is very unlikely. If the outputs show no correlation with the gender column, further research would be carried out to ascertain other causes of gender bias.
How will I promote adoption?
Awareness would be targeted at two groups: data scientists and the general public. For data scientists, my datasets, outputs and process would be documented on GitHub, so that those interested in the project can begin their own research or continue from where I stopped.
For the general public, awareness would be raised on social media to highlight the problem with AI algorithms. Most people affected by gender bias are not aware of the problem; they would be enlightened on how gender bias in AI affects them.