As our world becomes more digital, the impact of bad software becomes more visceral. Whether it’s a locked phone, a locked bank account, or a loved one being locked out of their work, school, or even country, unintentional biases in software are having real-world impacts.
A February study released by Gallup shows that 7.1% of all Americans identify as LGBTQ+, a number that has doubled in the past decade. Despite the growing number of people in our community wishing to identify outside of traditional gender binaries, we don’t talk enough about the software biases that may be affecting them.
If we truly want to be more accepting of new identities as a society, it’s critical that our software reflects this.
To better understand the unintentional bias potentially impacting some algorithms, it is helpful to look at the importance of accurate data to biometric software specifically.
At their core, all artificial intelligence algorithms rely on the ingestion of data. Most biometric systems rely on supervised learning algorithms, which require a substantial amount of data for training but can keep improving as more data is ingested: the more data, the more accurate the predictions an algorithm can make. Many traditional algorithms, by contrast, stop improving or even degrade once too much data is taken in.
Supervised learning lets the neural network guess at the answer for each record in a curated data set. In a facial biometric use case, this involves uploading numerous images of the same person and tagging them with key data points. These range from ethnicity, age, and gender identification, all the way to lighting conditions and pose angles.
After the algorithm makes initial guesses, we calculate “how wrong” the network is and propagate these errors backward through the network. This is done continuously for millions of records, and the process is repeated millions of times until the error rate is at an acceptable level.
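The guess, measure, and correct loop described above can be sketched in a few lines of Python. This is a toy illustration only, not how any production biometric system is built: the two-number “records,” the tiny one-hidden-layer network, and the names `make_toy_records` and `train` are all assumptions made for the example.

```python
import math
import random


def make_toy_records(n, seed=0):
    """Build a hypothetical labeled data set: two features and a 0/1 label."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        label = 1.0 if x[0] + x[1] > 0 else 0.0
        records.append((x, label))
    return records


def train(records, hidden=4, lr=0.1, epochs=50, seed=1):
    """Train a tiny network and return the average error after each pass."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, y in records:
            # Forward pass: the network guesses a probability for this record.
            h = [math.tanh(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j])
                 for j in range(hidden)]
            z = sum(w2[j] * h[j] for j in range(hidden)) + b2
            p = 1.0 / (1.0 + math.exp(-z))
            # "How wrong" the guess is (cross-entropy against the logged label).
            total += -(y * math.log(p + 1e-12)
                       + (1 - y) * math.log(1 - p + 1e-12))
            # Backward pass: push the error back through each layer.
            d_out = p - y
            for j in range(hidden):
                d_hidden = d_out * w2[j] * (1 - h[j] ** 2)
                w2[j] -= lr * d_out * h[j]
                w1[j][0] -= lr * d_hidden * x[0]
                w1[j][1] -= lr * d_hidden * x[1]
                b1[j] -= lr * d_hidden
            b2 -= lr * d_out
        history.append(total / len(records))
    return history


if __name__ == "__main__":
    history = train(make_toy_records(200))
    print(f"average error, first pass: {history[0]:.3f}")
    print(f"average error, last pass:  {history[-1]:.3f}")
```

Note that the only “truth” the network ever sees is the label stored with each record. If that label is wrong, the network is corrected toward the wrong answer just as diligently.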
Why is this important? Because logged data matters. In fact, it is about the only thing that matters in the training of biometric algorithms. Training only works if the identifiers attached to each image are accurate and consistent. Unfortunately, many members of the LGBTQ+ community have been forced to misidentify on official documentation for years.
Currently, only 23 states allow residents to identify as non-gender specific. Federally, the Biden administration only recently announced that a similar third gender option would be incorporated into passports. Prior to this, everyone was forced to select one of two boxes in gender-based identifiers.
This incorrect data is logged in digital permanency and requires proactive effort to address. That effort is needed urgently, because these algorithms are quickly becoming even more effective at predicting the data sets they’re fed.
We shouldn’t be too pessimistic. There are countless dedicated and incredibly smart individuals looking to combat these problems. There are also developments that may help speed the process along.
In a recent “hackathon” our developer team took part in, one team member realized that California has started denoting “no preference” responses as a “9” in license barcodes. This may sound small, but pinpointing documents where members of the LGBTQ+ community have been able to correctly identify is an important step in training our algorithms to perform better. These solutions can guide our efforts as we look for better ways of offering inclusive identification options to improve the data pool.
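As a rough sketch of how such a marker could be used, the snippet below separates records whose gender field carries the “9” code from the rest. Only the “9” value comes from the observation above. The `sex_code` field name, the record layout, and the “1” and “2” labels are assumptions made for illustration, and a real pipeline would first need to decode the barcode itself.

```python
# Hypothetical label table: only "9" is taken from the article;
# the "1" and "2" entries are assumed for illustration.
SEX_CODE_LABELS = {
    "1": "male",
    "2": "female",
    "9": "not specified",
}


def describe_sex_code(code):
    """Return a readable label for a code, or None if it is unrecognized."""
    return SEX_CODE_LABELS.get(code.strip())


def partition_records(records):
    """Split records into those carrying the "9" marker and all others.

    Each record is assumed to be a dict with a hypothetical "sex_code" key.
    """
    marked, others = [], []
    for record in records:
        if record.get("sex_code", "").strip() == "9":
            marked.append(record)
        else:
            others.append(record)
    return marked, others


if __name__ == "__main__":
    sample = [
        {"id": "a", "sex_code": "1"},
        {"id": "b", "sex_code": "9"},
        {"id": "c", "sex_code": "2"},
    ]
    marked, others = partition_records(sample)
    print([r["id"] for r in marked], [r["id"] for r in others])
```

Keeping the two groups separate makes it possible to check how an algorithm performs on documents where people were able to identify correctly, rather than folding them into one of two legacy categories.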
A greater focus on gender diversity amongst software engineers will lead to these issues being proactively spotted earlier in the learning process. The people powering these technologies must reflect the diverse, global audience engaging with these technologies throughout daily life. Many leading organizations have built ambitious DEI policies, and ensuring these efforts are mirrored in the way software is developed and tested is an immediate step for many companies to take.
Software is becoming a greater part of our daily lives. It is often the only way to access vital services. If we train our software with inaccurate data sets that don’t allow for the identity traits we deem important today, then it will fail in ways that are increasingly dangerous.
Truly equal software is possible, but it will take a concerted effort and an openness to address all the potential biases in our datasets.
Steve Ritter is the CTO of digital ID and biometrics company Mitek Systems.
The opinions expressed in Fortune.com Commentary pieces are solely the views of their authors, and do not reflect the opinions and beliefs of Fortune.