In July 2013, we were preparing to launch a new website for a client: a form that requires first and last names, phone number, address, and a few more pieces of information. Testing went well until we brought in some real beta users. Their entries showed us that our development team had assumed a first name or a last name must consist of one or more letters, and only letters. What’s the problem with this? Consider these names:
- John O’Connor
- Gloria Fernández
- Mark London Cooper
- Mary Smith-Foster
- J. Edward Prince
- Arthur Robert Harris, Jr.
- Arthur Robert Harris III
- Nicholas If-Jesus-Christ-Had-Not-Died-For-Thee-Thou-Hadst-Been-Damned Barebone, who was a real person
Each of these names has something that keeps it from fitting neatly into our clean, Eurocentric pattern of a first name plus a last name, each made up only of letters.
- John O’Connor has an apostrophe and two capital letters in his last name.
- Gloria Fernández has an accented letter in her last name.
- Mark London Cooper has a space in one of his names… and is he “Mark London” / “Cooper,” or is he “Mark” / “London Cooper”?
- Mary Smith-Foster has a hyphen, while Mr. Prince has a period and a space.
- The elder Mr. Harris has a space, a comma, and a period, and his son has at least two spaces.
- Nicholas has a really long name.
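
To make the failure concrete, here is a minimal Python sketch of a letters-only rule like the one described above. The pattern and the function name `passes_letters_only` are assumptions for illustration; the project’s actual validation code is not shown in this post.

```python
import re

# Hypothetical reconstruction of the letters-only rule (an assumption,
# not the project's real code): one or more ASCII letters, nothing else.
LETTERS_ONLY = re.compile(r"[A-Za-z]+")


def passes_letters_only(field_value: str) -> bool:
    """Return True if the value consists solely of ASCII letters."""
    return LETTERS_ONLY.fullmatch(field_value) is not None


# Field values drawn from the names listed above; "John" is the control.
samples = [
    "John",
    "O’Connor",       # apostrophe
    "Fernández",      # accented letter
    "London Cooper",  # space
    "Smith-Foster",   # hyphen
    "J. Edward",      # period and space
    "Harris, Jr.",    # space, comma, and period
    "Harris III",     # space
]

rejected = [value for value in samples if not passes_letters_only(value)]

for value in samples:
    verdict = "accepted" if passes_letters_only(value) else "rejected"
    print(f"{value!r}: {verdict}")
```

Under this assumed rule, only the control value gets through; every other entry is a legitimate name that the form would turn away.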
And we haven’t even gotten to adding “Dr.” or “MD” or other honorifics! This teaches us that we can’t require a name to consist solely of letters, so what should a valid name look like?
Our development manager, Dustin Ashby, shared a blog post about working with names in software: “Falsehoods Programmers Believe About Names.” The author’s list of false assumptions gets hyperbolic and funny, but he makes valid points about names and how to handle them.
What does this mean? Do we just throw out all the rules? Is any name acceptable as long as it isn’t blank? And how does The Artist Formerly Known As Prince fill out online forms? Here is how we currently approach names:
- We try to help our clients understand that names are not as simple as they seem and that many valid names don’t conform to our expectations.
- Despite the Falsehoods article, we assume that everything a user enters into our software can be represented in Unicode. Our work has been centered on the US, so we have not had to handle names that can’t be encoded in Unicode.
- We try to avoid matching or lookups based on name, or at least based solely on name. Names are not unique, and two systems being matched may not encode and store names in the same way.
- We encourage our clients to consider using a single “Name” field, rather than splitting it into first, middle, and last names.
- We do not change the case of letters that a user has entered for their name. If you enter your name in all caps, the system should keep it that way. If you enter your last name as O’Hare, the system should keep both capital letters.
- Finally, we allow letters, spaces, punctuation, and digits. We check that the Name field has been filled in and move on without further validation.
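
As a rough illustration of the practices above, the following Python sketch checks only that the Name field is not blank and leaves the entered text untouched. It also shows one way two systems can store the “same” name differently: an accented letter as a single code point versus a base letter followed by a combining accent. The function names are hypothetical, and the normalization-based comparison is an assumption added for illustration, not something the practices above prescribe.

```python
import unicodedata


def is_acceptable_name(name: str) -> bool:
    """Accept any value that is not blank; no character-level rules."""
    return name.strip() != ""


def same_after_nfc(a: str, b: str) -> bool:
    """Compare two strings after Unicode NFC normalization (illustrative only)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Names from this post, entered as-is; none of them is altered or rejected.
entered_names = ["O’Hare", "MARY SMITH-FOSTER", "Arthur Robert Harris, Jr."]
accepted = [name for name in entered_names if is_acceptable_name(name)]

# A blank or whitespace-only field is the only thing turned away.
blank_is_accepted = is_acceptable_name("   ")

# Two encodings of the same visible name.
composed = "Fern\u00e1ndez"     # á stored as one code point
decomposed = "Ferna\u0301ndez"  # a followed by a combining acute accent

print(accepted == entered_names)             # True
print(blank_is_accepted)                     # False
print(composed == decomposed)                # False
print(same_after_nfc(composed, decomposed))  # True
```

The plain comparison fails even though both strings display as Fernández, which is one reason matching on names alone is fragile.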
Our conclusion? We must make a number of assumptions as we define a client’s system. We state as many of those assumptions as we can in our documentation, but some are so deeply entrenched in our cultural context that they get overlooked. Our assumptions around names make our work easier, and we hope that those whose names fall outside of our assumptions will be forgiving.