March 6, 2007
Stuart Schechter, Rachna Dhamija, Andy Ozment, and Ian Fischer, working at MIT and Harvard, have recently published a draft paper titled The Emperor’s New Security Indicators: An evaluation of website authentication and the effect of role playing on usability studies (see http://www.usablesecurity.org/emperor/). This paper will be presented in May at the IEEE Symposium on Security and Privacy, and it is already receiving a lot of attention in various venues, including Wired, Slashdot, and Computerworld.
What Schechter et al. reported is a study that involved having people repeatedly log in to an unnamed bank (presumably Bank of America) that has recently installed new security indicators to help prevent phishing attacks. The bank is using SiteKey, a system that allows users to choose an image and a piece of text at registration time; they are then instructed to check for that image each time they access their bank account.
Logging into the bank becomes a two-step process: in Step 1 the user submits their bank account number (the username), and in Step 2 they are shown their personalized image and text along with a prompt for their password. They are instructed to not enter their password if they do not recognize the image as their own.
The purpose of the SiteKey system is to provide site authentication and prevent phishing attacks where users are lured to false bank sites that attempt to steal their login and password information. Since the false bank site won’t know the user’s personalized image and text, the false site will not show the right information and the user will not enter their password.
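The two-step flow described above can be sketched in a few lines of Python. This is only an illustrative model, not the bank's actual implementation: the account number, image name, phrase, and all function names are hypothetical, and a real phishing site could behave differently (for example, by showing a plausible error message instead of nothing).

```python
from dataclasses import dataclass


@dataclass
class SiteKeyRecord:
    """What the site stores for one customer (all values are made up)."""
    image: str
    phrase: str


# Hypothetical data: the real bank knows the customer's image and phrase,
# while the phishing site knows nothing about the customer.
REAL_BANK = {"12345678": SiteKeyRecord("red-bicycle.png", "my summer ride")}
PHISHING_SITE = {}


def step1_show_sitekey(records, account_number):
    """Step 1: the user submits an account number; the site answers with
    the personalized image and phrase, or None if it does not know them."""
    record = records.get(account_number)
    return (record.image, record.phrase) if record else None


def careful_user_enters_password(shown, expected):
    """Step 2 as intended: type the password only if the image and phrase
    shown match the ones chosen at registration."""
    return shown == expected


def inattentive_user_enters_password(shown, expected):
    """Step 2 as most study participants behaved: proceed regardless."""
    return True


expected = ("red-bicycle.png", "my summer ride")
real = step1_show_sitekey(REAL_BANK, "12345678")
fake = step1_show_sitekey(PHISHING_SITE, "12345678")

print(careful_user_enters_password(real, expected))      # True
print(careful_user_enters_password(fake, expected))      # False
print(inattentive_user_enters_password(fake, expected))  # True
```

The point of the sketch is that the protection lives entirely in the user's Step 2 decision: the site can only display or fail to display the image, and the scheme works only if the user actually checks it.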
The notable finding from the research is that when this security indicator was removed (along with the more traditional HTTPS browser indicators), just about all the users (92%) willingly logged into their account anyway, even when they were using their own personal banking information.
The researchers also tested the effectiveness of browser warning messages, such as those displayed in Internet Explorer 7, where a full-screen message warns users that there “is a problem with this website’s security certificate.” Here 53% of the participants went on to log in to the bank despite the obvious warning that continuing was “not recommended,” including 36% of the people who were using their own personal bank accounts. The implication is that these new security indicators are not going to be very effective in making online banking more secure.
Let me begin my commentary by stating that I like this article, I like the research, and I think we need much more of it. But I think there are serious problems with the methodology in this research caused by a failure to understand the psychology of research participation. As a result, I think the results are biased in the direction of providing over-estimates of the real-world rates at which these security indicators will be ignored.
This is not to say that these indicators will be very successful, or that they are the best strategies we can use to prevent phishing attacks. In fact, I think any technique that requires users to notice that something is different when they are doing their banking tasks is bound to have limited success. But this flawed study does not provide compelling evidence about the failure of these devices. Moreover, because the study of usable security is such a new field, we have to be very careful to use the best research methodologies that we can. My motivation in this commentary is to discuss the issues associated with this kind of research methodology, so that we can all do better research.
Recruiting Procedure and the Research Sample
To understand my concerns, let’s review how the research was conducted. Participants in the study (often called Subjects in research reports) were recruited by circulating flyers around the Harvard University campus. The study was described as an opportunity to “help making online banking better” and the participants were offered $25 for their participation. Participants were also required to be regular users of the online bank in question, and familiar with Windows and Internet Explorer. A total of 67 people participated in the study, with most (68%) being aged 18 – 25, and just about all of them (91%) being university students. So this was the typical young, student research sample that is so common in psychological research.
It is important to note that 21 people were recruited but were not able to participate in the study. Three of these people refused to sign the consent form and explicitly expressed concerns about the privacy of their banking data. Also, five people stated they could not remember their login information when it came time to complete the tasks. These people may actually have had concerns about using their personal banking information for the study, and used forgetting of their login information as an excuse for not completing the study.
The result is that up to 8 people with concerns, expressed or perhaps hidden, about providing private information during the experiment were excluded from the study. This means that the final group of participants was biased towards people who did not have concerns about their banking privacy in this setting. We know that people have different default attitudes towards their personal privacy, with some research showing that 27% are privacy liberals, with few concerns, 17% are privacy fundamentalists, with serious concerns, and the remaining 56% are privacy pragmatists (see Cranor, Reagle, Ackerman, 1999). The recruiting procedures for this study appear to have eliminated the privacy fundamentalists, and thus biased the results towards the liberals and pragmatists who were willing to continue banking in spite of repeated security/privacy indicators.
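To get a rough feel for how much this kind of exclusion could matter, here is a back-of-the-envelope sketch in Python. It rests on two strong assumptions that are mine, not the paper's: that the 92% figure can be treated as if it applied to all 67 participants, and that every one of the (up to) 8 excluded people would have refused to log in had they taken part. The function name is hypothetical, and the result is a what-if bound, not an estimate of the true rate.

```python
def adjusted_rate(observed_rate, participants, excluded_refusers):
    """Rate of 'logged in anyway' if the excluded people had taken part
    and all of them had refused (an assumed worst case for the study)."""
    complied = observed_rate * participants
    return complied / (participants + excluded_refusers)


observed = 0.92    # share who logged in without the indicator (from the paper)
participants = 67  # people who completed the study
excluded = 8       # 3 who refused consent + 5 who "forgot" their login

what_if = adjusted_rate(observed, participants, excluded)
print(f"{what_if:.1%}")
```

Under these assumptions the rate drops from 92% to about 82%. That is still high, but it illustrates how excluding even a handful of the most privacy-conscious people pushes the reported numbers upward.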
The Research Setting
The research session took place in a Harvard classroom building, where each participant was “placed alone in a private classroom,” presumably with one or more of the experimenters. Participants were given a consent form to sign, and then a set of banking “tasks” to complete. These tasks were presented rather formally on individual sheets of paper, and the participants were instructed to report non-sensitive information from the bank accounts they were accessing (either their own or a demonstration account). This non-sensitive information included such things as whether the last number in the current bank balance was odd or even, or the date of their last statement. Participants could not proceed to the next task until they completed the current one, and the same five tasks were presented in the same order to all participants. In addition, once the instructions were given, participants were told that the experimenters “would not be able to provide assistance or to answer questions about the study tasks.”
The Psychology of Research Participation
After many years of psychological experimentation, we know a lot about the psychology of research participation. The thoughts, opinions, and behaviors of people when they participate in a psychological experiment are influenced by subtle, complex factors that often have nothing to do with the actual experiment. Participants are not just doing the task assigned in the experiment (in this case, online banking), they are also behaving as research participants. Even if they have not participated in studies before, there are emotions, opinions, fears, and perceived expectations that come into play as they decide how to act during the study. There are at least 3 well-known psychological phenomena related to participation in research studies that are important here: demand characteristics, task focus, and obedience to authority.
Demand characteristics refers to the tendency for research subjects to guess the reason for a study, and then to attempt to confirm the experimenter’s apparent hypothesis. Participants will work hard to do what is demanded of them, so that they can please the experimenter and make sure that the research is successful. Participants form a mental model of the purpose of the research and what is supposed to happen, and then they behave accordingly.
The sources of this tendency are often subtle, and include the preconceptions of the participants, the appearance and behavior of the experimenters (age, gender, dress, status, behavior), and the research setting (academic institution, classroom, etc.). Researchers may not perceive themselves and the setting as important for setting expectations about behavior, but it is the perception of the participants that matters. The sources and effects of demand characteristics are subtle and powerful, and the influence on the results can be profound.
In this study, the academic setting for the research, the classroom testing environment, the formal presentation of the tasks to be completed, and the refusal to answer questions about the tasks all contributed to setting expectations of how to behave. Since the emphasis of the study was on online banking, and all indications were that the experimenters expected (demanded) the participants to do online banking, it is not surprising that the participants attempted to please the experimenters and do the banking, even in the face of missing security indicators. If the participants refused to log in to the bank, they could not “help making online banking better,” which was the stated purpose of the study. Giving the participants the option of ending the study while still receiving their payment does not alleviate this bias because the participants have strong social motivations to be a good subject and do what is expected.
The solutions to demand characteristics include removing the cues that are setting the expectations in the minds of the participants, and/or successful deception. Conducting the research outside of traditional laboratory settings, with non-academic looking researchers, is often useful. Also, studies can attempt to mislead the participants about the apparent purpose of the study, to ensure that any bias in the results is neutral or opposite to the research hypothesis. Using deception during psychological research raises ethical issues, but that is the subject of another long essay.
A related phenomenon in research participation is task focus. Participants in research studies are often given tasks to complete, and usually they take the tasks very seriously and are highly motivated to complete the tasks. The demand characteristics, already discussed, often provide strong indications that the participants are expected to finish the task. In addition, even though experimenters may attempt to reassure participants that they are not being tested or evaluated during a study, participants often feel that they are being tested. As a result, they want to complete the task.
In some studies, participants can become very focused on the experimental tasks, and fail to notice or choose to ignore things going on around them. This was demonstrated recently in a study on Internet browsers that showed that participants failed to notice messages in the status line of the browser, even messages that promised a financial reward (e.g., there is a $20 bill under your seat, you can have it if you see this message). [If anybody has a reference to this study, please let me know.] It is also common to see people quickly click through pop-up messages on computer screens, even ones containing warning messages, in order to continue with their task.
In this study, the tasks given to the participants were online banking. Users were motivated, then, to do the online banking and complete the tasks. Refusing to log in to the bank site, the behavior that the experimenters wanted to see, would mean not completing the task and failing the test. It is not surprising, then, that participants continued to log in even in the face of security warnings.
Obedience to Authority
Another aspect of the psychology of research participation is obedience to (perceived) authority. Research participants are surprisingly, shockingly, willing to obey authority figures in a research setting. The classic studies in this area are Milgram’s experiments on the willingness of participants to apply an apparent electric shock to another person in a learning experiment, and the Stanford prison experiment on role playing prisoners and guards. These studies demonstrate that research subjects are quite willing to risk harm to others in order to obey the perceived authority of the researchers and the expectations set by the research context.
It is not surprising then, that participants are also willing to put themselves at risk of harm in the face of an authority that appears to expect it. In the present study, the participants put their financial information at risk, under the influence of an authority figure and context that seemed to expect it. The researchers may not have perceived themselves and the setting as one involving authority, but it is the perception of the participants that matters. It is not unreasonable to presume that young university students perceived an authority situation in the campus research setting that was used.
It is also important to note that the experimental procedure involved progressive risk taking, as more security indicators were removed and more warnings were provided. We know that people may be more willing to take risks of a progressive nature than they would be if the maximum risk were presented on its own. So, the procedure itself may have led to a bias to ignore the security indicators.
Understanding the Risk
Another flaw in the current experiment is the failure to confirm that the participants perceived they were taking risks when they persisted in logging in to the bank site without security indicators. Although the researchers regard failing to heed the security warnings as risky behavior, did the participants see it that way? Most people have not personally experienced harm because of their behavior on the Internet, so the perceived risk may have been very low.
Also, people who have been using the Internet for any time have become accustomed to logging in to systems where there are no security indicators, and doing so is often necessary to get tasks done on the Internet (e.g., the common practice of including login forms on non-secure web pages). Participants may also have accounts at other banking sites that do not provide the SiteKey security indicators, so they may be quite willing to log in without the indicator.
The way to improve this study is to remove the biases caused by participant selection, demand characteristics, task focus, and obedience. This could be done by removing the focus on online banking such that the login procedure becomes a secondary task. With the proper disguise of an unrelated primary task, the biases could be neutralized and participants’ true behavior could be observed.
Consider, for example, a primary task of preparing an investment plan. Participants are given the task of preparing a plan for handling their current and future investments. One aspect of the task is to take an inventory of their current assets. As a result, during the course of the task, it would become necessary for a user to log in to their bank account, but this is clearly a secondary task during the experiment. Participants who fundamentally did not want to do this in this research context would not be excluded, as was done in the current study, but instead they would contribute to the data set.
The same experimental manipulations could be applied so that different groups of users are given different security indicators. It would probably be necessary to only test one login per session to properly protect the true purpose of the experiment, so more participants would be needed to collect the same sized data set. Also, drawing participants from the general population, rather than just recruiting university students, would reduce the chances for bias and improve the validity of the study. With these improvements, we have a better chance of observing users’ true behavior in the face of different security indicators.