Commentary on Research on New Security Indicators

March 6, 2007

Stuart Schechter, Rachna Dhamija, Andy Ozment, and Ian Fischer, working at MIT and Harvard, have recently published a draft paper titled The Emperor’s New Security Indicators: An evaluation of website authentication and the effect of role playing on usability studies (see http://www.usablesecurity.org/emperor/). The paper will be presented in May at the IEEE Symposium on Security and Privacy, and it is already receiving a lot of attention in various venues, including Wired, Slashdot, and Computerworld.

What Schechter et al. reported is a study that involved having people repeatedly log in to an unnamed bank (presumably Bank of America) that has recently installed new security indicators to help prevent phishing attacks. The bank is using SiteKey, a system that lets users choose an image and a piece of text at registration time; users are then instructed to check for that image each time they access their bank account.

Logging into the bank becomes a two-step process: in Step 1 the user submits their bank account number (the username), and in Step 2 they are shown their personalized image and text along with a prompt for their password. They are instructed to not enter their password if they do not recognize the image as their own.

The purpose of the SiteKey system is to provide site authentication and prevent phishing attacks where users are lured to false bank sites that attempt to steal their login and password information. Since the false bank site won’t know the user’s personalized image and text, the false site will not show the right information and the user will not enter their password.
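The two-step flow described above can be sketched in a few lines of Python. This is only an illustrative model, not the bank's actual implementation: the class and function names (`GenuineBank`, `PhishingSite`, `user_should_enter_password`), the sample image and phrase, and the plain-text password storage are all assumptions made for the example, and the false site modeled here is a naive one that has no way of looking up the user's image.

```python
from dataclasses import dataclass


@dataclass
class SiteKeyRecord:
    """What the user chose at registration (hypothetical structure)."""
    image: str
    phrase: str
    password: str  # stored in plain text only to keep the sketch short


class GenuineBank:
    """Toy model of the real site: it knows each user's image and phrase."""

    def __init__(self):
        self._records = {}

    def register(self, account, image, phrase, password):
        self._records[account] = SiteKeyRecord(image, phrase, password)

    def step1(self, account):
        # Step 1: the user submits the account number and the site
        # answers with the personalized image and text.
        record = self._records[account]
        return record.image, record.phrase

    def step2(self, account, password):
        # Step 2: the password is submitted only after the user has
        # recognized the image and text.
        return self._records[account].password == password


class PhishingSite:
    """Toy model of a false site that cannot know the personalized image."""

    def step1(self, account):
        return "generic-logo.png", "Welcome back"


def user_should_enter_password(shown, expected):
    # The check SiteKey relies on: proceed only if the image and text
    # shown match what the user chose at registration.
    return shown == expected
```

A user who registered the image "red-bicycle.png" with the phrase "my old bike" would get that pair back from `GenuineBank.step1` but not from `PhishingSite.step1`, so `user_should_enter_password` returns `True` only for the genuine site. The whole scheme depends on the user actually performing that comparison, which is the behavior the study set out to measure.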

The notable finding from the research is that when this security indicator was removed (along with the more traditional HTTPS browser indicators), just about all the users (92%) willingly logged into their account anyway, even when they were using their own personal banking information.

The researchers also tested the effectiveness of browser warning messages, such as those displayed in Internet Explorer 7, where a full-screen message warns users that there “is a problem with this website’s security certificate.” Here 53% of the participants went on to log in to the bank despite the obvious warning that continuing was “not recommended,” including 36% of the people who were using their own personal bank accounts. The implication is that these new security indicators are not going to be very effective in making online banking more secure.

Let me begin my commentary by stating that I like this article, I like the research, and I think we need much more of it. But I think there are serious problems with the methodology in this research caused by a failure to understand the psychology of research participation. As a result, I think the results are biased in the direction of providing over-estimates of the real-world rates at which these security indicators will be ignored.

This is not to say that these indicators will be very successful, or that they are the best strategies we can use to prevent phishing attacks. In fact, I think any technique that requires users to notice that something is different when they are doing their banking tasks is bound to have limited success. But this flawed study does not provide compelling evidence about the failure of these devices. Moreover, because the study of usable security is such a new field, we have to be very careful to use the best research methodologies that we can. My motivation in this commentary is to discuss the issues associated with this kind of research methodology, so that we can all do better research.

Recruiting Procedure and the Research Sample

To understand my concerns, let’s review how the research was conducted. Participants in the study (often called Subjects in research reports) were recruited by circulating flyers around the Harvard University campus. The study was described as an opportunity to “help making online banking better” and the participants were offered $25 for their participation. Participants were also required to be regular users of the online bank in question, and familiar with Windows and Internet Explorer. A total of 67 people participated in the study, with most (68%) being aged 18 – 25, and just about all of them (91%) being university students. So this was the typical young, student research sample that is so common in psychological research.

It is important to note that 21 people were recruited but were not able to participate in the study. Three of these people refused to sign the consent form and explicitly expressed concerns about the privacy of their banking data. Also, five people stated they could not remember their login information when it came time to complete the tasks. These people may actually have had concerns about using their personal banking information for the study, and used forgetting of their login information as an excuse for not completing the study.

The result is that up to 8 people with concerns, expressed or perhaps hidden, about providing private information during the experiment were excluded from the study. This means that the final group of participants was biased towards people who did not have concerns about their banking privacy in this setting. We know that people have different default attitudes towards their personal privacy, with some research showing that 27% are privacy liberals, with few concerns, 17% are privacy fundamentalists, with serious concerns, and the remaining 56% are privacy pragmatists (see Cranor, Reagle, Ackerman, 1999). The recruiting procedures for this study appear to have eliminated the privacy fundamentalists, and thus biased the results towards the liberals and pragmatists who were willing to continue banking in spite of repeated security/privacy indicators.

The Research Setting

The research session took place in a Harvard classroom building, where each participant was “placed alone in a private classroom,” presumably with one or more of the experimenters. Participants were given a consent form to sign, and then a set of banking “tasks” to complete. These tasks were presented rather formally on individual sheets of paper, and the participants were instructed to report non-sensitive information from the bank accounts they were accessing (either their own or a demonstration account). This non-sensitive information included such things as whether the last number in the current bank balance was odd or even, or the date of their last statement. Participants could not proceed to the next task until they completed the current one, and the same five tasks were presented in the same order to all participants. In addition, once the instructions were given, participants were told that the experimenters “would not be able to provide assistance or to answer questions about the study tasks.”

The Psychology of Research Participation

After many years of psychological experimentation, we know a lot about the psychology of research participation. The thoughts, opinions, and behaviors of people when they participate in a psychological experiment are influenced by subtle, complex factors that often have nothing to do with the actual experiment. Participants are not just doing the task assigned in the experiment (in this case, online banking), they are also behaving as research participants. Even if they have not participated in studies before, there are emotions, opinions, fears, and perceived expectations that come into play as they decide how to act during the study. There are at least three well-known psychological phenomena related to participation in research studies that are important here: demand characteristics, task focus, and obedience to authority.

Demand Characteristics

Demand characteristics refers to the tendency for research subjects to guess the reason for a study, and then to attempt to confirm the experimenter’s apparent hypothesis. Participants will work hard to do what is demanded of them, so that they can please the experimenter and make sure that the research is successful. Participants form a mental model of the purpose of the research and what is supposed to happen, and then they behave accordingly.

The sources of this tendency are often subtle, and include the preconceptions of the participants, the appearance and behavior of the experimenters (age, gender, dress, status, behavior), and the research setting (academic institution, classroom, etc.). Researchers may not perceive themselves and the setting as important for setting expectations about behavior, but it is the perception of the participants that matters. The sources and effects of demand characteristics are subtle and powerful, and the influence on the results can be profound.

In this study, the academic setting for the research, the classroom testing environment, the formal presentation of the tasks to be completed, and the refusal to answer questions about the tasks all contributed to setting expectations of how to behave. Since the emphasis of the study was on online banking, and all indications were that the experimenters expected (demanded) the participants to do online banking, it is not surprising that the participants attempted to please the experimenters and do the banking, even in the face of missing security indicators. If the participants refused to log in to the bank, they could not “help making online banking better,” which was the stated purpose of the study. Giving the participants the option of ending the study while still receiving their payment does not alleviate this bias because the participants have strong social motivations to be a good subject and do what is expected.

The solutions to demand characteristics include removing the cues that are setting the expectations in the minds of the participants, and/or successful deception. Conducting the research outside of traditional laboratory settings, with non-academic looking researchers, is often useful. Also, studies can attempt to mislead the participants about the apparent purpose of the study, to ensure that any bias in the results is neutral or opposite to the research hypothesis. Using deception during psychological research raises ethical issues, but that is the subject of another long essay.

Task Focus

A related phenomenon in research participation is task focus. Participants in research studies are often given tasks to complete, and usually they take the tasks very seriously and are highly motivated to complete the tasks. The demand characteristics, already discussed, often provide strong indications that the participants are expected to finish the task. In addition, even though experimenters may attempt to reassure participants that they are not being tested or evaluated during a study, participants often feel that they are being tested. As a result, they want to complete the task.

In some studies, participants can become very focused on the experimental tasks, and fail to notice or choose to ignore things going on around them. This was demonstrated recently in a study on Internet browsers that showed that participants failed to notice messages in the status line of the browser, even messages that promised a financial reward (i.e., there is a $20 bill under your seat, you can have it if you see this message). [If anybody has a reference to this study, please let me know.] It is also common to see people quickly click through pop-up messages on computer screens, even ones containing warning messages, in order to continue with their task.

In this study, the tasks given to the participants were online banking. Users were motivated, then, to do the online banking and complete the tasks. Refusing to log in to the bank site, the behavior that the experimenters wanted to see, would mean not completing the task and failing the test. It is not surprising, then, that participants continued to log in even in the face of security warnings.

Obedience to Authority

Another aspect of the psychology of research participation is obedience to (perceived) authority. Research participants are surprisingly, shockingly, willing to obey authority figures in a research setting. The classic studies in this area are Milgram’s experiments on the willingness of participants to apply an apparent electric shock to another person in a learning experiment, and the Stanford prison experiment on role-playing prisoners and guards. These studies demonstrate that research subjects are quite willing to risk harm to others in order to obey the perceived authority of the researchers and the expectations set by the research context.

It is not surprising, then, that participants are also willing to put themselves at risk of harm in the face of an authority that appears to expect it. In the present study, the participants put their financial information at risk, under the influence of an authority figure and context that seemed to expect it. The researchers may not have perceived themselves and the setting as one involving authority, but it is the perception of the participants that matters. It is not unreasonable to presume that young university students perceived an authority situation in the campus research setting that was used.

It is also important to note that the experimental procedure involved progressive risk taking, as more security indicators were removed and more warnings were provided. We know that people may be more willing to take risks of a progressive nature than they would be if the maximum risk is presented on its own. So, the procedure itself may have led to a bias to ignore the security indicators.

Understanding the Risk

Another flaw in the current experiment is failing to confirm that the participants perceived they were taking risks when they persisted in logging in to the bank site without security indicators. Although the researchers feel that failing to heed the security warnings is risky behavior, did the participants see it that way? Most people have not personally experienced harm because of their behavior on the Internet, so the perceived risk may have been very low.

Also, people who have been using the Internet for any time have become accustomed to logging in to systems where there are no security indicators, and doing so is often necessary to complete tasks on the Internet (e.g., the common practice of including login forms on non-secure web pages). Participants may also have accounts at other banking sites that do not provide the SiteKey security indicators, so they may be quite willing to log in without the indicator.

Making Improvements

The way to improve this study is to remove the biases caused by participant selection, demand characteristics, task focus, and obedience. This could be done by removing the focus on online banking such that the login procedure becomes a secondary task. With the proper disguise of an unrelated primary task, the biases could be neutralized and participants’ true behavior could be observed.

Consider, for example, a primary task of preparing an investment plan. Participants are given the task of preparing a plan for handling their current and future investments. One aspect of the task is to take an inventory of their current assets. As a result, during the course of the task, it would become necessary for a user to log in to their bank account, but this is clearly a secondary task during the experiment. Participants who fundamentally did not want to do this in this research context would not be excluded, as was done in the current study, but instead they would contribute to the data set.

The same experimental manipulations could be applied so that different groups of users are given different security indicators. It would probably be necessary to only test one login per session to properly protect the true purpose of the experiment, so more participants would be needed to collect the same sized data set. Also, drawing participants from the general population, rather than just recruiting university students, would reduce the chances for bias and improve the validity of the study. With these improvements, we have a better chance of observing users’ true behavior in the face of different security indicators.

6 thoughts on “Commentary on Research on New Security Indicators”

  1. Thanks for writing these comments. While I’m glad the Emperor study was done, I also think your points are well taken, and have a lot of merit in pointing the direction toward new and better studies. I hope this post is read by many researchers as well as non-researchers who heard about this particular study.

  2. [Full disclosure: I work with Rachna and reviewed the paper.]

    Interesting points; many very good ones, but a few not so good.
    1. Your suggestion that the recruitment process excluded the security-conscious isn’t supported. Prospective participants who cite privacy concerns (e.g., I don’t like white-coated shoulder-surfers) are not necessarily more vigilant observers of security indicators. Moreover, 21 vs. 67 is a red herring. The privacy-concerned=security-vigilant argument only applies to 3 out of 88 prospects; you need an additional unsupported hypothesis (prospects are lying) to expand the 3 to 8. I forget my passwords all the time. You should investigate the ratio of password resets to successful logins at financial sites before you implicitly make the claim that zero out of 88 people would forget their passwords in this situation.
    2. The section “Understanding the Risk” misses the point. Yes, users seem not to understand the import of online security. Yes, users seem to not notice the absence of indicators that are rarely presented to them. That’s not a fault of the study methodology, that’s the point of the study!
    See my blog for an expanded version of these points.

  3. Interesting critiques of the research methodology. (I agree with Schiffman’s criticism of “Understanding the Risk”, BTW.) I was surprised by two other elements in the study.

    First, I think the authors may have gone a step too far when they concluded that a security measure that is only 8% effective should not be deployed. To really evaluate whether a security measure should be deployed, you need to consider whether it is cost-effective. Site authentication images are fairly inexpensive. If they give an 8% reduction in losses due to phishing, that could be a terrific bargain. (That’s particularly true when you consider the alternative security measures.)

    Second, the authors didn’t do a particularly good job of simulating the security and attack indicators that occur in the real world. They don’t say in the study, but it appears that the subjects entered the correct web address of their bank in the address bar of the browser, but were transparently directed to alternate pages by the software used in the study. That simulates a DNS poisoning attack, a technique rarely used by phishers. The study subjects may have felt safer because they were not following a link in an e-mail.

    Also, the researchers presented the IE 7 phishing warning page *after* already gaining access to the subject’s site key via a MITM attack. In the real world, the phishing warning would most likely be triggered by the MITM attack, before the site key was stolen.

    I don’t want to downplay the importance of the study too much. In particular, the finding that study subjects who were not using their own accounts were not as security conscious is a big deal. I hope to see more research in this area.

  4. Pingback: Andrew Patrick » Commentary on new usable security research: The Emperor is biased

  5. I had the same concerns expressed in the essay, particularly with respect to introduction of bias and subject perception of risk. The author accurately points out a flaw with respect to perceived risk and online behavior. He merely suggests the researchers failed to confirm whether or not risk perception was a factor in their decision to continue, which would have shown whether or not it affected the results more conclusively.
