Quantitative Evaluation

From CS 160 Fall 2008


Lecture on Oct 20, 2008

slides

Readings

Discussions

Please post your critiques/comments on the required readings below. To do so, first log in with your user name and password, then click the "edit" tab at the top of this page (between the "discussion" tab and the "history" tab). New to wikis? Read the Wiki editing guide. Hint: please put a whole line == ~~~~ == (literally) at the beginning of your submitted critique, so the wiki system will index, sign, and date your submission automatically.

Mike Kendall 22:23, 18 October 2008 (UTC)

Can I just say that I know all of these things from Sociology 5 (Methods of Sociological Inquiry)? Anyway, at the very end of chapter 7, Martin brought up a technique that I didn't see in Soc 5: dual-task methodology. Maybe it was omitted from our class because of how difficult it would be to implement, but it does make me wonder how and when this method would be used. Also, if your subjects are going to be reading a sentence and listening for a sound, aren't they going to get better and better at this specific type of multitasking as the experiment goes on? Do you perform only one test with each subject, or do you have them go through it multiple times and analyze the trials?

Also, with respect to reading, couldn't this be accidentally measuring how engrossing a text is rather than how difficult it is to read? This technique intrigues me, but Martin's description wasn't enough to tell me how or why I would use it.

James Yeh 09:20, 19 October 2008 (UTC)

Both readings seem to provide tools and techniques that would aid in fleshing out the cause-effect relationship between stimulus and behavior. A lot of the information mentioned in the reading was intuitive - for example, the tradeoff between specificity and generality is easy to understand because randomness is encapsulated in nature - but the fact that the author brought these details to light helps us (the experimenters) not forget that many different factors could affect the results and undermine our analysis of a particular experiment. In particular, in my personal experience with conducting experiments, I have realized that I tend to focus so hard on getting trivial measurements exact that I sometimes overlook the possibility of confounding variables in my experiment, the validity of my measurements, randomness in my testing sample, etc. One reason that could explain my experimental habits is that most of my experiments have been conducted in a lab setting (usually in class), using equipment and samples that are well-defined and leave little room for variance. This kind of experimental environment is not nearly as volatile as that of experiments involving human behavior (as described in the reading), so I would imagine one has to be very careful and adaptive when conducting such an experiment. Even so, there seem to be so many different variables and factors that could taint the results and conclusions of an experiment that it is hard to determine what degree of thoroughness is required before the experimenter can stop worrying about and doubting the results.

JoshuaKwan 23:11, 19 October 2008 (UTC)

Doing Psychology Experiments is a well-written work on how to perform any sort of experiment with a large body of human beings. Because human beings are irrational and unpredictable, a number of measures need to be kept in mind: avoid 1) selecting a biased set of experimental subjects, 2) drawing the wrong conclusions based on the results of the tests, and 3) lacking the experimental controls needed to ensure that the data set is consistent. The point of all of this is to bring the rigor of the tried-and-true scientific method to decidedly non-empirical test conditions, taking advantage of statistics to at least ensure some sort of "average result" for any test.

I gotta say, though: the comics are terrible.

Witton Chou 00:06, 20 October 2008 (UTC)

Martin brings up a great point that we typically can't control every variable that is not our independent variable. While in junior high/high school science class we were taught to make sure all our control variables were accounted for, experimenting with a product that will be used in more environments than we may realize means the product is not always ideally testable in a strictly controlled environment. While in science it may be crucial to test only one independent variable at a time (perhaps for observing particle behavior), those experimental procedures do not fully carry over to product/prototype testing. Thus, random variables can be beneficial as long as we understand what random variables exist in the environment we are experimenting in. Understanding the differences in testing conditions between experiments is a key factor in correctly relating the results to our claimed independent variables. In any case, we must consider all the variables we can when associating a behavior with a design characteristic, and take care to make these associations with the correct attributes.

Perry Lee 01:51, 20 October 2008 (UTC)

Like James, I found the covered material very intuitive. The readings felt like a recap or refresher of the methods that we were taught in our middle school/high school science classes (e.g., what is a hypothesis?). It is important to watch out for confounding variables (as was demonstrated in the "Coke-Pepsi Slugfest" experiment) and to be careful about your selection of subjects. Although it seems silly and obvious, you must be careful not to draw a generalized conclusion from a non-representative group of subjects (e.g., all engineers). If you do not have a good randomized selection, you may come to an incorrect conclusion.

Oh, and yeah... the comics are quite horrible.

Alan McCreary 03:13, 20 October 2008 (UTC)

These two readings reminded me of high school statistics - deciding on independent variables and dependent variables, and properly using randomness to minimize the effect of confounding variables. I think one of the most important concepts in the reading is that of the confounding (or lurking) variable, simply because we see its effects everywhere. Many people - in advertising or in politics, for instance - violate the rule, "correlation does not imply causation," suggesting that since a change in A goes along with a change in B, the change in A caused the change in B. The conclusion is often wrong, but I'm sure it fools many.

And yes, there seems to be a consensus that the comics are quite bad.

Buda Chiou 04:48, 20 October 2008 (UTC)

I don't understand the difference between random variables and confounding variables. It seems like both are elements that we can't control. Maybe confounding variables are the ones that are more obscure, but once we consider the existence of such a variable, which means we have observed it, it's no longer obscure. Anyway, what can we do after we recognize that there are confounding variables in our experiment? Should we just throw away our report and say it's not accurate at all? In fact, I think it's impossible for an experiment to have no confounding variables.

Jordan Berk 05:42, 20 October 2008 (UTC)

The part in Chapter 2 of Martin about "threats to internal validity" has some ideas that can be applied to usability testing for user interfaces. For example, in terms of subject selection, previous experience with similar interfaces, and hence general familiarity with a type of interface (game, browser, etc.), can be important when selecting subjects. Likewise, the tester should cover as much of the range of the target user group as possible across the various factors. The chapter also discussed maturation. This could affect the testing because a tester may have become acclimated to potentially poor design choices simply from having seen them before, and will therefore likely miss these problems in testing.

Frank Yang 06:16, 20 October 2008 (UTC)

I felt that this reading brings up important points when it comes to experiments. However, in regards to the creation of user interfaces, it seems that these experiments mainly apply to interfaces that have already been created and merely need fine-tuning. The reading brings up things like validity and setting reasonable ranges for independent and dependent variables. In that regard, I feel that when we design an experiment to help with our interface, it will be extremely hard to create a decent range or maintain validity. The goal of the experiment would be to improve efficiency for the user, and for that we would need to see an improvement in the user, which implies that multiple sessions would be needed. However, if the two interfaces containing the changed element we want to correlate with efficiency are similar, the validity must be questioned, because the subject may show improvement simply from exposure to the previous one. It just seems hard to create a formal experiment to test a user interface.

KevinFriedheim 06:26, 20 October 2008 (UTC)

In Chapter 2, Martin talks about "Threats to Internal Validity," and what immediately popped into my head when reading about "maturation" was that if I were to test my prototype with an older generation versus a younger generation (i.e. students at Cal), the results would be very different. That is, I could test the flow of my design with individuals who are used to working with similar designs, then test the same design with individuals who aren't too computer savvy (like both my parents) and get completely different results. I think it was Martin's point to suggest that we need to consider all factors/elements when testing.

Jimmy Nguyen 06:45, 20 October 2008 (UTC)

I think a point that these articles fail to elaborate on is the issue of "causation." The methods of analysis using their variables should have talked more about possible failures to interpret data. I remember learning in my statistics classes about very similar (if not the same) methods of experimentation. They also taught that causation and correlation are two different things. For instance, if you plot the number of fire trucks vs. the number of fires, you would see a linear correlation/regression, but you would not say more fire trucks caused more fires.

Karen Tran 06:46, 20 October 2008 (UTC)

“How to Do Experiments”

This reading presents the different variables found in an experiment. There are five classifications: independent variables, dependent variables, control variables, random variables, and confounding variables. The chapter discusses each of these types in detail and gives examples of their basic definitions and characteristics. It also includes a portion on randomization within constraints, which was particularly interesting because it brings together both randomness and control in an experiment. The author also discusses external validity and its importance in experimentation, and points out how improving external validity can threaten internal validity; random variables can be used to improve the external validity of an experiment. Some of the other threats to internal validity are also mentioned: maturation, selection, mortality, statistical regression, and testing. I really liked this chapter because it reminds us of what to be aware of in an experimental setting. I found it interesting that there needs to be a conscious effort to minimize confounding variables because of their effects on internal validity. I especially liked the Coke vs. Pepsi and IQ examples, since examples like these help connect the concepts to reality.

"How to Decide Which Variables to Manipulate and Measure"

This chapter expands on the previous reading, which makes sense since the author is the same. It once again deals with dependent and independent variables. First the author discusses the factors associated with choosing an independent variable, and then he moves on to the factors associated with choosing a dependent variable. To me this reading was more interesting than the previous one, since it goes into detail about the small decisions that must be made when designing an experiment. I have actually been in the position of designing an experiment, so I know how crucial these decisions are. The author also covers physiological and behavioral measures. The physiological measures were a little hard to interpret, but they are very useful since they can tell a lot about internal states. Behavioral measures were also discussed in their role of determining a participant's internal state. This reading was overall very useful and will come in handy when designing my next experiment.

Hao Luo 06:52, 20 October 2008 (UTC)

The reading went over more general methods of experimentation and user testing than the methods we have covered in class so far. While they are not as directly relevant to this course and our project, they are useful in a general sense and good to know. I thought the discussion of confounding variables was particularly thought-provoking, and it's definitely something to watch out for when conducting interviews with users. Are you sure the reason the user gives for a particular action is the real reason? Maybe it was another variable that you were not aware of. This is also why it's useful to have the user provide constant vocal feedback as to not only what he's doing but also why he's doing it. The other things to watch out for were, I thought, not as relevant to the project but still interesting to read about. Statistical regression was an interesting one; had I not read about it, I would have drawn wrong conclusions about the experiments mentioned in the reading. Even after taking psychology classes that covered such experiments and possible errors, I still find a lot of these errors to be subtle.

Haosi Chen 06:56, 20 October 2008 (UTC)

These two readings seem to provide tools and techniques that would aid in fleshing out the cause-effect relationship between stimulus and behavior. A lot of the information mentioned in the reading was intuitive. For example, the tradeoff between specificity and generality is easy to understand because randomness is encapsulated in nature - but the fact that the author brought these details to light helps us, the experimenters, not forget that many different factors could affect the results and undermine our analysis of a particular experiment.

Kumar Garapaty 06:59, 20 October 2008 (UTC)

We could have used the terms and the experimental method shown here during our low-fidelity prototype testing, as well as in future testing. That way we could be more thorough in our experiments/testing sessions, so that the data for improving our design would be more useful. The information in the first article about testing was something most of us already knew from high school science classes; we now see it being applied to psychology. The second article went into much more depth, but it was still based on common sense and probably more helpful than the first. The experimental method is a very broad subject and can easily be applied to the design process at any level.

Jacekmw 07:20, 20 October 2008 (UTC)

Specifically in the context of the lo-fi (or other) prototype testing, these readings are particularly significant. The first one is good in that it provides some structure to the process of testing psychological cases (which in our case would be serious game user interfaces). The independent and dependent variables would be the various tests of the user interface, while the confounding variables would be outside or inside influences that might distract the user from completing their task. Thus our job as testers of the user interface is to minimize confounding variables as the test occurs. Mortality is an issue in a different way than it is described - for our case, mortality would signify a user giving up on the interface and quitting the program, or just leaving it alone if they can't even figure that out.

The second text also brings up a good point with composite dependent variables - often when presented with an entire user interface, the user may be overwhelmed and not be able to say specifically what is bothering them, an issue caused by too many test variables running amok simultaneously.

Stuart Bottom 07:52, 20 October 2008 (UTC)

It would be interesting to use fMRI to study UI interactions. Unfortunately most of the more interesting techniques and equipment used to study mental action are outside the price range of HCI researchers, but as the technology becomes more affordable hopefully we will see it in wider use. These readings bring up the broader question of HCI-psychology interdisciplinary research; it would be interesting to see a few seminal readings on more psychology-based HCI studies. The last section of the second reading (referring to dual-task methodology) might be of particular relevance, considering its connections to the Model Human Processor we studied earlier.

Jeffrey Rosen 08:03, 20 October 2008 (UTC)

It is important to have a procedure for experiments. It is important to test for a specific thing and isolate it as much as possible. If an experiment cannot be reproduced reliably, it is not scientific and its utility is limited. One thing I would like to point out is that it is definitely possible to test for multiple things at the same time. This is called multivariate testing, and user interface design is actually one of its best-known applications. On a high-traffic website, for instance, a site like Google might show different pages to different people, each with slight changes intended to get the user to perform some action. After serving a million pages, you can look at the statistics and see which combinations performed best.
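A minimal sketch of what such a multivariate test might look like in code (the page variants, visitor behavior, and rates are all invented for illustration):

```python
import random
from collections import defaultdict

# Hypothetical page variants: each (headline, button color) pair is one condition.
headlines = ["Try it free", "Get started"]
button_colors = ["green", "blue"]
variants = [(h, c) for h in headlines for c in button_colors]

shown = defaultdict(int)      # pages served per variant
converted = defaultdict(int)  # desired actions per variant

def serve_page(user_acted):
    """Assign a visitor to a random variant and record the outcome."""
    variant = random.choice(variants)
    shown[variant] += 1
    if user_acted:
        converted[variant] += 1

# Simulate 100,000 visitors with made-up behavior (5% act regardless of variant).
for _ in range(100_000):
    serve_page(user_acted=random.random() < 0.05)

for variant in variants:
    rate = converted[variant] / shown[variant]
    print(variant, f"conversion rate: {rate:.4f}")
```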

Volodymyr Kalish 08:28, 20 October 2008 (UTC)

The readings reminded me of the material from my Psychology class. They helped refresh my knowledge of experimental variables: independent variables, dependent variables, control variables, random variables, and confounding variables. I also liked the Coke vs. Pepsi experiment, since during my Psychology class I participated in a study myself. And the cartoons/pictures in the readings weren't that clever.

Trinhvo 08:31, 20 October 2008 (UTC)

The two readings are very interesting and helpful. Ch. 2's title is "How to Do Experiments," but I think it's more accurate to say it's about what factors make up an experiment; Ch. 7 actually expands on Ch. 2 to talk about how to do experiments so they are conducted in the right direction and produce the expected results.


Vedran Pogacnik 08:45, 20 October 2008 (UTC)

Martin's first text gives a great template for conducting experiments on any level. Specifically, when talking about user interfaces, it is very applicable to the usability study we had to conduct some weeks ago. The results would have been different had we, for example, not used a lo-fi prototype; maybe the user would have been more or less confused about some things in our interface. Also, the target user group has a lot of influence on the outcome, so accounting for the "attributes" of a person (age, ethnicity, knowledge of computers, etc.) would definitely improve the external validity of the experiment.

His second text gives a perfect framework for the context of any study. Both of the texts seem like large scale heuristics for conducting studies.

Gary Wu 09:15, 20 October 2008 (UTC)

Having taken a Psychology class and participated in an experiment, I can now see the importance of the data collection process. Without good testing procedures, like setting the correct control variables, choosing the independent variables, and measuring the right dependent variables, the data collection process would prove futile. I think the hardest part of these psychological experiments is detecting and trying to combat low internal validity caused by the confounding variables mentioned in the text.

In chapter 7 of Martin's book, he mentions the importance of choosing the correct range. I totally agree with this, as conducting tests only on participants who fit a certain criterion will give you skewed results. Catching anomalies during the small testing sessions will help with the product design when it is produced at a massive scale: the few anomalies in the small group will balloon up to thousands, even millions, if the product is mass marketed. Having a range that takes into account extreme users will provide invaluable information in designing the product to be usable by anyone.

Wenda Zhao 09:32, 20 October 2008 (UTC)

The reading was quite interesting. I enjoyed reading about the Coke and Pepsi letter-preference experiment. The reading gave me some exposure to how to design and conduct our project experiment from the psychology perspective. The author made the point that it is getting more and more difficult to randomize when picking participants. This really reminds me of how we picked our participants for our lo-fi prototype and the first project. We tried to pick them as randomly as possible; the goal is to be unbiased in picking participants.

Xuexin Zhang 09:33, 20 October 2008 (UTC)

These readings cover controlled experiments. One key element of a meaningful experiment is choosing the independent variable well, even though it may not be obvious in some experiments. Another is precisely measuring the dependent variables, since they are the values that model the behavior being studied. Beyond those, invisible confounding variables can affect the dependent variables and make it impossible to tell whether the independent variables lead to any difference in the result.

nathanyan 10:29, 20 October 2008 (UTC)

From the "How to do Experiments" chapter, I found the idea of preferring "random variables" over controlled variables interesting, and I'm not sure I agree with it. External validity is an important attribute of an experiment - it's needed to make any information gleaned from the data actually valid and applicable to real-world situations. To that end, variation in control variables is obviously needed to rule them out as factors, so that the conclusion of the experiment can be applied regardless of the value of the control variable.

Random variation in variables works great for this, but it seems like it would break down if results from an experiment were inconclusive. For example, instead of controlling the age of participants, we might take a sample of the population with random ages. Suppose the results of the experiment also produced a random correlation - in this case we have no idea whether the randomness came from the independent variable we manipulated (i.e., the independent variable has no effect), or whether it occurred because of the random distribution of the participants' ages.

The better solution seems to be to test the control variables over a range - in our age case, say, group up participants by age intervals of 5 years, and ensure that the overall experiment population includes an even mix of each. This way, if the experiment produces a strong correlation, we can generalize it across all age ranges. If the experiment produces random results, at least we can begin to break down the results by age group (are the results also random in the 5-10 age group, the 10-15 age group, etc.?) and determine whether or not our varying control factor (age) is itself a variable.
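A rough sketch of the stratified design described above (the bracket width, participant pool, and stand-in measurements are all assumptions):

```python
import random
from collections import defaultdict

def stratify(participants, bracket_size=5):
    """Group participants into fixed-width age brackets, e.g. 5-10, 10-15."""
    strata = defaultdict(list)
    for person in participants:
        low = person["age"] // bracket_size * bracket_size
        strata[(low, low + bracket_size)].append(person)
    return strata

# Hypothetical participant pool with random ages.
pool = [{"id": i, "age": random.randint(5, 60)} for i in range(200)]
strata = stratify(pool)

# Analyze results within each bracket separately, so an inconclusive overall
# result can still be broken down by age group as suggested above.
for bracket in sorted(strata):
    scores = [random.random() for _ in strata[bracket]]  # stand-in for real measurements
    mean = sum(scores) / len(scores)
    print(f"ages {bracket[0]}-{bracket[1]}: n={len(strata[bracket])}, mean score={mean:.2f}")
```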

Cynthia T. Hsu 11:09, 20 October 2008 (UTC)

I was kind of surprised by how engaging I found this reading. I've always been skeptical of psychology as a field, but this article highlighted many of the confounds that I have felt were limits in psychological studies. My supervisor at work (who used to be a psychiatrist but now studies molecular neurophysiology in worms) complained once that we know more about the psychology of college students than about anyone else; this is a serious confound in psychological studies.

Like Witton Chou described, I was kind of surprised by Martin's argument for the value of generalization; it is very convincing but not something that is touched on in traditional experimental design classes, especially in the hard sciences (speaking of which, I've had a bad habit of referencing xkcd http://xkcd.com/435/ in normal conversation as of late). The problem with generalization is that ethics and practicality are really confounding in these cases. It would be difficult to eliminate the confound of the people who volunteer for an experiment being of a specific personality type (perhaps the random people my group interviewed for the low-fi prototype assignment were particularly nice and gave us a biased perspective on our prototype). Or someone might be nervous or attempt to modify their behavior to fit what they believe they are supposed to be doing. I wonder if that is why the vast majority of psychology seems like common sense.

One problem that I felt was not addressed was how defining the independent variable might confound the experiment. The judges themselves must be randomized in this case. It seems that in generalizing your data, the number of participants you have to include grows exponentially with each randomized variable (for example, the issue of practice with responding to high vs. low frequency of lights required twice as many people as there would have been if practice had not been a confound). I also wonder if Martin realized, while describing his lecture example, that the variable was not entirely independent - he says explicitly that the experimenter must be able to choose what to randomize, but it seems that without predefined definitions of when to talk slowly, at a medium pace, or fast, lecture pace cannot really be randomized either. He might subconsciously lecture innately more interesting material at a faster pace because he is excited. I considered that one possible solution might be to have a previous year's students rate the interestingness of a lecture, but this would be subject to historical confounds, such as a particularly "hot topic" one year; having current students rate a lecture as it happens may restrict the subjects to students who are more willing to volunteer, resulting in less generalization. I'm not quite sure what the best response to this would be.

Geoffrey Lee 13:03, 20 October 2008 (UTC)

Most of these two readings covered pretty basic material that any science/engineering major would know, but I did find some useful tips, such as in the section "Threats to Internal Validity." It was helpful to break down the common pitfalls into the article's categories and apply them to my group's project. One particular pitfall that may have affected the results of our low-fidelity prototype was maturation. We found that our first task was the most difficult for the test users even though we didn't think it was the most complicated task. But viewing the results in terms of the maturation criterion, it's clear that each subsequent task becomes easier as the test user becomes more familiar with the interface. Had we arranged the order of our tasks differently, we would have produced different results.

Juanpadilla 15:36, 20 October 2008 (UTC)

I think this article highlights the fact that it is immensely difficult to produce accurate, reliable, and consistent results that all peers will agree on. It almost makes me wonder why we even do these types of controlled experiments. These readings also make it clear how easy it would be for a company to perform these types of tests while tweaking the independent variables and their ranges to accommodate the results they are looking for.


Billy Grissom 16:01, 20 October 2008 (UTC)

Initially I didn't find anything super interesting about these articles. But after reading through them I can see that, although they cover the basics, they do a good job of refreshing your memory.

I think just about every sort of experimental process can take something from the ideas presented here. What I also thought was interesting was that this was a psychology text. I guess when experimenting with people you can forget to account for all of the human aspects and quirks that people have. You place too much faith in your control variables when really you don't have as much control as you think. This article did a good job of reminding me what it is like to experiment with people and what sorts of problems can arise if you don't run the process right.

Shaharyar Muzaffar 16:26, 20 October 2008 (UTC)

I felt like I had already learned the material about variables in my math courses here at Berkeley; it is just now being applied to real-life experiments rather than functions. I did think the section about confounding variables was not only interesting but important. Many people who conduct experiments often note that their tests fail or succeed without actually thinking about why. It could be that a confounding variable led their test to be faulty. Therefore, it is not safe to assume that just because some test passes or fails, a hypothesis is right or wrong. One must make sure the tests accurately test the hypothesis without any confounding variables.

Greg Nagel 16:31, 20 October 2008 (UTC)

I'd like to disagree with Stuart that using brain imaging would give insight into HCI. We don't understand enough of the brain yet for that sort of technique to be useful. At best, we could find a correlation, which could then be overused the way pupil size was on Madison Avenue. We already get a lot of data from our existing techniques; we probably need to spend more effort analyzing it.

Saliem Than 16:32, 20 October 2008 (UTC)

At a recent SES conference held in San Jose, there was a landing-page optimization panel that at some point touched on multivariate testing, or what might more clearly be called multiple-dependent-variable testing. The goal was to increase conversion rates (the rate at which visitors actually bought something, signed up for something, etc.) via slight changes in the design of a particular page. It's nice to see the idea in practice. Another interesting idea, though I can't think of an instance of it at the moment, is composite dependent variables, which I'm certain is used alongside multivariate testing, for there are after all at least 4-5 key elements on any given page.
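A composite dependent variable is typically just a weighted combination of standardized measures. Here is a toy sketch; the measures, values, and weights are invented:

```python
def zscore(values):
    """Standardize raw measurements to mean 0 and standard deviation 1."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values]

# Hypothetical per-visitor measures on a landing page.
time_on_page = [30.0, 45.0, 12.0, 60.0]  # seconds
clicks = [3, 5, 1, 7]
scroll_depth = [0.4, 0.9, 0.2, 1.0]      # fraction of the page seen

# Composite "engagement" score: a weighted sum of the standardized measures.
weights = (0.5, 0.3, 0.2)
composite = [
    weights[0] * t + weights[1] * c + weights[2] * s
    for t, c, s in zip(zscore(time_on_page), zscore(clicks), zscore(scroll_depth))
]
print([round(score, 2) for score in composite])
```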

Bing Wang 16:45, 20 October 2008 (UTC)

The first article talked about how to carry out a proper experiment. I believe the procedure is pretty standard and is used across all studies and industries. It involves several steps and procedures: variables such as dependent, independent, and control variables are defined. The interesting thing mentioned in the second half of the article is validity. It suggested using other factors and experiences to validate the variables. I believe that makes sense, as during an experiment one often does not know how to categorize the variables. The choice of variables also affects the results you hope to get. Sometimes the variable space is so large that one might only want to choose the most relevant variables.

The second article talked about how to manipulate and alter variables to see the effect that each has on the experiment. I believe it is an interesting idea, but it seems difficult to carry out in reality. If you have more than just a few variables, the number of combinations to manipulate grows exponentially, making data collection extremely tedious. I believe it can be done on the most relevant variables, but anything beyond that is difficult to carry out in practice.
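The combinatorial growth described above is easy to see with a quick count; the factors and their levels below are made up:

```python
from itertools import product

# Hypothetical interface factors, each with a few levels.
factors = {
    "font_size": [10, 12, 14],
    "layout": ["list", "grid"],
    "color_scheme": ["light", "dark"],
    "button_position": ["top", "bottom", "side"],
}

# Every combination of levels is one experimental condition.
conditions = list(product(*factors.values()))
print(len(conditions), "conditions")     # 3 * 2 * 2 * 3 = 36
print(len(conditions) * 10, "sessions")  # at 10 participants per condition
```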

Yuta Morimoto

The readings remind me of a statistics lecture. In statistics class, I learned about independent and dependent variables and the procedure for statistical regression. I think I can apply these statistical methods to our project and obtain good data to improve our application. However, I know statistical procedures require many experiments to gather enough data for meaningful values. Even sampling twenty or more testers does not seem enough to make more than a rough estimate. I don't think I have enough time to conduct such a survey, so it may be impractical in our case. But if I can include a logger in our application, it could accumulate useful information to improve future work.

Paul Im 17:23, 20 October 2008 (UTC)

Both of these articles were very informative and good refreshers on how to conduct experiments. Not only did they define statistical jargon, but they also gave many helpful tips on how to make experiments more reliable. There are, of course, many ways to conduct an experiment, and just as many ways to confound and disrupt its validity. Because of this, in order to get accurate results, we need to conduct experiments while controlling all of these factors.

In reference to our class, I can see how using statistics can help measure the effectiveness of our user interface. By holding experiments, we can see in numerical form how efficient and user friendly our interface is. We could test the functionality of a certain feature by giving the feature to a test group and not giving the feature to a control group. We could then test response times and visual scanning as these two groups use our application. If the test group finds the application easier to use, we would probably include the feature in our serious game. Thus far, the users/designers have subjectively contributed to all of the data. All of this statistical testing gives us good, objective data on our interface.
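As a sketch of how such a control/test comparison might be analyzed (the completion times below are fabricated, and a real study would want more participants and a proper significance threshold):

```python
from statistics import mean, stdev

# Hypothetical task-completion times in seconds.
control = [42.1, 38.5, 51.0, 47.2, 44.8]  # group without the feature
test = [31.4, 35.2, 29.8, 40.1, 33.5]     # group with the feature

# Welch's t statistic for two independent samples.
se = (stdev(control) ** 2 / len(control) + stdev(test) ** 2 / len(test)) ** 0.5
t = (mean(control) - mean(test)) / se
print(f"control mean {mean(control):.1f}s, test mean {mean(test):.1f}s, t = {t:.2f}")
```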

MuQing Jing 17:26, 20 October 2008 (UTC)

The readings felt like a refresher of some of the basic topics covered in an intro stats class. However, they also talk about proper experimentation and observation techniques; not just performing tests and recording data, but how to prepare and actually set them up. On the whole, though, the articles seemed to cover some fairly obvious and intuitive things, such as the different types of variables. The concepts covered should definitely be applied to our testing and feedback stages of design and development; however, we would probably want to expand beyond the articles' limited scope in order to obtain more appropriate data.

Shyam Vijayakumar 17:27, 20 October 2008 (UTC)

I think it might have been interesting to make an experiment out of the low-fidelity prototyping sessions. We simply showed the same user interface to all 3 testers, but we could have given each user a slightly modified interface instead. This would be helpful if we had two different ways of carrying out a certain task, one riskier than the other. So we could have divided the 3 people into one control person and 2 experimental people. (Ideally, the control and experimental groups should be the same size.) The control person would get the less risky user interface, whereas the experimental people would get the riskier one. This way we could have seen how long the control person took when dealing with that part of the user interface and compared it with how long the experimental people took with the modified version. As you can see, the topics covered in the reading are applicable in a variety of situations and should be understood thoroughly before engaging in any sessions that involve testing humans.
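A tiny sketch of the random-assignment step described above (the participant names are placeholders, and real groups should be larger):

```python
import random

def assign_groups(participants, n_groups=2):
    """Randomly split participants into equal-sized groups."""
    shuffled = participants[:]
    random.shuffle(shuffled)
    return [shuffled[i::n_groups] for i in range(n_groups)]

testers = ["P1", "P2", "P3", "P4", "P5", "P6"]
control, experimental = assign_groups(testers)
print("control:", control)
print("experimental:", experimental)
```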

Kai Lin Huang 17:28, 20 October 2008 (UTC)

Since I have touched on econometrics and statistics in classes, I gained a deeper understanding of how to conduct an experiment and choose the data to collect when reading both chapters from Martin. Because of the complexity of the environment humans live in, reaching a conclusion about a statement can seem easy, but it is often challenging to collect and analyze accurate data, if there is any - especially when the statement we want to prove contains ambiguously measured words, such as "violence" in the example Martin gives. Some accepted research methods have been developed to establish the reliability and validity of experiments; however, as people in a scientific field, we should remain skeptical even when using those established methods, to avoid biased outcomes. For example, predictive validity used to verify an experiment could itself have involved lurking dependent variables.

Jonathan Fong 17:38, 20 October 2008 (UTC)

This was a very interesting reading. I always considered psychology a very soft science where experiments (and experimental data) are subjective and any conclusions are an interpretation of results. Martin, however, does a very good job describing and enumerating principles to make experiments as objective and scientific as possible. Some of it sounds like common sense, but it needs to be said. I particularly noticed and enjoyed the part about the "sledgehammer effect"; I just happened to read about it this past weekend, though in a different application. Since we are groups of engineers (i.e. technical and methodical, for the most part), I think we'll be able to conduct pretty good experiments.

Mikeboulos 17:42, 20 October 2008 (UTC)

After reading these two chapters, I found that most of the steps mentioned were so intuitive that I wouldn't have consciously applied them in any experiment I might run. But having read them, I am more enlightened about their uses and how to apply them. I think this reading is very essential for us to know, since we'll not only be developing a serious game (program), but we'll also have to make it interact well with users. Most of the things mentioned in the readings reminded me of my statistics course as well. But the main question here is: how will we be able to conduct a controlled experiment when we lack the tools and time? Any experiment we run will have more random variables than control variables, and most of our experiment will depend on independent variables (i.e. circumstances). That will threaten the internal validity, since we won't be able to control whether an uncontrolled event has or has not occurred previously in the game (i.e. the history of an event).


Antony Setiawan 17:48, 20 October 2008 (UTC)

Martin points out an issue concerning the use of a single dependent variable. I agree that in most cases, the single dependent variable that we choose may not be appropriate. The example that Martin uses shows that two people perform differently after 10 trials: one improves and the other does not. Having one dependent variable shouldn't lead to any conclusion; having multiple dependent variables is important, and those dependent variables are valid if they are selected using commonly accepted standards.
