Theory Building

Things we’re going to talk about today:

1. What makes a good research problem?

  • Research Questions for Theory Development
  • Research Questions for Practical Application

2. Turning research problems into testable hypotheses

What is the purpose of performing research?

  1. To increase knowledge as a consumer of research and to understand research within a discipline or area of study.
  2. Afterwards: to increase knowledge within a discipline or an area of study.

What is the scientific method?

There is no official “scientific method” but basically it works like this:

Maybe a bit more like this:

Actually really like this:

Characteristics of what scientists agree is good science:

Science seeks to improve our understanding of the world.

  • Explanations are based on observations
    • Scientific truths must stand up to empirical scrutiny
    • Sometimes “scientific truth” must be thrown out in the face of new findings
  • Theory and observation affect one another:
    • Our perceptions of the world affect how we understand it
    • Our understanding of the world affects how we perceive it
  • Creativity is important
    • Theories, hypotheses, experimental designs
    • Search for elegance, simplicity
      • Law of parsimony, Occam’s razor

All research is wrong

Laboratory Experiments
Cannot study large scale software development in the lab!
Too many variables to control them all!
Case Studies
How do we know what’s true in one project generalizes to others?
Researcher chose what questions to ask, hence biased the study
Surveys
Self-selection of respondents biases the study
Respondents tell you what they think they ought to do, not what they actually do

and on and on…

Strategies to overcome the wrongness

  • Theory-building
    • Testing a hypothesis is pointless (single flawed study!) unless it builds evidence for a clearly stated theory
  • Empirical Induction
    • Series of studies over time
    • Each designed to probe more aspects of the theory; together they build evidence for a clearly stated theory
  • Mixed Methods Research
    • Use multiple methods to investigate the same research question
    • Each method compensates for the flaws of the others; together they build evidence for a clearly stated theory

Why is theory building so important?

(A $p$-value threshold of 0.05 means that, even when there is no real effect, about 5% of studies will still report a significant result purely by chance, so any single study may be a random statistical anomaly.)
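A quick simulation can make the 5% figure concrete. The sketch below is an illustration under stated assumptions, not part of the original lecture: every simulated study draws its sample from a standard normal distribution, so the null hypothesis (mean of 0) is true by construction, and each study applies a two-sided z-test with known variance. The function names `z_test_p_value` and `false_positive_rate` and all parameter values are hypothetical.

```python
import math
import random


def z_test_p_value(sample, sigma=1.0):
    """Two-sided p-value for H0: mean == 0, assuming a known std. dev. sigma."""
    n = len(sample)
    z = (sum(sample) / n) / (sigma / math.sqrt(n))
    # Standard normal CDF evaluated at |z|, via the error function.
    cdf = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    return 2.0 * (1.0 - cdf)


def false_positive_rate(n_studies=2000, sample_size=30, alpha=0.05, seed=0):
    """Fraction of studies reporting p < alpha when the null hypothesis is true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # The null is true by construction: the real mean is 0.
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        if z_test_p_value(sample) < alpha:
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    rate = false_positive_rate()
    print(f"false positives under a true null: {rate:.3f}")
    # If studies are independent, the chance that k of them are ALL
    # false positives shrinks geometrically.
    for k in (1, 2, 3):
        print(f"{k} independent studies all wrong by chance: {0.05 ** k:.6f}")
```

The simulated rate should land near 0.05. The second loop is the argument for theory building: one significant result is weak evidence by itself, while several independent studies that all support the same clearly stated theory are much harder to explain as chance.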

What is a contribution to research?

A better understanding of computing:

  • Identification of problems with the current state-of-the-art?
  • A characterization of the properties of new tools/techniques?
  • Evidence that approach A is better than approach B at solving X?

Meet Stuart Dent:

(Examples taken from Steve Easterbrook, Univ. of Toronto)

Name: Stuart Dent (aka “Stu”)
Advisor: Prof. Helen Back
Topic: “Building a Better Approximate Nearest Neighbor Search”
Status: First year PhD student
Built a tool
Needs to evaluate it

Stu’s Evaluation Plan: Formal Experiment
Independent Variable: Stu-search vs. FAISS
Dependent Variables: Correctness, Speed, Subjective Assessment
Task: Finding relevant images quickly given a query
Subjects: Grad Students in SE
H1: “Stu-search retrieves useful images more often than FAISS”
H2: “Subjects retrieve images faster with Stu-search than with FAISS”
H3: “Subjects prefer using Stu-search to FAISS”
Results

  • H1 accepted (strong evidence)
  • H2 & H3 rejected
  • Subjects found the tool unintuitive
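A hypothesis such as H2 above only becomes testable once it is tied to a measurement and a decision rule. As a minimal sketch (not Stu's actual analysis), the code below runs a one-sided permutation test on task completion times. The timing lists are made-up placeholders for illustration only, not data from the study, and `permutation_test` is a hypothetical helper name.

```python
import random


def permutation_test(times_a, times_b, n_permutations=5000, seed=0):
    """One-sided test of the claim mean(times_a) < mean(times_b).

    Returns the fraction of random relabelings whose mean difference
    (b - a) is at least as large as the observed difference.
    """
    rng = random.Random(seed)
    observed = sum(times_b) / len(times_b) - sum(times_a) / len(times_a)
    pooled = list(times_a) + list(times_b)
    n_a = len(times_a)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the group labels.
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:n_a], pooled[n_a:]
        diff = sum(perm_b) / len(perm_b) - sum(perm_a) / len(perm_a)
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Placeholder timings in seconds; NOT data from the study.
    stu_search = [41.0, 38.5, 45.2, 50.1, 39.9, 47.3]
    faiss = [40.2, 37.9, 44.8, 49.0, 41.5, 46.1]
    p = permutation_test(stu_search, faiss)
    print(f"p-value for 'subjects are faster with Stu-search': {p:.3f}")
```

A permutation test is used here because it makes few distributional assumptions and needs only the standard library. It does nothing about the validity threats: if subjects already know one tool, a small p-value still says little about the tools themselves.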

Threats to the validity of this experiment?

  • Construct Validity
    • What do we mean by a search? What is goodness?
    • 5-point scale for subjective assessment - insufficient discriminatory power
      • (both tools scored very low)
  • Internal Validity
    • Confounding variables: Time taken to learn the tool; familiarity
      • Subjects were all familiar with FAISS, not with Stu-search
  • External Validity
    • Task representativeness
      • search tasks were of a toy problem
    • Subject representativeness
      • Grad students as sample of what population?
  • Theoretical Reliability
    • Researcher bias
      • subjects knew Stu-search was Stu’s own tool

How could we make this better?

What was the research question? “Is tool A better than tool B?”
What would count as an answer?
What use would the answer be? How is it a “contribution to knowledge”?
How does this evaluation relate to the existing literature?

What theory does this build upon?

Think of Clinical Trials:

The important question:

For this you have to have a theory.

Some Vocab:

  • A model is an abstract representation of a phenomenon or set of related phenomena
    • Some details included, others excluded
  • A theory is a set of statements that explain a set of phenomena
    • Serves to explain and predict
    • Precisely defined terminology
    • Concepts, relationships, causal inferences
    • (operational definitions for theoretical terms)
  • A hypothesis is a testable statement derived from a theory
    • A hypothesis is not a theory!

In CS, we have mostly folk theories

A good theory is a simple explanation of all the available evidence

Why Theory Building Works:

Theories lie at the heart of what it means to do science. They produce generalizable knowledge.

Theory provides orientation for data collection. All data collection and observation relies on a theoretical perspective. Theories allow us to compare similar work, define key terms, and understand what to measure.

Let’s talk about Stu again. What were his theories?

Background Assumptions: Hierarchical clusterings help similarity search.
Basic Theory: Image Retrieval is a subjective process that aims to return semantically related images given a query image. The ‘Goodness’ of a result depends on the information task of the user. If a result doesn’t seem ‘good’, we may need to change either the index strategy or the query strategy.
[Still needs some work]
Derived Hypotheses:
H1: Useful Nearest Neighbor Tools need to represent semantic distance
H2: Useful Nearest Neighbor Tools need to be insanely fast even for billions of vectors.

Incomplete List of Questions you can ask:

Existence:
Does X exist?
Description & Classification
What is X like?
What are its properties?
How can it be categorized?
How can we measure it?
What are its components?
Descriptive-Comparative
How does X differ from Y?
Frequency and Distribution
How often does X occur?
What is an average amount of X?
Descriptive-Process
How does X normally work?
By what process does X happen?
What are the steps as X evolves?

Relationship
Are X and Y related?
Do occurrences of X correlate with occurrences of Y?
Causality
Does X cause Y?
Does X prevent Y?
What causes X?
What effect does X have on Y?
Causality-Comparative
Does X cause more Y than does Z?
Is X better at preventing Y than is Z?
Does X cause more Y than does Z under one condition but not others?
Design
What is an effective way to achieve X?
How can we improve X?

Stu’s Research Questions:

Existence
Does nearest neighbor search happen in practice?
Description/Classification
What are the different types of NN search that occur in practice on large scale systems?
Descriptive-Comparative
How does NN built with hierarchies differ from NN without such hierarchical representations?
Causality
Does hierarchical NN search cause users to be happier with their search results?
Causality-Comparative
Does the hierarchical representation of NN in Stu’s index lead users to click on more advertisements than non-hierarchical indexes do?


When you have a few theories you want to test or expand upon, you need to design experiments to do the testing.

There are a few methods to choose from:

Lab Methods

  • Controlled Experiments
    • Rational Reconstructions
    • Exemplars
    • Benchmarks
    • Simulations

In the Wild Methods

  • Quasi-Experiments
  • Case Studies
  • Survey Research
  • Ethnographies
  • Action Research

Let’s match each of Stu’s research questions to an experimental methodology.

Some notes:

No method is perfect

Don’t get hung up on methodological purity 

Pick something and get on with it

Some knowledge is better than none

Stu built a tool

In computing we often build tools. This is different from much of science. Science != Engineering

Why do we build things?

Build a Tool to Test a Theory: Tool is part of the experimental materials needed to conduct your study
Build a Tool to Develop a Theory: Theory emerges as you explore the tool
Build a Tool to Explain your Theory: Tool as a concrete instantiation of (some aspect of) the theory