When we started Teacher Tapp we really didn’t appreciate how excellent it would be as a tool for validating survey instruments. A survey instrument is simply the bundle of questions asked to measure one thing. For example, if we want to know whether behaviour in your classroom is good, the survey instrument might include half a dozen different questions, each asked at a different time over the week or term. We often use publicly available survey instruments that have been developed and tested by others (e.g. well-being questions from other British surveys), but it is surprising how little work has been done validating teacher-specific survey instruments.
When we ask questions, teachers often debate whether a question can dependably tell us the thing we want to know across different types of school setting. For example, can teachers recall the situation accurately? Do they interpret the question consistently? Are they willing to answer honestly?
Validating survey instruments is always tough and often follows a pretty standard process:
- Agree whether the thing we are trying to measure is likely to be uni-dimensional, or not. (For example, well-being and life satisfaction are usually considered to be slightly different things.)
- Draft some initial questions and get a second opinion from others experienced in writing survey questions
- Cross-check with teachers in the field across a wide variety of settings
- Try the bundle of questions out and use factor analysis to look for internal consistency of response across questions
- Pay particular attention to how responses look across different subject teachers and phases
- Revise individual questions that do not seem to consistently measure the phenomenon of interest
- Test again, and assess whether teachers answer consistently over time in situations where you would expect them to
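The internal-consistency step in the process above can be sketched in a few lines. Factor analysis is the full treatment; the snippet below uses Cronbach's alpha, a simpler related check, on an entirely hypothetical set of responses (the teachers, questions, and 1–5 scale are all invented for illustration):

```python
from statistics import pvariance

# Hypothetical responses: 6 teachers answering 4 behaviour questions on a 1-5 scale.
responses = [
    [4, 4, 5, 4],
    [2, 3, 2, 2],
    [5, 4, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 2, 1],
    [4, 5, 4, 4],
]

def cronbach_alpha(rows):
    """Internal consistency of a question bundle (above ~0.7 is usually acceptable)."""
    k = len(rows[0])                                    # number of questions
    item_vars = [pvariance([r[i] for r in rows]) for i in range(k)]
    total_var = pvariance([sum(r) for r in rows])       # variance of teachers' total scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

print(round(cronbach_alpha(responses), 2))  # → 0.96: these items hang together well
```

A low alpha would send you back to the "revise individual questions" step: dropping or rewording the item that correlates least with the rest.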
Collecting responses over multiple days aids validation!
We have now exploited the staggered collection of data over days to validate instruments a few times. Back in November 2018, we wanted to find out whether teachers could accurately recall their working hours from the previous week. This matters because almost all research on teacher workload relies on teachers being able to recall their weekly working hours. Each day from the 20th to the 26th of November, we asked how many hours they had worked the day before. Then, on the 28th of November, we asked them to recall their weekly working hours. From this we could establish that 80% of teachers can recall their working hours within a 10-hour (i.e. pretty generous) band. We will repeat this exercise one day with smaller time bands!
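The comparison behind that 80% figure boils down to summing the daily diary answers and checking whether the one-shot weekly recall lands in a band around that total. Here is a minimal sketch with made-up data, interpreting a 10-hour band as ±5 hours around the diary total (that interpretation, and all the numbers, are our assumptions for illustration):

```python
# Hypothetical diary data: each teacher's daily hours (Mon-Sun, from the daily
# questions) and their later one-shot recall of the whole week's hours.
teachers = [
    {"daily": [10, 9, 11, 10, 8, 3, 0], "recalled": 48},
    {"daily": [8, 8, 9, 9, 7, 0, 0],    "recalled": 50},
    {"daily": [11, 10, 10, 11, 9, 4, 2], "recalled": 42},
]

def within_band(t, band=10):
    """True if the recalled total sits inside a `band`-hour window
    centred on the summed diary hours (i.e. +/- band/2)."""
    return abs(sum(t["daily"]) - t["recalled"]) <= band / 2

share = sum(within_band(t) for t in teachers) / len(teachers)
```

With real panel data, `share` is the headline accuracy figure, and shrinking `band` shows how quickly recall accuracy degrades.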
As part of the same project, we wanted to know whether it matters if you ask about a teacher’s anxiety TODAY or YESTERDAY. Our instincts were that recalling anxiety with even just one day’s delay would be seriously subject to bias, but we weren’t sure. It matters to us because ONS always asks about anxiety YESTERDAY in their social surveys, and whilst we’d like to align with their question, this wouldn’t be appropriate if recall bias is a serious problem.
We ran a short randomised trial where half our participants were asked about anxiety today on the Tuesday and anxiety yesterday on the Wednesday. The other half were just asked about anxiety yesterday on the Wednesday. This allowed us to see whether measurement of anxiety is subject to recall bias. Thankfully, we found it isn’t particularly, which leaves us more relaxed about exactly how we frame the question for the remainder of our project.
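For the first arm of that trial, the within-person check is simply whether Wednesday's recall of "yesterday" shifts systematically from Tuesday's "today" report. A sketch with invented 0–10 anxiety scores (the scores and sample size are hypothetical):

```python
from statistics import mean

# Hypothetical arm-one data: anxiety reported "today" on Tuesday, then
# the same Tuesday recalled as "yesterday" on Wednesday, per teacher.
today_scores    = [3, 5, 2, 7, 4, 6, 1, 5]
recalled_scores = [3, 4, 2, 7, 5, 6, 2, 5]

# Mean within-person shift: a value near zero suggests little recall bias.
bias = mean(r - t for t, r in zip(today_scores, recalled_scores))
```

Comparing the Wednesday "yesterday" distributions across the two randomised arms then checks whether merely having answered on Tuesday changes Wednesday's answers.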
We wanted to validate some instruments measuring automaticity and habit for a project with UCL researchers. In this example, the survey question roots were taken from a well-validated instrument, but we wrote new stems to apply them to classroom settings. This type of validation is typically done by asking all the questions in one go and looking for consistency of response. However, delivering them in a single block encourages respondents to repeatedly select the same response category for ease. Because Teacher Tapp runs every day, we were able to deliver them on separate days (see an example schedule below). One funny consequence was that our panellists kept asking why they were getting the SAME question over and over again! It felt the same to them because each question measured the same underlying construct.
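Generating that kind of spread-out schedule is straightforward. A minimal sketch, where the item names, start date, and two-day spacing are all invented for illustration:

```python
import random
from datetime import date, timedelta

# Hypothetical bundle of automaticity items to scatter across separate days.
items = ["Item A", "Item B", "Item C", "Item D"]
start = date(2019, 3, 4)

random.seed(1)  # fixed seed so the example schedule is reproducible
random.shuffle(items)  # randomise item order so position effects wash out

# One item every other day: {date: item}
schedule = {start + timedelta(days=2 * i): item for i, item in enumerate(items)}
```

Spacing the items out like this trades a longer fieldwork window for responses that aren't contaminated by answering the whole block in one sitting.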
These are all examples of how the unique daily nature of the Teacher Tapp survey is helping academic researchers validate how to ask questions in surveys. It might not feel that important to teachers on the ground, but these tiny methodological findings are the bedrock on which a better scientific understanding of the profession will emerge.