Sunday 23 January 2022

Assessment in English (Part 3- Reliability and Validity)

This is the third instalment of my blog series on assessment in English.  Following on from my last blog on some of the key terms when it comes to assessment and curriculum planning, I now want to take a brief look at the terms ‘reliability’ and ‘validity’, in order to consider the questions they raise when it comes to planning and implementing effective assessment.  

Validity

Dylan Wiliam’s statement that ‘there is no such thing as a valid test’ (Wiliam, 2020) really challenged my thinking about how we use assessments (for both formative and summative purposes).  Instead, it’s all about considering the purpose of the data we gather and the validity (or accuracy) of the inferences we make- whether the data be end-of-unit test scores, mock results or more qualitative data (such as students’ responses to a mini-whiteboard task).


As English teachers, we are often adept at unpicking underlying meanings within language and assessing their validity; it’s important that we apply the same thought process to the assessment data we gather.


Wiliam explains (in the brilliant ‘ResearchEd Guide to Assessment’- a must-read for anyone evaluating the efficacy of assessment in the classroom) how there are two main threats to this:

  • Construct underrepresentation (meaning that the sample of knowledge being assessed is too small to make valid inferences about students’ learning)
  • Construct-irrelevant variance (where factors irrelevant to the knowledge being assessed- such as unfamiliar vocabulary in the question itself- prevent students from demonstrating what they have learnt)


Take, for example, this GCSE-style question:

Explore how Shakespeare presents Macduff as virtuous.


Construct underrepresentation is easily illustrated through literature mock exams, when they are used to assess students’ knowledge of a set text.  For example, a low mark for an essay that focuses on Macduff could lead a teacher to infer that the student’s knowledge of the play ‘Macbeth’ needs much work.  However, it might be that their knowledge of Macbeth, Lady Macbeth and other characters/themes is much stronger: the inference, therefore, wouldn’t be valid.


Likewise, the word ‘virtuous’ might also pose a barrier to valid inferences, since it might generate construct-irrelevant variance.  If students aren’t confident with the meaning of this word, then they are less likely to be able to communicate their knowledge of the text.  Whilst it is important to teach and promote a wide vocabulary, we do need to be aware that using it in assessment questions might impact our ability to make accurate inferences about what students do and don’t know.


Reliability

Reliability is the measure of how consistent an assessment result would be if the assessment was repeatedly administered over time. Assuming that no learning took place, you’d expect a completely reliable assessment to generate the same mark for a single student, no matter when they took it or who marked it.


Again, I’m referring to Wiliam (the champion of evidence-informed assessment), who highlights that total reliability of an assessment isn’t logistically possible.  Instead, ‘we need to be aware of the limitations of our assessments so we do not place more weight on the result of an assessment than its reliability would warrant’.


I feel that a specific threat to English assessment is the subjective nature of much of the success criteria we use (whether this be on the mark schemes for the KS4 and KS5 exams, or on internal criteria).  You only need to look at the difference in grading after GCSE English re-marks have been submitted to see that even standardised tests cannot be wholly reliable.


Questions for English teachers and leaders

I’m definitely not advocating for exam-style questions to be banned from the English classroom, as they are important in preparing students for the exams they will sit.  Nor is this the place for me to set out my views on exam reform.


That being said, it is clear that any summative use of assessment needs to be planned carefully to maximise the validity of the inferences we make and mitigate the impact of issues with reliability.


When we design these assessments, we need to consider:

  • What do we want to assess?  Does the assessment sample a wide enough range of this knowledge?
  • Which inferences do we want to make from the data?  How does the assessment set up these inferences to be valid?
  • Are there any barriers (especially gaps in knowledge and vocabulary gaps) that prevent students demonstrating the knowledge they have?  How can these be mitigated?
  • If the same student completed the assessment on different days, how consistent (or reliable) would their score be, assuming no new learning has taken place?
  • If different teachers marked the same assessment, how consistent (or reliable) would their marking be?
  • What actions could be taken to mitigate the risks to reliability?

To conclude, when it comes to reliability and validity, Wiliam’s advice at the end of his chapter ‘How to think about assessment’ is rock-solid:




References:

Wiliam, D. (2020), ‘How to think about assessment’, The ResearchEd Guide to Assessment, John Catt Educational

Sunday 9 January 2022

Assessment in English (Part 2: Key Terms)

This is part 2 of a series of blogs looking at assessment in English.  For part 1, please click here.

Part 3, focusing on the reliability and validity of assessments, can be accessed here.

Last year, I blogged on the importance of a shared understanding of terminology when it comes to discussing great teaching.  Likewise, this shared understanding of key terms is vital when it comes to discussing assessment, especially given the number of disagreements on social media that have arisen from misinterpretation.  Nobody understands this more than us, given our knowledge of how meaning is tied to the reader’s interpretation as much as the writer’s intention.


For this reason, I wanted to start by setting out my own interpretations of some key terms that sit behind a solid understanding of assessment in English.  I accept that some of these might be widely debated, but I’m hoping that these definitions will help when reflecting on an understanding of assessment in later blogs.


Knowledge

Though this may be a controversial statement, I see knowledge as the foundation of teaching and learning.  This is reflected in the Cambridge Dictionary’s definitions of the verbs ‘teach’ (“to give someone knowledge”) and ‘learn’ (“to get knowledge or skill”).  Everything we do is rooted in planning, delivering, applying and assessing knowledge.


One of the reasons many teachers might disagree with this sentiment is the concept that skills are not the same as knowledge.  However, I view this as a false dichotomy and adopt the view that there are different types of knowledge, including declarative knowledge and procedural knowledge (the latter being the knowledge that allows us to demonstrate skills).





Declarative knowledge

Declarative knowledge can be seen as the factual knowledge we teach students (“knowing that- facts” (Raichura, 2018)).  Examples of this within English could be:

  • Knowing that Shakespeare believed James I to be a descendant of the real-life Banquo
  • Knowing that inferences are guesses based on evidence
  • Knowing that semi-colons are used to separate two independent clauses


Procedural knowledge

Whereas declarative knowledge is factual, such as in the examples above, procedural knowledge can be used as a term for skills-based knowledge (“knowing how” (Raichura, 2018)).  In this sense, knowing how to do something is classed as a type of knowledge in itself.  Examples within English could include:

  • Knowing how to analyse a simile
  • Knowing how to accurately use semi-colons
  • Knowing how to interpret a character


Procedural knowledge is based on a foundation of declarative knowledge (you’d be hard-pressed to use a semi-colon if you didn’t have knowledge of the conventions of their use or what they look like) but I believe it shouldn’t be seen as a hierarchy.  Though the relationship between declarative and procedural knowledge will impact the sequence of teaching, it doesn’t mean that declarative knowledge is any less complex or powerful.


Curriculum

The Cambridge Dictionary definition focuses on curriculum as ‘what is studied’ (whether it be subjects across a school or the specific knowledge within a subject).  However, it’s also important to consider the ‘how’ (curriculum implementation, including sequencing) and the symbiotic relationship between curriculum and assessment.  In short, the findings from assessment processes should inform the curriculum choices as much as the curriculum choices inform the decisions around how and what to assess.


Assessment

When I began teaching, I would have seen ‘assessment’ as a synonym for the task a student does that is then marked by me, the teacher.  I think that teacher John Dabell would probably see the government-driven APP (Assessing Pupils’ Progress) and the levels that went alongside this as the cause, given that he explains how it “led to merging formative and summative into one big stinking pot of damaging sub-levels and labels which politicians stirred and cackled over.”


Either way, the view that ‘assessment’ is a task completed by students is a problematic one.  Instead, it’s better to take Dylan Wiliam’s lead and focus on its root word: to assess.  If we do this, then the focus shifts to the process the teacher undertakes instead.  The tasks are just the inputs that allow us to ‘assess’.  In defining this, I’ve also followed up on Wiliam’s interpretation by focusing on Lee J. Cronbach’s (1971) definition of assessment as ‘a procedure for drawing inferences’ (Wiliam, 2020).


Our purposes for those inferences and the subsequent actions then lead us to the definitions of formative and summative assessment.


Formative assessment

Paul Black and Dylan Wiliam’s seminal work Inside the Black Box (1998) is well known for exploring formative assessment as a process.  Wiliam even went so far as to reflect on the coinage of the term in a 2013 tweet, considering that it should have been named ‘something like “responsive teaching”’ (Wiliam, 2013).


So how do we define this?  Let’s go back to the root word again: form (from the Latin ‘formare’ - to form).  Formative assessment helps us make inferences that allow us to form the next steps of the learning process.  Essentially:

What do they know?

What do they not know?

What next?  How does this impact our teaching and students’ learning?


In this sense, formative inferences form part of a cycle of learning where the inferences from assessment feed back into teaching on an ongoing basis.  Great teachers make and act on formative inferences constantly, both within and between lessons.


Harry Fletcher-Wood explores this effectively in his blogs and also his book on the subject (Fletcher-Wood, 2018), which look at responsive teaching across different subjects.  In later blogs, I intend to set out how we can utilise effective formative assessment in the English classroom.


Summative assessment

Whereas formative assessment forms part of a ‘cycle’ of learning, summative assessment focuses more on the end goals by representing the sum of a student’s learning.  The most ubiquitous example of a summative inference is the grades students are awarded at the end of a course (students’ knowledge of the course content is assessed through a sample in an exam paper and/or non-exam assessment and the summative inference is made about their attainment in that subject).


However, making such inferences can be problematic, as they are often reported to external stakeholders (whether it’s parents, the press or politicians) and can be based on a narrow sampling of students’ learning.  Even where these are well-designed, Daisy Christodoulou highlights how despite the ‘accurate shared meaning’, they provide us with ‘relatively little information that will change [our] teaching’ (Christodoulou, 2016).  I would argue that this is more of an issue for English than for other subjects when it comes to formal qualifications, due to a combination of subjective criteria and other variables that impact students’ ability to hit the criteria in place.


As the focus of summative inferences is to make judgements of students’ learning and attainment across a longer span of time, we encounter summative assessment less frequently and it doesn’t impact our teaching directly.  That being said, the fact that - since the removal of levels - English schools are now freer to make their own decisions about how to assess summatively makes it worth considering effective approaches to this (which I will discuss later in this series).





Reliability and Validity

The final two terms that I feel are vital to the design and use of effective assessment are reliability and validity.  In part 3 of this series, I’ll explore what these terms mean for assessment in the classroom and how they impact the design and use of assessment.


Reflection Questions

To what extent do these definitions chime with your existing understanding of the terms?


Where do you currently make formative inferences as part of assessment processes?

How do these impact the next steps for teaching and learning?


What summative inferences do you currently make in your school?

How useful do you find these summative inferences in your practice?


References

Cambridge Dictionary, https://dictionary.cambridge.org/dictionary/english/teach

Cambridge Dictionary, https://dictionary.cambridge.org/dictionary/english/learn 

Cambridge Dictionary, https://dictionary.cambridge.org/dictionary/english/curriculum 

Christodoulou, D. (2016), Making Good Progress? The Future of Assessment for Learning, Oxford University Press

Fletcher-Wood, H. (2018) Responsive Teaching: Cognitive Science and Formative Assessment in Practice, Routledge

Raichura, P. (2018), https://bunsenblue.wordpress.com/2018/05/31/procedural-declarative-knowledge-my-cogscisci-talk/ 

Wiliam, D. (2013), https://twitter.com/dylanwiliam/status/393045049337847808

Wiliam, D. (2020), ‘How to think about assessment’, The ResearchEd Guide to Assessment, John Catt Educational

Monday 3 January 2022

Assessment in English (Part 1: An Introduction)

After looking at assessment in my own practice and across the English curriculum as a trust English lead, I’ve decided to sum up some ideas and approaches in a series of blogs to kick-start 2022.  For part 2 in the series, please click here.

Assessment can be particularly problematic for English teachers, due to the subjective nature of most of what we teach- something that often leads to a sea of remark requests when exam results are issued.  Though I’d be wary of any argument that assessing English is ‘harder’ than other subjects (such comparisons are rarely useful and often only serve to divide teachers), we cannot ignore the fact that many of the processes we look to assess are inherently complex.


For experienced practitioners, who have had repeated exposure to what good does (and doesn’t) look like, it’s possible to trust initial impressions to make inferences of students’ knowledge, based on what they produce.  However, criteria such as GCSE mark schemes can be an unhelpful support, leading many of us to question what makes an explanation ‘clear’ or how you decide that a student’s work is ‘perceptive’ rather than ‘thoughtful’.


The definitions of some key terms in assessment can also be confusing.  As a trainee and NQT, I remember getting muddled about what constituted formative and summative assessments, as well as how to make them effective.  Luckily, with the growth of edutwitter and the range of books and blogs on the subject, there is more on offer to demystify assessment and its role in teaching.  I’m hoping that these blogs will serve as a welcome support for English teachers - from trainees to those with decades of experience in the classroom- who are looking for clear views of assessment in English, an understanding of where the challenges lie and advice on how to overcome these.


I’ll be taking a tour of the different elements of assessment in English, firstly clarifying some key definitions of terms linked to assessment in the classroom before then taking a detailed look at different approaches to formative assessment and the tools that we will need for our day-to-day practice.  Later, I’ll move on to address the questions surrounding summative assessment, considering the systems we cannot control (such as GCSE and A level exams) and also evaluating different approaches to assessment that have arisen since the removal of levels from the English National Curriculum.