Metacognitive Reflection and Art Students

Abstract

The study investigates how metacognitive reflection (MR) and teacher feedback affect art students' learning and retention of academic language, particularly in perspective drawing.
The research employs a quasi-experimental design, conducted in three iterations to enhance validity.
Findings suggest that while MR treatment groups showed greater mean gains, the differences were not statistically significant in every study.
However, students who engaged in reflection with teacher feedback demonstrated a weighted mean gain of $d = 0.37$ on posttest scores compared to those in the comparison groups, indicating a moderate positive effect.
Keywords include: metacognition, academic language, reflection, teacher feedback, and visual arts.

Introduction

Art education, specifically the study of perspective, involves specialized vocabulary that can be challenging for students.
Students may not always enjoy or engage with the more formal aspects of the curriculum, such as learning vocabulary.
Understanding academic language improves communication between teachers and students and enhances students' ability to think critically about the subject matter.
Metacognitive reflection (MR) is presented as a method to improve students' learning of academic language and other subjects.

Significance

Despite the widespread advocacy for reflection in art education theory, there is a lack of empirical studies examining its effectiveness in art classrooms.
A search of leading art education journals revealed limited research on students reflecting on their work, with no studies reporting statistical data for further analysis.
Existing case studies on reflective practice, while inspiring, lack statistical data to gauge the impact of the described experiences.

Theoretical Constructs

Metacognition

Key foundational theorists in metacognition include Piaget, James, and Vygotsky.
John Flavell and Ann Brown further developed the theory of metacognition through research on children's learning strategies.
Brown (1994) defined metacognition as the process through which learners understand their strengths and weaknesses and access strategies for learning.
Flavell identified different aspects of metacognition, including metacognitive knowledge, experiences, goals, and actions.

Metacognitive Reflection

Metacognitive reflection (MR) is a metacognitive action where students think about their learning, noting important points and mistakes, and identifying connections between initial understanding and learning outcomes.
Georghiades (2004a, p. 371) defined MR as critically revisiting the learning process, acknowledging mistakes, and tracing connections between initial understanding and learning outcomes.
Students do not always engage in metacognitive thinking, even with well-designed prompts and activities (Kwon & Jonassen, 2011).
A conducive classroom environment is crucial for meaningful reflection, requiring trust in the teacher and freedom from judgment (Black & Wiliam, 2009; Georghiades, 2004a; Slinger-Friedman & Patterson, 2016).
Learners' perceptions of their metacognitive use are often inaccurate (McCardle & Hadwin, 2015).
Students may overestimate their reflection depth or be misled by reflections, potentially reducing review efforts if a subject seems easy (Proust, 2007).
Students need to apply reflection results to future learning (Tarricone, 2011).
Teachers can model MR using strategies like Think Aloud (Ellis et al., 2014; Zimmerman, 2013).
Despite understanding the benefits, many teachers do not actively promote MR (Dignath & Büttner, 2018).
MR activities can be simple to implement, even with competing priorities (Bannister-Tyrrell & Clary, 2017; Zuckerman, 2003).

Teacher Feedback

Teacher feedback is often misapplied when praise is given without addressing the task (Hattie & Timperley, 2007).
Effective feedback guides learners in their learning process (Hattie & Clarke, 2019).
Hattie & Clarke (2019) reported feedback as having an effect of $d = 0.73$ on student achievement.
Feedback prevents the adoption of incorrect models by challenging misunderstandings (Kwon & Jonassen, 2011).

Reflection in Art

Reflection has been a part of art education, with early advocates like Winner and Simmons (1992) encouraging students to reflect on their work.
National standards (National Coalition for Core Arts Standards, 2018) call for students to analyze, interpret, and evaluate artwork.
State and local districts, such as Maryland and San Diego, also promote student self-reflection (Maryland State Department of Education, 2018; San Diego Unified School District, 2018).
Despite its promotion, few empirical studies have examined the efficacy of reflection in art classrooms.

Academic Language

Academic language skill is critical to student learning (Lawrence et al., 2015; Uccelli et al., 2015).
Academic language conveys context-specific concepts and can be difficult to comprehend (Jucks & Paus, 2012; Uccelli et al., 2015).
Learning academic language improves communication and enhances thinking about the content (Lahey, 2017; Nagy et al., 2012).
Metacognitive reflection supports student learning by prompting learners to consider their comprehension (Bond et al., 2015).
Meaningful learning includes practicing the language through discussion (Lawrence et al., 2015; Uccelli et al., 2015).
Teachers should value the language skills students bring from their communities (MacSwan, 2018).

Review of Empirical Studies of Metacognition

The following studies provide evidence of the effectiveness of MR in improving learning outcomes.

Metacognition and Math Journals

Baliram and Ellis (2019) studied MR in a high school geometry classroom, assigning classes to treatment or comparison conditions.
The study included a pretest, MR intervention, posttest, and retention test, informed by Hattie's (2012) work and teacher feedback.
To avoid biased responses, a third party provided feedback on student reflections, with the teacher aware of general trends.
Results achieved statistical significance, with the treatment group outscoring the comparison group $(F(1, 73) = 7.27, p = .009, \eta_p^2 = .09)$.
Intact classes may have impacted results, with the sample size slightly below what was indicated by a power analysis. However, the study was thoughtfully conducted and representative of realistic educational settings (Gall et al., 2007).

Metacognition, Academic Achievement, and Intelligence

Ohtani and Hisasaka (2018) conducted a meta-analysis of 118 articles, finding a moderate correlation between metacognition and academic achievement when controlling for intelligence (r = .28, 95% CI [0.24, 0.31], p < .001).
Intelligence was identified as a confounding variable, with individuals of higher intelligence processing information rapidly, freeing up mental capacity for metacognition.
A limitation was the exclusion of students and adults with disabilities, representing a significant segment of the population.

Metacognition and Confidence

Weight (2017) studied teachers and staff, finding that those using metacognitive instructional strategies reported greater confidence in working with students experiencing anxiety $(\chi^2(1, N = 171) = 20.93, p < .05)$.
The sample was large and representative, with a wide range of experience levels, and qualitative insights were gathered through interviews.
Surveys were limited by participant honesty, with a known lack of fit between teacher report and actual practice.

Dissertation Synthesis

Bond, Denton, & Ellis (2015) examined the impact of reflective self-assessment, documenting results from 10 doctoral dissertations across subjects.
Teacher feedback was part of the intervention in six studies, with a positive effect size documented in seven studies.
The resulting weighted mean effect size was 0.28 for the posttest, with a range of -0.34 to 0.69.
A limitation was that all studies were conducted at one institution, strengthening the need for studies including students from other types of schools and locations.

Learning Science

Georghiades (2004b) conducted an experiment with Year Five students, finding that the experimental group retained more information over time $(p = .048)$.
Readers should interpret these results with caution because the researcher relied on three t-tests to analyze the data.
The use of multiple t-tests inflated the chance of Type I error (Field, 2013). A more conservative approach would have been to use ANOVA with Bonferroni adjustment (Tabachnick & Fidell, 2007).
Although aspects of the study might have been improved, it was the type of situated inquiry that relied on methods beyond student self-report and was needed to add to our knowledge of the effects of metacognition (Dinsmore et al., 2008).
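
The concern about multiple t-tests can be illustrated with a short calculation. The sketch below assumes that the three tests were independent and that each used a threshold of .05; neither detail is reported in this summary, and the function names are illustrative only.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one Type I error across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the familywise rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    alpha = 0.05   # assumed per-test threshold (not reported in the source)
    n_tests = 3    # three t-tests, as described above

    uncorrected = familywise_error_rate(alpha, n_tests)
    adjusted = bonferroni_alpha(alpha, n_tests)
    corrected = familywise_error_rate(adjusted, n_tests)

    print(f"Uncorrected familywise error rate: {uncorrected:.3f}")
    print(f"Bonferroni per-test threshold:     {adjusted:.4f}")
    print(f"Corrected familywise error rate:   {corrected:.3f}")
```

Under these assumptions the familywise error rate rises from .05 to roughly .14, and the Bonferroni per-test threshold falls to about .017, which is stricter than a p value of .048.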

Method

Research Questions

To what extent does MR influence students’ ability to learn and retain academic language related to perspective drawing?
To what extent does teacher feedback to the MR influence students’ initial ability to learn and retain academic language related to perspective drawing?

Research Design

The research design used in this study was quasi-experimental, conducted with intact classes taught by the investigator (see Table 1).
To offset the reduced internal validity of intact groups, the study was conducted in three iterations (Spring 2018, Autumn 2018, and Winter 2019), each with different groups.
The pretest was an attempt to mitigate this threat to internal validity (Gall et al., 2007).

Participants and Sampling Process

Demographics

The students sampled were enrolled in the investigator’s middle school visual arts class.

Assignment of Condition

To remain objective, the investigator flipped a coin to decide which condition each class would receive.

Sample Size

To ensure the number of participants in each study was large enough for the statistical test to detect an effect if it existed (Gall et al., 2007), an a priori power analysis was conducted using G*Power 3 (Faul, Erdfelder, Lang, & Buchner, 2007).
For this analysis the population effect size was set to 0.3 (Ellis, 2010).
Output from the analysis indicated that a total sample size of 75 was required at a significance level of $p < .05$ and a power level of .80 (Lakens, 2013).

Description of Samples

Five art sections were included in each study. Students ranged in age from 11 to 15. A majority of students enrolled in Art One were in sixth grade, and a majority of Art Two students were in seventh or eighth grade.
In the first study conducted in Spring 2018, three classes were assigned to reflective assessment (see Table 2).
In the second study conducted in Autumn 2018, three classes were assigned to reflective assessment (see Table 3).
In the third study conducted in Winter 2019, three classes were assigned to reflective assessment (see Table 4).

Protection of participants

This study involved typical classroom instruction and assessment procedures, which did not require informed consent from participants.
The investigator protected the privacy of participants’ data by only reporting scores that could not be linked to individual students.

Measures

The first study relied on a teacher-generated, thirty-question multiple-choice test of academic language related to perspective drawing.
Construct validity was assessed by comparison to similar measures in published art curricula and inclusion of academic language listed in state and national visual art standards.
Content validity was attained through a review by a group of art teachers teaching similar ages.
The test was examined for reliability by generating split-half reliabilities using posttest scores.
A value above .7 indicated that an instrument was consistently measuring the same factor (Vogt & Johnson, 2011).
The Spearman's rho correlation between the halves of the Spring 2018 test administration was .81, indicating a reliable measure.
Spearman's rho correlations between the split halves of the Autumn and Winter test administrations were .80 and .71, respectively, also indicating a reliable measure.
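
The split-half procedure described above can be sketched in a few lines. This is a minimal illustration, not the investigator's actual computation: it assumes the test was split into odd- and even-numbered items (the split method is not reported), and the response matrix is hypothetical.

```python
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes both inputs have nonzero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def split_half_scores(items: Sequence[Sequence[int]]) -> Tuple[List[int], List[int]]:
    """Rows are students, columns are items scored 0 or 1.

    Returns each student's total on odd-numbered and even-numbered items
    (an assumed split; other splits are possible).
    """
    odd = [sum(row[0::2]) for row in items]
    even = [sum(row[1::2]) for row in items]
    return odd, even


if __name__ == "__main__":
    # Hypothetical responses: six students, six items.
    responses = [
        [1, 1, 1, 1, 1, 1],
        [1, 1, 1, 0, 1, 1],
        [1, 0, 1, 1, 0, 0],
        [0, 1, 0, 0, 1, 0],
        [0, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 0, 0],
    ]
    odd_half, even_half = split_half_scores(responses)
    print(f"Split-half Spearman's rho: {spearman_rho(odd_half, even_half):.2f}")
```

A rho above the .7 benchmark cited above would be read as evidence that the two halves measure the same factor.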

Procedure

For each iteration, at the beginning of the ten-day perspective unit, a pretest on academic language related to perspective drawing was administered to all classes on day one.
Students in all conditions were then instructed over the course of the following eight 56-minute class periods using a variety of methods, including teacher modeling, note-taking, guided practice, independent practice, and group discussion.
At the end of the unit on day ten, following a review, all groups completed the same questions as a posttest.
Three weeks after the posttest, the same exam was administered as a retention test.
For the classes assigned to reflective assessment, students engaged in a four- to five-minute reflective activity.
These took place on instructional days, toward the end of class, for a total of nine reflective sessions.
The MR prompt used was varied from day to day, so students would not lose interest (Georghiades, 2004b).
Reflection-only classes did not receive feedback on their reflective assessments, apart from the investigator, in the role of classroom teacher, thanking them for completing it.
In the reflection with feedback condition, the investigator, in the role of classroom teacher, responded individually and as soon as possible to each student’s reflection with a short note or verbal comment related to what they wrote (Slinger-Friedman & Patterson, 2016).

Statistical Analysis

Because ANOVA had a lower chance of Type I error than multiple t-tests (Field, 2013) and allowed post-hoc testing with a Bonferroni adjustment (Tabachnick & Fidell, 2007), a repeated measures ANOVA was used in this study.
To conduct the repeated measures ANOVA, student scores on the academic language test were entered into SPSS Version 25 software.
The design had one within-subjects factor, time of test, with three levels: pretest, posttest, and retention test. There was one between-subjects factor, group, with three levels: reflection with feedback, reflection only, and comparison. The level of statistical significance for this analysis was set at $p < .05$.
A post-hoc test with a Bonferroni adjustment was conducted.
In addition to tests of statistical significance, the investigator calculated effect sizes from pretest to posttest and from pretest to retention test for each study using Cohen’s d.
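
As an illustration of the effect size calculation, the sketch below computes Cohen's d as the difference between two test means divided by the pooled standard deviation. The exact variant of d used in the study is not stated, so the pooled-SD formula is an assumption, and the scores shown are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence


def cohens_d(first: Sequence[float], second: Sequence[float]) -> float:
    """Cohen's d for the gain from `first` to `second`, using the pooled SD.

    Assumed formula:
        d = (M_second - M_first) / SD_pooled
    where SD_pooled weights each sample variance by its degrees of freedom.
    """
    n1, n2 = len(first), len(second)
    s1, s2 = stdev(first), stdev(second)
    pooled_sd = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(second) - mean(first)) / pooled_sd


if __name__ == "__main__":
    # Hypothetical scores on a thirty-question test for one group.
    pretest = [12, 15, 9, 14, 11, 13]
    posttest = [18, 21, 15, 19, 17, 20]
    print(f"Pretest to posttest d: {cohens_d(pretest, posttest):.2f}")
```

Other formulations exist for repeated measures, for example ones that standardize by the pretest standard deviation or account for the correlation between test occasions; these would yield different values from the same scores.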

Results

Spring 2018

In the Spring 2018 study, all groups made gains between each test (see Figure 1), except for the reflection with feedback group, which plateaued between posttest and retention test.
There was statistically significant positive kurtosis in the pretest scores, negative skewness in the posttest scores, and both negative skewness and positive kurtosis in the retention test scores (see Table 5).
Mauchly’s test confirmed the assumption of sphericity was not violated $(p = .27)$.
There was a statistically significant within-subject interaction effect between time of test and condition $(F(4, 198) = 2.66, p = .03)$.
However, a Bonferroni adjustment revealed group score differences were not statistically significant when compared across condition.

Autumn 2018

In the second study, conducted in Autumn 2018, all groups made gains between each test (see Figure 2). The reflection only group started with the highest pretest mean scores and maintained this lead for the following two assessments.
There was statistically significant positive skewness and kurtosis in the pretest scores (see Table 6).
Mauchly’s test revealed the assumption of sphericity had been violated, $\chi^2(2) = 13.96, p < .001$, so the Greenhouse-Geisser values were interpreted.
A Bonferroni adjustment revealed group score differences were not statistically significant when compared across condition.

Winter 2019

In the third study, conducted in Winter 2019, all groups made gains between each test, except for the reflection group, which plateaued between posttest and retention test (see Figure 3).
There was statistically significant positive skewness in the pretest scores (see Table 7).
Mauchly’s test confirmed the assumption of sphericity was not violated $(p = .30)$.
There was a statistically significant between-subject effect based on condition (F(2, 109) = 7.21, p < .001).
A Bonferroni adjustment revealed group score differences between the comparison group and the reflection group were statistically significant $(p = .03)$.
Score differences between the comparison group and the reflection with feedback group were also statistically significant $(p = .002)$.

Effect Sizes

To synthesize the results and look for trends in the data within and across all three studies, the investigator calculated effect sizes.
Pretest to posttest effect sizes showed which group had higher initial gains (see Table 9).
Pretest to retention test effect sizes showed which group better retained these gains (see Table 10).
Both sets of effects were also pooled as weighted mean effect sizes to compare overall results.
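
Pooling effects as a weighted mean can be sketched as follows. The weighting scheme is not specified in this summary, so the sketch assumes each study's effect is weighted by its sample size; inverse-variance weights are a common alternative. All values in the example are hypothetical and are not the study's data.

```python
from typing import Sequence


def weighted_mean_effect(effects: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean of effect sizes: sum(w_i * d_i) / sum(w_i)."""
    if len(effects) != len(weights) or not effects:
        raise ValueError("effects and weights must be non-empty and equal in length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(d * w for d, w in zip(effects, weights)) / total_weight


if __name__ == "__main__":
    # Hypothetical per-study effects for one condition, with assumed
    # sample sizes used as weights.
    study_effects = [0.20, 0.40, 0.60]
    study_sizes = [10, 20, 10]
    pooled = weighted_mean_effect(study_effects, study_sizes)
    print(f"Weighted mean effect size: {pooled:.2f}")
```

Weighting by sample size gives larger studies more influence on the pooled estimate, which matters here because attrition differed across the three iterations.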

Summary of Results

Although the data did show deviations from normality in each study, a trend emerged.
Mauchly’s test revealed that the assumption of sphericity was not violated in two of the three studies; in the one study where it was violated, the Greenhouse-Geisser values were interpreted.
In the Winter 2019 study there was a statistically significant between-subject effect based on condition (F(2, 109) = 7.21, p < .001).
Effects by condition were also combined as weighted mean effects to compare overall results.
These weighted mean effects favored the reflection with feedback group, followed by the reflection only group, for both pretest to posttest and pretest to retention test.

Discussion

During the Spring 2018 study, as suggested by the literature, the reflection with feedback group made the greatest gains pretest to posttest.
In the Autumn 2018 study, the comparison group unexpectedly made the greatest gains from posttest to retention test, almost surpassing the reflection with feedback group in mean score on the retention test.
This sample suffered from high attrition, losing 36 subjects compared with 16 and 19 in the Spring and Winter studies, respectively.
In the Winter 2019 study, effect sizes (see Tables 9 and 10) showed that, with one exception, every condition made greater gains than any condition in the previous studies.

Synthesis

To synthesize the results and look for trends in the data across all three studies the investigator calculated weighted mean effects.
John Hattie (2012) cited anything over $d = 0.40$ as a worthwhile effect size for an academic intervention.
The groups who engaged in reflection with feedback gained, on average, $d = 0.37$ more than the comparison groups on the posttest and $d = 0.13$ more on the retention test.

Limitations

Because this study was quasi-experimental, there was a major threat to internal validity.
As noted earlier, this study relied on a teacher generated test.
Intelligence was a confounding variable in studies of achievement and metacognition (Ohtani & Hisasaka, 2018).
The need for academic data was underscored during the Winter 2019 study.
The current study was based on nine reflective sessions over ten days.
Low attendance at school could have many causes, but anxiety was certainly a contributing factor (Ingul & Nordahl, 2013).
Bianchi (2007) noted the possible differential effect of teachers reacting to student reflections as a weakness in earlier studies.
A conducive classroom environment for reflection (Black & Wiliam, 2009) included student trust of the teacher (Georghiades, 2004a; Hattie & Clarke, 2019). The current studies were conducted in the context of a semester-length class, at times toward the beginning of the term, when that trust may still have been developing.

Further Research

Bannert and Mengelkamp (2008) called for improved measures of metacognition.
Studies such as the current one provided evidence of the short-term benefits of MR. There is reason to believe that metacognition has long-term effects as well (Georghiades, 2004a).
The link between feedback and improved academic outcomes has been studied in depth (e.g., Hattie & Clarke, 2019; Hattie & Timperley, 2007; Schunk, Pintrich, & Meece, 2008), but some areas, including the best timing for delivering feedback, are still being researched (Shute, 2008).

Conclusion

The empirical evidence provided by this study should be interpreted with some caution based on aforementioned limitations.
A strength of this study was that it did not rely on self-report, as studies of this type often do (Dinsmore et al., 2008).
Also, because it was conducted in a school classroom, it had a certain “real world” authenticity.
Metacognitive reflection is not a magic solution to every problem in education.