WEEK SIX Intelligence Journal of Review - We Can Boost IQ: Revisiting Kvashchev’s Experiment Notes

Introduction

This paper examines the effects of creative problem-solving training on intelligence.
It revisits Stankov’s report on R. Kvashchev’s experiment, which showed an average IQ increase of seven points across 28 intelligence tests after three years of training.
The paper argues that previous analyses were conservative and didn’t account for the reduction in IQ test variances after training.
When standard deviations from the initial test and the 2nd retest were pooled, the experimental group performed 10 IQ points higher than the control group.
With proper measures of fluid and crystallized intelligence, the experimental group showed a 15 IQ point increase compared to the control group.
The conclusion is that prolonged, intensive training in creative problem-solving can substantially and positively affect intelligence during late adolescence (ages 18–19).
Keywords: intelligence; training; creative problem-solving

Background

For decades, hereditarians have influenced beliefs about the modifiability of intelligence.
Jensen (1969) claimed that intelligence is about 80% heritable, limiting environmental/schooling effects.
This view led to increased focus on racial/ethnic IQ differences and to the belief that cognitive performance training is ineffective, with effects limited to practiced tasks.
Far transfer (applying knowledge to distantly related tasks or broader cognitive processes) was considered unlikely.
Jensen’s views on heritability and training effects were influential in intelligence research (Eysenck 1971; Haier 2014).
It is now accepted that the heritability of intelligence increases from about 20% in infancy to about 80% in later adulthood (Plomin et al. 2014; Plomin and Deary 2015), leaving room for environmental effects, especially in school-aged populations.

The Effect of Cognitive Training on Intelligence

Ongoing debates exist about the effects of cognitive training on cognitive abilities.
In 2014, two groups of scientists presented opposing views (Simons et al. 2016).
One group claimed no compelling evidence exists that cognitive training can reduce or reverse cognitive decline.
The claim that working memory training can improve fluid intelligence attracted criticism (Novick et al. 2020).
The other group stated that cognitive training procedures can significantly improve cognitive function.
Harvey et al. (2018) argued that computerized cognitive training is effective for far transfer and can improve cognitive performance on untrained skills in healthy older people and people with schizophrenia.
This echoes findings from research on practicing performance with competing/dual cognitive tasks (Fogarty and Stankov 1982, 1988).
Doing two intelligence tests simultaneously for eight practice sessions led to overall improvement in all components of the competing tasks and an increase in the common variance captured by tests of fluid intelligence (Stankov 1991).
Therefore, depending on the cognitive tasks and training protocol, practice may lead to transfer effects beyond closely related tasks.
Simons et al. (2016) were negative and agreed with Jensen on the poor effectiveness of far transfer.
They concluded that brain-training interventions improve performance on trained/practiced tasks but not on distantly related tasks or everyday cognitive performance.
Overall, the conclusion was pessimistic.
They also reported shortcomings in intervention studies, including small sample sizes, short training periods, lack of random assignment and pre-test baselines, reliance on a single measure of intelligence, and the absence of a control group.
Stankov (1986) noted that most of these shortcomings didn't apply to Kvashchev’s (1980) study because it had a large sample size (N close to 300), lasted over three years, relied on 28 intelligence measures, and used random assignment of school classes to control and treatment groups.
Given the positive outcomes of Kvashchev’s experiment, it was suggested that carefully designed training can be effective if long-lasting and intensive.
Recent studies have challenged Jensen’s (1969) claim about the failure of compensatory education in boosting intelligence.
A meta-analysis reported an average increase of 3.4 IQ points for one additional year of schooling (Ritchie and Tucker-Drob 2018).
This was replicated by Hegelund et al. (2020) in Denmark, who found an average increase of 4.3 IQ points per year of education in young adulthood and 1.3 IQ points in adults in mid-life.
They also reported that individuals with low intelligence in childhood derived the largest benefit, concluding that education constitutes a promising method for raising intelligence, especially among disadvantaged individuals.
These studies suggest that more years of formal schooling can improve the cognitive functioning of children and young adults.
The strong hereditarian position has also been challenged in studies that investigate the role of socioeconomic status (SES) in the development of intelligence.
Selita and Kovas (2019) argued that social inequality and social policies can profoundly affect the heritability of educational attainment.
Heritability is higher among people living in neighborhoods with greater equality levels, implying that social inequality stifles the expression of educationally relevant genetic propensities.
This agrees with Weiss and Saklofske (2020), who focused on environmental rather than hereditary influences on group differences in intelligence.
They reviewed findings with standardized IQ tests and suggested that pronounced environmental effects on IQ, in addition to those captured by SES, can be identified within socially disadvantaged samples of African Americans.

Effects of Creativity on Problem-Solving

A meta-analysis by Ma (2009) examined the effects of psychological (e.g., personality, motivation) and environmental (e.g., teacher encouragement, peer competition) variables on creative performance.
The study included 111 empirical studies on creativity.
The pertinent findings are:
- The mean effect size was stronger for problem-solving creativity (Cohen’s $d = 0.86$ ) compared to verbal creativity ( $0.79$ ), nonverbal creativity ( $0.45$ ), and emotional creativity ( $0.34$ ).
- The strongest effect ( $0.93$ ) was found for defining a problem, which included restating the problem in multiple ways before solving it.
- Overall, Ma (2009) reported a medium weighted average effect size of $0.72$ for these five components.
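Whether an effect is called "medium" or "large" depends on the benchmark applied. The following is a minimal sketch, assuming Cohen's (1988) conventional cut-offs (0.2 small, 0.5 medium, 0.8 large), of how the values quoted from Ma (2009) would be labeled. The function name `cohen_label` and the "negligible" label below 0.2 are illustrative choices made for this sketch and do not come from the paper.

```python
def cohen_label(d: float) -> str:
    """Label an effect size using Cohen's (1988) conventional cut-offs."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"  # label chosen for this sketch only


# Effect sizes quoted from Ma (2009) in the notes above.
ma_2009 = {
    "problem-solving creativity": 0.86,
    "verbal creativity": 0.79,
    "nonverbal creativity": 0.45,
    "emotional creativity": 0.34,
    "defining a problem": 0.93,
    "weighted average": 0.72,
}

labels = {name: cohen_label(d) for name, d in ma_2009.items()}

for name, label in labels.items():
    print(f"{name}: d = {ma_2009[name]:.2f} ({label})")
```

Under these cut-offs the weighted average of 0.72 falls in the medium band, which matches the description in the notes.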

Kvashchev’s Experiment

Jensen’s views were challenged by Stankov (1986), who reported findings from an intervention study by Kvashchev (1980).
The intervention was conducted in the mid-1970s in two high schools in northern Serbia.
One school was the control (N = 147), and the other was experimental (N = 149).
The experiment started with first-year high school students (average age 15) after eight years of primary schooling.
Students at the experimental school received special classes in creative problem-solving at least once a week.
Teachers were trained by Kvashchev to develop creative thinking exercises for their courses (e.g., mathematics, science, Serbian language).
Stankov (1986) provided detailed information about the experimental procedure, training exercises, and descriptions of the 28 intelligence tests.
All exercises were referred to as training in “creative problem solving”.
The principles included:
- Exercises should combine remote elements.
- Exercises should call for radical reorganization and reformulation of the problem situation.
- Exercises should call for both convergent and divergent thinking operations.
Kvashchev collected exercises from textbooks and journals and presented them to the experimental group in their original form or adapted them to the syllabus.

An Example Problem:

“I was captured by a band of outlaws, and their leader had my hands and legs tied up so that I could not move. They did not gag me up though, and I was able to use my mouth freely. The leader of the gang hung a piece of bread exactly five centimetres away from my mouth…Nevertheless, I managed to free myself. How?”
Students in the experimental group listed as many solutions as possible.
The key was active participation and attention to principles for creative thinking, such as producing imaginative solutions.
Acceptable solutions included blowing at the bread to create a pendulum or assuming favorable conditions like wind moving the rope or tree.
Students in the experimental school received such exercises at least once a week for three years.
Students in the control group attended school as usual.
Both groups completed a battery of 28 intelligence tests, including matrices, verbal analogies, and number series.
All tests were subtests from verbal and nonverbal IQ test batteries known to have acceptable reliabilities, ranging between $0.65$ and $0.80$ (Kvashchev 1980).
- Six nonverbal tests were parts of Cattell’s culture fair battery, and five were part of the Bujas’ test of intelligence.
- The Dominoes D-48 test was also used.
- Two verbal tests were in Serbo-Croatian, developed by students of French and Spearman psychologists.
The first test round was the “Initial Test”.
All participants took the same tests at the end of the experiment (“Final Test”) and again at the beginning of the next year (“1st Retest”) and at the very end of high school (“2nd Retest”).
No creative problem-solving training was given in the last year.
The exercises were not designed to practice the cognitive processes measured by the intelligence tests.
The exercises required a combination of remote elements and a radical reorganization of the problem, relying heavily on divergent thinking processes.
Changes in performance on intelligence tests were attributed to far transfer.
Stankov (1986) presented means and standard deviations for each test on four occasions (Initial, Final, 1st Retest, 2nd Retest).
Averages (means) over the 28 tests are plotted (see Figure 1), representing a global performance measure of intelligence.
Most tests were classified as measures of fluid (Gf) or crystallized (Gc) intelligence.
Initially, the control group had a higher average score than the experimental group.
The experimental group became superior at the end of training and in the retest stages.
ANCOVA showed statistically significant F-values, with the experimental group performing better on 26 out of 28 tests in the 2nd Retest session (Stankov 1986).
The overall conclusion was that the experiment produced statistically significant and positive results.
- Specifically, the improvement of the experimental group at age 18 was, on average, 5.66 IQ points higher than the control group.
- A year later at age 19 (2nd Retest), the improvement of the experimental group was, on average, 7 to 8 IQ points higher than the control group.
- In terms of Cohen’s d effect size criteria, the change represents a ‘medium’ (0.50) effect size (Cohen 1988).
- This led to the cautious conclusion that through training in creative problem-solving it is “… possible to achieve small improvement in performance” (Stankov 1986, p. 209).
Given the nature of the experiment and the wide range of the test batteries, it was believed that far transfer was demonstrated one year after the end of the treatment (at age 19).
However, the overall effect was not seen as sufficiently strong, which led to the questioning tone in the title of Stankov’s (1986) paper, “Can we boost intelligence?”
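Because the control group began with a higher average score, a raw comparison of post-test means would understate the training effect, which is why a covariance adjustment such as ANCOVA is used. The sketch below is not the analysis code from Stankov (1986); it only illustrates the classical covariance-adjusted group difference, computed with a pooled within-group slope. All scores are hypothetical, and the function names are assumptions made for this example.

```python
from statistics import mean


def pooled_within_slope(groups):
    """Pooled within-group slope of post-test scores on pre-test scores.

    `groups` is a list of (pre_scores, post_scores) pairs, one per group.
    """
    sxy = 0.0
    sxx = 0.0
    for pre, post in groups:
        mean_pre, mean_post = mean(pre), mean(post)
        sxy += sum((x - mean_pre) * (y - mean_post) for x, y in zip(pre, post))
        sxx += sum((x - mean_pre) ** 2 for x in pre)
    return sxy / sxx


def adjusted_difference(experimental, control):
    """Post-test difference (experimental minus control) adjusted for baseline."""
    slope = pooled_within_slope([experimental, control])
    raw_gap = mean(experimental[1]) - mean(control[1])
    baseline_gap = mean(experimental[0]) - mean(control[0])
    return raw_gap - slope * baseline_gap


# Hypothetical raw scores: the control group starts higher, as in the notes.
experimental = ([10, 12, 14, 16], [15, 17, 19, 21])  # (pre, post)
control = ([12, 14, 16, 18], [14, 16, 18, 20])       # (pre, post)

raw_gap = mean(experimental[1]) - mean(control[1])
adjusted_gap = adjusted_difference(experimental, control)

print("raw post-test gap:", raw_gap)
print("baseline-adjusted gap:", adjusted_gap)
```

In this toy data the adjusted gap is larger than the raw gap because the experimental group had to make up its initial deficit before it could pull ahead.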

Methodology and Results: New Insights and Proposed Reanalysis

Although Stankov's (1986) findings were accepted, the questioning tone of that paper led to its results being seen as inconclusive.
Three aspects of the analyses appear to have been too conservative:
- Two ANCOVA analyses were carried out, one based on individuals and the other on class. The latter had a small number of degrees of freedom and didn't identify strong effects on several tests.
- The interpretation focused on the “Final” data and less on the 2nd Retest, although the latter showed longer-lasting and stronger far transfer effects.
- Standard deviations from the Initial testing were used in the calculation of the effect sizes and their IQ equivalent scores.

Statistical Re-evaluation

Using pre-test dispersions was justified because student performance before training was seen as representative of the population.
However, students’ performance was more heterogeneous at the beginning of training than at later sessions: standard deviations were significantly larger at the Initial test.
Kvashchev (1980) noted that reduced post-test variance was due to both experimental and control groups benefitting from high school experience, with stronger effects in the experimental group (especially on participants with low cognitive abilities).
Additionally, high-achieving students might have approached a ceiling level on some tests, which was more pronounced in the experimental group.
Lower heterogeneity of the experimental group at the end of treatment is important for re-analysis of the effect sizes.
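The two mechanisms proposed for the shrinking variance (larger gains among initially low scorers and a ceiling for high scorers) can be illustrated with a toy example. This is a sketch with hypothetical scores, not a reconstruction of Kvashchev's data. The helper names, the `target` and `share` parameters, and the ceiling value are assumptions introduced only for illustration.

```python
from statistics import mean, pstdev


def lift_low_scorers(scores, target, share):
    """Move each score below `target` a fraction `share` of the way up to it."""
    return [s + share * (target - s) if s < target else s for s in scores]


def apply_ceiling(scores, max_score):
    """Cap every score at the highest value the test can register."""
    return [min(s, max_score) for s in scores]


initial = [12, 15, 18, 21, 24, 27]  # hypothetical raw scores

after_lift = lift_low_scorers(initial, target=20, share=0.5)
after_ceiling = apply_ceiling(initial, max_score=22)

for label, scores in [
    ("initial", initial),
    ("low scorers lifted", after_lift),
    ("ceiling applied", after_ceiling),
]:
    print(f"{label}: mean = {mean(scores):.2f}, SD = {pstdev(scores):.2f}")
```

Both transformations reduce the standard deviation, but only the first raises the mean. A smaller standard deviation therefore does not by itself show that a group improved, which is why the choice of standard deviation matters for the effect sizes.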
The top part of Table 1 shows calculations analogous to those used in Stankov (1986, p. 228) to arrive at the 7 to 8 IQ points differences.
- Standard deviations (4.37 and 4.18) from the Initial testing session were used.
- Using Initial test values alone disregarded smaller variances obtained after four years of schooling.
For comparison, the same statistics were calculated using the 2nd Retest standard deviations (3.19 and 3.79) instead.
- The difference in effect sizes is now 14.28 IQ points, about twice as large as reported in Stankov (1986).
When standard deviations from the Initial test and the 2nd Retest were averaged (Cohen’s dav = 0.67), the advantage of the experimental group over the control group was about 10 IQ points (0.67 × 15 ≈ 10).
Because 15 IQ points equal one standard deviation in intelligence research, a 10-point difference cannot be classified as a small improvement.
When considering the results for each of the 28 tests, the experimental group showed superior performance compared to the control group on all but one test, namely, the arithmetic test (−0.27 in IQ points).
The IQ difference between the experimental and control groups was the largest (19.49 IQ points) for the word classification test, while six tests showed a difference of 15 IQ points or higher.
In within-subjects studies with pronounced differences between the pre-test and post-test variances, effect sizes expressed in terms of Cohen’s dav may be more appropriate than those based on pre-test variances only.
Even though the tests employed in Kvashchev’s studies were part of intelligence test batteries, the scores that were used in the analyses in this paper are raw scores and not the normed IQ scores.
The differences between the experimental and control groups are analogous to Cohen’s dav values rescaled to correspond to the typical IQ metric.
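The sensitivity of the result to the choice of standard deviation can be sketched in a few lines. The code assumes the usual definition of Cohen's dav (mean change divided by the average of the pre-test and post-test standard deviations) and the conventional IQ scale with a standard deviation of 15. The means are hypothetical, the two standard deviations are borrowed from the values quoted above purely as illustrative inputs, and the sketch covers a single group, whereas the figures in the paper are differences between two groups.

```python
IQ_SD = 15.0  # one standard deviation on the conventional IQ metric


def d_single_sd(mean_pre, mean_post, sd):
    """Standardized change using one chosen standard deviation."""
    return (mean_post - mean_pre) / sd


def d_av(mean_pre, mean_post, sd_pre, sd_post):
    """Cohen's d_av: change divided by the average of the two SDs."""
    return (mean_post - mean_pre) / ((sd_pre + sd_post) / 2.0)


def to_iq_points(d):
    """Rescale a standardized effect to IQ points."""
    return d * IQ_SD


mean_pre, mean_post = 20.0, 24.0  # hypothetical raw-score means
sd_pre, sd_post = 4.37, 3.19      # illustrative SDs (Initial, 2nd Retest)

d_initial = d_single_sd(mean_pre, mean_post, sd_pre)
d_retest = d_single_sd(mean_pre, mean_post, sd_post)
d_average = d_av(mean_pre, mean_post, sd_pre, sd_post)

for label, d in [
    ("Initial SD", d_initial),
    ("2nd Retest SD", d_retest),
    ("d_av", d_average),
]:
    print(f"{label}: d = {d:.2f}, IQ points = {to_iq_points(d):.1f}")
```

With a shrinking standard deviation, the estimate based on the Initial test is the most conservative, the estimate based on the 2nd Retest is the largest, and dav falls between the two.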

Changes in Fluid (Gf) and Crystallized (Gc) Intelligence

Some tests had stronger effect sizes than others.
One question is whether the differences across the 28 tests can be attributed to fluid (Gf) or crystallized (Gc) intelligence, that is, whether the training was more effective for Gf or for Gc.
Two papers by Stankov and Chen (1988a, 1988b) used Kvashchev’s experiment data and SEM to assess factorial invariance.
Due to limited computational power, the studies used different test selections (11 tests in [1988a] and eight tests in [1988b]).
Test choice was based on putative cognitive processes captured by the test.
- For example, in Stankov and Chen (1988a), six tests from the Cattell’s culture fair battery were chosen as measures of Gf, while proverbs, verbal analogies, word classification, essential features, and disarranged sentences were chosen as potential measures of Gc.
Separate Gf and Gc factors were fitted, and the factor structure was invariant across the experimental and control groups.
Using factor score means in Stankov and Chen (1988a, 1988b) and procedures analogous to those presented in the bottom part of Table 1, the training effects can be summarized:
- Calculations based on Stankov and Chen (1988a) indicated that the training led to an increase of about 10 IQ points on the Gc factor and about 27 IQ points on the Gf factor.
- Calculations based on Stankov and Chen (1988b) analyses showed that the training led to the increase of about 21 IQ points on the Gc factor and only five IQ points on the Gf factor.
- Overall, across the two studies, the change was about equal on the two factors: $(10 + 21)/2 = 15.5$ IQ points on Gc, and $(27 + 5)/2 = 16$ IQ points on Gf.
- In both cases, the improvement was above 15 IQ points.
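The averaging above can be reproduced directly, and the same sketch shows how much the two studies disagree on each factor. The gains are the values quoted in these notes. The dictionary layout and the helper names `average_gain` and `gain_range` are illustrative assumptions.

```python
from statistics import mean

# IQ-point gains quoted in the notes, keyed by study and factor.
gains = {
    "Stankov and Chen 1988a": {"Gc": 10, "Gf": 27},
    "Stankov and Chen 1988b": {"Gc": 21, "Gf": 5},
}


def average_gain(factor):
    """Mean gain on one factor across the two studies."""
    return mean(study[factor] for study in gains.values())


def gain_range(factor):
    """Difference between the largest and smallest gain on one factor."""
    values = [study[factor] for study in gains.values()]
    return max(values) - min(values)


for factor in ("Gc", "Gf"):
    print(
        f"{factor}: average = {average_gain(factor)} IQ points, "
        f"range across studies = {gain_range(factor)} IQ points"
    )
```

The averages are nearly identical, but the range for Gf is twice that for Gc, which is why the selection of tests deserves a closer look.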
The selection of tests may make quite a difference to the results.
A closer look at the tests themselves may provide useful clues.
- There was a pronounced difference in the Gf factor between the two Stankov and Chen (1988a, 1988b) studies (27 IQ points in the 1988a study and five points in the 1988b study).
- The six Gf tests in the Stankov and Chen (1988a) study were all from Cattell’s culture fair battery, a well-known test of fluid intelligence.
- On the other hand, only one of the Gf tests in the Stankov and Chen (1988b) study (the Dominoes D-48 test) was an established Gf marker.
- The other three—perceptual reasoning test, multiple solutions tests, and pictorial poly-profile test—appear to contain pronounced visual perception content in addition to the Gf component.
- It is plausible that the perceptual processing component may have led to a less pronounced Gf effect in Stankov and Chen (1988b).

Conclusions and Discussion

Stankov’s (1986) account of the effects of Kvashchev’s training in creative problem-solving on the general factor of intelligence may have been too conservative.
Reanalyses indicate that the experimental group gained at least 10 IQ points more than the control group at the end of the four years.
On some cognitive tests and on properly defined measures of fluid and crystallized intelligence (Stankov and Chen 1988a, 1988b), the advantage of the experimental group was more than 15 IQ points.
The effects can be classified as ‘upper medium’ or ‘large’, following Cohen’s effect size guideline (Cohen 1988).
Two additional training studies—one devoted to creativity development and the other on critical thinking—carried out by Kvashchev over a 27-year period produced similar outcomes, but data were only available for the reported study.
Several implications can be drawn in the context of current views about intelligence, cognitive training, and schooling:
- First, an important observation was the reduction of the test variances that took place under the influence of training. This may be particularly pronounced in long-lasting interventions during the school years of childhood and adolescence.
- Kvashchev (1980) suggested that the reduced variances may be partly due to the increase in performance (especially among the initially lower-scoring participants) and partly due to reaching the ceiling levels of the tests employed (especially by the highly able participants).
- This reduction in dispersion was evident at all successive retesting sessions, but especially at the later stages of retesting.
- An important lesson points to the need to employ tests of a sufficient range of difficulty levels in similar studies in the future.
- A realistic assessment of the changes in the cognitive abilities of young students would also need to consider the role of maturation in the reduced variances in both experimental and control groups.
- Aside from reduced variances, there is clear evidence of an increase in the mean levels of performance across the occasions of testing.
- Further, that increase was statistically significant in both experimental and control groups (see Stankov 1986).
- Compared to the control group, the effect in the experimental group was larger—a difference of 10 to 15 IQ points—indicating that training in creative problem-solving can improve performance on tests of intelligence.
- Of particular interest to educationists may be the observation that there was a larger increase in the experimental group’s performance a year after the end of the training. Longitudinal studies will be needed to examine longer-lasting effects of training.
- Second, Sauce and Matzel’s (2018) views about the gene–environment interplay may be relevant to this discussion. They postulated that intelligence is malleable and concluded that “… one can say that IQ has high heritability and a high malleability”. Their evidence was mostly observational and included IQ gains consequent to adoption/immigration, changes in heritability across the lifespan and socio-economic status, the Flynn effect, the slowdown of age-related cognitive decline, and IQ gains via early compensatory education.
- Kvashchev’s work on the effects of training in creative problem-solving provides experimental evidence for the important role of the environment.
- Third, the present study adds valuable information to an ongoing debate within medicine, information technology, and education, as well as theoretical cognitive psychology, about the effects of training (see Simons et al. 2016; Harvey et al. 2018).
- Kvashchev’s work shows that with prolonged and intensive training in the school environment, far transfer is possible in the cognitive domains that have not been deliberately included in the training protocol.
- Critics of Kvashchev’s work may argue that the experiment was not carried out in a laboratory setting and that Kvashchev’s school-based experiment is inferior to more conventional laboratory interventions because the trained processes were not explicitly defined and thus the effects cannot be unambiguously attributed to specific treatments.
- In the natural environment, it was not possible to control all or many potentially confounding variables, and different aspects of participant schools might have also played a role in addition to that of creative thinking exercises.
- The school environment factor is an important and relevant aspect of the debate since laboratory-based training over several weeks (despite carefully crafted exercises) has produced controversial or mixed outcomes.
- The results of Kvashchev’s experiment cannot be easily dismissed: they rest on a comparison with a control group that was similar to the experimental group, prolonged training over three years conducted in the natural environment, the presence of pre-test measures, and the use of 28 IQ tests.
- We may further claim that the experiments conducted in the actual classroom, on a weekly basis, involving collaboration between the school teachers and the researcher, in fact, strengthen the external validity argument.
- Contemporary studies involving cognitive performance training often do not contain many subtests of WAIS or WISC, let alone as many as 28 tests employed here. Such studies often rely on a single test, like the Raven’s Progressive Matrices.
- The evidence collected from many tests is certainly a better approximation of g than a single or smaller number of tests (Gignac 2015).
- With a variety of cognitive tests, one can expect some variation in the training effects.
- The Stankov (1986) and Stankov and Chen (1988a) studies indicate that the broad Gf and Gc factors happen to be about equally affected by Kvashchev’s training.
- However, the differences in the effects of training on Gf factors suggest that visual perceptual processes may be pronounced in one set of Gf tests but less so in the other.
- Thus, it is useful to examine closely varying degrees of training outcomes on a range of different cognitive tests.
- Some contemporary theorists also do not attach particular importance to g.
  - For example, some adherents of the Cattell–Horn–Carroll (CHC) theory have pointed out that the percentage of the variance accounted for by the first principal component is small and can be neglected in favour of the broad factors, such as Gf and Gc (Stankov 2019a).
  - Another group of theorists view g as a formative construct rather than a reflective “source trait”.
  - For example, in Kovacs and Conway’s (2019) process overlap theory (POT), intelligence was interpreted in a way similar to socio-economic status, i.e., different indices (tests) contribute to its formation but there is no psychological entity underlying it.
  - Although it is not clear how the increase in cognitive performance following the training would be interpreted under this formulation, the importance of IQ is also diminished by the POT (Stankov 2019b).
In conclusion, cognitive abilities captured by intelligence tests are not fixed entities.
Prolonged and intensive training in creative problem-solving within typical school environments can lead to sizable and positive gains in overall cognitive function in late adolescence (ages 18–19).
Future work needs to focus on theoretical issues, such as identifying elementary cognitive processes that facilitate gains, or practical topics, such as validating the effects of training in creative problem-solving on real-life situations beyond IQ tests.
Training based on computerized games, machine learning, and artificial intelligence may benefit from the use of principles incorporated in creative problem-solving exercises.

This paper examines the effects of creative problem-solving training on intelligence in late adolescence (ages 18-19). The study revisits Stankov’s report on R. Kvashchev’s experiment, which showed an average increase of seven IQ points across 28 intelligence tests after three years of training. The training was delivered in regular school classes through exercises designed to encourage creative thinking.

The paper contends that the previous analyses were overly conservative because they did not account for the reduction in IQ test variances after training. When standard deviations from the initial test and the 2nd retest were pooled, the experimental group performed about 10 IQ points higher than the control group. With proper measures of fluid and crystallized intelligence, the advantage of the experimental group was about 15 IQ points. This points to substantial improvements in intelligence from prolonged, intensive training in creative problem-solving.

Keywords associated with this research include intelligence, training, and creative problem-solving, which encapsulate the critical components of the study.

Background

For decades, hereditarian thought has shaped perceptions of the malleability of intelligence. Arthur Jensen (1969) asserted that intelligence is approximately 80% heritable, which limits the perceived impact of environmental factors such as education and training. This perspective fueled discussion of racial and ethnic differences in IQ and fostered the belief that cognitive performance training yields minimal transferable benefits, with effects limited to practiced tasks. Far transfer, the application of learned knowledge to tasks considerably different from the training context, was considered unlikely.

Jensen’s authoritative assertions regarding the heritability of intelligence have had lasting repercussions in intelligence research communities, as seen in the works of Eysenck (1971) and Haier (2014). However, recent advancements in our understanding suggest that the heritability of intelligence indeed increases from approximately 20% in infancy to about 80% during late adulthood (Plomin et al. 2014; Plomin and Deary 2015). This has led to a renewed focus on the significant role that environmental influences can play, particularly in school-aged populations where cognitive training can yield substantial improvements in intelligence.

The Effect of Cognitive Training on Intelligence

Debates are ongoing concerning the efficacy of cognitive training in enhancing cognitive abilities. In 2014, two groups of scientists presented conflicting views on the matter (Simons et al. 2016). One group maintained that there is no compelling evidence that cognitive training can reduce or reverse cognitive decline; the claim that working memory training can improve fluid intelligence also attracted criticism (Novick et al. 2020). The other group stated that cognitive training procedures can significantly improve cognitive function. Harvey et al. (2018) argued that computerized cognitive training is effective for far transfer and can improve performance on untrained skills in healthy older people and people with schizophrenia. This echoes earlier research by Fogarty and Stankov (1982, 1988), in which practice with competing (dual) intelligence tasks led to improvement in all components of the tasks and to an increase in the common variance captured by tests of fluid intelligence (Stankov 1991).

Simons et al. (2016) remained skeptical and agreed with Jensen about the limited effectiveness of far transfer. They also identified weaknesses in intervention studies, including small sample sizes, short training periods, lack of random assignment and pre-test baselines, reliance on a single measure of intelligence, and the absence of a control group. Stankov (1986) noted that most of these shortcomings did not apply to Kvashchev’s (1980) study, which had a considerable sample size (N close to 300), lasted over three years, used 28 intelligence measures, and randomly assigned school classes to control and treatment groups. Given the positive outcomes of Kvashchev’s experiment, it was suggested that carefully designed training can be effective if it is long-lasting and intensive.

Recent studies have also disputed Jensen’s (1969) claim that compensatory education fails to boost intelligence. A meta-analysis reported an average increase of 3.4 IQ points for each additional year of schooling (Ritchie and Tucker-Drob 2018). This was replicated in Denmark by Hegelund et al. (2020), who found an average increase of 4.3 IQ points per year of education in young adulthood and 1.3 IQ points in mid-life adults. They also reported that individuals with low intelligence in childhood derived the largest benefit, which underscores the value of education for raising intelligence, particularly among disadvantaged groups. These studies suggest that more years of formal schooling can improve the cognitive functioning of children and young adults.

The robust hereditarian argument has also been scrutinized through investigations into how socioeconomic status (SES) can influence intelligence. Selita and Kovas (2019) contend that social inequality and policy frameworks significantly mold the heritability of educational attainment, arguing that heritability rates were higher in communities characterized by greater equity. Consequently, this implies that social imbalances can inhibit the realization of innate cognitive potentials. This notion finds corroboration in the findings presented by Weiss and Saklofske (2020), who emphasized environmental rather than hereditary contributions to intelligence differences among diverse groups, particularly notable within socioeconomically disadvantaged African American communities.

Effects of Creativity on Problem-Solving

A pivotal meta-analysis conducted by Ma (2009) investigated how psychological factors (such as personality and motivation) and environmental variables (like teacher support and peer competition) can influence creative performance. This extensive study encompassed 111 empirical studies on creativity and arrived at crucial insights: notably, the mean effect size for problem-solving creativity was considerably higher (Cohen’s $d = 0.86$) compared to verbal creativity ($0.79$), nonverbal creativity ($0.45$), and emotional creativity ($0.34$). The most pronounced effect ($0.93$) was associated with the ability to define problems effectively, including restating problems in multiple ways prior to arriving at solutions. Across all five evaluated components, Ma (2009) summarized a medium weighted average effect size of $0.72$, highlighting the varied influences of creative training across different types of cognitive tasks.

Kvashchev’s Experiment

The entrenched beliefs regarding the heritability of intelligence were challenged by Stankov (1986) through his analysis of Kvashchev’s (1980) intervention study, conducted in Serbian high schools during the mid-1970s. The experimental design involved one control school (N = 147) and one experimental school (N = 149), with first-year high school students (average age 15) who had completed eight years of primary education. Students in the experimental group participated in weekly classes dedicated to creative problem-solving, and their teachers were trained by Kvashchev to devise exercises aimed at fostering creative thinking across various subjects, including mathematics, science, and the Serbian language.

Stankov meticulously documented the intervention methodology, enumerating the specific training exercises and the diverse range of 28 intelligence tests implemented. Each exercise was meticulously crafted around the foundational principles of creative problem-solving, which emphasized:

The integration of remote elements into exercises,
The reorganization and reformulation of the problem situation,
The invocation of both convergent and divergent thinking processes.

Kvashchev’s training program utilized a myriad of exercises gleaned from numerous textbooks and academic journals, presenting them to the experimental cohort in both original and tailored formats.

An Example Problem:

One illustrative problem posed to students was: “I was captured by a band of outlaws, and their leader had my hands and legs tied up so that I could not move. They did not gag me up though, and I was able to use my mouth freely. The leader of the gang hung a piece of bread exactly five centimeters away from my mouth…Nevertheless, I managed to free myself. How?”
Students in the experimental group were tasked with brainstorming as many potential solutions as they could conceive. The goal was to promote active engagement with problem definitions and to harness principles of creative thinking, nurturing the generation of imaginative solutions. Acceptable responses ranged from blowing at the bread to create a pendulum to hypothesizing favorable conditions such as wind affecting the rope or tree.
Throughout the three-year program, students in the experimental group were exposed to such exercises at least weekly, while the control group followed the traditional curriculum. Both cohorts took a battery of 28 intelligence tests, which included verbal analogies, number series, and matrices. The tests were selected from established verbal and nonverbal IQ batteries, with reliabilities between $0.65$ and $0.80$ (Kvashchev 1980), and several nonverbal tests were taken from Cattell’s culture-fair battery.

At the outset, participants undertook an “Initial Test” to establish baseline performance metrics. Upon conclusion of the experiment, participants retook the same tests (“Final Test”) along with a follow-up assessment at the beginning of the subsequent year (“1st Retest”) and a concluding assessment at the end of high school (“2nd Retest”). It is significant to note that no further creative problem-solving training was conducted during the final year of high school; moreover, the exercises were not structured to directly practice the cognitive processes evaluated by the intelligence assessments.

The exercises necessitated the synthesis of remote elements and demanded a radical reconfiguration of the problem at hand, relying heavily on divergent cognitive methodologies. Stankov (1986) subsequently outlined the means and standard deviations recorded for each test across the four assessment occasions (Initial, Final, 1st Retest, 2nd Retest), presenting an aggregated overview of intelligence performance. Initially, the control group exhibited higher average scores compared to the experimental group. However, by the conclusion of the training and during the retesting phases, the experimental group showcased a significant edge in performance.

ANCOVA yielded significant F-values, showing that the experimental group outperformed the control group on 26 out of 28 tests at the 2nd Retest (Stankov 1986). The overall conclusion was that the experimental conditions produced marked and statistically significant improvements, with the experimental group recording a net average gain of 5.66 IQ points over controls at age 18, extending to a 7 to 8 IQ point advantage by age 19 (2nd Retest). By Cohen’s criteria, the observed change was classified as a ‘medium’ effect size (0.50) (Cohen 1988). It can therefore be cautiously inferred that structured training in creative problem-solving enhances overall cognitive performance.
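The link between the effect size and the IQ-point figures quoted here is a simple rescaling. A minimal sketch, assuming the conventional IQ standard deviation of 15; the function names are illustrative and do not come from Stankov (1986):

```python
IQ_SD = 15.0  # conventional IQ standard deviation (assumption of this sketch)


def d_to_iq_points(d: float, iq_sd: float = IQ_SD) -> float:
    """Rescale a standardized mean difference (Cohen's d) to IQ points."""
    return d * iq_sd


def iq_points_to_d(points: float, iq_sd: float = IQ_SD) -> float:
    """Rescale a difference in IQ points back to a standardized effect size."""
    return points / iq_sd


# A 'medium' effect of 0.50 corresponds to 7.5 IQ points,
# which sits inside the 7 to 8 point advantage reported at age 19.
medium_in_iq = d_to_iq_points(0.50)

# The 5.66-point advantage at age 18 is a little under 0.4 standard deviations.
age18_in_d = iq_points_to_d(5.66)

print(medium_in_iq, round(age18_in_d, 3))
```

Under this convention any difference reported in IQ points can be read as a fraction of one standard deviation, which is how the figures in the rest of the notes can be compared with Cohen's benchmarks.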

While the experiment was initially perceived as showing insufficiently robust effects, further statistical analysis indicated that Kvashchev’s training regime did produce far transfer, evident even one year after the intervention (at age 19). However, Stankov (1986) framed his conclusion as a question, “Can we boost intelligence?”, reflecting a cautious interpretation of these findings and the complexities inherent in intelligence research.

Methodology and Results: New Insights and Proposed Reanalysis

Despite the acceptance of Stankov’s (1986) findings, his paper's inquisitive tone contributed to a perception of inconclusiveness. A detailed reexamination reveals three critical factors where the analyses may have been overly cautious:

Two ANCOVA methodologies were implemented—one focusing on individual assessments and the other on classroom-wide evaluations. The latter, constrained by a reduced number of degrees of freedom, failed to ascertain substantial effects across a range of tests.
The analyses predominantly centered on “Final” data, which neglected the heightened and prolonged transfer effects showcased in the 2nd Retest.
The calculations of effect sizes and their corresponding IQ changes were based on standard deviations from the Initial testing phase.

While using pre-test variances was justified, since student performance before training was assumed to reflect the wider population, variability at the start of training was relatively large, implying substantial differences among students.

Stankov (1986) noted that reduced levels of variance in post-training assessments resulted from the experiential advancements achieved in both experimental and control groups through high school exposure, with more substantial effects realized in the experimental cohort (particularly regarding students with pre-existing low cognitive scores). The findings suggest that high-achieving students may have approached a ceiling effect, particularly within the experimental group, thereby complicating the variance evaluations. The diminished variability within the experimental group at the treatment’s conclusion serves as a critical consideration for reanalyzing the effect sizes observed.

For comparative purposes, recalculation using the 2nd Retest standard deviations (3.19 and 3.79) puts the difference between groups at approximately 14.28 IQ points, nearly double the value initially reported (Stankov 1986). When the standard deviations from the Initial Test and the 2nd Retest are pooled, the experimental group scores approximately 10 IQ points higher than the control group. In IQ research, 15 points correspond to one standard deviation, so a 10-point difference cannot be dismissed as trivial. Across the 28 individual tests, the experimental group performed better on all but one, an arithmetic test (−0.27 IQ points). The largest gap between groups (19.49 IQ points) was on the word classification test, and six tests yielded differences of 15 IQ points or more.

Where variances differ markedly between pre-test and post-test, effect sizes computed as Cohen’s $d_{av}$ may be more accurate than those based solely on pre-test variances. Although the tests used in Kvashchev’s study were components of existing intelligence test batteries, the scores analyzed here are raw scores rather than normed IQ values. Differences between the two groups are therefore expressed as Cohen’s $d_{av}$ values rescaled to the conventional IQ metric.
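The effect of the choice of denominator can be sketched as follows. This is an illustration under stated assumptions, not the paper's computation: $d_{av}$ is taken here as the mean difference divided by the simple average of the two standard deviations, the IQ standard deviation is taken as 15, and all raw-score values are hypothetical.

```python
IQ_SD = 15.0  # conventional IQ standard deviation (assumption of this sketch)


def cohens_d(mean_diff: float, sd: float) -> float:
    """Standardized mean difference for a given standard deviation."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return mean_diff / sd


def cohens_d_av(mean_diff: float, sd_initial: float, sd_retest: float) -> float:
    """d_av: mean difference divided by the average of the two SDs."""
    return cohens_d(mean_diff, (sd_initial + sd_retest) / 2)


# Hypothetical raw-score values, chosen only to illustrate the pattern:
mean_diff = 3.0    # experimental minus control, raw-score units
sd_initial = 4.0   # larger variability before training
sd_retest = 2.0    # reduced variability at the 2nd Retest

iq_pretest_sd = cohens_d(mean_diff, sd_initial) * IQ_SD              # 11.25
iq_averaged_sd = cohens_d_av(mean_diff, sd_initial, sd_retest) * IQ_SD  # 15.0
iq_retest_sd = cohens_d(mean_diff, sd_retest) * IQ_SD                # 22.5

print(iq_pretest_sd, iq_averaged_sd, iq_retest_sd)
```

With these made-up numbers the same raw-score difference yields the smallest IQ gap when scaled by the pre-test standard deviation, the largest when scaled by the retest standard deviation, and an intermediate value with the averaged denominator, which mirrors the ordering of the estimates discussed in these notes.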

Changes in Fluid (Gf) and Crystallized (Gc) Intelligence

Different tests exhibited varying degrees of effectiveness, with certain assessments yielding more substantial effect sizes compared to others. The differential impact across the 28 intelligence tests can be aligned with either fluid (Gf) or crystallized (Gc) intelligence, reflecting the effectiveness of training on each cognitive aspect. Stankov and Chen (1988a, 1988b) conducted assessments analyzing Kvashchev's experimental data and employed Structural Equation Modeling (SEM) to verify factorial invariance related to intelligence.

Due to computational limitations, distinct test selections were employed (11 tests in the 1988a study and eight tests in the 1988b study), with test selection contingent on the cognitive processes purportedly measured. For instance, in Stankov and Chen (1988a), six tests drawn from Cattell’s culture-fair battery were classified as measures of Gf, while other tests selected (including proverbs and verbal analogies) were indicative of Gc.

The data indicated that training produced an estimated increase of approximately 10 IQ points on the Gc factor and approximately 27 IQ points on the Gf factor in Stankov and Chen (1988a). The analysis in Stankov and Chen (1988b) suggested a gain of 21 IQ points in Gc and roughly 5 IQ points in Gf. Averaged across both analyses, the gains are approximately 15.5 IQ points for Gc and 16 IQ points for Gf, that is, more than 15 IQ points on both dimensions.
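The averages can be checked directly from the four estimates quoted in this paragraph. A minimal sketch; the dictionary layout and variable names are illustrative only:

```python
from statistics import mean

# IQ-point gains quoted in the notes for each factor and analysis.
estimates = {
    "Gc": {"1988a": 10, "1988b": 21},
    "Gf": {"1988a": 27, "1988b": 5},
}

# Simple unweighted average of the two analyses for each factor.
averages = {factor: mean(by_study.values()) for factor, by_study in estimates.items()}

# Spread between the two analyses, to show how much they disagree.
spreads = {
    factor: max(by_study.values()) - min(by_study.values())
    for factor, by_study in estimates.items()
}

print(averages)  # Gc averages 15.5, Gf averages 16
print(spreads)   # Gc differs by 11 points, Gf by 22 points
```

The unweighted average treats both analyses as equally informative. The 22-point spread for Gf shows how strongly the averaged figure depends on which tests were chosen as markers in each study.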

The selection of specific intelligence tests could vastly influence reported results, necessitating a thorough exploration of their properties. Notably, pronounced variability in Gf effects was observed between the two studies conducted by Stankov and Chen (27 IQ points in 1988a versus 5 IQ points in 1988b). The five Gf tests utilized in the first study predominantly drew from Cattell’s culture-fair battery, a renowned assessment for evaluating fluid intelligence; conversely, limited representation of established Gf markers was observed in the second study.

Conclusions and Discussion

This reexamination shows that Stankov’s (1986) account of Kvashchev's creative problem-solving intervention and its impact on the general factor of intelligence may have been excessively conservative. The reanalyses suggest that the experimental group gained at least 10 IQ points over the control group by the end of high school, one year after the three-year training program concluded. On well-defined measures of fluid and crystallized intelligence (Stankov and Chen 1988a, 1988b), the experimental group’s advantage exceeded 15 IQ points.

These enhancements are categorized within the ‘upper medium’ to ‘large’ effect size range, following Cohen’s guidelines (Cohen 1988). Notably, two additional training studies—one exploring creative development and the other focused on critical thinking—conducted by Kvashchev across 27 years, corroborate these findings; however, data from these studies were unavailable for this particular report.

Several implications arise from this body of research concerning contemporary beliefs about intelligence, cognitive training, and educational practices. A pivotal observation was the significant reduction in test variances observed under extended training influence. This reduction appears most relevant in long-term interventions targeting young learners during their formative school experiences. Kvashchev (1980) hypothesized that the lessened variances were attributable to both enhanced performance levels (particularly among participants initially scoring lower) and the attainment of ceiling effects on utilized tests by higher-achieving participants.

Such variance reductions were evident at every successive retest, with the most marked effects at the later testing occasions. Future studies should therefore use tests that cover a sufficiently wide range of difficulty. Moreover, any assessment of change in young students’ cognitive abilities must consider the role of maturation in reducing variances in both experimental and control groups.

Evidence suggests a statistically significant increase in average performance across testing occasions; notably, the experimental group demonstrated significant enhancements relative to the control cohort, underscoring that creative problem-solving training can facilitate measurable gains in intelligence tests. Of particular educational interest is the observation that substantial improvements were apparent among the experimental group a full year after the training concluded. Further longitudinal studies are warranted to scrutinize the enduring impacts associated with such training.

Notably, Sauce and Matzel’s (2018) perspectives on the interplay between genetics and environment can be integrally related to this discussion, as they assert that intelligence possesses both high heritability and malleability potential. Their substantive evidence includes IQ advancements attributable to adoption or immigration, variations in heritability across different life stages, socioeconomic impacts, the Flynn effect, the deceleration of age-associated cognitive decline, and improvements resulting from early educational interventions.

In summary, Kvashchev’s work on the effects of creative problem-solving training serves as a pivotal experimental foundation within cognitive research. The findings indicate a pronounced role for the environment in shaping intelligence. Overall, the results suggest that sustained and dedicated training in creative problem-solving within traditional educational settings has the potential to significantly enhance overall cognitive capacities among adolescents, offering a promising avenue for future educational practice.