This article provides a comprehensive comparison of the Klimisch and CRED methods for evaluating the reliability and relevance of ecotoxicity studies, targeting researchers, scientists, and drug development professionals. It explores the foundational evolution from the established Klimisch method to the more detailed CRED framework, delves into their methodological application and criteria, addresses common troubleshooting and optimization strategies, and validates their performance through comparative ring test analysis. The synthesis highlights CRED's advantages in transparency, consistency, and practical utility for harmonizing chemical hazard and risk assessments across regulatory frameworks.
The regulatory assessment of chemicals hinges on the quality of the underlying ecotoxicity data. For decades, the Klimisch method has been the cornerstone for evaluating study reliability, yet its dependence on expert judgment and lack of explicit relevance criteria have raised concerns about consistency and transparency. This has spurred the development of the Criteria for Reporting and Evaluating ecotoxicity Data (CRED) method. Framed within a broader thesis comparing these two paradigms, this article details the application and protocols that underscore the critical role of data reliability and relevance, providing researchers and drug development professionals with the tools to implement robust, science-based evaluations.
The following tables synthesize key quantitative findings from a comprehensive ring test comparing the Klimisch and CRED evaluation methods[reference:0].
Table 1: Structural Characteristics of the Evaluation Methods[reference:1]
| Characteristic | Klimisch Method | CRED Method |
|---|---|---|
| Data Type | Toxicity and ecotoxicity | Aquatic ecotoxicity |
| Number of Reliability Criteria | 12–14 (ecotoxicity) | 20 (evaluating), 50 (reporting) |
| Number of Relevance Criteria | 0 | 13 |
| OECD Reporting Criteria Included | 14 of 37 | 37 of 37 |
| Additional Guidance | No | Yes |
| Evaluation Summary | Qualitative (reliability only) | Qualitative (reliability & relevance) |
Table 2: Reliability Categorization Outcomes from the Ring Test[reference:2]
| Reliability Category | Klimisch Method (% of evaluations) | CRED Method (% of evaluations) |
|---|---|---|
| Reliable without restrictions (R1) | 8% | 2% |
| Reliable with restrictions (R2) | 45% | 24% |
| Not reliable (R3) | 42% | 54% |
| Not assignable (R4) | 6% | 20% |
Table 3: Relevance Categorization Outcomes from the Ring Test[reference:3]
| Relevance Category | Klimisch Method (% of evaluations) | CRED Method (% of evaluations) |
|---|---|---|
| Relevant without restrictions (C1) | 32% | 57% |
| Relevant with restrictions (C2) | 61% | 35% |
| Not relevant (C3) | 7% | 8% |
Table 4: Risk Assessor Confidence in Evaluation Results[reference:4]
| Confidence Level | Klimisch Method | CRED Method |
|---|---|---|
| "Very confident" or "Confident" | 37% | 72% |
This protocol details the two-phase ring test designed to compare the Klimisch and CRED evaluation methods[reference:5].
This table lists key tools and resources required for implementing rigorous reliability and relevance evaluations in chemical risk assessment.
| Tool/Resource | Function & Purpose | Key Features / Examples |
|---|---|---|
| CRED Evaluation Checklist | Provides the structured criteria for evaluating study reliability (20 items) and relevance (13 items). Ensures transparent and consistent assessments. | Available as Excel-based assessment sheets from the SciRAP platform[reference:12]. |
| OECD Test Guidelines (TGs) | Define standardized test protocols for generating ecotoxicity data. Serve as the benchmark for evaluating methodological reliability. | e.g., TG 201 (Algal growth inhibition), TG 210 (Fish early-life stage), TG 211 (Daphnia reproduction). |
| Statistical Analysis Software | Enables the statistical comparison of evaluation outcomes and analysis of ring-test or validation study data. | R or Python packages for non-parametric tests (e.g., Wilcoxon rank-sum), consistency analysis, and data visualization. |
| Good Laboratory Practice (GLP) Standards | Define a quality system for the organizational process and conditions under which non-clinical safety studies are planned, performed, and reported. | While not a guarantee of reliability, GLP compliance is a weighted factor in many evaluation frameworks. |
| Reference Databases & Reporting Standards | Provide access to published ecotoxicity studies and define minimum reporting requirements to ensure all necessary information is available for evaluation. | Databases like ECOTOX (US EPA); Reporting standards like the ARRIVE guidelines for in vivo studies. |
| Specialized CRED Extensions | Tailor the evaluation framework for novel data types or specific environmental compartments. | NanoCRED (for nanomaterials), EthoCRED (for behavioral studies), CRED for sediment and soil studies[reference:13]. |
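As the Statistical Analysis Software row notes, non-parametric tests such as the Wilcoxon rank-sum (Mann-Whitney U) test are typically used to compare evaluation outcomes. The sketch below computes the U statistic in pure Python for illustration; in practice a packaged implementation (e.g., `scipy.stats.mannwhitneyu` or R's `wilcox.test`) would be used, and the score data here are hypothetical.

```python
# Minimal Mann-Whitney U (rank-sum) sketch for comparing two samples of
# numerically coded evaluation outcomes. Ties receive mean ranks.

def mann_whitney_u(a, b):
    """Return the U statistic for sample a versus sample b."""
    combined = sorted((v, idx) for idx, v in enumerate(a + b))
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        # Extend j over the tie group sharing the same value.
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average 1-based rank for the tie group
        for k in range(i, j + 1):
            ranks[combined[k][1]] = mean_rank
        i = j + 1
    rank_sum_a = sum(ranks[: len(a)])
    return rank_sum_a - len(a) * (len(a) + 1) / 2

# Hypothetical reliability category codes (1 = best) from two evaluations
klimisch_scores = [1, 2, 2, 2, 3, 2, 1, 2]
cred_scores = [2, 3, 3, 4, 2, 3, 4, 3]
u = mann_whitney_u(klimisch_scores, cred_scores)
```

U ranges from 0 (all values in the first sample rank below the second) to `len(a) * len(b)`; a significance test would then compare U against its null distribution.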
In the mid-1990s, the regulatory landscape for chemical safety, particularly under the European Union's Existing Substances Regulation, faced a significant challenge: the need for a harmonized, systematic approach to evaluate the vast and inconsistent body of toxicological and ecotoxicological data submitted by industry [1]. Prior to 1997, assessments relied heavily on unstructured expert judgment, leading to potential inconsistencies and a lack of transparency in how data influenced regulatory decisions [2]. It was within this context that scientists H.J. Klimisch, M. Andreae, and U. Tillmann from BASF AG introduced a seminal framework. Their 1997 paper, "A systematic approach for evaluating the quality of experimental toxicological and ecotoxicological data," proposed a standardized scoring system designed to categorize the reliability of individual studies [3] [4]. The Klimisch method was developed explicitly to fulfill regulatory obligations, providing a structured process for assessing data for entry into the IUCLID database (International Uniform Chemical Information Database), which was becoming the central repository for regulatory information on chemicals [1]. Its primary objective was to bring clarity and consistency to the hazard assessment process, enabling regulators and industry scientists to distinguish between high-quality guideline studies and those with methodological limitations [3].
The Klimisch method's innovation lies in its simplified, four-category classification system for judging study reliability. The assignment is based on a study's adherence to international testing standards (like OECD guidelines), Good Laboratory Practice (GLP), and the completeness of its documentation [5] [6].
Table 1: The Klimisch Scoring Categories and Criteria
| Klimisch Score | Category Title | Core Assignment Criteria |
|---|---|---|
| 1 | Reliable without restriction | Studies performed according to internationally accepted testing guidelines (e.g., OECD, EPA) preferably under GLP, or where all parameters are closely comparable to a guideline method [5] [6]. |
| 2 | Reliable with restriction | Studies where test parameters do not fully comply with a specific guideline but are sufficiently documented and scientifically acceptable. This also includes well-documented non-guideline studies and accepted calculation methods [5] [6]. |
| 3 | Not reliable | Studies with significant methodological deficiencies, use of irrelevant test systems, or insufficient documentation that precludes a credible assessment [5]. |
| 4 | Not assignable | Studies where experimental details are absent, such as those found only in short abstracts or secondary literature sources [5]. |
In regulatory practice, only studies scoring 1 or 2 are typically considered sufficient on their own to fulfill a data requirement for a specific hazard endpoint [5] [6]. Studies scored as 3 (Not reliable) or 4 (Not assignable) may still be used in a supporting role or as part of a "weight of evidence" approach but cannot be the primary basis for a decision [5].
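This filtering rule can be expressed as a simple data query. The study records below are hypothetical; the rule itself (scores 1-2 stand alone, scores 3-4 only support a weight-of-evidence case) follows the regulatory practice described above.

```python
# Hypothetical study records tagged with Klimisch scores.
studies = [
    {"id": "S1", "klimisch": 1},
    {"id": "S2", "klimisch": 2},
    {"id": "S3", "klimisch": 3},
    {"id": "S4", "klimisch": 4},
]

# Scores 1-2 may fulfil an endpoint on their own; 3-4 are supporting only.
key_studies = [s for s in studies if s["klimisch"] in (1, 2)]
supporting = [s for s in studies if s["klimisch"] in (3, 4)]
```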
The method also introduced important definitions for reliability, relevance, and adequacy. Reliability was defined as "the inherent quality of a test report... relating to preferably standardized methodology and the way the experimental procedure and results are described to give evidence of the clarity and plausibility of the findings" [7]. This definition intrinsically links reliability to standardized methods and reporting completeness.
Diagram 1: Klimisch Method Evaluation Workflow
Following its publication, the Klimisch method was rapidly adopted as the de facto standard within emerging EU regulatory frameworks. Its integration was driven by a clear need for efficiency and harmonization. The method became formally recommended in the Technical Guidance for the REACH regulation (Registration, Evaluation, Authorisation and Restriction of Chemicals), which required registrants to evaluate and report the quality of all submitted study data [1]. Its structured approach allowed for the consistent organization of thousands of studies within the IUCLID database, making hazard assessments "scientifically valid, repeatable and... consistent across substances" [1].
To address criticisms about the lack of detailed guidance for applying the Klimisch categories, supporting tools were developed. The most notable is the ToxRTool (Toxicological data Reliability Assessment Tool), an Excel-based checklist released by the European Centre for the Validation of Alternative Methods (ECVAM) [6] [7]. ToxRTool breaks down the evaluation into specific criteria across areas like test substance identification, study design, and results documentation, providing a more transparent and consistent path to assigning a Klimisch score [7].
The method's influence also expanded beyond its original scope. Researchers proposed adaptations to systematically evaluate the quality of human epidemiological data, creating a parallel framework that mirrored the Klimisch categories to allow for the combined assessment of human and animal studies within weight-of-evidence evaluations [1].
The Klimisch method's historical role as the backbone of data evaluation is now critically examined in comparison to newer, more detailed frameworks like the Criteria for Reporting and Evaluating ecotoxicity Data (CRED) developed in 2016 [8].
CRED was developed explicitly to address perceived shortcomings in the Klimisch method, which were highlighted through ring-tests and scholarly critique [9] [2]. Key criticisms of the Klimisch method include its over-reliance on GLP and guideline adherence, which can cause assessors to overlook methodological flaws in guideline studies or undervalue sound non-guideline research [2]. It was also criticized for providing insufficient guidance, leading to inconsistencies between evaluators, and for lacking explicit criteria to assess a study's relevance (its appropriateness for a specific hazard assessment) [8] [2].
Table 2: Comparison of the Klimisch Method and the CRED Framework
| Feature | Klimisch Method (1997) | CRED Framework (2016) |
|---|---|---|
| Primary Scope | Toxicological & Ecotoxicological data [3]. | Aquatic ecotoxicity data (with extensions for nano, behavior, sediment) [8] [10]. |
| Evaluation Dimensions | Reliability (4 categories) [5]. | Reliability (20 criteria) & Relevance (13 criteria) [8] [2]. |
| Guidance Detail | Limited, high-level criteria [2]. | Extensive guidance for each criterion [8]. |
| Bias Toward GLP/Guidelines | High; GLP/guideline studies are favored for top scores [2]. | Reduced; evaluates methodological soundness independent of formal compliance [9]. |
| Outcome Transparency | Single category score [5]. | Detailed qualitative summary of strengths/weaknesses in reliability and relevance [2]. |
| Regulatory Status | Embedded in REACH guidance & IUCLID [1] [5]. | Piloted in EU EQS revisions and scientific databases (e.g., NORMAN) [9]. |
A major ring test involving 75 risk assessors from 12 countries found that participants perceived the CRED method as more accurate, consistent, transparent, and less dependent on subjective expert judgment than the Klimisch method [8] [2]. CRED's structured criteria aim to make the evaluation process more reproducible and its conclusions more transparent.
Diagram 2: Conceptual Evolution from Klimisch to CRED
Purpose: To systematically assign a reliability score to a toxicological/ecotoxicological study for inclusion in a regulatory dossier (e.g., REACH, biocides).
Materials & Tools:
Procedure:
The Scientist's Toolkit: Essential Resources for Study Evaluation
| Tool/Resource | Function in Evaluation | Key Features & Notes |
|---|---|---|
| OECD Guidelines for the Testing of Chemicals | The international benchmark for test methodology. Used as the primary reference to assess study design compliance [5]. | Provides detailed protocols for specific endpoints (e.g., acute toxicity, repeated dose). |
| ToxRTool (ECVAM) | An Excel-based tool to standardize the Klimisch scoring process for in vivo and in vitro studies [6] [7]. | Uses weighted criteria checklists to generate a score. Improves consistency between evaluators. |
| IUCLID Database | The regulatory data management system for chemicals under REACH and other frameworks. The Klimisch score is a mandatory field for each study record [1] [5]. | Ensures evaluation rationale is systematically archived and reviewed by authorities. |
| CRED Evaluation Method (Excel Tool) | A modern, criteria-based tool for evaluating aquatic ecotoxicity studies. Useful as a more detailed reference even when a Klimisch score is required [9] [10]. | Contains 20 reliability and 13 relevance criteria with extensive guidance. |
| GLP Principles (OECD Series) | Defines standards for organizational process and study conditions. Not a measure of scientific validity, but a key indicator of process quality for regulators [5]. | Focuses on data traceability, quality assurance, and reporting integrity. |
The Klimisch method emerged in 1997 as a pragmatic, harmonizing solution to an urgent regulatory need. By introducing a simple, four-tiered scoring system focused on reliability, it brought unprecedented structure to chemical hazard assessment and became deeply embedded in EU regulations like REACH [1] [5]. Its historical role as the backbone of data evaluation is undeniable. However, the evolution of best practices in scientific assessment has revealed its limitations, particularly regarding transparency, detail, and the separation of reliability from relevance. The development of the CRED framework represents a direct and evidence-based response to these limitations, offering a more granular, transparent, and consistent approach for the modern era [8] [2]. For today's researcher, understanding the Klimisch method is essential for navigating historical data and existing regulatory systems, while familiarity with CRED principles points toward the future of more robust and reproducible ecotoxicological risk assessment.
The evaluation of toxicological and ecotoxicological data is a cornerstone of regulatory hazard and risk assessment for chemicals, pharmaceuticals, and other substances [11]. For over two decades, the method proposed by Klimisch and colleagues in 1997 has served as the de facto standard for assessing study reliability within many regulatory frameworks, notably the European Union's REACH regulation [5] [12]. The Klimisch method assigns studies to one of four categories: "Reliable without restriction" (1), "Reliable with restriction" (2), "Not reliable" (3), and "Not assignable" (4) [5] [6]. Generally, only studies scoring 1 or 2 are used to cover an endpoint independently, while scores of 3 or 4 may serve as supporting evidence [5].
Despite its widespread adoption, the Klimisch method has faced sustained and growing criticism from the scientific community. Critiques center on its inconsistency among assessors, lack of detailed operational guidance, and an inherent bias favoring studies conducted under Good Laboratory Practice (GLP) [5] [11] [12]. These shortcomings can introduce subjectivity, limit the use of valuable peer-reviewed science, and ultimately compromise the transparency and robustness of regulatory decisions [11] [7].
In response, alternative, more structured evaluation tools have been developed. The most prominent is the Criteria for Reporting and Evaluating ecotoxicity Data (CRED) method [11] [12]. Developed through a collaborative, international effort, CRED aims to provide a transparent, consistent, and detailed framework for evaluating both the reliability and relevance of aquatic ecotoxicity studies [12]. This article details the emerging criticisms of the Klimisch method, contrasts its framework with the CRED approach, and provides application protocols to guide researchers and risk assessors in implementing robust study evaluation practices.
The criticisms of the Klimisch method can be organized into three principal, interconnected deficiencies that undermine its utility for modern, evidence-based risk assessment.
The Klimisch method provides only broad category descriptions, lacking a concrete checklist of criteria to assess [5] [11]. This absence of standardized operational guidance places a heavy burden on expert judgment, leading to low inter-assessor consistency. A study may be categorized as "reliable with restrictions" by one assessor and "not reliable" by another, creating significant discrepancies in the data used for hazard characterization [11]. This inconsistency directly threatens the reproducibility and fairness of regulatory outcomes, as the same evidence base can yield different conclusions depending on the assessor [12].
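The inter-assessor inconsistency described here can be quantified, for example, as the fraction of assessor pairs that assign the same category to a study. This is a simplified stand-in for formal agreement statistics (e.g., Fleiss' kappa); the ratings below are invented for illustration.

```python
from itertools import combinations

def pairwise_agreement(assignments):
    """Fraction of assessor pairs giving the same category to one study."""
    pairs = list(combinations(assignments, 2))
    return sum(a == b for a, b in pairs) / len(pairs)

# Four assessors rating the same study under each method (hypothetical)
klimisch_ratings = ["R2", "R2", "R3", "R4"]  # one agreeing pair of six
cred_ratings = ["R3", "R3", "R3", "R2"]      # three agreeing pairs of six
```

Under this toy measure the hypothetical CRED ratings agree in 3 of 6 pairs versus 1 of 6 for Klimisch, mirroring the consistency gap reported by the ring test.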
The method's original formulation is brief, offering minimal guidance on how to interpret its categories or evaluate specific study elements [7]. It does not explicitly assess fundamental study design criteria such as randomization, blinding, sample size calculation, or statistical power [5]. Furthermore, it entirely omits a structured process for evaluating a study's relevance—the extent to which its design and findings are appropriate for a specific regulatory question (e.g., assessing chronic effects of an endocrine disruptor using an acute mortality test) [11] [12]. This conflation or neglect of relevance forces assessors to rely on unspecified personal judgment.
The Klimisch method explicitly prioritizes studies performed "according to generally valid and/or internationally accepted testing guidelines (preferably performed according to GLP)" [5]. This has been criticized for institutionalizing a bias that equates procedural compliance with scientific validity [11] [12]. A GLP-compliant guideline study can receive a high reliability score even if it contains fundamental scientific flaws (e.g., excessive control mortality) [5] [11]. Conversely, a well-designed and thoroughly reported peer-reviewed study may be downgraded solely for not following GLP [12]. This bias can marginalize independent academic research and restrict the evidence base primarily to industry-submitted studies [11].
The CRED method was developed explicitly to address the flaws in the Klimisch approach. A comparative analysis highlights fundamental differences in structure, application, and outcome.
Table 1: Foundational Comparison of the Klimisch and CRED Evaluation Methods
| Aspect | Klimisch Method | CRED Evaluation Method |
|---|---|---|
| Primary Purpose | Assign a reliability category for regulatory acceptance [5]. | Evaluate reliability and relevance separately to support transparent risk assessment [12]. |
| Core Components | Four reliability categories (1-4) with brief descriptive definitions [5] [6]. | 20 reliability criteria and 13 relevance criteria, each with detailed guidance [12]. |
| Evaluation of Relevance | Not formally addressed [11]. | Structured evaluation across 13 criteria (e.g., test organism, endpoint, exposure scenario) [12]. |
| Basis for Decision | High-level expert judgment based on adherence to guidelines/GLP [5]. | Systematic assessment against predefined, detailed criteria [11]. |
| Tool Format | Descriptive text; often applied via tools like ToxRTool [6]. | Criteria checklist with extensive guidance; supported by worksheets [12]. |
| Handling of GLP | GLP/guideline compliance is a primary determinant of high reliability [5]. | GLP is one factor among many; scientific rigor and reporting quality are paramount [11]. |
Empirical evidence from a major international ring test underscores the practical impact of these differences. The test involved 75 risk assessors from 12 countries evaluating ecotoxicity studies using both methods [11].
Table 2: Key Quantitative Findings from the CRED-Klimisch Ring Test [11]
| Evaluation Metric | Finding | Implication |
|---|---|---|
| Inter-assessor Consistency | Lower consistency with Klimisch. Higher agreement among assessors when using the structured CRED criteria. | CRED reduces subjectivity and promotes harmonized assessments. |
| Perceived Accuracy & Transparency | A majority of ring test participants perceived CRED as more accurate and transparent. | Structured criteria build confidence in the evaluation process and its conclusions. |
| Dependence on Expert Judgement | CRED was perceived as less dependent on subjective expert judgement. | Reduces variability and potential for unconscious bias. |
| Practicality | CRED was rated as practical regarding time and use of criteria. | A more detailed method can be efficient and user-friendly. |
| Study Categorization Outcome | Evaluations could lead to different final classifications for the same study. | The choice of method can directly alter the data entering a risk assessment. |
Given the Klimisch method's lack of native guidance, the ToxRTool (Toxicological data Reliability Assessment Tool) is frequently used as an operational intermediary [6] [7]. The following protocol is recommended for a standardized assessment.
Objective: To assign a Klimisch reliability score (1-4) to a toxicological or ecotoxicological study report or publication.
Materials: Study to be evaluated; ToxRTool (Excel-based tool for in vivo or in vitro studies).
Procedure:
Workflow for Klimisch Scoring Using the ToxRTool Intermediary
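The ToxRTool itself is an Excel workbook, but its weighted-checklist logic can be sketched in code. All criteria names, weights, and cut-offs below are hypothetical placeholders for illustration, not the tool's actual values.

```python
# Hypothetical ToxRTool-style checklist: each criterion carries a weight,
# and the summed score of fulfilled criteria maps to a Klimisch category.
CRITERIA = {
    "test_substance_identified": 2,  # weights are invented for this sketch
    "guideline_method_used": 2,
    "endpoints_documented": 1,
    "statistics_reported": 1,
}

def klimisch_category(answers, thresholds=(5, 3)):
    """Map yes/no checklist answers to a category (hypothetical cut-offs)."""
    score = sum(w for c, w in CRITERIA.items() if answers.get(c))
    if score >= thresholds[0]:
        return 1  # reliable without restriction
    if score >= thresholds[1]:
        return 2  # reliable with restriction
    return 3      # not reliable

study = {"test_substance_identified": True, "guideline_method_used": True,
         "endpoints_documented": True, "statistics_reported": False}
```

The real tool additionally distinguishes "not assignable" (4) when documentation is too sparse to answer the checklist at all.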
The CRED method involves separate, parallel evaluations of reliability and relevance.
Objective: To transparently evaluate and document the reliability and relevance of an aquatic ecotoxicity study for a defined regulatory purpose.
Materials: Study to be evaluated; CRED evaluation worksheet (Excel); CRED guidance document [12].
Procedure:
Dual-Path Evaluation Workflow of the CRED Method
Conducting and evaluating toxicological studies requires both physical reagents and methodological tools. Below is a table of key solutions for professionals in this field.
Table 3: Key Research Reagent Solutions for Toxicology Study Execution and Evaluation
| Tool/Reagent Category | Specific Example/Name | Primary Function in Research/Evaluation |
|---|---|---|
| Evaluation Framework | CRED Evaluation Worksheets [12] | Provides the structured checklist and guidance for performing transparent reliability and relevance assessments. |
| Evaluation Framework | ToxRTool [5] [6] [7] | An Excel-based tool that operationalizes Klimisch scoring with defined criteria, aiding consistency. |
| Evaluation Framework | SciRAP Tool [7] | An online resource for evaluating and reporting in vivo (eco)toxicity studies, promoting structured assessment. |
| Reporting Guideline | CRED Reporting Recommendations [12] | A checklist of 50 criteria across 6 categories to guide researchers in publishing regulatory-ready studies. |
| In Silico Predictive Tool | QSAR Models (e.g., VEGA, EPI Suite) [13] | Provides predicted data for persistence, bioaccumulation, and toxicity to fill gaps, especially for data-poor substances. |
| Reference Management | IUCLID Database [5] | The standard software for compiling, evaluating, and submitting chemical data under REACH; includes Klimisch score fields. |
| Regulatory Guideline | OECD Test Guidelines [5] [11] | Internationally agreed test methods defining standard protocols for generating reliable safety data. |
| Quality System | Good Laboratory Practice (GLP) [14] | A quality management system ensuring studies are planned, performed, monitored, and reported to high standards of traceability. |
The Klimisch method played a historic role in introducing systematic thinking to toxicological data evaluation. However, its structural deficiencies—inconsistency, lack of guidance, and GLP bias—render it inadequate for contemporary demands for transparency, reproducibility, and comprehensive evidence integration in regulatory science [11] [12]. Empirical evidence demonstrates that the CRED evaluation method effectively mitigates these shortcomings by providing a detailed, criteria-based framework that separately assesses reliability and relevance [11].
For researchers, adopting the CRED reporting recommendations during study design and publication increases the likelihood that their work will meet regulatory standards [12]. For risk assessors and drug development professionals, transitioning from the Klimisch method to structured tools like CRED or SciRAP is a critical step toward more robust, objective, and defensible hazard and risk assessments. This evolution supports a broader and more rigorous evidence base, ultimately leading to better-informed decisions that protect human health and the environment.
The regulatory assessment of chemicals hinges on the robust evaluation of ecotoxicity studies. For decades, the Klimisch method, established in 1997, served as the cornerstone for determining study reliability within frameworks like REACH and the Water Framework Directive [2]. It categorizes studies as "reliable without restrictions," "reliable with restrictions," "not reliable," or "not assignable" [2]. However, mounting evidence has revealed critical shortcomings in this approach. Its limited criteria and lack of guidance for relevance evaluation lead to inconsistent assessments that depend heavily on individual expert judgment [2]. This inconsistency can directly impact risk assessments, potentially resulting in underestimated environmental hazards or unnecessary mitigation costs [2].
The Criteria for Reporting and Evaluating ecotoxicity Data (CRED) project emerged from a 2012 initiative to address these deficiencies [2]. Its genesis was driven by the need for a transparent, science-based, and harmonized method that could systematically evaluate both the reliability and relevance of aquatic ecotoxicity studies. Developed through international collaboration and expert consultation, CRED aims to strengthen the consistency and robustness of environmental hazard assessments, thereby supporting more reliable regulatory decisions [9]. This document provides detailed application notes and experimental protocols for implementing the CRED method, framed within a broader research thesis comparing it to the traditional Klimisch approach.
The fundamental difference between the Klimisch and CRED methods lies in their structure, granularity, and scope. The following table summarizes their core characteristics.
Table 1: Core Characteristics of the Klimisch and CRED Evaluation Methods
| Characteristic | Klimisch Method | CRED Method |
|---|---|---|
| Primary Focus | Reliability of toxicity/ecotoxicity studies [2]. | Reliability and relevance of aquatic ecotoxicity studies [2]. |
| Number of Reliability Criteria | 12-14 for ecotoxicity studies [2]. | 20 detailed evaluation criteria [2] [9]. |
| Number of Relevance Criteria | 0 (not formally addressed) [2]. | 13 specific criteria [2] [9]. |
| Guidance Provided | Limited, qualitative summary [2]. | Comprehensive guidance for each criterion [2]. |
| Basis for Evaluation | General expert judgement based on broad categories. | Specific, documented criteria aligned with OECD test guideline reporting requirements [2]. |
| Outcome | A single reliability category (e.g., "reliable with restrictions"). | Separate, qualitative summaries for reliability and relevance, with documented justification [2]. |
The CRED method's enhanced detail is visualized in the following workflow, which outlines the sequential steps for a comprehensive study evaluation.
CRED Evaluation Workflow
The comparative efficacy of the CRED and Klimisch methods was empirically validated through a formal, two-phase international ring test [2]. The following protocol details the experimental design and procedure.
Objective: To characterize differences in study categorization, consistency, and user perception between the Klimisch and CRED evaluation methods.
Materials & Resources:
Procedure:
Phase I – Klimisch Evaluation (November-December 2012):
Phase II – CRED Evaluation (March-April 2013):
Data Analysis:
The ring test yielded quantitative and qualitative data demonstrating the advantages of the CRED framework. Key results are consolidated below.
Table 2: Key Quantitative Outcomes from the CRED-Klimisch Ring Test
| Metric | Klimisch Method Results | CRED Method Results | Implication |
|---|---|---|---|
| Participant Consensus | Lower inter-evaluator agreement on study reliability [2]. | Higher consistency among evaluators for both reliability and relevance judgments [2]. | CRED reduces subjective interpretation, promoting harmonization. |
| Evaluation Scope | Focused solely on reliability; relevance not systematically assessed [2]. | Comprehensive assessment of reliability (20 criteria) and relevance (13 criteria) [2]. | Enables more scientifically robust and fit-for-purpose study selection. |
| Bias Toward GLP | Strong preference for GLP (Good Laboratory Practice) studies, potentially overlooking flaws [2]. | Criteria-driven assessment that evaluates methodological soundness regardless of GLP status [2]. | Facilitates the inclusion of high-quality peer-reviewed literature, expanding the data pool. |
| Perceived Utility | Viewed as dependent on expert judgement [2]. | Perceived as more accurate, transparent, and practical by a majority of ring test participants [2]. | Higher user acceptance and confidence in evaluation outcomes. |
The following diagram illustrates the conceptual shift from the Klimisch method's linear, judgment-based process to CRED's structured, criteria-based analysis.
Methodological Comparison: Expert vs. Criteria-Driven
Objective: To perform a standardized, transparent evaluation of the reliability and relevance of an aquatic ecotoxicity study.
Materials:
Procedure:
CRED is designed for integration into systematic review processes. In a Weight-of-Evidence assessment, multiple studies are evaluated individually using CRED. The transparent outputs—specific ratings and justifications—allow for the explicit, defensible weighing of studies. A study with "High" reliability and "High" relevance would typically carry more weight than one with "Medium" reliability and "Low" relevance for a particular context. This moves beyond the Klimisch-based binary acceptance/rejection, enabling nuanced, tiered use of available science.
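The tiered weighting described above can be made concrete with a small sketch. The numeric mapping from CRED ratings to weights is an assumption for illustration only; it is not part of the CRED method, which deliberately reports qualitative summaries.

```python
# Assumed mapping from qualitative ratings to numeric weights (illustrative).
RATING_WEIGHT = {"High": 1.0, "Medium": 0.5, "Low": 0.2}

def study_weight(reliability, relevance):
    """Combine the two CRED ratings multiplicatively (one possible choice)."""
    return RATING_WEIGHT[reliability] * RATING_WEIGHT[relevance]

# Hypothetical studies entering a weight-of-evidence assessment.
studies = [
    ("Study A", "High", "High"),
    ("Study B", "Medium", "Low"),
]
weights = {name: study_weight(rel, relv) for name, rel, relv in studies}
```

A multiplicative combination is just one design choice; an assessor might instead use the lower of the two ratings, or keep the dimensions separate, depending on the regulatory context.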
Table 3: Key Research Reagent Solutions for CRED Implementation
| Tool/Resource | Function in CRED Evaluation | Source/Availability |
|---|---|---|
| CRED Excel Evaluation Tool | The primary instrument containing all criteria, guidance, and fields for documenting the evaluation [9]. | Freely available for download from the CRED project resources [9]. |
| OECD Test Guidelines | The international standard for test methodology. Used as the benchmark for evaluating study design and reporting adequacy against the CRED reliability criteria [2]. | OECD Publishing. |
| GLP Principles | A quality system for non-clinical studies. Understanding GLP helps evaluate study conduct aspects but, per CRED, does not override specific scientific quality criteria [2]. | OECD Series on Principles of GLP. |
| Systematic Review Software | Platforms (e.g., HAWC, DistillerSR) can be configured to incorporate CRED criteria for blinding, managing, and documenting evaluations across a large evidence base. | Various commercial and open-source platforms. |
| Chemical-Specific Guidance | Documents like the EU's Technical Guidance for deriving Environmental Quality Standards (TGD-EQS) provide the context for determining study relevance [9]. | European Commission and other regulatory bodies. |
The genesis of CRED represents a paradigm shift from a reliance on expert judgment to a structured, criteria-based transparency model. As demonstrated by the ring test, it provides a more consistent, detailed, and balanced framework for evaluating ecotoxicity data than the Klimisch method [2]. Its ongoing adoption in regulatory pilots, such as the revision of the EU's Technical Guidance Document and its use by the Joint Research Centre, underscores its utility in promoting international harmonization [9].
For researchers and assessors, adopting CRED mitigates the subjectivity inherent in the Klimisch approach, leading to more defensible and reproducible hazard assessments. The provision of separate reliability and relevance summaries offers nuanced insight that directly informs Weight-of-Evidence decisions. Future research directions include expanding the CRED principles to terrestrial ecotoxicity endpoints and further automating the evaluation process within systematic review platforms.
In regulatory ecotoxicology, the quality of data used for hazard and risk assessment is paramount. Two core concepts underpin this evaluation: reliability (the inherent quality of a study based on its methodology and reporting) and relevance (the appropriateness of the data for a specific regulatory purpose)[reference:0]. Regulatory frameworks like REACH, the US EPA, and the Water Framework Directive mandate that studies undergo a formal evaluation of these attributes before use.
For decades, the Klimisch method (1997) has been the dominant evaluation system. It categorizes study reliability but offers limited criteria and no formal guidance for assessing relevance, leading to reliance on expert judgment and potential inconsistency[reference:1]. In response, the Criteria for Reporting and Evaluating Ecotoxicity Data (CRED) method was developed to provide a more transparent, detailed, and consistent framework for evaluating both reliability and relevance[reference:2].
This document, framed within a thesis comparing the Klimisch and CRED methodologies, provides detailed application notes and experimental protocols derived from the seminal ring-test comparison study.
The following tables summarize key data from the ring test involving 75 risk assessors from 12 countries, which directly compared the two methods[reference:5].
| Characteristic | Klimisch Method | CRED Evaluation Method |
|---|---|---|
| Data Type | General toxicity & ecotoxicity | Aquatic ecotoxicity |
| Number of Reliability Criteria | 12–14 (ecotoxicity) | 20 (evaluating) / 50 (reporting) |
| Number of Relevance Criteria | 0 | 13 |
| OECD Reporting Criteria Included | 14 of 37 | 37 of 37 |
| Additional Guidance | No | Yes, extensive |
| Evaluation Summary | Qualitative (reliability only) | Qualitative (reliability & relevance)[reference:6] |

| Reliability Category | Klimisch Method (% of evaluations) | CRED Method (% of evaluations) |
|---|---|---|
| Reliable without restrictions (R1) | 8% | 2% |
| Reliable with restrictions (R2) | 45% | 24% |
| Not reliable (R3) | 42% | 54% |
| Not assignable (R4) | 6% | 20%[reference:7] |

| Relevance Category | Klimisch Method (% of evaluations) | CRED Method (% of evaluations) |
|---|---|---|
| Relevant without restrictions (C1) | 32% | 57% |
| Relevant with restrictions (C2) | 61% | 35% |
| Not relevant (C3) | 7% | 8%[reference:8] |

| Confidence Level | Klimisch Method (% of respondents) | CRED Method (% of respondents) |
|---|---|---|
| Very confident / Confident | 37% | 72%[reference:9] |
Statement: "The method is accurate, applicable, consistent, transparent, and less dependent on expert judgement."
This protocol details the two-phase ring test designed to compare the Klimisch and CRED evaluation methods.
To characterize differences in outcomes and user perception between the Klimisch and CRED methods for evaluating the reliability and relevance of aquatic ecotoxicity studies.
The following table lists key materials essential for conducting standardized aquatic ecotoxicity tests, which form the basis of the studies evaluated by the Klimisch and CRED methods.
| Item | Function & Relevance to Evaluation |
|---|---|
| Standard Test Organisms (e.g., Daphnia magna, Pseudokirchneriella subcapitata, Danio rerio) | Required for OECD/EPA guideline tests. Study reliability is assessed based on proper organism identification, husbandry, and health[reference:20]. |
| Reference Toxicants (e.g., Potassium dichromate, Sodium lauryl sulfate) | Used in periodic quality control tests to demonstrate organism sensitivity and test system validity—a key reliability criterion. |
| Culture Media & Reconstituted Water (e.g., ISO, OECD standard media) | Standardized exposure media ensure reproducibility. Deviations from recommended compositions can affect reliability evaluations[reference:21]. |
| Analytical Grade Test Substances | Purity and stability of the test substance must be characterized. Lack of analytical confirmation of exposure concentration is a common reason for downgrading reliability[reference:22]. |
| Solvent Controls (e.g., Acetone, DMSO) | Necessary for testing poorly soluble substances. Exceeding OECD-recommended solvent concentrations can render a study "not reliable"[reference:23]. |
| Positive & Negative Control Articles | Essential for validating test performance and distinguishing treatment effects from background variability. |
| Data Reporting Templates (e.g., CRED checklist) | Facilitate comprehensive reporting of all methodological details (50 CRED criteria), directly improving the clarity and evaluability of a study[reference:24]. |
The transition from the Klimisch method to the CRED framework represents a significant evolution in regulatory ecotoxicology. CRED's structured, criteria-based approach enhances transparency, reduces undesirable reliance on expert judgment, and improves consistency among assessors. For researchers, adhering to detailed reporting recommendations (like those provided by CRED) increases the likelihood that their work will be deemed reliable and relevant for regulatory use. For risk assessors, using CRED provides greater confidence in evaluation outcomes, contributing to more robust and defensible hazard and risk assessments across global regulatory frameworks.
The evaluation of study reliability is a cornerstone of chemical hazard and risk assessment. In 1997, Klimisch and colleagues introduced a standardized four-category scoring system to assess the reliability of toxicological and ecotoxicological data[reference:0]. This method, widely adopted by regulatory authorities, categorizes studies as "reliable without restrictions," "reliable with restrictions," "not reliable," or "not assignable." Despite its widespread use, the Klimisch method has been criticized for lacking detailed criteria and guidance, leading to inconsistencies among assessors. This prompted the development of the Criteria for Reporting and Evaluating Ecotoxicity Data (CRED) method, which offers a more structured and transparent framework[reference:1]. This article provides detailed application notes and protocols for the Klimisch method, framed within the broader thesis of comparing its performance and utility against the CRED evaluation system.
The Klimisch method assigns a reliability score based on the study's adherence to standardized methodologies and the completeness of its documentation[reference:2].
The core of the method is the four-tier classification, detailed in Table 1.
Table 1: Klimisch Scoring System Categories and Assignment Criteria[reference:3]
| Score | Description | Assignment Criteria (Excerpt from IUCLID) |
|---|---|---|
| 1 | Reliable without restriction | Guideline study (preferably GLP); comparable to guideline study; test procedure in accordance with national standard methods or generally accepted scientific standards and described in detail. |
| 2 | Reliable with restriction | Guideline study without detailed documentation; guideline study with acceptable restrictions; test procedure in accordance with national standard methods with acceptable restrictions; well-documented study meeting generally accepted scientific principles. |
| 3 | Not reliable | Study has significant methodological deficiencies or uses an unsuitable test system. |
| 4 | Not assignable | Information is insufficient for assessment (e.g., abstract, secondary literature). |
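The four-tier logic of Table 1 can be expressed as a short decision function. The Python sketch below uses simplified boolean study attributes as stand-ins for the IUCLID criteria; it illustrates the classification order, not an official implementation.

```python
# Hedged sketch of the Table 1 decision logic. The boolean attributes are
# simplifications of the IUCLID assignment criteria, for illustration only.

def klimisch_score(sufficient_info: bool,
                   guideline_or_standard: bool,
                   well_documented: bool,
                   major_deficiencies: bool) -> int:
    """Assign a Klimisch category (1-4) from simplified study attributes."""
    if not sufficient_info:
        return 4  # Not assignable (e.g., abstract or secondary literature)
    if major_deficiencies:
        return 3  # Not reliable
    if guideline_or_standard and well_documented:
        return 1  # Reliable without restriction
    return 2      # Reliable with restriction (e.g., well-documented non-guideline study)

# A detailed guideline study scores 1; a well-documented
# non-guideline study falls to 2.
score_guideline = klimisch_score(True, True, True, False)
score_academic = klimisch_score(True, False, True, False)
```

Note how much weight the single `guideline_or_standard` flag carries here; this is precisely the GLP/guideline bias the CRED criteria were designed to unpack into explicit, individually scored questions.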
Protocol 1: Step-by-Step Klimisch Evaluation
The CRED method was developed to provide a more detailed, consistent, and transparent tool for evaluating aquatic ecotoxicity studies. It expands upon the Klimisch framework by introducing explicit criteria for both reliability and relevance[reference:5].
CRED uses four reliability categories analogous to Klimisch scores: Reliable without restrictions (R1), Reliable with restrictions (R2), Not reliable (R3), and Not assignable (R4)[reference:6]. Detailed descriptions are provided in Table 2.
Table 2: CRED Reliability Categories[reference:7]
| Score | Description |
|---|---|
| R1 | Reliable without restrictions: All critical reliability criteria for this study are fulfilled. The study is well designed and performed without flaws affecting reliability. |
| R2 | Reliable with restrictions: The study is generally well designed and performed, but some minor flaws in documentation or setup may be present. |
| R3 | Not reliable: Not all critical reliability criteria are fulfilled. The study has clear flaws in design and/or performance. |
| R4 | Not assignable: Information needed to assess the study is missing (e.g., insufficient experimental details in an abstract or secondary literature). |
The CRED method is defined by a set of 20 reliability criteria and 13 relevance criteria, providing a structured checklist for evaluators[reference:8]. The reliability criteria span six categories: General information, Test setup, Test compound, Test organism, Exposure conditions, and Statistical design & biological response[reference:9].
Table 3: CRED Relevance Categories
Protocol 2: Step-by-Step CRED Evaluation
A ring test involving 75 risk assessors from 12 countries directly compared the two methods[reference:11]. Key comparative data are summarized below.
Table 4: General Characteristics of Klimisch and CRED Evaluation Methods[reference:12]
| Characteristic | Klimisch | CRED |
|---|---|---|
| Data type | Toxicity and ecotoxicity | Aquatic ecotoxicity |
| Number of reliability criteria | 12–14 (ecotoxicity) | Evaluating: 20 (Reporting: 50) |
| Number of relevance criteria | 0 | 13 |
| Number of OECD reporting criteria included | 14 (of 37) | 37 (of 37) |
| Additional guidance | No | Yes |
| Evaluation summary | Qualitative for reliability | Qualitative for reliability and relevance |
Table 5: Ring Test Results - Reliability Evaluation Outcomes[reference:13]
| Reliability Category | Klimisch Method (% of evaluations) | CRED Method (% of evaluations) |
|---|---|---|
| Reliable without restrictions | 8% | 2% |
| Reliable with restrictions | 45% | 24% |
| Not reliable | 42% | 54% |
| Not assignable | 6% | 20% |
Table 6: Ring Test Results - Practicality and Perception
Protocol 3: Ring Test Design for Comparing Evaluation Methods
This protocol is based on the CRED ring test methodology[reference:20].
Diagram 1: Klimisch Scoring Decision Workflow
Diagram 2: CRED Evaluation Systematic Process
Diagram 3: Comparative Decision Pathways: Expert vs. Criteria-Driven
Table 7: Essential Materials for Ecotoxicity Study Evaluation
| Item Category | Specific Item | Function/Purpose |
|---|---|---|
| Reference Documents | OECD/ISO Test Guidelines (e.g., 201, 210, 211) | Provide standardized methodology against which studies are evaluated. |
| | Good Laboratory Practice (GLP) Principles | Benchmark for assessing study conduct and documentation quality. |
| | REACH Guidance Documents | Define regulatory context and requirements for reliability/relevance. |
| Evaluation Tools | Klimisch Score Criteria Table (Table 1) | Quick reference for assigning Klimisch categories. |
| | CRED Evaluation Excel Tool | Structured checklist for systematic, transparent assessment. |
| | ToxRTool (ECVAM) | Excel-based tool that provides detailed criteria leading to a Klimisch category[reference:21]. |
| Test Materials | Certified Reference Substances | Ensure test substance purity and identity are verifiable. |
| | Control Substances (Solvent, Positive/Negative) | Essential for validating test system performance. |
| | Cultured Test Organisms (e.g., Daphnia magna, algae) | Standardized biological material for toxicity testing. |
| Laboratory Equipment | Analytical Chemistry Equipment (HPLC, GC-MS) | For verifying test substance concentrations during exposure. |
| | Environmental Chambers | To maintain stable temperature, light, and pH conditions. |
| Software | Statistical Analysis Software (e.g., R, GraphPad Prism) | To evaluate dose-response data and calculate endpoints (EC50, NOEC). |
| | Reference Management Software | To organize and cite evaluated studies. |
The Klimisch four-category scoring system provided a foundational step toward standardizing reliability assessments in regulatory toxicology. However, its reliance on expert judgment and lack of detailed criteria can lead to inconsistency. The CRED evaluation method addresses these shortcomings by offering a structured, criteria-based framework that encompasses both reliability and relevance, resulting in greater transparency and consistency among assessors. For robust regulatory decision-making, especially where data inclusivity is valued, the CRED method presents a scientifically rigorous alternative or successor to the Klimisch approach. The choice of method ultimately depends on the need for speed versus the demand for detailed, defensible, and harmonized study evaluations.
The regulatory evaluation of ecotoxicity studies is fundamental for deriving Predicted-No-Effect Concentrations (PNECs) and Environmental Quality Standards (EQSs) [12]. For decades, the predominant method for this task has been the system proposed by Klimisch and colleagues in 1997 [2]. While representing an initial step toward systematic evaluation, the Klimisch method has faced sustained criticism for its lack of detail, insufficient guidance, and failure to ensure consistency among different risk assessors [12] [2]. Its broad categorization of studies as "reliable without restrictions," "reliable with restrictions," "not reliable," or "not assignable" leaves considerable room for subjective expert judgment [2]. Furthermore, it has been criticized for favoring Good Laboratory Practice (GLP) and standardized guideline studies, potentially excluding pertinent and high-quality data from the peer-reviewed scientific literature [12] [15].
The Criteria for Reporting and Evaluating Ecotoxicity Data (CRED) framework was developed to address these shortcomings. Its primary aim is to improve the reproducibility, transparency, and consistency of reliability and relevance evaluations across different regulatory frameworks, countries, and individual assessors [12] [9]. The core innovation of CRED is its detailed, criteria-based approach, comprising 20 reliability criteria and 13 relevance criteria, supported by extensive guidance [12]. This structured methodology provides a more granular and transparent alternative to the Klimisch method, reducing ambiguity and promoting harmonized data assessment in environmental hazard and risk evaluations [2].
The CRED framework is built on clear, distinct definitions. Reliability refers to the inherent scientific quality of a study, assessing the clarity, plausibility, and reproducibility of its experimental procedure and findings. Relevance, in contrast, is purpose-dependent and evaluates how appropriate the data and test are for a specific hazard identification or risk characterization [12] [2]. A study can be highly reliable but irrelevant for a particular assessment, and vice versa [12].
The 20 reliability criteria are organized into key categories that scrutinize every aspect of a study's design, conduct, and reporting. The 13 relevance criteria ensure the study's purpose, test organism, substance, endpoint, and exposure conditions align with the specific regulatory question at hand [12]. The comprehensive nature of these criteria is summarized below.
Table 1: Overview of CRED Reliability and Relevance Criteria Categories
| Evaluation Dimension | Number of Criteria | Primary Focus Areas |
|---|---|---|
| Reliability | 20 | Test substance characterization, test organism details, exposure system & conditions, study design & methodology, data reporting & statistical analysis [12]. |
| Relevance | 13 | Assessment purpose, test organism appropriateness, exposure pathway, measured endpoints, and ecological realism of the tested concentration and duration [12]. |
A systematic workflow is essential for consistent application. The following protocol, derived from the CRED method and supporting literature, details the steps for evaluators [12] [2].
1. Definition of Assessment Purpose: Clearly articulate the regulatory context (e.g., derivation of an EQS, pesticide authorization) and the specific ecological compartment (e.g., freshwater, marine) and protection goals involved. This defines the lens for all relevance evaluations [16].
2. Initial Screening & Gateway Check: Perform a preliminary scan of the study's title, abstract, and materials section. Verify the presence of minimum information: correct test organism, relevant substance, reported ecotoxicological endpoint, and basic methodological description. Studies failing this gateway are not evaluated further unless missing information can be obtained [16].
3. Independent Reliability and Relevance Assessment:
4. Overall Categorization and Integration: Synthesize the individual ratings into overall reliability and relevance categories (e.g., Reliable/Relevant Without Restrictions, With Restrictions, Not Reliable/Relevant, Not Assignable). Finally, integrate these two judgments to determine the study's overall usability for the defined purpose [16]. A study must be both reliable and relevant to be usable without restrictions.
5. Documentation: Transparently document all scores, justifications, and final conclusions. This creates an audit trail, which is critical for regulatory transparency and for reconciling differences between evaluators [12].
Diagram: CRED Study Evaluation Workflow. The process begins by defining the purpose, proceeds through parallel reliability and relevance assessments informed by the purpose, and concludes with integrated categorization and documentation [12] [16].
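The integration in step 4 can be sketched as a simple mapping. The rule below, where a study is usable without restrictions only when both dimensions are "without restrictions", is an assumption consistent with the text above rather than an official CRED rule table.

```python
# Illustrative sketch of step 4: combining a CRED reliability category
# (R1-R4) with a relevance category (C1-C3) into an overall usability
# judgment. The precedence rules here are assumptions for the example.

def overall_usability(reliability: str, relevance: str) -> str:
    """Integrate the two independent CRED judgments for a defined purpose."""
    if reliability == "R4":
        return "Not assignable"          # cannot be judged; seek missing information
    if reliability == "R3" or relevance == "C3":
        return "Not usable"              # fails on either dimension
    if reliability == "R1" and relevance == "C1":
        return "Usable without restrictions"
    return "Usable with restrictions"    # at least one dimension has restrictions

# A reliable study tested on the wrong compartment is still not usable:
verdict = overall_usability("R1", "C3")
```

The asymmetry is deliberate: high reliability cannot compensate for lack of relevance, and vice versa, which is exactly the separation of judgments CRED enforces.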
The comparative superiority of CRED over the Klimisch method was demonstrated through a structured ring test [2]. The protocol for this validation is as follows:
1. Participant Selection: Recruit a diverse cohort of risk assessors from industry, academia, consultancy, and government institutions across multiple countries to ensure representativeness [12] [2].
2. Study Selection & Assignment: Curate a set of 8 ecotoxicity studies from the peer-reviewed literature covering various organisms (algae, Daphnia, fish, macrophytes), substance classes (pharmaceuticals, pesticides, industrial chemicals), and endpoints (acute, chronic) [2]. Assign each participant two unique studies to evaluate in each phase to prevent learning bias.
3. Phased Evaluation:
4. Data Collection & Analysis: Collect completed evaluations, including final categories and written comments. Analyze for:
- Inter-evaluator consistency for the same study within each method.
- Perceived utility of each method via participant questionnaires (clarity, ease of use, time required).
- Critical analysis of discrepancies to refine the CRED criteria [12] [2].
5. Method Refinement: Use quantitative results and qualitative feedback from Phase II to fine-tune the wording and guidance of the CRED criteria, finalizing the published version [12].
Table 2: Ring Test Comparison of Klimisch and CRED Methods [2]
| Comparison Aspect | Klimisch Method | CRED Method | Implication from Ring Test |
|---|---|---|---|
| Scope of Evaluation | Primarily reliability; vague relevance consideration. | Explicit, separate evaluation of reliability (20 crit.) and relevance (13 crit.). | CRED provides a more comprehensive and structured assessment. |
| Guidance Specificity | Low; minimal descriptive guidance for criteria. | High; extensive guidance text for applying each criterion. | Reduced ambiguity and lower dependency on expert judgment with CRED. |
| Basis for Categorization | Holistic expert judgment based on 12-14 broad criteria. | Summative judgment based on scoring many specific criteria. | CRED evaluations were perceived as more accurate, consistent, and transparent. |
| Inherent Bias | Criticized for bias toward GLP/guideline studies. | Criteria are applied equally to guideline and non-guideline studies. | CRED facilitates the inclusion of high-quality peer-reviewed literature. |
| Participant Perception | Less consistent, more subjective. | More consistent, practical, and useful for regulatory application. | CRED was endorsed as a suitable replacement for Klimisch. |
Implementing and evaluating studies under the CRED framework requires an understanding of core experimental materials. The following table lists key reagent solutions and their functions in standard aquatic tests, the quality of which is scrutinized under CRED's reliability criteria.
Table 3: Key Research Reagent Solutions in Aquatic Ecotoxicity Testing
| Reagent/Material | Function in Ecotoxicity Testing | CRED Reliability Consideration |
|---|---|---|
| Reconstituted Standardized Test Water (e.g., ISO, OECD recipes) | Provides a consistent, defined medium for exposing test organisms, controlling water hardness, pH, and ion composition. | Critical for reporting exposure conditions (Criterion R-10). Deviations must be justified and documented [12]. |
| Test Substance Stock Solution (in solvent if needed) | The concentrated source of the chemical for serial dilution to create exposure concentrations. | Purity, concentration verification, and solvent identity/volume must be reported (Criteria R-2, R-3). Solvent controls are mandatory [12]. |
| Algal Growth Medium (e.g., OECD TG 201 medium) | Supplies essential macro and micronutrients for unicellular algal growth in tests assessing inhibition of growth rate. | Composition must be specified. Nutrient levels can affect substance bioavailability and toxicity [12]. |
| Daphnia Food Suspension (e.g., algae, yeast, trout chow) | Sustains test organisms during chronic (>48h) reproduction and survival tests. | Type, quantity, and feeding regimen must be reported. Inadequate feeding is a common reason for study de-rating [12]. |
| Aeration Systems (air pumps, stones) | Maintains dissolved oxygen levels above critical thresholds for fish and invertebrates. | Aeration status must be reported. Lack of aeration can cause hypoxia, confounding toxicity results [12]. |
| Positive Control Substance (e.g., K₂Cr₂O₇ for Daphnia, 3,5-DCP for algae) | Used periodically to verify the sensitivity of the test organism population is within historical ranges. | Use of positive controls, while not always mandatory, strengthens the reliability of the biological response [12]. |
The evolution from Klimisch to CRED, and its extension to exposure assessment with CREED, represents a paradigm shift toward greater structure and transparency in environmental data evaluation [16].
Diagram: Evolution of Data Evaluation Frameworks. The diagram shows the progression from the initial Klimisch method, through critique and development, to the detailed CRED framework for ecotoxicity data, and its recent extension to exposure data via the CREED framework [12] [2] [16].
The Klimisch method, introduced in 1997, established a foundational framework for evaluating the reliability of toxicological and ecotoxicological studies for regulatory purposes [12]. It provides a standardized categorization system where studies are classified as "reliable without restrictions," "reliable with restrictions," "not reliable," or "not assignable" [7]. For over two decades, this method has been integral to chemical risk assessments within frameworks like REACH and the Water Framework Directive [17]. However, as part of a broader thesis comparing methodological rigor, this protocol must be framed against its modern successor, the Criteria for Reporting and Evaluating Ecotoxicity Data (CRED) method. Research indicates that while the Klimisch method was a significant step forward, it has been criticized for lacking detailed guidance, leaving substantial room for expert judgment, and potentially introducing bias and inconsistency among assessors [12] [17]. The subsequent development of the CRED method, with its 20 reliability and 13 relevance criteria, was a direct response to these shortcomings, aiming for greater transparency, consistency, and detailed evaluation [9]. This application note details the procedural steps for conducting a Klimisch evaluation, while consistently contextualizing its practical use and limitations relative to the more granular CRED approach.
The following diagram outlines the sequential decision-making process for assigning a reliability category to a study using the Klimisch method.
The Klimisch method primarily assesses study reliability, defined as "the inherent quality of a test report or publication relating to preferably standardized methodology and the way the experimental procedure and results are described to give evidence of the clarity and plausibility of the findings" [7]. It does not provide formal criteria for evaluating study relevance—the extent to which data are appropriate for a specific hazard identification—which is a separate, subsequent consideration [12]. This stands in contrast to the CRED method, which integrates explicit relevance evaluation alongside reliability, offering a more comprehensive assessment framework [17].
A key characteristic of the Klimisch method is its strong weighting toward studies conducted under Good Laboratory Practice (GLP) and according to standardized test guidelines (e.g., OECD, EPA) [17]. This preference has been a point of criticism, as it may lead to the automatic elevation of guideline studies while potentially excluding sound, hypothesis-driven academic research from regulatory consideration [12] [18]. The CRED method attempts to rectify this by evaluating the scientific merit of the methodology itself, rather than its compliance pedigree [9].
Table 1: Core Characteristics of the Klimisch and CRED Evaluation Methods
| Aspect | Klimisch Method | CRED Method |
|---|---|---|
| Primary Focus | Reliability of study data [7]. | Reliability and relevance of study data [12]. |
| Guidance Detail | Limited criteria and guidance, leading to reliance on expert judgment [17]. | Detailed criteria (20 for reliability, 13 for relevance) with extensive guidance [12] [9]. |
| Evaluation Output | Single reliability category (R1, R2, R3, R4) [7]. | Separate reliability and relevance categories, with detailed criterion-level documentation [17]. |
| Bias Toward GLP | Strong; GLP/guideline studies often favored [17]. | Reduced; focuses on methodological soundness regardless of GLP status [12]. |
| Transparency | Low; categorization rationale may not be explicit [17]. | High; requires documentation for each criterion [9]. |
This protocol is derived from the original Klimisch publication and its application in comparative ring tests against the CRED method [17].
Table 2: Example Klimisch vs. CRED Evaluation Outcomes from a Ring Test
| Study Description | Klimisch Method Results | CRED Method Results | Interpretation |
|---|---|---|---|
| GLP fish test (Estrone) | 44% R1, 56% R2 [19]. | 16% R1, 21% R2, 63% R3 [19]. | Highlights Klimisch's GLP bias. CRED's detailed criteria led most evaluators to identify reliability flaws. |
| Typical Peer-Reviewed Study | High inconsistency; mixed R2/R3 ratings common [17]. | More consistent categorization due to explicit criteria [17]. | CRED's structure reduces subjectivity for non-GLP studies. |
| Poorly Reported Study | Categorized as R4 ("Not Assignable") [7]. | Allows detailed scoring of reported vs. unreported criteria, providing a more nuanced picture [20]. | CRED offers greater diagnostic value for identifying specific reporting gaps. |
Table 3: Key Research Reagent Solutions for Methodological Evaluation
| Item/Tool | Function in Evaluation | Consideration in Klimisch vs. CRED Context |
|---|---|---|
| Original Klimisch Publication | Provides the foundational definitions and logic for the four reliability categories [7]. | Essential for applying the classic method. Lacks operational detail found in later tools. |
| CRED Excel Tool | Freely available spreadsheet containing the 20 reliability and 13 relevance criteria with guidance [9]. | While designed for CRED, it serves as an excellent de facto checklist for aspects to consider during a Klimisch evaluation, enhancing thoroughness. |
| Test Guidelines (OECD, EPA) | Define standardized methodologies for specific toxicity tests [12]. | Central reference for Klimisch's compliance check. CRED uses them as a benchmark but does not mandate adherence. |
| Reporting Checklists (e.g., CRED Reporting Recommendations) | List of essential information that should be reported in an ecotoxicity study (50 criteria across 6 categories) [12]. | Invaluable for systematically assessing "reporting completeness" in both Klimisch and CRED evaluations. |
| Ring Test Comparative Data | Published results comparing evaluator consistency between Klimisch and CRED [17] [20]. | Critical for understanding the limitations and subjectivity inherent in the Klimisch method. |
This diagram maps the comparative pathway for analyzing a study using both the Klimisch and CRED methods, highlighting their divergent processes and endpoints.
Executing a Klimisch evaluation requires the assessor to navigate its inherent reliance on expert judgment. The protocol outlined here emphasizes the initial GLP/guideline filter and the broad, holistic assessment of scientific soundness. However, data from comparative ring tests are conclusive: this approach leads to lower consistency between different evaluators compared to the structured CRED method [17]. For instance, the percentage of fulfilled criteria for studies rated "reliable with restrictions" (R2) under CRED showed a standard deviation of 12%, indicating variability even with more guidance [20].
Therefore, when applying the Klimisch method in contemporary research, especially for a comparative thesis, it is imperative to:
The Klimisch method remains a historically important tool, and understanding its step-by-step application is crucial for interpreting a vast legacy of chemical risk assessments. However, for forward-looking research and regulation aiming for higher consistency, transparency, and integration of diverse scientific evidence, the structured, criteria-based approach exemplified by the CRED method represents the evolved standard [12] [9].
The evaluation of ecotoxicity data is a foundational step in the environmental risk assessment of chemicals, directly influencing the derivation of safe concentrations like Predicted-No-Effect Concentrations (PNECs) and Environmental Quality Standards (EQSs) [12]. For decades, the Klimisch method has been the predominant tool for this task, categorizing studies as "reliable without restrictions," "reliable with restrictions," "not reliable," or "not assignable" [17]. However, its reliance on broad criteria and expert judgment has been criticized for introducing bias, inconsistency, and a preference for industry-sponsored Good Laboratory Practice (GLP) studies over peer-reviewed literature [12] [17].
The Criteria for Reporting and Evaluating Ecotoxicity Data (CRED) method was developed to address these shortcomings [12]. Framed within a thesis comparing the Klimisch and CRED approaches, this article provides detailed application notes and protocols for implementing the CRED method. Evidence from a comprehensive ring test demonstrates that CRED offers a more detailed, transparent, and consistent framework for evaluating both the reliability and relevance of aquatic ecotoxicity studies, making it a scientifically robust successor for regulatory and research applications [17].
The CRED method is built on clear definitions. Reliability pertains to the intrinsic scientific quality of a study—its design, performance, and reporting—independent of its intended use. Relevance, however, is assessment-specific and refers to how appropriate the study's data is for a particular hazard identification or risk characterization purpose [12]. A study can be reliable but irrelevant (e.g., a high-quality soil toxicity test for an aquatic assessment) and vice versa [12].
The method's structure is defined by two core sets of criteria:
Unlike the Klimisch method's single judgment, CRED requires assessors to evaluate each criterion individually, selecting from predefined answers (e.g., "Yes," "No," "Partly," "Not Reported," "Not Applicable"). This granular approach forces a transparent and systematic examination of the study's strengths and weaknesses.
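This answer-per-criterion structure lends itself to simple, transparent bookkeeping. Below is a minimal sketch of such a worksheet record; the criterion texts and answers are illustrative paraphrases, not the official CRED wording:

```python
# Illustrative subset of criterion-level judgments. The criterion texts
# here are paraphrased examples, not the official CRED criteria.
reliability_answers = {
    "Test substance identified (purity, CAS)":       "Yes",
    "Exposure concentrations analytically verified": "Partly",
    "Appropriate controls included":                 "Yes",
    "Statistical methods described":                 "Not reported",
    "Organism source and life stage documented":     "Yes",
}

# Tally the predefined answers so the basis for the final expert
# judgment is explicit and auditable.
tally = {}
for answer in reliability_answers.values():
    tally[answer] = tally.get(answer, 0) + 1

fulfilled = tally.get("Yes", 0) / len(reliability_answers)
print(tally)                               # {'Yes': 3, 'Partly': 1, 'Not reported': 1}
print(f"{fulfilled:.0%} fully fulfilled")  # 60% fully fulfilled
```

The tally does not replace the final expert categorization; it simply makes the evidence behind it traceable, which is the point of the CRED worksheet.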
Table: Comparative Overview of the Klimisch and CRED Evaluation Methods
| Feature | Klimisch Method (1997) | CRED Method (2016) |
|---|---|---|
| Core Purpose | Evaluate reliability for regulatory compliance. | Evaluate reliability and relevance for robust risk assessment [12]. |
| Evaluation Categories | Reliability only (4 categories: R1-R4) [17]. | Separate, detailed evaluations for reliability and relevance [12]. |
| Number of Criteria | Limited, unspecified criteria leading to high expert judgment [17]. | 20 reliability criteria and 13 relevance criteria with detailed guidance [12]. |
| Guidance & Transparency | Minimal guidance; evaluations lack transparency [17]. | Extensive guidance for each criterion; evaluation is fully documented and reproducible. |
| Bias Toward GLP/Guidelines | Criticized for favoring GLP/guideline studies irrespective of scientific merit [12] [17]. | Evaluates scientific quality directly; guideline adherence is one factor among many. |
| Outcome Consistency | Low consistency between different assessors [17]. | High consistency demonstrated in ring testing [17]. |
The following workflow provides a standardized protocol for conducting a CRED evaluation.
Workflow for Conducting a CRED Evaluation
Step 1: Initial Relevance Screening. Before a full CRED evaluation, screen the study's title and abstract against the broad needs of your specific assessment (e.g., organism, endpoint, exposure type). Exclude clearly irrelevant studies at this stage [12].
Step 2: Systematic Data Extraction. For each study passing Step 1, extract detailed information into a standardized template. The CRED Reporting Recommendations, comprising 50 criteria across six categories, serve as an ideal checklist to ensure all necessary information on test substance, organism, design, conditions, and results is captured [12].
Step 3: Criterion-by-Criterion Evaluation. Using the extracted data, evaluate the study against each of the 20 reliability and 13 relevance criteria. For each criterion:
Step 4: Synthesis and Final Categorization. Synthesize the individual criterion judgments to assign an overall reliability category (R1-R4, matching Klimisch) and a relevance category (C1: Relevant without restrictions, C2: Relevant with restrictions, C3: Not relevant). The final categorization is a matter of expert judgment but must be directly and transparently traceable to the scores and justifications recorded in Step 3.
Step 5: Documentation. The completed evaluation form, with all criterion-level judgments and final categorizations, becomes the audit trail. This transparency is critical for regulatory acceptance and for reconciling differences between assessors.
The superiority of the CRED method was validated through a two-phase international ring test designed to directly compare it with the Klimisch method [17].
Objective: To quantitatively compare the consistency, transparency, and user perception of the Klimisch and CRED evaluation methods.
Design: A crossover design where participants evaluated different studies with each method to prevent bias from familiarization with a specific study [17].
Methodology of the CRED vs. Klimisch Ring Test [17]
Key Quantitative Results: The ring test yielded clear, quantitative evidence of CRED's advantages, as summarized in the table below.
Table: Key Quantitative Results from the CRED Ring Test [17]
| Study | Klimisch Method (Reliability Category Agreement) | CRED Method (Reliability Category Agreement) | Notable Outcome |
|---|---|---|---|
| Study A (Algal toxicity) | 73% agreement (R2) | 95% agreement (R2) | Higher consensus with CRED. |
| Study E (Fish endocrine test) | 44% R1, 56% R2 [19] | 16% R1, 21% R2, 63% R3 [19] | CRED flagged critical reliability flaws missed by Klimisch. |
| Overall Perception | Less consistent, more dependent on expert judgment. | 87% of users found it more accurate; 82% found it more consistent; 95% found it more transparent [17]. | Strong user preference for CRED. |
Table: Essential Research Reagents and Tools for CRED Evaluation
| Item Name | Function/Description | Application in CRED Protocol |
|---|---|---|
| CRED Evaluation Worksheet | Standardized Excel form listing the 20 reliability and 13 relevance criteria with dropdown answer options [12]. | The primary tool for conducting and documenting the step-by-step evaluation (Step 3). Ensures all criteria are addressed systematically. |
| CRED Reporting Template | Checklist of 50 reporting items across six categories (general, test design, substance, organism, exposure, statistics) [12]. | Used during data extraction (Step 2) to ensure all necessary information is collected from the primary study. |
| Detailed CRED Guidance Document | Provides definitions, examples, and decision rules for interpreting and scoring each evaluation criterion [12]. | The essential reference to ensure correct and consistent application of criteria, reducing subjective judgment. |
| Access to Full OECD Test Guidelines | Reference documents for standard testing protocols (e.g., OECD 210, 211). | Used as a benchmark to evaluate if study deviations from standard methods impact reliability. |
| Statistical Analysis Software | Software (e.g., R, GraphPad Prism) capable of re-analyzing raw data if reported. | Used to verify statistical calculations and endpoint derivations (e.g., EC50, NOEC) reported in the study, a key reliability criterion. |
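As an illustration of the last table row, a reported EC50 can be re-derived when concentration-response data are tabulated in the study. The sketch below fits a two-parameter log-logistic model; the data points are hypothetical and the model choice is an assumption (guideline analyses often use related log-logistic or probit forms):

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical concentration (mg/L) vs. fraction of growth inhibition,
# e.g. digitized from a published algal toxicity figure.
conc   = np.array([0.1, 0.3, 1.0, 3.0, 10.0, 30.0])
effect = np.array([0.02, 0.08, 0.25, 0.55, 0.85, 0.97])

def log_logistic(c, ec50, slope):
    """Two-parameter log-logistic concentration-response model."""
    return 1.0 / (1.0 + (ec50 / c) ** slope)

# Re-derive EC50 and slope by nonlinear least squares.
(ec50, slope), _ = curve_fit(log_logistic, conc, effect, p0=[1.0, 1.0])
print(f"Re-derived EC50 = {ec50:.2f} mg/L (slope = {slope:.2f})")
```

A large discrepancy between such a refit and the published EC50 is exactly the kind of finding that should be recorded against the relevant CRED reliability criterion.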
The evaluation of aquatic ecotoxicity data is a cornerstone of environmental risk assessment for chemicals, directly influencing the derivation of Predicted-No-Effect Concentrations (PNECs) and Environmental Quality Standards (EQSs) [12]. Historically, the Klimisch method has been widely used to assess study reliability, but it has faced criticism for being unspecific, lacking detailed criteria for relevance evaluation, and allowing substantial room for interpretative bias [12] [17]. This has led to inconsistencies where different risk assessors might categorize the same study differently, impacting regulatory decisions [17].
In response, the Criteria for Reporting and Evaluating Ecotoxicity Data (CRED) method was developed to provide a more transparent, consistent, and detailed framework [12] [9]. The CRED method enhances the evaluation process through two main components: a set of 20 reliability and 13 relevance criteria with extensive guidance, and a complementary set of 50 reporting recommendations to improve the quality of future studies [12].
This article provides a practical, comparative application of both methods within the context of a broader thesis on evaluation methodologies. It details experimental protocols, presents quantitative comparison data, and offers tools to equip researchers and assessors in implementing these frameworks effectively.
The fundamental differences between the Klimisch and CRED evaluation methods are structural and philosophical. The table below summarizes their key characteristics.
Table 1: Fundamental Comparison of the Klimisch and CRED Evaluation Methods
| Feature | Klimisch Method (1997) | CRED Evaluation Method (2016) |
|---|---|---|
| Primary Focus | Reliability of studies, particularly favoring GLP and standard guideline studies [12] [17]. | Integrated evaluation of both reliability and relevance, with broader applicability to peer-reviewed literature [12] [9]. |
| Evaluation Criteria | Limited, non-specific criteria for reliability. No defined criteria for relevance [17]. | 20 explicit reliability criteria and 13 explicit relevance criteria, each with detailed guidance [12]. |
| Output Categories | Reliability: 1) Reliable without restrictions, 2) Reliable with restrictions, 3) Not reliable, 4) Not assignable [17]. | Separate categories for Reliability & Relevance. Reliability uses the same four Klimisch categories. Relevance uses: 1) Relevant without restrictions, 2) Relevant with restrictions, 3) Not relevant [17]. |
| Guidance & Transparency | Minimal guidance, leading to high reliance on expert judgment and potential inconsistency [12] [17]. | High level of detail and prescribed guidance aims to reduce subjectivity and improve consistency and transparency [12] [9]. |
| Perception by Assessors | Criticized for bias and inconsistency. Ring test participants found CRED to be more accurate, applicable, consistent, and transparent [12]. | Viewed as a more robust and science-based tool for harmonized hazard and risk assessments across regulatory frameworks [17]. |
A two-phase ring test involving 75 risk assessors from 12 countries provided empirical data comparing the outcomes of the two methods [17]. The results demonstrate that the CRED method's structured criteria lead to more conservative and differentiated evaluations.
Table 2: Quantitative Outcomes from the CRED vs. Klimisch Ring Test [19] [17] [20]
| Study Description & Endpoint | Klimisch Method Reliability Categorization | CRED Method Reliability Categorization | Key Implications |
|---|---|---|---|
| Industry GLP study on fish (Danio rerio) chronic toxicity with estrone [19]. | 44% (4/9) Reliable without restrictions; 56% (5/9) Reliable with restrictions [19]. | 16% (3/19) Reliable without restrictions; 21% (4/19) Reliable with restrictions; 63% (12/19) Not reliable [19]. | CRED's detailed criteria flagged specific reliability issues that the Klimisch method overlooked, leading to a significantly stricter assessment of the same GLP study. |
| General Analysis of Categorization Consistency | Higher inconsistency among assessors due to vague criteria [17]. | Improved consistency. The average percentage of fulfilled reliability criteria decreased logically with each lower category: 93% (R1), 72% (R2), 60% (R3), 51% (R4) [20]. | CRED provides a measurable gradient of reliability linked to criteria fulfillment, enhancing transparency and predictability of evaluations. |
| Relevance Evaluation | Not formally addressed by the method [17]. | Explicitly evaluated. Average fulfillment was 84% for "Relevant without restrictions," 73% for "Relevant with restrictions," and 61% for "Not relevant" [20]. | CRED mandates a separate, criteria-based relevance check, ensuring the study's appropriateness for the specific assessment context is documented. |
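The criteria-fulfillment gradient described above can be reproduced from worksheet-level summaries. A minimal sketch, assuming a hypothetical set of (assigned category, fraction of criteria fulfilled) pairs rather than the actual ring-test data:

```python
from collections import defaultdict

# Hypothetical worksheet summaries: (assigned reliability category,
# fraction of the 20 reliability criteria fulfilled). Values are
# illustrative, not the ring-test results.
evaluations = [
    ("R1", 0.95), ("R1", 0.90), ("R2", 0.75), ("R2", 0.70),
    ("R3", 0.62), ("R3", 0.58), ("R4", 0.50), ("R4", 0.52),
]

by_category = defaultdict(list)
for category, fraction in evaluations:
    by_category[category].append(fraction)

# Average fulfillment per category: if the categorization tracks the
# criteria, this should decrease monotonically from R1 to R4, as the
# ring test observed (93%, 72%, 60%, 51%).
means = {cat: sum(v) / len(v) for cat, v in sorted(by_category.items())}
print(means)
```

Such a check makes the link between criteria fulfillment and final category measurable, which is the transparency gain claimed for CRED.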
Diagram 1: Comparative Workflow of Klimisch vs. CRED Evaluation Methods - This diagram contrasts the simpler, judgment-based Klimisch process with the CRED method's structured, criteria-driven parallel assessment of reliability and relevance.
This protocol outlines the general methodology for a standard chronic toxicity test with a freshwater invertebrate (e.g., Daphnia magna reproduction test), which forms the basis for many guideline studies (e.g., OECD 211) [12].
1. Test Organism Culturing:
2. Test Substance Preparation:
3. Experimental Design:
4. Endpoints and Measurements:
5. Data Analysis:
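Where raw or summary data are reported, the NOEC/LOEC from such a test can be cross-checked independently. A minimal sketch using hypothetical reproduction counts and Welch's t-tests with a Bonferroni correction, as a simplified stand-in for the Dunnett-type procedure typically applied under OECD 211:

```python
from scipy import stats

# Hypothetical Daphnia magna reproduction data (neonates per surviving
# female) for a control and four test concentrations (mg/L).
data = {
    0.0:  [102, 98, 110, 105, 99, 107],   # control
    0.1:  [101, 97, 104, 108, 100, 103],
    0.32: [95, 99, 96, 92, 101, 94],
    1.0:  [78, 82, 75, 80, 77, 81],
    3.2:  [41, 38, 45, 40, 36, 43],
}

control = data[0.0]
treatments = [c for c in data if c > 0]
alpha = 0.05 / len(treatments)  # Bonferroni-adjusted significance level

noec, loec = None, None
for conc in sorted(treatments):
    # One-sided Welch's t-test: is reproduction reduced vs. control?
    t, p = stats.ttest_ind(data[conc], control,
                           equal_var=False, alternative='less')
    if p < alpha and loec is None:
        loec = conc         # lowest concentration with a significant effect
    elif loec is None:
        noec = conc         # highest non-significant concentration so far

print(f"NOEC = {noec} mg/L, LOEC = {loec} mg/L")
```

With these illustrative data the sketch reports a NOEC of 0.1 mg/L and a LOEC of 0.32 mg/L; disagreement with the study's reported values would be recorded under the statistics-related reliability criteria.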
Testing substances of Unknown or Variable Composition, Complex reaction products, or Biological materials (UVCBs) poses significant challenges due to poor solubility, volatility, or instability [21] [22]. The following adaptations are critical.
1. Preliminary Phase: Characterization and Pre-testing [21] [22]:
2. Test Design Adaptations:
Aquatic microcosms simulate natural ecosystem interactions and are used for higher-tier risk assessment [23].
1. Microcosm Establishment:
2. Chemical Application and Monitoring:
3. Ecological Endpoint Measurement [23]:
Diagram 2: Aquatic Microcosm Experimental Workflow - This diagram outlines the three-phase process for conducting a higher-tier aquatic microcosm study, from ecosystem establishment to ecological endpoint analysis.
Table 3: Key Research Reagent Solutions for Aquatic Ecotoxicity Testing
| Item | Primary Function & Rationale |
|---|---|
| Reconstituted Standard Freshwater (e.g., ISO 6341, OECD TG 202/211 medium) | Provides a consistent, defined chemical matrix for culturing and testing, eliminating variability from natural water sources. Essential for reproducibility and guideline compliance [12]. |
| High-Quality Algal Cultures (e.g., Pseudokirchneriella subcapitata, Chlorella vulgaris) | Serves as a standardized food source for filter-feeding test organisms (e.g., Daphnia). Consistent nutritional quality is critical for healthy cultures and reliable sub-lethal (reproduction, growth) endpoints. |
| Analytical Grade Test Chemical & Certified Reference Standards | Ensures precise dosing and exposure verification. For UVCBs and difficult substances, a well-characterized batch sample and analytical standards for key constituents are mandatory for interpreting results [21] [22]. |
| Appropriate Solvent (e.g., Acetone, Dimethyl Formamide, Ethanol) | Used to prepare stock solutions of poorly water-soluble chemicals. Must be non-toxic to test organisms at the concentration used (typically ≤0.1 mL/L) and be consistent across all treatments [22]. |
| Water Quality Monitoring Kits/Probes (for pH, Dissolved Oxygen, Conductivity, Ammonia) | Critical for verifying acceptable test conditions. Deviations in water quality can induce stress and confound chemical toxicity results. Regular monitoring is a key reliability criterion in both Klimisch and CRED evaluations. |
| Preservation and Fixation Agents (e.g., Lugol's iodine, formaldehyde, RNAlater) | Used to preserve planktonic and microbial samples from microcosm or mesocosm studies for later community analysis (e.g., microscopy, DNA metabarcoding) [23]. |
| Solid Phase Extraction (SPE) Cartridges & HPLC/MS-Grade Solvents | Essential for pre-concentrating and analyzing trace levels of test chemicals and their transformation products in water samples from fate studies or tests with difficult substances [23]. |
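The ≤0.1 mL/L solvent ceiling noted in the table above constrains how concentrated the solvent stock must be. A minimal sketch of that back-calculation; the target concentration is illustrative:

```python
MAX_SOLVENT = 0.1e-3  # maximum solvent fraction: 0.1 mL solvent per L water

def min_stock_concentration(target_mg_per_l, solvent_fraction=MAX_SOLVENT):
    """Lowest solvent-stock concentration (mg/L) that reaches the target
    test concentration without exceeding the solvent ceiling."""
    return target_mg_per_l / solvent_fraction

# Example: a 0.5 mg/L test concentration needs a stock of at least
# 5000 mg/L when no more than 0.1 mL of solvent is added per litre.
print(min_stock_concentration(0.5))  # 5000.0
```

If the required stock concentration exceeds the substance's solubility in the chosen solvent, the study design itself becomes a reliability concern, which is why solvent handling is scrutinized under both evaluation methods.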
The regulatory assessment of chemicals requires reliable and relevant ecotoxicity data to derive Predicted-No-Effect Concentrations (PNECs) and Environmental Quality Standards (EQSs) [12]. A critical step in this process is evaluating the quality and applicability of individual studies, a task historically reliant on the method established by Klimisch et al. in 1997 [17]. While pioneering, the Klimisch method has been widely criticized for its lack of detail, insufficient guidance, and consequent over-reliance on subjective expert judgment, leading to inconsistent evaluations between assessors [12] [17]. This inconsistency can directly impact risk assessment outcomes, potentially leading to either underestimated environmental risks or unnecessary mitigation measures [17].
To address these shortcomings, the Criteria for Reporting and Evaluating Ecotoxicity Data (CRED) method was developed. CRED aims to improve the reproducibility, transparency, and consistency of reliability and relevance evaluations for aquatic ecotoxicity studies across regulatory frameworks [12]. This document provides a detailed comparative analysis of the two methods, grounded in empirical research, to highlight the operational pitfalls of the Klimisch approach and demonstrate the structured alternative offered by CRED.
A pivotal ring test involving 75 risk assessors from 12 countries directly compared the Klimisch and CRED methods [17]. Participants evaluated a battery of ecotoxicity studies using each method. The results quantitatively demonstrate significant differences in categorization consistency and user perception.
Table 1: Comparative Categorization of Study E (GLP Report on Fish Toxicity) [17] [19]
| Evaluation Method | Reliable Without Restrictions (R1) | Reliable With Restrictions (R2) | Not Reliable (R3) | Not Assignable (R4) | Mean Score (R1-R3) |
|---|---|---|---|---|---|
| Klimisch | 44% (4/9) | 56% (5/9) | 0% | 0% | 1.6 |
| CRED | 16% (3/19) | 21% (4/19) | 63% (12/19) | 0% | 2.5 |
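The mean scores in Table 1 follow directly from the category fractions (scoring R1 = 1, R2 = 2, R3 = 3). A minimal sketch of that weighted-average check:

```python
def mean_reliability_score(fractions):
    """Weighted mean of reliability scores R1=1, R2=2, R3=3, given the
    fraction of assessors assigning each category."""
    return sum(score * frac for score, frac in zip((1, 2, 3), fractions))

# Category fractions from Table 1 (counts of assessors per category).
klimisch = mean_reliability_score((4/9, 5/9, 0))        # 14/9  ~ 1.56
cred     = mean_reliability_score((3/19, 4/19, 12/19))  # 47/19 ~ 2.47
print(round(klimisch, 1), round(cred, 1))  # 1.6 2.5
```

The higher CRED mean quantifies the stricter assessment of the same study under the criteria-based method.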
Table 2: Ring Test Participant Perception of Evaluation Methods [17]
| Perception Criteria | Klimisch Method | CRED Evaluation Method |
|---|---|---|
| Accuracy | Less Accurate | More Accurate |
| Applicability | Less Applicable | More Applicable |
| Consistency | Less Consistent | More Consistent |
| Transparency | Less Transparent | More Transparent |
| Dependence on Expert Judgment | High Dependence | Low Dependence |
Table 3: Core Structural Differences Between Klimisch and CRED Methods [12] [17]
| Feature | Klimisch Method | CRED Evaluation Method |
|---|---|---|
| Primary Focus | Reliability only. | Reliability and relevance. |
| Reliability Criteria | 4 broad categories. | 20 detailed criteria with extensive guidance. |
| Relevance Criteria | Not formally defined. | 13 detailed criteria with extensive guidance. |
| Guidance Specificity | Limited, leading to interpretation. | Comprehensive, reducing ambiguity. |
| Output | Single reliability score (R1-R4). | Separate, detailed scores for reliability and relevance. |
| Bias Tendency | Favors GLP/guideline studies [12]. | Criteria-based, reduces automatic preference. |
The following protocol details the methodology used in the ring test that generated the comparative data between the Klimisch and CRED methods [17].
3.1. Protocol: Comparative Ring Test for Study Evaluation Methods
Materials & Inputs:
Procedure:
Key Outcomes Measured:
Diagram 1: Comparative Workflows of Klimisch vs. CRED Methods
The following toolkit comprises essential solutions and materials for conducting and evaluating modern aquatic ecotoxicity studies, as inferred from the criteria emphasized by the CRED method.
Table 4: Research Reagent Solutions for Aquatic Ecotoxicity Testing
| Item | Function in Ecotoxicity Testing | Rationale & CRED Evaluation Link |
|---|---|---|
| Standardized Test Media | Provides a consistent, defined chemical environment (pH, hardness, salinity) for exposure, ensuring reproducibility across labs. | Critical for Reliability Criterion: Exposure Conditions. Poorly characterized media is a major source of irreproducibility [12]. |
| Analytical Grade Test Substance | A substance of known purity and identity, essential for preparing accurate dosing solutions. | Fundamental for Reliability Criterion: Test Substance. Impurities can confound results [12]. |
| Certified Reference Toxicant | A standard toxicant (e.g., K₂Cr₂O₇, NaCl) used in periodic tests to confirm the health and consistent sensitivity of test organism cultures. | Supports Reliability Criterion: Test Organism. Validates organism fitness and test system performance [12]. |
| Solvent/Vehicle Control | A neutral carrier (e.g., acetone, DMSO) for water-insoluble substances, used at a non-toxic concentration. | Required for Reliability Criterion: Test Design. Must be included and its effect reported to isolate the test substance's toxicity [12]. |
| Preservative for Water Samples | (e.g., acid, cooling) Used when verifying exposure concentrations in test vessels via chemical analysis. | Enables Reliability Criterion: Exposure Characterization. Measured concentrations are more reliable than nominal ones [12]. |
| Formulated Diet for Chronic Tests | A nutritionally complete, consistent food source for organisms in long-term studies (e.g., growth, reproduction). | Vital for Reliability Criterion: Test Organism Health. Inadequate nutrition is a common confounding factor [12]. |
The empirical comparison reveals that the Klimisch method's broad categories and lack of detailed guidance lead to high inter-assessor variability, validating concerns about its inconsistency [17]. Its structure, which initially asks whether a study is a GLP or guideline test, can introduce bias by incentivizing the automatic categorization of such studies as reliable, potentially overlooking specific scientific flaws [12] [17].
In contrast, the CRED method mandates a transparent, criteria-based assessment that deconstructs study quality into 20 reliability and 13 relevance elements [12]. This structured approach reduces the space for unsubstantiated expert judgment, leading to more consistent and transparent evaluations, as evidenced by the ring test where participants perceived CRED as more accurate, consistent, and less dependent on subjective opinion [17]. The significant recategorization of "Study E" from primarily reliable under Klimisch to primarily not reliable under CRED demonstrates the tangible impact of applying more rigorous, transparent criteria [19].
Therefore, within the broader thesis comparing Klimisch and CRED, evidence strongly indicates that the CRED evaluation method effectively mitigates the core pitfalls of the Klimisch approach—inconsistency and over-reliance on expert judgment—by providing a detailed, transparent, and systematic framework for evaluating ecotoxicity data.
The regulatory assessment of chemicals hinges on the systematic evaluation of available ecotoxicity studies. For over two decades, the Klimisch method served as the de facto standard, categorizing study reliability as "1" (reliable without restriction), "2" (reliable with restrictions), "3" (not reliable), or "4" (not assignable) [24]. While pioneering, this approach has faced sustained criticism for its lack of detailed guidance, leading to inconsistent applications dependent on subjective expert judgment [24].
The Criteria for Reporting and Evaluating ecotoxicity Data (CRED) framework was developed to address these deficiencies. It provides a transparent, criteria-based system for evaluating both the reliability (internal scientific validity) and relevance (appropriateness for the specific hazard or risk assessment question) of aquatic ecotoxicity studies [24]. Within a thesis comparing the Klimisch and CRED methods, CRED's primary solutions lie in its structured approach to deconstructing study quality, which enhances consistency, reduces ambiguity, and improves the defensibility of regulatory decisions.
This protocol is designed to highlight the procedural and outcome differences between the Klimisch and CRED methods by applying them to the same ecotoxicity study.
Experimental Workflow:
Key Decision Points:
This protocol replicates the validation approach used in the development of CRED to quantify differences in inter-evaluator consistency [24].
Experimental Workflow:
Table 1: Core Methodological Comparison
| Feature | Klimisch Method | CRED Method |
|---|---|---|
| Evaluation Dimensions | Single, composite "reliability" score. | Separate, explicit scores for Reliability and Relevance. |
| Guidance Specificity | Low; general principles open to interpretation [24]. | High; detailed criteria with explicit descriptors for scoring [24]. |
| Output Nature | Categorical (Score 1, 2, 3, or 4) [25]. | Semi-Quantitative (Criteria scored, leading to a categorical reliability conclusion). |
| Basis for Decision | Holistic expert judgment. | Transparent summation of criterion-level assessments. |
| Handling of Uncertainty | Ambiguous; embedded in score "2" or "4". | Explicitly documented per criterion. |
The efficacy of CRED is demonstrated through empirical research. A pivotal two-phased ring test involving 75 risk assessors from 12 countries provided a direct comparison of the two frameworks [24].
Table 2: Ring-Test Results Comparing Evaluator Perception (Adapted from Kase et al., 2016) [24]
| Perception Attribute | Klimisch Method | CRED Method | Implied Advantage |
|---|---|---|---|
| Dependence on Expert Judgement | High | Low | Reduced Ambiguity |
| Accuracy of Evaluation | Perceived as Lower | Perceived as Higher | Enhanced Guidance |
| Consistency Among Evaluators | Low | High | Improved Specificity |
| Practicality (Time/Criteria Use) | Less Practical | More Practical | Structured Efficiency |
The data shows a clear preference for CRED. Evaluators found it less dependent on subjective judgment and more likely to yield accurate and consistent results across different users [24]. This directly addresses the core critique of the Klimisch method. Furthermore, CRED was perceived as more practical, indicating that its structured guidance does not come at the cost of usability [24].
Table 3: Analysis of Evaluation Criteria Focus
| Evaluation Aspect | Klimisch Method Focus | CRED Method Focus | Impact on Assessment |
|---|---|---|---|
| Test Substance | General mention of characterization. | Detailed criteria for concentration verification, stability, measurement. | Ensures exposure credibility. |
| Test Organism | Basic information required. | Specific data on life stage, source, health, acclimatization. | Ensures biological relevance. |
| Experimental Design | Implicit in "scientific soundness." | Explicit scoring of controls, replication, exposure regime, duration. | Quantifies methodological rigor. |
| Statistics & Reporting | Rarely a decisive factor. | Mandatory criteria for data presentation, statistical methods, raw data access. | Enables reproducibility and verification. |
Table 4: Essential Resources for Implementing CRED Evaluations
| Tool / Resource | Function in Evaluation | Key Benefit |
|---|---|---|
| CRED Evaluation Checklist | The core protocol document providing the specific criteria for reliability and relevance scoring [24]. | Standardizes the evaluation process, ensuring all assessors address the same study elements. |
| Chemical Reference Standards | High-purity analytical standards of the test substance. Used to verify reported concentrations and purity in the study under evaluation. | Allows for independent verification of the exposure scenario's credibility, a key CRED criterion. |
| Test Organism Lineage Records | Documentation of the source, generation, and husbandry conditions of standard test species (e.g., Daphnia magna, fathead minnow). | Provides context to assess the biological relevance and health of test organisms as required by CRED. |
| Statistical Analysis Software | Tools (e.g., R, GraphPad Prism) to re-analyze published data or raw data if available. | Enables the evaluator to independently check statistical significance and dose-response calculations, a critical aspect of methodological reliability. |
| Quality Assurance/Quality Control (QA/QC) Protocols | Standard Operating Procedures (SOPs) for good laboratory practices (GLP). | Serves as a benchmark against which the procedural descriptions in the study being evaluated can be compared. |
| Digital Literature Database Access | Subscription services (e.g., Web of Science, PubMed) and regulatory databases (e.g., ECOTOX). | Facilitates the rapid identification of supporting or contradictory studies for relevance assessment and weight-of-evidence analysis. |
Within the comparative research framework of the Klimisch method versus the CRED (Criteria for Reporting and Evaluating ecotoxicity Data) evaluation, a central thesis examines how methodological design influences the objectivity and reproducibility of hazard assessments. The Klimisch method, established in 1997, has been a regulatory cornerstone but is criticized for its reliance on expert judgment and lack of detailed guidance, leading to inconsistent evaluations[reference:0][reference:1]. In contrast, the CRED method was developed explicitly to strengthen consistency and transparency by providing a structured set of criteria and extensive guidance[reference:2]. This article details the application notes and protocols that underpin strategies for minimizing bias, framed by empirical evidence from a direct comparison of these two evaluation systems.
The foundational differences between the Klimisch and CRED methods are quantitative and structural, as summarized in Table 1. CRED's comprehensive approach includes explicit relevance criteria and integrates all OECD reporting guidelines, which are absent in the Klimisch method[reference:3].
Table 1: Characteristics of Klimisch and CRED Evaluation Methods
| Characteristic | Klimisch Method | CRED Method |
|---|---|---|
| Data type | Toxicity and ecotoxicity | Aquatic ecotoxicity |
| Number of reliability criteria | 12–14 (ecotoxicity) | 20 (evaluating); 50 (reporting) |
| Number of relevance criteria | 0 | 13 |
| OECD reporting criteria included | 14 of 37 | 37 of 37 |
| Additional guidance | No | Yes (extensive) |
| Evaluation summary | Qualitative (reliability only) | Qualitative (reliability & relevance) |
Source: Adapted from Kase et al. (2016)[reference:4].
A two-phase ring test involving 75 risk assessors from 12 countries provided direct comparative data on reliability, relevance, and user confidence[reference:5].
The CRED method produced more conservative reliability assessments, assigning a higher percentage of studies to "not reliable" or "not assignable" categories, suggesting a more critical and systematic detection of study flaws[reference:6]. Relevance evaluations were more decisive with CRED, showing a higher proportion of studies categorized as "relevant without restrictions"[reference:7].
Table 2: Reliability Categorization Results from Ring Test (%)
| Reliability Category | Klimisch Method | CRED Method |
|---|---|---|
| Reliable without restrictions (R1) | 8 | 2 |
| Reliable with restrictions (R2) | 45 | 24 |
| Not reliable (R3) | 42 | 54 |
| Not assignable (R4) | 6 | 20 |
Source: Data from ring test results[reference:8].
Table 3: Relevance Categorization Results from Ring Test (%)
| Relevance Category | Klimisch Method | CRED Method |
|---|---|---|
| Relevant without restrictions (C1) | 32 | 57 |
| Relevant with restrictions (C2) | 61 | 35 |
| Not relevant (C3) | 7 | 8 |
Source: Data from ring test results[reference:9].
The structured guidance of CRED significantly increased evaluator confidence. For relevance assessments, 72% of users felt "very confident" or "confident" with CRED, compared to only 37% with the Klimisch method[reference:10].
Table 4: Confidence in Evaluation Results
| Method | Percentage "Very Confident" or "Confident" (Relevance Evaluation) |
|---|---|
| Klimisch | 37% |
| CRED | 72% |
Source: Data from Kase et al. (as cited in NORMAN network)[reference:11].
The comparative data highlight several concrete strategies embodied by the CRED method:
The following protocol details the ring test methodology used to generate the comparative data.
To compare the consistency, accuracy, and user perception of the Klimisch and draft CRED methods for evaluating the reliability and relevance of aquatic ecotoxicity studies.
A two-phase, crossover ring test design was employed[reference:16].
Phase I (Nov–Dec 2012): Participants evaluated two out of eight selected studies using the Klimisch method.
Phase II (Mar–Apr 2013): The same participants evaluated two different studies from the same set using the draft CRED method.
75 risk assessors from 35 organizations across 12 countries, including regulatory agencies, consultancies, and industry[reference:18].
The following table lists key reagents and materials essential for conducting standardized ecotoxicity tests, the quality of which is ultimately evaluated by methods like Klimisch or CRED.
Table 5: Key Research Reagent Solutions for Ecotoxicity Testing
| Item | Function & Description | Example/Standard |
|---|---|---|
| Reference Toxicant | Validates test organism sensitivity and assay performance. | Potassium dichromate (for Daphnia), Sodium chloride (for algae). |
| Standard Test Organisms | Provides reproducible biological response metrics. | Daphnia magna (cladoceran), Pseudokirchneriella subcapitata (green algae), Danio rerio (zebrafish embryo). |
| OECD Test Media | Provides standardized, defined nutrient composition for culturing and testing. | OECD TG 201 (Algal), OECD TG 202 (Daphnia), OECD TG 210 (Fish, Early-Life Stage). |
| Solvent Control | Delivers hydrophobic test substances; controls for solvent effects. | Dimethyl sulfoxide (DMSO), concentration typically ≤0.1% v/v. |
| Positive Control Substance | Acts as a benchmark for specific toxicological endpoints. | 3,4-Dichloroaniline (for fish toxicity), Cadmium chloride. |
| Culture Media Components | Supports axenic and healthy culture of test organisms. | Vitamins (e.g., B12, thiamine), trace metals, chelators (EDTA). |
| Analytical Grade Chemicals | Ensures purity of test substances and media components. | ≥98% purity, with verified certificate of analysis. |
| Statistical Software | Analyzes dose-response data and calculates toxicity endpoints (LC50, NOEC). | R (with drc or ecotoxicology packages), GraphPad Prism. |
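The endpoint-derivation step performed by the statistical software listed above can be sketched with a minimal example. The concentration–effect values below are invented for illustration, and the interpolation is a crude stand-in for the log-logistic dose-response models that packages such as R's drc actually fit.

```python
import numpy as np

# Hypothetical acute immobilisation data (illustrative values only).
conc = np.array([0.1, 0.3, 1.0, 3.0, 10.0, 30.0])        # mg/L
effect = np.array([0.02, 0.05, 0.20, 0.55, 0.90, 0.98])  # fraction affected

# Crude LC50 estimate: interpolate the 50% effect level on
# log10(concentration). A proper analysis would fit a dose-response
# model and report confidence intervals.
lc50 = 10 ** np.interp(0.5, effect, np.log10(conc))
print(f"Interpolated LC50 approx. {lc50:.2f} mg/L")
```

Whether such an endpoint survives a Klimisch or CRED evaluation depends on how transparently the raw data, model choice, and controls behind it are reported.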
The comparative analysis between the Klimisch and CRED evaluation methods within the broader thesis context demonstrates that bias minimization and consistency improvement are achievable through deliberate methodological design. The CRED method embodies these strategies by replacing subjective expert judgment with structured, transparent, and guidance-supported criteria. Empirical evidence from a large ring test confirms that this approach leads to more consistent reliability and relevance evaluations, greater user confidence, and a more critical assessment of study quality. For researchers, scientists, and drug development professionals, adopting such structured evaluation frameworks is essential for ensuring that regulatory hazard and risk assessments are based on robust, reproducible, and unbiased scientific evidence.
The regulatory evaluation of scientific studies is at a crossroads. Traditional methods, epitomized by the Klimisch system, have been foundational but are increasingly critiqued for their reliance on expert judgment and lack of detailed guidance, which can lead to inconsistent data inclusion[reference:0]. This inconsistency often sidelines valuable peer-reviewed literature in favor of standardized, often proprietary, studies, limiting the data pool for critical hazard and risk assessments[reference:1].
This article is framed within a broader thesis comparing the Klimisch method with the Criteria for Reporting and Evaluating Ecotoxicity Data (CRED) framework. The CRED method was developed to provide a more transparent, consistent, and detailed system for assessing the reliability and relevance of studies, particularly in ecotoxicology[reference:2]. The central argument is that adopting modern, structured evaluation frameworks like CRED is essential for broadening data inclusion. By providing a clear, criteria-based pathway, CRED empowers regulators and developers to confidently integrate high-quality peer-reviewed evidence into regulatory dossiers, thereby enhancing the scientific robustness of decisions.
A pivotal two-phased ring test, involving 75 risk assessors from 12 countries, directly compared the Klimisch and CRED methods[reference:3]. The quantitative results underscore significant differences in how studies are categorized, with direct implications for data inclusion policies.
| Reliability Category | Klimisch Method (% of evaluations) | CRED Method (% of evaluations) |
|---|---|---|
| Reliable without restrictions (R1) | 8% | 2% |
| Reliable with restrictions (R2) | 45% | 24% |
| Not reliable (R3) | 42% | 54% |
| Not assignable (R4) | 6% | 20% |
Data source: Kase et al. (2016) ring test analysis[reference:4].
The CRED method resulted in a higher proportion of studies categorized as "not reliable" or "not assignable," primarily because its systematic checklist prompted a more thorough review, uncovering flaws like exceeded substance solubility or missing control data that the Klimisch method often missed[reference:5].
| Metric | Klimisch Method | CRED Method | Note |
|---|---|---|---|
| Relevance: "Relevant without restrictions" | 32% | 57% | CRED provided clearer differentiation[reference:6] |
| Relevance: "Not relevant" | 7% | 8% | Similar low levels[reference:7] |
| Assessor Confidence (from other surveys) | 37% felt "very confident" or "confident" | 72% felt "very confident" or "confident" | CRED's structured guidance boosts confidence[reference:8] |
| Participant Perception | More dependent on expert judgment | Less dependent, more accurate & consistent | Ring test participant feedback[reference:9] |
Objective: To compare the consistency, transparency, and user perception of the Klimisch and CRED evaluation methods. Design: A two-phased, crossover ring test. Participants: 75 experienced risk assessors from regulatory agencies, academia, and industry across 12 countries[reference:10]. Materials: Eight aquatic ecotoxicity studies (peer-reviewed and GLP reports) and evaluation kits for both methods. Procedure:
Objective: To construct a comprehensive, evidence-based regulatory dossier that incorporates peer-reviewed literature. Design: Systematic literature review and data evaluation workflow. Materials: Bibliographic databases (e.g., PubMed, Web of Science), systematic review software (e.g., Covidence, Rayyan), CRED evaluation checklist, data extraction sheets. Procedure:
| Item / Tool | Function / Purpose | Key Features & Notes |
|---|---|---|
| CRED Evaluation Checklist | Provides the structured criteria (20 reliability, 13 relevance) for consistent study appraisal. | Available as Excel/PDF from the SciRAP tools portal. Transforms subjective judgment into objective scoring[reference:13]. |
| CREED Exposure Data Workbook | Guides the evaluation of environmental exposure datasets for reliability and relevance. | Includes "gateway" questions and a template for creating a standardized "report card"[reference:14]. |
| Systematic Review Software (e.g., Covidence, Rayyan) | Manages the literature screening process (title/abstract, full-text) for systematic reviews. | Enables blinded duplicate screening, conflict resolution, and audit trails, crucial for regulatory-grade reviews[reference:15]. |
| Reference Management Software with CTD Module Support (e.g., DistillerSR, EndNote) | Organizes references and facilitates direct export of formatted citations into CTD dossier sections. | Ensures accurate referencing and saves time during dossier assembly. |
| NanoCRED & EthoCRED Frameworks | Specialized CRED tools for evaluating ecotoxicity data for nanomaterials and behavioural studies, respectively. | Addresses the need for fit-for-purpose criteria in emerging scientific areas[reference:16]. |
| ICH Q9 Quality Risk Management Tools | Provides a framework for risk-based decision-making when weighing evaluated evidence. | Helps justify inclusion/exclusion decisions based on the criticality of data gaps or uncertainties. |
The comparative analysis between the Klimisch and CRED methods reveals a clear trajectory for modern regulatory science. The CRED framework, with its detailed criteria and transparent process, addresses the key shortcomings of older systems by reducing inconsistency and building assessor confidence. This is not merely an academic exercise; it is a practical prerequisite for broadening data inclusion.
By implementing structured protocols like the CRED evaluation within systematic review workflows, regulatory professionals can confidently and defensibly integrate peer-reviewed studies into dossiers. This enriches the evidence base, can reduce animal testing and resource duplication, and ultimately leads to more scientifically robust hazard and risk assessments[reference:17]. The tools and pathways described herein provide an actionable blueprint for researchers, scientists, and drug development professionals to advance this critical evolution in regulatory practice.
Within regulatory ecotoxicology, the Klimisch method has been the established approach for evaluating study reliability since 1997. The Criteria for Reporting and Evaluating ecotoxicity Data (CRED) method was developed to address its perceived shortcomings in consistency, transparency, and guidance[reference:0]. A large-scale ring test directly compared the practicality of both methods, focusing on the time required to complete an evaluation and the user workflow[reference:1]. This application note synthesizes the quantitative findings from that comparison, provides detailed protocols for implementing each evaluation method, and visualizes the key workflows to aid researchers, scientists, and drug development professionals in selecting and applying the most fit-for-purpose approach.
The fundamental structural differences between the Klimisch and CRED methods set the stage for variations in time investment and user experience. The core characteristics are summarized in Table 1.
Table 1: Structural Characteristics of the Klimisch and CRED Evaluation Methods[reference:2]
| Characteristic | Klimisch Method | CRED Evaluation Method |
|---|---|---|
| Primary Data Type | Toxicity and ecotoxicity | Aquatic ecotoxicity |
| Number of Reliability Criteria | 12–14 (ecotoxicity) | 20 (evaluation); 50 (reporting) |
| Number of Relevance Criteria | 0 | 13 |
| OECD Reporting Criteria Included | 14 of 37 | 37 of 37 |
| Additional Guidance Provided | No | Yes (extensive guidance for each criterion) |
| Evaluation Summary | Qualitative (reliability only) | Qualitative (reliability and relevance) |
A central component of the ring test's practicality analysis was measuring the time burden on risk assessors. Participants reported the time taken to evaluate a study using each method.
The time data, presented in the ring test's supplementary materials, allowed for a direct comparison of efficiency. Despite having more criteria, the CRED method was designed with clear, criterion-level guidance intended to streamline the evaluation process. The aggregated results of participant-reported times are summarized in Table 2.
Table 2: Time Required for Study Evaluation (Participant-Reported)
| Time Slot | Klimisch Method (n=121) | CRED Evaluation Method (n=103) |
|---|---|---|
| < 20 minutes | [Data from supplementary materials] | [Data from supplementary materials] |
| 20–40 minutes | [Data from supplementary materials] | [Data from supplementary materials] |
| 40–60 minutes | [Data from supplementary materials] | [Data from supplementary materials] |
| 60–180 minutes | [Data from supplementary materials] | [Data from supplementary materials] |
| > 180 minutes | [Data from supplementary materials] | [Data from supplementary materials] |
| % of evaluations completed in <60 min | [Data from supplementary materials] | [Data from supplementary materials] |
Note: The specific percentage data for each time slot is contained in Additional File 1, Part D, of the source publication[reference:7]. The ring test concluded that participants perceived the CRED method as "practical regarding the use of criteria and time needed for performing an evaluation"[reference:8].
The workflow for each method differs significantly, impacting both the time commitment and the depth of analysis.
The Klimisch method relies on a holistic, expert-judgment-based assessment guided by a limited set of criteria embedded in descriptive text.
Detailed Protocol:
The CRED method uses a standardized, criteria-by-criteria checklist approach, promoting systematic and transparent evaluation of both reliability and relevance.
Detailed Protocol:
The divergent logical pathways of the two methods are visualized below.
Conducting a robust study evaluation requires more than just the method definition. Table 3 lists key resources that facilitate the process.
Table 3: Essential Research Reagent Solutions for Study Evaluation
| Item | Function & Description | Relevance to Klimisch/CRED |
|---|---|---|
| CRED Evaluation Sheet | The standardized checklist containing the 20 reliability and 13 relevance criteria with guidance text for consistent application. | CRED Essential. The core tool for implementing the method[reference:13]. |
| Klimisch Method Guidance Document | The original publication (Klimisch et al., 1997) describing the reliability categories and the implicit criteria for assessment. | Klimisch Essential. The primary reference for understanding the method's intent and application. |
| OECD Test Guidelines | Standardized protocols for conducting ecotoxicity tests (e.g., OECD 201, 210, 211). Used as a benchmark for assessing study design quality. | Critical for both. Fundamental for evaluating whether a study followed accepted standard methods. |
| Reporting Criteria Checklist | A list of 50 reporting items (e.g., from OECD) that ensure all necessary study details are documented. | Integrated into CRED. The CRED method incorporates all 37 OECD reporting criteria for aquatic tests[reference:14]. |
| Digital Data Extraction Tool | Software or spreadsheet template for systematically recording criterion fulfillment, notes, and final categories. | Highly Recommended for CRED. Manages the large number of criteria and ensures auditability. |
| Expert Judgement Framework | A structured process for making decisions when criteria are ambiguous or conflicting. | Core to Klimisch. The method heavily depends on it. Complementary to CRED. Used within the structured CRED framework. |
The comparative analysis of the Klimisch and CRED evaluation methods reveals a trade-off between speed and depth. The Klimisch method, with its lean set of implicit criteria, can be applied rapidly but at the cost of transparency and consistency, relying heavily on variable expert judgment[reference:15]. In contrast, the CRED method introduces a structured, criteria-driven workflow that requires a more significant initial time investment for systematic review. This investment pays dividends in heightened transparency, improved consistency among assessors, and a more robust, defensible evaluation that covers both reliability and relevance[reference:16]. For regulatory and research contexts where auditability, harmonization, and comprehensive quality assessment are paramount, the CRED method presents a scientifically rigorous and practical replacement for the older Klimisch approach.
This protocol details the design and execution of the multi-national ring test that served as the principal validation study for the Criteria for Reporting and Evaluating ecotoxicity Data (CRED) method. Within the broader thesis research comparing the Klimisch and CRED evaluation frameworks, this ring test provides the critical empirical evidence for a paradigm shift in ecotoxicity data assessment. The widely used Klimisch method, established in 1997, has been fundamental but criticized for its lack of detail, insufficient guidance, and failure to ensure consistency among assessors [11]. It offers limited criteria for reliability and none for relevance, leading to evaluations heavily dependent on expert judgment and potential bias towards Good Laboratory Practice (GLP) studies [11] [12].
The CRED method was developed to address these shortcomings by providing a transparent, detailed, and structured framework for evaluating both the reliability and relevance of aquatic ecotoxicity studies [11]. The primary objective of the ring test was to directly compare these two methods, testing the central hypothesis that the CRED method yields more consistent, transparent, and accurate evaluations than the Klimisch method [11]. Its successful execution was essential for establishing CRED as a scientifically robust and practical tool for regulatory hazard and risk assessment.
The ring test was a meticulously designed, two-phase, cross-over study involving a large international cohort of risk assessors. Its core purpose was to evaluate the performance of the draft CRED method against the established Klimisch method under controlled, comparative conditions.
Timing: November – December 2012 [11]. Participant Task: Each participant was assigned two out of a total pool of eight distinct ecotoxicity studies for evaluation [11]. Assignments were made based on the participant's self-declared area of expertise (e.g., algae, invertebrate, or fish toxicity) to ensure informed assessments [11]. Evaluation Framework: Participants evaluated their assigned studies solely using the Klimisch method [11]. Outputs Required:
Following Phase I, the initial draft of the CRED evaluation method was finalized for testing, incorporating feedback from expert consultations and the experiences of Phase I [11].
Timing: March – April 2013 [11]. Participant Task: Each participant evaluated two new studies from the original pool of eight. Crucially, the assignments were made so that no single institute evaluated the same study in both Phase I and Phase II, guaranteeing independent assessments [11]. Evaluation Framework: Participants evaluated their assigned studies using the draft CRED evaluation method [11]. Outputs Required:
The research team performed a statistical consistency analysis on the criterion-level scores from Phase II. Criteria with inter-assessor consistency below 50% were reworded for clarity [11]. Furthermore, criteria frequently identified as "missing" in participant feedback were added, resulting in the final CRED method comprising 20 reliability and 13 relevance criteria [11] [12].
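The consistency screen described above can be sketched as a simple modal-agreement calculation: for each criterion, the share of assessors agreeing with the most common score is computed and compared against the 50% threshold stated in the ring test analysis. The criterion names and score vectors below are hypothetical.

```python
from collections import Counter

# Hypothetical criterion-level scores from five assessors for one study:
# F = fulfilled, N = not fulfilled, A = not assignable (illustrative only).
scores = {
    "test_organism_reported":  ["F", "F", "F", "F", "N"],
    "solubility_not_exceeded": ["F", "N", "A", "N", "A"],
}

def consistency(votes):
    """Share of assessors agreeing with the most common score."""
    modal_count = Counter(votes).most_common(1)[0][1]
    return modal_count / len(votes)

for criterion, votes in scores.items():
    c = consistency(votes)
    action = "reword" if c < 0.5 else "keep"  # 50% threshold from the ring test
    print(f"{criterion}: {c:.0%} agreement -> {action}")
```

Under this measure, the first hypothetical criterion (80% agreement) would be kept as worded, while the second (40% agreement) would be flagged for rewording.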
Table 1: Key Characteristics of the Klimisch and CRED Evaluation Methods [11]
| Characteristic | Klimisch Method (1997) | CRED Evaluation Method (2016) |
|---|---|---|
| Primary Focus | Reliability of studies. | Reliability and relevance of studies. |
| Number of Criteria | Limited, unspecified number for reliability; none for relevance. | 20 explicit reliability criteria; 13 explicit relevance criteria. |
| Guidance Provided | Minimal, high-level guidance. | Extensive, detailed guidance for each criterion. |
| Evaluation Process | Holistic, heavily reliant on expert judgement. | Structured, criterion-by-criterion assessment. |
| Bias Potential | Recognized bias towards GLP and standard guideline studies. | Designed to evaluate study quality irrespective of GLP status. |
| Output Transparency | Low; only final category is reported. | High; fulfillment of each criterion is documented. |
Table 2: Ring Test Participant Demographics and Study Design [11]
| Aspect | Specification |
|---|---|
| Total Participants | 75 risk assessors from 12 countries [11]. |
| Participant Institutions | Industry, academia, consultancy, governmental agencies [11]. |
| Participant Experience | Majority had >5 years of experience in study evaluation [11]. |
| Total Studies Evaluated | 8 unique aquatic ecotoxicity studies [11]. |
| Studies per Participant | 2 in Phase I (Klimisch) + 2 in Phase II (CRED) = 4 total [11]. |
| Evaluation Independence | No single institute evaluated the same study in both phases [11]. |
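The independence constraint in Table 2 can be expressed as a small assignment check. The study labels, random seed, and helper function below are illustrative only, not the actual allocation procedure, which also matched studies to each participant's declared expertise.

```python
import random

random.seed(1)  # illustrative; real allocation also considered expertise
studies = list("ABCDEFGH")  # 8 studies, as in the ring test pool

def assign_two(exclude=()):
    """Draw two studies, excluding any the institute saw in Phase I."""
    pool = [s for s in studies if s not in exclude]
    return random.sample(pool, 2)

phase1 = assign_two()
phase2 = assign_two(exclude=phase1)
assert not set(phase1) & set(phase2)  # no study repeats across phases
print(phase1, phase2)
```

Excluding an institute's Phase I studies from its Phase II pool is what guarantees that the Klimisch and CRED evaluations of any given study came from independent assessors.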
Table 3: Key Research Reagent Solutions for CRED Evaluation
| Item | Function & Description | Source/Example |
|---|---|---|
| CRED Evaluation Checklist (Excel Tool) | Primary tool for conducting the evaluation. Contains all 20 reliability and 13 relevance criteria with dropdown menus for scoring (Fulfilled/Not Fulfilled/Not Assignable). Automates category suggestions. | Available as a macro-enabled Excel workbook from the CRED project resources [10]. |
| CRED Reporting Recommendations Template | A 50-criterion checklist across six categories (General, Test Design, Substance, Organism, Exposure, Statistics) to guide authors in reporting studies that are more likely to be deemed reliable and relevant. | Provided alongside the evaluation method to improve future data quality [12] [10]. |
| Standardized Test Guideline (e.g., OECD 210) | Reference documents for defining standard test procedures, against which methodological deviations in the study under evaluation are assessed. | OECD Test Guidelines, EPA Ecological Effects Test Guidelines. |
| Statistical Analysis Software | Used to re-analyze or verify statistical results reported in the study (e.g., dose-response modeling, significance testing). | R, GraphPad Prism. The ctxR package can be used to access comparable toxicity data for context [26]. |
| Chemical Identification Databases | To verify the identity, purity, and properties of the test substance as reported in the study. | CompTox Chemicals Dashboard, PubChem [26]. |
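To illustrate how a checklist tool can turn criterion scores into a category suggestion, as the CRED Excel workbook does, the sketch below applies purely illustrative thresholds; these are not the actual decision rules of the workbook, which are documented with the tool itself.

```python
def suggest_category(scores):
    """Suggest a reliability category from criterion scores.

    scores: list of 'F' (fulfilled), 'N' (not fulfilled),
    'A' (not assignable). Thresholds are hypothetical.
    """
    n = len(scores)
    if scores.count("A") / n > 0.3:       # too much missing information
        return "R4 (not assignable)"
    fulfilled = scores.count("F") / n
    if fulfilled >= 0.9:
        return "R1 (reliable without restrictions)"
    if fulfilled >= 0.7:
        return "R2 (reliable with restrictions)"
    return "R3 (not reliable)"

print(suggest_category(["F"] * 19 + ["N"]))      # 95% fulfilled
print(suggest_category(["F"] * 12 + ["N"] * 8))  # 60% fulfilled
```

Note that the illustrative cut-offs are loosely consistent with the fulfillment gradient observed in the ring test, where R1 studies fulfilled 93% of criteria on average and R3 studies only 60%.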
The ring test generated quantitative and qualitative data demonstrating the advantages of the CRED method.
1. Consistency and Discrimination: The CRED method produced more nuanced and discriminating evaluations. For example, in the assessment of a GLP study on fish toxicity (Study E), the Klimisch method resulted in all evaluators rating it as reliable (44% R1, 56% R2). In contrast, the CRED method revealed significant flaws, with 63% of evaluators rating it "Not Reliable" (R3) [11] [19]. The mean reliability score (where R1=1, R2=2, R3=3) was 1.6 for Klimisch and 2.5 for CRED for this study, indicating a stricter, more critical assessment [19].
2. Criterion Fulfillment Analysis: Analysis of the Phase II data showed a clear gradient in the percentage of fulfilled CRED criteria across the final reliability categories, validating the method's internal logic. Studies categorized as "Reliable without restrictions" (R1) had a mean of 93% of criteria fulfilled, while those categorized as "Not reliable" (R3) fulfilled only 60% on average [20].
3. Participant Perception: The post-trial questionnaire revealed a strong preference for the CRED method. Participants perceived it as more accurate, applicable, consistent, transparent, and less dependent on expert judgement than the Klimisch method [11] [12]. Although the initial evaluation with CRED took slightly longer, participants found the time investment practical given the improved output quality [11].
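The mean reliability scores quoted for Study E (1.6 under Klimisch, 2.5 under CRED) follow directly from the reported category shares, as a quick computation shows; the CRED shares used below (16% R1, 21% R2, 63% R3) are those reported in the ring test erratum.

```python
# Mean reliability score as a weighted mean of category codes
# (R1 = 1, R2 = 2, R3 = 3), using the category shares reported for Study E.
def mean_score(shares):
    return sum(code * share for code, share in shares.items())

klimisch = {1: 0.44, 2: 0.56, 3: 0.00}
cred = {1: 0.16, 2: 0.21, 3: 0.63}

print(round(mean_score(klimisch), 1))  # 1.6
print(round(mean_score(cred), 1))      # 2.5
```

The nearly one-point shift on this three-point scale quantifies how much more critically the structured CRED criteria treated the same GLP fish study.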
Table 4: Percentage of Fulfilled CRED Criteria by Final Reliability and Relevance Category [20]
| Final Category | Mean % Criteria Fulfilled | Standard Deviation | Minimum % | Maximum % | Number of Evaluations (n) |
|---|---|---|---|---|---|
| Reliable without restrictions (R1) | 93% | 12 | 79% | 100% | 3 |
| Reliable with restrictions (R2) | 72% | 12 | 47% | 90% | 24 |
| Not reliable (R3) | 60% | 15 | 21% | 90% | 58 |
| Not assignable (R4) | 51% | 15 | 21% | 64% | 19 |
| Relevant without restrictions (C1) | 84% | 8 | 64% | 100% | 50 |
| Relevant with restrictions (C2) | 73% | 14 | 27% | 91% | 42 |
| Not relevant (C3) | 61% | 14 | 46% | 82% | 12 |
CRED Evaluation Methodology Workflow
Multi-National CRED Ring Test Design
The ring test successfully validated the CRED evaluation method against its objectives. The results confirm that CRED provides a more detailed, transparent, and consistent framework for evaluating ecotoxicity studies than the Klimisch method [11]. By reducing reliance on opaque expert judgement and introducing structured, criterion-based assessments for both reliability and relevance, CRED mitigates a key source of inconsistency in regulatory hazard assessment [12].
This methodological advancement directly addresses the core critique of the Klimisch method within the comparative thesis. The proven ability of CRED to critically assess both guideline and non-guideline studies promotes the inclusion of high-quality peer-reviewed literature in regulatory datasets, aligning with recommendations from frameworks like REACH and the Water Framework Directive [11]. The subsequent development of specialized CRED tools for nanomaterials (NanoCRED), behavioral studies (EthoCRED), and soil/sediment studies further demonstrates the adaptability and enduring impact of the framework's core principles [10].
The methodology of the multi-national CRED ring test established a robust empirical foundation for the adoption of the CRED evaluation method. By design, it provided a direct, controlled comparison with the legacy Klimisch method, generating clear evidence that CRED enhances the consistency, transparency, and scientific rigor of ecotoxicity data evaluation. This validation is pivotal, positioning CRED as a scientifically justified replacement that can improve the harmonization and reliability of chemical hazard and risk assessments across global regulatory frameworks.
Application Notes and Protocols for the Comparative Evaluation of Ecotoxicity Studies within Klimisch-CRED Research
This document provides detailed protocols and application notes for researchers conducting comparative evaluations of ecotoxicity study quality using the Klimisch and CRED (Criteria for Reporting and Evaluating Ecotoxicity Data) methods. The content is framed within a thesis context focused on methodological comparison for environmental risk assessment.
A pivotal two-phase international ring test quantitatively compared the Klimisch and CRED methods. Seventy-five risk assessors from 12 countries, representing industry, academia, consultancy, and government, evaluated eight aquatic ecotoxicity studies [2] [24].
Table 1: Ring Test Design and Participant Profile
| Aspect | Description |
|---|---|
| Total Participants | 75 risk assessors from 12 countries [2] [24]. |
| Expertise | Majority had >5 years of experience in study evaluation [12]. |
| Study Set | 8 peer-reviewed and GLP (Good Laboratory Practice) aquatic ecotoxicity studies [2]. |
| Test Organisms | Included algae (Synechococcus), higher plants (Lemna minor), crustaceans (Daphnia magna), and fish [2]. |
| Design | Phase I: Klimisch method. Phase II: CRED method. Participants evaluated different studies in each phase [2]. |
The core quantitative finding was a significant difference in study categorization and evaluator consistency between the two methods.
Table 2: Quantitative Comparison of Klimisch vs. CRED Method Outcomes
| Metric | Klimisch Method | CRED Evaluation Method | Implication |
|---|---|---|---|
| Reliability Criteria | 12–14 criteria for ecotoxicity [2]. | 20 explicit reliability criteria [12] [9]. | CRED provides a more granular, less subjective assessment framework. |
| Relevance Criteria | 0 explicit criteria [2]. | 13 explicit relevance criteria [12] [9]. | CRED formally assesses the appropriateness of data for a specific assessment goal. |
| Evaluator Agreement | Lower consistency among assessors [12] [2]. | Higher consistency among assessors [12] [24]. | CRED reduces arbitrariness by providing detailed guidance. |
| Perceived Accuracy | -- | 86% of ring test participants rated CRED as "more accurate" [24]. | CRED is perceived as yielding a more scientifically robust evaluation. |
| Key Example (Study E) | 44% "Reliable without restrictions"; 56% "Reliable with restrictions" [2] [19]. | 16% "Reliable without restrictions"; 21% "Reliable with restrictions"; 63% "Not reliable" [2] [19]. | CRED's detailed criteria led to stricter evaluation of a GLP fish test, challenging automatic acceptance of guideline studies. |
This protocol outlines the methodology used to generate the comparative data in Section 1 [12] [2].
Objective: To compare the consistency, transparency, and outcomes of the Klimisch and CRED methods for evaluating the reliability and relevance of aquatic ecotoxicity studies.
Phase I: Evaluation Using the Klimisch Method
Phase II: Evaluation Using the CRED Method
Data Analysis:
Diagram 1: Comparative Workflow: Klimisch vs. CRED Evaluation
Diagram 2: The CRED Evaluation Process
Table 3: Essential Tools for Conducting Klimisch-CRED Comparative Research
| Tool / Resource | Function in Research | Source / Availability |
|---|---|---|
| CRED Evaluation Excel Worksheet | Primary tool for applying the 20 reliability and 13 relevance criteria. Contains embedded guidance for consistent scoring [12] [10]. | Freely available for download from project websites [10] [9]. |
| CRED Reporting Checklist (50 criteria) | Used prospectively to design studies or retrospectively to assess reporting completeness. Covers 6 categories: General, Test Design, Substance, Organism, Exposure, and Statistics [12]. | Published within the CRED method paper [12]. |
| Set of Characterized Ecotoxicity Studies | A curated set of peer-reviewed and GLP studies (like the 8 used in the ring test) is essential for calibration and inter-laboratory comparison exercises [2]. | Researchers must compile these from literature, ensuring a mix of test types and perceived quality. |
| NanoCRED & EthoCRED Frameworks | Specialized adaptations of CRED for evaluating ecotoxicity studies of nanomaterials and behavioral endpoints, respectively. Critical for modern, cross-method comparisons [10]. | Described in dedicated publications (e.g., NanoImpact 2017, Biological Reviews 2024) [10]. |
| Klimisch Method Original Reference | The baseline comparator. Required to apply the method correctly without inadvertent incorporation of later interpretations [12] [2]. | Klimisch et al., Regul. Toxicol. Pharmacol. 25, 1–5 (1997). |
Within the regulatory assessment of chemicals, the evaluation of ecotoxicity study reliability and relevance is a fundamental yet subjective process. For decades, the Klimisch method has served as the predominant tool for this task, categorizing studies as "reliable without restrictions," "reliable with restrictions," "not reliable," or "not assignable" [17]. However, its reliance on limited criteria and expert judgment has been criticized for introducing inconsistency and opacity into risk assessments [12] [17]. In response, the Criteria for Reporting and Evaluating Ecotoxicity Data (CRED) method was developed to provide a more structured, transparent, and detailed framework for evaluation, encompassing 20 reliability and 13 relevance criteria [12].
This application note is situated within a broader thesis comparing the Klimisch and CRED evaluation methodologies. Beyond the objective outcomes of study categorization, a critical yet less quantified aspect is the subjective experience of the risk assessors themselves—their perception of a method's usability and the confidence they derive from its use. A method that is accurate but perceived as cumbersome or unclear may see poor adoption. This document details protocols for collecting and analyzing subjective feedback on usability and confidence, framing it as an essential component in the comparative assessment of evaluation frameworks for ecological risk assessment.
A ring test comparing the Klimisch and CRED methods provides foundational quantitative data on their performance [19] [17]. Seventy-five risk assessors from 12 countries evaluated a set of eight ecotoxicity studies, using one method or the other on different studies [17].
Table 1: Comparative Study Categorization for a Sample Ring Test Study (E)
| Evaluation Method | Reliable Without Restrictions (R1) | Reliable With Restrictions (R2) | Not Reliable (R3) | Not Assignable (R4) | Mean Reliability Score (R1=1, R2=2, R3=3) |
|---|---|---|---|---|---|
| Klimisch Method (n=9) | 4 participants (44%) | 5 participants (56%) | 0 participants (0%) | 0 participants (0%) | 1.6 |
| CRED Evaluation Method (n=19) | 3 participants (16%) | 4 participants (21%) | 12 participants (63%) | 0 participants (0%) | 2.5 |
Source: Adapted from erratum to Kase et al. (2016) [19]. Note: The CRED method demonstrated a more conservative and critical evaluation, with a majority categorizing the study as "not reliable."
Table 2: Ring Test Participant Perceptions of Method Characteristics
| Characteristic | Klimisch Method Perception | CRED Method Perception | Implication for Usability & Confidence |
|---|---|---|---|
| Dependency on Expert Judgment | High | Lower | CRED may reduce individual bias, increasing collective confidence. |
| Accuracy | Perceived as lower | Perceived as higher | Higher perceived accuracy directly boosts user confidence in results. |
| Consistency Across Users | Low | High | High consistency is a key usability outcome, supporting harmonization. |
| Transparency of Evaluation | Low | High | Clear criteria improve learnability and justify decisions, aiding confidence. |
| Practicality (Time/Criteria Use) | Faster, less detailed | Slightly more time, but criteria deemed practical | Balances depth with efficiency; affects perceived efficiency. |
Source: Summarized from Kase et al. (2016) [17].
This protocol outlines the core ring-test design used to generate data for comparing the two evaluation methods [17].
This protocol details a structured approach to gather and analyze the qualitative and quantitative subjective data referenced in Protocol 1.
Based on research into communicating risk assessments [29], this protocol adapts user-centered design principles to test and refine the "interface" of an evaluation method (e.g., a CRED software tool or worksheet).
Ring Test Workflow for Comparative Method Evaluation
Relationship Between Usability Factors and Assessor Confidence
Table 3: Essential Materials for Conducting Comparative Evaluation Research
| Item | Function in Research | Specification / Notes |
|---|---|---|
| Curated Ecotoxicity Study Library | Serves as the standardized test material for all evaluators. | A set of 6-10 peer-reviewed aquatic ecotoxicity studies covering varied substances, organisms, and endpoints. Studies should have known methodological complexities [17]. |
| Method Evaluation Guidelines | The independent variables being tested. | 1. Official Klimisch method description [17]. 2. Final CRED evaluation method worksheet with 20 reliability and 13 relevance criteria and guidance text [12]. |
| Standardized Reporting Form | Ensures consistent capture of the primary outcome (categorization). | Digital or paper form forcing a single choice for Reliability (R1-R4) and Relevance (C1-C3), with mandatory field for brief rationale. |
| Subjective Feedback Questionnaire | Captures the perceptual and usability metrics. | Mixed-method survey with Likert-scale items on usability/confidence and open-ended questions for qualitative depth [17] [29]. |
| Statistical Analysis Software | Analyzes quantitative consistency and perceptual data. | Software capable of descriptive statistics, inter-rater reliability calculation (e.g., Fleiss' Kappa), and comparative tests (e.g., t-test, ANOVA). Examples include R, SPSS, or GraphPad Prism. |
| Qualitative Data Analysis Tool | Analyzes open-ended feedback for themes. | Tools to support thematic analysis, such as NVivo, MAXQDA, or Dedoose, for coding and organizing qualitative responses [29]. |
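The inter-rater reliability analysis listed under "Statistical Analysis Software" can be illustrated with a direct implementation of Fleiss' kappa, computed from a studies × categories count matrix (a minimal sketch; the example matrix below is hypothetical, not ring-test data):

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of per-study category counts.

    ratings: list of lists; ratings[i][j] = number of assessors who placed
    study i into category j. Every study must have the same total number
    of assessors.
    """
    n_studies = len(ratings)
    n_raters = sum(ratings[0])
    # Observed per-study agreement
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_bar = sum(p_i) / n_studies
    # Chance agreement from marginal category proportions
    n_cats = len(ratings[0])
    p_j = [sum(row[j] for row in ratings) / (n_studies * n_raters) for j in range(n_cats)]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical counts: 4 studies, 10 assessors each, categories R1-R4
counts = [
    [1, 8, 1, 0],
    [0, 2, 8, 0],
    [7, 3, 0, 0],
    [0, 1, 8, 1],
]
print(round(fleiss_kappa(counts), 3))  # ≈ 0.399
```

Values near 0 indicate chance-level agreement and values near 1 near-perfect agreement, making kappa a suitable primary outcome for comparing consistency under the two methods.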
Within regulatory ecotoxicology, the reliability and relevance evaluation of studies directly shapes hazard identification and risk characterization. For decades, the Klimisch method (1997) has been the default framework, but its reliance on expert judgment and limited criteria have raised concerns about consistency. The Criteria for Reporting and Evaluating ecotoxicity Data (CRED) method was developed to provide a more transparent, criteria‑based alternative. This application note, framed within a broader thesis comparing the Klimisch and CRED approaches, details the experimental protocols, quantitative outcomes, and practical implications of method choice for hazard and risk conclusions.
Table 1: Structural Characteristics of the Klimisch and CRED Methods
| Characteristic | Klimisch | CRED |
|---|---|---|
| Data type | Toxicity and ecotoxicity | Aquatic ecotoxicity |
| Number of reliability criteria | 12–14 (ecotoxicity) | 20 for evaluation (50 for reporting) |
| Number of relevance criteria | 0 | 13 |
| Number of OECD reporting criteria included | 14 (of 37) | 37 (of 37) |
| Additional guidance | No | Yes |
| How to summarize evaluation | Qualitative, for reliability only | Qualitative, for reliability and relevance |
Table 2: Reliability Categorizations in the Ring Test (% of evaluations)
| Method | R1 (Reliable without restrictions) | R2 (Reliable with restrictions) | R3 (Not reliable) | R4 (Not assignable) |
|---|---|---|---|---|
| Klimisch | 8% | 45% | 42% | 6% |
| CRED | 2% | 24% | 54% | 20% |
Table 3: Relevance Categorizations in the Ring Test (% of evaluations)
| Method | C1 (Relevant without restrictions) | C2 (Relevant with restrictions) | C3 (Not relevant) |
|---|---|---|---|
| Klimisch | 32% | 61% | 7% |
| CRED | 57% | 35% | 8% |
Table 4: Assessor Confidence in Evaluation Outcomes
| Confidence Level | Klimisch (n=121) | CRED (n=103) |
|---|---|---|
| Very confident | 37% | 72% |
| Confident | 43% | 21% |
| Neutral | 12% | 5% |
| Not confident | 8% | 2% |
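The confidence shift in the table above can be checked with a two-proportion z-test on the "very confident" responses (a minimal sketch; the counts are reconstructed from the reported percentages and sample sizes, so they are approximate):

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: the two population proportions are equal."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x2 / n2 - x1 / n1) / se

# "Very confident" responses, reconstructed from 37% of 121 and 72% of 103
x_klimisch = round(0.37 * 121)  # 45
x_cred = round(0.72 * 103)      # 74
z = two_proportion_z(x_klimisch, 121, x_cred, 103)
p_two_sided = math.erfc(abs(z) / math.sqrt(2))
print(f"z = {z:.2f}, p = {p_two_sided:.1e}")
```

Even allowing for rounding in the reconstructed counts, the difference in "very confident" responses (z ≈ 5.2) is far beyond what chance alone would produce.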
The ring‑test data demonstrate that method choice directly alters the classification of studies, thereby influencing the data pool available for hazard and risk assessment. The CRED method produced a higher proportion of “not reliable” (54% vs 42%) and “not assignable” (20% vs 6%) categorizations, indicating a more critical appraisal of study quality. This shift is attributed to CRED’s systematic criteria, which prompted assessors to detect flaws (e.g., exposure concentrations exceeding solubility, missing raw data) that were overlooked under the Klimisch approach[reference:11]. Consequently, hazard conclusions based solely on Klimisch‑evaluated studies may be underpinned by data that would be deemed unreliable under CRED, leading to potential underestimation of risk.
The higher confidence reported with CRED (72% very confident vs 37% for Klimisch) underscores the value of structured guidance in reducing evaluator uncertainty. Furthermore, the extension of CRED to nano‑ecotoxicity (NanoCRED), behavioural studies (EthoCRED), and sediment/soil systems illustrates its adaptability to emerging data needs[reference:12]. For drug‑development professionals, adopting CRED‑like transparent evaluation frameworks can strengthen the credibility of environmental risk assessments submitted to regulators.
| Item | Function |
|---|---|
| Test organisms (e.g., Daphnia magna, Danio rerio, Lemna minor) | Standardized model species for aquatic toxicity testing; provide reproducible endpoints for hazard assessment. |
| OECD test guidelines (e.g., OECD 201, 210, 211) | Internationally accepted protocols for conducting ecotoxicity studies; ensure data quality and comparability. |
| CRED evaluation sheets (Excel templates) | Structured checklists for systematically scoring reliability and relevance criteria; facilitate transparent documentation. |
| Statistical software (e.g., R, Python with ecotox libraries) | Analyze dose‑response data, calculate LC50/NOEC, perform meta‑analyses of multiple studies. |
| Chemical‑analysis tools (HPLC, GC‑MS) | Verify test‑substance purity and exposure concentrations; critical for assessing study reliability. |
| Laboratory information management system (LIMS) | Track sample‑handling, experimental conditions, and raw data; supports GLP compliance. |
| Reference databases (e.g., ECOTOX, IUCLID) | Curated repositories of existing toxicity data; used for weight‑of‑evidence approaches. |
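The LC50 derivation mentioned under "Statistical software" can be illustrated with log-linear interpolation between the two test concentrations that bracket 50% mortality (a minimal sketch with hypothetical data; regulatory practice typically fits a full probit or log-logistic model instead):

```python
import math

def lc50_interpolate(concs, mortality):
    """Estimate LC50 by log-linear interpolation.

    concs: test concentrations in ascending order (e.g. mg/L).
    mortality: observed mortality fraction (0-1) at each concentration.
    Returns the concentration at which interpolated mortality crosses 0.5.
    """
    for (c_lo, m_lo), (c_hi, m_hi) in zip(
        zip(concs, mortality), zip(concs[1:], mortality[1:])
    ):
        if m_lo <= 0.5 <= m_hi:
            frac = (0.5 - m_lo) / (m_hi - m_lo)
            log_lc50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_lc50
    raise ValueError("50% mortality not bracketed by the tested concentrations")

# Hypothetical 96-h fish acute test
concs = [1.0, 3.2, 10.0, 32.0, 100.0]   # mg/L
mortality = [0.0, 0.1, 0.4, 0.8, 1.0]
print(round(lc50_interpolate(concs, mortality), 1))  # 13.4 (mg/L)
```

Interpolating on a log-concentration scale reflects the roughly log-linear dose-response behaviour assumed by standard models; under CRED, whether such an endpoint is reliable also depends on analytical verification of the exposure concentrations.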
Ring‑Test Workflow for Method Comparison
Method Choice Influences Hazard and Risk Conclusions
The regulatory evaluation of ecotoxicity studies is foundational to environmental risk assessment for chemicals, pharmaceuticals, and plant protection products [17]. Historically, this process has relied heavily on the method established by Klimisch et al. in 1997, which provides a basic categorization system for study reliability [17]. While a seminal step at the time, this method is now criticized for its lack of detailed guidance, its dependence on subjective expert judgment, and its failure to ensure consistent evaluations among different assessors and regulatory frameworks [12] [17]. Inconsistencies in evaluating the same study can directly impact hazard conclusions, leading to either underestimated environmental risks or unnecessary risk mitigation measures [17].
Parallel challenges exist in pharmaceutical regulatory science, where a lack of harmonization in analytical method validation can lead to significant setbacks, such as Complete Response Letters (CRLs) from the U.S. Food and Drug Administration (FDA) [30]. The International Council for Harmonisation (ICH) addresses this through guidelines like ICH Q2(R2) and Q14, which promote a modern, science- and risk-based lifecycle approach to analytical procedures [31] [32]. This shift from prescriptive checklists to a principles-based framework mirrors the evolution needed in ecotoxicity data evaluation.
The Criteria for Reporting and Evaluating Ecotoxicity Data (CRED) project was initiated to meet this need [12] [9]. CRED provides a transparent, detailed, and structured method for evaluating both the reliability (intrinsic scientific quality) and relevance (appropriateness for a specific assessment) of aquatic ecotoxicity studies [12]. By offering explicit criteria and guidance, CRED aims to reduce arbitrariness, improve consistency across international borders and regulatory regimes, and facilitate the acceptance of high-quality peer-reviewed literature in regulatory dossiers [17] [9]. This application note details the CRED methodology, provides protocols for its implementation, and analyzes its potential to harmonize environmental assessments within a broader thesis comparing the Klimisch and CRED paradigms.
A direct comparison of the Klimisch and CRED methods reveals fundamental differences in scope, structure, and outcome. These differences were quantitatively assessed in a comprehensive ring test involving 75 risk assessors from 12 countries [17].
Table 1: Core Structural Comparison of Klimisch and CRED Evaluation Methods
| Feature | Klimisch Method (1997) | CRED Evaluation Method (2016) |
|---|---|---|
| Primary Focus | Reliability only [17]. | Reliability and relevance as separate, equally important dimensions [12] [17]. |
| Guidance Detail | Minimal, narrative description. Highly dependent on expert judgement [12] [17]. | Extensive. 20 explicit reliability criteria and 13 relevance criteria, each with detailed guidance [17] [9]. |
| Evaluation Categories | Reliability: 1) Reliable without restrictions, 2) Reliable with restrictions, 3) Not reliable, 4) Not assignable [17]. | Reliability: Same four categories as Klimisch. Relevance: 1) Relevant without restrictions, 2) Relevant with restrictions, 3) Not relevant [17]. |
| Underlying Principle | Checklist-based, often favoring GLP (Good Laboratory Practice) and standardized guideline studies [12] [17]. | Science-based, transparent assessment. Evaluates study design, conduct, and reporting quality irrespective of GLP status [12]. |
| Outcome Transparency | Low. Provides only a final category without detailed justification [17]. | High. Requires criterion-by-criterion assessment, creating an audit trail and clear rationale for the final categorization [17]. |
The ring test demonstrated that these structural differences translate into significant practical impacts on consistency and outcome. Participants evaluated a set of eight ecotoxicity studies using both methods [17].
Table 2: Quantitative Ring Test Results Highlighting Evaluation Consistency [19] [17]
| Study Example & Description | Klimisch Method Results | CRED Method Results | Implication of CRED |
|---|---|---|---|
| Study E: GLP report on fish toxicity of estrone [19]. | High inconsistency. 44% "Reliable without restrictions," 56% "Reliable with restrictions" [19]. | More critical & consistent. 16% "Reliable without restrictions," 21% "Reliable with restrictions," 63% "Not reliable" [19]. | CRED prevented automatic acceptance of a GLP study, critically appraising its scientific merits, leading to a more stringent and unified assessment. |
| Overall Ring Test Consistency | Lower consistency among assessors. High variability in categorizations for the same study [17]. | Higher consistency. Reduced variability due to explicit criteria, minimizing subjective interpretation [17]. | Enhances the reproducibility of regulatory decisions across different institutions and countries. |
| Participant Perception | Perceived as less accurate, more dependent on subjective judgement [17]. | Perceived as more accurate, applicable, consistent, and transparent [12] [17]. | Increases confidence in the evaluation process and the resulting hazard assessments. |
The CRED method is implemented through a structured process that separates the assessment of reliability from relevance, as their definitions are fundamentally different [12]. Reliability concerns the inherent scientific quality of the study design, performance, and analysis. Relevance concerns the applicability of the study's specific characteristics (e.g., test organism, endpoint, exposure duration) to a particular regulatory question (e.g., derivation of a chronic water quality standard) [12].
The following workflow diagrams the logical sequence for applying the CRED method, from study screening to final decision for use in a regulatory assessment.
The core of the CRED method is its 33 criteria (20 for reliability, 13 for relevance). The reliability criteria are organized into six thematic categories that mirror the essential components of a well-reported study [12].
Table 3: Thematic Categories of CRED Reliability Criteria
| Category | Number of Criteria | Example Criterion & Purpose |
|---|---|---|
| Test Substance | 3 | Characterization & Concentration Verification: Ensures the tested material is properly characterized and its concentration in the test system is confirmed, which is critical for reproducibility and understanding structure-activity relationships. |
| Test Organism | 3 | Species Identification & Health Status: Confirms the correct species was used and that organisms were healthy at test initiation, affecting the sensitivity and validity of the biological response. |
| Test Design | 5 | Control Groups & Replication: Evaluates the appropriateness of control groups and the number of replicates, which are fundamental for detecting treatment-related effects and statistical power. |
| Exposure Conditions | 4 | Exposure Duration & System Stability: Assesses whether the exposure regime (static, renewal, flow-through) is appropriate and whether conditions (e.g., temperature, pH) remained stable, ensuring the reported effect is linked to a definable exposure. |
| Endpoint & Data Analysis | 3 | Endpoint Definition & Statistical Methods: Reviews the clarity of the measured endpoint and the appropriateness of the statistical tests used, which is essential for the correct interpretation of results. |
| Reporting & Documentation | 2 | Clarity of Reporting & Raw Data: Judges whether the study is described with sufficient detail to be repeated and if raw data are accessible, which is the foundation of scientific transparency and reassessment. |
The 13 relevance criteria guide the assessor in judging the study's fit-for-purpose. These include the appropriateness of the test organism's trophic level and ecological region, the biological relevance of the endpoint (e.g., mortality vs. subtle biochemical change), and the congruence of the exposure duration (acute vs. chronic) with the protection goal of the assessment [12] [17]. A study on soil invertebrates, for example, is inherently not relevant for deriving an aquatic quality standard, regardless of its high reliability [12].
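The criterion-by-criterion assessment can be represented in a simple data structure, which is also how an audit trail of ratings and comments might be captured. The sketch below is purely illustrative: the criterion names and the automatic decision rule are hypothetical simplifications, since the actual CRED worksheet relies on assessor judgment rather than a mechanical aggregation.

```python
from dataclasses import dataclass

@dataclass
class Criterion:
    name: str
    rating: str      # "met", "not met", or "not reported"
    critical: bool   # whether a failure alone undermines reliability
    comment: str = ""

def categorize_reliability(criteria):
    """Derive a reliability category from rated criteria.

    Hypothetical decision rule for illustration only: a failed critical
    criterion -> R3; an unreported critical criterion -> R4; any other
    deviation -> R2; otherwise R1.
    """
    if any(c.critical and c.rating == "not met" for c in criteria):
        return "R3 (not reliable)"
    if any(c.critical and c.rating == "not reported" for c in criteria):
        return "R4 (not assignable)"
    if any(c.rating != "met" for c in criteria):
        return "R2 (reliable with restrictions)"
    return "R1 (reliable without restrictions)"

# Hypothetical partial evaluation
criteria = [
    Criterion("Test substance identified", "met", critical=True),
    Criterion("Exposure concentrations verified analytically", "not met",
              critical=True, comment="nominal concentrations only"),
    Criterion("Appropriate controls included", "met", critical=False),
]
print(categorize_reliability(criteria))  # R3 (not reliable)
```

Whatever rule an organization adopts, recording a rating and rationale per criterion is what produces the transparency and audit trail that distinguish CRED from a single Klimisch category.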
This protocol provides a step-by-step methodology for a single assessor to evaluate an aquatic ecotoxicity study using the official CRED Excel tool [9].
Objective: To perform a transparent, consistent, and documented evaluation of the reliability and relevance of an aquatic ecotoxicity study for use in regulatory hazard assessment. Materials:
Procedure:
This protocol, modeled on the original CRED validation study, describes how an organization can validate the implementation of CRED and train assessors [17].
Objective: To measure and improve the consistency of CRED evaluations among multiple risk assessors within or across institutions. Materials:
Procedure:
Table 4: Key Research Reagent Solutions for CRED Implementation
| Tool / Resource | Function & Purpose | Source / Example |
|---|---|---|
| Official CRED Excel Evaluation Tool | The primary implementation instrument. Contains all 33 criteria with dropdown scores, comment fields, and embedded guidance to standardize the evaluation process [9]. | Freely available for download from the project website (e.g., Ecotox Centre) [9]. |
| CRED Reporting Recommendations Template | A checklist of 50 specific reporting criteria across six categories (General, Test Design, Substance, etc.). Used prospectively by researchers to ensure studies contain all information needed for a high-reliability evaluation [12]. | Provided as an Excel file alongside the evaluation tool [12]. |
| OECD / EPA Ecotoxicity Test Guidelines | Reference documents for standardized testing protocols. Essential for assessing whether a study's design aligns with accepted scientific principles, a key aspect of reliability [17]. | OECD Guidelines (e.g., 201, 210, 211). US EPA Ecological Effects Test Guidelines. |
| Reference Regulatory Guidance Documents | Provide context for relevance evaluations. Documents like the EU's Technical Guidance Document for deriving Environmental Quality Standards define protection goals and acceptable endpoints [9]. | EU TGD-EQS, REACH Guidance R.4, EMA ERA Guideline [9]. |
| Statistical Analysis Software | Used to analyze inter-assessor agreement during ring-testing and validation of the CRED implementation (e.g., Fleiss' Kappa). | R, SPSS, or dedicated online calculators. |
The successful harmonization of assessments across frameworks requires more than a superior scientific tool; it necessitates a clear pathway for integration into regulatory practice and cross-disciplinary alignment with broader quality paradigms.
Integration with Broader Quality Frameworks: The CRED lifecycle approach aligns with modern regulatory science principles championed by ICH. The Analytical Target Profile (ATP) concept from ICH Q14—defining desired performance criteria at the outset—parallels CRED's emphasis on defining assessment goals before evaluation begins [31]. The lifecycle management of analytical procedures in ICH Q2(R2)/Q14, which includes post-approval change management based on risk, mirrors the need for ongoing evaluation of ecotoxicity data sets as new studies emerge [31] [32]. Furthermore, incorporating quality risk management (ICH Q9) into the CRED process—formally assessing the risk that a study's limitations pose to the overall assessment conclusion—would be a logical and powerful enhancement, creating a direct bridge to pharmaceutical CMC and analytical quality systems [31] [30].
The CRED evaluation method represents a significant evolution from the Klimisch paradigm, offering a structured, transparent, and scientifically rigorous framework for evaluating ecotoxicity data. Its detailed criteria for both reliability and relevance directly address the major sources of inconsistency and expert bias that have hampered harmonized environmental risk assessment [12] [17]. As demonstrated through large-scale ring testing, CRED improves inter-assessor consistency and critical appraisal, reducing the automatic privileging of guideline studies and facilitating the appropriate use of peer-reviewed science [19] [17].
For the method to fully realize its potential for harmonizing assessments across regulatory frameworks, active steps toward formal adoption, training, and technological integration are required [9]. By aligning its implementation with overarching quality and lifecycle concepts from ICH, CRED can transcend ecotoxicology to serve as a model for transparent, evidence-based evaluation across regulatory science. This promises not only more consistent and protective environmental decisions but also greater efficiency and predictability for industry and regulators alike, ultimately contributing to the sustainable development and authorization of chemicals and pharmaceuticals.
The comparative analysis underscores the CRED evaluation method as a scientifically robust and practical successor to the Klimisch method, offering greater detail, transparency, and consistency in assessing ecotoxicity studies. Its structured criteria for both reliability and relevance reduce subjective bias, facilitate the inclusion of peer-reviewed literature, and promote harmonized decision-making across regulatory systems. For biomedical and clinical research, particularly in drug development, adopting CRED can lead to more reliable environmental safety profiles. Future directions should focus on the broader implementation of CRED within global guidelines, its adaptation to emerging toxicological areas (e.g., nanomaterials, endocrine disruptors), and continuous refinement based on stakeholder feedback to keep pace with scientific advancement.