CRED vs Klimisch: Advancing Reliability and Relevance in Ecotoxicity Data Evaluation

Zoe Hayes · Jan 09, 2026

Abstract

This article provides a comprehensive comparison of the Klimisch and CRED methods for evaluating the reliability and relevance of ecotoxicity studies, targeting researchers, scientists, and drug development professionals. It explores the foundational evolution from the established Klimisch method to the more detailed CRED framework, delves into their methodological application and criteria, addresses common troubleshooting and optimization strategies, and validates their performance through comparative ring test analysis. The synthesis highlights CRED's advantages in transparency, consistency, and practical utility for harmonizing chemical hazard and risk assessments across regulatory frameworks.

Foundations of Ecotoxicity Evaluation: Evolution from Klimisch to CRED

The Critical Role of Data Reliability and Relevance in Chemical Risk Assessment

The regulatory assessment of chemicals hinges on the quality of the underlying ecotoxicity data. For decades, the Klimisch method has been the cornerstone for evaluating study reliability, yet its dependence on expert judgment and lack of explicit relevance criteria have raised concerns about consistency and transparency. This has spurred the development of the Criteria for Reporting and Evaluating ecotoxicity Data (CRED) method. Framed within a broader thesis comparing these two paradigms, this article details the application and protocols that underscore the critical role of data reliability and relevance, providing researchers and drug development professionals with the tools to implement robust, science-based evaluations.

Application Notes and Protocols

The following tables synthesize key quantitative findings from a comprehensive ring test comparing the Klimisch and CRED evaluation methods[reference:0].

Table 1: Structural Characteristics of the Evaluation Methods[reference:1]

Characteristic | Klimisch Method | CRED Method
Data Type | Toxicity and ecotoxicity | Aquatic ecotoxicity
Number of Reliability Criteria | 12–14 (ecotoxicity) | 20 (evaluating), 50 (reporting)
Number of Relevance Criteria | 0 | 13
OECD Reporting Criteria Included | 14 of 37 | 37 of 37
Additional Guidance | No | Yes
Evaluation Summary | Qualitative (reliability only) | Qualitative (reliability & relevance)

Table 2: Reliability Categorization Outcomes from the Ring Test[reference:2]

Reliability Category | Klimisch Method (% of evaluations) | CRED Method (% of evaluations)
Reliable without restrictions (R1) | 8% | 2%
Reliable with restrictions (R2) | 45% | 24%
Not reliable (R3) | 42% | 54%
Not assignable (R4) | 6% | 20%

Table 3: Relevance Categorization Outcomes from the Ring Test[reference:3]

Relevance Category | Klimisch Method (% of evaluations) | CRED Method (% of evaluations)
Relevant without restrictions (C1) | 32% | 57%
Relevant with restrictions (C2) | 61% | 35%
Not relevant (C3) | 7% | 8%

Table 4: Risk Assessor Confidence in Evaluation Results[reference:4]

Confidence Level | Klimisch Method | CRED Method
"Very confident" or "Confident" | 37% | 72%

Experimental Protocol: The CRED-Klimisch Ring Test

This protocol details the two-phase ring test designed to compare the Klimisch and CRED evaluation methods[reference:5].

Study Design and Participants
  • Objective: To compare the consistency, transparency, and user perception of the Klimisch and CRED methods for evaluating the reliability and relevance of aquatic ecotoxicity studies.
  • Design: A two-phase, crossover design where each participant evaluated different studies using each method.
  • Participants: 75 risk assessors from 12 countries, representing 35 organizations including regulatory agencies, consultancies, and industry[reference:6].
  • Materials: Eight peer-reviewed ecotoxicity studies covering diverse taxonomic groups (algae, higher plants, crustaceans, fish), test designs (acute, chronic), and chemical classes[reference:7].
Phase I: Klimisch Method Evaluation (Nov-Dec 2012)
  • Assignment: Each participant was assigned two of the eight studies based on their expertise.
  • Evaluation: Participants evaluated the reliability of their assigned studies using the Klimisch method, categorizing each as:
    • R1: Reliable without restrictions.
    • R2: Reliable with restrictions.
    • R3: Not reliable.
    • R4: Not assignable.
  • Relevance Evaluation: As the Klimisch method lacks formal relevance criteria, participants used ad-hoc judgment to categorize relevance as C1, C2, or C3 (mirroring the reliability categories)[reference:8].
  • Data Submission: Completed evaluations and a feedback questionnaire were submitted.
Phase II: CRED Method Evaluation (Mar-Apr 2013)
  • Assignment: Each participant evaluated two different studies from the same set of eight.
  • Evaluation: Participants used a draft version of the CRED method, applying its 19 reliability and 11 relevance criteria (later refined to 20 and 13)[reference:9].
  • Categorization: Studies were categorized using the same R1-R4 and C1-C3 scales for direct comparison.
  • Questionnaire: Participants completed a detailed questionnaire on their perception of the method's accuracy, consistency, transparency, and the confidence they had in their evaluations[reference:10].
Data Analysis
  • Consistency Analysis: The proportion of agreement among assessors for each criterion and overall category was calculated.
  • Statistical Comparison: Non-parametric tests (Wilcoxon rank-sum) were used to compare categorization outcomes between methods[reference:11].
  • Perception Analysis: Responses to perception statements and confidence levels were analyzed using Chi-square tests.
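
The consistency and rank-sum steps above can be sketched in a few lines of pure Python. The categorizations below are hypothetical placeholders (not ring-test data), and the tie correction of the rank-sum variance is omitted for brevity:

```python
from collections import Counter
import math

# Hypothetical ordinal reliability categories (R1=1 ... R4=4) assigned by
# twelve assessors to the same study under each method -- illustrative only.
klimisch = [1, 2, 2, 2, 3, 2, 1, 3, 2, 4, 2, 3]
cred     = [2, 3, 3, 2, 3, 4, 2, 3, 3, 4, 2, 3]

def modal_agreement(scores):
    """Proportion of assessors agreeing with the most common category."""
    return max(Counter(scores).values()) / len(scores)

def rank_sum_z(x, y):
    """Wilcoxon rank-sum z-statistic, using average ranks for ties
    (the tie correction of the variance is omitted for brevity)."""
    combined = sorted(x + y)
    ranks, i = {}, 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        ranks[combined[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j
    w = sum(ranks[v] for v in x)              # rank sum of the first sample
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean) / sd
```

Encoding the R1–R4 categories as ordinals 1–4 is what makes a rank-based test applicable; a negative z-statistic here would indicate that the first method assigns systematically lower (more favorable) categories than the second.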
Visualization of Workflows and Processes

Ring Test Workflow for Method Comparison: Start with 75 risk assessors and 8 ecotoxicity studies → Phase I (Klimisch method): assign two studies per assessor, evaluate reliability and relevance, submit categorizations (R1-R4, C1-C3) → Phase II (CRED method): assign two different studies, apply 30+ detailed criteria, submit categorization and questionnaire → Data analysis: calculate consistency rates, statistical comparison (Wilcoxon test), analyze perception and confidence → Result: comprehensive comparison of consistency, transparency, and user preference.

CRED Evaluation Process for a Single Study: Start the evaluation → in parallel, apply the 20 reliability criteria (e.g., test substance characterization, exposure verification, statistical analysis) and the 13 relevance criteria (e.g., appropriateness of test organism, exposure regime, measured endpoint) → categorize reliability (R1, R2, R3, or R4) and relevance (C1, C2, C3, or C4) → generate an integrated evaluation summary for hazard/risk assessment decision-making.

This table lists key tools and resources required for implementing rigorous reliability and relevance evaluations in chemical risk assessment.

Tool/Resource | Function & Purpose | Key Features / Examples
CRED Evaluation Checklist | Provides the structured criteria for evaluating study reliability (20 items) and relevance (13 items). Ensures transparent and consistent assessments. | Available as Excel-based assessment sheets from the SciRAP platform[reference:12].
OECD Test Guidelines (TGs) | Define standardized test protocols for generating ecotoxicity data. Serve as the benchmark for evaluating methodological reliability. | e.g., TG 201 (Algal growth inhibition), TG 210 (Fish early-life stage), TG 211 (Daphnia reproduction).
Statistical Analysis Software | Enables the statistical comparison of evaluation outcomes and analysis of ring-test or validation study data. | R or Python packages for non-parametric tests (e.g., Wilcoxon rank-sum), consistency analysis, and data visualization.
Good Laboratory Practice (GLP) Standards | Define a quality system for the organizational process and conditions under which non-clinical safety studies are planned, performed, and reported. | While not a guarantee of reliability, GLP compliance is a weighted factor in many evaluation frameworks.
Reference Databases & Reporting Standards | Provide access to published ecotoxicity studies and define minimum reporting requirements to ensure all necessary information is available for evaluation. | Databases like ECOTOX (US EPA); reporting standards like the ARRIVE guidelines for in vivo studies.
Specialized CRED Extensions | Tailor the evaluation framework for novel data types or specific environmental compartments. | NanoCRED (for nanomaterials), EthoCRED (for behavioral studies), CRED for sediment and soil studies[reference:13].

In the mid-1990s, the regulatory landscape for chemical safety, particularly under the European Union's Existing Substances Regulation, faced a significant challenge: the need for a harmonized, systematic approach to evaluate the vast and inconsistent body of toxicological and ecotoxicological data submitted by industry [1]. Prior to 1997, assessments relied heavily on unstructured expert judgment, leading to potential inconsistencies and a lack of transparency in how data influenced regulatory decisions [2]. It was within this context that scientists H.J. Klimisch, M. Andreae, and U. Tillmann from BASF AG introduced a seminal framework. Their 1997 paper, "A systematic approach for evaluating the quality of experimental toxicological and ecotoxicological data," proposed a standardized scoring system designed to categorize the reliability of individual studies [3] [4]. The Klimisch method was developed explicitly to fulfill regulatory obligations, providing a structured process for assessing data for entry into the IUCLID database (International Uniform Chemical Information Database), which was becoming the central repository for regulatory information on chemicals [1]. Its primary objective was to bring clarity and consistency to the hazard assessment process, enabling regulators and industry scientists to distinguish between high-quality guideline studies and those with methodological limitations [3].

Core Methodology: The Four-Tiered Scoring System

The Klimisch method's innovation lies in its simplified, four-category classification system for judging study reliability. The assignment is based on a study's adherence to international testing standards (like OECD guidelines), Good Laboratory Practice (GLP), and the completeness of its documentation [5] [6].

Table 1: The Klimisch Scoring Categories and Criteria

Klimisch Score | Category Title | Core Assignment Criteria
1 | Reliable without restriction | Studies performed according to internationally accepted testing guidelines (e.g., OECD, EPA), preferably under GLP, or where all parameters are closely comparable to a guideline method [5] [6].
2 | Reliable with restriction | Studies where test parameters do not fully comply with a specific guideline but are sufficiently documented and scientifically acceptable. This also includes well-documented non-guideline studies and accepted calculation methods [5] [6].
3 | Not reliable | Studies with significant methodological deficiencies, use of irrelevant test systems, or insufficient documentation that precludes a credible assessment [5].
4 | Not assignable | Studies where experimental details are absent, such as those found only in short abstracts or secondary literature sources [5].

In regulatory practice, only studies scoring 1 or 2 are typically considered sufficient on their own to fulfill a data requirement for a specific hazard endpoint [5] [6]. Studies scored as 3 (Not reliable) or 4 (Not assignable) may still be used in a supporting role or as part of a "weight of evidence" approach but cannot be the primary basis for a decision [5].

The method also introduced important definitions for reliability, relevance, and adequacy. Reliability was defined as "the inherent quality of a test report... relating to preferably standardized methodology and the way the experimental procedure and results are described to give evidence of the clarity and plausibility of the findings" [7]. This definition intrinsically links reliability to standardized methods and reporting completeness.

Klimisch evaluation workflow: Begin with the available toxicological study. Was it performed according to an international guideline (e.g., OECD)? If yes (GLP preferred), assign Score 1 (Reliable without restriction). If no, is it well-documented and scientifically acceptable? If yes, assign Score 2 (Reliable with restriction). If no, is there sufficient documentation for assessment? If yes but with fatal flaws, assign Score 3 (Not reliable); if no, assign Score 4 (Not assignable).

Diagram 1: Klimisch Method Evaluation Workflow

Initial Adoption and Evolution into a Regulatory Backbone

Following its publication, the Klimisch method was rapidly adopted as the de facto standard within emerging EU regulatory frameworks. Its integration was driven by a clear need for efficiency and harmonization. The method became formally recommended in the Technical Guidance for the REACH regulation (Registration, Evaluation, Authorisation and Restriction of Chemicals), which required registrants to evaluate and report the quality of all submitted study data [1]. Its structured approach allowed for the consistent organization of thousands of studies within the IUCLID database, making hazard assessments "scientifically valid, repeatable and... consistent across substances" [1].

To address criticisms about the lack of detailed guidance for applying the Klimisch categories, supporting tools were developed. The most notable is the ToxRTool (Toxicological data Reliability Assessment Tool), an Excel-based checklist released by the European Centre for the Validation of Alternative Methods (ECVAM) [6] [7]. ToxRTool breaks down the evaluation into specific criteria across areas like test substance identification, study design, and results documentation, providing a more transparent and consistent path to assigning a Klimisch score [7].

The method's influence also expanded beyond its original scope. Researchers proposed adaptations to systematically evaluate the quality of human epidemiological data, creating a parallel framework that mirrored the Klimisch categories to allow for the combined assessment of human and animal studies within weight-of-evidence evaluations [1].

The Klimisch Method in Modern Context: Comparison with CRED

The Klimisch method's historical role as the backbone of data evaluation is now critically examined in comparison to newer, more detailed frameworks like the Criteria for Reporting and Evaluating Ecotoxicity Data (CRED) developed in 2016 [8].

CRED was developed explicitly to address perceived shortcomings in the Klimisch method, which were highlighted through ring-tests and scholarly critique [9] [2]. Key criticisms of the Klimisch method include its over-reliance on GLP and guideline adherence, which can cause assessors to overlook methodological flaws in guideline studies or undervalue sound non-guideline research [2]. It was also criticized for providing insufficient guidance, leading to inconsistencies between evaluators, and for lacking explicit criteria to assess a study's relevance (its appropriateness for a specific hazard assessment) [8] [2].

Table 2: Comparison of the Klimisch Method and the CRED Framework

Feature | Klimisch Method (1997) | CRED Framework (2016)
Primary Scope | Toxicological & ecotoxicological data [3]. | Aquatic ecotoxicity data (with extensions for nano, behavior, sediment) [8] [10].
Evaluation Dimensions | Reliability (4 categories) [5]. | Reliability (20 criteria) & relevance (13 criteria) [8] [2].
Guidance Detail | Limited, high-level criteria [2]. | Extensive guidance for each criterion [8].
Bias Toward GLP/Guidelines | High; GLP/guideline studies are favored for top scores [2]. | Reduced; evaluates methodological soundness independent of formal compliance [9].
Outcome Transparency | Single category score [5]. | Detailed qualitative summary of strengths/weaknesses in reliability and relevance [2].
Regulatory Status | Embedded in REACH guidance & IUCLID [1] [5]. | Piloted in EU EQS revisions and scientific databases (e.g., NORMAN) [9].

A major ring test involving 75 risk assessors from 12 countries found that participants perceived the CRED method as more accurate, consistent, transparent, and less dependent on subjective expert judgment than the Klimisch method [8] [2]. CRED's structured criteria aim to make the evaluation process more reproducible and its conclusions more transparent.

Conceptual evolution from Klimisch to CRED: The Klimisch method (1997) was built on harmonization and standardization, reliability defined as adherence to standards (GLP/OECD), and regulatory efficiency. The CRED framework (2016) rests on transparency and reproducibility, detailed method assessment beyond GLP, and explicit relevance evaluation. The evolutionary direction runs from general to specific criteria, from implicit to explicit guidance, and from reliability only to reliability and relevance.

Diagram 2: Conceptual Evolution from Klimisch to CRED

Application Notes & Protocols

Protocol: Performing a Klimisch Evaluation for Regulatory Submission

Purpose: To systematically assign a reliability score to a toxicological/ecotoxicological study for inclusion in a regulatory dossier (e.g., REACH, biocides).

Materials & Tools:

  • Full study report or publication.
  • Relevant OECD, EPA, or other applicable test guideline.
  • ToxRTool (recommended for in vivo/in vitro studies) or institutional score sheet [6] [7].
  • IUCLID 6 database fields for recording rationale [6].

Procedure:

  • Document Review: Obtain the complete study document. Abstracts or secondary summaries alone are insufficient and typically lead to a Score 4.
  • Guideline Compliance Check:
    • Determine if the study was conducted according to a specific OECD, EPA, or equivalent international testing guideline.
    • Verify if it was performed under Good Laboratory Practice (GLP).
    • If fully compliant with a guideline (GLP preferred), assign Score 1.
  • Scientific Assessment (if not fully guideline-compliant):
    • Evaluate if the methodology is well-documented and scientifically sound (e.g., appropriate controls, dose selection, statistical analysis).
    • Assess if any deviations from guidelines are justified and do not invalidate the core findings.
    • If the study is deemed scientifically acceptable despite restrictions, assign Score 2.
  • Deficiency Identification:
    • Identify any fatal methodological flaws (e.g., contaminated controls, unsuitable test system, exposure route irrelevant to human/environmental scenario).
    • If such flaws are present and undermine the study's conclusions, assign Score 3.
  • Documentation and Rationale:
    • In the corresponding IUCLID field or assessment report, clearly document the rationale for the assigned score, referencing the specific criteria met or violated.
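
The decision points in the procedure above can be condensed into a small sketch. The four boolean inputs are a deliberate simplification of the judgment calls an assessor actually makes, so treat this as an illustration of the scoring order, not a replacement for expert review:

```python
def klimisch_score(full_report: bool, guideline_compliant: bool,
                   well_documented: bool, fatal_flaws: bool) -> int:
    """Map the protocol's yes/no decision points to a Klimisch score (1-4)."""
    if not full_report:          # only an abstract or secondary summary
        return 4                 # Not assignable
    if guideline_compliant:      # OECD/EPA guideline study, GLP preferred
        return 1                 # Reliable without restriction
    if fatal_flaws:              # e.g., contaminated controls, unsuitable test system
        return 3                 # Not reliable
    if well_documented:          # scientifically acceptable despite restrictions
        return 2                 # Reliable with restriction
    return 4                     # documentation insufficient for assessment
```

A well-documented non-guideline study without fatal flaws lands in Score 2, matching the protocol's scientific-assessment branch.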

The Scientist's Toolkit: Essential Resources for Study Evaluation

Tool/Resource | Function in Evaluation | Key Features & Notes
OECD Guidelines for the Testing of Chemicals | The international benchmark for test methodology. Used as the primary reference to assess study design compliance [5]. | Provides detailed protocols for specific endpoints (e.g., acute toxicity, repeated dose).
ToxRTool (ECVAM) | An Excel-based tool to standardize the Klimisch scoring process for in vivo and in vitro studies [6] [7]. | Uses weighted criteria checklists to generate a score. Improves consistency between evaluators.
IUCLID Database | The regulatory data management system for chemicals under REACH and other frameworks. The Klimisch score is a mandatory field for each study record [1] [5]. | Ensures evaluation rationale is systematically archived and reviewed by authorities.
CRED Evaluation Method (Excel Tool) | A modern, criteria-based tool for evaluating aquatic ecotoxicity studies. Useful as a more detailed reference even when a Klimisch score is required [9] [10]. | Contains 20 reliability and 13 relevance criteria with extensive guidance.
GLP Principles (OECD Series) | Defines standards for organizational process and study conditions. Not a measure of scientific validity, but a key indicator of process quality for regulators [5]. | Focuses on data traceability, quality assurance, and reporting integrity.

The Klimisch method emerged in 1997 as a pragmatic, harmonizing solution to an urgent regulatory need. By introducing a simple, four-tiered scoring system focused on reliability, it brought unprecedented structure to chemical hazard assessment and became deeply embedded in EU regulations like REACH [1] [5]. Its historical role as the backbone of data evaluation is undeniable. However, the evolution of best practices in scientific assessment has revealed its limitations, particularly regarding transparency, detail, and the separation of reliability from relevance. The development of the CRED framework represents a direct and evidence-based response to these limitations, offering a more granular, transparent, and consistent approach for the modern era [8] [2]. For today's researcher, understanding the Klimisch method is essential for navigating historical data and existing regulatory systems, while familiarity with CRED principles points toward the future of more robust and reproducible ecotoxicological risk assessment.

The evaluation of toxicological and ecotoxicological data is a cornerstone of regulatory hazard and risk assessment for chemicals, pharmaceuticals, and other substances [11]. For over two decades, the method proposed by Klimisch and colleagues in 1997 has served as the de facto standard for assessing study reliability within many regulatory frameworks, notably the European Union's REACH regulation [5] [12]. The Klimisch method assigns studies to one of four categories: "Reliable without restriction" (1), "Reliable with restriction" (2), "Not reliable" (3), and "Not assignable" (4) [5] [6]. Generally, only studies scoring 1 or 2 are used to cover an endpoint independently, while scores of 3 or 4 may serve as supporting evidence [5].

Despite its widespread adoption, the Klimisch method has faced sustained and growing criticism from the scientific community. Critiques center on its inconsistency among assessors, lack of detailed operational guidance, and an inherent bias favoring studies conducted under Good Laboratory Practice (GLP) [5] [11] [12]. These shortcomings can introduce subjectivity, limit the use of valuable peer-reviewed science, and ultimately compromise the transparency and robustness of regulatory decisions [11] [7].

In response, alternative, more structured evaluation tools have been developed. The most prominent is the Criteria for Reporting and Evaluating ecotoxicity Data (CRED) method [11] [12]. Developed through a collaborative, international effort, CRED aims to provide a transparent, consistent, and detailed framework for evaluating both the reliability and relevance of aquatic ecotoxicity studies [12]. This article details the emerging criticisms of the Klimisch method, contrasts its framework with the CRED approach, and provides application protocols to guide researchers and risk assessors in implementing robust study evaluation practices.

Core Criticisms of the Klimisch Method

The criticisms of the Klimisch method can be organized into three principal, interconnected deficiencies that undermine its utility for modern, evidence-based risk assessment.

Inconsistency and Subjectivity in Evaluation

The Klimisch method provides only broad category descriptions, lacking a concrete checklist of criteria to assess [5] [11]. This absence of standardized operational guidance places a heavy burden on expert judgment, leading to low inter-assessor consistency. A study may be categorized as "reliable with restrictions" by one assessor and "not reliable" by another, creating significant discrepancies in the data used for hazard characterization [11]. This inconsistency directly threatens the reproducibility and fairness of regulatory outcomes, as the same evidence base can yield different conclusions depending on the assessor [12].

Lack of Detailed Guidance and Criteria

The method's original formulation is brief, offering minimal guidance on how to interpret its categories or evaluate specific study elements [7]. It does not explicitly assess fundamental study design criteria such as randomization, blinding, sample size calculation, or statistical power [5]. Furthermore, it entirely omits a structured process for evaluating a study's relevance—the extent to which its design and findings are appropriate for a specific regulatory question (e.g., assessing chronic effects of an endocrine disruptor using an acute mortality test) [11] [12]. This conflation or neglect of relevance forces assessors to rely on unspecified personal judgment.

Bias Towards Good Laboratory Practice (GLP)

The Klimisch method explicitly prioritizes studies performed "according to generally valid and/or internationally accepted testing guidelines (preferably performed according to GLP)" [5]. This has been criticized for institutionalizing a bias that equates procedural compliance with scientific validity [11] [12]. A GLP-compliant guideline study can receive a high reliability score even if it contains fundamental scientific flaws (e.g., excessive control mortality) [5] [11]. Conversely, a well-designed and thoroughly reported peer-reviewed study may be downgraded solely for not following GLP [12]. This bias can marginalize independent academic research and restrict the evidence base primarily to industry-submitted studies [11].

Comparative Analysis: Klimisch Method vs. CRED Evaluation

The CRED method was developed explicitly to address the flaws in the Klimisch approach. A comparative analysis highlights fundamental differences in structure, application, and outcome.

Table 1: Foundational Comparison of the Klimisch and CRED Evaluation Methods

Aspect | Klimisch Method | CRED Evaluation Method
Primary Purpose | Assign a reliability category for regulatory acceptance [5]. | Evaluate reliability and relevance separately to support transparent risk assessment [12].
Core Components | Four reliability categories (1-4) with brief descriptive definitions [5] [6]. | 20 reliability criteria and 13 relevance criteria, each with detailed guidance [12].
Evaluation of Relevance | Not formally addressed [11]. | Structured evaluation across 13 criteria (e.g., test organism, endpoint, exposure scenario) [12].
Basis for Decision | High-level expert judgment based on adherence to guidelines/GLP [5]. | Systematic assessment against predefined, detailed criteria [11].
Tool Format | Descriptive text; often applied via tools like ToxRTool [6]. | Criteria checklist with extensive guidance; supported by worksheets [12].
Handling of GLP | GLP/guideline compliance is a primary determinant of high reliability [5]. | GLP is one factor among many; scientific rigor and reporting quality are paramount [11].

Empirical evidence from a major international ring test underscores the practical impact of these differences. The test involved 75 risk assessors from 12 countries evaluating ecotoxicity studies using both methods [11].

Table 2: Key Quantitative Findings from the CRED-Klimisch Ring Test [11]

Evaluation Metric | Finding | Implication
Inter-assessor Consistency | Lower consistency with Klimisch; higher agreement among assessors when using the structured CRED criteria. | CRED reduces subjectivity and promotes harmonized assessments.
Perceived Accuracy & Transparency | A majority of ring test participants perceived CRED as more accurate and transparent. | Structured criteria build confidence in the evaluation process and its conclusions.
Dependence on Expert Judgement | CRED was perceived as less dependent on subjective expert judgement. | Reduces variability and potential for unconscious bias.
Practicality | CRED was rated as practical regarding time and use of criteria. | A more detailed method can be efficient and user-friendly.
Study Categorization Outcome | Evaluations could lead to different final classifications for the same study. | The choice of method can directly alter the data entering a risk assessment.

Detailed Application Notes and Experimental Protocols

Protocol for Applying the Klimisch Method (with ToxRTool)

Given the Klimisch method's lack of native guidance, the ToxRTool (Toxicological data Reliability Assessment Tool) is frequently used as an operational intermediary [6] [7]. The following protocol is recommended for a standardized assessment.

Objective: To assign a Klimisch reliability score (1-4) to a toxicological or ecotoxicological study report or publication.

Materials: Study to be evaluated; ToxRTool (Excel-based tool for in vivo or in vitro studies).

Procedure:

  • Tool Selection: Download and open the appropriate ToxRTool module (in vivo or in vitro) from the EU Reference Laboratory for alternatives to animal testing (EURL ECVAM) [5].
  • Initial Screening: Read the study thoroughly. Confirm it falls within the tool's scope (empirical toxicological data).
  • Criteria Assessment: Navigate through the ToxRTool's tabs (e.g., Test Substance, Test System, Study Design). For each of the ~21 criteria, answer the binary (Yes/No) or multiple-choice questions based solely on the information reported in the study. The tool contains "red" (critical) and "white" (standard) criteria [7].
  • Score Calculation: The tool automatically calculates a weighted score based on your responses, with red criteria carrying more weight.
  • Category Assignment: Based on the final score, ToxRTool suggests a Klimisch category:
    • ≥ 80% of total points: Reliable without restrictions (1).
    • 50-79%: Reliable with restrictions (2).
    • <50%: Not reliable (3).
    • The tool may also indicate "Not assignable" (4) if information is profoundly lacking.
  • Documentation: Record the final Klimisch score and the key reasons for any deductions (e.g., "control mortality not reported," "test substance characterization insufficient") as required by databases like IUCLID [6].
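
The threshold mapping in the category-assignment step can be written out directly. The percentage cut-offs come from the text above; the handling of unmet "red" criteria is a simplification (here they block categories 1 and 2), since the real ToxRTool encodes that logic in its worksheet, and "Not assignable" (4) is flagged separately when information is lacking rather than derived from a percentage:

```python
def suggested_category(points_earned: float, points_possible: float,
                       all_red_met: bool = True) -> int:
    """Suggest a Klimisch category from a ToxRTool-style percentage score."""
    pct = 100 * points_earned / points_possible
    if not all_red_met:   # simplification: an unmet critical ("red")
        return 3          # criterion rules out categories 1 and 2
    if pct >= 80:
        return 1          # Reliable without restrictions
    if pct >= 50:
        return 2          # Reliable with restrictions
    return 3              # Not reliable
```

For example, 17 of 21 points (~81%) maps to category 1, while 12 of 21 (~57%) maps to category 2.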

ToxRTool workflow: Select the study → choose the ToxRTool module (in vivo / in vitro) → conduct initial study screening → assess the ~21 criteria → the tool calculates a weighted score → assign the Klimisch category based on the score threshold → document the score and rationale.

Workflow for Klimisch Scoring Using the ToxRTool Intermediary

Protocol for Applying the CRED Evaluation Method

The CRED method involves separate, parallel evaluations of reliability and relevance.

Objective: To transparently evaluate and document the reliability and relevance of an aquatic ecotoxicity study for a defined regulatory purpose.

Materials: Study to be evaluated; CRED evaluation worksheet (Excel); CRED guidance document [12].

Procedure:

  • Define Assessment Context: Clearly state the regulatory purpose (e.g., "Derivation of a PNEC for freshwater under REACH"). This frames the relevance evaluation.
  • Reliability Evaluation (20 Criteria):
    • Using the worksheet, assess each of the 20 reliability criteria (e.g., "Test concentrations are reported and appropriate," "Control performance is acceptable").
    • For each criterion, choose: Fully addressed (Y), Partially addressed (P), Not addressed (N), or Not applicable (NA).
    • Provide a brief text justification for each score, referencing specific lines/sections in the study.
    • Do not combine criteria into an overall numerical score. The strength/weakness profile is more informative than a composite number [7].
  • Relevance Evaluation (13 Criteria):
    • Evaluate the 13 relevance criteria (e.g., "Appropriateness of test organism," "Exposure duration relative to the assessment endpoint") against the defined assessment context from Step 1.
    • Use the same scoring scheme (Y/P/N/NA) and provide justifications.
    • This determines if the study is fit for purpose, independent of its reliability.
  • Integration and Conclusion:
    • Summarize the major strengths and weaknesses from both evaluations.
    • Conclude on the study's usability for the defined purpose. A study can be:
      • Usable: High reliability and high relevance.
      • Usable with limitations: Minor issues in reliability/relevance.
      • Supporting only: Major limitations in one or both domains.
      • Not usable: Fatally flawed or irrelevant.

[Workflow diagram: Start: Define Regulatory Assessment Context → parallel evaluation paths: Evaluate 20 Reliability Criteria (Y/P/N/NA + Justification) and Evaluate 13 Relevance Criteria (Y/P/N/NA + Justification) → Integrate Findings: Profile Strengths & Weaknesses → Conclude on Study Usability for Defined Purpose → Documented, Transparent Evaluation]

Dual-Path Evaluation Workflow of the CRED Method

The Scientist's Toolkit: Essential Research Reagents and Solutions

Conducting and evaluating toxicological studies requires both physical reagents and methodological tools. Below is a table of key solutions for professionals in this field.

Table 3: Key Research Reagent Solutions for Toxicology Study Execution and Evaluation

| Tool/Reagent Category | Specific Example/Name | Primary Function in Research/Evaluation |
|---|---|---|
| Evaluation Framework | CRED Evaluation Worksheets [12] | Provides the structured checklist and guidance for performing transparent reliability and relevance assessments. |
| Evaluation Framework | ToxRTool [5] [6] [7] | An Excel-based tool that operationalizes Klimisch scoring with defined criteria, aiding consistency. |
| Evaluation Framework | SciRAP Tool [7] | An online resource for evaluating and reporting in vivo (eco)toxicity studies, promoting structured assessment. |
| Reporting Guideline | CRED Reporting Recommendations [12] | A checklist of 50 criteria across 6 categories to guide researchers in publishing regulatory-ready studies. |
| In Silico Predictive Tool | QSAR Models (e.g., VEGA, EPI Suite) [13] | Provides predicted data for persistence, bioaccumulation, and toxicity to fill gaps, especially for data-poor substances. |
| Reference Management | IUCLID Database [5] | The standard software for compiling, evaluating, and submitting chemical data under REACH; includes Klimisch score fields. |
| Regulatory Guideline | OECD Test Guidelines [5] [11] | Internationally agreed test methods defining standard protocols for generating reliable safety data. |
| Quality System | Good Laboratory Practice (GLP) [14] | A quality management system ensuring studies are planned, performed, monitored, and reported to high standards of traceability. |

The Klimisch method played a historic role in introducing systematic thinking to toxicological data evaluation. However, its structural deficiencies—inconsistency, lack of guidance, and GLP bias—render it inadequate for contemporary demands for transparency, reproducibility, and comprehensive evidence integration in regulatory science [11] [12]. Empirical evidence demonstrates that the CRED evaluation method effectively mitigates these shortcomings by providing a detailed, criteria-based framework that separately assesses reliability and relevance [11].

For researchers, adopting the CRED reporting recommendations during study design and publication increases the likelihood that their work will meet regulatory standards [12]. For risk assessors and drug development professionals, transitioning from the Klimisch method to structured tools like CRED or SciRAP is a critical step toward more robust, objective, and defensible hazard and risk assessments. This evolution supports a broader and more rigorous evidence base, ultimately leading to better-informed decisions that protect human health and the environment.

The regulatory assessment of chemicals hinges on the robust evaluation of ecotoxicity studies. For decades, the Klimisch method, established in 1997, served as the cornerstone for determining study reliability within frameworks like REACH and the Water Framework Directive [2]. It categorizes studies as "reliable without restrictions," "reliable with restrictions," "not reliable," or "not assignable" [2]. However, mounting evidence has revealed critical shortcomings in this approach. Its limited criteria and lack of guidance for relevance evaluation lead to inconsistent assessments that depend heavily on individual expert judgment [2]. This inconsistency can directly impact risk assessments, potentially resulting in underestimated environmental hazards or unnecessary mitigation costs [2].

The Criteria for Reporting and Evaluating ecotoxicity Data (CRED) project emerged from a 2012 initiative to address these deficiencies [2]. Its genesis was driven by the need for a transparent, science-based, and harmonized method that could systematically evaluate both the reliability and relevance of aquatic ecotoxicity studies. Developed through international collaboration and expert consultation, CRED aims to strengthen the consistency and robustness of environmental hazard assessments, thereby supporting more reliable regulatory decisions [9]. This document provides detailed application notes and experimental protocols for implementing the CRED method, framed within a broader research thesis comparing it to the traditional Klimisch approach.

Comparative Methodologies: Klimisch vs. CRED

The fundamental difference between the Klimisch and CRED methods lies in their structure, granularity, and scope. The following table summarizes their core characteristics.

Table 1: Core Characteristics of the Klimisch and CRED Evaluation Methods

| Characteristic | Klimisch Method | CRED Method |
|---|---|---|
| Primary Focus | Reliability of toxicity/ecotoxicity studies [2]. | Reliability and relevance of aquatic ecotoxicity studies [2]. |
| Number of Reliability Criteria | 12-14 for ecotoxicity studies [2]. | 20 detailed evaluation criteria [2] [9]. |
| Number of Relevance Criteria | 0 (not formally addressed) [2]. | 13 specific criteria [2] [9]. |
| Guidance Provided | Limited, qualitative summary [2]. | Comprehensive guidance for each criterion [2]. |
| Basis for Evaluation | General expert judgement based on broad categories. | Specific, documented criteria aligned with OECD test guideline reporting requirements [2]. |
| Outcome | A single reliability category (e.g., "reliable with restrictions"). | Separate, qualitative summaries for reliability and relevance, with documented justification [2]. |

The CRED method's enhanced detail is visualized in the following workflow, which outlines the sequential steps for a comprehensive study evaluation.

[Workflow diagram: Start Evaluation → Select Ecotoxicity Study for Assessment → Apply 20 Reliability Criteria → Summarize Reliability (High/Medium/Low) → Apply 13 Relevance Criteria → Summarize Relevance (High/Medium/Low) → Determine Final Assessment Category → Documented Evaluation Complete]

CRED Evaluation Workflow

Experimental Protocol: The CRED Ring Test Methodology

The comparative efficacy of the CRED and Klimisch methods was empirically validated through a formal, two-phase international ring test [2]. The following protocol details the experimental design and procedure.

Protocol: International Ring Test for Method Comparison

Objective: To characterize differences in study categorization, consistency, and user perception between the Klimisch and CRED evaluation methods.

Materials & Resources:

  • Eight peer-reviewed ecotoxicity studies: Covering different taxonomic groups (algae, crustaceans, fish, higher plants), chemical classes (industrial, biocide, pharmaceutical), and endpoints (EC50, NOEC) [2].
  • Evaluation Materials: Klimisch method guidelines and the draft CRED evaluation matrix [2].
  • Participant Cohort: 75 risk assessors from 12 countries with expertise in regulatory ecotoxicology [2].

Procedure:

Phase I – Klimisch Evaluation (November-December 2012):

  • Randomly assign each participant two distinct studies from the pool of eight, ensuring no institutional overlap for a given study.
  • Instruct participants to evaluate the reliability of their assigned studies using the standard Klimisch method. Relevance is evaluated ad-hoc if required by the participant's usual practice.
  • Collect completed evaluations, including the final Klimisch score (1-4) and any supporting notes.

Phase II – CRED Evaluation (March-April 2013):

  • Assign each participant two new studies from the same pool, different from their Phase I assignments, again preventing institutional overlap.
  • Provide participants with the draft CRED evaluation tool, consisting of the 20 reliability and 13 relevance criteria with guidance [2].
  • Instruct participants to evaluate both the reliability and relevance of their assigned studies using the CRED criteria.
  • Collect completed CRED evaluation matrices.

Data Analysis:

  • Consistency Calculation: For each study and method, calculate the percentage agreement among all evaluators on the final categorization (e.g., reliable vs. not reliable for Klimisch; high/medium/low for CRED dimensions).
  • Performance Metrics: Compare the time required to complete evaluations and the perceived practicality of each method via participant questionnaire.
  • Categorization Comparison: Map CRED outcomes to Klimisch categories to analyze shifts in study acceptability for regulatory use.
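The consistency calculation above can be expressed as simple percentage agreement: the share of evaluators who assigned the modal (most frequent) category for a study. The sketch below uses invented evaluator data purely to show the arithmetic; it is not ring test data.

```python
# Percentage agreement among evaluators on a study's final category,
# as described in the Data Analysis step. Evaluator data are invented.

from collections import Counter

def percent_agreement(categories: list) -> float:
    """Share of evaluators (in %) assigning the most common category."""
    counts = Counter(categories)
    return 100.0 * counts.most_common(1)[0][1] / len(categories)

# Hypothetical: ten evaluators categorize the same study under each method
klimisch = ["R2", "R2", "R3", "R2", "R3", "R2", "R1", "R3", "R2", "R2"]
cred     = ["R3", "R3", "R3", "R2", "R3", "R3", "R3", "R3", "R2", "R3"]

print(percent_agreement(klimisch))  # 60.0
print(percent_agreement(cred))      # 80.0
```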

Results and Comparative Analysis

The ring test yielded quantitative and qualitative data demonstrating the advantages of the CRED framework. Key results are consolidated below.

Table 2: Key Quantitative Outcomes from the CRED-Klimisch Ring Test

| Metric | Klimisch Method Results | CRED Method Results | Implication |
|---|---|---|---|
| Participant Consensus | Lower inter-evaluator agreement on study reliability [2]. | Higher consistency among evaluators for both reliability and relevance judgments [2]. | CRED reduces subjective interpretation, promoting harmonization. |
| Evaluation Scope | Focused solely on reliability; relevance not systematically assessed [2]. | Comprehensive assessment of reliability (20 criteria) and relevance (13 criteria) [2]. | Enables more scientifically robust and fit-for-purpose study selection. |
| Bias Toward GLP | Strong preference for GLP (Good Laboratory Practice) studies, potentially overlooking flaws [2]. | Criteria-driven assessment that evaluates methodological soundness regardless of GLP status [2]. | Facilitates the inclusion of high-quality peer-reviewed literature, expanding the data pool. |
| Perceived Utility | Viewed as dependent on expert judgement [2]. | Perceived as more accurate, transparent, and practical by a majority of ring test participants [2]. | Higher user acceptance and confidence in evaluation outcomes. |

The following diagram illustrates the conceptual shift from the Klimisch method's linear, judgment-based process to CRED's structured, criteria-based analysis.

[Diagram: Expert-driven Klimisch process: Study → Expert Judgement → Single Reliability Category → Go/No-Go Decision. Criteria-driven CRED process: Study → 20 Reliability Criteria → Reliability Summary and, in parallel, 13 Relevance Criteria → Relevance Summary → Synthesized Assessment → Informed Decision.]

Methodological Comparison: Expert vs. Criteria-Driven

Application Notes: Implementing CRED in Regulatory and Research Contexts

Protocol: Conducting a CRED Evaluation for a Single Study

Objective: To perform a standardized, transparent evaluation of the reliability and relevance of an aquatic ecotoxicity study.

Materials:

  • CRED Evaluation Tool: The official Excel-based checklist containing the 20 reliability and 13 relevance criteria [9].
  • Study Manuscript: The complete peer-reviewed publication or study report to be evaluated.
  • Relevant Test Guideline: The appropriate OECD (e.g., OECD 210 for fish early-life stage) or EPA test guideline corresponding to the study design.

Procedure:

  • Documentation: Record study identifier, test substance, organism, and endpoint.
  • Reliability Assessment (Criteria 1-20): For each criterion (e.g., "Test concentrations are reported and confirmable"), select "Yes," "No," or "Not Applicable." Provide a brief justification for each response based on the study text. This covers experimental design, chemical analysis, statistical methods, and reporting clarity.
  • Reliability Summary: Based on the pattern of responses, assign an overall reliability rating: High (most criteria met, no critical flaws), Medium (some deficiencies not affecting core conclusions), or Low (critical flaws present).
  • Relevance Assessment (Criteria A-M): Evaluate the study's relevance for a specific assessment context (e.g., deriving a Water Framework Directive EQS). Criteria cover ecological representativeness, exposure pathway, endpoint sensitivity, and environmental realism.
  • Relevance Summary: Assign a relevance rating: High (ideal for the assessment purpose), Medium (useful with caveats), or Low (marginal or inappropriate relevance).
  • Final Documentation: Combine the reliability and relevance summaries to produce a final statement on the study's utility for the regulatory or research question at hand. The completed CRED tool serves as the audit trail.
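The Reliability Summary and Relevance Summary steps above collapse a pattern of responses into a High/Medium/Low rating. A minimal sketch of that step, assuming a simple tally of unmet criteria and a list of critical-flaw flags; the cutoffs are illustrative only, since CRED expects judgment on the whole response pattern.

```python
# Illustrative summary step for the single-study CRED protocol above.
# The rule (any critical flaw -> Low; any unmet criterion -> Medium)
# is an assumption for the sketch, not the official CRED algorithm.

def summarize(responses: list, critical_flaws: list) -> str:
    """Collapse per-criterion responses into High/Medium/Low."""
    if any(critical_flaws):
        return "Low"    # critical flaw present
    if responses.count("No") == 0:
        return "High"   # all applicable criteria met
    return "Medium"     # deficiencies not affecting core conclusions

# 20 reliability criteria: one non-critical "No", one "Not Applicable"
reliability = summarize(["Yes"] * 18 + ["No", "NA"], [False] * 20)
print(reliability)  # Medium
```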

Integration in Systematic Review and Weight-of-Evidence

CRED is designed for integration into systematic review processes. In a Weight-of-Evidence assessment, multiple studies are evaluated individually using CRED. The transparent outputs—specific ratings and justifications—allow for the explicit, defensible weighing of studies. A study with "High" reliability and "High" relevance would typically carry more weight than one with "Medium" reliability and "Low" relevance for a particular context. This moves beyond the Klimisch-based binary acceptance/rejection, enabling nuanced, tiered use of available science.
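One way to make that weighing explicit is to assign each study a relative weight from its two ratings. The numeric weights below are illustrative assumptions, not part of CRED, which prescribes transparent justification rather than a fixed weighting scheme; study names are invented.

```python
# Hedged sketch of feeding CRED outputs into a weight-of-evidence step.
# The 3/2/1 weights and multiplicative combination are assumptions.

WEIGHTS = {"High": 3, "Medium": 2, "Low": 1}

def study_weight(reliability: str, relevance: str) -> int:
    """Combine the two CRED ratings into a relative weight (illustrative)."""
    return WEIGHTS[reliability] * WEIGHTS[relevance]

studies = [
    ("hypothetical fish ELS study",   "High",   "High"),
    ("hypothetical algae growth study", "Medium", "Low"),
]
for name, reliability, relevance in studies:
    print(name, study_weight(reliability, relevance))
```

Under this sketch the High/High study carries a weight of 9 against 2 for the Medium/Low study, making the tiered use of evidence explicit and auditable.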

Table 3: Key Research Reagent Solutions for CRED Implementation

| Tool/Resource | Function in CRED Evaluation | Source/Availability |
|---|---|---|
| CRED Excel Evaluation Tool | The primary instrument containing all criteria, guidance, and fields for documenting the evaluation [9]. | Freely available for download from the CRED project resources [9]. |
| OECD Test Guidelines | The international standard for test methodology. Used as the benchmark for evaluating study design and reporting adequacy against the CRED reliability criteria [2]. | OECD Publishing. |
| GLP Principles | A quality system for non-clinical studies. Understanding GLP helps evaluate study conduct aspects but, per CRED, does not override specific scientific quality criteria [2]. | OECD Series on Principles of GLP. |
| Systematic Review Software | Platforms (e.g., HAWC, DistillerSR) can be configured to incorporate CRED criteria for blinding, managing, and documenting evaluations across a large evidence base. | Various commercial and open-source platforms. |
| Chemical-Specific Guidance | Documents like the EU's Technical Guidance for deriving Environmental Quality Standards (TGD-EQS) provide the context for determining study relevance [9]. | European Commission and other regulatory bodies. |

The genesis of CRED represents a paradigm shift from a reliance on expert judgment to a structured, criteria-based transparency model. As demonstrated by the ring test, it provides a more consistent, detailed, and balanced framework for evaluating ecotoxicity data than the Klimisch method [2]. Its ongoing adoption in regulatory pilots, such as the revision of the EU's Technical Guidance Document and its use by the Joint Research Centre, underscores its utility in promoting international harmonization [9].

For researchers and assessors, adopting CRED mitigates the subjectivity inherent in the Klimisch approach, leading to more defensible and reproducible hazard assessments. The provision of separate reliability and relevance summaries offers nuanced insight that directly informs Weight-of-Evidence decisions. Future research directions include expanding the CRED principles to terrestrial ecotoxicity endpoints and further automating the evaluation process within systematic review platforms.

In regulatory ecotoxicology, the quality of data used for hazard and risk assessment is paramount. Two core concepts underpin this evaluation: reliability (the inherent quality of a study based on its methodology and reporting) and relevance (the appropriateness of the data for a specific regulatory purpose)[reference:0]. Regulatory frameworks like REACH, the US EPA, and the Water Framework Directive mandate that studies undergo a formal evaluation of these attributes before use.

For decades, the Klimisch method (1997) has been the dominant evaluation system. It categorizes study reliability but offers limited criteria and no formal guidance for assessing relevance, leading to reliance on expert judgment and potential inconsistency[reference:1]. In response, the Criteria for Reporting and Evaluating Ecotoxicity Data (CRED) method was developed to provide a more transparent, detailed, and consistent framework for evaluating both reliability and relevance[reference:2].

This document, framed within a thesis comparing the Klimisch and CRED methodologies, provides detailed application notes and experimental protocols derived from the seminal ring-test comparison study.

Core Conceptual Definitions

  • Reliability: “The inherent quality of a test report or publication relating to preferably standardized methodology and the way the experimental procedure and results are described to give evidence of the clarity and plausibility of the findings.”[reference:3]
  • Relevance: “The extent to which data and tests are appropriate for a particular hazard identification or risk characterisation.”[reference:4]
  • Regulatory Context: The specific legal framework (e.g., REACH, pesticide registration, derivation of Environmental Quality Standards) that defines the purpose for which a study is being evaluated, thereby influencing relevance judgments.

Quantitative Comparison: Klimisch vs. CRED

The following tables summarize key data from the ring test involving 75 risk assessors from 12 countries, which directly compared the two methods[reference:5].

Table 1: Method Characteristics

| Characteristic | Klimisch Method | CRED Evaluation Method |
|---|---|---|
| Data Type | General toxicity & ecotoxicity | Aquatic ecotoxicity |
| Number of Reliability Criteria | 12–14 (ecotoxicity) | 20 (evaluating) / 50 (reporting) |
| Number of Relevance Criteria | 0 | 13 |
| OECD Reporting Criteria Included | 14 of 37 | 37 of 37 |
| Additional Guidance | No | Yes, extensive |
| Evaluation Summary | Qualitative (reliability only) | Qualitative (reliability & relevance)[reference:6] |

Table 2: Reliability Evaluation Outcomes (Ring Test)

| Reliability Category | Klimisch Method (% of evaluations) | CRED Method (% of evaluations) |
|---|---|---|
| Reliable without restrictions (R1) | 8% | 2% |
| Reliable with restrictions (R2) | 45% | 24% |
| Not reliable (R3) | 42% | 54% |
| Not assignable (R4) | 6% | 20%[reference:7] |

Table 3: Relevance Evaluation Outcomes (Ring Test)

| Relevance Category | Klimisch Method (% of evaluations) | CRED Method (% of evaluations) |
|---|---|---|
| Relevant without restrictions (C1) | 32% | 57% |
| Relevant with restrictions (C2) | 61% | 35% |
| Not relevant (C3) | 7% | 8%[reference:8] |

Table 4: Risk Assessor Confidence in Evaluation

| Confidence Level | Klimisch Method (% of respondents) | CRED Method (% of respondents) |
|---|---|---|
| Very confident / Confident | 37% | 72%[reference:9] |

Table 5: Participant Perception of Method Attributes

Statement: "The method is accurate, applicable, consistent, transparent, and less dependent on expert judgement."

  • CRED was rated significantly higher than Klimisch across all attributes by participants[reference:10].

Experimental Protocol: The CRED-Klimisch Ring Test

This protocol details the two-phase ring test designed to compare the Klimisch and CRED evaluation methods.

Objective

To characterize differences in outcomes and user perception between the Klimisch and CRED methods for evaluating the reliability and relevance of aquatic ecotoxicity studies.

Study Design & Timeline

  • Type: Prospective, cross-over ring test.
  • Phases: Two sequential evaluation phases.
  • Duration: Phase I (Nov-Dec 2012), Phase II (Mar-Apr 2013)[reference:11].

Participant Selection & Demographics

  • Recruitment: Via professional networks (SETAC), regulatory meetings, and email.
  • Cohort: 75 risk assessors from 35 organizations across 12 countries.
  • Background: Represented regulatory agencies (9), consultancies (17), industry (9), and academia. 58-62% had >5 years of evaluation experience[reference:12].

Test Article Selection

  • Number: 8 peer-reviewed aquatic ecotoxicity studies.
  • Selection Criteria: Covered diverse taxonomic groups (cyanobacteria, algae, crustaceans, fish), test designs (acute/chronic), and chemical classes (industrial, pharmaceutical, biocide)[reference:13].
  • Assignment: Each participant evaluated 2 unique studies per phase, with no overlap within institutes to ensure independence[reference:14].

Evaluation Procedure

  • Phase I (Klimisch): Participants evaluated the reliability and relevance of two assigned studies using the standard Klimisch method. Relevance was categorized ad-hoc (C1-C3) as the Klimisch method lacks formal criteria[reference:15].
  • Phase II (CRED): Participants evaluated two different studies using a draft version of the CRED method, which included 19 reliability and 11 relevance criteria with guidance[reference:16].
  • Questionnaire: After each phase, participants completed a survey on their experience, perception of the method's accuracy/consistency, and confidence in their evaluations[reference:17].

Data Collection & Analysis

  • Primary Outcome: Assigned reliability (R1-R4) and relevance (C1-C3) categories for each study.
  • Analysis: Consistency of categorization across assessors was calculated. Differences between methods were analyzed using the non-parametric Wilcoxon paired rank-sum test. Participant perception data were analyzed using Chi-square tests[reference:18].
  • Output: Arithmetic means of conclusive categories (R1-R3, C1-C3) were calculated for comparative analysis[reference:19].
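The chi-square comparison of questionnaire responses can be carried out on a 2x2 contingency table. The counts below are illustrative, back-derived from the confidence percentages reported in Table 4 (roughly 37% vs 72% of 75 respondents); they are not the published raw data.

```python
# Pure-Python Pearson chi-square statistic for a 2x2 table, as used
# for the perception questionnaires. Counts are illustrative only.

def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Rows: method (Klimisch, CRED); columns: confident, not confident.
# Approximate counts from ~37% vs ~72% of 75 respondents per method.
stat = chi_square_2x2(28, 47, 54, 21)
print(stat)  # well above the 1-df critical value of 3.84 (p < 0.05)
```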

Visualizing Workflows and Relationships

Diagram 1: Ring Test Design for Method Comparison

[Diagram: Start: 75 Risk Assessors → Phase I (Nov-Dec 2012): evaluate 2 studies each using the Klimisch method → post-phase questionnaire (experience & confidence) → Phase II (Mar-Apr 2013): evaluate 2 new studies each using the draft CRED method (different study set assigned per participant) → post-phase questionnaire (perception & confidence) → data analysis: category consistency, method comparison (Wilcoxon), perception (chi-square)]

Diagram 2: Evaluation Criteria Comparison

[Diagram: Klimisch Method (1997): general toxicity data; 12-14 reliability criteria; 0 relevance criteria; 14/37 OECD reporting criteria; no guidance; outputs a single reliability category (R1: reliable without restrictions, R2: reliable with restrictions, R3: not reliable, R4: not assignable). CRED Method (2016): aquatic ecotoxicity data; 20 evaluation / 50 reporting reliability criteria (e.g., test substance, organism, exposure, statistics); 13 relevance criteria (e.g., endpoint, exposure regime, regulatory purpose); 37/37 OECD reporting criteria; extensive guidance; outputs separate reliability (R1-R4) and relevance (C1-C4) categories.]

Diagram 3: Reliability, Relevance & Regulatory Decision Pathway

[Diagram: An available ecotoxicity study undergoes a reliability evaluation (Klimisch or CRED criteria; assesses methodological quality → R1-R4) and a relevance evaluation (CRED criteria or expert judgement; assesses appropriateness, with the regulatory context, e.g., REACH, WFD, EPA, defining the purpose → C1-C4). Both categories feed the regulatory decision: the study is included in the hazard/risk assessment dataset if reliable and relevant, otherwise excluded or used as supporting information.]

The Scientist's Toolkit: Essential Research Reagents & Materials

The following table lists key materials essential for conducting standardized aquatic ecotoxicity tests, which form the basis of the studies evaluated by the Klimisch and CRED methods.

Table 6: Key Reagents & Materials for Aquatic Ecotoxicity Testing

| Item | Function & Relevance to Evaluation |
|---|---|
| Standard Test Organisms (e.g., Daphnia magna, Pseudokirchneriella subcapitata, Danio rerio) | Required for OECD/EPA guideline tests. Study reliability is assessed based on proper organism identification, husbandry, and health[reference:20]. |
| Reference Toxicants (e.g., Potassium dichromate, Sodium lauryl sulfate) | Used in periodic quality control tests to demonstrate organism sensitivity and test system validity—a key reliability criterion. |
| Culture Media & Reconstituted Water (e.g., ISO, OECD standard media) | Standardized exposure media ensure reproducibility. Deviations from recommended compositions can affect reliability evaluations[reference:21]. |
| Analytical Grade Test Substances | Purity and stability of the test substance must be characterized. Lack of analytical confirmation of exposure concentration is a common reason for downgrading reliability[reference:22]. |
| Solvent Controls (e.g., Acetone, DMSO) | Necessary for testing poorly soluble substances. Exceeding OECD-recommended solvent concentrations can render a study "not reliable"[reference:23]. |
| Positive & Negative Control Articles | Essential for validating test performance and distinguishing treatment effects from background variability. |
| Data Reporting Templates (e.g., CRED checklist) | Facilitate comprehensive reporting of all methodological details (50 CRED criteria), directly improving the clarity and evaluability of a study[reference:24]. |

Concluding Notes for Researchers

The transition from the Klimisch method to the CRED framework represents a significant evolution in regulatory ecotoxicology. CRED's structured, criteria-based approach enhances transparency, reduces undesirable reliance on expert judgment, and improves consistency among assessors. For researchers, adhering to detailed reporting recommendations (like those provided by CRED) increases the likelihood that their work will be deemed reliable and relevant for regulatory use. For risk assessors, using CRED provides greater confidence in evaluation outcomes, contributing to more robust and defensible hazard and risk assessments across global regulatory frameworks.

Methodology in Practice: Deconstructing Klimisch and CRED Evaluation Criteria

The evaluation of study reliability is a cornerstone of chemical hazard and risk assessment. In 1997, Klimisch and colleagues introduced a standardized four-category scoring system to assess the reliability of toxicological and ecotoxicological data[reference:0]. This method, widely adopted by regulatory authorities, categorizes studies as "reliable without restrictions," "reliable with restrictions," "not reliable," or "not assignable." Despite its widespread use, the Klimisch method has been criticized for lacking detailed criteria and guidance, leading to inconsistencies among assessors. This prompted the development of the Criteria for Reporting and Evaluating Ecotoxicity Data (CRED) method, which offers a more structured and transparent framework[reference:1]. This article provides detailed application notes and protocols for the Klimisch method, framed within the broader thesis of comparing its performance and utility against the CRED evaluation system.

Klimisch Method Application Notes

The Klimisch method assigns a reliability score based on the study's adherence to standardized methodologies and the completeness of its documentation[reference:2].

The Four-Category Scoring System

The core of the method is the four-tier classification, detailed in Table 1.

Table 1: Klimisch Scoring System Categories and Assignment Criteria[reference:3]

| Score | Description | Assignment Criteria (Excerpt from IUCLID) |
|---|---|---|
| 1 | Reliable without restriction | Guideline study (preferably GLP); comparable to guideline study; test procedure in accordance with national standard methods or generally accepted scientific standards and described in detail. |
| 2 | Reliable with restriction | Guideline study without detailed documentation; guideline study with acceptable restrictions; test procedure in accordance with national standard methods with acceptable restrictions; well-documented study meeting generally accepted scientific principles. |
| 3 | Not reliable | Study has significant methodological deficiencies or uses an unsuitable test system. |
| 4 | Not assignable | Information is insufficient for assessment (e.g., abstract, secondary literature). |

Protocol for Applying Klimisch Scoring

Protocol 1: Step-by-Step Klimisch Evaluation

  • Study Acquisition: Obtain the full study report or publication.
  • Guideline Check: Determine if the study follows a recognized test guideline (e.g., OECD, ISO) or is comparable.
  • Documentation Review: Assess the completeness of the methodological description, including test substance characterization, organism details, exposure conditions, and statistical analysis.
  • Quality Assessment: Identify any methodological flaws, such as inadequate controls, poor compliance with validity criteria, or missing critical data.
  • Category Assignment: Based on the review, assign the appropriate Klimisch score (1–4) using the criteria in Table 1. Expert judgment is required, particularly for borderline cases[reference:4].
  • Document Rationale: Record the justification for the assigned score, citing specific strengths or deficiencies.
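Protocol 1 can be caricatured as a small decision aid. The boolean flags and their mapping to scores below are a deliberate simplification of Table 1 for illustration only; real assignments, especially borderline cases, require documented expert judgment.

```python
# Simplified, illustrative decision aid for Klimisch scoring (Protocol 1).
# Flags and ordering are assumptions; not an official algorithm.

def assign_klimisch(guideline: bool, well_documented: bool,
                    critical_flaws: bool, assessable: bool = True) -> int:
    """Suggest a Klimisch score (1-4) from coarse study attributes."""
    if not assessable:
        return 4  # e.g. only an abstract or secondary literature available
    if critical_flaws:
        return 3  # significant methodological deficiencies
    if guideline and well_documented:
        return 1  # guideline study with detailed documentation
    return 2      # acceptable restrictions or incomplete documentation

print(assign_klimisch(True, True, False))   # 1
print(assign_klimisch(True, False, False))  # 2
print(assign_klimisch(False, True, True))   # 3
```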

CRED Evaluation Method Application Notes

The CRED method was developed to provide a more detailed, consistent, and transparent tool for evaluating aquatic ecotoxicity studies. It expands upon the Klimisch framework by introducing explicit criteria for both reliability and relevance[reference:5].

CRED Reliability Categories

CRED uses four reliability categories analogous to Klimisch scores: Reliable without restrictions (R1), Reliable with restrictions (R2), Not reliable (R3), and Not assignable (R4)[reference:6]. Detailed descriptions are provided in Table 2.

Table 2: CRED Reliability Categories[reference:7]

| Score | Description |
|---|---|
| R1 | Reliable without restrictions: All critical reliability criteria for this study are fulfilled. The study is well designed and performed without flaws affecting reliability. |
| R2 | Reliable with restrictions: The study is generally well designed and performed, but some minor flaws in documentation or setup may be present. |
| R3 | Not reliable: Not all critical reliability criteria are fulfilled. The study has clear flaws in design and/or performance. |
| R4 | Not assignable: Information needed to assess the study is missing (e.g., insufficient experimental details in an abstract or secondary literature). |

CRED Reliability and Relevance Criteria

The CRED method is defined by a set of 20 reliability criteria and 13 relevance criteria, providing a structured checklist for evaluators[reference:8]. The reliability criteria span six categories: General information, Test setup, Test compound, Test organism, Exposure conditions, and Statistical design & biological response[reference:9].

Table 3: CRED Relevance Categories

  • C1: Relevant without restrictions.
  • C2: Relevant with restrictions.
  • C3: Not relevant.
  • C4: Not assignable (added in the final method)[reference:10].

Protocol for Conducting CRED Evaluation

Protocol 2: Step-by-Step CRED Evaluation

  • Preparation: Download the CRED Excel evaluation tool.
  • Study Screening: Perform an initial relevance check based on the title and abstract (e.g., aquatic vs. terrestrial test).
  • Criteria Assessment: For the full study, systematically evaluate and score each of the 20 reliability criteria and 13 relevance criteria against the provided guidance.
  • Category Assignment: Based on the criteria scores, assign final reliability (R1-R4) and relevance (C1-C4) categories.
  • Documentation: The Excel tool automatically summarizes the evaluation, promoting transparency and consistency.
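The scoring and category-assignment steps can be sketched as follows, assuming a deliberately simplified aggregation rule (any unreported criterion → category 4, any failed criterion → category 3, any minor flaw → category 2). The real CRED guidance and Excel tool distinguish critical from non-critical criteria, which this sketch does not.

```python
# Simplified aggregation of per-criterion results into CRED-style
# categories. The rule encoded here is an illustrative assumption; the
# CRED guidance and Excel tool define the authoritative aggregation,
# including the handling of critical criteria.

def cred_category(results: dict[str, str], prefix: str) -> str:
    """Aggregate results ('met', 'minor', 'not met', 'not reported')
    into a category string such as 'R2' or 'C1'."""
    values = results.values()
    if "not reported" in values:
        return prefix + "4"  # not assignable
    if "not met" in values:
        return prefix + "3"  # not reliable / not relevant
    if "minor" in values:
        return prefix + "2"  # with restrictions
    return prefix + "1"      # without restrictions

reliability = {"test setup": "met", "test compound": "minor",
               "exposure conditions": "met"}
relevance = {"species": "met", "endpoint": "met"}
print(cred_category(reliability, "R"), cred_category(relevance, "C"))  # → R2 C1
```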

Comparative Analysis: Klimisch vs. CRED

A ring test involving 75 risk assessors from 12 countries directly compared the two methods[reference:11]. Key comparative data are summarized below.

Table 4: General Characteristics of Klimisch and CRED Evaluation Methods[reference:12]

Characteristic | Klimisch | CRED
Data type | Toxicity and ecotoxicity | Aquatic ecotoxicity
Number of reliability criteria | 12–14 (ecotoxicity) | 20 for evaluating (50 for reporting)
Number of relevance criteria | 0 | 13
Number of OECD reporting criteria included | 14 (of 37) | 37 (of 37)
Additional guidance | No | Yes
Evaluation summary | Qualitative for reliability | Qualitative for reliability and relevance

Table 5: Ring Test Results - Reliability Evaluation Outcomes[reference:13]

Reliability Category | Klimisch Method (% of evaluations) | CRED Method (% of evaluations)
Reliable without restrictions | 8% | 2%
Reliable with restrictions | 45% | 24%
Not reliable | 42% | 54%
Not assignable | 6% | 20%

Table 6: Ring Test Results - Practicality and Perception

  • Time Requirement: A higher percentage of evaluations were completed in under 60 minutes with the Klimisch method, making it the faster option, but the CRED method was still perceived as practical for routine use[reference:14][reference:15].
  • Consistency: The CRED method produced more consistent categorization among assessors, reducing the high discrepancies seen with the Klimisch method where studies were often split between "reliable" and "not reliable" categories[reference:16].
  • Perception: Ring test participants rated the CRED method as more accurate, consistent, applicable, and less dependent on expert judgment than the Klimisch method[reference:17][reference:18].
  • Confidence: 72% of assessors felt "very confident" or "confident" using CRED, compared to 37% using the Klimisch method[reference:19].

Experimental Protocols for Method Comparison

Protocol 3: Ring Test Design for Comparing Evaluation Methods

This protocol is based on the CRED ring test methodology[reference:20].

  • Participant Recruitment: Recruit risk assessors from diverse sectors (regulatory, industry, academia, consultancy) with experience in study evaluation.
  • Study Selection: Select a set of 8-10 ecotoxicity studies representing different organisms, substances, and study types (guideline, non-guideline).
  • Phased Evaluation:
    • Phase I (Klimisch): Each participant evaluates 2 studies using the Klimisch method only.
    • Phase II (CRED): Each participant evaluates 2 different studies using the CRED method.
  • Data Collection: Collect the assigned reliability/relevance categories, time taken for evaluation, and perceived confidence via a standardized questionnaire.
  • Analysis: Calculate consistency metrics (agreement among assessors), compare category distributions, and analyze perception data.
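One concrete consistency metric for the analysis step is pairwise percent agreement among assessors rating the same study. The ratings below are invented for illustration; a real analysis might also use a chance-corrected statistic such as Fleiss' kappa.

```python
from itertools import combinations

def pairwise_agreement(categories: list[str]) -> float:
    """Fraction of assessor pairs that assigned the same category."""
    pairs = list(combinations(categories, 2))
    return sum(a == b for a, b in pairs) / len(pairs)

# Invented verdicts for one study under each method:
klimisch_ratings = ["2", "3", "2", "3"]  # split between reliable/not reliable
cred_ratings = ["R3", "R3", "R3", "R2"]  # more consistent

print(round(pairwise_agreement(klimisch_ratings), 2))  # → 0.33
print(round(pairwise_agreement(cred_ratings), 2))      # → 0.5
```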

Visualization of Evaluation Workflows

Start: Obtain full study → Q1: Follows guideline/GLP and fully documented? Yes → Score 1 (reliable without restriction); No → Q2: Generally follows guideline with only minor flaws? Yes → Score 2 (reliable with restriction); No → Q3: Has major methodological flaws? Yes → Score 3 (not reliable); No → Score 4 (not assignable).

Diagram 1: Klimisch Scoring Decision Workflow

Start: Acquire CRED tool and study → 1. Initial relevance screen (title/abstract) → 2. Systematic criteria evaluation (20 reliability + 13 relevance) → 3. Score and summarize in Excel tool → 4. Assign final categories (R1–R4, C1–C4) → Output: Transparent evaluation report.

Diagram 2: CRED Evaluation Systematic Process

Diagram 3: Comparative Decision Pathways: Expert vs. Criteria-Driven

The Scientist's Toolkit

Table 7: Essential Materials for Ecotoxicity Study Evaluation

Item Category | Specific Item | Function/Purpose
Reference Documents | OECD/ISO Test Guidelines (e.g., 201, 210, 211) | Provide standardized methodology against which studies are evaluated.
Reference Documents | Good Laboratory Practice (GLP) Principles | Benchmark for assessing study conduct and documentation quality.
Reference Documents | REACH Guidance Documents | Define regulatory context and requirements for reliability/relevance.
Evaluation Tools | Klimisch Score Criteria Table (Table 1) | Quick reference for assigning Klimisch categories.
Evaluation Tools | CRED Evaluation Excel Tool | Structured checklist for systematic, transparent assessment.
Evaluation Tools | ToxRTool (ECVAM) | Excel-based tool that provides detailed criteria leading to a Klimisch category[reference:21].
Test Materials | Certified Reference Substances | Ensure test substance purity and identity are verifiable.
Test Materials | Control Substances (Solvent, Positive/Negative) | Essential for validating test system performance.
Test Materials | Cultured Test Organisms (e.g., Daphnia magna, algae) | Standardized biological material for toxicity testing.
Laboratory Equipment | Analytical Chemistry Equipment (HPLC, GC-MS) | For verifying test substance concentrations during exposure.
Laboratory Equipment | Environmental Chambers | To maintain stable temperature, light, and pH conditions.
Software | Statistical Analysis Software (e.g., R, GraphPad Prism) | To evaluate dose-response data and calculate endpoints (EC50, NOEC).
Software | Reference Management Software | To organize and cite evaluated studies.

The Klimisch four-category scoring system provided a foundational step toward standardizing reliability assessments in regulatory toxicology. However, its reliance on expert judgment and lack of detailed criteria can lead to inconsistency. The CRED evaluation method addresses these shortcomings by offering a structured, criteria-based framework that encompasses both reliability and relevance, resulting in greater transparency and consistency among assessors. For robust regulatory decision-making, especially where data inclusivity is valued, the CRED method presents a scientifically rigorous alternative or successor to the Klimisch approach. The choice of method ultimately depends on the need for speed versus the demand for detailed, defensible, and harmonized study evaluations.

The regulatory evaluation of ecotoxicity studies is fundamental for deriving Predicted-No-Effect Concentrations (PNECs) and Environmental Quality Standards (EQSs) [12]. For decades, the predominant method for this task has been the system proposed by Klimisch and colleagues in 1997 [2]. While representing an initial step toward systematic evaluation, the Klimisch method has faced sustained criticism for its lack of detail, insufficient guidance, and failure to ensure consistency among different risk assessors [12] [2]. Its broad categorization of studies as "reliable without restrictions," "reliable with restrictions," "not reliable," or "not assignable" leaves considerable room for subjective expert judgment [2]. Furthermore, it has been criticized for favoring Good Laboratory Practice (GLP) and standardized guideline studies, potentially excluding pertinent and high-quality data from the peer-reviewed scientific literature [12] [15].

The Criteria for Reporting and Evaluating Ecotoxicity Data (CRED) framework was developed to address these shortcomings. Its primary aim is to improve the reproducibility, transparency, and consistency of reliability and relevance evaluations across different regulatory frameworks, countries, and individual assessors [12] [9]. The core innovation of CRED is its detailed, criteria-based approach, comprising 20 reliability criteria and 13 relevance criteria, supported by extensive guidance [12]. This structured methodology provides a more granular and transparent alternative to the Klimisch method, reducing ambiguity and promoting harmonized data assessment in environmental hazard and risk evaluations [2].

Core Framework: The CRED Evaluation Criteria

The CRED framework is built on clear, distinct definitions. Reliability refers to the inherent scientific quality of a study, assessing the clarity, plausibility, and reproducibility of its experimental procedure and findings. Relevance, in contrast, is purpose-dependent and evaluates how appropriate the data and test are for a specific hazard identification or risk characterization [12] [2]. A study can be highly reliable but irrelevant for a particular assessment, and vice-versa [12].

The 20 reliability criteria are organized into key categories that scrutinize every aspect of a study's design, conduct, and reporting. The 13 relevance criteria ensure the study's purpose, test organism, substance, endpoint, and exposure conditions align with the specific regulatory question at hand [12]. The comprehensive nature of these criteria is summarized below.

Table 1: Overview of CRED Reliability and Relevance Criteria Categories

Evaluation Dimension | Number of Criteria | Primary Focus Areas
Reliability | 20 | Test substance characterization, test organism details, exposure system & conditions, study design & methodology, data reporting & statistical analysis [12].
Relevance | 13 | Assessment purpose, test organism appropriateness, exposure pathway, measured endpoints, and ecological realism of the tested concentration and duration [12].

Application Notes and Experimental Protocols

Protocol 1: Performing a CRED Evaluation

A systematic workflow is essential for consistent application. The following protocol, derived from the CRED method and supporting literature, details the steps for evaluators [12] [2].

1. Definition of Assessment Purpose: Clearly articulate the regulatory context (e.g., derivation of an EQS, pesticide authorization) and the specific ecological compartment (e.g., freshwater, marine) and protection goals involved. This defines the lens for all relevance evaluations [16].

2. Initial Screening & Gateway Check: Perform a preliminary scan of the study's title, abstract, and materials section. Verify the presence of minimum information: correct test organism, relevant substance, reported ecotoxicological endpoint, and basic methodological description. Studies failing this gateway are not evaluated further unless missing information can be obtained [16].

3. Independent Reliability and Relevance Assessment:

  • Reliability Assessment: Systematically review the full study against each of the 20 reliability criteria. For each criterion, determine if it is "Fully Met," "Partially Met," "Not Met," or "Not Reported." Document the rationale for any score other than "Fully Met," citing specific sections of the study [12] [16].
  • Relevance Assessment: Using the defined assessment purpose from Step 1, evaluate the study against the 13 relevance criteria. This determines the "fitness for purpose." A chronic reproduction study on fish, for example, is highly relevant for assessing an endocrine disruptor but may be less relevant for a baseline acute hazard classification [12].

4. Overall Categorization and Integration: Synthesize the individual ratings into overall reliability and relevance categories (e.g., Reliable/Relevant Without Restrictions, With Restrictions, Not Reliable/Relevant, Not Assignable). Finally, integrate these two judgments to determine the study's overall usability for the defined purpose [16]. A study must be both reliable and relevant to be usable without restrictions.

5. Documentation: Transparently document all scores, justifications, and final conclusions. This creates an audit trail, which is critical for regulatory transparency and for reconciling differences between evaluators [12].
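The integration in step 4 can be sketched as a lookup. The mapping below is an illustrative reading of the rule that a study must be both reliable and relevant to be usable without restrictions; it is not an official CRED matrix.

```python
# Illustrative integration of reliability (R1-R4) and relevance (C1-C4)
# categories into an overall usability verdict; the rule encoded here is
# an assumption derived from the protocol text, not an official CRED table.

def usability(reliability: str, relevance: str) -> str:
    grades = (reliability[-1], relevance[-1])
    if "4" in grades:
        return "not assignable"
    if "3" in grades:
        return "not usable"
    if grades == ("1", "1"):
        return "usable without restrictions"
    return "usable with restrictions"

print(usability("R1", "C1"))  # → usable without restrictions
print(usability("R2", "C1"))  # → usable with restrictions
print(usability("R1", "C3"))  # → not usable
```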

Start CRED evaluation → 1. Define assessment purpose and context → 2. Initial screening and gateway check (on failure, proceed directly to 5. Final documentation). On passing the gateway: 3a. Assess reliability (20 criteria) → document scores and rationale → determine overall reliability category; in parallel, 3b. Assess relevance (13 criteria, informed by the purpose defined in step 1) → document scores and rationale → determine overall relevance category. Both categories feed 4. Integrate categories and determine usability → 5. Final documentation and audit trail.

Diagram: CRED Study Evaluation Workflow. The process begins by defining the purpose, proceeds through parallel reliability and relevance assessments informed by the purpose, and concludes with integrated categorization and documentation [12] [16].

Protocol 2: The CRED Ring Test Methodology for Comparative Validation

The comparative superiority of CRED over the Klimisch method was demonstrated through a structured ring test [2]. The protocol for this validation is as follows:

1. Participant Selection: Recruit a diverse cohort of risk assessors from industry, academia, consultancy, and government institutions across multiple countries to ensure representativeness [12] [2].

2. Study Selection & Assignment: Curate a set of 8 ecotoxicity studies from the peer-reviewed literature covering various organisms (algae, Daphnia, fish, macrophytes), substance classes (pharmaceuticals, pesticides, industrial chemicals), and endpoints (acute, chronic) [2]. Assign each participant two unique studies to evaluate in each phase to prevent learning bias.

3. Phased Evaluation:

  • Phase I - Klimisch: Participants evaluate their assigned studies using the traditional Klimisch method, providing a reliability category and any relevance notes based on its limited guidance [2].
  • Phase II - CRED: Participants evaluate two different studies using the draft CRED evaluation method, which included the detailed reliability and relevance criteria [2].

4. Data Collection & Analysis: Collect completed evaluations, including final categories and written comments. Analyze for:

  • Inter-evaluator consistency for the same study within each method.
  • Perceived utility of each method via participant questionnaires (clarity, ease of use, time required).
  • Critical analysis of discrepancies to refine the CRED criteria [12] [2].

5. Method Refinement: Use quantitative results and qualitative feedback from Phase II to fine-tune the wording and guidance of the CRED criteria, finalizing the published version [12].
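The distribution comparison in the analysis step reduces raw per-evaluator verdicts to percentage distributions like those reported in the ring test tables. The verdicts below are invented for illustration.

```python
from collections import Counter

def distribution(verdicts: list[str]) -> dict[str, float]:
    """Percentage of evaluations falling in each category."""
    total = len(verdicts)
    return {cat: round(100 * n / total, 1)
            for cat, n in sorted(Counter(verdicts).items())}

# Invented verdicts from eight evaluators for one method:
verdicts = ["R1", "R2", "R2", "R3", "R3", "R3", "R2", "R4"]
print(distribution(verdicts))
# → {'R1': 12.5, 'R2': 37.5, 'R3': 37.5, 'R4': 12.5}
```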

Table 2: Ring Test Comparison of Klimisch and CRED Methods [2]

Comparison Aspect | Klimisch Method | CRED Method | Implication from Ring Test
Scope of Evaluation | Primarily reliability; vague relevance consideration. | Explicit, separate evaluation of reliability (20 criteria) and relevance (13 criteria). | CRED provides a more comprehensive and structured assessment.
Guidance Specificity | Low; minimal descriptive guidance for criteria. | High; extensive guidance text for applying each criterion. | Reduced ambiguity and lower dependency on expert judgment with CRED.
Basis for Categorization | Holistic expert judgment based on 12–14 broad criteria. | Summative judgment based on scoring many specific criteria. | CRED evaluations were perceived as more accurate, consistent, and transparent.
Inherent Bias | Criticized for bias toward GLP/guideline studies. | Criteria are applied equally to guideline and non-guideline studies. | CRED facilitates the inclusion of high-quality peer-reviewed literature.
Participant Perception | Less consistent, more subjective. | More consistent, practical, and useful for regulatory application. | CRED was endorsed as a suitable replacement for Klimisch.

The Scientist's Toolkit: Essential Reagents and Materials for Aquatic Ecotoxicity Testing

Implementing and evaluating studies under the CRED framework requires an understanding of core experimental materials. The following table lists key reagent solutions and their functions in standard aquatic tests, the quality of which is scrutinized under CRED's reliability criteria.

Table 3: Key Research Reagent Solutions in Aquatic Ecotoxicity Testing

Reagent/Material | Function in Ecotoxicity Testing | CRED Reliability Consideration
Reconstituted Standardized Test Water (e.g., ISO, OECD recipes) | Provides a consistent, defined medium for exposing test organisms, controlling water hardness, pH, and ion composition. | Critical for reporting exposure conditions (Criterion R-10). Deviations must be justified and documented [12].
Test Substance Stock Solution (in solvent if needed) | The concentrated source of the chemical for serial dilution to create exposure concentrations. | Purity, concentration verification, and solvent identity/volume must be reported (Criteria R-2, R-3). Solvent controls are mandatory [12].
Algal Growth Medium (e.g., OECD TG 201 medium) | Supplies essential macro- and micronutrients for unicellular algal growth in tests assessing inhibition of growth rate. | Composition must be specified. Nutrient levels can affect substance bioavailability and toxicity [12].
Daphnia Food Suspension (e.g., algae, yeast, trout chow) | Sustains test organisms during chronic (>48 h) reproduction and survival tests. | Type, quantity, and feeding regimen must be reported. Inadequate feeding is a common reason for study de-rating [12].
Aeration Systems (air pumps, stones) | Maintains dissolved oxygen levels above critical thresholds for fish and invertebrates. | Aeration status must be reported. Lack of aeration can cause hypoxia, confounding toxicity results [12].
Positive Control Substance (e.g., K₂Cr₂O₇ for Daphnia, 3,5-DCP for algae) | Used periodically to verify that the sensitivity of the test organism population is within historical ranges. | Use of positive controls, while not always mandatory, strengthens the reliability of the biological response [12].

Visualizing the Conceptual Relationship Between Evaluation Frameworks

The evolution from Klimisch to CRED, and its extension to exposure assessment with CREED, represents a paradigm shift toward greater structure and transparency in environmental data evaluation [16].

Klimisch method (1997; broad reliability categories, limited guidance) → critique and identification of gaps through early refinements and criticisms (e.g., Ågerstrand et al. 2011 [15]) → development and validation via ring test [2] → CRED framework (2016; 20 reliability and 13 relevance criteria with detailed guidance for ecotoxicity [12]) → adoption and implementation in extended applications (EQS derivation, database evaluation [9]) and conceptual extension to exposure data in the CREED framework (2023+ [16]) → future harmonization: standardized data quality assessment across domains.

Diagram: Evolution of Data Evaluation Frameworks. The diagram shows the progression from the initial Klimisch method, through critique and development, to the detailed CRED framework for ecotoxicity data, and its recent extension to exposure data via the CREED framework [12] [2] [16].

The Klimisch method, introduced in 1997, established a foundational framework for evaluating the reliability of toxicological and ecotoxicological studies for regulatory purposes [12]. It provides a standardized categorization system where studies are classified as "reliable without restrictions," "reliable with restrictions," "not reliable," or "not assignable" [7]. For over two decades, this method has been integral to chemical risk assessments within frameworks like REACH and the Water Framework Directive [17]. However, as part of a broader thesis comparing methodological rigor, this protocol must be framed against its modern successor, the Criteria for Reporting and Evaluating Ecotoxicity Data (CRED) method. Research indicates that while the Klimisch method was a significant step forward, it has been criticized for lacking detailed guidance, leaving substantial room for expert judgment, and potentially introducing bias and inconsistency among assessors [12] [17]. The subsequent development of the CRED method, with its 20 reliability and 13 relevance criteria, was a direct response to these shortcomings, aiming for greater transparency, consistency, and detailed evaluation [9]. This application note details the procedural steps for conducting a Klimisch evaluation, while consistently contextualizing its practical use and limitations relative to the more granular CRED approach.

Klimisch Method Evaluation Workflow

The following diagram outlines the sequential decision-making process for assigning a reliability category to a study using the Klimisch method.

Start evaluation → A: Is the study performed according to GLP and/or standardized guidelines (e.g., OECD)? Yes → Category 1 (reliable without restrictions); No → B: Are all critical methodological details fully reported? No → Category 4 (not assignable); Yes → C: Is the study design scientifically sound and plausible? Yes → Category 2 (reliable with restrictions); No → Category 3 (not reliable).

Core Principles and Comparative Framework

The Klimisch method primarily assesses study reliability, defined as "the inherent quality of a test report or publication relating to preferably standardized methodology and the way the experimental procedure and results are described to give evidence of the clarity and plausibility of the findings" [7]. It does not provide formal criteria for evaluating study relevance—the extent to which data are appropriate for a specific hazard identification—which is a separate, subsequent consideration [12]. This stands in contrast to the CRED method, which integrates explicit relevance evaluation alongside reliability, offering a more comprehensive assessment framework [17].

A key characteristic of the Klimisch method is its strong weighting toward studies conducted under Good Laboratory Practice (GLP) and according to standardized test guidelines (e.g., OECD, EPA) [17]. This preference has been a point of criticism, as it may lead to the automatic elevation of guideline studies while potentially excluding sound, hypothesis-driven academic research from regulatory consideration [12] [18]. The CRED method attempts to rectify this by evaluating the scientific merit of the methodology itself, rather than its compliance pedigree [9].

Table 1: Core Characteristics of the Klimisch and CRED Evaluation Methods

Aspect | Klimisch Method | CRED Method
Primary Focus | Reliability of study data [7]. | Reliability and relevance of study data [12].
Guidance Detail | Limited criteria and guidance, leading to reliance on expert judgment [17]. | Detailed criteria (20 for reliability, 13 for relevance) with extensive guidance [12] [9].
Evaluation Output | Single reliability category (R1, R2, R3, R4) [7]. | Separate reliability and relevance categories, with detailed criterion-level documentation [17].
Bias Toward GLP | Strong; GLP/guideline studies often favored [17]. | Reduced; focuses on methodological soundness regardless of GLP status [12].
Transparency | Low; categorization rationale may not be explicit [17]. | High; requires documentation for each criterion [9].

Experimental Protocol: Applying the Klimisch Method

This protocol is derived from the original Klimisch publication and its application in comparative ring tests against the CRED method [17].

Phase 1: Study Acquisition and Preliminary Review

  • Define Assessment Context: Clearly state the purpose of the evaluation (e.g., derivation of a Predicted-No-Effect Concentration (PNEC) for a specific environmental compartment) [12].
  • Literature Search & Screening: Identify potentially relevant studies through database searches. Perform an initial, broad relevance screen based on title and abstract (e.g., organism, endpoint, exposure matrix) [12].
  • Obtain Full Text: Secure the complete study report or publication for all studies passing the initial screen.

Phase 2: Systematic Reliability Evaluation

  • Assessment of GLP/Guideline Compliance: Determine if the study was conducted according to GLP principles and/or a recognized standardized test guideline (e.g., OECD Test Guideline 211 for Daphnia reproduction). This is a primary decision node (see workflow diagram).
  • Evaluation of Reporting Completeness: Assess whether the study report contains sufficient detail on the test substance, test organisms, experimental design, exposure conditions, statistical methods, and raw results to allow for scientific evaluation and reproducibility. A lack of critical details leads to categorization as "not assignable" [7].
  • Evaluation of Scientific Soundness: Judge the intrinsic plausibility and adequacy of the study design and execution. Consider factors such as:
    • Test Substance: Characterization, dosing verification, stability.
    • Test Organisms: Species, life stage, health status, acclimatization.
    • Experimental Design: Appropriate controls, replication, randomization, exposure concentrations.
    • Exposure Conditions: Realism, measurement, adherence to guideline requirements if applicable.
    • Data Analysis: Appropriateness of statistical methods, handling of outliers, clarity of results.
  • Assign Reliability Category: Based on the assessment above, assign one of four Klimisch categories:
    • R1: Reliable without Restrictions - Fulfills all basic scientific principles and, typically, is a GLP/guideline study.
    • R2: Reliable with Restrictions - Has some deficiencies (e.g., minor deviations from a guideline, limited reporting) but is considered scientifically sound and usable.
    • R3: Not Reliable - Contains major methodological flaws, making the data unusable for risk assessment.
    • R4: Not Assignable - Insufficient information is reported to judge reliability [7].

Phase 3: Relevance Consideration (Post-Reliability)

  • Assess Fitness for Purpose: Determine if the reliable (R1 or R2) study is appropriate for the specific assessment context defined in Phase 1. Consider the taxonomic relevance of the test species, the ecological relevance of the endpoint (e.g., mortality vs. sublethal reproduction), and the environmental realism of the exposure scenario [12].
  • Integrate into Evidence Base: Decide on the study's role in the final assessment. Only R1 and R2 studies are typically used as core evidence, while R3 and R4 studies may be used as supporting information or disregarded, depending on regulatory framework [17].
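The final integration step can be sketched as a simple filter: only studies rated 1 or 2 enter the core evidence base, while the rest become supporting information at most. Study names and scores below are invented for illustration.

```python
# Invented evaluation results mapping study identifiers to Klimisch scores.
evaluations = {
    "daphnia_reproduction_oecd211": 1,  # GLP guideline study
    "algae_growth_peer_reviewed": 2,    # sound but briefly reported
    "fish_acute_flawed_controls": 3,    # major methodological flaws
    "conference_abstract_only": 4,      # insufficient information
}

# Only categories 1 and 2 qualify as core evidence.
core = sorted(s for s, score in evaluations.items() if score <= 2)
supporting = sorted(s for s, score in evaluations.items() if score > 2)

print(core)        # → ['algae_growth_peer_reviewed', 'daphnia_reproduction_oecd211']
print(supporting)  # → ['conference_abstract_only', 'fish_acute_flawed_controls']
```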

Table 2: Example Klimisch vs. CRED Evaluation Outcomes from a Ring Test

Study Description | Klimisch Method Results | CRED Method Results | Interpretation
GLP fish test (Estrone) | 44% R1, 56% R2 [19]. | 16% R1, 21% R2, 63% R3 [19]. | Highlights Klimisch's GLP bias; CRED's detailed criteria led most evaluators to identify reliability flaws.
Typical Peer-Reviewed Study | High inconsistency; mixed R2/R3 ratings common [17]. | More consistent categorization due to explicit criteria [17]. | CRED's structure reduces subjectivity for non-GLP studies.
Poorly Reported Study | Categorized as R4 ("Not Assignable") [7]. | Allows detailed scoring of reported vs. unreported criteria, providing a more nuanced picture [20]. | CRED offers greater diagnostic value for identifying specific reporting gaps.

Table 3: Key Research Reagent Solutions for Methodological Evaluation

Item/Tool | Function in Evaluation | Consideration in Klimisch vs. CRED Context
Original Klimisch Publication | Provides the foundational definitions and logic for the four reliability categories [7]. | Essential for applying the classic method; lacks the operational detail found in later tools.
CRED Excel Tool | Freely available spreadsheet containing the 20 reliability and 13 relevance criteria with guidance [9]. | Although designed for CRED, it serves as an excellent de facto checklist during a Klimisch evaluation, enhancing thoroughness.
Test Guidelines (OECD, EPA) | Define standardized methodologies for specific toxicity tests [12]. | Central reference for Klimisch's compliance check; CRED uses them as a benchmark but does not mandate adherence.
Reporting Checklists (e.g., CRED Reporting Recommendations) | List of essential information that should be reported in an ecotoxicity study (50 criteria across 6 categories) [12]. | Invaluable for systematically assessing reporting completeness in both Klimisch and CRED evaluations.
Ring Test Comparative Data | Published results comparing evaluator consistency between Klimisch and CRED [17] [20]. | Critical for understanding the limitations and subjectivity inherent in the Klimisch method.

Comparative Analysis: Klimisch and CRED in Practice

This diagram maps the comparative pathway for analyzing a study using both the Klimisch and CRED methods, highlighting their divergent processes and endpoints.

Ecotoxicity study (publication/report) → Klimisch pathway: 1. GLP/guideline check → 2. Holistic expert judgment → single output: reliability category (R1–R4). CRED pathway: systematic scoring of 20 reliability criteria → systematic scoring of 13 relevance criteria → structured output: reliability category, relevance category, and criterion-level scores. Both outputs feed the comparative analysis for thesis research.

Executing a Klimisch evaluation requires the assessor to navigate its inherent reliance on expert judgment. The protocol outlined here emphasizes the initial GLP/guideline filter and the broad, holistic assessment of scientific soundness. However, data from comparative ring tests are conclusive: this approach leads to lower consistency between different evaluators than the structured CRED method [17]. For instance, the percentage of fulfilled criteria for studies rated "reliable with restrictions" (R2) under CRED showed a standard deviation of 12%, indicating variability even with more guidance [20].

Therefore, when applying the Klimisch method in contemporary research, especially for a comparative thesis, it is imperative to:

  • Document Rationale Explicitly: Record detailed reasons for each categorization decision to partially compensate for the method's lack of transparency.
  • Use CRED Criteria as a Shadow Checklist: Informally consulting the detailed CRED reliability criteria can ensure no major methodological aspect is overlooked during the Klimisch evaluation.
  • Acknowledge the Bias: Be cognizant of the automatic preference for GLP studies and consciously evaluate the scientific plausibility of all studies, regardless of their origin.

The Klimisch method remains a historically important tool, and understanding its step-by-step application is crucial for interpreting a vast legacy of chemical risk assessments. However, for forward-looking research and regulation aiming for higher consistency, transparency, and integration of diverse scientific evidence, the structured, criteria-based approach exemplified by the CRED method represents the evolved standard [12] [9].

The evaluation of ecotoxicity data is a foundational step in the environmental risk assessment of chemicals, directly influencing the derivation of safe concentrations like Predicted-No-Effect Concentrations (PNECs) and Environmental Quality Standards (EQSs) [12]. For decades, the Klimisch method has been the predominant tool for this task, categorizing studies as "reliable without restrictions," "reliable with restrictions," "not reliable," or "not assignable" [17]. However, its reliance on broad criteria and expert judgment has been criticized for introducing bias, inconsistency, and a preference for industry-sponsored Good Laboratory Practice (GLP) studies over peer-reviewed literature [12] [17].

The Criteria for Reporting and Evaluating Ecotoxicity Data (CRED) method was developed to address these shortcomings [12]. Framed within a thesis comparing the Klimisch and CRED approaches, this article provides detailed application notes and protocols for implementing the CRED method. Evidence from a comprehensive ring test demonstrates that CRED offers a more detailed, transparent, and consistent framework for evaluating both the reliability and relevance of aquatic ecotoxicity studies, making it a scientifically robust successor for regulatory and research applications [17].

Core Principles and Structure of the CRED Evaluation Method

The CRED method is built on clear definitions. Reliability pertains to the intrinsic scientific quality of a study—its design, performance, and reporting—independent of its intended use. Relevance, however, is assessment-specific and refers to how appropriate the study's data is for a particular hazard identification or risk characterization purpose [12]. A study can be reliable but irrelevant (e.g., a high-quality soil toxicity test for an aquatic assessment) and vice-versa [12].

The method's structure is defined by two core sets of criteria:

  • 20 Reliability Criteria: Covering six categories: Test Substance, Test Organism, Exposure Design, Exposure Control, Endpoints, and Data Analysis & Reporting.
  • 13 Relevance Criteria: Covering four categories: Test Substance, Test Organism, Exposure Design, and Endpoints [12].

Unlike the Klimisch method's single judgment, CRED requires assessors to evaluate each criterion individually, selecting from predefined answers (e.g., "Yes," "No," "Partly," "Not Reported," "Not Applicable"). This granular approach forces a transparent and systematic examination of the study's strengths and weaknesses.
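This criterion-level record-keeping can be represented as a simple data model. Below is a minimal Python sketch; the criterion names, the "partly counts half" weighting, and the helper names are illustrative assumptions, not part of the published CRED method:

```python
from dataclasses import dataclass

# Predefined answer options for each CRED criterion (per the method).
ANSWERS = {"yes", "no", "partly", "not reported", "not applicable"}

@dataclass
class CriterionScore:
    """One criterion-level judgment with its written justification."""
    criterion: str        # illustrative criterion name
    answer: str           # one of ANSWERS
    justification: str    # reference to specific lines/tables/figures

    def __post_init__(self):
        if self.answer not in ANSWERS:
            raise ValueError(f"invalid answer: {self.answer!r}")

def fulfilled_fraction(scores):
    """Fraction of applicable criteria fulfilled ('partly' counts half)."""
    weights = {"yes": 1.0, "partly": 0.5, "no": 0.0, "not reported": 0.0}
    applicable = [s for s in scores if s.answer != "not applicable"]
    if not applicable:
        return 0.0
    return sum(weights[s.answer] for s in applicable) / len(applicable)

scores = [
    CriterionScore("Test substance identified", "yes", "Sec. 2.1, Table 1"),
    CriterionScore("Exposure concentrations verified", "partly", "Fig. 2"),
    CriterionScore("Controls included", "yes", "Sec. 2.3"),
    CriterionScore("Raw data reported", "not reported", "none found"),
]
print(fulfilled_fraction(scores))  # 0.625
```

Forcing an explicit answer and justification per criterion is what makes the CRED record auditable; the aggregate fraction is only a summary, not the categorization itself.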

Table: Comparative Overview of the Klimisch and CRED Evaluation Methods

| Feature | Klimisch Method (1997) | CRED Method (2016) |
|---|---|---|
| Core Purpose | Evaluate reliability for regulatory compliance. | Evaluate reliability and relevance for robust risk assessment [12]. |
| Evaluation Categories | Reliability only (4 categories: R1-R4) [17]. | Separate, detailed evaluations for reliability and relevance [12]. |
| Number of Criteria | Limited, unspecified criteria leading to high expert judgment [17]. | 20 reliability criteria and 13 relevance criteria with detailed guidance [12]. |
| Guidance & Transparency | Minimal guidance; evaluations lack transparency [17]. | Extensive guidance for each criterion; evaluation is fully documented and reproducible. |
| Bias Toward GLP/Guidelines | Criticized for favoring GLP/guideline studies irrespective of scientific merit [12] [17]. | Evaluates scientific quality directly; guideline adherence is one factor among many. |
| Outcome Consistency | Low consistency between different assessors [17]. | High consistency demonstrated in ring testing [17]. |

Step-by-Step Application Protocol

The following workflow provides a standardized protocol for conducting a CRED evaluation.

[Diagram: Literature screening → Phase 1 initial relevance check (title/abstract; clearly irrelevant studies excluded) → full-text retrieval → Phase 2 detailed CRED evaluation, applying the 20 reliability and 13 relevance criteria in parallel → synthesis of criterion scores → assignment of final reliability and relevance categories → documentation of the rationale for all judgments.]

Workflow for Conducting a CRED Evaluation

Step 1: Initial Relevance Screening Before a full CRED evaluation, screen the study's title and abstract against the broad needs of your specific assessment (e.g., organism, endpoint, exposure type). Exclude clearly irrelevant studies at this stage [12].

Step 2: Systematic Data Extraction For each study passing Step 1, extract detailed information into a standardized template. The CRED Reporting Recommendations—comprising 50 criteria across six categories—serve as an ideal checklist to ensure all necessary information on test substance, organism, design, conditions, and results is captured [12].

Step 3: Criterion-by-Criterion Evaluation Using the extracted data, evaluate the study against each of the 20 reliability and 13 relevance criteria. For each criterion:

  • Consult the detailed guidance provided with the CRED method.
  • Select the most appropriate answer (Yes/No/Partly/Not Reported/Not Applicable).
  • Record a brief written justification for your choice, referencing specific lines, tables, or figures from the study.

Step 4: Synthesis and Final Categorization Synthesize the individual criterion judgments to assign an overall reliability category (R1-R4, matching Klimisch) and a relevance category (C1: Relevant without restrictions, C2: Relevant with restrictions, C3: Not relevant). The final categorization is a matter of expert judgment but must be directly and transparently traceable to the scores and justifications recorded in Step 3.
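As an illustration of how Step 4 can be supported (not replaced), here is a sketch of a synthesis aid that maps the fraction of fulfilled reliability criteria to a suggested category. The thresholds are hypothetical, loosely informed by the ring-test averages of 93% (R1), 72% (R2), 60% (R3), and 51% (R4) fulfilled criteria [20]; CRED itself leaves the final categorization to documented expert judgment:

```python
# Hypothetical synthesis aid: suggest (not decide) a reliability category
# from the fraction of fulfilled criteria. Thresholds are illustrative.
def suggest_reliability_category(fulfilled: float) -> str:
    if not 0.0 <= fulfilled <= 1.0:
        raise ValueError("fulfilled must be a fraction in [0, 1]")
    if fulfilled >= 0.85:
        return "R1 (reliable without restrictions)"
    if fulfilled >= 0.66:
        return "R2 (reliable with restrictions)"
    if fulfilled >= 0.55:
        return "R3 (not reliable)"
    return "R4 (not assignable)"

print(suggest_reliability_category(0.93))  # R1 (reliable without restrictions)
print(suggest_reliability_category(0.72))  # R2 (reliable with restrictions)
```

A suggestion like this can flag cases where the assessor's final category diverges sharply from the criterion scores, prompting an explicit written rationale.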

Step 5: Documentation The completed evaluation form, with all criterion-level judgments and final categorizations, becomes the audit trail. This transparency is critical for regulatory acceptance and for reconciling differences between assessors.

Experimental Validation: The CRED Ring Test Protocol

The superiority of the CRED method was validated through a two-phase international ring test designed to directly compare it with the Klimisch method [17].

Objective: To quantitatively compare the consistency, transparency, and user perception of the Klimisch and CRED evaluation methods.

Design: A crossover design where participants evaluated different studies with each method to prevent bias from familiarization with a specific study [17].

[Diagram: 75 risk assessors from 12 countries complete Phase I (Klimisch evaluation of 2 of 8 studies) and Phase II (CRED evaluation of 2 different studies); each phase is followed by an analysis of categorization consistency (plus criterion-level agreement for CRED) and a participant questionnaire, all feeding into results synthesis and method refinement.]

Methodology of the CRED vs. Klimisch Ring Test [17]

  • Participants: 75 risk assessors from 12 countries, representing industry, academia, consultancy, and government [17].
  • Materials: Eight diverse aquatic ecotoxicity studies were selected [17].
  • Procedure:
    • Phase I: Each participant evaluated two studies using the Klimisch method, providing a reliability category (R1-R4) and a relevance category (C1-C3).
    • Phase II: Each participant evaluated two different studies using the draft CRED method, providing criterion-level answers and final categories.
    • Questionnaire: After each phase, participants completed a questionnaire on their perception of the method's ease, consistency, and transparency [17].
  • Data Analysis: Consistency was measured by the percentage agreement among assessors for the final categorization of each study. For CRED, criterion-level agreement was also analyzed.
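The percentage-agreement metric is simple to reproduce. In this minimal Python sketch, the assessor counts are illustrative, chosen only to mirror the 73% and 95% agreement levels reported below, not the raw ring-test data:

```python
from collections import Counter

def percent_agreement(categories):
    """Share of assessors choosing the modal category for one study."""
    counts = Counter(categories)
    return counts.most_common(1)[0][1] / len(categories)

# Illustrative assessor counts (not the actual ring-test records):
klimisch_study_a = ["R2"] * 8 + ["R1"] * 2 + ["R3"]   # 8 of 11 agree on R2
cred_study_a     = ["R2"] * 19 + ["R1"]               # 19 of 20 agree on R2
print(round(percent_agreement(klimisch_study_a), 2))  # 0.73
print(round(percent_agreement(cred_study_a), 2))      # 0.95
```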

Key Quantitative Results: The ring test yielded clear, quantitative evidence of CRED's advantages, as summarized in the table below.

Table: Key Quantitative Results from the CRED Ring Test [17]

| Study | Klimisch Method (Reliability Category Agreement) | CRED Method (Reliability Category Agreement) | Notable Outcome |
|---|---|---|---|
| Study A (Algal toxicity) | 73% agreement (R2) | 95% agreement (R2) | Higher consensus with CRED. |
| Study E (Fish endocrine test) | 44% R1, 56% R2 [19] | 16% R1, 21% R2, 63% R3 [19] | CRED flagged critical reliability flaws missed by Klimisch. |
| Overall Perception | Less consistent, more dependent on expert judgment. | 87% of users found it more accurate; 82% found it more consistent; 95% found it more transparent [17]. | Strong user preference for CRED. |

The Scientist's Toolkit: Essential Materials for CRED Evaluation

Table: Essential Research Reagents and Tools for CRED Evaluation

| Item Name | Function/Description | Application in CRED Protocol |
|---|---|---|
| CRED Evaluation Worksheet | Standardized Excel form listing the 20 reliability and 13 relevance criteria with dropdown answer options [12]. | The primary tool for conducting and documenting the step-by-step evaluation (Step 3). Ensures all criteria are addressed systematically. |
| CRED Reporting Template | Checklist of 50 reporting items across six categories (general, test design, substance, organism, exposure, statistics) [12]. | Used during data extraction (Step 2) to ensure all necessary information is collected from the primary study. |
| Detailed CRED Guidance Document | Provides definitions, examples, and decision rules for interpreting and scoring each evaluation criterion [12]. | The essential reference to ensure correct and consistent application of criteria, reducing subjective judgment. |
| Access to Full OECD Test Guidelines | Reference documents for standard testing protocols (e.g., OECD 210, 211). | Used as a benchmark to evaluate whether study deviations from standard methods impact reliability. |
| Statistical Analysis Software | Software (e.g., R, GraphPad Prism) capable of re-analyzing raw data if reported. | Used to verify statistical calculations and endpoint derivations (e.g., EC50, NOEC) reported in the study, a key reliability criterion. |

The evaluation of aquatic ecotoxicity data is a cornerstone of environmental risk assessment for chemicals, directly influencing the derivation of Predicted-No-Effect Concentrations (PNECs) and Environmental Quality Standards (EQSs) [12]. Historically, the Klimisch method has been widely used to assess study reliability, but it has faced criticism for being unspecific, lacking detailed criteria for relevance evaluation, and allowing substantial room for interpretative bias [12] [17]. This has led to inconsistencies where different risk assessors might categorize the same study differently, impacting regulatory decisions [17].

In response, the Criteria for Reporting and Evaluating Ecotoxicity Data (CRED) method was developed to provide a more transparent, consistent, and detailed framework [12] [9]. The CRED method enhances the evaluation process through two main components: a set of 20 reliability and 13 relevance criteria with extensive guidance, and a complementary set of 50 reporting recommendations to improve the quality of future studies [12].

This article provides a practical, comparative application of both methods within the context of a broader thesis on evaluation methodologies. It details experimental protocols, presents quantitative comparison data, and offers tools to equip researchers and assessors in implementing these frameworks effectively.

Comparative Framework: Klimisch vs. CRED

The fundamental differences between the Klimisch and CRED evaluation methods are structural and philosophical. The table below summarizes their key characteristics.

Table 1: Fundamental Comparison of the Klimisch and CRED Evaluation Methods

| Feature | Klimisch Method (1997) | CRED Evaluation Method (2016) |
|---|---|---|
| Primary Focus | Reliability of studies, particularly favoring GLP and standard guideline studies [12] [17]. | Integrated evaluation of both reliability and relevance, with broader applicability to peer-reviewed literature [12] [9]. |
| Evaluation Criteria | Limited, non-specific criteria for reliability; no defined criteria for relevance [17]. | 20 explicit reliability criteria and 13 explicit relevance criteria, each with detailed guidance [12]. |
| Output Categories | Reliability: 1) Reliable without restrictions, 2) Reliable with restrictions, 3) Not reliable, 4) Not assignable [17]. | Separate categories for reliability and relevance. Reliability uses the same four Klimisch categories; relevance uses: 1) Relevant without restrictions, 2) Relevant with restrictions, 3) Not relevant [17]. |
| Guidance & Transparency | Minimal guidance, leading to high reliance on expert judgment and potential inconsistency [12] [17]. | High level of detail and prescribed guidance aims to reduce subjectivity and improve consistency and transparency [12] [9]. |
| Perception by Assessors | Criticized for bias and inconsistency [12]. | Ring test participants found CRED more accurate, applicable, consistent, and transparent [12]; viewed as a more robust, science-based tool for harmonized hazard and risk assessments across regulatory frameworks [17]. |

Quantitative Comparison from Ring-Test Data

A two-phase ring test involving 75 risk assessors from 12 countries provided empirical data comparing the outcomes of the two methods [17]. The results demonstrate that the CRED method's structured criteria lead to more conservative and differentiated evaluations.

Table 2: Quantitative Outcomes from the CRED vs. Klimisch Ring Test [19] [17] [20]

| Study Description & Endpoint | Klimisch Method Reliability Categorization | CRED Method Reliability Categorization | Key Implications |
|---|---|---|---|
| Industry GLP study on fish (Danio rerio) chronic toxicity with estrone [19]. | 44% (4/9) Reliable without restrictions; 56% (5/9) Reliable with restrictions [19]. | 16% (3/19) Reliable without restrictions; 21% (4/19) Reliable with restrictions; 63% (12/19) Not reliable [19]. | CRED's detailed criteria flagged specific reliability issues that the Klimisch method overlooked, leading to a significantly stricter assessment of the same GLP study. |
| General Analysis of Categorization Consistency | Higher inconsistency among assessors due to vague criteria [17]. | Improved consistency. The average percentage of fulfilled reliability criteria decreased logically with each lower category: 93% (R1), 72% (R2), 60% (R3), 51% (R4) [20]. | CRED provides a measurable gradient of reliability linked to criteria fulfillment, enhancing transparency and predictability of evaluations. |
| Relevance Evaluation | Not formally addressed by the method [17]. | Explicitly evaluated. Average fulfillment was 84% for "Relevant without restrictions," 73% for "Relevant with restrictions," and 61% for "Not relevant" [20]. | CRED mandates a separate, criteria-based relevance check, ensuring the study's appropriateness for the specific assessment context is documented. |

Diagram 1: Comparative Workflow of Klimisch vs. CRED Evaluation Methods - This diagram contrasts the simpler, judgment-based Klimisch process with the CRED method's structured, criteria-driven parallel assessment of reliability and relevance.

Experimental Protocols for Aquatic Ecotoxicity Studies

Core Standardized Aquatic Toxicity Test Protocol

This protocol outlines the general methodology for a standard chronic toxicity test with a freshwater invertebrate (e.g., Daphnia magna reproduction test), which forms the basis for many guideline studies (e.g., OECD 211) [12].

1. Test Organism Culturing:

  • Source: Maintain a laboratory culture of neonates (<24 hours old) from a genetically characterized brood stock.
  • Conditions: Culture in reconstituted standard freshwater (e.g., ISO or OECD medium) at 20±1°C with a 16:8 hour light:dark cycle.
  • Feeding: Feed daily with a controlled diet of high-quality green algae (e.g., Pseudokirchneriella subcapitata).

2. Test Substance Preparation:

  • Stock Solution: Prepare a concentrated stock solution of the test chemical using a suitable solvent (e.g., acetone, dimethyl formamide) if necessary. Solvent concentration must not exceed 0.1 mL/L and must be constant across all treatments and controls.
  • Test Solutions: Dilute the stock solution with standardized test medium to create at least five test concentrations, typically arranged in a geometric series (e.g., factor 2). Prepare fresh daily or verify stability.
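Generating the geometric concentration series is a small calculation; a Python sketch with illustrative values (a 10 mg/L top concentration and spacing factor 2):

```python
def geometric_series(top: float, factor: float, n: int) -> list:
    """n test concentrations descending from `top` by a constant factor."""
    return [round(top / factor ** i, 6) for i in range(n)]

# Five concentrations spaced by a factor of 2 (illustrative values):
print(geometric_series(10.0, 2.0, 5))  # [10.0, 5.0, 2.5, 1.25, 0.625]
```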

3. Experimental Design:

  • Treatments: Include a minimum of five test concentrations, a solvent control (if used), and a negative (medium only) control.
  • Replicates: Use a minimum of four replicates per treatment.
  • Organisms per Replicate: Introduce 10 young female daphnids (neonates <24h) individually into separate test vessels (e.g., 50 mL beakers) containing 30-50 mL of test solution.
  • Duration: The test runs for 21 days.
  • Renewal: Renew test solutions and feed organisms daily.

4. Endpoints and Measurements:

  • Daily: Record survival (mortality) and the number of live offspring produced.
  • Final: Measure adult length (as a sub-lethal growth endpoint) at test termination.
  • Water Quality: Monitor temperature, pH, and dissolved oxygen in representative vessels regularly.

5. Data Analysis:

  • Calculate the No Observed Effect Concentration (NOEC) and Lowest Observed Effect Concentration (LOEC) for reproduction and survival using appropriate statistical tests (e.g., Dunnett's test).
  • Calculate the Effect Concentration for an x% effect (ECx, e.g., EC50) by non-linear regression analysis (e.g., fitting a log-logistic model).
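In practice, ECx values are derived by non-linear regression (e.g., with the drc package in R). As a minimal illustration of the underlying model, the 2-parameter log-logistic curve can also be linearized and fitted by ordinary least squares; the function name and synthetic data below are illustrative:

```python
import math

def ec50_loglogistic(concs, effects):
    """Estimate EC50 by linearizing the 2-parameter log-logistic model
    y = 1 / (1 + (EC50/c)**b)  =>  ln(1/y - 1) = b*ln(EC50) - b*ln(c),
    then applying ordinary least squares on (ln c, ln(1/y - 1))."""
    xs = [math.log(c) for c in concs]
    zs = [math.log(1.0 / y - 1.0) for y in effects]  # requires 0 < y < 1
    n = len(xs)
    mx, mz = sum(xs) / n, sum(zs) / n
    slope = (sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
             / sum((x - mx) ** 2 for x in xs))
    intercept = mz - slope * mx
    # slope = -b and intercept = b*ln(EC50), so ln(EC50) = -intercept/slope
    return math.exp(-intercept / slope)

# Synthetic data generated from a true EC50 of 2.0 and slope b = 3:
concs = [0.5, 1.0, 2.0, 4.0, 8.0]
effects = [1.0 / (1.0 + (2.0 / c) ** 3) for c in concs]
print(round(ec50_loglogistic(concs, effects), 3))  # 2.0
```

The linearization breaks down at 0% and 100% effect, which is one reason real analyses use proper non-linear fitting with confidence intervals.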

Specialized Protocol for Difficult Substances (UVCBs)

Testing substances of Unknown or Variable Composition, Complex reaction products, or Biological materials (UVCBs) poses significant challenges due to poor solubility, volatility, or instability [21] [22]. The following adaptations are critical.

1. Preliminary Phase: Characterization and Pre-testing [21] [22]:

  • Solubility & Stability: Conduct pre-tests to determine the water-accommodated fraction (WAF) or stable dispersion. Analyze chemical stability in test media over 24-48 hours using appropriate analytical methods (e.g., GC-MS, LC-MS).
  • Exposure Verification: For unstable substances, consider a flow-through or semi-static system instead of a static-renewal design to maintain stable exposure concentrations.
  • Analytical Monitoring: Mandatory. Regularly analyze test concentrations in the exposure vessels at the beginning, during, and at the end of the test to confirm actual exposure levels.

2. Test Design Adaptations:

  • Exposure System: Use sealed or headspace-minimized test vessels for volatile substances. For highly adsorptive substances, consider using glass vessels and minimizing suspended solids.
  • Concentration Series: Base concentrations on measured levels (e.g., % WAF) rather than nominal loading rates.
  • Endpoint Selection: Include both standard endpoints (survival, reproduction) and potentially more sensitive sub-organismal endpoints (e.g., biochemical markers) if the mode of action is suspected to be chronic or specific.

Advanced Protocol: Aquatic Microcosm Study

Aquatic microcosms simulate natural ecosystem interactions and are used for higher-tier risk assessment [23].

1. Microcosm Establishment:

  • System Setup: Construct microcosms in large aquaria (e.g., 100-200 L) with a sediment layer (2-5 cm) and filtered water from a natural source or reconstituted standard water.
  • Community Inoculation: Introduce a multi-trophic level community, including: primary producers (algae, duckweed), primary consumers (cladocerans like Daphnia, rotifers), secondary consumers (insect larvae, juvenile fish), and decomposers (microbial communities from sediment).
  • Acclimatization: Allow the system to stabilize and establish ecological interactions for 4-8 weeks before contaminant introduction.

2. Chemical Application and Monitoring:

  • Dosing: Apply the test chemical to achieve a range of environmentally relevant concentrations in the water column. Re-dosing may be necessary based on stability data.
  • Environmental Monitoring: Frequently monitor and maintain temperature, light, pH, dissolved oxygen, and nutrients (nitrate, phosphate).
  • Fate Analysis: Periodically sample water and sediment to measure the test chemical's concentration and major transformation products.

3. Ecological Endpoint Measurement [23]:

  • Structural Endpoints: Periodically sample and identify species to measure population density and community diversity indices (e.g., Shannon-Wiener index).
  • Functional Endpoints: Measure ecosystem processes like primary productivity (oxygen production/chlorophyll-a), decomposition rates of leaf litter, and nutrient cycling.
  • Data Analysis: Compare treated microcosms to controls to determine a community-level NOEC and to calculate a hazardous concentration for 5% of species (HC5) using species sensitivity distribution (SSD) models.
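The plug-in HC5 estimate from a log-normal SSD can be sketched in a few lines. The NOEC values below are hypothetical, and this simple 5th-percentile formula omits the sample-size corrections applied by formal SSD software:

```python
import math

def hc5_lognormal(toxicity_values):
    """HC5 from a log-normal species sensitivity distribution: the 5th
    percentile of log10-transformed endpoint values (z = -1.645),
    using the sample standard deviation. Plug-in estimate only."""
    logs = [math.log10(v) for v in toxicity_values]
    n = len(logs)
    mean = sum(logs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in logs) / (n - 1))
    return 10 ** (mean - 1.645 * sd)

# Hypothetical chronic NOECs (mg/L) for eight species:
noecs = [0.5, 1.2, 2.0, 3.5, 5.0, 8.0, 12.0, 20.0]
print(round(hc5_lognormal(noecs), 3))  # ≈ 0.499
```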

[Diagram: Phase 1, system establishment and stabilization (set up aquaria with sediment and water, inoculate multi-trophic community, acclimatize 4-8 weeks) → Phase 2, chemical exposure and monitoring (apply graded concentrations, monitor chemical fate in water and sediment, maintain abiotic conditions) → Phase 3, ecological endpoint analysis (sample population density and diversity, measure functional endpoints, derive community NOEC, HC5, and SSD).]

Diagram 2: Aquatic Microcosm Experimental Workflow - This diagram outlines the three-phase process for conducting a higher-tier aquatic microcosm study, from ecosystem establishment to ecological endpoint analysis.

The Scientist's Toolkit: Essential Research Reagents and Materials

Table 3: Key Research Reagent Solutions for Aquatic Ecotoxicity Testing

| Item | Primary Function & Rationale |
|---|---|
| Reconstituted Standard Freshwater (e.g., ISO 6341, OECD TG 202/211 medium) | Provides a consistent, defined chemical matrix for culturing and testing, eliminating variability from natural water sources. Essential for reproducibility and guideline compliance [12]. |
| High-Quality Algal Cultures (e.g., Pseudokirchneriella subcapitata, Chlorella vulgaris) | Serves as a standardized food source for filter-feeding test organisms (e.g., Daphnia). Consistent nutritional quality is critical for healthy cultures and reliable sub-lethal (reproduction, growth) endpoints. |
| Analytical Grade Test Chemical & Certified Reference Standards | Ensures precise dosing and exposure verification. For UVCBs and difficult substances, a well-characterized batch sample and analytical standards for key constituents are mandatory for interpreting results [21] [22]. |
| Appropriate Solvent (e.g., Acetone, Dimethylformamide, Ethanol) | Used to prepare stock solutions of poorly water-soluble chemicals. Must be non-toxic to test organisms at the concentration used (typically ≤0.1 mL/L) and be consistent across all treatments [22]. |
| Water Quality Monitoring Kits/Probes (for pH, Dissolved Oxygen, Conductivity, Ammonia) | Critical for verifying acceptable test conditions. Deviations in water quality can induce stress and confound chemical toxicity results. Regular monitoring is a key reliability criterion in both Klimisch and CRED evaluations. |
| Preservation and Fixation Agents (e.g., Lugol's iodine, formaldehyde, RNAlater) | Used to preserve planktonic and microbial samples from microcosm or mesocosm studies for later community analysis (e.g., microscopy, DNA metabarcoding) [23]. |
| Solid Phase Extraction (SPE) Cartridges & HPLC/MS-Grade Solvents | Essential for pre-concentrating and analyzing trace levels of test chemicals and their transformation products in water samples from fate studies or tests with difficult substances [23]. |

Optimizing Study Evaluations: Troubleshooting Klimisch and Leveraging CRED

The regulatory assessment of chemicals requires reliable and relevant ecotoxicity data to derive Predicted-No-Effect Concentrations (PNECs) and Environmental Quality Standards (EQSs) [12]. A critical step in this process is evaluating the quality and applicability of individual studies, a task historically reliant on the method established by Klimisch et al. in 1997 [17]. While pioneering, the Klimisch method has been widely criticized for its lack of detail, insufficient guidance, and consequent over-reliance on subjective expert judgment, leading to inconsistent evaluations between assessors [12] [17]. This inconsistency can directly impact risk assessment outcomes, potentially leading to either underestimated environmental risks or unnecessary mitigation measures [17].

To address these shortcomings, the Criteria for Reporting and Evaluating Ecotoxicity Data (CRED) method was developed. CRED aims to improve the reproducibility, transparency, and consistency of reliability and relevance evaluations for aquatic ecotoxicity studies across regulatory frameworks [12]. This document provides a detailed comparative analysis of the two methods, grounded in empirical research, to highlight the operational pitfalls of the Klimisch approach and demonstrate the structured alternative offered by CRED.

Quantitative Comparison: Ring Test Results and Method Performance

A pivotal ring test involving 75 risk assessors from 12 countries directly compared the Klimisch and CRED methods [17]. Participants evaluated a battery of ecotoxicity studies using each method. The results quantitatively demonstrate significant differences in categorization consistency and user perception.

Table 1: Comparative Categorization of Study E (GLP Report on Fish Toxicity) [17] [19]

| Evaluation Method | Reliable Without Restrictions (R1) | Reliable With Restrictions (R2) | Not Reliable (R3) | Not Assignable (R4) | Mean Score (R1-R3) |
|---|---|---|---|---|---|
| Klimisch | 44% (4/9) | 56% (5/9) | 0% | 0% | 1.6 |
| CRED | 16% (3/19) | 21% (4/19) | 63% (12/19) | 0% | 2.5 |

Table 2: Ring Test Participant Perception of Evaluation Methods [17]

| Perception Criteria | Klimisch Method | CRED Evaluation Method |
|---|---|---|
| Accuracy | Less Accurate | More Accurate |
| Applicability | Less Applicable | More Applicable |
| Consistency | Less Consistent | More Consistent |
| Transparency | Less Transparent | More Transparent |
| Dependence on Expert Judgment | High Dependence | Low Dependence |

Table 3: Core Structural Differences Between Klimisch and CRED Methods [12] [17]

| Feature | Klimisch Method | CRED Evaluation Method |
|---|---|---|
| Primary Focus | Reliability only. | Reliability and relevance. |
| Reliability Criteria | 4 broad categories. | 20 detailed criteria with extensive guidance. |
| Relevance Criteria | Not formally defined. | 13 detailed criteria with extensive guidance. |
| Guidance Specificity | Limited, leaving room for interpretation. | Comprehensive, reducing ambiguity. |
| Output | Single reliability score (R1-R4). | Separate, detailed scores for reliability and relevance. |
| Bias Tendency | Favors GLP/guideline studies [12]. | Criteria-based, reduces automatic preference. |

Experimental Protocols: The CRED Ring Test Methodology

The following protocol details the methodology used in the ring test that generated the comparative data between the Klimisch and CRED methods [17].

3.1. Protocol: Comparative Ring Test for Study Evaluation Methods

  • Objective: To compare the consistency, transparency, and user perception of the Klimisch and CRED methods for evaluating the reliability and relevance of aquatic ecotoxicity studies.
  • Design: A two-phase, crossover-style ring test where participants evaluated different studies with each method.
  • Materials & Inputs:

    • A set of eight peer-reviewed and guideline aquatic ecotoxicity studies.
    • Evaluation guidelines for the Klimisch method [17].
    • Draft evaluation guidelines for the CRED method (20 reliability criteria, 13 relevance criteria) [12] [17].
    • Standardized reporting forms and questionnaires.
  • Procedure:

    • Participant Recruitment: 75 risk assessors from industry, academia, consultancy, and government across 12 countries were recruited [17].
    • Phase I - Klimisch Evaluation:
      • Each participant was assigned two of the eight studies.
      • Using the Klimisch method, they categorized each study's reliability as R1 (Reliable without restrictions), R2 (Reliable with restrictions), R3 (Not reliable), or R4 (Not assignable).
      • Relevance was categorized ad-hoc as C1 (Relevant), C2 (Relevant with restrictions), or C3 (Not relevant), as Klimisch provides no formal relevance criteria [17].
    • Phase II - CRED Evaluation:
      • Each participant evaluated two different studies from the set using the draft CRED method.
      • They assessed each study against the 20 reliability and 13 relevance criteria, providing a structured justification.
      • Based on the criteria assessment, they assigned a final reliability and relevance category.
    • Questionnaire: After each phase, participants completed a questionnaire on their perception of the method's accuracy, consistency, transparency, and practicality.
    • Data Analysis: Consistency was calculated as the percentage agreement among assessors for each study and criterion. Mean categorization scores were calculated. Questionnaire responses were analyzed thematically.
  • Key Outcomes Measured:

    • Inter-assessor consistency in final study categorization.
    • Arithmetic mean of reliability categories (R1=1, R2=2, R3=3).
    • Participant-reported perceptions and time requirements.
    • Identification of ambiguous criteria for refinement in the final CRED method.
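The arithmetic mean of reliability categories can be reproduced directly from the assessor counts reported for Study E (4 R1 and 5 R2 under Klimisch; 3 R1, 4 R2, and 12 R3 under CRED). A minimal sketch, with the function name as an illustrative choice:

```python
def mean_reliability_score(counts):
    """Arithmetic mean with R1=1, R2=2, R3=3 (R4 excluded from the mean).
    `counts` maps category -> number of assessors assigning it."""
    weights = {"R1": 1, "R2": 2, "R3": 3}
    total = sum(counts.get(c, 0) for c in weights)
    return sum(w * counts.get(c, 0) for c, w in weights.items()) / total

# Study E assessor counts as reported in the ring test:
print(round(mean_reliability_score({"R1": 4, "R2": 5}), 1))            # 1.6
print(round(mean_reliability_score({"R1": 3, "R2": 4, "R3": 12}), 1))  # 2.5
```

The computed means of 1.6 (Klimisch) and 2.5 (CRED) match the values reported for Study E, confirming how much further toward "not reliable" the structured criteria pushed the same study.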

Visualization: Evaluation Workflows and Logical Relationships

[Diagram: Klimisch workflow: is the study GLP or a standard guideline study? If yes, categorize as R1 (reliable without restrictions); if no, ask whether deficiencies are insignificant, leading to R2 (reliable with restrictions) or R3 (not reliable), with R4 (not assignable) if information is lacking. A common pitfall is over-reliance on expert judgment at both decision points. CRED workflow: systematic assessment against the 20 reliability criteria → reliability conclusion (reliable / not reliable); if acceptable, systematic assessment against the 13 relevance criteria → final integrated reliability and relevance assessment.]

Diagram 1: Comparative Workflows of Klimisch vs. CRED Methods

The Scientist's Toolkit: Essential Research Reagents & Materials

The following toolkit comprises essential solutions and materials for conducting and evaluating modern aquatic ecotoxicity studies, as inferred from the criteria emphasized by the CRED method.

Table 4: Research Reagent Solutions for Aquatic Ecotoxicity Testing

| Item | Function in Ecotoxicity Testing | Rationale & CRED Evaluation Link |
|---|---|---|
| Standardized Test Media | Provides a consistent, defined chemical environment (pH, hardness, salinity) for exposure, ensuring reproducibility across labs. | Critical for Reliability Criterion: Exposure Conditions. Poorly characterized media is a major source of irreproducibility [12]. |
| Analytical Grade Test Substance | A substance of known purity and identity, essential for preparing accurate dosing solutions. | Fundamental for Reliability Criterion: Test Substance. Impurities can confound results [12]. |
| Certified Reference Toxicant | A standard toxicant (e.g., K₂Cr₂O₇, NaCl) used in periodic tests to confirm the health and consistent sensitivity of test organism cultures. | Supports Reliability Criterion: Test Organism. Validates organism fitness and test system performance [12]. |
| Solvent/Vehicle Control | A neutral carrier (e.g., acetone, DMSO) for water-insoluble substances, used at a non-toxic concentration. | Required for Reliability Criterion: Test Design. Must be included and its effect reported to isolate the test substance's toxicity [12]. |
| Preservative for Water Samples (e.g., acid, cooling) | Used when verifying exposure concentrations in test vessels via chemical analysis. | Enables Reliability Criterion: Exposure Characterization. Measured concentrations are more reliable than nominal ones [12]. |
| Formulated Diet for Chronic Tests | A nutritionally complete, consistent food source for organisms in long-term studies (e.g., growth, reproduction). | Vital for Reliability Criterion: Test Organism Health. Inadequate nutrition is a common confounding factor [12]. |

The empirical comparison reveals that the Klimisch method's broad categories and lack of detailed guidance lead to high inter-assessor variability, validating concerns about its inconsistency [17]. Its structure, which initially asks whether a study is a GLP or guideline test, can introduce bias by incentivizing the automatic categorization of such studies as reliable, potentially overlooking specific scientific flaws [12] [17].

In contrast, the CRED method mandates a transparent, criteria-based assessment that deconstructs study quality into 20 reliability and 13 relevance elements [12]. This structured approach reduces the space for unsubstantiated expert judgment, leading to more consistent and transparent evaluations, as evidenced by the ring test where participants perceived CRED as more accurate, consistent, and less dependent on subjective opinion [17]. The significant recategorization of "Study E" from primarily reliable under Klimisch to primarily not reliable under CRED demonstrates the tangible impact of applying more rigorous, transparent criteria [19].

Therefore, within the broader thesis comparing Klimisch and CRED, evidence strongly indicates that the CRED evaluation method effectively mitigates the core pitfalls of the Klimisch approach—inconsistency and over-reliance on expert judgment—by providing a detailed, transparent, and systematic framework for evaluating ecotoxicity data.

The regulatory assessment of chemicals hinges on the systematic evaluation of available ecotoxicity studies. For over two decades, the Klimisch method served as the de facto standard, categorizing study reliability as "1" (reliable without restriction), "2" (reliable with restrictions), "3" (not reliable), or "4" (not assignable) [24]. While pioneering, this approach has faced sustained criticism for its lack of detailed guidance, leading to inconsistent applications dependent on subjective expert judgment [24].

The Criteria for Reporting and Evaluating ecotoxicity Data (CRED) framework was developed to address these deficiencies. It provides a transparent, criteria-based system for evaluating both the reliability (internal scientific validity) and relevance (appropriateness for the specific hazard or risk assessment question) of aquatic ecotoxicity studies [24]. Within a thesis comparing the Klimisch and CRED methods, CRED's primary solutions lie in its structured approach to deconstructing study quality, which enhances consistency, reduces ambiguity, and improves the defensibility of regulatory decisions.

Application Notes: Implementing CRED in Research and Regulatory Workflows

Protocol 1: Side-by-Side Evaluation of a Single Study

This protocol is designed to highlight the procedural and outcome differences between the Klimisch and CRED methods by applying them to the same ecotoxicity study.

Experimental Workflow:

  • Study Selection & Independent Review: Choose a published aquatic ecotoxicity study. Two evaluators independently assess the study using only the original Klimisch criteria, assigning a reliability score (1-4) and a brief rationale.
  • CRED-Based Evaluation: The same evaluators then assess the study using the CRED checklist. This involves:
    • Systematically scoring defined criteria for Methodological Reliability (e.g., test substance characterization, test organism details, exposure regime, validity of test conditions, statistical analysis).
    • Separately evaluating Study Relevance for a defined assessment endpoint (e.g., chronic fish toxicity for a long-term environmental risk assessment).
    • Documenting scores and explicit justifications for each criterion.
  • Comparison & Analysis: Reconvene to compare results. The key analysis focuses on the disparity in rationale depth, the identification of specific study weaknesses by CRED, and the consistency (or lack thereof) between evaluators under each method.

Key Decision Points:

  • For Klimisch: The decision is a holistic, final score based on implicit weighing of flaws.
  • For CRED: Each criterion is a discrete decision point. The overall reliability conclusion is derived transparently from these individual scores.
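The contrast between a holistic score and a criterion-derived conclusion can be sketched in code. This is an illustrative aggregation rule only; the criterion names, scoring scale, and thresholds are assumptions for demonstration, not the official CRED algorithm:

```python
# Illustrative sketch: deriving an overall reliability category from
# criterion-level outcomes. The scale ('met', 'partly', 'not met',
# 'not reported') and the decision thresholds are hypothetical.

def reliability_category(scores: dict[str, str]) -> str:
    """Map per-criterion outcomes to a reliability category R1-R4."""
    values = list(scores.values())
    if values.count("not reported") > len(values) // 2:
        return "R4"  # too little information to assign a category
    if "not met" in values:
        return "R3"  # a failed critical criterion: not reliable
    if "partly" in values:
        return "R2"  # minor deficiencies: reliable with restrictions
    return "R1"      # all criteria fully met

example = {
    "test substance characterization": "met",
    "test organism details": "met",
    "exposure regime": "partly",
    "statistical analysis": "met",
}
print(reliability_category(example))  # -> R2
```

Because each criterion is recorded explicitly, a reviewer can trace exactly which deficiency produced the final category, which is the transparency advantage the protocol is designed to expose.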

Protocol 2: Ring-Test Validation of Method Consistency

This protocol replicates the validation approach used in the development of CRED to quantify differences in inter-evaluator consistency [24].

Experimental Workflow:

  • Cohort & Training: Assemble a cohort of 6-10 evaluators with varying experience in ecotoxicology. Provide standardized training on both the Klimisch and CRED methods.
  • Evaluation Suite: Curate a set of 5-10 ecotoxicity studies with known, varied methodological strengths and weaknesses.
  • Blinded Evaluation: Evaluators independently assess all studies using first the Klimisch method, then, after a washout period, the CRED method.
  • Data Analysis: Calculate the percentage agreement and Fleiss' kappa statistic for inter-evaluator reliability for both methods. Analyze discordant Klimisch scores to identify which study aspects (e.g., statistical reporting, control survival) caused the greatest subjectivity.
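The consistency metrics named in the analysis step can be computed as follows. The ratings are invented example data, and the functions are standard textbook formulas for pairwise percentage agreement and Fleiss' kappa, not the ring test's actual analysis scripts:

```python
# Consistency metrics for a ring test: percentage agreement and
# Fleiss' kappa, for studies each rated by the same number of assessors.
from collections import Counter

def percentage_agreement(ratings):
    """Fraction of agreeing rating pairs per study, averaged over studies."""
    total = 0.0
    for study in ratings:
        n = len(study)
        pairs = n * (n - 1)
        agree = sum(c * (c - 1) for c in Counter(study).values())
        total += agree / pairs
    return total / len(ratings)

def fleiss_kappa(ratings, categories):
    """Fleiss' kappa: chance-corrected agreement across all studies."""
    n = len(ratings[0])  # raters per study
    N = len(ratings)     # number of studies
    # Observed agreement per study, then averaged
    P = [sum(c * (c - 1) for c in Counter(r).values()) / (n * (n - 1))
         for r in ratings]
    P_bar = sum(P) / N
    # Expected agreement from overall category proportions
    p = [sum(r.count(cat) for r in ratings) / (N * n) for cat in categories]
    P_e = sum(pj ** 2 for pj in p)
    return (P_bar - P_e) / (1 - P_e)

# Example: 3 studies, each scored by 4 assessors on the Klimisch scale
ratings = [["R1", "R2", "R2", "R2"],
           ["R3", "R3", "R3", "R3"],
           ["R2", "R2", "R3", "R4"]]
print(round(percentage_agreement(ratings), 3))                    # 0.556
print(round(fleiss_kappa(ratings, ["R1", "R2", "R3", "R4"]), 3))  # 0.304
```

Running both metrics separately for the Klimisch and CRED phases makes the inter-evaluator comparison directly quantitative.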

Table 1: Core Methodological Comparison

| Feature | Klimisch Method | CRED Method |
|---|---|---|
| Evaluation Dimensions | Single, composite "reliability" score. | Separate, explicit scores for Reliability and Relevance. |
| Guidance Specificity | Low; general principles open to interpretation [24]. | High; detailed criteria with explicit descriptors for scoring [24]. |
| Output Nature | Categorical (Score 1, 2, 3, or 4) [25]. | Semi-quantitative (criteria scored, leading to a categorical reliability conclusion). |
| Basis for Decision | Holistic expert judgment. | Transparent summation of criterion-level assessments. |
| Handling of Uncertainty | Ambiguous; embedded in score "2" or "4". | Explicitly documented per criterion. |

Performance Validation: Quantitative and Qualitative Outcomes

The efficacy of CRED is demonstrated through empirical research. A pivotal two-phased ring test involving 75 risk assessors from 12 countries provided a direct comparison of the two frameworks [24].

Table 2: Ring-Test Results Comparing Evaluator Perception (Adapted from Kase et al., 2016) [24]

| Perception Attribute | Klimisch Method | CRED Method | Implied Advantage |
|---|---|---|---|
| Dependence on Expert Judgement | High | Low | Reduced Ambiguity |
| Accuracy of Evaluation | Perceived as lower | Perceived as higher | Enhanced Guidance |
| Consistency Among Evaluators | Low | High | Improved Specificity |
| Practicality (Time/Criteria Use) | Less practical | More practical | Structured Efficiency |

The data shows a clear preference for CRED. Evaluators found it less dependent on subjective judgment and more likely to yield accurate and consistent results across different users [24]. This directly addresses the core critique of the Klimisch method. Furthermore, CRED was perceived as more practical, indicating that its structured guidance does not come at the cost of usability [24].

Table 3: Analysis of Evaluation Criteria Focus

| Evaluation Aspect | Klimisch Method Focus | CRED Method Focus | Impact on Assessment |
|---|---|---|---|
| Test Substance | General mention of characterization. | Detailed criteria for concentration verification, stability, measurement. | Ensures exposure credibility. |
| Test Organism | Basic information required. | Specific data on life stage, source, health, acclimatization. | Ensures biological relevance. |
| Experimental Design | Implicit in "scientific soundness." | Explicit scoring of controls, replication, exposure regime, duration. | Quantifies methodological rigor. |
| Statistics & Reporting | Rarely a decisive factor. | Mandatory criteria for data presentation, statistical methods, raw data access. | Enables reproducibility and verification. |

Technical Implementation and Visualization

Diagram: CRED Evaluation Workflow Logic

Start: ecotoxicity study → (1) assess methodological reliability criteria → (2) score relevance to the specific assessment question → (3) document justifications for each criterion → overall study usability decision (go/no-go) → transparent, documented evaluation for the dossier.

Diagram: Klimisch vs. CRED Decision Pathway Comparison

Klimisch pathway: holistic review of the study → subjective integration of strengths and weaknesses → assignment of a single reliability score (1–4). CRED pathway: discrete criteria (A: test substance; B: test organism; C: exposure design; …; N: statistics) → transparent aggregate of criterion scores. Note: CRED decomposes the subjective "whole" into objective "parts".

The Scientist's Toolkit: Research Reagent Solutions

Table 4: Essential Resources for Implementing CRED Evaluations

| Tool / Resource | Function in Evaluation | Key Benefit |
|---|---|---|
| CRED Evaluation Checklist | The core protocol document providing the specific criteria for reliability and relevance scoring [24]. | Standardizes the evaluation process, ensuring all assessors address the same study elements. |
| Chemical Reference Standards | High-purity analytical standards of the test substance, used to verify reported concentrations and purity in the study under evaluation. | Allows for independent verification of the exposure scenario's credibility, a key CRED criterion. |
| Test Organism Lineage Records | Documentation of the source, generation, and husbandry conditions of standard test species (e.g., Daphnia magna, fathead minnow). | Provides context to assess the biological relevance and health of test organisms as required by CRED. |
| Statistical Analysis Software | Tools (e.g., R, GraphPad Prism) to re-analyze published data or raw data if available. | Enables the evaluator to independently check statistical significance and dose-response calculations, a critical aspect of methodological reliability. |
| Quality Assurance/Quality Control (QA/QC) Protocols | Standard Operating Procedures (SOPs) for good laboratory practices (GLP). | Serves as a benchmark against which the procedural descriptions in the study being evaluated can be compared. |
| Digital Literature Database Access | Subscription services (e.g., Web of Science, PubMed) and regulatory databases (e.g., ECOTOX). | Facilitates the rapid identification of supporting or contradictory studies for relevance assessment and weight-of-evidence analysis. |

Strategies for Minimizing Bias and Improving Evaluation Consistency

Within the comparative research framework of the Klimisch method versus the CRED (Criteria for Reporting and Evaluating ecotoxicity Data) evaluation, a central thesis examines how methodological design influences the objectivity and reproducibility of hazard assessments. The Klimisch method, established in 1997, has been a regulatory cornerstone but is criticized for its reliance on expert judgment and lack of detailed guidance, leading to inconsistent evaluations[reference:0][reference:1]. In contrast, the CRED method was developed explicitly to strengthen consistency and transparency by providing a structured set of criteria and extensive guidance[reference:2]. This article details the application notes and protocols that underpin strategies for minimizing bias, framed by empirical evidence from a direct comparison of these two evaluation systems.

Comparative Method Characteristics

The foundational differences between the Klimisch and CRED methods are quantitative and structural, as summarized in Table 1. CRED's comprehensive approach includes explicit relevance criteria and integrates all OECD reporting guidelines, which are absent in the Klimisch method[reference:3].

Table 1: Characteristics of Klimisch and CRED Evaluation Methods

| Characteristic | Klimisch Method | CRED Method |
|---|---|---|
| Data type | Toxicity and ecotoxicity | Aquatic ecotoxicity |
| Number of reliability criteria | 12–14 (ecotoxicity) | 20 (evaluating); 50 (reporting) |
| Number of relevance criteria | 0 | 13 |
| OECD reporting criteria included | 14 of 37 | 37 of 37 |
| Additional guidance | No | Yes (extensive) |
| Evaluation summary | Qualitative (reliability only) | Qualitative (reliability & relevance) |

Source: Adapted from Kase et al. (2016)[reference:4].

Quantitative Outcomes from Comparative Ring Test

A two-phase ring test involving 75 risk assessors from 12 countries provided direct comparative data on reliability, relevance, and user confidence[reference:5].

Reliability and Relevance Categorization

The CRED method produced more conservative reliability assessments, assigning a higher percentage of studies to "not reliable" or "not assignable" categories, suggesting a more critical and systematic detection of study flaws[reference:6]. Relevance evaluations were more decisive with CRED, showing a higher proportion of studies categorized as "relevant without restrictions"[reference:7].

Table 2: Reliability Categorization Results from Ring Test (%)

| Reliability Category | Klimisch Method | CRED Method |
|---|---|---|
| Reliable without restrictions (R1) | 8 | 2 |
| Reliable with restrictions (R2) | 45 | 24 |
| Not reliable (R3) | 42 | 54 |
| Not assignable (R4) | 6 | 20 |

Source: Data from ring test results[reference:8].

Table 3: Relevance Categorization Results from Ring Test (%)

| Relevance Category | Klimisch Method | CRED Method |
|---|---|---|
| Relevant without restrictions (C1) | 32 | 57 |
| Relevant with restrictions (C2) | 61 | 35 |
| Not relevant (C3) | 7 | 8 |

Source: Data from ring test results[reference:9].

Assessor Confidence

The structured guidance of CRED significantly increased evaluator confidence. For relevance assessments, 72% of users felt "very confident" or "confident" with CRED, compared to only 37% with the Klimisch method[reference:10].

Table 4: Confidence in Evaluation Results

| Method | Percentage "Very Confident" or "Confident" (Relevance Evaluation) |
|---|---|
| Klimisch | 37% |
| CRED | 72% |

Source: Data from Kase et al. (as cited in NORMAN network)[reference:11].

Core Strategies for Minimizing Bias and Improving Consistency

The comparative data highlight several concrete strategies embodied by the CRED method:

  • Structured, Explicit Criteria: Replacing holistic expert judgment with a predefined checklist of 20 reliability and 13 relevance criteria reduces subjective interpretation[reference:12].
  • Comprehensive Guidance: Providing detailed guidance for each criterion ensures uniform understanding and application across different assessors[reference:13].
  • Transparency in Process: Making the evaluation criteria and decision-pathways explicit allows for audit trails and peer review, reducing hidden biases[reference:14].
  • Balanced Assessment of All Study Types: By systematically evaluating all reported information, CRED reduces the automatic preference for GLP studies, mitigating a source of selection bias[reference:15].
  • Separation of Reliability and Relevance: Evaluating these dimensions separately with dedicated criteria prevents the confounding of study quality with its regulatory applicability.

Experimental Protocol: The CRED Ring Test

The following protocol details the ring test methodology used to generate the comparative data.

Objective

To compare the consistency, accuracy, and user perception of the Klimisch and draft CRED methods for evaluating the reliability and relevance of aquatic ecotoxicity studies.

Design

A two-phase, crossover ring test design was employed[reference:16].

Phase I (Nov–Dec 2012): Participants evaluated two out of eight selected studies using the Klimisch method.
Phase II (Mar–Apr 2013): The same participants evaluated two different studies from the same set using the draft CRED method.

Materials
  • Studies: Eight peer-reviewed ecotoxicity publications and GLP reports covering cyanobacteria, algae, higher plants, crustaceans, and fish, testing various chemical classes[reference:17].
  • Evaluation Tools: Klimisch criteria checklist and the draft CRED evaluation form (containing 19 reliability and 11 relevance criteria).
  • Questionnaire: For collecting participant experience and perception data.
Participant Recruitment

75 risk assessors from 35 organizations across 12 countries, including regulatory agencies, consultancies, and industry[reference:18].

Procedure
  • Training: Participants received background documents but no formal training to mimic real-world conditions.
  • Study Assignment: Studies were assigned based on participant expertise. Each study was evaluated by multiple assessors in each phase, with no within-institute overlap to ensure independence[reference:19].
  • Evaluation: For each assigned study, assessors: a. Applied the method-specific criteria to evaluate reliability and relevance. b. Assigned final categories: R1–R4 for reliability; C1–C3 (C4 for CRED) for relevance[reference:20].
  • Data Submission: Completed evaluation forms and perception questionnaires were collected centrally.
Data Analysis
  • Consistency: Measured by the percentage agreement among assessors for each criterion and final category.
  • Method Comparison: Reliability and relevance categories were compared using non-parametric tests (Wilcoxon rank-sum). Arithmetic means of conclusive categories (R1-R3) were calculated for direct comparison[reference:21].
  • Perception Analysis: Questionnaire responses on confidence, accuracy, and practicality were analyzed using Chi-square tests[reference:22].
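A minimal pure-Python sketch of these analysis steps, using the categorization percentages from Table 2 and the confidence figures from Table 4. The per-method sample size of 100 responses is an assumption made only to turn the reported confidence percentages into counts; the real questionnaire counts are not given here:

```python
# Sketch of the ring-test statistics: a Pearson chi-square statistic for
# the perception data, and the arithmetic mean over conclusive reliability
# categories (R1-R3, with R4 excluded) for method comparison.

def chi_square_statistic(table):
    """Pearson chi-square statistic for a 2D contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (obs - expected) ** 2 / expected
    return stat

def mean_conclusive_category(percentages):
    """Mean category score with R1=1, R2=2, R3=3, weighted by the
    percentage of evaluations in each conclusive category."""
    total = sum(percentages.values())
    weighted = sum(score * pct for score, pct in
                   zip((1, 2, 3), (percentages["R1"],
                                   percentages["R2"],
                                   percentages["R3"])))
    return weighted / total

# Confident vs. not-confident counts per method (assumed n=100 each)
confidence = [[37, 63],   # Klimisch
              [72, 28]]   # CRED
stat = chi_square_statistic(confidence)
print(round(stat, 1), "significant" if stat > 3.84 else "not significant")

klimisch = {"R1": 8, "R2": 45, "R3": 42}
cred = {"R1": 2, "R2": 24, "R3": 54}
print(round(mean_conclusive_category(klimisch), 2))  # 2.36
print(round(mean_conclusive_category(cred), 2))      # 2.65
```

The higher mean category for CRED (closer to R3) mirrors the finding that CRED yields more conservative reliability assessments; for the published analysis, a proper Wilcoxon rank-sum and chi-square test (e.g., via scipy.stats) would replace these hand-rolled summaries.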

Visualization of Workflows and Strategies

Diagram 1: Ring Test Experimental Workflow

Start: 75 participants (12 countries, 35 organizations) → Phase I (Klimisch method): evaluate 2 of 8 studies → Phase II (CRED method): evaluate 2 different studies → data collection (reliability/relevance categories, perception questionnaire) → analysis (consistency metrics, statistical comparison, perception analysis) → comparative report.

Diagram 2: Evaluation Process Comparison

Both methods begin from the same ecotoxicity study. Klimisch pathway: limited criteria (12–14 reliability, 0 relevance) → high dependence on expert judgment → holistic assessment → output: reliability category (R1–R4). CRED pathway: structured criteria (20 reliability, 13 relevance) → detailed guidance for each criterion → systematic, stepwise review → output: reliability and relevance categories (R1–R4, C1–C4).

Diagram 3: Bias Minimization Strategy Framework

Four strategies converge on improved evaluation consistency: (1) structured, explicit criteria (reduces subjectivity); (2) comprehensive guidance (ensures uniform application); (3) a transparent process (enables audit and review); and (4) balanced assessment of all study types (mitigates selection bias).

The Scientist's Toolkit: Essential Research Reagents & Materials

The following table lists key reagents and materials essential for conducting standardized ecotoxicity tests, the quality of which is ultimately evaluated by methods like Klimisch or CRED.

Table 5: Key Research Reagent Solutions for Ecotoxicity Testing

| Item | Function & Description | Example/Standard |
|---|---|---|
| Reference Toxicant | Validates test organism sensitivity and assay performance. | Potassium dichromate (for Daphnia), sodium chloride (for algae). |
| Standard Test Organisms | Provides reproducible biological response metrics. | Daphnia magna (cladocera), Pseudokirchneriella subcapitata (green algae), Danio rerio (zebrafish embryo). |
| OECD Test Media | Provides standardized, defined nutrient composition for culturing and testing. | OECD TG 201 (algae), OECD TG 202 (Daphnia), OECD TG 210 (fish embryo). |
| Solvent Control | Delivers hydrophobic test substances; controls for solvent effects. | Dimethyl sulfoxide (DMSO), concentration typically ≤0.1% v/v. |
| Positive Control Substance | Acts as a benchmark for specific toxicological endpoints. | 3,4-Dichloroaniline (for fish toxicity), cadmium chloride. |
| Culture Media Components | Supports axenic and healthy culture of test organisms. | Vitamins (e.g., B12, thiamine), trace metals, chelators (EDTA). |
| Analytical Grade Chemicals | Ensures purity of test substances and media components. | ≥98% purity, with verified certificate of analysis. |
| Statistical Software | Analyzes dose-response data and calculates toxicity endpoints (LC50, NOEC). | R (with drc or ecotoxicology packages), GraphPad Prism. |

The comparative analysis between the Klimisch and CRED evaluation methods within the broader thesis context demonstrates that bias minimization and consistency improvement are achievable through deliberate methodological design. The CRED method embodies these strategies by replacing subjective expert judgment with structured, transparent, and guidance-supported criteria. Empirical evidence from a large ring test confirms that this approach leads to more consistent reliability and relevance evaluations, greater user confidence, and a more critical assessment of study quality. For researchers, scientists, and drug development professionals, adopting such structured evaluation frameworks is essential for ensuring that regulatory hazard and risk assessments are based on robust, reproducible, and unbiased scientific evidence.

The regulatory evaluation of scientific studies is at a crossroads. Traditional methods, epitomized by the Klimisch system, have been foundational but are increasingly critiqued for their reliance on expert judgment and lack of detailed guidance, which can lead to inconsistent data inclusion[reference:0]. This inconsistency often sidelines valuable peer-reviewed literature in favor of standardized, often proprietary, studies, limiting the data pool for critical hazard and risk assessments[reference:1].

This article is framed within a broader thesis comparing the Klimisch method with the Criteria for Reporting and Evaluating Ecotoxicity Data (CRED) framework. The CRED method was developed to provide a more transparent, consistent, and detailed system for assessing the reliability and relevance of studies, particularly in ecotoxicology[reference:2]. The central argument is that adopting modern, structured evaluation frameworks like CRED is essential for broadening data inclusion. By providing a clear, criteria-based pathway, CRED empowers regulators and developers to confidently integrate high-quality peer-reviewed evidence into regulatory dossiers, thereby enhancing the scientific robustness of decisions.

Quantitative Comparison: Klimisch vs. CRED

A pivotal two-phased ring test, involving 75 risk assessors from 12 countries, directly compared the Klimisch and CRED methods[reference:3]. The quantitative results underscore significant differences in how studies are categorized, with direct implications for data inclusion policies.

Table 1: Reliability Categorization Results from Ring Test

| Reliability Category | Klimisch Method (% of evaluations) | CRED Method (% of evaluations) |
|---|---|---|
| Reliable without restrictions (R1) | 8% | 2% |
| Reliable with restrictions (R2) | 45% | 24% |
| Not reliable (R3) | 42% | 54% |
| Not assignable (R4) | 6% | 20% |

Data source: Kase et al. (2016) ring test analysis[reference:4].

The CRED method resulted in a higher proportion of studies categorized as "not reliable" or "not assignable," primarily because its systematic checklist prompted a more thorough review, uncovering flaws like exceeded substance solubility or missing control data that the Klimisch method often missed[reference:5].

Table 2: Relevance Evaluation & Assessor Confidence

| Metric | Klimisch Method | CRED Method | Note |
|---|---|---|---|
| Relevance: "Relevant without restrictions" | 32% | 57% | CRED provided clearer differentiation[reference:6] |
| Relevance: "Not relevant" | 7% | 8% | Similar low levels[reference:7] |
| Assessor Confidence (from other surveys) | 37% felt "very confident" or "confident" | 72% felt "very confident" or "confident" | CRED's structured guidance boosts confidence[reference:8] |
| Participant Perception | More dependent on expert judgment | Less dependent, more accurate & consistent | Ring test participant feedback[reference:9] |

Experimental Protocols for Data Evaluation and Integration

Protocol 1: The CRED Evaluation Ring Test Methodology

Objective: To compare the consistency, transparency, and user perception of the Klimisch and CRED evaluation methods.
Design: A two-phased, crossover ring test.
Participants: 75 experienced risk assessors from regulatory agencies, academia, and industry across 12 countries[reference:10].
Materials: Eight aquatic ecotoxicity studies (peer-reviewed and GLP reports) and evaluation kits for both methods.
Procedure:

  • Phase I: Participants evaluated two pre-selected studies using the Klimisch method. They categorized each study's reliability (R1-R4) and relevance.
  • Phase II: Participants evaluated two different studies using the CRED method, applying its 20 reliability and 13 relevance criteria.
  • Questionnaire: After each phase, participants completed a survey on their confidence, perceived practicality, and the clarity of the method.
  • Analysis: Inter-assessor consistency was calculated. Categorization results and survey responses were statistically compared between methods using non-parametric tests (Wilcoxon rank-sum) and chi-square tests[reference:11].

Protocol 2: Systematic Integration of Peer-Reviewed Studies into a Dossier

Objective: To construct a comprehensive, evidence-based regulatory dossier that incorporates peer-reviewed literature.
Design: Systematic literature review and data evaluation workflow.
Materials: Bibliographic databases (e.g., PubMed, Web of Science), systematic review software (e.g., Covidence, Rayyan), the CRED evaluation checklist, and data extraction sheets.
Procedure:

  • Problem Formulation: Define the specific regulatory question and required endpoints (e.g., NOEC for a specific species).
  • Systematic Search: Develop a search strategy with a librarian or information specialist. Search multiple databases using controlled vocabularies (e.g., MeSH) and keywords, documenting the process transparently[reference:12].
  • Screening & Selection: Apply pre-defined inclusion/exclusion criteria to titles/abstracts, then full texts.
  • Data Evaluation: Critically appraise each included study using the CRED framework. For each study, assess:
    • Reliability (20 criteria): Evaluate test organism, exposure system, chemical analysis, controls, statistical methods, and reporting completeness.
    • Relevance (13 criteria): Assess the fitness for the regulatory purpose, including test substance, endpoint, exposure duration, and ecosystem relevance.
    • Categorization: Assign a final reliability (R1-R4) and relevance (C1-C3) category.
  • Data Extraction & Synthesis: Extract quantitative outcomes (e.g., LC50, NOEC) and qualitative findings from studies deemed "reliable" and "relevant." Synthesize data using appropriate methods (e.g., meta-analysis, weight-of-evidence).
  • Dossier Assembly: Present the synthesized evidence alongside the evaluation rationale in the relevant modules (e.g., Module 2.4, 4.2) of the Common Technical Document (CTD).
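The evaluation-then-inclusion gate in step 4 can be expressed as a small helper. The function name and data shapes are illustrative; only the category semantics (R1–R4 for reliability, C1–C3 for relevance) follow the protocol above:

```python
# Illustrative inclusion gate: a study enters the synthesis only if it is
# both reliable (R1/R2) and relevant (C1/C2). Study names are placeholders.

def include_in_dataset(reliability: str, relevance: str) -> bool:
    """Return True when a study passes both CRED gates."""
    return reliability in ("R1", "R2") and relevance in ("C1", "C2")

studies = {
    "study_A": ("R1", "C1"),
    "study_B": ("R2", "C3"),  # reliable but not relevant
    "study_C": ("R3", "C1"),  # relevant but not reliable
}
included = [name for name, (rel, rev) in studies.items()
            if include_in_dataset(rel, rev)]
print(included)  # -> ['study_A']
```

Encoding the gate explicitly makes the exclusion rationale auditable: each excluded study carries the category pair that triggered its removal.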

Visualization of Workflows

Diagram 1: CRED Evaluation Workflow for a Single Study

Start: peer-reviewed study → apply 20 reliability criteria → assign reliability category (R1–R4) → if reliability is acceptable (R1/R2), apply 13 relevance criteria and assign a relevance category (C1–C3); otherwise exclude from the dataset. If relevance is acceptable (C1/C2), include in the dataset; otherwise exclude. In all cases, generate an evaluation report card.

Diagram 2: Dual-Pathway for Data Inclusion in Regulatory Review

All available data divide into two streams: GLP/standardized studies follow the traditional path through Klimisch evaluation, while peer-reviewed literature follows the enhanced inclusion path through CRED evaluation. Both streams yield assessed data that feed an integrated evidence synthesis, producing a robust regulatory dossier.

Table 3: Key Research Reagent Solutions for Data Inclusion Workflows

| Item / Tool | Function / Purpose | Key Features & Notes |
|---|---|---|
| CRED Evaluation Checklist | Provides the structured criteria (20 reliability, 13 relevance) for consistent study appraisal. | Available as Excel/PDF from the SCI RAP tools portal. Transforms subjective judgment into objective scoring[reference:13]. |
| CREED Exposure Data Workbook | Guides the evaluation of environmental exposure datasets for reliability and relevance. | Includes "gateway" questions and a template for creating a standardized "report card"[reference:14]. |
| Systematic Review Software (e.g., Covidence, Rayyan) | Manages the literature screening process (title/abstract, full-text) for systematic reviews. | Enables blinded duplicate screening, conflict resolution, and audit trails, crucial for regulatory-grade reviews[reference:15]. |
| Reference Management Software with CTD Module Support (e.g., Distiller SR, EndNote) | Organizes references and facilitates direct export of formatted citations into CTD dossier sections. | Ensures accurate referencing and saves time during dossier assembly. |
| NanoCRED & EthoCRED Frameworks | Specialized CRED tools for evaluating ecotoxicity data for nanomaterials and behavioural studies, respectively. | Addresses the need for fit-for-purpose criteria in emerging scientific areas[reference:16]. |
| ICH Q9 Quality Risk Management Tools | Provides a framework for risk-based decision-making when weighing evaluated evidence. | Helps justify inclusion/exclusion decisions based on the criticality of data gaps or uncertainties. |

The comparative analysis between the Klimisch and CRED methods reveals a clear trajectory for modern regulatory science. The CRED framework, with its detailed criteria and transparent process, addresses the key shortcomings of older systems by reducing inconsistency and building assessor confidence. This is not merely an academic exercise; it is a practical prerequisite for broadening data inclusion.

By implementing structured protocols like the CRED evaluation within systematic review workflows, regulatory professionals can confidently and defensibly integrate peer-reviewed studies into dossiers. This enriches the evidence base, can reduce animal testing and resource duplication, and ultimately leads to more scientifically robust hazard and risk assessments[reference:17]. The tools and pathways described herein provide an actionable blueprint for researchers, scientists, and drug development professionals to advance this critical evolution in regulatory practice.

Within regulatory ecotoxicology, the Klimisch method has been the established approach for evaluating study reliability since 1997. The Criteria for Reporting and Evaluating ecotoxicity Data (CRED) method was developed to address its perceived shortcomings in consistency, transparency, and guidance[reference:0]. A large-scale ring test directly compared the practicality of both methods, focusing on the time required to complete an evaluation and the user workflow[reference:1]. This application note synthesizes the quantitative findings from that comparison, provides detailed protocols for implementing each evaluation method, and visualizes the key workflows to aid researchers, scientists, and drug development professionals in selecting and applying the most fit-for-purpose approach.

Method Characteristics: A Side-by-Side Comparison

The fundamental structural differences between the Klimisch and CRED methods set the stage for variations in time investment and user experience. The core characteristics are summarized in Table 1.

Table 1: Structural Characteristics of the Klimisch and CRED Evaluation Methods[reference:2]

| Characteristic | Klimisch Method | CRED Evaluation Method |
| --- | --- | --- |
| Primary Data Type | Toxicity and ecotoxicity | Aquatic ecotoxicity |
| Number of Reliability Criteria | 12–14 (ecotoxicity) | 20 (evaluation); 50 (reporting) |
| Number of Relevance Criteria | 0 | 13 |
| OECD Reporting Criteria Included | 14 of 37 | 37 of 37 |
| Additional Guidance Provided | No | Yes (extensive guidance for each criterion) |
| Evaluation Summary | Qualitative (reliability only) | Qualitative (reliability and relevance) |

Quantitative Time Requirement Analysis

A central component of the ring test's practicality analysis was measuring the time burden on risk assessors. Participants reported the time taken to evaluate a study using each method.

Experimental Protocol for Time Assessment

  • Objective: To compare the time efficiency of the Klimisch and CRED evaluation methods.
  • Design: A two-phase ring test in which 75 risk assessors from 12 countries evaluated a set of eight ecotoxicity studies[reference:3]. In Phase I, participants used the Klimisch method; in Phase II, participants applied the draft CRED method to different studies from the same pool[reference:4].
  • Data Collection: After each evaluation, participants reported the time taken, categorizing it into one of five predefined time slots[reference:5].
  • Analysis: Results were calculated as the percentage of participants per time slot for each method. Time requirements under 60 minutes were considered indicative of an efficient evaluation system[reference:6].
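The per-slot aggregation described in this protocol can be sketched in a few lines. This is an illustrative sketch only: the slot labels mirror Table 2, but the example inputs are invented, since the real per-slot percentages sit in the source's supplementary materials.

```python
from collections import Counter

# Time slots as used in the ring test questionnaire (labels assumed from Table 2).
SLOTS = ["<20", "20-40", "40-60", "60-180", ">180"]

def slot_percentages(reported_slots):
    """Percentage of evaluations falling in each predefined time slot."""
    counts = Counter(reported_slots)
    n = len(reported_slots)
    return {slot: 100.0 * counts.get(slot, 0) / n for slot in SLOTS}

def share_under_60(reported_slots):
    """Share of evaluations completed in <60 min, the ring test's efficiency benchmark."""
    quick = sum(1 for s in reported_slots if s in ("<20", "20-40", "40-60"))
    return 100.0 * quick / len(reported_slots)

# Invented example data, for illustration only:
example = ["<20", "20-40", "20-40", "40-60", "60-180"]
print(slot_percentages(example))
print(share_under_60(example))  # 80.0
```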

Results and Data Presentation

The time data, presented in the ring test's supplementary materials, allowed a direct comparison of efficiency. Although the CRED method has more criteria, its clearer guidance was intended to streamline each individual judgment. The aggregated results of participant-reported times are summarized in Table 2.

Table 2: Time Required for Study Evaluation (Participant-Reported)

| Time Slot | Klimisch Method (n=121) | CRED Evaluation Method (n=103) |
| --- | --- | --- |
| < 20 minutes | [Data from supplementary materials] | [Data from supplementary materials] |
| 20–40 minutes | [Data from supplementary materials] | [Data from supplementary materials] |
| 40–60 minutes | [Data from supplementary materials] | [Data from supplementary materials] |
| 60–180 minutes | [Data from supplementary materials] | [Data from supplementary materials] |
| > 180 minutes | [Data from supplementary materials] | [Data from supplementary materials] |
| % of evaluations completed in <60 min | [Data from supplementary materials] | [Data from supplementary materials] |

Note: The specific percentage data for each time slot is contained in Additional File 1, Part D, of the source publication[reference:7]. The ring test concluded that participants perceived the CRED method as "practical regarding the use of criteria and time needed for performing an evaluation"[reference:8].

User Workflow and Application Protocols

The workflow for each method differs significantly, impacting both the time commitment and the depth of analysis.

Klimisch Method Protocol

The Klimisch method relies on a holistic, expert-judgment-based assessment guided by a limited set of criteria embedded in descriptive text.

Detailed Protocol:

  • Study Familiarization: Read the entire ecotoxicity study report.
  • Reliability Assessment: Mentally review the 12–14 implicit reliability criteria (e.g., GLP compliance, test organism, exposure characterization, data reporting) based on the method's descriptive text[reference:9].
  • Categorization: Assign the study to one of four reliability categories:
    • R1: Reliable without restrictions.
    • R2: Reliable with restrictions.
    • R3: Not reliable.
    • R4: Not assignable (insufficient information)[reference:10].
  • Documentation: Provide a qualitative summary justifying the reliability category. Relevance is not formally evaluated.

CRED Evaluation Method Protocol

The CRED method uses a standardized, criterion-by-criterion checklist approach, promoting a systematic and transparent evaluation of both reliability and relevance.

Detailed Protocol:

  • Preparation: Obtain the CRED evaluation sheet containing the 20 reliability and 13 relevance criteria with accompanying guidance[reference:11].
  • Systematic Review: For each criterion (e.g., "Test organism specified," "Exposure concentrations verified analytically"), determine if it is fulfilled, not fulfilled, or not applicable based on the study report.
  • Reliability Judgment: Based on the pattern of fulfilled criteria, assign a reliability category (R1-R4) using the provided guidance.
  • Relevance Judgment: Separately, using the 13 relevance criteria, assign a relevance category (C1: Relevant without restrictions, C2: Relevant with restrictions, C3: Not relevant, C4: Not assignable)[reference:12].
  • Transparent Documentation: Record the fulfillment status of each criterion, providing an auditable trail for the final reliability and relevance categories.
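A minimal sketch of the CRED sheet as a data structure, assuming a simple per-criterion status record. The criterion names are examples quoted from the protocol above; the percentage summary is an illustrative aid and not part of the official CRED category-assignment guidance.

```python
from dataclasses import dataclass, field

FULFILLED, NOT_FULFILLED, NOT_APPLICABLE = "fulfilled", "not fulfilled", "n/a"

@dataclass
class CredSheet:
    """Records the fulfillment status of each criterion for one study evaluation."""
    scores: dict = field(default_factory=dict)  # criterion name -> status

    def record(self, criterion, status):
        self.scores[criterion] = status

    def pct_fulfilled(self):
        """Share of applicable criteria marked fulfilled (illustrative summary only)."""
        applicable = [s for s in self.scores.values() if s != NOT_APPLICABLE]
        if not applicable:
            return 0.0
        return 100.0 * applicable.count(FULFILLED) / len(applicable)

sheet = CredSheet()
sheet.record("Test organism specified", FULFILLED)
sheet.record("Exposure concentrations verified analytically", NOT_FULFILLED)
sheet.record("Solvent control included", NOT_APPLICABLE)
print(sheet.pct_fulfilled())  # 50.0
```

Keeping the per-criterion record, rather than only the final category, is what provides the auditable trail the protocol calls for.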

Workflow Visualization

The divergent logical pathways of the two methods are visualized below.

Klimisch Method Evaluation Workflow

Diagram: Klimisch Method holistic assessment workflow. Start: Ecotoxicity Study → 1. Holistic Study Review → 2. Mental Check of ~12–14 Implicit Criteria → 3. Expert Judgment and Category Assignment (R1: Reliable without restrictions; R2: Reliable with restrictions; R3: Not reliable; R4: Not assignable) → 4. Qualitative Summary.

CRED Method Evaluation Workflow

Diagram: CRED Method systematic checklist workflow. Start: Ecotoxicity Study → 1. Obtain CRED Evaluation Sheet → 2. For each of the 20 reliability criteria, judge whether the criterion is fulfilled → 3. Aggregate scores and assign a reliability category (R1–R4) → 4. For each of the 13 relevance criteria, judge whether the criterion is fulfilled → 5. Aggregate scores and assign a relevance category (C1–C4) → 6. Transparent Documentation.

Conducting a robust study evaluation requires more than just the method definition. Table 3 lists key resources that facilitate the process.

Table 3: Essential Research Reagent Solutions for Study Evaluation

| Item | Function & Description | Relevance to Klimisch/CRED |
| --- | --- | --- |
| CRED Evaluation Sheet | The standardized checklist containing the 20 reliability and 13 relevance criteria with guidance text for consistent application. | CRED Essential. The core tool for implementing the method[reference:13]. |
| Klimisch Method Guidance Document | The original publication (Klimisch et al., 1997) describing the reliability categories and the implicit criteria for assessment. | Klimisch Essential. The primary reference for understanding the method's intent and application. |
| OECD Test Guidelines | Standardized protocols for conducting ecotoxicity tests (e.g., OECD 201, 210, 211). Used as a benchmark for assessing study design quality. | Critical for both. Fundamental for evaluating whether a study followed accepted standard methods. |
| Reporting Criteria Checklist | A list of 50 reporting items (e.g., from OECD) that ensure all necessary study details are documented. | Integrated into CRED. The CRED method incorporates all 37 OECD reporting criteria for aquatic tests[reference:14]. |
| Digital Data Extraction Tool | Software or spreadsheet template for systematically recording criterion fulfillment, notes, and final categories. | Highly Recommended for CRED. Manages the large number of criteria and ensures auditability. |
| Expert Judgement Framework | A structured process for making decisions when criteria are ambiguous or conflicting. | Core to Klimisch, which depends heavily on it. Complementary to CRED, used within its structured framework. |

The comparative analysis of the Klimisch and CRED evaluation methods reveals a trade-off between speed and depth. The Klimisch method, with its lean set of implicit criteria, can be applied rapidly but at the cost of transparency and consistency, relying heavily on variable expert judgment[reference:15]. In contrast, the CRED method introduces a structured, criteria-driven workflow that requires a more significant initial time investment for systematic review. This investment pays dividends in heightened transparency, improved consistency among assessors, and a more robust, defensible evaluation that covers both reliability and relevance[reference:16]. For regulatory and research contexts where auditability, harmonization, and comprehensive quality assessment are paramount, the CRED method presents a scientifically rigorous and practical replacement for the older Klimisch approach.

Comparative Validation: Ring Test Insights and Method Performance Analysis

This protocol details the design and execution of the multi-national ring test that served as the principal validation study for the Criteria for Reporting and Evaluating ecotoxicity Data (CRED) method. Within the broader thesis research comparing the Klimisch and CRED evaluation frameworks, this ring test provides the critical empirical evidence for a paradigm shift in ecotoxicity data assessment. The widely used Klimisch method, established in 1997, has been fundamental but criticized for its lack of detail, insufficient guidance, and failure to ensure consistency among assessors [11]. It offers limited criteria for reliability and none for relevance, leading to evaluations heavily dependent on expert judgment and potential bias towards Good Laboratory Practice (GLP) studies [11] [12].

The CRED method was developed to address these shortcomings by providing a transparent, detailed, and structured framework for evaluating both the reliability and relevance of aquatic ecotoxicity studies [11]. The primary objective of the ring test was to directly compare these two methods, testing the central hypothesis that the CRED method yields more consistent, transparent, and accurate evaluations than the Klimisch method [11]. Its successful execution was essential for establishing CRED as a scientifically robust and practical tool for regulatory hazard and risk assessment.

Experimental Protocol: Ring Test Design and Execution

The ring test was a meticulously designed, two-phase, cross-over study involving a large international cohort of risk assessors. Its core purpose was to evaluate the performance of the draft CRED method against the established Klimisch method under controlled, comparative conditions.

Phase I: Evaluation Using the Klimisch Method

Timing: November–December 2012 [11].

Participant Task: Each participant was assigned two studies out of a total pool of eight distinct ecotoxicity studies for evaluation [11]. Assignments were made based on the participant's self-declared area of expertise (e.g., algae, invertebrate, or fish toxicity) to ensure informed assessments [11].

Evaluation Framework: Participants evaluated their assigned studies solely using the Klimisch method [11].

Outputs Required:

  • Reliability Categorization: Each study was to be classified into one of four Klimisch categories:
    • R1: Reliable without restrictions.
    • R2: Reliable with restrictions.
    • R3: Not reliable.
    • R4: Not assignable [11].
  • Relevance Categorization: As the Klimisch method does not specify relevance categories, participants used a predefined three-tier system for consistency:
    • C1: Relevant without restrictions.
    • C2: Relevant with restrictions.
    • C3: Not relevant [11].

Contextual Guidance: Participants were instructed to evaluate relevance under a specific regulatory context: the derivation of Environmental Quality Criteria (EQC) within the EU Water Framework Directive, which accepts all population-relevant endpoints [11].

Interim Period: Method Refinement

Following Phase I, the initial draft of the CRED evaluation method was finalized for testing, incorporating feedback from expert consultations and the experiences of Phase I [11].

Phase II: Evaluation Using the CRED Method

Timing: March–April 2013 [11].

Participant Task: Each participant evaluated two new studies from the original pool of eight. Crucially, assignments were made so that no single institute evaluated the same study in both Phase I and Phase II, guaranteeing independent assessments [11].

Evaluation Framework: Participants evaluated their assigned studies using the draft CRED evaluation method [11].

Outputs Required:

  • Criterion-level Scoring: For each of the 19 draft reliability and 11 draft relevance criteria, participants judged whether the criterion was "fulfilled," "not fulfilled," or "not assignable" [11].
  • Overall Categorization: Based on the criterion scores, participants assigned the same final reliability (R1–R4) and relevance (C1–C3) categories as in Phase I, allowing for direct comparison [11].

Questionnaire: Upon completion, participants filled out a detailed questionnaire assessing their perception of both methods regarding clarity, consistency, practicality, and time requirement [11].

Post-Ring Test Analysis and Method Finalization

The research team performed a statistical consistency analysis on the criterion-level scores from Phase II. Criteria with inter-assessor consistency below 50% were reworded for clarity [11]. Furthermore, criteria frequently identified as "missing" in participant feedback were added, resulting in the final CRED method comprising 20 reliability and 13 relevance criteria [11] [12].
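The consistency screen described here can be sketched as follows. The exact statistic used by the ring test team is not given in this section, so agreement with the modal score is an assumed, simplified stand-in; the criterion labels and scores are invented.

```python
from collections import Counter

def criterion_consistency(scores):
    """Share of assessors (in %) agreeing with the modal score for one criterion."""
    counts = Counter(scores)
    return 100.0 * counts.most_common(1)[0][1] / len(scores)

def flag_for_rewording(per_criterion_scores, threshold=50.0):
    """Return criteria whose inter-assessor consistency falls below the threshold."""
    return [c for c, scores in per_criterion_scores.items()
            if criterion_consistency(scores) < threshold]

# Invented scores ("F" fulfilled, "N" not fulfilled, "NA" not assignable):
data = {
    "criterion 3": ["F", "F", "F", "N"],        # 75% agreement -> keep as worded
    "criterion 7": ["F", "N", "NA", "N", "F"],  # 40% agreement -> reword
}
print(flag_for_rewording(data))  # ['criterion 7']
```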

Table 1: Key Characteristics of the Klimisch and CRED Evaluation Methods [11]

| Characteristic | Klimisch Method (1997) | CRED Evaluation Method (2016) |
| --- | --- | --- |
| Primary Focus | Reliability of studies. | Reliability and relevance of studies. |
| Number of Criteria | Limited, unspecified number for reliability; none for relevance. | 20 explicit reliability criteria; 13 explicit relevance criteria. |
| Guidance Provided | Minimal, high-level guidance. | Extensive, detailed guidance for each criterion. |
| Evaluation Process | Holistic, heavily reliant on expert judgement. | Structured, criterion-by-criterion assessment. |
| Bias Potential | Recognized bias towards GLP and standard guideline studies. | Designed to evaluate study quality irrespective of GLP status. |
| Output Transparency | Low; only final category is reported. | High; fulfillment of each criterion is documented. |

Table 2: Ring Test Participant Demographics and Study Design [11]

| Aspect | Specification |
| --- | --- |
| Total Participants | 75 risk assessors from 12 countries [11]. |
| Participant Institutions | Industry, academia, consultancy, governmental agencies [11]. |
| Participant Experience | Majority had >5 years of experience in study evaluation [11]. |
| Total Studies Evaluated | 8 unique aquatic ecotoxicity studies [11]. |
| Studies per Participant | 2 in Phase I (Klimisch) + 2 in Phase II (CRED) = 4 total [11]. |
| Evaluation Independence | No single institute evaluated the same study in both phases [11]. |

The Scientist's Toolkit: Essential Materials for CRED Evaluation

Table 3: Key Research Reagent Solutions for CRED Evaluation

| Item | Function & Description | Source/Example |
| --- | --- | --- |
| CRED Evaluation Checklist (Excel Tool) | Primary tool for conducting the evaluation. Contains all 20 reliability and 13 relevance criteria with dropdown menus for scoring (Fulfilled/Not Fulfilled/Not Assignable). Automates category suggestions. | Available as a macro-enabled Excel workbook from the CRED project resources [10]. |
| CRED Reporting Recommendations Template | A 50-criterion checklist across six categories (General, Test Design, Substance, Organism, Exposure, Statistics) to guide authors in reporting studies that are more likely to be deemed reliable and relevant. | Provided alongside the evaluation method to improve future data quality [12] [10]. |
| Standardized Test Guideline (e.g., OECD 210) | Reference documents for defining standard test procedures, against which methodological deviations in the study under evaluation are assessed. | OECD Test Guidelines, EPA Ecological Effects Test Guidelines. |
| Statistical Analysis Software | Used to re-analyze or verify statistical results reported in the study (e.g., dose-response modeling, significance testing). | R, GraphPad Prism. The ctxR package can be used to access comparable toxicity data for context [26]. |
| Chemical Identification Databases | To verify the identity, purity, and properties of the test substance as reported in the study. | CompTox Chemicals Dashboard, PubChem [26]. |

Results and Data Analysis

The ring test generated quantitative and qualitative data demonstrating the advantages of the CRED method.

1. Consistency and Discrimination: The CRED method produced more nuanced and discriminating evaluations. For example, in the assessment of a GLP study on fish toxicity (Study E), the Klimisch method resulted in all evaluators rating it as reliable (44% R1, 56% R2). In contrast, the CRED method revealed significant flaws, with 63% of evaluators rating it "Not Reliable" (R3) [11] [19]. The mean reliability score (where R1=1, R2=2, R3=3) was 1.6 for Klimisch and 2.5 for CRED for this study, indicating a stricter, more critical assessment [19].
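The mean reliability scores quoted for Study E follow directly from the reported category distributions, using the scoring convention given in the text (R1=1, R2=2, R3=3); a short check:

```python
def mean_reliability(distribution):
    """Weighted mean reliability score with R1=1, R2=2, R3=3 (R4 not scored).
    `distribution` maps category -> fraction of evaluators assigning it."""
    weights = {"R1": 1, "R2": 2, "R3": 3}
    return sum(weights[cat] * frac for cat, frac in distribution.items())

klimisch_e = {"R1": 0.44, "R2": 0.56}            # Study E, Klimisch (n=9)
cred_e = {"R1": 0.16, "R2": 0.21, "R3": 0.63}    # Study E, CRED (n=19)
print(round(mean_reliability(klimisch_e), 1))  # 1.6
print(round(mean_reliability(cred_e), 1))      # 2.5
```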

2. Criterion Fulfillment Analysis: Analysis of the Phase II data showed a clear gradient in the percentage of fulfilled CRED criteria across the final reliability categories, validating the method's internal logic. Studies categorized as "Reliable without restrictions" (R1) had a mean of 93% of criteria fulfilled, while those categorized as "Not reliable" (R3) fulfilled only 60% on average [20].
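Summary statistics of this kind (mean, standard deviation, minimum, maximum, and n per final category, as tabulated in Table 4) can be computed from per-evaluation records along these lines; the records shown are invented illustrations, not the ring test data.

```python
from statistics import mean, stdev

def fulfilment_by_category(evaluations):
    """Group per-evaluation % of fulfilled criteria by final category.
    `evaluations` is a list of (category, pct_fulfilled) pairs; returns
    {category: (mean, sd, min, max, n)} with mean and sd rounded to integers."""
    grouped = {}
    for cat, pct in evaluations:
        grouped.setdefault(cat, []).append(pct)
    return {cat: (round(mean(v)),
                  round(stdev(v)) if len(v) > 1 else 0,
                  min(v), max(v), len(v))
            for cat, v in grouped.items()}

# Invented example records (category, % criteria fulfilled):
records = [("R1", 95), ("R1", 90), ("R3", 55), ("R3", 65), ("R3", 60)]
print(fulfilment_by_category(records))
```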

3. Participant Perception: The post-trial questionnaire revealed a strong preference for the CRED method. Participants perceived it as more accurate, applicable, consistent, transparent, and less dependent on expert judgement than the Klimisch method [11] [12]. Although the initial evaluation with CRED took slightly longer, participants found the time investment practical given the improved output quality [11].

Table 4: Percentage of Fulfilled CRED Criteria by Final Reliability and Relevance Category [20]

| Final Category | Mean % Criteria Fulfilled | Standard Deviation | Minimum % | Maximum % | Number of Evaluations (n) |
| --- | --- | --- | --- | --- | --- |
| Reliable without restrictions (R1) | 93% | 12 | 79% | 100% | 3 |
| Reliable with restrictions (R2) | 72% | 12 | 47% | 90% | 24 |
| Not reliable (R3) | 60% | 15 | 21% | 90% | 58 |
| Not assignable (R4) | 51% | 15 | 21% | 64% | 19 |
| Relevant without restrictions (C1) | 84% | 8 | 64% | 100% | 50 |
| Relevant with restrictions (C2) | 73% | 14 | 27% | 91% | 42 |
| Not relevant (C3) | 61% | 14 | 46% | 82% | 12 |

Diagram: CRED evaluation methodology workflow. Start: Identify study for assessment → Initial screen of title and abstract (clearly irrelevant studies are excluded from the primary dataset) → Reliability evaluation (20 criteria) → Assign reliability category (R1–R4); studies rated R3 or R4 may still be used, with caution, as supporting information → Relevance evaluation (13 criteria) → Assign relevance category (C1–C3) → Final decision: include in dataset if C1 or C2; exclude if C3.

CRED Evaluation Methodology Workflow

Diagram: Multi-national CRED ring test design. A pool of 8 ecotoxicity studies and 75 participants from 12 countries. Phase I (Nov–Dec 2012): two studies assigned per participant based on expertise; evaluation and categorization with the Klimisch method (R1–R4, C1–C3). Phase II (Mar–Apr 2013): two new studies per participant, with no institute overlap; criterion scoring and categorization with the draft CRED method (19 reliability / 11 relevance criteria). Post-phase questionnaires on method perception fed a consistency and comparison analysis, which produced the final CRED method (20 reliability / 13 relevance criteria).

Multi-National CRED Ring Test Design

Discussion: Validation Outcomes and Broader Impact

The ring test successfully validated the CRED evaluation method against its objectives. The results confirm that CRED provides a more detailed, transparent, and consistent framework for evaluating ecotoxicity studies than the Klimisch method [11]. By reducing reliance on opaque expert judgement and introducing structured, criterion-based assessments for both reliability and relevance, CRED mitigates a key source of inconsistency in regulatory hazard assessment [12].

This methodological advancement directly addresses the core critique of the Klimisch method within the comparative thesis. The proven ability of CRED to critically assess both guideline and non-guideline studies promotes the inclusion of high-quality peer-reviewed literature in regulatory datasets, aligning with recommendations from frameworks like REACH and the Water Framework Directive [11]. The subsequent development of specialized CRED tools for nanomaterials (NanoCRED), behavioral studies (EthoCRED), and soil/sediment studies further demonstrates the adaptability and enduring impact of the framework's core principles [10].

The methodology of the multi-national CRED ring test established a robust empirical foundation for the adoption of the CRED evaluation method. By design, it provided a direct, controlled comparison with the legacy Klimisch method, generating clear evidence that CRED enhances the consistency, transparency, and scientific rigor of ecotoxicity data evaluation. This validation is pivotal, positioning CRED as a scientifically justified replacement that can improve the harmonization and reliability of chemical hazard and risk assessments across global regulatory frameworks.

Application Notes and Protocols for the Comparative Evaluation of Ecotoxicity Studies within Klimisch-CRED Research

This document provides detailed protocols and application notes for researchers conducting comparative evaluations of ecotoxicity study quality using the Klimisch and CRED (Criteria for Reporting and Evaluating Ecotoxicity Data) methods. The content is framed within a thesis context focused on methodological comparison for environmental risk assessment.

Quantitative Comparison of Method Performance

A pivotal two-phase international ring test quantitatively compared the Klimisch and CRED methods. Seventy-five risk assessors from 12 countries, representing industry, academia, consultancy, and government, evaluated eight aquatic ecotoxicity studies [2] [24].

Table 1: Ring Test Design and Participant Profile

| Aspect | Description |
| --- | --- |
| Total Participants | 75 risk assessors from 12 countries [2] [24]. |
| Expertise | Majority had >5 years of experience in study evaluation [12]. |
| Study Set | 8 peer-reviewed and GLP (Good Laboratory Practice) aquatic ecotoxicity studies [2]. |
| Test Organisms | Included algae (Synechococcus), higher plants (Lemna minor), crustaceans (Daphnia magna), and fish [2]. |
| Design | Phase I: Klimisch method. Phase II: CRED method. Participants evaluated different studies in each phase [2]. |

The core quantitative finding was a significant difference in study categorization and evaluator consistency between the two methods.

Table 2: Quantitative Comparison of Klimisch vs. CRED Method Outcomes

| Metric | Klimisch Method | CRED Evaluation Method | Implication |
| --- | --- | --- | --- |
| Reliability Criteria | 12–14 criteria for ecotoxicity [2]. | 20 explicit reliability criteria [12] [9]. | CRED provides a more granular, less subjective assessment framework. |
| Relevance Criteria | 0 explicit criteria [2]. | 13 explicit relevance criteria [12] [9]. | CRED formally assesses the appropriateness of data for a specific assessment goal. |
| Evaluator Agreement | Lower consistency among assessors [12] [2]. | Higher consistency among assessors [12] [24]. | CRED reduces arbitrariness by providing detailed guidance. |
| Perceived Accuracy | -- | 86% of ring test participants rated CRED as "more accurate" [24]. | CRED is perceived as yielding a more scientifically robust evaluation. |
| Key Example (Study E) | 44% "Reliable without restrictions"; 56% "Reliable with restrictions" [2] [19]. | 16% "Reliable without restrictions"; 21% "Reliable with restrictions"; 63% "Not reliable" [2] [19]. | CRED's detailed criteria led to stricter evaluation of a GLP fish test, challenging automatic acceptance of guideline studies. |

Detailed Experimental Protocol: The Two-Phase Ring Test

This protocol outlines the methodology used to generate the comparative data in Section 1 [12] [2].

Objective: To compare the consistency, transparency, and outcomes of the Klimisch and CRED methods for evaluating the reliability and relevance of aquatic ecotoxicity studies.

Phase I: Evaluation Using the Klimisch Method

  • Study Assignment: Each participant is assigned two studies from the master set of eight. Assignments ensure no single study is evaluated by the same institution in both phases.
  • Evaluation Task: Using the Klimisch scheme, categorize each study as:
    • 1 (Reliable without restrictions): Fulfills all quality criteria.
    • 2 (Reliable with restrictions): Generally sound but with minor deficiencies.
    • 3 (Not reliable): Contains major methodological flaws.
    • 4 (Not assignable): Insufficiently documented for evaluation [12] [2].
  • Documentation: Record the final Klimisch score and any supporting notes justifying the categorization.

Phase II: Evaluation Using the CRED Method

  • Study Assignment: Each participant is assigned two different studies from the master set.
  • Evaluation Task: Using the CRED worksheet, evaluate each study against 20 reliability and 13 relevance criteria [12].
  • Scoring: For each criterion, score as "Yes," "No," or "Not Reported." The pattern of scores informs a final, qualitative summary judgment on reliability and relevance (e.g., "reliable without restrictions," "not reliable for the specified assessment purpose") [12] [2].
  • Documentation: Complete the CRED checklist and provide summary conclusions.

Data Analysis:

  • Consistency Calculation: For each study, calculate the percentage agreement among evaluators for the final categorization within each method.
  • Outcome Comparison: Tabulate the distribution of final categories (e.g., R1, R2, R3) for each study across both methods to identify systematic differences.
  • Perceptual Data: Analyze post-test survey responses where participants rated both methods on accuracy, consistency, transparency, and practicality [24].
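A minimal sketch of the consistency calculation above, interpreting "percentage agreement" as agreement with the modal final category (an assumption; other agreement statistics are possible). The Study E distributions reported earlier serve as input.

```python
from collections import Counter

def percent_agreement(final_categories):
    """Percentage of evaluators agreeing with the modal final category for one study."""
    counts = Counter(final_categories)
    return 100.0 * counts.most_common(1)[0][1] / len(final_categories)

# Study E final reliability categories under each method (from the ring test data):
klimisch_e = ["R1"] * 4 + ["R2"] * 5                 # n=9
cred_e = ["R1"] * 3 + ["R2"] * 4 + ["R3"] * 12       # n=19
print(round(percent_agreement(klimisch_e)))  # 56
print(round(percent_agreement(cred_e)))      # 63
```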

Visualization of Methodologies and Workflow

Diagram 1: Comparative Workflow: Klimisch vs. CRED Evaluation

Diagram: Comparative workflow. An aquatic ecotoxicity study (peer-reviewed or GLP report) enters both arms. Klimisch Method (1997), Phase I: apply 12–14 general criteria and expert judgement → assign a single category (1, 2, 3, or 4) → output: a reliability score, with no explicit relevance assessment. CRED Method (2016), Phase II: evaluate against 20 reliability criteria → evaluate against 13 relevance criteria → integrate assessments into a qualitative summary → output: a combined reliability and relevance statement. Both outputs feed the ring-test comparison of consistency and outcomes.

Diagram 2: The CRED Evaluation Process

Diagram: The CRED evaluation process. Reliability evaluation covers general information (e.g., clear objectives), test design and statistics (e.g., controls, replicates), test substance (e.g., characterization, dosing), test organism (e.g., species, life-stage), and exposure conditions (e.g., duration, measurements), yielding a summary reliability judgment. Relevance evaluation covers assessment purpose alignment (e.g., PNEC, EQS derivation), endpoint relevance (e.g., acute vs. chronic), and test system relevance (e.g., species, exposure route), yielding a summary relevance judgment. The two judgments are integrated into a final CRED conclusion: a fitness-for-purpose statement.

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Tools for Conducting Klimisch-CRED Comparative Research

| Tool / Resource | Function in Research | Source / Availability |
| --- | --- | --- |
| CRED Evaluation Excel Worksheet | Primary tool for applying the 20 reliability and 13 relevance criteria. Contains embedded guidance for consistent scoring [12] [10]. | Freely available for download from project websites [10] [9]. |
| CRED Reporting Checklist (50 criteria) | Used prospectively to design studies or retrospectively to assess reporting completeness. Covers 6 categories: General, Test Design, Substance, Organism, Exposure, and Statistics [12]. | Published within the CRED method paper [12]. |
| Set of Characterized Ecotoxicity Studies | A curated set of peer-reviewed and GLP studies (like the 8 used in the ring test) is essential for calibration and inter-laboratory comparison exercises [2]. | Researchers must compile these from literature, ensuring a mix of test types and perceived quality. |
| NanoCRED & EthoCRED Frameworks | Specialized adaptations of CRED for evaluating ecotoxicity studies of nanomaterials and behavioral endpoints, respectively. Critical for modern, cross-method comparisons [10]. | Described in dedicated publications (e.g., NanoImpact 2017, Biological Reviews 2024) [10]. |
| Klimisch Method Original Reference | The baseline comparator. Required to apply the method correctly without inadvertent incorporation of later interpretations [12] [2]. | Klimisch et al., Regul. Toxicol. Pharmacol. 25, 1–5 (1997). |

Application Notes for Thesis Research

  • Focus on Consistency: The primary thesis argument supported by this data is that CRED improves evaluator consistency. The ring test showed CRED's detailed criteria reduced arbitrary expert judgment, a major criticism of the Klimisch method [12] [24].
  • Outcome Accuracy is Indirectly Measured: "Accuracy" is framed as perceived accuracy by experts and the method's ability to identify flaws (e.g., in Study E). A thesis can argue that greater consistency and transparency are proxies for improved accuracy in regulatory decision-making [24].
  • Implementing a Modern Comparison: For novel thesis work, consider comparing CRED with even newer tools, such as AI-assisted screening protocols currently being explored for data quality evaluation [27]. Furthermore, recent critical analyses, such as those questioning the statistical effectiveness of score-based data quality assessment for certain endpoints, should be acknowledged as part of the evolving scientific discussion [28].
  • Use Specialized CRED Tools: To demonstrate applicability, apply not only generic CRED but also its specialized variants (e.g., NanoCRED) to relevant datasets, discussing how they address limitations in the Klimisch method for emerging contaminants [10].
  • Regulatory Context: Frame findings within ongoing regulatory adoption. Note that CRED is being piloted in the revision of EU Technical Guidance Documents and Swiss EQS proposals, and is used by the NORMAN network, indicating its practical impact [10] [9].

Within the regulatory assessment of chemicals, the evaluation of ecotoxicity study reliability and relevance is a fundamental yet subjective process. For decades, the Klimisch method has served as the predominant tool for this task, categorizing studies as "reliable without restrictions," "reliable with restrictions," "not reliable," or "not assignable" [17]. However, its reliance on limited criteria and expert judgment has been criticized for introducing inconsistency and opacity into risk assessments [12] [17]. In response, the Criteria for Reporting and Evaluating Ecotoxicity Data (CRED) method was developed to provide a more structured, transparent, and detailed framework for evaluation, encompassing 20 reliability and 13 relevance criteria [12].

This application note is situated within a broader thesis comparing the Klimisch and CRED evaluation methodologies. Beyond the objective outcomes of study categorization, a critical yet less quantified aspect is the subjective experience of the risk assessors themselves—their perception of a method's usability and the confidence they derive from its use. A method that is accurate but perceived as cumbersome or unclear may see poor adoption. This document details protocols for collecting and analyzing subjective feedback on usability and confidence, framing it as an essential component in the comparative assessment of evaluation frameworks for ecological risk assessment.

Quantitative Comparison of Klimisch and CRED Outcomes

A ring test comparing the Klimisch and CRED methods provides foundational quantitative data on their performance [19] [17]. Seventy-five risk assessors from 12 countries evaluated a set of eight ecotoxicity studies, each participant applying the Klimisch method to some studies and the draft CRED method to others [17].

Table 1: Comparative Study Categorization for a Sample Ring Test Study (E)

| Evaluation Method | Reliable Without Restrictions (R1) | Reliable With Restrictions (R2) | Not Reliable (R3) | Not Assignable (R4) | Mean Reliability Score (R1=1, R2=2, R3=3) |
|---|---|---|---|---|---|
| Klimisch Method (n=9) | 4 participants (44%) | 5 participants (56%) | 0 participants (0%) | 0 participants (0%) | 1.6 |
| CRED Evaluation Method (n=19) | 3 participants (16%) | 4 participants (21%) | 12 participants (63%) | 0 participants (0%) | 2.5 |

Source: Adapted from erratum to Kase et al. (2016) [19]. Note: The CRED method demonstrated a more conservative and critical evaluation, with a majority categorizing the study as "not reliable."
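The mean reliability scores in Table 1 can be reproduced directly from the categorization counts. The sketch below is illustrative, not part of the original ring-test analysis; it applies the stated scoring scheme and excludes R4 ("not assignable") verdicts from the mean:

```python
# Mean reliability score from ring-test categorization counts (Table 1).
# R4 ("not assignable") carries no score and is excluded from the mean,
# matching the scheme R1=1, R2=2, R3=3.

def mean_reliability_score(counts: dict) -> float:
    """Weighted mean of reliability scores; ignores R4."""
    weights = {"R1": 1, "R2": 2, "R3": 3}
    total = sum(counts.get(cat, 0) for cat in weights)
    score = sum(w * counts.get(cat, 0) for cat, w in weights.items())
    return round(score / total, 1)

klimisch = {"R1": 4, "R2": 5, "R3": 0, "R4": 0}
cred = {"R1": 3, "R2": 4, "R3": 12, "R4": 0}

print(mean_reliability_score(klimisch))  # 1.6
print(mean_reliability_score(cred))      # 2.5
```

The higher CRED mean (2.5 vs 1.6) reflects the shift of most assessors toward the "not reliable" category for Study E.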

Table 2: Ring Test Participant Perceptions of Method Characteristics

| Characteristic | Klimisch Method Perception | CRED Method Perception | Implication for Usability & Confidence |
|---|---|---|---|
| Dependency on Expert Judgment | High | Lower | CRED may reduce individual bias, increasing collective confidence. |
| Accuracy | Perceived as lower | Perceived as higher | Higher perceived accuracy directly boosts user confidence in results. |
| Consistency Across Users | Low | High | High consistency is a key usability outcome, supporting harmonization. |
| Transparency of Evaluation | Low | High | Clear criteria improve learnability and justify decisions, aiding confidence. |
| Practicality (Time/Criteria Use) | Faster, less detailed | Slightly more time, but criteria deemed practical | Balances depth with efficiency; affects perceived efficiency. |

Source: Summarized from Kase et al. (2016) [17].

Core Experimental Protocols

Protocol 1: Ring Test Design for Comparative Method Evaluation

This protocol outlines the core design used to generate comparative data between evaluation methods [17].

  • Objective: To compare the categorization consistency, outcomes, and user perceptions of the Klimisch and CRED evaluation methods.
  • Design: A two-phase, between-groups ring test.
    • Phase I: Participants evaluate the reliability and relevance of two distinct ecotoxicity studies using the Klimisch method.
    • Phase II: The same participants evaluate two different studies from the same pool using the draft CRED method. Studies are assigned to ensure no institutional overlap between phases.
  • Participant Recruitment: Target N=75+ risk assessors from industry, academia, consultancy, and government across multiple countries [17]. Participants should have >5 years of experience in study evaluation.
  • Materials: A set of 8 peer-reviewed aquatic ecotoxicity studies; Klimisch method guidelines; Draft CRED evaluation worksheets (including reliability and relevance criteria); Standardized reporting forms for final categorization (R1-R4, C1-C3).
  • Primary Quantitative Metrics:
    • Distribution of study categorizations (R1-R4) for each method.
    • Arithmetic mean of reliability scores (R1=1, R2=2, R3=3).
    • Inter-assessor consistency rate for each evaluation criterion.
  • Subjective Feedback Component: Following each phase, participants complete a questionnaire assessing their perception of the method's ease of use, clarity of criteria, time required, and their confidence in their own evaluation [17].
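The primary quantitative metrics above can be computed straightforwardly. In this minimal sketch, the "inter-assessor consistency rate" is defined as agreement with the modal category, which is one reasonable interpretation rather than the ring test's exact formula:

```python
from collections import Counter

def categorization_distribution(categories: list) -> dict:
    """Fraction of assessors assigning each category (R1-R4)."""
    n = len(categories)
    return {cat: count / n for cat, count in Counter(categories).items()}

def consistency_rate(categories: list) -> float:
    """Share of assessors agreeing with the modal category
    (one simple way to express inter-assessor consistency)."""
    counts = Counter(categories)
    return max(counts.values()) / len(categories)

# Study E CRED categorizations from Table 1: 3x R1, 4x R2, 12x R3
study_e = ["R1"] * 3 + ["R2"] * 4 + ["R3"] * 12
print(categorization_distribution(study_e))
print(round(consistency_rate(study_e), 2))  # 0.63
```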

Protocol 2: Collecting Subjective Feedback on Usability and Confidence

This protocol details a structured approach to gather and analyze the qualitative and quantitative subjective data referenced in Protocol 1.

  • Objective: To systematically capture risk assessors' perceptions of a method's usability and the confidence it inspires.
  • Tool: Post-evaluation questionnaire using mixed-method items.
  • Questionnaire Design:
    • Quantitative Items (5-point Likert Scale):
      • Usability: "The evaluation criteria were clear and easy to apply." (Strongly Disagree to Strongly Agree)
      • Efficiency: "The time required to complete the evaluation was reasonable." (Strongly Disagree to Strongly Agree)
      • Confidence: "I am confident that my final categorization is scientifically justified." (Strongly Disagree to Strongly Agree)
      • Comparative Confidence: "Compared to my usual method, this method increased my confidence in the evaluation." (Strongly Disagree to Strongly Agree)
    • Qualitative Open-Ended Items:
      • "What aspects of the method did you find most supportive to your decision-making?"
      • "What aspects were confusing or hindered your evaluation?"
      • "Please describe how the method's structure affected your confidence in your final judgment."
  • Analysis:
    • Quantitative: Calculate mean scores and standard deviations for each Likert item. Compare scores between the Klimisch and CRED groups using appropriate statistical tests (e.g., t-test).
    • Qualitative: Employ thematic analysis. Transcribe responses, code for recurring themes (e.g., "appreciation for detailed guidance," "frustration with ambiguous terms"), and group codes into overarching themes related to usability and confidence builders/barriers [29].
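For the quantitative analysis step, a stdlib-only sketch of the Likert-score comparison. Welch's t statistic is used here as an illustrative two-sample test, and the response data are hypothetical:

```python
from statistics import mean, stdev
from math import sqrt

def welch_t(a: list, b: list) -> float:
    """Welch's t statistic for two independent samples
    (e.g., Likert usability scores for CRED vs Klimisch groups)."""
    va, vb = stdev(a) ** 2 / len(a), stdev(b) ** 2 / len(b)
    return (mean(a) - mean(b)) / sqrt(va + vb)

# Hypothetical 1-5 Likert responses to "criteria were clear and easy to apply"
klimisch_scores = [3, 2, 4, 3, 3, 2, 3, 4]
cred_scores = [4, 5, 4, 4, 5, 3, 4, 4]

print(round(mean(klimisch_scores), 2), round(stdev(klimisch_scores), 2))
print(round(mean(cred_scores), 2), round(stdev(cred_scores), 2))
print(round(welch_t(cred_scores, klimisch_scores), 2))  # 3.21
```

In practice the statistic would be compared against the t distribution (or a non-parametric alternative used, given the ordinal Likert scale).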

Protocol 3: Iterative Interface Testing for Risk Communication Tools

Based on research into communicating risk assessments [29], this protocol adapts user-centered design principles to test and refine the "interface" of an evaluation method (e.g., a CRED software tool or worksheet).

  • Objective: To improve the usability and clarity of evaluation tools through iterative feedback from end-users (risk assessors).
  • Design: An iterative, multi-stage process involving focus groups and usability tests [29].
    • Stage 1 - Conceptual Feedback: Conduct focus groups with risk assessors to identify pain points in existing evaluation workflows and gather initial ideas for tool design.
    • Stage 2 - Prototype Testing: Develop a low-fidelity prototype (e.g., a new structured worksheet or digital form). In focus groups, have assessors walk through an evaluation using the prototype. Collect feedback on logic flow, terminology, and completeness.
    • Stage 3 - Usability Validation: Refine the prototype into a high-fidelity version. Conduct one-on-one usability tests where assessors complete specific evaluation tasks. Measure task completion success, time-on-task, and collect think-aloud feedback.
    • Stage 4 - Feasibility Check: Validate the final prototype with a broader group of risk assessors to ensure it is appropriate and feasible for integration into their regular workflow [29].
  • Outcome: A finalized evaluation tool that has been vetted for usability, thereby reducing cognitive load and potential for user error, ultimately supporting greater confidence in the process.
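The Stage 3 usability measures (task completion success, time-on-task) can be aggregated simply; the session records below are hypothetical:

```python
from statistics import mean

# Hypothetical usability-test sessions from Stage 3: each records whether the
# evaluation task was completed and the time taken (minutes).
sessions = [
    {"completed": True, "minutes": 42.0},
    {"completed": True, "minutes": 55.5},
    {"completed": False, "minutes": 60.0},
    {"completed": True, "minutes": 38.0},
]

completion_rate = mean(1.0 if s["completed"] else 0.0 for s in sessions)
# Time-on-task is averaged over completed tasks only.
time_on_task = mean(s["minutes"] for s in sessions if s["completed"])

print(f"task completion: {completion_rate:.0%}")    # 75%
print(f"mean time-on-task: {time_on_task:.1f} min")
```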

Visualizing Workflows and Relationships

Diagram: Start (Assigned Ecotoxicity Study) → Phase I: Klimisch Evaluation → Post-Phase Subjective Questionnaire (after Studies 1 & 2) → Phase II: CRED Evaluation → Post-Phase Subjective Questionnaire (after Studies 3 & 4) → Data Synthesis & Analysis

Ring Test Workflow for Comparative Method Evaluation

Diagram: Usability Factors of the Evaluation Method (Clarity of Criteria, Structured Workflow, Transparency of Rationale, Perceived Efficiency) → Risk Assessor's Confidence → Confidence in Own Judgment; Confidence in Method Output; Perceived Inter-Assessor Consistency

Relationship Between Usability Factors and Assessor Confidence

The Scientist's Toolkit: Key Research Reagent Solutions

Table 3: Essential Materials for Conducting Comparative Evaluation Research

| Item | Function in Research | Specification / Notes |
|---|---|---|
| Curated Ecotoxicity Study Library | Serves as the standardized test material for all evaluators. | A set of 6-10 peer-reviewed aquatic ecotoxicity studies covering varied substances, organisms, and endpoints. Studies should have known methodological complexities [17]. |
| Method Evaluation Guidelines | The independent variables being tested. | 1. Official Klimisch method description [17]. 2. Final CRED evaluation method worksheet with 20 reliability and 13 relevance criteria and guidance text [12]. |
| Standardized Reporting Form | Ensures consistent capture of the primary outcome (categorization). | Digital or paper form forcing a single choice for Reliability (R1-R4) and Relevance (C1-C3), with a mandatory field for brief rationale. |
| Subjective Feedback Questionnaire | Captures the perceptual and usability metrics. | Mixed-method survey with Likert-scale items on usability/confidence and open-ended questions for qualitative depth [17] [29]. |
| Statistical Analysis Software | Analyzes quantitative consistency and perceptual data. | Software capable of descriptive statistics, inter-rater reliability calculation (e.g., Fleiss' Kappa), and comparative tests (e.g., t-test, ANOVA). Examples include R, SPSS, or GraphPad Prism. |
| Qualitative Data Analysis Tool | Analyzes open-ended feedback for themes. | Tools to support thematic analysis, such as NVivo, MAXQDA, or Dedoose, for coding and organizing qualitative responses [29]. |

Within regulatory ecotoxicology, the reliability and relevance evaluation of studies directly shapes hazard identification and risk characterization. For decades, the Klimisch method (1997) has been the default framework, but its reliance on expert judgment and limited criteria have raised concerns about consistency. The Criteria for Reporting and Evaluating ecotoxicity Data (CRED) method was developed to provide a more transparent, criteria‑based alternative. This application note, framed within a broader thesis comparing the Klimisch and CRED approaches, details the experimental protocols, quantitative outcomes, and practical implications of method choice for hazard and risk conclusions.


Application Notes & Protocols

Ring‑Test Design for Method Comparison

  • Objective: To compare the reliability and relevance evaluations performed with the Klimisch method and the draft CRED method, and to assess risk assessors’ perception of both methods.
  • Participants: 75 risk assessors from 12 countries (35 organizations), including regulatory agencies, consultancies, and industry. 62 participants completed Phase I (Klimisch), 54 completed Phase II (CRED), with 76% overlap[reference:0].
  • Study Selection: Eight aquatic ecotoxicity studies covering cyanobacteria, algae, higher plants, crustaceans, and fish; test designs included acute and long‑term exposures; substances represented industrial chemicals, biocides, plant‑protection products, and pharmaceuticals[reference:1].
  • Phase I (Klimisch): Each participant evaluated two of the eight studies using the Klimisch criteria. Reliability was categorized as R1 (reliable without restrictions), R2 (reliable with restrictions), R3 (not reliable), or R4 (not assignable). Relevance was categorized as C1 (relevant without restrictions), C2 (relevant with restrictions), or C3 (not relevant)[reference:2].
  • Phase II (CRED): Each participant evaluated two different studies from the same set using the draft CRED method, which includes 20 reliability criteria and 13 relevance criteria[reference:3].
  • Data Collection: Participants completed questionnaires after each phase, reporting their evaluations, confidence levels, and perceptions of the methods[reference:4].
  • Statistical Analysis: Consistency of categorizations was analyzed using Wilcoxon pair rank‑sum tests; confidence data were examined with exact chi‑square tests; false‑discovery rate correction was applied[reference:5].
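To illustrate the false-discovery-rate correction mentioned above, here is a stdlib sketch of the Benjamini-Hochberg step-up procedure (the source does not specify which FDR procedure was applied, so this choice is an assumption); the p-values are hypothetical:

```python
def benjamini_hochberg(pvals: list, alpha: float = 0.05) -> list:
    """Benjamini-Hochberg FDR correction: returns, for each p-value,
    whether it is significant at the given FDR level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k with p_(k) <= k/m * alpha (step-up rule)
    max_k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            max_k = rank
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= max_k:
            significant[i] = True
    return significant

# Hypothetical p-values from multiple Wilcoxon / chi-square comparisons
pvals = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
print(benjamini_hochberg(pvals))  # [True, True, False, False, False, False]
```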

Detailed Protocol for Conducting a CRED Evaluation

  • Step 1 – Study Preparation: Obtain the full study report (peer‑reviewed article or GLP report). Ensure all sections (methods, results, discussion, supplementary) are accessible.
  • Step 2 – Reliability Checklist: Apply the 20 CRED reliability criteria, covering test‑organism description, test‑substance characterization, exposure regime, endpoint measurement, statistical analysis, and reporting completeness. For each criterion, record “fulfilled,” “not fulfilled,” or “not assignable.”
  • Step 3 – Relevance Checklist: Apply the 13 CRED relevance criteria, considering the test system’s ecological representativeness, exposure scenario, endpoint relevance for the protection goal, and regulatory context.
  • Step 4 – Categorization: Based on the checklist outcomes, assign a final reliability category (R1–R4) and relevance category (C1–C3). Use the CRED guidance document to resolve ambiguous cases.
  • Step 5 – Documentation: Record the justification for each criterion decision and the final categories in a standardized evaluation sheet (Excel template available from the CRED tools website)[reference:6].
  • Step 6 – Quality Assurance: For regulatory submissions, independent peer review of the evaluation is recommended to ensure consistency.
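Steps 2-5 can be captured in a simple data structure. This is a minimal sketch assuming the three outcome labels from Step 2; the mapping from tallies to a final R category remains expert judgment guided by the CRED guidance document:

```python
from collections import Counter

# Allowed criterion outcomes, as described in Step 2 of the protocol.
RELIABILITY_OUTCOMES = ("fulfilled", "not fulfilled", "not assignable")

def summarize_checklist(outcomes: dict) -> Counter:
    """Validate and tally criterion outcomes to support the final
    R1-R4 judgment (the judgment itself is not automated here)."""
    for criterion, outcome in outcomes.items():
        if outcome not in RELIABILITY_OUTCOMES:
            raise ValueError(f"{criterion}: unknown outcome {outcome!r}")
    return Counter(outcomes.values())

# Hypothetical (abbreviated) checklist entries for one study
checklist = {
    "test organism described": "fulfilled",
    "test substance characterized": "fulfilled",
    "exposure concentrations verified": "not fulfilled",
    "raw data reported": "not assignable",
}
print(summarize_checklist(checklist))
```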

Table 1. Characteristics of the Klimisch and CRED Evaluation Methods[reference:7]

| Characteristic | Klimisch | CRED |
|---|---|---|
| Data type | Toxicity and ecotoxicity | Aquatic ecotoxicity |
| Number of reliability criteria | 12–14 (ecotoxicity) | 20 for evaluation (50 for reporting) |
| Number of relevance criteria | 0 | 13 |
| Number of OECD reporting criteria included | 14 (of 37) | 37 (of 37) |
| Additional guidance | No | Yes |
| How to summarize evaluation | Qualitative for reliability | Qualitative for reliability and relevance |

Table 2. Reliability Categorization Outcomes (%)[reference:8]

| Method | R1 (Reliable without restrictions) | R2 (Reliable with restrictions) | R3 (Not reliable) | R4 (Not assignable) |
|---|---|---|---|---|
| Klimisch | 8 | 45 | 42 | 6 |
| CRED | 2 | 24 | 54 | 20 |

Table 3. Relevance Categorization Outcomes (%)[reference:9]

| Method | C1 (Relevant without restrictions) | C2 (Relevant with restrictions) | C3 (Not relevant) |
|---|---|---|---|
| Klimisch | 32 | 61 | 7 |
| CRED | 57 | 35 | 8 |

Table 4. Participant Confidence in Evaluation Results[reference:10]

| Confidence Level | Klimisch (n=121) | CRED (n=103) |
|---|---|---|
| Very confident | 37% | 72% |
| Confident | 43% | 21% |
| Neutral | 12% | 5% |
| Not confident | 8% | 2% |
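The confidence distributions in Table 4 can be compared statistically. The ring test used exact chi-square tests; as a simpler stdlib illustration, the sketch below computes a Pearson chi-square statistic, reconstructing approximate counts from the reported percentages and group sizes:

```python
def chi_square_stat(table: list) -> float:
    """Pearson chi-square statistic for an r x c contingency table
    (an approximation; the ring test itself used exact tests)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat

# Approximate confidence counts reconstructed from Table 4 percentages
klimisch = [round(p / 100 * 121) for p in (37, 43, 12, 8)]   # n = 121
cred = [round(p / 100 * 103) for p in (72, 21, 5, 2)]        # n = 103
print(round(chi_square_stat([klimisch, cred]), 1))
```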

Discussion

The ring‑test data demonstrate that method choice directly alters the classification of studies, thereby influencing the data pool available for hazard and risk assessment. The CRED method produced a higher proportion of “not reliable” (54% vs 42%) and “not assignable” (20% vs 6%) categorizations, indicating a more critical appraisal of study quality. This shift is attributed to CRED’s systematic criteria, which prompted assessors to detect flaws (e.g., exposure concentrations exceeding solubility, missing raw data) that were overlooked under the Klimisch approach[reference:11]. Consequently, hazard conclusions based solely on Klimisch‑evaluated studies may be underpinned by data that would be deemed unreliable under CRED, leading to potential underestimation of risk.

The higher confidence reported with CRED (72% very confident vs 37% for Klimisch) underscores the value of structured guidance in reducing evaluator uncertainty. Furthermore, the extension of CRED to nano‑ecotoxicity (NanoCRED), behavioural studies (EthoCRED), and sediment/soil systems illustrates its adaptability to emerging data needs[reference:12]. For drug‑development professionals, adopting CRED‑like transparent evaluation frameworks can strengthen the credibility of environmental risk assessments submitted to regulators.


The Scientist’s Toolkit

| Item | Function |
|---|---|
| Test organisms (e.g., Daphnia magna, Danio rerio, Lemna minor) | Standardized model species for aquatic toxicity testing; provide reproducible endpoints for hazard assessment. |
| OECD test guidelines (e.g., OECD 201, 210, 211) | Internationally accepted protocols for conducting ecotoxicity studies; ensure data quality and comparability. |
| CRED evaluation sheets (Excel templates) | Structured checklists for systematically scoring reliability and relevance criteria; facilitate transparent documentation. |
| Statistical software (e.g., R, Python with ecotox libraries) | Analyze dose‑response data, calculate LC50/NOEC, perform meta‑analyses of multiple studies. |
| Chemical‑analysis tools (HPLC, GC‑MS) | Verify test‑substance purity and exposure concentrations; critical for assessing study reliability. |
| Laboratory information management system (LIMS) | Track sample handling, experimental conditions, and raw data; supports GLP compliance. |
| Reference databases (e.g., ECOTOX, IUCLID) | Curated repositories of existing toxicity data; used for weight‑of‑evidence approaches. |

Diagrams

Workflow of the Ring‑Test Comparing Klimisch and CRED Methods

Diagram: Start (75 risk assessors, 12 countries) → Phase I (Klimisch): each evaluates 2 of 8 studies → Data Collection (questionnaires on evaluations, confidence, perception) → Phase II (CRED): each evaluates 2 different studies → Data Collection → Statistical Analysis (Wilcoxon pair rank‑sum test; chi‑square for confidence) → Outcome: comparison of reliability/relevance categorizations and confidence levels

Title: Ring‑Test Workflow for Method Comparison

Diagram: Evaluation Method (Klimisch vs CRED) → Criteria & Guidance (limited vs detailed) → Evaluator Consistency (low vs high) → Reliability (R1–R4) and Relevance (C1–C3) Categorizations → Data Pool for Assessment (included/excluded studies) → Hazard Conclusion (potential underestimate vs conservative) → Risk Characterization (lower vs higher risk estimate)

Title: Method Choice Influences Hazard and Risk Conclusions


The regulatory evaluation of ecotoxicity studies is foundational to environmental risk assessment for chemicals, pharmaceuticals, and plant protection products [17]. Historically, this process has relied heavily on the method established by Klimisch et al. in 1997, which provides a basic categorization system for study reliability [17]. While a seminal step at the time, this method is now criticized for its lack of detailed guidance, its dependence on subjective expert judgment, and its failure to ensure consistent evaluations among different assessors and regulatory frameworks [12] [17]. Inconsistencies in evaluating the same study can directly impact hazard conclusions, leading to either underestimated environmental risks or unnecessary risk mitigation measures [17].

Parallel challenges exist in pharmaceutical regulatory science, where a lack of harmonization in analytical method validation can lead to significant setbacks, such as Complete Response Letters (CRLs) from the U.S. Food and Drug Administration (FDA) [30]. The International Council for Harmonisation (ICH) addresses this through guidelines like ICH Q2(R2) and Q14, which promote a modern, science- and risk-based lifecycle approach to analytical procedures [31] [32]. This shift from prescriptive checklists to a principles-based framework mirrors the evolution needed in ecotoxicity data evaluation.

The Criteria for Reporting and Evaluating Ecotoxicity Data (CRED) project was initiated to meet this need [12] [9]. CRED provides a transparent, detailed, and structured method for evaluating both the reliability (intrinsic scientific quality) and relevance (appropriateness for a specific assessment) of aquatic ecotoxicity studies [12]. By offering explicit criteria and guidance, CRED aims to reduce arbitrariness, improve consistency across international borders and regulatory regimes, and facilitate the acceptance of high-quality peer-reviewed literature in regulatory dossiers [17] [9]. This application note details the CRED methodology, provides protocols for its implementation, and analyzes its potential to harmonize environmental assessments within a broader thesis comparing the Klimisch and CRED paradigms.

Comparative Analysis: Klimisch vs. CRED Evaluation Frameworks

A direct comparison of the Klimisch and CRED methods reveals fundamental differences in scope, structure, and outcome. These differences were quantitatively assessed in a comprehensive ring test involving 75 risk assessors from 12 countries [17].

Table 1: Core Structural Comparison of Klimisch and CRED Evaluation Methods

| Feature | Klimisch Method (1997) | CRED Evaluation Method (2016) |
|---|---|---|
| Primary Focus | Reliability only [17]. | Reliability and relevance as separate, equally important dimensions [12] [17]. |
| Guidance Detail | Minimal, narrative description. Highly dependent on expert judgement [12] [17]. | Extensive. 20 explicit reliability criteria and 13 relevance criteria, each with detailed guidance [17] [9]. |
| Evaluation Categories | Reliability: 1) Reliable without restrictions, 2) Reliable with restrictions, 3) Not reliable, 4) Not assignable [17]. | Reliability: same four categories as Klimisch. Relevance: 1) Relevant without restrictions, 2) Relevant with restrictions, 3) Not relevant [17]. |
| Underlying Principle | Checklist-based, often favoring GLP (Good Laboratory Practice) and standardized guideline studies [12] [17]. | Science-based, transparent assessment. Evaluates study design, conduct, and reporting quality irrespective of GLP status [12]. |
| Outcome Transparency | Low. Provides only a final category without detailed justification [17]. | High. Requires criterion-by-criterion assessment, creating an audit trail and clear rationale for the final categorization [17]. |

The ring test demonstrated that these structural differences translate into significant practical impacts on consistency and outcome. Participants evaluated a set of eight ecotoxicity studies using both methods [17].

Table 2: Quantitative Ring Test Results Highlighting Evaluation Consistency [19] [17]

| Study Example & Description | Klimisch Method Results | CRED Method Results | Implication of CRED |
|---|---|---|---|
| Study E: GLP report on fish toxicity of estrone [19]. | High inconsistency. 44% "Reliable without restrictions," 56% "Reliable with restrictions" [19]. | More critical & consistent. 16% "Reliable without restrictions," 21% "Reliable with restrictions," 63% "Not reliable" [19]. | CRED prevented automatic acceptance of a GLP study, critically appraising its scientific merits and leading to a more stringent and unified assessment. |
| Overall Ring Test Consistency | Lower consistency among assessors. High variability in categorizations for the same study [17]. | Higher consistency. Reduced variability due to explicit criteria, minimizing subjective interpretation [17]. | Enhances the reproducibility of regulatory decisions across different institutions and countries. |
| Participant Perception | Perceived as less accurate, more dependent on subjective judgement [17]. | Perceived as more accurate, applicable, consistent, and transparent [12] [17]. | Increases confidence in the evaluation process and the resulting hazard assessments. |

CRED Evaluation Method: Detailed Application Notes

The CRED method is implemented through a structured process that separates the assessment of reliability from relevance, as their definitions are fundamentally different [12]. Reliability concerns the inherent scientific quality of the study design, performance, and analysis. Relevance concerns the applicability of the study's specific characteristics (e.g., test organism, endpoint, exposure duration) to a particular regulatory question (e.g., derivation of a chronic water quality standard) [12].

The CRED Evaluation Workflow

The following workflow diagrams the logical sequence for applying the CRED method, from study screening to final decision for use in a regulatory assessment.

Diagram: Start (Identify Study) → Initial Relevance Screening (Title/Abstract) → Relevant to Assessment Scope? If No: Exclude from Further Consideration. If Yes: Full CRED Evaluation → Reliability Evaluation (20 criteria) → Reliability Category (R1–R4), in parallel with Relevance Evaluation (13 criteria) → Relevance Category (C1–C3) → Integrate Reliability & Relevance Judgement → Final Decision: Use in Assessment?

Reliability and Relevance Evaluation Criteria

The core of the CRED method is its 33 criteria (20 for reliability, 13 for relevance). The reliability criteria are organized into six thematic categories that mirror the essential components of a well-reported study [12].

Table 3: Thematic Categories of CRED Reliability Criteria

| Category | Number of Criteria | Example Criterion & Purpose |
|---|---|---|
| Test Substance | 3 | Characterization & Concentration Verification: ensures the tested material is properly characterized and its concentration in the test system is confirmed, which is critical for reproducibility and understanding structure-activity relationships. |
| Test Organism | 3 | Species Identification & Health Status: confirms the correct species was used and that organisms were healthy at test initiation, affecting the sensitivity and validity of the biological response. |
| Test Design | 5 | Control Groups & Replication: evaluates the appropriateness of control groups and the number of replicates, which are fundamental for detecting treatment-related effects and statistical power. |
| Exposure Conditions | 4 | Exposure Duration & System Stability: assesses whether the exposure regime (static, renewal, flow-through) is appropriate and whether conditions (e.g., temperature, pH) remained stable, ensuring the reported effect is linked to a definable exposure. |
| Endpoint & Data Analysis | 3 | Endpoint Definition & Statistical Methods: reviews the clarity of the measured endpoint and the appropriateness of the statistical tests used, which is essential for the correct interpretation of results. |
| Reporting & Documentation | 2 | Clarity of Reporting & Raw Data: judges whether the study is described with sufficient detail to be repeated and if raw data are accessible, which is the foundation of scientific transparency and reassessment. |

The 13 relevance criteria guide the assessor in judging the study's fit-for-purpose. These include the appropriateness of the test organism's trophic level and ecological region, the biological relevance of the endpoint (e.g., mortality vs. subtle biochemical change), and the congruence of the exposure duration (acute vs. chronic) with the protection goal of the assessment [12] [17]. A study on soil invertebrates, for example, is inherently not relevant for deriving an aquatic quality standard, regardless of its high reliability [12].

Experimental Protocols for Implementation and Validation

Protocol: Conducting a CRED-Based Study Evaluation

This protocol provides a step-by-step methodology for a single assessor to evaluate an aquatic ecotoxicity study using the official CRED Excel tool [9].

Objective: To perform a transparent, consistent, and documented evaluation of the reliability and relevance of an aquatic ecotoxicity study for use in regulatory hazard assessment. Materials:

  • Official CRED Excel evaluation tool [9].
  • The peer-reviewed ecotoxicity study to be evaluated (full text).
  • Relevant regulatory guideline documents (e.g., OECD test guidelines, EPA methods) for reference.
  • Context document defining the specific assessment goal (e.g., "Derivation of a chronic PNEC for freshwater pelagic organisms").

Procedure:

  • Preparation: Download and open the CRED Excel tool. In the designated field, clearly state the assessment goal from the context document. This anchors the relevance evaluation.
  • Initial Screen & First Read: Read the study's title and abstract. Perform a quick "go/no-go" relevance screening based on obvious factors (e.g., aquatic vs. terrestrial test). If clearly irrelevant, document reason and stop.
  • Systematic Reliability Evaluation: Read the full study method and results sections.
    • For each of the 20 reliability criteria in the Excel sheet, locate the corresponding information in the study text.
    • Select the appropriate score (e.g., "Yes," "No," "Partly," "Not Reported") based on the detailed guidance embedded in the tool.
    • Mandatory: For every score of "No," "Partly," or "Not Reported," provide a succinct written justification in the adjacent comment cell, citing the specific deficiency (e.g., "Control mortality reported as 25%, exceeding the 10% limit stated in OECD 203").
  • Assign Reliability Category: Based on the pattern of scores and justifications, assign one of the four Klimisch categories (R1-R4). The Excel tool may provide a suggested category, but the final expert judgment, documented with rationale, is determinative.
  • Systematic Relevance Evaluation: With the assessment goal in mind, evaluate the 13 relevance criteria.
    • Judge each criterion based on the study's parameters (e.g., is Daphnia magna a relevant test organism for protecting European freshwater invertebrates?).
    • Score and provide justification comments as in Step 3.
  • Assign Relevance Category: Assign a final relevance category (C1-C3) based on the relevance criteria evaluation.
  • Integration and Final Decision: Synthesize the reliability and relevance categorizations. A common decision matrix is:
    • R1/R2 + C1/C2: Consider for primary use in assessment.
    • R3/R4 or C3: Generally excluded as a key study but may be used as supporting information with a clear explanation of its limitations.
  • Documentation: The completed Excel file, with all scores and comments, serves as the permanent, transparent audit trail for the evaluation.
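The Step 7 decision matrix can be expressed as a small function. This is a sketch of the simple rule stated above, with the caveat that real decisions remain documented expert judgment:

```python
# Integration step (Step 7): combine reliability (R1-R4) and relevance
# (C1-C3) categories into a provisional use decision.
def integration_decision(reliability: str, relevance: str) -> str:
    if reliability in {"R1", "R2"} and relevance in {"C1", "C2"}:
        return "primary use"
    # R3/R4 or C3: excluded as a key study, possibly kept as support
    return "supporting information only (document limitations)"

print(integration_decision("R2", "C1"))  # primary use
print(integration_decision("R3", "C1"))
```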

Protocol: Ring-Testing for Inter-Assessor Consistency (Validation)

This protocol, modeled on the original CRED validation study, describes how an organization can validate the implementation of CRED and train assessors [17].

Objective: To measure and improve the consistency of CRED evaluations among multiple risk assessors within or across institutions. Materials:

  • A set of 3-5 diverse aquatic ecotoxicity studies (including both guideline and non-guideline studies).
  • Official CRED Excel tool for all participants [9].
  • A clear, common assessment goal provided to all participants.
  • Statistical software for analyzing inter-rater agreement (e.g., Fleiss' Kappa calculation).

Procedure:

  • Training: Provide all participants with background on CRED and training on using the Excel tool.
  • Blinded Evaluation: Each participant independently evaluates the same set of studies using the protocol in Section 4.1, without collaboration.
  • Data Collection: Collate all completed Excel evaluation sheets.
  • Analysis:
    • Primary Outcome: Calculate the percentage agreement for the final reliability category (R1-R4) for each study. Compare this to historical or expected agreement using the Klimisch method.
    • Secondary Outcome: Calculate inter-rater agreement (e.g., Fleiss' Kappa) for a subset of key individual reliability criteria (e.g., "control group appropriateness," "concentration verification").
    • Qualitative Analysis: Review the justifications provided for scores to identify areas where guidance in the tool may be ambiguous.
  • Harmonization Workshop: Facilitate a meeting where assessors discuss their evaluations for studies with the highest disagreement. Focus on interpreting the CRED guidance, not defending individual judgements. Aim to develop a common understanding.
  • Iteration (Optional): Have assessors re-evaluate one study after the workshop to measure improvement in consistency.
  • Report: Document the process, initial agreement statistics, lessons learned, and any agreed-upon internal guidance for ambiguous criteria. This report validates the internal consistency of the assessment process.

Table 4: Key Research Reagent Solutions for CRED Implementation

| Tool / Resource | Function & Purpose | Source / Example |
| --- | --- | --- |
| Official CRED Excel Evaluation Tool | The primary implementation instrument. Contains all 33 criteria with dropdown scores, comment fields, and embedded guidance to standardize the evaluation process [9]. | Freely available for download from the project website (e.g., Ecotox Centre) [9]. |
| CRED Reporting Recommendations Template | A checklist of 50 specific reporting criteria across six categories (General, Test Design, Substance, etc.). Used prospectively by researchers to ensure studies contain all information needed for a high-reliability evaluation [12]. | Provided as an Excel file alongside the evaluation tool [12]. |
| OECD / EPA Ecotoxicity Test Guidelines | Reference documents for standardized testing protocols. Essential for assessing whether a study's design aligns with accepted scientific principles, a key aspect of reliability [17]. | OECD Guidelines (e.g., 201, 210, 211); US EPA Ecological Effects Test Guidelines. |
| Reference Regulatory Guidance Documents | Provide context for relevance evaluations. Documents like the EU's Technical Guidance Document for deriving Environmental Quality Standards define protection goals and acceptable endpoints [9]. | EU TGD-EQS, REACH Guidance R.4, EMA ERA Guideline [9]. |
| Statistical Analysis Software | Used to analyze inter-assessor agreement during ring-testing and validation of the CRED implementation (e.g., Fleiss' Kappa). | R, SPSS, or dedicated online calculators. |

Pathway to Regulatory Harmonization: Integration and Adoption

The successful harmonization of assessments across frameworks requires more than a superior scientific tool; it necessitates a clear pathway for integration into regulatory practice and cross-disciplinary alignment with broader quality paradigms.

Diagram: Pathway to regulatory harmonization, proceeding through four stages toward harmonized, predictable, and science-based assessments:

  • 1. Formal Regulatory Adoption: inclusion in EU TGD and REACH guidance; endorsement by the JRC (e.g., Literature Evaluation Tool); pilot use for national EQS derivation.
  • 2. Capacity Building & Training.
  • 3. Integration into Assessment Platforms.
  • 4. Lifecycle Management & Continuous Improvement: regular ring-testing and calibration; method updates based on new science; linkage to ICH Q9/Q10 principles.

Integration with Broader Quality Frameworks: The CRED lifecycle approach aligns with modern regulatory science principles championed by ICH. The Analytical Target Profile (ATP) concept from ICH Q14—defining desired performance criteria at the outset—parallels CRED's emphasis on defining assessment goals before evaluation begins [31]. The lifecycle management of analytical procedures in ICH Q2(R2)/Q14, which includes post-approval change management based on risk, mirrors the need for ongoing evaluation of ecotoxicity data sets as new studies emerge [31] [32]. Furthermore, incorporating quality risk management (ICH Q9) into the CRED process—formally assessing the risk that a study's limitations pose to the overall assessment conclusion—would be a logical and powerful enhancement, creating a direct bridge to pharmaceutical CMC and analytical quality systems [31] [30].

The CRED evaluation method represents a significant evolution from the Klimisch paradigm, offering a structured, transparent, and scientifically rigorous framework for evaluating ecotoxicity data. Its detailed criteria for both reliability and relevance directly address the major sources of inconsistency and expert bias that have hampered harmonized environmental risk assessment [12] [17]. As demonstrated through large-scale ring testing, CRED improves inter-assessor consistency and critical appraisal, reducing the automatic privileging of guideline studies and facilitating the appropriate use of peer-reviewed science [19] [17].

For the method to fully realize its potential for harmonizing assessments across regulatory frameworks, active steps toward formal adoption, training, and technological integration are required [9]. By aligning its implementation with overarching quality and lifecycle concepts from ICH, CRED can transcend ecotoxicology to serve as a model for transparent, evidence-based evaluation across regulatory science. This promises not only more consistent and protective environmental decisions but also greater efficiency and predictability for industry and regulators alike, ultimately contributing to the sustainable development and authorization of chemicals and pharmaceuticals.

Conclusion

The comparative analysis underscores the CRED evaluation method as a scientifically robust and practical successor to the Klimisch method, offering greater detail, transparency, and consistency in assessing ecotoxicity studies. Its structured criteria for both reliability and relevance reduce subjective bias, facilitate the inclusion of peer-reviewed literature, and promote harmonized decision-making across regulatory systems. For biomedical and clinical research, particularly in drug development, adopting CRED can lead to more reliable environmental safety profiles. Future directions should focus on the broader implementation of CRED within global guidelines, its adaptation to emerging toxicological areas (e.g., nanomaterials, endocrine disruptors), and continuous refinement based on stakeholder feedback to keep pace with scientific advancement.

References