The CRED Method: A Modern Framework for Evaluating Ecotoxicity Study Reliability in Regulatory Science

Noah Brooks · Jan 09, 2026


Abstract

This article provides researchers, scientists, and drug development professionals with a comprehensive guide to the Criteria for Reporting and Evaluating Ecotoxicity Data (CRED) method. Developed as a transparent and consistent replacement for the outdated Klimisch method, CRED enhances the quality of environmental hazard and risk assessments. The article explores the method's foundations, detailing its 20 reliability and 13 relevance criteria with practical application guidance. It addresses common challenges in study evaluation, examines validation data from international ring tests demonstrating its superior consistency, and discusses its integration into modern workflows, including emerging AI-assisted tools. By synthesizing these aspects, the article underscores CRED's critical role in harmonizing regulatory decisions and incorporating high-quality academic research into chemical safety assessments.

Beyond Klimisch: Why the CRED Method Was Developed for Modern Ecotoxicology

The Critical Need for Reliable Ecotoxicity Data in Chemical Regulation

The derivation of Predicted-No-Effect Concentrations (PNECs) and Environmental Quality Standards (EQSs) forms the cornerstone of chemical regulation, aiming to protect ecosystems from harmful substances. The integrity of this process is entirely dependent on the reliability and relevance of the underlying ecotoxicity studies [1]. Historically, the evaluation of study quality has been subject to expert judgment, leading to potential bias and inconsistency, where different assessors can reach divergent conclusions from the same data [1]. This inconsistency undermines regulatory transparency and confidence.

This article is framed within the context of a broader thesis on the Criteria for Reporting and Evaluating Ecotoxicity Data (CRED) method, a systematic framework designed to overcome these challenges. The CRED method posits that transparent, criteria-driven evaluation is critical for scientific and regulatory integrity [2]. It provides a standardized tool with explicit criteria to assess reliability (the intrinsic scientific quality) and relevance (the appropriateness for a specific assessment purpose) [1]. As global regulations evolve towards greater stringency and the adoption of New Approach Methodologies (NAMs) [3] [4], the principles championed by CRED—consistency, transparency, and robust data utility—are more critical than ever for ensuring that chemical management decisions are founded on trustworthy science.

Core Application Notes: The CRED Evaluation Framework

The CRED evaluation method was developed to replace the less specific Klimisch method, offering a more detailed, transparent, and consistent system for assessing aquatic ecotoxicity studies [1]. It is structured around two pillars: a set of 20 criteria for reliability and 13 criteria for relevance, each accompanied by extensive guidance to minimize ambiguity [1].

  • Reliability focuses on the inherent quality of the study design, performance, and reporting (e.g., test organism health, exposure control, statistical analysis).
  • Relevance assesses the applicability of the study's endpoints and conditions to a specific regulatory question (e.g., appropriateness of the species, exposure duration, and measured effect for the hazard being evaluated) [1].

A study is evaluated against these criteria, and the degree to which they are fulfilled determines its final classification. A ring-test evaluation demonstrated that the CRED method leads to more consistent and transparent assessments compared to older methods [1].

Table 1: CRED Evaluation Outcome Categories and Criteria Fulfillment

Evaluation Category | Description | Mean % of Criteria Fulfilled | Standard Deviation
Reliable without restrictions | High-quality study with no significant shortcomings. | 93% | 12% [5]
Reliable with restrictions | Study is usable but has deficiencies that limit its interpretive power. | 72% | 12% [5]
Not reliable | Study has critical flaws precluding its use in formal risk assessment. | 60% | 15% [5]
Relevant without restrictions | Study design and endpoints are directly applicable to the assessment. | 84% | 8% [5]
Relevant with restrictions | Study is applicable but with caveats (e.g., a surrogate species is used). | 73% | 14% [5]

Furthermore, CRED includes 50 reporting recommendations across six categories (general information, test design, test substance, test organism, exposure conditions, and statistical design) to guide researchers in producing studies that meet regulatory data needs [1].

Experimental & Analytical Protocols

Protocol 1: Systematic Reliability Assessment of an Ecotoxicity Study Using CRED

Objective: To perform a standardized, transparent evaluation of the reliability of an aquatic ecotoxicity study for use in regulatory decision-making.

Materials: The study to be evaluated; CRED evaluation checklist (20 reliability criteria); guidance documents [1].

Procedure:

  • Initial Documentation: Record the study identifier, evaluator name, and evaluation date.
  • Criterion-by-Criterion Assessment: For each of the 20 reliability criteria, examine the study manuscript for the required information.
    • Example Criteria: "Test organism: The test organism is clearly identified (species, life stage, source)." "Exposure concentrations: The measured concentrations are reported and are close to the nominal concentrations, or the reason for a difference is explained."
  • Scoring: For each criterion, assign a fulfillment status: "Yes," "No," or "Not Assessable/Not Reported."
  • Overall Classification: Based on the pattern of fulfilled criteria, assign the study to one of four reliability categories (see Table 1). A study fulfilling nearly all criteria (mean ~93%) is "Reliable without restrictions," while one with significant gaps (mean ~60%) is "Not reliable" [5].
  • Transparent Reporting: Document the rationale for each scoring decision and the final classification in an evaluation report. This ensures the assessment is reproducible and auditable.
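The scoring and documentation steps above can be sketched in code. This is an illustrative sketch only, not part of the official CRED tooling; the `CriterionScore` record and `percent_fulfilled` helper are hypothetical names.

```python
from dataclasses import dataclass

# Three-state status per criterion, mirroring the protocol:
# "Yes", "No", or "Not Assessable/Not Reported".
YES, NO, NOT_REPORTED = "yes", "no", "not_reported"

@dataclass
class CriterionScore:               # hypothetical record type
    criterion: str                  # e.g. "Test organism clearly identified"
    status: str                     # YES, NO, or NOT_REPORTED
    rationale: str                  # documented justification (audit trail)

def percent_fulfilled(scores):
    """Percentage of the reliability criteria scored 'yes'."""
    if not scores:
        return 0.0
    return 100.0 * sum(s.status == YES for s in scores) / len(scores)
```

Keeping the rationale field alongside every status is what makes the evaluation auditable: a study fulfilling 18 of 20 criteria scores 90%, near the ring-test mean reported for "Reliable without restrictions".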

Protocol 2: Applying Mass Balance Models for In Vitro Bioavailability Correction

Objective: To predict the freely dissolved (bioavailable) concentration of a test chemical in an in vitro assay from the nominal concentration, improving extrapolation to in vivo conditions for QIVIVE [6].

Materials: Chemical property data (Log KOW, pKa, solubility); assay system parameters (media volume, serum lipid/protein content, cell count, plastic well surface area); computational software (R, Python); mass balance model (e.g., Armitage et al. model) [6].

Procedure:

  • Data Compilation: Gather all necessary input parameters for the selected model.
    • Chemical Properties: Obtain accurate values for Log KOW, pKa, molecular weight, and solubility [6].
    • System Parameters: Define media composition (lipid, protein, water fractions), cell volume and lipid content, and labware binding surface area [6].
  • Model Selection: Choose an appropriate mass balance model. For broad applicability, the Armitage et al. model is recommended as it handles neutral/ionizable chemicals and accounts for partitioning to media, cells, labware, and headspace [6].
  • Prediction Execution: Run the model using the compiled inputs to calculate the distribution of the chemical. The key output is the freely dissolved fraction (fu, media) in the exposure medium.
  • Concentration Correction: Multiply the reported nominal concentration by the predicted fu, media to estimate the bioavailable concentration driving the biological effect.
  • Application to QIVIVE: Use the corrected in vitro bioavailable concentration, rather than the nominal concentration, in subsequent physiologically based kinetic (PBK) modeling for reverse dosimetry to predict an equivalent in vivo dose [6].
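The concentration-correction step can be illustrated with a deliberately simplified mass balance. This is not the Armitage et al. model (which additionally handles ionization, cells, labware, and headspace); it assumes a neutral chemical, a lipid-water partition coefficient approximated by Kow, and a crude protein-scaling factor, all of which are placeholder assumptions.

```python
def free_fraction(log_kow, v_water_l, v_lipid_l, v_protein_l,
                  protein_scaling=0.05):
    """Freely dissolved fraction (fu,media) in a well-mixed assay medium.

    Simplifying assumptions (placeholders, not the Armitage model):
    - neutral chemical (no ionization correction)
    - lipid-water partition coefficient approximated by Kow
    - protein-water partitioning as a fixed fraction of Kow
    - no cell, labware, or headspace compartments
    """
    kow = 10.0 ** log_kow
    sorbed = kow * v_lipid_l + protein_scaling * kow * v_protein_l
    return v_water_l / (v_water_l + sorbed)

# Correct a nominal concentration to a bioavailable one
# (the concentration-correction step above), with illustrative inputs:
fu_media = free_fraction(log_kow=4.0, v_water_l=2e-4,
                         v_lipid_l=2e-9, v_protein_l=6e-8)
c_bioavailable = 10.0 * fu_media   # e.g. 10 uM nominal
```

With these illustrative inputs, roughly a fifth of the chemical partitions to medium lipid and protein, so the concentration driving the biological effect is about 8 uM rather than the nominal 10 uM.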

Visualization of Workflows and Relationships

[Figure] Ecotoxicity study → extract and report data (CRED 50-item checklist) → CRED evaluation → [20 reliability criteria → reliability category: reliable with/without restrictions or not reliable] and [13 relevance criteria → relevance category: relevant with/without restrictions or not relevant] → regulatory use decision

CRED Study Evaluation and Regulatory Use Workflow

[Figure] Chemical inputs (Log KOW, pKa, solubility, MW) and system inputs (media lipid/protein, cell count/volume, labware surface area) → select mass balance model (e.g., Armitage et al.) → run model to predict chemical distribution → key output: freely dissolved fraction (fu,media) → correct nominal concentration: C_bioavailable = C_nominal × fu,media → use C_bioavailable in PBK model for QIVIVE

Model-Based Bioavailability Correction for QIVIVE

Table 2: Key Reagents, Databases, and Tools for Ecotoxicity Research and Data Evaluation

Tool/Resource | Function in Ecotoxicity Research | Key Features / Notes
CRED Evaluation Excel Tool [2] | Provides the standardized checklist for evaluating study reliability and relevance. | Contains the 20 reliability and 13 relevance criteria with guidance. Freely available for download.
Standard Reference Toxicants | Used to validate the health and sensitivity of test organism cultures and the performance of the test system. | Examples include sodium chloride for daphnia or potassium dichromate for fish. Must be of high purity.
Solvent Controls (e.g., Acetone, DMSO) | Used as a vehicle control when testing poorly water-soluble substances. | Must be tested to ensure no toxic effect at the maximum concentration used. Purity should be >99%.
Defined Culture Media & Food | For maintaining and testing standard organisms (e.g., algae, daphnia). | Ensures reproducible organism health and eliminates confounding toxicity from media impurities.
ToxValDB (Toxicity Values Database) [7] [8] | Curated database of in vivo toxicity results and derived values for data gap filling and model benchmarking. | Contains over 240,000 records for ~42,000 chemicals in a standardized format [7].
ECOTOX Knowledgebase [8] | Comprehensive repository of single-chemical ecotoxicity test results for aquatic and terrestrial species. | Essential for literature review, weight-of-evidence assessment, and data collection for PNEC derivation.
Mass Balance Model Software (e.g., R/HTTK Package) [6] [8] | Predicts free concentrations in in vitro assays to improve in vitro to in vivo extrapolation (QIVIVE). | The Armitage et al. model is recommended for broad applicability [6].
CompTox Chemicals Dashboard [8] | Integrates chemical properties, hazard, exposure, and risk data from multiple EPA resources. | Provides access to ToxCast, ToxRefDB, and predictive models, serving as a central chemical data hub.

The regulatory evaluation of ecotoxicity studies is a cornerstone of environmental hazard and risk assessment for chemicals, informing decisions on marketing authorizations and environmental quality standards (EQS) [9] [2]. For decades, this evaluation has predominantly relied on the method established by Klimisch et al. in 1997, which categorizes study reliability into four levels: "reliable without restrictions," "reliable with restrictions," "not reliable," and "not assignable" [9] [10]. While pioneering, this method suffers from critical shortcomings, including a lack of detailed guidance and insufficient criteria, leading to inconsistent evaluations dependent on expert judgement [9] [1] [11]. These inconsistencies can directly impact risk assessment outcomes, potentially leading to underestimated environmental risks or unnecessary mitigation measures [9].

In response, the Criteria for Reporting and Evaluating Ecotoxicity Data (CRED) method was developed to provide a transparent, consistent, and science-based framework [9] [1]. This document details the inherent shortcomings of the Klimisch method and presents the CRED evaluation method as a robust alternative, providing detailed application notes and protocols for researchers and assessors within the broader thesis of advancing research on ecotoxicity study reliability.

Core Shortcomings of the Klimisch Method

The Klimisch method's primary flaws stem from its high-level, non-specific approach. It provides only generalized prompts for evaluation rather than a structured set of criteria, making the process highly subjective [11] [10]. This lack of granular guidance is the root cause of inconsistent categorizations of the same study by different risk assessors [9]. Furthermore, the method conflates reporting quality with methodological reliability, often leading to an automatic preference for studies conducted under Good Laboratory Practice (GLP) or according to standardized OECD guidelines, even when such studies may contain methodological flaws [9] [1]. The method also offers no formal criteria or categories for evaluating the relevance of a study to a specific regulatory question, a critical aspect distinct from reliability [9] [1].

Quantitative Comparison: Klimisch vs. CRED Evaluation Outcomes

The CRED method addresses Klimisch's shortcomings with a detailed, criteria-based framework. It employs 20 criteria for reliability and 13 for relevance, each accompanied by extensive guidance to standardize judgement [1] [2]. A major ring-test involving 75 risk assessors from 12 countries compared the two methods, revealing significant differences in consistency and perception [9].

Table 1: Participant Perception of Klimisch vs. CRED Evaluation Methods [9]

Evaluation Aspect | Klimisch Method | CRED Method
Dependency on Expert Judgement | High | Low
Perceived Accuracy | Lower | Higher
Perceived Consistency | Lower | Higher
Practicality (Use of Criteria) | Less Practical | More Practical
Transparency of Process | Lower | Higher

The ring test also allowed for a quantitative analysis of how fulfilled criteria correlate with final reliability categories under the CRED system, demonstrating a clear, measurable gradient.

Table 2: CRED Criteria Fulfillment by Final Reliability Category [5]

CRED Reliability Category | Mean % of Criteria Fulfilled | Standard Deviation | Sample Size (n)
Reliable without restrictions | 93% | 12% | 3
Reliable with restrictions | 72% | 12% | 24
Not reliable | 60% | 15% | 58
Not assignable | 51% | 15% | 19
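The overlap between adjacent categories can be made concrete with a small helper. The function name and the one-standard-deviation window are illustrative choices; the means and SDs are those reported in the table above.

```python
# Ring-test statistics from the table: category -> (mean %, SD %)
RING_TEST = {
    "Reliable without restrictions": (93, 12),
    "Reliable with restrictions":    (72, 12),
    "Not reliable":                  (60, 15),
    "Not assignable":                (51, 15),
}

def compatible_categories(pct_fulfilled, table=RING_TEST):
    """Categories whose ring-test mean lies within one SD of an observed
    fulfillment level: fulfillment correlates with, but does not by
    itself determine, the final reliability category."""
    return [cat for cat, (mean, sd) in table.items()
            if abs(pct_fulfilled - mean) <= sd]
```

A study fulfilling 90% of criteria is compatible only with "Reliable without restrictions", while one at 65% overlaps three categories, which is why the overall judgement still depends on the criterion-level pattern rather than the raw percentage.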

Application Notes & Protocols for the CRED Evaluation Method

Core Protocol: Conducting a CRED Evaluation

The CRED evaluation is a structured, two-phase process assessing reliability and relevance separately. The following protocol is designed for the evaluation of a single aquatic ecotoxicity study for use in deriving an Environmental Quality Standard (EQS).

Phase 1: Reliability Evaluation

  • Preparation: Obtain the full text of the study. Use the standardized CRED Excel tool [2].
  • Criteria Assessment: Systematically review the study against the 20 reliability criteria. For each criterion, determine if it is "Fulfilled," "Not Fulfilled," or "Not Applicable." Document the rationale for each decision using the guidance notes.
  • Overall Judgement: Based on the pattern of fulfilled/unfulfilled criteria, assign an overall reliability category (R1-R4). A study with all key criteria fulfilled is R1. Specific deficiencies (e.g., lack of solvent control, statistical limitations) typically lead to R2. Critical flaws (e.g., unacceptable control mortality, incorrect test species) result in R3. Insufficient reporting precluding evaluation leads to R4.

Phase 2: Relevance Evaluation

  • Context Definition: Define the purpose of the assessment (e.g., EQS derivation for a specific water body).
  • Criteria Assessment: Evaluate the study against the 13 relevance criteria, considering the defined context. Determine if the test organism, endpoint, exposure duration, and effect concentrations are appropriate ("Fulfilled"/"Not Fulfilled").
  • Overall Judgement: Assign a relevance category: C1 (Relevant without restrictions), C2 (Relevant with restrictions), or C3 (Not relevant). A chronic study on a locally relevant species for a chronic EQS would be C1. An acute study used for a chronic assessment might be C2, providing supporting information.

Key Conceptual Relationship: Reliability and relevance are independent. A study on the correct organism (relevant) may be poorly conducted (unreliable). Conversely, a well-conducted study (reliable) on an irrelevant organism or endpoint is not suitable for the specific assessment [1].
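This independence, together with the workflow's gating rule, can be captured in a few lines. The record and function names are hypothetical, and the rule (only R1/R2 plus C1/C2 studies feed the quantitative assessment) is an illustrative reading of the workflow.

```python
from dataclasses import dataclass

@dataclass
class CredResult:        # hypothetical record type
    reliability: str     # "R1".."R4" (Phase 1 outcome)
    relevance: str       # "C1".."C3" (Phase 2 outcome)

def usable_for_assessment(result):
    """Gating rule: only studies that are both reliable (R1/R2) and
    relevant (C1/C2) feed the quantitative assessment; anything else
    is at most supporting information."""
    return (result.reliability in ("R1", "R2")
            and result.relevance in ("C1", "C2"))
```

A reliable-but-irrelevant study (R1, C3) fails the gate just as a relevant-but-unreliable one (R3, C1) does, reflecting that neither dimension can compensate for the other.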

[Figure] Obtain study → Phase 1: reliability evaluation (20 criteria) → reliability category assigned (R1, R2, R3, R4) → if R1/R2, proceed to Phase 2: relevance evaluation (13 criteria) → relevance category assigned (C1, C2, C3) → decision on use in risk assessment (R3/R4 studies are limited to a supporting role)

CRED Evaluation Workflow

Protocol for a Comparative Method Ring Test

To empirically demonstrate differences in consistency between methods, a standardized ring test can be conducted [9].

  • Participant Selection: Recruit multiple risk assessors with varying experience from different organizations.
  • Study Selection: Choose 4-8 ecotoxicity studies with varying quality (e.g., a GLP guideline study, a well-reported academic study, a poorly reported study).
  • Blinded Evaluation: Randomly assign each participant 2 studies to evaluate using the Klimisch method and 2 different studies using the CRED method. Provide identical background context.
  • Data Collection: Collect final reliability/relevance categories and the time taken for each evaluation.
  • Analysis: Calculate the percentage agreement among assessors for each study under each method. Compare time requirements and gather feedback on method usability via questionnaire.
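The agreement calculation in the analysis step can be sketched as follows; `percent_agreement` is an illustrative helper computing the share of assessors who chose the modal category for a given study.

```python
from collections import Counter

def percent_agreement(categories):
    """Share of assessors assigning the modal (most common) category
    to a study: a simple per-study consistency metric."""
    if not categories:
        return 0.0
    modal_count = Counter(categories).most_common(1)[0][1]
    return 100.0 * modal_count / len(categories)
```

Comparing this metric per study between the Klimisch and CRED arms of the ring test quantifies the consistency difference the test is designed to reveal.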

The Scientist's Toolkit: Essential Reagents & Materials

Table 3: Research Reagent Solutions for Ecotoxicity Testing & Evaluation

Item / Solution | Function in Ecotoxicity Testing / Evaluation
Standardized Test Media (e.g., M4, M7 for Daphnia, FET for fish) | Provides consistent, defined water chemistry for aquatic tests, ensuring reproducibility and comparability of results across studies. Essential for reliability.
Reference Toxicants (e.g., K₂Cr₂O₇, NaCl) | Used in periodic laboratory performance checks to confirm healthy test organisms and consistent response. A key reliability criterion.
Test Substance Analysis Standards | High-purity chemical standards used to confirm the concentration and stability of the test substance in stock and test solutions via analytical verification. Critical for reliability.
Solvent Controls (e.g., acetone, DMSO, methanol) | Required when a solvent is needed to dissolve a hydrophobic test substance. Must be appropriate, non-toxic at used concentration, and included as an additional control group. A key CRED reliability criterion.
Formulated Animal Feed | Specific, consistent diets for cultured test organisms (e.g., algae, Daphnia, fish). Ensures organism health and reduces variability in test response, supporting reliability.
Data Evaluation Tool (CRED Excel File) [2] | The standardized checklist tool that guides the assessor through the 20 reliability and 13 relevance criteria, ensuring a structured, transparent, and consistent evaluation process.
Guideline Documents (OECD, EPA, ISO) | Provide the standardized methodological protocols against which study design and reporting are compared during the reliability evaluation.

[Figure] Study design (OECD guidelines) → study conduct (GLP principles) → comprehensive reporting (CRED 50 reporting criteria) → structured evaluation (CRED 20 + 13 criteria) → reliable and relevant ecotoxicity study

Pathway to Regulatory-Ready Data

The Criteria for Reporting and Evaluating Ecotoxicity Data (CRED) initiative was developed to address critical inconsistencies in environmental risk assessment. Its primary objective is to improve the reproducibility, transparency, and consistency of reliability and relevance evaluations for aquatic ecotoxicity studies across different regulatory frameworks, countries, institutes, and individual assessors [1].

The development of CRED was driven by the widespread use of the Klimisch method for study evaluation, which was found to be unspecific, lacking in essential criteria, and subject to considerable interpretational differences, potentially introducing bias into regulatory decisions [1]. In contrast, CRED provides a structured, detailed, and guided framework aimed at minimizing subjective expert judgment. A key complementary aim is to enhance the usability of peer-reviewed ecotoxicity studies for regulatory purposes by providing clear reporting recommendations for researchers, thereby ensuring that published studies contain all information necessary for a high-quality evaluation [1].

Core CRED Evaluation Framework: Criteria and Application

The CRED evaluation method is built on a fundamental distinction between two key concepts:

  • Reliability: This assesses the inherent scientific quality of a test report, focusing on the clarity and plausibility of the findings based on the methodology, experimental procedure, and description of results [1].
  • Relevance: This evaluates the appropriateness of the data for a specific hazard identification or risk characterization, which depends on the assessment's purpose [1].

A study can be reliable but not relevant for a particular regulatory question, and vice versa. The CRED framework provides separate, detailed criteria for evaluating each aspect.

Quantitative Comparison with the Klimisch Method

The following table summarizes a ring test comparison that highlighted CRED's advantages over the previously dominant Klimisch method [1].

Table 1: Comparison of the Klimisch and CRED Evaluation Methods Based on a Ring Test Analysis [1]

Evaluation Aspect | Klimisch Method | CRED Method | Implication of CRED's Approach
Core Structure | Four general categories (e.g., "reliable without restriction") | 20 reliability & 13 relevance criteria with extensive guidance | Reduces ambiguity and subjective interpretation.
Transparency | Low; limited guidance leads to opaque decisions. | High; explicit criteria and documented reasoning are required. | Improves auditability and understanding of regulatory decisions.
Consistency | Low; high variability between different assessors. | High; structured criteria lead to greater agreement. | Promotes harmonization across assessors and jurisdictions.
Bias Potential | Criticized for potential bias toward industry & GLP studies [1]. | Designed to be neutral; evaluates intrinsic study quality. | Allows for a fairer evaluation of all relevant scientific literature.
User Perception | Considered less accurate and applicable by ring test participants. | Rated as more accurate, applicable, and transparent by participants. | Indicates higher acceptance and usability among professionals.

Detailed CRED Evaluation Criteria

The CRED method's robustness stems from its comprehensive set of criteria. The tables below categorize the core questions assessors must address.

Table 2: Categories and Examples of CRED Reliability Criteria (20 total criteria) [1]

Reliability Category | Example Criterion | Purpose of Evaluation
Test Design | Are the test concentrations appropriate and justified? | Ensures the experimental design can produce meaningful dose-response data.
Exposure Control | Was the exposure concentration measured and verified? | Confirms the test organisms were exposed to the reported concentrations.
Test Organism | Is the test organism species/life stage clearly identified? | Ensures biological relevance and reproducibility of the test.
Control Performance | Were control results within accepted limits? | Validates the health of test organisms and the test system's stability.
Statistical Analysis | Are the statistical methods appropriate and clearly described? | Verifies the correctness and robustness of the data analysis.

Table 3: Categories and Examples of CRED Relevance Criteria (13 total criteria) [1]

Relevance Category | Example Criterion | Purpose of Evaluation
Test Substance | Is the tested substance relevant for the assessment (e.g., correct form, purity)? | Ensures the test material corresponds to the substance of regulatory concern.
Exposure Regime | Is the exposure duration relevant for the assessment endpoint? | Judges if the test (e.g., acute or chronic) matches the required protection goal.
Test Endpoint | Is the observed effect relevant for the protection goal? | Determines if the measured parameter (e.g., mortality, reproduction) is suitable.
Test Organism | Is the test species/group relevant for the ecosystem being protected? | Assesses the ecological representativeness of the chosen model organism.

Application Notes & Experimental Protocols

Protocol for Performing a CRED Evaluation

This protocol provides a stepwise methodology for evaluating an aquatic ecotoxicity study using the CRED framework [1].

Title: Systematic Reliability and Relevance Evaluation of an Aquatic Ecotoxicity Study Using the CRED Method.

Purpose: To transparently determine the scientific reliability and regulatory relevance of a given ecotoxicity study for a defined assessment purpose (e.g., deriving a Predicted-No-Effect Concentration for a specific substance).

Materials:

  • The ecotoxicity study to be evaluated (peer-reviewed paper or report).
  • CRED evaluation sheet (Excel tool recommended) [12].
  • CRED guidance document detailing the 20 reliability and 13 relevance criteria [1].

Procedure:

  • Study Preparation: Clearly define the regulatory assessment context (e.g., substance, protected ecosystem, endpoint of concern). Read the study in full.
  • Reliability Assessment: For each of the 20 reliability criteria, answer "Yes," "No," or "Unclear" based solely on the information reported in the study. Provide a brief justification for each answer in the comment field. Do not infer missing information.
  • Reliability Integration: Synthesize the individual reliability ratings into an overall reliability classification (e.g., High, Medium, Low). The guidance document assists in weighing major versus minor deficiencies.
  • Relevance Assessment: For each of the 13 relevance criteria, answer "Yes," "No," or "Partially" based on the study details and the predefined assessment purpose. Justify each answer.
  • Relevance Integration: Synthesize the relevance ratings to conclude whether the study is "Relevant," "Not Relevant," or "Partially Relevant" for the specific assessment.
  • Final Documentation: The completed CRED sheet, with all answers and justifications, serves as the transparent audit trail for the evaluation.
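The two integration steps can be sketched as a simple synthesis rule. The split into "critical" and "supporting" criteria below is hypothetical: the actual weighting of major versus minor deficiencies is defined in the CRED guidance document.

```python
def overall_reliability(answers, critical):
    """Synthesize per-criterion answers into an overall class.

    answers: {criterion: "yes" | "no" | "unclear"}
    critical: set of criteria treated as decisive (hypothetical split;
    the real weighting comes from the CRED guidance document).
    """
    if any(answers.get(c) == "no" for c in critical):
        return "Low"
    if any(answers.get(c) == "unclear" for c in critical):
        return "Not assessable"
    minor_gaps = sum(1 for c, a in answers.items()
                     if a != "yes" and c not in critical)
    return "High" if minor_gaps == 0 else "Medium"
```

The point of the sketch is the asymmetry: one failed critical criterion is decisive, whereas failed supporting criteria only downgrade the classification.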

Protocol for Applying CRED Reporting Recommendations

To improve the quality of future studies, CRED also provides 50 reporting criteria across six categories [1]. Researchers can use this as a checklist when designing studies and preparing manuscripts.

Title: Conducting and Reporting an Aquatic Ecotoxicity Study Aligned with CRED Principles.

Purpose: To ensure an ecotoxicity study is performed and documented to a standard that facilitates a high-reliability evaluation and regulatory use.

Procedure (Reporting Phase): Use the CRED reporting checklist to verify all essential information is included in the manuscript or supplementary data [1].

  • General Information: Declare funding, conflicts of interest, and compliance with ethical standards.
  • Test Substance: Report source, purity, chemical identification (CAS), and characterization (e.g., for nanomaterials or mixtures).
  • Test Organism: Specify species, life stage, source, husbandry conditions, and acclimation procedures.
  • Test Design: Detail the experimental design, number of replicates, concentrations, controls, and exposure system.
  • Exposure Conditions: Document the medium, temperature, pH, light, renewal regime, and measured concentration data (nominal vs. measured).
  • Statistical & Biological Response: Describe statistical methods, raw data availability, dose-response analysis, and endpoint observations.
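The six-category checklist above can double as a machine-checkable completeness test before submission. The per-category items below are illustrative placeholders, not the full set of 50 CRED reporting criteria.

```python
# Six CRED reporting categories; the items listed per category are
# illustrative placeholders, not the full 50 reporting criteria.
REPORTING_CHECKLIST = {
    "general information":  ["funding declared", "conflicts of interest"],
    "test substance":       ["CAS number", "purity", "source"],
    "test organism":        ["species", "life stage", "source"],
    "test design":          ["replicates", "concentrations", "controls"],
    "exposure conditions":  ["temperature", "pH", "measured concentrations"],
    "statistical design":   ["statistical methods", "raw data availability"],
}

def missing_items(reported, checklist=REPORTING_CHECKLIST):
    """Checklist items absent from `reported`, grouped by category,
    for a pre-submission completeness check."""
    gaps = {}
    for category, items in checklist.items():
        absent = [i for i in items if i not in reported]
        if absent:
            gaps[category] = absent
    return gaps
```

Running such a check during manuscript preparation surfaces reporting gaps while they can still be fixed, rather than during a later regulatory evaluation.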

Visualizing the CRED Workflow and Its Regulatory Context

[Diagram] Define regulatory assessment purpose → identify ecotoxicity studies → CRED reliability assessment (20 criteria) → overall reliability classification → if sufficiently reliable, CRED relevance assessment (13 criteria) → overall relevance conclusion → if relevant, include study in regulatory dataset → proceed to data analysis and PNEC/EQS derivation. Unreliable or not-relevant studies are excluded from the regulatory dataset.

Diagram 1: CRED study evaluation workflow for regulatory use

[Diagram] Problem: subjective and inconsistent study evaluation under the legacy Klimisch method (Klimisch et al.) → replaced by the CRED solution (structured, transparent criteria) → outcome: harmonized and defensible risk assessments. CRED has enabled regulatory uptake (EU EQS guidance, Swiss EQS proposals, JRC literature tool [2]) and specialized applications (NanoCRED, EthoCRED, CRED for sediment/soil).

Diagram 2: The development and evolution of the CRED initiative

Table 4: Key Research Reagent Solutions & Materials for CRED Implementation [1] [12] [2]

Tool / Resource | Function / Purpose | Source / Availability
CRED Excel Evaluation Tool | A structured spreadsheet with the 33 criteria, guidance pop-ups, and fields for comments. Facilitates standardized, documented evaluations. | Freely available for download from the SciRAP or ECOTOX Centre websites [12] [2].
CRED Reporting Checklist | A list of 50 specific criteria in six categories to guide researchers in preparing comprehensive study reports. | Published within the primary CRED methodology paper [1].
NanoCRED Framework | An adaptation of CRED with modified criteria for evaluating ecotoxicity studies of engineered nanomaterials, addressing nano-specific issues (e.g., particle characterization). | Detailed in a dedicated publication (Hartmann et al., 2017) [12].
EthoCRED Framework | An extension of CRED to guide the evaluation of behavioural ecotoxicity studies, ensuring reliability and relevance of more complex endpoints. | Detailed in a dedicated publication (Bertram et al., 2024) [12].
CRED for Sediment and Soil | Adapted criteria for evaluating studies on terrestrial and sediment organisms, expanding the framework beyond aquatic toxicity. | Described in Casado-Martinez et al., 2024 [12].

Within regulatory environmental risk assessment, the derivation of Predicted-No-Effect Concentrations (PNECs) and Environmental Quality Standards (EQSs) hinges on the quality of underlying ecotoxicity studies [13]. Two cornerstone concepts in evaluating this quality are reliability and relevance. Their precise definition and systematic application are critical for ensuring that regulatory decisions are based on sound, defensible, and pertinent science.

Reliability (also referred to as credibility or internal validity) assesses the methodological soundness of a study. It answers the question: "How trustworthy are the study's data and reported results based on its design, conduct, and reporting?" A reliable study minimizes bias and error, allowing for confidence in its findings.

Relevance (or external validity) assesses the usefulness and applicability of a study's data for a specific regulatory purpose. It answers the question: "Are the test species, endpoints, exposure conditions, and effect concentrations appropriate for the protective goal at hand?" A highly reliable study may have low relevance if, for example, it tests an insensitive species unrelated to the ecosystem being protected.

The CRED (Criteria for Reporting and Evaluating Ecotoxicity Data) method was developed to provide a transparent, consistent, and structured framework for evaluating these two dimensions [13]. It moves beyond older, more subjective evaluation schemes (e.g., the Klimisch method) by offering detailed criteria and explicit guidance, thereby reducing bias and increasing consistency among different assessors [13] [12]. This document provides application notes and protocols for implementing the CRED evaluation within a research or regulatory context.

Quantitative Framework of the CRED Method

The CRED method operationalizes the evaluation of reliability and relevance through a set of explicit criteria. The original framework for aquatic ecotoxicity studies defines 20 criteria for reliability and 13 for relevance [13]. Subsequent developments have extended this framework to nanoecotoxicity (NanoCRED), behavioral studies (EthoCRED), and sediment/soil studies [12].

Table 1: Core CRED Evaluation Criteria for Aquatic Ecotoxicity Studies [13]

Dimension Category Number of Criteria Example Criteria (Paraphrased)
Reliability Test Substance Characterization 4 Purity, concentration verification, stability, measurement of exposure concentrations.
Test Organism & Design 6 Species identification, health, age, randomization, blinding, sample size justification.
Exposure System & Conditions 5 Control of physico-chemical parameters, system stability, renewal of test media.
Data Reporting & Analysis 5 Clear presentation of raw data, statistical methods, dose-response, control performance.
Relevance Test Species & Endpoint 5 Appropriateness of taxonomic group, life-stage, and endpoint for protection goal.
Exposure Pattern 4 Match of exposure duration, route, and regime to real-world scenarios.
Ecological Context 4 Consideration of sensitive species, population/community-level implications.

The outcome of a CRED evaluation is not a single numeric score but a structured profile. This profile details which specific criteria are fulfilled, partly fulfilled, or not fulfilled, providing a transparent audit trail for the assessment. Comparative analysis has demonstrated the utility of this approach.
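As a rough illustration, such a structured profile can be held in a small data structure. This is a hypothetical sketch only; the criterion IDs, judgment labels, and field names below are invented for the example and are not those of the official CRED Excel tool.

```python
from dataclasses import dataclass, field

@dataclass
class CredProfile:
    """Illustrative container for a CRED-style evaluation profile."""
    study_id: str
    reliability: dict = field(default_factory=dict)    # e.g. {"R1": "fulfilled", ...}
    relevance: dict = field(default_factory=dict)      # e.g. {"Rel1": "high", ...}
    justifications: dict = field(default_factory=dict) # criterion -> cited evidence

    def summary(self) -> dict:
        """Tally judgments per dimension for a transparent audit overview."""
        tally = {}
        for name, judgments in (("reliability", self.reliability),
                                ("relevance", self.relevance)):
            counts = {}
            for j in judgments.values():
                counts[j] = counts.get(j, 0) + 1
            tally[name] = counts
        return tally

# Example usage with three reliability and two relevance judgments:
profile = CredProfile("smith-2021-daphnia")
profile.reliability = {"R1": "fulfilled", "R2": "partly", "R3": "fulfilled"}
profile.relevance = {"Rel1": "high", "Rel2": "medium"}
print(profile.summary())
# {'reliability': {'fulfilled': 2, 'partly': 1}, 'relevance': {'high': 1, 'medium': 1}}
```

The point of the structure is that no information is collapsed into a single score: each criterion's judgment, and its justification, remains individually inspectable.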

Table 2: Comparison of Method Evaluation from a Ring-Test Study [12]

Evaluation Aspect Klimisch Method CRED Method Outcome / Preference
Transparency Low - Provides limited guidance and rationale. High - Offers explicit criteria and detailed guidance for each. CRED strongly preferred for transparency [12].
Consistency Moderate to Low - Relies heavily on expert judgment. High - Structured criteria reduce subjective variance. CRED found to improve consistency among assessors [12].
Accuracy Not directly assessed. Perceived as more accurate due to comprehensiveness. Assessors perceived CRED as more accurate [12].
Ease of Use Initially easier due to simplicity. Requires more initial effort to learn the criteria. CRED's added complexity is justified by its benefits [12].

Application Notes and Experimental Protocols

Protocol for Conducting a CRED Evaluation

The following step-by-step protocol is adapted from the CRED methodology for evaluating a single aquatic ecotoxicity study [13] [12].

Phase 1: Preparation and Familiarization

  • Define the Regulatory Context: Clearly articulate the protection goal (e.g., protecting freshwater fish populations) and the intended use of the data (e.g., deriving a PNEC).
  • Acquire the Full Study Manuscript: Secure the complete text, including any supplementary materials. An abstract alone is insufficient for evaluation.
  • Select the Appropriate CRED Tool: Choose the evaluation sheet matching your study type (e.g., standard aquatic, NanoCRED for nanomaterials, EthoCRED for behavioral studies) [12].

Phase 2: Systematic Assessment

  • Reliability Assessment (Criteria R1-R20):
    • Work through each reliability criterion sequentially.
    • For each criterion, locate the relevant information in the study manuscript.
    • Judge the criterion as "Fulfilled" (Y), "Not Fulfilled" (N), "Partially Fulfilled" (P), or "Not Reported/Unclear" (NR). Base judgments strictly on what is reported.
    • Document the justification for each judgment by citing page numbers, figure/table references, or quoting text. This creates an essential audit trail.
  • Relevance Assessment (Criteria Rel1-Rel13):
    • Switch to the relevance criteria. This assessment is inherently more contextual and tied to the protection goal defined in Step 1.
    • Judge each relevance criterion as "High", "Medium", "Low", or "Not Applicable."
    • Justify each judgment by linking the study's design (species, exposure, endpoint) to the regulatory context. For example, a study on a benthic invertebrate would have "High" relevance for deriving a sediment EQS but potentially "Medium" relevance for a general water column PNEC.

Phase 3: Integration and Conclusion

  • Synthesize Reliability Findings: Summarize the reliability profile. A study with many "Not Fulfilled" criteria in critical areas (e.g., exposure concentration verification, control performance) has low reliability, and its results should be weighted lightly or excluded.
  • Synthesize Relevance Findings: Summarize the relevance profile. A study may be highly reliable but of low relevance to the specific assessment.
  • Form an Integrated Conclusion: Weigh both dimensions to determine the study's overall utility for the assessment. The final conclusion might categorize the study as:
    • "Core Study": High reliability and high relevance. Suitable for quantitative analysis (e.g., included in species sensitivity distribution).
    • "Supporting Study": High reliability but moderate/low relevance, or medium reliability and high relevance. Used for qualitative support or uncertainty analysis.
    • "Excluded Study": Low reliability. Not used for decision-making regardless of perceived relevance.
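The three-way categorization above can be sketched as a simple decision rule. The thresholds are one plausible reading of the protocol text, since CRED itself leaves the final weighing to expert judgment; the input labels are illustrative summaries of each profile.

```python
def classify_study(reliability: str, relevance: str) -> str:
    """Map the two CRED dimensions to an overall usability category.

    Inputs are 'high' / 'medium' / 'low' summaries of the reliability
    and relevance profiles. The mapping is a hedged sketch, not an
    official CRED scoring rule.
    """
    if reliability == "low":
        return "excluded"    # low reliability excludes regardless of relevance
    if reliability == "high" and relevance == "high":
        return "core"        # suitable for quantitative use, e.g. an SSD input
    return "supporting"      # qualitative support or uncertainty analysis
```

Note that relevance never rescues an unreliable study: the `reliability == "low"` branch is checked first, mirroring the protocol's rule that excluded studies are not used for decision-making regardless of perceived relevance.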

Protocol for AI-Assisted Extraction of Experimental Procedures

The CRED method emphasizes the need for detailed reporting. Recent advances in artificial intelligence (AI) can aid in the structuring and analysis of experimental data from literature. The following protocol, inspired by computational literature mining approaches, outlines how AI can be used to extract and formalize experimental procedures from ecotoxicity study manuscripts [14].

Objective: To automatically parse a published ecotoxicity study's "Materials and Methods" section into a structured, machine-actionable sequence of experimental actions.

Materials:

  • Source Documents: Digital manuscripts of ecotoxicity studies in PDF or HTML format.
  • Software & Models: Natural Language Processing (NLP) toolkit (e.g., spaCy, specialized chemical NLP models). A pre-trained sequence-to-sequence model (e.g., based on Transformer or BART architectures) trained on chemical procedures is ideal [14].
  • Action Ontology: A defined vocabulary of experimental actions relevant to ecotoxicology (e.g., PREPARE_STOCK_SOLUTION, ACCLIMATE_ORGANISMS, MEASURE_PH, RECORD_MORTALITY).

Procedure:

  • Text Extraction and Pre-processing:
    • Convert the PDF/HTML manuscript to plain text.
    • Isolate the "Materials and Methods" (or equivalent) section using section headers.
    • Clean the text (remove line breaks, standardize units).
  • Named Entity Recognition (NER):
    • Apply an NLP model to identify and tag key entities within the text:
      • Chemicals: Test substance, solvents, formulants.
      • Organisms: Species names, life stages.
      • Apparatus: Test chambers, water quality probes.
      • Parameters: Concentrations, temperatures, durations, pH levels.
  • Action Sequence Prediction:
    • Input the processed text or a formalized summary (e.g., a list of identified entities) into a pre-trained action prediction model [14].
    • The model generates a sequence of step-by-step actions. For example:
      • ACTION: PREPARE_TEST_SOLUTION; INPUT: $TEST_SUBSTANCE$; PARAMETER: CONCENTRATION=100 mg/L; SOLVENT: $DILUTION_WATER$
      • ACTION: ACCLIMATE_ORGANISMS; PARAMETER: DURATION=7 days
      • ACTION: INITIATE_EXPOSURE; INPUT: $TEST_SOLUTION$; INPUT: $TEST_ORGANISMS$
  • Validation and Curation:
    • A domain expert (ecotoxicologist) reviews the AI-generated action sequence against the original manuscript for accuracy and completeness.
    • Errors are corrected, and the curated sequence is added to a growing database of structured protocols. This feedback loop improves future model performance.

Application to CRED: The resulting structured protocol can be algorithmically checked for completeness against CRED's reporting recommendations (e.g., "Was exposure concentration measured?" corresponds to a MEASURE_CONCENTRATION action). This can provide a preliminary, automated check on study reporting quality.
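A minimal sketch of such an automated completeness check, assuming the hypothetical action vocabulary defined in the materials list above; the mapping of reporting questions to actions is illustrative:

```python
# Illustrative mapping of CRED-style reporting questions to extracted actions.
REPORTING_CHECKS = {
    "Was exposure concentration measured?": "MEASURE_CONCENTRATION",
    "Were organisms acclimatized?": "ACCLIMATE_ORGANISMS",
    "Was pH monitored?": "MEASURE_PH",
    "Was mortality recorded?": "RECORD_MORTALITY",
}

def check_reporting(actions: list) -> dict:
    """Flag which reporting questions are answered by an extracted sequence."""
    # Take the action token before the first ';' and strip the 'ACTION:' prefix.
    performed = {a.split(";")[0].replace("ACTION:", "").strip() for a in actions}
    return {q: act in performed for q, act in REPORTING_CHECKS.items()}

# Example: a sequence covering acclimation and concentration measurement only.
sequence = [
    "ACTION: ACCLIMATE_ORGANISMS; PARAMETER: DURATION=7 days",
    "ACTION: MEASURE_CONCENTRATION; INPUT: $TEST_SOLUTION$",
]
result = check_reporting(sequence)
```

Unanswered questions (here, pH monitoring and mortality recording) would then be flagged for manual follow-up by the curating ecotoxicologist rather than treated as definitive reporting failures.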

Visualizing the Evaluation Workflow and Conceptual Framework

[Workflow diagram: Start (acquire study) → Phase 1: Preparation (define context and select tool) → Phase 2: Reliability check (assess 20 methodological criteria → reliability profile: fulfilled/not fulfilled) → Phase 2: Relevance check (assess 13 applicability criteria, based on reliable data → relevance profile: high/medium/low) → Phase 3: Integration (weigh reliability and relevance; studies of low reliability pass directly to this phase) → Conclusion: core, supporting, or excluded study]

CRED Evaluation Protocol Workflow

[Conceptual diagram: a primary ecotoxicity study enters the CRED evaluation, which is given context by the regulatory question (e.g., deriving a PNEC/EQS). The evaluation assesses reliability (internal validity: "Is it trustworthy?") and relevance (external validity: "Is it useful?"); both dimensions filter and weight the data entering the weighted data pool that supports an informed regulatory decision]

Reliability and Relevance in Regulatory Decision-Making

Table 3: Key Resources for CRED Evaluation and Ecotoxicity Study Design

Resource / Tool Function / Purpose Key Features & Notes
CRED Assessment Sheets (Excel) [12] The primary tool for conducting evaluations. Provides the structured checklist of 20 reliability and 13 relevance criteria with guided fields for scoring and justification. Available for aquatic studies. Requires enabling macros for full functionality [12].
NanoCRED Tool [12] Specialized adaptation of CRED for evaluating ecotoxicity studies of engineered nanomaterials (ENMs). Incorporates criteria specific to ENM characterization (e.g., size, coating, agglomeration state in media). Addresses the unique reliability challenges posed by nanomaterial testing.
EthoCRED Framework [12] A framework to guide the reporting and evaluation of behavioral ecotoxicity studies. Provides criteria to assess the reliability and relevance of behavioral endpoints. Helps integrate sensitive behavioral data into regulatory assessments systematically.
CRED Reporting Recommendations [13] A checklist of 50 specific reporting items across 6 categories (general, test design, substance, organism, exposure, stats). Using this as a guide when designing studies or writing manuscripts proactively ensures future evaluations will yield high reliability scores.
OECD Test Guidelines Internationally agreed test protocols (e.g., OECD 201, 210, 211). Studies conducted in full compliance with a relevant OECD TG typically fulfill many core CRED reliability criteria.
AI for Protocol Extraction [14] Computational models (e.g., transformer-based NLP) that can parse textual methods sections into structured action sequences. Emerging tool to automate the extraction and formalization of experimental details, aiding in rapid screening and data curation.

A Step-by-Step Guide to Applying the CRED Evaluation Criteria

The Criteria for Reporting and Evaluating Ecotoxicity Data (CRED) framework represents a pivotal advancement in the standardized assessment of ecotoxicological studies for regulatory and research purposes. Developed to address the inconsistencies and subjective biases inherent in earlier evaluation methods like the Klimisch approach, CRED provides a transparent, detailed, and systematic tool for judging the reliability and relevance of aquatic ecotoxicity data [1] [2]. Within the broader thesis on methodological rigor in ecotoxicity, the CRED framework is posited as the foundational pillar that enables reproducible and consistent hazard and risk assessments. Its primary aim is to improve the usability of peer-reviewed literature in regulatory processes, such as deriving Predicted-No-Effect Concentrations (PNECs) and Environmental Quality Standards (EQSs), by ensuring that evaluations are based on the best available and most trustworthy science [1] [12].

Core Components: Reliability and Relevance Criteria

The CRED evaluation method is built on two distinct but complementary pillars: reliability (the inherent scientific quality of a study) and relevance (the appropriateness of the study for a specific assessment purpose) [1]. A study can be highly reliable but irrelevant for a particular regulatory question, and vice versa. This clear separation is a cornerstone of the framework.

The 20 Reliability Criteria

Reliability assesses the intrinsic soundness of a study's design, conduct, and reporting. The 20 criteria are designed to minimize the risk of bias and ensure the clarity and plausibility of the findings [1].

Table 1: The CRED Framework's 20 Reliability Criteria

Criterion Group Specific Criterion Key Evaluation Question
Test Substance 1. Identity and Purity Is the test substance clearly identified, and is its purity/impurity profile documented?
2. Stability Was the stability of the test substance in the medium verified?
3. Exposure Verification Was the actual exposure concentration measured and reported?
Test Organism 4. Species Identity Is the test species clearly identified (preferably to species level)?
5. Life-stage & Source Are the life-stage, source (e.g., culture), and health status of organisms reported?
6. Acclimatization Were organisms properly acclimatized to test conditions?
Test Design 7. Control Groups Were appropriate control groups (e.g., negative, solvent) included and performed acceptably?
8. Replicates Was the number of replicates and organisms per replicate sufficient and reported?
9. Randomization Was the allocation of test organisms to treatments randomized?
10. Blinding Was the scoring/evaluation of endpoints performed blindly?
11. Test Concentrations Were test concentrations justified and appropriately spaced (e.g., geometric series)?
Exposure Conditions 12. Test Duration Was the test duration appropriate for the endpoint and species?
13. Test Medium Is the composition of the test medium (water, sediment) fully described?
14. Temperature & Light Were key physical parameters (temperature, photoperiod) controlled and reported?
15. Feeding & Renewal Were feeding (if any) and medium renewal regimes specified and appropriate?
Endpoint & Analysis 16. Endpoint Definition Is the measured endpoint (e.g., immobility, growth) clearly defined?
17. Statistical Methods Were appropriate statistical methods used and clearly reported?
18. Dose-Response Is a dose-response relationship demonstrated or discussed?
19. Raw Data Is access to raw data provided or are summary data sufficiently detailed?
Reporting & Plausibility 20. Results Consistency Are the results internally consistent and plausibly linked to the methodology?

The 13 Relevance Criteria

Relevance determines how fit-for-purpose a study is for a specific regulatory assessment. It depends on the assessment goals (e.g., protecting freshwater vs. marine ecosystems, acute vs. chronic risk) [1].

Table 2: The CRED Framework's 13 Relevance Criteria

Criterion Category Specific Criterion Regulatory Assessment Consideration
Test Species 1. Taxonomic Group Is the species from a relevant trophic level (e.g., algae, invertebrate, fish)?
2. Protection Goals Does the species represent a regionally or functionally relevant protection goal?
Exposure 3. Exposure Pathway Is the tested exposure route (e.g., water, sediment, diet) relevant to the scenario?
4. Temporal Pattern Does the exposure duration (acute, chronic, pulsed) match the expected environmental exposure?
5. Media Characteristics Are the test medium properties (pH, hardness, organic carbon) relevant to the assessment area?
Endpoint 6. Effect Type Is the measured endpoint (lethal, sublethal like growth/reproduction, behavioral) relevant to the protection goal?
7. Ecological Significance Is the endpoint linked to individual fitness or population-level consequences?
Test Substance 8. Substance Form Is the tested form (e.g., active ingredient, formulated product, environmental metabolite) relevant?
9. Fate Considerations Does the test consider relevant environmental transformation processes?
Assessment Context 10. Regulatory Framework Does the study meet the specific data requirements of the applicable regulation (e.g., REACH, WFD)?
11. Assessment Factor Derivation Is the study suitable for deriving assessment factors (e.g., a chronic NOEC for a PNEC)?
12. Mode of Action Does the test endpoint align with the known or suspected mode of action of the substance?
13. Overall Weight of Evidence How does the study contribute to the overall body of evidence for the hazard assessment?

Application Protocol: Implementing the CRED Evaluation Method

The following protocol outlines the step-by-step application of the CRED framework for the systematic evaluation of an aquatic ecotoxicity study.

Protocol: CRED Study Evaluation Workflow

Objective: To consistently and transparently determine the reliability and relevance of an aquatic ecotoxicity study for use in chemical hazard and risk assessment.

Materials: Study manuscript/report, CRED evaluation sheet (Excel tool recommended [2] [12]), access to supplemental data if available.

Procedure:

  • Study Acquisition and Preliminary Review:

    • Obtain the full text of the ecotoxicity study to be evaluated.
    • Perform an initial read to understand the study's objectives, design, and key findings.
  • Systematic Reliability Assessment (Apply 20 Criteria):

    • For each of the 20 reliability criteria in Table 1, extract the corresponding information from the study's methods and results sections.
    • For each criterion, assign a judgment:
      • "Yes": The criterion is fully met. Information is clearly reported and methodologically sound.
      • "No": The criterion is not met. Information is missing or the methodology is flawed.
      • "Partly": The criterion is partially met. Information or methodology is partially addressed but incomplete or unclear.
      • "Not Reported (NR)": The information needed to judge the criterion is absent from the report.
    • Document the rationale for each judgment, citing specific lines, tables, or figures from the study.
  • Overall Reliability Classification:

    • Based on the pattern of judgments, assign an overall reliability classification to the study. While CRED encourages a nuanced view, classifications often follow:
      • Reliable without restriction: All key criteria are met ("Yes"). Minor limitations may be present but do not affect the study's core conclusions.
      • Reliable with restrictions: Some criteria are only partly met or specific minor flaws exist, but the study is still scientifically valid for use.
      • Not reliable: Critical flaws (e.g., no controls, unacceptable control performance, unverified exposure concentrations, inappropriate statistics) invalidate the study's findings [1].
  • Context-Specific Relevance Assessment (Apply 13 Criteria):

    • Define the specific purpose of your assessment (e.g., "Derivation of a chronic freshwater PNEC for a pharmaceutical").
    • For each of the 13 relevance criteria in Table 2, evaluate the study against your defined assessment goal.
    • Judge each criterion as "Relevant", "Not Relevant", or "Potentially Relevant".
    • Document the reasoning, linking study details to the protection goals and exposure scenarios of your assessment.
  • Integration and Final Decision:

    • Synthesize the reliability and relevance evaluations.
    • Make a final decision on the study's usability:
      • A study must be at least "Reliable with restrictions" to be considered for primary use in quantitative risk assessment.
      • A "Not reliable" study is typically excluded, though it may be noted in a weight-of-evidence discussion.
      • The relevance judgment determines how a reliable study is used (e.g., as a key study, a supporting study, or for a different assessment endpoint).
  • Documentation and Reporting:

    • Complete the CRED evaluation sheet in full, ensuring all judgments and rationales are recorded. This creates a transparent audit trail [1].
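One possible encoding of the overall reliability classification from step 3, treating the flaws listed under "Not reliable" (controls, exposure verification, statistics) as critical criteria; the choice of critical criterion numbers is illustrative, not normative:

```python
# Criteria treated as critical in this sketch: exposure verification (3),
# control groups (7), and statistical methods (17) from Table 1.
CRITICAL = {3, 7, 17}

def reliability_class(judgments: dict) -> str:
    """Classify a study from its 20 criterion judgments.

    `judgments` maps criterion number (1-20) to 'yes', 'no', 'partly',
    or 'nr' (not reported). A hedged sketch of the decision logic, not
    the official CRED rule.
    """
    if any(judgments.get(c) in ("no", "nr") for c in CRITICAL):
        return "not reliable"
    if all(j == "yes" for j in judgments.values()):
        return "reliable without restriction"
    return "reliable with restrictions"
```

In practice the evaluator, not an algorithm, decides which criteria are critical for a given study type; the sketch merely shows how the documented judgments support a reproducible classification.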

Experimental Basis: The CRED Ring Test

The CRED framework was empirically validated through an international ring test [1] [2].

  • Design: Risk assessors from industry, academia, and government evaluated multiple ecotoxicity studies, first using the traditional Klimisch method and then using the draft CRED method.
  • Outcome: Participants found the CRED method more accurate, applicable, consistent, and transparent than the Klimisch method. The detailed criteria reduced subjectivity and variability between assessors [1].
  • Result: The ring test provided the evidence base for fine-tuning the criteria and established CRED as a demonstrably superior tool for harmonized study evaluation.

Specialized Extensions of the CRED Framework

The core CRED principles have been adapted to address specific challenges in emerging areas of ecotoxicology.

EthoCRED for Behavioral Ecotoxicity Studies

Behavioral endpoints are highly sensitive but poorly covered by standard test guidelines. EthoCRED extends the CRED framework with tailored criteria for behavioral studies [15] [16].

  • Key Adaptations:
    • Reliability: Adds criteria for behavioral assay validation (e.g., equipment calibration, environmental control in arenas), automated tracking verification, and ethical considerations for behavioral testing.
    • Relevance: Emphasizes the ecological significance of behavioral endpoints (e.g., linking altered foraging to individual fitness, linking predator avoidance to population dynamics) and the temporal resolution of measurements appropriate for behavioral responses [15].

NanoCRED for Nanomaterial Ecotoxicity

Evaluating studies on engineered nanomaterials (ENMs) requires attention to their unique properties. NanoCRED modifies CRED to address nano-specific challenges [12].

  • Key Adaptations:
    • Reliability: Stricter criteria for material characterization (size, shape, coating, aggregation state in the test medium) and exposure verification using analytical techniques suitable for ENMs (e.g., ICP-MS, electron microscopy).
    • Relevance: Focuses on the environmental realism of the tested nanoform and the appropriateness of the test media for assessing fate and bioavailability of ENMs [12].

Evolution and Integration: The EcoSR Framework

The Ecotoxicological Study Reliability (EcoSR) framework, proposed in 2025, represents an evolution that integrates CRED's strengths with a more formal Risk of Bias (RoB) assessment approach common in human health research [17].

Table 3: Comparison of the CRED and EcoSR Frameworks

Feature CRED Framework EcoSR Framework
Primary Focus Evaluation of reliability and relevance for regulatory data acceptance. In-depth assessment of internal validity (risk of bias) for toxicity value development.
Structure Two sets of criteria (20 reliability, 13 relevance). Two-tiered: Tier 1 (screening) and Tier 2 (full RoB assessment across bias domains).
Core Methodology Criteria-based scoring with expert judgment. Domain-based RoB judgment (e.g., Low/Medium/High/Unclear risk of bias in selection, exposure, outcome measurement).
Output Classification (e.g., reliable without/with restrictions) and relevance judgment. A detailed bias profile identifying the most critical weaknesses affecting study validity.
Relationship Serves as the foundational criterion set. EcoSR incorporates and builds upon CRED's reliability concepts for deeper validity appraisal [17].

Reporting Guidelines: The CRED Reporting Recommendations

To proactively improve study quality, CRED provides 50 reporting recommendations across six categories [1]. Adherence to these by authors minimizes evaluation ambiguity.

Table 4: CRED Reporting Recommendation Categories

Category Number of Criteria Purpose
General Information 7 Ensure traceability and context (e.g., authors, funding, regulatory purpose).
Test Design 9 Fully document experimental setup (e.g., type of test, controls, replicates, randomization).
Test Substance 7 Provide complete chemical identification, preparation, and analytical verification details.
Test Organism 7 Specify organism biology, source, husbandry, and acclimation conditions.
Exposure Conditions 11 Detail all physical, chemical, and temporal aspects of the exposure regime.
Statistical & Biological Response 9 Clearly present data, statistical methods, results, and dose-response relationships.

The Scientist's Toolkit: Essential Reagents and Materials for CRED-Evaluated Studies

The following materials are critical for conducting ecotoxicity studies that can meet high reliability standards under CRED evaluation.

Table 5: Research Reagent Solutions for Standard Aquatic Ecotoxicity Tests

Item Function in Ecotoxicity Testing CRED Evaluation Consideration
Reference Toxicants (e.g., KCl, Sodium dodecyl sulfate) Used in periodic control tests to confirm the consistent sensitivity and health of test organism cultures. Supports Criterion 7 (Control Groups) by demonstrating laboratory proficiency and organism health.
Solvent Controls (e.g., Acetone, Methanol, DMSO) Vehicles for poorly water-soluble test substances. Must be non-toxic at the concentration used. Critical for Criterion 7 and Criterion 11 (Test Concentrations); their use and effect must be reported.
Reconstituted Standardized Test Media (e.g., OECD, EPA reconstituted freshwater) Provides a consistent, defined water chemistry matrix for tests, improving inter-laboratory reproducibility. Directly addresses Criterion 13 (Test Medium); composition must be specified.
Analytical Grade Test Substance The chemical of known identity and high purity used to prepare stock and test solutions. Fundamental to Criterion 1 (Identity and Purity). Impurities must be characterized.
Internal & External Analytical Standards (for chemical analysis) Used in chromatography (e.g., HPLC, GC) and spectroscopy to quantify the test substance concentration in the exposure medium. Essential for Criterion 3 (Exposure Verification). The method and frequency of analysis must be reported.
Live Algal or Invertebrate Food Cultures (e.g., Pseudokirchneriella subcapitata, Artemia nauplii) Provides nutrition for chronic tests with fish and invertebrates, and is the test organism for algal growth inhibition tests. Relevant to Criterion 5 (Life-stage & Source) of the food organism and Criterion 15 (Feeding).
Certified Water Quality Kits/Probes For monitoring and reporting key water quality parameters (pH, dissolved oxygen, conductivity, temperature, hardness). Required for Criterion 14 (Temperature & Light) and part of documenting exposure conditions.

Visual Synthesis: Frameworks and Workflows

[Framework diagram: the objective of transparent and consistent ecotoxicity study evaluation is served by the core CRED framework (20 reliability and 13 relevance criteria), which branches into specialized extensions (e.g., EthoCRED, NanoCRED), the reporting recommendations (50 criteria in 6 categories), and the integrated evolution (EcoSR framework); all paths converge on the outcome of harmonized, high-quality data for regulatory decision-making]

CRED Framework Ecosystem and Evolution

[Decision workflow diagram: Start (acquire full study) → Phase 1: Reliability evaluation (apply 20 criteria) → "Is the study scientifically reliable?" If no, exclude from quantitative use; if yes (or reliable with restrictions) → Phase 2: Relevance evaluation (apply 13 criteria) → "Is the study relevant to the specific assessment?" If yes, use as key study; if no, use as supporting or alternative study. All paths end by completing the CRED evaluation sheet for documentation]

CRED Evaluation Decision Workflow

This document provides application notes and detailed protocols for implementing the Criteria for Reporting and Evaluating Ecotoxicity Data (CRED) method. The content is framed within a broader thesis focused on advancing the evaluation of ecotoxicity study reliability. The CRED method was developed to address significant shortcomings in the widely used Klimisch method, which has been criticized for its lack of detailed guidance, inconsistency among assessors, and insufficient consideration of study relevance [18]. Within the thesis context, CRED represents a pivotal evolution towards a more transparent, consistent, and scientifically robust framework for evaluating data used in environmental hazard and risk assessments for chemicals, including pharmaceuticals [18].

The core thesis posits that adopting a structured, criteria-based workflow—from initial study screening to detailed mechanistic evaluation—enhances the reliability of ecotoxicological risk assessments. This workflow ensures that all available data, including peer-reviewed literature, are consistently and transparently evaluated, thereby supporting more harmonized regulatory decisions [18] [12].

Core Workflow: A Three-Phase CRED Protocol

The practical application of the CRED methodology follows a sequential, three-phase workflow. This structured approach ensures a systematic evaluation of both the reliability (inherent quality of the study) and relevance (appropriateness for the specific assessment) of ecotoxicity data [18].

  • Phase 1: Study Screening and Triage
  • Phase 2: Detailed Reliability & Relevance Evaluation
  • Phase 3: Data Integration and Uncertainty Characterization

Phase 1 Protocol: Study Screening and Triage

Objective: To efficiently identify and acquire ecotoxicity studies that meet minimum thresholds of acceptability for further detailed evaluation.

Procedure:

  • Source Identification: Compile studies from regulatory dossiers, scientific databases (e.g., ECOTOX), and peer-reviewed literature [19].
  • Initial Application of Acceptance Criteria: Screen each study against a predefined set of minimum criteria. Studies that fail are categorized as "rejected" and excluded from the next phase, though the reason for rejection must be documented.
  • Categorization: Based on the screen, categorize studies as:
    • Accepted: For full evaluation (Phase 2).
    • Rejected: Does not meet critical criteria.
    • Other: Requires expert judgment (e.g., preliminary data, unclear reporting) [19].

Key Screening Criteria (Adapted from US EPA Guidelines) [19]:

  • The study investigates effects of a single chemical.
  • Effects are reported for whole, live aquatic or terrestrial organisms.
  • A concurrent control group is used.
  • An explicit exposure duration and a measured or nominal concentration/dose are reported.
  • A quantitative biological endpoint (e.g., LC50, NOEC, EC50) is presented.
  • The study is a primary source (not a review or secondary summary) and is publicly available.
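To illustrate, the screening-and-triage logic above can be sketched in Python. The `Study` fields and the "other" heuristic are hypothetical stand-ins for the minimum criteria listed, not part of any official CRED tool:

```python
from dataclasses import dataclass

@dataclass
class Study:
    single_chemical: bool        # single, identifiable test substance
    whole_live_organism: bool    # whole, live aquatic/terrestrial organism
    concurrent_control: bool     # concurrent control group used
    exposure_reported: bool      # duration and concentration/dose reported
    quantitative_endpoint: bool  # e.g. LC50, NOEC, EC50 present
    primary_source: bool         # full, publicly available primary article
    notes: str = ""              # free text, e.g. "preliminary data"

def triage(study: Study) -> tuple[str, list[str]]:
    """Return ('accepted' | 'rejected' | 'other', reasons for rejection)."""
    criteria = {
        "single chemical": study.single_chemical,
        "whole live organism": study.whole_live_organism,
        "concurrent control": study.concurrent_control,
        "exposure reported": study.exposure_reported,
        "quantitative endpoint": study.quantitative_endpoint,
        "primary source": study.primary_source,
    }
    failures = [name for name, ok in criteria.items() if not ok]
    if failures:
        return "rejected", failures   # reasons must be documented (Phase 1 rule)
    if "preliminary" in study.notes.lower():
        return "other", []            # defer to expert judgment
    return "accepted", []
```

A study failing any minimum criterion is rejected with the failed criteria recorded, mirroring the documentation requirement above.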

Table 1: Minimum Study Screening Criteria for Aquatic Ecotoxicity Data [19]

| Criterion Category | Description | Accept/Reject Decision |
| --- | --- | --- |
| Test Substance | Single, identifiable chemical of concern. | Reject if mixture or unknown substance. |
| Test Organism | Live, whole aquatic or terrestrial species. | Reject if cell line, microbial assay, or deceased organisms. |
| Experimental Design | Concurrent control group; reported exposure duration. | Reject if control missing or exposure time unclear. |
| Data Reporting | Quantitative endpoint; concentration/dose reported. | Reject if only qualitative effects or no exposure data. |
| Document Type | Primary source, full article, publicly available. | Reject if abstract-only, review, or unavailable document. |

Phase 2 Protocol: Detailed Reliability and Relevance Evaluation

Objective: To perform a transparent and criterion-based assessment of the methodological reliability and assessment-specific relevance of each accepted study.

Procedure:

  • Reliability Evaluation: Use the CRED evaluation sheet to assess 20 key criteria across four domains [18]:
    • Test Substance Characterization: (e.g., purity, formulation, concentration verification).
    • Test Organism Information: (e.g., species, life stage, source, health status).
    • Study Design and Execution: (e.g., test system, exposure regimen, temperature, controls, compliance with guidelines like OECD).
    • Data Reporting and Analysis: (e.g., raw data, statistical methods, dose-response, clarity of results).
  Each criterion is evaluated (e.g., Yes/No/Partly), with supporting guidance and comments documented.
  • Relevance Evaluation: Simultaneously evaluate the study against 13 relevance criteria tailored to the specific risk assessment question [18]:
    • Biological Relevance: Appropriateness of the tested species, endpoint (e.g., mortality, growth, reproduction), and exposure pathway.
    • Temporal and Spatial Relevance: Match of exposure duration and scenario to the assessment context.
  • Overall Weight-of-Evidence Judgment: Synthesize reliability and relevance evaluations to assign a final confidence rating (e.g., Reliable, Reliable with Restrictions, or Not Reliable) for the study's use in the specific assessment.
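A hedged sketch of the synthesis step: counting criterion answers into a final rating. The threshold used here is an arbitrary illustration; CRED relies on documented per-criterion expert judgment rather than a fixed score:

```python
def synthesize(reliability_answers: dict, relevant: bool) -> str:
    """Combine per-criterion answers ('yes'/'partly'/'no') and a relevance
    judgment into an overall confidence rating (illustrative heuristic)."""
    answers = [a.lower() for a in reliability_answers.values()]
    if not relevant:
        return "Not relevant for this assessment"
    if "no" in answers:
        # assumption: >2 failed criteria invalidate the study; in practice
        # each failed criterion is weighed individually for severity
        return "Not reliable" if answers.count("no") > 2 else "Reliable with restrictions"
    if "partly" in answers:
        return "Reliable with restrictions"
    return "Reliable"
```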

Table 2: Comparison of the Klimisch and CRED Evaluation Methods [18]

| Characteristic | Klimisch Method | CRED Method |
| --- | --- | --- |
| Primary Focus | Reliability only. | Reliability and Relevance. |
| Number of Criteria | 12-14 vague criteria. | 20 reliability + 13 relevance detailed criteria. |
| Guidance Provided | Minimal, leading to high expert judgment dependency. | Detailed guidance for each criterion, improving consistency. |
| Basis for Judgment | Often prioritizes GLP and guideline status. | Transparent, criteria-based scoring of reported methods. |
| Outcome Transparency | Low; categorical score only. | High; documented evaluation for each criterion. |

Phase 3 Protocol: Data Integration and Uncertainty Characterization

Objective: To integrate evaluated studies into a coherent dataset for risk characterization and to transparently communicate the overall uncertainty.

Procedure:

  • Dataset Compilation: Create a matrix of all reliable studies, organized by species, endpoint, and reliability/relevance rating.
  • Mechanistic Modeling (Where Applicable): For refined assessments, use reliable data to calibrate and validate Toxicokinetic-Toxicodynamic (TKTD) models, such as the General Unified Threshold model of Survival (GUTS) [20].
    • Model Calibration: Fit the model to a subset of experimental data (e.g., time-series survival under constant exposure).
    • Model Validation: Test model predictions against an independent dataset (e.g., survival under pulsed exposure).
    • Performance Assessment: Evaluate model fits using a combination of visual assessment and quantitative Goodness-of-Fit (GoF) metrics (e.g., Normalized Root-Mean-Square Error - NRMSE) [20].
  • Uncertainty Characterization: Use graphical and tabular approaches to communicate overall confidence in the derived toxicity values or risk estimates. This can include "traffic light" tables or uncertainty factor decomposition plots that illustrate the strengths and limitations of the underlying database and models [21].

Table 3: Goodness-of-Fit Metrics for TKTD (GUTS) Model Evaluation [20]

| Metric | Acronym | Description | Interpretation & Suggested Threshold |
| --- | --- | --- | --- |
| Normalized Root-Mean-Square Error | NRMSE | Measures the average magnitude of prediction error over time, normalized by the mean observation. | Lower values indicate better fit. NRMSE < 50% generally indicates a satisfactory fit for survival data [20]. |
| Survival Probability Prediction Error | SPPE | Quantifies the accuracy of predicted survival at the end of the experiment across all treatments. | SPPE < 30% is suggested as a conservative acceptance criterion [20]. |
| Posterior Predictive Check | PPC | Assesses whether observations fall within the Bayesian confidence intervals of the model predictions. | A high percentage (e.g., >80%) of data points within the 95% prediction interval is desirable. |
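Based on the descriptions in Table 3, the NRMSE and SPPE checks can be sketched as below; exact formulations in GUTS/EFSA guidance may differ in detail (e.g., SPPE is typically computed per treatment on final survival), and the survival counts are made-up inputs:

```python
import math

def nrmse(observed, predicted):
    """Root-mean-square error normalized by the mean observation, in %."""
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted))
                     / len(observed))
    return 100.0 * rmse / (sum(observed) / len(observed))

def sppe(obs_final, pred_final, n_initial):
    """Survival prediction error at test end, in % of initial organisms."""
    return 100.0 * (obs_final - pred_final) / n_initial

# Illustrative survival time series (observed vs. model-predicted counts)
obs = [20, 18, 15, 12]
pred = [20, 17, 14, 13]
fit_ok = nrmse(obs, pred) < 50 and abs(sppe(obs[-1], pred[-1], obs[0])) < 30
```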

[Flowchart] Start: identify & acquire studies → Phase 1: study screening → "Meets minimum acceptance criteria?" → if No, categorize as 'Rejected' (exclude from assessment); if Yes, Phase 2: detailed evaluation → assess 20 reliability criteria and 13 relevance criteria → synthesize into an overall reliability judgment → Phase 3: integration & modeling → compile weighted data matrix; calibrate/validate TKTD models (e.g., GUTS) → characterize and communicate overall uncertainty → output for risk assessment.

Diagram 1: The Three-Phase CRED Evaluation Workflow

Advanced Applications and Extended Protocols

The CRED framework is adaptable to specialized areas within ecotoxicology, ensuring its utility for modern research challenges.

3.1 Protocol for Nanomaterial Ecotoxicity (NanoCRED): The basic CRED criteria are extended with specific considerations for nanomaterials [12].

  • Test Substance Characterization: Must include particle characterization (size distribution, surface area, charge, aggregation state) in the exposure medium.
  • Dosing and Exposure: Documentation of methods to maintain stable and characterized exposures throughout the test (e.g., sonication, use of dispersants, measured concentrations).
  • Endpoint Relevance: Inclusion of endpoints specific to nano-effects, such as oxidative stress, particle uptake, or behavioral changes.

3.2 Protocol for Behavioral Endpoints (EthoCRED): EthoCRED provides a tailored framework for evaluating studies on behavioral changes, a sensitive but methodologically complex endpoint [12].

  • Apparatus Validation: Evaluation of the testing apparatus for its ability to accurately measure the claimed behavior (e.g., arena size, sensor calibration, video tracking settings).
  • Baseline Behavior: Requirement for documentation of normal behavioral variation in control groups.
  • Blinding and Bias Mitigation: Assessment of whether observers were blinded to treatment assignment during testing and scoring.
  • Statistical Power: Evaluation of whether the sample size was sufficient to detect a biologically meaningful effect size in behavioral metrics.
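As a rough illustration of the sample-size check, a normal-approximation power calculation for a two-sample comparison; the formula and the effect size are assumptions for this sketch, since EthoCRED does not prescribe a specific power analysis:

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def approx_power(n_per_group, effect_size):
    """Approximate two-sample test power (two-sided alpha = 0.05),
    normal approximation with a fixed critical value."""
    z_crit = 1.959963984540054  # z for two-sided alpha = 0.05
    return norm_cdf(effect_size * math.sqrt(n_per_group / 2.0) - z_crit)

# Even a large behavioral effect (Cohen's d = 1.2) with 8 animals per group
# yields power of only about 0.67, below the common 0.8 benchmark.
power = approx_power(8, 1.2)
adequate = power >= 0.8
```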

[Flowchart] Input: calibrated GUTS model & independent validation data → visual assessment (time-series and dose-response plots) plus quantitative goodness-of-fit metrics (NRMSE, SPPE, PPC) → compare metrics against established thresholds → if metrics pass, model performance is accepted and the model is ready for use in extrapolation and risk assessment; if metrics fail, model performance is rejected or needs review.

Diagram 2: Protocol for Evaluating TKTD (GUTS) Model Performance

The Scientist's Toolkit: Essential Research Reagent Solutions

Successful implementation of the CRED workflow requires both conceptual tools and practical resources.

Table 4: Essential Toolkit for CRED-Based Ecotoxicity Evaluation

| Tool/Resource | Function in the Workflow | Source/Example |
| --- | --- | --- |
| CRED Evaluation Sheets | Structured templates for documenting reliability and relevance criteria assessments for each study. | Excel-based tools with macros for visualization [12]. |
| OECD Test Guidelines | Provide the standardized methodological benchmarks against which study reliability is evaluated. | OECD Guidelines 210 (Fish Early-Life), 211 (Daphnia magna), 201 (Algae) [18]. |
| ECOTOX Database | A key source for identifying peer-reviewed ecotoxicity studies for screening (Phase 1). | U.S. EPA ECOTOXicology knowledgebase [19]. |
| QSAR Toolbox | Software for data gap filling via read-across and category formation, useful after data evaluation. | OECD QSAR Toolbox for grouping chemicals and predicting toxicity [22]. |
| TKTD Modeling Software | Tools to calibrate and validate mechanistic models (e.g., GUTS) for refined risk assessment (Phase 3). | morse or GUTS R packages [20]. |
| Uncertainty Characterization Templates | Pre-formatted tables and graphs for transparently communicating data confidence and variability. | Approaches based on IRIS framework (e.g., uncertainty factor plots) [21]. |

The Criteria for Reporting and Evaluating Ecotoxicity Data (CRED) framework was developed to standardize the assessment of aquatic ecotoxicity studies, moving beyond subjective expert judgment to promote reproducibility, transparency, and consistency in regulatory decision-making [1]. A core innovation of CRED is its refined classification system for study reliability, which includes the pivotal "Reliable with Restrictions" category. This classification is essential for a nuanced hazard and risk assessment, as it allows for the inclusion of valuable scientific data that may have minor flaws or deviations from standardized guidelines but remain scientifically sound and informative.

Within CRED, reliability and relevance are distinct but interconnected concepts. Reliability refers to the inherent scientific quality of a study—its design, performance, and analysis—independent of its intended use. Relevance, however, is defined by the appropriateness of the data for a specific hazard identification or risk characterization purpose [1]. A study can be highly reliable but irrelevant for a particular assessment (e.g., a robust soil ecotoxicity study is irrelevant for setting a water quality standard). Conversely, a study deemed "Reliable with Restrictions" may be highly relevant and provide critical evidence, especially when data on a particular substance or endpoint are scarce.

The "Reliable with Restrictions" category signifies that a study is fundamentally valid and contributes useful evidence, but contains specific, defined limitations. These limitations are not severe enough to invalidate the core findings, but they introduce a degree of uncertainty or reduce the confidence with which the results can be applied. The proper interpretation of this category is therefore critical: it prevents the unnecessary dismissal of valuable research while ensuring that any constraints on the data's use are clearly acknowledged and documented.

Quantitative Criteria for the 'Reliable with Restrictions' Classification

The assignment of the "Reliable with Restrictions" category is based on a detailed evaluation against explicit criteria. The foundational CRED method outlines 20 reliability criteria covering aspects from test substance characterization to statistical analysis [1]. The more recent EthoCRED extension, designed for behavioral ecotoxicity studies, expands this to 29 reliability criteria to address the unique methodologies in this sub-discipline [23]. A study typically falls into the "Restricted" category when it fulfills most core scientific principles but has deficiencies in one or several specific criteria.

Table 1: Common Deficiencies Leading to a 'Reliable with Restrictions' Classification

| Evaluation Category | Specific Criteria Deficiency | Impact on Study Interpretation |
| --- | --- | --- |
| Test Substance & Solution | Incomplete characterization of test substance purity or concentration verification; inadequate description of solvent/dosing vehicle. | Introduces uncertainty in the actual exposure concentration, affecting dose-response accuracy. |
| Test Organism | Organism source or life stage not fully specified; pre-exposure health/holding conditions inadequately reported. | Raises questions about genetic variability, health status, and the reproducibility of the test. |
| Exposure System | Lack of measurement of key water quality parameters (e.g., pH, oxygen, temperature) during the test; insufficient renewal of test media. | Uncertainty over whether effects are due to the toxicant or to stressful or variable environmental conditions. |
| Experimental Design | Number of replicates or organisms per replicate lower than optimal but sufficient to detect a clear effect; randomisation procedure not described. | Reduces the statistical power of the study and may raise concerns about systematic bias. |
| Data & Reporting | Raw data not available; statistical methods not fully detailed or suboptimal but conclusions still plausible. | Limits independent re-analysis and verification of the reported effect levels (e.g., EC50). |
| Behavioral Endpoints (EthoCRED) | Inadequate calibration of tracking equipment; insufficient acclimation time for organisms prior to behavioral assay [23]. | May introduce noise or stress artifacts into the behavioral data, potentially obscuring or confounding toxicant-induced effects. |

The transition from "Reliable" to "Reliable with Restrictions" is not merely a tally of flaws. It requires expert judgment within the structured CRED framework to determine if the identified deficiencies materially undermine the study's conclusions. For example, a study might use a slightly sub-optimal number of replicates but demonstrate a very strong, dose-dependent, and statistically significant effect. In such a case, the core finding is robust despite the design limitation.

Experimental Protocols for Key Cited Studies

The practical application of the CRED evaluation is best illustrated through protocols. The following outlines a standardized behavioral assay, a type of study frequently evaluated under the EthoCRED extension, and the subsequent evaluation workflow.

Detailed Protocol: Fish Locomotor Activity Assay (Sub-lethal Endpoint)

This protocol assesses changes in swimming behavior, a sensitive indicator of neurotoxicity or general stress.

Materials:

  • Test Organisms: Juvenile zebrafish (Danio rerio), 30 days post-fertilization.
  • Exposure System: Semi-static system with 2 L glass aquaria. Test concentration prepared from a certified stock solution.
  • Behavioral Arena: A rectangular glass tank (20 cm L × 10 cm W × 15 cm H) with a white backdrop.
  • Recording Equipment: High-definition camera mounted orthogonally above the arena, connected to tracking software (e.g., EthoVision XT).
  • Data Analysis Software: Statistical package (e.g., R, GraphPad Prism) with appropriate non-parametric tests.

Procedure:

  • Acclimation: Individually acclimate fish to the behavioral arena filled with clean, aerated water for 30 minutes prior to recording.
  • Recording: Record swimming activity for a 10-minute period under consistent, diffuse lighting. Camera settings (frame rate, resolution) must be documented and held constant.
  • Exposure Groups: Repeat for fish exposed to a minimum of five concentrations of the test substance, a negative control, and a solvent control (if applicable). Include a minimum of 8 replicate fish per treatment.
  • Tracking Analysis: Use tracking software to extract endpoints: Total distance moved, average velocity, time spent mobile, and thigmotaxis (time near walls).
  • Statistics: Perform a one-way ANOVA or Kruskal-Wallis test on each endpoint, followed by Dunnett's post-hoc test (or Dunn's test for non-parametric data) to compare each treatment to the control. Calculate the No Observed Effect Concentration (NOEC) and Lowest Observed Effect Concentration (LOEC).
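The NOEC/LOEC derivation in the statistics step can be sketched as follows; the per-concentration p-values (each treatment vs. control, e.g., from a post-hoc test) are made-up inputs, and a monotonic dose-response is assumed:

```python
def noec_loec(p_by_conc: dict, alpha: float = 0.05):
    """p_by_conc maps concentration -> p-value vs. control.
    Returns (NOEC, LOEC); either may be None."""
    concs = sorted(p_by_conc)
    # LOEC: lowest concentration with a statistically significant effect
    loec = next((c for c in concs if p_by_conc[c] < alpha), None)
    if loec is None:
        return concs[-1], None            # NOEC = highest tested, no LOEC
    lower = [c for c in concs if c < loec]
    noec = max(lower) if lower else None  # effect already at lowest dose
    return noec, loec

# Hypothetical results: significance first appears at 10 mg/L
noec, loec = noec_loec({0.1: 0.92, 1.0: 0.41, 10.0: 0.03, 100.0: 0.001})
```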

CRED Evaluation Points: An evaluator would check this protocol against criteria such as: Was the concentration verified analytically? Was the camera calibrated? Was the acclimation time sufficient to avoid novelty stress? Omission of such details could lead to a "Restricted" classification.

Protocol for CRED Reliability Evaluation Workflow

This is the meta-protocol for applying the CRED method to an ecotoxicity study.

Materials:

  • Study manuscript or report.
  • CRED evaluation checklist (spreadsheet or form with 20 standard or 29 EthoCRED criteria) [1] [23].
  • Access to the relevant test guidelines (e.g., OECD Test Guidelines).

Procedure:

  • Initial Screening: Read the study fully to understand objectives, design, and conclusions.
  • Criterion-by-Criterion Assessment: For each reliability criterion, answer "Yes," "No," or "Not Applicable." "Yes" indicates full compliance.
  • Document Deficiencies: For every "No," document the exact nature of the deficiency and its location in the text (e.g., "Page 5: water temperature range not reported").
  • Weight Deficiencies: Judge the severity of each deficiency. Does it fundamentally invalidate a core part of the study (e.g., no control group), or is it a minor reporting omission (e.g., supplier name missing for a standard strain)?
  • Assign Category:
    • Reliable: All key criteria are met ("Yes"). Minor issues are absent or negligible.
    • Reliable with Restrictions: One or more deficiencies are present, but the main study findings are considered valid and not critically compromised.
    • Not Reliable: One or more critical deficiencies are present that invalidate the study's results (e.g., uncontrolled confounding factor, fatal statistical error).
  • Final Documentation: Produce an evaluation report that lists the final category, a summary of deficiencies for "Restricted" studies, and a commentary on the study's relevance for the intended assessment purpose.
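A minimal sketch of steps 2-5 of this workflow, assuming illustrative criterion names and a hypothetical set of critical criteria (the actual CRED sheet supplies per-criterion guidance for weighting deficiencies):

```python
# Hypothetical critical criteria whose failure invalidates a study
CRITICAL = {"control group present", "endpoint derivable from data"}

def assign_category(answers: dict) -> tuple[str, list[str]]:
    """answers maps criterion name -> 'yes' | 'no' | 'na'.
    Returns (category, list of documented deficiencies)."""
    deficiencies = [c for c, a in answers.items() if a == "no"]
    if any(c in CRITICAL for c in deficiencies):
        return "Not reliable", deficiencies
    if deficiencies:
        return "Reliable with restrictions", deficiencies
    return "Reliable", []
```

In practice the deficiency list would carry the exact nature and location of each issue (e.g., "Page 5: water temperature range not reported"), as step 3 requires.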

Visualizing the Evaluation Framework and Workflow

[Flowchart] Ecotoxicity study for evaluation → apply CRED/EthoCRED criteria (20-29 items) → criterion-by-criterion assessment (Yes/No/NA) → "Are all critical criteria met?" → if No, classify as 'Not Reliable'; if Yes, "Do minor deficiencies or restrictions exist?" → if No, classify as 'Reliable'; if Yes, classify as 'Reliable with Restrictions' → result informs risk assessment and decision.

CRED Evaluation Decision Pathway

Integrating Restricted Studies in Evidence Synthesis

The Scientist's Toolkit: Research Reagent Solutions

The reliability of an ecotoxicity study hinges on the quality and appropriate use of materials. The following table details essential reagents and materials, their function, and their link to CRED evaluation criteria.

Table 2: Essential Research Reagents and Materials for Ecotoxicity Testing

| Item | Function & Purpose | CRED Evaluation Link |
| --- | --- | --- |
| Certified Reference Material (CRM) | A substance with one or more properties (e.g., purity, concentration) certified by a recognized authority. Used to prepare accurate stock solutions and calibrate analytical equipment. | Directly addresses reliability criteria for test substance characterization and exposure concentration verification. Lack of CRM use can lead to a "Restricted" classification. |
| Solvent Control Substance | A high-purity solvent (e.g., acetone, dimethyl sulfoxide) used to dissolve poorly water-soluble test substances. A solvent control group is essential to distinguish toxicant effects from solvent artifacts. | Critical for evaluating test design and control groups. Absence or inappropriate concentration of a solvent control is a major deficiency. |
| Culture Media & Food (Standardized) | Defined, consistent algal culture media (e.g., OECD TG 201 medium) or standardized food (e.g., Selenastrum capricornutum for daphnids). Ensures test organisms are healthy and not nutritionally stressed. | Impacts criteria related to test organism health and maintenance. Use of non-standard or poorly characterized media/food can restrict reliability. |
| Water Quality Parameter Kits/Probes | Instruments to measure pH, dissolved oxygen (DO), conductivity, temperature, and hardness. Used to monitor and report exposure conditions. | Essential for documenting exposure conditions. Failure to report key parameters like DO or pH is a common reason for a "Restricted" classification [1]. |
| Positive Control Toxicant | A reference substance with a known and consistent toxic effect (e.g., potassium dichromate for Daphnia acute tests). Used to confirm the sensitivity and proper response of the test organisms in a given assay. | Supports evaluation of test validity. Its inclusion demonstrates assay responsiveness and is a marker of a well-performed study. |
| Tracking Software & Calibration Grid | For behavioral studies, validated software (e.g., Noldus EthoVision) and a physical calibration grid are required to accurately quantify movement. The grid ensures spatial measurements are correct. | A core EthoCRED criterion [23]. Lack of calibration or software validation introduces significant uncertainty, warranting a "Restricted" classification for behavioral data. |

The Criteria for Reporting and Evaluating Ecotoxicity Data (CRED) method represents a foundational advancement in the objective, transparent, and consistent assessment of aquatic ecotoxicity studies [1] [13]. This protocol is framed within a broader thesis positing that the reliability and relevance of ecotoxicity data are not merely outcomes of expert judgment but can be systematically engineered through structured evaluation and prospective reporting frameworks. Traditional methods, notably the Klimisch method, have been criticized for lacking detailed guidance, introducing evaluator bias, and favoring guideline studies over potentially relevant scientific literature, thereby compromising the consistency and transparency critical for regulatory decision-making [18]. The CRED project was initiated to address these shortcomings by developing a more detailed, guidance-rich tool for evaluating both the intrinsic reliability and contextual relevance of studies [1].

The core thesis argues that robust environmental risk assessment (ERA) depends on a seamless link between how studies are reported by authors and how they are subsequently evaluated by risk assessors. The CRED method operationalizes this link by providing a dual toolkit: a set of 20 reliability criteria and 13 relevance criteria for evaluators, complemented by 50 reporting recommendations for study authors [1] [13]. Empirical validation from a multinational ring test involving 75 risk assessors demonstrated that the CRED method was perceived as more accurate, consistent, transparent, and less dependent on subjective judgment than the Klimisch method [18]. This establishes CRED not just as an evaluation protocol, but as a comprehensive system that, when adopted by the research community, elevates the overall quality and regulatory utility of ecotoxicological science.

Detailed Protocols: Applying the CRED Evaluation Method

The following protocols provide a step-by-step methodology for applying the CRED evaluation framework, as used in formal validation studies and regulatory pilot programs [18] [2].

Protocol 1: The CRED Reliability and Relevance Evaluation Workflow

Objective: To perform a standardized, transparent evaluation of an aquatic ecotoxicity study's reliability and relevance for a specified regulatory assessment purpose.

Materials:

  • The study to be evaluated (peer-reviewed manuscript or study report).
  • CRED evaluation checklist (20 reliability and 13 relevance criteria with guidance) [1].
  • CRED Excel scoring tool (available from project resources) [2] [12].

Procedure:

  • Define Assessment Purpose: Clearly articulate the regulatory context (e.g., derivation of a Predicted-No-Effect Concentration (PNEC) for a pharmaceutical, setting an Environmental Quality Standard (EQS) under the Water Framework Directive). Relevance is purpose-dependent [1].
  • Conduct Initial Screening: Assess the study's abstract and introduction for obvious relevance to the assessment purpose (e.g., correct environmental compartment, organism group, endpoint).
  • Systematic Reliability Evaluation: Scrutinize the full study against the 20 reliability criteria. These are grouped into domains:
    • Test Design & Reporting: Evaluate the clarity of objectives, experimental design, control groups, and replication.
    • Test Substance: Assess the characterization of the test material (e.g., purity, formulation, for nanomaterials: size, coating, aggregation state) [24].
    • Test Organism: Verify species identification, life stage, source, and health/condition.
    • Exposure Conditions: Scrutinize the test system, exposure regimen, measurement and reporting of actual concentrations, and environmental parameters (pH, temperature, oxygen).
    • Statistical & Biological Response: Evaluate data analysis methods, endpoint calculation, raw data availability, and dose-response relationship plausibility [1].
  • Systematic Relevance Evaluation: Evaluate the study against the 13 relevance criteria in the context of the defined purpose. Key domains include:
    • Biological Relevance: Appropriateness of the test organism, endpoint, and exposure duration to the hazard or risk question.
    • Environmental Relevance: Representativeness of the test conditions to real exposure scenarios.
    • Toxicological Relevance: Concordance of the test endpoint with the suspected mode of action of the substance [1].
  • Categorization: For each criterion, document the judgment and assign the study to one of four final categories for both reliability and relevance:
    • Reliable/Relevant without restrictions
    • Reliable/Relevant with restrictions
    • Not reliable/Not relevant
    • Not assignable (insufficient information reported) [1] [18].
  • Documentation: Record the rationale for all judgments, especially for criteria rated as "with restrictions" or "not met." This documentation is crucial for transparency and auditability.

Validation Note: This protocol was validated in a ring test where evaluators using CRED showed improved consistency compared to those using the Klimisch method. For example, evaluation of a fish toxicity study for estrone saw a shift from 44% of evaluators rating it "reliable without restrictions" under Klimisch to only 16% under CRED, with 63% rating it "not reliable" due to identified flaws, demonstrating CRED's finer discriminatory power [25].

Protocol 2: Conducting a Ring Test to Compare Evaluation Methods

Objective: To empirically compare the consistency, transparency, and user perception of different study evaluation methods (e.g., CRED vs. Klimisch).

Materials:

  • A set of 8-10 diverse aquatic ecotoxicity studies (varying in organism, substance, quality, and publication type) [18].
  • A cohort of risk assessors (N ≥ 50) from diverse sectors (academia, industry, government, consultancy) and geographical regions [18].
  • Evaluation kits for each method (Klimisch and CRED guidelines).
  • Standardized questionnaires for participant feedback on method applicability, consistency, and time requirement.

Procedure:

  • Study Selection & Assignment: Select studies to cover a range of taxa (algae, crustaceans, fish), chemical classes (pharmaceuticals, biocides, industrial chemicals), and reliability levels. In a two-phase crossover design, assign each participant two unique studies to evaluate with Method A (e.g., Klimisch) and two different studies with Method B (e.g., CRED), ensuring that evaluators within the same institution do not assess the same study [18].
  • Blinded Evaluation: Provide participants with only the study documents and the evaluation guidelines for the assigned method. Do not provide the alternative method's criteria.
  • Data Collection: Collect the completed evaluation sheets, including the final reliability/relevance category and any free-text comments for each study.
  • Consistency Analysis: Calculate the degree of agreement (e.g., percentage agreement, Fleiss' kappa) among evaluators for each study and method. The ring test for CRED found that its detailed criteria reduced subjectivity [18].
  • Comparative Analysis: Compare the final categorizations for each study across methods. Analyze systematic shifts in categories (e.g., studies frequently moving from "reliable without restrictions" under Klimisch to "reliable with restrictions" under CRED).
  • Perception Analysis: Analyze questionnaire responses comparing methods on scales of accuracy, ease of use, transparency, and time consumption. The CRED ring test found it was perceived as more accurate, consistent, and transparent, though potentially more time-intensive [18].
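The consistency analysis in step 4 can be illustrated with a plain-Python Fleiss' kappa; the ratings below (four studies, five evaluators, three reliability categories) are made-up for the example:

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for agreement among multiple raters.
    ratings[i][j] = number of raters placing subject i in category j;
    assumes the same number of raters per subject."""
    n_subjects = len(ratings)
    n_raters = sum(ratings[0])
    n_categories = len(ratings[0])
    # mean per-subject observed agreement
    p_bar = sum((sum(c * c for c in row) - n_raters)
                / (n_raters * (n_raters - 1)) for row in ratings) / n_subjects
    # chance agreement from marginal category proportions
    p_e = sum((sum(row[j] for row in ratings) / (n_subjects * n_raters)) ** 2
              for j in range(n_categories))
    return (p_bar - p_e) / (1.0 - p_e)

# Categories: reliable / reliable with restrictions / not reliable
kappa = fleiss_kappa([[5, 0, 0], [0, 5, 0], [4, 1, 0], [1, 3, 1]])
```

Here kappa comes out at roughly 0.50, i.e. moderate agreement; higher kappa under CRED than under Klimisch for the same studies would indicate improved evaluator consistency.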

Application Notes for Drug Development Professionals

For professionals developing human pharmaceuticals, the Environmental Risk Assessment (ERA) is a regulatory requirement in many jurisdictions [26]. The CRED method offers critical tools for both compiling and evaluating the ecotoxicity data for an Active Pharmaceutical Ingredient (API).

1. Evaluating Legacy API Data: For APIs marketed before ERA requirements, public literature may be the only data source. CRED provides a systematic framework to assess the reliability and relevance of these often non-guideline studies for inclusion in a modern regulatory submission or retrospective risk assessment [26]. Studies rated "reliable with restrictions" can be used with appropriate uncertainty analysis.

2. Designing & Reporting New Studies: When commissioning new ecotoxicity studies, the CRED reporting recommendations (50 criteria across 6 categories) serve as an ideal supplement to OECD test guidelines [1]. Ensuring that contract research organizations report all CRED-recommended information—especially on test substance characterization, exact exposure concentrations, and raw data—maximizes the likelihood that the study will be judged "reliable without restrictions" by regulatory assessors, smoothing the review process.

3. Integrating Non-Standard Endpoint Studies: Pharmaceuticals often have specific modes of action (e.g., endocrine disruption). Standard guideline tests may miss relevant sub-lethal effects. CRED's relevance criteria allow for the formal evaluation and justified inclusion of non-standard, mechanistic studies that are biologically pertinent, enhancing the scientific robustness of the ERA [1].

4. Leveraging Related Frameworks: The CRED philosophy is expanding into specialized areas. NanoCRED adapts criteria for the unique challenges of testing nanomaterial APIs (e.g., particle characterization, dissolution kinetics) [12] [24]. EthoCRED provides criteria for evaluating behavioral endpoint studies, which are increasingly relevant for neuroactive pharmaceuticals [12]. CREED (for exposure datasets) can be used to evaluate environmental monitoring data for APIs, completing the risk assessment picture [27].

The Scientist's Toolkit: Essential Research Reagent Solutions

The following table details key materials and conceptual tools essential for conducting ecotoxicity studies that align with CRED's principles of reliability, relevance, and transparent reporting.

Table 1: Key Research Reagent Solutions for CRED-Aligned Ecotoxicity Studies

| Item | Function in Ecotoxicity Testing | Relevance to CRED Evaluation & Reporting |
| --- | --- | --- |
| OECD Test Guidelines (e.g., OECD 201, 210, 211) | Provide standardized, internationally recognized protocols for testing chemicals on algae, daphnia, and fish [26]. | Form the baseline for evaluating test design reliability. CRED criteria align with and expand upon OECD reporting requirements [18]. |
| Good Laboratory Practice (GLP) | A quality system covering the organizational process and conditions for planning, performing, monitoring, recording, and reporting non-clinical studies [26]. | Strongly supports reliability by ensuring data integrity and traceability. CRED, however, evaluates the scientific content, ensuring GLP studies are also scientifically sound [1]. |
| Certified Reference Materials & Test Substances | Substances with specified purity and characterized properties used to ensure accuracy and reproducibility of dosing [1]. | Critical for meeting CRED reliability criteria on "Test Substance" characterization (identity, purity, composition, concentration verification) [1]. |
| Defined Test Organism Cultures (e.g., Daphnia magna, Pseudokirchneriella subcapitata) | Cultured under standardized conditions to ensure genetic consistency, health, and reproducible sensitivity [1]. | Essential for meeting CRED criteria on "Test Organism" (species/strain identification, source, health status, acclimation) [1]. |
| Analytical Grade Solvents & Chemicals | Used for preparing stock and test solutions, and for cleaning equipment to prevent contamination [1]. | Supports reliability by ensuring accurate dosing and avoiding confounding toxicity, as evaluated under "Exposure Conditions" [1]. |
| Validated Analytical Instruments (HPLC, GC-MS, ICP-MS) | Used to measure and verify the actual concentration of the test substance in the exposure medium [1] [26]. | Fundamental for CRED. Measured concentrations are heavily weighted in reliability evaluation. Reporting of analytical verification is a key CRED recommendation [1]. |
| Data Management & Statistical Software | Tools for recording raw data, performing statistical analysis (e.g., LC/EC50 calculation, hypothesis testing), and storing metadata [1]. | Supports CRED criteria on "Statistical Design and Biological Response." Availability of raw data is a specific CRED reporting recommendation that enhances reliability and transparency [1]. |
Data Management & Statistical Software Tools for recording raw data, performing statistical analysis (e.g., LC/EC50 calculation, hypothesis testing), and storing metadata [1]. Supports CRED criteria on "Statistical Design and Biological Response." Availability of raw data is a specific CRED reporting recommendation that enhances reliability and transparency [1].
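The table's final row mentions LC/EC50 calculation. As a minimal, hedged sketch (the function name and data are hypothetical; real assessments fit full dose-response models such as log-logistic regression rather than interpolating), an EC50 can be bracketed by linear interpolation on the log-concentration scale:

```python
import math

def ec50_log_interpolation(concentrations, effects):
    """Estimate an EC50 by linear interpolation on the log10
    concentration scale, between the two tested concentrations
    bracketing the 50% effect level. `effects` are fractional
    responses (0-1) assumed to increase with concentration."""
    pairs = list(zip(concentrations, effects))
    for (c_lo, e_lo), (c_hi, e_hi) in zip(pairs, pairs[1:]):
        if e_lo <= 0.5 <= e_hi:
            frac = (0.5 - e_lo) / (e_hi - e_lo)
            log_ec50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ec50
    raise ValueError("50% effect not bracketed by tested concentrations")
```

For example, responses of 20%, 80%, and 100% at 1, 10, and 100 mg/L place the interpolated EC50 between 1 and 10 mg/L on the log scale.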

Visualized Workflows: From Reporting to Evaluation

The following diagrams, generated using Graphviz DOT language, illustrate the logical relationships and workflows central to the CRED framework.

CRED Evaluation Workflow and Outcome Categories

```dot
digraph G {
  Start [label="Identify Ecotoxicity Study"];
  P1 [label="Define Assessment Purpose\n(e.g., Derive PNEC, Set EQS)"];
  P2 [label="Evaluate Relevance (13 Criteria)"];
  P3 [label="Evaluate Reliability (20 Criteria)"];
  Rev [label="Review & Document Rationale for All Judgments"];
  CatRel [label="Assign Reliability Category"];
  CatRel_1 [label="1. Reliable without restrictions"];
  CatRel_2 [label="2. Reliable with restrictions"];
  CatRel_3 [label="3. Not reliable"];
  CatRel_4 [label="4. Not assignable"];
  End [label="Final Categorization (Reliability & Relevance)"];

  Start -> P1 -> P2 -> P3 -> Rev -> CatRel;
  CatRel -> CatRel_1 [label="All criteria met"];
  CatRel -> CatRel_2 [label="Minor flaws"];
  CatRel -> CatRel_3 [label="Major flaws"];
  CatRel -> CatRel_4 [label="Insufficient info reported"];
  CatRel_1 -> End; CatRel_2 -> End; CatRel_3 -> End; CatRel_4 -> End;
}
```


Linking Study Reporting to Evaluation Criteria

```dot
digraph G {
  Reporting [label="CRED Recommendations for Study Authors\n(50 Reporting Criteria)"];
  Categories [label="Six Reporting Categories"];
  Cat1 [label="1. General Information"];
  Cat2 [label="2. Test Design"];
  Cat3 [label="3. Test Substance"];
  Cat4 [label="4. Test Organism"];
  Cat5 [label="5. Exposure Conditions"];
  Cat6 [label="6. Statistical Design & Biological Response"];
  Evaluation [label="CRED Evaluation for Risk Assessors\n(20 Reliability Criteria)"];

  Reporting -> Categories;
  Categories -> Cat1; Categories -> Cat2; Categories -> Cat3;
  Categories -> Cat4; Categories -> Cat5; Categories -> Cat6;
  Cat1 -> Evaluation [label="Informs Evaluation"];
  Cat2 -> Evaluation [label="Informs Evaluation"];
  Cat3 -> Evaluation [label="Informs Evaluation"];
  Cat4 -> Evaluation [label="Informs Evaluation"];
  Cat5 -> Evaluation [label="Informs Evaluation"];
  Cat6 -> Evaluation [label="Informs Evaluation"];
}
```


Navigating Common Pitfalls and Advanced Applications of CRED

Within the broader effort to advance the Criteria for Reporting and Evaluating ecotoxicity Data (CRED) method, a critical and recurrent challenge is the inconsistent interpretation of study deficiencies. Ambiguous classification of missing or inadequate information directly compromises the transparency, consistency, and scientific robustness of chemical hazard and risk assessments [18]. This document provides Application Notes and Protocols to resolve the ambiguity between two distinct concepts: "Not Reported" (a reporting quality issue) and "Not Fulfilled" (a methodological reliability issue). Clarifying this distinction is fundamental to implementing the CRED evaluation method as a transparent replacement for the older Klimisch method, thereby strengthening harmonization across regulatory frameworks [18] [12].

Definitions and Conceptual Framework

Precise definitions are the foundation for consistent study evaluation. The following terms must be strictly differentiated:

  • Not Reported: An item of information that is absent from the study report or publication. The evaluation is based solely on the documentation provided. It is a measure of reporting quality [11]. The actual experimental conduct regarding this item is unknown.
  • Not Fulfilled: An item where the reported information indicates that the experimental conduct or methodological standard was inadequate or deviated from accepted principles (e.g., OECD test guidelines, good laboratory practice). It is a measure of methodological quality or internal validity [11]. The information is present but reveals a flaw.

Conceptual Relationship: "Not Fulfilled" can only be assessed if the relevant information is reported. "Not Reported" creates uncertainty, as the item could have been either fulfilled or not fulfilled in practice. This ambiguity is a key shortcoming of less structured evaluation methods.

Application Notes and Evaluation Protocols

Core Evaluation Protocol

The following stepwise protocol must be followed for each criterion (e.g., test organism specification, control substance performance, concentration verification) within the CRED evaluation matrix [18].

Step 1: Documentation Review

  • Action: Systematically examine the study publication or report for the specific information required by the criterion.
  • Question: Is the information explicitly stated, including in tables, footnotes, supplementary materials, or references to other documents (e.g., a referenced but unavailable study plan)?

Step 2: Initial Categorization

  • If the information is absent, categorize the criterion as "Not Reported (NR)."
  • If the information is present, proceed to Step 3.

Step 3: Methodological Assessment

  • Action: Evaluate the reported information against the predefined benchmark for scientific adequacy. This benchmark is derived from standardized test guidelines (e.g., OECD, EPA), statistical principles, and fundamental scientific rigor [18] [28].
  • Question: Does the reported practice or result meet the accepted standard for the intended purpose of the study?

Step 4: Final Categorization

  • If the reported information meets the standard, categorize the criterion as "Fulfilled (F)."
  • If the reported information deviates from or fails to meet the standard, categorize the criterion as "Not Fulfilled (NF)."
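The four steps above can be condensed into a single decision function. The sketch below is illustrative only; the boolean inputs stand in for the assessor's documented judgments, and the function name is hypothetical:

```python
def categorize_criterion(reported, meets_standard=None):
    """Stepwise NR/NF/F categorization for a single criterion.
    Steps 1-2: information absent -> 'NR' (reporting quality issue).
    Steps 3-4: information present and judged against the benchmark
    -> 'F' if adequate, 'NF' if inadequate."""
    if not reported:
        return "NR"  # actual experimental conduct remains unknown
    if meets_standard is None:
        raise ValueError("reported information must be judged against the benchmark")
    return "F" if meets_standard else "NF"
```

Note that "NF" is only reachable when `reported` is true, mirroring the conceptual rule that "Not Fulfilled" can only be assessed for reported information.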

Protocol for Specific Criterion Categories

A. Criteria Related to Test Substance & System (e.g., concentration verification, solvent control)

  • Not Reported: The study does not mention how test concentrations were measured or confirmed during the experiment.
  • Not Fulfilled: The study reports that concentrations were measured only at the beginning of the test, while the guideline requires periodic measurement for unstable substances.

B. Criteria Related to Test Organism & Design (e.g., control performance, randomization)

  • Not Reported: There is no statement on the health or biological status of the control organisms at test initiation.
  • Not Fulfilled: The study reports control organism mortality of 30%, which exceeds the guideline's maximum acceptable limit of 10%.
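The control-performance example above can be expressed the same way. A hedged sketch, assuming the 10% control-mortality validity limit cited in the example (function name hypothetical; real guideline limits vary by test type):

```python
def control_performance(n_start, n_dead, max_mortality=0.10):
    """Score the 'control performance' criterion: 'NR' if the
    control counts were not reported, 'NF' if control mortality
    exceeds the validity limit (10% in the example above),
    otherwise 'F'."""
    if n_start is None or n_dead is None:
        return "NR"  # unreported: conduct unknown, not necessarily flawed
    return "F" if n_dead / n_start <= max_mortality else "NF"
```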

C. Criteria Related to Data & Reporting (e.g., raw data availability, statistical methods)

  • Not Reported: The statistical test used to calculate the EC50/LC50 is not named.
  • Not Fulfilled: The study uses an inappropriate statistical test (e.g., a parametric test on non-normal data without transformation) and provides no justification.

Quantitative Comparison of Klimisch and CRED Methods

The CRED method was developed to address the lack of detail, guidance, and consistency in the widely used Klimisch method [18]. The table below summarizes the structural differences that enable the precise "NR/NF" distinction.

Table 1: Structural Comparison of the Klimisch and CRED Evaluation Methods [18]

| Characteristic | Klimisch Method | CRED Evaluation Method |
|---|---|---|
| Primary Scope | General toxicity and ecotoxicity studies. | Aquatic ecotoxicity studies (with extensions for nanomaterials, behavior, soil/sediment) [12]. |
| Reliability Criteria | 12-14 general prompts for ecotoxicity. | ~20 detailed evaluation criteria for reliability, plus ~50 reporting criteria. |
| Relevance Evaluation | Not formally included. | 13 explicit criteria for relevance assessment. |
| Basis of Evaluation | Heavily dependent on GLP compliance and adherence to standardized guidelines; can overlook flaws in GLP studies [18]. | Fit-for-purpose assessment of methodological quality against detailed criteria, independent of GLP status [18] [28]. |
| Output Granularity | Single, subjective categorization (reliable without/with restrictions, not reliable, not assignable). | Transparent, criterion-level scoring ("Fulfilled", "Not Fulfilled", "Not Reported"), leading to an overall reliability grade. |
| Handling of Information Gaps | Ambiguous; leads to a "not assignable" category that mixes unreported and unreliably conducted aspects. | Explicitly distinguishes between unreported information ("NR") and reported but inadequate practices ("NF"). |
| Supporting Guidance | Minimal. | Comprehensive guidance documents for applying criteria [18]. |

Table 2: Impact of Differentiating NR vs. NF: A Ring-Test Outcome Analysis [18]

| Evaluation Aspect | Outcome with Klimisch Method | Outcome with CRED Method (enabling NR/NF distinction) |
|---|---|---|
| Consistency among Assessors | Low; high variability in categorizing the same study [18]. | High; detailed criteria and guidance reduce subjectivity. |
| Transparency of Rationale | Low; categorization rationale is often opaque. | High; criterion-level assessment provides an audit trail. |
| Utility for Study Improvement | Low; "not assignable" does not guide specific improvements. | High; identifies exact deficiencies (e.g., "NR: solvent control concentration" vs. "NF: control mortality exceeded limit"). |
| Risk Assessor Preference | (not applicable) | Strong preference for CRED due to accuracy, consistency, and practicality [18]. |

Visual Workflows for Study Evaluation and Ambiguity Resolution

CRED Study Evaluation Workflow

```dot
digraph G {
  Start [label="Start: Select Ecotoxicity Study"];
  Extract [label="Extract & Document Study Information"];
  C1 [label="Criterion 1: Test Substance Characterization?"];
  C2 [label="Criterion 2: Control Performance?"];
  C3 [label="Criterion N..."];
  NR1 [label="Categorize: 'Not Reported (NR)'"];
  NF1 [label="Categorize: 'Not Fulfilled (NF)'"];
  F1 [label="Categorize: 'Fulfilled (F)'"];
  NR2 [label="Categorize: 'Not Reported (NR)'"];
  NF2 [label="Categorize: 'Not Fulfilled (NF)'"];
  F2 [label="Categorize: 'Fulfilled (F)'"];
  Synthesize [label="Synthesize Criterion Scores"];
  Output [label="Output: Overall Reliability Grade & Transparent Evaluation Report"];

  Start -> Extract;
  Extract -> C1; Extract -> C2;
  C1 -> NR1 [label="Info Absent"];
  C1 -> NF1 [label="Info Inadequate"];
  C1 -> F1 [label="Info Adequate"];
  C2 -> NR2 [label="Info Absent"];
  C2 -> NF2 [label="Info Inadequate"];
  C2 -> F2 [label="Info Adequate"];
  NR1 -> Synthesize; NF1 -> Synthesize; F1 -> Synthesize;
  NR2 -> Synthesize; NF2 -> Synthesize; F2 -> Synthesize;
  Synthesize -> Output;
}
```

CRED Evaluation Workflow for Ecotoxicity Studies

Decision Pathway: Not Reported vs. Not Fulfilled

```dot
digraph G {
  Start [label="For a Single CRED Criterion"];
  Q1 [label="Is the required information present in the study report?"];
  Q2 [label="Does the reported information meet the scientific\nor guideline standard for adequacy?"];
  NR [label="Categorize as 'NOT REPORTED (NR)'\n(Reporting Quality Issue)"];
  NF [label="Categorize as 'NOT FULFILLED (NF)'\n(Methodological Quality Issue)"];
  F [label="Categorize as 'FULFILLED (F)'"];
  Ambiguity [label="Inherent Ambiguity: Actual practice is unknown.\nMay necessitate 'Not Assignable' in less detailed methods."];

  Start -> Q1;
  Q1 -> Q2 [label="Yes"];
  Q1 -> NR [label="No"];
  Q2 -> F [label="Yes"];
  Q2 -> NF [label="No"];
  NR -> Ambiguity;
}
```

Decision Pathway: Resolving Ambiguity Between NR and NF

The Scientist's Toolkit: Essential Research Reagent Solutions

Table 3: Key Reagents and Materials for CRED-Aligned Ecotoxicity Research

| Item / Solution | Function in Ecotoxicity Testing | Relevance to CRED Evaluation & NR/NF Distinction |
|---|---|---|
| Certified Reference Materials (CRMs) | Provide traceable, known concentrations of test substances for calibrating analytical equipment (e.g., HPLC, GC-MS). | Enables verification of test concentration ("Fulfilled"). Lack of reporting on calibration using CRMs can lead to "Not Reported" or "Not Fulfilled" [28]. |
| Control/Reference Substances | Standardized toxicants (e.g., KCl for Daphnia, sodium dodecyl sulfate for fish) used to confirm healthy test organisms and laboratory proficiency. | Critical for the criterion "Control performance meets guideline limits." Failure to use or report results leads to "NR" or "NF" [18]. |
| Solvent Controls (Vehicle Controls) | Demonstrate that any carrier solvent (e.g., acetone, DMSO) used to dissolve the test substance has no adverse effect at the highest concentration applied. | A separate assessment criterion: absence of data is "NR"; an adverse effect in the solvent control is "NF," invalidating the test [18]. |
| Formulated Test Substance vs. Active Ingredient | Testing with the formulated product (e.g., pesticide) vs. the pure active ingredient differentiates toxicity of the chemical from that of co-formulants. | Critical for relevance evaluation. Misidentification or lack of reporting leads to "NR/NF" for substance characterization and a flawed risk assessment [18]. |
| Sample Preservation Reagents | Stabilize water, sediment, or tissue samples post-collection for later chemical analysis (e.g., acids for metals, amber glass for organics). | Supports the criterion on "test concentration verification." Inadequate preservation, if reported, can be "NF"; if unreported, it is "NR," casting doubt on chemical analysis reliability. |
| Standardized Test Organisms | Organisms from certified culture centers ensuring known, consistent genetic and health status (e.g., Daphnia magna, Ceriodaphnia dubia, fathead minnows). | Foundational for test validity. Lack of source information is "NR"; use of non-standard or unhealthy organisms is "NF" [18]. |

Balancing Guideline Adherence with Scientific Judgment

The evaluation of scientific data for regulatory purposes, such as environmental risk assessment, hinges on the consistent application of predefined criteria. However, exclusive reliance on standardized guidelines can overlook scientific nuance, while unstructured expert judgment can introduce bias and inconsistency. The CRED (Criteria for Reporting and Evaluating Ecotoxicity Data) method provides a structured framework designed to navigate this balance [1] [13]. Developed to improve the transparency and consistency of ecotoxicity study evaluations across regulatory frameworks, CRED addresses the documented shortcomings of the previously dominant Klimisch method, which was criticized for being overly simplistic and biased toward industry-sponsored, guideline studies [1] [9].

This document outlines application notes and protocols for implementing the CRED evaluation method within the broader context of scientific reliability assessment. It integrates CRED's structured criteria with complementary frameworks for analytical method validation and research hypothesis evaluation, providing a multi-faceted toolkit for researchers and assessors. The core thesis is that robust scientific judgment is best exercised within a transparent, detailed, and consistently applied evaluative structure, which CRED provides for ecotoxicity data [2] [12].

Foundational Methodologies and Comparative Analysis

2.1 The CRED Evaluation Framework

The CRED method is built on distinct, operational definitions of reliability (the inherent scientific quality of a study) and relevance (the appropriateness of the study for a specific assessment purpose) [1]. It provides assessors with 20 explicit criteria for evaluating reliability and 13 for relevance, each accompanied by extensive guidance to minimize ambiguity [1] [13]. This structure ensures that all key aspects of a study—from test design and statistical analysis to the appropriateness of the test organism and endpoint for the regulatory question—are considered systematically [9].

2.2 Comparison with the Klimisch Method

A pivotal ring test involving 75 risk assessors from 12 countries compared the CRED and Klimisch methods [9]. The results, summarized in the table below, demonstrate CRED's superior performance in key metrics related to consistency and usability.

Table 1: Comparative Analysis of the Klimisch and CRED Evaluation Methods Based on Ring Test Results [9]

| Evaluation Metric | Klimisch Method | CRED Method | Implication for Scientific Judgment |
|---|---|---|---|
| Number of Evaluation Criteria | 4 broad categories (R1-R4) for reliability only [9]. | 20 reliability and 13 relevance criteria with detailed guidance [1] [13]. | CRED structures judgment, reducing reliance on undefined "expert opinion." |
| Guidance Specificity | Limited, leading to high interpretation variance [1] [9]. | Extensive guidance for each criterion [1]. | Promotes consistent application across different assessors and institutions. |
| Perceived Consistency | Lower; evaluations varied significantly between assessors [9]. | Higher; detailed criteria reduced discrepancy [9]. | Enhances reproducibility of assessment outcomes. |
| Perceived Transparency | Lower; reasoning behind scores was often opaque [9]. | Higher; explicit criteria force documentation of evaluation rationale [9]. | Makes the basis for regulatory decisions auditable and debatable. |
| Handling of Non-Guideline Studies | Tended to favor GLP/OECD guideline studies [1] [9]. | Criteria-based; allows rigorous evaluation of all well-reported studies [1]. | Facilitates the incorporation of peer-reviewed science into regulatory processes. |
| Time Requirement for Evaluation | Perceived as faster due to less detail [9]. | Perceived as slightly more time-consuming but more accurate and defensible [9]. | Invests time upfront to create a robust, defensible assessment. |

2.3 Integration with Analytical Method Validation Principles

The reliability of an ecotoxicity study is fundamentally linked to the validity of its underlying analytical and bioanalytical methods. Method validation (MV) is the documented process of proving an analytical method is fit for its intended purpose [29] [30]. However, as with study evaluation, numerous MV guidelines exist with discrepancies in terminology and prescribed performance parameters [29]. The table below aligns common MV parameters with their purpose, providing a checklist for evaluating the methodological foundation of studies under CRED review.

Table 2: Key Performance Parameters for Analytical Method Validation [29]

| Validation Parameter | Primary Purpose | Typical Acceptance Criteria Context |
|---|---|---|
| Accuracy | Measures closeness of agreement between test result and accepted reference value; composed of trueness (systematic error) and precision (random error) [29]. | Expressed as percent recovery or bias; should be within predefined limits relevant to the analyte and matrix. |
| Precision | Measures the dispersion of results under specified conditions; includes repeatability, intermediate precision, and reproducibility [29]. | Expressed as relative standard deviation (RSD); limits depend on the analysis type and concentration level. |
| Selectivity/Specificity | Ability to assess the analyte unequivocally in the presence of other components (e.g., impurities, matrix) [29]. | Demonstration that the response is due solely to the target analyte. |
| Limit of Detection (LOD) | Lowest concentration of analyte that can be detected, but not necessarily quantified [29]. | Typically a signal-to-noise ratio of 3:1 or based on the standard deviation of the blank response. |
| Limit of Quantification (LOQ) | Lowest concentration that can be quantified with acceptable accuracy and precision [29]. | Typically a signal-to-noise ratio of 10:1 or based on a predefined precision/accuracy target at low concentration. |
| Linearity & Range | Ability to obtain results proportional to analyte concentration within a given range [29]. | Demonstrated via calibration curve with a suitable coefficient of determination (e.g., R² > 0.99). |
| Robustness/Ruggedness | Resistance of the method to small, deliberate variations in procedural parameters [29]. | Key method performance indicators remain within acceptance criteria when parameters (e.g., pH, temperature) are varied. |
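Several of these parameters are directly computable. A minimal sketch of LOD/LOQ estimation from blank variability and the calibration slope, assuming the widely used ICH-style factors of 3.3 and 10 (function name hypothetical):

```python
import statistics

def lod_loq_from_blank(blank_responses, slope):
    """Estimate LOD and LOQ from the standard deviation of blank
    (or low-level) responses and the calibration-curve slope,
    using the common ICH-style factors:
    LOD = 3.3 * sigma / S, LOQ = 10 * sigma / S."""
    sigma = statistics.stdev(blank_responses)
    return 3.3 * sigma / slope, 10.0 * sigma / slope
```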

2.4 Protocol: Conducting a CRED-Based Study Evaluation

Objective: To perform a standardized, transparent evaluation of the reliability and relevance of an aquatic ecotoxicity study for use in regulatory risk assessment.

Materials: The study manuscript, the CRED evaluation worksheet (Excel tool available from project resources [12]), and relevant regulatory guidance (e.g., EPA, OECD, WFD).

Procedure:

  • Preparation: Familiarize yourself with the 20 reliability and 13 relevance criteria and their guidance notes [1].
  • Initial Read-Through: Read the study to understand its overall design, objective, and findings.
  • Systematic Reliability Assessment: For each of the 20 reliability criteria (e.g., "Test concentrations are reported," "Control performance is reported and acceptable"), extract the relevant information from the study text, tables, and supplements. Score the criterion based on the provided guidance, noting the justification for the score in the comments field.
  • Systematic Relevance Assessment: For each of the 13 relevance criteria (e.g., "Test organism is relevant," "Exposure duration is relevant"), evaluate the study's alignment with the specific purpose of your assessment (e.g., deriving a chronic water quality standard for fish). Document the rationale.
  • Overall Classification: Based on the pattern of scores, assign an overall reliability category (e.g., Reliable with Restrictions) and relevance category (e.g., Relevant with Restrictions). The CRED method does not prescribe a rigid algorithm for this step; it requires integrative scientific judgment informed by the detailed scoring [1].
  • Documentation: The completed worksheet serves as the transparent audit trail for the evaluation, capturing both the data and the reasoned judgment applied.
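CRED deliberately leaves the overall classification to integrative judgment; still, a tally of criterion-level scores can inform that step. The heuristic below is purely illustrative and is not a CRED-prescribed algorithm (the thresholds are assumptions for the sketch):

```python
from collections import Counter

def suggest_category(scores, critical_failed=False):
    """Tally criterion scores ('F', 'NF', 'NR') and suggest an
    overall reliability category. Illustrative heuristic only:
    CRED itself leaves this step to integrative expert judgment."""
    tally = Counter(scores)
    if critical_failed or tally["NF"] > 3:
        return "3: not reliable"
    if tally["NR"] > len(scores) / 2:
        return "4: not assignable"
    if tally["NF"] == 0 and tally["NR"] == 0:
        return "1: reliable without restrictions"
    return "2: reliable with restrictions"
```

The suggestion would accompany, not replace, the documented rationale in the worksheet.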

Advanced Application: Validating Research Hypotheses and Models

3.1 Metrics for Hypothesis Quality Evaluation

Before a study is conducted, the quality of its foundational hypothesis determines its potential scientific and regulatory value. A validated metrics instrument for clinical research hypotheses [31] can be adapted for ecotoxicology. The comprehensive version evaluates validity, significance, novelty, clinical (environmental) relevance, feasibility, ethicality, testability, and clarity on a 5-point Likert scale [31]. Applying such a structured assessment during study design or grant/protocol review balances creative scientific inquiry with disciplined, goal-oriented research planning.

3.2 Protocol: Applying a Hypothesis Validation Framework

Objective: To systematically assess the quality and viability of a proposed ecotoxicity research hypothesis before experimental investment.

Materials: Hypothesis statement, description of the research context, and the hypothesis evaluation instrument [31].

Procedure:

  • Instrument Adaptation: Adapt the dimension "clinical relevance" to "environmental relevance." Define sub-items for "validity" (e.g., biological plausibility, consistency with existing data).
  • Independent Scoring: Have at least three experts (e.g., toxicologists, ecologists, risk assessors) score the hypothesis using the instrument.
  • Calibration and Analysis: Calculate the Intra-class Correlation Coefficient (ICC) to measure inter-rater agreement [31]. If agreement is low, convene a discussion to calibrate understanding of the criteria.
  • Decision Integration: Use the aggregated scores and expert comments to refine the hypothesis, adjust the experimental design, or make a go/no-go decision for the research project. This protocol adds a layer of structured pre-validation to the research pipeline.
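The ICC in the calibration step can be computed without specialist software. A minimal sketch of the one-way random-effects ICC(1,1), assuming ratings are tabulated with one row per hypothesis (target) and one column per rater:

```python
def icc_oneway(ratings):
    """One-way random-effects ICC(1,1):
    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW),
    from a one-way ANOVA with n targets and k raters."""
    n = len(ratings)                 # targets (hypotheses)
    k = len(ratings[0])              # raters per target
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msw = sum((x - m) ** 2
              for row, m in zip(ratings, row_means)
              for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

Perfect agreement between raters yields an ICC of 1; systematic disagreement drives it toward negative values, signaling the need for the calibration discussion described above.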

3.3 Robust Model Validation in Data-Scarce Contexts

Predictive ecotoxicological models (e.g., QSARs, population models) face validation challenges akin to credit default models: limited data, imbalanced outcomes, and the risk of overfitting [32]. Robust cross-validation techniques are essential.

Core Principle: Standard k-fold cross-validation can produce unstable performance estimates with small or imbalanced datasets. Robust cross-validation methods strive to create folds that are more homogeneous in their distribution of features and outcomes, leading to more stable and reliable estimates of model error [32].

Application: When validating a QSAR model for acute toxicity with a small dataset containing few highly toxic compounds, employ a robust cross-validation strategy that ensures each fold contains a representative proportion of these rare, high-severity events. This approach provides a more realistic and conservative estimate of how the model will perform on new chemicals [32].
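A minimal sketch of such a stratified fold assignment in pure Python (illustrative; production workflows would typically use an established implementation such as scikit-learn's StratifiedKFold):

```python
import random

def stratified_folds(labels, k, seed=0):
    """Assign sample indices to k folds so that each class
    (e.g., 'high toxicity' vs. 'low toxicity') is spread as
    evenly as possible across folds (stratified k-fold)."""
    rng = random.Random(seed)
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)             # randomize within each class
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)   # deal out round-robin
    return folds
```

With 4 highly toxic and 8 non-toxic compounds and k=4, each fold receives exactly one rare, high-severity compound, avoiding folds with no toxic examples at all.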

Integrated Workflow and Toolkit

4.1 Visual Workflow: The Integrated Evaluation Pathway

The following diagram illustrates the sequential and iterative relationship between hypothesis validation, study conduct, method validation, and final study evaluation using frameworks like CRED.

```dot
digraph G {
  Hypothesis [label="Research Hypothesis Generation"];
  ValMetrics [label="Hypothesis Validation Metrics [31]"];
  StudyDesign [label="Study Design & Protocol"];
  MethodVal [label="Analytical Method Validation [29] [30]"];
  StudyExec [label="Study Execution & Data Generation"];
  CRED_Eval [label="CRED Evaluation: Reliability & Relevance [1]"];
  RegDecision [label="Regulatory Decision & Risk Assessment"];

  Hypothesis -> ValMetrics [label="Assess"];
  ValMetrics -> StudyDesign [label="Inform/Refine"];
  StudyDesign -> MethodVal [label="Requires"];
  MethodVal -> StudyExec [label="Enables"];
  StudyExec -> CRED_Eval [label="Data Input"];
  CRED_Eval -> RegDecision [label="Qualified Input"];
}
```

Diagram 1: Integrated Pathway for Evaluated Research

4.2 The Scientist's Toolkit: Essential Research Reagent Solutions

This table details key materials and tools essential for conducting and evaluating ecotoxicity studies within a rigorous, validated framework.

Table 3: Research Reagent Solutions for Ecotoxicity Testing & Evaluation

| Item/Category | Function & Description | Role in Guideline Adherence & Judgment |
|---|---|---|
| Certified Reference Materials (CRMs) | Substances with a certified purity, concentration, or property; used to calibrate equipment and validate analytical methods [29]. | Provides the metrological traceability required for guideline compliance. Essential for establishing trueness in method validation. |
| OECD Standard Test Organisms | Cultured, genetically consistent populations of species like Daphnia magna (OECD 202) or rainbow trout (OECD 203). | Adherence to guidelines ensures reproducibility and regulatory acceptance. Scientific judgment is applied in selecting the most relevant species for the specific chemical and ecosystem. |
| Positive & Negative Control Substances | Chemicals with known, consistent toxic (e.g., potassium dichromate for Daphnia) or non-toxic effects. | Validates test system responsiveness and health (a guideline requirement). Their performance is a key reliability criterion in CRED evaluation [1]. |
| Analytical Grade Solvents & Reagents | High-purity solvents (e.g., HPLC-grade water, acetone) for stock solution preparation and chemical analysis. | Minimizes interference, ensuring reported effects are due to the test substance. Critical for achieving the selectivity and accuracy targets of method validation [29]. |
| CRED Evaluation Excel Tool [12] | A structured spreadsheet implementing the 20 reliability and 13 relevance criteria with guidance. | The primary tool for applying structured scientific judgment. Enforces systematic evaluation, ensuring both guideline adherence (through criteria) and transparent documentation of expert reasoning. |
| Statistical Analysis Software (e.g., R) | Open-source environment for executing robust statistical analyses, including advanced cross-validation techniques [33] [32]. | Allows judgment in selecting analyses appropriate to the data structure (e.g., robust CV for small datasets) beyond basic guideline prescriptions, enhancing reliability assessment. |

Integrating Non-Standard and Academic Studies into Regulatory Assessments

The Criteria for Reporting and Evaluating ecotoxicity Data (CRED) method represents a significant advancement in environmental hazard and risk assessment, developed to address the limitations of the long-standing Klimisch evaluation method [18]. Within the broader thesis on improving the reliability evaluation of ecotoxicity studies, the CRED framework provides the necessary structured approach for integrating diverse data sources into regulatory assessments.

Traditional regulatory assessments have historically favored standardized tests conducted under Good Laboratory Practice (GLP), often sidelining valuable data from non-standard and academic studies [18]. This preference creates significant data gaps and potentially overlooks critical hazard information. The CRED method, with its transparent, detailed criteria for evaluating both reliability and relevance, establishes a scientifically robust pathway for incorporating these diverse data streams [18]. This integration is essential for developing more comprehensive chemical safety profiles, especially for data-poor substances, and aligns with global mandates to utilize all available information while reducing animal testing [34] [35].

Comparative Analysis of Evaluation Frameworks

A comparative analysis reveals why the CRED method is particularly suited for integrating non-standard data, offering more granularity and transparency than its predecessors.

Table 1: Comparison of Ecotoxicity Study Evaluation Frameworks

| Feature | Klimisch Method (1997) | CRED Method (2016) | Integrated Eco-Human DQA Needs [34] |
|---|---|---|---|
| Primary Scope | Broad toxicity & ecotoxicity | Aquatic ecotoxicity (detailed) | Integrated ERA & HHRA |
| Reliability Criteria | 12-14 general criteria [36] | ~20 detailed criteria [18] | Explicit, separated from relevance |
| Relevance Criteria | Not formally included [18] | 13 specific criteria [18] | Explicit, separated from reliability |
| Guidance for Use | Limited; high reliance on expert judgement | Detailed guidance provided [18] | Transparent, objective guidance |
| Handling of Non-Standard Studies | Implicit bias towards GLP/OECD studies [18] | Explicit criteria for evaluating technical quality | Objective criteria for varied data typology |
| Outcome Consistency | Low; leads to assessor discrepancy [18] | High; promotes harmonized assessment [18] | Promotes consistency across domains |

The Klimisch method, while foundational, lacks specific guidance for relevance and provides limited reliability criteria, leading to evaluations that are inconsistent and overly dependent on an assessor's inherent trust in standardized protocols [18]. In contrast, the CRED method explicitly details both reliability and relevance criteria, covering essential study elements from test substance characterization to statistical analysis [18]. This structured approach reduces subjectivity, making it equally applicable to OECD guideline studies and well-conducted academic research. Furthermore, the CRED method aligns with the identified need for a common Data Quality Assessment (DQA) system that can transversely apply to both environmental and human health targets, as highlighted in reviews of integrated risk assessment frameworks [34].

Application Notes: Integrating Non-Standard and Academic Studies

Core Challenges and the CRED Solution

Integrating non-standard studies (e.g., non-guideline in vivo tests, in vitro assays, field studies) and academic literature into regulatory dossiers presents distinct challenges. These include variable reporting quality, the use of novel endpoints or species, and a lack of Good Laboratory Practice (GLP) certification [18]. The historical regulatory bias towards standardized data has often led to the automatic exclusion of such studies, irrespective of their scientific merit [34] [18].

The CRED method addresses these challenges by shifting the evaluation focus from a study's administrative pedigree to its intrinsic scientific quality. It provides the toolset to systematically dissect any study's methodology and reporting against a comprehensive checklist. This allows an assessor to distinguish a poorly conducted guideline study from a meticulously performed academic investigation with high relevance to a specific regulatory question. For instance, a non-standard microcosm study on a novel endocrine endpoint can be evaluated for its controlled conditions, statistical power, and clear dose-response, potentially earning a high reliability score despite being non-guideline.

Protocol for Reliability Assessment

The CRED reliability evaluation is a phased process examining four key domains [18]:

  • Test Substance Characterization: Assesses the documentation of the chemical's identity, purity, stability, and dosing formulation.
  • Test Organism & Design: Evaluates the selection and health of test organisms, the appropriateness of the experimental design (e.g., controls, replicates, exposure regime), and the climatic/physicochemical conditions.
  • Endpoint Analysis & Statistics: Reviews the clarity and appropriateness of the measured endpoints and the statistical methods used for analysis and data presentation.
  • Inherent Study Quality: A holistic consideration of the study's plausibility, consistency, and the likelihood of biases.

Each domain contains specific criteria, guiding the assessor to a conclusion on reliability (Reliable, Reliable with Restrictions, or Not Reliable) based on the aggregate of deficiencies found, their severity, and their potential impact on the results [18].
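
The synthesis of criterion-level deficiencies into an overall reliability conclusion can be sketched in code. CRED does not prescribe a numerical aggregation rule, so the criterion names, the set of "critical" criteria, and the decision thresholds below are illustrative assumptions, not the official scheme:

```python
# Illustrative sketch: aggregating CRED-style reliability criterion scores
# into an overall category. Criterion names and the "critical" set are
# assumptions for demonstration; CRED leaves the final judgement to the
# assessor, guided by the severity and impact of deficiencies.

CRITICAL = {"test_substance_identified", "controls_adequate", "endpoint_clearly_defined"}

def classify_reliability(scores: dict[str, str]) -> str:
    """scores maps criterion name -> 'met' | 'partly' | 'not_met'."""
    critical_failures = [c for c in CRITICAL if scores.get(c) == "not_met"]
    minor_issues = [c for c, s in scores.items() if s in ("partly", "not_met")]
    if critical_failures:
        return "Not Reliable"
    if minor_issues:
        return "Reliable with Restrictions"
    return "Reliable"

example = {
    "test_substance_identified": "met",
    "controls_adequate": "met",
    "endpoint_clearly_defined": "partly",  # minor reporting gap
}
print(classify_reliability(example))  # -> Reliable with Restrictions
```

In practice the assessor, not a formula, makes this call; a rule like the one above is useful mainly for pre-flagging studies for human review.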

Protocol for Relevance Assessment

This is a critical step for integrating non-standard data. CRED's relevance assessment ensures the study is fit for the specific regulatory purpose. Key criteria include [18]:

  • Biological Relevance: Are the test species, life stage, and measured endpoints (e.g., molecular, physiological, population-level) appropriate for the protective goal?
  • Exposure Relevance: Do the exposure pathway, duration, and concentrations reflect plausible real-world scenarios?
  • Environmental Relevance: For ecotoxicity, are the test conditions (e.g., water chemistry, temperature) representative of the receiving environment?

A study on fish gill cell lines (a non-standard New Approach Methodology - NAM) may be deemed highly relevant for screening oxidative stress mechanisms but of limited relevance for deriving a chronic population-level no-effect concentration. This explicit relevance scoring prevents the misuse of data while unlocking the value of NAMs for specific assessment questions [35].

Workflow: Start (study for evaluation) → Phase 1: initial screening → applicability criteria met (e.g., ecotoxic species, single chemical, reported effect)? If no, the study is excluded. If yes → Phase 2: detailed CRED evaluation → reliability rated "Reliable" or "Reliable with Restrictions"? If no, the study is excluded as not reliable. If yes → Phase 3: relevance assessment and integration → End: study integrated into the weight-of-evidence assessment.

Diagram: CRED Evaluation Workflow for Study Integration

The Scientist's Toolkit: Research Reagent Solutions

Table 2: Key Reagents, Tools, and Databases for Integrated Ecotoxicity Assessment

| Item / Resource | Primary Function in Assessment | Role in Integrating Non-Standard Data |
|---|---|---|
| CRED Evaluation Checklist [18] | Provides standardized criteria for assessing study reliability and relevance. | The core tool for objectively scoring academic and non-guideline studies, replacing subjective bias. |
| ECOTOX Knowledgebase [37] | Curated database of single-chemical ecotoxicity test results. | Serves as a benchmark for "standard" data and a source for contextualizing novel findings from academic studies. |
| Analytical Grade Test Substances & Certified Reference Materials | Ensures dosing accuracy and reproducibility in laboratory studies. | Critical for evaluating the 'Test Substance' domain in CRED; lack of characterization is a major flaw in non-standard studies. |
| Defined Media & Control Formulations | Provides consistent baseline conditions for toxicity testing. | Allows assessors to judge if non-standard studies maintained adequate experimental control, a key reliability factor. |
| High-Throughput Screening (HTS) Assay Kits (e.g., for cytotoxicity, specific pathways) | Enables rapid, mechanism-based toxicity profiling [35]. | CRED relevance criteria help determine how data from these NAMs can be used in a regulatory context (e.g., screening, WoE). |
| Systematic Review Software (e.g., DistillerSR, Rayyan) | Manages the literature search, screening, and data extraction process. | Supports the transparent and auditable integration of academic literature, a requirement for robust regulatory assessment. |

Structured Assessment Protocol (SOP)

This protocol outlines the steps for systematically identifying, evaluating, and integrating non-standard and academic studies within a CRED-based framework, consistent with systematic review principles [37].

5.1. Literature Search & Study Identification

  • Objective: To comprehensively identify all potentially relevant studies, including peer-reviewed academic literature and credible grey literature (e.g., theses, reputable agency reports).
  • Procedure:
    • Define the Assessment Question (PECO: Population, Exposure, Comparator, Outcome).
    • Develop a search string using chemical identifiers (name, CAS RN) and key toxicity terms. Execute across multiple databases (e.g., PubMed, Scopus, Web of Science).
    • Import results into systematic review software. Remove duplicates.
    • Perform title/abstract screening against pre-defined applicability criteria (e.g., original ecotoxicity data, relevant species and endpoint) [37].
    • Retrieve full texts of passing references for the next phase.
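
The deduplication and title/abstract screening steps above can be sketched as a simple pre-screening pass. The keyword filter is a crude stand-in for human screening, and the field names (`doi`, `title`) are illustrative:

```python
# Minimal sketch: deduplicate references by DOI (falling back to a
# normalised title) and apply a keyword-based title pre-screen. Real
# systematic reviews use dedicated software plus human screeners.
import re

def normalise(title: str) -> str:
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def deduplicate(refs: list[dict]) -> list[dict]:
    seen, unique = set(), []
    for ref in refs:
        key = ref.get("doi") or normalise(ref["title"])
        if key not in seen:
            seen.add(key)
            unique.append(ref)
    return unique

def passes_screen(ref: dict, required_terms=("toxicity",)) -> bool:
    return any(term in ref["title"].lower() for term in required_terms)

refs = [
    {"doi": "10.1/abc", "title": "Chronic toxicity of X to Daphnia magna"},
    {"doi": "10.1/abc", "title": "Chronic toxicity of X to Daphnia magna"},  # duplicate
    {"doi": None, "title": "Market analysis of chemical X"},
]
unique = deduplicate(refs)
screened = [r for r in unique if passes_screen(r)]
print(len(unique), len(screened))  # 2 unique references, 1 passes the screen
```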

5.2. Data Extraction & CRED Evaluation

  • Objective: To consistently extract key data and perform a transparent CRED evaluation.
  • Procedure:
    • For each included study, populate a standardized data extraction form with fields for: Chemical info, test organism, exposure design, endpoints/results, and key methodological details.
    • Two independent assessors apply the CRED checklist to evaluate reliability and relevance [18].
    • Resolve any discrepancies in scoring through discussion or consultation with a third reviewer.
    • Document the final CRED scores and a narrative summary justifying the ratings, noting any critical strengths or deficiencies.
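
The dual-assessor step can be supported by automatically listing where two independent evaluations disagree, so that only those criteria go to discussion or a third reviewer. The criterion names below are illustrative, not the official CRED wording:

```python
# Sketch: flag scoring discrepancies between two independent assessors
# for resolution by discussion or a third reviewer.

def find_discrepancies(scores_a: dict, scores_b: dict) -> list[str]:
    """Return the criteria on which the two assessors disagree."""
    return sorted(c for c in scores_a if scores_a[c] != scores_b.get(c))

assessor_a = {"test_substance": "met", "statistics": "partly", "controls": "met"}
assessor_b = {"test_substance": "met", "statistics": "not_met", "controls": "met"}
print(find_discrepancies(assessor_a, assessor_b))  # -> ['statistics']
```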

5.3. Integration into Weight-of-Evidence (WoE)

  • Objective: To synthesize findings from standard and non-standard studies into a coherent hazard assessment.
  • Procedure:
    • Tabulate all studies with their CRED reliability/relevance scores and key results (e.g., NOEC, LC50).
    • Weigh the evidence: Studies rated "Reliable" and "Highly Relevant" carry the most weight. Studies "Reliable with Restrictions" may be included with appropriate caveats.
    • Perform a sensitivity analysis to understand how including or excluding non-standard data affects the overall assessment conclusion (e.g., the derived Predicted No-Effect Concentration).
    • Clearly articulate in the final assessment report how each piece of evidence was evaluated and integrated, providing full transparency from search to conclusion.
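
The sensitivity-analysis step can be illustrated with a deterministic derivation in which the PNEC is the lowest usable NOEC divided by an assessment factor. The study values and the assessment factor of 10 below are invented for demonstration; actual factors depend on the regulatory framework and the available dataset:

```python
# Sketch of a WoE sensitivity analysis: does including a "Reliable with
# Restrictions" non-standard study change the derived PNEC (lowest NOEC
# divided by an assessment factor)? All numbers here are illustrative.

studies = [
    {"noec_ug_l": 120.0, "reliability": "Reliable", "standard": True},
    {"noec_ug_l": 45.0,  "reliability": "Reliable with Restrictions", "standard": False},
    {"noec_ug_l": 300.0, "reliability": "Not Reliable", "standard": False},
]

def pnec(pool: list[dict], assessment_factor: float = 10.0):
    noecs = [s["noec_ug_l"] for s in pool]
    return min(noecs) / assessment_factor if noecs else None

# "Not Reliable" studies are excluded from the evidence base entirely.
usable = [s for s in studies if s["reliability"] != "Not Reliable"]
standard_only = [s for s in usable if s["standard"]]

print(pnec(standard_only))  # 12.0 ug/L from guideline data alone
print(pnec(usable))         # 4.5 ug/L once the academic study is included
```

Here the non-standard study drives the PNEC, which is exactly the situation the transparency requirements above are meant to document.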

The integration of non-standard and academic studies into regulatory assessments is not merely a data-gap filling exercise but a fundamental step towards more robust, predictive, and efficient chemical safety science. The CRED method provides the essential, structured framework to make this integration scientifically defensible and transparent.

Future advancements will involve the further digitalization of this process, with computational tools potentially automating parts of the CRED evaluation against machine-readable study reports. Furthermore, the principles embedded in CRED are directly applicable to the evaluation of New Approach Methodologies (NAMs), from high-throughput in vitro assays to in silico models [35]. As the toxicological paradigm shifts, frameworks like CRED will be critical for building regulatory confidence in these new data streams by ensuring they are evaluated with appropriate rigor and contextual relevance. Ultimately, the adoption of such integrated, objective evaluation systems supports the development of a stronger, more comprehensive evidence base for protecting human health and the environment.

The reliability of ecotoxicity studies is a foundational pillar for the environmental hazard and risk assessment of chemicals, directly informing regulatory decisions for pharmaceuticals, industrial chemicals, and plant protection products [18]. For decades, the Klimisch method has served as the primary tool for evaluating study reliability. However, this approach has been critically scrutinized for its lack of detailed guidance, which leads to inconsistent evaluations among experts and an over-reliance on studies conducted under Good Laboratory Practice (GLP), potentially excluding valuable peer-reviewed data [18]. These inconsistencies can directly impact risk assessment outcomes, leading to either unnecessary mitigation measures or the underestimation of environmental threats [18].

In response, the Criteria for Reporting and Evaluating ecotoxicity Data (CRED) method was developed to introduce greater transparency, detail, and consistency [18]. The CRED framework provides explicit criteria for assessing both the reliability (20 criteria) and relevance (13 criteria) of aquatic ecotoxicity studies, moving beyond the Klimisch method's more subjective and limited scope [2]. A comprehensive ring test demonstrated that risk assessors found the CRED method to be more accurate, consistent, and less dependent on individual expert judgment [18].

This document posits that the full potential of the CRED method can only be unlocked through strategic integration with artificial intelligence (AI) and automation. Parallel advancements in financial credit management—where AI automates up to 95% of tasks, improves decision accuracy, and manages risk proactively—provide a compelling blueprint [38] [39]. This article outlines detailed application notes and protocols for future-proofing ecotoxicity evaluations by leveraging AI to automate the CRED process, enhance its analytical power, and ensure its scalability and consistency for global regulatory use.

Quantitative Foundations: CRED vs. Klimisch and AI Adoption Benchmarks

The transition from the Klimisch method to CRED represents a quantitative leap in evaluation rigor. The following tables summarize key data from comparative studies and relevant AI adoption metrics that inform the integration strategy.

Table 1: Ring Test Results Comparing the Klimisch and CRED Evaluation Methods [18]

| Metric | Klimisch Method | CRED Method | Implication for Evaluation Quality |
|---|---|---|---|
| Number of Evaluation Criteria | 12-14 (reliability only) | 20 (reliability) + 13 (relevance) | CRED enables a more granular, comprehensive, and structured assessment. |
| Consistency Among Assessors | Lower | Higher | CRED's detailed criteria reduce subjectivity and improve harmonization across different evaluators. |
| Participant Perception (from Ring Test) | More dependent on expert judgement | Less dependent; more accurate & practical | CRED is perceived as a more transparent and user-friendly framework. |
| Guidance for Relevance Evaluation | No specific criteria provided | 13 explicit relevance criteria | CRED formally assesses the appropriateness of data for a specific hazard identification, a critical step the Klimisch method lacks. |

Table 2: AI Performance Benchmarks from Financial Credit Analysis (Analogous Applications) [38] [39] [40]

| Process Area | AI Automation/Improvement Metric | Potential Analog in CRED Evaluation |
|---|---|---|
| Data Processing & Triage | Automation of up to 95% of routine loan manufacturing tasks [39]. | Automated ingestion and initial categorization of study metadata (e.g., organism, endpoint, substance). |
| Risk & Reliability Scoring | 3x improvement in credit scoring accuracy through machine learning [40]. | Enhanced, consistent scoring of study reliability against the 20 CRED criteria using trained models. |
| Decision Speed | Reduction of decision-making time from 20-30 days to 2-24 hours [40]. | Drastically reduced time for initial study triage and reliability flagging. |
| Predictive Analytics | Prediction of customer payment behavior and default risk [38]. | Identification of studies with a high probability of being deemed "not reliable" or of critical relevance. |
| Operational Efficiency | Up to 90% of lending workflows fully automated [40]. | Creation of an automated pipeline from study collection to summarized evaluation reports. |

Core Experimental Protocol: The CRED Evaluation Workflow

The following protocol details the steps for manually conducting a CRED evaluation, which forms the basis for subsequent automation.

Protocol: Manual CRED Evaluation of an Aquatic Ecotoxicity Study

Objective: To consistently evaluate the reliability and relevance of a given aquatic ecotoxicity study for use in environmental hazard and risk assessment.

Materials:

  • CRED Evaluation Sheet (Excel-based tool) [12] [2].
  • The full text of the peer-reviewed ecotoxicity study to be evaluated.
  • Relevant OECD test guidelines (e.g., OECD 201, 210, 211) for reference [18].

Procedure:

  • Study Registration & Triage:
    • Record the study's bibliographic information (authors, year, journal).
    • Identify the test substance, test organism(s), exposure duration, and measured endpoint(s).
    • Determine the broad relevance of the study to the current assessment question (e.g., is the organism and endpoint appropriate?).
  • Reliability Assessment (20 Criteria): Systematically evaluate the study against each of the 20 reliability criteria. These are grouped into key domains:

    • Test Substance Characterization: Assess reporting of source, purity, chemical identification, and concentration verification.
    • Test Organism & System: Evaluate information on organism source, life-stage, health, and test system design.
    • Exposure & Control: Scrutinize details of exposure regime, control treatments, and environmental conditions (pH, temperature, oxygen).
    • Data & Reporting: Examine the clarity of results, statistical methods, and dose-response documentation.
    • For each criterion, assign a score (e.g., "Fully Reported," "Partially Reported," "Not Reported," "Not Applicable") based on the provided guidance.
  • Relevance Assessment (13 Criteria): Evaluate the study's relevance to the specific regulatory assessment. This includes:

    • Biological Relevance: Appropriateness of the endpoint (e.g., mortality, growth, reproduction) and its ecological significance.
    • Exposure Relevance: Consideration of the test concentrations in relation to expected environmental levels.
    • Temporal & Spatial Relevance: Alignment of exposure duration and test system with the assessment scenario.
  • Overall Classification & Documentation:

    • Synthesize the scores from the reliability assessment to assign an overall reliability category (e.g., "Reliable," "Reliable with Restrictions," "Not Reliable").
    • Formulate a narrative summary of the relevance evaluation.
    • Document all justifications for scores within the evaluation sheet to ensure transparency and auditability.

Validation Note: This manual protocol mirrors the process used in the international ring test that validated the CRED method, involving 75 risk assessors from 12 countries [18].
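
The documentation requirement in step 4 (no score without a written justification) can be enforced in a structured evaluation record. The class and field names below are an illustrative data model, not the layout of the official CRED Excel tool:

```python
# Sketch: an evaluation record that refuses to store a criterion score
# without a written justification, mirroring the audit-trail requirement.
# Field names and score labels are illustrative.
from dataclasses import dataclass, field

@dataclass
class CriterionScore:
    criterion: str
    score: str           # e.g. "Fully Reported" | "Partially Reported" | "Not Reported"
    justification: str   # mandatory reference to lines/tables in the study

    def __post_init__(self):
        if not self.justification.strip():
            raise ValueError(f"Justification required for {self.criterion!r}")

@dataclass
class EvaluationSheet:
    study_id: str
    scores: list = field(default_factory=list)

sheet = EvaluationSheet("Smith2021-Dmagna")
sheet.scores.append(CriterionScore(
    "Concentration verification", "Partially Reported",
    "Nominal concentrations only; no analytical verification reported (Methods).",
))
print(len(sheet.scores))  # -> 1
```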

Diagram: Manual CRED evaluation workflow — start evaluation → study registration & initial triage → reliability assessment (20 criteria) and relevance assessment (13 criteria) → synthesis & classification → documentation & reporting.

AI-Augmented Protocol: An Automated and Intelligent CRED Pipeline

Building on the manual foundation, this protocol integrates AI to create a scalable, consistent, and predictive evaluation system.

Protocol: AI-Augmented CRED Evaluation Pipeline

Objective: To automate the systematic extraction, scoring, and prioritization of data from ecotoxicity studies using the CRED framework, enhancing throughput and consistency.

Materials:

  • AI/ML Platform: A secure, governable AI platform capable of natural language processing (NLP) and machine learning (ML) [41].
  • CRED Knowledge Base: A digitized and structured version of the CRED criteria, scoring rules, and OECD guideline requirements.
  • Training Dataset: A curated set of ecotoxicity studies that have been authoritatively evaluated using CRED (e.g., from the ring test) [18].
  • Literature Corpus: Access to scientific databases (e.g., PubMed, Web of Science) for study acquisition.

Procedure:

  • Intelligent Study Ingestion & Data Extraction:
    • Use NLP agents to automatically ingest study PDFs from targeted searches or uploaded batches.
    • Deploy custom-trained models to extract specific entities: Test Substance (names, CAS numbers, concentrations), Test Organism (species, life stage), Experimental Design (duration, endpoint, controls), and Key Results (EC/LC/NOEC values, statistical data).
  • Automated Criteria Scoring & Flagging:

    • Map extracted data points to the corresponding CRED reliability and relevance criteria.
    • Apply rule-based algorithms for initial scoring (e.g., if "control mortality" is reported as >20%, flag criterion "Adequacy of Controls" as "Not Met").
    • Use ML models trained on the expert-evaluated dataset to predict scores for ambiguous or complex criteria, providing a confidence score for each prediction.
  • Predictive Triage & Prioritization:

    • Implement a predictive ranking system that scores studies based on their likely overall reliability and specific relevance to a pending assessment question.
    • Flag studies with a high probability of being "Not Reliable" for expedited human review, and prioritize studies scoring highly on both reliability and relevance for in-depth analysis.
  • Human-in-the-Loop Review & Audit:

    • Present the AI-generated evaluation in an interactive dashboard. Evaluators review flagged items, override automated scores with justification, and confirm final classifications.
    • The system logs all automated actions and human interventions, creating a transparent audit trail essential for regulatory acceptance and model refinement [41] [40].
  • Continuous Model Learning:

    • Use confirmed human evaluations as new training data to continuously retrain and improve the scoring and prediction models, closing the feedback loop.
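
The rule-based scoring step can be illustrated with the control-mortality example given above: extract the reported value from study text and flag the corresponding criterion. The regex, the criterion labels, and the 20% threshold are simplifications for demonstration:

```python
# Sketch of a rule-based scoring agent: pull a reported control-mortality
# percentage out of free text and apply the example rule (">20% -> Not Met").
# The regex and labels are illustrative simplifications.
import re

def score_control_adequacy(study_text: str) -> str:
    m = re.search(r"control mortality[^0-9]*([\d.]+)\s*%", study_text, re.I)
    if not m:
        return "Not Reported"  # flagged for human review
    return "Not Met" if float(m.group(1)) > 20.0 else "Met"

print(score_control_adequacy("Control mortality was 35 % after 96 h."))        # Not Met
print(score_control_adequacy("Control mortality remained at 5% throughout."))  # Met
print(score_control_adequacy("No controls were described."))                   # Not Reported
```

In the pipeline, ambiguous cases (no match, or borderline values) would be routed to the ML models and ultimately the human-in-the-loop review rather than scored automatically.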

Diagram: AI-augmented CRED pipeline — 1. intelligent study ingestion (NLP) → 2. entity extraction (substance, organism, endpoint, results) → 3. automated CRED criteria scoring (rules + ML) → 4. predictive triage & priority ranking → 5. human-in-the-loop review & override → 6. transparent audit log, with 7. continuous model learning feeding confirmed evaluations back into the scoring step.

Governance and Specialized Adaptations

AI Governance for Scientific Evaluation

The implementation of AI in a regulatory science context necessitates a robust governance framework to ensure scientific integrity, fairness, and accountability. Key principles adapted from financial AI governance include [41] [40]:

  • Explainability: The AI system must provide clear explanations for its scoring predictions, using techniques like SHAP or LIME, to allow evaluators to understand the rationale behind automated flags [40].
  • Bias Monitoring: Regular audits must be conducted to ensure training data and model outputs do not perpetuate bias, such as unfairly favoring certain study types (GLP vs. academic) without scientific merit [40].
  • Human Oversight: Final classification decisions and relevance judgments must remain under the authority of qualified scientific evaluators (the "human-in-the-loop") [42].
  • Data Security & Compliance: The system must handle sensitive and proprietary study data with high security, adhering to relevant data protection standards [41].

Expanding CRED's Scope with Specialized Tools

The core CRED framework for aquatic ecotoxicity is being extended to address complex testing scenarios, creating distinct pathways for AI automation:

  • NanoCRED: A modified framework for evaluating ecotoxicity studies of nanomaterials, accounting for unique characteristics like particle size, coating, and agglomeration state [12]. AI can assist in tracking and evaluating nanomaterial-specific descriptors across studies.
  • EthoCRED: A framework for assessing the reliability and relevance of behavioral ecotoxicity studies, which involve complex endpoints like swimming, feeding, or social behavior [12]. AI-powered video analysis tools can be integrated to objectively quantify behavioral endpoints from raw study data.
  • CRED for Sediment and Soil: An adaptation for terrestrial and benthic studies, incorporating criteria specific to these matrices [12]. AI can help model bioavailability based on extracted data about soil/sediment properties.

The Scientist's Toolkit: Essential Research Reagent Solutions

Table 3: Key Tools and Platforms for AI-Augmented CRED Evaluation

| Tool / Solution Category | Example / Function | Role in AI-Augmented CRED Protocol |
|---|---|---|
| CRED Evaluation Software | Official CRED Excel Tool [12] [2] | The foundational scoring matrix; the target schema for AI output formatting and the human review interface. |
| Literature Access & Management | Scientific databases (e.g., PubMed, Wiley) with API access. | Provides the raw input stream of study literature for automated ingestion and processing. |
| AI/ML & NLP Platform | Governable AI platforms (e.g., custom solutions on cloud AI services). | The engine for document parsing, entity extraction, predictive scoring, and workflow automation [41]. |
| Model Training & Validation Set | Curated library of studies with expert CRED scores (e.g., from Kase et al., 2016 [18]). | The essential "reagent" for training and validating supervised machine learning models to perform CRED scoring. |
| Explainable AI (XAI) Library | Open-source libraries like SHAP or LIME [40]. | Provides post-hoc explanations for model predictions, crucial for evaluator trust and regulatory transparency. |
| Audit & Version Control System | Integrated logging within the AI platform and Git for model/code versioning. | Ensures reproducibility, tracks all system and human decisions, and maintains model lineage for compliance [41]. |

Evidence and Impact: How CRED Performs Against Other Methods

This application note provides a detailed protocol for the implementation and validation of the Criteria for Reporting and Evaluating ecotoxicity Data (CRED) method, a standardized framework developed to overcome critical inconsistencies in the reliability and relevance assessment of aquatic ecotoxicity studies. Central to this note are the quantitative outcomes of a comprehensive international ring test, which demonstrated that the CRED method significantly improves inter-assessor consistency compared to the traditionally used Klimisch method. By providing explicit criteria for evaluating 20 reliability and 13 relevance aspects, the CRED method reduces subjective expert judgement, increases transparency, and promotes the inclusion of high-quality peer-reviewed data in regulatory decision-making for chemicals, pharmaceuticals, and plant protection products [18] [2]. The protocols herein are framed within the broader thesis of advancing the CRED method as a cornerstone for evaluating ecotoxicity study reliability, ultimately supporting more robust and harmonized environmental risk assessments [43].

The regulatory assessment of chemicals hinges on the quality of underlying ecotoxicity data. For decades, the Klimisch method has been the dominant tool for evaluating study reliability, categorizing studies as "reliable without restrictions," "reliable with restrictions," "not reliable," or "not assignable" [18]. However, this method has been widely criticized for its lack of detailed criteria, over-reliance on Good Laboratory Practice (GLP) status, and insufficient guidance, leading to inconsistent evaluations between different risk assessors [18]. Such inconsistencies can directly impact risk assessment outcomes, potentially leading to either unnecessary mitigation measures or underestimated environmental risks [18].

The Criteria for Reporting and Evaluating ecotoxicity Data (CRED) project was initiated to address these shortcomings. Developed from OECD test guidelines and existing evaluation frameworks, the CRED method provides a transparent, criteria-based system for evaluating both the reliability (20 criteria) and relevance (13 criteria) of aquatic ecotoxicity studies [18] [2]. This application note details the experimental validation of this method through a formal ring test and provides the protocols for its application, supporting the broader research goal of standardizing ecotoxicity data evaluation to ensure robust and science-based risk assessments [43] [2].

Quantitative Data on Assessor Consistency: Ring Test Results

A two-phased international ring test was conducted to quantitatively compare the performance of the CRED and Klimisch methods. In total, 75 risk assessors from 12 countries participated, evaluating eight different aquatic ecotoxicity studies covering various organisms (e.g., Daphnia magna, fish, algae) and chemical classes (e.g., pharmaceuticals, biocides) [18].

Key Quantitative Findings

The analysis yielded clear, quantifiable evidence of the CRED method's superior consistency and user perception.

Table 1: Quantitative Comparison of Method Consistency & Outcomes

| Metric | Klimisch Method | CRED Method | Implication |
|---|---|---|---|
| Inter-assessor Agreement | Low | Substantially higher | CRED reduces subjective interpretation [18]. |
| Perceived Dependence on Expert Judgement | High | Low | CRED's structured criteria guide the evaluation [18]. |
| Average Time for Evaluation | Not reported | ~2 hours | Structured criteria may initially require more time but improve standardization [18]. |
| Number of Defined Reliability Criteria | 12-14 | 20 | Enables more granular and transparent assessment [18]. |
| Number of Defined Relevance Criteria | 0 | 13 | Integrates critical relevance assessment formally into the process [18]. |

Table 2: Ring Test Participant Perception Survey Results

| Perception Aspect | Majority Preference | Key Supporting Feedback |
|---|---|---|
| Transparency & Guidance | CRED Method | CRED provides clearer, more detailed guidance for evaluation [18]. |
| Accuracy & Consistency | CRED Method | Perceived as more accurate and likely to yield consistent results between assessors [18]. |
| Practicality of Criteria | CRED Method | The specific criteria were found to be practical and useful [18]. |
| Reduction of Arbitrariness | CRED Method | The method is less dependent on individual expert judgement [18]. |

The data show that the CRED method successfully addressed the core flaw of the Klimisch method by significantly improving evaluator consistency. This quantitative validation is crucial for its adoption as a standardized tool in regulatory and research contexts [18].

Experimental Protocol: Implementing the CRED Evaluation Method

This protocol outlines the step-by-step procedure for conducting a CRED-based reliability and relevance evaluation of an aquatic ecotoxicity study.

Pre-Evaluation Preparation

  • Objective: To ensure the evaluator is equipped with the correct tools and a full understanding of the study.
  • Materials: CRED evaluation worksheet (Excel tool), the study manuscript or report, relevant OECD Test Guideline (if applicable), CRED guidance document [12] [2].
  • Procedure:
    • Acquire the final CRED Excel tool from the official repository [12].
    • Read the study in its entirety to understand the objectives, methodology, and results.
    • Identify the appropriate OECD Test Guideline upon which the study is or should be based.
    • Familiarize yourself with the 20 reliability criteria (e.g., test substance characterization, test organism details, exposure conditions, statistical analysis) and 13 relevance criteria (e.g., representativeness of endpoint, exposure duration, ecological realism) in the CRED checklist [18].

Phase 1: Reliability Evaluation

  • Objective: To systematically assess the intrinsic quality of the experimental work and reporting.
  • Procedure:
    • For each of the 20 reliability criteria, answer the prompted questions in the CRED worksheet.
    • Assign a score for each criterion (e.g., Yes/No/Partly/Not Reported).
    • Critical Step: Provide a concise written justification in the tool for each score, referencing specific lines, tables, or figures in the study. This creates an audit trail.
    • Based on the aggregate scores and guided by the CRED decision scheme, assign an overall reliability classification: Reliable, Reliable with Restrictions, or Not Reliable.

Phase 2: Relevance Evaluation

  • Objective: To judge the appropriateness of the study for the specific hazard or risk assessment context.
  • Procedure:
    • For each of the 13 relevance criteria, answer the prompted questions in the CRED worksheet.
    • Similar to reliability, assign a score and a mandatory written justification for each criterion.
    • Assign an overall relevance classification: High, Medium, or Low.

Final Integration and Reporting

  • Objective: To synthesize the evaluations into a final conclusion for use in risk assessment.
  • Procedure:
    • Combine the reliability and relevance classifications to produce a final study utility conclusion (e.g., "Usable without restrictions," "Usable with restrictions," "Not usable").
    • The completed CRED worksheet, with all scores and justifications, serves as the transparent evaluation record. This record should be archived alongside the risk assessment documentation.
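
The final integration step (combining the reliability and relevance classifications into a study-utility conclusion) can be sketched as a simple decision table. The mapping below is an illustrative reading of the text, not an official CRED decision matrix:

```python
# Sketch: combining reliability and relevance classifications into a final
# study-utility conclusion. The mapping is illustrative only.

def study_utility(reliability: str, relevance: str) -> str:
    if reliability == "Not Reliable" or relevance == "Low":
        return "Not usable"
    if reliability == "Reliable" and relevance == "High":
        return "Usable without restrictions"
    return "Usable with restrictions"

print(study_utility("Reliable", "High"))                      # Usable without restrictions
print(study_utility("Reliable with Restrictions", "Medium"))  # Usable with restrictions
print(study_utility("Not Reliable", "High"))                  # Not usable
```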

Protocol for Conducting a Validation Ring Test

This protocol describes the methodology used to generate the quantitative comparison data, adaptable for validating other assessment methods.

Design and Participant Recruitment

  • Objective: To ensure a robust, statistically sound comparison.
  • Procedure:
    • Select a diverse set of studies (8-12) representing different organisms, endpoints, and chemical classes [18].
    • Recruit a large panel of assessors (n>50) from multiple organizations and countries to ensure representativeness [18].
    • Employ a crossover or parallel design: In the CRED ring test, each participant evaluated two studies using the Klimisch method and two different studies using the CRED method, preventing bias from familiarity with a specific study [18].
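
The crossover assignment can be sketched as a simple rotation over the study pool that guarantees no assessor sees the same study under both methods. The rotation scheme is an illustrative construction, not the documented assignment procedure of the original ring test:

```python
# Sketch: assign each assessor two studies for Klimisch and two *different*
# studies for CRED, using a rotation over the study pool so no study is
# evaluated twice by the same person. Illustrative scheme only.

def assign(studies: list, n_assessors: int) -> list:
    plan = []
    k = len(studies)
    for i in range(n_assessors):
        klimisch = [studies[i % k], studies[(i + 1) % k]]
        cred = [studies[(i + 2) % k], studies[(i + 3) % k]]
        assert not set(klimisch) & set(cred)  # no within-assessor overlap
        plan.append({"assessor": i, "klimisch": klimisch, "cred": cred})
    return plan

plan = assign([f"S{j}" for j in range(8)], n_assessors=75)
print(len(plan))  # 75 assessors, each with 2 + 2 distinct studies
```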

Execution and Data Collection

  • Objective: To collect consistent evaluation data and subjective feedback.
  • Procedure:
    • Provide participants with standardized training materials for both methods.
    • Distribute studies and evaluation sheets in a blinded fashion.
    • Collect completed evaluations along with time-to-completion data.
    • Administer a standardized perception survey to gather qualitative feedback on both methods' clarity, ease of use, and perceived consistency [18].

Data Analysis

  • Objective: To quantify agreement and analyze perceptions.
  • Procedure:
    • Calculate Inter-assessor Agreement: Use statistical measures (e.g., percentage agreement, Cohen's Kappa) to quantify the consistency of final classifications (Reliable/Not Reliable) for each study under each method.
    • Analyze Classification Outcomes: Compare the distribution of final classifications (e.g., % of studies rated "Reliable") between methods.
    • Analyze Survey Data: Tabulate quantitative and qualitative feedback to compare user perception of the two methods.
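The agreement statistics named in the procedure can be computed without external libraries. A minimal sketch for two assessors rating the same set of studies ("R" = Reliable, "NR" = Not Reliable):

```python
from collections import Counter

def percent_agreement(ratings_a, ratings_b):
    """Fraction of studies on which two assessors gave the same final class."""
    return sum(a == b for a, b in zip(ratings_a, ratings_b)) / len(ratings_a)

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(ratings_a)
    p_obs = percent_agreement(ratings_a, ratings_b)
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    p_exp = sum(count_a[k] * count_b[k]
                for k in set(count_a) | set(count_b)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)

# Two assessors' final classifications for four example studies
kappa = cohens_kappa(["R", "R", "NR", "R"], ["R", "NR", "NR", "R"])  # → 0.5
```

For a panel larger than two assessors, a multi-rater statistic such as Fleiss' kappa would be the natural extension of the same idea.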

Visualizations: Workflow and Ring Test Design

[Diagram: CRED vs. Klimisch ecotoxicity study evaluation workflow. Klimisch path: limited guidance and expert judgement lead to four broad categories (reliable without restrictions, reliable with restrictions, not reliable, not assignable), an outcome prone to inconsistency. CRED path: Phase 1 reliability (20 explicit criteria) and Phase 2 relevance (13 explicit criteria) feed a structured worksheet with mandatory justification, yielding a transparent and consistent outcome. Core advancement: CRED replaces vague judgement with structured, transparent criteria.]

CRED vs. Klimisch Method Workflow Comparison

[Diagram: Ring test design for validating the CRED method. 75 assessors from 12 countries drew from a pool of 8 diverse ecotoxicity studies. Phase I: each assessor evaluated two studies with the Klimisch method (classifications and time collected). Phase II: each assessor evaluated two new studies with the CRED method (classifications, time, and a perception survey collected). Key feature: no assessor evaluated the same study with both methods, preventing bias. Statistical analysis of agreement (kappa) and outcome comparison provided quantitative evidence of improved consistency.]

Ring Test Design for Validating the CRED Evaluation Method

The Scientist's Toolkit: Essential Research Reagent Solutions

Table 3: Key Tools and Resources for CRED Implementation and Ecotoxicity Data Evaluation

Item Name | Function & Purpose | Application in CRED/Ecotoxicity Research
CRED Excel Evaluation Tool | A structured spreadsheet containing the 20 reliability and 13 relevance criteria with scoring fields and automated classification guides [12] [2]. | The primary platform for conducting transparent, auditable study evaluations. Ensures all assessors follow the identical structured process.
OECD Test Guidelines (e.g., 201, 210, 211) | Internationally agreed-upon testing methodologies for specific ecotoxicity endpoints (e.g., algal growth inhibition, Daphnia reproduction) [18]. | Serves as the foundational benchmark against which study design and reporting are evaluated for reliability within the CRED framework.
Trigit Web Application | A free, rapid tool for objective colorimetric analysis of images, extracting RGB and other color space values [44]. | Useful for standardizing and quantifying color-based endpoints in ecotoxicity tests (e.g., algal chlorophyll, enzyme-linked assays), reducing subjective interpretation of results.
Data Validation & Visualization Software (e.g., R, Python libraries, Graphviz) | Tools for statistical analysis, data validation, and generating standardized charts/graphs [45] [46]. | Critical for analyzing ring test results (calculating agreement statistics) and creating clear visualizations of ecotoxicity data and method comparisons for reporting.
Chemical Databases (e.g., EPA CompTox Dashboard) | Curated databases providing physicochemical, hazard, and exposure data for chemicals [43]. | Assists in verifying test substance characterization (a key CRED reliability criterion) and contextualizing the relevance of findings within a broader risk assessment framework.

The regulatory assessment of chemicals hinges on the quality and interpretability of ecotoxicity studies. For decades, the Klimisch method has been the cornerstone for evaluating study reliability within frameworks like REACH, utilizing a four-category scoring system [47]. However, its reliance on broad criteria and expert judgment has raised concerns about consistency and transparency [18]. In response, the Criteria for Reporting and Evaluating ecotoxicity Data (CRED) method was developed to provide a more structured, detailed, and transparent framework for assessing both the reliability and relevance of aquatic ecotoxicity studies [13]. This analysis, framed within broader thesis research on advancing ecotoxicity evaluation, provides a detailed, applied comparison of these two methodologies, supported by experimental protocols and case study insights.

Methodological Comparison: Structure, Criteria, and Output

The fundamental architectures of the Klimisch and CRED methods differ significantly in scope, granularity, and guiding philosophy, as summarized in the table below.

Table 1: Structural Comparison of the Klimisch and CRED Evaluation Methods [18] [48]

Characteristic | Klimisch Method | CRED Method
Primary Scope | General toxicological & ecotoxicological data [47] | Aquatic ecotoxicity studies [18]
Evaluation Dimensions | Reliability only [18] | Reliability and Relevance [13]
Number of Criteria | 12-14 reliability criteria [48] | 20 reliability and 13 relevance criteria [13]
Guidance Provided | Minimal; heavily dependent on expert judgement [18] | Detailed guidance for each criterion [18]
Basis for Reliability | Adherence to GLP and standardized test guidelines is heavily weighted [18] [47] | Detailed assessment of test design, performance, and reporting against 20 specific criteria [13]
Output Format | Single score (1-4) for reliability [47] | Qualitative summaries for both reliability and relevance, supported by criterion-level documentation [18]

The Klimisch method assigns studies to one of four categories:

  • Reliable without restrictions: Studies performed according to international guidelines and GLP.
  • Reliable with restrictions: Scientifically acceptable studies not fully guideline-compliant.
  • Not reliable: Studies with major methodological flaws.
  • Not assignable: Insufficiently documented studies [47].

In contrast, the CRED method deconstructs the evaluation into two parallel streams. The reliability assessment examines 20 criteria across categories like test organism, exposure design, and statistical analysis. The separate relevance assessment evaluates 13 criteria pertaining to the test's environmental realism and regulatory applicability (e.g., representative species, relevant endpoint, appropriate exposure pathway) [18] [13]. This bifurcated, criterion-driven approach is designed to reduce ambiguity.
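The four Klimisch categories listed above amount to a short decision chain, which a minimal sketch makes explicit. This is a deliberate simplification: real assessments weigh many considerations within each question.

```python
def klimisch_category(guideline_glp: bool, documented: bool,
                      scientifically_acceptable: bool) -> int:
    """Simplified sketch of the four Klimisch reliability categories.
    Illustrative only: actual Klimisch scoring is an expert judgment."""
    if not documented:
        return 4  # Not assignable: insufficiently documented
    if guideline_glp:
        return 1  # Reliable without restrictions
    if scientifically_acceptable:
        return 2  # Reliable with restrictions
    return 3      # Not reliable

category = klimisch_category(guideline_glp=False, documented=True,
                             scientifically_acceptable=True)  # → 2
```

Note how the GLP/guideline question dominates the chain, which is precisely the bias CRED's criterion-by-criterion assessment is designed to remove.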

Experimental Protocol: The CRED Ring Test Design

A definitive, large-scale ring test was conducted to compare the two methods empirically. The following protocol outlines its design [18].

Objective: To compare the consistency, user perception, and practical application of the Klimisch and CRED evaluation methods.

Phase I (Klimisch Evaluation):

  • Period: November–December 2012.
  • Participants: 75 risk assessors from 12 countries.
  • Task: Each participant evaluated the reliability of two out of eight preselected ecotoxicity studies using the Klimisch method. Relevance was not formally assessed.
  • Studies: The eight studies covered a range of organisms (algae, Lemna minor, Daphnia magna, fish) and chemical classes (pharmaceuticals, plant protection products, biocides, industrial chemicals) [18].

Phase II (CRED Evaluation):

  • Period: March–April 2013.
  • Participants: The same pool of assessors as in Phase I.
  • Task: Each participant evaluated two different studies from the same set of eight, using a draft version of the CRED method. This version required evaluation of both reliability and relevance.
  • Design Safeguard: Studies were assigned so that no single study was evaluated by the same institute in both phases, ensuring independent assessment [18].

Data Collected:

  • Categorization Outcomes: Final reliability (and relevance for CRED) scores for each study.
  • Participant Feedback: Structured questionnaires on each method's accuracy, consistency, transparency, and practicality.
  • Time Requirement: The time taken to complete evaluations for each method.

Case Study Applications and Comparative Outcomes

The ring test applied both methods to real-world studies, yielding quantitative data on performance and user perception.

Table 2: Ring Test Results Comparing Klimisch and CRED Method Performance [18]

Evaluation Metric | Klimisch Method Results | CRED Method Results | Interpretation
Consistency of Reliability Scores | Lower consistency among assessors. | Higher consistency among assessors. | CRED's detailed criteria reduced subjective interpretation.
Perceived Accuracy | 59% of participants rated it as "accurate" or "very accurate." | 86% of participants rated it as "accurate" or "very accurate." | Users had greater confidence in CRED-based evaluations.
Perceived Consistency | 44% rated it as "consistent" or "very consistent." | 79% rated it as "consistent" or "very consistent." | CRED was viewed as more robust against user bias.
Perceived Transparency | Lacked detailed guidance, reducing transparency. | Explicit criteria and guidance enhanced transparency. | CRED's process was more traceable and auditable.
Average Evaluation Time | Shorter (leveraging familiar, broad categories). | Longer (due to comprehensive criterion checks). | CRED trades off speed for depth and reduced ambiguity.

Key Findings from Case Application:

  • Overcoming GLP Bias: The Klimisch method's strong preference for GLP-compliant studies was evident. In one case study, a peer-reviewed, non-GLP study on a pharmaceutical was frequently categorized as "reliable with restrictions" or "not reliable" under Klimisch, despite being scientifically sound. The CRED method, by focusing on methodological details rather than compliance labels, more often rated such studies as reliable and relevant [18].
  • Handling "Gray Zone" Studies: For studies with partial guideline deviations (e.g., slightly modified exposure durations), Klimisch assessments showed high variability ("reliable with restrictions" vs. "not reliable"). CRED's specific criteria for exposure characterization and test design allowed for a more standardized and defensible evaluation of these deviations [18].
  • Integration of Relevance: A critical advance of CRED is its formal relevance assessment. A study might be reliable (well-performed) but of low relevance (e.g., using an unrealistic exposure concentration or a non-standard endpoint). This dual output prevents the automatic use of technically sound but environmentally irrelevant data in risk assessment [13].

Table 3: Research Reagent Solutions for Ecotoxicity Study Evaluation

Tool / Resource | Function | Source / Context
CRED Evaluation Sheet | Structured Excel-based tool for applying the 20 reliability and 13 relevance criteria with guided fields. | Primary tool for implementing the CRED method; includes scoring and documentation fields [2].
CRED Reporting Checklist | A list of 50 specific reporting criteria across six categories (general info, test design, substance, organism, exposure, statistics). | Used prospectively to ensure new studies report all information necessary for a high-quality evaluation [13].
ToxRTool | A software tool designed to assist and semi-automate the process of assigning a Klimisch score. | Developed to bring more structure to the Klimisch evaluation process [47].
EthoCRED Framework | An extension of CRED principles to evaluate the reliability and relevance of behavioral ecotoxicity studies. | Addresses the growing field of behavioral endpoints, which lack standardized test guidelines [12].
NanoCRED Framework | An adapted CRED framework for evaluating studies on engineered nanomaterials, accounting for their unique properties. | Critical for assessing data quality in the fast-evolving field of nanotoxicology [12].

Visualizing the Evaluation Workflows

The logical flow of each method reveals core differences in process and complexity.

[Diagram: Klimisch workflow: GLP/guideline compliant? If yes, Score 1 (reliable without restriction). If not, is documentation sufficient? If yes, Score 2 (reliable with restriction); if insufficient, Score 4 (not assignable). Otherwise, are the methods scientifically acceptable? If yes, Score 2; if no, Score 3 (not reliable). CRED workflow: parallel evaluation of 20 reliability criteria (test design, performance, reporting) and 13 relevance criteria (environmental realism, regulatory fit), each yielding a qualitative summary (e.g., "High Reliability", "Medium Relevance") combined into an integrated assessment for risk assessment.]

Diagram: Methodological Workflow Comparison: Klimisch vs. CRED

The CRED evaluation process for a single criterion involves structured decision-making to ensure consistency.

[Diagram: Evaluating a single CRED criterion, consulting the detailed guidance and examples at each step. Is the required information reported in the study? If no, document "Not reported" and flag a reporting deficiency. Is the reported methodology scientifically sound? If no, document "Not met" and flag a methodological deficiency. Is the criterion fully met for the assessment context? If no, document "Partially met" with an explanation; if yes, document "Fully met".]

Diagram: CRED Criterion Assessment Logic

Based on the comparative analysis and ring test results, the CRED method provides a more robust, transparent, and consistent framework for evaluating ecotoxicity studies than the Klimisch method. Its explicit separation of reliability and relevance, supported by detailed criteria and guidance, directly addresses the major criticisms of the older system—namely, its bias towards GLP studies, lack of granularity, and dependence on expert judgment [18].

Recommended Protocol for Contemporary Ecotoxicity Study Evaluation:

  • Selection: Identify all available studies for the chemical and endpoint of interest.
  • Primary Evaluation: Apply the CRED method using the official evaluation sheet. Document the outcome for each of the 20 reliability and 13 relevance criteria.
  • Integration: Synthesize the qualitative reliability and relevance summaries. A study deemed both highly reliable and highly relevant should be weighted as a key study. Studies with high reliability but low relevance may inform secondary endpoints or specific scenarios.
  • Weight-of-Evidence: Use the transparent CRED documentation to support a weight-of-evidence approach in hazard or risk assessment, clearly justifying the inclusion or exclusion of each study.
  • Prospective Use: For new study designs (e.g., behavioral, nanomaterial), employ specialized extensions like EthoCRED or NanoCRED [12]. For commissioning new studies, use the CRED reporting checklist to ensure all necessary data will be generated [13].

This protocol, centered on the CRED methodology, supports the thesis that advancing ecotoxicity assessment requires moving beyond binary reliability scoring toward a multi-dimensional, transparent, and criterion-driven evaluation system. This shift is crucial for building defensible, science-based environmental risk assessments that fully utilize the available scientific literature.

The Criteria for Reporting and Evaluating Ecotoxicity Data (CRED) method was developed to address critical inconsistencies in environmental risk assessment. It provides a standardized, transparent framework for evaluating the reliability and relevance of ecotoxicity studies, moving away from opaque expert judgment towards a structured, criteria-based approach [13]. This methodological shift is central to a broader thesis on improving the reproducibility and regulatory utility of ecotoxicity data. The CRED framework is foundational, as it includes 20 reliability and 13 relevance criteria accompanied by extensive guidance, specifically designed for aquatic ecotoxicity studies [13]. Its development was informed by existing methods and OECD reporting recommendations, and a comparative ring test concluded that risk assessors preferred it over the older Klimisch method for its transparency and detailed guidance [12].

Subsequent adaptations have extended the framework's utility into specialized areas, demonstrating its robustness and flexibility. These include NanoCRED for nanomaterials and EthoCRED for behavioral studies [12]. Furthermore, criteria for evaluating sediment and soil studies have been developed, led by researchers at the Swiss Centre for Applied Ecotoxicology [12]. The core objective of CRED is to improve the consistency of study evaluations across different regulatory frameworks, countries, and individual assessors, thereby strengthening the scientific foundation for deriving Predicted-No-Effect Concentrations (PNECs) and Environmental Quality Standards (EQSs) [13].

Current Regulatory Adoption Landscape

The adoption of rigorous, transparent evaluation frameworks like CRED aligns with a broader regulatory trend in both the European Union and Switzerland towards greater standardization, predictability, and risk-based oversight. This trend is evident across financial, consumer protection, and environmental regulations.

Table 1: Summary of Key EU and Swiss Regulatory Revisions (2024-2026)

Jurisdiction | Regulatory Area | Key Development | Timeline/Status | Core Principle
European Union | Consumer Credit | Adoption of Consumer Credit Directive 2 (CCD2), expanding scope and standardizing creditworthiness assessments [49]. | Transposition by Nov 2025; application from Nov 2026 [49]. | Full harmonization, enhanced transparency, and consumer protection.
European Union | Banking Stability & Supervision | Implementation of Capital Requirements Directive VI (CRD VI), integrating ESG risks and strengthening governance [50]. | Entry into force in member states from Jan 2026 [50]. | Harmonized supervision, integrated sustainability, and robust governance.
European Union | Digital Finance | Development of the Digital Operational Resilience Act (DORA) framework and technical standards for threat-led penetration testing [51] [50]. | DORA application; specific RTS in force from July 2025 [50]. | Digital resilience, standardized testing, and operational risk management.
Switzerland | Consumer Credit | Reduction of statutory maximum interest rates for consumer credit agreements [52]. | Effective 1 January 2026 [52]. | Consumer protection and market-adaptive regulation.
Switzerland | Banking Stability ("Too Big to Fail") | Legislative package proposing stricter capital requirements for foreign subsidiaries, a senior managers regime, and enhanced FINMA powers [53]. | Consultation ongoing; measures to be implemented post-2026 [53]. | Systemic risk reduction, accountability, and resolvability.

The EU's regulatory agenda emphasizes harmonization and simplification. The European Supervisory Authorities' 2026 work programmes focus on promoting innovation and reducing administrative burdens [51]. Notably, the European Commission has deprioritized 115 "non-essential" Level 2 acts to streamline the legislative framework [51]. This push for clearer, more consistent rules mirrors the core value proposition of the CRED method in the ecotoxicity domain.

In Switzerland, the regulatory response to the Credit Suisse crisis underscores a focus on stability and proportionality. Proposed "Too Big to Fail" reforms aim to strengthen capital requirements for systemically important banks, particularly for foreign participations, and introduce a senior managers regime [53]. However, industry associations like the Swiss Bankers Association caution against a disproportionate "wave of regulation," advocating for targeted measures that maintain international competitiveness [54]. This tension between robustness and practicality is a common theme in regulatory science, where frameworks like CRED aim to add rigor without undue complexity.

Detailed Experimental Protocols for CRED Evaluation

Implementing the CRED evaluation method is a systematic process designed to minimize subjective judgment. The following protocol is adapted from the original CRED publication and associated guidance [12] [13].

Protocol for Reliability Evaluation

Objective: To systematically assess the inherent quality of an ecotoxicity study based on its design, conduct, and reporting.

Procedure:

  • Study Acquisition & Initial Review: Obtain the complete, original study report or publication. Read the study thoroughly to understand its aims, design, and findings.
  • Criteria-Based Scoring: Using the official CRED assessment sheet, evaluate the study against the 20 reliability criteria [13]. These are typically grouped into categories:
    • Test Substance: Characterization, concentration verification, stability.
    • Test Organism: Species, life stage, source, health status.
    • Exposure Conditions: System design, duration, measurements of pH, temperature, oxygen, etc.
    • Test Design & Statistics: Replicates, controls, randomization, statistical methods.
    • Data Reporting: Clarity, completeness of results, dose-response information.
  • Scoring Guidance: For each criterion, assign a score (e.g., 0, 1, 2) based on detailed guidance:
    • 2 (Reliable): The study fully meets the criterion. No flaws that could affect data integrity.
    • 1 (Potentially Reliable): The study partially meets the criterion. Minor flaws are present but are unlikely to invalidate the data.
    • 0 (Not Reliable): The study fails to meet the criterion. Major flaws are present that could significantly affect data integrity.
    • Not Applicable (N/A): The criterion does not apply to the specific study design.
  • Overall Reliability Classification: Based on the pattern of scores, assign an overall reliability category:
    • Reliable without restriction: High scores across all key criteria.
    • Reliable with restriction: Deficiencies in one or more non-critical criteria.
    • Not reliable: Deficiencies in one or more critical criteria.
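The scoring and classification rules above can be sketched as a mapping from per-criterion scores to an overall reliability class. The aggregation rule below is an assumed illustration; in practice the assessor judges which deficiencies are critical and documents each score.

```python
def overall_reliability(scores: dict, critical: set) -> str:
    """Map per-criterion scores (2 = reliable, 1 = potentially reliable,
    0 = not reliable, None = not applicable) to an overall class.
    Assumed rule: a critical criterion scored 0 fails the study; any
    other deficiency yields 'with restriction'."""
    rated = {c: s for c, s in scores.items() if s is not None}
    if any(s == 0 and c in critical for c, s in rated.items()):
        return "Not reliable"
    if all(s == 2 for s in rated.values()):
        return "Reliable without restriction"
    return "Reliable with restriction"

scores = {"substance characterization": 2, "controls": 1, "GLP status": None}
overall = overall_reliability(scores, critical={"substance characterization"})
# → "Reliable with restriction"
```

Making the aggregation rule explicit in this way is what allows two assessors to reach, and defend, the same overall classification.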

Protocol for Relevance Evaluation

Objective: To determine the usefulness and applicability of the study data for a specific regulatory purpose (e.g., PNEC derivation for a particular ecosystem).

Procedure:

  • Define the Assessment Context: Clearly state the regulatory endpoint and environmental compartment of interest (e.g., freshwater PNEC for fish).
  • Criteria-Based Assessment: Evaluate the study against the 13 relevance criteria [13]. Key aspects include:
    • Test Organism Relevance: Is the species/taxa appropriate for the protection goal?
    • Exposure Relevance: Does the exposure pathway (e.g., waterborne) match the scenario?
    • Endpoint Relevance: Is the measured effect (e.g., mortality, reproduction) ecologically meaningful and aligned with the protection goal?
    • Dose-Response: Does the study provide sufficient data to identify a dose-response relationship?
  • Overall Relevance Judgment: Classify the study as:
    • Relevant: Directly addresses the assessment context.
    • Partially relevant: Addresses some aspects but has limitations (e.g., wrong species but correct endpoint).
    • Not relevant: Does not address the assessment context.
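The three-way relevance judgment above can be sketched as a comparison of study attributes against a declared assessment context. The field names and the all/any decision rule below are illustrative assumptions, not part of the official CRED worksheet.

```python
def relevance_check(study: dict, context: dict) -> str:
    """Sketch of the relevance judgment: compare study attributes to the
    defined assessment context. Field names and the all/any rule are
    illustrative assumptions."""
    checks = {
        "organism": study["taxon"] == context["taxon"],
        "exposure": study["pathway"] == context["pathway"],
        "endpoint": study["endpoint"] in context["accepted_endpoints"],
    }
    if all(checks.values()):
        return "Relevant"
    if any(checks.values()):
        return "Partially relevant"
    return "Not relevant"

# Example context: freshwater PNEC derivation for fish
context = {"taxon": "fish", "pathway": "waterborne",
           "accepted_endpoints": {"mortality", "reproduction", "growth"}}
study = {"taxon": "fish", "pathway": "waterborne", "endpoint": "reproduction"}
verdict = relevance_check(study, context)  # → "Relevant"
```

Declaring the context first, as in step 1 of the procedure, is what makes the same study "relevant" for one assessment and "partially relevant" for another.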

CRED Evaluation Workflow: From Study to Usable Data

[Diagram: From the original ecotoxicity study (publication/report), an initial holistic review feeds parallel reliability (20 criteria) and relevance (13 criteria) evaluations, integrated into a fitness-for-purpose judgment: reliable and relevant data are suitable for regulatory use; reliable-with-restriction and (partially) relevant data are suitable with noted limitations; data that are not reliable or not relevant are not suitable.]

Integration of CRED into Regulatory Science Workflows

The principles embodied by CRED—transparency, consistency, and structured criteria—are increasingly reflected in the operational workflows of modern regulators. This integration occurs at the nexus of scientific evaluation and policy implementation.

Regulatory Integration Pathway for Standardized Methods

[Diagram: A scientific framework (e.g., the CRED method) and a regulatory need for consistency and predictability jointly drive method development and stakeholder consultation, followed by formal adoption into guidelines/standards, implementation by assessors and industry, and a review-and-feedback loop back into development.]

The pathway illustrates how a scientific framework like CRED responds to a regulatory need for consistency. Its development involved expert input and ring-testing [12] [13]. Formal adoption is evidenced by its promotion and tool distribution through authoritative bodies [12]. Implementation is facilitated by detailed assessment sheets and Excel tools for data visualization [12]. Finally, the framework is dynamic, with extensions like NanoCRED and EthoCRED [12] demonstrating an active feedback and refinement loop, similar to the ongoing consultations seen in EU and Swiss financial regulation [55] [53].

The Scientist's Toolkit: Essential Research Reagents & Materials

Conducting ecotoxicity studies that meet high reliability standards as evaluated by CRED requires careful selection of materials and methods. The following table outlines key reagent solutions and materials critical for generating robust data.

Table 2: Key Research Reagent Solutions for CRED-Aligned Ecotoxicity Studies

Item | Function in Ecotoxicity Testing | CRED Reliability Criteria Addressed | Considerations for Compliance
Certified Reference Material (CRM) | Provides an exact, traceable concentration of the test substance for preparing stock and calibration standards. | Test Substance Characterization: Ensures accurate dosing and verification of exposure concentrations [13]. | Must be accompanied by a certificate of analysis. Purity and stability should be documented.
Reconstituted Standardized Water | A chemically defined medium (e.g., following ISO or OECD guidelines) that provides consistent water quality for aquatic tests. | Exposure Conditions: Controls for confounding water chemistry variables (hardness, pH, ions) [13]. | Must be prepared following a validated protocol. Parameters (pH, conductivity, hardness) must be measured and reported.
Vital Stain (e.g., Neutral Red, Trypan Blue) | Used to distinguish live from dead cells in in vitro assays or to assess cell viability. | Endpoint Measurement: Provides an objective, quantifiable measure of cytotoxicity [13]. | Staining protocol must be standardized and reported. Validation against a relevant biological endpoint is recommended.
Enzyme-linked Immunosorbent Assay (ELISA) Kits | Quantifies specific biomarkers of effect (e.g., vitellogenin for endocrine disruption, stress proteins). | Endpoint Relevance: Measures ecologically meaningful sub-lethal effects at a molecular level [13]. | Kit validation for the test species must be confirmed. Positive and negative controls must be included.
Internal Standard (for analytical chemistry) | A chemically similar analog added in known amount to all samples to correct for losses during extraction and analysis. | Test Substance Verification: Critical for accurately measuring the actual concentration of the test substance in exposure media [13]. | Should be added at the earliest possible stage of sample preparation. Should not interfere with the analysis of the target substance.

The CRED methodology represents a significant advancement in the standardization of ecotoxicity data evaluation. Its structured, criteria-based approach directly addresses the need for transparency, consistency, and reduced bias in regulatory decision-making—a need that resonates with broader regulatory trends in the EU and Switzerland towards harmonization, risk-based oversight, and operational resilience [51] [50] [53]. While direct mentions of CRED in high-level financial regulatory texts are not present, the parallel is clear: both domains are moving towards more predictable, auditable, and scientifically robust processes.

The ongoing development of specialized CRED tools (NanoCRED, EthoCRED) and its extension to sediment and soil studies demonstrate its vitality and adaptability [12]. For researchers and regulatory professionals, mastering the CRED protocols is no longer just a scientific best practice but an essential skill for contributing to environmental regulations that are both protective and pragmatic. As regulatory sciences evolve, frameworks like CRED provide the necessary foundation for ensuring that the data informing our most critical decisions are of the highest possible reliability and relevance.

Positioning CRED Within the Broader Landscape of Evidence Evaluation Frameworks

The derivation of Predicted-No-Effect Concentrations (PNECs) and Environmental Quality Standards (EQSs) is a cornerstone of chemical hazard and risk assessment across global regulatory frameworks [13]. A fundamental prerequisite for this process is the evaluation of the reliability and relevance of available ecotoxicity studies. Historically, this evaluation has heavily relied on expert judgment, often employing the method established by Klimisch et al. in 1997 [18]. While a significant step forward, this method has been criticized for its lack of detailed guidance, leading to inconsistencies and potential bias when different assessors evaluate the same study [18]. Such inconsistencies can directly impact risk assessment outcomes, potentially resulting in inadequate environmental protection or unnecessary restrictions [18].

To address these critical shortcomings, the Criteria for Reporting and Evaluating Ecotoxicity Data (CRED) project developed a transparent, detailed, and structured evaluation method [13]. The primary objective of CRED is to improve the reproducibility, transparency, and consistency of reliability and relevance evaluations for aquatic ecotoxicity studies across different regulatory contexts and assessors [13]. By providing a robust alternative to the Klimisch method, CRED aims to strengthen the scientific foundation of environmental decision-making and promote the broader use of high-quality peer-reviewed data in regulatory dossiers [18] [2].

Comparative Analysis of Ecotoxicity Evaluation Frameworks

The CRED framework was designed to systematically address the gaps in earlier methods. A comparative overview highlights its evolution and enhanced structure.

Table 1: Comparison of Key Ecotoxicity Study Evaluation Frameworks

Framework (Year) | Primary Scope | Reliability Criteria | Relevance Criteria | Evaluation Output | Key Features & Limitations
Klimisch (1997) | General toxicity & ecotoxicity | 12-14 (ecotoxicity) | 0 | Reliability category only (4 tiers) | Lacks detailed guidance and relevance evaluation; prone to expert judgment bias; favors GLP/guideline studies [18].
CRED (2016) | Aquatic ecotoxicity | 20 (evaluation) / 50 (reporting) | 13 | Qualitative summary for both reliability and relevance | Detailed criteria with extensive guidance; includes specific reporting recommendations; validated via ring test [13] [18].
NanoCRED (2017) | Nanomaterial ecotoxicity | Nano-specific adaptations of CRED criteria | Nano-specific adaptations of CRED criteria | Integrated with CRED evaluation categories | Adds nano-specific guidance on characterization, dosimetry, and transformation processes for reliable testing [24].
EthoCRED (2024) | Behavioural ecotoxicity | 29 | 14 | Integrated with CRED evaluation categories | Extension for behavioural endpoints; provides specific criteria for experimental design, endpoint measurement, and confounding factors in behaviour [56] [57].
CREED (2023/24) | Environmental exposure datasets | 19 | 11 | Reliability, Relevance, and overall Usability categories | Framework for chemical monitoring data; includes "gateway" criteria and silver/gold scoring tiers for practical application [27].

The CRED evaluation method is explicitly structured around two pillars: reliability (the inherent quality and clarity of the study report) and relevance (the appropriateness of the data for a specific hazard or risk assessment purpose) [18]. This dual focus ensures that a study is not only well-conducted and reported but also fit-for-purpose in a given regulatory context.

Results from a comprehensive ring test involving 75 risk assessors from 12 countries demonstrated CRED's superiority over the Klimisch method. Participants found CRED to be more accurate, consistent, transparent, and less dependent on subjective expert judgment [18].

Table 2: Key Outcomes from the CRED vs. Klimisch Ring Test [18]

| Metric | Klimisch Method | CRED Evaluation Method | Implication |
| --- | --- | --- | --- |
| Perceived consistency | Lower | Higher | Reduces discrepancy between assessors. |
| Perceived transparency | Lower | Higher | Makes the evaluation rationale traceable. |
| Dependence on expert judgment | Higher | Lower | Limits subjective bias. |
| Handling of non-guideline studies | Often disadvantaged | Fairer, structured evaluation | Promotes inclusion of valid peer-reviewed data. |
| Time required for evaluation | Perceived as shorter | Slightly longer, but deemed worthwhile | The added time increases evaluation rigor. |

[Diagram: the Klimisch route produces a single four-tier reliability score under limited guidance, while the CRED route applies 20 reliability criteria (informed by detailed guidance and 50 reporting criteria) and 13 relevance criteria (informed by purpose-specific guidance); both routes feed the final integrated assessment for regulatory decision-making.]

Diagram 1: Logical workflow comparing the Klimisch and CRED evaluation frameworks.

Core Application Notes and Protocols for the CRED Method

The CRED Evaluation Protocol

The CRED evaluation is a stepwise, criteria-based process. The assessor systematically works through the 20 reliability and 13 relevance criteria, each supported by extensive guidance text [13] [18].

Protocol 1: Conducting a Standard CRED Evaluation for an Aquatic Ecotoxicity Study

  • Objective: To determine the reliability and relevance of a single aquatic ecotoxicity study for a defined regulatory purpose.
  • Materials: The CRED evaluation sheet (Excel tool), the study manuscript or report to be evaluated, access to CRED guidance documents [2].
  • Procedure:
    • Define Assessment Purpose: Clearly state the regulatory context (e.g., derivation of PNEC for freshwater, hazard classification).
    • Initial Screening: Confirm the study falls within CRED's scope (aquatic ecotoxicity).
    • Reliability Evaluation:
      • For each of the 20 reliability criteria, determine if it is Fully Met, Partially Met, Not Met, or Not Reported.
      • Criteria cover: Test substance characterization, test organism details, exposure conditions (duration, system, measurements), experimental design (controls, replicates, randomization), and data reporting & analysis (endpoints, statistics, raw data availability).
      • Document the rationale for each score, noting specific strengths or deficiencies.
    • Relevance Evaluation:
      • For each of the 13 relevance criteria, score based on the defined assessment purpose.
      • Criteria cover: Appropriateness of test organism, exposure pathway, measured endpoint, exposure duration in relation to the assessment goal, and environmental realism.
    • Integration & Conclusion:
      • Synthesize reliability scores. A study with all or most key criteria "Fully Met" is considered reliable. Major flaws lead to a "not reliable" conclusion.
      • Synthesize relevance scores to determine if the study is directly applicable, supportive, or not relevant for the purpose.
      • Generate a qualitative summary that transparently justifies the final assessment.
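
The scoring and integration steps of Protocol 1 can be sketched in code. The following is a minimal illustrative sketch, not the official CRED Excel tool: the criterion names, the `key` flag, and the decision rule (any key criterion scored "Not Met" is treated as a major flaw) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from enum import Enum

class Score(Enum):
    FULLY_MET = "fully met"
    PARTIALLY_MET = "partially met"
    NOT_MET = "not met"
    NOT_REPORTED = "not reported"

@dataclass
class Criterion:
    name: str
    score: Score
    key: bool            # flag for criteria treated as critical (illustrative)
    rationale: str = ""  # documented justification for the score

def reliability_conclusion(criteria):
    """Synthesize per-criterion scores into a qualitative conclusion.
    Decision rule is illustrative: any key criterion scored 'not met'
    counts as a major flaw."""
    if any(c.key and c.score is Score.NOT_MET for c in criteria):
        return "not reliable"
    if all(c.score is Score.FULLY_MET for c in criteria):
        return "reliable without restrictions"
    return "reliable with restrictions"

# Example: two of the 20 reliability criteria for a hypothetical study
evaluation = [
    Criterion("Test substance identity and purity reported",
              Score.FULLY_MET, key=True),
    Criterion("Exposure concentrations measured analytically",
              Score.PARTIALLY_MET, key=True,
              rationale="Measured at test start only"),
]
print(reliability_conclusion(evaluation))  # reliable with restrictions
```

Recording a rationale alongside each score mirrors CRED's requirement that the final assessment be transparently traceable to individual criteria.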

Protocol for Ring-Test Validation of Evaluation Methods

The development and validation of CRED involved a rigorous two-phase ring test, a methodology that serves as a model for validating any such framework [18].

Protocol 2: Ring-Test Methodology for Comparing Ecotoxicity Evaluation Frameworks

  • Objective: To compare the consistency, transparency, and user perception of two study evaluation methods (e.g., CRED vs. Klimisch).
  • Experimental Design: A two-phase, blinded, cross-over design where participants evaluate different studies with each method.
  • Materials: A set of 8-10 diverse ecotoxicity studies (peer-reviewed and guideline), evaluation kits for both methods, standardized perception questionnaires [18].
  • Procedure:
    • Participant Recruitment: Engage a large cohort (n>50) of risk assessors from regulatory agencies, industry, and academia across multiple countries [18].
    • Study Allocation: Randomly assign each participant two distinct studies in Phase I (Method A) and two different studies in Phase II (Method B). Ensure no overlap in evaluators per study within an institute to maintain independence [18].
    • Blinded Evaluation: Participants evaluate their assigned studies using the provided framework and guidance, without knowledge of others' evaluations.
    • Data Collection: Collect (a) the final reliability/relevance categorization for each study and (b) a completed questionnaire on the method's ease of use, clarity, consistency, and perceived time requirement.
    • Analysis:
      • Quantitative: Calculate the degree of agreement (e.g., percentage concordance) among assessors for each study and method. Lower variance indicates higher consistency.
      • Qualitative: Analyze questionnaire responses to compare user perception of the two methods.
  • Key Outcome from CRED Ring Test: The CRED method produced more consistent evaluations among assessors and was overwhelmingly preferred for its detail and transparency, despite a slightly higher time investment [18].
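
The quantitative analysis step above, inter-assessor agreement, can be sketched as a pairwise percentage-concordance calculation. The ratings below are hypothetical illustrations of the computation, not data from the actual CRED ring test.

```python
from itertools import combinations

def percent_concordance(ratings):
    """Pairwise percentage agreement among assessors who categorized
    the same study; lower variance (higher concordance) indicates a
    more consistent evaluation method."""
    pairs = list(combinations(ratings, 2))
    agreeing = sum(a == b for a, b in pairs)
    return 100.0 * agreeing / len(pairs)

# Hypothetical categorizations of one study by five assessors
klimisch = ["reliable", "not reliable", "reliable", "not assignable", "reliable"]
cred = ["reliable with restrictions"] * 4 + ["not reliable"]

print(round(percent_concordance(klimisch), 1))  # 30.0
print(round(percent_concordance(cred), 1))      # 60.0
```

Comparing these per-study concordance values across the two methods, aggregated over all studies and assessors, is one way to quantify the consistency difference the ring test reported.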

[Diagram: (1) participant recruitment (n=75 from 12 countries); (2) study selection and assignment (8 diverse ecotoxicity studies, 2 per participant per phase); (3) independent blinded evaluation, with Phase I using Method A (e.g., Klimisch) and Phase II using Method B (e.g., CRED); (4) data collection of categorization outputs and user-perception surveys; (5) analysis of inter-assessor agreement and method-preference metrics.]

Diagram 2: Two-phase ring test protocol for validating evaluation frameworks.

CRED in Regulatory Application and Protocol Implementation

CRED transitions from a theoretical framework to a practical tool through its integration into regulatory workflows and specific adaptation protocols.

Primary Regulatory Applications:

  • Derivation of Environmental Quality Standards (EQS): CRED is used to evaluate and select "key studies" for PNEC derivation under the EU Water Framework Directive and in national programs (e.g., Switzerland) [2].
  • Chemical Risk Assessment under REACH: Provides a transparent alternative for evaluating study reliability and relevance for registration dossiers [18] [2].
  • Pharmaceutical Environmental Risk Assessment (ERA): Considered for use in projects like the Intelligence-led Assessment of Pharmaceuticals in the Environment (iPiE) [2].
  • Data Screening for Databases: Employed by the NORMAN EMPODAT database and the EU Joint Research Centre's Literature Evaluation Tool to curate high-quality ecotoxicity data [2].

Protocol 3: Adapting CRED for Specialized Testing Domains (NanoCRED Example)

The core CRED principles are adaptable to novel challenges, such as testing engineered nanomaterials (ENMs), which present unique issues like particle characterization and agglomeration [24].

  • Objective: To evaluate the reliability and relevance of an ecotoxicity study performed with an engineered nanomaterial.
  • Modifications to Standard CRED Protocol:
    • Enhanced Test Substance Characterization (Reliability): Beyond chemical identity, criteria require detailed reporting of nanomaterial properties (size distribution, shape, surface charge, coating, purity) and characterization in the test medium (dispersion protocol, agglomeration state, dissolution over time) [24].
    • Exposure Verification (Reliability): Emphasizes the need for measured exposure concentrations throughout the test, as nominal concentrations are unreliable for ENMs, whose agglomeration, settling, and dissolution alter the effective exposure over time. Includes criteria for sampling and analytical methods [24].
    • Relevance Considerations: Evaluates if the test design (e.g., lighting, mixing) is appropriate for maintaining a stable and relevant exposure regime for the nanomaterial. Assesses the ecological relevance of the chosen endpoint in relation to potential nano-specific effects [24].
  • Outcome: The NanoCRED framework allows for the structured inclusion of valid non-guideline nanomaterial studies in risk assessment, addressing a critical data gap [24].

The Expanding CRED Ecosystem: Frameworks for Emerging Endpoints and Data Types

The CRED philosophy has spawned specialized extensions to address distinct scientific and regulatory needs, forming a cohesive ecosystem for evidence evaluation.

EthoCRED for Behavioural Ecotoxicology: Behavioural endpoints are highly sensitive but methodologically diverse. EthoCRED extends CRED with 29 reliability and 14 relevance criteria specific to behaviour [56] [57].

  • Key Protocol Additions: Criteria cover experimental setup (e.g., habitat complexity, acclimation), endpoint measurement (automated vs. manual, observer blinding), confounding factor control (e.g., circadian rhythms, environmental noise), and the ecological relevance of the behavioural trait [56] [57].

CREED for Exposure Datasets: Completing the risk assessment paradigm, Criteria for Reporting and Evaluating Exposure Datasets (CREED) applies the CRED logic to environmental monitoring data [27].

  • Novel Protocol Features: Introduces "gateway" criteria (e.g., analyte, location, date) for initial dataset screening. Implements a two-tiered (silver/gold) scoring system, where "silver" represents a practically usable dataset and "gold" an ideal one, enhancing real-world applicability [27].
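
The CREED two-stage logic, gateway screening followed by tiered scoring, can be sketched as follows. This is an illustrative sketch only: the gateway keys, criterion names, and the 75% silver threshold are assumptions for demonstration, not the official CREED scoring scheme.

```python
def creed_usability(dataset, gateway_keys=("analyte", "location", "date")):
    """Sketch of the CREED workflow: gateway pre-screen, then a tiered
    usability score. Thresholds are illustrative assumptions."""
    # Stage 1: gateway criteria -- all must be present, or the dataset
    # is excluded before any detailed evaluation takes place.
    if not all(dataset.get(k) for k in gateway_keys):
        return "excluded at gateway"
    # Stage 2: count the detailed reliability/relevance criteria met.
    met = sum(dataset["criteria_met"].values())
    total = len(dataset["criteria_met"])
    if met == total:
        return "gold"            # ideal dataset
    if met / total >= 0.75:      # illustrative threshold
        return "silver"          # practically usable dataset
    return "not usable"

# Hypothetical monitoring dataset for one analyte at one site
monitoring = {
    "analyte": "diclofenac",
    "location": "River Rhine, Basel",
    "date": "2022-06",
    "criteria_met": {"LOQ reported": True, "QA/QC described": True,
                     "sampling method given": True, "blanks analysed": False},
}
print(creed_usability(monitoring))  # silver
```

The gateway stage keeps obviously unusable datasets out of the labor-intensive detailed evaluation, which is the practical point of CREED's two-stage design.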

Table 3: The Scientist's Toolkit for Implementing CRED and Related Frameworks

Tool / Resource Function in CRED-Related Research Source / Example
CRED Excel Evaluation Tool The primary implement for systematically scoring criteria, documenting rationale, and generating an evaluation summary. Freely available from project websites [2] [12].
OECD Test Guidelines (e.g., 201, 210, 211) Provide the standardized methodological benchmark against which many CRED reliability criteria are assessed. OECD Publishing.
Reference Databases (e.g., NORMAN EMPODAT) Curated databases that use CRED or similar for quality control; sources of pre-evaluated data. NORMAN Network [2].
Guidance on Nanomaterial Characterization Essential supplementary documents for applying NanoCRED criteria (e.g., OECD guidance on sample prep). OECD, ISO, and project-specific guidelines [24].
CREED Excel Workbook Template Implements the CREED workflow for exposure data, from purpose definition to final usability scoring. Distributed by SETAC [27].

[Diagram: core CRED principles (reliability + relevance) branch into standard CRED for aquatic ecotoxicity, NanoCRED (adding nano-specific criteria), EthoCRED (adding behaviour-specific criteria), and CREED (applying the principles to exposure data); these in turn feed EQS/PNEC derivation, REACH dossier evaluation, chemical prioritization, and retrospective risk assessment.]

Diagram 3: The CRED ecosystem of specialized frameworks and their primary regulatory applications.

The CRED framework represents a paradigm shift from largely subjective, sparsely guided evaluation to a structured, transparent, and guidance-supported process. It occupies a central position in the broader evaluation landscape: it serves as the robust core upon which specialized extensions (NanoCRED, EthoCRED) are built and the inspiration for analogous frameworks for complementary data types (CREED) [12] [27].

For the broader thesis on CRED, its significance lies in providing a methodologically sound, empirically validated tool that reduces uncertainty in the foundational step of ecotoxicological risk assessment: data evaluation. By minimizing bias and inconsistency, CRED strengthens the scientific credibility of subsequent steps, from PNEC derivation to regulatory decision-making.

Future directions will likely involve the further integration of CRED and its derivatives into official regulatory guidance, the development of training programs to ensure competent application, and potential expansions into other areas (e.g., terrestrial ecotoxicology, microbiomics). The ongoing development and adoption of these frameworks are critical for achieving harmonized, transparent, and science-based protection of ecosystems globally.

Conclusion

The CRED method represents a significant evolution in ecotoxicity study evaluation, moving regulatory science toward greater transparency, consistency, and scientific rigor. By providing detailed, criterion-based guidance for assessing both reliability and relevance, it addresses the critical shortcomings of the Klimisch method and reduces undesirable variability in expert judgment. Its validation through international ring tests and ongoing integration into European regulatory guidance documents underscores its practical utility and acceptance. For biomedical and clinical researchers, particularly those developing pharmaceuticals with environmental considerations, mastering CRED is essential for ensuring their ecotoxicity data is robust and regulatory-ready. Future directions point toward the synergistic use of CRED with AI-assisted review tools to manage large evidence bases and its broader application in weight-of-evidence frameworks, ultimately strengthening the scientific foundation of environmental risk assessment and product stewardship.

References