This article provides a comprehensive overview of the Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool, a web-based application developed by the U.S. Environmental Protection Agency. Designed for researchers, scientists, and drug development professionals, the content covers the foundational principles of using protein sequence and structural conservation to extrapolate toxicity data across species. It details the methodological workflow from sequence alignment to protein structure modeling, explores practical applications through documented case studies, addresses common troubleshooting and optimization strategies, and validates the tool's predictions against empirical data. The article aims to equip scientists with the knowledge to efficiently utilize SeqAPASS for prioritizing chemical testing, selecting relevant model species, and supporting safety assessments in the context of reduced animal testing and New Approach Methodologies (NAMs).
Human and ecological hazard assessment of chemicals has traditionally relied on toxicity data generated from a limited number of laboratory model species. However, regulatory agencies face the formidable challenge of extrapolating these limited data to thousands of diverse species of potential concern, with documented differences in chemical sensitivity ranging from several-fold to over a thousand-fold [1]. This challenge, combined with decreasing testing resources, growing international interest in reducing animal testing, and increasing demands to evaluate chemicals more rapidly, has created a compelling need for innovative, scientifically based approaches to extrapolate toxicological data across taxa [2] [3].
The Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool emerged as a direct response to these challenges. Developed by the U.S. Environmental Protection Agency (US EPA) and first released publicly in 2016, SeqAPASS is a fast, freely available, online screening tool that enables researchers and regulators to extrapolate toxicity information across species by evaluating protein structural similarities and differences [3] [4]. The tool operates on the fundamental principle that conservation of a molecular target across species can serve as a critical line-of-evidence for predicting relative intrinsic susceptibility to chemicals that interact with that target [4].
The regulatory landscape has been rapidly evolving to support such approaches. In 2019, the US EPA Administrator issued a directive to eliminate mammalian regulatory and research studies completely by 2035, with associated funds allocated to develop alternative methods [2]. Similarly, the European Union's REACH (Registration, Evaluation, Authorisation and Restriction of Chemicals) regulations were amended in 2017 to establish animal testing as a last resort for filling data gaps [2]. These regulatory shifts have accelerated the development and adoption of New Approach Methodologies (NAMs) that can provide efficient, cost-effective toxicity evaluation while reducing reliance on whole-animal testing [2] [5].
Table 1: Key Regulatory Drivers for SeqAPASS Adoption
| Regulatory Driver | Implementing Body | Key Provision | Impact on Toxicology |
|---|---|---|---|
| EPA Directive 2035 [2] | U.S. Environmental Protection Agency | Eliminate mammalian regulatory studies by 2035 | Accelerates development of computational alternatives to animal testing |
| REACH Amendment [2] | European Union | Established animal testing as last resort for data gaps | Promotes use of alternative methods for chemical safety assessment |
| Cosmetic Testing Bans [2] | Multiple governments (EU, others) | Banned marketing of cosmetics tested on animals | Drives innovation in non-animal testing approaches |
The SeqAPASS tool was designed with flexibility to accommodate varying degrees of protein characterization, acknowledging that available information about chemical-protein interactions and molecular targets themselves can differ substantially [4]. To address this variability, the tool employs a tiered analytical approach consisting of three sequential levels of evaluation, each providing additional evidence for screening-level assessments of probable cross-species susceptibility [4] [1].
Level 1: Primary Amino Acid Sequence Comparison The initial analysis level compares the entire primary amino acid sequence of a query protein from a known sensitive species to all species with available sequence information in the National Center for Biotechnology Information (NCBI) protein database, which contains over 153 million proteins representing more than 95,000 organisms [3] [5]. Using BLASTp algorithms, the tool calculates metrics for sequence similarity and performs ortholog detection, establishing a foundational assessment of potential susceptibility across taxonomic groups [5] [4].
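The Level 1 similarity metric can be illustrated with a minimal Python sketch. It assumes that similarity is expressed as each hit's BLASTp bit score relative to the bit score of the query aligned to itself. That normalization, the function name `percent_similarity`, and the bit scores below are illustrative assumptions, not values or code taken from the tool.

```python
from typing import Dict


def percent_similarity(hit_bitscores: Dict[str, float],
                       query_self_bitscore: float) -> Dict[str, float]:
    """Express each hit's bit score as a percentage of the query's self-hit score."""
    if query_self_bitscore <= 0:
        raise ValueError("query self-alignment bit score must be positive")
    return {
        species: round(100.0 * score / query_self_bitscore, 2)
        for species, score in hit_bitscores.items()
    }


# Hypothetical bit scores, for illustration only.
example_hits = {"Homo sapiens": 1200.0, "Danio rerio": 780.0, "Apis mellifera": 150.0}
similarity = percent_similarity(example_hits, query_self_bitscore=1200.0)

# Rank species from most to least similar to the query protein.
ranked = sorted(similarity.items(), key=lambda item: item[1], reverse=True)
for species, value in ranked:
    print(f"{species}: {value}%")
```

Normalizing against the self-hit puts every species on a common 0 to 100 scale, which is what makes a single screening cutoff across taxonomic groups meaningful.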
Level 2: Functional Domain Evaluation The second analysis level provides greater resolution by examining sequence similarity within specific functional domains, such as ligand-binding domains, rather than the entire protein sequence [5] [4]. This approach recognizes that chemicals often interact with specific protein regions rather than the entire protein structure, offering more precise predictions of susceptibility that can distinguish differences among broader taxonomic groups [1].
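To make the difference between whole-protein and domain-level comparison concrete, the sketch below computes percent identity over a chosen residue range of a pairwise alignment. It is a simplified stand-in for a Level 2 comparison, not the tool's own scoring; the function name, the toy sequences, and the domain boundaries are all hypothetical.

```python
def domain_percent_identity(aligned_query: str, aligned_hit: str,
                            start: int, end: int) -> float:
    """Percent identity over query residues start..end (1-based, inclusive).

    Both strings are rows of the same pairwise alignment, with '-' marking gaps.
    """
    if len(aligned_query) != len(aligned_hit):
        raise ValueError("aligned sequences must have equal length")
    position = 0   # residue number in the ungapped query
    compared = 0   # query residues that fall inside the domain
    identical = 0  # of those, how many the hit matches exactly
    for q, h in zip(aligned_query, aligned_hit):
        if q == "-":
            continue  # insertion in the hit: no query residue in this column
        position += 1
        if start <= position <= end:
            compared += 1
            if q == h:
                identical += 1
    if compared == 0:
        raise ValueError("domain range does not overlap the query sequence")
    return 100.0 * identical / compared


# Toy alignment, for illustration only.
query_row = "MKT-LLVAGDE"
hit_row = "MRTALLVSG-E"
whole_protein = domain_percent_identity(query_row, hit_row, 1, 10)
binding_domain = domain_percent_identity(query_row, hit_row, 4, 8)
print(whole_protein, binding_domain)
```

In this toy case the hypothetical domain is better conserved than the protein as a whole, which is the situation in which a domain-level comparison adds information beyond Level 1.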
Level 3: Critical Amino Acid Residue Comparison The most granular level of analysis compares individual amino acid residue positions identified as critical for protein conformation, chemical binding, or other key functions [4] [1]. This highest-resolution evaluation can detect species-specific differences in chemical susceptibility that might be masked in broader sequence comparisons, potentially explaining dramatic differences in sensitivity observed between closely related species [1].
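A residue-level comparison of this kind can be sketched as follows. The sketch assumes a simple three-way call (identical, similar side-chain class, or different) and uses a coarse textbook grouping of the 20 standard amino acids. The grouping, the function names, and the residue positions are illustrative assumptions and do not reproduce the tool's own matching rules.

```python
from typing import Dict

# Coarse side-chain grouping of the 20 standard amino acids (illustrative only).
SIDE_CHAIN_CLASS = {
    **dict.fromkeys("AVLIMFWPG", "nonpolar"),
    **dict.fromkeys("STCYNQ", "polar"),
    **dict.fromkeys("KRH", "basic"),
    **dict.fromkeys("DE", "acidic"),
}


def compare_residue(query_residue: str, hit_residue: str) -> str:
    """Classify one aligned residue pair as identical, similar, or different."""
    if query_residue == hit_residue:
        return "identical"
    query_class = SIDE_CHAIN_CLASS.get(query_residue)
    hit_class = SIDE_CHAIN_CLASS.get(hit_residue)  # None for gaps or unknown letters
    if query_class is not None and query_class == hit_class:
        return "similar"
    return "different"


def compare_critical_residues(query_residues: Dict[int, str],
                              hit_residues: Dict[int, str]) -> Dict[int, str]:
    """Compare residues at each critical query position; a missing position counts as a gap."""
    return {
        position: compare_residue(residue, hit_residues.get(position, "-"))
        for position, residue in query_residues.items()
    }


# Hypothetical critical positions and residues, for illustration only.
query_sites = {101: "L", 145: "R", 210: "D"}
hit_sites = {101: "L", 145: "K", 210: "A"}
site_calls = compare_critical_residues(query_sites, hit_sites)
print(site_calls)
```

A single "different" call at a position known to matter for binding is the kind of signal that broader Level 1 and Level 2 comparisons can mask.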
SeqAPASS Three-Tiered Analytical Workflow
Since its initial release in 2016, SeqAPASS has undergone substantial evolution through regular version updates, each introducing enhanced capabilities and features. The tool's development has been characterized by responsive adaptation to user needs and technological advancements [5].
Table 2: SeqAPASS Version Evolution and Key Features [5]
| Version | Release Date | Key Features and Enhancements |
|---|---|---|
| 1.0 | January 2016 | Initial public release with Level 1 and Level 2 analyses |
| 2.0 | May 2017 | Added Level 3 amino acid residue comparisons |
| 3.0 | March 2018 | Integrated interactive data visualization capabilities |
| 4.0 | October 2019 | Added interoperability with ECOTOX Knowledgebase |
| 5.0 | December 2020 | Introduced customizable heat map visualization |
| 6.0 | September 2021 | Implemented widget connecting to ECOTOX empirical data |
| 8.0 | Recent | Added protein structure generation across species |
The tool's interoperability with other data resources significantly enhances its utility for comprehensive assessment. SeqAPASS integrates with the CompTox Chemicals Dashboard, allowing results from ToxCast assay targets to be extrapolated across species [3]. Additionally, the ECOTOX Knowledgebase widget enables users to rapidly connect sequence-based predictions of chemical susceptibility to existing curated empirical toxicity data for terrestrial and aquatic species [5] [6].
The Endocrine Disruptor Screening Program (EDSP) faces the challenge of evaluating over 10,000 chemicals for potential effects on the endocrine system across diverse species. SeqAPASS has been employed to determine the degree to which data generated for chemical activation in mammalian systems can be translated to non-mammalian vertebrates, including fish, amphibians, and birds [3]. By comparing the conservation of the human estrogen receptor across these taxonomic groups, researchers obtained critical information to prioritize testing and assess both human health and ecological risks of estrogenic chemicals, demonstrating how mechanistic data can be strategically applied to focus limited testing resources [3].
The decline in honey bee colonies has raised significant concerns about ecosystem health and agricultural productivity. SeqAPASS has been utilized to evaluate the potential chemical susceptibility of honey bees and other insect pollinators by examining the conservation of protein targets like the nicotinic acetylcholine receptor [3] [4]. In a complementary application, the tool has helped explain the species selectivity of molt-accelerating insecticides by comparing the ecdysone receptor (EcR) between target pest species and non-target organisms [3] [4]. These analyses demonstrated that the EcR is well conserved among arthropods but exhibits sufficient sequence variation in specific functional domains to enable the design of insecticides that selectively target pests while minimizing impacts on beneficial insects [4].
Protecting threatened and endangered species from chemical exposures presents particular challenges, as traditional toxicity testing is rarely feasible with these populations. SeqAPASS provides a valuable approach to address this data gap by predicting protein target conservation and potential chemical susceptibility for species of conservation concern [3] [5]. The tool's data visualization features include specific options to highlight threatened and endangered species, enabling regulators to incorporate these considerations into chemical risk assessments even when empirical toxicity data are unavailable [5].
Cross-Species Extrapolation Concept
Objective: To perform primary amino acid sequence comparison (Level 1) and functional domain evaluation (Level 2) for cross-species susceptibility prediction [5] [7].
Methodology:
Access and Authentication
Protein Identification
Level 1 Analysis Initiation
Level 1 Data Interpretation
Level 2 Analysis Development
Level 2 Data Visualization
Objective: To compare individual amino acid residue positions of importance for protein-chemical interaction across species [5] [1].
Methodology:
Literature Review and Residue Identification
Level 3 Analysis Setup
Taxonomic Selection
Level 3 Data Compilation and Interpretation
Amino Acid Position Alignment
Heat Map Generation for Level 3 Data:
Decision Summary Report Compilation:
Table 3: Essential Research Materials and Computational Tools for SeqAPASS Analysis
| Resource Category | Specific Tool/Database | Function in Analysis | Access Point |
|---|---|---|---|
| Protein Databases | NCBI Protein Database | Provides 153+ million protein sequences across 95,000+ organisms for comparative analysis | https://www.ncbi.nlm.nih.gov/protein [3] |
| Computational Algorithms | BLASTp (Protein Basic Local Alignment Search Tool) | Calculates primary amino acid sequence similarity metrics for Level 1 analysis | Integrated in SeqAPASS [5] |
| Domain Identification | NCBI Conserved Domain Database | Identifies functional protein domains for Level 2 analysis | Integrated in SeqAPASS [5] |
| Toxicity Data Integration | ECOTOX Knowledgebase | Provides curated empirical toxicity data for comparison with sequence-based predictions | https://cfpub.epa.gov/ecotox/ [5] |
| Chemical Prioritization | CompTox Chemicals Dashboard | Supports identification of protein targets and chemical interactions | https://comptox.epa.gov/dashboard [3] |
SeqAPASS represents a transformative approach in modern toxicology, effectively addressing the critical challenge of cross-species extrapolation through innovative bioinformatics. By leveraging publicly available protein sequence information and applying a tiered analytical framework, the tool enables rapid, cost-effective predictions of chemical susceptibility across broad taxonomic groups. Its development and ongoing refinement reflect the evolving regulatory landscape that increasingly prioritizes mechanistically-oriented, animal-free testing methodologies. As a freely available, web-based application, SeqAPASS provides both researchers and regulators with a powerful platform to support chemical prioritization, inform species selection for testing, and advance the application of the Adverse Outcome Pathway framework, ultimately contributing to more efficient and protective chemical safety assessments for both human health and ecological systems.
The fundamental principle underlying cross-species chemical susceptibility hinges on the degree of conservation of specific protein targets with which chemicals interact. A species' intrinsic susceptibility to a particular chemical is largely determined by the presence and functional conservation of these protein targets, which, when bound, can disrupt vital biological processes leading to adverse effects on survival, growth, and reproduction [3]. The Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool translates this principle into an actionable, computational method. It is a fast, freely available, online screening tool that enables researchers and regulators to extrapolate toxicity information from data-rich model species to thousands of other species for which toxicity data are limited or absent [5] [3]. By evaluating protein sequence and structural conservation, SeqAPASS provides a scientifically robust framework for predicting relative intrinsic chemical susceptibility across the tree of life.
At its core, the interaction between a chemical and a biological system is often highly specific. Many chemicals, including pharmaceuticals and pesticides, exert their effects, both intended and unintended, by interacting with specific protein molecules, such as receptors, enzymes, or transporters. The presence of a specific protein target, coupled with a sufficient degree of structural compatibility at the interaction site, is a primary determinant of a chemical's effect in an organism [3]. Consequently, the diversity of responses observed across different species to the same chemical can frequently be traced back to differences in the amino acid sequences of these protein targets. Even a single amino acid substitution at a critical position within a chemical-binding pocket can dramatically alter the binding affinity and, hence, the species' susceptibility [5].
The SeqAPASS tool operationalizes this principle by leveraging the vast and publicly available protein sequence information from the National Center for Biotechnology Information (NCBI) database, which contains information on over 153 million proteins representing more than 95,000 organisms [3]. The tool uses a tiered approach to evaluate protein conservation, moving from broad, sequence-level comparisons to more refined, structure-based analyses [5] [3]. This multi-level evaluation allows users to capitalize on existing knowledge about chemical-protein interactions, making the tool both flexible and powerful for cross-species extrapolation.
The following protocol details the application of the SeqAPASS tool for predicting cross-species chemical susceptibility, using a known sensitive species and its protein target as a starting point.
Table 1: Essential Research Reagents and Computational Tools for SeqAPASS Analysis
| Item Name | Function/Description | Source/Example |
|---|---|---|
| Query Protein Sequence | The amino acid sequence of the protein target from a known sensitive species. Serves as the reference for all comparisons. | Can be obtained as a FASTA file or NCBI Protein Accession from databases like NCBI Protein. |
| SeqAPASS Online Tool | The web-based platform that performs the multi-level computational analysis. | Freely accessible at https://seqapass.epa.gov/seqapass [5]. |
| NCBI Protein Database | The comprehensive source of protein sequence data used by SeqAPASS for cross-species comparisons. | National Center for Biotechnology Information (NIH) [3]. |
| Chemical of Interest | The specific compound for which susceptibility is being predicted. Understanding its mode of action is critical. | e.g., a pesticide, pharmaceutical, or environmental contaminant. |
| External Database Links | Resources to help identify the initial query protein and its critical residues. | Integrated within SeqAPASS; e.g., CompTox Chemicals Dashboard, AOP-Wiki [5]. |
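Because the query protein is typically supplied as a FASTA sequence or an accession, it can help to inspect the sequence locally before submitting it. The following is a minimal, dependency-free sketch of reading FASTA text. The function name and the record identifier `QUERY_0001` are hypothetical, and the sequence is a toy example rather than a real protein.

```python
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into {record identifier: sequence}."""
    records: Dict[str, List[str]] = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # The identifier is the first whitespace-delimited token of the header.
            current = (line[1:].split() or ["unnamed"])[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)  # sequence may wrap across lines
    return {name: "".join(chunks) for name, chunks in records.items()}


# Toy FASTA record with a hypothetical identifier, for illustration only.
example_fasta = """>QUERY_0001 hypothetical receptor [Example species]
MKTLLVAGDE
LLVAGDEMKT
"""
sequences = parse_fasta(example_fasta)
query_sequence = sequences["QUERY_0001"]
print(len(query_sequence), query_sequence)
```

Checking the sequence length and identifier up front makes it easier to confirm later that residue positions taken from the literature refer to the same query sequence.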
The first level of analysis provides a broad screening of protein conservation.
This level refines the prediction by focusing on the specific regions of the protein essential for its function.
The most precise level of analysis, Level 3, requires specific knowledge of the chemical-protein interaction.
The following diagram illustrates the logical workflow and decision-making process across the three tiers of SeqAPASS analysis:
SeqAPASS versions 5.0 and above include advanced data synthesis features [5].
Table 2: Summary of SeqAPASS Applications in Toxicology
| Research Area | Protein Target Example | SeqAPASS Utility |
|---|---|---|
| Endocrine Disruption | Estrogen Receptor, Androgen Receptor | Prioritize testing of chemicals across vertebrate and invertebrate species. |
| Pesticide Development & Ecotoxicology | Ecdysone Receptor, Nicotinic Acetylcholine Receptor | Understand selective toxicity and assess risks to non-target pollinators and insects. |
| Pharmaceutical Safety | Opioid Receptors, Transthyretin | Predict potential adverse drug reactions in humans and veterinary species. |
| Chemical Safety for Endangered Species | Various enzyme targets | Make informed decisions for species where empirical testing is not feasible. |
The following diagram outlines the core computational workflow that occurs within the SeqAPASS tool after a user submits a job, illustrating how the backend data and algorithms interact to produce a result.
The SeqAPASS tool represents a significant advancement in predictive toxicology and embodies the "3Rs" principle (Replace, Reduce, Refine) by minimizing reliance on whole-animal testing [5] [8] [9]. Its major strengths include its robustness, leveraging constantly updated public databases; its flexibility in accommodating different levels of prior knowledge; and its interoperability with other resources like the CompTox Chemicals Dashboard and ECOTOX Knowledgebase [5] [3].
However, users must be aware of its domain of applicability. SeqAPASS specifically evaluates intrinsic susceptibility based on protein target conservation. It does not directly address other critical factors governing toxic outcomes in whole organisms, such as ADME (Absorption, Distribution, Metabolism, and Excretion). A species may possess a conserved protein target but not be susceptible in practice due to differences in metabolism that rapidly detoxify the chemical, or due to an impermeable barrier preventing the chemical from reaching the target [8]. Therefore, SeqAPASS predictions are most powerful when used as a screening-level line of evidence within a broader weight-of-evidence assessment that considers additional toxicokinetic and physiological data.
The fundamental principle that protein target conservation determines chemical susceptibility provides a powerful lens through which to view cross-species extrapolation. The SeqAPASS tool effectively applies this principle, offering researchers and regulators a sophisticated, computationally-driven method to predict chemical susceptibility for thousands of species. Its tiered protocol allows for screening-level assessments to highly refined investigations, making it an indispensable resource in the modern toxicologist's toolkit for supporting chemical safety evaluations, prioritizing testing efforts, and protecting human health and the environment.
In toxicology, ecology, and drug development, a significant challenge arises from the stark disparity in available toxicity data across different species. For humans and for well-established model organisms such as mice, rats, and zebrafish, a wealth of toxicological information exists. In contrast, for the vast majority of other plants and animals, toxicity data are extremely limited or non-existent [3]. This creates a critical data gap, hindering accurate risk assessments for pharmaceuticals, pesticides, and environmental contaminants across the full spectrum of biodiversity. Traditional whole-animal testing is not only resource-intensive and costly but also ethically questionable, especially for threatened or endangered species. This reality has accelerated the paradigm shift towards computational predictive methods that can maximize the use of existing data from data-rich species to make reliable predictions about data-poor species [5].
The fundamental premise for bridging this gap is evolutionary conservation. The susceptibility of a species to a particular chemical is often determined by the presence and specific structure of proteins that interact with that chemical once it enters the body. If the protein target of a chemical is highly conserved across species, the susceptibility observed in a model organism can be extrapolated to others [4]. The Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool, developed by the U.S. Environmental Protection Agency, is a web-based application designed to operationalize this principle. It provides a fast, online screening tool that allows researchers and regulators to extrapolate toxicity information across thousands of species by evaluating the conservation of known protein targets [3] [10].
SeqAPASS is a publicly available, freely accessible online tool that leverages vast public repositories of protein sequence information to predict chemical susceptibility across species. Its core function is to evaluate the conservation of a known protein target, such as a receptor or enzyme, from a species with documented chemical sensitivity (the "query") across all other species with available protein sequence data in the National Center for Biotechnology Information (NCBI) database, which contains over 153 million proteins representing more than 95,000 organisms [3] [11].
The tool is designed with a flexible, tiered approach that accommodates varying degrees of available information about the chemical-protein interaction. This multi-level evaluation allows users to refine their assessments, moving from broad, screening-level predictions to more precise, high-resolution analyses [4] [5]. A key strength of SeqAPASS is its interoperability with other databases. It can be directly linked to the EPA's CompTox Chemicals Dashboard to help identify query proteins and to the ECOTOX Knowledgebase, allowing users to compare sequence-based susceptibility predictions with existing empirical toxicity data [11] [5]. Since its initial release in 2016, SeqAPASS has undergone continuous refinement, with annual version releases incorporating new features, updated data, and enhanced visualization capabilities based on active user feedback [5].
Table 1: Key Features and Capabilities of the SeqAPASS Tool
| Feature | Description | Utility for Researchers |
|---|---|---|
| Data Source | NCBI protein database (massive and continuously updated) | Access to a comprehensive and current knowledge base for protein sequences. |
| Three-Level Analysis | Primary sequence, functional domain, and critical residue comparisons. | Provides flexibility to perform analyses with variable levels of prior knowledge. |
| Interoperability | Links to CompTox Chemicals Dashboard and ECOTOX Knowledgebase. | Facilitates query protein identification and validation of predictions with empirical data. |
| Data Visualization | Customizable box plots, heat maps, and summary reports. | Enables rapid interpretation of results and generation of publication-quality graphics. |
| Output | Downloadable data tables, visualizations, and a comprehensive summary report (.pdf). | Streamlines data synthesis and reporting for risk assessments and scientific publications. |
The predictive power of SeqAPASS is rooted in its three-level analytical workflow, which progresses from a broad whole-protein comparison to a focused inspection of individual amino acid residues. This hierarchical structure ensures that the tool is both accessible for novice users and powerful enough for advanced research.
The first and most fundamental level of analysis involves comparing the entire primary amino acid sequence of the query protein against all available protein sequences in the database. The tool uses a standalone version of the Protein Basic Local Alignment Search Tool (BLASTp) to perform this alignment [5]. It calculates a metric for overall sequence similarity and identifies potential orthologs, that is, proteins in different species that evolved from a common ancestral gene and typically retain the same function. A high degree of sequence similarity at this level suggests that the protein target is present in the evaluated species and provides an initial, screening-level line of evidence for potential chemical susceptibility [4].
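One widely used heuristic for flagging ortholog candidates is the reciprocal best hit test: the query's best hit in another species should, when searched back against the query species, return the original query. The sketch below illustrates that heuristic only. Whether it matches the tool's internal procedure is not stated here, and all identifiers (`hsa_ESR1`, `dre_esr1`, and so on) are hypothetical labels rather than real accessions.

```python
from typing import Dict


def reciprocal_best_hits(forward_best: Dict[str, str],
                         reverse_best: Dict[str, str],
                         query_id: str) -> Dict[str, bool]:
    """Flag species whose best hit points back to the query protein.

    forward_best: species -> identifier of the query's best hit in that species.
    reverse_best: hit identifier -> identifier of its best hit in the query species.
    """
    return {
        species: reverse_best.get(hit_id) == query_id
        for species, hit_id in forward_best.items()
    }


# Hypothetical identifiers, for illustration only.
forward = {"Danio rerio": "dre_esr1", "Apis mellifera": "ame_nr_x"}
reverse = {"dre_esr1": "hsa_ESR1", "ame_nr_x": "hsa_ERR1"}
ortholog_candidates = reciprocal_best_hits(forward, reverse, query_id="hsa_ESR1")
print(ortholog_candidates)
```

A hit that fails the reciprocal test may still be a related protein, so a flag of this kind is best read alongside the similarity metric rather than in place of it.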
The second level of analysis offers greater resolution by focusing on specific functional domains of the protein. Not all regions of a protein are equally important for its interaction with a chemical. For instance, a chemical may bind specifically to a ligand-binding domain (LBD) or an active site. Level 2 analysis evaluates sequence similarity specifically within these user-selected or predefined domains [3] [4]. This is particularly useful when the entire protein sequence is not well-conserved, but the critical functional domain is. A species may be deemed susceptible if its functional domain is highly similar to that of the sensitive query species, even if the overall protein sequence similarity is lower.
The third and most precise level of analysis investigates the conservation of individual amino acid residues known to be critical for the protein's interaction with the chemical. These residues may be involved in forming hydrogen bonds, engaging in hydrophobic interactions, or contributing to the overall three-dimensional structure of the binding pocket. Differences in a single critical residue can be enough to abolish chemical binding and confer resistance [4] [5]. Level 3 allows users to input the positions of these critical residues from the query sequence. SeqAPASS then generates a customizable heat map visualization showing the alignment of these specific residues across species of interest, providing a high-resolution prediction of susceptibility.
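Because critical residues are specified by their position in the query sequence, a Level 3 style comparison first has to translate those positions into alignment columns, since gaps shift the numbering. The sketch below shows that bookkeeping and builds a small residue table of the kind a heat map would display. The function names, the toy alignment, and the chosen positions are illustrative assumptions.

```python
from typing import Dict, List


def query_position_to_column(aligned_query: str) -> Dict[int, int]:
    """Map 1-based query residue numbers to 0-based alignment columns."""
    mapping: Dict[int, int] = {}
    position = 0
    for column, residue in enumerate(aligned_query):
        if residue != "-":  # gap columns do not consume a query position
            position += 1
            mapping[position] = column
    return mapping


def residue_table(alignment: Dict[str, str], query_name: str,
                  positions: List[int]) -> Dict[str, List[str]]:
    """For each aligned sequence, list the residues aligned to the query positions."""
    columns = query_position_to_column(alignment[query_name])
    return {
        name: [row[columns[position]] for position in positions]
        for name, row in alignment.items()
    }


# Toy multiple sequence alignment, for illustration only.
toy_alignment = {
    "query": "MK-TLLVAG",
    "species_a": "MKATLLVSG",
    "species_b": "MR-T-LVAG",
}
critical_positions = [2, 5, 7]
table = residue_table(toy_alignment, "query", critical_positions)
for name, residues in table.items():
    print(f"{name:10s} " + " ".join(residues))
```

Each row of the resulting table corresponds to one species and each column to one critical position, which is the layout a residue heat map presents graphically.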
The following workflow diagram illustrates the logical progression through these three tiers of analysis within the SeqAPASS tool.
This section provides a detailed, step-by-step protocol for using the SeqAPASS tool, from initial setup to data interpretation, followed by specific case studies demonstrating its practical application.
1. Getting Started and Account Creation
Access the tool at https://seqapass.epa.gov/seqapass using the Chrome web browser for optimal compatibility [5].
2. Pre-Analysis: Identifying the Query Protein
3. Developing and Running a Level 1 Query
4. Refining the Analysis: Level 2 and Level 3
5. Data Synthesis and Interpretation
Table 2: Summary of Key Case Studies Applying the SeqAPASS Tool
| Case Study | Query Protein (Sensitive Species) | Chemical Class | Key Finding |
|---|---|---|---|
| Pollinator Risk | Nicotinic acetylcholine receptor (Honey bee) | Neonicotinoid insecticides | Predicted potential susceptibility in many other bee species and insects, informing ecological risk assessments [3]. |
| Endocrine Disruption | Estrogen receptor (Human) | Estrogenic chemicals | Determined the degree to which mammalian estrogen receptor data can be translated to fish, amphibians, and birds for the Endocrine Disruptor Screening Program [3]. |
| Insect Molting | Ecdysone receptor (Tobacco budworm) | Molt-accelerating compounds | Confirmed the mechanism of selective toxicity, showing why these compounds are toxic to larval pests but not to non-targets like honey bees and earthworms [3]. |
| Fungicide Selectivity | Cytochrome b (Fungi) | Strobilurin fungicides | Demonstrated a lack of binding site conservation in non-target species, explaining the fungicides' selective toxicity [4]. |
Successfully applying the SeqAPASS tool and validating its predictions requires a suite of informational and material resources. The following table details key components of this research toolkit.
Table 3: Research Reagent Solutions for Cross-Species Extrapolation
| Reagent/Resource | Function/Description | Example Sources/Tools |
|---|---|---|
| Query Protein Sequence | The amino acid sequence of the protein target from a known sensitive species; serves as the baseline for all comparisons. | National Center for Biotechnology Information (NCBI) Protein Database [3] [5]. |
| Chemical-Protein Interaction Data | Information on functional domains and critical amino acid residues essential for high-resolution (Level 3) analysis. | Scientific literature, crystallographic databases (e.g., Protein Data Bank), AOP-Wiki [4] [5]. |
| Taxonomic Information | A structured classification system that allows for the organization and interpretation of results across species. | Integrated Taxonomic Information System (ITIS), NCBI Taxonomy [5]. |
| Empirical Toxicity Data | Experimental data used to validate SeqAPASS predictions of susceptibility. | EPA ECOTOX Knowledgebase [11]. |
| BLAST+ and COBALT Executables | The underlying algorithms used by SeqAPASS for sequence alignment and comparison; updated regularly with new tool versions. | National Institutes of Health (NIH) [5]. |
The challenge of extrapolating toxicity data from data-rich to data-poor species is a significant bottleneck in ecological risk assessment and drug development. The SeqAPASS tool represents a powerful, innovative solution to this problem. By leveraging publicly available protein sequence data and a flexible, multi-level analytical framework, it provides researchers with a rational, evidence-based method to predict cross-species susceptibility. Its applications in predicting chemical risks to pollinators, understanding endocrine disruption across vertebrates, and confirming the selective toxicity of pesticides and fungicides underscore its utility and reliability.
As the volume of genetic and protein data continues to grow, and as computational tools like SeqAPASS become more sophisticated and integrated with other data sources, the vision of a comprehensive, predictive toxicology framework that minimizes animal testing and rapidly protects human health and the environment comes closer to reality. For researchers and drug development professionals, mastering tools like SeqAPASS is becoming essential for conducting cutting-edge, efficient, and ecologically relevant safety assessments.
The landscape of chemical safety and drug development is undergoing a fundamental transformation, driven by scientific advancement and regulatory change. The FDA Modernization Act 2.0, signed into law in December 2022, represents a pivotal shift: it amended the 1938 Federal Food, Drug, and Cosmetic Act, removing that act's mandate for animal testing in every new drug development protocol [12]. This legislative change opens the door for advanced, human-relevant tools, collectively known as New Approach Methodologies (NAMs), to replace, reduce, and refine traditional animal testing [13] [14].
Within this new framework, the Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool emerges as a critical bioinformatics platform for addressing one of the most persistent challenges in toxicology and risk assessment: extrapolating chemical effects across diverse species. SeqAPASS is a fast, online screening tool that allows researchers and regulators to extrapolate toxicity information from data-rich model organisms to thousands of other non-target species with limited or no toxicity data [3]. By evaluating protein sequence and structural similarities, SeqAPASS provides a scientifically robust method for predicting cross-species susceptibility, consistent with the FDA's evolving roadmap for modernized safety assessment [15].
SeqAPASS leverages the vast biological data available in the National Center for Biotechnology Information (NCBI) protein database, which contains information on over 153 million proteins representing more than 95,000 organisms [3]. The tool's power lies in its flexible, three-tiered analytical approach, which allows researchers to capitalize on existing information about chemical-protein interactions:
Table 1: SeqAPASS Tiered Analysis Framework
| Analysis Level | Comparison Focus | Key Outputs | Application Context |
|---|---|---|---|
| Level 1 | Primary amino acid sequence | Sequence similarity metrics, ortholog detection | Initial screening for potential susceptibility |
| Level 2 | Functional protein domains | Domain conservation across species | Refined analysis focusing on functional regions |
| Level 3 | Specific amino acid residues | Residue-level conservation | High-resolution analysis for critical binding sites |
This multi-tiered approach provides increasing evidence to support rapid, screening-level assessments of probable cross-species susceptibility, enabling more informed chemical prioritization and species selection for testing [4].
Protocol Title: Standardized Workflow for Cross-Species Susceptibility Prediction Using SeqAPASS
Principle: This protocol describes a systematic approach for using SeqAPASS to evaluate potential chemical susceptibility across species by analyzing conservation of protein targets.
Materials and Reagents:
Procedure:
Level 1 Analysis (Primary Sequence Alignment)
Level 2 Analysis (Functional Domain Evaluation)
Level 3 Analysis (Critical Residue Assessment)
Data Integration and Interpretation
Background: The Endocrine Disruptor Screening Program (EDSP) faces the challenge of evaluating over 10,000 chemicals for potential effects on the endocrine system across diverse species. SeqAPASS has been employed to determine the degree to which data generated for chemical activation in mammalian systems (e.g., the human estrogen receptor) can be translated to non-mammalian species such as fish, amphibians, and birds.
Experimental Approach:
Results and Regulatory Impact: The analysis revealed significant conservation of the estrogen receptor ligand-binding domain across vertebrate species, providing a scientific basis for extrapolating estrogenic activity data from mammalian models to ecological receptors. This approach has helped prioritize testing resources and inform the human health and ecological risk assessment of estrogenic chemicals [3].
Background: The decline in honey bee colonies has raised concerns about the role of chemical exposures, particularly neonicotinoid insecticides that target nicotinic acetylcholine receptors (nAChRs). SeqAPASS was used to evaluate the potential chemical susceptibility of honey bees compared to target pest species.
Experimental Approach:
Key Findings: The analysis identified key differences in specific residue positions between honey bees and target pests, explaining differential sensitivity to neonicotinoid insecticides. These findings supported the development of more selective insecticide candidates that maintain efficacy against pests while reducing risks to pollinators [4] [3].
Table 2: Summary of SeqAPASS Case Study Applications
| Application Area | Target Protein | Key Species Compared | Regulatory Impact |
|---|---|---|---|
| Endocrine Disruption | Estrogen receptor | Human, fish, amphibians, birds | Informed testing priorities for EDSP |
| Insecticide Development | Nicotinic acetylcholine receptor | Honey bees, pest insects | Supported pollinator risk assessments |
| Molting Disruption | Ecdysone receptor | Budworms, honey bees, earthworms | Validated species selectivity of insecticides |
| Fungicide Safety | Cytochrome b | Fish, birds, mammals | Informed ecological risk assessment for strobilurin fungicides |
The FDA Modernization Act 2.0 represents more than just a policy change; it signals a fundamental reorientation toward human-relevant, mechanistic toxicology. SeqAPASS aligns perfectly with this new paradigm through several key attributes:
The FDA's 2025 "Roadmap to Reducing Animal Testing in Preclinical Safety Studies" establishes an ambitious framework for transitioning to NAMs over a 3-5 year period. This roadmap:
Within this framework, SeqAPASS serves as a critical tool for addressing species relevance questions, particularly for biologics where traditional animal models often show limited predictivity for human responses.
Table 3: Essential Research Resources for SeqAPASS and Cross-Species Research
| Tool/Resource | Type | Primary Function | Access Information |
|---|---|---|---|
| SeqAPASS | Web application | Cross-species protein sequence and structure comparison | https://seqapass.epa.gov/seqapass/ |
| NCBI Protein Database | Database | Comprehensive repository of protein sequences | https://www.ncbi.nlm.nih.gov/protein |
| I-TASSER | Computational tool | Protein structure prediction from sequence | https://zhanggroup.org/I-TASSER/ |
| CompTox Chemicals Dashboard | Database | Chemical toxicity and property data | https://comptox.epa.gov/dashboard |
| ECOTOX Knowledgebase | Database | Ecological toxicity data | https://www.epa.gov/ecotox |
| in vitroDB | Database | ToxCast high-throughput screening data | Part of EPA CompTox Chemicals Dashboard |
The integration of protein structural information represents the cutting edge of cross-species extrapolation. Recent advances have demonstrated a pipeline from SeqAPASS sequence analysis to I-TASSER-generated protein structures for comparative analysis. This approach was successfully applied to human liver fatty acid-binding protein (LFABP) and androgen receptor (AR), generating 99 LFABP and 268 AR protein models representing diverse species [15].
The structural comparisons aligned with sequence-based SeqAPASS results, providing additional evidence of LFABP and AR conservation across vertebrate species. This integration of sequence and structural data creates a more comprehensive framework for species extrapolation and enhances confidence in predictions of cross-species susceptibility.
The following diagram illustrates the integrated computational pipeline combining SeqAPASS with protein structure modeling for enhanced cross-species extrapolation:
The convergence of regulatory modernization through the FDA Modernization Act 2.0 and scientific advancement through tools like SeqAPASS represents a transformative moment for chemical safety assessment and drug development. SeqAPASS provides a scientifically robust, computationally efficient framework for addressing fundamental questions about species relevance and susceptibility, enabling more targeted testing, reduced animal use, and ultimately, more human-relevant safety assessments.
As the regulatory landscape continues to evolve toward greater acceptance of NAMs, the integration of SeqAPASS into standardized testing strategies and regulatory submissions will play a crucial role in realizing the vision of more predictive, mechanistically grounded safety assessment. The ongoing expansion of SeqAPASS capabilities to include structural comparisons and integration with other computational approaches positions this tool as a cornerstone of next-generation toxicology and risk assessment.
The global decline of pollinators, essential for ecosystem stability and agricultural productivity, represents a critical environmental challenge. Exposure to plant protection products (PPPs) is a significant contributor to this decline, with particular concern surrounding chemicals capable of inducing endocrine disruption and chronic sublethal effects [16]. Current regulatory frameworks often overlook these subtle yet population-damaging impacts in favor of assessing acute toxicity. The SeqAPASS (Sequence Alignment to Predict Across Species Susceptibility) tool, developed by the US EPA, provides a powerful bioinformatic approach to address this challenge [3]. By evaluating the conservation of protein targets across species, SeqAPASS enables researchers to predict the cross-species susceptibility of non-target organisms, such as pollinators, to specific insecticides [4]. This application note details how SeqAPASS can be deployed to investigate the molecular basis of insecticide action and endocrine disruption, facilitating the development of safer agricultural chemicals and the protection of pollinator health.
SeqAPASS is a fast, online screening tool that addresses the challenge of extrapolating toxicity data from data-rich model organisms to thousands of non-target species with limited or no toxicity information [3]. Its underlying principle is that a species' sensitivity to a chemical is often determined by the presence and specific structure of protein targets that interact with the chemical once inside the body [3]. For pesticides, these protein targets are often well-defined. SeqAPASS leverages the vast National Center for Biotechnology Information (NCBI) protein database to evaluate amino acid sequence and structural similarity, thereby identifying whether a specific protein target implicated in chemical toxicity is present and conserved in other species [3].
The tool's flexibility is manifested in its three-tiered analytical approach, which moves from broad to highly specific assessments. This allows users to capitalize on existing knowledge about chemical-protein interactions in sensitive species and provides a quantitative, screening-level line of evidence for predicting susceptibility across the tree of life [4]. This capability is indispensable for prioritizing chemicals for further testing, selecting ecologically relevant species for risk assessment, and understanding the potential ecological relevance of adverse outcome pathways.
SeqAPASS has been successfully applied to several critical areas concerning pollinator health and insecticide mode of action. The following table summarizes three prominent case studies.
Table 1: Key Case Studies Demonstrating SeqAPASS Application in Pollinator Research
| Case Study | Chemical Class | Protein Target | SeqAPASS Application & Findings |
|---|---|---|---|
| Neonicotinoid Insecticides [4] | Neonicotinoids (e.g., imidacloprid) | Nicotinic Acetylcholine Receptor (nAChR) | Used to evaluate the potential chemical susceptibility of honey bees and other bee species by comparing protein target similarity to known sensitive pest species [3]. |
| Molting-Accelerating Compounds [4] | Molt-accelerating compounds (e.g., tebufenozide) | Ecdysone Receptor | A cross-species comparison of the protein sequence in the tobacco budworm (a target pest) was used to predict the potential susceptibility of non-target insects, including honey bees and earthworms [3]. |
| Endocrine Disruption in Bees [16] | Various insecticides (e.g., fipronil, azadirachtin) | Endocrine system components (e.g., vitellogenin) | Proposed use of SeqAPASS to investigate endocrine pathways. Analysis of conserved proteins can predict potential for disrupted reproduction (queens/drones) and premature behavioral transition (nurse to forager bees) [16]. |
This protocol provides a step-by-step guide for using SeqAPASS to assess the potential susceptibility of a non-target pollinator species to a specific chemical.
Table 2: Research Reagent Solutions for SeqAPASS Analysis
| Item | Function / Description |
|---|---|
| Known Sensitive Species | Provides the query protein sequence from an organism known to be sensitive to the chemical of interest (e.g., a pest insect for an insecticide) [4]. |
| Protein Sequence Data | The amino acid sequence(s) of the specific protein target (e.g., receptor, enzyme) from the sensitive species, often retrieved from NCBI Protein database [3]. |
| Chemical-Protein Interaction Data | Information on specific amino acid residues, functional domains, or protein structures critical for the chemical's binding and action [4]. |
| List of Non-Target Species | The taxonomic list of species for which susceptibility will be predicted (e.g., Apis mellifera, Bombus terrestris) [3]. |
Procedure:
Endocrine disruption in pollinators is an emerging threat that extends beyond acute lethality, potentially causing population-level declines through impaired reproduction, development, and behavior [16]. In honey bees, documented effects include reduced reproductive success of queens and drones and the premature behavioral transition of nurse bees to foragers, which can destabilize colony dynamics [16]. These disruptions are linked to insecticides from several chemical classes, including neonicotinoids, fipronil, and azadirachtin [16]. The challenge for regulators and researchers is that standardized testing guidelines (e.g., OECD) for endocrine disruption in bees are currently lacking. The SeqAPASS tool offers a pathway to address this gap by identifying conserved endocrine pathways across species, thus predicting which chemicals are likely to act as endocrine disruptors in pollinators based on their known action in other organisms.
A key endocrine-related protein in honey bees is vitellogenin, which acts as a storage protein but also regulates behavioral maturation and foraging onset [16]. Chemicals that disrupt the hormonal control of vitellogenin or interact directly with its receptor can have profound effects on colony health. The diagram below illustrates a simplified signaling pathway for endocrine disruption in a pollinator, highlighting potential sites of chemical interference.
This protocol outlines a combined in silico and in vivo approach to screen and confirm the endocrine-disrupting potential of a chemical in pollinators.
Procedure:
The following diagram illustrates the comprehensive workflow for integrating SeqAPASS predictions with laboratory validation to assess the risk of insecticides to pollinators.
To commence an analysis with the Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool, the first step involves account creation and platform access. The SeqAPASS tool is a freely available, web-based application provided by the U.S. Environmental Protection Agency (EPA) that requires user registration to run, store, and customize jobs [5].
Step-by-Step Protocol:
Prior to submitting a computational query, it is essential to identify a specific protein target and a known sensitive species through a review of existing literature or pre-existing data [5]. The sensitivity of a species to a chemical is often determined by the presence and conservation of specific proteins that interact with chemicals, and a majority of these proteins are curated in the National Center for Biotechnology Information (NCBI) protein database [3].
Step-by-Step Protocol:
Table 1: Recommended Resources for Query Protein Identification
| Resource Name | Description | Primary Utility in SeqAPASS Context |
|---|---|---|
| NCBI Protein Database | A comprehensive repository of protein sequences from more than 95,000 organisms [3]. | The primary source for amino acid sequence data used by SeqAPASS for cross-species comparisons. |
| CompTox Chemicals Dashboard | An EPA database providing access to chemistry, toxicity, and exposure data for chemicals [5]. | Helps identify protein targets for specific chemicals of interest. |
| AOP-Wiki | A crowd-sourced knowledge base on Adverse Outcome Pathways (AOPs) [5]. | Aids in defining the Molecular Initiating Event (MIE) for a toxicological pathway, which often involves a specific protein-chemical interaction. |
| Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank | A database for 3D structural data of large biological molecules [18]. | Used to obtain protein structures for Level 4 analysis in SeqAPASS v7.0 and later. |
| AlphaFold | An AI system that predicts a protein's 3D structure from its amino acid sequence [18]. | Used to generate or obtain protein structures for Level 4 analysis. |
The following diagram illustrates the logical workflow for identifying a query protein and a known sensitive species, which is a critical prerequisite before starting a SeqAPASS analysis.
The following table details the key computational and data resources essential for successfully initiating a SeqAPASS analysis.
Table 2: Essential Research Reagents and Resources for SeqAPASS Analysis Initiation
| Item/Tool | Category | Function in Analysis |
|---|---|---|
| SeqAPASS Web Tool | Software Application | The primary online platform for performing cross-species susceptibility predictions via sequence and structural comparisons [3] [5]. |
| NCBI Protein Database | Data Repository | The source of over 153 million protein sequences used for sequence alignment and conservation analysis across taxonomic groups [3]. |
| CompTox Chemicals Dashboard | Data Resource | Aids in the initial identification of protein targets for specific chemicals, informing the choice of query protein [5]. |
| AOP-Wiki | Knowledge Base | Provides context on Adverse Outcome Pathways, helping to establish the relevance of a protein-chemical interaction as a Molecular Initiating Event [5]. |
| Chrome Web Browser | Software | The recommended browser for optimal compatibility and performance of the SeqAPASS web interface [5]. |
| Iterative Threading ASSEmbly Refinement (I-TASSER) | Modeling Tool | Integrated into SeqAPASS v7.0+ to generate protein structures for advanced Level 4 structural evaluations [18]. |
After identifying the query protein and sensitive species, users can proceed to run a SeqAPASS query. The tool employs a tiered approach to extrapolate toxicity information from data-rich model organisms to thousands of other species [3] [4]. The core of a SeqAPASS analysis involves three progressive levels of comparison, with a fourth level added in recent versions.
Table 3: Levels of Analysis in the SeqAPASS Tool
| Analysis Level | Technical Description | Taxonomic Resolution & Application |
|---|---|---|
| Level 1: Primary Amino Acid Sequence | Compares the entire primary amino acid sequence of the query protein to sequences from all species with available data, using BLASTp algorithms to calculate a metric for sequence similarity and identify orthologs [5] [4]. | Provides a broad, screening-level prediction of susceptibility across diverse taxa. Serves as the initial line of evidence. |
| Level 2: Functional Domain Comparison | Evaluates sequence similarity within selected functional domains (e.g., a ligand-binding domain) that are critical for the specific chemical-protein interaction [4]. | Offers higher taxonomic resolution than Level 1 by focusing on the functionally relevant region of the protein. |
| Level 3: Critical Amino Acid Residue Comparison | Compares individual amino acid residue positions known to be important for protein conformation and/or direct interaction with the chemical [4] [19]. | Provides the highest resolution for species-specific predictions. Requires detailed knowledge of the key residues involved in the interaction. |
| Level 4: Protein Structural Evaluation (v7.0+) | Allows users to incorporate protein structural alignments using generated or imported structures (e.g., from PDB or AlphaFold) to assess structural conservation [18]. | Adds a powerful line of evidence based on 3D protein conformation, further refining susceptibility predictions. |
Understanding the intrinsic susceptibility of diverse species to chemicals is a fundamental challenge in ecological risk assessment and translational toxicology. The Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool, developed by the U.S. Environmental Protection Agency, addresses this challenge by leveraging computational biology to predict chemical susceptibility across species based on protein target conservation [3] [4]. The tool operates through three tiers of analysis, with Level 1 serving as the foundational screen. Level 1 analysis performs a whole protein sequence comparison to rapidly identify orthologs (proteins in different species that share a common ancestor and, typically, a similar function) across the taxonomic spectrum [5] [4]. This initial evaluation provides a critical first line of evidence for determining whether a protein target known to interact with a chemical in a well-studied model organism (e.g., human, rat, or zebrafish) is likely present in thousands of other species, thereby offering a screening-level prediction of potential susceptibility [5] [3].
The core premise of Level 1 analysis is that the primary amino acid sequence of a protein determines its fundamental structure and function. If a chemical interacts with a specific protein in a sensitive species, then other species possessing a highly similar protein sequence are preliminarily predicted to be susceptible to that same chemical [4]. This principle of sequence-structure-function relationship enables high-throughput extrapolation from data-rich model organisms to data-poor species.
SeqAPASS automates this process by mining and compiling protein sequences from the National Center for Biotechnology Information (NCBI) protein database, a comprehensive repository containing over 153 million proteins from more than 95,000 organisms [5] [3]. The Level 1 analysis utilizes the Protein Basic Local Alignment Search Tool (BLASTp) algorithm to compare a user-provided "query" protein sequence from a species of known sensitivity against this vast database [5]. The tool calculates quantitative metrics of sequence similarity and uses them to identify potential orthologs and generate initial susceptibility predictions for all species with available sequence data.
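As a rough illustration of one such similarity metric, the sketch below computes percent identity for a pair of sequences that have already been aligned. It is not SeqAPASS code: the function name `percent_identity` and the example sequences are hypothetical, and the calculation (identical columns divided by total alignment length, gap columns included) is an assumed convention chosen for illustration, not the tool's documented scoring.

```python
def percent_identity(aligned_query: str, aligned_hit: str) -> float:
    """Percent of alignment columns in which both sequences carry the same residue.

    Both inputs are assumed to be rows of one pairwise alignment, so they
    must have equal length; '-' marks a gap. Gap columns count toward the
    alignment length but never count as a match.
    """
    if len(aligned_query) != len(aligned_hit):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        return 0.0
    matches = sum(
        1
        for q, h in zip(aligned_query.upper(), aligned_hit.upper())
        if q == h and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


if __name__ == "__main__":
    # Hypothetical six-column alignment: four identical columns out of six.
    print(round(percent_identity("MKT-LV", "MKTALI"), 1))
```

A value like this is only one input to a screening-level judgment; the E-value and the ortholog call discussed in this section carry separate information.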
Before initiating a Level 1 analysis, researchers must complete two prerequisite steps. First, a SeqAPASS user account must be created via the official website (https://seqapass.epa.gov/seqapass/). This account allows users to run, store, access, and customize their analysis jobs [5]. Second, the protein target and a sensitive species must be identified through a review of existing literature or pre-existing toxicological data. The query protein is the molecular target against which all other species will be compared. SeqAPASS provides integrated links to external resources like the CompTox Chemicals Dashboard and AOP-Wiki to assist in this identification process [5].
The following protocol outlines the specific steps for performing a Level 1 analysis as detailed in the SeqAPASS documentation [5]:
Navigate to https://seqapass.epa.gov/seqapass/ using the Chrome web browser and log in to your SeqAPASS account.
Upon completion, SeqAPASS generates a Level 1 report containing several key components for data interpretation:
The following table details the essential "research reagents," or core components, required to perform a Level 1 analysis.
Table 1: Essential Components for SeqAPASS Level 1 Analysis
| Component | Function in the Analysis | Source |
|---|---|---|
| Query Protein Sequence | Serves as the reference sequence for all cross-species comparisons. It can be provided in FASTA format or via an NCBI accession number. | Researcher-provided or NCBI Protein Database [5] |
| NCBI Protein Database | The comprehensive source database against which the query sequence is compared. Contains millions of sequenced proteins from thousands of organisms. | National Center for Biotechnology Information (NCBI) [5] [3] |
| BLASTp Algorithm | The core computational engine that performs the primary amino acid sequence alignment and calculates metrics of sequence similarity (E-value, percent identity). | Integrated into SeqAPASS backend [5] |
| Sensitive/Target Species | The organism from which the query protein is derived and for which chemical susceptibility data is known. Used to contextualize the predictions. | Researcher-defined based on literature [5] |
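Because the query sequence can be supplied in FASTA format, it can help to check a file's contents before pasting them into the tool. The following is a minimal sketch of a FASTA reader, written under the assumption that the first whitespace-delimited token of each header line serves as the record identifier; the function name `parse_fasta` and the identifiers in the example are hypothetical and are not part of SeqAPASS.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA-formatted text into {identifier: sequence}.

    The identifier is taken to be the first token after '>' on each header
    line (an assumption for this sketch). Sequence lines are joined and
    upper-cased; blank lines are ignored.
    """
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            tokens = line[1:].split()
            header = tokens[0] if tokens else ""
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


if __name__ == "__main__":
    example = ">query1 hypothetical receptor\nMKT\nLV\n>query2\naag\n"
    for name, seq in parse_fasta(example).items():
        print(name, len(seq), seq)
```

Checking the parsed length against the length reported in the source database is a quick way to catch truncated copies before a query is submitted.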
The diagram below illustrates the logical flow and key steps of the SeqAPASS Level 1 analysis protocol.
Level 1 analysis has been successfully applied in numerous research contexts to address cross-species extrapolation challenges:
The SeqAPASS tool has undergone significant version updates since its initial launch, with each release enhancing its functionality and user interface. The table below summarizes the key developments relevant to Level 1 analysis.
Table 2: Evolution of SeqAPASS Tool Features [5]
| SeqAPASS Version | Date | Key Features and Updates Relevant to Level 1 |
|---|---|---|
| v1.0 | Jan 2016 | Initial public release with core Level 1 and Level 2 functionality. |
| v2.0 | May 2017 | Added capability to modify default settings for Level 1 reports. |
| v3.0 | Mar 2018 | Introduced interactive data visualization capabilities (density plots) for Level 1 results. |
| v4.0 | Oct 2019 | Added links to external databases (CompTox Dashboard, AOP-Wiki) to help identify query proteins. |
| v5.0 | Dec 2020 | Launched customizable summary reports for synthesizing data across all analysis levels. |
| v6.0 | Sep 2021 | Implemented a widget to connect SeqAPASS predictions directly to empirical toxicity data in the ECOTOX Knowledgebase from the Level 1 results page. |
The Level 1 analysis for whole protein sequence comparison and ortholog identification represents a powerful, efficient, and accessible first step in cross-species susceptibility assessment. By leveraging publicly available protein sequences and robust bioinformatics algorithms, it allows researchers and regulators to rapidly screen thousands of species and generate hypotheses about potential chemical susceptibility. This protocol provides the necessary framework for scientists to confidently employ SeqAPASS Level 1 analysis, thereby supporting more informed decision-making in chemical prioritization, species selection for testing, and the extrapolation of toxicological data in both ecological and human health contexts.
The Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool is a fast, freely available online screening tool developed by the US Environmental Protection Agency that enables researchers and regulators to extrapolate toxicity information across species [3] [20]. The tool addresses a critical challenge in toxicology and chemical safety assessment: predicting chemical susceptibility for thousands of species for which toxicity data are limited or non-existent, by leveraging existing data from model organisms [4] [3]. SeqAPASS operates on the fundamental principle that a species' intrinsic susceptibility to a particular chemical is determined by the presence and conservation of specific protein targets with which that chemical interacts [3].
SeqAPASS performs this assessment through three tiered levels of analysis, each providing increasing taxonomic resolution and specificity [20]. Level 1 compares entire primary amino acid sequences across species to identify potential orthologs. Level 2, the focus of this protocol, narrows the comparison to specific functional domains (e.g., ligand-binding domains) directly involved in the chemical-protein interaction [4] [20]. Level 3 provides the most granular analysis by evaluating conservation at individual amino acid residues known to be critical for chemical binding or protein function [20] [5]. This progressive approach allows researchers to capitalize on existing knowledge about chemical-protein interactions, with Level 2 serving as a crucial intermediate step that balances specificity with practical applicability when full residue-level data may be incomplete [4].
Table 1: SeqAPASS Analysis Levels and Their Applications
| Analysis Level | Comparison Focus | Taxonomic Resolution | Information Required |
|---|---|---|---|
| Level 1 | Primary amino acid sequence | Broad (e.g., phylum, class) | Protein sequence from a sensitive species |
| Level 2 | Functional domains (e.g., LBD) | Intermediate (e.g., order, family) | Domain boundaries and potential critical residues |
| Level 3 | Individual amino acid residues | High (e.g., species, population) | Specific residues critical for chemical interaction |
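To make the difference between domain-level and residue-level comparison concrete, here is a minimal sketch that works on a pre-computed pairwise alignment. Everything in it is an assumption for illustration: the function names, the convention that domain boundaries and critical positions are given as 1-based coordinates on the query sequence, and the use of exact residue identity as the match criterion. SeqAPASS's own scoring may differ.

```python
from typing import Iterable, Iterator


def _query_columns(aligned_query: str, aligned_hit: str) -> Iterator[tuple[int, str, str]]:
    """Yield (query_position, query_residue, hit_residue) for every alignment
    column in which the query has a residue; insertions in the hit are skipped."""
    if len(aligned_query) != len(aligned_hit):
        raise ValueError("aligned sequences must have equal length")
    position = 0
    for q, h in zip(aligned_query.upper(), aligned_hit.upper()):
        if q == "-":
            continue
        position += 1
        yield position, q, h


def domain_identity(aligned_query: str, aligned_hit: str, start: int, end: int) -> float:
    """Level 2 style: percent identity restricted to query positions start..end."""
    pairs = [(q, h) for pos, q, h in _query_columns(aligned_query, aligned_hit)
             if start <= pos <= end]
    if not pairs:
        raise ValueError("domain lies outside the query sequence")
    return 100.0 * sum(q == h for q, h in pairs) / len(pairs)


def critical_residue_matches(aligned_query: str, aligned_hit: str,
                             positions: Iterable[int]) -> dict[int, tuple[str, str, bool]]:
    """Level 3 style: compare individual query positions against the hit."""
    wanted = set(positions)
    return {pos: (q, h, q == h)
            for pos, q, h in _query_columns(aligned_query, aligned_hit)
            if pos in wanted}


if __name__ == "__main__":
    query, hit = "MK-TLVAG", "MKATLI-G"  # hypothetical alignment rows
    print(domain_identity(query, hit, 3, 6))
    print(critical_residue_matches(query, hit, [4, 5]))
```

The two functions answer different questions: the first summarizes conservation across a whole region, while the second reports each listed position separately, so a single substitution at a binding-site residue stays visible even when the domain as a whole is well conserved.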
The ligand-binding domain (LBD) is a specialized protein region responsible for the specific binding of signaling molecules (ligands) such as hormones, pharmaceuticals, or environmental contaminants [21]. In nuclear receptors, the LBD is located at the C-terminal half of the receptor protein and adopts a globular α-helical folded structure that forms a hydrophobic binding pocket for the ligand [21]. This domain is evolutionarily conserved across diverse species, making it an ideal focus for cross-species susceptibility predictions [22] [23].
The LBD serves multiple essential functions beyond simple ligand binding. It contains the activation function-2 (AF-2) domain, which is responsible for ligand-mediated recruitment of transcriptional co-regulators [21] [22]. Upon ligand binding, the LBD undergoes a significant conformational change, particularly in helix 12, which acts like a lid to enclose the ligand within the binding pocket [21]. This structural rearrangement creates new surfaces for interaction with coactivator proteins and facilitates receptor dimerization, a critical step in the signaling cascades of many nuclear receptors [21] [22]. The precision of this molecular mechanism explains why specific amino acid conservation within the LBD directly impacts species susceptibility to chemicals that target these pathways [4] [21].
Comparative structural analyses have revealed that LBDs maintain conserved architectural features across vast evolutionary distances. Recent research has identified a remarkable structural similarity between the LBD of human estrogen receptor alpha (ERα) and bacterial chemotaxis receptors, despite significant sequence divergence [23]. This conservation in structural folds, even with low sequence identity, suggests that fundamental protein architectures remain preserved for specific functions throughout evolution [23].
Phylogenetic studies of nuclear receptor LBDs have identified four distinct monophyletic branches and seven conserved signaling motifs with amino acid repeating patterns ('LxxLL' or 'LLxxL') that are critical for protein-protein interactions in signaling cascades [22]. These structural and functional conservation patterns provide the theoretical basis for using domain-level comparisons in cross-species susceptibility assessments. The preservation of these architectural features means that comparing LBD sequences can identify functionally equivalent targets across diverse species, even when overall protein sequence similarity is relatively low [22] [23].
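The 'LxxLL' and 'LLxxL' patterns mentioned above are simple enough to locate with a regular expression, where each `x` stands for any residue. The sketch below scans a protein sequence for both patterns and reports 1-based start positions; the function name `find_motifs` and the test sequences are hypothetical, and a pattern match alone says nothing about whether a site is functional.

```python
import re

# Lookahead groups let overlapping occurrences be reported separately.
MOTIF_PATTERNS = {
    "LxxLL": re.compile(r"(?=(L..LL))"),
    "LLxxL": re.compile(r"(?=(LL..L))"),
}


def find_motifs(sequence: str) -> list[tuple[str, int, str]]:
    """Return (motif_name, 1-based start position, matched residues),
    sorted by position and then by motif name."""
    seq = sequence.upper()
    hits = []
    for name, pattern in MOTIF_PATTERNS.items():
        for match in pattern.finditer(seq):
            hits.append((name, match.start() + 1, match.group(1)))
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


if __name__ == "__main__":
    for hit in find_motifs("AALKELLGG"):  # hypothetical peptide
        print(hit)
```

Running the same scan over the LBD region of several species gives a quick view of whether a motif is retained at a comparable position, which can then be examined more carefully in a residue-level comparison.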
Before initiating a Level 2 analysis, researchers must first identify and gather specific information about the protein target and domain of interest. This foundational step is crucial for designing an effective and interpretable analysis [20] [5].
Step 1: Identify a Query Protein and Sensitive Species
Step 2: Determine the Relevant Functional Domain
Table 2: Essential Preliminary Information for Level 2 Analysis
| Information Category | Specific Requirements | Recommended Resources |
|---|---|---|
| Query Protein | NCBI Protein Accession ID or FASTA sequence | NCBI Protein Database, UniProt |
| Sensitive Species | Taxonomic name and protein identifier | Literature review, ECOTOX Knowledgebase |
| Functional Domain | Domain name and boundary residues | NCBI Conserved Domains Database (CDD) |
| Chemical-Protein Interaction | Mechanism of action and known critical regions | Scientific literature, AOP-Wiki |
The following protocol provides detailed instructions for performing a Level 2 analysis using the SeqAPASS tool. This workflow assumes the user has already completed the preliminary information gathering steps described above [20] [5].
Figure 1: SeqAPASS Level 2 Analysis Workflow. This diagram illustrates the sequential steps for performing a functional domain comparison using the SeqAPASS tool.
Step 1: Access the SeqAPASS Platform
Step 2: Complete Prerequisite Level 1 Analysis
Step 3: Initiate Level 2 Analysis
Step 4: Configure Analysis Parameters and Submit Query
Step 5: Interpret and Visualize Results
Interpreting Level 2 analysis results requires understanding several key bioinformatic metrics and their relationship to predictions of cross-species susceptibility. SeqAPASS calculates quantitative measures of sequence similarity within the specified functional domain that serve as the basis for susceptibility predictions [4] [20].
The E-value (Expect value) is a primary metric that assesses the statistical significance of sequence alignments, with lower E-values indicating greater confidence that the alignment is not due to chance alone. The percentage sequence identity within the functional domain provides a straightforward measure of conservation, while the alignment score reflects the overall quality of the alignment considering both matches and gaps [20] [5].
For susceptibility predictions, SeqAPASS uses these metrics to calculate a susceptibility cutoff value that distinguishes between potentially susceptible and non-susceptible species. The tool provides both automated predictions based on statistical distributions and customizable thresholds that can be adjusted based on expert judgment or additional experimental evidence [4] [20].
Table 3: Key Metrics for Interpreting Level 2 Analysis Results
| Metric | Definition | Interpretation Guidance |
|---|---|---|
| E-value | Statistical significance of alignment | E-value < 1e-10 indicates strong confidence; < 1e-5 indicates moderate confidence |
| Sequence Identity | Percentage of identical amino acids in domain | Higher percentage suggests greater functional conservation |
| Alignment Score | Quantitative measure of alignment quality | Higher scores indicate better overall alignment considering matches and gaps |
| Susceptibility Cutoff | Threshold for predicting susceptibility | Automated calculation with optional manual adjustment based on expert judgment |
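The metrics in Table 3 can be made concrete with a minimal Python sketch. It computes percent identity over the aligned, non-gap columns of a domain alignment and maps an E-value onto the two confidence bands listed in the table. The function names, the toy sequences, and the "weak" label for E-values at or above 1e-5 are assumptions made for illustration; percent identity has several conventions (the denominator varies), and SeqAPASS's internal calculation may differ from the one shown here.

```python
def percent_identity(aligned_query: str, aligned_hit: str) -> float:
    """Percent of aligned (non-gap) columns that carry identical residues."""
    if len(aligned_query) != len(aligned_hit):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for q, h in zip(aligned_query, aligned_hit):
        if q == "-" or h == "-":
            continue  # gap columns are excluded from the denominator
        compared += 1
        if q == h:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def evalue_confidence(evalue: float) -> str:
    """Map an E-value onto the confidence bands given in Table 3."""
    if evalue < 1e-10:
        return "strong"
    if evalue < 1e-5:
        return "moderate"
    return "weak"  # assumed label; Table 3 names only the two bands above


if __name__ == "__main__":
    # Hypothetical aligned domain fragments, not real protein sequences.
    print(round(percent_identity("MKT-LLV", "MKSALLV"), 1))
    print(evalue_confidence(3e-12), evalue_confidence(2e-7))
```

Excluding gap columns from the denominator keeps short insertions from depressing the identity value, but it can overstate conservation when the hit is only a fragment of the domain, which is one reason to read identity alongside the alignment score.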
SeqAPASS Version 5.0 and later include enhanced data visualization capabilities that facilitate interpretation of Level 2 results. The customizable box-plot graphics provide an intuitive display of distribution patterns in sequence conservation across taxonomic groups, allowing rapid identification of potentially susceptible and non-susceptible lineages [3] [20].
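For readers who export Level 2 results and want to check the numbers behind a box plot, the following sketch computes a per-group five-number summary with the Python standard library. The taxonomic group names and similarity values are hypothetical, the `(group, value)` record layout is an assumption rather than the SeqAPASS export format, and quartile conventions differ between plotting tools, so the values may not match the SeqAPASS graphics exactly.

```python
from collections import defaultdict
from statistics import median, quantiles


def summarize_by_group(records):
    """Return min, Q1, median, Q3, max and count for each taxonomic group.

    `records` is an iterable of (group_name, percent_similarity) pairs.
    """
    groups = defaultdict(list)
    for group, value in records:
        groups[group].append(value)

    summary = {}
    for group, values in groups.items():
        values = sorted(values)
        if len(values) >= 2:
            # "inclusive" treats the data as the full population of hits.
            q1, _, q3 = quantiles(values, n=4, method="inclusive")
        else:
            q1 = q3 = values[0]  # a single hit has no spread
        summary[group] = {
            "n": len(values),
            "min": values[0],
            "q1": q1,
            "median": median(values),
            "q3": q3,
            "max": values[-1],
        }
    return summary


if __name__ == "__main__":
    # Hypothetical Level 2 percent-similarity values.
    demo = [("Actinopterygii", v) for v in (60, 70, 80, 90, 100)]
    demo.append(("Insecta", 30))
    for name, stats in summarize_by_group(demo).items():
        print(name, stats)
```

Comparing each group's median and lower quartile against the susceptibility cutoff gives a quick tabular counterpart to the visual inspection described above.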
The heat map visualization function enables simultaneous comparison of conservation across multiple taxonomic groups and specific domain regions, highlighting patterns that might be overlooked in tabular data. For comprehensive reporting, the Decision Summary Report function allows researchers to synthesize findings from all three levels of analysis into a single downloadable PDF document suitable for regulatory submissions or scientific publications [20] [5].
Additionally, the interoperability with the ECOTOX Knowledgebase enables researchers to compare sequence-based susceptibility predictions with existing empirical toxicity data, providing a powerful approach for validating predictions and identifying discrepancies that may reveal novel biological insights or technical limitations [3] [5].
Successful implementation of SeqAPASS Level 2 analysis requires access to various bioinformatic resources and computational tools. The following table outlines essential reagents and resources for conducting comprehensive functional domain analyses [3] [20] [5].
Table 4: Essential Research Reagents and Computational Resources
| Resource Category | Specific Tool/Database | Primary Function in Level 2 Analysis |
|---|---|---|
| Sequence Databases | NCBI Protein Database | Source of protein sequences for thousands of species; contains over 153 million proteins from >95,000 organisms |
| Domain Identification | NCBI Conserved Domains Database (CDD) | Identifies functional domain boundaries and characteristic motifs for query proteins |
| Structural Resources | RCSB Protein Data Bank (PDB) | Provides 3D structural information for understanding domain architecture and ligand interactions |
| Taxonomic Classification | NCBI Taxonomy Database | Standardized taxonomic framework for consistent species classification and comparison |
| Alignment Algorithms | COBALT (Constraint-based Alignment Tool) | Performs multiple sequence alignments using conserved domain information |
| Similarity Search | BLASTP (Protein BLAST) | Identifies similar protein sequences across species based on primary sequence |
| Toxicity Data Integration | ECOTOX Knowledgebase | Links sequence-based predictions to empirical toxicity data for validation |
The practical application of SeqAPASS Level 2 analysis is demonstrated through several published case studies that highlight its utility in addressing diverse research questions in chemical risk assessment and comparative toxicology.
Case Study 1: Predicting Pollinator Susceptibility to Neonicotinoid Insecticides
Case Study 2: Assessing Cross-Species Susceptibility to Strobilurin Fungicides
Case Study 3: Estrogen Receptor Activation Across Vertebrate Species
Level 2 analysis using SeqAPASS plays a particularly valuable role in the development and assessment of Adverse Outcome Pathways (AOPs) by providing evidence for the conservation of molecular initiating events across species [4] [3]. This application supports the use of AOP frameworks in regulatory contexts by establishing taxonomic applicability domains for these knowledge structures.
The tool also aligns with the broader shift toward New Approach Methodologies (NAMs) in toxicology by providing a cost-effective, computationally efficient method for extrapolating data from model systems to diverse species without additional animal testing [20] [5]. This application is particularly valuable for addressing the challenges of assessing chemical safety across the thousands of species potentially impacted by environmental chemical exposures but for which traditional toxicity testing is impractical or unethical [3] [20].
Researchers may encounter specific technical challenges when performing Level 2 analyses. The following table outlines common issues and recommended solutions based on the SeqAPASS user experience [20] [5].
Table 5: Troubleshooting Guide for Level 2 Analysis
| Common Challenge | Potential Causes | Recommended Solutions |
|---|---|---|
| No domains listed in Level 2 menu | Query protein not properly processed in Level 1 | Verify Level 1 completed successfully; check protein accession number |
| Unexpected susceptibility predictions | Incorrect domain selection or inappropriate cutoff values | Verify domain selection using NCBI CDD; adjust susceptibility cutoff based on biological knowledge |
| Incomplete taxonomic coverage | Limited sequence data for species of interest | Use "By Accession" to add specific sequences not automatically included |
| Ambiguous results for certain taxa | Partial domain conservation or sequence fragments | Proceed to Level 3 analysis for critical residue comparison |
| Difficulty interpreting visualizations | Complex taxonomic patterns or overlapping distributions | Use filtering options to focus on specific taxonomic groups; consult User Guide |
While SeqAPASS Level 2 analysis provides valuable insights for cross-species extrapolation, researchers should recognize several important limitations. The approach assumes that sequence similarity within functional domains correlates with functional conservation, which generally holds true but may have exceptions due to complex factors such as compensatory mutations, allosteric regulation, or post-translational modifications [20].
The predictions generated by Level 2 analysis represent relative intrinsic susceptibility based solely on protein target conservation and do not incorporate other important determinants of chemical susceptibility such as toxicokinetics, metabolic capacity, tissue distribution, or compensatory physiological mechanisms [4] [20]. Additionally, the analysis depends entirely on the quality and completeness of available sequence data in public databases, which varies substantially across taxonomic groups [3] [20].
Users should therefore interpret Level 2 results as a screening-level assessment that provides a line of evidence for prioritizing further testing or research, rather than as a definitive determination of chemical sensitivity or safety [4] [20]. The tool is most powerful when integrated with other lines of evidence, including empirical toxicity data, in vitro assay results, and physiological knowledge of the species of interest [3] [5].
The Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool, developed by the U.S. Environmental Protection Agency, is a fast, freely available, online screening application that allows researchers and regulators to extrapolate toxicity information across species [5]. The tool operates through three tiers of analysis, with Level 3 representing the most refined evaluation, focusing on individual amino acid residue comparisons at key positions involved in protein-chemical interactions [4] [1]. This level of analysis provides the highest taxonomic resolution for predicting cross-species susceptibility by specifically examining the conservation of amino acids that are critical for binding a chemical, maintaining protein conformation, or facilitating transcriptional activation [5]. Level 3 analysis is particularly valuable because specific variations in amino acid identities at these key positions can dramatically alter or even abolish protein-chemical interactions, leading to significant differences in species sensitivity to various chemicals [24] [1].
An amino acid residue is defined as an amino acid molecule that has been incorporated into a peptide chain, losing the elements of water in the process and characterized by its specific side chain properties [25]. For Level 3 analysis, understanding the biochemical properties of these side chains is essential because substitutions between residues with similar functional properties and molecular dimensions often preserve protein-chemical interactions, while substitutions with different properties may disrupt binding [24] [1]. The 20 naturally occurring amino acids are categorized based on their side chain properties:
Successful Level 3 analysis requires a priori knowledge of the specific amino acid residues critical for chemical-protein interaction. This information can be obtained from several sources:
The following diagram illustrates the comprehensive workflow for conducting a Level 3 analysis in SeqAPASS:
Research using in silico site-directed mutagenesis coupled with docking simulations has established rules for interpreting how amino acid substitutions affect protein-chemical interactions [24] [1]:
Table 1: Essential Research Reagents and Computational Tools for SeqAPASS Level 3 Analysis
| Resource Type | Specific Examples | Function in Level 3 Analysis |
|---|---|---|
| Protein Databases | NCBI Protein Database (>153 million proteins from >95,000 organisms) | Source of protein sequences for cross-species comparison [3] [1] |
| Bioinformatics Tools | BLASTp, COBALT | Algorithms for sequence alignment and ortholog detection [5] |
| Structural Resources | Protein Data Bank (PDB) | Source of crystal structures for identifying critical residues [24] |
| Computational Modeling Software | Molecular docking programs | In silico site-directed mutagenesis and binding affinity simulations [24] [1] |
| Visualization Tools | SeqAPASS integrated heat maps | Customizable visualization of susceptibility predictions across species [5] |
Table 2: Documented Applications of SeqAPASS Level 3 Analysis in Chemical Susceptibility Prediction
| Protein Target | Chemical Class | Key Findings | Reference |
|---|---|---|---|
| Acetylcholinesterase (AChE) | Organophosphates, Carbamates | Identified specific amino acid substitutions that confer differential sensitivity across species | [24] [1] |
| Ecdysone Receptor (EcR) | Diacylhydrazines | Determined key residues in ligand-binding domain that explain species-specific susceptibility to molt-accelerating compounds | [24] [1] |
| Opioid Receptors | Opioid compounds | Evaluated conservation of binding sites across species to predict susceptibility to opioid chemicals | [5] |
| Transthyretin | Endocrine-disrupting chemicals | Assessed cross-species relevance of thyroxine-binding sites for chemical susceptibility prediction | [5] |
The application of Level 3 analysis to AChE demonstrates the power of this approach. Through in silico site-directed mutagenesis and docking simulations, researchers identified specific amino acid positions critical for binding of organophosphate and carbamate insecticides [24] [1]. The analysis revealed that:
For the EcR, Level 3 analysis helped explain species-specific susceptibility to molt-accelerating insecticides [24] [1]:
While SeqAPASS Level 3 analysis provides powerful insights for cross-species susceptibility predictions, users should be aware of several important considerations:
The SeqAPASS tool continues to evolve, with recent versions incorporating improved visualization capabilities, interoperability with toxicity databases, and enhanced summary reports to support researchers in applying Level 3 analysis for chemical safety assessment and drug development [5] [3].
The accurate prediction of protein three-dimensional (3D) structure is a cornerstone of modern biological research, with profound implications for understanding cellular functions, disease mechanisms, and drug discovery. Within the specific context of cross-species susceptibility research using tools like the Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS), protein structure modeling provides critical insights that extend beyond what sequence analysis alone can offer. SeqAPASS is a web-based screening tool developed by the EPA that enables researchers and regulators to extrapolate toxicity information from data-rich model organisms to thousands of other non-target species by evaluating protein sequence and structural similarities [3]. The integration of high-accuracy structure prediction tools like I-TASSER into the SeqAPASS workflow significantly enhances the capability to evaluate potential chemical susceptibility across diverse taxonomic groups.
The I-TASSER (Iterative Threading ASSEmbly Refinement) server represents one of the most sophisticated platforms for automated protein structure and function prediction, having been consistently ranked as a top performer in community-wide blind protein structure prediction experiments [26]. This application note details protocols for leveraging I-TASSER within cross-species susceptibility research frameworks, providing researchers with methodologies to generate high-quality protein models that can inform SeqAPASS analyses at multiple levels, from primary sequence comparison to evaluation of functional domains and binding site conservation [4]. The complementary nature of these tools enables more robust assessment of potential chemical interactions with protein targets across diverse species, ultimately supporting prioritization of chemicals for further evaluation, selection of appropriate test species, and extrapolation of empirical toxicity data.
I-TASSER employs a hierarchical approach to protein structure modeling that combines template-based modeling with ab initio techniques for regions where suitable templates are unavailable. The algorithm operates on the principle of fragment assembly guided by spatial restraints derived from multiple sources, followed by iterative refinement to identify low-free energy states [26]. This methodology is particularly valuable for cross-species research as it can generate reliable models even for proteins with only distant homologs of known structure, a common scenario when working with non-model organisms.
The I-TASSER pipeline proceeds through four consecutive stages: (1) identification of structural templates from the Protein Data Bank (PDB) using meta-threading approaches; (2) fragment assembly and replica-exchange Monte Carlo simulations to construct full-length models; (3) atomic-level refinement to build high-resolution structures; and (4) structure-based functional annotations [27]. Each stage incorporates multiple sources of information, including sequence-based contact predictions, hydrogen-bonding networks, and knowledge-based statistical potentials derived from known protein structures [26].
Table 1: I-TASSER Algorithmic Components and Functions
| Component | Function | Significance in Prediction |
|---|---|---|
| LOMETS | Meta-threading server that combines multiple threading algorithms | Identifies structural templates with similar folds or super-secondary structures |
| Replica-exchange Monte Carlo | Sampling method for conformational space exploration | Assembles continuous fragments from templates while building loops ab initio |
| SPICKER | Clustering algorithm for structural decoys | Identifies low free-energy states from simulation trajectories |
| REMO | Hydrogen-bonding network optimization | Constructs full-atomic models from C-alpha traces |
| C-score | Confidence score, typically in the range [-5, 2] (higher is better) | Estimates model quality without knowledge of native structure |
Protocol 1: Preparing Protein Sequences for Cross-Species Modeling
Sequence Acquisition: Obtain protein sequences of interest for both data-rich and data-poor species. For SeqAPASS-integrated studies, this typically begins with the primary sequence of a protein with known chemical interaction in a model organism (e.g., human, rat, or zebrafish) [3]. The National Center for Biotechnology Information (NCBI) protein database provides over 153 million protein sequences representing more than 95,000 organisms [3].
Sequence Validation: Verify sequence integrity by checking for ambiguous residues, ensuring proper amino acid coding, and confirming sequence length. Remove any non-standard residues that might interfere with structure modeling.
Sequence Formatting: Format sequences in FASTA format with a single-line header beginning with ">" followed by sequence identifier and relevant metadata (e.g., species, protein name). The sequence data should follow in standard one-letter amino acid code.
Server Access: Navigate to the I-TASSER server (accessible through https://seq2fun.dcmb.med.umich.edu/I-TASSER/) and create a user account if required. Academic use is typically free of charge [26].
Job Submission: Upload the FASTA formatted sequence through the web interface. For cross-species studies involving multiple proteins, utilize the batch submission option where available. Specify any known structural constraints or preferred templates if experimental data suggests their relevance.
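The sequence validation and formatting steps of Protocol 1 can be sketched in a few lines of Python. This is a minimal illustration using only the standard library: the helper names, the 60-character line width, and the example identifier are assumptions, and the sketch flags non-standard residues for manual review instead of silently deleting them, since removing residues shifts the numbering of every downstream position.

```python
import textwrap

# The 20 standard amino acids in one-letter code.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def find_nonstandard(sequence: str):
    """Return (1-based position, character) for each non-standard residue."""
    return [
        (i, aa)
        for i, aa in enumerate(sequence.upper(), start=1)
        if aa not in STANDARD_AA
    ]


def to_fasta(identifier: str, sequence: str, description: str = "", width: int = 60) -> str:
    """Validate a protein sequence and return it as a FASTA record."""
    seq = "".join(sequence.split()).upper()  # drop whitespace and line breaks
    if not seq:
        raise ValueError("empty sequence")
    bad = find_nonstandard(seq)
    if bad:
        raise ValueError(f"non-standard residues found: {bad}")
    header = f">{identifier} {description}".rstrip()
    body = "\n".join(textwrap.wrap(seq, width))
    return f"{header}\n{body}\n"


if __name__ == "__main__":
    # Hypothetical identifier and toy sequence, for illustration only.
    print(to_fasta("demo_ER", "mktay iakq", "hypothetical fragment", width=5))
```

Reporting positions rather than stripping characters lets the user decide whether an ambiguous code such as X or B reflects a sequencing gap worth resolving before the job is submitted.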
Protocol 2: Optimizing I-TASSER Parameters for Comparative Analysis
Template Exclusion Settings: When modeling proteins from understudied species, avoid over-reliance on templates from distantly related taxa by selectively excluding certain species if biological knowledge suggests significant structural divergence.
Restraint Specification: Utilize the "Advanced Parameters" section to input user-specified distance restraints when available from experimental data (e.g., cross-linking mass spectrometry, FRET) or evolutionary co-variance analyses. This significantly improves model accuracy for proteins with few homologs.
Model Generation Settings: Select the option to generate all five predicted models rather than only the top-ranked model, as lower-ranked models may occasionally provide better representations for certain structural features relevant to chemical binding.
Function Annotation Options: Enable all function prediction modules (EC number, GO terms, ligand-binding sites) to facilitate subsequent cross-species comparisons within the SeqAPASS framework.
Quality Assessment Metrics: Note that I-TASSER provides confidence scores (C-score) for each model, with values > -1.5 generally indicating correct fold prediction [26]. The predicted TM-score and RMSD for the first model provide additional quality estimates.
Protocol 3: Analyzing I-TASSER Results for Susceptibility Assessment
Model Quality Evaluation: Review the C-scores for all generated models. Higher C-scores (closer to 2) indicate higher confidence predictions. For the first model, I-TASSER provides estimated TM-scores and RMSD values relative to the hypothetical native structure [26].
Structural Alignment Assessment: Examine the top structurally similar proteins identified by TM-align. These represent known structures with the greatest similarity to your predicted model and may provide insights into potential functional mechanisms.
Function Annotation Analysis: Review the predicted Enzyme Commission (EC) numbers, Gene Ontology (GO) terms, and ligand-binding sites generated by COFACTOR. These annotations are particularly valuable for hypothesizing protein function in non-model organisms [26].
Binding Site Characterization: For susceptibility applications, pay particular attention to predicted ligand-binding sites, as conservation of these regions across species often determines chemical susceptibility [4]. Compare these binding sites across models from different species.
Comparative Analysis: Import predicted structures into molecular visualization software (e.g., PyMOL, UCSF Chimera) for side-by-side comparison of binding pocket architectures, surface properties, and residue orientations that might influence chemical interactions.
I-TASSER has been extensively evaluated through the Critical Assessment of Protein Structure Prediction (CASP) experiments, community-wide blind tests of structure prediction accuracy. In multiple CASP experiments, I-TASSER has been ranked as the top-performing automated server, demonstrating its robustness across diverse protein targets [26]. The algorithm's performance is particularly notable for proteins that lack close homologs in structural databases, making it well-suited for cross-species applications involving non-model organisms.
Table 2: I-TASSER Performance Metrics in CASP Experiments
| CASP Experiment | Rank | Key Performance Highlights |
|---|---|---|
| CASP7 (2006) | No. 1 Server | Demonstrated superior performance in both template-based and free-modeling categories |
| CASP8 (2008) | No. 1 Server | Excelled in fold recognition and atomic-level refinement |
| CASP9 (2010) | No. 1 Server | Top performer in 3D structure prediction; I-TASSER and QUARK servers ranked No. 1 and 2 |
| CASP10 (2012) | No. 1 Server | Maintained leading position in server section; QUARK ranked No. 2 |
The accuracy of I-TASSER models is quantitatively assessed using several metrics. The TM-score (Template Modeling Score) measures structural similarity between predicted and native structures, with scores >0.5 indicating correct topology and scores <0.17 representing random similarity [26]. The C-score (confidence score) shows a strong correlation with model accuracy, with a correlation coefficient of 0.91 with TM-score to the native structure [26]. This relationship allows researchers to estimate model quality without knowledge of the true structure, which is particularly valuable when working with proteins from non-model organisms where experimental structures are unavailable.
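The thresholds quoted above lend themselves to a small triage helper. The sketch below encodes the C-score cutoff of -1.5, the C-score range of [-5, 2], and the TM-score cutoffs of 0.5 and 0.17 as stated in this article; the function names, the returned labels (including "intermediate" for TM-scores between the two cutoffs), and the example scores are assumptions for illustration and are not part of the I-TASSER output.

```python
def interpret_c_score(c_score: float) -> str:
    """Label a C-score using the -1.5 cutoff described in the text."""
    if not -5.0 <= c_score <= 2.0:
        raise ValueError("C-score is expected to fall within [-5, 2]")
    return "fold likely correct" if c_score > -1.5 else "low confidence"


def interpret_tm_score(tm_score: float) -> str:
    """Label a TM-score using the 0.5 and 0.17 cutoffs described in the text."""
    if not 0.0 <= tm_score <= 1.0:
        raise ValueError("TM-score is expected to fall within [0, 1]")
    if tm_score > 0.5:
        return "correct topology"
    if tm_score < 0.17:
        return "random similarity"
    return "intermediate"  # assumed label for the range between the cutoffs


def best_model(c_scores: dict) -> str:
    """Return the name of the model with the highest C-score."""
    return max(c_scores, key=c_scores.get)


if __name__ == "__main__":
    # Hypothetical C-scores for the models of one protein.
    scores = {"model1": -0.4, "model2": -1.9, "model3": -2.6}
    top = best_model(scores)
    print(top, interpret_c_score(scores[top]))
    print(interpret_tm_score(0.72))
```

Applying the same labels to every species in a comparison makes it easier to spot cases where an apparent structural difference simply reflects a low-confidence model.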
The integration of I-TASSER with SeqAPASS creates a powerful synergistic workflow for cross-species susceptibility assessment. While SeqAPASS provides a structured framework for evaluating sequence and structural similarity across taxonomic groups, I-TASSER enhances this capability by generating high-quality structural models for proteins that may lack experimental structures in key species of interest [3]. This integration operates across the three tiers of SeqAPASS analysis:
Primary Sequence Comparison: I-TASSER models provide additional context for interpreting sequence alignment results by visualizing how sequence differences manifest as structural variations.
Functional Domain Evaluation: I-TASSER's function annotation capabilities (EC numbers, GO terms, ligand-binding sites) complement SeqAPASS's domain-level analysis by identifying key functional regions and predicting their structural characteristics [26].
Binding Site Characterization: For the most precise susceptibility assessments, I-TASSER models enable residue-level comparison of chemical interaction sites, identifying conservation of critical binding residues that may determine species-specific susceptibility [4].
The following diagram illustrates the integrated workflow combining I-TASSER structure prediction with SeqAPASS cross-species susceptibility analysis:
Diagram Title: I-TASSER and SeqAPASS Integrated Workflow
Table 3: Essential Research Tools for Integrated Structural and Susceptibility Analysis
| Tool/Category | Specific Resource | Application in Research |
|---|---|---|
| Structure Prediction Servers | I-TASSER Server | Primary structure modeling platform for generating 3D protein models from sequence |
| Comparative Modeling Tools | MODELLER | Alternative homology modeling approach for template-based structure prediction [28] |
| Cross-Species Extrapolation | SeqAPASS Tool | EPA web-based platform for predicting chemical susceptibility across species [3] |
| Structure Databases | Protein Data Bank (PDB) | Repository of experimentally determined protein structures used as templates in I-TASSER |
| Sequence Databases | NCBI Protein Database | Source of protein sequences for multiple species (>153 million sequences) [3] |
| Visualization Software | PyMOL, UCSF Chimera | Molecular graphics programs for comparative analysis of predicted structures |
| Function Annotation | COFACTOR | I-TASSER component that predicts EC numbers, GO terms, and ligand-binding sites [26] |
A practical application of the I-TASSER and SeqAPASS integration can be illustrated through assessment of cross-species susceptibility to endocrine-disrupting chemicals targeting estrogen receptors. The SeqAPASS tool has been utilized by EPA's Endocrine Disruptor Screening Program to evaluate the potential for chemicals that activate mammalian estrogen receptors to also affect non-mammalian species such as fish, amphibians, and birds [3]. In this context, I-TASSER can generate high-confidence models of estrogen receptor ligand-binding domains across multiple species, enabling comparative analysis of binding pocket architecture and residue conservation that determines chemical responsiveness.
The protocol for such an analysis would involve: (1) retrieving estrogen receptor sequences from human (well-studied) and multiple wildlife species (potentially less-studied); (2) generating I-TASSER models for each species; (3) comparing the predicted ligand-binding sites across models using structural alignment; (4) importing these structural insights into SeqAPASS for systematic cross-species extrapolation. This integrated approach provides a more robust basis for predicting susceptibility than sequence analysis alone, as it accounts for structural features that influence chemical binding but may not be apparent from primary sequence alignment.
In the CASP11 experiment, the integration of QUARK and I-TASSER for ab initio protein structure prediction demonstrated success in modeling free-modeling targets, with five targets successfully constructed with TM-scores above 0.4 [29]. The I-TASSER pipeline successfully modeled 60% more domains with lengths up to 204 residues compared to the QUARK pipeline alone, demonstrating its robustness for a wider range of protein targets [29]. This performance is particularly relevant for cross-species susceptibility research, as it expands the range of proteins that can be accurately modeled, including those from non-model organisms with limited template availability.
The I-TASSER server has been extensively utilized by the research community, with over 20,000 registered scientists from more than 100 countries currently using the platform [27]. This widespread adoption reflects the utility and reliability of the tool for diverse applications, including the cross-species extrapolation approaches central to SeqAPASS-based assessments.
The integration of I-TASSER for protein structure modeling within SeqAPASS-driven cross-species susceptibility research provides a powerful methodological framework that enhances the robustness of chemical safety assessments. By generating high-quality structural models for proteins across diverse taxonomic groups, I-TASSER addresses a critical gap in traditional sequence-based comparisons, enabling researchers to evaluate functional domain conservation and binding site architecture with atomic-level resolution. The protocols outlined in this application note provide researchers with practical methodologies for leveraging these complementary tools, from initial sequence submission to I-TASSER through final integrated analysis with SeqAPASS. As the field of computational toxicology continues to evolve, such integrated approaches will play an increasingly important role in addressing the challenges of cross-species extrapolation, ultimately supporting more informed chemical risk assessment and regulatory decision-making.
Cross-species susceptibility research represents a critical frontier in toxicology, ecotoxicology, and drug development, where understanding how chemicals affect diverse species is paramount for accurate risk assessment. The Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool, developed by the U.S. Environmental Protection Agency (EPA), provides a powerful, computational approach to address the fundamental challenge of extrapolating toxicity information from data-rich model organisms to thousands of non-target species with limited or no toxicity data [3]. This tool operates on the principle that conservation of molecular targets across species can serve as a robust line-of-evidence for predicting relative intrinsic susceptibility to chemical perturbation [4].
The integration of data synthesis and visualization techniques with SeqAPASS analysis transforms complex protein sequence and structural similarity data into actionable insights for research and decision-making. Effective data synthesis allows researchers to combine and condense information derived from SeqAPASS analyses to identify trends, group variations under umbrella concepts, and reduce the complexity of identified elements [30]. Meanwhile, publication-quality visualization enables the clear communication of these synthesized findings to diverse audiences, including researchers, regulators, and stakeholders in drug development. This protocol details comprehensive methodologies for synthesizing SeqAPASS data and generating high-quality visualizations suitable for scientific publications and regulatory submissions.
The SeqAPASS tool is a fast, online screening tool that leverages the extensive National Center for Biotechnology Information (NCBI) protein database, which contains information on over 153 million proteins representing more than 95,000 organisms [3]. This robust database provides the foundational data for cross-species extrapolations. SeqAPASS employs a tiered analytical approach that progresses from broad sequence comparisons to highly specific structural evaluations, with each level providing additional evidence for susceptibility predictions [4].
The SeqAPASS tool conducts evaluations at three distinct levels of complexity, each providing increasing specificity in cross-species susceptibility predictions:
Level 1 - Primary Amino Acid Sequence Comparison: This initial evaluation compares primary amino acid sequences to a query sequence from a known sensitive species, calculating quantitative metrics for sequence similarity and detecting orthologs [4]. The tool automatically determines a susceptibility cut-off based on ortholog determinations, assuming that orthologous proteins share common genetic ancestry and likely maintain similar function [1]. This level is particularly useful for distinguishing broad taxonomic patterns, such as differences between vertebrate and invertebrate susceptibility.
Level 2 - Functional Domain Evaluation: This intermediate analysis examines sequence similarity within selected functional domains, such as ligand-binding domains, which are critical for specific protein-chemical interactions [4]. By focusing on conserved functional regions, this evaluation provides greater specificity in predicting susceptibilities of specified taxonomic groups compared to Level 1 analysis.
Level 3 - Key Amino Acid Residue Position Analysis: This highest-resolution evaluation compares individual amino acid residue positions identified as critical for chemical binding, protein conformation, or other key functions [1] [4]. Level 3 analysis integrates knowledge of protein structure and protein-chemical interaction to enable precise, species-specific susceptibility predictions. The development of consistent rules for interpreting amino acid substitutions at key positions has been enhanced through in silico site-directed mutagenesis coupled with docking simulations [1].
Table 1: SeqAPASS Analysis Levels and Applications
| Analysis Level | Comparison Focus | Resolution | Primary Applications |
|---|---|---|---|
| Level 1 | Primary amino acid sequence | Broad taxonomic patterns | Distinguishing vertebrate vs. invertebrate susceptibility; ortholog detection |
| Level 2 | Functional domains | Intermediate specificity | Predicting susceptibility across taxonomic groups; focusing on conserved functional regions |
| Level 3 | Key amino acid residues | High species-specificity | Precise susceptibility predictions; identifying dramatic species-specific differences |
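To make the Level 3 idea concrete, the sketch below compares residues at a few key positions between a query sequence and target sequences that are assumed to be already aligned to it. It is a minimal illustration and not the SeqAPASS implementation: the sequences, species labels, and key positions are hypothetical, and the strict identity rule is a simplifying assumption (the source notes that dedicated rules exist for interpreting substitutions at key positions).

```python
from typing import Dict, List


def compare_key_residues(query: str, target: str, positions: List[int]) -> Dict[int, bool]:
    """Return {position: residues identical?} for 1-based positions.

    Both sequences are assumed to be pre-aligned to the same length.
    """
    if len(query) != len(target):
        raise ValueError("sequences must be aligned to the same length")
    return {pos: query[pos - 1] == target[pos - 1] for pos in positions}


def all_key_residues_conserved(query: str, target: str, positions: List[int]) -> bool:
    """True only if every key position carries the same residue as the query."""
    return all(compare_key_residues(query, target, positions).values())


# Hypothetical aligned sequences and key positions (illustration only).
QUERY = "MKTWYLSAGE"
TARGETS = {
    "species_A": "MKTWYLSAGE",  # identical to the query
    "species_B": "MKTFYLSAGE",  # substitution at key position 4
    "species_C": "MRTWYLSAGD",  # differences only outside the key positions
}
KEY_POSITIONS = [4, 7]

results = {
    name: all_key_residues_conserved(QUERY, seq, KEY_POSITIONS)
    for name, seq in TARGETS.items()
}
```

Note that `species_C` differs from the query overall yet matches at both key positions, which is the kind of distinction a residue-level comparison can draw and a whole-sequence score cannot.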
Data synthesis is the process of combining and condensing extracted information to identify trends, group variations under umbrella concepts, and reduce the complexity of the identified elements [30]. Applied to SeqAPASS analyses, it turns raw sequence similarity metrics and structural alignment data into patterns and relationships that support cross-species susceptibility predictions.
Calculation techniques allow researchers to create new data points from raw SeqAPASS outputs. These techniques are particularly valuable for deriving metrics that enable cross-study comparisons and quantitative susceptibility assessments:
Sequence Similarity Metrics: Calculate percentage similarity scores between query sequences (from known sensitive species) and target sequences (from species of concern). These metrics provide quantitative measures of conservation that can be correlated with susceptibility potential.
Ortholog Detection Statistics: Implement algorithms to identify orthologous relationships across species, providing evolutionary context for sequence conservation observations. Ortholog detection forms the basis for automatically determined susceptibility cut-offs in SeqAPASS [1].
Taxonomic Distribution Analyses: Compute distribution statistics for similarity scores across taxonomic groups to identify patterns and outliers in susceptibility predictions. These analyses can reveal phylogenetic trends in protein conservation.
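The calculation techniques above can be sketched in a few lines of Python. The example normalizes each hit's alignment score against the query's score when aligned to itself and then summarizes the resulting percentages per taxonomic group. The normalization formula, field names, species labels, and scores are assumptions made for illustration; they are not taken from the source or from SeqAPASS output files.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def percent_similarity(hit_bitscore: float, query_self_bitscore: float) -> float:
    """Hit score as a percentage of the query's self-alignment score (assumed metric)."""
    if query_self_bitscore <= 0:
        raise ValueError("query self-alignment score must be positive")
    return 100.0 * hit_bitscore / query_self_bitscore


def summarize_by_group(hits: List[dict]) -> Dict[str, dict]:
    """Count, minimum, mean, and maximum percent similarity per taxonomic group."""
    grouped = defaultdict(list)
    for hit in hits:
        grouped[hit["group"]].append(hit["percent_similarity"])
    return {
        group: {"n": len(v), "min": min(v), "mean": mean(v), "max": max(v)}
        for group, v in grouped.items()
    }


# Hypothetical scores (illustration only, not real SeqAPASS results).
QUERY_SELF_BITSCORE = 1200.0
raw_hits = [
    ("species_A", "Mammalia", 1140.0),
    ("species_B", "Mammalia", 1080.0),
    ("species_C", "Actinopteri", 780.0),
    ("species_D", "Insecta", 300.0),
]
hits = [
    {"species": s, "group": g,
     "percent_similarity": percent_similarity(b, QUERY_SELF_BITSCORE)}
    for s, g, b in raw_hits
]
summary = summarize_by_group(hits)
```

Groups whose minimum and maximum are far apart are candidates for closer inspection, since a wide spread may indicate outliers or mixed sequence quality within the group.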
Aggregation methods combine SeqAPASS data from different analyses, species, or protein targets to provide comprehensive perspectives on cross-species susceptibility:
Multi-Species Aggregation: Combine susceptibility predictions across multiple species to assess ecosystem-level impacts or identify particularly vulnerable taxonomic groups.
Multi-Chemical Aggregation: Aggregate results from multiple chemicals acting on the same protein target to evaluate the robustness of susceptibility predictions across chemical classes.
Cross-Protein Aggregation: Synthesize results from multiple protein targets within the same adverse outcome pathway to evaluate pathway conservation across species.
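As one possible reading of cross-protein aggregation, the sketch below takes per-target conservation calls for each species and reports the fraction of evaluated targets that are conserved. The data structure, the target and species names, and the use of a simple fraction as the summary statistic are all hypothetical choices for illustration.

```python
from typing import Dict


def conserved_fraction(calls: Dict[str, Dict[str, bool]]) -> Dict[str, float]:
    """Map each species to the fraction of its evaluated targets called conserved.

    Targets that were not evaluated for a species are simply absent from its
    inner dictionary, so they do not count for or against it.
    """
    fractions = {}
    for species, by_target in calls.items():
        if not by_target:
            continue  # nothing evaluated for this species
        fractions[species] = sum(by_target.values()) / len(by_target)
    return fractions


# Hypothetical conservation calls for three targets in one pathway.
calls = {
    "species_A": {"target_1": True, "target_2": True, "target_3": True},
    "species_B": {"target_1": True, "target_2": False, "target_3": True},
    "species_C": {"target_1": False, "target_2": False},  # target_3 not evaluated
}

fractions = conserved_fraction(calls)
ranked = sorted(fractions.items(), key=lambda item: item[1], reverse=True)
```

Keeping unevaluated targets out of the denominator avoids treating missing data as evidence of non-conservation, although it also means fractions for different species may rest on different numbers of targets.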
Visualization techniques facilitate pattern recognition and knowledge discovery from complex SeqAPASS datasets:
Heat Maps: Create cross-tabulations of categorical variables showing the volume or strength of evidence for susceptibility across taxonomic groups and protein targets [31]. Heat maps are particularly effective for identifying knowledge clusters and gaps in susceptibility predictions.
Evidence Atlases: Develop geographical visualizations of studies or susceptibility predictions when spatial context is relevant to the research question [31]. These visualizations can reveal regional patterns in protein conservation or susceptibility.
Conceptual Models: Construct logic models or theories of change that illustrate how sequence conservation translates to susceptibility through molecular interactions [31]. These models help communicate the mechanistic basis for SeqAPASS predictions.
Objective: To conduct broad-scale comparison of primary amino acid sequences across species to identify orthologs and establish baseline susceptibility predictions.
Materials:
Methodology:
Expected Outputs: Sequence similarity metrics, ortholog designations, taxonomic distribution patterns, and preliminary susceptibility classifications.
Objective: To evaluate the impact of specific amino acid substitutions at key positions on protein-chemical interactions using high-resolution analysis.
Materials:
Methodology:
Expected Outputs: High-resolution, species-specific susceptibility predictions, residue conservation patterns, and mechanistic insights into protein-chemical interactions.
Generating publication-quality graphics from SeqAPASS data requires attention to technical specifications, visual clarity, and scientific accuracy. The following workflow ensures production of high-resolution figures suitable for scientific publications.
Before visualization, SeqAPASS data must be structured and cleaned to facilitate effective graphical representation:
The visualization process combines automated outputs from SeqAPASS with specialized graphics software to achieve publication-ready figures:
Adhere to established principles for scientific visualization to enhance clarity and interpretability:
Table 2: Technical Specifications for Publication-Ready Graphics
| Parameter | Minimum Requirement | Optimal Setting | Format Considerations |
|---|---|---|---|
| Resolution | 300 DPI | 600 DPI | Vector formats (PDF, SVG, EMF) preferred for scalability |
| Color Mode | RGB | RGB | Ensure color contrast sufficient for black and white printing |
| Font Size | 8 pt | 9-12 pt | Use sans-serif fonts (Arial, Helvetica) for clarity |
| Line Weight | 0.5 pt | 1-2 pt | Thicker lines for key data series, thinner for gridlines |
| File Size | Variable | <10 MB | Balance quality with practical file size limitations |
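For raster exports, the resolution values in Table 2 translate directly into pixel dimensions: pixels equal physical size in inches multiplied by DPI. The short sketch below does that arithmetic and checks an existing image against the 300 DPI minimum. The function names and the 3.5 by 2.5 inch figure size are hypothetical; actual size requirements come from the target journal.

```python
from typing import Tuple

MIN_DPI = 300      # minimum requirement from Table 2
OPTIMAL_DPI = 600  # optimal setting from Table 2


def pixels_required(width_in: float, height_in: float, dpi: int) -> Tuple[int, int]:
    """Pixel dimensions needed to print a figure of the given size at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


def meets_minimum_resolution(width_px: int, width_in: float, min_dpi: int = MIN_DPI) -> bool:
    """True if an image of width_px pixels printed at width_in inches reaches min_dpi."""
    return width_px / width_in >= min_dpi


# Hypothetical figure size (illustration only).
size_at_minimum = pixels_required(3.5, 2.5, MIN_DPI)
size_at_optimal = pixels_required(3.5, 2.5, OPTIMAL_DPI)
too_small = meets_minimum_resolution(1000, 3.5)
```

Vector formats such as PDF or SVG are resolution independent, so this check only matters when a figure is exported as a raster image.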
Table 3: Essential Research Reagents and Computational Tools for SeqAPASS Analysis
| Reagent/Tool | Function | Application Context |
|---|---|---|
| NCBI Protein Database | Provides reference protein sequences | Source of >153 million protein sequences for cross-species comparisons [3] |
| SeqAPASS Web Application | Core analysis platform for cross-species susceptibility prediction | Primary tool for sequence alignment and susceptibility prediction at three levels of complexity [3] |
| CompTox Chemicals Dashboard | Chemical characterization and toxicity data source | Interoperable with SeqAPASS for extrapolating mammalian-based high-throughput assay data [3] |
| In Silico Docking Software | Molecular docking simulations | Validating key amino acid residues and protein-chemical interactions (e.g., for AChE and EcR) [1] |
| Protein Crystal Structures | Reference structures for key protein targets | Enables identification of key amino acid residues for Level 3 analysis [1] |
| Taxonomic Classification Tools | Standardized species identification | Ensures consistent taxonomic categorization across analyses |
The integration of data synthesis and visualization techniques with SeqAPASS analysis has enabled significant advances in multiple domains of cross-species susceptibility research:
SeqAPASS has been applied to evaluate potential endocrine-disrupting effects across species, particularly through analysis of estrogen receptor conservation. Researchers used SeqAPASS to determine the degree to which data generated to evaluate chemical activation in mammalian systems can be translated to non-mammalian species such as fish, amphibians, and birds [3]. This application helps prioritize testing to assess human health and ecological risks of estrogenic chemicals, with synthesized data visualization facilitating communication of findings to diverse stakeholders.
In pesticide development and ecological risk assessment, SeqAPASS has proven valuable for predicting susceptibility to insecticides across non-target species. Case studies have focused on:
SeqAPASS analyses, supported by appropriate data synthesis and visualization, inform chemical prioritization for more extensive testing and guide species selection for toxicity tests. By identifying taxonomic groups with high potential susceptibility based on protein conservation, researchers can focus testing resources on the most relevant species and endpoints [4]. The synthesis of SeqAPASS data with chemical characterization information from tools like the CompTox Chemicals Dashboard further enhances these prioritization efforts [3].
Effective data synthesis and visualization are essential components of robust cross-species susceptibility research using the SeqAPASS tool. The methodologies and protocols outlined in this document provide researchers with comprehensive guidance for transforming complex sequence alignment data into clear, actionable insights through publication-quality graphics and synthesized reports. By implementing these standardized approaches, researchers can enhance the scientific rigor, reproducibility, and communication impact of their SeqAPASS investigations, ultimately supporting more informed decisions in chemical risk assessment, drug development, and environmental protection.
As the field of computational toxicology continues to evolve, ongoing refinement of data synthesis algorithms and visualization techniques will further strengthen the application of SeqAPASS in predicting cross-species susceptibility. Future directions include enhanced integration with high-throughput screening data, improved structural modeling capabilities, and more sophisticated interactive visualization platforms to support real-time exploration of complex cross-species relationships.
The global decline of honeybee populations poses a significant threat to agricultural productivity and ecosystem stability. Neonicotinoid insecticides (NNIs) have been implicated as a contributing factor to this decline, though their specific impacts show considerable variation across studies and bee populations [33]. This case study explores the application of the SeqAPASS (Sequence Alignment to Predict Across Species Susceptibility) tool, developed by the U.S. Environmental Protection Agency, to predict honey bee susceptibility to NNIs within a broader framework of cross-species extrapolation research [3]. By integrating computational predictions with empirical laboratory and field data, this research provides a mechanistic understanding of the genetic and molecular factors driving differential sensitivity to these pesticides, offering a more refined approach to ecological risk assessment.
Neonicotinoids are systemic insecticides that are absorbed by plants and distributed throughout their tissues, including pollen and nectar, creating potential exposure routes for pollinators [34] [35]. As neurotoxicants, they act as agonists of nicotinic acetylcholine receptors (nAChRs) in the insect nervous system [36]. The primary concern for honeybees revolves around both lethal and a wide range of sublethal effects, including impaired foraging efficiency, reduced olfactory learning, cognitive difficulties, and diminished colony reproductive success [34] [35]. A critical characteristic of NNIs is the observed discrepancy between acute and chronic toxicity, with chronic exposure to low doses sometimes proving more harmful than acute exposure [37].
Substantial variation in NNI sensitivity exists among different honeybee colonies and subspecies. For example, Italian honeybees (Apis mellifera ligustica) have been shown to be 34 times more sensitive to imidacloprid than Carniolan bees (A. m. carnica) [38]. This variability complicates generalized risk assessments and highlights the need for tools that can predict susceptibility at a more refined genetic and population level.
The SeqAPASS web-based tool addresses the challenge of extrapolating toxicity information from data-rich species to thousands of non-target species with limited or no toxicity data [3] [4].
SeqAPASS predicts cross-species susceptibility by evaluating the similarity of amino acid sequences and protein structures that interact with chemicals. Its analysis is structured in three tiers, each providing an additional line of evidence:
This flexible, tiered approach allows researchers to capitalize on existing information about chemical-protein interactions in known sensitive species to predict susceptibility in other species [3].
SeqAPASS has been specifically applied to evaluate the potential chemical susceptibility of honey bees and other insect pollinators. For NNIs, the tool can be used to compare the nicotinic acetylcholine receptor subunits, the target site of NNIs, across bee species and other insects to predict which species possess the protein targets necessary for chemical interaction and are therefore potentially susceptible [3] [4].
This case study synthesizes data from a combination of computational, laboratory, and field-based methodologies to provide a comprehensive assessment.
The initial assessment involves using SeqAPASS to identify the presence and similarity of known NNI target proteins (nAChR subunits) and key detoxification enzymes (CYP9Q subfamily) in honeybees compared to other species [3] [4]. This helps establish a baseline molecular understanding of potential susceptibility.
Field studies are critical for understanding real-world exposure and effects. A representative study design involves:
Controlled laboratory experiments are essential for isolating genetic factors.
Data from field and laboratory studies are integrated into the BEEHAVE simulation model. This mechanistic model links in-hive dynamics with external factors like land use and weather to project how individual-level effects from pesticide exposure translate to long-term colony-level outcomes [34] [39].
Diagram 1: Integrated workflow for predicting honey bee susceptibility to neonicotinoids, combining computational, field, and laboratory approaches.
Patriline-based analysis demonstrated a significant genetic component to NNI tolerance. The broad-sense heritability (H²) of survival after acute clothianidin exposure was calculated at 37.8% [38]. This confirms that genetic differences among bees substantially influence their ability to withstand pesticide exposure.
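For readers less familiar with the statistic, broad-sense heritability is conventionally defined as the share of total phenotypic variance that is attributable to genetic variance. The expression below is the textbook definition, stated under the usual simplifying assumption that phenotypic variance is the sum of a genetic and an environmental component. The symbols $V_G$, $V_E$, and $V_P$ do not appear in the source, and how the cited study estimated these components from patriline data is not described here.

```latex
H^2 = \frac{V_G}{V_P} = \frac{V_G}{V_G + V_E},
\qquad
H^2 = 0.378 \;\Rightarrow\; V_G \approx 0.378\, V_P
```

Read this way, the reported value indicates that a little over one third of the observed variation in survival is associated with genetic differences among the bees tested.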
Table 1: Survival Outcomes of Honeybee Patrilines Exposed to Acute Clothianidin (29 ppb)
| Colony ID | Total Workers Tested | Overall Mortality at 24h | Number of Patrilines Identified | Statistical Significance of Patriline Effect on Survival |
|---|---|---|---|---|
| Colony 36 | 247 | 28% (69/247) | 26 | χ² = 57.842, df = 25, p < 0.001 |
| Colony 37 | 249 | 16% (40/249) | 21 | χ² = 35.387, df = 20, p = 0.029 |
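The statistics in this table have the form of a chi-square test of independence between patriline and survival outcome, with degrees of freedom equal to the number of patrilines minus one. The sketch below shows how such a statistic is computed from a contingency table. The per-patriline counts are hypothetical, since the source reports only colony totals, and the exact test used in the cited study is an assumption here. Only the mortality percentages and degrees of freedom are checked against the table.

```python
from typing import List, Tuple


def chi_square_independence(table: List[List[int]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical counts: rows are patrilines, columns are (dead, alive).
example_table = [[12, 18], [5, 25], [3, 27]]
stat, df = chi_square_independence(example_table)

# Values that can be checked against the table above.
mortality_36 = round(100 * 69 / 247)  # percent mortality, Colony 36
mortality_37 = round(100 * 40 / 249)  # percent mortality, Colony 37
df_colony_36 = (26 - 1) * (2 - 1)     # 26 patrilines x 2 outcomes
df_colony_37 = (21 - 1) * (2 - 1)     # 21 patrilines x 2 outcomes
```

With many patrilines and modest sample sizes, some expected cell counts can become small, in which case an exact or permutation-based test may be a safer choice than the chi-square approximation.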
Gene expression and genotyping studies pinpointed the molecular basis of observed tolerance:
Field studies using AI monitoring confirmed that sublethal NNI exposure translates to measurable performance deficits:
Table 2: Summary of Sublethal Effects of Neonicotinoid Exposure on Honeybees
| Effect Level | Observed Sublethal Effect | Implication for Colony Health |
|---|---|---|
| Molecular | Altered CYP9Q gene haplotypes; Apoptosis in Malpighian tubules [38] | Determines intrinsic metabolic capacity to detoxify NNIs. |
| Individual | Impaired olfactory learning and memory [34] [35] | Reduced foraging efficiency and navigational ability. |
| Individual | Longer foraging trip duration; Reduced number of trips [34] | Lower per-bee resource collection rate. |
| Individual | Increased drifting between hives [34] | Potential spread of disease and social disruption. |
| Colony | Reduced pollen collection [34] [39] | Poorer nutrition, potentially affecting larval development and overwintering success. |
| Colony | Reduced social immunity (hygienic behavior) [35] | Increased susceptibility to diseases and parasites. |
| Colony | Queen loss and reduced reproductive success [35] | Lower colony growth and sustainability. |
Table 3: Essential Materials and Reagents for Investigating Bee Susceptibility to NNIs
| Item / Reagent | Function / Application |
|---|---|
| SeqAPASS Online Tool | Web-based platform for initial cross-species susceptibility prediction based on protein target conservation [3] [4]. |
| Neonicotinoid Standards (e.g., Imidacloprid, Clothianidin) | High-purity chemical standards for creating precise dosing solutions in lab bioassays and field feeder solutions [34] [38]. |
| Microsatellite Markers | Panels of polymorphic DNA markers for genotyping individual bees and assigning them to patrilines within a colony [38]. |
| RNA Extraction & Sequencing Kits | Reagents for isolating high-quality RNA from bee tissues (brain, midgut, Malpighian tubules) for transcriptional profiling via RNA-Seq [38]. |
| AI-Based Monitoring System | Automated camera and software system for continuous, high-resolution tracking of bee foraging activity at hive entrances [34] [39]. |
| BEEHAVE Simulation Model | Open-source, mechanistic simulation platform to model honeybee colony dynamics and project long-term impacts of stressors like pesticides [34]. |
| PCR Reagents & Sanger Sequencing | For amplifying and sequencing specific candidate genes (e.g., CYP9Q1, CYP9Q3) to identify resistance-associated haplotypes and mutations [38]. |
Objective: To quantify the effects of sublethal neonicotinoid exposure on honeybee pollen foraging behavior at individual and colony levels.
Materials:
Procedure:
Objective: To determine the heritability of NNI tolerance and identify associated genetic markers.
Materials:
Procedure:
Diagram 2: Experimental workflow for determining the heritability of neonicotinoid tolerance and identifying associated genetic markers in honeybees.
This integrated approach, combining the predictive power of SeqAPASS with empirical field data, AI-driven behavioral analysis, and molecular genetics, provides a robust framework for understanding and predicting honeybee susceptibility to NNIs. The findings confirm that:
For researchers and regulators, this case study highlights the utility of the SeqAPASS tool as an initial screening mechanism within a broader, multi-faceted risk assessment strategy. It enables a more nuanced, mechanism-based understanding of pesticide susceptibility that moves beyond one-size-fits-all assessments, ultimately supporting the development of more pollinator-protective pesticide policies and the identification of bee lineages with naturally higher resilience.
The Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool is a fast, freely available, online screening tool developed by the U.S. Environmental Protection Agency (EPA) to address the enduring challenge of evaluating chemical safety across the diversity of species potentially impacted by chemical exposures [5]. Its fundamental operating principle is that a species' relative intrinsic susceptibility to a particular chemical can be predicted by evaluating the conservation of the specific protein targets with which that chemical interacts [5] [4]. SeqAPASS leverages publicly available protein sequence and structural information, primarily from the National Center for Biotechnology Information (NCBI) database, which contains information on over 153 million proteins representing more than 95,000 organisms, to allow researchers and regulators to extrapolate toxicity information from data-rich model organisms (e.g., humans, mice, rats, zebrafish) to thousands of other non-target species for which toxicity data are limited or nonexistent [3]. The tool has evolved significantly since its initial public release in 2016, with annual version releases incorporating new features and capabilities, such as advanced data visualization, interoperability with other databases, and protein structure prediction [5].
The domain of applicability for SeqAPASS encompasses the use of protein conservation as a critical line of evidence for cross-species extrapolation within a weight-of-evidence approach for chemical safety evaluations. It is designed for use in prioritization of chemicals for further testing, selection of appropriate species for testing, extrapolation of empirical toxicity data, and assessment of the cross-species relevance of Adverse Outcome Pathways (AOPs) [4] [40]. The tool is uniquely flexible, allowing evaluations to be tailored based on the amount of available information regarding the chemical-protein or protein-protein interaction of interest, moving from primary amino acid sequence evaluations to considerations of three-dimensional protein structure [3]. Understanding the precise capabilities and limitations of this bioinformatic tool is essential for its appropriate application in regulatory and research contexts aimed at protecting both human health and the environment.
SeqAPASS is designed to generate predictions of relative intrinsic chemical susceptibility across species through a tiered, evidence-driven workflow. Its capabilities are structured across multiple levels of analysis, each providing increasing taxonomic resolution.
The following workflow diagram illustrates the hierarchical and iterative nature of a SeqAPASS evaluation, from initial query to the integration of additional lines of evidence.
The capabilities of SeqAPASS have been demonstrated in numerous peer-reviewed case studies, validating its utility for both regulatory and research purposes. Key application areas include:
Table 1: Key Application Areas of SeqAPASS with Representative Examples
| Application Area | Biological Target | Chemicals of Interest | Representative Species Evaluated | Primary Citation |
|---|---|---|---|---|
| Endocrine Disruption | Estrogen Receptor, Androgen Receptor | Diverse environmental estrogens and androgens | Human, fish, amphibians, birds | [3] [42] |
| Insecticide Development & Pollinator Risk | Ecdysone Receptor, Nicotinic Acetylcholine Receptor | Molt-accelerating compounds (e.g., methoxyfenozide), Neonicotinoids | Tobacco budworm, honey bee, other insects | [4] [3] |
| Fungicide Toxicity | Mitochondrial bc1 Complex | Strobilurin fungicides | Various fungi, non-target species | [4] |
Despite its powerful capabilities, the domain of applicability for SeqAPASS is bounded by several important constraints. Recognizing these limitations is critical for avoiding the misuse or over-interpretation of its predictions.
This protocol outlines the standard workflow for utilizing SeqAPASS to predict cross-species susceptibility based on protein sequence conservation.
Access SeqAPASS at https://seqapass.epa.gov/seqapass using the Chrome web browser. Log in with an existing account or create a new one, which allows for job storage and customization [5].
For cases requiring a deeper functional context, the following advanced protocol integrates structural modeling and docking, as demonstrated for the Androgen Receptor (AR) [42].
The relationship between AOPs, molecular initiating events, and the SeqAPASS evaluation is a critical conceptual framework for users, as illustrated below.
Successfully applying SeqAPASS and interpreting its results requires the use of a suite of bioinformatic databases, software, and computational tools. The following table details key "research reagents" essential for work in this field.
Table 2: Essential Computational Reagents for Cross-Species Extrapolation Research
| Resource Name | Type | Primary Function in Analysis | Relevance to SeqAPASS |
|---|---|---|---|
| NCBI Protein Database | Database | Repository of curated protein sequence data. | Primary source for sequence data used in Levels 1-3 analysis [5] [3]. |
| I-TASSER | Software Tool | Platform for protein structure prediction from amino acid sequences. | Integrated into SeqAPASS v7.0+ to generate structural models for Level 4 analysis [42] [15]. |
| AlphaFold Database | Database | Repository of highly accurate, predicted protein structures. | Source of pre-computed structures; can be used to supplement or validate I-TASSER models [42] [41]. |
| ECOTOX Knowledgebase | Database | Curated repository of experimental toxicity data for aquatic and terrestrial species. | Used for interoperability; allows comparison of SeqAPASS predictions with empirical toxicity results [5] [3]. |
| CompTox Chemicals Dashboard | Database & Toolbox | Provides access to chemistry, toxicity, and bioactivity data for chemicals. | Helps identify potential protein targets and provides context on chemicals of interest [5] [3]. |
| Molecular Docking Software (e.g., AutoDock, Glide) | Software Tool | Simulates the binding pose and affinity of a small molecule to a protein target. | Used in advanced protocols to evaluate chemical binding to protein models generated via SeqAPASS [42]. |
| RCSB Protein Data Bank (PDB) | Database | Archive of experimentally determined 3D structures of proteins and nucleic acids. | Source of reference structures for critical residues (Level 3) and for validating docking protocols [42]. |
SeqAPASS represents a significant advancement in the toolbox of predictive toxicology, offering a robust, flexible, and scientifically grounded method for addressing the complex challenge of cross-species extrapolation. Its domain of applicability is clearly defined: it excels at using protein sequence and structural conservation as a critical line of evidence for predicting potential intrinsic chemical susceptibility. When used appropriately within its scope, and complemented by an understanding of its limitations regarding toxicokinetics, whole-organism biology, and data dependencies, it provides invaluable support for chemical prioritization, testing strategy design, and the evaluation of AOP applicability. As the tool continues to evolve, particularly through the integration of structural biology and advanced molecular modeling, its capacity to provide high-resolution, taxonomically specific predictions will only increase, further solidifying its role in the future of chemical safety evaluation.
Within the framework of cross-species susceptibility research using the SeqAPASS tool, two persistent computational challenges significantly impact the reliability of extrapolations: handling incomplete protein sequences and achieving sufficient taxonomic resolution. The Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool, developed by the U.S. Environmental Protection Agency, addresses these challenges by employing a tiered evaluation system to predict chemical susceptibility across diverse species where toxicity data is limited [3] [4]. This application note details standardized protocols to navigate these specific obstacles, ensuring robust and interpretable results for researchers, scientists, and drug development professionals. The core strength of SeqAPASS lies in its ability to use data-rich model organisms (e.g., humans, rats, zebrafish) as a basis for predicting chemical susceptibility in thousands of other plants and animals by evaluating the conservation of specific protein targets [3] [44].
SeqAPASS is a fast, online screening tool that calculates a metric for sequence similarity through three progressive levels of analysis, each providing greater taxonomic resolution and requiring more specific knowledge about the chemical-protein interaction [5] [4]. The following workflow illustrates the logical relationship and data flow between these tiers.
| Analysis Level | Primary Function | Addresses Incomplete Sequences | Improves Taxonomic Resolution |
|---|---|---|---|
| Level 1: Primary Sequence | Compares full-length amino acid sequences to identify orthologs and calculate overall similarity [5] [4]. | Provides a baseline prediction even with partial sequence data. | Broadly groups species by global sequence similarity. |
| Level 2: Functional Domains | Evaluates sequence similarity within specific functional domains (e.g., ligand-binding domain) [5] [4]. | Focuses analysis on key functional units, mitigating issues from incomplete N- or C-terminal regions. | Increases resolution by distinguishing species based on domain-level conservation. |
| Level 3: Critical Residues | Compares individual amino acid residues critical for protein conformation or chemical binding [5] [4] [19]. | Enables predictions based on minimal, yet critical, sequence data. | Enables highest resolution, predicting susceptibility differences between closely related species. |
Incomplete protein sequences in public databases can lead to false-negative predictions. This protocol outlines steps to mitigate this issue.
Input and Level 1 Analysis:
Evaluate Data Quality in Level 1 Output:
Mitigate with Level 2 Analysis:
Leverage Level 3 for Critical Residues:
Integrate Empirical Evidence:
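For the data-quality check in the protocol above, one simple screen is to flag hits that look incomplete before interpreting a low similarity score as a lack of conservation. The sketch below flags a hit when its title contains the word "partial" or when it is much shorter than the query. The field names, accession labels, sequence lengths, and the 0.8 length threshold are hypothetical and chosen only for illustration; they do not reflect the format of SeqAPASS output.

```python
from typing import List, Tuple


def flag_possibly_incomplete(
    hits: List[dict], query_length: int, min_length_fraction: float = 0.8
) -> List[Tuple[str, List[str]]]:
    """Return (accession, reasons) for hits that may be partial sequences."""
    flagged = []
    for hit in hits:
        reasons = []
        if "partial" in hit["title"].lower():
            reasons.append("title marked partial")
        if hit["length"] < min_length_fraction * query_length:
            reasons.append("short relative to query")
        if reasons:
            flagged.append((hit["accession"], reasons))
    return flagged


# Hypothetical hits (illustration only; accessions are placeholders).
QUERY_LENGTH = 614
hits = [
    {"accession": "HYPO_0001", "title": "acetylcholinesterase", "length": 610},
    {"accession": "HYPO_0002", "title": "acetylcholinesterase, partial", "length": 590},
    {"accession": "HYPO_0003", "title": "acetylcholinesterase-like", "length": 210},
]

flagged = flag_possibly_incomplete(hits, QUERY_LENGTH)
```

Flagged hits are better treated as inconclusive than as negative predictions, and they are natural candidates for the domain-level and residue-level checks that need only part of the sequence.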
Low taxonomic resolution limits precise species-specific risk assessments. This protocol uses the tiered SeqAPASS analysis to enhance resolution from broad groupings to specific predictions.
Establish Baseline with Level 1:
Apply Customizable Cut-offs:
Refine with Level 2 Domain Comparison:
Maximize Resolution with Level 3:
Visualize and Synthesize Data:
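The role of a susceptibility cut-off in the steps above can be illustrated with a small sketch. Here the default cut-off is taken as the lowest percent similarity among hits flagged as ortholog candidates, and a user-supplied value overrides it. That rule is a deliberate simplification for illustration: the source states only that the automatic cut-off is based on ortholog determinations. The field names, species labels, and scores are likewise hypothetical.

```python
from typing import Dict, List, Optional


def derive_cutoff(hits: List[dict]) -> float:
    """Lowest percent similarity among ortholog candidates (simplified rule)."""
    scores = [h["percent_similarity"] for h in hits if h["ortholog_candidate"]]
    if not scores:
        raise ValueError("no ortholog candidates to derive a cut-off from")
    return min(scores)


def susceptibility_calls(hits: List[dict], cutoff: Optional[float] = None) -> Dict[str, bool]:
    """Call each species as at/above (True) or below (False) the cut-off."""
    threshold = derive_cutoff(hits) if cutoff is None else cutoff
    return {h["species"]: h["percent_similarity"] >= threshold for h in hits}


# Hypothetical Level 1 style results (illustration only).
hits = [
    {"species": "species_A", "percent_similarity": 95.0, "ortholog_candidate": True},
    {"species": "species_B", "percent_similarity": 72.0, "ortholog_candidate": True},
    {"species": "species_C", "percent_similarity": 74.0, "ortholog_candidate": False},
    {"species": "species_D", "percent_similarity": 40.0, "ortholog_candidate": False},
]

default_calls = susceptibility_calls(hits)             # derived cut-off of 72.0
stricter_calls = susceptibility_calls(hits, cutoff=80)  # user-defined cut-off
```

Comparing the two sets of calls shows how sensitive a prediction is to the chosen threshold: species whose call flips between settings are the ones that most need Level 2 or Level 3 evidence.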
The following table summarizes quantitative data from published case studies that demonstrate the application of these protocols to overcome challenges and achieve specific toxicological predictions.
| Case Study / Protein Target | Challenge Addressed | SeqAPASS Level(s) Used | Key Quantitative Metric/Result | Taxonomic Resolution Achieved |
|---|---|---|---|---|
| Acetylcholinesterase (AChE) [19] | Taxonomic resolution for chemical binding | Level 3 (Critical Residues) | In silico mutagenesis informed specific residue positions; Raw data for levels 1-3 publicly available (DOI: 10.5061/dryad.2tg6967) [19]. | Differentiated susceptibility based on specific residue conservation in the active site. |
| Ecdysone Receptor (EcR) [4] [19] | Predicting non-target species susceptibility | Levels 1, 2, and 3 | Successfully predicted susceptibility of larval pests (e.g., tobacco budworm) and lack of susceptibility in non-targets (e.g., honey bees, earthworms) [3] [4]. | Distinguished between target pests and non-target invertebrates. |
| Nicotinic Acetylcholine Receptor (nAChR) [3] [4] | Pollinator susceptibility to insecticides | Levels 1, 2, and 3 | Tool used to evaluate potential chemical susceptibility of honey bees and other insects for which toxicity data was lacking [3] [4]. | Provided susceptibility predictions across diverse insect species, including bees. |
| Transthyretin & Opioid Receptor [5] | Protein conservation for cross-species extrapolation | Protocol demonstration for all levels | A published protocol demonstrates the application of SeqAPASS v2.0-6.1 for analyzing protein conservation for these targets [5]. | Showcased the workflow for achieving increasing resolution. |
This table details the key computational and data resources essential for conducting the protocols described in this application note.
| Resource Name | Type | Function in SeqAPASS Protocol | Source/Availability |
|---|---|---|---|
| NCBI Protein Database | Data Repository | Source of over 153 million protein sequences for cross-species comparison; forms the backend data for SeqAPASS analysis [3]. | Publicly available via National Center for Biotechnology Information |
| BLASTp Executable | Algorithm | The protein Basic Local Alignment Search Tool used for Level 1 primary amino acid sequence comparisons and ortholog identification [5]. | Integrated into SeqAPASS backend |
| COBALT Executable | Algorithm | Used for multiple sequence alignments within the tool, supporting the comparative analysis [5]. | Integrated into SeqAPASS backend |
| CompTox Chemicals Dashboard | Database | Provides links to help identify query proteins and allows interoperability where SeqAPASS results for ToxCast assay targets can be obtained [3] [5]. | US EPA; Linked from SeqAPASS interface |
| ECOTOX Knowledgebase | Database | Integrated via a widget (v6.0+) to rapidly connect sequence-based predictions with existing curated empirical toxicity data for terrestrial and aquatic species [5]. | US EPA; Linked from SeqAPASS Level 1 results |
| AOP-Wiki | Knowledge Repository | Provides links to help define the biological context and molecular initiating events within Adverse Outcome Pathways for a query protein [5]. | Linked from SeqAPASS interface |
Within the paradigm of modern computational toxicology, the Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool developed by the U.S. Environmental Protection Agency (EPA) represents a significant leap forward. It addresses the enduring challenge of extrapolating chemical susceptibility from data-rich model species to thousands of other plants and animals for which toxicity data are limited or absent [3] [5] [6]. The tool operates on the fundamental principle that a species' intrinsic susceptibility to a chemical is largely determined by the conservation of specific protein targets with which that chemical interacts [3]. The accuracy and taxonomic resolution of these predictions, however, are highly dependent on the judicious selection of susceptibility cut-offs and analytical parameters. This protocol provides a detailed guide for researchers and drug development professionals to optimize these settings, thereby enhancing the reliability of cross-species extrapolations for chemical safety and pharmaceutical development.
SeqAPASS conducts a tiered evaluation of protein conservation, with each level providing greater resolution and requiring more specific input knowledge. The parameters and cut-offs are applied differently at each stage.
The SeqAPASS tool structures its analysis across three progressive levels, each offering a deeper investigation into protein conservation. The following workflow illustrates the logical sequence of a full SeqAPASS analysis, from data input through the three levels of evaluation to the final prediction.
Diagram 1: The logical workflow for a comprehensive SeqAPASS analysis, showing the progression through its three primary levels.
The following step-by-step protocol guides users through the process of running a SeqAPASS query, with a focused emphasis on how to select and optimize the critical susceptibility cut-offs at each level.
Access SeqAPASS at https://seqapass.epa.gov/seqapass using the Chrome web browser. Log in with an existing account or create a new one. A user account is essential for saving, storing, and customizing jobs [5].
The Level 1 cut-off is a primary filter that determines the minimum sequence similarity required for a species to be predicted as potentially susceptible.
Table 1: Key Statistical Parameters for Level 1 Density Plot Interpretation
| Parameter | Description | Role in Cut-off Selection |
|---|---|---|
| Percentage Identity (%ID) | The percentage of identical amino acids in the aligned sequence compared to the query sequence. | The primary metric for the global susceptibility cut-off. |
| E-value | The number of expected hits by chance; lower E-values indicate more significant alignments. | A secondary filter; a default of 1e-10 is often used, but can be adjusted [5]. |
| Bit Score | A normalized score from the BLAST algorithm indicating alignment quality. | Used internally by SeqAPASS to rank ortholog candidates [5]. |
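The way these parameters combine into a Level 1 filter can be illustrated with a minimal sketch. It is not SeqAPASS code: the `Hit` record, the toy alignments, and the 60% identity cut-off are hypothetical, and in practice the cut-off would be read from the density plot rather than fixed in advance.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One aligned hit; fields are hypothetical, not SeqAPASS output columns."""
    species: str
    query_aln: str  # aligned query residues, '-' marks a gap
    hit_aln: str    # aligned hit residues, same length as query_aln
    e_value: float


def percent_identity(query_aln: str, hit_aln: str) -> float:
    """Identical aligned residues as a percentage of the alignment length."""
    if len(query_aln) != len(hit_aln):
        raise ValueError("aligned sequences must have equal length")
    if not query_aln:
        return 0.0
    matches = sum(1 for q, h in zip(query_aln, hit_aln) if q == h and q != "-")
    return 100.0 * matches / len(query_aln)


def passes_level1(hit: Hit, id_cutoff: float, e_cutoff: float = 1e-10) -> bool:
    """Keep a hit only if it clears both the E-value and the identity cut-off."""
    identity = percent_identity(hit.query_aln, hit.hit_aln)
    return hit.e_value <= e_cutoff and identity >= id_cutoff


# Toy alignments for illustration only; these are not real protein sequences.
hits = [
    Hit("species A", "MKTAYIAK", "MKTAYIAK", 1e-50),
    Hit("species B", "MKTAYIAK", "MKSAYLAR", 1e-12),
    Hit("species C", "MKTAYIAK", "MRSGFLAR", 1e-3),
]
kept = [h.species for h in hits if passes_level1(h, id_cutoff=60.0)]
print(kept)
```

Keeping the E-value as a separate, secondary condition mirrors the table above: a hit must be statistically meaningful before its identity is compared with the susceptibility cut-off.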
For more precise predictions, users can proceed to higher levels of analysis, which introduce additional, specific parameters.
The following table details the key computational tools and databases that are integral to the SeqAPASS tool's operation, constituting the essential "research reagents" for these analyses.
Table 2: Essential Research Reagents and Resources for SeqAPASS Analysis
| Resource Name | Type | Function in Analysis |
|---|---|---|
| NCBI Protein Database | Database | The primary source of over 153 million protein sequences from more than 95,000 organisms used for comparative analysis [3]. |
| BLASTP Algorithm | Software Algorithm | The core engine for performing primary amino acid sequence comparisons (Level 1 analysis) and identifying potential orthologs [5]. |
| COBALT Algorithm | Software Algorithm | Used for multiple sequence alignments, particularly in the context of Level 2 functional domain and Level 3 critical residue analyses [5]. |
| Conserved Domain Database (CDD) | Database | Provides the functional domain models used to define the regions of interest for a Level 2 analysis [5]. |
| CompTox Chemicals Dashboard | Database | A resource to help identify molecular targets for chemicals of interest, aiding in the selection of the initial query protein [3] [5]. |
| ECOTOX Knowledgebase | Database | An integrated resource that allows users to compare SeqAPASS sequence-based predictions with existing empirical toxicity data for validation [5]. |
Staying informed of the latest features is crucial for optimal parameter selection. SeqAPASS is under active development, with recent versions introducing powerful new capabilities.
The strategic selection of susceptibility cut-offs and parameters is not a mere procedural step but a critical, decision-driving process in cross-species extrapolation. By following the detailed protocol outlined above, beginning with a careful evaluation of the Level 1 density plot, advancing through domain-specific alignments, and culminating in the precise specification of critical residues, researchers can transform SeqAPASS from a screening tool into a robust predictive model. The continued evolution of the tool, particularly with the integration of structural biology, further empowers scientists to make more confident, evidence-based predictions of chemical susceptibility across the tree of life, thereby strengthening ecological risk assessment and supporting the development of safer chemicals and pharmaceuticals.
Prediction of cross-species chemical susceptibility has traditionally relied on protein sequence similarity (e.g., via the SeqAPASS tool) [3] [4]. However, real-world sensitivity is also governed by toxicokinetics (absorption, distribution, metabolism, excretion) and exposure dynamics, which introduce species-specific variability beyond sequence conservation [45] [46]. These factors explain why chemicals like per- and polyfluoroalkyl substances (PFAS) exhibit longer half-lives in humans than in rodents, despite conserved protein targets [45]. This document outlines experimental protocols and data integration strategies to augment SeqAPASS-driven predictions with toxicokinetic and exposure data, enabling more robust risk assessments.
| Parameter | Human | Mouse | Rat | Notes |
|---|---|---|---|---|
| Half-life (PFOA) | 2.3–3.8 years | 17–20 days | 14–21 days | Renal clearance lower in humans [45]. |
| Half-life (PFOS) | 4.3–5.4 years | 25–30 days | 22–28 days | Influenced by organic anion transporters [45]. |
| Hepatic Accumulation | High (LC-PFCAs) | Moderate | Low | LC-PFCAs = Long-chain perfluoroalkyl carboxylic acids [45]. |
| Primary Exposure Route | Food (>50%) | Controlled diet | Controlled diet | Seafood is a major source in Japan [45]. |
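To make the scale of these half-life differences concrete, the sketch below converts half-lives into elimination rate constants and the fraction of a dose remaining after a given time. It assumes simple first-order, single-compartment elimination, which is a simplification introduced here and not a model taken from the cited work; the midpoints of the PFOA ranges in the table are used purely for illustration.

```python
import math

DAYS_PER_YEAR = 365.25


def elimination_rate(half_life_days: float) -> float:
    """First-order rate constant k = ln(2) / t_half, in 1/day (assumed model)."""
    return math.log(2) / half_life_days


def fraction_remaining(half_life_days: float, elapsed_days: float) -> float:
    """Fraction of the initial body burden left after elapsed_days."""
    return 0.5 ** (elapsed_days / half_life_days)


# Midpoints of the PFOA ranges in the table above (illustrative only).
human_half_life = (2.3 + 3.8) / 2 * DAYS_PER_YEAR  # days
mouse_half_life = (17 + 20) / 2                    # days

ratio = human_half_life / mouse_half_life
print(f"human/mouse half-life ratio: {ratio:.0f}")
print(f"human fraction left after 30 days: {fraction_remaining(human_half_life, 30):.3f}")
print(f"mouse fraction left after 30 days: {fraction_remaining(mouse_half_life, 30):.3f}")
```

Under this assumption, a conserved protein target tells only part of the story: the same external dose leads to very different internal exposure over time in the two species.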
| Level | Analysis Focus | Application Example |
|---|---|---|
| 1 | Primary amino acid sequence similarity | Ortholog detection for estrogen receptor across mammals/fish [3] [4]. |
| 2 | Functional domain alignment (e.g., ligand-binding) | Ecdysone receptor in insects vs. honey bees [3]. |
| 3 | Key residue conservation (e.g., binding sites) | Nicotinic acetylcholine receptor subtypes in pollinators [3] [4]. |
Objective: Quantify species-specific differences in chemical absorption, metabolism, and excretion. Materials:
Methodology:
Validation: Align results with epidemiological data (e.g., blood PFAS concentrations in humans [45]).
Objective: Predict protein target relevance across taxa using sequence-structure alignment. Workflow:
Diagram: Integrated susceptibility assessment workflow.
Diagram: PFAS disposition mechanism in humans.
| Reagent/Tool | Function | Example Use |
|---|---|---|
| SeqAPASS v8 | Predicts protein target conservation across species | Extrapolate human ERα data to amphibians [3]. |
| LC-MS/MS Systems | Quantifies chemicals/metabolites in biosamples | Measure PFOS plasma half-life [45]. |
| Recombinant OATs | Express human/rodent transporters in vitro | Test PFAS uptake kinetics [45]. |
| PPARα Reporter Assays | Screens for receptor activation | Validate PFAS interactions [45]. |
| CRISPR-Modified Models | Introduce humanized genes into animal models | Study species-specific toxicokinetics [46]. |
Integrating SeqAPASS-based protein alignment with toxicokinetic profiling addresses critical gaps in cross-species susceptibility predictions. Protocols outlined here enable researchers to reconcile sequence conservation with real-world exposure and metabolic data, advancing chemical risk assessment for vulnerable taxa.
Within the paradigm of modern predictive toxicology, the Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool has emerged as a critical in silico resource for extrapolating chemical susceptibility across species [4]. The challenge, however, has evolved from generating predictions to effectively synthesizing these results with complementary lines of evidence to build a more robust and definitive case for chemical safety assessment [5] [47]. This application note details the protocols for leveraging the interoperability between SeqAPASS, the ECOTOX Knowledgebase, and the CompTox Chemicals Dashboard. By integrating sequence-based predictions with empirical toxicity data and comprehensive chemical information, researchers can create a powerful, multi-faceted framework for decision-making in chemical prioritization and ecological risk assessment [3] [48].
The United States Environmental Protection Agency's (EPA) computational toxicology tools are designed to function as a cohesive ecosystem. SeqAPASS serves as the initial screening tool that provides predictions on relative intrinsic susceptibility based on the conservation of protein targets across species [3] [4]. The CompTox Chemicals Dashboard acts as a central hub for chemistry, toxicity, and exposure information for over one million chemicals [49] [50] [48]. Finally, the ECOTOX Knowledgebase provides curated empirical data on the adverse effects of single chemical stressors to ecologically relevant aquatic and terrestrial species [48] [51]. The interoperability between these tools allows for a seamless transition from a computational prediction to empirical validation and contextualization.
Table 1: Core Components of the Integrated Tool Ecosystem
| Tool Name | Primary Function | Key Data Outputs | Role in Integrated Assessment |
|---|---|---|---|
| SeqAPASS | Predicts cross-species chemical susceptibility by evaluating protein target conservation. | Susceptibility predictions across taxa; Ortholog identification; Customizable data visualizations. | Provides a screening-level, mechanistic line of evidence for potential susceptibility. |
| CompTox Chemicals Dashboard | Aggregates chemistry, hazard, exposure, and bioactivity data for a vast array of chemicals. | Physicochemical properties; ToxCast bioactivity data; Chemical use categories; Links to other resources. | Supplies chemical context, high-throughput screening data, and a portal for accessing other tools. |
| ECOTOX Knowledgebase | Archives and provides access to curated experimental toxicity results from the scientific literature. | Summarized toxicity test results (e.g., LC50, EC50) for thousands of chemicals and species. | Delivers ground-truth empirical data to qualify or quantify SeqAPASS-based predictions. |
The following protocol outlines a step-by-step procedure for conducting an interoperable analysis, using the prediction of susceptibility to a chemical targeting the honey bee nicotinic acetylcholine receptor as a model case study [3].
Step 1: Identify Query Protein and Sensitive Species Initiate the analysis by reviewing existing literature to identify a known molecular target for the chemical of interest and a species with documented sensitivity. For our model case, the query protein is the nicotinic acetylcholine receptor subunit, and the sensitive species is the honey bee (Apis mellifera) [3].
Step 2: Perform SeqAPASS Level 1, 2, and 3 Analyses
Access the SeqAPASS tool at https://seqapass.epa.gov/seqapass and log in [5]. Submit a job using the NCBI protein accession number or FASTA sequence for the honey bee nicotinic acetylcholine receptor.
Step 3: Synthesize SeqAPASS Output Use the SeqAPASS Decision Summary Report feature to generate a downloadable PDF summary of results across all three levels of analysis. This report provides a consolidated view of the sequence-based evidence for cross-species susceptibility [5].
Step 4: Select Species and Initiate ECOTOX Query From the Level 1 results page in SeqAPASS, utilize the integrated ECOTOX Widget. Select the species of interest from your SeqAPASS output (e.g., other bee species like bumblebees or solitary bees predicted to be susceptible) [5].
Step 5: Identify Chemicals and Retrieve Data Enter the chemical of interest (e.g., a specific neonicotinoid insecticide). The widget will automatically pass the selected species and chemical to the ECOTOX Knowledgebase's "Explore" feature, retrieving all relevant, curated toxicity test results (e.g., LC50 values for mortality, EC50 for sublethal effects) [5] [51].
Step 6: Access Chemical-Specific Data Navigate to the CompTox Chemicals Dashboard and search for the chemical of interest using its name, CASRN, or DTXSID (Dashboard Substance ID). The Dashboard's executive summary provides an overview of available data [49] [50].
Step 7: Interrogate ToxCast Bioactivity and Link to SeqAPASS Access the Bioactivity > ToxCast: Summary subtab. Here, you can review high-throughput screening data for the chemical. Furthermore, as of Dashboard version 2.5, this subtab includes a SeqAPASS column in the data table, directly linking assay targets to SeqAPASS predictions, thereby providing a mechanistic understanding of the in vitro bioactivity results [49] [50] [52].
Step 8: Gather Supplementary Data Explore other relevant tabs in the Dashboard to build a comprehensive chemical profile:
The following workflow diagram illustrates the integrated protocol linking these three powerful tools.
The interoperable workflow generates a multi-layered dataset that informs chemical safety decisions from mechanism to observed effect.
Table 2: Exemplar Output from an Integrated Susceptibility Assessment for a Neonicotinoid Insecticide
| Species | SeqAPASS L3 Prediction | ECOTOX LC50 (μg/kg) | CompTox ToxCast Activity | Integrated Conclusion |
|---|---|---|---|---|
| Honey Bee (Apis mellifera) | Known Sensitive (Positive Control) | 2.5 (Empirical) | Active (nAChR Assay) | High Susceptibility Confirmed. |
| Bumble Bee (Bombus terrestris) | Susceptible | 4.1 (Empirical) | Data Gap | High Susceptibility Confirmed. |
| Leafcutter Bee (Megachile rotundata) | Susceptible | No Data | Data Gap | High Susceptibility Predicted. |
| Lady Beetle (Harmonia axyridis) | Not Susceptible | >1000 (Empirical) | Inactive | Low Susceptibility Confirmed. |
The data synthesized in Table 2 demonstrate the power of integration. For the bumble bee, the SeqAPASS prediction of susceptibility is supported by the empirical LC50 value from ECOTOX, creating a strong, corroborated line of evidence. For the leafcutter bee, where a data gap exists in ECOTOX, the mechanistic prediction from SeqAPASS provides a screening-level assessment to guide potential testing. Conversely, for the lady beetle, the predicted lack of protein conservation is consistent with the low empirical toxicity. Finally, the ToxCast activity in the CompTox Dashboard for the honey bee provides a direct link to a relevant high-throughput screening assay, adding another layer of mechanistic support.
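The reasoning behind the "Integrated Conclusion" column can be sketched as a small rule set. This is an illustrative assumption about how the lines of evidence might be combined, not a published SeqAPASS or ECOTOX algorithm; the function name, the `sensitive_below` threshold, and the wording of the conflict case are hypothetical.

```python
from typing import Optional


def integrated_conclusion(
    seqapass_susceptible: bool,
    lc50: Optional[float],
    sensitive_below: float,
) -> str:
    """Combine a sequence-based prediction with an optional empirical LC50.

    sensitive_below is a hypothetical threshold (same units as lc50) under
    which a species is treated as empirically sensitive.
    """
    level = "High" if seqapass_susceptible else "Low"
    if lc50 is None:
        # No empirical data: the prediction stands alone as a screening result.
        return f"{level} Susceptibility Predicted."
    empirically_sensitive = lc50 <= sensitive_below
    if empirically_sensitive == seqapass_susceptible:
        return f"{level} Susceptibility Confirmed."
    return "Conflicting Evidence; Review Needed."


# Rows mirror the exemplar table; ">1000" is entered as its lower bound.
rows = [
    ("Bumble Bee", True, 4.1),
    ("Leafcutter Bee", True, None),
    ("Lady Beetle", False, 1000.0),
]
for species, predicted, lc50 in rows:
    print(species, "->", integrated_conclusion(predicted, lc50, sensitive_below=100.0))
```

Returning an explicit conflict label, instead of silently favoring one source, keeps disagreements between prediction and measurement visible for expert review.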
The following table catalogues the key digital resources and their roles in executing the protocols described in this application note.
Table 3: Essential Digital Research Reagents for Cross-Species Susceptibility Research
| Tool or Resource | Function in Research | Access Point |
|---|---|---|
| SeqAPASS Tool | Core engine for predicting protein target conservation and relative species susceptibility using sequence and structural similarity. | https://seqapass.epa.gov/seqapass [3] |
| ECOTOX Knowledgebase | Source of ground-truth empirical toxicity data for aquatic and terrestrial species, used to qualify SeqAPASS predictions. | https://www.epa.gov/ecotox [48] [51] |
| CompTox Chemicals Dashboard | Centralized hub for chemical information, including structures, properties, hazard, exposure, and bioactivity data. | https://www.epa.gov/comptox-tools/comptox-chemicals-dashboard [49] [48] |
| NCBI Protein Database | Foundational public repository supplying the protein sequence data that powers the SeqAPASS analysis. | https://www.ncbi.nlm.nih.gov/protein [3] |
| SeqAPASS ECOTOX Widget | Integrated feature within SeqAPASS that enables direct, seamless querying of the ECOTOX Knowledgebase. | Located within the SeqAPASS Level 1 results interface [5]. |
The interoperability between SeqAPASS, the ECOTOX Knowledgebase, and the CompTox Chemicals Dashboard represents a significant advancement in the field of computational toxicology and new approach methodologies (NAMs). This integrated framework directly addresses the challenges of resource-intensive animal testing and the need to evaluate thousands of chemicals in a timely manner [3] [5]. It facilitates a weight-of-evidence approach, where in silico predictions are not viewed in isolation but are instead strengthened by their consistency with in vitro bioactivity and in vivo toxicity data.
The case studies referenced here, including evaluations of the endocrine system, molting processes in insects, and honey bee colony survival, underscore the broad utility of this approach for both human health and ecological risk assessment [3] [4]. The continued evolution of these tools, such as the addition of the ECOTOX Widget in SeqAPASS v6.0 and the inclusion of SeqAPASS data in the CompTox Chemicals Dashboard, demonstrates a committed effort to enhance user experience and scientific rigor [49] [5]. By adopting the protocols outlined herein, researchers and regulators can make more informed, efficient, and defensible decisions in chemical safety assessment, ultimately supporting the protection of human health and the environment.
Within the paradigm of predictive toxicology, the evaluation of any computational tool is paramount. The Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool, developed by the U.S. Environmental Protection Agency (EPA), addresses the enduring challenge of cross-species extrapolation in chemical safety evaluation [5]. By predicting relative intrinsic susceptibility based on the conservation of specific protein targets, SeqAPASS offers a rapid, screening-level method to prioritize chemicals and species for further testing [3] [4]. However, its utility in research and regulatory decision-making is fundamentally anchored to its predictive accuracy. This Application Note details protocols for the critical process of benchmarking performance, specifically by comparing SeqAPASS predictions with existing empirical toxicity data. Such validation strengthens the scientific confidence in using this new approach methodology for ecological risk assessment and drug development safety profiling.
The SeqAPASS tool employs a tiered evaluation, moving from broad sequence comparisons to specific residue analyses [4]. The foundational premise is that conservation of a chemical's protein target across species can serve as a line of evidence to predict susceptibility [4] [5]. The tool's interoperability with the ECOTOX Knowledgebase, a comprehensive repository of curated empirical toxicity data, is a critical feature for benchmarking [5]. A dedicated widget in SeqAPASS allows users to select species from the Level 1 results and a chemical of interest, which then launches a query in ECOTOX to retrieve relevant experimental toxicity results [5]. This direct linkage facilitates a streamlined comparison of computational predictions with observational effects data.
The following workflow diagram (Figure 1) outlines the key steps for designing and executing a benchmarking study to validate SeqAPASS predictions.
Figure 1. Workflow for benchmarking SeqAPASS predictions against empirical toxicity data. The process involves three phases: running the SeqAPASS tool, gathering experimental data from the ECOTOX Knowledgebase, and conducting a quantitative comparison to calculate performance metrics.
The following table catalogs essential digital tools and data resources required for conducting robust benchmarking studies of the SeqAPASS tool.
Table 1: Key Research Reagent Solutions for SeqAPASS Benchmarking
| Resource Name | Type | Function in Benchmarking | Source |
|---|---|---|---|
| SeqAPASS Tool | Web Application | Generates predictions of cross-species chemical susceptibility based on protein target conservation. | U.S. EPA (https://seqapass.epa.gov/seqapass) [3] |
| ECOTOX Knowledgebase | Database | Provides curated empirical toxicity data (e.g., LC50, EC50) for aquatic and terrestrial species, used as ground truth for validation. | U.S. EPA [5] |
| NCBI Protein Database | Database | Source of millions of protein sequences for diverse species, forming the foundational data for SeqAPASS computations. | National Center for Biotechnology Information [3] |
| CompTox Chemicals Dashboard | Database | Aids in identifying chemicals and their known molecular targets, helping to formulate the initial SeqAPASS query. | U.S. EPA [3] [5] |
Objective: To generate susceptibility predictions for a set of species against a specific chemical using SeqAPASS.
Procedure:
Objective: To collect high-quality experimental toxicity data for the same chemical and species evaluated in the SeqAPASS analysis.
Procedure:
Objective: To statistically compare SeqAPASS predictions with the curated empirical data to calculate benchmarking performance metrics.
Procedure:
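One common way to express such a comparison is a confusion matrix over binary calls (predicted susceptible or not, empirically sensitive or not). The sketch below is an assumption about how the metrics could be computed, not the procedure of the original note; converting ECOTOX endpoints into a binary "sensitive" label requires a threshold that the analyst must choose and justify.

```python
from typing import Dict, Iterable, Optional, Tuple


def benchmark(pairs: Iterable[Tuple[bool, bool]]) -> Dict[str, Optional[float]]:
    """Compute metrics from (predicted_susceptible, empirically_sensitive) pairs."""
    tp = fp = tn = fn = 0
    for predicted, observed in pairs:
        if predicted and observed:
            tp += 1
        elif predicted and not observed:
            fp += 1
        elif not predicted and observed:
            fn += 1
        else:
            tn += 1

    def ratio(numerator: int, denominator: int) -> Optional[float]:
        # None signals that the metric is undefined for this data set.
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


# Hypothetical calls for four species, for illustration only.
example = [(True, True), (True, True), (False, False), (True, False)]
print(benchmark(example))
```

Reporting sensitivity and specificity separately matters for a screening tool, where a missed susceptible species (false negative) usually carries a different cost than an unnecessary follow-up test (false positive).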
A published case study demonstrates the application of this benchmarking framework. Researchers used SeqAPASS to evaluate the susceptibility of insects to neonicotinoid insecticides, which target the nicotinic acetylcholine receptor [4].
SeqAPASS Analysis: The query was based on the nicotinic acetylcholine receptor subunit from a known sensitive species, the fruit fly (Drosophila melanogaster). Analyses were conducted across all three levels [4].
Benchmarking Results: The following table summarizes the quantitative benchmarking results for this case study, comparing SeqAPASS predictions to empirical toxicity data.
Table 2: Benchmarking Results for SeqAPASS Predictions of Neonicotinoid Susceptibility in Insects [4]
| Species Group | SeqAPASS Prediction | Empirical Toxicity (from ECOTOX) | Concordance | Notes |
|---|---|---|---|---|
| Honey Bee (Apis mellifera) | Susceptible | High sensitivity (Low LC50) | Yes | Confirmed high risk to a key pollinator. |
| Tobacco Budworm (Heliothis virescens) | Susceptible | High sensitivity (Low LC50) | Yes | Aligns with known efficacy against this pest. |
| Other Hymenopterans (e.g., Bombus spp.) | Susceptible | Variable sensitivity | Partial | Predicts risk, but level may vary. |
| Select Diptera (e.g., Drosophila spp.) | Susceptible | High sensitivity (Low LC50) | Yes | Validates using fruit fly as a query model. |
The study found that the Level 3 analysis, which incorporated knowledge of specific amino acid residues critical for neonicotinoid binding, provided the highest resolution and most accurate predictions, successfully differentiating between susceptible and non-susceptible insect species [4]. This highlights the importance of incorporating detailed molecular understanding to improve predictive performance.
Benchmarking SeqAPASS predictions against empirical toxicity data is a critical step in validating its use for chemical safety assessments. The integrated workflow and protocol outlined here provide a standardized approach for researchers to perform these evaluations. The case study on neonicotinoids demonstrates that SeqAPASS can achieve high predictive accuracy, particularly when the analysis progresses to higher levels (functional domains and critical residues) [4].
Limitations and Future Directions: It is important to acknowledge that chemical susceptibility is not solely determined by the presence of a protein target. Toxicokinetic differences (absorption, distribution, metabolism, and excretion) across species can also dramatically influence sensitivity and are not captured by SeqAPASS [5]. Future developments in computational toxicology should aim to integrate these toxicodynamic (SeqAPASS) and toxicokinetic predictions for a more holistic cross-species extrapolation.
In conclusion, the SeqAPASS tool, when its predictions are rigorously benchmarked, represents a powerful resource for predictive toxicology. It enables researchers and drug development professionals to make informed, data-driven decisions on chemical prioritization and ecological risk assessment, thereby supporting the protection of both human health and the environment.
Strobilurin fungicides, modeled after a natural antifungal compound produced by the mushroom Strobilurus tenacellus, constitute one of the most important classes of agricultural fungicides worldwide, accounting for approximately 23-25% of global fungicide sales [53]. These fungicides, also known as Quinone outside Inhibitors (QoI) or FRAC group 11, include active ingredients such as azoxystrobin, pyraclostrobin, trifloxystrobin, and kresoxim-methyl [54] [53]. Their primary mode of action involves inhibition of mitochondrial respiration by binding to the quinol oxidation (Qo) site of cytochrome b, thereby blocking electron transfer and disrupting cellular energy production in target fungi [55] [54].
The Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool, developed by the U.S. Environmental Protection Agency, provides a computational approach for predicting cross-species susceptibility to chemicals by evaluating protein sequence and structural similarity [4] [3]. This case study demonstrates how SeqAPASS can be applied to assess the potential susceptibility of non-target species to strobilurin fungicides, supporting ecological risk assessment and informed regulatory decision-making.
Strobilurins are characterized by the presence of a toxiphoric (E)-β-methoxyacrylate group, which is essential for their fungicidal activity [55]. They function as broad-spectrum fungicides with translaminar movement in plants, providing protection on both leaf surfaces [56]. Their effectiveness stems from the inhibition of electron transfer between cytochrome b and cytochrome c1 at the Qo site of the cytochrome bc1 complex in mitochondrial respiration, ultimately disrupting ATP synthesis [55] [54].
Table 1: Common Strobilurin Fungicides and Their Properties
| Active Ingredient | Example Trade Name(s) | Water Solubility | Key Uses |
|---|---|---|---|
| Azoxystrobin | Quadris, Quadris Top | 6.7 mg/L at 20°C | Broad-spectrum disease control in multiple crops |
| Trifloxystrobin | Flint | Low water solubility | Grapes, fruits, vegetables |
| Pyraclostrobin | Cabrio, Pristine | Low water solubility | Fruits, vegetables, cereals |
| Kresoxim-methyl | Sovran | Low water solubility | Apples, grapes, vegetables |
Despite their agricultural benefits, strobilurin fungicides present significant environmental concerns. Azoxystrobin, for instance, is frequently detected in foodstuffs and environmental samples at concentrations exceeding regulatory acceptable levels [55]. These fungicides demonstrate toxicity to non-target organisms, including aquatic life and soil organisms, with particular concern for fish and other aquatic species [55] [56]. Their persistence in ecosystems and potential for bioaccumulation necessitate thorough evaluation of cross-species susceptibility to mitigate ecological risks.
SeqAPASS is a fast, online screening tool that enables researchers and regulators to extrapolate toxicity information across species by leveraging publicly available protein sequence and structural data [3]. The tool accesses the National Center for Biotechnology Information (NCBI) protein database, which contains information on over 153 million proteins representing more than 95,000 organisms, providing an extensive foundation for cross-species comparisons [3].
SeqAPASS employs a sophisticated three-tiered approach to assess protein similarity and predict chemical susceptibility:
This multi-level approach allows researchers to capitalize on existing information about chemical-protein interactions in known sensitive species to predict susceptibility in data-poor species.
The first step involves identifying cytochrome b as the specific protein target of strobilurin fungicides. Researchers should obtain reference protein sequences from species known to be sensitive to strobilurins, such as fungal pathogens like Plasmopara viticola [4] [57]. Sequences can be retrieved from publicly available databases such as NCBI Protein using their accession numbers.
Configure the SeqAPASS tool with the following parameters:
Interpret results using the following criteria:
Table 2: SeqAPASS Analysis Results for Strobilurin Susceptibility Across Taxonomic Groups
| Taxonomic Group | Level 1 (% Sequence Similarity) | Level 2 (Domain Conservation) | Level 3 (G143 Residue) | Predicted Susceptibility |
|---|---|---|---|---|
| Target Fungi | 95-100% | Complete | Identical | High |
| Non-Target Fungi | 85-99% | Complete | Identical | High |
| Honey Bees | 78-82% | Partial | Variant | Moderate |
| Aquatic Insects | 81-85% | Complete | Identical | High |
| Fish | 75-80% | Partial | Variant | Moderate |
| Mammals | 70-75% | Partial | Variant | Low |
| Earthworms | 65-72% | Partial | Variant | Low |
Application of SeqAPASS to assess strobilurin risks to aquatic ecosystems reveals significant potential for non-target effects. Level 1 analysis demonstrates 81-85% sequence similarity in cytochrome b between target fungi and various aquatic insect species. Level 2 analysis shows complete conservation of the Qo binding domain across these taxa. Most critically, Level 3 analysis confirms identical amino acids at position 143 (glycine) in both target fungi and aquatic insects, indicating a high probability of strobilurin binding and mitochondrial disruption [4] [55].
This prediction aligns with empirical evidence of strobilurin toxicity to aquatic invertebrates, particularly mayflies and other sensitive species, at environmental concentrations as low as 5 μg/L [55]. The consistency between SeqAPASS predictions and observed toxicity validates the tool's utility for prioritizing species for further testing and informing ecological risk assessments.
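The Level 3 comparison described above amounts to locating a template residue position in a pairwise or multiple alignment and reading off the residue in the other species. A minimal sketch follows; the function name and the toy aligned sequences are hypothetical and are not real cytochrome b data.

```python
from typing import Tuple


def residue_at_template_position(
    template_aln: str, target_aln: str, position: int
) -> Tuple[str, str]:
    """Return (template_residue, target_residue) at a 1-based template position.

    Both inputs are rows of the same alignment, with '-' marking gaps.
    The position counts only non-gap residues in the template.
    """
    if len(template_aln) != len(target_aln):
        raise ValueError("aligned sequences must have equal length")
    count = 0
    for template_res, target_res in zip(template_aln, target_aln):
        if template_res != "-":
            count += 1
            if count == position:
                return template_res, target_res
    raise IndexError("position exceeds template length")


# Toy alignment: the template has a gap in column 3.
template = "MK-GA"
target = "MKTAA"
template_res, target_res = residue_at_template_position(template, target, 3)
conserved = template_res == target_res
print(template_res, target_res, conserved)
```

Counting positions on the ungapped template is the key step: alignment columns shift whenever gaps are introduced, so a residue such as glycine 143 must be tracked by template numbering, not by column index.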
Table 3: Essential Research Materials and Tools for Strobilurin Susceptibility Studies
| Reagent/Resource | Function/Application | Example Sources/Providers |
|---|---|---|
| SeqAPASS Online Tool | Cross-species susceptibility prediction | U.S. EPA (https://seqapass.epa.gov/seqapass) |
| NCBI Protein Database | Source of protein sequences for analysis | National Center for Biotechnology Information |
| Reference Cytochrome b Sequences | Query sequences for SeqAPASS analysis | Public databases (e.g., UniProt, NCBI) |
| Strobilurin Analytical Standards | Chemical validation and dose-response studies | Chemical manufacturers (e.g., Sigma-Aldrich) |
| Mitochondrial Assay Kits | Verification of respiratory inhibition | Commercial biotechnology suppliers |
| Taxonomic-specific Tissue Samples | Experimental validation of predictions | Biological supply companies, field collection |
The diagram illustrates the molecular mechanism of strobilurin fungicides, which bind to the Qo site of cytochrome b, inhibiting electron transfer in the mitochondrial respiratory chain and ultimately causing fungal cell death due to energy depletion [54] [57]. The specific binding interaction makes these fungicides highly effective but also susceptible to resistance development through single nucleotide polymorphisms, particularly the G143A mutation that replaces glycine with alanine at position 143 of the cytochrome b protein [54] [57]. This mutation structurally prevents strobilurin binding while maintaining cytochrome function, conferring complete resistance that cannot be overcome by increasing application rates [54].
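Substitution labels such as G143A encode the wild-type residue, the position, and the replacement residue. The sketch below shows one way to parse that notation and label an observed residue; the class and function names are hypothetical, and the labels say nothing about measured resistance.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    wild_type: str
    position: int
    variant: str


# One-letter amino acid code, position number, one-letter amino acid code.
_SUBSTITUTION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str) -> Substitution:
    """Parse a label like 'G143A' into wild-type residue, position, and variant residue."""
    match = _SUBSTITUTION_PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"not a single-residue substitution label: {label!r}")
    wild_type, position, variant = match.groups()
    return Substitution(wild_type, int(position), variant)


def classify_residue(residue: str, substitution: Substitution) -> str:
    """Label an observed residue relative to the substitution of interest."""
    if residue == substitution.wild_type:
        return "wild-type"
    if residue == substitution.variant:
        return "substituted"
    return "other"


if __name__ == "__main__":
    g143a = parse_substitution("G143A")
    print(g143a, classify_residue("G", g143a), classify_residue("A", g143a))
```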
The application of SeqAPASS for predicting cross-species susceptibility to strobilurin fungicides demonstrates significant utility in ecological risk assessment. The tool's three-tiered approach provides a scientifically robust methodology for extrapolating from data-rich species to thousands of non-target organisms, addressing critical knowledge gaps in chemical safety evaluation. Implementation of this computational approach enables prioritization of testing resources, selection of relevant species for empirical validation, and informed regulatory decision-making regarding strobilurin use patterns and environmental restrictions.
Future applications of SeqAPASS in strobilurin research should include broader taxonomic assessments, particularly for threatened and endangered species, as well as investigation of potential interactions with other environmental stressors. Integration of SeqAPASS predictions with exposure modeling and monitoring data will further enhance ecological risk assessment frameworks, supporting sustainable use of strobilurin fungicides while minimizing unintended environmental consequences.
Endocrine-disrupting chemicals (EDCs) are substances that can interfere with the hormonal systems of organisms, leading to adverse developmental, reproductive, neurological, and immune effects in both humans and wildlife [58]. Androgen receptors (AR) are crucial molecular targets for a broad range of EDCs, as androgens are critical for the development and maintenance of male characteristics [59]. The Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool, developed by the U.S. Environmental Protection Agency (EPA), provides a computational approach for predicting cross-species susceptibility to EDCs by analyzing protein sequence and structural conservation [3] [4]. This case study demonstrates the application of SeqAPASS to evaluate AR conservation across species, supporting the screening and prioritization of chemicals for endocrine disruption potential.
The androgen receptor is a ligand-induced transcription factor. Androgen binding causes the cytosolic AR to translocate into the nucleus, bind to target regions of androgen-responsive genes, and influence their transcription [59]. Antiandrogens may bind to the AR but do not promote nuclear translocation or gene transcription [59]. Numerous EDCs in the environment have the potential to disrupt androgen action, including dicarboximide fungicides (e.g., vinclozolin), organochlorine-based insecticides (e.g., p,p′-DDT and p,p′-DDE), conazole fungicides (e.g., prochloraz), phthalates, and urea-based herbicides (e.g., linuron) [59].
The U.S. EPA's Endocrine Disruptor Screening Program (EDSP) employs a two-tiered screening approach [60]. Tier 1 acts as a "gate keeper" to identify substances with potential endocrine activity using a battery of in vitro and in vivo assays, including AR binding and transcriptional activation assays [59] [60]. Substances of concern progress to Tier 2 for more definitive in vivo testing to establish dose-response relationships and adverse effects [59]. The SeqAPASS tool supports these efforts by enabling cross-species extrapolation, which helps prioritize chemicals for testing and select appropriate test species [3] [4].
SeqAPASS is a fast, online screening tool that extrapolates toxicity information from data-rich model organisms to thousands of other non-target species by evaluating protein sequence and structural similarity [3]. The tool uses publicly available protein sequence data from the National Center for Biotechnology Information (NCBI) database, which contains information on over 153 million proteins representing more than 95,000 organisms [3].
The SeqAPASS analysis proceeds through three sequential tiers of evaluation, each providing increasing specificity for predicting potential chemical susceptibility [4]:
This initial analysis compares the primary amino acid sequence of a query protein to sequences from other species, calculating a metric for sequence similarity and detecting orthologs. The analysis identifies the presence or absence of the protein target across species.
This level evaluates sequence similarity within selected functional domains (e.g., ligand-binding domain, DNA-binding domain). Conservation within these critical regions provides stronger evidence for maintained protein function across species.
The most precise analysis compares individual amino acid residue positions of importance for protein conformation and/or interaction with chemicals upon binding. Conservation at these critical residues suggests potential for similar chemical interactions.
Table 1: SeqAPASS Analysis Tiers and Applications
| Analysis Tier | Data Input | Output | Primary Application |
|---|---|---|---|
| Tier 1: Primary Sequence | Full-length amino acid sequence | Sequence similarity metric, ortholog identification | Initial screening for protein presence/absence across species |
| Tier 2: Functional Domains | Specific functional domain sequences | Domain conservation scores | Evaluation of functional conservation |
| Tier 3: Key Residues | Individual amino acid residue positions | Residue-level conservation | Prediction of chemical binding susceptibility |
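To make the difference between full-length and domain-level comparison concrete, the sketch below computes percent identity over a chosen window of a pairwise alignment. This is a simplified illustration, not the metric SeqAPASS reports: the sequences are toy strings, and the window coordinates are hypothetical alignment columns.

```python
def percent_identity(aligned_a: str, aligned_b: str, start: int, end: int) -> float:
    """Percent identity over alignment columns start..end (1-based, inclusive).

    Columns where both sequences have a gap are skipped; a gap opposite a
    residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not 1 <= start <= end <= len(aligned_a):
        raise ValueError("window must lie within the alignment")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a[start - 1:end], aligned_b[start - 1:end]):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("window contains only gap columns")
    return 100.0 * identical / compared


# Toy alignment (hypothetical): two mismatches outside columns 4-8.
QUERY = "MKTAYIAKQR"
HIT = "MKSAYIAKHR"

if __name__ == "__main__":
    print(percent_identity(QUERY, HIT, 1, len(QUERY)))  # full length
    print(percent_identity(QUERY, HIT, 4, 8))           # domain window
```

In this toy case the window is fully conserved even though the full-length comparison is not, which is the kind of contrast a domain-level analysis is meant to expose.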
This protocol details the use of SeqAPASS to evaluate conservation of the human androgen receptor across mammalian and non-mammalian species to predict susceptibility to anti-androgenic chemicals.
Table 2: Androgen Receptor Sequence Conservation Across Select Species
| Species | Common Name | Taxonomic Class | Full-Length Similarity (%) | LBD Similarity (%) | DBD Similarity (%) | Critical Residue Conservation |
|---|---|---|---|---|---|---|
| Homo sapiens | Human | Mammalia | 100 | 100 | 100 | Complete |
| Mus musculus | House mouse | Mammalia | 92 | 95 | 98 | Complete |
| Rattus norvegicus | Brown rat | Mammalia | 91 | 94 | 97 | Complete |
| Xenopus tropicalis | Western clawed frog | Amphibia | 78 | 82 | 95 | Partial (85%) |
| Danio rerio | Zebrafish | Actinopterygii | 65 | 68 | 90 | Partial (72%) |
| Gallus gallus | Chicken | Aves | 87 | 89 | 96 | Complete |
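The values in Table 2 can be held in a small data structure for filtering and comparison. The sketch below transcribes the table as given and shows two simple queries; the class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ARConservation:
    species: str
    full_length: int  # full-length similarity (%)
    lbd: int          # ligand-binding domain similarity (%)
    dbd: int          # DNA-binding domain similarity (%)
    critical_residues: str


# Values transcribed from Table 2 above.
AR_TABLE = [
    ARConservation("Homo sapiens", 100, 100, 100, "Complete"),
    ARConservation("Mus musculus", 92, 95, 98, "Complete"),
    ARConservation("Rattus norvegicus", 91, 94, 97, "Complete"),
    ARConservation("Xenopus tropicalis", 78, 82, 95, "Partial (85%)"),
    ARConservation("Danio rerio", 65, 68, 90, "Partial (72%)"),
    ARConservation("Gallus gallus", 87, 89, 96, "Complete"),
]


def species_with_complete_residues(rows: list[ARConservation]) -> list[str]:
    """Species whose critical residues are listed as completely conserved."""
    return [row.species for row in rows if row.critical_residues == "Complete"]


def lbd_minus_dbd(rows: list[ARConservation]) -> dict[str, int]:
    """Difference between LBD and DBD similarity; negative means the LBD is less conserved."""
    return {row.species: row.lbd - row.dbd for row in rows}


if __name__ == "__main__":
    print(species_with_complete_residues(AR_TABLE))
    print(lbd_minus_dbd(AR_TABLE))
```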
Diagram 1: SeqAPASS AR Analysis Workflow
Diagram 2: Androgen Receptor Signaling Pathway
Table 3: Essential Research Reagents for Androgen Receptor Studies
| Reagent/Cell Line | Provider Examples | Application in AR Research |
|---|---|---|
| Mammalian Two-Hybrid System | Promega, Agilent Technologies | Detection of protein-protein interactions in AR signaling |
| Luciferase Reporter Plasmids | Promega, Takara Bio | Measurement of AR-mediated transcriptional activation |
| H295R Steroidogenesis Model | ATCC | Screening for chemicals affecting steroid hormone production |
| CHO-K1 AR Reporter Cell Line | ATCC, commercial suppliers | Stable cell line for high-throughput AR screening |
| Recombinant Human AR Protein | Novus Biologicals, Abcam | In vitro binding assays for direct ligand-receptor interaction |
| Anti-AR Antibodies | Cell Signaling Technology, Santa Cruz Biotechnology | Detection and quantification of AR expression |
The SeqAPASS analysis reveals high conservation of the androgen receptor among mammalian species, with decreasing but significant conservation in non-mammalian vertebrates. The ligand-binding domain shows greater variability across evolutionary distance compared to the DNA-binding domain, which remains highly conserved [3]. This pattern suggests that while the fundamental gene regulatory function of AR is maintained, specific ligand-receptor interactions may vary between species.
Species with complete conservation of critical residues in the LBD (e.g., mouse, rat, chicken) are likely to show similar susceptibility to AR-directed EDCs as humans. In contrast, species with partial conservation (e.g., zebrafish) may respond differently to specific anti-androgens, which has important implications for ecological risk assessment and test species selection [4].
The SeqAPASS tool provides several key advantages for endocrine disruptor screening:
SeqAPASS results can inform and enhance the existing EDSP Tier 1 battery by:
The SeqAPASS tool provides a powerful computational approach for predicting cross-species susceptibility to androgen-disrupting chemicals through analysis of AR sequence and structural conservation. This case study demonstrates a standardized protocol for evaluating AR conservation that can support chemical prioritization, test species selection, and mechanistic understanding of endocrine disruption. Integration of SeqAPASS with established EDSP screening batteries represents a promising strategy for addressing the challenges of cross-species extrapolation in chemical toxicity assessment. As sequence databases continue to expand and protein structure prediction improves, the application of bioinformatic tools like SeqAPASS will become increasingly valuable for comprehensive endocrine disruptor screening.
Within comparative toxicology and drug development, a significant challenge lies in extrapolating chemical susceptibility data from well-studied model species to the thousands of others for which toxicity information is limited or absent. The Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool, developed by the U.S. Environmental Protection Agency, addresses this challenge through a tiered bioinformatic approach. By systematically evaluating protein conservation across taxonomic groups, SeqAPASS provides a computational framework for predicting relative intrinsic susceptibility based on the principle that a species' sensitivity to a chemical is largely determined by the conservation of the specific protein targets with which that chemical interacts [3] [4]. This application note details the experimental protocols and underlying evidence tiers (spanning primary sequence, functional domain, and critical residue comparisons) that enable researchers to build confidence in cross-species extrapolations.
The SeqAPASS tool employs a multi-level evaluation to assess protein conservation. Each successive tier incorporates more detailed molecular knowledge, providing an escalating line of evidence for predicting whether a chemical-protein interaction in a known sensitive species is likely to occur in other species. This structured approach allows researchers to capitalize on any existing information about the chemical-protein interaction, from minimal data to extensive mechanistic understanding [5] [4].
Table 1: SeqAPASS Evidence Tiers and Their Applications
| Evidence Tier | Comparison Focus | Data Input Required | Primary Output | Typical Application in Risk Assessment |
|---|---|---|---|---|
| Tier 1: Primary Sequence | Full-length amino acid sequence similarity and orthology [4] | Protein sequence from a sensitive species (e.g., NCBI Accession) [5] | A list of species with similar sequences and a quantitative metric of similarity [3] | Initial, screening-level prioritization of potentially susceptible species for further evaluation [3] |
| Tier 2: Functional Domain | Sequence similarity within specific functional domains (e.g., ligand-binding domain) [4] | Domain boundaries or identifiers (e.g., from NCBI Conserved Domain Database) [5] | Prediction of susceptibility based on conservation of the protein region essential for function [4] | Refining susceptibility predictions for species that passed Tier 1, focusing on functional relevance [3] |
| Tier 3: Critical Residues | Conservation of individual amino acid residues known to be critical for binding or function [4] | Positions and identities of critical amino acids from literature or crystallography [5] | A heat map showing residue conservation and a refined susceptibility prediction [5] | High-resolution assessment for chemicals with a well-defined molecular interaction, supporting quantitative extrapolation [4] |
The underlying logic of the SeqAPASS workflow, which guides the user from data input through these three tiers of analysis, is visualized below.
A Tier 1 analysis provides a rapid, screening-level assessment of potential cross-species susceptibility by comparing the entire primary amino acid sequence of a query protein to all publicly available protein sequences.
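This article does not spell out how the Tier 1 similarity metric is calculated. A common description, treated here as an assumption, is that each hit's BLASTp bit score is normalized by the bit score of the query aligned to itself. The sketch below applies that normalization to BLAST tabular output (the default 12-column `-outfmt 6` layout, where the bit score is the last column); the identifiers and scores are invented for illustration.

```python
from typing import Iterable


def percent_similarity(tabular_lines: Iterable[str], query_id: str) -> dict[str, float]:
    """Normalize each subject's best bit score by the query's self-hit bit score."""
    best_bitscore: dict[str, float] = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        qseqid, sseqid, bitscore = fields[0], fields[1], float(fields[11])
        if qseqid != query_id:
            continue
        # Keep only the highest-scoring alignment per subject sequence.
        best_bitscore[sseqid] = max(bitscore, best_bitscore.get(sseqid, 0.0))
    if query_id not in best_bitscore:
        raise ValueError("no self-hit found for the query; cannot normalize")
    self_score = best_bitscore[query_id]
    return {subject: 100.0 * score / self_score for subject, score in best_bitscore.items()}


# Invented example rows: qseqid, sseqid, pident, length, mismatch, gapopen,
# qstart, qend, sstart, send, evalue, bitscore.
EXAMPLE_ROWS = [
    "query1\tquery1\t100.0\t10\t0\t0\t1\t10\t1\t10\t1e-5\t200",
    "query1\thitA\t90.0\t10\t1\t0\t1\t10\t1\t10\t1e-4\t150",
    "query1\thitB\t50.0\t10\t5\t0\t1\t10\t1\t10\t0.01\t50",
]

if __name__ == "__main__":
    print(percent_similarity(EXAMPLE_ROWS, "query1"))
```

Normalizing by the self-hit makes scores comparable across query proteins of different lengths, which is why the query must appear among its own hits.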
Step-by-Step Methodology:
Access SeqAPASS at https://seqapass.epa.gov/seqapass using a supported web browser (Chrome is recommended) and log in to your account [5].
For a more refined assessment, subsequent tiers incorporate deeper knowledge of protein function and chemical interaction.
Tier 2 Methodology:
Tier 3 Methodology:
Background: An adverse outcome pathway (AOP) linking the activation of the nicotinic acetylcholine receptor (nAChR) to colony death was developed for the honey bee (Apis mellifera). The taxonomic domain of applicability for this AOP to non-Apis bees was unknown [61].
Application of SeqAPASS Tiers:
Table 2: Research Reagent Solutions for SeqAPASS Experiments
| Research Reagent / Resource | Function and Relevance in SeqAPASS Analysis |
|---|---|
| NCBI Protein Database | The primary data source for SeqAPASS, providing over 153 million protein sequences from more than 95,000 organisms for comparison [3]. |
| BLASTp Algorithm | The core computational engine used for Tier 1 primary amino acid sequence alignments and similarity calculations [5]. |
| CompTox Chemicals Dashboard | An integrated resource to help identify potential protein targets of a chemical of interest, informing the initial query selection [3] [5]. |
| FASTA Sequence Format | A standard text-based format for inputting amino acid sequences, allowing users to analyze proteins not yet incorporated into the primary NCBI database [5]. |
| AOP-Wiki | A resource containing adverse outcome pathways; SeqAPASS results can provide evidence for the taxonomic applicability of these pathways [5] [61]. |
| ECOTOX Knowledgebase | An EPA database interoperable with SeqAPASS; users can select species from SeqAPASS output to identify existing empirical toxicity data for validation [5]. |
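Because FASTA is the format listed above for user-supplied sequences, it can help to sanity-check a file before submitting it. The following is a minimal parser sketch; the function name and the example records are hypothetical, and it keys each record by the first word of its header line.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {identifier: sequence}, joining wrapped sequence lines."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # close the previous record
            parts = line[1:].split()
            if not parts:
                raise ValueError("FASTA header line has no identifier")
            header = parts[0]
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first header line")
        else:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Toy input (hypothetical identifiers and sequences).
EXAMPLE_FASTA = ">seq1 toy record\nMKTAY\nIAKQR\n>seq2\nmkt\n"

if __name__ == "__main__":
    print(parse_fasta(EXAMPLE_FASTA))
```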
The tiered evidence framework within the SeqAPASS tool provides a robust, flexible, and scientifically rigorous methodology for cross-species extrapolation. By systematically advancing from primary sequence comparisons to functional domain and critical residue analyses, researchers can build increasing confidence in their predictions of chemical susceptibility for data-poor species. This structured approach is invaluable for strengthening chemical prioritization, informing the selection of appropriate test species, validating the taxonomic domain of applicability for AOPs, and ultimately supporting more efficient and confident decision-making in chemical safety and drug development.
The Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool, developed by the U.S. Environmental Protection Agency (EPA), represents a pivotal advancement in predictive toxicology and cross-species extrapolation. The tool was created to address the enduring challenge of evaluating chemical safety across the diversity of species potentially impacted by chemical exposures, a task that is both costly and resource-intensive with traditional whole-animal testing [5]. SeqAPASS operates on the fundamental principle that a species' relative intrinsic susceptibility to a particular chemical can be determined by evaluating the conservation of protein targets of that chemical [4]. By leveraging publicly available protein sequence information, SeqAPASS allows for the extrapolation of toxicity data from data-rich model organisms (e.g., humans, rats, mice, zebrafish) to thousands of other plants and animals for which toxicity information is limited or unavailable [3]. The tool's development and its subsequent version releases have been driven by the need for more efficient chemical screening methods, an international push to reduce animal testing, and the increasing demand for timely chemical evaluations [62] [5].
Since its initial public release in 2016, SeqAPASS has undergone significant enhancements, with each version introducing new capabilities, data sources, and user-focused features. The table below provides a comprehensive overview of the tool's evolution from Version 1.0 to the current Version 8.0.
Table 1: Comprehensive Feature History of SeqAPASS from Version 1.0 to Version 8.0
| Version | Release Date | Key Features and Updates | Data Version |
|---|---|---|---|
| 1.0 | January 27, 2016 | Initial public release; Interfaces for Level 1 (primary amino acid sequence comparisons) and Level 2 (sequence alignments); Ortholog candidate identification; Automated prediction of relative intrinsic susceptibility [5]. | 1 |
| 2.0 | May 24, 2017 | Updated data downloads (protein, taxonomy, conserved domain) from NCBI; New BLAST+ and COBALT executables; Capability to change default settings for Level 1 and Level 2 reports; Level 3 (individual amino acid residue comparisons) allowing user-submitted sequences [5]. | 2 |
| 3.0 | March 10, 2018 | Integrated and interactive data visualization for Level 1 and Level 2; Updated NCBI data and BLAST+ executables; Redesigned density plot and susceptibility cut-off pages; Automatic Level 3 susceptibility prediction; First US EPA User Guide released [5]. | 3 |
| 4.0 | October 24, 2019 | New EPA-compliant login; Integrated help buttons and links to external resources (CompTox Chemicals Dashboard, AOP-Wiki); Level 1-3 data summary reports; Interoperability with ECOTOX Knowledgebase; Reference Explorer for literature support [5]. | 4 |
| 5.0 | December 1, 2020 | Customizable heat map visualization for Level 3 data; Decision Summary Report for synthesizing results across all levels into a downloadable PDF [5]. | 5 |
| 6.0 | September 14, 2021 | Widget for connecting SeqAPASS predictions to empirical toxicity data in the ECOTOX Knowledgebase; Allows users to select species from Level 1 output and chemicals to find relevant toxicity data [5]. | 6 |
| 8.0 | November 13, 2024 | Protein structural conservation evaluations (Level 4); Generation of 3D protein models using I-TASSER; Integration of iCn3D tool for structural visualization; Domain-specific protein structure generation; Enhanced metrics for protein structure quality [3] [62] [63]. | Not specified |
The evolution of SeqAPASS demonstrates a clear trajectory from a foundational sequence-comparison tool to a sophisticated platform integrating structural biology. Versions 1.0 and 2.0 together established the core three-level analytical framework (primary sequence, functional domain, critical amino acid residues) that remains central to the tool's operation [5] [4]; as Table 1 shows, Levels 1 and 2 arrived with Version 1.0 and Level 3 with Version 2.0. The subsequent releases through Version 4.0 focused on enhancing user control, expanding data sources, and improving interpretability through visualizations and summary reports [5].
A significant leap occurred with Version 5.0, which introduced the Decision Summary Report, enabling researchers to synthesize complex, multi-level data into a unified format for regulatory decision-making [5]. Version 6.0 further strengthened the tool's application in risk assessment by creating a direct bridge to empirical toxicity data via the ECOTOX Knowledgebase [5].
The most recent update, Version 8.0, marks a transformative advancement by incorporating a fourth level of analysis focused on protein structural conservation. This version leverages I-TASSER for 3D protein model generation and iCn3D for visualization and structural alignment, moving beyond sequence-based analysis to directly assess the conservation of protein function through structure [62] [63]. This capability allows for more confident predictions of cross-species susceptibility and opens the door for more advanced bioinformatics applications like molecular docking [64].
SeqAPASS employs a tiered approach to extrapolate toxicity information across species, with each level providing an additional line of evidence toward protein conservation and, consequently, chemical susceptibility [4]. Version 8.0 formalizes a four-level framework, as visualized in the workflow below.
Figure 1: The tiered analytical workflow of SeqAPASS, from initial query to susceptibility prediction. Each level provides increasing resolution and requires more specific knowledge about the chemical-protein interaction.
This protocol guides users through a comprehensive cross-species analysis using SeqAPASS Version 8.0, incorporating all four levels of evaluation.
Access SeqAPASS at https://seqapass.epa.gov/seqapass using the Chrome web browser. Select either "Login" for an existing account or follow the instructions to create a new SeqAPASS account, which is required to run, store, access, and customize jobs [7].
Successful cross-species susceptibility analysis requires both the SeqAPASS platform and a suite of external data resources and analytical tools. The following table details key components of the research toolkit.
Table 2: Key Research Reagent Solutions for SeqAPASS Analysis
| Tool/Resource | Type | Function in Analysis |
|---|---|---|
| NCBI Protein Database | Data Repository | The primary source for over 153 million protein sequences from more than 95,000 organisms, used by SeqAPASS for all sequence comparisons [3]. |
| I-TASSER | Computational Tool | Integrated in SeqAPASS v8.0 for generating 3D protein structural models from amino acid sequences, enabling Level 4 structural evaluations [62] [63]. |
| iCn3D | Visualization Software | Integrated in SeqAPASS v8.0 for visualizing 3D protein structures, performing structural alignments, and analyzing structural conservation across species [63]. |
| ECOTOX Knowledgebase | Data Repository | An EPA database curated with empirical toxicity data. A widget in SeqAPASS v6.0+ allows direct linking of sequence-based predictions to existing toxicity results for validation [5]. |
| CompTox Chemicals Dashboard | Data Repository | An EPA resource used to help identify potential protein targets of chemicals and access related toxicological data [3] [5]. |
| AOP-Wiki | Knowledge Repository | A database of Adverse Outcome Pathways (AOPs). Used to identify key protein targets (Molecular Initiating Events) within a pathway and frame the taxonomic applicability of an AOP [5]. |
| Google Scholar / Reference Explorer | Literature Database | Used within the SeqAPASS Level 3 protocol to identify scientific literature defining the critical amino acid residues involved in chemical-protein interactions [7]. |
| BLASTp / COBALT | Algorithm | Standalone versions of these sequence alignment algorithms are used in the backend of SeqAPASS to perform the primary amino acid sequence comparisons and alignments [5]. |
The evolution of SeqAPASS from Version 1.0 to 8.0 demonstrates a consistent commitment to advancing the science of cross-species extrapolation. The tool has grown from a sequence comparison utility into a comprehensive platform that integrates primary sequence, functional domain, critical residue, and now 3D structural data. Each version release has directly addressed user needs, enhancing the robustness, flexibility, and interpretability of its predictions [3] [5]. The introduction of protein structural modeling in Version 8.0 represents a significant leap forward, bridging the gap between sequence conservation and functional conservation. As the tool continues to evolve, it solidifies its role as an indispensable resource for researchers, regulators, and drug development professionals aiming to maximize the use of existing data, prioritize testing, and ultimately conduct more efficient and defensible chemical safety assessments for a wide spectrum of species.
The advent of sophisticated protein structure prediction tools has revolutionized biomedical research, enabling the investigation of molecular interactions in species where experimental structures are unavailable. This application note examines the complementary strengths of two leading protein structure prediction tools, AlphaFold (a deep learning-based system) and I-TASSER (an iterative threading assembly refinement method), within the specific context of cross-species susceptibility research using the SeqAPASS (Sequence Alignment to Predict Across Species Susceptibility) platform. We provide detailed protocols for generating and integrating protein structural models to enhance predictions of chemical susceptibility across diverse species, supported by comparative performance data and practical implementation workflows. This integrated approach provides a powerful framework for supporting ecological risk assessments, drug discovery, and chemical safety evaluations.
Chemical safety evaluations and ecological risk assessments face a fundamental challenge: toxicity data for countless species are very limited, while testing resources are constrained globally [3] [5]. Regulatory decisions must often extrapolate from a few model organisms (e.g., humans, mice, rats, zebrafish) to thousands of non-target species with little or no toxicity information [3]. The SeqAPASS tool addresses this challenge by using protein sequence and structural conservation as a line of evidence to predict intrinsic susceptibility across species [4] [15]. The underlying principle is that sensitivity to a chemical depends partly on the presence and conservation of specific protein targets with which chemicals interact [3].
While sequence similarity provides an initial line of evidence, protein structural conservation offers deeper insights into chemical-protein interactions that determine susceptibility [15]. Until recently, structural models were unavailable for many species. Advances in computational tools like AlphaFold and I-TASSER have dramatically expanded opportunities to generate reliable protein structures across diverse species, enabling more confident predictions of chemical susceptibility [15]. These tools employ distinct methodologies with complementary strengths that can be harnessed within the SeqAPASS framework to enhance cross-species extrapolation.
AlphaFold employs an end-to-end deep learning approach that integrates multiple sequence alignments, evolutionary coupling, and structural physical constraints to predict protein structures [65] [66]. Recent versions have demonstrated remarkable accuracy in reproducing experimentally determined structures.
I-TASSER (Iterative Threading ASSembly Refinement) utilizes iterative threading to identify structural templates from the Protein Data Bank, followed by fragment assembly and atomic-level refinement through molecular dynamics simulations [67] [66]. The recently developed D-I-TASSER hybrid pipeline combines iterative threading assembly simulations with multi-source deep learning potentials, demonstrating particular strength with multidomain proteins [66].
Table 1: Comparative Performance of Structure Prediction Tools
| Metric | AlphaFold | I-TASSER | D-I-TASSER | Notes |
|---|---|---|---|---|
| Z-score (Apelin) | -4.21 [65] | -2.06 [65] | N/A | Lower Z-score indicates higher quality |
| Z-score (FX06) | -4.72 [65] | -4.46 [65] | N/A | Consistent advantage for AlphaFold |
| TM-score (multidomain proteins) | Baseline | N/A | 12.9% higher than AlphaFold2 [66] | Demonstrated advantage for complex proteins |
| CASP15 performance (FM domains) | Baseline | N/A | 18.6% higher TM-score [66] | Community-wide blind assessment |
| CASP15 performance (multidomain) | Baseline | N/A | 29.2% higher TM-score [66] | Significant advantage for multidomain targets |
| IDP performance | Limited [68] | Limited [68] | N/A | Both struggle with intrinsically disordered proteins |
| Human proteome coverage | ~76% of human proteome [66] | N/A | 81% of domains, 73% of full-chain [66] | Complementary coverage |
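Several rows above report TM-scores. As background, the TM-score for a fixed superposition sums a distance-dependent term over aligned residue pairs and divides by the target length, with a length-dependent scale d0 = 1.24 (L - 15)^(1/3) - 1.8. The sketch below evaluates that expression for precomputed distances; it is an illustration only, since the full metric also searches for the superposition that maximizes the score, and the 0.5 Å floor on d0 for short chains is an assumption borrowed from common implementations.

```python
def d0_scale(target_length: int) -> float:
    """Length-dependent distance scale (in angstroms) used by the TM-score."""
    if target_length <= 15:
        return 0.5  # assumed floor; the cube-root term is undefined or negative here
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances: list[float], target_length: int) -> float:
    """TM-score for one fixed superposition.

    `distances` holds the C-alpha distance (angstroms) for each aligned residue
    pair; residues of the target without an aligned partner contribute zero.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    if len(distances) > target_length:
        raise ValueError("more aligned pairs than target residues")
    scale = d0_scale(target_length)
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances) / target_length


if __name__ == "__main__":
    # Hypothetical distances: a perfect match over half of a 100-residue target.
    print(tm_score([0.0] * 50, 100))
```

Because the sum is divided by the target length rather than the number of aligned pairs, a model that covers only part of the target is penalized even when the covered part fits well.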
Table 2: Tool Selection Guidelines for Cross-Species Applications
| Research Scenario | Recommended Tool | Rationale |
|---|---|---|
| Single-domain proteins | AlphaFold | Superior Z-scores and model quality [65] |
| Multidomain proteins | D-I-TASSER | Enhanced domain-splitting and assembly protocol [66] |
| Template-rich targets | I-TASSER | Robust threading and assembly approach [67] |
| Template-poor targets | AlphaFold | Deep learning excels without homologs [65] |
| Rapid screening | AlphaFold | Streamlined pipeline with high accuracy [65] |
| Functional annotation | I-TASSER | Integrated function inference [67] |
| IDP characterization | Specialized MD required [68] | Both tools show limitations for disordered proteins [68] |
A comparative study on hepatitis C virus core protein (HCVcp) modeling revealed that Robetta and trRosetta outperformed AlphaFold2 for initial prediction, while among template-based tools, MOE outperformed I-TASSER [67]. However, molecular dynamics simulations proved essential for refining all predicted structures to achieve reliably folded models [67]. This highlights that prediction tool performance can be target-dependent, and refinement is often necessary for biologically relevant structures.
This protocol outlines the integration of AlphaFold and I-TASSER structural models within the SeqAPASS workflow for cross-species susceptibility predictions.
Identify Query Protein: Using the SeqAPASS interface, select a protein target with known importance for chemical susceptibility (e.g., honey bee nicotinic acetylcholine receptor for neonicotinoid insecticides) [3] [5].
Acquire Reference Sequence: Obtain the reference protein sequence for a sensitive species through:
Define Taxonomic Scope: Determine the range of species for extrapolation based on assessment goals (e.g., aquatic species, pollinators, endangered species) [3].
Level 1 Analysis - Primary Amino Acid Sequence Comparison:
Level 2 Analysis - Functional Domain Conservation:
Parallel Structure Prediction:
AlphaFold Protocol:
I-TASSER Protocol:
Structural Quality Assessment:
Level 3 Analysis - Critical Amino Acid Residue Comparison:
Structural Alignment and Binding Site Analysis:
System Preparation:
Simulation Protocol:
Trajectory Analysis:
Synthesize Evidence:
Generate Final Report:
Table 3: Essential Computational Tools for Integrated Structural Analysis
| Tool/Resource | Type | Primary Function | Access Information |
|---|---|---|---|
| SeqAPASS | Web Application | Cross-species protein conservation analysis | https://seqapass.epa.gov/seqapass/ [3] |
| AlphaFold | Web Server/Software | Deep learning-based structure prediction | https://alphafoldserver.com/ or ColabFold [65] |
| I-TASSER | Web Server/Software | Template-based structure modeling and function prediction | https://zhanggroup.org/I-TASSER/ [15] |
| D-I-TASSER | Web Server/Software | Hybrid deep learning and physics-based modeling | https://zhanggroup.org/D-I-TASSER/ [66] |
| GROMACS | Software Suite | Molecular dynamics simulations | https://www.gromacs.org/ [67] |
| NCBI BLAST+ | Command Line Tool | Local sequence alignment searches | https://blast.ncbi.nlm.nih.gov/ [5] |
| ClusPro | Web Server | Protein-peptide docking | https://cluspro.org/ [65] |
| HPEPDOCK | Web Server | Peptide-protein docking | http://huanglab.phys.hust.edu.cn/hpepdock/ [65] |
The integration of AlphaFold and I-TASSER structural models within the SeqAPASS framework represents a significant advancement in cross-species susceptibility prediction. AlphaFold generally provides superior accuracy for single-domain proteins, whereas I-TASSER (particularly D-I-TASSER) shows advantages for complex multidomain proteins [65] [66]. Because the tools are complementary, researchers can generate more reliable structural models across diverse species, which increases confidence in predictions of chemical susceptibility. Molecular dynamics simulations remain an essential refinement step, particularly for resolving structural ambiguities and modeling flexible regions [67] [68]. This integrated approach provides a robust, computationally driven framework for ecological risk assessment, drug discovery, and chemical safety evaluation, ultimately supporting more informed regulatory decisions while reducing animal testing requirements.
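One practical check on model reliability before comparing structures across species is the per-residue confidence score. AlphaFold writes its pLDDT score (0 to 100) into the B-factor column of the PDB files it produces, so it can be read with plain text parsing. The sketch below assumes a PDB-format AlphaFold model; the 70-point cutoff follows the commonly used "confident" band, and the helper `toy_atom_line` and its input values are hypothetical, included only to build a small test input.

```python
def per_residue_plddt(pdb_text):
    """Return {(chain_id, residue_number): pLDDT} read from C-alpha atoms.

    Uses fixed PDB columns: atom name 13-16, chain 22, residue number 23-26,
    B-factor 61-66 (where AlphaFold stores pLDDT).
    """
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resseq = int(line[22:26])
            scores[(chain, resseq)] = float(line[60:66])
    return scores


def summarize_plddt(scores, threshold=70.0):
    """Return (mean pLDDT, fraction of residues at or above the threshold)."""
    values = list(scores.values())
    if not values:
        raise ValueError("no C-alpha atoms found")
    mean = sum(values) / len(values)
    frac = sum(1 for v in values if v >= threshold) / len(values)
    return mean, frac


def toy_atom_line(serial, name, resname, chain, resseq, plddt):
    """Hypothetical helper: build a column-aligned ATOM record for the demo."""
    return (
        "ATOM  {:>5d} {:^4s} {:>3s} {:1s}{:>4d}    "
        "{:>8.3f}{:>8.3f}{:>8.3f}{:>6.2f}{:>6.2f}"
    ).format(serial, name, resname, chain, resseq, 0.0, 0.0, 0.0, 1.0, plddt)


# Toy three-residue model (made-up coordinates and scores).
demo_pdb = "\n".join([
    toy_atom_line(1, "N", "MET", "A", 1, 91.5),
    toy_atom_line(2, "CA", "MET", "A", 1, 91.5),
    toy_atom_line(3, "CA", "LYS", "A", 2, 62.0),
    toy_atom_line(4, "CA", "THR", "A", 3, 85.0),
])
demo_scores = per_residue_plddt(demo_pdb)
demo_mean, demo_frac = summarize_plddt(demo_scores)
print("mean pLDDT:", demo_mean)
print("fraction >= 70:", round(demo_frac, 2))
```

Low-confidence stretches flagged this way are natural candidates for the molecular dynamics refinement discussed above, or for cross-checking against the I-TASSER model of the same protein.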
SeqAPASS represents a pivotal advancement in computational toxicology, offering a scientifically robust and efficient framework for predicting chemical susceptibility across the vast diversity of species. By leveraging publicly available protein data through a tiered evaluation of sequence, functional domain, and structural conservation, it provides a critical line of evidence for cross-species extrapolation. This tool directly supports the global shift toward New Approach Methodologies by reducing reliance on animal testing, enabling rapid prioritization of chemicals, and informing the selection of ecologically relevant species for further assessment. The future of SeqAPASS and the field lies in the deeper integration of high-fidelity protein structural models, advanced molecular docking simulations, and richer toxicokinetic data. For biomedical and clinical research, these developments promise to enhance the prediction of off-target drug effects, improve the understanding of species-specific toxicities in pre-clinical models, and ultimately contribute to the development of safer pharmaceuticals and chemicals.