rna-tools.online



The rna-tools.online server hosts many bioinformatic tools to perform various operations on different types of RNA data with ease. This project has been developed to give users access to a comfortable interface that doesn’t require any knowledge of programming or command line execution.


Feel free to contribute to improve this documentation at Google Docs.


Motivation         

Documentation         

The tool interface         

Job id         

Fetch         

Syntax for selection

Re-run jobs         

Form validation         

Tools         

RNA 3D structure conversion from CIF to PDB         

Conversion between mmCIF and PDB         

RNA 3D structure analysis         

Get sequences         

Get secondary structures         

​​Contacts (Interactions) classification with ClaRNA         

Analysis with X3DNA         

RNA 3D structure standardization         

RNA 3D structure editing         

RNA 3D structure minimization         

​​RNA 3D structures comparison         

RNA 3D model quality assessment         

Demo files         

Feedback is welcome!         

Motivation

Significant improvements have been made in the efficiency and accuracy of RNA 3D structure prediction methods in recent years; however, many tools developed in the field remain exclusive to only a few bioinformatics groups. To perform a complete RNA 3D structure modeling analysis as proposed in the RNA-Puzzles publications, e.g., [1], researchers must familiarize themselves with a fairly complex set of tools.

The goal of the rna-tools package [2] was to provide a more abstract way to process data for RNA 3D modeling. Since then, rna-tools has grown into a broad toolbox for working with various types of RNA data. The package was used to provide computational resources for the RNA-Puzzles community [2] and also offered tools for other biological applications, e.g., [3,4,5].

However, using rna-tools requires the installation of a mixture of libraries and tools and basic knowledge of the Linux command line. To give all biologists a chance to take advantage of developments in RNA 3D structure prediction, we provide a user-friendly server to perform many standard analyses required for the typical modeling workflow: secondary structure prediction, 3D structure manipulation and editing, structure minimization, structure analysis, and structure comparison.

On the server, each tool has been translated into a web application. The user can open a web browser, select the tool to use, and upload their own files. Once the computation is done, the web server allows the user to download the results and explore the steps that were used to perform the analysis. This also provides a way to learn how the rna-tools package can be used and may encourage the user to try more customized analyses.

All tools are well documented and examples are provided to help the users to understand the tools.

[1] Z. Miao, R. W. Adamiak, M. Antczak, M. J. Boniecki, J. M. Bujnicki, S.-J. Chen, C. Y. Cheng, Y. Cheng, F.-C. Chou, R. Das, N. V. Dokholyan, F. Ding, C. Geniesse, Y. Jiang, A. Joshi, A. Krokhotin, M. Magnus, O. Mailhot, F. Major, T. H. Mann, P. Piątkowski, R. Pluta, M. Popenda, J. Sarzynska, L. Sun, M. Szachniuk, S. Tian, J. Wang, J. Wang, A. M. Watkins, J. Wiedemann, Y. Xiao, X. Xu, J. D. Yesselman, D. Zhang, Y. Zhang, Z. Zhang, C. Zhao, P. Zhao, Y. Zhou, T. Zok, A. Zyła, A. Ren, R. T. Batey, B. L. Golden, L. Huang, D. M. Lilley, Y. Liu, D. J. Patel, and E. Westhof, “RNA-Puzzles Round IV: 3D structure predictions of four ribozymes and two aptamers.,” RNA, vol. 26, no. 8, pp. 982–995, Aug. 2020.

[2] M. Magnus, M. Antczak, T. Zok, J. Wiedemann, P. Lukasiak, Y. Cao, J. M. Bujnicki, E. Westhof, M. Szachniuk, and Z. Miao, “RNA-Puzzles toolkit: a computational resource of RNA 3D structure benchmark datasets, structure manipulation, and evaluation tools.,” Nucleic Acids Research, vol. 48, no. 2, pp. 576–588, Jan. 2020.

[3] M. Magnus, K. Kappel, R. Das, and J. M. Bujnicki, “RNA 3D structure prediction guided by independent folding of homologous sequences.,” BMC Bioinformatics, vol. 20, no. 1, pp. 512–15, Oct. 2019.

[4] K. Eysmont, K. Matylla-Kulinska, A. Jaskulska, M. Magnus, and M. M. Konarska, “Rearrangements within the U6 snRNA Core during the Transition between the Two Catalytic Steps of Splicing.,” Molecular Cell, vol. 75, no. 3, pp. 538–548.e3, Aug. 2019.

[5] F. Stefaniak and J. M. Bujnicki, “AnnapuRNA: A scoring function for predicting RNA-small molecule binding poses,” PLoS Comput Biol, vol. 17, no. 2, p. e1008309, Feb. 2021

Documentation

The rna-tools.online server hosts many bioinformatic tools to perform various operations on different types of RNA data with ease. This project has been developed to give users access to a comfortable interface that doesn’t require any knowledge of programming, command-line execution, or complicated installation.

The Tools page is divided into seven categories, each one describing the type of tools it contains. The tools are listed one below another, and a small description follows the name of each tool. By clicking on the hyperlink it is possible to reach the page of every single tool. The sections in which the tools have been divided are the following:

  1. RNA 3D structure conversion from CIF to PDB
  2. RNA 3D structure analysis
  3. RNA 3D structure standardization
  4. RNA 3D structure editing
  5. RNA 3D structure minimization
  6. RNA 3D structures comparison
  7. RNA 3D model quality assessment

The tools may differ from one another, but the general pipeline is the following:

  1. The user opens the tool page for the required action.
  2. Then, following the instructions displayed on the page, the user can drag and drop input files into a box on the page or “Fetch” files from another job, and press “Run!”.
  3. The tool starts to work in the background, and after a few seconds or minutes (depending on the tool), the result files are generated.
  4. The user can then retrieve the data of the analysis and re-use it to perform other tasks on the rna-tools web server.

Demo

Each tool has a “Load demo” option. If you are unsure about the files or formats that a tool accepts, load and run the demo, then download and explore the example input file and the resulting output.

The tool-specific documentation can be found at the bottom of each tool page.

Job id

If you reload the page when the job ID (identifier) is not in the URL, a new folder will be created for you:

http://rna-tools.online/tools/calc-rmsd/  

To access the same folder and the same job, in a new browser window or after refreshing your page, you need a full URL:

http://rna-tools.online/tools/calc-rmsd/11e4e814  
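The difference between the two URLs above can be illustrated with a small sketch. It assumes the path layout /tools/<tool>/<job id> seen in these two examples; the function name split_job_url is hypothetical and is not part of the server or of rna-tools.

```python
from urllib.parse import urlparse


def split_job_url(url):
    """Return (tool, job_id); job_id is None when the URL carries no job ID.

    Assumes the /tools/<tool>/[<job_id>] layout seen in the example URLs.
    """
    parts = [part for part in urlparse(url).path.split("/") if part]
    if len(parts) < 2 or parts[0] != "tools":
        raise ValueError(f"not a tool URL: {url}")
    tool = parts[1]
    job_id = parts[2] if len(parts) > 2 else None
    return tool, job_id


# Without a job ID the server would start a new folder;
# with it, the same folder and job are reopened.
print(split_job_url("http://rna-tools.online/tools/calc-rmsd/"))
print(split_job_url("http://rna-tools.online/tools/calc-rmsd/11e4e814"))
```

Bookmarking the full URL, including the job ID, is therefore the simplest way to return to a job later.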

Fetch

The user can use “Fetch” with the job ID of another job to get files into a current job.

Syntax for selection

Various tools use the selection scheme.

For “Mutate”, “A:1A+2A+3A+4A,B:13A” defines mutating all selected residues into adenine (“A”). “A:1A” means taking the first residue from chain A and mutating it into A (adenine). The user can combine residues from the same chain with “+” and add residues to be mutated in another chain by using “,”.

For “Calculate RMSD”, ranges of residues can be defined using “-”, e.g., “A:1-17+24-110+115-168”, meaning: select residues 1 to 17, 24 to 110, and 115 to 168 of chain A (all ranges are inclusive).

Moreover, “Calculate RMSD” supports negative selection to remove a single atom from the selection, e.g., “A/57/O2\'”, meaning: remove the O2′ atom of residue 57 of chain A (the backslash is required to keep ' from being interpreted as the end of a string).
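To make the selection syntax concrete, here is a minimal sketch of how such strings could be parsed. It is an illustration only, not the parser used by the server: the function names are hypothetical, residue numbers are assumed to be positive integers, and the negative (atom-level) selection is not handled.

```python
def parse_selection(selection):
    """Parse 'A:1-17+24-110' into {'A': [1, ..., 17, 24, ..., 110]}."""
    result = {}
    for chain_part in selection.split(","):        # "," separates chains
        chain, _, residues = chain_part.strip().partition(":")
        numbers = result.setdefault(chain, [])
        for token in residues.split("+"):           # "+" combines residues
            if "-" in token:                        # "-" defines an inclusive range
                start, end = token.split("-")
                numbers.extend(range(int(start), int(end) + 1))
            else:
                numbers.append(int(token))
    return result


def parse_mutations(selection):
    """Parse 'A:1A+2A,B:13A' into [('A', 1, 'A'), ('A', 2, 'A'), ('B', 13, 'A')]."""
    mutations = []
    for chain_part in selection.split(","):
        chain, _, residues = chain_part.strip().partition(":")
        for token in residues.split("+"):
            # The last character is the target base; the rest is the residue number.
            mutations.append((chain, int(token[:-1]), token[-1]))
    return mutations


print(len(parse_selection("A:1-17+24-110+115-168")["A"]))
print(parse_mutations("A:1A+2A+3A+4A,B:13A"))
```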

Re-run jobs

The goal of the server was to implement an interactive workflow that lets the user carry out complex analyses. The user can keep working in the same server folder: by removing some input files, keeping the outputs of the tools, and adding new files, one can interactively build up more complex analyses. Even a finished job can easily be re-run (for example, after removing one of the input files or adding new files) to get a new result.

Form validation

Some forms require extra information. When it is missing, the following message is displayed and the submission is stopped.

Figure. Form validation is enabled for some tools when input is required.

In some tools, input verification is challenging without running the tool. In these cases, such as "Calculate RMSD", the problem is reported in the output of the tool. Here, because the segments taken for the calculation differ in length (chain A, residues 1-17 inclusive vs. chain A, residues 1-18 inclusive), the number of atoms is different and the RMSD cannot be calculated. The information about the issue is shown in the output.

Figure. Errors can be reported in the output of a given tool.

Tools

RNA 3D structure conversion from CIF to PDB

Conversion between mmCIF and PDB

As only a limited number of chains and atoms can be deposited in the PDB format, the mmCIF format has been introduced to provide an alternative way to save structures. As the predicted RNA structures are normally within the capability of the PDB format, this format is still used in the RNA-Puzzles community. Moreover, many tools available in the field of RNA 3D bioinformatics still use the PDB format and likely will not be updated for the mmCIF format. Thus, we decided to provide a web application to convert the mmCIF format to the PDB format (“Convert CIF files to PDB”) and back (“Convert PDB files to CIF”). These two tools are based on the open-source version of PyMOL.

Convert CIF files to PDB

Convert PDB files to CIF

RNA 3D structure analysis

This group of tools includes programs that aim to facilitate the analysis of RNA 3D structure. With “Get sequences” the user can easily obtain RNA sequences for the uploaded PDB files. To obtain secondary structures from the PDB files, the user can run “Get secondary structures”, which is based internally on the 3DNA/DSSR software. The 3DNA/DSSR software is also used for the next tool, “Analysis with X3DNA”, which provides various detailed statistics for PDB files, such as a list of RNA elements (helices, stems, motifs, nucleotide modifications) and the configuration of base pairs. The last tool in this group uses ClaRNA to classify the contacts (interactions) between nucleotides in a PDB file.

  Get sequences  get sequences of a bunch of PDB files

  Get secondary structures  get secondary structures of a bunch of PDB files

  Analysis with X3DNA  get statistics and details on PDB files

  Analysis with ClaRNA  get interactions detected for PDB files

Get sequences

There are two ways to obtain the sequences from PDB files. As the demo, we provide four models from the RNA-Puzzles 21 target.

The default options show the filename after ‘#’ and the sequence of all chains (here, only one chain is present per model):

# 21_3dRNA_1_rpr

>A:1-41

CCGGACGAGGUGCGCCGUACCCGGUCACGACAAGACGGCGC

# 21_Adamiak_1_rpr

>A:1-41

CCGGACGAGGUGCGCCGUACCCGGUCACGACAAGACGGCGC

# 21_ChenHighLig_1_rpr

>A:1-41

CCGGACGAGGUGCGCCGUACCCGGUCACGACAAGACGGCGC

# 21_Das_1_rpr

>A:1-41

CCGGACGAGGUGCGCCGUACCCGGUCACGACAAGACGGCGC

By using the option “fasta” the user can change the formatting and get the output in the FASTA format:

>21_3dRNA_1_rpr.pdb A:1-41

CCGGACGAGGUGCGCCGUACCCGGUCACGACAAGACGGCGC

>21_Adamiak_1_rpr.pdb A:1-41

CCGGACGAGGUGCGCCGUACCCGGUCACGACAAGACGGCGC

>21_ChenHighLig_1_rpr.pdb A:1-41

CCGGACGAGGUGCGCCGUACCCGGUCACGACAAGACGGCGC

>21_Das_1_rpr.pdb A:1-41

CCGGACGAGGUGCGCCGUACCCGGUCACGACAAGACGGCGC

For a structure with a break in the chain, the output shows the numbering of the two segments:

# 5k7c_clean_onechain_renumber_as_puzzle_srr

>A:1-47 52-62

CGUGGUUAGGGCCACGUUAAAUAGUUGCUUAAGCCCUAAGCGUUGAUAUCAGGUGCAA
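The segment notation in the header (“1-47 52-62”) can be reproduced with a short sketch. It assumes that segments are derived purely from gaps in residue numbering; how the server actually detects chain breaks is not stated here, and the function name is hypothetical.

```python
def format_segments(residue_numbers):
    """Collapse sorted residue numbers into 'start-end' segments joined by spaces."""
    segments = []
    start = prev = residue_numbers[0]
    for number in residue_numbers[1:]:
        if number != prev + 1:           # gap in numbering: close the current segment
            segments.append(f"{start}-{prev}")
            start = number
        prev = number
    segments.append(f"{start}-{prev}")   # close the last segment
    return " ".join(segments)


# Residues 1-47 and 52-62, as in the example above (58 residues in total).
numbers = list(range(1, 48)) + list(range(52, 63))
print(format_segments(numbers))
```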

Get secondary structures

Here, we use the same data as for the “Get sequences” tool. The output shows the secondary structure in the dot-bracket notation. Briefly, in this notation, “.” represents an unpaired residue, “()” represents paired residues, and “[]” represents a pseudoknot. As you can see here, only two groups correctly predicted the pseudoknot in the model: “21_Adamiak_1_rpr.pdb” and “21_Das_1_rpr.pdb”. Interestingly, model “21_3dRNA_1_rpr.pdb” has fewer base pairs detected, so even without looking at the structure, this strongly suggests that the model is not folded into a compact RNA structure.

21_3dRNA_1_rpr.pdb

((((...............))))..................

21_Adamiak_1_rpr.pdb

((((.......[[[[[[[..))))..........]]]]]]]

21_ChenHighLig_1_rpr.pdb

((((.......(((((((..))))..........)))))))

21_Das_1_rpr.pdb

((((.......[[[[[[[..))))..........]]]]]]]
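To compare such outputs without reading them by eye, the dot-bracket strings can be turned into lists of base pairs. The sketch below is a generic stack-based parser for “()” and “[]”, written for illustration; it is not the code used by the server, and the function name is hypothetical.

```python
def dot_bracket_pairs(structure):
    """Return sorted (i, j) base pairs (1-based) for '()' and '[]' brackets."""
    stacks = {"(": [], "[": []}            # one stack per bracket type
    closing = {")": "(", "]": "["}
    pairs = []
    for position, symbol in enumerate(structure, start=1):
        if symbol in stacks:
            stacks[symbol].append(position)
        elif symbol in closing:
            stack = stacks[closing[symbol]]
            if not stack:
                raise ValueError(f"unmatched '{symbol}' at position {position}")
            pairs.append((stack.pop(), position))
    if any(stacks.values()):
        raise ValueError("unclosed opening bracket")
    return sorted(pairs)


models = {
    "21_3dRNA_1_rpr.pdb": "((((...............))))..................",
    "21_Adamiak_1_rpr.pdb": "((((.......[[[[[[[..))))..........]]]]]]]",
}
for name, structure in models.items():
    print(name, len(dot_bracket_pairs(structure)), "pairs")
```

Keeping a separate stack per bracket type is what allows the crossing “[]” pairs of a pseudoknot to be matched independently of the “()” pairs.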

Figure. The model is relatively compact; however, most of the residues are unpaired, which makes this model likely of low quality (see “Assess”).

​​

​​Contacts (Interactions) classification with ClaRNA

We can use the next tool to look more carefully at the types of interactions within an RNA molecule. Again, let’s look at the previous model. Here, we get a detailed view of the types of interactions detected by ClaRNA.

21_3dRNA_1_rpr.pdb

Classifier: Clarna

chains: A 1 41

A 1 A 24 bp C G WW_cis 0.9184

A 2 A 23 bp C G WW_cis 0.8763

A 3 A 22 bp G C WW_cis 0.8950

A 4 A 21 bp G C WW_cis 0.8362

A 39 A 40 bp C G SH_cis 0.6261

Only one chain was detected, chain A. Within this chain, four canonical WW_cis (Watson-Crick/Watson-Crick edges in the cis orientation) interactions were detected. Interestingly, there is one non-canonical interaction between residues 39 and 40, SH_cis, which means that the sugar edge of residue 39 interacts with the Hoogsteen edge of residue 40. Each interaction is assigned a score between 0 and 1; the higher the score, the more closely the interaction resembles an ideal interaction of that type.
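The interaction lines of this output are easy to process further. The sketch below assumes the whitespace-separated layout shown in the example (chain, residue, chain, residue, interaction kind, two bases, class label, score); the field names and the function name are chosen here for illustration and are not ClaRNA's own.

```python
from typing import NamedTuple


class Contact(NamedTuple):
    chain1: str
    residue1: int
    chain2: str
    residue2: int
    kind: str       # e.g. "bp"
    base1: str
    base2: str
    label: str      # e.g. "WW_cis", "SH_cis"
    score: float    # between 0 and 1


def parse_contact(line):
    """Parse one interaction line such as 'A 1 A 24 bp C G WW_cis 0.9184'."""
    c1, r1, c2, r2, kind, b1, b2, label, score = line.split()
    return Contact(c1, int(r1), c2, int(r2), kind, b1, b2, label, float(score))


lines = [
    "A 1 A 24 bp C G WW_cis 0.9184",
    "A 2 A 23 bp C G WW_cis 0.8763",
    "A 3 A 22 bp G C WW_cis 0.8950",
    "A 4 A 21 bp G C WW_cis 0.8362",
    "A 39 A 40 bp C G SH_cis 0.6261",
]
contacts = [parse_contact(line) for line in lines]
ww_cis = [c for c in contacts if c.label == "WW_cis"]
print(len(ww_cis), "WW_cis interactions")
print(min(contacts, key=lambda c: c.score))   # the lowest-scoring interaction
```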

Analysis with X3DNA

"Analysis with X3DNA” provides various detailed statistics for PDB files, such as a list of RNA elements (helixes, stems, motifs, nucleotide modifications) and the configuration of base pairs.

In one example, the crystal structure for RNA-Puzzles 17, the user can see that the method identified an A-minor motif between residues G4, C13, and A23 of chain A.

List of 1 A-minor motif(s)

1 type=II A/G-C A.A23 vs A.G4+A.C13 [WC]

+A.G4 H-bonds[0]: ""

-A.C13 H-bonds[3]: "O2'(hydroxyl)-O3'[3.13]; O2'(hydroxyl)-O2'(hydroxyl)[2.99]; N3-O2'(hydroxyl)[2.81]"

Figure. A-minor motif between residues of chain A: G4, C13, and A23.


RNA 3D structure standardization

This group of tools aims to facilitate the standardization of RNA 3D structures. “Standardize PDB for RNA puzzle submission” standardizes a PDB file to be compatible with the format proposed by the RNA-Puzzles community. The tool standardizes the naming of atoms, residues, and chains, reports and adds missing atoms, removes water and ions, and keeps only canonical RNA atoms (see the examples below). “Standardize PDB for Molecular Dynamics” standardizes a PDB file to be compatible with the format used in Molecular Dynamics, e.g., by OpenMM. The tool standardizes the naming of atoms, residues, and chains, reports and adds missing atoms, removes water and ions, and keeps only canonical RNA atoms.

Standardize PDB files (get-rnapuzzle-ready, "_rpr.pdb")  get a standardized naming of atoms, residues, chains to be compatible with the format proposed by the RNA-Puzzles community [report and add missing atoms], remove water and ions, keep only canonical RNA atoms

Standardize PDB files for Molecular Dynamics (get-molecular-dynamics-ready, "_mdr.pdb")  get a standardized naming of atoms, residues, chains to be compatible with the format used, e.g., by OpenMM [report and add missing atoms], remove water and ions, keep only canonical RNA atoms. This tool is the same as "Standardize PDB files" with one action added: removal of the starting OP3 atoms.

Figure. (Starting from left) input structure, structure with rebuilt atoms, and reference. Fragment B is observed in the reference, used here as a “benchmark”; fragment A consists of reconstructed atoms (not observed in the reference).

Figure. Add missing O2′ atom (before and after).

Figure. The residue fixed is in cyan. The G base from the library is in red. Atoms O4′, C2′, C1′ are shared between the sugar (in cyan) and the G base from the library (in red). These atoms are used to superimpose the G base on the sugar, and then all atoms from the base are copied to the residues.

Figure. An example of rebuilding ACGU base-less fragment. The output should be minimized.

RNA 3D structure editing

The next group of tools allows the user to edit PDB files; most of them use Biopython. The “Concatenate PDB files” tool merges two or more PDB files into one file. “Extract from PDB” extracts and “Delete from PDB” removes specified residues from a PDB file. “Edit PDB file” edits specific residues in a PDB file. “Mutate residues in PDB file” mutates residues in a PDB file using ModeRNA with improved code that supports multiple chains. “Swap chains name in PDB file” swaps the names of the chains in a PDB file. “Replace XYZ coordinates in PDB file” replaces the XYZ coordinates of one PDB file with the XYZ coordinates from another file, which is useful for homology modeling.

The group of tools aims to facilitate operations on RNA 3D structure.

  Concatenate  merge a bunch of PDB files into one file

  Extract  extract parts of PDB files

  Delete  delete parts of PDB files

  Replace  HETATM with ATOM in PDB files

  Edit  edit (not the same as mutate) of PDB files

  Mutate  mutate residues in PDB files

  Swap chains  swap names of the chains in PDB files

  Replace  replace XYZ coordinates of one PDB file with XYZ coordinates from another file
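As an illustration of what the extract, delete, and HETATM-to-ATOM operations listed above do at the file level, here is a minimal sketch that works directly on the fixed-width columns of PDB ATOM/HETATM records (chain ID in column 22, residue number in columns 23-26). The server tools are described as mostly using Biopython, so this plain-string version is a swapped-in simplification, and all function names are hypothetical.

```python
def in_selection(line, chain, first, last):
    """True if an ATOM/HETATM line belongs to `chain` and residues first..last (inclusive)."""
    if not line.startswith(("ATOM", "HETATM")):
        return False
    # PDB fixed columns: chain ID is column 22, residue number is columns 23-26.
    return line[21] == chain and first <= int(line[22:26]) <= last


def extract_residues(lines, chain, first, last):
    """Keep only the atom records inside the selection."""
    return [line for line in lines if in_selection(line, chain, first, last)]


def delete_residues(lines, chain, first, last):
    """Keep every line except the atom records inside the selection."""
    return [line for line in lines if not in_selection(line, chain, first, last)]


def hetatm_to_atom(lines):
    """Rewrite the record name HETATM as ATOM, leaving all other columns untouched."""
    return ["ATOM  " + line[6:] if line.startswith("HETATM") else line
            for line in lines]
```

For example, extract_residues(lines, "A", 1, 17) would correspond to the selection “A:1-17”, where lines is a list of lines read from a PDB file.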

Figure. Example of “Mutate” (“--mutate 'A:1A+2A+3A+4A,B:13A'”). Input structure on the left, mutated structure on the right: the first four residues of chain A and residue 13 of chain B are mutated into adenines.

RNA 3D structure minimization

A common task in RNA bioinformatics is to remove steric clashes and optimize bonds and angles of an RNA model. We provide two tools: “Minimize with QRNAS” which offers structural minimization based on QRNAS, a software tool for fine-grained refinement of nucleic acid structures, and “Minimize with OpenMM” which offers minimization for PDB files based on OpenMM, a molecular dynamics simulation toolkit.

  Minimize with QRNAS  minimize PDB files to fix clashes based on QRNAS

  Minimize with OpenMM  minimize PDB files to fix clashes based on OpenMM

Figure. Example of minimization with QRNAS. One way to assess the progress is to use tools like MolProbity ( http://molprobity.biochem.duke.edu ). As the minimization progresses, the clash score drops from a very high value of 176 (top panel) to 2.02 (bottom panel).

​​ RNA 3D structures comparison

The next group contains three tools that can be used to compare structural files. The “diffpdb” tool is a simple program that performs a text-based comparison of two files in the PDB format to identify differences in the annotation of atoms, missing atoms, and missing fragments. “Calculate Root Mean Square Deviation (RMSD)” can be used to calculate the RMSD, a measure of the average distance between the atoms of superimposed RNAs. The tool on the server provides options for the selection of fragments in the target structure and in the structures used for comparison, as well as for the exclusion of specific atoms. RMSD is a relatively simple, geometrical measure that is useful in some scenarios; however, for RNA comparison, a measure that takes into account the interaction networks of RNA molecules can be more useful. We provide “Calculate Network Interaction Fidelity (INF)”, where RNAs are represented by networks of interactions, and the more similar the interaction networks of two molecules are, the higher the INF score (inf_all).

  diffpdb  it is a simple tool to compare two files of PDB format to identify the difference in the annotation of atoms, missing atoms, missing fragments

  Calculate Root Mean Square Deviation (RMSD)  RMSD is the measure of the average distance between the atoms of superimposed RNAs (also proteins etc.). This is a simple, geometrical measure.

  Calculate Network Interaction Fidelity (INF)  INF is a measure specific to RNA molecules. RNAs are represented by networks of interactions, and the more similar the interaction networks of two molecules are, the higher the INF.
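The RMSD described above can be written down in a few lines. The sketch below assumes that the two structures are already superimposed and that the atoms are given in matching order as (x, y, z) tuples; it does not perform the superposition itself and is not the server's implementation. It also shows why two selections with different numbers of atoms cannot be compared.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of (x, y, z) tuples."""
    if len(coords_a) != len(coords_b):
        # E.g. residues 1-17 vs 1-18 give different atom counts: no RMSD is defined.
        raise ValueError(
            f"different number of atoms: {len(coords_a)} vs {len(coords_b)}")
    if not coords_a:
        raise ValueError("no atoms selected")
    total = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


# Two toy "structures" with two atoms each (made-up coordinates, in angstroms).
print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 2)]))
```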

Figure. The results of the RMSD calculation for the 4 models submitted for the RNA Puzzles 21 vs. the crystallographic structure (21_solution_2_rpr.pdb). Here no selection was applied; full-length models are compared. The lower the RMSD, the better. We can see again that 21_3dRNA_1_rpr.pdb obtained the highest RMSD (the worst score), as suggested by other analyses performed above (e.g., very low number of base pairs), and model 21_ChenHighLig_1_rpr.pdb is reported to be the most accurate (see Assess below and Calculate INF).

Figure. The results of the comparison of interaction networks for the 4 models submitted for the RNA Puzzles 21 vs. the crystallographic structure (21_solution_2_rpr.pdb). "Inf_all" is a score that summarizes all subscores; the higher, the better. Again, model 21_3dRNA_1_rpr.pdb obtained the lowest score, as suggested by other analyses performed above (e.g., very low number of base pairs), and model 21_ChenHighLig_1_rpr.pdb is reported to be the most accurate (see Assess below and also Calculate RMSD).
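For orientation, INF is commonly defined in the RNA-Puzzles literature as the geometric mean of the positive predictive value (PPV) and the sensitivity (STY) of the interactions found in a model relative to the reference. The sketch below follows that definition on plain sets of residue pairs; it is an assumption that the server's inf_all follows the same formula, and the details of which interaction classes are included are not covered here.

```python
import math


def inf_score(model_pairs, reference_pairs):
    """Interaction network fidelity: sqrt(PPV * STY) over sets of interactions."""
    model, reference = set(model_pairs), set(reference_pairs)
    true_positives = len(model & reference)     # interactions found in both
    false_positives = len(model - reference)    # only in the model
    false_negatives = len(reference - model)    # missed by the model
    if true_positives == 0:
        return 0.0
    ppv = true_positives / (true_positives + false_positives)
    sty = true_positives / (true_positives + false_negatives)
    return math.sqrt(ppv * sty)


# Toy example with made-up pairs: 2 of 3 model pairs are correct, 2 of 4 reference pairs are found.
reference = {(1, 24), (2, 23), (3, 22), (4, 21)}
model = {(1, 24), (2, 23), (5, 20)}
print(round(inf_score(model, reference), 3))
```

A model that reproduces exactly the reference interactions scores 1, and a model sharing no interactions with the reference scores 0.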

RNA 3D model quality assessment

The last step of RNA 3D structure modeling is the assessment of the quality of a model. The web server provides two tools that are executed to obtain predicted quality scores: RASP and Dfire. In both cases, lower scores mean a higher probability of a given structural model being of good quality.

  Assess  tools to assess the quality of the models obtained from a modeling procedure.

The results for the 4 models submitted for the RNA Puzzles 21. The lower the score, the better. We can see again that 21_3dRNA_1_rpr.pdb obtained the highest (the worst) score, as suggested by other analyses performed above (e.g., very low number of base pairs), and model 21_ChenHighLig_1_rpr.pdb is predicted by RASP to be the most accurate. Interestingly, both methods selected the same file as the least accurate model. However, Dfire and RASP differ in which model they predict to be the best.

Figure. The results sorted by RASP, the best model is predicted to be 21_ChenHighLig_1_rpr.pdb.

Figure. The results sorted by Dfire, the best model is predicted to be 21_Das_1_rpr.pdb.

Demo files

For the tools, you can use these demo files: rp21.zip. For each tool, there is tool-oriented documentation and a description of the analysis of these files.

Feedback is welcome!

Please report any issues via the Github Issue tracker: https://github.com/mmagnus/rna-tools/issues  

Follow us or ask a question on Twitter: https://twitter.com/rna_tools


We are exploring the use of https://piwik.pro to track usage of the server for grant proposals and scientific presentations.

If there is any problem with your job, please email us [behind Apple Hide My Mail].