
Consensus and Ancestral Sequence Alignments

Purpose: To provide consensus and ancestral sequences of genetically associated subsets of HIV-1 sequences.

Details: Files are available in 4 formats, either as alignments or as gapless files. The files include a consensus of each subtype, an M-group consensus-of-consensuses, and some ancestral sequences. If you select more than 1 file, all the files will be concatenated into 1 big file, and you will have to split them out yourself, so it is usually better to download 1 file at a time. In the pretty print version, the alignments are broken into lines of 50 characters each, and the sequences are presented in an aligned style. Pretty print files are not available for unaligned sequences. Examples of the download formats can be seen on the Consensus Maker Explanation page.

Additional information about consensus and ancestral alignments

What sequences are included

We provide consensuses for the M group subtypes A (including A, A1, and A2), B, C, D, F (including F1 and F2), and G; the circulating recombinant forms CRF01 and CRF02; and group O. We also provide a Consensus M-group, which is a consensus of the consensus sequences for subtypes A, B, C, D, F, G, and H. Ancestral sequences are based on the Complete Genome M-group Ancestral sequence and its phylogenetic tree. For more details, see the M-group Consensus Construction explanation file.

The input alignments are the HIV Sequence Database Web Alignments. These sequences have undergone additional annotation after retrieval. Specifically, question marks in consensus sequences have been resolved, and glycosylation sites have been aligned. From the input, consensus sequences were built using our consensus website. The consensus sequences were calculated according to the default values on the consensus website, except that they were computed for all subtype groups having 3 or more (rather than 4 or more) sequences in the alignment. If a column in a subtype group contained equal numbers of two different letters, we resolved that tie by looking at the same column throughout the M group and using the most common letter as the consensus.

An upper case letter in a DNA consensus sequence indicates that the nucleotide is preserved unanimously in that position in all sequences used to make the consensus. In cases of nonunanimity, the most common nucleotide is shown in lowercase. Protein consensus sequences are always upper case letters indicating the most common amino acid at that position. Regions spanned by multiple insertions and deletions are difficult to align; we attempt to anchor alignments in such regions on glycosylation sites, and to preserve the minimal elements which span such regions.
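
The DNA consensus rules described on this page (upper case for a unanimous column, lower case for a non-unanimous majority, ties broken by the same column across the M group, and a minimum of 3 sequences per subtype group) can be illustrated with a minimal Python sketch. This is not the code behind the consensus website. The function names consensus_column and dna_consensus, the treatment of gap characters as ordinary symbols, the restriction of the tie-break to the tied letters, and the use of "?" for a tie that remains unresolved are all assumptions made for illustration.

```python
from collections import Counter

MIN_SEQUENCES = 3  # the page states 3 or more sequences per subtype group


def consensus_column(column, m_group_column=""):
    """Return the consensus character for one DNA alignment column.

    Hypothetical helper: gaps ('-') are counted like any other symbol.
    """
    counts = Counter(c.upper() for c in column)
    top = max(counts.values())
    tied = sorted(c for c, n in counts.items() if n == top)
    if len(counts) == 1:
        return tied[0]           # unanimous: upper case
    if len(tied) == 1:
        return tied[0].lower()   # majority but not unanimous: lower case
    # Tie: consult the same column across the whole M group.
    # Assumption: only the tied letters compete in the tie-break.
    m_counts = Counter(c.upper() for c in m_group_column)
    best = max(m_counts[c] for c in tied)
    winners = [c for c in tied if m_counts[c] == best]
    if len(winners) == 1:
        return winners[0].lower()
    return "?"                   # assumption: marks an unresolved tie


def dna_consensus(subtype_seqs, m_group_seqs=()):
    """Build a DNA consensus string from equal-length aligned sequences."""
    if len(subtype_seqs) < MIN_SEQUENCES:
        raise ValueError("need at least 3 aligned sequences")
    length = len(subtype_seqs[0])
    if any(len(s) != length for s in list(subtype_seqs) + list(m_group_seqs)):
        raise ValueError("all sequences must be aligned to the same length")
    out = []
    for i in range(length):
        column = "".join(s[i] for s in subtype_seqs)
        m_column = "".join(s[i] for s in m_group_seqs)
        out.append(consensus_column(column, m_column))
    return "".join(out)


# Toy data, not real HIV sequences.
subtype = ["ACGT", "ACGA", "ATGA", "ATCA"]
m_group = subtype + ["ATGA"]
print(dna_consensus(subtype, m_group))  # Atga
print(dna_consensus(subtype))           # A?ga
```

In the toy alignment, the first column is unanimous and stays upper case, the second column is a C/T tie that the extra M-group sequence resolves to t, and the last two columns have a non-unanimous majority and are shown in lower case. A protein version would follow the same counting but always return upper case letters.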
