Table S1 (Shah et al. 2018, preprint PDF)

CRISPR-Cas Type III accessory protein families

List of putative accessory gene clusters conserved near Type III genetic modules that passed the Type III association score cut-off (>24). All annotated accessory protein sequences are available in Shah2018.TypeIIIaccessory.faa. For each putative accessory protein family, the cluster id, the size (i.e. the number of members per cluster), and the calculated Type III association score are listed. An example (gene-) locus id is also provided for reference. Names are provided for accessory protein families identified in earlier studies (Haft et al. 2005; Vestergaard et al. 2014; Makarova et al. 2015). Thirty-nine of the 76 putative accessory protein families are newly identified. Links to sequence alignments and to gene and genome maps are provided, as well as profile-profile alignment results against Pfam and PDB. A rudimentary functional prediction is given in the last two columns.

Cluster | Name | Score | Size | Length | Example locus | Links | Predicted function | Notes
1 | Csx1/Csm6 | 73.88 | 116 | 446 | J114_15030 | ALN Gene Pfam PDB | CARF+HEPN | N-ter CARF, C-ter HEPN toxin
2 | Csx1 | 69.25 | 94 | 464 | Thet_1098 | ALN Gene Pfam PDB | CARF+HEPN | N-ter CARF, C-ter HEPN toxin
3 | Csm6 | 62.33 | 69 | 347 | NIES39_J03350 | ALN Gene Pfam PDB | CARF+RelE | N-ter CARF, C-ter RelE toxin
5 | Cas11/Csx19 | 59.36 | 64 | 179 | Athe_0139 | ALN Gene Pfam PDB | core gene (SS) |
6 | Csx1 | 58.51 | 61 | 461 | aq_378 | ALN Gene Pfam PDB | CARF+HEPN | N-ter CARF, C-ter HEPN toxin
7 | WYL/Csx1 | 42.59 | 51 | 335 | THEYE_A0059 | ALN Gene Pfam PDB | CARF+WYL | N-ter CARF, C-ter WYL
9 | Csx1 | 37.91 | 49 | 399 | PTH_1930 | ALN Gene Pfam PDB | CARF+Nuclease | N-ter CARF, C-ter putative endonuclease
11 | Csx15/20/peptidase | 35.59 | 41 | 129 | Vpar_1818 | ALN Gene Pfam PDB | peptidase | peptidase associated with CRISPR3 in Synechocystis. Suggests an unknown step, possibly proteolytic maturation of some Cas protein
17 | Cas_RecF | 24.7 | 31 | 331 | Tneu_0569 | ALN Gene Pfam PDB | SMC ATPase | ABC AAA ATPase cassette found in SMC type DNA repair proteins. Called CasRecF in Vestergaard et al. 2014
23 | Csx1 | 4.61 | 20 | 378 | LS215_0610 | ALN Gene Pfam PDB | found elsewhere | CARF protein also found elsewhere on the genome far from Type III modules
25 | Csx3 | 55.83 | 18 | 100 | THEYE_A0067 | ALN Gene Pfam PDB | Csx3 (CARF) |
27 | Csx18 | 50.07 | 17 | 94 | sll7069 | ALN Gene Pfam PDB | adaptation associated | found in cyanobacterial Type III systems that have an adaptation module
28 | Csx21 | 66.76 | 17 | 229 | Cyan7425_0157 | ALN Gene Pfam PDB | Type III-D associated | possible C-terminal RNA binding domain. Associated exclusively with Type III-D systems
29 | CorA | 45.91 | 14 | 572 | CLB_2115 | ALN Gene Pfam PDB | CorA-like | C-ter. CorA-like Mg2+ transporter domain. N-ter. domain unknown. Associated with Type III-B modules in bacteria and archaea.
33 | Csa3 | 10.47 | 13 | 215 | Pcal_0266 | ALN Gene Pfam PDB | Type I-A associated | CARF-HTH or Csa3 normally associated with Type I-A systems as a transcriptional repressor or activator
35 | Csx3 | 61.07 | 12 | 310 | CYB_0599 | ALN Gene Pfam PDB | Csx3-AAA |
36 | WYL | 50.71 | 11 | 451 | Mic7113_2620 | ALN Gene Pfam PDB | HTH-WYL | N-ter. DNA binding domain and C-ter. WYL domain that may be involved in regulation (mainly cyanobacteria)
37 | Csx26 | 56.62 | 11 | 125 | M1627_1089 | ALN Gene Pfam PDB | HNH nuclease | restriction endonuclease/HNH endonuclease
38 | C3a38 | 54.35 | 10 | 277 | CLOAM0849 | ALN Gene Pfam PDB | unknown | unknown structure/function
42 | Cas11/Csx19 | 74.4 | 9 | 140 | FSU_2045 | ALN Gene Pfam PDB | core gene (SS) | CasT3 - specific for bacterial type III-A/III-B systems
43 | C3a43 | 58.83 | 9 | 492 | Ferpe_1557 | ALN Gene Pfam PDB | Lon protease | likely ATP-dependent protease domain at N-ter. (Lon family). Large unknown C-terminal domain.
45 | PD-DExK | 28.54 | 9 | 288 | Dd586_3238 | ALN Gene Pfam PDB | possible nuclease | possible nuclease-related domain at C-terminal end
47 | protease | 27.09 | 9 | 103 | YN1551_2381 | ALN Gene Pfam PDB | peptidase | peptidase like cluster 11. The finding of peptidases as part of the accessory proteome, both in archaea and in some bacteria, points at an unknown step. Possible proteolytic maturation or activation of some Cas protein.
50 | Csx1 | 61.07 | 8 | 277 | Cpha266_2053 | ALN Gene Pfam PDB | TM+CARF | N-ter transmembrane, C-ter CARF
55 | C3a55 | 44.44 | 7 | 283 | Riv7116_3423 | ALN Gene Pfam PDB | ABC permease | likely oligonucleotide ABC transporter permease
57 | HerA | 20.46 | 7 | 583 | Vdis_1158 | ALN Gene Pfam PDB | Type III coevolved | HerA helicase normally involved in DNA repair, but this version has coevolved with crenarchaeal Type III modules
58 | NurA | 33.01 | 7 | 348 | Cmaq_1511 | ALN Gene Pfam PDB | Type III coevolved | NurA-like nuclease: ssDNA endonuclease and 5'-3' exonuclease on ss or dsDNA
59 | C3a59 | 55.84 | 7 | 178 | Nos7107_2826 | ALN Gene Pfam PDB | Trans-membrane (TM) | transmembrane, unknown structure/function
64 | C3a64 | 43.07 | 7 | 78 | M1627_1085 | ALN Gene Pfam PDB | unknown | unknown structure/function
67 | C3a67 | 40.03 | 6 | 388 | Cagg_1075 | ALN Gene Pfam PDB | AAA-Csx3 | Csx3 at C-terminus; looks like nucleotidyl phosphokinase at N-terminus
69 | C3a69 | 29.69 | 6 | 125 | B005_5545 | ALN Gene Pfam PDB | unknown | unknown structure-function
76 | C3a76 | 39 | 6 | 96 | Ssol_2352 | ALN Gene Pfam PDB | unknown | unknown structure-function
77 | Csx16 | 29.73 | 6 | 99 | NE0116 | ALN Gene Pfam PDB | | a.k.a. cas_VVA1548
80 | C3a80 | 27.48 | 6 | 1184 | Metvu_1289 | ALN Gene Pfam PDB | AAA+ ATPase | AAA+ ATPase, DUF499 family, unknown function
81 | Csx1 | -9.48 | 6 | 179 | M1425_0870 | ALN Gene Pfam PDB | Type I associated | small CARF protein often also associated with Type I systems
83 | cas_RFas | 88.69 | 6 | 157 | VMUT_1493 | ALN Gene Pfam PDB | cluster 17 associated | unknown structure-function, always associated with Cas-RecF, cluster 17
84 | cas_RFas | 66.57 | 6 | 128 | VMUT_1494 | ALN Gene Pfam PDB | cluster 17 associated | possible RNA recognition motif (RRM) at N-ter. Always associated with Cas-RecF, cluster 17
87 | Cmr7 | 74.65 | 6 | 185 | LS215_0814 | ALN Gene Pfam PDB | Cmr7 | Cmr7 (Sulfolobus)
93 | C3a93 | 30.57 | 5 | 155 | Thit_1351 | ALN Gene Pfam PDB | poss. AAA ATPase | likely AAA+ ATPase
96 | Mvol_0529-fam | 42.25 | 5 | 125 | CLB_2116 | ALN Gene Pfam PDB | DNA binding C-ter. | possible DNA binding C-ter. domain
104 | Csx1 | 39.74 | 5 | 466 | PYCH_07970 | ALN Gene Pfam PDB | CARF+PIN | N-ter CARF, C-ter PIN toxin
107 | Csx1 | 84.37 | 5 | 680 | TTHB155 | ALN Gene Pfam PDB | CARF | large CARF protein with interspersed stretches of sequence that have no significant domain matches
108 | cas_RFas | 78.8 | 5 | 180 | Tneu_0568 | ALN Gene Pfam PDB | cluster 17 associated | unknown structure-function, always associated with Cas-RecF, cluster 17
116 | C3a116 | 57.92 | 4 | 257 | AZL_010430 | ALN Gene Pfam PDB | cluster 29 associated | unknown structure-function, always associated with CorA, cluster 29
121 | Csx23 | 62.04 | 4 | 168 | CFF8240_1673 | ALN Gene Pfam PDB | unknown | unknown structure-function
123 | C3a123 | 37.09 | 4 | 211 | Cagg_1060 | ALN Gene Pfam PDB | unknown | unknown structure/function
124 | C3a124 | 44.42 | 4 | 433 | Cyan10605_3519 | ALN Gene Pfam PDB | unknown | SYNPCC7002_F0039 has an SMC_N domain pointing at a function in DNA metabolism and recombination, such as recN or recF. The gbk entry for SYNPCC7002_F003 states that Slr7100 is a homolog, which seems unlikely. However, the latter is CRISPR3-associated.
139 | C3a139 | 28.91 | 4 | 1158 | Metvu_1226 | ALN Gene Pfam PDB | HKD+Snf2 | N-ter. HKD family nuclease fused to C-ter. Snf2-like helicase domain which has no helicase activity
146 | C3a146 | 68.43 | 4 | 177 | SSO1421 | ALN Gene Pfam PDB | DNA binding HTH | likely HTH domain, DNA binding protein
152 | C3a152 | 25.85 | 3 | 143 | Anacy_5891 | ALN Gene Pfam PDB | Type III-C specific | Type III-C specific. Likely N-ter. RRM (RNA recognition motif); unknown function
156 | C3a156 | 28.1 | 3 | 330 | NIES39_M01150 | ALN Gene Pfam PDB | oxidoreductase | methylenetetrahydromethanopterin reductase
159 | C3a159 | 27.06 | 3 | 304 | Calkr_2554 | ALN Gene Pfam PDB | methyl transferase | N-ter. methyl transferase domain
162 | C3a162 | 40.43 | 3 | 181 | Calni_1607 | ALN Gene Pfam PDB | unknown | unknown structure-function
166 | C3a166 | 32.1 | 3 | 181 | Chy400_2477 | ALN Gene Pfam PDB | poss. crRNA proc. | possible crRNA maturation ribonuclease, instead of Cas6
168 | C3a168 | 31.13374 | | | MYO_4810 | ALN Gene Pfam PDB | unknown | slr7083 in S.6803. Single k.o. for this gene. Belongs to CRISPR3.
173 | Csx21 | 70 | 3 | 107 | Adeg_0988 | ALN Gene Pfam PDB | unknown | unknown structure-function
174 | C3a174 | 74.3 | 3 | 309 | Adeg_0798 | ALN Gene Pfam PDB | nucleotidyl trans. | nucleotidyl transferase. Could be involved as the toxin in the Type IV toxin-antitoxin antiphage mechanism. See PMID: 24465005 for details.
178 | C3a178 | 37.97 | 3 | 120 | Fisuc_1555 | ALN Gene Pfam PDB | putative invertase | putative transposase or invertase
181 | C3a181 | 25.55 | 3 | 275 | Kcr_0450 | ALN Gene Pfam PDB | diadenylate cyclase | diadenylate cyclase (c-di-AMP synthetase), DisA, possibly involved in signalling
186 | C3a186 | 28.77 | 3 | 469 | ANT_24470 | ALN Gene Pfam PDB | C-3',4' desaturase | C-3',4' desaturase CrtD
187 | C3a187 | 39 | 3 | 624 | Syn7502_02851 | ALN Gene Pfam PDB | NERD+UvrD | DNA/RNA helicase + (exo)nuclease at C-terminus
189 | C3a189 | 50.73 | 3 | 1053 | Hoch_1322 | ALN Gene Pfam PDB | kinase | catalytic domain of bacterial serine/threonine kinases (STKs)
193 | C3a193 | 43.78 | 3 | 1067 | Thebr_0949 | ALN Gene Pfam PDB | poss. alt. Cas3 | looks like a variant of Cas3 for Type III
194 | C3a194 | 26.83 | 3 | 477 | TepRe1_0131 | ALN Gene Pfam PDB | AA transporter | amino acid transporter or permease
196 | C3a196 | 46.29 | 3 | 96 | Runsl_4349 | ALN Gene Pfam PDB | unknown | unknown structure-function
198 | C3a198 | 40.67 | 3 | 532 | K649_15110 | ALN Gene Pfam PDB | TPR-repeat | tetratricopeptide repeat
205 | C3a205 | 67.53387 | | | MLP_11360 | ALN Gene Pfam PDB | unknown | unknown structure-function
206 | C3a206 | 29.39 | 3 | 535 | B005_5542 | ALN Gene Pfam PDB | TAP-like protein | TAP-like protein; family of peptidases and hydrolases
211 | cas_RFas | 51.67 | 3 | 187 | Pogu_1163 | ALN Gene Pfam PDB | unknown | unknown structure-function
212 | C3a212 | 39.67 | 3 | 338 | RoseRS_0371 | ALN Gene Pfam PDB | SNc+LTD | N-ter. endonuclease domain with a C-ter. lamin tail domain (LTD)
214 | C3a214 | 30.16 | 3 | 639 | Rcas_3915 | ALN Gene Pfam PDB | GlgB | 1,4 alpha glucan-branching enzyme, GlgB
216 | C3a216 | 25.6 | 3 | 77 | SSA_1254 | ALN Gene Pfam PDB | adaptation associated | associated with Type III systems that have an adaptation module. Shows sequence similarity with Cas1 in other bacteria.
222 | C3a222 | 44 | 3 | 39 | Vpar_1800 | ALN Gene Pfam PDB | unknown | unknown structure-function
223 | C3a223 | 43.33384 | | | YG5714_0635 | ALN Gene Pfam PDB | unknown | unknown structure-function
227 | PrimPol | 36.73 | 3 | 577 | Thein_1350 | ALN Gene Pfam PDB | adaptation polymerase | N-ter. tetratricopeptide, C-ter. likely primase/polymerase domain. Linked to adaptation. Possible reverse transcriptase repair
230 | C3a230 | 28.76 | 3 | 145 | TT_P0114 | ALN Gene Pfam PDB | RecX family protein | likely RecX recombination regulator
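
The caption states a Type III association score cut-off of >24. As a minimal sketch of how that threshold could be applied to the listed scores, the Python snippet below splits a handful of rows into those above and those at or below the cut-off. The rows are an illustrative subset transcribed from the table (cluster id, name, score); the variable and function names are hypothetical, and this is not the scoring pipeline used by the authors.

```python
# Association score cut-off stated in the table caption (a score must exceed it).
CUTOFF = 24.0

# (cluster id, family name, Type III association score).
# Illustrative subset transcribed from the table, not the full list.
rows = [
    (1, "Csx1/Csm6", 73.88),
    (23, "Csx1", 4.61),
    (33, "Csa3", 10.47),
    (57, "HerA", 20.46),
    (81, "Csx1", -9.48),
    (87, "Cmr7", 74.65),
]


def split_by_cutoff(rows, cutoff=CUTOFF):
    """Return (passing, failing); a row passes when its score is > cutoff."""
    passing = [r for r in rows if r[2] > cutoff]
    failing = [r for r in rows if r[2] <= cutoff]
    return passing, failing


passing, failing = split_by_cutoff(rows)
print("above cut-off:", [r[0] for r in passing])
print("at or below cut-off:", [r[0] for r in failing])
```

For this subset, clusters 1 and 87 exceed the cut-off, while clusters 23, 33, 57 and 81 do not; the functional notes for those four rows (found elsewhere on the genome, Type I or Type I-A associated, coevolved HerA) may explain why they are listed despite their scores.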