-
Notifications
You must be signed in to change notification settings - Fork 4
/
Copy pathatam2ssts.py
2011 lines (1729 loc) · 99.8 KB
/
atam2ssts.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
'''
Top-level file for DNA single-stranded tile (SST) sequence designer used in the following publication.
"Diverse and robust molecular algorithms using reprogrammable DNA self-assembly"
Woods*, Doty*, Myhrvold, Hui, Zhou, Yin, Winfree (*Joint first co-authors). Nature, 2019.
With special thanks to Constantine Evans.
This program takes as input a set of tile types and outputs SSTs encoding them, given as a parameter
file using the syntax given in, for example ./input_tilesets/ibc_6bit.py.
See README.md file for installation info.
The glues on each tiles should all have "strength 1" in the aTAM. The program has the following design
criteria (these, and a number of related constraints, are described in more detail in Suppl Info A of the
above-mentioned publication, and in the above-mentioned example input file):
1) Glues (aka sticky ends) should have roughly equal binding energy (i.e., be isoenergetic).
2) Tiles should have small internal secondary structure.
3) Interactions between all pairs of tiles should be small, whether or not they have equal (complementary) glues.
4) Minimize interactions of sticky ends that are not complementary, but which might end up "close" due to a tile attachment error.
(one input domain matching, one input domain mismatching) should have very low binding affinity.
5) Minimize interactions of sticky ends that are not complementary, but which might end up "close" by
being co-located on a growing lattice frontier.
6) No GGGG in a tile and no {C,G}^4 in a sticky end.
Regarding criterion (4), the sticky ends might be close because they are on the
same tile type, but this should be handled by case (2). Therefore, we only
assume y and x are "close" if y* is an "output" sticky ends of some tile type, and
x is an "input" sticky ends, and it is possible for them to get close because the tile
where x is could bind by strength 1 to a tile adjacent to y*'s tile. Here is an
illustration (the # represents a boundary between sticky ends):
/---------#---------> tile type t1
| b* /---------#--------->
| | w
| w* |
\---------#---------- | y
/----N----#----E----> \---------#----------
|
|
| tile type t2
\----W----#----S----- /---------#--------->
/---------#---------> | w
| y* |
| | x
| a* \---------#----------
\---------#----------
In this example, tile type t1 matches both available output sticky ends w* and
y*. However, tile type t2 matches w* but not y*. Therefore, because of the
match between t2's w and w*, and the mismatch between x and y*, we require
that x and y* have very low binding affinity.
Regarding criterion (5), we would want y* and w* to have little interaction, and w and x to have little interaction.
'''
from __future__ import print_function
import dsd
import sst_dsd as sd
import random
import string
import collections
import sys
import itertools
import time
import multiprocessing
from multiprocessing.pool import ThreadPool
import math
import re
global_thread_pool = ThreadPool()
_length10 = 10
_length11 = 11
directions = ('S', 'W', 'N', 'E')
canonical_directions = ('N','W')
ALGO_CONF_GLUES_WEIGHT_EXCESS = 4
ALGO_CONF_GLUES_WEIGHT_EXTEND = 2 # Natural number > 1
TILE_SEC_STRUCT_WEIGHT_EXCESS = 4
TILE_SEC_STRUCT_WEIGHT_EXTEND = 2
TILE_PAIRS_HEURISTIC_WEIGHT_EXCESS = 1
TILE_PAIRS_HEURISTIC_WEIGHT_EXTEND = 1
INPUT_GLUE_PAIRS_EXTEND = 1
OUTPUT_GLUE_PAIRS_EXTEND = 1
LATTICE_BINDING_EXTEND = 1
QUIT_EARLY_OPTIMIZATION = True
THREADED = True
THREADED_TILE_PAIRS = True
# negate standard defn of free energy so quantities are positive
NEGATE = True
def is_canonical(direction):
return direction in canonical_directions
def opposite(direction):
if direction == 'S': return 'N'
elif direction == 'N': return 'S'
elif direction == 'E': return 'W'
elif direction == 'W': return 'E'
else: raise ValueError('%s is not a direction' % direction)
def direction2axis(direction):
if direction == 'S': return 'NS'
elif direction == 'N': return 'NS'
elif direction == 'E': return 'EW'
elif direction == 'W': return 'EW'
else: raise ValueError('%s is not a direction' % direction)
# maketrans is in string if python2 but in str if python3
try:
_wctable = str.maketrans('ACGTacgt','TGCAtgca')
except AttributeError:
_wctable = string.maketrans('ACGTacgt','TGCAtgca')
def wc(seq):
"""Return reverse Watson-Crick complement of seq"""
return seq.translate(_wctable)[::-1]
def random_iter(iterable):
"""Random selection from itertools.permutations(iterable, r)"""
pool = tuple(iterable)
return iter(random.sample(pool, len(pool)))
class Glue:
_glues = dict()
def factory(label, axis, end_constraint):
"""Create new glue if none already exists with given label, axis, and lower
and upper bounds on strength (lowDG and highDG), length, and biotin_direction direction."""
if (label,axis) in Glue._glues:
glue = Glue._glues[(label,axis)]
if end_constraint != glue.end_constraint:
raise ValueError('glue with label %s and axis %s already exists with different EndConstraint' % (label,axis))
else:
new_glue = Glue(label, axis, end_constraint)
Glue._glues[(label,axis)] = new_glue
glue = new_glue
return glue
factory = staticmethod(factory)
def __init__(self, label, axis, end_constraint):
"""axis is either "NS" or "EW"; label is a string"""
self.label = label
self.end = None
self.axis = axis
self.end_constraint = end_constraint
end_constraint.glues.append(self)
self.tiles = []
self.end_bound = False
self.conflicting_glues_first_order = set()
self.conflicting_glues_generalized = set()
def __str__(self):
return self.label + ';' + self.axis
def __repr__(self):
return str(self)
# def __repr__(self): return 'Glue(label=%s,end=%s,axis=%s)' % (self.label,self.end,self.axis)
def __hash__(self):
return hash(self.label + ';AXIS;' + self.axis)
def __eq__(self, other):
return self.label == other.label and self.axis == other.axis
def __cmp__(self, other):
cmp_label = cmp(self.label, other.label)
if cmp_label != 0:
return cmp_label
else:
return cmp(self.axis, other.axis)
def add_tile(self, tile):
if tile not in self.tiles:
self.tiles.append(tile)
def input_tiles(self):
"""Return tiles for which this is an input glue"""
return [tile for tile in self.tiles if self in [tile.glues['N'], tile.glues['W']]]
def output_tiles(self):
"""Return tiles for which this is an output glue"""
return [tile for tile in self.tiles if self in [tile.glues['S'], tile.glues['E']]]
def canonical_tiles(self):
"""Return tiles for which this is a glue in a canonical direction"""
return [tile for tile in self.tiles if self in [tile.glues[direction] for direction in canonical_directions]]
def canonical_direction(self):
if self.axis == 'NS':
for direction in ['N','S']:
if direction in canonical_directions:
return direction
else:
for direction in ['E','W']:
if direction in canonical_directions:
return direction
def direction(self, which):
if self.axis == 'NS' and which == 0: return 'S'
elif self.axis == 'NS' and which == 1: return 'N'
elif self.axis == 'EW' and which == 0: return 'W'
elif self.axis == 'EW' and which == 1: return 'E'
else: raise ValueError('which must be 0 or 1, not %s' % which)
def _check_direction(self, direction):
if self.axis == 'NS' and direction not in ('N','S'):
raise ValueError('glue axis %s must be used with direction either N or S, %s is invalid' % (self.axis, direction))
elif self.axis == 'EW' and direction not in ('E','W'):
raise ValueError('glue axis %s must be used with direction either E or W, %s is invalid' % (self.axis, direction))
def get_parity(self, direction):
"""Get parity of tile that has this glue in given direction."""
for tile in self.tiles:
if tile.glue(direction) == self:
return tile.parity
raise ValueError('no tile has glue %s in direction %s' % (repr(self), direction))
def input_parity(self):
for tile in self.tiles:
if tile.glue('N') == self:
return tile.parity
elif tile.glue('W') == self:
return tile.parity
elif tile.glue('S') == self:
return 1 - tile.parity
elif tile.glue('E') == self:
return 1 - tile.parity
raise ValueError('no tile has glue %s in any direction' % (repr(self)))
def get_end(self, direction, include_biotins=False):
if self.end is None: return None
self._check_direction(direction)
if is_canonical(direction):
end = self.end
else:
end = wc(self.end)
if include_biotins and self.end_constraint.biotin_direction == direction:
if not self.tiles:
raise ValueError('assign this glue to a tile before calling get_end with include_biotins=True')
parity = self.get_parity(direction)
biotin_pos = biotin_end_pos(direction, parity)
if end[biotin_pos] != 'T':
raise ValueError('to include internal biotin_direction on glue %s of tiles %s, base at position %d must be T, but instead is %s' %
(self, self.tiles[0], biotin_pos, end[biotin_pos]))
end = end[:biotin_pos] + '/iBiodT/' + end[biotin_pos+1:]
return end
def input_end(self):
"""If axis is "NS", return end of "N", if axis is "EW", return end of "W" """
if not self.assigned():
raise ValueError('glue %s is not assigned' % self.label)
if self.axis == 'NS':
return self.get_end('N')
else:
return self.get_end('W')
def output_end(self):
"""If axis is "NS", return end of "S", if axis is "EW", return end of "E" """
if not self.assigned():
raise ValueError('glue %s is not assigned' % self.label)
if self.axis == 'NS':
return self.get_end('S')
else:
return self.get_end('E')
def canonical_end(self):
"""If axis is "NS", return end of "S", if axis is "EW", return end of "W" """
if not self.assigned():
raise ValueError('glue %s is not assigned' % self.label)
return self.end
def bind(self, end):
"""Bind this glue to given end, so that it cannot be assigned anything else.
Unlike assign, bind is used to state that a glue MUST have the given end;
this is used to design new ends for glues that are part of a system
of pre-existing tiles, where some ends are already existing and cannot
be changed."""
if self.end_bound:
raise ValueError('glue %s is already bound to end %s; cannot re-bind to end %s' % (self.label, self.end, end))
self.end_bound = True
self._assign_no_check_for_bind(end)
def _assign_no_check_for_bind(self, end):
if self.tiles:
tile = self.tiles[0]
self_direction = None
for direction in directions:
if tile.glue(direction) == self:
self_direction = direction
break
proper_len = tile.len_end(self_direction)
if len(end) != proper_len:
raise ValueError('glue %s cannot be assigned to end %s; length %d is required but end is of length %d' % (self.label, end, proper_len, len(end)))
self.end = end
def assign(self, end):
if self.end_bound:
raise ValueError('glue %s is bound and cannot be assigned end %s' % (self.label,end))
self._assign_no_check_for_bind(end)
def unassign(self):
self.end = None
def bound(self):
return self.end_bound
def assigned(self):
return self.end is not None
def short_list(lst):
if len(lst) < 1000:
return str(lst)
else:
return '%s ... %s' % (str(lst[:10])[:-1], str(lst[-10:])[1:])
class EndConstraint:
"""This represents a constraint on a glue that affects which sticky ends it
may have (e.g., glue label does not constrain sticky end, but length does).
Currently it refers to:
1) lowDG and highDG, bounds on the energy of the end assigned to the glue
2) the length end assigned to the glue
3) (potential) direction of biotin_direction on the glue (which means that base must
be either a T or an A, depending on whether it is canonical)
4) the alphabet to use (for three-letter codes)"""
_end_constraints = dict()
def factory(lowDG, highDG, length, biotin_direction, endGC, endAT, orth_any, orth_any_ave, orth_algorithmic_conflict, orth_colocated, domain_indv_sec_struct, domain_pair_sec_struct, domain_pair_sec_struct_ave, tile_sec_struct, hamming, orth_algorithmic_conflict_generalized, lattice_binding_lower_threshold): #, seqs):
"""Create new end constraint if none already exists."""
args = tuple((lowDG, highDG, length, biotin_direction, endGC, endAT,
orth_any, orth_any_ave, orth_algorithmic_conflict, orth_colocated, domain_indv_sec_struct,
domain_pair_sec_struct, domain_pair_sec_struct_ave, tile_sec_struct, hamming, orth_algorithmic_conflict_generalized, lattice_binding_lower_threshold))
if args in EndConstraint._end_constraints:
end_constraint = EndConstraint._end_constraints[args]
else:
new_end_constraint = EndConstraint(*args)
EndConstraint._end_constraints[args] = new_end_constraint
end_constraint = new_end_constraint
return end_constraint
factory = staticmethod(factory)
def end_constraints():
"""Return list of all EndConstraints created in system."""
return list(EndConstraint._end_constraints.values())
end_constraints = staticmethod(end_constraints)
def __init__(self, lowDG, highDG, length, biotin_direction, endGC, endAT,
orth_any, orth_any_ave, orth_algorithmic_conflict, orth_colocated, domain_indv_sec_struct,
domain_pair_sec_struct, domain_pair_sec_struct_ave, tile_sec_struct,
hamming, orth_algorithmic_conflict_generalized, lattice_binding_lower_threshold): #, seqs):
self.lowDG = lowDG
self.highDG = highDG
self.lowPF = None
self.highPF = None
self.length = length
self.biotin_direction = biotin_direction
self.endGC = endGC
self.endAT = endAT
self.orth_any = orth_any
self.orth_any_ave = orth_any_ave
self.orth_algorithmic_conflict = orth_algorithmic_conflict
self.orth_colocated = orth_colocated
self.domain_indv_sec_struct = domain_indv_sec_struct
self.domain_pair_sec_struct = domain_pair_sec_struct
self.domain_pair_sec_struct_ave = domain_pair_sec_struct_ave
self.tile_sec_struct = tile_sec_struct
self.hamming = hamming
self.orth_algorithmic_conflict_generalized = orth_algorithmic_conflict_generalized
self.lattice_binding_lower_threshold = lattice_binding_lower_threshold
# self.seqs = seqs
self.glues = []
def __str__(self):
return 'EndConstraint(lowDG=%.1f,highDG=%.1f,len=%d, bio_dir=%s, endGC=%s, endAT=%s, orth_any=%.1f, orth_any_ave=%.1f,orth_algorithmic_conflict=%.1f, orth_colocated=%.1f, domain_indv_sec_struct=%.1f, domain_pair_sec_struct=%.1f, domain_pair_sec_struct_ave=%.1f, tile_sec_struct=%.1f, hamming=%d, orth_algorithmic_conflict_generalized=%.1f, lattice_binding_lower_threshold=%.1f)' \
% (self.lowDG,self.highDG,self.length, self.biotin_direction, self.endGC, self.endAT, self.orth_any, self.orth_any_ave,self.orth_algorithmic_conflict, self.orth_colocated, self.domain_indv_sec_struct, self.domain_pair_sec_struct, self.domain_pair_sec_struct_ave, self.tile_sec_struct, self.hamming, self.orth_algorithmic_conflict_generalized, self.lattice_binding_lower_threshold)
def __hash__(self):
if not hasattr(self,'_hash'):
self._hash = hash(str(self))
return self._hash
def __eq__(self, other):
return self is other
# return hash(self) == hash(other)
def __cmp__(self, other):
return cmp(hash(self),hash(other))
class Tile:
def __init__(self, name, s, w, n, e, parity, sec_struct, orth, orth_share):
"""Create tile with given name, glues, and "SST parity".
name = name of tile
s, w, n, e = glues
parity = 0 if inner sticky ends are length 10,
1 if inner sticky ends are length 11 """
self.name = name
self.glues = {'N': n, 'S': s, 'E': e, 'W': w}
self.glue_set = set([n,s,e,w])
for glue in [n,s,e,w]:
glue.add_tile(self)
self.parity = parity
self.sec_struct = sec_struct
self.orth = orth
self.orth_share = orth_share
for (direction,glue) in list(self.glues.items()):
if glue.bound():
seq = glue.get_end(direction)
seq_len = len(seq)
if not seq_len == len_end(direction, parity):
raise ValueError('sequence %s of glue %s is wrong length for direction %s on tile %s with SST parity %d' % (seq, repr(glue), direction, self.name, parity))
def __hash__(self):
return hash(self.name)
def __eq__(self, other):
return self.name == other.name
def __cmp__(self, other):
# return cmp(self.name, other.name) # not legal in Python 3; the next is actually the officially recommended replacement
return (self.name > other.name) - (other.name < self.name)
def __str__(self):
return self.name
def __repr__(self):
return 'Tile(name=%s,S="%s",W="%s",N="%s",E="%s",parity=%d)' % (self.name,
self.glues['S'],
self.glues['W'],
self.glues['N'],
self.glues['E'],
self.parity)
def idt_sequence(self):
"""DNA sequence, together with possible biotin_direction labeling."""
seq = self.sequence_spaced(include_biotins=True)
#if self.biotin_direction: seq = '/5Biosg/ ' + seq
return seq
def idt(self):
"""IDT order string for this tile."""
return '%s;%s;%s' % (self.name, self.idt_sequence(), 'sst')
def glue(self, direction):
if direction not in directions:
raise ValueError('%s is not a direction' % direction)
return self.glues[direction]
def all_glues(self):
"""Return all glues on this tile."""
return list(self.glues.values())
def unbound_glues(self):
"""Return all unbound glues on this tile."""
uncons_glues = []
for glue in self.glues.values():
if not glue.bound():
uncons_glues.append(glue)
return uncons_glues
def len_end(self, direction):
return len_end(direction, self.parity)
def directions_with_length(self, length):
"""Return directions of glues with given length"""
directions_local = []
for direction in directions_local:
if self.len_end(direction) == length: directions_local.append(direction)
return directions_local
def _ends_in_order(self, include_biotins=False):
"""List of DNA sequence of sticky ends in order S, W, N, E."""
ends = []
for direction in ['S','W','N','E']:
glue = self.glue(direction)
end = glue.get_end(direction, include_biotins=include_biotins)
if not end: end = '?' * self.len_end(direction)
ends.append(end)
return ends
def sequence(self, include_biotins=False):
"""DNA sequence of sticky ends in order S, W, N, E."""
return ''.join(self._ends_in_order(include_biotins))
def sequence_spaced(self, include_biotins=False):
"""DNA sequence of sticky ends in order S, W, N, E, with spaces between ends."""
return ' '.join(self._ends_in_order(include_biotins))
def len_end(direction, parity):
if parity == 1:
if direction in ['S','E']: return _length10
elif direction in ['N','W']: return _length11
else: raise ValueError('%s is not a direction' % direction)
elif parity == 0:
if direction in ['S','E']: return _length11
elif direction in ['N','W']: return _length10
else: raise ValueError('%s is not a direction' % direction)
def biotin_tile_pos(direction):
"""Which position should the internal dT with biotin_direction go, if the end lies in
the given direction. (Should make the major groove, where the internal
biotin_direction resides, point towards the "inside" of the SST tube).
The indexing is with respect to the entire tile, not with respect to the
start of the glue end (hence the positions range between 0 and 41, not
between 0 and 9 or 0 and 10)."""
if _length10 == 10 and _length11 == 11:
if direction == 'S': return 5
elif direction == 'W': return 15
elif direction == 'N': return 27
elif direction == 'E': return 37
else: raise ValueError('%s is not a valid direction' % direction)
else: # this is for debugging purposes when I want to test fewer sticky ends than 4^10 or 4^11
if direction == 'S': return 2
elif direction == 'W': return 8
elif direction == 'N': return 13
elif direction == 'E': return 18
else: raise ValueError('%s is not a valid direction' % direction)
def biotin_end_pos(direction, parity):
"""Like biotin_tile_pos, but relative to the glue end, not the
whole tile, so indexed from 0 to 9 or 10. Depends on "SST parity" of tile
since that determines the length of the glue end.
Note that if the end is not canonical, to have a T at a
position pos, needs its canonical partner to have an A at position
end_length - pos.
parity = 0 if inner sticky ends ('W','N') are length 10,
1 if inner sticky ends are length 11"""
tile_pos = biotin_tile_pos(direction)
if direction == 'S': return tile_pos
elif direction == 'W': return tile_pos - (len_end('S', parity))
elif direction == 'N': return tile_pos - (len_end('S', parity) + len_end('W', parity))
elif direction == 'E': return tile_pos - (len_end('S', parity) + len_end('W', parity) + len_end('N', parity))
def random_bad_unconstrained_tile(bad_tiles):
bad_unconstrained_tiles = []
for tile in bad_tiles:
if tile.unbound_glues():
bad_unconstrained_tiles.append(tile)
return random.choice(bad_unconstrained_tiles)
def split(l, num):
"""Yield successive chunks from l, each of same size, to make num total.
http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python"""
assert 0 < num <= len(l)
n = len(l) // num
if len(l) % num != 0:
n += 1
for i in range(0, len(l), n):
yield l[i:i+n]
LOG_BAD_GLUES = False
def log_bad_glues(reason):
if LOG_BAD_GLUES:
print(reason + ';', end=' ')
class TileSet:
def __init__(self, tiles, detect_nondeterminism, mutually_exclusive_tilename_pairs):
for tile in tiles:
tile.tileset = self
self.tiles = tiles
tile_name_list = [tile.name for tile in tiles]
tile_name_set = set(tile_name_list)
if len(tile_name_set) != len(tile_name_list):
raise ValueError('duplicate tile name')
self.name2tile = { tile.name: tile for tile in tiles }
self.tile_pairs_to_check = [(t1,t2) for (t1,t2) in itertools.combinations_with_replacement(tiles,2)
if (t1.name,t2.name) not in mutually_exclusive_tilename_pairs]
self.tile_pair_names_to_check = [ (t1.name,t2.name) for (t1,t2) in self.tile_pairs_to_check ]
self.find_conflicting_glues(detect_nondeterminism)
self.check_parity_against_glues()
def end_constraints_unbound(self):
if not hasattr(self,'_end_constraints_unbound'):
set_end_constraints_unbound = set()
for glue in self.glues_unbound():
set_end_constraints_unbound.add(glue.end_constraint)
self._end_constraints_unbound = list(set_end_constraints_unbound)
return self._end_constraints_unbound
def check_parity_against_glues(self):
"""Check that the tiles are consistent in that glues only join tiles
to tiles of the opposite parity."""
for tile in self.tiles:
for direc in directions:
glue = tile.glue(direc)
opp = opposite(direc)
for other_tile in glue.tiles:
if other_tile.glue(opp) == glue and tile.parity == other_tile.parity:
raise ValueError('glue %s joins tiles %s and %s but they have the same parity' % (glue,tile,other_tile))
def find_conflicting_glues(self, detect_nondeterminism):
"""Search through all tiles, assuming north and west are "input" sticky
ends, and assign to each such glue a list of its conflicting glues.
These are glues it could end up next to because
1) the other input glue is shared with another tile,
2) two output glues of a given tile,
3) two output glues of diagonally adjacent tiles (e.g., w* and y* above)."""
# reason 1
for (tile1,tile2) in self.tile_pairs_to_check:
if tile1 == tile2:
continue
ng1 = tile1.glue('N')
wg1 = tile1.glue('W')
ng2 = tile2.glue('N')
wg2 = tile2.glue('W')
if detect_nondeterminism and ng1 == ng2 and wg1 == wg2:
raise ValueError('nondeterminism detected: tiles %s and %s share north glue %s and west glue %s in common' % (tile1.name,tile2.name,ng1.label,ng2.label))
# In the following case we have a nondeterministic tileset, that is:
# tile1 != tile2, and tile1 & tile2
# share both N and W glues (i.e. tile1 & tile2 share both input sides)
if ng1 == ng2 and wg1 == wg2:
continue
if ng1 == ng2:
wg1.conflicting_glues_first_order.add(wg2)
wg2.conflicting_glues_first_order.add(wg1)
if wg1 == wg2:
ng1.conflicting_glues_first_order.add(ng2)
ng2.conflicting_glues_first_order.add(ng1)
if len(tile1.name.split(';')) == 3 and len(tile2.name.split(';')) == 3:
# If tile names are of the form ___;___;___ then the code assumes they are using
# our proofreading block naming convention and carries out an orthogonality test
# for generalised algorithmic (tile attachment) errors. Generalised algorithmic (tile attachment)
# errors are errors where we assume one or more errors have already occurred elsewhere
# in a proof-reading block. Generalised algorithmic (tile attachment) errors are also described in Suppl. Info. A.
gate_position1,_,proof_internal_pos1 = tile1.name.split(';')
gate_position2,_,proof_internal_pos2 = tile2.name.split(';')
if gate_position1 == gate_position2 and proof_internal_pos1 == proof_internal_pos2:
if wg1 != wg2:
wg1.conflicting_glues_generalized.add(wg2)
wg2.conflicting_glues_generalized.add(wg1)
if ng1 != ng2:
ng1.conflicting_glues_generalized.add(ng2)
ng2.conflicting_glues_generalized.add(ng1)
def check4G(self):
# find glues whose tiles have 4 G's
bad_glues = []
for tile in self.tiles:
if tile.unbound_glues(): # if all bound, then nothing we can do
seq = tile.sequence()
if 'GGGG' in seq:
log_bad_glues('4G')
for dir1, dir2 in [('S', 'W'), ('W', 'N'), ('N', 'E')]:
g1 = tile.glue(dir1)
g2 = tile.glue(dir2)
end1 = g1.get_end(dir1)
end2 = g2.get_end(dir2)
if 'GGGG' in end1 + end2:
if not g1.bound():
bad_glues.append(g1)
if not g2.bound():
bad_glues.append(g2)
if g1.bound() and g2.bound():
print('\n' + '*' * 79 + 'WARNING: glues %s and %s on tile %s are bound to sequences %s and %s, which produces a G quadruplex\n' % (g1, g2, tile, end1, end2))
return bad_glues
def check_every_glue(self, temperature):
bad_glues = []
# check all unbound glues for too much interaction with themselves
# (both its end and the WC complement)
for glue_unbound in self.glues_unbound():
ec = glue_unbound.end_constraint
end_unbound = glue_unbound.input_end()
end_unbound_wc = glue_unbound.output_end()
energy = sd.binding(end_unbound, end_unbound, temperature, NEGATE)
if energy > ec.orth_any:
log_bad_glues('oa%.1f' % energy)
bad_glues.append(glue_unbound)
energy = sd.binding(end_unbound_wc, end_unbound_wc, temperature, NEGATE)
if energy > ec.orth_any:
log_bad_glues('oa%.1f' % energy)
bad_glues.append(glue_unbound) # check this unbound glue against all glues (potentially bound)
# for too much interaction; if found, add unbound glue (also add glue
# if it is not bound)
for glue in self.glues():
end = glue.input_end()
end_wc = glue.output_end()
problem = False
if end != end_unbound_wc:
energy = sd.binding(end, end_unbound, temperature, NEGATE)
if energy > ec.orth_any:
problem = True
log_bad_glues('oa%.1f' % energy)
energy = sd.binding(end_wc, end_unbound_wc, temperature, NEGATE)
if energy > ec.orth_any:
problem = True
log_bad_glues('oa%.1f' % energy)
if end != end_unbound:
energy = sd.binding(end, end_unbound_wc, temperature, NEGATE)
if energy > ec.orth_any:
problem = True
log_bad_glues('oa%.1f' % energy)
energy = sd.binding(end_wc, end_unbound, temperature, NEGATE)
if energy > ec.orth_any:
problem = True
log_bad_glues('oa%.1f' % energy)
if problem:
bad_glues.append(glue_unbound)
if not glue.bound():
bad_glues.append(glue)
return bad_glues
def check_tile_sec_struct(self, temperature, num_bad_now, num_bad_opt, threaded, weight_excess):
"""Find glues whose tiles have too much secondary structure."""
if QUIT_EARLY_OPTIMIZATION and num_bad_opt > 0:
bad_glues = []
if threaded:
group_size = global_thread_pool._processes
for tile_group in grouper(self.tiles, group_size):
tile_group = [tile for tile in tile_group if tile] # strip out None's that fill in to make size of last group equal to group_size
results = [global_thread_pool.apply_async(sd.hairpin, args=(tile.sequence(), temperature, NEGATE)) for tile in tile_group]
energies = [result.get() for result in results]
for tile,energy in zip(tile_group, energies):
if tile.unbound_glues() and tile.sec_struct > 0: # if all bound, then nothing we can do
if energy > tile.sec_struct:
# add proportional to excess of energy
num_to_add = int(math.ceil(weight_excess*(energy - tile.sec_struct)))
bad_glues.extend(tile.unbound_glues() * num_to_add)
if len(bad_glues)*TILE_SEC_STRUCT_WEIGHT_EXTEND + num_bad_now > num_bad_opt:
return bad_glues
else:
for tile in self.tiles:
energy = sd.hairpin(tile.sequence(), temperature, NEGATE) if tile.sec_struct > 0 else 0
if tile.unbound_glues() and tile.sec_struct > 0: # if all bound, then nothing we can do
if energy > tile.sec_struct:
# add proportional to excess of energy
num_to_add = int(math.ceil(weight_excess*(energy - tile.sec_struct)))
bad_glues.extend(tile.unbound_glues() * num_to_add)
if len(bad_glues)*TILE_SEC_STRUCT_WEIGHT_EXTEND + num_bad_now > num_bad_opt:
return bad_glues
return bad_glues
else:
if threaded:
def ss_if_positive(tile, temperature):
if tile.sec_struct < 0:
return (tile,0)
else:
return (tile,sd.hairpin(tile.sequence(), temperature, NEGATE))
results = [global_thread_pool.apply_async(ss_if_positive, args=(tile, temperature)) for tile in self.tiles]
tile_energies = [result.get() for result in results]
else:
tile_energies = []
# tile_num = 1
for tile in self.tiles:
# print 'checking tile# {}'.format(tile_num)
# tile_num += 1
energy = sd.hairpin(tile.sequence(), temperature, NEGATE) if tile.sec_struct > 0 else 0
tile_energies.append((tile,energy))
bad_glues = []
for tile,energy in tile_energies:
if tile.unbound_glues() and tile.sec_struct > 0: # if all bound, then nothing we can do
if energy > tile.sec_struct:
# add proportional to excess of energy
num_to_add = int(math.ceil(weight_excess*(energy - tile.sec_struct)))
bad_glues.extend(tile.unbound_glues() * num_to_add)
return bad_glues
def check_input_pairs(self, temperature, num_bad_now, num_bad_opt, threaded):
"""Check pairs of ends that each tile's input ends bind to (which must
be pulled apart for the tile to bind)."""
if QUIT_EARLY_OPTIMIZATION and num_bad_opt > 0:
bad_glues = []
for tile in self.tiles:
ng = tile.glue('N')
wg = tile.glue('W')
if ng.bound() and wg.bound():
continue
end1 = ng.output_end()
end2 = wg.output_end()
orth_colocated = ng.end_constraint.orth_colocated
if orth_colocated < 0:
continue
energy = eval_colocated_end_pair(end1, end2, temperature)
if energy > orth_colocated:
if not ng.bound(): bad_glues.append(ng)
if not wg.bound(): bad_glues.append(wg)
if len(bad_glues)*INPUT_GLUE_PAIRS_EXTEND + num_bad_now > num_bad_opt:
return bad_glues
return bad_glues
else:
glue_end_pairs = []
for tile in self.tiles:
ng = tile.glue('N')
wg = tile.glue('W')
if ng.bound() and wg.bound():
continue
nend = ng.output_end()
wend = wg.output_end()
orth_colocated = ng.end_constraint.orth_colocated
if orth_colocated < 0:
continue
glue_end_pairs.append((ng,wg,nend,wend,orth_colocated))
end_pairs = [(nend,wend) for (_,_,nend,wend,orth_colocated) in glue_end_pairs]
energies = eval_colocated_end_pairs(end_pairs, temperature, threaded)
bad_glues = []
for ((ng, wg, nend, wend, orth_colocated), energy) in zip(glue_end_pairs, energies):
if energy > orth_colocated:
if not ng.bound(): bad_glues.append(ng)
if not wg.bound(): bad_glues.append(wg)
return bad_glues
def check_output_pairs(self, temperature, num_bad_now, num_bad_opt, threaded):
"""Check pairs of ends that are output ends of tiles (which must
be pulled apart another tile to bind using either of them)."""
if QUIT_EARLY_OPTIMIZATION and num_bad_opt > 0:
bad_glues = []
for tile in self.tiles:
sg = tile.glue('S')
eg = tile.glue('E')
if sg.bound() and eg.bound():
continue
end1 = sg.output_end()
end2 = eg.output_end()
orth_colocated = sg.end_constraint.orth_colocated
if orth_colocated < 0:
continue
energy = eval_colocated_end_pair(end1, end2, temperature)
if energy > orth_colocated:
if not sg.bound(): bad_glues.append(sg)
if not eg.bound(): bad_glues.append(eg)
if len(bad_glues)*OUTPUT_GLUE_PAIRS_EXTEND + num_bad_now > num_bad_opt:
return bad_glues
return bad_glues
else:
glue_end_pairs = []
for tile in self.tiles:
sg = tile.glue('S')
eg = tile.glue('E')
if sg.bound() and eg.bound():
continue
out_end1 = sg.output_end()
out_end2 = eg.output_end()
orth_colocated = sg.end_constraint.orth_colocated
if orth_colocated < 0:
continue
glue_end_pairs.append((sg,eg,out_end1,out_end2,orth_colocated))
end_pairs = [(end1,end2) for (_,_,end1,end2,orth_colocated) in glue_end_pairs]
energies = eval_colocated_end_pairs(end_pairs, temperature, threaded)
bad_glues = []
for ((sg, eg, end1, end2, orth_colocated), energy) in zip(glue_end_pairs, energies):
if energy > orth_colocated:
if not sg.bound(): bad_glues.append(sg)
if not eg.bound(): bad_glues.append(eg)
return bad_glues
def check_lattice_binding(self, temperature, num_bad_now, num_bad_opt, threaded):
"""Check the strength of binding of each tile, by its two input ends,
to the matching pair lattice ends."""
if QUIT_EARLY_OPTIMIZATION and num_bad_opt > 0:
bad_glues = []
for tile in self.tiles:
wg = tile.glue('W')
ng = tile.glue('N')
if wg.bound() and ng.bound():
continue
end1 = wg.input_end() # was "output_end"
end2 = ng.input_end()
lower_threshold_binding_energy = ng.end_constraint.lattice_binding_lower_threshold # assume all glues get same lattice_binding_lower_threshold
if lower_threshold_binding_energy < 0:
continue
energy = eval_lattice_binding_energy(end1, end2, temperature)
if energy < lower_threshold_binding_energy:
if not wg.bound(): bad_glues.append(wg)
if not ng.bound(): bad_glues.append(ng)
if len(bad_glues)*LATTICE_BINDING_EXTEND + num_bad_now > num_bad_opt:
return bad_glues
return bad_glues
else:
glue_end_pairs = []
for tile in self.tiles:
wg = tile.glue('W')
ng = tile.glue('N')
if wg.bound() and ng.bound():
continue
end1 = wg.input_end()
end2 = ng.input_end()
lower_threshold_binding_energy = ng.end_constraint.lattice_binding_lower_threshold # assume all glues get same lattice_binding_lower_threshold
if lower_threshold_binding_energy < 0:
continue
glue_end_pairs.append((wg,ng,end1,end2,lower_threshold_binding_energy))
end_pairs = [(end1,end2) for (_,_,end1,end2,lower_threshold_binding_energy) in glue_end_pairs]
energies = eval_lattice_binding_energies(end_pairs, temperature, threaded)
bad_glues = []
for ((wg, ng, end1, end2, lower_threshold_binding_energy), energy) in zip(glue_end_pairs, energies):
if energy < lower_threshold_binding_energy:
if not wg.bound(): bad_glues.append(wg)
if not ng.bound(): bad_glues.append(ng)
return bad_glues
def check_algorithmic_conflicting_glues(self, temperature, num_bad_now, num_bad_opt, threaded, weight_excess):
""" Find glues that have too much interaction with an algorithmic conflicting glue
(could end up near each other during a strength-1 binding event)."""
bad_glues_first_order = []
bad_glues_generalized = []
if QUIT_EARLY_OPTIMIZATION and num_bad_opt > 0:
# bad_glues = []
evals = 1
for glue in self.glues():
end = glue.input_end()
for conflicting_glue in glue.conflicting_glues_first_order:
conflicting_end = conflicting_glue.output_end()
orth_algorithmic_conflict_first_order = conflicting_glue.end_constraint.orth_algorithmic_conflict
if orth_algorithmic_conflict_first_order < 0:
continue
# the proper order in which to concatenate the ends depends
# on the axis, observe:
# /---------#--------->
# |
# |
# |
# \----W----#----------
# /---------#----E----> this should be W (in) then E (out)
# |
# |
# |
# \---------#----S----- this should be S (out) then N (in)
# /----N----#--------->
# |
# |
# |
# \---------#----------
if glue.axis == 'EW': end1,end2 = end, conflicting_end
else: end1,end2 = conflicting_end, end
energy = eval_colocated_end_pair(end1, end2, temperature)
evals += 1
if energy > orth_algorithmic_conflict_first_order:
num_to_add = int(math.ceil(weight_excess*(energy - orth_algorithmic_conflict_first_order)))
if not glue.bound():
bad_glues_first_order.extend([glue] * num_to_add)
if not conflicting_glue.bound():
bad_glues_first_order.extend([conflicting_glue] * num_to_add)
# print 'testing for early quit; len(bad_glues) = {}; num_bad_now = {}; len(bad_glues)+num_bad_now = {}; num_bad_opt = {}'.format(len(bad_glues), num_bad_now, len(bad_glues)+num_bad_now, num_bad_opt)
if len(bad_glues_first_order)*ALGO_CONF_GLUES_WEIGHT_EXTEND + num_bad_now > num_bad_opt:
# print '\nnum acg evaluations: {}'.format(evals)
return (bad_glues_first_order, bad_glues_generalized) # bad_glues_first_order
for conflicting_glue in glue.conflicting_glues_generalized:
conflicting_end = conflicting_glue.output_end()
orth_algorithmic_conflict_generalized = conflicting_glue.end_constraint.orth_algorithmic_conflict_generalized
if orth_algorithmic_conflict_generalized < 0:
continue
if glue.axis == 'EW': end1,end2 = end, conflicting_end
else: end1,end2 = conflicting_end, end
energy = eval_colocated_end_pair(end1, end2, temperature)
evals += 1
if energy > orth_algorithmic_conflict_generalized:
num_to_add = int(math.ceil(weight_excess*(energy - orth_algorithmic_conflict_generalized)))
if not glue.bound():
bad_glues_generalized.extend([glue] * num_to_add)
if not conflicting_glue.bound():
bad_glues_generalized.extend([conflicting_glue] * num_to_add)
# print 'testing for early quit; len(bad_glues) = {}; num_bad_now = {}; len(bad_glues)+num_bad_now = {}; num_bad_opt = {}'.format(len(bad_glues), num_bad_now, len(bad_glues)+num_bad_now, num_bad_opt)
if len(bad_glues_generalized)*ALGO_CONF_GLUES_WEIGHT_EXTEND + num_bad_now > num_bad_opt:
# print '\nnum acg evaluations: {}'.format(evals)
return (bad_glues_first_order, bad_glues_generalized) # bad_glues_generalized
# print '\nnum acg evaluations: {}'.format(evals)
else:
glue_end_pairs_first_order = []
glue_end_pairs_generalized = []
for glue in self.glues():
end = glue.input_end()