forked from primesearch/mfakto
Changelog-mfakto.txt
version 0.16
- merge features from mfaktc 0.23
- JSON results output
- logging
- RDNA3 support
- fix barrett15 double precision issues
version 0.15 (...)
- more GPU models are now detected
- some support for Intel HD Graphics
- mfakto now compiles and runs on macOS
- updated Windows build instructions and fixed some errors
- fixed some Linux compilation issues
- removed dependency on AMD APP SDK as it has been discontinued
- menu / key-press evaluation to dynamically adjust settings like
SievePrimes, SieveSize, SieveProcessSize (and others)
- bug fix: SieveCPUMask on Linux had "random" results
- bug fix: kernel compilation did not work correctly if a device number
greater than 1 was specified via the -d option
- bug fix: verbosity level was not passed to valid_assignment()
- bug fix: the '-d g' option was hard-coded to use an invalid device number
- bug fix: the DETAILED_INFO and CHECKS_MODBASECASE debug options no longer
cause compilation to fail
- bug fix: the line "Overflow!" was printed ad infinitum when the DETAILED_INFO
debug option was enabled
- bug fix: the number of threads per grid could be set to zero when less than
the vector size * maximum threads per block
version 0.14 (2014-04-17)
- --perftest enhancements including GPU sieve evaluation (for optimizing GPUSievePrimes etc.)
- successfully resync when the working directory was temporarily lost (ejected USB device or interrupted network drive)
- save and reload compiled OpenCL kernels => reduce startup time (UseBinfile config variable)
- MoreClasses config variable to allow for a "fewer classes" version for very short assignments (GPU sieve only)
- bug fix: enforce GPUSieveSize being a multiple of GPUSieveProcessSize
- FlushInterval config variable to fine tune the number of kernels in the GPU queue => address high CPU load issue of newer AMD drivers
- MinGW build (thanks to kracker)
- slight performance improvement for the Montgomery kernels
- improved English wording in program output, ini file etc. (thanks to kracker)
- recognition of new GPUs (8xxx, R9, new APUs) (thanks to kracker)
- added a warning when using VectorSize=1
- fix for a small memory leak (~0.5kB per assignment)
- compatible with Windows 8.1
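
The GPUSieveSize / GPUSieveProcessSize constraint from the list above can be illustrated with a small helper. This is only a sketch, not mfakto's actual code: the function name is hypothetical, both values are assumed to be expressed in the same unit (bits), and rounding down is just one possible way to enforce the rule (rejecting the configuration would be another).

```c
#include <stdint.h>

/* Illustrative only: both arguments are assumed to be in the same unit (bits).
 * Returns the largest multiple of process_size that does not exceed
 * sieve_size, or process_size itself if sieve_size is smaller than that. */
uint64_t align_sieve_size(uint64_t sieve_size, uint64_t process_size)
{
    if (process_size == 0)
        return sieve_size;        /* nothing to align against */
    uint64_t aligned = sieve_size - (sieve_size % process_size);
    return aligned == 0 ? process_size : aligned;
}
```

With this sketch, a sieve size of 100 and a process size of 24 would be adjusted to 96.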
version 0.13 (2013-05-19)
- most important: GPU sieving (thanks to George Woltman)
o set SieveOnGPU=1 to enable
o use GPUSievePrimes, GPUSieveSize, GPUSieveProcessSize to tweak it
(see mfakto.ini for details)
o on lower-end GPUs GPU sieving works as well, but may result in ~20% less
throughput than CPU sieving: try it out.
o almost no CPU utilization, but Catalyst 13.4 and 13.5 have a bug so mfakto
will consume up to one CPU core. Stay below 13.4 for now.
- performance improvements for all kernels between 2 and 20% (also visible when CPU sieving)
o use of more intrinsics like amd_bitalign and mad_hi
o direct use of comparison results
o skipping a few calculations due to better judgement of required precision
o new kernels (thanks to George Woltman and Oliver Weihe for the ideas and CUDA examples)
o big set of kernels based on 15-bit-math (speedup for Cayman and GCN)
- closer alignment with mfaktc
o -v switch (also available as a Verbosity config variable)
o output adjustments
o internal file and function structure
- better diagnostics through improved tracing and error number decoding
- prepare for running on Intel HD4000 or NVIDIA devices (not fully functional yet)
- --perftest mode to test CPU sieving performance (to be extended for kernel performance tests)
- allow setting CPU affinity for the CPU sieve thread (SieveCPUMask config variable)
- 30k extra test cases now available in the -st2 selftest
version 0.12 (2012-07-29)
- IMPORTANT: bugfix for missing factors for certain exponent ranges.
Thanks to dabaichi and Axelsson for bug report and help on
http://mersenneforum.org/showthread.php?t=13977&page=16#383
- handling of <worktodo>.add (worktodo file name as configured, with suffix
replaced): will be appended to <worktodo> between 5 and ~10 minutes after
the add file's creation, or when moving to the next assignment
- new mfakto.ini key SieveCPUMask to set the siever thread's CPU affinity
- auto-detection of the GPU-type, along with new GPUType key in mfakto.ini,
removed PreferKernel key
- optimization for GCN (Graphics Core Next: HD77xx, HD78xx, HD79xx)
- improved estimation of the number of compute units
- bugfix: occasional abort on very high SievePrimes (> 450,000)
- progress line output format: percent complete now %5.1f (was %6.2f),
exponent now %-10u (was %d)
- bugfix: if a kernel file cannot be found, report its name instead of
KERNEL_FILE (thanks to Axelsson)
- approx. 30,000 new test cases (factors) for pre-release testing
(thanks to James Heinrich)
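
The <worktodo>.add naming rule described above (configured file name with its suffix replaced) could look roughly like the following. The helper name is hypothetical, and appending ".add" when the configured name has no suffix is an assumption, since the changelog does not cover that case.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative helper (hypothetical name): derive the ".add" file name from
 * the configured worktodo file name by replacing its suffix.
 * Assumption: if there is no suffix, ".add" is simply appended.
 * Returns 0 on success, -1 if the result does not fit into out. */
int make_add_file_name(const char *worktodo, char *out, size_t out_size)
{
    const char *dot = strrchr(worktodo, '.');
    const char *slash = strrchr(worktodo, '/');
    /* Ignore dots that belong to a directory name rather than the file. */
    if (dot != NULL && slash != NULL && dot < slash)
        dot = NULL;
    size_t stem_len = dot ? (size_t)(dot - worktodo) : strlen(worktodo);
    int n = snprintf(out, out_size, "%.*s.add", (int)stem_len, worktodo);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

For example, "worktodo.txt" would map to "worktodo.add".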
version 0.11 (2012-05-21)
- new 24-bit barrett kernel for FCs up to 2^70 - very fast!
- new 15-bit barrett kernel for FCs up to 2^73 - almost as fast,
especially on Cayman this one has a speedup of 50%
- new SievePrimesMin ini-file variable to replace the previously fixed
value of 5000
- new V5UserID and ComputerID ini-file variables that let you configure
these IDs for the results file output (so far only useful for mersenne.ca)
- new TimeStampInResults ini-file variable that lets you prefix each
output line in the results file with a time stamp
- new ProgressHeader and PrintFormat ini-file variables to adapt the
information that is printed after each class is finished. See the included
mfakto.ini file for details.
- On Linux: siever code is now compiled with gcc 4.6: ~10% faster sieve
- file locking: worktodo and results files accesses are now synchronized
using a lock file (.lck appended to the file name).
- evaluation of GHz-days of assignments, and current speed as GHz-days/day
- Ctrl-C handler is now active during the selftest and prints a summary of
the tests completed so far
- new --perftest option to test the siever performance depending on SievePrimes
and SieveSizeLimit (if that is not fixed at compile time)
- using a fixed power of 2 for the number of GPU threads (still set via
GridSize)
- removed many compiler-warnings
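
The lock-file synchronization mentioned above (.lck appended to the file name) can be sketched for POSIX systems as follows. This is not mfakto's actual implementation: the function names are hypothetical, and using exclusive file creation is an assumption about how such a lock is commonly built.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Illustrative sketch for POSIX systems, not mfakto's actual code.
 * Tries to take the lock for file_name by creating "<file_name>.lck".
 * Returns a file descriptor >= 0 on success, or -1 if the lock is held
 * (or the lock file cannot be created). */
int acquire_lock(const char *file_name)
{
    char lock_name[512];
    int n = snprintf(lock_name, sizeof lock_name, "%s.lck", file_name);
    if (n < 0 || (size_t)n >= sizeof lock_name)
        return -1;
    /* O_CREAT | O_EXCL makes creation fail if the lock file already exists. */
    return open(lock_name, O_WRONLY | O_CREAT | O_EXCL, 0644);
}

/* Releases the lock taken by acquire_lock(). */
void release_lock(const char *file_name, int fd)
{
    char lock_name[512];
    int n = snprintf(lock_name, sizeof lock_name, "%s.lck", file_name);
    if (fd >= 0)
        close(fd);
    if (n > 0 && (size_t)n < sizeof lock_name)
        unlink(lock_name);        /* removing the file frees the lock */
}
```

A caller would typically retry acquire_lock() a few times with a short sleep before giving up, then read or write the worktodo or results file, and finally call release_lock().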
version 0.10 (2011-12-19)
- added workaround for compatibility with Catalyst 11.10 and above
- MODBASECASE for barrett kernels enabled
- Checkpoints now keep a backup (.bu) that is automatically read if the
.cpt file is corrupt
- mfakto now allows reading checkpoints from any version of mfaktc and mfakto
as long as the other parameters and the checksum are OK
- diagnostic messages are now printed when checkpoint files cannot be used
- added some optimization options to the Linux Makefile
- split mul24 kernel in two bitranges 0-64 and 61-72 allowing for some
optimizations: +10-20% for 70-bit assignments using the mul24 kernel
- merged mfaktc 0.18 features:
+ inifile parameter CheckpointDelay (see below)
+ extended selftest (-st2)
+ minor bugfix reporting a bitlevel as incompletely tested if a factor was
found in the last class
+ Changes to the factor found result line as discussed in the mersenne forum
+ Linux: the signal handler now also catches SIGTERM
- Writing checkpoints can now be limited:
+ set CheckpointDelay <s> to write a checkpoint only if at least s seconds
have passed since the last checkpoint
+ set Checkpoint <n>, n>1 to write a checkpoint only after n classes have
been tested (set to n=1 to enable CheckpointDelay)
- added commandline parameter -i|--inifile <file>
to load <file> as inifile (default: mfakto.ini), allowing multiple instances
of mfakto in the same directory
- added ResultsFile parameter to inifile
- added program support for GPU-sieving (siever kernel not yet functional!)
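
The interplay of the Checkpoint and CheckpointDelay settings described above can be summarized in a small decision function. This is a sketch with hypothetical names, not the program's actual logic. In particular, treating a Checkpoint value of 0 as "checkpoints disabled" is an assumption.

```c
#include <stdbool.h>

/* Illustrative decision helper (hypothetical name and parameters).
 * checkpoint_n      : value of the Checkpoint setting
 * checkpoint_delay  : value of CheckpointDelay in seconds
 * classes_since     : classes tested since the last checkpoint
 * seconds_since     : seconds elapsed since the last checkpoint */
bool should_write_checkpoint(int checkpoint_n, int checkpoint_delay,
                             int classes_since, int seconds_since)
{
    if (checkpoint_n <= 0)
        return false;                 /* assumption: 0 disables checkpoints */
    if (checkpoint_n == 1)
        return seconds_since >= checkpoint_delay;   /* time-based limit */
    return classes_since >= checkpoint_n;           /* class-count limit */
}
```

With Checkpoint=1 and CheckpointDelay=30, a checkpoint would be written only once 30 seconds have passed. With Checkpoint=5, the delay is ignored and every fifth class triggers a write.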
version 0.9 (2011-10-01)
- fixed bug in 72-bit kernel that might miss small factors (<48 bit)
unfortunately, the fix slows this kernel by ~3-5%
- added test for the above bug to the selftest
- better calculation for SievePrimesAdjust
version 0.8 (2011-09-13)
- added --help and parameter checking
- exclude single-vectored 72-bit-mul24-kernel (not working with AMD APP SDK 2.5)
- removed THREADS_PER_BLOCK (no longer needed, will be selected automatically
based on the GPU capabilities)
- removed slow 95-bit kernel (not usable anyway)
- added a config setting to mfakto.ini:
PreferKernel=mfakto_cl_barrett79|mfakto_cl_71, as HD5xxx and HD6xxx
reach their top speed with different kernels
- tuned the settings for SievePrimesAdjust - should now be usable
version 0.7 (2011-08-10)
- vectorized barrett kernels
- re-enabled the MODBASECASE checks
- removed crash-workaround from barrett kernels
(no longer needed since Catalyst 11.7): +3%
- fixed index-evaluation in barrett kernels
- fix for bitshifts of >31 bits in barrett kernels
- fixed a few compiler warnings
- improved error handling for bad config
- added "-d c" switch to force running on CPU
- added warning when no atomics are available
- added option for debuggable kernel code
- fixed/optimized limits for preprocessing on CPU
version 0.6 (2011-07-09)
- ported barrett kernels for 79 and 92 bit factor sizes
- added support for GPUs without atomics (HD4xxx)
- dropped 2- and 16-wide vectors
- optimization in the 24-bit kernels: +3%
- added a few testcases to the short selftest
- SievePrimes auto-adjust now working (still not optimal)
- slightly faster sieve-initialization per class
version 0.5 (2011-06-19)
- fixed 72-bit subtraction (in various places) as per TheJudger's comment
on www.mersenneforum.org - Thanks for the hint!
- fixed boundaries for 95-bit kernel (-st failures)
- added vectorized versions of the 72-bit kernel (2, 4, 8, 16 wide)
- added VectorSize config item to select one of the vectorized kernels
- added GridSize=4 (up to 2M threads per grid)
- combined the "if (a>b) sub(a,b)" into a "sub_if_gt" without any "if"s
(only conditional moves - prerequisite for the vector approach)
- moved the new DETAILED_INFO and CL_PERFORMANCE_INFO debug defines to
params.h
- added GPLv3 headers everywhere
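
The "sub_if_gt" idea from the list above (a conditional subtraction written without any "if") can be illustrated on a single 32-bit word. This is a simplified sketch: the kernels in question operate on wider multi-word values, and the mask trick shown here is one common way to express the operation without a branch in the source, not necessarily the one used in the kernels.

```c
#include <stdint.h>

/* Illustrative 32-bit version (the kernels in question work on wider,
 * multi-word values): returns a - b if a > b, otherwise a, without a branch
 * in the source code. */
uint32_t sub_if_gt(uint32_t a, uint32_t b)
{
    /* (a > b) is 1 or 0; negating it gives an all-ones or all-zero mask. */
    uint32_t mask = (uint32_t)0 - (uint32_t)(a > b);
    return a - (b & mask);
}
```

Because every lane executes the same instructions regardless of the comparison result, this form maps naturally onto vector types, which is why it is a prerequisite for the vectorized kernels.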
version 0.4 (2011-06-09)
- first version to go out to some people
- cleanup of
  - unused code
  - cuda-references and workarounds
- removed OpenCL-example-code
- merged in the signal handler from mfaktc 0.18-pre2
- 71-bit-mul24-kernel working for all tests:
  - fixed bit-shifting offset in square_72_144_shl
  - fixed carry in sub72
- unrolled the modulo loop in the 95-bit kernel: 10% faster