GIPA GIPA
<b>NAME</b>
<b>gipa</b> -- compression/decompression tool to package, compress, and encode massive archive files containing floating-point data
<b>SYNOPSIS</b>
<b>python gipae.py</b> [file] [<b>-dh</b>] [{<b>-p</b>}]
<b>DESCRIPTION</b>
The <b>gipa</b> program compresses and decompresses files holding massive amounts of floating-point data.
All parameters are <b>optional</b>. If no parameter is specified, the program displays this man/help page and exits normally.
If only the file is specified, the default precision and delimiter are used.
When compressing the input file, it will generate another output file with extension <b>'.tiff'</b>. The output file can be opened with any standard image viewer.
The <b>naming convention is important</b>: the output file must <b>not</b> be renamed, or the counterpart of this program will not be able to decompress it properly.
The naming convention followed is: name of the file being compressed_file extension_max data value_number of zeros.tiff (for more information, please read the patent application).
The program depends on the naming convention because it adds no extra bits to the data to store this information for decompression.
This saves some additional space without manipulating the data.
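As an illustration of the naming convention above, the following Python sketch builds and parses such a file name. The function names, the underscore separator between fields, and the formatting of the max data value are assumptions made for this example; they are not taken from the gipa source.

```python
import os

def build_output_name(input_path, max_value, zero_count):
    """Compose '<name>_<extension>_<max value>_<number of zeros>.tiff' (assumed layout)."""
    base = os.path.basename(input_path)
    name, ext = os.path.splitext(base)
    return "{0}_{1}_{2}_{3}.tiff".format(name, ext.lstrip("."), max_value, zero_count)

def parse_output_name(tiff_name):
    """Recover the four fields from a file name produced by build_output_name."""
    stem, suffix = os.path.splitext(os.path.basename(tiff_name))
    if suffix.lower() != ".tiff":
        raise ValueError("not a .tiff file: %r" % tiff_name)
    # Split from the right so underscores inside the original name survive.
    parts = stem.rsplit("_", 3)
    if len(parts) != 4:
        raise ValueError("file name does not follow the naming convention: %r" % tiff_name)
    name, ext, max_value, zero_count = parts
    return name, ext, float(max_value), int(zero_count)
```

A renamed file such as renamed.tiff no longer carries the four fields, so parse_output_name raises ValueError. This mirrors why the output file should be left untouched.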
Only <b>two</b> bits are added to the data for the purpose of decompression.
The first bit is the precision mode, which is either 0 or 1, where 0 signifies <b>low</b> precision and 1 signifies <b>high</b> precision.
In low precision, the data is stored in the output image exactly as given in floating point, whereas in high precision, the data is scaled down into the range [0,1] and then stored in the output image.
Neither technique is meant to alter the data, but this is not guaranteed: if a floating-point number is outside the representable range or falls in a gap of the number system, it may be rounded to the nearest representable number.
Low precision works well with around 5 to 7 decimal places, whereas high precision can go up to 22 decimal places.
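The rounding behaviour described above can be reproduced with the Python standard library. The sketch below assumes that low precision corresponds to storing each value as a 32-bit float and that high precision divides each value by the largest value in the file. Both are assumptions made for illustration, and the helper names are hypothetical; this is not the pkd algorithm itself.

```python
import struct

def to_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]

def scale_to_unit(values):
    """Divide non-negative values by their maximum so they fall in [0, 1]."""
    max_value = max(values)
    if max_value <= 0 or min(values) < 0:
        raise ValueError("this sketch assumes non-negative data with a positive maximum")
    return [v / max_value for v in values], max_value

def restore(scaled, max_value):
    """Undo scale_to_unit; trailing digits may differ after the round trip."""
    return [s * max_value for s in scaled]

# 0.1 falls in a gap of the 32-bit float grid, so it is rounded;
# 2.5 is exactly representable and survives unchanged.
rounded = to_float32(0.1)
unchanged = to_float32(2.5)

scaled, max_value = scale_to_unit([0.5, 2.0, 4.0])
recovered = restore(scaled, max_value)
```

Values such as 0.5, 2.0, and 4.0 scale and restore exactly because every intermediate result is representable; arbitrary decimals may come back with altered trailing digits.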
This version of the gipa code cannot decompress files compressed using gipae. A separate utility called <b>gipad</b> exists for decompressing the files.
The two have been separated for security purposes, but they can be combined to work together or invoked from a single <b>shell script</b>.
The code is written in <b>Python 2.7.3 [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin</b>, but should work on all distributions of Python.
The code depends on two additional Python libraries: <b>PIL</b> and <b>numpy</b>.
To install:
<b>Numpy</b>: pip install numpy
or visit https://pypi.python.org/pypi/numpy for other options
<b>PIL</b>: pip install pillow or pip install PIL
or visit https://pypi.python.org/pypi/PIL for other options
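After installation, a quick check such as the following sketch can confirm that both libraries are importable. The helper name is hypothetical. Note that the pillow package is imported under the module name PIL.

```python
import importlib

def missing_modules(names=("numpy", "PIL")):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing

if __name__ == "__main__":
    absent = missing_modules()
    if absent:
        print("Missing modules: " + ", ".join(absent))
    else:
        print("numpy and PIL are importable")
```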
<b>OPTIONS</b>
The following options are available:
<b>-d, --d, -delimiter, --delimiter</b>
This option specifies the delimiter used in the file to separate the floating-point data.
The file can have data separated by any delimiter except the single quote (<b>'</b>) and the double quote (<b>"</b>), which are reserved for internal purposes.
The delimiter can be specified on the command line in single or double quotes.
This parameter is <b>optional</b>. Default: ' ' (space). Usage: <b>--d=' '</b>.
<b>-p, --p, -precision, --precision</b>
This option selects the precision mode to be used when compressing the floating-point input file.
Currently only <b>two</b> modes are supported, high and low.
High can be specified with the keyword <b>'high'</b> or the integer <b>1</b>, while low can be specified using <b>'low'</b> or <b>0</b>.
If a value other than these four is specified, the program exits with a <b>Precision not recognised error</b>.
This parameter is <b>optional</b>. Default: high. Usage: <b>--p=1 or --p='high'</b>.
<b>-h, --h, -help, --help</b>
This option prints this usage summary/man page and exits.
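The handling of the delimiter and precision options described above can be sketched as follows. This is an illustration only: the function names are hypothetical, and this page does not show how gipae.py actually parses its arguments. The sketch maps the four accepted precision values to 0 or 1, rejects the reserved quote characters as delimiters, and splits a line of text on the chosen delimiter.

```python
def parse_precision(value="high"):
    """Map 'high' or 1 to 1 and 'low' or 0 to 0; reject anything else."""
    key = str(value).strip().strip("'\"").lower()
    modes = {"high": 1, "1": 1, "low": 0, "0": 0}
    if key not in modes:
        raise ValueError("Precision not recognised: %r" % (value,))
    return modes[key]

def check_delimiter(delimiter=" "):
    """Reject the two reserved delimiters (single and double quote)."""
    if delimiter in ("'", '"'):
        raise ValueError("quote characters are reserved and cannot be delimiters")
    if delimiter == "":
        raise ValueError("delimiter must not be empty")
    return delimiter

def split_floats(text, delimiter=" "):
    """Split text on the delimiter and convert each non-empty field to float."""
    delimiter = check_delimiter(delimiter)
    return [float(field) for field in text.split(delimiter) if field.strip()]
```

The defaults in the sketch follow the defaults stated above: high precision and a single space as the delimiter.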
<b>ENVIRONMENT</b>
The python command must be available in the environment before the program can be run from the command line with arguments.
Options given on the command line override the options in the environment.
<b>HISTORY</b>
The gipa program was originally started for the compression of pointer files generated by the mapping of data onto an organism's DNA by a software called <b>Nibble (Malik and Dhar, 2015 PCT/IB2015/057964)</b> by Girik Malik and Pawan K. Dhar.
The algorithm used in <b>GIPA</b> is called <b>pkd</b>.
It was originally written for general compression but was later adapted for Massive Data (the term <b>BIG DATA</b> is not used here because the data is unstructured only to a certain extent; otherwise it could have been called a BIG Data Compression Tool).
<b>AUTHORS</b>
This implementation of <b>gipa</b> was written by Girik Malik <[email protected]>.
<b>BUGS</b>
(<b>Warning</b>) The trailing decimal places of the data may be altered to some extent, but only when a value is not in a representable format.
GIPA May 30, 2016 GIPA