-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathchap3.tex
1045 lines (960 loc) · 40.3 KB
/
chap3.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
\chapter{System Design}
\label{sec:design}
This chapter will discuss the design
details of Sieve. First,
we will give an overview of the
major system components. Next, we
will discuss the best usage models
for Sieve. We will provide an overview
of ABE and how to assign attributes to
user data. Then, we will talk about
how data is uploaded to the storage provider.
We will discuss how to define
access policies and issue keys to web services.
We will provide a key revocation scheme
and talk about protecting against
device loss. We, then, present a technique
to minimize ABE overheads. Finally,
we show how to change attributes on data
in the storage provider.
\section{System Components}
As shown and described in Chapter~\ref{sec:intro},
Sieve
consists of three components: a user client,
a storage provider daemon, and an import
daemon run by third party web
services. The \textbf{Sieve client} runs
on each user device. The client implements
a Javascript library which allows users to define the
access policies for each web service
(\S\ref{sec:policies}). For each policy, the
client generates the associated ABE
decryption key, and sends it to the
appropriate web service. Before the user
uploads an object to the storage provider,
the Sieve client transparently signs the
object, and then
encrypts the object and signature with the
relevant ABE attributes (\S\ref{sec:attrGen}).
To protect against lost or stolen devices,
the RSA signing key and the ABE master secret
are partitioned across user devices; before a
client can sign or encrypt an object, the
client must participate in a secret sharing
protocol with two-factor authentication
(\S\ref{sec:secretSharing}).
The \textbf{storage provider daemon}
accepts encrypted data from Sieve clients.
Each encrypted object is tagged with
cleartext attributes $a_0,\ldots,a_N$, where $a_i$
is an attribute name and value, such as
``name=Joe". The storage provider
builds an index of these attributes so
that, when a web service requests objects whose
attributes satisfy a certain policy, i.e. the policy
on their ABE decryption key,
the storage provider can
efficiently locate those objects (\S\ref{sec:attrGen}).
The \textbf{Sieve import daemon} is run
by a third party web service. The import
daemon accepts ABE keys and upload
notifications from Sieve clients. After
receiving an upload notification, the import
daemon fetches encrypted objects from the
storage provider. The import daemon then
decrypts the objects using its ABE key,
and feeds the cleartext data into the
web service's legacy pipeline for consuming
user data (\S\ref{sec:attrGen}).
From the perspective of a third party,
a user's Sieve storage is read-only;
only the user can write
new objects and update old ones. Third
parties use their own storage for data
that is derived from a user's Sieve
objects.
In Sieve, there are five types of keys.
There is the user \textbf{ABE master key},
which is used to derive \textbf{ABE decryption keys}. The
\textbf{ABE public key} is used to encrypt the metadata
block(\S\ref{sec:dataPush}) under user-provided
attributes. The ABE decryption keys, which are
associated with a policy and given to web
services by the user, are used to decrypt
metadata blocks. The actual data are signed
with \textbf{per-user RSA keys} and encrypted
with \textbf{symmetric keys}.
\section{Usage Model}
In theory, any web service that imports
user data is compatible with Sieve. In
practice, certain kinds of web services
and user data are easier to integrate into
the Sieve model. Sieve works best with
\begin{smitemize}
\item data streams that are tightly
bound to a particular user, and
\item web services that can tolerate
those data streams being read-only
(and perhaps only partially disclosed).
\end{smitemize}
Examples of Sieve-amenable data streams
include demographic information like age and
location; sensitive financial and medical
records; sensor data from quantified self
applications; longitudinal, cross-site
histories of browsing behavior and e-commerce
transactions; and multimedia data streams
containing photos, videos, and audio.
Examples of web services that consume such
data streams are social media applications
like Instagram~\cite{instagram},
exercise trackers like Open mHealth~\cite{omh},
and financial analysis sites like Mint~\cite{mint}
that require access to a user's spending habits.
Applications like Reddit~\cite{reddit}
and StackOverflow~\cite{soverflow} are
less appropriate for Sieve. In these applications,
user data has less standalone value to the
owner; instead, most of the value derives from
embedding the data in a larger, service-specific
context like a Reddit discussion.
%@@@Cut the text paragraph below is crunched for space.
Web-based email is also an awkward fit for the
Sieve model, since email services require mutable,
per-user state like mailboxes, but Sieve exports
read-only storage. An email service could pull
read-only outgoing messages from Sieve storage,
and implement the mutable mailbox state on the
service's own machines. However, such an
architecture would be awkward, since users
would have no way to selectively expose
\emph{incoming} messages to the mail service.
\section{Overview of ABE}
\label{sec:abeOverview}
\begin{figure}
\centering
\includegraphics{figs/abeExample.pdf}
\caption[ABE Example]{In this example, there are two ABE keys at the
top, and three ABE-encrypted objects at the bottom.
An arrow indicates that a key can decrypt a particular
object.}
\label{fig:abeExample}
\end{figure}
Attribute-based encryption~\cite{kpabe} is a
public-key encryption scheme. Each
user has a unique ABE master key, which is used to derive
ABE decryption keys.
A cleartext object is associated with attributes
that govern how the object is encrypted.
The ABE public key used to encrypt the object is derived directly
from these attributes, which can be integers or strings.
Since ABE is a public-key
encryption scheme, anyone can encrypt objects.
In Section~\ref{sec:relabel}, we discuss how to
change these labels after encryption. Each
decryption key has an access control structure
(ACS) which enumerates one or more attributes.
An ACS can test an attribute for equality (e.g.,
\texttt{location=``Paris''}) or comparative
value (e.g., \texttt{age > 35}). An ACS can
also chain those simple tests using \texttt{ANDs},
\texttt{ORs}, and \texttt{NOTs}. As shown in
Figure~\ref{fig:abeExample}, a key can only decrypt
an object if the key's ACS matches the object's
attribute values. Using this user-generated ACS,
the user derives these ABE decryption
keys from the ABE master key. Finally,
since ABE is a public-key encryption scheme,
it is slower than symmetric key cryptography.
We use the shorthand notation $K_{a_0,\ldots,a_N}$
to refer to an ABE key whose ACS contains the
attribute tests $a_0,\ldots,a_N$ where $a_i$ is
an attribute name and value. All tests are
implictly joined via \texttt{ANDs} unless we
explicitly note otherwise.
\section{Assigning Attributes to User Data}
\label{sec:attrGen}
Raw user data comes from a variety of
sources. Some of it is directly generated
by a user's devices; for example, a user
might enter financial information directly into a
spreadsheet. Data might also come from an
external source, like an email attachment.
Sieve associates each object, regardless
of its provenance, with a set of attributes.
Some attributes can be automatically generated
by hardware, like the GPS coordinates for a
running route. Other attributes can be
extracted by software, using application-specific
transducers or semantic file systems~\cite{sfs,contextsfs,graphsfs}.
Users can also manually tag objects. Sieve is
agnostic to the manner in which attributes
are created, although our implementation of
the Sieve client provides a GUI which simplifies
manual tag assignment. Users can also
change the data's
attributes after encryption (\S\ref{sec:relabel}).
Users and web services must agree on data schemas,
so that web services can meaningfully aggregate
and process information from different users. In
particular, web services need to know a
standardized set of attributes which are defined
by user data. To define this standardized set,
Sieve uses FOAF~\cite{FOAF} as the schema % http://xmlns.com/foaf/spec/ [For higher-level overview, see http://en.wikipedia.org/wiki/FOAF_%28ontology%29]
model for data about human users, and RDF~\cite{RDF} %http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/ [For higher-level overview, see http://en.wikipedia.org/wiki/Resource_Description_Framework]
as the schema model for data about user objects.
Each Sieve user has a standardized FOAF record
which stores basic scalar information like
her name, location, and birthday. Sieve
associates each entry in the FOAF record
with an ABE attribute; for example, a user's
name is associated with the $userName$ attribute.
Sieve does not upload the actual FOAF record
to the storage provider, since the only purpose
of the FOAF record is to standardize the
metadata that is associated with each user.
Instead, Sieve uploads individual FOAF
entries, encrypting each one with the associated
ABE attribute (e.g., $<alice>_{K_{field=userName}}$,
where ``alice" is the encrypted data and $K_{field=userName}$
means that the data can only be
decrypted by web services whose ABE keys
possess the access to the $userName$ field attribute).
Sieve associates each data object with a
type-specific RDF schema. For example, a
W2 tax record has attributes for the user's
pre-tax income, her number of dependents,
and so on. Similar to FOAF records, RDF
records are only used to define a standard
attribute set for each data type. Sieve uploads
individual encrypted RDF entries
to the storage provider.
Some data objects like images are not
decomposable, i.e., each object is disclosed
or not disclosed at the granularity of the
entire object. For objects like this, Sieve
uploads the entire object, encrypting it
using the standardized RDF attributes and
any manually added user tags. For example,
a photo has standard attributes like height
and width, and may also possess user-defined
tags like ``Vacation'' or ``Family.''
As applications evolve, RDF and FOAF schemas
may change.Sieve is compatible with
preexisting techniques for synchronizing
schema changes across a distributed system~\cite{f1,maintaindataware, onthefly}.
\section{Uploading Data to the Storage Provider}
\label{sec:dataPush}
Suppose that the user wishes to upload a file
$F$ that has attributes $a_0,\ldots,a_N$.
Before uploading the file, the Sieve client
must encrypt the file such that decryption
is only possible with the ABE key whose ACS
match the attributes $a_0,\ldots,a_N$. The na{\"i}ve
approach is to directly encrypt $F$ with
$K_{a_0,\ldots,a_N}$. Since ABE is significantly
slower than symmetric key cryptography. Thus,
Sieve uses a \textit{hybrid encryption} scheme,
encrypting the file data with a symmetric key,
and encrypting the symmetric key with ABE.
The end-to-end upload protocol is the following.
First, Sieve generates a symmetric key $k$,
and uses that key to encrypt $F$. Sieve uploads
the encrypted $<F>_k$ to the storage provider.
The storage provider responds with a GUID for
the file. To reduce latency, a user can also
request and cache available GUIDs before
the Sieve protocol starts to avoid having to
wait for a server response for each file upload.
Sieve then uploads a \emph{metadata
block} for $F$. The metadata block is an encrypted
pointer containing $<GUID, k>_{K_{a_0,\ldots,a_N}}$.
Only principals which possess keys with ACSes that
match $a_0,\ldots,a_N$ can
decrypt the pointer, fetch the object, and
decrypt the object.
In the next section, we describe how web services
acquire ABE keys. For now, we merely say that
users do not share ABE keys with storage providers.
Thus, a storage provider cannot inspect the data
that it stores. The provider can try to tamper
with the data or produce fake user data, but
Sieve clients sign each object with a user-specific
RSA key before encrypting the object and its signature,
allowing web services to detect tampering.
From the perspective of a third party
web service, Sieve storage is read-only.
However, a user is free to create new
objects, delete old ones, and update
objects that reside at preexisting
GUIDs. If a user's device has cached
the GUID and the symmetric key for a
particular object, the user can update
that object directly, without having
to fetch the associated metadata block
and incur ABE overheads to decrypt it.
\section{Defining and Enforcing Access Policies}
\label{sec:policies}
In Sieve, all user data is private by
default, since users must explicitly
share ABE decryption keys that provide
access to data. When a third party requests
access permissions, e.g., upon the first
time that a user visits a site, the user
generates an access policy for the site.
Policies are defined in terms of attributes,
and the Sieve client provides a GUI which
makes it easy for users to explore which
attributes her data contains, and which
objects would be exposed for a given policy.
Policies are simple boolean expressions;
for example, a web service used by a
physician might receive the policy
(fileType=``medicalRecord''
AND year>2010 AND doctor=``John'').
After the user has defined a policy, her
Sieve client assembles the ABE master secret
(\S\ref{sec:secretSharing}) and generates
an ABE key with the appropriate ACS. The
Sieve client then sends the key and the name
of the user's storage provider to the remote
web service. The message is protected with SSL/TLS~\cite{tls}
to ensure confidentiality and integrity.
Later, when the web service desires to
access user data, the service does not
need to interact with the user. Instead,
the service sends an access request directly
to the
user's storage provider. The request
contains a list of the attributes which
belong to the data of interest. The
storage provider returns the encrypted
metadata blocks for the relevant objects.
The web service decrypts the metadata,
revealing the GUIDs for the requested
objects as well as their symmetric
encryption keys. The web service uses the
GUIDs to fetch the encrypted objects.
After decrypting the objects locally,
the service feeds the cleartext data
into an application-specific data pipeline.
Once that happens, Sieve is uninvolved in
the application workflow. Thus, Sieve is
compatible with the current web ecosystem
which uses third party computation and
storage to add value to user data. However,
Sieve provides users with cryptographically
strong control over the raw data that each
service receives. Sieve's access policies
also have three attractive properties:
\begin{smitemize}
\item The number of policies (and keys) scales with
the number of web services that a user
shares with, not the much larger number
of objects that she owns.
\item Policy generation is decoupled from
object generation. At object creation
time, users do not have to speculate
a priori about whom a new object might
be shared with.
\item Policies safeguard objects using
cryptography, but users are insulated
from the details of key management.
\end{smitemize}
Given all of this, we believe that Sieve
strikes a good balance between security,
usability, and backwards compatibility with
current web services.
\section{Key Revocation}
\label{sec:revocation}
In Sieve, an individual object is encrypted
with a symmetric key $k$; the object's metadata
block (which contains $k$ and the object's
GUID) is encrypted with an ABE key $K_{a_0,\ldots,a_N}$.
A web service caches its ABE key, and it may also
cache symmetric keys and GUIDs, to avoid
repeated fetches and decryptions of metadata
blocks. Caching makes revocation tricky,
since a user that wants to revoke a service's
access rights cannot force the service to delete
cached keys. An honest storage provider can
refuse access requests from deprivileged third
parties, but if the storage provider is
compromised, it can leak data that is
encrypted with ostensibly revoked keys
that are still in the wild.
To protect against storage server compromise,
Sieve revokes keys by \emph{re-encrypting}
user data and metadata with new keys that
are not shared with the newly deprivileged
third party. If the storage provider is
honest at the time of rekeying, then even
if it is compromised later, it will never
leak data that is encrypted with revoked
keys.
Leveraging homomorphic encryption~\cite{gentry},
the storage provider re-encrypts the data
locally, using a rekeying token provided
by the user. The storage provider learns
nothing about the old encryption key, the
new encryption key, or the underlying cleartext;
the user avoids the need to download, re-encrypt, and
re-upload data from her personal devices.
In the rest of this section, we first describe
how data is re-encrypted, and then explain how
the metadata is re-encrypted.
\subsection{Re-encrypting data}
To enable storage providers to re-encrypt data in situ,
Sieve employs a key homomorphic pseudorandom
function~\cite{keyhom, npr99}. The intuition is that
we use this pseudorandom function in place of the AES
block cipher for counter mode encryption. With the properties
of counter mode and a key homomorphic pseudorandom function,
we can construct a key homomorphic encryption scheme
that allows Sieve to re-key data without decrypting.
We define that function $F$ as
\begin{equation*}
F(k, x) = H(x)^k
\end{equation*}
where $H$ is a hash function and $k$ is the
secret key associated with each object $x$ is the
input into the function. $F$
is additively key homomorphic, which means
that, for two keys $k$ and $k'$,
$F(k,x) \cdot F(k',x) = F(k+k',x)$.
All operations described in this section
are done modulo $p$ where $p$ is a large prime.
Using $F$, we define an encryption scheme whose
security is similar to those
of AES. We discuss the security of this
scheme more formally below.
Like AES-CTR, our new encryption scheme
operates on blocks of data, and uses a random
counter to convert a block cipher into a stream
cipher. In our new scheme, the $j$th ciphertext
block $c_j$ is equal to
\begin{equation*}
c_j = m_j \cdot F(k, N + j)
\label{new_enc}
\end{equation*}
where $m_j$ is the $j$th cleartext block, and
$N$ is a public nonce that is equivalent to the
initialization vector in AES-CTR. To decrypt,
a third party extracts $k$ from a metadata
block and performs the
following calculation:
\begin{equation*}
m_j = c_j \cdot F(-k, N + j)
\end{equation*}
To revoke the ABE key $K_{a_0,\ldots,a_N}$, a user's
Sieve client generates a \emph{rekeying token}
for each object that is accessible via
$K_{a_0,\ldots,a_N}$. For an object encrypted by
$k$, the rekeying token is $\delta = -k + k'$,
where $k'$ represents the new encryption key
for the object. The client sends $\delta$ to
the storage provider; this operation is safe
because the provider cannot recover $k$ or $k'$
from $\delta$. The storage provider
uses $\delta$ to compute a new version of
each ciphertext block $c_j$:
\begin{align*}
c_{j,new} &= c_j \cdot F(\delta , N + j) \\
&= m_j \cdot F(k, N+j) \cdot F(-k + k', N+j) \\
&= m_j \cdot F(k', N+j)
\end{align*}
In this manner, the storage provider re-encrypts
objects without learning the encryption keys
or the underlying cleartext.
\paragraph{Security for key homomorphic encryption scheme:}
We discuss some security goals and properties of the
key homomorphic encryption scheme described above.
We want to preserve the semantic security of AES-CTR,
a common symmetric key encryption scheme. We will
show our key homomorphic encryption scheme is also
semantically secure.
First, Naor, Pinkas, and Reingold~\cite{npr99} showed that
$F_{NPR}(k,x) = H(x)^k$ (described above)
is a secure pseudorandom function (PRF)
in the random oracle model~\cite{canetti2004random} if
the Decision Diffie-Hellman assumption holds in $\mathbb{G}$.
An additional feature of this PRF is that it is key
homomorphic, but that additional property does not affect
the security of the PRF.
Now that we have a secure PRF. We define counter-mode
encryption as $c_j = m_j \cdot F(k, N+j)$ where $c_j$
is the $j$th block of the ciphertext, $m_j$ is the
$j$th block of the plaintext, $k$ is the symmetric key,
and $N$ is a unique nonce. A survey by Rogaway~\cite{rogaway2011evaluation}
has shown that this encryption scheme is semantically
secure as long as $F$ is a secure PRF. Typically, the AES
block cipher is
used as $F$ because AES is assumed to be a secure PRF.
In our encryption scheme, we use $F_{NPR}$ as our PRF,
which is key homomorphic.
Therefore, our encryption scheme is semantically secure
in the random oracle model if the Decision Diffie-Hellman
assumption holds.
\subsection{Re-encrypting metadata} Each user
device maintains an integer counter called the
epoch counter. It is initialized to zero, and
represents the number of revocations that the
user has performed. When a user device
generates a new ABE key, Sieve automatically
tags the key with an $epoch$ attribute that
is set to the current value of the epoch
counter. The epoch attribute is a standard
ABE attribute; until now, we have elided the
$epoch$ attribute in key descriptions, but
we explicitly represent it in this section.
Suppose that a web service possesses the ABE
key $K_{a_0,\ldots,a_N,epoch=i}$, where $i$ is a
whole number. To remove the service's access
permissions, the user first re-encrypts the
affected data using homomorphic encryption.
The user then increments the epoch counter to
$i+1$. Next, the user generates a new metadata
block for each re-encrypted object, inserting
the new $k$. The user encrypts the new metadata
block and uploads it to the storage provider;
the metadata is encrypted using the updated
ABE key $K_{a_0,\ldots,a_N,epoch=i+1}$. Finally, the
user sends the new ABE key to any non-revoked
web services who possess the old version of
the key from epoch $i$; remember that if the
user gives multiple web services access to
$a_0,\ldots,a_N$, those services will receive
the same ABE key.
Additional web services may require new ABE
keys, depending on how the attributes in ABE
keys overlap. For example, consider two web services:
the first possesses \allowbreak $K_{(a_0\; OR\; a_1)\; AND\; epoch=0}$,
and the second has $K_{(a_1\; OR\; a_2)\; AND\; epoch=0}$.
Both keys grant access to a metadata block
with attributes $a_1\; AND\; epoch=0$. To revoke
the first key, Sieve re-encrypts the metadata
block using the attributes $a_1\; AND\; epoch=1$.
As a result, the second, non-revoked web service
loses access to the block. Thus, Sieve must
give an updated key $K_{(a_1\; OR\; a_2)\; AND\; epoch=1}$
to the second service.
When Sieve re-encrypts a data object, the
object's GUID stays the same, but its symmetric
key changes. This invalidates any cached keys
that are held by web services. So, when a
service receives an updated ABE key, the
service discards any cached symmetric keys
that were decrypted using the old version
of the ABE key. The signature is unaffected
because the signature is on the cleartext data, which
remains unmodified.
\subsection{Additional details} A Sieve user
will often possess multiple devices; for example,
a single user might possess a smartphone, a tablet, a
laptop, and a quantified life device like a
Fitbit. If a user has multiple devices, then
the device which initiates a revocation will
broadcast the revoked key and the new epoch
counter to the other devices. This ensures
that the other devices do not use an old epoch
number to encrypt new metadata blocks. Network
partitions may delay the rate at which devices
learn about the revocation, so when a device
receives a revocation notice, it proactively
rekeys any data and metadata that it mistakenly
encrypted using the revoked key. Each revocation
notice will have the issue time to help devices
figure out what data was uploaded with the old
epoch number. Computationally
weak devices like Fitbits can delegate rekeying
work to more powerful devices like laptops.
Revocation messages are signed by the public
key of the device that issued the message.
Devices learn about each other's public keys
at Sieve initialization time, and later, when
the user adds a new device. By signing revocation
messages, Sieve prevents arbitrary devices
from injecting fradulent revocation notices.
%!!!jwm: Make sure that we mention this point
% in the intro.
To the best of our knowledge, Sieve is the
first ABE system which supports full re-keying
of \emph{both} data and metadata. Prior ABE systems either
cannot revoke keys at all~\cite{privio}, or can
only revoke access to metadata~\cite{persona,aauth,cachet};
in the latter case, data remains encrypted with
revoked symmetric keys, leaving that data vulnerable
to storage server compromise or negligence.
\section{Protecting Against Device Loss}
\label{sec:secretSharing}
At initialization time, Sieve creates an ABE
master secret. Sieve uses the master secret
to derive the ABE decryption keys that are
given to web services. Thus, the entire
cryptosystem is compromised if the master
secret is lost.
In a straightforward implementation of ABE, each
user device has a copy of the master secret.
However, portable devices like smartphones and
tablets are often lost~\cite{wang2012smartphone},
meaning that a na\"{\i}ve implementation of ABE
exposes the master secret to great risk. Even if
users encrypt the master secret with a
password-derived key~\cite{pbkdf}, users often
pick weak passwords~\cite{gaw2006password},
giving the master secret weak protection in
practice if a device is lost.
To mitigate the impact of lost devices, Sieve
uses Shamir secret sharing~\cite{shamir} to
partition the master secret across a user's
devices. In a $(k,n)$ sharing scheme, the
secret is divided across $n$ devices, and
$k$ shares are required to reconstruct the
secret. In the context of Sieve, this means
that when a user device needs to generate
an ABE key, the device must first gather $k-1$
shares from other devices. Only then can the
device assemble the master secret, generate
the ABE decryption key, send the key to the
web service, and then delete the local copy
of the assembled master secret.
When the master secret is being assembled,
Sieve requires the user to explicitly authorize
each participating device to release its local
share. By default, Sieve uses a $k$ of 2, so
this authorization scheme is similar to
two-factor authentication~\cite{twofactor}---a user
cannot generate an ABE decryption key unless
she controls two separate devices (e.g., a
laptop and a smartphone). This means that,
if an attacker finds a single lost device,
that device cannot generate the master secret.
Sieve also employs secret sharing to protect
the user's signature key. During uploads to
the storage provider, the signature key is
used to authenticate the client-side of the
SSL/TLS session. Thus, the storage provider
can reject fradulent upload attempts from
arbitrary devices.
Secret sharing protects the ABE master secret
and the user's signing key. However, a lost
device possesses a device key that is used to
authenticate messages from that device. An
attacker with a lost device can try to use
the device key to subvert the revocation
protocol (\S\ref{sec:revocation}). For example,
if a malicious lost device can roll back the
epoch to a \emph{smaller} number, uncompromised
devices will upload new data that can be
decrypted with revoked keys. To prevent attacks
like this, Sieve relies on the multi-factor
authentication that is built into the secret
sharing protocol---revocation requires a device
to assemble the master secret, and assembling
the master secret requires the user to possess
multiple devices.
%@@@Delete is crunched for space.
To add or remove devices from the secret sharing
scheme, or to change $k$, the user must invalidate
the old shares. To do so, the user must find $k$
devices to participate in a new secret sharing
exchange that uses the updated $k$ and $n$.
% This process invalidates the old shares
% on any lost devices as long as only $k-1$
% devices are lost before resetting the shares.
Sieve provides no protections against a subverted
device that the user believes is not lost or
malfunctioning. For example, if the user wants
to upload data from the smartphone that she
is currently using, and the smartphone has a
rootkit, the phone can arbitrarily delete the
user's data, upload garbage, or improperly
revoke keys.
\section{Minimizing ABE Overheads}
\label{sec:minAbeCost}
Until now, we have assumed that clients
perform two encryptions for every object
upload: an ABE encryption for the metadata
block, and a symmetric encryption for
the data block. ABE is a public key
cryptosystem, so ABE operations are
much slower than symmetric
ones. Fortunately, Sieve clients can use
several techniques to reduce the frequency
of ABE operations.
The simplest approach is for clients to
store multiple objects inside each data block.
Creating the associated metadata block
will still require an ABE encryption,
but subsequent writes and reads of the data
block will only incur symmetric cryptography
costs---clients can update the data block
in-place, without changing the metadata,
and third parties can cache the data block's
symmetric key to use during reads. For
example, a smartphone with a GPS unit might
use a single data block to store a month's
worth of location data. The phone appends
new location samples to the current month's
data block, creating a new data block and
metadata block when a new month begins.
\begin{figure}
\centering
\includegraphics{figs/guidMap2.pdf}
\caption[Example of Sieve storage-based data structure]
{An example of a storage-based data structure.
Using indirect GUIDS, the metadata block at the top
points to a data block that only contains GUIDS.
Those GUIDs point to raw data blocks. Raw data blocks
can also embed pointers, as demonstrated by the simple
tree structure that links the data blocks.}
\label{fig:guidMap}
\end{figure}
Clients can also leverage more complex \emph{storage-based
data structures}. For example, as shown in
Figure~\ref{fig:guidMap}, a Sieve client can
use indirect GUIDS like a Unix file system uses
indirect data pointers. In this scheme, the
top-level GUID for a storage-based data
structure refers to a metadata block that
points to a \emph{GUID map}. The GUID map
is just a data block that contains a symmetric
key and additional GUIDs; those GUIDs point to
raw data blocks that are encrypted with the
symmetric key. Once again, clients and web
services eliminate ABE costs by encrypting
many objects with the same symmetric key,
and caching that key.
Having many, smaller data blocks instead of
fewer, larger data blocks is useful if the
storage provider does not allow partial block
writes (meaning that all writes force the
client to upload at least a block's worth of
data).\footnote{For example, Amazon's S3
allows partial reads, but not partial writes~\cite{AmazonPUT}.} % http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html
For similar reasons, multiple small blocks
are useful if the symmetric cipher does
not allow new data writes or
updates to random offsets in the
ciphertext.\footnote{Most commonly used
block cipher modes, such as CBC and CTR mode,
do not easily support new writes to random offsets.}
A raw data block can also embed GUIDs which
reference other data blocks. This allows
a client to build more complex data structures
than flat arrays of data blocks.
More complex structures may be preferrable
if they can reduce the number of wide-area
data fetches that web services must perform.
For example, Figure~\ref{fig:guidMap} shows
a simple tree with a single parent and three
children; by convention, the parent of the
tree is the first entry in the GUID map. Each
data block can hold multiple items, but when
a block fills up, the client creates a new
data block, adds the associated GUID to
the GUID map, and then updates any internal
GUIDs within preexisting data blocks. A third
party whose ABE key decrypts the metadata block
can traverse the tree structure without
additional ABE operations, since all of the
data blocks are encrypted with the same
symmetric key.
For future modifications to the data structure,
such as adding new or updating existing data
blocks or creating new pointers and GUIDs,
the user caches the relevant GUIDs and keys
(data and GUID map) allowing her to modify
the contents of the data structure without
creating a new metadata block, thus,
avoiding an ABE operation. Similarly, a web
service could and should cache the same information
and use it to download any changes to the data
structure to avoid incurring the cost of ABE
operation. However, the caching is acceptable
because the user can revoke the web service's
access to her un-accessed data using Sieve's
revocation protocol (\S\ref{sec:revocation})
Each storage-based data structure defines a
Python API for adding and removing objects,
as well as traversing the entire structure.
Sieve clients and web services cache the
metadata blocks for storage-based data
structures, and use the Python APIs to
interact with those structures. Thus, Sieve
clients and web services are insulated from
the low-level details of GUID maps (although
both parties can access raw storage if desired,
and if Sieve's ABE policies allow such accesses).
Sieve's revocation protocol
is compatible with GUID maps, which are re-encrypted
in place, just like any other data block. However,
when Sieve determines that a metadata block must
be re-keyed, Sieve checks whether the metadata
refers to a GUID map. If so, Sieve must chase
the GUIDs in the GUID map to identify the
raw data blocks that must be re-keyed. The revocation
protocol does not change the GUIDs that are
associated with re-keyed data blocks, so embedded
GUIDs inside data blocks remain valid after
re-keying.
Each data block that is referenced by
a particular GUID map is encrypted with the
same $k$. However, Sieve uses counter-mode
encryption~\cite{ctrmode}, and employs a
different counter for each block.
%\footnote{The counter is a public initialization
% value at the beginning of the ciphertext.}
Thus, if an attacker learns the cleartext
for one ciphertext block, the attacker does
not have an easier job of decrypting other
ciphertext blocks with the same $k$.
\section{Relabelling}
\label{sec:relabel}
In Sieve, a user may relabel an object. For
example, a user can restrict access by adding
an additional attribute to the object. A user
can also remove attributes, or swap one
attribute for another. Each web service's
ABE decryption key does not change unless
the relabelling results in an over-privileged
web service, the user has to perform
revocation and re-issue the web service's
key to ensure that it has the correct access
policies.
To implement relabelling, a user's Sieve
client performs three actions. First,
the client replaces the old
metadata block on the storage server with a new one
that contains a new symmetric key
and is ABE-encrypted using the new attributes.
Second, the client uses homomorphic encryption to
re-encrypt the object under the new symmetric key
on the storage server.
Finally, the client updates storage-based
references to the object, ensuring that the
references adhere to the object's new access
policy. The client can locate these references
because the client knows the old attributes
for the object, the new attributes for the
object, and the attributes for all of the user's
metadata blocks. Thus, the client can determine
which references must be patched. For example,
suppose that a user has two storage-based data
structures $S_0$ and $S_1$; further suppose that,
due to relabelling, an object must move from
$S_0$ to $S_1$. By inspecting the object's old
attributes, the client determines that the object
was originally referenced by $S_0$. The client
homomorphically re-encrypts the object using
symmetric key $k'$, traverses $S_0$ to remove
any references to the object, and then adds a
$<GUID, k'>$ reference for the object to $S_1$.
Sieve performs the traversals, removals, and
additions using the APIs defined by the
storage-based data structures.
\section{Discussion}
\label{sec:discussion}
%@@@Cut if crunched for space.
\subsection{Alternative policy languages}
Attribute-based disclosure policies are easy
for users to understand, and these policies
naturally map to ABE cryptosystems. However,
ABE cannot express arbitrarily complex policy
functions. Garbled circuits~\cite{lindell2009proof}
and functional encryption~\cite{boneh2011functional}
are Turing complete, but they are prohibitively
slow. For example, garbled circuits decrypt
AES data at a rate that is four orders of
magnitude slower than native AES decryption~\cite{justGarble}.
Relative to functional encryption and garbled
circuits, ABE is several orders of magnitude
faster. We believe this trade-off is worthwhile.
\subsection{Paying for storage}
In Sieve,
each user places her objects in private cloud
storage. Someone must pay for that storage.
One option is for ad networks to pay. In Sieve,
ad networks can be third parties, and they can
receive ABE keys to access user data. Using a
micropayment system like FileTeller~\cite{fileteller},
advertisers could pay for the right to collect
longitudinal data about a user, and generate
targeted advertisements based on that data.
By deferring user storage costs, advertisers
would encourage users to continue to declassify
a subset of their data. Indeed, since each user
now stores all of her data in a single place
instead of multiple locations, ad networks
would gain access to more contextual information
than in the web ecosystem, even if users choose
which objects to reveal~\cite{adnostic}. Thus,
Sieve might enable a happy middle ground in
which users gain explicit control over the data
seen by third parties, and third parties willingly
subsidize private user storage in return for
better contextual information.
If ad-driven storage subsidies are poorly
designed, they may lead to perverse trade-offs
between subsidy amounts and the required
levels of data disclosure. A full study of
such interactions is beyond the scope of
this paper. For now, we merely observe that
some users may opt out of the subsidy system
entirely. These users will have to pay for
their own storage, but there is reason to
believe that they would do so. Well-known
sites like Pandora, Slashdot, and OkCupid
already allow users to pay a small monthly
fee to remove advertisements, so there is a
preexisting demographic which is willing to
pay money in exchange for better privacy.
The popularity of open source applications
also demonstrates that developers are willing
to make high quality applications without the
expectation of direct payments from users.
Thus, we believe that Sieve's application
model is realistic.
\subsection{Efficient data importing}