I Introduction
Recent years have witnessed the prosperity of applications based on graphstructured data [1, 2], such as online social networks, road networks, web graphs [3], biological networks, and communication networks [4, 5]. Consequently, many systems for managing, querying, and analyzing massive graphs have been proposed in both academia (e.g., GraphLab [6], Pregel [7] and TurboGraph [8]) and industry (e.g., Titan, DEX and GraphBase). With the prevalence of cloud computing, graph owners (e.g., enterprises and startups for graphbased services) desire to outsource their graph databases to a cloud server, which raises a great concern regarding privacy. An intuitive way to enhance data privacy is encrypting graphs before outsourcing them to the cloud. This, however, usually comes at the price of inefficiency, because it is quite difficult to perform operations over encrypted graphs.
Shortest distance querying is one of the most fundamental graph operations, which finds the shortest distance, according to a specific criterion, for a given pair of source and destination in a graph. In practice, however, users may consider multiple criteria when performing shortest distance queries [2]. Taking the road network as an example, a user may want to know the shortest distance, in terms of travelling time, between two cities within a budget for total toll payment. This problem can be represented by a constrained shortest distance (CSD) query, which finds the shortest distance based on one criterion with one or more constraints on other criteria.
In this paper, we focus on singleconstraint CSD queries. This is because most practical problems can be represented as a singleconstraint CSD query. For instance, such a query on a communication network could return the minimum cost from a starting node to a terminus node, with a threshold on routing delay. In addition, multiconstraint CSD queries can usually be decomposed into a group of subqueries, each of which can be abstracted as a singleconstraint CSD query. Formally, a CSD query^{1}^{1}1For simplicity, we refer to singleconstraint CSD queries as CSD queries hereafter. is such that: given an origin , a destination , and a cost constraint , finding the shortest distance between and whose total cost does not exceed .
Existing studies in this area can be roughly classified into two categories. The
first category mainly focuses on the CSD query problem over unencrypted graphs [9, 10, 11, 12, 2]. However, these methods cannot be easily applied in the encrypted graph environment, because many operations on plain graphs required in these methods (e.g., addition, multiplication, and comparison) cannot be carried out successfully without a special design for encrypted graphs. The second category aims at enabling the shortest distance (or shortest path) queries over encrypted graphs [1, 13]. They usually adopt distance oracles such that the approximate distance between any two vertices can be efficiently computed, e.g., in a sublinear way. The main limitation of these approaches is that they are incapable of performing constraint filtering over the cloudbased encrypted graphs. Therefore, they cannot be directly applied to answering CSD queries.Motivated by the limitations of existing schemes, our goal in this paper is to design a practical graph encryption scheme that enables CSD queries over encrypted graphs. As the CSD problem over plain graphs has been proved to be NPhard [10], existing studies (e.g., [2]) usually resort to approximate solutions, which guarantee that the resulting distance is no longer than times of the shortest distance (where is an approximation ratio predefined by graph owners), subject to the cost constraint . The encryption of graphs would make the CSD problem even more complicated. Hence, we also focus on devising an approximate solution.
Specifically, this paper presents Connor, a novel graph encryption scheme targeting the approximate CSD querying over encrypted graphs. Connor is built on a secure 2hop cover labeling index (2HCLI), which is a type of distance oracle such that the approximate distance between any two vertices in a graph can be efficiently computed [1, 2]. The vertices of the graph in the secure 2HCLI are encrypted by particular pseudorandom functions (PRFs). In order to protect real values of graph attributes while allowing for cost filtering, we encrypt costs and distances (between pairs of vertices) by the orderrevealing encryption (ORE) [14, 15] and the somewhat homomorphic encryption (SWHE) [16], respectively. Based on the ORE, we design a simple but efficient treebased ciphertexts comparison protocol, which can accelerate the constraint filtering process on the cloud side.
The main contributions of this paper are as follows.

We propose a novel graph encryption scheme, Connor, which enables the approximate CSD querying. It can answer an CSD query in milliseconds and thereby achieves computational efficiency.

We design a treebased ciphertexts comparison protocol, which helps us to determine the relationship of the sum of two integers and another integer over their ciphertexts with controlled disclosure. This protocol can also serve as a building block in other relevant application scenarios.

We present a thorough security analysis of Connor and demonstrate that it achieves the latest security definition named CQA2security [17]. We also implement a prototype and conduct extensive experiments on realworld datasets. The evaluation results show the effectiveness and efficiency of the proposed scheme.
To the best of our knowledge, this is the first work that enables the approximate CSD querying over encrypted graphs.
The rest of this paper is organized as follows. We summarize the related work in Section II and describe the background of the approximate CSD querying in Section III. We formally define the privacypreserving approximate CSD querying problem in Section IV. After that, the construction of Connor is presented in Section V, with a detailed description of the treebased ciphertexts comparison protocol in Section VI. We exhibit the complexity and security analyses in Section VII, evaluate the proposed scheme through extensive experiments in Section VIII, and conclude this paper in Section IX.
Ii Related Work
In an era of cloud computing, security and privacy become great concerns of cloud service users [18, 19, 20, 21, 22, 23]. Here we briefly summarize the related work from two aspects, i.e., CSD querying over plain graphs and graph privacy protection.
Plain CSD queries. The constrained shortest distance/path querying over plain graphs has attracted many research attentions. Hansen [9] proposed an augmented Dijkstra’s algorithm for exact constrained shortest path queries without an index. This method, however, resulted in a significant computational burden. In order to improve the querying efficiency, another solution [11] focused on approximate constrained shortest path queries, which were also indexfree.
The stateoftheart solution to the exact constrained shortest path querying with an index was proposed by Storandt [12], which accelerated query procedure with an indexing technique called contraction hierarchies. This approach still results in impractically high query processing cost. Wang et al. [2] proposed a solution to the approximate constrained shortest path querying over largescale road networks. This method took full advantage of overlay graph techniques to construct an overlay graph based on the original graph, whose size was much smaller than that of the original one. Consequently, they built a constrained labeling index structure over the overlay graph, which greatly reduced the query cost. Unfortunately, all these solutions are merely suitable to perform queries over unencrypted graphs.
Graph privacy protection. Increasing concerns about graph privacy have been raised with the wide adoption of the cloud computing paradigm over the past decade. Chase and Kamara [17] first introduced the notion of graph encryption, where they proposed several constructions for graph operations, such as adjacency queries and neighboring queries. Cao et al. [24] defined and solved the problem of privacypreserving query over encrypted graph data in cloud computing by utilizing the principle of “filteringandverification”. They built the featurebased index of a graph in advance and then chose the efficient inner product to carry out the filtering procedure. Some approaches [25, 26, 13] utilized the differential privacy technique to query graphs privately, which might suffer from weak security. These studies, however, introduced prohibitively great storage costs and were not practical for largescale graphs. Meng et al. [1] proposed three computationally efficient constructions that supported the approximate shortest distance querying with distance oracles, which were provably secure against a semihonest cloud server.
Secure multiparty computation (SMC) techniques have been widely applied to address the privacypreserving shortest path problem [27, 28, 29, 30], as well as other secure computation problems [31]. Aly et al. [28] focused on the shortest path problem over traditional combinatorial graph in a general multiparty computation setting, and proposed two protocols for securely computing shortest paths in the graphs. Blanton et al. [27] designed dataoblivious algorithms to securely solve the singlesource singledestination shortest path problem, which achieved the optimal or nearoptimal performance on dense graphs. Keller and Scholl [29] designed several oblivious data structures (e.g., priority queues) for SMC and utilized them to compute shortest paths on general graphs. Gupta et al. [30] proposed an SMCbased approach for finding policycompliant paths that have the least routing cost or satisfy bandwidth demands among different network domains. However, existing generalpurpose SMC solutions for the shortest path problem may result in heavy communication overhead.
Although there are respectable studies on graph querying over encrypted graphs, the privacypreserving CSD query remains unsolved. In this paper, we propose a novel and efficient graph encryption scheme for CSD queries.
Iii Background
This section presents the formal definition of the CSD query problem and introduces the 2HCLI structure for graph queries.
Notation  Meaning 

Input graph  
Number of vertices and edges in  
Distance and cost of an edge  
Distance and cost of the edge from to  
Origin, destination, approximation ratio, amplification factor and cost constraint in an CSD query  
Plain and encrypted graph index  
In and outlabel set associated with vertex  
Depth of a cost constraint tree  
Length of a path code  
ORE ciphertext of  
Security parameter  
Output length of ORE encryption  
Input length of symmetric encryption algorithms  
Query token  
Candidate sets as the outputs of the cost constraint filtering  
Maximum distance over all the sketches 
Iiia Approximate CSD Query
Let be a directed graph^{2}^{2}2We refer to as a directed graph in this paper, unless otherwise specified. with a vertex set and an edge set . Each edge is associated with a distance and a cost . We regard the cost as the constraint. We denote the set of edges that connect two vertices as a path. For a path , its distance is defined as , which indicates the distance from its origin to its destination. Similarly, we define the cost of as . The notations throughout the paper are summarized in Table I.
Given a graph , an origin vertex , a destination vertex , and a cost constraint , a CSD query is to find the the shortest distance between and with the total cost no more than . Since the CSD query problem has been proved to be NPhard [10], we keep in line with existing solutions [2] and focus on proposing an approximate CSD solution in this paper.
Inspired by a common definition of the approximate shortest path query over plain graphs [2], we define the approximate CSD query (i.e., CSD query) as follows. (CSD QUERY). Given an origin , a destination , a cost constraint and an approximation ratio , an CSD query returns the distance of a path , such that and , where is the optimal answer to the exact CSD query with the origin , destination and cost constraint .
Fig. 1 shows a simple graph with five vertices, where the distance and cost of each edge are marked alongside it. Given an origin , a destination , a cost constraint , the exact CSD query returns the optimal distance , where the corresponding path is . For an approximation ratio , a valid answer to the CSD query with the same parameters (e.g, the origin , the destination , and ) is 8, with the corresponding path . That is because and .
Based on the above definition, given two paths and with the same origin and destination, we say that dominates iff and . With this principle, we can reduce the construction complexity of graph index significantly, because a great deal of redundant entries in the index can be filtered out. We will make a further illustration in the following subsection.
IiiB Constructing Labeling Index
The encrypted index designed in this paper is mainly constructed based on the wellknown 2HCLI, which is a special data structure that supports the shortest distance query efficiently [32, 33, 2]. Here we briefly describe the basic idea of the 2HCLI, and illustrate its application in building a constrained labeling index.
Given a graph with a vertex set and an edge set , each vertex is associated with an inlabel set and an outlabel set, which are denoted by and , respectively. Each entity in corresponds to the shortest distance from a vertex to . It implies that is reachable from by one or more paths, but is not necessarily a neighbor, or 2hop neighbour, of . Similarly, each entity in corresponds to the shortest distance from to another vertex in . To answer a shortest distance query from an origin to a destination , we first find the common vertices in the labels and , and then select the shortest distance from to . Note that the entities in and are carefully selected [33] so that the distance of any two vertices and can be computed by and .
Considering the graph in Fig. 1, if we ignore the cost criterion of edges, the basic unconstrained shortest distance query with an origin and a destination can be answered with the help of the 2HCLI, as shown in Fig. 2. Given the labels and , it is easy to obtain the set of common vertices, which consists of vertices and . The final answer to the basic shortest distance query should be 5, because .
Although it is simple and straightforward to construct the 2HCLI for a graph with only the distance criterion, constructing a labeling index based on the 2HCLI for the CSD query is much more complex. That is because in the CSD query setting with two types of edge criteria, there might be multiple combinations of distance and cost for each pair of vertices in the labels and . For ease of illustration, we also take as an example the graph, as well as the CSD query, in Fig. 1. The corresponding 2HCLI is shown in Fig. 3, where the 2tuple alongside each arrow represents the distance and cost from the starting vertex to the ending vertex. Note that in the shortest distance query in Fig. 2, the shortest distance from to via is unique. However, in the CSD query setting depicted in Fig. 3, there are four possible distances with different costs from to via . Due to the existence of the cost criterion, the number of possible distances for each pair of vertices could increase dramatically in largescale graphs, which results in a higher complexity in constructing the 2HCLI and calculating the answers to a CSD query.
In order to improve the querying efficiency, we adopt a methodology that combines an offline filtering operation and an online filtering operation.
The offline filtering aims at reducing the construction complexity of the 2HCLI and decreasing the number of entries in the inlabel and outlabel sets as many as possible. We adopt the method proposed in [2]. The entities in the 2HCLI are carefully selected in such a way that for any CSD query from to with a cost constraint , the query can be answered correctly using only the 2HCLI. Since the construction of the 2HCLI should be independent of the cost constraint in specific CSD queries, we can use the definition of domination to filter out redundant entries in the in and outlabel sets.
Taking for example the two entries from to with in Fig. 3, the path with the tuple of (3,2) dominates another path with the tuple of (2,6). Therefore, the entry corresponding to the path can be filtered out (as depicted by a dashed arrow), which helps to reduce the number of entries in . The resulting 2HCLI is exhibited in Fig. 4. We refer the reader to [2] for more construction details.
The online filtering aims at selecting the possibly valid answers to a given CSD query, based on only the 2HCLI. For instance, given an CSD query from to with a cost constraint , we can first find the common vertex set between and , and then return the minimum with for each . Since the above comparisons should be conducted with the corresponding ciphertexts, an efficient online filtering approach will be devised in Section VI.
Iv Problem Formulation
This section presents the system model and the security model of the privacypreserving CSD querying, as well as the preliminaries of the proposed graph encryption scheme.
Iva System Model
We adopt the general system model in the literature [17, 1] for the privacypreserving CSD querying, as illustrated in Fig. 5, which mainly involves two types of entities, namely a user and a cloud server.
The user constructs the secure searchable index for the graph and outsources the encrypted index along with the encrypted graph to the cloud server. When the user, say Alice, performs an CSD query over her encrypted graph, she first generates a query token and then submits it to the cloud server. Upon receiving Alice’s query token, the cloud server executes the predesigned query algorithms to match entries in the secure index with the token. Finally, the cloud server replies the user with the answer to the CSD query.
The graph encryption scheme is formally defined as follows.
(GRAPH ENCRYPTION). A graph encryption scheme consists of three polynomialtime algorithms that work as follows:

: is a probabilistic secret key generation algorithm that takes as input a security parameter and outputs a secret key and a public/secretkey pair .

: is a graph encryption algorithm that takes as input an approximation ratio , a secret keys , a key pair , an amplification factor and a graph , and outputs a secure index .

: is a twoparty protocol between a user that holds a secret key , a key pair and a query , and a cloud server that holds an encrypted graph index . After executing this protocol, the user receives the distance as the query result and the cloud server receives a terminator .
IvB Security Model
Graph encryption is a generalization of symmetric searchable encryption (SSE) [34, 35, 36, 37, 38]. Thus, we adopt the security definition of SSE settings in our graph encryption scheme. This security definition is consistent with the latest proposed security definition in [35, 39, 17], which is also known as CQA2security (i.e., the chosenquery attack security). Now we present the formal CQA2security definition as follows.
(CQA2security model). Let be a graph encryption scheme and consider the following probabilistic experiments where is a semihonest adversary, is a simulator, and and are (stateful) leakage functions.
Real:

outputs a graph , an approximation ratio and an amplification factor .

The challenger begins by running to generate a secret key and a public/secretkey pair , and then computes the encrypted index by . The challenger sends the encrypted index to .

makes a polynomial number of adaptive queries, and for each query , and the challenger execute .

computes a bit as the output of the experiment.
Ideal:

outputs a graph , an approximation ratio and an amplification factor .

Given the leakage function , simulates a secure graph index and sends it to .

makes a polynomial number of adaptive queries. For each query , is given the leakage function , and and execute a simulation of , where is playing the role of the cloud server and is playing the role of the user.

computes a bit as the output of the experiment.
We say that the graph encryption scheme is secure against the adaptive chosenquery attack, if for all PPT adversaries , there exists a PPT simulator such that
where is a negligible function.
IvC Preliminaries
Now we briefly introduce an encryption technique employed in our design, i.e., the orderrevealing encryption.
Orderrevealing encryption (ORE) is a generalization of the orderpreserving encryption (OPE) scheme, but provides stronger security guarantees. As pointed by Naveed et al. [40], the OPEencrypted databases are extremely vulnerable to inference attacks. To address this limitation, the ORE scheme has been proposed [14, 15], which is a tuple of three algorithms described as follows:

ORE.Setup(): Input a security parameter , output the secret key .

ORE.Encrypt(): Input a secret key and a message , output a ciphertext .

ORE.Compare(): Input two ciphertexts and , output a bit , which indicates the greaterthan or lessthan relationship of the corresponding plaintexts and .
V Construction of Connor
In this section, we introduce our graph encryption scheme Connor for the privacypreserving CSD querying.
Va Construction Overview
The construction process is based on two particular pseudorandom functions and , and a somewhat homomorphic encryption (SWHE) scheme.
In this paper, we adopt the concrete instantiation of a SWHE scheme in the literature [16].
The parameters of and are illustrated in Equation eq:parameters,
& h: {0,1}^λ ×{0,1}^* →{0,1}^λ
& g: {0,1}^λ ×{0,1}^* →{0,1}^λ+ z + k
where is the security parameter, and
and are the output lengths of the ORE
and SWHE encryptions, respectively.
We start with a straightforward construction as follows, including:

KeyGen: Given the security parameter , the user randomly generates a secret key and a pair of public and secret keys for SWHE.

Setup: Given an original graph , an approximation ratio , and an amplification factor , the user obtains the encrypted graph index by using Algorithm VA. The 2HCLI of can be generated by the method described in Section IIIB.
Let be the maximum distance over all the sketches and . Motivated by the literature [1], each distance is encrypted as by the SWHE to protect its real value (line 8). Considering that is bounded by , the SWHE encryption of distance allows for obtaining the minimum sum over a certain number of distance pairs.
Each cost , multiplied by the amplification factor , is encrypted by the ORE encryption (line 9). is a big integer and should be carefully selected to enlarge the plaintext space of . In practice, the product of and the maximum cost value over all the sketches should be sufficiently large (e.g., at least ), which is used to provide a sufficient randomness to the inputs. Since is kept private by the user, the cloud server cannot learn the real values of .

Query: To perform an CSD query with an origin , a destination , and a cost constraint , the user generates query tokens and , and sends them to the cloud server. The cloud server obtains and from the index. For each encrypted vertex identifier that appears in both and , the cloud server performs a cost constraint filtering operation (which will be described in details in Section VI), and adds each pair which satisfies the cost constraint into a candidate set . Note that the cost constraint is multiplied by because we encrypt the cost , instead of .
Then, the cloud server directly obtains , where SWHE.Eval for each pair in . The correctness of the above calculation follows homomorphic properties of SWHE. We refer the readers to [1] for more details.
Finally, the cloud server returns to the user, who, in turn, obtains the answer to the CSD query by decrypting with its secret key .
Note that this straightforward approach does not only correctly answer the CSD query over encrypted graphs, but also protects the vertex identifier, distance, and cost information.
However, the encrypted graph index obtained from Algorithm VA, without performing any queries, still results in information leakage. On one hand, it reveals the length of each encrypted sketch, i.e., and , as well as the order information of OREencrypted costs in all sketches. On the other hand, it also discloses the number of common vertices between and , which indicates the number of vertices that connect to . In particular, if the cloud server knows that there is no common vertex between and , it learns that cannot reach .
[t] [1] A secret key , a key pair , an approximation ratio , an amplification factor , and an original graph . The encrypted graph index .
Generate the 2hop labeling index from . Initialize two dictionaries and . Let be the maximum distance over all the sketches and set .
each Set , . each Compute . Compute SWHE.Enc. Compute ORE.Enc. Insert () into the dictionary . Repeat the above procedure for each sketch in and add entries into .
return as the encrypted graph index.
VB Privacypreserving CSD Querying
In order to enhance protection of sensitive information, we construct a privacypreserving CSD querying scheme , where the key generation procedure is the same as in , with improved index construction and CSD query procedures as exhibited in Algorithms VB and VB, respectively.
[t] [1] A secret key , a key pair , an approximation ratio , an amplification factor , and an original graph . The encrypted graph index .
Generate the 2HCLI of . Initialize two dictionary and . Let be the maximum distance over the sketches and set .
each Set , , , and . Initialize a counter
each Compute . Compute SWHE.Enc. Compute ORE.Enc.
Set and . Compute . Set . Set .
Repeat the above procedure for each sketch in and obtain , except that: (i) set and , and (ii) compute .
return as the encrypted graph index.
[htbp] [1] The user’s input are the secret key , secret key pair , an amplification factor , and the query . The cloud server’s input is the encrypted index . user’s output is and cloud server’s output is .
user generates , , and . user constructs a cost constraint tree based on using secret as described in Section VI. user sends to cloud server.
cloud server parses as .
cloud server initializes a set and a counter . cloud server computes . cloud server computes . cloud server performs . cloud server add into . Set . cloud server computes .
cloud server initializes a set and a counter . cloud server computes . cloud server computes . cloud server performs . cloud server add into . Set . cloud server computes .
For each encrypted vertex identifier that appears in both in and , the cloud server performs the cost constraint filtering operation through Algorithm VIC, and add the pair which satisfies the cost constraint into a set . The pair that Algorithm VIC cannot verify is also added into .
For each pair in , the cloud server first computes SWHE.Eval, and then computes .
cloud server returns to the user. user decrypts with .
return Decrypted value of as .
The Setup for works as follows. The user first builds the 2HCLI of graph , and then encrypts sketches associated with (i.e., and ), as described in lines 217.
Note that in order to prevent the leakage of the sketch size in the previous straightforward approach, we split each encrypted sketch and , and ensure that they are stored in the dictionary separately, with a size of one. More precisely, we utilize a counter and generate the unique and for each entity in (line 11). Similarly, the unique and for each entity in can be generated (line 16). The (or ) indicates the position that this entity will be stored in (or ), which ensures each position in the dictionary (or ) having only one entity.
(or ) is used to make an XOR operation with . Since (or ) is different for each sketch, the XOR operation makes the resulting indistinguishable, which guarantees that the static encrypted graph index reveals neither the number of common vertices between and , nor the order information of costs.
The Query in Algorithm VB works as follows. Assume that the user asks for the shortest distance between and , whose total cost does not exceed . She first generates the query token and sends it to the cloud server (lines 13). Upon receiving the token , the cloud server searches in the index and obtains and (lines 522). That is, the cloud server iteratively judges whether the dictionary () contains the key () or not. If it exists, then it adds the corresponding entity into the set ().
Once and are obtained, the cloud server performs the cost constraint filtering (line 23) and computes (line 24), which are the same as described in the straightforward approach. Finally, the user gets the final answer by decrypting , which is returned by the cloud server, using its .
Vi TreeBased Ciphertexts Comparison Approach
This section introduces a treebased ciphertexts comparison approach, which is used for cost constraint filtering in the graph encryption scheme described in Section V.
Via Scenarios
Assume that there is a user (i.e., ) and a server (i.e., ). has many integers which are encrypted by a kind of cryptography algorithm and then outsourced to . Now, wants to ask for to obtain integer pairs, e.g., (, ), whose sum does not exceed . Note that the plaintexts of , and could not be disclosed to , except for the greaterthan, equality, or lessthan relationship. A naive approach is to download all the integers, calculate the summation locally, and choose the integer pairs satisfying the constraint. This method, however, is meaningless if one wants to offload the computation to the cloud. Hence, it is desirable to have a practical solution to this problem.
Note that this scenario is different from the wellknown SMC scheme. In the setting of SMC [41, 42], a set of (two or more) parties with private inputs wish to compute a function of their inputs while revealing nothing but the result of the function, which is used for many practical applications, such as exchange markets. SMC is a collaborative computing problem that solves the privacy preserving problem among a group of mutually untrusted participants. The ciphertexts of all pairs of (, ) and the cost constraint are outsourced to the cloud server, which is responsible for the inequality tests. Furthermore, we could reveal the relationship between the sum of two ciphertexts and another ciphertext to the server, which is referred to as controlled disclosure in the literature [17].
It seems that we might leverage the homomorphic encryption technique, since it supports a sum operation of calculating . Nevertheless, as the homomorphic encryption is probabilistic, we are unable to determine the relationship between and over their ciphertexts.
ViB Main Idea
The main idea of the treebased ciphertexts comparison protocol is to encode an integer with the ORE primitive. To the best of our knowledge, none of the existing approaches can support ORE and homomorphism properties simultaneously. Hence, we design a novel method to address this problem, which is motivated by the following facts.
If we want to compare with , we can compare with and with , respectively. Now, we result in 4 possible cases corresponding to combinations of the two relationships. If () and (), we can know that (). In the rest two cases, i.e., and , or and , we cannot achieve a deterministic result. At this point, we can further divide into . And then we can compare and with and , respectively.
By iteratively performing such an operation, we can determine the relationship between and
with an increasing probability. Due to the ORE property, it is easy to perform the above operations over ciphertexts. Next, we will show how to implement this idea efficiently by utilizing a tree structure.
ViC Details of Protocol
To implement the comparison of and over their ciphertexts, we construct a cost constraint tree, whose nodes represent specific values that are related to . For clarity, we define as the ORE ciphertext of .
An example of the tree structure is depicted in Fig. 6. For each node, we assign 0 to its left child path, while 1 to the right child path. If an integer is not greater than the value of this node, we take the left child path for further comparison; otherwise, we take the right child path. Thus, for any path from the root node to a leaf node, we can obtain a path code, which is an effective representation of the comparison procedure. For instance, an incoming integer would traverse Nodes , , and , and thereby end with a path code of 010. We define the length (i.e., the number of bits) of a path code as . Note that is actually equal to the depth of the tree which is denoted by .
Now the relationship between and can be determined as follows. We first get the ORE ciphertexts of and , as well as their path codes and by traversing the tree separately. When computing , if an overflow occurs (i.e., ), we know that with confidence. If , we also know that with confidence. Otherwise, we are unable to determine the relationship and end up with an uncertainty. We summarize this procedure in Algorithm VIC.
[htbp] [1] Two ORE ciphertexts , and a cost constraint tree whose depth is . The relationship between and .
Initialize a counter and two empty strings and .
Visit the th level of the tree with and concatenate with corresponding or . Visit the th level of the tree with and concatenate with corresponding or . Set
return . return .
return uncertainty.
Discussion. Observe that when we go through a cost constraint tree, one more step can further reduce the uncertainty of the relationship between and by half. We denote the probability of uncertainty as
Pr.
where is the length of the path code. We can easily know the probability of certainty is
Pr Pr.
When the tree depth is 6 (e.g., ), the probability of certainty could reach about 0.9844.
Another observation is the comparison procedure reveals the order information between (or ) and . Thus, the server can infer the interval that belongs to with precision of . To prevent the server from inferring the real value of , in Connor, the user randomly picks a big integer number that is applied to , , and simultaneously, which significantly enlarges the plaintext and ciphertext spaces (e.g., ). The value of is generally a small integer (e.g., 6 in our implementation) that is determined by the user, and both and are kept secret by the user. Therefore, the server cannot infer the real value of (or ) from the order relationship among ciphertexts. We will formally analyze the leakage functions and security issues in the next section.
Vii Complexity and Security Analyses
This section presents the complexity and security analyses on the proposed graph encryption scheme Connor.
Viia Complexity Analysis
The dominant component in determining the complexity of the Setup algorithm is the encryption of the plain 2HCLI generated from a graph . Let be the total sketch for all vertices in , then the time complexity and space complexity are both , where is the number of vertices in .
The Query algorithm consists of a query token generation process on the user side and a CSD query process on the cloud server side. Let be the maximum size of the sketch associated with each vertex in . The complexity of the query token generation process is mainly determined by the construction of a cost constraint tree, whose time complexity and space complexity are both . For the CSD querying process, the time complexity of getting and , performing cost constraint filtering, and performing distance computation are , , and , respectively. The space complexity of the above three components are , , and , respectively. Therefore, the total time complexity and space complexity of the CSD querying process are and , respectively.
ViiB Security Analysis
We now present the security analysis on Connor. For clarity, we first discuss the leakage functions, and then prove that Connor is secure under the CQA2security model.
Setup Leakage. The leakage function of our construction reveals the information that can be deduced from the secure 2HCLI of graph , including the total number of vertices in the graph , the maximum distance over all the sketches , and the size of . More precisely, the size of consists of the total number of sketch entities in and , which are denoted by and , respectively. Thus, the leakage function .
Note that the order relationship of pairwise costs and the order relationship between the cost and cost constraint are not included in , because for each entity in sketches, we make an XOR operation using a unique integer value after we encrypt it, and this makes each entity in sketches are indistinguishable.
Query Leakage. The leakage function of our construction consists of the query pattern leakage, the sketch pattern leakage, and the cost pattern leakage. Intuitively, the query pattern leakage reveals whether a query has appeared before. The sketch pattern leakage reveals the sketch associated to a queried vertex, the common vertices between two different sketches, and the size of the sketches of queried vertices. The cost pattern leakage reveals 1) the order relationship among costs, and 2) the order relationship between costs and the cost constraint during the query procedure. We formalize these leakage functions as follows.
(QUERY PATTERN LEAKAGE). Let be a nonempty sequence of queries. Each query specifies a tuple (, , ). For any two queries and , define , i.e., whether each element of matches each element of , respectively. Then, the query pattern leakage function returns an (symmetric) matrix, in which each entry (, ) equals . Note that does not leak the identities of the query vertices.
(SKETCH PATTERN LEAKAGE). Given a secure 2HCLI of a graph and a query , the sketch pattern leakage function is defined as . is a list, each element of which is the sketches associated to the queried vertices, and is a pair , where and are multisets and is a particular pseudorandom function.
(COST PATTERN LEAKAGE). The cost constraint in a query can essentially be represented by a certain number of uniform intervals. Let be the depth of the cost constraint tree (c.f. Section VI). The intervals associated with are , where . Assign each interval with a list , i.e., the th interval is associated with , which stores all the cost values belong to this interval. The leaked interval information forms an array , of which the th element is (i.e., ). In addition, assume that is the total number of entries in the sketches of the queried vertices. For each pair of costs and , its order relationship of the greaterthan, equality, and lessthan can be represented by , , and , respectively. The leaked order information of costs is a (symmetric) matrix with each entry (, ) being , , or . Therefore, the cost pattern leakage function .
Thus, .
The leakage functions are defined over the 2HCLI rather than the original graph. In fact, the information leakage of the original graph is limited to the minimum number of paths for the queried sourcedestination vertices. It can be defined as an (symmetric) matrix , where is the number of vertices in the graph. Each element in is NULL, 0, or a positive integer, which indicates an uncertain status (i.e., topology is well protected), disconnection, or the minimum number of paths of the two queried vertices, respectively.
For the cost values in the 2HCLI, we introduce a userheld amplification factor to enlarge the plaintext and ciphertext spaces. Thus, the server cannot infer the real cost values just from their order information revealed by the leakage function . For the distance values in the 2HCLI, we use the SWHE encryption to protect their real values from the server.
Theorem 1. If the cryptography primitives , , ORE, and the SWHE are secure, then the proposed graph encryption scheme is secure against the adaptive chosenquery attack.
The key idea is constructing a simulator . Given the leakage functions and , constructs a fake encrypted 2HCLI structure and a list of query . If for all PPT adversaries , they cannot distinguish between the two games Real and Ideal, we can say that our graph encryption scheme is secure against the adaptive chosenquery attack.
Simulating . handles each vertex () to generate a fake in 2HCLI based on the leakage function . randomly chooses for with , and samples and uniformly without repetition. For all , takes the following steps to simulate each sketch: computes and , where is a particular pseudorandom function. Then, it encrypts each vertex in the sketch of by computing , where is a fake secret key. It randomly generates two integers and and obtains ciphertexts and by encrypting () and using the SWHE and ORE schemes. Let . stores in the index . That is, . Similarly, generates a fake and finally obtains the fake 2HCLI .
Simulating . Given the leakage function , simulates the query token as follows. first checks if either of the queried vertices and has appeared in any previous query. If appeared previously, sets and to the values that were previously used. Otherwise, it sets and for some previously unused and . It then remembers the association among , , and . takes the same steps for the queried vertex : setting and analogously and associating with the selected and .
To simulate a fake cost constraint tree , first checks if the queried appeared in any previous query. If appeared previously, sets the to the value that was previously used. Otherwise, constructs a full binary tree based and encrypts each tree node by using the ORE scheme with a randomly generated key. returns this encrypted tree as .
simulates the query procedure as follows. Given the query token , first checks if the query has been queried before. If yes, returns the value that was previously used as the query result. Otherwise, checks whether the queried vertex (or ) has been queried before. If the query vertex has appeared in a previous query, sets to the values that were previously used from of . Otherwise, for a newly appeared vertex , takes the following steps: To generate the sketches associated with , first initializes a set and a counter , Then, it iteratively computes
Comments
There are no comments yet.