

1- Summarise the solutions (proposed systems) in the attached papers – only 3 solutions.

2- Compare these solutions in a table.

2016 Online International Conference on Green Engineering and Technologies (IC-GET)
J. Vijayaraj, R. Saravanan, P. Victer Paul, R. Raju
Department of Information Technology
Sri Manakula Vinayagar Engineering College, Puducherry, India
vijayaraj.ajay@gmail.com, rsaravan26@gmail.com, victerpaul@gmail.com, rajupdy@gmail.com
Abstract—The emerging technology for handling large numbers of datasets in an efficient manner is big data, which is used across various platforms and domains to support several services and improve system performance reliably. The main weakness in this domain is security, which can easily be broken or bypassed by a user. To address this, the paper provides a detailed survey of the various security mechanisms and methodologies used with the big data technique. These security measures should satisfy basic parameters such as authentication, authorization and confidentiality, so that the system is stronger against different attacks and threats. Various algorithms and techniques can be proposed to improve system protection and enable a reliable platform for the storage of datasets.
Index Terms—Big data, Dataset, Security, Authentication, Authorization, Confidentiality.

Big Data is a new emerging domain in which various types of datasets are grouped into a single unit that can be used to perform several tasks and operations in an organized manner [1]. It is also a storage mechanism in which complex and huge data can be stored at large scale while still being accessed and processed in an efficient way.

The primary aim of big data analytics tools is to help companies make better-informed decisions related to industry or business, by allowing data scientists, predictive modelers and related analytics professionals to examine huge amounts of transaction data with the help of available business intelligence programs. These tools can be used on social media, network activities, customer reports, sensor networks and emails. They also serve various domains such as cloud computing, sensor-based networks, the internet of things, and other information- or data-related areas.

Some of the tools used in big data to improve the efficiency and capability of systems are:

A. Hadoop: The tool delivered by the Apache foundation to run on a parallel, cluster-based platform in big data is Hadoop, an open-source, free-licensed tool [2]. It is used for large-scale storage and processing of complex datasets to improve the performance and speed of the system.

B. MapReduce: This is used for mapping and merging data, and provides high data-processing speed on huge datasets. It originated at Google and is widely used in websites to improve searching and manipulation speed.

C. GridGain: An alternative tool to MapReduce, which uses in-memory processing for real-time systems.

D. HPCC: A high-performance tool used in cluster computing that yields efficiency similar to Hadoop. It is free software operated on the Linux operating system, offering high security.
Hadoop is an open framework developed by the Apache foundation to process large numbers of systems in a reliable manner [3]. It is heavily used in distributed and cluster-based systems in which the demand for data is very high and most operations are carried out in parallel on both kinds of systems. The tool can be deployed in either single-node or multi-node mode as the user wishes, since it is basically designed to run large numbers of datasets and systems on a single server.
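The map-and-merge flow described under MapReduce above can be sketched in miniature. This is a toy illustration of the programming model in plain Python (no Hadoop involved), not how the Hadoop implementation itself is written:

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the input split.
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: merge the grouped values for each key into a final count.
    return {key: sum(values) for key, values in groups.items()}

splits = ["big data needs big tools", "big data security"]
pairs = [pair for split in splits for pair in map_phase(split)]
counts = reduce_phase(shuffle(pairs))
print(counts["big"])  # 3
```

In a real cluster the map and reduce calls run on different nodes and the shuffle moves data over the network; the logical flow, however, is exactly this three-step pipeline.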
This section discusses various Hadoop security techniques proposed by researchers; the corresponding comparison is shown in Table 1.
The Research of the Data Security for Cloud Disk Based on the Hadoop Framework
1) Theme: To address these issues, a network cloud disk safe storage scheme, fully based on the Hadoop system, is proposed in this paper. It rests on security features such as confidentiality and an encryption scheme, and protects the user from effects such as unverified user data and possible leakage of user data privacy.
2) Proposed Model: The model proposed in this paper combines a rapid symmetric encryption algorithm with an identity authentication system based on RSA, together with overtime checking and the performance of Hadoop; the resulting distributed system for a closed network, providing a cloud-based data-security storage disk, can supply a secure, effective and stable service [7]. The security of the scheme is proved with a technique named BAN logic.
3) Experimental Setup: The setup is carried out on the clusters present inside Hadoop, with the cloud domain used as the storage component to store the data securely, and yields a big change when compared to the existing environment.
4) Performance Factors: The performance is compared on the cost time and the number of files uploaded to the cloud.
5) Justification: The existing system does not have enough sophistication in providing security to the cluster nodes. In upcoming versions of the system, the authors plan to implement a more refined technique for encryption and authentication, which will provide more security and enable a large number of users to store data securely.

978-1-5090-4556-3/16/$31.00 ©2016 IEEE
Authorized licensed use limited to: Consortium – Saudi Arabia SDL. Downloaded on January 21, 2021 at 12:28:50 UTC from IEEE Xplore. Restrictions apply.

A Novel Triple Encryption Scheme for Hadoop-based Cloud Data Security
1) Theme: This paper is based on the cloud computing domain, which has been developing in recent years since it can provide a large number of users with on-demand, flexible, reliable and low-cost services; data security protection is a main issue in cloud computing because a large amount of data is stored in the cloud [8].
2) Proposed Model: To ensure data security in cloud data storage, a novel triple encryption scheme is proposed, which combines HDFS file encryption using the DEA technique with data-key encryption using the RSA algorithm, and then encrypts the user's RSA private key using IDEA.
3) Experimental Setup: The setup is based on a hybrid encryption principle consisting of file encryption and decryption, through which security is enhanced.
4) Performance Factors: The performance is compared on the writing speed and the file size in the HDFS system, to check the speed of the system.
5) Justification: As future work, the authors plan to achieve parallel processing of the encryption and decryption using Hadoop in order to improve performance; both parameters can then be achieved at the same time, which yields greater performance than other systems.

A New Solution of Data Security Accessing for Hadoop Based on CP-ABE
1) Theme: The main theme of this paper is to enhance the cloud computing platform using Hadoop. Since user data stored in the cloud is not a controllable domain, the important data must be protected and user confidentiality provided through strong security on the user side. In the traditional public encryption mechanism [9], the encryption resource provider needs to obtain all related information about the user, which damages the user's privacy and increases the bandwidth needed for the user to transmit data securely.
2) Proposed Model: The proposed system solves the issue described above with a new access-control security solution for Hadoop, namely the CP-ABE technique, which uses multiple attributes (a collection of properties) to identify a user rather than only identity information; abstract analysis shows that the CP-ABE-based solution avoids obtaining the client's complete information and improves security for user access rights to files on Hadoop.
3) Experimental Setup: The setup is based on threshold attribute-based encryption, which is used to provide security to each user in the Hadoop system.
4) Performance Factors: The paper provides strong security and confidentiality to users in the Hadoop system, so that complex datasets can be easily optimized.
5) Justification: The proposed CP-ABE-based method has high security and reliability, but the efficiency of the implementation has yet to be improved; this is the authors' next step, which can further improve security in Hadoop and allow processes to run efficiently without failure.

A Survey on Security of Hadoop
1) Theme: The main theme of this paper is that trusted computing and security of services are among the most stimulating topics today, with cloud computing as a core technology in the focus of the international IT universe. Hadoop, an open-source cloud computing [12] and big data framework, is increasingly used in the business world, while the flaws of its security mechanism currently turn out to be one of the major problems hindering its development.
2) Proposed Model: The proposed system develops a new architecture, namely a Hadoop security architecture based on GSI, a standard for grid security. It provides a single sign-on method and genuine messaging, using asymmetric cryptography as the base for its functionality, which enhances all the security features present in the Hadoop system.
3) Experimental Setup: This paper uses a setup of the GSI architecture, which provides security to the cluster and the users involved in the Hadoop system [13].
4) Performance Factors: The paper provides a new architecture with improved features, which improves all the performance parameters in the security system.
5) Justification: The scalability factor of the system is large. Hadoop is a distributed file system, so a file is partitioned and distributed through the cluster; the next job execution may occur on a node different from the one on which the user was authenticated and the job was submitted.
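The single sign-on idea in the GSI-based architecture (sign once with a private key; every node verifies with the shared public key) can be illustrated with textbook RSA. The tiny primes and the token format below are hypothetical, chosen only to make the signature arithmetic visible; they are not part of the surveyed scheme and must never be used in practice:

```python
import hashlib

# Toy textbook RSA with tiny primes -- illustration only.
p, q = 61, 53
n = p * q                           # public modulus
e = 17                              # public exponent
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent

def sign(message: str) -> int:
    # The sign-on server signs a digest of the credential with its private key.
    digest = int.from_bytes(hashlib.sha256(message.encode()).digest(), "big") % n
    return pow(digest, d, n)

def verify(message: str, signature: int) -> bool:
    # Any node in the cluster verifies using only the public key (e, n).
    digest = int.from_bytes(hashlib.sha256(message.encode()).digest(), "big") % n
    return pow(signature, e, n) == digest

token = "user:alice;role:analyst"   # hypothetical credential string
sig = sign(token)
print(verify(token, sig))  # True
```

Because only the sign-on server holds d, every other node can check the credential without being able to forge one, which is what lets a user authenticate once and then be accepted across the cluster.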
Enhancing Security o{ Hadoop in a Public Cloud
1) Theme: Hadoop has gradually become popular as it processes data rapidly in parallel, with more than one program running simultaneously on a single server. It also provides features such as reliability, flexibility, scalability, elasticity and cost savings to cloud users, which makes effective security important in big data [14].
2) Proposed Model: The proposed system develops a security enhancement for public-cloud-based Hadoop, named SE Hadoop, to improve compromise resilience by enhancing the isolation among Hadoop components and enforcing least access privilege for Hadoop processes.
3) Experimental Setup: The setup is built on various components, such as job clients, Node Managers and containers running Application Masters, the Resource Manager, the NameNode, and Kerberos, which enhance security in this model.
4) Performance Factors: The comparison is made between the delegation token and the block token, where SE Hadoop is evaluated for its performance impact relative to the existing Hadoop system used in big data [17].
5) Justification: The SE Hadoop block token does not appear to inflict overhead, and the SE Hadoop delegation token has a very limited performance impact on the existing Hadoop, which can be refined in upcoming versions; the experimental results also showed that migrating Hadoop jobs to SE Hadoop is straightforward.
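The block and delegation tokens discussed above are, at heart, claims bound to a keyed MAC that the issuing node computes and other nodes verify. The following sketch illustrates that general pattern; the key, field names and lifetimes are hypothetical, not SE Hadoop's actual wire format:

```python
import hashlib
import hmac
import json
import time

MASTER_KEY = b"namenode-shared-secret"  # hypothetical key shared with DataNodes

def issue_token(user: str, block_id: str, lifetime_s: int = 600) -> dict:
    # The issuer binds user, block and expiry together, then MACs the claims.
    claims = {"user": user, "block": block_id, "exp": int(time.time()) + lifetime_s}
    payload = json.dumps(claims, sort_keys=True).encode()
    mac = hmac.new(MASTER_KEY, payload, hashlib.sha256).hexdigest()
    return {"claims": claims, "mac": mac}

def check_token(token: dict) -> bool:
    # A verifier recomputes the MAC and checks the expiry before serving data.
    payload = json.dumps(token["claims"], sort_keys=True).encode()
    expected = hmac.new(MASTER_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, token["mac"]) and \
        token["claims"]["exp"] > time.time()

token = issue_token("alice", "blk_1001")
print(check_token(token))               # True
token["claims"]["block"] = "blk_9999"   # tampering invalidates the MAC
print(check_token(token))               # False
```

Tightening which principals may hold which keys, and what each token authorizes, is exactly the isolation and least-privilege enforcement the SE Hadoop work pursues.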
A Secure and Light Weight Authentication Service in Hadoop using One Time Pad
1) Theme: The main theme is to enhance the storage and processing of sensitive data related to credit cards, healthcare, finance and management, which comprise big data. The Hadoop environment is built on HDFS, the Hadoop Distributed File System, a cloud-based, parallel-processing, open framework in big data [15]. Since HDFS usually stores and processes critical sensitive data, there is a strong need for a secure validation service to authenticate and authorize those willing to connect to HDFS. Nivethitha et al. have proposed a new kind of authentication service for Hadoop using a one-time pad, which is not strong against offline password guessing.
2) Proposed Model: The proposed system develops a lightweight OTP-based validation service for the Hadoop environment, termed the Nivethitha scheme, which provides security to both the front end and the back end of the system; its authentication is easy to implement, and it provides robust mechanisms that save users time in accessing resources in a stabilized manner.
3) Performance Evaluation: The setup consists of a registration server where users must register to obtain an OTP, which serves as a security code for accessing resources in the Hadoop system. Performance factors are not used much in this paper, but the technique is somewhat advanced and efficient for web-based users to obtain a secured environment with ease.
4) Justification: The proposed work is less secure, which is overcome by various other techniques; since it is lightweight, it can easily be broken by an outsider. The OTP is not well protected, which lets an attacker obtain it and make the system more vulnerable, allowing other users to gain unauthorized access to individual systems.
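The one-time pad at the core of the scheme above is simple to state: XOR the message with a random, secret pad of at least the same length, used exactly once. A minimal sketch of the mechanism (the challenge string is hypothetical):

```python
import secrets

def otp_encrypt(message: bytes, pad: bytes) -> bytes:
    # XOR each byte with the pad; the pad must be random, secret,
    # at least as long as the message, and never reused.
    assert len(pad) >= len(message)
    return bytes(m ^ k for m, k in zip(message, pad))

# Decryption is the same XOR, so applying the pad twice restores the message.
pad = secrets.token_bytes(32)
cipher = otp_encrypt(b"login-challenge-42", pad)
plain = otp_encrypt(cipher, pad)
print(plain)  # b'login-challenge-42'
```

The weakness noted in the justification follows directly from the construction: anyone who guesses or captures the pad decrypts the challenge with the same XOR, and reusing a pad leaks the XOR of two plaintexts, so the scheme's security rests entirely on the pad staying secret and single-use.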
A Novel Approach for Improving Security and Storage
Efficiency on HDFS
1) Theme: The main theme is that a distributed file system for the storage of enormous files has clear benefits when compared with a conventional file system. Meanwhile, the Hadoop Distributed File System (HDFS), executed on commodity hardware, has benefits such as low cost, high fault tolerance and scalability.
2) Proposed Model: This proposed system has
developed a new architecture based on HDFS, combined
with network coding and multi-node reading, to improve
the security and storage efficiency of the distributed file
system [16].
3) Experimental Setup: The setup consists of an encoding process in which a variety of information is encoded from one form to another, showing how the data are stored in a block organization.
4) Performance Factors: The performance is compared based on the number of nodes and the file size, showing that the proposed system is more efficient than the present system in terms of storage and security [19].
5) Justification: In this proposed system, a huge amount of storage is required by the user, which makes accessing the data from the nodes more inconvenient; the improvement over the existing method is very small.
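The storage-efficiency idea behind coded blocks can be seen in its simplest special case, XOR parity: a coded block lets any one lost data block be rebuilt from the survivors. Real network coding combines blocks with coefficients over a finite field, but this toy captures the recovery principle:

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    # Byte-wise XOR of two equal-sized blocks.
    return bytes(x ^ y for x, y in zip(a, b))

# Two data blocks and one coded (parity) block, all the same size.
block1 = b"AAAABBBB"
block2 = b"CCCCDDDD"
parity = xor_blocks(block1, block2)   # stored on a third node

# If the node holding block2 fails, its content is rebuilt from the survivors.
recovered = xor_blocks(parity, block1)
print(recovered == block2)  # True
```

Multi-node reading then amounts to fetching the surviving pieces in parallel, which is how the surveyed architecture trades a little encoding work for better security and storage efficiency.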
Table 1. Comparison of the surveyed Hadoop security techniques

1. Paper: The Research of the Data Security for Cloud Disk Based on the Hadoop Framework
   Problems discussed: The distributed network cloud disk, such as transmission and storage security; how to protect the important data of users.
   Proposed technique: A network cloud disk safe storage scheme with encryption of data in the network; after checking the performance of Hadoop, the distributed network cloud data-security storage disk can supply a secured, effective and stable service.

2. Paper: A Novel Triple Encryption Scheme for Hadoop-based Cloud Data Security
   Problems discussed: Data security protection in a cloud whose ability is to provide users with on-demand, flexible, reliable and low-cost services.
   Proposed technique: The triple encryption scheme in Hadoop-based cloud data storage: HDFS file encryption using DEA symmetric encryption, data-key encryption with RSA, and the user's RSA private key encrypted using IDEA.

3. Paper: A New Solution of Data Security Accessing for Hadoop Based on CP-ABE
   Problems discussed: User data stored in the cloud is not controllable; in the traditional public encryption mechanism, the encryption resource provider needs to obtain all relevant information about the user, which certainly damages the user's privacy and needs more bandwidth and large processing overhead.
   Proposed technique: A CP-ABE-based solution that avoids obtaining the user's complete information and enhances security for users accessing files on Hadoop.

4. Paper: A Survey on Security of Hadoop
   Problems discussed: Trusted computing and security of services; enhancing Hadoop's trust and security and, based on the previous descriptions, concluding on Hadoop's security.
   Proposed technique: G-Hadoop security based on GSI, a standard for grid security, which provides a single sign-on process and authentic communication using asymmetric cryptography.

5. Paper: Enhancing Security of Hadoop in a Public Cloud
   Problems discussed: The vulnerabilities are the overloaded authentication key and the lack of fine-grained access control at the data-access level.
   Proposed technique: Improving compromise resilience by enhancing isolation among Hadoop components and enforcing least access privilege for Hadoop processes, giving reliability, flexibility, scalability, elasticity and cost savings to cloud users.

6. Paper: A Secure and Light Weight Authentication Service in Hadoop using One Time Pad
   Problems discussed: A great need for a secure authentication service to authenticate and authorize users connecting to HDFS; the Nivethitha et al. scheme is vulnerable to an offline password-guessing attack, on success of which an attacker can perform all major attacks on the system.
   Proposed technique: A lightweight OTP-based authentication service, proposed as the first of its kind for Hadoop using a one-time pad.

7. Paper: A Novel Approach for Improving Security and Storage Efficiency on HDFS
   Problems discussed: The potential safety hazard due to unencrypted data stored in the DataNode, which may cause data leakage.
   Proposed technique: A special decoding mode and multi-node reading: a new architecture based on HDFS, combined with network coding, to improve the security and storage efficiency of the distributed file system; storage efficiency may improve noticeably through the decrease of the NameNode's workload for data encoding.
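The triple encryption scheme surveyed above layers three ciphers: a symmetric cipher over the file, RSA over the file key, and a second symmetric cipher over the RSA private key. The sketch below illustrates only that layering; a toy SHA-256 keystream stands in for DEA/IDEA, and small-prime textbook RSA stands in for real RSA, so none of it is usable as actual cryptography:

```python
import hashlib
import secrets

def stream_xor(data: bytes, key: bytes) -> bytes:
    # Toy keystream cipher standing in for DEA/IDEA -- illustration only.
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))

# Toy textbook RSA key pair (never use such small parameters in practice).
p, q, e = 61, 53, 17
n, d = p * q, pow(e, -1, (p - 1) * (q - 1))

file_key = secrets.randbelow(n - 2) + 2               # layer 1: symmetric file key
ciphertext = stream_xor(b"hdfs block data", file_key.to_bytes(4, "big"))
wrapped_key = pow(file_key, e, n)                     # layer 2: RSA-wrap the file key
protected_d = stream_xor(d.to_bytes(4, "big"), b"user-passphrase")  # layer 3

# Decryption unwinds the layers in reverse order.
d2 = int.from_bytes(stream_xor(protected_d, b"user-passphrase"), "big")
key2 = pow(wrapped_key, d2, n)
print(stream_xor(ciphertext, key2.to_bytes(4, "big")))  # b'hdfs block data'
```

The design point is that bulk data is encrypted with a fast symmetric key, the slow asymmetric cipher touches only that small key, and even the RSA private key is not stored in the clear.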
The main issue in big data is security, which is not completely fulfilled in any of the existing papers. In the surveyed papers, no single technique provides security measures at a very large scale with a great impact on big data datasets. Each technique yields a performance that is not sufficient to handle complex operations, while at the same time increasing the complexity of the system. Most security issues depend on how the user handles the dataset in a well-defined and organized manner. To enhance security, different schemes can be proposed and implemented using encryption algorithms and new security techniques for authorization and authentication, which are the most important parameters for basic security needs.
[1] A. Katal, M. Wazid, R. H. Goudar, "Big data: Issues, challenges, tools and good practices", IEEE Sixth International Conference on Contemporary Computing (IC3), 2013, pp. 404-409.
[2] Fazal-e-Amin, A. S. Alghamdi, I. Ahmad, T. Hussain, "Big data for C4i systems: goals, applications, challenges and tools", IEEE Fifth International Conference on Innovative Computing Technology (INTECH), 2015, pp. 89-93.
[3] S. P. Menon, N. P. Hegde, "A survey of tools and applications in big data", IEEE 9th International Conference on Intelligent Systems and Control (ISCO), 2015, pp. 1-7.
[4] A. K. Tiwari, H. Chaudhary, S. Yadav, "A review on Big Data and its security", International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS), 2015, pp. 1-5.
[5] James Nunns, "10 of the most popular Big Data tools for", last accessed 24 November 2015; Douglas Laney, "The Importance of 'Big Data': A Definition", Gartner, last accessed 24 November 2015.
[6] "Research in Big Data and Analytics: An Overview", International Journal of Computer Applications (0975-8887), Volume 108, No. 14, December 2014.
[7] Sofiya Mujawar, Aishwarya Joshi, "Data Analytics Types, Tools and their Comparison", TTJARCE, 2015, Vol. 4, Issue 2, pp. 488-491.
[8] Peter Wayner, "7 top tools for taming big data", last accessed 24 November 2015.
[9] Amit Gupta, "Top 10 Open Source Big Data Tools", last accessed 24 November 2015.
[10] Huang Jing, Li Renfa, Tang Zhuo, "The Research of the Data Security for Cloud Disk Based on the Hadoop Framework", International Conference on Intelligent Control and Information Processing, June 9-11, 2013.
[11] Chao Yang, Weiwei Lin, Mingqi Liu, "A Novel Triple Encryption Scheme for Hadoop-based Cloud Data Security", International Conference on Emerging Intelligent Data and Web Technologies, 2013.
[12], [13] Huixiang Zhou, Qiaoyan Wen, "A New Solution of Data Security Accessing for Hadoop Based on CP-ABE", International Journal of Computer Applications, 2014.
[14] R. Baskaran, P. Victer Paul, P. Dhavachelvan, "Algorithm and Direction for Analysis of Global Replica Management in P2P Network", IEEE International, 2012.
[15] R. Baskaran, P. Victer Paul, P. Dhavachelvan, "Ant Colony Optimization for Data Cache Technique in MANET", International Conference on Advances in Computing (ICADC 2012), Advances in Intelligent and Soft Computing series, Volume 174, Springer, June 2012, pp. 873-878, ISBN: 978-81-322-0739-9.
[16] B. Saraladevi, N. Pazhaniraja, P. Victer Paul, M. S. Saleem Basha, P. Dhavachelvan, "Big Data and Hadoop - a Study in Security Perspective", Procedia Computer Science, Volume 50, 2015, pp. 596-601, ISSN 1877-0509.
[17] S. Saranya, M. Sarumathi, B. Swathi, P. Victer Paul, S. Sampath Kumar, T. Vengatlaraman, "Dynamic Preclusion of Encroachment in Hadoop Distributed File System", Procedia Computer Science, Volume 50, 2015, pp. 531-536, ISSN 1877-0509.
[18] M. Thamizhselvan, R. Raghuraman, S. G. Manoj, P. Victer Paul, "Data security model for Cloud Computing using V-GRT methodology", IEEE 9th International Conference on Intelligent Systems and Control (ISCO), Jan 2015, India.
[19] M. Thamizhselvan, R. Raghuraman, S. G. Manoj, P. Victer Paul, "A novel security model for cloud using trusted third party encryption", International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS), March 2015, India, pp. 1-5.
Received October 23, 2018, accepted November 19, 2018, date of publication November 23, 2018,
date of current version December 27, 2018.
Digital Object Identifier 10.1109/ACCESS.2018.2883105
HEAP: An Efficient and Fault-Tolerant
Authentication and Key Exchange Protocol for
Hadoop-Assisted Big Data Platform
D. CHATTARAJ 1, ASHOK KUMAR DAS 2 (Senior Member, IEEE), NEERAJ KUMAR 3 (Senior Member, IEEE),
JOEL J. P. C. RODRIGUES 4,5,6 (Senior Member, IEEE), YOUNGHO PARK 7
1 Subir Chowdhury School of Quality and Reliability, IIT Kharagpur, Kharagpur 721 302, India
2 Center for Security, Theory and Algorithmic Research, International Institute of Information Technology, Hyderabad 500 032, India
3 Department of Computer Science and Engineering, Thapar University, Patiala 147 004, India
4 National Institute of Telecommunications, Santa Rita do Sapucaí 37540-000, Brazil
5 Instituto de Telecomunicações, 1049-001 Aveiro, Portugal
6 University of Fortaleza, Fortaleza 60811-905, Brazil
7 School of Electronics Engineering, Kyungpook National University, Daegu 41566, South Korea
Corresponding author: Youngho Park (parkyh@knu.ac.kr)
This work was supported in part by the Basic Science Research Program through the National Research Foundation of Korea funded by the
Ministry of Science, ICT & Future Planning under Grant 2017R1A2B1002147, in part by the BK21 Plus Project funded by the Ministry of
Education, South Korea, under Grant 21A20131600011, in part by Finep/Funttel through the Radiocommunication Reference Center
project of the National Institute of Telecommunications, Brazil, under Grant 01.14.0231.00, in part by the National Funding from the
Fundação para a Ciência e a Tecnologia under Project UID/EEA/500008/2013, and in part by the Brazilian National Council for Research
and Development (CNPq) under Grant 309335/2017-5. This work was also supported by the Ministry of Human Resource Development,
Government of India (to carry out this research work at the Subir Chowdhury School of Quality and Reliability, IIT Kharagpur), through
the Institute Fellowship.
ABSTRACT The Hadoop framework has evolved to manage big data in the cloud. The Hadoop distributed file system and MapReduce, the vital components of this framework, provide scalable and fault-tolerant big data storage and processing services at a lower cost. However, Hadoop does not provide any robust mechanism for principals' authentication. In fact, the existing state-of-the-art authentication protocols are vulnerable to various security threats, such as man-in-the-middle, replay, password guessing, stolen-verifier, privileged-insider, identity compromisation, impersonation, denial-of-service, online/off-line dictionary, chosen plaintext, workstation compromisation, and server-side compromisation attacks. Besides these threats, the state-of-the-art mechanisms fail to address server-side data integrity and confidentiality issues. In addition, most of the existing authentication protocols follow a single-server-based user authentication strategy, which in fact introduces single-point-of-failure and single-point-of-vulnerability issues. To address these limitations, in this paper we propose a fault-tolerant authentication protocol suitable for the Hadoop framework, called the efficient authentication protocol for Hadoop (HEAP). HEAP alleviates the major issues of the existing state-of-the-art authentication mechanisms, namely operating-system-based authentication, password-based approaches, and delegated-token-based schemes, which are presently deployed in Hadoop. HEAP follows a two-server-based authentication mechanism and authenticates the principal based on a digital signature generation and verification strategy, utilizing both the advanced encryption standard and elliptic curve cryptography. The security analysis, using both formal security under the broadly accepted real-or-random (ROR) model and informal (non-mathematical) security arguments, shows that HEAP protects against several well-known attacks. In addition, formal security verification using the widely used automated validation of Internet security protocols and applications ensures that HEAP is resilient against replay and man-in-the-middle attacks. Finally, the performance study shows that the overheads incurred in HEAP are reasonable and comparable to those of other existing state-of-the-art authentication protocols. High security along with comparable overheads makes HEAP robust and practical for secure access to big data storage and processing services.
2169-3536 2018 IEEE. Translations and content mining are permitted for academic research only.
Personal use is also permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
VOLUME 6, 2018
D. Chattaraj et al.: HEAP: Efficient and Fault-Tolerant Authentication and Key Exchange Protocol
INDEX TERMS Cloud computing, authentication, key agreement, big data security, Hadoop, formal security, AVISPA.
The digital universe is expanding by a factor of 300, from 130 exabytes in 2005 to 40,000 exabytes, or 40 trillion gigabytes (more than 5,200 gigabytes for every man, woman, and child), in 2020. From now until 2020, the digital universe will roughly double every two years.1 This, of course, draws the attention of many researchers and practitioners to the Big Data storage and processing problem. To deal with it, various distributed file systems, namely the Hadoop Distributed File System (HDFS) [1], the Google File System (GFS) [2], MooseFS,2 zFS [3], Ceph [4], etc., have evolved. Among these, owing to its popularity, simplicity and easy availability (open source), HDFS (the principal component of the Hadoop framework 3 ) is widely used in industry and has become the de facto standard platform for Big Data storage. HDFS provides a storage service in which data is reliably kept in a distributed fashion across different servers. In this setting, a client stores Big Data on geographically dispersed remote third-party servers through an insecure channel. This raises several security concerns, as the storage service is accessed over an insecure communication channel. Toward a solution, a robust authentication mechanism is preferred, and different authentication protocols have been proposed, such as Kerberos,4 OAuth,5 OpenID Connect,6 SAML,7 etc. For the sake of simplicity, scalability and applicability, operating-system-based security (i.e., a password-based approach), the Kerberos authentication protocol (i.e., a password-with-possession-based approach) and delegated-token-based approaches are currently employed in Hadoop to enhance its security [5]–[9].
To access Big Data storage or processing services over the Internet, an end user (or service server) must initially enroll with the Key Distribution Center (KDC) or a Centralized Registration Authority (CRA) offline. In a centralized registration mechanism, it is difficult to update the secret credentials of users and Hadoop clusters vis-à-vis service servers dynamically. After enrollment, the end user can access Big Data storage or processing services from the service server remotely over the Web. Usually, in such a setting, the KDC (or CRA) stores the secret information of all the principals in its database, where a single point of vulnerability and a single point of failure can jeopardize the whole system [10]. To address these issues, many schemes have been reported in [5] and [11]–[30] that
1 http://www.emc.com/leadership/digital-universe/2012iview/executivesummary-a-universe-of.htm
2 MooseFS: Can Petabyte Storage be super efficient. https://moosefs.com/
3 Apache Hadoop: https://hadoop.apache.org/
4 http://web.mit.edu/kerberos/
5 http://www.oauth.net/
6 http://openid.net/
7 saml.xml.org/
are based on different techniques namely, smart card
based approaches [18], [20], [27]–[29], one-time padding
[13], [14], PKI (Public-Key Infrastructure) based approach
[24], [31], implementation of a Trusted Computing
Platform (TCP) [15], combination of both password and
possession based strategy [25], [26], [30], authorization delegation based approach [22], [32], combined public and
private key cryptography with random number generator
based scheme [12], utilizing basic geometry structure
based password storing [33] and Identity-Based Authentication (IBA) scheme [23], respectively. However, these
explications are either expensive in terms of extra hardware cost or computationally intensive. Further, the existing
approaches [24]–[26], [30] enrol an end user (or service
server) by asking his username and password (or service
server identity and secret key), where the username (or service
server identity) is used as the primary credential, which is
verified at the time of mutual authentication between user
and service server respectively. In fact, selecting a username
(or service server identity) is not enough to be considered as
a strong private identifier. As a result, an adversary can easily
mount different attacks, such as impersonation attacks
and identity compromisation attacks, by sniffing the username
(or service server identity) from the insecure media [10].
Moreover, these approaches do not consider the user-side
and service server-side identity untraceability and anonymity
properties. Furthermore, the existing password-based
user enrollment strategy [30] which is currently incorporated
in Hadoop is vulnerable to password guessing, online or
offline dictionary and stolen-verifier attacks. Additionally,
the existing approach [30] derives the client's secret key as the
hash value of its password. Therefore, the key will remain the
same until the client changes the current password. However,
changing this password needs updating the enrolled data
maintained by the KDC (or CRA), and this, in fact, invites
many key rollover problems [10]. In addition to this, man-in-the-middle, privileged-insider, denial-of-service, workstation
compromisation, chosen plaintext and replay attacks are the
key security threats that are not properly addressed in the
existing schemes [10].
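The static key-derivation issue can be made concrete with a short sketch (hypothetical helper names, standard library only; this is our illustration, not the construction of [30]):

```python
import hashlib
import secrets

def static_key(password: str) -> bytes:
    # Key derived purely from the password: identical in every session,
    # and it changes only when the password itself changes.
    return hashlib.sha256(password.encode()).digest()

def session_key(password: str, nonce: bytes) -> bytes:
    # Mixing a fresh random nonce yields a new key per session, so a
    # leaked session key does not expose past or future sessions.
    return hashlib.sha256(password.encode() + nonce).digest()

k1 = static_key("hunter2")
k2 = static_key("hunter2")
assert k1 == k2  # same key until the password changes

n1, n2 = secrets.token_bytes(16), secrets.token_bytes(16)
s1 = session_key("hunter2", n1)
s2 = session_key("hunter2", n2)
assert s1 != s2  # fresh key in each session
```

Because `static_key` depends only on the password, rotating the key forces a password change and an update of the enrolled data at the KDC (or CRA), which is exactly the key rollover coupling described above.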
In order to ensure mutual authentication and session key
distribution between end user and service server (Namenode
or JobTracker), in the possession-based (also called token-based)
approach, a trusted server distributes a token carrying a large number of authentication parameters; that is, many parameters
are included in the constitution of an Authentication Token
(AT) and an authorization token (or Service Token (ST)). Hence,
AT and ST verification increases the overhead on the existing authorization server (or service server). In addition to
this, tokens and session keys are stored in the user's credential cache [24]–[26], [30] in the respective workstation, and
each token has its own lifetime. So, this leads to workstation
compromisation attacks, disclosure of the session key, as well as
D. Chattaraj et al.: HEAP: Efficient and Fault-Tolerant Authentication and Key Exchange Protocol
misuse of tokens. Moreover, an end user blindly accepts the
authentication services, that is, he completely relies on the
shared secret session key issued by the trusted third party server (KDC's AS)
without verifying the strong authenticity of the
AS. Therefore, if the AS is compromised by a malicious
insider, a byzantine attack can be induced into the system,
which can falsify the primitive operations and can also lead
to erroneous outcomes [10]. Furthermore, some mutual authentication schemes [24]–[26], [30] use time synchronization for
joint authentication between end user and the service servers.
More precisely, all principals in a realm must be synchronized
with a centralized time server. In fact, this is an overhead for
the implementation of the protocol. In addition to this, clocks
in a distributed system may not always be synchronized, so this
may enable a replay attack against both the end user and the service
server [34]. Additionally, in the existing authentication
approaches [24]–[26], [30], a user blindly trusts the authentication server (AS) without verifying any cross parameters
(e.g., message authentication codes, server-side generated
one-way hash chain based one-time identifiers [16], digital
signatures [10], etc.) after receiving the authentication token.
To the best of our knowledge, there is no solution to verify the
originality of AS except the timestamps and visualization of
password (or session key protected authentication or service
token) [10]. Hence, this shortcoming opens a possibility of
impersonation attacks [34], where a compromised principal
can falsify the basic operations of the authentication system. In addition to this, in Hadoop, there is no provision to
verify the data integrity and confidentiality after archiving
end user's or organization's Big Data into HDFS. Since the
raw data blocks (constructed from the Big Data) are
stored in various Datanodes in plaintext format, it is easy
for an adversary to modify the content of the data
blocks [5]–[9], [35]–[41]. Additionally, the result of the Big Data processed using the MapReduce framework is stored
in the end user's local file system, so anybody can read this
content. To the best of our knowledge, no solution
exists to mitigate these issues.
To address the aforesaid issues and challenges of the existing
authentication schemes that assist security in the Hadoop framework, we set the following objectives for the proposed scheme:
1) The proposed protocol should have the capability
to enroll the Hadoop cluster's service servers online
(instead of centralized deployment of service servers)
with the authentication service provider, thereby
addressing the scalability issue.
2) The proposed protocol should prevent different
well-known attacks, such as man-in-the-middle,
replay, denial-of-service, privileged-insider, impersonation, identity compromisation, ciphertext-only,
server-spoofing and chosen plaintext attacks.
3) The proposed scheme should have a fault-tolerant and
dependable authentication architecture to address the
existing SOV and SOF issues.
4) In the proposed scheme, the authentication task for both
the Big Data technology provider and the user should be more
robust and user-friendly, requiring less usage of security credentials and hardware (smart card, biometric
scanner, smart mobile device, etc.).
5) The proposed protocol should disseminate securely the
session key between two communicating parties.
6) The proposed scheme should provide a mechanism to
read, write and process the user’s Big Data securely in
Hadoop cluster.
7) The proposed protocol should support user and service
server anonymity by hiding their original identities
from eavesdroppers and privileged-insiders.
8) The proposed scheme should have a provision to generate a fresh session key securely in each session to
mitigate the workstation compromisation attack.
9) The proposed protocol should have the capability
to store the password dictionary securely at the
server side to mitigate the offline dictionary, password
guessing and stolen-verifier attacks.
10) The proposed scheme should be able to establish the session key between two communicating parties without
utilizing timestamps.
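Objective 9 above is conventionally met by storing salted, iterated password verifiers instead of raw passwords; the following is a minimal sketch using Python's standard library (illustrative only, not the verifier construction used in the proposed scheme):

```python
import hashlib
import hmac
import secrets

def make_verifier(password: str) -> tuple[bytes, bytes]:
    """Store (salt, verifier) server-side instead of the raw password."""
    salt = secrets.token_bytes(16)
    # Iterated, salted hashing makes offline dictionary and password
    # guessing attacks against a stolen verifier table expensive.
    verifier = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, verifier

def check_password(password: str, salt: bytes, verifier: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    # Constant-time comparison avoids timing side channels.
    return hmac.compare_digest(candidate, verifier)

salt, verifier = make_verifier("correct horse")
assert check_password("correct horse", salt, verifier)
assert not check_password("wrong guess", salt, verifier)
```

A stolen-verifier attacker must still invert the salted, iterated hash per password, which is what distinguishes this from storing hash(password) directly.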
To fulfill the above objectives, a two-server based authentication framework has been introduced. This framework is
structured in such a way that it mitigates the single point
of failure (SOF) and single point of vulnerability (SOV)
issues. Further, the proposed framework resists various well
known security threats, such as man-in-the-middle, replay,
privileged-insider, Denial-of-Service (DoS), chosen plaintext, password guessing, identity compromisation, impersonation, stolen-verifier, server spoofing, offline dictionary
and workstation compromisation attacks. According to the
policy of the proposed framework, a service provider can
enrol any number of Hadoop clusters vis-a-vis service servers
online with the Key Distribution Center (KDC). In this framework, the proposed KDC consists of three different servers.
Among them, two are public servers (one server interacts with
clients only and the other communicates with Big Data service
providers) and the third is a private server. Initially, clients
and Big Data service providers enrol themselves offline with
the private server. After offline registration, both client and
service provider would be eligible for online registration through
the respective KDC's public server. As the private server
is hidden from universal access, it ensures the server-side
security. In the proposed framework, after online registration,
the service providers are able to enroll their service servers
(Namenode servers or Job Trackers) online with the KDC. So,
the service server registration is simple and scalable in nature.
Meanwhile, the clients are able to communicate directly
with the service servers (Namenode Servers or Job Trackers)
in a Hadoop cluster after establishing a secret key with the
KDC's public server followed by a two-server based mutual
authentication process (we call it single sign-on). This single sign-on facility provides access to any number of service servers by accomplishing only a one-time authentication
with the KDC. As a solution to the server spoofing and
DoS attacks, we consider here a two-factor (password and
authorization token) based authentication strategy. To preserve privacy, in our scheme, we make the original identity
of users and service servers fully anonymous. To enhance the
robustness and correctness of entity authentication, we propose a new digital signature based entity verification scheme
utilizing both symmetric and asymmetric key cryptography.
As a remedy against chosen plaintext attacks, we use the stateless
Cipher Block Chaining (CBC) mode of symmetric encryption
and decryption, where a random nonce is utilized as
the Initial Vector (IV). To establish a secure session between
two communicating parties, we propose a new pair-wise session key agreement policy using elliptic curve cryptography.
As a solution to client-side workstation compromisation
attacks, we store the client's secret information indirectly in a
private place (server-side), from where it can later be fetched by
legitimate users only. Moreover, to check the integrity
of data blocks which reside in various chunk servers or
Datanodes in HDFS, Hash-based Message Authentication
Code (HMAC) based secure HDFS-read and HDFS-write
operations have been introduced.
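The HMAC-protected read/write idea can be sketched as follows (a simplification with a single assumed integrity key, not the full HDFS-read/HDFS-write procedure of the proposed scheme):

```python
import hashlib
import hmac
import secrets

KEY = secrets.token_bytes(32)  # assumed shared integrity key

def write_block(data: bytes) -> tuple[bytes, bytes]:
    """Store a data block together with its HMAC tag."""
    tag = hmac.new(KEY, data, hashlib.sha256).digest()
    return data, tag

def read_block(data: bytes, tag: bytes) -> bytes:
    """Verify the tag before returning the block's content."""
    expected = hmac.new(KEY, data, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("data block tampered with")
    return data

block, tag = write_block(b"raw HDFS data block")
assert read_block(block, tag) == b"raw HDFS data block"
try:
    read_block(b"modified block", tag)   # an adversary altered the block
except ValueError:
    pass                                  # tampering is detected
```

Without the key, an adversary who modifies a plaintext block stored on a Datanode cannot produce a matching tag, so the modification is detected on read.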
The major research contributions devised in this paper are
listed below.
• We propose a new secure and scalable enrollment
methodology to register a cluster of service servers with
the trusted third party server by eliminating traditional
in-house (centralized) service server registration policy.
• We then introduce a new fault tolerant authentication
framework to provide dependable authentication services for remote clients.
• Next, we propose a new digital signature based mutual
authentication policy, where each principal is able to
verify the legitimacy of other intended principals along
with the trusted third party on which both the principal
rely on.
• We also introduce a new approach for security credentials distribution and replication policies in order to mitigate server-side single point of failure and single point
of vulnerability issues.
• To distribute the session key securely between two
intended principals, we then propose an elliptic curve
cryptography based session key distribution policy by
utilizing the concept of in-memory caching.
• In addition, we propose a mechanism to disseminate
the session key between two communicating entities
without compromising their identities.
• The extensive formal security inspection utilizing the de
facto Real-Or-Random (ROR) model and the informal
security analysis substantiate that the proposed protocol
can address various well-known attacks against active
and passive adversaries.
• The formal security verification using the widely-used
AVISPA tool has been carried out for the proposed
protocol, and the AVISPA simulation results indicate that
the proposed scheme is secure against man-in-the-middle and replay attacks.
• To enhance security, the proposed protocol has a facility to dynamically update the user's password and the service
server's secret credentials online with the help of the
authentication servers.
• Finally, the proposed scheme is user-friendly in nature,
and the user needs to remember only his/her identity
and password to login into the proposed authentication system.
The remainder of the paper is structured as follows. We discuss the network model of HDFS in Section II. Section III
presents the related work associated with the entity authentication in Hadoop. The necessary related mathematical preliminaries are discussed in Section IV, which are helpful for
describing and analyzing the proposed protocol. In Section V,
we demonstrate the proposed scheme. Section VI presents
both formal security analysis using the widely-accepted Realor-Random (ROR) model and informal security analysis of
the proposed protocol. In Section VII, we simulate the proposed protocol under the broadly-used On-the-Fly Model
Checker (OFMC) and SAT-based Model Checker (SATMC)
backends by utilizing the AVISPA tool and summarize the
attack traces. Section VIII presents the performance analysis
of the proposed scheme. In Section IX, we elaborate a few
appealing features as realizations of the proposed scheme.
Finally, we conclude the paper in Section X.
Apache Hadoop8 is an open-source framework that provides a new way
of storing and processing Big Data. It consists of two core
components. The former is the File Store (FS) and the latter
is a Distributed Processing System (DPS). The FS is called
HDFS9 and the DPS is termed MapReduce.10 HDFS is
a distributed file system designed for storing very large files
with streaming data access patterns, running on clusters of
commodity hardware. Files are divided into blocks (default
block size is 64 MB) and blocks are replicated and stored at
different chunk servers (also called slave servers). The basic
architecture of HDFS is shown in Figure 1.
Note that here we have shown only two Namenode
servers (NSs) and one JobTracker (JT) in Figure 1, but in a
practical HDFS federation architecture11 it could vary up to n
number of such servers. Intuitively, the three noteworthy classifications of machine roles in a single Hadoop Cluster (HC)
are client machine (i.e., HDFS Client), Master Node (MN)
(i.e., combination of Namenode and Job Tracker) and Slave
Nodes (SNs) (i.e., combination of Datanodes (DNs) and
8 https://hadoop.apache.org/releases.html
9 https://hortonworks.com/apache/hdfs/
10 https://hortonworks.com/apache/mapreduce/
11 https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoophdfs/Federation.html
FIGURE 1. System architecture of HDFS Federation (new release).
Task Trackers (TTs)) (see Figure 1). All these components
are connected through a communication network. In this
architecture, for a single Hadoop Cluster say HCj , the MN
regulates two useful functions i.e., reliable data storage
using HDFS and parallel computations utilizing the MapReduce framework. The Namenode manages and facilitates
distributed data storage services, whereas the JT administers
the parallel processing of stored data utilizing MapReduce
(MR). Moreover, SNs are responsible for streamlined data
block storage and for running the parallel computations over
the stored data blocks. Each SN runs both Datanode and TT
daemon and receives instructions from their MN. The TT
daemon is a slave to the JT and the Datanode daemon acts as a
slave to the Namenode. The client machine has Hadoop installed
with all the cluster settings; however, it is neither a master nor
a slave. Rather, the role of the client machine is to load data
into the cluster, submit MapReduce jobs describing how that
data ought to be processed, and, after that, retrieve or view the
results of the job when it is completed.
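The block-splitting and replication behaviour described above can be approximated by a small sketch (round-robin placement is our simplification; real HDFS uses a rack-aware placement policy):

```python
# Simplified model of HDFS block splitting and replica placement.
BLOCK_SIZE = 64 * 1024 * 1024   # default block size: 64 MB
REPLICATION = 3                 # default replication factor

def place_blocks(file_size: int, datanodes: list[str]) -> list[dict]:
    """Split a file into fixed-size blocks and assign replicas round-robin."""
    n_blocks = (file_size + BLOCK_SIZE - 1) // BLOCK_SIZE  # ceiling division
    layout = []
    for b in range(n_blocks):
        replicas = [datanodes[(b + r) % len(datanodes)]
                    for r in range(REPLICATION)]
        layout.append({"block": b, "replicas": replicas})
    return layout

# A 200 MB file yields ceil(200/64) = 4 blocks, each on 3 Datanodes.
layout = place_blocks(200 * 1024 * 1024, ["dn1", "dn2", "dn3", "dn4"])
assert len(layout) == 4
assert layout[0]["replicas"] == ["dn1", "dn2", "dn3"]
```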
The Hadoop framework has evolved to manage a massive volume of data in the Cloud. However, Hadoop does not
provide any robust authentication mechanism for principals' authentication [5]–[7], [30], [34]. Since little literature is
available in this domain, the related work illustrated here is twofold: first, we discuss the state-of-the-art
authentication protocols and their variants that have
actually been studied in the Hadoop (Big Data) platform, and then
we present the recent development of Authenticated Key
Exchange (AKE) protocols in the domain of the Cloud Computing platform as well as two-server based Password-assisted
Authenticated Key Exchange (PAKE) schemes.
A limited number of authentication and key exchange
protocols [5], [11], [13]–[17], [22]–[24], [30], [32], [42] have
been found in this category.
Shen et al. [15] have proposed a theoretical prototype system combined with a trusted platform support service. In their
scheme, they have used a Trusted Computing Platform (TCP)
to resolve the process of authentication in Hadoop. In TCP,
the user's identity is preserved: it is encrypted with the user's
personal key, and this mechanism is integrated into hardware
such as the BIOS and the TPM. So it is very hard to decipher a
user's identity. The TCP is based on the Trusted Platform Module (TPM). The TPM is used to safeguard the system from
different kinds of hardware and software attacks. The authors have
also pointed out the limitations of their scheme: (i) the stored
data in the Datanodes will be decrypted when being accessed
and re-encrypted with a different key after being accessed,
so the performance of the system will be reduced; (ii) in order to
make the authentication system trusted, some information
needs to be stored among Namenodes, Datanodes and
users; and finally (iii) the TCB needs to fulfill many requirements
on both the server side and the user side, so a bottleneck
situation may arise in the system.
Kohl and Neuman [30] proposed the Kerberos authentication
protocol. In their proposed approach, a user first registers
with the system to avail the services. In this scheme, all the
messages (i.e., authentication messages, service messages)
are first encrypted using a shared secret key between two
parties, and then the two parties communicate with each other
with encrypted forms of messages. It may be noted that in the
Kerberos protocol, a password based approach with a token
based strategy needs to be followed for principal authentication. According to the current practice, a user makes an
authentication request to an authentication server (AS) by
means of a plain text containing ‘‘username’’ [34]. In this
context, an attacker can eavesdrop on the ‘‘username’’ and later
expose himself to the AS as a legitimate user. In other words,
an attacker can easily determine from the transmitted message which users are currently online. In this situation,
an attacker has scope to make man-in-the-middle attacks and
replay attacks [10]. Further, an eavesdropper can make identity compromisation and impersonation attacks by stealing
the ‘‘username’’ if the channel is insecure [34], [43], [44].
Moreover, the AS issues an authentication ticket (AT) to
an end user after verifying only its ‘‘username’’ without
verifying user’s password or other security credentials [10].
However, as ‘‘username’’ is not a confidential credential,
there is an opportunity for an attacker to get multiple authentication tickets by simply sending a ‘‘username’’ to the AS.
As a consequence, a cryptanalyst can decrypt the ciphertexts
(i.e., ATs) using some knowledge about the underlying user's
password. Thus, this scheme is vulnerable to Ciphertext-only
Attack (COA). To avert these challenges, a public key infrastructure based Kerberos, namely PKINIT [24], has been reported and
deployed in Hadoop. However, it does not properly address the
user's and service server's privacy issues and other security
threats [10].
Somu et al. [14] proposed an authentication scheme for
Hadoop and it is based on the encryption mechanism using
a one-time pad key. A random key is used to encrypt the password for secure transmission between the two servers (Registration Server and Back-end Server). The authors claimed
that their protocol makes the Hadoop environment more
secure, as a new random key for encryption is generated for
each login. They also claimed that their scheme reduces the
possibility for an adversary to decrypt the cipher stored on the
server, as doing so requires knowledge of the valid random
key. Sarvabhatla et al. [13] illustrated that Nivethitha et al.'s
scheme is vulnerable to offline password guessing attack and,
on its success, an attacker can perform all major attacks
on HDFS. They proposed a new authentication service for
the Hadoop framework which is lightweight and resists all major
attacks as compared to Nivethitha et al.'s scheme. The authors
also did a comparative analysis of their proposed user
authentication service versus Nivethitha et al.'s scheme and
found that their scheme requires fewer hash operations.
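The one-time random key idea of [14] can be pictured as XOR with a fresh, equal-length key per login (our simplification, not the authors' exact construction):

```python
import secrets

def otp_encrypt(secret: bytes) -> tuple[bytes, bytes]:
    """Encrypt with a fresh random key of equal length (one-time pad)."""
    key = secrets.token_bytes(len(secret))
    cipher = bytes(s ^ k for s, k in zip(secret, key))
    return cipher, key

def otp_decrypt(cipher: bytes, key: bytes) -> bytes:
    # XOR is its own inverse, so applying the key again recovers the secret.
    return bytes(c ^ k for c, k in zip(cipher, key))

c1, k1 = otp_encrypt(b"password")
c2, k2 = otp_encrypt(b"password")
assert k1 != k2                          # a new random key per login
assert otp_decrypt(c1, k1) == b"password"
```

Without the valid random key, the stored ciphertext alone gives an adversary no information, which is the property the authors claim for their scheme.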
For users' job authorization in Hadoop, a hash-based
(MD5 and SHA-1) delegated job token mechanism has
been reported in [5]. OAuth [22], OpenID Connect [32] and
SAML [42] are the newly evolving authorization delegation
based approaches, and they have been prioritized over the traditional
Kerberos protocol for principals' authorization and the single
sign-on capability incorporated in Hadoop.
Rahul and GireeshKumar [12] proposed a novel authentication framework for Hadoop. Their framework uses cryptographic functions such as public key cryptography, private
key cryptography, hash functions, and random number generator. In this framework, they define a new key for each
client and authenticate all clients and services using this key.
They claimed that their authentication framework offers user
data protection, a new way of privilege separations, and basic
security needs for data storing inside HDFS.
Sadasivam et al. [11] proposed a novel authentication protocol for Hadoop in a cloud environment, where they have
used the basic properties of a triangle and a modified two-server-based model to improve the security level of Hadoop
clusters. In their scheme, they have transformed and split
the user-given password using the authentication server and
stored it in multiple back-end servers along with the corresponding username.
Kang and Zhang [23] proposed an Identity-Based Authentication (IBA) scheme which has a short key size and is
identity-based and non-interactive. This scheme divides the sharing users
into the very same domain, and within this domain it relies on a
shared global master key to exercise mutual authentication.
Their IBA scheme can be enabled by an emerging cryptographic technique based on bilinear pairings (i.e., the Weil and
Tate pairings [45]), and its security can be assured by the Bilinear Diffie-Hellman Problem (BDHP). However, the limitation of
this scheme is that, if the global master key is leaked, the total
system will be jeopardized.
Sharma and Navdeti [6] listed various security mechanisms inside the Apache Hadoop stack. According to the authors,
in most cases the Kerberos approach is preferably used
for delivering authentication services.
Srinivas et al. [17] proposed 2PBDC: a privacy-preserving
Big Data collection scheme in cloud environment utilizing
elliptic curve cryptography. The authors show that 2PBDC
offers a better trade-off among security and functionality
features and communication and computation overheads. Aujla
et al. [16] proposed SecSVA: Secure Storage, Verification,
and Auditing of Big Data in the Cloud Environment. The
authors presented an attribute-based secure data deduplication framework for data storage on the cloud, Kerberos-based
identity verification and authentication, and Merkle hash-tree-based trusted third-party auditing on the cloud.
Karla and Sood [46] proposed cookie-based authentication
and key exchange protocol for cloud and IoT environment.
Later, Kumari et al. [47] pointed out the security flaws
of Karla and Sood scheme. Yang et al. [48] proposed an
authentication scheme in a cloud environment setting. However, Chen et al. [49] pointed out security pitfalls in
Yang et al.'s scheme [48], namely that it is vulnerable to insider and
impersonation attacks. To withstand these security loopholes
in Yang et al.’s scheme, Chen et al. then designed a dynamic
ID-based authentication scheme for cloud computing environment, which is based on the elliptic curve cryptography
(ECC). Wang et al. [50] reviewed Chen et al.’s scheme [49],
and proved that their scheme is vulnerable to offline password guessing as well as impersonation attacks. In addition,
it was found that Chen et al.’s scheme does not provide user
anonymity and it also has clock synchronization problem.
Later, Hao et al. [51] presented a time-bound ticket-based mutual authentication scheme for cloud computing.
The purpose of using the time bound tickets is to reduce
the server’s processing overhead. Unfortunately, Jaidhar [52]
identified that Hao et al.’s scheme [51] is insecure against
denial-of-service attack during the password change phase.
Wazid et al. [20] also proposed a provably secure user
authentication and key agreement scheme for cloud computing environment. Their scheme withstands the weaknesses of
the existing schemes and it also supports extra functionality
features, such as user anonymity, and efficient password and
biometric update phase in multi-server environment.
Recently, Gope and Das [53] proposed an anonymous
mutual authentication scheme for ubiquitous mobile cloud
computing services, which allows a legitimate mobile cloud
user to enjoy all the ubiquitous services n times in a secure
and efficient way, where the value of n may differ based on the
amount he or she has paid. In addition, Odelu et al. [21]
reviewed Tsai-Lo’s scheme [54] and pointed out that their
scheme does not provide the session-key security and also
strong user credentials’ privacy. To remove the security weaknesses found in Tsai-Lo’s scheme, Odelu et al. designed
a provably secure authentication scheme for distributed
mobile cloud computing services. In addition to this, various biometric and smartcard based multi-factor authentication protocols [55]–[63] are found in the recent literature for
multi-server environment. In addition to these approaches, various two-server based PAKE schemes [64]–[68] have evolved
to improve server-side dependability (by addressing single
point of failure and single point of vulnerability) and security.

The proposed authentication protocol is based on both asymmetric and symmetric key cryptography. In this context,
in this work, we use Elliptic Curve Cryptography (ECC) and
stateless CBC (cipher block chaining) mode of the Advanced
Encryption Standard (AES) for both public and private key
cryptography, respectively. The cryptographic hardness properties related to the Elliptic Curve Decisional Diffie-Hellman
Problem (ECDDHP), the Elliptic Curve Discrete Logarithm
Problem (ECDLP) and the Indistinguishability of an Encryption
scheme under Chosen Plaintext Attack (IND-CPA) are briefly
explained in the following subsections. The proposed scheme
also utilizes the collision-resistant cryptographic one-way
hash function. This section provides a brief discussion about
the aforesaid mathematical preliminaries as follows.
The indistinguishability of an encryption scheme under chosen
plaintext attack (IND-CPA) [61] is mathematically explained
as follows:
Definition 1 (IND-CPA Secure): Assume SGL or MEL
is a single or multiple eavesdropper/s, respectively, and
OL_{EK1}, OL_{EK2}, · · · , OL_{EKM} are M different independent
encryption oracles related to the encryption keys EK1, EK2, · · · , EKM,
respectively. The advantage functions of SGL
and MEL, respectively, are defined as Adv^{IND-CPA}_{E,SGL}(K) =
|2 · Prob[SGL ← OL_{EK1}; (p0, p1) ←R SGL;
µ ←R {0, 1}; τ ←R OL_{EK1}(pµ) : SGL(τ) = µ] − 1|,
and Adv^{IND-CPA}_{E,MEL}(K) = |2 · Prob[MEL ←
OL_{EK1}, OL_{EK2}, · · · , OL_{EKM}; (p0, p1) ←R MEL; µ ←R
{0, 1}; τ1 ←R OL_{EK1}(pµ), τ2 ←R OL_{EK2}(pµ), · · · , τM ←R
OL_{EKM}(pµ) : MEL(τ1, τ2, · · · , τM) = µ] − 1|. We can
say a symmetric cipher E is IND-CPA secure in the single
or multiple eavesdropper/s setting if Adv^{IND-CPA}_{E,SGL}(K) (or
Adv^{IND-CPA}_{E,MEL}(K)) is negligible in the security parameter K
for any probabilistic and polynomial time adversary SGL (or MEL).
From Definition 1, it is easy to prove that a deterministic encryption scheme is not IND-CPA secure [61]. Further,
there exist five generic modes of symmetric encryption
in the literature, namely Electronic Codebook (ECB),
Output Feedback (OFB), Cipher Block Chaining (CBC),
Cipher Feedback (CFB) and Counter (CTR), respectively.
Among these modes, both the ECB and the stateful CBC
modes are not IND-CPA secure; in particular, in stateful CBC
mode the value of the Initialization Vector (IV) remains constant and is shared between the sender and receiver. But
in stateless CBC mode, the IV value is chosen randomly
for each message block. Thus, we use AES with the stateless
CBC mode of encryption and decryption throughout this
paper so that it becomes IND-CPA secure [61].
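The difference between the two settings can be illustrated with a toy hash-based stream cipher (an analogy only; the actual scheme uses AES in stateless CBC mode): under a fixed IV, equal plaintexts produce equal ciphertexts, which is exactly what IND-CPA forbids.

```python
import hashlib
import secrets

KEY = secrets.token_bytes(16)

def keystream(key: bytes, iv: bytes, n: int) -> bytes:
    # Toy keystream for messages up to 32 bytes (illustration only).
    return hashlib.sha256(key + iv).digest()[:n]

def enc_deterministic(msg: bytes) -> bytes:
    # Fixed IV: equal plaintexts yield equal ciphertexts (not IND-CPA).
    ks = keystream(KEY, b"\x00" * 16, len(msg))
    return bytes(m ^ k for m, k in zip(msg, ks))

def enc_randomized(msg: bytes) -> bytes:
    # Fresh random IV per message, prepended to the ciphertext.
    iv = secrets.token_bytes(16)
    ks = keystream(KEY, iv, len(msg))
    return iv + bytes(m ^ k for m, k in zip(msg, ks))

# The deterministic mode leaks which plaintexts are equal:
assert enc_deterministic(b"attack") == enc_deterministic(b"attack")
# The randomized mode hides even exact repetitions:
assert enc_randomized(b"attack") != enc_randomized(b"attack")
```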
A one-way hash function h: {0, 1}∗ → {0, 1}l takes a
binary string of variable length as input, say x ∈ {0, 1}∗, and
returns a binary string h(x) ∈ {0, 1}l as an output of fixed
length, say l bits. The formal definition of h(·) is provided as
follows [69], [10].
Definition 2 (Collision-Resistant One-Way Hash Function): If an adversary A's advantage in finding a collision
in hash outputs within execution time t is denoted by
Adv^{HASH}_A(t), it is defined by Adv^{HASH}_A(t) = Pr[(x, y) ←R A :
x ≠ y and h(x) = h(y)], where Pr[E] is the probability of an
event E and (x, y) ←R A means the pair (x, y) is randomly
chosen by A. By an (η, t)-adversary A attacking the collision
resistance of h(·), we mean that the execution time of A is
at most t and that Adv^{HASH}_A(t) ≤ η.
Examples of a one-way hash function include the Secure
Hash Standard (SHA-1) hashing algorithm and the stronger
SHA-256 hashing algorithm [70].
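These properties (arbitrary-length input, fixed l-bit output, and a drastically different digest under the smallest input change) can be observed directly with SHA-256 from Python's standard library:

```python
import hashlib

# Inputs of very different lengths map to digests of the same fixed
# length l = 256 bits (64 hex characters).
short = hashlib.sha256(b"x").hexdigest()
long_ = hashlib.sha256(b"x" * 1_000_000).hexdigest()
assert len(short) == len(long_) == 64

# Changing a single character of the input produces an unrelated
# digest, which is what makes collisions infeasible to find in practice.
assert hashlib.sha256(b"message").hexdigest() != hashlib.sha256(b"messagf").hexdigest()
```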
Suppose m, n ∈ Zp, where Zp = {0, 1, . . . , p − 1} and p > 3 is
a prime [10]. A non-singular elliptic curve y² = x³ + mx + n
over the finite field Zp is the set Ep(m, n) of solutions (x, y)
∈ Zp × Zp to the congruence
y² ≡ x³ + mx + n (mod p),
where m, n ∈ Zp such that 4m³ + 27n² ≠ 0 (mod p), together with a
point at infinity or zero point O.
Note that 4m³ + 27n² ≠ 0 (mod p) is a necessary and
sufficient condition to ensure a non-singular solution of the
equation x³ + mx + n = 0 [71]; 4m³ + 27n² = 0 (mod p) implies that
the elliptic curve is singular. Let P = (xP, yP), Q = (xQ, yQ)
∈ Ep(m, n). Then xQ = xP and yQ = −yP when P + Q = O.
Also, P + O = O + P = P, for all P ∈ Ep(m, n). Hasse's
theorem states that the number of points on Ep(m, n), denoted
as #E, satisfies the following inequality [72]:
p + 1 − 2√p ≤ #E ≤ p + 1 + 2√p.
In other words, there are about p points on an elliptic curve
Ep(m, n) over Zp. Also, Ep(m, n) forms a commutative or an
abelian group under the addition modulo p operation.
• Elliptic curve point addition: Let P, Q ∈ Ep(m, n) be
two points on the elliptic curve. Then, R = (xR, yR) =
P + Q is calculated as follows [72]:
xR = (λ² − xP − xQ) (mod p),
yR = (λ(xP − xR) − yP) (mod p),
where λ = (yQ − yP)/(xQ − xP) (mod p) if P ≠ −Q, and
λ = (3xP² + m)/(2yP) (mod p) if P = Q.
• Elliptic curve point scalar multiplication: In ECC,
multiplication is done as repeated additions. For example, 5P = P + P + P + P + P, where P ∈ Ep(m, n).
Definition 3 (ECDLP Assumption): Given an elliptic
curve Ep (m, n) and two points R, S ∈ Ep (m, n), find an integer
x such that S = x · R.
Definition 4 (ECDDHP Assumption): Given a point R on
an elliptic curve Ep (m, n) and two other points x · R, y · R
∈ Ep (m, n), find (x · y) · R.
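For illustration, the point operations and the Diffie-Hellman-style combination underlying ECDDHP can be exercised on a textbook-sized curve (toy parameters of our choosing; deployed schemes use standardized curves with roughly 256-bit group order):

```python
# Toy curve y^2 = x^3 + 2x + 2 over F_17 (illustration only).
p, m, n = 17, 2, 2
O = None  # the point at infinity (zero point)

def add(P, Q):
    """Elliptic curve point addition following the formulas above."""
    if P is O:
        return Q
    if Q is O:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return O                                  # P + (-P) = O
    if P == Q:
        lam = (3 * x1 * x1 + m) * pow(2 * y1, p - 2, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, p - 2, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def mul(k, P):
    """Scalar multiplication k*P by double-and-add."""
    R = O
    while k:
        if k & 1:
            R = add(R, P)
        P = add(P, P)
        k >>= 1
    return R

G = (5, 1)                    # base point; its order on this curve is 19
assert add(G, G) == (6, 3)    # point doubling: 2G
assert mul(19, G) is O        # 19G = O, confirming the order of G

# Diffie-Hellman-style combination (the hardness behind ECDDHP):
x, y = 7, 11                  # the two parties' private scalars
assert mul(x, mul(y, G)) == mul(y, mul(x, G)) == mul(x * y, G)
```

Both parties compute the same point (x · y) · G from the other's public point, while an eavesdropper seeing only x · G and y · G faces the ECDDHP on a full-size curve.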
In this section, we discuss the proposed scheme in
detail. We call the proposed scheme HEAP (Efficient
Authentication Protocol for Hadoop). The system architecture of HEAP is shown in Figure 2.
Six types of principals are involved in the proposed system
model: 1) client (C), 2) Big Data Service Provider (BDSP),
3) Namenode Server (NS) or Job Tracker (JT ), 4) Client Management Server (CMS), 5) Namenode Management Server
(NMS) and 6) Enrolment Server (ES). Both CMS and NMS
are the public servers in the two-server model, whereas ES is
the private server. CMS is reachable by Ci through a client
application instance, say HCAi , where i ∈ {1, 2, 3, · · · , n}.
NMS is reachable by BDSPj ’s administrator through a server
application instance, say HSAk , where j ∈ {1, 2, 3, · · · , m}.
Both CMS and NMS are reachable by adversaries, but ES
operates in the background and is fully supervised internally by the respective system administrator only. Thus, ES
is a fully trusted principal in the network. To make the proposed
system model fault-tolerant, we distribute Ci ’s secret credentials across two servers (NMS and ES), whereas we disseminate the
BDSP administrators’ and their deployable service servers’
(NSs’ and JT s’) private (secret) information across another pair
of servers (CMS and ES).
Initially, BDSP’s administrator needs to register all the
service servers (NS and JT ) of his own Hadoop cluster with
ES online. To do this, BDSP’s administrator first enrols himself with ES and goes through an authenticated key agreement
procedure utilizing both the NMS and CMS servers. C enrols himself with ES during the registration phase, but the authenticated
session key agreement task is handled by both CMS and
NMS. To prove their legitimacy, both client C and BDSP’s administrator need to respond to three different
challenges (specifically maintained by a two-step verification
process utilizing user identity, password and digital signature), assisted by both servers (CMS and NMS). Successful
legitimacy checking provides a Big Data storage
or processing service server ticket to C, and a service server
enrolment access privilege to BDSP’s administrator. The provided service ticket then gives access to the NS or JT after
accomplishment of a mutual authentication and session key
establishment process. The application instance HCAj gives
access to CMS for Ci , including adversaries, whereas
HSAk provides access to NMS for BDSP’s administrator,
including attackers. However, it is not possible for an adversary
to access both CMS and NMS together utilizing a single
application, say HCAj or HSAk .
Presently, we have found three widely used threat models in the literature, such as the Dolev-Yao threat model
(DY model) [73], the Canetti and Krawczyk adversary model
(CK-adversary model) [74], and the extended Canetti and
Krawczyk threat model (eCK-adversary model) [75], to
model active and passive adversaries. However, we adopt the
DY model and the CK-adversary model to study the security of the proposed scheme.
D. Chattaraj et al.: HEAP: Efficient and Fault-Tolerant Authentication and Key Exchange Protocol
FIGURE 2. System architecture of HEAP.
Under the DY model [73], an insecure channel between two
communicating parties is modeled mathematically in
such a way that an adversary Adv can intercept, delete or
modify the exchanged messages. In addition, Adv
may insert a fake message into the communication medium to
disrupt the normal operations between the two communicating
parties. In the CK-adversary model [74] (a superset of the
DY model), the adversary Adv can not only eavesdrop on, delete or
modify the exchanged messages between two communicating
parties but also has access to the session keys (short-term keys), long-term secret keys and session states of each
party involved in the key agreement process. This model
ensures the security of the authenticated key agreement protocol considering the leakage of some security credentials (long-term
and short-term) and its impact on the security of the other
secret credentials.
We follow both the DY and CK-adversary models in the proposed protocol, where we assume HEAP-KDC is trusted
by both C and NS (or JT ). Further, it is assumed that
CMS and NMS are semi-trusted, whereas ES is a fully trusted
server. According to the policy of the DY model, any two
parties, such as C and NS or C and JT , are not considered trustworthy principals in the network. Therefore,
under the DY model, an adversary (active or passive) Adv
can eavesdrop on, modify or delete the exchanged messages between C and NS or C and JT during communication. We also assume that the information stored at
C’s workstation (mobile device) can be stolen by Adv, and
using the stolen information, Adv can perform
the stolen-verifier and privileged-insider attacks. In addition,
under the CK-adversary model, adversary Adv can have
access to some form of the secret credentials, including the session key
and session states, between C and NS or C and JT . Under
this assumption, the proposed protocol needs to show a low
possibility of a security breach of the other entities’ (BDSP, CMS,
NMS and ES) secret credentials due to the leakage of the ‘‘session
ephemeral secrets’’ between C and NS or C and JT .
In this study, we inspect several known security
threats, such as chosen plaintext, denial-of-service, man-in-the-middle, online password guessing, server compromise, replay, privileged-insider, stolen-verifier, offline
password guessing, workstation compromise, server
spoofing and identity compromise attacks, considering
both the DY and CK-adversary models.
HEAP goes through five basic operations: (i) HEAP-KDC
configuration, (ii) user enrollment, (iii) Big Data service
provider registration, (iv) Hadoop Cluster vis-a-vis service
server enrollment and (v) mutual authentication and session
key agreement between user and service server. Two security
application instances, namely HCAj and HSAk , run
separately on the user’s workstation and the service provider’s workstation to access a particular public server (CMS or NMS)
of the HEAP-KDC realm. More precisely, C accesses only
CMS through the application HCA, and the service provider
accesses only NMS through the application HSA.
Initially, a Big Data Service Provider (BDSP) (or,
specifically, the BDSP’s administrator (BA)) needs to register himself with the ES through an out-of-band channel (for
example, a postal network) with his identity-proof documents, service level agreement, service server details, etc.
This provides a one-time dummy identity, a one-time
dummy password and a pass-phrase to the service provider.
After receiving these parameters offline, the service provider
enrolls himself online with the HEAP-KDC utilizing both
NMS and HSA. During online registration, the service
provider needs to provide both the dummy identity and the
dummy password to NMS through HSA. This grants a
user-account creation permission from NMS. With this
permission, the service provider sends his information, namely
masked identity, masked password, email identity, mobile
number, etc., to both the CMS and ES servers via NMS over a
secure channel (utilizing the pass-phrase as a key). After sending
the service provider’s information to CMS and ES, NMS only
keeps the masked identity in its database. Similarly, an end
user C registers himself with ES with the help of CMS
and HCA. During this operation, C sends his transformed
identities, masked password, email identity and mobile number securely to the NMS and ES servers via CMS. In this way,
both the user and service provider registration processes are
completed.
After accomplishment of the registration task, the BDSP needs
to keep his original identity and password with himself.
These two secrets are utilized at the time of the service provider
login phase. After successfully logging into his workstation (locally) using HSA, the BDSP can register his Hadoop
Cluster vis-a-vis the Big Data storage and processing service
servers (NSs’ and JT s’) with HEAP-KDC, followed by a
mutual authentication and key agreement process utilizing
both the NMS and CMS servers (two-server based authentication). This process is scalable in nature, where any Big
Data service provider can enroll his cluster’s service servers online with HEAP-KDC. BDSP securely enrolls
each service server with both CMS and ES through NMS by
assigning a service server’s masked identity and a masked
password.
Similarly, after registration, C needs to keep his user identity and password secret with himself for logging into his
workstation. At the time of login, C needs to authenticate
himself on his workstation locally, utilizing HCA, by providing
his identity and password. After that, C goes through a mutual
authentication and key formation process utilizing both the CMS
and NMS servers (dual-server based authentication). This
provides a short-term key to C. Utilizing this short-term key, C
establishes a secure session with CMS. We call this process
the single sign-on of client C. After the single sign-on task,
CMS provides two encrypted tickets: (1) a client ticket and
(2) a service server (either Big Data storage service server or
processing service server) ticket to C. Utilizing these two
tickets, C establishes a secret session with a service server
(either NS or JT ) associated with a particular cluster.
During the two-server based mutual authentication and key
agreement phase, the service provider (BA) enters his original
identity and password into HSA. HSA computes a masked
identity from the given identity. HSA then constructs a digital
signature on the masked identity utilizing a random nonce and
BA’s chosen private key. After that, HSA sends the masked
identity and the digital signature to NMS. In the same way,
NMS constructs a digital signature by signing its original
identity with its private key. NMS sends BA’s masked identity
and BA’s digital signature, along with NMS’s encrypted digital
signature and NMS’s identity, to CMS. After verifying the
identities of both BDSP and NMS, CMS decrypts NMS’s signature. Thereafter, CMS modifies both digital signatures
utilizing the previously shared pass-phrases of both parties (BDSP and NMS) and CMS’s private key. CMS encrypts
the modified signatures using BDSP’s masked password and
NMS’s shared key. CMS then sends both encrypted signatures to NMS. NMS decrypts the respective signature,
verifies the legitimacy of both BDSP and CMS utilizing the
previously loaded security parameters, and sends the other
signature to BDSP. After receiving the modified signature,
BDSP (or BA) checks the legitimacy of both NMS and CMS.
Finally, using the random nonces and ECC, BDSP and
NMS establish a session key between themselves. In the same
way, at the time of the single sign-on process, C establishes a
short-term key with CMS.
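The last step above is, in essence, an elliptic-curve Diffie–Hellman exchange: BDSP contributes λ1BDSP = RBDSP · G, NMS contributes µ1NMS = RNMS · G, and both sides arrive at SK = RNMS · RBDSP · G. A minimal sketch on a toy curve (E17(2, 2); all sizes and nonce values are illustrative, not HEAP’s actual parameters):

```python
# ECDH-style session-key agreement on the toy curve E_17(2, 2) of order 19.
# Illustrative only; a real deployment uses a cryptographically sized curve.
p, m, O = 17, 2, None

def add(P, Q):
    if P is O: return Q
    if Q is O: return P
    if P[0] == Q[0] and (P[1] + Q[1]) % p == 0:
        return O                                   # P + (-P) = O
    lam = ((3 * P[0] ** 2 + m) * pow(2 * P[1], -1, p) if P == Q
           else (Q[1] - P[1]) * pow(Q[0] - P[0], -1, p)) % p
    xR = (lam * lam - P[0] - Q[0]) % p
    return (xR, (lam * (P[0] - xR) - P[1]) % p)

def mul(k, P):
    R = O
    while k:
        if k & 1: R = add(R, P)
        P, k = add(P, P), k >> 1
    return R

G = (5, 1)
R_BDSP, R_NMS = 7, 11                  # random nonces of BDSP and NMS
lambda1 = mul(R_BDSP, G)               # sent by BDSP (inside its signature pair)
mu1     = mul(R_NMS, G)                # sent by NMS
SK_nms  = mul(R_NMS, lambda1)          # NMS computes R_NMS * lambda1
SK_bdsp = mul(R_BDSP, mu1)             # BDSP computes R_BDSP * mu1
assert SK_nms == SK_bdsp               # both derive SK = R_NMS * R_BDSP * G
```

The security of the shared SK against an eavesdropper who sees only λ1BDSP and µ1NMS rests on the ECDDHP assumption stated earlier.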
After establishing a secure session with NS utilizing
HCA, C outsources his Big Data, in terms of raw data blocks
and their replicas, to several Datanodes or chunk servers under
the supervision of NS. In such a provision, HCA supplies
the session key to the corresponding HDFS Client (HDCL)
to achieve secure and integrity-assisted HDFS-read and
HDFS-write operations. Thus, it protects the user’s confidential Big Data from third-party interception.
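One way to realize such integrity-assisted reads and writes is to tag each outsourced block with a MAC under the session key. The sketch below uses HMAC-SHA256 as a stand-in; the actual HDFS wire format and cipher suite are outside the paper’s scope, so the framing here is an assumption for illustration:

```python
# Sketch: protecting HDFS-read/HDFS-write with the HCA-supplied session key.
# Each raw data block carries an HMAC tag, so a Datanode or interceptor
# cannot tamper with blocks undetected. Framing is illustrative only.
import hmac
import hashlib

session_key = b"\x01" * 32          # SK delivered by HCA to the HDFS client

def write_block(block: bytes) -> tuple:
    """Return (block, tag); the tag is stored alongside the replica."""
    tag = hmac.new(session_key, block, hashlib.sha256).digest()
    return block, tag

def read_block(block: bytes, tag: bytes) -> bytes:
    """Verify the tag before handing the block back to the application."""
    expected = hmac.new(session_key, block, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("integrity check failed")
    return block

blk, tag = write_block(b"raw data block #1")
assert read_block(blk, tag) == blk
```

`hmac.compare_digest` is used for the tag check to avoid timing side channels.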
To make the proposed authentication protocol fault-tolerant in terms of security credentials’ replication, we keep
the service providers’ and service servers’ credentials (mainly
transformed identities and masked passwords) under CMS’s
custody, whereas we disseminate C’s credentials to NMS.
Meanwhile, all the security credential information is
replicated concurrently into the ES server. In addition,
to transform the service provider’s identity and password, two
random secrets (one generated by NMS and the other produced by HSA) are first embedded with the service provider’s
original user identity and password, and afterwards a
cryptographic one-way hash function is applied to
them. Similarly, C’s original user identity and password are encapsulated with CMS’s chosen secret and HCA’s
secret, respectively. Thus, the aforesaid mechanism leads to
a strong password for both parties (the service provider
and C), as well as reducing the chance of both single-point-of-failure and single-point-of-vulnerability issues.
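The masking idea above — embed two random secrets into the identity and password, then apply a one-way hash — can be sketched as follows (the SHA-256 choice, the `||` separator and the sample field names are illustrative assumptions; the exact layouts are fixed by the protocol steps later in the paper):

```python
# Sketch of credential masking with two embedded random secrets.
# Servers then store only the hashed (masked) values, never plaintext.
import hashlib
import secrets

def h(*parts: str) -> str:
    """One-way hash h(a || b || ...) used throughout the paper."""
    return hashlib.sha256("||".join(parts).encode()).hexdigest()

nms_secret = secrets.token_hex(16)   # random secret generated by NMS
hsa_secret = secrets.token_hex(16)   # random secret produced by HSA

# Transformed identity and masked password of the service provider:
masked_id  = h("BDSP_ID",  nms_secret, hsa_secret)
masked_pwd = h("BDSP_PWD", nms_secret, hsa_secret)
```

Because neither server knows both random secrets and the hash is one-way, a single compromised server does not reveal the original identity or password.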
To discuss the HEAP methodology, we use various notations
throughout the paper. The notations and their descriptions
are listed in Table 1.
TABLE 1. Notations and their meanings.
In addition to this, we make certain assumptions for HEAP.
These assumptions are described as follows:
1) Two security application instances, say HCAi and
HSAk , run concurrently on two separate workstations
(the user’s and the service provider’s workstations), which
enable access to the public servers (i.e., CMS and NMS)
in a HEAP-KDC realm.
2) An administrative application instance, say ‘‘h-admin’’,
takes the responsibility of the initial credentials’ (i.e.,
private keys, pass-phrases and public identities) configuration in the key distribution center (HEAP-KDC)
(also see Figure 2).
3) A Trusted Central Certification Authority (TCCA)
chooses a generator G on the elliptic curve Ep (m, n) of
order q, and selects two cryptographic hash functions,
say H1 (·) and H2 (·). Further, the TCCA generates a
certificate CertE for an entity E. The entity E randomly
chooses sE ∈ Z∗q as its private key and computes the
corresponding public key as QE = sE · G.
4) BDSP enrolls (i.e., registers online) himself with
CMS via NMS, followed by an offline registration
with ES. After registration, BDSP needs to log into
the system. After login, BDSP registers his Hadoop
cluster vis-a-vis the service servers (NSs’ and JT ’s)
with HEAP-KDC. Note that at least one cluster needs
to be deployed with the KDC before initiating client
enrolment.
5) C enrolls (i.e., registers online) himself with NMS
via CMS, followed by an offline registration with ES.
After registration, C needs to log into the system for
accessing the service servers (NS’s or JT ’s).
6) C and NS, or C and JT , are not considered trusted
entities. They should mutually verify their legitimacy
with the help of both CMS and NMS. After verification,
either C and NS or C and JT become trusted to each
other.
7) CMS keeps the masked identities of the Cs, the R^CMS_Ci s and all the
secret credentials related to the Hadoop cluster vis-a-vis the service servers’ information, whereas NMS stores
the masked identities of the BDSPs, the R^NMS_BDSP s and all the secret
credentials of the clients (Cs). ES has all the secret
information of the Cs’, BDSPs’ and service servers’. Note
here that no HEAP-KDC server
(CMS or NMS or ES) is permitted to store the secret credentials
(mainly identity, password) of any principal in a plaintext format.
8) C or BDSP goes through a two-server based mutual
authentication to avail services from the service server
or to deploy a new cluster with HEAP-KDC.
9) Finally, ES is not available to any other entities (Cs,
NSs and JT s) except NMS and CMS.
This section illustrates the detailed description of the proposed protocol phases as follows.
HEAP undergoes an initial configuration phase, where a
System Administrator (SA) frames the Key Distribution Center (i.e., HEAP-KDC). In this phase, all public servers are
pre-loaded with secret credentials administered by the SA.
TABLE 2. Summary of pre-loaded credentials into HEAP-KDC after the
execution of h-admin.
In this regard, an admin process, called h-admin, runs at the
time of HEAP-KDC configuration under the supervision of
the SA. This phase follows a public server registration process
to register both CMS and NMS with ES. In turn, ES loads
the public identities of CMS and NMS, namely CMSID and
NMSID , into its database and shares its public identity with both
the servers. In addition, the h-admin process assigns three
long-term shared secret symmetric keys between CMS and
ES, NMS and ES, and CMS and NMS as K(CMS,ES) , K(NMS,ES)
and K(CMS,NMS) , respectively. Further, h-admin generates one
pass-phrase say SID(CMS,NMS) . Finally, h-admin loads all
these security parameters into respective servers. After the
completion of h-admin process execution, the known security
parameters with the HEAP-KDC are summarized in Table 2.
Note that h-admin also assigns two public identities for
both client application (HCA) and service provider application (HSA), say HCAID and HSAID , respectively. HCAID is
publicly available to CMS and HSAID is publicly available
to NMS. Similarly, the public identity of CMS is known to
HCA whereas the public identity of NMS are known to HSA.
These identities are also useful for the key agreement process
at the beginning of the client (or Big Data service provider)
registration and login phases, respectively.
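A minimal sketch of the credential state after h-admin runs, mirroring Table 2 (the dictionary layout is an assumption for illustration; only which server ends up holding which key and identity follows the paper):

```python
# Sketch of h-admin's pre-loading step: three long-term shared symmetric
# keys, one pass-phrase, and the public identities loaded into each server.
import secrets

def fresh_key() -> bytes:
    return secrets.token_bytes(32)        # long-term shared symmetric key

K_CMS_ES, K_NMS_ES, K_CMS_NMS = fresh_key(), fresh_key(), fresh_key()
SID_CMS_NMS = secrets.token_hex(16)       # pass-phrase shared by CMS and NMS

# Credentials loaded into each server (cf. Table 2):
ES  = {"ids": ["CMS_ID", "NMS_ID"],
       "keys": {"CMS": K_CMS_ES, "NMS": K_NMS_ES}}
CMS = {"ids": ["ES_ID", "NMS_ID", "HCA_ID"],
       "keys": {"ES": K_CMS_ES, "NMS": K_CMS_NMS}, "sid": SID_CMS_NMS}
NMS = {"ids": ["ES_ID", "CMS_ID", "HSA_ID"],
       "keys": {"ES": K_NMS_ES, "CMS": K_CMS_NMS}, "sid": SID_CMS_NMS}

assert CMS["keys"]["ES"] == ES["keys"]["CMS"]   # shared-key consistency
```

The invariant checked at the end — each pairwise key is held by exactly the two servers it names — is what the subsequent registration phases rely on.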
Suppose a Big Data Service Provider (BDSP) wants to deploy
a new Hadoop Cluster (HCj ) for providing data storage
and processing services via the Internet. To enrol HCj with
HEAP-KDC, BDSP’s Administrator (BA) needs to register
himself with ES offline (i.e., via the out-of-band channel
or postal network) by giving details about the total number of service servers, service types, Service Level Agreements (SLAs), service servers’ location information, service
server subscriptions, payment documents, etc. In return,
BA receives a synthetic identity (SBDSPID ), a synthetic
password (SBDSPPWD ) and a pass-phrase (KCMS,BDSP ), respectively, via the out-of-band channel at the BDSP’s physical
address. Before sending these three security credentials to
BA, ES securely sends SBDSPID and SBDSPPWD to NMS for
creating a synthetic account for BA. ES also securely sends the
pass-phrase KCMS,BDSP to CMS. Note that BA needs to
use SBDSPID , SBDSPPWD and KCMS,BDSP only once, to create
his own profile on CMS with the help of NMS. However,
BA is permitted to use the pass-phrase KCMS,BDSP for his
password updation and service server registration processes.
Figure 3 summarizes BA’s registration phase, which contains the following steps:
Step BDSPRG1: BA enters SBDSPID and SBDSPPWD into
HSA. HSA generates a random nonce, say n′1 . HSA
encrypts SBDSPID and n′1 using SBDSPPWD as BDSPDtl =
E(SBDSPPWD : [SBDSPID , n′1 ]). HSA sends the message
msgB1 = {NMSID , n′1 , BDSPDtl } to the NMS.
Step BDSPRG2: After receiving msgB1 , NMS decrypts
BDSPDtl and checks the availability of SBDSPID in its
database. If it exists, then BA is permitted to create his own account on CMS; NMS returns msgB2 =
{NMSID , CMSID , n′1 }, and goes to Step BDSPRG3; otherwise, NMS
rejects BA’s request.
Step BDSPRG3: BA enters his original identity as BDSPID
into HSA. HSA chooses a random secret d and transforms BDSPID as TU′ = h(BDSPID || d || HSAID ).
Using this transformed identity TU′, BA and NMS
establish a shared secret key (SK(NMS,BDSP) ) between
themselves, following a mutual authentication process
utilizing a similar analogy to the ‘‘Initial key establishment between Ci and FEAS’’ reported in DPTSAP [10].
Step BDSPRG4: BA enters his new password as BDSPPWD
into HSA. HSA computes the masked password
BDSP∗PWD using BDSPPWD , d and R^NMS_BDSP . Note that,
at the time of the shared secret key establishment process, R^NMS_BDSP was generated by NMS and has been
delivered securely using the key SK(NMS,BDSP) . Further,
BA’s request and NMS’s response messages using
DPTSAP [10] are represented as msgB3 and msgB4 ,
respectively, in Figure 3.
Step BDSPRG5: HSA generates a random nonce n′2 and computes BDSP′Dtl = E(SK(NMS,BDSP) :
[BDSP∗PWD , MTU′]), where MTU′ = h(BDSPID ||
d || HSAID || R^NMS_BDSP ). HSA sends the message msgB5 =
{NMSID , n′2 , BDSP′Dtl } to the NMS.
Step BDSPRG6: After receiving msgB5 , NMS broadcasts
both MTU′ and BDSP′Dtl to the ES and CMS servers
(see msgB5.1 ). NMS keeps only MTU′ and R^NMS_BDSP in its
database and deletes the other information, wherein, after
decrypting BDSP′Dtl , both ES and CMS store MTU′
and BDSP∗PWD in their corresponding databases (see
msgB5.2 ). Finally, NMS sends a registration confirmation
message as msgB6 to BA through HSA and goes to Step
BDSPRG7.
Step BDSPRG7: HSA computes d ∗ = d ⊕ h(BDSPID ||
BDSPPWD ), R^NMS∗_BDSP = R^NMS_BDSP ⊕ h(d || BDSPPWD ),
BDPW = h(BDSPID || HSAID || BDSPPWD || d) and
FI = h(BDSPID || BDSPPWD ), and then
stores this information in HSA’s database. HSA will
use this information at the time of BA’s login, service
server registration and password updation phases.
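The XOR-masking of Step BDSPRG7 and its reversal during login (Steps BDSPL1–BDSPL3) can be sketched as follows (the SHA-256 hash, the `||` separator and the sample values are illustrative; the stored quantities d∗ and BDPW follow the paper):

```python
# Sketch: HSA stores only masked values at registration; at login it
# recovers d from d* and verifies the entered password against BDPW.
import hashlib

def h(*parts: str) -> str:
    """One-way hash h(a || b || ...)."""
    return hashlib.sha256("||".join(parts).encode()).hexdigest()

def xor_hex(a: str, b: str) -> str:
    """XOR of two 256-bit hex strings."""
    return format(int(a, 16) ^ int(b, 16), "064x")

BDSP_ID, BDSP_PWD, HSA_ID = "bdsp-07", "s3cret", "hsa-01"
d = "ab" * 32                               # HSA's random secret (hex)

# Registration (Step BDSPRG7): store only masked values in HSA's database.
d_star = xor_hex(d, h(BDSP_ID, BDSP_PWD))   # d* = d XOR h(ID || PWD)
BDPW   = h(BDSP_ID, HSA_ID, BDSP_PWD, d)    # local password verifier

# Login (Steps BDSPL2-BDSPL3): recover d, then verify the entered password.
d_rec = xor_hex(d_star, h(BDSP_ID, BDSP_PWD))
assert d_rec == d
assert h(BDSP_ID, HSA_ID, BDSP_PWD, d_rec) == BDPW   # BA authenticated locally
```

A wrong identity or password yields a different hash, so the XOR unmasks a wrong d and the BDPW check fails, which is exactly the local verification performed in Step BDSPL3.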
Note that after accomplishment of the registration process,
BA needs to remember only two parameters, BDSPID and
BDSPPWD , to log into the system, and then he can enrol any
FIGURE 3. Summary of service provider registration.
number of Hadoop Clusters (HCs) online with HEAP-KDC.
Thus, the proposed service provider registration scheme is
user-friendly and scalable in nature. The online enrolment
process of an HC driven by the BA is presented as follows.
The Hadoop cluster registration vis-a-vis service server enrolment phase is proposed to carry out the different activities,
starting from service provider login to online registration of
service servers (mainly, all Namenode servers and JobTrackers belonging to a particular Hadoop cluster, say HCj , where j =
1, 2, · · · , m). The proposed enrolment phase consists of three
activities: 1) service provider (BA) login, 2) mutual authentication and session key establishment between BA and NMS
and 3) service server registration. The detailed steps involved
in this process are discussed as follows. For simplicity, in this
study, we assume a particular cluster, say HCj , consists of a
single Namenode server (NSj ) and a single Job Tracker server
(JTj ); NSj is responsible for providing the Big Data storage
services, whereas JTj yields the Big Data processing services
to the remote user online.
After the completion of BA’s registration process, BA tries
to log into the system using the server application HSA.
Note here that, before initiating the service provider (BA) login on
BA’s workstation, HSA loads all of BA’s transformed identities
(that is, FIi , where i = 1, 2, · · · , k) into its browser cookie.
A diagram summarizing the several communication message
exchanges between BA and NMS involved throughout the
service provider login, authenticated key establishment and
service server registration process is shown in Figure 9, and
it contains the following steps:
Step BDSPL1: BA enters his original identity BDSPID and
password BDSPPWD into HSA. HSA computes FI ∗ =
h(BDSPID || BDSPPWD ) and checks whether this entry exists in its
cookie. If it exists, then HSA loads the respective
d ∗ , R^NMS∗_BDSP , KCMS,BDSP and BDPW entries for the same
identity and goes to Step BDSPL2; otherwise, BA can repeat Step
BDSPL1 with another user identity and password.
Step BDSPL2: HSA computes d = d ∗ ⊕ h(BDSPID ||
BDSPPWD ) and R^NMS_BDSP = R^NMS∗_BDSP ⊕ h(d || BDSPPWD ),
and goes to Step BDSPL3.
Step BDSPL3: HSA computes BDPW ∗ = h(BDSPID ||
HSAID || BDSPPWD || d) and checks whether the condition
BDPW ∗ = BDPW holds. If it holds, BA is treated
as an authentic service provider.
After successfully logging into the system, HSA initiates an
authenticated key formation process between BA/BDSP and
NMS. The detailed steps involved in this process are shown
in Figure 9 and are discussed as follows.
FIGURE 4. BDSP’s signature generation process.
Step MASKA1: HSA computes BDSP’s masked identity
MTU′ = h(BDSPID || d || HSAID || R^NMS_BDSP ). Further,
HSA generates two pseudo-random numbers (λ1BDSP and
λ2BDSP ) utilizing MTU′ and different pre-loaded security domain parameters as the input to a function, say
bdspSignPairGen[ ](·) (see Figure 4).
Step MASKA2: HSA forms the message M1 = {MTU′,
λ1BDSP , λ2BDSP , NMSID , CertBDSP } and sends it to NMS
via a public channel.
Step MASKA3: After receiving M1 , NMS searches its
database to check the existence of MTU′. If it finds
the same, then NMS generates two pseudo-random numbers (µ1NMS , µ2NMS ) by taking NMSID and other domain
parameters as the input to a function, say nmsSignPairGen[ ](·) (see Figure 5).
Step MASKA4: NMS constructs a message M2 containing
MTU′, λ2BDSP , NMSID , its certificate and the encrypted signature
E(KCMS,NMS : [MTU′, µ2NMS ]), and sends it to CMS via a public channel.
Step MASKA5: After receiving the message M2 , CMS
searches for both MTU′ and NMSID in its database. If both
exist, then CMS understands that both BDSP and
NMS are legitimate parties, and goes to Step MASKA6;
otherwise, it rejects NMS’s request.
Step MASKA6: CMS loads both BDSP’s pass-phrase
(KCMS,BDSP ) and NMS’s secret identity (SIDCMS,NMS )
from its database. CMS modifies both BDSP’s and
NMS’s partial signatures (i.e., µ2NMS and λ2BDSP ) using
a function cmodifiedSignPairGen[ ](·) (see Figure 6) as
(1) µnewNMS = µ2NMS + SCMS · H2 (H1 (KCMS,BDSP ))
(mod q) = RNMS · H2 (H1 (NMSID )) + SNMS · H2 (µ1NMS ) +
SCMS · H2 (H1 (KCMS,BDSP )) (mod q), and (2) λnewBDSP =
λ2BDSP + SCMS · H2 (H1 (SIDCMS,NMS )) (mod q) =
RBDSP · H2 (H1 (MTU′)) + SBDSP · H2 (λ1BDSP ) + SCMS · H2
(H1 (SIDCMS,NMS )) (mod q), and goes to Step MASKA7.
Step MASKA7: CMS constructs a message M3 =
{CMSID , NMSID , CertCMS , E(KCMS,NMS : [λnewBDSP ]),
E(BDSP∗PWD : [MTU′, µnewNMS ])} and sends the same
to NMS.
1 Note: CMS digitally signs the messages, say µ2NMS and
µ1NMS , using CMS’s private key SCMS .
2 Note: CMS digitally signs the messages, say MTU′, SIDCMS,NMS and
λ1BDSP , using its private key SCMS .
FIGURE 5. NMS’s signature generation process.
FIGURE 6. Signature updation process into CMS.
Step MASKA8: NMS verifies the legitimacy of both BDSP
and CMS utilizing the bcmsVerification(·) function
shown in Figure 7. If the function bcmsVerification(·)
returns ‘‘Accept’’, then NMS constructs a session key
SK = RNMS · λ1BDSP = RNMS · RBDSP · G and a message
M4 = {MTU′, NMSID , µ1NMS , CertCMS , CertNMS ,
E(BDSP∗PWD : [MTU′, µnewNMS ])}. NMS sends M4 to
BDSP, and goes to Step MASKA9; otherwise, it rejects
BDSP’s request.
Step MASKA9: After getting the message M4 , BDSP
verifies both NMS and CMS utilizing the
function ncmsVerification(·) (see Figure 8). If the function ncmsVerification(·) returns ‘‘Accept’’, then BDSP
accepts NMS’s response and constructs the session key
SK = RBDSP · µ1NMS = RBDSP · RNMS · G;
otherwise, it rejects NMS’s response.
Proof of Correctness: In order to verify the legitimacy
of both BDSP and CMS, NMS needs to check λnewBDSP ·
G = λ2BDSP · G + QCMS · H2 (H1 (SIDNMS,CMS )). To satisfy the verification condition, it must hold that λ2BDSP · G =
λ1BDSP · H2 (H1 (MTU′)) + QBDSP · H2 (λ1BDSP ) = RBDSP ·
H2 (H1 (MTU′)) · G + SBDSP · H2 (λ1BDSP ) · G and QCMS ·
H2 (H1 (SIDNMS,CMS )) = SCMS · H2 (H1 (SIDNMS,CMS )) · G.
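The algebra behind these checks — a scalar built as λ2 = R·h1 + S·h2 (mod q) verifies against the public points λ1 = R·G and Q = S·G — can be confirmed numerically on a toy curve (E17(2, 2), order q = 19; the H2(·) outputs are replaced by stand-in scalars, so this illustrates only the algebra, not the full protocol):

```python
# Numeric check of the signature-verification identity:
# with lam2 = R*h1 + S*h2 (mod q), we have lam2*G = h1*(R*G) + h2*(S*G).
p, m, q, O = 17, 2, 19, None

def add(P, Q):
    if P is O: return Q
    if Q is O: return P
    if P[0] == Q[0] and (P[1] + Q[1]) % p == 0:
        return O
    lam = ((3 * P[0] ** 2 + m) * pow(2 * P[1], -1, p) if P == Q
           else (Q[1] - P[1]) * pow(Q[0] - P[0], -1, p)) % p
    xR = (lam * lam - P[0] - Q[0]) % p
    return (xR, (lam * (P[0] - xR) - P[1]) % p)

def mul(k, P):
    R = O
    while k:
        if k & 1: R = add(R, P)
        P, k = add(P, P), k >> 1
    return R

G = (5, 1)
R, S = 3, 8            # signer's nonce and private key (toy values)
h1, h2 = 5, 12         # stand-ins for H2(H1(MTU')) and H2(lambda1)
lam1, Q = mul(R, G), mul(S, G)       # public values: lambda1 and Q
lam2 = (R * h1 + S * h2) % q         # signature scalar

# Verifier recomputes lam2*G from public values only:
assert mul(lam2, G) == add(mul(h1, lam1), mul(h2, Q))
```

The verifier never learns R or S; it only checks that the published scalar is consistent with the two public points, which is the structure of the bcmsVerification(·) and ncmsVerification(·) checks.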
Similarly, to verify the legitimacy of both NMS and CMS,
BDSP needs to verify µnewNMS · G = Y1 · µ1NMS + Y2 · QNMS +
Y3 · QCMS , where Y1 = H2 (H1 (NMSID )), Y2 = H2 (µ1NMS ) and
Y3 = H2 (H1 (KCMS,BDSP )). To satisfy the condition, it must
hold that Y1 · µ1NMS = Y1 · RNMS · G, Y2 · QNMS = Y2 · SNMS · G
and Y3 · QCMS = Y3 · SCMS · G.
FIGURE 7. BDSP and CMS verification process at NMS.
FIGURE 8. NMS and CMS verification process at BDSP.
After establishment of SKBDSP,NMS between BA/BDSP and
NMS, HSA starts a new session with NMS. In this regard,
BDSP can securely enrol all the service servers, mainly NSj
(responsible for Big Data storage services) and JTj (responsible for Big Data processing services), with HEAP-KDC via
NMS. For simplicity, in this study, we assume that the BDSP
wants to configure a Hadoop Cluster HCj which consists of
only two service servers, namely (1) NSj : responsible for controlling both namespace management and Big Data storage
service activities, and (2) JTj : responsible for both task assignment and Big Data processing activities. To initiate a service
server registration process, BDSP sends an initial enrolment
request for both NSj and JTj to NMS via a secure channel (using
SKBDSP,NMS ). In this regard, NMS asks CMS to provide two
shared symmetric keys for NSj and JTj . CMS sends the keys
in encrypted format as Kns,jt = E(KCMS,BDSP : [KCMS,NS
|| KCMS,JT ]). Thereafter, as a response to the BDSP’s request,
NMS sends two random numbers, namely R^NMS_NSj and R^NMS_JTj ,
along with Kns,jt to BDSP. A diagram summarizing the several
communication message exchanges between BDSP and NMS
involved throughout the service server registration process
is shown in Figure 10, and it contains the following steps:
Step SSRG1: BDSP enters the security credentials, namely
the identity, symmetric key and synthetic password for
both the service servers (i.e., NSj and JTj ), into HSA.
HSA computes the masked identities for NSj and JTj as
TNSID = h(NSID || HSAID || rss1 ) and TJTID = h(JTID
|| HSAID || rss2 ), and the masked passwords for the same
service servers as SIDNS = h(NSPWD || rss1 || R^NMS_NSj ) and
SIDJT = h(JTPWD || rss2 || R^NMS_JTj ), respectively. Note that
rss1 and rss2 are two random numbers chosen by HSA.
Step SSRG2: HSA computes the masked identity of
BDSP as MTU′ and constructs a message M5 =
{MTU′, NMSID , n3 , E(K(CMS,BDSP) : [MTU′, n3 ,
TNSID , SIDNS , TJTID , SIDJT ]),
E(SKBDSP,NMS : [MTU′, n3 ])} and sends the same to
NMS.
Step SSRG3: After receiving the message M5 , NMS checks
the presence of MTU′ in Z5 = E(SKBDSP,NMS :
[MTU′, n3 ]) after decrypting Z5 using the key
SKBDSP,NMS . If MTU′ exists in its database, then
BDSP is treated as an authentic service provider and
NMS broadcasts {MTU′, E(K(CMS,BDSP) : [MTU′, n3 ,
TNSID , SIDNS , TJTID , SIDJT ])} to
both CMS and ES. After receiving E(K(CMS,BDSP) :
[MTU′, n3 , TNSID , SIDNS , TJTID ,
SIDJT ]), both CMS and ES decrypt it and update their
service server databases (after finding the masked server identities in their databases) and
send their acknowledgements to NMS.
FIGURE 9. Summary of authenticated key agreement process between BDSP/BA and NMS. Note: Here, T10 = {CertCMS , CertNMS },
Z1 = E(KCMS,NMS : [MTU′, µ2NMS ]), Z2 = E(KCMS,NMS : [λnewBDSP ]), Z3 = E(BDSP∗PWD : [MTU′, µnewNMS ]); Operation1 , Operation2 , Operation3 ,
Operation4 and Operation5 signify execution of the functions bdspSignPairGen[ ](·), nmsSignPairGen[ ](·), cmodifiedSignPairGen[ ](·),
bcmsVerification(·) and ncmsVerification(·), respectively (refer to Figure 4, Figure 5, Figure 6, Figure 7 and Figure 8); and OC1 , OC2 and OC3 denote the outcomes of the
bdspSignPairGen[ ](·), nmsSignPairGen[ ](·) and cmodifiedSignPairGen[ ](·) functions, respectively.
Step SSRG4: Upon getting the acknowledgements from
both the servers (CMS and ES), NMS checks whether n′3 = n3 .
If the condition is satisfied, then NMS constructs a service servers’ (here, we consider two service servers,
i.e., NS and JT , in a particular cluster) registration
completion message as M6 = {NMSID , MTU′,
n3 , E(SKBDSP,NMS : [NMSID , n3 ])} and sends it to
BDSP.
Step SSRG5: Getting the message M6 , BDSP decrypts
E(SKBDSP,NMS : [NMSID , n3 ]) and checks that n′3 = n3 .
If the condition is satisfied, then BDSP confirms the
legitimacy of NMS and realizes that the service server
registration has been successfully accomplished with
HEAP-KDC.
Step SSRG6: HSA computes NS′jPWD = NSjPWD ⊕
h(BDSPID || BDSPPWD ), JT′jPWD = JTjPWD ⊕ h(BDSPID || BDSPPWD ),
rss∗1 = rss1 ⊕ h(BDSPID || BDSPPWD ) and rss∗2 = rss2 ⊕ h(BDSPID
|| BDSPPWD ), respectively. HSA stores the NS′jPWD , JT′jPWD ,
rss∗1 , rss∗2 and HCjID information in its database for future use. Note that
HCjID signifies here the pre-deployed Hadoop
cluster’s identity, and HSA assigns a random value
for it.
After successful registration of the service servers, BDSP
configures the service servers and their client application
(HDCLj ) for the cluster HCj . In this regard, BDSP stores the
secret credentials offline among the corresponding service
servers and HDCLj . More precisely, NSj has TNSID = h(NSID
|| HSAID || rss1 ), KCMS,NS and SIDNS = h(NSPWD || rss1 ||
R^NMS_NSj ); JTj has TJTID = h(JTID || HSAID || rss2 ), KCMS,JT
and SIDJT = h(JTPWD || rss2 || R^NMS_JTj ); and HDCLj has both the
TNSID = h(NSID || HSAID || rss1 ) and TJTID = h(JTID
|| HSAID || rss2 ) information, respectively. This completes
HCj ’s deployment process. Finally, BDSP makes HCj
online for providing the Big Data storage and processing
services to the end users.
Remark 1: In the service server registration phase,
we present an enrolment strategy considering two service
servers (Namenode Server (NS) and Job Tracker (JT )) belonging
to a particular Hadoop cluster (HCj ). But, for simplicity and
