An implementation of Replication Oriented Architecture (ROA) for Web Service Scalability

Web services provide application-to-application integration across different platforms. However, the consumption of web services generates request traffic that must be attended to by an instance of the web server without fail. To guarantee the dependability of a web service, its instances are replicated as a way of scaling the service. The Replication Oriented Architecture (ROA) has previously been designed and implemented on the Java Enterprise application development platform, with interesting results. Improvements in the PHP scripting language have made it a popular language for web and enterprise application development. In this paper, the ROA is implemented using PHP. The implementation is simulated with Apache JMeter and the results are compared with those obtained in the Java implementation. The results show that both application development platforms achieve web service scalability as a quality of service (QoS) expected of a web service. Specifically, a 50.9% improvement in response time at the 95.0% confidence level was achieved with PHP, which compares favorably with the 22.5% improvement at the 95.0% confidence level achieved on the Java platform.


Introduction
The configuration of a web service requires a server instance of the web service. The carrying capacity of an instance, also referred to as a virtual server, depends on its configuration, which includes memory, weight, administrator listener port and HTTP listener port, among other properties. Based on this configuration, the web server instance is given a weight which defines, among other things, the number of requests it can handle. Creating multiple server instances is a way of scaling the web service because a single server instance cannot scale to accommodate a very large number of client connections (Braveti, Gilmore, Guidi and Tribastone, 2008).
An instance may fail and may need to be repaired, automatically or otherwise. While it awaits repair, its requests have to be transferred to another server instance. This ensures the availability of the web service at any given time. The need for several instances also arises when the number of requests outweighs the weight of the virtual server or when the instance configurations are not the same. The ability to create several instances of a web service defines its scalability.
There is empirical evidence that, for the current generation of web server applications, multiprocessor platforms do not provide the scalability needed to handle large traffic volumes. A scalable web server architecture is therefore key to enabling web services to handle ever-increasing traffic loads (Marian, Birman and Rennese, 2006). The scalability of web services can be achieved by designing high-quality architectures, and replication is one such architecture. Replication is one of the most widely researched concepts in distributed computing and its initial use was in data replication. Data replication has been used to improve data access performance in distributed systems and has been shown to increase data availability while reducing user waiting time, enhancing fault tolerance and ultimately improving scalability (Gopinath and Sherly, 2018).
A web service is a software component that is seen through the service it offers (ThiBen and Brambring, 2018). A web service is a self-contained component which is published, located and invoked over the web. To achieve the goal of interaction across different platforms and programming languages, the web service architecture defines standards for service definition (Web Service Description Language) and service interaction (Simple Object Access Protocol) (ThiBen and Brambring, 2018). The components of a web service (Dustdar and Schreiner, 2005) perform a discrete set of related functions (Digvijaysinh, 2017), and there is no single web service standard (Tilkov, 2005). Rather, the associated protocols aim to provide a means of describing data and behavior in a manner that machines can process (Taylor and Harrison, 2009). When it is necessary to combine the functionality of several web services, we speak of a composite service (Alonso et al., 2004). Composite services are recursively defined as aggregations of elementary and composite services (Dustdar and Schreiner, 2005).

Registry Infrastructure
With the number of services growing in today's computing environment, meaningful cataloguing is a must. Moreover, as the number of consumers and services grows, knowing which service provides what functionality, and the location of the service endpoint to which messages must be sent, becomes very important. In a small-scale SOA model, reliance on a high degree of human knowledge and interaction may be acceptable, but a growing or large-scale SOA model needs a more effective and sustainable approach. In order to exploit the full potential of web services, the web service paradigm must be supported by an appropriate service publication and discovery infrastructure (Pilioura, Kapos and Tsalgatidou, 2004). At present, the most prevalent standard for web service publication and discovery is the Universal Description, Discovery and Integration (UDDI) specification.
The UDDI information structure has four levels. The top level is the business entity, which provides general data about the company, such as its address, a short description, contact information and other general identifiers. Associated with each business entity is a list of business services, including the description of each service and its categories, for instance purchasing, shipping, etc. Within a business service, one or more binding templates provide more technical information about the web service.
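The nesting of the levels described above can be pictured with a small data model. This is only an illustrative sketch in Python: the class and field names are our own assumptions, not the UDDI API, and it covers only the three levels named in the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BindingTemplate:
    # Technical access details for one endpoint (field names are hypothetical).
    access_point: str
    description: str = ""


@dataclass
class BusinessService:
    name: str
    description: str
    categories: List[str] = field(default_factory=list)
    binding_templates: List[BindingTemplate] = field(default_factory=list)


@dataclass
class BusinessEntity:
    name: str
    address: str
    description: str
    contact: str
    services: List[BusinessService] = field(default_factory=list)

    def find_services(self, category: str) -> List[BusinessService]:
        """Return the services published under the given category."""
        return [s for s in self.services if category in s.categories]


# Made-up example data: one company publishing one shipping service.
shipping = BusinessService(
    name="ShipmentTracking",
    description="Tracks parcels",
    categories=["shipping"],
    binding_templates=[BindingTemplate("http://example.com/track")],
)
entity = BusinessEntity(
    "Acme Ltd", "1 Example Road", "Demo company", "info@example.com", [shipping]
)
```

A consumer would first locate the business entity, then filter its services by category, and finally read the binding template to obtain the endpoint to call.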

Benefits of Web Services Scalability
Relying on a single instance of a web service is risky. It may become unavailable due to failure or even overloading. To reduce this inherent risk, many instances should be created, thereby making the web service scalable and available at all times. The importance of scalability is even more obvious given that web services are inherently poorly scalable (Birman, 2005a; Cignek et al., 2006). Scalability refers to the ability of web services to be massively deployed on a large scale on different platforms and still maintain availability (Ekuobase and Onibere, 2011).
Scalability assumes that instances of a web service can be created; a web service is thus composed of several instances. In the event that an instance is "deceased", the dependability property of web services requires that the load on the "deceased" instance be transferred to another instance. Scalability therefore comes in handy in balancing load when the load exceeds the weight of an instance. An instance of a web service has a weight, among other configuration properties, and this weight determines the resources it needs to be instantiated. There are several approaches for the composition of web services. One prominent example is the Business Process Execution Language (BPEL) for web services (OASIS, 2007).
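The interplay between instance weight, load balancing and fail-over described above can be sketched as follows. This is a minimal illustration under our own assumptions: the `Instance` class, the "most spare capacity" dispatch rule and the function names are hypothetical and are not taken from the ROA implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    name: str
    weight: int          # maximum number of requests the instance accepts
    alive: bool = True
    load: int = 0        # requests currently assigned


def dispatch(instances: List[Instance]) -> Optional[Instance]:
    """Assign one request to the live instance with the most spare capacity."""
    candidates = [i for i in instances if i.alive and i.load < i.weight]
    if not candidates:
        return None  # every instance is failed or saturated
    target = max(candidates, key=lambda i: i.weight - i.load)
    target.load += 1
    return target


def fail_over(failed: Instance, instances: List[Instance]) -> int:
    """Mark an instance as failed and move its load to the others.

    Returns the number of requests that could not be re-assigned.
    """
    failed.alive = False
    pending, failed.load = failed.load, 0
    dropped = 0
    for _ in range(pending):
        if dispatch(instances) is None:
            dropped += 1
    return dropped
```

With two instances of weight 3 each carrying two requests, failing one instance moves one request to the survivor and leaves one unserved, which is the situation that creating further instances is meant to avoid.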
The preceding discussion laid the foundation for this study by way of background. The rest of this paper is organized as follows. Section 2 reviews related work necessary to better understand the technical aspects of this work, with an emphasis on the replication technique as being central to achieving web service scalability. Section 3 describes the procedures and tools used to show that the PHP software development environment can be used to build scalable web service applications. In Section 4, experiments using both the Java and PHP development environments are performed using these procedures, with Apache JMeter as a tool. The results of the experiments are discussed with the aim of comparing the results obtained in both application development environments. Section 5 concludes the paper and outlines current and future lines of research.

Review of Related Literature
Replication, clustering and parallel computing (Loukopolous, Lampsas and Ahmad, 2005) are well known as solutions to problems of database scalability. They have been used to enhance the performance and availability of databases. Areas in which replication, clustering and parallel computing have been used in this way include Database Management Systems (DBMS) (Perez, Garcia-Carballeira, Carretero, Calderon and Fernandez, 2010), mobile systems (Tu, Li, Xiao, Yen and Bastani, 2006), and large-scale systems and data grid systems (Ranganathan and Foster, 2001; Chervenak, Deelman, Foster, Guy, Hoschek, Iamnitchi, Kesselman, Kunst, Ripeanu, Schwartzkopf, Stockinger and Tierney, 2002).
These solutions may have been first used to enhance the performance and availability of databases, but they have found application in other areas, especially web service scalability. All of them rely on the ability to create multiple copies of data or of a service and to have multiple computers (that may or may not be at the same location) working together to provide the user with access to that data or service. In this way, the solutions are closely interconnected and the terms are sometimes used interchangeably. However, there are subtle differences that are worth reviewing.
In a distributed environment, data replication can take place in distributed storage, which includes: (1) distributed DBMS, (2) peer-to-peer systems, (3) data grids and (4) the World Wide Web. A replicated database is a database in which multiple copies of some data items are stored at multiple sites or nodes (Goel, Sunshant, Buyya and Rajkumar, 2006). Poor scalability can result in poor system performance, hence the need to evolve better replication strategies to improve performance, especially in distributed systems. Data replication is a major technique used in distributed systems to meet the challenges of high availability and improved data access performance. It increases data availability, reduces user waiting time, increases fault tolerance and improves scalability.
Static data replication strategies follow a deterministic approach in which the number of replicas to be created and the nodes on which to place them are well defined and predetermined, i.e. when and where to create replicas is decided before the application starts executing. In static replication, data is replicated on randomly chosen nodes a fixed number of times. Some examples of static replication approaches in the cloud include: (a) the Google File System, (b) the Hadoop Distributed File System and (c) Amazon Dynamo. Ghemawat et al. (2003) designed the Google File System (GFS), a scalable distributed file system for data-intensive applications. This file system supports reliable, efficient access to large data sets using big clusters of cheap hardware. The GFS implements a static distributed data replication algorithm for the Google cloud. In GFS, the replicas in the multiple chunk servers are dynamically maintained. The limitation of this approach is that a fixed replica number is used for all files, which may not be appropriate for every file.
The Hadoop Distributed File System (Bui, Shujaat, Eui-Nam and Sungyoung, 2016) is the storage component of Apache Hadoop. It follows a static distributed replication policy to provide availability and reliability of data. The number of replicas for each file is configured at the time of file creation. Replicas are placed in such a way that two replicas are stored on two separate nodes in the same local rack and one on a node in a separate remote rack. The Hadoop replication strategy improves data reliability, availability and network bandwidth utilization. The drawback of this approach is that access behaviour is not taken into consideration when replicating data. A dynamic replication policy is considered in such scenarios, where the number of replicas for each data item is decided based on the access popularity of the data (Wei et al., 2010; Abad et al., 2011).
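The placement rule described above (two replicas on distinct nodes of the local rack, one on a remote rack) can be sketched as below. This is a simplified illustration, not Hadoop's actual placement code: the function name, the dictionary layout and the deterministic choice of nodes are our own assumptions.

```python
from typing import Dict, List


def place_replicas(racks: Dict[str, List[str]], local_rack: str) -> List[str]:
    """Pick three nodes: two distinct nodes in the local rack, one in a remote rack."""
    local_nodes = racks[local_rack]
    if len(local_nodes) < 2:
        raise ValueError("local rack needs at least two nodes")
    remote_racks = [r for r in sorted(racks) if r != local_rack and racks[r]]
    if not remote_racks:
        raise ValueError("no remote rack with a node available")
    # Deterministic choice keeps the sketch easy to follow; a real system
    # would also weigh free space and current load.
    return [local_nodes[0], local_nodes[1], racks[remote_racks[0]][0]]


# Hypothetical two-rack cluster.
cluster = {"rack1": ["n1", "n2", "n3"], "rack2": ["n4", "n5"]}
placement = place_replicas(cluster, "rack1")
```

Because the rule depends only on rack topology, it places a rarely read file exactly as it places a popular one, which is the drawback that dynamic, popularity-based policies try to address.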

Clustering Architectures
Scaling web services means that as many instances of a web service as required are created to make the system fault tolerant. With redundant instances, it is easy to fail over from a "deceased" instance to another live instance. By default, the instances of a web service run on one or more computers, called a cluster. Parallel computing is a particular example of cluster computing in which multiple computers work together to provide some function or functions, with each computer performing some sub-function. In the context of databases, a common example of parallel computing is a parallel database. Here, multiple computers each process a subset of a query based on the subset of the data that they have access to. Sharded databases, and most NoSQL databases (MongoDB and Cassandra, for example), are examples of this kind of system.
There are many kinds of clustering, with implementations in different tiers. In each tier, clustering may go by a different name, such as virtualization, partitioning or mirroring, while meaning the same thing. Some of the clustering tiers include:

Replication Oriented Architecture (ROA)
Replication is a mechanism whereby data or a web service is made available in more than one copy. In the simplest case of replication, there is a master and a slave. This master and slave arrangement could be on one computer or on multiple computers. Where replication is implemented on more than one computer, the master copy could be on one computer and the slave on another. The computers may be in the same location or they may be geographically dispersed and connected via a communication network. The master and slave are first synchronized and, after that, any change to the master is replicated to the slave. The changes may be replicated synchronously or asynchronously. Databases provide replication natively (semi-synchronous replication in MySQL, for example) or through additional software (for example, Galera or Tungsten). Disk mirroring, in either hardware or software, may also be considered a form of replication.
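The difference between synchronous and asynchronous propagation from master to slave can be illustrated with a small in-memory sketch. The `Master` and `Slave` classes below are hypothetical and only mimic the behaviour described above; they are not how MySQL, Galera or Tungsten implement replication.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class Slave:
    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def apply(self, key: str, value: str) -> None:
        self.data[key] = value


class Master:
    def __init__(self, slave: Slave, synchronous: bool) -> None:
        self.data: Dict[str, str] = {}
        self.slave = slave
        self.synchronous = synchronous
        self.backlog: Deque[Tuple[str, str]] = deque()  # changes not yet shipped

    def write(self, key: str, value: str) -> None:
        self.data[key] = value
        if self.synchronous:
            # Synchronous: the write returns only after the slave has the change.
            self.slave.apply(key, value)
        else:
            # Asynchronous: queue the change and return immediately.
            self.backlog.append((key, value))

    def flush(self) -> None:
        """Ship queued changes to the slave in the order they were made."""
        while self.backlog:
            self.slave.apply(*self.backlog.popleft())
```

In the asynchronous mode the slave lags behind the master until `flush` runs, so a fail-over in that window may serve stale data; the synchronous mode removes the window at the cost of slower writes.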
Where replication is implemented on more than one computer, the computers are said to form a cluster. Clustering is therefore a generic term used to describe a class of techniques in which many computers work collectively to perform some function or functions. For example, if data is replicated between two locations, then it is possible for a database at each location to access its data set and answer queries submitted to it. In such a system, a copy of the data can be in one location and accessed by one database instance, while data in either the same or a different location is accessed by a second database instance. These two database instances would then be considered a database cluster. In this example, each database instance is able to completely answer queries against the data. Oracle Real Application Clusters (RAC) is an example of a system of this kind.
Drawing from the experience garnered from replicating databases, several specifications have been developed for the replication of web services. Farouk, Badawy and Youssef [34] proposed an architecture that integrates replication and clustering to provide reliability, availability and scalability of web services. In their approach, they posit that both replication and clustering are needed to achieve availability and scalability. They proposed an N-tier architecture in which components are divided into four groups: clients, interface servers, application servers and database servers. They deployed replication in the interface servers and both clustering and replication in the application servers. However, this architecture, despite its manifold benefits, is complex to implement, and interface issues may present a challenge. Liu et al. (2004) presented a framework for publishing up-to-date QoS information for web services, the success of which depends on the mechanism for collecting feedback from users about the quality of the services they consume.
Jaeger, Goldman and Muhl (2004) proposed a mechanism that could be more efficient by using an aggregation scheme for QoS aspects. The challenge with their scheme is that services for composition are chosen some time before execution, while the QoS parameters change during service execution, so the QoS demands of a user may be violated even if no issues are found at service selection time. Replication is seen by these authors as a possible solution for dealing with dynamic QoS with respect to performance, high availability and fault tolerance: several copies of a service are used instead of a single copy. There are many other replication architectures and strategies, for example the Gossip architecture, Quorum Consensus, the Double Quorum architecture and Cassandra. These strategies are mentioned only for completeness and are not considered further.
According to ThiBen and Brambring (2018), these replication strategies were originally designed for data and databases but were soon transferred to object replication. There are also approaches that carry these concepts over to service replication. Ye and Shen (2005) discussed the implementation of reliable web services by using active replication. In their architectural approach, proxies are used to separate a user from the web service. The proxy accepts all requests from the user and is responsible for ensuring consistency in the execution of the several replicas. A user sees only a single proxy, to which it sends a request, but in the background the proxy sends the request to all other proxies or recipients simultaneously (multicasting), with each recipient proxy hiding one of the web services of the group. This approach focuses on the reliability of web services and only active replication is implemented. However, when several QoS aspects must be considered, this approach is too inflexible. Similarly, the approach of Chan et al. (2007) focuses only on reliability; other QoS aspects are not dealt with. Quality of Service (QoS) is a broad concept that can involve a number of context-dependent non-functional properties such as privacy, reputation and usability (Liu, Ngu and Zeng, 2004). In the work of Salas, Perez-Sorrosal, Patino and Peris (2006), web service replication was carried out with the goal of providing highly available web services in a wide area network. Again, this approach made use of active replication to achieve high availability and introduced a multicast mechanism for communication between replicas.
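The proxy-based active replication described above can be sketched as follows. This is an illustrative simplification under our own assumptions: replicas are plain Python callables, the "multicast" is a sequential loop, and the majority vote used to reconcile answers is our addition rather than a detail taken from Ye and Shen (2005).

```python
from collections import Counter
from typing import Callable, List

Replica = Callable[[str], str]


class ReplicationProxy:
    """Single entry point that hides a group of service replicas."""

    def __init__(self, replicas: List[Replica]) -> None:
        self.replicas = replicas

    def handle(self, request: str) -> str:
        # Active replication: every replica executes the same request.
        responses = []
        for replica in self.replicas:
            try:
                responses.append(replica(request))
            except Exception:
                continue  # a failed replica is masked by the others
        if not responses:
            raise RuntimeError("all replicas failed")
        # Return the most common answer so one faulty replica cannot win.
        answer, _count = Counter(responses).most_common(1)[0]
        return answer


def healthy(request: str) -> str:
    # Stand-in for a working replica of the service.
    return request.upper()


def crashed(request: str) -> str:
    # Stand-in for a replica that is down.
    raise ConnectionError("replica down")


proxy = ReplicationProxy([healthy, crashed, healthy])
```

The client only ever calls `proxy.handle`, so the failure of one replica is invisible to it as long as at least one replica answers.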
In Ekuobase and Onibere (2011), the proposed architecture for web service scalability is server side, and the web service solution test presented there shows that the scalability of a web service is significantly better when it is built on ROA. This view is also supported by ThiBen and Brambring (2018). In that work, it was shown that ROA improves web service scalability by 31.7% with 90% confidence. The web service architecture proposed by Ekuobase and Onibere (2011) was, however, implemented on the Java Enterprise Application platform to the neglect of other competing enterprise application development platforms.

PHP Web and Enterprise Application Development Platform
PHP is currently one of the most popular languages used in the open source community and in industry to build large web-focused applications and web services (Dudhe and Sherekar, 2014). PHP has evolved over the years from a scripting language into an Object Oriented Programming (OOP) language, thereby providing the web development community with the benefits of OOP. From the very first version to the current stable version 8.0, several aspects of the language have evolved: the use of libraries, the removal of some functions and the stability of user interfaces (Kyriakakis and Chatzigeorgiou, 2014).
PHP has matured over the years through its growing open source community and the growing number of third-party libraries and APIs. In terms of speed, there has been tremendous improvement from version 5.6 to the stable 8.0. In terms of frameworks, PHP boasts many open web frameworks such as Symfony, CakePHP, Zend and Laravel, which speed up web development and ease the development of SOAP and RESTful web services.

Methodology
The purpose of scalability testing is to check whether our system scales appropriately with changing load. It is expected that a larger number of incoming requests should cause a proportional increase in response time. The proposed architecture will be built to reflect this property.

Proposed Web Service Architecture
To test for scalability, we designed a simple fictional web application called the Student Information System, which pushes data to a backend that exposes its functionality as a web service based on our proposed web service replication architecture. The use case and corresponding class diagram of the application are depicted in Figures 1 and 2. In this implementation, a three-tier component architecture is proposed. The components are the web component, the application component and the database or backend component. The architecture is depicted in Figure 3.
A cluster with a single node in the application server is used, with two instances created. Each instance, with its configuration, is deployed as a multi-tier application. The goal of this solution is to collect metrics and check whether the solution scales appropriately in response to increasing load, based on the architectural design. A message broker system is used because of the following advantages: (1) processing background jobs when the application needs to process a lot of data, e.g. an email notification from an online e-commerce system; (2) processing messages at a later time; (3) scaling.
Hardware Tools: an HP Pavilion Notebook with the following configuration was used for developing the prototype: Intel Core(TM) i5-2410M CPU @ 2.30 GHz (dual core), 6.0 GB of installed memory and a 600 GB hard disk.
Software Tools: We discuss our software tools under operating system, Integrated Development Environment (IDE) and development language. We chose the XAMPP server not because it is a silver bullet but because of familiarity and its good features for deploying large web application solutions. The Windows 10 operating system was used not for lack of other operating systems, but for familiarity.
RabbitMQ: a message-oriented middleware tool that allows applications to communicate and exchange data by sending and receiving messages. It uses the AMQP (Advanced Message Queuing Protocol).
Docker: a container management tool built around images, volumes and containers. An image is a blueprint (a structure with instructions) for building a container. Images are made up of layers.
Backend Database: by this we mean the relational database chosen for our application. Several DBMSs were available, but for a small to medium application the MySQL database was deemed adequate for our solution.
Language and IDE: our application was implemented in PHP and the Integrated Development Environment (IDE) of choice was Visual Studio. This tool was chosen not because it is better than other development environments such as PhpStorm or the NetBeans IDE but because of familiarity and its seamless compatibility with PHP and MySQL.
Simulation Tools: Apache JMeter 5.0 was used as the simulation tool because of its extensive features, its vast array of listeners and its comprehensive GUI. The researcher is aware of other simulation tools such as SOAPUI, SOAPPro and MATLAB.

Implementation
In order to test the ROA as designed, RabbitMQ was installed with Docker and our application was deployed. The application was a fictitious web service application with a MySQL backend, able to return its content in JSON for other applications to consume. The message queuing middleware (RabbitMQ) plays a role akin to that of the Java Message Service in Java: requests and responses are handled by RabbitMQ. A file called Dockerfile was created in order to provide the Docker configuration for our PHP web service application and its backend. The Dockerfile is used in building an image; it consists of various configurations for RabbitMQ and serves as a blueprint for a Docker image. The application was then simulated using Apache JMeter and its response time was determined. We kept the number of requests (sample size) the same as in the Java ROA solution.
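The request and response flow through the message queue can be pictured with the sketch below. It is only an analogy: Python's standard-library `queue` and `threading` modules stand in for RabbitMQ, a dictionary stands in for the MySQL backend, and all names (`STUDENTS`, `worker`, `serve`) are hypothetical rather than taken from the PHP implementation.

```python
import json
import queue
import threading

requests_q = queue.Queue()   # stands in for the broker's request queue
responses_q = queue.Queue()  # stands in for the reply queue

# Hypothetical in-memory table standing in for the MySQL backend.
STUDENTS = {1: {"id": 1, "name": "Ada"}, 2: {"id": 2, "name": "Grace"}}


def worker() -> None:
    """Consume requests until a None sentinel arrives, replying in JSON."""
    while True:
        student_id = requests_q.get()
        if student_id is None:
            break
        record = STUDENTS.get(student_id, {"error": "not found"})
        responses_q.put(json.dumps(record))


def serve(ids, n_workers: int = 2):
    """Run n_workers consumers over the given requests and collect the replies."""
    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for student_id in ids:
        requests_q.put(student_id)
    for _ in threads:
        requests_q.put(None)  # one stop signal per worker
    for t in threads:
        t.join()
    replies = []
    while not responses_q.empty():
        replies.append(json.loads(responses_q.get()))
    return replies
```

Each worker plays the role of one replicated service instance: adding workers raises the number of requests that can be served concurrently without changing the producer side, which is the scaling property the broker is used for.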
Apache JMeter is simulation software designed to test and measure the performance and functional behaviour of client applications. It is one of the most popular and widely used open source, freely distributed testing applications. JMeter was developed by Stefano Mazzocchi of the Apache Software Foundation. It was primarily designed to test the performance of Apache JServ, which was later replaced by the Apache Tomcat project (Emily, 2008). Since its first release, JMeter has evolved to load-test FTP servers, database servers, Java servlets and objects. JMeter is written in Java and is highly extensible through a provided Application Programming Interface (API). JMeter works by simulating the client side of a client/server application. It has been widely accepted as one of the best performance and load testing simulation tools for web applications, and various companies have adopted it as a performance testing tool (Emily, 2008). These companies include SharpMind of Germany, for functional and regression testing; AOL, for load testing of websites; and ALALOOP of France, which has used JMeter since 2008 for performance testing of many web applications.

Results and Discussion
The results obtained from the two experiments performed with Apache JMeter are presented in Tables 1 and 2. Table 1 shows the test results of the ROA architecture built using PHP, while Table 2 shows the results of the architecture built using Java. Both experiments used the same dataset and ramp-up period. The results shown in Tables 1 and 2 were analysed using Microsoft Excel. It is also important to note that the ramp-up period was chosen at random, but in a way that mimics real-life load on the application.
Where  represents the user degradation tolerance. The computational strength Sij for an application i, for j requests per unit time, is given by (Ekuobase and Onibere, 2013):
Where C is a constant denoting server and hardware strength; Tij is the application throughput per unit request, expressed per millisecond (ms); Rij is the application mid response time for the set of requests or sample, in milliseconds (ms); i represents the application, with i = 0 for the conventional approach solution and i = 1 for the ROA; and j is the number of samples. We set C = 1 because we are comparing computational strengths with each other under the same hardware and software.
Let the samples X and Y be the performance degradation under the Java ROA approach and our PHP ROA approach respectively, with means µx and µy. We seek to determine whether or not our solution built on the PHP ROA approach improves scalability.
We proposed the following hypotheses. H1: µx < µy (the Java ROA approach is not significantly scalable compared to the ROA built on PHP). H2: µx > µy. We make certain assumptions:
(1) Our samples are normally distributed.
(3) Our data points (sample sizes) are the same.
(4) We have the same variance.
Where n1 and n2 are the numbers of samples, Sx and Sy are the standard deviations, and Sx², Sy² are the variances of the two samples.
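The test statistic itself is not reproduced in the text. Under the equal-variance assumption stated above, the standard pooled two-sample t statistic would take the following form, which we give here as an assumption rather than as the authors' exact expression. The symbols $\bar{X}$ and $\bar{Y}$ (the sample means estimating µx and µy) and $s_p$ (the pooled standard deviation) are introduced for this sketch.

```latex
s_p^2 = \frac{(n_1 - 1)\,S_x^2 + (n_2 - 1)\,S_y^2}{n_1 + n_2 - 2},
\qquad
t = \frac{\bar{X} - \bar{Y}}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
\mathrm{df} = n_1 + n_2 - 2
```

With equal sample sizes, as assumed above, the degrees of freedom reduce to $2n - 2$ for $n$ observations per sample.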
Under the null hypothesis H0 (there is no significant difference between the samples) and a significance level of α = 0.05, there is a 5% probability of rejecting the null hypothesis when it is in fact true.
The degrees of freedom are n1 + n2 - 2 = 38. The critical value for the two-tailed t-test is 2.021.
Also, using descriptive statistics in Microsoft Excel to calculate the confidence level for the mid response time, we calculated the confidence interval for Table 1 (the application built on the Java ROA solution), obtaining a 22.5% improvement at the 95.0% confidence level. We likewise calculated the confidence interval for Table 2 (the application built on the PHP ROA solution), obtaining a 50.9% improvement at the 95.0% confidence level.
Since our t value is lower than the critical value, we do not reject the null hypothesis. This means that there is no statistically significant difference between the samples from the two architectures.
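The decision rule applied above can be sketched in a few lines. This is a minimal illustration assuming the standard pooled (equal-variance) two-sample t-test; the function names are our own, and the default critical value of 2.021 is the one quoted in the text for 38 degrees of freedom, so other sample sizes would need their own critical value.

```python
import math
from statistics import mean, variance
from typing import Sequence


def pooled_t(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample t statistic assuming equal variances."""
    n1, n2 = len(x), len(y)
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def reject_null(x: Sequence[float], y: Sequence[float],
                critical: float = 2.021) -> bool:
    """Two-tailed decision: reject H0 when |t| exceeds the critical value."""
    return abs(pooled_t(x, y)) > critical
```

Given the two columns of degradation values, `reject_null(x, y)` returning `False` corresponds to the outcome reported here, namely that the null hypothesis is not rejected.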

Conclusion
Based on our analysis, we conclude that there is no significant difference in the scalability of web services built with ROA using Java and their equivalent in PHP. However, it was observed from the literature reviewed that web service developers are now tilting towards the use of PHP for developing web services due to its popularity and the availability of several frameworks. This paper did not delve deeply into some of the areas that are considered novel in the general area of web services, but this work is a foundation on which such areas will be studied in our future research. Such areas include the following, among others: (1) Building web services as microservices.
(2) Docker and Kubernetes (container orchestration) for large-scale management of web services deployed in containers.
(3) Web service security. Web services, being distributed across different platforms on the internet, face the challenge of security.