Capture-Recapture Techniques in Estimating the HIV/Hepatitis Prevalence in Gboko, Benue State, Nigeria

The capture-recapture technique is used to estimate the number of individuals in a population. This study aimed to estimate the number of individuals with HIV/Hepatitis co-infection in Gboko Local Government Area of Benue State, in order to support awareness campaigns and the planning of control and prevention measures. The data came from three sources: General Hospital Gboko (GH), Myom Hospital Gboko (MH) and Nongo Kristu u ser u Sha Tar (NKST) Hospital Mkar. Between 2015 and 2018, 1,205 HIV/Hepatitis patient registrations were recorded in Gboko Local Government Area: 662 (54.94%) at GH, 357 (29.63%) at MH and 186 (15.44%) at NKST. Matching the records revealed that 62 co-infected patients were captured by more than one data source: 31 (50%) by GH and MH, 15 (24.19%) by GH and NKST, 5 (8.07%) by MH and NKST, and 11 (17.74%) by all three hospitals. The three health facilities were treated as three-source capture-recapture (C-R) data to estimate the size of the HIV/Hepatitis co-infected population in the study area. The data were analyzed using record linkage/matching and log-linear models fitted in the R statistical package. The best-fitting model was model (4), GH+MH+NKST+NKST*GH+NKST*MH, which estimated 7182 HIV/Hepatitis co-infected patients (95% confidence interval, 5348-10128). Total coverage stood at 16%, which implies that only 16% of the HIV/Hepatitis co-infected persons in the study area are recorded. Expanded data collection and analysis is therefore recommended.


Introduction
Statistics deals with the collection, organization, summarization, presentation and analysis of data so that valid conclusions can be drawn and reasonable decisions made on the basis of that analysis; it also covers the planning of data collection through the design of surveys and experiments. Two main statistical methods are used in data analysis: descriptive statistics and inferential statistics. Descriptive statistics organizes, summarizes and presents data in a convenient and informative way; it describes the data set being analyzed but does not allow conclusions to be drawn beyond it. Inferential statistics is the process of making an estimate, prediction or decision about a population based on a sample (Romijn, 2014).
The capture-recapture technique, originally developed for animal studies, has been applied to human populations under the term multiple-record system for an arbitrary number of lists, and under the term dual-system for two lists. The earliest references to such applications can be traced back to Petersen, in studies of marine fishes, and to Lincoln, who in 1930 used the method to study waterfowl populations; in wildlife applications it is often referred to as the Lincoln-Petersen estimator (Laska, 2002). The method was originally designed to estimate animal populations that are difficult to enumerate. Early applications of the method to human populations include Sekar and Deming (1949) for two samples, Wittes and Sidel (1968) for three samples, Fienberg (1972) for five samples and Wittes (1974) for four samples. The method involves marking a number of individuals in a natural population, releasing them back into the population, and subsequently recapturing some of them as a basis for estimating the size of the population at the time of marking and release. It rests on the principle that if a proportion of the population is marked in some way and returned to the original population, and a second sample is taken after complete mixing, then the proportion of marked individuals in the second sample will be the same as the proportion marked initially in the total population. The population of interest may be considered "closed" or "open".
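The proportionality principle described above leads to the two-sample Lincoln-Petersen estimator. A minimal Python sketch follows; the function name and the counts are illustrative assumptions, not data from this study.

```python
def lincoln_petersen(marked_first, caught_second, recaptured):
    """Two-sample Lincoln-Petersen estimate of population size.

    Assumes a closed population, so that the marked fraction in the
    second sample (recaptured / caught_second) equals the marked
    fraction in the whole population (marked_first / N).
    """
    if recaptured <= 0:
        raise ValueError("at least one recapture is needed")
    return marked_first * caught_second / recaptured


# Hypothetical counts chosen only for illustration.
estimate = lincoln_petersen(marked_first=100, caught_second=80, recaptured=20)
print(estimate)
```

In this made-up case a fifth of the population was marked (20 of the 80 animals in the second sample carry a mark), so the 100 marked animals imply a population of about 400.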
A closed population is one in which the total number of individuals does not change through birth, death or migration, so that the population size remains constant over the capture period. A typical closed-population capture-recapture procedure assumes that the interval between the preliminary capture period and the subsequent recapture period does not affect the proportion of marked to unmarked animals (that is, no new individuals are born or immigrate into the population, and none die or emigrate). It also assumes that the chance of being caught is equal for every individual in the population and constant across the initial marking period and the recapture period. Further assumptions are that the animals do not lose their marks and that sufficient time is allowed between the initial marking period and the recapture period for all marked individuals to disperse randomly throughout the population. In an open population, births, deaths and migration are allowed to continue during data collection. Open-population methods allow estimation of changing population sizes, survival rates and the number of individuals entering the population (Armstrup & McDonald, 2001).
There are two types of capture-recapture models: discrete-time and continuous-time models. In a typical discrete-time model, the target population is sampled several times (over a certain number of occasions). After each sampling, every captured individual is checked for a mark and recorded as either a first capture or a recapture. A unique tag or mark is attached to each first capture, whereas for a recapture the existing tag number is recorded. A continuous-time model records, in addition to the tagging information, the exact capture times for each animal. Every capture is regarded as a trapping occasion, and the exact time of each occasion is recorded (Hwang and Chao, 2002).
The purpose of this research is to estimate the number of individuals with HIV/Hepatitis co-infection in the study area, in order to facilitate planning of health services for this population. In particular, the study estimates the size of the population living with HIV/Hepatitis that is neither diagnosed nor registered with any of the available data sources.

Review of Related Literature.
Capture-recapture (C-R), or mark-recapture, is a method for estimating population size and other parameters based on ratios of marked to unmarked individuals. Biologists and ecologists have long recognized that the recapture information (i.e. overlap information) collected by marking or tagging can be used to estimate the number missing from all the samples. Seber (1982, 1986, 1992) and Schwarz and Seber (1999) provided comprehensive reviews of models for estimating animal abundance, and of capture-recapture models in particular. The purpose of many epidemiological surveillance studies is to estimate the size of a population characterized by a rare trait by merging several existing but incomplete lists of the target population.
Yamusa and Jibasen (2017) performed a three-source capture-recapture estimate of the number of HIV-positive pregnant women in Gembu, Taraba State, Nigeria. A log-linear model was used to analyze the capture-recapture data. Their results showed that the estimated completeness of the three sources was 14%, with the total number of HIV-positive pregnant women estimated at 4372 (95% CI [2274-9797]). The validity of the results was tested and found to agree with results obtained from existing alternative models. The authors concluded that the capture-recapture procedure has the potential to provide reliable and reasonably accurate estimates for HIV-positive pregnant women and other difficult-to-access populations.
Frank et al. (2015) used a two-source capture-recapture technique to estimate the prevalence of TB-HIV co-infection in the Netherlands and its under-reporting in national registration databases. Their study revealed that, out of 932 TB-HIV infected patients, 293 (31.4%) were registered in both registers. Underreporting of TB-HIV co-infection ranged from 50% to 70% in the national TB register, and from 31% to 37% in the HIV database. The study concluded that TB-HIV co-infection is markedly underreported in national disease databases and that there is an urgent need for improved registration and, preferably, routine data exchange between the two surveillance systems.
Jala, Younes and Farzad (2017) applied a three-source capture-recapture technique to an HIV-positive population. Three incomplete sources of HIV-positive individuals, with partially overlapping data, were used: (a) a transfusion center, (b) volunteer counseling and testing centers (VCTCs), and (c) a prison. Out of the 2,456 HIV-positive patients registered in these three data sources, 1,175 (47.8%) were identified in the transfusion center, 867 (35.3%) in VCTCs, and 414 (16.8%) in prison. After the exclusion of duplicate entries, 2,281 HIV-positive patients remained. Based on the capture-recapture method, 14,868 (95% confidence interval, 9,923 to 23,427) HIV-positive individuals were not identified in any of the registries.
Ligia et al. (2013) estimated the number of HIV-positive pregnant women in the state of Sergipe, Brazil, using capture-recapture. The data covered 2000 to 2010, and three databases were used as independent lists: the Brazilian case registry database (SINAN), the laboratory test control system (SISCEL) and the medical records of the STD/HIV/AIDS service of Sergipe (CEMAR). A log-linear regression model was used to ascertain the population size. The study identified 729 HIV sero-positive pregnant women from the three lists; of these, only 317 (43.5%) were included in SINAN, 646 were included in SISCEL and 274 (37.6%) appeared in the CEMAR database. The estimated total was 1110 HIV sero-positive pregnant women, so 381 (34.3%) were not captured by any of the three systems.
Lucy et al. (2016) conducted a global systematic review and meta-analysis of the prevalence and burden of HCV co-infection in people living with HIV. From 31,767 citations identified, 783 studies met the inclusion criteria, resulting in 902 estimates of the prevalence of HIV-HCV co-infection. Among HIV-infected individuals, HIV-HCV co-infection was 2.4% (IQR 0.8-5.8) within general population samples, 4.0% (1.2-8.4) within pregnant or heterosexually exposed samples, 6.4% (3.2-10.0) in men who have sex with men (MSM), and 82.4% (55.2-88.5) in people who inject drugs (PWID). The odds of HCV infection were six times higher in people living with HIV than in their HIV-negative counterparts (odds ratio 5.8, 95% CI 4.5-7.4). Worldwide, there are approximately 2,278,400 HIV-HCV co-infections (IQR 1,271,300-4,417,000), of which 1,362,700 (847,700-1,381,800) are in PWID, equaling an overall co-infection prevalence in HIV-infected individuals of 6.2% (3.4-11.9).

Materials and Methods
Capture-recapture was applied to the records of HIV/Hepatitis co-infected patients from 2015 to 2018 in Gboko Local Government Area of Benue State, Nigeria. The three major health facilities (hospitals) in the study area were considered: NKST Hospital Mkar (NKST), General Hospital Gboko (GH) and Myom Hospital Gboko (MH). The records from the three hospitals were matched to identify cases common to more than one register. The variables considered for the matching were: (1) surname, (2) first name and other names, (3) date of birth or age, (4) gender, and (5) individual location or geographic area. An individual was considered to belong to two lists, or to all three, if any three of these variables matched correctly between the lists.
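The matching rule can be sketched in Python as follows. This is only an illustration under stated assumptions: the rule is read here as "at least three of the five variables agree", the field names and the `min_agreeing` threshold are hypothetical, and the two records are invented rather than taken from any hospital register.

```python
FIELDS = ("surname", "other_names", "birth_or_age", "gender", "location")


def normalise(value):
    """Lower-case and strip a field so trivial differences do not block a match."""
    return str(value).strip().lower()


def is_same_person(rec_a, rec_b, min_agreeing=3):
    """Return True when at least `min_agreeing` fields agree (assumed rule)."""
    agreeing = sum(
        1
        for field in FIELDS
        if rec_a.get(field) and rec_b.get(field)
        and normalise(rec_a[field]) == normalise(rec_b[field])
    )
    return agreeing >= min_agreeing


# Hypothetical records, invented for illustration only.
gh_record = {"surname": "Doe", "other_names": "Jane A.", "birth_or_age": "1990",
             "gender": "F", "location": "Gboko"}
mh_record = {"surname": "doe ", "other_names": "Jane", "birth_or_age": "1990",
             "gender": "f", "location": "Gboko"}

print(is_same_person(gh_record, mh_record))
```

Here four of the five fields agree after normalisation, so the two records are treated as the same patient. A stricter threshold reduces false matches at the cost of missing true ones, which in turn inflates the population estimate.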
The analysis was performed using the public-domain R statistical package (R Foundation for Statistical Computing, Vienna, Austria). The model incorporates the expected value of the random variable, the total population size, the probabilities of being on each list, and an interaction parameter that accounts for list dependency. The analysis produced an estimate of the population size, its confidence interval and the Akaike Information Criterion (AIC), which was used to select the most appropriate model. The log-linear model technique is commonly used for epidemiological data. Log-linear models that incorporate list dependence were first proposed by Fienberg (1972) for dealing with human populations. The general log-linear model for three categorical variables is given as,

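$$\log m_{ijk} = \mu + \lambda_i^{1} + \lambda_j^{2} + \lambda_k^{3} + \lambda_{ij}^{12} + \lambda_{ik}^{13} + \lambda_{jk}^{23} + \lambda_{ijk}^{123} \tag{1}$$
where
$\log m_{ijk}$ is the log of the expected cell frequency of the cases for cell $(i, j, k)$ in the contingency table;
$\mu$ is the overall mean of the natural log of the expected frequencies;
$\lambda_i^{1}$ is the main effect for variable 1;
$\lambda_j^{2}$ is the main effect for variable 2;
$\lambda_k^{3}$ is the main effect for variable 3;
$\lambda_{ij}^{12}$ is the interaction effect for variable 1 and variable 2;
$\lambda_{ik}^{13}$ is the interaction effect for variable 1 and variable 3;
$\lambda_{jk}^{23}$ is the interaction effect for variable 2 and variable 3;
$\lambda_{ijk}^{123}$ is the interaction effect for variable 1, variable 2 and variable 3.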

Tables.
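A three-way $I \times J \times K$ cross-classification of response variables 1, 2 and 3 has several potential types of independence. We assume a multinomial distribution with cell probabilities $\pi_{ijk}$, where $\sum_i \sum_j \sum_k \pi_{ijk} = 1$, and write $m_{ijk}$ for the expected cell frequencies. The three variables are mutually independent when
$$\pi_{ijk} = \pi_{i++}\,\pi_{+j+}\,\pi_{++k} \quad \text{for all } i, j \text{ and } k. \tag{2}$$
For expected frequencies, mutual independence has the log-linear form
$$\log m_{ijk} = \mu + \lambda_i^{1} + \lambda_j^{2} + \lambda_k^{3}. \tag{3}$$
Variable 2 is jointly independent of variable 1 and variable 3 when
$$\pi_{ijk} = \pi_{i+k}\,\pi_{+j+} \quad \text{for all } i, j \text{ and } k, \tag{4}$$
which is the log-linear model
$$\log m_{ijk} = \mu + \lambda_i^{1} + \lambda_j^{2} + \lambda_k^{3} + \lambda_{ik}^{13}. \tag{5}$$
Similarly, variable 1 could be jointly independent of variable 2 and variable 3, or variable 3 could be jointly independent of variable 1 and variable 2. Mutual independence (2) implies joint independence of any one variable from the others.
Variables 1 and 2 are conditionally independent, given 3, when independence holds for each partial table within which 3 is fixed. That is, if $\pi_{ij|k}$ denotes the probability of cell $(i, j)$ in the partial table at level $k$ of variable 3, then $\pi_{ij|k} = \pi_{i+|k}\,\pi_{+j|k}$ for all $i$, $j$ and $k$. For joint probabilities over the entire table, equivalently
$$\pi_{ijk} = \frac{\pi_{i+k}\,\pi_{+jk}}{\pi_{++k}} \quad \text{for all } i, j \text{ and } k. \tag{6}$$
Conditional independence of 1 and 2, given 3, is the log-linear model
$$\log m_{ijk} = \mu + \lambda_i^{1} + \lambda_j^{2} + \lambda_k^{3} + \lambda_{ik}^{13} + \lambda_{jk}^{23}. \tag{7}$$
This is a weaker condition than mutual or joint independence. Mutual independence implies that 2 is jointly independent of 1 and 3, which itself implies that 1 and 2 are conditionally independent. Table 1 summarizes these three types of independence (Agresti, 2002). The main effects, two-way interaction effects and three-way interaction effect included in each model are presented in Table 2.
For model selection, the Akaike information criterion (AIC) was applied. The AIC was calculated as follows:
$$\mathrm{AIC} = G^{2} + 2q \tag{8}$$
where $G^{2}$ is the likelihood ratio statistic associated with the fit of any model to the data, and $q$ is the number of estimated parameters included in the model (i.e. the number of variables and the intercept).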
Here, variable 1 represents data collected from General Hospital Gboko (denoted GH), variable 2 represents data collected from Myom Hospital (denoted MH), and variable 3 represents data collected from NKST Hospital Mkar (denoted NKST). Each variable has two levels, $i, j, k = 1, 2$: level 1 stands for patients who tested positive for HIV/Hepatitis co-infection, while level 2 is for negative results.

Results and Discussion
A total of 1,205 HIV/Hepatitis co-infection registrations were found in the three data sources: 662 (54.94%) at General Hospital Gboko, 357 (29.63%) at Myom Hospital Gboko and 186 (15.44%) at NKST Hospital Mkar. After duplicate entries were removed, 1,132 distinct HIV/Hepatitis co-infected patients remained, of whom 62 were registered in more than one data source. The matching showed that 605 patients were captured by GH only, 310 by MH only and 155 by NKST only; 31 were captured by GH and MH only, 15 by GH and NKST only, and 5 by MH and NKST only, while 11 patients were captured by all three data sources. The data were arranged as shown in Figure 1.
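The counts above can be cross-checked by tabulating them as capture histories. The Python sketch below uses only the figures reported in this section; the dictionary layout and variable names are illustrative choices.

```python
# Capture histories as (GH, MH, NKST); 1 = on the register, 0 = not on it.
counts = {
    (1, 0, 0): 605,
    (0, 1, 0): 310,
    (0, 0, 1): 155,
    (1, 1, 0): 31,
    (1, 0, 1): 15,
    (0, 1, 1): 5,
    (1, 1, 1): 11,
}

sources = ("GH", "MH", "NKST")

# Registrations per hospital: every history in which that hospital appears.
per_source = {
    name: sum(n for history, n in counts.items() if history[i] == 1)
    for i, name in enumerate(sources)
}
registrations = sum(per_source.values())   # counts a patient once per register
distinct = sum(counts.values())            # counts each patient once
multiple = sum(n for history, n in counts.items() if sum(history) >= 2)

print(per_source)
print(registrations, distinct, multiple)
```

The per-hospital totals come out as 662, 357 and 186, which sum to the 1,205 registrations; the seven histories sum to 1,132 distinct patients, of whom 62 appear on more than one register.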

Capture-recapture model selection
Model fit was assessed using the deviance statistic, and the model with the smallest Akaike information criterion (AIC) was selected. Model (4), GH+MH+NKST+NKST*GH+NKST*MH, was the best fit, with the smallest AIC (81.43) and the smallest deviance (29.16); it gave an estimated population size of 7182 (95% confidence interval, 5348-10128).
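Because model (4) contains no GH*MH term, GH and MH are treated as independent within each level of NKST. Under that assumption the number of patients missed by all three hospitals has a simple closed form, which can be checked by hand. The Python sketch below is such a hand check, not the R analysis run in this study; the variable names are illustrative, and it does not attempt to reproduce the AIC or the confidence interval.

```python
import math

# Observed counts from the record matching.
n_gh_only, n_mh_only, n_gh_mh = 605, 310, 31                # not on the NKST register
n_nkst_only, n_gh_nkst, n_mh_nkst, n_all = 155, 15, 5, 11   # on the NKST register
n_observed = 1132

# Stratum "not on NKST": independence of GH and MH gives the unseen cell.
unseen = n_gh_only * n_mh_only / n_gh_mh
population = n_observed + unseen

# Stratum "on NKST": all four cells are observed, so independence of GH and MH
# can be checked with a likelihood ratio statistic (G^2) on the 2x2 table.
table = [[n_all, n_gh_nkst],         # GH yes: MH yes, MH no
         [n_mh_nkst, n_nkst_only]]   # GH no:  MH yes, MH no
total = sum(map(sum, table))
row = [sum(r) for r in table]
col = [sum(c) for c in zip(*table)]
g2 = 2 * sum(
    table[i][j] * math.log(table[i][j] * total / (row[i] * col[j]))
    for i in range(2)
    for j in range(2)
)

print(unseen, population, round(g2, 2))
```

With the reported counts, the unseen cell is 605 x 310 / 31 = 6050, giving a total of 7182, which agrees with the estimate quoted for model (4). The G^2 value printed by the sketch can be compared with the reported deviance.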

Estimated Recaptures
The analysis of the recaptured individuals is presented in Table 2.

Discussion and Findings
Estimating the number of individuals with HIV/Hepatitis is essential. The capture-recapture approach used in this research was first used in wildlife research more than a century ago (Tilling, 2001). The technique can be used to estimate the size of rare and elusive populations, which are difficult to find and count, or which are highly mobile and cannot be counted at one time. Hence, the method has been applied to estimate the size of hidden or hard-to-reach human populations. This study used the method to estimate the number of HIV/Hepatitis co-infected individuals in Gboko, Benue State.
The findings of this study are that a total of 62 HIV/Hepatitis co-infection patients were registered in more than one data source (health facility), all in Gboko town. GH accounted for the largest share of registrations (54.94%), followed by MH (29.63%), while NKST recorded the least (15.44%). Further analysis of the recaptures revealed that GH and MH shared 50% of all recaptures, GH and NKST 24.19%, and MH and NKST 8.07%, while 17.74% of recaptured patients appeared in all three data sources. This indicates that GH and MH had the highest number of multiply registered patients in the study area.
The main finding is an estimated population of about 7182 HIV/Hepatitis co-infected persons in the study area between 2015 and 2018. Thus, about 6050 were not identified by any of the data sources: only about 16% were registered while 84% were not identified, which indicates a serious public health problem.
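The coverage figure follows directly from the estimate and the number of distinct registered patients. A short Python sketch of the arithmetic is given below, using only numbers reported in this paper; the coverage range implied by the confidence limits is a derived illustration, not a result stated by the study.

```python
estimated_total = 7182          # point estimate from model (4)
ci_low, ci_high = 5348, 10128   # reported 95% confidence limits
registered = 1132               # distinct patients after de-duplication

unseen = estimated_total - registered
coverage = registered / estimated_total

print(unseen)
print(f"coverage: {coverage:.1%}")
# Coverage implied by the two confidence limits (a larger N means lower coverage).
print(f"range: {registered / ci_high:.1%} to {registered / ci_low:.1%}")
```

The difference is 6050 unregistered patients, and the coverage rounds to the 16% reported above.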

Recommendation and conclusion
Based on the results of the analysis, the capture-recapture approach can be used to estimate the size of populations that are rare, elusive, or difficult to access and count. The following are therefore recommended:
i. Data collection and analysis should be expanded for all epidemic cases, because only 16% data coverage was achieved in this case.
ii. The Revised National HIV/AIDS Strategic Framework 2019-2021 should be pursued vigorously.
iii. The Nigeria National HIV/AIDS Indicator and Impact Survey (NAIIS) should include other co-infections such as hepatitis, and statistical tools such as capture-recapture methods should be employed before conclusions are drawn.
iv. Implementation strategies should be put in place for regular diagnosis, in order to reduce or control the rate of transmission of the disease.