Predicting Life Expectancy:
A Cross-Country Empirical Analysis
Department of Economics CUB 256
University of Colorado
Boulder, CO 80309-0256
Philip E. Graves
Department of Economics CUB 256
University of Colorado
Boulder, CO 80309-0256
*We would like to acknowledge Ann Carlos, Nicholas Flores, and Donald
Waldman for helpful comments on an earlier draft, while absolving them
from responsibility for any remaining errors.
Most economic research on life expectancy focuses on building forecasting
models using mortality trends or constructing parameter life expectancy
models with samples of individuals. We here provide a cross-sectional model
of life expectancy, using a comprehensive worldwide sample, which analyses
the impact of country level variables on average life expectancy. The model
variants suggest robustly that proxies for technology, education, disposable
income and healthcare all have a significant and positive effect on country
variation in average life expectancy, at all income levels. A proxy for
the health risks/epidemics factor is significantly negative. This analysis
provides information of use to governments, particularly in the developing
world, since average life expectancy is predicted with high explanatory
power by variables that can be influenced through public policy. Indeed,
it is seen that quite low-cost policy interventions can have dramatic impacts
on life expectancy, in addition to other benefits of those interventions.
I. Introduction
Nearly 500 years after Juan Ponce de Leon's search for the Fountain of Youth in 1513, the general public as well as researchers in many disciplines (medicine, biology, political science, economics, ethics, sociology, and epidemiology, among others) remain fascinated with prolonging life and avoiding death. Recent opinions as to the extent to which prolonging life expectancy is possible are quite divided. Olshansky et al. (2001) take the view that combined male-female life expectancy at birth is unlikely to exceed 90 years in the 21st century without scientific advances allowing modification of fundamental aging processes. However, progress is clearly attainable in the developing world, where life expectancy is usually constrained to 45-55 years, through alterations in socioeconomic and medical conditions, as discussed here. More surprisingly, Oeppen and Vaupel (2002), in a review of longevity data from many developed countries, find that life expectancy increases steadily by three months per year. These authors suggest that there is no natural limit on life expectancy.
One approach to gaining insight into this area is to examine the impact of variables (or variable proxies) that make life expectancy differ among countries of varying levels of development. Projections of future life expectancy for particular countries, based on predicted values of the independent variables from such a cross-sectional model, might well be useful if the Oeppen and Vaupel view is correct. However, given the less optimistic scenario of Olshansky et al., one would be hesitant to extrapolate life expectancy growth for countries with high life expectancy, as the effects of the various independent variables would be unlikely to remain constant, with the extent of non-linearity largely unknown.
In Section II, we briefly discuss some approaches to understanding life expectancy found in the existing literature. Section III describes the data for the analysis undertaken here, and Section IV presents the empirical results. Section V concludes the paper with a summary and interpretation of the results along with suggestions for how the analysis can inform government policy-making.
II. Prior Work
Life expectancy is an extremely broad subject, hence the present discussion must be selective. In biology, for example, great emphasis is placed on telomere shortening, oxidative damage, and altered gene expression. Breakthroughs in any of these areas could have profound worldwide effects on life expectancy.(1) Similarly, hotly debated bioethical concerns revolve around stem cell research and somatic and germ cell line interventions to eliminate genetic diseases thus extending life expectancy. Political concerns join these bioethical issues with debates relating to population demographics (e.g. viability of the social security system), overpopulation in developing regions, and the labor force implications of the higher dependency ratios associated with greater life expectancy.
Considerable work has been conducted on the potential lifespans of human beings (see, for example, Carnes, et al. 1996 and Finch and Kirkwood 2000). Closely related are many studies of life expectancy (e.g. Manton, Stallard, and Tolley 1991, Olshansky et al. 2001, and Olshansky et al. 1990). The more traditional view is that life expectancy is unlikely to grow to more than 90 to 100 years, at best, and to obtain the upper end of this range would involve eliminating all current aging-related causes of death.
The visionary group agrees that mortality is mostly due to senescence, but they adjust their estimated value for maximal life expectancy for projected biomedical advances that could potentially raise the limit to 100-125 years. Lastly, the empiricist approach suggests that "we are not currently near a life expectancy limit, because mortality is declining and progress is being made in the treatment and management of the chronic diseases and disabilities that dominate mortality at later ages" (Manton, et al. 1991, 603).
Tuljapurkar and Boe (1998) review the development of mortality, the deviations in mortality variables, and different methods of forecasting mortality. Their analysis is partially on a worldwide scale, but is also more specifically focused on Canada, the United States, and Mexico. They find that although short-term mortality rates are very erratic, over long time spans mortality rates tend to decrease. Since mortality and life expectancy are inversely related, as mortality decreases, life expectancy increases in a similarly irregular manner.
Tuljapurkar and Boe (1998) also briefly review the history of mortality research. On a larger scale, of many countries or over many decades, research has focused on the causes and correlates of mortality. On a smaller scale, studies have focused primarily on details of mortality change, usually with respect to age, sex and causes of death. Previous studies indicate that the causes of mortality have shifted from short-term bacterial diseases that arose mostly from poor sanitation to chronic, long-term illnesses such as cancer, AIDS, and heart disease. In developing countries where bacterial diseases are still prevalent, the average life span is often constrained to about 45-55 years. On the other hand, in more developed countries where chronic diseases have taken over as the main causes of death, the average life expectancy usually reaches at least seventy years.
The mortality decline in the nineteenth and early twentieth century in the developed world resulted mainly from medical advances, public education, and importantly, a rising standard of living. The rising standard of living tends to result in more adequate sanitation facilities, cleaner water, greater access to medical care and many other facets that decrease bacterial and viral diseases (see Rogers and Wofford 1989).(2)
Tuljapurkar and Boe (1998) also examine the deviation in mortality across groups, such as by gender, marital status, education, race and ethnicity. They find widely varying mortality rates among these subgroups. However, these authors do point out that much of the variation may be due to variation in the prevalence of common or endogenous risk factors among the subgroups. For example, men normally consume more cigarettes, work in occupations with more hazards, and exhibit and die from violence to a greater extent than do women. Marital status also reflects health as a risk factor, in that people who are healthier are more likely to be married than are sick people. Thus, subgroup mortality variation may represent much more than variation by gender, race, or marital status; to some degree, such variation reflects health, cigarette consumption, and other factors.(3)
The existence of many models and forecasting methods for life expectancy reflects the economic, political and social importance of this subject. We contribute to this literature by studying the impacts of country level variables on the variation of the countries' average life expectancy for a large (158) sample of countries at many stages of development.
III. Data
Countless factors affect life expectancy at the individual level (e.g. nutrition, exercise, income, education, risk-taking, and stress). Even in studies employing large numbers of individuals the full gamut of potential variables cannot be properly measured and controlled. Hence, using countrywide data might offer the advantage that many variables that are important at an individual level may "wash out" in the process of aggregation. The analogy to the consumption function literature in economics is instructive: individuals have transitory components to their observed income, and those transitory components obscure the relationship between permanent or typical income and consumption behavior. By grouping people of like income together and aggregating to create average groupwise income and consumption, those transitory components (unusually high incomes for some and unusually low incomes for others) can be eliminated. Thus, countrywide analysis of life expectancy offers some advantages, and a comparison of results of this approach with other methods is fruitful.
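The "washing out" argument can be illustrated with a small simulation. The sketch below is not drawn from the data used in this paper: every number in it (group count, variances, the slope of 0.8) is a hypothetical choice, and the grouping is by an arbitrary unit such as a country rather than by measured income. It generates an outcome that depends only on the permanent component of income, then compares the regression slope estimated from individual observations with the slope estimated from group averages.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n_groups=50, n_per_group=200, true_slope=0.8, seed=0):
    """Return (individual-level slope, group-average slope) on synthetic data."""
    rng = random.Random(seed)
    ind_x, ind_y, grp_x, grp_y = [], [], [], []
    for _ in range(n_groups):
        group_level = rng.gauss(100.0, 20.0)          # typical income of the group
        xs, ys = [], []
        for _ in range(n_per_group):
            permanent = group_level + rng.gauss(0.0, 5.0)
            transitory = rng.gauss(0.0, 30.0)         # averages out within a group
            observed = permanent + transitory
            # The outcome depends on permanent income only (hypothetical relation).
            outcome = 10.0 + true_slope * permanent + rng.gauss(0.0, 2.0)
            xs.append(observed)
            ys.append(outcome)
        ind_x += xs
        ind_y += ys
        grp_x.append(statistics.fmean(xs))
        grp_y.append(statistics.fmean(ys))
    return ols_slope(ind_x, ind_y), ols_slope(grp_x, grp_y)


individual_slope, group_slope = simulate()
print(f"individual-level slope: {individual_slope:.3f}")
print(f"group-average slope:    {group_slope:.3f}")
```

Under these assumptions the individual-level slope is pulled well below 0.8 by the transitory noise, while the slope across group averages lies close to it, which is the sense in which aggregation can help.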
Principal categories of variables that influence life expectancy are technology, education, disposable income, urbanization, inequality, healthcare, and health risks/epidemics. The general categories must, however, be proxied by specific, measured variables and such proxies are seldom perfect. Some categories (e.g. technology) might require more than one variable for adequate representation. In other cases a proxy variable might represent more than one category, possibly making it difficult to independently estimate coefficient effects (e.g. GDP per capita will reflect technology, urbanization, and healthcare at a minimum). The proxies discussed below are formally defined in Table 1 and the summary statistics for the proxy variables are given in Table 2.
Since "technology" is a broad and encompassing factor, it is subdivided here into two groups: basic and advanced. Basic technology consists of inventions that reach most places in the world and permeate the developed countries. Since these inventions are usually relatively old, they are both ubiquitous and inexpensive or, if expensive, they are supplied by most governments. Basic technology, which should have a positive effect on life expectancy, is proxied, for present purposes, by the percent of the population with access to adequate sanitation facilities and an improved water source. Advanced technology, which should also have a positive effect on life expectancy, consists of recent inventions, many of which are not widely distributed worldwide, mostly due to high costs and interface issues. GDP per capita is expected, among other effects, to partially capture the effects of advanced technology on life expectancy.
Education should have a positive effect on life expectancy because as education increases so does the knowledge of how to lead a healthier life. This knowledge might, for example, take the form of improved nutrition or reduced exposure to various health risks, such as indoor pollution exposures. Education is measured by the education index.
Disposable income is expected to be positively related to life expectancy for diverse reasons. As disposable income increases people have more resources for better shelter, food, and medical care. Again, countrywide data might offer some advantages over individual data: a wealthy person living in a poor country is unlikely to have the same access to quality food and medicine as a wealthy person living in a wealthy country. Since income is highly correlated with many other categories that would affect life expectancy (e.g. education, technology), it is also held constant to estimate, without bias, the specific effects of those variables.
Countries with similar average GDP per capita often differ widely in the dispersion of income among individuals, and hence in inequality. Holding GDP per capita constant, greater inequality should have a negative effect on life expectancy. This expectation stems from the diminishing returns of income to health. People who have extremely low incomes are much more likely to have compromised health due to poor nutrition and sanitation, a limited supply of basic vaccines, and so on. Since a country with high inequality may have a large poor population and a small wealthy population, the number of people dying at young ages greatly outweighs the number of people dying at old ages in determining the aggregate life expectancy. The Lorenz curve would be the better proxy for inequality but, due to lack of data, the Gini coefficient is used instead.
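The diminishing-returns argument can be checked numerically. The sketch below is illustrative only: the logarithmic income-health relation and its parameters `a` and `b` are hypothetical, as are the two ten-person income distributions, which share the same mean income. It also computes a Gini coefficient on the 0-100 scale used in the variable definitions.

```python
import math


def gini(incomes):
    """Gini coefficient on a 0-100 scale (0 = complete equality)."""
    xs = sorted(incomes)
    n = len(xs)
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 100.0 * (2.0 * weighted / (n * sum(xs)) - (n + 1.0) / n)


def life_expectancy(income, a=20.0, b=5.0):
    """Hypothetical concave (logarithmic) income-health relation; a and b are made up."""
    return a + b * math.log(income)


def mean_life_expectancy(incomes):
    """Average of individual life expectancies implied by the relation above."""
    return sum(life_expectancy(x) for x in incomes) / len(incomes)


equal = [5000.0] * 10                       # everyone at the mean income
unequal = [1000.0] * 8 + [21000.0] * 2      # same mean income of 5000

for label, incomes in (("equal", equal), ("unequal", unequal)):
    print(label, round(gini(incomes), 1), round(mean_life_expectancy(incomes), 2))
```

With any concave relation of this kind, the more unequal distribution yields the lower average life expectancy at the same mean income, which is why a negative sign is expected on the inequality proxy.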
Urbanization might seem to have a positive effect on life expectancy because as urbanization increases so does the accessibility, other things equal, to life-extending resources such as doctors, medicine, and education. However, urbanization also exerts negative effects such as stress, pollution, and congestion (with more rapid spread of infectious disease). Thus other things equal, the effect of urbanization depends on the net impact of various positive and negative effects.
As the quality of healthcare increases, ceteris paribus, so should the average life expectancy. Although this category is partly reflected in income, education, and perhaps urbanization, the specific proxies for healthcare employed here are the number of physicians per 100,000 people and the percentage of the population that has access to essential medicines. These variables are seen here to have important independent effects.
People engage in many risky behaviors that compromise their health status and longevity. One of the foremost health risks today is smoking, with per capita tobacco consumption being the variable of choice to capture the impact of this risk factor on life expectancy. Unfortunately, country data do not exist for total tobacco consumption. While the measure used here, the number of cigarettes consumed annually per capita, exists for each country, this measurement omits some forms of tobacco consumption such as pipe smoking and hand-rolled cigarettes. It is likely, however, that non-cigarette consumption of tobacco is highly correlated with cigarette consumption, particularly when other variables, notably income, are controlled.
A second risky behavior with potentially enormous implications for life expectancy is that of unprotected sex in the presence of the AIDS virus. This variable is of particular importance in that it strikes predominantly people in the younger age groups that would not otherwise be likely to die for many years. This is in contrast to the chronic diseases of aging, such as heart disease and cancer, whose cures would not dramatically affect overall life expectancy. We account for AIDS by the percent of adults living with AIDS or HIV.
The variables described above were acquired from the Human Development Report (HDR 2001). The HDR, which is commissioned by the United Nations Development Programme (UNDP), supplies numerous statistics for 162 countries. Twelve variables for 158 countries were extracted from this source.(4) The descriptive statistics of the variables are provided, as previously noted, in Table 2, while Table 3 provides a variable key for the empirical analysis of the following section.(5)
IV. Empirical Findings
Two different models were used to explain the variation in average life expectancy among the sample countries. Model one, the "parsimonious" model is composed of the variables that proved to be of primary importance in predicting life expectancy:
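\[ LE_i = b_0 + b_1\,\mathit{leduc}_i + b_2\,\mathit{lh2o}_i + b_3\,\mathit{ldrug}_i + b_4\,\mathit{lmeds}_i + b_5\,\mathit{lGDP}_i + b_6\,\mathit{aids}_i + b_7\,\mathit{aids2}_i + e_i \]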
The natural log of the first five variables helps account for their diminishing marginal return to life expectancy. Sanitation and public health expenditure are left out of this model because they are reasonably proxied by other variables, such as adequate water supply, GDP per capita, and the supply of people in the medical field. Cigarette consumption is not included in model one because it only reached significance in a sub-sample of the more developed countries that usually have a relatively longer life expectancy. The gini coefficient is dropped from model one due to lack of significance, likely because of its inadequacy in measuring inequality vis-à-vis a Lorenz curve. Finally, urbanization is not incorporated in model one because it appears that the positive and negative effects cancelled each other out leaving no significant overall effect, particularly when GDP per capita is included in the regression.
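As an illustration of how a specification of this form can be estimated, the following sketch fits model one by ordinary least squares on synthetic data. It is not the authors' estimation code: the coefficient values in `TRUE_B`, the variable ranges, and the helper names are all hypothetical, the data are generated without an error term so that the fit can be checked exactly, and `aids2` is assumed to denote the square of `aids`. Only the Python standard library is used; a statistics package would normally be preferred in practice.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]    # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def model_one_row(educ, h2o, drug, meds, gdp, aids):
    """Regressors of model one: constant, five logged variables, aids, aids squared."""
    return [1.0, math.log(educ), math.log(h2o), math.log(drug),
            math.log(meds), math.log(gdp), aids, aids ** 2]


# Hypothetical coefficients b0..b7, used only to generate the synthetic sample.
TRUE_B = [30.0, 8.0, 5.0, 2.0, 1.0, 2.5, -1.0, 0.02]


def synthetic_sample(n=158, seed=1):
    """Draw n made-up 'countries' and compute noise-free life expectancy."""
    rng = random.Random(seed)
    rows, y = [], []
    for _ in range(n):
        row = model_one_row(rng.uniform(0.3, 1.0), rng.uniform(20, 100),
                            rng.uniform(15, 100), rng.uniform(3, 550),
                            rng.uniform(450, 43000), rng.uniform(0.0, 10.0))
        rows.append(row)
        y.append(sum(b * v for b, v in zip(TRUE_B, row)))
    return rows, y


rows, y = synthetic_sample()
estimates = ols(rows, y)
for name, est in zip(["const", "leduc", "lh2o", "ldrug", "lmeds", "lGDP", "aids", "aids2"], estimates):
    print(f"{name:6s} {est: .4f}")
```

Because the synthetic outcome contains no noise, the estimates reproduce the hypothetical coefficients; with real data the same regressor construction would apply, but standard errors and diagnostics would also be needed.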
Model two includes the five variables that model one leaves out:(6)
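\[ LE_i = b_0 + b_1\,\mathit{leduc}_i + b_2\,\mathit{lh2o}_i + b_3\,\mathit{ldrug}_i + b_4\,\mathit{lmeds}_i + b_5\,\mathit{lGDP}_i + b_6\,\mathit{aids}_i + b_7\,\mathit{aids2}_i + b_8\,\mathit{cigs}_i + b_9\,\mathit{gini}_i + b_{10}\,\mathit{urbanpop}_i + b_{11}\,\mathit{sanitize}_i + b_{12}\,\mathit{healthpu}_i + e_i \]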
Both models perform exceptionally well. The predicted impact of the variables on life expectancy is robust in all specifications of the models. The adjusted R-squared values are all similar, centering on 0.92. In model one, each of the variables has a positive and significant effect on life expectancy, except for AIDS, which has a significant negative effect (see Table 4).
Five out of the eleven variables used in the regressions were not significant, leading to the model 1-model 2 taxonomy. As previously mentioned, some of these variables are likely to be poor proxies for the desired factor.
Tables 5 and 6 illustrate the results of Table 4 in a way that more readily permits one to gauge the policy implications of the findings. Table 5 shows how a ten-percent increase in an independent variable would impact life expectancy, holding other variables constant, while Table 6 provides the standardized (beta) regression coefficients. As is clear from either table, reducing the percentage of the population with AIDS and increasing the percentage of population with access to improved water sources and medical drugs would result in pronounced improvements in life expectancy--and these improvements could be obtained relatively cheaply. While it is the case that rising income per capita extends life expectancy among its many other benefits, it is clear that there are low-cost methods of increasing life expectancy that do not require dramatic increases in economic development.
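The two kinds of quantities reported in Tables 5 and 6 can be related to a regression coefficient with simple arithmetic. The sketch below shows one standard way to do so and is not necessarily how the tables were produced: for a logged regressor, a given percentage increase changes the dependent variable by the coefficient times the log of the growth factor, and a standardized (beta) coefficient rescales the coefficient by the ratio of standard deviations. The coefficient of 5.0 and the regressor standard deviation of 0.3 are hypothetical; 12.0 is the standard deviation of life expectancy reported in Table 2.

```python
import math


def change_from_percent_increase(coefficient, percent=10.0):
    """Change in the dependent variable when a logged regressor rises by `percent`."""
    return coefficient * math.log(1.0 + percent / 100.0)


def standardized_beta(coefficient, sd_regressor, sd_dependent):
    """Coefficient expressed in standard-deviation units of both variables."""
    return coefficient * sd_regressor / sd_dependent


# Hypothetical coefficient and regressor spread; 12.0 is taken from Table 2.
years_gained = change_from_percent_increase(5.0)
beta = standardized_beta(5.0, 0.3, 12.0)
print(f"years gained from a 10% increase: {years_gained:.3f}")
print(f"standardized coefficient:         {beta:.3f}")
```

The first measure answers a policy question in natural units (years of life per ten-percent improvement), while the second allows regressors measured on different scales to be compared directly.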
Combining the life expectancy benefits with other benefits (e.g. morbidity, income prospects) of water provision, medical drug provision, educational improvements, or AIDS prevention programs would allow better resource allocation decisions to be made via benefit-cost analysis. This is particularly the case in the developing world, where proper resource allocation decision-making is most critical on the margin.
V. Conclusions and Implications
In a sample involving a large range of countries we find that life expectancy is affected by many variables, some of which suggest that life expectancy can continue to rise in both developed and developing countries. While there may be an upper limit to life expectancy, as discussed earlier, in the absence of fundamental breakthroughs in anti-aging research, the empirical analysis here suggests that we are not at that limit yet.
More importantly, from an immediate policy perspective, the results indicate that fairly low-cost policies (e.g. enhanced provision of water, medical drugs, or AIDS education/care) can lead to dramatic improvements in life expectancy in developing countries. While it is doubtless the case that increased income levels would result in more or less automatic improvements in provision of water and medical care, we show here that, holding income constant, great improvements are possible with the expenditure of what would globally be regarded as trivial amounts of resources. This is important, since a solution to the general development problem has proved elusive.
Government policies affecting water quality and health care have many benefits, such as reduced morbidity and fewer workdays lost, that are not limited to the life expectancy benefits focused on here. However, the life expectancy benefits of affecting those variables should be added to the other benefits of education, improved access to clean water and drugs, and AIDS prevention. Doing so offers the potential for better allocation of scarce resources in countries at various stages of development.
Future research could build a forecasting model incorporating mortality
trends along with the impacts of socioeconomic variables investigated here,
as well as others that are likely to become available as data measurement
and accessibility improves over time.
Table 1. Definitions of Variables
AIDS/HIV, Adults living with--The estimated number of adults living with HIV/AIDS at the end of the year specified.
Cigarette consumption per adult, annual average--The sum of production and imports minus exports of cigarettes divided by the population aged 15 and above.
Drugs, population with access to essential--The percentage of the population for whom a minimum of 20 of the most essential drugs are continuously and affordably available at public or private health facilities or drug outlets within one hour's travel from home.
Education index--One of the three indices on which the human development index is built. It is based on the adult literacy rate and the combined primary, secondary and tertiary gross enrollment ratio. The adult literacy rate is weighted by two-thirds and the gross enrollment is weighted by one-third.
GDP (gross domestic product), per capita--The total output of goods and services for final use produced in a country, by both residents and non-residents, per person.
Gini coefficient--Measures inequality in the distribution of income among individuals or households, with a value of zero representing complete equality, a value of 100 complete inequality.
Human development index (HDI)--A composite index measuring average achievement in three basic dimensions of human development--a long and healthy life, knowledge and a decent standard of living.
Life expectancy at birth--The number of years a newborn infant would live if prevailing patterns of mortality at the time of its birth were to stay the same throughout its life.
Physicians--Includes graduates of a faculty or school of medicine in any medical field (including teaching, research and administration).
Population, total--Refers to the de facto population, which includes all people actually present in a given area at a given time.
Public Health expenditure--Recurrent and capital spending from government (central and local) budgets, external borrowings and grants (including donations from international agencies and nongovernmental organizations) and social (or compulsory) health insurance funds. Together with private health expenditure, it makes up total health expenditure.
PPP (purchasing power parity)--A rate of exchange that accounts for price differences across countries, allowing international comparisons of real output and incomes, in terms of the purchasing power of $1 in the United States.
Sanitation facilities, population using adequate--The percentage of the population using adequate sanitation facilities, such as a connection to a sewer or septic tank system, pour-flush latrine, a simple pit latrine or a ventilated improved pit latrine. An excreta disposal system is considered adequate if it is private or shared (but not public) and if it hygienically separates human excreta from human contact.
Urban population--The midyear population of areas defined as urban in each country and reported to the United Nations.
Water sources, population using improved--The percentage of the population with reasonable access to an adequate amount of drinking water from improved sources. Reasonable access is defined as the availability of at least 20 liters per person per day from a source within one kilometer of the user's dwelling. Improved sources include household connections, public standpipes, boreholes with hand pumps, protected dug wells, protected springs and rainwater collection (not included are vendors, tanker trucks and unprotected wells and springs).
Table 2. Summary Statistics
| Variable | Mean | Std. Dev. | Median | Maximum | Minimum |
| AIDS, Population of Adults with (%) | 2.65 | 5.75 | 0.22 | 35.8 | 0.001 |
| Cigarette Consumption, Annual (per capita) | 1314.08 | 840.88 | 1223 | 3923 | 82 |
| Drugs, Population With Access to Essential (%) | 75.99 | 22.82 | 80 | 100 | 15 |
| GDP (per capita) | 8004.56 | 8517.58 | 4556 | 42769 | 448 |
| Health Care Expenditure, Public (as % of GDP) | 3.38 | 2.02 | 2.75 | 8.3 | 0.2 |
| Life Expectancy at Birth (years) | 65.04 | 12.00 | 69.50 | 80.63 | 39.40 |
| Physicians (per 100,000 people) | 151.21 | 133.89 | 126 | 554 | 3 |
| Sanitation Facilities, Population Using Adequate (%) | 72.02 | 26.90 | 81 | 100 | 8 |
| Urban Population (% of total) | 55.20 | 23.78 | 56.4 | 100 | 6.1 |
| Water Sources, Population Using Improved (%) | 77.79 | 20.70 | 82.5 | 100 | 24 |
Source: Parr, S.F. (2001) Human Development Report 2001
Table 3. Variable Key
| Variable | Description |
| Leduc | Log education index |
| Lh2o | Log population using improved water sources (%) |
| Ldrug | Log population with access to essential drugs (%) |
| Lmeds | Log number of physicians (per 100,000 people) |
| Lgdp | Log disposable income (per capita, in U.S. dollars) |
| AIDS | Population of adults with AIDS (%) |
| Cigs | Cigarette consumption, annual (per capita) |
| Urbanpop | Urban population (%) |
| Sanitize | Population using adequate sanitation facilities (%) |
| Healthpu | Public health care expenditure (% of GDP) |
Table 4. Regression Results (coefficient entries not preserved in this version)
Log specification columns: Log Equation 1 | Log Equation 2 (means) | Log Equation 2 (regression)
Linear specification columns: Equation 1 | Equation 2 (means) | Equation 2 (regression)
* Significant at 99%; ** Significant at 95%; *** Significant
Table 5. Change in Life Expectancy for a Change of 10% (or 1 Unit for AIDS)
| Variable | Change in life expectancy |
| AIDS, Population of Adults with (%) | -1.06 years |
| Drugs, Population with Access to Essential (%) | 0.22 years |
| Education Index | 0.92 years |
| GDP per capita | 0.26 years |
| Water Sources, Population Using Improved | 0.51 years |
Table 6. Standardized Regression Coefficients (BETAs) (coefficient entries not preserved in this version)
Columns: Log Equation 1 | Log Equation 2 | Log Equation 2
* Significant at 99%; ** Significant at 95%; *** Significant
Badiee, S. (2000). 2000 World Development Indicators. Washington, D.C.: World Bank.
Carnes BA, Olshansky SJ, Grahn D. (1996) Continuing the search for a law of mortality.
Population and Development Review. 22(2), 231-264.
Finch C, Kirkwood TBL. (2000) Chance, Development, and Aging. Oxford University Press.
Isaacs, A., Daintith, J., & Martin, E. (Eds.). (1999). Oxford dictionary of science. Oxford: Oxford University Press.
Lee, R. D., & Carter, L. R. (1992). Modeling and forecasting U.S. mortality. Journal of the American Statistical Association, 87 (419), 659-671.
Manton, K. G., Stallard, E., & Tolley, H. D. (1991). Limits to human life expectancy: Evidence, prospects, and implications. Population and Development Review, 17 (1), 603-637.
McNown, R., & Rogers, A. (1992). Forecasting cause-specific mortality using time series methods. International Journal of Forecasting, 8, 413-432.
McNown, R., & Rogers, A. (1989). Forecasting mortality: A parametrized time series approach. Demography, 26, 645-660.
Oeppen, J. and Vaupel JW. (2002) Broken Limits to Life Expectancy. Science May 10 2002, 1029-1031.
Olshansky SJ, Carnes BA, Désesquelles A. (2001) Prospects for human longevity.
Science. 291(5508), 1491-1492.
Olshansky SJ, Carnes BA, Cassel C. (1990) In Search of Methuselah: Estimating the
upper limits to human longevity. Science. 250, 634-640.
Parr, S. F. (2001). Human Development Report 2001. New York: Oxford University Press.
Rogers, R.G. and Wofford, S. (1989). Life expectancy in less developed countries: socioeconomic development or public health? Journal of Biosocial Science, 21, 245-252.
Tuljapurkar, S., & Boe, C. (1998). Mortality change and forecasting: How much and how little do we know? North American Actuarial Journal, 2 (4), 13-47.
2000 World Development Indicators (CD Rom Version). (2000). Washington, D.C.: Development Data Group of the World Bank's International Economics Department.
1 See Isaacs et al. (1999) for further details of current biological understanding of aging. Calorie restriction (CR) is the only known means, at this time, of extending maximum lifespan in mammals; while CR is almost certain to extend lifespan in humans, this approach is sometimes viewed as being impractical (see www.calorierestriction.org for more details).
2 Rogers and Wofford (1989) discuss the relative importance of socioeconomic development versus public health in a sample of 95 less developed countries, in a study akin to the present one. In our analysis we find that a broader range of countries actually improves the estimates; we also include additional explanatory variables and update to the most recent data (which are likely to be more reliable).
3 The forecasting models that Tuljapurkar and Boe (1998) describe are the Lee-Carter (1992) method and the McNown-Rogers (1989, 1992) method. The Lee-Carter method is a basic model that relates mortality to two age-specific variables ("a" and "b") and a time-specific vector ("k"). The McNown-Rogers method employs a non-linear function using seven parameters dependent on time. They calculate the variable values per year and use the trends of the parameters to forecast mortality.
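For reference, the Lee-Carter specification described in this note is commonly written as shown below, where m_{x,t} denotes the central death rate at age x in year t. The error term and the two normalization constraints are the usual identifying conventions for the model and are an addition here, since the note itself does not state them.

```latex
\ln m_{x,t} = a_x + b_x k_t + \varepsilon_{x,t},
\qquad \sum_x b_x = 1, \qquad \sum_t k_t = 0
```

Here a_x is the average age pattern of log mortality, k_t is the time index of the overall mortality level, and b_x measures how strongly mortality at age x responds to changes in k_t.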
4 Brunei Darussalam, Qatar, Tajikistan, and Nigeria were not used due to lack of data.
5 Many of the observations contained missing values for some of the independent variables (notably cigarette consumption and the Gini coefficient). One solution is not to use the countries containing missing data (decreasing the sample size from 158 to 68). Most of the countries that are missing data, however, are only missing one or two variables. Two conventions for filling in missing data are to use the mean (unbiased) or the predicted values from a regression on the other independent variables (possibly biased but more efficient). We use a hybrid approach that fills in the means of low, medium, and high-developed countries, depending on which group the observation falls in. The results presented in the text are remarkably robust to the various ways of handling this problem.
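The group-mean convention described in this note can be sketched as follows. This is a minimal illustration rather than the procedure actually run for the paper: the record layout, the key names `development` and `cigs`, and the five example countries with their values are all hypothetical.

```python
from statistics import fmean


def fill_with_group_means(records, variable, group_key="development"):
    """Return copies of records with missing `variable` replaced by its group mean."""
    observed = {}
    for rec in records:
        if rec[variable] is not None:
            observed.setdefault(rec[group_key], []).append(rec[variable])
    means = {group: fmean(values) for group, values in observed.items()}
    filled = []
    for rec in records:
        new = dict(rec)                      # leave the input records untouched
        if new[variable] is None:
            # Raises KeyError if a group has no observed values at all.
            new[variable] = means[new[group_key]]
        filled.append(new)
    return filled


# Made-up records: None marks a missing cigarette consumption figure.
countries = [
    {"name": "A", "development": "low", "cigs": 300.0},
    {"name": "B", "development": "low", "cigs": None},
    {"name": "C", "development": "low", "cigs": 500.0},
    {"name": "D", "development": "high", "cigs": 2000.0},
    {"name": "E", "development": "high", "cigs": None},
]
completed = fill_with_group_means(countries, "cigs")
for rec in completed:
    print(rec["name"], rec["development"], rec["cigs"])
```

Filling within development groups keeps an imputed value closer to what comparable countries report than a single worldwide mean would, at the cost of understating the variation within each group.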
6 The second model was run with two different data treatments. One data set has the missing cigarette observations filled in using the mean method. The other data set has the values filled in with the regression method. There is no substantial difference in the two methods of filling in the missing cigarette consumption observations, although the cigarette variable performed better using the regression method. Model one as well as both variations of model two were run on various permutations of log and linear dependent and independent variables. Additionally, the 158-country sample was broken into high, medium, low, and low/medium development using the HDI ranking, with the text regressions run on the subgroups. The text results were robust with respect to these treatments (available from the authors), although as one might expect the reduction in the range of variation in the sub-groupings generally reduced coefficient significance markedly.