DETERMINANTS OF HOUSEHOLD SCHOOL DEMAND IN ETHIOPIA: A MULTIVARIATE ANALYSIS

Mulugeta Gebreselassie*

Abstract: A logistic regression model is estimated to assess the role of household, community and regional factors in determining household demand for schooling in Ethiopia. Similarly, a canonical discriminant function, which separates households with school demand from those without, is computed. The strategy is first to study primary and secondary school demand separately and then to study overall school demand.

1. INTRODUCTION

Education is essential for raising individual productivity as it provides the skills and basic intellectual tools necessary for further learning which can later be transferred from job to job (UNESCO, 1993). Consequently, prominent on the agenda of social goals is the attainment of priority objectives in the field of education.

Despite this prominent agenda, evidence from published statistics indicates that in Ethiopia a strikingly low percentage of school-age children participate in formal education. With a gross enrolment rate (GER) of less than 20%, Ethiopia has one of the lowest levels of school access in Sub-Saharan Africa. This could be attributed to low household demand for schooling. However, Ethiopia has come some distance in making education available to its people, and recent trends are encouraging. For instance, gross primary enrolment was 19.7% in 1992/93, 23% in 1993/94 and 29% in 1994/95 (MOE, 1997). To maintain this growing school participation rate, however, much remains to be done.

The overall objective of this research is therefore to study the main socio-economic and demographic determinants of school demand in Ethiopia. The specific objectives of this study are as follows:

The attainment of universal school participation, especially at primary school level, is prominent on the agenda of educational policy makers. The contribution of this study in this regard is significant. It will help policy makers and planners to identify priority areas of work and key determinants of household school demand in Ethiopia. It will also help to minimise cost in further studies, especially cost of data collection, as the study has summarised the most important factors which determine schooling in Ethiopia.

2. LITERATURE REVIEW

2.1 Studies on Other Countries

Tan, Lee and Mingat (1984) studied the determinants of the proportion of eligible siblings enrolled in school in Malawi. They applied Ordinary Least Squares (OLS) regression with the dependent variable defined as the proportion of children of eligible ages who are enrolled (number of siblings in school divided by the number of siblings in the age interval 5-23 years). They found that the cost of schooling (school fees plus other schooling expenses) exerts a statistically significant negative effect on the actual number of eligible children that households enrol. Other statistically significant variables include family background characteristics such as mother's education, urban-rural residence and the proportion of girls among the household's children.

The demand for primary schooling in rural Mali was analysed using OLS regression (Birdsal, 1987). The dependent variable was the ratio of the number of persons in the household currently enrolled in school to the number of persons in the household between the ages of 6 and 14. School fees, distance to school and school quality, measured by the student-teacher ratio, the number of books per classroom and the late payment of teachers (perhaps reflecting the general level of administrative support teachers can expect), had a negative association with enrolment. Among the household variables, the effect of income appeared to be negative. The study also found that school fees have a stronger negative correlation with demand, in absolute terms, than distance does. Ethnic and religious differences across households had no effect on demand once other factors were taken into account.

A study in rural Peru (Gertler & Glewwe, 1989) used a logit model for two sets of data: households with a local school alternative and households with a far-away school alternative. It found parents' level of education, the presence of other children of secondary school age, sex, school quality and the cost of schooling to be the most important determinants of schooling. In addition, the study found parents' level of education to be positively correlated with enrolment.

Determinants of school demand in Sudan (Maglad, 1994) were studied using a multiple regression model. The dependent variable was the net enrolment rate and the explanatory variables were age, sex of the child, parents' education level (mother's and father's), land holding per adult (land ownership), distance to school (distance to primary and secondary schools), urban-rural residence and region of residence.

This study used distance to school as a measure of the price of schooling. It argued that the further away the school was from the village, the larger the costs, since more time was spent travelling to and from school, and in some cases the child might face longer periods of absence from home if he or she had to board at the school or stay with a relative living near it. The availability of a school inside a village makes it possible to attend school at lower indirect cost, since the child can attend school at times when demand for his/her labour is at its peak (for example, at harvesting time). Land ownership, expressed as land holding per adult, was also found to have a significant effect on enrolment.

Variation in the effect of the different determinants of school enrolment across the regions of Sudan was studied by including regional dummies in the model. The study described regional dummy variables as important for controlling for some of the unobserved heterogeneity in the community, such as quality, ethnic composition and infrastructure, in the explanation of child schooling.

As cited in Maglad's (1994) study of Sudan, Singh (1992), in his study of Brazil, found household size to be an important factor negatively affecting school enrolment. However, Chernichovsky (1985), in a study of schooling in Botswana, found family size to be positively correlated with school enrolment.

2.2 Studies on Ethiopia

A report for USAID (1993) which used a total of 520 households with school age children from four regional localities (namely Bale, Welaita, South Gondar and central Tigray) provided a comprehensive study on demand for schooling in rural Ethiopia. The study found that economic constraints represented the most salient impediments to participation and persistence in primary school in the rural areas. In the study both mothers and fathers agreed that opportunity costs (need to work at home) and school costs were the biggest obstacles to sending their sons and daughters to school. However those who sent their sons to school reported that they perceived boys as returning parents' investment in schooling.

The study concluded that "while school quality is undoubtedly an important factor in the learning process, initially improving school quality in ways that do not alleviate household financial burdens will probably not prove the answer to increasing enrolments in village schools. Parents must be educated to be both consumers of and informed consumers of schooling in order to recognise and demand school quality improvement". In the same study, income, approximated by ownership of a tin-roofed house, was found to have a positive and significant association with educational demand.

A similar study by Psacharopoulos and Woodhall (1995) showed gender differences to be linked with other important determinants of demand, such as attitudes and values pertaining to the costs and benefits of sending boys and girls to school.

A report for the World Bank (PHRD, 1996) undoubtedly provides the most recent and comprehensive study on demand for schooling in Ethiopia to date. The information for this report came from three different surveys. Two of them, the 1995/96 Household Income, Consumption and Expenditure Survey and the Welfare Monitoring Survey, were carried out by the Central Statistical Authority (CSA), and the third by the Department of Economics at Addis Ababa University (AAU) in collaboration with the Centre for the Study of African Economies at the University of Oxford. The analysis of the Ethiopia rural household survey, conducted by the AAU and Oxford University, concluded that economic constraints such as boys' labour for farm activities, girls' labour for farm activities, and school distance represented the most salient impediments to attending school. Similarly, the analysis of the 1995/96 Household Income, Consumption and Expenditure Survey and the Welfare Monitoring Survey showed that the main reasons for low demand for schooling are boys' labour, failure, distance to school, inability to afford education and low school quality.

Finally, the two studies concluded that school enrolment in rural Ethiopia was constrained by a combination of demand and supply considerations. They recommended that it is instructive both to consider the household characteristics which influence demand for education and to examine anecdotal as well as survey evidence in order to provide a full picture of the constraints to education in Ethiopia.

To summarise, the reviewed studies have some weak points. In some of them, important factors that can affect school demand are not exhaustively considered, and some rely on simple statistical methods. Model diagnostics, an important component of any statistical analysis, are not used in any of the studies. Furthermore, more sophisticated statistical techniques and alternative variable definitions were not considered. It is against this backdrop that this study attempts to examine the main determinants of school enrolment.

3. DATA AND METHODOLOGY

3.1 The Data

The data for the analysis were obtained from the 1995/96 Household Income, Consumption and Expenditure Survey and the Welfare Monitoring Survey carried out by the Central Statistical Authority. The surveys covered more than 7000 households selected from 571 rural enumeration areas (agricultural and non-agricultural sedentary population) and more than 4000 households selected from 323 urban enumeration areas. The sample was selected using multi-stage sampling techniques (CSA, 1997). For detailed information on the sampling and selection techniques, see the report of the surveys.

Even though the sample size is large, the data have some limitations. For instance, the number and type of explanatory variables are constrained by the available information. Factors such as land ownership, type of shift system, indirect costs and quality of education were not included because the information was not available. In addition, the design of the survey was so complex that stratum interdependence could create inconsistent parameter estimates, though this effect is insignificant in large samples.

3.2 Variables in the Study

In the endeavour to understand the main factors affecting school demand, several studies have been reviewed. Most of these studies indicate that the determinants of schooling, especially in developing countries, are of a similar nature. Hence, this study considers those variables which are important in light of the reviewed literature and the actual situation of schooling demand in Ethiopia.

The definition of the dependent variables is based on the premise that a household sending at least one child to school has a demand for schooling. The degree of variation in demand among households is not considered here. The problem with such a definition, statistically speaking, is that information is lost by reducing the data to two categories. However, this type of definition has the advantage that it captures even the smallest demand observed in a household.

A. Dependent variables

B. Explanatory Variables

II. Community Variables.

1= urban 0= rural

III. Regional Dummies

There are also interaction variables, which are important to detect the interaction effect of some of the factors. These are:

* if there is at least one person in the household who earns a salary because he is educated

3.3 Methodology

Different approaches were used to investigate the main determinants of school enrolment. Since the criterion variables are defined as binary, which departs from the definitions used in other studies, this study uses logistic regression and discriminant analysis methods.

3.3.1 Logistic Regression

In most cases the assumption that a probability model is linear in the independent variables is unrealistic. Further, if we incorrectly specify the model as linear, the statistical properties derived under the linearity assumption will not, in general, hold. The obvious solution to this problem is to specify a non-linear probability model in place of the linear probability model (Aldrich & Nelson, 1984; John & Forrest, 1984).

Denote a set of k predictors for a binary response Y by X1, X2, ..., Xk and the probability that Y=1 by π(x). The logistic regression model generalises to

\[ \operatorname{logit}[\pi(x)] = \log\frac{\pi(x)}{1-\pi(x)} = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k \]

where the parameter β_i refers to the effect of a unit increase in X_i on the log odds that Y=1, controlling for the other X's. Furthermore, exp(β_i) is the odds ratio, that is, the multiplicative effect on the odds that Y=1 of a unit increase in X_i, at fixed levels of the other X's.
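As an illustration of how the model maps predictor values to a probability, the following minimal sketch evaluates π(x) and checks that a unit increase in one predictor multiplies the odds by the exponential of its coefficient. The intercept and coefficients are hypothetical values chosen for illustration; they are not estimates from this study.

```python
import math

def predicted_probability(intercept, coefs, x):
    """Return pi(x) = P(Y=1 | x) under a logistic regression model."""
    eta = intercept + sum(b * xi for b, xi in zip(coefs, x))  # linear predictor (log odds)
    return 1.0 / (1.0 + math.exp(-eta))

def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)

# Hypothetical parameters, for illustration only.
alpha = -1.0
betas = [0.5, -0.2]

p0 = predicted_probability(alpha, betas, [0, 0])
p1 = predicted_probability(alpha, betas, [1, 0])  # first predictor raised by one unit

# The ratio of the two odds should equal exp(beta_1).
odds_ratio = odds(p1) / odds(p0)
```

The same function can be reused with any fitted coefficients once these are available.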

To test the significance of the effect of X on the binary response, we set the null hypothesis as H0: β=0 (the probability of success is independent of X). For large samples the maximum likelihood (ML) estimate is approximately normally distributed and could itself be used to test this hypothesis. This study, however, uses the likelihood ratio (LR) test, which compares the maximised likelihood when β=0 (i.e. when π(x) is forced to be identical at all X-values) with the maximised likelihood for unrestricted β, because the LR test is more reliable, even in small samples (Agresti, 1996; Hosmer & Lemeshow, 1989).
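The likelihood ratio statistic can be sketched as follows. The two log-likelihood values are made-up numbers, and the p-value routines use closed forms of the chi-square upper tail that hold only for one degree of freedom or for an even number of degrees of freedom; a statistical package would normally supply the general case.

```python
import math

def likelihood_ratio_stat(loglik_restricted, loglik_full):
    """LR statistic: -2 * (log-likelihood under H0 - log-likelihood of the full model)."""
    return -2.0 * (loglik_restricted - loglik_full)

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

# Hypothetical log-likelihoods: model with beta forced to 0 versus unrestricted model.
lr = likelihood_ratio_stat(-120.0, -115.0)
p_value = chi2_sf_df1(lr)  # one parameter tested, so df = 1
```

A small p-value leads to rejection of H0: β=0 for the predictor under test.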

A model with several predictors has the potential for multicollinearity, that is, strong correlations among predictors, which can make it seem that no single variable is important when all the others are in the model. Hence, to select a model, it is important to use a variable selection procedure. This study uses the backward elimination procedure, starting with a complex model and successively taking out terms. At each stage, it eliminates the variable in the model that has the largest p-value when we test that its parameters equal zero.
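The logic of backward elimination can be sketched independently of any particular package. In the sketch below, `pvalue_fn` is a hypothetical callback that would refit the model on the current variables and return their p-values, and the stopping threshold of 0.05 is an assumption, since the text does not state the threshold used.

```python
def backward_eliminate(variables, pvalue_fn, alpha=0.05):
    """Drop the least significant term until every remaining p-value is <= alpha.

    pvalue_fn(current_vars) is assumed to refit the model and
    return a dict {variable: p-value} for the variables passed in.
    """
    current = list(variables)
    while current:
        pvalues = pvalue_fn(current)
        worst = max(current, key=lambda v: pvalues[v])  # largest p-value
        if pvalues[worst] <= alpha:
            break  # every remaining term is significant
        current.remove(worst)
    return current

# Toy stand-in for a model refit: fixed, made-up p-values (not survey results).
def toy_pvalues(current_vars):
    made_up = {"X1": 0.001, "X2": 0.40, "X3": 0.03, "X4": 0.75}
    return {v: made_up[v] for v in current_vars}

selected = backward_eliminate(["X1", "X2", "X3", "X4"], toy_pvalues)
```

In a real application the p-values change after each refit, which is why the callback is invoked again at every step.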

After building a model it is important to examine the adequacy of the resulting model. Fitted logistic regression models provide predicted probabilities that Y=1. At each grouping of the explanatory variables one can multiply the predicted probability by the total number of subjects to obtain a fitted count. The test of the null hypothesis that the model holds compares the fitted and observed counts using a Pearson chi-square (χ²) or likelihood ratio (G²) test statistic. As usual, large χ² or G² values provide evidence of lack of fit.
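The two goodness-of-fit statistics can be computed directly from observed and fitted counts, as in the minimal sketch below. The counts are invented for illustration and cover both outcome categories in two groupings.

```python
import math

def pearson_chi2(observed, fitted):
    """Pearson statistic: sum of (observed - fitted)^2 / fitted over all cells."""
    return sum((o - f) ** 2 / f for o, f in zip(observed, fitted))

def deviance_g2(observed, fitted):
    """Likelihood ratio statistic G^2 = 2 * sum of observed * log(observed / fitted)."""
    # Cells with an observed count of zero contribute nothing to the sum.
    return 2.0 * sum(o * math.log(o / f) for o, f in zip(observed, fitted) if o > 0)

# Made-up counts: (successes, failures) in two groupings of the predictors.
observed = [30, 70, 45, 55]
fitted = [32.0, 68.0, 43.0, 57.0]

x2 = pearson_chi2(observed, fitted)
g2 = deviance_g2(observed, fitted)
```

When the model fits well the two statistics take similar, small values, as they do for these counts.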

When the fit is poor, residuals and other diagnostic measures describe the influence of individual observations on the model fit and highlight reasons for the inadequacy (Agresti, 1996). Residuals comparing observed and fitted counts are also useful for this purpose. The leverage, Cook's distance and DfBeta are some of the measures which use residuals in measuring inadequacy. However, the logistic regression module in SPSS-WIN does not provide further analysis of these results. Hence the values of these diagnostics should be saved and analysed further using their normal probability plots and plots against each other and against the explanatory variables. Even after the residuals are calculated, multicollinearity diagnostics are not easy to interpret.

To avoid this problem a multiple regression model is constructed using the same dependent and independent variables in order to obtain the variance inflation factors (VIF) and the condition indices, which are important for diagnosing multicollinearity. The reason for the shift in methodology is that multicollinearity has nothing to do with the form of the dependent variable, as it concerns the correlation between the explanatory variables.
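The variance inflation factor for predictor j is 1/(1 - R_j²), where R_j² comes from regressing predictor j on the remaining predictors. The sketch below covers only the special case of two predictors, where R_j² reduces to the squared Pearson correlation between them; the data are invented, and the general case needs a regression routine from a statistical package.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Made-up predictor values, for illustration only.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
vif = vif_two_predictors(x1, x2)
```

Because the calculation uses the predictors alone, it is unaffected by whether the dependent variable is binary or continuous.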

The quality of the fit can also be judged from the classification power of the model. That is, if the model assigns a large share of the observations to the group to which they actually belong, we can conclude that the fit is satisfactory. SPSS-WIN has this facility.

3.3.2 Discriminant Analysis

The problem that is addressed with Discriminant analysis is: how well it is possible to separate two or more groups of individuals given measurements for these individuals on several variables (Manly, 1986). Furthermore Discriminant analysis is important for the identification of important discriminating factors of two or more groups.

As mentioned above, our objectives in this particular method of analysis were to examine whether all the variables together permit a greater discrimination between households with and without school demand, to assess the effectiveness of the discriminating equation in classifying households as with or without school demand, and to select the most important discriminating variables. To satisfy these objectives a discriminant analysis was performed using the demand status of the household as the criterion variable and the factors which affect school demand as predictor variables. For this purpose, among the different options in SPSS-WIN, the Wilks' lambda minimisation stepwise procedure is used. We used the stepwise variable selection procedure because it combines the features of the forward selection and backward elimination methods. Our interest in the Wilks' procedure is twofold. First, if the groups are not too close, the criterion gives the most important subset of discriminating variables. Second, it does not require lengthy calculations.

The logic behind Wilks' stepwise procedure is as follows. It starts by calculating a Wilks' lambda and the associated F-to-enter for each variable. Then, at the first step, the variable with the largest F-to-enter which exceeds the minimum tolerance criterion is chosen. On successive steps, one variable is chosen that maximises the F-value after adjusting for the variables already chosen. The stepwise procedure continues until no further significant gain in discrimination can be achieved by the addition of more variables. This step-by-step procedure is important since a variable that is chosen at an early step may, at a later step, be removed because of the relationship between it and other variables already chosen (Hawkins, 1982; Afifi and Azen, 1979).

There are several features that an "optimal" discrimination function should possess. Firstly, it should result in few misclassifications, that is, the probabilities of misclassification should be small; secondly, it may be that one group has a greater likelihood of occurrence than another, and hence "prior probabilities of occurrence" should be taken into account; and lastly, it should, whenever possible, account for the costs associated with misclassification (Johnson & Wichern, 1992). This study has considered the first two features strictly, but the third is set aside, as misclassifying a child has no cost.

We know that the discrimination procedures that we used above are based on the assumption of equal within group covariance matrix for both groups. Moreover, tests of significance require the assumption that within groups the data follow multivariate normal distributions. Though it may not be simple to establish the significance of results, Discriminant analysis is not very sensitive to violations of these assumptions, unless the violations are extreme (Harris, 1975; Manly, 1986). Nevertheless, the statistical package, which we are using for analysis, gives the above two tests by default. Results and interpretations of the analysed data follow in the next chapter.

4. RESULTS OF DATA ANALYSIS

In this chapter we shall be concerned with the analysis of the results of the study. The analysis is done at household level. Results of the analysis are presented below.

As we can see from Table 4.1, on average parents in Ethiopian households are either illiterate or have completed no more than grade two. The education of fathers is a little better than that of mothers when measured by years of schooling. Most households are headed by males, and only 40 percent of household heads are educated. On average about three persons of school age are found in each household. Among these, around two are of primary school age. A person must walk about 18 km to reach a secondary school but around 3 km to reach a primary school.

Table 4.1 Descriptive Results

| Variable | Mean | Std. Dev | Minimum | Maximum | N |
|---|---|---|---|---|---|
| SCHL_FEE | .04 | .19 | 0 | 1 | 9912 |
| RET_EDU | .06 | .25 | 0 | 1 | 9912 |
| HEAD_EDU | .40 | .49 | 0 | 1 | 9912 |
| FS15_18 | .63 | .79 | 0 | 9 | 9912 |
| HEAD_SEX | .73 | .45 | 0 | 1 | 9912 |
| PRP_ADLT | .99 | .86 | .00 | 10.0 | 9874 |
| PRNGR_BO | 1.21 | .85 | .17 | 7.00 | 5219 |
| MOTH_EDU | 1.27 | 3.08 | 0 | 13 | 9896 |
| FS7_14 | 1.37 | 1.18 | 0 | 7 | 9912 |
| FATH_EDU | 1.79 | 3.60 | 0 | 13 | 9901 |
| NU_ENR | 1.87 | 2.20 | 0 | 21 | 9912 |
| FS19_95 | 2.37 | 1.20 | 0 | 14 | 9912 |
| FS7_24 | 2.65 | 1.67 | 1 | 18 | 9912 |
| DIST_PRI | 3.04 | 5.50 | .0 | 99 | 9912 |
| LN_SCHEXP | 3.46 | 2.38 | -3.91 | 11.16 | 4091 |
| FAM_SIZE | 5.41 | 2.37 | 1 | 24 | 9912 |
| LNEXP | 5.62 | 1.47 | 1.10 | 11.16 | 1784 |
| DIST_SEC | 17.90 | 22.84 | .0 | 99 | 9912 |
| INCOME | 494.91 | 9712.84 | .00 | 802110 | 9912 |

4.1 Logistic Regression Results

Table 4.2 contains the estimated coefficients (under the column heading β) and related statistics from the logistic regression models that predict each outcome variable. For instance, the outcome variable ENROL is predicted from a constant and the predictor variables location of residence (urban or rural), family size aged 7-24, education and sex of the family head, expenditure, mother's education, and the proportion of adults to school-age children. The corresponding values for P_ENROL and S_ENROL are shown under their respective headings.

Table 4.2A Parameter Estimates for the Logistic Regression for ENROL

| Variable | β | S.E. | Sig | Exp(β) |
|---|---|---|---|---|
| URB_RUR | 2.4601 | .4930 | .0000 | 11.7060 |
| FS7_24 | .6001 | .1180 | .0000 | 1.8222 |
| HEAD_EDU | 11.1604 | 22.1834 | .6149 | 70291.77 |
| HEAD_SEX | -2.2381 | .5026 | .0000 | .1067 |
| LNEXP | .4582 | .1231 | .0002 | 1.5813 |
| MOTH_EDU | 3.7970 | 7.2200 | .5990 | 44.5662 |
| Constant | -1.1960 | .8374 | .1532 | |

The column Exp(β) is the factor by which the odds change when the ith independent variable increases by one unit. If β is positive, this factor is greater than 1, which means that the odds increase. When β is negative, the odds decrease. The odds are unchanged when β is zero. Finally, the model is diagnosed for multicollinearity, autocorrelation and outliers.
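The relationship between the columns of Table 4.2A can be checked with a short calculation. The sketch below recomputes Exp(β) from β and, on the assumption that the "Sig" column reports a two-sided Wald test that treats β/S.E. as standard normal (the text does not state which test the column reports), recomputes the significance levels.

```python
import math

# (beta, standard error) copied from Table 4.2A.
table_4_2a = {
    "URB_RUR": (2.4601, 0.4930),
    "FS7_24": (0.6001, 0.1180),
    "HEAD_EDU": (11.1604, 22.1834),
    "HEAD_SEX": (-2.2381, 0.5026),
    "LNEXP": (0.4582, 0.1231),
    "MOTH_EDU": (3.7970, 7.2200),
}

def odds_ratio(beta):
    """Factor by which the odds change for a one-unit increase in the predictor."""
    return math.exp(beta)

def wald_p_value(beta, se):
    """Two-sided p-value, assuming beta / se is approximately standard normal."""
    z = beta / se
    return math.erfc(abs(z) / math.sqrt(2.0))

summary = {
    name: (odds_ratio(b), wald_p_value(b, se))
    for name, (b, se) in table_4_2a.items()
}
```

Under that assumption, the coefficients whose standard errors are large relative to their size, HEAD_EDU and MOTH_EDU, are the ones that come out as non-significant.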

Table 4.2B Parameter Estimates for the Logistic Regression for P_ENROL

| Variable | β | S.E. | Sig | Exp(β) |
|---|---|---|---|---|
| URB_RUR | 1.4204 | .2710 | .0000 | 4.1389 |
| DIST_PRI | -.0464 | .0297 | .1183 | .9546 |
| HEAD_EDU | 2.0088 | .2676 | .0000 | 7.4545 |
| HEAD_SEX | -.9780 | .2883 | .0007 | .3761 |
| RET_EDU | -1.0869 | .3842 | .0047 | .3373 |
| MOTH_EDU | -.0620 | .0350 | .0763 | .9399 |
| FS7_14 | 1.5788 | .1421 | .0000 | 4.8492 |
| LNEXP | .1725 | .0761 | .0234 | 1.1883 |
| Constant | -2.3127 | .5350 | .0000 | |

Next to parameter estimation is assessing the goodness of fit of the model. A good model is one that results in a high likelihood of the observed results. This translates to a small value for -2LL. Table 1 (Annex) shows the goodness of fit statistics for the model with all of the independent variables. In this table the entry model chi square is the difference between -2LL for the model with only a constant and -2LL for the current model. Thus the model chi square, which is found statistically significant, tests the null hypothesis that the coefficients for all the terms in the current model, except the constant, are zero.

Analysing the classification table is another way of assessing the goodness of fit of the model. The classification table shown in Table 2 (Annex) demonstrates 92.64%, 86.8% and 85.37% of the cases are correctly classified by the models for ENROL, P-ENROL and S-ENROL respectively. Finally the diagnostics aspect of the model is seen from the graph of the residuals and the VIF values in the multiple regression models. The results of these showed that multicollinearity as well as other model problems are not so severe as to challenge the goodness of the models.

Table 4.2C Parameter Estimates for the Logistic Regression for S_ENROL

| Variable | β | S.E. | Sig | Exp(β) |
|---|---|---|---|---|
| URB_RUR | 2.3959 | .3864 | .0000 | 10.9786 |
| HEAD_EDU | .4739 | .2394 | .0478 | 1.6062 |
| HEAD_SEX | -.5679 | .2205 | .0100 | .5667 |
| RET_EDU | .4771 | .2775 | .0855 | 1.6114 |
| MOTH_EDU | .0513 | .0254 | .0434 | 1.0526 |
| LNEXP | .2828 | .0650 | .0000 | 1.3269 |
| DIST_SEC | -.0329 | .0150 | .0283 | .9676 |
| FS15_18 | 1.6544 | .1354 | .0000 | 5.2298 |
| Constant | -5.4003 | .5818 | .0000 | |

4.2 Discriminant Analysis Results

Discriminant analysis is performed on the data to discriminate between the two groups, households with and without school demand, for the three dependent variables ENROL, P-ENROL and S-ENROL. To do this, for each criterion variable the sample is divided into an analysis sample (on which the discriminant analysis is performed) and a holdout sample (on which the classification power of the discriminant function is tested). The holdout sample constitutes 12.7% of the total non-missing cases.

Table 4.3 Number of Cases Used for Analysis by Group

| Group | ENROL | P-ENROL | S-ENROL |
|---|---|---|---|
| 0 | 464 | 663 | 1597 |
| 1 | 1612 | 1413 | 479 |
| Total | 2076 | 2076 | 2076 |

To perform Discriminant analysis the assumption of two or more groups that are significantly different on any single variable is a prerequisite. Hence it is helpful to begin analysing the differences between groups for each of the explanatory variables by examining univariate statistics. In order to identify some differences among the groups the calculated F-statistics are displayed in Table 3 (Annex). In the table F-ratios, Wilks' Lambda value (the ratio of the within groups sum of squares to the total sum of squares) and P-value are depicted. From the table we can see that 15, 16 and 15 out of 22, 21 and 21 variables are responsible for the group differences in ENROL, P-ENROL and S-ENROL respectively. From these variables 4 for each criterion are regional dummies and the remaining are other household factors.
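For a single variable, Wilks' lambda as defined above (within-groups sum of squares over total sum of squares) and the associated F-ratio can be computed as in the following sketch. The two small samples are invented for illustration and do not come from the survey.

```python
def univariate_wilks(groups):
    """Return (Wilks' lambda, F-ratio) for one variable measured in several groups."""
    all_values = [v for g in groups for v in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_within = sum(
        sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups
    )
    lam = ss_within / ss_total
    # One-way ANOVA F-ratio expressed through lambda.
    f_ratio = ((1.0 - lam) / lam) * (n - k) / (k - 1)
    return lam, f_ratio

# Made-up values of one explanatory variable in the two groups.
no_demand = [0, 1, 0, 2, 1]
with_demand = [3, 4, 2, 5, 4]
lam, f_ratio = univariate_wilks([no_demand, with_demand])
```

A lambda close to zero, and hence a large F-ratio, indicates that the variable separates the groups well.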

The analysis stage of discrimination requires determining and computing the possible number of canonical Discriminant functions that characterised the differences between the groups. Since the analysis involves only two groups, the maximum number of functions that could be obtained is one. And this is presented in Table 4.4. In this table a chi-square test for the significance of the discriminating power of the function is presented. The test showed that, at P-value less than 0.00001, the functions are statistically significant.

To test group differences, Wilks' lambda, which is based on the eigenvalue of the discriminant function, is used because, as discussed in section 3.3.2, among the several stepwise procedures in "DISCRIMINANT" of SPSS-WIN, the Wilks' lambda minimisation procedure is preferable. According to Table 4.4, Wilks' lambda values of 0.6374, 0.6785 and 0.5408 are associated with chi-square values of χ²=931.94, 802.49 and 1272.38 (df=10, 10 and 8; P<0.00001) for ENROL, P-ENROL and S-ENROL respectively. This indicates that the difference between the two groups is highly significant.

Table 4.4 Canonical Discriminant Functions

| Criterion | Eigenvalue | Canonical Corr | Wilks' Lambda | Chi-square | df | Sig |
|---|---|---|---|---|---|---|
| ENROL | .5690 | .6022 | 0.6374 | 931.941 | 10 | .0000 |
| P-ENROL | .4738 | .5670 | .678504 | 802.493 | 10 | .0000 |
| S-ENROL | .8491 | .6776 | 0.540817 | 1272.376 | 8 | .0000 |

The other important statistic depicted in this table is the canonical correlation coefficient. It is interpreted in the same way as the multiple correlation in linear regression. Its square gives the proportion of variation explained by the explanatory variables. About 36.3%, 32.2% and 45.8% of the variation in ENROL, P_ENROL and S_ENROL respectively is explained by the independent variables listed in the summary table for each of them.
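The statistics in Table 4.4 are linked by standard identities for a single discriminant function: Wilks' lambda equals 1/(1 + eigenvalue), and the canonical correlation is the square root of eigenvalue/(1 + eigenvalue). The sketch below checks these and approximates the chi-square with Bartlett's formula. It assumes that the analysis sample size is the total of 2076 in Table 4.3 and that the number of variables in each function equals the df shown in Table 4.4; neither is stated explicitly in the text.

```python
import math

def wilks_from_eigenvalue(eig):
    """Wilks' lambda for a single discriminant function."""
    return 1.0 / (1.0 + eig)

def canonical_corr(eig):
    """Canonical correlation implied by the eigenvalue."""
    return math.sqrt(eig / (1.0 + eig))

def bartlett_chi2(wilks, n, p, g=2):
    """Bartlett's chi-square approximation for Wilks' lambda (p variables, g groups)."""
    return -(n - 1 - (p + g) / 2.0) * math.log(wilks)

# criterion: (eigenvalue, p) with p taken from the df column of Table 4.4.
table_4_4 = {"ENROL": (0.5690, 10), "P-ENROL": (0.4738, 10), "S-ENROL": (0.8491, 8)}
n_analysis = 2076  # assumed: total analysis cases from Table 4.3

results = {}
for name, (eig, p) in table_4_4.items():
    lam = wilks_from_eigenvalue(eig)
    results[name] = (lam, canonical_corr(eig), bartlett_chi2(lam, n_analysis, p))
```

Under these assumptions the recomputed values agree with the tabulated ones up to rounding of the eigenvalues.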

The next step is to compute the standardised coefficients associated with the canonical discriminant function. These coefficients, which determine the influence of each explanatory variable on the canonical discriminant function, are presented in Table 4.5. However, discriminant loadings (structure correlations: the pooled within-groups correlations between the discriminating variables and the canonical discriminant function) are considered more valid than standardised canonical discriminant function coefficients (discriminant weights) in revealing the importance of the explanatory variables in discriminating between groups (see Hair et al., 1987). The reason is that discriminant weights are affected by predictor intercorrelations and do not reflect common variance.

After examining the standardised coefficients, the variables are ordered according to their degree of influence in determining the canonical discriminant function (see Table 4 (Annex)). The table indicates that the largest influence comes from HEAD_EDU, URB_RUR, FATH_EDU, DIST_PRI, LNTEXP, MOTH_EDU and DIST_SEC for ENROL; from HEAD_EDU, URB_RUR, FS7-14, FAM-SIZE, FATH_EDU, DIST_PRI and LNTEXP for P-ENROL; and from URB_RUR, FS7-18, MOTH-EDU, FATH_EDU, DIST_SEC, RET_EDU and ADIS_REG for S-ENROL. Similarly, RET_EDU, ADDIS and FS7-24 for ENROL, MOTH_EDU, RET_EDU and ADDIS for P_ENROL, and HEAD_EDU, LNTEXP and FAM_SIZE for S_ENROL make a moderate contribution. The remaining variables for each of the three dependent variables contribute least.

Table 4.5 Standardised Canonical Discriminant Function Coefficients

| ENROL variable | Coefficient | P_ENROL variable | Coefficient | S_ENROL variable | Coefficient |
|---|---|---|---|---|---|
| LNTEXP | .17319 | FAM_SIZE | .10745 | FAM_SIZE | .09017 |
| HEAD_SEX | -.23121 | HEAD_EDU | .61900 | HEAD_SEX | -.11544 |
| MOTH_EDU | -.20174 | HEAD_SEX | -.14931 | FATH_EDU | .17991 |
| URB_RUR | .32288 | MOTH_EDU | -.13531 | MOTH_EDU | .09920 |
| DIST_PRI | -.22978 | FS7_14 | .61004 | URB_RUR | .58793 |
| TIG_REG | .14384 | URB_RUR | .33635 | ADIS_REG | .19149 |
| AFAR_REG | -.09637 | DIST_PRI | -.18799 | FS15_18 | .57698 |
| GAMB_REG | .09060 | TIG_REG | .15385 | RET_EDU | .15468 |
| | | GAMB_REG | .11710 | | |

In order to have a clear mapping of the groups in the discriminant function space, the group means (centroids), that is, the number of standard deviations each group lies from the average of both groups, are computed and displayed in Table 4.6. This shows the distance between the groups in standard deviation units.

Table 4.6 Canonical Discriminant Functions Evaluated at Group Means (Group Centroids)

| Group | ENROL | P-ENROL | S-ENROL |
|---|---|---|---|
| 0 | -1.40529 | -1.00442 | -.50440 |
| 1 | .40450 | .47129 | 1.68168 |
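A quick consistency check on the centroids is possible if one assumes, as is usual for discriminant scores, that the scores are centred on the analysis sample, so that the size-weighted mean of the two centroids is zero. The sketch below combines the group sizes of Table 4.3 with the centroids of Table 4.6 and also reports the gap between the centroids.

```python
# Group sizes (group 0, group 1) from Table 4.3 and centroids from Table 4.6.
group_sizes = {"ENROL": (464, 1612), "P-ENROL": (663, 1413), "S-ENROL": (1597, 479)}
centroids = {
    "ENROL": (-1.40529, 0.40450),
    "P-ENROL": (-1.00442, 0.47129),
    "S-ENROL": (-0.50440, 1.68168),
}

def weighted_mean_score(sizes, cents):
    """Size-weighted mean of the group centroids; near zero if scores are centred."""
    return sum(s * c for s, c in zip(sizes, cents)) / sum(sizes)

def centroid_gap(cents):
    """Distance between the two group centroids, in standard deviation units."""
    return cents[1] - cents[0]

grand_means = {k: weighted_mean_score(group_sizes[k], centroids[k]) for k in centroids}
gaps = {k: centroid_gap(centroids[k]) for k in centroids}
```

The weighted means are close to zero for all three criteria, which is consistent with the centring assumption.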

Table 4.7 Prior Probabilities

| Group | ENROL | P-ENROL | S-ENROL |
|---|---|---|---|
| 0 | .22351 | .31936 | .76927 |
| 1 | .77649 | .68064 | .23073 |
| Total | 1.000 | 1.000 | 1.000 |

From the analysis in the preceding section we have seen that the two groups are differentiated from one another on the basis of the variables used in the study. Here we consider another use of discriminant analysis, namely classification. The group to which each household belongs is known at the outset. We want to assign households to these groups on the basis of the explanatory variables alone. For this purpose, prior probabilities estimated from the sample proportions are given in Table 4.7. The reason for using the sample estimates is that sample proportions are unbiased estimators of population proportions.
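The sample-based prior probabilities follow directly from the group sizes. The minimal sketch below recomputes them from the counts in Table 4.3.

```python
# Number of analysis cases in (group 0, group 1), from Table 4.3.
group_counts = {
    "ENROL": (464, 1612),
    "P-ENROL": (663, 1413),
    "S-ENROL": (1597, 479),
}

def sample_priors(counts):
    """Estimate prior probabilities by the share of cases in each group."""
    total = sum(counts)
    return tuple(c / total for c in counts)

priors = {name: sample_priors(c) for name, c in group_counts.items()}
```

Rounded to five decimals, these proportions match the prior probabilities tabulated for the three criteria.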

Evaluating the accuracy of a discriminant function requires the development of a classification matrix. The linear discriminant function (LDF) is the best classification method if the assumptions of multivariate normality and equal covariance matrices hold. As mentioned in chapter three, discriminant functions are robust to violations of these assumptions. Besides, our sample size is very large, which allows us to take advantage of the law of large numbers. We therefore proceed as if both assumptions are satisfied. Table 5 (Annex) depicts Fisher's linear discriminant function coefficients.

Finally, a summary classification table for each sample (analysis and hold-out) is produced after calculating the Discriminant scores. A score is obtained for each group by multiplying each coefficient in that group's set by the value of the corresponding variable, summing the products and adding the constant. The classification rule then follows: assign a person to the group with the largest score. Table 6 (Annex) presents the results obtained from the classification procedure. The rows in these tables represent actual group membership, while the columns represent predicted group membership. The entries on the principal diagonal indicate the number of students correctly classified.
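The scoring and classification rule described above can be sketched as follows. This is a minimal illustration only: the constants and coefficients are hypothetical placeholders, not the values of Table 5 (Annex), and only two of the study's variables are used to keep the example short.

```python
def classification_score(constant, coefficients, values):
    """Constant plus the sum of coefficient * variable value for one group."""
    return constant + sum(coefficients[name] * values[name] for name in coefficients)


def classify(functions, values):
    """Return the group with the largest score, together with all scores."""
    scores = {
        group: classification_score(constant, coefficients, values)
        for group, (constant, coefficients) in functions.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical classification functions for group 0 (no demand) and group 1 (demand).
functions = {
    0: (-2.0, {"HEAD_EDU": 0.10, "URB_RUR": 0.50}),
    1: (-3.0, {"HEAD_EDU": 0.40, "URB_RUR": 1.50}),
}

# Hypothetical households: an urban one with an educated head, and a rural
# one whose head has no schooling.
urban_household = {"HEAD_EDU": 6, "URB_RUR": 1}
rural_household = {"HEAD_EDU": 0, "URB_RUR": 0}

urban_group, urban_scores = classify(functions, urban_household)
rural_group, rural_scores = classify(functions, rural_household)
print(urban_group, urban_scores)
print(rural_group, rural_scores)
```

Tabulating the predicted group against the actual group for every case gives a classification matrix of the kind reported in Table 6 (Annex).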

Table 4.8 Calculation of Chance Criteria

Chance criterion    ENROL    P-ENROL    S-ENROL
Cmax                0.748    0.522      0.87
Cpro                0.623    0.500      0.77

As we can see from the above tables, the classification power of the functions is high. Nonetheless, to evaluate the effectiveness of the models fully, we compare the percentages correctly classified with the maximum chance and proportional chance criteria. The maximum chance criterion (Cmax) is the proportion of cases in the largest group, that is, the hit rate obtained by assigning every case to that group. The proportional chance criterion (Cpro) is the sum of the squared group proportions (Mathewos, 1996). Both chance statistics are given in Table 4.8.
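The two chance criteria can be reproduced with a short calculation. The sketch below assumes two groups, takes the share of the largest group from the Cmax row of Table 4.8, and computes Cpro as the sum of the squared group proportions; the hit rates are the percentages correctly classified quoted in the text (86%, 79% and 92%), assumed to be in the order ENROL, P-ENROL, S-ENROL. The function names are illustrative.

```python
def max_chance(proportions):
    """Cmax: proportion of cases in the largest group."""
    return max(proportions)


def proportional_chance(proportions):
    """Cpro: sum of the squared group proportions."""
    return sum(p * p for p in proportions)


# Share of the largest group for each outcome variable (Cmax row of Table 4.8).
largest_share = {"ENROL": 0.748, "P-ENROL": 0.522, "S-ENROL": 0.87}
# Percentages correctly classified, as quoted in the text (order assumed).
hit_rate = {"ENROL": 0.86, "P-ENROL": 0.79, "S-ENROL": 0.92}

results = {}
for outcome, share in largest_share.items():
    proportions = (share, 1.0 - share)
    c_max = max_chance(proportions)
    c_pro = proportional_chance(proportions)
    results[outcome] = (c_max, c_pro, hit_rate[outcome] > c_max)
    print(f"{outcome}: Cmax = {c_max:.3f}, Cpro = {c_pro:.3f}, "
          f"hit rate exceeds Cmax: {hit_rate[outcome] > c_max}")
```

With two groups Cmax is never smaller than Cpro, which is why the maximum chance criterion is the stricter benchmark.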

Since Cmax is greater than Cpro for every outcome variable, the maximum chance criterion is the one to outperform. The percentages correctly classified for ENROL, P-ENROL and S-ENROL (86%, 79% and 92%) all exceed the corresponding Cmax, so we again conclude that the Discriminant models are valid.

Table 4.9A Summary Table for ENROL

Step   Variable    Wilks' Lambda   Sig.     Label
1      HEAD_EDU    .76823          .0000    Education of head
2      URB_RUR     .69759          .0000    Urban/rural
3      FS7_24      .68323          .0000    Family size age 7-24
4      DIST_PRI    .67149          .0000    Dist. to prim. school
5      HEAD_SEX    .66025          .0000    Sex of head
6      MOTH_EDU    .65326          .0000    Mother's education
7      LNTEXP      .64616          .0000    Ln of expenditure
8      TIGRAY      .64145          .0000    Tigray
9      AFAR        .63924          .0000    Afar
10     GAMBELA     .63735          .0000    Gambela

The results presented so far used all the variables under study for discrimination and classification purposes. As can be seen from Table 4 (Annex), some of the variables have very small coefficients, so their contribution to discrimination is very small. Screening out such redundant variables and selecting the most important ones is therefore of considerable importance. Next, an attempt is made to select the "best" subset of variables, that is, a subset that classifies students into their correct groups as well as the whole set of variables does.

Table 4.9B Summary Table for P-ENROL

Step   Variable    Wilks' Lambda   Sig.     Label
1      HEAD_EDU    .87846          .0000    Education of head
2      FS7_14      .76217          .0000    Family size age 7-14
3      URB_RUR     .70983          .0000    Urban/rural
4      LNTEXP      .70101          .0000    Ln of expenditure
5      DIST_PRI    .69463          .0000    Dist. to prim. school
6      TIGRAY      .68889          .0000    Tigray
7      GAMBELA     .68565          .0000    Gambela
8      FATH_EDU    .68219          .0000    Father's education
9      HEAD_SEX    .68077          .0000    Sex of head
10     MOTH_EDU    .67931          .0000    Mother's education
11     FAM_SIZE    .67850          .0000    Family size

Wilks' stepwise procedure is used for these data with a minimum F-to-enter of 3.84 and a minimum F-to-remove of 2.71. At each step the variable with the largest F-to-enter is included, provided that this value exceeds the minimum F-to-enter. The variables already entered are then re-evaluated, and those whose F-to-remove exceeds the minimum F-to-remove are retained. The summary tables (Tables 4.9A, B and C) show the selected variables.
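The entry criterion can be illustrated from the Wilks' Lambda sequence in Table 4.9A. The sketch below uses the standard partial F statistic for stepwise Discriminant analysis, F = ((N - g - p) / (g - 1)) (1 - L) / L, where L is the ratio of Wilks' Lambda after and before the step, g is the number of groups and p is the number of variables already in the function. The sample size N = 2076 is an assumption, chosen only because it roughly reproduces the univariate F-ratio reported for HEAD_EDU in Table 3 (Annex); it is not stated in this section.

```python
F_TO_ENTER_MIN = 3.84   # minimum F-to-enter used in the study
N_CASES = 2076          # ASSUMED sample size (not stated in this section)
N_GROUPS = 2            # households with and without school demand

# Wilks' Lambda after each step for ENROL (Table 4.9A).
lambdas = [0.76823, 0.69759, 0.68323, 0.67149, 0.66025,
           0.65326, 0.64616, 0.64145, 0.63924, 0.63735]


def f_to_enter(lambda_before, lambda_after, n_cases, n_groups, n_vars_in):
    """Partial F for adding one variable when n_vars_in are already included."""
    partial_lambda = lambda_after / lambda_before
    df1 = n_groups - 1
    df2 = n_cases - n_groups - n_vars_in
    return (df2 / df1) * (1.0 - partial_lambda) / partial_lambda


f_values = []
previous = 1.0  # Wilks' Lambda with no variables in the function
for step, current in enumerate(lambdas):
    f_values.append(f_to_enter(previous, current, N_CASES, N_GROUPS, step))
    previous = current

for step, f_value in enumerate(f_values, start=1):
    print(f"step {step}: F-to-enter = {f_value:.2f}, "
          f"enters: {f_value > F_TO_ENTER_MIN}")
```

Under the assumed sample size, every step in Table 4.9A clears the minimum F-to-enter, with the margin shrinking as later variables add less to the separation of the groups.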

Table 4.9C Summary Table for S-ENROL

Step   Variable    Wilks' Lambda   Sig.     Label
1      URB_RUR     .68574          .0000    Urban/rural
2      FS15_18     .57805          .0000    Family size age 15-18
3      FATH_EDU    .56143          .0000    Father's education
4      ADDIS       .55210          .0000    Addis Ababa
5      RET_EDU     .54634          .0000    Returns from educ.
6      HEAD_SEX    .54410          .0000    Sex of head
7      MOTH_EDU    .54247          .0000    Mother's education
8      FAM_SIZE    .54082          .0000    Family size

The results show that education and sex of the head, location of residence, size of the school age population in the household, distance to school, mother's education in years of schooling and total expenditure are important determinants of household demand for schooling. In addition, father's education in years of schooling was found important for both primary and secondary school demand, and returns from education for secondary schooling.

In general, the preceding results of data analysis demonstrate the usefulness of applying multivariate techniques to the analysis of intercorrelated variables. The technique proved effective to the extent that 95%, 74% and 81% of the students were correctly classified by the selected variables for ENROL, P_ENROL and S_ENROL respectively.

5. DISCUSSION AND CONCLUSION

Findings of the logistic regression and Discriminant analyses are given in chapter four. The goodness of fit tests for both the logistic regression models and the LDF showed that the fits are satisfactory. This chapter discusses the important findings and presents recommendations.

Table 4.2 shows that the binary response, school demand, is predicted from location of residence, size of the school age population in the household, education of the mother and the head, sex of the head and total expenditure (a proxy for household income). Similarly, demand for primary and secondary schooling is estimated by the functions given in Tables 4.2B and 4.2C. The summary tables of the Discriminant analysis (Tables 4.9A, B and C) confirm this. The effect of the determinants of household demand, however, varies by sex, location of residence and region of residence (see Annexes).

The study revealed that school demand is highly influenced by the literacy status of the parents. Literate parents are far more likely to demand schooling than illiterate parents. The structure matrix in the Discriminant analysis showed education of the head, mother's education and father's education to be the most influential factors in households' school demand. This is observed at all school levels. It could be because there is a positive association between earnings and education, which gives educated parents a stronger taste for schooling. Hence, to alleviate the problem, measures that can improve the educational background of parents should be devised. Literacy campaigns, adult education and distance education may be means to that end.

Contrary to expectations, demand was found to increase with family size. This appears to result from the household being able to draw on another child's labour while one child is in school. Similarly, the presence of another person of primary school age in a household increases demand for schooling twofold. The presence of another boy who can contribute to the family's demand for labour can free the other to go to school. Indeed, some studies indicate that some families alternate their children in school from year to year.

Distance to school, whose effect is stronger for rural households and female students, was found to be significantly and negatively associated with school demand. As shown in Table 4.2B, a 4% decrease in demand is observed for every 10 km increase in distance to a primary school. The reasons for this are quite clear. Distance has direct and indirect cost implications: the greater the distance, the greater the costs. It is therefore not surprising to observe parents becoming unwilling to send their children to school. Strengthening the existing strategy of building schools, especially in rural areas, could alleviate the problem in this regard. Urban schooling problems could be alleviated by encouraging the private sector to invest in education.

Income, approximated by total expenditure, is positively associated with schooling demand. The reason is that the willingness to send children to school, even when there is a fee, is higher among high income families. This suggests that reducing the price of schooling would increase parents' willingness to send their children to school, as the average household in Ethiopia is poor.

Generally, for all schooling levels, the study found HEAD_EDU, URB_RUR, FS7_24, DIST_PRI, HEAD_SEX, MOTH_EDU and LNTEXP to be the most important factors in determining the function that separates the two groups. Similarly, for the primary level of schooling HEAD_EDU, FS7_14, URB_RUR, LNTEXP, DIST_PRI, FATH_EDU, MOTH_EDU, HEAD_SEX and FAM_SIZE, and for the secondary level of schooling URB_RUR, FS15_18, FATH_EDU, RET_EDU, HEAD_SEX, MOTH_EDU and FAM_SIZE, were large contributors to the function that discriminates between the groups. This study therefore indicates that, given the above-mentioned explanatory variables, it is possible to discriminate between households with and without school demand at each level of schooling.

The combined effect of the variables was also assessed by their effectiveness in classifying students correctly to their groups. The LDF proved to be effective to the extent that 95, 74 and 81 percent of the hold out cases in the outcome variables ENROL, P_ENROL and S_ENROL respectively were classified correctly. This suggests that using additional variables other than these selected ones does not substantially improve classification.

Regional differences are also an important finding of the study. Households living in Addis Ababa, Dire Dawa and Tigray were found to have more demand for schooling than those in the other regions. For obvious reasons the cases of Addis Ababa and Dire Dawa are not surprising. In the case of Tigray, however, it may be that agriculture is not sufficiently productive to meet the family's income needs, hence the need to look for returns from education.

It can be inferred from the study that if more is done to make parents aware of the importance of education, especially its relationship to increasing income even in farming, the observed low demand should improve. In addition, more schools should be built to minimise the indirect cost to households, that is, the time their children spend travelling to school. To summarise, it can be concluded from the study that the main constraints on schooling in Ethiopia are lack of parents' awareness, economic reasons, and lack of access to and supply of education, in that order. Hence, it is wise for policy makers, as well as planners, to intervene in these areas.

Based on the observed results the investigator recommends the following:

1. There is a cultural perception that education is not important. However, if we make our society aware of the importance of education in changing the economic lives of the people, the situation will improve. In this regard the role of parents is very great. If parents are aware, they may be interested in participating in school affairs, they may restrict the full time labour demand of their children to after school time and delay early marriage, which is a key deterrent for females to go to school. This could be undertaken by promoting adult and distance education.

2. The lower demand observed in households living in rural areas is mainly because households need their children's labour. This is evident from the significance of the variables family size of school age people and distances to primary and secondary schools. Double shift systems have not been introduced, especially in rural schools. In the absence of a shift system it is unreasonable to blame household heads for not sending their children to school. However, if shift systems were introduced and more schools were built (which would reduce the time students lose on the round trip), parents could at least use their children's labour after school hours. The belief of the investigator is that it is not the construction of big schools that reduces the deterrent effect of distance. Rather, it is the construction of a number of smaller schools (which could be self contained).

3. The number of schools available for the current school age population in Ethiopia is minimal. Rural households in particular are unwilling to send their children to school because of the costs incurred by the distance of schools. Hence more has to be done to construct schools in rural areas.

4. Defining the dependent variable on school demand analysis as binary is advantageous. It allows the use of different statistical methods. Hence future studies might use the same definition of the dependent variable.

To sum up, the results obtained from the study could well be used to provide policy makers with an early warning system. They might also help researchers to minimise the cost of data collection, as the redundant and the important variables are clearly identified here.

REFERENCES

Affifi, A. and Azen, S. P. (1979). Statistical Analysis: A Computer Oriented Approach. New York, Academic Press.

Agresti, A. (1996). An Introduction to Categorical Data Analysis. New York, John Wiley.

Aldrich, H. and Nelson, D. (1984). Linear Probability, Logit and Probit Models. Newbury Park, Sage Publications.

Birdsall, N. (1987). Demand for Primary Schooling in Rural Mali. Should User Fees be Increased? Washington D.C. World Bank.

Chernicovsky, D. (1985). Socio Economic and Demographic Aspects of School Enrollment and Attendance in Rural Botswana. Economic Development and Cultural Change, Vol 6, pp. 303-336.

Central Statistics Authority (1997). Household Income, Consumption and Expenditure Survey Report. Addis Ababa, Ethiopia.

Destafano, J. et al. (1993). The Demand for Primary Schooling in Rural Ethiopia. A Research Study. Addis Ababa, USAID.

Destafano, J. and Wilder (1992). Ethiopian Education Sector Review. Addis Ababa, USAID.

Draper, N. R. and Smith, H. (1996). Applied Regression Analysis. New York, John Wiley.

Fienberg, S. E. (1977). The Analysis of Cross-Classified Categorical Data. Cambridge, MIT Press.

Gertler, P. and Glewwe, P. (1989). The Willingness to Pay for Education in Developing Countries. Washington, D.C., World Bank.

Hair, J. F. et al. (1987). Multivariate Data Analysis. 2nd ed.

Hand, D. J. (1981). Discrimination and Classification. New York, John Wiley.

Harris, R. J. (1975). A Primer of Multivariate Statistics. New York, Academic Press.

Hawkins, D. M. (1982). Topics In Applied Multivariate Analysis. Cambridge, Cambridge University Press.

Hosmer, W. and Lemeshow, S. (1989). Applied Logistic Regression. New York, John Wiley.

John H. A and Forrest D. N (1984). Linear Probability, Logit, and Probit Models. Sage Publications.

Johnson, R. A. and Wichern, D. W. (1992). Applied Multivariate Statistical Analysis. 3rd ed. New York, Prentice Hall.

Maglad, N. E. A. (1994). School Supply, Family Background and Gender Specific School Enrolment and Attainment in Sudan. Eastern Africa Social Science Research Review, Vol. 10, No. 2, pp. 1-20.

Manly, F. J. (1986). Multivariate Statistical Methods A Primer. London, Chapman & Hall.

Mathewos Tamiru (1996). A Study of Factors Affecting Academic Achievement of Students at Bole Senior Secondary School. Unpublished Master's Thesis, Addis Ababa University.

Ministry of Education (1997). Education Statistics. Addis Ababa, Ethiopia.

Policy for Human Resources Development (1996). Household Demand for Schooling. Addis Ababa, Ethiopia.

Psacharopoulos, G. (1981). Returns to Education. An Updated International Comparison. Comparative Education, Vol. 17, pp. 321-314.

Singh, R. D. (1992). Under Investment, Low Economic Returns to Education, and the Schooling of Rural Children: Some Evidence from Brazil. Economic Development and Cultural Change, Vol. 40, No. 3, pp. 645 - 664.

Tan, J., Lee, K. and Mingat, A. (1984). User Charges for Education. The Ability and Willingness to Pay in Malawi. Staff Paper No. 661, Washington D.C., World Bank.

USAID (1993). Demand for Schooling in Rural Ethiopia. Addis Ababa, Ethiopia.

UNESCO (1993). Trends and Projections of Enrolment by Level of Education, by Age and by Sex. Paris, Division of Statistics.

UNESCO (1995). Trends and Projections of Enrolment by Level of Education, by Age and by Sex. Paris, Division of Statistics.

Wood Hall, M. and Psacharopoulos, G. (1995). Education for Development: An Analysis of Investment Choice, World Bank Publication, New York, Oxford University Press.

World Bank (1988). Education in Sub-Saharan Africa: Policies for Adjustment, Revitalisation & Expansion. Washington D.C.

ANNEXES

Table 1. Goodness of Fit Statistics for the Model

 

Statistic                      ENROL        P_ENROL      S_ENROL
-2 Log Likelihood (Initial)    726.91795    1087.1803    1392.901
-2 Log Likelihood (Final)      365.596      697.127      761.397
Goodness of Fit                464.314      1411.055     1348.552
Model Chi-Square
  Chi-Square                   361.322      390.054      631.504
  Df                           7            9            9
  P-Value                      .0000        .0000        .0000
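The model chi-square in Table 1 is the drop in -2 log likelihood between the initial model and the final fitted model. The short check below recomputes it from the two -2 log likelihood rows; it is only an arithmetic illustration, and the function name is not part of the original analysis.

```python
# (initial -2LL, final -2LL, reported model chi-square) from Table 1.
fit_statistics = {
    "ENROL": (726.91795, 365.596, 361.322),
    "P_ENROL": (1087.1803, 697.127, 390.054),
    "S_ENROL": (1392.901, 761.397, 631.504),
}


def model_chi_square(initial_neg2ll, final_neg2ll):
    """Likelihood ratio statistic: reduction in -2 log likelihood."""
    return initial_neg2ll - final_neg2ll


recomputed = {
    outcome: model_chi_square(initial, final)
    for outcome, (initial, final, _) in fit_statistics.items()
}

for outcome, (_, _, reported) in fit_statistics.items():
    print(f"{outcome}: recomputed = {recomputed[outcome]:.3f}, reported = {reported}")
```

The recomputed values agree with the reported chi-square row to within rounding.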

Table 2. Percentage of Correct Classification

Group           ENROL     P_ENROL   S_ENROL
Not Enrolled    50.89%    47.42%    90.92%
Enrolled        97.31%    96.12%    73.45%
Overall         92.64%    86.80%    85.37%

Table 3. Wilks' Lambda and Univariate F-ratio

A) For ENROL

Variable    Wilks' Lambda   F           Significance
LNTEXP      .92869          159.2478    .0000
RET_EDU     .97629          50.3723     .0000
FAM_SIZE    .98857          23.9749     .0000
FS7_24      .96602          72.9591     .0000
PRNGR_BO    .99897          2.1318      .1444
HEAD_EDU    .76823          625.7290    .0000
HEAD_SEX    .99235          15.9879     .0001
FATH_EDU    .91232          199.3319    .0000
MOTH_EDU    .94839          112.8673    .0000
URB_RUR     .81924          457.6132    .0000
DIST_PRI    .92588          166.0279    .0000
DIST_SEC    .93350          147.7384    .0000
TIG_REG     .99477          10.9134     .0010
AFAR_REG    .99275          15.1445     .0001
AMHA_REG    .99981          .3984       .5280
OROM_REG    .99688          6.4896      .0109
BEN_GUM     .99915          1.7712      .1834
SOML_REG    .99974          .5338       .4651
GAMB_REG    .99882          2.4432      .1182
HARA_REG    .99989          .2264       .6342
DIRE_REG    .99880          2.4994      .1140
ADIS_REG    .96550          74.1000     .0000

B) For P-ENROL

Variable

Wilks' Lambda

F

Significance

LNTEXP

.93355

147.6191

.0000

RET_EDU

.98300

35.8580

.0000

FAM_SIZE

.93261

149.8753

.0000

PRNGR_BO

.99880

2.4938

.1145

HEAD_EDU

.87846

286.9489

.0000

HEAD_SEX

.99815

3.8467

.0500

FATH_EDU

.95925

88.1132

.0000

MOTH_EDU

.97039

63.2858

.0000

URB_RUR

.88162

278.4902

.0000

DIST_PRI

.95098

106.9030

.0000

TIG_REG

.99443

11.6186

.0007

AFAR_REG

.99613