1. Introduction

Given their constitutional and social significance, the balanced functioning of the courts is a priority. In addition to the legality of court decisions, one of the most reliable signs of this is the timeliness of court rulings. It is generally considered that one of the main reasons for excessively long proceedings is that the number of cases received is too high. As Stephen Reinhardt (1931–2018), Judge of the U.S. Court of Appeals, summarily described a similar situation: “You do not need long-range studies, extensive surveys, caseload measurement, or other bureaucratic techniques to learn the answer to our problem. There are simply far too few of us to do the job properly.”1 At the beginning of the development of modern judicial administration systems, the workload of judges was identified with the volume of the caseload. It was therefore measured worldwide solely on the basis of caseload, primarily the number of cases received. We can call this the classical, “piece-by-piece” method of measuring the workload of judges, because from the workload perspective every case counts as one unit, without any examination: it implicitly or explicitly assumes that all cases are the same.

In my view, however, this premise is wrong. There is another factor important for timeliness: the complexity of the cases. This also affects the workload of judges, as it can vary considerably from case to case. Thus, it is not merely the volume of the caseload but the workload of the judge that is key to the timeliness of the judgment: number of cases received + complexity of cases received → judicial workload → duration of proceedings → duration of judicial activity. Ensuring timeliness is therefore a complex task: it is not enough for the court administration to monitor the caseload; it must measure, maintain and distribute the actual workload of judges at an appropriate level. To measure the workload of judges accurately, the complexity of the cases must be taken into account in addition to their number. The best method for this would be to use weights that express the complexity of the cases.

2. Problem

However, this is not an easy task. The functioning of the judiciary has been an important target of social science research for decades. The focus, however, is not on the workload of the courts but primarily on their performance and efficiency, because these have a direct social and economic impact, and international organisations (the World Bank or the European Community) urge national judicial reforms to take them into account.2 Efficiency is investigated on all continents (e.g. in Brazil,3 Sweden,4 Morocco,5 China6 etc.), across different groups of court cases (e.g. tax judiciary,7 labour litigation,8 criminal cases,9 medical malpractice cases10 etc.), and with different methods. The most commonly used technique is DEA,11 but the SFA12 method has also been used and, more recently, the DDF13 technique. An attempt has also been made to compare the DEA and DDF methods.14 All procedures are based on comparing each organizational unit (court) on the basis of the output achieved per unit of input (usually the number of judges, court staff, budget spent, or cases received). Efficiency has two dimensions, quantitative and qualitative. Therefore, in the case of the judiciary, the output considered may be the number of cases completed as a quantitative indicator or the duration of proceedings as a qualitative indicator.15 However, I am not aware of any research that has taken the complexity of cases into account when measuring efficiency.

The other strand of research goes one level deeper and tries to uncover the reasons behind poor performance, usually through some form of regression analysis. A number of studies have shown that court performance correlates with, among other things, judges’ salaries,16 qualifications,17 court size,18 geographical location, an implemented judicial reform19 and, last but not least, court caseload.20 Without disputing the importance of the other factors, I too considered the latter the most important, as I was interested in the workload of judges and its even distribution. In my previous research, I therefore examined the relationship between timeliness and court caseload. I found that if the number of court cases received (hereinafter: cases) exceeds a certain amount, the number of cases resolved cannot keep up with it.21 Brazilian researchers have reached similar conclusions: an increase in court caseload generates an increase in the judge’s production, but this relationship is far more complex.22 The increase in the number of pending cases leads to delays and a deterioration in timeliness. In the next phase of my research, I also demonstrated that the other factor of judicial workload, the complexity of cases, is indeed important, because it varies significantly across cases and the spatial distribution of cases of different complexity is not even.23 Foreign research has come to the same conclusion: raw case numbers alone do little to bridge the gap in timeliness, because in fact the judicial workload needs to be shared equally.24 Simply keeping the number of cases received at an appropriate level therefore does not necessarily ensure a timely judgment.25 It is thus also necessary to differentiate between cases and to weight them. This additional information could also make research on the efficiency of the courts more accurate.

Research into case-weighted court workload measurement systems (hereinafter: case-weighting systems, CWS) and the practical application of the results began in the United States in the late 1970s, and the system based on the principle developed there has since been used in more than 35 countries.26 In continental Europe, the first analyses were conducted in Germany in the early 1970s and then in the Netherlands. By the 2000s, weighting had become an important subject of judicial research worldwide. The National Center for State Courts (NCSC) alone has conducted more than 100 studies27 in at least 15 US states.28 There has also been substantial research in Europe, most recently in Belgium and Switzerland.29 By now, an extensive and rapidly expanding international literature30 on judicial workload measurement has accumulated.

CWS assign weights to cases depending on the amount of judicial work required, i.e. the complexity of the case. Smaller or larger weights express, in absolute or relative terms, whether less or more time needs to be spent to conclude each case. In this context, the time required is not the litigation time expressed in calendar months or days from the receipt of the case to its resolution, but the working time in hours or minutes spent on the case.31 Simply put, CWS convert the caseload into a workload.32 Weighted workload measurement systems are now widespread and are perhaps the best way to assess judicial workload and resource needs.33

I highlight four important features of CWS. First, each is based on classifying cases into types according to their characteristics and complexity, depending on the sophistication of the system, and assigning a weight to each type of case. Second, the systems all try to capture the complexity of cases through their time demands. Third, none of the methods weights cases retrospectively, when they are resolved, but in advance, when they are received. Fourth, it follows that the determination of the weight is not a measurement but an estimate of the expected workload of the case, based on previous research and historical data.34 Unlike the “piece-by-piece” solutions, CWS are in any case more advanced: they do not give a “blind”, manifestly inaccurate approximation of the expected workload of a given case (i.e. that all cases are equally difficult), but try to estimate it more reasonably, using and analysing – to an extent that varies by model – the facts and data known at the filing of the case.

Researchers have developed several methods for determining weights.35 Probably the best known and most commonly used method is to measure the amount of court time spent on all significant litigation activities specific to a given type of case. In the first step, the activities that occur in each case type (case preparation, trial, sentencing, and post-completion work) are determined, and then the frequency with which they occur. In the third step, the average time spent by judicial staff on each activity is measured by case type. The total of the average times spent on all activities corresponds to the weight for the given case type. Such methods are most recently referred to as “work sampling” (or “snap-shot sampling”, “occurrence sampling”, “multi-moment analysis”).36
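
To make the arithmetic concrete, here is a minimal sketch (in Python) of the work-sampling calculation just described; the activity names, frequencies and hours are invented for illustration and are not values from any cited study.

    # Work-sampling weight for one case type: for each activity, how often it
    # occurs in this case type and the average judicial hours per occurrence.
    activities = {
        # activity:         (frequency, average hours per occurrence)
        "case preparation": (1.00, 2.5),   # occurs in every case
        "trial":            (0.80, 4.0),   # about 80% of cases reach trial
        "sentencing":       (0.75, 1.5),
        "post-completion":  (0.30, 1.0),
    }

    def case_type_weight(activities):
        """Weight = expected total judicial time for one case of this type."""
        return sum(freq * hours for freq, hours in activities.values())

    print(f"weight: {case_type_weight(activities):.2f} hours")  # 7.12 hours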

Another method is to measure the total time required for a given case: the amount of time spent by all judicial officials involved throughout the case. The essence of the method is that each judicial officer records, for a certain period (e.g. six months), the working time spent on each case. From this, the average time required can be calculated for a given case type; this is the weight for that case type. The disadvantage of this method is that the whole procedure is a “black box”: no detailed information can be obtained on the workload of individual sub-activities or work phases. It also requires strong cooperation from the interested parties, although it provides accurate data directly. However, the involvement of interested parties (or even uninterested ones) can also introduce distortions. These systems, based on empirical measurements, are traditionally referred to as “time-study” procedures.37

The weight for each case type can also be determined by estimation instead of measuring the working time spent. In one solution, the subjects of a questionnaire survey estimate the time spent on each case type as a percentage of their total working time. The other approach is the so-called Delphi Method.38 Case weights are first estimated by experts (judges or external experts). The organizers then review the results, and the respondents can adjust their individual estimates in two or three rounds based on them.39 The advantage of this method is that it yields case weights without a complicated, costly empirical survey. Its weakness is that the wording of the questions and the response format (free text or multiple choice) strongly influence the answers, forcing consensus and creating an illusion of precision, even though it is based on personal estimates. It is also often difficult for respondents to tell whether the scale used, which is usually narrow between extreme values (e.g. 1 to 5 or 10), is linear or logarithmic, and the results are difficult to convert into working hours. At the same time, there is no doubt that the results of estimation-based solutions are often consistent with those of other empirical studies.
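
The iterative logic of the Delphi Method can be sketched as follows; the panel’s estimates, the number of rounds and the revision rule are purely illustrative assumptions, not a description of any actual survey.

    import statistics

    def delphi(initial_estimates, rounds=3, pull=0.5):
        """Each round, experts see the group median and may revise toward it."""
        estimates = list(initial_estimates)
        for _ in range(rounds):
            consensus = statistics.median(estimates)
            estimates = [e + pull * (consensus - e) for e in estimates]
        return statistics.median(estimates), estimates

    # five experts rate the weight of a case type on a 1-10 scale
    final_weight, panel = delphi([3, 4, 4, 6, 9])
    print(final_weight, [round(e, 2) for e in panel])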

The European Commission for the Efficiency of Justice (hereinafter: CEPEJ) assessed and presented Member States’ weighting practices in the framework of the “Evaluation of European judicial systems” project, through a questionnaire completed in May 2019.40 The results show that steps have been taken in most European countries to develop a CWS, as 23 of the 36 Member States use some method to determine the complexity of cases.41 The individual systems differ in the prioritization of the aspects taken into account and in how the weights are determined. However, most of the solutions used are limited to classifying cases into categories (which vary by country) and attaching a weight to each case group. Some weights are determined on the basis of professional experience (Delphi Method), others on empirical studies (methods based on measuring working time demand). The weights used may be time units or merely relative evaluation points, reference values and rates that cannot be converted into time units.42 See Figure 1 for a multi-aspect comparison of the models.43

Figure 1
Assessing case weighting models according to their accuracy, detail, and research resource demands.

Models where the factors influencing the weights can be identified, and where the weights are determined by measurement rather than estimation, can be considered more reliable. The more case types a model distinguishes, the more precise it will be.44 Any combination of components is conceivable, and any model can be classified, compared and evaluated on the above criteria. For example, a model classified as B + a + 1 can be considered more accurate and detailed, but also more research-intensive, than a model classified as C + b + 1.

In my view, however, CWS in their current forms are far from perfect. Their most obvious disadvantage is that, for practical reasons, cases are classified into only a limited number of types, although they are very diverse. Cases cannot be classified in sufficient detail on the basis of their subject matter alone – or on any other single criterion. Consequently, the maximum number of weights that can be used in a CWS is limited by the number of groups. Cases are weighted on an ordinal or, at best, interval scale.45 Another shortcoming of all methods is that the weights cannot be individualized: they do not relate to a particular case but to a group of cases. Furthermore, especially in systems that use only relative evaluation points, the range of the weights is bounded (e.g. 1 to 5). In addition, even the more sophisticated models do not take many aspects into account when determining weights. Relative weights also preclude expressing workload in absolute terms.

The numerous studies mentioned above prove beyond doubt that the efficiency of courts is influenced not only by the workload but also by the characteristics of the judge (such as age, seniority, skills) or of the court (staff numbers, the age, seniority, gender and skills of the staff, and material conditions). However, the primary goal of my research was not to improve the accuracy of court performance measurement, but to improve the measurement of judicial workload alone, by taking into account the complexity of cases and weighting them appropriately. For case weights, these factors are irrelevant, so I ignored them in this study.

3. Key questions

My empirical studies and statistical analysis measuring the complexity of 738 first instance criminal lawsuits (hereinafter: criminal lawsuits) concluded in 2010 have shown that the complexity of criminal cases can be measured retrospectively, based on the case file, and quantified as a working time cost.46 The working time demand ranged from 0.022 to 231.14 hours, with an average of 10.53 hours. I thus objectively confirmed the professional opinion, based on practical experience, that even within a single group of cases there are very significant differences between individual cases in working time demand and complexity. The territorial distribution of criminal cases with different working time demands is not uniform: the average working time demand of criminal lawsuits differs between courts of the same level with the same scope of authority but different territorial jurisdiction. This is certainly the case in other branches as well. I have thus also demonstrated by scientific methods that the use of CWS is essential for measuring the workload of judges and for distributing cases evenly across branches, within a court and between courts. At the same time, the common practice of CWS is an unjustified and crude simplification of reality, because it can classify cases only to a limited extent, treating the cases within a group as the same and attaching the same weight to them.

In view of the above shortcomings of CWS, the following questions arose. How can the weighting of cases be individualized? How could each case be given a weight based on its own characteristics, so that cases can be differentiated within each case type? How can the level of measurement of case complexity be raised in this way, so that complexity is measured on a ratio scale, removing the bounds on case weights, especially the upper limit? How can we be sure of identifying the range of characteristics relevant to the complexity of cases, preferably in the initial documents,47 and of quantifying their impact on complexity? How can absolute weights be created that ensure unrestricted comparability of cases? Can this increase the flexibility, reliability and accuracy of CWS?

Answering these questions is important because, if working time demand as a dependent variable can be meaningfully related to certain characteristics of the initial documents as independent variables, regression analysis can be used to construct a prediction model that estimates the expected working time demand of a case individually, with satisfactory certainty, as soon as it is received. In this way, a workload measurement system can be developed in which an algorithm derived from the pre-constructed prediction model estimates the expected working time demand from the initial document characteristics of the given case, and a weight is attached to it immediately. This time-based weight would measure the complexity of the cases received on a continuous scale, making cases comparable without restriction, regardless of case type, court level, or even the legal or judicial administration system.

Therefore, I have formulated the following questions.

  1. Is there any relationship between the working time demand of cases and certain features of their initial documents?
  2. What are these characteristics and how strong is the correlation?
  3. Based on these characteristics, can a reliable prediction model be created with which the expected working time demand of incoming cases can be estimated in advance, with sufficient certainty?

4. Empirical methods

From the results of my research so far, I already knew the working time spent on concluding 738 criminal cases.48 Multiple linear regression studies require a sample of ten times the number of independent variables examined.49 The sample I already had was therefore sufficient for up to 73 relevant initial document characteristics. The next step was to identify the initial document characteristics related to the working time demand measured in the examined criminal cases. To this end, already when designing the study, I compiled a list of initial document features that I considered relevant. I then carried out extensive consultation with professional judges in the criminal branch. According to the professional consensus, the initial document characteristics relevant to the expected working time demand of criminal lawsuits are the following: the quantity and quality of the investigation files50 and their appendices, the scope of the indictment,51 the applicability of special procedural rules and legal institutions in the court proceedings, the number and other attributes of the accused, the number and nature of the acts charged, the number of witnesses, and the existence of foreign elements. To record the required data at the same time, I added questions about the initial document characteristics to the questionnaire used to determine the working time demand. Blocks II–V of the questionnaire (questions 7–80) recorded the data of a total of 34 types of initial document attributes. I prepared a detailed guide for completing the questionnaire, in order to standardize the data recording.52

Based on the recorded data, as a first step I filtered out the initial document variables that did not occur in the examined sample, or occurred only once. The others were binomial, ordinal, or continuous variables, depending on the number of values they could take. First, I used hypothesis and relationship testing methods to examine which initial document characteristics are actually related to the measured working time demand, and what the type and strength of these relationships are. Then I attempted to construct a multiple linear regression model using the initial document features that proved relevant on this basis. The validity of the model was checked with model diagnostic tools. The significance level used throughout was α = 0.05. I used the commercially available Microsoft Excel and Statistica software for the analysis.
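
The screening workflow can be sketched as follows, with SciPy standing in for the statistical software actually used; `df` is a hypothetical pandas DataFrame with one row per lawsuit, a `work_hours` column, and the recorded initial document variables.

    import pandas as pd
    from scipy.stats import mannwhitneyu, kruskal, kendalltau

    ALPHA = 0.05

    def screen_binomial(df, var):
        """Mann-Whitney test: does working time differ between the two groups?"""
        g0 = df.loc[df[var] == 0, "work_hours"]
        g1 = df.loc[df[var] == 1, "work_hours"]
        _, p = mannwhitneyu(g0, g1, alternative="two-sided")
        return p < ALPHA

    def screen_ordinal(df, var):
        """Kruskal-Wallis test across the 3-7 groups of an ordinal variable."""
        groups = [g["work_hours"].values for _, g in df.groupby(var)]
        _, p = kruskal(*groups)
        return p < ALPHA

    def correlation(df, var):
        """Kendall's tau: distribution-free strength and direction of association."""
        tau, p = kendalltau(df[var], df["work_hours"])
        return tau, p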

5. Results

5.1. Relevance of initial document variables

Some of the initial document variables could take only two values (e.g. whether the accused is in custody or defending himself/herself at liberty, whether special procedural rules apply in the case, etc.). These were in fact binomial variables. Several initial document variables could in principle have taken more values,53 but only two values occurred in the examined sample. I therefore also treated these as binomial variables.

In the first step, I examined whether a difference could be detected in the average working time demand of cases depending on the value of each binomial initial document variable. Although, based on the descriptive statistics, most of the binomial variables appeared to have a marked effect on working time demand, the actual significance of the observed differences was highly uncertain because of the low number of cases for many values. This was also confirmed by grouped box-plot diagrams of the working time values. For clear answers, given the non-normal distributions, non-parametric hypothesis tests (Mann-Whitney test) showed that only 10 of the 25 binomial variables included in the questionnaire produced a significant difference in working time demand as a function of the group-forming variable: whether the case follows a repeal;54 whether a coercive measure had to be decided;55 whether the case is anticipatory by law;56 whether the case is an arraignment;57 whether the case is privately prosecuted;58 whether the case was originally initiated without a hearing;59 whether the accused confessed to all acts (full confession); whether there is a group act in the indictment; and whether concealment of assets60 or the criminal form of misappropriation of funds61 is among the acts charged. For another 8 variables there was a considerable (more than ±2 hours) difference in mean working time demand between the groups; the lack of statistical significance there was thus largely due to the extremely low number of cases.62 Given the lack of normality in the distribution of the working time data, I also examined the correlation between working time demand and the individual binomial initial document characteristics using Kendall’s rank correlation test. In line with the above, the binomial variables each showed, unsurprisingly, only a weak or very weak (though in many cases strongly significant) correlation with working time demand. This suggests that these variables do not individually play a major role in working time demand. In view of this, I refrain from reporting the box-plot diagrams, descriptive statistics, and hypothesis and relationship tests for reasons of length. As an exception, Table 1 reports the results for the crime of concealment of assets, which later proved to be significant.

Table 1

The average value of the working time demand in cases of concealment of assets and other offences and the results of the hypothesis and relationship tests.


VALUE OF THE INITIAL DOCUMENT CHARACTERISTIC | N | MEAN WORKING TIME DEMAND (HOURS) | MANN-WHITNEY: RANK SUM | U | Z | ADJUSTED P-LEVEL | KENDALL TAU | Z | p

concealment of assets: no | 734 | 10.465 | 270185 | 440 | –2.418 | 0.016 | 0.073 | 2.959 | 0.003

concealment of assets: yes | 4 | 22.112 | 2506 | (the U, Z, p and Kendall tau values apply to the two groups jointly)

The next group of initial document variables examined were those where the number of recorded values was more than two but not more than seven (ordinal variables). I performed a hypothesis test using the Kruskal-Wallis test to assess the significance of the differences between the average working time demands measured in each group of these variables. The results show that only changes in the number of professional issues, of experts called for interrogation, of civil law claims, of juvenile offenders, and of economic crimes and robberies cause significant differences between the average working time demands of the groups. Using Kendall’s rank correlation test, I also examined the correlation between each ordinal variable and working time demand. The data showed that the number of professional issues involved in the lawsuit correlates strongly, and the other ordinal initial document variables weakly or very weakly, but in each case significantly and positively, with working time demand. I omit the detailed results here as well, because ultimately none of these variables proved significant in the final model.

The last group of initial document variables were the continuous variables. As a first step, I prepared descriptive statistics for these as well; the results are shown in Table 2. Analysis of the distributions showed that none of the variables was normally distributed (Kolmogorov-Smirnov test p < 0.01, Shapiro-Wilk W test p = 0.000). In view of this, the correlations between the raw values of the variables and working time demand were examined with Kendall’s rank correlation test. The results are shown in Table 3. They indicate that all continuous initial document variables are highly significant (p = 0.000), with the exception of the number of other offenses prosecuted (p = 0.042), and that their correlation with working time demand (tau between 0.05 and 0.509 in absolute terms) is considerably stronger than that of most binomial or ordinal variables.

Table 2

Descriptive statistics of continuous initial document variables.


THE NAME OF THE INITIAL DOCUMENT CHARACTERISTIC | N | MEAN | CONFIDENCE –95% | CONFIDENCE +95% | MEDIAN | MIN. | MAX. | LOWER QUARTILE | UPPER QUARTILE | STD. DEV. | STD. ERROR | SKEWNESS | KURTOSIS

the scope of the indictment | 738 | 2.587 | 2.379 | 2.795 | 2 | 0 | 38 | 1.3 | 2.8 | 2.878 | 0.106 | 5.656 | 45.382

the length of investigation files | 738 | 113.387 | 101.796 | 124.978 | 65 | 0 | 1735 | 30 | 135 | 160.393 | 5.904 | 4.171 | 26.135

length of the appendices to the investigation file | 738 | 0.190 | 0.035 | 0.346 | 0 | 0 | 50 | 0 | 0 | 2.152 | 0.079 | 18.839 | 406.394

number of expert opinions attached | 738 | 0.599 | 0.502 | 0.696 | 0 | 0 | 16 | 0 | 1 | 1.348 | 0.050 | 5.482 | 42.869

the number of witnesses proposed to be heard | 738 | 2.701 | 2.327 | 3.074 | 2 | 0 | 61 | 0 | 4 | 5.168 | 0.190 | 6.718 | 59.693

the number of exhibits | 738 | 1.633 | 1.027 | 2.239 | 0 | 0 | 184 | 0 | 0 | 8.381 | 0.309 | 15.275 | 309.365

the number of accused | 738 | 1.446 | 1.363 | 1.529 | 1 | 0 | 12 | 1 | 1 | 1.151 | 0.042 | 3.862 | 20.878

the number of acts charged | 738 | 2.375 | 1.835 | 2.916 | 1 | 0 | 156 | 1 | 2 | 7.474 | 0.275 | 14.047 | 255.708

criminal offenses of fraud (section 318. (4)–(7) of CC.) | 738 | 0.352 | 0.030 | 0.675 | 0 | 0 | 97 | 0 | 0 | 4.464 | 0.164 | 17.808 | 343.323

other crimes | 738 | 0.836 | 0.676 | 0.996 | 1 | 0 | 31 | 0 | 1 | 2.219 | 0.082 | 9.599 | 108.579

crimes against public confidence (title III in chapter XVI. of CC.) | 738 | 0.272 | 0.143 | 0.402 | 0 | 0 | 33 | 0 | 0 | 1.792 | 0.066 | 12.812 | 195.888

misdemeanours for offences against property (chapter XVIII. of CC.) | 738 | 0.241 | 0.186 | 0.296 | 0 | 0 | 8 | 0 | 0 | 0.761 | 0.028 | 5.245 | 35.648

Table 3

Results of correlation studies between continuous initial document variables and working time demand.


THE NAME OF THE INITIAL DOCUMENT CHARACTERISTIC | KENDALL TAU | Z | p

the scope of the indictment | 0.407 | 16.542 | 0.000

the length of investigation files | 0.509 | 20.676 | 0.000

length of the appendices to the investigation file | 0.084 | 3.410 | 0.001

number of expert opinions attached | 0.255 | 10.361 | 0.000

the number of witnesses proposed to be heard | 0.421 | 17.123 | 0.000

the number of exhibits | 0.230 | 9.358 | 0.000

number of accused | 0.273 | 11.117 | 0.000

the number of acts charged | 0.317 | 12.870 | 0.000

criminal offenses of fraud (section 318. (4)–(7) of CC.) | 0.157 | 6.399 | 0.000

other crimes | –0.050 | –2.030 | 0.042

crimes against public confidence (title III in chapter XVI. of CC.) | 0.085 | 3.458 | 0.001

misdemeanours for offences against property (chapter XVIII. of CC.) | 0.101 | 4.118 | 0.000

To confirm the results, I made scatter plots of the (singly or multiply) logarithmically transformed values of each continuous initial document variable against the singly logarithmized working time demand values, and I also attempted curve fits. I likewise refrain from presenting these for reasons of length. Although the Pearson correlation coefficients (Pearson’s r) calculated this way cannot be compared directly with the Kendall tau values, the results were in good agreement with them.

From the above, it can be concluded that a significant part of the examined initial document variables is more or less strongly, but mostly clearly significantly, related to the working time demand subsequently incurred in the case. Some features are rare or have a less marked effect; a larger sample would be needed to explore those relationships precisely. Thus, the answer to the first two questions is yes: working time demand is related to certain characteristics of the initial documents of criminal lawsuits, and statistical methods can identify these variables and determine the nature and strength of the relationship. This also shows that, in the pre-study professional consultation, the judges delineated the relevant range essentially correctly, and that the professional leaders who distribute cases among judges within a court consider, in practice, broadly the right set of initial document characteristics during case allocation.

5.2. Regression model construction and its diagnostics

After that, only the third question remained unanswered. What is the numerical relationship between working time demand and the variables that correlate with it significantly? How reliably can the expected working time demand be predicted from the set of initial document characteristics? I performed regression calculations to find out. Given the apparently relevant variables, the more or less linearizable relationship of the continuous variables to working time demand, and the close-to-normal distribution of the logarithmically transformed working time data, multiple linear regression could be used; I chose the GLM module of the Statistica software. The regression model was based on the 731 cases remaining after filtering out the outliers.
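
The model-fitting step can be reproduced in outline as follows, with statsmodels standing in for Statistica’s GLM module; the DataFrame `df` and its column names are illustrative assumptions, and the actual study applied different (sometimes repeated) log transforms per variable.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    def fit_worktime_model(df: pd.DataFrame):
        """Fit log(working time + 2) on log-transformed initial document features."""
        predictors = ["indictment_pages", "file_pages", "witnesses",
                      "accused", "acts_charged"]        # illustrative subset
        X = sm.add_constant(np.log10(df[predictors] + 2))
        y = np.log10(df["work_hours"] + 2)
        return sm.OLS(y, X).fit()

    # model = fit_worktime_model(df); model.summary() then reports R, R^2,
    # adjusted R^2, F, and the per-variable parameters with t and p values.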

After running the model with the default settings, the program produced the results reported in Table 4. They show that the initial document characteristics included in the model are, taken together, very strongly (R = 0.791) related to working time demand, explaining 58.5% of its variability (adjusted R² = 0.585); the correlation is strongly significant (p = 0.000).

Table 4

Summary table of the multiple linear regression calculations between working time demand as the dependent variable and initial document characteristics as independent variables, based on the filtered data.


MULTIPLE R | MULTIPLE R² | ADJUSTED R² | SS MODEL | df MODEL | MS MODEL | SS RESIDUAL | df RESIDUAL | MS RESIDUAL | F | p

0.791 | 0.625 | 0.585 | 52.530 | 70 | 0.750 | 31.531 | 660 | 0.048 | 15.708 | 0.000

Six initial document characteristics play a relevant role in the model: the length of the investigation files; the number of accused; the number of witnesses and experts proposed to be heard in the proceedings; and the number and nature of the acts charged (p < 0.05). The strong effect of the private or public prosecution of the proceedings, and the weaker, non-significant effect of the scope of the indictment, can also be observed.

The role of the most important initial document characteristics in the model is summarized in Table 5. The number in the parameter column of the top row, marked “intercept”, indicates the intersection of the regression line with the y-axis (a). The “parameter” column contains the multipliers (b) calculated by the program for each initial document variable, which determine the slope of the regression line in the linear regression formula. These values cannot be compared directly with each other, because each variable has a different dimension and magnitude; the standardized β value makes it possible to evaluate the relative weight and importance of each variable. With the values (a) and (b) given by the program, we can now calculate the expected working time demand for any case by substituting into the general formula of multiple linear regression

y = a + b1*x1 + b2*x2 + … + bn*xn

the values b1–n and a from Table 5, and the values x1–n for the given case.

Where: y = the expected working time demand in the given case;

            x1–n = the numerical value of each of the initial document characteristics in the case (e.g. number of accused, number of investigation files pages, etc.)

            b1–n = computer-calculated multiplier for each initial document characteristic of the case (result of multiple linear regression calculations)

            a = constant (result of multiple linear regression calculations)

Table 5

The role of each initial document variable in the model constructed from the filtered data.


THE NAME OF THE CHARACTERISTICS OF THE INITIAL DOCUMENT LOG(WORKING TIME DEMAND+2)

PARAMETER t p BETA (ß)

intercept –0.072 –0.277 0.782

the scope of the indictment 10.176 1.714 0.087 0.066

the length of investigation files 0.179 7.775 0.000 0.318

length of the appendices to the investigation file 0.202 1.635 0.103 0.046

the number of witnesses proposed to be heard 0.201 4.466 0.000 0.157

the number of experts proposed to be heard 0.360 1.986 0.047 0.052

the number of exhibits 0.221 1.402 0.161 0.042

the number of accused 1.290 3.378 0.001 0.104

the number of acts charged 5.046 2.160 0.031 0.090

the number of criminal offenses of fraud (section 318. (4)–(7) of CC.) 1.878 1.616 0.107 0.052

the number of crimes against public confidence (title III in chapter XVI. of CC.) –0.206 –2.202 0.028 –0.063

the number of robberies (section 321. of CC.) 0.432 2.162 0.031 0.061

the case is privately prosecuted (Y/N) –0.125 –1.309 0.191 –0.279

the accused has been convicted of all acts (full confession) (Y/N) –0.131 –1.218 0.224 –0.346

concealment of assets (section 330 (1) of CC.) (Y/N) 0.031 0.280 0.779 0.013

criminal form of misappropriation of funds (section 319 (3) of CC.) (Y/N) –0.110 –1.204 0.229 –0.034

the case was originally initiated without a hearing (Y/N) –0.115 –1.363 0.173 –0.176

To check the regression model, I determined the estimated working time demand (more precisely, its log(y + 2) value) for each case included in the model, using the appropriate function of the program. The difference between the value estimated by the model and the value I measured from the documents is the residual, which essentially shows the error for each case. According to the descriptive statistics of the (logarithmically transformed) residuals, their mean is 0 (a property of the linear regression method), the standard deviation is 0.212, and the standard error is 0.008. The distribution is broadly symmetric (skewness = –0.272) but more peaked than normal (kurtosis = 2.015) and, despite the approximately normal appearance of the P-P plot, it is not normally distributed according to the tests (Kolmogorov-Smirnov d = 0.078, p < 0.01; Lilliefors p < 0.01; Shapiro-Wilk W = 0.968, p = 0.000).

I also examined the constancy of variance (homoscedasticity). Levene’s test was significant (p < 0.007 in all cases) for all independent variables except the length of the investigation files. Rejecting the null hypothesis of homoscedasticity, we must conclude that the variables are heteroscedastic. It follows that the results of model-verification tests that assume normality and homoscedasticity should be treated with caution.

Given the large number of independent variables, I also checked the degree of multicollinearity in the model. To this end, I calculated the variance inflation factor (VIF) and its reciprocal, the tolerance index (T), for each independent variable. The degree of multicollinearity was considered strong and disturbing when VIF ≥ 2 and T ≤ 0.5, and very strong and harmful when VIF ≥ 5 and T ≤ 0.2. On this basis, a harmful degree of multicollinearity was observed for the variable indicating the private prosecution nature of the case (VIF = 55.534, T = 0.018). Strong multicollinearity was also detected for the number of acts charged (VIF = 2.902, T = 0.345), the length of the investigation files (VIF = 2.67, T = 0.371), the scope of the indictment (VIF = 2.363, T = 0.423), and the number of witnesses proposed to be heard (VIF = 2.067, T = 0.484). For the other variables, multicollinearity was weak (VIF < 2, T > 0.5). In view of this, I removed the binomial independent variable indicating the private prosecution nature of the case from the final regression model.
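
The collinearity check itself is mechanical; a minimal sketch, assuming `X` is the predictor matrix (without the constant term) from the regression sketch above:

    import pandas as pd
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    def vif_table(X: pd.DataFrame) -> pd.DataFrame:
        """VIF and tolerance for each independent variable."""
        vifs = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
        out = pd.DataFrame({"variable": X.columns, "VIF": vifs})
        out["tolerance"] = 1 / out["VIF"]
        # thresholds used above: VIF >= 2 strong, VIF >= 5 harmful
        out["harmful"] = out["VIF"] >= 5
        return out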

The renewed model is summarized in Table 6. It can be seen that the 6 relevant initial document characteristics included in the model are still very strongly (R = 0.77) related to working time demand, explaining 57% of its variability overall (adjusted R² = 0.57); the correlation remains highly significant (p = 0.000).

Table 6

Final result table of multiple linear regression calculations between working time demand as a dependent variable and initial document characteristics as independent variables based on filtered data.


MULTIPLE R | MULTIPLE R² | ADJUSTED R² | SS MODEL | df MODEL | MS MODEL | SS RESIDUAL | df RESIDUAL | MS RESIDUAL | F | p

0.770 | 0.592 | 0.570 | 49.778 | 37 | 1.345 | 34.283 | 693 | 0.049 | 27.195 | 0.000

The role of the initial document characteristics in the final model is summarized in Table 7. The model is now free of harmful multicollinearity: among the remaining independent variables it is strong only for three relevant ones and weak for the others (VIF < 2.89, T > 0.346).

The (logarithmic) standard deviation of the residuals is 0.217 and the standard error 0.008, but despite the appearance of the bar graph and P-P plot in Figure 2, the distribution is still not normal (Kolmogorov-Smirnov d = 0.073, p < 0.01; Lilliefors p < 0.01; Shapiro-Wilk W = 0.973, p = 0.000) and remains heteroscedastic: Levene’s test is significant (p < 0.001) for all relevant independent variables except the scope of the indictment and the number of witnesses called for hearing.

Table 7

The role of each initial document variable in the final model, based on the filtered data and the presence of multicollinearity.


THE NAME OF THE INITIAL DOCUMENT CHARACTERISTIC | PARAMETER | t | p | BETA (β) | T | VIF
(dependent variable: LOG(WORKING TIME DEMAND + 2))

intercept | –0.140 | –0.707 | 0.480 | | |

the scope of the indictment | 14.898 | 2.601 | 0.009 | 0.096 | 0.430 | 2.326

the length of investigation files | 0.213 | 10.610 | 0.000 | 0.379 | 0.461 | 2.171

length of the appendices to the investigation file | 0.183 | 1.518 | 0.129 | 0.041 | 0.800 | 1.250

the number of witnesses proposed to be heard | 0.238 | 5.464 | 0.000 | 0.185 | 0.511 | 1.956

the number of experts proposed to be heard | 0.268 | 1.511 | 0.131 | 0.039 | 0.888 | 1.127

the number of accused | 1.140 | 3.036 | 0.002 | 0.092 | 0.640 | 1.563

the number of acts charged | 5.135 | 2.209 | 0.028 | 0.091 | 0.346 | 2.890

the number of criminal offenses of fraud (section 318. (4)–(7) of CC.) | 1.458 | 1.270 | 0.205 | 0.041 | 0.577 | 1.734

the number of crimes against public confidence (title III in chapter XVI. of CC.) | –0.207 | –2.231 | 0.026 | –0.064 | 0.725 | 1.380

the number of robberies (section 321. of CC.) | 0.308 | 1.619 | 0.106 | 0.044 | 0.804 | 1.243

concealment of assets (section 330 (1) of CC.) (Y/N) | –0.141 | –2.491 | 0.013 | –0.061 | 0.975 | 1.026

criminal form of misappropriation of funds (section 319 (3) of CC.) (Y/N) | 0.000 | –1.254 | 0.210 | –0.035 | 0.768 | 1.302

Figure 2
Bar graph and P-P plot diagram of the distribution of residuals.

The expected working time demand of criminal lawsuits can thus be predicted with sufficient certainty from only 6 relevant initial document characteristics: the length of the indictment and of the investigation files, the number of accused, the number of witnesses to be heard in the proceedings, and the number and nature of the acts charged. A further (at least) 4 initial document characteristics were identified (the length of the appendices to the investigation file, the number of experts requested to be heard, and fraud and robbery as alleged offenses) which, although not with certainty, are likely to have a considerable impact on working time demand. These might have proved relevant in a study with a larger sample.

From the values of the six relevant initial document characteristics (number of accused, number of investigation file pages, etc.), transformed logarithmically as appropriate, the corresponding b values in the parameter column of Table 7, and the general formula already presented, the expected judicial working time demand can be calculated:

log (working time demand + 2) = –0.14 + 14.898 * log (log (log (log (log (scope of the indictment + 2) + 1) + 1) + 1) + 1) + 0.213 * log (length of investigation files + 2) + 0.238 * log (number of witnesses + 2) + 1.140 * log (log (number of accused + 2) + 1) + 5.135 * log (log (log (log (number of acts charged + 2) + 1) + 1) + 1) – 0.207 * log (number of crimes against public confidence + 2) – 0.141 * concealment of assets (0 or 1).

For example, if the prosecution files charges on the basis of an 80-page investigation file and a 3-page indictment against one accused for one act that is neither a crime against public confidence nor concealment of assets (but some other offense, such as theft or drunk driving), and proposes no witness evidence, the expected judicial working time demand of the case is 5.16 hours. If, however, the case rests on a 1300-page investigation file and a 20-page indictment against 5 accused for 10 acts (e.g. a series of robberies committed in groups) and 10 witnesses are to be heard, the expected working time is already 40.17 hours.
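
The published formula can be implemented directly; below is a sketch assuming base-10 logarithms and the transform nesting as reconstructed above. With these assumptions it yields roughly 5.15 and 40.1 hours for the two worked examples, the small deviations plausibly coming from coefficient rounding.

    from math import log10

    def nested_log(x, depth):
        """log10(x + 2), then log10(v + 1) applied depth-1 more times."""
        v = log10(x + 2)
        for _ in range(depth - 1):
            v = log10(v + 1)
        return v

    def predicted_hours(indictment_pages, file_pages, witnesses, accused,
                        acts, public_confidence_crimes=0, concealment=False):
        log_wtd = (-0.14
                   + 14.898 * nested_log(indictment_pages, 5)
                   + 0.213 * nested_log(file_pages, 1)
                   + 0.238 * nested_log(witnesses, 1)
                   + 1.140 * nested_log(accused, 2)
                   + 5.135 * nested_log(acts, 4)
                   - 0.207 * nested_log(public_confidence_crimes, 1)
                   - 0.141 * int(concealment))
        return 10 ** log_wtd - 2

    print(round(predicted_hours(3, 80, 0, 1, 1), 2))       # first example
    print(round(predicted_hours(20, 1300, 10, 5, 10), 2))  # second example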

6. Conclusions

6.1 Evaluation of the developed model

The question is how accurately the regression model can estimate working time demand. I therefore performed the calculation again for each examined case: I determined the working time demand estimated by the regression model and compared it with the (actual) working time demand measured in the research. The usefulness of the multiple linear model is confirmed numerically by the data in Table 8.

Table 8

Comparison of the current “piece-by-piece” method and the regression model (based on filtered data, N = 731).


(INDICATOR) | MEASURED DURING THE EXAMINATION OF THE FILES | “PIECE-BY-PIECE” METHOD | PREDICTED BY THE MODEL

sum of working time demand (hours) | 7490.28 | 7490.28 | 6530.1

mean of working time demand (hours) | 10.25 | 10.25 | 8.93

sum of residuals (hours) | 0 | 5781.86 | 3060.25

mean of residual (hours) | 0 | 7.91 | 4.19

sum of residuals / sum of working time demand (%) | 0 | 77.19 | 46.86

One of the most important facts in the table is that the current practice (the “piece-by-piece” method, with all cases treated as equal) produces an error of 5781.86 hours when allocating the 7490.28 hours of working time demand generated by the 731 cases: an average of 7.91 hours per case. The error rate is even more striking in percentage terms: 77.19% of the working hours allocated (100 * 5781.86 / 7490.28). This numerically confirms the conclusion, already drawn from the distribution of the working time data, that measuring and distributing workload “piece by piece” is incorrect.63 The use of CWS is therefore necessary. However, even criminal lawsuits, which themselves form a single group of cases, should not be considered equal; CWS based on the mere grouping of cases are therefore unsatisfactory.

Comparing the current “piece-by-piece” method with the solution I developed, the first difference is that the regression model predicts, on average, 1.32 hours (10.25 – 8.93) less working time demand per case. Thus, for the 731 cases, the predicted total working time demand falls to 6530.1 hours from the actual 7490.28 hours. This may be because the model cannot predict the expected working time demand with sufficient certainty in the more difficult cases and tends to underestimate them.

The other difference is that, despite the above shortcomings, the regression model still produces significantly better results overall. With it, the error is markedly reduced: in the distribution of 6530.1 working hours, a total of 3060.25 hours were misallocated, an average of only 4.19 hours per case. The error rate thus decreases from the current 77.19% to 46.86% (100 * 3060.25 / 6530.1).
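
The arithmetic of Table 8 can be reproduced in a few lines; `measured` and `predicted` below are tiny illustrative arrays, not the study’s data. Note that for the “piece-by-piece” baseline the denominator equals the measured total, since assigning every case the mean preserves the sum.

    import numpy as np

    def error_report(measured, predicted, label):
        residual = np.abs(measured - predicted)
        rate = 100 * residual.sum() / predicted.sum()
        print(f"{label}: total error {residual.sum():.2f} h, "
              f"mean {residual.mean():.2f} h/case, rate {rate:.2f}%")

    measured  = np.array([2.0, 6.5, 11.0, 30.0])   # illustrative values only
    predicted = np.array([3.1, 5.8, 12.5, 24.0])

    baseline = np.full_like(measured, measured.mean())  # all cases equal
    error_report(measured, baseline, "piece-by-piece")
    error_report(measured, predicted, "regression model")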

We can therefore state that the answer to the third question is also yes: the expected working time demand of criminal proceedings can be estimated with reasonable accuracy, based solely on certain easily recognizable features of the initial documents, or at least significantly better than with the current “piece-by-piece” method. Like all other known CWS, the method developed here is a prediction method based on certain characteristics of the received criminal lawsuits: upon receipt of the initial document, a weight can be assigned on the basis of the data then known. What sets it apart from all other solutions I know of is that it is not based on categorizing cases, whether finer or coarser; instead, it calculates an individual weight for each criminal lawsuit from its own six characteristics.

The procedure developed can be classified as B + a + 1 under the CEPEJ classification. On the first criterion, the classification is B because, in the research underpinning the method, although I derived the working time demand of the cases from the measured data of the concluded case files, it was ultimately determined on the basis of professional estimates. The case weights produced by the method, however, can be considered extremely transparent, because they are derived from clearly defined case attributes using a mathematical formula; on this criterion, the classification is therefore a. The uniqueness of the system lies in the algorithm, which is the heart of the model and takes into account all relevant factors. The weights represent the working time demand required for resolution, expressed in hours: time unit weights that provide unlimited comparability with all other cases. The method can thus be used not only to compare the workload of judges within the first instance criminal branch of the district courts but, if properly adapted to other branches, to compare the workload of judges universally across the entire court organization. The greatest innovation of the system, compared to any other system I know, is the essentially unlimited number of unit weights used: it is statistically unlikely that all six underlying initial document characteristics are exactly the same in two cases, so every weight is unique. On the third criterion, the method therefore clearly falls into category 1.

6.2 The practical significance of the model

In view of this, I propose the following general design for a judicial workload measurement system, primarily for first instance litigation. The reason for this restriction is that in Hungary the disproportionate workload occurs primarily in litigation. Because litigation takes longer, clients are particularly sensitive here to the additional delay that excessive workload can cause. In addition, the complexity of non-litigation cases is likely to vary to a lesser extent.

The basic principle of the system is that a program integrated into the electronic case register (Judicial Integrated IT System, hereinafter: JIIS) generates a weight for each incoming case and records it in the system, using an algorithm whose coefficients come from the regression model. The weight is thus determined automatically at the time of registration in the JIIS, on the basis of the characteristics entered by the judicial officer conducting the registration, which, as described above, have an objectively verifiable influence on the actual working time demand of the case.

This requires IT development. Although a module for recording the weight has already been installed in the JIIS in Hungary, for it to operate in the way I propose, the attributes on which the weight calculation is based (currently six) would have to be fully recorded in the register. The number of accused, or the act that is the subject of the charge, can already be extracted from the database; the rest needs to be developed. The prosecution now also sends the indictment and investigation files to the court electronically, so a number of relevant factors could be recorded automatically.

The generated weight is fixed and recorded unchanged by the software until the case is concluded. In the future, it will thus be possible to determine not only the number but also the total weight of the cases pending before a judge or organizational unit at a given moment, or the total weight of cases received or concluded in a given period. In this way, the system can support a central allocation of staff that takes the weight of cases into account, based on real proportionality, and the data can also be used within an organizational unit for distributing cases among judges and for the evaluation and control of judicial work.
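
One way such weight-aware allocation could work in practice is sketched below: each incoming case carries its computed weight, and the registry assigns it to the judge with the smallest running total of pending weight. The case IDs, judge names and the greedy rule are illustrative assumptions, not the actual JIIS logic.

    import heapq

    def allocate(case_weights, judges):
        """Greedy allocation: each case goes to the least-loaded judge."""
        load = [(0.0, j) for j in judges]        # (pending weight, judge)
        heapq.heapify(load)
        assignment = {}
        for case_id, weight in case_weights:
            total, judge = heapq.heappop(load)   # judge with least pending weight
            assignment[case_id] = judge
            heapq.heappush(load, (total + weight, judge))
        return assignment

    cases = [("B.1", 5.2), ("B.2", 40.2), ("B.3", 10.5), ("B.4", 8.9)]
    print(allocate(cases, ["judge A", "judge B"]))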

During development, it is also worth recording in the system the factors that seemed important but, on present knowledge, are probably irrelevant because of the small sample size; there were several of these in the final model (see Table 7). This would be useful because, in every branch, the courts receive tens of thousands of submissions every year. If the system were introduced, then within a short time, instead of the few hundred cases sampled in my research, a database two orders of magnitude larger, containing tens of thousands of cases per branch, could be used. If, in addition to the important (or presumed important) initial document characteristics, the data underlying the subsequent determination of the working time needed for conclusion are also recorded, the linear regression model at the base of the weighting could soon be recalculated and made orders of magnitude more reliable. A live system could also be re-weighted regularly, by running a new statistical analysis, whenever substantive or procedural rules change.

The judicial workload expressed in required judicial working hours will be very useful in practice, because it can easily be converted into judicial staff numbers. This makes it easy to determine the staffing needs of individual units or even of the entire judicial organization. Thus, the most important resource of the judiciary, the required amount of human resources, can be precisely planned and optimally distributed among the organizational units.

Another benefit of weighting cases in the way I suggest is that it helps to measure the performance of judges and courts more accurately. Instead of the number of completed cases, the sum of the weights of completed cases should be taken into account in the future; on this basis, the performance of individual judges or courts can be compared in both absolute and relative terms. Furthermore, the sum of the weights of completed cases can replace case numbers on the output side in research analyzing the efficiency of courts with DEA, making it more robust.

7. Summary

Based on my research, I was able to identify, using statistical methods, the characteristics that determine the working time demand of criminal lawsuits. Such features can already be identified in the initial documents. The initial document characteristics found to be important coincide well with the factors currently taken into account by the administrative managers who distribute cases among judges within a court and who, by their own account, routinely apply some weighting in practice.

The working time demand of criminal lawsuits is only loosely related to each individual feature of the initial document, but together these features show a very strong correlation with it. This in itself proves the legitimacy of the CWS widely used in international practice. The relationship between working time demand and the relevant initial document characteristics could be quantified, and the constructed multiple linear regression model could predict the expected working time demand of each criminal lawsuit with reasonable certainty from the initial document alone. The novelty of the method is that it accurately quantifies the strength and direction of the impact of each factor on complexity and working time demand. The accuracy of the estimate can thus be significantly improved compared with the “piece-by-piece” principle, which treats all criminal lawsuits as equal and considers only the number of cases received. With this method, the weight of criminal lawsuits can be measured on a ratio scale. I have demonstrated that the CWS currently used, based on the grouping of cases, can also be surpassed, because the proposed method differentiates between cases within a case type. The resulting weight is time-based, so it provides unlimited comparability across different types of cases and courts.

The merit of the method is that its principle is general: after appropriate adaptation, it can be extended to other branches, to other categories of judicial staff, and even to other legal and judicial administration systems. This is particularly justified in branches where the variance of the working time demand of cases is large and few cases are close to the average. The natural target areas are branches where the procedure is more formalized and its course can therefore be more easily predicted in advance.

The result of my research is therefore a theoretically grounded and validated model that meets international trends but goes beyond them in some respects, and that can be applied in practice, further developed, made more reliable, and extended to the entire court administration system in a short time with little additional work. It can be adapted to other staffing systems, to other branches and levels of adjudication, and to other legal systems.