Benchmarking Labor Courts: An Efficiency Frontier Analysis

We aim to determine the average efficiency levels of the Argentine Labor Courts and their individual behavior with respect to the average. We also seek to establish the determinants of the relative efficiency levels of those courts. To do so, we estimate a Data Envelopment Analysis efficiency frontier, and then analyze the efficiency score drivers and the Judiciary career incentives. Our sample comprises 80 courts during the period 2006–2012. Our findings show high levels of efficiency on average. Nonetheless, there is 9–12 percent room for improvement in the output with the same inputs on average, whether considering variable or constant returns, respectively. The efficiency results take caseloads and backlog into account. Given that no measure of capital input is used, this is a short-run analysis. The analysis of the efficiency scores with respect to their determinants shows that more variables, outside our sample, can help explain the variance in the scores.


Introduction
The efficiency of the judiciary is a first-order concern for citizens worldwide. Lengthy trials, costly systems and difficulties in the procedures raise apprehensions about the efficacy and efficiency of justice. For a meaningful diagnosis and intervention, it is relevant to analyze the differences in efficiency of individual courts and to identify best practices for benchmarking purposes. To properly benchmark it is necessary to measure outputs, inputs, quality and environmental (contextual) conditions.
Court efficiency can be defined as maximizing the output produced from court inputs while holding output quality constant. In the same vein, best practices can be defined as the fastest method that uses the least inputs to produce the highest quality output. 1 Additionally, performance can be measured over time to incorporate the dynamics and to evaluate reforms, according to Mattsson et al. 2 An Efficiency Frontier Analysis can yield some indications on ways to improve relative efficiency. Remedies need a prior diagnosis, and these methods are appropriate for dealing with questions such as: what are the average efficiency levels of the Argentine Labor Courts and how do they behave individually with respect to the average? What are the determinants of the relative efficiency levels of those courts? The choice and implementation of remuneration or other rewards is expected to influence the results. Even though procedures (process technology) and career paths are regulated outside the courts, this kind of study provides the opportunity to discuss incentives "out of the box" of the judiciary. In this paper, we aim to answer those questions.
In so doing, we estimate technical efficiency of Argentine Labor Courts applying a Data Envelopment Analysis methodology and using a database produced by the Judicial System of Argentina ("Poder Judicial de la Nación" or PJN for short).
The PJN compiles statistical information about all their branches and some recent efforts have been made to improve the quantity, quality and availability of the information. That information is comprehensive with respect to personnel and their characteristics (age, seniority, education, tenure and gender), cases (incoming, being resolved and resolved), etc. The available information identifies some inputs and their qualitative attributes, some possible outputs, as well as quality and environmental (contextual) information about some peculiarities of every court.
Unfortunately, it is not easy to study some jurisdictions (for instance Civil Courts) given the diverse nature of the cases, making an efficiency comparison difficult. In addition, the dissimilar procedures of different jurisdictions in a federal country introduce noise into the analysis. For these reasons, we limit our study to Labor Courts, where the "output" is more standardized and, hence, more comparable.
We have information on 80 Labor Courts for the period 2006-12 in a homogeneous database. Each court is taken as a decision unit which produces a specific number of sentences every year and in so doing uses inputs (mainly labor, including judges, degreed and non-degreed employees who work as secretaries, clerks, etc.). We do not have information concerning physical inputs, such as the number of computers, square meters of office space, software availability, so our analysis assumes they are constant in the period under analysis, implying that ours is a short-run assessment. Courts cannot control the inflow of the cases and the extent of available resources. Rules on staff appointment, remuneration and promotion (the incentive structure) are to a great extent exogenous to the courts. Thus, judges have limited ability to encourage the staff to increase efficiency and productivity: they cannot change the career rules, which are exogenously determined.
The paper is organized as follows. Section 2 presents the literature review on court efficiency; Section 3 explains the methodology and data employed; Section 4 describes the findings; it shows the different models that were estimated, the numerical results and inferences; and finally, Section 5 concludes.

Literature Review
This section provides the background and context of the study with a focus on the variables used in efficiency frontier studies of the judiciary.
The outputs of justice provision are resolved cases: sentences and other forms of dispute resolution (agreements, mediations, decisions transferred to other jurisdictions, etc.).
The production of justice services is labor-intensive. The most commonly used variables for inputs (see Table 1) are the number of personnel working in the courts; distinguishing between different types, if data are available; some indication of other resources devoted to the service (square meters of office space, computers, etc., can also be included if data are accessible); and the caseload (stock from preceding periods, plus new pending cases), which is the raw material of the task and can be understood as the demand for justice services.
Judiciaries can be considered internal labor markets in which the main incentives are derived from career opportunities. Age, seniority, tenure and gender of staff or judges can drive court efficiency. Espasa and Esteller-Moré 3 suggest that the average efficiency of the staff tends to increase with time, and they provide evidence that hiring temporary workers is significantly less effective compared to tenured employees. Some qualitative aspects can enrich the interpretation of the value added by each input, such as the educational level of the judges, 4 the prospects for the promotion of judges and staff  the judges to administrative issues. 5 Beyond the labor inputs, another important issue is the delay in solving cases. The backlog and the timeliness of the decisions can explain the diligence and quality of the tasks' completion. 6 Additionally, the proportion of appeals reveals the quality of the judges' sentences.
To the outputs, inputs and quality variables (the last being in some respects under the control of the courts), we can add environmental drivers. The distinguishing feature of these variables is their uncontrollable condition by the courts.
Different jurisdictions (criminal, civil, labor, etc.) or different cases (a robbery or an assassination; a divorce or an inheritance; an executive or a blue collar severance payment, etc.) within the same jurisdiction can demand more or fewer resources (staff, time, procedures, etc.), and thus heterogeneity is present. We address it by focusing on the Labor jurisdiction, whose cases are more standardized than in other jurisdictions, such as criminal or civil courts.
Efficiency can also be affected by external and social factors. For instance, Gorman and Ruggiero 7 examine the staff efficiency of public prosecutors' offices and find it low in low-income US counties with a higher minority population. This is due partly to more complicated cases in those counties, and partly to people's lack of willingness to cooperate and resolve cases. In the same vein, Kittelsen and Førsund 8 find that the performance of multifunctional rural courts differs from that of specialized urban courts in Norway.
The units of measurement can be physical (e.g. the number of full-time equivalent staff, or square meters of office space) or monetary (e.g. expenditures on salaries or on other resources). Some of the studies that evaluated judicial efficiency have been carried out in the context of judicial reforms undertaken by different countries, such as Sweden, the Netherlands or Italy (Hagstedt and Proos, 9 Djafari, 10 Falavigna et al., 11 Finocchiaro Castro and Guccio 12 ). The relationship between efficiency and court size is a critical aspect to explore. Guzowska and Strak 13 find that the inefficiency of several offices can be attributed to the inappropriate scale of operations.
After this examination of the most influential publications, we conducted a detailed review of the empirical literature, along with some conceptual papers, and concentrated on 30 empirical ones. We organized them chronologically and then examined their purpose and main findings. Table A1 in Appendix A presents an exhaustive summary of the empirical literature on court efficiency. Theory indicates with precision the presence and influence of certain determinants of the outcomes: physical or monetary and human resources are expected to increase output. Nevertheless, in empirical work many contextual issues can influence the results, and theory does not indicate in advance either the importance or the direction of their influence, which therefore depend on the context. For example, the age of judges is correlated with their experience, and one is tempted to assume a positive influence on productivity on that basis; instead, age can generate incentives to be more conservative and prudent in decisions and thus slow the pace of production. The same holds for the role of temporary versus tenured employees: temporary employees may be less experienced, so their productivity would be low, or they may be interested in gaining tenure, so their efficiency would be high. On these issues the context can be very specific, and thus apparent contradictions among the empirical results of different studies are not surprising.
Initial contributions pointed to efficiency assessment in different contexts. We begin with Lewin et al. 14 who study the efficiency of criminal courts and judicial districts in North Carolina (USA), finding an inefficient judiciary at both the district and county level. Kittelsen and Førsund 15 explore Norwegian district courts' efficiency, finding that most of the inefficiency was scale inefficiency, with inefficient courts being smaller on average. Tulkens 16 evaluates courts' productivity and backlogs in Belgium, determining that the courts' efficiency has room for improvement. Pedraja-Chaparro and Salinas-Jiménez 17 provide a measure of technical efficiency of the administrative litigation division of Spanish high courts, calculating both the efficiency and the avoidable delays. Pedraja-Chaparro and Salinas-Jiménez 18 assess the efficiency of the administrative litigation division of Spanish high courts, finding considerable scope for improvement. Bhattacharya and Smyth 19 study the relationship between aging and productivity for a sample of retired judges. They find that productivity increases, peaks and then declines when nearing retirement. Beenstock and Haitovsky 20 study efficiency in Israeli courts. They find that judges complete more sentences when facing heavy caseloads and courts complete fewer cases when new judges are appointed. Marselli and Vannini 21 examine the efficiency of the Italian district courts. They observe low efficiency and a productivity increase across time, mostly due to technical change. Schneider 22 analyzes how judges' education and careers affect the courts' productivity and points out that caseload inclusion is necessary to avoid underestimating productivity. The study finds a positive and significant relationship between judicial efficiency and salaries. Also, judges with an ex ante high likelihood of promotion are deemed less productive.
More recently, reforms of some Judiciary systems have made room for further efficiency analysis, in this case comparing courts before and after mergers. Hagstedt and Proos 23 develop an efficiency analysis after Swedish district courts were restructured. They study whether efficiency improved after court reduction and find an overall increase in efficiency with many units operating at decreasing returns to scale. Guzowska and Strak 24 research the efficiency of public prosecution organizational units, as well as returns to scale, and determine potential savings. Nissi and Rapposelli 25 analyze the productive efficiency of the Italian Courts of Appeal. They identify best practices and document diseconomies of scale. Elbialy and García Rubio 26 analyze the performance of first instance courts, differentiating between civil and criminal jurisdictions. They show that the civil courts are relatively inefficient. This result could be influenced by the higher degree of complexity in civil cases.
The incentive factor for staff and the careers of judges is also a promising field of study for efficiency. Espasa and Esteller-Moré 27 address inefficiency in the justice administration in Spain and its drivers. They observe that the greater the percentage of temporary judges, the lower the efficiency of the courts is. Yeung and Azevedo 28 evaluate efficiency in Brazilian state courts, finding considerable efficiency variations across courts, depending on their internal management and organization. Deyneli 29 determines the relationship between justice efficiency and judges' salaries in 22 different European countries. A positive and significant relationship exists between justice efficiency and judges' salaries. Dimitrova-Grajzl et al. 30 examine the performance of lower court judges in Slovenia, documenting possible tradeoffs between quantity and quality of case resolution.
Heterogeneity matters and court practices can influence efficiency. Ferrandino 31 estimates the efficiency of criminal courts in the USA, finding that only some of the courts operate efficiently. Odhiambo 32 tracks the technical efficiency of the Kenyan Judiciary (first instance courts) after a reform, observing technical efficiency improvement. Santos and Amado 33 measure the incidence of scale factors on efficiency and the need for reform in Portugal. They show that size affects efficiency: small courts are inefficient, suggesting they be closed. Falavigna et al. 34 analyze the impact of structural changes on Italian district courts. They determine that the role of judges is correlated with court productivity and efficiency. Major 35 addresses the efficiency of Polish courts in Cracow. There is concern over backlogs and a determination to shorten the queue of pending cases. Ippoliti and Ramello 36 measure efficiency in the Italian court system and try to identify the main drivers of efficiency of (part-time) tax judges. They find that, for utility-maximizing judges, opportunity costs determine the time devoted to the Judiciary and to private activity.
Among the more recent studies, Ferro et al. 37 estimate efficiency in criminal courts and drivers of the judiciary careers in Argentina's federal courts. They find that caseload is an important environmental variable and that surrogate judges and temporary staff are more efficient on average than tenured judges and staff. Finocchiaro-Castro and Guccio 38 assess the financial efficiency gains after the merging of Italian courts to increase their size, finding low efficiency and few fully-efficient courts. Fusco et al. 39 analyze the efficiency of Italian judicial districts, observing technical efficiency to be consistent and stable. Mattsson et al. 40 study Total Factor Productivity (TFP) in Swedish courts, showing considerable scope for improvement and indicating that less efficient courts can catch up to more efficient ones. Rushid 41 measures Swedish district courts' technical efficiency in a merger context. The majority of the courts that were shut down registered the lowest efficiency scores. Finally, Yeung 42 estimates the efficiency of the Brazilian judiciary and its dynamism in recent years. The results are useful for evaluating recent discussions on judiciary efficiency.

Method, Database and Models
There are two families of techniques to measure relative efficiency through frontier estimation: parametric (regression-based) and non-parametric (mathematical programming). Efficiency refers to the ability of decision units (courts) to maximize output (sentences or other forms of dispute settlement) given inputs (human and non-human resources) or, where applicable, to minimize costs or to maximize revenues or profits, conditional on the existing technology.
The non-parametric methods model the productive process without estimating the parameters of a function; the most common is the Data Envelopment Analysis (DEA) method. DEA seeks to determine which units form an envelope surface with respect to the sample and characterizes the set of efficient producers as those on the frontier. Inefficiency is measured in terms of how far each observation deviates from the most efficient "peers". DEA yields scores between 0 (totally inefficient) and 1 (completely efficient). Because the model is deterministic, the entire distance from each decision unit to the frontier is attributed to inefficiency; randomness or statistical noise is not considered separately.
There are two main kinds of envelopment surfaces: one assumes constant returns to scale 43 while the other assumes variable returns to scale. 44 Technical efficiency DEA models can also be input-oriented, output-oriented, or not oriented at all. They differ in the direction each unit's distance from the frontier is measured.
Productivity refers to changes in technology over time, such that a decision unit can generate more output with a given amount of inputs (technical progress) or less output with a given amount of inputs (technical regress). When efficiency is studied in different periods, the productivity change of each unit can be decomposed, through a Malmquist index, into catching up to the frontier (efficiency change) plus the shifting of the frontier itself (technical change). However, the index assumes Constant Returns to Scale, which can be a restrictive assumption in some contexts.
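For reference, the decomposition is usually written in terms of output distance functions. The expression below is the standard textbook form of the output-oriented Malmquist index and is offered only as an illustration; the notation $D_o^{t}(\cdot)$ for the distance of an observation to the period-t frontier is an assumption introduced here and does not come from the paper, which does not estimate the index.

$$M_o(x^{t+1}, y^{t+1}, x^{t}, y^{t}) = \underbrace{\frac{D_o^{t+1}(x^{t+1}, y^{t+1})}{D_o^{t}(x^{t}, y^{t})}}_{\text{efficiency change (catching up)}} \times \underbrace{\left[ \frac{D_o^{t}(x^{t+1}, y^{t+1})}{D_o^{t+1}(x^{t+1}, y^{t+1})} \cdot \frac{D_o^{t}(x^{t}, y^{t})}{D_o^{t+1}(x^{t}, y^{t})} \right]^{1/2}}_{\text{technical change (frontier shift)}}$$

The two components multiply, so the decomposition is additive in logarithms.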

Method
We employ DEA efficiency models to assess efficiency levels. DEA evaluates the relative efficiency of n decision units, each of which employs m inputs and produces r outputs. 45 Diagram 1 shows the information requirements to estimate the operative efficiency of n courts, delivering r different outcomes (outputs) and using m different resources (inputs). Input and output data make it possible to compare the efficiency of each court with respect to the whole sample. DEA was originally developed to study relative efficiency in non-market activities (such as those of the public sector), where multiple outputs are delivered using multiple inputs, and where the way in which inputs are transformed into outputs (the productive process) is not clearly known.
The starting point is a productivity indicator. In a world of one input and one output (such as employees and sentences), it is possible to define a clear efficiency ranking using the quotient between output and input (the average productivity of the input in terms of the output produced, e.g. sentences per employee). If more than one input exists, however, results can be contradictory: the productivity of a court can be higher than that of a peer when the comparison is made with input 1, but lower when comparing the quotient between output and input 2. A synthetic measure is needed. A Total Factor Productivity index provides the synthesis; it is defined as the quotient of the (weighted) sum of outputs divided by the (weighted) sum of inputs. The key elements are the weights, since efficient units use different weights than inefficient ones. DEA calculates the weights of each output and each input implicit in the information of each decision unit (court). Therefore, each court's performance is compared with one or more "peers", and inefficiency is determined on the basis of the comparison with the best practice (frontier). Technically, DEA solves a linear programming problem: an optimization for each court subject to a set of constraints.
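The ranking problem with several inputs can be made concrete with a small sketch. The two courts, their figures and the weights below are hypothetical and are not taken from the PJN database; they only illustrate why the choice of weights matters.

```python
# Hypothetical data: two courts, one output (sentences) and two labor inputs.
courts = {
    "Court A": {"sentences": 200, "prof": 5, "noprof": 10},
    "Court B": {"sentences": 210, "prof": 7, "noprof": 6},
}

def partial_productivity(court, input_name):
    """Output per unit of a single input (e.g. sentences per professional)."""
    return court["sentences"] / court[input_name]

def weighted_productivity(court, w_prof, w_noprof):
    """Output over a weighted sum of inputs (a TFP-style ratio)."""
    return court["sentences"] / (w_prof * court["prof"] + w_noprof * court["noprof"])

def best(score):
    """Name of the court with the highest value of the given score function."""
    return max(courts, key=lambda name: score(courts[name]))

# The two partial measures disagree on which court is more productive.
best_by_prof = best(lambda c: partial_productivity(c, "prof"))
best_by_noprof = best(lambda c: partial_productivity(c, "noprof"))

# A weighted ratio gives a single ranking, but the answer depends on the weights.
best_equal_weights = best(lambda c: weighted_productivity(c, 1.0, 1.0))
best_prof_heavy = best(lambda c: weighted_productivity(c, 3.0, 1.0))

print(best_by_prof, best_by_noprof, best_equal_weights, best_prof_heavy)
```

Because the ranking flips with the weights, DEA does not impose them: it lets each unit be evaluated under the weights most favorable to it.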
The input vector of decision unit i, $x_i$, is thus m-dimensional, and its output vector, $y_i$, is r-dimensional. For each i, an optimization problem is solved to find the optimal weights of the inputs, $v_i$, and the optimal weights of the outputs, $u_i$, all non-negative, which maximize the ratio between the sum of the weighted outputs and the sum of the weighted inputs, while ensuring that all relative efficiencies are less than or equal to 1.
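Written out, the ratio problem described above takes the following form for unit i. This is the usual textbook statement, given here as an illustration; the index j running over all n units is notation introduced for this sketch.

$$\max_{u_i,\, v_i} \; \frac{u_i^{\top} y_i}{v_i^{\top} x_i} \quad \text{subject to} \quad \frac{u_i^{\top} y_j}{v_i^{\top} x_j} \le 1 \;\; (j = 1, \dots, n), \qquad u_i \ge 0, \; v_i \ge 0$$

Fixing either the weighted inputs or the weighted outputs of unit i at one turns this fractional problem into a linear program.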
The models can be formulated as n linear programming problems. In the output-oriented specification, the sum of the weighted outputs is maximized per unit of weighted inputs. In an input-oriented approach, instead, the sum of the weighted inputs is minimized per unit of weighted outputs.
Given that court authorities have limited control over the inputs and that they normally manage outputs, we use an output-oriented model. Assuming constant returns to scale (CRS), the model can be characterized as follows:

$$\max_{\delta,\, \lambda} \; \delta \quad \text{subject to:} \quad \delta y_i \le Y\lambda; \quad x_i \ge X\lambda; \quad \lambda \ge 0$$

where, for each court i = 1, …, N, there is an output vector $y_i$ and an input vector $x_i$. Y and X are the corresponding matrices for outputs and inputs representing the data for the N courts. This problem is solved N times, once for each court in the sample, yielding the level of technical efficiency for each court i. Technical inefficiency is measured as the possible proportional increase in output while maintaining fixed levels of inputs. The variable δ is greater than or equal to 1, and 1/δ = θ is the efficiency score. Assuming variable returns to scale (VRS) enables us to distinguish between purely technical inefficiency and production scale inefficiency. Adding a convexity restriction, eλ = 1, to the CRS problem, we obtain the following model:

$$\max_{\delta,\, \lambda} \; \delta \quad \text{subject to:} \quad \delta y_i \le Y\lambda; \quad x_i \ge X\lambda; \quad e\lambda = 1; \quad \lambda \ge 0$$

where e is a row vector with all its elements equal to 1. Under the VRS assumption, we only compare efficiency between courts of a similar scale. As a result, some courts that were inefficient under the assumption of CRS can be efficient under VRS.
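A minimal sketch of how the output-oriented CRS and VRS programs described above could be solved numerically is given below. It assumes SciPy's `linprog` as the solver; the paper does not state which software was used. The function name `dea_output_oriented` and the three-court toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def dea_output_oriented(X, Y, vrs=False):
    """Output-oriented DEA scores theta = 1/delta.

    X: (m, N) input matrix, Y: (r, N) output matrix, one column per court.
    vrs=True adds the convexity restriction sum(lambda) = 1.
    """
    m, N = X.shape
    r = Y.shape[0]
    scores = np.empty(N)
    for i in range(N):
        # Decision variables: [delta, lambda_1, ..., lambda_N].
        # linprog minimizes, so maximizing delta means minimizing -delta.
        c = np.zeros(N + 1)
        c[0] = -1.0
        # Output rows: delta * y_i - Y @ lambda <= 0
        A_out = np.hstack([Y[:, [i]], -Y])
        # Input rows: X @ lambda <= x_i
        A_in = np.hstack([np.zeros((m, 1)), X])
        A_ub = np.vstack([A_out, A_in])
        b_ub = np.concatenate([np.zeros(r), X[:, i]])
        A_eq = b_eq = None
        if vrs:
            A_eq = np.concatenate([[0.0], np.ones(N)]).reshape(1, -1)
            b_eq = np.array([1.0])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0, None)] * (N + 1), method="highs")
        scores[i] = 1.0 / res.x[0]
    return scores

# Toy data (hypothetical): one input and one output for three courts.
X = np.array([[2.0, 4.0, 8.0]])
Y = np.array([[2.0, 6.0, 8.0]])

crs_scores = dea_output_oriented(X, Y, vrs=False)
vrs_scores = dea_output_oriented(X, Y, vrs=True)
print(crs_scores, vrs_scores)
```

In this toy sample only the second court is efficient under CRS, while all three lie on the VRS frontier, which illustrates how units that are inefficient under CRS can become efficient once they are compared only with units of similar scale.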

Database
The above literature review highlighted the inputs that help explain court output: staff (differentiating types, if possible), non-human resources and caseload. In terms of quality, backlog and some indication of the length of trials seem like sensible options. For a second stage, the DEA scores can be regressed against the characteristics of workers, such as age, seniority, their squares, the condition of temporary or tenured, gender, and the variables that offer some information about judges' heterogeneity (See Diagram 1).
In this case, the database provides the following possibilities for the inputs in the production function of courts: personnel, distinguishing between judges and the rest of the court staff, and, within the latter, two categories by qualification: those who hold a college degree and those who do not. We do not have information on non-human resources. For the second stage, we have information about the age and seniority of the staff, gender, average time to promotion, and the condition of temporary or tenured personnel (a temporary judge is known as surrogate, sometimes a tenured judge from another court). The squares of age and seniority of both personnel and judges yield some indications about the increasing, decreasing or linear effect of those variables on output.
Concerning the output, the database contains information about Existent Cases, Incoming Cases, Re-Incoming Cases, Out-Going Cases, Stopped Cases, Sentences and Caseload (Stock). Every period, a court starts with a stock of cases, which can be understood as the raw material of the productive process. Those cases can end up as Out-Going Cases. Incoming and Re-Incoming Cases are filed during the period. Of the Out-Going Cases, those resolved are the ones with Sentences, and that will be our output. We built two quality variables: one establishes the average age, in days, of the cases in a court until they are sentenced (Backlog); the other one measures the proportion of sentenced cases that are Appealed.
[Diagram 1: Informative needs to estimate operative efficiency in courts of justice. Source: Own Elaboration.]
Table 1 lists and describes each database variable and its unit of measurement. The variables cover outputs and inputs (personnel and their characteristics: type, qualifications, age, seniority, promotion, gender and the condition of temporary or tenured; and the unfinished cases). Table 2 presents the descriptive statistics of the database. There are 80 courts in our sample, with data for seven years (2006-2012), totaling 560 observations for most of the variables (some variables have missing observations).
The average number of Resolved Cases (Sentenced) is 200, with a standard deviation of 49. The minimum is 82 sentences and the maximum is 541. On average, each court has a staff of 12 (7 professionals and 5 non-professionals, with mean seniorities of 241 and 188 months, respectively, that is 20 and 15 years). Judges are on average older and more senior than the personnel: their mean age and seniority are 704 and 349 months, respectively (58 and 29 years). Some 27 percent of the judges are Surrogate, while on average 81 percent of the personnel are tenured. The mean time to promotion is 108 months (9 years) and 65 percent of the personnel are female. The average sentence takes 511 days, with a standard deviation of 167 days, a minimum of 169 and a maximum of 1029. A simple measure of average productivity of personnel is 17.56 sentences per employee, with a standard deviation of 8, a minimum of 7.5 and a maximum of 64.6.

The Models
After presenting the database we would like to relate outputs to inputs. Here, the methodological options are to run a model, which assumes that scale does not matter, and to run another model, which assumes that scale can have some influence on efficiency. Another methodological decision is concerned with whether courts have decision power on output or on inputs. In the first alternative ("output oriented"), courts cannot control inputs (for example, they cannot fire employees); in the second alternative ("input oriented"), the court would have complete discretion on inputs. Normally, the first alternative is more reasonable in the public sector (obligation to perform certain tasks but limitation over inputs, notably labor force).
We estimate two models, termed A1 and B1, which are the CRS and VRS versions, respectively. The output variables are SENTENCED (cases), BACKLOG (the average time to settle) and APPEALED (proportion of appealed sentences). The inputs are PROF, NOPROF and CASELOAD (proxies for skilled and non-skilled labor and for the raw material, respectively). We also estimated alternative specifications using out-going cases to consider that not all trials end with a sentence; nevertheless, the results are almost the same as in our main model. Both models are output-oriented, and we test returns to scale with a Kolmogorov-Smirnov Test. 46 The results may be sensitive to some characteristics of the labor force (its age, gender, seniority, time from most recent promotion, and temporary or tenured condition) and of the judge in charge of the court (age, seniority and temporary or tenured condition). We test those characteristics in the second stage, after estimating the DEA models.
It is important to highlight some features of the judiciary career in the country under study. First, temporary personnel are by definition non-tenured and subject to dismissal by not renewing their contracts, while permanent personnel are tenured and cannot be fired (except for a felony). Second, judges can be surrogate or tenured (lifelong if appointed before 1994, when the federal constitution was amended and established a mandatory age of retirement at 75, intended for judges appointed after the amendment). Third, the age of retirement for the personnel is 65 for male and 60 for female.

Efficiency
The estimates show a consistent pattern. The CRS model A1 has lower average efficiency scores than the VRS model B1, with higher deviation and fewer efficient courts (Table 3). The evolution of efficiency scores across the periods does not show any stark variations. Table 5 groups the results for the period as a whole, which makes it possible to corroborate the impressions given by the yearly comparisons. We also add the statistics of the variable PRODUCTIVITYSTAFF, which is simply the ratio of SENTENCED to staff (PROF+NOPROF). Table 4 also shows the correlation between the efficiency scores of the different models. The correlation between the A1 and B1 models is 0.68. The correlation of the staff productivity measure with the two models is low: 0.2 with model A1 and 0.4 with model B1.
Although estimates from both the CRS and VRS models are available, the production technology does not appear to have significant sources of scale economies. We could confirm this in communications with practitioners in the sector. The quotient of the A1 (CRS) to the B1 (VRS) efficiency scores is 0.95 on average, another indication that scale economies are not present. A third corroboration comes from a Kolmogorov-Smirnov Test, which we used to test returns to scale. At a significance level of 5%, the null hypothesis of CRS was rejected in only some years of the sample (2007, 2008, 2009 and 2012). Therefore, it is not possible to conclude that the VRS model better represents the phenomenon under study in every year. Thus, the CRS model was selected as a reasonable representation of the productive process, and we analyze its efficiency scores below. Table 5 enables us to characterize the courts by quartiles, considering the period as a whole. The courts are ordered from most to least efficient and the list is divided into four groups, each containing 25 percent of the courts. The first quartile comprises the 25 percent most efficient courts. On average, the most efficient courts have a larger caseload to resolve. They took more time to settle cases, received more appeals and had fewer sentenced cases on average than quartiles 2 and 3 (but not fewer than quartile 4). Since they work with fewer PROF and NOPROF staff, their staff productivity (sentenced cases over staff) is higher than in the rest of the quartiles.
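The three checks in this paragraph can be sketched in a few lines. The scores below are hypothetical, not the paper's estimates, and the use of a two-sample Kolmogorov-Smirnov test on the CRS and VRS score distributions (via SciPy's `ks_2samp`) is an assumption about how such a test may be set up; the exact procedure used in the paper is not spelled out in the text.

```python
import numpy as np
from scipy.stats import ks_2samp

# Hypothetical efficiency scores for eight courts in one year.
crs = np.array([0.78, 0.82, 0.85, 0.88, 0.90, 0.93, 0.97, 1.00])
vrs = np.array([0.84, 0.86, 0.90, 0.91, 0.95, 0.96, 1.00, 1.00])

# Scale efficiency: CRS score over VRS score, never above 1.
# Values close to 1 suggest that scale plays a minor role.
scale_eff = crs / vrs

# Two-sample KS test; H0: both sets of scores come from the same distribution,
# which is read here as evidence in favor of CRS.
ks = ks_2samp(crs, vrs)
reject_crs = ks.pvalue < 0.05

# Quartiles: order courts from most to least efficient, split in four groups.
order = np.argsort(-crs)
quartiles = np.array_split(order, 4)

print(scale_eff.mean(), ks.statistic, ks.pvalue, reject_crs)
print([q.tolist() for q in quartiles])
```

Each element of `quartiles` holds the indices of the courts in that group, so group averages of caseload, backlog or staff can then be compared across quartiles.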

Second Stage DEA: Determinants of Efficiency Scores
So far, our results have determined the relative efficiency level of each court and attached a score to every court: 1 for courts on the best-practice frontier (those that do the most with their inputs) and less than 1 for relatively inefficient courts (those that do less with their inputs than their peers on the frontier). We now investigate whether, in our sample, there is a statistical relationship between those efficiency scores and some characteristics of the inputs and of the environment. That is the purpose of this second stage, which is econometric: an attempt to detect the statistical determinants of the efficiency scores.
In the second stage, we take the DEA models and use a fixed-effects panel data model to analyze the main factors affecting efficiency in Labor Courts.
The model can be represented in the following way:

$$Y_{it} = \alpha_i + u_t + \sum_{k} \beta_k x_{kit} + \varepsilon_{it}$$

where $Y_{it}$ represents the efficiency scores obtained from DEA (dependent variable), $x_{kit}$ represents the k explanatory variables (independent variables), $\beta_k$ are the coefficients to be estimated, $\alpha_i$ is the time-invariant fixed effect for each entity, $u_t$ is the control for fixed effects by year and $\varepsilon_{it}$ is the error for each entity in each year. The rationale for employing this model is to control for possible time-specific or court-specific effects. The independent variables considered in the model are: the age and seniority of the staff and their squares, the age and seniority of the judge and their squares, the percentage of tenured personnel, the percentage of female personnel and a dummy variable for surrogate judges.
Table 6 shows the results of our estimations. The statistically significant variables are SENJUDGE, AGEJUDGE, AGEJUDGESQ and SURROGATE, all related to judges. The A1 scores depend negatively on SENJUDGE and positively on AGEJUDGE, although in the latter case the effect is decreasing, as shown by the squared term. Efficiency is also positively affected by the presence of surrogate judges. This could be explained by surrogate judges having stronger incentives to be efficient in order to obtain a tenured position in the future.
Table 7 presents quartiles of courts by efficiency scores, which we relate to the variables of the regression model. Among the variables that are significant in the regression, AGEJUDGE, whose sign is positive, shows a decreasing average in successive quartiles, while the opposite is true for SENJUDGE (at least in the first quartile). The squared terms exhibit consistently decreasing patterns when the sign is negative.
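The fixed-effects specification described in this section can be sketched with a least-squares dummy-variable (LSDV) estimator. This is a minimal illustration under stated assumptions, not the estimation procedure used in the paper: the function name `two_way_fe`, the two generic regressors and the synthetic panel are all hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def two_way_fe(y, X, entity, year):
    """LSDV estimate of beta in y_it = alpha_i + u_t + X_it @ beta + e_it."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    entity = np.asarray(entity)
    year = np.asarray(year)
    # One dummy per court absorbs the time-invariant effect alpha_i.
    d_entity = (entity[:, None] == np.unique(entity)[None, :]).astype(float)
    # Year dummies absorb u_t; the first year is dropped to avoid collinearity.
    d_year = (year[:, None] == np.unique(year)[None, 1:]).astype(float)
    Z = np.hstack([X, d_entity, d_year])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    # Return only the slope coefficients on the explanatory variables.
    return coef[: X.shape[1]]


# Synthetic balanced panel (hypothetical): 4 courts observed over 5 years.
rng = np.random.default_rng(0)
courts = np.repeat(np.arange(4), 5)
years = np.tile(np.arange(2006, 2011), 4)
X = rng.normal(size=(20, 2))              # two hypothetical regressors
true_beta = np.array([0.5, -0.2])         # assumed coefficients for the demo
alpha = np.array([0.1, 0.2, 0.0, -0.1])[courts]
u = np.array([0.0, 0.01, 0.02, 0.01, 0.03])[years - 2006]
y = alpha + u + X @ true_beta             # noise-free, so beta is recoverable

beta_hat = two_way_fe(y, X, courts, years)
```

Because the synthetic outcome contains no noise, the estimator recovers the assumed coefficients up to floating-point error. With real data, a regressor that moves one-for-one with calendar time within a court (such as the age of an unchanged judge) would be collinear with the court and year dummies, so identification relies on variation such as changes of judge.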

Conclusion
We aimed to answer two questions about the technical efficiency of Labor Courts in Argentina: What are the average efficiency levels of the Argentine Labor Courts and how do they behave individually with respect to the average? What determines the relative efficiency levels of those courts? To respond to these questions, we used DEA models (CRS and VRS versions) to estimate the relative efficiency of a sample of 80 Labor Courts for the period 2006–2012. The estimates measure efficiency scores using as outputs the sentences, the average time to finish a case and the proportion of sentences that are appealed, against the inputs: personnel with and without college degrees, and caseload (the raw material of the process). The results yield a mean efficiency of 0.87 and 0.91 in the CRS and VRS models, respectively, with standard deviations of 0.07 and 0.06. On average, 5.6 and 12.8 decision units were efficient according to each model. We also estimated alternative specifications using outgoing cases to recognize that not all trials end with a sentence, but the results are almost the same as in our main model. Given that no measure of capital input is used, this is a short-run analysis. The CRS models are preferable because of the apparent lack of economies of scale, according to communications with practitioners in the sector and statistical evidence.
We grouped the results by quartiles of efficiency scores and found that the most efficient courts, on average, have larger caseloads and more appeals, take more time to reach a sentence and settle fewer cases, while consistently using fewer resources and exhibiting higher mean labor productivity.
In the second stage, we regressed the efficiency scores of the CRS model against their possible drivers. Efficiency scores increase with the age of the judge and the presence of surrogate judges. On the other hand, the seniority of the judge reduces the efficiency scores, and the negative sign on the square of the judge's age indicates that the positive effect of age diminishes as age rises. The proportion of the variance of the efficiency scores explained by the model is low (only 0.23). This low R-squared suggests that variables beyond those present in our database play a role. There are subtleties in the data that cannot be properly isolated. For example, we can distinguish personnel with college degrees, but the database does not specify whether the employee is a lawyer (probably correlated with specific knowledge and efficiency) or a staff member with a bachelor's degree in political science or liberal arts (probably, but not certainly, uncorrelated with specific knowledge and efficiency). If we were to select additional variables for testing, our candidates would be the stages of the judiciary career, their relationship with achievements or merits, the role of time in promotions, the salary scale of the different career stages, and so forth.
An analysis of efficiency scores by quartile shows congruence between the average levels of the variables and the signs of the regression. The most efficient courts have older judges (not necessarily more senior in their jobs) and more surrogate judges than quartiles 3 and 4.