Joint Programming Initiative

More Years, Better Lives

The Potential and Challenges of Demographic Change

Continuous Working Life Sample 2004-2012 (MCVL)
Muestra Continua de Vida Laboral 2004-2012 (MCVL)

Topic
Work and Productivity
Social Systems and Welfare
Country Spain

Governance

Contact information

General Directorate for the Organization of the Social Security (DGOSS)
Jorge Juan 59
28001 Madrid
Spain
Phone: 91 363 2969
Email: FIPOR.SOCIAL.MTIN(at)seg-social.es
Url: www.seg-social.es/.../index.htm

Timeliness, transparency

There are five months between the end of the reference period (December) and when the microdata is distributed to users (May).

Type of data


Registry

Type of Study


Administrative data on working life, which is the basis for this sample.

Data gathering method

Registries



Access to data


Besides some very general results (http://www.seg-social.es/Internet_1/Estadistica/Est/Muestra_Continua_de_Vidas_Laborales/Algunos_resultados_generales/index.htm), microdata can be requested using the form available at http://www.seg-social.es/Internet_1/Estadistica/Est/Muestra_Continua_de_Vidas_Laborales/SolicitarM/index.htm#documentoPDF. A single accessibility agreement must be signed; its only purpose is to identify the user and his or her affiliation and research interests. A guideline document for users is also provided, which addresses the large volume of data in the Sample (over 59 million data rows in 2012).

Conditions of access


There are no fees; only the accessibility agreement is required. A very small proportion of requests are rejected because they serve no research purpose. The data are provided under prior institutional agreements between the Social Security, the Spanish Statistical Office (INE) and the Tax Office.


After a short delay following application, data are sent to the user by mail. A separate application must be completed for each sample year requested.


Anonymised data are available in txt format for user analysis.


Txt format
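
Given the size of the files (tens of millions of rows), reading them as a stream rather than loading them whole is usually the safer option. The sketch below assumes a semicolon-delimited text file without a header row and a Latin-1 encoding; the description above does not specify either, so both should be checked against the technical documentation. The helper names and the idea that the first field identifies the person are hypothetical.

```python
import csv
from collections import Counter
from typing import Iterator


def stream_rows(path: str, delimiter: str = ";", encoding: str = "latin-1") -> Iterator[list[str]]:
    """Yield one row at a time so the whole file never has to fit in memory."""
    with open(path, newline="", encoding=encoding) as handle:
        yield from csv.reader(handle, delimiter=delimiter)


def count_rows_per_person(path: str) -> Counter:
    """Count rows per value of the first field (assumed here to be the person identifier)."""
    counts: Counter = Counter()
    for row in stream_rows(path):
        if row:  # skip blank lines
            counts[row[0]] += 1
    return counts
```

Calling `count_rows_per_person` with the path to one of the delivered txt files would return a `Counter` keyed by the first field, which gives a quick sense of how many records each sampled person contributes.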


The data are available only in Spanish. Technical documents, also in Spanish, include the general description and organisation (http://www.seg-social.es/Internet_1/Estadistica/Est/Muestra_Continua_de_Vidas_Laborales/CONTUTIL2009/index.htm) and information on the files composing the Sample, with and without Tax data, for 2004 to 2011 (http://www.seg-social.es/Internet_1/Estadistica/Est/Muestra_Continua_de_Vidas_Laborales/Descripci_n_de_ficheros_y_variables/MCVLSINDATOS/index.htm).



Coverage


Data are available for the period from 2004 to 2011. The sample size varies with the reference population. For example, in 2006 a sample of 1.17 million was drawn from a population of 29 million, roughly one out of every 25 individuals.
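
The one-in-25 figure can be checked with a few lines of arithmetic. The sketch below uses only the two 2006 figures quoted in the text; the function name `sampling_ratio` is an illustrative choice, not part of any MCVL tooling.

```python
def sampling_ratio(sample_size: int, population: int) -> tuple[float, float]:
    """Return (fraction sampled, number of people represented by each sampled person)."""
    fraction = sample_size / population
    return fraction, 1 / fraction


# 2006 figures quoted in the text: 1.17 million sampled out of 29 million.
fraction, one_in = sampling_ratio(1_170_000, 29_000_000)
print(f"sampled fraction: {fraction:.1%}")
print(f"roughly one in {one_in:.0f} individuals")
```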


The first issue refers to people who had an economic relationship with Social Security in 2004. However, each issue includes data on the entire working and pension life of the person selected, starting in 1980.


No stratification. The sample is 4% of the reference population (around 1.2 million people).


The Sample reference population is defined as all individuals who have had some connection (contribution or pension) to Social Security at any moment of the reference year. The number of people considered is therefore larger than the number on any fixed date of the year, and a single person can have several different relationships, simultaneously or successively, depending on his or her working situation during the year. Conversely, those not belonging to the Social Security system (some civil servants) and those in the informal economy are not included in this population. The sampling method is simple random sampling without stratification: every person whose Identity Card Number satisfies a given algorithm is automatically included in the sample. Thus, the sample represents the entire reference population each year according to age, sex, region of residence, and nationality. The information on the individuals selected from the Social Security files is subsequently cross-matched with the INE’s Municipal Register to complete the Sample's demographic data. This yields the Sample without Tax data file. Finally, the Tax Office extracts tax data for the sample population to produce the Sample with Tax data.
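
The rule applied to the Identity Card Number is not published in this description, so the following is only a sketch of how a deterministic, ID-based selection can yield a sample of about 4% without stratification. The hashing step, the `is_selected` function and the synthetic IDs are all assumptions made for illustration; they are not the Social Security's actual algorithm.

```python
import hashlib

SAMPLING_PERCENT = 4  # target share of the reference population, as stated in the text


def is_selected(id_number: str, percent: int = SAMPLING_PERCENT) -> bool:
    """Hypothetical rule: hash the ID and keep it if it falls in the lowest `percent` of 100 buckets."""
    digest = hashlib.sha256(id_number.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


# Synthetic 8-digit identifiers, invented for this example.
synthetic_ids = [f"{n:08d}" for n in range(100_000)]
selected = [i for i in synthetic_ids if is_selected(i)]
share = len(selected) / len(synthetic_ids)
print(f"selected {len(selected)} of {len(synthetic_ids)} IDs ({share:.2%})")
```

Because the decision in this sketch depends only on the ID, reapplying the rule always selects the same people, and no information about age, sex or region is needed to draw the sample.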


Data cover the Spanish population at all geographical levels. Towns with fewer than 40,000 residents are identified only by the provincial code.
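
The rule for small towns can be written as a one-line condition. This is a minimal sketch of the rule as described, not the procedure actually used to prepare the files; the function name, the codes and the resident counts are invented for illustration.

```python
RESIDENT_THRESHOLD = 40_000  # threshold stated in the text


def published_location(municipality_code: str, province_code: str, residents: int) -> str:
    """Return the municipality code for large towns, otherwise only the province code."""
    if residents >= RESIDENT_THRESHOLD:
        return municipality_code
    return province_code


# Codes and resident counts below are invented for illustration.
print(published_location("99001", "99", 250_000))  # large town keeps its own code
print(published_location("99002", "99", 1_200))    # small town collapses to the province
```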


The population from which the sample is drawn consists of all individuals who have been registered in the Social Security system as workers or who are receiving a pension during the reference year. No specific starting year is set, as it depends on when each working life began. Individuals registered in the Social Security system only to receive health assistance, and recipients of non-contributory pensions or welfare benefits, are not included in the population. In 2012, 1% of the population covered were below 20 years of age, and 1.5% were over the age of 90.


It allows for a detailed study of several topics. Because the information refers to individuals and to changes over their working lives, different types of variables enter the sample from the Social Security files for employees and employers, not only for the reference year but also for the whole period over which individual data are recorded:
- identification (individual’s code, SS code, company’s code).
- variables relating to the work performed: SS regime, registration dates, type of labour contract, working regime, payment, disability conditions (as reported by the employer), contribution bases, changes over the working life.
- data on employers: economic activity, company size, time acting as employer, company location, type of employer.
For those already retired, a set of variables describes the pension type (including disability), pension dates, salary used to calculate the pension, complements, monthly payments, extra payments, etc. Data from the Municipal Register cover age, sex, birth and place of residence, as well as the age and sex of all other people in the same household. Data conversion algorithms have been applied to preserve the confidentiality and anonymity of ID, address, nationality and other sensitive information about individuals and companies. Given the very large sample size, results can be broken down by every classification variable included in the sample.


• Arranz, J. M., & Muñoz-Bullón, F. “Unemployment benefits and recall jobs in Spain: a split population model”. Working Paper # 1/2013, Instituto de Iniciativas Empresariales y Empresa Familiar, 2013. Available at: orff.uc3m.es/bitstream/10016/16792/1/ieefdt1301.pdf
• Cebrián López, I., & Raymundo, G. M. “La estabilidad de los nuevos contratos indefinidos durante la crisis económica”. Estudios de Economía Aplicada 30(1) (2012): 183–208. Available at: http://www.revista-eea.net/documentos/30113.pdf
• Cebrián, I., & Moreno, G. “Labour Market Intermittency and its Effect on Gender Wage Gap in Spain”. Revue Interventions économiques 47 (2013): 2-19. Available at: http://interventionseconomiques.revues.org/1950
• Durán, A. “La Muestra Continua de Vidas Laborales de la Seguridad Social”. Revista del Ministerio de Trabajo y Asuntos Sociales, pp. 231-240.
• Lapuerta, I. “Claves para el trabajo con la Muestra Continua de Vidas Laborales”. DemoSoc Working Paper 37, 2010. Available at: repositori.upf.edu/.../DEMOSOC37.pdf?sequence=1
• López-Roldán, P. “La Muestra Continua de Vidas Laborales: posibilidades y limitaciones. Aplicación al estudio de la ocupación de la población inmigrante”. Metodología de Encuestas 13 (2011): 7-32. Available at: ddd.uab.cat/.../metenc_a2011v13p7.pdf

There are also several publications using this source to analyze the transition from work to pension, and the sustainability of the pension system.



Linkage


No


Internal links are set up with the data sources from which the sample is drawn. In general, no linkages with other data sources are allowed due to privacy concerns. A few have been made under very restrictive conditions. A comparison with the Economically Active Population Survey (EPA) shows that both datasets provide quite a similar active population structure in Spain.



Data quality


A file containing everyone in the reference population, with only their main characteristics (sex, age, region of residence and nationality), is drawn to verify that the sample is representative. The explanatory document for each variable evaluates possible errors; for example, the "education level" variable is considered "not reliable for younger people". However, most data are considered more reliable than those from a household survey.
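
A representativeness check of this kind amounts to comparing category shares in the sample with those in the reference population. The sketch below shows one simple way to do that; the `share_gaps` function and all counts are invented for illustration and are not actual MCVL figures or the method used by the Social Security.

```python
def share_gaps(population_counts: dict[str, int], sample_counts: dict[str, int]) -> dict[str, float]:
    """Return, per category, the sample share minus the population share."""
    pop_total = sum(population_counts.values())
    sample_total = sum(sample_counts.values())
    return {
        category: sample_counts.get(category, 0) / sample_total - count / pop_total
        for category, count in population_counts.items()
    }


# Invented counts by sex; not actual MCVL figures.
population = {"men": 15_500_000, "women": 13_500_000}
sample = {"men": 626_000, "women": 544_000}

gaps = share_gaps(population, sample)
for category, gap in gaps.items():
    print(f"{category}: {gap:+.4f}")
```

Gaps close to zero for every category of sex, age, region and nationality would indicate that the sample mirrors the reference population on those characteristics.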


No major methodological changes have occurred so far. However, data reported to different administrative offices and included in the sample may not match. For example, a child may be included in a household in the Municipal Population Register, but not reported by the mother to the employer for tax purposes.


The same as above. Changes in administrative classifications occur from time to time, as in types of labour contracts or economic activities, but both the old and new classifications are usually retained.



Applicability


The Continuous Sample of Working Lives (CSWL) is a set of anonymous microdata extracted from the administrative records of the Social Security and the Municipal Register and, depending on the version, some data from the Tax Office. The Sample is updated every year with the variables selected from the Social Security system, dating back as far as computerized records are kept, and from other administrative data sources where complementary information about individuals is recorded. The CSWL objectives are (i) to support research (information is preserved over many decades to allow deep and consistent studies, even of small groups) and (ii) to keep the administrative data transparent, supporting social policy exercises. It is a reference source for longitudinal studies and for life-history analysis techniques based on the key concept of the 'life-course', as well as for the study of labour market dynamics and the evolution of the social welfare system. The CSWL design is characterized as follows:
- A single sample large enough for significant studies, even with disaggregated variables.
- A data aggregation procedure that minimizes the resources deployed.
- Information relating to individuals' working lives.
- Annual update.
- Provision of data along with metadata.
The main weaknesses of the sample relate to the aggregation procedure, to the administrative constraints that each data source imposes, and to the multiplicity of situations in people's working lives. As a result, analyses must be performed and results explained carefully in order to get an exact idea of the reference population.
To summarise, following Lapuerta (2010), some inconsistencies arise from the 'matching' process of the various sources of information:
• very few individuals have a duplicated ID number (0.1%)
• not all people with a ‘working life’ are obliged to hold an ID number (e.g. young adults receiving an orphan’s pension)
• in some initial years of the sample period, difficulties in matching data with the Municipal Register arose, which account for the outstanding non-matching records
Analysing concurrent situations in an individual’s working life is also problematic. The individual is the unit of analysis, but much of the information in the aggregated files refers to working status (e.g., work, collecting unemployment benefits, pension, etc.). A single person may have experienced several of these throughout his or her working life, in some cases at the same time. The same applies to the duration of contracts when several different contracts refer to one person in a fixed period of time, and to the length of unemployment periods. Another problem is the analysis of family structure, which is possible using the Municipal Register and to some extent the Tax data, but not from the Social Security data.
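
Situations that occur at the same time for one person can be identified by testing date intervals for overlap. The sketch below is one possible approach under stated assumptions: the `Episode` structure, its field names and the example dates are invented, and end dates are treated as inclusive, which may not match how the MCVL files record them.

```python
from datetime import date
from itertools import combinations
from typing import NamedTuple


class Episode(NamedTuple):
    status: str  # e.g. "work", "unemployment benefit", "pension"
    start: date
    end: date    # inclusive last day of the episode (an assumption)


def concurrent_pairs(episodes: list[Episode]) -> list[tuple[Episode, Episode]]:
    """Return every pair of episodes that share at least one day."""
    return [
        (a, b)
        for a, b in combinations(episodes, 2)
        if a.start <= b.end and b.start <= a.end
    ]


# Invented episodes for a single person.
episodes = [
    Episode("work", date(2010, 1, 1), date(2010, 6, 30)),
    Episode("work", date(2010, 5, 1), date(2010, 12, 31)),  # second job starting before the first ends
    Episode("unemployment benefit", date(2011, 1, 1), date(2011, 3, 31)),
]

for a, b in concurrent_pairs(episodes):
    print(f"{a.status} ({a.start} to {a.end}) overlaps {b.status} ({b.start} to {b.end})")
```

Flagging overlapping pairs first makes it easier to decide, for a given research question, whether simultaneous episodes should be merged, prioritised or counted separately when computing durations.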



The information about this dataset was compiled by the author: Vicente Rodríguez (see Partners)