Data Issues Paper

Content type
Data use Documentation
Published

December 2023

Project
Part of a collection

Overview

The Data Issues Paper provides a summary of data-related issues that have been identified in the Ten to Men data. It has been designed to assist users of the data as they undertake research and analysis. It should be read in conjunction with the Ten to Men Data User Guide.

The Data Issues Paper provides information to data users on:

  • observed inconsistencies and issues that they should be aware of when analysing and interpreting the Ten to Men data
  • recommendations and guidance in the management of identified data quality issues in the Ten to Men data.

The Data Issues Paper has been divided into 3 sections:

  • a history of the Ten to Men datasets
  • changes to the structure of the Ten to Men datasets
  • description of identified data quality issues.

Further sections will be added as any data-related issues emerge.

Data Issue Paper Updates

Date | Version | Update | Suggested citation
September 2019 | 1.0 | Initial version | Howell, L., Bandara, D., Mohal, J., Andalon, M., Silbert, M., Garrard, B., Swami, N., & Daraganova, G. (2019). Ten to Men: The Australian Longitudinal Study on Male Health - Data Issues Paper, Version 1.0, September 2019. Melbourne: Australian Institute of Family Studies.
September 2021 | 2.0 | Updated for Wave 3 | Howell, L., Silbert, M., & Bandara, D. (2021). Ten to Men: The Australian Longitudinal Study on Male Health - Data Issues Paper, Version 2.0, September 2021. Melbourne: Australian Institute of Family Studies.
March 2022 | 2.1 | Addition of Section 3.9 Current Occupation | Howell, L., Silbert, M., & Bandara, D. (2022). Ten to Men: The Australian Longitudinal Study on Male Health - Data Issues Paper, Version 2.1, March 2022. Melbourne: Australian Institute of Family Studies.
September 2023 | 3.0 | Updated for Wave 4 | Volpe, F., Suares, M., Silbert, M., & Martin, S. (2023). Ten to Men: The Australian Longitudinal Study on Male Health - Data Issues Paper, Release 4.0, (Waves 1-4). Melbourne: Australian Institute of Family Studies.

1. Ten to Men data

Periodically a new release of the Ten to Men datasets will be generated as additional information becomes available after each data collection wave. The releases are numbered in sequential order and a new Digital Object Identifier (DOI) is minted.

There have been 5 releases of data:

  • Release 1.0 was issued by the University of Melbourne and contained data from Wave 1 only.
  • Release 2.0 was also issued by the University of Melbourne. It contained data from both Wave 1 and Wave 2, as well as a respondent dataset.
  • Release 2.1 was issued by the Australian Institute of Family Studies (AIFS) and comprised updated Wave 1 and Wave 2 datasets. Relevant data from the respondent dataset was included in these datasets, and the respondent dataset is no longer available as a separate dataset.
  • Release 3.0 was issued by AIFS and contained data from Wave 1, Wave 2 and Wave 3.
  • Release 4.0 was also issued by AIFS, and contained data from Wave 1, Wave 2, Wave 3 and Wave 4.

A history of the dataset releases and suggested citations can be found in Appendix A.

2. Ten to Men datasets

This section documents the structural changes that have been applied to the Ten to Men datasets. These structural changes enhance the usability of the datasets, especially as data from additional waves are included. Structural changes include the merging of datasets, resolving data inconsistencies, addressing quality issues, and augmenting data resources with additional information.

Most of the major structural changes were implemented in Release 2.1. Table 1 provides a summary of all structural changes, and further details can be found in the corresponding sections.

Table 1: Summary of changes to the dataset structure

Structural change | Implementation | See section for further details
Addition of a data sharing framework | Release 2.1 | 2.1
Respondent data added to Wave 1 and Wave 2 datasets, removing the need for a separate dataset | Release 2.1 | 2.2
Renaming of variables to indicate wave, thus aligning with the standard naming convention for variables | Release 2.1 | 2.3
Renaming of variables to indicate research domain | Release 2.1 | 2.4
Addition of a new research domain for linked data | Release 2.1 | 2.5
Renaming of census linked variables to include reference year | Release 2.1 | 2.6
Renaming of weight variables to include reference to population or sample weights | Release 3.0 | 2.7

2.1 Addition of a data sharing framework

To increase the utility of information while minimising disclosure risks, a data sharing framework to differentiate the user's access level was adopted for Release 2.1 of the Ten to Men datasets. This resulted in 2 levels of datasets for each wave being generated - the General Release and the Restricted Release.

A lower level of confidentialisation was applied to the Restricted Release dataset, with all initial information preserved. The only information not included in the dataset is names, addresses and other contact details. Access to the Restricted Release dataset may only be granted when data users are able to demonstrate a genuine need for the additional data, and when they also meet the necessary additional security requirements.

The General Release dataset has undergone further data confidentialisation. In addition to the information removed for the Restricted Release dataset, further confidentialisation for the General Release dataset includes suppressing variables, aggregating response categories and recoding outlying values to a less extreme value. Users can consult the Ten to Men Data Dictionary for more information on the confidentialised variables.

As access requirements for the General Release dataset are less rigorous than for the Restricted Release dataset, this has improved accessibility for users to the Ten to Men datasets.

For further information about the Ten to Men datasets, including data access procedures, users can refer to Sections 3 and 8 of the Ten to Men Data User Guide.

2.2 Availability of respondent dataset

Release 2.0 comprised 3 Ten to Men datasets - Respondent, Wave 1 and Wave 2. The Respondent dataset contained key indicator data, such as the unique study identifier, age, household identifier and geographical information. The dataset for each wave contained the responses to the corresponding questionnaires.

In Release 2.1, relevant information from the respondent dataset was included in the Ten to Men Wave 1 and Wave 2 datasets. This has removed the necessity of maintaining a separate respondent dataset, and thus only 2 datasets were released at each level - Wave 1 and Wave 2.

The inclusion of respondent data in the wave datasets has been applied in all subsequent releases.

2.3 Renaming of variables to indicate wave

The standard naming convention of the Ten to Men variables specifies that the first character of the variable should indicate the wave or be a 'z' if the variable is constant across waves.

In Releases 1.0 and 2.0, some variables in the respondent datasets did not follow this standard naming convention. The first character of the variable name was 'z' but the variables were not consistent across waves. In these cases, the variable label specified whether the variable related to Wave 1 or Wave 2.

It is important that variables conform to the Ten to Men standard naming convention to maintain consistency and uniformity across the data. Therefore, in Release 2.1, these variables were renamed to follow the standard naming convention; that is, the first character of the variable name was changed to indicate the wave if the variable was not constant across waves.

Further details of all variables that were renamed are shown in Appendix B.

2.4 Renaming of variables to indicate research domain

The standard naming convention of the variables in the Ten to Men datasets specifies that the second and third characters of the variable should indicate the research domain. A list of all the research domains can be found in the Ten to Men Data Dictionary and the Ten to Men Data User Guide.
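The convention described in Sections 2.3 and 2.4 can be illustrated with a short sketch. The function name and the returned structure are hypothetical and are not part of any Ten to Men tooling. The mapping of 'c' and 'd' to Waves 3 and 4 is an assumption that extends the 'a' and 'b' prefixes used for Wave 1 and Wave 2 variables in this paper.

```python
# Illustrative sketch only: splits a variable name into its wave prefix
# (1st character) and research domain (2nd and 3rd characters).
# The 'c'/'d' entries are assumed; 'z' marks variables constant across waves.
WAVE_PREFIXES = {
    "a": "Wave 1",
    "b": "Wave 2",
    "c": "Wave 3",
    "d": "Wave 4",
    "z": "Constant across waves",
}


def split_variable_name(name: str) -> dict:
    """Return the wave, research domain code and remainder of a variable name."""
    cleaned = name.strip().lower()
    if len(cleaned) < 3:
        raise ValueError("variable name is too short to hold a wave and domain")
    prefix = cleaned[0]
    if prefix not in WAVE_PREFIXES:
        raise ValueError(f"unrecognised wave prefix: {prefix!r}")
    return {
        "wave": WAVE_PREFIXES[prefix],
        "domain": cleaned[1:3],
        "remainder": cleaned[3:],
    }
```

For example, applying the function to 'bbxsex120a' separates the Wave 2 prefix 'b' from the research domain code 'bx'. The domain code would still need to be checked against the list of research domains in the Ten to Men Data Dictionary.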

One variable was identified in Releases 1.0 and 2.0 where the second and third characters of the variable did not correspond to a research domain. As it is important to maintain consistency, this variable has been renamed to reflect the correct research domain.

Table 2: List of variables with corrections to the research domain

 | Variable name | Research domain (2nd and 3rd characters) | Time of correction
Original | bhxsex120a | hx is not a research domain | 
Correction | bbxsex120a | Behaviours - sexual behaviour (bx) | Release 2.1

2.5 Additional research domain for linked data

In Releases 1.0 and 2.0, the research domain of Data Collection (dc) comprised both key indicator variables and linked data. This included variables such as the unique study ID, participation indicators, household indicators, statistical area codes (SA1, SA2) and numerous socio-economic indexes for areas (SEIFA).

To provide transparency about the data source, these variables were separated into 2 research domains for Release 2.1. The key indicator variables remain in the research domain of Data Collection (dc), while an additional research domain was created for Linked Data (ld).

The standard naming convention for variables specifies that the second and third characters of the variable name should indicate the research domain. Consequently, the creation of a new research domain resulted in the renaming of some variables to conform to this standard. That is, the second and third characters of the variable names in the linked data research domain were changed from 'dc' to 'ld'.

Further details of all variables that were renamed are shown in Appendix B.

2.6 Renaming of census-based variables

In Release 2.0, the respondent dataset contained linked data from the Australian Bureau of Statistics (ABS) 2011 Census. These variables did not contain any information to indicate the census year.

As new census data becomes available, this has been added to the Ten to Men datasets. It therefore became important to include a reference to the census year in the variable name.

In Release 2.1, the eighth and ninth characters of the variable name were changed to represent a year indicator. For example, the variable 'aldieod00i' has been renamed to 'aldieod11i' to indicate that it is based on the 2011 Census data. Linked data from the ABS 2016 Census was also available and added in this release.
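The renaming rule above can be sketched as follows. The function name is hypothetical, and the sketch assumes that the two-digit year indicator is simply the last two digits of the census year, as in the 'aldieod00i' to 'aldieod11i' example.

```python
# Illustrative sketch only: writes a two-digit census year into the
# 8th and 9th characters of a linked census variable name.
def set_census_year(name: str, census_year: int) -> str:
    """Return the variable name with characters 8-9 replaced by the year."""
    if len(name) < 9:
        raise ValueError("variable name has no 8th and 9th characters")
    year_indicator = f"{census_year % 100:02d}"
    # Python slices are zero-based: name[:7] is characters 1-7,
    # name[9:] is everything from character 10 onwards.
    return name[:7] + year_indicator + name[9:]
```

Under these assumptions, a variable based on the 2016 Census would carry '16' in the same positions.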

Further details of all variables that were renamed are shown in Appendix B.

2.7 Renaming of weight variables

Population weights have been included in all releases of the Ten to Men datasets.

Sample weights were added in Release 3.0, together with a variable naming framework for the weight variables. Wave 1 and Wave 2 weighting variables were renamed to comply with this framework and to clearly indicate whether the variable referred to a population or sample weight.

Table 3 indicates the naming convention for weights that has been applied since Release 3.0.

Table 3: Naming convention for weights

Character position in variable name | Description | Variable abbreviation
1 | Wave | A, B, C or D
2,3 | Research Domain | DC
4 | Initial or Raked | I or R
5 | Longitudinal or Cross-Sectional | L or C
6 | Population or Sample | P or S
7,8,9 | For Wave 1 | WTA
7,8,9 | For Wave 2 | WTB
7,8,9 | For Wave 3 | WTC
7,8,9 | For Wave 4 | WTD
7,8,9 | Between Waves 1 and 2 | WAB
7,8,9 | Between Waves 1 and 3 | WAC
7,8,9 | Between Waves 1 and 4 | WAD
7,8,9 | Between Waves 2 and 3 | WBC
7,8,9 | Between Waves 2 and 4 | WBD
7,8,9 | Between Waves 3 and 4 | WCD
10 | Derived | D

Further details of the weighting variables that were renamed are shown in Appendix B.
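As a rough illustration of the weight naming convention in Table 3, the sketch below decodes a 10-character weight variable name by position. The function name is hypothetical, and the example name 'cdcrcpwtcd' is constructed from the table for illustration only; it is not confirmed to exist in the datasets. Actual weight variable names should be taken from the Ten to Men Data Dictionary.

```python
# Illustrative sketch only: lookup tables transcribed from Table 3.
WEIGHT_STAGE = {"I": "Initial", "R": "Raked"}
WEIGHT_DESIGN = {"L": "Longitudinal", "C": "Cross-sectional"}
WEIGHT_TYPE = {"P": "Population", "S": "Sample"}
WEIGHT_REFERENCE = {
    "WTA": "Wave 1",
    "WTB": "Wave 2",
    "WTC": "Wave 3",
    "WTD": "Wave 4",
    "WAB": "Between Waves 1 and 2",
    "WAC": "Between Waves 1 and 3",
    "WAD": "Between Waves 1 and 4",
    "WBC": "Between Waves 2 and 3",
    "WBD": "Between Waves 2 and 4",
    "WCD": "Between Waves 3 and 4",
}


def describe_weight_variable(name: str) -> dict:
    """Decode a weight variable name by character position.

    Raises KeyError if a position holds a letter not listed in Table 3.
    """
    code = name.strip().upper()
    if len(code) != 10:
        raise ValueError("weight variable names are expected to have 10 characters")
    return {
        "wave": code[0],                           # position 1
        "research_domain": code[1:3],              # positions 2-3
        "stage": WEIGHT_STAGE[code[3]],            # position 4
        "design": WEIGHT_DESIGN[code[4]],          # position 5
        "weight_type": WEIGHT_TYPE[code[5]],       # position 6
        "reference": WEIGHT_REFERENCE[code[6:9]],  # positions 7-9
        "derived": code[9] == "D",                 # position 10
    }
```

Decoding the hypothetical name 'cdcrcpwtcd' in this way would describe a raked, cross-sectional population weight for Wave 3.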

3. Data quality issues

Data quality is measured by factors such as accuracy, validity, consistency, and completeness. AIFS undertakes validation procedures to ensure that the Ten to Men data quality is of an appropriate standard. However, it is the responsibility of the data user to assess the data quality of the Ten to Men variables before any analysis is undertaken.

This section contains information about data quality issues that have been identified across the waves of Ten to Men. Further information will be added as any additional data quality issues emerge.

Table 4 lists the identified data quality issues and the section of this paper that describes each one. Detailed information about each data issue, including the wave/s affected and any recommendations, can be found in the corresponding sections of this paper.

Table 4: Summary of data quality issues

Data quality issue | Section
Missing data | 3.1
Outliers | 3.2
Data from Parent Questionnaire | 3.4
Additional Wave 1 participants | 3.12
Pilot data for Wave 2 | 3.13
Variable naming inconsistencies in reference to the Research Domain | 3.15
Derived variables | 3.16
Behaviours - alcohol
Age of first drink of alcohol | 3.7
Behaviours - tobacco
Age first smoked cigarette | 3.8
Behaviours - weight
Height, Weight and Body Mass Index | 3.5
Height | 3.18
Data collection indicator
Update to weights | 3.14
Health status
Obstructive sleep apnoea | 3.15
Short form 12 (SF-12) Health survey | 3.17
Social determinants - Life Events
Other natural disasters | 3.19
Social determinants - Socioeconomic Status
Age of respondents | 3.3
Level of education completed | 3.6
Country of birth | 3.9
Current occupation | 3.10
Language spoken at home | 3.11

3.1 Missing data

Most variables in the Ten to Men datasets have some proportion of missing data, which has been coded using the Ten to Men standard missing value code frame (see the Ten to Men Data User Guide for more information).

The proportion and reasons for missing data should be considered before drawing any conclusion from the data.
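One way to review the proportion and reasons for missing data is sketched below. The codes -8 to -1 are those listed as missing value formats in Table 6 of this paper; the Ten to Men Data User Guide remains the authoritative source for the code frame. The function name and the plain list of integers used as input are assumptions for illustration.

```python
from collections import Counter

# Missing value codes as listed in Table 6 of this paper.
MISSING_CODES = {
    -8: "No questionnaire or interview completed",
    -7: "Unable to determine value",
    -6: "Value implausible",
    -5: "Invalid multiple response",
    -4: "Refused or not answered",
    -3: "Don't know",
    -2: "Not applicable",
    -1: "Not asked",
}


def missing_summary(values):
    """Return (proportion missing, count of records per missing reason)."""
    reasons = Counter(MISSING_CODES[v] for v in values if v in MISSING_CODES)
    total = len(values)
    proportion = sum(reasons.values()) / total if total else 0.0
    return proportion, dict(reasons)
```

Separating the reasons matters because, for instance, 'Not asked' reflects questionnaire design while 'Refused or not answered' reflects respondent behaviour, and the two may call for different handling in analysis.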

3.2 Outliers

All releases of the Ten to Men datasets contain the raw data, with variables that have not been cleaned for outliers. The exception to this is the categorising of the extreme ends as part of the confidentialisation process for the General Release datasets. The variables where this type of confidentialisation has been applied are indicated in the Ten to Men Data Dictionary.

Data users are advised to take care when using and interpreting the Ten to Men data as the presence of outliers may necessitate excluding values or categorising the extreme ends.

3.3 Age of respondents

Cohort inconsistencies

The scope of Ten to Men was males aged 10-55 years at Wave 1, with 3 cohorts:

  • males aged 10-14 years completing a Boys questionnaire
  • males aged 15-17 years completing a Young Men questionnaire
  • males aged 18 years and over completing an Adult questionnaire.

However, there were a small number of men invited to participate whose age was outside the scope or who completed the incorrect questionnaire for their age. The inconsistency arises with less than 0.5% of respondents and is likely to have occurred due to the difference in time between sending out the hard copy questionnaires and the respondents completing the questionnaires. The survey data for these respondents have been retained in the Ten to Men datasets.

The inconsistencies are present in all releases of the Wave 1 and Wave 2 datasets. From Wave 3, there was only one questionnaire and therefore this inconsistency is not an issue.
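Data users who wish to identify these records in the Wave 1 data could compare each respondent's age with the questionnaire completed. The sketch below assumes the age bands listed above; the function names and the questionnaire labels passed in are hypothetical and do not correspond to specific Ten to Men variables.

```python
# Illustrative sketch only: maps an age at Wave 1 to the questionnaire
# expected under the study scope (males aged 10-55 years).
def expected_questionnaire(age_at_wave1: int):
    """Return the expected questionnaire, or None if the age is out of scope."""
    if 10 <= age_at_wave1 <= 14:
        return "Boys"
    if 15 <= age_at_wave1 <= 17:
        return "Young Men"
    if 18 <= age_at_wave1 <= 55:
        return "Adult"
    return None


def is_cohort_inconsistent(age_at_wave1: int, completed: str) -> bool:
    """True when the age is out of scope or the wrong questionnaire was completed."""
    return expected_questionnaire(age_at_wave1) != completed
```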

Calculation of age in Wave 3

In Wave 3 and all subsequent waves, the age of the respondent was not asked in the questionnaire. For inclusion in the datasets, the age is calculated using the respondent's date of birth.

As part of the respondent validation process for Wave 3, the date of birth was asked. Therefore, there are 2 sources of the date of birth - the master contact file and Wave 3 survey data. A process was undertaken to compare the date of birth from the 2 sources, and it was the same for 97% of respondents.

Further investigation of the 3% where the date of birth differed showed that many only supplied the birth year for Wave 3 data. An assumption has been made that the birth date on the contact file is correct and this has been used to calculate the age of the respondent in Wave 3 (the Wave 3 survey date was also used in the calculations).
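The age calculation from a date of birth and a survey date can be sketched as below. This paper does not state the exact method used by AIFS, so the sketch assumes age in completed years on the survey date; the function name is hypothetical.

```python
from datetime import date


# Illustrative sketch only: age in completed years on the survey date.
def age_in_years(date_of_birth: date, survey_date: date) -> int:
    """Return completed years between the date of birth and the survey date."""
    if survey_date < date_of_birth:
        raise ValueError("survey date precedes date of birth")
    # Compare (month, day) pairs to see whether the birthday has
    # already occurred in the survey year.
    had_birthday = (survey_date.month, survey_date.day) >= (
        date_of_birth.month,
        date_of_birth.day,
    )
    return survey_date.year - date_of_birth.year - (0 if had_birthday else 1)
```

A respondent surveyed the day before their birthday is therefore one year younger than one surveyed on the birthday itself, which is one reason the survey completion date is needed alongside the date of birth.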

There are 5 observations where no date of birth has been supplied (in either Wave 3 or the master contact file). In these cases, the age at Wave 1 and Wave 2, as well as the survey completion dates have been used to impute an age for Wave 3. The 5 unique study identifiers (zdcid0001d) where this occurred are 5003136, 7006305, 7007404, 8010082 and 9015997.

In Wave 4, the date of birth was asked again as part of the respondent validation process.

A process was undertaken to compare the date of birth against the master contact file, and it was the same for 99% of respondents.

Further investigation of the 1% where the date of birth differed showed that many only supplied the birth year for Wave 3 data. As per Wave 3, an assumption has been made that the birth date on the contact file is correct and this has been used to calculate the age of the respondent in Wave 4 (the Wave 4 survey date was also used in the calculations).

New inconsistencies have been found since the date of birth was asked in Wave 4. Differences between the Wave 4, Wave 3 and master contact file values are now present. To make age consistent in Wave 4, and across subsequent waves, age in Wave 4 has been calculated either from the date of birth in the master contact file or, if that is missing, from a combination of Wave 3 age (based on Wave 1 age) and survey completion dates.

There is one observation where no date of birth has been supplied (in either Wave 4, Wave 3 or the master contact file). In this case, the age at Wave 1, Wave 2 and Wave 3, as well as the survey completion dates, have been used to impute an age for Wave 4. The unique study identifier (zdcid0001d) where this occurred is 7007404.

3.4 Parent questionnaire data

For Wave 1 and Wave 2 of Ten to Men, the parents of the males aged 10-14 years also completed a questionnaire. The parent was not assigned an ID and therefore, it cannot be determined if the same parent filled in the questionnaire for both Wave 1 and Wave 2. This is important as some questions were subject to the parent's perception. For example, 'In the past 4 weeks, how often does your child feel happy?'

As a result, data users are advised to take extreme care if comparing responses from the Parent questionnaire across Wave 1 and Wave 2.

3.5 Anthropometric measurements

The Ten to Men questionnaires contain questions about anthropometric measurements. Some of the responses are implausible (e.g. a height of 1 cm).

Data users are advised to clean the anthropometric measurements and make their own decisions about how to handle them, as they may contain erroneous values that will affect derived values and interpretations.
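A simple plausibility check is one possible starting point. In the sketch below, the lower and upper bounds (100 cm and 230 cm) are hypothetical values chosen for illustration only; they are not AIFS recommendations, and data users should set bounds appropriate to the cohort and analysis. Negative values are treated as missing value codes.

```python
# Illustrative sketch only: classifies a recorded height in centimetres.
# The default bounds are assumptions, not thresholds from this paper.
def classify_height(height_cm: float, low: float = 100.0, high: float = 230.0) -> str:
    """Return 'missing', 'implausible' or 'plausible' for a height value."""
    if height_cm < 0:
        return "missing"  # negative values are missing value codes
    if height_cm < low or height_cm > high:
        return "implausible"
    return "plausible"
```

Flagging rather than deleting implausible values keeps the decision visible, so that derived measures such as Body Mass Index can be recalculated with and without the flagged records.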

3.6 Level of education completed

Questions about the completed level of education have been asked in all waves of Ten to Men. However, the response categories for the various questionnaires (Boys, Young Men, Adults, Parents) and waves have not been consistent. Extreme care needs to be taken when using this education data, especially if comparing values across questionnaires and/or waves.

The Australian Standard Classification of Education (ASCED) could be used to further categorise this data. In this case, Primary education should also include Year 7 for South Australia only. More information on the ASCED and how it is structured can be found on the ABS website.

3.7 Age when first drank alcohol

Summary

A data issue has been identified with the responses to the question 'How old were you when you first drank more than just a sip or a taste of alcohol?'. The question was included on 3 questionnaires (Boys, Young Men and Adults) for Waves 1 and 2, and a common variable was created to hold the responses for each wave:

  • 'abaalcagem' contains the responses from all questionnaires for Wave 1
  • 'bbaalcagem' contains the responses from all questionnaires for Wave 2.

The data issue arose as a format was applied to the responses to this question on the Boys questionnaire. No format, other than the missing value formats, was applied to the responses to this question on the Young Men and Adults questionnaires. When the data from the Boys questionnaire was merged with the data from the Young Men and Adults questionnaires to create the common variable, the format for the responses from the Boys questionnaire was not applied.

As a result, the data from the Boys questionnaire for this question was incorrectly reduced by 4 years. For example, a response of 12 years would be recorded as 8 years in the Ten to Men dataset.

This data issue is present in Releases 1.0 and 2.0 of the Ten to Men datasets, but the raw data has been amended in Release 2.1.

Further details

The data (excluding the missing values) from Release 2.0 of the Ten to Men datasets is shown in Table 5. Responses from both the Boys and Young Men questionnaires are shown for comparison. Each cell in the table is colour coded:

  • Grey - representing recorded plausible responses in the Ten to Men dataset
  • Green - representing no recorded responses in the Ten to Men dataset
  • Black - representing recorded implausible values given the age of the respondent at the time of the survey (e.g. a 10 year old cannot respond that they started drinking at 12 years).

Table 5: Data from Release 2.0 (Waves 1 and 2)

There is an issue with the data from the Boys questionnaire, as there is no recorded response of anyone having their first drink of alcohol after the age of 10 (green cells). There is also a higher than expected number of respondents having their first drink of alcohol before the age of 5 years (grey cells).

This issue is especially evident when compared to the data from the Young Men questionnaire.

Further investigation identified a problem with different formats being applied.

The format applied to the responses to this question on the Boys questionnaire is shown in Table 6. Applying this format meant that if the respondent replied 10 years of age, the data was entered as 6.

Table 6: Format applied to the Boys questionnaire

Code | Format
-8 | No questionnaire or interview completed
-7 | Unable to determine value
-6 | Value implausible
-5 | Invalid multiple response
-4 | Refused or not answered
-3 | Don't know
-2 | Not applicable
-1 | Not asked
1 | 5 years old
2 | 6 years old
3 | 7 years old
4 | 8 years old
5 | 9 years old
6 | 10 years old
7 | 11 years old
8 | 12 years old
9 | 13 years old
10 | 14 years old

The corresponding question in the Young Men and Adults questionnaires only had the missing value formats applied (codes -8 to -1). For example, if the respondent replied 10 years of age, the data was entered as 10.

So in summary, if the respondent replied 10 years of age, the data entered was either 6 (Boys questionnaire) or 10 (Young Men or Adults questionnaires).

The data from all questionnaires was then combined to form the Ten to Men datasets. When the data from the Boys questionnaire was merged with the data from the Young Men and Adults questionnaires, no format other than the missing value formats was applied. The format for the Boys questionnaire was not applied and the formatted age value was replaced with the code. As a result, the age of the first drink of alcohol for the Boys data was reduced by 4 years (with the maximum age possible being 10).

This issue is present in both Release 1.0 and 2.0 of the Ten to Men datasets.
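For users still working with Release 1.0 or 2.0, the effect of the format in Table 6 can be reversed by adding 4 years to the values that came from the Boys questionnaire. The sketch below is an illustration only: the function name and the flag identifying Boys questionnaire records are hypothetical, and the correction must not be applied to Release 2.1 or later, where the data has already been amended.

```python
# Illustrative sketch only, for Release 1.0 and 2.0 data.
# In Table 6, code 1 means 5 years old and code 10 means 14 years old,
# so the recorded code is 4 less than the age reported.
BOYS_FORMAT_OFFSET = 4


def corrected_first_drink_age(value: int, from_boys_questionnaire: bool) -> int:
    """Return the age reported, undoing the Boys questionnaire coding."""
    if value < 0:
        return value  # missing value codes (-8 to -1) are left unchanged
    if not from_boys_questionnaire:
        return value  # Young Men and Adults values were entered as reported
    if not 1 <= value <= 10:
        raise ValueError("Boys questionnaire codes are expected to be 1 to 10")
    return value + BOYS_FORMAT_OFFSET
```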

3.8 Age when first smoked cigarettes

A data issue has been identified with the responses to the question 'How old were you when you first smoked your first cigarette?'. The question was included on 3 questionnaires (Boys, Young Men and Adults) for Waves 1 and 2, and a common variable was created to hold the responses for each wave:

  • 'abtcigagem' contains the responses from all questionnaires for Wave 1
  • 'bbtcigagem' contains the responses from all questionnaires for Wave 2.

The data issue arose as a format was applied to the responses to this question on the Boys questionnaire. No format, other than the missing value formats, was applied to the responses to this question on the Young Men and Adults questionnaires. When the data from the Boys questionnaire was merged with the data from the Young Men and Adults questionnaires to create the common variable, the format for the responses from the Boys questionnaire was not applied.

As a result, the data from the Boys questionnaire for this question was incorrectly reduced by 4 years. For example, a response of 12 years would be recorded as 8 years in the Ten to Men dataset.

This data issue is present in Releases 1.0 and 2.0 of the Ten to Men datasets, but the raw data has been amended in Release 2.1.

As it is the same data issue as described above, see section 3.7 for further details.

3.9 Country of birth

This section describes a data issue that was present in all Releases prior to Release 4.0. A file containing the raw country of birth data was located in late 2022, and therefore this data issue was corrected for the Wave 1 dataset in Release 4.0.

In Wave 1 of Ten to Men, each questionnaire contained 3 questions about participant's country of birth and their parents' country of birth. The response options included 'Other', where the respondent could specify any country using the free text field.

The data were recorded in the 3 variables:

  • participant's country of birth (asecobownm)
  • mother's country of birth (asemocob1m)
  • father's country of birth (asefacob1m).

This was then re-coded using the Standard Australian Classification of Countries (SACC) and an additional 9 variables at the 1-digit, 2-digit and 4-digit levels were created. These variables contain more detail than the categories provided on the questionnaire, as the 'Other' category has been expanded to include countries specified in the free text field. They are:

  • participant's country of birth (asecobow1md, asecobow2md, asecobow4md)
  • mother's country of birth (asemocob1md, asemocob2md, asemocob4md)
  • father's country of birth (asefacob1md, asefacob2md, asefacob4md).

Although the SACC is a 3-level hierarchical structure, this has not been strictly applied to the data. Small values at the 2-digit and 4-digit levels have been confidentialised by replacing with 99 or 9999 instead of using the supplementary codes (not further defined (nfd)).

Therefore, care should be taken when using the variables at the 2-digit and 4-digit levels, as it will give higher 'Other' results than expected. Further details of the coding are shown in Table 7.

For data users, it is recommended that the variables at the 2-digit and 4-digit levels are used in conjunction with the 1-digit level variable. The confidentialised variables at the 2-digit and 4-digit levels can then be replaced with the corresponding nfd code.

Table 7: Country of birth codes

Country of birth (1-digit code) | Country of birth (2-digit code) | Suggested replacement country of birth (2-digit code) | Wave 1 frequency
1 | 99 | 10 Oceania and Antarctica nfd | 46
2 | 99 | 20 North-West Europe nfd | 17
3 | 99 | 30 Southern and Eastern Europe nfd | 26
4 | 99 | 40 North Africa and Middle East nfd | 45
5 | 99 | 50 South-East Asia nfd | 0
6 | 99 | 60 North-East Asia nfd | 29
7 | 99 | 70 Southern and Central Asia nfd | 28
8 | 99 | 80 Americas nfd | 10
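The suggested replacement in Table 7 can be expressed as a small rule: where the 2-digit code has been confidentialised to 99, substitute the nfd code formed from the 1-digit code. The sketch below is an illustration under that reading of the table. The function name is hypothetical, and only the 1-digit groups listed in Table 7 (1 to 8) are handled; other values are returned unchanged.

```python
# Illustrative sketch only: replaces a confidentialised 2-digit country of
# birth code (99) with the nfd code implied by the 1-digit code, following
# the pattern in Table 7 (1 -> 10, 2 -> 20, ... 8 -> 80).
def replace_confidentialised_code(one_digit: int, two_digit: int) -> int:
    """Return the 2-digit code, using the nfd code where it was set to 99."""
    if two_digit == 99 and 1 <= one_digit <= 8:
        return one_digit * 10
    return two_digit
```

For example, a record with 1-digit code 4 and 2-digit code 99 would be reported as 40 (North Africa and Middle East nfd) rather than as 'Other'.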

3.10 Current occupation

In Wave 1 and Wave 2 of Ten to Men, the Adult questionnaire contained a question about the participant's current occupation. It was a free text field, requesting both the Job title and the main duties/tasks.

This data was then coded using the Australian and New Zealand Standard Classification of Occupations (ANZSCO). Three variables for the participant's current occupation were created for each wave. These are at the 1-digit, 2-digit and 4-digit levels:

  • 1-digit level (aseempoc1ad, bseempoc1ad)
  • 2-digit level (aseempoc2ad, bseempoc2ad)
  • 4-digit level (aseempoc4ad, bseempoc4ad)

Although ANZSCO is a 3-level hierarchical structure, this has not been strictly applied to the data. Small values at the 2-digit and 4-digit levels have been confidentialised by replacing with 99 or 9999 instead of using the supplementary codes (not further defined (nfd)). Some values at the 2-digit level have been coded as -7 (Unable to determine value) because the 4-digit level has been confidentialised to 9999.

Therefore, care should be taken when using the variables at the 2-digit and 4-digit levels, as it will give higher 'Other' results than expected. Further details are shown below in Table 8.

Table 8: Current occupation codes

Current occupation (1-digit code) | Current occupation (2-digit code) | Suggested replacement current occupation (2-digit code) | Wave 1 frequency, Wave 2 frequency
1 | -7 | 10 Managers nfd | 166154
2 | -7 | 20 Professionals nfd | 6555
3 | -7 | 30 Technicians and Trades Workers nfd | 7097
5 | -7 | 50 Clerical and Administrative Workers nfd | 1412
5 | 99 | 50 Clerical and Administrative Workers nfd | 4132
6 | -7 | 60 Sales Workers nfd | 4438
6 | 99 | 60 Sales Workers nfd | 4938
7 | -7 | 70 Machinery Operators and Drivers nfd | 7543
8 | -7 | 80 Labourers nfd | 4140
8 | 99 | 80 Labourers nfd | 042

For data users, it is recommended that the variables at the 2-digit and 4-digit levels are used in conjunction with the 1-digit level variable. The confidentialised variables at the 2-digit and 4-digit levels can then be replaced with the corresponding nfd code.

The Parent's questionnaire asked the same question about the parent's current occupation. The variables for this are:

  • 1-digit level (aseempoc1pd, bseempoc1pd)
  • 2-digit level (aseempoc2pd, bseempoc2pd)
  • 4-digit level (aseempoc4pd, bseempoc4pd)

This data has the same issue and recommendations as the participant's current occupation.

Table 9: Current occupation codes (Parents)

Current occupation (1-digit code) | Current occupation (2-digit code) | Suggested replacement current occupation (2-digit code) | Wave 1 frequency, Wave 2 frequency
1 | -7 | 10 Managers nfd | 1010
1 | 99 | 10 Managers nfd | 10362
2 | -7 | 20 Professionals nfd | 52
2 | 99 | 20 Professionals nfd | 6566
3 | -7 | 30 Technicians and Trades Workers nfd | 10
3 | 99 | 30 Technicians and Trades Workers nfd | 540
4 | 99 | Community and Personal Service Workers nfd | 4662
5 | -7 | 50 Clerical and Administrative Workers nfd | 24
5 | 99 | 50 Clerical and Administrative Workers nfd | 11968
8 | -7 | 80 Labourers nfd | 30
8 | 99 | 80 Labourers nfd | 580
9 | -7 | 99 Other | 93

3.11 Language spoken at home

In Wave 1 of Ten to Men, each questionnaire contained a question about the language spoken at home. However, the response categories varied across the 3 different questionnaires.

Adult questionnaire

The Adult questionnaire had 7 options for the response to the question about language, which are shown in Table 10. One option was 'Other', where the respondent could specify any other language using a free text field.

Table 10: Language codes for Adult questionnaire

Code | Language
1201 | English
2201 | Greek
2401 | Italian
4202 | Arabic
6302 | Vietnamese
7104 | Mandarin
9999 | Other

This data was then re-coded using the Australian Standard Classification of Languages (ASCL) and 3 variables at the 1-digit, 2-digit and 4-digit levels were created (aselangh1ad, aselangh2ad, aselangh4ad). These variables contain more detail than the categories on the questionnaire, as the 'Other' category has been expanded to include languages specified in the free text field.

Table 11: Language codes for Adult questionnaire

Language (1-digit level) aselangh1ad | Language (2-digit level) aselangh2ad | Suggested replacement language (2-digit level) | Wave 1 frequency
1 | 99 | 10 Northern European Languages nfd | 30
2 | 99 | 20 Southern European Languages nfd | 72
3 | 99 | 30 Eastern European Languages nfd | 56
4 | 99 | 40 Southwest and Central Asian Languages nfd | 57
5 | 99 | 50 Southern Asian Languages nfd | 2
6 | 99 | 60 Southeast Asian Languages nfd | 30
7 | 99 | 70 Eastern Asian Languages nfd | 20

Although detailed information on the language can be obtained, the small cell counts at these levels have resulted in the variables being confidentialised (some values have been replaced by 99 or 9999). Care should be taken when using the variables at the 2-digit and 4-digit levels, as they will give higher 'Other' counts than expected. Further details are shown in Table 11.

We recommend that the variables at the 2-digit and 4-digit levels be used in conjunction with the 'aselangh1ad' variable. The confidentialised variables at the 2-digit and 4-digit levels can then be replaced with the corresponding nfd code.
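The replacement recommended above can be sketched in Python. This is a minimal illustration only: the function name is hypothetical, and the rule that the nfd code is the 1-digit code followed by 0 is inferred from the rows of Table 11 rather than stated as an official algorithm.

```python
def replace_confidentialised_2digit(lang_1digit, lang_2digit):
    """Return the 2-digit code, swapping a confidentialised 99 for the nfd code.

    Assumption (inferred from Table 11): the nfd code is the 1-digit code
    followed by 0, e.g. 1 -> 10 (Northern European Languages nfd).
    Only the 1-digit groups listed in Table 11 (1 to 7) are replaced.
    """
    if lang_2digit == 99 and 1 <= lang_1digit <= 7:
        return lang_1digit * 10
    return lang_2digit


# Hypothetical record holding the two Adult language variables.
record = {"aselangh1ad": 2, "aselangh2ad": 99}
record["aselangh2ad"] = replace_confidentialised_2digit(
    record["aselangh1ad"], record["aselangh2ad"]
)
print(record["aselangh2ad"])  # 20 (Southern European Languages nfd)
```

Values that are not confidentialised, and records whose 1-digit code falls outside Table 11, are returned unchanged.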

Boys and Young Men questionnaires

The Boys and Young Men questionnaires only had 3 options for the response to this question about language, as shown in Table 12, and recorded as the variable 'aselangh1u'.

Table 12: Language codes for Boys and Young Men questionnaires

| Code | Language |
| 1 | English |
| 2 | Another language |
| 3 | English and another language about equally |

The respondent could specify the other language using the free text field and this was re-coded using the ASCL. Three variables at the 1-digit, 2-digit and 4-digit levels were created (aselangh1ud, aselangh2ud, aselangh4ud).

However, the small values at this level have resulted in the variables being totally confidentialised (all values have been replaced by 9, 99 or 9999).

Therefore, no information about the other languages spoken at home is available in the Ten to Men datasets for the Boys and Young Men.

3.12 Additional Wave 1 participants

During Wave 2 of Ten to Men, 33 additional participants were identified for Wave 1. They were not included in the original Wave 1 dataset (Release 1.0) as their eligibility and consent status had not been determined at that stage, but this issue was resolved during Wave 2.

In Release 1.0, the sample size for Wave 1 was 15,988. This comprised 3 cohorts:

  • 1,087 males aged 10-14 years completing a Boys questionnaire
  • 1,017 males aged 15-17 years completing a Young Men questionnaire
  • 13,884 males aged 18 years and over completing an Adult questionnaire.

In Releases 2.0 and 2.1, the 33 additional participants have been subsequently included in Wave 1, taking the reconciled sample size for Wave 1 to 16,021. The reconciled cohort sizes are:

  • 1,099 males aged 10-14 years completing a Boys questionnaire
  • 1,026 males aged 15-17 years completing a Young Men questionnaire
  • 13,896 males aged 18 years and over completing an Adult questionnaire.

3.13 Pilot data for Wave 2

Of the reconciled Wave 1 sample, there were 314 respondents who were interviewed in the Ten to Men pilot for Wave 2. These respondents did not complete a questionnaire during the course of the main data collection period for Wave 2.

In Releases 1.0 and 2.0, the pilot data has been included in Wave 2 datasets. The sample size was 12,250 males.

In Release 2.1, the data for these 314 respondents have been removed from the Wave 2 dataset. This has reduced the sample size for Wave 2 to 11,936 males. From this Release onwards, these 314 respondents will remain part of the pilot and not be included in the main sample.

3.14 Update to weights

Releases 1.0 and 2.0 included sample weights for Wave 1 only.

Upon review of the Ten to Men data, it was decided to include Wave 2 weights in Release 2.1. It was necessary to update the Wave 1 weights to ensure that the weights for Wave 2 were developed using the same approach and references as those used in the calculation of the Wave 1 weights.

Therefore, Release 2.1 of the Ten to Men datasets contains the updated weights for Wave 1 and the new sample weights for Wave 2.

In Release 3.0, population and sample weights have been included for all waves.

3.15 Obstructive sleep apnoea

For Wave 2 of Ten to Men, there were 4 questions asked in the Adult questionnaire relating to obstructive sleep apnoea as part of the STOP-Bang questionnaire screening tool. Further information about this screening tool can be found on the STOP-Bang website.

Four objective measures are also required as part of the STOP-Bang questionnaire screening tool: BMI, age, neck circumference and gender. The responses to these 8 elements are scored, with the result indicating low, medium or high risk of obstructive sleep apnoea.

The resulting score was recorded in the Ten to Men Wave 2 dataset as the derived variable:

  • Risk of OSA (STOP-Bang) (bhsosarisad).

The values of this derived variable should be stored either as a score (0-8 scale) or in a Low/Medium/High format.

In Release 2.0, this variable had values of 0 or 1.

In Release 2.1, the intention was to recalculate the derived variable. However, only 7 of the 8 elements of the STOP-Bang questionnaire screen were available, as we did not have information about the neck circumference.

As a result, this derived variable (bhsosarisad) has been removed from the datasets in Release 2.1 and all subsequent releases.

3.16 Derived variables

The Ten to Men dataset contains numerous derived variables, including scale and summary scores. The calculation of these derived variables generally requires input from multiple raw variables, and it is possible that one or more of these input data values may be missing.

Missing values are given negative numeric values according to the Ten to Men standard missing value code frame. More information about this code frame can be found in the Ten to Men Data User Guide.

A couple of issues have been identified with the calculation of the derived variables in Releases 1.0 and 2.0:

  • Any negative data values were replaced with zero in the calculation of the derived variables. This could introduce misinterpretation of data, depending on the derivation of each variable. For example, the mean of individual components may be underestimated when zero is assigned to a missing value.
  • Incorrect calculation of some derived variables. For example, the elements of the General Wellbeing Scale were not reverse scored before calculating the mean.

Data users using Release 1.0 or 2.0 are advised to re-check and review the interpretation of the derived variables, as the derived variable values may be underestimated or overestimated.
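The effect of replacing missing value codes with zero can be illustrated with a small worked example. This is a sketch only: the item scores are hypothetical, and the two functions simply contrast a mean taken after zero replacement with a mean over the valid values.

```python
def mean_with_zero_fill(values):
    """Mean after replacing negative (missing) codes with zero."""
    filled = [v if v >= 0 else 0 for v in values]
    return sum(filled) / len(filled)


def mean_of_valid(values):
    """Mean over valid (non-negative) values only; None if none are valid."""
    valid = [v for v in values if v >= 0]
    return sum(valid) / len(valid) if valid else None


items = [4, 5, -4]  # hypothetical item scores; -4 is a missing value code
print(mean_with_zero_fill(items))  # 3.0
print(mean_of_valid(items))        # 4.5
```

In this example the zero-filled mean (3.0) is lower than the mean of the two valid items (4.5), which is the kind of underestimation described above.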

For Release 2.1, a set of guidelines was developed for the treatment of missing input variables in the calculation of derived variables. These are:

  • If all input values were missing and had the same missing value code, the derived variable was assigned that code. For example, if all input variables were -4, the derived variable was assigned -4.
  • If the input variables had any combination of missing values and some valid data values, the derived variable was assigned the missing value code of -7 (Unable to determine value).

All subsequent releases follow these guidelines.
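The guidelines above can be sketched for a derived variable that is a simple mean of its inputs. This is a minimal illustration, not the production derivation code: the function name is hypothetical, and the handling of inputs that are all missing but carry different missing value codes (assigned -7 here) is an assumption, as the guidelines do not spell out that case.

```python
UNABLE_TO_DETERMINE = -7  # missing value code: Unable to determine value


def derive_mean(inputs):
    """Derive a mean score while following the missing value guidelines."""
    if not inputs:
        raise ValueError("at least one input value is required")
    missing = [v for v in inputs if v < 0]
    if not missing:
        # All inputs are valid: calculate the derived value as usual.
        return sum(inputs) / len(inputs)
    if len(missing) == len(inputs) and len(set(missing)) == 1:
        # Every input is missing with the same code: carry that code over.
        return missing[0]
    # Mix of missing and valid inputs. Assumption: all-missing inputs with
    # differing codes are also treated as unable to determine.
    return UNABLE_TO_DETERMINE


print(derive_mean([3, 4, 5]))     # 4.0
print(derive_mean([-4, -4, -4]))  # -4
print(derive_mean([3, -4, 5]))    # -7
```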

3.17 Short Form 12 (SF-12) Health Survey

The Wave 1 and Wave 2 questionnaires for Adults included the SF-12 Health Survey: a licensed scale measuring a respondent's health status. An SF-12 scale score was derived and included in the dataset for Releases 1.0 and 2.0.

Due to issues relating to SF-12 license approvals, the raw data items and derived scale score have been removed from Release 2.1 and subsequent releases. These items have been redacted in the annotated questionnaires and have been deleted from the Data Dictionary.

3.18 Height

In Wave 3, height was only asked if the respondent was under 23 years. Therefore, 88% were not asked this question and the height variable was initially coded as -2 (Not applicable) in Wave 3.

For Wave 4, height was only asked if the respondent was under 25 years. Therefore, 91% were not asked this question and the height variable was initially coded as -2 (Not applicable) in Wave 4.

The decision was made to impute the height at Waves 3 and 4 for all respondents who were not asked the question. There are 2 sources of height: the Wave 1 and Wave 2 data collections. Data from both waves were used, as some respondents in Wave 3 may not have participated in Wave 1 and/or Wave 2.

An assumption has been made that the largest height value is the most accurate, and this value has been used to populate the height variable for respondents who were not asked the question (those aged 23 years or above in Wave 3 and 25 years or above in Wave 4).
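The imputation rule can be sketched as follows. This is an illustration under stated assumptions: the function and argument names are hypothetical, heights are assumed to be positive numbers with negative values acting as missing value codes, and leaving the value at -2 when neither earlier wave has a valid height is an assumption, as that case is not described here.

```python
NOT_APPLICABLE = -2  # missing value code: question not asked


def impute_height(current, height_w1, height_w2):
    """Fill a not-asked height with the largest valid earlier-wave height."""
    if current != NOT_APPLICABLE:
        # The question was asked in this wave: keep the recorded value.
        return current
    # Keep only valid heights; None means the respondent was not in that wave.
    valid = [h for h in (height_w1, height_w2) if h is not None and h > 0]
    # Assumption: with no valid earlier height, the code is left unchanged.
    return max(valid) if valid else current


print(impute_height(-2, 178.0, 180.0))    # 180.0
print(impute_height(-2, None, 175.5))     # 175.5
print(impute_height(172.0, 170.0, 171.0)) # 172.0
```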

3.19 Other natural disasters

In Wave 3, the following 2 questions were asked about whether the respondent, or a close friend or family member, had been affected by a natural disaster:

  • Have you been affected by any of the following natural disasters in the past year?
  • Has a close friend or family member been affected by any of the following natural disasters in the past year?

One of the response options was 'Other', where a free text field then allowed more details about the type of natural disaster. A summary of the responses from the free text field is shown in Table 13.

More than 90% of the responses in the free text field specified coronavirus, covid, pandemic or something similar. Although technically correct (the Federal Government considers the COVID-19 pandemic a natural disaster), it was an unexpected response to these 2 questions. The COVID-19 pandemic has affected everyone, yet only some respondents reflected that.

The decision was made to include 2 additional variables in the Wave 3 dataset. These 2 variables (cslndothc, cslndfmoc) categorise the 'Other' free text responses. The values of these variables are shown in Table 14.

Table 13: Free text responses for other natural disasters

| Question | Description | Wave 3 frequency |
| Have you been affected by any of the following natural disasters in the past year? | Coronavirus | 280 |
| | Earthquake | 8 |
| | House fire | 3 |
| | Other | 18 |
| | Total | 309 |
| Has a close friend or family member been affected by any of the following natural disasters in the past year? | Coronavirus | 166 |
| | Other | 10 |
| | Total | 176 |

Table 14: New categorical variables for other natural disasters

| Variable | Label | Value | Description | Wave 3 frequency |
| cslndothc | Natural Disaster - Other Category | 0 | Coronavirus | 280 |
| | | 1 | Other | 29 |
| | | -2 | Not Applicable | 7,610 |
| cslndfmoc | Natural Disaster - Family/friend - Other Category | 0 | Coronavirus | 166 |
| | | 1 | Other | 10 |
| | | -2 | Not Applicable | 7,743 |

Data users are advised to make their own decisions about whether to include or exclude the COVID-19 pandemic as a natural disaster.
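One way to make this choice explicit in analysis code is sketched below. This is a hypothetical helper, not part of the Ten to Men datasets: it uses the cslndothc values from Table 14 (0 = Coronavirus, 1 = Other, -2 = Not Applicable) and a flag that controls whether coronavirus responses count as an 'other' natural disaster.

```python
CORONAVIRUS = 0
OTHER = 1


def affected_by_other_disaster(cslndothc, include_covid):
    """Return True if the response counts as an 'other' natural disaster."""
    if cslndothc == OTHER:
        return True
    if cslndothc == CORONAVIRUS:
        return include_covid
    return False  # -2 (Not Applicable) or any other missing value code


# Rebuild the Wave 3 distribution of cslndothc from the Table 14 frequencies.
values = [CORONAVIRUS] * 280 + [OTHER] * 29 + [-2] * 7610

with_covid = sum(affected_by_other_disaster(v, True) for v in values)
without_covid = sum(affected_by_other_disaster(v, False) for v in values)
print(with_covid, without_covid)  # 309 29
```

The same pattern applies to cslndfmoc for the question about a close friend or family member.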

Appendix A: History of dataset releases

| Date | Release | Dataset | Suggested citation and DOI |
| July 2016 | Release 1.0 | Wave 1 | Pirkis, J., English, D., & Currier, D. (2016). The Australian Longitudinal Study on Male Health (Ten to Men), 2013. [computer file]. Canberra: Australian Data Archive, The Australian National University. DOI: 10.4225/87/587ebdbc851b1 |
| August 2017 | Release 2.0 | Respondent; Wave 1; Wave 2 | Pirkis, J., English, D., & Currier, D. (2017). The Australian Longitudinal Study on Male Health (Ten to Men), 2013. [computer file]. Canberra: Australian Data Archive, The Australian National University. Respondent DOI: 10.4225/87/N8C9NP; Wave 1 DOI: 10.4225/87/Z4PEZN; Wave 2 DOI: 10.4225/87/2KHTSV |
| September 2019 | Release 2.1 | Wave 1; Wave 2 | Bandara, D., Howell, L., & Daraganova, G. (2019). Ten to Men: The Australian Longitudinal Study on Male Health, Release 2.1 (Waves 1-2). DOI: 10.26193/V2IVIG, ADA Dataverse |
| September 2021 | Release 3.0 | Wave 1; Wave 2; Wave 3 | Bandara, D., Howell, L., Silbert, M., & Daraganova, G. (2021). Ten to Men: The Australian Longitudinal Study on Male Health, Release 3 (Waves 1-3). DOI: 10.26193/JDE1TD, ADA Dataverse |
| September 2023 | Release 4.0 | Wave 1; Wave 2; Wave 3; Wave 4 | Volpe, F., Suares, M., Silbert, M., & Martin, S. (2023). Ten to Men: The Australian Longitudinal Study on Male Health, Release 4.0, (Waves 1-4). |
Appendix B: List of renamed variables

The tables below show the variables that have been renamed for each release.

Table 15: Variables renamed for Release 2.1

| Label | Wave | Old variable name (Release 2.0) | New variable name (Release 2.1) |
| SA1 code confidentialised (2011 Census based) | 1 | zdcsa1codmd | aldsa1c11md |
| SA1 code confidentialised (2011 Census based) | 2 | zdcsa1codmd | bldsa1c11md |
| SA1 code confidentialised (2016 Census based) | 2 | n/a | bldsa1c16md |
| SA2 code confidentialised (2011 Census based) | 1 | zdcsa2codmd | aldsa2c11md |
| SA2 code confidentialised (2011 Census based) | 2 | zdcsa2codmd | bldsa2c11md |
| SA2 code confidentialised (2016 Census based) | 2 | n/a | bldsa2c16md |
| SA Modified Monash Model Classification | 1 | zdcmmmcsam | adcmmmcsam |
| SA Modified Monash Model Classification | 2 | zdcmmmcsam | bdcmmmcsam |
| ASGS Region (2011 Census based) | 1 | zdcremotem | aldremt11m |
| ASGS Region (2011 Census based) | 2 | zdcremotem | bldremt11m |
| ASGS Region (2016 Census based) | 2 | n/a | bldremt16m |
| State (2011 Census based) | 1 | zshstate0id | aldstat11id |
| State (2011 Census based) | 2 | zshstate0id | bldstat11id |
| State (2016 Census based) | 2 | n/a | bldstat16id |
| Number of Household Participants | 1 | zdchmparted | adchmparted |
| Sampling Weights (2011 Census based) | 1 | zdcwgt001md | adcwgts11md |
| SEIFA Index of Relative Socio-Economic Disadvantage - Rank (2011 Census based) | 1 | zdcirsdr0i | aldirdr11i |
| SEIFA Index of Relative Socio-Economic Disadvantage - Rank (2011 Census based) | 2 | zdcirsdr0i | bldirdr11i |
| SEIFA Index of Relative Socio-Economic Disadvantage - Rank (2016 Census based) | 2 | n/a | bldirdr16i |
| SEIFA Index of Relative Socio-Economic Disadvantage - Percent (2011 Census based) | 1 | zdcirsdp0i | aldirdp11i |
| SEIFA Index of Relative Socio-Economic Disadvantage - Percent (2011 Census based) | 2 | zdcirsdp0i | bldirdp11i |
| SEIFA Index of Relative Socio-Economic Disadvantage - Percent (2016 Census based) | 2 | n/a | bldirdp16i |
| SEIFA Index of Relative Socio-Economic Disadvantage - Decile (2011 Census based) | 1 | zdcirsdd0i | aldirdd11i |
| SEIFA Index of Relative Socio-Economic Disadvantage - Decile (2011 Census based) | 2 | zdcirsdd0i | bldirdd11i |
| SEIFA Index of Relative Socio-Economic Disadvantage - Decile (2016 Census based) | 2 | n/a | bldirdd16i |
| SEIFA Index of Relative Socio-Economic Advantage and Disadvantage - Rank (2011 Census based) | 1 | zdcirsadri | aldiadr11i |
| SEIFA Index of Relative Socio-Economic Advantage and Disadvantage - Rank (2011 Census based) | 2 | zdcirsadri | bldiadr11i |
| SEIFA Index of Relative Socio-Economic Advantage and Disadvantage - Rank (2016 Census based) | 2 | n/a | bldiadr16i |
| SEIFA Index of Relative Socio-Economic Advantage and Disadvantage - Percent (2011 Census based) | 1 | zdcirsadpi | aldiadp11i |
| SEIFA Index of Relative Socio-Economic Advantage and Disadvantage - Percent (2011 Census based) | 2 | zdcirsadpi | bldiadp11i |
| SEIFA Index of Relative Socio-Economic Advantage and Disadvantage - Percent (2016 Census based) | 2 | n/a | bldiadp16i |
| SEIFA Index of Relative Socio-Economic Advantage and Disadvantage - Decile (2011 Census based) | 1 | zdcirsaddi | aldiadd11i |
| SEIFA Index of Relative Socio-Economic Advantage and Disadvantage - Decile (2011 Census based) | 2 | zdcirsaddi | bldiadd11i |
| SEIFA Index of Relative Socio-Economic Advantage and Disadvantage - Decile (2016 Census based) | 2 | n/a | bldiadd16i |
| SEIFA Index of Economic Resources - Rank (2011 Census based) | 1 | zdcierr00i | aldierr11i |
| SEIFA Index of Economic Resources - Rank (2011 Census based) | 2 | zdcierr00i | bldierr11i |
| SEIFA Index of Economic Resources - Rank (2016 Census based) | 2 | n/a | bldierr16i |
| SEIFA Index of Economic Resources - Percent (2011 Census based) | 1 | zdcierp00i | aldierp11i |
| SEIFA Index of Economic Resources - Percent (2011 Census based) | 2 | zdcierp00i | bldierp11i |
| SEIFA Index of Economic Resources - Percent (2016 Census based) | 2 | n/a | bldierp16i |
| SEIFA Index of Economic Resources - Decile (2011 Census based) | 1 | zdcierr00i | aldierd11i |
| SEIFA Index of Economic Resources - Decile (2011 Census based) | 2 | zdcierr00i | bldierd11i |
| SEIFA Index of Economic Resources - Decile (2016 Census based) | 2 | n/a | bldierd16i |
| SEIFA Index of Education and Occupation - Rank (2011 Census based) | 1 | zdcieor00i | aldieor11i |
| SEIFA Index of Education and Occupation - Rank (2011 Census based) | 2 | zdcieor00i | bldieor11i |
| SEIFA Index of Education and Occupation - Rank (2016 Census based) | 2 | n/a | bldieor16i |
| SEIFA Index of Education and Occupation - Percent (2011 Census based) | 1 | zdcieop00i | aldieop11i |
| SEIFA Index of Education and Occupation - Percent (2011 Census based) | 2 | zdcieop00i | bldieop11i |
| SEIFA Index of Education and Occupation - Percent (2016 Census based) | 2 | n/a | bldieop16i |
| SEIFA Index of Education and Occupation - Decile (2011 Census based) | 1 | zdcieod00i | aldieod11i |
| SEIFA Index of Education and Occupation - Decile (2011 Census based) | 2 | zdcieod00i | bldieod11i |
| SEIFA Index of Education and Occupation - Decile (2016 Census based) | 2 | n/a | bldieod16i |
| Sex in the past 12 months | 2 | bhxsex120a | bbxsex120a |

Table 16: Variables renamed for Release 3.0

| Label | Wave | Old variable name (Release 2.1) | New variable name (Release 3.0) |
| Initial cross-sectional population weight for Wave 1 | 1 | adcicswgtmd | adcicpwtad |
| Raked cross-sectional population weight for Wave 1 | 1 | adcrcswgtmd | adcrcpwtad |
| Initial cross-sectional population weight for Wave 2 | 2 | bdcicswgtmd | bdcicpwtbd |
| Raked cross-sectional population weight for Wave 2 | 2 | bdcrcswgtmd | bdcrcpwtbd |
| Initial longitudinal population weight between Wave 1 and Wave 2 | 2 | bdcilgwgtmd | bdcilpwabd |
| Raked longitudinal population weight between Wave 1 and Wave 2 | 2 | bdcrlgwgtmd | bdcrlpwabd |

Table 17: Variables renamed for Release 4.0

| Label | Wave | Old variable name | New variable name |
| Coronavirus – lacked companionship (current) | 3 | cslcvfc01 | csllosc01 |
| Coronavirus – felt left out (current) | 3 | cslcvfc02 | csllosc02 |
| Coronavirus – felt isolated (current) | 3 | cslcvfc03 | csllosc03 |
| Coronavirus – felt lonely (current) | 3 | cslcvfc04 | csllosc04 |
| Coronavirus – lacked companionship (during restrictions) | 3 | cslcvfr01 | cslloscr1 |
| Coronavirus – felt left out (during restrictions) | 3 | cslcvfr02 | cslloscr2 |
| Coronavirus – felt isolated (during restrictions) | 3 | cslcvfr03 | cslloscr3 |
| Coronavirus – felt lonely (during restrictions) | 3 | cslcvfr04 | cslloscr4 |
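Analysis code written against an earlier release may need its variable names updated. The sketch below shows one way to apply the Release 4.0 renames from Table 17 to a record. It assumes the third column of Table 17 holds the old name and the fourth the new name; the function name, the dictionary-based record and the example values are hypothetical.

```python
# Old name -> new name, taken from the rows of Table 17.
RELEASE_4_RENAMES = {
    "cslcvfc01": "csllosc01",
    "cslcvfc02": "csllosc02",
    "cslcvfc03": "csllosc03",
    "cslcvfc04": "csllosc04",
    "cslcvfr01": "cslloscr1",
    "cslcvfr02": "cslloscr2",
    "cslcvfr03": "cslloscr3",
    "cslcvfr04": "cslloscr4",
}


def rename_variables(record, renames):
    """Return a copy of the record with renamed keys; other keys are kept."""
    return {renames.get(name, name): value for name, value in record.items()}


old_record = {"cslcvfc01": 2, "cslcvfr04": 1, "cslndothc": -2}  # example values
print(rename_variables(old_record, RELEASE_4_RENAMES))
# {'csllosc01': 2, 'cslloscr4': 1, 'cslndothc': -2}
```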
Glossary

| Term | Description |
| ABS | Australian Bureau of Statistics |
| ANZSCO | Australian and New Zealand Standard Classification of Occupations |
| ASCED | Australian Standard Classification of Education |
| ASCL | Australian Standard Classification of Languages |
| AIFS | Australian Institute of Family Studies |
| ASGS | Australian Statistical Geography Standard |
| BMI | Body Mass Index |
| DC | Data Collection |
| DOI | Digital Object Identifier |
| General Release | This dataset includes data from which the more sensitive information has been removed. Confidentialisation has also been considered for all variables and applied if required. |
| LD | Linked Data |
| NFD | Not further defined |
| Respondent Dataset | A dataset containing key indicator data, such as the unique study identifier, age, household identifier and geographical information. |
| Restricted Release | This dataset includes information at a more detailed level than the General Release datasets. Items include language, occupation, and country of birth at the 4-digit levels. |
| SA1 | Statistical Area 1 |
| SA2 | Statistical Area 2 |
| SACC | Standard Australian Classification of Countries |
| SEIFA | Socio-Economic Indexes for Areas |
| SRC | Social Research Centre |
| TTM | Ten to Men Study |
| UoM | University of Melbourne |
| Update | An update occurs when significant changes are made to an existing release. For example, the update to Release 2.0 resulted in it being reissued as Release 2.1. |
| Wave dataset | A dataset containing the responses to the corresponding questionnaire of a given wave. |
Acknowledgements and citation

Acknowledgements

Ten to Men: The Australian Longitudinal Study on Male Health is the first large-scale, nationally representative, longitudinal study to focus exclusively on investigating and improving the health and wellbeing of males in Australia. It is also the largest longitudinal study of male health in the world.

Ten to Men was commissioned and is funded by the Australian Government Department of Health to inform the National Male Health Policy. The study was initially conducted by the University of Melbourne, which released datasets, including data documentation, for Wave 1 and Wave 2. Roy Morgan Research undertook the data collection and initial data processing for these 2 waves.

After a competitive tender process in 2017, the Australian Institute of Family Studies (AIFS) was awarded responsibility for conducting Waves 3 and 4. Since then, AIFS has updated the Wave 1 and Wave 2 datasets, including data documentation.

In 2020, the study team re-evaluated and revised the survey content and methodology to enable contactless interviewing for Wave 3. New items designed to collect information on the impacts of COVID-19 and the recent effects of natural disasters were also incorporated into the revised survey. The online survey went live at the end of July 2020, with data collection concluding in February 2021.

Minimal changes, both in terms of the survey content and the data collection method, occurred between Wave 3 and Wave 4. The Wave 4 online survey data collection period was from August 2022 to December 2022.

The Social Research Centre (SRC), in collaboration with Ipsos, was contracted to undertake the fieldwork component for Waves 3 and 4 of the study.

Citation

Volpe, F., Suares, M., Silbert, M., & Martin, S. (2023). Ten to Men: The Australian Longitudinal Study on Male Health - Data Issues Paper, Release 4.0, (Waves 1-4). Melbourne: Australian Institute of Family Studies.
