Data Issues Paper

Content type

Data use Documentation

Published

December 2023

Project

Ten to Men

Part of a collection

Resources for data users

Downloads

Data Issues Paper - Wave 4 2.11 MB

Overview

The Data Issues Paper provides a summary of data-related issues that have been identified in the Ten to Men data. It has been designed to assist users of the data as they undertake research and analysis. It should be read in conjunction with the Ten to Men Data User Guide.

The Data Issues Paper provides information to data users on:

observed inconsistencies and issues that they should be aware of when analysing and interpreting the Ten to Men data
recommendations and guidance in the management of identified data quality issues in the Ten to Men data.

The Data Issues Paper has been divided into 3 sections:

a history of the Ten to Men datasets
changes to the structure of the Ten to Men datasets
description of identified data quality issues.

Further sections will be added as any data-related issues emerge.

Data Issue Paper Updates

Date	Version	Update	Suggested citation
September 2019	1.0	Initial version	Howell, L., Bandara, D., Mohal, J., Andalon, M., Silbert, M., Garrard, B., Swami, N., & Daraganova, G. (2019). Ten to Men: The Australian Longitudinal Study on Male Health - Data Issues Paper, Version 1.0, September 2019. Melbourne: Australian Institute of Family Studies.
September 2021	2.0	Updated for Wave 3	Howell, L., Silbert, M., & Bandara, D. (2021). Ten to Men: The Australian Longitudinal Study on Male Health - Data Issues Paper, Version 2.0, September 2021. Melbourne: Australian Institute of Family Studies.
March 2022	2.1	Addition of Section 3.9 Current Occupation	Howell, L., Silbert, M., & Bandara, D. (2022). Ten to Men: The Australian Longitudinal Study on Male Health - Data Issues Paper, Version 2.1, March 2022. Melbourne: Australian Institute of Family Studies.
September 2023	3.0	Updated for Wave 4	Volpe, F., Suares, M., Silbert, M., & Martin, S. (2023). Ten to Men: The Australian Longitudinal Study on Male Health - Data Issues Paper, Release 4.0, (Waves 1-4). Melbourne: Australian Institute of Family Studies.

1. Ten to Men data

Periodically, a new release of the Ten to Men datasets is generated as additional information becomes available after each data collection wave. The releases are numbered sequentially and a new Digital Object Identifier (DOI) is minted for each release.

There have been 5 releases of data:

Release 1.0 was issued by the University of Melbourne and contained data from Wave 1 only.
Release 2.0 was also issued by the University of Melbourne. It contained data from both Wave 1 and Wave 2, as well as a respondent dataset.
Release 2.1 was issued by the Australian Institute of Family Studies (AIFS) and comprised updated Wave 1 and Wave 2 datasets. Relevant data from the respondent dataset was included in these datasets and it is no longer available as a separate dataset.
Release 3.0 was issued by AIFS and contained data from Wave 1, Wave 2 and Wave 3.
Release 4.0 was also issued by AIFS, and contained data from Wave 1, Wave 2, Wave 3 and Wave 4.

A history of the dataset releases and suggested citations can be found in Appendix A.

2. Ten to Men datasets

This section documents the structural changes that have been applied to the Ten to Men datasets. These structural changes enhance the usability of the datasets, especially as data from additional waves are included. Structural changes include the merging of datasets, resolving data inconsistencies, addressing quality issues, and augmenting data resources with additional information.

Most of the major structural changes were implemented in Release 2.1. Table 1 provides a summary of all structural changes, and further details can be found in the corresponding sections.

Table 1: Summary of changes to the dataset structure

Structural change	Implementation	See section for further details
Addition of a data sharing framework	Release 2.1	2.1
Respondent data added to Wave 1 and Wave 2 datasets, removing the need for a separate dataset	Release 2.1	2.2
Renaming of variables to indicate wave, thus aligning with the standard naming convention for variables	Release 2.1	2.3
Renaming of variables to indicate research domain	Release 2.1	2.4
Addition of a new research domain for linked data	Release 2.1	2.5
Renaming of census linked variables to include reference year	Release 2.1	2.6
Renaming of weight variables to include reference to population or sample weights	Release 3.0	2.7

2.1 Addition of a data sharing framework

To increase the utility of information while minimising disclosure risks, a data sharing framework to differentiate the user's access level was adopted for Release 2.1 of the Ten to Men datasets. This resulted in 2 levels of datasets for each wave being generated - the General Release and the Restricted Release.

A lower level of confidentialisation was applied to the Restricted Release dataset, with all initial information preserved. The only information not included in the dataset is names, addresses and other contact details. Access to the Restricted Release dataset may only be granted when data users are able to demonstrate a genuine need for the additional data, and when they also meet the necessary additional security requirements.

The General Release dataset has undergone further data confidentialisation. In addition to the information removed for the Restricted Release dataset, further confidentialisation for the General Release dataset includes suppressing variables, aggregating response categories and recoding outlying values to a less extreme value. Users can consult the Ten to Men Data Dictionary for more information on the confidentialised variables.

As the access requirements for the General Release dataset are less rigorous than for the Restricted Release dataset, the Ten to Men datasets have become more accessible to users.

For further information about the Ten to Men datasets, including data access procedures, users can refer to Sections 3 and 8 of the Ten to Men Data User Guide.

2.2 Availability of respondent dataset

Release 2.0 comprised 3 Ten to Men datasets - Respondent, Wave 1 and Wave 2. The Respondent dataset contained key indicator data, such as the unique study identifier, age, household identifier and geographical information. The dataset for each wave contained the responses to the corresponding questionnaires.

In Release 2.1, relevant information from the respondent dataset was included in the Ten to Men Wave 1 and Wave 2 datasets. This has removed the necessity of maintaining a separate respondent dataset, and thus only 2 datasets were released at each level - Wave 1 and Wave 2.

The inclusion of respondent data in the wave datasets has been applied in all subsequent releases.

2.3 Renaming of variables to indicate wave

The standard naming convention of the Ten to Men variables specifies that the first character of the variable should indicate the wave or be a 'z' if the variable is constant across waves.

In Releases 1.0 and 2.0, some variables in the respondent datasets did not follow this standard naming convention. The first character of the variable name was 'z' but the variables were not constant across waves. In these cases, the variable label specified whether the variable related to Wave 1 or Wave 2.

It is important that variables conform to the Ten to Men standard naming convention to maintain consistency and uniformity across the data. Therefore, in Release 2.1, these variables were renamed to follow the standard naming convention; that is, the first character of the variable name was changed to indicate the wave if the variable was not constant across waves.

Further details of all variables that were renamed are shown in Appendix B.

2.4 Renaming of variables to indicate research domain

The standard naming convention of the variables in the Ten to Men datasets specifies that the second and third characters of the variable should indicate the research domain. A list of all the research domains can be found in the Ten to Men Data Dictionary and the Ten to Men Data User Guide.

One variable was identified in Releases 1.0 and 2.0 where the second and third characters of the variable did not correspond to a research domain. As it is important to maintain consistency, this variable has been renamed to reflect the correct research domain.

Table 2: List of variables with corrections to the research domain

	Variable name	Research domain (2nd and 3rd characters)	Time of correction
Original	bhxsex120a	hx is not a research domain
Correction	bbxsex120a	Behaviours - sexual behaviour (bx)	Release 2.1
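
The positional convention described in Sections 2.3 and 2.4 lends itself to a simple programmatic check. The following Python sketch splits a variable name into its wave and research domain characters. The function name and the set of domain codes are assumptions for illustration only: the wave letters 'c' and 'd' are inferred from Table 3, the domain codes are limited to those that appear in variable names quoted in this paper, and the full list should be taken from the Ten to Men Data Dictionary.

```python
# Wave prefixes: 'a' and 'b' are used for Waves 1 and 2 in this paper;
# 'c' and 'd' are inferred from Table 3; 'z' marks variables constant across waves.
WAVE_PREFIXES = {
    "a": "Wave 1",
    "b": "Wave 2",
    "c": "Wave 3",
    "d": "Wave 4",
    "z": "Constant across waves",
}

# Illustrative subset only (an assumption): domain codes seen in variable
# names quoted in this paper. The Data Dictionary holds the full list.
KNOWN_DOMAINS = {"dc", "ld", "bx", "ba", "bt", "se"}


def split_variable_name(name):
    """Split a variable name into wave (character 1) and domain (characters 2-3)."""
    name = name.strip().lower()
    if len(name) < 3:
        raise ValueError(f"variable name too short: {name!r}")
    domain = name[1:3]
    return {
        "wave": WAVE_PREFIXES.get(name[0]),  # None if the prefix is not recognised
        "domain": domain,
        "domain_known": domain in KNOWN_DOMAINS,
    }


# The original name from Table 2 fails the domain check; the corrected name passes.
print(split_variable_name("bhxsex120a"))
print(split_variable_name("bbxsex120a"))
```

A check of this kind only confirms that a name is well formed; it does not confirm that the variable exists in a given release.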

2.5 Additional research domain for linked data

In Releases 1.0 and 2.0, the research domain of Data Collection (dc) comprised both key indicator variables and linked data. This includes variables such as the unique study ID, participation indicators, household indicators, statistical area codes (SA1, SA2) and numerous socio-economic indexes for areas (SEIFA).

To provide transparency about the data source, these variables were separated into 2 research domains for Release 2.1. The key indicator variables remain in the research domain of Data Collection (dc), while an additional research domain was created for Linked Data (ld).

The standard naming convention for variables specifies that the second and third characters of the variable name should indicate the research domain. Consequently, the creation of a new research domain resulted in the renaming of some variables to conform to this standard. That is, the second and third characters of the variable names in the linked data research domain were changed from 'dc' to 'ld'.

Further details of all variables that were renamed are shown in Appendix B.

2.6 Renaming of census-based variables

In Release 2.0, the respondent dataset contained linked data from the Australian Bureau of Statistics (ABS) 2011 Census. These variables did not contain any information to indicate the census year.

As new census data becomes available, this has been added to the Ten to Men datasets. It therefore became important to include a reference to the census year in the variable name.

In Release 2.1, the eighth and ninth characters of the variable name were changed to represent a year indicator. For example, the variable 'aldieod00i' has been renamed to 'aldieod11i' to indicate that it is based on the 2011 Census data. Linked data from the ABS 2016 Census was also available and added in this release.

Further details of all variables that were renamed are shown in Appendix B.
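
Users maintaining code written against Release 2.0 may find it helpful to express the renaming rule as a function. The sketch below replaces the eighth and ninth characters of a name with a two-digit year. The function name and the 10-character length check are assumptions made for illustration; Appendix B remains the authoritative list of renamed variables.

```python
def add_census_year(name, census_year):
    """Return the name with characters 8-9 replaced by the two-digit census year."""
    if len(name) != 10:
        # Assumption: the census-linked names affected are 10 characters long,
        # as in the example given in this section.
        raise ValueError(f"expected a 10-character name, got {name!r}")
    year_indicator = f"{census_year % 100:02d}"
    # Python slices are 0-based: name[:7] is characters 1-7, name[9:] is character 10.
    return name[:7] + year_indicator + name[9:]


print(add_census_year("aldieod00i", 2011))  # reproduces the example in the text
```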

2.7 Renaming of weight variables

Population weights have been included in all releases of the Ten to Men datasets.

Sample weights were added in Release 3.0, and a variable naming framework for the weight variables was developed at the same time. Wave 1 and Wave 2 weighting variables were renamed to comply with this framework and to clearly indicate whether the variable referred to a population or sample weight.

Table 3 indicates the naming convention for weights that has been applied since Release 3.0.

Table 3: Naming convention for weights

Character position in variable name	Description	Variable abbreviation
1	Wave	A, B, C or D
2,3	Research Domain	DC
4	Initial or Raked	I or R
5	Longitudinal or Cross-Sectional	L or C
6	Population or Sample	P or S
7,8,9	For Wave 1	WTA
7,8,9	For Wave 2	WTB
7,8,9	For Wave 3	WTC
7,8,9	For Wave 4	WTD
7,8,9	Between Waves 1 and 2	WAB
7,8,9	Between Waves 1 and 3	WAC
7,8,9	Between Waves 1 and 4	WAD
7,8,9	Between Waves 2 and 3	WBC
7,8,9	Between Waves 2 and 4	WBD
7,8,9	Between Waves 3 and 4	WCD
10	Derived	D

Further details of the weighting variables that were renamed are shown in Appendix B.
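
The convention in Table 3 can be decoded mechanically, which may help when selecting the appropriate weight for an analysis. The following Python sketch is one way to do this. The function name is an assumption, and the example name 'cdcrcpwtcd' has been constructed from Table 3 purely for illustration; it is not asserted to be a variable in the datasets.

```python
WAVES = {"A": "Wave 1", "B": "Wave 2", "C": "Wave 3", "D": "Wave 4"}
STAGES = {"I": "Initial", "R": "Raked"}
DESIGNS = {"L": "Longitudinal", "C": "Cross-Sectional"}
TARGETS = {"P": "Population", "S": "Sample"}
REFERENCES = {
    "WTA": "Wave 1",
    "WTB": "Wave 2",
    "WTC": "Wave 3",
    "WTD": "Wave 4",
    "WAB": "Between Waves 1 and 2",
    "WAC": "Between Waves 1 and 3",
    "WAD": "Between Waves 1 and 4",
    "WBC": "Between Waves 2 and 3",
    "WBD": "Between Waves 2 and 4",
    "WCD": "Between Waves 3 and 4",
}


def decode_weight_name(name):
    """Decode a 10-character weight variable name according to Table 3."""
    code = name.strip().upper()
    # Positions 2-3 must be the Data Collection domain; position 10 marks a derived variable.
    if len(code) != 10 or code[1:3] != "DC" or code[9] != "D":
        raise ValueError(f"not a weight variable name under Table 3: {name!r}")
    try:
        return {
            "wave": WAVES[code[0]],
            "stage": STAGES[code[3]],
            "design": DESIGNS[code[4]],
            "target": TARGETS[code[5]],
            "reference": REFERENCES[code[6:9]],
        }
    except KeyError as exc:
        raise ValueError(f"unrecognised component {exc} in {name!r}") from exc


# Hypothetical name built from Table 3, for illustration only.
print(decode_weight_name("cdcrcpwtcd"))
```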

3. Data quality issues

Data quality is measured by factors such as accuracy, validity, consistency, and completeness. AIFS undertakes validation procedures to ensure that the Ten to Men data quality is of an appropriate standard. However, it is the responsibility of the data user to assess the data quality of the Ten to Men variables before any analysis is undertaken.

This section contains information about data quality issues that have been identified across the waves of Ten to Men. Further information will be added as any additional data quality issues emerge.

Table 4 provides a summary of the identified data quality issues and the wave/s that are affected. Detailed information about the data issue and any recommendations can be found in the corresponding sections of this paper.

Table 4: Summary of data quality issues

Data quality issue	Wave 1	Wave 2	Wave 3	Wave 4	Section
Missing data	●	●	●	●	3.1
Outliers	●	●	●	●	3.2
Data from Parent Questionnaire	●	●			3.4
Additional Wave 1 participants	●				3.12
Pilot data for Wave 2		●			3.13
Variable naming inconsistencies in reference to the Research Domain		●			3.15
Derived variables	●	●			3.16
Behaviours - alcohol
Age of first drink of alcohol	●	●			3.7
Behaviours - tobacco
Age first smoked cigarette	●	●			3.8
Behaviours - weight
Height, Weight and Body Mass Index	●	●	●	●	3.5
Height			●	●	3.18
Data collection indicator
Update to weights	●	●	●		3.14
Health status
Obstructive sleep apnoea		●			3.15
Short form 12 (SF-12) Health survey	●	●			3.17
Social determinants - Life Events
Other natural disasters			●		3.19
Social determinants - Socioeconomic Status
Age of respondents	●	●	●	●	3.3
Level of education completed	●	●			3.6
Country of birth	●				3.9
Current occupation	●	●			3.10
Language spoken at home	●				3.11

3.1 Missing data

Most variables in the Ten to Men datasets have some proportion of missing data, which has been coded using the Ten to Men standard missing value code frame (see the Ten to Men Data User Guide for more information).

The proportion of missing data, and the reasons for it, should be considered before drawing any conclusion from the data.
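
As an illustration of this check, the Python sketch below separates valid responses from missing value codes and reports the share and reasons for missingness. It assumes that the codes -8 to -1 listed in Table 6 of this paper make up the standard missing value code frame; this should be confirmed against the Ten to Men Data User Guide. The function name and the sample values are hypothetical.

```python
from collections import Counter

# Assumption: the standard missing value code frame matches codes -8 to -1 in Table 6.
MISSING_CODES = {
    -8: "No questionnaire or interview completed",
    -7: "Unable to determine value",
    -6: "Value implausible",
    -5: "Invalid multiple response",
    -4: "Refused or not answered",
    -3: "Don't know",
    -2: "Not applicable",
    -1: "Not asked",
}


def summarise_missing(values):
    """Return (valid values, count of each missing reason, share of values missing)."""
    valid = [v for v in values if v not in MISSING_CODES]
    reasons = Counter(MISSING_CODES[v] for v in values if v in MISSING_CODES)
    share_missing = sum(reasons.values()) / len(values) if values else 0.0
    return valid, reasons, share_missing


# Hypothetical responses for a single variable.
valid, reasons, share_missing = summarise_missing([12, -4, 15, -1, -4])
print(valid, dict(reasons), share_missing)
```

Keeping the reasons separate matters because, for example, 'Not asked' and 'Refused or not answered' have different implications for analysis.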

3.2 Outliers

All releases of the Ten to Men datasets contain the raw data, with variables that have not been cleaned for outliers. The exception to this is the categorising of the extreme ends as part of the confidentialisation process for the General Release datasets. The variables where this type of confidentialisation has been applied are indicated in the Ten to Men Data Dictionary.

Data users are advised to take care when using and interpreting the Ten to Men data as the presence of outliers may necessitate excluding values or categorising the extreme ends.

3.3 Age of respondents

Cohort inconsistencies

The scope of Ten to Men was males aged 10-55 years at Wave 1, with 3 cohorts:

males aged 10-14 years completing a Boys questionnaire
males aged 15-17 years completing a Young Men questionnaire
males aged 18 years and over completing an Adult questionnaire.

However, a small number of men were invited to participate whose age was outside the scope or who completed the incorrect questionnaire for their age. The inconsistency affects fewer than 0.5% of respondents and is likely to have arisen from the time that elapsed between the hard copy questionnaires being sent out and the respondents completing them. The survey data for these respondents have been retained in the Ten to Men datasets.

The inconsistencies are present in all releases of the Wave 1 and Wave 2 datasets. From Wave 3, there was only one questionnaire and therefore this inconsistency is not an issue.

Calculation of age in Wave 3

In Wave 3 and all subsequent waves, the age of the respondent was not asked in the questionnaire. For inclusion in the datasets, the age is calculated using the respondent's date of birth.

As part of the respondent validation process for Wave 3, the date of birth was asked. Therefore, there are 2 sources of the date of birth - the master contact file and Wave 3 survey data. A process was undertaken to compare the date of birth from the 2 sources, and it was the same for 97% of respondents.

Further investigation of the 3% where the date of birth differed showed that many only supplied the birth year for Wave 3 data. An assumption has been made that the birth date on the contact file is correct and this has been used to calculate the age of the respondent in Wave 3 (the Wave 3 survey date was also used in the calculations).

There are 5 observations where no date of birth has been supplied (in either Wave 3 or the master contact file). In these cases, the age at Wave 1 and Wave 2, as well as the survey completion dates have been used to impute an age for Wave 3. The 5 unique study identifiers (zdcid0001d) where this occurred are 5003136, 7006305, 7007404, 8010082 and 9015997.
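
The age calculation described above can be sketched as follows. This is a minimal illustration in Python, assuming age is measured in completed years on the survey date; the function names are hypothetical and the sketch is not the AIFS production code. The imputation applied to the 5 observations without a date of birth is not reproduced here, as its exact rules are not set out in this paper.

```python
from datetime import date


def age_at(date_of_birth, survey_date):
    """Age in completed years on the survey date."""
    years = survey_date.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in the survey year.
    if (survey_date.month, survey_date.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


def wave_age(contact_file_dob, survey_date):
    """Age based on the master contact file date of birth, which is assumed correct.

    Returns None when no date of birth is held, signalling that the age
    needs to be imputed from earlier waves instead.
    """
    if contact_file_dob is None:
        return None
    return age_at(contact_file_dob, survey_date)


# Hypothetical dates for illustration.
print(wave_age(date(1990, 6, 15), date(2020, 6, 14)))  # day before the birthday
print(wave_age(date(1990, 6, 15), date(2020, 6, 15)))  # on the birthday
```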

In Wave 4, the date of birth was asked again as part of the respondent validation process.

A process was undertaken to compare the date of birth against the master contact file, and it was the same for 99% of respondents.

Further investigation of the 1% where the date of birth differed showed that many only supplied the birth year for Wave 4 data. As per Wave 3, an assumption has been made that the birth date on the contact file is correct and this has been used to calculate the age of the respondent in Wave 4 (the Wave 4 survey date was also used in the calculations).

New inconsistencies have been found since the date of birth was asked in Wave 4: differences between the Wave 4 data, the Wave 3 data and the master contact file are now present. To make age consistent in Wave 4, and across subsequent waves, age in Wave 4 has been calculated from the date of birth in the master contact file or, if that is missing, from a combination of Wave 3 based on Wave 1 age and survey completion dates.

There is one observation where no date of birth has been supplied (in either Wave 4, Wave 3 or the master contact file). In this case, the ages at Wave 1, Wave 2 and Wave 3, as well as the survey completion dates, have been used to impute an age for Wave 4. The unique study identifier (zdcid0001d) where this occurred is 7007404.

3.4 Parent questionnaire data

For Wave 1 and Wave 2 of Ten to Men, the parents of the males aged 10-14 years also completed a questionnaire. The parent was not assigned an ID and therefore, it cannot be determined if the same parent filled in the questionnaire for both Wave 1 and Wave 2. This is important as some questions were subject to the parent's perception. For example, 'In the past 4 weeks, how often does your child feel happy?'

As a result, data users are advised to take extreme care if comparing responses from the Parent questionnaire across Wave 1 and Wave 2.

3.5 Anthropometric measurements

The Ten to Men questionnaires contain questions about anthropometric measurements. Some of the responses are implausible (e.g. a height of 1 cm).

Data users are advised to clean the anthropometric measurements and to make their own decisions about how to treat implausible values, as erroneous values will affect derived values (such as Body Mass Index) and interpretations.
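
One possible approach is to screen values against plausibility bounds before deriving anything from them. The Python sketch below does this for Body Mass Index, calculated as weight in kilograms divided by height in metres squared. The bounds, the function names and the assumption that weight is recorded in kilograms are illustrative choices only, not AIFS recommendations; users should set thresholds appropriate to the cohort and question being analysed.

```python
# Illustrative plausibility bounds (assumptions, not AIFS rules).
HEIGHT_CM_RANGE = (100.0, 230.0)
WEIGHT_KG_RANGE = (20.0, 300.0)


def is_plausible(value, bounds):
    """True when the value lies inside the bounds (negative missing codes fall outside)."""
    low, high = bounds
    return value is not None and low <= value <= high


def bmi_or_none(height_cm, weight_kg):
    """Body Mass Index in kg/m^2, or None when either input is implausible."""
    if not (is_plausible(height_cm, HEIGHT_CM_RANGE) and is_plausible(weight_kg, WEIGHT_KG_RANGE)):
        return None
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


print(bmi_or_none(1, 80))     # a height of 1 cm is rejected
print(bmi_or_none(180, 81))   # a plausible pair yields a BMI
```

Returning None rather than dropping the record keeps the decision about exclusion with the analyst.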

3.6 Level of education completed

Questions about the completed level of education have been asked in all waves of Ten to Men. However, the response categories for the various questionnaires (Boys, Young Men, Adults, Parents) and waves have not been consistent. Extreme care needs to be taken when using this education data, especially if comparing values across questionnaires and/or waves.

The Australian Standard Classification of Education (ASCED) could be used to further categorise this data. In this case, Primary education should also include Year 7 for South Australia only. More information on the ASCED and how it is structured can be found on the ABS website.

3.7 Age when first drank alcohol

Summary

A data issue has been identified with the responses to the question 'How old were you when you first drank more than just a sip or a taste of alcohol?'. The question was included on 3 questionnaires (Boys, Young Men and Adults) for Waves 1 and 2, and a common variable was created to hold the responses for each wave:

'abaalcagem' contains the responses from all questionnaires for Wave 1
'bbaalcagem' contains the responses from all questionnaires for Wave 2.

The data issue arose as a format was applied to the responses to this question on the Boys questionnaire. No format, other than the missing value formats, was applied to the responses to this question on the Young Men and Adults questionnaires. When the data from the Boys questionnaire was merged with the data from the Young Men and Adults questionnaires to create the common variable, the format for the responses from the Boys questionnaire was not applied.

As a result, the data from the Boys questionnaire for this question was incorrectly reduced by 4 years. For example, a response of 12 years would be recorded as 8 years in the Ten to Men dataset.

This data issue is present in Releases 1.0 and 2.0 of the Ten to Men datasets, but the raw data has been amended in Release 2.1.

Further details

The data (excluding the missing values) from Release 2.0 of the Ten to Men datasets is shown in Table 5. Responses from both the Boys and Young Men questionnaires are shown for comparison. Each cell in the table is colour coded:

Grey - representing recorded plausible responses in the Ten to Men dataset
Green - representing no recorded responses in the Ten to Men dataset
Black - representing recorded implausible values given the age of the respondent at the time of the survey (e.g. a 10 year old cannot respond that they started drinking at 12 years).

Table 5: Data from Release 2.0 (Waves 1 and 2)

There is an issue with the data from the Boys questionnaire, as there is no recorded response of anyone having their first drink of alcohol after the age of 10 (green cells). There is also a higher than expected number of respondents having their first drink of alcohol before the age of 5 years (grey cells).

This issue is especially evident when compared to the data from the Young Men questionnaire.

Further investigation identified a problem with different formats being applied.

The format applied to the responses to this question on the Boys questionnaire is shown in Table 6. Applying this format meant that if the respondent replied 10 years of age, the data was entered as 6.

Table 6: Format applied to the Boys questionnaire

Code	Format
-8	No questionnaire or interview completed
-7	Unable to determine value
-6	Value implausible
-5	Invalid multiple response
-4	Refused or not answered
-3	Don't know
-2	Not applicable
-1	Not asked
1	5 years old
2	6 years old
3	7 years old
4	8 years old
5	9 years old
6	10 years old
7	11 years old
8	12 years old
9	13 years old
10	14 years old

The corresponding question in the Young Men and Adults questionnaires only had the missing value formats applied (codes -8 to -1). For example, if the respondent replied 10 years of age, the data was entered as 10.

So in summary, if the respondent replied 10 years of age, the data entered was either 6 (Boys questionnaire) or 10 (Young Men or Adults questionnaires).

The data from all questionnaires was then combined to form the Ten to Men datasets. When the data from the Boys questionnaire was merged with the data from the Young Men and Adults questionnaires, no format other than the missing value formats was applied. The format for the Boys questionnaire was not applied and the formatted age value was replaced with the code. As a result, the age of the first drink of alcohol for the Boys data was reduced by 4 years (with the maximum age possible being 10).

This issue is present in both Release 1.0 and 2.0 of the Ten to Men datasets.
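
For users who are still working with Release 1.0 or 2.0, the offset can be reversed by mapping the stored code back to the age shown in Table 6. The Python sketch below illustrates this. It must not be applied to Release 2.1 or later, where the raw data have already been amended. The function name and the questionnaire label passed to it are hypothetical; users would need to identify Boys questionnaire respondents from their own copy of the data.

```python
# Table 6: code 1 is '5 years old' through to code 10 is '14 years old'.
BOYS_FORMAT = {code: code + 4 for code in range(1, 11)}


def corrected_age(value, questionnaire):
    """Return the age at first drink, undoing the Boys questionnaire coding.

    Only for Release 1.0 and 2.0 data. `questionnaire` is a hypothetical
    label ('boys', 'young men' or 'adults') supplied by the user.
    """
    if questionnaire != "boys":
        return value  # Young Men and Adults responses were stored as entered
    if value < 0:
        return value  # missing value codes (-8 to -1) are left unchanged
    if value not in BOYS_FORMAT:
        raise ValueError(f"unexpected code for Boys questionnaire: {value}")
    return BOYS_FORMAT[value]


print(corrected_age(8, "boys"))        # stored 8 corresponds to 12 years
print(corrected_age(10, "young men"))  # unchanged
```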

3.8 Age when first smoked cigarettes

A data issue has been identified with the responses to the question 'How old were you when you first smoked your first cigarette?'. The question was included on 3 questionnaires (Boys, Young Men and Adults) for Waves 1 and 2, and a common variable was created to hold the responses for each wave:

'abtcigagem' contains the responses from all questionnaires for Wave 1
'bbtcigagem' contains the responses from all questionnaires for Wave 2.

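As described in Section 3.7, a format was applied to the responses on the Boys questionnaire but was not applied when the data were merged with the data from the Young Men and Adults questionnaires. As a result, the data from the Boys questionnaire for this question was incorrectly reduced by 4 years. For example, a response of 12 years would be recorded as 8 years in the Ten to Men dataset.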

This data issue is present in Releases 1.0 and 2.0 of the Ten to Men datasets, but the raw data has been amended in Release 2.1.

As it is the same data issue as described above, see section 3.7 for further details.

3.9 Country of birth

This section describes a data issue that was present in all Releases prior to Release 4.0. A file containing the raw country of birth data was located in late 2022, and therefore this data issue was corrected for the Wave 1 dataset in Release 4.0.

In Wave 1 of Ten to Men, each questionnaire contained 3 questions about the participant's country of birth and their parents' country of birth. The response options included 'Other', where the respondent could specify any country using the free text field.

The data were recorded in the 3 variables:

participant's country of birth (asecobownm)
mother's country of birth (asemocob1m)
father's country of birth (asefacob1m).

This was then re-coded using the Standard Australian Classification of Countries (SACC) and an additional 9 variables at the 1-digit, 2-digit and 4-digit levels were created. These variables contain more detail than the categories provided on the questionnaire, as the 'Other' category has been expanded to include countries specified in the free text field. They are:

participant's country of birth (asecobow1md, asecobow2md, asecobow4md)
mother's country of birth (asemocob1md, asemocob2md, asemocob4md)
father's country of birth (asefacob1md, asefacob2md, asefacob4md).

Although the SACC is a 3-level hierarchical structure, this has not been strictly applied to the data. Small values at the 2-digit and 4-digit levels have been confidentialised by replacing with 99 or 9999 instead of using the supplementary codes (not further defined (nfd)).

Therefore, care should be taken when using the variables at the 2-digit and 4-digit levels, as they will give higher 'Other' results than expected. Further details of the coding are shown in Table 7.

For data users, it is recommended that the variables at the 2-digit and 4-digit levels are used in conjunction with the 1-digit level variable. The confidentialised variables at the 2-digit and 4-digit levels can then be replaced with the corresponding nfd code.

Table 7: Country of birth codes

Country of birth (1-digit code)	Country of birth (2-digit code)	Suggested replacement country of birth (2-digit code)	Wave 1 frequency
1	99	10 Oceania and Antarctica nfd	46
2	99	20 North-West Europe nfd	17
3	99	30 Southern and Eastern Europe nfd	26
4	99	40 North Africa and Middle East nfd	45
5	99	50 South-East Asia nfd	0
6	99	60 North-East Asia nfd	29
7	99	70 Southern and Central Asia nfd	28
8	99	80 Americas nfd	10
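
The replacement suggested in Table 7 can be applied record by record. The following Python sketch shows the 2-digit case only, using the 1-digit code to recover the corresponding nfd code. The function name is an assumption, and 1-digit values outside the range shown in Table 7 are deliberately left untouched.

```python
CONFIDENTIALISED_2_DIGIT = 99


def restore_nfd_2digit(one_digit, two_digit):
    """Replace a confidentialised 2-digit code (99) with the nfd code from Table 7."""
    # Table 7 covers 1-digit codes 1 to 8, whose nfd codes are 10, 20, ..., 80.
    if two_digit == CONFIDENTIALISED_2_DIGIT and 1 <= one_digit <= 8:
        return one_digit * 10
    return two_digit  # valid codes and missing value codes pass through unchanged


print(restore_nfd_2digit(4, 99))  # North Africa and Middle East nfd
print(restore_nfd_2digit(1, 11))  # already a valid 2-digit code
```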

3.10 Current occupation

In Wave 1 and Wave 2 of Ten to Men, the Adult questionnaire contained a question about the participant's current occupation. It was a free text field, requesting both the Job title and the main duties/tasks.

This data was then coded using the Australian and New Zealand Standard Classification of Occupations (ANZSCO). Three variables for the participant's current occupation were created for each wave. These are at the 1-digit, 2-digit and 4-digit levels:

1-digit level (aseempoc1ad, bseempoc1ad)
2-digit level (aseempoc2ad, bseempoc2ad)
4-digit level (aseempoc4ad, bseempoc4ad)

Although ANZSCO is a 3-level hierarchical structure, this has not been strictly applied to the data. Small values at the 2-digit and 4-digit levels have been confidentialised by replacing with 99 or 9999 instead of using the supplementary codes (not further defined (nfd)). Some values at the 2-digit level have been coded as -7 (Unable to determine value) because the 4-digit level has been confidentialised to 9999.

Therefore, care should be taken when using the variables at the 2-digit and 4-digit levels, as it will give higher 'Other' results than expected. Further details are shown below in Table 8.

Table 8: Current occupation codes

Current occupation (1-digit code)	Current occupation (2-digit code)	Suggested replacement current occupation (2-digit code)	Wave 1 frequency	Wave 2 frequency
1	-7	10 Managers nfd	166	154
2	-7	20 Professionals nfd	65	55
3	-7	30 Technicians and Trades Workers nfd	70	97
5	-7	50 Clerical and Administrative Workers nfd	14	12
5	99	50 Clerical and Administrative Workers nfd	41	32
6	-7	60 Sales Workers nfd	44	38
6	99	60 Sales Workers nfd	49	38
7	-7	70 Machinery Operators and Drivers nfd	75	43
8	-7	80 Labourers nfd	41	40
8	99	80 Labourers nfd	0	42

For data users, it is recommended that the variables at the 2-digit and 4-digit levels are used in conjunction with the 1-digit level variable. The confidentialised variables at the 2-digit and 4-digit levels can then be replaced with the corresponding nfd code.
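
For the occupation variables, the replacement has to handle two confidentialised values at the 2-digit level, -7 and 99. The Python sketch below returns both the replacement code and its label. The function name is an assumption; the labels are taken from Tables 8 and 9, and any combination not covered by those tables is returned unchanged with no label.

```python
# Major group labels as listed in Tables 8 and 9.
MAJOR_GROUPS = {
    1: "Managers",
    2: "Professionals",
    3: "Technicians and Trades Workers",
    4: "Community and Personal Service Workers",
    5: "Clerical and Administrative Workers",
    6: "Sales Workers",
    7: "Machinery Operators and Drivers",
    8: "Labourers",
}
CONFIDENTIALISED_VALUES = {-7, 99}


def restore_occupation_2digit(one_digit, two_digit):
    """Return (2-digit code, label), substituting the nfd code where confidentialised."""
    if two_digit in CONFIDENTIALISED_VALUES and one_digit in MAJOR_GROUPS:
        return one_digit * 10, f"{MAJOR_GROUPS[one_digit]} nfd"
    return two_digit, None  # no substitution made


print(restore_occupation_2digit(1, -7))
print(restore_occupation_2digit(5, 99))
```

Because the 1-digit variable is needed to make the substitution, records where the 1-digit level is itself missing cannot be recovered this way.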

The Parent's questionnaire asked the same question about the parent's current occupation. The variables for this are:

1-digit level (aseempoc1pd, bseempoc1pd)
2-digit level (aseempoc2pd, bseempoc2pd)
4-digit level (aseempoc4pd, bseempoc4pd)

This data has the same issue and recommendations as the participant's current occupation.

Table 9: Current occupation codes (Parents)

Current occupation (1-digit code)	Current occupation (2-digit code)	Suggested replacement current occupation (2-digit code)	Wave 1 frequency	Wave 2 frequency
1	-7	10 Managers nfd	10	10
1	99	10 Managers nfd	103	62
2	-7	20 Professionals nfd	5	2
2	99	20 Professionals nfd	65	66
3	-7	30 Technicians and Trades Workers nfd	1	0
3	99	30 Technicians and Trades Workers nfd	54	0
4	99	40 Community and Personal Service Workers nfd	46	62
5	-7	50 Clerical and Administrative Workers nfd	2	4
5	99	50 Clerical and Administrative Workers nfd	119	68
8	-7	80 Labourers nfd	3	0
8	99	80 Labourers nfd	58	0
9	-7	99 Other	9	3

3.11 Language spoken at home

In Wave 1 of Ten to Men, each questionnaire contained a question about the language spoken at home. However, the response categories varied across the 3 different questionnaires.

Adult questionnaire

The Adult questionnaire had 7 options for the response to the question about language, which are shown in Table 10. One option was 'Other', where the respondent could specify any other language using a free text field.

Table 10: Language codes for Adult questionnaire

Code	Language
1201	English
2201	Greek
2401	Italian
4202	Arabic
6302	Vietnamese
7104	Mandarin
9999	Other

This data was then re-coded using the Australian Standard Classification of Languages (ASCL) and 3 variables at the 1-digit, 2-digit and 4-digit levels were created (aselangh1ad, aselangh2ad, aselangh4ad). These variables contain more detail than the categories on the questionnaire, as the 'Other' category has been expanded to include languages specified in the free text field.

Table 11: Language codes for Adult questionnaire

Language (1-digit level) aselangh1ad	Language (2-digit level) aselangh2ad	Suggested replacement language (2-digit level)	Wave 1 frequency
1	99	10 Northern European Languages nfd	30
2	99	20 Southern European Languages nfd	72
3	99	30 Eastern European Languages nfd	56
4	99	40 Southwest and Central Asian Languages nfd	57
5	99	50 Southern Asian Languages nfd	2
6	99	60 Southeast Asian Languages nfd	30
7	99	70 Eastern Asian Languages nfd	20

Although detailed information on the language can be obtained, the small values at these levels have resulted in the variables being confidentialised (some values have been replaced by 99 or 9999). Care should be taken when using the variables at the 2-digit and 4-digit levels, as they will give higher 'Other' results than expected. Further details are shown in Table 11.

We recommend that the variables at the 2-digit and 4-digit levels be used in conjunction with the 'aselangh1ad' variable. The confidentialised variables at the 2-digit and 4-digit levels can then be replaced with the corresponding nfd code.
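The recommendation above can be sketched in Python as follows. This is a minimal illustration rather than the study team's code: the function name `fill_nfd_2digit` and the sample records are hypothetical, and the sketch assumes (following Table 11) that the 2-digit nfd code is the 1-digit code multiplied by 10. Only the 2-digit level is shown, because this paper does not list the 4-digit nfd codes.

```python
# Confidentialised marker at the 2-digit level, as described in this section.
CONFIDENTIALISED_2DIGIT = 99


def fill_nfd_2digit(aselangh1ad, aselangh2ad):
    """Return the 2-digit language code, replacing a confidentialised 99
    with the 'not further defined' (nfd) code implied by the 1-digit value.

    Assumption: per Table 11, the nfd code is the 1-digit code times 10
    (for example 2 -> 20, Southern European Languages nfd). Only the
    1-digit codes 1 to 7 listed in Table 11 are handled.
    """
    if aselangh2ad == CONFIDENTIALISED_2DIGIT and 1 <= aselangh1ad <= 7:
        return aselangh1ad * 10
    # Leave every other value (valid codes, missing codes) unchanged.
    return aselangh2ad


# Hypothetical records: (1-digit code, 2-digit code)
records = [(2, 99), (7, 71), (4, 99), (9, 99)]
recoded = [fill_nfd_2digit(one, two) for one, two in records]
```

Values that are not covered by Table 11 are deliberately left untouched, so the recode never overwrites a valid or missing-value code.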

Boys and Young Men questionnaires

The Boys and Young Men questionnaires only had 3 options for the response to this question about language, as shown in Table 12, and recorded as the variable 'aselangh1u'.

Table 12: Language codes for Boys and Young Men questionnaires

Code	Language
1	English
2	Another language
3	English and another language about equally

The respondent could specify the other language using the free text field and this was re-coded using the ASCL. Three variables at the 1-digit, 2-digit and 4-digit levels were created (aselangh1ud, aselangh2ud, aselangh4ud).

However, the small values at this level have resulted in the variables being totally confidentialised (all values have been replaced by 9, 99 or 9999).

Therefore, no information about the other languages spoken at home is available in the Ten to Men datasets for the Boys and Young Men.

3.12 Additional Wave 1 participants

During Wave 2 of Ten to Men, 33 additional participants were identified for Wave 1. They were not included in the original Wave 1 dataset (Release 1.0) as their eligibility and consent status had not been determined at that stage, but this issue was resolved during Wave 2.

In Release 1.0, the sample size for Wave 1 was 15,988. This comprised 3 cohorts:

1,087 males aged 10-14 years completing a Boys questionnaire
1,017 males aged 15-17 years completing a Young Men questionnaire
13,884 males aged 18 years and over completing an Adult questionnaire.

In Releases 2.0 and 2.1, the 33 additional participants have been subsequently included in Wave 1, taking the reconciled sample size for Wave 1 to 16,021. The reconciled cohort sizes are:

1,099 males aged 10-14 years completing a Boys questionnaire
1,026 males aged 15-17 years completing a Young Men questionnaire
13,896 males aged 18 years and over completing an Adult questionnaire.
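The figures above can be cross-checked with simple arithmetic. The sketch below uses only the counts quoted in this section; the per-cohort split of the 33 additional participants is computed from those counts rather than stated in the text, and the dictionary keys are illustrative labels.

```python
# Cohort sizes quoted in this section.
release_1_0 = {"Boys": 1087, "Young Men": 1017, "Adult": 13884}
reconciled = {"Boys": 1099, "Young Men": 1026, "Adult": 13896}

# Additional participants per cohort (derived, not stated in the text).
added = {cohort: reconciled[cohort] - release_1_0[cohort] for cohort in release_1_0}

total_release_1_0 = sum(release_1_0.values())
total_reconciled = sum(reconciled.values())
total_added = sum(added.values())
```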

3.13 Pilot data for Wave 2

Of the reconciled Wave 1 sample, there were 314 respondents who were interviewed in the Ten to Men pilot for Wave 2. These respondents did not complete a questionnaire during the course of the main data collection period for Wave 2.

In Releases 1.0 and 2.0, the pilot data has been included in Wave 2 datasets. The sample size was 12,250 males.

In Release 2.1, the data for these 314 respondents have been removed from the Wave 2 dataset. This has reduced the sample size for Wave 2 to 11,936 males. From this Release onwards, these 314 respondents will remain part of the pilot and not be included in the main sample.

3.14 Update to weights

Releases 1.0 and 2.0 included sample weights for Wave 1 only.

Upon review of the Ten to Men data, it was decided to include Wave 2 weights in Release 2.1. It was necessary to update the Wave 1 weights to ensure that the weights for Wave 2 were developed using the same approach and references as those used in the calculation of the Wave 1 weights.

Therefore, Release 2.1 of the Ten to Men datasets contains the updated weights for Wave 1 and the new sample weights for Wave 2.

In Release 3.0, population and sample weights have been included for all waves.

3.15 Obstructive sleep apnoea

For Wave 2 of Ten to Men, there were 4 questions asked in the Adult questionnaire relating to obstructive sleep apnoea as part of the STOP-Bang questionnaire screening tool. Further information about this screening tool can be found on the STOP-Bang website.

Four objective measures are also required as part of the STOP-Bang questionnaire screening tool: BMI, age, neck circumference and gender. The responses to these 8 elements are scored, with the result indicating low, medium or high risk of obstructive sleep apnoea.

The resulting score was recorded in the Ten to Men Wave 2 dataset as the derived variable:

Risk of OSA (STOP-Bang) (bhsosarisad).

The values of this derived variable should be stored either as a score (0-8 scale) or in a Low/Medium/High format.

In Release 2.0, this variable had values of 0 or 1.

In Release 2.1, the intention was to recalculate the derived variable. However, only 7 of the 8 elements of the STOP-Bang questionnaire screening tool were available, as we did not have information about neck circumference.

As a result, this derived variable (bhsosarisad) has been removed from the datasets in Release 2.1 and all subsequent releases.
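To illustrate why the variable could not be recalculated, the sketch below scores the 8 elements and returns nothing when any element is unavailable. It is a hedged illustration only: the function name is hypothetical, each element is assumed to have already been converted to True/False (or None when not collected), and the cut-offs used (0-2 Low, 3-4 Medium, 5-8 High) are the commonly published STOP-Bang thresholds rather than values taken from the Ten to Men documentation.

```python
def stop_bang_risk(elements):
    """Score 8 STOP-Bang elements (4 questionnaire items plus BMI, age,
    neck circumference and gender), each given as True, False or None.

    Returns (score, category), or (None, None) if any element is missing.
    Assumption: cut-offs of 0-2 Low, 3-4 Medium, 5-8 High.
    """
    if len(elements) != 8:
        raise ValueError("STOP-Bang needs exactly 8 elements")
    if any(e is None for e in elements):
        # A missing element (such as neck circumference) prevents scoring.
        return None, None
    score = sum(1 for e in elements if e)
    if score <= 2:
        category = "Low"
    elif score <= 4:
        category = "Medium"
    else:
        category = "High"
    return score, category


# Hypothetical respondent with neck circumference (7th element) not collected.
incomplete = stop_bang_risk([True, False, True, False, True, True, None, True])
complete = stop_bang_risk([True, False, True, False, False, False, False, True])
```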

3.16 Derived variables

The Ten to Men dataset contains numerous derived variables, including scale and summary scores. The calculation of these derived variables generally requires input from multiple raw variables, and it is possible that one or more of these input data values may be missing.

Missing values are given negative numeric values according to the Ten to Men standard missing value code frame. More information about this code frame can be found in the Ten to Men Data User Guide.

A couple of issues have been identified with the calculation of the derived variables in Releases 1.0 and 2.0:

Any negative data values were replaced with zero in the calculation of the derived variables. This could introduce misinterpretation of data, depending on the derivation of each variable. For example, the mean of individual components may be underestimated when zero is assigned to a missing value.
Incorrect calculation of some derived variables. For example, the elements of the General Wellbeing Scale were not reverse scored before calculating the mean.
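The two issues above can be illustrated with a small sketch. The item values, the 1-5 response range and the function names are hypothetical and are not taken from any Ten to Men scale; `mean_of_valid` is shown only for comparison and is not the rule adopted in later releases.

```python
def mean_with_zero_fill(values):
    """Behaviour described for Releases 1.0 and 2.0: negative (missing)
    codes are replaced with zero before the mean is taken."""
    filled = [0 if v < 0 else v for v in values]
    return sum(filled) / len(filled)


def mean_of_valid(values):
    """Comparison only: mean of the non-missing values.
    Assumes at least one valid (non-negative) value is present."""
    valid = [v for v in values if v >= 0]
    return sum(valid) / len(valid)


def reverse_score(value, low=1, high=5):
    """Reverse score an item on an assumed low-to-high response range."""
    return low + high - value


items = [4, 5, -4, 3]  # hypothetical responses; -4 is a missing value code
zero_filled = mean_with_zero_fill(items)  # (4 + 5 + 0 + 3) / 4
valid_only = mean_of_valid(items)         # (4 + 5 + 3) / 3
```

In this example the zero-filled mean is lower than the mean of the valid items, which is the underestimation described above.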

Data users using Release 1.0 or 2.0 are advised to re-check and review the interpretation of the derived variables, as the derived variable values may be underestimated or overestimated.

For Release 2.1, a set of guidelines was developed for the treatment of missing input variables in the calculation of derived variables. These are:

If all input values were missing and shared the same missing value code, the derived variable was assigned that same code. For example, if all input variables were -4, the derived variable was assigned -4.
If the input variables had any combination of missing values and some valid data values, the derived variable was assigned the missing value code of -7 (Unable to determine value).
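The guidelines above can be sketched as follows, using a mean as a stand-in for whatever calculation a given derived variable requires. The function name is hypothetical. The case where all inputs are missing but carry different codes is not covered by the guidelines as stated; assigning -7 in that case is an assumption made here for illustration.

```python
UNABLE_TO_DETERMINE = -7  # missing value code named in the guidelines


def derived_mean(inputs):
    """Apply the Release 2.1 missing-value guidelines to a mean score.

    inputs: non-empty list of numeric values, where negative values are
    missing value codes from the Ten to Men code frame.
    """
    missing = [v for v in inputs if v < 0]
    if not missing:
        # All inputs valid: calculate the derived value as normal.
        return sum(inputs) / len(inputs)
    if len(missing) == len(inputs) and len(set(missing)) == 1:
        # All inputs missing with the same code: carry that code forward.
        return missing[0]
    # Mix of missing and valid values: unable to determine value.
    # Assumption: all-missing inputs with differing codes also get -7.
    return UNABLE_TO_DETERMINE
```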

All subsequent releases follow these guidelines.

3.17 Short Form 12 (SF-12) Health Survey

The Wave 1 and Wave 2 questionnaires for Adults included the SF-12 Health Survey: a licensed scale measuring respondents' health status. An SF-12 scale score was derived and included in the dataset for Releases 1.0 and 2.0.

Due to issues relating to SF-12 license approvals, the raw data items and derived scale score have been removed from Release 2.1 and subsequent releases. These items have been redacted in the annotated questionnaires and have been deleted from the Data Dictionary.

3.18 Height

In Wave 3, height was only asked if the respondent was under 23 years. Therefore, 88% were not asked this question and the height variable was initially coded as -2 (Not applicable) in Wave 3.

For Wave 4, height was only asked if the respondent was under 25 years. Therefore, 91% were not asked this question and the height variable was initially coded as -2 (Not applicable) in Wave 4.

The decision was made to impute the height at Waves 3 and 4 for all respondents where the question was not asked. There are 2 sources of height - from both the Wave 1 and Wave 2 data collection. Data from both waves were used, as some respondents in Wave 3 may not have participated in Wave 1 and/or Wave 2.

An assumption has been made that the largest height value is the most accurate, and this has been used to populate the height variable in Waves 3 and 4 for those respondents who were not asked the question (aged 23 years or above in Wave 3, and 25 years or above in Wave 4).
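The imputation rule described above can be sketched as below. This is an illustration under stated assumptions, not the study team's code: the function name is hypothetical, non-participation in a wave is represented as None, negative values are treated as missing value codes, and returning None when no earlier height exists is a placeholder because this paper does not say how that case is coded.

```python
def impute_height(wave1_height, wave2_height):
    """Return the largest valid height reported in Wave 1 or Wave 2.

    Negative values are treated as missing value codes and None as
    non-participation. Returns None if no valid height is available
    (assumption: the paper does not state how this case is handled).
    """
    candidates = [
        h for h in (wave1_height, wave2_height)
        if h is not None and h >= 0
    ]
    if not candidates:
        return None
    return max(candidates)
```

For example, a respondent with hypothetical heights of 178.0 in Wave 1 and 180.5 in Wave 2 would be assigned 180.5.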

3.19 Other natural disasters

In Wave 3, the following 2 questions were asked about whether the respondent, or a close friend or family member, had experienced a natural disaster:

Have you been affected by any of the following natural disasters in the past year?
Has a close friend or family member been affected by any of the following natural disasters in the past year?

One of the response options was 'Other', where a free text field then allowed more details about the type of natural disaster. A summary of the responses from the free text field is shown in Table 13.

More than 90% of the responses in the free text field specified coronavirus, covid, pandemic or something similar. Although technically correct (the Federal Government considers the COVID-19 pandemic a natural disaster), it was an unexpected response to these 2 questions. The COVID-19 pandemic has affected everyone, yet only some respondents reflected that.

The decision was made to include 2 additional variables in the Wave 3 dataset. These 2 variables (cslndothc, cslndfmoc) categorise the 'Other' free text responses. The values of these variables are shown in Table 14.

Table 13: Free text responses for other natural disasters

Question	Description	Wave 3 frequency
Have you been affected by any of the following natural disasters in the past year?	Coronavirus	280
	Earthquake	8
	House fire	3
	Other	18
	Total	309
Has a close friend or family member been affected by any of the following natural disasters in the past year?	Coronavirus	166
	Other	10
	Total	176

Table 14: New categorical variables for other natural disasters

Variable	Label	Value	Description	Wave 3 frequency
cslndothc	Natural Disaster - Other Category	0	Coronavirus	280
		1	Other	29
		-2	Not Applicable	7,610
cslndfmoc	Natural Disaster - Family/friend - Other Category	0	Coronavirus	166
		1	Other	10
		-2	Not Applicable	7,743

Data users are advised to make their own decisions about whether to include or exclude the COVID-19 pandemic as a natural disaster.
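One way to make that decision explicit in analysis code is sketched below, using the value codes for cslndothc from Table 14. The function name and the `include_covid` flag are hypothetical; the list of values is reconstructed from the Table 14 frequencies purely to show the effect of the choice.

```python
# Value codes for cslndothc, from Table 14.
CORONAVIRUS, OTHER, NOT_APPLICABLE = 0, 1, -2


def reported_other_disaster(cslndothc, include_covid):
    """Return True if the respondent counts as affected by an 'other'
    natural disaster, with COVID-19 included or excluded by choice."""
    if cslndothc == OTHER:
        return True
    if cslndothc == CORONAVIRUS:
        return include_covid
    return False


# Values reconstructed from the Wave 3 frequencies in Table 14.
values = [CORONAVIRUS] * 280 + [OTHER] * 29 + [NOT_APPLICABLE] * 7610

with_covid = sum(reported_other_disaster(v, True) for v in values)
without_covid = sum(reported_other_disaster(v, False) for v in values)
```

The same approach applies to cslndfmoc for the friend or family member question.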

Appendix A: History of dataset releases

Date	Release	Dataset	Suggested citation and DOI
July 2016	Release 1.0	Wave 1	Pirkis, J., English, D., & Currier, D. (2016). The Australian Longitudinal Study on Male Health (Ten to Men), 2013. [computer file]. Canberra: Australian Data Archive, The Australian National University. DOI:10.4225/87/587ebdbc851b1
August 2017	Release 2.0	Respondent Wave 1 Wave 2	Pirkis, J., English, D., & Currier, D. (2017). The Australian Longitudinal Study on Male Health (Ten to Men), 2013. [computer file]. Canberra: Australian Data Archive, The Australian National University. Respondent DOI: 10.4225/87/N8C9NP Wave 1 DOI: 10.4225/87/Z4PEZN Wave 2 DOI: 10.4225/87/2KHTSV
September 2019	Release 2.1	Wave 1 Wave 2	Bandara, D., Howell, L., & Daraganova, G. (2019). Ten to Men: The Australian Longitudinal Study on Male Health, Release 2.1 (Waves 1-2). DOI:10.26193/V2IVIG, ADA Dataverse
September 2021	Release 3.0	Wave 1 Wave 2 Wave 3	Bandara, D., Howell, L., Silbert, M., & Daraganova, G. (2021). Ten to Men: The Australian Longitudinal Study on Male Health, Release 3 (Waves 1-3). DOI:10.26193/JDE1TD, ADA Dataverse
September 2023	Release 4.0	Wave 1 Wave 2 Wave 3 Wave 4	Volpe, F., Suares, M., Silbert, M., & Martin, S. (2023). Ten to Men: The Australian Longitudinal Study on Male Health, Release 4.0 (Waves 1-4).

Appendix B: List of renamed variables

The tables below show the variables that have been renamed for each release.

Table 15: Variables renamed for Release 2.1

Label	Wave	Old variable name (Release 2.0)	New variable name (Release 2.1)
SA1 code confidentialised (2011 Census based)	1	zdcsa1codmd	aldsa1c11md
SA1 code confidentialised (2011 Census based)	2	zdcsa1codmd	bldsa1c11md
SA1 code confidentialised (2016 Census based)	2	n/a	bldsa1c16md
SA2 code confidentialised (2011 Census based)	1	zdcsa2codmd	aldsa2c11md
SA2 code confidentialised (2011 Census based)	2	zdcsa2codmd	bldsa2c11md
SA2 code confidentialised (2016 Census based)	2	n/a	bldsa2c16md
SA Modified Monash Model Classification	1	zdcmmmcsam	adcmmmcsam
SA Modified Monash Model Classification	2	zdcmmmcsam	bdcmmmcsam
ASGS Region (2011 Census based)	1	zdcremotem	aldremt11m
ASGS Region (2011 Census based)	2	zdcremotem	bldremt11m
ASGS Region (2016 Census based)	2	n/a	bldremt16m
State (2011 Census based)	1	zshstate0id	aldstat11id
State (2011 Census based)	2	zshstate0id	bldstat11id
State (2016 Census based)	2	n/a	bldstat16id
Number of Household Participants	1	zdchmparted	adchmparted
Sampling Weights (2011 Census based)	1	zdcwgt001md	adcwgts11md
SEIFA Index of Relative Socio-Economic Disadvantage - Rank (2011 Census based)	1	zdcirsdr0i	aldirdr11i
SEIFA Index of Relative Socio-Economic Disadvantage - Rank (2011 Census based)	2	zdcirsdr0i	bldirdr11i
SEIFA Index of Relative Socio-Economic Disadvantage - Rank (2016 Census based)	2	n/a	bldirdr16i
SEIFA Index of Relative Socio-Economic Disadvantage - Percent (2011 Census based)	1	zdcirsdp0i	aldirdp11i
SEIFA Index of Relative Socio-Economic Disadvantage - Percent (2011 Census based)	2	zdcirsdp0i	bldirdp11i
SEIFA Index of Relative Socio-Economic Disadvantage - Percent (2016 Census based)	2	n/a	bldirdp16i
SEIFA Index of Relative Socio-Economic Disadvantage - Decile (2011 Census based)	1	zdcirsdd0i	aldirdd11i
SEIFA Index of Relative Socio-Economic Disadvantage - Decile (2011 Census based)	2	zdcirsdd0i	bldirdd11i
SEIFA Index of Relative Socio-Economic Disadvantage - Decile (2016 Census based)	2	n/a	bldirdd16i
SEIFA Index of Relative Socio-Economic Advantage and Disadvantage - Rank (2011 Census based)	1	zdcirsadri	aldiadr11i
SEIFA Index of Relative Socio-Economic Advantage and Disadvantage - Rank (2011 Census based)	2	zdcirsadri	bldiadr11i
SEIFA Index of Relative Socio-Economic Advantage and Disadvantage - Rank (2016 Census based)	2	n/a	bldiadr16i
SEIFA Index of Relative Socio-Economic Advantage and Disadvantage - Percent (2011 Census based)	1	zdcirsadpi	aldiadp11i
SEIFA Index of Relative Socio-Economic Advantage and Disadvantage - Percent (2011 Census based)	2	zdcirsadpi	bldiadp11i
SEIFA Index of Relative Socio-Economic Advantage and Disadvantage - Percent (2016 Census based)	2	n/a	bldiadp16i
SEIFA Index of Relative Socio-Economic Advantage and Disadvantage - Decile (2011 Census based)	1	zdcirsaddi	aldiadd11i
SEIFA Index of Relative Socio-Economic Advantage and Disadvantage - Decile (2011 Census based)	2	zdcirsaddi	bldiadd11i
SEIFA Index of Relative Socio-Economic Advantage and Disadvantage - Decile (2016 Census based)	2	n/a	bldiadd16i
SEIFA Index of Economic Resources - Rank (2011 Census based)	1	zdcierr00i	aldierr11i
SEIFA Index of Economic Resources - Rank (2011 Census based)	2	zdcierr00i	bldierr11i
SEIFA Index of Economic Resources - Rank (2016 Census based)	2	n/a	bldierr16i
SEIFA Index of Economic Resources - Percent (2011 Census based)	1	zdcierp00i	aldierp11i
SEIFA Index of Economic Resources - Percent (2011 Census based)	2	zdcierp00i	bldierp11i
SEIFA Index of Economic Resources - Percent (2016 Census based)	2	n/a	bldierp16i
SEIFA Index of Economic Resources - Decile (2011 Census based)	1	zdcierr00i	aldierd11i
SEIFA Index of Economic Resources - Decile (2011 Census based)	2	zdcierr00i	bldierd11i
SEIFA Index of Economic Resources - Decile (2016 Census based)	2	n/a	bldierd16i
SEIFA Index of Education and Occupation - Rank (2011 Census based)	1	zdcieor00i	aldieor11i
SEIFA Index of Education and Occupation - Rank (2011 Census based)	2	zdcieor00i	bldieor11i
SEIFA Index of Education and Occupation - Rank (2016 Census based)	2	n/a	bldieor16i
SEIFA Index of Education and Occupation - Percent (2011 Census based)	1	zdcieop00i	aldieop11i
SEIFA Index of Education and Occupation - Percent (2011 Census based)	2	zdcieop00i	bldieop11i
SEIFA Index of Education and Occupation - Percent (2016 Census based)	2	n/a	bldieop16i
SEIFA Index of Education and Occupation - Decile (2011 Census based)	1	zdcieod00i	aldieod11i
SEIFA Index of Education and Occupation - Decile (2011 Census based)	2	zdcieod00i	bldieod11i
SEIFA Index of Education and Occupation - Decile (2016 Census based)	2	n/a	bldieod16i
Sex in the past 12 months	2	bhxsex120a	bbxsex120a

Table 16: Variables renamed for Release 3.0

Label	Wave	Old variable name (Release 2.1)	New variable name (Release 3.0)
Initial cross-sectional population weight for Wave 1	1	adcicswgtmd	adcicpwtad
Raked cross-sectional population weight for Wave 1	1	adcrcswgtmd	adcrcpwtad
Initial cross-sectional population weight for Wave 2	2	bdcicswgtmd	bdcicpwtbd
Raked cross-sectional population weight for Wave 2	2	bdcrcswgtmd	bdcrcpwtbd
Initial longitudinal population weight between Wave 1 and Wave 2	2	bdcilgwgtmd	bdcilpwabd
Raked longitudinal population weight between Wave 1 and Wave 2	2	bdcrlgwgtmd	bdcrlpwabd

Table 17: Variables renamed for Release 4.0

Label	Wave	Old variable name (Release 3.0)	New variable name (Release 4.0)
Coronavirus – lacked companionship (current)	3	cslcvfc01	csllosc01
Coronavirus – felt left out (current)	3	cslcvfc02	csllosc02
Coronavirus – felt isolated (current)	3	cslcvfc03	csllosc03
Coronavirus – felt lonely (current)	3	cslcvfc04	csllosc04
Coronavirus – lacked companionship (during restrictions)	3	cslcvfr01	cslloscr1
Coronavirus – felt left out (during restrictions)	3	cslcvfr02	cslloscr2
Coronavirus – felt isolated (during restrictions)	3	cslcvfr03	cslloscr3
Coronavirus – felt lonely (during restrictions)	3	cslcvfr04	cslloscr4
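Code written against an earlier release may still refer to the old names. The sketch below shows one way to apply the Table 17 renames to a record held as a Python dictionary; the function name, the record layout and the `respondent_id` key are hypothetical, and only the variable names themselves come from Table 17.

```python
# Old name -> new name, copied from Table 17.
RELEASE_4_RENAMES = {
    "cslcvfc01": "csllosc01",
    "cslcvfc02": "csllosc02",
    "cslcvfc03": "csllosc03",
    "cslcvfc04": "csllosc04",
    "cslcvfr01": "cslloscr1",
    "cslcvfr02": "cslloscr2",
    "cslcvfr03": "cslloscr3",
    "cslcvfr04": "cslloscr4",
}


def rename_variables(record, renames=RELEASE_4_RENAMES):
    """Return a copy of the record with old variable names replaced by
    their new names; names not in the mapping are left unchanged."""
    return {renames.get(name, name): value for name, value in record.items()}


# Hypothetical record using old variable names.
old_record = {"respondent_id": 1, "cslcvfc01": 2, "cslcvfr04": 3}
new_record = rename_variables(old_record)
```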

Glossary

Term	Description
ABS	Australian Bureau of Statistics
ANZSCO	Australian and New Zealand Standard Classification of Occupations
ASCED	Australian Standard Classification of Education
ASCL	Australian Standard Classification of Languages
AIFS	Australian Institute of Family Studies
ASGS	Australian Statistical Geography Standard
BMI	Body Mass Index
DC	Data Collection
DOI	Digital Object Identifier
General Release	This dataset includes data from which the more sensitive information has been removed. Confidentialisation has also been considered for all variables and applied if required.
LD	Linked Data
NFD	Not further defined
Respondent Dataset	A dataset containing key indicator data, such as the unique study identifier, age, household identifier and geographical information.
Restricted Release	This dataset includes information at a more detailed level than the General Release datasets. Items include language, occupation, and country of birth at the 4-digit levels.
SA1	Statistical Area 1
SA2	Statistical Area 2
SACC	Standard Australian Classification of Countries
SEIFA	Socio-Economic Indexes for Areas
SRC	Social Research Centre
TTM	Ten to Men Study
UoM	University of Melbourne
Update	An update occurs when significant changes are made to an existing release. For example, the update to Release 2.0 resulted in it being reissued as Release 2.1.
Wave dataset	A dataset containing the responses to the corresponding questionnaire of a given wave.

Acknowledgements and citation

Acknowledgements

Ten to Men: The Australian Longitudinal Study on Male Health is the first large-scale, nationally representative, longitudinal study to focus exclusively on investigating and improving the health and wellbeing of males in Australia. It is also the largest longitudinal study of male health in the world.

Ten to Men was commissioned and is funded by the Australian Government Department of Health to inform the National Male Health Policy. The study was initially conducted by the University of Melbourne, which released datasets, including data documentation, for Wave 1 and Wave 2. Roy Morgan Research undertook the data collection and initial data processing for these 2 waves.

After a competitive tender process in 2017, the Australian Institute of Family Studies (AIFS) was awarded responsibility for conducting Waves 3 and 4. Since then, AIFS has updated the Wave 1 and Wave 2 datasets, including data documentation.

In 2020, the study team re-evaluated and revised the survey content and methodology to enable contactless interviewing for Wave 3. New items designed to collect information on the impacts of COVID-19 and the recent effects of natural disasters were also incorporated into the revised survey. The online survey went live at the end of July 2020, with data collection concluding in February 2021.

Minimal changes, both in terms of the survey content and the data collection method, occurred between Wave 3 and Wave 4. The Wave 4 online survey data collection period was from August 2022 to December 2022.

The Social Research Centre (SRC), in collaboration with Ipsos, was contracted to undertake the fieldwork component for Waves 3 and 4 of the study.

Citation

Volpe, F., Suares, M., Silbert, M., & Martin, S. (2023). Ten to Men: The Australian Longitudinal Study on Male Health - Data Issues Paper, Release 4.0 (Waves 1-4). Melbourne: Australian Institute of Family Studies.
