UKBB GWAS Imputed v3 - File Manifest Release 20180731

All data has been moved to AWS. Files on Dropbox are no longer available.

This README describes the files available to download as part of the Neale Lab GWAS of UK Biobank phenotypes.
For a description of the project and details of the analysis, please see http://www.nealelab.is/uk-biobank
To download GWAS results, use the links in the manifest tab below. At the top of each column in the manifest is a triangle; click it to open search options for that column. Once you have found the phenotype code you are looking for, use the corresponding entry in the "wget command" column to download that results file.
The code used to generate the files described here is publicly available: https://github.com/Nealelab/UK_Biobank_GWAS.
Questions or concerns not addressed by this README, the project website, our FAQs (http://www.nealelab.is/faq), or the GitHub repository can be directed to nealelab.ukb@gmail.com.
Note: Between the biomarker release and 8/8/2019, the manifest listed incorrect phenotype descriptions for the phenotype codes of 10 files. This has been fixed.


variants.tsv.bgz
This file contains annotations for each variant in the GWAS, calculated across the analysis subset of 361,194 samples.
NOTE: The order of variants in this file matches the order of variants in the results files described below. To join these annotations with a results file, either match on the "variant" field or, after decompressing both files, paste the columns together (e.g., "paste variants.tsv K50.gwas.imputed_v3.both_sexes.tsv").
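Because the row order is shared, the two files can also be joined by streaming them in lockstep instead of indexing the "variant" column. The Python sketch below illustrates that idea; it is not part of the released pipeline. It assumes the files are read directly with Python's gzip module (block-gzipped .bgz files are gzip-compatible) and that cells contain no quoting. The function name join_in_order is hypothetical.

```python
import csv
import gzip
from itertools import zip_longest
from typing import Dict, Iterator


def join_in_order(variants_path: str, results_path: str) -> Iterator[Dict[str, str]]:
    """Yield one merged row per variant from variants.tsv.bgz and a results file.

    Hypothetical helper: relies on both files listing variants in the same
    order, and checks the shared "variant" column on every row so a
    mismatched pair of files fails loudly instead of silently misaligning.
    """
    # .bgz files are block-gzipped, which gzip.open reads transparently.
    with gzip.open(variants_path, "rt", newline="") as vf, \
            gzip.open(results_path, "rt", newline="") as rf:
        variants = csv.DictReader(vf, delimiter="\t", quoting=csv.QUOTE_NONE)
        results = csv.DictReader(rf, delimiter="\t", quoting=csv.QUOTE_NONE)
        for v_row, r_row in zip_longest(variants, results):
            if v_row is None or r_row is None:
                raise ValueError("files contain different numbers of variants")
            if v_row["variant"] != r_row["variant"]:
                raise ValueError(
                    f"order mismatch: {v_row['variant']} vs {r_row['variant']}"
                )
            merged = dict(v_row)
            # Columns present in both files (e.g. minor_AF, AC) take the
            # phenotype-specific value from the results file.
            merged.update(r_row)
            yield merged
```

For example, iterating over join_in_order("variants.tsv.bgz", "K50.gwas.imputed_v3.both_sexes.tsv.bgz") yields annotated results one variant at a time. Streaming keeps memory use flat, whereas a key-based join on "variant" would need to hold one of the files in memory.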

Contents:
variant	string	Variant identifier in the form "chr:pos:ref:alt", where "ref" is aligned to the forward strand of GRCh37.
chr	string	Chromosome of the variant.
pos	int	Position of the variant in GRCh37 coordinates.
ref	string	Reference allele on the forward strand.
alt	string	Alternate allele (not necessarily the minor allele).
rsid	string	rsID (not guaranteed to be unique).
varid	string	Unique variant identifier included in the imputed BGEN files.
consequence	string	Consequence annotated using VEP version 85.
consequence_category	string	Category of the VEP-annotated consequence ("ptv", "missense", "synonymous", "non_coding").
info	float	Imputation INFO score as provided by UK Biobank.
call_rate	float	Call rate (calculated using hardcall genotypes).
AC	int	Allele count of the alt allele (calculated using hardcall genotypes).
AF	float	Allele frequency of the alt allele (calculated using hardcall genotypes).
minor_allele	string	Minor allele (equal to the ref allele when AF > 0.5, otherwise equal to the alt allele).
minor_AF	float	Minor allele frequency (calculated using hardcall genotypes).
p_hwe	float	Hardy-Weinberg equilibrium p-value.
n_called	int	Number of samples with a defined genotype at this variant.
n_not_called	int	Number of samples without a defined genotype at this variant.
n_hom_ref	int	Number of samples with a homozygous reference genotype at this variant.
n_het	int	Number of samples with a heterozygous genotype at this variant.
n_hom_var	int	Number of samples with a homozygous alternate genotype at this variant.
n_non_ref	int	Number of samples with a genotype other than homozygous reference at this variant (n_het + n_hom_var).
r_heterozygosity	float	Proportion of samples with a heterozygous genotype at this variant.
r_het_hom_var	float	Ratio of samples with a heterozygous genotype to samples with a homozygous alternate genotype at this variant.
r_expected_het_frequency	float	Expected r_heterozygosity under Hardy-Weinberg equilibrium.
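Several of these columns are simple functions of the hardcall genotype counts. The sketch below recomputes them from n_hom_ref, n_het, n_hom_var and n_not_called to make the definitions concrete. It assumes a biallelic, diploid variant, takes AF as AC / (2 * n_called), and takes the Hardy-Weinberg expected heterozygosity as 2 * AF * (1 - AF). These are readings of the definitions above, not code from the released pipeline, and the function name is hypothetical.

```python
from typing import Dict, Union

Number = Union[int, float, str]


def derive_variant_stats(variant: str, n_hom_ref: int, n_het: int,
                         n_hom_var: int, n_not_called: int) -> Dict[str, Number]:
    """Recompute hardcall-based annotation columns for one variant (sketch).

    Assumes a biallelic diploid variant whose ID has the form chr:pos:ref:alt.
    """
    chrom, pos, ref, alt = variant.split(":")
    n_called = n_hom_ref + n_het + n_hom_var
    if n_called == 0:
        raise ValueError("no called genotypes")
    ac = n_het + 2 * n_hom_var          # alt alleles among called samples
    af = ac / (2 * n_called)            # alt allele frequency
    return {
        "chr": chrom,
        "pos": int(pos),
        "call_rate": n_called / (n_called + n_not_called),
        "AC": ac,
        "AF": af,
        # The minor allele is ref when alt is the more common allele.
        "minor_allele": ref if af > 0.5 else alt,
        "minor_AF": min(af, 1 - af),
        "n_non_ref": n_het + n_hom_var,
        "r_heterozygosity": n_het / n_called,
        # Assumption: report infinity when no sample is homozygous alternate.
        "r_het_hom_var": n_het / n_hom_var if n_hom_var else float("inf"),
        # Assumed HWE expectation for heterozygosity: 2pq.
        "r_expected_het_frequency": 2 * af * (1 - af),
    }
```

For instance, 50 hom-ref, 40 het and 10 hom-alt samples give AC = 60 and AF = 0.3, so the alt allele is minor and the expected heterozygosity is 2 * 0.3 * 0.7 = 0.42, against an observed r_heterozygosity of 0.4.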


phenotypes.{both_sexes,female,male}.tsv.bgz
These files contain a description and summary of each phenotype included in the analysis.

Contents:
phenotype	string	Unique phenotype identifier. The format differs depending on the source of the phenotype.
description	string	Free-text description of the phenotype.
variable_type	string	{"categorical", "ordinal", "continuous_irnt", "continuous_raw"} Variable type. Each continuous variable has two versions: an untransformed version ("continuous_raw") and a version whose values have been inverse rank normalized ("continuous_irnt").
source	string	{"icd10", "finngen", "phesant"} Source of the phenotype. See "Phenotype sources" below.
n_non_missing	int	Number of samples within the analysis subset for which this phenotype is defined.
n_missing	int	Number of samples within the analysis subset for which this phenotype is not defined.
n_controls	int	For case/control phenotypes, number of control samples within the analysis subset.
n_cases	int	For case/control phenotypes, number of case samples within the analysis subset.
PHESANT_transformation	string	Transformations performed by PHESANT, for phenotypes processed with PHESANT.
notes	string	Any additional notes.

Analysis subset sizes:
both_sexes	361,194 samples
female	194,174 samples
male	167,020 samples
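The subset sizes above are additive (194,174 female + 167,020 male = 361,194 both_sexes), and by the column definitions n_non_missing + n_missing should equal the subset size for every phenotype. The sketch below uses both facts as sanity checks when loading a phenotypes file, then lists phenotypes by case count. The function names, the minimum-case threshold, and the set of strings treated as an empty count cell are assumptions for illustration.

```python
import csv
import gzip
from typing import Dict, List, Optional

# Analysis subset sizes from the list above; the two sexes sum to both_sexes.
SUBSET_SIZES = {"both_sexes": 361_194, "female": 194_174, "male": 167_020}
assert SUBSET_SIZES["female"] + SUBSET_SIZES["male"] == SUBSET_SIZES["both_sexes"]

# Assumed spellings of an empty count cell; adjust to what the file contains.
MISSING_MARKERS = {"", "NA", "NaN", "nan"}


def parse_count(cell: Optional[str]) -> Optional[int]:
    """Parse a count cell, returning None for an assumed missing marker."""
    if cell is None or cell.strip() in MISSING_MARKERS:
        return None
    return int(float(cell))  # tolerates counts written as e.g. "1234.0"


def load_phenotypes(path: str, subset: str) -> List[Dict[str, str]]:
    """Read phenotypes.<subset>.tsv.bgz and check each phenotype's sample total."""
    expected = SUBSET_SIZES[subset]
    with gzip.open(path, "rt", newline="") as f:
        rows = list(csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE))
    for row in rows:
        total = (parse_count(row["n_non_missing"]) or 0) + (parse_count(row["n_missing"]) or 0)
        if total != expected:
            raise ValueError(f"{row['phenotype']}: {total} samples, expected {expected}")
    return rows


def codes_with_min_cases(rows: List[Dict[str, str]], min_cases: int) -> List[str]:
    """Phenotype codes whose n_cases is defined and at least min_cases."""
    codes = []
    for row in rows:
        n_cases = parse_count(row.get("n_cases"))
        if n_cases is not None and n_cases >= min_cases:
            codes.append(row["phenotype"])
    return codes
```

For example, codes_with_min_cases(load_phenotypes("phenotypes.both_sexes.tsv.bgz", "both_sexes"), 100) would list case/control phenotypes with at least 100 cases; the threshold of 100 is arbitrary.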

Phenotype sources:
icd10	These phenotypes were generated from UK Biobank fields 41202-0.0 through 41202-0.379. For each sample, the set of ICD10 codes in these fields (truncated to the first three characters, e.g. "K50") was collected. Each ICD10 phenotype is a boolean indicating whether that code is in the sample's set.
finngen	These phenotypes were manually curated by collaborators in the FinnGen research project. Many are combinations of different ICD10 codes.
phesant	These phenotypes were automatically processed using a modified version of the PHESANT software (https://www.ncbi.nlm.nih.gov/pubmed/29040602).
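The icd10 definition amounts to truncating codes and testing set membership. The Python sketch below illustrates it, assuming each sample's codes from fields 41202-0.0 through 41202-0.379 have already been collected into a list (codes are written without a dot, e.g. "K501", purely for illustration). The input layout and the function name are hypothetical.

```python
from typing import Dict, Iterable, List


def icd10_phenotypes(sample_codes: Dict[str, List[str]],
                     targets: Iterable[str]) -> Dict[str, Dict[str, bool]]:
    """Return {three-character code: {sample_id: has_code}}.

    sample_codes maps a sample ID to the ICD10 codes collected from that
    sample's 41202 fields; empty entries are ignored.
    """
    # Truncate every code to its first three characters, e.g. "K501" -> "K50".
    truncated = {sample: {code[:3] for code in codes if code}
                 for sample, codes in sample_codes.items()}
    return {target: {sample: target in codes for sample, codes in truncated.items()}
            for target in targets}
```

For example, icd10_phenotypes({"s1": ["K501", "I10"], "s2": ["K512"], "s3": []}, ["K50"]) returns {"K50": {"s1": True, "s2": False, "s3": False}}: only s1 carries a code starting with K50.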


<phenotype_code>.gwas.imputed_v3.{both_sexes,female,male}.tsv.bgz
These are the GWAS results files (e.g., "K50.gwas.imputed_v3.both_sexes.tsv.bgz").

Contents:
variant	string	Variant identifier in the form "chr:pos:ref:alt", where "ref" is aligned to the forward strand of GRCh37 and "alt" is the effect allele (use this field to join with the variant annotation file).
minor_allele	string	The minor allele (the alt allele is not always the minor allele).
minor_AF	float	Frequency of the minor allele among the n_complete_samples samples defined for this phenotype.
expected_case_minor_AC	float	(Optional) For case/control phenotypes, calculated as (2 * minor_AF * n_cases).
expected_min_category_minor_AC	float	(Optional) For categorical phenotypes with fewer than 5 categories, calculated as (2 * minor_AF * number of samples in the smallest category).
low_confidence_variant	boolean	Flag indicating low-confidence results, based on the following heuristics: case/control phenotypes: expected_case_minor_AC < 25 or minor_AF < 0.001; categorical phenotypes with fewer than 5 categories: expected_min_category_minor_AC < 25 or minor_AF < 0.001; quantitative phenotypes: minor_AF < 0.001.
n_complete_samples	int	Number of samples for which this phenotype is defined.
AC	float	Allele count of the alt allele, calculated from dosages within the n_complete_samples samples.
ytx	float	Dot product of the phenotype vector y and the genotype vector x (for case/control phenotypes, the alt allele count in cases).
beta	float	Estimated effect size of the alt allele.
se	float	Estimated standard error of beta.
tstat	float	t-statistic of the beta estimate (= beta / se).
pval	float	p-value of the test of beta = 0.
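The derived columns above follow directly from their definitions. The sketch below restates them in Python: the expected minor allele count, the low_confidence_variant heuristics, ytx and tstat. It also shows one extra step that is not a column in the file: re-orienting beta to the minor allele. That step relies on the ref dosage being 2 minus the alt dosage, so the per-ref-allele effect is -beta with the same se and p-value. The function names and the phenotype_kind labels are hypothetical.

```python
from typing import Optional, Sequence

LOW_AF = 0.001         # minor_AF threshold from the heuristics above
LOW_EXPECTED_AC = 25   # expected minor allele count threshold


def expected_minor_ac(minor_af: float, n_samples: int) -> float:
    """2 * minor_AF * n, where n is n_cases or the smallest category size."""
    return 2 * minor_af * n_samples


def low_confidence_variant(minor_af: float, phenotype_kind: str,
                           n_smallest_group: Optional[int] = None) -> bool:
    """Restate the low_confidence_variant heuristics.

    phenotype_kind is a hypothetical label: "case_control",
    "categorical_lt5" (categorical with fewer than 5 categories) or
    "quantitative". n_smallest_group is n_cases or the size of the
    smallest category, respectively.
    """
    if minor_af < LOW_AF:
        return True
    if phenotype_kind in ("case_control", "categorical_lt5"):
        if n_smallest_group is None:
            raise ValueError("n_smallest_group is required for this phenotype kind")
        return expected_minor_ac(minor_af, n_smallest_group) < LOW_EXPECTED_AC
    if phenotype_kind == "quantitative":
        return False
    raise ValueError(f"unknown phenotype_kind: {phenotype_kind}")


def ytx(y: Sequence[float], x: Sequence[float]) -> float:
    """Dot product of phenotype y and alt-allele dosage x."""
    if len(y) != len(x):
        raise ValueError("y and x must have the same length")
    return sum(yi * xi for yi, xi in zip(y, x))


def tstat(beta: float, se: float) -> float:
    """t-statistic as defined above: beta / se."""
    return beta / se


def beta_for_minor_allele(beta: float, variant: str, minor_allele: str) -> float:
    """Express beta per copy of the minor allele.

    ref dosage = 2 - alt dosage, so when the minor allele is ref the
    effect flips sign; se and pval are unchanged.
    """
    alt = variant.split(":")[3]
    return beta if minor_allele == alt else -beta
```

For instance, with n_cases = 1,000 a variant at minor_AF = 0.01 has an expected case minor allele count of 2 * 0.01 * 1,000 = 20, below the threshold of 25, so it is flagged even though it passes the minor_AF cutoff.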