All data has been moved to AWS. Files on DROPBOX are no longer available.
This README describes the files available for download as part of the Neale Lab GWAS of UK Biobank phenotypes.
For a description of the project and details of the analysis, please see http://www.nealelab.is/uk-biobank
To download GWAS results, use the links in the manifest tab below. Each column header in the manifest has a triangle; clicking it opens search options for that column. Once you have found the phenotype code you are looking for, use the command in the "wget command" column of that row to download the corresponding results file.
The code used to generate the files described here is publicly available: https://github.com/Nealelab/UK_Biobank_GWAS.
Questions or concerns not addressed by this README, the project website, our FAQs (http://www.nealelab.is/faq), or the GitHub repository can be sent to nealelab.ukb@gmail.com.
Note: Between the biomarker release and 8/8/2019, the manifest listed incorrect phenotype descriptions for the phenotype codes of 10 files. This has been fixed.
variants.tsv.bgz
This file contains annotations on each variant in the GWAS, calculated across the analysis subset of 361,194 samples.
NOTE: The order of variants in this file matches the order of variants in the results files described below. To join these annotations with a results file, either match on the "variant" field or paste the columns together after decompressing both files (e.g. "paste variants.tsv K50.gwas.imputed_v3.both_sexes.tsv").
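The row-by-row join described above can also be done without decompressing to disk. The Python sketch below assumes both files are tab-separated with a header line containing a "variant" column; it relies on the fact that BGZF (.bgz) files are a series of concatenated gzip members, which Python's gzip module reads directly. The function name join_annotations is hypothetical, and the code checks the "variant" field on every row rather than trusting the order blindly.

```python
import csv
import gzip
from itertools import zip_longest
from typing import Iterator


def join_annotations(variants_path: str, results_path: str) -> Iterator[dict]:
    """Yield one merged row per variant, pairing the two files line by line.

    Relies on both files listing variants in the same order, and checks the
    "variant" field on every row so that a mismatch fails loudly.
    """
    # gzip.open can read .bgz (BGZF) files: BGZF is a series of gzip members.
    with gzip.open(variants_path, "rt", newline="") as vf, \
            gzip.open(results_path, "rt", newline="") as rf:
        v_rows = csv.DictReader(vf, delimiter="\t", quoting=csv.QUOTE_NONE)
        r_rows = csv.DictReader(rf, delimiter="\t", quoting=csv.QUOTE_NONE)
        for v_row, r_row in zip_longest(v_rows, r_rows):
            if v_row is None or r_row is None:
                raise ValueError("files contain different numbers of variants")
            if v_row["variant"] != r_row["variant"]:
                raise ValueError(
                    f"order mismatch: {v_row['variant']} vs {r_row['variant']}"
                )
            # Columns present in both files (e.g. minor_allele, minor_AF, AC)
            # take the phenotype-specific value from the results file.
            yield {**v_row, **r_row}
```

Because it streams both files, memory use stays constant regardless of the number of variants.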
Contents:
Field | Type | Description
variant | string | Variant identifier in the form "chr:pos:ref:alt", where "ref" is aligned to the forward strand.
chr | string | Chromosome of the variant.
pos | int | Position of the variant in GRCh37 coordinates.
ref | string | Reference allele on the forward strand.
alt | string | Alternate allele (not necessarily the minor allele).
rsid | string | rsID (not guaranteed to be unique).
varid | string | Unique variant identifier included in the imputed BGEN files.
consequence | string | Consequence annotated using VEP version 85.
consequence_category | string | Category of the VEP-annotated consequence ("ptv", "missense", "synonymous", "non_coding").
info | float | Imputation INFO score as provided by UK Biobank.
call_rate | float | Call rate (calculated using hardcall genotypes).
AC | int | Allele count (calculated using hardcall genotypes).
AF | float | Allele frequency (calculated using hardcall genotypes).
minor_allele | string | Minor allele (equal to the ref allele when AF > 0.5, otherwise equal to the alt allele).
minor_AF | float | Minor allele frequency (calculated using hardcall genotypes).
p_hwe | float | Hardy-Weinberg equilibrium p-value.
n_called | int | Number of samples with a defined genotype at this variant.
n_not_called | int | Number of samples without a defined genotype at this variant.
n_hom_ref | int | Number of samples with a homozygous reference genotype at this variant.
n_het | int | Number of samples with a heterozygous genotype at this variant.
n_hom_var | int | Number of samples with a homozygous alternate genotype at this variant.
n_non_ref | int | Number of samples with a non-homozygous-reference genotype at this variant (n_het + n_hom_var).
r_heterozygosity | float | Proportion of samples with a heterozygous genotype at this variant.
r_het_hom_var | float | Ratio of samples with a heterozygous genotype to samples with a homozygous alternate genotype at this variant.
r_expected_het_frequency | float | Expected r_heterozygosity under Hardy-Weinberg equilibrium.
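To show how several of the count-based fields relate, here is a minimal Python sketch that parses a variant identifier and recomputes those fields from hardcall genotype counts. It assumes biallelic variants and that AF is the alt allele frequency (implied by the minor_allele rule above). The formulas AF = AC / (2 * n_called) and r_expected_het_frequency = 2 * AF * (1 - AF) are the standard definitions, assumed here rather than taken from the pipeline code; the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_variant(variant_id: str) -> Variant:
    """Split a "chr:pos:ref:alt" identifier into its parts."""
    chrom, pos, ref, alt = variant_id.split(":")
    return Variant(chrom, int(pos), ref, alt)


def derived_fields(v: Variant, n_hom_ref: int, n_het: int, n_hom_var: int) -> dict:
    """Recompute count-based annotation fields from hardcall genotype counts."""
    n_called = n_hom_ref + n_het + n_hom_var
    ac = n_het + 2 * n_hom_var      # alt allele count
    af = ac / (2 * n_called)        # alt allele frequency (assumed definition)
    return {
        "n_called": n_called,
        "n_non_ref": n_het + n_hom_var,
        "AC": ac,
        "AF": af,
        "minor_allele": v.ref if af > 0.5 else v.alt,
        "minor_AF": min(af, 1 - af),
        "r_heterozygosity": n_het / n_called,
        # NaN when there are no hom-alt samples (an assumption; the
        # published files may encode this case differently).
        "r_het_hom_var": n_het / n_hom_var if n_hom_var else float("nan"),
        # Expected heterozygosity under Hardy-Weinberg equilibrium: 2pq.
        "r_expected_het_frequency": 2 * af * (1 - af),
    }
```

For example, 60 hom-ref, 30 het, and 10 hom-alt samples give AC = 50 and AF = 0.25, so the alt allele is the minor allele and the expected heterozygosity is 0.375, compared with an observed 0.3.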
phenotypes.{both_sexes,female,male}.tsv.bgz
These files contain a description and summary of each phenotype included in the analysis.
Contents:
Field | Type | Description
phenotype | string | Unique phenotype identifier. The format differs depending on the source of the phenotype.
description | string | Free-text description of the phenotype.
variable_type | string | Variable type, one of {"categorical", "ordinal", "continuous_irnt", "continuous_raw"}. Each continuous variable has two versions: an untransformed version ("continuous_raw") and a version where values have been inverse rank normalized ("continuous_irnt").
source | string | Source of the phenotype, one of {"icd10", "finngen", "phesant"}. See the phenotype sources below.
n_non_missing | int | Number of samples within the analysis subset for which this phenotype is defined.
n_missing | int | Number of samples within the analysis subset for which this phenotype is not defined.
n_controls | int | For case/control phenotypes, the number of control samples within the analysis subset.
n_cases | int | For case/control phenotypes, the number of case samples within the analysis subset.
PHESANT_transformation | string | Transformations performed by PHESANT, for the applicable phenotypes.
notes | string | Any additional notes.
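The "continuous_irnt" versions apply an inverse rank normal transformation: values are replaced by standard normal quantiles of their ranks. The Python sketch below illustrates the idea under an assumed (rank - 0.5) / n offset with average ranks for ties; the exact offset and tie handling used in the pipeline are not stated in this README, so treat the output as illustrative rather than a reproduction of the published values. irnt is a hypothetical helper name.

```python
from statistics import NormalDist


def irnt(values: list[float]) -> list[float]:
    """Map values to standard normal quantiles of their ranks.

    Assumes the (rank - 0.5) / n offset; tied values share their average rank.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the run of values tied with sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]
```

Because only ranks are used, the transformed values keep the ordering of the raw values but are approximately standard normal regardless of the original distribution's skew or outliers.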
Analysis subset sizes:
Subset | Samples
both_sexes | 361,194
female | 194,174
male | 167,020
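The female and male subsets together make up the both_sexes subset (194,174 + 167,020 = 361,194), and because n_non_missing and n_missing count the samples for which a phenotype is and is not defined, they should sum to the subset size. A small sanity-check sketch for rows of a phenotypes file; check_counts is a hypothetical helper, and the counts are assumed to be already parsed as integers.

```python
# Analysis subset sizes as listed above.
SUBSET_SIZES = {"both_sexes": 361_194, "female": 194_174, "male": 167_020}

# The two sex-specific subsets partition the full analysis subset.
assert SUBSET_SIZES["female"] + SUBSET_SIZES["male"] == SUBSET_SIZES["both_sexes"]


def check_counts(subset: str, n_non_missing: int, n_missing: int) -> bool:
    """Return True if a phenotype row's counts account for the whole subset."""
    return n_non_missing + n_missing == SUBSET_SIZES[subset]
```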
Phenotype sources:
icd10 | These phenotypes were generated from UK Biobank fields 41202-0.0 to 41202-0.379. For each sample, the set of ICD10 codes (truncated to the first three characters, e.g. "K50") included in these fields was collected. Each ICD10 phenotype is a boolean indicating whether that ICD10 code is included in a sample's set of codes.
finngen | These phenotypes were manually curated by collaborators in the FinnGen research project. Many are combinations of different ICD10 codes.
phesant | These phenotypes were automatically processed using a modified version of the software PHESANT (https://www.ncbi.nlm.nih.gov/pubmed/29040602).
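The icd10 definition above amounts to a set-membership test per sample. A minimal Python sketch, assuming each sample's diagnoses are available as a list of raw ICD10 code strings taken from the 41202 array slots, with empty strings for unused slots; the function names and example codes are illustrative only.

```python
def icd10_code_set(raw_codes: list[str]) -> set[str]:
    """Collect the three-character ICD10 prefixes present for one sample."""
    # Skip empty entries from unused array slots, then truncate to 3 chars.
    return {code[:3] for code in raw_codes if code}


def icd10_phenotype(samples: dict[str, list[str]], code: str) -> dict[str, bool]:
    """Boolean case status per sample for one three-character ICD10 code."""
    return {sid: code in icd10_code_set(codes) for sid, codes in samples.items()}
```

For example, a sample with raw codes "K500" and "I10" has the code set {"K50", "I10"} and is therefore a case for the K50 phenotype.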
<phenotype_code>.gwas.imputed_v3.{both_sexes,female,male}.tsv.bgz
73
These are the GWAS results files (e.g., "K50.gwas.imputed_v3.both_sexes.tsv.bgz").
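Following the naming pattern above, a small helper can build the expected file name for a phenotype code and subset, for example to cross-check entries in the manifest. This is a hypothetical convenience function; download URLs should still be taken from the manifest's "wget command" column rather than constructed by hand.

```python
SEXES = ("both_sexes", "female", "male")


def results_filename(phenotype_code: str, sex: str = "both_sexes") -> str:
    """Build the results file name for a phenotype code and analysis subset."""
    if sex not in SEXES:
        raise ValueError(f"sex must be one of {SEXES}, got {sex!r}")
    return f"{phenotype_code}.gwas.imputed_v3.{sex}.tsv.bgz"
```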
Contents:
Field | Type | Description
variant | string | Variant identifier in the form "chr:pos:ref:alt", where "ref" is aligned to the forward strand of GRCh37 and "alt" is the effect allele (use this field to join with the variant annotation file).
minor_allele | string | The minor allele (the alt allele is not always the minor allele).
minor_AF | float | Frequency of the minor allele among the n_complete_samples defined for this phenotype.
expected_case_minor_AC | float | (Optional) For case/control phenotypes, calculated as 2 * minor_AF * n_cases.
expected_min_category_minor_AC | float | (Optional) For categorical phenotypes with fewer than 5 categories, calculated as 2 * minor_AF * (number of samples in the smallest category).
low_confidence_variant | boolean | Flag indicating low-confidence results, based on the following heuristics:
- Case/control phenotypes: expected_case_minor_AC < 25 or minor_AF < 0.001.
- Categorical phenotypes with fewer than 5 categories: expected_min_category_minor_AC < 25 or minor_AF < 0.001.
- Quantitative phenotypes: minor_AF < 0.001.
n_complete_samples | int | Number of samples for which this phenotype is defined.
AC | float | Allele count of the alt allele, calculated on dosages within n_complete_samples.
ytx | float | Dot product of the phenotype vector y and the genotype vector x (for case/control phenotypes, the alt allele count in cases).
beta | float | Estimated effect size of the alt allele.
se | float | Estimated standard error of beta.
tstat | float | t-statistic of the beta estimate (= beta / se).
pval | float | p-value from the significance test of beta.
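The Python sketch below restates the low_confidence_variant heuristics and the relationships between the statistics columns, using the formulas and thresholds given in the table. The phenotype-kind labels and function names are hypothetical. The p-value helper uses a two-sided normal approximation to the t-test, which is an assumption (reasonable only for large n_complete_samples) and not the exact test used to produce pval. The last helper re-expresses beta per copy of the minor allele: since beta refers to the alt allele, its sign flips when the minor allele is the ref allele.

```python
from statistics import NormalDist
from typing import Optional


def low_confidence(minor_af: float, phenotype_kind: str,
                   n_cases: Optional[int] = None,
                   n_smallest_category: Optional[int] = None) -> bool:
    """Apply the low_confidence_variant heuristics from the table.

    phenotype_kind is a hypothetical label: "case_control",
    "categorical_lt5" (fewer than 5 categories), or "quantitative".
    """
    if minor_af < 0.001:
        return True
    if phenotype_kind == "case_control":
        expected_case_minor_ac = 2 * minor_af * n_cases
        return expected_case_minor_ac < 25
    if phenotype_kind == "categorical_lt5":
        expected_min_category_minor_ac = 2 * minor_af * n_smallest_category
        return expected_min_category_minor_ac < 25
    if phenotype_kind == "quantitative":
        return False  # only the minor_AF threshold applies
    raise ValueError(f"unknown phenotype kind: {phenotype_kind!r}")


def approx_pval(beta: float, se: float) -> float:
    """Two-sided p-value for tstat = beta / se (normal approximation)."""
    tstat = beta / se
    return 2 * NormalDist().cdf(-abs(tstat))


def beta_for_minor_allele(beta: float, variant: str, minor_allele: str) -> float:
    """Express beta per copy of the minor allele instead of the alt allele."""
    alt = variant.split(":")[3]
    return beta if minor_allele == alt else -beta
```

For example, a case/control variant with minor_AF = 0.01 and 1,000 cases has expected_case_minor_AC = 20 and is flagged, while the same variant with 2,000 cases (expected count 40) is not.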