A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | ||||||||||||||||||||||||||
2 | All data has been moved to AWS. Files on Dropbox are no longer available. | |||||||||||||||||||||||||
3 | ||||||||||||||||||||||||||
4 | This README describes the files available for download as part of the Neale Lab GWAS of UK Biobank phenotypes. | |||||||||||||||||||||||||
5 | For a description of the project and details of the analysis, please see http://www.nealelab.is/uk-biobank | |||||||||||||||||||||||||
6 | To download GWAS results, see the links in the manifest tab below. Each column header in the manifest has a triangle; click it to open search options for that column. Once you have found the phenotype code you are looking for, use the command in the "wget command" column to download the corresponding results file. | |||||||||||||||||||||||||
7 | The code used to generate the files described here is publicly available: https://github.com/Nealelab/UK_Biobank_GWAS. | |||||||||||||||||||||||||
8 | Questions or concerns not addressed by this README, the project website, our FAQs (http://www.nealelab.is/faq), or the GitHub repository can be directed to nealelab.ukb@gmail.com. | |||||||||||||||||||||||||
9 | Note: Between the biomarker release and 8/8/2019, the manifest had incorrect descriptions of the phenotype corresponding to the phenotype code for 10 files. This is now fixed. | |||||||||||||||||||||||||
10 | ||||||||||||||||||||||||||
11 | ||||||||||||||||||||||||||
12 | variants.tsv.bgz | |||||||||||||||||||||||||
13 | This file contains annotations on each variant in the GWAS, calculated across the analysis subset of 361,194 samples. | |||||||||||||||||||||||||
14 | NOTE: The order of variants in this file matches the order of variants in the results files described below. To join these annotations with a results file, either match on the "variant" field or simply paste the columns together (e.g. "paste variants.tsv K50.gwas.imputed_v3.both_sexes.tsv"). | |||||||||||||||||||||||||
15 | ||||||||||||||||||||||||||
16 | Contents: | |||||||||||||||||||||||||
17 | variant | string | Variant identifier in the form "chr:pos:ref:alt", where "ref" is aligned to the forward strand. | |||||||||||||||||||||||
18 | chr | string | Chromosome of the variant. | |||||||||||||||||||||||
19 | pos | int | Position of the variant in GRCh37 coordinates. | |||||||||||||||||||||||
20 | ref | string | Reference allele on the forward strand. | |||||||||||||||||||||||
21 | alt | string | Alternate allele (not necessarily minor allele). | |||||||||||||||||||||||
22 | rsid | string | rsid (not guaranteed to be unique). | |||||||||||||||||||||||
23 | varid | string | Unique variant identifier included in imputed BGEN files. | |||||||||||||||||||||||
24 | consequence | string | Consequence annotated using VEP version 85. | |||||||||||||||||||||||
25 | consequence_category | string | Category of VEP-annotated consequence ("ptv", "missense", "synonymous", "non_coding"). | |||||||||||||||||||||||
26 | info | float | Imputation INFO score as provided by UK Biobank. | |||||||||||||||||||||||
27 | call_rate | float | Call rate (calculated using hardcall genotypes). | |||||||||||||||||||||||
28 | AC | int | Allele count (calculated using hardcall genotypes). | |||||||||||||||||||||||
29 | AF | float | Allele frequency (calculated using hardcall genotypes). | |||||||||||||||||||||||
30 | minor_allele | string | Minor allele (equal to ref allele when AF > 0.5, otherwise equal to alt allele). | |||||||||||||||||||||||
31 | minor_AF | float | Minor allele frequency (calculated using hardcall genotypes). | |||||||||||||||||||||||
32 | p_hwe | float | Hardy-Weinberg p-value. | |||||||||||||||||||||||
33 | n_called | int | Number of samples with defined genotype at this variant. | |||||||||||||||||||||||
34 | n_not_called | int | Number of samples without a defined genotype at this variant. | |||||||||||||||||||||||
35 | n_hom_ref | int | Number of samples with homozygous reference genotype at this variant. | |||||||||||||||||||||||
36 | n_het | int | Number of samples with heterozygous genotype at this variant. | |||||||||||||||||||||||
37 | n_hom_var | int | Number of samples with homozygous alternate genotype at this variant. | |||||||||||||||||||||||
38 | n_non_ref | int | Number of samples with a genotype other than homozygous reference at this variant (n_het + n_hom_var). | |||||||||||||||||||||||
39 | r_heterozygosity | float | Proportion of samples with heterozygous genotype at this variant. | |||||||||||||||||||||||
40 | r_het_hom_var | float | Ratio of samples with heterozygous genotype to samples with homozygous alternate genotype at this variant. | |||||||||||||||||||||||
41 | r_expected_het_frequency | float | Expected r_heterozygosity based on Hardy-Weinberg equilibrium. | |||||||||||||||||||||||
42 | ||||||||||||||||||||||||||
43 | ||||||||||||||||||||||||||
44 | ||||||||||||||||||||||||||
45 | ||||||||||||||||||||||||||
46 | phenotypes.{both_sexes,female,male}.tsv.bgz | |||||||||||||||||||||||||
47 | These files contain a description and summary of each phenotype included in the analysis. | |||||||||||||||||||||||||
48 | ||||||||||||||||||||||||||
49 | Contents: | |||||||||||||||||||||||||
50 | phenotype | string | Unique phenotype identifier. Format differs depending on the source of the phenotype. | |||||||||||||||||||||||
51 | description | string | Free text description of the phenotype. | |||||||||||||||||||||||
52 | variable_type | string | {"categorical", "ordinal", "continuous_irnt", "continuous_raw"} Variable type. Each continuous variable has two versions: an untransformed version ("continuous_raw") and a version where values have been inverse rank normalized ("continuous_irnt"). | |||||||||||||||||||||||
53 | source | string | {"icd10", "finngen", "phesant"} Source of the phenotype. See notes below. | |||||||||||||||||||||||
54 | n_non_missing | int | Number of samples within the analysis subset defined for this phenotype. | |||||||||||||||||||||||
55 | n_missing | int | Number of samples within the analysis subset not defined for this phenotype. | |||||||||||||||||||||||
56 | n_controls | int | For case/control phenotypes, number of control samples within the analysis subset. | |||||||||||||||||||||||
57 | n_cases | int | For case/control phenotypes, number of case samples within the analysis subset. | |||||||||||||||||||||||
58 | PHESANT_transformation | string | This field describes the transformations performed by PHESANT for the applicable phenotypes. | |||||||||||||||||||||||
59 | notes | string | Any additional notes. | |||||||||||||||||||||||
60 | ||||||||||||||||||||||||||
61 | Analysis subset sizes: | |||||||||||||||||||||||||
62 | both_sexes | 361,194 samples | ||||||||||||||||||||||||
63 | female | 194,174 samples | ||||||||||||||||||||||||
64 | male | 167,020 samples | ||||||||||||||||||||||||
65 | ||||||||||||||||||||||||||
66 | Phenotype sources: | |||||||||||||||||||||||||
67 | icd10 | These phenotypes were generated from UK Biobank fields 41202-0.0 - 41202-0.379. For each sample, the set of ICD10 codes (truncated to the first three characters, e.g. "K50") included in these fields was collected. The ICD10 phenotypes are booleans indicating whether the ICD10 code is included in that set of codes for each sample. | ||||||||||||||||||||||||
68 | finngen | These phenotypes were manually curated by collaborators in the FinnGen research project. Many are combinations of different ICD10 codes. | ||||||||||||||||||||||||
69 | phesant | These phenotypes were automatically processed using a modified version of the software PHESANT (https://www.ncbi.nlm.nih.gov/pubmed/29040602). | ||||||||||||||||||||||||
70 | ||||||||||||||||||||||||||
71 | ||||||||||||||||||||||||||
72 | <phenotype_code>.gwas.imputed_v3.{both_sexes,female,male}.tsv.bgz | |||||||||||||||||||||||||
73 | These are the GWAS results files (e.g., "K50.gwas.imputed_v3.both_sexes.tsv.bgz"). | |||||||||||||||||||||||||
74 | ||||||||||||||||||||||||||
75 | Contents: | |||||||||||||||||||||||||
76 | variant | string | Variant identifier in the form "chr:pos:ref:alt", where "ref" is aligned to the forward strand of GRCh37 and "alt" is the effect allele (use this to join with variant annotation file). | |||||||||||||||||||||||
77 | minor_allele | string | The minor allele (alt allele is not always minor). | |||||||||||||||||||||||
78 | minor_AF | float | Frequency of the minor allele in the n_complete_samples defined for this phenotype. | |||||||||||||||||||||||
79 | expected_case_minor_AC | float | (Optional) For case/control phenotypes, calculated as (2 * minor_AF * n_cases). | |||||||||||||||||||||||
80 | expected_min_category_minor_AC | float | (Optional) For categorical phenotypes with fewer than 5 categories, calculated as (2 * minor_AF * number of samples in smallest category). | |||||||||||||||||||||||
81 | low_confidence_variant | boolean | Flag indicating low-confidence results based on the following heuristics: (1) case/control phenotypes: expected_case_minor_AC < 25 or minor_AF < 0.001; (2) categorical phenotypes with fewer than 5 categories: expected_min_category_minor_AC < 25 or minor_AF < 0.001; (3) quantitative phenotypes: minor_AF < 0.001. | |||||||||||||||||||||||
82 | n_complete_samples | int | Number of samples defined for this phenotype. | |||||||||||||||||||||||
83 | AC | float | Allele count of alt allele calculated on dosages within n_complete_samples. | |||||||||||||||||||||||
84 | ytx | float | Dot product of phenotype vector y and genotype vector x (alt allele count in cases for case/control phenotypes). | |||||||||||||||||||||||
85 | beta | float | Estimated effect size of alt allele. | |||||||||||||||||||||||
86 | se | float | Estimated standard error of beta. | |||||||||||||||||||||||
87 | tstat | float | t-statistic of beta estimate (= beta/se). | |||||||||||||||||||||||
88 | pval | float | p-value of beta significance test. | |||||||||||||||||||||||
89 | ||||||||||||||||||||||||||
90 | ||||||||||||||||||||||||||
91 | ||||||||||||||||||||||||||
92 | ||||||||||||||||||||||||||
93 | ||||||||||||||||||||||||||
94 | ||||||||||||||||||||||||||
95 | ||||||||||||||||||||||||||
96 | ||||||||||||||||||||||||||
97 | ||||||||||||||||||||||||||
98 | ||||||||||||||||||||||||||
99 | ||||||||||||||||||||||||||
100 |
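
Example: working with the files in Python

The sketch below illustrates a few of the definitions in this README using only the Python standard library. It is not part of the Neale Lab pipeline. All function names (read_tsv_bgz, parse_variant, minor_allele, expected_minor_ac, is_low_confidence, join_in_order) are hypothetical helpers introduced here for illustration. It assumes the .bgz files are tab-separated with a header row and that the annotation and results files list variants in the same order, as noted above.

```python
import csv
import gzip
from typing import Dict, Iterable, Iterator, Optional, Tuple


def read_tsv_bgz(path: str) -> Iterator[Dict[str, str]]:
    """Yield the rows of a tab-separated .bgz file as dicts of strings."""
    # BGZF (.bgz) is a sequence of gzip blocks, so the gzip module can read it.
    with gzip.open(path, "rt", newline="") as handle:
        yield from csv.DictReader(handle, delimiter="\t")


def parse_variant(variant: str) -> Tuple[str, int, str, str]:
    """Split a "chr:pos:ref:alt" identifier into (chr, pos, ref, alt)."""
    chrom, pos, ref, alt = variant.split(":")
    return chrom, int(pos), ref, alt


def minor_allele(ref: str, alt: str, af: float) -> str:
    """Return ref when the alt allele frequency AF > 0.5, otherwise alt."""
    return ref if af > 0.5 else alt


def expected_minor_ac(minor_af: float, n_samples: int) -> float:
    """Compute 2 * minor_AF * n, where n is n_cases or the smallest category size."""
    return 2 * minor_af * n_samples


def is_low_confidence(minor_af: float, expected_ac: Optional[float] = None) -> bool:
    """Apply the low_confidence_variant heuristics described in this README.

    Pass expected_ac for case/control phenotypes and for categorical
    phenotypes with fewer than 5 categories; omit it for quantitative ones.
    """
    if minor_af < 0.001:
        return True
    return expected_ac is not None and expected_ac < 25


def join_in_order(
    annotations: Iterable[Dict[str, str]],
    results: Iterable[Dict[str, str]],
) -> Iterator[Dict[str, str]]:
    """Join two row streams that share the same variant order.

    This is the "paste" approach with a safety check on the "variant" field.
    Columns present in both files (e.g. minor_AF) take the results-file value.
    Note that zip() stops at the shorter input without raising an error.
    """
    for annotation, result in zip(annotations, results):
        if annotation["variant"] != result["variant"]:
            raise ValueError(
                f"variant order mismatch: {annotation['variant']} "
                f"vs {result['variant']}"
            )
        yield {**annotation, **result}
```

For instance, join_in_order(read_tsv_bgz("variants.tsv.bgz"), read_tsv_bgz("K50.gwas.imputed_v3.both_sexes.tsv.bgz")) would stream the joined rows without loading either file into memory, assuming both files have been downloaded to the working directory. All values come back as strings, so numeric fields such as minor_AF need to be converted with float() before they are passed to the helpers above.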