GitHub Issue | Name | DOI | Format | Meta Dataset | Annotations | HF Datasets | Citations (2-22-22) | Year | Official Splits | Paper | Dataset Download URL | Task Types | Domain | License | Dead Source Dataset Link | Languages | Multilingual | Sources | Description | Annotators
https://github.com/bigscience-workshop/biomedical/issues/13 | BioCreative V: BC5CDR | 10.1093/database/baw068 | BioC | BLUE | Manual | No | 332 | 2015 | train,dev,test | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4860626/ | https://biocreative.bioinformatics.udel.edu/resources/corpora/biocreative-v-cdr-corpus/ | NER/NED, RE | Biomedical | Public Domain |  | English | No | PubMed Abstracts
https://github.com/bigscience-workshop/biomedical/issues/114 | ChEBI |  | BRAT |  | Manual | No | 7 | 2018 | NONE | https://aclanthology.org/L18-1042/ | http://www.nactem.ac.uk/chebi/ | NER, RE | Biomedical | CC BY 4.0 |  | English | No | Abstracts + Full Papers
https://github.com/bigscience-workshop/biomedical/issues/14 | AnatEM | 10.1093/bioinformatics/btt580 | CoNLL, Standoff |  | Manual | No | 53 | 2013 | train,dev,test | https://academic.oup.com/bioinformatics/article/30/6/868/285282 | http://nactem.ac.uk/anatomytagger/#AnatEM | NER | Biomedical | CC BY-SA 3.0 |  | English | No | PubMed abstracts, PMC OA full texts | Anatomical entity mention recognition
https://github.com/bigscience-workshop/biomedical/issues/206 | AnEM | - | CoNLL | AnatEM | Manual | No | 82 | 2012 | train,dev,test | https://aclanthology.org/W12-4304/ | http://www.nactem.ac.uk/anatomy/ | NER | Biomedical | CC BY-SA 3.0 |  | English | No
https://github.com/bigscience-workshop/biomedical/issues/15 | JNLPBA | - | CoNLL | GENIA | Manual | https://huggingface.co/datasets/jnlpba | 41 | 2004 | train,test | https://aclanthology.org/W04-1213/ | http://www.geniaproject.org/shared-tasks/bionlp-jnlpba-shared-task-2004 | NER | Biomedical | CC BY NC 3.0 |  | English | No | PubMed abstracts | Biomedical NER
https://github.com/bigscience-workshop/biomedical/issues/16 | MuchMore | - | XML | - | Model | No | -1 | 2001 | NONE | ? | https://muchmore.dfki.de/resources1.htm | NER/NED, POS | Biomedical | ? |  | English, German | Yes
https://github.com/bigscience-workshop/biomedical/issues/261 | BioASQ Task A | 10.1186/s12859-015-0564-6 | JSON | - | Manual | No | 364 | 2013-2021 | train,test | https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0564-6 | http://participants-area.bioasq.org/general_information/Task9b/ | Topic Classification | Biomedical | DUA |  | English | No | PubMed abstracts
https://github.com/bigscience-workshop/biomedical/issues/17 | BioASQ Task B | 10.1186/s12859-015-0564-6 | JSON | - | Manual | No | 364 | 2014-2020 | train,test | https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0564-6 | http://participants-area.bioasq.org/general_information/Task9b/ | QA | Biomedical | DUA |  | English | No | PubMed abstracts
https://github.com/bigscience-workshop/biomedical/issues/208 | BioASQ Task MESINESP / MESINESP2 | 10.5281/zenodo.5602914 | JSON | - | Manual | No | 0 | 2020-2021 | train,dev,test | http://ceur-ws.org/Vol-2936/paper-11.pdf | https://zenodo.org/record/5602914#.YhSXJ5PMKWt | Topic Classification | Biomedical | CC BY 4.0 |  | Spanish | No
https://github.com/bigscience-workshop/biomedical/issues/209 | BioASQ Task C 2017 | 10.18653/v1/W17-2306 | JSON | - | Manual | No | 52 | 2017 | train,test | https://aclanthology.org/W17-2306.pdf | http://participants-area.bioasq.org/general_information/Task5c/ | NER | Biomedical | NLM License Code: 8283NLM123 |  | English | No | PubMed abstracts, PMC
https://github.com/bigscience-workshop/biomedical/issues/210 | BioASQ Task Synergy |  | JSON | - | Manual | No | 0 | 2022 | train,test | http://ceur-ws.org/Vol-2936/paper-10.pdf | http://participants-area.bioasq.org/general_information/Task9b/ | QA | Biomedical | NLM License Code: 8283NLM123 |  | English | No
https://github.com/bigscience-workshop/biomedical/issues/18 | BioCreative II: Gene Mention (GM) | 10.1186/gb-2008-9-s2-s2 | CoNLL |  | Manual | https://huggingface.co/datasets/bc2gm_corpus | 388 | 2008 | train,dev,test | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2559986/ | https://github.com/spyysalo/bc2gm-corpus/raw/master/conll/ | NER | Biomedical | MIT License |  | English | No | MEDLINE articles
https://github.com/bigscience-workshop/biomedical/issues/211 | BioCreative II: Gene Normalization (GN) |  | Standoff |  | Manual | No | 377 | 2008 | train,test | https://link.springer.com/article/10.1186/gb-2008-9-s2-s3 | https://biocreative.bioinformatics.udel.edu/resources/corpora/biocreative-ii-corpus/ | NED | Biomedical |  |  | English | No | MEDLINE articles
https://github.com/bigscience-workshop/biomedical/issues/212 | GENETAG | 10.1186/1471-2105-6-S1-S3 | text |  | Model | No | 298 | 2005 | train,test | https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-6-S1-S3 | https://github.com/openbiocorpora/genetag | NER | Biomedical | Public Domain |  | English | No
https://github.com/bigscience-workshop/biomedical/issues/213 | AIMed | 10.1016/j.artmed.2004.07.016 | text |  | Manual | No | 486 | 2004 | NONE | https://www.cs.utexas.edu/~ml/papers/bionlp-aimed-04.pdf | https://www.cs.utexas.edu/ftp/mooney/bio-data/ | NER | Biomedical | ? |  | English | No | PubMed abstracts | genes,proteins
https://github.com/bigscience-workshop/biomedical/issues/214 | BioInfer | 10.1186/1471-2105-8-50 | CSV, XML |  | Manual | No | 503 | 2007 | train,test | https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-8-50 | https://github.com/metalrt/ppi-dataset/tree/master/csv_output | RE, NER | Biomedical | ? | Yes | English | No | PubMed abstracts |  | 6 annotators (2 of them experts)
https://github.com/bigscience-workshop/biomedical/issues/215 | HPRD50 | 10.1093/bioinformatics/btl616 | CSV, XML |  | Model-assisted Manual | No | 716 | 2007 | train,test | https://academic.oup.com/bioinformatics/article/23/3/365/236564 | https://github.com/metalrt/ppi-dataset/tree/master/csv_output | RE | Biomedical | ? |  | English | No |  |  | 2 annotators (with biochemical background)
https://github.com/bigscience-workshop/biomedical/issues/216 | IEPA |  | CSV, XML |  | Model-assisted Manual | No | 348 | 2002 | train,test | http://psb.stanford.edu/psb-online/proceedings/psb02/abstracts/p326.html | https://github.com/metalrt/ppi-dataset/tree/master/csv_output | Topic Classification | Biomedical | ? | Yes | English | No | PubMed abstracts
https://github.com/bigscience-workshop/biomedical/issues/217 | LLL |  | CSV, XML |  | Manual | No | 287 | 2005 | train,test | http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=4B3F165F09189F5487A59C6E0C19C855?doi=10.1.1.96.5066&rep=rep1&type=pdf | http://genome.jouy.inra.fr/texte/LLLchallenge/ | RE, NER | Biomedical | ? |  | English | No | PubMed abstracts |  | experts (biologists)
https://github.com/bigscience-workshop/biomedical/issues/175 | EBM PICO | 10.18653/v1/P18-1019 | text | - | Manual, Crowdsourced | No | 120 | 2018 | train,test | https://aclanthology.org/P18-1019/ | https://github.com/bepnye/EBM-NLP | NER | Biomedical | ? |  | English | No | PubMed abstracts |  | experts for test, AMT for train
https://github.com/bigscience-workshop/biomedical/issues/19 | ChemProt | 10.1093/nar/gkq906 | Standoff | BLUE | Rules | No | 91 | 2017 | train,dev,test | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3013776/ | https://biocreative.bioinformatics.udel.edu/resources/corpora/chemprot-corpus-biocreative-vi/ | RE, NER/NED | Biomedical | Public / Unknown |  | English | No
https://github.com/bigscience-workshop/biomedical/issues/20 | NCBI Disease Corpus | 10.1016/j.jbi.2013.12.006 | PubTator | BLUE | Manual | https://huggingface.co/datasets/ncbi_disease | 422 | 2013 | train,dev,test | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3951655/ | https://www.ncbi.nlm.nih.gov/CBBresearch/Dogan/DISEASE/ | NER/NED | Biomedical | CC0 1.0 |  | English | No
https://github.com/bigscience-workshop/biomedical/issues/21 | BIOSSES | 10.1093/bioinformatics/btx238 | Word | BLUE, BLURB | Manual | https://huggingface.co/datasets/biosses | 95 | 2017 | NONE | https://academic.oup.com/bioinformatics/article/33/14/i49/3953954 | https://tabilab.cmpe.boun.edu.tr/BIOSSES/ | Semantic Similarity | Biomedical | GNU Common Public License v.3.0 |  | English | No | Biomedical articles |  | 5 experts
https://github.com/bigscience-workshop/biomedical/issues/22 | GENIA Term Corpus | 10.1093/bioinformatics/btg1023 | XML | GENIA | Manual | No | 1282 | 2003 | NONE | https://academic.oup.com/bioinformatics/article/19/suppl_1/i180/227927 | http://www.geniaproject.org/genia-corpus/term-corpus | NER | Biomedical | CC BY 3.0 |  | English | No
https://github.com/bigscience-workshop/biomedical/issues/23 | GENIA Relation Corpus | 10.1093/bioinformatics/btg1023 | Standoff | GENIA | Manual | No | 1282 | 2011 | train,dev,test | https://academic.oup.com/bioinformatics/article/19/suppl_1/i180/227927 | http://www.geniaproject.org/genia-corpus/relation-corpus | RE | Biomedical | CC BY 3.0 |  | English | No
https://github.com/bigscience-workshop/biomedical/issues/24 | GENIA Coreference Corpus |  | XML | GENIA | Manual | No | 18 | 2011 | NONE | https://aclanthology.org/W11-1811/ | http://www.geniaproject.org/genia-corpus/coreference | Coreference | Biomedical | CC BY 3.0 |  | English | No
https://github.com/bigscience-workshop/biomedical/issues/25 | PubMedQA (PQA-L, PQA-U, PQA-A) |  | JSON |  | Manual | https://huggingface.co/datasets/pubmed_qa | 73 | 2019 | train,test | https://arxiv.org/abs/1909.06146 | https://github.com/pubmedqa/pubmedqa | QA | Biomedical | MIT License |  | English | No
https://github.com/bigscience-workshop/biomedical/issues/26 | MedMentions | - | PubTator |  | Manual | No | 45 | 2019 | train,dev,test | https://arxiv.org/abs/1902.09476 | https://github.com/chanzuckerberg/MedMentions | NER/NED | Biomedical | CC0 1.0 |  | English | No
https://github.com/bigscience-workshop/biomedical/issues/27 | S800 Corpus | 10.1371/journal.pone.0065390 | Standoff |  | Manual | https://huggingface.co/datasets/species_800 | 115 | 2013 | NONE | https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0065390 | https://species.jensenlab.org | NER | Biomedical | Public Domain |  | English | No
https://github.com/bigscience-workshop/biomedical/issues/28 | CHEMDNER | 10.1186/1758-2946-7-S1-S2 | BioC, Standoff |  | Manual | No | 197 | 2015 | train,dev,test | https://jcheminf.biomedcentral.com/articles/10.1186/1758-2946-7-S1-S2 | http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus/ | NER | Biomedical | Public / Registration |  | English | No
https://github.com/bigscience-workshop/biomedical/issues/218 | PUBHEALTH | 10.18653/v1/2020.emnlp-main.623 | text |  | Manual | https://huggingface.co/datasets/health_fact | 28 | 2020 | train,dev,test | https://aclanthology.org/2020.emnlp-main.623/ | https://github.com/neemakot/Health-Fact-Checking/blob/master/data/DATASHEET.md | Fact-Verification | Health News | MIT License |  | English | No | Health news articles
https://github.com/bigscience-workshop/biomedical/issues/120 | ProGene | 10.5281/zenodo.3698568 | CoNLL |  | Manual | No | 2 | 2020 | train,dev,test | https://aclanthology.org/2020.lrec-1.564.pdf | https://zenodo.org/record/3698568#.YhTFu5PMKWs | NER | Biomedical | CC BY 4.0 |  | English | No
https://github.com/bigscience-workshop/biomedical/issues/115 | CellFinder | - | BRAT |  | Manual | No | 39 | 2012 | NONE | https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.385.9703&rep=rep1&type=pdf | https://github.com/openbiocorpora/cellfinder | NER | Biomedical | CC BY-SA 3.0 |  | English | No |  |  | 2 experts
https://github.com/bigscience-workshop/biomedical/issues/30 | SciTail | - | JSONL, DGEM, text | - | Crowdsourced | https://huggingface.co/datasets/scitail | 268 | 2018 | train,dev,test | http://ai2-website.s3.amazonaws.com/team/ashishs/scitail-aaai2018.pdf | https://allenai.org/data/scitail | NLI | Biomedical | Apache License 2.0 |  | English | No
https://github.com/bigscience-workshop/biomedical/issues/31 | n2c2 2006 - Smoking Status | 10.1197/jamia.M2408 | XML | - | Manual | No | 380 | 2006 | train,test | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2274873/ | https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/ | Topic Classification | Clinical | DUA |  | English | No | Clinical notes
https://github.com/bigscience-workshop/biomedical/issues/219 | n2c2 2006 - Deidentification | 10.1197/jamia.M2444 | XML | - | Manual | No | 491 | 2006 | train,test | https://academic.oup.com/jamia/article/14/5/550/720189 | https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/ | NER | Clinical | DUA |  | English | No | Clinical notes
https://github.com/bigscience-workshop/biomedical/issues/32 | n2c2 2008 - Obesity | 10.1197/jamia.M3115 | XML |  | Manual | No | 261 | 2008 | train,test | https://academic.oup.com/jamia/article/16/4/561/766997 | https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/ | Topic Classification | Clinical | DUA |  | English | No | Clinical notes |  | 2 obesity experts
https://github.com/bigscience-workshop/biomedical/issues/33 | n2c2 2009 - Medication | 10.1136/jamia.2010.003947 | text | - | Manual | No | 465 | 2009 | train,test | https://academic.oup.com/jamia/article/17/5/514/2909108 | https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/ | NER | Clinical | DUA |  | English | No
https://github.com/bigscience-workshop/biomedical/issues/38 | n2c2 2010 - Relations | 10.1136/amiajnl-2011-000203 | text | BLUE | Manual | No | 1001 | 2011 | train,test | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3168320/ | https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/ | RE, NER | Clinical | DUA |  | English | No
https://github.com/bigscience-workshop/biomedical/issues/34 | n2c2 2011 - Coreference | 10.1136/amiajnl-2011-000784 | text |  | Manual | No | 172 | 2011 | train,test | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3422835/ | https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/ | Coreference | Clinical | DUA |  | English | No | Clinical notes |  | 2 annotators
https://github.com/bigscience-workshop/biomedical/issues/36 | n2c2 2012 - Temporal Relations | 10.1136/amiajnl-2013-001628 | XML |  | Manual | No | 407 | 2012 | train,test | https://academic.oup.com/jamia/article/20/5/806/726374 | https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/ | RE | Clinical | DUA |  | English | No | Clinical notes |  | 8 annotators
https://github.com/bigscience-workshop/biomedical/issues/220 | n2c2 2014 - Deidentification & Heart Disease | 10.1016/j.jbi.2015.06.007 | XML |  | Manual | No | 163 | 2014 | train,test | https://pubmed.ncbi.nlm.nih.gov/26225918/ | https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/ | NER, Topic Classification | Clinical | DUA |  | English | No | Medical records |  | 6 annotators
https://github.com/bigscience-workshop/biomedical/issues/37 | n2c2 2018 - Adverse Drug Events and Medication Extraction | 10.1093/jamia/ocz166 | Standoff |  | Manual | No | 52 | 2018 | train,test | https://academic.oup.com/jamia/article-abstract/27/1/3/5581277?redirectedFrom=fulltext | https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/ | NER, RE | Clinical | DUA |  | English | No | MIMIC-III |  | 2 annotators
https://github.com/bigscience-workshop/biomedical/issues/221 | n2c2 2018 - Clinical Trial Cohort Selection | 10.1093/jamia/ocz163 | XML |  | Manual | No | 19 | 2018 | train,test | https://academic.oup.com/jamia/article-abstract/26/11/1163/5575392?redirectedFrom=fulltext | https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/ | Topic Classification | Clinical | DUA |  | English | No | Medical records |  | 2 annotators with medical expertise
https://github.com/bigscience-workshop/biomedical/issues/138 | PharmaCoNER | 10.18653/v1/D19-5701 | text |  | Manual | No | 49 | 2020 | train,dev,test | https://aclanthology.org/D19-5701/ | https://temu.bsc.es/pharmaconer/index.php/datasets/ | NER | Clinical | CC BY 4.0 |  | Spanish | No | Spanish Clinical Case Corpus |  | Physicians and medicinal chemistry experts
https://github.com/bigscience-workshop/biomedical/issues/39 | emrQA | 10.18653/v1/D18-1258 | JSON |  | Rules | No | 79 | 2018 | train,test | https://www.aclweb.org/anthology/D18-1258 | https://github.com/panushri25/emrQA | QA | Clinical | DUA |  | English | No | i2b2
https://github.com/bigscience-workshop/biomedical/issues/40 | MEDIQA 2019 NLI | 10.18653/v1/W19-5039 | JSONL | MEDIQA 2019 | Manual | No | 56 | 2019 | train,test | https://www.aclweb.org/anthology/W19-5039/ | https://physionet.org/content/mednli-bionlp19/1.0.1/ | NLI | Clinical | DUA |  | English | No | MIMIC-III |  | experts (clinicians)
https://github.com/bigscience-workshop/biomedical/issues/41 | ShAReCLEF 2014 Task 2 | 10.13026/0zgk-9j94 | text |  | Manual | No | 144 | 2014 | train,test | http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.717.6237&rep=rep1&type=pdf | https://physionet.org/content/shareclefehealth2014task2/1.0/ | NER | Clinical | DUA |  | English | No | MIMIC-II
https://github.com/bigscience-workshop/biomedical/issues/42 | RadGraph | 10.13026/hm87-5p47 | JSON |  | Manual | No | 1 | 2021 | train,dev,test | https://arxiv.org/pdf/2106.14463.pdf | https://physionet.org/content/radgraph/1.0.0/ | NER, RE | Clinical | DUA |  | English | No
https://github.com/bigscience-workshop/biomedical/issues/43 | MeDAL | 10.18653/v1/2020.clinicalnlp-1.15 | text |  | Rules | https://huggingface.co/datasets/medal | 5 | 2020 | train,dev,test | https://arxiv.org/abs/2012.13978 | https://github.com/BruceWen120/medal | Abbreviation Disambiguation | Clinical | Apache 2.0 |  | English | No | PubMed abstracts | Abbreviation disambiguation | reverse substitution (see https://arxiv.org/pdf/1912.06174.pdf)
https://github.com/bigscience-workshop/biomedical/issues/44 | MQP - Medical Question Pairs Dataset | 10.1145/3394486.3412861 | CSV |  | Manual | https://huggingface.co/datasets/medical_questions_pairs | 5 | 2020 | NONE | https://drive.google.com/file/d/1CHPGBXkvZuZc8hpr46HeHU6U6jnVze-s/view | https://github.com/curai/medical-question-pair-dataset | Paraphrasing | Clinical | ? |  | English | No |  |  | experts (clinicians)
https://github.com/bigscience-workshop/biomedical/issues/45 | Why-QA | 10.18653/v1/W19-1913 | JSON, text |  | Manual | No | 4 | 2019 | NONE | https://www.aclweb.org/anthology/W19-1913/ | https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/ | QA | Clinical | DUA |  | English | No | 2010 i2b2/VA NLP
https://github.com/bigscience-workshop/biomedical/issues/46 | CLIP | 10.18653/v1/2021.acl-long.109 | CSV, JSON |  | Manual | No | 1 | 2021 | train,dev,test | https://arxiv.org/abs/2106.02524 | https://physionet.org/content/mimic-iii-clinical-action/1.0.0/ | Span Classification, Sentence Classification | Clinical | DUA |  | English | No | MIMIC-III |  | 4 physicians and 1 resident
https://github.com/bigscience-workshop/biomedical/issues/47 | DDI | 10.1016/j.jbi.2013.07.011 | XML | BLUE | Manual | No | 213 | 2013 | train,test | http://dx.doi.org/10.1016/j.jbi.2013.07.011 | https://github.com/isegura/DDICorpus | RE, NER/NED | Biomedical | CC BY-NC 4.0 |  | English | No | DrugBank database and MEDLINE articles |  | 2 expert pharmacists
https://github.com/bigscience-workshop/biomedical/issues/48 | BioNLP Shared Task 2009 |  | Standoff |  | Manual | No | 729 | 2009 | train,dev,test | https://www.aclweb.org/anthology/W09-1401.pdf | http://www.geniaproject.org/shared-tasks/bionlp-shared-task-2009 | RE | Biomedical | CC BY NC 3.0 |  | English | No | PubMed abstracts | Event extraction shared task
https://github.com/bigscience-workshop/biomedical/issues/49 | Scielo |  | TMX,JSON |  | Found/model | https://huggingface.co/datasets/scielo | 11 | 2018 | NONE | https://arxiv.org/abs/1905.01852 | https://sites.google.com/view/felipe-soares/datasets#h.p_92uSCyAjWSRB | Translation | Biomedical | CC BY 4.0 |  | English, Portuguese, Spanish | Yes
https://github.com/bigscience-workshop/biomedical/issues/50 | SciCite | 10.18653/v1/N19-1361 | JSONL |  | Crowdsourcing | https://huggingface.co/datasets/scicite | 61 | 2019 | train,dev,test | https://arxiv.org/pdf/1904.01608.pdf | https://github.com/allenai/scicite | Topic Classification | Biomedical | Apache License |  | English | No | Semantic Scholar corpus |  | 850 crowdsource workers
https://github.com/bigscience-workshop/biomedical/issues/51 | SCAI disease |  | CoNLL |  | Manual | No | 44 | 2010 | NONE | https://pub.uni-bielefeld.de/record/2603398 | http://www.scai.fraunhofer.de/disease-ae-corpus.html | NER | Biomedical | ? |  | English | No | MEDLINE abstracts |  | 2 annotators
https://github.com/bigscience-workshop/biomedical/issues/52 | SCAI chemical | 10.1093/bioinformatics/btn181 | CoNLL |  | Manual | No | 143 | 2008 | NONE | https://pubmed.ncbi.nlm.nih.gov/18586724/ | http://www.scai.fraunhofer.de/chem-corpora.html | NER | Biomedical | ? |  | English | No | MEDLINE abstracts |  | 2 annotators
https://github.com/bigscience-workshop/biomedical/issues/53 | miRNA | 10.5256/f1000research.6352.r5979 | XML |  | Manual | No | 31 | 2014 | train,test | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4602280/ | https://www.scai.fraunhofer.de/en/business-research-areas/bioinformatics/downloads/download-mirna-test-corpus.html | NER | Biomedical | ? |  | English | No | MEDLINE abstracts |  | 2 annotators
https://github.com/bigscience-workshop/biomedical/issues/55 | MEDIQA 2019 RQE | 10.18653/v1/W19-5039 | XML | MEDIQA 2019 |  | No | 56 | 2019 | train,dev,test | https://www.aclweb.org/anthology/W19-5039/ | https://github.com/abachaa/MEDIQA2019/tree/master/MEDIQA_Task2_RQE | RQE | Clinical | ? |  | English | No | Consumer Health Questions to NLM, FAQs from NIH
https://github.com/bigscience-workshop/biomedical/issues/56 | MASH-QA | 10.18653/v1/2020.findings-emnlp.342 | JSON |  | Manual | No | 10 | 2020 | train,dev,test | https://people.cs.vt.edu/mingzhu/papers/conf/emnlp2020.pdf | https://github.com/mingzhu0527/MASHQA | QA | Clinical | Apache License 2.0 |  | English | No | Consumer healthcare articles from WebMD |  | Healthcare experts
https://github.com/bigscience-workshop/biomedical/issues/57 | PICO extraction | 10.18653/v1/2020.findings-emnlp.274 | JSON |  | Manual, Crowdsourcing | No | 1 | 2020 | NONE | https://aclanthology.org/2020.findings-emnlp.274/ | https://github.com/Markus-Zlabinger/pico-annotation | Sentence Classification | Clinical | ? |  | English | No |  | Sentences annotated with PICO classes. Majority vote does not seem to be materialized in the dataset; needs to be inferred with script. | Experts and crowdsourcing (goal was to compare assisted crowd annotation with expert annotation)
https://github.com/bigscience-workshop/biomedical/issues/58 | Hallmarks of Cancer (HoC) | 10.1093/bioinformatics/btv585 | text |  | Manual | No | 49 | 2016 | NONE | https://academic.oup.com/bioinformatics/article/32/3/432/1743783 | https://github.com/sb895/Hallmarks-of-Cancer | Topic Classification | Biomedical | GNU General Public License v3.0 |  | English | No | PubMed abstracts | Hallmarks of Cancers corpus | 1 expert
https://github.com/bigscience-workshop/biomedical/issues/59 | CEI | 10.1371/journal.pone.0173132 | text, CoNLL |  | Model-assisted Manual | No | 13 | 2017 | NONE | https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0173132 | https://s-baker.net/resource/cei/ | Topic Classification | Biomedical | CC BY 4.0 |  | English | No | PubMed abstracts | Chemical Exposure Information (CEI) Corpus
https://github.com/bigscience-workshop/biomedical/issues/60 | Colorado Richly Annotated Full-Text (CRAFT) Corpus | - | CoNLL |  | Manual | No | 37 | 2015 | train,dev,test | https://hal.inria.fr/hal-01159065/document | https://github.com/UCDenver-ccp/CRAFT | NER/NED, Coreference | Biomedical | CC BY 3.0 |  | English | No | Full-text journal articles
https://github.com/bigscience-workshop/biomedical/issues/61 | SPL-ADR-200db - Adverse Drug Reactions | 10.1038/sdata.2018.1 | XML |  | Manual | No | 29 | 2017 | train,test | https://www.nature.com/articles/sdata20181 | https://bionlp.nlm.nih.gov/tac2017adversereactions/ | NER, Negation | Biomedical | ? |  | English | No | Structured Product Labels
https://github.com/bigscience-workshop/biomedical/issues/62 | Nagel | - | Standoff |  | Manual | No | 6 | 2009 | NONE | https://www.ebi.ac.uk/sites/ebi.ac.uk/files/shared/documents/phdtheses/kevin_nagel.pdf | http://sourceforge.net/projects/bionlp-corpora/files/ProteinResidue/ | NER | Biomedical | MIT License |  | English | No
https://github.com/bigscience-workshop/biomedical/issues/63 | DIANN IberEval 2018 |  | text |  | Manual | No | 11 | 2018 | train,test | http://ceur-ws.org/Vol-2150/overview-diann-task.pdf | https://github.com/gildofabregat/DIANN-IBEREVAL-2018 | NER | Biomedical | Unknown/Emailed |  | Spanish, English | Yes | Elsevier abstracts | parallel annotations | 3 people
https://github.com/bigscience-workshop/biomedical/issues/64 | CodiEsp | - | text | - | Manual | No | 40 | 2020 | train,dev,test | http://ceur-ws.org/Vol-2696/paper_263.pdf | https://zenodo.org/record/3837305#.YL46cfdfjMU | Document Classification | Clinical | CC BY 4.0 |  | Spanish | No
https://github.com/bigscience-workshop/biomedical/issues/65 | eHealth-KD 2020 |  | Standoff,JSON |  | Model-assisted Manual | https://huggingface.co/datasets/ehealth_kd | 22 | 2020 | train,dev,test | http://ceur-ws.org/Vol-2664/eHealth-KD_overview.pdf | https://github.com/knowledge-learning/ehealthkd-2020 | RE, NER/NED | Biomedical | CC BY-NC-SA 4.0 |  | Spanish | No | Medline
https://github.com/bigscience-workshop/biomedical/issues/137 | Mantra GSC | 10.1093/jamia/ocv037 | XML | - | Model-assisted Manual | No | 42 | 2015 | NONE | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4986661/ | https://files.ifi.uzh.ch/cl/mantra/gsc/GSC-v1.1.zip | NER/NED | Biomedical | CC BY 4.0 |  | French, Spanish, Dutch, German, English | Yes | European Medicines Agency, Medline, European Patent Office | Parallel datasets
https://github.com/bigscience-workshop/biomedical/issues/139 | eHealth-KD 2019 | - | Standoff | - | Manual | No | 19 | 2019 | train,dev,test | http://ceur-ws.org/Vol-2421/eHealth-KD_overview.pdf | https://github.com/knowledge-learning/ehealthkd-2019/tree/master/data | RE, NER/NED | Biomedical | CC BY-NC-SA 4.0 |  | Spanish | No | Medline
https://github.com/bigscience-workshop/biomedical/issues/66 | DrugSemantics Gold Standard | 10.1016/j.jbi.2017.06.013 | XML |  | Manual | No | 18 | 2017 | NONE | https://www.sciencedirect.com/science/article/pii/S1532046417301363?via%3Dihub | https://data.mendeley.com/datasets/fwc7jrc5jr/1 | NER | Biomedical | CC BY NC 3.0 |  | Spanish | No | Medicines Online Information Center - CIMA - that belongs to the Spanish Agency for Medicines and Health Products - AEMPS
https://github.com/bigscience-workshop/biomedical/issues/67 | MoNERo | 10.18653/v1/W19-5008 | CoNLL |  | Model-assisted Manual | No | 5 | 2019 | train,dev,test | https://www.aclweb.org/anthology/W19-5008.pdf | https://www.racai.ro/en/tools/text/ | NER | Biomedical | CC BY-SA 4.0 |  | Romanian | No | BioRo corpus
https://github.com/bigscience-workshop/biomedical/issues/68 | CLEF eHealth 2019, Task 1 |  | text |  | Manual | No | 16 | 2019 | train,dev | https://journals.plos.org/plosbiology/article/comments?id=10.1371/journal.pbio.2003217 | https://www.openagrar.de/receive/openagrar_mods_00046540?lang=en | Topic Classification | Biomedical | DUA |  | German | No | AnimalTestInfo database (http://animaltestinfo.de) | classifying text on animal experiments with ICD codes
https://github.com/bigscience-workshop/biomedical/issues/69 | ESSAI | - | text |  | Manual | No | 9 | 2020 | NONE | https://www.cambridge.org/core/services/aop-cambridge-core/content/view/5E5DB27872B07185DB58A1507DFA05D8/S1351324920000352a.pdf/div-class-title-supervised-learning-for-the-detection-of-negation-and-of-its-scope-in-french-and-brazilian-portuguese-biomedical-corpora-div.pdf | https://clementdalloux.fr/?page_id=28 | Negation/Speculation Classification | Biomedical | ? |  | French, Brazilian Portuguese | No | ESSAI, CAS
https://github.com/bigscience-workshop/biomedical/issues/70 | CBLUE (Chinese Biomedical Language Understanding Evaluation Benchmark) | - | JSON |  | Manual | No | 5 | 2021 | train,dev,test | https://arxiv.org/abs/2106.08087 | https://tianchi.aliyun.com/dataset/dataDetail?dataId=95414&lang=en-us | NER, RE, Topic Classification | Biomedical | CC BY-NC 4.0 |  | Chinese | No
https://github.com/bigscience-workshop/biomedical/issues/71 | Hindi Health Dataset |  | text |  | Found | No | -1 | 2018 | NONE |  | https://www.kaggle.com/aijain/hindi-health-dataset/home | NER | Clinical | ? |  | Hindi | No |  | tumor morphology
https://github.com/bigscience-workshop/biomedical/issues/72 | CANTEMIST (CANcer TExt Mining Shared Task) |  | Standoff | - | Manual | No | 40 | 2020 | train,dev,test | http://ceur-ws.org/Vol-2664/cantemist_overview.pdf | https://temu.bsc.es/cantemist/ | NER/NED, Multi-label Document Classification | Clinical | ? |  | Spanish | No
https://github.com/bigscience-workshop/biomedical/issues/73 | Swedish Medical NER |  | text |  | Rules,Manual | https://huggingface.co/datasets/swedish_medical_ner | 12 | 2016 | train,dev,test | https://aclanthology.org/W16-5104.pdf | https://github.com/olofmogren/biomedical-ner-data-swedish | NER | Biomedical | CC BY-SA 4.0 |  | Swedish | No | Swedish Wikipedia, Läkartidningen, and 1177.se. | NER dataset on medical text in Swedish.
https://github.com/bigscience-workshop/biomedical/issues/74 | QUAERO |  | BRAT |  | Manual | No | 60 | 2014 | train,test | http://www.lrec-conf.org/proceedings/lrec2014/workshops/LREC2014Workshop-BioTxtM2014%20Proceedings.pdf#page=33 | https://quaerofrenchmed.limsi.fr | NER/NED | Clinical | GNU Free Documentation License |  | French | No | EMEA/MEDLINE/EPO
https://github.com/bigscience-workshop/biomedical/issues/222 | PubTator Central | 10.1093/nar/gkz389 | PubTator | - | Model | No | 137 | 2019 | NONE | https://academic.oup.com/nar/article/47/W1/W587/5494727 | https://ftp.ncbi.nlm.nih.gov/pub/lu/PubTatorCentral/ | NER/NED, Coreference | Biomedical | Public Domain |  | English | No | Pubmed abstracts, full text
https://github.com/bigscience-workshop/biomedical/issues/223 | BioScope |  | XML |  | Manual | No | 441 | 2008 | NONE | https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-S11-S9 | https://rgai.inf.u-szeged.hu/node/105 | Negation, Uncertain, Scope | Biomedical | Public for Research |  | English | No | Clinical texts, biological full papers, biological paper abstracts from Genia |  | 2 linguists
https://github.com/bigscience-workshop/biomedical/issues/224 | Multi-XScience | 10.18653/v1/2020.emnlp-main.648 | JSON |  | Rules | https://huggingface.co/datasets/multi_x_science_sum | 16 | 2020 | train,dev,test | https://arxiv.org/abs/2010.14235 | https://github.com/yaolu/Multi-XScience | Multi-doc Summarization | Biomedical | MIT License |  | English | No | arXiv articles and Microsoft Academic Graph
https://github.com/bigscience-workshop/biomedical/issues/225 | MedHop |  | JSON |  | Rules,Manual | https://huggingface.co/datasets/med_hop | 246 | 2017 | train,dev | https://transacl.org/ojs/index.php/tacl/article/viewFile/1325/299 | http://qangaroo.cs.ucl.ac.uk | Reading Comprehension | Biomedical | CC BY-SA 3.0 |  | English | No
https://github.com/bigscience-workshop/biomedical/issues/226 | CORD-NER |  | JSON |  | Rules | No | 36 | 2020 | NONE | https://arxiv.org/abs/2003.12218 | https://uofi.box.com/s/k8pw7d5kozzpoum2jwfaqdaey1oij93x | NER | Biomedical | ? |  | English | No
https://github.com/bigscience-workshop/biomedical/issues/169 | MedQuAD | 10.1186/s12859-019-3119-4 | XML | - | Rules | No | 59 | 2019 | NONE | https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3119-4 | https://github.com/abachaa/MedQuAD | QA | Clinical | CC BY 4.0 |  | English | No | 12 NIH websites | MedQuAD includes 47,457 medical question-answer pairs created from 12 NIH websites (e.g. cancer.gov, niddk.nih.gov, GARD, MedlinePlus Health Topics). The collection covers 37 question types (e.g. Treatment, Diagnosis, Side Effects) associated with diseases, drugs and other medical entities such as tests.
this dataset is not available - should we remove it? | MedSTS | 10.1007/s10579-018-9431-1 | ? | BLUE | Manual | No | 43 | 2018 | train,test | https://arxiv.org/abs/1808.09397 |  | Semantic Similarity | Clinical | PRIVATE |  | English | No
https://github.com/bigscience-workshop/biomedical/issues/227 | ShAReCLEF 2013 Task 1 |  | text | BLUE | Manual | No | 121 | 2013 |  | https://pubmed.ncbi.nlm.nih.gov/25147248/ | https://physionet.org/content/shareclefehealth2013/1.0/ | NER/NED | Clinical | DUA |  | English | No |  |  | Two professional coders (healthcare professionals trained to analyze clinical records and assign standard codes using a classification system), trained for this task, annotated each clinical note in a double-blind manner, followed by adjudication.
https://github.com/bigscience-workshop/biomedical/issues/228 | MEDIQA 2019 QA | 10.18653/v1/W19-5039 | XML | MEDIQA 2019 | Manual | No | 56 | 2019 | train,dev,test | https://www.aclweb.org/anthology/W19-5039/ | https://github.com/abachaa/MEDIQA2019/tree/master/MEDIQA_Task3_QA | QA | Clinical | ? |  | English | No | Consumer health QA system CHiQA |  | Medical experts
https://github.com/bigscience-workshop/biomedical/issues/229 | TREC-2017 LiveQA |  | XML |  | Found,Manual | No | 20 | 2017 |  | https://trec.nist.gov/pubs/trec26/papers/Overview-QA.pdf | https://github.com/abachaa/LiveQA_MedicalTask_TREC2017 | QA | Clinical | ? |  | English | No | Consumer Health Questions to NLM
https://github.com/bigscience-workshop/biomedical/issues/230 | EPIC-QA |  | JSON |  | Model-assisted Manual | No | ? | 2020 | ? |  | https://bionlp.nlm.nih.gov/epic_qa/#collection | QA | Clinical | https://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/ |  | English | No | CORD-19 | Epidemic Question Answering for ad-hoc questions about the disease COVID-19 | Answer judgments will be provided by librarian indexers at the U.S. National Library of Medicine (NLM).
https://github.com/bigscience-workshop/biomedical/issues/231 | MedDialog |  | text, JSON |  | Found | https://huggingface.co/datasets/medical_dialog | 1 | 2020 | train,dev,test | https://arxiv.org/abs/2004.03329 | https://github.com/UCSD-AI4H/Medical-Dialogue-System | Dialog Classification | Clinical | Public for Research |  | English, Chinese | No | iclinic.com, healthcaremagic.com, haodf.com | The MedDialog dataset contains conversations between doctors and patients. It has 1.1 million dialogues in Chinese and 0.26 million dialogues in English.
https://github.com/bigscience-workshop/biomedical/issues/170 | CSIRO: Matching Patients to Clinical Trials | 10.1145/2911451.2914672 | text, XML | - | Manual | No | 14 | 2015 | NONE | https://dl.acm.org/doi/abs/10.1145/2911451.2914672 | https://data.csiro.au/collections/collection/CIcsiro:17152v1 | IR | Clinical | CC BY-SA 4.0 |  | English | No | ClinicalTrials.gov | A Test Collection for Matching Patient to Clinical Trials | 4 Medical professionals
https://github.com/bigscience-workshop/biomedical/issues/232 | ParaMed |  | text |  | Found | No | 4 | 2020 | train,dev,test | https://arxiv.org/abs/2005.09133 | https://github.com/boxiangliu/ParaMed | Translation | Biomedical | ? |  | Mandarin, English | Yes | NEJM
https://github.com/bigscience-workshop/biomedical/issues/233 | PhoNER | 10.18653/v1/2021.naacl-main.173 | CoNLL | - | Manual | No | 5 | 2021 | train,dev,test | https://aclanthology.org/2021.naacl-main.173/ | https://github.com/VinAIResearch/PhoNER_COVID19 | NER | Biomedical | ? |  | Vietnamese | No | News sites
https://github.com/bigscience-workshop/biomedical/issues/234 | Evidence-Infer-Treatment |  | text |  | Manual | https://huggingface.co/datasets/evidence_infer_treatment | 7 | 2020 | train,dev,test | https://arxiv.org/abs/2005.04177 | https://github.com/jayded/evidence-inference | Reasoning | Clinical | MIT License |  | English | No |  |  | Given both the answers and rationales of the prompt generator and prompt annotator, a third doctor (the verifier) was asked to determine the validity of both of the previous stages.
https://github.com/bigscience-workshop/biomedical/issues/172 | CADEC | 10.1016/j.jbi.2015.03.010 | BRAT | - | Manual | No | 174 | 2015 | NONE | https://www.sciencedirect.com/science/article/pii/S1532046415000532?via%3Dihub | https://data.gov.au/dataset/ds-dap-csiro%3A10948/details?q= | NER/NER, RE | Social Media | CSIRO Data License (Non-commercial) |  | English | No | AskaPatient | An annotated corpus of consumer reviews in pharmacovigilance. | four medical students