SIG-DLS Tool Inventory

1	User's name	User's email	Tool name and version	Tool type	Tool references	Use case short description	Use case documentation	User's affiliation	Feedback	TaDiRAH
2	Your name, as the person providing this piece of information	Your email, just in case we need to check some information with you	The computational tool, software or library whose use you are going to describe. Please use unique referent identifiers (Handle, PID, ...) whenever available and specify the version if applicable.	Specify what type of tool it is: web application, command-line tool, library, ...	Add a bibliographic reference and/or a link to the tool's web page and/or to existing repositories	The real-world DLS use case (research, teaching) where the tool was used. Note: PLEASE DO NOT INTRODUCE A TOOL DEVELOPED BY YOURSELF OR AD-HOC TOOLS. We are interested in recent use cases (within the last five years) of off-the-shelf tools. Create one line per use case/tool.	Insert here a reference to 1) published research (papers, books, blog posts) with links, 2) projects, or 3) academic courses. Please use the unique identifier of the publication (ISBN, DOI, ...) whenever possible.	Affiliation of the people using the tool at the time of RESEARCH/USE: Institution - City - Country	Brief and constructive report on the strengths and weaknesses of the tool/version that you used, and on its usability with regard to your research question	Optional. Add link to TaDiRAH taxonomy http://tadirah.dariah.eu/vocab/

3	Francesca Frontini	francescafrontini@gmail.com	FactoMineR 1.24	R library	Husson, F., Josse, J., Le, S., and Mazet, J. "FactoMineR: Multivariate Exploratory Data Analysis and Data Mining with R", R package version 1.24 (2011). http://factominer.free.fr/	Stylistic comparison of four French novels. The FactoMineR R library was used to perform correspondence analysis (CA) on a set of extracted syntactic patterns in order to compare the style of the four novels.	Francesca Frontini, Mohamed Amine Boukhaled, Jean-Gabriel Ganascia: Mining for characterising patterns in literature using correspondence analysis: an experiment on French novels. Digital Humanities Quarterly 11(2) (2017)	Labex OBVIL, Université Pierre et Marie Curie Paris 6, Paris, France		http://tadirah.dariah.eu/vocab/index.php?tema=31&/stylistic-analysis
4	Simone Rebora	simone.rebora81@gmail.com	Stylo 0.6.8	R library	Eder, M., Rybicki, J. and Kestemont, M. (2016). Stylometry with R: a package for computational text analysis. R Journal, 8(1): 107-121, url: https://journal.r-project.org/archive/2016/RJ-2016-007/index.html	Attribution to Robert Musil of a series of articles published in the journal "Tiroler Soldaten-Zeitung"	Rebora, Simone, J. Berenike Herrmann, Massimo Salgaro, and Gerhard Lauer. 2018. “Robert Musil, a War Journal, and Stylometry: Tackling the Issue of Short Texts in Authorship Attribution”. Digital Scholarship in the Humanities, [in press]. https://doi.org/10.1093/llc/fqy055	University of Verona, Verona, Italy; University of Basel, Basel, Switzerland	The tool was a perfect fit for our research (for both validation and experiments) and was easily integrated into a series of ad-hoc scripts. The only (minor) shortfalls: it imposed a series of operations not immediately useful for our goal, thus increasing computation time; and we had to slightly modify the "oppose" code to make all the most preferred/avoided words appear in the graph.	http://tadirah.dariah.eu/vocab/index.php?tema=31&/stylistic-analysis
5	Berenike Herrmann	juliaberenike@gmail.com	koRpus 0.10-2	R library	Michalke, Meik: koRpus: An R Package for Text Analysis (Version 0.10-2), 2017. URL: https://reaktanz.de/?c=hacking&s=koRpus	Used koRpus to measure the readability of text fragments, focusing on the Flesch index. Readability scores were correlated with other style markers (POS and relation to metaphor) to approximate a combined measure of the "vividness" of the prose.	Herrmann, J. B. (accepted). Operationalisierung der Metapher zur quantifizierenden Untersuchung deutschsprachiger literarischer Texte im Übergang von Realismus zur Moderne. In Jannidis, Fotis (Ed.), Tagungsband des DFG-Symposiums „Digitale Literaturwissenschaft”, Villa Vigoni, De Gruyter.	University of Basel, Basel, Switzerland; University of Göttingen, Göttingen, Germany		http://tadirah.dariah.eu/vocab/index.php?tema=31&/stylistic-analysis; http://tadirah.dariah.eu/vocab/index.php?tema=30&/structural-analysis
6	Nanette Rißler-Pipka	nanette.rissler@gmail.com	Stylo 0.6.5	R library	Eder, M., Rybicki, J. and Kestemont, M. (2016). Stylometry with R: a package for computational text analysis. R Journal, 8(1): 107-121, url: https://journal.r-project.org/archive/2016/RJ-2016-007/index.html	Testing several candidates for the authorship of the second volume of the "Quijote", published under the pseudonym Fernández de Avellaneda, and discussing the differences in style between the "Quijote II" by Cervantes and the apocryphal version by Avellaneda. Testing cluster analysis and rolling delta.	Nanette Rißler-Pipka: Die Digitalisierung des goldenen Zeitalters – Editionsproblematik und stilometrische Autorschaftsattribution am Beispiel des Quijote. In: Zeitschrift für digitale Geisteswissenschaften. Wolfenbüttel 2018. text/html Format. DOI: 10.17175/2018_004	Karlsruhe Institute of Technology, Karlsruhe, Germany; University of Siegen, Siegen, Germany	Well suited to proving unreliable methods in authorship attribution wrong. Not convincing enough for the community of "cervantistas".	http://tadirah.dariah.eu/vocab/index.php?tema=31&/stylistic-analysis
7	Octave Julien	firstname.lastname@univ-paris1.fr	TXM	Desktop or Web application	textometrie.ens-lyon.fr / Heiden, S., Magué, J-P., Pincemin, B. (2010). TXM : Une plateforme logicielle open-source pour la textométrie – conception et développement. In I. C. Sergio Bolasco (Ed.), Proc. of 10th International Conference on the Statistical Analysis of Textual Data (JADT 2010) (Vol. 2, p. 1021-1032). Edizioni Universitarie di Lettere Economia Diritto, Roma, Italy. Online. / Heiden, S. (2010). The TXM Platform : Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme. In K. I. Ryo Otoguro (Ed.), 24th Pacific Asia Conference on Language, Information and Computation (p. 389-398). Institute for Digital Enhancement of Cognitive Development, Waseda University.			PIREH, Université Paris 1 Panthéon Sorbonne	Multipurpose, open-source text analysis software. Integrates TreeTagger for lemmatisation, supports CQL queries, and handles multiple common corpus encoding formats.
8	Octave Julien	firstname.lastname@univ-paris1.fr	Iramuteq	Desktop application	http://www.iramuteq.org/			PIREH, Université Paris 1 Panthéon Sorbonne	Text analysis software, useful for its implementation of the Reinert/Alceste method (classification of text segments) and for co-occurrence analysis. Has a graphical interface, is based on R, and also produces R outputs.
9	Octave Julien	firstname.lastname@univ-paris1.fr	Lexico3 (beta of version 5 available)	Desktop application	http://www.lexi-co.com/index.html			PIREH, Université Paris 1 Panthéon Sorbonne	Multipurpose text analysis software. Deals easily with very large corpora (millions of words). Implements specific and powerful tools for analysing chronological evolution within a corpus, syntagms, and patterns of repetition of a word or syntagm within a corpus.
10	Dominique Legallois	dominique.legallois@sorbonne-nouvelle.fr	quanteda, tidytext, tm	R library	Kenneth Benoit, Julia Silge	textometry
11	Dominique Legallois	dominique.legallois@sorbonne-nouvelle.fr	SDMC		https://tal.lipn.univ-paris13.fr/sdmc/	extraction of syntactic patterns
12	Jan Rybicki	jkrybicki@gmail.com	Docuscope	Desktop application	https://www.cmu.edu/dietrich/english/research/docuscope.html	rhetorical analysis	Jonathan Hope, Michael Witmore (2014). "Quantification and the language of later Shakespeare," Actes des congrès de la Société française Shakespeare. 123-149. doi: 10.4000/shakespeare.2830.	Strathclyde U., UK	suite of interactive visualization tools for corpus-based rhetorical analysis
13	Jan Rybicki	jkrybicki@gmail.com	TRACER	Desktop application	https://www.etrap.eu/research/tracer/	analysis of intertextuality, versioning, comparing translations	Franzini, G. (2016) ‘English translations of Pan Tadeusz: a comparison with TRACER’, Corpus-based Research in the Humanities workshop. January, 19. Online.	University of Göttingen, Germany	TRACER is a suite of 700 algorithms whose features can be combined to create the optimal formula for detecting words, sentences and ideas that have been reused across texts. Created by Marco Büchler, TRACER is designed to facilitate research in text reuse detection, and many have used it to identify plagiarism in a text, as well as verbatim and near-verbatim quotations, paraphrase and even allusions. The thousands of feature combinations that TRACER supports make it possible to investigate not only contemporary texts but also complex historical texts where reuse is harder to spot.
14	Jan Rybicki	jkrybicki@gmail.com	WCopyFind	Desktop application	http://plagiarism.bloomfieldmedia.com/wordpress/software/wcopyfind/	plagiarism detection, common word n-gram detection	Anna Filipek (2014). „Pan Tadeusz”, or Translating the Untranslatable: An Analysis of English Translations, M.A. Thesis. Kraków: Uniwersytet Jagielloński	Jagiellonian University, Kraków, Poland	WCopyfind is an open-source, Windows-based program that compares documents and reports similarities in their words and phrases.
15	Allen Riddell	riddella@indiana.edu	lxml	Python package	https://pypi.org/project/lxml/	Loading data		Indiana University Bloomington, Bloomington, USA	None; it's a mature, well-tested Python package.
16	Allen Riddell	riddella@indiana.edu	matplotlib	Python package	https://pypi.org/project/matplotlib	Plotting		Indiana University Bloomington, Bloomington, USA	None; it's a mature, well-tested Python package.
17	Allen Riddell	riddella@indiana.edu	nltk	Python package	https://pypi.org/project/nltk	Text analysis		Indiana University Bloomington, Bloomington, USA	None; it's a mature, well-tested Python package.
18	Allen Riddell	riddella@indiana.edu	numpy	Python package	https://pypi.org/project/numpy	Text analysis		Indiana University Bloomington, Bloomington, USA	None; it's a mature, well-tested Python package.
19	Allen Riddell	riddella@indiana.edu	scipy	Python package	https://pypi.org/project/scipy	Text analysis		Indiana University Bloomington, Bloomington, USA	None; it's a mature, well-tested Python package.
20	Allen Riddell	riddella@indiana.edu	pandas	Python package	https://pypi.org/project/pandas	Loading data, summarizing data		Indiana University Bloomington, Bloomington, USA	None; it's a mature, well-tested Python package.
21	Allen Riddell	riddella@indiana.edu	pystan	Python package	https://pypi.org/project/pystan	Analyzing data		Indiana University Bloomington, Bloomington, USA	None; it's a mature, well-tested Python package.
22	Allen Riddell	riddella@indiana.edu	scikit-learn	Python package	https://pypi.org/project/scikit-learn	Analyzing data, making predictions		Indiana University Bloomington, Bloomington, USA	None; it's a mature, well-tested Python package.
23	Allen Riddell	riddella@indiana.edu	statsmodels	Python package	https://pypi.org/project/statsmodels	Analyzing data, making predictions		Indiana University Bloomington, Bloomington, USA	None; it's a mature, well-tested Python package.
24	Allen Riddell	riddella@indiana.edu	pytorch	Python package	https://pytorch.org/	Analyzing data, making predictions		Indiana University Bloomington, Bloomington, USA	Easier to use than Tensorflow for building language models of text.
25	Allen Riddell	riddella@indiana.edu	cartopy	Python package	https://pypi.org/project/cartopy	Making maps		Indiana University Bloomington, Bloomington, USA	None; it's a mature, well-tested Python package.
26	Jonathan Reeve	jonathan.reeve@columbia.edu	text-matcher	Python package	https://pypi.org/project/text-matcher/	Text reuse detection, plagiarism detection	Jonathan Reeve, Milan Terlunen, and Sierra Eckert. "Middlemarch Critical Histories." [Forthcoming]	Columbia University, New York City, USA	Needs test suite, better documentation.
27	Jonathan Reeve	jonathan.reeve@columbia.edu	macro-etym	Python package	https://github.com/JonathanReeve/macro-etym	Macro-etymological text analysis	Reeve, Jonathan. "A macro-etymological analysis of James Joyce’s A Portrait of the Artist as a Young Man." Reading Modernism with Machines. Palgrave Macmillan, London, 2016. 203-222.	Columbia University, New York City, USA	Needs a test suite, some bug fixes, and better packaging of its data on PyPI.
28	Jonathan Reeve	jonathan.reeve@columbia.edu	chapterize	Python package	https://pypi.org/project/chapterize/	Text segmentation		Columbia University, New York City, USA	Needs a less deterministic approach to chapter detection
29	Jonathan Reeve	jonathan.reeve@columbia.edu	spacy	Python package	https://spacy.io/	Natural language processing			Language models are often difficult to install
30	Jonathan Reeve	jonathan.reeve@columbia.edu	textacy	Python package	https://pypi.org/project/textacy/	Natural language processing			Some open bugs (see GitHub issues)
31	Fotis Jannidis	fotis@jannidis.de	gensim	Python package	https://radimrehurek.com/gensim/	Natural language processing
32	Fotis Jannidis	fotis@jannidis.de	spacy	Python package	https://spacy.io/	Natural language processing
33	Fotis Jannidis	fotis@jannidis.de	umap-learn	Python package	https://github.com/lmcinnes/umap	Dimensionality reduction
34	Fotis Jannidis	fotis@jannidis.de	seaborn	Python package	https://seaborn.pydata.org/	Visualization
35	Fotis Jannidis	fotis@jannidis.de	keras	Python package	https://keras.io/	Deep Learning Framework