Lexical databases (repositories of information about words in a language) have been crucial to advances in psycholinguistic research and to improving our understanding of language processes. Many lexical databases for spoken languages have been created, compiling an enormous amount of detailed information about spoken and written words. For instance, the English Lexicon Project provides information about lexical frequency, neighborhood density, orthographic and phonological length, morphological structure, and part of speech for more than 40,000 English words (Balota et al., 2007); for other databases see, for example, WordNet (Miller, Beckwith, Fellbaum, Gross, & Miller, 1990; Miller, 1995) and the MRC Psycholinguistic Database (Coltheart, 1981) for English, CELEX for English, Dutch, and German (Baayen, Piepenbrock, & Gulikers, 1995), and LEXIQUE for French (New, Pallier, Brysbaert, & Ferrand, 2004). Numerous studies have demonstrated the importance of these properties for spoken and written language processing, making lexical databases critical tools for testing hypotheses and for controlling extraneous aspects of processing. It is not surprising that databases such as these have been collectively cited more than 23,000 times in studies of speech perception and production, literacy, bilingualism, language acquisition, dyslexia, Alzheimer’s Disease, autism, aphasia, memory, emotion, and machine learning (the number represents the sum of citations listed on Google Scholar on 5 August 2015). Not only have lexical databases been used in scientific research, but these resources have also been critical to curriculum and assessment development (e.g., van Bon, Bouwmans, & Broeders, 2006; Whitworth, Webster, & Howard, 2014).

Unfortunately, no large corpora or lexical databases are currently available for American Sign Language (ASL); only two small-scale lexical resources exist. Mayberry, Hall, and Zvaigzne (2014) published a list of subjective frequency ratings for 432 ASL signs, but the signs are not coded for lexical or phonological properties and were not rated for iconicity. Morford and MacFarlane (2003) created a corpus of 4,111 ASL sign tokens as a preliminary study of frequency in ASL, but this corpus is not publicly available. A few small-scale databases are available for other sign languages. For example, Vinson et al. (2008) developed a database for British Sign Language consisting of 300 signs rated by deaf signers for frequency, iconicity, and age of acquisition. Gutierrez-Sigut, Costello, Baus, and Carreiras (2015) created a searchable database for Spanish Sign Language consisting of 2,400 signs and 2,700 non-signs that were coded for phonological and grammatical properties (although frequency and iconicity data are not currently available). There are also a number of on-going efforts to develop large annotated corpora for other signed languages (New Zealand Sign Language: McKee & Kennedy, 2006; Australian Sign Language: Johnston, 2012; British Sign Language: Schembri et al., 2011).

Without a more comprehensive lexical database for ASL, it is difficult to develop well-controlled studies of ASL structure and processing. Ideally, a database should have both breadth (normative information for many lexical and phonological properties) and depth (many or all of the lexical items in the lexicon). To begin to address this need, we developed ASL-LEX, a broad lexical database of nearly 1,000 ASL signs. The database includes subjective frequency ratings by deaf signers and iconicity ratings by hearing non-signers. Each sign in ASL-LEX has been coded for four lexical properties (initialization, lexical class, compounding, fingerspelling) and for six phonological properties, from which sub-lexical frequencies and neighborhood densities have been calculated. The database also includes information about sign length (reported as sign onset and offset times measured from a reference video clip of each sign) and, for a subset of signs, information about English translation consistency. ASL-LEX is available in CSV format through the Open Science Framework (OSF) (http://osf.io/53vmf/) and as a searchable, interactive visualization through the ASL-LEX website (http://asl-lex.org). In addition to sign data, the website provides access to the reference video clip for each sign. The videos are available for download only with the authors’ permission.

Like speakers, signers are sensitive to lexical frequency; for instance, lexical decision and naming times are longer for low-frequency than for high-frequency signs (e.g., Carreiras, Gutiérrez-Sigut, Baquero, & Corina, 2008; Emmorey, Petrich, & Gollan, 2013). For spoken languages, lexical frequency is commonly measured by counting the frequency of occurrence in large written and/or spoken corpora (for a discussion of these sources, see Brysbaert & New, 2009). However, because there is no conventional written form for sign languages, corpus-based frequency counts must be derived from transcribed datasets. This method requires considerable effort, and even the largest corpora currently available for a sign language do not approach the size of those available for spoken languages (i.e., millions of words). As an alternative, most psycholinguistic studies of sign language use subjective measures of sign frequency, obtained by asking language users to estimate how frequently they encounter a sign. This is the measure of frequency included in ASL-LEX. Subjective frequency is highly correlated with corpus counts for both signed language (Fenlon, Schembri, Rentelis, Vinson, & Cormier, 2014) and spoken language (Balota, Pilotti, & Cortese, 2001).

Many signs are iconically motivated: there is a resemblance between their form and meaning. Whereas in spoken language iconic motivation is primarily found in phenomena like onomatopoeia and sound symbolism (e.g., Hinton, Nichols, & Ohala, 2006), the visual modality abounds with opportunities to create sign forms that resemble their meanings. The role of iconicity in sign language processing and acquisition has been of great interest for decades (e.g., Emmorey et al., 2004; Frishberg, 1975; Orlansky & Bonvillian, 1984; Taub, 2001; Thompson, Vinson, & Vigliocco, 2009). Iconicity has also been of interest to linguists, as it appears to have a complex relationship with phonological regularity (e.g., Brentari, 2007; Eccarius, 2008; van der Hulst & van der Kooij, 2006; van der Kooij, 2002), as well as with semantics and syntax (e.g., Wilbur, 2003). Because sign languages offer a unique opportunity to study the impact of iconicity on linguistic structure and processing, ratings of iconicity are of particular value in a signed lexical database. ASL-LEX therefore includes a holistic measure of the degree to which a sign is visually similar to its referent, similar to the approach used by Vinson et al. (2008) for British Sign Language.

As in spoken languages, sub-lexical (phonological) features play an important role in how sign languages are organized and processed. Many sub-lexical features are distinctive, i.e., there are minimal pairs of signs that differ by only a single property; for example, in ASL, the signs ONION and APPLE differ only in their location. Additionally, psycholinguistic experiments have shown significant priming effects for phonologically related signs, indicating that phonological information is extracted during sign production and comprehension (Baus, Gutiérrez-Sigut, Quer, & Carreiras, 2008; Baus, Gutiérrez, & Carreiras, 2014; Corina & Emmorey, 1993; Corina & Hildebrandt, 2002; Corina & Knapp, 2006; Dye & Shih, 2006). Unfortunately, the direction of phonological priming effects has been decidedly mixed in the literature, which may be an artifact of the different ways in which phonological overlap has been defined across studies (see Caselli & Cohen-Goldberg, 2014). These facts make it important to have an easily searchable, standardized phonological description of signs for use in ASL research.

ASL-LEX provides a linguistically motivated transcription of six phonological properties for each sign in the database: Sign Type (Battison, 1978), Major Location, Minor Location, Selected Fingers, Flexion, and Movement. First and foremost, these transcriptions make it possible to easily select stimuli with phonological descriptions that are consistent across studies. They may also be useful for linguistic analyses, facilitating the identification of fine-grained phonological patterns among various phonological features and between phonological and lexical properties across the lexicon. Because these transcriptions in effect represent the application of a particular phonological theory to a large swath of the ASL lexicon, ASL-LEX may be useful in assessing how well particular phonological formalisms describe the ASL phonological system. Lastly, consistent phonological transcriptions can serve as a machine-readable resource for ASL-related technology such as automated systems for sign recognition and production.

ASL-LEX also provides several measurements of the distribution of phonological properties in ASL. Research on spoken languages has suggested that sound structure is represented at multiple “grains” (e.g., sub-segmental, segmental, suprasegmental, lexical neighborhoods). Given the relatively fledgling status of sign language research, these distinctions have not been consistently made or investigated in psycholinguistic experiments on sign perception and production. To facilitate research in this area, we provide data about two grains of ASL sign phonology: sub-lexical frequency and neighborhood density. The terms sub-lexical frequency and neighborhood density have also not been used consistently in the literature. We define sub-lexical frequency as the frequency with which each sub-lexical feature value appears in the lexicon. This is straightforwardly calculated as the number of signs containing a particular value (e.g., the sub-lexical frequency of the forehead as a minor location is simply the number of signs that are made on the forehead). ASL-LEX reports the frequency of each value of the six phonological properties described above, plus handshape (unique combinations of flexion and selected fingers). Neighborhood density refers to the number of signs that are phonologically similar to a given target sign. We provide three broad measures of neighborhood density for each sign: Maximal Neighborhood Density, Minimal Neighborhood Density, and Parameter-Based Neighborhood Density, defined as the number of signs that share at least four of five, at least one of five, and four of four sub-lexical features, respectively, with the target sign (see below). Ideally, phonological distributions should be calculated over all of the signs of a language. As a first step toward this goal, ASL-LEX provides sub-lexical frequency and neighborhood density counts calculated over all signs contained in the database.

In the following sections we describe the procedures we used to create ASL-LEX. We also report descriptive statistics for a number of sign properties. These data are useful in that they provide a characterization of the database and constitute a first-order description of much of the lexicalized ASL lexicon. We report which phonological properties appear more or less commonly in ASL signs. We then report a number of analyses designed to more deeply understand how phonological, lexical, and semantic factors interact in the ASL lexicon. For example, how are iconicity and lexical frequency related to each other? Is the frequency of certain phonological properties correlated with lexical frequency or iconicity? The answers to these questions provide important information for researchers interested in how signs are acquired and processed and may also illuminate how the lexicon evolves over time.

Methods

Deaf participants: Subjective frequency ratings

A total of 69 deaf adults (45 female; M age = 34 years, SD = 11 years) were included in the frequency rating study. Each ASL sign was rated for subjective frequency by 25–31 deaf signers. An additional 22 participants were recruited but were excluded because (a) they did not complete at least one section of the ratings survey (N = 7), (b) they did not use the rating scale appropriately (i.e., their ratings had a standard deviation of 1 or less; N = 8), or (c) they had acquired ASL after age six (N = 8). Nearly all participants were either congenitally deaf (N = 60) or became deaf before age 3 years (N = 8); one participant (who acquired ASL from birth) became deaf at age 10 years. Sixty-seven participants reported severe to profound hearing loss, and two reported moderate hearing loss. All participants reported using ASL as their preferred and primary language, and all rated their ASL fluency as high on a 1–7 self-evaluation scale (7 = fluent; M = 6.78, SD = 0.51). Thirty-nine participants were native signers (25 female; M age = 33 years, SD = 11) who acquired ASL from birth, and 30 participants (20 female; M age = 34 years, SD = 11) were “early signers” who acquired ASL before age 6 years. Subjective frequency ratings were highly correlated for the native and early signers, r = .94, p < .001 (standardized z-scores), and the mean ratings did not differ between these two groups, Kruskal-Wallis χ2(1, 69) = .80, p = .37. These findings replicate those of Mayberry et al. (2014), who found that subjective frequency ratings did not differ for early and native signers. All analyses reported here are calculated over the full participant group, but we also present the subjective frequency ratings for native signers separately in ASL-LEX for the convenience of researchers who wish to utilize native-only ratings.
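
For readers who wish to reproduce this kind of group comparison from the trial-level data released with ASL-LEX, the sketch below illustrates the two tests reported above (a correlation between the groups’ standardized mean ratings and a Kruskal-Wallis test on the raw means). The input file and column names are hypothetical placeholders, not the actual ASL-LEX field names.

```python
# Minimal sketch of the native- vs. early-signer comparison, assuming a
# hypothetical per-sign summary table "group_means.csv" with columns
# "native_z", "early_z", "native_raw", and "early_raw".
import pandas as pd
from scipy.stats import pearsonr, kruskal

ratings = pd.read_csv("group_means.csv")

# Correlation between the two groups' standardized (z-scored) mean ratings.
r, p = pearsonr(ratings["native_z"], ratings["early_z"])
print(f"native vs. early signers: r = {r:.2f}, p = {p:.3g}")

# Kruskal-Wallis test comparing the raw mean ratings of the two groups.
h, p_kw = kruskal(ratings["native_raw"], ratings["early_raw"])
print(f"Kruskal-Wallis: chi-squared = {h:.2f}, p = {p_kw:.3g}")
```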

The participants were recruited from across the USA and were compensated for their time. Forty percent of the participants were born in the West of the USA (primarily California), 29 % in the North-East, 13 % in the Mid-West, 6 % in the South, and 12 % did not report information about their birth place. Fifty-nine percent of the participants resided in the West of the USA (primarily California) at the time of the study, 16 % in the North-East, 10 % in the South, and 8 % in the Mid-West; the remaining participants did not report this information, and one participant resided abroad.

Hearing participants: Iconicity ratings

Each ASL sign was rated for iconicity by 21–37 hearing English speakers on Mechanical Turk (http://www.mturk.com). All participants reported normal or corrected-to-normal vision. None of the participants knew more than ten signs in any signed language. Non-signing participants were chosen partly because Vinson et al. (2008) previously reported that some signers rated initialized signs as highly iconic because the handshape was the fingerspelled counterpart to the first letter of an English translation. We were also concerned that folk stories about the iconic origins of signs might influence iconicity ratings by signers. For example, the sign GIRL is produced with a curved movement of the thumb on the cheek and bears little resemblance to a girl, but folk etymology suggests that this sign was created to represent the chinstrap of a bonnet. Because the iconicity ratings were gathered from non-signers, they cannot be influenced by folk etymology and instead provide a better measure of the visual similarity between a sign’s form and its referent.

Mechanical Turk workers and laboratory participants have been shown to perform similarly on a number of cognitive and perceptual experimental paradigms (e.g., Germine, Nakayama, Duchaine, Chabris, Chatterjee, & Wilmer, 2012). Two steps were taken to ensure that participants were human (e.g., not automated scripts) and were making genuine ratings. Participants had to complete a CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) in order to begin the survey. Additionally, each survey section included one question that was visually similar to the other questions (included a video and a rating scale), but asked participants to enter the number “5” rather than to rate the iconicity of the video. Participants who did not enter a 5 were excluded. The nationality of these participants is unknown, and we note that there may be cultural differences among the participants that could affect ratings of iconicity.

Materials

Stimuli selection and preparation

ASL signs were drawn from several sources: previous in-house psycholinguistic experiments, the Appendix from Mayberry et al. (2014), ASL translations of British Sign Language (BSL) signs from Vinson et al. (2008), and ASL translations of low and high frequency English words from SUBTLEXUS (http://expsy.ugent.be/subtlexus/). The latter were selected in order to create frequency-balanced survey sections (see below). “Neutral” fingerspelled words (Haptonstall-Nykaza & Schick, 2007) were not included, although a few lexicalized fingerspelled signs were included (#BACK, #FEBRUARY). Classifier constructions (also known as depicting constructions or polycomponential signs) were not included.

All ASL signs were produced by the same deaf native signer (female, middle-aged, White, born in the North-East USA, resides in California). Signs were produced with appropriate mouth gestures or spontaneous mouthings of the corresponding English word. Mouthing was not prevented because mouthing is a common feature of ASL signs (Nadolske & Rosenstock, 2007), and isolated signs produced with no mouth movements are perceived as unnatural.

A total of 1,011 ASL signs were rated for frequency by the deaf participants and for iconicity by the hearing participants. Of this original set of 1,011 signs, five signs were excluded from ASL-LEX because at least 50 % of participants indicated they did not know the sign and a further 13 signs were discovered to be duplicates once the phonological transcriptions were obtained (e.g., signs glossed as GAVEL and HAMMER were identical, so GAVEL was removed). If signs were similar but not identical (i.e., they differed slightly either manually or non-manually), then both variants were retained. Thus 993 signs were ultimately included in the database.

To collect ratings data, the signs were divided into four batches (labeled A, B, C, and D). Each batch contained 270 signs to be rated, with the exception of the last batch (D), which contained 282 signs. For ease of rating and to create breaks, the batches were administered in three sub-sections (with 90 items each). In batch A, each deaf participant rated at least one sub-section; in batches B, C, and D, each participant rated all three sub-sections for subjective frequency. Thirty-four deaf participants rated two or more batches. The order of presentation of signs within a sub-section was constant. For iconicity ratings, each hearing participant rated only one sub-section of 90 items, and the order of the signs within a sub-section was randomized. A second set of iconicity ratings was collected from hearing participants for 54 signs because the dominant translation provided by the deaf signers ultimately turned out to be different for these signs than what was originally used (see below). Only the revised ratings appear in the database.

In an attempt to ensure that high and low frequency signs were evenly distributed across batches and within each sub-section of a batch, we used the frequency of English translations as a proxy for ASL frequency. We obtained the log10 word frequency score per million for each sign’s English translation from SUBTLEXUS and used these data to create sub-sections with similar frequency distributions. The sub-sections did not differ significantly in mean log10 word frequency scores, F(2, 971) = .38, p = .68.
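
The paper does not spell out the assignment algorithm, so the following is only one plausible way to build frequency-balanced sub-sections: sort the signs by the log10 SUBTLEXUS frequency of their English translations, deal them round-robin into sub-sections, and confirm with a one-way ANOVA that the sub-sections do not differ in mean frequency. The input file and column names are illustrative.

```python
# Hypothetical sketch of frequency-balanced sub-section assignment.
import pandas as pd
from scipy.stats import f_oneway

signs = pd.read_csv("signs_with_subtlexus.csv")        # assumed input file
signs = signs.sort_values("log10_freq").reset_index(drop=True)

n_subsections = 12                                     # e.g., 4 batches x 3 sub-sections
signs["subsection"] = signs.index % n_subsections      # round-robin assignment

# Check that mean log10 frequency does not differ across sub-sections.
groups = [g["log10_freq"].to_numpy() for _, g in signs.groupby("subsection")]
F, p = f_oneway(*groups)
print(f"F = {F:.2f}, p = {p:.2f}")
```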

Procedure

The sign recordings were exported at a frame rate of 29.93 frames/s, and signs were edited into individual video clips (there was no carrier phrase). The video clips (video dimensions 640 × 480 pixels) were uploaded to YouTube and incorporated into an online survey tool, Survey Monkey (http://www.surveymonkey.com) for the frequency ratings by deaf participants. For the iconicity ratings, the same video clips (315 × 420 pixels) were accessed and rated through Mechanical Turk by hearing participants.

Frequency rating procedure

Participants completed the rating studies remotely via Survey Monkey. At the beginning of each sub-section, participants viewed instructions recorded in ASL and written English (see the Appendix for English instructions). Each video clip was presented individually with the rating scale below the clip and participants rated the video on a 7-point scale based on how often they felt the sign appears in everyday conversation (1 = very infrequently, 7 = very frequently). Participants were asked to rate the model’s sign rather than their own sign, if their sign happened to be different. If participants were unfamiliar with a sign, they were asked to check a box labeled cannot rate because do not know the sign (this box was checked for only 1.5 % of the total responses). If participants encountered a technical difficulty (e.g., a video failed to load), they were asked to check a box labeled cannot rate because of technical difficulties. Technical difficulties were rare (only 0.5 % of video clips). Participants were permitted to take breaks within sections of the survey, as well as between the survey sections. However, participants were required to complete each batch within two weeks.

To obtain a measure of the internal validity of the participants’ frequency ratings across survey sections (four surveys, each divided into three sections), we included a small number of repeated signs in each survey section. The same ten signs were repeated in batches A and B, and five of these signs were repeated in batches C and D. Ratings for the five repeated signs were consistent across sections and did not differ statistically, F(11, 216) = 1.8, p = .053, ηp² = .06. Participants’ first and subsequent ratings for the five repeated signs also did not differ statistically, F(1, 427) = 3.7, p = .06, ηp² = .01, indicating that participants rated repeated signs consistently across the survey. Only first-time ratings for these repeated signs were included in ASL-LEX.

In addition to providing frequency judgments, participants were asked to provide an English translation for a subset of signs (N = 211). Signs were included in this subset when the expected English translation had a very low log10 word frequency score (<2.0) or when either pilot testing or native signer intuition suggested that the sign might be misperceived as another similar sign or that the sign may have more than one English translation. The signs for which English translations were requested were evenly distributed across the survey sections (roughly 20 % of signs in each section). For each sign in this subset, participants provided English translations by typing into a response box provided on the screen below the rating scale, immediately after rating the sign for frequency. If a participant indicated that they did not know the sign, any translation attempt was not counted.

For signs in this subset, the most frequent English translation (dominant translation) provided by participants was used to determine the Entry Identifier used in the database (see below). The percent agreement for the English translation for these signs is given in ASL-LEX for all participants and separately for native signers. If a participant provided more than one translation of a sign, only their initial response was used to calculate the percentage of dominant and non-dominant translations. All additional translations other than the initial translations and their counts are listed in a separate file entitled “English_Translations.csv”.

In some cases, participants provided English translations that were inflectionally related. Morphological inflections for aspect (e.g., SURF and SURFING), number (FLOWER and FLOWERS), or gender (WAITER and WAITRESS) were collapsed together when estimating English translation consistency. Following Bates et al. (2003), we defined morphological alteration as “variation that shares the word root or a key portion of the word without changing the word’s core meaning” (p. 7). The breakdown of percentages for the translation variants is listed in the English Translations tab, along with a list of the non-dominant glosses. For example, percentage agreement for the sign SURF (verb) is listed as 83.9 %, and this percentage reflects the combination of the inflectional variants SURF (54.8 %) and SURFING (29.0 %); the non-dominant glosses for SURF were SKATEBOARD (9.7 %), RIDE (3.2 %), and SURFER (3.2 %). If a participant provided more than one translation for a sign, the additional translation(s) is also provided in the English Translations tab.

Iconicity rating procedure

Instructions were adapted from Vinson et al. (2008) and customized for use with non-signing participants (see Appendix). Instructions were presented in spoken English in a video with examples of ASL signs across the iconicity spectrum, and the instructions were also available in written English. Each clip was presented individually with the English translation and rating scale located below the clip, and participants rated the video on a 7-point scale based on how much the sign “looks like what it means” based on its English translation (1 = not iconic at all, 7 = very iconic). If participants encountered a technical difficulty (e.g., a video failed to load), they were asked to check a box labeled technical issues (could not rate). Participants were also able to check a box labeled prefer not to respond. Technical difficulties and abstaining responses were rare (only 0.2 % of video clips).

Because a different set of participants rated each survey section, all participants rated a set of five or ten “catch” signs in order to ensure that ratings were consistent across groups of participants. Ratings for these catch signs were consistent (did not differ statistically) across sections, F(12, 1558) = 0.88, p = 0.57, ηp² = .0005, and participants, F(381, 1558) = 0.205, p = 1.00, ηp² = .048. To further verify the accuracy of these ratings, an additional ten signs that were mislabeled were added to each survey (e.g., participants were asked to rate the iconicity of the sign GUESS when given “screwdriver” as its English translation). A Wilcoxon rank sum test revealed that the mislabeled signs were rated as less iconic (Mdn mislabeled = 1) than the properly labeled signs (Mdn correctly labeled = 3), W = 79,808,358, p < 0.0001. This result indicates that participants made rational judgments about the relationship between sign forms and meanings, and did not rate all videos as highly iconic.

Phonological transcription procedures

Two ASL students independently coded the Major Location, Selected Fingers, and Sign Type for each sign. A hearing native signer (NC) checked all of these codes and arbitrated any disagreements. The hearing native signer also coded all of the signs for Minor Location, Flexion, and Movement. To check reliability once all of the signs had been coded, a randomly selected subset of roughly 20 % of the signs (200 items) was also coded by a different hearing (non-native) ASL signer. Cohen’s kappa showed that all properties were coded reliably (κ Sign Type = 0.82, κ Major Location = 0.83, κ Minor Location = 0.71, κ Selected Fingers = 0.90, κ Flexion = 0.75, κ Movement = 0.65; all ps < 0.01).
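
The reliability check described above can be reproduced with a standard Cohen’s kappa computation, as in the sketch below. It assumes a hypothetical file containing the roughly 200 doubly coded signs with one column per coder and property; the column names are placeholders rather than actual ASL-LEX fields.

```python
# Sketch of inter-coder reliability using Cohen's kappa for each property.
import pandas as pd
from sklearn.metrics import cohen_kappa_score

codes = pd.read_csv("reliability_subset.csv")          # ~20% of signs, both coders
for prop in ["SignType", "MajorLocation", "MinorLocation",
             "SelectedFingers", "Flexion", "Movement"]:
    kappa = cohen_kappa_score(codes[f"{prop}_coder1"], codes[f"{prop}_coder2"])
    print(f"{prop}: kappa = {kappa:.2f}")
```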

ASL-LEX properties

Sign identification

Two kinds of glosses were generated for each sign: Entry Identifiers (EntryID, Column A) were designed to uniquely identify every video in the database, and Lemma Identifiers (LemmaID, Column B) were designed to identify each lemma in the database, grouping together phonological and inflectional variants. The purpose of these glosses is to make ASL-LEX compatible with a machine-readable corpus of ASL (e.g., as the controlled vocabulary) and to allow for comparisons between the items in ASL-LEX and corpora. EntryIDs are single English words that are evocative of the canonical meaning of the target sign. Where participants provided an English translation, the dominant translation was used as the EntryID; otherwise, EntryIDs were generated by a deaf native signer and evaluated by a hearing native signer (NC). For four pairs of signs, one English word was deemed the best gloss for both members of the pair (e.g., “fall” was used to identify a sign referring to the event of losing balance and a sign referring to the autumn season). In these cases, a number was appended to the gloss (e.g., fall_1 and fall_2). LemmaIDs, also referred to as ID glosses, were selected according to Johnston (2014) and Fenlon, Cormier, and Schembri (2015). Each LemmaID is an English word that is used to refer to all phonological and inflectional variants of a single lemma. ASL-LEX currently includes only 14 lemmas that have more than one entry, but this category will become increasingly important as ASL-LEX expands and as corpora are developed. It is important to note that the primary purpose of EntryIDs and LemmaIDs is to uniquely identify each video and lemma in the database. As such, they may not be accurate translations, particularly because meanings can change with context. Furthermore, these identifiers cannot be reliably used to ascertain the lexical class of the sign.

Frequency

For each sign entry, ASL-LEX provides the mean, standard deviation, and the Z score for ASL frequency ratings from all participants, along with the number of raters and the percentage of participants who did not know the sign (columns C–G). Z scores were calculated over each participant. The data for native signers only are given in columns H-L of the database. The percent agreement with the English translations (EntryIDs) for all participants and for native signers is provided in columns N and O, respectively. Signs that were not selected for translation are left blank. The log10 word frequency of the English translation (from SUBTLEXUS) for each sign is provided in column T.

Iconicity

For each sign, ASL-LEX provides the mean iconicity rating, standard deviation, and the Z-score for ratings from hearing participants, along with the number of raters for each sign (columns P-S). Z-scores were calculated over each participant, normalizing for differences in how individuals used the rating scale.
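
The within-rater standardization described above can be reproduced from the trial-level files released with the database; the sketch below assumes columns named "rater", "sign", and "rating", which are placeholders for the actual column names.

```python
# Sketch of per-participant z-scoring followed by averaging over signs.
import pandas as pd

trials = pd.read_csv("IconicityTrialData.csv")
# Standardize each rater's ratings against that rater's own mean and SD.
trials["z"] = trials.groupby("rater")["rating"].transform(
    lambda x: (x - x.mean()) / x.std(ddof=1))
# Average the raw and z-scored ratings for each sign.
sign_means = trials.groupby("sign")[["rating", "z"]].mean()
print(sign_means.head())
```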

Lexical information

The lexical class is listed for each ASL sign in the database (column U). There are 605 nouns, 186 verbs, 108 adjectives, 23 adverbs, and 78 closed-class items (conjunctions, prepositions, interjections, pronouns). Lexical class was coded by two native signers trained in linguistics who judged the most common use of each sign. This information should be interpreted with caution because in many cases the lexical class of a sign depends on the context in which it is used. Whether a sign is a compound, an initialized sign, or a fingerspelled loan sign is indicated in columns V-X respectively. Fingerspelled loan signs are those that include more than one letter of the manual alphabet (#STAFF includes the manual letters S and F, #BACK includes all four manual letters). An initialized ASL sign contains a single handshape that represents the first letter of the corresponding English word for that sign. For example, the ASL sign WATER is signed with a “W” handshape touching the chin. Lexicalized fingerspelled signs are not included in the initialized signs subset. There are 60 compounds, 126 initialized signs, and six fingerspelled loan signs in ASL-LEX.

Sign length (onset and offset) and clip length

As the video clips were created to elicit frequency and iconicity judgments and were not designed for use as stimuli in psycholinguistic experiments, the onsets and offsets of the clips vary due to differences in editing procedures. Therefore, we have included timing information for the sign onset and offset within each video clip, along with the sign and clip lengths in milliseconds (columns Y–AB). Sign onset was defined as the first video frame in which the fully formed handshape contacted the body for body-anchored or two-handed signs (e.g., ACCOUNTANT, BUTTERFLY). If the sign did not have contact (e.g., DRINK), sign onset was defined as the first video frame in which the fully formed handshape arrived at the target location near the body or in neutral space before starting the sign movement. Sign offset was defined as the last video frame in which the hand contacted the body for body-anchored or two-handed signs (e.g., BRACELET). If the sign did not end with contact (e.g., BOOK), the offset was defined as the last video frame before the hand(s) began to transition to the rest position. When no clear onset frame was present in the video clip because there was no initial hold (e.g., FIND), sign onset was coded from the first frame in which the fully formed handshape appeared. These criteria for determining sign onset and offset are very similar to those used by Johnson and Liddell (2011) and by Crasborn, Bank, Zwitserlood, van der Kooij, de Meijer, and Sáfár (2015). Agreement for sign onset coding among three independent coders for 205 signs (20 % of the data) was 91.2 %. Agreement for sign offset between two independent coders for these same signs was 87.3 %. All coders were hearing ASL signers.
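
Because onset and offset were coded on video frames, durations in milliseconds follow directly from the export frame rate reported earlier (29.93 frames/s). The sketch below shows the conversion; the frame numbers are invented for illustration.

```python
# Sketch of converting coded frame numbers to milliseconds at 29.93 frames/s.
FPS = 29.93

def frames_to_ms(n_frames: float, fps: float = FPS) -> float:
    """Convert a number of video frames to milliseconds."""
    return n_frames / fps * 1000.0

onset_frame, offset_frame = 12, 31                     # hypothetical coded frames
sign_length_ms = frames_to_ms(offset_frame - onset_frame)
print(f"sign length = {sign_length_ms:.0f} ms")        # about 635 ms
```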

Phonological coding

The goal of the phonological coding (columns AC – AH) was to identify the major formal properties of the signs using a theory of sign language phonology that allowed us to generate discrete values and to capture dependencies among properties. To this end, phonological coding was guided by Brentari’s Prosodic Model (Brentari, 1998) as applied to ASL, with some additions and exceptions outlined below. The Prosodic Model is an autosegmental theory of phonology, which aligns reasonably well with other prominent theories of sign language phonology, such as the Hand Tier model (Sandler, 1989) and the Dependency Model (van der Hulst, 1993). The advantage of using a phonological rather than phonetic description (Gutierrez-Sigut et al., 2015) is that the descriptions can be more easily generalized to other productions and to other signers.

Additionally, using Brentari’s model made it possible to capture a large amount of information by coding only a few properties. The Prosodic Model, perhaps more so than other models (e.g., Liddell & Johnson, 1989), can be used to reduce redundancy because it does not separately specify sub-lexical properties that are predictable from other sub-lexical properties (e.g., it is not necessary to describe the specifications of the non-dominant hand if the sign is symmetrical, and it is not necessary to describe the flexion of the unselected fingers because this is predicted by the flexion of the selected fingers). The six properties described below were coded because each has substantial discriminatory power. Although these six properties do not fully describe each sign and alone are insufficient to uniquely identify all 993 signs, with only these properties it was possible to uniquely identify about half of the signs (52 % of signs were uniquely identified, and 32 % shared a phonological transcription with fewer than three other signs). These six sub-lexical properties do not uniquely identify each sign because the phonological descriptions exclude properties like thumb position, abduction, contact with the major location, non-manual markers, configuration of the non-dominant hand, and internal movements.
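
The uniqueness figure quoted above can be checked directly from the transcriptions by grouping signs on their six coded properties and counting singleton groups, as in this sketch; the column names stand in for the corresponding ASL-LEX fields.

```python
# Sketch of counting signs uniquely identified by their six-property transcription.
import pandas as pd

signs = pd.read_csv("SignData.csv")
props = ["SignType", "MajorLocation", "MinorLocation",
         "SelectedFingers", "Flexion", "Movement"]          # assumed column names
group_sizes = signs.groupby(props).size()                   # signs per transcription
n_unique = int((group_sizes == 1).sum())
print(f"{n_unique} of {len(signs)} signs have a unique transcription "
      f"({100 * n_unique / len(signs):.0f}%)")
```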

Sign type

Signs were coded using the four Sign Types described by Battison (1978): one-handed, two-handed and symmetrical or alternating, two-handed asymmetrical with the same hand configuration, and two-handed asymmetrical with different hand configurations (column AC). An additional category (“other”) was used to identify signs that violate either the Symmetry or Dominance Condition (Battison, 1978). The Symmetry Condition states that if both hands in a sign move, the other specifications of both hands (e.g., location, hand configuration) must be identical, and the movement must be synchronous or in 180° asynchrony. The Dominance condition states that in a two-handed sign, if only one hand moves, the inventory of non-dominant handshapes is limited to one of seven handshapes (B A S 1 C O 5). In total, 16 signs (17 entries) violated either the Symmetry or Dominance conditions.

Location

Location was divided into two categories (major and minor), following the concepts of Major Region and Minor Region proposed by Brentari’s Prosodic Model (Brentari, 1998). The Major Location of the dominant hand relative to the body comprised five possible values: head, arm, trunk, non-dominant hand, and neutral space (column AD). Though some signs—primarily compounds—are produced in multiple Major Locations, our coding reflects only the location at sign onset. Signs may or may not make contact with the Major Location (e.g., RADIO is produced near, but not touching, the head, and is coded as having a “head” location). The non-dominant hand was only considered the Major Location if the Sign Type was asymmetrical (i.e., if the non-dominant hand was stationary). The Prosodic Model states that for symmetrical/alternating signs the features of the non-dominant hand are the same as those of the dominant hand.

There are five Major Locations and each Major Location, except neutral space, was divided into eight Minor Locations (column AE). All 33 locations are listed in a separate file called “Key.csv.” Though many signs are produced in multiple Minor Locations, the coding only includes the Minor Location at sign onset.

Selected fingers

In keeping with Brentari (1998), Selected Fingers (column AF) was defined as the group of fingers that move. The Selected Fingers are coded only for the first free morpheme in compounds, and the first letter of fingerspelled loan signs. If none of the fingers moved, the distinction between selected fingers and non-selected fingers was ambiguous. In these cases, it was assumed that the non-selected fingers must either be fully open or fully closed (Brentari, 1998). If one set of fingers was neither fully extended nor fully flexed, this group of fingers was considered selected. If the ambiguity was still not resolved, the Selected Fingers were those that appeared foregrounded. The thumb was never coded as a selected finger unless it was the only selected finger in the sign.

Flexion

The selected fingers were assigned one of nine degrees of flexion from The Prosodic Model (Brentari, 1998). Flexion of the selected fingers was only coded at sign onset (column AG). The first seven degrees of flexion (coded as 1–7) roughly map on to an ordinal scale of increasing flexion (1 = fully extended), and the last two degrees of flexion are “stacked” (flexion of the selected fingers differs as in the fingerspelled letter “K”) and “crossed” (the fingers overlap as in the fingerspelled letter “R”).

Movement

The path of movement of the dominant hand through x-y-z space was coded for only one type of movement (column AH). Three categories (arc, circular, and straight) corresponded to the “path feature” from Brentari (1998). A fourth category, “back and forth,” was similar to the “zigzag” movement in the HamNoSys system (Hanke, 2004), and was used to code signs with movements that change direction by 180° at least once (e.g., IMPOSSIBLE, PIPE, TIME) and signs with multiple direction changes of less than 180°, so long as the angle of each direction change and the length of each segment are the same (e.g., LIGHTNING, SNAKE, DOLPHIN). Signs without a path movement were coded as “none” (e.g., APPLE has a wrist-twisting motion, but no path movement). Because path movements were restricted to those in which the hand changes position in x-y-z space, hand rotation and internal movements were not coded as movement. Although both types of movement are important phonological properties, they were not coded because their discriminant power appears to be lower than that of the six phonological properties included in ASL-LEX. However, we plan to elaborate the existing phonological transcriptions in the future to capture more fine-grained distinctions between signs. Signs that did not fit any of these categories or that included more than one path movement were coded as “other” (e.g., CANCELLATION has two distinct straight path movements). The length of the movement was ignored (i.e., a straight movement could be short, as in ZERO, or long, as in NORTH). The values presented here represent the movement of the first free morpheme of the sign.

Neighborhood density

Neighborhood density for spoken language is typically defined as the number of words that differ from the target word by the substitution, insertion, or deletion of one grapheme or phoneme (Coltheart, Davelaar, Jonasson, & Besner, 1977; Luce & Pisoni, 1998). ASL-LEX includes three measurements of neighborhood density that are roughly parallel to this definition. The first (Maximal Neighborhood Density, column AI) defines neighbors as signs that share any four of the five sub-lexical properties described above. Because the five sub-lexical properties offered in ASL-LEX do not uniquely identify each sign, the neighborhood density definitions offered here differ from the traditional definitions used for spoken language in that neighbors are not necessarily true minimal pairs. The distribution of Maximal Neighborhood Density values was extremely skewed toward fewer neighbors (Mdn = 27; see Fig. 1A). The distribution of spoken English neighborhood density in the English Lexicon Project (Balota et al., 2007) is also skewed toward fewer neighbors (Min = 0, Max = 48, Mdn = 0). Signed languages are thought to have unusually small numbers of neighbors relative to spoken languages (true minimal pairs are extremely rare; van der Kooij, 2002), so Maximal Neighborhood Density may not fully capture the phonological structure of the lexicon. For this reason, an additional neighborhood density measure (Minimal Neighborhood Density, column AJ) was added that defines neighbors as signs that overlap with the target in at least one feature of any kind. The median Minimal Neighborhood Density is 780 (see Fig. 1B). Because Minimal Neighborhood Density includes quite distant neighbors, the distribution is skewed toward more neighbors, and there is a ceiling on this measure, namely the total number of signs in the lexicon. A third neighborhood density measure, Parameter-Based Neighborhood Density, was included because it most closely reflects the tendency in the signed language literature to focus on three phonological parameters (movement, location, and handshape). Parameter-Based Neighborhood Density defines neighbors as signs that share all four of these phonological properties: Movement, Major Location, Selected Fingers, and Flexion (the last two properties constitute the “handshape” parameter) (Mdn = 3; see Fig. 1C).
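
The three density measures can be computed directly from the transcriptions, as in the sketch below. The paper computes Maximal and Minimal Neighborhood Density over five sub-lexical properties; exactly which five of the six coded properties are used is not restated here, so the set below (which drops Minor Location) is an assumption, as are the column names.

```python
# Sketch of the three neighborhood density counts described above.
import pandas as pd

signs = pd.read_csv("SignData.csv")
five_props = ["SignType", "MajorLocation", "SelectedFingers", "Flexion", "Movement"]
param_props = ["Movement", "MajorLocation", "SelectedFingers", "Flexion"]

five_vals = signs[five_props].to_numpy()
param_vals = signs[param_props].to_numpy()
maximal, minimal, parameter = [], [], []
for i in range(len(signs)):
    shared = (five_vals == five_vals[i]).sum(axis=1)   # features shared with sign i
    shared[i] = 0                                      # exclude the target sign itself
    maximal.append(int((shared >= 4).sum()))           # share at least 4 of 5 features
    minimal.append(int((shared >= 1).sum()))           # share at least 1 of 5 features
    same_params = (param_vals == param_vals[i]).all(axis=1)
    parameter.append(int(same_params.sum()) - 1)       # share all 4 parameters, minus target

signs["MaximalND"] = maximal
signs["MinimalND"] = minimal
signs["ParameterND"] = parameter
```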

Fig. 1. Frequency distribution of neighborhood density measurements. Note that the axes are not the same

Sub-lexical frequency

The neighborhood density measures described above count shared sub-lexical properties irrespective of the type of property (i.e., location, movement, handshape). However, much of the linguistic work on sign languages has focused on the relationship between signs that share a particular sub-lexical feature (e.g., location) and the “neighborhood density” for that sub-lexical feature (e.g., location neighborhood density, handshape neighborhood density; Baus, Gutiérrez-Sigut, Quer, & Carreiras, 2008; Baus, Gutiérrez, & Carreiras, 2014; Corina & Emmorey, 1993; Corina & Hildebrandt, 2002; Corina & Knapp, 2006; Dye & Shih, 2006). ASL-LEX offers several measures that are akin to these “one shared feature” neighborhood density measures. However, when neighbors are defined as signs that share only one sub-lexical property, neighborhood density is almost identical to the frequency of that sub-lexical property in the language (sub-lexical frequency values are always one larger than neighborhood density, as neighborhood density excludes the target sign from the count). For this reason, we refer to these types of measurements as sub-lexical frequency (e.g., major location frequency) rather than neighborhood density (e.g., major location neighborhood density).

For each of the six sub-lexical properties, ASL-LEX includes a sub-lexical frequency measurement which is a count of the number of signs that are specified for that phonological property. Because previous research has looked at relationships among signs that share the same handshape, one additional measurement was created to estimate handshape frequency in which handshapes were defined as unique combinations of selected fingers and flexion. Using this measure, ASL-LEX includes 26 unique handshapes.
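
Sub-lexical frequencies of this kind are simple value counts over the database, and handshape frequency is the count of each unique Selected Fingers plus Flexion combination. The sketch below illustrates the computation; the column names are placeholders for the corresponding ASL-LEX fields.

```python
# Sketch of sub-lexical frequency and handshape frequency counts.
import pandas as pd

signs = pd.read_csv("SignData.csv")
for prop in ["SignType", "MajorLocation", "MinorLocation",
             "SelectedFingers", "Flexion", "Movement"]:
    counts = signs[prop].value_counts()                 # signs per feature value
    signs[f"{prop}Frequency"] = signs[prop].map(counts)

# Handshape = unique combination of Selected Fingers and Flexion.
handshape = signs["SelectedFingers"].astype(str) + "-" + signs["Flexion"].astype(str)
signs["HandshapeFrequency"] = handshape.map(handshape.value_counts())
print(handshape.nunique(), "unique handshapes")
```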

Results and discussion

In order to examine the structure of the ASL lexicon as represented in ASL-LEX, we conducted a number of analyses. First, we describe the distribution of sign frequency and compare the frequency ratings in ASL-LEX to frequency ratings in other datasets (one of ASL and one of BSL). We then describe the distributions of iconicity, phonological properties, and neighborhood density. Because many lexical properties are correlated with one another in spoken language, we ask how the lexical properties in ASL-LEX are related to one another (e.g., are sign frequency and neighborhood density correlated?). Lastly, we ask whether lexical properties (e.g., frequency) influence the duration of the signs in ASL-LEX.

Frequency

Frequency ratings were distributed evenly across the scale (Fig. 2A). As examples, STETHOSCOPE (M = 1.333), EMPEROR (M = 1.407), and CASTLE (M = 1.579) were rated among the least frequent signs in ASL-LEX, and WATER (M = 6.963), YOU (M = 6.889), and ME (M = 6.76) were rated among the most frequent signs.

Fig. 2. Frequency histograms showing the distribution of raw frequency ratings [A] and raw iconicity ratings [B] of signs in ASL-LEX

We conducted a comparison of subjective frequency estimates from ASL-LEX and another independent dataset that used the same 1–7 rating scale for ASL signs, although the ratings were from deaf ASL signers residing in Canada (Mayberry et al., 2014). We verified that a total of 297 items shared the same sign form in both datasets. The raw frequency ratings in the two datasets were moderately correlated (rs = 0.65, p < .001), suggesting good external validity. We also conducted a cross-linguistic comparison between raw subjective frequency estimates for a subset of 226 ASL and BSL signs from Vinson et al. (2008) that had translation equivalents in English (same rating scale). The results revealed a moderate correlation (rs = 0.52, p < .001), suggesting that signs expressing similar concepts in two different sign languages (evidenced by the same English translation) tend to receive similar frequency estimates. Using Fisher’s r-to-z transformation, we found that the ASL-ASL correlation was stronger than the ASL-BSL correlation (z = 2.24, p = 0.0251). In addition, raw frequency ratings were moderately correlated with log10 word frequencies of their English translations from SUBTLEXUS (rs = 0.58, p < .001). The ASL-ASL correlation did not statistically differ from the ASL-English correlation (z = 1.69, p = 0.091).
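
The comparison of the two correlations uses the standard Fisher r-to-z test for independent correlations; the sketch below reproduces the reported z value from the correlations and item counts given above.

```python
# Fisher r-to-z comparison of two independent correlations.
from math import atanh, sqrt
from scipy.stats import norm

def compare_correlations(r1: float, n1: int, r2: float, n2: int):
    """Two-tailed Fisher r-to-z test for two independent correlations."""
    z = (atanh(r1) - atanh(r2)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    p = 2 * (1 - norm.cdf(abs(z)))
    return z, p

# ASL-ASL: r = 0.65 over 297 items; ASL-BSL: r = 0.52 over 226 items.
z, p = compare_correlations(0.65, 297, 0.52, 226)
print(f"z = {z:.2f}, p = {p:.4f}")    # z = 2.24, p = 0.0251, as reported above
```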

Interestingly, Bates et al. (2003) found similar lexical frequency correlations among seven spoken languages (mean r = .51, SD = .079), even for typologically diverse languages such as English and Mandarin Chinese (r = .53). Bates et al. (2003) hypothesize that the relatively high consistency of ratings across languages may be related to shared cultural experiences. This would be a likely explanation for the finding that the correlations within the same language and/or culture (ASL-ASL; ASL-English) were stronger than the correlation across languages and cultures (ASL-BSL). There might be concepts that are more culturally relevant to deaf communities and the surrounding hearing communities in North America than in the UK, and this could influence the frequency with which some words/signs are used in each language, as well as participants’ familiarity with the concepts themselves.

Iconicity

Iconicity ratings were skewed towards the lower end of the scale (Fig. 2B), indicating that signs contained in ASL-LEX were generally considered to have low iconicity values. Although we selected the signs with the intention of achieving a normal frequency distribution, contra Vinson et al. (2008) we did not select signs with a target iconicity distribution (i.e., specifically selecting signs at both ends of the iconicity distribution). Our results thus indicate that lexicalized ASL signs tend to be of low iconicity when frequency is normally distributed. BOOK (Mean Iconicity = 6.684) and ZIPPER (M = 6.394) are among the most iconic signs in ASL-LEX, and YESTERDAY (M = 1.086) and LAZY (M = 1.567) are among the least iconic signs.

Phonological properties

The distribution of phonological properties can be seen in Fig. 3. The “neutral” minor location was by far the most frequent (N = 345); the next four most frequent locations were the palm (N = 92), other (N = 76), chin (N = 60), and eye (N = 60). The remaining 28 minor locations were each represented by 50 or fewer signs. The values of the minor locations can be found in the Key.csv file at OSF (http://osf.io/53vmf/).

Fig. 3. Frequency distribution of phonological properties in ASL-LEX. For selected fingers, m = middle finger, r = ring finger, p = pinky, i = index finger. For Sign Type, HS = handshape

The distributions of phonological properties in ASL-LEX roughly match other available estimates for ASL. Estimates based on the Stokoe system (Stokoe, 1965) have found that handshapes that select all four fingers (B, A, C, and 5) are among the most frequent, followed by those that select the index finger (G, X, L); those that select other fingers are less common (Klima & Bellugi, 1979; Henner, Geer, & Lillo-Martin, 2013). Focusing only on single selected fingers, Ann (1996) also found that for both ASL and Taiwanese Sign Language, the index finger was used more than the thumb, followed closely by the pinky finger. All of these patterns are reflected in our data on selected fingers (Fig. 3, top left panel). The most common handshapes in the Stokoe system use fully extended flexion positions, with other positions being less common (Klima & Bellugi, 1979; Henner, Geer, & Lillo-Martin, 2013). This too is reflected in our flexion counts (Fig. 3, bottom left panel), where signs with fully extended flexion account for 49 % of the database.

Relationships among lexical and phonological properties

Next we examined relationships among the lexical and phonological properties of the signs in ASL-LEX to gain insight into how phonological, lexical, and semantic factors interact in the ASL lexicon. Frequency and iconicity z-scores (SignFrequency(Z) and Iconicity(Z)) were significantly negatively correlated with each other (see Table 1), with more frequent signs rated as less iconic; however, this relationship was weak, rs = –0.14, p < 0.001. Although it is possible that this inverse correlation is driven by the relatively higher frequency of closed-class words, which may be lower in iconicity than other signs, the negative correlation remains when closed-class words (i.e., words with a “minor” Lexical Class) are excluded (rs = –0.17, p < 0.001). This result is compatible with the early proposal that, with frequent use, signs may move away from their iconic origins, perhaps due to linguistic pressures to become more integrated into the phonological system (Frishberg, 1975). Interestingly, the direction of this relationship was the opposite of that found for British Sign Language; that is, Vinson et al. (2008) reported a weak positive correlation between frequency and iconicity: r = .146, p < .05. These different results might be due to cross-linguistic differences in the properties of the BSL and ASL lexicons. Alternatively, the different correlations might be due to differences in stimuli selection: Vinson et al. (2008) intentionally selected stimuli that had a range of iconicity values, which resulted in a bimodal iconicity distribution, whereas we did not select signs for inclusion in ASL-LEX based on their iconicity.

Table 1 Spearman correlations among continuous lexical properties

A number of phonological properties are highly correlated, and in many cases this is due to the way they are defined (see Table 1). For example, each major location is comprised of one or more minor locations; high frequency minor locations will thus almost invariably be found in higher frequency major locations, and handshape frequency is similarly related to selected finger and flexion frequency. Likewise, all three measures of Neighborhood Density are highly correlated with one another, partially because they are similarly defined and partially because any neighbors that share four of the five sub-lexical properties (Maximal Neighborhood Density) will necessarily also share at least one of the five sub-lexical properties (Minimal Neighborhood Density). Finally, all three Neighborhood Density measures are correlated with each of the sub-lexical frequency measures. This makes sense given that, by definition, common sub-lexical properties appear in many signs.

Interestingly, the basic sub-lexical frequencies are completely uncorrelated with each other, with the exception of selected fingers and minor location which are significantly but weakly correlated (r = .10, p < .01). This finding suggests that the space of possible ASL signs is rather large as each sub-lexical property can (to a first degree of approximation) vary independently of the others. This property contrasts with spoken languages where phoneme frequency is correlated across different syllable positions. For example, using position-specific uniphone frequencies from Vitevitch and Luce (2004) we estimate that in English monosyllabic words, vowel frequency is negatively correlated with the frequency of the preceding consonant (r = –.07, p < .001) and positively correlated with the following consonant (r = .17, p < .001), and that onset consonants have highly correlated frequencies (r = –.51, p < .001). We speculate that the relative independence of ASL sub-lexical features is related to both the motoric independence of the manual articulators (e.g., finger flexion is unaffected by the location of the hand in signing space) as well as the relative simultaneity of manual articulation (as opposed to serial oral articulation). We note that these non-significant correlations are for sub-lexical frequency only; specific sub-lexical properties have been argued to co-vary systematically (e.g., signs produced in locations far from the face may be more likely to be symmetrical, two-handed, and have larger, horizontal, and vertical motions; Siple, 1978).

Another interesting finding is that signs with many neighbors tend to be more iconic (see Table 1). One explanation for this finding is that signs with many neighbors are constructed from more typical sub-lexical properties (e.g., all four fingers, in neutral space), and these typical sub-lexical properties may be more amenable to iconicity. For example, one of the ways that lexical items can be iconically motivated is by demonstrating the way something is handled (Padden et al., 2013), and these handling configurations may be most compatible with more typical sub-lexical properties. For example, a grasping action, as in the signs CANOE, HAMMER, and PULL, recruits very common sub-lexical properties: all four fingers in the neutral location. Another example of a systematic relationship between semantics and phonology involves articles of clothing. For example, signs like PANTS, DRESS, and SKIRT are all fairly iconic and are produced with relatively common sub-lexical properties: the locations are all on the body (depicting where the clothes are worn), and recruit all four fingers fully extended. More research is needed to better understand the relationship between neighborhood density and iconicity.

There is a small correlation between frequency and neighborhood density (for both Parameter-Based, rs = 0.11, p < 0.001, and Minimal Neighborhood Density, rs = 0.13, p < 0.001), such that high frequency signs tend to have many neighbors. This correlation is similar in size and direction to that found for spoken language (e.g., Frauenfelder, Baayen, Hellwig, & Schreuder, 1993; Landauer & Streeter, 1973). This result suggests that words that occur frequently are also those that are more phonologically confusable with other items in the lexicon. This correlation and the others reported in Table 1 underscore the need to be aware of (and possibly control for) collinearity among lexical variables in psycholinguistic research.

Duration

We conducted exploratory analyses of the relationship between sign duration and lexical properties. The duration data were derived from the single signer who produced the signs for ASL-LEX. She signed at a natural rate and as consistently as possible across all signs; however, because the productions were not recorded with the intention of measuring articulatory duration, these preliminary analyses should be interpreted with caution. We found a weak negative correlation between raw sign frequency and sign duration (as determined by sign onset and offset; see above), indicating that more frequent signs are shorter (r_s = –.25, p < .001). This trend is consistent with work on spoken languages showing that word frequency is inversely related to phonetic duration, though the correlation in spoken language is generally weaker (Bell et al., 2009; Caselli, Caselli, & Cohen-Goldberg, 2015; Cohen-Goldberg, 2015; Gahl et al., 2012).

Though a number of studies have found that neighborhood density predicts word duration in spoken language (Caselli et al., 2015; Gahl, 2008; Gahl et al., 2012), we find no such relationship here between any of the neighborhood density measures and sign duration (see Table 1). While this result may reflect differences in lexical access across the signed and spoken modalities, the duration data come from a single signer and may not generalize. More work is needed to determine whether the lack of a relationship between neighborhood density and sign duration is an artifact of a single signer producing citation-form signs or reflects a true linguistic difference between signed and spoken languages.

The data also revealed a correlation between raw iconicity ratings and sign duration (r_s = .11, p < .001), suggesting that more iconic signs are longer. Frequency may be a confounding variable, however, because in ASL-LEX less frequent signs are both longer and more iconic.
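
One way a user of the database could probe this potential confound (this is not an analysis reported here) is a rank-based partial correlation relating iconicity to duration after partialling out frequency. The sketch below residualizes the ranks on the ranked covariate and correlates the residuals; the column names in the usage comment are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata, pearsonr

def partial_spearman(x, y, covariate):
    """Spearman-style correlation between x and y after removing the linear
    effect of the ranked covariate from the ranks of x and y."""
    rx, ry, rc = rankdata(x), rankdata(y), rankdata(covariate)
    resid_x = rx - np.polyval(np.polyfit(rc, rx, 1), rc)  # residual ranks of x
    resid_y = ry - np.polyval(np.polyfit(rc, ry, 1), rc)  # residual ranks of y
    return pearsonr(resid_x, resid_y)

# Hypothetical usage with columns loaded from SignData.csv (names are illustrative):
# r, p = partial_spearman(df["Iconicity"], df["SignDuration"], df["SignFrequency"])
```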

Using ASL-LEX

The entire database is available for download as a set of CSV files through the Open Science Framework (http://osf.io/53vmf/). CSV is a non-proprietary plain-text format that can be opened by any text editor and many common statistical programs such as R. The primary data file is “SignData.csv”, which lists the lexical properties for each sign. Each sign’s neighbors are provided in a separate file, “Neighbors.csv”, and English translation data for a subset of the signs are available in “EnglishTranslation.csv”. In addition to the averaged frequency and iconicity values listed in “SignData.csv”, raw trial-level data (data from each rater for each sign) are provided in “FrequencyTrialData.csv” and “IconicityTrialData.csv”, respectively. Finally, the file “Key.csv” provides an explanation of the various terms used in the other data files.
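
As an example of getting started with the download, the following sketch loads the CSV files in Python with pandas; R or a spreadsheet program would work equally well. The file names are those listed above, and the columns within each file are documented in "Key.csv".

```python
import pandas as pd

signs = pd.read_csv("SignData.csv")                   # one row per sign: lexical and phonological properties
neighbors = pd.read_csv("Neighbors.csv")              # each sign's phonological neighbors
translations = pd.read_csv("EnglishTranslation.csv")  # English translation data for a subset of signs
freq_trials = pd.read_csv("FrequencyTrialData.csv")   # raw frequency ratings (one row per rater per sign)
icon_trials = pd.read_csv("IconicityTrialData.csv")   # raw iconicity ratings (one row per rater per sign)
key = pd.read_csv("Key.csv")                          # definitions of the terms used in the other files

print(signs.shape)  # number of signs and number of coded columns
```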

The files containing the data reported here have been archived using the OSF's "Registrations" feature, which creates a permanent, uneditable copy of each file. This ensures that, although the database may be updated and expanded in the future, the data reported here will always be available in their current form, maximizing reproducibility. To access archived versions of the database, click on the "Registrations" button at the top of the ASL-LEX OSF page.

The data are also available for browsing and searching on the ASL-LEX website, http://asl-lex.org. As depicted in Fig. 4A, signs are represented visually by nodes; larger nodes indicate signs with higher subjective frequency. Signs are organized into parameter-based neighborhoods by connecting signs that are neighbors (i.e., those that share selected fingers, flexion, movement, and major location). This organization was chosen because these parameters are commonly used in the sign language literature and are therefore more likely to be useful to researchers and educators. Additionally, under this definition neighborhoods are fully connected (signs in a given neighborhood are all neighbors of one another), which makes the visualization easier to read. Users can filter the visualization to view only signs with particular properties (e.g., a filter showing only signs that select the index finger is applied in Fig. 4B). Selecting a node displays all of the information about that sign (see Fig. 4C). Users can also download the data for either the entire database or the filtered subset (the videos are not included in the download).

Fig. 4

Screenshots of http://asl-lex.org: the entire lexicon (A), the lexicon filtered to show only signs that select the index finger (B), and detailed information for the sign FURNITURE (C)

Conclusion

With 45 properties coded for 993 signs, ASL-LEX is the largest and most complete publicly available repository of information about the ASL lexicon to date. It offers detailed lexical and phonological information about these signs, including frequency, iconicity, phonological composition, and neighborhood density. ASL-LEX is intended to provide a platform for future investigations into the structure and organization of the ASL lexicon. It also gives psycholinguists a much-needed tool for selecting stimuli, creating tightly controlled studies, and asking questions that would otherwise be difficult to answer. In addition, it provides a critical resource for sign language phonologists and may prove useful to linguists interested in how signed phonological systems interact with semantic and iconic pressures. ASL-LEX can also be used by educators and early intervention specialists to identify and support children struggling with vocabulary. For example, it can be used in much the same way that the Dolch (1936) and Fry (1957) lists of high-frequency English words have been used: to identify children who are unable to recognize the most common words (i.e., sight words) and to track progress toward vocabulary milestones. ASL-LEX can also be used to promote signed phonological awareness, that is, awareness of the formal properties of signs. For example, an educator developing an ASL poetry lesson could use ASL-LEX to identify signs that rhyme with one another (i.e., signs that are phonological neighbors). Such lessons are important because phonological awareness of the structure of signs has been shown to predict English reading proficiency in deaf signing children (McQuarrie & Abbott, 2013).

Although no large ASL corpora are currently available, ASL-LEX has been designed to serve as a complementary tool once a corpus is developed. There are several important differences between ASL-LEX, a lexical database, and a true sign language corpus. Whereas a corpus would contain many tokens of each sign type, each entry in ASL-LEX is unique. The LemmaIDs have been included so that data from a corpus could easily be matched with entries in ASL-LEX; for example, the LemmaIDs could serve as the controlled vocabulary for an ASL corpus. Though we have made some effort to include a diverse set of lexical signs, the signs were selected rather than drawn from spontaneous language use. As such, without a corpus there is no way to ensure that the items in ASL-LEX are representative of ASL. Indeed, we intentionally excluded or minimized some classes of signs (e.g., classifier constructions, modified verbs, lexicalized fingerspellings), and the neighborhood density estimates and frequency distributions may differ if calculated over a corpus of spontaneous signing. At the same time, robust frequency counts require relatively large corpora (i.e., millions of tokens), much larger than those currently available for any sign language. Until large-scale sign language corpora exist, subjective frequency ratings may therefore be preferable to corpus counts for psycholinguistic research.

The neighbors as defined in ASL-LEX may vary in how intuitively similar they are. This is partly because neighborhoods are calculated from a phonological description of a single signer's rendition of each sign; different signers, or different productions, may be more or less similar than the tokens coded in ASL-LEX, and a corpus would be needed to capture this kind of variation. Variation in similarity also arises because neighborhood density is calculated over an incomplete, "bare bones" phonological description, which means that although neighbors overlap in the properties coded in the database, they may or may not differ on uncoded phonological properties. For example, APPLE and ONION are maximal neighbors under the current transcription system, as are APPLE and UNDERSTAND (each pair shares four of the five coded features). However, APPLE and ONION are intuitively more similar than APPLE and UNDERSTAND because they also share features that are not yet coded in the database (e.g., internal movement, orientation, and contact). Nevertheless, the objective, continuous measures of phonological similarity in ASL-LEX serve as a starting place: although maximal neighbors may vary in similarity, closely related signs will not end up in different neighborhoods. As more phonological properties are added to the transcriptions, the neighborhoods should align more closely with signers' intuitions.

We are working to expand ASL-LEX; in the immediate future, we plan to increase the number of signs and to code additional lexical and phonological properties. In sum, ASL-LEX offers a more complete picture of the ASL lexicon than has previously been available, and we hope that this publicly available, searchable database will prove useful to both researchers and educators.