Lecture 15 Week 9
Term extractor
A tool that identifies terms in a text or corpus using formal cues to generate candidate terms.
Candidate terms
Words or phrases identified by extractors as potential terms, requiring human validation.
Three parameters for term identification
Frequency in the corpus, form of the unit, or a combination of both.
Statistical approach
Identifies terms based on frequency patterns in the corpus.
Linguistic approach
Identifies terms based on grammatical structures like noun + adjective or noun + noun.
Hybrid approach
Combines statistical and linguistic methods for term identification.
TermoStat Web 3.0
A free online hybrid term extractor developed by Patrick Drouin at the University of Montreal.
Languages supported by TermoStat
English, French, Spanish, Italian, Portuguese.
Part-of-speech tagger
Built-in tool in TermoStat that identifies nouns, adjectives, and complex combinations.
Reference corpus in TermoStat
A large collection of newspaper articles used to compare term frequency.
Specificity (TermoStat)
Score based on the difference in relative frequency between the analysis corpus and the reference corpus.
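A minimal sketch of the idea behind a specificity score, assuming the simplest possible reading of the definition above: relative frequency in the analysis corpus minus relative frequency in the reference corpus. This is not TermoStat's actual formula, which these notes do not give. The function names and the two tiny corpora are made up for illustration.

```python
from collections import Counter

def relative_frequencies(tokens):
    """Map each token to its count divided by the corpus size."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}

def specificity_scores(analysis_tokens, reference_tokens):
    """Toy score (an assumption, not TermoStat's formula): relative frequency
    in the analysis corpus minus relative frequency in the reference corpus."""
    analysis = relative_frequencies(analysis_tokens)
    reference = relative_frequencies(reference_tokens)
    scores = {w: f - reference.get(w, 0.0) for w, f in analysis.items()}
    # Highest score first, mirroring a descending sort.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Made-up corpora: a specialized text and a general-language text.
analysis_corpus = "the termbase stores the term and the termbase entry".split()
reference_corpus = "the paper said the mayor met the press and the council".split()

ranked = specificity_scores(analysis_corpus, reference_corpus)
for word, score in ranked:
    print(f"{word:10s} {score:+.3f}")
```

A word such as "the" is frequent in both corpora, so its score stays near zero or goes negative, while "termbase" is frequent only in the analysis corpus and rises to the top.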
Lemmatization
Process of reducing word forms to their base form to count all variants as a single term.
Candidate (grouping variant)
Column in TermoStat showing lemmatized base forms of candidate terms.
Variants (TermoStat)
Column showing actual forms of candidate terms found in the text.
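The relationship between the lemmatized candidate and its variants can be sketched as follows. The lemma table is a hypothetical hand-written lookup used only for illustration; a real extractor would rely on its part-of-speech tagger and lemmatizer instead.

```python
from collections import defaultdict

# Hypothetical lemma lookup (assumption for this sketch only).
LEMMAS = {
    "termbases": "termbase",
    "termbase": "termbase",
    "corpora": "corpus",
    "corpus": "corpus",
}

def group_variants(tokens):
    """Group surface forms under their base form and count occurrences."""
    groups = defaultdict(lambda: {"frequency": 0, "variants": set()})
    for token in tokens:
        form = token.lower()
        lemma = LEMMAS.get(form, form)  # unknown forms are their own lemma
        groups[lemma]["frequency"] += 1
        groups[lemma]["variants"].add(form)
    return dict(groups)

table = group_variants(["Termbase", "termbases", "corpus", "corpora", "corpora"])
for candidate, row in table.items():
    print(candidate, row["frequency"], sorted(row["variants"]))
```

Each key plays the role of the Candidate column, the set of forms plays the role of the Variants column, and the count covers all variants together.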
TermoStat file format requirement
Only accepts .txt files for analysis, not .doc or .pdf.
Frequency column (TermoStat)
Shows how many times a candidate term appears in the analyzed text.
Score (Specificity) column
Displays specificity scores, sorted in descending order by default.
Pattern column (TermoStat)
Shows grammatical categories of the candidate terms.
Wildcard character (*)
Matches any number of characters. Example
Wildcard character (?)
Matches a single character. Example
Wildcard character (-)
Matches a range of characters. Example
Wildcard character (#)
Matches any single numeric character. Example
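The four wildcards above can be illustrated by translating them into regular expressions. This is a sketch under assumptions, not the behavior of any particular termbase tool: it assumes the "any number of characters" wildcard is the asterisk, and that a range is written in square brackets, as in `[a-c]`. The function name is made up for illustration.

```python
import re

# Assumed range syntax: two word characters joined by "-" inside brackets.
RANGE = re.compile(r"\[(\w)-(\w)\]")

def wildcard_to_regex(pattern):
    """Translate a wildcard pattern into an anchored, case-insensitive regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        range_match = RANGE.match(pattern, i)
        if ch == "*":
            parts.append(".*")      # any number of characters
        elif ch == "?":
            parts.append(".")       # exactly one character
        elif ch == "#":
            parts.append(r"\d")     # exactly one numeric character
        elif range_match:
            low, high = sorted(range_match.groups())
            parts.append(f"[{low}-{high}]")  # one character in the range
            i = range_match.end() - 1
        else:
            parts.append(re.escape(ch))      # literal character
        i += 1
    return re.compile("^" + "".join(parts) + "$", re.IGNORECASE)

for pattern, word in [("term*", "terminology"), ("t?rm", "term"),
                      ("[a-c]at", "bat"), ("v#", "v3")]:
    print(pattern, word, bool(wildcard_to_regex(pattern).match(word)))
```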
Fuzzy search
Finds approximate matches to a term, useful for misspellings. Use ~ (e.g., bank~ finds tank, benk, banks).
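One common way to implement approximate matching is edit distance (Levenshtein distance). The sketch below assumes that a fuzzy query accepts any term within one edit of the keyword; the notes do not say which algorithm or threshold a given termbase tool uses, so both are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def fuzzy_search(query, terms, max_distance=1):
    """Return the terms within max_distance edits of the query."""
    return [t for t in terms
            if edit_distance(query.lower(), t.lower()) <= max_distance]

matches = fuzzy_search("bank", ["tank", "benk", "banks", "bench", "bank"])
print(matches)
```

With a threshold of one edit, "tank", "benk", and "banks" are all found for "bank", as in the example above, while "bench" is too far away.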
Boolean operators
AND, OR, NOT used to combine or exclude keywords for more focused search results.
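A small sketch of how the three operators narrow a keyword search. The function and its parameter names (`all_of` for AND, `any_of` for OR, `none_of` for NOT) are made up for illustration, and matching is done on whole lowercase words only.

```python
def matches_query(text, all_of=(), any_of=(), none_of=()):
    """True if text has every all_of word, at least one any_of word
    (when any are given), and no none_of word."""
    words = set(text.lower().split())
    return (all(w in words for w in all_of)
            and (not any_of or any(w in words for w in any_of))
            and not any(w in words for w in none_of))

definitions = [
    "a tool that extracts candidate terms from a corpus",
    "a corpus of newspaper articles",
    "a database of terms",
]

# corpus AND NOT newspaper
hits = [d for d in definitions
        if matches_query(d, all_of=["corpus"], none_of=["newspaper"])]
print(hits)
```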
Full-text search
Searches across one or more text fields in the termbase, such as definitions.
Search in multiple termbases
Allows users to search across several termbases at once.
Batch search
Upload or input a list of terms to search all at once and receive a report.
Filters
Use Boolean logic to search by metadata like term type, creation date, or user.
Incomplete records
Search for entries missing a definition.
Entries without translation
Search for entries missing a term in a specific language.
Reusable filters
Filters can be saved and reused.
Shared filters
Filters can be shared with other users or kept private.
Filter statistics
Filters display how many concepts and terms match the criteria.
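The filter cards above can be tied together in one sketch: a filter is a named condition on an entry, conditions combine with Boolean logic, and applying a filter reports how many concepts and terms match. The `Entry` structure, its field names, and the sample termbase are assumptions made up for this illustration, not the data model of any specific tool.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    concept_id: int
    definition: str = ""
    terms: dict = field(default_factory=dict)  # language code -> list of terms

def missing_definition(entry):
    """Incomplete record: the definition is empty."""
    return not entry.definition.strip()

def missing_language(language):
    """Build a filter for entries with no term in the given language."""
    def predicate(entry):
        return not entry.terms.get(language)
    return predicate

def apply_filter(entries, predicate):
    """Return matching entries plus concept and term counts."""
    matched = [e for e in entries if predicate(e)]
    term_count = sum(len(ts) for e in matched for ts in e.terms.values())
    return matched, {"concepts": len(matched), "terms": term_count}

termbase = [
    Entry(1, "Tool that proposes candidate terms.",
          {"en": ["term extractor"], "fr": ["extracteur de termes"]}),
    Entry(2, "", {"en": ["reference corpus"]}),
    Entry(3, "Base form of a word.", {"en": ["lemma", "base form"]}),
]

# A reusable filter: keep it in a variable and combine it with OR.
no_french = missing_language("fr")
matched, stats = apply_filter(
    termbase, lambda e: missing_definition(e) or no_french(e))
print([e.concept_id for e in matched], stats)
```

Storing a condition such as `no_french` under a name is what makes it reusable: the same filter can be applied again later or handed to another user.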