L15: Describing and identifying the terms Concordancing techniques

Description and Tags

Lecture 15 Week 9

33 Terms

1

Term extractor

A tool that identifies terms in a text or corpus using formal cues to generate candidate terms.

2

Candidate terms

Words or phrases identified by extractors as potential terms, requiring human validation.

3

Three parameters for term identification

Frequency in the corpus, form of the unit, or a combination of both.

4

Statistical approach

Identifies terms based on frequency patterns in the corpus.

5

Linguistic approach

Identifies terms based on grammatical structures like noun + adjective or noun + noun.
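
As a rough illustration of the linguistic approach, the sketch below scans a part-of-speech-tagged sentence for two-word tag patterns. The tagged sentence, the tag names, and the pattern list are all assumptions made for this example; a real extractor would obtain the tags from a tagger and would use its own pattern inventory.

```python
# Hand-tagged toy sentence as (word, tag) pairs. The tags are assigned
# manually for illustration only.
TAGGED = [
    ("the", "DET"), ("neural", "ADJ"), ("network", "NOUN"),
    ("uses", "VERB"), ("a", "DET"), ("loss", "NOUN"), ("function", "NOUN"),
]

# Illustrative patterns (English word order): adjective + noun, noun + noun.
PATTERNS = [("ADJ", "NOUN"), ("NOUN", "NOUN")]


def match_patterns(tagged, patterns):
    """Return every word sequence whose tags match one of the patterns."""
    candidates = []
    for start in range(len(tagged)):
        for pattern in patterns:
            window = tagged[start:start + len(pattern)]
            tags = tuple(tag for _, tag in window)
            if tags == pattern:
                candidates.append(" ".join(word for word, _ in window))
    return candidates


candidates = match_patterns(TAGGED, PATTERNS)
print(candidates)
```

The output is a list of candidate terms only; as with any extractor, a human still has to validate them.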

6

Hybrid approach

Combines statistical and linguistic methods for term identification.

7

TermoStat Web 3.0

A free online hybrid term extractor developed by Patrick Drouin at the University of Montreal.

8

Languages supported by TermoStat

English, French, Spanish, Italian, Portuguese.

9

Part-of-speech tagger

Built-in tool in TermoStat that identifies nouns, adjectives, and complex combinations.

10

Reference corpus in TermoStat

A large collection of newspaper articles used to compare term frequency.

11

Specificity (TermoStat)

Score based on the difference in relative frequency between the analysis corpus and the reference corpus.
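
The idea behind a specificity score can be sketched with a plain difference of relative frequencies. This is a simplification for study purposes, not TermoStat's actual scoring formula, which the card does not give; the two toy corpora and the function names are invented for the example.

```python
from collections import Counter


def relative_frequencies(tokens):
    """Map each word to its count divided by the corpus size."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


def specificity_scores(analysis_tokens, reference_tokens):
    """Assumed scoring: relative frequency in the analysis corpus minus
    relative frequency in the reference corpus (0 if the word is absent)."""
    analysis = relative_frequencies(analysis_tokens)
    reference = relative_frequencies(reference_tokens)
    return {word: freq - reference.get(word, 0.0)
            for word, freq in analysis.items()}


# Toy corpora: a specialized text and a general-language text.
analysis = "the enzyme binds the substrate and the enzyme reacts".split()
reference = "the cat sat on the mat and the dog ran".split()

scores = specificity_scores(analysis, reference)
ranked = sorted(scores, key=scores.get, reverse=True)  # descending, like the Score column
print(ranked[0])
```

Words that are frequent everywhere, such as "the", end up with a score near zero, while words concentrated in the analysis corpus rise to the top.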

12

Lemmatization

Process of reducing word forms to their base form to count all variants as a single term.

13

Candidate (grouping variant)

Column in TermoStat showing lemmatized base forms of candidate terms.

14

Variants (TermoStat)

Column showing actual forms of candidate terms found in the text.
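
The relationship between lemmatization, the grouping variant, and the variants column can be sketched as follows. The lemma lookup table is a hypothetical stand-in for a real lemmatizer, and the resulting dictionary only mimics the idea of the columns, not TermoStat's actual output format.

```python
from collections import defaultdict

# Hypothetical lemma lookup; a real lemmatizer would replace this table.
LEMMAS = {
    "network": "network", "networks": "network",
    "extractor": "extractor", "extractors": "extractor",
}


def group_variants(tokens, lemmas):
    """Group surface forms under their base form and count all occurrences."""
    groups = defaultdict(lambda: {"frequency": 0, "variants": set()})
    for token in tokens:
        form = token.lower()
        lemma = lemmas.get(form, form)  # unknown forms act as their own lemma
        groups[lemma]["frequency"] += 1
        groups[lemma]["variants"].add(form)
    return dict(groups)


tokens = ["network", "Networks", "extractor", "networks", "extractors"]
table = group_variants(tokens, LEMMAS)
print(table["network"])
```

Each key plays the role of the grouping variant, the set holds the forms actually found in the text, and the frequency counts all of them as a single term.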

15

TermoStat file format requirement

Accepts only .txt files for analysis, not .doc or .pdf.

16

Frequency column (TermoStat)

Shows how many times a candidate term appears in the analyzed text.

17

Score (Specificity) column

Displays specificity scores, sorted in descending order by default.

18

Pattern column (TermoStat)

Shows grammatical categories of the candidate terms.

19

Wildcard character (*)

Matches any number of characters.

20

Wildcard character (?)

Matches a single character.

21

Wildcard character (-)

Matches a range of characters.

22

Wildcard character (#)

Matches any single numeric character.
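
One way to see how the four wildcards behave is to translate them into regular expressions. The sketch below is an assumption-laden illustration, not the syntax of any particular termbase tool: it assumes a range is written in square brackets (for example `[b-d]`), that the asterisk may also match zero characters, and that matching is case-insensitive. The function names are invented for the example.

```python
import re


def wildcard_to_regex(pattern):
    """Translate *, ?, # and [x-y] wildcards into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "*":
            parts.append(".*")      # any number of characters (including none)
        elif ch == "?":
            parts.append(".")       # exactly one character
        elif ch == "#":
            parts.append(r"\d")     # exactly one digit
        elif ch == "[" and "]" in pattern[i:]:
            end = pattern.find("]", i)
            parts.append(pattern[i:end + 1])  # copy a simple range such as [b-d] as is
            i = end
        else:
            parts.append(re.escape(ch))       # everything else is literal
        i += 1
    return re.compile("".join(parts), re.IGNORECASE)


def wildcard_match(pattern, term):
    """True if the whole term matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


print(wildcard_match("term*", "terminology"))
print(wildcard_match("wom?n", "woman"))
print(wildcard_match("[b-d]at", "cat"))
print(wildcard_match("ISO 900#", "ISO 9001"))
```

All four calls print `True`. Changing the last term to "ISO 900x" gives `False`, because `#` accepts digits only.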

23

Fuzzy search

Finds approximate matches to a term, useful for misspellings. Use ~ (e.g., bank~ finds tank, benk, banks).
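
Fuzzy matching is often explained in terms of edit distance, the number of single-character insertions, deletions, or substitutions needed to turn one string into another. The sketch below uses that measure with a threshold of 1; the threshold, the term list, and the function names are assumptions, and actual search tools may rank or limit approximate matches differently.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def fuzzy_search(query, terms, max_distance=1):
    """Return the terms within max_distance edits of the query."""
    return [term for term in terms if edit_distance(query, term) <= max_distance]


terms = ["tank", "benk", "banks", "bank", "river", "banana"]
hits = fuzzy_search("bank", terms)
print(hits)
```

With these inputs the search keeps "tank", "benk", "banks", and the exact match "bank", and drops "river" and "banana".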

24

Boolean operators

AND, OR, and NOT are used to combine or exclude keywords for more focused search results.
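
The three operators map naturally onto set operations, which the sketch below uses on a few invented entries. The entry texts and the helper name are hypothetical; the point is only that AND corresponds to intersection, OR to union, and NOT to difference.

```python
# Invented entries: id -> text
ENTRIES = {
    1: "central bank interest rate",
    2: "river bank erosion",
    3: "interest rate swap",
    4: "blood bank",
}


def ids_containing(keyword, entries):
    """Return the ids of entries whose text contains the keyword as a word."""
    return {entry_id for entry_id, text in entries.items()
            if keyword in text.split()}


bank = ids_containing("bank", ENTRIES)
rate = ids_containing("rate", ENTRIES)
river = ids_containing("river", ENTRIES)

bank_and_rate = bank & rate     # AND: both keywords present
bank_or_rate = bank | rate      # OR: at least one keyword present
bank_not_river = bank - river   # NOT: first keyword without the second

print(bank_and_rate, bank_or_rate, bank_not_river)
```

AND and NOT narrow the result set, while OR widens it.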

25

Full-text search

Searches across one or more text fields in the termbase, such as definitions.

26

Search in multiple termbases

Allows users to search across several termbases at once.

27

Batch search

Upload or enter a list of terms to search for them all at once and receive a report.
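
A batch search can be pictured as checking a whole list of queries against the termbase in one pass and summarizing the outcome. The report layout, the case-insensitive comparison, and the function name below are assumptions made for this sketch.

```python
def batch_search(queries, termbase_terms):
    """Check every query against the termbase and return a found/missing report."""
    known = {term.casefold() for term in termbase_terms}
    report = {"found": [], "missing": []}
    for query in queries:
        cleaned = query.strip()
        key = "found" if cleaned.casefold() in known else "missing"
        report[key].append(cleaned)
    return report


termbase_terms = ["term extractor", "candidate term", "Lemmatization"]
report = batch_search(["lemmatization", "Term extractor", "concordancer"], termbase_terms)
print(report)
```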

28

Filters

Use Boolean logic to search by metadata like term type, creation date, or user.

29

Incomplete records

Search for entries missing a definition.

30

Entries without translation

Search for entries missing a term in a specific language.
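
Both checks, incomplete records and entries without translation, amount to filtering entries on an empty field. The record layout below (an `id`, a `definition`, and a `terms` dictionary keyed by language code) is a hypothetical structure chosen for the example, not the schema of any specific termbase.

```python
# Hypothetical termbase records.
TERMBASE = [
    {"id": 1, "definition": "A tool that identifies candidate terms.",
     "terms": {"en": "term extractor", "fr": "extracteur de termes"}},
    {"id": 2, "definition": "",
     "terms": {"en": "candidate term", "fr": "terme candidat"}},
    {"id": 3, "definition": "Reduction of word forms to a base form.",
     "terms": {"en": "lemmatization"}},
]


def missing_definition(entries):
    """Ids of entries whose definition is absent or blank."""
    return [entry["id"] for entry in entries
            if not entry.get("definition", "").strip()]


def missing_language(entries, language):
    """Ids of entries that have no term in the given language."""
    return [entry["id"] for entry in entries
            if not entry.get("terms", {}).get(language, "").strip()]


print(missing_definition(TERMBASE))
print(missing_language(TERMBASE, "fr"))
```

The length of each returned list gives the kind of match count that filter statistics would report.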

31

Reusable filters

Filters can be saved and reused.

32

Shared filters

Filters can be shared with other users or kept private.

33

Filter statistics

Filters display how many concepts and terms match the criteria.