What is a corpus and what is in it?

Created by

ADRIANA GARZON

1

2. What is a corpus and what is in it?

By ADRIANA GARZON MOZO

Master's degree in Linguistics

Antioquia University

McEnery, Tony & Wilson, Andrew. 1996. Corpus Linguistics. Edinburgh: Edinburgh University Press.

2

2.1 Corpora vs. machine-readable text

3

In modern linguistics, the term 'corpus' usually has more specific connotations than the simple definition of a corpus as any collection of more than one text. These may be considered under four main headings:

2.1.1. Sampling and representativeness.

2.1.2. Finite size.

2.1.3. Machine-readable form.

2.1.4. A standard reference.

4

2.1.1. Sampling and representativeness.

"there are two options for our data collection: first, we could analyze every single utterance in that variety; or second, we could construct a smaller sample of the variety."

"consideration of Chomsky's criticism should be directed towards the establishment of ways in which a much less biased and more generally representative corpus may be constructed."
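
The second option, building a smaller sample of the variety, can be sketched in a few lines of Python. This is an illustration only, not a procedure given by McEnery & Wilson: the genre labels, the toy texts, and the helper `stratified_sample` are all hypothetical, and drawing the same number of texts from each genre is just one simple way to make a sample less biased.

```python
import random
from collections import defaultdict

def stratified_sample(texts, per_genre, seed=0):
    """Draw up to `per_genre` texts at random from each genre.

    `texts` is a list of (genre, text) pairs; all names here are hypothetical.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_genre = defaultdict(list)
    for genre, text in texts:
        by_genre[genre].append(text)

    sample = []
    for genre in sorted(by_genre):
        pool = by_genre[genre]
        k = min(per_genre, len(pool))  # never ask for more texts than exist
        sample.extend((genre, t) for t in rng.sample(pool, k))
    return sample

# Toy population: an invented, deliberately unbalanced collection.
population = (
    [("news", f"news text {i}") for i in range(50)]
    + [("fiction", f"fiction text {i}") for i in range(10)]
    + [("speech", f"speech transcript {i}") for i in range(5)]
)

corpus = stratified_sample(population, per_genre=5)
print(len(corpus))  # 15 texts: 5 from each of the 3 genres
```

Sampling every genre equally keeps the large "news" group from dominating the corpus, which a plain random draw over the whole population would tend to allow.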

5

2.1.2. Finite size: a corpus normally has a fixed size, for example 1,000,000 words. A monitor corpus, which is open-ended and keeps growing, is the exception.

2.1.3. Machine-readable form: a machine-readable corpus can be searched in a few minutes, unlike a printed corpus.

2.1.4. A standard reference: although not an essential part of the definition, a corpus is often taken as a standard reference for the language variety it represents.

(An exception to the fixed size is the London-Lund corpus, which was augmented in the mid-1970s by Sidney Greenbaum to cover a wider variety of genres; p. 31.)

6

Multiple Choice

What are the headings under which a corpus can be defined?

1

- Sampling and representativeness.
- A standard reference.

2

- Finite size.
- Machine-readable form.

3

- Sampling and representativeness.
- Finite size.
- Machine-readable form.
- A standard reference.

4

- Finite size.
- Machine-readable form.

7

2.2 Text Encoding and annotation

8

Open Ended

What kind of information does the word "Love" represent for you?

9

"love": unannotated, text only.

"love_VVZ": annotated with various types of linguistic information.

VV: lexical verb.

Z: third person singular present tense.
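
The `word_TAG` convention on this slide can be unpacked mechanically. The sketch below is illustrative only: the function names are hypothetical, the tag table holds just the two pieces glossed here (VV and Z) and not a full tagset, and it assumes a tag is a two-letter base followed by an optional suffix.

```python
# Glosses taken from the slide; a real tagset has many more entries.
TAG_PARTS = {
    "VV": "lexical verb",
    "Z": "third person singular present tense",
}

def split_token(token):
    """Split 'word_TAG' into (word, tag); unannotated text gets tag None."""
    word, sep, tag = token.rpartition("_")
    if not sep:  # no underscore: plain, unannotated text
        return token, None
    return word, tag

def describe_tag(tag):
    """Gloss a tag such as 'VVZ' as base 'VV' plus suffix 'Z' (assumed layout)."""
    base, suffix = tag[:2], tag[2:]
    parts = [TAG_PARTS.get(base, "unknown")]
    if suffix:
        parts.append(TAG_PARTS.get(suffix, "unknown"))
    return parts

print(split_token("love"))      # ('love', None)
print(split_token("love_VVZ"))  # ('love', 'VVZ')
print(describe_tag("VVZ"))      # ['lexical verb', 'third person singular present tense']
```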

10

2.2.1 Formats of annotation

COCOA reference: an informal convention for recording information such as authors, dates, and titles.
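
As a rough illustration of the COCOA format: a COCOA reference is a pair of angle brackets holding a short code for a variable and its value, for example `<A CHARLES DICKENS>`, where `A` might stand for author. The sketch below reads such references with a regular expression. The codes `T` and `D`, the sample header, and the helper name `parse_cocoa` are assumptions made for this example, not part of any fixed standard.

```python
import re

# One reference: '<', a short code, whitespace, a value, '>'.
COCOA_REF = re.compile(r"<(\w+)\s+([^<>]+)>")

def parse_cocoa(header):
    """Return a dict mapping each code to its value string."""
    return {code: value.strip() for code, value in COCOA_REF.findall(header)}

# Hypothetical header: A = author, T = title, D = date (assumed codes).
header = "<A CHARLES DICKENS> <T GREAT EXPECTATIONS> <D 1861>"
print(parse_cocoa(header))
# {'A': 'CHARLES DICKENS', 'T': 'GREAT EXPECTATIONS', 'D': '1861'}
```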

Text Encoding Initiative (TEI)

https://www.youtube.com/watch?v=4sHYDfITjHY

11

2.2.2. Types of annotation

12

2.2.3. Linguistic annotations

13

2.2.3. Linguistic annotations

14

2.3. Multilingual corpora

15

2.3.1. Parallel corpora: original language (L1)

2.3.2. Translation corpora: comparison with L1

16

2.3.2.1. Why use translation corpora?

17

Open Ended

What is a corpus and what is in it?

18

Adriana Garzón

"Thanks so much"
