Business Letter Corpus
Online KWIC Concordancer
|
INSTRUCTIONS FOR FIRST-TIME USERS
Search String
=> Enter a search string (i.e. a morpheme, word or phrase). You may use
regular expressions as your search string (see "Regular Expressions for
Beginners" on the concordancer top page if you need information about
regular expressions). Note that the system now does case-sensitive
matching by default (as of March 22, 2001). This means that the search
string thank you, for instance, matches lower case "thank you" only. If
you want to find all the instances of "thank you," including the upper
case "Thank you," your search string should be: (Thank|thank) you or
(T|t)hank you. |
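The effect of case-sensitive matching can be tried offline. The following is a minimal sketch using Python's `re` module, which is an assumption made for illustration only: the concordancer's own regular expression engine is not documented here and may behave differently. The sample sentence and the helper name `find_all` are hypothetical.

```python
import re

# Hypothetical sample text, not taken from any of the corpora.
text = "Thank you for your letter. We thank you for your order."

def find_all(pattern, text):
    """Return every substring of text matched by pattern (case sensitive)."""
    return [m.group(0) for m in re.finditer(pattern, text)]

# The plain string matches the lower case form only.
lower_only = find_all(r"thank you", text)

# Both patterns suggested above cover the capitalised form as well.
both_a = find_all(r"(Thank|thank) you", text)
both_b = find_all(r"(T|t)hank you", text)

print(lower_only)
print(both_a)
print(both_b)
```

The two alternation patterns return the same matches; the second is simply shorter to type.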
Search Type
[Equal to] => This will search for exact matches only. e.g. "appreciate"
will match all the (lower case) instances of "appreciate," and "would
like to" will match all the (lower case) instances of "would like to."
The string "But," on the other hand, matches all the instances of "But"
with the capital B.
[Start with] => This will search for words that begin with the specified
search string. e.g. "ex" will find all the words that start with this
prefix, such as "example, exclude, exit, examination," etc. Likewise,
"appreciat" will find instances of "appreciate, appreciated, appreciates,
appreciating, appreciation, appreciative" (whereas "appreciate" under this
search type does not match "appreciating, appreciation, appreciative").
[End with] => This will search for words that end with the specified
search string. e.g. "ing" will match all the words ending with "ing,"
such as "going, doing, seeing," etc.
[Contain] => This will search for words that contain the specified search
string in any position. e.g. "-" will find all the instances of hyphenated
compounds such as "two-week, award-winning, high-quality" and so on (but it
will also match hyphens or double-hyphens used for other purposes).
Similarly, "ask" will match all the instances of "ask, asks, asked,
asking," but it also matches "task, mask, basket," etc. |
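The four search types can be pictured as four tests applied to each word of the text. The sketch below is an illustration only, not the concordancer's actual implementation: the function name `match_words`, the tokenising pattern and the sample sentence are all assumptions, and it handles single-word search strings only (a phrase such as "would like to" would need matching against the running text instead).

```python
import re

def match_words(text, search, search_type):
    """Return the words in text that satisfy the given search type."""
    # Assumed tokenisation: runs of letters, apostrophes and hyphens.
    words = re.findall(r"[A-Za-z'-]+", text)
    tests = {
        "equal": lambda w: w == search,             # [Equal to]
        "start": lambda w: w.startswith(search),    # [Start with]
        "end": lambda w: w.endswith(search),        # [End with]
        "contain": lambda w: search in w,           # [Contain]
    }
    return [w for w in words if tests[search_type](w)]

# Hypothetical sample sentence.
sample = "We appreciate your asking; the task took a two-week appreciation tour."

print(match_words(sample, "appreciate", "equal"))
print(match_words(sample, "appreciat", "start"))
print(match_words(sample, "ing", "end"))
print(match_words(sample, "ask", "contain"))
print(match_words(sample, "-", "contain"))
```

As in the description above, the [Contain] search for "ask" also picks up "task," which is why the narrower search types are often the better choice.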
Line Width
=> You may choose any line width, but the default setting (= 40 characters
to both the right and left of the search word) is recommended. Note that
the current system is designed so as not to display concordance lines
beyond the boundaries of the sentence in which the search string is
located. (This note applies only to the BLC.) |
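A rough picture of how line width and sentence boundaries interact is sketched below. This is not the system's code: the function name `kwic`, the naive sentence splitter and the sample text are assumptions for illustration. Each hit is shown with at most `width` characters of context on either side, and the context never crosses into a neighbouring sentence.

```python
import re

def kwic(text, pattern, width=40):
    """Return KWIC lines with `width` characters of context on each side."""
    lines = []
    # Assumed sentence boundary: ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for m in re.finditer(pattern, sentence):
            left = sentence[max(0, m.start() - width):m.start()]
            right = sentence[m.end():m.end() + width]
            # Pad both sides so that the search string lines up in a column.
            lines.append(f"{left:>{width}}{m.group(0)}{right:<{width}}")
    return lines

# Hypothetical sample text.
sample = "We received it. Please confirm the order today. Thanks."

for line in kwic(sample, r"order"):
    print(line)
```

With the default width of 40, the search string always starts at the same column, which is what makes the output easy to scan and sort.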
Sort Type
[Right] => Sort the output by the first word to the RIGHT of the search string.
[Left] => Sort the output by the first word to the LEFT of the search string.
[Unsort] => Print the output without any sorting (first in, first out). |
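The three sort types can be sketched as follows. This is an illustration under stated assumptions, not the system's code: each hit is assumed to be a (left context, search string, right context) tuple, the name `sort_hits` is hypothetical, and the comparison ignores case, which the concordancer itself may or may not do.

```python
def sort_hits(hits, sort_type="unsort"):
    """Order (left, keyword, right) tuples by the chosen sort type."""
    def first_right(hit):
        words = hit[2].split()
        return words[0].lower() if words else ""

    def first_left(hit):
        words = hit[0].split()
        # The word nearest the search string is the LAST word of the left context.
        return words[-1].lower() if words else ""

    if sort_type == "right":
        return sorted(hits, key=first_right)
    if sort_type == "left":
        return sorted(hits, key=first_left)
    return list(hits)  # unsorted: keep the order of occurrence

# Hypothetical hits for the search string "like".
hits = [
    ("we would", "like", "to thank"),
    ("I", "like", "apples"),
    ("they also", "like", "music"),
]

print(sort_hits(hits, "right"))
print(sort_hits(hits, "left"))
print(sort_hits(hits, "unsort"))
```

Sorting to the right groups together what follows the search string (e.g. "like to ..."), while sorting to the left groups what precedes it (e.g. "would like").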
Search Corpus
=> Currently the following corpora are available (some are still under
construction). The corpora are constantly revised and updated. Watch this
page for any new developments.
| 01 |
Business
Letter Corpus (BLC, contains 1,020,060 word tokens
of U.S. and U.K. samples, as of March 1, Y2K) |
| 02 |
POS
tagged BLC (A
part-of-speech tagged version of the BLC. Click here
for the list of POS tags). |
| 03 |
Personal
Letter Corpus (PLC, contains 113,522 word tokens
of American samples, as of June 16, Y2K). |
| 04 |
POS
tagged PLC (A part-of-speech tagged version of the Personal
Letter Corpus, as of March 11, 2001). |
| --- |
(Letters of Historic Figures) |
| 05-09 |
Personal
Letters by 19th Century Historical Figures (These four corpora
contain personal and professional letters by 19th century celebrities.
Click here
for more details, as of June 15, Y2K) |
| 10 |
Above
05 to 09 combined (contains 910,363 word tokens). |
| --- |
(Literature and Screenplays) |
| 11 |
Alice's
Adventures in Wonderland (Lewis
Carroll, 1865: 26,949 word tokens) |
| 12 |
Through
the Looking Glass and What Alice Found There
(Lewis Carroll, 1872: 29,888 word tokens). |
| 13 |
The
Adventures of Tom Sawyer (Mark
Twain, 1876: 65,942 word tokens). |
| 14 |
The
Adventures of Huckleberry Finn (Mark
Twain, 1884: 110,865 word tokens). |
| 15 |
It's
a Wonderful Life (Screenplay by Frank
Capra, 1946: 17,066 word tokens) |
| 16 |
REBECCA
(Screenplay by A.
Hitchcock, 1940: 16,062 word tokens) |
| 17 |
U.S. Journalistic Articles (2,102,749 word tokens of U.S.
journalistic articles) |
| 18-23 |
State of the Union Address (1790-2006) (1,675,566 word
tokens of U.S. Presidential Addresses from 1790 to 2006) DOWNLOAD
19. Part 1: 1790-1899 (Approx. 905,000 words, covering the 1st to 25th Presidents.) DOWNLOAD
20. Part 2: 1900-1933 (Approx. 341,000 words, covering T. Roosevelt, Taft, Wilson, Harding, Coolidge, and Hoover.) DOWNLOAD
22. Part 3: 1934-1969 (Approx. 214,000 words, covering F. Roosevelt, Truman, Eisenhower, Kennedy, and Johnson.) DOWNLOAD
23. Part 4: 1970-2006 (Approx. 213,000 words, covering Nixon, Ford, Carter, Reagan, Bush, Clinton, and G. W. Bush.) DOWNLOAD |
| 24 |
Learner BLC: WM98 (209,461 word
tokens from a total of 1,464 samples of business letters written by Japanese business people. All the surface linguistic errors contained in the original
data remain as they are.) |
|
Notes:
The system will usually respond in less than 10 seconds, but it could take
several minutes to complete the operation when a high-frequency word is
specified or when the system/server is busy. Regular expressions can also
be used as a search string, but I can't guarantee the accuracy of the
search operation: some regular expressions are not accepted because of the
particular data structure of the corpora used in the current system. Click
here if you need information about regular expressions. |