Online KWIC Concordancer

INSTRUCTIONS FOR FIRST-TIME USERS
Search String
=> Enter a search string (i.e., a morpheme, word, or phrase). You may use a regular expression as your search string (see "Regular Expressions for Beginners" on the concordancer top page if you need information about regular expressions). Note that the system performs case-sensitive matching by default (as of March 22, 2001). This means that the search string thank you, for instance, matches only the lower-case "thank you". To find every instance of "thank you", including the capitalized "Thank you", use the search string (Thank|thank) you or (T|t)hank you
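The effect of case-sensitive matching can be tried out locally. The sketch below uses Python's re module, which is an assumption made for illustration only; the concordancer's own regular expression engine may differ in details. The sample text is invented.

```python
import re

# Invented sample text, for illustration only.
text = "Thank you for your letter. We thank you again."

# Case-sensitive matching: only the lower-case form is found.
lower_only = [m.group(0) for m in re.finditer(r"thank you", text)]

# Alternation over the whole first word finds both forms.
both_a = [m.group(0) for m in re.finditer(r"(Thank|thank) you", text)]

# Alternation over the first letter only is equivalent.
both_b = [m.group(0) for m in re.finditer(r"(T|t)hank you", text)]

print(lower_only)
print(both_a)
print(both_b)
```

The two alternation patterns return the same matches, so either form can be used.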
Search Type
[Contain] => This will search for words that contain the specified search string (of any type). e.g. "-" will find all the instances of hyphenated compounds such as "two-week, award-winning, high-quality" and so on (but it will also match hyphens or double hyphens used for other purposes). Similarly, "ask" will match all the instances of "ask, asks, asked, asking," but it also matches "task, mask, basket," etc. This search type works well with Japanese Kana-Kanji characters and is therefore the default search type in the current program.
[Equal to]  => This will search for exact matches only. e.g. "appreciate" will match all the (lower case) instances of "appreciate" and "would like to" will match all the (lower case) instances of "would like to." The string "But", on the other hand, matches all the instances of "But" with a capital B.
[Start with] => This will search for words that contain the specified search string as a prefix.  e.g. "ex" will find all the instances that start with this particular prefix such as "example, exclude, exit, examination, etc."  e.g. "appreciat" will find instances of "appreciate, appreciated, appreciates, appreciating, appreciation, appreciative" ("appreciate" under this search type does not match "appreciating, appreciation, appreciative."). 
[End with] => This will search for words that contain the specified search string as a suffix.  e.g. "ing" will match all the instances ending with "ing" such as "going, doing, seeing," etc.
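The four search types above can be pictured as four ways of comparing the search string with each word. The following is a minimal sketch, assuming literal (non-regex) search strings and single-word tokens; the names token_matches and search are hypothetical and are not part of the concordancer.

```python
def token_matches(token: str, query: str, search_type: str) -> bool:
    """Return True if `token` matches `query` under the given search type."""
    if search_type == "contain":
        return query in token            # query may appear anywhere in the word
    if search_type == "equal":
        return token == query            # exact, case-sensitive match
    if search_type == "start":
        return token.startswith(query)   # query is a prefix of the word
    if search_type == "end":
        return token.endswith(query)     # query is a suffix of the word
    raise ValueError(f"unknown search type: {search_type}")


def search(tokens, query, search_type):
    """Return the tokens that match, in their original order."""
    return [t for t in tokens if token_matches(t, query, search_type)]


# Invented word list, for illustration only.
words = ["ask", "asks", "task", "basket", "appreciate", "appreciating", "going"]

print(search(words, "ask", "contain"))
print(search(words, "appreciat", "start"))
print(search(words, "appreciate", "start"))
print(search(words, "ing", "end"))
```

Comparing the two [Start with] calls shows why the shorter string "appreciat" is the better choice when all inflected forms are wanted.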
Line Width 
=> You may choose any line width, but the default setting (= 40 characters to both the right and left of the search word) is recommended. Note that the current system is designed so as not to display concordance lines beyond the sentence boundaries within which the search string is located. (This note applies only to the BLC). 
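How the line width shapes a concordance line can be sketched as follows. This is an illustration only, not the concordancer's actual code: the function name kwic_lines is hypothetical, the search string is treated as a literal, and the context is cut from a single sentence so that it never crosses the sentence boundary.

```python
def kwic_lines(sentence: str, query: str, width: int = 40) -> list:
    """Build one keyword-in-context line per occurrence of `query`.

    Each line shows up to `width` characters on either side of the
    search string, taken from within `sentence` only, and is padded
    so that the search string lines up in the same column.
    """
    lines = []
    start = sentence.find(query)
    while start != -1:
        end = start + len(query)
        left = sentence[max(0, start - width):start].rjust(width)
        right = sentence[end:end + width].ljust(width)
        lines.append(left + query + right)
        start = sentence.find(query, end)
    return lines


# Invented sample sentence; a narrow width keeps the output short.
sample = "Thank you for your order, and thank you for your patience."
for line in kwic_lines(sample, "thank you", width=10):
    print(repr(line))
```

With the default width of 40, every line has 40 characters of context on each side of the search string, padded with spaces where the sentence is shorter.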
Sort Type 
[Right] => Sort the output by the first word to the RIGHT of the search string. 
[Left] => Sort the output by the first word to the LEFT of the search string. 
[Unsort] => Print the output without any sorting. (first-in-first-out) 
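The three sort types can be sketched as below. This is a minimal illustration under stated assumptions: each hit is a (left context, search string, right context) tuple, the function name sort_hits is hypothetical, and the comparison ignores case, which the concordancer itself may or may not do.

```python
def sort_hits(hits, sort_type="unsort"):
    """Order concordance hits by the word next to the search string."""

    def first_word_right(hit):
        words = hit[2].split()
        return words[0].lower() if words else ""   # assumption: ignore case

    def first_word_left(hit):
        words = hit[0].split()
        return words[-1].lower() if words else ""  # word nearest the search string

    if sort_type == "right":
        return sorted(hits, key=first_word_right)
    if sort_type == "left":
        return sorted(hits, key=first_word_left)
    return list(hits)  # "unsort": keep the order in which hits were found


# Invented hits, for illustration only.
hits = [
    ("We", "thank you", "for writing"),
    ("I", "thank you", "again"),
    ("Please", "thank you", "very much"),
]

print(sort_hits(hits, "right"))
print(sort_hits(hits, "left"))
print(sort_hits(hits))
```

Sorting to the right groups lines by what follows the search string (here "again", "for", "very"), which makes recurring patterns such as "thank you for" easy to spot.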
Search Corpus 
=> Currently the following corpora are available (some are still under construction). The corpora are constantly revised and updated. Watch this page for any new development. 
 
01 Business Letter Corpus (BLC, contains 1,020,060 word tokens of U.S. and U.K. samples, as of March 1, Y2K. Developed as part of my MA project.)
02 POS tagged BLC (A part-of-speech tagged version of the BLC.  Click here for the list of POS tags). 
03 Personal Letter Corpus (PLC, contains 113,522 word tokens of American samples, as of June 16, Y2K). 
04 POS tagged PLC (A part-of-speech tagged version of the Personal Letter Corpus, as of March 11, 2001). 
--- (Letters of Historic Figures)
05-09  Personal Letters by 19th Century Historical Figures (These four corpora contain personal and professional letters by 19th century celebrities. Click here for more details, as of June 15, Y2K) 
10  Above 05 to 09 combined  (contains 910,363 word tokens).
--- (Literature and Screenplays)
11 Alice's Adventures in Wonderland (Lewis Carroll, 1865: 26,949 word tokens)
12 Through the Looking Glass and What Alice Found There (Lewis Carroll, 1872: 29,888 word tokens).
13 The Adventures of Tom Sawyer (Mark Twain, 1876:  65,942 word tokens).
14 The Adventures of Huckleberry Finn (Mark Twain, 1884:  110,865 word tokens).
15  It's a Wonderful Life (Screenplay by Frank Capra, 1946: 17,066 word tokens)
16 REBECCA (Screenplay by A. Hitchcock, 1940:  16,062 word tokens)
17 U.S. Journalistic Articles (2,102,749 word tokens of U.S. journalistic articles) 
18-23 State of the Union Address  (1790-2006) (1,675,566 word tokens of U.S. Presidential Addresses from 1790 to 2006)   DOWNLOAD
19. Part 1: 1790-1899 (Approx. 905,000 words, covering the 1st to 25th Presidents.)   DOWNLOAD
20. Part 2: 1900-1933 (Approx. 341,000 words, covering T. Roosevelt, Taft, Wilson, Harding, Coolidge, and Hoover.)   DOWNLOAD
22. Part 3: 1934-1969 (Approx. 214,000 words, covering F. Roosevelt, Truman, Eisenhower, Kennedy, and Johnson.)   DOWNLOAD
23. Part 4: 1970-2006 (Approx. 213,000 words, covering Nixon, Ford, Carter, Reagan, Bush, Clinton, and G. W. Bush.)   DOWNLOAD
24 Learner Business Letter Corpus (WM98)  (209,461 word tokens from a total of 1,464 samples of business letters written by Japanese business people. All the surface linguistic errors contained in the original data remain as they are.) 
25 Learner Essay Corpus 2010-11 (Contains a total of 1,426 essays in English and Japanese written by Japanese college students on a total of 15 different topics. All the surface linguistic errors contained in the original data remain as they are.) 
26 Souseki Natsume Complete Works (Japanese)  (Contains all the novels by Soseki Natsume. Japanese texts only.)
27 Naoya Shiga Selected Works (J and E)  (Contains selected works by Naoya Shiga, i.e., Jyu-ichi-gatsu-mikka no koto, Kinosaki nite, Kozou-no Kami-sama, Yamashina-no Kioku, Chijyou, Hirobata-no Sumai, Ama-gaeru, Akianishi Kakita. English translations are also included.)


Notes: The system will usually respond in less than 10 seconds, but it could take several minutes to complete the operation when a high-frequency word is specified or when the system/server is busy. Regular expressions can also be used as a search string, but I can't guarantee the accuracy of the search operation: some possible regular expressions are not accepted because of the particular data structure of the corpora used in the current system. Click here if you need information about regular expressions.


  Last updated: May 5, 2007