Frequently Asked Questions

Here you can find answers to frequently asked questions about our monolingual, bilingual, and synonym datasets, wordlists, linked inflections, and n-grams.

You can also look at a useful glossary.

What does a monolingual dataset from Oxford Global Language Solutions offer you?

A typical OGLS monolingual dataset offers between 30k and 150k defined terms of one language. Most of our monolingual datasets also contain additional information, such as part-of-speech information, word origins, examples, idioms, derivatives, or phonetics. This varies between datasets, due to variations between source dictionaries, so please find more details on the dataset pages, or speak to your Business Development contact for more information.
OGLS monolingual datasets are usually delivered as XML files implementing our standard data structure; however, we may be able to supply other data formats on request.
Monolingual datasets support solutions for applications such as dictionary look-up in online texts.

What does a bilingual dataset from Oxford Global Language Solutions offer you?

A typical OGLS bilingual dataset contains between 35k and 150k terms translated from one language into another. The majority of our bilingual datasets are two-way, including translations in both directions, for example from English to Chinese as well as from Chinese to English. Bilingual datasets can be provided in many language combinations.
OGLS bilingual datasets are usually delivered as XML files implementing our standard data structure; however, we may be able to supply other data formats on request.
Bilingual datasets support solutions for needs such as translation dictionaries when reading works in a foreign language.

What does a synonym dataset from Oxford Global Language Solutions offer you?

A typical OGLS synonym dataset groups together words and phrases that are similar in meaning to the headword. OGLS synonym datasets usually offer between 10k and 80k synonyms. Unlike a monolingual dataset, it does not usually provide definitions of the headword. Some synonym datasets may also provide examples of how some of the synonyms are used or information on nuances of meaning between synonyms or when to use one synonym in preference to another.
OGLS synonym datasets are usually delivered as XML files implementing our standard data structure; however, we may be able to supply other data formats on request. We can offer both stand-alone synonym datasets and synonyms linked to monolingual or bilingual dictionaries.
Synonym datasets support solutions for the following applications:

  • Search engines
  • Information extraction
  • Word games

What are morphology and inflections?

The morphology of a language describes the ways in which words are formed in that language. Inflection is one type of morphological process; it includes declension (the inflection of nouns, pronouns, and adjectives) and conjugation (the inflection of verbs). For example, in English, adding the letter ‘s’ to most nouns makes them plural. This is declension. An example of conjugation in English is adding the letters ‘ed’ to the regular verb ‘to walk’ to create the past tense, ‘walked’.

What does a wordlist from Oxford Global Language Solutions offer you?

Our wordlists are drawn from web-crawled corpora which contain billions of words of current language. These corpora are then analysed to include frequency and part-of-speech information by Oxford’s in-house language experts, who use language engineering techniques to ensure the highest quality in the resulting wordlist. We offer three wordlist packages:

Wordlist-R(egular)
  • Format: tab-delimited TXT
  • Wordforms
  • Lemmas
  • Frequency information
Wordlist-M(edium)
  • Format: tab-delimited TXT
  • Wordforms
  • Lemmas
  • Frequency information
  • Part-of-speech information
Wordlist-L(arge)
  • Format: OUP’s proprietary XML data structure
  • Wordforms
  • Lemmas
  • Frequency information
  • Part-of-speech information
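
As a rough illustration of how a tab-delimited wordlist of this kind might be read, here is a minimal Python sketch. The column order (wordform, lemma, frequency, optional part of speech), the absence of a header row, and the sample values are all assumptions made for the example; they are not taken from an actual OGLS file, so please check the delivered data before relying on any of them.

```python
import csv
import io
from collections import Counter

# Made-up sample rows. Assumed column order: wordform, lemma, frequency, part of speech.
SAMPLE = "bought\tbuy\t1200\tverb\nbuys\tbuy\t800\tverb\nhouses\thouse\t950\tnoun\n"


def read_wordlist(handle):
    """Parse tab-delimited rows into dicts; the part-of-speech column is optional."""
    rows = []
    for record in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not record:
            continue  # skip blank lines
        rows.append({
            "wordform": record[0],
            "lemma": record[1],
            "frequency": int(record[2]),
            "pos": record[3] if len(record) > 3 else None,
        })
    return rows


rows = read_wordlist(io.StringIO(SAMPLE))

# Sum the wordform frequencies under each lemma.
lemma_totals = Counter()
for row in rows:
    lemma_totals[row["lemma"]] += row["frequency"]
```

A file without the part-of-speech column is read by the same function, with `pos` set to `None` for every row.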

Wordlists include inflected forms, proper nouns, trademarks, acronyms and abbreviations, slang, and vulgar language. We aim to screen out misanalysed parts of speech and mark likely misspellings, although this cannot be guaranteed for web corpus-derived wordlists.
We can also supply wordlists as linked inflections datasets along with our monolingual and bilingual datasets.
Wordlists support solutions for the following applications:

  • Search engines
  • Information extraction
  • Word games

What do linked inflections from Oxford Global Language Solutions offer you?

Linked inflections are inflected forms delivered in our proprietary XML data structure, and are linked to the headwords of one of our monolingual or bilingual datasets. This type of linked data can support dictionary look-up applications. For example, if looking up the word ‘bought’ in a text, the user can be directed to the entry for ‘to buy’ to see the definition or translation.
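
The look-up described above can be sketched in a few lines of Python. This is only an illustration of the idea: the two dictionaries, their contents, and the `look_up` function are hypothetical stand-ins, and the placeholder definitions are not OGLS content. In practice the links would be read from the XML data rather than written by hand.

```python
# Hypothetical sample links: inflected form -> headword.
INFLECTION_LINKS = {
    "bought": "buy",
    "buys": "buy",
    "buying": "buy",
    "walked": "walk",
}

# Hypothetical dictionary entries keyed by headword (placeholder definitions).
ENTRIES = {
    "buy": "obtain in exchange for payment",
    "walk": "move at a regular pace on foot",
}


def look_up(word):
    """Return (headword, entry) for a word form, or None if nothing matches."""
    form = word.lower()
    # Follow the inflection link if there is one; otherwise treat the form as a headword.
    headword = INFLECTION_LINKS.get(form, form)
    entry = ENTRIES.get(headword)
    return (headword, entry) if entry is not None else None


result = look_up("bought")
```

Here a user who selects ‘bought’ in a text is taken to the entry for ‘buy’, while a form with no link and no entry returns `None`.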

What are n-grams?

N-grams are sequences of ‘tokens’ that occur together in a corpus. Tokens can be a variety of linguistic elements, but are usually words or word forms. The ‘n’ stands for the number of items in the sequence. For instance, the sequence ‘taken for granted’ is a trigram (three elements).
OGLS can supply lists of n-grams (2-, 3-, 4-, or 5-grams) with frequency information for a range of languages, all extracted from vast web corpora using state-of-the-art computational methods.
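
To make the idea concrete, here is a minimal Python sketch of counting n-grams with their frequencies. It assumes simple lowercased whitespace tokenisation and an invented sample sentence; it is not the method used to build the OGLS lists, which are extracted from far larger corpora.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_frequencies(text, n):
    """Count the n-grams in a text, splitting on whitespace after lowercasing."""
    tokens = text.lower().split()
    return Counter(ngrams(tokens, n))


# Invented sample sentence for illustration only.
sample = "it was taken for granted and it was taken for granted again"
trigram_counts = ngram_frequencies(sample, 3)
```

In this sample the trigram `("taken", "for", "granted")` is counted twice. Real corpora need more careful tokenisation, for example for punctuation and languages that do not separate words with spaces.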
N-grams – especially when combined with frequency information – can support applications such as:

  • Predictive text
  • Information extraction
  • Word games
