UMLS (Unified Medical Language System)
The UMLS was designed by the US National Library of Medicine (NLM) in 1986. Its main purpose is to connect national and international terminologies, thesauri, databases, classifications, and vocabularies through mapping. This mapping structure allows one to translate among the various terminology systems. The UMLS can also be viewed as a comprehensive thesaurus and ontology of biomedical concepts. It further provides facilities for natural language processing, and it is intended mainly for developers of systems in medical informatics.
UMLS consists of the following components:
- UMLS Metathesaurus, the core database of the UMLS, a collection of concepts and terms from the various controlled vocabularies, and their relationships;
- Semantic Network, a set of categories and relationships used to classify and relate the entries in the Metathesaurus;
- SPECIALIST Lexicon, a database of lexicographic information for use in natural language processing;
- a number of supporting software tools (for example, MetamorphoSys).
Purpose of UMLS:
The UMLS is not an end-user application. The purpose of UMLS is to facilitate the development of computer systems that behave as if they understand the meaning of the language of biomedicine and health. The US National Library of Medicine produces and distributes the UMLS Knowledge Sources (databases) and associated software tools (programs) for use by system developers in building or enhancing electronic information systems that create, process, retrieve, integrate, and/or aggregate biomedical and health data and information, as well as in informatics research.
Accessing UMLS:
UMLS subsets can be navigated and viewed in several ways:
1. Navigation by means of the RRF browser
2. Database queries (after generating a database by means of the load files)
3. Text editor
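The second option, querying a database built from the RRF files, can be sketched as follows. This is a minimal illustration rather than the official load procedure: it assumes SQLite in place of a full database server, `load_rrf` is a hypothetical helper, and the input is a single MRSTY-style line held in memory instead of a real subset file.

```python
import csv
import io
import sqlite3

# One MRSTY-style line; in practice the lines would come from MRSTY.RRF.
SAMPLE_MRSTY = "C0001175|T047|B2.2.1.2.1|Disease or Syndrome|AT17683839||\n"

def load_rrf(conn, table, columns, lines):
    """Create `table` and insert pipe-delimited RRF lines into it (hypothetical helper)."""
    conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    placeholders = ", ".join("?" for _ in columns)
    # Every RRF line ends with '|', so splitting yields one extra empty field: drop it.
    rows = (row[:len(columns)] for row in reader if row)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)

conn = sqlite3.connect(":memory:")
load_rrf(conn, "MRSTY", ["CUI", "TUI", "STN", "STY", "ATUI", "CVF"],
         io.StringIO(SAMPLE_MRSTY))
sty = conn.execute("SELECT STY FROM MRSTY WHERE CUI = ?", ("C0001175",)).fetchall()
print(sty)
```

For a full release, the load scripts distributed with the UMLS are the better starting point, since they also define column types and indexes.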
The Rich Release Format (RRF) Subset Browser
The RRF Subset Browser gives users an easy way to search local subsets, print concept reports, and view raw file information for a specific concept. It is included in MetamorphoSys on the UMLS DVD and is also available for download from the UMLSKS. It can only access subsets in Rich Release Format.
Users can search for a concept by:
- Concept number, for example C0027051
- String (with right truncation), for example heart att*
- Code, for example D003924 (the MeSH code for Diabetes Mellitus, Non-Insulin-Dependent)
- Hierarchy, using a tree browser
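Right truncation means that a trailing `*` matches any continuation of the typed prefix. The sketch below only illustrates that idea. The function name and the candidate strings are made up, and case-insensitive matching is an assumption, not documented browser behaviour.

```python
def matches_right_truncated(pattern, string):
    """Return True if `string` matches `pattern`; a trailing '*' matches any suffix."""
    pattern, string = pattern.lower(), string.lower()  # assumption: case is ignored
    if pattern.endswith("*"):
        return string.startswith(pattern[:-1])
    return string == pattern

# Illustrative strings, not taken from a UMLS release.
strings = ["Heart attack", "Heart Attacks", "Heart atrium", "Heart failure"]
hits = [s for s in strings if matches_right_truncated("heart att*", s)]
print(hits)
```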
Clicking on an item in the search results list displays the concept information. Users can view concept data in the easily printed report view or in the raw records view.
Users can filter search results by semantic type, content view flag, or source, and can change how the data is displayed, using the 'Restrict Searches and Views' option in the Options drop-down menu. Other options include limiting the maximum number of child terms displayed and restricting a search string and its results to a specific language.
Table Structure, Data and their Relations:
MRFILES
This table describes each file (table) in the release: its name, its columns, and its number of rows and size in bytes.
| Column | Data type | Description |
| --- | --- | --- |
| FIL | varchar(50) | Physical FILENAME |
| DES | varchar(200) | Descriptive Name |
| FMT | text | Comma separated list of column names (COL), in order |
| CLS | int unsigned | Number of COLUMNS |
| RWS | int unsigned | Number of ROWS |
| BTS | bigint | Size in bytes in this format (ISO/PC or Unix) |
Sample Records
MRCOC.RRF|Co-occurringConcepts|CUI1,AUI1,CUI2,AUI2,SAB,COT,COF,COA,CVF|9|13939548|786509996|
MRSTY.RRF|Semantic Types|CUI,TUI,STN,STY,ATUI,CVF|6|1146352|64528811|
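As a rough illustration of how such a record maps onto the columns above, the sketch below splits one sample line on `|` and checks that CLS agrees with the number of names in FMT. `FileInfo` and `parse_mrfiles_line` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass

@dataclass
class FileInfo:
    fil: str   # physical file name
    des: str   # descriptive name
    fmt: list  # column names, in order
    cls: int   # number of columns
    rws: int   # number of rows
    bts: int   # size in bytes

def parse_mrfiles_line(line):
    """Split one MRFILES record into typed fields (hypothetical helper)."""
    # The trailing '|' produces an extra empty field, so keep only the first six.
    fil, des, fmt, cls, rws, bts = line.rstrip("\n").split("|")[:6]
    return FileInfo(fil, des, fmt.split(","), int(cls), int(rws), int(bts))

info = parse_mrfiles_line(
    "MRSTY.RRF|Semantic Types|CUI,TUI,STN,STY,ATUI,CVF|6|1146352|64528811|"
)
# CLS should equal the number of column names listed in FMT.
columns_consistent = info.cls == len(info.fmt)
print(info.fil, info.fmt, info.rws, columns_consistent)
```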
MRREL
This table contains the relationships between concepts. Each record holds two CUIs (CUI1 and CUI2) identifying the two concepts that are linked, together with the type of relationship between them.
| Column | Data type | Description |
| --- | --- | --- |
| CUI1 | char(8) NOT NULL | Unique identifier of first concept |
| AUI1 | varchar(9) | Unique identifier of first atom |
| STYPE1 | varchar(50) NOT NULL | The name of the column in MRCONSO.RRF that contains the identifier used for the first concept or first atom in the source of the relationship |
| REL | varchar(4) NOT NULL | Relationship of second concept or atom to first concept or atom |
| CUI2 | char(8) NOT NULL | Unique identifier of second concept |
| AUI2 | varchar(9) | Unique identifier of second atom |
| STYPE2 | varchar(50) NOT NULL | The name of the column in MRCONSO.RRF that contains the identifier used for the second concept or second atom in the source of the relationship |
| RELA | varchar(100) | Additional (more specific) relationship label (optional) |
| RUI | varchar(10) NOT NULL | Unique identifier of relationship |
| SRUI | varchar(50) | Source asserted relationship identifier, if present |
| SAB | varchar(20) NOT NULL | Abbreviated source name of the source of relationship |
| SL | varchar(20) NOT NULL | Source of relationship labels |
| RG | varchar(10) | Relationship group |
| DIR | varchar(1) | Source asserted directionality flag |
| SUPPRESS | char(1) NOT NULL | Suppressible flag. Values = O, Y, E, or N |
| CVF | int unsigned | Content View Flag |
Sample Records
C0002372|A0022284|AUI|RB|C0002371|A0022279|AUI| |R01983351| |MSH|MSH|| |N| |
C0002372|A0022284|AUI|SY|C0002372|A0062352|AUI| |R18851331| |MSH|MSH|| |N| |
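To make the field positions in these records easier to follow, the sketch below pairs the first sample line with the MRREL column names. `parse_rrf_line` is a hypothetical helper, and treating blank or whitespace-only fields as missing values is an assumption made for readability.

```python
MRREL_COLUMNS = [
    "CUI1", "AUI1", "STYPE1", "REL", "CUI2", "AUI2", "STYPE2", "RELA",
    "RUI", "SRUI", "SAB", "SL", "RG", "DIR", "SUPPRESS", "CVF",
]

def parse_rrf_line(line, columns):
    """Map one pipe-delimited RRF line onto `columns` (hypothetical helper)."""
    fields = line.rstrip("\n").split("|")[:len(columns)]
    # Assumption: empty or whitespace-only fields mean "no value".
    return {col: (val.strip() or None) for col, val in zip(columns, fields)}

rel = parse_rrf_line(
    "C0002372|A0022284|AUI|RB|C0002371|A0022279|AUI| |R01983351| |MSH|MSH|| |N| |",
    MRREL_COLUMNS,
)
print(rel["CUI1"], rel["REL"], rel["CUI2"], rel["RUI"], rel["RELA"])
```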
MRCONSO
This table lists all concept names. Each row describes one atom, together with its concept (CUI), term (LUI), and string (SUI) identifiers.
| Column | Data type | Description |
| --- | --- | --- |
| CUI | char(8) NOT NULL | Unique identifier for concept |
| LAT | char(3) NOT NULL | Language of term |
| TS | char(1) NOT NULL | Term status |
| LUI | char(8) NOT NULL | Unique identifier for term |
| STT | varchar(3) NOT NULL | String type |
| SUI | char(8) NOT NULL | Unique identifier for string |
| ISPREF | char(1) NOT NULL | Atom status - preferred (Y) or not (N) for this string within this concept |
| AUI | varchar(9) NOT NULL | Unique identifier for atom - variable length field, 8 or 9 characters |
| SAUI | varchar(50) | Source asserted atom identifier [optional] |
| SCUI | varchar(50) | Source asserted concept identifier [optional] |
| SDUI | varchar(50) | Source asserted descriptor identifier [optional] |
| SAB | varchar(20) NOT NULL | Abbreviated source name (SAB) |
| TTY | varchar(20) NOT NULL | Abbreviation for term type in source vocabulary |
| CODE | varchar(50) NOT NULL | Most useful source asserted identifier, or a Metathesaurus-generated source entry identifier |
| STR | text NOT NULL | String |
| SRL | int unsigned NOT NULL | Source restriction level |
| SUPPRESS | char(1) NOT NULL | Suppressible flag. Values = O, E, Y, or N |
| CVF | int unsigned | Content View Flag |
Sample Records
C0001175|ENG|P|L0001175|VO|S0010340|Y|A0019182| |M0000245|D000163|MSH|PM|D000163|Acquired Immunodeficiency Syndromes|0|N| |
C0001175|ENG|S|L0001842|PF|S0011877|N|A2878223|103840012|62479008| |SNOMEDCT|PT|62479008|AIDS|4|N| |
C0001175|ENG|P|L0001175|VC|S0354232|Y|A2922342|103845019|62479008| |SNOMEDCT|SY|62479008|Acquired immunodeficiency syndrome|4|Y| |
C0001175|FRE|P|L0162173|PF|S0226654|Y|A0248753| | | |INS|MH|d000163|SIDA|3|N| |
C0001175|RUS|P|L0904943|PF|S1108760|Y|A1165232| | | |RUS|MH|D000163|SPID|3|N| |
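The sample records show that a single concept (C0001175) carries several atoms from different sources and languages. As a small sketch, the code below groups three of those records by language. The zero-based field positions follow the column order in the MRCONSO table above, and the variable names are illustrative only.

```python
from collections import defaultdict

# Zero-based positions of LAT, SAB and STR within an MRCONSO record.
LAT, SAB, STR = 1, 11, 14

records = [
    "C0001175|ENG|P|L0001175|VO|S0010340|Y|A0019182| |M0000245|D000163|MSH|PM|D000163|Acquired Immunodeficiency Syndromes|0|N| |",
    "C0001175|ENG|S|L0001842|PF|S0011877|N|A2878223|103840012|62479008| |SNOMEDCT|PT|62479008|AIDS|4|N| |",
    "C0001175|FRE|P|L0162173|PF|S0226654|Y|A0248753| | | |INS|MH|d000163|SIDA|3|N| |",
]

names_by_language = defaultdict(list)
for line in records:
    fields = line.split("|")
    # Store (source vocabulary, string) pairs under the language code.
    names_by_language[fields[LAT]].append((fields[SAB], fields[STR]))

for language, names in names_by_language.items():
    print(language, names)
```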
MRDEF
This table contains the definitions of concepts. Each definition is attached to an atom (AUI) and records the source vocabulary (SAB) that it comes from.
| Column | Data type | Description |
| --- | --- | --- |
| CUI | char(8) NOT NULL | Unique identifier for concept |
| AUI | varchar(9) NOT NULL | Unique identifier for atom |
| ATUI | varchar(10) NOT NULL | Unique identifier for attribute |
| SATUI | varchar(50) | Source asserted attribute identifier [optional-present if it exists] |
| SAB | varchar(20) NOT NULL | Abbreviated source name (SAB) of the source of the definition |
| DEF | text NOT NULL | Definition |
| SUPPRESS | char(1) NOT NULL | Suppressible flag. Values = O, E, Y, or N |
| CVF | int unsigned | Content View Flag |
Sample Records
C0001175|A0019180|AT15060425||MSH|An acquired defect of cellular immunity associated with infection by the human immunodeficiency virus (HIV), a CD4-positive T-lymphocyte count under 200 cells/microliter or less than 14% of total lymphocytes, and increased susceptibility to opportunistic infections and malignant neoplasms. Clinical manifestations also include emaciation (wasting) and dementia. These elements reflect criteria for AIDS as defined by the CDC in 1993.|N||
C0001175|A0021048|AT14042185||CSP|one or more indicator diseases, depending on laboratory evidence of HIV infection (CDC); late phase of HIV infection characterized by marked suppression of immune function resulting in opportunistic infections, neoplasms, and other systemic symptoms (NIAID).|N||
C0001175|A0021055|AT18420297||PDQ|Acquired immunodeficiency syndrome. An acquired defect in immune system function caused by human immunodeficiency virus 1 (HIV-1). AIDS is associated with increased susceptibility to certain cancers and to opportunistic infections, which are infections that occur rarely except in individuals with weak immune systems.|N||
MRCOLS
This table describes every column (data element) used in the other tables: its abbreviated name, its descriptive name, its length statistics, the file in which it occurs, and its SQL data type.
| Column | Data type | Description |
| --- | --- | --- |
| COL | varchar(20) | Column or data element name |
| DES | varchar(200) | Descriptive Name |
| REF | varchar(20) | Documentation Section Number |
| MIN | int unsigned | Minimum Length, Characters |
| AV | numeric(5,2) | Average Length |
| MAX | int unsigned | Maximum Length, Characters |
| FIL | varchar(50) | Physical FILENAME in which this field occurs |
| DTY | varchar(20) | SQL-92 data type for this column |
Sample Records
AUI|Unique identifier for atom||8|8.00|8|MRCONSO.RRF|char(8)|
CODE|Unique Identifier or code for string in source||1|6.4|21|MRCONSO.RRF|varchar(50)|
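Because each MRCOLS record names a column, its file, and its SQL data type, the records can in principle be turned into table definitions. The sketch below does this for the two sample records. `build_create_statements` is a hypothetical helper, and the column order simply follows the input lines, whereas the authoritative order is the FMT field of MRFILES.

```python
from collections import defaultdict

mrcols_lines = [
    "AUI|Unique identifier for atom||8|8.00|8|MRCONSO.RRF|char(8)|",
    "CODE|Unique Identifier or code for string in source||1|6.4|21|MRCONSO.RRF|varchar(50)|",
]

def build_create_statements(lines):
    """Group MRCOLS records by file and emit one CREATE TABLE per file (hypothetical helper)."""
    columns_by_table = defaultdict(list)
    for line in lines:
        col, _des, _ref, _min, _av, _max, fil, dty = line.split("|")[:8]
        table = fil.split(".")[0]  # "MRCONSO.RRF" -> "MRCONSO"
        columns_by_table[table].append(col + " " + dty)
    return {
        table: "CREATE TABLE " + table + " (" + ", ".join(cols) + ");"
        for table, cols in columns_by_table.items()
    }

statements = build_create_statements(mrcols_lines)
print(statements["MRCONSO"])
```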
MRSTY
This table assigns semantic types to concepts. Each row links one concept (CUI) to one semantic type from the Semantic Network.
| Column | Data type | Description |
| --- | --- | --- |
| CUI | char(8) NOT NULL | Unique identifier of concept |
| TUI | char(4) NOT NULL | Unique identifier of Semantic Type |
| STN | varchar(100) NOT NULL | Semantic Type tree number |
| STY | varchar(50) NOT NULL | Semantic Type. The valid values are defined in the Semantic Network |
| ATUI | varchar(10) NOT NULL | Unique identifier for attribute |
| CVF | int unsigned | Content View Flag |
Sample Record
C0001175|T047|B2.2.1.2.1|Disease or Syndrome|AT17683839||
How to retrieve data from the database:
To retrieve the details of a concept, use the following SQL query:
SELECT LUI, SUI, AUI, SAB, STR FROM MRCONSO WHERE CUI = '<CUI>'
To retrieve the definitions of a concept, use the following SQL query:
SELECT a.AUI AS AUI, b.STR AS STR, a.DEF AS DEF FROM MRDEF a JOIN MRCONSO b ON a.AUI = b.AUI WHERE a.CUI = '<CUI>'
To retrieve the semantic types of a concept, use the following SQL query:
SELECT STY FROM MRSTY WHERE CUI = '<CUI>'
To retrieve the relations of a concept, use the following SQL query:
SELECT a.CUI1 AS CUI1, b.STR AS STR, a.CUI2 AS CUI2, a.REL AS REL, a.RELA AS RELA FROM MRREL a JOIN MRCONSO b ON a.CUI1 = b.CUI WHERE a.CUI2 = '<CUI>'
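When these queries are issued from a program, the `<CUI>` placeholder is better bound as a parameter than pasted into the SQL text. The sketch below shows this for the concept-details and semantic-type queries. It assumes an in-memory SQLite database standing in for the real one, tables trimmed to the columns the queries need, and hypothetical function names. The rows are taken from the sample records earlier in this article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed-down tables: only the columns used by the two queries.
conn.executescript("""
    CREATE TABLE MRCONSO (CUI, LUI, SUI, AUI, SAB, STR);
    CREATE TABLE MRSTY (CUI, STY);
""")
conn.executemany("INSERT INTO MRCONSO VALUES (?, ?, ?, ?, ?, ?)", [
    ("C0001175", "L0001175", "S0010340", "A0019182", "MSH",
     "Acquired Immunodeficiency Syndromes"),
    ("C0001175", "L0001842", "S0011877", "A2878223", "SNOMEDCT", "AIDS"),
])
conn.execute("INSERT INTO MRSTY VALUES (?, ?)", ("C0001175", "Disease or Syndrome"))

def concept_details(conn, cui):
    """Return (LUI, SUI, AUI, SAB, STR) rows for one concept."""
    sql = "SELECT LUI, SUI, AUI, SAB, STR FROM MRCONSO WHERE CUI = ?"
    return conn.execute(sql, (cui,)).fetchall()

def semantic_types(conn, cui):
    """Return the semantic type names assigned to one concept."""
    sql = "SELECT STY FROM MRSTY WHERE CUI = ?"
    return [row[0] for row in conn.execute(sql, (cui,))]

print(concept_details(conn, "C0001175"))
print(semantic_types(conn, "C0001175"))
```

Binding the value with `?` keeps quoting correct and prevents a malformed or malicious identifier from changing the query.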
How are Relations Stored?
The Metathesaurus includes many relationships between different concepts. Most of these relationships come from individual source vocabularies. Some are added by NLM during Metathesaurus construction. Some have been contributed by Metathesaurus users to support certain types of applications.
Relationships are expressed in terms of CUIs and AUIs. Metathesaurus relationship files do not include concept names.
The relationship types stored include child, parent, sibling, related narrower, related broader, alike, related other, co-occurring, synonym, is-a, and allowed qualifier.
Every relationship present in the Metathesaurus has a unique relationship identifier (RUI). The primary purpose of these identifiers is to enable easy detection of changes in relationships across versions of the Metathesaurus.
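Detecting changes across versions with RUIs amounts to comparing two sets of identifiers. A minimal sketch follows. `diff_relationships` is a hypothetical helper, and `R99999999` is a made-up identifier used only to show an addition.

```python
def diff_relationships(old_ruis, new_ruis):
    """Compare the RUIs of two Metathesaurus versions (hypothetical helper)."""
    old_set, new_set = set(old_ruis), set(new_ruis)
    return {
        "added": sorted(new_set - old_set),    # present only in the newer version
        "removed": sorted(old_set - new_set),  # present only in the older version
        "kept": sorted(old_set & new_set),     # present in both versions
    }

previous_version = ["R01983351", "R18851331"]
current_version = ["R18851331", "R99999999"]  # R99999999 is invented for the example

changes = diff_relationships(previous_version, current_version)
print(changes)
```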
References:
- New Users' Guide to the UMLS, http://www.nlm.nih.gov/research/umls/new_users.html
- UMLS Load Scripts, http://www.nlm.nih.gov/research/umls/load_scripts.html
- Unified Medical Language System (UMLS) Documentation, http://www.nlm.nih.gov/research/umls/documentation.html
- Unified Medical Language System (UMLS) Teil I, http://www.people.imise.uni-leipzig.de/barbara.heller/Folien/Folien-Vorlesung-UMLS-Teil-I.pdf
- Unified Medical Language System (UMLS) Teil II, http://www.people.imise.uni-leipzig.de/barbara.heller/Folien/Folien-Vorlesung-UMLS-Teil-II.pdf
- Using JDBC with MySQL, Getting Started, http://www.developer.com/java/data/article.php/3417381#Installation_and_Preparation_of_MySQL