Nomenclature for naming loci, alleles, linkage groups, and chromosomes to be used in poultry genome publications and databases.


Introduction.

This draft is based on discussions of the draft of April 23, 1993, reviewed by the Poultry Species Committee of the National Animal Genome Research Program (NAGRP) at its meeting in St. Louis on May 5, 1993. A revised draft (May 25, 1993) was reviewed at an open meeting of the NAGRP Poultry Committee at the 1993 Poultry Science Meetings. At the September 18, 1993 meeting of the NAGRP Poultry Committee, an International Nomenclature Committee of Bitgood, Boichard, Burt, Crittenden and Ponce de Leon was proposed, together with a Resource Panel to help rename the classical genes listed in Somes (1988), which is now being revised by Bitgood.

This outline proposes nomenclature for use in journal publications, in the international chicken genome database (CHICKbase) being developed at Roslin (Edinburgh, UK) by David Burt and his colleagues using G-Base (developed at the Jackson Laboratory), and in other potential databases. The appended Tables are intended to illustrate the use of the nomenclature, not the structure of G-Base as modified to maintain chicken genome data. In certain cases the nomenclature may differ between journal publications, where further information cannot readily be called up, and a database, where it can readily be located. For example, footnotes may be used in publications, whereas further fields should be provided in a database. (However, memo fields for unique information may be needed in a database program.)

Some text is copied from Shows et al. (1987) with modifications to fit the Poultry Community's needs.

Naming Loci and Alleles.

1. Classical loci catalogued by Somes (1980).

The human locus and allele nomenclature outlined by Shows et al. (1987) will be adopted. Thus, the present standard nomenclature will be converted to the new nomenclature, and the new nomenclature will be used for naming any newly identified genes. The new terminology will be much more adaptable for use in computer databases and will appear as entered on non-graphics screens, except for italics. In many cases, the Somes nomenclature should be directly convertible to the new nomenclature by the Resource Panel appointed by the Nomenclature Committee. All new names should be reviewed for duplication and for conformity with standard nomenclature.

The following is modified from Shows et al. (1987), pages 12-15 to reflect poultry-specific aspects of nomenclature and to use known genes of the chicken as examples.

Gene Symbols

  1. Genes are designated by upper-case Latin letters or by a combination of upper-case letters and Arabic numbers. Since symbols should be short to be useful and should not attempt to indicate all known information about a gene, three characters is optimal; it is recommended that no more than five characters be used, except for coded anonymous loci, which can have eight. Following classical genetic guidelines, gene symbols are always either underlined or italicized, although they need not be italicized in catalogs of known genes. When fragments or synthesized segments of genes are referred to, symbols need not be italicized. New symbols must not duplicate existing gene symbols. Examples: PO (polydactyly); MM7 (micromelia VII); GPDA (alpha glycerol phosphate dehydrogenase-liver); HBB (hemoglobin, beta polypeptide).

  2. The first letter should be the same as that of the name of the gene to facilitate alphabetical listing and grouping.

  3. The initial character should always be a letter. Subsequent characters of the symbol may be other letters or, if necessary, Arabic numerals.

  4. All characters in a gene symbol should be written on the same line; thus, no superscripts or subscripts may be used.

  5. No Roman numerals may be used. Roman numerals in previously used symbols should be changed to the Arabic equivalents.

  6. Greek letters are not permitted in a gene symbol. All Greek symbols should be changed to letters in the Latin alphabet.

  7. A Greek letter prefixing a gene name must be changed to its Latin alphabet equivalent and placed at the end of the gene symbol. This permits alphabetic ordering of genes with similar properties, such as substrate specificities, in listings. Examples: HBA (alpha globin); HBB (beta globin).

  8. Where gene products of similar function are encoded by different genes, the corresponding loci are designated by Arabic numerals placed immediately after the gene symbol, with no space between the letters and numbers. Example: PA2, PA3 (two loci for prealbumin). However, single-letter suffixes may be used to designate these different loci only if they exist historically. Example: ADEA, ADEB (two loci for adenine synthetase).

  9. A final character in the gene symbol may be used to specify a characteristic of the gene. While letters to specify tissue distribution have been used historically, Arabic numbers are now preferred as experience has shown that tissue specificity may not be as restricted as described initially.

  10. If the name of a gene contains a character or property for which there is a recognized abbreviation, the abbreviation should be used; for example, the single-letter abbreviation for amino acids used in aminoacyl residues or approved biochemical abbreviations such as GLC for glucose and GSH for glutathione.
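Most of the character rules above can be checked mechanically. The following Python sketch is an illustration only, not part of the proposed nomenclature: the function names and the small Greek-letter table are hypothetical, the five-character default follows rule 1, and rules that depend on meaning cannot be detected this way (for example rule 5, since a Roman numeral such as VII is made of ordinary capitals).

```python
import re

# Hypothetical check: a capital Latin letter first, then only capital letters
# or Arabic numerals (no spaces, lower case, Greek letters, or sub/superscripts).
GENE_SYMBOL = re.compile(r"[A-Z][A-Z0-9]*")

# Latin equivalents for a few Greek-letter prefixes (rule 7); illustrative only.
GREEK_TO_LATIN = {"alpha": "A", "beta": "B", "gamma": "G", "delta": "D"}


def is_valid_gene_symbol(symbol: str, max_len: int = 5) -> bool:
    """Return True if the symbol obeys the character rules and length limit.

    max_len=5 is the recommended limit for named genes; pass 8 for coded
    anonymous loci.
    """
    return GENE_SYMBOL.fullmatch(symbol) is not None and len(symbol) <= max_len


def greek_suffix_symbol(base: str, greek_prefix: str) -> str:
    """Move a Greek-letter prefix of the gene name to the end of the symbol,
    e.g. ('HB', 'beta') -> 'HBB' for beta globin."""
    return base + GREEK_TO_LATIN[greek_prefix.lower()]


if __name__ == "__main__":
    for candidate in ["PO", "MM7", "GPDA", "hbb", "7PO", "HB B", "ABCDEF"]:
        print(candidate, is_valid_gene_symbol(candidate))
    print(greek_suffix_symbol("HB", "alpha"))  # HBA
```

A regular expression is used here only because the rules are purely about which characters may appear where; review for duplication and standard names still requires the central list maintained by the committee.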

Allele Symbols

  1. The allele symbol should be limited to four characters, with an optimum of three characters. Only capital letters or Arabic numerals in any order should be used.

  2. Allele designations are written on the same line as gene symbols.

  3. The allele characters are separated from the locus characters by a new symbol, the asterisk, which combines the gene and allele symbols while keeping them distinct. There should be no spaces between gene, asterisk, and allele, and the entire symbol should be underlined or italicized. The asterisk has many advantages: it is convenient and universal, and, unlike the dash, space, or comma, it carries no past genetic meaning. An asterisk preceding a symbol indicates that the symbol is an allele of a gene; likewise, an asterisk following a symbol indicates that the symbol is a gene. Once the gene and allele symbols have been identified, the allele symbol preceded by an asterisk can be used on its own in text.

    For example: OV*A, OV*B (for alleles at the ovalbumin locus); EAA*1, EAA*7 (for haplotypes of the blood group A system).

  4. The allele symbol may convey additional information. The first allele in a series may be designated A or 1. The symbol may convey a morphological characteristic, biochemical property, cellular location, control property, or, ultimately, the amino acid or nucleotide substitution (e.g., HBB*6V). No normal plus (+) symbol, variant minus (-) symbol, Roman numeral, or Greek symbol should be used. If the name of a geographic location is used in designating an allele, it should be limited to a symbol of no more than four characters. If an allele exhibits no activity, this is indicated by an O (capital letter O). For optimal usage, allele symbols should be brief and need not summarize all information known about their genetic specificity.

  5. If the information regarding the genetic specificity is too complex to be conveyed conveniently in a symbol (e.g., kinetic properties, amino acid substitutions, or subcellular localization), alleles may be designated by letter or number and the information conveyed in tables.

  6. Dominance, recessiveness, and wild type, as these terms have been used for classical genes, are not addressed in Shows et al., presumably because these terms describe the phenotype and not the genotype. We suggest that no symbols denoting dominance or recessiveness be used in the allele symbol, but that tables of genes contain a column stating the dominance relationships of the alleles observed. Dominance becomes difficult to express when there are multiple alleles. We suggest that new allele symbols for currently named genes retain a letter that corresponds to the phenotype observed, or use a new one to serve that purpose. Although we prefer to avoid the wild-type designation, in some cases it may be useful to use N for the normal allele. For example, the W locus could have *Y and *W alleles for yellow and white skin. In contrast, the sex-linked DW locus could have alleles *N for normal and *D for dwarf, as well as the currently used *B and *M alleles. Table 1 illustrates how this nomenclature could be used for some classical genes.
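The locus*allele form lends itself to a simple parser. This Python sketch assumes that the locus part follows the gene-symbol rules and that the allele part has one to four capital letters or Arabic numerals; the function names are hypothetical and do not describe any database schema.

```python
import re
from typing import Tuple

# Hypothetical pattern for LOCUS*ALLELE with no spaces. The allele part is
# 1-4 capital letters or Arabic numerals, so '+', '-', lower case and Greek
# letters are rejected.
ALLELE_SYMBOL = re.compile(r"(?P<locus>[A-Z][A-Z0-9]*)\*(?P<allele>[A-Z0-9]{1,4})")


def parse_allele_symbol(symbol: str) -> Tuple[str, str]:
    """Split e.g. 'OV*A' into ('OV', 'A'); raise ValueError if malformed."""
    match = ALLELE_SYMBOL.fullmatch(symbol)
    if match is None:
        raise ValueError(f"not a valid LOCUS*ALLELE symbol: {symbol!r}")
    return match.group("locus"), match.group("allele")


def allele_only(symbol: str) -> str:
    """Stand-alone allele form for use in text once the locus is clear, e.g. '*1'."""
    _, allele = parse_allele_symbol(symbol)
    return "*" + allele


if __name__ == "__main__":
    print(parse_allele_symbol("HBB*6V"))  # ('HBB', '6V')
    print(allele_only("EAA*1"))           # *1
    for bad in ["DW*+", "EAA * 1", "EAA*12345"]:
        try:
            parse_allele_symbol(bad)
        except ValueError as err:
            print(err)
```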

Printing Gene and Allele Symbols

Gene and allele symbols are underlined in manuscripts and italicized in print; italics need not be used in catalogs. In manuscripts, computer printouts, and printed text, it may be convenient to designate a gene symbol by following it with an asterisk (e.g., EAA*). When only allele symbols are displayed, they can be preceded by an asterisk. For example, for EAA*1, the allele is printed as *1.

  1. Heterozygote for common alleles at the EAA locus:

       EAA*1
       -----   or   EAA*1/EAA*2   or   EAA*1/*2
       EAA*2
    
    
    
  2. Genotype of an individual heterozygous at the EAA locus, homozygous at the EAB locus, and heterozygous at the EAP locus (all unlinked loci are separated by a semicolon; see below):
    
       EAA*1     EAB*1     EAP*1
       -----;    -----;    -----
       EAA*2     EAB*1     EAP*2
    
                  or
    
       EAA*1/EAA*2; EAB*1/EAB*1; EAP*1/EAP*2
    
                  or
    
       EAA*1/*2; EAB*1/*1; EAP*1/*2
    
    
    
  3. Genotypes for sex-linked traits distinguish between males and females. At the growth hormone receptor locus (GHR), genotypes for heterozygous males and hemizygous females follow a similar pattern.
    
      Males:  GHR*A
              -----
              GHR*B
    
                or   GHR*A/GHR*B   or   GHR*A/*B
    
    Females:  GHR*A
              -----   or   GHR*A/W
                W
    
    

    (The W identifies the female and maintains the diploid nature of the symbol.)
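The single-line genotype forms shown above can be generated from a list of loci. In this hypothetical Python sketch, a hemizygous female at a sex-linked locus is signalled by passing None as the second allele, so that a genuine allele named W (as at the W locus) is not confused with the W chromosome. The function names are illustrative only.

```python
from typing import List, Optional, Tuple


def format_locus(locus: str, allele1: str, allele2: Optional[str],
                 abbreviate: bool = False) -> str:
    """Format one locus on a single line, e.g. 'EAA*1/EAA*2' or, abbreviated,
    'EAA*1/*2'.

    allele2=None marks a hemizygous female at a sex-linked locus; the W
    chromosome then fills the second position, e.g. 'GHR*A/W'.
    """
    first = f"{locus}*{allele1}"
    if allele2 is None:
        return f"{first}/W"
    second = f"*{allele2}" if abbreviate else f"{locus}*{allele2}"
    return f"{first}/{second}"


def format_unlinked(genotypes: List[Tuple[str, str, Optional[str]]],
                    abbreviate: bool = False) -> str:
    """Join genotypes at unlinked loci with a semicolon and a space."""
    return "; ".join(format_locus(locus, a1, a2, abbreviate)
                     for locus, a1, a2 in genotypes)


if __name__ == "__main__":
    loci = [("EAA", "1", "2"), ("EAB", "1", "1"), ("EAP", "1", "2")]
    print(format_unlinked(loci))                   # EAA*1/EAA*2; EAB*1/EAB*1; EAP*1/EAP*2
    print(format_unlinked(loci, abbreviate=True))  # EAA*1/*2; EAB*1/*1; EAP*1/*2
    print(format_locus("GHR", "A", None))          # GHR*A/W
```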

Linkage and Phase

Horizontal lines or slashes separate alleles and indicate chromosome location.

  1. Loci not located on the same chromosome are separated by a semicolon:
       I*C  EAJ*1
       ---; -----  or  I*C/I*W; EAJ*1/EAJ*2  or  I*C/*W; EAJ*1/*2
       I*W  EAJ*2
    
    
    
  2. Loci on the same chromosome (linked or syntenic), where the phase is known, are joined by a horizontal line but separated by a space, and are listed in alphabetical order when the gene order is not known:
    
       EAJ*1 SE*N
       ----------
       EAJ*2 SE*S
    
    

    For text, the loci can be printed on a single line, with a space separating genes in phase and a slash indicating different homologs:

    
       EAJ*1 SE*N/EAJ*2 SE*S.
    
    
  3. Loci on the same chromosome but phase not known are separated by a comma:
    
       EAJ*1,SE*N
       ----------
       EAJ*2,SE*S
    
    
    or printed on a single line with a separating comma:
    
       EAJ*1/EAJ*2,SE*N/SE*S.
    
    
    
  4. If the linear order and phase of the genes on the same chromosome are known, they are listed in order from the end of the short arm to the end of the long arm of the chromosome and separated by a space:
    
       EAH*1 SE*N EAJ*1
       ----------------
       EAH*2 SE*S EAJ*2
    
               or
    
       EAH*1 SE*N EAJ*1/EAH*2 SE*S EAJ*2.
    
    

    The linear order on chromosome 1 is pter-EAH-SE-EAJ-cen.

  5. If the gene order on the same chromosome is not known, then the loci are listed on the linear map alphabetically, separated by a comma, and enclosed by parentheses:

    pter-SE-EAJ-(EV1,O,P)-cen.
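The single-line linkage notation can be sketched in the same way. The Python functions below are hypothetical illustrations: each locus is given as (symbol, allele on homolog 1, allele on homolog 2), alphabetical ordering is applied only when the gene order is not known, and placing the group of unordered loci just before the terminal label is an assumption based on the single map example above.

```python
from typing import Iterable, List, Tuple

# Each locus: (locus symbol, allele on homolog 1, allele on homolog 2).
Locus = Tuple[str, str, str]


def phase_known(loci: List[Locus], order_known: bool = False) -> str:
    """Loci in phase separated by spaces, homologs separated by '/'.

    If the gene order is not known, the loci are listed alphabetically.
    """
    if not order_known:
        loci = sorted(loci, key=lambda locus: locus[0])
    homolog1 = " ".join(f"{name}*{a1}" for name, a1, _ in loci)
    homolog2 = " ".join(f"{name}*{a2}" for name, _, a2 in loci)
    return f"{homolog1}/{homolog2}"


def phase_unknown(loci: List[Locus]) -> str:
    """Per-locus genotypes separated by commas when the phase is not known."""
    return ",".join(f"{name}*{a1}/{name}*{a2}" for name, a1, a2 in loci)


def linear_map(ordered: List[str], unordered: Iterable[str] = (),
               start: str = "pter", end: str = "cen") -> str:
    """Linear map string; loci of unknown order go alphabetically in parentheses."""
    parts = [start, *ordered]
    unplaced = sorted(unordered)
    if unplaced:
        parts.append("(" + ",".join(unplaced) + ")")
    parts.append(end)
    return "-".join(parts)


if __name__ == "__main__":
    print(phase_known([("SE", "N", "S"), ("EAJ", "1", "2")]))    # EAJ*1 SE*N/EAJ*2 SE*S
    print(phase_unknown([("EAJ", "1", "2"), ("SE", "N", "S")]))  # EAJ*1/EAJ*2,SE*N/SE*S
    print(linear_map(["SE", "EAJ"], ["P", "EV1", "O"]))          # pter-SE-EAJ-(EV1,O,P)-cen
```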

2. Loci detected by DNA probes.

The use of DNA probes adds another level of nomenclature to the system: the probe name. Probe names cannot be used directly as locus symbols because one probe often detects polymorphisms at more than one locus, and because the laboratory probe name may not reflect the name of the cloned gene and is often long and complex. No attempt to standardize probe names will be made at this time.

  1. Loci detected by anonymous DNA sequences.

    Such loci have no known physiologic function and can be detected by Restriction Fragment Length Polymorphism (RFLP) using random genomic or cDNA library members as probes, or by Polymerase Chain Reaction (PCR) using arbitrary primers or primers derived from anonymous cloned sequences.

    Such loci will be named by each laboratory that defines them by segregation analysis or locates them by chromosomal in-situ hybridization, using a laboratory code of not more than three upper-case letters followed by sequential Arabic numbers of not more than three digits. Alleles will be named following the nomenclature for classical genes outlined by Shows et al. (1987) and quoted above. The symbols of expressed genes that have no known function, such as those detected by cDNA library members, shall be followed by an upper-case E (e.g., COM110E). Note that these locus symbols may exceed the limit of five characters suggested for named genes. However, allele symbols should be short so that the total symbol can be less than 12 letters or digits.

    Unlike the human system, this system does not embed a chromosome number or information on the type of probe in the locus name. Its advantage, however, is that the typing laboratory can assign a unique name to the locus that does not have to be changed upon chromosome assignment and does not need to be assigned by the database manager or a committee. The probe should nevertheless be renamed once it is shown to contain coding sequences for a named gene product (see the next section). Further information about each probe will be available in the original publication and in supplementary Tables that can be called up in a database.

    The locus name can be clarified in publications by adding a code in upper-case letters, in parentheses, for the type of probe. The suggested codes are F for RFLP, A for RAPD, R for CR1, M for microsatellites, and V for minisatellites. These letters will not be considered part of the official name and will not be included in the database; they are optional in journal publications, where they aid clarity, and should be footnoted.

  2. Loci detected by DNA sequences that represent coding sequences for a known gene product.

    These loci should be named with upper-case letters and numbers that reflect the name of the gene product. Three characters are preferred, and five should be the maximum. The name should begin with the first letter of the gene product name, with numbers used when necessary. The general rules for naming loci and alleles should follow Shows et al. (1987) as modified above. All names should be checked for standard nomenclature and duplication against a central list maintained by the nomenclature committee. When genes are homologous to human genes, or there is strong evidence for homology, the gene name should be the same as that of the human gene listed in the latest "Catalog of Mapped Genes" (Mc Alpine et al., 1993).

    A gene can consist of coding, non-coding regulatory, and intron sequences. The general location of a specific gene on the genetic map can be found using probes representing either coding or non-coding sequences, so the gene can be considered a haplotype. Therefore, the gene name should be used as the locus symbol on genetic maps whether or not the probe represents a coding sequence. However, the anonymous nature of the probe should be clearly retained in publications and databases, and its anonymous locus name should be used in fine-structure mapping. This distinction can be confusing; further discussion can be found in the attached excerpt from the 8/30/93 version of the mouse "Rules and Guidelines for Gene Nomenclature".
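The naming scheme for anonymous loci in item 1 above can also be checked mechanically. This Python sketch is hypothetical: it accepts a laboratory code of one to three capitals, one to three digits, and an optional E, and it strips an optional parenthesized probe-type code, which is not part of the official name. Whether a space precedes the parentheses is not specified above, so both forms are accepted.

```python
import re
from typing import Optional, Tuple

# Publication-only probe-type codes; they are not part of the official name.
PROBE_TYPES = {"F": "RFLP", "A": "RAPD", "R": "CR1",
               "M": "microsatellite", "V": "minisatellite"}

# Hypothetical pattern: laboratory code (1-3 capitals), sequence number
# (1-3 digits), optional 'E' for an expressed sequence of unknown function,
# then an optional probe-type code in parentheses.
PUBLISHED_NAME = re.compile(
    r"(?P<locus>[A-Z]{1,3}[0-9]{1,3}E?)(?:\s*\((?P<code>[FARMV])\))?")


def parse_anonymous_locus(text: str) -> Tuple[str, Optional[str]]:
    """Return (official locus name, probe type or None)."""
    match = PUBLISHED_NAME.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid anonymous locus name: {text!r}")
    code = match.group("code")
    return match.group("locus"), (PROBE_TYPES[code] if code else None)


if __name__ == "__main__":
    print(parse_anonymous_locus("COM110E"))      # ('COM110E', None)
    print(parse_anonymous_locus("COM110E (M)"))  # ('COM110E', 'microsatellite')
```

Only the locus part would be stored as the official name; the probe type is returned separately, mirroring the rule that the code appears in journal text but not in the database.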

Naming of Chromosomes and Linkage Groups.

Autosomes will be numbered in descending order of size. The sex chromosomes will not be numbered but called Z and W. Ideally, all mapped loci should be assigned to chromosomes, but very few linkage groups are currently assigned to chromosomes. Therefore, the classical linkage groups should be designated with Roman numerals as assigned by Bitgood and Somes (1990, 1993). The linkage groups identified in the Compton and East Lansing reference populations have in many cases not been matched to each other and will be called C01-nn and E01-nn until chromosomal assignments can be achieved. Before all linkage groups are assigned to chromosomes, it may be necessary to develop a distinct system for naming linkage groups that are common to the East Lansing and Compton maps but have not been assigned to chromosomes.

Microchromosomes, defined as chromosomes smaller than chromosome 8, Z and W, will be temporarily named after the first gene assigned to them by in-situ hybridization. Any gene linked to that locus by physical or genetic means will be considered to be on that microchromosome.

Chromosomal and Physical Mapping Nomenclature.

A standard banding nomenclature was discussed at the North American Colloquium on Domestic Animal Cytogenetics and Gene Mapping held in Guelph, Ontario, July 13-16, 1993. Standard banding nomenclature for the Z, the W and the eight largest autosomes was agreed upon, and a publication should be submitted within the next few months. Such standardization is necessary for the integration of physical and genetic maps. Genes that are assigned to a unique location in the genome can be named as outlined above even though Mendelian segregation has not yet been detected. As physical mapping progresses, nomenclature for expanded DNA fragments or contigs will need to be addressed.

Responsibility for naming loci and alleles.

Loci will be named by the laboratory that first conducts the genetic segregation analysis or assigns a gene to a specific chromosomal location by in-situ hybridization. Alleles will be named by the laboratory that first conducts the segregation analysis defining that allele. The allele symbol should be confined to four characters. Alleles may be designated by arbitrary numbers or letters or may convey some meaning, and numbers or letters can be added for multiple alleles. Where classical allele symbols have a specific meaning, they can be retained if they do not exceed four characters and adhere to the rules stated above. The population in which the allele was found should be stated. This is particularly important for alleles defined by molecular probes, which may have large numbers of detectable alleles.

References.

Mc Alpine, P.J., 1993. The 1992 catalog of mapped genes and report of the nomenclature committee. Genome Priority Reports 1: 11-142. (Alphabetical lists of human gene symbols and names)

Bitgood, J.J., and R.G. Somes Jr., 1990. Linkage relationships and gene mapping. Pages 469-495 in: Poultry Breeding and Genetics. Crawford, R.D., ed. Elsevier, Amsterdam. (Classical Map summary)

Bitgood, J.J., and R.G. Somes Jr., 1993. Gene map of the chicken (Gallus gallus or G. domesticus). Pages 4,333-4,342 in: Genetic Maps, 6th ed., S.J. O'Brien, ed. Cold Spring Harbor Laboratory Press, Plainview, NY.

Shows, T.B., P.J. McAlpine, C. Boucheix, F.S. Collins, P.M. Conneally, J. Frezal, H. Gershowitz, P.N. Goodfellow, J.G. Hall, P. Issitt, C.A. Jones, B.B. Knowles, M. Lewis, V.A. McKusick, M. Meisler, N.E. Morton, P. Rubenstein, M.S. Schanfield, R.D. Schmickel, M.H. Skolnick, M.A. Spence, G.R. Sutherland, M. Traver, N. Van Cong, and H.F. Willard, 1987. Guidelines for human gene nomenclature: An international system for human gene nomenclature (ISGN, 1987). Cytogenet Cell Genet 46: 11-28. (Human Gene Nomenclature)

Somes, R.G.Jr., 1980. Alphabetical list of the genes of domestic fowl. J Hered 71: 168-174.

Somes, R.G.Jr., 1988. International registry of poultry genetic stocks. Bulletin 476, Storrs Agricultural Experiment Station. (Updated gene list and maps)

This page was last modified on 26 November 1999 by Irene Black
Converted to HTML by Andy Law from an original draft by Lyman B. Crittenden
This page is maintained at Roslin by Irene Black

© Copyright Roslin Institute 1999