Important Definitions in Bioinformatics


Genetics - the study of individual genes and their roles in inheritance and function.

Genomics - the study of the structure, function, and evolution of genomes; the collective characterization and quantification of genes.

Genome - an organism's combined genetic material, including its genes.

Metagenomics - (or community genomics) the study of the combined genomes in an environment containing a mixture of organisms, i.e., it provides a profile of the biodiversity in a natural sample.

Metagenome - genetic material recovered directly from an environmental sample, i.e., all genomes from all the members of the sampled community.

Microbiome - the collective genomes of all microorganisms in an environment. This is typically what is studied using ‘metagenomics.’

Transcriptomics - the study of the transcriptome, i.e., the complete set of RNA transcripts produced by the genome, which allows for the identification of genes that are expressed.

Antibiotic - a substance produced by microorganisms that has the capacity, in dilute solution, to selectively inhibit or kill other microorganisms.

Antimicrobial - any substance that can negatively affect microbial life, including synthetic and semi-synthetic compounds and substances without selective toxicity. Thus, all antibiotics are antimicrobials, but not vice versa.

Intrinsic Antimicrobial Resistance - innate tolerance to specific antimicrobials shared by all members of a group (defined at the species level or above).

Acquired Antimicrobial Resistance - property of bacterial strains to survive at higher antimicrobial concentrations compared to the wild-type population.

Resistome - a collection of all genes conferring resistance to antimicrobials.

16S ribosomal RNA (or 16S rRNA) gene - a gene used in reconstructing phylogenies, due to the slower rates of evolution of specific regions of the gene. It was used in early studies of environmental genomics, which revealed that the vast majority of microbial biodiversity had been missed by cultivation-based methods. Analysis of 16S sequence data usually involves clustering reads by sequence similarity into Operational Taxonomic Units (OTUs).

Operational Taxonomic Unit (OTU) - a molecular alternative to the ‘species’ concept, suited explicitly for asexual microorganisms. Organisms with a high percent DNA similarity can be grouped together in OTUs.

Mock microbial community - An artificial community of microbes, generated by mixing known microbes together. Useful for benchmarking bioinformatics methods and measuring sensitivity and specificity.

Library preparation - preparation of the genetic material into a form that is compatible with the sequencing instrument. This can include barcoding (for multiplexing) and adding adaptors, so the sequencer can read the sequence.

Insert size - a length, given in base pairs, that describes the size of the fragment inserted between adaptors in paired-end sequencing. The insert size thus includes the reads and is not to be confused with “inner mate-pair distance.”

Multiplexing - with multiplexing on high-throughput sequencing, it is possible to process a large number of samples without a significant increase in cost or time. Individual "barcode" sequences are added to each sample to make it distinguishable when different samples are pooled for a single sequencing run.
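As a minimal sketch of how pooled reads can be separated again after a multiplexed run, the example below assigns each read to a sample by its barcode. It assumes, purely for illustration, that the barcode sits at the very start of each read and must match exactly; the function name, barcodes, and sample names are hypothetical.

```python
def demultiplex(reads, barcodes):
    """Group reads by sample using an exact barcode match at the read start.

    reads    -- iterable of read sequences (strings)
    barcodes -- dict mapping barcode sequence -> sample name (hypothetical)
    """
    bins = {sample: [] for sample in barcodes.values()}
    bins["unassigned"] = []
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                # Strip the barcode so only the biological sequence remains.
                bins[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read.
            bins["unassigned"].append(read)
    return bins


pooled = ["ACGTTTTT", "TGCAGGGG", "ACGTCCCC", "GGGGAAAA"]
samples = {"ACGT": "sample_A", "TGCA": "sample_B"}
print(demultiplex(pooled, samples))
# {'sample_A': ['TTTT', 'CCCC'], 'sample_B': ['GGGG'], 'unassigned': ['GGGGAAAA']}
```

Real demultiplexing tools usually tolerate a small number of barcode mismatches; exact matching is used here only to keep the sketch short.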

Long-read sequencing - sequencing that provides reads longer than 10 kilobase pairs (10 kb; 1 kb = 1,000 bp). It is characterized by high consensus accuracy and uniform coverage, and assembly may not be required. It is, however, still associated with high cost and time demands.

Short-read sequencing - sequencing that provides read lengths up to 600 bp. These technologies are typically parallel, making hundreds of thousands to billions of reads in a relatively short time.

Shotgun metagenomics - shotgun sequencing refers to the sequencing of random DNA fragments (think of the random pellet spread of a shotgun) in a microbial community. This is opposed to targeted amplicon sequencing, where, e.g., 16S fragments are PCR-amplified and sequenced. By preparing DNA from a sample for shotgun metagenomics, one can uncover completely novel sequences without any prior knowledge.

Next-generation sequencing (NGS) - a catchall category for all modern sequencing technologies from Roche 454’s “pyro-sequencing” onwards. NGS instruments often read many DNA fragments in parallel and have much higher throughput than previous Sanger sequencing.

De novo sequencing (or de novo assembly) - initial generation of the genetic sequence of a particular organism (when no reference genome is available for alignment/mapping). De novo assembly refers to assembling reads without a reference sequence to guide the process.

Base-calling - convert the raw signal from the sequencing instrument to nucleotide sequences (A, C, T, G). This raw signal can be, e.g., light intensities at different wavelengths (Illumina) or measured changes in electrical current (Nanopore).

Reads - continuous sequences of DNA produced when a sequencing instrument “reads” genetic material. Sequences of whole chromosomes are typically not single reads; they have been stitched together from many individual reads.

Coverage - (or read depth/sequence coverage) the average number of reads representing a given nucleotide in a reconstructed sequence, i.e., the average number of times a base is read. It can be calculated from the length of the original genome, the number of reads, and the average read length. High coverage helps reduce errors in base calling and assembly.
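The calculation mentioned above can be sketched as coverage = (number of reads × average read length) / genome length. The function name and the example numbers below are illustrative assumptions, not values from the source.

```python
def average_coverage(num_reads, avg_read_length, genome_length):
    """Estimate average coverage as (reads x read length) / genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return num_reads * avg_read_length / genome_length


# Hypothetical run: 1,000,000 reads of 150 bp from a 5 Mb genome.
print(average_coverage(1_000_000, 150, 5_000_000))  # 30.0
```

This is an average only; actual depth at any one position can be much higher or lower.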

Read trimming - the act of trimming unwanted sequence from sequencing reads. Typically, artifact sequences without biological relevance, e.g., adaptor sequences, contaminate the reads produced by a sequencing instrument. Machines also typically encode the certainty of each base call in a read (e.g., the Phred score in FASTQ format). Removing low-certainty stretches (e.g., below 99% or 99.9% certainty) is standard practice. Read trimming thus includes both ‘adaptor trimming’ and ‘quality trimming.’
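To make the certainty thresholds concrete: a Phred score Q corresponds to an error probability of 10^(-Q/10), so Q20 means 99% certainty and Q30 means 99.9%. The sketch below assumes the common Phred+33 FASTQ encoding and a very simple trimming rule (drop low-quality bases from the 3' end only); the function names and the example read are hypothetical, and real trimmers use more elaborate rules.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-q / 10)


def decode_phred33(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 ASCII encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def trim_3prime(seq, qual, min_q=20):
    """Remove trailing bases whose quality is below min_q."""
    scores = decode_phred33(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


# 'I' encodes Q40, '5' encodes Q20, '#' encodes Q2.
print(trim_3prime("ACGTACGT", "IIIII5##"))  # ('ACGTAC', 'IIIII5')
```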

Read classification - Classifying reads to reference sequences. When many reads are produced from a metagenome, classifying the reads can reveal the abundance of different organisms and genes within the sample.

Sequence assembly - merging short read fragments that derive from a longer DNA sequence in order to reconstruct the original sequence.

Genome annotation - find and designate locations of individual genes and other features present in assemblies.

Alignment - align/match reads to a reference sequence, to find the exact base to base correspondence.

Mapping - find correspondence of reads against a reference genome; mapping can be performed with or without alignment.

Consensus accuracy - when a single sequence read is mapped to a reference genome and discordant bases are found, this can represent actual biological variation or sequencing errors. Conclusions can only be drawn through averaging the discordances from multiple reads that map to the same region in the reference, i.e., by building consensus.
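Consensus building can be illustrated with a simple majority vote per position. The sketch below assumes the reads are already aligned to the same coordinates and padded to equal length, with "-" marking positions a read does not cover; both the function name and these conventions are assumptions made for the example.

```python
from collections import Counter


def consensus(aligned_reads):
    """Return the majority base at each position of pre-aligned reads.

    Positions marked '-' are ignored; positions with no coverage become 'N'.
    Ties are resolved in favor of the base seen first.
    """
    length = len(aligned_reads[0])
    bases = []
    for i in range(length):
        column = [read[i] for read in aligned_reads if read[i] != "-"]
        if column:
            bases.append(Counter(column).most_common(1)[0][0])
        else:
            bases.append("N")
    return "".join(bases)


reads = ["ACGTAC", "ACGTTC", "ACGTAC", "-CGTAC"]
print(consensus(reads))  # ACGTAC
```

The single discordant T in the second read is outvoted by the other three reads, which is the averaging effect the definition describes.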

Contigs - stretches of continuous, inferred DNA sequence, resulting from assembling shorter sequencing reads.

k-mers - all possible subsequences of length k from a read. Given a string of length L, the number of k-mers possible is L − k + 1. k-mers are typically used during sequence assembly, but can also be used in sequence alignment.
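A short sketch of the k-mer definition, showing that a sequence of length L yields L − k + 1 k-mers. The function name and the example sequence are illustrative.

```python
def kmers(seq, k):
    """Return every subsequence of length k, in order of position."""
    if k <= 0 or k > len(seq):
        return []
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


read = "ATGCGT"
print(kmers(read, 3))       # ['ATG', 'TGC', 'GCG', 'CGT']
print(len(kmers(read, 3)))  # 4, i.e. 6 - 3 + 1
```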

Single Nucleotide Polymorphism (SNP) - The change of a single DNA nucleotide from one base to another. This is a type of mutation that might alter the product of the gene.

Alpha diversity - within-sample diversity. An index that indicates the richness and evenness of features (e.g., species, genes) in a sample. Many possible alpha diversity indices exist, including the Shannon and Simpson indices.
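As a sketch of the two indices named above, the code below computes the Shannon index, H = −Σ p·ln(p), and the Simpson index in its common "1 − Σ p²" form (other conventions for Simpson exist), where p is the proportion of each feature in the sample. The function names and the count vectors are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over features with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_index(counts):
    """Simpson diversity in the 1 - sum(p^2) form (Gini-Simpson)."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


even = [25, 25, 25, 25]   # four equally abundant species
uneven = [97, 1, 1, 1]    # one dominant species

print(round(shannon_index(even), 3))  # 1.386
print(simpson_index(even))            # 0.75
print(shannon_index(uneven) < shannon_index(even))  # True
```

Both samples have the same richness (four species), but the uneven one scores lower because evenness also contributes to the index.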

Beta diversity - between-sample diversity. Describes how similar several samples are to each other, considering how many features they share and possibly the abundance of those features. A distance/dissimilarity matrix can describe the relationship between multiple samples.

Dissimilarity index - an index analogous to “Euclidean distance.” A dissimilarity index describes the beta diversity in a set of samples. Many possible indices exist, including the Bray-Curtis dissimilarity index.
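For illustration, the Bray-Curtis dissimilarity between two samples can be computed as Σ|a − b| / Σ(a + b) over the per-feature abundances, giving 0 for identical samples and 1 for samples with no shared features. The function name and abundance vectors below are hypothetical.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("samples must list the same features")
    denominator = sum(x + y for x, y in zip(a, b))
    if denominator == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


sample_1 = [10, 0, 5]   # abundances of three features
sample_2 = [5, 5, 5]

print(round(bray_curtis(sample_1, sample_2), 3))  # 0.333
print(bray_curtis(sample_1, sample_1))            # 0.0
print(bray_curtis([1, 0], [0, 1]))                # 1.0
```

Computing this value for every pair of samples yields the dissimilarity matrix described under beta diversity.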


Source:
https://www.coursera.org/learn/metagenomics/supplement/xiMPt/glossary

-----

Autosomal Dominant (AD) - describes the inheritance pattern of a genetic condition where a single copy of the variant gene, inherited from one of the parents, is enough to cause the condition.

Autosomal Recessive (AR) - describes the inheritance pattern of a genetic condition where two copies of the variant gene (one inherited from each parent) are needed to cause the condition.

Bioinformatics pipeline - also sometimes known as a workflow, describes the set of steps required to go from the raw signal indicating a base pair, through piecing the genome back together again, to identifying where there are sequence variations in comparison to a reference genome.

Cloud computing - is the practice of using a network of remote servers to store, manage, and process data.

Codon - is a set of 3 bases within the mRNA that codes for a particular amino acid.

Compound Heterozygous - is where an individual has two different recessive alleles at a specific gene locus, one on each homologous chromosome.

De novo mutations - are genetic variants that occur in an offspring but are not present in either parent.

Evolutionary Sequence Conservation (ESC) - is where sequence similarity is used as evidence of structural and functional conservation, and evolutionary relationships between sequences.

Exome Aggregation Consortium (ExAC) - is a coalition of investigators seeking to aggregate and harmonize exome sequencing data from a variety of large-scale sequencing projects.

Exome - is the part of the genome made up of exons, which code for proteins and functional RNAs.

Exon - The genome consists of introns (non-coding areas) and exons (parts of the genome that can code for proteins or functional RNA molecules).

Gene Panels - Sets of tens to hundreds of genes used to identify variants in the human genome linked to specific phenotypes or conditions.

Genotype - at its broadest sense, is the genetic characteristics of an individual. When referring to a particular trait, it describes the variant forms of a gene that are carried by an organism.

Incidental Finding (IF) - unexpected genetic changes found during sequencing of the genome.

In silico - Performed using computer modeling or simulation.

Locus-Specific Database (LSDB) - A database describing variants found at particular gene loci.

Multiple Sequence Alignment - is generally the alignment of three or more biological sequences (protein or nucleic acid) of similar length. From the output of the alignment, homology can be inferred and the evolutionary relationships between the sequences studied.

Next-Generation Sequencing (NGS) - the process by which millions of fragments of DNA can be sequenced in parallel from the same sample.

Nonsense Mediated Decay - Where a premature stop codon is incorporated into the mRNA, a truncated protein could be created. Nonsense-mediated decay is the mechanism by which eukaryotic cells detect and degrade such mRNA transcripts, limiting production of the truncated protein.

Nonsense variant - is a single base change in the nucleotide sequence that causes the formation of a stop codon, leading either to a truncated protein or to nonsense-mediated decay of the transcript.

Nonsynonymous variant - is a single base change in the nucleotide sequence that changes the codon, leading to the incorporation of a different amino acid.

Phenotype - The set of observable characteristics or traits of an individual.

Reference Genome Sequence (RGS) - is a digital sequence assembled from sequencing the DNA from several donors.

Read - A Read is a fragment of data from the genome.

Sense variant - is a single base change in the nucleotide sequence that encodes the same amino acid, as several codons encode for the same amino acid.

Single Nucleotide Polymorphism (SNP) - is a position in the genome where other bases are commonly found amongst a population.

Single Nucleotide Variant (SNV) - is a position in an individual’s genome where an alternate base is found in the test genome relative to the reference genome.

Synonymous variant - is a single base change in the nucleotide sequence that encodes the same amino acid, as several codons encode for the same amino acid.
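The nonsense, nonsynonymous, and synonymous variant definitions above can be sketched as a comparison of the amino acids encoded before and after a single base change. The example assumes the standard genetic code, written as DNA codons on the coding strand; the function name and example codons are illustrative.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snv(ref_codon, alt_codon):
    """Classify a single base change within a codon by its effect on the protein."""
    differences = sum(r != a for r, a in zip(ref_codon, alt_codon))
    if len(ref_codon) != 3 or len(alt_codon) != 3 or differences != 1:
        raise ValueError("codons must be 3 bases long and differ at exactly one base")
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"      # same amino acid encoded
    if alt_aa == "*":
        return "nonsense"        # new stop codon
    return "nonsynonymous"       # different amino acid


print(classify_snv("GAA", "GAG"))  # synonymous (Glu -> Glu)
print(classify_snv("GAG", "GTG"))  # nonsynonymous (Glu -> Val)
print(classify_snv("TGG", "TGA"))  # nonsense (Trp -> stop)
```

Changes that remove an existing stop codon fall into the "nonsynonymous" branch here; a fuller classifier would report them separately.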

Splice-site - is the position of two base pairs at the intron/exon boundary by which the process of splicing occurs to produce the mature mRNA transcript from the pre-mRNA.

Variant of Unknown Significance (VOUS) - is a variation in a genetic sequence whose association with disease risk is unknown.

Whole Exome Sequencing (WES) - is the sequencing of exons only within a genome by NGS.

Whole Genome Sequencing (WGS) - is the sequencing of the entire genome by NGS.

X-linked - describes the inheritance pattern of a genetic condition caused by a variant gene on the X chromosome. Males who inherit the variant are typically affected, as they have only one X chromosome, whereas females may be unaffected or show milder symptoms, depending on the disorder.


Source:
