BACKGROUND: Our previously published CUDA-only application PaSWAS, for Smith-Waterman (SW) alignment of any type of sequence on NVIDIA-based GPUs, is platform-specific and therefore less widely adopted than it could be. The OpenCL language is supported more widely and allows use on a variety of hardware platforms. Moreover, there is a need to promote the adoption of parallel computing in bioinformatics by making its use and extension simpler through more and better application of high-level languages commonly used in bioinformatics, such as Python.
RESULTS: The novel application pyPaSWAS presents the parallel SW sequence alignment code fully packaged in Python. It is a generic SW implementation that runs on several hardware platforms with multi-core systems and/or GPUs, and it provides accurate sequence alignments that can also be inspected for alignment details. Additionally, pyPaSWAS supports the affine gap penalty. Python libraries are used for automated system configuration, I/O and logging. In this way, the Python environment will stimulate further extension and use of pyPaSWAS.
CONCLUSIONS: pyPaSWAS presents an easy Python-based environment for accurate and retrievable parallel SW sequence alignments on GPUs and multi-core systems. The strategy of integrating Python with high-performance parallel compute languages to create a developer- and user-friendly environment should be considered for other computationally intensive bioinformatics algorithms.
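As a rough illustration of what an SW alignment with an affine gap penalty computes, the following is a minimal, sequential pure-Python sketch. It is not the pyPaSWAS code and makes no use of GPUs or OpenCL. The function name `sw_affine` and the scoring values are hypothetical choices for this example, and the convention assumed here is that a gap of length k costs `gap_open + (k - 1) * gap_extend`.

```python
def sw_affine(a, b, match=5, mismatch=-3, gap_open=-5, gap_extend=-2):
    """Return the best local alignment score of a and b with affine gaps.

    Illustrative scoring values only; they are not taken from pyPaSWAS.
    """
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    # H: best local score of an alignment ending at (i, j)
    # E: best score ending at (i, j) with b[j-1] aligned to a gap
    # F: best score ending at (i, j) with a[i-1] aligned to a gap
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either open a new gap from H or extend an existing one.
            E[i][j] = max(H[i][j - 1] + gap_open, E[i][j - 1] + gap_extend)
            F[i][j] = max(H[i - 1][j] + gap_open, F[i - 1][j] + gap_extend)
            score = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 option restarts the alignment, which makes it local.
            H[i][j] = max(0, H[i - 1][j - 1] + score, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


print(sw_affine("ACGTACGT", "ACGTCGT"))  # 30: seven matches and one single-base gap
```

The sketch keeps the full score matrices for readability and returns only the best score. Reporting alignment details, as described above, would additionally require storing the direction taken at each cell and tracing back from the best-scoring cell.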
DOCUMENT
Obtaining large-scale sequence alignments in a fast and flexible way is an important step in the analysis of next-generation sequencing data. Applications based on the Smith-Waterman (SW) algorithm are often either not fast enough, limited to dedicated tasks or not sufficiently accurate due to statistical issues. Current SW implementations that run on graphics hardware do not report the alignment details necessary for further analysis.
DOCUMENT
Background: G-protein coupled receptors (GPCRs) are involved in many different physiological processes, and their function can be modulated by small molecules that bind in the transmembrane (TM) domain. Because of their structural and sequence conservation, the TM domains are often used in bioinformatics approaches to first create a multiple sequence alignment (MSA) and subsequently identify ligand binding positions. So far, methods have been developed to predict the common ligand binding residue positions for class A GPCRs.
Results: Here we present 1) ss-TEA, a method to identify specific ligand binding residue positions for any receptor, predicated on high-quality sequence information, and 2) the largest MSA of class A non-olfactory GPCRs in the public domain, consisting of 13324 sequences covering most of the species homologues of the human set of GPCRs. A set of ligand binding residue positions extracted from the literature for 10 different receptors shows that our method has the best ligand binding residue prediction for 9 of these 10 receptors compared to another state-of-the-art method.
Conclusions: The combination of the large multi-species alignment and the newly introduced residue selection method ss-TEA can be used to rapidly identify subfamily-specific ligand binding residues. This approach can aid the design of site-directed mutagenesis experiments, explain receptor function and improve modelling. The method is also available online via GPCRDB at http://www.gpcr.org/7tm/. © 2011 Sanders et al; licensee BioMed Central Ltd.
DOCUMENT
Each year, approximately 600,000 people in the Netherlands become ill from eating contaminated food. The food processing industry has a strong need for better control over hygiene monitoring in its factories to prevent contaminated products from reaching the shops. The completed RAAK-mkb project “Precision Food Safety” investigated the added value of applying Whole Genome Sequencing (WGS) to trace the transmission routes of the pathogenic bacterium Listeria monocytogenes at food processing companies. A biobank was built containing nearly 600 L. monocytogenes strains from the factory environment and from products of fish, meat and vegetable processing companies. These strains were sequenced using Nanopore sequencing. The relatedness between the strains was then determined with a bioinformatics pipeline developed in the project. The project proved very successful. In “Advanced Precision in Food Safety”, the food safety research is broadened by monitoring L. monocytogenes from the very start of the food processing chain (in raw materials and ingredients). Furthermore, the WGS methodology will be applied to Salmonella enterica, and the current bioinformatics pipeline will be adapted to trace the transmission routes of this other important foodborne pathogen. To add depth, the pathogenic character of L. monocytogenes strains will be determined on the basis of the serotype and the presence of ~60 described virulence genes. To this end, data from different databases, containing sequence data from both human and non-human strains, will be compared. The effect of different cleaning agents and cleaning techniques on eliminating L. monocytogenes from surfaces will be investigated both in the laboratory and in the factory environment. It will also be investigated whether shotgun metagenomics analysis can be used to monitor food quickly and broadly for foodborne pathogens.
A prototype of a web application, with which companies can view and supplement the results obtained, will be developed further and will be tested and implemented by food processing companies.
Alcohol Use Disorder (AUD) involves uncontrollable drinking despite negative consequences, a challenge amplified at festivals. ARise is a project using Augmented Reality (AR) to prevent AUD by helping festival visitors refuse alcohol and other substances. Based on the first Augmented Reality Exposure Therapy (ARET) for clinical AUD treatment, ARise uses a smartphone app with AR glasses to project virtual humans that tempt visitors to drink alcohol. Users interact in a safe and personalized way with these virtual humans through phone, voice, and gesture interactions. The project gathers festival feedback on user experience, awareness, usability, and potential expansion to other substances.
Societal issue: Helping the treatment of addiction and stimulating social inclusion.
Benefit to society: More people, fewer patients: decreased health costs and increased inclusion and social happiness.
Collaborative partners: Novadic-Kentron, Thalamusa
Huntington’s disease (HD) and various spinocerebellar ataxias (SCA) are autosomal dominantly inherited neurodegenerative disorders caused by a CAG repeat expansion in the disease-related gene [1]. The impact of HD and SCA on families and individuals is enormous and far-reaching, as patients typically display their first symptoms during midlife. HD is characterized by unwanted choreatic movements, behavioral and psychiatric disturbances, and dementia. SCAs are mainly characterized by ataxia but also by other symptoms, including cognitive deficits, that similarly affect quality of life and lead to disability. These problems worsen as the disease progresses, and affected individuals are no longer able to work, drive, or care for themselves. This places an enormous burden on their family and caregivers; patients require intensive nursing home care as the disease progresses, and their lifespan is reduced. Although the clinical and pathological phenotypes are distinct for each CAG repeat expansion disorder, it is thought that similar molecular mechanisms underlie the effect of expanded CAG repeats in different genes. The predicted Age of Onset (AO) for HD, SCA1 and SCA3 (and 5 other CAG-repeat diseases) is based on the polyQ expansion, but the CAG/polyQ repeat length determines the AO for only 50% (see figure below). A large variation in AO is observed, especially for the most common range between 40 and 50 repeats [11,12].
Large differences in onset, especially in the range of 40-50 CAGs, not only imply that current individual predictions of AO are imprecise (affecting important life decisions that patients need to make and also hampering the assessment of potential onset-delaying interventions) but also offer optimism that (patient-related) factors exist that can delay the onset of disease.
To address both items, we need to generate a better model, based on patient-derived cells, that yields parameters that not only mirror the CAG-repeat length dependency of these diseases but also better predict inter-patient variations in disease susceptibility and the effectiveness of interventions. To this end, we will use a staggered project design as explained in 5.1, in which we will first determine which cellular and molecular determinants (referred to as landscapes) in isogenic iPSC models are associated with increased CAG repeat lengths using deep-learning algorithms (DLA) (WP1). For this, we will use a well-characterized control cell line in which we modify the CAG repeat length in the endogenous ataxin-1, ataxin-3 and huntingtin genes from wildtype Q repeats to intermediate, adult-onset and juvenile polyQ repeats. We will next expand the model with cells from the 3 (SCA1, SCA3, and HD) existing and new cohorts of early-onset, adult-onset and late-onset/intermediate repeat patients for which, besides accurate AO information, clinical parameters (MRI scans, liquor markers, etc.) will also be (made) available. This will be used for validation and to fine-tune the molecular landscapes (again using DLA) towards the best prediction of individual patient-related clinical markers and AO (WP3).
The same models and (most relevant) landscapes will also be used to evaluate novel mutant protein lowering strategies that will emerge from WP4.
This overall development of landscape prediction is an iterative process that involves (a) data processing (WP5), (b) unsupervised data exploration and dimensionality reduction to find patterns in the data and create “labels” for similarity, and (c) development of supervised Deep Learning (DL) models for landscape prediction based on the labels from the previous step. Each iteration starts with data that is generated and deployed according to FAIR principles, and the developed deep learning system will be instrumental in connecting these WPs. Insights into algorithm sensitivity from the predictive models will form the basis for discussion with field experts on the distinction and phenotypic consequences. While full development of accurate diagnostics might go beyond the timespan of the 5-year project, ideally our final landscapes can be used for new genetic counselling: when somebody is positive for the gene, can we use his/her cells, feed them into the generated cell-based model and better predict the AO and severity? While this will answer questions from clinicians and patient communities, it will also generate new ones, which is why we will study the ethical implications of such improved diagnostics in advance (WP6).
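The three steps (a) to (c) of one iteration can be sketched in a few lines of pure Python. This is a toy illustration under stated assumptions, not the project's pipeline: all function names are hypothetical, the data are synthetic, a single principal component and two-cluster k-means stand in for the unsupervised exploration, and logistic regression stands in for the supervised DL model.

```python
import math
import random


def center(rows):
    """(a) Data processing: subtract the per-feature mean."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def first_component(rows, iters=100):
    """(b) Dimensionality reduction: leading PCA axis via power iteration.

    Assumes the all-ones start vector is not orthogonal to that axis.
    """
    d = len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / len(rows) for j in range(d)]
           for i in range(d)]
    vec = [1.0] * d
    for _ in range(iters):
        vec = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return vec


def two_means(scores, iters=20):
    """(b) Unsupervised labels: 1-D k-means with k=2, seeded at min and max."""
    lo, hi = min(scores), max(scores)
    if lo == hi:
        return [0] * len(scores)
    labels = []
    for _ in range(iters):
        labels = [1 if abs(s - hi) < abs(s - lo) else 0 for s in scores]
        lo = sum(s for s, l in zip(scores, labels) if l == 0) / labels.count(0)
        hi = sum(s for s, l in zip(scores, labels) if l == 1) / labels.count(1)
    return labels


def train_logistic(rows, labels, epochs=200, lr=0.1):
    """(c) Supervised step: logistic regression as a stand-in for a DL model."""
    w, b = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            w = [wi + lr * (y - p) * xi for wi, xi in zip(w, x)]
            b += lr * (y - p)
    return w, b


def predict(model, x):
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


# Synthetic stand-in data: two groups of 20 samples with 3 features each.
rng = random.Random(0)
raw = [[rng.gauss(c, 0.5), rng.gauss(c, 0.5), rng.gauss(0, 0.5)]
       for c in (0.0, 4.0) for _ in range(20)]
data = center(raw)
axis = first_component(data)
scores = [sum(a * v for a, v in zip(axis, row)) for row in data]
labels = two_means(scores)
model = train_logistic(data, labels)
```

In this sketch the labels produced by the unsupervised step become the training targets of the supervised step, which mirrors how step (c) above builds on the labels created in step (b).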