Bio Data Landscaping and Mining

Data Landscaping

There are now vast amounts of bio data available in the public domain that can be mined effectively to address specific biological questions. Because public domain data can be used for hypothesis generation, it can dramatically reduce the time and costs associated with wet-lab experimentation and data generation. Moreover, in many cases, public datasets are a valuable resource for validating in-house findings. Although public datasets are a rich source of data, often generated from thousands or tens of thousands of samples, accessing this data and harmonising the various sources is a technical challenge from both a computing and a scientific perspective.

Data Mining

Once suitable datasets are identified, Novus Genomics can apply robust quality controls, providing the best starting point for downstream analysis and investigation. We can perform bio data mining using standard or bespoke workflows. Our team also has experience with meta-analyses, in which data and outcomes from several studies are combined to increase overall statistical power.
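
As an illustration of why combining studies increases statistical power, the sketch below pools effect estimates with a fixed-effect, inverse-variance weighted average. This is one common textbook approach, shown as an assumption rather than a description of any specific Novus Genomics workflow; the function name and the input values are hypothetical.

```python
import math

def fixed_effect_meta(effects, std_errors):
    """Pool per-study effect estimates using inverse-variance weights.

    Returns the pooled effect and its standard error.
    """
    # Each study is weighted by the inverse of its variance (1 / SE^2).
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    # The pooled standard error shrinks as more studies contribute weight.
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se

# Hypothetical effect sizes (e.g. log fold changes) from three studies.
effects = [0.40, 0.55, 0.30]
std_errors = [0.20, 0.25, 0.20]

pooled, pooled_se = fixed_effect_meta(effects, std_errors)
print(f"pooled effect = {pooled:.3f}, standard error = {pooled_se:.3f}")
```

The pooled standard error is smaller than that of any single study, which is the sense in which the combined analysis has more power. A random-effects model may be more appropriate when studies differ substantially in design or population.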

What We Offer

Our team can curate datasets from a range of publicly available sources and databases, and can help identify suitable sources to use as starting points. In many instances we can automate this process, for example by searching for co-occurrences of specific search terms in journal abstracts, or by extracting relevant metadata from GEO datasets, to help pinpoint datasets of interest.
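
The kind of automated co-occurrence search described above can be sketched as follows. This is a minimal illustration that assumes the abstracts have already been retrieved as plain text; the function name, the search terms and the example abstracts are all hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrence_counts(abstracts, terms):
    """Count the abstracts in which each pair of search terms appears together."""
    # Case-insensitive, whole-word patterns for each search term.
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in terms
    }
    counts = Counter()
    for text in abstracts:
        # Sort the matched terms so each pair has a consistent key order.
        present = sorted(t for t, p in patterns.items() if p.search(text))
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts

# Made-up abstracts, for illustration only.
abstracts = [
    "KRAS mutations drive resistance in pancreatic cancer cell lines.",
    "We profile pancreatic cancer samples by RNA-seq.",
    "KRAS signalling was measured by RNA-seq in pancreatic cancer organoids.",
]
terms = ["KRAS", "pancreatic cancer", "RNA-seq"]

counts = cooccurrence_counts(abstracts, terms)
for pair, n in counts.most_common():
    print(pair, n)
```

Pairs of terms that co-occur in many abstracts point to publications, and therefore datasets, worth a closer look.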

Our Bio Data Mining Experience:

We have previously mined the following public resources on behalf of clients:

  • The Cancer Genome Atlas (TCGA)
  • Cancer Dependency Map (DepMap)
  • Cancer Cell Line Encyclopaedia (CCLE) 
  • Gene Expression Omnibus (GEO)
  • European Nucleotide Archive (ENA)
  • Expression Atlas
  • Database of Immune Cell eQTLs, Expression, Epigenomics (DICE)
  • cBioPortal for Cancer Genomics 

Every time our clients work with us, they benefit from:

  • A dedicated analyst backed by an experienced team to curate all data, identify the most appropriate statistical approach to take and provide a biological interpretation of results.
  • An interactive data analysis report, internally peer-reviewed, including all analysis methods and results.
  • Post-report follow-ups: once you have received our data analysis report, we arrange a teleconference so that our lead analyst can talk through the results.
  • Access to large-capacity computing and secure data storage facilities.

We have utilized the Bioinformatics team at Novus Genomics for many of our drug discovery projects, as they provide expertise in the analysis of complex bioinformatic datasets. This includes large scale datasets from public sources as well as internally generated datasets. In many instances, at the start of a project, we have planned our large scale transcriptomic/proteomic studies with the Novus team, to ensure that the data generated would provide the information we need, and that our projects had the highest chance of success. We have been consistently impressed with the rigor of Novus’ work, their communication throughout the projects, and the rapid speed at which they complete their analyses.