Sherlock Laboratory: Advanced Analytics for Smart Data

Landry, T., Gouineau, F., Triplet, T. "Sherlock Laboratory: Advanced Analytics for Smart Data" in High Performance Computing Symposium (HPCS2015). Montréal, Canada

While understanding and advocating Big Data is within reach of most organisations, demonstrating real-world applications is quite another matter. It is often difficult to obtain access to large databases that exhibit variety and quality. Complex analytics run on scores of distributed machines, virtual or physical. The user experience needs to provide useful cues and pertinent interaction, for instance through advanced visualization. In order to experiment with, master and demonstrate this value chain, the Centre de recherche informatique de Montréal (CRIM) started the Sherlock project in early 2014.
In order to build flexible capacities that can be adapted to an array of domains, our system relies on a modular approach that combines different technologies from the Hadoop ecosystem. The infrastructure includes the batch-oriented MapReduce framework, often used in BI for reporting purposes, Apache Spark for high-performance in-memory computing and Cloudera Impala for interactive SQL querying of Big Data. To address the increasingly popular Internet-of-Things trend, we use GeoMesa and GeoTrellis to facilitate the real-time analysis (Spark) and storage (Accumulo/HDFS) of continuous streams of both vector and raster data collected from a variety of sensors. In addition, Sherlock relies on search engines such as Elasticsearch and Solr to index relevant data in the cluster and facilitate its retrieval. All services are available through a unified REST interface that relies on the Play framework. New services can be deployed in the Sherlock Lab using either OpenStack virtual machines for flexibility and elasticity, or lightweight Docker containers for more demanding applications.
CRIM plans to progressively open up Sherlock's resources to its clients, partners and collaborators in an environment aligned with Labs-as-a-Service practices.
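
To illustrate the idea of a single entry point in front of several engines, as the abstract describes for Sherlock's unified interface, here is a minimal Python sketch. It is not Sherlock's code and does not use the Play framework: the `UnifiedGateway` class, the workload labels and the stub handlers are all hypothetical, and the workload-to-engine mapping is only a reading of the abstract's wording.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical workload labels mapped to the engines named in the abstract.
ENGINES: Dict[str, str] = {
    "batch": "MapReduce",
    "in-memory": "Spark",
    "interactive-sql": "Impala",
    "search": "Elasticsearch/Solr",
    "geospatial": "GeoMesa/GeoTrellis",
}

Handler = Callable[[dict], dict]


@dataclass
class UnifiedGateway:
    """Toy stand-in for one interface that fronts several backends."""

    handlers: Dict[str, Handler] = field(default_factory=dict)

    def register(self, workload: str, handler: Handler) -> None:
        # Associate a workload label with the callable that serves it.
        self.handlers[workload] = handler

    def dispatch(self, workload: str, payload: dict) -> dict:
        # Route the request; unknown workloads fail loudly with KeyError.
        if workload not in self.handlers:
            raise KeyError(f"no backend registered for {workload!r}")
        return self.handlers[workload](payload)


def make_stub(engine: str) -> Handler:
    # Stub backend: a real handler would call the engine's client library.
    def handler(payload: dict) -> dict:
        return {"engine": engine, "request": payload}

    return handler


gateway = UnifiedGateway()
for workload, engine in ENGINES.items():
    gateway.register(workload, make_stub(engine))

result = gateway.dispatch("interactive-sql", {"query": "SELECT 1"})
print(result["engine"])
```

In this sketch a caller only needs to know the workload label, not which engine serves it, so a backend can be swapped by registering a different handler under the same label.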