{"id":18002,"date":"2018-10-12T10:41:49","date_gmt":"2018-10-12T14:41:49","guid":{"rendered":"https:\/\/www.crim.ca\/blogue\/deep-learning-applied-to-graphs-extraction-and-processing-of-graph-information-by-convolutional-neural-networks\/"},"modified":"2023-06-14T10:00:52","modified_gmt":"2023-06-14T14:00:52","slug":"deep-learning-applied-to-graphs-extraction-and-processing-of-graph-information-by-convolutional-neural-networks","status":"publish","type":"blogue","link":"https:\/\/www.crim.ca\/en\/blogue\/deep-learning-applied-to-graphs-extraction-and-processing-of-graph-information-by-convolutional-neural-networks\/","title":{"rendered":"Deep learning applied to graphs: Extraction and processing of graph information by convolutional neural networks"},"content":{"rendered":"<div class=\"iw ix iy iz ja\">\n<p id=\"5a30\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz ma mb mc md iw gi\" data-selectable-paragraph=\"\">Graphs \u2014 frequently used in the fields of transport, telecommunication, biology, sociology and others \u2014 allow, in the simplest cases, an exploration of data and their connectivity from a simple visual analysis. However, in most applications, for example in the classic case of the representation of social networks, the graphs obtained contain an immense quantity of nodes and connexions between them. When that\u2019s the case, simple visual analyzes can quickly become complex. 
The use of machine learning on massive graphs is a solution for numerous analytical and predictive approaches such as clustering for community detection, social link inference, classification of unlabelled vertices, etc.<\/p>\n<p id=\"f19b\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz ma mb mc md iw gi\" data-selectable-paragraph=\"\">Deep learning is inspired by the functioning and architecture of brain systems: it uses artificial neural networks capable of solving complex problems through training. Deep learning seeks to learn through an algorithm composed of several layers of artificial neurons. The more layers there are, the more complex the problems the algorithm can handle. In the context of large data volume processing, the use of deep learning and more specifically convolutional neural networks seems appropriate. Indeed, unlike multilayer perceptrons, convolutional neural networks limit the number of connections between neurons from one layer to the next, which has the advantage of reducing the number of parameters to be learned.<\/p>\n<p id=\"ceb3\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz ma mb mc md iw gi\" data-selectable-paragraph=\"\">Recent research is concerned with the application of convolutional neural networks to graphs. Convolutional neural networks are generally used for processing data defined in a Euclidean space (a finite-dimensional vector space), such as images structured on a regular grid of ordered pixels. However, graphs have no fixed structure or order allowing them to be processed in this way by convolutional neural networks.<\/p>\n<p id=\"7e43\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz ma mb mc md iw gi\" data-selectable-paragraph=\"\">A recent method called \u201cDeep Graph Convolutional Neural Network\u201d (DGCNN) proposed by M. Zhang et al. 
(2018) [1] presents a new convolutional neural network architecture for graph processing.<\/p>\n<figure class=\"kt ku kv kw hf kx gt gu paragraph-image\">\n<div class=\"ky kz dq la cf lb\" tabindex=\"0\" role=\"button\">\n<div class=\"gt gu me\"><img fetchpriority=\"high\" decoding=\"async\" class=\"cf lc ld\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/max\/700\/1*9-h8lzEXqaFYF7ApCe5mVg.png\" alt=\"\" width=\"700\" height=\"186\" \/><\/div>\n<\/div><figcaption class=\"le bm gv gt gu lf lg bn b bo bp co\" data-selectable-paragraph=\"\">DGCNN Architecture [1]<\/figcaption><\/figure>\n<p id=\"6386\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz ma mb mc md iw gi\" data-selectable-paragraph=\"\">This new architecture adds two steps (graph convolutions and SortPooling) to allow graphs to be processed by traditional convolutional neural networks [1].<\/p>\n<h2 id=\"e812\" class=\"mf mg jd bn mh mi mj mk ml mm mn mo mp kj mq kk mr km ms kn mt kp mu kq mv mw gi\">Graph convolution layers<\/h2>\n<h3 id=\"f7b2\" class=\"mx mg jd bn mh my mz na ml nb nc nd mp lr ne nf mr lv ng nh mt lz ni nj mv nk gi\">Weisfeiler-Lehman subtree kernel<\/h3>\n<p id=\"1ef5\" class=\"pw-post-body-paragraph li lj jd lk b ll nl ke ln lo nm kh lq lr nn lt lu lv no lx ly lz np mb mc md iw gi\" data-selectable-paragraph=\"\">The extraction of graph information in the DGCNN method is inspired by the Weisfeiler-Lehman subtree kernel method (WL) [2]. This method is a subroutine that extracts sub-tree features at several scales for graph classification. 
It operates as follows:<\/p>\n<figure class=\"kt ku kv kw hf kx gt gu paragraph-image\">\n<div class=\"ky kz dq la cf lb\" tabindex=\"0\" role=\"button\">\n<div class=\"gt gu nq\"><img decoding=\"async\" class=\"cf lc ld\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/max\/700\/1*DzM_-dN_7cwn5UPa0vr3Ww.png\" alt=\"\" width=\"700\" height=\"631\" \/><\/div>\n<\/div><figcaption class=\"le bm gv gt gu lf lg bn b bo bp co\" data-selectable-paragraph=\"\">Weisfeiler-Lehman subtree kernel [2]<\/figcaption><\/figure>\n<p id=\"f3dd\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz ma mb mc md iw gi\" data-selectable-paragraph=\"\">The basic idea of WL is to concatenate the color of a vertex with the colors of its 1-hop neighbors, thus forming the WL signature of the vertex (b), then to sort the signature strings lexicographically to assign new colors (c). Vertices with the same signature are given the same new color (d). The procedure is iterated until the colors converge or a maximum number of iterations h is reached. In the end, vertices with the same converged color share the same structural role in the graph and are no longer distinguished. WL is used to check graph isomorphism, because if two graphs are isomorphic, they will have the same set of WL colors at each iteration. The WL subtree kernel uses this idea to measure the similarity between two graphs G and G\u2019: after the iterations, the similarity is determined by computing the kernel function (e) [2].<\/p>\n<p id=\"fc80\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz ma mb mc md iw gi\" data-selectable-paragraph=\"\">The DGCNN method differs from WL. In fact, it doesn\u2019t use the WL subtree method as such but a softened version of it. 
In this method, the kernel function isn\u2019t calculated and the colors are used differently: the DGCNN method concatenates the colors obtained in the form of a tensor\u00a0<em class=\"nr\">Zt\u00a0<\/em>(with\u00a0<em class=\"nr\">t = 1, .., h<\/em>) horizontally in a tensor\u00a0<em class=\"nr\">Z1: h<\/em>\u00a0(h being the number of iterations \/ convolutions performed) [1].<\/p>\n<h3 id=\"c2cd\" class=\"mf mg jd bn mh mi mj mk ml mm mn mo mp kj mq kk mr km ms kn mt kp mu kq mv mw gi\">SortPooling<\/h3>\n<p id=\"f556\" class=\"pw-post-body-paragraph li lj jd lk b ll nl ke ln lo nm kh lq lr nn lt lu lv no lx ly lz np mb mc md iw gi\" data-selectable-paragraph=\"\">The major innovation of the DGCNN method is a new layer named SortPooling. SortPooling takes as input the unordered vertex features produced by the graph convolutions. Then, instead of summarizing the features of the vertices (causing a loss of information) as traditional pooling layers do, SortPooling sorts them in a consistent order and produces a sorted graph representation with a fixed size, so that traditional convolutional neural networks can read the vertices in a coherent order and be trained on this representation.<\/p>\n<p id=\"4b00\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz ma mb mc md iw gi\" data-selectable-paragraph=\"\">Therefore, the main function of the SortPooling layer is to sort the feature descriptors, each representing a vertex, in a coherent order before feeding them into one-dimensional convolutional and dense layers. For the classification of images, pixels are naturally arranged following their spatial order, whereas to classify text, alphabetical order is used to sort the words. 
For graphs, the SortPooling layer sorts the vertices according to their structural role in the graph, as established by the WL subtree kernel method.<\/p>\n<p id=\"ef4b\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz ma mb mc md iw gi\" data-selectable-paragraph=\"\">Formally, the outputs\u00a0<em class=\"nr\">Zt<\/em>\u00a0of the graph convolution layers are used to sort the vertices. The SortPooling layer takes as input a tensor\u00a0<em class=\"nr\">Z1: h<\/em>\u00a0of size\u00a0<em class=\"nr\">\ud835\udc5b * \u03a3\ud835\udc50\ud835\udc61<\/em>, where each row is a vertex feature descriptor and each column is a feature channel. The output of SortPooling is a tensor \ud835\udc58 * \u03a3\ud835\udc50\ud835\udc61, where k is a user-defined integer. In the SortPooling layer, the input\u00a0<em class=\"nr\">Z1: h<\/em>\u00a0is first sorted row-wise according to\u00a0<em class=\"nr\">Zh<\/em>.<\/p>\n<p id=\"6f3f\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz ma mb mc md iw gi\" data-selectable-paragraph=\"\">More precisely, a coherent order is imposed on the vertices of the graphs, which makes it possible to train traditional neural networks on the sorted graph representations. The order of the vertices based on\u00a0<em class=\"nr\">Zh<\/em>\u00a0is calculated by sorting the vertices using the last channel of\u00a0<em class=\"nr\">Zh<\/em>\u00a0in descending order. Such an order is similar to the lexicographic order. 
The SortPooling layer acts as a bridge between graph convolution layers and traditional neural network layers by backpropagating loss gradients, thus integrating graph representation and learning into one end-to-end architecture [1].<\/p>\n<h3 id=\"8ba6\" class=\"mf mg jd bn mh mi mj mk ml mm mn mo mp kj mq kk mr km ms kn mt kp mu kq mv mw gi\">DGCNN in practice<\/h3>\n<p id=\"7bfb\" class=\"pw-post-body-paragraph li lj jd lk b ll nl ke ln lo nm kh lq lr nn lt lu lv no lx ly lz np mb mc md iw gi\" data-selectable-paragraph=\"\"><em class=\"nr\">Detailed information for using the DGCNN algorithm is on the authors\u2019\u00a0<\/em><a class=\"au lh\" href=\"https:\/\/github.com\/muhanzhang\/pytorch_DGCNN\/\" target=\"_blank\" rel=\"noopener ugc nofollow\"><em class=\"nr\">GitHub repository<\/em><\/a><em class=\"nr\">.<\/em><\/p>\n<h3 id=\"5d83\" class=\"mx mg jd bn mh my mz na ml nb nc nd mp lr ne nf mr lv ng nh mt lz ni nj mv nk gi\">Input dataset<\/h3>\n<p id=\"9d67\" class=\"pw-post-body-paragraph li lj jd lk b ll nl ke ln lo nm kh lq lr nn lt lu lv no lx ly lz np mb mc md iw gi\" data-selectable-paragraph=\"\">The input data of the DGCNN method must have a specific format. Several steps are needed in order to obtain appropriately formatted data. The datasets tested by the authors (available on the\u00a0<a class=\"au lh\" href=\"https:\/\/ls11-www.cs.tu-dortmund.de\/staff\/morris\/graphkerneldatasets\" target=\"_blank\" rel=\"noopener ugc nofollow\">platform<\/a>\u00a0of the University of Dortmund) are reference datasets suitable for graph kernel techniques. 
Datasets are described by several text files. First, the mandatory text files, where n = total number of nodes, m = total number of edges and N = number of graphs:<\/p>\n<p id=\"4c27\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz ma mb mc md iw gi\" data-selectable-paragraph=\"\">(1) _A.txt (m rows) sparse block-diagonal adjacency matrix for all graphs, each row corresponds to (row, column) resp. (node_id, node_id)<\/p>\n<p id=\"3d6b\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz ma mb mc md iw gi\" data-selectable-paragraph=\"\">(2) _graph_indicator.txt (n rows) column vector of graph identifiers for all nodes of all graphs, the value in the\u00a0<em class=\"nr\">i-<\/em>th row is the graph_id of the node with node_id\u00a0<em class=\"nr\">i<\/em><\/p>\n<p id=\"f129\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz ma mb mc md iw gi\" data-selectable-paragraph=\"\">(3) _graph_labels.txt (N rows) class labels for all graphs in the dataset, the value in the<em class=\"nr\">\u00a0i<\/em>-th row is the class label of the graph with graph_id\u00a0<em class=\"nr\">i<\/em><\/p>\n<p id=\"b20a\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz ma mb mc md iw gi\" data-selectable-paragraph=\"\">(4) _node_labels.txt (n rows) column vector of node labels, the value in the\u00a0<em class=\"nr\">i<\/em>-th row is the label of the node with node_id\u00a0<em class=\"nr\">i<\/em><\/p>\n<p id=\"bd26\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz ma mb mc md iw gi\" data-selectable-paragraph=\"\">Then the optional text files:<br \/>\n(5) _edge_labels.txt (m rows) labels for the edges in _A.txt<\/p>\n<p id=\"0c1e\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz 
ma mb mc md iw gi\" data-selectable-paragraph=\"\">(6) _edge_attributes.txt (m rows) attributes for the edges in _A.txt<\/p>\n<p id=\"2d23\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz ma mb mc md iw gi\" data-selectable-paragraph=\"\">(7) _node_attributes.txt (n rows) node attribute matrix, the comma-separated values in the i-th row are the attribute vector of the node with node_id\u00a0<em class=\"nr\">i<\/em><\/p>\n<p id=\"bc34\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz ma mb mc md iw gi\" data-selectable-paragraph=\"\">(8) _graph_attributes.txt (N rows) regression values for all graphs in the dataset, the value in the\u00a0<em class=\"nr\">i<\/em>-th row is the attribute of the graph with graph_id\u00a0<em class=\"nr\">i<\/em><\/p>\n<p id=\"bf4e\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz ma mb mc md iw gi\" data-selectable-paragraph=\"\">The data contained in these files are strictly numeric: the values can be either discrete or continuous. However, these files are not directly usable by the DGCNN algorithm. 
The algorithm requires them to be transformed into a single final text file organized as follows:<\/p>\n<p id=\"537e\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz ma mb mc md iw gi\" data-selectable-paragraph=\"\"><em class=\"nr\">N<br \/>\nn l<br \/>\nt m d<\/em><\/p>\n<p id=\"265f\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz ma mb mc md iw gi\" data-selectable-paragraph=\"\">Where\u00a0<em class=\"nr\">N<\/em>\u00a0is the number of graphs,\u00a0<em class=\"nr\">n<\/em>\u00a0is the number of nodes of the graph,\u00a0<em class=\"nr\">l<\/em>\u00a0is the label of the graph,\u00a0<em class=\"nr\">t<\/em>\u00a0is the label of the current node,<em class=\"nr\">\u00a0m<\/em>\u00a0is the number of neighbors of the current node (followed by the indices of these neighboring nodes) and finally\u00a0<em class=\"nr\">d<\/em>\u00a0lists the features of the current node (optional). This transformation also generates ten training samples and ten test samples (each sample being a text file).<\/p>\n<h3 id=\"97ec\" class=\"mx mg jd bn mh my mz na ml nb nc nd mp lr ne nf mr lv ng nh mt lz ni nj mv nk gi\">Launch of DGCNN algorithm<\/h3>\n<figure class=\"kt ku kv kw hf kx\">\n<div class=\"m l dq\">\n<div class=\"ns nt l\"><iframe class=\"fw aq as ag cf\" title=\"Deep Graph Convolutional Neural Network Test\" src=\"https:\/\/cdn.embedly.com\/widgets\/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2Fm5o0UH1mK_A%3Ffeature%3Doembed&amp;url=http%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3Dm5o0UH1mK_A&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2Fm5o0UH1mK_A%2Fhqdefault.jpg&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=youtube\" width=\"640\" height=\"480\" frameborder=\"0\" scrolling=\"auto\" allowfullscreen=\"allowfullscreen\" data-mce-fragment=\"1\"><\/iframe><\/div>\n<\/div><figcaption class=\"le bm gv gt gu lf lg bn b bo bp 
co\">Launch of DGCNN algorithm on AIDS dataset on 20 epochs<\/figcaption><\/figure>\n<p id=\"b945\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz ma mb mc md iw gi\" data-selectable-paragraph=\"\">The DGCNN algorithm consists of several scripts. It requires the use of a machine with a GPU which is approximately forty times faster than the CPU (allowing rapid processing of matrix products and convolutions ). The Pytorch package is also needed for script execution, which is used to exploit the power of the GPU.<\/p>\n<p id=\"aa49\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz ma mb mc md iw gi\" data-selectable-paragraph=\"\">For each data set, it is possible to adjust the hyperparameters concerning the number of epochs, the number of batches per period, the size of the convolution filter, the learning rate, etc.<\/p>\n<p id=\"6b02\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz ma mb mc md iw gi\" data-selectable-paragraph=\"\">The execution of the algorithm is done from the next command where DATANAME precise which dataset is used:<\/p>\n<blockquote class=\"nu nv nw\">\n<p id=\"b622\" class=\"li lj nr lk b ll lm ke ln lo lp kh lq nx ls lt lu ny lw lx ly nz ma mb mc md iw gi\" data-selectable-paragraph=\"\">.\/run_DGCNN.sh DATANAME<\/p>\n<\/blockquote>\n<p id=\"dd77\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz ma mb mc md iw gi\" data-selectable-paragraph=\"\">For each epoch, the algorithm displays the loss and accuracy of the training and the test phase.<\/p>\n<h2 id=\"ad53\" class=\"mf mg jd bn mh mi mj mk ml mm mn mo mp kj mq kk mr km ms kn mt kp mu kq mv mw gi\">Adaptability of DGCNN<\/h2>\n<h3 id=\"cc4a\" class=\"mx mg jd bn mh my mz na ml nb nc nd mp lr ne nf mr lv ng nh mt lz ni nj mv nk gi\">Variations of graph structures<\/h3>\n<p id=\"13cc\" 
class=\"pw-post-body-paragraph li lj jd lk b ll nl ke ln lo nm kh lq lr nn lt lu lv no lx ly lz np mb mc md iw gi\" data-selectable-paragraph=\"\">The datasets initially tested (PROTEINS and IMDB-M) pertain to the fields of biology and cinema. Each field of study has different graphs structures, it\u2019s easy to imagine that graphs representing molecules and others representing movies characteristics would have very different structures.<\/p>\n<figure class=\"kt ku kv kw hf kx gt gu paragraph-image\">\n<div class=\"gt gu oa\"><img decoding=\"async\" class=\"cf lc ld\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/max\/659\/1*puabOCwp7qiYNnCucqzCxw.png\" alt=\"\" width=\"659\" height=\"196\" \/><\/div><figcaption class=\"le bm gv gt gu lf lg bn b bo bp co\" data-selectable-paragraph=\"\">DGCNN results of different datasets<\/figcaption><\/figure>\n<p id=\"0d88\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz ma mb mc md iw gi\" data-selectable-paragraph=\"\">In order to evaluate the DGCNN\u2019s adaptability when faced with graphs containing datasets from different domains, we tested three new datasets coming from the\u00a0<a class=\"au lh\" href=\"https:\/\/ls11-www.cs.tu-dortmund.de\/staff\/morris\/graphkerneldatasets\" target=\"_blank\" rel=\"noopener ugc nofollow\">platform<\/a>\u00a0of the University of Dortmund (AIDS, CUNEIFORM and MSRC_21).<\/p>\n<p id=\"e598\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz ma mb mc md iw gi\" data-selectable-paragraph=\"\">If we take as an example the AIDS dataset (composed of 2 000 graphs each representing a chemical component), the goal consist in detecting the components with or without a particular sub-structure reflecting anti-HIV activity and classifying them into two distinct classes. 
According to the results in the previous table, the algorithm\u2019s classification accuracy doesn\u2019t seem to depend on the structure of the data. In other words, the algorithm seems to be applicable to graphs of various structures (except for the CUNEIFORM dataset, which deals with the fields of literature and history).<\/p>\n<h3 id=\"73fc\" class=\"mx mg jd bn mh my mz na ml nb nc nd mp lr ne nf mr lv ng nh mt lz ni nj mv nk gi\">Other dataset formats<\/h3>\n<p id=\"3366\" class=\"pw-post-body-paragraph li lj jd lk b ll nl ke ln lo nm kh lq lr nn lt lu lv no lx ly lz np mb mc md iw gi\" data-selectable-paragraph=\"\">According to the previous results, the DGCNN method seems adaptable to various types of data from numerous fields of study. To pursue this line of inquiry, we would like to test the method on a concrete case starting from a dataset of graphs stored in a graph-oriented database. These data are not in the format required by the DGCNN algorithm, so various transformations have to be performed (all details and code for the transformation are available on this\u00a0<a class=\"au lh\" href=\"https:\/\/github.com\/JadeGuisiano\/Deep-learning-on-Graphs\" target=\"_blank\" rel=\"noopener ugc nofollow\">git<\/a>\u00a0page):<\/p>\n<figure class=\"kt ku kv kw hf kx gt gu paragraph-image\">\n<div class=\"ky kz dq la cf lb\" tabindex=\"0\" role=\"button\">\n<div class=\"gt gu ob\"><img loading=\"lazy\" decoding=\"async\" class=\"cf lc ld\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/max\/700\/1*h4gbHOHwSCTS0AzCflV1EQ.png\" alt=\"\" width=\"700\" height=\"98\" \/><\/div>\n<\/div><figcaption class=\"le bm gv gt gu lf lg bn b bo bp co\" data-selectable-paragraph=\"\">Transformations of a CSV dataset into the form required by DGCNN<\/figcaption><\/figure>\n<p id=\"1892\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz ma mb mc md iw gi\" data-selectable-paragraph=\"\">This 
ongoing case study begins by loading neighborhood crime data (a CSV file) as nodes and relationships into the Neo4j graph-oriented database, where each node represents a crime and is connected to adjacent neighborhood nodes. Specific Cypher queries then export the graph information to a CSV file, which has a particular shape and contains many superfluous characters.<\/p>\n<p id=\"98ba\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz ma mb mc md iw gi\" data-selectable-paragraph=\"\">The next step (a Python script) cleans the data from the Neo4j export and partitions it into several text files in the format proposed by the University of Dortmund. Finally, the information contained in these different text files is assembled (a Matlab script) into the final text file format used by the DGCNN algorithm. This last step also produces the samples for training and testing.<\/p>\n<p id=\"b660\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz ma mb mc md iw gi\" data-selectable-paragraph=\"\">This transformation method makes it possible to use data of different formats by converting them to the format required for DGCNN processing. 
The second part of the case study (in progress) will consist of running the DGCNN algorithm on this transformed data.<\/p>\n<h2 id=\"f758\" class=\"mf mg jd bn mh mi mj mk ml mm mn mo mp kj mq kk mr km ms kn mt kp mu kq mv mw gi\">Conclusion<\/h2>\n<ul class=\"\">\n<li id=\"d800\" class=\"oc od jd lk b ll nl lo nm lr oe lv of lz og md oh oi oj ok gi\" data-selectable-paragraph=\"\">The DGCNN algorithm makes it possible to perform supervised classification on graph data while preserving as much information as possible.<\/li>\n<li id=\"607b\" class=\"oc od jd lk b ll ol lo om lr on lv oo lz op md oh oi oj ok gi\" data-selectable-paragraph=\"\">According to tests performed on graphs dealing with various subjects, the DGCNN\u2019s accuracy seems to be independent of the type of graph used.<\/li>\n<li id=\"7331\" class=\"oc od jd lk b ll ol lo om lr on lv oo lz op md oh oi oj ok gi\" data-selectable-paragraph=\"\">The input data format required by the DGCNN algorithm can be obtained after various transformations. It would be interesting to test the algorithm on data in other formats and from other domains.<\/li>\n<\/ul>\n<\/div>\n<div class=\"o dz oq or ii os\" role=\"separator\"><\/div>\n<div class=\"iw ix iy iz ja\">\n<p id=\"b849\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz ma mb mc md iw gi\" data-selectable-paragraph=\"\">[1] Zhang, Muhan et al. \u201cAn End-to-End Deep Learning Architecture for Graph Classification.\u201d\u00a0<em class=\"nr\">AAAI<\/em>\u00a0(2018).<\/p>\n<p id=\"066a\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz ma mb mc md iw gi\" data-selectable-paragraph=\"\">[2] Shervashidze, Nino et al. 
\u201cWeisfeiler-Lehman Graph Kernels.\u201d\u00a0<em class=\"nr\">Journal of Machine Learning Research<\/em>\u00a012 (2011): 2539\u20132561.<\/p>\n<p id=\"3259\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz ma mb mc md iw gi\" data-selectable-paragraph=\"\">Future work:<\/p>\n<p id=\"3a4d\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz ma mb mc md iw gi\" data-selectable-paragraph=\"\">Hechtlinger, Yotam &amp; Chakravarti, Purvasha &amp; Qin, Jining. (2017). A Generalization of Convolutional Neural Networks to Graph-Structured Data.<\/p>\n<p id=\"f602\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz ma mb mc md iw gi\" data-selectable-paragraph=\"\">Du, Jian &amp; Zhang, Shanghang &amp; Wu, Guanhang &amp; Moura, Jose &amp; Kar, Soummya. (2017). Topology Adaptive Graph Convolutional Networks.<\/p>\n<p id=\"4115\" class=\"pw-post-body-paragraph li lj jd lk b ll lm ke ln lo lp kh lq lr ls lt lu lv lw lx ly lz ma mb mc md iw gi\" data-selectable-paragraph=\"\">Thanks to\u00a0<a class=\"au lh\" href=\"https:\/\/medium.com\/@dacosta.le\" rel=\"noopener\">Luis Da Costa<\/a><\/p>\n<\/div>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Graphs \u2014 frequently used in the fields of transport, telecommunication, biology, sociology and others \u2014 allow, in the simplest cases, an exploration of data and their connectivity from a simple visual analysis. 
However, in most applications, for example in the classic case of the representation of social networks, the graphs obtained contain an immense quantity [&hellip;]<\/p>\n","protected":false},"author":18,"featured_media":16697,"menu_order":0,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":"","_links_to":"","_links_to_target":""},"mots_cles":[451,446,452],"categorie_blogue":[449],"class_list":["post-18002","blogue","type-blogue","status-publish","format-standard","has-post-thumbnail","hentry","mots_cles-cnn-en","mots_cles-deep-learning-en","mots_cles-graph-neural-network-en","categorie_blogue-artificial-intelligence"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.crim.ca\/en\/wp-json\/wp\/v2\/blogue\/18002","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.crim.ca\/en\/wp-json\/wp\/v2\/blogue"}],"about":[{"href":"https:\/\/www.crim.ca\/en\/wp-json\/wp\/v2\/types\/blogue"}],"author":[{"embeddable":true,"href":"https:\/\/www.crim.ca\/en\/wp-json\/wp\/v2\/users\/18"}],"version-history":[{"count":1,"href":"https:\/\/www.crim.ca\/en\/wp-json\/wp\/v2\/blogue\/18002\/revisions"}],"predecessor-version":[{"id":19986,"href":"https:\/\/www.crim.ca\/en\/wp-json\/wp\/v2\/blogue\/18002\/revisions\/19986"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.crim.ca\/en\/wp-json\/wp\/v2\/media\/16697"}],"wp:attachment":[{"href":"https:\/\/www.crim.ca\/en\/wp-json\/wp\/v2\/media?parent=18002"}],"wp:term":[{"taxonomy":"mots_cles","embeddable":true,"href":"https:\/\/www.crim.ca\/en\/wp-json\/wp\/v2\/mots_cles?post=18002"},{"taxonomy":"categorie_blogue","embeddable":true,"href":"https:\/\/www.crim.ca\/en\/wp-json\/wp\/v2\/categorie_blogue?post=18002"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
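The Weisfeiler-Lehman color refinement that the post describes (concatenate each vertex's color with the sorted colors of its 1-hop neighbors, compress identical signatures into new colors, and repeat up to h times or until convergence) can be sketched in a few lines of Python. This is a minimal illustration of the idea only; the function name and the adjacency-list graph representation are assumptions, not taken from the DGCNN code.

```python
def wl_refine(adj, labels, h=3):
    """Weisfeiler-Lehman color refinement (a sketch).

    adj    -- dict mapping each vertex to a list of its neighbors
    labels -- dict mapping each vertex to an initial integer color
    h      -- maximum number of refinement iterations
    """
    colors = dict(labels)
    for _ in range(h):
        # WL signature of a vertex: its own color followed by the
        # lexicographically sorted colors of its 1-hop neighbors.
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
        # Compress: every distinct signature gets one new color, so
        # vertices with identical signatures share the same color.
        palette = {s: i for i, s in enumerate(sorted(set(signatures.values())))}
        new_colors = {v: palette[signatures[v]] for v in adj}
        # Refinement only ever splits color classes, so an unchanged
        # class count means the coloring has converged.
        if len(set(new_colors.values())) == len(set(colors.values())):
            break
        colors = new_colors
    return colors
```

On a path graph, for example, the two endpoints converge to one color and the two interior vertices to another: vertices that end up with the same color share the same structural role, which is the ordering criterion SortPooling relies on.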
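The single-file input format described in the post (a first line with N, then for each graph a line "n l", then one line per node "t m ..." listing the node label, neighbor count and neighbor indices) can be produced with a short writer like the following. This is a sketch based on the format as the article describes it; the function name and the in-memory graph representation are assumptions, and the optional node features d are omitted.

```python
def write_dgcnn_file(graphs, path):
    """Write graphs in the DGCNN single-file text format (a sketch).

    graphs -- list of (graph_label, node_labels, adj) tuples, where
              node_labels is a list indexed by local node id and adj
              maps each local node id to a list of neighbor ids
    path   -- output text file
    """
    with open(path, "w") as f:
        f.write(f"{len(graphs)}\n")  # N: total number of graphs
        for graph_label, node_labels, adj in graphs:
            # "n l": number of nodes and label of this graph
            f.write(f"{len(node_labels)} {graph_label}\n")
            for v, t in enumerate(node_labels):
                # "t m ...": node label, neighbor count, neighbor indices
                fields = [t, len(adj[v])] + list(adj[v])
                f.write(" ".join(str(x) for x in fields) + "\n")
```

Writing a single triangle graph with label 1 and all node labels 0, for instance, produces the lines "1", "3 1", then "0 2 1 2" and so on, one line per node.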