SECTION 5.2

In this section we address the question: how do we create the graph database? As mentioned earlier, we use Neo4j to store the graph. We started from the Neo4j documentation on importing CSV files, which states that LOAD CSV is well suited to importing small to medium-sized datasets, i.e. up to roughly 10 million records.

This confirmed that LOAD CSV was a suitable way to build our graph database. One problem we ran into was the import of integer values from the CSV file: LOAD CSV reads every field as a string, so values must be converted explicitly with functions such as toInt (toInteger in more recent Neo4j releases), toFloat, or split. The following statement converts college_id to an integer while creating the College nodes.

LOAD CSV WITH HEADERS FROM "file:///Users/ricardoesteves/Desktop/college.csv" AS csvCollege
// Fields arrive as strings; convert college_id to an integer before storing it.
CREATE (college:College {college_id: toInt(csvCollege.college_id), name: csvCollege.name_full, state: csvCollege.state})
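
The same string-only behavior can be reproduced outside Neo4j. The following is a minimal Python sketch, not part of our import pipeline: Python's csv module also returns every field as a string, so the identifier has to be converted explicitly. The sample rows and the helper name read_colleges are hypothetical and only mimic the column names used above.

```python
import csv
import io

# Hypothetical sample standing in for college.csv; the rows are made up.
SAMPLE_CSV = (
    "college_id,name_full,state\n"
    "1,Example College,TX\n"
    "2,Sample University,CA\n"
)


def read_colleges(text):
    """Parse CSV text and convert college_id from string to int."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        # Every field is read as a string, as with LOAD CSV.
        assert isinstance(raw["college_id"], str)
        rows.append({
            "college_id": int(raw["college_id"]),  # explicit conversion
            "name": raw["name_full"],
            "state": raw["state"],
        })
    return rows


colleges = read_colleges(SAMPLE_CSV)
print(colleges[0])
```

Without the int(...) call the identifier would stay a string, which is the same situation as omitting toInt in the Cypher statement.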

Another problem was the large number of rows in some of the CSV files, which made loading them into the graph database slow. By default, LOAD CSV runs the whole import as a single transaction, so every row is held until the final commit. Prefixing the statement with USING PERIODIC COMMIT 1000 makes Neo4j commit after every 1000 rows, which reduces memory pressure and significantly improved our loading performance.
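
The idea behind periodic commits can be sketched in a few lines of Python. This is only an illustration of batching under simplifying assumptions, not how Neo4j implements it: the commit callback is a hypothetical stand-in for a database transaction commit, and the function names are our own.

```python
from itertools import islice


def batches(rows, size=1000):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def load_in_batches(rows, commit, size=1000):
    """Call commit(chunk) once per batch and return the number of commits."""
    count = 0
    for chunk in batches(rows, size):
        commit(chunk)  # hypothetical stand-in for a transaction commit
        count += 1
    return count


committed = []
n_commits = load_in_batches(range(2500), committed.append, size=1000)
print(n_commits, [len(c) for c in committed])  # 3 [1000, 1000, 500]
```

With 2500 rows and a batch size of 1000, at most 1000 rows are pending at any time, instead of all 2500 waiting for a single final commit.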

The final result is the schema shown below, generated with the command \texttt{CALL db.schema()}.

<IMAGE OF GRAPH DATABASE SCHEMA>