Graph Databases for the Public Sector

Jennetta George
AI/ML Specialist Leader

Graph databases are the (not-so-)new must-have in your tech stack. Over the past few decades, more and more tech giants (with Facebook, Twitter, and Google leading the way) have started migrating data into graph databases, not only to store and retrieve data in a highly scalable way but also to run models on graph structures that produce highly effective recommendations.

What makes graph databases so enticing to these companies with gargantuan data? And can the public sector benefit from this technology in similar ways?

What differentiates Knowledge Graphs from RDBMS?

RDBMS Basics

Relational Database Management Systems (RDBMSs) are databases that store their data in a structured format of rows and columns, with a schema that defines the links between the different tables in a database. RDBMSs are generally queried with SQL (Structured Query Language) or a close equivalent, and they are well suited to fairly static (rarely changing), normalized data structures.

To give a basic RDBMS use case, let’s pretend we are starting a company called GovBook, the government equivalent of Facebook, which allows users to register for an account, submit government documents, and interact with government officials in a social setting. Thousands of data attributes are captured for each person, and each one must be placed into a specific table in the database. We could structure our database schema with a User Registration table, containing all the information a user gives during registration, and a Documents table, containing a link to a PDF for every document a user has uploaded to the system. A unique identifier, called a primary key, would differentiate users in the User Registration table, and this key would also be stored in the Documents table as a foreign key, linking the two tables to each other.
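As a minimal sketch of the GovBook schema described above, the snippet that follows uses Python’s standard sqlite3 module with an in-memory database. The table and column names (user_registration, documents, user_id, pdf_link) and the sample document link are hypothetical; a real system would design its own schema and would never store plaintext passwords.

```python
import sqlite3

# In-memory SQLite database standing in for GovBook's RDBMS (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE user_registration (
    user_id  INTEGER PRIMARY KEY,   -- primary key: unique per user
    name     TEXT NOT NULL,
    username TEXT NOT NULL UNIQUE,
    password TEXT NOT NULL,         -- plaintext only to mirror the example; store hashes in practice
    email    TEXT NOT NULL,
    gender   TEXT
);
CREATE TABLE documents (
    document_id INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL REFERENCES user_registration(user_id),  -- foreign key
    pdf_link    TEXT NOT NULL
);
""")

conn.execute(
    "INSERT INTO user_registration VALUES (1, 'Jennetta George', 'JennettaGeorge1990', "
    "'Password01', 'jennettageorge1990@email.com', 'Female')"
)
conn.execute("INSERT INTO documents VALUES (10, 1, 'https://example.gov/docs/10.pdf')")

# Joining on the shared key links each document back to the user who uploaded it.
rows = conn.execute("""
    SELECT u.username, d.pdf_link
    FROM user_registration AS u
    JOIN documents AS d ON d.user_id = u.user_id
""").fetchall()
print(rows)
```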

This seems like a very nice and logical way to manage data, but of course there are pros and cons to this type of structured system. In this example, the biggest pitfall is that all entered data must fit into the table’s predefined columns. What if, a year into business, GovBook decides it wants to capture additional data during registration? At the start, it captures Name, Username, Password, Email, and Gender, but now it wants to add a field for Date of Birth. That means altering the table’s schema, and every existing user’s row gains an empty value for the new column.
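A minimal sketch of that change, again using Python’s sqlite3 module against a hypothetical, trimmed-down registration table: adding date_of_birth requires an ALTER TABLE, and the row that already existed comes back with NULL (None in Python) for the new column. In SQLite this particular change is quick; on large production tables or in other engines, a schema migration can require considerably more planning.

```python
import sqlite3

# Hypothetical, trimmed-down version of GovBook's registration table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE user_registration (
        user_id INTEGER PRIMARY KEY,
        name TEXT, username TEXT, password TEXT, email TEXT, gender TEXT
    )
""")
conn.execute(
    "INSERT INTO user_registration VALUES (1, 'Jennetta George', 'JennettaGeorge1990', "
    "'Password01', 'jennettageorge1990@email.com', 'Female')"
)

# A year later: the new field requires a schema migration on the whole table.
conn.execute("ALTER TABLE user_registration ADD COLUMN date_of_birth TEXT")

# Every pre-existing row now carries an empty (NULL) value for the new column.
dob = conn.execute(
    "SELECT date_of_birth FROM user_registration WHERE user_id = 1"
).fetchone()[0]

# PRAGMA table_info returns one row per column; index 1 holds the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(user_registration)")]
print(columns)
print(dob)
```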

NoSQL Graph Databases

Problems like this helped give rise to NoSQL (Not Only SQL) databases, which allow for data storage that is:

  • unstructured
  • highly scalable
  • schema-less

Over the past decade, one particular type of NoSQL database, the graph database, has grown hugely popular. Instead of organizing data as rows and columns, graph databases model it as nodes connected by edges, which can be written as subject-predicate-object triples. So in the example above, instead of inserting the row (Jennetta George, JennettaGeorge1990, Password01, jennettageorge1990@email.com, Female) into a Registration table, the following triples would be added to a graph:

  • Jennetta George -> hasUserName -> JennettaGeorge1990
  • Jennetta George -> hasPassword -> Password01
  • Jennetta George -> hasEmail -> jennettageorge1990@email.com
  • Jennetta George -> hasGender -> Female
[Figure: Jennetta George graph]

The subject and object of a triple are the nodes of the graph and the predicates are the edges.
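To make the triple model concrete, here is a plain-Python sketch (no database involved) that stores the four triples above as tuples, derives the graph’s nodes and edges from them, and answers a simple lookup. The helper objects_of is hypothetical; an RDF triple store would answer the same question with a SPARQL query, and a property-graph database such as Neo4j with Cypher.

```python
from __future__ import annotations

# Each fact is a (subject, predicate, object) triple, mirroring the list above.
triples = {
    ("Jennetta George", "hasUserName", "JennettaGeorge1990"),
    ("Jennetta George", "hasPassword", "Password01"),
    ("Jennetta George", "hasEmail", "jennettageorge1990@email.com"),
    ("Jennetta George", "hasGender", "Female"),
}

# Subjects and objects become nodes; each predicate becomes a labeled, directed edge.
nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}
edges = [(s, o, {"label": p}) for s, p, o in sorted(triples)]

def objects_of(subject: str, predicate: str) -> list[str]:
    """Return every object linked to `subject` by an edge labeled `predicate`."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)

print(len(nodes), len(edges))
print(objects_of("Jennetta George", "hasEmail"))
```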

Do not let the term Relational in RDBMS fool you: in graph databases, the edges (the relationships between nodes) are treated as first-class citizens, and this is where the true magic of graph databases comes from. Edges connect all the data in the graph and are a rich source of data in themselves. Depending on your choice of graph database, you can label edges, give them direction, and add other informative attributes. For instance, in the triple User -> contacted -> Agent, we can build many attributes and labels into the contacted relationship: we can define an inverse relationship, receivedContactFrom, and assign properties such as date, topic, and means of communication. If you stop and think about how to implement this in a relational database, where the relationship would need its own join table with a column for each attribute, you can see how much heavier a lift this structure becomes.

[Figure: example of a labeled edge]
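The labeled contacted edge can be sketched in plain Python as an object that carries its own properties. The Edge class, the INVERSES mapping, the with_inverse helper, and the sample date and topic are all hypothetical; graph databases differ in whether they store inverse edges explicitly or simply let you traverse an edge in either direction.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date

@dataclass
class Edge:
    source: str
    label: str
    target: str
    properties: dict = field(default_factory=dict)  # attributes stored on the edge itself

# Hypothetical mapping from a relationship to its inverse.
INVERSES = {"contacted": "receivedContactFrom"}

def with_inverse(edge: Edge) -> list[Edge]:
    """Return the edge plus its inverse; the inverse gets a copy of the properties."""
    inverse = Edge(edge.target, INVERSES[edge.label], edge.source, dict(edge.properties))
    return [edge, inverse]

contact = Edge(
    "User", "contacted", "Agent",
    {"date": date(2021, 3, 15), "topic": "permit renewal", "means": "email"},
)
for e in with_inverse(contact):
    print(e.source, f"-[:{e.label}]->", e.target, e.properties["topic"])
```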

Benefits of Using a Graph Database

Let’s dive into how this different way of approaching data management affects the data’s lifecycle.

Faster Querying Speed

Just as relational databases rely on SQL to retrieve data, graph databases have their own specialized query languages. Neo4j, one of the leading graph database tools, uses a language called Cypher.

One benefit of querying data in a graph database can be readily seen in the following example. Consider an analyst who wants to find all users who have connected with a government agent who lives in the New York City area. In a relational database, this simple-sounding query can become expensive in both execution time and query complexity: it requires joining across multiple tables, and the number of joins depends on how many tables hold data relevant to the search.

[Figure: data table]
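As a sketch of the relational side, the snippet that follows models the NYC question in an in-memory SQLite database. The schema and sample rows are hypothetical; the point is that the contact relationship lives in its own join table and agent locations in a lookup table, so even this small query needs three joins.

```python
import sqlite3

# Hypothetical normalized schema: the contact relationship needs its own join table,
# and agent locations live in a separate lookup table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users    (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cities   (city_id INTEGER PRIMARY KEY, metro_area TEXT);
CREATE TABLE agents   (agent_id INTEGER PRIMARY KEY, name TEXT,
                       city_id INTEGER REFERENCES cities(city_id));
CREATE TABLE contacts (user_id INTEGER REFERENCES users(user_id),
                       agent_id INTEGER REFERENCES agents(agent_id));

INSERT INTO users    VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
INSERT INTO cities   VALUES (100, 'New York City'), (200, 'Chicago');
INSERT INTO agents   VALUES (10, 'Agent Diaz', 100), (20, 'Agent Ellis', 200);
INSERT INTO contacts VALUES (1, 10), (2, 20), (3, 10);
""")

# Three joins to answer: which users contacted an agent based in the NYC area?
nyc_users = [row[0] for row in conn.execute("""
    SELECT DISTINCT u.name
    FROM users AS u
    JOIN contacts AS c ON c.user_id  = u.user_id
    JOIN agents   AS a ON a.agent_id = c.agent_id
    JOIN cities   AS l ON l.city_id  = a.city_id
    WHERE l.metro_area = 'New York City'
    ORDER BY u.name
""")]
print(nyc_users)
```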

In a graph database, on the other hand, this would involve finding the agents who live in the NYC area and then traversing the graph in reverse along the edges between government agent and user to collect each user’s information. The difference in time complexity between the two searches can be orders of magnitude, and when you are dealing with petabytes of data, this is not something to be taken lightly.¹

[Figure: graph database]
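The graph-side traversal can be sketched in plain Python with the same kind of hypothetical data expressed as triples. Indexing each node’s incoming edges lets the search start at the New York City node and walk backward, agent by agent, touching only the edges attached to nodes it has already reached instead of joining whole tables. This illustrates the access pattern only; it is not a benchmark, and a real graph database manages this storage and traversal itself.

```python
from collections import defaultdict

# Hypothetical graph data: (subject, predicate, object) triples.
edges = [
    ("Ana", "contacted", "Agent Diaz"),
    ("Ben", "contacted", "Agent Ellis"),
    ("Cy", "contacted", "Agent Diaz"),
    ("Agent Diaz", "livesIn", "New York City"),
    ("Agent Ellis", "livesIn", "Chicago"),
]

# For every node, record its incoming edges grouped by label so we can walk backward.
incoming = defaultdict(lambda: defaultdict(list))
for subject, predicate, obj in edges:
    incoming[obj][predicate].append(subject)

def users_who_contacted_agents_in(area: str) -> list:
    """Start at the area node and traverse edges in reverse: area <- agent <- user."""
    agents = incoming[area]["livesIn"]
    users = {user for agent in agents for user in incoming[agent]["contacted"]}
    return sorted(users)

print(users_who_contacted_agents_in("New York City"))
```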

Other Graph Database Advantages:

Let’s consider a few more big advantages to using graph databases over relational databases:

  • Simultaneously Update and Query Data: graph databases offer the ability to update data in real time while queries run against the same data (even big players in the big data space, such as Hadoop HDFS, struggle to offer such a service).
  • Dynamic Schema: unlike relational databases, which require data to conform to a fixed schema assigned when a table is created, graph databases allow you to add or remove nodes, edges, and properties at any point during the data’s lifecycle.
  • Incorporating AI / ML Models: graph databases expose relationship information (connections, paths, neighborhoods) that adds depth and meaning to AI models and is much harder to extract from relational databases.⁴
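Tying the Dynamic Schema point back to GovBook’s new Date of Birth field, here is a plain-Python sketch of a property graph as a mapping from each node to its properties. It is hypothetical and not any database’s API, but it shows the idea: the new property is attached only where it is known, and earlier users need no migration or back-filled empty values.

```python
from collections import defaultdict

# A tiny in-memory property graph: node -> {property: value}. Hypothetical, not a real DB API.
graph = defaultdict(dict)
graph["Jennetta George"].update({"hasUserName": "JennettaGeorge1990", "hasGender": "Female"})
graph["Another User"].update({"hasUserName": "AnotherUser2020"})

# New requirement: capture Date of Birth. No migration; attach it only where it is known.
graph["Another User"]["hasDateOfBirth"] = "2000-01-01"

# Users registered earlier simply lack the property; nothing has to be back-filled.
for node, props in sorted(graph.items()):
    print(node, props.get("hasDateOfBirth", "(not recorded)"))
```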

 

Graph Database Drawbacks

Of course, no technology is without its flaws, so let’s explore some potential drawbacks of using graph databases.

  • Allotting time for education: even though graph databases have been gaining popularity since the early 2000s, only a small pool of technologists is comfortable working with them, let alone expert in them. When a company decides to migrate to a graph database, it is also deciding to invest considerable staff hours in training.
  • The data is only as rich as you make it: simply migrating your RDBMS data to a graph database will not magically make it a richer dataset. The data engineer needs to architect a new schema, usually from scratch, and coerce the data to fit this new mold. This, again, takes a considerable allotment of time and resources. I have worked at several companies where I was part of the team migrating data from relational to graph databases, and each company allotted a minimum of one year to complete the project.
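A minimal sketch of that coercion step, assuming a hypothetical user_registration table in SQLite: each row becomes a node identifier plus triples. The COLUMN_TO_PREDICATE mapping is where the real design work lives, since it decides which columns become which predicates, and the primary key is turned into a node identifier rather than copied as data. The mapping and the user:<id> naming scheme are illustrative assumptions.

```python
from __future__ import annotations

import sqlite3

# Hypothetical source table in the existing RDBMS.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be read by column name
conn.execute(
    "CREATE TABLE user_registration "
    "(user_id INTEGER PRIMARY KEY, name TEXT, email TEXT, gender TEXT)"
)
conn.execute(
    "INSERT INTO user_registration VALUES "
    "(1, 'Jennetta George', 'jennettageorge1990@email.com', 'Female')"
)

# The design decision: which columns become which predicates (hypothetical mapping).
COLUMN_TO_PREDICATE = {"name": "hasName", "email": "hasEmail", "gender": "hasGender"}

def row_to_triples(row: sqlite3.Row) -> list[tuple[str, str, str]]:
    """Turn one relational row into triples; the primary key becomes the node id."""
    subject = f"user:{row['user_id']}"
    return [
        (subject, predicate, row[column])
        for column, predicate in COLUMN_TO_PREDICATE.items()
        if row[column] is not None  # missing values become absent edges, not NULLs
    ]

triples = [
    triple
    for row in conn.execute("SELECT * FROM user_registration")
    for triple in row_to_triples(row)
]
print(triples)
```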

Impact on Public Sector Data

Public and private sector data management and analysis have a lot of overlapping needs: both aim to cut costs while improving operational efficiency and adding business value from the story their data tells. While government does not face the same pressure to turn a profit, it is up against hefty requirements to keep its data and information processes private, transparent, and trustworthy. Government organizations use data analysis toward two lofty goals: protecting the safety and wellbeing of citizens by upholding government regulations, and enhancing quality of life and optimizing processes for government employees and citizens. With huge amounts of data to sift through, the answers to both problems lie somewhere in the data, and surfacing the existing relationships within it can speed up finding them.

Another huge takeaway from graph databases is the removal of redundant data, which alone could save an organization countless hours and dollars in storage and data normalization. Instead of storing primary key/foreign key (PK/FK) columns in multiple tables throughout a database, a graph database eliminates the need for them through index-free adjacency, which is exactly what it sounds like: instead of relying on index keys to link pieces of data, each node is connected directly to its neighbors through edges. This allows data administrators to create intricate graph schemas that remain intuitive and visually simple.³
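The contrast can be sketched in plain Python. In the relational half, following a document to its owner means resolving a foreign-key value through a lookup structure (a dict stands in for an index here); in the index-free half, the document node holds a direct reference to the user node. All names are hypothetical, and the sketch shows the concept rather than how any particular database lays out storage or how fast either approach is.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Relational style: the link is a foreign-key value that must be resolved via a lookup
# (an index, modeled here as a dict) every time the relationship is followed.
documents_by_id = {10: {"pdf": "doc10.pdf", "owner_id": 1}}
users_by_id = {1: {"name": "Jennetta George"}}
owner = users_by_id[documents_by_id[10]["owner_id"]]  # key -> index lookup -> row

# Index-free adjacency style: each node holds direct references to its neighbors,
# so following an edge is a pointer dereference, not a search.
@dataclass(eq=False)
class Node:
    name: str
    neighbors: dict[str, list[Node]] = field(default_factory=dict, repr=False)

    def link(self, label: str, other: Node) -> None:
        """Add a directed, labeled edge from this node to `other`."""
        self.neighbors.setdefault(label, []).append(other)

user = Node("Jennetta George")
doc = Node("doc10.pdf")
doc.link("uploadedBy", user)

print(owner["name"], "|", doc.neighbors["uploadedBy"][0].name)
```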

Government Use Cases

Neo4j has compiled a list of use cases from various organizations describing how its technology has enhanced their scientists’ workflows. Here are a few specific to public sector organizations:

U.S. Army — Supply Chain, Bill of Materials and Maintenance Cost Management

The U.S. Army is the largest branch of our Armed Forces, with over one million soldiers and 200,000 civilian staff, each relying on various pieces of equipment that need consistent maintenance and replacement. Traditionally, the Army used a mainframe-based system to keep tabs on all supply chain management, but it is easy to see why this outdated system was not a great fit for such a dynamic, expansive dataset.

Switching to a graph database allowed the Army to implement models and queries for:

  • forecasting replacement, maintenance, and mean time to failure based on environmental factors
  • multi-dimensional cost comparison and trend analysis
  • analysis for the Army’s logistics and budget requirements process
  • exploratory data analysis for vital questions surrounding the deployment of forces to new war zones
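The text does not say how the Army’s models work, so the following is only a generic sketch of one query that sits behind bill-of-materials cost management: rolling up an assembly’s cost by walking the graph of parts and sub-parts. The BOM structure, part names, quantities, and unit costs are all hypothetical.

```python
from functools import lru_cache

# Hypothetical bill of materials as a graph: assembly -> [(component, quantity)].
BOM = {
    "vehicle": [("engine", 1), ("wheel", 4)],
    "engine": [("filter", 2), ("pump", 1)],
    "wheel": [("tire", 1)],
}
# Hypothetical unit costs for leaf parts (parts with no sub-components).
UNIT_COST = {"filter": 40.0, "pump": 300.0, "tire": 150.0}

@lru_cache(maxsize=None)  # shared sub-assemblies are costed once
def rolled_up_cost(part: str) -> float:
    """Total cost of a part: its unit cost if it is a leaf, otherwise the
    quantity-weighted sum of its components, found by walking the graph."""
    if part not in BOM:
        return UNIT_COST[part]
    return sum(qty * rolled_up_cost(child) for child, qty in BOM[part])

print(rolled_up_cost("vehicle"))
```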


MITRE — Fighting and Tracing Cybersecurity Threats

MITRE is a nonprofit, federally funded organization that uses graph technology to combat cyberwarfare in the U.S. With so many pieces of data available, finding the connections between them is what ultimately leads to the pattern recognition needed to detect and stop cyber attacks. MITRE has done just that by creating a tool called CyGraph that fuses disparate data into a consolidated picture, exposing vulnerabilities and threats.⁵

How I utilize Graph Technology at Attain

[Figure: Continuous Intelligence Platform being designed by my team at Attain]

As a lead data scientist at a government consulting firm, I am constantly looking for ways to improve data management that yield long-lasting gains in predictive analytics. Together with other tech leaders in the company, I am working on a Continuous Intelligence Platform that incorporates graph databases to fuse subject matter expertise and data insights into wisdom. It began with the idea of helping both field and desk analysts automate their data analysis and build sophisticated pattern-recognition and data-retrieval systems to help combat threats against the U.S. By leveraging cutting-edge CI/CD and ML microservices with our graph database, we can save the government countless work hours and dollars while providing a service that goes above and beyond what an analyst could perform on their own.

About the Author


Jennetta George is a lifelong mathematician and entrepreneur turned data scientist. She currently serves as an AI/ML Specialist Leader and head of the AI Center of Excellence at Attain, where she provides thought leadership in AI engineering, business intelligence, and data science project management across Attain and various public and private sector data-driven communities.

References

 

Note: This blog post originally appeared on Towards Data Science.
