Don’t REST on Your Laurels

Introduction and Overview of GraphQL

As research for one of my master’s program classes, I am exploring GraphQL. GraphQL is a query language and runtime for APIs, designed to overcome some of the shortcomings of traditional RESTful APIs. To fully grasp the value of GraphQL, one must first understand what traditional APIs do and how they work, what their limitations and challenges are, and what is gained by addressing them. In this post, I will briefly discuss the value and limitations of RESTful APIs, describe how GraphQL mitigates the challenges, and discuss at a high level how one might implement GraphQL to make access to a back-end database resource easier and more efficient.

Notes: 1) This is a first stab, and I have not deployed anything I am writing about here yet. As such, and as with all my writing that follows my “learning out loud” philosophy, there are likely errors. Subsequent writings will probably be more technically precise, but this should be at least directionally accurate. 2) I assume the reader understands basic database concepts and can infer meaning from simple SQL code. If not, this may not be the best blog post for you to start with.

Hopefully I end up a little closer to reality than this… Corrections welcome!

Overview of RESTful APIs

Building interactive applications that work on the web and on mobile devices relies on a concept known as the Application Programming Interface, or API. At its core, an API is an endpoint connected to a back-end data source — usually a database of some kind — that is programmed to respond to a specific request with specific information pulled from the data source.

For example, let’s assume that we have a database table filled with census data separated at the county level that provides the FIPS ID, county name, state, population, population density, economic output, and a free-form notes column that can contain anything else of interest. If you have direct access to the database, you might write a SQL statement to show the entire contents of the table as follows:

SELECT * FROM censusinfo.table

Or perhaps you just want information about a certain county. Since it’s possible for county names to match, you might specify the FIPS ID as a delineating field:

SELECT * FROM censusinfo.table WHERE FIPS_ID = 'XXXXXXX'

Furthermore, you might not want all of the data returned for the county, but rather just the FIPS ID, county name, state, and population, in which case you could structure your query as follows:

SELECT FIPS_ID, county_name, state, population FROM censusinfo.table WHERE FIPS_ID = 'XXXXXXX'

And so forth. This is helpful to know, assuming you have access to the database via a SQL interface. However, for most web and mobile applications, providing direct access to the database and the ability to run arbitrary queries is not a desirable design pattern. What you want to do instead is provide a controlled way to interface with the data. RESTful APIs do that.
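Before moving on to APIs, the three queries above can be tried locally. Here is a minimal sketch using Python's built-in sqlite3 module. The table name census_county, the reduced column list, and both rows are assumptions made up for illustration; they are placeholders, not real census data.

```python
import sqlite3

# Hypothetical stand-in for the post's census table; every row is placeholder data.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE census_county (
           FIPS_ID TEXT PRIMARY KEY,
           county_name TEXT,
           state TEXT,
           population INTEGER,
           notes TEXT
       )"""
)
conn.executemany(
    "INSERT INTO census_county VALUES (?, ?, ?, ?, ?)",
    [
        ("00001", "Example County", "MO", 1000, "placeholder row"),
        ("00002", "Sample County", "KS", 2000, "placeholder row"),
    ],
)

# 1) Everything in the table.
all_rows = conn.execute("SELECT * FROM census_county").fetchall()

# 2) One county, looked up by FIPS ID (bound parameter instead of a quoted literal).
one_county = conn.execute(
    "SELECT * FROM census_county WHERE FIPS_ID = ?", ("00001",)
).fetchall()

# 3) Only a few columns for that same county.
few_columns = conn.execute(
    "SELECT FIPS_ID, county_name, state, population "
    "FROM census_county WHERE FIPS_ID = ?",
    ("00001",),
).fetchall()

print(len(all_rows), one_county, few_columns)
```

Using a bound parameter (the ? placeholder) rather than pasting the FIPS ID into the SQL string is a design choice that also avoids quoting mistakes.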

To deploy a RESTful API, you have to set up an API management service, the technical details of which go beyond what I will cover in this post. This service is available over a network (the Internet, for example) and responds to properly formatted requests for information. To define what “properly formatted” means and what information is provided in return, you establish an API “endpoint” within the API management service. For example, you might want users to be able to get a complete list of FIPS IDs plus county names and states from the database. You would define a GET request, which might look something like this:

GET https://myapiendpointservice.com/censusinfo.table/countyinfo

You would then configure the back-end service to respond to this request with the results of a specific SQL command:

SELECT FIPS_ID, county_name, state FROM censusinfo.table

By using this API endpoint in this manner, you have just provided a way to get the data needed without providing direct access to the SQL interface, making this a useful tool for dynamic web and mobile applications.
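To make the endpoint-to-query wiring concrete, here is a rough sketch in plain Python. It is not a real API management service: handle_get, the ENDPOINTS table, the census_county table name, and the sample rows are all hypothetical, and no network server is started. It only shows the idea that one path maps to one fixed SQL statement.

```python
import json
import sqlite3

# Placeholder database standing in for the post's census table (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE census_county "
    "(FIPS_ID TEXT, county_name TEXT, state TEXT, population INTEGER)"
)
conn.executemany(
    "INSERT INTO census_county VALUES (?, ?, ?, ?)",
    [
        ("00001", "Example County", "MO", 1000),
        ("00002", "Sample County", "KS", 2000),
    ],
)

# Each endpoint path is wired to exactly one fixed SQL statement.
ENDPOINTS = {
    "/countyinfo": (
        "SELECT FIPS_ID, county_name, state FROM census_county ORDER BY FIPS_ID"
    ),
}


def handle_get(path):
    """Return (status_code, json_body) for a GET request to `path`."""
    sql = ENDPOINTS.get(path)
    if sql is None:
        return 404, json.dumps({"error": "unknown endpoint"})
    cursor = conn.execute(sql)
    columns = [col[0] for col in cursor.description]
    rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
    return 200, json.dumps(rows)


status, body = handle_get("/countyinfo")
print(status, body)
```

The caller never sees or supplies SQL; it can only pick from the paths the service owner chose to expose.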

Limitations of RESTful APIs

One big problem with this approach is oversharing (often called over-fetching). Say, for example, all you really needed for your application was the FIPS_ID field data, and you weren’t going to use the county_name or state information. Unless a specific endpoint was set up that perfectly matched your data needs, you would have no choice but to use the endpoint that provided too much information and simply ignore the extra fields in the response. This is functional, but it can cause performance issues because the application has to receive and process more data than necessary. On mobile devices, the unneeded data being transferred still counts against users’ data plans, potentially resulting in increased costs or service reductions.
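As a rough illustration of the cost, the sketch below compares the size of a fixed endpoint's response with the size of just the part the application wanted. The response content is made up, and real savings would depend on the actual data and transport.

```python
import json

# Made-up response from a fixed endpoint that returns three fields per county.
endpoint_response = [
    {"FIPS_ID": "00001", "county_name": "Example County", "state": "MO"},
    {"FIPS_ID": "00002", "county_name": "Sample County", "state": "KS"},
]

# What the application actually needs: the FIPS IDs only.
needed = [{"FIPS_ID": row["FIPS_ID"]} for row in endpoint_response]

# Compare the serialized sizes to see how much of the transfer was unused.
bytes_received = len(json.dumps(endpoint_response).encode("utf-8"))
bytes_needed = len(json.dumps(needed).encode("utf-8"))
wasted = bytes_received - bytes_needed

print(f"received {bytes_received} bytes, needed {bytes_needed}, wasted {wasted}")
```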

Another big problem is undersharing (often called under-fetching). Say the RESTful APIs were set up such that you could only request one or two pieces of data at a time, and you really needed five or six of them. In order to gather all of the needed data, your application would have to issue multiple RESTful API calls. This means extra round trips between the app and the back-end data source, and multiple queries being run where a single SQL query might have sufficed, both of which hurt performance. It also places an additional burden on the app, which has to combine the separate responses before using the data.
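The same idea as a small simulation: assume a hypothetical API where each call returns a single field for one county. The function names, the record, and the call log are invented for this sketch; the point is only to count how many calls the client needs and to show the merging work it takes on.

```python
# Made-up per-county record; each simulated endpoint exposes only one field.
RECORDS = {
    "00001": {"county_name": "Example County", "state": "MO", "population": 1000},
}

# Every simulated REST call is recorded here so the calls can be counted.
call_log = []


def get_field(fips_id, field):
    """Simulate one REST call that returns a single field for one county."""
    call_log.append((fips_id, field))
    return {field: RECORDS[fips_id][field]}


def get_county(fips_id, fields):
    """Assemble a record the way a client must: one call per field, then merge."""
    merged = {"FIPS_ID": fips_id}
    for field in fields:
        merged.update(get_field(fips_id, field))
    return merged


county = get_county("00001", ["county_name", "state", "population"])
print(len(call_log), county)
```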

These limitations and challenges, plus the inherent inflexibility of RESTful APIs, result in systems that grow increasingly and unnecessarily complex over time, become hard to manage, and leave users and developers with substandard performance and experiences.

Overview of GraphQL

GraphQL was designed to address these challenges with RESTful APIs. The basic setup looks similar: first, you establish GraphQL as your API service, which then listens for requests. Then, you define what properly formatted requests might look like. However, this is where GraphQL and RESTful APIs diverge in a significant way. Rather than defining a one-to-one relationship between a formatted request and the query that will be executed in response, you describe your data in a schema written in the GraphQL Schema Definition Language (SDL). The schema declares the types and fields that are available from your back-end data source and the relationships between them, as appropriate for the kinds of information you want to be able to supply.

For example, you might specify that a state can contain many counties, each identified by a FIPS ID. By doing so, you enable GraphQL to respond to a query that wants all the FIPS IDs for a given state without having to create a dedicated endpoint for that particular request, as you typically would with RESTful APIs. You might also establish that there is a 1-to-1 relationship between a FIPS ID and a county name. This means that GraphQL would be able to return county names alongside FIPS IDs, if requested.

Sample GraphQL Implementation

One way to set up the scenario above might be as follows:

type Query {
  state(name: String!): State
}

type State {
  name: String!
  counties: [County!]!
}

type County {
  FIPS_ID: String!
  county_name: String!
}

Here, the Query type is the entry point that lets a client look up a state by name, a State holds a list of County objects, and each County pairs one FIPS ID with one county name.

Once the data relationships are established, you can submit a GraphQL query that exploits them and gathers multiple pieces of information in a flexible way, all while mitigating the oversharing issue because the response contains only the fields you requested. For example, you might structure a GraphQL API request (which is actually now a full-blown query rather than a static request) something like this:

{
  state(name: "MO") {
    counties {
      FIPS_ID
      county_name
    }
  }
}

GraphQL does not generate SQL on its own; the server-side code behind each field (the resolvers) decides how to fetch the data. If the resolvers are configured accordingly, the server could answer this request with the results of the following SQL query against the database:

SELECT state, FIPS_ID, county_name FROM censusinfo.table WHERE state = 'MO'

However, assume the application only wanted the FIPS IDs returned and **not** the county names. With RESTful APIs, you would typically have to establish another endpoint to provide the precise data. With GraphQL, all you need to do is modify your query as follows:

{
  state(name: "MO") {
    counties {
      FIPS_ID
    }
  }
}

And the resulting SQL query could be:

SELECT state, FIPS_ID FROM censusinfo.table WHERE state = 'MO'
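One way to picture how a different field selection can lead to a different SQL statement is the sketch below. It is not a GraphQL server and uses no GraphQL library; build_query, ALLOWED_COLUMNS, and the census_county table name are hypothetical, and a real implementation would live in resolver code. It only mimics the step where the requested fields decide which columns get selected.

```python
# Columns the hypothetical schema exposes; anything else is rejected.
ALLOWED_COLUMNS = {"state", "FIPS_ID", "county_name", "population"}


def build_query(requested_fields, state_code):
    """Turn a list of requested field names into a parameterized SELECT."""
    unknown = [f for f in requested_fields if f not in ALLOWED_COLUMNS]
    if unknown:
        raise ValueError(f"fields not in schema: {unknown}")
    columns = ", ".join(requested_fields)
    # Only whitelisted column names reach the SQL text; the value is bound.
    sql = f"SELECT {columns} FROM census_county WHERE state = ?"
    return sql, (state_code,)


with_names = build_query(["state", "FIPS_ID", "county_name"], "MO")
ids_only = build_query(["state", "FIPS_ID"], "MO")
print(with_names)
print(ids_only)
```

Checking the requested names against a fixed set before building the statement is a design choice here: it keeps client input out of the SQL text while still letting the client choose the fields.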

Again, note here that GraphQL allows us to establish the relationships between the data fields in our table, and then exploit those relationships to generate a flexible query that returns all the information we want, but only the information we want. This represents a significant step forward in terms of data management and application flexibility, as well as long-term maintenance of the data sources and API endpoints themselves.

Data Science, Big Data, & Cloud nerd with a focus on healthcare & a passion for making complex topics easier to understand. All thoughts are mine & mine alone.