What is an API?
Difficulty: General Audience
API is an abbreviation for Application Programming Interface. The key word is interface and you’re actually using one right now. Your phone or computer is a black box of software and hardware which you know very little (or nothing) about. Yet, you’re controlling it within safe and secure limits through a user interface. APIs are no different in this respect; they’re just designed for interaction between multiple black boxes of software.
Before we dive into examples, we need to split this discussion into the two most common types of black boxes of software where API is used to describe their points of connection:
- an application designed to handle communication over a network such as the internet (Web API)
- a library that is included in the development of an application (Library API)
Say you’re creating a chat app and you want your app to import a user’s Facebook photos. You’ll need to use Facebook's Graph API to access their photos. Your app knows nothing about the software running on Facebook's servers, yet it's able to safely use their protected data and functionality thanks to Facebook's API.
An API is the collection of available connection points of a black box of software. It allows another black box of software to interact with the protected code and data behind the API. The two black boxes in the example were:
- a hypothetical chat app (could be iOS, Android, Mac, Windows, etc. — doesn’t make a difference),
- and Facebook’s server (the software exposed through their API).
This client-server architecture is one of the most prevalent design patterns in software.
Note that a network is often simulated within the same machine using loopback connections.
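As a rough illustration of a client and a server talking over a loopback connection, here is a minimal sketch using only Python's standard library. The `/photos` path, the `PhotoHandler` class, and the sample file names are made up for this example and have nothing to do with Facebook's real API.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class PhotoHandler(BaseHTTPRequestHandler):
    """Server side: the only connection point it exposes is GET /photos."""

    def do_GET(self):
        if self.path == "/photos":
            body = json.dumps({"photos": ["beach.jpg", "dog.jpg"]}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_photos():
    """Client side: start the server on loopback, call its API, return the data."""
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), PhotoHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    connection = HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        connection.request("GET", "/photos")
        response = connection.getresponse()
        return json.loads(response.read().decode("utf-8"))
    finally:
        connection.close()
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_photos())
```

The client only knows the address and the path. How the server produces the list of photos stays hidden behind the interface.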
But black boxes of software don’t have to physically be on different machines communicating over a network. Let’s imagine you want to add a feature to your chat app where all numbers typed by users are converted to words; for example, 143 would become one hundred forty-three. You could implement this feature yourself, or you could find an open-source library that offers this functionality and include it in your app.
This very basic library would provide a function that takes a number as input and returns a string of characters as output. When that function executes, it executes code in the library. In this example, the library is the black box of software and the public function is its available connection point. That function is public because it’s meant for you to import into your application by referencing it. This is in contrast to the library’s private functions, which are only meant to be used within the library. The collection of a library’s public functions is its API.
Once a library's public function is called, execution of the application's code moves to the library until a result (possibly none) is returned to the calling function. This is what importing a public function looks like in Python:
    from path.to.code.my_module import my_function

    # Calling my_function from my_module
    another_result = my_function()
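To make the number-to-words example concrete, here is a minimal sketch of what such a library could look like. The names `number_to_words` and `_below_hundred` are hypothetical, and the sketch only handles whole numbers from 0 to 999. The leading underscore is the usual Python convention for marking something as private.

```python
# number_words.py: a hypothetical one-function library

_ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
         "eighty", "ninety"]


def _below_hundred(n):
    """Private helper: only meant to be used inside the library."""
    if n < 20:
        return _ONES[n]
    tens, ones = divmod(n, 10)
    return _TENS[tens] if ones == 0 else f"{_TENS[tens]}-{_ONES[ones]}"


def number_to_words(n):
    """Public function: the library's API. Supports 0 through 999."""
    if not isinstance(n, int) or not 0 <= n <= 999:
        raise ValueError("expected an integer from 0 to 999")
    hundreds, rest = divmod(n, 100)
    if hundreds == 0:
        return _below_hundred(rest)
    words = f"{_ONES[hundreds]} hundred"
    return words if rest == 0 else f"{words} {_below_hundred(rest)}"


print(number_to_words(143))  # one hundred forty-three
```

An application would import `number_to_words` and never touch `_below_hundred` or the word tables, which leaves the library free to change them later.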
Before looking closer, let’s reiterate that an API is most commonly used to refer to either:
- an application’s interface exposed over a network (like the internet)
- a library’s public functions
When working with a network API for the first time, three questions need to be answered:
- what type of access control does the API use?
- what type of media is the API serving?
- how is the media being served?
These three answers will greatly determine the amount of work required to build a client capable of reliably communicating with the server.
The topic of access control is huge and merits at least its own post. If I were to write one, it would look like this Nordic APIs article.
Media generally falls under three broad categories:
- textual data
- audio and video
- other (images, PDFs, executables, compressed archives, etc.)
In order to answer how the media is being served, we need to discuss API architectures and transfer protocols. These topics are outside the scope of this post, but a bird’s eye view is not. Let’s quickly run down what to expect from modern network APIs.
Some of the most common types of APIs for serving textual data over a network are:
- REST
- GraphQL
- Websockets
REST and Websockets are capable of serving any type of data, but they’re most commonly used for textual data. GraphQL is always textual. REST and GraphQL are architecture styles that typically use HTTP as their transport protocol, while Websocket is its own transport protocol.
FTP is another common transfer protocol. However, FTP gives someone direct access to a filesystem (direct access into the black box of software). API and FTP exist in different paradigms — APIs are the connecting points of a shell which guards its black box of software.
REST is the most mature of the three, and for this reason it’s the easiest to get started with. There are a lot of online resources around best practices and architectures using REST APIs.
REST is an architecture where each resource has a separate URL. Fetching, creating, updating, or deleting a resource of that type uses a different HTTP request method (typically GET, POST, PUT or PATCH, and DELETE) to specify the action.
You’ll notice that for newer services, the textual data for request and response payloads is JSON. Older services tend to use XML.
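The idea of one URL per resource plus a request method per action can be sketched as follows. The base URL, the `customers` resource, and the `build_request` helper are assumptions made for illustration. The requests are only built, never sent.

```python
import json
from urllib.request import Request

# Hypothetical base URL; no real service is implied.
BASE_URL = "https://api.example.com"

# Conventional mapping from action to HTTP request method.
METHODS = {"fetch": "GET", "create": "POST", "update": "PUT", "delete": "DELETE"}


def build_request(action, resource, resource_id=None, payload=None):
    """Build (but do not send) an HTTP request for one REST resource."""
    url = f"{BASE_URL}/{resource}"
    if resource_id is not None:
        url = f"{url}/{resource_id}"
    data = None
    headers = {"Accept": "application/json"}
    if payload is not None:
        # Request payloads are JSON text encoded as bytes.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(url, data=data, headers=headers, method=METHODS[action])


for action, resource_id, payload in [
    ("fetch", 42, None),
    ("create", None, {"name": "Ada"}),
    ("update", 42, {"name": "Ada Lovelace"}),
    ("delete", 42, None),
]:
    request = build_request(action, "customers", resource_id, payload)
    print(request.get_method(), request.full_url)
```

Notice that the URL for customer 42 stays the same across fetch, update, and delete. Only the method changes.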
GraphQL was only introduced in 2015, but the online resources surrounding it are already comprehensive. Whereas REST serves one resource per endpoint, GraphQL allows you to request a tree of related resources. For example, in REST, if you wanted information regarding a customer and their account, you would typically have to make two separate network requests: one for the customer information, and another for the account information. With GraphQL, you could ask for both in a single request.
Note: a customer and a customer’s account are two distinct entities because of the many-to-many relationship between them. One customer can have multiple accounts, while multiple customers may have access to one account.
GraphQL typically structures its textual data in JSON. Note: GraphQL is not to be confused with Facebook’s Graph API — they are distinct.
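Here is a sketch of the customer-and-accounts request described above. The schema (the `customer`, `name`, `accounts`, and `balance` fields) is invented for this example, and the response is hand-written sample data rather than output from a real server. The JSON body with `query` and `variables` keys follows the common convention for sending GraphQL over HTTP.

```python
import json

# Hypothetical schema: every field name below is illustrative.
QUERY = """
query CustomerWithAccounts($id: ID!) {
  customer(id: $id) {
    name
    accounts {
      id
      balance
    }
  }
}
"""


def build_graphql_body(customer_id):
    """Return the JSON body a client would POST to a GraphQL endpoint."""
    return json.dumps({"query": QUERY, "variables": {"id": customer_id}})


# The response mirrors the shape of the query (sample data, not real output).
sample_response = json.loads(
    '{"data": {"customer": {"name": "Ada", '
    '"accounts": [{"id": "a1", "balance": 120.5}, {"id": "a2", "balance": 7.0}]}}}'
)
customer = sample_response["data"]["customer"]
print(customer["name"], [account["id"] for account in customer["accounts"]])
```

The customer and all of their accounts come back together, so the client makes one round trip instead of two.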
Websockets enable bi-directional transfer between a browser and a server. One important part of this is that it allows servers to push data to clients without a prior request (other than the one that established the long-running websocket connection).
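The one request that establishes the connection is an ordinary HTTP request asking to upgrade to the Websocket protocol. As a small sketch of that handshake, the server proves it understood the upgrade by answering the client's `Sec-WebSocket-Key` header with a derived `Sec-WebSocket-Accept` value. The function name `accept_value` is my own, while the fixed GUID and the sample key come from the protocol specification (RFC 6455).

```python
import base64
import hashlib

# Fixed GUID defined by the WebSocket protocol (RFC 6455).
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_value(sec_websocket_key):
    """Compute the Sec-WebSocket-Accept header for a client's key."""
    combined = (sec_websocket_key + WEBSOCKET_GUID).encode("ascii")
    digest = hashlib.sha1(combined).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key taken from the specification's own example.
print(accept_value("dGhlIHNhbXBsZSBub25jZQ=="))
```

In practice a Websocket library handles this step for you. After the handshake, either side can send messages at any time.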
An alternative to websockets that runs over HTTP is server-sent events, but since it wasn’t supported by Internet Explorer or the original (pre-Chromium) Microsoft Edge, it’s not as commonly used. Visit this excellent deep-dive to learn more about both.
Any data can be sent over a network as a file, and audio/video data is no exception. However, audio and video are more often sent over networks using streaming protocols. The data is sent in small chunks from a server through its API, and that data is reconstructed for playback by the client. This allows playback to occur while the data is being transferred. The length of time between when the first packet of data is sent and when playback can begin is the latency. The latency of the transfer protocol determines the possible use cases.
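The chunking idea can be sketched without any real streaming protocol. In this simplified model, the function names, the chunk size, and the buffer threshold are all assumptions. Playback "starts" once enough bytes are buffered, and the number of chunks needed before that point stands in for latency.

```python
def stream_chunks(media, chunk_size):
    """Server side: yield the media a small chunk at a time."""
    for start in range(0, len(media), chunk_size):
        yield media[start:start + chunk_size]


def play_while_receiving(chunks, buffer_needed):
    """Client side: rebuild the media and note when playback could begin."""
    received = bytearray()
    started_after = None  # number of chunks received before playback began
    for count, chunk in enumerate(chunks, start=1):
        received.extend(chunk)
        if started_after is None and len(received) >= buffer_needed:
            started_after = count
    return bytes(received), started_after


media = bytes(range(256)) * 4  # 1024 bytes standing in for audio/video data
rebuilt, started_after = play_while_receiving(stream_chunks(media, 64), buffer_needed=128)
print(rebuilt == media, started_after)  # True 2
```

Playback can begin after 2 of the 16 chunks, long before the whole file has arrived. A protocol that needs a larger buffer before starting would have higher latency.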
Streaming APIs may use one of the following common streaming protocols:
- MPEG-DASH (high latency, okay for video streaming)
- Apple HLS (okay for video streaming)
- RTMP (okay for live streaming)
- RTSP/RTP (good for live streaming)
- WebRTC (very low latency, good for real-time chat)
Note: The number of factors that affect latency is tremendous, and any protocol can be tweaked in many ways. These figures are from Googling, except for WebRTC (personal experience).
Building a client that interfaces with a streaming API is typically labor-intensive compared to textual APIs. Streaming protocols are typically created with a limited scope to allow for flexibility. Due to their limited scope, a lot of plumbing is required to facilitate the actual stream. For example, WebRTC is strictly a video/audio transport protocol, so information to facilitate the stream must be sent using another transport protocol, such as Websocket (this mechanism is called signaling).
Images and other static resources are typically served through a separate URL per resource, which maps to a location on a filesystem. However, going back to the example of our chat app importing images using Facebook’s Graph API: it actually returns a redirect to the image, while the actual URL can optionally be returned as a string of characters.
Libraries typically consist of code written in the same language that’s importing (including) the library. Each language typically has a package manager to help manage the third-party libraries of an application. Package managers also help manage the dependencies of third-party libraries. Third-party libraries often depend on other third-party libraries, creating a web of dependencies.
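One job a package manager does with that web of dependencies is work out an order in which everything can be installed. Here is a minimal sketch of that idea using Python's standard `graphlib` module (Python 3.9+). The package names are hypothetical, and real package managers also deal with versions and conflicts, which this ignores.

```python
from graphlib import TopologicalSorter

# Hypothetical packages: each key depends on the packages in its set.
dependencies = {
    "chat-app": {"number-words", "http-client"},
    "http-client": {"tls", "url-parser"},
    "number-words": set(),
    "tls": set(),
    "url-parser": set(),
}


def install_order(graph):
    """Return an order in which every package comes after its dependencies."""
    return list(TopologicalSorter(graph).static_order())


order = install_order(dependencies)
print(order)
```

Whatever exact order is printed, `chat-app` comes last because it depends, directly or indirectly, on everything else.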
In the example, we used a very simple library with one public function. Libraries can be much more complex, encompassing many related features supported by multiple public functions acting as the library’s API.
Two tools are critical for exploring open-source libraries:
- github.com (most popular host of open-source libraries)
- package manager for your platform (language)
Here’s a list of the most common package managers for a few languages:
Package managers help manage third-party libraries of an application. Applications also have access to the native libraries that are included with the programming language. When using the API (public functions) of a native library, you never need to install anything, and in some cases you don’t even need to import anything, because the interpreter or compiler for your language already has references to the native libraries.
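Python is one example of how this plays out. Built-in functions such as `sorted` and `max` are available with no import at all, while standard-library modules such as `json` still need an import line but nothing has to be installed. The sample package names below are made up.

```python
import json  # standard library: imported, but nothing to install

# Built-in functions are available without any import.
names = sorted(["tls", "http-client", "chat-app"])
longest = max(names, key=len)

# A standard-library module still exposes an API of public functions.
encoded = json.dumps({"longest": longest})
print(names, encoded)
```

Other languages draw the line between built-in and importable in different places, so treat this as one illustration rather than a rule.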
Software is a jungle of black boxes connected together by interfaces, but not all interfaces are APIs. The term API is not only dependent on context but also subjective. However, libraries and web applications are the most common black boxes of software whose points of connection are referred to as APIs.
This post touched on a lot of topics. Google is your best friend to learn more about anything, but questions and comments are welcome. To read more about interfaces, visit: What is an Interface?