Coding in Python to Use AWS ElastiCache

Cloud Architects need to wear many hats depending on who needs what and when. Especially when a new technology or service is added to the mix, they need to make sure not only that the new service is provisioned and configured correctly, but also that the developers and teams can use it in the most effective way. The same thing happened at one of my workplaces when ElastiCache was introduced. It was an AWS environment with Python as the default language. To guide teams to start using ElastiCache quickly, I had to do a POC project to verify the service was provisioned properly and show developers how to use it in code. If I were asked today, here is how I would go about it.

We’ll start with a small script to verify connectivity with ElastiCache. Let’s name it ‘demo_elasticache.py’.

  • AWS ElastiCache supports two caching engines, Redis and Memcached. We had Redis (the more popular of the two), so in our code we decided to use the Python module named redis. Please install it first in your Python environment (by using "pip install redis").
  • You need correct values of these variables:

host = Endpoint of the ElastiCache cluster you have

port = 6379 (the default Redis port)

Now let's ping the cache using the Redis class. If it succeeds, it means we have connectivity to the cache.

Here is the code:

from redis import Redis

cache_endpoint = 'your.cache.server.endpoint.....'
service_port = 6379
redis = Redis(host=cache_endpoint, port=service_port, decode_responses=True)

try:
    cache_is_working = redis.ping()
    print('I am Redis. Try me. I can remember things, only for a short time though :)')
except Exception as e:
    print('EXCEPTION: host could not be accessed ---> ', repr(e))

You can run the script in Cloud9, Lambda, EC2 or any other compute service that supports Python. Let's run it on a Cloud9 terminal:

python demo_elasticache.py

If your inputs are correct and the cache is up, you should see the success message.

Let's enhance the program to show different ways to use the cache in apps. In a cache, data is stored in key-value pairs. We use the set() function to store data and get() to retrieve it. Here is the code:

import os
from redis import Redis

# cache_endpoint = 'your.cache.server.endpoint.....'
# An environment variable called CACHE_ENDPOINT holds the cache endpoint
cache_endpoint = os.environ["CACHE_ENDPOINT"]
service_port = 6379

# redis = Redis(host=cache_endpoint, port=service_port)
redis = Redis(host=cache_endpoint, port=service_port, decode_responses=True)

key = "sportsman"
value = "Pele"

# Store the key-value pair
redis.set(key, value)
# Retrieve the value for key='sportsman'
player_name = redis.get(key)

Yes, it is that simple. Notice two things here:

  1. We read the cache_endpoint from an environment variable, avoiding the bad practice of hardcoding it in code.
  2. There are two ways to define redis: the first without decode_responses=True. Try the first one (commenting out the second redis definition) and run the script. The first line of output shows:

player_name = b'Pele'

It means the value retrieved from the cache is of type bytes. If you need the value as a string, use the second definition, which adds the parameter decode_responses=True.
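To see what that bytes-vs-string difference means, here is a plain-Python sketch (no Redis server needed) of what decode_responses=True effectively does for you:

```python
# Without decode_responses=True, redis-py returns raw bytes
raw_value = b'Pele'
print(type(raw_value).__name__)      # bytes

# decode_responses=True is roughly equivalent to decoding each reply yourself
decoded_value = raw_value.decode('utf-8')
print(decoded_value)                 # Pele
print(type(decoded_value).__name__)  # str
```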

Often we should also specify the TTL behavior of data (how long the data should remain in cache). TTL is specified in seconds. Remember TTL is optional. If you do not specify it, as per the default cache configuration, your data may remain there forever.

# Store the key-value pair with a TTL (in seconds)
ttl = 120
redis.set(key, value, ex=ttl)
# Retrieve the TTL value -- how many more seconds this data will live
ttl_value = redis.ttl(key)
# Reset the TTL value to 90 seconds from now
redis.expire(key, 90)

You can also store many key-value pairs in one command. Let's see how to use a dictionary data structure. First we'll put all the key-value pairs into a dictionary and then use mset() to store them in cache.

# Put all key-value pairs to a dictionary at first
dict_data = {'sportsman':'Pele', 'sportswoman': 'Mia Hamm', 'robot':'R2D2'}
# Store the dictionary in cache
redis.mset(dict_data)

# Retrieve individual values with the usual get()
sportsman_listed = redis.get('sportsman')

What if we need to store data of some other data types? For example, we have a list named ‘african_countries’. We want to store it in cache. No problem. Just assign it to a key and store it using lpush(). To retrieve the list, use the function: lrange() with two additional arguments: ‘start’ and ‘end’. To get all values in the list, use 0 and -1. To get a partial list or some specific range of the list, use relevant ‘start’ and ‘end’ values.

# our data-type is 'list' 
african_countries = ['Nigeria', 'Egypt', 'Rwanda', 'Morocco', 'Egypt']

# Assign it to a key and use lpush() to store
redis.lpush('african_country_list', *african_countries)

# Retrieve the full list using lrange() with start=0 and end=-1
list_of_african_countries = redis.lrange('african_country_list', 0, -1)

# Retrieve a specific range of values from the list
list_of_african_countries = redis.lrange('african_country_list', 2, 3)

Notice how '*' was used before the variable name to unpack the list into individual arguments for lpush(). Also note that lpush() pushes each item onto the head of the list, so the items end up stored in reverse order; use rpush() if you want to preserve the original order.
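The '*' here is Python's argument-unpacking (splat) operator, not anything Redis-specific. A quick plain-Python illustration (the demo function is hypothetical, just to show the expansion):

```python
def demo(*args):
    # Collects whatever positional arguments were passed into a tuple
    return args

african_countries = ['Nigeria', 'Egypt', 'Rwanda']

# demo(*african_countries) expands the list into separate arguments,
# just like redis.lpush('key', *african_countries) does
print(demo(*african_countries))  # ('Nigeria', 'Egypt', 'Rwanda')
print(demo(african_countries))   # (['Nigeria', 'Egypt', 'Rwanda'],) -- one single list argument
```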

Lists can have duplicate values. Let's assume we want to store and retrieve the data as a 'set' (with no duplicates). Easy! Assign the list to a key and store it using sadd(). To retrieve it, use the function smembers().

# our data type is 'list' 
african_countries = ['Nigeria', 'Egypt', 'Rwanda', 'Morocco', 'Egypt']
# Assign it a key, use sadd() to convert it to a set and then store
redis.sadd('african_set', *african_countries)

# Retrieve the set using smembers() function
set_of_african_countries = redis.smembers('african_set')
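The deduplication that sadd() performs mirrors Python's built-in set type; you can see the effect without a server:

```python
african_countries = ['Nigeria', 'Egypt', 'Rwanda', 'Morocco', 'Egypt']

# Like a Redis set, a Python set keeps only unique members
unique_countries = set(african_countries)
print(len(african_countries))  # 5
print(len(unique_countries))   # 4 -- the duplicate 'Egypt' is gone
```

Note that smembers() likewise returns the members as a Python set, so ordering is not guaranteed.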

What about some housekeeping tasks? Yes, Redis provides a bunch of cache management functions like the following:

  • to see all the keys in cache — keys() — Beware! It may return thousands of keys or more depending on how heavily your cache is used.
  • to see keys that match some pattern –

keys(“africa*”) — shows the keys that start with the letters “africa”.

keys(“*africa”) — for keys that end with the letters “africa”.

keys(“*afri*”) — for those that contain the letters “afri” anywhere in the key.

  • Remember, keys are case sensitive. It means “africa” and “Africa” are two different keys.
  • to check if a key already exists — exists(key) — checks if the key exists in the cache.
  • to remove data from cache — delete(key) — removes the key-value pair from cache.
  • to remove keys that match a pattern: first collect the matching keys with keys(pattern), then pass them to delete() using the '*' operator. The code below shows the pattern and the delete syntax.

Here is how the code looks:

# get all keys available in the cache
redis.keys()

# get all keys that start with the letters "afri"
redis.keys("afri*")

# get all keys that end with the letters "rica"
redis.keys("*rica")

# get all keys that have the letters "afri" anywhere in them
redis.keys("*afri*")

# check if a specific key exists
redis.exists("Africa")

# delete a key from cache
redis.delete("african")

# delete keys that match the pattern "sports*"
keys_to_delete = redis.keys("sports*")
redis.delete(*keys_to_delete)

Remember, our objective was to show how redis can be used in various ways. In our samples, we ignored some general coding best practices. One of them is to use try-except blocks around calls to external systems. Here is a simple example:

# check if a specific key exists
try:
    redis.exists("Africa")
except Exception as e:
    print("EXCEPTION: ", str(e))
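You can take this a step further and wrap cache calls in a small helper, so a cache outage degrades gracefully instead of crashing the app. This is a sketch; safe_cache_get and DummyCache are hypothetical names, and client can be any object exposing get(), such as a redis.Redis instance:

```python
def safe_cache_get(client, key, default=None):
    """Return the cached value, or `default` if the cache is unreachable."""
    try:
        return client.get(key)
    except Exception as e:
        print("EXCEPTION: ", str(e))
        return default

# A tiny stand-in client, used here only to demonstrate the fallback path
class DummyCache:
    def get(self, key):
        raise ConnectionError("cache is down")

print(safe_cache_get(DummyCache(), "sportsman", default="unknown"))  # unknown
```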

You see, we have covered a pretty wide range of ElastiCache use cases. If you want to get the source code of all routines shown here, see this GitHub repo:

https://github.com/shahidcc/elasticache

SQLAlchemy

Setting up MetaData with Table objects

When we work with a relational database, the basic data-holding structure in the database which we query from is known as a table. In SQLAlchemy, the database “table” is ultimately represented by a Python object similarly named Table.

To start using the SQLAlchemy Expression Language, we will want to have Table objects constructed that represent all of the database tables we are interested in working with. The Table is constructed programmatically, either directly by using the Table constructor, or indirectly by using ORM Mapped classes (described later at Using ORM Declarative Forms to Define Table Metadata). There is also the option to load some or all table information from an existing database, called reflection.

Whichever kind of approach is used, we always start out with a collection that will be where we place our tables known as the MetaData object. This object is essentially a facade around a Python dictionary that stores a series of Table objects keyed to their string name. While the ORM provides some options on where to get this collection, we always have the option to simply make one directly, which looks like:

>>> from sqlalchemy import MetaData
>>> metadata_obj = MetaData()

Once we have a MetaData object, we can declare some Table objects. This tutorial will start with the classic SQLAlchemy tutorial model, which has a table called user_account that stores, for example, the users of a website, and a related table address, which stores email addresses associated with rows in the user_account table. When not using ORM Declarative models at all, we construct each Table object directly, typically assigning each to a variable that will be how we will refer to the table in application code:

>>> from sqlalchemy import Table, Column, Integer, String
>>> user_table = Table(
...     "user_account",
...     metadata_obj,
...     Column("id", Integer, primary_key=True),
...     Column("name", String(30)),
...     Column("fullname", String),
... )

With the above example, when we wish to write code that refers to the user_account table in the database, we will use the user_table Python variable to refer to it.

Components of Table

We can observe that the Table construct as written in Python has a resemblance to a SQL CREATE TABLE statement; starting with the table name, then listing out each column, where each column has a name and a datatype. The objects we use above are:

  • Table – represents a database table and assigns itself to a MetaData collection.
  • Column – represents a column in a database table, and assigns itself to a Table object. The Column usually includes a string name and a type object. The collection of Column objects in terms of the parent Table are typically accessed via an associative array located at Table.c:
>>> user_table.c.name
Column('name', String(length=30), table=<user_account>)

>>> user_table.c.keys()
['id', 'name', 'fullname']

Building Data Access Layers in Python

Imagine you're building a web application for a student organization. You need to store member information, upcoming events, and potentially even forum posts. Wouldn't it be a pain to write all the code to directly connect to and query the database for every single action?

That’s where Data Access Layers (DAL) come in! Think of a DAL as a specialized library specifically designed for interacting with databases. It acts as a middle layer between your application code and the actual database itself. Here’s what a DAL does for you:

  • Abstraction: You don’t have to worry about the specifics of the underlying database technology (like SQL or NoSQL). The DAL handles the communication details, allowing you to write code that focuses on what the data actually represents (members, events, etc.).
  • Efficiency: DALs can optimize database queries and data retrieval, making your application perform better. They might handle things like connection pooling (reusing connections instead of constantly creating new ones) to improve database access speed.
  • Reusability: You can create generic functions within the DAL to perform common database operations (like adding, updating, or deleting data). This saves you time and ensures consistency across your codebase.
  • Maintainability: If you ever need to switch database technologies, the impact on your application code is minimized because the DAL acts as a buffer. You only need to modify the DAL itself, not rewrite large portions of your application code.

Here’s an analogy: Think of your application code as a chef preparing a delicious meal. The ingredients (data) are stored in the pantry (database). The DAL acts as your sous-chef, efficiently retrieving the exact ingredients you need (specific data) without you having to rummage through the pantry yourself. This frees you up to focus on the core logic of your application (cooking the meal).

Data Access Layers (DAL) in Python:

1. Using Object-Relational Mappers (ORMs):

ORMs like SQLAlchemy simplify data access by mapping database tables to Python classes. Here’s an example using SQLAlchemy:

from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, sessionmaker  # declarative_base lives in sqlalchemy.orm as of SQLAlchemy 1.4

# Database connection
engine = create_engine('sqlite:///mydatabase.db')

# Define a base class for models
Base = declarative_base()

# Create a user model class
class User(Base):
  __tablename__ = 'users'

  id = Column(Integer, primary_key=True)
  name = Column(String)
  email = Column(String)

# Create database tables (if they don't exist)
Base.metadata.create_all(engine)

# Create a session for interacting with the database
Session = sessionmaker(bind=engine)
session = Session()

# Add a new user
new_user = User(name="Alice", email="alice@hotmail.com")
session.add(new_user)
session.commit()

# Get all users
users = session.query(User).all()
for user in users:
  print(user.name, user.email)

# Close the session
session.close() 

2. Using raw SQL queries:

For more control, you can write raw SQL queries. This requires managing database connections and parsing results:

import sqlite3

# Connect to the database
conn = sqlite3.connect('mydatabase.db')
cursor = conn.cursor()

# Insert a new user
cursor.execute("INSERT INTO users (name, email) VALUES (?, ?)", ("Bob", "bob@example.com"))
conn.commit()

# Get all users
cursor.execute("SELECT * FROM users")
users = cursor.fetchall()
for user in users:
  print(user[1], user[2])  # name and email (index 0 is the id column)

# Close the connection
conn.close() 

Choosing the right approach depends on your project’s needs. ORMs offer a higher level of abstraction and are generally recommended for complex data models. Raw SQL gives you more control but requires writing more code. Lightweight libraries (like peewee) provide a balance between the two.
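To make the DAL idea concrete, here is a minimal sketch using only the standard library's sqlite3. The UserDAL class and its method names are hypothetical, but the shape is the point: application code calls methods, while the DAL owns the connection and the SQL.

```python
import sqlite3

class UserDAL:
    """A minimal Data Access Layer around a users table."""

    def __init__(self, db_path=":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS users "
            "(id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
        )

    def add_user(self, name, email):
        # 'with' commits on success and rolls back on error
        with self.conn:
            cur = self.conn.execute(
                "INSERT INTO users (name, email) VALUES (?, ?)", (name, email)
            )
        return cur.lastrowid

    def get_users(self):
        return self.conn.execute("SELECT id, name, email FROM users").fetchall()

    def close(self):
        self.conn.close()

# Application code never touches SQL directly
dal = UserDAL()
dal.add_user("Alice", "alice@example.com")
print(dal.get_users())  # [(1, 'Alice', 'alice@example.com')]
dal.close()
```

If you later swapped SQLite for PostgreSQL, only the inside of UserDAL would change.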

Remember:

  • Always handle database connections and errors appropriately.
  • Use prepared statements to prevent SQL injection vulnerabilities.
  • Consider connection pooling for performance optimization in high-traffic applications.

As you embark on your database journey, Data Access Layers (DALs) will become an invaluable tool in your arsenal. They not only streamline database interactions but also lay the foundation for clean, maintainable, and scalable code. By abstracting away the complexities of different database technologies, DALs empower you to focus on the core logic of your application, allowing you to write efficient code.

Furthermore, the benefits of DALs extend beyond individual projects. As you progress to larger, more complex applications, DALs will become essential for managing intricate data models and ensuring consistency across your codebase. By adopting best practices in database interaction from the outset, you’ll be well-equipped to tackle even the most challenging data-driven projects in the future.

PostgreSQL JSON Functions

PostgreSQL jsonb_set() Function

Summary: in this tutorial, you will learn how to use the PostgreSQL jsonb_set() function to replace an existing value specified by a path with a new value in a JSON document.

Introduction to the PostgreSQL jsonb_set() function

The jsonb_set() function allows you to replace an existing value specified by a path with a new value in a JSON document of the JSONB type.

More specifically, the jsonb_set() function allows you to replace an array element or key/value in a JSON object, or nested combinations of them.

Here’s the syntax of the jsonb_set() function:

jsonb_set(
   target jsonb,
   path text[],
   new_value jsonb
   [, create_missing boolean]
)

In this syntax:

  • target: This is the original JSON document of the JSONB type that you want to modify.
  • path: This is an array of text elements that specifies the path to the key where you want to insert or update the data.
  • new_value is the new JSONB value that you want to set at the specified path.
  • create_missing: This is an optional boolean parameter indicating whether you want to create missing keys if they do not exist. It defaults to true, meaning that the function will create a new key if you attempt to set it for a key that does not exist.

The jsonb_set() function returns the modified JSON document with the new_value set at a specified path.

PostgreSQL jsonb_set() function examples

Let's explore some examples of using the PostgreSQL jsonb_set() function.

1) Updating an existing element in a JSON array

The following example uses the jsonb_set() function to update an existing element in a JSON array:

SELECT jsonb_set('[1,2,3]', '{0}', '-1');

Output:

 jsonb_set
------------
[-1, 2, 3]
(1 row)

In this example:

  • The original array is [1,2,3].
  • The path {0} indicates the first element of the array.
  • The number -1 is the new value.

The jsonb_set() function sets the first element of the array to -1 and returns the modified document.

To insert the number 4 after the 3rd element, you use a non-existing path to the 4th element as follows:

SELECT jsonb_set('[1,2,3]', '{4}', '4');

Output:

  jsonb_set
--------------
[1, 2, 3, 4]
(1 row)

2) Updating an element in a nested JSON array

The following example uses the jsonb_set() function to update an element in a nested array:

SELECT 
jsonb_set(
'[1,2,[4,5],6]', '{2,0}', '3'
);

Output:

     jsonb_set
-------------------
[1, 2, [3, 5], 6]
(1 row)

In this example:

  • The original array is [1,2,[4,5],6].
  • In the path {2,0}, 2 specifies the element at index 2 of the array, which is the nested array [4,5] (array paths are zero-based), and 0 specifies the first element of that nested array.
  • 3 is the new value.

Therefore the jsonb_set() function changes the number 4, the first element of the nested array [4,5], to 3.

3) Updating data in a JSON object

The following example uses the jsonb_set() to update the value of a key in a JSON object:

SELECT 
jsonb_set('{"name": "Jane Doe"}', '{name}', '"Jane Smith"');

Output:

       jsonb_set
------------------------
{"name": "Jane Smith"}
(1 row)

In this example:

  • {"name": "Jane Doe"} is the original object.
  • {name} is the path that indicates the name property (or key).
  • "Jane Smith" is the new value to update.

Therefore, jsonb_set() sets the value of the name key in the JSON object to "Jane Smith".

Note that if you attempt to set a key that does not exist, jsonb_set() will insert it by default. For example:

SELECT jsonb_set('{"name": "Jane Doe"}', '{age}', '25');

Output:

            jsonb_set
---------------------------------
{"age": 25, "name": "Jane Doe"}
(1 row)

But if you set the create_missing parameter to false, the function will not insert a new key/value pair:

SELECT 
jsonb_set(
'{"name": "Jane Doe"}', '{age}',
'25',
false
);

Output:

      jsonb_set
----------------------
{"name": "Jane Doe"}
(1 row)

4) Updating a value in a nested JSON object

The following example uses the jsonb_set() to modify a key/value pair in a nested JSON object:

SELECT 
jsonb_set(
'{"name":"John Doe", "address" : { "city": "San Francisco"}}',
'{address,city}', '"San Jose"'
);

Output:

                       jsonb_set
-------------------------------------------------------
{"name": "John Doe", "address": {"city": "San Jose"}}
(1 row)

In this example:

  • {"name":"John Doe", "address" : { "city": "San Francisco"}} is the original JSON object.
  • {address, city} is a path that specifies the address key whose value is an object and the city is the key of the address object that will be modified.
  • "San Jose" is the value of the city key.

Therefore, the jsonb_set() function updates the city with the value San Jose in the address object of the JSON document.

5) Updating an element in an array of a nested object

The following example uses the jsonb_set() to update an element in an array of a nested object:

SELECT 
jsonb_set(
'{"name": "John", "skills" : ["PostgreSQL", "API"]}',
'{skills,1}',
'"Web Dev"'
);

Output:

                       jsonb_set
-------------------------------------------------------
{"name": "John", "skills": ["PostgreSQL", "Web Dev"]}
(1 row)

In this example:

  • {"name": "John", "skills" : ["PostgreSQL", "API"]} is the original JSON object.
  • {skills,1} is a path that specifies the skills key, which is an array, and 1 specifies the second element of the array.
  • "Web Dev" is the new value to update.

The jsonb_set() function sets the second element of the skills array to "Web Dev".

6) Using the PostgreSQL jsonb_set() function with table data

We’ll show you how to use the jsonb_set() function to insert a new value into a JSON document and update it back to a table.

First, create a new table called employee_skills:

CREATE TABLE employee_skills(
id INT PRIMARY KEY,
data JSONB
);

Second, insert rows into the employee_skills table:

INSERT INTO employee_skills(id, data)
VALUES
(1, '{"name": "John", "skills" : ["PostgreSQL", "API"]}'),
(2, '{"name": "Jane", "skills" : ["SQL","Java"]}')
RETURNING *;

Output:

 id |                       data
----+---------------------------------------------------
1 | {"name": "John", "skills": ["PostgreSQL", "API"]}
2 | {"name": "Jane", "skills": ["SQL", "Java"]}
(2 rows)

Third, replace the first skill in the skills array of the employee id 1 with the new skill "Web Dev":

UPDATE 
  employee_skills
SET 
  data = jsonb_set(
    data, '{skills,0}', '"Web Dev"'
  ) 
WHERE 
  id = 1 
RETURNING *;

Output:

 id |                      data
----+------------------------------------------------
1 | {"name": "John", "skills": ["Web Dev", "API"]}
(1 row)

Summary

  • Use the jsonb_set() function to update a JSON document of the type JSONB.

Here is a quick psql session that applies jsonb_set() to a products table:

mirror=> SELECT id, name, properties
FROM products;
 id |         name          |                     properties
----+-----------------------+---------------------------------------------------
  1 | Ink Fusion T-Shirt    | {"size": ["S", "M", "L", "XL"], "color": "white"}
  2 | ThreadVerse T-Shirt   | {"size": ["S", "M", "L", "XL"], "color": "black"}
  3 | Design Dynamo T-Shirt | {"size": ["S", "M", "L", "XL"], "color": "blue"}
(3 rows)

mirror=> UPDATE products SET properties = jsonb_set(properties, '{color}', '"purple"') WHERE id = 3 RETURNING *;
 id |         name          |                     properties
----+-----------------------+----------------------------------------------------
  3 | Design Dynamo T-Shirt | {"size": ["S", "M", "L", "XL"], "color": "purple"}
(1 row)

UPDATE 1

How to query a JSON column in PostgreSQL

Querying JSON in PostgreSQL

PostgreSQL is a powerful relational database management system. One of its standout features is its ability to handle unstructured data by allowing you to store it in a JSON column. This means you can enjoy the benefits of a structured relational database while leveraging the flexibility of JSON for certain data types. Below are some common ways to query a JSON column.

Using PostgreSQL JSON to query a column

Retrieving a Specific JSON Key as Text

If you have a table named events and you want to retrieve the value associated with the key name from the JSON column params, you can use the following query:

SELECT params->>'name' FROM events;

This will return the value of params.name as text from the events table.

Filtering rows based on a specific JSON key value

If you want to find all events with a specific name, for instance, ‘Click Button’, you can use:

SELECT * FROM events WHERE params->>'name' = 'Click Button';

This will return all rows from the events table where the name key in the params JSON column has the value ‘Click Button’.

Accessing an element from a JSON array

If your JSON column contains arrays and you want to retrieve the first element (index 0) of the array associated with the key ids from the params column, you can use:

SELECT params->'ids'->0 FROM events;

This will return the first element of the ids array from the params column in the events table.

Filtering rows based on a nested JSON key

Sometimes, your JSON might have nested structures. For instance, if you have a table named users with a JSON column preferences and you want to find users where the nested key beta is set to true, you can use:

SELECT preferences->'beta' FROM users WHERE (preferences->>'beta')::boolean IS TRUE;

This query first type casts the value of preferences.beta from JSON to boolean and then filters the rows where it’s true.

Querying a JSONb column in PostgreSQL

In PostgreSQL, jsonb is a data type used to store JSON (JavaScript Object Notation) data in a more efficient and optimized binary format. It is an extension of the json data type. jsonb stands for JSON binary. It provides several advantages over the standard json type, especially when it comes to querying and indexing JSON data.

You can query jsonb columns using various JSON functions and operators provided by PostgreSQL. Some commonly used functions and operators include:

  • ->: Extracts a JSON element by key or array index.
  • ->>: Extracts a JSON element as text.
  • #>: Extracts a JSON sub-object at a specified path.
  • #>>: Extracts a JSON sub-object as text.
  • @>: Checks if a JSON document contains another JSON document.
  • <@: Checks if a JSON document is contained within another JSON document.
  • jsonb_array_elements(): Expands a JSON array into a set of rows.
  • jsonb_each(): Expands a JSON object into key-value pairs.

Suppose you have a table called employees with a jsonb column named employee_data.

-- Extract the employee's name
SELECT employee_data->>'name' AS employee_name
FROM employees;

-- Check if the employee has a skill in "Sales"
SELECT *
FROM employees
WHERE employee_data->'skills' @> '["Sales"]';

-- Find employees in the "Marketing" department
SELECT *
FROM employees
WHERE employee_data->>'department' = 'Marketing';
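The path operators and set-returning functions from the list above can be sketched against the same hypothetical employee_data column (assuming it also holds a nested address object):

```sql
-- Extract a nested value as text with a path operator
SELECT employee_data #>> '{address,city}' AS city
FROM employees;

-- Expand the skills array into one row per skill
SELECT id, skill
FROM employees,
     jsonb_array_elements(employee_data->'skills') AS skill;
```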

jsonb is a powerful tool for working with JSON data in PostgreSQL, especially when you need to query and manipulate complex JSON structures.

Fixing issues in querying JSON columns

Troubleshooting JSON column querying in PostgreSQL involves identifying and addressing issues related to data integrity, query performance, and syntax errors. It often requires a combination of SQL knowledge, an understanding of JSON data structures, and careful query optimization. By addressing the common issues and best practices below, you can improve your ability to work effectively with JSON data in PostgreSQL.

Nested JSON structures

As shown in the tutorial, querying JSON columns is fairly straightforward. However, it can get a bit difficult to query nested JSON structures. It is important to use appropriate JSON operators and functions to navigate and query nested JSON objects and arrays. Functions like ->, ->>, #>, and #>> can help access nested elements. The -> operator returns a JSON object, and ->> returns the value as text. By chaining these operators, you can navigate through nested JSON structures to retrieve the desired information.

Incorrect JSON path

While it seems obvious, more often than you would want, specifying the wrong JSON path in your queries results in incorrect output or failed queries. Examples of such error messages are "cannot extract elements from a scalar" or "JSON path not found". Double-check the JSON path you're using in your queries, especially when dealing with nested structures. Use tools like JSON viewers to visualize the JSON structure.

Error handling

Data quality is an industry-wide problem. While we fight this issue daily, a lack of error handling in queries can cause failures: poor-quality data often means randomly missing keys, and a single missing key in a large query can disrupt query execution and raise an error. To handle such situations more gracefully, use the COALESCE function or conditional logic to provide a default value (such as "Uncategorized") when a JSON key is missing. Instead of failing entirely, the query then returns the default for missing keys while still returning the values for keys that are present.
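As a sketch, assuming a hypothetical products table whose JSONB properties column may lack a category key, COALESCE supplies the default:

```sql
SELECT
  name,
  COALESCE(properties->>'category', 'Uncategorized') AS category
FROM products;
```

Rows with a category key return its value; rows without it return 'Uncategorized' instead of NULL.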

Database version

Some JSON functions and operators may not be available in older PostgreSQL versions.

Ensure that you’re using a PostgreSQL version that supports the JSON functionality you need. Consider upgrading, if necessary.

Performance bottlenecks

The reason for slow query performance can be inefficient JSON queries. To resolve this, profile your queries using tools like EXPLAIN to identify potential bottlenecks. Consider optimizing queries by creating appropriate indexes, rewriting queries, or denormalizing data where necessary.

Manage unstructured data like a pro

PostgreSQL’s ability to seamlessly integrate structured relational data with the flexibility of unstructured JSON data offers developers a unique advantage. By understanding how to query JSON columns effectively, one can harness the full potential of Postgres, making it easier to manage, retrieve, and analyze diverse datasets. Whether you’re accessing specific JSON keys, filtering based on specific values, or diving into nested structures, PostgreSQL provides the tools to do so with precision and efficiency. As data continues to evolve and become more complex, mastering these techniques will be invaluable for any database professional.

Going beyond JSON querying

Now that you have queried JSON columns successfully, it is time to use the results to gain further insights. Filtering this data is one of the ways to refine the results of JSON querying. Take a look at the tutorial How to Use Filter to Have Multiple Counts in PostgreSQL.

FAQs

1. How do I extract a specific JSON key from a JSON column?

You can use the -> or ->> operators to extract JSON keys. -> returns the value as JSON, while ->> returns the value as text.

2. Can I query nested JSON structures in PostgreSQL?

Yes, PostgreSQL allows you to query and extract data from nested JSON structures using nested -> or ->> operators.

3. How can I filter rows based on JSON data criteria?

Use the WHERE clause to filter rows based on JSON data criteria. For example, WHERE json_column->>'key' = 'value'.

4. What’s the difference between the json and jsonb data types for querying JSON in PostgreSQL?

jsonb is a binary JSON data type optimized for querying and indexing, while json is a plain text JSON data type. jsonb is recommended for querying JSON data.

5. How do I handle missing JSON keys or errors when querying JSON columns?

You can use the COALESCE function or conditional logic to provide default values or handle missing keys when querying JSON columns.

How to Query Deeply Nested JSON Data in PSQL

First, let's create a table with a jsonb column in our PostgreSQL database, and right away add some data we can work with later.

CREATE TABLE jsonb_exp (
  id serial primary key,
  data jsonb
);

INSERT INTO jsonb_exp (data) VALUES 
  ('{"data":{"1":{"items":[{"name":"John"},{"name":"Jack"}]},"2":{"items":[{"name":"John Dorian"},{"name":"Jack Sparrow"}]},"3":{"items":[{"name":"Tom Cruise"},{"name":"somerandomtext"}]},"5":{"items":[{"name":"Tom Cruise"}]}},"property":"Some string"}'),
  ('{"data":{"4":{"items":[{"name":"Maria"},{"name":"Jack"}]},"6":{"items":[{"name":"Jack Jackson"},{"name":"Thomas ---"}]},"7":{"items":[{"name":"-"},{"name":"somerandomtext"}]},"15":{"items":[{"name":"hello"}]}},"property":"Some string"}'),
  ('{"a": "b", "c": [1, 2, 3]}');

Usually, you wouldn’t have a different structure inside your JSON across records, but it’s just for the tests. We can now select and see what we’ve created in a table format.

SELECT * from jsonb_exp;

First, let’s get the basics out of the way.

-> returns jsonb
->> returns text

If you're not sure what type you're working with, you can use pg_typeof.

SELECT data->>'data' as data, pg_typeof(data->'data'), pg_typeof(data->>'data') from jsonb_exp where data->>'data' IS NOT NULL;

Querying from deeply nested JSONB

In this scenario, as you can see in the example above, we have an object with a defined structure, but some of its keys look like array indices. Not a common thing, maybe, but it happens.

To the rescue comes the following method:

jsonb_path_query()

It’s in the documentation, but it wasn’t clear to me right away how it works. For the first argument, you pass the object, next you can pass the query. Let’s say we need to query particular items from the object, from ‘items’ property, and only those that contain ‘Jack’.

SELECT id, data, item FROM jsonb_exp t, jsonb_path_query(data->'data', '$.*.items[*]') as item
 WHERE data->'data' IS NOT NULL AND item->>'name'='Jack';

Unpacking. Let's first look at the jsonb_path_query call; everything else is just support around it to demonstrate what we did.

Here is how to read '$.*.items[*]': $ is the object itself; it can contain a key we don't know, so we mark that with an asterisk (*). Inside of that, we want to look for the property items, which is an array, and with [*] we say that we want every element inside that array.

The comma in the query is just an implicit cross-join, not super relevant to JSONB methods.

Remove the ‘WHERE’ Clause and filter right away

Here is the code; you can add conditions inside the query itself.

SELECT id, data, jsonb_path_query(data->'data', '$.*.items[*] ? (@.name == $name)', '{"name": "John"}') FROM jsonb_exp
 WHERE data->'data' IS NOT NULL;

As you can see here, you can even add a third parameter that holds variables to be used in the query. It's here just for demo purposes; you can absolutely remove it and compare against the string directly, without using additional variables.

Notice here, you use double equal signs and not one as you would outside of this query.

You also can’t just write LIKE as your comparison operator. But you can use like_regex. Let’s say we want to find every item, where the name starts with ‘John’.

SELECT id, data, jsonb_path_query(data->'data', '$.*.items[*] ? (@.name like_regex "^John")') FROM jsonb_exp
 WHERE data->'data' IS NOT NULL;

Dealing with nested JSON objects in PostgreSQL

Source
Over the past few weeks I’ve been having to deal with nested JSON columns in Postgres. It was turning into me just throwing queries at it until something stuck. Well, as it would turn out, it’s actually very simple! I just needed to take the time to sit down and learn about a few operators and what they’re actually doing.

With that being said, here is a condensed version of some things I learned that helped me navigate through JSON columns in Postgres.

First, let’s create a very basic table containing a json column

create table sample_table (json_data jsonb);

Now, let’s seed this table with some data about a few different vehicles

insert into sample_table
values 
    ('{ "year": "2011", "make":"Toyota", "model":"Camry", "misc": {"color": "Gray", "doors": "4"}}'),
    ('{ "year": "2017", "make":"Honda", "model":"Civic", "misc": {"color": "White", "doors": "4"}}'),
    ('{ "year": "2017", "make":"Toyota", "model":"Camry", "misc": {"color": "Red", "doors": "2"}}'),
    ('{ "year": "2023", "make":"Honda", "model":"Accord"}'),
    ('{ "year": "1908", "make":"Ford", "model":"T", "misc": {"doors": "2"}}')
;

Now that we have our data, let’s go over some of the basics.

If you’ve read my post about the Bacon Cannon, then you know that I am a firm believer that unique operators deserve unique names. So, as I cover some unique operators, I will be throwing in their nicknames, at no extra charge to you!

The -> Operator

Using the -> (Stabby) operator will return JSON data. So here we can see a few examples of how we would get the json values out of the column:

select json_data -> 'make' from sample_table;
select json_data -> 'model' from sample_table;
select json_data -> 'year' from sample_table;
select json_data -> 'misc' from sample_table;


As you can see, ‘misc’ is a nested JSON object. If you want to go another layer deeper into a nested JSON object, simply add another ->:

select json_data -> 'misc' -> 'color' as color from sample_table;


The ->> Operator

Now, as I said before, this isn’t giving us text values. These quotation marks are an indicator that -> is returning JSON. In order to grab the text value we need to use the ->> (Double Stabby) operator, which unwraps the JSON string into plain text. (Note that simply casting the -> result with ::text would keep the surrounding quotes.)

select json_data -> 'misc' ->> 'color' as color from sample_table;


The #> and #>> operators

A simpler way to do nested JSON attributes is to use the #> (Waffle Cone) operator with an array of the path you want to follow.

select json_data #> Array['misc','color'] as color from sample_table;

Just like before, a single > returns JSON, so if we wanted to grab the actual text value, we’ll need to use the #>> (Double Waffle Cone) operator.

select json_data #>> Array['misc','color'] as color from sample_table;

The ?, ?|, and ?& operators

Okay, but what if you want the entire row that matches a certain criteria? We’ll move this into a where clause, and then use the ? operator.

A good way to think of this operator is to replace it with the word “contains”.

For example, the next query would read in English as

“select all from sample_table where json_data’s misc doors contains the value 4”

This query will return to us every row where there is a 4-door car

select * from sample_table
where json_data #> Array['misc', 'doors'] ? '4'
;


This query will return to us every row containing a Toyota or a Honda

select * from sample_table
where json_data -> 'make' ?| Array['Toyota','Honda']
;


This query will return to us every row containing both color and doors keys inside of misc.

select * from sample_table
where json_data -> 'misc' ?& Array['color', 'doors']
;


(Notice this doesn’t include the row that has just {"misc": {"doors": "2"}}.)

The @> and <@ operators

If you are looking for a specific JSON object inside of another, you want to use the @> (Penguin) operator.

For example, if we were looking specifically for {"key": "value"}:

select * from sample_table
where json_data @> '{"make": "Toyota"}'
;

As you might have guessed, we can look for nested JSON objects

This will return to us every row containing red, 2-door vehicles

select * from sample_table
where json_data @> '{"misc": {"color":"Red", "doors":"2"}}'
;

There is also the <@ (Ice Cream Cone or Reverse Penguin) operator that behaves the same, only in reverse.

select * from sample_table
where '{"misc": {"color":"Red", "doors":"2"}}' <@ json_data
;

In conclusion

I hope this helps clear up any confusion you might have had about querying nested JSON objects in Postgres.

If you want to learn more about these operators or if you want to see what else you can do, here is a link to the docs for JSON functions in Postgres.

Storing and Using JSON Within PostgreSQL 

https://www.percona.com/blog/storing-and-using-json-within-postgresql-part-one/
https://www.percona.com/blog/storing-and-using-json-within-postgresql-part-two

Recap & What’s Next

A few quick takeaways:

Recapping part 1:

  • Using JSONB is probably going to be your best option in most use cases.
  • Be very careful of type conversions and making assumptions on the data within your JSON/JSONB columns.  You may get errors or odd results.
  • Use the available indexes, generated columns, and expression indexes to gain substantial performance benefits.
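As a sketch of that last point, here is one way to combine a generated column with an index (the orders table and the customer key are assumptions for illustration):

```sql
-- a hypothetical orders table with a jsonb payload column
CREATE TABLE orders (
    id serial PRIMARY KEY,
    payload jsonb,
    -- generated column extracted from the JSON, kept in sync automatically
    customer text GENERATED ALWAYS AS (payload ->> 'customer') STORED
);

-- a plain btree index on the generated column...
CREATE INDEX orders_customer_idx ON orders (customer);

-- ...or an expression index directly on the JSON extraction
CREATE INDEX orders_customer_expr_idx ON orders ((payload ->> 'customer'));
```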

What we learned in part 2:

  • The more straightforward the JSON, the easier it will be to pull out and interact with the data you need.
  • Nested JSON data can be pulled out in a few different ways. jsonb_to_recordset is the easiest way I found to pull out the data I need.  However, using this function or others is very sensitive to the data structure in your document (data types matter!).
  • JSON data whose format changes (elements added or removed) may make some functions difficult, if not impossible, to use.
  • JSON within a well-built, designed application and database can be wonderful and offer many great benefits.  JSON just dumped into databases won’t scale.  Database design still matters.
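As a quick sketch of the jsonb_to_recordset point above (the column names and types are illustrative, and must match the structure of your document):

```sql
-- each array element becomes a row with the declared columns
SELECT t.name, t.doors
FROM jsonb_to_recordset(
    '[{"name": "Camry", "doors": 4}, {"name": "Civic", "doors": 4}]'::jsonb
) AS t(name text, doors int);
```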

Now that we have covered the basics and a few nuances of JSON with PostgreSQL, next up, it is time to look at the same functionality and data within MongoDB before finally comparing the three databases. Stay tuned!