How can JSON Data be modified using jq?

jq is a command-line JSON processing tool. It is valuable for handling machine-readable data formats and is especially useful in shell scripts. For example, you can use jq to extract specific information from the JSON response of a server's API after a curl call. As a data engineer, you can incorporate jq into your data ingestion process. If you manage a Kubernetes cluster, you can feed the JSON output of kubectl into jq to extract the number of available replicas for a particular deployment.

Introduction

Finding and manipulating information in big JSON files can be challenging. One approach is to manually extract and calculate relevant snippets, but this is time-consuming and prone to errors. Alternatively, you can use general-purpose tools such as sed, awk, and grep, which are available on all modern Linux systems. However, these tools operate on lines of text rather than structured data, so a dedicated tool such as jq is a better fit for machine-readable formats like JSON.

In this tutorial, you will use jq to transform a sample JSON file containing information about ocean animals. You will apply filters to transform the data and combine the transformed pieces into a new data structure. By the end of the tutorial, you will be able to use a jq script to answer questions about the data.

Requirements

To complete this tutorial, you will need:

  • jq, a JSON parsing and transformation tool. It is available from the repositories for all major Linux distributions. If you are using Ubuntu, run sudo apt install jq to install it.
  • An understanding of JSON syntax, which you can refresh in An Introduction to JSON.

Step 1 – Executing your first jq command

In this step, you will set up your sample input file and test the setup by running a jq command that outputs the file's data. jq can take input from either a file or a pipe; here you will use a file.

To start, you will generate the sample file. Use your preferred editor (for this tutorial, nano is used) to create and open a new file called seaCreatures.json.

    nano seaCreatures.json

Paste the following content into the file:

seaCreatures.json
[
    { "name": "Sammy", "type": "shark", "clams": 5 },
    { "name": "Bubbles", "type": "orca", "clams": 3 },
    { "name": "Splish", "type": "dolphin", "clams": 2 },
    { "name": "Splash", "type": "dolphin", "clams": 2 }
]

You will be using this data throughout the tutorial. By the end of the tutorial, you will have created a single-line jq command that provides answers to the following queries about this data.

  • What are the names of the sea creatures in list form?
  • How many clams do the creatures own in total?
  • How many of those clams are owned by dolphins?

Please save the file and close it.

Besides an input file, you also need a filter that describes the exact transformation you want to perform. The . (period) filter, also known as the identity operator, returns the JSON input unchanged.

You can use the identity operator to check that your setup works. If you see any parse errors, make sure that seaCreatures.json contains valid JSON.
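The validity check described above can also be scripted. The following is a minimal sketch, assuming that jq exits with a non-zero status when it cannot parse its input; check_json is a hypothetical helper name and the two sample strings are made up for illustration.

```shell
# Two small documents: one valid, one with a missing comma between members.
valid='{"name": "Sammy", "type": "shark"}'
invalid='{"name": "Sammy" "type": "shark"}'

# check_json is a hypothetical helper: it feeds its argument to the
# identity filter and reports whether jq accepted the input.
check_json() {
  if printf '%s' "$1" | jq '.' > /dev/null 2>&1; then
    echo "valid"
  else
    echo "invalid"
  fi
}

valid_result=$(check_json "$valid")
invalid_result=$(check_json "$invalid")

printf 'first document: %s\n' "$valid_result"
printf 'second document: %s\n' "$invalid_result"
```

Discarding the output and checking only the exit status keeps the check quiet, which is convenient inside larger scripts.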

Run the following command to apply the identity operator to the JSON file:

    jq '.' seaCreatures.json

To utilize jq with files, you should always provide a filter followed by the input file. It is advisable to enclose the filter in single quotation marks to account for special characters and spacing that may have significance in your shell. This approach informs your shell that the filter is a command parameter. It is important to note that running jq will not alter your original file.

You will get the following output:

Output

[
  {
    "name": "Sammy",
    "type": "shark",
    "clams": 5
  },
  {
    "name": "Bubbles",
    "type": "orca",
    "clams": 3
  },
  {
    "name": "Splish",
    "type": "dolphin",
    "clams": 2
  },
  {
    "name": "Splash",
    "type": "dolphin",
    "clams": 2
  }
]

By default, jq pretty-prints its output. It automatically applies indentation, adds new lines after every value, and colors the output when possible. Coloring improves readability, which helps developers examine JSON data generated by other tools. For instance, if you send a curl request to a JSON API and want a more readable response, you can pipe it into jq '.' to pretty-print it.
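As a small sketch of piped input, the example below feeds a one-line JSON string into jq instead of a file. The response variable is a stand-in for what an API call might return, not real server output. It also shows the -c flag, which produces compact single-line output, the opposite of pretty-printing.

```shell
# A compact, single-line JSON document standing in for an API response
# (hypothetical data, not taken from a real endpoint).
response='{"name":"Sammy","type":"shark","clams":5}'

# Pipe the text into jq; the identity filter pretty-prints it.
pretty=$(printf '%s' "$response" | jq '.')
printf '%s\n' "$pretty"

# The -c flag prints compact, one-line JSON instead.
compact=$(printf '%s' "$response" | jq -c '.')
printf '%s\n' "$compact"
```

When the output is captured or piped rather than shown in a terminal, jq does not add colors by default, so the result is safe to store in a variable.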

Now that jq is up and running and the input file is in place, you will use several filters to manipulate the data and compute the values of all three attributes: creatures, totalClams, and totalDolphinClams. In the next step, you will extract the information for the creatures attribute.

Step 2 – Retrieving the creatures value

In this step, you will generate a list of the sea creatures' names, which will later become the creatures value. At the end of this step, you will have the following list of names:

Output

[
  "Sammy",
  "Bubbles",
  "Splish",
  "Splash"
]

To create this list, you need to extract the names of the creatures and then combine them into an array.

To retrieve the names of all creatures and discard everything else, you need to refine your filter. Because the input is an array, you have to tell jq that you want to operate on the values inside the array rather than on the array itself. The array value iterator, written as .[], does exactly that.

Run jq with the modified filter:

    jq '.[]' seaCreatures.json

Now each value of the array is being output individually.

Output

{
  "name": "Sammy",
  "type": "shark",
  "clams": 5
}
{
  "name": "Bubbles",
  "type": "orca",
  "clams": 3
}
{
  "name": "Splish",
  "type": "dolphin",
  "clams": 2
}
{
  "name": "Splash",
  "type": "dolphin",
  "clams": 2
}

Instead of outputting each complete array item, you want to output only the value of the name attribute and discard the rest. The pipe operator | lets you apply a filter to each output, similar to how you would use find | xargs on the command line to apply a command to every search result.

To access the name property of a JSON object, use .name. Execute this command on seaCreatures.json by combining the pipe with the filter.

    jq '.[] | .name' seaCreatures.json

You will observe that the output no longer contains the remaining attributes.

Output

"Sammy"
"Bubbles"
"Splish"
"Splash"

By default, when using jq, the output will be in valid JSON format which includes strings within double quotation marks. If you want the string to appear without the double quotes, you can use the -r flag to enable raw output.

    jq -r '.[] | .name' seaCreatures.json

The quotation marks are no longer visible.

Output

Sammy
Bubbles
Splish
Splash

You have now learned how to extract specific information from the JSON input. You will use the same technique to find other specific information in the next step, and to generate the creatures value in the final step.
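Raw output is particularly handy in shell scripts, because each name arrives as a plain line of text. The following sketch assumes the sample data from this tutorial, recreated in a temporary directory so that the snippet runs on its own; the greeting text is purely illustrative.

```shell
# Recreate the tutorial's sample file in a temporary directory.
workdir=$(mktemp -d)
cat > "$workdir/seaCreatures.json" <<'EOF'
[
    { "name": "Sammy", "type": "shark", "clams": 5 },
    { "name": "Bubbles", "type": "orca", "clams": 3 },
    { "name": "Splish", "type": "dolphin", "clams": 2 },
    { "name": "Splash", "type": "dolphin", "clams": 2 }
]
EOF

# -r prints each name on its own line without JSON quotes,
# so the shell can read the names one line at a time.
greetings=$(jq -r '.[] | .name' "$workdir/seaCreatures.json" |
  while IFS= read -r name; do
    printf 'Hello, %s!\n' "$name"
  done)

printf '%s\n' "$greetings"

# Clean up the temporary directory.
rm -r "$workdir"
```

Without -r, each name would still carry its double quotes, and the loop would print them as part of the greeting.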

Step 3 – Computing the totalClams value with map and add

In this step, you will find the total number of clams the creatures own by aggregating several pieces of data. Once you are familiar with jq, this will be faster than manual calculation and less prone to human error. The expected value at the end of this step is 12.

In Step 2, you extracted specific pieces of information from a list of items. You can reuse that technique to extract the values of the clams attribute. Adjust the filter for this new attribute and run the command:

    jq '.[] | .clams' seaCreatures.json

The output will consist of the separate values of the clams attribute.

Output

5
3
2
2

To find the sum of the individual values, you need the add filter. The add filter works on arrays, but at the moment you are outputting separate values, so you must first wrap them in an array.

Surround your current filter with square brackets ([]):

    jq '[.[] | .clams]' seaCreatures.json

The values now appear in a list:

Output

[
  5,
  3,
  2,
  2
]

You can make the command more readable and easier to maintain by using the map function before applying add. A single map call iterates over an array, applies a filter to each item, and wraps the results in an array. In other words, given an array, map runs its argument as a filter on each item. For instance, applying the filter map(.name) to [{"name": "Sammy"}, {"name": "Bubbles"}] produces the JSON array ["Sammy", "Bubbles"].
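To see that map is a shorthand for the bracketed iterator form, the sketch below runs both filters on the two-element example array. The variable names are illustrative only, and -c is used to keep each result on one line.

```shell
# The two-element example array from the paragraph above.
input='[{"name": "Sammy"}, {"name": "Bubbles"}]'

# map applies .name to every item and collects the results in an array.
with_map=$(printf '%s' "$input" | jq -c 'map(.name)')

# The longer form: iterate, extract, then wrap the results in brackets.
with_iterator=$(printf '%s' "$input" | jq -c '[.[] | .name]')

printf 'map:      %s\n' "$with_map"
printf 'iterator: %s\n' "$with_iterator"
```

Both commands produce the same array, so choosing between them is a matter of readability.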

Replace the array-wrapping filter with a map call, then run it:

    jq 'map(.clams)' seaCreatures.json

You’ll get the identical result as previously.

Output

[
  5,
  3,
  2,
  2
]

Now that you have an array, you can pipe it into the add filter:

    jq 'map(.clams) | add' seaCreatures.json

You will be given the total of the array.

Output

12

Once you apply this filter, you’ll obtain the overall count of clams. This count will be utilized later to determine the value of totalClams. Currently, you have developed filters for two of the three questions. Once you create the remaining filter, you will be able to generate the final result.

Step 4 – Calculating the totalDolphinClams value with the add filter

After determining the number of clams owned by the creatures, you can calculate the dolphins’ share by summing the values of array elements that meet a specific condition. The desired outcome at this stage is to obtain a value of 4, which represents the total number of clams possessed by the dolphins. This resulting value will eventually be assigned to the totalDolphinClams attribute.

Instead of summing the clams of all creatures as in Step 3, you will count only the clams owned by creatures of type "dolphin". To do this, you will use the select function, which takes a condition: select(condition). Any input that satisfies the condition is passed through, while all other input is dropped. For instance, if your JSON input is "dolphin" and your filter is select(. == "dolphin"), the output is "dolphin". Conversely, if the input is "Sammy", the same filter produces no output.
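The two cases described above can be tried directly by piping a single JSON string into jq. This is a minimal sketch; the variable names kept and dropped are chosen for illustration.

```shell
# The input "dolphin" satisfies the condition, so select passes it through.
kept=$(printf '%s' '"dolphin"' | jq -c 'select(. == "dolphin")')

# The input "Sammy" does not satisfy it, so select emits nothing at all.
dropped=$(printf '%s' '"Sammy"' | jq -c 'select(. == "dolphin")')

printf 'kept:    [%s]\n' "$kept"
printf 'dropped: [%s]\n' "$dropped"
```

Note that a rejected input produces no output rather than null or false, which is why select works well for filtering lists.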

To apply select to every value in an array, wrap it in map. Only the values that meet the condition are kept.

To keep only the array values whose type is "dolphin", the resulting filter is:

    jq 'map(select(.type == "dolphin"))' seaCreatures.json

Sammy the shark and Bubbles the orca will not match the filter, but the two dolphins will.

Output

[
  {
    "name": "Splish",
    "type": "dolphin",
    "clams": 2
  },
  {
    "name": "Splash",
    "type": "dolphin",
    "clams": 2
  }
]

This output does not yet contain only the number of clams per creature; it still carries information you do not need. To keep just the clams value, append the field name to the end of map's argument:

    jq 'map(select(.type == "dolphin").clams)' seaCreatures.json

When the map function is given an array as input, it will apply the filter function provided to each element in the array. As a result, the select function will be called four times, once for each creature. The select function will generate output for the two dolphins (as they meet the criteria) and exclude the others.

The result will be an array that includes only the clams values of the two corresponding creatures.

Output

[
  2,
  2
]

Pipe the array into the add filter:

    jq 'map(select(.type == "dolphin").clams) | add' seaCreatures.json

The result is the sum of the clams values of all creatures of type "dolphin".

Output

4

In this step, you combined map and select to access an array, filter its items by a condition, transform them, and sum the result. You can use this value as totalDolphinClams in the final output, which you will build in the next step.
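One edge case is worth keeping in mind: if no creature satisfies the condition, add receives an empty array and returns null instead of 0. The sketch below uses a made-up one-creature list with no dolphins, and shows jq's alternative operator // as one possible way to supply a default. Whether a default of 0 is appropriate depends on your data.

```shell
# A hypothetical list that contains no dolphins.
creatures='[{"name": "Sammy", "type": "shark", "clams": 5}]'

# With nothing selected, map yields [] and add returns null.
no_match=$(printf '%s' "$creatures" |
  jq 'map(select(.type == "dolphin").clams) | add')

# The alternative operator // substitutes 0 when the left side is null.
with_default=$(printf '%s' "$creatures" |
  jq 'map(select(.type == "dolphin").clams) | add // 0')

printf 'without default: %s\n' "$no_match"
printf 'with default:    %s\n' "$with_default"
```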

Step 5 – Transforming data to a new data structure

In the earlier stages, you created filters to extract and manipulate the sample data. At this point, you can merge these filters to produce an output that addresses your queries related to the data.

  • What are the names of the sea creatures in list form?
  • How many clams do the creatures own in total?
  • How many of those clams are owned by dolphins?

To get the list of sea creatures' names, you used map with the filter .name: map(.name). To find the total number of clams owned by all creatures, you piped map(.clams) into add. To find how many of those clams belong to dolphins, you used select with the condition .type == "dolphin" inside map, extracted .clams, and piped the result into add.

You will merge these filters into a single jq command that performs all the tasks. By combining the three filters, you can create a new JSON object that represents the desired information in a new data structure.

As a reminder, your starting JSON file looks like this:

seaCreatures.json
[
    { "name": "Sammy", "type": "shark", "clams": 5 },
    { "name": "Bubbles", "type": "orca", "clams": 3 },
    { "name": "Splish", "type": "dolphin", "clams": 2 },
    { "name": "Splash", "type": "dolphin", "clams": 2 }
]

The transformed JSON output will look like this:

Final Output

{
  "creatures": [
    "Sammy",
    "Bubbles",
    "Splish",
    "Splash"
  ],
  "totalClams": 12,
  "totalDolphinClams": 4
}

Here is the syntax of the full jq command, with empty placeholder values:

    jq '{ creatures: [], totalClams: 0, totalDolphinClams: 0 }' seaCreatures.json

By utilizing this filter, you generate a JSON object that comprises three attributes.

Output

{
  "creatures": [],
  "totalClams": 0,
  "totalDolphinClams": 0
}

This is starting to look like the final output, but the values are still placeholders because they have not been computed from your seaCreatures.json file.

Replace the hard-coded attribute values with the filters you created in the previous steps:

    jq '{ creatures: map(.name), totalClams: map(.clams) | add, totalDolphinClams: map(select(.type == "dolphin").clams) | add }' seaCreatures.json

The filter mentioned above instructs jq to generate a JSON object that includes:

  • A creatures attribute containing a list of every creature’s name value.
  • A totalClams attribute containing a sum of every creature’s clams value.
  • A totalDolphinClams attribute containing a sum of every creature’s clams value for which type equals "dolphin".

Run the command. The output of this filter should be:

Output

{
  "creatures": [
    "Sammy",
    "Bubbles",
    "Splish",
    "Splash"
  ],
  "totalClams": 12,
  "totalDolphinClams": 4
}

You now have a single JSON object that answers all three questions. If the dataset changes, you can rerun the jq filter you wrote to recompute the values at any time.
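For repeated use, a long filter can be kept in its own file and loaded with jq's -f option rather than typed on the command line. The sketch below is one possible arrangement: the file name summary.jq is a hypothetical choice, the sample data is recreated in a temporary directory so the snippet is self-contained, and the parentheses around the piped expressions are added only for readability.

```shell
# Work in a temporary directory so nothing is left behind.
workdir=$(mktemp -d)

# The sample data from this tutorial.
cat > "$workdir/seaCreatures.json" <<'EOF'
[
    { "name": "Sammy", "type": "shark", "clams": 5 },
    { "name": "Bubbles", "type": "orca", "clams": 3 },
    { "name": "Splish", "type": "dolphin", "clams": 2 },
    { "name": "Splash", "type": "dolphin", "clams": 2 }
]
EOF

# The filter from Step 5, stored in a file (summary.jq is a made-up name).
cat > "$workdir/summary.jq" <<'EOF'
{
  creatures: map(.name),
  totalClams: (map(.clams) | add),
  totalDolphinClams: (map(select(.type == "dolphin").clams) | add)
}
EOF

# -f reads the filter from the file; -c prints the result on one line.
summary=$(jq -c -f "$workdir/summary.jq" "$workdir/seaCreatures.json")
printf '%s\n' "$summary"

# Clean up the temporary directory.
rm -r "$workdir"
```

Keeping the filter in a file avoids shell quoting issues and lets you track changes to it in version control.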

Conclusion

When you work with JSON input, jq lets you perform a wide range of data transformations that would be difficult with text manipulation tools like sed. In this tutorial, you filtered data with the select function, transformed array elements with map, summed arrays of numbers with the add filter, and learned how to combine these transformations into a new data structure.

To learn about the more advanced features of jq, consult the jq reference documentation. If you often work with command output that is not JSON, you can refer to our guides on sed, awk, or grep to learn about text processing techniques that work on any format.
