How to export large datasets into a JSON file in PHP without memory exhaustion

This article is about how I came up with a solution to export large datasets into a JSON file in PHP. If you just need a solution, use the php-json-exporter library.

Computers have become cheaper, but RAM capacity is still expensive. In a usual cloud provider, an 8GB VPS costs around 20-40$/month. 8GB is sufficient for most tasks. For example, a simple PHP request takes about 3MB-10MB when processed via PHP-fpm or CLI. If we say it’s 8MB, we can handle 1000 simultaneous requests. That’s a lot! If a request takes 1 second to process, an 8GB web server can handle 2 billion requests. Of course, there are other factors to consider, but you get the idea. For simple tasks, 8GB is a large amount of RAM.

But, when it comes to large amounts of data, you cannot simply save everything in RAM. Disk storage (or cloud storage like S3) is the best option, which has become incredibly cheaper over time.

Problem

Hyvor Talk is a commenting platform that I run with a small team. We have several clients who have millions of comments on their websites. Some of them want to regularly export their data for reasons like safety, and privacy, and also to analyze data to get a better understanding of their community.

As with any startup, we started small with small clients. Our first exporter read all comments from the DB to the server memory and wrote it to a JSON file.

Here’s a demonstration of how it worked:

1$comments = getAllCommentsOfWebsite($websiteId);
2file_put_contents('export.json', json_encode($comments));

However, as we started to get larger clients, this became a problem. Some export files were multiple GBs large. When an importing was running in the background in a server, it sometimes caused exports to fail due to insufficient memory. In the worst cases, it slowed down other processes on the server.

Solution Ideas

One pretty easy solution is to upgrade the server to have a HUGE RAM. No thanks.

The other solution is incremental writing into the disk. Take only a subset of data into memory. Then, write it to a file on the disk. Do the same until you reach the end of the dataset. This solution depends on the following facts

  • Relational databases are optimized to get data chunks faster using LIMIT and OFFSET (of course, the table should be indexed properly)

  • Disk storage is cheaper than RAM and can save large files

  • JSON is easy to manually write by just using string operations.

Solution

I wrote a simple library called php-json-exporter. It’s open-source, so feel free to use it in your projects. It handles JSON exporting part, however, you have to write the database logic on your own (usually with simple LIMIT and OFFSET). Here’s an example (getComments() is a hypothetical function that fetches 1000 comments at a time with a given offset):

1use Hyvor\JsonExporter\File;
2
3$file = new File('export-file.json');
4
5$comments = $file->collection('comments');
6$comments->addItems(getComments(offset: 0));
7$comments->addItems(getComments(offset: 1000));
8$comments->addItems(getComments(offset: 2000));
9
10$file->value('version', '1.0');
11
12$file->end();

The exporting JSON is always a single object.

When the $file object is created, the library starts writing a JSON object to the export-file.json file:

1{

Then, $file->collection(‘comments’) starts a collection with the key comments

1{
2 "comments": [

Each ->addItems() method call adds 1000 comments, which are returned from the getComments() function. So, we now have something like this:

1{
2 "comments": [
3 {"id": 1, "body": "Comment 1"},
4 {"id": 2, "body": "Comment 2"},
5 // more...

Note that adding a new value to the collection adds a , to the last value. JSON does not support trailing commas, so the library makes sure to handle commas properly.

After the collections, we add a value to our JSON object using $file->value('version', '1.0');. This does a couple of things.

  1. Closes the last value. In this case, it closes the collection array using ]

  2. Adds a comma after the last value

  3. Adds the new value

Now we have:

1{
2 "comments": [
3 // comments
4 ],
5 "version": "1.0"

$file->end() adds the trailing }. We finally have a valid JSON object saved in a file.

1{
2 "comments": [],
3 "version": "1.0"
4}

The peak memory usage only depends on the number of rows you fetch at a time, not the total number of rows in the table.

Conclusion

Writing incrementally to a file in the disk is an easy solution to write JSON files without memory exhaustion problems. Feel free to use our library in your projects. If you have any questions, feel free to comment below.

Newsletter

Comments