The most efficient write (tens of thousands of objects)


#1

Hi, I’m curious whether there is a way to improve the performance of saving a large amount of data.

Background:
Objects:

class StopTime: Object {
    @objc dynamic var serviceID = 0
    @objc dynamic var headsign = ""
    @objc dynamic var arrivalTime = ""
    @objc dynamic var departureTime = ""
}
class Station: Object {
    @objc dynamic var stopID = ""
    let list = List<StopTime>()

    override static func primaryKey() -> String? {
        return "stopID"
    }
}

My flow of processing data from network (simplified)

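   func processData() {
        fetchData { (data) in
            // fetchData hands back an optional, decodeData expects non-optional Data
            guard let data = data else { return }
            decodeData(data) { (dataToConvert) in
                // decoding can fail and return nil
                guard let dataToConvert = dataToConvert else { return }
                DispatchQueue.global(qos: .background).async {
                    transformAndSave(dataToConvert)
                }
            }
        }
    }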

  func fetchData(completion: @escaping ((Data?) -> Void)) {
        // URLSessionDownloadTask
    }

  func decodeData(_ data: Data, completion: @escaping ((MyCustomStruct?) -> Void)) {
        DispatchQueue.global(qos: .background).async {
            let decoder = JSONDecoder()
            decoder.dateDecodingStrategy = .iso8601

            do {
                let decoded = try decoder.decode(MyCustomStruct.self, from: data)
                completion(decoded)

            } catch {
                completion(nil)
            }
        }
    }

IMPORTANT: converting the data from the custom struct to Realm Objects and saving them to Realm.
I’m using a local Realm at a non-default path.


  func transformAndSave(_ data: MyCustomStruct) {
        do {
            let config = Realm.Configuration(fileURL: Constants.realmURL)
            let realm = try Realm(configuration: config)
            var stations: [Station] = []

            for item in data {
                let station = Station()
                station.stopID = item.id

                let stopTimes: [StopTime] = item.stops.map { (stop) in
                    let stopTime = StopTime()
                    stopTime.serviceID = stop.serviceID
                    stopTime.headsign = stop.headsign
                    stopTime.arrivalTime = stop.arrivalTime
                    stopTime.departureTime = stop.departureTime
                    return stopTime
                }

                station.list.append(objectsIn: stopTimes)
                stations.append(station)
            }

            let oldStations = realm.objects(Station.self)
            let oldStopTimes = realm.objects(StopTime.self)

            try realm.write {
                // delete old data
                realm.delete(oldStations)
                realm.delete(oldStopTimes)

                // insert new data
                realm.add(stations)
            }

        } catch {
            // handle error
        }
    }

Task:
I need to save 122 stations, each containing a list of approx. 500 stop times (= approx. 70 000 stop times). This write transaction takes about 10 seconds.

Is that a normal time for this number of items, or is there a way to save some time?

For example:
Would it be faster if the stop times weren’t in a list? (I want to maintain the order… but I guess I could sort them after fetching them from the realm.)
Or is there some concurrent solution? (e.g. performing two writes at the same time, half of the data in each?)

Thank you for your time! <3

EDIT: I’ve updated background information. Thanks Jay.


#2

Can you please include your Realm object(s) in your question? Also, are you using a local Realm or writing/syncing to Realm Cloud? How do you connect to Realm?

Also, the code in the question is not very complete as to what/how you’re actually writing your data and it’s a little unclear why you’re deleting alreadySavedData and then adding data. Can you clarify?


#3

I’ve updated the original question. Let me know if you need anything else :slight_smile: Thank you very much!


#4

Good update. Questions:

Does the entire process take 10 seconds, or does the actual write take 10s? Also, you’ve got other things going on as well during this process: you are reading a number of objects and then deleting a number of objects, as well as writing stations.

Another question: the stations array looks like it gets pretty large. Have you tried writing each station inside the for item loop? If there are 122 stations, this would equate to only 122 writes, and the memory would be freed up after each station is written.

        for item in data {
            let station = Station()
            station.stopID = item.id

            let stopTimes: [StopTime] = item.stops.map { (stop) in
                let stopTime = StopTime()
                stopTime.serviceID = stop.serviceID
                stopTime.headsign = stop.headsign
                stopTime.arrivalTime = stop.arrivalTime
                stopTime.departureTime = stop.departureTime
                return stopTime
            }

            station.list.append(objectsIn: stopTimes)
            try? realm.write {
               realm.add(station)
            }
        }

#5

I have print statements with timestamps, so I know that the transformation takes about 1s, deleting takes 100ms and writing takes 10s.
As you suggested, I tried writing each station inside the loop, but it got even worse: 16 seconds. So a single write transaction is probably better?
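
For anyone who wants to reproduce this kind of per-phase timing, here is a minimal sketch. The `measure` helper is hypothetical (it is not part of Realm or of the code in this thread), and the closure body is stand-in work; in the question it would wrap the transform, delete and write phases separately.

```swift
import Foundation

/// Runs `work`, prints how long it took under `label`, and returns the elapsed seconds.
/// Uses the monotonic uptime clock so wall-clock adjustments do not skew the result.
@discardableResult
func measure(_ label: String, _ work: () throws -> Void) rethrows -> Double {
    let start = DispatchTime.now().uptimeNanoseconds
    try work()
    let elapsed = Double(DispatchTime.now().uptimeNanoseconds - start) / 1_000_000_000
    print("\(label): \(elapsed)s")
    return elapsed
}

// Stand-in for the transformation phase: build 70 000 throwaway values.
let transformSeconds = measure("transform") {
    _ = (0..<70_000).map { "stop_\($0)" }
}
```

Wrapping only the `realm.write { ... }` call in such a helper separates the cost of the transaction from the cost of building the objects.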

I’ll probably test whether it makes a difference if the stop times aren’t in an ordered list but just plain results.
And I wonder whether it would help to change the Strings to Integers (it’s 70 000 objects),
e.g.

{"arrivalTime": "04:45:25", "headsign": "station name", "departureTime": "04:45:45", "serviceID": 1}
{"arrivalTime": 44525, "headsign": 41, "departureTime": 44545, "serviceID": 1}

This could save some storage space as well as time of write, don’t you think?
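
A minimal sketch of such a String-to-Integer conversion. Note that it stores seconds since midnight rather than the packed digits shown above (44525); that is a swapped-in encoding chosen because it sorts and subtracts naturally. The function names are hypothetical, and whether this actually speeds up the Realm write is untested here.

```swift
import Foundation

/// Parses "HH:MM:SS" into seconds since midnight; returns nil on malformed input.
/// Assumption: hours above 23 are accepted, in case the feed uses them for trips past midnight.
func secondsSinceMidnight(_ text: String) -> Int? {
    let parts = text.split(separator: ":", omittingEmptySubsequences: false)
    guard parts.count == 3,
          let h = Int(parts[0]), let m = Int(parts[1]), let s = Int(parts[2]),
          h >= 0, (0..<60).contains(m), (0..<60).contains(s) else { return nil }
    return h * 3600 + m * 60 + s
}

/// Formats seconds since midnight back into "HH:MM:SS" for display.
func timeString(fromSeconds total: Int) -> String {
    func pad(_ n: Int) -> String { n < 10 ? "0\(n)" : "\(n)" }
    return "\(pad(total / 3600)):\(pad((total / 60) % 60)):\(pad(total % 60))"
}
```

With this, arrivalTime and departureTime could be declared as Int properties on StopTime and converted back to text only when displayed.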


#7

This intrigued me so I crafted a test. In this test we create 122 people, each having 500 dogs. So we have roughly 61000 objects.

Here are my two classes

class DogClass: Object {
    @objc dynamic var dog_id = UUID().uuidString
    @objc dynamic var dog_name = ""

    let owners = LinkingObjects(fromType: PersonClass.self, property: "dogs")
    
    override static func primaryKey() -> String? {
        return "dog_id"
    }
}

class PersonClass: Object {
    @objc dynamic var person_id = UUID().uuidString
    @objc dynamic var full_name = ""

    let dogs = List<DogClass>()
    
    override static func primaryKey() -> String? {
        return "person_id"
    }
}

That should be a similar set up to your station and stop times classes.

Then we have a function that creates 122 people, creates an array of 500 dogs and adds that array to each person, then the whole thing is written to Realm

func whoLetTheDogsOut() {
    if let realm = gGetRealm() {
        let startTime = Date()
        var persons: [PersonClass] = []
        for i in 0..<122 {
            let person = PersonClass()
            person.full_name = "person_\(i)"
            var dogArray = [DogClass]()
            for j in 0..<500 {
                let dogName = "dog_\(j)"
                let dog = DogClass()
                dog.dog_name = dogName
                dogArray.append(dog)
            }
            person.dogs.append(objectsIn: dogArray)
            persons.append(person)
        }
        
        try? realm.write {
            realm.add(persons)
        }
        
        let elapsed = Date().timeIntervalSince(startTime)
        print("  time: \(elapsed)")
    }
}

One difference is that you are mapping over item.stops to copy the data from each stop into a StopTime. As you mentioned, the time for that task is small, but even one second seems too long. My code just brute-force creates 500 dogs and then adds them to the person’s dogs List.

Results:

Without writing to Realm, my code takes 0.19 seconds to run.

When writing to realm, my code takes 1.935 seconds to run.



#8

Oh, a Question: You have

func transformAndSave(_ data: MyCustomStruct)

but then you’re iterating over data?

for item in data

If MyCustomStruct is actually a structure, you can’t iterate over it as it’s not a sequence. So… What is MyCustomStruct?
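
One way the loop could compile is if the decoded type is an array. A minimal sketch under that assumption: `StopPayload`, `StationPayload` and `StationFeed` are hypothetical names, since the real MyCustomStruct is not shown in the thread, and the field names are taken from the JSON sample and the mapping code above.

```swift
import Foundation

// Hypothetical payload types mirroring the fields used in transformAndSave.
struct StopPayload: Decodable {
    let serviceID: Int
    let headsign: String
    let arrivalTime: String
    let departureTime: String
}

struct StationPayload: Decodable {
    let id: String
    let stops: [StopPayload]
}

// An Array is a Sequence, so `for item in feed` is valid for this type.
typealias StationFeed = [StationPayload]

let sampleJSON = Data("""
[{"id": "S1", "stops": [
  {"arrivalTime": "04:45:25", "headsign": "station name", "departureTime": "04:45:45", "serviceID": 1}
]}]
""".utf8)

do {
    let feed = try JSONDecoder().decode(StationFeed.self, from: sampleJSON)
    for item in feed {
        print(item.id, item.stops.count)
    }
} catch {
    print("decode failed: \(error)")
}
```

A plain struct with an array property would work too, but then the loop has to iterate over that property rather than over the struct itself.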


#9

OH MY GOD … I’m so … “inattentive”. Thank you very much for your input and test you’ve made, Jay (:dog::dog::dog:). I really started to investigate this problem with your help.

And I came to a conclusion:
My biggest mistake was the quality of service (QoS) of the dispatch queue!
Without thinking, I chose the .background QoS, which isn’t the best choice for this task. (Apple docs) When I chose .userInitiated instead, the time to write to Realm dropped sharply.
So I think this is solved. Thanks @jay, I really appreciate your help.

See this test:

        DispatchQueue.global(qos: .background).async {
            transformAndSave(dataToConvert)
        }

Background quality:
iPhone 6, iPhone 8 - both take about 10s to write over 70 000 objects.

        DispatchQueue.global(qos: .userInitiated).async {
            transformAndSave(dataToConvert)
        }

User Initiated quality:
iPhone 6 - takes about 5s
iPhone 8 - takes about 2s
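
The effect can be checked without Realm at all. Below is a minimal sketch that runs the same CPU-bound stand-in work on a global queue at each QoS and prints the elapsed time; `busyWork` and `timeWork` are hypothetical helpers, not code from this thread. The gap depends heavily on the device, its power state and what else is running, so on a desktop or in the Simulator the two numbers may be close.

```swift
import Foundation

/// Stand-in for CPU-bound work such as building tens of thousands of objects.
func busyWork() -> Int {
    var total = 0
    for i in 0..<2_000_000 {
        total = (total &+ i * i) % 1_000_003
    }
    return total
}

/// Runs `busyWork` on a global queue with the given QoS, prints the elapsed time,
/// and blocks the caller until the work has finished.
func timeWork(qos: DispatchQoS.QoSClass, label: String) {
    let group = DispatchGroup()
    DispatchQueue.global(qos: qos).async(group: group) {
        let start = DispatchTime.now().uptimeNanoseconds
        _ = busyWork()
        let elapsed = Double(DispatchTime.now().uptimeNanoseconds - start) / 1_000_000_000
        print("\(label): \(elapsed)s")
    }
    group.wait()
}

timeWork(qos: .background, label: "background")
timeWork(qos: .userInitiated, label: "userInitiated")
```

Blocking with `group.wait()` is only there to keep the sketch sequential; in an app the work would stay asynchronous, as in transformAndSave above.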