1 NVMOVE: Helping Programmers Move to Byte-based PersistenceHimanshu Chauhan with Irina Calciu, Vijay Chidambaram, Eric Schkufza, Onur Mutlu, Pratap Subrahmanyam
2 Critical Performance GapFast, but volatile. Cache DRAM We are all familiar with this picture. We have caches/registers and dram … And block-based devices such as .... That are... And in between them lies a big performance gap. Critical Performance Gap Persistent, but slow. SSD Hard Disk
3 Fast, but volatile. Cache DRAM Fast, and persistent.NVM is the technology that promises to fill this gap, its both fast and persistent. Non-Volatile Memory Persistent, but slow. SSD Hard Disk
4 Cache DRAM SSD Hard DiskIn terms of throughput and latency, it sits roughly somewhere here. And thus has great potential for boosting performance of new applications. SSD Hard Disk
5 Persistent Programs typedef struct { } node 1. allocate from memoryLet us consider persistent prorgrams that save their state or data to storage. 1. allocate from memory 2. data read/write + program logic 3. save to storage
6 Persistence Today Consider a binary search tree. If we want to persist it, we will have to perform these or very similar operations.
7 Persistence with NVM we know that this promise-land exists for programming new applications. but what about exsiting persistent programs. how can we improve them to leverage nvm?
8 Changing Persistence CodePresent NVM /* allocate from volatile memory*/ node n* = malloc(sizeof(…)) /* allocate from non-volatile memory*/ node n* = pmalloc(sizeof(…)) node->value = val //volatile update … node->value = val //persistent update … /* persist to block-storage*/ char *buf= malloc(sizeof(…)); int fd = open("data.db",O_WRITE); sprintf(buf,"…", node->id, node->value); write(fd, buf, sizeof(buf)); /* flush cache and commit*/ __cache_flush + __commit
9 Porting to NVM: TediousIdentify data structures that should be on NVM Update them in a consistent manner Redis: simple key-value store (~50K LOC) - Industrial effort to port Redis is on-going after two years - Open-source effort to port Redis has minimum functionality
10 Changing Persistence CodePresent NVM /* allocate from volatile memory*/ node n* = malloc(sizeof(…)) /* allocate from non-volatile memory*/ node n* = pmalloc(sizeof(…)) node->value = val //volatile update … node->value = val //persistent update … /* persist to block-storage*/ char *buf= malloc(sizeof(…)); int fd = open("data.db",O_WRITE); sprintf(buf,"…", node->id, node->value); write(fd, buf, sizeof(buf)); /* flush cache and commit*/ __cache_flush + __commit
11 Port existing applications to NVM with minimal programmer involvement.Goal: Port existing applications to NVM with minimal programmer involvement.
12
13 By Kiko Alario Salom [CC BY 2. 0 (http://creativecommonsBy Kiko Alario Salom [CC BY 2.0 (http://creativecommons.org/licenses/by/2.0)], via Wikimedia Commons
14 Identify persistent types inFirst Step: Identify persistent types in application source.
15 Persistent Types in SourceUser defined source types (structs in C) that are persisted to block-storage. Application Code Block Storage
16 Solution: Static Analysiswe want to identify all possible candidates for persistence, irrespective of whether or not they are saved to storage in one particular run we observe.
17 Current Focus: C types = structs
18 Application Code write system call Block Storage
19 write system call node node *n = malloc(sizeof(node))iter *it = malloc(sizeof(iter)) /* persist to block-storage*/ char *buf= malloc(…)) int fd = open(…) sprintf(buf,”…”,node->value) write(fd, buf, …) write system call
20 write system call node node *n = malloc(sizeof(node))iter *it = malloc(sizeof(iter)) /* persist to block-storage*/ char *buf= malloc(…)) int fd = open(…) sprintf(buf,”…”,node->value) write(fd, buf, …) write system call
21 write system call node iter node *n = malloc(sizeof(node))iter *it = malloc(sizeof(iter)) /* persist to block-storage*/ char *buf= malloc(…)) int fd = open(…) sprintf(buf,”…”,node->value) write(fd, buf, …) write system call
22 Pipe Storage Network write system call node/* write to network socket*/ … write(socket, “404”, …) /* write to error stream*/ … write(stderr, “All is lost.”, …) /* persist to block-storage*/ … write(fd, buf, …) write system call Pipe Storage Network
23 node Save to block-storage Block Storage
24 node Save to block-storage Load/recover Block Storage
25 “rdbLoad” is the load/recovery function.
26 Mark every type that can be created during the recovery. *if defined in application source.
27 Call Graph from Load rdbLoad external library
28 BFS on Call Graph from LoadrdbLoad external library
29 Application type created/modifiedBFS on Call Graph from Load Application type created/modified external library
30 Currently supports C applicationsNVMovE Implementation Clang - Frontend Parsing Parse AST and Generate Call Graph- Find all statements that create/modify types in graph Currently supports C applications
31 Evaluation
32 In-memory data structure storestrings, hashes, lists, sets, indexes Persistence — data-snapshots(RDB), — command-logging (AOF) ~50K lines-of-code
33 122 types (structs) in Redis SourceIdentification Accuracy 122 types (structs) in Redis Source
34 Identification Accuracyfortunately for us, an industrial team has already gone thru the painful and laborious process of manually porting redis to nvm. So all we had to do was to ask them for the list of types that they make persistent in their implementaiton.
35 Identification Accuracy
36 Identification AccuracyTotal types NVMOVE identified persistent types True positives (manually identified) False positives False negatives
37 Performance Impact I would like to emphasize that this is not the main result: we have already shown the Nvmove produces no false negatives. This is just to get an estimate on the worst case performance using NVMove for NVM porting.
38 Both performed by forked background process.Redis Persistence Snapshot (RDB) Logging (AOF) Data snapshot per second Not fully durable Append each update command to a file Slow Both performed by forked background process.
39 NVM Emulation Emulate allocation of NVMovE identified types on NVM heap - Slow and Fast NVM - Inject delays for load/store of all NVM allocated types. - Worst-case performance estimate. Compare emulated NVM throughput against logging, and snapshot based persistence.
40 YCSB Benchmark Resultswrite-heavy (90% update, 10% read) 27K ops/s: No persistence in-memory (=1.0) Fraction of in-memory throughput
41 YCSB Benchmark Resultswrite-heavy (90% update, 10% read) 27K ops/s in-memory (=1.0) Possible Data loss 111 MB Fraction of in-memory throughput
42 Performance without False-PositivesSlow NVM 1.04x Fast NVM 1.49x 1.0 Speedup in throughput
43 Identify persistent types inFirst Step: Identify persistent types in application source. Recall that our … Nvmove is correct in identfying those. However, as we saw the cost of false positives is not always negligibel in terms of performance. So,
44 Next steps: Improve identification accuracy.Evaluate on other applications.
45 Backup
46 Iterators over persistent types.Identification Accuracy Iterators over persistent types.
47 Different byte-alignments of theIdentification Accuracy Different byte-alignments of the same type.
48 Throughputs (ops/sec)readheavy balance writeheavy PCM 28399 25,302 9759 STTRam 41213 38,048 12155 AoF (disk) 15634 6,457 2868 AoF (SSD) 27946 17,612 6605 RDB 46355 47,609 26605 Memory 50163 48,360 27156
49 NVM Emulation STT-RAM PCM (Fast NVM) (Slow NVM) 100 ns 40 ns 200 nsLatency Flush Latency Latency STT-RAM (Fast NVM) 100 ns 40 ns 200 ns PCM (Slow NVM) 300 ns 500 ns *Xu & Swanson, NOVA: A Log-structured File System for Hybrid Volatile/Non-volatile Main Memories, FAST16.
50 YCSB Benchmark Resultsin-memory (=1.0) Fraction of in-memory throughput PCM STT AOF (disk) (ssd) RDB NVM PCM STT AOF (disk) AOF (ssd) RDB PCM STT AOF (disk) AOF (ssd) RDB read-heavy
51 YCSB Benchmark Resultsin-memory (=1.0) Fraction of in-memory throughput PCM STT AOF (disk) AOF (ssd) RDB PCM STT AOF (disk) AOF (ssd) RDB PCM STT AOF (disk) AOF (ssd) RDB NVM NVM read-heavy balanced
52 YCSB Benchmark Resultsin-memory (=1.0) Fraction of in-memory throughput PCM STT AOF (disk) AOF (ssd) RDB PCM STT AOF (disk) AOF (ssd) RDB PCM STT AOF (disk) AOF (ssd) RDB NVM NVM NVM read-heavy balanced write-heavy
53 RDB Data Loss 111 MB 42 MB 26 MB read-heavy balanced write-heavy
54 Performance without False-Positives1.49x Speedup in throughput 1.15x 1.13x 1.09x 1.03x 1.04x 1.0 PCM STT PCM STT AOF (disk) AOF (ssd) RDB (disk) PCM STT AOF (disk) AOF (ssd) RDB (disk) PCM STT PCM STT PCM STT read-heavy balanced write-heavy