Hello I'm

Mohil Patel

9oipl19m8h@gmail.com

About Me

I am currently working as a Member of Technical Staff at Oracle Corporation. I am part of the GoldenGate team, responsible for developing the GoldenGate product. I completed my Master's in Computer Science at the University of Wisconsin-Madison. During my time there, I worked on many projects under Prof. Shivaram Venkataraman and Prof. Theodoros Rekatsinas.

Before my Master's, I worked at NVIDIA on the GeForce NOW cloud game streaming platform. I worked with the QoS (Quality of Streaming) team, which was responsible for ensuring a good gaming experience under varying network conditions. I also completed an internship at the Samsung R&D center in India, working with their Smart Devices Team.

I completed my undergraduate degree at IIT Bombay with a major in Electrical Engineering and a minor in Computer Science. For my final year project, I worked with Prof. Madhav Desai on building an end-to-end communication system with hardware encryption.

My interests broadly lie in the domains of Big Data Systems, Distributed Systems and Machine Learning. In my free time, I like to read books (my Goodreads profile), play sports (especially badminton and soccer), tinker with side projects, and play video games.

Download Resume

My Journey

This is my professional and educational journey.

Projects

LBGFS
Low Bandwidth GFS
Mar '23 - May '23
GFS is a well-known filesystem by Google and has inspired the development of many famous filesystems like HDFS. In our project, we aimed to improve GFS for low-bandwidth networks. In low-bandwidth conditions, it is beneficial to avoid sending data over the network (from client to server) if that data already exists on one of the GFS servers. We designed a system that splits each file into chunks and generates a hash value for each chunk. These hash values (assumed unique) are transferred over the network and compared against the data already on the server to detect duplicates, thus avoiding transmission of those chunks. We implemented this as a wrapper on top of HDFS and showed that the system can significantly reduce the data sent over the network when configured for suitable workloads. [report // presentation]
Question-Answering
Ensembling and Data Augmentation for QA
Feb '23 - May '23
Question-Answering (QA) is one of the fundamental tasks in NLP, and there is significant prior work on it. In our project, we explored ensembling, oversampling, and question-generation techniques to improve accuracy on QA tasks. We used the SQuAD dataset for our testing. For ensembling, we used variations of BERT and a confidence estimator to generate the ensemble results. For oversampling, we focused on underrepresented question types. Lastly, for question generation, we trained a separate BERT model to generate synthetic questions. Ensembling showed the best results, and we also saw some improvements with question generation. [report // presentation]
Raft
RAFT
Feb '22 - Mar '22
In this project, we implemented a replicated key-value store, using the Raft algorithm for consensus. We used a polling-based design, i.e., each thread runs independently and polls to check whether any action is required. As in Raft, the system elects a leader and then replicates data to the followers. Inspired by Netflix's Chaos Monkey project, we also designed a system that randomly kills servers while the system is running to test the correctness of our Raft implementation. [report // presentation]
AFS
WiscAFS
Jan '23 - Feb '23
Andrew File System (AFS) is a well-known distributed filesystem with the key idea of whole-file caching. In our project, we implemented our own version of AFS. We used libFUSE to implement filesystem calls on the client side and redirected those to the server using gRPC. The fundamental design principles in our implementation were that the server maintained only the persistent state, the client did whole-file caching, the last writer won, and we used the last-modified timestamp to track file updates. These principles allowed us to achieve client crash consistency and durability. We demoed our filesystem by building the xv6 project, compiling the LevelDB project, and successfully using Vim on top of our filesystem. [report // presentation]
Model Parallelism
Efficient Distributed Transfer Learning using Pipelined Model Parallelism
Oct '22 - Dec '22
Neural networks keep growing larger, making it infeasible to train them from scratch. A large fraction of ML training nowadays focuses on fine-tuning the last few layers of a pre-trained model. In our project, we proposed a new distributed framework that fine-tunes multiple different models in parallel when they share the same fixed initial part. The framework improves performance and resource utilization by employing pipelined model parallelism and lets us fine-tune multiple models together, avoiding repeated work. [poster // report]
LSM-Read-Write
Improving Performance in LSM-Tree based Key-Value Stores using NVMe
Oct '22 - Dec '22
Log-Structured Merge-Tree (LSM-Tree) key-value stores convert random writes into sequential writes. This behavior improves performance for write-intensive workloads at the cost of write amplification. These systems perform well on spinning hard disks, but with modern SATA SSDs and NVMe SSDs, there is scope for improvement. In this project, we improved the performance of LSM-Tree-based key-value stores (we used RocksDB for testing) on SSDs. We implemented two key ideas. First, we split data across SSD and HDD to use the combined bandwidth of both. Second, we used SPDK (Storage Performance Development Kit) to bypass the kernel and access the NVMe SSD directly for reads and writes. [presentation // report]
Graph-Databases
Database to Graph Conversion Tool
Sept '21 - May '22
This project is part of the Marius project. Marius is a system for training graph neural networks and embeddings for large-scale graphs on a single machine. As part of its preprocessing utilities, we designed a database-to-graph conversion tool that converts relational databases into graphs represented as sets of triples. These triples can be used as input datasets for Marius, allowing streamlined preprocessing from database to Marius. The tool uses out-of-memory processing to generate graphs with up to billions of edges within a few hours. Currently, the tool supports Postgres, MySQL, and MariaDB as input databases. [code // documentation]
Social-Network-Graph
Analyzing System Characteristics of Graph Algorithms across different Graph Frameworks
Jan '22 - April '22
Graphs are increasingly used in the domain of Big Data. Over the last decade, graph sizes have increased to billions of edges. This increase has led to the development of many graph processing frameworks. In this project, we analyzed the performance and system characteristics of modern graph frameworks. We analyzed three graph algorithms (Connected Components, PageRank, and Triangle Counting) on four different frameworks (Spark, GraphX, GraphFrames, and GraphChi) and identified which framework performs better in which scenarios. [slides // report]
CT-Image
Modeling Biological Age & using Machine Learning to Predict Death
Jan '22 - April '22
The project's goal was to use CT data with death labels to predict when a person will die. To tackle the problem, we first defined biological age, which represents how healthy a person is. Next, we modeled (biological age / actual age) as a Gaussian random variable. Lastly, we combined the defined parameters and models with existing features and applied machine learning techniques such as Linear Regression, Decision Trees, and Neural Networks to predict days until death. [slides // report]
Pencil Drawing
Combining Sketch and Tone for Pencil Drawing Production
March '21 - May '21
Implemented the research paper Combining Sketch and Tone for Pencil Drawing Production, which generates pencil drawings from natural images, using Java and OpenCV. The paper uses a novel method for generating line drawings with strokes, together with pencil texture rendering, to transform the natural image into a pencil drawing. [code]
CHIP-8 Pong
CHIP-8 Emulator
Jan '21 - March '21
CHIP-8 is an interpreted programming language used on 8-bit microcomputers in the 1970s and 1980s. Many classic video games have been ported to CHIP-8, including Pong, Space Invaders, Tetris, and Pac-Man. In this project, I developed a CHIP-8 emulator in C++. I implemented the opcodes, memory, timers, keyboard, and graphics needed to run CHIP-8 ROMs. Keyboard input and graphics were programmed using SDL 2.0. [code]
secure-comm
Real-Time Server Based Secure Communication
July '19 - April '20
In this project, we developed an end-to-end secure communication link with a programmable hardware block in the audio pipeline. We programmed the hardware block as an encryption engine to secure the communication link. The audio communication happens via a server and is a full-duplex link. [code // report]
Cyrix 6x86
Superscalar Architecture
January '19 - April '19
Designed a 16-bit microprocessor in VHDL based on a superscalar architecture with a fetch width of two instructions and four different pipelines. The architecture is based on a Turing-complete ISA with 17 instructions and was successfully verified by simulation using ModelSim. [code]
optical-fiber
Data Transmission through Polymer Optical Fiber Link
January '19 - April '19
Polymer Optical Fiber provides a low-cost alternative for data transmission while retaining the speed of an optical link. In this project, we developed a module capable of transmitting digital data using Polymer Optical Fiber and a simple LED. We achieved data transmission rates of up to 35 Mbps. [report]
texture
Texture Synthesis using Non-Parametric Sampling
October '18 - November '18
In this project, we implemented a research paper on texture synthesis that presents a novel technique based on the assumption of spatial locality. We implemented the technique in MATLAB and successfully replicated the results shown in the paper. [code]
DE0-nano-fpga
Inter-FPGA data Transmission using LVDS
May '18 - July '18
We developed a high-speed bidirectional data transmission link between two FPGAs using LVDS (Low Voltage Differential Signalling). The data flow is controlled using a simple request and acknowledgement interface along with FIFOs to store the data. The data transmission link is capable of transmitting at data rates up to 400 Mbps. [code]
color-sensor
Color Sensor using Phase Sensitive Detection
Jan '18 - April '18
In this project, we designed a color sensor using three LEDs (red, green, and blue) and phase-sensitive detection. Light from each LED is reflected by the colored surface and read using a photodiode and a transimpedance amplifier. The output of the transimpedance amplifier is a voltage that corresponds to the RGB values of the color. We used phase-sensitive detection to remove the effects of ambient light noise on the photodiode. [report]
drona-aviation
Multiple Drone Tracking and Localization
November '17 - December '17
In this project, we localized drones under a camera setup using the Whycon ROS package and extracted the 3D coordinates of multiple drones in real time with accuracy up to 3 cm. These coordinates were later used in a project to automate a drone's flight movement based on its past location.

Please refer to my resume for additional information.