Pocket Agronomist: diagnosing plant diseases using convolutional neural networks


At Perceptual Labs, our machine vision work is driven by the needs of real world applications. I'd like to talk about an application that we're proud to unveil today. Called Pocket Agronomist, it is an iOS application for diagnosing common crop diseases using the live camera feed from your mobile device. It utilizes convolutional neural networks to perform object detection in real time in order to identify and label regions of plant disease detected when looking through the camera.

Pocket Agronomist was developed by Agricultural Intelligence, a joint venture between Perceptual Labs and Ag AI. Ag AI brings significant agricultural experience, knows the needs of the agricultural industry, and has the means to acquire a dataset for detection of these various plant diseases. When combined with Perceptual Labs' software and expertise for training and deploying convolutional neural networks on mobile devices, a unique product is made possible.

Designing, building, and testing this application took a significant amount of work, and I'd like to describe the process. 

Identifying the problem

We are fortunate to be located in Madison, WI, an area surrounded by farm country and where the University of Wisconsin, one of the nation's leading research universities, performs leading-edge agricultural research. I can see three corn fields from my front porch, and I come from a stereotypical Wisconsin family featuring dairy farmers and cheesemakers. It's natural, then, that one of the first conversations we had about the use of machine vision involved agriculture.

After the growing season last year, the founders of Ag AI came to us with an interesting problem. Every year, farmers struggle to minimize crop damage done by disease. In 2016 alone, it's estimated that 817 million bushels of corn were lost to disease. Early identification and treatment of these diseases could be a tremendous help to farmers, but it's impractical to have trained specialists walking every field.

Both parties here recognized that by using machine vision, we might be able to provide that expertise to every farmer via the device they already have in their pocket. Our early proof-of-principle tests showed very promising results, so we created a joint venture to explore this. Ag AI would provide their extensive agricultural expertise, access to crop disease datasets for training and testing, and a network of initial users. Perceptual Labs would design and train convolutional neural networks for disease diagnosis and build an application around this. When I use "we" in this article, I'm referring to the joint partnership between our two companies.

Designing the application

The first step was to determine the shape and capabilities of the end application. We decided to focus on a single crop at first, with the ability to quickly expand to others once the process had been established. Corn was a natural choice for our first target, as it is the dominant crop in the Midwest and greater United States.

We identified 14 diseases and one instance of non-disease damage that we would be able to train the system to detect. Detection wasn't enough, though, so we wanted to provide encyclopedic information about a disease once it was detected, as well as clear indications of the danger it posed to a field and the steps to mitigate this danger.

The application had to work in areas with no or limited network connectivity, which is the case for many farms. Perceptual Labs' focus on performing machine vision on device, with no server-side component, meant that our technology was ideally suited for this. By training a convolutional neural network to detect these diseases, and then performing inference on live camera video, a farmer or contractor would be able to simply point their mobile device at a corn leaf and get an immediate readout of what it detects.

Once we had a good idea of what the application would need to do, we started building up the components required to make it work.

Aggregating a dataset for training

When training a convolutional neural network for classification or object detection using supervised learning, a sizable dataset of training images is needed. How large and diverse a dataset we needed was an early question, followed by where we could obtain such a dataset. Much of the work you'll see out there involving convolutional neural networks tends to be based on a few generic publicly available datasets, like ImageNet. That's fine if the things you want to detect are contained within those datasets, but for most applications you'll need something that is more targeted.

We had assumed that it would be fairly easy to find images for common corn diseases at land grant universities and others with strong agricultural programs, but until now most didn't have a need to capture hundreds of photos of specific diseases. A few images were good enough for educational and research purposes, and many of those were taken in artificial conditions. As we've found in other cases, good datasets were lacking because people couldn't have anticipated the needs of training machine learning systems.

Therefore, we quickly turned to acquiring training imagery ourselves. We aggregated what we could from universities and other agricultural partners, but needed to capture a lot more imagery from the field to make this viable. To address this, we built a system into our beta testing application where when disease was detected in the field, the application would automatically label it and upload an image to add to our dataset. At the same time, users could manually capture images and upload them with a single click, or directly inform us when something was misdiagnosed.

Distributing this among our beta testers let us gather hundreds of images from a single disease outbreak, and we could re-train our neural networks on a nearly daily basis with these additions, continually improving accuracy and eliminating false positives, one by one. Even though the corn growing season has come to an end in the Midwest, we're still processing all the images we captured right up to harvest.

Network design and training

While convolutional neural networks generalize very well to wide ranges of problems, we've found that a little tuning doesn't hurt when targeting specific real-world problems. Most published image classification convolutional networks have been built and benchmarked around the ImageNet ILSVRC 2012 dataset. While that provides a useful baseline, with a wide variety of images and classification categories, I'm of the opinion that people are maybe micro-optimizing for this dataset at the expense of other cases. For example, a recent study showed some of the problems I've seen in ILSVRC 2012, such as misclassified images, multiple classes in the same image, and so on.

When evaluating the performance of our neural networks, we wanted to make sure we were as rigorous as we could be, so we cultivated a diverse and challenging validation set of images for each disease category, as well as a large number of cases with no disease, no corn, or even no plants to test against. We made sure that every image was of something the network had never seen before, and they encompassed the wide range of lighting and environmental conditions you'd see in the field. We also made sure that all validation runs were performed on an iOS device running a modified version of our application, because differences in GPU floating point precision can cause subtle differences in classification.

As a result of this, our validation test accuracies provided a strong reflection of how our convolutional networks would perform out in the field.
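To make the idea of per-category validation concrete, here is a minimal sketch in Python (not the code that ran on our iOS devices). It assumes validation results arrive as pairs of true and predicted labels, and the label names are hypothetical placeholders. Reporting accuracy separately for each category, including the "no disease" and "no plant" cases, keeps a weak category from hiding behind a strong overall average.

```python
from collections import defaultdict

def per_category_accuracy(results):
    """Return accuracy per true label.

    results: iterable of (true_label, predicted_label) pairs.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for true_label, predicted_label in results:
        total[true_label] += 1
        if predicted_label == true_label:
            correct[true_label] += 1
    return {label: correct[label] / total[label] for label in total}

# Hypothetical results; "disease_a" stands in for any disease category.
validation_results = [
    ("disease_a", "disease_a"),
    ("disease_a", "healthy_corn"),  # a missed diagnosis
    ("healthy_corn", "healthy_corn"),
    ("no_plant", "no_plant"),
]

accuracy = per_category_accuracy(validation_results)
for label in sorted(accuracy):
    print(f"{label}: {accuracy[label]:.0%}")
```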

During training, we continually evaluated runtime performance on a mobile device alongside accuracy. We tried to balance the accuracy of the network with speed, finding a sweet spot where we would only realize trivial accuracy gains by making the network much slower. We profiled and found bottlenecks in the network designs that slowed things down but didn't aid accuracy, and gradually thinned those out.
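One simple way to express that sweet spot is to take the fastest network whose accuracy is within a small tolerance of the best one. The sketch below is only an illustration of that rule: the function name, the tolerance, and all of the numbers are made up for the example and are not measurements from our networks.

```python
def pick_sweet_spot(candidates, tolerance=0.01):
    """Pick the fastest candidate whose accuracy is within `tolerance` of the best.

    candidates: list of (name, accuracy, milliseconds_per_frame) tuples.
    """
    best_accuracy = max(accuracy for _, accuracy, _ in candidates)
    close_enough = [c for c in candidates if best_accuracy - c[1] <= tolerance]
    return min(close_enough, key=lambda c: c[2])

# Illustrative values only, not real benchmark results.
candidates = [
    ("small", 0.900, 20.0),
    ("medium", 0.955, 35.0),
    ("large", 0.960, 90.0),
]

chosen = pick_sweet_spot(candidates)
print(chosen[0])
```

Here the largest network is slightly more accurate than the medium one, but it is much slower per frame, so the medium network is the one selected.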

Testing in the field

We can perform all the validation we want on static images, but none of that matters if the product isn't usable in the real world. This particular application posed some unique testing challenges, because you literally had to go into the field to try it out. The outdoor environment posed some interesting design challenges as well.

Back before the launch of the iPhone App Store, I had a fascinating discussion with Craig Hockenberry of the Iconfactory, whose Twitter client Twitterrific had just won one of the first iPhone-oriented Apple Design Awards. He had commented that a driver for the dark background and light text of Twitterrific was that they had found it was more legible outdoors in bright sunlight. 

We originally had a more traditional iOS-7-style light interface, with a white background and black text, but farmers found that to be harder to read in bright sunlight when wandering rows of corn. Similarly to Craig's experience, we tested out a dark interface, and people immediately found it to be more legible in the field. We're also working to make sure that text is large enough and icons are clear enough to be read by the majority of our users who are outside of the under-20 demographic many iOS applications seem to be designed for.


Functionally, we originally built the application to perform image classification on a live video stream. Given a frame of video, it would tell us whether the camera sees undamaged corn, one of the corn diseases, corn that's too far to make out, a plant that isn't corn, or no plant at all. Our original design had detailed confidence percentages and alternative diagnoses that continually updated with incoming camera video.

While we thought this extra information would be useful, it turned out to be more of a distraction when used in the field, and confidence percentages didn't help in making actual diagnostic decisions. We simplified the readout to just show the current best diagnosis (if there was one), with any alternative above a certain confidence threshold shown below that. We also worked to prevent diagnoses from bouncing around from frame to frame when results were close.
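A readout like this can be stabilized in several ways. The Python sketch below shows one common approach, which is an assumption for illustration and not necessarily what ships in the application: average confidences over a short window of frames, show only labels above a threshold, and switch the headline diagnosis only when a challenger leads by a clear margin. The class name, parameters, and labels are all hypothetical.

```python
from collections import deque

class DiagnosisSmoother:
    """Smooths per-frame classifier confidences into a stable readout."""

    def __init__(self, window=10, threshold=0.5, switch_margin=0.1):
        self.frames = deque(maxlen=window)  # most recent per-frame results
        self.threshold = threshold          # minimum averaged confidence to show
        self.switch_margin = switch_margin  # lead needed to replace the current best
        self.current = None

    def update(self, confidences):
        """confidences: dict of label -> confidence for one video frame.

        Returns (best_label_or_None, list_of_alternative_labels).
        """
        self.frames.append(confidences)
        labels = set().union(*self.frames)
        averaged = {
            label: sum(f.get(label, 0.0) for f in self.frames) / len(self.frames)
            for label in labels
        }
        shown = {l: c for l, c in averaged.items() if c >= self.threshold}
        if not shown:
            self.current = None
            return None, []
        leader = max(shown, key=shown.get)
        # Keep the current diagnosis unless the leader is clearly ahead.
        if (self.current not in shown
                or shown[leader] - shown[self.current] > self.switch_margin):
            self.current = leader
        alternatives = sorted(
            (l for l in shown if l != self.current), key=shown.get, reverse=True
        )
        return self.current, alternatives

smoother = DiagnosisSmoother(window=3, threshold=0.3, switch_margin=0.1)
for frame in [
    {"disease_a": 0.55, "disease_b": 0.40},
    {"disease_a": 0.45, "disease_b": 0.50},  # close call: readout should not flip
    {"disease_a": 0.10, "disease_b": 0.85},  # clear change: readout switches
]:
    print(smoother.update(frame))
```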


This worked reasonably well, but we found some drawbacks with this classification approach. As mentioned above, the text results could bounce back and forth between very close diagnoses, making it hard to read when moving the phone across a leaf. The text was at the bottom of the screen, below the camera view that you were using to line up the corn leaf. You received no information about where the system detected disease, so you had to scan around to find where the disease was. 

Finally, image classification does a terrible job with images that contain multiple classes, which in our case could mean multiple diseases. This impacted both training and testing, preventing us from using many images containing multiple diseases in training and validation. It then reduced accuracy in the field when these multiple disease cases (which are not uncommon) were encountered. A single leaf could exhibit two different diseases as well as damage from fertilizer application all at once, and a single-category classification can't sufficiently express that.


At about this time, Perceptual Labs had just gotten our object detection networks and training operational, so we decided to see if that could be a solution to these issues. Areas of disease aren't your traditional objects (like cars or people) that have well-defined shapes and boundaries. Disease lesions could take on multiple sizes and shapes. Would object detection even work for this? Turns out it does, and it works very well.

We then shifted our efforts from a product based around image classification convolutional neural networks to one using these newer object detection networks. This took a lot of work on the dataset side, with us having to go back through and manually label areas of disease, a process we're still refining.

By using object detection, we were able to transform the application from providing a simple text readout of what it sees to labeled bounding boxes within the live camera video that show you exactly where disease was found. These boxes track and scale as you move the camera around the leaf. This both simplified the application interface and provided a lot more information to the user. It also significantly reduced our rates of false positives, while ultimately matching the accuracy of our previous image classification networks.
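Object detection networks typically emit many candidate boxes per frame, so some post-processing is needed before drawing them. The sketch below shows one standard approach, confidence filtering followed by per-label non-maximum suppression. It is an assumption for illustration rather than a description of our pipeline. Boxes are hypothetical (x_min, y_min, x_max, y_max) tuples in normalized image coordinates, and the labels are placeholders.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    overlap_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = overlap_w * overlap_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1])
             - intersection)
    return intersection / union if union > 0 else 0.0

def filter_detections(detections, min_confidence=0.5, max_overlap=0.5):
    """Drop weak detections and near-duplicate boxes of the same label.

    detections: list of (label, confidence, box) tuples.
    """
    kept = []
    for detection in sorted(detections, key=lambda d: d[1], reverse=True):
        label, confidence, box = detection
        if confidence < min_confidence:
            break  # sorted by confidence, so everything after is weaker
        # Only suppress against boxes with the same label, so two different
        # diseases on the same patch of leaf can both be reported.
        if all(k[0] != label or iou(k[2], box) <= max_overlap for k in kept):
            kept.append(detection)
    return kept

detections = [
    ("disease_a", 0.9, (0.10, 0.10, 0.50, 0.50)),
    ("disease_a", 0.7, (0.12, 0.12, 0.52, 0.52)),  # near-duplicate of the first
    ("disease_b", 0.8, (0.15, 0.15, 0.50, 0.50)),  # different disease, same area
    ("disease_a", 0.3, (0.70, 0.70, 0.90, 0.90)),  # below the confidence cutoff
]

for label, confidence, box in filter_detections(detections):
    print(label, confidence, box)
```

Suppressing overlaps only within a label is a deliberate choice here, since a single leaf can show more than one disease in the same region.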

To our knowledge, this is the first case of object detection being used on live camera video to detect and localize disease, particularly on a mobile device. We're very proud of how this application has turned out, and in-the-field testing has been key in building a very useful product.

Looking forward

While development and data-gathering progressed, harvest loomed as a hard deadline. I'd drive past cornfields on my way to work and watch the plants grow as a kind of real-world progress bar. We used the entire growing season for data gathering and testing, right up to the day before harvest in many fields. We're still processing all the imagery and test results from the season, and the application is out in the hands of many beta testers as we enhance its capabilities.

We're very excited about the capabilities of this application, and we believe it will provide a unique solution to common agricultural problems. If you would like to see it in action, or are interested in talking with us about the use of this product or technology, feel free to contact us.

To read more about the application, please visit the website at agriculturalintel.com.