Deciphering Big Data in Water

Before 2011, the U.S. National Library of Medicine’s PubMed Central had only seven articles and one grant recorded on the field of big data and water. Ten years later, those numbers have grown to over 8,249 articles and over 1,190 National Institutes of Health (NIH) grants. The volume of articles has almost doubled every year since 2014, and the number of NIH grants has grown similarly, making big data a dominant academic paradigm in the water space.

The rapidly growing interest around big data in water is not surprising; the possibilities big data has for water-related work and research could make U.S. water systems more sustainable both in agriculture and at home. For example, big data is being used to identify leaks in metropolitan water systems, track water quality problems in rivers and help farmers use less water more effectively for their crops. Some researchers hope that big data will soon be able to help people — both in agriculture and in metropolitan areas — better use water in the face of climate change.

Big data is the proverbial raw material that water researchers and water managers use to derive models, look for trends and train artificial intelligence (AI). It is necessary for making predictive tools about water.

In short, big data is the grease that makes digital farming, adaptive resource management and smart water systems run.

Defining big data

Despite the growing interest in big data in water, there are some challenges related to the field, one of the most fundamental being its definition.

Researchers are making the flood of information work for water management in a changing future

Very simply, big data is what its name suggests: a lot of data. However, exactly how much data it takes for a body of data to count as “big” is a moving target because it has traditionally been a conditional definition.

“I tend to say that if you can manage it on your laptop and work with it, that’s not big data,” said Saurav Kumar, Ph.D., Texas A&M AgriLife Research assistant professor at the El Paso AgriLife Research and Extension Center, echoing a common definition of big data being a data set too big to work with on standard computing devices. Because the processing power of available computing devices continually expands, what counts as “big” also expands.

Robert Mace, Ph.D., executive director of the Meadows Center for Water and the Environment at Texas State University, also described big data in terms of the specialized tools needed to manage and work with it.

“When I hear the words, ‘big data,’ what I see is large databases and tools that are used to extract actionable information out of those large databases,” Mace said. He additionally explained that those tools can be anything from visualizing programs to custom-written algorithms to already-trained AIs.

Despite the fluidity of the “big” part of its definition, big data has some specific characteristics that differentiate it from ordinary data. Though sources vary on the core definition of big data, most agree on three main features: volume, variety and velocity.

Big data comes in large volumes, from a wide variety of sources, rapidly or semi-constantly. For example, Kumar explained that one of his drones equipped with sensors can collect terabytes of data about evapotranspiration over an agricultural field. Sensing equipment can also collect a wide variety of data. For example, a network of water sensors set up in a river might collect information on flow rate, water pH, dissolved solids and oxygen, and temperature. In terms of velocity, some sensing tools used in smart metering in large cities can collect water usage data as often as every few seconds, resulting in an almost constant flow of data.
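To put the velocity figure in perspective, a rough back-of-the-envelope calculation helps. The sketch below counts how many readings a meter produces per day at a given reporting interval. The five-second interval and the 100,000-meter city are assumed values chosen for illustration, not figures from any utility.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def readings_per_day(interval_seconds: float) -> float:
    """Number of readings one meter produces in a day at a fixed interval."""
    return SECONDS_PER_DAY / interval_seconds


def citywide_readings_per_day(interval_seconds: float, meter_count: int) -> float:
    """Total daily readings for a whole (hypothetical) fleet of meters."""
    return readings_per_day(interval_seconds) * meter_count


if __name__ == "__main__":
    # Assumed values: a reading every 5 seconds, 100,000 meters in the city.
    per_meter = readings_per_day(5)
    citywide = citywide_readings_per_day(5, 100_000)
    print(f"One meter: {per_meter:,.0f} readings per day")
    print(f"Whole city: {citywide:,.0f} readings per day")
```

Under those assumptions a single meter yields 17,280 readings a day, and the hypothetical city yields more than 1.7 billion, which is why storage and processing tools quickly become part of the conversation.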

Challenges of working with big data

All that volume, variety and velocity makes gathering, cleaning and managing big data challenging. Mace described working with big data as being “messy” a lot of the time.

“There are warts in there,” he said. “There can be errors. Somebody without a lot of experience might assume it is correct and not realize that there’s a healthy error bar around it. Some of it might be flat out inaccurate, because, for example, a driller made something up.”

So-called “dirty data” requires data cleaning on a big scale. That involves correcting or removing incomplete, corrupted, incorrect or improperly formatted data. At the scale of big data, specialized tools — usually in the form of software and special skills — are needed.
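As a rough illustration of what cleaning involves, the sketch below filters a handful of river sensor rows, dropping those that are incomplete, improperly formatted or physically implausible. The field names (`site`, `ph`, `temp_c`) and the plausibility limits are assumptions made for this example. Real cleaning pipelines are far more involved and often correct values rather than simply discarding them.

```python
import math


def clean_readings(raw_rows):
    """Return (cleaned_rows, rejected_count) for a list of raw sensor dicts."""
    cleaned = []
    rejected = 0
    for row in raw_rows:
        try:
            # Improperly formatted or missing values fail here.
            ph = float(row["ph"])
            temp_c = float(row["temp_c"])
        except (KeyError, TypeError, ValueError):
            rejected += 1
            continue
        # float("nan") parses without error, so check for it explicitly.
        if math.isnan(ph) or math.isnan(temp_c):
            rejected += 1
            continue
        # Assumed plausibility limits: pH scale 0-14, water between -5 and 50 C.
        if not (0 <= ph <= 14) or not (-5 <= temp_c <= 50):
            rejected += 1
            continue
        cleaned.append(
            {"site": str(row.get("site", "")).strip(), "ph": ph, "temp_c": temp_c}
        )
    return cleaned, rejected


if __name__ == "__main__":
    rows = [
        {"site": " A ", "ph": "7.2", "temp_c": "21.5"},  # good row
        {"site": "B", "ph": "n/a", "temp_c": "20"},      # unparseable pH
        {"site": "C", "ph": "19", "temp_c": "20"},       # impossible pH
        {"site": "D", "temp_c": "20"},                   # missing pH
    ]
    good, bad = clean_readings(rows)
    print(good, bad)
```

Counting the rejected rows, rather than silently dropping them, keeps the “warts” Mace describes visible to whoever uses the cleaned data.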

Once cleaned, big data must also be structured to be useful to researchers or end users. Structuring can turn a “data lake” — a large collection of usually unstructured data from a variety of sources and likely different formats — into a “data warehouse” that is easy to navigate to find what is needed rapidly.

“These datasets are coming as a large volume of complex data in different formats, different velocities and heterogeneities of veracity,” said Binayak Mohanty, Ph.D. — Texas A&M University Regents Professor and College of Agriculture and Life Sciences Chair in Hydrologic Engineering and Sciences — speaking of the big data created by water sensing technologies.

“All of this together makes the challenge more complex and the issues more demanding, but at the same time, maybe there are more opportunities for the future. How can we organize all the data, bring it together so people can have a platform to find it, access it, interpret it and use it?”

Using big data in agriculture

Big data is both an outcome of other tools (usually sensing technologies like water monitors, flow meters, drones equipped with sensing tools, soil moisture monitors and so on) and a tool in and of itself. Most fundamentally, big data is necessary for making informed, reliable predictions about the future that can translate into real-world actions and improve how water is used and managed.

In agriculture, big data is necessary for digital agriculture, which is the use of data and data-based tools like AI, the internet of things and model-based predictive tools to optimize agricultural operations. The more familiar phrase “precision agriculture” is part of the wider concept of digital agriculture.

Scientific irrigation scheduling tools are among the water-related strategies that depend on big data, according to Ali Ajaz, Ph.D., Texas A&M AgriLife Extension program specialist at the Texas Water Resources Institute (TWRI), who has been studying the adoption of such technologies along the Rio Grande. These tools usually combine data from soil moisture sensors with weather data and then process it using AI to help farmers make targeted irrigation decisions for their crops and make better use of their water resources.

“Irrigation scheduling is a science where we are trying to find out how much water is required at what time to achieve a certain crop yield goal,” Ajaz explained. He added that for the majority of the Rio Grande growers who have adopted scientific irrigation scheduling technology, the biggest factor that pushed them to do so was the desire to maintain the quality of their land.

“Being informed about the water application on your farm has been a link in the chain of sustainable ag.”
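The “how much water at what time” question can be illustrated with a very simple daily soil water balance, sometimes called a checkbook method. This is a swapped-in textbook technique, not the AI-based tools described above, and every number in it (the crop water use figures, rainfall and allowable deficit) is a made-up value for illustration.

```python
def schedule_irrigation(daily_et_mm, daily_rain_mm, allowable_deficit_mm):
    """Track a running soil water deficit and flag days that need irrigation.

    daily_et_mm: estimated crop water use per day, in millimeters.
    daily_rain_mm: rainfall per day, in millimeters.
    allowable_deficit_mm: deficit at which irrigation is triggered (assumed).
    Returns a list of (day_index, deficit_to_replace_mm) tuples.
    """
    deficit = 0.0
    plan = []
    for day, (et, rain) in enumerate(zip(daily_et_mm, daily_rain_mm)):
        # Crop water use deepens the deficit and rain reduces it.
        # The deficit cannot go below zero (extra rain is treated as lost).
        deficit = max(0.0, deficit + et - rain)
        if deficit >= allowable_deficit_mm:
            plan.append((day, deficit))
            deficit = 0.0  # assume irrigation fully refills the root zone
    return plan


if __name__ == "__main__":
    et = [5, 5, 5, 5, 5, 5]    # hypothetical crop water use, mm per day
    rain = [0, 0, 8, 0, 0, 0]  # hypothetical rainfall, mm per day
    print(schedule_irrigation(et, rain, allowable_deficit_mm=12))
```

In practice, soil moisture sensor readings would be used to correct the running deficit, which is where the sensor data the growers collect comes in.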

Even if growers aren’t yet using all scientific irrigation scheduling tools, Kumar said some use parts of the suite of tools, like soil moisture sensors. Kumar described one pecan farmer he worked with who had created a large data processing hub for himself where he collected data from his moisture sensors and generated graphs to better visualize the data to inform his irrigation plans.

Though not all irrigators have the same big data do-it-yourself drive as Kumar’s example pecan farmer, Kumar said that in his experience many farmers use information from soil moisture sensors.

“My impression was that he was an outlier, but no! He’s not that unique. There are other growers who have soil moisture sensors, at least, and use that data to time their irrigation very well.”

Other examples of using big data related to water in agriculture include the use of field imaging and similar remote sensing technologies. These efforts usually involve drones to determine what portions of a field need to be irrigated, either visually or through the rate of evapotranspiration over the field. Such tools generate massive amounts of data that can then be used to identify patterns. Apps and other services can use that data to produce usable tools for irrigators. Work on one such app was ongoing, funded in part through a TWRI-administered Water Seed Grant.

Saurav Kumar, Ph.D., prepares his drone for flight

Using big data in water resources management

When it comes to big data in water management, Kumar gave the example of El Paso Water, El Paso’s water utility, having an issue with the taste and smell of the water coming into the city.

“They spend a lot of money just to improve those characteristics because people often complain,” he said.

The taste and odor issues with El Paso water often stem from algae blooms in New Mexican reservoirs along the Rio Grande, the source of the city’s water. These algae blooms can result from warm water temperatures in summer coupled with high levels of nutrients in the water. While the water’s musty taste and smell is a nuisance, it is not a safety issue.

Using big data could give El Paso Water enough advance notice to address the issue before it becomes a nuisance.

“What they are planning to do is install sensors near the Elephant Butte reservoir, which is several hundred miles away,” Kumar said. He explained that the data from the sensors could show the patterns of water conditions upstream that can result in an algae bloom.

“That way El Paso Water can know when they are likely to have issues with taste and smell about three days in advance and can plan their mitigation efforts,” Kumar said.

Using big data in municipal water use

Just as growers are using digital agriculture, municipalities have been tapping into the growing potential of smart water metering. Smart water metering, part of the advanced metering infrastructure concept, allows water utilities and municipal water users to better monitor overall water system efficiency and identify problems with the system. In its early stages, smart water metering allowed utilities to monitor water use more closely for billing purposes and to read water meters more easily at the household level. More recent advances in smart metering have extended near-real-time information to water users on their household-level usage.

Not only does smart metering like this generate big data, that data is required for current and future tools to help utilities and users alike to better understand and manage their usage.

Mace gave his participation in Pecan Street as an example. Pecan Street is a nonprofit research and development organization founded in 2009 with seed funding from the University of Texas at Austin.

“They do detailed monitoring of energy use and water consumption,” he explained. “It’s difficult to measure water use at, say, the individual sink level, but there’s long-standing research that the whole-house signal can be decomposed into what’s irrigation, what’s a shower, what’s a washing machine and so on. Different uses have different volume and duration profiles that can be identified with this level of monitoring.”
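The idea of telling uses apart by their volume and duration profiles can be sketched in a few lines. The example below splits a whole-house flow series into separate events and labels each one with simple rules. The thresholds and labels are hypothetical, the code is not Pecan Street’s method, and it ignores overlapping uses, which real disaggregation research has to handle.

```python
def split_events(flow_lpm, interval_s):
    """Split a flow series into events of continuous non-zero flow.

    flow_lpm: flow readings in liters per minute, one per interval.
    interval_s: seconds between readings.
    Returns a list of (duration_seconds, volume_liters) tuples.
    """
    runs = []
    current = []
    for flow in flow_lpm:
        if flow > 0:
            current.append(flow)
        elif current:
            runs.append(current)
            current = []
    if current:  # series ended while water was still running
        runs.append(current)

    events = []
    for samples in runs:
        duration_s = len(samples) * interval_s
        # Each sample is a rate held for interval_s seconds.
        volume_l = sum(samples) * interval_s / 60
        events.append((duration_s, volume_l))
    return events


def label_event(duration_s, volume_l):
    """Label an event using hypothetical duration and volume thresholds."""
    if duration_s >= 1200 and volume_l >= 200:
        return "irrigation"
    if duration_s >= 240 and volume_l >= 30:
        return "shower"
    return "other"


if __name__ == "__main__":
    # Hypothetical 10-second readings: a 5-minute shower, then a brief tap use.
    series = [0] + [8] * 30 + [0, 0] + [4] * 3 + [0]
    for duration, volume in split_events(series, interval_s=10):
        print(duration, volume, label_event(duration, volume))
```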

Mace and his wife use Pecan Street’s water monitoring technologies at their home and keep track of their water usage with the organization’s tracking app. The smart meter sends usage data every 4-10 seconds. They were able to use this information to identify when a pipe burst under their house. They were also able to use the information to solve a water mystery.

“I was able to use that meter to show that I was having a mysterious use sometimes during the day when my wife and I were at work,” he said. “We realized that the neighbor was using our outdoor faucet to wash his car.”

Some utilities are rolling out similar big data-creating smart water meters and big data-dependent tools for residents. The City of Fort Worth started offering the MyH2O program in 2019, which includes smart meters that collect data every hour. An online portal allows municipal water users to track their water usage from the day before. Users can also set automated alerts on their account so that they are notified by text if their water has been running without stopping, suggesting a leak, or if their water bill is approaching a set amount. Austin’s My ATX Water program, launched in 2020, is very similar, and other Texas cities like Allen, Abilene and Mesquite have recently announced smart water metering projects.
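A continuous-use alert of the kind described above can be sketched with hourly readings. The logic below flags an account when every hour in a run of a given length shows some water use. The 24-hour window is an assumed setting for illustration and is not taken from any utility’s program.

```python
def continuous_use_alert(hourly_gallons, window_hours=24):
    """Return True if water use never dropped to zero for window_hours in a row.

    hourly_gallons: one usage reading per hour, oldest first.
    window_hours: length of uninterrupted use that suggests a leak (assumed).
    """
    run = 0
    for gallons in hourly_gallons:
        # Any hour with zero use resets the count of consecutive hours.
        run = run + 1 if gallons > 0 else 0
        if run >= window_hours:
            return True
    return False


if __name__ == "__main__":
    leaking = [2.0] * 30                     # water never stops
    normal = [3.0] * 16 + [0.0] * 8 + [3.0]  # overnight hours with no use
    print(continuous_use_alert(leaking))
    print(continuous_use_alert(normal))
```

The reasoning is that a typical household has at least some hours, usually overnight, with no use at all, so a full day without a zero reading is worth a second look.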

Climate change and the future of water’s big data

Big data can also help water users of all kinds adapt to the uncertain future of climate change. However, climate change can make it hard to understand big data.

“Based on climate change, our future is going to be even more nonstationary,” Mohanty said, explaining that “nonstationary” in this context means that what we know from the past may not hold true for the future.

“Basically what is happening here is the data is generated as the water is moving; it is a continuous cycle,” Mohanty added. “As soon as water hits the ground in the form of rain, it moves through the soil water systems, then it joins the groundwater. Or surface runoff goes to the rivers and eventually goes to the ocean and is recycled back through evaporation and clouds and begins the cycle again.

“But these processes are continuously changing because of climate change. So we need to understand these physical processes are happening and also how the parameters that are driving the physics are changing with time.”

He said that researchers need to figure out how to link knowledge of the physical world with the data. He described the physical side of water issues as things like hydrology, engineering, the understanding of how contaminants or nutrients move in a water system, plant biology and physiology, and atmospheric science. The data side includes disciplines like mathematics, statistics, computer and data science, and electrical engineering. This transdisciplinary approach will allow researchers to understand the systems more deeply and make better predictions about the future, he said.

“When you have these datasets that are coming from different space and time resolutions and they are evolving and changing because of climate change, you need to have a very good understanding of how to fuse them,” Mohanty said.
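One small, concrete piece of that fusion problem is aligning datasets recorded at different time resolutions. The sketch below averages sub-daily soil moisture readings to daily values and joins them with daily rainfall totals. The variable names and sample values are invented for illustration, and this simple averaging says nothing about the physics-informed fusion Mohanty describes.

```python
from collections import defaultdict
from statistics import mean


def daily_means(timestamped_values):
    """Average (iso_timestamp, value) pairs by calendar day.

    Timestamps are assumed to be ISO 8601 strings such as "2024-06-01T12:00",
    so the first ten characters give the date.
    """
    by_day = defaultdict(list)
    for timestamp, value in timestamped_values:
        by_day[timestamp[:10]].append(value)
    return {day: mean(values) for day, values in by_day.items()}


def fuse_by_day(soil_moisture_readings, daily_rain_mm):
    """Join daily-averaged soil moisture with daily rainfall on shared dates."""
    moisture = daily_means(soil_moisture_readings)
    return [
        {"date": day, "soil_moisture": moisture[day], "rain_mm": daily_rain_mm[day]}
        for day in sorted(moisture)
        if day in daily_rain_mm  # keep only days present in both datasets
    ]


if __name__ == "__main__":
    readings = [
        ("2024-06-01T00:00", 0.20),
        ("2024-06-01T12:00", 0.30),
        ("2024-06-02T00:00", 0.40),
    ]
    rain = {"2024-06-01": 5.0, "2024-06-03": 1.0}
    print(fuse_by_day(readings, rain))
```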

“This means you have to have a fusion of the laws of nature — the physics, the chemistry, the biology — and bring them into the realm of computer science, big data, AI and machine learning,” he added.

“The different pieces of the puzzle are available, but how to connect them together and how to massage them to get the right answers to the right questions is the future for us to look into. We have to sit together and try to answer the questions in a transdisciplinary way.”

As communications manager for TWRI, Kerry Halladay provided leadership for the institute's communications. As a strategic coordinator, she served as liaison between AgriLife's Marketing and Communications department and the client groups: TWRI, the Natural Resources Institute, the Norman Borlaug Institute for International Agriculture, and the Institute for Infectious Animal Diseases.