In this month’s feature column, Morgan Hughes grapples with the complexities of data management…
Remember last month when I said that we’d installed static detectors at two sites and it was going really well?…The first two weeks of data were interesting, and I’ve now been back to Site A and collected the remainder of the recordings for the trial month, which I’m analysing this morning.
At Site A, we’ve now started picking up brown long-eared (BLE) bats — we know that this is an important night roosting and swarming site; our mist net catch in 2017 comprised predominantly male BLEs and Daubenton’s, with a week’s worth of monitoring in autumn showing social calls of both species within the mine. See below.
The recorder at Site B has been happily sitting there recording for the whole month, with its microphone dangling at the end of a 50m extension cable, suspended above the mine entrance, which is otherwise inaccessible. I chose the Anabat Swift because it will record for a month on a set of eight AA batteries. So, to my surprise, when we went to look at the data we found that recording had stopped after a week — both of the 64GB memory cards were full!
This presents a problem in a few ways. Firstly, this is a LOT more data than I anticipated compared with Site A. (Ok, Site B is bigger, more isolated, with a bigger species list, and we know it is a huge swarming site for a number of species, so perhaps I was naive in thinking the recording capture rates would be similar.) Extrapolating into the future, 128GB every week is 512GB per month, which is a whopping 6 terabytes (TB) of data per year — keeping in mind that November–March activity will be very low, but we can expect triggers to double or triple in the swarming season, so perhaps more like 5TB per year from this bat detector alone. Over the course of the four-year project, that is 20TB per static detector, and we anticipate that we’ll have three running at all times. Where were we going to store 60TB of data? How were we going to get it to that storage? I don’t have anywhere near a free terabyte on my laptop. The transfer time alone is astronomical, let alone analysis: how are we going to analyse that many bat calls?
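For anyone who wants to check my arithmetic, here it is as a quick Python sketch. The rates are the working estimates from the paragraph above (not measured values), and the 5TB/year figure is my own rough adjustment for quiet winters versus busy swarming seasons:

```python
# Storage projection for one static detector, using the estimates above.
gb_per_week = 128                        # two 64GB cards filled in a week
gb_per_month = gb_per_week * 4           # 512 GB per month
gb_per_year = gb_per_month * 12          # 6144 GB, i.e. roughly 6 TB

# Quiet winter (Nov-Mar) offset by a busier swarming season:
# call it ~5 TB per detector per year as a working figure.
tb_per_detector_per_year = 5

project_years = 4
detectors = 3
per_detector_total = tb_per_detector_per_year * project_years   # 20 TB
project_total = per_detector_total * detectors                  # 60 TB

print(gb_per_year, per_detector_total, project_total)   # 6144 20 60
```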
After purchasing 128GB memory cards, I consulted with colleagues in the bat group, on Twitter, Facebook and LinkedIn, to pick the brains of people who had no doubt encountered this issue before. The answer, most feel, is to record only in Zero Crossing (ZC). What is ZC, and why is it inferior to the full spectrum we currently use? Basically, ZC files contain less information (ergo they are smaller). A LOT less information. Quiet calls may not be detected, and sound analysis is trickier, as you can tell straight away from the image below:
As much as it pains me to do this, I’m not planning on undertaking detailed call analysis at these sites – I want to ID species (at least to genus) and plot the activity of those species over the course of 12 months; ZC will do that for me. The 256GB of memory will be enough to collect a month of ZC data, even at a busy site, though I may still need a couple of 1TB portable hard drives for transporting the data (one for site staff to put the data onto while I have the other one at home for analysis).
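For the curious, the size difference follows from first principles: full spectrum stores every audio sample, while ZC stores only the points where the waveform crosses zero (thinned by a division ratio). Here’s an illustrative back-of-the-envelope sketch — the sample rate, division ratio and bytes-per-point are assumptions for the sake of the example, not the Swift’s actual file format:

```python
import math

SAMPLE_RATE = 500_000      # Hz, assumed full-spectrum sampling rate
BYTES_PER_SAMPLE = 2       # 16-bit audio
DIV_RATIO = 8              # assumed ZC division ratio
BYTES_PER_ZC_POINT = 2     # assumed storage per retained crossing

# One second of a pure 45 kHz tone, standing in for a bat call.
signal = [math.sin(2 * math.pi * 45_000 * n / SAMPLE_RATE)
          for n in range(SAMPLE_RATE)]

# Count sign changes, i.e. zero crossings (~90,000 for a 45 kHz tone).
crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a >= 0) != (b >= 0))

full_spectrum_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE          # 1,000,000
zc_bytes = (crossings // DIV_RATIO) * BYTES_PER_ZC_POINT      # ~22,500

print(full_spectrum_bytes // zc_bytes)   # -> 44: dozens of times smaller
```

The exact ratio depends on the settings, but the principle holds: you trade amplitude and harmonic detail for files that are a small fraction of the size.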
All of this is actually quite timely, as I’m currently reading a book on data management (fascinating bedtime reading, I know!), in anticipation of the influx of different types of data. Even before visiting Site B, I had been drawing out the bare bones of a Data Management Plan, to cover what data I anticipate getting, what the workflow will be, how it will be stored, file naming systems, etc. But it also needs to cover stuff like data ownership, which is a surprisingly tricky area. Here’s an example:
Katy does a bat survey on a voluntary basis on public land, using bat group equipment that was paid for by our funders (HLF and BCT) but which is contributing to my PhD, so the University will have a stake in anything that contributes to published material. Katy then sends me the sound files and I identify the bats as Noctules and submit the record to Sam, our Records Officer, who then submits it to EcoRecord, our local records centre (LRC). So, we have two pieces of data: the sound recording and the biological record.
Who owns the data? It’s complicated enough with the sound file, but when you then add in the fact that volunteer records are submitted to the bat group, who then submit all their records to the LRC as part of a service-level agreement, things get very complex indeed:
So, we need a data sharing policy, to which all parties are signed up, agreeing data flow, ownership, access and legacy. All very complicated stuff and this is what I’m working on now, during the maternity season. We are, of course, continuing to do traditional surveys – we are undertaking emergence surveys of farm buildings and churches in the landscape surrounding Site A, which has been a great opportunity to bring in some of our new volunteers and give them some old-school bat detector and field skills training.
Coming up, we have more emergence surveys followed by novel activity surveys in July – we’ve also had some awesome news about a grant! So, watch this space for updates at the end of July when we’re gearing up for the start of summer trapping season! In the meantime, I’ll be reading about data management…
Figures: ©Morgan Hughes
About the Author: Morgan Hughes is an ecologist specialising in bats and badgers. After growing up in south Florida, she moved to the UK, where she studied Physical Geography and Biological Recording. She currently lives and works in the West Midlands and serves as Chair and Advanced Surveys Coordinator for the Birmingham and Black Country Bat Group. You can find her blog at www.thereremouse.com and follow her on Instagram and Twitter, where you’ll see updates on her survey work.