How long does it take to change the data, part I: confidence

A few days ago, I asked the following question in the Mashcat Slack: “if you’re a library data person, what questions do you have to ask of library systems people and library programmers?”

Here is a question that Alison Hitchens asked based on that prompt:

I’m not sure it is a question, but a need for understanding what types of data manipulations etc. are easy peasy and would take under hour of developer time and what types of things are tricky — I guess an understanding of the resourcing scope of the things we are asking for, if that makes sense

That’s an excellent question – and one whose answer heavily depends on the particulars of the data change needed, the people requesting it, the people who are to implement it, and the tools that are available.  I cannot offer a magic box that, when fed specifics and given a few turns of its crank, spits out a reliable time estimate.

However, I can offer up a point of view: asking somebody how long it takes to change some data is asking them to take the measure of their confidence and of their constraints.

In this post I’ll focus on the matter of confidence.  If you, a library data person, are asking me, a library systems person (or team, or department, or service provider), to change a pile of data, I may be perfectly confident in my ability to do so.  Perhaps it’s a routine record load that for whatever reason cannot be run directly by the catalogers but for which tools and procedures already exist.  In that case, answering the question of how long it would take to do it might be easy (ignoring, for the moment, the matter of fitting the work onto the calendar).

But when asked to do something new, my confidence could start out being quite low.  Here are some of the questions I might be asking myself:

Am I confident that I’m getting the request from the right person?  Am I confident that the requester has done their homework?

Ideally, the requester has the authority to ask for the change, knows why the change is wanted, has consulted with the right data experts within the organization to verify that the request makes sense, and has ensured that all of the relevant stakeholders have signed off on the request.

If not, then it will take me time to either get the requester to line up the political ducks or to do so myself.

Am I confident that I understand the reason for the change?

If I know the reason for the change – which presumably is rooted in some expected benefit to the library’s users or staff – I may be able to suggest better approaches.  After all, sometimes the best way to do a data change is to change no data at all, and instead change displays or software configuration options.  If data does need to be changed, knowing why can make it easier for me to suss out some of the details or ask smarter questions.

If the reason for the change isn’t apparent, it will take me time to work with the requester and other experts and stakeholders until I have enough understanding of the big picture to proceed (or to be told to do it because the requester said so – but that has its own problems).

Am I confident that I understand the details of the requested change?

Computers are stupid and precise, so ultimately any process and program I write or use to effect the change has to be stupid and precise.

Humans are smart and fuzzy, so to bring a request down to the level of the computer, I have to analyze the problem until I’m confident that I’ve broken it down enough. Whatever design and development process I follow to do the analysis – waterfall, agile, or otherwise – it will take time.

Am I confident in the data that I am to change?

Is the data to be changed nice, clean and consistent?  Great! It’s easier to move a clean data set from one consistent state to another consistent state than it is to clean up a messy batch of data.

The messier the data, the more edge cases there are to consider, the more possible exceptions to worry about – the longer the data change will take.
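One rough way to gauge how messy a field is before promising anything is to count the distinct "shapes" its values take. The following is a minimal sketch, not a prescribed method: the function names and the sample date strings are hypothetical, and real data would come from an export or a query rather than a literal list.

```python
import re
from collections import Counter


def shape(value: str) -> str:
    """Reduce a value to a coarse pattern: digits become 9, letters become A."""
    pattern = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "A", pattern)


def profile(values):
    """Count how many values share each pattern, most common first."""
    return Counter(shape(v) for v in values).most_common()


# Hypothetical sample of a date-like field pulled from some records.
sample = ["2015-03-01", "2015-04-17", "03/01/2015", "c2015", ""]

for pattern, count in profile(sample):
    print(f"{count:>3}  {pattern!r}")
```

A field that collapses to one or two patterns is probably clean; a long tail of patterns is a hint that the edge cases will dominate the time estimate.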

Am I confident that I have the technical knowledge to implement the change?

Relevant technical knowledge can include knowledge of any update tools provided by the software, knowledge of programming languages that can use system APIs, knowledge of data manipulation and access languages such as SQL and XSLT, knowledge of the underlying DBMS, and so forth.

If I’m confident in my knowledge of the tools, I’ll need less time to figure out how to put them together to deal with the data change.  If not, I’ll need time to teach myself, enlist the aid of colleagues who do have the relevant knowledge, or find contractors to do the work.

Am I confident in my ability to predict any side-effects of the change?

Library data lives in complicated silos. Sometimes, a seemingly small change can have unexpected consequences.  As a very small example, Evergreen actually cares about the values of indicators in the MARC21 856 field; get them wrong, and your electronic resource URLs disappear from public catalog display.

If I’m familiar with the systems that store and use the data to be changed and am confident that side-effects of the change will be minimal, great! If not, it may take me some time to investigate the possible consequences of the change.
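To make the 856 example concrete, here is a small sketch of a pre-flight check that flags fields whose indicators might keep a URL from displaying. It is only an illustration: the dictionary-based record layout, the function name, and the set of "displayable" indicator pairs are all assumptions made for the example, and the real values should be confirmed against the system in question.

```python
# Assumption: the (ind1, ind2) pairs that the catalog will display.
# These are illustrative placeholders; check your own system's rules.
DISPLAYABLE_856 = {("4", "0"), ("4", "1")}


def risky_856_fields(fields, displayable=DISPLAYABLE_856):
    """Return the 856 fields whose indicators fall outside the displayable set."""
    return [
        f for f in fields
        if f["tag"] == "856" and (f["ind1"], f["ind2"]) not in displayable
    ]


# Hypothetical record, represented as a plain list of field dictionaries.
record = [
    {"tag": "245", "ind1": "1", "ind2": "0", "value": "Example title"},
    {"tag": "856", "ind1": "4", "ind2": "0", "value": "http://example.org/a"},
    {"tag": "856", "ind1": " ", "ind2": " ", "value": "http://example.org/b"},
]

flagged = risky_856_fields(record)
for field in flagged:
    print("check indicators:", field["value"])
```

Running a check like this over the output of a proposed change, before loading it, is one inexpensive way to catch a side-effect while it is still hypothetical.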

Am I confident in my ability to back out of the change if something goes wrong?

Is the data change difficult or awkward to undo if something is amiss?  If so, it presents an operational risk, one that I can mitigate only by spending more time on planning and test runs.
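One common form a test run can take is a transaction that gets rolled back. The sketch below uses Python's standard sqlite3 module with an in-memory database purely for illustration; the table, the column names, and the apply_change helper are hypothetical, and a production system would have its own DBMS and its own backup procedures. Note that a rollback only protects the change until it is committed; undoing it afterwards still requires a backup or a reverse script.

```python
import sqlite3


def apply_change(conn, sql, params=(), expected_max=None, dry_run=True):
    """Run one UPDATE in a transaction; roll back on a dry run or a surprise.

    Returns (rows_changed, committed).
    """
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        changed = cur.rowcount
        too_many = expected_max is not None and changed > expected_max
        if dry_run or too_many:
            conn.rollback()
            return changed, False
        conn.commit()
        return changed, True
    except sqlite3.Error:
        conn.rollback()
        raise


# Hypothetical table standing in for real item data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, location TEXT)")
conn.executemany(
    "INSERT INTO items (location) VALUES (?)",
    [("STACKS",), ("STACKS",), ("REF",)],
)
conn.commit()

update = "UPDATE items SET location = ? WHERE location = ?"

# Test run: see how many rows would change, then roll back.
dry = apply_change(conn, update, ("MAIN", "STACKS"))

# Real run: commit only if the row count is within what was expected.
real = apply_change(conn, update, ("MAIN", "STACKS"), expected_max=5, dry_run=False)
print(dry, real)
```

The expected_max guard is a design choice rather than a requirement: it turns "that touched far more rows than I thought" from a post-mortem finding into an automatic rollback.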

Am I confident that I know how often requests for similar data changes will be made in the future?

If the request is a one-off, great! If the request is the harbinger of many more like it – or looks that way – I may be better off writing a tool that I can use to make the data change repeatedly.  I may be even better off writing a tool that the requester can use.

It may take more time to write such a tool than it would to just handle the request as a one-off, in which case it will take time to decide which direction to take.

Am I confident in the organization?

Do I work for a library that can handle mistakes well?  One that is able to roll with the punches if the data change turns out to be misguided?  Or do I work for an unhealthy organization where a mistake means months of recriminations? Or where the catalog is just one of the fronts in a war between the public and technical services departments?

Can I expect to get compensated for performing the data change successfully? Or am I effectively being treated as if I were the stupid, over-precise computer?

If the organization is unhealthy, I may need to spend more time than ought to be necessary to protect my back – or I may end up spending a lot of time not just implementing data changes, but data oscillations.

The pattern should be clear: part of the process of estimating how long it might take to effect a data change is estimating how much confidence I have about the change.  Generally speaking, higher confidence means less time would be needed to make the change – but of course, confidence is a quality that cannot be separated from the people and organizations who might work on the change.

In the extreme – but common – case, if I start from a state of very low confidence, it will take me time to reach a sufficient degree of confidence to make any time estimate at all.  This is why I like a comment that Owen Stephens made in the Slack:

Perhaps this is part of the answer to [Alison]: Q: Always ask how long it will take to investigate and get an idea of how difficult it is.

In the next post, I discuss how various constraints can affect time estimates.

CC BY-SA 4.0 How long does it take to change the data, part I: confidence by Galen Charlton is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.