Human in the loop #1306
Replies: 3 comments
@remi-braun thanks for the comments, questions, and frankly for noticing the changes to xarray-spatial. Happy to answer them one by one. Also, thanks for your thoughtful comments on the hillshade discrepancies from a while back on issue #748. I don't mean to distract...but let's start with another question first...why did it take so long for me / others to fix issue #748, which you were involved in? The issue was filed in 2022 by @thuydotm, in 2023 I was triaging issues but something must have distracted me...in 2024 you posted an excellent comment...in 2026 fixes were made, partly using help from Claude, and we merged a fix. This is NOT an ideal timeline from problem to solution to release!! The reason for such a timeline, as you pointed out, is that xarray-spatial stalled for a while. It stalled because I personally was unable to support the project due to a few things:
This gives you a bit of context about my perspective on why the project stalled. Let me offer a personal apology that your contribution to the project wasn't better integrated and integrated sooner.
You're welcome, but I added the Claude files so I myself could share them on the different machines for testing, not transparency. It's also nice that other contributors can use them and I was also excited about hearing the military / US government banned Claude...
Yes, there is a human in the loop, and it's me. Nope, xarray-spatial is not a "testing repo"...although any software endeavor is going to "test" the ability of its contributors, human or agent, to accomplish goals in its given domain.
Is it fair if I rephrased the question as: "Why reimplement things that already have implementations folks use from Python, and where do you find the motivation to do so?" I wanted a geospatial stack in Python that doesn't involve GDAL because I have personally found GDAL difficult to install and extend. The world owes GDAL a huge "thank you", and I'm not saying that "write once, and wrap for all other languages" is a bad idea...but I wanted a library for geospatial Python that didn't delegate heavy lifting to C/C++ extensions. I wanted a toolbox of geospatial "ufuncs" that didn't create new data structures (e.g. Let's be clear, there are redundancies that can't simply be explained by, "I don't like GDAL". xrspatial.hydro overlaps with pysheds...a great library with numba integration, but one that lacks focus on xarray, cupy or dask. Mapclassify is awesome, but focuses more on vector data than raster...and then there are many libraries that do similar things, but I don't like their spellings / interfaces. Zonal tools would be a good example of that. There may be many other popular tools I'm unfamiliar with that make features of xarray-spatial redundant, but it's not my goal to duplicate for duplication's sake. To address your comment, "...moreover on your own"...yes, it's hard. I don't have institutional backing and I contribute a lot of unpaid personal time, but I do love it again now and want to increase momentum.
There is no "exact" scope, but I do have some guiding rules:
I don't fully understand this question. Do you feel the library asserts a claim of "geospatial completeness" or just "geospatial raster completeness"? My hope is neither, because we haven't even made it to a 1.0.0 release yet. There is no stable API yet for xarray-spatial. I hope to complete a 1.0.0 release when all existing tools are fully validated for accuracy, performance, security and backend array parity (numpy, dask, cupy, dask+cupy).
Do you mean within the geospatial ecosystem, python, or xarray more specifically?
I'm not familiar with "bus number". Happy to learn though if you can provide more details. Claude doesn't scare me, but the issue #748 timeline does scare me. Core developers of popular libraries deciding to have more well rounded lives scare me too.
Not sure about "these issues", but the last few releases of xarray-spatial have been focused on security updates and accuracy updates. Each day I am usually able to run a few modules through security, performance, or accuracy sweeps. I've been finding many CRITICAL and HIGH category bugs and fixing them as fast as possible. Most of the recent frenetic pace has been fixing performance and security bugs...but there are many new features in 2026, and my hope is that "going fast fixes going fast".
Hello, First of all, thanks for the reply 😄 I totally understand your struggles and lack of time to maintain the library. Open source is far from easy and relies too often on the maintainer's personal dedication. This discussion really doesn't aim to destabilise you or whatever. However, as AI usage grows, I think these practical and ethical discussions need to be had.
Context
I had time to think, and maybe I have to explain my biggest discomforts with the new development of this library to add some context. I won't say that AI is not helping us all a lot in finding bugs, etc. The issues I want to talk about here are elsewhere.
Discussion
And now, back to the discussion we had. I totally understand your need for a GDAL-free geospatial library. However, this task may look like a Promethean one, right? I think the right way of achieving this should be to call on the community and ally with others, not to maintain the entire ecosystem on your own, and even less to rely so much on an AI. This is fragmentation, as there are surely other people elsewhere who have the same goal as you, maybe building things that are already recognized, but you are in effect bypassing them by using AI. This is also where the bus factor comes in. This concept, popular in open source, measures the "minimum number of team members that have to suddenly disappear from a project before the project stalls due to lack of knowledgeable or competent personnel". As you highlighted, it was more than one in the beginning when you had funds, and decreased to one when you lost them. But now it is even less than one, as you cannot state that you still 100% own this library: there have been too many changes for a human to carefully follow. I suppose that you don't suddenly have more time to dedicate to this library, right? So how can using a generative AI tool this much ensure this library is properly maintained? AI should be a copilot, not an autopilot. You could say, OK, but my library is very well tested, which seems true, so everything AI does is working. I wouldn't agree. In your tests, I don't see any real-world examples (at least in the flood and fire files), so how could I be sure your functions are working? I don't want to invest time and money to find out. These questions come from the fact that the trust implicitly existing between well-meaning humans has been broken. When a human created a library, it was legitimate to think the code was used elsewhere and tested in the real world, as people rarely do things for the beauty of it.
Now that it's AI-generated, I am not sure anymore: it could just be code for the sake of ecosystem completion (like you seem to push for in the other discussion thread), without any real-world testing. This is an illusion of completeness: in your docs you do a lot of things, but I am not sure it lives up to the reality. I am really skeptical about your statement: "going fast fixes going fast". I am more in favor of "take your time to go fast" 😄
Conclusion
Thanks for reading all this, now you know my position. Your library is very interesting in that respect and has started a lot of discussions with colleagues and other people I know. The questions it raises are very interesting ethically and philosophically 😉
@remi-braun Thanks for the follow-up, and I totally understand about the language barrier. Thanks for doing this in English. You present valid concerns. I think it will help both of us to avoid hyperbolic-ish language (e.g. "crazy", "extremely", "aims to do everything", "vibe coded"). Let's calibrate our words and make progress here. As an overall disclaimer: I'm a real human with actual constraints. Agentic workflows do change those constraints. I'm an open source project maintainer by self-selection. My opinions are fundamentally biased by my life experience.
When you use an LSP, is that you? What if the LSP suggested an out-of-date function parameter? Could you please set up a separate GitHub account for any code that used LSP autocomplete (joke)? When you use a spell-checker, is that "you" anymore...even though you are falsely portraying an inflated level of spelling consistancy? Did my typo in "consistancy" enhance this conversation because you now know I have a tendency to swap "a" for "e" when typing that word? I think we can agree that it is a matter of degree. Xarray-Spatial is moving too fast for your taste and you don't have the time or money to verify its claims. That is a fair position to take. I think one should verify the claims of libraries before deploying to prod, and you personally may not have bandwidth for that. I think we should be careful not to over-rely on human interaction in the context of mostly deterministic and verifiable toolsets. Have tests, keep testing, assume it doesn't work and keep testing.
This project lives within xarray-contrib org, but I don't receive any "backing" from Xarray or any official endorsement. I'm not sure about "implicit governance". As for "everything has changes", there have been few public api changes to existing tools besides adding new tools. Is there a module in particular, which changed during the last few months, you can cite as an example to strengthen your point here?
What policy change are you referring to? It sounds like you are implying that this project had an AI contribution policy that has now changed. It has never had such a policy, nor will I be spending time to invent one, similar to the way I didn't write the software LICENSE. Are there currently AI policies (that aren't your opinions) you can point to that you feel projects like xarray-spatial should adopt? I haven't been marketing new xarray-spatial features (blogs, YouTube videos, meetups, conferences) yet because I'm focused on first getting to a 1.0.0 release with the scope mentioned above.
I think open source contribution has changed and is still changing. Popular OS projects were flooded with AI PRs, some honest enhancements, some dishonest attempts to gain community clout. I found this PR within the Dask repo particularly frustrating because what I personally want is for the Dask project to evolve quickly, potentially at the price of stability (or perceived stability). If I could have anything for Dask, it would be @mrocklin with unlimited tokens, a case of Le Croix, and no interruptions. Xarray-Spatial, on the other hand, had ZERO slop PRs during the slopacolypse because the library simply was not a relevant player people wanted to enhance or gain clout from contributing to. I find "open-source spirit of community" to be too vague to be helpful here, similar to the "community-driven" issue label in EOReader. What are the community dynamics within EOReader that leave so many community-driven issues unfixed for years? EOReader hasn't closed a single community-driven issue in 2026?! Prove me wrong by fixing the issues listed in the link.
Each day, I choose whether or not I try to push xarray-spatial forward, but I can't make the decision for others. For the past two months, I've essentially been a lone developer, not explicitly by choice...I just don't have the active contributors the project once had. Do you feel that past contributors should not get any credit because it gives a false impression of a large team backing the project? EOReader has 14 contributors, some non-human, but from the outside it seems like it's just you...wrapping rasterio and xarray. You have 1600+ commits, and the next contributor is non-human, and the next has 14 commits. Why is EOReader different from your characterization of xarray-spatial regarding team size or longevity?
Thanks for acknowledging a need for a GDAL-free geospatial library. To plant a flag in the sand, xarray-spatial is not trying to duplicate GDAL or replace it. I just want to not require GDAL for some of my own common use cases and workflows while working on "real world" projects...I'm not there yet.
I'm not maintaining an entire ecosystem, just a raster toolbox, and hopefully others would want to help, but my primary objective is finishing a 1.0.0 scope as mentioned above. I agree that engaging with communities and having allies is important, but I think you are wrong about there being a "right way". If any "right way" exists, it's hiring experienced engineers and paying them well...like money...not GitHub stars. That is difficult because the market for software engineers has been distorted by the hiring practices of larger companies. To get specific, do the practices inside EOReader exemplify the standards for open source maintenance? Am I wrong that EOReader is primarily your own work and NOT "community driven"? I say that because it doesn't seem to have consistent community contribution, but does have consistent releases.
Why do you call having a raster toolbox free of GDAL a "Promethean" task, while also saying alternatives already exist? If you didn't wrap GDAL (i.e. rasterio) for EOReader, what would you wrap instead for GeoTIFF IO?
Ahh ok bus factor...how many people need to get run over by buses before a project can't be maintained...correct? I'm hoping to one day have a bus factor of NaN, but right now it is 1...because of my PyPI credentials I believe. That is a valid issue to raise and there has been discussion with xarray-contrib there.
Who pays and "who owns what" is a good question. Ownership here is laid out in the LICENSE file...but more fundamentally, when people contribute to open source, how do they still afford the basics of life? Some folks are academics for whom open source is a hobby, and they survive on the tuition of students or institutional endowments. Some contributors are independently wealthy. Some have private businesses backing them, and some are funded by taxpayer money. This stuff is difficult. I don't currently have the funds to allocate more than my own time to xarray-spatial, but I would love help from folks who can share a vision.
AI allows me to use smaller blocks of time in my fragmented schedule. In 2022, I needed large continuous blocks of uninterrupted time to get ANYTHING done. Being able to use a 25-minute block of time is a nonlinear productivity gain for me. Maybe you feel it's not true productivity. You feel differently about the tools and I respect your opinion.
Fine. How do you know the difference unless you really have your thumb on the pulse of a project? We all still have upper bounds to what we can contribute. Some days maybe things are moving too fast. Some days nothing happens.
I'm skeptical of my statement too... I think it's possible that as an industry, community, or whatever, we've lost the luxury of "slow is smooth, smooth is fast". Sorry to call out EOReader again, but there are potential critical security bugs there that any malicious actor can discover and weaponize against unsuspecting users of the library..., keyword "potential", because I myself don't have the necessary poisoned STAC catalog to exercise the vulnerability. So I forked EOReader and at least documented it in my fork: brendancol/eoreader#1 Do you think these issues are valid? They were discovered within 3 minutes of cloning eoreader and running a rudimentary security scan with Claude Code. Everyone's vulnerabilities are now on display for the world. Give a bored hacker or a government hacker an evening of focus and they could craft the necessary poisoned STAC catalog to exploit EOReader users. You mention the impact on trust of using AI tools? What about community trust in maintainers to stay on the edge of available tools and improve their libraries to avoid such critical bugs? Closing thought...Any interest in seeing if we can collaborate on a few PRs? I don't have other folks in the project. If there is a concrete area of xarray-spatial you are interested in improving, I'm happy to focus there and/or review your code. Sorry if my response is incomplete or doesn't address things you called out. Thank you for putting time into this discussion. I find it motivating to connect with people, especially in other countries, and wow, what a beautiful place Strasbourg must be. I got to visit Colmar in 2017 after a GeoPython and loved it... 100% of my part of this discussion is free of AI...
Hello,
I must say I am a bit surprised by the way this library is now evolving.
xarray-spatial was stalled for a couple of years and now it seems development has become crazy, with a multitude of new features appearing almost every day.
You kindly added the Claude repository, so you 100% assume the extensive use of generative AI; thanks for being so transparent. However, with this coding pace I must ask: is there still a human in the loop in xarray-spatial, or has this lib become a testing repo for Claude's ability in geospatial?
Moreover, other questions are emerging:
Best Regards,
Rémi