This is impressive! We use Metabase and I've been wanting this exact user experience for quite some time. So far, I've been dumping our Postgres schema into a Claude project and asking it to generate queries. This works surprisingly well, save for the tedious copy-paste between the two tabs. The Chrome extension workflow makes perfect sense.
Is there a way to select which model is being used? Anecdotally, I've found that Claude 3.5 Sonnet works incredibly well with even the most complex queries in one shot, which is not something I've seen with GPT-4o.
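The schema-dump-into-a-prompt workflow described above can be sketched roughly like this. This is a hypothetical illustration, using sqlite3 (stdlib) as a stand-in so the snippet is self-contained; against Postgres you would read `information_schema.columns` instead:

```python
import sqlite3

def dump_schema(conn: sqlite3.Connection) -> str:
    # Collect CREATE TABLE statements into compact text that can be
    # pasted into an LLM prompt as context.
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    return "\n\n".join(r[0] for r in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)")

prompt = (
    "Schema:\n" + dump_schema(conn)
    + "\n\nWrite a SQL query for: total revenue per user"
)
```

The tedious part the comment mentions is exactly this: regenerating the dump and pasting it into a chat tab whenever the schema changes.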
nuwandavek 148 days ago [-]
Haha, yes! We were doing the exact same thing. Also, there is so much context you can't capture with just the table schema that you can get by integrating the extension deeply into the tool. It also unlocks cross-app contexts (we're working on a way to import context from a doc into a Metabase query, or from a sheet/dashboard into a Jupyter notebook, etc.).
> Is there a way to select which model is being used?
Not at the moment, but this is in our pipeline! We will enable this (and the ability to edit the prompts, etc.) very soon.
Do try it out and let me know what you think!
zurfer 148 days ago [-]
I love that you can take a screenshot and it starts to explain what it sees!
While this is clearly an AI analytics assistant, your "retrofit" approach certainly differentiates you from existing approaches: https://github.com/Snowboard-Software/awesome-ai-analytics
Not quite sure if this should be a separate category? It's more similar to the web automation agents like https://www.multion.ai/ than to https://www.getdot.ai/.
We love that feature too and use it quite a bit ourselves!
> Not quite sure if this should be a separate category?
We see ourselves at the intersection of generic browser-automation agents and generic coding agents. MinusX integrates deeply into jupyter/metabase (we had to do a lot of shenanigans to get the entire jupyter app context) and has more context than RPA agents do today. It is possible that eventually all these apps will converge, but we think MinusX will be more useful for anything data related than any of them for the foreseeable future.
To paraphrase geohot, we think that the path to advanced agents runs through specialized, useful intermediaries.
Erazal 147 days ago [-]
I really like your retrofit analogy - not sure if you coined it or geohot did.
It seems to me that's where a ton of start-ups are currently converging - not repairing the old, which would be too complicated, but understanding and "mending" for new usages, or functionalities.
nuwandavek 147 days ago [-]
Thanks! Not sure, I think the term has been in the ether for a while.
Yeah, I see that too. I think for the longest time there was no leverage in doing this sort of retrofitting (except for Grammarly-type use cases). But with better intent capture (LLMs help here), we can actually fill any existing gaps!
Erazal 147 days ago [-]
A Grammarly-style retrofit would’ve actually been appropriate here—I made a syntax error.
If you don’t mind, I’ll be stealing and using that analogy!
We were talking about that approach today with a friend who unifies parking apps across the country. He calls his engine UMM—Ultimate Macro Machine.
I’m working on the “classic” generic browser-automation agents side, with a unified API for meeting bots (transcription, voice input, etc.).
nuwandavek 147 days ago [-]
Haha, the analogy is totally yours to use :)
Nice, yeah, there is a lot of leverage in building agent-like hooks into current workflows. Even if the agents are pretty mid right now (they are for any complex use case that needs long horizon planning), it's a great place to be in time for the next generation models to drop!
edmundsauto 148 days ago [-]
How does the AI know about things like other tables? Does it have some basic knowledge of Metabase’s link structure so it can navigate to a listing of all tables, then pull context from there for in-context learning while writing the query?
Anecdotally, my hardest problems w/ nl2sql are finding the right tables and adding the right filters.
ppsreejith 148 days ago [-]
Yep! MinusX uses Metabase APIs to pull relevant tables, schema, & dashboards to construct the context for your instruction.
> Anecdotally, my hardest problems w/ nl2sql are finding the right tables and adding the right filters.
Totally! Especially in large orgs with thousands of tables. Using your existing dashboards and queries gives useful context for picking the right tables for the query.
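A minimal sketch of what that metadata pull might look like. The `GET /api/table/:id/query_metadata` endpoint and the `X-Metabase-Session` auth header are real Metabase API conventions; the formatting function and the fake payload below are illustrative assumptions:

```python
import json
from urllib.request import Request, urlopen

def fetch_table_metadata(base_url: str, session_token: str, table_id: int) -> dict:
    # Metabase exposes per-table metadata (fields, types, foreign keys) at
    # GET /api/table/:id/query_metadata, authenticated via X-Metabase-Session.
    req = Request(
        f"{base_url}/api/table/{table_id}/query_metadata",
        headers={"X-Metabase-Session": session_token},
    )
    with urlopen(req) as resp:
        return json.load(resp)

def format_context(meta: dict) -> str:
    # Flatten the metadata into one compact line of LLM prompt context.
    fields = ", ".join(f"{f['name']} {f['base_type']}" for f in meta["fields"])
    return f"table {meta['name']}: {fields}"

# Offline demonstration with a payload shaped like the API response:
fake = {
    "name": "orders",
    "fields": [
        {"name": "id", "base_type": "type/Integer"},
        {"name": "total", "base_type": "type/Float"},
    ],
}
context = format_context(fake)
```

Dashboards and saved questions can be pulled through the same REST API and folded into the context in the same way.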
Use case: Evidence-based policy; impact: https://en.wikipedia.org/wiki/Evidence-based_policy
Test case: "Find leading economic indicators like the bond yield curve from discoverable datasets, and cache retrieved data, e.g. with pandas-datareader"
Use case: Teach Applied ML, NNs, XAI (Explainable AI), and ethics first
Tools with integration opportunities:
Google Model Explorer: https://github.com/google-ai-edge/model-explorer
Yellowbrick ML: teaches ML concepts with Visualizers for humans working with scikit-learn, and can be used to ensemble LLMs and other NNs because of its Estimator interfaces: https://www.scikit-yb.org/en/latest/
Manim, ManimML, Blender, panda3d, unreal: "Explain this in 3d, with an interactive game"
Khanmigo: "Explain this to me with exercises"
"Calculate the cost of computation, and identify relatively sustainable, lower-cost methods for these computations"
"Identify where this process, these tools, and experts picking algos, hyperparameters, and parameters have introduced biases into the analysis, given input from additional agents"
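The "cache retrieved data" idea in the test case above can be sketched with a memoized fetcher. The remote call here is a hypothetical stand-in; `T10Y2Y` is FRED's real series ID for the 10-year minus 2-year Treasury spread, a classic yield-curve indicator that `pandas_datareader` can fetch:

```python
from functools import lru_cache

call_count = {"n": 0}

@lru_cache(maxsize=None)
def fetch_series(series_id: str) -> tuple:
    # Stand-in for a remote fetch. In practice this could be
    # pandas_datareader.data.DataReader("T10Y2Y", "fred", start, end),
    # returning the 10y-2y Treasury yield-curve spread from FRED.
    call_count["n"] += 1
    return (0.5, 0.4, -0.1)  # fake observations

fetch_series("T10Y2Y")
fetch_series("T10Y2Y")  # second call is served from the cache, not refetched
```

Caching matters here because data-source rate limits make repeated agent retries expensive; a disk-backed cache (rather than `lru_cache`) would survive restarts.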
__gcd 148 days ago [-]
This is very interesting. Can we bring our own API keys? Is that in the roadmap?
nuwandavek 148 days ago [-]
Yes! Both bring-your-own-keys and local models are on the roadmap. The ETA for both is ~1-2 weeks.
altdataseller 148 days ago [-]
In your demo, you seem to have performed everything on a small dataset.
How’s the performance on doing the same analysis on a dataset with 1 billion rows for instance?
Also does this work with self hosted Metabase or Metabase Cloud? Or both?
ppsreejith 148 days ago [-]
> How’s the performance on doing the same analysis on a dataset with 1 billion rows for instance?
This really depends on whether your tool can handle the scale. We only use a sample of the outputs when constructing the context for your instruction, so it should be independent of the scale of the data. We mostly use metadata such as table names, fields, schemas, etc. to construct the context.
> Also does this work with self hosted Metabase or Metabase Cloud? Or both?
Yep, it should work on both :) We have users across both
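The sampling approach described above can be illustrated with a toy example: the prompt context is built from the schema plus a fixed-size sample, so it stays the same size whether the table has a hundred rows or a billion. (sqlite3 is used here only to keep the sketch self-contained.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, kind TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)", [(i, "click") for i in range(100_000)]
)

# Context = schema + a fixed-size sample; the LIMIT clause means the
# prompt does not grow with the number of rows in the table.
schema = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'events'"
).fetchone()[0]
sample = conn.execute("SELECT * FROM events LIMIT 5").fetchall()
context = schema + "\nsample rows: " + repr(sample)
```

The expensive part (actually running the user's query over a billion rows) is left to the underlying database, which is why scale is the tool's problem rather than the assistant's.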
btown 148 days ago [-]
While I’m excited about the launch, I’m concerned that your data policies are extremely vague and seem to contain typos and missing parentheticals. As of 12:30p ET they say:
> We have nuanced privacy controls on minusx. Any data you share, which will be used to train better, more accurate models). We never share your data with third parties.
What are these nuanced controls? What data is used to train your models? Just column names and existing queries, or data from tables and query results as well that might be displayed on screen? Are your LLMs running entirely locally on your own hardware, and if not, how can you say the data is not shared with third parties? (EDIT: you mentioned GPT-4o in another comment so this statement cannot be correct.)
https://avanty.app/ is doing something similar in the Metabase space and has more clarity on their policies than you do.
Frankly, given the lack of care in your launch FAQs about privacy, it’s a hard ask to expect that you will treat customer data privacy with greater care. There is definitely a need for innovation in this space, but I’m unable to recommend or even test your product with this status quo.
nuwandavek 148 days ago [-]
I totally share your concerns about data (especially data that may be sensitive). We have a simple non-legal-speak privacy policy here: https://minusx.ai/privacy-simplified.
> Are your LLMs running entirely locally on your own hardware, and if not, how can you say the data is not shared with third parties? (EDIT: you mentioned GPT-4o in another comment so this statement cannot be correct.)
We're currently only using API providers (OpenAI + Anthropic) that do not themselves train on data accessed through their APIs. Although they are technically third parties, they're not third parties that harvest data.
I recognize that even this may just be empty talk. We're currently working on 2 efforts that I think will further help here:
- opensourcing the entire extension so that users can see exactly what data is being used as LLM context (and allow users to extend the app further)
- support local models so that your data never leaves your computer (ETA for both is ~1-2 weeks)
We are genuinely motivated by the excitement + concerns you may have. We want to give an assistant-in-the-browser alternative to people who don't want to move to AI-native, data-locked-in platforms. I regret that this was not transparent in our copy.
Thanks for pointing out the error in the FAQs; we somehow missed it. It is fixed now!
penthi 148 days ago [-]
Very cool. Why is the ai so fast? (Impressive)
ppsreejith 148 days ago [-]
We've done a bunch of work to strip down the context and minimise the output tokens (which tend to be ~100x slower than input tokens). GPT-4o is pretty fast too :)
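The reason output tokens dominate latency is that prompt tokens are processed in parallel (prefill) while output tokens are generated one at a time (decode). A back-of-envelope model, with purely illustrative throughput numbers chosen to reflect the ~100x gap mentioned above:

```python
def latency_s(n_in: int, n_out: int,
              prefill_tps: float = 5000.0, decode_tps: float = 50.0) -> float:
    # Prefill handles the whole prompt in parallel; decode emits one
    # token per forward pass, so output length drives wall-clock time.
    return n_in / prefill_tps + n_out / decode_tps

verbose = latency_s(n_in=2000, n_out=800)  # chatty answer with explanation
terse = latency_s(n_in=2000, n_out=80)     # "reply with the SQL only"
```

With these (assumed) numbers the terse response is ~8x faster for the same prompt, which is why constraining the model to emit only the query pays off.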
penthi 148 days ago [-]
Thanks for the explanation. Can't wait to see the code when you open it up!
world2vec 148 days ago [-]
This looks cool. Current company uses Metabase extensively and this could be handy. What LLM is being used?
ppsreejith 148 days ago [-]
Currently, we're using GPT-4o. We've tested it with Claude as well and plan to roll out support soon!
KeithBrink 148 days ago [-]
Any chance of a Firefox extension?
ppsreejith 148 days ago [-]
As a Firefox user myself, yes! We plan to launch for other browsers after open sourcing MinusX (in ~1-2 weeks).
kshmir 148 days ago [-]
What happens when Metabase releases this? (Asking without malice!)
ppsreejith 148 days ago [-]
We're building an assistant that works across all your analytics apps. This means MinusX can use context from multiple apps to better fulfil your instructions. You can imagine a future version of MinusX reading data from a spreadsheet, putting it into a Jupyter notebook / Metabase table, and running further analysis.
When Metabase (or any other tool) builds an assistant, we aim to use it to further extend MinusX's capabilities!
altdataseller 148 days ago [-]
What other analytics tools do you plan on supporting?
ppsreejith 148 days ago [-]
We're currently exploring the tools displayed on our website (Tableau, Grafana, Colab, & Google Sheets). But if you have a specific tool in mind, please do tell us at https://minusx.ai/tool-request
mqoca 147 days ago [-]
When do you expect Tableau support to be available?