Hey! For those of you who don't know Musoq: it's a small tool of mine that lets you run SQL-like queries against various data sources without any database.

It can query all sorts of things: niche ones like CAN DBC files, weird ones like C# code, interesting ones like Git repositories, and regular stuff like CSV, TSV and various others.

I experiment quite a bit, so I'm hybridizing the engine with LLMs and doing other weird stuff that is more or less practical :-)

I also wanted to share some recent developments in this little project, as I hope they might be interesting to some of you.

New Experimental Plugins:

* Git Plugin (Beta): I've been working on Git repository querying. I managed to test it on the EF Core repo (16k commits) and it seems to work okay

* Roslyn Plugin (Beta): Added basic C# code analysis capabilities

For the very first time: I've extended CROSS APPLY to use computed results as arguments! Now the operator can use values from the current row as inputs. Here's an example:

  SELECT
    f.DirectoryName,
    f.FileName
  FROM #os.directories('/some/path', false) d
  CROSS APPLY #os.files(d.FullName, true) f
  WHERE d.Name IN ('Folder1', 'Folder2')
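
To make the semantics concrete, here is a rough Python sketch of what a correlated CROSS APPLY does conceptually: the right-hand table function is re-evaluated for every left-hand row, using values from that row. This is not Musoq code, just an illustration; `cross_apply`, the `files` helper and the in-memory data are all made up for the example.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, str]


def cross_apply(
    outer: Iterable[Row], table_fn: Callable[[Row], Iterable[Row]]
) -> Iterator[Tuple[Row, Row]]:
    """Yield (outer_row, inner_row) pairs; table_fn is called once per outer row."""
    for outer_row in outer:
        for inner_row in table_fn(outer_row):
            yield outer_row, inner_row


# Hypothetical stand-in for a directories table.
directories: List[Row] = [
    {"Name": "Folder1", "FullName": "/some/path/Folder1"},
    {"Name": "Folder2", "FullName": "/some/path/Folder2"},
    {"Name": "Other", "FullName": "/some/path/Other"},
]

# Hypothetical file listing, keyed by directory path.
FILES = {
    "/some/path/Folder1": ["a.txt", "b.txt"],
    "/some/path/Folder2": [],
    "/some/path/Other": ["c.txt"],
}


def files(directory: Row) -> List[Row]:
    """Table function whose argument comes from the current outer row."""
    path = directory["FullName"]
    return [{"DirectoryName": path, "FileName": name} for name in FILES[path]]


# Equivalent in spirit to: SELECT ... FROM d CROSS APPLY files(d.FullName) f WHERE d.Name IN (...)
rows = [
    (f["DirectoryName"], f["FileName"])
    for d, f in cross_apply(directories, files)
    if d["Name"] in ("Folder1", "Folder2")
]
```

Note that Folder2 contributes no rows here because its file list is empty; with CROSS APPLY an outer row only survives if the table function returns something for it.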

After another round of fixes, I'm finally able to query multiple Git repositories AT ONCE!

  with ProjectsToAnalyze as (
    select
        dir2.FullName as FullName
    from #os.directories('D:\repos', false) dir1
    cross apply #os.directories(dir1.FullName, false) dir2
    where
        dir2.Name = '.git'
  )
  select
    c.Message,
    c.Author,
    c.CommittedWhen
  from ProjectsToAnalyze p cross apply #git.repository(p.FullName) r 
  cross apply r.Commits c
  where c.AuthorEmail = '[email protected]'
  order by c.CommittedWhen desc

Under the Hood:

- Added a Buckets feature for memory management (currently just testing it with the Roslyn plugin)

- Moved to .NET 8

- Added CROSS/OUTER APPLY operators

- Made some improvements to error messages and runtime behavior
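
For anyone unfamiliar with the difference between the two APPLY flavors, here is a small Python sketch of the usual SQL semantics: CROSS APPLY drops outer rows for which the table function returns nothing, while OUTER APPLY keeps them and pads the right side with a null. I'm assuming the conventional (SQL Server-style) behavior here; the function names and sample data are hypothetical and this is not engine code.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

Row = Dict[str, str]


def cross_apply(
    outer: Iterable[Row], table_fn: Callable[[Row], Iterable[Row]]
) -> Iterator[Tuple[Row, Row]]:
    """Emit a pair for every inner row; outer rows with no inner rows vanish."""
    for outer_row in outer:
        for inner_row in table_fn(outer_row):
            yield outer_row, inner_row


def outer_apply(
    outer: Iterable[Row], table_fn: Callable[[Row], Iterable[Row]]
) -> Iterator[Tuple[Row, Optional[Row]]]:
    """Like cross_apply, but keep unmatched outer rows paired with None."""
    for outer_row in outer:
        matched = False
        for inner_row in table_fn(outer_row):
            matched = True
            yield outer_row, inner_row
        if not matched:
            yield outer_row, None


# Hypothetical data: one repository with commits, one without.
repos: List[Row] = [{"Name": "api"}, {"Name": "empty"}]
COMMITS = {"api": ["init", "fix bug"], "empty": []}


def commits(repo: Row) -> List[Row]:
    """Table function evaluated per repository row."""
    return [{"Message": message} for message in COMMITS[repo["Name"]]]


cross_rows = [(r["Name"], c["Message"]) for r, c in cross_apply(repos, commits)]
outer_rows = [
    (r["Name"], c["Message"] if c is not None else None)
    for r, c in outer_apply(repos, commits)
]
```

With this data, `cross_rows` contains only the two "api" commits, while `outer_rows` additionally contains `("empty", None)`.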

New piping features: I've been experimenting with piping capabilities:

* Image Analysis with LLMs:

  ./Musoq.exe image encode "image.jpg" | ./Musoq.exe run query "select s.Shop, s.ProductName, s.Price from ..."

* Text Data Extraction:

  Get-Content "ticket.txt" | ./Musoq.exe run query "select t.TicketNumber, t.CustomerName ... from #stdin.text('Ollama', 'llama3.1') t"

* Data Source Combination:

  { docker image ls; ./Musoq.exe separator; docker container ls } | ./Musoq.exe run query "..."

I'm working on comprehensive documentation. I especially encourage you to look at the "Practical Examples and Applications" and "Data Sources" sections, where you can see all the tables the tool currently provides: <https://puchaczov.github.io/Musoq/>

Other Changes:

- Made some improvements to OS and Archive data sources (OS can now query metadata like EXIF)

- Added a few fields to CAN DBC plugin

- Command outputs can now be used as inputs for queries

I'm hoping to:

- Improve stability and add more tests

- Flesh out the documentation

- Work on package distribution (Scoop, Ubuntu packages)

- Share some examples of source code querying with Roslyn

Ideas for later:

- Robust WHERE analysis and optimizations

- DISTINCT operator implementation

- PROTOBUF schema support

- Performance improvements

- Query parallelization

- Recursive CTEs

- Subqueries

I'd really appreciate any thoughts or feedback!

Here's the documentation section where I wrote up a short analysis of EF Core using the Git plugin: <https://puchaczov.github.io/Musoq/practical-examples-and-app...>
