hyparam/parquet-grep


parquet-grep

A CLI tool for searching text within Apache Parquet files. Works like grep but for Parquet files, with support for recursive directory search and multiple output formats.

Built on top of hyparquet for high-performance Parquet parsing.

Installation

npm install -g parquet-grep

Or use directly with npx:

npx parquet-grep "search term" file.parquet

Usage

parquet-grep [options] <query> [parquet-file]

Options

  • -i - Force case-insensitive search. By default the search is smart-case: case-insensitive when the query is all lowercase, case-sensitive when the query contains any uppercase character
  • -v - Invert match (show non-matching rows)
  • -m <n> / --limit <n> - Limit matches per file (default: 5, 0 = unlimited). Shows "..." when limit is exceeded
  • --offset <n> - Skip first N matches per file (default: 0). Useful with --limit for pagination
  • --table - Output in markdown table format (default, grouped by file)
  • --jsonl - Output as JSON lines (one match per line with filename, rowOffset, and value)
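
The default case handling described for -i can be sketched roughly as follows. This is an illustration only, not parquet-grep's actual source: the function names `isCaseInsensitive` and `matches` are hypothetical, and plain substring matching is an assumption, since the option list does not say whether the query is treated as a literal string or a pattern.

```javascript
// Hypothetical sketch of the smart-case rule; not taken from parquet-grep itself.
// forceInsensitive stands in for the -i flag.
function isCaseInsensitive(query, forceInsensitive = false) {
  // A query with no uppercase characters equals its own lowercase form.
  return forceInsensitive || query === query.toLowerCase();
}

// Assumption: matching is a plain substring test.
function matches(text, query, forceInsensitive = false) {
  if (isCaseInsensitive(query, forceInsensitive)) {
    return text.toLowerCase().includes(query.toLowerCase());
  }
  return text.includes(query);
}
```

Under this sketch, the query "holland" matches "Holland Lop", while "Holland" does not match "holland lop" unless -i is given.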

If no file is specified, recursively searches all .parquet files in the current directory, skipping node_modules and hidden directories.
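
A recursive search of this kind could be sketched with Node's built-in `fs` module as below. The helper name `findParquetFiles` is hypothetical, and the traversal details (synchronous reads, symlinks not followed) are assumptions rather than a description of how parquet-grep actually walks directories.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: collect .parquet files under `dir`,
// skipping node_modules and hidden (dot-prefixed) directories.
function findParquetFiles(dir) {
  const results = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      if (entry.name === 'node_modules' || entry.name.startsWith('.')) continue;
      results.push(...findParquetFiles(fullPath));
    } else if (entry.isFile() && entry.name.endsWith('.parquet')) {
      results.push(fullPath);
    }
  }
  return results;
}
```

Calling `findParquetFiles('.')` would return the paths that a search without a file argument is expected to cover.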

Examples

Search a single file:

parquet-grep "Holland" bunnies.parquet

Search recursively in current directory:

parquet-grep "search term"

Case-insensitive search:

parquet-grep -i "HOLLAND" bunnies.parquet

JSONL output:

parquet-grep --jsonl "Holland" bunnies.parquet
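
Because each JSONL line is a standalone JSON object, the output is easy to post-process. The sketch below parses such output and counts matches per file. The field name `filename` comes from the option description above; the helper names `parseJsonl` and `countByFile` are hypothetical.

```javascript
// Parse JSONL text into an array of objects, ignoring blank lines.
function parseJsonl(text) {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}

// Count matches per file, keyed by the `filename` field of each match.
function countByFile(matches) {
  const counts = new Map();
  for (const match of matches) {
    counts.set(match.filename, (counts.get(match.filename) || 0) + 1);
  }
  return counts;
}
```

The captured output of a --jsonl run could be passed to `parseJsonl` as a string, for example after redirecting it to a file.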

Limit results:

parquet-grep --limit 10 "search term" file.parquet  # Show at most 10 matches per file
parquet-grep --limit 0 "search term" file.parquet   # Unlimited matches

Pagination with offset and limit:

parquet-grep --offset 5 --limit 10 "search term" file.parquet  # Skip the first 5 matches, then show the next 10
parquet-grep --offset 0 --limit 5 "search term" file.parquet   # Show first 5 matches
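
The interaction between --offset and --limit can be modelled as a slice over the matches found in one file. This is a minimal sketch of the documented semantics (defaults of 0 and 5, with 0 meaning unlimited), not the tool's implementation. The function name `paginate` and the `truncated` flag, which stands in for the "..." marker, are hypothetical.

```javascript
// Hypothetical model of per-file pagination.
// offset: matches to skip; limit: maximum matches to show (0 = unlimited).
function paginate(matches, offset = 0, limit = 5) {
  const end = limit === 0 ? undefined : offset + limit;
  const page = matches.slice(offset, end);
  // True when more matches exist beyond the returned page.
  const truncated = limit !== 0 && matches.length > offset + limit;
  return { page, truncated };
}
```

With 20 matches in a file, `paginate(matches, 5, 10)` would return the 6th through 15th matches and report that more remain.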
