A CLI tool for searching text within Apache Parquet files. Works like grep but for Parquet files, with support for recursive directory search and multiple output formats.
Built on top of hyparquet for high-performance Parquet parsing.
npm install -g parquet-grep

Or use directly with npx:

npx parquet-grep "search term" file.parquet

parquet-grep [options] <query> [parquet-file]

-i - Force case-insensitive search (by default: case-insensitive if query is lowercase, case-sensitive if query contains uppercase)
-v - Invert match (show non-matching rows)
-m <n> / --limit <n> - Limit matches per file (default: 5, 0 = unlimited). Shows "..." when limit is exceeded
--offset <n> - Skip first N matches per file (default: 0). Useful with --limit for pagination
--table - Output in markdown table format (default, grouped by file)
--jsonl - Output as JSON lines (one match per line with filename, rowOffset, and value)
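
The default case handling and the -i and -v flags can be illustrated with a small sketch. This is not parquet-grep's actual source: `buildMatcher` and its option names are hypothetical, and plain substring matching is an assumption made for the illustration.

```javascript
// Hypothetical helper illustrating the documented matching rules.
// Assumption: matching is a plain substring test on the cell's text.
function buildMatcher(query, { ignoreCase = false, invert = false } = {}) {
  // Smart case: a query with no uppercase letters matches case-insensitively.
  const hasUppercase = query !== query.toLowerCase()
  const caseInsensitive = ignoreCase || !hasUppercase
  const needle = caseInsensitive ? query.toLowerCase() : query

  return (text) => {
    const haystack = caseInsensitive ? text.toLowerCase() : text
    const found = haystack.includes(needle)
    // -v flips the result so that non-matching rows are reported instead.
    return invert ? !found : found
  }
}
```

Under these assumptions, `buildMatcher('holland')` accepts "Holland Lop", while `buildMatcher('Holland')` rejects "holland lop" unless `ignoreCase` is set.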
If no file is specified, recursively searches all .parquet files in the current directory, skipping node_modules and hidden directories.
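
As a rough sketch of what such a recursive scan might look like in Node.js, the function below collects .parquet files while skipping node_modules and dot-directories. `findParquetFiles` is a hypothetical name, and the sorting and symlink handling are assumptions rather than documented behavior of the tool.

```javascript
const fs = require('node:fs')
const path = require('node:path')

// Hypothetical helper: collect .parquet files under `dir`,
// skipping node_modules and hidden (dot-prefixed) directories.
function findParquetFiles(dir) {
  const results = []
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name)
    if (entry.isDirectory()) {
      if (entry.name === 'node_modules' || entry.name.startsWith('.')) continue
      results.push(...findParquetFiles(fullPath))
    } else if (entry.isFile() && entry.name.endsWith('.parquet')) {
      // Symlinks report neither isDirectory() nor isFile() here, so they are ignored.
      results.push(fullPath)
    }
  }
  // Sorted only to make the result deterministic (an assumption).
  return results.sort()
}
```

Calling `findParquetFiles(process.cwd())` would return the paths the search would visit under these assumptions.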
Search a single file:
parquet-grep "Holland" bunnies.parquet

Search recursively in current directory:
parquet-grep "search term"

Case-insensitive search:
parquet-grep -i "HOLLAND" bunnies.parquet

JSONL output:
parquet-grep --jsonl "Holland" bunnies.parquet

Limit results:
parquet-grep --limit 10 "search term" file.parquet # Show at most 10 matches per file
parquet-grep --limit 0 "search term" file.parquet # Unlimited matches

Pagination with offset and limit:
parquet-grep --offset 5 --limit 10 "search term" file.parquet # Show matches 5-14 (skip first 5)
parquet-grep --offset 0 --limit 5 "search term" file.parquet # Show first 5 matches
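
The interaction between --offset and --limit can be modeled as slicing a per-file list of matches. The sketch below is an illustration only: `paginate` and the `truncated` flag are hypothetical names, with `truncated` standing in for the case where the tool would print "...".

```javascript
// Hypothetical helper: apply --offset and --limit to one file's matches.
// Defaults mirror the documented ones (offset 0, limit 5, 0 = unlimited).
function paginate(matches, { offset = 0, limit = 5 } = {}) {
  // Drop the first `offset` matches.
  const remaining = matches.slice(offset)
  // A limit of 0 means "no limit".
  const page = limit === 0 ? remaining : remaining.slice(0, limit)
  // `truncated` is true when matches exist beyond the returned page.
  return { page, truncated: page.length < remaining.length }
}
```

With matches indexed from zero, `paginate(matches, { offset: 5, limit: 10 })` returns the matches at indices 5 through 14, and a follow-up call with `offset: 15` would fetch the next page.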