Metadata-Version: 2.4
Name: binaryornot
Version: 0.6.0
Summary: Ultra-lightweight pure Python package to check if a file is binary or text.
Project-URL: bugs, https://github.com/binaryornot/binaryornot/issues
Project-URL: changelog, https://github.com/binaryornot/binaryornot/releases
Project-URL: documentation, https://binaryornot.github.io/binaryornot/
Project-URL: homepage, https://github.com/binaryornot/binaryornot
Author-email: Audrey Roy Greenfeld <aroy@alum.mit.edu>
Maintainer-email: Audrey Roy Greenfeld <aroy@alum.mit.edu>
License: MIT
License-File: LICENSE
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Typing :: Typed
Requires-Python: >=3.10
Description-Content-Type: text/markdown

# BinaryOrNot

Python library and CLI tool to check if a file is binary or text. Zero dependencies.

```python
from binaryornot.check import is_binary

is_binary("image.png")    # True
is_binary("README.md")    # False
is_binary("data.sqlite")  # True
is_binary("report.csv")   # False
```

```sh
$ binaryornot image.png
True
```

## Install

```sh
pip install binaryornot
```

## Why not just check for null bytes?

That's the first thing everyone tries. It works until it doesn't:

- A UTF-16 text file is full of null bytes. Your tool thinks it's binary and mishandles it.
- A Big5 or GB2312 text file has bytes above 0x7F everywhere, so it looks binary by byte ratios alone.
- A font file (.woff, .eot) is clearly binary but might not have null bytes in the first chunk.
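
The first two failure modes are easy to reproduce. Here is a minimal sketch, assuming a hypothetical `naive_is_binary()` helper that stands in for the usual null-byte check (it is not part of BinaryOrNot):

```python
def naive_is_binary(chunk: bytes) -> bool:
    """Hypothetical naive check: any null byte means 'binary'."""
    return b"\x00" in chunk


def high_byte_ratio(chunk: bytes) -> float:
    """Fraction of bytes above 0x7F (outside the ASCII range)."""
    if not chunk:
        return 0.0
    return sum(b > 0x7F for b in chunk) / len(chunk)


# Plain ASCII text encoded as UTF-16 picks up a null byte per character.
utf16_text = "hello world".encode("utf-16")
print(naive_is_binary(utf16_text))      # flagged as binary, although it is text
print(naive_is_binary(b"hello world"))  # the same text as ASCII is not flagged

# Chinese text encoded as Big5 has no null bytes but many high bytes.
big5_text = "中文".encode("big5")
print(naive_is_binary(big5_text), high_byte_ratio(big5_text))
```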

BinaryOrNot reads the first 128 bytes and runs them through a trained decision tree that considers byte ratios, Shannon entropy, encoding validity, BOM detection, and more. It handles all the edge cases above correctly, with zero dependencies.
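
To give a feel for the kind of signals listed above, here is a rough sketch of a few of them computed over the first 128 bytes. The function names, the feature set, and the dictionary layout are assumptions made for illustration; this is not BinaryOrNot's actual implementation, and it omits the decision tree entirely.

```python
import codecs
import math
from collections import Counter

CHUNK_SIZE = 128  # the "first 128 bytes" mentioned above

# Byte order marks from the standard library's codecs module.
BOMS = (
    codecs.BOM_UTF8,
    codecs.BOM_UTF32_LE,
    codecs.BOM_UTF32_BE,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF16_BE,
)


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte."""
    if not chunk:
        return 0.0
    total = len(chunk)
    counts = Counter(chunk)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def sketch_features(data: bytes) -> dict:
    """Hypothetical feature extraction; names and features are illustrative."""
    chunk = data[:CHUNK_SIZE]
    size = len(chunk) or 1  # avoid dividing by zero on empty input
    return {
        "null_ratio": chunk.count(0) / size,
        "high_ratio": sum(b > 0x7F for b in chunk) / size,
        "entropy": shannon_entropy(chunk),
        "has_bom": chunk.startswith(BOMS),
    }


print(sketch_features("hello".encode("utf-16")))
```

No single one of these numbers is decisive on its own, which is why a classifier that weighs several of them together is more robust than a null-byte check.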

Tested against [37 text encodings and 49 binary formats](https://binaryornot.github.io/binaryornot/usage/), verified by parametrized tests driven from coverage CSVs.

## API

One function:

```python
from binaryornot.check import is_binary

is_binary(filename)  # returns True or False
```

There's also `is_binary_string()` if you already have bytes:

```python
from binaryornot.helpers import is_binary_string

is_binary_string(b"\x00\x01\x02")  # True
is_binary_string(b"hello world")   # False
```

[Full documentation](https://binaryornot.github.io/binaryornot/) covers the detection algorithm in detail.

## Credits

Created by [Audrey Roy Greenfeld](https://audrey.feldroy.com).
