New ScanPptx scanner #147

MSAdministrator · 2025-12-18T19:02:09Z

Describe the change

This PR adds a new ScanPptx scanner for extracting metadata and text content from Microsoft PowerPoint OOXML files (.pptx, .pptm, .potx, .ppsx).

Summary

Adds a ScanPptx scanner that extracts document metadata (author, title, creation date, etc.), slide, word, and image counts, hyperlinks, the full text of speaker notes, and, optionally, the full slide text
Uses the python-pptx library (v1.0.2) for reliable OOXML parsing
Follows the same pattern as the existing ScanDocx scanner for consistency
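
To illustrate the kind of data listed above, here is a minimal sketch that reads a PPTX package with only the Python standard library. It is not the scanner's implementation (the PR uses python-pptx for parsing); the helper name `summarize_pptx` and the returned field names are assumptions made for this example, and it covers only author, title, slide count, word count, and slide text.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by OOXML core properties and DrawingML text runs.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
}

# Slide parts live under ppt/slides/ inside the ZIP container.
SLIDE_PART = re.compile(r"ppt/slides/slide(\d+)\.xml")


def summarize_pptx(data: bytes) -> dict:
    """Return a small metadata/text summary for a PPTX given as bytes (hypothetical helper)."""
    summary = {"author": "", "title": "", "slide_count": 0, "word_count": 0, "text": []}
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        names = package.namelist()

        # Core properties (author, title, ...) are stored in docProps/core.xml.
        if "docProps/core.xml" in names:
            core = ET.fromstring(package.read("docProps/core.xml"))
            summary["author"] = core.findtext("dc:creator", default="", namespaces=NS)
            summary["title"] = core.findtext("dc:title", default="", namespaces=NS)

        # Order slide parts by their numeric suffix. This is part-name order,
        # which is an approximation of the presentation order.
        slides = [m for m in map(SLIDE_PART.fullmatch, names) if m]
        for match in sorted(slides, key=lambda m: int(m.group(1))):
            slide = ET.fromstring(package.read(match.group(0)))
            # Visible text is held in <a:t> runs.
            runs = [t.text for t in slide.iter(f"{{{NS['a']}}}t") if t.text]
            slide_text = " ".join(runs)
            summary["slide_count"] += 1
            summary["word_count"] += len(slide_text.split())
            summary["text"].append(slide_text)
    return summary
```

A library such as python-pptx handles relationships, notes slides, and slide ordering properly, which is one reason to prefer it over hand-rolled XML parsing in a production scanner.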

New dependency

python-pptx==1.0.2

Motivation

PPTX files were previously only processed by ScanZip (extracting internal XML files) without extracting the actual presentation content. This scanner enables direct extraction of slide text, which is valuable for content analysis and detecting phishing lures embedded in PowerPoint presentations.

Files changed

| File | Description |
| --- | --- |
| `src/python/strelka/scanners/scan_pptx.py` | New scanner |
| `build/python/backend/requirements.txt` | Added python-pptx dependency |
| `build/configs/scanners.yaml` | Scanner configuration |
| `build/configs/taste.yara` | Added `pptx_file` YARA rule |
| `src/python/strelka/tests/test_scan_pptx.py` | Unit tests |
| `src/python/strelka/tests/test_scan_pptx_standalone.py` | Standalone pytest |
| `src/python/strelka/tests/fixtures/test.pptx` | Test fixture |

Describe testing procedures

Unit tests

python -m pytest src/python/strelka/tests/test_scan_pptx.py -v
python -m pytest src/python/strelka/tests/test_scan_pptx_standalone.py -v

Manual testing

Build and run Strelka

docker-compose up --build

Submit a PPTX file

strelka-fileshot -c fileshot.yaml /path/to/test.pptx

Sample output

{
  "file": {
    "flavors": {
      "mime": ["application/vnd.openxmlformats-officedocument.presentationml.presentation"],
      "yara": ["pptx_file", "zip_file"]
    },
    "scanners": ["ScanPptx", "ScanZip"]
  },
  "scan": {
    "pptx": {
      "elapsed": 0.045123,
      "flags": [],
      "author": "",
      "category": "",
      "comments": "generated using python-pptx",
      "content_status": "",
      "created": 1359299656,
      "identifier": "",
      "keywords": "",
      "language": "",
      "last_modified_by": "Test Author",
      "modified": 1359299758,
      "revision": 1,
      "subject": "",
      "title": "",
      "version": "",
      "slide_count": 4,
      "word_count": 307,
      "image_count": 1,
      "notes": [
        "Speaker notes for slide 1: Introduction to contract update.",
        "Speaker notes for slide 2: Summary of key changes.",
        "Speaker notes for slide 3: Required steps for completion.",
        "Speaker notes for slide 4: Contact information and support."
      ],
      "hyperlinks": [
        "https://test.tracking-domain.example.com/click/https%3A%2F%2Fphishing.example.com%2Flogin/tracking-id-12345#6a6f686e2e646f65406578616d706c652e636f6d"
      ]
    }
  }
}
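
The hyperlink in the sample output shows why extracted links are useful for lure detection: a tracking redirector wraps a percent-encoded destination URL, and the fragment is a hex string. The sketch below is one way an analyst might unpack such a link downstream of the scanner. It is not part of ScanPptx; the helper name `unpack_tracking_url` and the returned keys are assumptions made for this example.

```python
from urllib.parse import unquote, urlsplit


def unpack_tracking_url(url: str) -> dict:
    """Extract an embedded URL and a hex-encoded fragment from a redirector link (hypothetical helper)."""
    parts = urlsplit(url)

    # Look for a path segment that, once percent-decoded, is itself a URL.
    embedded = None
    for segment in parts.path.split("/"):
        decoded = unquote(segment)
        if decoded.startswith(("http://", "https://")):
            embedded = decoded
            break

    # Try to read the fragment as hex-encoded UTF-8 text; leave it as None
    # when the fragment is empty or is not valid hex/UTF-8.
    fragment_text = None
    if parts.fragment:
        try:
            fragment_text = bytes.fromhex(parts.fragment).decode("utf-8")
        except ValueError:
            fragment_text = None

    return {
        "host": parts.hostname,
        "embedded_url": embedded,
        "fragment_text": fragment_text,
    }
```

Applied to the URL in the sample output, this returns `https://phishing.example.com/login` as the embedded URL and `john.doe@example.com` as the decoded fragment.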

Checklist

My code follows the style guidelines of this project
I have performed a self-review of and tested my code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings

socket-security · 2025-12-18T19:02:42Z

Review the following changes in direct dependencies.

Diff	Package	Supply Chain Security	Vulnerability	Quality	Maintenance	License
	python-pptx@1.0.2


notion-workspace · 2025-12-18T19:08:06Z

Create strelka PPTX scanner

MSAdministrator added 5 commits December 18, 2025 12:58

Adding scanner config for new pptx scanner

cdd2464

Updating yara files to support new scanner

caaed50

Adding python-pptx dependency

3fce107

Adding new ScanPptx strelka scanner class

f9ddb2c

Adding tests

10242da

MSAdministrator self-assigned this Dec 18, 2025

MSAdministrator added 3 commits December 18, 2025 15:07

Updating event to make single call and reduce variable allocation

edb2de0

Extracting text from notes

bc83858

Updating tests with related changes

197866d

MSAdministrator changed the title ~~Msadministrator.new.pptx scanner 2~~ Msadministrator.new.pptx scanner Dec 18, 2025

MSAdministrator changed the title ~~Msadministrator.new.pptx scanner~~ New ScanPptx scanner Dec 18, 2025

cameron-dunn-sublime approved these changes Dec 19, 2025


MSAdministrator added 2 commits December 29, 2025 10:03

Switching to use urls instead of hyperlinks

9f08a37

Updating collection of urls

7267c82

MSAdministrator merged commit d146b0a into main Jan 5, 2026
3 checks passed

MSAdministrator deleted the msadministrator.new.pptx_scanner_2 branch January 5, 2026 15:39
