Skip to content

Conversation

@Pandapip1
Copy link
Contributor

Adds support for passing multiple IPs via --ip, adds :: as a default IP that's used, and adds prompts that ask whether to add v4 or v6 rules.

@StevenBlack
Copy link
Owner

Thank you for this Gavin @Pandapip1.

Why is there so much force-pushing going on in this PR? I can't seem to follow this at all.

What's going on here? Why is github-advanced-security having conniptions?

¯_(ツ)_/¯

@funilrys
Copy link
Contributor

funilrys commented Aug 10, 2025

Hey @Pandapip1 and thank you for taking the time.

Please do not take the following comment too personally. It's just my 2¢ on my first rough and quick skim.

I may review a bit deeper later when I have more time ...


I have the feeling a lot of these changes are overly optimistic. Some of them don’t seem to take into account that humans write and maintain these sources.

For example — there’s a reason why this section exists.
It’s the result of many misleading rules coming from upstream sources over time. That history or background may matters.


Another thing: this PR feels like it lacks some real-world background.

Take this function:
https://github.com/StevenBlack/hosts/pull/2945/files#diff-0666db8e6c944e3c08fdc8f42f9bf8d85edd3aca630d2fa08d8adb0ebd677f02R1170-R1199

It "forgets" that in the real world, underscores (_) are perfectly possible — and often seen — starting from the 3rd level domain. It’s not in the RFC, but it exists and works. And if you account for public suffixes, you’ll sometimes see underscores in the 2nd level too.

Some threats use that "trick" to bypass exactly this kind of validation and get whitelisted or ignored.
I don’t recommend performing domain validation (cf: is_valid_hostname) checks here at all. Let upstream handle it, like we’ve been doing so far - if I recall correctly (cf: Steve @StevenBlack can tell).


Domain validation is a weird world — a niche that demands real-world experience. And I know what I’m talking about…
Steve @StevenBlack already knows, but for context: I’m the maintainer of PyFunceble and have spent years analyzing how a domain should look (RFC) versus how they actually look, are used, and behave in the real world.

I would never recommend going down the domain-validation rabbit hole in a project this large and impactful with such a minimalistic function that misses real-world threats.
That path comes with a lot of responsibility… and a lot of edge cases that will bite you sooner or later.

Let’s leave that responsibility to upstream sources - they know better anyway.


Again: Just my 2¢ on my first rough and quick skim.

@Pandapip1
Copy link
Contributor Author

Why is there so much force-pushing going on in this PR? I can't seem to follow this at all.

I try to commit often, and then I git rebase -i any fixes into their respective commits. The git history itself is clean with each commit doing only what it says on the tin.

What's going on here? Why is github-advanced-security having conniptions?

There was some file-related stuff I messed up, thought I fixed, and had to re-fix (that was at least two of the force-pushes). I ended up switching to the with pattern as that makes it near impossible to mess up.

For example — there’s a reason why this section exists. It’s the result of many misleading rules coming from upstream sources over time. That history or background may matters.

That wasn't clear from the comments. I'm happy to preserve the existing behavior though (it can definitely still be cleaned up significantly).

It "forgets" that in the real world, underscores (_) are perfectly possible — and often seen — starting from the 3rd level domain. It’s not in the RFC, but it exists and works. And if you account for public suffixes, you’ll sometimes see underscores in the 2nd level too.

Yea. When I'm used to working with something unfamiliar to me, I usually go straight for the docs (in this instance, that would be the RFC). I thought I had a good sense of what the code was trying to do and what made a valid domain--I wish this were documented better! It wasn't so much forgetting as never really knowing that in the first place. I looked to see if python had a built-in DNS parsing library like it did for IPs (it doesn't).

I’m the maintainer of PyFunceble and have spent years analyzing how a domain should look (RFC) versus how they actually look, are used, and behave in the real world.

This is outside of the scope of this PR, but @StevenBlack would you accept me adding this as a dependency in a follow-up PR to use this for validation? This looks like pretty much exactly what the project needs, and means that you aren't maintaining a homegrown validator.


Anyway, I'll continue working on this. I'll continue to do force pushes to keep a clean git history (I generally try to make every commit implement its change and be fully functional, so that e.g. git bisect works well).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants