Add offline usage #4

Open
jolheiser wants to merge 1 commits from offline into master
Owner

Fixes #3

However, I have to admit it's not very fast, and I'm not entirely sure of the best way to speed it up.

It took ~50s to get through the entire file (~25GB decompressed) and report that a password had no breaches.

Obviously mileage will vary depending on whether it finds the password early/late in the file, which version of the text file is downloaded, etc.

@techknowlogick if you have any suggestions I'd like to hear them.

jolheiser added 1 commit 2020-09-07 22:53:24 +00:00
Signed-off-by: jolheiser <john.olheiser@gmail.com>
First-time contributor

> It took ~50s to get through the entire file (~25GB decompressed) and report that a password had no breaches.

Yikes, this function likely wouldn't be used in the real world if it always took that long.

I wonder if there is an optimized way of looking up a specific hash. A couple of ways off the top of my head:

One would be to pre-process the file and break it up into many files, each named after the first 5 chars of the hashes it contains. That would double the disk space required, though, as the original file would still be there (we can't rm it, as that would be unexpected for the user).

Another would be to jump to arbitrary parts of the file and do something like a binary search: is the hash in the first half of the file or the second half? If it's in the second half, split that in two and keep splitting and looking that way.

Author
Owner

The only problem with that is that the user would then need to download a specific version of the archive, as HIBP releases two formats.

Sorted by hash or sorted by prevalence.

I think #5 might be a better option for the self-hosted users.


I wonder if it would make sense to have a `convert` command that loads the file into a Bolt DB, and then search against that.
I'm not sure how feasible that is, though.

First-time contributor

> Sorted by hash or sorted by prevalence.

Hmm... perhaps a test of the file could be run first (check if the file is sorted by hash) and fail if it isn't sorted?

Reference: jolheiser/pwn#4