Adding option to match without replacement. #20

Open
wants to merge 4 commits into
base: master

@tlooden tlooden commented Aug 7, 2019

Hi Ben,

Thanks for making this nice tool! If you like, I've implemented a new feature that is common in, e.g., R packages for the same purpose: the option to match without replacement. The downsides can be slightly worse matching overall, as well as possible order effects. However, for some types of analyses you really want unique subjects in each group. Now the user can make that decision! :)

I've also implemented (line 189) a randomization of the order in which the matching proceeds. This lets you check for those ordering effects and, e.g., run it a couple of times until the matching reaches a desirable level.
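To illustrate the idea described above, here is a minimal sketch of greedy 1:1 matching without replacement, with a shuffled processing order. It is not the code from this pull request or from pymatch: the function name `match_without_replacement`, its parameters, and the example scores are all hypothetical, and the threshold is assumed to be a maximum allowed score difference.

```python
import random


def match_without_replacement(case_scores, control_scores, threshold=0.05, seed=None):
    """Greedy 1:1 nearest-score matching; each control is used at most once.

    Hypothetical helper for illustration only (not the pymatch API).
    Returns a dict mapping case index -> matched control index.
    """
    rng = random.Random(seed)

    # Shuffle the order in which cases are processed, so that repeated runs
    # with different seeds can reveal order effects.
    order = list(range(len(case_scores)))
    rng.shuffle(order)

    available = set(range(len(control_scores)))  # controls not yet used
    pairs = {}
    for i in order:
        if not available:
            break  # no controls left to assign
        # Closest still-unused control; ties are broken by the lower index.
        j = min(available, key=lambda k: (abs(control_scores[k] - case_scores[i]), k))
        if abs(control_scores[j] - case_scores[i]) <= threshold:
            pairs[i] = j
            available.remove(j)  # "without replacement": never reuse j
    return pairs


# Made-up propensity scores, purely for demonstration.
cases = [0.30, 0.31, 0.70]
controls = [0.29, 0.33, 0.72, 0.90]
pairs = match_without_replacement(cases, controls, threshold=0.05, seed=0)
print(pairs)
```

Because each control is removed once matched, a case processed later may get a worse match or none at all, which is the order effect mentioned above. Rerunning with different `seed` values is one way to see how much the result depends on that order.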

Please let me know if I can make anything clearer.
It's my first GH pull request, so I hope I am following the right protocol.

All the best!

Tristan

…d body

changed 'matchtype' to 'with_replacement' in matcher function body
Before, the 'threshold' parameter was only referenced for method='random'.
skjerns commented May 17, 2020

Same as #13: this somehow does not work correctly, or did I do something wrong?

from pymatch.Matcher import Matcher
import pandas as pd

cases_ages = [23, 21, 26, 25, 23, 44, 24, 22, 46, 26]
controls_ages = [34, 30, 24, 25, 25, 27, 30, 33, 53, 27, 26, 28, 23, 23, 28, 23, 24, 22, 23, 25]
cases_group = [1 for _ in range(len(cases_ages))]
# note: this uses len(cases_ages), so only 10 control labels are created
controls_group = [0 for _ in range(len(cases_ages))]

df_cases = pd.DataFrame(list(zip(cases_ages, cases_group )), columns=['age', 'group'])
df_controls = pd.DataFrame(list(zip(controls_ages, controls_group )), columns=['age', 'group'])

m = Matcher(df_cases , df_controls , yvar='group')
m.fit_scores(balance=True, nmodels=100)
m.match(method='min', nmatches=1, with_replacement=False)
print(m.matched_data)
# only 4 matches are found?
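One detail in the snippet above is worth checking before blaming the matcher: `controls_group` is built with `len(cases_ages)`, and `zip()` stops at the shorter input, so only the first 10 of the 20 control ages reach `df_controls`. The sketch below shows just that effect in plain Python. The names `rows_as_posted`, `controls_group_fixed`, and `rows_fixed` are introduced here for illustration, and whether this accounts for the 4 matches is not established in this thread.

```python
cases_ages = [23, 21, 26, 25, 23, 44, 24, 22, 46, 26]
controls_ages = [34, 30, 24, 25, 25, 27, 30, 33, 53, 27,
                 26, 28, 23, 23, 28, 23, 24, 22, 23, 25]

# As posted: the label list is sized from the cases, not the controls.
controls_group = [0 for _ in range(len(cases_ages))]
rows_as_posted = list(zip(controls_ages, controls_group))

# Sizing the labels from the controls keeps every control row.
controls_group_fixed = [0] * len(controls_ages)
rows_fixed = list(zip(controls_ages, controls_group_fixed))

# zip() silently truncates to the shorter of its inputs.
print(len(rows_as_posted), len(rows_fixed))
```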

@harveyaa

@tlooden Thank you for this feature :)
