Cannot import recipes from the wayback machine #2630

Open
solonovamax opened this issue Jan 6, 2025 · 0 comments

Failing website:

https://web.archive.org/web/20210921133924/https://www.finecooking.com/recipe/carrot-fingerling-potato-and-pea-ragout

Checking if valid metadata are present:

Yes, I checked the source code of the website and found metadata.

It has two application/ld+json script tags. The second one is the one which has the recipe information:

{
    "@context": "https:\/\/web.archive.org\/web\/20210921133924\/http:\/\/schema.org\/",
    "@type": "Recipe",
    "name": "Carrot, Fingerling Potato, and Pea Rago\u00fbt",
    "author": [
        {
            "@type": "Person",
            "name": "Susie Middleton",
            "url": "https:\/\/web.archive.org\/web\/20210921133924\/https:\/\/www.finecooking.com\/author\/susie-middleton"
        }
    ],
    "datePublished": "2012-03-01",
    "image": "https:\/\/web.archive.org\/web\/20210921133924\/https:\/\/s3.amazonaws.com\/finecooking.s3.tauntonclud.com\/app\/uploads\/2017\/04\/18125930\/051116057-01-spring-vegetable-ragout-thumb1x1.jpg",
    "description": "Hearty caramelized carrots and potatoes are the base for this delicious side dish, while peas, baby spinach, lemon, and tarragon add a bright, fresh twist. Serve with roast chicken or…",
    "recipeYield": " 4 to 6 servings",
    "recipeIngredient": [
        "1 medium lemon",
        "1 tsp. balsamic vinegar",
        "1 tsp. maple syrup",
        "3-1\/2 Tbs. unsalted butter, chilled",
        "2 Tbs. extra-virgin olive oil; more as needed",
        "1-1\/2lb. large carrots, cut into 2-inch-long, 1\/2-inch-thick sticks",
        "Kosher salt",
        "12 oz. small fingerling potatoes, cut in half lengthwise (if longer than 2 inches, cut in half crosswise)",
        "1 cup lower-salt chicken broth or water",
        "1-1\/2 tsp. minced garlic",
        "3 oz. (about 3\/4 cup) fresh peas, blanched, or frozen peas, thawed",
        "2 oz. stemmed baby spinach leaves",
        "2 tsp. chopped fresh tarragon"
    ],
    "recipeInstructions": [
        "Finely grate the lemon to yield 1 tsp. zest and juice it to yield 1-1\/2 tsp. juice. In a small bowl, combine the zest, juice, vinegar, maple syrup, and 1 Tbs. water.",
        "In a 5- to 6-quart Dutch oven (or other deep, wide pan), heat 1 Tbs. of the butter and the olive oil over low heat. Add the carrots and 3\/4 tsp. salt. Cover and cook, stirring frequently but gently, until the carrots are nicely browned and just tender, about 20 minutes. With a slotted spoon, transfer the carrots to a large plate.",
        "Add 1 Tbs. butter to the remaining fat in the pan. (If there\u2019s no fat in the pan, add 1 Tbs. olive oil too.) When the butter has melted, arrange the fingerlings cut side down in a single layer in the pan and season with 3\/4 tsp. salt. Cover partially and cook, undisturbed, until the potatoes are deep golden-brown on the bottom, 5 to 7 minutes. Add the chicken broth or water and bring to a boil; reduce to a simmer and cover partially. Cook until the potatoes are tender and the liquid has reduced to 2 to 3 Tbs., 12 to 14 minutes.",
        "Add the garlic to the potatoes and cook, stirring very gently, until fragrant, about 30 seconds. Add the reserved carrots and the peas, spinach, and lemon juice mixture. Stir gently until the spinach is wilted, 1 to 2 minutes. Remove the pan from the heat and stir in the remaining 1-1\/2 Tbs. butter until just melted. Stir in the tarragon. Transfer the vegetables to a platter and serve."
    ],
    "recipeCategory": "Side dishes",
    "recipeCuisine": "French",
    "nutrition": {
        "@type": "NutritionInformation",
        "servingSize": " 4 to 6",
        "calories": "210 kcal",
        "fatContent": "110 kcal",
        "saturatedFatContent": "5 g",
        "transFatContent": "12 g",
        "carbohydrateContent": "25 g",
        "fiberContent": "5 g",
        "proteinContent": "4 g",
        "cholesterolContent": "20 mg",
        "sodiumContent": "410 mg",
        "unsaturatedFatContent": "6 g"
    },
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": "5",
        "ratingCount": "4"
    },
    "isPartOf": {
        "@type": "PublicationIssue",
        "name": "Issue 116",
        "url": "https:\/\/web.archive.org\/web\/20210921133924\/https:\/\/www.finecooking.com\/issue\/2012\/03\/issue-116",
        "isPartOf": {
            "@type": "Periodical",
            "name": "Fine Cooking Magazine",
            "publisher": {
                "@type": "Organization",
                "name": "Fine Cooking",
                "url": "https:\/\/web.archive.org\/web\/20210921133924\/https:\/\/www.finecooking.com",
                "sameAs": [
                    "https:\/\/web.archive.org\/web\/20210921133924\/https:\/\/twitter.com\/finecooking",
                    "https:\/\/web.archive.org\/web\/20210921133924\/https:\/\/www.facebook.com\/FineCooking",
                    "https:\/\/web.archive.org\/web\/20210921133924\/https:\/\/www.instagram.com\/finecookingmag\/",
                    "https:\/\/web.archive.org\/web\/20210921133924\/https:\/\/www.pinterest.com\/finecooking\/"
                ],
                "logo": {
                    "@type": "ImageObject",
                    "url": "https:\/\/web.archive.org\/web\/20210921133924\/https:\/\/www.finecooking.com\/app\/plugins\/finecooking\/assets\/img\/fc-logo-black.png"
                }
            }
        },
        "issueNumber": "116",
        "image": "https:\/\/web.archive.org\/web\/20210921133924\/https:\/\/s3.amazonaws.com\/finecooking.s3.tauntonclud.com\/app\/uploads\/2017\/04\/18212453\/issue_116.jpg"
    }
}

Cookbook version: 0.11.2

Problem description (if applicable):

Cannot import recipes from archive.org.

This is because archive.org rewrites the HTML and prepends an https://web.archive.org/... URL to every URL in the page.

If you look at the above json-ld metadata, you will notice that "@context": "http:\/\/schema.org\/" has been replaced with "@context": "https:\/\/web.archive.org\/web\/20210921133924\/http:\/\/schema.org\/", which causes this function to return false:

public function isSchemaContext(string $context): bool {
	return preg_match('@^https?://schema\.org/?$@', $context) == 1;
}
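The mismatch can be reproduced in isolation. The following is a minimal sketch, assuming a standalone copy of the function (in the app it is a class method), fed with the two context strings as they look after JSON decoding:

```php
<?php
// Standalone copy of the check quoted above, for illustration only.
function isSchemaContext(string $context): bool {
    return preg_match('@^https?://schema\.org/?$@', $context) === 1;
}

// Context as originally published by the site.
var_dump(isSchemaContext('http://schema.org/'));
// bool(true)

// Context as rewritten by the Wayback Machine.
var_dump(isSchemaContext('https://web.archive.org/web/20210921133924/http://schema.org/'));
// bool(false)
```

The `^` anchor is what rejects the second string, since it no longer starts with the schema.org URL.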

The alternative I would suggest is doing:

public function isSchemaContext(string $context): bool {
	// Accept the plain schema.org context, or the same context behind a Wayback Machine prefix.
	return preg_match('@^https?://schema\.org/?$@', $context) == 1
		|| preg_match('@^https?://web\.archive\.org/web/\d+/https?://schema\.org/?$@', $context) == 1;
}
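The two checks could also be folded into one pattern with an optional, non-capturing prefix group. This is only a sketch: the function name `isSchemaContextWithArchivePrefix` is hypothetical, and it assumes the Wayback Machine prefix always has the form `/web/<digits>/` seen in the example above.

```php
<?php
// Hypothetical standalone variant, not the app's actual method.
function isSchemaContextWithArchivePrefix(string $context): bool {
    // Optional archive prefix, followed by the usual schema.org context.
    $pattern = '@^(?:https?://web\.archive\.org/web/\d+/)?https?://schema\.org/?$@';
    return preg_match($pattern, $context) === 1;
}

var_dump(isSchemaContextWithArchivePrefix('http://schema.org/'));
// bool(true)
var_dump(isSchemaContextWithArchivePrefix('https://web.archive.org/web/20210921133924/http://schema.org/'));
// bool(true)
var_dump(isSchemaContextWithArchivePrefix('https://example.com/http://schema.org/'));
// bool(false)
```

Keeping the `^` anchor means that only the known archive prefix is accepted, and any other leading text is still rejected.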

This will account for the URL being prefixed with https://web.archive.org/...

I have checked

and neither of them modifies the @context property in the JSON-LD. However, there may be other archival sites which do modify it. So perhaps the regex could instead be changed to @https?://schema\.org/?$@, so that it no longer needs to match from the beginning of the string.
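As a rough sketch of that unanchored variant (the function name `endsWithSchemaContext` is hypothetical and exists only for this illustration):

```php
<?php
// Hypothetical standalone variant using the unanchored pattern.
function endsWithSchemaContext(string $context): bool {
    // No leading ^: the schema.org URL only has to sit at the end of the string.
    return preg_match('@https?://schema\.org/?$@', $context) === 1;
}

var_dump(endsWithSchemaContext('http://schema.org/'));
// bool(true)
var_dump(endsWithSchemaContext('https://web.archive.org/web/20210921133924/http://schema.org/'));
// bool(true)
var_dump(endsWithSchemaContext('https://example.com/anything/http://schema.org/'));
// bool(true)
var_dump(endsWithSchemaContext('http://schema.org/Recipe'));
// bool(false)
```

The trade-off is that any string ending in a schema.org URL is accepted, whatever precedes it. That covers unknown archival sites, but it is looser than matching a specific prefix.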
