72

backstory:

You enjoy your new programming job at a mega-corporation. However, you aren't allowed to browse the web, since your computer only has a CLI. They also run sweeps of all employees' hard drives, so you can't simply download a large CLI web browser. You decide to make a plain-text browser that is as small as possible, so that you can memorize it and type it into a temporary file every day.

Challenge:

Your task is to create a golfed web browser within a command-line interface. It should:

Take a single URL in via args or stdin
Split the directory and host components of the URL
Send a simple HTTP request to the host to request the said directory
Print the contents of any <p> paragraph </p> tags
And either exit or ask for another page
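
The host/directory split in the second step can be sketched in a few lines. This is a minimal, ungolfed Python sketch and not any answer's method; the name `split_url` is hypothetical, and it assumes the URL carries at most an `http://`-style prefix, with no port number or credentials.

```python
def split_url(url):
    """Split a URL into (host, path) using plain string operations."""
    # Drop an optional scheme prefix such as "http://".
    rest = url.split("//", 1)[-1]
    # Everything before the first "/" is the host; the remainder is the path.
    host, _, path = rest.partition("/")
    # A bare host with no path maps to the root directory "/".
    return host, "/" + path


print(split_url("http://example.com/a/b.html"))
print(split_url("example.com"))
```

Returning "/" for a missing path mirrors what several answers below do by always sending `GET /...`.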

More info:

A simple HTTP request looks like this:

GET {{path}} HTTP/1.1
Host: {{host}}
Connection: close
\n\n

Emphasis on the ending newlines.
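
As a rough illustration, the request above can be assembled as a byte string like this. The helper name `build_request` is hypothetical. One assumption differs from the template: the sketch ends lines with CRLF, which is what the HTTP/1.1 syntax specifies, whereas the template writes plain `\n`.

```python
def build_request(host, path="/"):
    """Build the simple GET request from the template as ASCII bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # the two empty entries produce the blank line
        "",  # that terminates the header section
    ]
    return "\r\n".join(lines).encode("ascii")


print(build_request("example.com", "/"))
```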

A typical response looks like:

HTTP/1.1 200 OK\n
<some headers separated by newlines>
\n\n
<html>
....rest of page
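
Given that layout, separating the status line and headers from the page body amounts to splitting at the first blank line. The following is a small sketch under stated assumptions: `split_response` is a hypothetical name, the whole response is already in memory, and both CRLF and bare LF line endings are accepted.

```python
import re


def split_response(raw):
    """Return (status_line, body) from a raw HTTP response given as bytes."""
    # latin-1 maps every byte to one character, so decoding cannot fail.
    text = raw.decode("latin-1")
    # The header section ends at the first blank line (CRLF or bare LF).
    parts = re.split(r"\r?\n\r?\n", text, maxsplit=1)
    head = parts[0]
    body = parts[1] if len(parts) > 1 else ""
    status_line = head.splitlines()[0] if head else ""
    return status_line, body


status, body = split_response(
    b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html><p>hi</p></html>"
)
print(status)
print(body)
```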

Rules:

It only needs to work on port 80 (no SSL needed)
You may not use netcat
Whatever programming language is used, only low-level TCP APIs are allowed (except netcat)
You may not use a GUI; remember, it's a CLI
You may not use HTML parsers, except built-in ones (BeautifulSoup is not a built-in)
Bonus!! If your program loops back and asks for another URL instead of exiting, -40 chars (as long as you don't use recursion)
No third-party programs. Remember, you can't install anything.
code-golf, so the shortest byte count wins

code-golf parsing internet

— TheDoctor
Source

7

Python: import webbrowser;webbrowser.open(url)

— Blue

8

@muddyfish read the rules

— TheDoctor

4

Can you provide a sample page of some sort for testing this? It's difficult to find places that use <p> :P

— Spaghetti

52

Are we allowed to parse HTML using regex? ;-)

— Digital Trauma

3

The restriction to a low-level socket interface seems to prohibit the TCP-level APIs of most languages that have TCP-level APIs.

— Peter Taylor

63

Pure Bash (no utilities), 200 bytes - 40 bonus = 160

while read u;do
u=${u#*//}
d=${u%%/*}
exec 3<>/dev/tcp/$d/80
echo "GET /${u#*/} HTTP/1.1
host:$d
Connection:close
">&3
mapfile -tu3 A
a=${A[@]}
a=${a#*<p>}
a=${a%</p>*}
echo "${a//<\/p>*<p>/"
"}"
done

I think this is up to the spec, ~~though of course watch out for parsing HTML using regex~~ I think the only thing worse than parsing HTML using regex is parsing HTML using shell pattern matching.

This now deals with <p>...</p> spanning multiple lines. Each <p>...</p> is on a separate line of output:

$ echo "http://example.com/" | ./smallbrowse.sh
This domain is established to be used for illustrative examples in documents. You may use this     domain in examples without prior coordination or asking for permission.
<a href="http://www.iana.org/domains/example">More information...</a>
$
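
For comparison, the same "contents of each <p>...</p>, even across lines" extraction can be written with a regular expression. This is a Python sketch rather than a translation of the shell code; `paragraphs` is a hypothetical name, and like the answer it ignores nested tags and `<p>` tags that carry attributes.

```python
import re


def paragraphs(html):
    """Return the text between each <p> and the next </p>."""
    # DOTALL lets "." match newlines, so a paragraph may span several lines.
    # The non-greedy ".*?" stops at the first closing tag.
    return re.findall(r"<p>(.*?)</p>", html, re.DOTALL | re.IGNORECASE)


page = "<body><p>one\ntwo</p><div><P>three</P></div></body>"
for text in paragraphs(page):
    print(text)
```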

— Digital Trauma
Source

35

You need to have this memorized by tomorrow.

— Conor O'Brien

14

+∞ for "parsing HTML using shell pattern matching"

— SztupY

76

-1 because your avatar is subliminal messaging

— TheDoctor

1

...you can make TCP connections from Bash? Now I am truly terrified!

— MathematicalOrchid

2

Note: /dev/tcp is an optional extension and may not be present in your build of Bash. You need to compile with --enable-net-redirections to have it.

— Chris Down

21

PHP, 175 bytes (215 - 40 bonus) 227 229 239 202 216 186 bytes

Have fun browsing the web:

for(;$i=parse_url(trim(fgets(STDIN))),fwrite($f=fsockopen($h=$i[host],80),"GET $i[path] HTTP/1.1
Host:$h
Connection:Close

");preg_match_all('!<p>(.+?)</p>!si',stream_get_contents($f),$r),print join("
",$r[1])."
");

Reads URLs from STDIN, e.g. http://www.example.com/. Outputs paragraphs separated by a newline "\n".

Ungolfed

for(; $i=parse_url(trim(fgets(STDIN))); ) {
    $h = $i['host'];
    $f = fsockopen($h, 80);

    fwrite($f, "GET " . $i['path'] . " HTTP/1.1\nHost:" . $h . "\nConnection:Close\n\n");

    $c = stream_get_contents($f);

    preg_match_all('!<p>(.+?)</p>!si', $c, $r);
    echo join("\n", $r[1]) . "\n";
}

The first version supported only one URL:

$i=parse_url($argv[1]);fwrite($f=fsockopen($h=$i[host],80),"GET $i[path] HTTP/1.1\nHost:$h\nConnection:Close\n\n");while(!feof($f))$c.=fgets($f);preg_match_all('!<p>(.+?)</p>!sim',$c,$r);foreach($r[1]as$p)echo"$p\n";

Edits

As pointed out in the comments by Braintist, I forgot to include the path. That's fixed now, thanks. Added 30 bytes.
Saved 3 bytes by resetting $c (which holds the page content) with $c=$i=parse_url(trim(fgets(STDIN))); instead of $c=''.
Saved 12 bytes by replacing \n with real newlines (5 bytes), one while-loop with for (2 bytes), placing nearly everything into the expressions of the for (2 bytes), and replacing foreach with join (3 bytes). Thanks to Blackhole.
Saved 3 bytes by replacing fgets with stream_get_contents. Thanks to bwoebi.
Saved 5 bytes by removing ~~the re-initialization of $c, as it isn't needed anymore~~ $c altogether.
Saved 1 byte by removing the pattern modifier m from the regex. Thanks to manatwork.

— insertusernamehere
Source

6

Related: stackoverflow.com/a/1732454/4766556

— Spaghetti

1

@manatwork Oh, I totally missed that. :D Thanks, it's fixed now.

— insertusernamehere

1

I can't stand that Perl beats PHP, so don't forget: while is forbidden when golfing (for is often shorter, never longer), and to get a newline, just press Enter (1 byte instead of 2 for \n)! Here is your (untested) code golfed a little more (227 bytes), with each newline replaced by ↵:

for(;$c=$i=parse_url(trim(fgets(STDIN))),fwrite($f=fsockopen($h=$i[host],80),"GET $i[path] HTTP/1.1↵Host:$h↵Connection:Close↵↵");preg_match_all('!<p>(.+?)</p>!sim',$c,$r),print join('↵',$r[1]).'↵')for(;!feof($f);)$c.=fgets($f);

— Blackhole
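
The one-byte saving per newline mentioned in this comment can be checked by counting the bytes of the source text itself. A small Python sketch, using a made-up three-character string in place of the real request template:

```python
# Source text of a string literal that uses the two-character escape \n.
escaped_source = r'"a\nb"'
# Source text of the same literal with a real line break between the quotes.
literal_source = '"a\nb"'

print(len(escaped_source.encode("ascii")))  # 6 bytes of source
print(len(literal_source.encode("ascii")))  # 5 bytes of source
```

The request template in the PHP answer contains four line breaks, so the saving there is four bytes per copy of the template.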

1

I don't mean "forbidden" as in "against the rules"; I just mean that it's of no use, since a for-loop is always better than a while-loop ;)

— Blackhole

1

@MichaelDibbets Actually, I already did that, as written in the edits. Hm. Let me see. Haha, I forgot to copy and count the final snippet. Duh :D Things like that happen if you update your code before breakfast. Thanks for pointing it out.

— insertusernamehere

14

Perl, 132 bytes

155 bytes of code + 17 for -ln -MIO::Socket - 40 for continually asking for URLs

As with @DigitalTrauma's answer, this parses HTML with regex; let me know if that's not acceptable. It no longer keeps parsing URLs... I'll look at that later... Close to Bash, though! Big thanks to @Schwern for saving me 59 (!) bytes and to @skmrx for fixing the bug so the bonus can be claimed!

m|(http://)?([^/]+)(/(\S*))?|;$s=new IO::Socket::INET"$2:80";print$s "GET /$4 HTTP/1.1
Host:$2
Connection:close

";local$/=$,;print<$s>=~m|<p>(.+?)</p>|gs

Usage

$perl -ln -MIO::Socket -M5.010 wb.pl 
example.com
This domain is established to be used for illustrative examples in documents. You may use this
    domain in examples without prior coordination or asking for permission.<a href="http://www.iana.org/domains/example">More information...</a>
example.org
This domain is established to be used for illustrative examples in documents. You may use this
    domain in examples without prior coordination or asking for permission.<a href="http://www.iana.org/domains/example">More information...</a>

— Dom Hastings
Source

I fixed a bug and shortened the code by removing the need to declare $h and $p or to have a default path. It also no longer requires a trailing / on the host.

— Schwern

1

We're the winner for now. :)

— Schwern

I think I'm done for the night. :)

— Schwern

Since the script asks for another URL instead of exiting, you can claim an additional -40 bytes.

— svsd

1

@DigitalTrauma you are indeed correct! I've claimed the bonus thanks to skmrx fixing my bug with '$/', and I wouldn't be anywhere near yours if it weren't for Schwern!

— Dom Hastings

13

PowerShell, 315 294 268 262 254 bytes

355 334 308 302 294 - 40 for the prompt

$u=[uri]$args[0]
for(){
$h=$u.Host
$s=[Net.Sockets.TcpClient]::new($h,80).GetStream()
$r=[IO.StreamReader]::new($s)
$w=[IO.StreamWriter]::new($s)
$w.Write("GET $($u.PathAndQuery) HTTP/1.1
HOST: $h

")
$w.Flush()
($r.ReadToEnd()|sls '(?s)(?<=<p>).+?(?=</p>)'-a).Matches.Value
[uri]$u=Read-Host
}

Requires PowerShell v5

All line endings (including the ones embedded in the string) are newlines only, \n (thanks Blackhole), which PowerShell fully supports (but if you are testing, be careful: the ISE uses \r\n).

— briantist
Source

4

+1 for making my server admin duties appear much more productive

— thanby

HTTP requires CRLF, not LF! [HTTPSYNTAX]

— toothbrush

2

@toothbrush Ha! Point taken, although the tolerance provision seems to be in full effect. Clearly this task is about what works, not what's correct (otherwise we wouldn't be parsing HTML with regex and using low-level TCP libraries instead of well-tested existing libraries).

— briantist

1

@briantist greenbytes.de/tech/webdav/rfc7230.html#rfc.section.3.5 says that "a recipient MAY recognize a single LF as a line terminator and ignore any preceding CR". I read that as meaning that most web servers would implement it, and the question definitely doesn't say it must generate a correct GET request... :)

— toothbrush
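
The tolerance rule quoted above is simple to express in code. This is a hedged Python sketch of what a lenient recipient might do, not how any particular server is implemented; `split_lines_tolerant` is a hypothetical name.

```python
def split_lines_tolerant(data):
    """Split raw bytes into lines, accepting both CRLF and bare LF."""
    # Treat every LF as a line terminator, then drop any CR just before it.
    return [line.rstrip(b"\r") for line in data.split(b"\n")]


request = b"GET / HTTP/1.1\r\nHost: example.com\nConnection: close\r\n\n"
for line in split_lines_tolerant(request):
    print(line)
```

With this parsing, a request that mixes CRLF and LF endings yields the same header lines as a strictly CRLF-terminated one, which is why the LF-only answers tend to work in practice.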

8

Groovy script, 89, 61 bytes

Looping back for the bonus: 101 - 40 = 61

System.in.eachLine{l->l.toURL().text.findAll(/<p>(?s)(.*?)<\/p>/).each{println it[3..it.length()-5]}}

With just args, 89 bytes

this.args[0].toURL().text.findAll(/<p>(?s)(.*?)<\/p>/).each{println it[3..it.length()-5]}

— rnet
Source

1

Groovy has out-golfed everyone. As it should be.

— Spaghetti

1

@quartata If it stays that way, it'll be the first time ever... ;)

— Geobits

11

"only low-level TCP APIs are allowed"

— Digital Trauma

Yeah, I'm going to agree with @DigitalTrauma that this isn't using a low-level TCP API. The rules state that you have to split the host and path on your own.

— TheDoctor

6

Bash (might be cheating, but seems to be within the rules) 144-40 = 105

while read a;do
u=${a#*//}
d=${u%%/*}
e=www.w3.org
exec 3<>/dev/tcp/$e/80
echo "GET /services/html2txt?url=$a HTTP/1.1
Host:$e
">&3
cat <&3
done

Thanks to Digital Trauma.

Since I don't need to split the URL, this also works: 122-40 = 82

while read a;do
d=www.w3.org
exec 3<>/dev/tcp/$d/80
echo "GET /services/html2txt?url=$a HTTP/1.1
Host:$d
">&3   
cat <&3
done

— philcolbourn
Source

8

I would argue that using this online html2txt converter is a standard loophole.

— Digital Trauma

1

Yes. And I also use cat, so your solution is safe.

— philcolbourn

5

C, 512 bytes

#include <netdb.h>
int main(){char i,S[999],b[99],*p,s=socket(2,1,0),*m[]={"<p>","</p>"};long n;
gets(S);p=strchr(S,'/');*p++=0;struct sockaddr_in a={0,2,5<<12};memcpy(&a.
sin_addr,gethostbyname(S)->h_addr,4);connect(s,&a,16);send(s,b,sprintf(b,
"GET /%s HTTP/1.0\r\nHost:%s\r\nAccept:*/*\r\nConnection:close\r\n\r\n",p,S),0);
p=m[i=0];while((n=recv(s,b,98,0))>0)for(char*c=b;c<b+n;c++){while(*c==*p &&*++p)
c++;if(!*p)p=m[(i=!i)||puts("")];else{while(p>m[i]){if(i)putchar(c[m[i]-p]);p--;}
if(i)putchar(*c);}}}

Based loosely on my entry here, it takes the web address without a leading "https://". It will not handle nested <p> pairs correctly :(

Tested extensively on www.w3.org/People/Berners-Lee/
It works when compiled with Apple LLVM version 6.1.0 (clang-602.0.53) / Target: x86_64-apple-darwin14.1.1
It has enough undefined behavior that it may not work anywhere else.

— AShelly
Source

I was going down roughly the same track (this segfaults when compiled with gcc), but it should be possible to get under 400 bytes in C. Not sure about clang, but you shouldn't have to declare the return type of main. You can also remove the include and "access" the structs as integer arrays instead. I've also been getting responses with "GET /%s HTTP/1.1\r\n\r\n", but mileage on that may vary depending on the site...

— Comintern

5

Ruby, 118

147 bytes source; 11 bytes ' -lprsocket'; -40 bytes for looping

*_,h,p=$_.split'/',4
$_=(TCPSocket.new(h,80)<<"GET /#{p} HTTP/1.1
Host:#{h}
Connection:close

").read.gsub(/((\A|<\/p>).*?)?(<p>|\Z)/mi,'
').strip

Example use:

$ ruby -lprsocket wb.rb
http://example.org/
This domain is established to be used for illustrative examples in documents. You may use this
    domain in examples without prior coordination or asking for permission.
<a href="http://www.iana.org/domains/example">More information...</a>
http://www.xkcd.com/1596/
Warning: this comic occasionally contains strong language (which may be unsuitable for children), unusual humor (which may be unsuitable for adults), and advanced mathematics (which may be unsuitable for liberal-arts majors).

This work is licensed under a
<a href="http://creativecommons.org/licenses/by-nc/2.5/">Creative Commons Attribution-NonCommercial 2.5 License</a>.


This means you're free to copy and share these comics (but not to sell them). <a rel="license" href="/license.html">More details</a>.

— ezrast
Source

4

AutoIt, 347 bytes

Func _($0)
$4=StringTrimLeft
$0=$4($0,7)
$3=StringSplit($0,"/")[1]
TCPStartup()
$2=TCPConnect(TCPNameToIP($3),80)
TCPSend($2,'GET /'&$4($0,StringLen($3))&' HTTP/1.1'&@LF&'Host: '&$3&@LF&'Connection: close'&@LF&@LF)
$1=''
Do
$1&=TCPRecv($2,1)
Until @extended
For $5 In StringRegExp($1,"(?s)\Q<p>\E(.*?)(?=\Q</p>\E)",3)
ConsoleWrite($5)
Next
EndFunc

Testing

Input:

_('http://www.autoitscript.com')

Output:

You don't have permission to access /error/noindex.html
on this server.

Input:

_('http://www.autoitscript.com/site')

Output:

The document has moved <a href="https://www.autoitscript.com/site">here</a>.

Remarks

Doesn't support nested <p> tags
Supports only <p> tags (case-insensitive); will break on every other tag format
~~Panics~~ Loops indefinitely when any error occurs

— mınxomaτ
Source

4

C#, 727 bytes - 40 = 687 bytes

using System.Text.RegularExpressions;class P{static void Main(){a:var i=System.Console.ReadLine();if(i.StartsWith("http://"))i=i.Substring(7);string p="/",h=i;var l=i.IndexOf(p);
if(l>0){h=i.Substring(0,l);p=i.Substring(l,i.Length-l);}var c=new System.Net.Sockets.TcpClient(h,80);var e=System.Text.Encoding.ASCII;var d=e.GetBytes("GET "+p+@" HTTP/1.1
Host: "+h+@"
Connection: close

");var s=c.GetStream();s.Write(d,0,d.Length);byte[]b=new byte[256],o;var m=new System.IO.MemoryStream();while(true){var r=s.Read(b,0,b.Length);if(r<=0){o=m.ToArray();break;}m.Write(b,0,r);}foreach (Match x in new Regex("<p>(.+?)</p>",RegexOptions.Singleline).Matches(e.GetString(o)))System.Console.WriteLine(x.Groups[1].Value);goto a;}}

It takes a little bit of training, but it's certainly memorable :)

Here is an ungolfed version:

using System.Text.RegularExpressions;
class P
{
    static void Main()
    {
    a:
        var input = System.Console.ReadLine();
        if (input.StartsWith("http://")) input = input.Substring(7);
        string path = "/", hostname = input;
        var firstSlashIndex = input.IndexOf(path);
        if (firstSlashIndex > 0)
        {
            hostname = input.Substring(0, firstSlashIndex);
            path = input.Substring(firstSlashIndex, input.Length - firstSlashIndex);
        }
        var tcpClient = new System.Net.Sockets.TcpClient(hostname, 80);
        var asciiEncoding = System.Text.Encoding.ASCII;
        var dataToSend = asciiEncoding.GetBytes("GET " + path + @" HTTP/1.1
Host: " + hostname + @"
Connection: close

");
        var stream = tcpClient.GetStream();
        stream.Write(dataToSend, 0, dataToSend.Length);
        byte[] buff = new byte[256], output;
        var ms = new System.IO.MemoryStream();
        while (true)
        {
            var numberOfBytesRead = stream.Read(buff, 0, buff.Length);
            if (numberOfBytesRead <= 0)
            {
                output = ms.ToArray();
                break;
            }
            ms.Write(buff, 0, numberOfBytesRead);
        }
        foreach (Match match in new Regex("<p>(.+?)</p>", RegexOptions.Singleline).Matches(asciiEncoding.GetString(output)))
        {
            System.Console.WriteLine(match.Groups[1].Value);
        }
        // Jump back only after every paragraph has been printed, as in the golfed version.
        goto a;
    }
}

As you can see, there are memory leak issues as a bonus :)

— Stephan Schinkel
Source

Where is the memory leak? I see no using statements around the streams, but that doesn't make a leak.

— Gusdor

You can trim a few more bytes: input = input.trimStart("http://") will replace the "if" clause, and you should be able to use System.Text.Encoding.ASCII.GetBytes() directly, without having to store it in asciiEncoding first. I think you'd even come out ahead with a "using System;" line and getting rid of a handful of "System."s.

— minnmass

3

JavaScript (NodeJS) - 187 166

s=require("net").connect(80,p=process.argv[2],_=>s.write("GET / HTTP/1.0\nHost: "+p+"\n\n")&s.on("data",d=>(d+"").replace(/<p>([^]+?)<\/p>/g,(_,g)=>console.log(g))));

187:

s=require("net").connect(80,p=process.argv[2],_=>s.write("GET / HTTP/1.1\nHost: "+p+"\nConnection: close\n\n")&s.on("data",d=>(d+"").replace(/<p>([^]+?)<\/p>/gm,(_,g)=>console.log(g))));

Usage:

node file.js www.example.com

Or formatted

var url = process.argv[2];
s=require("net").connect(80, url ,_=> {
     s.write("GET / HTTP/1.1\nHost: "+url+"\nConnection: close\n\n");
     s.on("data",d=>(d+"").replace(/<p>([^]+?)<\/p>/gm,(_,g)=>console.log(g)))
});

— Benjamin Gruenbaum
Source

1

Caveat: this will work for small pages; bigger pages emit multiple data events.

— Benjamin Gruenbaum
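
The caveat about multiple data events applies to any stream socket: one read may return only part of the page, so a `<p>...</p>` pair can straddle two chunks. A common remedy is to accumulate chunks until the peer closes the connection and only then search for paragraphs. Below is a minimal Python sketch; `read_all` is a hypothetical helper, and the demo uses a local socket pair in place of a real web server.

```python
import socket


def read_all(sock, chunk_size=4096):
    """Read from a connected socket until the peer closes it."""
    chunks = []
    while True:
        chunk = sock.recv(chunk_size)
        if not chunk:  # an empty read means the peer closed the connection
            break
        chunks.append(chunk)
    return b"".join(chunks)


# Demo: a paragraph delivered in pieces over a local socket pair.
sender, receiver = socket.socketpair()
sender.sendall(b"<p>first half, ")
sender.sendall(b"second half</p>")
sender.close()
data = read_all(receiver, chunk_size=8)
receiver.close()
print(data)
```

This relies on the server closing the connection after the response, which is what the `Connection: close` header in the request template asks for.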

3

Python 2 - 212 209 bytes

import socket,re
h,_,d=raw_input().partition('/')
s=socket.create_connection((h,80))
s.sendall('GET /%s HTTP/1.1\nHost:%s\n\n'%(d,h))
p=''
while h:h=s.recv(9);p+=h
for g in re.findall('<p>(.*?)</p>',p):print g

— Zac Crites
Source

You can save two bytes by stripping the whitespace after the colon in while h: and before print g.

— Skyler

And another byte with 'GET /%s HTTP/1.1\nHost:%s\n\n'.

— Cees Timmerman

3

Python 2, 187 - 40 = 147 (141 in a REPL)

A compressed and looped version of Zac's answer:

import socket,re
while 1:h,_,d=raw_input().partition('/');s=socket.create_connection((h,80));s.sendall('GET /%s HTTP/1.1\nHost:%s\n\n'%(d,h));print re.findall('<p>(.*?)</p>',s.recv(9000))

Example:

dictionary.com
['The document has moved <a href="http://dictionary.reference.com/">here</a>.']
dictionary.reference.com
[]
paragraph.com
[]
rare.com
[]

More useful is:

207 - 40 = 167

import socket,re
while 1:h,_,d=raw_input().partition('/');s=socket.create_connection((h,80));s.sendall('GET /%s HTTP/1.1\nHost:%s\n\n'%(d,h));print'\n'.join(re.findall('<p>(.*?)</p>',s.recv(9000),re.DOTALL))

Example:

example.org
This domain is established to be used for illustrative examples in documents. You may use this
    domain in examples without prior coordination or asking for permission.
<a href="http://www.iana.org/domains/example">More information...</a>
www.iana.org/domains/example
The document has moved <a href="/domains/reserved">here</a>.
www.iana.org/domains/reserved

dictionary.com
The document has moved <a href="http://dictionary.reference.com/">here</a>.
dictionary.reference.com

catb.org

      <a href="http://validator.w3.org/check/referer"><img
          src="http://www.w3.org/Icons/valid-xhtml10"
          alt="Valid XHTML 1.0!" height="31" width="88" /></a>

This is catb.org, named after (the) Cathedral and the Bazaar. Most
of it, under directory esr, is my personal site.  In theory other
people could shelter here as well, but this has yet to occur.
catb.org/jargon
The document has moved <a href="http://www.catb.org/jargon/">here</a>.
www.catb.org/jargon/
This page indexes all the WWW resources associated with the Jargon File
and its print version, <cite>The New Hacker's Dictionary</cite>. It's as
official as anything associated with the Jargon File gets.
On 23 October 2003, the Jargon File achieved the
dubious honor of being cited in the SCO-vs.-IBM lawsuit.  See the <a
href='html/F/FUD.html'>FUD</a> entry for details.
www.catb.org/jargon/html/F/FUD.html
 Defined by Gene Amdahl after he left IBM to found his own company:
   &#8220;<span class="quote">FUD is the fear, uncertainty, and doubt that IBM sales people
   instill in the minds of potential customers who might be considering
   [Amdahl] products.</span>&#8221; The idea, of course, was to persuade them to go
   with safe IBM gear rather than with competitors' equipment.  This implicit
   coercion was traditionally accomplished by promising that Good Things would
   happen to people who stuck with IBM, but Dark Shadows loomed over the
   future of competitors' equipment or software.  See
   <a href="../I/IBM.html"><i class="glossterm">IBM</i></a>.  After 1990 the term FUD was associated
   increasingly frequently with <a href="../M/Microsoft.html"><i class="glossterm">Microsoft</i></a>, and has
   become generalized to refer to any kind of disinformation used as a
   competitive weapon.
[In 2003, SCO sued IBM in an action which, among other things,
   alleged SCO's proprietary control of <a href="../L/Linux.html"><i class="glossterm">Linux</i></a>.  The SCO
   suit rapidly became infamous for the number and magnitude of falsehoods
   alleged in SCO's filings.  In October 2003, SCO's lawyers filed a <a href="http://www.groklaw.net/article.php?story=20031024191141102" target="_top">memorandum</a>
   in which they actually had the temerity to link to the web version of
   <span class="emphasis"><em>this entry</em></span> in furtherance of their claims. Whilst we
   appreciate the compliment of being treated as an authority, we can return
   it only by observing that SCO has become a nest of liars and thieves
   compared to which IBM at its historic worst looked positively
   angelic. Any judge or law clerk reading this should surf through to
   <a href="http://www.catb.org/~esr/sco.html" target="_top">my collected resources</a> on this
   topic for the appalling details.&#8212;ESR]

— Cees Timmerman
Source

1

gawk, 235 - 40 = 195 bytes

{for(print"GET "substr($0,j)" HTTP/1.1\nHost:"h"\n"|&(x="/inet/tcp/0/"(h=substr($0,1,(j=index($0,"/"))-1))"/80");(x|&getline)>0;)w=w RS$0
for(;o=index(w,"<p>");w=substr(w,c))print substr(w=substr(w,o+3),1,c=index(w,"/p>")-2)
close(x)}

Golfed it down, but this is a less forgiving version, which needs the web address without http:// at the beginning. And if you want to access the root directory, you have to end the address with a /. Furthermore, the <p> tags have to be lowercase.

My earlier version actually didn't handle lines containing </p><p> correctly. This is now fixed.

Output for input `example.com/`

This domain is established to be used for illustrative examples in documents. You may use this
    domain in examples without prior coordination or asking for permission.
<a href="http://www.iana.org/domains/example">More information...</a>

It still doesn't work with Wikipedia. I think the reason is that Wikipedia uses https for everything, but I don't know.

The following version is a little more forgiving with the input, and it can handle uppercase tags as well.

IGNORECASE=1{
    s=substr($0,(i=index($0,"//"))?i+2:0)
    x="/inet/tcp/0/"(h=(j=index(s,"/"))?substr(s,1,j-1):s)"/80"
    print"GET "substr(s,j)" HTTP/1.1\nHost:"h"\nConnection:close\n"|&x
    while((x|&getline)>0)w=w RS$0
    for(;o=index(w,"<p>");w=substr(w,c))
        print substr(w=substr(w,o+3),1,c=index(w,"/p>")-2)
    close(x)
}
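
The index-and-substring scan used in the two gawk programs above avoids regular expressions entirely. The same idea can be sketched in Python with `str.find`; this is an illustrative rewrite under stated assumptions rather than a translation, `paragraphs_by_index` is a hypothetical name, and only lowercase `<p>` tags without attributes are recognized.

```python
def paragraphs_by_index(html):
    """Collect the text between each "<p>" and the following "</p>"."""
    found = []
    pos = 0
    while True:
        start = html.find("<p>", pos)
        if start < 0:
            break
        start += len("<p>")
        end = html.find("</p>", start)
        if end < 0:  # unclosed paragraph: stop scanning
            break
        found.append(html[start:end])
        # Continue right after the closing tag, so "</p><p>" on one line works.
        pos = end + len("</p>")
    return found


print(paragraphs_by_index("<p>a</p><p>b</p> tail <p>c\nd</p>"))
```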

I'm not sure about the "Connection:close" line. It doesn't seem to be mandatory. I couldn't find an example that worked differently with or without it.

— Cabbie407
Source

1

PowerShell (4) 240

$input=Read-Host ""
$url=[uri]$input
$dir=$url.LocalPath
Do{
$res=Invoke-WebRequest -URI($url.Host+"/"+$dir) -Method Get
$res.ParsedHtml.getElementsByTagName('p')|foreach-object{write-host $_.innerText}
$dir=Read-Host ""
}While($dir -NE "")

Ungolfed (the proxy is not required)

$system_proxyUri=Get-ItemProperty -Path "HKCU:\Software\Microsoft\Windows\CurrentVersion\Internet Settings" -Name ProxyServer
$proxy = [System.Net.WebRequest]::GetSystemWebProxy()
$proxyUri = $proxy.GetProxy($system_proxyUri.ProxyServer)
$input = Read-Host "Initial url"
#$input="http://stackoverflow.com/questions/tagged/powershell"
$url=[uri]$input
$dir=$url.LocalPath
Do{
$res=Invoke-WebRequest -URI($url.Host+"/"+$dir) -Method Get -Proxy($proxyUri)
$res.ParsedHtml.getElementsByTagName('p')|foreach-object{write-host $_.innerText}
$dir=Read-Host "next dir"
}While($dir -NE "")

edit* also not too hard to memorize ^^

— dwana
Source

-1

Java 620 B

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class JavaApplication12 {

    public static void main(String[] args) {
        try {             
            BufferedReader i = new BufferedReader(new InputStreamReader(new URL(args[0]).openStream()));
            String l;
            boolean print = false;
            while ((l = i.readLine()) != null) {
                if (l.toLowerCase().contains("<p>")) {
                    print = true;
                }
                if (print) {
                    if (l.toLowerCase().contains("</p>")) {
                        print = false;
                    }
                    System.out.println(l);
                }
            }

        } catch (Exception e) {

        }
    }

}

— Shalika Ashan
Source

2

Welcome to Programming Puzzles & Code Golf! Unfortunately, this submission is invalid. The question only allows low-level TCP APIs, so you cannot use InputStreamReader.

— Dennis

1

Oh, I'm sorry, and thank you for pointing it out. I'll do better in my next answer.

— Shalika Ashan

World's Smallest Web Browser
